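Notes on problems I run into at work and how I solved them.

Lately I mostly work with Python and Google Cloud Platform. At 株式会社ビズオーシャン I do planning, development and operations, and data utilization. https://github.com/uyamazak/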

BigQuery outage, 2016/11/09

Recovery

Service appears to have been restored at around 13:07 Japan time.

Google BigQuery Incident #18022

BigQuery Streaming API failing

Incident began at 2016-11-08 16:25 and ended at 2016-11-08 20:07 (all times are US/Pacific).

DATE TIME DESCRIPTION
Nov 08, 2016 20:21
The issue with the BigQuery Streaming API should have been resolved for all affected tables as of 20:07 US/Pacific. We will conduct an internal investigation of this issue and make appropriate improvements to our systems to prevent or minimize future recurrence. We will provide a more detailed analysis of this incident once we have completed our internal investigation.
Nov 08, 2016 20:00
We're continuing to work to restore the service to the BigQuery Streaming API. We will add an update at 20:30 US/Pacific with further information.
Nov 08, 2016 19:44
We are continuing to investigate the issue with BigQuery Streaming API. We will add an update at 20:00 US/Pacific with further information.
Nov 08, 2016 19:00
We have taken steps to mitigate the issue, which has led to some improvements. The issue continues to impact the BigQuery Streaming API and tables with a streaming buffer. We will provide a further status update at 19:30 US/Pacific with current details
Nov 08, 2016 18:30
We are continuing to investigate the issue with BigQuery Streaming API. The issue may also impact tables with a streaming buffer, making them inaccessible. This will be clarified in the next update at 19:00 US/Pacific with current details.
Nov 08, 2016 18:00
We are still investigating the issue with BigQuery Streaming API. There are no other details to share at this time but we are actively working to resolve this. We will provide another status update by 18:30 US/Pacific with current details.
Nov 08, 2016 17:30
We are still investigating the issue with the BigQuery Streaming API. Current data indicates that all projects are affected by this issue. We will provide another status update by 18:00 US/Pacific with current details.
Nov 08, 2016 17:28
We are investigating an issue with the BigQuery Streaming API. We will provide more information by 17:30 US/Pacific.
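
For reference, the US/Pacific times in the incident report can be converted to Japan time with a short Python sketch. It assumes fixed offsets (UTC-8 for Pacific, since US daylight saving time had already ended on 2016-11-06, and UTC+9 for Japan). `pacific_to_jst` is a name made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets are enough here. US/Pacific was on standard
# time (UTC-8) on 2016-11-08, and Japan Standard Time is always UTC+9.
PST = timezone(timedelta(hours=-8), "PST")
JST = timezone(timedelta(hours=9), "JST")


def pacific_to_jst(text):
    """Parse 'YYYY-MM-DD HH:MM' as PST and return the same instant in JST."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=PST).astimezone(JST)


incident_start = pacific_to_jst("2016-11-08 16:25")
incident_end = pacific_to_jst("2016-11-08 20:07")

print(incident_start.strftime("%Y-%m-%d %H:%M %Z"))  # 2016-11-09 09:25 JST
print(incident_end.strftime("%Y-%m-%d %H:%M %Z"))    # 2016-11-09 13:07 JST
print(incident_end - incident_start)                  # 3:42:00
```

So the incident ran from about 09:25 to 13:07 Japan time on 11/9, roughly 3 hours 42 minutes.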

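Shortly after 10:00 on 2016/11/9, I tried to pull BigQuery data from Jupyter and got an error I had not seen before.

Trying from the web UI produced the error below as well, and queries would not run.
Older tables could still be viewed in some cases.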

Error: Something went wrong with the table you queried. Contact the table owner for assistance.

A search showed it was already listed as an incident on the status page.

https://status.cloud.google.com/incident/bigquery/18022



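In oceanus, our in-house data collection project, Redis runs on the local network inside GKE, and the web servers write to Redis rather than to BigQuery.

A separate process (a separate container) pulls records out of Redis and writes them to BigQuery. While BigQuery is unreachable the records simply pile up in Redis, so by design this outage should not have caused any data loss.

The original motivation was response time: writing directly to BigQuery is slow, and retries add to execution time. It turned out to be useful for this as well.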
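
The buffering idea can be sketched roughly like this. This is not the actual oceanus code: `InMemoryQueue` is a stand-in that borrows the `lpush` / `rpop` / `rpush` / `llen` method names from redis-py so it runs without a Redis server, and `QUEUE_KEY`, `enqueue`, `drain`, and `write_row` are hypothetical names. `write_row` represents whatever actually sends a row to BigQuery.

```python
import json
from collections import defaultdict, deque


class InMemoryQueue:
    """Stand-in for Redis lists, using the same method names as redis-py."""

    def __init__(self):
        self._lists = defaultdict(deque)

    def lpush(self, key, value):
        self._lists[key].appendleft(value)  # add at the head

    def rpush(self, key, value):
        self._lists[key].append(value)      # add at the tail

    def rpop(self, key):
        items = self._lists[key]
        return items.pop() if items else None  # take from the tail

    def llen(self, key):
        return len(self._lists[key])


QUEUE_KEY = "oceanus:events"  # hypothetical key name


def enqueue(queue, event):
    """Web-server side: write only to the local queue, never to BigQuery."""
    queue.lpush(QUEUE_KEY, json.dumps(event))


def drain(queue, write_row, max_items=100):
    """Worker side: move up to max_items rows from the queue to the sink.

    If a write fails, the row is pushed back to the tail so it is the next
    one popped, and draining stops until the next call.
    Returns the number of rows written.
    """
    written = 0
    for _ in range(max_items):
        raw = queue.rpop(QUEUE_KEY)
        if raw is None:
            break
        try:
            write_row(json.loads(raw))
        except Exception:
            queue.rpush(QUEUE_KEY, raw)
            break
        written += 1
    return written


# Simulate an outage: rows pile up, then flush once the sink is back.
queue = InMemoryQueue()
sent = []
state = {"bigquery_up": False}


def write_row(row):
    if not state["bigquery_up"]:
        raise ConnectionError("BigQuery unavailable")
    sent.append(row)


for i in range(3):
    enqueue(queue, {"id": i})

print(drain(queue, write_row), queue.llen(QUEUE_KEY))  # 0 3
state["bigquery_up"] = True
print(drain(queue, write_row), queue.llen(QUEUE_KEY))  # 3 0
```

One caveat with this simple version: a worker that dies between `rpop` and the write would lose that row. With a real Redis, moving the row to a separate "processing" list (for example with RPOPLPUSH) is one way to cover that case.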

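BigQuery writes fail fairly often, so retry logic is a must, but an outage this long is rare, so it should make a good test.

At a glance, the worker was logging far too often during the outage, so that needs fixing.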
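
One possible shape for that fix is retrying with exponential backoff while limiting how often failures get logged. The following is only a sketch under my own assumptions, not the fix that went into oceanus: `retry_with_backoff`, its parameters, and the `bq_writer` logger name are all made up for illustration, and the `sleep` and `clock` arguments exist so the behavior can be checked without actually waiting.

```python
import logging
import time

logger = logging.getLogger("bq_writer")  # hypothetical logger name


def retry_with_backoff(operation, max_attempts=8, base_delay=1.0,
                       max_delay=60.0, log_interval=30.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call operation() until it succeeds or max_attempts is reached.

    The delay doubles after each failure, capped at max_delay. A warning is
    logged for the first failure and then at most once per log_interval
    seconds, so a long outage does not flood the logs.
    """
    last_log = None
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            now = clock()
            if last_log is None or now - last_log >= log_interval:
                logger.warning("write failed (attempt %d): %s", attempt, exc)
                last_log = now
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            sleep(delay)
            delay = min(delay * 2, max_delay)


# Demo with a fake clock so nothing really sleeps.
fake_now = [0.0]
delays = []


def fake_sleep(seconds):
    delays.append(seconds)
    fake_now[0] += seconds


calls = {"n": 0}


def flaky_write():
    calls["n"] += 1
    if calls["n"] < 5:
        raise ConnectionError("streaming insert failed")
    return "ok"


result = retry_with_backoff(flaky_write, sleep=fake_sleep,
                            clock=lambda: fake_now[0])
print(result, delays)  # ok [1.0, 2.0, 4.0, 8.0]
```

In this run the write fails four times, but only the first failure is logged because the later ones fall inside the 30 second window.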


Google BigQueryではじめる自前ビッグデータ処理入門
