Upgrading Kubernetes on GKE without downtime

Kubernetes lets you upgrade and deploy without downtime when you are only changing an image, but on Google Container Engine (GKE), upgrading Kubernetes itself causes downtime.

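Version 1.4 adds a number of features, and the image type changes from container-vm to gci, which is supposed to improve performance, so I want to switch. I'm also thinking about going multi-region later, so I wrote up a procedure.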

This node pool will use the "gci" node image, which replaces "container-vm" as the default node image in Kubernetes version 1.4. The "gci" node image provides better security and performance but has limitations that may affect some users.

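Overview

Create a separate container cluster
Bring up the deployments and services with the same configuration as the existing cluster
Add the new cluster to the load balancer's backend service
Watch it for a while, and if there are no problems, delete the old cluster.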

Creating the cluster

Create it from the console as usual, or from the gcloud command line.

https://console.cloud.google.com/kubernetes

Same as the existing cluster, I created it with two n1-standard-1 nodes.

gcloud container \
--project "{project-name}" clusters create "{new-cluster-name}" \
--zone "asia-east1-b" \
--machine-type "n1-standard-1" \
--scopes "https://www.googleapis.com/auth/compute","https://www.googleapis.com/auth/devstorage.read_write","https://www.googleapis.com/auth/taskqueue","https://www.googleapis.com/auth/bigquery","https://www.googleapis.com/auth/datastore","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
--num-nodes "2" \
--network "default" \
--enable-cloud-logging \
--enable-cloud-monitoring \
--enable-autoscaling \
--min-nodes "2" \
--max-nodes "5"

Exporting the existing YAML

Export the services and deployments as YAML.

kubectl get services -o yaml > {services-file}
kubectl get deployment -o yaml > {deployments-file}

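The export contains a lot that is not needed when creating resources from scratch, so delete everything but the essentials, using the official examples such as the following as a guide.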

{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": string
  },
  "spec": {
    "ports": [{
      "port": int,
      "targetPort": int
    }],
    "selector": {
      string: string
    },
    "type": "LoadBalancer",
    "loadBalancerSourceRanges": [
      "10.180.0.0/16",
      "10.245.0.0/24"
    ]
  }
}

Kubernetes - Service Operations

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80

Kubernetes - Deployments

The exported services also include the built-in kubernetes service, so delete that entry entirely.
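The trimming described above can be partly automated. This is a rough sketch, not a complete cleaner. The function name clean_manifest is made up for illustration, and it assumes you export one object at a time (kubectl get service {name} -o yaml) so that status: is a top-level key and the last block in the output. The list of fields to drop is my own guess at the usual server-populated ones, so check the result by hand before using it.

```shell
# Strip fields that the API server fills in from a single exported object.
# Reads YAML on stdin and writes the trimmed YAML to stdout.
# Assumption: "status:" is a top-level key and the last block of the input.
clean_manifest() {
  # 1. Delete everything from the top-level "status:" line to the end.
  # 2. Drop server-assigned metadata and the old cluster's clusterIP.
  sed '/^status:/,$d' |
    grep -v -E '^[[:space:]]*(creationTimestamp|resourceVersion|selfLink|uid|generation|clusterIP):'
}
```

Usage would look like kubectl get service {name} -o yaml | clean_manifest > {file}. Annotations and anything else specific to the old cluster still need a manual pass.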

Creating from the exported YAML

Once the old resources have been exported, use gcloud to point at the new cluster.

gcloud config set container/cluster {cluster-name}
gcloud container clusters get-credentials --zone {zone} {cluster-name}

To confirm that the setting has changed, run:

gcloud config list
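Since creating resources in the wrong cluster would be painful, it may be worth checking the kubectl side as well. The sketch below assumes that get-credentials names the context gke_{project}_{zone}_{cluster}. The helper names gke_context_name and assert_context are made up for illustration.

```shell
# Build the context name that get-credentials is assumed to create.
gke_context_name() {
  # $1 = project ID, $2 = zone, $3 = cluster name
  printf 'gke_%s_%s_%s\n' "$1" "$2" "$3"
}

# Succeed only when kubectl currently points at the expected cluster.
assert_context() {
  expected=$(gke_context_name "$1" "$2" "$3")
  current=$(kubectl config current-context)
  if [ "$current" != "$expected" ]; then
    echo "kubectl context is '$current', expected '$expected'" >&2
    return 1
  fi
}
```

For example, assert_context {project-name} {zone} {cluster-name} && kubectl create -f {file} only creates resources when the context matches.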

If persistent disks or environment variables need to be configured, change them with kubectl edit deployment {name}.

Load each of the files.

kubectl create -f {file}

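If an IP address, such as a DB's, has to go into an environment variable, it is easier to create the services first and rewrite the deployment YAML once the IP is known.

Verification

The production HTTP(S) load balancer can't be used for verification, so use a Kubernetes load balancer instead.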

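kubectl expose deployment {deployment-to-expose} --type=LoadBalancer --name=testlb

This adds a service, so check its EXTERNAL-IP with kubectl get service and access it. It takes a minute or two for the EXTERNAL-IP to be assigned.

If there are no problems, remove it with kubectl delete service testlb.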
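Rather than re-running kubectl get service by hand until the address appears, the wait can be scripted. This is a minimal sketch. The function name wait_for_external_ip and the default retry counts are my own choices, and it assumes the load balancer reports an IP address rather than a hostname.

```shell
# Poll until the Service has an external IP, then print it.
wait_for_external_ip() {
  # $1 = service name
  # $2 = max attempts (default 30), $3 = seconds between attempts (default 10)
  svc=$1
  attempts=${2:-30}
  interval=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Empty output means the address has not been assigned yet.
    ip=$(kubectl get service "$svc" -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "no external IP for $svc after $attempts attempts" >&2
  return 1
}
```

Something like curl "http://$(wait_for_external_ip testlb)/" then hits the test load balancer as soon as it is ready.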

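Connecting to the load balancer

In the console, go to
Networking
→ Load balancing
→ Load balancer
→ Edit
→ Backend configuration
→ Add backend
and select the newly created instance group.

If the configuration is the same, the same health check should work as well.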
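The same step should also be possible from the command line. I did it from the console, so treat this as an untested sketch. The wrapper name add_cluster_backend is made up, it assumes a global backend service, and balancing mode and capacity are left at their defaults. The instance group name can be looked up with gcloud compute instance-groups list.

```shell
# Attach the new cluster's instance group to an existing global backend service.
add_cluster_backend() {
  # $1 = backend service name
  # $2 = instance group name of the new cluster
  # $3 = zone of that instance group
  gcloud compute backend-services add-backend "$1" \
    --instance-group "$2" \
    --instance-group-zone "$3" \
    --global
}
```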

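When you access the production domain, it is hard to tell which cluster served the request. Some trick, such as having the new and old clusters each log their version, seems necessary.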
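One possible variation on that idea, sketched here under a big assumption: the application returns a response header that differs between the old and new clusters. No such header exists in the setup described here. X-Cluster-Version is a hypothetical name, and sample_header and tally_backends are helper names made up for illustration.

```shell
# Count how many sampled responses came from each cluster.
# Reads one label per line on stdin and prints "label: count" lines.
tally_backends() {
  sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Request a URL several times and print the value of one response header.
sample_header() {
  # $1 = URL
  # $2 = header name (hypothetical, e.g. X-Cluster-Version)
  # $3 = number of requests
  i=0
  while [ "$i" -lt "$3" ]; do
    # -D - dumps the response headers to stdout; the body is discarded.
    curl -s -o /dev/null -D - "$1" |
      awk -F': ' -v h="$2" 'tolower($1) == tolower(h) { sub(/\r$/, "", $2); print $2 }'
    i=$((i + 1))
  done
}
```

Running sample_header {url} X-Cluster-Version 50 | tally_backends would then show roughly how traffic is split between the two backends.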

Once this procedure and the YAML are nailed down, scaling out or moving to another region (I want to move to the Tokyo region soon) can be done with confidence.

GAミント至上主義

Web Monomaniacal Developer.