Kubernetes Runbooks

You should be familiar with Kubernetes (k8s). We use lots of Service, Deployment, Ingress and PersistentVolumeClaim objects along with a few others where needed. Our clusters run with RBAC on Google's Kubernetes Engine (GKE).

Links: infra-oss.moov.io | Google Cloud Status | GKE Dashboard

There are also several community guides for troubleshooting Kubernetes problems:

Useful Tools

Viewing Pod/Container logs

$ kubectl get pods -n infra  | grep kube-ingress
kube-ingress-index-5cb86955ff-md64n   1/1       Running   0          18m
kube-ingress-index-5cb86955ff-xdb5m   1/1       Running   0          18m

# --tail N only shows the last N lines of output
# -f follows the log, streaming new lines as the container writes them
$ kubectl logs -n infra [--tail 10] [-f] kube-ingress-index-5cb86955ff-xdb5m
...

See also: Viewing logs in Kubernetes
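When several replicas of the same app are running, it can help to dump the tail of every matching pod at once. The function below is a hypothetical helper, not existing tooling, and it assumes kubectl is already configured for the cluster. The extra flags listed in its comments are real `kubectl logs` options.

```shell
# Hypothetical helper (not part of our tooling): print the last N log lines
# of every pod in a namespace whose name matches a pattern.
#   $1 = namespace (e.g. infra)
#   $2 = pod name pattern passed to grep (e.g. kube-ingress)
#   $3 = lines per pod (optional, defaults to 20)
# Other kubectl logs flags worth knowing:
#   --previous    logs of the previous (e.g. crashed) container instance
#   --since=1h    only lines newer than a relative duration
#   -c NAME       pick one container in a multi-container pod
#   --timestamps  prefix each line with its timestamp
# The ( ) body runs in a subshell so ns/pattern/lines don't leak to the caller.
pod_logs() (
  ns="$1"; pattern="$2"; lines="${3:-20}"
  # -o name prints "pod/<name>", a form kubectl logs accepts as-is
  for pod in $(kubectl get pods -n "$ns" -o name | grep -- "$pattern"); do
    echo "==> $pod"
    kubectl logs -n "$ns" --tail "$lines" "$pod"
  done
)
```

For example, `pod_logs infra kube-ingress 10` prints the last 10 lines from each kube-ingress-index pod, each section preceded by a `==> pod/...` header.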

Viewing Logs with Loki / Grafana

Loki is a new log aggregation system that, much like Prometheus does for metrics, groups logs into streams identified by labels. The project is young, but Grafana already supports exploring logs, building dashboards, and setting up alerts on them. Check out the explore page showing paygate logs and the basic usage guide.
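Loki can also be queried directly over HTTP, which is handy from a terminal. The sketch below is a hypothetical helper that assumes Loki is reachable at `$LOKI_URL` (our real endpoint is not documented here) and that paygate's streams carry an `app="paygate"` label. It uses the `/loki/api/v1/query_range` path; older Loki releases exposed a different path, so check the version you are running.

```shell
# Hypothetical helper: query Loki's HTTP API instead of the Grafana explore page.
# Assumptions: Loki is reachable at $LOKI_URL and paygate logs carry app="paygate".
#   $1 = LogQL query, $2 = max number of lines (optional, defaults to 100)
loki_query() (
  query="$1"; limit="${2:-100}"
  # -G turns each --data-urlencode pair into a URL query parameter on a GET.
  # Without start/end parameters, query_range defaults to roughly the last hour.
  curl -sS -G "${LOKI_URL:?set LOKI_URL first}/loki/api/v1/query_range" \
    --data-urlencode "query=$query" \
    --data-urlencode "limit=$limit"
)
# Example: up to 50 paygate lines containing "error" (3100 is Loki's default port)
#   export LOKI_URL=http://localhost:3100
#   loki_query '{app="paygate"} |= "error"' 50
```

The response is JSON, so piping it through a JSON tool such as `jq` makes it easier to read.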

Rolling Pods / Containers

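If you need to restart a Pod/Container, list the pods and run kubectl delete on the one you want to restart. Pods created by a Deployment are replaced automatically by its ReplicaSet, so deleting one effectively restarts it: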

$ kubectl get pods -n infra  | grep kube-ingress
kube-ingress-index-5cb86955ff-md64n   1/1       Running   0          18m
kube-ingress-index-5cb86955ff-xdb5m   1/1       Running   0          18m

$ kubectl delete pod -n infra kube-ingress-index-5cb86955ff-xdb5m
pod "kube-ingress-index-5cb86955ff-xdb5m" deleted
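To restart every pod of a Deployment rather than a single one, a rolling restart avoids taking all replicas down at once. This is a hypothetical helper. It assumes kubectl 1.15 or newer (where `kubectl rollout restart` was added), and it assumes the Deployment is named after the pod prefix, which the pod names suggest but this runbook does not confirm.

```shell
# Hypothetical helper: restart all pods of a Deployment as a rolling update
# rather than deleting them one by one. Needs kubectl 1.15 or newer.
#   $1 = namespace, $2 = deployment name
restart_deployment() (
  ns="$1"; deploy="$2"
  # Stamps a restartedAt annotation on the pod template, which makes the
  # Deployment replace its pods gradually; stop here if that fails.
  kubectl rollout restart -n "$ns" "deployment/$deploy" || exit 1
  # Block until every new pod is ready (or the rollout fails).
  kubectl rollout status -n "$ns" "deployment/$deploy"
)
# Example, assuming the Deployment is named after the pod prefix:
#   restart_deployment infra kube-ingress-index
```

Deleting a single pod is still the quickest fix when only one replica is misbehaving.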

Node Sizing / Availability

Currently our Kubernetes cluster runs on preemptible instances, which Google can reclaim with less than 60 seconds of notice. We do this largely to save costs while we don't yet have a product, but we will likely run a mix of permanent and preemptible nodes going forward. Keep the following guidelines in mind (Source):

  • Have a backup plan (a permanent node pool)
  • Find unpopular instance sizes
    • If a new instance family comes out (e.g. AWS's m5), the previous one (m4) may become cheaper and less requested.
  • Set a maximum bid price (an AWS Spot concept; GCP preemptible VMs are sold at a fixed price)
  • Run multi-zone setups to avoid shortages in a single GCP zone
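The "permanent node pool" guideline could look roughly like the hypothetical helpers below. The pool name, machine type, and node counts are placeholders, not our real settings. The `gcloud container node-pools create` flags and the `cloud.google.com/gke-preemptible` node label are standard GKE features.

```shell
# Hypothetical helpers for the "backup plan" guideline. The pool name,
# machine type and node counts are placeholders, not our real settings.
#   $1 = cluster name, $2 = zone
create_permanent_pool() (
  cluster="$1"; zone="$2"
  # No --preemptible flag, so these nodes are regular VMs that stay up when
  # preemptible nodes are reclaimed. Autoscaling keeps the pool small until
  # workloads need it. --node-locations can spread a pool over several zones.
  gcloud container node-pools create permanent-pool \
    --cluster "$cluster" --zone "$zone" \
    --machine-type n1-standard-2 \
    --enable-autoscaling --num-nodes 1 --min-nodes 1 --max-nodes 3
)

# GKE labels preemptible nodes with cloud.google.com/gke-preemptible=true,
# so this lists the nodes that could disappear with little notice.
preemptible_nodes() {
  kubectl get nodes -l cloud.google.com/gke-preemptible=true
}
```

Pods that must survive preemption could then be pinned to the permanent pool with a nodeSelector on the `cloud.google.com/gke-nodepool` label that GKE sets on every node.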

Emacs

chrisbarrett/kubernetes-el works with our setup. Talk to @adamdecaf for help.