Pod stuck in terminating state #774

Closed
flbla opened this issue Mar 2, 2023 · 10 comments
Labels
bug Something isn't working

Comments

flbla commented Mar 2, 2023

Describe the bug
I upgraded from 0.14.0 to 1.0.0-beta.0 (I also tried 1.0.0-alpha.0).

My pod has the annotation azure.workload.identity/inject-proxy-sidecar: "true"
and the label azure.workload.identity/use: "true".

The service account has the label azure.workload.identity/use: "true" and the annotation azure.workload.identity/client-id: xxxxx.

When I try to delete the pod, it stays in the Terminating state.
If I roll back to 0.14.0, it works without any issue.

Steps To Reproduce
  1. Deploy AAD WI 1.0.0-beta.0 with Helm.
  2. Deploy a pod with the annotation azure.workload.identity/inject-proxy-sidecar: "true" and the label azure.workload.identity/use: "true".
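A minimal repro manifest for the steps above could look like the sketch below. It is a hypothetical example, not taken from the report: the pod name, the service account name workload-identity-sa, and the image busybox:1.36 are placeholders, and the service account is assumed to already carry the client-id annotation described above. The main container traps SIGTERM so that it exits promptly; a plain sleep running as PID 1 would ignore SIGTERM and delay termination on its own, which would muddy the test.

```python
import json

# Hypothetical placeholders: substitute your own pod name, service account, and image.
POD_NAME = "azwi-proxy-repro"
SERVICE_ACCOUNT = "workload-identity-sa"
IMAGE = "busybox:1.36"


def build_repro_pod(name: str, service_account: str, image: str) -> dict:
    """Return a minimal Pod manifest that opts in to workload identity and the proxy sidecar."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # Label from the report: opts the pod in to azure-workload-identity mutation.
            "labels": {"azure.workload.identity/use": "true"},
            # Annotation from the report: asks the webhook to inject the azwi-proxy sidecar.
            "annotations": {"azure.workload.identity/inject-proxy-sidecar": "true"},
        },
        "spec": {
            "serviceAccountName": service_account,
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    # Trap SIGTERM so the main container exits quickly on delete;
                    # any remaining delay then points at something other than the app.
                    "command": ["sh", "-c", "trap 'exit 0' TERM; while true; do sleep 1; done"],
                }
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests: python repro.py > pod.json && kubectl apply -f pod.json
    print(json.dumps(build_repro_pod(POD_NAME, SERVICE_ACCOUNT, IMAGE), indent=2))
```

After applying it, kubectl delete pod azwi-proxy-repro should return within a few seconds if the sidecar shuts down cleanly.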

Environment

  • Kubernetes version (use kubectl version): 1.25.5
  • Cloud provider or hardware configuration: Azure AKS

flbla added the bug label (Something isn't working) on Mar 2, 2023

flbla (Author) commented Mar 2, 2023

It looks like this bug: #773.

aramase (Member) commented Mar 2, 2023

@flbla #773 is specifically about Jobs: there is no construct to tell the sidecar to exit when the main container in the pod terminates. Can you provide the kubectl describe output of the pod with the proxy sidecar, and also the logs from each of the containers?
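The information requested here can be gathered with kubectl describe pod plus one kubectl logs call per container. The sketch below only builds and prints the command lines; the pod, namespace, and container names in its main block come from later in this thread and are meant to be replaced with your own. Init containers such as azwi-proxy-init can be added to the list the same way, since kubectl logs -c also accepts init container names.

```python
import shlex


def diagnostic_commands(pod: str, namespace: str, containers: list[str]) -> list[list[str]]:
    """Build the kubectl invocations for a describe plus per-container logs."""
    cmds = [["kubectl", "describe", "pod", pod, "-n", namespace]]
    for container in containers:
        # One log stream per container, e.g. the app container and the injected azwi-proxy sidecar.
        cmds.append(["kubectl", "logs", pod, "-c", container, "-n", namespace])
    return cmds


if __name__ == "__main__":
    # Names taken from this thread; substitute your own.
    for argv in diagnostic_commands("velero-5f7d856778-kwpnr", "kube-infra", ["velero", "azwi-proxy"]):
        print(shlex.join(argv))
```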

aramase (Member) commented Mar 2, 2023

I just tried a pod with the proxy using v1.0.0-beta.0, and the pod terminates without any issues when deleted.

flbla (Author) commented Mar 3, 2023

velero-5f7d856778-kwpnr 2/2 Terminating 0 8m32s

Webhook controller logs from when I deleted the pod:

azure-wi-webhook-controller-manager-67d569b889-5ggwm manager {"level":"debug","timestamp":"2023-03-03T08:25:53.082181Z","logger":"controller-runtime.webhook.webhooks","caller":"/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/http.go:96$admission.(*Webhook).ServeHTTP","message":"received request","webhook":"/mutate-v1-pod","UID":"ff0a5f98-6e3e-403e-8d57-2ba66d5531e2","kind":"/v1, Kind=Pod","resource":{"group":"","version":"v1","resource":"pods"}}
azure-wi-webhook-controller-manager-67d569b889-5ggwm manager {"level":"debug","timestamp":"2023-03-03T08:25:53.189337Z","logger":"controller-runtime.webhook.webhooks","caller":"/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/http.go:143$admission.(*Webhook).writeAdmissionResponse","message":"wrote response","webhook":"/mutate-v1-pod","code":200,"reason":"","UID":"ff0a5f98-6e3e-403e-8d57-2ba66d5531e2","allowed":true}

kubectl describe output for the velero pod:

k describe pod velero-5f7d856778-9lt4v
Name:                      velero-5f7d856778-9lt4v
Namespace:                 kube-infra
Priority:                  140000
Priority Class Name:       user-medium
Node:                      
Start Time:                Fri, 03 Mar 2023 09:23:48 +0100
Labels:                    app.kubernetes.io/instance=velero
                           app.kubernetes.io/managed-by=Helm
                           app.kubernetes.io/name=velero
                           azure.workload.identity/use=true
                           helm.sh/chart=velero-3.1.2
                           name=velero
                           pod-template-hash=5f7d856778
Annotations:               azure.workload.identity/inject-proxy-sidecar: true
                           cni.projectcalico.org/containerID: d7ef3ec9559f65d9c918d823d24d997fcd3e851a3d5b77700ef205f24ab3967d
                           cni.projectcalico.org/podIP: 100.64.1.7/32
                           cni.projectcalico.org/podIPs: 100.64.1.7/32
                           kubernetes.io/limit-ranger:
                             LimitRanger plugin set: cpu, memory request for init container azwi-proxy-init; cpu, memory limit for init container azwi-proxy-init
                           prometheus.io/path: /metrics
                           prometheus.io/port: 8085
                           prometheus.io/scrape: true
                           vpaObservedContainers: velero, azwi-proxy
                           vpaUpdates:
                             Pod resources updated by autoscale-velero: container 0: cpu request, memory request, cpu limit, memory limit; container 1: memory request,...
Status:                    Terminating (lasts <invalid>)
Termination Grace Period:  3600s
IP:                        100.64.1.7
IPs:
  IP:           100.64.1.7
Controlled By:  ReplicaSet/velero-5f7d856778
Init Containers:
  velero-plugin-for-microsoft-azure:
    Container ID:   containerd://6fb49dfd1547463817d3a7f5c7c679ca613b39c362cad227b84bdcfd20354b2d
    Image:          velero-plugin-for-microsoft-azure:v1.5.1
    Image ID:       velero-plugin-for-microsoft-azure@sha256:b2d7e000e64f68708f2494b3f4f59d56104dd3b3dc0e485b5f6569888f8b6a37
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 03 Mar 2023 09:23:50 +0100
      Finished:     Fri, 03 Mar 2023 09:23:50 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  200M
    Requests:
      cpu:     100m
      memory:  100M
    Environment:
      AZURE_CLIENT_ID:             a7f34c7e-4cf5-4d9b-9268-050cd48178fc
      AZURE_TENANT_ID:             4a7c8238-5799-4b16-9fc6-9ad8fce5a7d9
      AZURE_FEDERATED_TOKEN_FILE:  /var/run/secrets/azure/tokens/azure-identity-token
      AZURE_AUTHORITY_HOST:        https://login.microsoftonline.com/
    Mounts:
      /target from plugins (rw)
      /var/run/secrets/azure/tokens from azure-identity-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-94f4c (ro)
  azwi-proxy-init:
    Container ID:   containerd://76b506ba3708d9505c2ede6c4fc06fbd579b2a78f00da4c011396227a7db68bc
    Image:          mcr.microsoft.com/oss/azure/workload-identity/proxy-init:v1.0.0-beta.0
    Image ID:       mcr.microsoft.com/oss/azure/workload-identity/proxy-init@sha256:e31b6f2d4aad5ee592b069c48c69d6775f17104f5a422528b401e16eeeeb393e
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 03 Mar 2023 09:23:54 +0100
      Finished:     Fri, 03 Mar 2023 09:23:54 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  200M
    Requests:
      cpu:     100m
      memory:  100M
    Environment:
      PROXY_PORT:                  8000
      AZURE_CLIENT_ID:             a7f34c7e-4cf5-4d9b-9268-050cd48178fc
      AZURE_TENANT_ID:             4a7c8238-5799-4b16-9fc6-9ad8fce5a7d9
      AZURE_FEDERATED_TOKEN_FILE:  /var/run/secrets/azure/tokens/azure-identity-token
      AZURE_AUTHORITY_HOST:        https://login.microsoftonline.com/
    Mounts:
      /var/run/secrets/azure/tokens from azure-identity-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-94f4c (ro)
Containers:
  velero:
    Container ID:  containerd://5a2078a3c369a7b9765509e4bbfa891f0066c5598c025f40f5390e30b2c04eb7
    Image:         velero:v1.10.0
    Image ID:      velero@sha256:702312f01639fea8b5a8711c6e00107e30fa5c969fb2a709d789590f47a132e5
    Port:          8085/TCP
    Host Port:     0/TCP
    Command:
      /velero
    Args:
      server
      --uploader-type=restic
      --backup-sync-period=60m
      --store-validation-frequency=60m
    State:          Running
      Started:      Fri, 03 Mar 2023 09:23:58 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     250m
      memory:  625Mi
    Requests:
      cpu:     50m
      memory:  131072k
    Environment:
      VELERO_SCRATCH_DIR:          /scratch
      VELERO_NAMESPACE:            kube-infra (v1:metadata.namespace)
      LD_LIBRARY_PATH:             /plugins
      AZURE_CREDENTIALS_FILE:      /credentials/cloud
      AZURE_CLIENT_ID:             a7f34c7e-4cf5-4d9b-9268-050cd48178fc
      AZURE_TENANT_ID:             4a7c8238-5799-4b16-9fc6-9ad8fce5a7d9
      AZURE_FEDERATED_TOKEN_FILE:  /var/run/secrets/azure/tokens/azure-identity-token
      AZURE_AUTHORITY_HOST:        https://login.microsoftonline.com/
    Mounts:
      /credentials from cloud-credentials (rw)
      /plugins from plugins (rw)
      /scratch from scratch (rw)
      /var/run/secrets/azure/tokens from azure-identity-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-94f4c (ro)
  azwi-proxy:
    Container ID:  containerd://70f598f38a1a66944a43f6e24ee413b6860739dec3002f43adc6e20b3b2ac7be
    Image:         mcr.microsoft.com/oss/azure/workload-identity/proxy:v1.0.0-beta.0
    Image ID:      mcr.microsoft.com/oss/azure/workload-identity/proxy@sha256:3d6c129918ac9d3dcf30d1c21b120d66cb2c370f85fcd8d8cd1ffc09fbd7a4c6
    Port:          8000/TCP
    Host Port:     0/TCP
    Args:
      --proxy-port=8000
      --log-level=info
    State:          Running
      Started:      Fri, 03 Mar 2023 09:24:00 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     12m
      memory:  131072k
    Requests:
      cpu:     12m
      memory:  131072k
    Environment:
      AZURE_CLIENT_ID:             a7f34c7e-4cf5-4d9b-9268-050cd48178fc
      AZURE_TENANT_ID:             4a7c8238-5799-4b16-9fc6-9ad8fce5a7d9
      AZURE_FEDERATED_TOKEN_FILE:  /var/run/secrets/azure/tokens/azure-identity-token
      AZURE_AUTHORITY_HOST:        https://login.microsoftonline.com/
    Mounts:
      /var/run/secrets/azure/tokens from azure-identity-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-94f4c (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  cloud-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  velero
    Optional:    false
  plugins:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  scratch:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-94f4c:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
  azure-identity-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3600
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason           Age    From               Message
  ----     ------           ----   ----               -------
  Normal   Scheduled        2m39s  default-scheduler  Successfully assigned kube-infra/velero-5f7d856778-9lt4v to 
  Normal   Pulling          2m40s  kubelet            Pulling image "velero-plugin-for-microsoft-azure:v1.5.1"
  Normal   Pulled           2m38s  kubelet            Successfully pulled image "velero-plugin-for-microsoft-azure:v1.5.1" in 1.28249213s
  Normal   Created          2m38s  kubelet            Created container velero-plugin-for-microsoft-azure
  Normal   Started          2m38s  kubelet            Started container velero-plugin-for-microsoft-azure
  Normal   Pulling          2m37s  kubelet            Pulling image "mcr.microsoft.com/oss/azure/workload-identity/proxy-init:v1.0.0-beta.0"
  Normal   Pulled           2m34s  kubelet            Successfully pulled image "mcr.microsoft.com/oss/azure/workload-identity/proxy-init:v1.0.0-beta.0" in 3.400730805s
  Normal   Created          2m34s  kubelet            Created container azwi-proxy-init
  Normal   Started          2m34s  kubelet            Started container azwi-proxy-init
  Normal   Pulling          2m32s  kubelet            Pulling image "velero:v1.10.0"
  Normal   Pulling          2m30s  kubelet            Pulling image "mcr.microsoft.com/oss/azure/workload-identity/proxy:v1.0.0-beta.0"
  Normal   Created          2m30s  kubelet            Created container velero
  Normal   Started          2m30s  kubelet            Started container velero
  Normal   Pulled           2m30s  kubelet            Successfully pulled image "velero:v1.10.0" in 2.038785075s
  Normal   Pulled           2m28s  kubelet            Successfully pulled image "mcr.microsoft.com/oss/azure/workload-identity/proxy:v1.0.0-beta.0" in 1.651778218s
  Normal   Created          2m28s  kubelet            Created container azwi-proxy
  Normal   Started          2m28s  kubelet            Started container azwi-proxy
  Normal   Killing          35s    kubelet            Stopping container velero
  Normal   Killing          35s    kubelet            Stopping container azwi-proxy

Proxy logs:

velero-5f7d856778-kwpnr azwi-proxy {"level":"info","timestamp":"2023-03-03T08:25:59.453392Z","logger":"proxy","caller":"/workspace/pkg/proxy/proxy.go:97$proxy.(*proxy).Run","message":"starting the proxy server","port":8000,"userAgent":"azure-workload-identity/proxy/v1.0.0-beta.0 (linux/amd64) a8fe94e/2023-03-01-22:17"}
velero-5f7d856778-kwpnr azwi-proxy {"level":"info","timestamp":"2023-03-03T08:26:02.153210Z","logger":"proxy","caller":"/workspace/pkg/proxy/proxy.go:178$proxy.(*proxy).readyzHandler","message":"received readyz request","method":"GET","uri":"/readyz"}
velero-5f7d856778-kwpnr azwi-proxy {"level":"info","timestamp":"2023-03-03T08:26:06.249334Z","logger":"proxy","caller":"/workspace/pkg/proxy/proxy.go:107$proxy.(*proxy).msiHandler","message":"received token request","method":"GET","uri":"/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=a7f34c7e-4cf5-4d9b-9268-050cd48178fc&resource=https%3A%2F%2Fmanagement.azure.com%2F"}
velero-5f7d856778-kwpnr azwi-proxy {"level":"info","timestamp":"2023-03-03T08:26:09.053250Z","logger":"proxy","caller":"/workspace/pkg/proxy/proxy.go:134$proxy.(*proxy).msiHandler","message":"successfully acquired token","method":"GET","uri":"/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=a7f34c7e-4cf5-4d9b-9268-050cd48178fc&resource=https%3A%2F%2Fmanagement.azure.com%2F"}

The end of the velero container logs:

velero-5f7d856778-kwpnr velero time="2023-03-03T08:33:37Z" level=info msg="Stopping and waiting for webhooks" logSource="/go/pkg/mod/github.com/bombsimon/logrusr/[email protected]/logrusr.go:108"
velero-5f7d856778-kwpnr velero time="2023-03-03T08:33:37Z" level=info msg="Wait completed, proceeding to shutdown the manager" logSource="/go/pkg/mod/github.com/bombsimon/logrusr/[email protected]/logrusr.go:108"

And another describe of the velero pod:

  Normal   Killing          8m32s  kubelet            Stopping container velero
  Normal   Killing          8m32s  kubelet            Stopping container azwi-proxy

And it is still stuck:
velero-5f7d856778-kwpnr 2/2 Terminating 0 17m
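One number worth keeping in mind when reading the output above: the describe output reports Termination Grace Period: 3600s. When a pod is deleted, the kubelet sends SIGTERM and, if a container does not exit, waits up to the grace period before sending SIGKILL, so a container that hangs during shutdown can keep the pod in Terminating for up to an hour. The sketch below does that arithmetic under a simplifying assumption: the countdown is measured from the kubelet's Killing event (8m32s old in the second describe). The 17m in the kubectl get line is the pod's AGE column, not the time spent terminating.

```python
from datetime import timedelta

# Grace period reported by kubectl describe in this thread (Termination Grace Period: 3600s).
GRACE_PERIOD = timedelta(seconds=3600)


def remaining_before_sigkill(elapsed: timedelta, grace: timedelta = GRACE_PERIOD) -> timedelta:
    """Time left before the kubelet escalates from SIGTERM to SIGKILL.

    Simplifying assumption: 'elapsed' is measured from the kubelet's Killing event,
    and the container has not exited on its own in the meantime.
    """
    return max(grace - elapsed, timedelta(0))


if __name__ == "__main__":
    # The second describe shows the Killing events as 8m32s old.
    left = remaining_before_sigkill(timedelta(minutes=8, seconds=32))
    print(f"SIGKILL expected in about {left}")  # prints "SIGKILL expected in about 0:51:28"
```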

aramase (Member) commented Mar 3, 2023

> Webhook controller logs when I did the delete pod :

The azwi webhook is not on the delete path for pods; it is only invoked on pod CREATE:

- apiGroups:
  - ""
  apiVersions:
  - v1
  operations:
  - CREATE
  resources:
  - pods
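To illustrate why a pod DELETE does not reach the azwi webhook, the sketch below evaluates an admission request against the rule above. It is a simplified, hypothetical matcher written for this thread: it compares only API group, version, resource, and operation (treating "*" as a wildcard), whereas the real API server also considers subresources, scope, matchPolicy, and namespace/object selectors.

```python
# Rule copied from the webhook configuration quoted above (pods, CREATE only).
RULES = [
    {"apiGroups": [""], "apiVersions": ["v1"], "operations": ["CREATE"], "resources": ["pods"]},
]


def _hit(values: list[str], value: str) -> bool:
    """True if the rule field lists the value explicitly or via the '*' wildcard."""
    return "*" in values or value in values


def rule_matches(rule: dict, group: str, version: str, resource: str, operation: str) -> bool:
    """Simplified RuleWithOperations check (no subresources, scope, or selectors)."""
    return (
        _hit(rule["apiGroups"], group)
        and _hit(rule["apiVersions"], version)
        and _hit(rule["resources"], resource)
        and _hit(rule["operations"], operation)
    )


def webhook_called(group: str, version: str, resource: str, operation: str, rules: list = RULES) -> bool:
    """The webhook is called if any of its rules matches the request."""
    return any(rule_matches(r, group, version, resource, operation) for r in rules)


if __name__ == "__main__":
    for op in ("CREATE", "UPDATE", "DELETE"):
        print(op, webhook_called("", "v1", "pods", op))
```

With this rule, only the CREATE line reports True, which is consistent with the mutate-v1-pod log entries belonging to a newly created pod rather than to the deletion.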

Is this kubectl describe pod output for a pod that's terminating? It looks like both containers in the pod show Stopping container. Could the velero container be stuck waiting for webhooks, as its last log lines ("Stopping and waiting for webhooks") suggest?

flbla (Author) commented Mar 6, 2023

> The azwi webhook is not on the delete path for pods. It's only for pod CREATE

Indeed, that log entry was probably the creation of the replacement pod (I deploy velero as a Kubernetes Deployment).

Yes, it's the output of a terminating pod: the containers are in the "Stopping" state but never actually stop.
I don't know why. If I deploy velero with AAD WI 0.14.0, it works:
velero pods stop almost instantly when I delete them, but with 1.0.0-alpha/beta they get stuck in this state.
I haven't tried with other applications yet.

aramase (Member) commented Mar 6, 2023

> yes, it's the output of a terminating pod, containers are in "stopping" state, but never stopped.

The velero container also still shows as Running. Could you check the kubelet logs on the node?
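To confirm from the API which containers are still running while the pod is Terminating, the sketch below inspects the JSON returned by kubectl get pod <name> -n <namespace> -o json. It reads the standard Pod fields metadata.deletionTimestamp, metadata.deletionGracePeriodSeconds, and status.containerStatuses[].state; the file name pod.json and the script name are illustrative assumptions. It complements, but does not replace, the kubelet logs on the node.

```python
import json
import sys


def still_running(pod: dict) -> list[str]:
    """Return names of containers still reported as running in a pod marked for deletion."""
    if not pod.get("metadata", {}).get("deletionTimestamp"):
        return []  # No deletionTimestamp: the pod is not terminating.
    statuses = pod.get("status", {}).get("containerStatuses", [])
    # At most one of state.waiting / state.running / state.terminated is set per container.
    return [s["name"] for s in statuses if "running" in s.get("state", {})]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get pod <name> -n <namespace> -o json > pod.json
    #        python still_running.py pod.json
    with open(sys.argv[1]) as f:
        pod = json.load(f)
    print("deletionGracePeriodSeconds:", pod["metadata"].get("deletionGracePeriodSeconds"))
    for name in still_running(pod):
        print("still running:", name)
```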

> I didn't try with other application yet.

Could you try with other applications? I tried in my cluster with different versions and can't reproduce the pod termination issue.

aramase (Member) commented Mar 23, 2023

@flbla Any updates re: #774 (comment)?

flbla (Author) commented Mar 24, 2023

I haven't had time to check yet.
I will try to check next week.

aramase (Member) commented Mar 24, 2023

> Didn't had time yet to check
> Will try to check this next week

Sounds good! I'm going to close the issue with #774 (comment). Please feel free to reopen if you have any questions or can confirm the issue.
