In #223 we see that an evicted user pod leaves the user with faulty routing and unable to log in, because the spawner does not realize the user pod is in bad shape. The only way to correct this is a hub restart.
I think I have found a solution to this, but first I want to share what I've learned about a pod's "status".
Theory
A pod's status
What you see under "STATUS" in the output of kubectl get pods is actually a ContainerStatus's reason.
status.phase
The phase gives an easy overview, but it is not what kubectl get pods prints, even though you will recognize Pending and Running there.
status.containerStatuses.[0].state / lastState
This is what you actually see in the STATUS field when you run kubectl get pods. There are three kinds of states: Running, Terminated, and Waiting. Both Terminated and Waiting have a reason field along with a message field.
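To make the relationship between these fields concrete, here is a rough sketch of how the STATUS string could be derived from a pod's status. It is a simplified approximation of what kubectl does, not its actual implementation: as far as I understand it, kubectl starts from status.phase, lets a pod-level status.reason (such as Evicted) override it, and then lets a container's waiting or terminated reason override that. The function name kubectl_status and the plain-dict input are assumptions made for illustration.

```python
def kubectl_status(pod_status: dict) -> str:
    """Approximate the STATUS column of `kubectl get pods` (simplified sketch)."""
    # Start from the phase; a pod-level reason such as "Evicted" overrides it.
    status = pod_status.get("reason") or pod_status.get("phase", "Unknown")
    for ctr in pod_status.get("containerStatuses") or []:
        state = ctr.get("state") or {}
        waiting = state.get("waiting") or {}
        terminated = state.get("terminated") or {}
        if waiting.get("reason"):
            # e.g. ContainerCreating, CrashLoopBackOff
            status = waiting["reason"]
        elif terminated.get("reason"):
            # e.g. Completed, Error, OOMKilled
            status = terminated["reason"]
        elif terminated:
            # Terminated without a reason: fall back to the exit code.
            status = f"ExitCode:{terminated.get('exitCode', 0)}"
    return status


evicted = {"phase": "Failed", "reason": "Evicted"}
crashing = {
    "phase": "Running",
    "containerStatuses": [
        {"name": "notebook", "state": {"waiting": {"reason": "CrashLoopBackOff"}}}
    ],
}
print(kubectl_status(evicted))
print(kubectl_status(crashing))
```

Under these assumptions, an evicted pod reports Evicted even though its phase is Failed, which is why the phase alone does not match what kubectl prints.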
Issue analysis
Inspect this code
kubespawner/kubespawner/spawner.py, lines 1316 to 1332 in 472a662
if ctr_stat is None:  # No status, no container (we hope)
    # This seems to happen when a pod is idle-culled.
    return 1
for c in ctr_stat:
    # return exit code if notebook container has terminated
    if c.name == 'notebook':
        if c.state.terminated:
            # call self.stop to delete the pod
            if self.delete_stopped_pods:
                yield self.stop(now=True)
            return c.state.terminated.exit_code
        break
# None means pod is running or starting up
return None
The code's execution logic
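Is the pod phase Pending? Do nothing.
If not, are the container statuses missing (None)? Do something: report the server as stopped (return 1).
If not, is the notebook container in a terminated state? Do something: optionally delete the pod and return the container's exit code.
Else, do nothing (return None).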
I think we can do something here to fix #223, but I'm not sure what, because I have not been able to figure out how status.phase and status.containerStatuses[<the notebook container>].state will behave if we have an Evicted pod for example.
Suggested change and action plan
Perhaps we should delete pods whose status.phase is Succeeded or Failed? That would probably let the routes etc. of users whose pods show a kubectl get pods "STATUS" of Completed or Evicted be deleted properly, so those users can respawn without needing a hub restart.
Ping @minrk @betatim @choldgraf!
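A minimal sketch of what that check could look like, written as a standalone function over a plain status dict rather than as the real KubeSpawner.poll method. The name poll_exit_code, the dict layout (the camelCase keys of the Kubernetes API) and the exit codes chosen for the terminal phases are assumptions for illustration. The actual pod deletion is left out.

```python
from typing import Optional

# Phases after which the pod will never run again.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def poll_exit_code(pod_status: dict, container_name: str = "notebook") -> Optional[int]:
    """Return None while the pod is starting or running, else an exit code."""
    phase = pod_status.get("phase")
    if phase == "Pending":
        return None  # still starting up

    ctr_stats = pod_status.get("containerStatuses") or []
    for c in ctr_stats:
        if c.get("name") == container_name:
            terminated = (c.get("state") or {}).get("terminated")
            if terminated:
                # Same as the existing logic: report the container's exit code.
                return terminated.get("exitCode", 1)
            break

    if phase in TERMINAL_PHASES:
        # Suggested change: a terminal phase counts as stopped even when no
        # terminated container state is reported (assumed to cover evictions).
        return 0 if phase == "Succeeded" else 1

    if not ctr_stats:
        return 1  # mirrors the existing "no status, no container" branch

    return None  # pod is running


print(poll_exit_code({"phase": "Failed", "reason": "Evicted"}))
```

A real implementation would also need to delete the pod in the terminal-phase branch, the way the existing code calls self.stop when delete_stopped_pods is set.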
Things to learn / document
Figure out what values the pod phase and the container state reason will have for a pod eviction!
Add some logging or similar to kubespawner, install and run that version.
Spawn a user and make it run out of memory and get evicted somehow, fork bomb or set a very narrow limit.
Inspect the hub's logs where kubespawner logs will be shown.
Document info about an evicted pod
kubectl get pod --namespace <my-namespace> <name-of-evicted-pod> --output yaml
kubectl describe pod --namespace <my-namespace> <name-of-evicted-pod>
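To record the interesting fields of an evicted pod in a compact form, something like the following sketch could be run on the pod description. It assumes the pod was dumped with --output json instead of yaml so that only the standard library is needed. The helper name summarize_pod and the shape of its result are made up for illustration.

```python
import json


def summarize_pod(pod_json: str) -> dict:
    """Pull out the status fields relevant to this issue from a pod's JSON."""
    status = json.loads(pod_json).get("status", {})
    containers = {}
    for c in status.get("containerStatuses") or []:
        state = c.get("state") or {}
        # A state object holds exactly one of: running, waiting, terminated.
        kind = next(iter(state), None)
        containers[c.get("name")] = {
            "state": kind,
            "reason": (state.get(kind) or {}).get("reason"),
        }
    return {
        "phase": status.get("phase"),
        "reason": status.get("reason"),
        "message": status.get("message"),
        "containers": containers,
    }


# Hand-written sample input, not captured from a real cluster.
sample = json.dumps({
    "status": {
        "phase": "Failed",
        "reason": "Evicted",
        "message": "The node was low on resource: memory.",
    }
})
print(summarize_pod(sample))
```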
Concrete questions I'd like answered
When is the containerStatuses array None and what is the status.phase when it happens?
ctr_stat = data.status.container_statuses
if ctr_stat is None:  # No status, no container (we hope)
    # This seems to happen when a pod is idle-culled.
    return 1
What values can reason take? What is the status.phase when a container is found in terminated state?
We should log something about c.state.terminated.reason as well as data.status.phase when c.state.terminated is truthy.
if c.state.terminated:
    # call self.stop to delete the pod
    if self.delete_stopped_pods:
        yield self.stop(now=True)
    return c.state.terminated.exit_code
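The logging mentioned above could look roughly like this. It is a sketch over a plain status dict, not a patch against kubespawner: the logger name, the function log_termination and the message format are all assumptions.

```python
import logging
from typing import Optional

log = logging.getLogger("kubespawner-sketch")  # hypothetical logger name


def log_termination(pod_status: dict, container_name: str = "notebook") -> Optional[str]:
    """Log the phase and termination reason if the container has terminated."""
    for c in pod_status.get("containerStatuses") or []:
        if c.get("name") != container_name:
            continue
        terminated = (c.get("state") or {}).get("terminated")
        if terminated:
            msg = "Container {} terminated: reason={} exit_code={} pod phase={}".format(
                container_name,
                terminated.get("reason"),
                terminated.get("exitCode"),
                pod_status.get("phase"),
            )
            log.warning(msg)
            return msg
    return None  # container missing or not terminated, nothing to log


status = {
    "phase": "Failed",
    "containerStatuses": [
        {"name": "notebook", "state": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}
    ],
}
print(log_termination(status))
```

Returning the message as well as logging it is only there to make the helper easy to check by hand.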
References
By looking at the PodStatus object, you can inspect nested fields like phase, or the containerStatuses array of ContainerStatus objects, etc.
I made a mindmap about pod.state things and events.
I just came across this because I was looking at orphaned, evicted pods on mybinder.org.
Using a pod with this state:
Status: Failed
Reason: Evicted
Message: The node was low on resource: memory. Container notebook was using 1900004Ki, which exceeds its request of 471859200.
The KubeSpawner logs show that the Spawner does notice that the pod has stopped and treats it as a failure:
[W 2021-11-20 20:47:31.401 JupyterHub base:1072] User jupyterlab-jupyterlab-demo-cmqn27qt server stopped, with exit
[I 2021-11-20 20:47:31.401 JupyterHub proxy:309] Removing user jupyterlab-jupyterlab-demo-cmqn27qt from proxy (/user/jupyterlab-jupyterlab-demo-cmqn27qt/)
which means that the more severe problem that prompted this Issue may be resolved (I haven't been able to figure out the time between eviction and noticing that it stopped). But the pod is still not deleted for some reason.