KEP-127 (UserNS): allow customizing subids length #5020
The number of subuids and subgids for each pod is hard-coded to 65536, regardless of the total ID count specified in `/etc/subuid` and `/etc/subgid`: https://github.com/kubernetes/kubernetes/blob/v1.32.0/pkg/kubelet/userns/userns_manager.go#L211-L228

This is not enough for some images. Nested containerization needs a huge number of subids too.

Signed-off-by: Akihiro Suda <[email protected]>
The mapping length (multiple of 65536) will be customizable via a new `KubeletConfiguration` property `subidsPerPod`.
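As a rough illustration of the "multiple of 65536" constraint in the quoted text, a validation check might look like the sketch below. The helper `validateSubIDsPerPod` and the constant `minSubIDs` are hypothetical names chosen for this example; they are not taken from this PR or from the kubelet source.

```go
package main

import (
	"errors"
	"fmt"
)

// minSubIDs is the per-pod mapping length that is currently hard-coded,
// according to the PR description. The constant name is hypothetical.
const minSubIDs = 65536

// validateSubIDsPerPod is a hypothetical helper (not kubelet code) that
// accepts only positive multiples of 65536.
func validateSubIDsPerPod(n int64) error {
	if n <= 0 {
		return errors.New("subidsPerPod must be positive")
	}
	if n%minSubIDs != 0 {
		return fmt.Errorf("subidsPerPod %d is not a multiple of %d", n, minSubIDs)
	}
	return nil
}

func main() {
	for _, n := range []int64{65536, 131072, 100000} {
		fmt.Printf("%d: %v\n", n, validateSubIDsPerPod(n))
	}
}
```

A real implementation would probably also need an upper bound and a check against the range available to the kubelet; those are left out here.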
I'd got the impression we might want to make the mapping size configurable on a per-Pod basis.
(what if you have a particular Pod that assigns a (POSIX) ID to each user, and you have 42000000 users, but all your other Pods only need 65000 UIDs?)
I think it's possible, but not a common case IMO, and implementing a new Pod API field would be much more complex than adding a kubelet configuration field. I'm not sure the maintenance burden is worth it.
So long as we're not accidentally tying ourselves into not being able to extend the Pod API in the future. If we are tying ourselves, let's make sure we'd never want the option.
What about introducing a Pod security context property like `securityContext.userNS.staticMappingWithUsername: "foo"`?

This will run `getsubids foo` to obtain the subID range, and assign the entire range to the Pod. (So, this is different from `getsubids kubelet`, which returns the total range for the 110 pods.)

Multiple pods may use the same range at their own risk. This allows assigning an extremely large subID range.
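To make the `getsubids foo` step concrete, here is a minimal sketch of parsing one line of its output into a range. It assumes the output format `<index>: <owner> <start> <count>` (for example `0: foo 1000000 42000000`); the type `subIDRange` and the function `parseGetsubidsLine` are hypothetical names, and the numbers are made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// subIDRange is a hypothetical type holding one range reported by getsubids.
type subIDRange struct {
	Owner string
	Start uint64
	Count uint64
}

// parseGetsubidsLine parses a line assumed to look like
// "<index>: <owner> <start> <count>". The format is an assumption of this sketch.
func parseGetsubidsLine(line string) (subIDRange, error) {
	fields := strings.Fields(line)
	if len(fields) != 4 {
		return subIDRange{}, fmt.Errorf("unexpected getsubids output: %q", line)
	}
	// fields[0] is the range index (e.g. "0:") and is ignored here.
	start, err := strconv.ParseUint(fields[2], 10, 32)
	if err != nil {
		return subIDRange{}, fmt.Errorf("invalid start in %q: %w", line, err)
	}
	count, err := strconv.ParseUint(fields[3], 10, 32)
	if err != nil {
		return subIDRange{}, fmt.Errorf("invalid count in %q: %w", line, err)
	}
	return subIDRange{Owner: fields[1], Start: start, Count: count}, nil
}

func main() {
	r, err := parseGetsubidsLine("0: foo 1000000 42000000")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Under the proposal above, the whole range would be assigned to the Pod.
	fmt.Printf("owner=%s start=%d count=%d\n", r.Owner, r.Start, r.Count)
}
```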
Is the max ID range inside of a container flexible? As in: could we have a kubelet field that toggles a dynamic range, and have the runtime interpret the range in the image?
Flexible. A container may use a UID that is not present in `/etc/passwd` in the image. So, a runtime cannot "interpret the range in the image".

It should still be possible to have OCI Image annotations to declare the range of the needed UIDs.
How do we prevent such a field from being abused? An image could claim all the available IDs and prevent other pods from being created.
Admission-time checks are where I'd start; also ResourceQuota and LimitRange specifically.
> prevents that other pods can be created

No, not with `securityContext.userNS.staticMappingWithUsername: "foo"`, which allows ID conflicts and requires explicit configuration of the securityContext.

This should probably still be prohibited for the Restricted Pod Security Standard.
@AkihiroSuda can you please elaborate on what is needed for the use case? We have several options, and none is perfect with the info we have, but with more info on the use case we might be able to make a better decision.

For example, one option proposed here is to use bigger ranges for all pods. That might or might not work, depending on how big the ranges need to be (if the ranges are very big, we may not be able to run the number of pods configured as maxPods for the node). Another option is to use the pod.spec as you suggested, but we need to think about abuse, as @giuseppe was mentioning.

Can you share more details on the use case, so we can see what might be the best way to tackle it?
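The trade-off between range size and maxPods can be sketched with simple integer division. The function `podsThatFit` is a hypothetical name, and the total of 110 × 65536 IDs is an assumed example based on the "110 pods" figure mentioned earlier in this thread; this is not how the kubelet computes it.

```go
package main

import "fmt"

// podsThatFit is a hypothetical helper: it returns how many pods can each
// receive a non-overlapping mapping of perPod IDs out of totalIDs.
func podsThatFit(totalIDs, perPod int64) int64 {
	if perPod <= 0 {
		return 0
	}
	return totalIDs / perPod
}

func main() {
	// Assumed example: a range sized for 110 pods at the current 65536 length.
	const totalIDs = 110 * 65536
	for _, perPod := range []int64{65536, 2 * 65536, 16 * 65536} {
		fmt.Printf("perPod=%d fits %d pods\n", perPod, podsThatFit(totalIDs, perPod))
	}
}
```

With this assumed total, doubling the per-pod length halves the number of pods that fit (55), and a 16× length leaves room for only 6.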