KEP-127 (UserNS): allow customizing subids length #5020

Open
wants to merge 1 commit into
base: master

AkihiroSuda
Member

  • One-line PR description: KEP-127 (UserNS): allow customizing subids length
  • Other comments:

The number of subuids and subgids for each pod is hard-coded to 65536, regardless of the total ID count specified in `/etc/subuid` and `/etc/subgid`: https://github.com/kubernetes/kubernetes/blob/v1.32.0/pkg/kubelet/userns/userns_manager.go#L211-L228

This is not enough for some images.
Nested containerization needs a huge number of subids too.

Signed-off-by: Akihiro Suda <[email protected]>
k8s-ci-robot added the cncf-cla: yes, kind/kep, and sig/node labels on Jan 5, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: AkihiroSuda
Once this PR has been reviewed and has the lgtm label, please assign mrunalp for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the size/XS label on Jan 5, 2025
Comment on lines +338 to +339
The mapping length (multiple of 65536) will be customizable via a new
`KubeletConfiguration` property `subidsPerPod`.
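
As a minimal sketch of the constraint stated in the quoted lines (a mapping length that is a multiple of 65536), a check could look like the following. The helper name `validateSubidsPerPod` and its error messages are hypothetical; the KEP text only names the `subidsPerPod` property, and this is not the kubelet's actual validation code.

```go
package main

import (
	"errors"
	"fmt"
)

// idsPerBlock is the per-pod mapping length that is currently hard-coded,
// as cited in the PR description.
const idsPerBlock = 65536

// validateSubidsPerPod is a hypothetical helper (not part of the kubelet).
// It accepts only positive multiples of 65536.
func validateSubidsPerPod(n int64) error {
	if n <= 0 {
		return errors.New("subidsPerPod must be positive")
	}
	if n%idsPerBlock != 0 {
		return fmt.Errorf("subidsPerPod (%d) must be a multiple of %d", n, idsPerBlock)
	}
	return nil
}

func main() {
	// Try one valid default, one valid larger value, and one invalid value.
	for _, n := range []int64{65536, 131072, 100000} {
		fmt.Printf("%d: %v\n", n, validateSubidsPerPod(n))
	}
}
```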
Contributor

I'd got the impression we might want to make the mapping size configurable on a per-Pod basis.

Contributor

(what if you have a particular Pod that assigns a (POSIX) ID to each user, and you have 42000000 users, but all your other Pods only need 65000 UIDs?)

Contributor

It's possible, but not a common case IMO, and adding a Pod API field would be much more complex to implement than adding a kubelet configuration field. I'm not sure the maintenance burden is worth it.

Contributor

So long as we're not accidentally tying ourselves into not being able to extend the Pod API in the future. If we are tying ourselves, let's make sure we'd never want the option.

Member Author

What about introducing a Pod security context property like `securityContext.userNS.staticMappingWithUsername: "foo"`?
This would run `getsubids foo` to obtain the subID range, and assign the entire range to the Pod.
(So, this is different from `getsubids kubelet`, which returns the total range for the 110 pods.)

Multiple pods may use the same range at their own risk.
This allows assigning an extremely large subID range: up to $2^{32}-65536$ IDs.
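
To make the quoted upper bound concrete, here is a small arithmetic sketch. It takes the figure $2^{32}-65536$ from the comment as given (the comment does not say why 65536 is subtracted), and the function name `maxStaticRange` is hypothetical.

```go
package main

import "fmt"

const (
	idSpace = int64(1) << 32 // UIDs and GIDs are 32-bit values
	block   = int64(65536)   // current per-pod mapping length
)

// maxStaticRange returns the upper bound quoted in the comment: 2^32 - 65536.
func maxStaticRange() int64 {
	return idSpace - block
}

func main() {
	m := maxStaticRange()
	// Print the bound and how many 65536-ID blocks it corresponds to.
	fmt.Println(m, m/block) // prints "4294901760 65535"
}
```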

Contributor

Is the max ID range inside a container flexible? As in: could we have a kubelet field that toggles a dynamic range, and have the runtime interpret the range from the image?

Member Author

Flexible. A container may use a UID that is not present in the image's `/etc/passwd`. So, a runtime cannot "interpret the range in the image".

It should still be possible to have OCI image annotations that declare the range of UIDs needed.

Member

How do we prevent such a field from being abused? An image could claim all the available IDs and prevent other pods from being created.

Contributor

Admission-time checks are where I'd start; also ResourceQuota and LimitRange specifically.

Member Author

> prevents that other pods can be created

No. With `securityContext.userNS.staticMappingWithUsername: "foo"`, ID conflicts are allowed and the securityContext must be configured explicitly.

This should probably still be prohibited under the Restricted Pod Security Standard.

@rata
Member

rata commented Jan 21, 2025

@AkihiroSuda can you please elaborate on what is needed for the use case?

We have several options, none is perfect with the info we have, but with more info on the use case we might be able to make a better decision.

For example, one option proposed here is to use bigger ranges for all pods. That might or might not work, depending on how big the ranges need to be (if the ranges are very big, we may not be able to run the number of pods configured as maxPods on the node). Another option is to use the pod.spec as you suggested, but we need to think about abuses, as @giuseppe was mentioning.
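
The capacity concern can be illustrated with a rough sketch: given one `/etc/subuid` entry in its `name:start:count` format, how many pods fit for a given per-pod length? The entry `kubelet:65536:7208960` is an illustrative value (110 × 65536 IDs), and the names `parseSubIDLine` and `podsThatFit` are hypothetical, not kubelet functions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// subIDRange is one entry of /etc/subuid or /etc/subgid ("name:start:count").
type subIDRange struct {
	Name  string
	Start int64
	Count int64
}

// parseSubIDLine parses a single "name:start:count" entry.
func parseSubIDLine(line string) (subIDRange, error) {
	parts := strings.Split(strings.TrimSpace(line), ":")
	if len(parts) != 3 {
		return subIDRange{}, fmt.Errorf("malformed subid entry %q", line)
	}
	start, err := strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return subIDRange{}, fmt.Errorf("bad start in %q: %w", line, err)
	}
	count, err := strconv.ParseInt(parts[2], 10, 64)
	if err != nil {
		return subIDRange{}, fmt.Errorf("bad count in %q: %w", line, err)
	}
	return subIDRange{Name: parts[0], Start: start, Count: count}, nil
}

// podsThatFit returns how many pods can each receive a non-overlapping
// mapping of perPod IDs out of the range.
func podsThatFit(r subIDRange, perPod int64) int64 {
	if perPod <= 0 {
		return 0
	}
	return r.Count / perPod
}

func main() {
	r, err := parseSubIDLine("kubelet:65536:7208960") // illustrative entry
	if err != nil {
		panic(err)
	}
	const maxPods = 110 // the pod count mentioned earlier in the thread
	for _, perPod := range []int64{65536, 1048576} {
		n := podsThatFit(r, perPod)
		fmt.Printf("perPod=%d fits %d pods (maxPods=%d satisfied: %v)\n",
			perPod, n, maxPods, n >= maxPods)
	}
}
```

With the illustrative entry, a per-pod length of 65536 fits 110 pods, while a length of 1048576 fits only 6, which is the kind of trade-off described above.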

Can you share more details on the use case, so we can see what might be the best way to tackle it?
