Fix sudo usage in cmd-* #3984

Closed
mtalexan
Contributor

@mtalexan mtalexan commented Dec 4, 2024

sudo is called directly throughout the cmd-* scripts, which doesn't work if the user is already root (a case the has_privileges function checks for and supports). Add SUDO and SUDO_W_ENV variables that evaluate to the equivalent sudo command when not root, but are empty when running as root.
Also add sudo and sudo_w_env aliases (the former shadowing the real command) that map to a fake-root function, which just runs the arguments passed without any actual sudo call. This makes sure any other tools called from the cmd-* shell scripts (like Python scripts) won't call real sudo with their hardcoded commands when running as root.
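
As a rough illustration of the idea described above (not the code from this PR), the prefix variables could be derived along these lines. The helper names `sudo_prefix`, `sudo_w_env_prefix`, and `run_without_sudo` are hypothetical, and treating "with env" as `sudo -E` is an assumption.

```shell
# Hypothetical sketch; helper names are illustrative and not taken from the PR.

# Print the prefix to put in front of privileged commands.
# $1: UID to evaluate (defaults to the current user's, from `id -u`).
sudo_prefix() {
    if [ "${1:-$(id -u)}" -eq 0 ]; then
        printf '%s' ''          # already root: no prefix needed
    else
        printf '%s' 'sudo'
    fi
}

# Same, but asking sudo to preserve the caller's environment.
# Assumption: "with env" corresponds to `sudo -E`.
sudo_w_env_prefix() {
    if [ "${1:-$(id -u)}" -eq 0 ]; then
        printf '%s' ''
    else
        printf '%s' 'sudo -E'
    fi
}

# Fake-root stand-in: run the arguments exactly as given, with no sudo call.
run_without_sudo() {
    "$@"
}

SUDO="$(sudo_prefix)"
SUDO_W_ENV="$(sudo_w_env_prefix)"

echo "privileged prefix: '${SUDO}'"
echo "privileged prefix (with env): '${SUDO_W_ENV}'"
```

A caller would then write something like `${SUDO} some-command`, leaving the variable unquoted so that an empty value expands to no words at all.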

openshift-ci bot commented Dec 4, 2024

Hi @mtalexan. Thanks for your PR.

I'm waiting for a coreos member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mtalexan
Contributor Author

mtalexan commented Dec 4, 2024

Encountered this when running the cosa container with:

podman run ... --user=root --cap-add=SYS_ADMIN ....

I still need access to the pkgcache-repo the rpm-ostree compose tree call creates so I can run some custom commands between the cosa fetch and cosa build and populate some things into the overrides folder dynamically based on the contents of the RPMs that were installed. I use the manifest-lock.x86_64.json and the pkgcache-repo to query the contents of what will be installed, and do a dynamic lookup on extra things I need to manually insert.

It seems the pkgcache-repo is no longer an archive repo, so when the cache/ folder is mounted into the container from the host, a non-root user in the container is unable to set the xattrs on the files in the ostree. Currently this case doesn't occur because an unprivileged container user causes the rpm-ostree compose tree command to run in runvm_with_cache, which ensures the cache/ folder is actually part of the cache2.qcow2 file instead of being mounted directly from the host. However, this prevents the pkgcache-repo from being accessed outside runvm_with_cache.

So to retain access to the pkgcache-repo between the fetch and build commands, I have to run the container as --user=root.

@mtalexan
Contributor Author

Just checking in here after the holidays.

It looks like Jenkins crashed or went down while it was supposed to be building, and never completed.

@jbtrystram
Contributor

/retest

openshift-ci bot commented Jan 21, 2025

@mtalexan: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/rhcos | 1bbc89e | link | true | /test rhcos |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@dustymabe
Member

Not sure I understand the problem with running sudo when you are using the root user?

[dustymabe@media ~]$ podman run -it --rm --entrypoint /bin/bash quay.io/coreos-assembler/coreos-assembler:latest 
[builder@c4f36a8def0a srv]$ cat /etc/shadow
cat: /etc/shadow: Permission denied
[builder@c4f36a8def0a srv]$ sudo cat /etc/shadow | wc -l
33
[builder@c4f36a8def0a srv]$ exit
exit
[dustymabe@media ~]$ podman run -it --rm --user=root --entrypoint /bin/bash quay.io/coreos-assembler/coreos-assembler:latest 
[root@4c459c3790e8 srv]# cat /etc/shadow | wc -l
33
[root@4c459c3790e8 srv]# sudo cat /etc/shadow | wc -l
33

@mtalexan
Contributor Author

For me, sudo tries to query/prompt for a password (or other authorization), which results in a PAM error every time. It is the same PAM error regardless of whether I'm running as root or not.

[~]$ podman run --rm -it --entrypoint=/bin/bash quay.io/coreos-assembler/coreos-assembler:latest
[builder@1468ea1ff1ee srv]$ sudo cat /etc/shadow | wc -l
sudo: PAM account management error: Authentication service cannot retrieve authentication info
sudo: a password is required
0
[builder@1468ea1ff1ee srv]$ exit
exit

[~]$ podman run --rm --user=root -it --entrypoint=/bin/bash quay.io/coreos-assembler/coreos-assembler:latest
[root@550c9f6229b2 srv]# sudo cat /etc/shadow | wc -l
sudo: PAM account management error: Authentication service cannot retrieve authentication info
sudo: a password is required
0
[root@550c9f6229b2 srv]# exit
exit

I know there's been an open issue on the Fedora public base images since the F39 release for a PAM (mis-)configuration that breaks password prompting for sudo credentials, except when using some unknown tool combination on a Fedora host system (e.g. the default toolbox config on a host running a recent Fedora release). In my experience this only applies if the sudoers file isn't set up to be password-less for that user, i.e. sudo actually needs to query/prompt for credentials via PAM.

I also believe most distros are set up to not have root in the sudoers file, or otherwise grant root permission to run sudo (since it would be redundant), and will error out if you try to use sudo as root. In my experience this includes the last ~10 years of releases from Ubuntu, Arch, and OpenSUSE.
Since the PAM error in the Fedora public base images prevents any querying of sudo authorization outside the sudoers file, I assumed running sudo as root would have failed anyway, as it does on almost every other distro. I sought to eliminate the (at best unnecessary) use of sudo when running as root, both to avoid the open PAM issue and to solve what I thought was likely to be an error even if the PAM error didn't exist.
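
One way to tell these cases apart without an interactive prompt is sudo's `-n` (non-interactive) flag, which makes sudo fail instead of asking for a password. The following is a minimal sketch; the function name `sudo_works_noninteractively` is hypothetical, and the optional argument exists only so the probe can be pointed at a different command.

```shell
# Hypothetical helper: succeed only if sudo can run a no-op without prompting.
# $1: sudo command to probe (defaults to "sudo").
sudo_works_noninteractively() {
    # -n makes sudo exit with an error rather than prompt for credentials.
    "${1:-sudo}" -n true >/dev/null 2>&1
}

if sudo_works_noninteractively; then
    echo "sudo is usable without a password prompt"
else
    echo "sudo is missing, would prompt, or failed (e.g. a PAM error)"
fi
```

The check cannot distinguish a missing sudo binary from a PAM failure; it only reports whether a password-less sudo call succeeds for the current user.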

Sudo is being used directly all over the place, which doesn't work when the user is already root.
Add a SUDO and SUDO_W_ENV variable that evaluates to the equivalent sudo command, but is
blanked when running as root.
Also add a sudo and sudo_w_env alias that map to a fake-root function that just runs the
command passed without any sudo call, so any attempted use of sudo by python scripts triggered from
the cmd-* scripts won't actually use sudo when running as root.
@mtalexan
Contributor Author

/retest

openshift-ci bot commented Jan 21, 2025

@mtalexan: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dustymabe
Member

I don't understand why you'd get different results?

can you run the cat using the following container startup command and paste the full output?:

podman run --pull=always --rm --user=root -it --entrypoint=/bin/bash quay.io/coreos-assembler/coreos-assembler:latest

@mtalexan
Contributor Author

mtalexan commented Jan 21, 2025

Weird, it didn't auto-pull an up-to-date version of the latest tag before, so I was running a copy from mid-December 2024.

With --pull=always to force it, I'm still getting the same error though (consistent with what's been reported on the Fedora base images ever since F39):

[~]$ podman run --pull=always --rm --user=root -it --entrypoint=/bin/bash quay.io/coreos-assembler/coreos-assembler:latest
Trying to pull quay.io/coreos-assembler/coreos-assembler:latest...
Getting image source signatures
Copying blob 451ef5bd8681 done   | 
Copying blob 7f0754b77be7 done   | 
Copying blob a52c777f25d4 done   | 
Copying blob dfe32983ad5a done   | 
Copying blob 8445f41b77c1 done   | 
Copying blob 2a912ca08877 done   | 
Copying blob b3ff8a0be438 done   | 
Copying blob 52cabd2a22c2 done   | 
Copying blob 83ce49b350c8 done   | 
Copying blob 9dc0860eff5d done   | 
Copying blob 4ef37b2c3006 done   | 
Copying blob cff0cb9d7f6d done   | 
Copying blob 0a6149cfd0f4 done   | 
Copying blob 147aec9af54d done   | 
Copying blob 62499f028be8 done   | 
Copying blob a6c00ebc15d5 done   | 
Copying blob f2f45b7c196b done   | 
Copying blob f0c2e5b6d369 done   | 
Copying config 9dc80c6b03 done   | 
Writing manifest to image destination
[root@a7be90f0ba97 srv]# sudo cat /etc/shadow | wc -l
sudo: PAM account management error: Authentication service cannot retrieve authentication info
sudo: a password is required
0
[root@a7be90f0ba97 srv]# exit
exit

EDIT: removed the 2 lines where I tab-tabbed to auto-complete the name of /etc/shadow and it listed all the files matching the pattern before I ran any actual command.

@mtalexan
Contributor Author

mtalexan commented Jan 21, 2025

Minor correction, it looks like the PAM account management issue may have been occurring since Fedora 38, not Fedora 39.

I've tried looking into the generic issue before, but didn't get very far. It's related to something configured in the PAM modules provided by the Fedora public images (used as a base image for the coreos-assembler images here). From issue reports on various projects[1][2], it seems to be related to the pam_systemd module somehow. From the reports I found when looking previously, it seems to affect only the Fedora container image, regardless of the host system used. The one exception is that using Toolbox with default settings on a modern Fedora host to run the Fedora container images never seems to encounter the issue.


  1. [Error]sudo: PAM account management error: Authentication service cannot retrieve authentication info 89luca89/distrobox#1078 (multiple, I just picked one of them)
  2. Unable to run sudo on 40 or 41 containers if --privileged is present fedora-cloud/docker-brew-fedora#117

@dustymabe
Member

are you running Ubuntu/Debian with AppArmor enabled?

fedora-cloud/docker-brew-fedora#117 (comment)

@mtalexan
Contributor Author

are you running Ubuntu/Debian with AppArmor enabled?

fedora-cloud/docker-brew-fedora#117 (comment)

I saw that, and I am running on an Ubuntu host with AppArmor enabled. However, I'm getting the problem even when not running --privileged, and even if I include --security-opt=apparmor=unconfined.

```
[~]$ podman run --pull=always --rm --user=root --security-opt=apparmor=unconfined -it --entrypoint=/bin/bash quay.io/coreos-assembler/coreos-assembler:latest
Trying to pull quay.io/coreos-assembler/coreos-assembler:latest...
Getting image source signatures
Copying blob 451ef5bd8681 skipped: already exists
Copying blob a52c777f25d4 skipped: already exists
Copying blob 7f0754b77be7 skipped: already exists
Copying blob 8445f41b77c1 skipped: already exists
Copying blob 2a912ca08877 skipped: already exists
Copying blob b3ff8a0be438 skipped: already exists
Copying blob 4ef37b2c3006 skipped: already exists
Copying blob 52cabd2a22c2 skipped: already exists
Copying blob 83ce49b350c8 skipped: already exists
Copying blob 9dc0860eff5d skipped: already exists
Copying blob cff0cb9d7f6d skipped: already exists
Copying blob 0a6149cfd0f4 skipped: already exists
Copying blob 147aec9af54d skipped: already exists
Copying blob 62499f028be8 skipped: already exists
Copying blob a6c00ebc15d5 skipped: already exists
Copying blob f2f45b7c196b skipped: already exists
Copying blob f0c2e5b6d369 skipped: already exists
Copying blob dfe32983ad5a skipped: already exists
Copying config 9dc80c6b03 done   |
Writing manifest to image destination
[root@a8e24a157095 srv]# sudo cat /etc/shadow | wc -l
sudo: PAM account management error: Authentication service cannot retrieve authentication info
sudo: a password is required
0
[root@a8e24a157095 srv]# exit
exit
```

That said, I just double-checked by turning off AppArmor entirely on the host, and it turns out that somehow does solve the issue. So AppArmor on the host is somehow limiting which files can be accessed inside the isolated container, even when AppArmor is explicitly disabled for that container. ( ¯_(ツ)_/¯ ).


I guess this PR can be closed as unnecessary then.

  • Fedora is set up to allow root to call sudo, and PAM will authenticate it without prompting.
  • PAM issues within Fedora containers are caused by host AppArmor profile problems.
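
For anyone hitting the same PAM error, a quick way to check whether the host has AppArmor enabled is to read the kernel module parameter exposed in sysfs. This is a minimal sketch; the function name `apparmor_enabled` is hypothetical, and the optional path argument exists only so the check can be pointed at a different file.

```shell
# Hypothetical helper: succeed if the AppArmor kernel module reports "Y".
# $1: parameter file to read (defaults to the standard sysfs location).
apparmor_enabled() {
    params_file="${1:-/sys/module/apparmor/parameters/enabled}"
    # The file is absent when the kernel has no AppArmor support.
    [ -r "${params_file}" ] && [ "$(cat "${params_file}")" = "Y" ]
}

if apparmor_enabled; then
    echo "AppArmor is enabled on this host"
else
    echo "AppArmor is not enabled (or not available) on this host"
fi
```

Run this on the host rather than inside the container, since the observation above is about host-side AppArmor.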

@mtalexan mtalexan closed this Jan 22, 2025