-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build controller handling of service account can lead to failing build runs and other artifacts #337
Comments
My preference: I'd rule our (B) and (D) alone because they require the user to fix something. (B) or (D) would only be okay if the addition (config setting to force generated service account) is also implemented. Imo (A) would be quite complex because it would not just mean to watch secret deletions but for a full solution, one would also need to take care of the build controller restart where we would need to then validate all service accounts. The nice thing of this option on the other hand is that it does not affect the performance when a new build run is there and we have to create the task run because we there can assume that all service accounts are okay. But that's imo not worth from an effort-value perspective My preference is (C) because it still means that the user does not need to fix something manually. And I hope the impact on the task run creation performance is not too high. |
For C as example, Do you mean use a I may prefer to switch the default behavior to |
No, with C I mean the following: we do not change the behavior of which service account is taken. That logic would stay (though, the hard-coded pipeline name is worth an env variable, but anyway). What would be done in addition in maybe this function, https://github.com/redhat-developer/build/blob/08b80ac29535a922d75d64c1f4f212591c7a4c44/pkg/controller/buildrun/credentials.go#L14-L37 or maybe rather in a separate function, is to look at the overall set of secrets in the service account and verify that they exist (either by a |
Is this something that we need to do? For a very long time now the k8s default behavior is if a secret exists in a namespace, it is readable by any service account in the namespace. The only secrets directly linked to Service accounts are image pull secrets and service account tokens, the latter of which are automatically generated by k8s. For image pull secrets, we only need to link pull secrets that are needed to pull pod images, not images referenced in Dockerfiles and such. |
Hi @adambkaplan, we use secrets as part of a build run for two purposes where k8s has not its hands on:
The magic to create the necessary files within the tasks home directory (/home/tekton, an emptyDir volume) happens inside the In short: yes, we have to do this. |
@SaschaSchwarze0 sorry, im digesting the msgs, will provide my 2 cents soonish |
My opinion is that we do not need to do anything, because of the following:
$ k -n test-build get build
NAME REGISTERED REASON BUILDSTRATEGYKIND BUILDSTRATEGYNAME CREATIONTIME
buildpack-nodejs-build False secret icr-docker does not exist ClusterBuildStrategy buildpacks-v3 4s plus the BuildRun will not start due to the
In sequential order, what Tekton does is:
so Im not seeing the value on doing what Tekton already does.
My proposal is that this is not a bug and that we should not do anything in code, but delegate to tekton the validation(which is already being done). We can add a documentation in this project around the use-case of this issue and what does the error represented by tekton means in simple words. |
Thank you for your opinion @qu1queee. I in the meantime think, we should go with another option. These two I consider: (E) Remove fallback to The reason for this is that I learned in the meantime (in a KubeCon talk) that a taskrun will fail if two secrets in the service account are for the same resource (same container registry, same git endpoint). The reason is that Tekton's I'm always in favor of providing a solution that works for the user without much to take care for him. The generated service account seems to be the best choice in my opinion. One could go this the all-or-nothing way and completely remove the support for named service accounts (F) which would also simplify the CRD and code as we do not need to implement both options. Or one still allows the service account usage as an opt-in (E). |
This is not ideal for openshift. My understanding is that in openshift the
I´m not sure about the above, I tried to validate this use case. I basically have two secrets with the same content(for dockerhub) but different name, and I can see that the initContainers:
- args:
- -docker-config=icr-knbuild
- -docker-config=icr-docker
- -docker-config=icr-docker2
command:
- /ko-app/creds-init
env:
- name: HOME
value: /tekton/home
image: gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/creds-init:v0.14.2@sha256:64a511c0e68f611f630ccbe3744c97432f69c300940f4a32e748457581394dd6
imagePullPolicy: IfNotPresent
name: credential-initializer
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /workspace
name: tekton-internal-workspace
- mountPath: /tekton/home
name: tekton-internal-home
- mountPath: /tekton/results
name: tekton-internal-results
- mountPath: /tekton/creds-secrets/icr-knbuild
name: tekton-internal-secret-volume-icr-knbuild-gptgx
- mountPath: /tekton/creds-secrets/icr-docker
name: tekton-internal-secret-volume-icr-docker-tsqlb
- mountPath: /tekton/creds-secrets/icr-docker2
name: tekton-internal-secret-volume-icr-docker2-bbsll
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-4rl9h
readOnly: true @SaschaSchwarze0 can you validate the above? I think what I´m saying is that we should leave the code as it is. |
Yeah, later, but the error should happen here: https://github.com/tektoncd/pipeline/blob/v0.14.2/pkg/credentials/dockercreds/creds.go#L77-L79 |
I agree with your observation @qu1queee and consider this a Tekton bug. Tekton allows container registry credentials to be of these formats (see Configuring authentication for Docker):
We typically (or me actually always) take the first option. But, the code that I had pointed to is for the second option and only this option prevents duplicate credentials for the same registry. The other one is missing that checks here and here. That'll become my first Tekton PR. :-) |
I also agree this is not a bug, but we can improve the behavior or document the potential problem. We keep the So that is why I suggest change the And we tested the multiple container registries in one SA before, we can have different container registries in one SA, such as one is |
@zhangtbj yes, I think we should do this as you said. Can you open a new issue for that? and work on it if you have time or ask someone in CDL :) @SaschaSchwarze0 can we then close this one in favor of the new upcoming issue from @zhangtbj ? |
Sry, was focused on other stuff the last days. I think we need to validate our ideas with our RH friends. My understanding currently is that we plan to run the build service differently in our offerings:
With such conflicting interests, I wonder whether this can only be resolved using a configuration option that controls the default behavior. Would be similar to what the e2e tests do = one passes an environment variable to the operator which is the name of the service account, the special value |
@SaschaSchwarze0 I think there is no need for this:
in my opinion this is adding complexity on the code and I think the SA management should only belong to the Build or BuildRun definition. But you have a good point, if we auto-generate the SA always by default then we need to:
We should close this issue and open a new one where we discussed the current topic. This issue original intention is not longer valid. |
I am not somebody who likes these kind of discussions as they consume unnecessary time. The issue I describe in this issue is perfectly valid. And the intention is to prevent users to come to service accounts that contain deleted secrets. Changing the default to using a generated service account is a perfect solution to mitigate that problem as it gives the responsibility of deciding to take a dedicated user account into the hands of the user who specifies this in the buildrun. And combined with a certain documentation on what it means to use a dedicated service account (secrets being added to it) and a hint that the user is then responsible to remove secrets from it if he decides to delete one, imo is a perfect resolution of exactly what I want to address. |
I tried to get started with Shipwright since yesterday but I'm stuck with what I assume to be the issue described here.
Source code and registry is on a private self hosted Gitlab instance. Using kaniko strategy. I can confirm that using a public git repo works as expected (except for the incorrect timestamp on the image, it says created 40yrs ago :) |
Hi @davidberglund, thank you for the feedback. Your question is not related to this issues. The best place for questions is our slack channel, #shipwright, in the Kubernetes slack, or to open a new issue. Regarding your question: BuildRun pods consist of multiple containers. As such, you need to run |
In a build run spec, there is a section about the service account. This section can be
When the generated service account is not used, our logic in generate task run will add the secrets referenced in the build (for example for the source repository and the target container registry) to the specified service account if the secret is not already contained.
If the user now deletes one of those secrets, the service account is broken and our controller will also not repair it. If the user now tries to create a build run that uses such a service account, then the build run will fail with this reason:
failed to create task run pod "new-build-run-kwkl5-j9b27": translating TaskSpec to Pod: secrets "something" not found. Maybe invalid TaskSpec
As a service account can be used for other means as well (especially if scenario 3 applies and a fallback to default happens), those other things might also fail (job, deployment) (I should mention that I did not try how they behave with such a service account).Here are some options that I see:
(A) Watch secret deletions and remove references from the service account(s). This would probably be possible if we would only support either a hard-coded service account or a generated one. But given the user can use a named service account for his build runs, this could mean that we need to manage a lot service accounts.
(B) At the time we assign a named service account to a build run, we check whether all secrets really exist and if one does not exist we fail the build run with an error message that is nicer than the one coming from Tekton.
(C) At the time we assign a named service account to a build run, we check whether all secrets really exist and if one does not exist we remove it from the service account.
(D) We do nothing and only document the current behavior.
In addition to those options, we may also consider a configuration setting for the controller that allows to force a build run to use a generated service account by ignoring the intent of the user specified in the build run.
The text was updated successfully, but these errors were encountered: