Generate a temp certificate for OCP4 Trusted CA remediation #12226
/test
/test 4.12-e2e-aws-ocp4-moderate
🤖 A k8s content image for this PR is available at:
If you already have Compliance Operator deployed:
Otherwise, deploy the content and operator together by checking out ComplianceAsCode/compliance-operator and:
applications/openshift/networking/default_ingress_ca_replaced/tests/ocp4/e2e-remediation.sh
e7a5e89 to d9290cf
/test 4.12-e2e-aws-ocp4-moderate
/test 4.12-e2e-aws-ocp4-moderate
/test images
/test e2e-aws-ocp4-pci-dss
I see that the profiles that don't include
/test 4.12-e2e-aws-ocp4-pci-dss
/test 4.15-e2e-aws-ocp4-pci-dss
@rhmdnd I'm running the 4.15 and 4.16 ocp4-pci-dss test runs on this PR to gather more data.
And more quirkiness:
ComplianceAsCode/ocp4e2e#48 landed, so let's re-kick some of the jobs and see if we can get more information from the CI clusters.
/test 4.12-e2e-aws-ocp4-moderate
d9290cf to 3b73daa
/test 4.12-e2e-aws-ocp4-moderate
I had to rewrite a good portion of the remediation so that it (1) points to an actual ConfigMap and (2) reuses an existing certificate, so the change propagates through the various components. I was able to get this working on a local cluster, so hopefully CI works, too.
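For readers following along, here is a minimal sketch of what "point to an actual ConfigMap" can look like. It assumes the bundle is published under the `ca-bundle.crt` key and referenced from the cluster proxy's `trustedCA` field. The function name `apply_trusted_ca`, the default ConfigMap name `user-ca-bundle`, and the `openshift-config` namespace are illustrative assumptions, not taken from this PR's remediation script.

```shell
# Hypothetical helper (not the PR's script): publish a PEM bundle as a
# ConfigMap and point the cluster-wide proxy's trustedCA at it.
apply_trusted_ca() {
    local bundle_file="$1"
    local cm_name="${2:-user-ca-bundle}"   # assumed ConfigMap name

    # Create or update the ConfigMap, storing the bundle under ca-bundle.crt.
    oc create configmap "$cm_name" \
        --from-file=ca-bundle.crt="$bundle_file" \
        -n openshift-config \
        --dry-run=client -o yaml | oc apply -f - || return 1

    # Reference the ConfigMap by name from the cluster proxy configuration.
    oc patch proxy/cluster --type=merge \
        -p "{\"spec\":{\"trustedCA\":{\"name\":\"$cm_name\"}}}"
}
```

Calling `apply_trusted_ca /path/to/bundle.pem` against a test cluster would create the ConfigMap first and only then patch the proxy, so the reference never points at a ConfigMap that does not exist.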
Still seeing the timeout issue. Checking to see if it's due to a lower context timeout in ComplianceAsCode/ocp4e2e#50.
/test 4.15-e2e-aws-ocp4-moderate
The 4.13 and 4.12 tests failed because the networking operator was in a degraded state, timing out after an hour. The 4.16 test failed because the timeout was reached even though the cluster operators eventually stabilized.
I was able to dig this out of the networking operator logs:
The actual certificate is huge, which might exceed the ConfigMap size limit depending on how the syncing is handled. Locally, I needed to use
Lately, we've been experiencing issues with manual remediations timing out during functional testing. This manifests in the following error:

    === RUN TestE2e/Apply_manual_remediations
    <snip>
    helpers.go:1225: Running manual remediation '/tmp/content-3345141771/applications/openshift/networking/default_ingress_ca_replaced/tests/ocp4/e2e-remediation.sh'
    helpers.go:1225: Running manual remediation '/tmp/content-3345141771/applications/openshift/general/file_integrity_notification_enabled/tests/ocp4/e2e-remediation.sh'
    helpers.go:1231: Command '/tmp/content-3345141771/applications/openshift/authentication/idp_is_configured/tests/ocp4/e2e-remediation.sh' timed out

In this particular case, it looks like the remediation to add an Identity Provider to the cluster failed, but this is actually an unintended side-effect of another change that updated the idp_is_configured remediation to use a more robust technique for determining if the cluster applied the remediation successfully:

ComplianceAsCode#12120
ComplianceAsCode#12184

Because we updated the remediation to use `oc adm wait-for-stable-cluster`, we're effectively checking all cluster operators to ensure they're healthy.

This started causing timeouts because a separate, unrelated remediation was also getting applied in our testing that updated the default CA, but didn't include a ConfigMap that contained the CA bundle. As a result, one of the operators didn't come up because it was looking for a ConfigMap that didn't exist. The `oc adm wait-for-stable-cluster` command was hanging on a legitimate issue in a separate remediation.

This commit attempts to fix that issue by updating the trusted CA remediation by creating a configmap for the expected certificate bundle.
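To illustrate why one unhealthy operator can stall an unrelated remediation, here is a rough sketch of a remediation-style wait on `oc adm wait-for-stable-cluster`. The wrapper name `wait_for_cluster` and the chosen durations are assumptions for illustration; the actual remediation scripts may call the command differently.

```shell
# Hypothetical wrapper: block until all cluster operators have been healthy
# for a minimum period, or fail once the overall timeout is reached.
wait_for_cluster() {
    local stable_period="${1:-5m}"  # how long operators must stay healthy
    local timeout="${2:-1h}"        # overall time limit for the wait

    if oc adm wait-for-stable-cluster \
        --minimum-stable-period="$stable_period" \
        --timeout="$timeout"; then
        echo "cluster operators are stable"
    else
        # On failure, print operator status so the unhealthy one is visible.
        oc get clusteroperators >&2
        return 1
    fi
}
```

Because the wait covers every cluster operator, a degraded operator caused by some other remediation makes this call fail, which matches the timeout attributed to idp_is_configured in the log above.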
3b73daa to 6696642
  ca-bundle.crt: $BUNDLE
metadata:
  name: trusted-ca-bundle
  namespace: openshift-config-managed
Experimenting with this to see if it works around the following issue:
I0730 23:51:58.527582 1 log.go:198] Failed to sync additional trust bundle configmap openshift-config-managed/trusted-ca-bundle: failed to update trusted CA bundle configmap 'openshift-config-managed/trusted-ca-bundle': ConfigMap "trusted-ca-bundle" is invalid: []: Too long: must have at most 1048576 bytes
The idea is to create the ConfigMap manually instead of relying on the sync logic in the networking operator.
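A quick way to sanity-check a bundle against the limit in that log line is sketched below. The function name `bundle_fits_configmap` is hypothetical. The check is only approximate, since the limit applies to the ConfigMap as a whole rather than to the bundle file alone.

```shell
# Size limit reported in the operator log above, in bytes.
CONFIGMAP_MAX_BYTES=1048576

# Hypothetical pre-check: succeed only if the bundle file is within the limit.
bundle_fits_configmap() {
    local bundle_file="$1"
    local size

    # Count bytes; fail with status 2 if the file cannot be read.
    size=$(wc -c < "$bundle_file") || return 2
    size=$((size))  # normalize any whitespace emitted by wc

    if [ "$size" -le "$CONFIGMAP_MAX_BYTES" ]; then
        echo "ok: $size bytes"
    else
        echo "too large: $size bytes (limit $CONFIGMAP_MAX_BYTES)" >&2
        return 1
    fi
}
```

Running this before creating the ConfigMap would surface an oversized bundle right away instead of after the operator fails to sync it.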
/retest-required
/test 4.15-e2e-aws-ocp4-moderate
Code Climate has analyzed commit 6696642 and detected 0 issues on this pull request. The test coverage on the diff in this pull request is 100.0% (50% is the threshold). This pull request will bring the total coverage in the repository to 59.4% (0.0% change).
@rhmdnd Thank you for this fix, it was quite a journey to figure out the problem and address it.
Lately, we've been experiencing issues with manual remediations timing
out during functional testing. This manifests in the following error:

In this particular case, it looks like the remediation to add an
Identity Provider to the cluster failed, but this is actually an
unintended side-effect of another change that updated the
idp_is_configured remediation to use a more robust technique for
determining if the cluster applied the remediation successfully:

#12120
#12184

Because we updated the remediation to use
`oc adm wait-for-stable-cluster`, we're effectively checking all
clusteroperators to ensure they're healthy.

This started causing timeouts because a separate, unrelated remediation
was also getting applied in our testing that updated the default CA, but
didn't include a ConfigMap that contained the CA bundle. As a result,
one of the operators didn't come up because it was looking for a
ConfigMap that didn't exist. The `oc adm wait-for-stable-cluster`
command was hanging on a legitimate issue in a separate remediation.

This commit attempts to fix that issue by updating the trusted CA
remediation to generate a certificate for testing purposes and create a
ConfigMap called `trusted-ca-bundle` before updating the trusted CA.
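As a rough illustration of the "generate a certificate for testing purposes" step, the sketch below creates a short-lived self-signed certificate with `openssl`. The function name `generate_temp_ca`, the file names, the one-day validity, and the subject `e2e-temporary-ca` are assumptions for this example and may differ from what the remediation script actually does.

```shell
# Hypothetical helper: write a throwaway self-signed certificate and key
# into the given directory. Intended for test clusters only.
generate_temp_ca() {
    local out_dir="$1"

    # -nodes leaves the private key unencrypted; -days 1 keeps it short-lived.
    openssl req -x509 -newkey rsa:2048 -nodes \
        -keyout "$out_dir/ca.key" \
        -out "$out_dir/ca.crt" \
        -days 1 \
        -subj "/CN=e2e-temporary-ca" 2>/dev/null
}
```

The resulting `ca.crt` could then be stored under the `ca-bundle.crt` key of the `trusted-ca-bundle` ConfigMap before the trusted CA is updated.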