-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Created a new query and lint rule for gke serial port logging org policy-issue#30 #106
base: main
Are you sure you want to change the base?
Conversation
@ebenezergraham python3.12 version doesnt have distutils in standard library, either downgrade python version or pip install setuptools. |
@schweikert can you suggest something for this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, thanks for contributing to the project.
In general everything looks good, we just need to polish it in some places.
gcpdiag/queries/gke.py
Outdated
@@ -461,6 +478,51 @@ def cluster_hash(self) -> Optional[str]: | |||
return self._resource_data['id'][:8] | |||
raise UndefinedClusterPropertyError('no id') | |||
|
|||
def check_serial_port_logging_policy(self) -> Optional[bool]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file doesn't look like a good place for this function/check: orgpolicies don't belong to GKE and I don't see how it can be re-used by other modules.
I suggest moving the logic to err_2025_001_serial_port_logging.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will move this function to err_2025_001_serial_port_logging.py
gcpdiag/queries/gke.py
Outdated
logging.error(f"Error checking serial port logging policy: {e}") | ||
return None | ||
|
||
def is_serial_port_logging_compliant(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as for check_serial_port_logging_policy()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will move this function to err_2025_001_serial_port_logging.py
# limitations under the License. | ||
|
||
# Lint as: python3 | ||
"""GKE clusters must comply with serial port logging policy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line should define a good/target state, e.g. GKE cluster complies with the serial port logging organization policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modified the text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls use here the same text as in 2025_001.md: GKE cluster complies with the serial port logging organization policy.
|
||
When the constraints/compute.disableSerialPortLogging policy is enabled, | ||
GKE clusters must be created with logging disabled (serial-port-logging-enable: 'false'), | ||
otherwise the creation will fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we just need clarify creation of what: new nodes in nodepools
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added these in the comments.
report.add_failed( | ||
cluster, | ||
msg= | ||
('Cluster does not comply with serial port logging policy. ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the message from line 18 will be displayed for this section during execution, why printing this message for every cluster? what if I have 100+ clusters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eugenenuke Do you suggest to remove the msg and remediation part here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, correct
'Update the node pool metadata to set ' | ||
'"serial-port-logging-enable": "false" for all node pools')) | ||
|
||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what exception are you expecting here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of exceptions like key/value error, if error accessing policy or metadata. Error with google api. i can probably add them more explicitly @eugenenuke let me know ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see, the only call that triggers an API is gke.get_clusters()
and it's out of this try/except
block (note that it has its own block)
Other function calls look safe enough to omit try/except
completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to make it 100% clear: we don't need try/except
here
Thanks Evgenii, I will make the required changes |
modified the lint file and moved function from gke.py file as per Evgenii comments
@eugenenuke i added the requested change. |
weight: 1 | ||
type: docs | ||
description: > | ||
GKE clusters complies with the serial port logging organization policy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clusters
- > cluster
# limitations under the License. | ||
|
||
# Lint as: python3 | ||
"""GKE clusters must comply with serial port logging policy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls use here the same text as in 2025_001.md: GKE cluster complies with the serial port logging organization policy.
continue | ||
|
||
# Check if cluster complies with serial port logging policy | ||
if is_serial_port_logging_compliant(cluster): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true/false
return value provides us only per-cluster
granularity, this might be too coarse
if we have some big number of nodepools (e.g. dozens) in a cluster and only a few of them have a conflicting logging configuration, would it be easy for a user to fix the problem? In the current state we expect them to figure out the list of affected pools by themselves.
if we get a list of "affected" nodepools, we can display it within "add_failed", e,g:report.add_failed(cluster, "The following nodepools don't comply with the serial port logging org policy: <list of np>")
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eugenenuke ok good idea. let me see how i can capture the nodepool name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eugenenuke ok good idea. let me see how i can capture the nodepool name
Ok, i added the new way to capture the nodepool name.
1) gcloud container clusters create example-cluster \ | ||
--metadata serial-port-logging-enable=false | ||
2) gcloud container node-pools create node-pool-1 \ | ||
--cluster=example-cluster --metadata serial-port-logging-enable=false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This text seems to be closer to "Description" than "Remediation". We'd better explicitly tell what needs to be done.
Essentially, there are 3 options to remediate the issue that we need to cover:
- disable the org policy
- recreate all non-compliant node pools
- recreate all non-compliant clusters
Pls add some more details for each of the option and provide a link to a public page that explains how to do that, e.g.:
Also modified the website content for remediation
return None | ||
except Exception as e: | ||
logging.error(f"Error checking serial port logging policy: {e}") | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I would ignore try/except here, but as return None
causes an issue, I'm highlighting it again:
- do we really need try/except? (hint: no)
- we need to do all logging through
report.add_
(hint: no "logging.error()" in a lint rule) - what happens if at line 82, where two values are expected, we get
None
? (hint: the return value should be the same type)
PS: is_serial_port_logging_compliant() seems to be redundant, it only wraps the response from check_serial_port_logging_policy(). why don't we keep check_serial_port_logging_policy() only?
PPS: do we really need a dedicated is_compliant
variable? If we just return a list of noncompliant pools, that should be enough data to work with and there will be only one return value. In this case, I'd rename check_serial_port_logging_policy()
to something like get_noncompliant_pools()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok , so we dont need the status true/false as such anyway
for nodepool in cluster.nodepools: | ||
# Check if serial port logging is disabled in node config metadata | ||
metadata = nodepool.config.metadata or {} | ||
serial_logging_enabled = metadata.get('serial-port-logging-enable', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the ideal candidate to move under main/gcpdiag/queries/gke.py.
@property
def is_serial_logging_enabled(self) -> bool:
return ...
gcpdiag/queries/gke.py
Outdated
@@ -61,6 +61,10 @@ def image_type(self) -> str: | |||
def oauth_scopes(self) -> list: | |||
return self._resource_data['oauthScopes'] | |||
|
|||
@property | |||
def metadata(self) -> dict: | |||
return self._resource_data.get('metadata', {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace with
@property
def is_serial_logging_enabled(self) -> bool:
return ...
moved the serial port logic to gke.py file. refactor the err file to give only non compliant pools.
@eugenenuke moved the required logic and adjusted the run rule. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good now. Thanks!
@kaushik853 pls make sure all the pre-commit tests pass |
@eugenenuke can you suggest where the pre commit is failing now ? some linting? pylint |
Pre-commits:
First is to make sure that there is no empty line at the end of *.py and *.md files To fix the first two, you can run Make-test:
you need to run "make snapshots" command from the repo root directory, add the generated file(s) to the commit and push the changes |
@eugenenuke the pre-commit always end with error |
It seems to come from pylint. am i allowed to change the pylint version in .pre-commit-config.yaml file. |
My python version in virtualenv is 3.12.8 |
If not i give a try with github codespace here if it improves the pylint problem |
not sure what the, i have both pylint and astroid installed with correct versions |
@eugenenuke the pylint version as per the pre-commit yaml is 3.3.1, which uninstall other astroid version and installs astroid 3.3.8. and im getting this error of uncompatible astrod version. Is thise some upstream error ? |
1 similar comment
@eugenenuke the pylint version as per the pre-commit yaml is 3.3.1, which uninstall other astroid version and installs astroid 3.3.8. and im getting this error of uncompatible astrod version. Is thise some upstream error ? |
these checks are all passing locally |
To identify when the constraints/compute.disableSerialPortLogging policy is enabled using the orgpolicy.py file, follow these steps:
The PREFETCH_ORG_CONSTRAINTS tuple in the orgpolicy.py file includes the constraints/compute.disableSerialPortLogging policy.
The _get_effective_org_policy_all_constraints function fetches all constraints listed in PREFETCH_ORG_CONSTRAINTS for a given project.
The get_effective_org_policy function retrieves the effective organization policy for a specific constraint by calling _get_effective_org_policy_all_constraints and checking if the constraint is enforced.
The BooleanPolicyConstraint class has an is_enforced method that returns whether the policy is enforced.
I modified the gke.py file in queries and added a new lint rule/err file and created a documentation under website folder.