Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Created a new query and lint rule for gke serial port logging org policy-issue#30 #106

Open
wants to merge 27 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
deeddbf
Created a new query and lint rule for gke serial port logging org policy
kaushik853 Jan 3, 2025
3b8d2e9
Merge branch 'GoogleCloudPlatform:main' into main
kaushik853 Jan 4, 2025
8a6566f
added pre commit syntax checks
kaushik853 Jan 4, 2025
f253937
added pre commit syntax checks findings
kaushik853 Jan 4, 2025
b79d088
Merge branch 'GoogleCloudPlatform:main' into main
kaushik853 Jan 15, 2025
63c84fc
added changes as per review comments
kaushik853 Jan 15, 2025
d32a182
removed unused import library
kaushik853 Jan 15, 2025
3cbf2a6
modified spacing as per the pre-commit
kaushik853 Jan 15, 2025
1e8c843
spacing as per the pre-commit
kaushik853 Jan 15, 2025
4c1e675
Merge branch 'GoogleCloudPlatform:main' into main
kaushik853 Jan 18, 2025
10e0e73
added per nodepool failure report
kaushik853 Jan 18, 2025
ad6c192
modified as per pre-commit check
kaushik853 Jan 18, 2025
44ea689
remove syntax error
kaushik853 Jan 18, 2025
a900aa5
added as per Evgenii comments
kaushik853 Jan 20, 2025
59ebb06
as per pre-commit syntax check
kaushik853 Jan 20, 2025
fca6e67
Merge branch 'GoogleCloudPlatform:main' into main
kaushik853 Jan 24, 2025
9db9fa4
Merge branch 'GoogleCloudPlatform:main' into main
kaushik853 Jan 25, 2025
f247ac4
Merge branch 'GoogleCloudPlatform:main' into main
kaushik853 Jan 26, 2025
4cd17e5
pre-commit fix
kaushik853 Jan 26, 2025
902b6ff
pre-commit fix-pylint
kaushik853 Jan 26, 2025
82f7c90
pre-commit fix-pylint
kaushik853 Jan 26, 2025
b6dbc9a
pre-commit fix-pylint
kaushik853 Jan 27, 2025
4cec3ff
pre-commit fix-pylint
kaushik853 Jan 27, 2025
6c91997
pre-commit fix-pylint
kaushik853 Jan 27, 2025
c6c819a
Merge branch 'GoogleCloudPlatform:main' into main
kaushik853 Jan 29, 2025
71edd26
Merge branch 'GoogleCloudPlatform:main' into main
kaushik853 Feb 4, 2025
16e0775
Merge branch 'GoogleCloudPlatform:main' into main
kaushik853 Feb 6, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 61 additions & 0 deletions gcpdiag/lint/gke/err_2025_001_serial_port_logging.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
# Copyright 2023 Google LLC
kaushik853 marked this conversation as resolved.
Show resolved Hide resolved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Lint as: python3
"""GKE clusters must comply with serial port logging policy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line should define a good/target state, e.g. GKE cluster complies with the serial port logging organization policy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified the text.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls use here the same text as in 2025_001.md: GKE cluster complies with the serial port logging organization policy.


When the constraints/compute.disableSerialPortLogging policy is enabled,
GKE clusters must be created with logging disabled (serial-port-logging-enable: 'false'),
otherwise the creation will fail.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we just need clarify creation of what: new nodes in nodepools

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added these in the comments.

"""

from gcpdiag import lint, models
from gcpdiag.queries import gke

def run_rule(context: models.Context, report: lint.LintReportRuleInterface):
clusters = gke.get_clusters(context)
clusters_checked = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be easier to implement this way:

clusters = gke.get_clusters(context)
if not clusters:
  report.add_skipped(None, 'No clusters found')
  return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eugenenuke ok, i will remove the clusters_checked variable. but what is the point for return ? here

clusters = gke.get_clusters(context)
if not clusters:
  report.add_skipped(None, 'No clusters found')
  return``

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you can remove return if you want, as the following for loop will run zero times and have the same effect.


for cluster in clusters.values():
clusters_checked += 1

try:
# Skip Autopilot clusters as they are managed by Google
if cluster.is_autopilot:
report.add_skipped(
cluster,
'Skipping Autopilot cluster - serial port logging managed by Google'
)
continue

# Check if cluster complies with serial port logging policy
if cluster.is_serial_port_logging_compliant():
report.add_ok(cluster)
else:
report.add_failed(
cluster,
msg=
('Cluster does not comply with serial port logging policy. '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the message from line 18 will be displayed for this section during execution, why printing this message for every cluster? what if I have 100+ clusters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eugenenuke Do you suggest to remove the msg and remediation part here ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, correct

'When constraints/compute.disableSerialPortLogging is enabled, '
'all node pools must have serial-port-logging-enable set to false'
),
remediation=(
'Update the node pool metadata to set '
kaushik853 marked this conversation as resolved.
Show resolved Hide resolved
'"serial-port-logging-enable": "false" for all node pools'))

except Exception as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exception are you expecting here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of exceptions like key/value error, if error accessing policy or metadata. Error with google api. i can probably add them more explicitly @eugenenuke let me know ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, the only call that triggers an API is gke.get_clusters() and it's out of this try/except block (note that it has its own block)

Other function calls look safe enough to omit try/except completely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make it 100% clear: we don't need try/except here

report.add_skipped(cluster, f'Error checking cluster: {str(e)}')

if not clusters_checked:
report.add_skipped(None, 'No clusters found')
2 changes: 2 additions & 0 deletions gcpdiag/lint/gke/snapshots/ERR_2025_001.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
* gke/ERR/2025_001: When the constraints/compute.disableSerialPortLogging policy is enabled,
GKE clusters must be created with logging disabled (serial-port-logging-enable: 'false')
65 changes: 64 additions & 1 deletion gcpdiag/queries/gke.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@
from boltons.iterutils import get_path

from gcpdiag import caching, config, models, utils
from gcpdiag.queries import apis, crm, gce, network, web
from gcpdiag.queries import apis, crm, gce, network, orgpolicy, web
from gcpdiag.utils import Version

# To avoid name conflict with L342
Expand Down Expand Up @@ -61,6 +61,23 @@ def image_type(self) -> str:
def oauth_scopes(self) -> list:
return self._resource_data['oauthScopes']

@property
def metadata(self) -> dict:
return self._resource_data.get('metadata', {})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace with

  @property
  def is_serial_logging_enabled(self) -> bool:
    return ...


def is_serial_port_logging_enabled(self) -> bool:
"""Check if serial port logging is enabled in the node configuration.

Returns:
bool: True if serial port logging is enabled or not explicitly disabled,
False if explicitly disabled
"""
metadata = self.metadata
# Check if serial-port-logging-enable exists in metadata
# If not specified or set to 'true', it's considered enabled
return metadata.get('serial-port-logging-enable', 'true').lower() == 'true'



class NodePool(models.Resource):
"""Represents a GKE node pool."""
Expand Down Expand Up @@ -461,6 +478,51 @@ def cluster_hash(self) -> Optional[str]:
return self._resource_data['id'][:8]
raise UndefinedClusterPropertyError('no id')

def check_serial_port_logging_policy(self) -> Optional[bool]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file doesn't look like a good place for this function/check: orgpolicies don't belong to GKE and I don't see how it can be re-used by other modules.

I suggest moving the logic to err_2025_001_serial_port_logging.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will move this function to err_2025_001_serial_port_logging.py

"""
Checks if serial port logging policy is enforced and if cluster complies with it.

Returns:
bool: True if cluster complies with policy, False if not compliant,
None if policy not enforced
"""
try:
# Get the policy constraint status
constraint = orgpolicy.get_effective_org_policy(
self.project_id, 'constraints/compute.disableSerialPortLogging')

# If policy is not enforced, return None (no compliance check needed)
if not isinstance(
constraint,
orgpolicy.BooleanPolicyConstraint) or not constraint.is_enforced():
return None

# Get cluster node pools
for nodepool in self.nodepools:
# Check if serial port logging is disabled in node config metadata
if nodepool.config.is_serial_port_logging_enabled():
return False

return True

except ValueError:
# Policy not found or not supported
return None
except Exception as e:
logging.error(f"Error checking serial port logging policy: {e}")
return None

def is_serial_port_logging_compliant(self) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as for check_serial_port_logging_policy()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will move this function to err_2025_001_serial_port_logging.py

"""
Checks if the cluster is compliant with serial port logging policy.

Returns:
bool: True if compliant or policy not enforced, False if non-compliant
"""
compliance = self.check_serial_port_logging_policy()
# Return True if policy is not enforced (compliance is None) or cluster is compliant
return compliance is None or compliance


@caching.cached_api_call
def get_clusters(context: models.Context) -> Mapping[str, Cluster]:
Expand Down Expand Up @@ -694,3 +756,4 @@ def find_date_str_in_td(e):
) as e:
logging.error('Error in extracting gke release schedule: %s', e)
return release_data

25 changes: 25 additions & 0 deletions website/content/en/rules/gke/ERR/2025_001.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
---
title: "gke/ERR/2025_001"
linkTitle: "ERR/2025_001"
weight: 1
type: docs
description: >
GKE clusters must comply with serial port logging organization policy.
kaushik853 marked this conversation as resolved.
Show resolved Hide resolved
---

**Product**: [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine)\
**Rule class**: ERR - Something that is very likely to be wrong

### Description

When the constraints/compute.disableSerialPortLogging organization policy is enabled,
GKE clusters must be created with logging disabled (serial-port-logging-enable: 'false'),
otherwise the creation will fail.


### Remediation
kaushik853 marked this conversation as resolved.
Show resolved Hide resolved

### Further information

1. https://cloud.google.com/kubernetes-engine/docs/how-to/view-logs

Loading