Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend SSG library to more easily collect profile selections #12797

Merged

Conversation

marcusburghardt
Copy link
Member

@marcusburghardt marcusburghardt commented Jan 9, 2025

Description:

After #12604 and #12717 the SSG library started to be used externally, initially being tested for OSCAL integrations in trestle-bot. During the tests it was noticed an increasing complexity to extract profile information, such as rules and variables.

In order to make it simpler, new functions were introduced to collect a list of profiles for a list of given products.
These profiles are returned with the respective selections of rules and variables and also some ids.
The management of yaml properties, control files and inheritance is done internally to avoid complexity on integrations.

The new functions don't change any other existing functions used by building scripts and therefore brings no risk to break existing code.

Rationale:

  • Simplifies the integration of external tools with CaC content.
  • Reduces the knowledge curve for developers on-boarding on SSG library.
  • Introduces new functions to more easily collect and process rules and variables.
  • Fixes https://issues.redhat.com/browse/CPLYTM-444

Review Hints:

As usual the recommended approach to review the PR is commit by commit since they were organized chronologically.
Most of the functions are not new. They were already implemented in variables.py file (#12717) but were now moved to a new file called profiles.py where the moved functions were refactored to be more usable in the project.

Unit tests were included for new external functions and can be tested by:

python -m pytest tests/unit/ssg-module/test_profiles.py
python -m pytest tests/unit/ssg-module/test_variables.py

For usability, you can test with this example script:

import os
from ssg.products import get_all
from ssg.profiles import get_profiles_from_products
from ssg.variables import (
    get_variable_property,
    get_variable_values,
    get_variables_from_profiles,
)
from pprint import pprint

def _get_root_content_dir() -> str:
    # Update the path according to your environment
    home_dir = os.path.expanduser("~")
    content_root_dir = os.path.join(home_dir, "CaC", "Forks", "content")
    return content_root_dir


def get_products(content_root_dir: str) -> dict:
    products = get_all(content_root_dir)
    return products


def validate_products(products: dict, rhel_products: list) -> bool:
    if rhel_products in products:
        return True
    return False


def main():
    rhel_products = ['rhel8', 'rhel9', 'rhel10']
    content_root_dir = _get_root_content_dir()
    products = get_products(content_root_dir)
    
    missing_products = [product for product in rhel_products if product not in products.linux]
    if missing_products:
        print(f"Check the existence of these products: {missing_products}")
        exit(1)
    
    # Get all profiles for the specified products. Besides profile and product ids and titles,
    # the profile object also contain rules and variables with respective values for that profile.
    profiles = get_profiles_from_products(content_root_dir, rhel_products, sorted=True)
    for profile in profiles:
        print(f'Rules for {profile.product_id} in profile {profile.profile_id}:')
        print(profile.rules)
        
        print(f'Variables for {profile.product_id} in profile {profile.profile_id}:')
        print(profile.variables)

    # Get variables used in a list of profiles and organize them into a nested dictionary.
    profile_variables = get_variables_from_profiles(profiles)
    #pprint(profile_variables)

    # Variables are defined in control files and profiles as variable_id=variable_option.
    # But the real values for these options are stored in the variable files. This function
    # will return these specific values including the implicit default value for the variable.
    profile_variables_values = get_variable_values(content_root_dir, profile_variables)
    #pprint(profile_variables_values)

    # Once we have a variable id, we can easily collect any property of it. For example, the title
    # and description.
    variable_title = get_variable_property(content_root_dir, 'sshd_idle_timeout_value', 'title')
    variable_description = get_variable_property(content_root_dir, 'sshd_idle_timeout_value', 'description')
    print(f'{variable_title} - {variable_description}')


if __name__ == "__main__":
    main()

This commits move all profile related rules from variables.py to a new
file called profiles.py and update the profile functions to make them
more generic and reusable. This also simplifies the variables.py files
which is now only used to process variables files, options and
respective values.

Signed-off-by: Marcus Burghardt <[email protected]>
It was not considering the case when a profile inherits all levels of
a control file, which is also a common case.

Signed-off-by: Marcus Burghardt <[email protected]>
It may be cases where the integration may or may not benefit of sorted
selections, so the capability was included as optional where the default
is false, meaning the result will be ordered by the original order they
where defined in profiles and control files.

Signed-off-by: Marcus Burghardt <[email protected]>
With the introduction of profiles.py, it sounds more straight forward
in some cases to collect all profiles selections first and then only
work with the variables options and values. So the function
get_variables_from_profiles is now external to optmize cases where
profiles selections were already obtained.

Signed-off-by: Marcus Burghardt <[email protected]>
It was tricky to mock a test scenario for this function without creating
a very complex test. That was the reason I opted to use real content
taking into account the predictability of the project, which I
considered enough for this particular unit test.

Signed-off-by: Marcus Burghardt <[email protected]>
@marcusburghardt marcusburghardt added the Infrastructure Our content build system label Jan 9, 2025
@marcusburghardt marcusburghardt added this to the 0.1.76 milestone Jan 9, 2025
Copy link

github-actions bot commented Jan 9, 2025

Start a new ephemeral environment with changes proposed in this pull request:

Fedora Environment
Open in Gitpod

Oracle Linux 8 Environment
Open in Gitpod

The merit of this commit is for @qduanmu.
She discovered and fixed this in PR#12792 but I was refactoring the
library in the meantime, so it was incorporated here.

Signed-off-by: Marcus Burghardt <[email protected]>
@vojtapolasek vojtapolasek self-assigned this Jan 9, 2025
It was broken in smaller functions for a better readability with less
complexity.

Signed-off-by: Marcus Burghardt <[email protected]>
The product title is also useful to give more context about the product
id and the cost to include it in the profile objects is minimal, so it
was included to simplify collection of profiles information.

Signed-off-by: Marcus Burghardt <[email protected]>
Signed-off-by: Marcus Burghardt <[email protected]>
For now there are specific functions to return variables options and
values but there are more properties in variables files that may be
interesting, such as title and description. To avoid multiple reads of
the same file, it was introduced a cache for variables yaml content.

Signed-off-by: Marcus Burghardt <[email protected]>
Copy link

codeclimate bot commented Jan 9, 2025

Code Climate has analyzed commit 8dd74d8 and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 93.3% (50% is the threshold).

This pull request will bring the total coverage in the repository to 61.8% (0.1% change).

View more on Code Climate.

@Mab879
Copy link
Member

Mab879 commented Jan 10, 2025

/packit retest-failed

@marcusburghardt
Copy link
Member Author

Gate / Build, Test on Ubuntu 24.04 (pull_request) should be solved by #12796
Gate / Build on Windows (pull_request) should be solved by #12798

@vojtapolasek
Copy link
Collaborator

Hello @marcusburghardt and thank you for the PR. I don't see problems in most of the PR, but I do not consider using unit test which actually uses the real content as a good idea. It can introduce problems in the future in case we change something in the content.
I tried to mock the content dir and actually I think it is not such a problem, please see this branch.
https://github.com/vojtapolasek/content/tree/pr12797_testing
Would you consider adding these test data? Maybe they can be even reduced.

@marcusburghardt
Copy link
Member Author

Hello @marcusburghardt and thank you for the PR. I don't see problems in most of the PR, but I do not consider using unit test which actually uses the real content as a good idea. It can introduce problems in the future in case we change something in the content. I tried to mock the content dir and actually I think it is not such a problem, please see this branch. https://github.com/vojtapolasek/content/tree/pr12797_testing Would you consider adding these test data? Maybe they can be even reduced.

Let me check your mock. If you found an easier way to mock this, I would be happy to use. I was not completely in peace using real content, but just considered it to avoid more complexity than necessary.

Another point is that this PR is already pretty big. Would you be fine if I open another PR right after these changes only improving this test unit?

@vojtapolasek
Copy link
Collaborator

Yes @marcusburghardt I was also concerned about making the PR quite big. I am fine with adjusting the unit test in a separate PR.

Copy link
Collaborator

@vojtapolasek vojtapolasek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve. Builds on Ubuntu and windows are broken now:
#12798
#12795

@vojtapolasek vojtapolasek merged commit 04c056a into ComplianceAsCode:master Jan 10, 2025
104 of 106 checks passed
@marcusburghardt marcusburghardt deleted the ssg_profile_functions branch January 10, 2025 10:20
"""
parts = control_line.split(":")
if len(parts) == 3:
return parts[0], parts[2]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcusburghardt Is it necessary to process control_id(the parts[1]) besides the policy and rule level filters here, will this get some extra rules by ignoring this field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not necessary. parts[1] would be a very special case where only some controls from the control file are picked up.
https://complianceascode.readthedocs.io/en/latest/manual/developer/03_creating_content.html#using-controls-in-profiles

No profiles are using this. Instead, the profiles use all the controls informed in the control file to keep the best coverage as possible. The common use cases are to unselect rules not necessary for a particular product, in case of more generic control files such as PCI-DSS.

for var_file in all_variable_files:
variables_content = {}

for var_file in get_variable_files(content_dir):
try:
yaml_content = open_and_macro_expand(var_file)
Copy link

@qduanmu qduanmu Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I failed to run 'get_variable_values' outside of the content repo. I got error like:

Error processing file /xxx/content/linux_os/guide/auditing/auditd_configure_rules/var_audit_failure_mode.var: cannot access local variable 'filename' where it is not associated with a value

Seems like the error is still caused by loading jinja macros, which is not installed together with the ssg library. In ssg/constants.py:

JINJA_MACROS_DIRECTORY = os.path.abspath(os.path.join(os.path.dirname(os.path.dirname(
    __file__)), "shared", "macros"))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. It should be fixed by #12816

@marcusburghardt marcusburghardt added Highlight This PR/Issue should make it to the featured changelog. enhancement General enhancements to the project. labels Jan 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement General enhancements to the project. Highlight This PR/Issue should make it to the featured changelog. Infrastructure Our content build system
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants