feat: add hierarchical option to merge command #338

Merged
merged 40 commits into from
Feb 5, 2025
Commits
020a4e1
feat: merge components function refactored
CBeck-96 Dec 13, 2024
603fb73
feat: backward compatibility of merge_components ensured
CBeck-96 Dec 14, 2024
dfbe2bc
test: adds test for filter component function
CBeck-96 Dec 14, 2024
0c3c795
feat: adds integration of nested copmponents to already existing comp…
CBeck-96 Dec 14, 2024
80558a3
chore: removes redundant function import
CBeck-96 Dec 14, 2024
a329f04
fix: logic error in the listing of subcomponents
CBeck-96 Dec 15, 2024
77a4fd1
test: fixes logic error in merge_hierarchical test
CBeck-96 Dec 15, 2024
8ddbf08
test: adds integration tests
CBeck-96 Dec 15, 2024
8bad320
doc: adds mention of hierarchical to documentation
CBeck-96 Dec 15, 2024
d0e6b6c
doc: removes draft of flowchart
CBeck-96 Dec 15, 2024
a5a7ede
chore: isort
CBeck-96 Dec 15, 2024
4423566
chore: removes redundant function
CBeck-96 Dec 15, 2024
82b7f12
fix: add all nested components in list
CBeck-96 Dec 17, 2024
8436973
test: refactore merge tests
CBeck-96 Dec 17, 2024
14ab396
test: move test components to json file
CBeck-96 Dec 17, 2024
a449b5a
test: extend integration tests to cover more nested component cases
CBeck-96 Dec 17, 2024
3ac2883
doc: change sbom to SBOM
CBeck-96 Dec 17, 2024
f48699f
doc: fix grammar mistake
CBeck-96 Dec 17, 2024
b991a79
test: add individualized test cases for merge
CBeck-96 Dec 19, 2024
7baface
chore: isort
CBeck-96 Dec 19, 2024
11a5060
doc: change documentaion text and add picture options
CBeck-96 Dec 19, 2024
64796a2
draft: add potential pictures for documentation
CBeck-96 Dec 20, 2024
70799e1
refactor: add type annotation
CBeck-96 Jan 7, 2025
997a189
doc: add picture to merge documentation
CBeck-96 Jan 7, 2025
46cafec
Merge branch 'main' into 152-merge-is-not-hierarchical
CBeck-96 Jan 7, 2025
8402d3f
doc: remove redundant pictures
CBeck-96 Jan 7, 2025
684a614
Merge branch 'main' into 152-merge-is-not-hierarchical
CBeck-96 Jan 9, 2025
1770824
Merge branch 'main' into 152-merge-is-not-hierarchical
CBeck-96 Jan 20, 2025
4f42336
Merge branch 'main' into 152-merge-is-not-hierarchical
CBeck-96 Jan 30, 2025
5ca7dff
doc: fix grammar mistakes in comments and function descriptions
CBeck-96 Jan 30, 2025
084e7eb
Merge branch 'main' into 152-merge-is-not-hierarchical
CBeck-96 Jan 31, 2025
0eaff4c
refactor: change name of new components in filter_components()
CBeck-96 Jan 31, 2025
b696269
test: removes redundant code from tests
CBeck-96 Feb 3, 2025
b1a7d4c
test: add integration test for warning duplicate component
CBeck-96 Feb 3, 2025
17deec3
Merge branch 'main' into 152-merge-is-not-hierarchical
CBeck-96 Feb 3, 2025
59f0de9
doc: correct link to cycloneDX documentation
CBeck-96 Feb 3, 2025
2154dcc
doc: clarify the handling of nested components during merge with and …
CBeck-96 Feb 3, 2025
a421d2e
doc: fix grammar and consistency issues
CBeck-96 Feb 4, 2025
8ea339a
doc: correct function documentation
CBeck-96 Feb 4, 2025
17812ee
doc: updated picture for merge structure
CBeck-96 Feb 5, 2025
9 changes: 7 additions & 2 deletions cdxev/__main__.py
@@ -441,9 +441,14 @@ def create_merge_parser(
parser.add_argument(
"--from-folder",
metavar="<from-folder>",
help="Path to a folder with sboms to be merged",
help="Path to a folder with SBOMs to be merged.",
type=Path,
)
parser.add_argument(
"--hierarchical",
help="Flag to determine if the components should be merged hierarchically.",
action="store_true",
)
add_output_argument(parser)

parser.set_defaults(cmd_handler=invoke_merge, parser=parser)
@@ -879,7 +884,7 @@ def invoke_merge(args: argparse.Namespace) -> int:
)

inputs = [sbom for (sbom, _) in (read_sbom(input) for input in inputs)]
output = merge(inputs)
output = merge(inputs, hierarchical=args.hierarchical)
write_sbom(output, args.output)
return Status.OK
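
The two hunks above add a boolean switch and pass it through to merge(). A minimal sketch of that pattern with the standard argparse module follows. The parser name, the positional inputs argument, and the invoke() helper are hypothetical stand-ins, not the project's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the merge sub-parser of the real CLI.
    parser = argparse.ArgumentParser(prog="merge-demo")
    parser.add_argument("inputs", nargs="*", help="Paths of the SBOMs to merge.")
    # store_true makes the option a switch: False when absent, True when given.
    parser.add_argument(
        "--hierarchical",
        action="store_true",
        help="Merge nested components hierarchically.",
    )
    return parser


def invoke(args: argparse.Namespace) -> dict:
    # Hypothetical handler: only shows how the flag is forwarded as a keyword.
    return {"inputs": args.inputs, "hierarchical": args.hierarchical}


default_args = build_parser().parse_args(["a.json", "b.json"])
flagged_args = build_parser().parse_args(["a.json", "b.json", "--hierarchical"])
```

Because the switch defaults to False, existing invocations without the flag keep the previous, non-hierarchical behavior.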

32 changes: 0 additions & 32 deletions cdxev/auxiliary/sbomFunctions.py
@@ -377,38 +377,6 @@ def _recurse(
_recurse(sbom["components"], func, *args, **kwargs)


def get_corresponding_reference_to_component(
component: dict, list_of_components: list
) -> tuple[bool, str]:
"""
Function that checks if a given component is contained
in a list of components and returns the bom-ref from
the corresponding component in the list.

Parameters
----------
component: dict
A component dict
list_of_components: str
A list of component dicts

Returns
-------
is_in_list: bool
A boolean describing if the component is in the list
bomref_from_list:
The bom-ref from the corresponding component in the list
"""
is_in_list = False
bomref_from_list = ""
for component_from_list in list_of_components:
if compare_components(component, component_from_list):
is_in_list = True
bomref_from_list = component_from_list.get("bom-ref", "")
break
return is_in_list, bomref_from_list


# Function for the usage of the python cyclonedx model


225 changes: 162 additions & 63 deletions cdxev/merge.py
@@ -8,8 +8,8 @@
compare_time_flag_from_vulnerabilities,
compare_vulnerabilities,
copy_ratings,
extract_components,
get_bom_refs_from_dependencies,
get_corresponding_reference_to_component,
get_dependency_by_ref,
get_ref_from_components,
)
@@ -18,7 +18,76 @@
logger = logging.getLogger(__name__)


def merge_components(governing_sbom: dict, sbom_to_be_merged: dict) -> t.List[dict]:
def filter_component(
present_components: list[ComponentIdentity],
components_to_add: list,
kept_components: list,
dropped_components: list,
add_to_existing: dict,
) -> list[dict]:
"""
Function that goes through a list of components and their nested subcomponents
and determines whether they are present in a provided list of component identities.

The function operates directly on the lists and the dictionary provided and returns
a list of filtered top-level components that were not found in present_components.
Filtered means that nested components which are already present have been removed as well.

param present_components: a list of component identities that are already present in the SBOM.
param components_to_add: a list of components that shall be compared against the list of
already present components.
param kept_components: list of components not present in the list of provided components,
including nested components.
param dropped_components: list of added components that are already present.
param add_to_existing: dict mapping the identity of an already present component to the
new nested components that have to be added to it.

returns: filtered_components: list of top-level components not present in present_components
"""
filtered_components: list[dict] = []
for component in components_to_add:
component_id = ComponentIdentity.create(component, allow_unsafe=True)
# component is new
if component_id not in present_components:
nested_components = filter_component(
present_components,
component.get("components", []),
kept_components,
dropped_components,
add_to_existing,
)
if component.get("components", []):
component["components"] = nested_components
filtered_components.append(component)
kept_components.append(component)

# component already present
# contained components get filtered and added to the component in the main sbom
else:
logger.warning(
LogMessage(
"Potential loss of information",
f"Dropping a duplicate component ({component_id}) from the merge result.",
)
)
dropped_components.append(component)
nested_components = filter_component(
present_components,
component.get("components", []),
kept_components,
dropped_components,
add_to_existing,
)
if nested_components:
add_to_existing[component_id] = (
add_to_existing.get(component_id, []) + nested_components
)

return filtered_components
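
The recursion in filter_component can be hard to follow in a flattened diff. The sketch below illustrates the same idea under simplifying assumptions: a (name, version) tuple stands in for ComponentIdentity, logging and the kept/dropped bookkeeping are left out, and inputs are copied rather than modified in place. The names identity, filter_new, and orphans are hypothetical.

```python
from typing import Any

Component = dict[str, Any]
Identity = tuple[str, str]


def identity(component: Component) -> Identity:
    # Assumption: name and version stand in for the project's ComponentIdentity.
    return (component.get("name", ""), component.get("version", ""))


def filter_new(
    present: set[Identity],
    candidates: list[Component],
    orphans: dict[Identity, list[Component]],
) -> list[Component]:
    """Return the candidates that are not present, with nested duplicates removed.

    New components found below an already present component are collected in
    orphans, keyed by the identity of that present component.
    """
    new_components: list[Component] = []
    for component in candidates:
        nested = filter_new(present, component.get("components", []), orphans)
        if identity(component) not in present:
            kept = dict(component)  # shallow copy, the input stays untouched
            if component.get("components"):
                kept["components"] = nested
            new_components.append(kept)
        elif nested:
            orphans.setdefault(identity(component), []).extend(nested)
    return new_components


present = {("a", "1"), ("b", "1")}
incoming = [
    {"name": "a", "version": "1", "components": [
        {"name": "c", "version": "1"},
        {"name": "b", "version": "1"},
    ]},
    {"name": "d", "version": "1", "components": [
        {"name": "a", "version": "1"},
        {"name": "e", "version": "1"},
    ]},
]
orphans: dict[Identity, list[Component]] = {}
result = filter_new(present, incoming, orphans)
```

Here "d" is returned as a new top-level component that keeps only its new child "e", while "c" ends up in orphans under the identity of the already present "a". The caller then decides where such orphans go, which is what the hierarchical flag controls.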


def merge_components(
governing_sbom: dict, sbom_to_be_merged: dict, hierarchical: bool = False
) -> t.List[dict]:
"""
Function that gets two lists of components and merges them uniquely into one.

@@ -34,67 +103,93 @@ def merge_components(governing_sbom: dict, sbom_to_be_merged: dict) -> t.List[di
Output:
list_of_merged_components: List with the uniquely merged components of the submitted sboms
"""
list_of_merged_components = governing_sbom.get("components", [])
list_of_merged_components: t.List[dict] = governing_sbom.get("components", [])
list_of_added_components = sbom_to_be_merged.get("components", [])
list_of_merged_bom_refs = get_ref_from_components(list_of_merged_components)
for component in list_of_added_components:
is_in_list, bom_ref_from_list = get_corresponding_reference_to_component(
component, list_of_merged_components
)
if is_in_list:
component_id = ComponentIdentity.create(component, allow_unsafe=True)
logger.warning(
LogMessage(
"Potential loss of information",
f"Dropping a duplicate component ({component_id}) from the merge result.",
)

present_component_identities: dict[ComponentIdentity, dict] = {}
for component in extract_components(governing_sbom.get("components", [])):
present_component_identities[
ComponentIdentity.create(component, allow_unsafe=True)
] = component

kept_components: list[dict] = []
dropped_components: list[dict] = []
add_to_existing: dict[ComponentIdentity, list[dict]] = {}
list_present_component_identities = list(present_component_identities.keys())
list_of_filtered_components = filter_component(
list_present_component_identities,
list_of_added_components,
kept_components,
dropped_components,
add_to_existing,
)

list_of_merged_components += list_of_filtered_components

if hierarchical:
for key in add_to_existing.keys():
list_of_subcomponents = (
present_component_identities[key].get("components", [])
+ add_to_existing[key]
)
# if the component in the sbom_to_be_merged has a different
# bom-ref than the governing_sbom, then the bom-ref will be
# replaced through the one from the governing_sbom.
# while doing so, the algorithm checks, that the sbom does not
# already contain a different component with that ref, if so
# that component's bom-ref will be renamed
if bom_ref_from_list != component.get("bom-ref", 1):
counter = 0
new_reference = bom_ref_from_list
while not replace_ref_in_sbom(
new_reference, component.get("bom-ref", ""), sbom_to_be_merged
):
counter += 1
new_reference = bom_ref_from_list + "_" + str(counter)
present_component_identities[key]["components"] = list_of_subcomponents
else:
for key in add_to_existing.keys():
for new_component in add_to_existing[key]:
list_of_merged_components.append(new_component)

for component in dropped_components:
# If the component in sbom_to_be_merged has a different
# bom-ref than its counterpart in governing_sbom, the bom-ref
# is replaced by the one from governing_sbom.
# While doing so, the algorithm checks that the SBOM does not
# already contain a different component with that ref; if it does,
# that component's bom-ref will be renamed.
component_id = ComponentIdentity.create(component, allow_unsafe=True)
bom_ref_from_list = present_component_identities[component_id].get(
"bom-ref", ""
)
if bom_ref_from_list != component.get("bom-ref", 1):
counter = 0
new_reference = bom_ref_from_list
while not replace_ref_in_sbom(
new_reference, component.get("bom-ref", ""), sbom_to_be_merged
):
counter += 1
new_reference = bom_ref_from_list + "_" + str(counter)

for component in kept_components:
if not (component.get("bom-ref", 1) in list_of_merged_bom_refs):
list_of_merged_bom_refs.append(component.get("bom-ref", ""))
else:
if not (component.get("bom-ref", 1) in list_of_merged_bom_refs):
list_of_merged_components.append(component)
list_of_merged_bom_refs.append(component.get("bom-ref"))
else:
# if the bom-ref already exists in the components, add a incrementing number to
# the bom-ref
list_of_bom_refs_to_be_added = get_ref_from_components(
sbom_to_be_merged.get("components", [])
)
list_of_bom_refs_to_be_added.append(
sbom_to_be_merged.get("metadata", {})
.get("component", {})
.get("bom-ref", "")
)
bom_ref_is_not_unique = False
new_bom_ref = component.get("bom-ref")
n = 0
while new_bom_ref in list_of_merged_bom_refs or bom_ref_is_not_unique:
n += 1
new_bom_ref = component.get("bom-ref") + "_" + str(n)
# The new bom-ref must not appear in either of the sboms
if new_bom_ref in list_of_bom_refs_to_be_added:
bom_ref_is_not_unique = True
else:
bom_ref_is_not_unique = False
replace_ref_in_sbom(
new_bom_ref, component.get("bom-ref", ""), sbom_to_be_merged
)
list_of_merged_components.append(component)
list_of_merged_bom_refs.append(new_bom_ref)
return list_of_merged_components # type:ignore [no-any-return]
# if the bom-ref already exists in the components, add an incrementing number to
# the bom-ref
list_of_bom_refs_to_be_added = get_ref_from_components(
sbom_to_be_merged.get("components", [])
)
list_of_bom_refs_to_be_added.append(
sbom_to_be_merged.get("metadata", {})
.get("component", {})
.get("bom-ref", "")
)
bom_ref_is_not_unique = False
new_bom_ref = component.get("bom-ref", "")
n = 0
while new_bom_ref in list_of_merged_bom_refs or bom_ref_is_not_unique:
n += 1
new_bom_ref = component.get("bom-ref", "") + "_" + str(n)
# The new bom-ref must not appear in either of the SBOMs
if new_bom_ref in list_of_bom_refs_to_be_added:
bom_ref_is_not_unique = True
else:
bom_ref_is_not_unique = False
replace_ref_in_sbom(
new_bom_ref, component.get("bom-ref", ""), sbom_to_be_merged
)
list_of_merged_bom_refs.append(new_bom_ref)

return list_of_merged_components
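
The renaming loop for kept components appends "_1", "_2", and so on until the bom-ref collides with nothing in either SBOM. A minimal sketch of that suffixing scheme follows. The helper name unique_bom_ref and the single taken set (the union of refs from both SBOMs) are assumptions for illustration, and the sketch does not rewrite references the way replace_ref_in_sbom does.

```python
def unique_bom_ref(bom_ref: str, taken: set[str]) -> str:
    """Return bom_ref, or bom_ref with the first free numeric suffix."""
    candidate = bom_ref
    counter = 0
    # Keep incrementing the suffix until the candidate is unused.
    while candidate in taken:
        counter += 1
        candidate = f"{bom_ref}_{counter}"
    return candidate


# Assumed input: refs already used by the governing SBOM and the SBOM being merged.
taken_refs = {"lib", "lib_1", "app"}
renamed = unique_bom_ref("lib", taken_refs)
untouched = unique_bom_ref("tool", taken_refs)
```

Checking against the refs of both SBOMs matters because a generated name such as "lib_1" could otherwise clash with a component that is merged later in the same pass.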


def merge_dependency(
@@ -163,7 +258,9 @@ def merge_dependency_lists(
return list_of_merged_dependencies


def merge_2_sboms(original_sbom: dict, sbom_to_be_merged: dict) -> dict:
def merge_2_sboms(
original_sbom: dict, sbom_to_be_merged: dict, hierarchical: bool = False
) -> dict:
"""
Function that merges two sboms.

@@ -181,7 +278,9 @@ def merge_2_sboms(original_sbom: dict, sbom_to_be_merged: dict) -> dict:
components_of_sbom_to_be_merged.append(component_from_metadata)
list_of_original_dependencies = original_sbom.get("dependencies", [])
list_of_new_dependencies = sbom_to_be_merged.get("dependencies", [])
list_of_merged_components = merge_components(original_sbom, sbom_to_be_merged)
list_of_merged_components = merge_components(
original_sbom, sbom_to_be_merged, hierarchical=hierarchical
)
merged_dependencies = merge_dependency_lists(
list_of_original_dependencies,
list_of_new_dependencies,
@@ -208,7 +307,7 @@ def merge_2_sboms(original_sbom: dict, sbom_to_be_merged: dict) -> dict:
return merged_sbom


def merge(sboms: t.Sequence[dict]) -> dict:
def merge(sboms: t.Sequence[dict], hierarchical: bool = False) -> dict:
"""
Function that merges a list of sboms successively into the first one and creates a JSON file
for the result.
@@ -222,7 +321,7 @@ def merge(sboms: t.Sequence[dict]) -> dict:
"""
merged_sbom = sboms[0]
for k in range(1, len(sboms)):
merged_sbom = merge_2_sboms(merged_sbom, sboms[k])
merged_sbom = merge_2_sboms(merged_sbom, sboms[k], hierarchical=hierarchical)
return merged_sbom
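
The loop in merge() is a left fold: each SBOM is merged into the accumulated result of all previous ones. The sketch below expresses the same shape with functools.reduce. merge_two is a hypothetical, heavily simplified stand-in for merge_2_sboms that only unions component names, so it illustrates the control flow and not the real merge rules.

```python
from functools import partial, reduce


def merge_two(left: dict, right: dict, hierarchical: bool = False) -> dict:
    # Hypothetical stand-in for merge_2_sboms: the left input governs, and
    # only components that are not yet present are appended.
    names = list(left["components"])
    names += [name for name in right["components"] if name not in names]
    return {"components": names}


def merge_all(sboms: list[dict], hierarchical: bool = False) -> dict:
    # Left fold: ((sboms[0] * sboms[1]) * sboms[2]) * ...
    return reduce(partial(merge_two, hierarchical=hierarchical), sboms)


merged = merge_all(
    [
        {"components": ["a", "b"]},
        {"components": ["b", "c"]},
        {"components": ["c", "d"]},
    ],
    hierarchical=True,
)
```

As in the original loop, the first input always takes the governing role, so the order of the inputs affects which duplicate is kept.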


1 change: 1 addition & 0 deletions docs/source/img/merge_hierarchical_structure.svg
6 changes: 6 additions & 0 deletions docs/source/usage/merge.rst
@@ -23,6 +23,12 @@ The process runs iteratively, merging two SBOMs in each iteration. In the first

In mathematical terms: :math:`output = (((input_1 * input_2) * input_3) * input_4 ...)`

By default, the merge is not hierarchical with respect to the ``components`` field of a ``component`` (`CycloneDX documentation <https://cyclonedx.org/docs/1.6/json/#components_items_components>`_). This means that new components nested in the ``components`` field of an already present component are simply added as new components to the top-level ``components`` section of the merged SBOM.
The ``--hierarchical`` flag enables hierarchical merges, in which such components are instead added to the ``components`` field of the already present component. This affects only the top level components of the merged SBOM. The structure of nested components is preserved in both cases (apart from the removal of already present components), as shown for "component 4" in the image below.

.. image:: /img/merge_hierarchical_structure.svg
:alt: Merge components structure default and hierarchical.
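
The difference between the two modes can also be illustrated with a small sketch. It assumes a simplified model in which components are identified by name only and the already present parent sits at the top level. The function place_nested and its parameters are hypothetical and not part of the tool.

```python
import copy


def place_nested(
    top_level: list[dict],
    parent_name: str,
    new_children: list[dict],
    hierarchical: bool,
) -> list[dict]:
    """Place new components that were found below an already present parent."""
    result = copy.deepcopy(top_level)
    if hierarchical:
        # Hierarchical: attach the new components to the existing parent.
        for component in result:
            if component["name"] == parent_name:
                component.setdefault("components", []).extend(new_children)
    else:
        # Default: append the new components at the top level.
        result.extend(new_children)
    return result


governing = [{"name": "component 1", "components": [{"name": "component 2"}]}]
found_below_parent = [{"name": "component 3"}]

flat = place_nested(governing, "component 1", found_below_parent, hierarchical=False)
nested = place_nested(governing, "component 1", found_below_parent, hierarchical=True)
```

In the default mode "component 3" becomes a sibling of "component 1", while in hierarchical mode it is listed next to "component 2" inside "component 1".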

A few notes on the merge algorithm:

- The ``metadata`` field is always retained from the first input and never changed through a merge with the exception of the ``timestamp``.