Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

yarn: SBOM components #739

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

slimreaper35
Copy link
Member

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • Docs updated (if applicable)
  • Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

The class is very often interpreted as `algorithm:hash`. Let's make use
of builtin __str__ magic method and return the same format.

Signed-off-by: Michal Šoltis <[email protected]>
Each "yarn classic" package type has a property that returns the package
URL based on its attributes and community PURL specification [1].

All package types share the same base -> name, version, and type which
is set to "npm" ("yarn" does not exist).

- `FilePackage`, `WorkspacePackage`, `LinkPackage` have in addition
  subpath component (extra subpath within a package, relative to the
  package root)

- `UrlPackage` has one extra qualifier -> its URL as it is definied

- `GitPackage` has one extra qualifier -> package version control system
  URL with a specific syntax [2]

- `RegistryPackage` has two extra qualifiers -> repository_url (default
  repository/registry for "npm" is https://registry.npmjs.org so
  alternative registries such as https://registry.yarnpkg.com should be
  qualified via the qualifier) [3], [4] + the checksum of the package
  converted from Subresource Integrity representation

---
[1]: https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst
[2]: https://github.com/spdx/spdx-spec/blob/cfa1b9d08903/chapters/3-package-information.md#37-package-download-location-
[3]: https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#npm
[4]: https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#known-qualifiers-keyvalue-pairs

Signed-off-by: Michal Šoltis <[email protected]>
After a successful pre-fetching of all packages, report all downloaded
packages as components in the final SBOM.

Create the `Component` object from each package based on package
attributes.

Dev packages should have `cdx:npm:package:development` property, that is
added to the component if package is marked for development -> `dev`
attribute is set to True.

Move the rest of the unit test logic to `test_fetch_yarn_source` from
its predecessor in yarn-berry implementation.

closes containerbuildsystem#636

Signed-off-by: Michal Šoltis <[email protected]>
The commit follows the previous one, that implements generating SBOM
components. Now e2e tests that generate SBOMs should be updated to
reflect this change by running integration tests and setting our custom
env variable: `CACHI2_GENERATE_TEST_DATA=true`

Signed-off-by: Michal Šoltis <[email protected]>
Copy link
Collaborator

@a-ovchinnikov a-ovchinnikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a couple small questions/nitpicks. Good to go once those are resolved.

"""Return package URL."""
qualifiers = {}

if self.url != DEFAULT_NPM_REGISTRY:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that self.url can be anything? If not then could it be constrained to a union of literals?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have a check for that:
https://github.com/containerbuildsystem/cachi2/blob/main/cachi2/core/package_managers/yarn_classic/resolver.py#L184

IOW, if a package is RegistryPackage it is one of those npm registries.

@@ -26,6 +29,9 @@
re.compile(r"^https?:.+\.git#.+"),
)

DEFAULT_NPM_REGISTRY = "https://registry.npmjs.org"
ALTERNATIVE_NPM_REGISTRY = "https://registry.yarnpkg.com"


class _BasePackage(BaseModel):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this inherit from ABC and have purl as an abstract method?

return _create_sbom_component(packages)


def _create_sbom_component(packages: Iterable[YarnClassicPackage]) -> list[Component]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor naming nitpick: it creates multiple components, so better be named _create_sbom_components. The docstring seems to agree with me on this one.

Another controversial naming scheme to consider is this: _sbom_components_created_from.
Then in the invocation spot you'll have:

return _sbom_components_created_from(packages)

which is mostly a regular English sentence (except for weird punctuation).
I am absolutely not insisting on the second one, it is just a readability idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be _create_sbom_components (probably just a typo).

qualifiers = {"download_url": self.url}
return PackageURL(
type="npm",
name=self.name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some of the non-registry packages, we may want to read their respective package.json files from the cached tarballs to get their true names. As a user, I can call these non-registry dependencies whatever I want in my project's package.json and it could be completely different from what it really is:

"dependencies": {
  "not-fecha": "https://github.com/taylorhakes/fecha.git"
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the filename in the cache come from package.json or the URL ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants