Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement SPDX expressions #2400

Merged
merged 5 commits into from
Aug 19, 2023
Merged

Conversation

hborchardt
Copy link
Contributor

@hborchardt hborchardt commented Jan 21, 2023

Description

Hi, I am very interested in seeing SPDX expression support in dependency-track.

I implemented a SPDX expression parser and reimplemented the license group policy evaluator and license policy evaluator logic.
It also implements the logic for the policy operator ALL, where previously violations were added if any policy condition is violated, regardless of the choice of the policy operator.

Addressed Issue

Addresses #170

Additional Details

The parser is based on the shunting yard algorithm for infix operators and parentheses.

  • AND and OR operators work as expected.
  • WITH operator works by converting GPL-2.0 WITH classpath-exception into a GPL-2.0-with-classpath-exception license, which is available in the default license list
  • + operator is interpreted as GPL-2.0+ being equivalent to GPL-2.0 OR GPL-2.0-or-later

I store the SPDX expression in the license field of the Component. It currently has a maximum length of 255 characters - should this be increased?
If a SPDX expression is available in the CycloneDX BOM, it is preferred over the content of the licenses info. If a resolvedLicense is found, the resolved license takes precedence over the SPDX expression.

I was also looking into making the SPDX expression tree information available to the frontend for visualisation - currently it is just parsed on-demand upon policy validation - but I did not find a reference of how such a "derived/computed" property of a model class would be made available to the frontend.

As this is my first time contributing to dependency-track, I am very open for criticism and comments of any sort.

Checklist

  • I have read and understand the contributing guidelines
  • This PR fixes a defect, and I have provided tests to verify that the fix is effective
  • This PR implements an enhancement, and I have provided tests to verify that it works as intended
  • This PR introduces changes to the database model, and I have added corresponding update logic
  • This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

@sonatype-lift
Copy link
Contributor

sonatype-lift bot commented Jan 21, 2023

🛠 Lift Auto-fix

Some of the Lift findings in this PR can be automatically fixed. You can download and apply these changes in your local project directory of your branch to review the suggestions before committing.1

# Download the patch
curl https://lift.sonatype.com/api/patch/github.com/DependencyTrack/dependency-track/2400.diff -o lift-autofixes.diff

# Apply the patch with git
git apply lift-autofixes.diff

# Review the changes
git diff

Want it all in a single command? Open a terminal in your project's directory and copy and paste the following command:

curl https://lift.sonatype.com/api/patch/github.com/DependencyTrack/dependency-track/2400.diff | git apply

Once you're satisfied commit and push your changes in your project.

Footnotes

  1. You can preview the patch by opening the patch URL in the browser.

@hborchardt hborchardt force-pushed the spdx-expressions branch 2 times, most recently from 09f512f to ee68704 Compare January 21, 2023 19:29
@hborchardt
Copy link
Contributor Author

I realized that the previous interpretation of the WITH operator as an AND operator does not make sense - it makes it hard to disallow GPL-2.0 but allow GPL-2.0 WITH classpath-exception. So I changed the implementation to interpret GPL-2.0 WITH classpath-exception as GPL-2.0-with-classpath-exception

Copy link
Member

@nscuro nscuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @hborchardt! 🙌

Beside the other comments, I think it would be beneficial to have some documentation that explains how the policy evaluation works with license expressions.

IS and IS_NOT are straightforward when used with licenses or license groups, but expressions add the dimension of license compatibility into the mix, which is not as easy to grasp.

I have to admit I struggled to understand how it's working, so I assume it will be even more of a problem for non-technical users.

component.setResolvedLicense(license);
if (licenseChoice != null) {
if (licenseChoice.getExpression() != null) {
component.setLicense(licenseChoice.getExpression());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reusing the license field here has the downside that we can't differentiate between license name and license expression when exporting BOMs again. For example, when using the "inventory" export with a project that has components that have license expressions, the export will show:

"licenses" : [
  {
    "license" : {
      "name" : "LGPL-2.1-only WITH CPE AND MIT OR BSD-3-Clause"
    }
  }
],

Introducing a separate licenseExpression field to the Component class would help to disambiguate.

As you pointed out in the PR description, a limit of 255 characters may not be able to fit all possible expressions. If you end up adding a new field, you could define the jdbcType as CLOB (translates to TEXT or MEDIUMTEXT, depending on the database) instead, which does not have a character limit.

See here for an example of a CLOB field:

@Persistent(defaultFetchGroup = "true")
@Column(name = "DETAILS", jdbcType = "CLOB", allowsNull = "true")
@NotNull
private String analysisDetails;

Also note that the frontend should be updated to show the expression in the Component Details view:

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will then create a PR in the frontend project for this.

Comment on lines 86 to 108
if (policy.getOperator() == Operator.ALL) {
if (allPoliciesViolated) {
return violations;
} else {
// discard found violations, because not all conditions had violations
return List.of();
}
} else {
// Operator.ANY means just give all found violations
return violations;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Policy operators are evaluated by PolicyEngine, and should not be re-used by condition evaluators. There have been multiple improvements to this recently, e.g. in #2336 and #2460.

I think the linked PRs fixed the problem you referred to in the PR description:

It also implements the logic for the policy operator ALL, where previously violations were added if any policy condition is violated, regardless of the choice of the policy operator.

Can you confirm this is the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it looks like those PRs address this in a more general way, which is nice. I haven't extensively tested them, but the logic seems sound (as long as one evaluator.evaluate(...) invocation does not return more than one violation per condition - which is probably a reasonable assumption, but not totally obvious)

boolean hasViolations = false;
if (condition.getOperator() == PolicyCondition.Operator.IS) {
// call canLicenseBeUsed with inverted logic
if (!canLicenseBeUsed(qm, expression, LicenseGroupType.NegativeList, lg)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is very hard to understand. What is license in canLicenseBeUsed, given that both expression and lg can contain multiple licenses? What exactly is the meaning of NegativeList vs. PositiveList?

Reading it as:

if expression is incompatible with [licenses in lg]

makes sense, but the NegativeList part is throwing me off.

Any chance we can improve the wording here, so that it becomes a bit more obvious what's happening?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the wording to ForbiddenLicenses and AllowedLicenses, which hopefully better captures the intent. Also added a section to the docs, explaining the details on how to use it and set it up, and what it does.

@hborchardt
Copy link
Contributor Author

Thanks for the feedback - I think I managed to address all of it

@msymons
Copy link
Member

msymons commented Apr 25, 2023

@hborchardt: following the release of v4.8.0, this PR now has conflicts that need to be resolved before it can be merged. Please can you fix things? Then the PR can be reviewed.

@melba-lopez
Copy link
Contributor

@hborchardt are you able to update this PR to not have a branch conflict?

Also, @nscuro do we want to consider (maybe in the future) incorporating CycloneDX CLI to convert SPDX to CDX so that we let that handle the import versus creating something new?

@melba-lopez melba-lopez added the enhancement New feature or request label Jul 28, 2023
@melba-lopez
Copy link
Contributor

@hborchardt can you update your branch? this is slated for the 4.9 release and would like for your contributions to be included!

@hborchardt
Copy link
Contributor Author

Hi, just yesterday I was updating the branch accordingly, but I have to redo it now... But you can expect an updated branch in the next days

Implement a shunting yard algorithm to process the infix operators and
parentheses in the SPDX expression grammar. Parses an expression into
a tree of SpdxExpression/SpdxExpressionOperation objects.

Signed-off-by: Hendrik Borchardt <[email protected]>
Store it when importing a cyclonedx BOM, and export it when creating
one. Use it for resolving the license from the database, when the
expression is just one license id.

Signed-off-by: Hendrik Borchardt <[email protected]>
Recursively check license group compatibility with SpdxExpression,
according to the SpdxExpressionOperations and whether the PolicyCondition
specifies a positive list or negative list of licenses.

As a special case, a `null` license group is interpreted as "unresolved"
licenses, to be used by the LicensePolicyEvaluator.

Signed-off-by: Hendrik Borchardt <[email protected]>
A license policy should be the same as a LicenseGroup policy where the
group contains just one license.

Also handle the case of "unresolved" keyword as condition value, for
referring to unresolved licenses.

Signed-off-by: Hendrik Borchardt <[email protected]>
Copy link
Member

@nscuro nscuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for being patient with us @hborchardt, apologies for letting you wait for so long.

I think this is a really cool addition, I am sure the SpdxExpressionParser will come in handy in multiple other areas as well.

While testing, I found a few more gaps:

  • licenseExpression is not considered when creating or updating Components in ComponentResource, such that initially setting or modifying an expression was not possible
  • For the frontend, the licenseExpression field was not sent to the API server as part of the update request
  • As noted in a comment, the manual DB schema migration is not necessary

Additionally, thanks to your parser, we can detect invalid expressions. This is super useful and allows us to prevent invalid expressions from entering the system. Validation of expressions should be performed upon BOM ingestion, and manual creation / updating of components via REST API.

I already made those changes and will push them after merging your PRs.

Comment on lines +67 to +77
private void addLicenseExpressionColumnToComponents(Connection connection) throws Exception {
// The JDBC type "CLOB" is mapped to the type CLOB for H2, MEDIUMTEXT for MySQL, and TEXT for PostgreSQL and SQL Server.
LOGGER.info("Adding \"LICENSE_EXPRESSION\" to \"COMPONENTS\"");
if (DbUtil.isH2()) {
DbUtil.executeUpdate(connection, "ALTER TABLE \"COMPONENTS\" ADD \"LICENSE_EXPRESSION\" CLOB");
} else if (DbUtil.isMysql()) {
DbUtil.executeUpdate(connection, "ALTER TABLE \"COMPONENTS\" ADD \"LICENSE_EXPRESSION\" MEDIUMTEXT");
} else {
DbUtil.executeUpdate(connection, "ALTER TABLE \"COMPONENTS\" ADD \"LICENSE_EXPRESSION\" TEXT");
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, non-intrusive schema changes like this one are completely taken care of by the ORM, so there's no manual migration needed. The intent is very much appreciated though, thanks for having the schema change in mind!

@nscuro nscuro merged commit f4bace4 into DependencyTrack:master Aug 19, 2023
7 checks passed
@nscuro nscuro mentioned this pull request Aug 19, 2023
2 tasks
@nscuro nscuro removed their assignment Aug 19, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants