Scorecard Integration #1294

Open
wants to merge 61 commits into
base: main
Changes from 41 commits
173db43
Add ScoreCard config into settings.py
404-geek Jun 26, 2024
272b99c
code style fix nexB#598
404-geek Jun 26, 2024
944cee2
Merge branch 'refs/heads/ossf_api_template' into scorecard_integration
404-geek Jun 26, 2024
bc445c1
settings.py code style fix nexB#598
404-geek Jun 26, 2024
f241b3b
mixin import and models declaration nexB#1283
404-geek Jun 27, 2024
605a5cf
transforming scorecard data into object for saving nexB#1283
404-geek Jun 27, 2024
3833dca
added test cases for saving scorecard data into models and modified s…
404-geek Jul 1, 2024
37380c8
Merge branch 'main' into feat-models_integration
404-geek Jul 1, 2024
5502b5f
added score checks mixin to models nexB#1283
404-geek Jul 1, 2024
2aeb2a4
Merge branch 'nexB:main' into feat-models_integration
404-geek Jul 3, 2024
103fca0
empty details in score response handled nexB#1283
404-geek Jul 3, 2024
952e6a6
changed class names to camel case for models and modified the tests n…
404-geek Jul 3, 2024
3ba3db5
code formatted nexB#1283
404-geek Jul 3, 2024
2f2b846
code formatted nexB#1283
404-geek Jul 3, 2024
39a056d
code formatted nexB#1283
404-geek Jul 3, 2024
e5f3e7a
docstrings formatted nexB#1283
404-geek Jul 3, 2024
c2f5c4d
created basic fetch and availability functions for scorecode pipeline…
404-geek Jul 4, 2024
6ee5b07
Merge branch 'nexB:main' into scorecard_integration
404-geek Jul 7, 2024
760afc2
Merge remote-tracking branch 'origin/scorecard_integration' into scor…
404-geek Jul 7, 2024
0dbc92f
modified doc strings and models and imported ScoreCode package in set…
404-geek Jul 7, 2024
d652f42
setup.cfg nexB#1283
404-geek Jul 7, 2024
aa154c0
Merge branch 'nexB:main' into scorecard_integration
404-geek Jul 8, 2024
37ad73a
Merge branch 'refs/heads/feat-models_integration' into scorecard_inte…
404-geek Jul 8, 2024
563991b
reinstated deleted code during rebase nexB#1283
404-geek Jul 11, 2024
4632dfc
code formatting nexB#1283
404-geek Jul 11, 2024
923c834
database migrations for scorecard nexB#1283
404-geek Jul 11, 2024
94bfcd0
updated the scanpipe only fields nexB#1283
404-geek Jul 13, 2024
d129f73
changed scorecode commit hash for latest pull nexB#1283
404-geek Jul 13, 2024
24c7be0
Merge branch 'nexB:main' into scorecard_integration
404-geek Jul 15, 2024
259f004
update pipeline code and changed scorecode hash commit nexB#1283
404-geek Jul 15, 2024
ccd75ae
changed imports structure nexB#1283
404-geek Jul 15, 2024
9d80ef1
Merge branch 'nexB:main' into scorecard_integration
404-geek Jul 25, 2024
29da290
modified lookup and save logic nexB#1283
404-geek Jul 25, 2024
9d72734
merged migrations due to conflicts nexB#1283
404-geek Jul 27, 2024
301122e
updated migrations nexB#1283
404-geek Jul 27, 2024
5812d97
updated doc string for get_scorecard_info_packages.py nexB#1283
404-geek Jul 27, 2024
df50416
Added scorecard pipeline to SCIO with intergration test nexB#1283
404-geek Jul 28, 2024
88a43dc
Merge branch 'main' into scorecard_integration
404-geek Aug 20, 2024
fc4945e
moved the data to be regenerated if reqiured nexB#1283
404-geek Aug 20, 2024
c00b3b3
updated urls for testing nexB#1283
404-geek Aug 20, 2024
b97ff7a
added merged migration file nexB#1283
404-geek Aug 20, 2024
496945b
Changed docstring and renamed functions according to suggestions nexB…
404-geek Aug 22, 2024
f86d5bb
class name changes in steps of pipeline nexB#1283
404-geek Aug 22, 2024
36e955a
pipeline name updated nexB#1283
404-geek Aug 22, 2024
90c113a
Merge branch 'aboutcode-org:main' into scorecard_integration
404-geek Sep 22, 2024
43886a3
update pipeline code and steps nexB#1283
404-geek Sep 22, 2024
7c134c7
Merge branch 'main' into scorecard_integration
404-geek Oct 30, 2024
3d4d6ea
update pipeline steps to work with scorecode 0.0.2 release nexB#1283
404-geek Nov 1, 2024
f4ed4b5
Merge remote-tracking branch 'origin/scorecard_integration' into scor…
404-geek Nov 1, 2024
b9229a5
update migration nexB#1283
404-geek Nov 1, 2024
f80f111
Merge branch 'aboutcode-org:main' into scorecard_integration
404-geek Nov 1, 2024
ee2ea14
update migration nexB#1283
404-geek Nov 1, 2024
bd5b9b3
rename pipeline name with data parsing function nexB#1283
404-geek Nov 3, 2024
2b4629e
code valid nexB#1283
404-geek Nov 3, 2024
99ee48d
update setup.cfg nexB#1283
404-geek Nov 3, 2024
80153f8
optimize code while saving score checks nexB#1283
404-geek Nov 4, 2024
77efae0
Merge branch 'aboutcode-org:main' into scorecard_integration
404-geek Dec 2, 2024
113a557
update test cases and regen scorecard data logic nexB#1283
404-geek Dec 3, 2024
64e17fe
update migration file nexB#1283
404-geek Dec 3, 2024
027fe2d
remove unwanted change nexB#1283
404-geek Dec 3, 2024
7f43f9e
Merge branch 'main' into scorecard_integration
404-geek Feb 16, 2025
44 changes: 44 additions & 0 deletions scanpipe/migrations/0067_packagescore_scorecardcheck.py
@@ -0,0 +1,44 @@
# Generated by Django 5.0.7 on 2024-07-27 10:20

import django.db.models.deletion
import uuid
from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('scanpipe', '0066_alter_webhooksubscription_options_and_more'),
    ]

    operations = [
        migrations.CreateModel(
            name='PackageScore',
            fields=[
                ('scoring_tool', models.CharField(blank=True, choices=[('ossf-scorecard', 'Ossf'), ('others', 'Others')], help_text='Defines the source of a score or any other scoring metricsFor example: ossf-scorecard for scorecard data', max_length=100)),
                ('scoring_tool_version', models.CharField(blank=True, help_text='Defines the version of the scoring tool used for scanning thepackageFor Eg : 4.6 current version of OSSF - scorecard', max_length=50)),
                ('score', models.CharField(blank=True, help_text='Score of the package which is scanned', max_length=50)),
                ('scoring_tool_documentation_url', models.CharField(blank=True, help_text='Documentation URL of the scoring tool used', max_length=100)),
                ('score_date', models.DateTimeField(blank=True, editable=False, help_text='Date when the scoring was calculated on the package', null=True)),
                ('uuid', models.UUIDField(db_index=True, default=uuid.uuid4, editable=False, primary_key=True, serialize=False, verbose_name='UUID')),
                ('discovered_package', models.ForeignKey(blank=True, editable=False, help_text='The package for which the score is given', null=True, on_delete=django.db.models.deletion.CASCADE, related_name='discovered_packages_score', to='scanpipe.discoveredpackage')),
            ],
            options={
                'abstract': False,
            },
        ),
        migrations.CreateModel(
            name='ScorecardCheck',
            fields=[
                ('check_name', models.CharField(blank=True, help_text='Defines the name of check corresponding to the OSSF scoreFor example: Code-Review or CII-Best-PracticesThese are the some of the checks which are performed on a scanned package', max_length=100)),
                ('check_score', models.CharField(blank=True, help_text='Defines the score of the check for the package scannedFor Eg : 9 is a score given for Code-Review', max_length=50)),
                ('reason', models.CharField(blank=True, help_text='Gives a reason why a score was given for a specific checkFor eg, : Found 9/10 approved changesets -- score normalized to 9', max_length=300)),
                ('details', models.JSONField(blank=True, default=list, help_text='A list of details/errors regarding the score')),
                ('uuid', models.UUIDField(db_index=True, default=uuid.uuid4, editable=False, primary_key=True, serialize=False, verbose_name='UUID')),
                ('for_package_score', models.ForeignKey(blank=True, editable=False, help_text='The checks for which the score is given', null=True, on_delete=django.db.models.deletion.CASCADE, related_name='discovered_packages_score_checks', to='scanpipe.packagescore')),
            ],
            options={
                'abstract': False,
            },
        ),
    ]
14 changes: 14 additions & 0 deletions scanpipe/migrations/0068_merge_20240820_1656.py
@@ -0,0 +1,14 @@
# Generated by Django 5.0.7 on 2024-08-20 16:56

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('scanpipe', '0067_discoveredpackage_notes'),
        ('scanpipe', '0067_packagescore_scorecardcheck'),
    ]

    operations = [
    ]
tdruez marked this conversation as resolved.
107 changes: 105 additions & 2 deletions scanpipe/models.py
@@ -29,6 +29,7 @@
from collections import Counter
from collections import defaultdict
from contextlib import suppress
from datetime import datetime
from itertools import groupby
from operator import itemgetter
from pathlib import Path
@@ -74,6 +75,8 @@
from licensedcode.cache import build_spdx_license_expression
from licensedcode.cache import get_licensing
from matchcode_toolkit.fingerprinting import IGNORED_DIRECTORY_FINGERPRINTS
from ossf_scorecard.contrib.django.models import PackageScoreMixin
from ossf_scorecard.contrib.django.models import ScorecardChecksMixin
from packagedcode.models import build_package_uid
from packagedcode.utils import get_base_purl
from packageurl import PackageURL
@@ -1833,10 +1836,10 @@ class Run(UUIDPKModel, ProjectRelatedModel, AbstractTaskFieldsModel):
scancodeio_version = models.CharField(max_length=100, blank=True)
description = models.TextField(blank=True)
current_step = models.CharField(max_length=256, blank=True)
selected_groups = models.JSONField(
selected_steps = models.JSONField(
null=True, blank=True, validators=[validate_none_or_list]
)
selected_steps = models.JSONField(
selected_groups = models.JSONField(
tdruez marked this conversation as resolved.
null=True, blank=True, validators=[validate_none_or_list]
)

@@ -3898,6 +3901,106 @@ def as_spdx(self):
)


class PackageScore(UUIDPKModel, PackageScoreMixin):
def __str__(self):
return self.score or str(self.uuid)

Comment on lines +4030 to +4032
Contributor

Follow the conventions used across the existing Models:

  1. Fields
  2. class Meta
  3. str

discovered_package = models.ForeignKey(
DiscoveredPackage,
related_name="discovered_packages_score",
help_text=_("The package for which the score is given"),
on_delete=models.CASCADE,
editable=False,
blank=True,
null=True,
Comment on lines +4039 to +4040
Contributor

Can a DiscoveredPackageScore instance really exist without a DiscoveredPackage FK defined?

)
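
The question about the nullable foreign key can be illustrated outside Django. The sketch below uses the standard-library `sqlite3` module with hypothetical table names (`discovered_package`, `package_score`); it is not the ScanCode.io schema. It shows what a required relation enforces: a score row with no package, or with a package that does not exist, is rejected by the database. In Django terms this would roughly correspond to dropping `null=True` and `blank=True` from the `ForeignKey`.

```python
import sqlite3


def make_strict_db():
    """Build an in-memory database where a score must reference a package."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE discovered_package (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE package_score ("
        "id INTEGER PRIMARY KEY, "
        "score TEXT, "
        "package_id INTEGER NOT NULL "
        "REFERENCES discovered_package(id) ON DELETE CASCADE)"
    )
    return conn


def try_insert_score(conn, score, package_id):
    """Return True if the score row was stored, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO package_score (score, package_id) VALUES (?, ?)",
                (score, package_id),
            )
    except sqlite3.IntegrityError:
        return False
    return True
```

With the column declared `NOT NULL`, orphan scores cannot be created by accident, which is the behaviour the review comment appears to be asking about.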

@classmethod
@transaction.atomic()
Member

Not sure why this is required here. We are not doing multiple database updates that could crash and need to be handled, are we?

Collaborator Author

@404-geek 404-geek Jul 26, 2024

Since we are updating the score and checks tables in one go, I have kept it atomic.

Contributor

@404-geek Could you provide an example that shows why atomic() is useful here?

Contributor

@404-geek You haven't addressed the question above yet ;)
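
One possible answer to the question in this thread, as a minimal sketch: the method writes a score row and then its check rows, so a failure while saving the checks would otherwise leave a score with no checks behind. The example below is an analogy built on the standard-library `sqlite3` module, not the Django ORM, and the table and function names are hypothetical. Using the connection as a context manager plays the role that `transaction.atomic()` plays in the diff.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a score table and a checks table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE package_score (id INTEGER PRIMARY KEY, score TEXT)")
    conn.execute(
        "CREATE TABLE scorecard_check ("
        "id INTEGER PRIMARY KEY, score_id INTEGER NOT NULL, name TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def save_score_with_checks(conn, score, check_names):
    """Insert one score row and its check rows as a single transaction."""
    # The connection context manager commits if the block succeeds and
    # rolls back every statement in it if any of them raises.
    with conn:
        cursor = conn.execute(
            "INSERT INTO package_score (score) VALUES (?)", (score,)
        )
        score_id = cursor.lastrowid
        conn.executemany(
            "INSERT INTO scorecard_check (score_id, name) VALUES (?, ?)",
            [(score_id, name) for name in check_names],
        )
    return score_id
```

If one of the check rows violates a constraint (for example a missing name), the score row inserted just before it is rolled back as well, so the two tables never disagree. Whether that guarantee is needed in the pipeline is the judgment call the reviewers are asking about.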

def create_from_data(cls, DiscoveredPackage, scorecard_data, scoring_tool=None):
tdruez marked this conversation as resolved.
"""Create ScoreCard Object from ScoreCard Object"""
tdruez marked this conversation as resolved.
        final_data = {
            "score": scorecard_data.score,
            "scoring_tool_version": scorecard_data.scoring_tool_version,
            "scoring_tool_documentation_url": (
                scorecard_data.scoring_tool_documentation_url
            ),
        }

        date_str = scorecard_data.score_date

        formats = ["%Y-%m-%d", "%Y-%m-%dT%H:%M:%SZ"]

        if date_str:
            naive_datetime = None

            for fmt in formats:
                try:
                    naive_datetime = datetime.strptime(date_str, fmt)
                except ValueError:
                    continue

            score_date = timezone.make_aware(
                naive_datetime, timezone.get_current_timezone()
            )

        else:
            score_date = timezone.now()
Member

Are there cases where we don't have dates? Can you show some examples?
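
The date handling in the diff tries each format in turn but has no explicit path for a string that matches none of them. A hedged sketch of a stricter helper follows; `parse_score_date` and `SCORE_DATE_FORMATS` are hypothetical names, and treating the parsed value as UTC is an assumption based on the trailing `Z` in the second format (the diff uses the current Django timezone instead). It uses only the standard library.

```python
from datetime import datetime, timezone

# The two formats handled in the diff: a bare date and an ISO-like UTC timestamp.
SCORE_DATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%SZ")


def parse_score_date(date_str, formats=SCORE_DATE_FORMATS):
    """Return an aware UTC datetime, or None if the value is empty or unparseable."""
    if not date_str:
        return None
    for fmt in formats:
        try:
            parsed = datetime.strptime(date_str, fmt)
        except ValueError:
            # This format did not match; try the next one.
            continue
        # Stop at the first format that matches.
        return parsed.replace(tzinfo=timezone.utc)
    return None
```

Returning `None` leaves the caller to decide whether a missing date should fall back to the current time or be stored as empty, which is the point raised in the comment above.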

tdruez marked this conversation as resolved.

        final_data["score_date"] = score_date

        scorecard_object = cls.objects.create(
            **final_data,
            discovered_package=DiscoveredPackage,
            scoring_tool=scoring_tool,
        )

        # Create associated scorecard_checks
        checks_data = scorecard_data.checks

        ScorecardCheck.objects.bulk_create(
Member

Are there so many checks that this would make a difference in the time taken to save them?

            [
                ScorecardCheck(
                    check_name=check_data.check_name,
                    check_score=check_data.check_score,
                    reason=check_data.reason or "",
                    details=check_data.details or [],
                    for_package_score=scorecard_object,
                )
                for check_data in checks_data
            ]
        )

        return scorecard_object


class ScorecardCheck(UUIDPKModel, ScorecardChecksMixin):
def __str__(self):
Member

Why is this required? Do you show this anywhere?
I don't think showing just the score/uuid is helpful at all. We need the repo + score anyway if you're logging this. But you should get the values from the objects directly where you log it.

Collaborator Author

I just kept it like that.

Do you have anything in mind to return if someone calls the instance directly?

Contributor

Follow the conventions used across the existing Models:

  1. Fields
  2. class Meta
  3. str

return self.check_score or str(self.uuid)

for_package_score = models.ForeignKey(
PackageScore,
related_name="discovered_packages_score_checks",
help_text=_("The checks for which the score is given"),
on_delete=models.CASCADE,
editable=False,
blank=True,
null=True,
Comment on lines +4110 to +4111
Contributor

Are those really optional fields?

)

@classmethod
def create_from_data(cls, package_score, check_data):
"""Create a ScorecardCheck instance from provided data."""
final_data = {
"check_name": check_data.get("name"),
"check_score": check_data.get("score"),
"reason": check_data.get("reason"),
"details": check_data.get("details", []),
tdruez marked this conversation as resolved.
"for_package_score": package_score,
}
return cls.objects.create(**final_data)
Member

Why not create it directly and have defaults? Doesn't seem like we are doing much here.

Collaborator Author

@404-geek 404-geek Jul 28, 2024

@AyanSinhaMahapatra thanks for pointing that out.
I have updated it to create the object directly.



def normalize_package_url_data(purl_mapping, ignore_nulls=False):
"""
Normalize a mapping of purl data so database queries with
89 changes: 89 additions & 0 deletions scanpipe/pipelines/get_scorecard_info_packages.py
@@ -0,0 +1,89 @@
# SPDX-License-Identifier: Apache-2.0
#
# http://nexb.com and https://github.com/nexB/scancode.io
# The ScanCode.io software is licensed under the Apache License version 2.0.
# Data generated with ScanCode.io is provided as-is without warranties.
# ScanCode is a trademark of nexB Inc.
#
# You may not use this software except in compliance with the License.
# You may obtain a copy of the License at: http://apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software distributed
# under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
# CONDITIONS OF ANY KIND, either express or implied. See the License for the
# specific language governing permissions and limitations under the License.
#
# Data Generated with ScanCode.io is provided on an "AS IS" BASIS, WITHOUT WARRANTIES
# OR CONDITIONS OF ANY KIND, either express or implied. No content created from
# ScanCode.io should be considered or used as legal advice. Consult an Attorney
# for any legal advice.
#
# ScanCode.io is a free software code scanning tool from nexB Inc. and others.
# Visit https://github.com/nexB/scancode.io for support and download.

from ossf_scorecard import scorecard

from scanpipe.models import PackageScore
from scanpipe.pipelines import Pipeline


class FetchScoreCodeInfo(Pipeline):
tdruez marked this conversation as resolved.
"""
Pipeline to fetch ScoreCode information for packages and dependencies.

This pipeline retrieves ScoreCode data for each package in the project and
stores it in the corresponding package instances

Attributes
----------
download_inputs (bool): Indicates whether inputs should be downloaded.
is_addon (bool): Indicates whether this pipeline is an add-on.

Methods
-------
steps(cls):
Defines the steps for the pipeline.

check_scorecode_service_availability(self):
Checks if the ScoreCode service is configured and available.

lookup_save_packages_scorecode_info(self):
Fetches ScoreCode information for each discovered package in the project
and saves the information to the respective package instances.

tdruez marked this conversation as resolved.
"""

    download_inputs = False
    is_addon = True

    @classmethod
    def steps(cls):
        return (
            cls.check_scorecode_service_availability,
            cls.lookup_save_packages_scorecode_info,
        )

    def check_scorecode_service_availability(self):
        """Check if the scorecode service is configured and available."""
        if not scorecard.is_configured():
            raise Exception("scorecode service is not configured.")

        if not scorecard.is_available():
            raise Exception("scorecode service is not available.")

    def lookup_save_packages_scorecode_info(self):
Contributor

fetch_packages_scorecode_info would be better.

"""Fetch scorecode information for each of the project's discovered packages."""
        packages = self.project.discoveredpackages.all()
        scorecard_packages_data = scorecard.fetch_scorecard_info(
            packages=packages,
            logger=self.log,
        )

        if scorecard_packages_data:
            scorecard.save_scorecard_info(
                package_scorecard_data=scorecard_packages_data,
                cls=PackageScore,
                logger=self.log,
            )

        else:
            raise Exception("No Data Found for the packages")
tdruez marked this conversation as resolved.
9 changes: 9 additions & 0 deletions scanpipe/tests/__init__.py
@@ -20,8 +20,10 @@
# ScanCode.io is a free software code scanning tool from nexB Inc. and others.
# Visit https://github.com/nexB/scancode.io for support and download.

import json
import os
from datetime import datetime
from pathlib import Path
from unittest import mock

from django.apps import apps
@@ -264,3 +266,10 @@ def make_dependency(project, **extra):
"license_key": "mpl-2.0",
},
}

scorecard_data = None
Contributor

That line seems unnecessary.


data = Path(__file__).parent / "data"

with open(f"{data}/scorecode/scorecard_response.json") as file:
    scorecard_data = json.load(file)
Comment on lines +306 to +309
Contributor

This should not be loaded in the module init but as needed in the test function context.
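
A minimal sketch of what loading the fixture "as needed" could look like, assuming the same `scorecode/scorecard_response.json` layout under a data directory. The helper name `load_scorecard_data` is hypothetical and not part of the PR; the idea is only that the file is read when a test calls the helper, not when `scanpipe.tests` is imported.

```python
import json
from pathlib import Path


def load_scorecard_data(data_dir):
    """Read the scorecard fixture on demand and return the parsed JSON."""
    fixture_path = Path(data_dir) / "scorecode" / "scorecard_response.json"
    with fixture_path.open() as file:
        return json.load(file)
```

A test would then call `load_scorecard_data(...)` inside its own body (or in `setUp`), so a missing or malformed fixture fails only the tests that use it and each test gets a fresh copy of the data.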
