-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial implementation #1
base: main
Are you sure you want to change the base?
Conversation
import_test.go
Outdated
import ( | ||
"testing" | ||
|
||
Spdx3_0_1 "github.com/spdx/spdx-go-model/v3_0_1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to review the "import path" of this mod/package while user will confused by the version between spec and go module itself.
It's better to use multiple modules
Ref: https://go.dev/doc/modules/managing-source#multiple-module-source
spdx-v3 github.com/spdx/spdx-go-model/spdx-v3-0-1
spdx-v4 github.com/spdx/spdx-go-model/spdx-v4-0-0
While user can install by
go get github.com/spdx/spdx-go-model/[email protected]
go get github.com/spdx/spdx-go-model/spdx-v4-0-0@latest
or we can setup a prettier path like k8s project (k8s.io/api/core/v1
) as for SPDX like spdx.dev/go/v3.0.1@latest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we actually want to do multiple modules in this case. If we make multiple modules in the same repository, it looks like when we release, we would have to drop a tag for each version of SPDX spec in the repo. For example, to make a release, we would have to make tags:
spdx_v_3_0_1/v1.0.0
spdx_v_3_1_0/v1.0.0
etc.
(note: I couldn't figure out how to a module with -
in the name)
Which will be a real pain given that we'll hopefully have several simultaneous SPDX versions in here, and also because all of the tags will always be dropped in the same location (because, all the code in this repo is effectively auto generated each release).
Given that the bindings for each release will "behave" pretty much the same for each SPDX version, I'm also not convinced that you'd want to allow users to pick different releases of the different bindings (e.g. v1.0.0 bindings for SPDX 3.0.1, v1.0.1 for SPDX 3.1.0, etc.). The binding code is only going to drastically change if we bump the version of shalc2code used to generate the bindings and I think we can manage that by A) fixing the version of shacl2code used (which I will do), and B) bumping the module major version and branching if necessary (e.g. when shacl2code changes the API).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also of note, I will make the bindings be named spdx_v3_0_1
instead of just v3_0_1
as that does remove the need for the import alias
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only reviewed the README, workflows and related artifacts - which all look good.
Looking for others with more GoLang expertise to review the generated code.
fc6d5d0
to
b024e54
Compare
The initial implementation of the SPDX go bindings
@JPEWdev could you please give me a overview here about what you were looking a review for. Yesterday, in the meeting there were some glitch in the network from my side due to which I couldn't understand fully. |
I'm not a go expert, this is the first go program I've ever written, so what I would like is for some people knowledgeable in go to look over the code (in particular the usage in the examples) and make sure it's reasonable from a go user perspective |
ci.Created().Set(time.Now()) | ||
|
||
// Set the person as the creator of the SPDX data | ||
ci.CreatedBy().AppendObj(p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use append()
for Slices Instead of Custom Methods.
- Instead of
ci.CreatedBy().AppendObj(p)
, useci.CreatedBy = append(ci.CreatedBy, p)
.
The built-in append function is the idiomatic way to work with slices in Go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for this is two fold:
-
it allows validation of the assigned value immediately when it's appended. This reduces errors when using the bindings as you can determine exaclty where a bad value was assigned, instead of having to find out later when all context is lost.
-
In the particular case where a property references an object, either an object or a string IRI can be assigned, so there is also an AppendIRI() method; both of these are convenience shortcuts to the Append() method which requires a properly types Ref object
p.ID().Set("http://spdx.org/spdxdocs/person-example") | ||
|
||
// Set the name for the person | ||
p.Name().Set("Joshua") | ||
|
||
// Create a creation info object | ||
ci := spdx_v3_0_1.MakeCreationInfo() | ||
ci.SpecVersion().Set("3.0.1") | ||
ci.Created().Set(time.Now()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid Unnecessary Getters
- Instead of using
p.ID().Set(value)
, usep.SetID(value)
, - Similarly,
p.Name().Set(value)
, usep.SetName(value)
- Similarly,
ci.SpecVersion().Set(value)
, useci.SetSpecVersion(value)
- and,
ci.Created().Set(valu)
, useci.SetCreated(value)
Something like this: https://go.dev/play/p/tREj7_5eQFN
You can checkout these resources to understand more about getter and setter in go:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for this is because all of this code is auto-generated, so having a common "string property" class that can be reused all the places where a class has a string property is much easier to generate code for than (effectively) open coding it everywhere. I do appreciate this is not "the go way", so I'll look into it though
email.ExternalIdentifierType().Set(spdx_v3_0_1.ExternalIdentifierTypeEmail) | ||
email.Identifier().Set("[email protected]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here: Avoid Unnecessary Getter/Setter Chaining:
Refractor it to:
email.SetExternalIdentifierType(spdx_v3_0_1.ExternalIdentifierTypeEmail)
email.SetIdentifier("[email protected]")
email := spdx_v3_0_1.MakeExternalIdentifier() | ||
email.ExternalIdentifierType().Set(spdx_v3_0_1.ExternalIdentifierTypeEmail) | ||
email.Identifier().Set("[email protected]") | ||
p.ExternalIdentifier().AppendObj(email) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly avoid AppendObj
and simply refractor it to:
p.ExternalIdentifiers = append(p.ExternalIdentifiers, email)
objset.AddObject(p) | ||
objset.AddObject(doc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this, if it is internally appending an objects as name suggest, AddObject
, then instead of that use a simple append()
.
Refractor it as:
objset.Objects = append(objset.Objects, p, doc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
objset does indexing on objects added to it to make lookup faster, so append()
cannot be used
doc.ID().Set("http://spdx.org/spdxdocs/doc-example") | ||
doc.CreationInfo().SetObj(ci) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly these one, as per above suggested to avoid getter.
- Instead of
doc.CreationInfo.Set(value)
, usedoc.SetCreationInfo(ci)
, and doc.ID().Set(value)
, bydoc.SetID(value)
Refractor with:
func (d *SpdxDocument) SetCreationInfo(ci CreationInfo) {
d.CreationInfo = ci
}
and
func (d *SpdxDocument) SetID(id string) {
d.SetID = id
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't be that simple, due to validation requirements (as stated, an important point of these bindings is to fail as early as possible if a bad value is assigned)
CreationInfo can be either a Object or a String IRI reference, so maybe something like:
doc.SetCreationInfoObj(value)
doc.SetCreationInfoIRI("http://foo.bar/creationinfo")
doc.SetCreationInfo(creationInfoRef)
would be OK?
|
||
encoder := json.NewEncoder(file) | ||
if err := objset.Encode(encoder); err != nil { | ||
t.Error(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensures test execution stops immediately when a failure occurs.
t.Error(err)
will continue test even after errors- whereas,
t.Fatal(err)
, will stops execution test upon failure.
So, refractor this too:
if err != nil {
t.Fatal(err) // Terminates test immediately
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks.
I’ve written and re-written this a few times now, so please accept my apologies for the delay and also for the length of this comment. And, before continuing, I want to sincerely express my gratitude for your effort here to get SPDX 3 support implemented for Go — it’s not a trivial task — in fact, it may be one of the more challenging languages! I think the easiest way to present this is to start at the end by generating one of the Go data models that I think works best for tools-golang. As a maintainer, I hope to provide a way for users to use the SPDX 3 format with minimal friction by using Go language features as best we can to help enforce and/or guide correct usage. This is especially challenging due to the lack of inheritance in the language and the reliance on inheritance in the spec. Rather than give exhaustive feedback on this PR, I think it’s easier to show what works about this updated model and explain why I’ve made some of the decisions. Generally, this approach is:
To start the discussion, I’ve re-written the example using this struct-first data model: // Create a new Person
p := &spdx.Person{Agent: spdx.Agent{Element: spdx.Element{
// Set the unique identifier (IRI) for the person
ID: "http://spdx.org/spdxdocs/person-example",
// Set the name for the person
Name: "Joshua",
// Add an e-mail address for the person
ExternalIdentifiers: spdx.ExternalIdentifierList{
&spdx.ExternalIdentifier{
ExternalIdentifierType: spdx.ExternalIdentifierType_Email,
Identifier: "[email protected]",
},
},
}}}
ci := &spdx.CreationInfo{
SpecVersion: "3.0.1",
Created: time.Now(),
CreatedBys: spdx.AgentList{p},
}
// set the circular reference: the creation info as the creation info of the Person
p.CreationInfo = ci
// Create a SPDX Document object (can use direct property access)
doc := &spdx.SpdxDocument{}
doc.ID = “http://spdx.org/spdxdocs/doc-example",
doc.CreationInfo = ci, What’s different and why? Type safetyTypes are enforced more appropriately: since interfaces in Go are implicit, these can have problems constraining in certain situations and make some type assertions impossible. The biggest spots this is an issue are where inheritance-only interfaces get generated, for example: type Agent interface {
Element
} And more problematic top-level SHACLObjects: type PresenceType interface {
SHACLObject
} … this means anything with a type constraint using one of these interfaces will also allow ANY type that satisfies the supertype’s interface. In other words, I can set any swp := spdx_v3_0_1.MakeSoftwarePackage()
ci.CreatedBy().AppendObj(swp) // allowed
agent := swp.(spdx.Agent) // allowed The “struct-first” approach defines interfaces based on the structs, so regardless of extra properties being added from the SHACL model, the interface restricts to correct subtypes; e.g. this is a compilation error: swp := &spdx.SoftwarePackage{}
ci.CreatedBys = append(ci.CreatedBys, swp) // not allowed In this PR, a user just references strings for the constants, as in the example, &spdx.ExternalIdentifier{
ExternalIdentifierType: spdx.PresenceType_Yes,
} ValidationIn this PR, since each setter operation checks values for validity, a user must error check every operation like: if err := externalIdentifier.ExternalIdentifierType.Set(“asdf”); err != nil {
return err
} This gets fairly cumbersome, which is why you don’t find a lot of Go interfaces where setters return errors. If the code fails-fast — which this pattern promotes — implementations could easily end up providing one error at a time to a user and the user ends up playing whack-a-mole rather than knowing up front all the things they need to fix. In this PR, there are other validations that happen at other times, too, so I would opt for a single “validate” call to check an entire object graph for validity without setters returning errors. The updated model avoids setter validation altogether by allowing direct property access. Idiomatic GoMost typical data format libraries in Go use concrete structs with only the properties that represent the data values, rather than interfaces containing additional metadata about the type. In this PR, it is not really possible to instantiate these structs directly — they require setup using unexported functions, so these must be created using constructor functions and in these construction functions there is work is happening setting up property constraints and metadata, like this: validators = append(validators,
EnumValidator{[]string{
"https://spdx.org/rdf/3.0.1/terms/Dataset/DatasetType/audio",
…
"https://spdx.org/rdf/3.0.1/terms/Dataset/DatasetType/video",
}})
o.datasetDatasetType = NewListProperty[string]("datasetDatasetType", validators, decodeValidators) While using interfaces is quite common, it’s rare for data objects to be implemented with interfaces. For example, if I have code that only deals with Having concrete structs also aligns more closely with UtilitiesThe last, and maybe the most important thing for working with documents, is that I’ve included some utility methods and specialized collection types. These collections are simply typed slices, so they are easily able to be iterated over, and modified using typical go code, such as: The example usage in this PR is only creating the bare minimum required for a valid document and serializing it. What isn’t shown is working with existing documents. This is where having such a flexible structure based on inheritance can become quite tedious to work with in Go. I’d proffer that most users of SPDX are performing one type of task at time: operating on all the files or packages or some other specific set of types. Let’s take a basic example: identify all the CONTAINS relationships from files to packages. Since the RootElements type is an Element list, it’s no help figuring any of this out, we’re left iterating through everything and performing a bunch of checks; something like this: filesToPackages := map[spdx.SoftwareFile][]spdx.SoftwarePackage{}
for _, e := range doc.RootElement().Get() {
if !e.IsObj() {
continue
}
sbom, _ := e.GetObj().(spdx.SoftwareSbom)
if sbom == nil {
continue
}
for _, v := range sbom.RootElement().Get() {
if !v.IsObj() {
continue
}
relationship, _ := v.GetObj().(spdx.Relationship)
if relationship == nil {
continue
}
if relationship.RelationshipType().Get() != spdx.RelationshipTypeContains {
continue
}
if !relationship.From().IsObj() {
continue
}
fi, _ := relationship.From().Get().(spdx.SoftwareFile)
if fi == nil {
continue
}
for _, to := range relationship.To().Get() {
if !to.IsObj() {
continue
}
swp, _ := to.GetObj().(spdx.SoftwareFile)
if swp == nil {
continue
}
filesToPackages[f] = append(filesToPackages[f], p)
}
}
} By generating iterator functions for each subtype with a specialized collection, this becomes very easy: any filesToPackages := map[*spdx.SoftwareFile][]*spdx.SoftwarePackage{}
for _, v := range sbom.RootElements.RelationshipIter() {
if v.RelationshipType != spdx.RelationshipType_Contains {
continue
}
spdx.As(v.From, func(f *spdx.SoftwareFile) {
for _, p := range v.Tos.SoftwarePackageIter() {
filesToPackages[f] = append(filesToPackages[f], p)
}
})
} I also didn't want to gloss over the lack of ref types: the updated model generates an external IRI type that implements all interfaces, so it can be used in place of any reference type, e.g.:
Last thoughtsThere are a number of things mentioned here that are not exclusive to the data model I’ve presented — some things could be applied to the data model in this PR to solve some of the issues like type constraints and enum values, but overall it will be difficult to use this data model in I also didn’t touch on an important detail: how the data model gets serialized to JSON-LD. I’m much less concerned about this detail aside from feeling strongly that the serialization and validation metadata should be stored separately from the data to avoid unnecessary memory usage. I believe the idiomatic solution would involve using Go struct tags to specify field mapping information, much like the Go standard library uses them to specify field mapping information for plain JSON. Reflection in Go is excellent: it is performant and ideal for this type of thing, which is why it’s used throughout the standard lib for similar purposes. The updated data model, including validation code comes in around 6,000 lines whereas this PR is around 18,000. This isn’t a huge problem but since a user will have to ship copies of every single version of this model including every patch version, there could end up being quite a lot of code that builds up, increasing compile times, binary sizes, etc.; so smaller is better here. This is really just a nice-to-have. To touch on the package naming and versioning from one of the comments: since this library will have every data model, rather than updating a single data model over time, I don’t think using a It is my hope we can have a model here that works well for |
I'm adding an implementation example for this particular implementation as of Feb 5 2025 (in case this gets lost in the comments). I haven't looked at @kzantow's comments yet. This is specifically for the protobom folks @puerco.
This should produce the following:
I'm not sure how to test ingestion and marshaling yet:
|
@kzantow @JPEWdev Last I looked at this, the issues with Go boil down to the following:
In this implementation, there is a SHACLObject struct with its own properties that get implemented. However, it doesn't follow the typical "implement the interface, then implement the struct" pattern that is used in Element for example. That's probably why you can do:
I think implementing the meta SHACLObject and its properties makes the code amenable to be generated from the single shacl .ttl file without errors. Also, the reason I had suggested creating a new repository to host the generated code is so that helper functions and helper libraries can be created separate from this. Users will not have to look at this underlying mess and instead have cleaner functions to use in their applications. |
The SHACLObject exists so that object in the document can be added to the SHACLObjectSet, which only deals with SHACLObjects
|
This is fixed by JPEWdev/shacl2code#43 |
I think that this sounds reasonable as a framing.. IIUC that project would be a SHACL focused minimal use, and that would mainly be the validation and use of UML. The tools-golang repo would be focused on having a good user experience of the spdx 3.0 model. Similar to what we do for the 2.x models, usage of 2.x models are based on the "common struct" and users need not need to be aware of the internals for whether it is a 2.1 or 2.2 model. My only concern is creating a split of mix use of the two libraries which can result in some really weird use behavior.
I agree very much with this as the pattern for golang. However another way of perhaps allowing a struct-first declarative structure is via using the
One thought i was exploring here was trying to map it to more common golang definitions that follow inheritance type patterns. Most commonly the "inheritance" style types defined in golang mirror the OR type (which is analogous with the oneof keyword in protobuf. Thus a struct of an agent could be reflected that way:
This could be used in addition to the
|
I don't really care too much if another approach should be taken; the problem I have is that I don't really have time to write a brand new golang implementation in shacl2code at the moment; is there someone who would be willing to take that on? I can certainly provide advice on what needs to be done to get the generation integrated into shacl2code properly (or maybe the existing golang implementation I have is sufficient to demonstrate that). |
I think the work you've done here is plenty @JPEWdev ! And I think that is a good start that can build upon. Based on @kzantow 's ability to create a variant in spdx/tools-golang#249, it sounds like the generation code is in a good state that is ready for others to add contributions? Would it be fair to say that 3.0 is still early in adoption, and so we will have the opportunity to iterate. Given the several concerns brought up in this PR, perhaps could merge and released as a 0.x version so that the messaging is that there will likely be changes to the interface in the future? That way folks that are blocked can start using it (with early adopter usability caveats). |
The initial implementation of the SPDX go bindings