What is a package? #257
Comments
Related: #161 |
Anything below that feels deterministic is not authoritative, just my opinion.
Very likely. The question is, is that a contract with
Possibly. It depends on
Possibly. Depends on the ecosystem. As you say, it's up to the end user to know what to do with that information. For example, are you stripping the qualifiers from the purl string and trying to use them to compare hashes of files on disks? That would not be right, but purl doesn't know that's what you wanted to do with the string. If you're "just" trying to render a bunch of logos for software that you use in your application, the qualifier might not be as important. And purl wouldn't know that's what you wanted to do with the string.
Possibly, depending on ecosystem conventions.
You're making two points here. I disagree that
Possibly. The more I read what you're writing I think it makes sense to clarify how a qualifier might be redundant for identification purposes. Are some qualifiers helpful for disambiguation? Are there other qualifiers helpful for verification? (I generally agree, although I haven't thought of it as a big problem, that you shouldn't use the checksum qualifier to actually verify package integrity)
I don't see it this way. I actually don't think it refers to any files, but to the concept of a package. There are remarkably few mentions of "files" in the spec or the types; some, like in I don't think we've done a great job at explaining
Realistically it'll be reported against the first one; someone will need to read the CVE description to understand if this applies to sources or to binaries. I don't understand Maven or Gradle well enough to weigh in on the Gradle key-value modifiers, but you might be onto something. Would love to hear more thoughts. And sorry if it wasn't clear from the rest of my comment, yes, I don't think we define package well. Some of that might be by design, but I think there's more we can do. |
@big-guy Security tools should avoid assumptions.
These must be considered different.
They are different. An example of a qualifier is arch, which could be amd64 or arm64 representing different packages.
They are different packages, but the same vulnerability must be applicable for both the source and compiled form unless the vulnerability is only due to the compiler settings or runtime used. In such cases, the vulnerability must be assigned to the compiler or the runtime. |
@big-guy Thank you ++ for this detailed set of questions and comments! FWIW, I enabled "discussions" at #260 as we may want to use this in the future! Here are some answers:
yes.
Whether you include a version or not in a PURL is a choice in a given context, and has nothing to do with the package type IMHO. Where you could have no version and be able to locate a single package? Say I want to talk about a package in general, I could use And for version ranges, this is going to be separate. See #139 |
@big-guy re: Qualifiers
This is a different package in most cases (but say if you add a checksum this may not change much). But where it matters depends on the context. A vulnerability may affect all the architectures of a Debian package (e.g., all qualifiers) or just when on arm arch. When it comes to a vulnerability database, it may prefer to enumerate all the affected arches at all times or only track them when needed. And I see your point as this could be clarified alright. |
@bureado all your comments in #257 (comment) are right to the point! Thank you ++ @prabhu same for #257 (comment) 👍 Thanks! @big-guy re:
There are cases where you may want to make this distinction. Say with we have this case where the sources and binaries differ, for instance the binary is really an uberjar with "shaded" contents of a vulnerable log4j. Here the source may not be vulnerable, but the binary may be? |
@big-guy re: well-known qualifiers
Good point, yet I do not see this as an error per se. Maybe this is instead something we should recommend that tools normalize and simplify? This is redundant for sure! |
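To make the "normalize and simplify" idea concrete, here is a rough sketch of what a tool could do: drop a repository_url qualifier when it only restates the default registry for the type. The function name and the DEFAULT_REGISTRIES table are made up for illustration; the URLs in it are my reading of the defaults listed in PURL-TYPES and should be checked against the spec before relying on them. The string handling is deliberately simplified and is not a full PURL parser.

```python
from urllib.parse import unquote

# Assumption: default registries per type, as I understand PURL-TYPES.
# Verify against the spec; this table is illustrative, not authoritative.
DEFAULT_REGISTRIES = {
    "maven": "https://repo.maven.apache.org/maven2",
    "npm": "https://registry.npmjs.org",
    "pypi": "https://pypi.org",
}


def drop_default_repository_url(purl: str) -> str:
    """Remove repository_url when it equals the type's default registry."""
    head, hash_sep, subpath = purl.partition("#")
    base, q_sep, query = head.partition("?")
    if not purl.startswith("pkg:") or not q_sep:
        return purl  # nothing to normalize

    purl_type = base[len("pkg:"):].split("/", 1)[0].lower()
    default = DEFAULT_REGISTRIES.get(purl_type)

    kept = []
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        # Compare the decoded value, ignoring a trailing slash.
        if (
            key == "repository_url"
            and default is not None
            and unquote(value).rstrip("/") == default
        ):
            continue  # redundant: restates the default registry
        kept.append(pair)  # keep other qualifiers byte-for-byte

    result = base + ("?" + "&".join(kept) if kept else "")
    return result + hash_sep + subpath


print(drop_default_repository_url(
    "pkg:maven/org.example/lib@1.0.0"
    "?classifier=sources&repository_url=https://repo.maven.apache.org/maven2"
))
# prints: pkg:maven/org.example/lib@1.0.0?classifier=sources
```

A repository_url that points anywhere else is left untouched, so this only removes information that is redundant under the stated assumption.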
@big-guy re: files
This is already specified for this case, as Maven calls this "type" and "classifier": https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#maven ... I personally think that the way Maven handles this is a little contrived, but this is likely because of "frozen accidents" in its history, kept for backward compatibility? There is a complex matrix to consider. PURL is not trying to change the world, just modestly to make the common case easy and obvious to handle and to accommodate the more complex cases (but maybe not as obviously).
Here is a possible analogy that may not be too shabby! Say the PURL spec is like the spec for an address book of people and places. 🧑🤝🧑 🏙️ Each package type is like a country or state and defines how you can identify and locate a place reasonably uniquely. Uniquely enough that the post can deliver the mail. In a city with well defined streets and street numbers, you get a precise location with the street name and number and maybe an apartment number. In some cases you may want the address for a single person with their name, or for the whole household. If someone is off the grid in the bayou or on some isolated mountain, crafting a proper address may be more hairy and fuzzy. Worst case I may need GPS coordinates for these edge cases. I may also have many different ways to write an address or a name. Heck, some folks also live in orbit on the ISS and GPS will not work there!
I think the same applies to software. I wish everything was well organized and tidy, but we have to deal with a lot of warts and weirdness! |
@big-guy oops! just reading what I wrote and I missed a point :]
How would |
@big-guy so finally about your questions:
It depends on the context and what's been baked into the PURL, the same way a physical address may point to a country (type), city (namespace), street (name), building number (~ version :]) or a room or a person (qualifiers, subpaths or name) and everything in between. (I guess the address analogy breaks down and wears out quickly.) |
I think the spec has to clarify
If there's a PURL in a vulnerability report and I have a PURL in my SBOM, when should I react, when do they match? Like above, if there's no arch in the vulnerability report - does it match ALL architectures then? |
@oej,
If I were to implement a VA tool, then yes, I would take no explicit arch as a need to match all arches. |
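As a sketch of that rule, and only under the assumption that a component missing from the advisory PURL means "any value", a matcher could look like the following. split_purl and advisory_matches are hypothetical helpers written for this comment, the parsing is simplified (not spec-complete), and the example PURLs are illustrative.

```python
from urllib.parse import parse_qsl, unquote


def split_purl(purl: str) -> dict:
    """Split a PURL into components. Simplified, not a spec-complete parser."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    rest = purl[len("pkg:"):]
    rest, _, _subpath = rest.partition("#")  # subpath ignored in this sketch
    rest, _, query = rest.partition("?")
    version = None
    if "@" in rest:
        rest, _, version = rest.rpartition("@")
        version = unquote(version)
    segments = [unquote(s) for s in rest.strip("/").split("/")]
    if len(segments) < 2:
        raise ValueError(f"PURL needs at least a type and a name: {purl!r}")
    return {
        "type": segments[0].lower(),
        "namespace": "/".join(segments[1:-1]),
        "name": segments[-1],
        "version": version,
        "qualifiers": dict(parse_qsl(query)),
    }


def advisory_matches(advisory: str, component: str) -> bool:
    """Assumed rule: anything the advisory leaves out matches every value."""
    a, c = split_purl(advisory), split_purl(component)
    if (a["type"], a["namespace"], a["name"]) != (c["type"], c["namespace"], c["name"]):
        return False
    if a["version"] is not None and a["version"] != c["version"]:
        return False
    # Every qualifier the advisory states must be present and equal on the
    # component; qualifiers only the component has are ignored.
    return all(c["qualifiers"].get(k) == v for k, v in a["qualifiers"].items())


# Advisory without arch matches a component on any arch.
print(advisory_matches("pkg:deb/debian/curl@7.50.3-1",
                       "pkg:deb/debian/curl@7.50.3-1?arch=arm64"))  # True
# Advisory pinned to arm64 does not match an amd64 component.
print(advisory_matches("pkg:deb/debian/curl@7.50.3-1?arch=arm64",
                       "pkg:deb/debian/curl@7.50.3-1?arch=amd64"))  # False
```

Note the match is one-directional: an advisory that names an arch does not match a component that omits it. Whether that is the right default is exactly the kind of thing the spec would need to state.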
I'm sorry if this has been discussed somewhere else. I looked through the PURL spec and the answer isn't explicitly stated. #242 has similar vibes.
What is a package?
hierarchy
The spec describes the components of a PURL as a hierarchy.
If you had these two PURLs, would it make sense to call them the same package, but one of them is more specific?
pkg:example/org/mypkg
pkg:example/org/[email protected]
IOW, does pkg:example/org/mypkg generically refer to all versions of the package?
I think this falls into something per-ecosystem because only the name is required universally. So for some ecosystems, you may never have a version or you must always have a version. If an ecosystem mixed these two notations, it would be confusing.
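One way to picture the "same package, but more specific" reading is to list the progressively less specific forms of a PURL by stripping components from the right. This is only a sketch of that interpretation, not something the spec defines; the function name generalizations and the example PURL (including its version, qualifier and subpath) are made up for illustration, and it assumes any literal @, ? or # inside components is percent-encoded.

```python
def generalizations(purl: str) -> list[str]:
    """Return the PURL followed by progressively less specific forms.

    Assumed order of specificity (right to left): subpath, qualifiers, version.
    """
    forms = [purl]

    without_subpath, sep, _ = purl.partition("#")
    if sep:
        forms.append(without_subpath)

    without_qualifiers, sep, _ = without_subpath.partition("?")
    if sep:
        forms.append(without_qualifiers)

    # rpartition returns an empty separator when there is no "@".
    without_version, sep, _ = without_qualifiers.rpartition("@")
    if sep:
        forms.append(without_version)

    return forms


for form in generalizations("pkg:example/org/mypkg@1.0.0?key=value#docs"):
    print(form)
# pkg:example/org/mypkg@1.0.0?key=value#docs
# pkg:example/org/mypkg@1.0.0?key=value
# pkg:example/org/mypkg@1.0.0
# pkg:example/org/mypkg
```

Under this reading, the versionless form sits at the top of the chain and every versioned or qualified form is a refinement of it.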
qualifiers
Let's say this is a package:
pkg:example/org/[email protected]
But does this represent a different package?
pkg:example/org/[email protected]?key=value
I can see arguments for both.
Yes - the qualifiers identify a specific set of files and different meta information could be appropriate (different CVEs, different licenses, different dependencies, etc).
No - the qualifiers are optional and a package is more than just a single set of files. It's the collection of all things from all qualifiers.
If we look at regular URLs, URLs with different query parameters are treated as separate URLs, but the query parameter might not affect the semantic content of what's available at the URL. e.g., a page with ?sort=ascending could be identical to one without it (if ascending is the default).
I think this might be a per-ecosystem decision, but it makes talking about what a PURL points to a little harder.
well-known qualifiers
The spec describes a few well-known qualifiers for all package types. There's a warning to keep the use of qualifiers to a bare minimum for "package identification".
Like above, if this is a package:
pkg:example/org/[email protected]
Is this a different package?
pkg:example/org/[email protected]?repository_url=https://example.com
If not, why would this information be useful?
My thinking is that these could be different packages, but tooling should treat them as potentially the same package in one direction. That means if there's a known CVE against pkg:example/org/[email protected], it should also be assumed to be against pkg:example/org/[email protected]?repository_url=https://example.com, but not the other way around.
Related to this, what if the repository_url happens to point to the default package registry? I think this could be considered an error.
Is this a different package?
pkg:example/org/[email protected]?download_url=https://example.com/mypkg.zip
I think no, but this means there would always be exceptions to qualifiers being meaningful to package identification.
vcs_url and file_name feel similar to download_url. These shouldn't be considered part of the identification.
checksum seems similar too, except if you did try to consider it part of the package identification, would the lack of certain checksum values also be considered "the same"?
multiple files
Let's say that when resolving the files for package pkg:example/org/[email protected], we actually do multiple things:
pkg:example/org/[email protected]
in the registry
This means that pkg:example/org/[email protected] potentially refers to several files.
It seems like the PURL spec is written with the idea that the PURL refers to a single file (see the well known qualifiers above), but this isn't universally the case for all ecosystems.
other
I think the questions for qualifiers may apply to #subpath too.
why?
This all came up because I've been looking at how Maven and Gradle would use PURLs to describe things internally or in reports.
If this is a package in Maven, pkg:maven/org.apache.xmlgraphics/[email protected], and pkg:maven/org.apache.xmlgraphics/[email protected]?classifier=sources refers to the sources of that package, I could see theoretically that different CVEs could be reported against each of these, but I'm not sure it ever makes sense to do that.
In Gradle, you could follow a similar convention where the group-name-version coordinates map to the same location in a Maven repository, but Gradle could publish many different files there that are selected by something other than group-name-version.
If PURLs need to designate a particular file that was used, this means Gradle would need to encode more information:
pkg:gradle/org.apache.xmlgraphics/[email protected]?org.gradle.libraryelements=jar,...
where ... is a list of a dozen or more key-values that Gradle considered when selecting the file.
The key-values are used by Gradle to select between different variants of the same thing. Compare this to other package managers that have different artifacts for separate architectures or OSes. If Gradle doesn't include that information, the PURL is what Maven has: pkg:maven/org.apache.xmlgraphics/[email protected]. If PURLs are used by other tooling to associate CVEs against packages, then all CVEs apply to all variants.
There are other complications here (e.g., some key-values can be considered equivalent even if they have different values), but I thought I'd start with the simpler question:
what is a package?