Distro matchers should be guided by package metadata not detected distro #86
Now that the PURLs Syft generates contain distro information, I think we could use the PURL info instead of the detected distro or the package type. @wagoodman, would an approach that uses the PURL info first, then falls back to other sources when the distro isn't explicitly set in the PURL, be reasonable? I've also seen cases where folks pull in a package from another distro version (e.g. use a RHEL 8 package in RHEL 7), so this would let us handle that case properly if/when Syft can detect it. And if a user has manually edited the Syft output to be correct (PURLs, etc.), Grype could consume it as-is.
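A minimal sketch of the "PURL first, then fall back" order, written in Python for brevity rather than Grype's Go. It assumes the distro is carried in a `distro` qualifier (as in Syft's PURLs) and that the fallback is the distro recorded at the SBOM level; `distro_qualifier` and `resolve_distro` are hypothetical names, not Grype APIs.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def distro_qualifier(purl: str) -> Optional[str]:
    """Return the value of the `distro` qualifier in a PURL, if present."""
    # urlsplit treats "pkg" as the scheme; the qualifiers after "?" become the query string.
    query = urlsplit(purl).query
    values = parse_qs(query).get("distro")
    return values[0] if values else None


def resolve_distro(purl: str, sbom_distro: Optional[str]) -> Optional[str]:
    """Prefer the distro stated in the package's own PURL; otherwise use the SBOM-level distro."""
    from_purl = distro_qualifier(purl)
    return from_purl if from_purl else sbom_distro
```

With this order, a RHEL 8 package copied into a RHEL 7 image would be matched against RHEL 8 data as long as its PURL says `distro=rhel-8`, while packages whose PURLs carry no distro keep today's behavior.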
Agreed, this should definitely be possible now:
We should be able to make a distro object from the package. I'll pick this up, since it seems like it will help match quality.
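One way to turn the qualifier into a distro object, assuming the `<id>-<version>` shape Syft emits (e.g. `distro=debian-12`) and treating a value with no hyphen (like the bare `distro=debian` in the output quoted below) as an ID with no version. The `Distro` class and `parse_distro` are hypothetical; splitting on the *last* hyphen is an assumption chosen so that IDs which themselves contain hyphens keep their full name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Distro:
    id: str
    version: Optional[str]  # None when the qualifier carries only an ID


def parse_distro(qualifier: str) -> Distro:
    """Split a `distro` qualifier value such as "debian-12" into ID and version."""
    id_part, sep, version = qualifier.rpartition("-")
    if not sep:
        # No hyphen at all, e.g. "debian": an ID with no version information.
        return Distro(id=qualifier, version=None)
    return Distro(id=id_part, version=version)
```

One caveat of this simple split: an ID that contains a hyphen but carries no version would be mis-split, so a real implementation would likely need a list of known distro IDs.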
I have a couple concerns here with the linked PR before I am ready to merge it:
Syft commands showing the information lost between the `distro` node in the SBOM and the `distro` key in the PURL:

```
❯ syft -q -o json debian:unstable-slim | jq '.artifacts[] | .purl'
"pkg:deb/debian/…@…?arch=arm64&distro=debian"
❯ syft -q -o json debian:unstable-slim | jq '.distro'
{
  "prettyName": "Debian GNU/Linux trixie/sid",
  "name": "Debian GNU/Linux",
  "id": "debian",
  "versionCodename": "trixie",
  "homeURL": "https://www.debian.org/",
  "supportURL": "https://www.debian.org/support",
  "bugReportURL": "https://bugs.debian.org/"
}
```
I think it's the right approach, and I don't think you'd even need to log a warning, tbh. If a package was built for a specific distro, it would have the vulnerabilities of that package on the distro it was meant for, and you'd want to use that distro's CVE database to get the severity. I think the biggest application of this is scanning images where packages are copied from one layer to the next, likely with the final image being a scratch image. One other way of approaching this (if the PURL doesn't always have the correct information) is to fall back to the distribution that was detected on the layer where the package was found. I would expect this to give more accurate results than using the final layer's detected distro for all found packages.
In cases where there is a multi-stage build this could be useful:
We don't have visibility in this case at all, since the image being analyzed only has one layer, the final scratch stage. There isn't much we can do here. OS package managers don't really leave behind per-package information about which distro a package was sourced from (anyone: please correct me if I'm wrong and point out where to look for this!). There is a pseudo-related issue, anchore/syft#435, which talks about tracking all of the layers within an image that reference a given package (which is different). Syft and Grype allow you to track all of the layer references for a package with …
Thanks for the response @wagoodman, and sorry for the late reply (I saw this when you posted it but forgot to respond).
Right, that is a big "problem"; I actually hadn't realised that was the case for multi-stage builds (which I use). I suppose the only way to get distribution information from a multi-stage scratch build would be if the Dockerfile owner explicitly added a distribution hint to the Dockerfile. Would you agree, @wagoodman? This layer-tracking approach could still be valuable for multi-layered images, though, where the packages might not be installed through the package manager in that layer.
I'm wondering if what I was previously thinking even makes sense for images that aren't multi-stage (just regular multi-layered ones). You wouldn't really need to track the distribution of each layer and associate it with the packages on that layer; I don't think it's easy (or even possible...) to have a "multi-distro" build without using multi-stage builds. The distro isn't going to change between these layers (I could be wrong, though...), so all we'd really need is for the distro detection logic to look for a distribution hint in previous layers if it can't find one in the current layer.
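For single-stage, multi-layer images, looking back through earlier layers when the current one has no distro hint could work roughly like this. Layers are modeled as a base-to-top list where each entry is the distro detected in that layer alone, or `None`; this representation and `distro_for_layer` are illustrative assumptions, not Syft's layer model. As noted above, it cannot help a scratch-based multi-stage image, which has no earlier layers to inspect.

```python
from typing import Optional, Sequence


def distro_for_layer(layer_distros: Sequence[Optional[str]], index: int) -> Optional[str]:
    """Return the distro for layer `index`, searching earlier layers if it has no hint itself.

    `layer_distros` is ordered from the base layer (0) to the top layer; each entry is
    the distro detected in that layer alone, or None when nothing was detected there.
    """
    if not 0 <= index < len(layer_distros):
        raise IndexError(f"layer index {index} out of range")
    # Walk from the requested layer down toward the base, returning the first hint found.
    for i in range(index, -1, -1):
        if layer_distros[i] is not None:
            return layer_distros[i]
    return None
```

Searching only downward (toward the base) matches the assumption that later layers inherit the distro of the layers beneath them.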
👋 Great discussion so far! I've got a couple of use cases for this capability that may help guide decisions here. Apologies if these duplicate some of the discussion above (I think there is some overlap with the "local build" discussion):
For the "what to do with partial distro information" case, there are some interesting options IMO, depending on which use case the user is trying to achieve, since that may affect why the information is partial. Because of that, configurable behavior seems best. The options I can identify thus far (open to suggestions!) are:
What do you all think?
I think it would be preferable to fall back to the SBOM distro in these cases; otherwise we might get a lot of false positives, for example from assuming that the package is the system perl that shipped with super old Debian or something. Also, as part of this work, we should update Syft to include the Debian version number in the PURLs for debian packages it finds; that would be a better fix, but we still need to handle the case where there's a partial distro in the PURL.
This is an interesting idea. One other concern: do other SBOM tools put distros in PURLs at all? We don't want Grype to assume too often that its SBOM came from Syft.
I agree that should be the default, but there are use cases where the FPs are a tradeoff a user may want to make (e.g. they don't know which version of Debian a package came from, so showing all of them lets them see the full surface). It's not a common case, but I think the tool could handle it with explicit configuration from the user indicating they want to make that tradeoff. This is for the case where a user gets an SBOM they didn't create and/or that wasn't created by a single tool.
Agreed fully.
Agreed that we shouldn't assume Syft's semantics for a specific field, but we should make it clear to a user which fields in the SBOM are used and how, and enable them to get the matching behavior they want if they craft an SBOM in a specific way to match their security process or intended scope. That's why I'm OK with reducing code complexity by pushing these decisions to configuration, so the user can tell Grype how they want it to behave in ambiguous cases.
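The two behaviors discussed for a PURL that names a distro but no version could be expressed as a configuration switch: fall back to the SBOM-level distro by default, or deliberately match every known version of that distro and accept the extra false positives. `PartialDistroPolicy`, `versions_to_search`, and the `known_versions` input are hypothetical names for illustration, not existing Grype options, and the sketch assumes the SBOM-level distro has the same ID as the one in the PURL.

```python
from enum import Enum
from typing import List, Optional, Sequence


class PartialDistroPolicy(Enum):
    FALLBACK_TO_SBOM = "fallback-to-sbom"      # default: trust the SBOM-level distro
    MATCH_ALL_VERSIONS = "match-all-versions"  # opt-in: wider surface, more false positives


def versions_to_search(
    purl_version: Optional[str],
    sbom_version: Optional[str],
    known_versions: Sequence[str],
    policy: PartialDistroPolicy = PartialDistroPolicy.FALLBACK_TO_SBOM,
) -> List[str]:
    """Decide which distro versions' vulnerability data to match a package against.

    `purl_version` is the version from the package's PURL (None if the PURL gives only
    the distro ID); `known_versions` lists every version with data for that distro ID.
    """
    if purl_version is not None:
        return [purl_version]  # complete information in the PURL always wins
    if policy is PartialDistroPolicy.MATCH_ALL_VERSIONS:
        return list(known_versions)
    # Default policy: use the SBOM-level version, or nothing if that is missing too.
    return [sbom_version] if sbom_version is not None else []
```

Keeping the decision in a single policy value is one way to get the "push it to configuration" outcome without spreading special cases across each matcher.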
I did some more experimenting, and it looks like syft includes the Debian version except for trixie/sid/unstable:
I met with @wagoodman, and I think we can move forward with the Grype work if we change Syft to include the distro codename in the PURL when there's no version ID. That leaves us with the following changes:
Item 2 will be implemented by fixing up #1530. Item 1 needs a separate change to Syft. Does that sound good to everyone, @zhill and @wagoodman?
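The Syft-side change agreed here (emit the codename when there is no version ID) amounts to choosing the qualifier value roughly like this. It is a sketch only: `distro_qualifier_value` is a hypothetical helper, the `id` and `versionCodename` keys follow the `jq '.distro'` output quoted earlier in the thread, `versionID` is an assumed key for the numeric version, and the `<id>-<version>` format is assumed from Syft's existing PURLs.

```python
from typing import Mapping, Optional


def distro_qualifier_value(release: Mapping[str, str]) -> Optional[str]:
    """Build a PURL `distro` qualifier value from an os-release-like mapping.

    Prefers the numeric version ID; falls back to the codename (e.g. Debian
    trixie/sid, which has no version ID) so the PURL still pins a release.
    """
    distro_id = release.get("id")
    if not distro_id:
        return None
    version = release.get("versionID") or release.get("versionCodename")
    return f"{distro_id}-{version}" if version else distro_id
```

For the `debian:unstable-slim` output above, this would yield `distro=debian-trixie` instead of the bare `distro=debian`.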
Thanks @willmurphyscode, I got distracted by other stuff and didn't get back to this for a while. The plan sounds good, and it opens the door to letting the PURL describe other package sources even for the same type, such as an RPM from Google installed into a CentOS image. That could be detected and an accurate PURL created, so we also get some future-proofing here as the SBOM side evolves. Thanks!
This will be much easier to do as part of #2128, and requires that Syft put some distro version info (even if only a codename) in the PURL even if the version is not available. I'm putting this back in the backlog and adding it to the schema v6 milestone.
From a codename perspective, we now have the information needed to search by codename in schema v6. Here's a prototype of the OS table:
In an upcoming PR there will be a string-to-OS resolver helper function.
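The OS table prototype itself is not reproduced here, but a string-to-OS resolver that accepts either a version or a codename could behave roughly as follows. The `OS` rows, their fields, and `resolve_os` are hypothetical stand-ins, not the schema v6 table or the helper from the upcoming PR.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OS:
    name: str
    version: str
    codename: Optional[str] = None


def resolve_os(value: str, table: Sequence[OS]) -> Optional[OS]:
    """Resolve strings like "debian-12" or "debian-trixie" to a row in `table`.

    The part after the last hyphen is compared first against the version and then
    against the codename (case-insensitively), so a codename-only PURL still resolves.
    """
    name, sep, suffix = value.lower().rpartition("-")
    if not sep:
        return None  # only an ID, no version or codename to pin a release
    for row in table:
        if row.name == name and row.version == suffix:
            return row
    for row in table:
        if row.name == name and row.codename is not None and row.codename.lower() == suffix:
            return row
    return None
```

Trying versions before codenames keeps the common numeric case unambiguous while still covering releases like Debian testing that only have a codename.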
Currently we use the detected distro to guide the rpm, deb, and apk matchers to find vulnerabilities. This is functional; however, it would be more accurate to use the package type (rpm, deb, apk) to select the vulnerability namespace rather than the detected distro (redhat:8, ubuntu:20, alpine:3.12).
Problem: we don't know the distro version from the package type, so it is not possible to select the "correct" vulnerability namespace. This is worth thinking about nonetheless.
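The problem can be made concrete: the package type narrows the set of candidate distros, and a distro ID (e.g. from the PURL namespace) narrows it further, but without a version there is no single namespace to pick. This sketch assumes the `<distro>:<version>` namespace shape from the examples above; the type-to-distro table, the `:*` "any version" notation, and `namespaces_for` are illustrative assumptions, not Grype's actual namespace logic.

```python
from typing import List, Optional

# Which distro IDs can ship each package type (illustrative, not exhaustive).
CANDIDATE_DISTROS = {
    "apk": ["alpine"],
    "deb": ["debian", "ubuntu"],
    "rpm": ["redhat", "centos", "amazonlinux"],
}


def namespaces_for(pkg_type: str, distro_id: Optional[str], version: Optional[str]) -> List[str]:
    """List the vulnerability namespaces a package could be matched against."""
    candidates = CANDIDATE_DISTROS.get(pkg_type, [])
    if distro_id is not None:
        candidates = [d for d in candidates if d == distro_id]
    if version is None:
        # Without a version the namespace can't be pinned; "*" marks "any version".
        return [f"{d}:*" for d in candidates]
    return [f"{d}:{version}" for d in candidates]
```

Only the combination of type, distro ID, and version yields exactly one namespace, which is why the discussion above focuses on getting the version (or codename) into the PURL.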