feat: Blog posts for OSSF Scorecard launch (#363)
Signed-off-by: John McBride <[email protected]>
jpmcb authored Aug 6, 2024
1 parent e5a505b commit 4226584
Showing 6 changed files with 363 additions and 0 deletions.
128 changes: 128 additions & 0 deletions blog/2024/2024-08-06-introducing-ossf-scorecard.md
@@ -0,0 +1,128 @@
---
title: "Introducing OpenSSF Scorecard for OpenSauced"
tags: ["open source security foundation", "openssf", "openssf scorecard", "open source", "open source compliance", "open source security"]
authors: jpmcb
slug: introducing-ossf-scorecard
description: "Learn how OpenSauced integrates OpenSSF Scorecard to enhance open source security and compliance."
---

In September of 2022, the European Commission proposed the [“Cyber Resilience Act”](https://digital-strategy.ec.europa.eu/en/policies/cyber-resilience-act),
commonly called the CRA: a new piece of legislation that requires anyone providing
digital products in the EU to meet certain security and compliance requirements.

<!-- truncate -->

But there’s a catch: before the CRA, companies providing or distributing software
often took on much of the risk of ensuring that safe and reliable software
was shipped to end users. Now, software maintainers further down the supply
chain will have to carry more of that weight. Not only may some open source
maintainers need to meet certain requirements, but they may also have to provide an
up-to-date security profile of their project.

[As the Linux Foundation puts it](https://www.linuxfoundation.org/blog/understanding-the-cyber-resilience-act):

> The Act shifts much of the security burden onto those who develop software,
as opposed to the users of software. This can be justified by two assumptions:
first, software developers know best how to mitigate vulnerabilities and distribute
patches; and second, it’s easier to mitigate vulnerabilities at the source than
requiring users to do so.

There’s a lot to unpack in the CRA, and it’s still not clear how individual open
source projects, maintainers, foundations, or companies will be directly impacted.
But it is clear that the broader open source ecosystem needs easier ways to understand
the security risk of projects deep within dependency chains. With all that in mind,
we are very excited to introduce the OpenSSF Scorecard ratings within the OpenSauced
platform.

## What is the OpenSSF Scorecard?

The OpenSSF is [the Open Source Security Foundation](https://openssf.org/): a multidisciplinary group of
software developers, industry leaders, security professionals, researchers, and
government liaisons. The OpenSSF aims to enable the broader open source ecosystem
“to secure open source software for the greater public good.” They interface with
critical personnel across the software industry to fight for a safer technological
future.

[The OpenSSF Scorecard project](https://github.com/ossf/scorecard) is an effort
to unify the best practices that open source maintainers and consumers should use to
judge whether their code, practices, and dependencies are safe. Ultimately, the “scorecard”
command line interface gives anyone the capability to inspect repositories, run “checks”
against those repos, and derive an overall score for the risk profile of that project.
It’s a very powerful tool that gives you a general picture of where a piece
of software is considered risky. It can also be a great starting point for any open
source maintainer to develop better practices and find out where they may need to
make improvements. By providing a standardized approach to assessing open source
security and compliance, the Scorecard helps organizations more easily identify
supply chain risks and regulatory requirements.

## OpenSauced OpenSSF Scorecards

Using the scorecard command line interface as a cornerstone, we’ve built infrastructure
and tooling to enable OpenSauced to capture scores for nearly all repositories on
GitHub. Scores run from 0 to 10. Anything over a 6 or a 7 is generally considered
safe to use with no glaring issues. Projects scoring 9 or 10 are doing phenomenally
well. And projects with lower scores should be inspected closely to understand what’s
gone wrong.

Scorecards are enabled across all repositories. With this integration, we aim to
make it easier for software maintainers to understand the security posture of their
project and for software consumers to be assured that their dependencies are safe
to use.

Starting today, you can see the score for any project within individual [Repository Pages](https://opensauced.pizza/docs/features/repo-pages/).
For example, in [kubernetes/kubernetes](https://app.opensauced.pizza/s/kubernetes/kubernetes),
we can see the project is safe for use:

![Kubernetes Scorecard](../../static/img/kubernetes-scorecard.png)

Let’s look at another example: [crossplane/crossplane](https://app.opensauced.pizza/s/crossplane/crossplane).
These maintainers are doing an awesome job of ensuring they are following best
practices for open source security and compliance!

![Crossplane Scorecard](../../static/img/crossplane-scorecard.png)

The checks that the OpenSSF Scorecard runs cover a wide range of common
open source security practices, both “in code” and in the maintenance of the
project: e.g. whether code review best practices are followed, whether “dangerous
workflows” are present (like untrusted code being checked out and run during CI/CD runs),
whether the project is actively maintained, whether releases are signed, and many more.

## The Future of OpenSSF Scorecards at OpenSauced

We plan to bring the OpenSSF Scorecard to more of the OpenSauced platform, as we
aim to be the definitive place for open source security and compliance for maintainers
and consumers. As part of that, we’ll be bringing more details to the OpenSSF Scorecard
with how individual checks are ranked:

![Future Scorecard](../../static/img/future-scorecard.png)

We’ll also be bringing OpenSSF Scorecard to our premium offering, [Workspaces](https://opensauced.pizza/docs/features/workspaces/):

![Bottlerocket Scorecard Workspace](../../static/img/future-scorecard-workspaces.png)

Within a Workspace, you’ll soon be able to see how the projects you are tracking
stack up against each other on open source security and compliance. You can use
the OpenSSF Score together with all the Workspace insights and metrics, in one
single dashboard, to get a good idea of what’s happening within a set of repositories
and what their security posture is. In this example, I’m tracking all the repositories
within the bottlerocket-os org on GitHub, a security-focused, Linux-based operating
system. I can see that each of the repositories has a good rating, which gives me
greater confidence in the maintenance status and security posture of this ecosystem.
This also gives stakeholders and maintainers of Bottlerocket a bird’s-eye snapshot
of the compliance and maintenance status of the entire org.

As the CRA and similar regulations push more of the security burden onto developers,
tools like the OpenSSF Scorecard become invaluable. They offer a standardized, accessible
way to assess and improve the security of open source projects, helping maintainers
meet new compliance requirements and giving software consumers confidence in their
choices.

Looking ahead, we're committed to expanding these capabilities at OpenSauced. By
providing comprehensive security insights, from individual repository scores to
organization-wide overviews in Workspaces, we're working to create a more secure
and transparent open source ecosystem. We want anyone in the open source community
to be able to better understand their software dependencies and feel empowered to
make a meaningful change if needed, and we want to give open source maintainers
helpful tools to better maintain their projects.

Stay saucy!
235 changes: 235 additions & 0 deletions blog/2024/2024-08-08-ossf-scorecard-technical-deep-dive.md
@@ -0,0 +1,235 @@
---
title: "Using Kubernetes jobs to scale OpenSSF Scorecard"
tags: ["open source security foundation", "openssf", "openssf scorecard", "open source", "open source compliance", "open source security", "kubernetes", "kubernetes jobs"]
authors: jpmcb
slug: ossf-scorecard-technical-deep-dive
description: "Learn how OpenSauced uses Kubernetes to scale the OpenSSF Scorecard."
unlisted: true
---

We recently released integrations with the [OpenSSF Scorecard on the OpenSauced platform](https://opensauced.pizza/blog/introducing-ossf-scorecard).
The OpenSSF Scorecard is a powerful Go command line interface that anyone can use
to begin understanding the security of their projects and dependencies. It runs
several checks for dangerous workflows, CI/CD best practices, whether the project is
still maintained, and much more. This enables software builders and consumers to
understand their security posture, deduce whether a project is safe to use, and see
where improvements to security practices need to be made.

<!-- truncate -->

But one of our goals with integrating the OpenSSF Scorecard into the OpenSauced
platform was to make this available to the broader open source ecosystem.
If it’s a repository on GitHub, we wanted to be able to display a score for it.
This meant scaling the Scorecard CLI to target nearly any repository on GitHub.
Much easier said than done!

In this blog post, let’s dive into how we did that using Kubernetes and what technical
decisions we made with implementing this integration.

## Technical decisions

We knew that we would need to build a cron-type microservice that would frequently
update scores across a myriad of repositories: the true question was how we would
do that. It wouldn't make sense to run the scorecard CLI ad hoc: the platform could
too easily get overwhelmed, and we wanted to be able to do deeper analysis on scores
across the open source ecosystem, even if the OpenSauced repo page hasn’t been
visited recently. Initially, we looked at using the Scorecard Go library as a direct
code dependency and running scorecard checks within a single, monolithic microservice.
We also considered using serverless jobs to run one-off scorecard containers that
would give back the results for individual repositories.

The approach we ended up landing on, which marries simplicity, flexibility, and
power, is to use Kubernetes Jobs at scale, all managed by a “scheduler” Kubernetes
controller microservice. Instead of building a deeper code integration with scorecard,
running one-off Kubernetes Jobs gives us the same benefits as a serverless approach,
but with reduced cost since we’re managing it all directly on our Kubernetes cluster.
Jobs also offer a lot of flexibility in how they run: they can have long, extended
timeouts, they can use disk, and like any other Kubernetes paradigm, they can have
multiple pods doing different tasks.

Let’s break down the individual components of this system and see how they work
in depth:

## Building the Kubernetes controller

The first and biggest part of this system is the “scorecard-k8s-scheduler”: a Kubernetes
controller-like microservice that kicks off new jobs on-cluster. While this microservice
follows many of the principles, patterns, and methods used when building a traditional
Kubernetes controller or operator, it does not watch for or mutate custom resources
on the cluster. Its function is simply to kick off Kubernetes Jobs that run the Scorecard
CLI and to gather the finished job results.

Let’s look first at the main control loop in the Go code. This microservice uses
the Kubernetes Client-Go library to interface directly with the cluster the microservice
is running on: this is often referred to as an in-cluster config and client. Within
the code, after bootstrapping an in-cluster config and client, we poll for repositories
in our database that need updating. Once some repos are found, we kick off Kubernetes
jobs on individual worker “threads” (goroutines) that wait for each job to finish.

```go
// buffered channel, used like a counting semaphore, to bound concurrent workers
sem := make(chan bool, numConcurrentJobs)

// continuous control loop
for {
	// blocks while the buffered channel is full, i.e. while
	// numConcurrentJobs workers are already running
	sem <- true

	go func() {
		// release the hold on the channel for this goroutine when done
		defer func() {
			<-sem
		}()

		// grab repo needing update, start scorecard Kubernetes Job on-cluster,
		// wait for results, etc. etc.

		// sleep the configured amount of time to relieve backpressure
		time.Sleep(backoff)
	}()
}
```

This “infinite control loop” method, with a buffered channel, is a common way in
Go to do something continuously while using only a configured number of goroutines.
The “numConcurrentJobs” value sets the capacity of the buffered channel, which then
acts as a worker pool or counting semaphore: a resource, much like a mutex, that
multiple goroutines attempt to acquire, and whose capacity caps how many of them
run at any given time.

In our production environment, we run many of these workers at once. Since the
scheduler itself isn’t very computationally heavy (it just kicks off jobs and waits
for results to eventually surface), we can push the envelope of what it can manage.
We also have a built-in backoff system that attempts to relieve pressure when needed:
it increments the configured “backoff” value if there are errors or if no repos are
found that need a score calculated. This ensures we’re not continuously slamming our
database with queries, and the scorecard scheduler itself can remain in a “waiting”
state, not taking up precious compute resources on the cluster.
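
To make the semaphore pattern concrete, here is a minimal, self-contained sketch
that can run outside a cluster. It is not the scheduler's actual code: `runBounded`
and `nextBackoff` are hypothetical names, and doubling the backoff up to a ceiling
(and resetting it after useful work) is an assumption about how the "increment"
might behave.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runBounded runs every task in its own goroutine but never lets more than
// limit of them execute at once. It returns the highest number of tasks that
// were observed running at the same time.
func runBounded(tasks []func(), limit int) int {
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	var wg sync.WaitGroup
	var running, peak int64

	for _, task := range tasks {
		sem <- struct{}{} // blocks once limit goroutines hold a slot
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot for the next task

			n := atomic.AddInt64(&running, 1)
			for { // record the peak concurrency seen so far
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			task()
			atomic.AddInt64(&running, -1)
		}(task)
	}
	wg.Wait()
	return int(atomic.LoadInt64(&peak))
}

// nextBackoff is a hypothetical backoff policy: double the wait after an
// error or an empty poll (idle), capped at ceiling; reset to base otherwise.
func nextBackoff(current, base, ceiling time.Duration, idle bool) time.Duration {
	if !idle {
		return base
	}
	if next := current * 2; next < ceiling {
		return next
	}
	return ceiling
}

func main() {
	tasks := make([]func(), 10)
	for i := range tasks {
		tasks[i] = func() { time.Sleep(10 * time.Millisecond) }
	}
	fmt.Println("peak concurrency:", runBounded(tasks, 3)) // never more than 3
	fmt.Println(nextBackoff(2*time.Second, time.Second, time.Minute, true)) // prints 4s
}
```

Because the send on `sem` happens before each goroutine starts, the loop itself
stalls when all slots are taken, which is what keeps the number of in-flight jobs
bounded.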

Within the control loop, we do a few things: first, we query our database for repositories
needing their scorecard updated. This is a simple database query that is based on
some timestamp metadata we watch for and have indexes on. Once a configured amount
of time passes since the last score was calculated for a repo, it will bubble up
to be crunched by a Kubernetes Job running the Scorecard CLI.
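
The selection rule can be illustrated without a database. The sketch below is an
in-memory stand-in for that query, not our actual schema: the `repo` type, its
`LastScoredAt` field, and `staleRepos` are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// repo is a hypothetical stand-in for a database row.
type repo struct {
	FullName     string
	LastScoredAt time.Time // zero value means the repo has never been scored
}

// staleRepos returns the repos whose last score is older than maxAge,
// oldest first, roughly what a WHERE ... ORDER BY query on an indexed
// timestamp column would return.
func staleRepos(repos []repo, now time.Time, maxAge time.Duration) []repo {
	cutoff := now.Add(-maxAge)
	var stale []repo
	for _, r := range repos {
		if r.LastScoredAt.Before(cutoff) {
			stale = append(stale, r)
		}
	}
	sort.Slice(stale, func(i, j int) bool {
		return stale[i].LastScoredAt.Before(stale[j].LastScoredAt)
	})
	return stale
}

func main() {
	now := time.Now()
	repos := []repo{
		{FullName: "example/fresh", LastScoredAt: now.Add(-24 * time.Hour)},
		{FullName: "example/stale", LastScoredAt: now.Add(-30 * 24 * time.Hour)},
		{FullName: "example/never-scored"},
	}
	// Treat anything older than two weeks as due for a new score.
	for _, r := range staleRepos(repos, now, 14*24*time.Hour) {
		fmt.Println(r.FullName)
	}
}
```

Repos that have never been scored carry a zero timestamp, so they sort ahead of
everything else and get picked up first.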

## Kicking off Scorecard jobs

Next, once we have a repo to get the score for, we kick off a Kubernetes Job using
the “gcr.io/openssf/scorecard” image. Bootstrapping this job in Go code using Client-Go
looks very similar to how it would look with YAML, just using the various libraries
and APIs available via “k8s.io” imports and doing it programmatically:

```go
// defines the Kubernetes Job and its spec
job := &batchv1.Job{
	// structs and details for the actual Job including metav1.ObjectMeta and batchv1.JobSpec
}

// create the actual Job on cluster using the in-cluster config and client
return s.clientset.BatchV1().Jobs(ScorecardNamespace).Create(ctx, job, metav1.CreateOptions{})
```

After the job is created, we wait for it to signal it has completed or errored.
Much like with kubectl, Client-Go offers a helpful way to “watch” resources and
observe their state when they change:

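```go
// watch selector for the job name on cluster
// (named "watcher" so it does not shadow Client-Go's "watch" package)
watcher, err := s.clientset.BatchV1().Jobs(ScorecardNamespace).Watch(ctx, metav1.ListOptions{
	FieldSelector: "metadata.name=" + jobName,
})
if err != nil {
	// handle the error, e.g. return it to the caller
}
// stop the watch and release its resources when done
defer watcher.Stop()

// continuously pop off the watch results channel for job status
for event := range watcher.ResultChan() {
	// wait for job success, error, or other states
}
```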

Finally, once we have a successful job completion, we can grab the results from
the Job’s pod logs, which will have the actual JSON results from the scorecard
CLI! Once we have those results, we can upsert the scores back into the database
and mutate any necessary metadata to signal to our other microservices or the
OpenSauced API that there’s a new score!
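
Decoding those logs can be sketched with nothing but the standard library. This
is an illustration rather than our production parser: the struct models only a
few fields, the field names (`repo.name`, `score`, `checks`) are assumed from the
Scorecard CLI's JSON output, and the sample payload in `main` is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkResult and scorecardResult model only the fields this sketch reads.
// The JSON field names are assumptions based on the Scorecard CLI's output.
type checkResult struct {
	Name   string `json:"name"`
	Score  int    `json:"score"`
	Reason string `json:"reason"`
}

type scorecardResult struct {
	Repo struct {
		Name string `json:"name"`
	} `json:"repo"`
	Score  float64       `json:"score"`
	Checks []checkResult `json:"checks"`
}

// parseScorecard decodes the raw bytes read from a finished Job's pod logs.
func parseScorecard(logs []byte) (scorecardResult, error) {
	var res scorecardResult
	if err := json.Unmarshal(logs, &res); err != nil {
		return scorecardResult{}, fmt.Errorf("decoding scorecard output: %w", err)
	}
	return res, nil
}

func main() {
	// Made-up sample payload, trimmed to the fields modeled above.
	logs := []byte(`{
		"repo": {"name": "github.com/example/project"},
		"score": 7.5,
		"checks": [
			{"name": "Code-Review", "score": 10, "reason": "all changesets reviewed"},
			{"name": "Signed-Releases", "score": 5, "reason": "some releases unsigned"}
		]
	}`)

	res, err := parseScorecard(logs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s scored %.1f across %d checks\n", res.Repo.Name, res.Score, len(res.Checks))
}
```

Returning an error on malformed output matters here: a job can succeed at the
Kubernetes level while its logs contain something other than clean JSON, and in
that case the existing score should be left untouched.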

As mentioned before, the scorecard-k8s-scheduler can have any number of concurrent
jobs running at once: in our production setting we have a large number of jobs running
at once, all managed by this microservice. The intent is to be able to update scores
every 2 weeks across all repositories on GitHub. With this kind of scale, we hope
to be able to provide powerful tooling and insights to any open source maintainer
or consumer!

## Role-based access control

The “scheduler” microservice ends up being a small part of this whole system: anyone
familiar with Kubernetes controllers knows that there are additional pieces of Kubernetes
infrastructure that are needed to make the system work. In our case, we needed some
role-based access control (RBAC) to enable our microservice to create Jobs on the cluster.

First, we need a service account: this is the account that will be used by the
scheduler and have access controls bound to it:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scorecard-sa
  namespace: scorecard-ns
```

We place this service account in our “scorecard-ns” namespace where all this runs.

Next, we need to have a role and role binding for the service account. This includes
the actual access controls (including being able to create Jobs, view pod logs, etc.):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: scorecard-scheduler-role
  namespace: scorecard-ns
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "delete", "get", "list", "watch", "patch", "update"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]

---

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scorecard-scheduler-role-binding
  namespace: scorecard-ns
subjects:
  - kind: ServiceAccount
    name: scorecard-sa
    namespace: scorecard-ns
roleRef:
  kind: Role
  name: scorecard-scheduler-role
  apiGroup: rbac.authorization.k8s.io
```

You might be asking yourself “Why do I need to give this service account access
to get pods and pod logs? Isn’t that an overextension of the access controls?”
Remember! Jobs have pods, and in order to get the pod logs that have the actual
results of the scorecard CLI, we must be able to list the pods from a job and then
read their logs!

The second part of this, the “RoleBinding”, is where we actually attach the Role
to the service account. This service account can then be used when kicking off
new jobs on the cluster.

All in all, this architecture gives us the flexibility and power of serverless-like
setups while still taking advantage of the cost savings and existing infrastructure
we have with Kubernetes. Using existing paradigms and components can be a great way
to unlock capabilities you already have within your platform of choice!

Huge shout out to [Alex Ellis](https://github.com/alexellis) and his excellent [run-job controller](https://github.com/alexellis/run-job):
this was a huge inspiration and reference for correctly using Client-Go with Jobs!

Stay saucy!

Binary file added static/img/crossplane-scorecard.png
Binary file added static/img/future-scorecard-workspaces.png
Binary file added static/img/future-scorecard.png
Binary file added static/img/kubernetes-scorecard.png