Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependency Extraction Data Source Driver #5094

Closed
wants to merge 7 commits into from

Conversation

puerco
Copy link
Contributor

@puerco puerco commented Nov 29, 2024

Summary

This PR introduces the dependency extraction data source.

Not ready for review, checking it in early and needs other PRs and a rebase to work.

This needs #5091 #5092 #5093

Fixes #5090

Change Type

Mark the type of change your PR introduces:

  • Bug fix (resolves an issue without affecting existing features)
  • Feature (adds new functionality without breaking changes)
  • Breaking change (may impact existing functionalities or require documentation updates)
  • Documentation (updates or additions to documentation)
  • Refactoring or test improvements (no bug fixes or new functionality)

Testing

Outline how the changes were tested, including steps to reproduce and any relevant configurations.
Attach screenshots if helpful.

Review Checklist:

  • Reviewed my own code for quality and clarity.
  • Added comments to complex or tricky code sections.
  • Updated any affected documentation.
  • Included tests that validate the fix or feature.
  • Checked that related changes are merged.

@puerco puerco requested a review from a team as a code owner November 29, 2024 03:53
@coveralls
Copy link

coveralls commented Nov 29, 2024

Coverage Status

coverage: 55.088% (-0.1%) from 55.217%
when pulling 50413b6 on puerco:deps-datasource
into 036082b on mindersec:main.

@puerco puerco force-pushed the deps-datasource branch 2 times, most recently from b8401d4 to a7015f7 Compare December 2, 2024 02:49
@puerco puerco changed the title WIP: Dependency Extraction Data Source WIP: Dependency Extraction Data Source Driver Dec 2, 2024
@puerco puerco force-pushed the deps-datasource branch 9 times, most recently from 5b1774c to e166c37 Compare December 4, 2024 05:11
@puerco
Copy link
Contributor Author

puerco commented Dec 4, 2024

OK, this should be ready for review. The options in the data source definitions don't do anything now, we need #5127 and #5128 for them to work

@puerco puerco changed the title WIP: Dependency Extraction Data Source Driver Dependency Extraction Data Source Driver Dec 4, 2024
This commit adds the deps data source types to the minder proto
definitions.

Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
This commit wires the deps data source through the service machinery.

Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
This adds the dependency data source driver logic to the minder codebase

Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
Copy link
Contributor

@blkt blkt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments.
Could you share an example of a definition for this data source? It helps me a lot understanding its use from the perspective of the end user.

Comment on lines +4026 to +4038
// DepsDataSource is the dependency data source driver
message DepsDataSource {
message Def {
// Path where the dependency extractor will look for dependency data
string path = 1;

// List of ecosystems to check, defaults all supported by the extractor
repeated string ecosystems = 2;
}

map<string, Def> def = 1;
}

Copy link
Contributor

@blkt blkt Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is there some form of validation that we can add here?

Looks like this can be almost copy-pasted.

@@ -56,6 +57,7 @@ func buildFromDataSource(
}

if err := dsf.ValidateArgs(jsonObj); err != nil {
log.Error().Interface("dsfunction", k).Err(err).Msg("error validating datasource function arguments")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would avoid logging so deep in the stack, it would be much better to raise a proper error and log in upper layers instead.

Given that Data Sources are meant to be user-defined, I don't think there's much value to log an error, as the user would not have access to it. Besides, we have no idea how often this function is going to be called by REGO code in the rule type, this might be spammy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly for us to track the rego eval engine, it's hard to observe what it's doing without them. Would it be better if I set the level to Debug?

@@ -69,6 +71,7 @@ func buildFromDataSource(
)
ret, err := dsf.Call(ctx, jsonObj)
if err != nil {
log.Error().Interface("dsfunction", k).Err(err).Msgf("function call returned error")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: see comment above.

Comment on lines +451 to +467
case *minderv1.DataSource_Deps:
for name, def := range drv.Deps.GetDef() {
defBytes, err := protojson.Marshal(def)
if err != nil {
return fmt.Errorf("failed to marshal REST definition: %w", err)
}

if _, err := tx.AddDataSourceFunction(ctx, db.AddDataSourceFunctionParams{
DataSourceID: dsID,
ProjectID: projectID,
Name: name,
Type: v1datasources.DataSourceDriverDeps,
Definition: defBytes,
}); err != nil {
return fmt.Errorf("failed to create data source function: %w", err)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: is there a way to share this code path regardless of the type?

return errs
}

func (_ *depsDataSourceHandler) ValidateUpdate(_ any) error { return nil }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: I guess we could verify something here, like the cardinality of the functions, but might be impractical. Any ideas?

@puerco
Copy link
Contributor Author

puerco commented Dec 9, 2024

I'll prioritize #5165 because we need that data source sooner, and then come back to this one.

Copy link
Contributor

github-actions bot commented Jan 9, 2025

This PR needs additional information before we can continue. It is now marked as stale because it has been open for 30 days with no activity. Please provide the necessary details to continue or it will be closed in 30 days.

@evankanderson
Copy link
Member

I borrowed much of the patterns of this code for #5326, which was an alternate approach. Thanks for the work on this, @puerco !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dependency Data Source Driver
4 participants