Add a batch-create management command (#1509)
Signed-off-by: tdruez <[email protected]>
tdruez authored Jan 9, 2025
1 parent cf651f1 commit f32e77e
Showing 10 changed files with 417 additions and 29 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.rst
@@ -31,6 +31,10 @@ v34.9.4 (unreleased)
The labels are now always presented in alphabetical order for consistency.
https://github.com/aboutcode-org/scancode.io/issues/1520

- Add a ``batch-create`` management command that creates multiple projects
at once from a directory containing input files.
https://github.com/aboutcode-org/scancode.io/issues/1437

v34.9.3 (2024-12-31)
--------------------

92 changes: 91 additions & 1 deletion docs/command-line-interface.rst
@@ -57,6 +57,7 @@ ScanPipe's own commands are listed under the ``[scanpipe]`` section::
add-input
add-pipeline
archive-project
batch-create
check-compliance
create-project
create-user
@@ -83,7 +84,8 @@ For example::
$ scanpipe create-project --help
usage: scanpipe create-project [--input-file INPUTS_FILES]
[--input-url INPUT_URLS] [--copy-codebase SOURCE_DIRECTORY]
[--pipeline PIPELINES] [--execute] [--async]
[--pipeline PIPELINES] [--label LABELS] [--notes NOTES]
[--execute] [--async]
name

Create a ScanPipe project.
@@ -124,6 +126,10 @@ Optional arguments:
- ``--copy-codebase SOURCE_DIRECTORY`` Copy the content of the provided source directory
into the :guilabel:`codebase/` work directory.

- ``--notes NOTES`` Optional notes about the project.

- ``--label LABELS`` Optional labels for the project.

- ``--execute`` Execute the pipelines right after project creation.

- ``--async`` Add the pipeline run to the tasks queue for execution by a worker instead
@@ -133,6 +139,90 @@ Optional arguments:
.. warning::
Pipelines are added and are executed in order.

.. _cli_batch_create:

`$ scanpipe batch-create [--input-directory INPUT_DIRECTORY] [--input-list FILENAME.csv]`
-----------------------------------------------------------------------------------------

Processes files from the specified ``INPUT_DIRECTORY`` or rows from ``FILENAME.csv``,
creating a project for each file or row.

- Use ``--input-directory`` to specify a local directory. Each file in the directory
will result in a project, uniquely named using the filename and a timestamp.

- Use ``--input-list`` to specify a ``FILENAME.csv``. Each row in the CSV will be used
to create a project based on the data provided.

Supports specifying pipelines and asynchronous execution.

Required arguments (one of):

- ``--input-directory`` The path to the directory containing the input files to process.
Ensure the directory exists and contains the files you want to use.

- ``--input-list`` Path to a CSV file with project names and input URLs.
The first column must contain project names, and the second column should list
comma-separated input URLs (e.g., Download URL, PURL, or Docker reference).

**CSV content example**:

+----------------+----------------------------------+
| project_name   | input_urls                       |
+================+==================================+
| project-1      | https://url.com/file.ext         |
+----------------+----------------------------------+
| project-2      | pkg:deb/debian/[email protected] |
+----------------+----------------------------------+
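
A CSV in this shape can be produced with Python's standard ``csv`` module, which quotes any cell containing commas so that several input URLs stay together in the second column. This is only a minimal sketch: the ``write_input_list`` helper and the sample URLs are illustrative assumptions and not part of ScanCode.io, and the header row simply mirrors the column names shown in the table above.

```python
import csv
import io


def write_input_list(rows, stream):
    """Write (project_name, [input_urls]) pairs as a two-column CSV.

    Hypothetical helper for illustration; not part of ScanCode.io.
    """
    writer = csv.writer(stream, lineterminator="\n")
    # Header names mirror the table in the documentation.
    writer.writerow(["project_name", "input_urls"])
    for project_name, input_urls in rows:
        # Several URLs share one cell, separated by commas; csv.writer
        # quotes the cell so the commas are not read as column breaks.
        writer.writerow([project_name, ",".join(input_urls)])


# Sample data only; these URLs are placeholders.
rows = [
    ("project-1", ["https://url.com/file.ext"]),
    ("project-2", ["https://url.com/a.ext", "https://url.com/b.ext"]),
]
buffer = io.StringIO()
write_input_list(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Reading the text back with ``csv.reader`` yields two cells per row again, so a consumer can split the second cell on commas to recover the individual URLs.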

Optional arguments:

- ``--project-name-suffix`` Optional custom suffix to append to project names.
If not provided, a timestamp (in the format [YYMMDD_HHMMSS]) will be used.

- ``--pipeline PIPELINES`` Pipeline names to add to the project.

- ``--notes NOTES`` Optional notes about the project.

- ``--label LABELS`` Optional labels for the project.

- ``--execute`` Execute the pipelines right after project creation.

- ``--async`` Add the pipeline run to the tasks queue for execution by a worker instead
of running in the current thread.
Applies only when ``--execute`` is provided.
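
The default naming rule behind ``--project-name-suffix`` can be sketched as follows. This is an illustration under stated assumptions, not the command's actual code: the ``build_project_name`` helper, the single-space separator, and the use of the full file name are hypothetical. Only the ``YYMMDD_HHMMSS`` timestamp format comes from the option description above.

```python
from datetime import datetime
from pathlib import Path


def build_project_name(input_file, suffix=None, now=None):
    """Return "<file name> <suffix>", defaulting the suffix to a timestamp.

    Hypothetical helper; the separator and the use of the full file name
    are assumptions made for this sketch.
    """
    if suffix is None:
        now = now or datetime.now()
        # YYMMDD_HHMMSS, the default format named in the documentation.
        suffix = now.strftime("%y%m%d_%H%M%S")
    return f"{Path(input_file).name} {suffix}"


# A fixed datetime keeps the example reproducible.
fixed = datetime(2025, 1, 9, 14, 30, 5)
default_name = build_project_name("/input-data/alpine.tar", now=fixed)
custom_name = build_project_name("/input-data/alpine.tar", suffix="batch-1")
print(default_name)
print(custom_name)
```

A timestamp suffix keeps names unique across repeated runs over the same directory, while a custom suffix makes the projects of one batch easy to find together.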

Example: Processing Multiple Docker Images
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Assume multiple Docker images are available in a directory named ``local-data/`` on
the host machine.
To process these images with the ``analyze_docker_image`` pipeline using asynchronous
execution::

    $ docker compose run --rm \
        --volume local-data/:/input-data:ro \
        web scanpipe batch-create \
        --input-directory /input-data/ \
        --pipeline analyze_docker_image \
        --label "Docker" \
        --execute --async

**Explanation**:

- ``local-data/``: A directory on the host machine containing the Docker images to
process.
- ``/input-data/``: The directory inside the container where ``local-data/`` is
mounted (read-only).
- ``--pipeline analyze_docker_image``: Specifies the ``analyze_docker_image``
pipeline for processing each Docker image.
- ``--label "Docker"``: Tags all the projects with the "Docker" label to enable
easy search and filtering.
- ``--execute``: Runs the pipeline immediately after creating a project for each
image.
- ``--async``: Adds the pipeline run to the worker queue for asynchronous execution.

Each Docker image in the ``local-data/`` directory will result in the creation of a
project with the specified pipeline (``analyze_docker_image``) executed by worker
services.

`$ scanpipe list-pipeline [--verbosity {0,1,2,3}]`
--------------------------------------------------
29 changes: 28 additions & 1 deletion docs/faq.rst
@@ -108,6 +108,33 @@ It does not compute such summary.
You can also have a look at the different steps for each pipeline from the
:ref:`built_in_pipelines` documentation.

How to create multiple projects at once?
-----------------------------------------

You can use the :ref:`cli_batch_create` command to create multiple projects
simultaneously.
This command processes all files in a specified input directory, creating one project
per file.
Each project is uniquely named using the file name and a timestamp by default.

For example, to create multiple projects from files in a directory named
``local-data/``::

    $ docker compose run --rm \
        --volume local-data/:/input-data:ro \
        web scanpipe batch-create \
        --input-directory /input-data/

**Options**:

- **Custom Pipelines**: Use the ``--pipeline`` option to add specific pipelines to the
projects.
- **Asynchronous Execution**: Add ``--execute`` and ``--async`` to queue pipeline
execution for worker processing.
- **Project Notes and Labels**: Use ``--notes`` and ``--label`` to include metadata.

Each file in the input directory will result in the creation of a corresponding project,
ready for pipeline execution.

Can I run multiple pipelines in parallel?
-----------------------------------------

@@ -279,7 +306,7 @@ data older than 7 days::
See :ref:`command_line_interface` chapter for more information about the scanpipe
command.

How can I provide my license policies ?
How can I provide my license policies?
---------------------------------------

For detailed information about the policies system, refer to :ref:`policies`.
46 changes: 46 additions & 0 deletions scanpipe/management/commands/__init__.py
@@ -150,6 +150,28 @@ def display_status(self, project, verbosity):
self.stdout.write(line)


class PipelineCommandMixin:
    def add_arguments(self, parser):
        super().add_arguments(parser)
        parser.add_argument(
            "--pipeline",
            action="append",
            dest="pipelines",
            default=list(),
            help=(
                "Pipelines names to add to the project. "
                "The pipelines are added and executed based on their given order. "
                'Groups can be provided using the "pipeline_name:option1,option2" '
                "syntax."
            ),
        )
        parser.add_argument(
            "--execute",
            action="store_true",
            help="Execute the pipelines right after the project creation.",
        )
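
The effect of ``action="append"`` combined with ``dest="pipelines"`` can be checked with plain ``argparse``, outside of Django. This is a minimal sketch assuming a bare parser with only these two options. The ``scanpipe-sketch`` program name and the ``sample_pipeline`` value are placeholders, not real ScanCode.io names.

```python
import argparse

# Stand-alone parser mirroring the two options declared by the mixin.
parser = argparse.ArgumentParser(prog="scanpipe-sketch")
parser.add_argument(
    "--pipeline",
    action="append",
    dest="pipelines",
    default=list(),
)
parser.add_argument("--execute", action="store_true")

# Each repeated --pipeline flag appends to the same list, in order.
options = parser.parse_args(
    [
        "--pipeline", "analyze_docker_image",
        "--pipeline", "sample_pipeline",  # placeholder name
        "--execute",
    ]
)
print(options.pipelines)
print(options.execute)

# Without any flag, the declared defaults apply.
defaults = parser.parse_args([])
print(defaults.pipelines, defaults.execute)
```

Because the values are collected in the order given on the command line, the list can be passed on as-is when pipelines must run in that order.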


class AddInputCommandMixin:
    def add_arguments(self, parser):
        super().add_arguments(parser)
@@ -427,6 +449,7 @@ def create_project(
    input_urls=None,
    copy_from="",
    notes="",
    labels=None,
    execute=False,
    run_async=False,
    command=None,
@@ -451,6 +474,10 @@ def create_project(
    )

    project.save()

    if labels:
        project.labels.add(*labels)

    if command:
        command.project = project

@@ -491,6 +518,20 @@ def execute_project(self, run_async=False):


class CreateProjectCommandMixin(ExecuteProjectCommandMixin):
    def add_arguments(self, parser):
        super().add_arguments(parser)
        parser.add_argument(
            "--notes",
            help="Optional notes about the project.",
        )
        parser.add_argument(
            "--label",
            action="append",
            dest="labels",
            default=list(),
            help="Optional labels for the project.",
        )

    def create_project(
        self,
        name,
@@ -499,16 +540,21 @@ def create_project(
        input_urls=None,
        copy_from="",
        notes="",
        labels=None,
        execute=False,
        run_async=False,
    ):
        if execute and not pipelines:
            raise CommandError("The --execute option requires one or more pipelines.")

        return create_project(
            name=name,
            pipelines=pipelines,
            input_files=input_files,
            input_urls=input_urls,
            copy_from=copy_from,
            notes=notes,
            labels=labels,
            execute=execute,
            run_async=run_async,
            command=self,