From 5a8ba59e52b973bcc7834eb9addf2f03c38edd75 Mon Sep 17 00:00:00 2001 From: Ruslan Kuprieiev Date: Mon, 1 Jun 2020 02:14:49 +0300 Subject: [PATCH 1/8] docs: replace `dvc pipeline` with `dvc dag` --- config/prismjs/dvc-commands.js | 3 - .../{pipeline/show.md => dag.md} | 71 ++++--------------- .../docs/command-reference/pipeline/index.md | 47 ------------ .../docs/command-reference/pipeline/list.md | 41 ----------- content/docs/sidebar.json | 19 ++--- .../docs/user-guide/running-dvc-on-windows.md | 4 +- redirects-list.json | 3 + 7 files changed, 21 insertions(+), 167 deletions(-) rename content/docs/command-reference/{pipeline/show.md => dag.md} (56%) delete mode 100644 content/docs/command-reference/pipeline/index.md delete mode 100644 content/docs/command-reference/pipeline/list.md diff --git a/config/prismjs/dvc-commands.js b/config/prismjs/dvc-commands.js index 41edc54b68..937f9da24e 100644 --- a/config/prismjs/dvc-commands.js +++ b/config/prismjs/dvc-commands.js @@ -25,9 +25,6 @@ module.exports = [ 'plots modify', 'plots diff', 'plots', - 'pipeline show', - 'pipeline list', - 'pipeline', 'move', 'metrics show', 'metrics diff', diff --git a/content/docs/command-reference/pipeline/show.md b/content/docs/command-reference/dag.md similarity index 56% rename from content/docs/command-reference/pipeline/show.md rename to content/docs/command-reference/dag.md index 57848ead1e..7a55a4e4ff 100644 --- a/content/docs/command-reference/pipeline/show.md +++ b/content/docs/command-reference/dag.md @@ -1,4 +1,4 @@ -# pipeline show +# dag Show [stages](/doc/command-reference/run) in a pipeline that lead to the specified stage. By default it lists @@ -7,43 +7,25 @@ specified stage. By default it lists ## Synopsis ```usage -usage: dvc pipeline show [-h] [-q | -v] [-c | -o] [-l] [--ascii] - [--dot] [--tree] - [targets [targets ...]] +usage: dvc dag [-h] [-q | -v] [--dot] [--full] [target] positional arguments: - targets DVC-files to show pipeline for. Optional. 
- (Finds all DVC-files in the workspace by default.) + targets Stage or output to show pipeline for. + Optional. (Finds all stages in the workspace by default.) ``` ## Description -`dvc show` displays the stages of a pipeline up to one or more target DVC-files -(stage files). All stages are shown unless specific `targets` are specified. The -`-c` and `-o` options allow to list the corresponding commands or data file flow -instead of stages. - -> Note that the stages in these lists are in descending order, that is, from -> first to last. +`dvc dag` displays the stages of a pipeline up to the target stage. If `target` +is omitted, it will show the full project DAG. ## Options -- `-c`, `--commands` - show pipeline as a list (diagram if `--ascii` or `--dot` - is used) of commands instead of paths to DVC-files. - -- `-o`, `--outs` - show pipeline as a list (diagram if `--ascii` or `--dot` is - used) of stage outputs instead of paths to DVC-files. - -- `--ascii` - visualize pipeline. It will print a graph (ASCII) instead of a - list of path to DVC-files. (`less` pager may be used, see - [Paging the output](#paging-the-output) below for details). - -- `--dot` - show contents of `.dot` files with a DVC pipeline graph. It can be - passed to third party visualization utilities. - -- `--tree` - list dependencies tree like recursive directory listing. +- `--dot` - show DAG in `DOT` format. It can be passed to third party + visualization utilities. -- `-l`, `--locked` - print frozen stages only. See `dvc freeze`. +- `--full` - show full DAG that the target belongs too, instead of show DAG + consisting only of ancestors. - `-h`, `--help` - prints the usage/help message, and exit. @@ -69,7 +51,7 @@ variable. 
For example, the following command will replace the default pager with [`more`](), for a single run: ```bash -$ DVC_PAGER=more dvc pipeline show --ascii my-pipeline.dvc +$ DVC_PAGER=more dvc dag my-pipeline.dvc ``` For a persistent change, define `DVC_PAGER` in the shell configuration. For @@ -81,28 +63,10 @@ export DVC_PAGER=more ## Examples -Default mode: show stage files that `output.dvc` recursively depends on: - -```dvc -$ dvc pipeline show output.dvc -raw.dvc -data.dvc -output.dvc -``` - -The same as previous, but show commands instead of DVC-files: - -```dvc -$ dvc pipeline show output.dvc --commands -download.py s3://mybucket/myrawdata raw -cleanup.py raw data -process.py data output -``` - Visualize DVC pipeline: ```dvc -$ dvc pipeline show eval.txt.dvc --ascii +$ dvc dag eval.txt.dvc .------------------------. | data/Posts.xml.zip.dvc | `------------------------' @@ -143,14 +107,3 @@ $ dvc pipeline show eval.txt.dvc --ascii | eval.txt.dvc | `--------------' ``` - -List dependencies recursively if the graph has a tree structure: - -```dvc -$ dvc pipeline show e.file.dvc --tree -e.file.dvc -├── c.file.dvc -│ └── b.file.dvc -│ └── a.file.dvc -└── d.file.dvc -``` diff --git a/content/docs/command-reference/pipeline/index.md b/content/docs/command-reference/pipeline/index.md deleted file mode 100644 index b9cf3eb4ae..0000000000 --- a/content/docs/command-reference/pipeline/index.md +++ /dev/null @@ -1,47 +0,0 @@ -# pipeline - -A set of commands to manage -[pipelines](/doc/tutorials/get-started/data-pipelines): -[show](/doc/command-reference/pipeline/show) and -[list](/doc/command-reference/pipeline/list). - -## Synopsis - -```usage -usage: dvc pipeline [-h] [-q | -v] {show,list} ... - -positional arguments: - COMMAND - show Show pipeline. - list List pipelines. -``` - -## Description - -A data pipeline, in general, is a series of data processing -[stages](/doc/command-reference/run) (for example console commands that take an -input and produce an output). 
A pipeline may produce intermediate -data, and has a final result. Machine learning (ML) pipelines typically start a -with large raw datasets, include intermediate featurization and training stages, -and produce a final model, as well as accuracy -[metrics](/doc/command-reference/metrics). - -In DVC, pipeline stages and commands, their data I/O, interdependencies, and -results (intermediate or final) are specified with `dvc add` and `dvc run`, -among other commands. This allows DVC to restore one or more pipelines of stages -interconnected by their dependencies and outputs later. (See `dvc repro`.) - -> DVC builds a dependency graph -> ([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) to do this. - -`dvc pipeline` commands help users display the existing project pipelines in -different ways. - -## Options - -- `-h`, `--help` - prints the usage/help message, and exit. - -- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no - problems arise, otherwise 1. - -- `-v`, `--verbose` - displays detailed tracing information. diff --git a/content/docs/command-reference/pipeline/list.md b/content/docs/command-reference/pipeline/list.md deleted file mode 100644 index 3ee8cfdb2b..0000000000 --- a/content/docs/command-reference/pipeline/list.md +++ /dev/null @@ -1,41 +0,0 @@ -# pipeline list - -List connected groups of [stages](/doc/command-reference/run) (pipelines). - -## Synopsis - -```usage -usage: dvc pipeline list [-h] [-q | -v] -``` - -## Description - -Displays a list of all existing stages in the project, grouped in -their corresponding [pipeline](/doc/command-reference/pipeline), when connected. - -> Note that the stages in these lists are in ascending order, that is, from last -> to first. - -## Options - -- `-h`, `--help` - prints the usage/help message, and exit. - -- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no - problems arise, otherwise 1. 
- -- `-v`, `--verbose` - displays detailed tracing information. - -## Examples - -List available pipelines: - -```dvc -$ dvc pipeline list -Dvcfile -====================================================================== -raw.dvc -data.dvc -output.dvc -====================================================================== -2 pipelines total -``` diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index ff700adeb2..d71a123f9a 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -155,6 +155,10 @@ "label": "config", "slug": "config" }, + { + "label": "dag", + "slug": "dag" + }, { "label": "destroy", "slug": "destroy" @@ -236,21 +240,6 @@ } ] }, - { - "label": "pipeline", - "slug": "pipeline", - "source": "pipeline/index.md", - "children": [ - { - "label": "pipeline list", - "slug": "list" - }, - { - "label": "pipeline show", - "slug": "show" - } - ] - }, { "label": "plots", "slug": "plots", diff --git a/content/docs/user-guide/running-dvc-on-windows.md b/content/docs/user-guide/running-dvc-on-windows.md index e658eee649..1d4e216a08 100644 --- a/content/docs/user-guide/running-dvc-on-windows.md +++ b/content/docs/user-guide/running-dvc-on-windows.md @@ -70,8 +70,8 @@ directory, as explained in ## Enabling paging with `less` By default, DVC tries to use [Less]() -as pager for the output of `dvc pipeline show`. Windows doesn't have the `less` -command available however. Fortunately, there is a easy way of installing it via +as a pager for the output of `dvc dag`. Windows doesn't have the `less` command +available, however.
Fortunately, there is an easy way of installing `less` via [Chocolatey](https://chocolatey.org/) (please install the tool first): ```dvc diff --git a/redirects-list.json b/redirects-list.json index 391cbf27e5..9285fed991 100644 --- a/redirects-list.json +++ b/redirects-list.json @@ -33,6 +33,9 @@ "^/doc/command-reference/plot$ /doc/command-reference/plots", "^/doc/command-reference/lock$ /doc/command-reference/freeze", "^/doc/command-reference/unlock$ /doc/command-reference/unfreeze", + "^/doc/command-reference/pipeline$ /doc/command-reference/dag", + "^/doc/command-reference/pipeline/show$ /doc/command-reference/dag", + "^/doc/command-reference/pipeline/list$ /doc/command-reference/dag", "^/(.+)/$ /$1" ] From 28cc03f0198a31f11662f7af5a093fe0d5e2533a Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 1 Jun 2020 13:10:29 -0500 Subject: [PATCH 2/8] Update content/docs/command-reference/dag.md --- content/docs/command-reference/dag.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/command-reference/dag.md b/content/docs/command-reference/dag.md index 7a55a4e4ff..1341504270 100644 --- a/content/docs/command-reference/dag.md +++ b/content/docs/command-reference/dag.md @@ -24,8 +24,8 @@ is omitted, it will show the full project DAG. - `--dot` - show DAG in `DOT` format. It can be passed to third party visualization utilities. -- `--full` - show full DAG that the target belongs too, instead of show DAG - consisting only of ancestors. +- `--full` - show full DAG that the `target` belongs too, instead of showing + the part that consists only of the target ancestors. - `-h`, `--help` - prints the usage/help message, and exit.
From 05d40b2c8e29561de12ccda74d6e5bae89a1034a Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 1 Jun 2020 13:10:38 -0500 Subject: [PATCH 3/8] Update content/docs/command-reference/dag.md --- content/docs/command-reference/dag.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/command-reference/dag.md b/content/docs/command-reference/dag.md index 1341504270..cb8e052607 100644 --- a/content/docs/command-reference/dag.md +++ b/content/docs/command-reference/dag.md @@ -50,7 +50,7 @@ It's possible to override the default pager via the `DVC_PAGER` environment variable. For example, the following command will replace the default pager with [`more`](), for a single run: -```bash +```dvc $ DVC_PAGER=more dvc dag my-pipeline.dvc ``` From 1fb7f4a18ffb903976dd2ef4a266fce5309a78f9 Mon Sep 17 00:00:00 2001 From: Ruslan Kuprieiev Date: Wed, 3 Jun 2020 04:23:56 +0300 Subject: [PATCH 4/8] Update content/docs/command-reference/dag.md Co-authored-by: Jorge Orpinel --- content/docs/command-reference/dag.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/command-reference/dag.md b/content/docs/command-reference/dag.md index cb8e052607..b83de9d8d6 100644 --- a/content/docs/command-reference/dag.md +++ b/content/docs/command-reference/dag.md @@ -10,8 +10,8 @@ specified stage. By default it lists usage: dvc dag [-h] [-q | -v] [--dot] [--full] [target] positional arguments: - targets Stage or output to show pipeline for. - Optional. (Finds all stages in the workspace by default.) + target Stage or output to show pipeline for (optional) + Finds all stages in the workspace by default.
``` ## Description From a9718a73a7e5f7f9b8993ba26aa94630d7e608aa Mon Sep 17 00:00:00 2001 From: Ruslan Kuprieiev Date: Wed, 3 Jun 2020 04:27:45 +0300 Subject: [PATCH 5/8] add DOT link --- content/docs/command-reference/dag.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/content/docs/command-reference/dag.md b/content/docs/command-reference/dag.md index b83de9d8d6..1e02c88b92 100644 --- a/content/docs/command-reference/dag.md +++ b/content/docs/command-reference/dag.md @@ -21,11 +21,12 @@ is omitted, it will show the full project DAG. ## Options -- `--dot` - show DAG in `DOT` format. It can be passed to third party - visualization utilities. +- `--dot` - show DAG in + [DOT]() + format. It can be passed to third-party visualization utilities. -- `--full` - show full DAG that the `target` belongs too, instead of showing - the part that consists only of the target ancestors. +- `--full` - show full DAG that the `target` belongs to, instead of showing the + part that consists only of the target's ancestors. - `-h`, `--help` - prints the usage/help message, and exit. From 8f4643740fab6ae8b558d17b64268a32cf30d78e Mon Sep 17 00:00:00 2001 From: Ruslan Kuprieiev Date: Tue, 23 Jun 2020 21:54:30 +0300 Subject: [PATCH 6/8] dag: add old pipeline description https://github.com/iterative/dvc.org/pull/1383#discussion_r443092478 --- content/docs/command-reference/dag.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/content/docs/command-reference/dag.md b/content/docs/command-reference/dag.md index 1e02c88b92..9d71f53437 100644 --- a/content/docs/command-reference/dag.md +++ b/content/docs/command-reference/dag.md @@ -16,6 +16,22 @@ positional arguments: ## Description +A data pipeline, in general, is a series of data processing +[stages](/doc/command-reference/run) (for example, console commands that take an +input and produce an output). A pipeline may produce intermediate
Machine learning (ML) pipelines typically start a +with large raw datasets, include intermediate featurization and training stages, +and produce a final model, as well as accuracy +[metrics](/doc/command-reference/metrics). + +In DVC, pipeline stages and commands, their data I/O, interdependencies, and +results (intermediate or final) are specified with `dvc add` and `dvc run`, +among other commands. This allows DVC to restore one or more pipelines of stages +interconnected by their dependencies and outputs later. (See `dvc repro`.) + +> DVC builds a dependency graph +> ([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) to do this. + `dvc dag` displays the stages of a pipeline up to the target stage. If `target` is omitted, it will show the full project DAG. From 3cbeb07bb48b71a911a686cfcb3fcf3b5978a51d Mon Sep 17 00:00:00 2001 From: Ruslan Kuprieiev Date: Tue, 23 Jun 2020 21:58:55 +0300 Subject: [PATCH 7/8] docs: dag: add new example https://github.com/iterative/dvc.org/pull/1383#discussion_r443167713 https://github.com/iterative/dvc.org/pull/1383#discussion_r443090323 --- content/docs/command-reference/dag.md | 64 ++++++++++----------------- 1 file changed, 23 insertions(+), 41 deletions(-) diff --git a/content/docs/command-reference/dag.md b/content/docs/command-reference/dag.md index 9d71f53437..ff01c3f9f4 100644 --- a/content/docs/command-reference/dag.md +++ b/content/docs/command-reference/dag.md @@ -68,7 +68,7 @@ variable. For example, the following command will replace the default pager with [`more`](), for a single run: ```dvc -$ DVC_PAGER=more dvc dag my-pipeline.dvc +$ DVC_PAGER=more dvc dag ``` For a persistent change, define `DVC_PAGER` in the shell configuration. For @@ -83,44 +83,26 @@ export DVC_PAGER=more Visualize DVC pipeline: ```dvc -$ dvc dag eval.txt.dvc - .------------------------. - | data/Posts.xml.zip.dvc | - `------------------------' - * - * - * - .---------------. 
- | Posts.xml.dvc | - `---------------' - * - * - * - .---------------. - | Posts.tsv.dvc | - `---------------' - * - * - * - .---------------------. - | Posts-train.tsv.dvc | - `---------------------' - * - * - * - .--------------------. - | matrix-train.p.dvc | - `--------------------' - *** *** - ** *** - ** ** -.-------------. ** -| model.p.dvc | ** -`-------------' *** - *** *** - ** ** - ** ** - .--------------. - | eval.txt.dvc | - `--------------' +$ dvc dag + +---------+ + | prepare | + +---------+ + * + * + * + +-----------+ + | featurize | + +-----------+ + ** ** + ** * + * ** ++-------+ * +| train | ** ++-------+ * + ** ** + ** ** + * * + +----------+ + | evaluate | + +----------+ ``` From a74f4a1d85086bd60ae8eb86d9b41490e6ee4b7f Mon Sep 17 00:00:00 2001 From: Ruslan Kuprieiev Date: Tue, 23 Jun 2020 22:25:41 +0300 Subject: [PATCH 8/8] replace pipeline with dag --- content/docs/command-reference/checkout.md | 4 ++-- content/docs/command-reference/init.md | 5 ++--- content/docs/command-reference/repro.md | 4 ++-- 3 files changed, 6 insertions(+), 7 deletions(-) diff --git a/content/docs/command-reference/checkout.md b/content/docs/command-reference/checkout.md index 82bd56d501..baafff3655 100644 --- a/content/docs/command-reference/checkout.md +++ b/content/docs/command-reference/checkout.md @@ -65,8 +65,8 @@ progress made by the checkout. There are two methods to restore a file missing from the cache, depending on the situation. In some cases a pipeline must be reproduced (using `dvc repro`) to -regenerate its outputs (see also `dvc pipeline`). In other cases the cache can -be pulled from remote storage using `dvc pull`. +regenerate its outputs (see also `dvc dag`). In other cases the cache can be +pulled from remote storage using `dvc pull`. 
## Options diff --git a/content/docs/command-reference/init.md b/content/docs/command-reference/init.md index 4965558e99..54cbf603b4 100644 --- a/content/docs/command-reference/init.md +++ b/content/docs/command-reference/init.md @@ -61,9 +61,8 @@ sub-projects to mitigate the issues of initializing in the Git repository root: download files and directories, to reproduce pipelines, etc. It can be expensive in the large repositories with a lot of projects. -- Not enough isolation/granularity - commands like `dvc metrics diff`, - `dvc pipeline show` and others by default dump all the metrics, all the - pipelines, etc. +- Not enough isolation/granularity - commands like `dvc metrics diff`, `dvc dag` + and others by default dump all the metrics, all the pipelines, etc. #### How does it affect DVC commands? diff --git a/content/docs/command-reference/repro.md b/content/docs/command-reference/repro.md index 5cf75dbde2..ea924c7a82 100644 --- a/content/docs/command-reference/repro.md +++ b/content/docs/command-reference/repro.md @@ -129,8 +129,8 @@ only execute the final stage. The stage is only executed if the user types "y". - `-p`, `--pipeline` - reproduce the entire pipelines that the stage file - `targets` belong to. Use `dvc pipeline show .dvc` to show the parent - pipeline of a target stage. + `targets` belong to. Use `dvc dag ` to show the parent pipeline of a + target stage. - `-P`, `--all-pipelines` - reproduce all pipelines, for all the stage files present in `DVC` repository.
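The `--full` option added above contrasts two views of the graph: the target plus its ancestors (the default), or the whole DAG that the target belongs to. Below is a minimal Python sketch of that difference, not DVC's implementation. It assumes that "full DAG" means the weakly connected component containing the target. The stage names come from the `dvc dag` example in this series. The `DEPS` mapping and both helper functions are hypothetical, and the edges are read off the ASCII diagram.

```python
from collections import deque

# Hypothetical stage graph modeled on the `dvc dag` example output:
# each stage maps to the stages it depends on (read off the ASCII diagram).
DEPS = {
    "prepare": [],
    "featurize": ["prepare"],
    "train": ["featurize"],
    "evaluate": ["train", "featurize"],
}


def ancestors(deps, target):
    """Default view (assumed): the target plus every stage it transitively depends on."""
    seen = {target}
    queue = deque([target])
    while queue:
        stage = queue.popleft()
        for dep in deps.get(stage, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def connected_component(deps, target):
    """`--full` view (assumed): every stage connected to the target, ignoring edge direction."""
    # Build an undirected adjacency map from the dependency lists.
    neighbors = {stage: set() for stage in deps}
    for stage, stage_deps in deps.items():
        for dep in stage_deps:
            neighbors.setdefault(dep, set()).add(stage)
            neighbors[stage].add(dep)
    seen = {target}
    queue = deque([target])
    while queue:
        stage = queue.popleft()
        for other in neighbors.get(stage, set()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


print(sorted(ancestors(DEPS, "train")))            # ['featurize', 'prepare', 'train']
print(sorted(connected_component(DEPS, "train")))  # ['evaluate', 'featurize', 'prepare', 'train']
```

With `train` as the target, only the connected-component view picks up `evaluate`, because `evaluate` consumes `train`'s output rather than feeding into it.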