diff --git a/.gitignore b/.gitignore
index 90c96b27eb..091da139a3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -154,6 +154,7 @@ kedro.db
 kedro/html
 docs/tmp-build-artifacts
 docs/build
+docs/temp
 docs/node_modules
 docs/source/04_user_guide/source/.ipynb
 tests/template/fake_project/
diff --git a/docs/build-docs.sh b/docs/build-docs.sh
index 3dacc3bd2c..b506018fab 100755
--- a/docs/build-docs.sh
+++ b/docs/build-docs.sh
@@ -24,4 +24,15 @@ fi
 
 # Clean up build artefacts
 rm -rf docs/build/html/_sources
-rm -rf docs/build/[0-9][0-9]_*
+
+# Copy built HTML to temp directory, clean up build dir and replace with built docs only
+rm -rf docs/temp
+mkdir docs/temp/
+mkdir docs/temp/html
+cp -rf docs/build/html/* docs/temp/html
+
+rm -rf docs/build
+mkdir docs/build
+mkdir docs/build/html
+cp -rf docs/temp/html/* docs/build/html
+rm -rf docs/temp
diff --git a/docs/source/faq/faq.md b/docs/source/faq/faq.md
index 50d18593a3..1ba16ad911 100644
--- a/docs/source/faq/faq.md
+++ b/docs/source/faq/faq.md
@@ -9,51 +9,6 @@ Kedro is an open-source Python framework for creating reproducible, maintainable
 
 For the source code, take a look at the [Kedro repository on Github](https://github.com/kedro-org/kedro).
 
-## Who maintains Kedro?
-
-Kedro was originally designed by [Aris Valtazanos](https://github.com/arisvqb) and [Nikolaos Tsaousis](https://github.com/tsanikgr) at QuantumBlack to solve challenges they faced in their project work. Their work was later turned into an internal product by [Peteris Erins](https://github.com/Pet3ris), [Ivan Danov](https://github.com/idanov), [Nikolaos Kaltsas](https://github.com/nikos-kal), [Meisam Emamjome](https://github.com/misamae) and [Nikolaos Tsaousis](https://github.com/tsanikgr). In the project's latest iteration it is an incubating project within [LF AI & Data](https://lfaidata.foundation/).
-
-Currently, the core Kedro team consists of
-[Ahdra Merali](https://github.com/AhdraMeraliQB),
-[Andrew Mackay](https://github.com/Mackay031),
-[Ankita Katiyar](https://github.com/ankatiyar),
-[Antony Milne](https://github.com/AntonyMilneQB),
-[Cvetanka Nechevska](https://github.com/cvetankanechevska),
-[Deepyaman Datta](https://github.com/deepyaman),
-[Gabriel Comym](https://github.com/comym),
-[Huong Nguyen](https://github.com/Huongg),
-[Ivan Danov](https://github.com/idanov),
-[Jannic Holzer](https://github.com/jmholzer),
-[Jo Stichbury](https://github.com/stichbury),
-[Joel Schwarzmann](https://github.com/datajoely),
-[Lim Hoang](https://github.com/limdauto),
-[Merel Theisen](https://github.com/merelcht),
-[Nero Okwa](https://github.com/NeroOkwa),
-[Nok Lam Chan](https://github.com/noklam),
-[Rashida Kanchwala](https://github.com/rashidakanchwala),
-[Sajid Alam](https://github.com/SajidAlamQB),
-[Tynan DeBold](https://github.com/tynandebold) and
-[Yetunde Dada](https://github.com/yetudada).
-
-Former core team members with significant contributions include:
-[Andrii Ivaniuk](https://github.com/andrii-ivaniuk),
-[Anton Kirilenko](https://github.com/Flid),
-[Dmitrii Deriabin](https://github.com/dmder),
-[Gordon Wrigley](https://github.com/tolomea),
-[Hamza Oza](https://github.com/hamzaoza),
-[Ignacio Paricio](https://github.com/ignacioparicio),
-[Jiri Klein](https://github.com/jiriklein),
-[Kiyohito Kunii](https://github.com/921kiyo),
-[Laís Carvalho](https://github.com/laisbsc),
-[Liam Brummitt](https://github.com/bru5),
-[Lorena Bălan](https://github.com/lorenabalan),
-[Nasef Khan](https://github.com/nakhan98),
-[Richard Westenra](https://github.com/richardwestenra),
-[Susanna Wong](https://github.com/studioswong) and
-[Zain Patel](https://github.com/mzjp2).
-
-And last, but not least, all the open-source contributors whose work went into all Kedro [releases](https://github.com/kedro-org/kedro/blob/main/RELEASE.md).
-
 ## What are the primary advantages of Kedro?
 
 If you're a Data Scientist, then you should be interested in Kedro because it enables you to:
@@ -97,6 +52,37 @@ The responsibility of _"What time will this pipeline run?"_, _"How do I manage m
 failed?"_ is left to the orchestrators. We also have deployment guidelines for using orchestrators as deployment targets and are working in collaboration with the maintainers of some of those tools to make the deployment experience as enjoyable as possible.
 
+## What is the typical Kedro project development workflow?
+
+When you build a Kedro project, you will typically follow a standard development workflow:
+
+![](../meta/images/typical_workflow.png)
+
+### 1. Set up the project template
+
+* Create a new project with `kedro new`
+* Install project dependencies with `pip install -r src/requirements.txt`
+* Configure the following in the `conf` folder:
+ * Logging
+ * Credentials
+ * Any other sensitive / personal content
+
+### 2. Set up the data
+
+* Add data to the `data` folder
+* Reference all datasets for the project in the `conf/base/catalog.yml` file
+
+### 3. Create the pipeline
+
+* Create the data transformation steps as Python functions
+* Add your functions as nodes, to construct the pipeline
+* Choose how to run the pipeline: sequentially or in parallel
+
+### 4. Package the project
+
+ * Build the project documentation
+ * Package the project for distribution
+
 ## What is data engineering convention?
 
 [Bruce Philp](https://github.com/bruceaphilp) and [Guilherme Braccialli](https://github.com/gbraccialli-qb) are the
@@ -166,10 +152,115 @@ There are a host of articles, podcasts, talks and Kedro showcase projects in the
 
 Our preferred Kedro-community channel for feedback is through [GitHub issues](https://github.com/kedro-org/kedro/issues). We update the codebase regularly; you can find news about updates and features in the [RELEASE.md file on the Github repository](https://github.com/kedro-org/kedro/blob/develop/RELEASE.md).
 
+## Who maintains Kedro?
+
+Kedro was originally designed by [Aris Valtazanos](https://github.com/arisvqb) and [Nikolaos Tsaousis](https://github.com/tsanikgr) at QuantumBlack to solve challenges they faced in their project work. Their work was later turned into an internal product by [Peteris Erins](https://github.com/Pet3ris), [Ivan Danov](https://github.com/idanov), [Nikolaos Kaltsas](https://github.com/nikos-kal), [Meisam Emamjome](https://github.com/misamae) and [Nikolaos Tsaousis](https://github.com/tsanikgr). In the project's latest iteration it is an incubating project within [LF AI & Data](https://lfaidata.foundation/).
+
+Currently, the core Kedro team consists of
+[Ahdra Merali](https://github.com/AhdraMeraliQB),
+[Andrew Mackay](https://github.com/Mackay031),
+[Ankita Katiyar](https://github.com/ankatiyar),
+[Antony Milne](https://github.com/AntonyMilneQB),
+[Cvetanka Nechevska](https://github.com/cvetankanechevska),
+[Deepyaman Datta](https://github.com/deepyaman),
+[Gabriel Comym](https://github.com/comym),
+[Huong Nguyen](https://github.com/Huongg),
+[Ivan Danov](https://github.com/idanov),
+[Jannic Holzer](https://github.com/jmholzer),
+[Jo Stichbury](https://github.com/stichbury),
+[Joel Schwarzmann](https://github.com/datajoely),
+[Lim Hoang](https://github.com/limdauto),
+[Merel Theisen](https://github.com/merelcht),
+[Nero Okwa](https://github.com/NeroOkwa),
+[Nok Lam Chan](https://github.com/noklam),
+[Rashida Kanchwala](https://github.com/rashidakanchwala),
+[Sajid Alam](https://github.com/SajidAlamQB),
+[Tynan DeBold](https://github.com/tynandebold) and
+[Yetunde Dada](https://github.com/yetudada).
+
+Former core team members with significant contributions include:
+[Andrii Ivaniuk](https://github.com/andrii-ivaniuk),
+[Anton Kirilenko](https://github.com/Flid),
+[Dmitrii Deriabin](https://github.com/dmder),
+[Gordon Wrigley](https://github.com/tolomea),
+[Hamza Oza](https://github.com/hamzaoza),
+[Ignacio Paricio](https://github.com/ignacioparicio),
+[Jiri Klein](https://github.com/jiriklein),
+[Kiyohito Kunii](https://github.com/921kiyo),
+[Laís Carvalho](https://github.com/laisbsc),
+[Liam Brummitt](https://github.com/bru5),
+[Lorena Bălan](https://github.com/lorenabalan),
+[Nasef Khan](https://github.com/nakhan98),
+[Richard Westenra](https://github.com/richardwestenra),
+[Susanna Wong](https://github.com/studioswong) and
+[Zain Patel](https://github.com/mzjp2).
+
+And last, but not least, all the open-source contributors whose work went into all Kedro [releases](https://github.com/kedro-org/kedro/blob/main/RELEASE.md).
+
+
 ## How can I cite Kedro?
 
 If you're an academic, Kedro can also help you, for example, as a tool to solve the problem of reproducible research. Use the "Cite this repository" button on [our repository](https://github.com/kedro-org/kedro) to generate a citation from the [CITATION.cff file](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files).
 
+## Can I create a virtual environment without `conda`?
+
+You can use `venv` or `pipenv` instead.
+
+### `venv` (instead of `conda`)
+
+If you use Python 3, you should already have the `venv` module installed with the standard library. Create a directory for working with Kedro within your virtual environment:
+
+```bash
+mkdir kedro-environment && cd kedro-environment
+```
+
+This will create a `kedro-environment` directory in your current working directory. Next, to create a new virtual environment in this directory, run:
+
+```bash
+python -m venv env/kedro-environment # macOS / Linux
+python -m venv env\kedro-environment # Windows
+```
+
+Activate this virtual environment:
+
+```bash
+source env/kedro-environment/bin/activate # macOS / Linux
+.\env\kedro-environment\Scripts\activate # Windows
+```
+
+To exit the environment:
+
+```bash
+deactivate
+```
+
+### `pipenv` (instead of `conda`)
+
+Install `pipenv` as follows:
+
+```bash
+pip install pipenv
+```
+
+Create a directory for the virtual environment and change to that directory:
+
+```bash
+mkdir kedro-environment && cd kedro-environment
+```
+
+Once all the dependencies are installed, to start a session with the correct virtual environment activated:
+
+```bash
+pipenv shell
+```
+
+To exit the shell session:
+
+```bash
+exit
+```
+
+
 ## How can I get my question answered?
 
-If your question isn't answered above, check out the [searchable archive from our retired Discord server](https://linen-discord.kedro.org/community or post a new query on the [Slack organisation](https://join.slack.com/t/kedro-org/shared_invite/zt-1eicp0iw6-nkBvDlfAYb1AUJV7DgBIvw).
+If your question isn't answered above, talk to the community on the [Kedro Slack channels](https://join.slack.com/t/kedro-org/shared_invite/zt-1eicp0iw6-nkBvDlfAYb1AUJV7DgBIvw).
diff --git a/docs/source/get_started/install.md b/docs/source/get_started/install.md
index b62f0125ad..66eac9a859 100644
--- a/docs/source/get_started/install.md
+++ b/docs/source/get_started/install.md
@@ -7,11 +7,7 @@ pip install kedro
 ```
 
 ```{note}
-It is also possible to install Kedro using `conda`, as follows, but we recommend you use `pip` at this point to eliminate any potential dependency issues:
-```
-
-```bash
-conda install -c conda-forge kedro
+It is also possible to install Kedro using `conda install -c conda-forge kedro` but we recommend you use `pip` at this point to eliminate any potential dependency issues:
 ```
 
 Both `pip` and `conda` install the core Kedro module, which includes the CLI tool, project template, pipeline abstraction, framework, and support for configuration.
diff --git a/docs/source/get_started/prerequisites.md b/docs/source/get_started/prerequisites.md
index b492502bca..daa2bc00a0 100644
--- a/docs/source/get_started/prerequisites.md
+++ b/docs/source/get_started/prerequisites.md
@@ -1,100 +1,51 @@
 # Installation prerequisites
 
-- Kedro supports macOS, Linux and Windows (7 / 8 / 10 and Windows Server 2016+). If you
- encounter any problems on these platforms, please check the [frequently asked questions](../faq/faq.md), the [searchable archive from our retired Discord server](https://linen-discord.kedro.org) or post a new query on the [Slack organisation](https://join.slack.com/t/kedro-org/shared_invite/zt-1eicp0iw6-nkBvDlfAYb1AUJV7DgBIvw).
+Kedro supports macOS, Linux and Windows. If you encounter any problems on these platforms, please check the [frequently asked questions](../faq/faq.md), the [searchable archive from our retired Discord server](https://linen-discord.kedro.org) or post a new query on the [Slack organisation](https://join.slack.com/t/kedro-org/shared_invite/zt-1eicp0iw6-nkBvDlfAYb1AUJV7DgBIvw).
 
-- To work with Kedro, we highly recommend that you [download and install Anaconda](https://www.anaconda.com/products/individual#Downloads) (Python 3.x version).
-
-- If you use PySpark, you must also [install Java](https://www.oracle.com/java/technologies/javase-downloads.html). If you are a Windows user, you will need admin rights to complete the installation.
-
-## Virtual environments
-
-The main purpose of Python virtual environments is to create an isolated environment for a Python project to have its own dependencies, regardless of other projects. We recommend you create a new virtual environment for *each* new Kedro project you create.
-
-> [Read more about Python Virtual Environments](https://realpython.com/python-virtual-environments-a-primer/).
-
-Depending on your preferred Python installation, you can create virtual environments to work with Kedro as follows:
-
-- With [`conda`](#conda), a package and environment manager program bundled with Anaconda
-
-- Without Anaconda, using [`venv`](#venv-instead-of-conda) or [`pipenv`](#pipenv-instead-of-conda)
-
-### `conda`
-
-[Install `conda`](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) on your computer.
-
-Create a new Python virtual environment, called `kedro-environment`, using `conda`:
-
-```bash
-conda create --name kedro-environment python=3.7 -y
-```
-
-This will create an isolated Python 3.7 environment. To activate it:
+If you are a Windows user, you will need to install [`git`](https://git-scm.com/) onto your machine if you do not have it. To confirm whether you have it installed:
 
 ```bash
-conda activate kedro-environment
+git -v
 ```
 
-To exit `kedro-environment`:
-
-```bash
-conda deactivate
-```
+You should see the version of `git` available, or an error message to indicate that it is not installed.
 
-```{note}
-The `conda` virtual environment is not dependent on your current working directory and can be activated from any directory.
-```
+PySpark users must [install Java](https://www.oracle.com/java/technologies/javase-downloads.html) (if you are working on Windows, you will need admin rights to complete the installation).
 
-### `venv` (instead of `conda`)
+## Virtual environments
+We recommend that you create a new Python virtual environment for *each* new Kedro project you create. A virtual environment creates an isolated environment for a Python project to have its own dependencies, regardless of other projects.
 
-If you use Python 3, you should already have the `venv` module installed with the standard library. Create a directory for working with Kedro within your virtual environment:
+If you don't already have it, you should [download and install Anaconda](https://www.anaconda.com/products/individual#Downloads) (Python 3.x version), which comes bundled with a package and environment manager called `conda`.
 
-```bash
-mkdir kedro-environment && cd kedro-environment
-```
+> [Read more about Python virtual environments](https://realpython.com/python-virtual-environments-a-primer/) or [watch an explainer video about them](https://youtu.be/YKfAwIItO7M).
 
-This will create a `kedro-environment` directory in your current working directory. Next, to create a new virtual environment in this directory, run:
 
-```bash
-python -m venv env/kedro-environment # macOS / Linux
-python -m venv env\kedro-environment # Windows
-```
+Depending on your preferred Python installation, you can also create virtual environments to work with Kedro using `venv` or `pipenv` instead of `conda`. Further information about these can be found in the [FAQ](../faq/faq.md)
 
-Activate this virtual environment:
+### Create a virtual environment with `conda`
 
-```bash
-source env/kedro-environment/bin/activate # macOS / Linux
-.\env\kedro-environment\Scripts\activate # Windows
-```
+1. [Install `conda`](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) on your computer.
 
-To exit the environment:
+2. Create a new Python virtual environment, called `kedro-environment`, using `conda`:
 
 ```bash
-deactivate
+conda create --name kedro-environment python=3.8 -y
 ```
 
-### `pipenv` (instead of `conda`)
+This will create an isolated Python 3.8 environment.
 
-Install `pipenv` as follows:
+3. Activate the new environment:
 
 ```bash
-pip install pipenv
-```
-
-Create a directory for the virtual environment and change to that directory:
-
-```bash
-mkdir kedro-environment && cd kedro-environment
+conda activate kedro-environment
 ```
 
-Once all the dependencies are installed, to start a session with the correct virtual environment activated:
+4. To exit `kedro-environment`:
 
 ```bash
-pipenv shell
+conda deactivate
 ```
 
-To exit the shell session:
-
-```bash
-exit
+```{note}
+The `conda` virtual environment is not dependent on your current working directory and can be activated from any directory.
 ```
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 13fcf4b3ca..c6458ccb3d 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -78,11 +78,19 @@ Welcome to Kedro's documentation!
  tutorial/spaceflights_tutorial
  tutorial/tutorial_template
  tutorial/set_up_data
- tutorial/create_pipelines
- tutorial/visualise_pipeline
- tutorial/namespace_pipelines
- tutorial/set_up_experiment_tracking
+ tutorial/create_a_pipeline
+ tutorial/add_another_pipeline
  tutorial/package_a_project
+ tutorial/spaceflights_tutorial_faqs
+
+.. toctree::
+ :maxdepth: 2
+ :caption: Visualisation with Kedro-Viz
+
+ visualisation/kedro-viz_visualisation
+ visualisation/visualise_charts_with_plotly
+ visualisation/experiment_tracking
+
 
 .. toctree::
  :maxdepth: 2
diff --git a/docs/source/logging/experiment_tracking.md b/docs/source/logging/experiment_tracking.md
index 2d230b50f6..47802af01b 100644
--- a/docs/source/logging/experiment_tracking.md
+++ b/docs/source/logging/experiment_tracking.md
@@ -10,7 +10,7 @@ However, Kedro was missing a way to log metrics and capture all this logged data
 
 Experiment tracking in Kedro adds in the missing pieces and will be developed incrementally.
 
-The following section outlines the setup within your Kedro project to enable experiment tracking. You can also refer to the [tutorial on setting up experiment tracking](../tutorial/set_up_experiment_tracking.md) for a step-by-step process to access your tracking datasets on Kedro-Viz.
+The following section outlines the setup within your Kedro project to enable experiment tracking. You can also refer to the [Kedro Viz documentation about experiment tracking](../visualisation/experiment_tracking.md) for a step-by-step process to access your tracking datasets on Kedro-Viz.
 
 ## Enable experiment tracking
 
diff --git a/docs/source/meta/images/coffee-cup.png b/docs/source/meta/images/coffee-cup.png
new file mode 100644
index 0000000000..e394bb0dc1
Binary files /dev/null and b/docs/source/meta/images/coffee-cup.png differ
diff --git a/docs/source/meta/images/moon-rocket.png b/docs/source/meta/images/moon-rocket.png
new file mode 100644
index 0000000000..09f4efc52f
Binary files /dev/null and b/docs/source/meta/images/moon-rocket.png differ
diff --git a/docs/source/nodes_and_pipelines/modular_pipelines.md b/docs/source/nodes_and_pipelines/modular_pipelines.md
index 7d0ae7000e..77212393b8 100644
--- a/docs/source/nodes_and_pipelines/modular_pipelines.md
+++ b/docs/source/nodes_and_pipelines/modular_pipelines.md
@@ -27,7 +27,7 @@ In this section, you will learn about how to take advantage of modular pipelines
 3. **The `kedro.pipeline.modular_pipeline.pipeline` wrapper method unlocks the real power of modular pipelines**
  * Applying [namespaces](https://en.wikipedia.org/wiki/Namespace) allows you to simplify your mental model and isolate 'within pipeline' processing steps.
- * ``Kedro-Viz`` is able to accelerate development by [rendering namespaced](../tutorial/visualise_pipeline.md) pipelines as collapsible 'super nodes'.
+ * ``Kedro-Viz`` is able to accelerate development by rendering namespaced pipelines as collapsible 'super nodes'.