Tutorial documentation (Spaceflights) improvements #1967

Merged Nov 14, 2022 (45 commits)
c9562ec
Update first two pages of the tutorial
stichbury Oct 20, 2022
66f770f
Update tutorial docs
stichbury Oct 20, 2022
45537b0
Tutorial improvements
stichbury Oct 24, 2022
36e1e90
Update create pipelines docs
stichbury Oct 25, 2022
df7a44a
Further spaceflights enhancements
stichbury Oct 25, 2022
0753c7c
Merge branch 'main' into kedro-1874-spaceflights-improvements
stichbury Oct 26, 2022
6346146
Updates
stichbury Oct 26, 2022
05c8faa
More changes to docs structure and content
stichbury Oct 30, 2022
bf40827
Merge branch 'main' into kedro-1874-spaceflights-improvements
stichbury Oct 30, 2022
e8bf8f1
Few more tweaks
stichbury Oct 30, 2022
72063a0
Update kedro-viz_visualisation.md
stichbury Nov 1, 2022
4813607
Update build-docs.sh
stichbury Nov 1, 2022
da279d4
Update .gitignore to add docs/temp directory
stichbury Nov 1, 2022
a546b13
Update docs/source/tutorial/create_pipelines.md
stichbury Nov 1, 2022
a4d40a1
Merge branch 'kedro-1874-spaceflights-improvements' of https://github…
stichbury Nov 1, 2022
74f0e9a
Merge branch 'main' into kedro-1874-spaceflights-improvements
stichbury Nov 1, 2022
c50f32b
More changes to add modular pipelines
stichbury Nov 1, 2022
4d52da5
Merge branch 'kedro-1874-spaceflights-improvements' of https://github…
stichbury Nov 1, 2022
7ae6371
Few more updates
stichbury Nov 2, 2022
7a24d8d
Pre-commit checks
stichbury Nov 2, 2022
f517323
Merge branch 'main' into kedro-1874-spaceflights-improvements
stichbury Nov 2, 2022
2ed06f0
Fix a few issues
stichbury Nov 2, 2022
0b482a0
Merge branch 'kedro-1874-spaceflights-improvements' of https://github…
stichbury Nov 2, 2022
5dd4c68
Update docs/source/tutorial/set_up_data.md
stichbury Nov 4, 2022
1048b6e
Update docs/source/tutorial/set_up_data.md
stichbury Nov 4, 2022
49138e2
Update docs/source/tutorial/spaceflights_tutorial.md
stichbury Nov 4, 2022
7a34ec0
Merge branch 'main' into kedro-1874-spaceflights-improvements
stichbury Nov 7, 2022
c09cbbb
Merge changes by picklejuicedev
stichbury Nov 8, 2022
887553e
Update add_another_pipeline.md
stichbury Nov 8, 2022
5801bec
More changes following feedback
stichbury Nov 8, 2022
4abd5e4
Further updates following review
stichbury Nov 9, 2022
fa70a8b
Update docs/source/tutorial/package_a_project.md
stichbury Nov 9, 2022
5bd2d3c
Update docs/source/visualisation/kedro-viz_visualisation.md
stichbury Nov 9, 2022
a6aa4e9
Merge branch 'main' into kedro-1874-spaceflights-improvements
stichbury Nov 9, 2022
577ebb0
Further changes following feedback
stichbury Nov 9, 2022
1df4c1c
Few more changes
stichbury Nov 10, 2022
79d12de
Final tweaks
stichbury Nov 10, 2022
c35793a
Merge branch 'main' into kedro-1874-spaceflights-improvements
stichbury Nov 10, 2022
e00f884
Update docs/source/tutorial/package_a_project.md
stichbury Nov 10, 2022
3e937f3
Final changes
stichbury Nov 10, 2022
5f2af7d
Merge branch 'kedro-1874-spaceflights-improvements' of https://github…
stichbury Nov 10, 2022
93099d4
Linted
stichbury Nov 10, 2022
e238fc0
Merge branch 'main' into kedro-1874-spaceflights-improvements
stichbury Nov 10, 2022
e145599
Update experiment_tracking.md
stichbury Nov 10, 2022
335863b
Merge branch 'main' into kedro-1874-spaceflights-improvements
stichbury Nov 11, 2022
1 change: 1 addition & 0 deletions .gitignore
@@ -154,6 +154,7 @@ kedro.db
kedro/html
docs/tmp-build-artifacts
docs/build
docs/temp
docs/node_modules
docs/source/04_user_guide/source/.ipynb
tests/template/fake_project/
13 changes: 12 additions & 1 deletion docs/build-docs.sh
@@ -24,4 +24,15 @@ fi

# Clean up build artefacts
rm -rf docs/build/html/_sources
rm -rf docs/build/[0-9][0-9]_*

# Copy built HTML to temp directory, clean up build dir and replace with built docs only
rm -rf docs/temp
mkdir docs/temp/
mkdir docs/temp/html
cp -rf docs/build/html/* docs/temp/html

rm -rf docs/build
mkdir docs/build
mkdir docs/build/html
cp -rf docs/temp/html/* docs/build/html
rm -rf docs/temp
183 changes: 137 additions & 46 deletions docs/source/faq/faq.md
@@ -9,51 +9,6 @@ Kedro is an open-source Python framework for creating reproducible, maintainable

For the source code, take a look at the [Kedro repository on GitHub](https://github.com/kedro-org/kedro).

## Who maintains Kedro?

Kedro was originally designed by [Aris Valtazanos](https://github.com/arisvqb) and [Nikolaos Tsaousis](https://github.com/tsanikgr) at QuantumBlack to solve challenges they faced in their project work. Their work was later turned into an internal product by [Peteris Erins](https://github.com/Pet3ris), [Ivan Danov](https://github.com/idanov), [Nikolaos Kaltsas](https://github.com/nikos-kal), [Meisam Emamjome](https://github.com/misamae) and [Nikolaos Tsaousis](https://github.com/tsanikgr). In the project's latest iteration it is an incubating project within [LF AI & Data](https://lfaidata.foundation/).

Currently, the core Kedro team consists of
[Ahdra Merali](https://github.com/AhdraMeraliQB),
[Andrew Mackay](https://github.com/Mackay031),
[Ankita Katiyar](https://github.com/ankatiyar),
[Antony Milne](https://github.com/AntonyMilneQB),
[Cvetanka Nechevska](https://github.com/cvetankanechevska),
[Deepyaman Datta](https://github.com/deepyaman),
[Gabriel Comym](https://github.com/comym),
[Huong Nguyen](https://github.com/Huongg),
[Ivan Danov](https://github.com/idanov),
[Jannic Holzer](https://github.com/jmholzer),
[Jo Stichbury](https://github.com/stichbury),
[Joel Schwarzmann](https://github.com/datajoely),
[Lim Hoang](https://github.com/limdauto),
[Merel Theisen](https://github.com/merelcht),
[Nero Okwa](https://github.com/NeroOkwa),
[Nok Lam Chan](https://github.com/noklam),
[Rashida Kanchwala](https://github.com/rashidakanchwala),
[Sajid Alam](https://github.com/SajidAlamQB),
[Tynan DeBold](https://github.com/tynandebold) and
[Yetunde Dada](https://github.com/yetudada).

Former core team members with significant contributions include:
[Andrii Ivaniuk](https://github.com/andrii-ivaniuk),
[Anton Kirilenko](https://github.com/Flid),
[Dmitrii Deriabin](https://github.com/dmder),
[Gordon Wrigley](https://github.com/tolomea),
[Hamza Oza](https://github.com/hamzaoza),
[Ignacio Paricio](https://github.com/ignacioparicio),
[Jiri Klein](https://github.com/jiriklein),
[Kiyohito Kunii](https://github.com/921kiyo),
[Laís Carvalho](https://github.com/laisbsc),
[Liam Brummitt](https://github.com/bru5),
[Lorena Bălan](https://github.com/lorenabalan),
[Nasef Khan](https://github.com/nakhan98),
[Richard Westenra](https://github.com/richardwestenra),
[Susanna Wong](https://github.com/studioswong) and
[Zain Patel](https://github.com/mzjp2).

And last, but not least, all the open-source contributors whose work went into all Kedro [releases](https://github.com/kedro-org/kedro/blob/main/RELEASE.md).

## What are the primary advantages of Kedro?

If you're a Data Scientist, then you should be interested in Kedro because it enables you to:
@@ -97,6 +52,37 @@ The responsibility of _"What time will this pipeline run?"_, _"How do I manage m
failed?"_ is left to the orchestrators. We also have deployment guidelines for using orchestrators as deployment
targets and are working in collaboration with the maintainers of some of those tools to make the deployment experience as enjoyable as possible.

## What is the typical Kedro project development workflow?

When you build a Kedro project, you will typically follow a standard development workflow:

![](../meta/images/typical_workflow.png)

### 1. Set up the project template

* Create a new project with `kedro new`
* Install project dependencies with `pip install -r src/requirements.txt`
* Configure the following in the `conf` folder:
  * Logging
  * Credentials
  * Any other sensitive / personal content

### 2. Set up the data

* Add data to the `data` folder
* Reference all datasets for the project in the `conf/base/catalog.yml` file

### 3. Create the pipeline

* Create the data transformation steps as Python functions
* Add your functions as nodes, to construct the pipeline
* Choose how to run the pipeline: sequentially or in parallel

### 4. Package the project

* Build the project documentation
* Package the project for distribution
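
Step 3 of the workflow above (write transformations as Python functions, wrap them as nodes, run them in order) can be sketched in plain Python. This is only an illustration of the idea: the `Node` tuple, the `run_sequentially` helper and the sample data are hypothetical and are not Kedro's API.

```python
from typing import Callable, NamedTuple


class Node(NamedTuple):
    """Hypothetical stand-in for a pipeline node (not Kedro's own class)."""

    func: Callable
    inputs: list
    output: str


def drop_missing(rows):
    """Transformation step: remove rows that contain a missing value."""
    return [row for row in rows if None not in row.values()]


def total_price(rows):
    """Transformation step: sum the price column."""
    return sum(row["price"] for row in rows)


def run_sequentially(nodes, data):
    """Run nodes in list order, storing each output under its name."""
    for n in nodes:
        args = [data[name] for name in n.inputs]
        data[n.output] = n.func(*args)
    return data


# The node list plays the role of the pipeline; the dict plays the role of a data catalog.
nodes = [
    Node(drop_missing, ["raw"], "clean"),
    Node(total_price, ["clean"], "total"),
]
raw = [{"price": 10}, {"price": None}, {"price": 5}]
result = run_sequentially(nodes, {"raw": raw})
print(result["total"])
```

Each function stays free of I/O and framework code, so it can be tested on its own before it is wired into a pipeline.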

## What is data engineering convention?

[Bruce Philp](https://github.com/bruceaphilp) and [Guilherme Braccialli](https://github.com/gbraccialli-qb) are the
@@ -166,10 +152,115 @@ There are a host of articles, podcasts, talks and Kedro showcase projects in the

Our preferred Kedro-community channel for feedback is through [GitHub issues](https://github.com/kedro-org/kedro/issues). We update the codebase regularly; you can find news about updates and features in the [RELEASE.md file on the GitHub repository](https://github.com/kedro-org/kedro/blob/develop/RELEASE.md).

## Who maintains Kedro?

Kedro was originally designed by [Aris Valtazanos](https://github.com/arisvqb) and [Nikolaos Tsaousis](https://github.com/tsanikgr) at QuantumBlack to solve challenges they faced in their project work. Their work was later turned into an internal product by [Peteris Erins](https://github.com/Pet3ris), [Ivan Danov](https://github.com/idanov), [Nikolaos Kaltsas](https://github.com/nikos-kal), [Meisam Emamjome](https://github.com/misamae) and [Nikolaos Tsaousis](https://github.com/tsanikgr). In the project's latest iteration it is an incubating project within [LF AI & Data](https://lfaidata.foundation/).

Currently, the core Kedro team consists of
[Ahdra Merali](https://github.com/AhdraMeraliQB),
[Andrew Mackay](https://github.com/Mackay031),
[Ankita Katiyar](https://github.com/ankatiyar),
[Antony Milne](https://github.com/AntonyMilneQB),
[Cvetanka Nechevska](https://github.com/cvetankanechevska),
[Deepyaman Datta](https://github.com/deepyaman),
[Gabriel Comym](https://github.com/comym),
[Huong Nguyen](https://github.com/Huongg),
[Ivan Danov](https://github.com/idanov),
[Jannic Holzer](https://github.com/jmholzer),
[Jo Stichbury](https://github.com/stichbury),
[Joel Schwarzmann](https://github.com/datajoely),
[Lim Hoang](https://github.com/limdauto),
[Merel Theisen](https://github.com/merelcht),
[Nero Okwa](https://github.com/NeroOkwa),
[Nok Lam Chan](https://github.com/noklam),
[Rashida Kanchwala](https://github.com/rashidakanchwala),
[Sajid Alam](https://github.com/SajidAlamQB),
[Tynan DeBold](https://github.com/tynandebold) and
[Yetunde Dada](https://github.com/yetudada).

Former core team members with significant contributions include:
[Andrii Ivaniuk](https://github.com/andrii-ivaniuk),
[Anton Kirilenko](https://github.com/Flid),
[Dmitrii Deriabin](https://github.com/dmder),
[Gordon Wrigley](https://github.com/tolomea),
[Hamza Oza](https://github.com/hamzaoza),
[Ignacio Paricio](https://github.com/ignacioparicio),
[Jiri Klein](https://github.com/jiriklein),
[Kiyohito Kunii](https://github.com/921kiyo),
[Laís Carvalho](https://github.com/laisbsc),
[Liam Brummitt](https://github.com/bru5),
[Lorena Bălan](https://github.com/lorenabalan),
[Nasef Khan](https://github.com/nakhan98),
[Richard Westenra](https://github.com/richardwestenra),
[Susanna Wong](https://github.com/studioswong) and
[Zain Patel](https://github.com/mzjp2).

And last, but not least, all the open-source contributors whose work went into all Kedro [releases](https://github.com/kedro-org/kedro/blob/main/RELEASE.md).


## How can I cite Kedro?

If you're an academic, Kedro can also help you, for example, as a tool to solve the problem of reproducible research. Use the "Cite this repository" button on [our repository](https://github.com/kedro-org/kedro) to generate a citation from the [CITATION.cff file](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files).

## Can I create a virtual environment without `conda`?

You can use `venv` or `pipenv` instead.

### `venv` (instead of `conda`)

If you use Python 3, you should already have the `venv` module installed with the standard library. Create a directory for working with Kedro within your virtual environment:

```bash
mkdir kedro-environment && cd kedro-environment
```

This will create a `kedro-environment` directory in your current working directory. Next, to create a new virtual environment in this directory, run:

```bash
python -m venv env/kedro-environment # macOS / Linux
python -m venv env\kedro-environment # Windows
```

Activate this virtual environment:

```bash
source env/kedro-environment/bin/activate # macOS / Linux
.\env\kedro-environment\Scripts\activate # Windows
```

To exit the environment:

```bash
deactivate
```
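
The same `venv` module can also be driven from Python rather than the command line. The following is a minimal sketch, assuming a throwaway temporary directory and `with_pip=False` to keep it quick (the `python -m venv` command installs `pip` by default); the directory names simply mirror the commands above.

```python
import tempfile
import venv
from pathlib import Path

# Work in a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "env" / "kedro-environment"
    # Assumption: skip pip to keep the example fast.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    # Every environment created by venv contains a pyvenv.cfg file at its root.
    created = (env_dir / "pyvenv.cfg").is_file()
    print(created)
```

For day-to-day work the shell commands are simpler; the programmatic form is mainly useful in setup scripts.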

### `pipenv` (instead of `conda`)

Install `pipenv` as follows:

```bash
pip install pipenv
```

Create a directory for the virtual environment and change to that directory:

```bash
mkdir kedro-environment && cd kedro-environment
```

Once all the dependencies are installed, to start a session with the correct virtual environment activated:

```bash
pipenv shell
```

To exit the shell session:

```bash
exit
```


## How can I get my question answered?

If your question isn't answered above, check out the [searchable archive from our retired Discord server](https://linen-discord.kedro.org/community) or post a new query on the [Slack organisation](https://join.slack.com/t/kedro-org/shared_invite/zt-1eicp0iw6-nkBvDlfAYb1AUJV7DgBIvw).
If your question isn't answered above, talk to the community on the [Kedro Slack channels](https://join.slack.com/t/kedro-org/shared_invite/zt-1eicp0iw6-nkBvDlfAYb1AUJV7DgBIvw).
6 changes: 1 addition & 5 deletions docs/source/get_started/install.md
@@ -7,11 +7,7 @@ pip install kedro
```

```{note}
It is also possible to install Kedro using `conda`, as follows, but we recommend you use `pip` at this point to eliminate any potential dependency issues:
```

```bash
conda install -c conda-forge kedro
It is also possible to install Kedro using `conda install -c conda-forge kedro`, but we recommend you use `pip` at this point to eliminate any potential dependency issues.
```

Both `pip` and `conda` install the core Kedro module, which includes the CLI tool, project template, pipeline abstraction, framework, and support for configuration.
91 changes: 21 additions & 70 deletions docs/source/get_started/prerequisites.md
@@ -1,100 +1,51 @@
# Installation prerequisites

- Kedro supports macOS, Linux and Windows (7 / 8 / 10 and Windows Server 2016+). If you
encounter any problems on these platforms, please check the [frequently asked questions](../faq/faq.md), the [searchable archive from our retired Discord server](https://linen-discord.kedro.org) or post a new query on the [Slack organisation](https://join.slack.com/t/kedro-org/shared_invite/zt-1eicp0iw6-nkBvDlfAYb1AUJV7DgBIvw).
Kedro supports macOS, Linux and Windows. If you encounter any problems on these platforms, please check the [frequently asked questions](../faq/faq.md), the [searchable archive from our retired Discord server](https://linen-discord.kedro.org) or post a new query on the [Slack organisation](https://join.slack.com/t/kedro-org/shared_invite/zt-1eicp0iw6-nkBvDlfAYb1AUJV7DgBIvw).

- To work with Kedro, we highly recommend that you [download and install Anaconda](https://www.anaconda.com/products/individual#Downloads) (Python 3.x version).

- If you use PySpark, you must also [install Java](https://www.oracle.com/java/technologies/javase-downloads.html). If you are a Windows user, you will need admin rights to complete the installation.

## Virtual environments

The main purpose of Python virtual environments is to create an isolated environment for a Python project to have its own dependencies, regardless of other projects. We recommend you create a new virtual environment for *each* new Kedro project you create.

> [Read more about Python Virtual Environments](https://realpython.com/python-virtual-environments-a-primer/).

Depending on your preferred Python installation, you can create virtual environments to work with Kedro as follows:

- With [`conda`](#conda), a package and environment manager program bundled with Anaconda

- Without Anaconda, using [`venv`](#venv-instead-of-conda) or [`pipenv`](#pipenv-instead-of-conda)

### `conda`

[Install `conda`](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) on your computer.

Create a new Python virtual environment, called `kedro-environment`, using `conda`:

```bash
conda create --name kedro-environment python=3.7 -y
```

This will create an isolated Python 3.7 environment. To activate it:
If you are a Windows user, you will need to install [`git`](https://git-scm.com/) onto your machine if you do not have it. To confirm whether you have it installed:

```bash
conda activate kedro-environment
git --version
```

To exit `kedro-environment`:

```bash
conda deactivate
```
You should see the version of `git` available, or an error message to indicate that it is not installed.
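
The same check can be scripted, for example in a project setup script. This is a minimal sketch using only the Python standard library; the `tool_version` helper is a hypothetical name introduced for illustration.

```python
import shutil
import subprocess


def tool_version(name):
    """Return the tool's version string, or None if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


print(tool_version("git") or "git is not installed")
```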

```{note}
The `conda` virtual environment is not dependent on your current working directory and can be activated from any directory.
```
PySpark users must [install Java](https://www.oracle.com/java/technologies/javase-downloads.html) (if you are working on Windows, you will need admin rights to complete the installation).

### `venv` (instead of `conda`)
## Virtual environments
We recommend that you create a new Python virtual environment for *each* new Kedro project you create. A virtual environment creates an isolated environment for a Python project to have its own dependencies, regardless of other projects.

If you use Python 3, you should already have the `venv` module installed with the standard library. Create a directory for working with Kedro within your virtual environment:
If you don't already have it, you should [download and install Anaconda](https://www.anaconda.com/products/individual#Downloads) (Python 3.x version), which comes bundled with a package and environment manager called `conda`.

```bash
mkdir kedro-environment && cd kedro-environment
```
> [Read more about Python virtual environments](https://realpython.com/python-virtual-environments-a-primer/) or [watch an explainer video about them](https://youtu.be/YKfAwIItO7M).

This will create a `kedro-environment` directory in your current working directory. Next, to create a new virtual environment in this directory, run:

```bash
python -m venv env/kedro-environment # macOS / Linux
python -m venv env\kedro-environment # Windows
```
Depending on your preferred Python installation, you can also create virtual environments to work with Kedro using `venv` or `pipenv` instead of `conda`. Further information about these can be found in the [FAQ](../faq/faq.md).

Activate this virtual environment:
### Create a virtual environment with `conda`

```bash
source env/kedro-environment/bin/activate # macOS / Linux
.\env\kedro-environment\Scripts\activate # Windows
```
1. [Install `conda`](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) on your computer.

To exit the environment:
2. Create a new Python virtual environment, called `kedro-environment`, using `conda`:

```bash
deactivate
conda create --name kedro-environment python=3.8 -y
```

### `pipenv` (instead of `conda`)
This will create an isolated Python 3.8 environment.

Install `pipenv` as follows:
3. Activate the new environment:

```bash
pip install pipenv
```

Create a directory for the virtual environment and change to that directory:

```bash
mkdir kedro-environment && cd kedro-environment
conda activate kedro-environment
```

Once all the dependencies are installed, to start a session with the correct virtual environment activated:
4. To exit `kedro-environment`:

```bash
pipenv shell
conda deactivate
```

To exit the shell session:

```bash
exit
```{note}
The `conda` virtual environment is not dependent on your current working directory and can be activated from any directory.
```
16 changes: 12 additions & 4 deletions docs/source/index.rst
@@ -78,11 +78,19 @@ Welcome to Kedro's documentation!
tutorial/spaceflights_tutorial
tutorial/tutorial_template
tutorial/set_up_data
tutorial/create_pipelines
tutorial/visualise_pipeline
tutorial/namespace_pipelines
tutorial/set_up_experiment_tracking
tutorial/create_a_pipeline
tutorial/add_another_pipeline
tutorial/package_a_project
tutorial/spaceflights_tutorial_faqs

.. toctree::
:maxdepth: 2
:caption: Visualisation with Kedro-Viz

visualisation/kedro-viz_visualisation
visualisation/visualise_charts_with_plotly
visualisation/experiment_tracking


.. toctree::
:maxdepth: 2
2 changes: 1 addition & 1 deletion docs/source/logging/experiment_tracking.md
@@ -10,7 +10,7 @@ However, Kedro was missing a way to log metrics and capture all this logged data

Experiment tracking in Kedro adds in the missing pieces and will be developed incrementally.

The following section outlines the setup within your Kedro project to enable experiment tracking. You can also refer to the [tutorial on setting up experiment tracking](../tutorial/set_up_experiment_tracking.md) for a step-by-step process to access your tracking datasets on Kedro-Viz.
The following section outlines the setup within your Kedro project to enable experiment tracking. You can also refer to the [Kedro Viz documentation about experiment tracking](../visualisation/experiment_tracking.md) for a step-by-step process to access your tracking datasets on Kedro-Viz.

## Enable experiment tracking

Binary file added docs/source/meta/images/coffee-cup.png
Binary file added docs/source/meta/images/moon-rocket.png