This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Update prerequisites #16

Closed
10 changes: 10 additions & 0 deletions .gitpod.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
tasks:
  - init: pip install -r spaceflights-completed/src/requirements.txt
    command: |
      cd spaceflights-jupyter
      jupyter lab --ip='*' --NotebookApp.token='' --NotebookApp.password=''
    name: jupyter

ports:
  - port: 8888
    onOpen: ignore
2 changes: 2 additions & 0 deletions README.md
@@ -1,5 +1,7 @@
# Kedro Training

[![Gitpod ready-to-code](https://img.shields.io/badge/Gitpod-ready--to--code-blue?logo=gitpod)](https://gitpod.io/#https://github.com/AntonyMilneQB/kedro-training)

This repository contains training materials that will teach you how to use [Kedro](https://github.com/quantumblacklabs/kedro/). This content is based on the standard [spaceflights tutorial described in the Kedro documentation](https://kedro.readthedocs.io/en/stable/03_tutorial/01_spaceflights_tutorial.html).

The training documentation was most recently updated against Kedro 0.17.0 in February 2021.
6 changes: 6 additions & 0 deletions spaceflights-completed/.coveragerc
@@ -0,0 +1,6 @@
[report]
fail_under=0
show_missing=True
exclude_lines =
    pragma: no cover
    raise NotImplementedError
157 changes: 157 additions & 0 deletions spaceflights-completed/.gitignore
@@ -0,0 +1,157 @@
##########################
# KEDRO PROJECT

# ignore all local configuration
conf/local/**
!conf/local/.gitkeep

# ignore potentially sensitive credentials files
conf/**/*credentials*

# ignore everything in the following folders
data/**
logs/**

# except their sub-folders
!data/**/
!logs/**/

# also keep all .gitkeep files
!.gitkeep

# keep also the example dataset
!data/01_raw/*


##########################
# Common files

# IntelliJ
.idea/
*.iml
out/
.idea_modules/

### macOS
*.DS_Store
.AppleDouble
.LSOverride
.Trashes

# Vim
*~
.*.swo
.*.swp

# emacs
*~
\#*\#
/.emacs.desktop
/.emacs.desktop.lock
*.elc

# JIRA plugin
atlassian-ide-plugin.xml

# C extensions
*.so

### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
.static_storage/
.media/
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
.ipython/profile_default/history.sqlite
.ipython/profile_default/startup/README

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# mkdocs documentation
/site

# mypy
.mypy_cache/
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
import logging.config
import sys
from pathlib import Path
from typing import Any, Dict

from IPython.core.magic import needs_local_scope, register_line_magic

# Find the project root (./../../../)
from kedro.framework.startup import _get_project_metadata

startup_error = None
project_path = Path(__file__).parents[3].resolve()


@register_line_magic
def reload_kedro(path, line=None, env: str = None, extra_params: Dict[str, Any] = None):
    """Line magic which reloads all Kedro default variables."""
    global startup_error
    global context
    global catalog
    global session

    try:
        import kedro.config.default_logger
        from kedro.framework.hooks import get_hook_manager
        from kedro.framework.project import configure_project
        from kedro.framework.session import KedroSession
        from kedro.framework.session.session import _activate_session
        from kedro.framework.cli.jupyter import collect_line_magic
    except ImportError:
        logging.error(
            "Kedro appears not to be installed in your current environment "
            "or your current IPython session was not started in a valid Kedro project."
        )
        raise

    try:
        path = path or project_path

        # clear hook manager
        hook_manager = get_hook_manager()
        name_plugin_pairs = hook_manager.list_name_plugin()
        for name, plugin in name_plugin_pairs:
            hook_manager.unregister(name=name, plugin=plugin)

        # remove cached user modules
        metadata = _get_project_metadata(path)
        to_remove = [
            mod for mod in sys.modules if mod.startswith(metadata.package_name)
        ]
        # `del` is used instead of `reload()` because: If the new version of a module does not
        # define a name that was defined by the old version, the old definition remains.
        for module in to_remove:
            del sys.modules[module]

        configure_project(metadata.package_name)
        session = KedroSession.create(
            metadata.package_name, path, env=env, extra_params=extra_params
        )
        _activate_session(session, force=True)
        logging.debug("Loading the context from %s", str(path))
        context = session.load_context()
        catalog = context.catalog

        logging.info("** Kedro project %s", str(metadata.project_name))
        logging.info("Defined global variable `context`, `session` and `catalog`")

        for line_magic in collect_line_magic():
            register_line_magic(needs_local_scope(line_magic))
            logging.info("Registered line magic `%s`", line_magic.__name__)
    except Exception as err:
        startup_error = err
        logging.exception(
            "Kedro's ipython session startup script failed:\n%s", str(err)
        )
        raise err


reload_kedro(project_path)
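
The comment in `reload_kedro` about preferring `del sys.modules[...]` over `reload()` can be illustrated with a small standard-library sketch. The module name `demo_pkg_nodes` and the helper `demo` are hypothetical and are not part of Kedro; the sketch only mirrors the behaviour the comment describes, under the assumption that a user module drops a function between two reloads.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "demo_pkg_nodes"  # hypothetical module name, not part of Kedro


def demo():
    """Return (stale name survives reload, stale name survives del + import)."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep the demo free of cached .pyc files
    with tempfile.TemporaryDirectory() as tmp:
        module_file = Path(tmp) / f"{MODULE_NAME}.py"
        module_file.write_text("def old_node():\n    return 'old'\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(MODULE_NAME)

            # The "new version" of the module no longer defines old_node.
            module_file.write_text("def renamed_node():\n    return 'new'\n")

            # reload() re-executes the source in the existing namespace,
            # so names that disappeared from the source are left behind.
            importlib.reload(module)
            stale_after_reload = hasattr(module, "old_node")

            # Dropping the cached entry forces a clean import instead.
            del sys.modules[MODULE_NAME]
            fresh = importlib.import_module(MODULE_NAME)
            stale_after_del = hasattr(fresh, "old_node")
            return stale_after_reload, stale_after_del
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)
            sys.dont_write_bytecode = old_flag


print(demo())  # (True, False)
```

After `reload()` the removed function is still reachable on the module object, while deleting the `sys.modules` entry and importing again yields a namespace that matches the file on disk.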
7 changes: 7 additions & 0 deletions spaceflights-completed/.isort.cfg
@@ -0,0 +1,7 @@
[settings]
multi_line_output=3
include_trailing_comma=True
force_grid_wrap=0
use_parentheses=True
line_length=88
known_third_party=kedro
121 changes: 121 additions & 0 deletions spaceflights-completed/README.md
@@ -0,0 +1,121 @@
# spaceflights

## Overview

This is your new Kedro project, which was generated using `Kedro 0.17.2`, with the completed version of the [spaceflights tutorial](https://kedro.readthedocs.io/en/stable/03_tutorial/01_spaceflights_tutorial.html) and the data necessary to run the project.

Take a look at the [Kedro documentation](https://kedro.readthedocs.io) to get started.

## Rules and guidelines

In order to get the best out of the template:

* Don't remove any lines from the `.gitignore` file we provide
* Make sure your results can be reproduced by following a [data engineering convention](https://kedro.readthedocs.io/en/stable/11_faq/01_faq.html#what-is-data-engineering-convention)
* Don't commit data to your repository
* Don't commit any credentials or your local configuration to your repository. Keep all your credentials and local configuration in `conf/local/`

## How to install dependencies

Declare any dependencies in `src/requirements.txt` for `pip` installation and `src/environment.yml` for `conda` installation.

To install them, run:

```
kedro install
```

## How to run Kedro

You can run your Kedro project with:

```
kedro run
```

## How to test your Kedro project

Have a look at the file `src/tests/test_run.py` for instructions on how to write your tests. You can run your tests as follows:

```
kedro test
```

To configure the coverage threshold, look at the `.coveragerc` file.
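
As a rough illustration of what the threshold setting looks like, the sketch below parses an INI-style `.coveragerc` with Python's standard `configparser`. The file contents mirror the `.coveragerc` in this project; the helper name `read_report_settings` is hypothetical, and coverage.py uses its own configuration loader, so treat this only as a way to inspect the values.

```python
import configparser

# Contents mirror the project's .coveragerc.
COVERAGERC = """\
[report]
fail_under=0
show_missing=True
exclude_lines =
    pragma: no cover
    raise NotImplementedError
"""


def read_report_settings(text: str) -> dict:
    """Parse the [report] section of an INI-style coverage config (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    report = parser["report"]
    return {
        # Minimum total coverage percentage; 0 means the check never fails.
        "fail_under": report.getint("fail_under"),
        "show_missing": report.getboolean("show_missing"),
        # Indented continuation lines form one multi-line value.
        "exclude_lines": [
            line for line in report["exclude_lines"].splitlines() if line.strip()
        ],
    }


settings = read_report_settings(COVERAGERC)
print(settings["fail_under"])
print(settings["exclude_lines"])
```

Raising `fail_under` (for example to `80`) is the usual way to make the coverage report fail when total coverage drops below that percentage.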


## Project dependencies

To generate or update the dependency requirements for your project:

```
kedro build-reqs
```

This will copy the contents of `src/requirements.txt` into a new file `src/requirements.in` which will be used as the source for [`pip-compile`](https://github.com/jazzband/pip-tools#example-usage-for-pip-compile). You can see the output of the resolution by opening `src/requirements.txt`.

After this, if you'd like to update your project requirements, please update `src/requirements.in` and re-run `kedro build-reqs`.

[Further information about project dependencies](https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/01_dependencies.html#project-specific-dependencies)

## How to work with Kedro and notebooks

> Note: Using `kedro jupyter` or `kedro ipython` to run your notebook provides these variables in scope: `context`, `catalog`, `session`, and `startup_error`.

### Jupyter
To use Jupyter notebooks in your Kedro project, you need to install Jupyter:

```
pip install jupyter
```

After installing Jupyter, you can start a local notebook server:

```
kedro jupyter notebook
```

### JupyterLab
To use JupyterLab, you need to install it:

```
pip install jupyterlab
```

You can also start JupyterLab:

```
kedro jupyter lab
```

### IPython
And if you want to run an IPython session:

```
kedro ipython
```

### How to convert notebook cells to nodes in a Kedro project
You can move notebook code over into a Kedro project structure using a mixture of [cell tagging](https://jupyter-notebook.readthedocs.io/en/stable/changelog.html#cell-tags) and Kedro CLI commands.

By adding the `node` tag to a cell and running the command below, the cell's source code will be copied over to a Python file within `src/<package_name>/nodes/`:

```
kedro jupyter convert <filepath_to_my_notebook>
```
> *Note:* The name of the Python file matches the name of the original notebook.

Alternatively, you may want to transform all your notebooks in one go. Run the following command to convert all notebook files found in the project root directory and under any of its sub-folders:

```
kedro jupyter convert --all
```
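
To make the cell-tagging idea concrete, here is a minimal sketch of how cells tagged `node` could be picked out of a notebook's JSON using only the standard library. It is not Kedro's implementation of `kedro jupyter convert`; the function name `tagged_cell_sources` and the example notebook are assumptions made for illustration.

```python
import json
from typing import List


def tagged_cell_sources(notebook_json: str, tag: str = "node") -> List[str]:
    """Return the source of every code cell carrying the given tag (hypothetical helper)."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if tag not in cell.get("metadata", {}).get("tags", []):
            continue
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        sources.append(source if isinstance(source, str) else "".join(source))
    return sources


# A tiny made-up notebook: one tagged cell and one untagged scratch cell.
EXAMPLE_NOTEBOOK = json.dumps(
    {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {
                "cell_type": "code",
                "metadata": {"tags": ["node"]},
                "source": ["def clean(df):\n", "    return df.dropna()\n"],
                "outputs": [],
                "execution_count": None,
            },
            {
                "cell_type": "code",
                "metadata": {},
                "source": ["print('scratch')\n"],
                "outputs": [],
                "execution_count": None,
            },
        ],
    }
)

for source in tagged_cell_sources(EXAMPLE_NOTEBOOK):
    print(source)
```

Only the tagged cell is returned, which matches the idea that untagged exploratory cells stay in the notebook.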

### How to ignore notebook output cells in `git`
To automatically strip out all output cell contents before committing to `git`, you can run `kedro activate-nbstripout`. This will add a hook in `.git/config` which will run `nbstripout` before anything is committed to `git`.

> *Note:* Your output cells will be retained locally.
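
For intuition, the sketch below shows the kind of cleaning that stripping output cells involves: code cells keep their source but lose their outputs and execution counts. This is a simplified illustration, not `nbstripout`'s actual implementation, and the helper name `strip_outputs` is hypothetical.

```python
import copy


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs removed (hypothetical helper)."""
    stripped = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# A made-up notebook with one executed code cell and one markdown cell.
example = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["1 + 1\n"],
            "execution_count": 3,
            "outputs": [
                {
                    "output_type": "execute_result",
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                    "execution_count": 3,
                }
            ],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes\n"]},
    ],
}

cleaned = strip_outputs(example)
print(cleaned["cells"][0]["outputs"])
```

Because the function works on a copy, the original notebook keeps its outputs, in the same spirit as the note above about output cells being retained locally.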

## Package your Kedro project

[Further information about building project documentation and packaging your project](https://kedro.readthedocs.io/en/stable/03_tutorial/05_package_a_project.html)