
Commit

initial draft
ravi-kumar-pilla committed Apr 19, 2024
1 parent 2ca6d2e commit 53828e0
Showing 46 changed files with 26,666 additions and 0 deletions.
151 changes: 151 additions & 0 deletions .gitignore
@@ -0,0 +1,151 @@
##########################
# KEDRO PROJECT

# ignore all local configuration
conf/local/**
!conf/local/.gitkeep

# ignore potentially sensitive credentials files
conf/**/*credentials*

# ignore everything in the following folders
data/**

# except their sub-folders
!data/**/

# also keep all .gitkeep files
!.gitkeep

# keep also the example dataset
!data/01_raw/*


##########################
# Common files

# IntelliJ
.idea/
*.iml
out/
.idea_modules/

### macOS
*.DS_Store
.AppleDouble
.LSOverride
.Trashes

# Vim
*~
.*.swo
.*.swp

# emacs
*~
\#*\#
/.emacs.desktop
/.emacs.desktop.lock
*.elc

# JIRA plugin
atlassian-ide-plugin.xml

# C extensions
*.so

### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
.static_storage/
.media/
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# mkdocs documentation
/site

# mypy
.mypy_cache/
1 change: 1 addition & 0 deletions .telemetry
@@ -0,0 +1 @@
consent: false
20 changes: 20 additions & 0 deletions conf/README.md
@@ -0,0 +1,20 @@
# What is this for?

This folder should be used to store configuration files used by Kedro or by separate tools.

This file can be used to provide users with instructions on how to reproduce local configuration with their own credentials. You can edit the file however you like, but you may wish to retain the information below and add your own instructions in a section titled **Instructions**.

## Local configuration

The `local` folder should be used for configuration that is either user-specific (e.g. IDE configuration) or protected (e.g. security keys).

> *Note:* Please do not check in any local configuration to version control.
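
For illustration only (this sketch is not part of the committed README or this commit), a local credentials file might look like the block below. The entry names `dev_s3` and `scooters_credentials` simply mirror the placeholders used in the commented catalog examples later in this commit, and every value is hypothetical and should be replaced:

```yaml
# conf/local/credentials.yml -- hypothetical sketch; never commit real secrets
dev_s3:
  client_kwargs:
    aws_access_key_id: <your-access-key-id>
    aws_secret_access_key: <your-secret-access-key>

scooters_credentials:
  con: sqlite:///kedro.db  # any SQLAlchemy connection string
```

Catalog entries can then refer to these names via their `credentials` key, while the values themselves stay out of version control thanks to the `conf/**/*credentials*` rule in the `.gitignore` above.
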
## Base configuration

The `base` folder is for shared configuration, such as non-sensitive and project-related configuration that may be shared across team members.

WARNING: Please do not put access credentials in the base configuration folder.

## Find out more

You can find out more about configuration from the [user guide documentation](https://docs.kedro.org/en/stable/configuration/configuration_basics.html).
104 changes: 104 additions & 0 deletions conf/base/catalog.yml
@@ -0,0 +1,104 @@
# Here you can define all your data sets by using simple YAML syntax.
#
# Documentation for this file format can be found in "The Data Catalog"
# Link: https://kedro.readthedocs.io/en/stable/data/data_catalog.html
#
# We support interacting with a variety of data stores including local file systems, cloud, network and HDFS
#
# An example data set definition can look as follows:
#
#bikes:
#  type: pandas.CSVDataset
#  filepath: "data/01_raw/bikes.csv"
#
#weather:
#  type: spark.SparkDataset
#  filepath: s3a://your_bucket/data/01_raw/weather*
#  file_format: csv
#  credentials: dev_s3
#  load_args:
#    header: True
#    inferSchema: True
#  save_args:
#    sep: '|'
#    header: True
#
#scooters:
#  type: pandas.SQLTableDataset
#  credentials: scooters_credentials
#  table_name: scooters
#  load_args:
#    index_col: ['name']
#    columns: ['name', 'gear']
#  save_args:
#    if_exists: 'replace'
#    # if_exists: 'fail'
#    # if_exists: 'append'
#
# The Data Catalog supports referencing the same file using two different Dataset implementations
# (transcoding), templating, and a way to reuse arguments that are frequently repeated. See more here:
# https://kedro.readthedocs.io/en/stable/data/data_catalog.html

companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv

reviews:
  type: pandas.CSVDataset
  filepath: data/01_raw/reviews.csv

shuttles:
  type: pandas.ExcelDataset
  filepath: data/01_raw/shuttles.xlsx
  load_args:
    engine: openpyxl

preprocessed_companies:
  type: pandas.ParquetDataset
  filepath: data/02_intermediate/preprocessed_companies.pq

preprocessed_shuttles:
  type: pandas.ParquetDataset
  filepath: data/02_intermediate/preprocessed_shuttles.pq

model_input_table:
  type: pandas.ParquetDataset
  filepath: data/03_primary/model_input_table.pq

regressor:
  type: pickle.PickleDataset
  filepath: data/06_models/regressor.pickle
  versioned: true

metrics:
  type: tracking.MetricsDataset
  filepath: data/09_tracking/metrics.json

companies_columns:
  type: tracking.JSONDataset
  filepath: data/09_tracking/companies_columns.json

shuttle_passenger_capacity_plot_exp:
  type: plotly.PlotlyDataset
  filepath: data/08_reporting/shuttle_passenger_capacity_plot_exp.json
  versioned: true
  plotly_args:
    type: bar
    fig:
      x: shuttle_type
      y: passenger_capacity
      orientation: h
    layout:
      xaxis_title: Shuttles
      yaxis_title: Average passenger capacity
      title: Shuttle Passenger capacity

shuttle_passenger_capacity_plot_go:
  type: plotly.JSONDataset
  filepath: data/08_reporting/shuttle_passenger_capacity_plot_go.json
  versioned: true

dummy_confusion_matrix:
  type: matplotlib.MatplotlibWriter
  filepath: data/08_reporting/dummy_confusion_matrix.png
  versioned: true
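
The comment block at the top of this catalog mentions transcoding without showing what it looks like. As an illustrative sketch only (none of these entries are part of the commit), referencing the same file through two dataset implementations uses the `name@variant` naming convention with an identical `filepath`:

```yaml
# Hypothetical transcoding sketch: one Parquet file, two implementations.
preprocessed_shuttles@pandas:
  type: pandas.ParquetDataset
  filepath: data/02_intermediate/preprocessed_shuttles.pq

preprocessed_shuttles@spark:
  type: spark.SparkDataset
  filepath: data/02_intermediate/preprocessed_shuttles.pq
  file_format: parquet
```

Nodes can consume either variant, and Kedro treats the pair as a single dataset when it resolves the pipeline.
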
Empty file added conf/base/parameters.yml
Empty file.
Empty file.
12 changes: 12 additions & 0 deletions conf/base/parameters_data_science.yml
@@ -0,0 +1,12 @@
model_options:
  test_size: 0.2
  random_state: 3
  features:
    - engines
    - passenger_capacity
    - crew
    - d_check_complete
    - moon_clearance_complete
    - iata_approved
    - company_rating
    - review_scores_rating
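
A brief aside, not part of the commit: because configuration in `conf/local` takes precedence over `conf/base` when Kedro loads parameters, these model options can be overridden per user without editing the committed file. A hypothetical local override might be:

```yaml
# conf/local/parameters_data_science.yml -- hypothetical override, ignored by git
model_options:
  test_size: 0.3
  random_state: 8
```

Depending on the configured merge strategy, keys left out here may either fall back to the base file or need to be repeated, so check the configuration docs before relying on partial overrides.
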
Empty file.
Empty file added conf/local/.gitkeep
Empty file.
43 changes: 43 additions & 0 deletions conf/logging.yml
@@ -0,0 +1,43 @@
# To enable this custom logging configuration, set KEDRO_LOGGING_CONFIG to the path of this file.
# More information available at https://docs.kedro.org/en/stable/logging/logging.html
version: 1

disable_existing_loggers: False

formatters:
  simple:
    format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

handlers:
  console:
    class: logging.StreamHandler
    level: INFO
    formatter: simple
    stream: ext://sys.stdout

  info_file_handler:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: simple
    filename: info.log
    maxBytes: 10485760 # 10MB
    backupCount: 20
    encoding: utf8
    delay: True

  rich:
    class: kedro.logging.RichHandler
    rich_tracebacks: True
    # Advanced options for customisation.
    # See https://docs.kedro.org/en/stable/logging/logging.html#project-side-logging-configuration
    # tracebacks_show_locals: False

loggers:
  kedro:
    level: INFO

  spaceflights_pandas_viz:
    level: INFO

root:
  handlers: [rich, info_file_handler]
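
One practical note on the `loggers` block above, offered as a hypothetical tweak rather than part of this commit: raising the project logger's level is the usual way to get more detail while debugging locally, for example:

```yaml
loggers:
  kedro:
    level: INFO

  spaceflights_pandas_viz:
    level: DEBUG  # hypothetical: surface debug messages from the project package
```

As the first comment in the file notes, Kedro only picks this configuration up when the KEDRO_LOGGING_CONFIG environment variable points at conf/logging.yml.
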
Empty file added data/01_raw/.gitkeep
Empty file.