Dev #2

Merged 3 commits on Apr 24, 2024
80 changes: 6 additions & 74 deletions Makefile
@@ -5,7 +5,7 @@
#################################################################################

PROJECT_DIR := $(shell dirname $(realpath $(lastword $(MAKEFILE_LIST))))
BUCKET = [OPTIONAL] your-bucket-for-syncing-data (do not include 's3://')
BUCKET = carl-p8-v2
PROFILE = default
PROJECT_NAME = p8_cloud
PYTHON_INTERPRETER = python3
@@ -27,13 +27,11 @@ requirements: test_environment

## Make Dataset
data: requirements
# $(PYTHON_INTERPRETER) src/data/make_dataset.py data/raw data/processed
# mkdir -p data/raw data/processed data/external data/interim
# curl https://s3.eu-west-1.amazonaws.com/course.oc-static.com/projects/Data_Scientist_P4/2016_Building_Energy_Benchmarking.csv
# curl https://s3-eu-west-1.amazonaws.com/static.oc-static.com/prod/courses/files/Parcours_data_scientist/Projet+-+Impl%C3%A9menter+un+mod%C3%A8le+de+scoring/Projet+Mise+en+prod+-+home-credit-default-risk.zip -o ./data/raw/data.zip
# unzip data/raw/data.zip -d data/raw
# rm -f data/raw/data.zip
# ls -lS data/raw
mkdir -p data/raw data/processed data/external data/interim
curl https://s3.eu-west-1.amazonaws.com/course.oc-static.com/projects/Data_Scientist_P8/fruits.zip -o ./data/raw/data.zip
unzip data/raw/data.zip -d data/raw
rm -f data/raw/data.zip
ls -lS data/raw

## Delete all compiled Python files
clean:
@@ -89,69 +87,3 @@ endif
test_environment:
$(PYTHON_INTERPRETER) test_environment.py

#################################################################################
# PROJECT RULES #
#################################################################################



#################################################################################
# Self Documenting Commands #
#################################################################################

.DEFAULT_GOAL := help

# Inspired by <http://marmelab.com/blog/2016/02/29/auto-documented-makefile.html>
# sed script explained:
# /^##/:
# * save line in hold space
# * purge line
# * Loop:
# * append newline + line to hold space
# * go to next line
# * if line starts with doc comment, strip comment character off and loop
# * remove target prerequisites
# * append hold space (+ newline) to line
# * replace newline plus comments by `---`
# * print line
# Separate expressions are necessary because labels cannot be delimited by
# semicolon; see <http://stackoverflow.com/a/11799865/1968>
.PHONY: help
help:
@echo "$$(tput bold)Available rules:$$(tput sgr0)"
@echo
@sed -n -e "/^## / { \
h; \
s/.*//; \
:doc" \
-e "H; \
n; \
s/^## //; \
t doc" \
-e "s/:.*//; \
G; \
s/\\n## /---/; \
s/\\n/ /g; \
p; \
}" ${MAKEFILE_LIST} \
| LC_ALL='C' sort --ignore-case \
| awk -F '---' \
-v ncol=$$(tput cols) \
-v indent=19 \
-v col_on="$$(tput setaf 6)" \
-v col_off="$$(tput sgr0)" \
'{ \
printf "%s%*s%s ", col_on, -indent, $$1, col_off; \
n = split($$2, words, " "); \
line_length = ncol - indent; \
for (i = 1; i <= n; i++) { \
line_length -= length(words[i]) + 1; \
if (line_length <= 0) { \
line_length = ncol - indent - length(words[i]) - 1; \
printf "\n%*s ", -indent, " "; \
} \
printf "%s ", words[i]; \
} \
printf "\n"; \
}' \
| more $(shell test $(shell uname) = Darwin && echo '--no-init --raw-control-chars')
52 changes: 43 additions & 9 deletions README.md
@@ -1,10 +1,49 @@
P8_cloud
==============================
# P8_cloud

OpenClassrooms Project 8: Deploy a model in the cloud

Project Organization
------------
## Description

[Project briefing from OpenClassrooms](https://openclassrooms.com/fr/paths/164/projects/633/assignment)
An AgriTech startup wants to develop a mobile app that classifies fruits by image recognition, before implementing it in a fruit-picking robot.

**Mission**: set up a **Big Data architecture in the cloud** to process the data from the mobile app

- distributed computing with Spark
- AWS cloud, in compliance with GDPR
- broadcasting the TensorFlow model weights
- PCA dimensionality reduction
- without training a model

## Usage

## Data

- Kaggle dataset: <https://www.kaggle.com/datasets/moltean/fruits>
- 131 fruits, 90,380 images

```raw
├── Apple Braeburn
│   ├── 3_100.jpg
│   ├── r_3_100.jpg
│   └── ...
├── Banana
│   ├── 12_100.jpg
│   ├── r_105_100.jpg
│   └── ...
├── Strawberry
│   ├── 100_100.jpg
│   ├── r_64_100.jpg
│   └── ...
└── ...
   └── ...
```
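
In the layout above, each image sits in a folder named after its fruit, so the folder name can serve as the label. The following is a minimal sketch of reading that layout, not code from this repository: the function name `count_images_per_label` and the `data/raw` root used in the example are assumptions for illustration.

```python
from collections import Counter
from pathlib import Path


def count_images_per_label(root):
    """Count .jpg images per label; the label is the image's parent folder name."""
    counts = Counter()
    # rglob walks the tree recursively, so nested folders are covered as well.
    for image_path in Path(root).rglob("*.jpg"):
        counts[image_path.parent.name] += 1
    return counts


if __name__ == "__main__":
    # Assumed location: wherever the archive was unpacked, e.g. under data/raw.
    for label, n_images in sorted(count_images_per_label("data/raw").items()):
        print(f"{label}: {n_images}")
```

Because the count is keyed on the folder name only, identically named folders in different subtrees are merged into one label.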

## Install

## Makefile
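
The `data` target in the Makefile creates the data folders, downloads `fruits.zip`, unpacks it into `data/raw`, deletes the archive and lists the result by size. For readers who prefer to run those steps without `make`, here is a rough Python equivalent. It is a sketch under assumptions, not part of this repository: the function names (`make_data_dirs`, `extract_and_remove`, `list_by_size`, `make_data`) are hypothetical, and only the URL and folder names come from the Makefile.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL and folder names copied from the Makefile's `data` target.
DATA_URL = (
    "https://s3.eu-west-1.amazonaws.com/course.oc-static.com/"
    "projects/Data_Scientist_P8/fruits.zip"
)
DATA_DIRS = ("raw", "processed", "external", "interim")


def make_data_dirs(base="data"):
    """Create the data sub-folders (equivalent of `mkdir -p`)."""
    for name in DATA_DIRS:
        Path(base, name).mkdir(parents=True, exist_ok=True)


def extract_and_remove(archive, dest):
    """Unzip `archive` into `dest`, then delete the archive."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    Path(archive).unlink()


def list_by_size(folder):
    """Return (name, size) pairs for `folder`, largest first (like `ls -S`)."""
    entries = [(p.name, p.stat().st_size) for p in Path(folder).iterdir()]
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def make_data(base="data", url=DATA_URL):
    """Run the whole sequence: folders, download, unzip, clean up, list."""
    make_data_dirs(base)
    archive = Path(base, "raw", "data.zip")
    urllib.request.urlretrieve(url, str(archive))
    extract_and_remove(archive, archive.parent)
    return list_by_size(archive.parent)


if __name__ == "__main__":
    for name, size in make_data():
        print(f"{size:>12}  {name}")
```

Keeping the download separate from the extraction step makes the extraction and listing logic usable on an archive that is already on disk.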

## Project Organization

├── LICENSE
├── Makefile <- Makefile with commands like `make data` or `make train`
@@ -48,8 +87,3 @@ Project Organization
│   └── visualize.py
└── tox.ini <- tox file with settings for running tox; see tox.readthedocs.io


--------

<p><small>Project based on the <a target="_blank" href="https://drivendata.github.io/cookiecutter-data-science/">cookiecutter data science project template</a>. #cookiecutterdatascience</small></p>
1 change: 0 additions & 1 deletion s3_sync/jupyter/jovyan/p8_aws.ipynb

This file was deleted.
