Task_5 - Automate curation #48

Open · wants to merge 18 commits into base: master
Binary file added .dvc.yaml.swp
Binary file not shown.
3 changes: 3 additions & 0 deletions .dvc/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
/config.local
/tmp
/cache
4 changes: 4 additions & 0 deletions .dvc/config
@@ -0,0 +1,4 @@
[core]
    remote = mygoogledrive
[remote "mygoogledrive"]
    url = ../gdrive:1mNe5F-CMQBm8E8Ah13WhWyDBMV_5vQky
3 changes: 3 additions & 0 deletions .dvcignore
@@ -0,0 +1,3 @@
# Add patterns of files dvc should ignore, which could improve
# the performance. Learn more at
# https://dvc.org/doc/user-guide/dvcignore
43 changes: 43 additions & 0 deletions .github/workflows/process-json-file.yml
@@ -0,0 +1,43 @@
name: process-json-file

on:
  # Run the workflow every day at 6:00am UTC
  schedule:
    - cron: "0 6 * * *"

jobs:
  process-json:
    runs-on: ubuntu-latest

    steps:
      # Check out the code from the repository
      - name: Checkout code
        uses: actions/checkout@v2

      # Install Python
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.8

      # Install dependencies (requirements.txt lives under .github/workflows in this PR)
      - name: Install dependencies
        run: pip install -r .github/workflows/requirements.txt

      # Run the Python script to process the JSON file (the script is added at Data/data.py)
      - name: Process JSON file
        run: python Data/data.py

      # Commit changes to the repository
      - name: Commit changes
        run: |
          git config --global user.name "Your Name"
          git config --global user.email "[email protected]"
          git add processed_data.json
          git commit -m "Process data"

      # Push changes to the remote repository
      - name: Push changes
        uses: ad-m/[email protected]
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
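One caveat with the workflow file above: `git commit` exits non-zero when `processed_data.json` is unchanged, which would mark the scheduled run as failed even though nothing is wrong. A hedged sketch of a guarded commit step (same names and placeholder identity as the workflow; the `git diff --cached --quiet` guard is the addition):

```yaml
      # Commit only when the processed file actually changed
      - name: Commit changes
        run: |
          git config --global user.name "Your Name"
          git config --global user.email "[email protected]"
          git add processed_data.json
          git diff --cached --quiet || git commit -m "Process data"
```

`git diff --cached --quiet` exits 0 when nothing is staged, so the `|| git commit` only fires when there is a real change to record.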
3 changes: 3 additions & 0 deletions .github/workflows/requirements.txt
@@ -0,0 +1,3 @@
geopandas
matplotlib
pandas
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
*.tif
81 changes: 0 additions & 81 deletions Data/Administrative/GlobalRoadsOpenAccess_gROADS.ipynb

This file was deleted.

1 change: 1 addition & 0 deletions Data/EcoRegion/.gitignore
@@ -0,0 +1 @@
/HoldridgeLifeZones.json
4 changes: 4 additions & 0 deletions Data/EcoRegion/HoldridgeLifeZones.json.dvc
@@ -0,0 +1,4 @@
outs:
- md5: 6f5c2d925d682513d2ec64e706659492
  size: 1914744
  path: HoldridgeLifeZones.json
6 changes: 6 additions & 0 deletions Data/EcoRegion/HoldridgeLifeZones.yaml
@@ -0,0 +1,6 @@
path: <Land_Sector_Datasets/Data/EcoRegion>
meta:
  description: <LevelI,LevelII,LevelIII>
  author: <joyakinyi>
  email: <[email protected]>

141 changes: 0 additions & 141 deletions Data/LandCover/Hansen v1.7 Global Forest Change.ipynb

This file was deleted.

Binary file added Data/Soil/Transformed/transformed_file.tif
Binary file not shown.
19 changes: 19 additions & 0 deletions Data/data.py
@@ -0,0 +1,19 @@
import json

# Read in the JSON data
with open('KEN_AL2_Kenya_GEZ.json', 'r') as f:
    data = json.load(f)

# Process the data
processed_data = []
for item in data:
    # Perform some transformation on the data
    processed_item = {
        'name': item['name'],
        'age': item['gez_code'] * 2
    }
    processed_data.append(processed_item)

# Write the processed data to a new JSON file
with open('processed_data.json', 'w') as f:
    json.dump(processed_data, f)
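The loop in `data.py` assumes each record carries `name` and `gez_code` fields. A minimal, self-contained sketch of the same transformation on made-up sample records (the values here are illustrative, not taken from the real `KEN_AL2_Kenya_GEZ.json`):

```python
import json

# Illustrative records mimicking the expected input shape
sample = [
    {'name': 'Zone A', 'gez_code': 11},
    {'name': 'Zone B', 'gez_code': 12},
]

# Same per-item transformation as data.py, written as a comprehension
processed = [
    {'name': item['name'], 'age': item['gez_code'] * 2}
    for item in sample
]

print(json.dumps(processed))
```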
5 changes: 5 additions & 0 deletions Data/forest-management-data-2015/.gitignore
@@ -0,0 +1,5 @@
/reference_data_set_updated.csv
/metafile.txt
*.csv


4 changes: 4 additions & 0 deletions Data/forest-management-data-2015/metafile.txt.dvc
@@ -0,0 +1,4 @@
outs:
- md5: e698f3972b8775c503e9a9fae56ad50b
  size: 903
  path: metafile.txt
@@ -0,0 +1,4 @@
outs:
- md5: b6cc94f41fff3d6cd09f9a386090bcc6
  size: 10776130
  path: reference_data_set_updated.csv
Binary file added Data/loaded_data.pkl
Binary file not shown.
Empty file added conda
Empty file.
17 changes: 17 additions & 0 deletions dvc.lock
@@ -0,0 +1,17 @@
schema: '2.0'
stages:
  extract:
    cmd: echo "No extraction needed for tif file in Data/Soil"
  transform:
    cmd:
    - gdalwarp -s_srs EPSG:4326 -t_srs EPSG:4326 -to SRC_METHOD=NO_GEOTRANSFORM
      -tr 0.5 0.5 -r near -te -180.0 -90.0 180.0 90.0 -te_srs EPSG:4326 -of GTiff
      Data/Soil/GlobalSoilOrganicCarbonDensityinkgCm_1mDepth.tif Data/Soil/transformed_file.tif
    deps:
    - path: Data/Soil/GlobalSoilOrganicCarbonDensityinkgCm_1mDepth.tif
      md5: cf9794c1d61bb6eeacaa10dfa5954931
      size: 1038378
    outs:
    - path: Data/Soil/transformed_file.tif
      md5: 2ec4f2db772d40135fb4abdc92e534dc
      size: 1038378
12 changes: 12 additions & 0 deletions dvc.yaml
@@ -0,0 +1,12 @@
stages:
  load_data:
    cmd: python load.py
    outs:
    - Data/loaded_data.pkl

  process_data:
    cmd: python process.py
    deps:
    - Data/loaded_data.pkl
    outs:
    - Data/processed_data.csv
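`process.py` itself is not included in this diff, but the `deps`/`outs` declarations pin down its contract: read `Data/loaded_data.pkl`, write `Data/processed_data.csv`. A hypothetical minimal sketch of such a script (the paths come from the pipeline; the `process` function name and the no-op cleaning step are assumptions, not the real logic):

```python
import pandas as pd


def process(in_path='Data/loaded_data.pkl', out_path='Data/processed_data.csv'):
    # Read the pickled DataFrame produced by the load_data stage
    df = pd.read_pickle(in_path)
    # Placeholder processing step (assumption: the real logic is project-specific)
    df = df.reset_index(drop=True)
    # Write the CSV that DVC tracks as this stage's output
    df.to_csv(out_path, index=False)
    return df
```

With a script shaped like this in place, `dvc repro` can rerun the stage only when `Data/loaded_data.pkl` changes.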
16 changes: 16 additions & 0 deletions load.py
@@ -0,0 +1,16 @@
import pandas as pd

data_path = 'Data/forest-management-data-2015/reference_data_set_updated.csv'
metafile_path = 'Data/forest-management-data-2015/metafile.txt'

# Load the dataset
df = pd.read_csv(data_path)

# Load the metafile
with open(metafile_path, 'r', encoding='utf-8') as f:
    metafile_contents = f.read()

# Process the data and save the result
df = df.dropna()
# Save the whole DataFrame to pickle
df.to_pickle('Data/loaded_data.pkl')
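The round trip `load.py` relies on (drop incomplete rows, pickle, read back intact downstream) can be sanity-checked on a toy frame. A small sketch with made-up column names and values, not the real reference CSV:

```python
import os
import tempfile

import pandas as pd

# Toy frame with one incomplete row, mimicking the reference dataset
df = pd.DataFrame({'plot': ['p1', 'p2', 'p3'], 'cover': [0.4, None, 0.9]})

# Same cleaning step as load.py
clean = df.dropna()

# Pickle and read back, as the load_data / process_data handoff does
path = os.path.join(tempfile.mkdtemp(), 'loaded_data.pkl')
clean.to_pickle(path)
restored = pd.read_pickle(path)

print(len(restored))
```

Pickling preserves dtypes and the (now non-contiguous) index exactly, which is why the pipeline passes a pickle between stages rather than re-reading a CSV.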