Merge pull request #24 from BU-Spark/final-deliverable

Final deliverables (Organized code)
BU-Spark · Jun 28, 2024 · 337b843
2 parents cafb54f + 1ca1598
commit 337b843

Showing 4,055 changed files with 2,795,197 additions and 6 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,5 +1,6 @@
-
-
+.idea/*
+.ipynb_checkpoints/*
+.ipynb_checkpoints
 
 # Jetbrains Products
 # Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider

diff --git a/README.md b/README.md
@@ -1,7 +1,14 @@
-# TEMPLATE-base-repo
+# COMMIT DOC
 
-Create a new branch from dev, add changes on the new branch you just created.
+The code is in data_cleaning.ipynb
 
-Open a Pull Request to dev. Add your PM and TPM as reviewers. 
+The 'india' folder contains the shapefiles that I used to determine which state a given latitude/longitude coordinate falls in.
+I tested this thoroughly and am confident it's correct.
 
-At the end of the semester during project wrap up open a final Pull Request to main from dev branch.
+The citizenData folder contains the cleaned CSV files, which are formatted similarly to the reference data for ease of plotting and visualization.
+
+The updated_alldata.csv is the backup dataset, which I kept just in case. It is the original dataset, except that I filled in the state names using latitude and longitude and sorted it by species.
+
+# UPDATE
+
+Created new folder 'all data' containing citizen and reference data with consistent species names. I haven't deleted the reference and citizen data folders, just in case they're needed in the near future.
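
Note on the README hunk above: the state lookup it describes amounts to a point-in-polygon test. The sketch below is not the notebook's code. It swaps in plain-Python ray casting and uses made-up rectangular polygons (`toy_states`) in place of the shapefiles in 'india'; `point_in_polygon` and `classify_state` are hypothetical names chosen for illustration.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count crossings of a ray running east from the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def classify_state(lon, lat, state_polygons):
    """Return the first state whose polygon contains the point, else None."""
    for name, polygon in state_polygons.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical toy polygons as (lon, lat) vertices; real borders would
# come from the shapefiles.
toy_states = {
    "State A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "State B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}

print(classify_state(5, 5, toy_states))
print(classify_state(25, 5, toy_states))
```

With real shapefiles a GIS library would normally handle this, including multi-part polygons and holes, which the sketch ignores.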
diff --git a/code/-2_values.ipynb b/code/-2_values.ipynb
diff --git a/code/-2_values_README_key.md b/code/-2_values_README_key.md
@@ -0,0 +1,11 @@
+-2_values.ipynb creates a CSV file that adds new columns to alldata.csv (the raw citizen data). For each phenophase, the new column indicates whether the value should have been reported as -2 but was not (e.g. the open fruit phenophase **does not** appear in mangos, but citizens report values other than -2 in the open fruit column) or was mistakenly reported as -2 (e.g. the mature leaves phenophase **does** appear in mangos, but citizens report values of -2 in the mature leaves column). This process is done for all ~177 species within the citizen data. Present and absent phenophases are determined according to the SW tree phenology handbook.
+
+The possible values in the new columns are 0, 1, and 2.
+
+## `[Phenophase]_incorrect_-2` Column Key
+
+| Label  | Meaning |
+| :----: | :----- |
+| 0      | Valid |
+| 1      | Mistakenly reported as -2 (false positive) |
+| 2      | Mistakenly reported as not -2 (false negative) |
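
Note on the key above: the three labels follow from two facts per cell, namely whether the phenophase occurs in the species and whether the citizen reported -2. The sketch below only illustrates that mapping and is not taken from the notebook. The function name `label_incorrect_minus2` and the boolean `phenophase_present` are hypothetical, and treating missing values as valid here (leaving them to separate handling) is an assumption.

```python
def label_incorrect_minus2(reported_value, phenophase_present):
    """Map one reported value to the 0/1/2 key for one phenophase."""
    if reported_value is None:
        # Assumption: missing values are handled elsewhere, not flagged here.
        return 0
    is_minus2 = reported_value == -2
    if phenophase_present and is_minus2:
        return 1  # phenophase occurs in the species, but -2 was reported
    if not phenophase_present and not is_minus2:
        return 2  # phenophase does not occur, but a non -2 value was reported
    return 0  # report agrees with the handbook


# Mango examples from the description above.
print(label_incorrect_minus2(-2, phenophase_present=True))   # mature leaves reported as -2
print(label_incorrect_minus2(1, phenophase_present=False))   # open fruit reported as 1
```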
diff --git a/code/data_cleaning.ipynb b/code/data_cleaning.ipynb
diff --git a/code/data_cleaning.py b/code/data_cleaning.py
diff --git a/code/mean_transition_times_data_generation.ipynb b/code/mean_transition_times_data_generation.ipynb
diff --git a/code/selecting_reference_data.ipynb b/code/selecting_reference_data.ipynb
diff --git a/code/validation_labeling.ipynb b/code/validation_labeling.ipynb
diff --git a/code/validation_labels_README_key.md b/code/validation_labels_README_key.md
@@ -0,0 +1,19 @@
+`validation_labels_alldata.csv` is a copy of `alldata.csv` with a new column, `validation_label`, that marks whether each observation was kept or dropped from the citizen data in our team's data cleaning process. For dropped observations, the label's value gives the reason. The meanings of these values are listed in the key below:
+
+## Key for `validation_label` Column
+
+| Label | Meaning |
+| :----: | :----- |
+| 0      | Kept   |
+| 1      | Dropped because a phenophase was incorrectly reported as being -2 |
+| 2      | Dropped because a phenophase had missing data (null values) |
+| 3      | Dropped because observation was flagged as anomalous |
+
+## Counts for `validation_label` Column
+
+| Label | Number of Observations |
+| :----: | :----- |
+| 0      | 318332 |
+| 1      | 46200 |
+| 2      | 210436 |
+| 3      | 17625 |
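
Note on the counts above: a quick tally gives the total number of observations and the share behind each label. The numbers are copied from the table; the variable names are illustrative only.

```python
# Counts per validation_label, copied from the table above.
counts = {0: 318332, 1: 46200, 2: 210436, 3: 17625}

total = sum(counts.values())
dropped = total - counts[0]
shares = {label: n / total for label, n in counts.items()}

for label, n in counts.items():
    print(f"label {label}: {n} observations ({shares[label]:.1%})")
print(f"total: {total}, dropped: {dropped}")
```

The four labels sum to 592,593 observations, of which 274,261 were dropped.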