Merge pull request #24 from BU-Spark/final-deliverable
Final deliverables (Organized code)
zacharymeurer authored Jun 28, 2024
2 parents cafb54f + 1ca1598 commit 337b843
Showing 4,055 changed files with 2,795,197 additions and 6 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
5 changes: 3 additions & 2 deletions .gitignore
@@ -1,5 +1,6 @@


.idea/*
.ipynb_checkpoints/*
.ipynb_checkpoints

# Jetbrains Products
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider
15 changes: 11 additions & 4 deletions README.md
@@ -1,7 +1,14 @@
# TEMPLATE-base-repo
# COMMIT DOC

Create a new branch from dev, add changes on the new branch you just created.
The code is in data_cleaning.ipynb

Open a Pull Request to dev. Add your PM and TPM as reviewers.
The 'india' folder contains the shapefiles that I used to determine which state a given latitude/longitude coordinate falls in.
I tested this thoroughly and am confident it is correct.
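
The state lookup described above can be illustrated with a minimal sketch. This is not the notebook's code: it swaps in a plain-Python ray-casting test instead of reading the shapefiles, and the names `point_in_polygon`, `classify_state`, and `toy_states` are hypothetical. The rectangles are toy shapes, not real state boundaries.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify_state(lat: float, lon: float, states: Dict[str, List[Point]]) -> Optional[str]:
    """Return the first state whose outline contains the coordinate, or None."""
    for name, ring in states.items():
        if point_in_polygon((lon, lat), ring):
            return name
    return None


# Toy outlines (assumption): two adjacent rectangles standing in for state polygons.
toy_states = {
    "state_a": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "state_b": [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)],
}
```

A coordinate that falls in no outline returns `None`, which is one way to surface observations whose location could not be matched to a state.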

At the end of the semester during project wrap up open a final Pull Request to main from dev branch.
The citizenData folder contains the cleaned CSV files, which are formatted like the reference data for ease of plotting and visualization.

updated_alldata.csv is a backup dataset that I kept just in case. It is the original dataset with the state names filled in from latitude and longitude and the rows sorted by species.

# UPDATE

Created a new folder, 'all data', containing citizen and reference data with consistent species names. I haven't deleted the reference and citizen data folders in case they are needed in the near future.
109,550 changes: 109,550 additions & 0 deletions code/-2_values.ipynb

Large diffs are not rendered by default.

11 changes: 11 additions & 0 deletions code/-2_values_README_key.md
@@ -0,0 +1,11 @@
-2_values.ipynb creates a CSV file that adds new columns to alldata.csv (the raw citizen data). For each phenophase, the new column indicates whether the phenophase should have been reported as -2 but was not (e.g. the open fruit phenophase **does not** appear in mangos, but citizens report values other than -2 in the open fruit column) or was mistakenly reported as -2 (e.g. the mature leaves phenophase **does** appear in mangos, but citizens report values of -2 in the mature leaves column). This is done for all ~177 species in the citizen data. Present and absent phenophases are determined according to the SW tree phenology handbook.

The possible values in the new columns are 0, 1, and 2.

## `[Phenophase]_incorrect_-2` Column Key

| Label | Meaning |
| :----: | :----- |
| 0 | Valid |
| 1 | Mistakenly reported as -2 (false positive) |
| 2 | Mistakenly reported as not -2 (false negative) |
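
The key above can be read as a small decision rule. The sketch below is an illustration, not the notebook's code: the function name `incorrect_minus_two_label` and the boolean `phase_occurs` (whether the handbook lists the phenophase for the species) are hypothetical, and it assumes the reported value is already an integer (missing values are not handled here).

```python
def incorrect_minus_two_label(reported_value: int, phase_occurs: bool) -> int:
    """Return 0, 1, or 2 following the `[Phenophase]_incorrect_-2` key."""
    is_minus_two = reported_value == -2
    if phase_occurs and is_minus_two:
        # The phenophase exists for this species, yet -2 was reported.
        return 1
    if not phase_occurs and not is_minus_two:
        # The phenophase does not exist for this species, yet a value other than -2 was reported.
        return 2
    # The report agrees with the handbook.
    return 0
```

Using the mango examples from the text: a -2 in the mature leaves column would map to label 1, and a value other than -2 in the open fruit column would map to label 2.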
2,373 changes: 2,373 additions & 0 deletions code/data_cleaning.ipynb

Large diffs are not rendered by default.

681 changes: 681 additions & 0 deletions code/data_cleaning.py

Large diffs are not rendered by default.

5,969 changes: 5,969 additions & 0 deletions code/mean_transition_times_data_generation.ipynb

Large diffs are not rendered by default.

462 changes: 462 additions & 0 deletions code/selecting_reference_data.ipynb

Large diffs are not rendered by default.

691 changes: 691 additions & 0 deletions code/validation_labeling.ipynb

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions code/validation_labels_README_key.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
`validation_labels_alldata.csv` is a copy of alldata.csv with a new column `validation_label` which labels the observations that were dropped from the citizen data in our team's data cleaning process. The reason for dropping each observation is given by the validation label's value. The meanings of these values are listed in the key below:

## Key for `validation_label` Column

| Label | Meaning |
| :----: | :----- |
| 0 | Kept |
| 1 | Dropped because a phenophase was incorrectly reported as being -2 |
| 2 | Dropped because a phenophase had missing data (Null Values) |
| 3 | Dropped because observation was flagged as anomalous |

## Counts for `validation_label` Column

| Label | Number of Observations |
| :----: | :----- |
| 0 | 318332 |
| 1 | 46200 |
| 2 | 210436 |
| 3 | 17625 |
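
To put the counts above in proportion, the short sketch below totals them and computes each label's share of all observations. The counts are copied from the table; the names `label_counts` and `label_shares` are hypothetical and not taken from the notebook.

```python
from typing import Dict

# Counts copied from the table above (label -> number of observations).
label_counts: Dict[int, int] = {0: 318332, 1: 46200, 2: 210436, 3: 17625}


def label_shares(counts: Dict[int, int]) -> Dict[int, float]:
    """Return each label's fraction of the total number of observations."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


total_observations = sum(label_counts.values())
shares = label_shares(label_counts)

for label, share in sorted(shares.items()):
    print(f"label {label}: {label_counts[label]} observations ({share:.1%})")
```

The total of the four counts should equal the number of rows in `validation_labels_alldata.csv`, which gives a quick consistency check on the labeling.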