Merge pull request #24 from BU-Spark/final-deliverable: Final deliverables (Organized code)

4,055 changed files with 2,795,197 additions and 6 deletions.
# COMMIT DOC

The code is in `data_cleaning.ipynb`.

The `india` folder contains the shapefiles I used to determine which state a given latitude/longitude coordinate falls in. I tested this thoroughly and am confident it is correct.
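
The notebook's exact method is not shown here. A common way to assign a coordinate to a state from shapefile polygons is a point-in-polygon test, sketched below with plain ray casting. This is an illustration under assumptions: each state is a single polygon given as a list of (longitude, latitude) vertices, and the state names and rectangles are hypothetical. Real boundaries from the `india` shapefiles would normally be loaded with a GIS library and may be multipolygons.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a ray to the right crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify_state(lat: float, lon: float, states: Dict[str, List[Point]]) -> Optional[str]:
    """Return the first state whose polygon contains the coordinate, else None."""
    for name, polygon in states.items():
        if point_in_polygon((lon, lat), polygon):
            return name
    return None


# Hypothetical rectangular "states" for illustration only (not real boundaries).
STATES = {
    "State A": [(70.0, 20.0), (75.0, 20.0), (75.0, 25.0), (70.0, 25.0)],
    "State B": [(75.0, 20.0), (80.0, 20.0), (80.0, 25.0), (75.0, 25.0)],
}

print(classify_state(22.0, 72.5, STATES))  # State A
print(classify_state(22.0, 77.0, STATES))  # State B
print(classify_state(30.0, 72.5, STATES))  # None
```

Points exactly on a shared border may match either neighbour; this sketch simply returns the first match.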

The `citizenData` folder contains the cleaned CSV files, formatted like the reference data to make plotting and visualization easier.

`updated_alldata.csv` is a backup dataset kept just in case. It is the original dataset with the state names filled in from latitude and longitude and the rows sorted by species.
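
A minimal sketch of how a file like `updated_alldata.csv` could be produced with the standard `csv` module. It assumes hypothetical column names (`species`, `latitude`, `longitude`, `state`) and a `lookup_state` callable standing in for the shapefile classification; the notebook's real column names and code may differ.

```python
import csv
import io
from typing import Callable, Optional, TextIO


def fill_states_and_sort(
    src: TextIO,
    dst: TextIO,
    lookup_state: Callable[[float, float], Optional[str]],
) -> None:
    """Fill empty 'state' cells from coordinates, then write the rows sorted by 'species'."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or [])
    if "state" not in fieldnames:
        fieldnames.append("state")
    rows = []
    for row in reader:
        if not row.get("state"):
            # Only rows with a missing or blank state are looked up.
            found = lookup_state(float(row["latitude"]), float(row["longitude"]))
            row["state"] = found or ""
        rows.append(row)
    # list.sort is stable, so rows of the same species keep their original order.
    rows.sort(key=lambda r: r["species"])
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Usage with a tiny in-memory CSV and a toy lookup (west of 75°E -> "State A").
raw = io.StringIO(
    "species,latitude,longitude,state\n"
    "Mango,22.0,72.5,\n"
    "Jamun,22.0,77.0,State B\n"
)
out = io.StringIO()
fill_states_and_sort(raw, out, lambda lat, lon: "State A" if lon < 75 else "State B")
print(out.getvalue())
```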

# UPDATE

Created a new folder, `all data`, containing the citizen and reference data with consistent species names. I haven't deleted the reference and citizen data folders in case they are needed in the near future.
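
The README does not say how species names were reconciled. One plausible approach, sketched here as an assumption, is to collapse whitespace, compare names case-insensitively, and apply a hand-maintained alias table that maps variant spellings to one canonical name. The `ALIASES` entries below are hypothetical examples, not the project's actual mapping.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical alias table: case-folded variant -> canonical species name.
ALIASES: Dict[str, str] = {
    "mangifera indica l.": "Mangifera indica",
    "mango": "Mangifera indica",
}


def normalize_species(name: str, aliases: Dict[str, str] = ALIASES) -> str:
    """Collapse whitespace, match aliases case-insensitively, else use binomial casing."""
    cleaned = re.sub(r"\s+", " ", name).strip()
    key = cleaned.casefold()
    if key in aliases:
        return aliases[key]
    # Fallback: capitalize the genus, lower-case everything else.
    return cleaned[:1].upper() + cleaned[1:].lower()


def unmatched(citizen: Iterable[str], reference: Iterable[str]) -> List[str]:
    """Normalized citizen species names with no counterpart in the reference data."""
    ref = {normalize_species(n) for n in reference}
    return sorted({normalize_species(n) for n in citizen} - ref)


print(normalize_species("  Mangifera   indica L. "))  # Mangifera indica
print(unmatched(["Mango", "Syzygium cumini", "Ficus religiosa"],
                ["Mangifera indica", "syzygium  cumini"]))  # ['Ficus religiosa']
```

Listing the unmatched names is a quick way to spot variants that still need an alias entry.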

`-2_values.ipynb` creates a CSV file by adding new columns to `alldata.csv` (the raw citizen data), one per phenophase. Each new column flags whether the phenophase should have been reported as -2 but was not (e.g., the open fruit phenophase **does not** occur in mangoes, yet citizens report values other than -2 in the open fruit column) or was mistakenly reported as -2 (e.g., the mature leaves phenophase **does** occur in mangoes, yet citizens report -2 in the mature leaves column). This is done for all ~177 species in the citizen data. Whether a phenophase is present or absent for a species is determined from the SW tree phenology handbook.

The possible values in the new columns are 0, 1, and 2.

## `[Phenophase]_incorrect_-2` Column Key

| Label | Meaning |
| :----: | :----- |
| 0 | Valid |
| 1 | Mistakenly reported as -2 (false positive) |
| 2 | Mistakenly reported as not -2 (false negative) |
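
The labelling rule in the key above can be sketched as a small function. This is a sketch under assumptions: `PRESENT` is a hypothetical stand-in for the handbook (species mapped to the phenophases that occur in it), the phenophase column names are made up, and values arrive as strings, as `csv.DictReader` would return them.

```python
from typing import Dict, Iterable, Mapping, Set

NOT_APPLICABLE = -2  # value to report when a phenophase does not occur in a species

# Hypothetical presence table standing in for the SW tree phenology handbook:
# species -> phenophases that occur in that species.
PRESENT: Dict[str, Set[str]] = {
    "Mango": {"mature_leaves", "flowers"},
}


def incorrect_neg2_label(species: str, phenophase: str, value: int) -> int:
    """0 = valid, 1 = mistakenly reported as -2, 2 = mistakenly reported as not -2."""
    occurs = phenophase in PRESENT[species]
    if occurs and value == NOT_APPLICABLE:
        return 1  # phenophase occurs, but -2 was reported
    if not occurs and value != NOT_APPLICABLE:
        return 2  # phenophase does not occur, but a value other than -2 was reported
    return 0


def label_row(row: Mapping[str, str], phenophases: Iterable[str]) -> Dict[str, int]:
    """Build the [Phenophase]_incorrect_-2 columns for one observation."""
    species = row["species"]
    return {
        f"{p}_incorrect_-2": incorrect_neg2_label(species, p, int(row[p]))
        for p in phenophases
    }


row = {"species": "Mango", "mature_leaves": "-2", "open_fruit": "1", "flowers": "0"}
print(label_row(row, ["mature_leaves", "open_fruit", "flowers"]))
# {'mature_leaves_incorrect_-2': 1, 'open_fruit_incorrect_-2': 2, 'flowers_incorrect_-2': 0}
```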

`validation_labels_alldata.csv` is a copy of `alldata.csv` with an added `validation_label` column recording whether each observation was kept or dropped from the citizen data during our team's data cleaning process and, if dropped, why. The meaning of each value is listed in the key below:

## Key for `validation_label` Column

| Label | Meaning |
| :----: | :----- |
| 0 | Kept |
| 1 | Dropped because a phenophase was incorrectly reported as -2 |
| 2 | Dropped because a phenophase had missing data (null values) |
| 3 | Dropped because the observation was flagged as anomalous |
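
The README does not say which label an observation gets when more than one drop reason applies, or how anomalies are detected. The sketch below assumes the reasons are checked in label order (first match wins) and takes the anomaly flag as an input. It also assumes only label 1 from the `[Phenophase]_incorrect_-2` key counts as "incorrectly reported as -2"; these assumptions are illustrative, not the team's documented rules.

```python
from typing import Iterable, Optional


def validation_label(
    incorrect_neg2_labels: Iterable[int],
    phenophase_values: Iterable[Optional[int]],
    is_anomalous: bool,
) -> int:
    """Return 0 (kept), 1 (incorrect -2), 2 (missing data) or 3 (anomalous).

    ASSUMPTION: when several reasons apply, they are checked in label order
    and the first match wins; the README does not specify this.
    """
    if any(label == 1 for label in incorrect_neg2_labels):
        return 1  # a phenophase was incorrectly reported as -2
    if any(value is None for value in phenophase_values):
        return 2  # a phenophase value is missing (null)
    if is_anomalous:
        return 3  # flagged by a separate anomaly check (not shown here)
    return 0


print(validation_label([0, 0], [1, 0], False))     # 0
print(validation_label([1, 0], [-2, None], True))  # 1
print(validation_label([0, 2], [None, 0], False))  # 2
print(validation_label([0, 0], [0, 1], True))      # 3
```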

## Counts for `validation_label` Column

| Label | Number of Observations |
| :----: | -----: |
| 0 | 318332 |
| 1 | 46200 |
| 2 | 210436 |
| 3 | 17625 |
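
The table does not state the total. A quick standard-library check gives it, along with each label's share. `label_counts` shows how the counts could be reproduced once the `validation_label` values have been read from the CSV into an iterable of ints (that loading step is assumed); the percentages are computed from the published counts.

```python
from collections import Counter
from typing import Dict, Iterable


def label_counts(labels: Iterable[int]) -> Counter:
    """Tally validation_label values, e.g. read from validation_labels_alldata.csv."""
    return Counter(labels)


def percentages(counts: Dict[int, int]) -> Dict[int, float]:
    """Share of all observations per label, in percent."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in sorted(counts.items())}


# Counts from the table above.
COUNTS = {0: 318332, 1: 46200, 2: 210436, 3: 17625}
total = sum(COUNTS.values())
print(total)  # 592593
for label, pct in percentages(COUNTS).items():
    print(f"{label}: {pct:.1f}%")
# 0: 53.7%
# 1: 7.8%
# 2: 35.5%
# 3: 3.0%
```

By these counts, roughly 54% of the raw citizen observations were kept, and missing data was the largest single reason for dropping an observation.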