Add README.md and Final Report to Main (#25)
* Updated README.md with a detailed description of the project and repository

* Added project report

* Removed deviation plots and yearly average plots because they were buggy

---------

Co-authored-by: Zachary Meurer <[email protected]>
zacharymeurer and Zachary Meurer authored Jul 2, 2024
1 parent 337b843 commit 308df4f
Showing 304 changed files with 71 additions and 8 deletions.
79 changes: 71 additions & 8 deletions README.md
# PIT-NE SeasonWatch Project Overview

_Further details on the project background, process, and results are in `SeasonWatch_Project_Report.pdf`._

SeasonWatch, a citizen science organization based in India, provided our PIT-NE team with two databases: a citizen database of daily tree phenology observations reported by citizen scientists in India, and a reference database of weekly tree phenology data compiled from credible sources (e.g., textbooks).

## Applications

Our team processed and analyzed these databases to provide information supporting SeasonWatch's climate research efforts:

### Data Processing

- Cleaned and reformatted citizen and reference databases (Made database formatting consistent, handled data with incorrectly reported features, etc.)
- Developed a data validation system for the citizen database (Used isolation forests for anomaly detection)
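
The README does not say which features or which implementation the validation system used (scikit-learn's `IsolationForest` is a common choice). As a dependency-free illustration of how an isolation forest flags anomalies, the sketch below scores each observation by how quickly random splits isolate it: points isolated in few splits get scores closer to 1. All function names and the two-feature toy data are hypothetical.

```python
# Hypothetical, dependency-free isolation forest sketch; not the project's code.
import math
import random

EULER_GAMMA = 0.5772156649


def average_path(n):
    """c(n): expected path length of an unsuccessful BST search over n points.

    Normalizes scores and estimates the remaining depth of a leaf that still
    holds more than one point.
    """
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split a randomly chosen varying feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    varying = [d for d in range(len(points[0]))
               if min(p[d] for p in points) != max(p[d] for p in points)]
    if not varying:  # all remaining points are identical
        return ("leaf", len(points))
    dim = rng.choice(varying)
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    """Splits needed to reach x's leaf, plus c(size) for a non-singleton leaf."""
    if node[0] == "leaf":
        return depth + average_path(node[1])
    _, dim, split, left, right = node
    return path_length(x, left if x[dim] < split else right, depth + 1)


def anomaly_scores(data, n_trees=100, sample_size=256, seed=0):
    """Scores between 0 and 1; values closer to 1 suggest anomalies."""
    if len(data) < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path(psi)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / norm)
            for x in data]


# Toy data (invented): 20 clustered observations plus one far-off point.
data = [(10 + 0.1 * i, 76 + 0.1 * (i % 5)) for i in range(20)] + [(40.0, 0.0)]
scores = anomaly_scores(data)
print(max(range(len(data)), key=scores.__getitem__))  # → 20 (the far-off point)
```

In practice a threshold on the score (or a fixed expected contamination rate) decides which observations are flagged; that choice is not documented here.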

### Data Analysis

- Created visualizations of the citizen and reference data over time (Bar and line charts highlighting discrepancies between citizen and reference observations)
- Developed a process for selecting representative citizen observations over a year to use as up-to-date baselines for any species.
- Designed a scoring function to identify flowering and fruiting stage transitions throughout a given year.
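
The README does not spell out the scoring function. One simple, hypothetical way to turn weekly percentages into a probability distribution of transition (onset) times is to score each week by how much the share of reports observing the phenophase rises from the previous week, then normalize the scores so they sum to 1. The sketch assumes one value per week for a 48-week year; the function names and the example numbers are invented.

```python
def transition_distribution(weekly_pct):
    """HYPOTHETICAL score: week-over-week rise in the percentage of reports
    observing a phenophase, normalized to sum to 1 across the year.

    weekly_pct[w] is the percentage for week w (0-based). Index w - 1 wraps to
    the last week when w == 0, treating the year as cyclic.
    """
    n = len(weekly_pct)
    scores = [max(0.0, weekly_pct[w] - weekly_pct[w - 1]) for w in range(n)]
    total = sum(scores)
    if total == 0:
        return [0.0] * n  # no rise anywhere: no onset detected
    return [s / total for s in scores]


def most_likely_onset(weekly_pct):
    """Week (0-based) with the highest onset probability."""
    dist = transition_distribution(weekly_pct)
    return max(range(len(dist)), key=dist.__getitem__)


# Invented example: flowering reports jump to 20% in week 10 and 60% in week 11.
pct = [0.0] * 48
pct[10] = 20.0
for w in range(11, 20):
    pct[w] = 60.0
dist = transition_distribution(pct)
print(round(dist[10], 3), round(dist[11], 3), most_likely_onset(pct))  # → 0.333 0.667 11
```

Scoring falls (negative differences) instead of rises would give the end of a phenophase in the same way.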

## Repository Structure

### code (Contains Python notebooks used in the final product)
- -2_values (Flags citizen observations with incorrect reports regarding the presence or absence of a phenophase in the reported species)
- data_cleaning (Cleans citizen and reference databases, and validates the citizen database)
- mean_transition_times_generation (Creates visualizations and a dataset of probability distributions of phenophase transition times based on a score function)
- selecting_reference_data (Creates visualizations and a dataset of representative citizen observations selected as baselines)
- validation_labels (Flags citizen observations dropped during the data cleaning process and gives reasons for dropping them)
- visualizations (Creates visualizations of the citizen and reference data)
### data (Contains CSV files of original data and data produced by the Python notebooks in the code folder)
- citizen_states_cleaned (Cleaned and reformatted citizen database sorted by states)
- india_map (Geographic shapefile data used to find the Indian state containing a given pair of latitude and longitude coordinates)
- original_citizen_data (Citizen database given by SeasonWatch)
- original_reference_data (Reference database given by SeasonWatch)
- reference_states_cleaned (Cleaned and reformatted reference database sorted by states)
- alldata_labeling_-2_all_species (Citizen database given by SeasonWatch with incorrect reports regarding the presence or absence of a phenophase in the reported species flagged)
- average_transition_times (Dataset of probability distributions of phenophase transition times based on a score function)
- cleaned_alldata (Cleaned and reformatted citizen database as one dataset)
- selected_reference_data (Dataset of representative citizen observations selected as baselines)
- species codes (Dataset mapping tree species ids to names)
- validation_labels_alldata (Citizen database given by SeasonWatch with citizen observations dropped during the data cleaning process flagged and reasons for dropping them given)
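
The README does not say how `india_map` is queried (a GIS library such as GeoPandas is a typical choice for shapefiles). The sketch below shows the underlying idea with the even-odd (ray-casting) point-in-polygon test: a point lies inside a polygon if a ray cast east from it crosses the boundary an odd number of times. The two rectangles are invented stand-ins for state outlines; real state boundaries can consist of several polygons (e.g., islands), which this sketch does not handle.

```python
def point_in_polygon(lon, lat, polygon):
    """Even-odd rule: toggle `inside` each time an eastward ray from
    (lon, lat) crosses an edge. `polygon` is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude, so y1 != y2
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def find_state(lat, lon, states):
    """Return the name of the first polygon containing the point, else None."""
    for name, polygon in states.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Invented rectangles, NOT real state boundaries.
toy_states = {
    "toy_west": [(74.0, 8.0), (77.0, 8.0), (77.0, 12.0), (74.0, 12.0)],
    "toy_east": [(77.0, 8.0), (80.0, 8.0), (80.0, 13.0), (77.0, 13.0)],
}
print(find_state(10.0, 75.5, toy_states))  # → toy_west
print(find_state(10.0, 78.0, toy_states))  # → toy_east
print(find_state(20.0, 75.0, toy_states))  # → None
```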
### dev_code (Contains Python notebooks used in the development process)
- jobfiles (Files of jobs submitted to a shared cloud computing service)
- scc-config (Config for submitting jobs to the shared cloud computing service)
- kmeans_pca_testing (Experimenting with and visualizing data validation methods)
- mean transition times from repeat observations (Experimenting with only using regular citizen observations to find phenophase transition times)
- mean_transition_times_dev (Experimenting with different methods for finding phenophase transition times)
- plotting (Preliminary, experimental visualizations)
- ref_cit_na_comparison (Comparing how much citizen data has associated reference data)
### plots (Contains PNG files depicting plots produced by the Python notebooks in the code folder)

> _Citizen observations are usually depicted as percentages. This measure indicates the percentage of citizen reports observing a phenophase in the given week._
>
> _Each plot covers one year and reports values weekly (48 weeks per year)._
- combination_percentage_charts (Compares citizen data and reference data over time; bar charts indicate number of citizen observations that week)
- overlaid_percentage_plots (Compares related phenophases within citizen data over time; bar charts indicate number of citizen observations that week)
- repeat_combination_percentage_charts (Compares regular observations and all observations within citizen data over time; bar charts indicate number of citizen observations that week)
- repeat_observations (Compares differences between regular observations and reference data over time, and between all observations and reference data over time)
- selected_ref_vs_cit (Compares citizen data and selected baselines over time)
- transition_bar_plots (Depicts number of observations reporting a phenophase appearing over time)
- two_values_weighted (Compares percentage presence of a phenophase and the magnitude of the presence of a phenophase within the citizen data over time)
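
The plots above report, for each week, the percentage of citizen reports that observed a phenophase, alongside the number of reports. A minimal sketch of that computation follows. The README does not say how dates map onto the 48 weeks, so the sketch assumes four weeks per month (days 22 onward all fall in the fourth week); the function names and the sample reports are hypothetical.

```python
from datetime import date

WEEKS_PER_YEAR = 48


def week_index(d):
    """ASSUMPTION: 48 weeks = 12 months x 4 weeks; days 1-7 -> week 1,
    8-14 -> week 2, 15-21 -> week 3, 22 onward -> week 4. Returns 0..47."""
    return (d.month - 1) * 4 + min((d.day - 1) // 7, 3)


def weekly_percentages(observations):
    """observations: iterable of (date, observed) pairs, where observed is True
    if the report saw the phenophase. Returns (percentages, counts), each of
    length 48; a percentage is None for weeks with no reports."""
    counts = [0] * WEEKS_PER_YEAR
    positives = [0] * WEEKS_PER_YEAR
    for d, observed in observations:
        w = week_index(d)
        counts[w] += 1
        if observed:
            positives[w] += 1
    pcts = [100.0 * p / c if c else None for p, c in zip(positives, counts)]
    return pcts, counts


# Invented reports for one species and phenophase.
reports = [
    (date(2018, 3, 2), True),
    (date(2018, 3, 5), False),
    (date(2018, 3, 6), True),
    (date(2018, 3, 20), False),
]
pcts, counts = weekly_percentages(reports)
print(counts[8], round(pcts[8], 1), pcts[10])  # → 3 66.7 0.0
```

The `counts` list corresponds to the bar heights (number of citizen observations per week) and `pcts` to the plotted percentage line.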

## Usage Guide

### Step 1: Data Cleaning

Data should be cleaned, reformatted, and validated before it is used for anything else, so run the data cleaning notebook (or script) before any visualization or analysis.

> _Edit file paths within the code to any new citizen data or reference data CSV files._
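
The cleaning step, together with validation_labels, records why each dropped observation was removed. A minimal sketch of that keep-or-drop-with-reason pattern is below. The column names (`lat`, `lon`, `species_id`, `flowers`), the allowed phenophase values, and the rules themselves are assumptions for illustration, not the notebook's actual checks.

```python
# HYPOTHETICAL field names and rules; the real notebook's columns and checks may differ.
VALID_PHENOPHASE_VALUES = {0, 1, 2}  # assumed encoding, e.g. none / few / many


def clean_observations(rows, species_codes):
    """Split rows into (kept, dropped); each dropped row records why it was dropped."""
    kept, dropped = [], []
    for row in rows:
        if row.get("lat") is None or row.get("lon") is None:
            reason = "missing coordinates"
        elif row.get("species_id") not in species_codes:
            reason = "unknown species id"
        elif row.get("flowers") not in VALID_PHENOPHASE_VALUES:
            reason = "invalid phenophase value"
        else:
            reason = None
        if reason is None:
            # Normalize the species name (trim, lowercase) for consistent matching.
            name = species_codes[row["species_id"]].strip().lower()
            kept.append({**row, "species": name})
        else:
            dropped.append({**row, "drop_reason": reason})
    return kept, dropped


rows = [
    {"lat": 10.0, "lon": 76.0, "species_id": 1, "flowers": 2},
    {"lat": None, "lon": 76.0, "species_id": 1, "flowers": 0},
    {"lat": 9.0, "lon": 77.0, "species_id": 99, "flowers": 1},
    {"lat": 9.5, "lon": 76.5, "species_id": 1, "flowers": 7},
]
kept, dropped = clean_observations(rows, {1: " Tamarind "})
print(len(kept), kept[0]["species"])  # → 1 tamarind
print([r["drop_reason"] for r in dropped])
# → ['missing coordinates', 'unknown species id', 'invalid phenophase value']
```
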
### Step 2: Plotting & Analysis

Any other notebook within the code folder can then be run to update the data and plots. Notebooks have functions for plotting and producing datasets. Modify the function parameters (state, species, year, etc.) as needed (e.g., to get selected reference data for tamarind in Kerala in 2018, set the function parameters to match).

> _Edit plot and CSV file paths within the code as needed._
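
As an illustration of the parameter pattern described above (the notebooks' real function names and signatures are not given here), a hypothetical filter that narrows observations to one state, species, and year, such as tamarind in Kerala in 2018, might look like this. Passing `None` leaves a filter unset; the row fields are assumed.

```python
from datetime import date


def select_observations(rows, state=None, species=None, year=None):
    """HYPOTHETICAL helper: keep rows matching every filter that is not None."""
    return [
        r for r in rows
        if (state is None or r["state"] == state)
        and (species is None or r["species"] == species)
        and (year is None or r["date"].year == year)
    ]


# Invented rows with assumed field names.
rows = [
    {"state": "Kerala", "species": "tamarind", "date": date(2018, 2, 10)},
    {"state": "Kerala", "species": "tamarind", "date": date(2019, 2, 10)},
    {"state": "Karnataka", "species": "tamarind", "date": date(2018, 2, 10)},
]
subset = select_observations(rows, state="Kerala", species="tamarind", year=2018)
print(len(subset))  # → 1
```
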
Binary file added SeasonWatch_Project_Report.pdf