Created: 2021 Anshuman Dewangan
This repository supports both image classification and object detection models for wildfire smoke detection, accompanying the publication: FIgLib & SmokeyNet: Dataset and Deep Learning Model for Real-Time Wildland Fire Smoke Detection.
Please include the following citation in your work:
Dewangan, A.; Pande, Y.; Braun, H.-W.; Vernon, F.; Perez, I.; Altintas, I.; Cottrell, G.W.; Nguyen, M.H. FIgLib & SmokeyNet: Dataset and Deep Learning Model for Real-Time Wildland Fire Smoke Detection. Remote Sens. 2022, 14, 1007. https://doi.org/10.3390/rs14041007
Visualization of model performance:
Relevant Files:
- `./scripts/setup_files.sh`: Copies raw data and labels into the home directory for faster data loading.
- `./src/dynamic_dataloader.py`: Includes the datamodule and dataloader for training.
- `./data/metadata.pkl`: Dictionary generated by `DynamicDataModule.prepare_data()` that includes helpful information about the data. See `dynamic_dataloader.py` for the full list of keys.
Relevant Directories - Raw Data:
- `/userdata/kerasData/data/new_data/raw_images`: location of raw images
- `/userdata/kerasData/data/new_data/raw_images_flow`: location of optical flow images
- `/userdata/kerasData/data/new_data/raw_images_mog`: location of images processed with MOG background removal
Relevant Directories - Labels:
- `/userdata/kerasData/data/new_data/drive_clone`: location of raw XML labels
- `/userdata/kerasData/data/new_data/drive_clone_numpy`: location of preprocessed labels created by `./scripts/prepare_data.sh`
- `/userdata/kerasData/data/new_data/bbox_labels.csv`: CSV file containing bounding box labels for all the images
- `/userdata/kerasData/data/new_data/drive_clone_bbox`: bbox labels for all images as .npy files
- `/userdata/kerasData/data/new_data/drive_clone_filled_bbox`: image masks with bboxes filled in as 1s
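As a quick illustration of the filled-bbox format, here is a minimal sketch that converts a single bounding box into a binary mask saved as .npy. The image resolution and the `[x1, y1, x2, y2]` coordinate order are assumptions for illustration, not confirmed by the repo:

```python
import numpy as np

# Assumed image resolution and [x1, y1, x2, y2] pixel coordinates;
# the real masks are produced by the repo's prepare scripts.
h, w = 1536, 2048
x1, y1, x2, y2 = 100, 200, 400, 350

mask = np.zeros((h, w), dtype=np.uint8)
mask[y1:y2, x1:x2] = 1  # fill the bbox region with 1s

np.save("demo_filled_bbox.npy", mask)  # same .npy container as drive_clone_filled_bbox
```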
Relevant Directories - Train/Test Splits:
- `./data/final_split/`: data split where train = all the labeled fires and val/test is a random split of unlabeled fires (with night fires removed)
- `./data/split1/` and `./data/split2/`: random train/val/test splits of the labeled fires only
Relevant Directories - Data Cleaning:
- `./data/mislabeled_fires.txt`: list of fires that should be thrown out because their binary labels are erroneous (i.e., the ground truth says there isn't a fire when there actually is)
- `./data/night_fires.txt`: list of fires that occur during the night (so they can be removed)
- `./data/omit_mislabeled.txt`: list of images that are supposed to be labeled but do not have bbox labels
`metadata.pkl` is a key file containing a dictionary that is generated by `prepare_data.py` to assist in the data loading process. The keys of the dictionary are:
- fire_to_images (dict): dictionary with fires as keys and list of corresponding images as values
- omit_no_xml (list of str): list of images that erroneously do not have XML files for labels. Does not include unlabeled fires.
- omit_no_contour (list of str): list of images that erroneously do not have loaded contours for labels. Does not include unlabeled fires.
- omit_no_contour_or_bbox (list of str): list of images that erroneously do not have contours or bboxes. Does not include unlabeled fires.
- omit_mislabeled (list of str): list of images that erroneously have no XML files and are manually selected as mislabeled. Does not include unlabeled fires.
- monochrome_fires (list of str): list of fires that are monochrome
- night_fires (list of str): list of fires that are in nighttime
- mislabeled_fires (list of str): list of fires in which the ground truth has erroneous labels and thus should be removed
- labeled_fires (list of str): list of fires that have at least some labels
- unlabeled_fires (list of str): list of fires that have not been labeled at all
- train_only_fires (list of str): list of fires that should only be used for train (not 'mobo-c')
- eligible_fires (list of str): list of fires that can be used for test or train (not in train_only_fires)
- bbox_labels (dict): dictionary with images as keys and 4-element array of bounding box coordinates as values
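A minimal sketch of reading and writing this dictionary with Python's `pickle` module, mirroring a few of the keys listed above. The fire and image names and bbox values here are made up for illustration:

```python
import pickle

# Toy dictionary mirroring a subset of the metadata.pkl keys above;
# fire/image names and bbox coordinates are illustrative only.
metadata = {
    "fire_to_images": {"fire_A": ["fire_A/img_000.jpg", "fire_A/img_001.jpg"]},
    "labeled_fires": ["fire_A"],
    "unlabeled_fires": ["fire_B"],
    "bbox_labels": {"fire_A/img_001.jpg": [100, 150, 300, 400]},
}

# Round-trip through pickle, the same container format prepare_data.py uses
with open("metadata_demo.pkl", "wb") as f:
    pickle.dump(metadata, f)

with open("metadata_demo.pkl", "rb") as f:
    loaded = pickle.load(f)

print(sorted(loaded.keys()))
```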
Should you lose prior data or receive new data, use the following steps to prepare the data prior to model training:
- Run `./scripts/download_raw_data.sh` to download raw images from the HPWREN website to the `/userdata/kerasData/data/new_data/raw_images_new/` directory
- Follow the instructions at the bottom of this Google Doc to download the bounding box and contour annotation labels (Note: to be released to the public in late 2022)
- Run `python3.9 ./scripts/prepare_data.py` to create `./data/metadata_new.pkl` and .npy label files in `/userdata/kerasData/data/new_data/drive_clone_numpy_new/`
- (Optional) Run `python3.9 ./scripts/generate_flow.py` to create optical flow outputs in `/userdata/kerasData/data/new_data/raw_images_flow_new/` and background removal outputs in `/userdata/kerasData/data/new_data/raw_images_mog_new/`
Generating train/val/test split:
- From labeled_fires, remove night_fires and mislabeled_fires. Use as the train set.
- From unlabeled_fires, split 50/50 between validation and test sets.
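The two split steps above can be sketched in plain Python. The fire names are placeholders standing in for the `metadata.pkl` entries:

```python
import random

# Placeholder fire lists standing in for the metadata.pkl entries.
labeled_fires = ["fire_A", "fire_B", "fire_C", "fire_D"]
night_fires = {"fire_B"}
mislabeled_fires = {"fire_C"}
unlabeled_fires = ["fire_E", "fire_F", "fire_G", "fire_H"]

# Train = labeled fires minus night fires and mislabeled fires.
train = [f for f in labeled_fires if f not in night_fires | mislabeled_fires]

# Val/test = 50/50 random split of the unlabeled fires.
shuffled = unlabeled_fires[:]
random.Random(0).shuffle(shuffled)  # seeded here only for reproducibility
half = len(shuffled) // 2
val, test = shuffled[:half], shuffled[half:]
```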
Relevant Files:
- `model_components.py`: Different loss functions and torch models to use with `main_model.py`. Each model has its own forward pass.
- `main_model.py`: Main model to use with `lightning_module.py`. Chains forward passes and sums loss functions from individual model_components.
Models:
Models are created with model_components that can be chained together using the `--model-type-list` command line argument. Intermediate supervision from tile_labels or image_labels provides additional feedback to each model_component. Models can be one of five types:
- RawToTile: Raw inputs -> tile predictions
- RawToImage: Raw inputs -> image predictions
- TileToTile: Tile predictions -> tile predictions
- TileToImage: Tile predictions -> image predictions
- ImageToImage: Image predictions -> image predictions
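A torch-free sketch of the chaining idea: each component exposes its own forward pass, components are applied in sequence, and per-component losses are summed. The class names echo the types above, but the internals (tile counts, scores, loss values) are invented for illustration:

```python
# Torch-free sketch of chaining: each component has its own forward pass
# and loss; the chain applies them in order and sums the losses, as
# main_model.py is described to do. All numbers are illustrative.
class RawToTile:
    def forward(self, raw):
        # pretend the image is split into 4 tiles, each with a smoke score
        return [0.1, 0.7, 0.2, 0.9]

    def loss(self, preds):
        return 0.5  # placeholder intermediate-supervision loss on tiles

class TileToImage:
    def forward(self, tile_preds):
        return max(tile_preds)  # image prediction from tile predictions

    def loss(self, pred):
        return 0.25  # placeholder image-level loss

components = [RawToTile(), TileToImage()]
x, total_loss = "raw_image", 0.0
for c in components:
    x = c.forward(x)
    total_loss += c.loss(x)
```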
Special models include:
- Feature Pyramid Networks
- Backbones incorporating optical flow
- Object detection models
Relevant Files:
- `main.py`: Kicks off training and evaluation. Contains many command line arguments to vary hyperparameters.
- `lightning_module.py`: PyTorch Lightning LightningModule that defines the optimizers, training step, and metrics.
- `run_train.sh`: Used to easily start training from `main.py` with command line arguments.
Steps to Run:
To run training, use `./run_train.sh`. You can check `main.py` for a full list of tunable hyperparameters as command line arguments.
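To give a flavor of that interface, here is a minimal `argparse` sketch. Only `--model-type-list` is named in this README; the other argument names, defaults, and the model-type strings are assumptions for illustration:

```python
import argparse

# Hypothetical subset of main.py's CLI: only --model-type-list is named
# in the README; the other flags and values here are illustrative.
parser = argparse.ArgumentParser(description="SmokeyNet training (sketch)")
parser.add_argument("--model-type-list", nargs="+", required=True)
parser.add_argument("--learning-rate", type=float, default=1e-3)
parser.add_argument("--batch-size", type=int, default=2)

args = parser.parse_args(
    ["--model-type-list", "RawToTile_MobileNet", "TileToImage_LinearOutputs"]
)
print(args.model_type_list)
```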
Relevant Directories:
- `./lightning_logs/` (currently not pushed to repo): Automatically generated each run; logs & checkpoints are saved here.
- `./saved_logs/` (currently not pushed to repo): It is suggested to move logs you want to keep long-term into this directory.
Steps to Access:
Logs can be accessed using TensorBoard: `tensorboard --logdir ./lightning_logs`
Useful Scripts:
- `./scripts/paper_experiments.sh`: kicks off all experiments used for the research paper (different backbones, ablation study, object detection models)
- `./scripts/labelme.sh`: sets up LabelMe for generating additional annotations. Only runs partial setup; see the code within the file for complete instructions.
Utility Notebooks:
- `helper.ipynb`: code to average test metrics, calculate inference speed, and debug code.
- `visual_analysis.ipynb`: code to visualize errors, generate videos, and create the human experiment.
- On Debian, change "torch_gtrxl" to "gtrxl_torch"
- If the `download_raw_data.sh` script is blocked by robots.txt, add `-e robots=off`
This repository is released under the Apache 2.0 license. Please see the LICENSE file for more information.