Skip to content

A deep learning library built for video understanding tasks. Primarily relies upon PyTorch Lightning and wandb. Takes inspiration from fast.ai

Notifications You must be signed in to change notification settings

zanedurante/video_understanding

Repository files navigation

video_understanding

A deep learning library built for video understanding tasks. Primarily relies upon PyTorch Lightning and wandb. Takes inspiration from fast.ai

Installation:

Follow the instruction steps in the Setup Instructions.

Style guides:

Use black to format the code: run black . in the main directory. If you do not have black installed, install it with pip install black.

Before submitting commits

Format with black . and run pytest in the parent directory.

Should you use this repo?

In the current state, absolutely not. You can see all the TODOs at the bottom of this README.

TODOs

  • Add UCF101 dataset to start
  • Create video frame loader
  • Create video data visualizer (from dataset name)
  • Add CLIP implementation (like ViFi-CLIP)
  • Add initial version of trainer with wandb support in train.py
  • Add general trainer code
  • Add debug mode and lr_find flag for running train.py
  • Get good classification performance on UCF101
  • Make as a package (pip install -e .)
  • Delete temp ckpt file created
  • Figure out why the learning rate and momentum is not logged to wandb
  • Create good nested config system
  • Fix config test
  • Add hyperparameter sweeps with wandb
  • Detect the number of classes for the dataset(s) automatically (requires setup)
  • Finishing touches on wandb sweeps
  • Configure model checkpoint locations
  • Create evaluate.py (give config, create args for metrics, and give test/val.csv)
  • Renew pytest-bed
  • Add hmdb dataset
  • Add Kinetics-400 dataset
  • Add Kinetics-700 dataset
  • Add WebVid10M dataset
  • Add RewrittenWebVid dataset
  • Revisit codebase to support video-text matching as a task.
  • Allow for multiple datasets for training (how to weight?)
  • Modify code to allow for large csv loading
  • Add multiple dataset support + Kinetics-400
  • Add NTP part to the codebase
  • Add the webvid rewritten dataset to the codebase.
  • Add VideoMAEv2 model to codebase? Maybe we just do a frame-level MAE model instead?
  • Add LoRA fine-tuning capabilities (especially important for LLaMA models and maybe for video encoders too)
  • Explore other optimizers?
  • Create next TODOs

About

A deep learning library built for video understanding tasks. Primarily relies upon PyTorch Lightning and wandb. Takes inspiration from fast.ai

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages