This is a Python implementation of the bag-of-visual-words model for feature extraction in videos.
The current repository is just one layer of a framework for video classification, composed of:
- Bag-of-Visual-Words (feature extraction for each frame)
- Long Short-Term Memory (modeling temporal dependencies of the features)
- Softmax classifier (classifies the video, given the outputs of the LSTM)
The first part takes an input video, splits it into a sequence of frames, and saves these images in a folder that
represents the video's class. The second part extracts the features of each image in that folder, generating a histogram
of visual words for each image. The first part is done by process_video.py
and the second by feature_extraction.py
The script feature_extraction.py
generates a visual vocabulary from the images produced by process_video.py.
The feature extraction consists of:
- Extracting local features from the whole dataset
- Generating a codebook of visual words by clustering the features
- Aggregating the histograms of visual words for each training image
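The three steps above can be sketched as follows. This is a minimal illustration, not the repository's actual code: it uses SciPy's k-means (scipy.cluster.vq) in place of the project's clustering, and random arrays stand in for real 128-dimensional SIFT descriptors.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

def build_codebook(descriptors, k):
    """Cluster local descriptors (n x 128 for SIFT) into k visual words."""
    codebook, _ = kmeans(descriptors.astype(float), k)
    return codebook

def image_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest word and count the occurrences."""
    words, _ = vq(descriptors.astype(float), codebook)
    hist = np.bincount(words, minlength=len(codebook))
    return hist / hist.sum()  # normalised so histograms of images are comparable

# toy stand-in for SIFT descriptors pooled from the whole dataset
rng = np.random.default_rng(0)
all_desc = rng.normal(size=(500, 128))
codebook = build_codebook(all_desc, k=10)
# histogram for one "image" (its first 50 descriptors)
hist = image_histogram(all_desc[:50], codebook)
```

The resulting per-image histograms are the feature vectors that feed the LSTM layer of the framework.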
This code relies on:
- SIFT for the local features (external implementation)
- k-means for generating the words via clustering
To extract the frame sequence of a given input video (e.g. video.mp4) that belongs to a given class (e.g. walking), use this command:
python process_video.py walking video.mp4
This will create a new folder named after the class, containing each extracted frame.
You can then extract the features of a video by passing the path to the folder with the video frames extracted before by process_video.py
(e.g. walking), together with the dataset folder (e.g. dataset_folder) used to generate (or simply reuse) the codebook. Use this command:
python feature_extraction.py dataset_folder/ walking/
The dataset should have the following structure, where all the video frames belonging to one class are in the same folder:
.
|-- path_to_folders_with_video_frames
|   |-- class1
|   |-- class2
|   |-- class3
|   ...
|   └-- classN
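Given that layout, enumerating the classes and their frames is straightforward. The helper below (list_dataset is a hypothetical name, not part of the repository) shows how the structure maps to class → image-path lists:

```python
from pathlib import Path

def list_dataset(root):
    """Map each class folder under root to the sorted list of its frame files."""
    dataset = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():  # each sub-directory is one class
            dataset[class_dir.name] = sorted(
                p for p in class_dir.iterdir() if p.is_file()
            )
    return dataset
```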
To install the necessary libraries, run the following commands from the working directory:
# installing sift
wget http://www.cs.ubc.ca/~lowe/keypoints/siftDemoV4.zip
unzip siftDemoV4.zip
cp sift*/sift sift
# installing VLFeat
Download and unpack the latest VLFeat binary package from the download page (currently the latest version is 0.9.20). Copy
the sift binary and libvl.dylib to the bag-of-visual-words repository path. The binaries are in the bin/
directory; just pick the sub-directory for your platform.
If you're using Linux and get an "IOError: SIFT executable not found" error, try sudo apt-get install libc6-i386.
David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
The Python SIFT wrapper is taken from http://www.janeriksolem.net/2009/02/sift-python-implementation.html (Linux) or http://www.maths.lth.se/matematiklth/personal/solem/downloads/vlfeat.py (Mac).