# Video-based-Action-Recognition

Read this in other languages: English, 中文.

This project is based on an LSTM model and can be used to recognize specific actions within long videos.

## Project Structure

The project is implemented in PyTorch and can be accelerated with a GPU. The code is organized as follows:

```
Video-based-Action-Recognition/
│
├── CLIP/                           # Video clipping tools, which you can use to build your own video dataset
│   └── CLIP.py
│
├── extract_keyframes/              # Module for extracting keyframes
│   ├── extract_keyframes.py        # Python script for extracting keyframes
│   └── extract_keyframes.sh        # Shell script for extracting keyframes
│
├── models/                         # Directory for storing model files
│
├── Output/                         # Output directory, where processed video files are stored
│
├── source/                         # Source code directory
│   ├── __pycache__/
│   ├── lstm_model_inference.py     # Model inference code
│   ├── lstm_model_train.py         # Model training code
│   └── lstm_model.py               # LSTM model definition code
│
├── test_videos/                    # Directory for test videos
├── video_dataset/                  # Video dataset directory, with one subdirectory per action
│   ├── openfile1/                  # MP4 videos of opening folders, each about 15 seconds long
│   └── openwebsite/                # MP4 videos of opening websites, each about 15 seconds long
│
├── lstm_model_inference.sh         # Shell script for model inference
├── lstm_model_train.sh             # Shell script for model training
│
└── README.md                       # Project README file
```
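Whether the scripts use the GPU comes down to the standard PyTorch device check (shown here for orientation; this snippet is not quoted from the repository):

```python
import torch

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```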

## Environment Setup

Omitted

## Dataset Preparation

The dataset is essentially a collection of short videos grouped by action category, with one directory per category, each containing many video files. When building your own dataset, organize the videos the same way: one directory per action.

### Preparing Your Training Videos

Each training video should ideally contain only the predefined action you want to recognize and be about 10 seconds long. The more training videos per category, the better; the more varied their content (for example, different backgrounds when operating terminals), the better as well. Avoid an overly uniform dataset: more diverse data generalizes better, which improves the model's applicability across scenarios.

Note that the action category names are also used during model training and are passed as a parameter to the training code. For example, if `myvideodataset` contains two subfolders, `openfile1` and `openwebsite`, then when you later run `lstm_model_train.py` you must pass `--classes_list openfile1 openwebsite`.
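As a minimal sketch (not the repository's code, with hypothetical variable names) of how that layout maps to labels, a training script can enumerate the class directories in `--classes_list` order and label each video by its directory's index:

```python
import os

dataset_dir = "video_dataset"                 # --dataset_dir
classes_list = ["openfile1", "openwebsite"]   # --classes_list

# Pair each MP4 with the integer index of its class directory.
samples = []
for label, class_name in enumerate(classes_list):
    class_dir = os.path.join(dataset_dir, class_name)
    for file_name in sorted(os.listdir(class_dir)):
        if file_name.endswith(".mp4"):
            samples.append((os.path.join(class_dir, file_name), label))
```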

## Starting the Project

### Keyframe Extraction

Run the following command in the command line:

```bash
bash extract_keyframes.sh
```

Note that `extract_keyframes.sh` sets several parameters, with the following meanings (a sketch of the default extraction strategy follows this list):

- `--video_path`: The path to the video to process.
- `--output_dir`: The directory where the output video will be saved. Defaults to `./Output/`.
- `--output_video_name`: The file name of the output video. Defaults to `output.mp4`.
- `--frame_rate`: The frame rate of the output video. If not specified, it defaults to the frame rate of the input video.
- `--method`: The keyframe-extraction method, one of:
  - `TOP_ORDER`: keep the top `num_top_frames` frames with the largest frame differences.
  - `Threshold`: keep frames whose relative change exceeds `threshold`.
  - `LOCAL_MAXIMA` (default): keep the local-maximum frames of the smoothed frame-difference curve.
- `--num_top_frames`: The number of keyframes extracted by the `TOP_ORDER` method. Defaults to 50.
- `--threshold`: Used by the `Threshold` method; the minimum relative change between consecutive frames for a frame to count as a keyframe. A float between 0 and 1, defaulting to 0.6. The higher the value, the fewer keyframes are extracted.
- `--len_window`: The window size used to smooth the frame differences in the `LOCAL_MAXIMA` method. Defaults to 50.
- `--split`: Add this flag to split the output video into segments of equal length; the segments are saved in the `output_dir` directory.
- `--clip_duration`: The duration (in seconds) of each segment when splitting. Defaults to 60. Only needed together with `--split`.
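The repository's implementation is not reproduced here, but as a rough illustration of the default `LOCAL_MAXIMA` strategy (function name and details are hypothetical), the sketch below computes frame-to-frame differences, smooths them with a `len_window`-wide moving average, and keeps the frames at local maxima of the smoothed curve:

```python
import cv2
import numpy as np
from scipy.signal import argrelextrema

def local_maxima_keyframes(video_path, len_window=50):
    cap = cv2.VideoCapture(video_path)
    frames, diffs, prev = [], [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            # Sum of absolute pixel differences between consecutive frames.
            diffs.append(np.sum(cv2.absdiff(gray, prev)))
        frames.append(frame)
        prev = gray
    cap.release()
    # Smooth the difference curve, then keep frames at its local maxima.
    kernel = np.ones(len_window) / len_window
    smoothed = np.convolve(np.asarray(diffs, dtype=np.float64), kernel, mode="same")
    peaks = argrelextrema(smoothed, np.greater)[0]
    return [frames[i + 1] for i in peaks]  # diffs[i] compares frames i and i + 1
```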

### Model Training

You can simply run the following command in the command line:

```bash
bash lstm_model_train.sh
```

Note that `lstm_model_train.sh` sets several parameters, with the following meanings (a minimal model sketch follows this list):

- `--dataset_dir`:
  - *Type*: `str`
  - *Required*: `True`
  - *Description*: Specifies the directory path where the training dataset is located. This dataset should contain the video files used for training, and may also include annotations for each class.

- `--classes_list`:
  - *Type*: `str`
  - *Nargs*: `+`
  - *Required*: `True`
  - *Description*: A list of class names included in the dataset. These names are used to label the data during training and should match the directory names or labels within the dataset directory.

- `--image_height`:
  - *Type*: `int`
  - *Default*: `64`
  - *Description*: Sets the height to which the input images should be resized. This parameter is crucial for ensuring that all input data has a consistent size, which is necessary for many deep learning models.

- `--image_width`:
  - *Type*: `int`
  - *Default*: `64`
  - *Description*: Sets the width to which the input images should be resized. Like the height, this ensures uniformity in the input data dimensions.

- `--sequence_length`:
  - *Type*: `int`
  - *Default*: `20`
  - *Description*: Defines the length of the sequence of frames that will be fed into the LSTM model. This is particularly important for time-series models like LSTM, which rely on sequences of data to make predictions.

- `--seed`:
  - *Type*: `int`
  - *Default*: `27`
  - *Description*: Sets the random seed used during training. This helps ensure reproducibility of results, as it ensures that random operations performed during training (such as data shuffling) are consistent across different runs.

- `--epochs`:
  - *Type*: `int`
  - *Default*: `100`
  - *Description*: Specifies the number of epochs (full passes through the dataset) for which the model should be trained. More epochs can improve performance but also increase the risk of overfitting.

- `--batch_size`:
  - *Type*: `int`
  - *Default*: `4`
  - *Description*: Determines the number of samples that will be propagated through the network in one pass. Smaller batch sizes can lead to more accurate gradient estimates but may require more epochs to converge.

- `--model_save_dir`:
  - *Type*: `str`
  - *Default*: `./models`
  - *Description*: The directory where the trained model will be saved. This directory will store the model weights and any other relevant files.

- `--model_name`:
  - *Type*: `str`
  - *Default*: `lstm_model`
  - *Description*: The name under which the trained model will be saved. This name is used to create the filename for the saved model.
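The actual architecture is defined in `source/lstm_model.py`; the sketch below is only an assumed minimal shape consistent with the defaults above (class name, hidden size, and layer count are hypothetical). Each 64×64 RGB frame is flattened, the 20-frame sequence is passed through an LSTM, and the last hidden state is classified:

```python
import torch
import torch.nn as nn

class ActionLSTM(nn.Module):
    def __init__(self, image_height=64, image_width=64, num_classes=2,
                 hidden_size=256, num_layers=2):
        super().__init__()
        input_size = image_height * image_width * 3  # flattened RGB frame
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, sequence_length, C, H, W) -> flatten each frame.
        b, t = x.shape[:2]
        out, _ = self.lstm(x.reshape(b, t, -1))
        return self.fc(out[:, -1])  # classify from the last time step

model = ActionLSTM(num_classes=2)      # e.g. openfile1, openwebsite
clip = torch.randn(4, 20, 3, 64, 64)   # batch_size=4, sequence_length=20 frames
logits = model(clip)                   # shape (4, 2)
```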

### Model Inference

You can simply run the following command in the command line:

```bash
bash lstm_model_inference.sh
```

Note that `lstm_model_inference.sh` sets several parameters, with the following meanings (an inference sketch follows this list):

- `--model_path`:
  - *Type*: `str`
  - *Required*: `True`
  - *Description*: The path to the pre-trained model.

- `--input_video_file_path`:
  - *Type*: `str`
  - *Required*: `True`
  - *Description*: The path to the input video file.

- `--output_dir`:
  - *Type*: `str`
  - *Default*: `./Output`
  - *Description*: The directory where the output video will be saved.

- `--output_video_name`:
  - *Type*: `str`
  - *Default*: `output.mp4`
  - *Description*: The name of the output video file.

- `--sequence_length`:
  - *Type*: `int`
  - *Default*: `20`
  - *Description*: The length of the input sequence.

- `--image_height`:
  - *Type*: `int`
  - *Default*: `64`
  - *Description*: The height of the input images.

- `--image_width`:
  - *Type*: `int`
  - *Default*: `64`
  - *Description*: The width of the input images.

- `--classes_list`:
  - *Type*: `str`
  - *Nargs*: `+`
  - *Required*: `True`
  - *Description*: A list of class names.

- `--target_resolution`:
  - *Type*: `int`
  - *Default*: `300`
  - *Description*: The target resolution for the video.

- `--confidence_threshold`:
  - *Type*: `float`
  - *Default*: `0.99`
  - *Description*: The confidence threshold for predictions.

- `--actionlist_to_recognise`:
  - *Type*: `str`
  - *Nargs*: `+`
  - *Required*: `True`
  - *Description*: A list of actions to recognize.
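As a hedged sketch of how these parameters might fit together at inference time (function name and sampling details are assumptions, not the repository's code): sample `sequence_length` frames evenly from the clip, resize them to `image_height` × `image_width`, and keep the prediction only if its softmax confidence reaches `confidence_threshold`:

```python
import cv2
import numpy as np
import torch

def predict_clip(model, video_path, classes_list, sequence_length=20,
                 image_height=64, image_width=64, confidence_threshold=0.99):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // sequence_length, 1)
    frames = []
    for i in range(sequence_length):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * step)  # sample frames evenly
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.resize(frame, (image_width, image_height)) / 255.0)
    cap.release()
    x = torch.tensor(np.array(frames), dtype=torch.float32)
    x = x.permute(0, 3, 1, 2).unsqueeze(0)  # (1, T, C, H, W)
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)[0]
    conf, idx = probs.max(dim=0)
    if conf.item() < confidence_threshold:
        return None  # below the threshold: treat as no recognized action
    return classes_list[idx.item()], conf.item()
```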
