A Python toolkit that captions videos automatically with the Gemini API and manages the data in the Lance database format.
- Automatic video, audio, and image description using Google's Gemini API, or image-only description using Pixtral Large (124B)
- Export captions in SRT format
- Support for multiple video formats
- Batch processing with progress tracking
- Maintains original directory structure
- Configurable through TOML files
- Lance database integration for efficient data management
- Import videos into Lance database format
- Preserve original directory structure
- Support for both single directory and paired directory structures
- Extract videos and captions from Lance datasets
- Maintains original file structure
- Exports captions as SRT files in the same directory as source videos
- Automatic clipping of videos based on SRT timestamps
- Automatic video scene description using Gemini API or Pixtral API
- Batch processing support
- SRT format output with timestamps
- Robust error handling and retry mechanisms
- Progress tracking for batch operations
- API prompt configuration management
- Customizable batch processing parameters
- Default schema includes file paths and metadata
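To make the SRT output mentioned above concrete, here is a minimal sketch of how timestamped captions can be rendered in SRT form. The function names `format_srt_timestamp` and `build_srt` are hypothetical and are not taken from this toolkit; only the SRT layout itself (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is standard.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) cues as the contents of an SRT file."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text}\n"
        )
    # Cues are separated by a single blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(build_srt([(0.0, 2.5, "A cat walks in."), (2.5, 5.0, "It sits down.")]))
```

Timestamps like these are also what a clipping step would read back when cutting a video by caption.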
Give PowerShell unrestricted script access so that the venv can work:
- Open an administrator PowerShell window
- Type Set-ExecutionPolicy Unrestricted and answer A
- Close the administrator PowerShell window
Run the following PowerShell script:
./1、install-uv-qinglong.ps1
- First install PowerShell:
./0、install pwsh.sh
- Then run the installation script using PowerShell:
sudo pwsh ./1、install-uv-qinglong.ps1
Use sudo pwsh if you are on Linux.
Video example: https://files.catbox.moe/8fudnf.mp4
Use the PowerShell script to import your videos:
./lanceImport.ps1
Use the PowerShell script to export data from Lance format:
./lanceExport.ps1
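As an illustration of exporting while keeping the original directory structure, with each SRT file next to its video, the sketch below maps stored relative paths to output paths. It is an assumption-laden example, not the logic of lanceExport.ps1: the `plan_export` function, the `VIDEO_SUFFIXES` set, and the idea that the dataset stores relative paths as strings are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical set of video extensions; the toolkit's real list may differ.
VIDEO_SUFFIXES = {".mp4", ".mkv", ".mov", ".webm"}


def plan_export(stored_paths, export_root):
    """Map each stored relative video path to (video_out, srt_out).

    The relative path is re-rooted under export_root unchanged, so the
    directory layout is preserved, and the caption file differs from the
    video only by its .srt suffix.
    """
    plan = {}
    for stored in stored_paths:
        rel = PurePosixPath(stored)
        if rel.suffix.lower() not in VIDEO_SUFFIXES:
            continue  # skip anything that is not a video
        video_out = PurePosixPath(export_root) / rel
        plan[stored] = (video_out, video_out.with_suffix(".srt"))
    return plan


if __name__ == "__main__":
    for video, srt in plan_export(["cats/clip01.mp4", "notes.txt"], "out").values():
        print(video, "->", srt)
```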
Use the PowerShell script to generate captions for your videos:
./run.ps1
Note: You'll need to configure your Gemini API key in run.ps1 before using the auto-captioning feature.
A Pixtral API key is optional and enables image captioning.
step-1.5v-mini is now also supported as an optional video captioning model.
$dataset_path = "./datasets"
$gemini_api_key = ""
$gemini_model_path = "gemini-2.0-flash-thinking-exp-01-21"
$pixtral_api_key = ""
$pixtral_model_path = "pixtral-large-2411"
$step_api_key = ""
$step_model_path = "step-1.5v-mini"
$dir_name = $true
$mode = "long"
$not_clip_with_caption = $false # Do not clip videos according to captions
$wait_time = 1
$max_retries = 100
$segment_time = 300
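The last three settings are not explained here, so the following is only a sketch of one plausible reading: `segment_time` as the length in seconds of each chunk a long video is split into, `max_retries` as the number of attempts per API call, and `wait_time` as the pause in seconds between attempts. These meanings, and the helper names `split_into_segments` and `call_with_retries`, are assumptions rather than the toolkit's actual implementation.

```python
import time


def split_into_segments(duration, segment_time=300):
    """Return (start, end) offsets in seconds that cover [0, duration]."""
    if segment_time <= 0:
        raise ValueError("segment_time must be positive")
    segments = []
    start = 0.0
    while start < duration:
        end = min(start + segment_time, duration)
        segments.append((start, end))
        start = end
    return segments


def call_with_retries(func, max_retries=100, wait_time=1.0, sleep=time.sleep):
    """Call func() until it succeeds, trying at most max_retries times.

    Sleeps wait_time seconds between failed attempts. The sleep function is
    injectable so the behaviour can be checked without real delays.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
            if attempt < max_retries:
                sleep(wait_time)
    raise RuntimeError(f"gave up after {max_retries} attempts") from last_error


if __name__ == "__main__":
    # A 650-second video with the default 300-second segments.
    print(split_into_segments(650))
```

A fixed wait keeps the sketch simple; an exponential backoff would be a common alternative when an API enforces rate limits.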
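A video auto-captioning tool based on the Lance database format, using the Gemini API to generate scene descriptions.
- Automatic video scene description using the Google Gemini API
- Export captions as SRT files
- Support for multiple video formats
- Batch processing with progress display
- Maintains original directory structure
- Configurable through TOML files
- Lance database integration for efficient data management
- Import videos into Lance database format
- Preserve original directory structure
- Support for both single directory and paired directory structures
- Extract videos and captions from Lance datasets
- Maintains original file structure
- Exports captions as SRT files in the same directory as source videos
- Video scene description using the Gemini API
- Batch processing support
- SRT format output with timestamps
- Robust error handling and retry mechanisms
- Progress tracking for batch operations
- API configuration management
- Customizable batch processing parameters
- Default schema includes file paths and metadata
Run the following PowerShell script: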
./1、install-uv-qinglong.ps1
- First install PowerShell:
./0、install pwsh.sh
- Then run the installation script using PowerShell:
pwsh ./1、install-uv-qinglong.ps1
Use the PowerShell script to import your videos:
./lanceImport.ps1
Use the PowerShell script to export data from Lance format:
./lanceExport.ps1
Use the PowerShell script to generate captions for your videos:
./run.ps1
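Note: You'll need to configure your Gemini API key in run.ps1 before using the auto-captioning feature.
A Pixtral API key is optional and enables image captioning.
Video captioning with StepFun's video model is now also supported.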
$dataset_path = "./datasets"
$gemini_api_key = ""
$gemini_model_path = "gemini-2.0-flash-thinking-exp-01-21"
$pixtral_api_key = ""
$pixtral_model_path = "pixtral-large-2411"
$step_api_key = ""
$step_model_path = "step-1.5v-mini"
$dir_name = $true
$mode = "long"
$not_clip_with_caption = $false # Do not clip videos according to captions
$wait_time = 1
$max_retries = 100
$segment_time = 300