This repository contains a Python script for a speech processing pipeline. The pipeline includes recording audio, transcribing the audio, translating the transcription, and serving the translated text through a server.
It uses two AI models: OpenAI Whisper for speech-to-text and Facebook NLLB for translation.
Ensure you have the following dependencies installed:
- Python 3.x
- A Python virtual environment (recommended)
- Required Python libraries (specified in requirements.txt)
To install the required dependencies, please follow the instructions below.
- Clone this repository:
    git clone -b main https://github.com/cclngit/SpeechToTextToTranslate.git
- Navigate to the project directory:
    cd SpeechToTextToTranslate
- Create a virtual environment:
    python -m venv venv
- Activate the virtual environment:
    - On Windows:
        venv\Scripts\activate
    - On macOS and Linux:
        source venv/bin/activate
- Create a model directory, then download and convert the speech-to-text and translation models (the ct2-transformers-converter command comes with the ctranslate2 package, so install the dependencies first if the command is not found):
    mkdir models
    ct2-transformers-converter --model facebook/nllb-200-distilled-600M --output_dir models/nllb-200-distilled-600M
    ct2-transformers-converter --model openai/whisper-tiny --output_dir models/faster-whisper-tiny --copy_files tokenizer.json --quantization float32
- Install dependencies:
    pip install -r requirements.txt
- Configure the pipeline:
    - Modify config.json to customize the pipeline settings as needed.
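Once the setup steps are done, a quick check that the converted models are in place can save a failed first run. The following is a minimal sketch, not part of the repository: the function name `missing_models` is hypothetical, the directory names are taken from the conversion commands above, and it assumes the converter writes a `model.bin` file into each output directory.

```python
from pathlib import Path

# Directory names taken from the conversion commands in the setup steps.
EXPECTED_MODELS = ("nllb-200-distilled-600M", "faster-whisper-tiny")


def missing_models(models_dir="models", names=EXPECTED_MODELS):
    """Return the expected model directories that lack a model.bin file.

    Assumption: each converted model directory contains a model.bin file.
    """
    root = Path(models_dir)
    return [name for name in names if not (root / name / "model.bin").is_file()]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing or incomplete models:", ", ".join(missing))
    else:
        print("All expected models were found.")
```

Run it from the repository root so that the relative `models` path resolves correctly.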
To run the speech processing pipeline, execute the main script:
    python src/main.py
Make sure to provide a valid configuration file named config.json in the repository directory. You can use the provided config_template.json as a reference.
Before running the pipeline, download the models as described in the setup steps.
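One way to get a valid config.json is to start from the provided template. The sketch below is an illustration under assumptions, not code from the repository: the helper name `ensure_config` is hypothetical, and it assumes both files live in the repository root and contain plain JSON.

```python
import json
import shutil
from pathlib import Path


def ensure_config(repo_dir="."):
    """Create config.json from config_template.json if missing, then load it."""
    repo = Path(repo_dir)
    config_path = repo / "config.json"
    if not config_path.exists():
        # Raises FileNotFoundError if the template is also missing.
        shutil.copyfile(repo / "config_template.json", config_path)
    with config_path.open(encoding="utf-8") as handle:
        # Raises json.JSONDecodeError if the file is not valid JSON.
        return json.load(handle)
```

An existing config.json is never overwritten, so local edits are preserved.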
Whisper is available in several model sizes, including:
- whisper-tiny
- whisper-small
- whisper-medium
- whisper-large
The nllb-200-distilled-600M model is available on Hugging Face and from OpenNMT.
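To try a different Whisper size, the conversion command from the setup steps can be adapted. The sketch below only builds the command line; the function name `whisper_conversion_command` is hypothetical, and it assumes that the Hugging Face model IDs and output directories follow the same `openai/whisper-<size>` and `models/faster-whisper-<size>` pattern as the tiny model. The pipeline configuration would presumably also need to point at the new directory.

```python
import shlex


def whisper_conversion_command(size="tiny", quantization="float32"):
    """Build the ct2-transformers-converter arguments for one Whisper size.

    Assumption: model IDs and output paths follow the pattern used for
    whisper-tiny in the setup steps.
    """
    return [
        "ct2-transformers-converter",
        "--model", f"openai/whisper-{size}",
        "--output_dir", f"models/faster-whisper-{size}",
        "--copy_files", "tokenizer.json",
        "--quantization", quantization,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it in a shell.
    print(shlex.join(whisper_conversion_command("small")))
```

Larger models are slower and need more memory, so it is reasonable to confirm the pipeline works with whisper-tiny first.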
The pipeline's behavior can be configured using the config.json file. You can customize settings such as directories for recordings, transcriptions, translations, recording frequency, duration, language settings, etc.
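As an illustration of the kinds of settings listed above, here is a small validation sketch. Every key name in it is hypothetical and chosen only for the example; the actual names are defined in config_template.json. The language values use NLLB-style codes, and whether the real configuration expects that format is also an assumption.

```python
import json

# Hypothetical keys for illustration; the real names are in config_template.json.
EXPECTED_KEYS = {
    "recordings_dir": str,
    "transcriptions_dir": str,
    "translations_dir": str,
    "recording_frequency": int,
    "recording_duration": (int, float),
    "source_language": str,
    "target_language": str,
}


def validate_config(config):
    """Return a list of problems found; an empty list means the config passed."""
    problems = []
    for key, expected_type in EXPECTED_KEYS.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not isinstance(config[key], expected_type):
            problems.append(f"wrong type for key: {key}")
    return problems


if __name__ == "__main__":
    example = json.loads("""
    {
        "recordings_dir": "recordings",
        "transcriptions_dir": "transcriptions",
        "translations_dir": "translations",
        "recording_frequency": 16000,
        "recording_duration": 5,
        "source_language": "eng_Latn",
        "target_language": "fra_Latn"
    }
    """)
    print(validate_config(example))
```

Reporting all problems at once, rather than stopping at the first, makes it easier to fix a hand-edited configuration file in one pass.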
Contributions are welcome! If you encounter any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Special thanks to the developers of the libraries and tools used in this project.
This project is for educational and demonstration purposes only. Ensure compliance with applicable laws and regulations when using this software in real-world scenarios.