Speech Processing Pipeline

This repository contains a Python script for a speech processing pipeline. The pipeline includes recording audio, transcribing the audio, translating the transcription, and serving the translated text through a server.

Models used: OpenAI Whisper (speech-to-text) and facebook/nllb (translation).
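As a rough illustration of how these two models are commonly driven from Python, here is a minimal sketch. It assumes the `faster-whisper`, `ctranslate2`, and `transformers` packages and the model directories created in the Setup section. The function names, the example language codes (`eng_Latn`, `fra_Latn`), and the choice of these libraries are assumptions for illustration and may differ from what the script in `src` actually does.

```python
from typing import Iterable


def join_segments(texts: Iterable[str]) -> str:
    """Join transcribed segment texts into one string, skipping empty pieces."""
    return " ".join(t.strip() for t in texts if t.strip())


def transcribe(audio_path: str, model_dir: str = "models/faster-whisper-tiny") -> str:
    """Transcribe an audio file with a converted Whisper model (assumes faster-whisper)."""
    # Imported lazily so this module loads even before the dependencies are installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_dir, device="cpu", compute_type="float32")
    segments, _info = model.transcribe(audio_path)
    return join_segments(segment.text for segment in segments)


def translate(
    text: str,
    src_lang: str = "eng_Latn",  # hypothetical default, NLLB language code
    tgt_lang: str = "fra_Latn",  # hypothetical default, NLLB language code
    model_dir: str = "models/nllb-200-distilled-600M",
) -> str:
    """Translate text with a converted NLLB model (assumes ctranslate2 + transformers)."""
    import ctranslate2
    import transformers

    translator = ctranslate2.Translator(model_dir, device="cpu")
    tokenizer = transformers.AutoTokenizer.from_pretrained(
        "facebook/nllb-200-distilled-600M", src_lang=src_lang
    )
    source = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))
    results = translator.translate_batch([source], target_prefix=[[tgt_lang]])
    target = results[0].hypotheses[0][1:]  # drop the leading target-language token
    return tokenizer.decode(tokenizer.convert_tokens_to_ids(target))
```

With the models in place, `translate(transcribe("recording.wav"))` would chain the two stages, where `recording.wav` is a placeholder file name.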

Requirements

Ensure you have the following dependencies installed:

- Python 3.x
- A Python virtual environment (recommended)
- The Python libraries listed in requirements.txt

Setup

To install the required dependencies, please follow the instructions below.

Clone this repository:

```
git clone -b main https://github.com/cclngit/SpeechToTextToTranslate.git
```

Navigate to the project directory:
```
cd SpeechToTextToTranslate
```
Create a virtual environment:
```
python -m venv venv
```
Activate the virtual environment:
- On Windows:
```
venv\Scripts\activate
```
- On macOS and Linux:
```
source venv/bin/activate
```

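Create a model directory, then download and convert the speech-to-text and translation models. The `ct2-transformers-converter` command is installed with the CTranslate2 Python package; if the command is not found, install the dependencies first (see the next step):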

```
mkdir models
ct2-transformers-converter --model facebook/nllb-200-distilled-600M --output_dir models/nllb-200-distilled-600M
ct2-transformers-converter --model openai/whisper-tiny --output_dir models/faster-whisper-tiny --copy_files tokenizer.json --quantization float32
```

Install dependencies:
```
pip install -r requirements.txt
```
Configure the pipeline:
- Modify config.json to customize the pipeline settings as needed.
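
Once the setup steps are done, a quick check like the following can confirm that the expected files are present before the pipeline is started. This is a minimal sketch: the function name is hypothetical, and it assumes the directory names from the conversion commands above and that each converted model directory contains a `model.bin` file, which is what the CTranslate2 converter writes.

```python
from pathlib import Path


def missing_setup_items(root: str = ".") -> list:
    """Return the expected setup files that are missing under `root`."""
    base = Path(root)
    expected = [
        base / "config.json",
        base / "models" / "nllb-200-distilled-600M" / "model.bin",
        base / "models" / "faster-whisper-tiny" / "model.bin",
    ]
    # Report paths relative to the repository root, with forward slashes.
    return [p.relative_to(base).as_posix() for p in expected if not p.is_file()]
```

Calling `missing_setup_items()` from the repository root returns an empty list when everything is in place, and the relative paths of the missing files otherwise.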

Usage

To run the speech processing pipeline, execute the main script from the repository root:

```
python src/main.py
```

Make sure a valid configuration file named config.json is present in the repository directory. You can use the provided config_template.json as a reference. The models must also be downloaded first, as described in the Setup section. Whisper is available in several sizes, including:

- whisper-tiny
- whisper-small
- whisper-medium
- whisper-large

The nllb-200-distilled-600M model is available on Hugging Face or from OpenNMT.
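
To use a Whisper size other than `tiny`, the conversion command from the Setup section can be adapted by swapping the model name and output directory. The helper below is a small sketch that builds that command for a given size. The function name and the `models/faster-whisper-<size>` naming scheme are assumptions that extend the pattern used in Setup.

```python
import shlex

# Sizes listed in this README.
WHISPER_SIZES = ("tiny", "small", "medium", "large")


def whisper_convert_command(size: str, quantization: str = "float32") -> list:
    """Build the ct2-transformers-converter argument list for a Whisper size."""
    if size not in WHISPER_SIZES:
        raise ValueError(f"Unsupported size {size!r}; expected one of {WHISPER_SIZES}")
    return [
        "ct2-transformers-converter",
        "--model", f"openai/whisper-{size}",
        "--output_dir", f"models/faster-whisper-{size}",
        "--copy_files", "tokenizer.json",
        "--quantization", quantization,
    ]


# Print a shell-ready command that can be pasted into a terminal.
print(shlex.join(whisper_convert_command("small")))
```

The model path in the configuration would then need to point at the new output directory.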

Configuration

The pipeline's behavior is configured through the config.json file. Settings include the directories for recordings, transcriptions, and translations, as well as the recording frequency, recording duration, and language settings.
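
To make the idea concrete, here is a minimal sketch of how such a file could be loaded and validated in Python. Every key name and default value below is hypothetical; the actual keys are the ones defined in config_template.json.

```python
import json
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass
class PipelineConfig:
    # All field names and defaults are hypothetical, for illustration only.
    recordings_dir: str = "recordings"
    transcriptions_dir: str = "transcriptions"
    translations_dir: str = "translations"
    record_interval_s: float = 10.0   # how often a recording starts
    record_duration_s: float = 5.0    # length of each recording
    source_language: str = "eng_Latn"
    target_language: str = "fra_Latn"


def load_config(path: str = "config.json") -> PipelineConfig:
    """Read a JSON config file, rejecting keys the pipeline does not know."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    known = {f.name for f in fields(PipelineConfig)}
    unknown = set(raw) - known
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    # Keys missing from the file fall back to the dataclass defaults.
    return PipelineConfig(**raw)
```

Rejecting unknown keys early is one way to catch typos in the file before the pipeline starts recording.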

Contributing

Contributions are welcome! If you encounter any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Special thanks to the developers of the libraries and tools used in this project.

Disclaimer

This project is for educational and demonstration purposes only. Ensure compliance with applicable laws and regulations when using this software in real-world scenarios.
