This project aims to provide a simple(r) alternative to trunk-transcribe for people who want to transcribe just a couple of talkgroups.
Important
Please note that this project has only been tested with CPU transcription so far. I need an extra NVIDIA GPU before I can test GPU transcription.
- A working trunk-recorder setup
- S3 storage for the calls you want to transcribe
- An MQTT broker. If you are already using the trunk-recorder MQTT status plugin, you can reuse the same broker
- Python 3 and pip (note that 3.12 seems to have issues; try 3.10 or 3.11)
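To confirm that your interpreter is one of the versions that has worked so far, a check along these lines may help. This is only a sketch: the helper name `python_version_ok` is made up here, and the 3.10/3.11 range simply mirrors the note above.

```python
import sys

# Versions reported as working above; 3.12 "seems to have issues".
KNOWN_GOOD = {(3, 10), (3, 11)}


def python_version_ok(version_info=sys.version_info):
    """Return True if the major.minor version is one of the known-good ones."""
    return (version_info[0], version_info[1]) in KNOWN_GOOD


if __name__ == "__main__":
    if not python_version_ok():
        major, minor = sys.version_info[0], sys.version_info[1]
        print(f"Python {major}.{minor} is untested here; try 3.10 or 3.11")
```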
Create config.py in config/ using config_example.py as your guide.
Install the needed python libraries:
pip install -r requirements.txt
Then give python3 main.py a shot!
Once that is running, edit upload-script-example.py to point at the right S3 bucket and MQTT broker, and configure your trunk-recorder instance to run this script when systems you want to transcribe have calls:
"uploadScript": "python3 whatever-you-name-the-script.py"
Note
If the upload script is not running on the same host, you will need to run pip3 install boto3 paho_mqtt or else the script will complain.
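For orientation, the general shape of such an upload script might look like the sketch below. It is not the project's upload-script-example.py: the bucket, broker, topic, and message fields are placeholders invented for illustration, and it assumes trunk-recorder passes the audio file path and then the call's JSON path as arguments. Check the real example script for the message format that main.py expects.

```python
import json
import os
import sys

# Placeholder values (assumptions): replace with your own bucket and broker.
S3_BUCKET = "my-calls-bucket"
MQTT_HOST = "mqtt.example.com"
MQTT_TOPIC = "transcription/calls"


def object_key(audio_path):
    """Use the audio file's base name as the S3 object key."""
    return os.path.basename(audio_path)


def build_message(bucket, key, metadata):
    """Build a JSON payload announcing a new call (hypothetical format)."""
    return json.dumps({
        "bucket": bucket,
        "key": key,
        "talkgroup": metadata.get("talkgroup"),
    })


def main(audio_path, json_path):
    # Imported here so the helpers above work without these packages installed.
    import boto3
    import paho.mqtt.publish as publish

    with open(json_path) as f:
        metadata = json.load(f)

    key = object_key(audio_path)
    # Upload the audio, then tell the transcriber where to find it.
    boto3.client("s3").upload_file(audio_path, S3_BUCKET, key)
    publish.single(MQTT_TOPIC, build_message(S3_BUCKET, key, metadata),
                   hostname=MQTT_HOST)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Assumes the audio path comes first and the JSON path second.
    main(sys.argv[1], sys.argv[2])
```

Keeping the boto3 and paho imports inside main() means the two helper functions can be tried out on a machine that does not have those packages installed yet.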
I recommend making a folder to run this in, but it's up to you. For example:
mkdir transcription
cd transcription
wget https://raw.githubusercontent.com/cschmittiey/simple-tr-transcription/main/docker-compose.yml
mkdir config
wget https://raw.githubusercontent.com/cschmittiey/simple-tr-transcription/main/config/config_example.py -O config/config.py
## edit your config file at this point
docker compose up -d # start up the container in the background
docker compose logs -f # to watch the logs
Docker compose example if you know what you're doing:
services:
  transcription:
    #build: .
    image: ghcr.io/cschmittiey/simple-tr-transcription:latest
    restart: unless-stopped
    volumes:
      - ./config:/app/config
Important
For multiple systems, as the script is written now, you will want multiple copies of the script with a different S3 bucket for each system. Sorry. Maybe someday I'll fix that 🤷
- multi-system support
- figure out how to have multiple workers work nicely
- rate limit transcriptions so the system doesn't get overwhelmed?
- maybe only upload talkgroups that we want transcribed? this should be configurable, maybe someone wants all the calls in s3?
- add minimum/maximum length limits for files in the upload script, maybe
- better readme, lol
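As a rough idea of how the talkgroup and length items in the list above could work, the upload script might run a check like the one below before uploading. This is a sketch and not something the project does today: the function name and parameters are hypothetical, and the "talkgroup" and "call_length" keys are assumptions about the call's JSON metadata.

```python
def should_upload(metadata, wanted_talkgroups=None, min_seconds=0.0, max_seconds=None):
    """Decide whether a call should be uploaded.

    metadata: dict parsed from the call's JSON file. The "talkgroup" and
        "call_length" keys are assumptions about that file.
    wanted_talkgroups: set of talkgroup IDs, or None to allow every talkgroup.
    min_seconds / max_seconds: length limits; max_seconds=None means no cap.
    """
    if wanted_talkgroups is not None and metadata.get("talkgroup") not in wanted_talkgroups:
        return False
    length = float(metadata.get("call_length", 0))
    if length < min_seconds:
        return False
    if max_seconds is not None and length > max_seconds:
        return False
    return True
```

Leaving wanted_talkgroups as None keeps the current behaviour of uploading everything, for anyone who does want all calls in S3.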