| | subs2cards | subs2dubs | enhance | translit |
|---|---|---|---|---|
| Make tokenized subtitle | ✅ | 🚫 | ❌ | ✅ |
| Make translit subtitle | ✅ | 🚫 | ❌ | ✅ |
| Make enhanced track | ✅ | ✅ | ✅ | ❌ |
| Make a merged video | ✅ | ✅ | ✅ | ❌ |
| Make tokenized dubtitle | ✅ | ✅ | ❌ | 🚫 |
| Make translit dubtitle | ✅ | ✅ | ❌ | 🚫 |
| Make dubtitle | ✅ | ✅ | ❌ | ❌ |
| Make condensed audio | ✅ | ❌ | ❌ | ❌ |
| Make Anki notes | ✅ | ❌ | ❌ | ❌ |
in progress:
- add subtitle transliteration? A remote API is difficult, but so is shipping Python with NLP libs. 🤔
  - resources: https://awesome-go.com/tokenizers/, https://go.libhunt.com/
  - Thai:
    - ✅ thai2english.com
    - PythaiNLP + my own lib?
    - ❌ deepcut: accurate but bad perf, unmaintained
  - Japanese (see https://github.com/taishi-i/awesome-japanese-nlp-resources/):
    - ✅ go-ichiran
    - ikawaha/kagome
    - ❌ shogo82148/go-mecab: above should be enough
    - ginza (py)
    - Kanji transliteration: https://github.com/ysugimoto/go-kakasi
    - Kana-romaji transliterator: robpike/nihongo OR gojp/kana
  - Chinese:
    - Tokenizer: https://github.com/yanyiwu/gojieba
    - Transliterator: https://github.com/mozillazg/go-pinyin or https://github.com/mozillazg/go-unidecode (same author)
  - Korean:
    - Transliterator: https://github.com/hangulize/hangulize (doubt it's worth it: learning Hangul is easy)
  - Indic languages/scripts:
    - https://github.com/libindic/indic-trans
    - https://github.com/virtualvinodh/aksharamukha (already offers a docker-compose)
  - Cyrillic: https://github.com/barseghyanartur/transliterate OR https://github.com/mehanizm/iuliia-go
  - transliteration also needed for: Arabic, Cantonese
- fork progressbar: its ETA is computed from the rate over only the last few seconds, which makes it garbage when tasks are CPU-bound and the task pool is massive
- for bulk processing: leverage `WithLevel()` to implement `--less-lethal`
- (MUST TEST) insanely-fast-whisper
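To give a feel for the simplest case on the transliteration list, here is a naive letter-by-letter Russian Cyrillic to Latin transliterator using only the standard library. The mapping table and the `translitRu` name are assumptions made for this sketch; it is not the scheme implemented by transliterate or iuliia-go.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// ruToLatin is a simplified, hypothetical lowercase mapping, not a formal standard.
var ruToLatin = map[rune]string{
	'а': "a", 'б': "b", 'в': "v", 'г': "g", 'д': "d", 'е': "e", 'ё': "yo",
	'ж': "zh", 'з': "z", 'и': "i", 'й': "y", 'к': "k", 'л': "l", 'м': "m",
	'н': "n", 'о': "o", 'п': "p", 'р': "r", 'с': "s", 'т': "t", 'у': "u",
	'ф': "f", 'х': "kh", 'ц': "ts", 'ч': "ch", 'ш': "sh", 'щ': "shch",
	'ъ': "", 'ы': "y", 'ь': "", 'э': "e", 'ю': "yu", 'я': "ya",
}

// translitRu transliterates Russian text rune by rune. Characters without a
// mapping (Latin letters, digits, punctuation, spaces) are copied unchanged.
func translitRu(s string) string {
	var b strings.Builder
	for _, r := range s {
		lower := unicode.ToLower(r)
		lat, ok := ruToLatin[lower]
		if !ok {
			b.WriteRune(r)
			continue
		}
		if r != lower && lat != "" {
			// Uppercase input: capitalize only the first Latin letter ("Щ" -> "Shch").
			lat = strings.ToUpper(lat[:1]) + lat[1:]
		}
		b.WriteString(lat)
	}
	return b.String()
}

func main() {
	fmt.Println(translitRu("Привет, мир!")) // Privet, mir!
}
```

Real schemes such as those in iuliia-go also apply positional rules (for example, some spell е differently at the start of a word), which a flat lookup table like this cannot express.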
later:
- Make autosub locale-independent: `en` should match `en-US` when it is the only English track, but not when both `en-US` and `en-IN` are present. Add a `--strict` flag.
- integrate with Viper and a YAML config file for:
  - Whisper `initial_prompt`
  - tokens
  - gain & limiter parameters for merging
- more debug info (FFmpeg version, mediainfo, platform...)
- use a libvips binding to fuzz-trim black padding when the aspect ratio differs
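The autosub locale rule can be sketched as a small matcher: an exact tag always wins; otherwise a tag with the same base language is accepted only when it is the sole candidate, and `strict` disables that fallback entirely. `matchLang` and `baseLang` are hypothetical names, and tags are compared as plain strings here; `golang.org/x/text/language` offers a full BCP 47 matcher if more nuance is needed.

```go
package main

import (
	"fmt"
	"strings"
)

// baseLang returns the primary subtag, e.g. "en" for "en-US" or "en_IN".
func baseLang(tag string) string {
	tag = strings.ToLower(tag)
	if i := strings.IndexAny(tag, "-_"); i >= 0 {
		return tag[:i]
	}
	return tag
}

// matchLang picks the track whose language tag matches want. An exact,
// case-insensitive match always wins. Unless strict is set, a track sharing
// the base language is accepted only if it is the single candidate, so
// "en" matches a lone "en-US" but not "en-US" alongside "en-IN".
func matchLang(want string, tracks []string, strict bool) (string, bool) {
	for _, t := range tracks {
		if strings.EqualFold(t, want) {
			return t, true
		}
	}
	if strict {
		return "", false
	}
	base := baseLang(want)
	var candidates []string
	for _, t := range tracks {
		if baseLang(t) == base {
			candidates = append(candidates, t)
		}
	}
	if len(candidates) == 1 {
		return candidates[0], true
	}
	return "", false // none, or ambiguous regional variants
}

func main() {
	fmt.Println(matchLang("en", []string{"ja", "en-US"}, false)) // en-US true
}
```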
might:
- speechmatics (NO GO LIB) https://docs.speechmatics.com/introduction/batch-guide https://docs.speechmatics.com/jobsapi#tag/RetrieveTranscriptResponse
- use the enhanced voice audio track as the basis for audio clips
- use lower-bitrate Opus with DRED & LBRR once they are standardized
- lossless AVIF extraction from AV1 (HQ but worse than JPEG in size)