langkit

a Swiss Army Knife for comprehensible input and language learning with Anki


Status: prerelease

Fork of Bunkai, which reimplemented the functionality first pioneered by cb4960 with subs2srs.

tldr

๐—•๐—ฎ๐˜€๐—ถ๐—ฐ ๐˜€๐˜‚๐—ฏ๐˜€๐Ÿฎ๐˜€๐—ฟ๐˜€ ๐—ณ๐˜‚๐—ป๐—ฐ๐˜๐—ถ๐—ผ๐—ป๐—ฎ๐—น๐—ถ๐˜๐˜†
$ langkit subs2cards media.mp4 media.th.srt media.en.srt

๐—”๐˜‚๐˜๐—ผ๐—บ๐—ฎ๐˜๐—ถ๐—ฐ ๐˜€๐˜‚๐—ฏ๐˜๐—ถ๐˜๐—น๐—ฒ ๐˜€๐—ฒ๐—น๐—ฒ๐—ฐ๐˜๐—ถ๐—ผ๐—ป (๐˜ฉ๐˜ฆ๐˜ณ๐˜ฆ: ๐˜ญ๐˜ฆ๐˜ข๐˜ณ๐˜ฏ ๐˜ฃ๐˜ณ๐˜ข๐˜ป๐˜ช๐˜ญ๐˜ช๐˜ข๐˜ฏ ๐˜ฑ๐˜ฐ๐˜ณ๐˜ต๐˜ถ๐˜จ๐˜ฆ๐˜ด๐˜ฆ ๐˜ง๐˜ณ๐˜ฐ๐˜ฎ ๐˜ค๐˜ข๐˜ฏ๐˜ต๐˜ฐ๐˜ฏ๐˜ฆ๐˜ด๐˜ฆ ๐˜ฐ๐˜ณ ๐˜ช๐˜ง ๐˜ถ๐˜ฏ๐˜ข๐˜ท๐˜ข๐˜ช๐˜ญ๐˜ข๐˜ฃ๐˜ญ๐˜ฆ, ๐˜ต๐˜ณ๐˜ข๐˜ฅ๐˜ช๐˜ต๐˜ช๐˜ฐ๐˜ฏ๐˜ข๐˜ญ ๐˜ค๐˜ฉ๐˜ช๐˜ฏ๐˜ฆ๐˜ด๐˜ฆ)
$ langkit subs2cards media.mp4 -l "pt-BR,yue,zh-Hant"

๐—•๐˜‚๐—น๐—ธ ๐—ฝ๐—ฟ๐—ผ๐—ฐ๐—ฒ๐˜€๐˜€๐—ถ๐—ป๐—ด (๐—ฟ๐—ฒ๐—ฐ๐˜‚๐—ฟ๐˜€๐—ถ๐˜ƒ๐—ฒ)
$ langkit subs2cards /path/to/media/dir/  -l "th,en"

๐— ๐—ฎ๐—ธ๐—ฒ ๐—ฎ๐—ป ๐—ฎ๐˜‚๐—ฑ๐—ถ๐—ผ๐˜๐—ฟ๐—ฎ๐—ฐ๐—ธ ๐˜„๐—ถ๐˜๐—ต ๐—ฒ๐—ป๐—ต๐—ฎ๐—ป๐—ฐ๐—ฒ๐—ฑ/๐—ฎ๐—บ๐—ฝ๐—น๐—ถ๐—ณ๐—ถ๐—ฒ๐—ฑ ๐˜ƒ๐—ผ๐—ถ๐—ฐ๐—ฒ๐˜€ ๐—ณ๐—ฟ๐—ผ๐—บ ๐˜๐—ต๐—ฒ ๐Ÿฎ๐—ป๐—ฑ ๐—ฎ๐˜‚๐—ฑ๐—ถ๐—ผ๐˜๐—ฟ๐—ฎ๐—ฐ๐—ธ ๐—ผ๐—ณ ๐˜๐—ต๐—ฒ ๐—บ๐—ฒ๐—ฑ๐—ถ๐—ฎ (๐˜™๐˜ฆ๐˜ฑ๐˜ญ๐˜ช๐˜ค๐˜ข๐˜ต๐˜ฆ ๐˜ˆ๐˜—๐˜ ๐˜ต๐˜ฐ๐˜ฌ๐˜ฆ๐˜ฏ ๐˜ฏ๐˜ฆ๐˜ฆ๐˜ฅ๐˜ฆ๐˜ฅ)
$ langkit enhance media.mp4 -a 2 --sep demucs

๐— ๐—ฎ๐—ธ๐—ฒ ๐—ฎ ๐—ฑ๐˜‚๐—ฏ๐˜๐—ถ๐˜๐—น๐—ฒ ๐—ผ๐—ณ ๐˜๐—ต๐—ฒ ๐—บ๐—ฒ๐—ฑ๐—ถ๐—ฎ ๐˜‚๐˜€๐—ถ๐—ป๐—ด ๐—ฆ๐—ง๐—ง ๐—ผ๐—ป ๐˜๐—ต๐—ฒ ๐˜๐—ถ๐—บ๐—ฒ๐—ฐ๐—ผ๐—ฑ๐—ฒ๐˜€ ๐—ผ๐—ณ ๐—ฝ๐—ฟ๐—ผ๐˜ƒ๐—ถ๐—ฑ๐—ฒ๐—ฑ ๐˜€๐˜‚๐—ฏ๐˜๐—ถ๐˜๐—น๐—ฒ ๐—ณ๐—ถ๐—น๐—ฒ (๐˜™๐˜ฆ๐˜ฑ๐˜ญ๐˜ช๐˜ค๐˜ข๐˜ต๐˜ฆ ๐˜ˆ๐˜—๐˜ ๐˜ต๐˜ฐ๐˜ฌ๐˜ฆ๐˜ฏ ๐˜ฏ๐˜ฆ๐˜ฆ๐˜ฅ๐˜ฆ๐˜ฅ)
$ langkit subs2dubs --stt whisper media.mp4 (media.th.srt) -l "th"

๐—–๐—ผ๐—บ๐—ฏ๐—ถ๐—ป๐—ฒ ๐—ฎ๐—น๐—น ๐—ผ๐—ณ ๐˜๐—ต๐—ฒ ๐—ฎ๐—ฏ๐—ผ๐˜ƒ๐—ฒ ๐—ถ๐—ป ๐—ผ๐—ป๐—ฒ ๐—ฐ๐—ผ๐—บ๐—บ๐—ฎ๐—ป๐—ฑ
$ langkit subs2cards /path/to/media/dir/  -l "th,en" --stt whisper --sep demucs

Requirements

This fork requires FFmpeg v6 or higher (dev builds preferred), MediaInfo, and a Replicate API token.

The FFmpeg dev team recommends that end users run only the latest builds from the development branch (master builds). The FFmpeg binary's location can be provided via a flag, found in $PATH, or placed in a "bin" directory in the folder where langkit is.

At the moment, API tokens should be passed through these environment variables: REPLICATE_API_TOKEN, ASSEMBLYAI_API_KEY, ELEVENLABS_API_TOKEN.
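
For example, in a POSIX shell (the values below are placeholders; set only the tokens for the services you actually use):
$ export REPLICATE_API_TOKEN="r8_xxxx"     # placeholder; STT and voice separation run on Replicate
$ export ASSEMBLYAI_API_KEY="xxxx"         # placeholder; only needed for --stt u1
$ export ELEVENLABS_API_TOKEN="xxxx"       # placeholder; only needed for --sep elevenlabs
$ langkit subs2cards media.mp4 -l "th,en" --stt whisper --sep demucs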

Extra features compared to subs2srs

Default encoding to OPUS / AVIF

Use modern codecs to save storage. The image/audio codecs which langkit uses are state-of-the-art and are currently in active development.

The static FFmpeg builds guarantee that you have up-to-date codecs. If you don't use a well-maintained, bleeding-edge distro or brew, use the dev builds. You can check your distro's FFmpeg version here.
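
To check which build you have, the first line of FFmpeg's banner prints the version string:
$ ffmpeg -version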

Automatic Speech Recognition / Speech-to-Text support

Translations of dubbings and of subtitles differ.¹ Therefore, dubbings can't be used together with subtitles in the old subs2srs unless said subs are closed captions or dubtitles.
With the --stt flag, you can run Whisper (v3-large) on the audio clips corresponding to the subtitles' timecodes to get a transcript of the audio and have it replace the subtitles' translation. AFAIK Language Reactor was the first to combine this with language learning from content; however, I found the accuracy of the STT they use unimpressive.

By default a dubtitle file will also be created from these transcriptions.

Side-by-side comparison table

| Name (to be passed with --stt) | Avg. word error rate across all supported languages (June 2024) | Languages supported | Price | License | Note |
|---|---|---|---|---|---|
| whisper, wh | 10.3% | 57 | $1.1/1000 min | MIT | See here for a breakdown of WER per language. |
| insanely-fast-whisper, fast | 16.2% | 57 | $0.0071/run | MIT | |
| universal-1, u1 | 8.7% | 17 | $6.2/1000 min | proprietary | Untested (doesn't support my target language) |
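
For example, to trade some accuracy for a much lower price, insanely-fast-whisper can be selected through its short alias (same invocation as in the tldr):
$ langkit subs2dubs --stt fast media.mp4 media.th.srt -l "th"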

See ArtificialAnalysis and Amgadoz @Reddit for detailed comparisons.

Note: OpenAI just released a turbo model of large-v3, but they say it is on par with large-v2 as far as accuracy is concerned, so I won't bother adding it.

Condensed Audio

langkit will automatically make an audio file containing all the audio snippets of dialog in the audiotrack.
This is meant to be used for passive listening.
More explanation and context here: Optimizing Passive Immersion: Condensed Audio - YouTube

Enhanced voice audiotrack

Make a new audiotrack with louder voices. This is very useful for languages that are phonetically dense, such as tonal languages, or that sound very different from your native language.

It works by merging the original audiotrack with an audiotrack containing only the voices.
The separated voices are obtained using one of these:

Side-by-side comparison table

| Name (to be passed with --sep) | Quality of separated vocals | Price | License | Note |
|---|---|---|---|---|
| demucs, de | good | very cheap: $0.063/run | MIT | Recommended |
| demucs_ft, ft | good | cheap: $0.252/run | MIT | Fine-tuned version: "takes 4 times more time but might be a bit better". I couldn't hear any difference from the original in my test. |
| spleeter, sp | rather poor | very, very cheap: $0.00027/run | MIT | |
| elevenlabs, 11, el | good | very, very expensive: $1/minute | proprietary | Not fully supported due to limitations of their API (MP3 only), which desyncs the processed audio from the original. Requires an ElevenLabs API token. Does more processing than the others: noise is entirely eliminated, but it distorts the soundstage to put the voice in the center, which might feel a bit uncanny in an enhanced track. |

Note

demucs and spleeter were originally meant for songs (i.e. tracks a few minutes long), and the GPUs Replicate allocates to these models are not the best. You may encounter GPU OOM (out of memory) errors when trying to process the audio tracks of movies. As far as my testing goes, trying again a few hours later solves the problem.
Replicate also offers deployments with a GPU of one's choice, but this isn't cost-effective or user-friendly, so it probably won't ever be supported.
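
For example, to use the fine-tuned Demucs variant via its short alias (same enhance invocation as in the tldr):
$ langkit enhance media.mp4 -a 2 --sep ft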

Parallelization / multi-threading by default

By default, all available CPU cores are used. You can reduce CPU usage by passing a lower --workers value than the default.
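
For example, to limit processing to four workers:
$ langkit subs2cards media.mp4 media.th.srt media.en.srt --workers 4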

Bulk / recursive directory processing

...if you pass a directory instead of an mp4. The target and native languages must be set using -l; see the tldr section.

...But why?

There are plenty of good options already: Language Reactor (previously Language Learning With Netflix), asbplayer, mpvacious, voracious, memento...

Here is a list: awesome-immersion

They are awesome but all of them are media-centric: they are implemented around watching shows.

The approach here is word-centric:

  • word-centric notes referencing all common meanings: I cross-reference dictionaries and LLMs to map the meanings, connotations, and register of a word. Then I use another tool to search my database of generated TSVs to illustrate and disambiguate the meanings I have found with real-world examples. This results in high-quality notes regrouping all example sentences, TTS, pictures... and any other fields related to the word, allowing for maximum context.
  • word-note reuse for language laddering: another advantage of this approach is that you can use this very note as the basis for making cards for a new target language further down the line, while keeping all your previous note fields at hand for making the card template of your new target language. The initial language acts just like a Note ID for a meaning mapped across multiple languages. Most basic vocabulary can be translated directly across languages with no real loss of meaning (and you can go on to disambiguate it further, using the method above for example). The effort you spend on your first target language will thus pay off in subsequent languages.

There are several additional tools I made to accomplish this, but they are hardcoded messes, so don't expect me to publish them; langkit is enough work for me by itself! :)

License

All new contributions from commit d540bd4 onward are licensed under GPL-3.0.

See the original README of Bunkai below for the basic features:


Dissects subtitles and corresponding media files into flash cards for sentence mining with an SRS system like Anki. It is inspired by the linked article on sentence mining and existing tools, which you might want to check out as well.

Features

  • One or two subtitle files: Two subtitle files can be used together to provide foreign and native language expressions on the same card.
  • Multiple subtitle formats: Any format which is supported by go-astisub is also supported by this application, although some formats may work slightly better than others. If in doubt, try to use .srt subtitles.

Installation

There is no proper release process at this time, nor a guarantee of stability of any sort, as I'm the only user of the software that I am aware of. For now, you must install the application from source.

Requirements:

  • go command in PATH (only to build and install the application)
  • ffmpeg command in PATH (used at runtime)
go get github.com/tassa-yoniso-manasi-karoto/langkit
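
Note that since Go 1.17, go get no longer builds and installs binaries; on a modern toolchain the equivalent is go install (assuming the main package sits at the repository root, as the go get line above implies):
go install github.com/tassa-yoniso-manasi-karoto/langkit@latest   # assumption: main package at repo root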

Usage

langkit is mainly used to generate flash cards from one or two subtitle files and a corresponding media file.

For example:

langkit subs2cards media-content.mp4 foreign.srt native.srt

The above command generates the tab-separated file foreign.tsv and a corresponding directory foreign.media/ containing the associated images and audio files. To do sentence mining, import the file foreign.tsv into a new deck and then, at least in the case of Anki, copy the media files manually into Anki's collection.media directory.
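
On Linux with Anki's default profile, that manual copy might look like this (the collection.media path is an assumption and varies by OS and profile name):
$ cp foreign.media/* ~/.local/share/Anki2/"User 1"/collection.media/   # adjust path for your OS/profile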

Before you can import the deck with Anki though, you must add a new Note Type which includes some or all of the fields below on the front and/or back of each card. The columns in the generated .tsv file are as follows:

| # | Name | Description |
|---|------|-------------|
| 1 | Sound | Extracted audio as a [sound] tag for Anki |
| 2 | Time | Subtitle start time code as a string |
| 3 | Source | Base name of the subtitle source file |
| 4 | Image | Selected image frame as an <img> tag |
| 5 | ForeignCurr | Current text in foreign subtitles file |
| 6 | NativeCurr | Current text in native subtitles file |
| 7 | ForeignPrev | Previous text in foreign subtitles file |
| 8 | NativePrev | Previous text in native subtitles file |
| 9 | ForeignNext | Next text in foreign subtitles file |
| 10 | NativeNext | Next text in native subtitles file |
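
To sanity-check a generated file before importing it, you can peek at the foreign/native sentence columns (fields 5 and 6; cut splits on tabs by default):
$ cut -f5,6 foreign.tsv | head -n 3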

When you review the created deck for the first time, you should go quickly through the entire deck at once. During this first pass, your goal should be to identify those cards which you can understand almost perfectly, if not for the odd piece of unknown vocabulary or grammar; all other cards which are either too hard or too easy should be deleted in this pass. Any cards which remain in the imported deck after mining should be refined and moved into your regular deck for studying the language on a daily basis.

For other uses, run langkit --help to view the built-in documentation.

Subtitle editors

The state of affairs when it comes to open-source subtitle editors is a sad one, but here's a list of editors which may or may not work passably. If you know a good one, please let me know!

| Name | Platforms | Description |
|------|-----------|-------------|
| Aegisub | macOS & others | Seems to have been a popular choice, but is no longer actively maintained. |
| Jubler | macOS & others | Works reasonably well, but fixing timing issues is still somewhat cumbersome. |

Known alternatives

There are at least three alternatives to this application that I know of by now. Oddly enough, I found substudy just after writing the prototype, and movies2anki when I published this repository. Something is off with my search skills! :)

  • movies2anki: Fully-integrated add-on for Anki which has some advanced features and supports all platforms
  • substudy: CLI alternative to subs2srs with the ability to export into other formats as well, not just SRS decks
  • subs2srs: GUI software for Windows with many features, and inspiration for substudy and Bunkai
