
A simple python script for converting translation memories from TMX to Stardict format (Searchable from Goldendict)


Celso-Scott/TMX-to-Goldendict-Converter



TMX to Goldendict Convertor

This is a simple script to convert TMX translation memory files into Stardict-format dictionary files that can be searched with the Goldendict app or a similar application. I’ve found this a very useful way to search translation memories, either as an alternative or as a supplement to searching them from a CAT platform. I primarily wrote the script to convert the 84000 TMX files, but I realized the same process could also be used by others to convert their own TMX files.

Thus you will see two scripts along with corresponding folders (the folders are created when you run the scripts): one for “84000” and one labeled “other.” The 84000 script is separate because it preserves the Toh and folio numbers and inserts them into the definition field. Please ignore the “header_placeholder” but leave it in place: for whatever reason, the pyglossary script drops the first line when converting to the Stardict format, so I added a placeholder line, "SOURCE TARGET", which is the line that gets removed, preventing the first TM segment from being lost.
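The TMX-to-tab-separated step, including the placeholder row, can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal illustration, not the repository's actual script: the helper names (`clean`, `tmx_rows`, `tab_lines`) are hypothetical, the placeholder is assumed to be written as `SOURCE<TAB>TARGET`, and the 84000-specific Toh and folio handling is left out.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under its fully qualified namespace name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Assumed placeholder row: pyglossary drops line 1, so this is sacrificed.
PLACEHOLDER = "SOURCE\tTARGET"


def clean(text):
    """Collapse tabs/newlines so a segment stays in one tab-separated cell."""
    return " ".join(text.split())


def tmx_rows(root, src_lang, tgt_lang):
    """Yield (source, target) pairs from a parsed TMX root element.

    src_lang/tgt_lang are lowercase primary subtags, e.g. "bo" and "en".
    """
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain "lang".
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # Key on the primary subtag so "en-US" matches "en".
                key = lang.split("-")[0].lower()
                # itertext() also returns text inside inline tags like <ph>.
                texts[key] = clean("".join(seg.itertext()))
        src, tgt = texts.get(src_lang), texts.get(tgt_lang)
        # Skip units that lack either side of the language pair.
        if src and tgt:
            yield src, tgt


def tab_lines(rows):
    """Placeholder first, then one 'source<TAB>target' line per segment."""
    return [PLACEHOLDER] + [f"{s}\t{t}" for s, t in rows]
```

For a real file, `ET.parse("memory.tmx").getroot()` supplies the root element, and the returned lines can be joined with newlines and written out as UTF-8.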

As for the “other” script, in theory it should work with any TMX file in any language pairing. I tested it with some files generated by smartCAT and they worked, but I’m sure various bugs will arise depending on the format, so please open issues or pull requests as needed. The files in the “ready_to_merge” output folders should end up as simple tab-separated documents. The “merge” script merges all the documents in the output folders into a single document and changes the file extension to .txt. At this point the conversion is not yet complete: you will need to run the “pyglossary” script, which is available along with its documentation at the following link:
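The merge step could look roughly like the following. This is a hypothetical sketch rather than the repository's “merge” script: it assumes every file in the folder is a tab-separated output file that may begin with the placeholder row, and it keeps that row only once, at the top of the merged .txt, so pyglossary still has a line to drop.

```python
from pathlib import Path

# Assumed placeholder row; kept once so pyglossary drops it, not a segment.
PLACEHOLDER = "SOURCE\tTARGET"


def merge_texts(texts):
    """Merge tab-separated documents into one, with a single placeholder first."""
    lines = [PLACEHOLDER]
    for text in texts:
        for line in text.splitlines():
            # Skip blank lines and per-file placeholders; keep real segments.
            if line.strip() and line != PLACEHOLDER:
                lines.append(line)
    return "\n".join(lines) + "\n"


def merge_folder(folder, out_file, pattern="*"):
    """Merge every file matching `pattern` in `folder` into a .txt file."""
    out = Path(out_file).with_suffix(".txt")
    paths = [
        p for p in sorted(Path(folder).glob(pattern))
        # Skip subfolders and the output file itself if it lives in `folder`.
        if p.is_file() and p.resolve() != out.resolve()
    ]
    merged = merge_texts(p.read_text(encoding="utf-8") for p in paths)
    out.write_text(merged, encoding="utf-8")
    return out
```

For example, `merge_folder("ready_to_merge", "merged")` would write `merged.txt` next to the script. Splitting the pure `merge_texts` step from the file handling keeps the merging logic easy to test.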

pyglossary

My Python knowledge is very minimal, so I don’t know how to merge these scripts together, but if anyone would like to improve the process, by all means be my guest.

Pyglossary will create the three Stardict dictionary files (an .ifo file plus its .idx and .dict or .dict.dz companions), which Goldendict can then read.
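To sanity-check pyglossary's output before loading it into Goldendict, a small script can read the .ifo header and confirm that the companion files sit next to it. This sketch relies only on the Stardict .ifo layout (a fixed first line followed by `key=value` lines such as `wordcount` and `bookname`); the function names are hypothetical, and the accepted extensions (`.idx`/`.idx.gz`, `.dict`/`.dict.dz`) are an assumption about what a given pyglossary run produces.

```python
from pathlib import Path

# Every Stardict .ifo file starts with this exact line.
IFO_MAGIC = "StarDict's dict ifo file"


def parse_ifo(text):
    """Parse the key=value lines of a Stardict .ifo file into a dict."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != IFO_MAGIC:
        raise ValueError("not a Stardict .ifo file")
    info = {}
    for line in lines[1:]:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip()] = value.strip()
    return info


def missing_companions(ifo_path):
    """List which companion files (idx, dict) are absent next to the .ifo."""
    ifo = Path(ifo_path)

    def present(*exts):
        # Use the stem so a name like "tm.v2.ifo" maps to "tm.v2.idx".
        return any((ifo.parent / (ifo.stem + ext)).exists() for ext in exts)

    missing = []
    if not present(".idx", ".idx.gz"):
        missing.append("idx")
    if not present(".dict", ".dict.dz"):
        missing.append("dict")
    return missing
```

Reading `parse_ifo(Path("tm.ifo").read_text(encoding="utf-8"))["wordcount"]` gives a quick figure to compare against the number of TM segments you expected to convert.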

Please see my YouTube videos for further details on how to use Goldendict and for a step-by-step walkthrough of the pyglossary script.

-Celso Wilkinson
