Aksharamukha update (#103)
* Yiddish transliteration via submodules.

* Update checkout workflow.

* Change refs for Yiddish submodules.

* Fix WORKDIR in Dockerfile

* Do not remove yiddish module.

* Manually add yiddish submodules.

* Use git clone instead of submodule.

* Move ext checkout to github actions.

* Chinese numerals (#97)

* WIP Parse Chinese numerals.

* WIP complete number parsing.

* Complete Chinese numerals:

* Use standard table override instead of pre-config hooks.
* Add few test strings.

* Complete numerals:

* Transliterate all numeric examples correctly
* Modify hook return logic for consistency
* WIP partial spacing fix.

* Some cleanup; upgrade docker OS.

* Add dependency for uwsgi.

* Squashed commit of the following: (#98)

commit 30859a5
Author: scossu <[email protected]>
Date:   Wed Feb 28 22:17:36 2024 -0500

    Move ext checkout to github actions.

commit 6d8da6d
Author: scossu <[email protected]>
Date:   Wed Feb 28 21:45:01 2024 -0500

    Use git clone instead of submodule.

commit ade9da5
Author: scossu <[email protected]>
Date:   Wed Feb 28 21:42:45 2024 -0500

    Manually add yiddish submodules.

commit 77cb9ef
Author: scossu <[email protected]>
Date:   Wed Feb 28 21:23:37 2024 -0500

    Do not remove yiddish module.

commit e405b36
Author: scossu <[email protected]>
Date:   Wed Feb 28 09:11:41 2024 -0500

    Fix WORKDIR in Dockerfile

commit 95445ba
Author: scossu <[email protected]>
Date:   Wed Feb 28 09:07:50 2024 -0500

    Change refs for Yiddish submodules.

commit 208ea09
Author: scossu <[email protected]>
Date:   Wed Feb 28 08:45:58 2024 -0500

    Update checkout workflow.

* Add debug output to /trans response.

* Split docker files and requirements.

* Add bad request debug handler.

* Add bad request debug handler.

* Adjust CI workflows.

* Fix image name typo.

* Refine triggers.

* Fix typo on test workflow trigger.

* Use JSON in POST body.

* Also use JSON in feedback request; update docs.

* Return json data in 400 debug.

* Update Aksharamukha.

* Add new set of languages; separate pre and post options in Aksharamukha.
scossu authored May 14, 2024
1 parent fa5b48d commit 99dcaac
Showing 15 changed files with 231 additions and 7 deletions.
2 changes: 1 addition & 1 deletion deps.txt
@@ -1,5 +1,5 @@
 # External dependencies.
-aksharamukha>=2.1,<3
+aksharamukha>=2.2,<3
 camel-tools>=1.5
 funcy>=1.15,<2
 pymarc>=4.0,<5
18 changes: 12 additions & 6 deletions scriptshifter/hooks/aksharamukha/romanizer.py
@@ -15,16 +15,22 @@
 logger = getLogger(__name__)


-def s2r_post_config(ctx, src_script):
+def s2r_post_config(ctx, src_script, pre=[], post=[]):
     # options = detect_preoptions(ctx.src, src_script)
-    options = [n for n, v in ctx.options.items() if v and n != "capitalize"]
-    ctx.dest = process(src_script, "IAST", ctx.src, pre_options=options)
+    pre_options = pre + [
+        n for n, v in ctx.options.items() if v and n != "capitalize"]
+    ctx.dest = process(
+        src_script, "RomanLoC", ctx.src,
+        pre_options=pre_options, post_options=post)

     return BREAK


-def r2s_post_config(ctx, dest_script):
-    options = [n for n, v in ctx.options.items() if v and n != "capitalize"]
-    ctx.dest = process("IAST", dest_script, ctx.src, post_options=options)
+def r2s_post_config(ctx, dest_script, pre=[], post=[]):
+    post_options = post + [
+        n for n, v in ctx.options.items() if v and n != "capitalize"]
+    ctx.dest = process(
+        "RomanLoC", dest_script, ctx.src,
+        pre_options=pre, post_options=post_options)

     return BREAK
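The option merging in the updated `s2r_post_config` can be exercised without Aksharamukha installed. The following is a minimal sketch, not the project's code: `fake_process` is a stand-in that only records its arguments, `BREAK` is a placeholder value, and the option names `opt_a` and `opt_b` are hypothetical. It also uses `None` defaults instead of `[]`, which is an illustrative variation and not what the commit does.

```python
from types import SimpleNamespace

BREAK = "break"  # placeholder for the hook return flag; the real value is not shown in this diff


def fake_process(src, dest, text, pre_options=(), post_options=()):
    """Stand-in for the transliteration call: records its arguments only."""
    return {
        "src": src,
        "dest": dest,
        "text": text,
        "pre_options": list(pre_options),
        "post_options": list(post_options),
    }


def s2r_post_config(ctx, src_script, pre=None, post=None):
    # Options fixed in the table config come first, then enabled user options.
    user_opts = [n for n, v in ctx.options.items() if v and n != "capitalize"]
    ctx.dest = fake_process(
        src_script, "RomanLoC", ctx.src,
        pre_options=(pre or []) + user_opts,
        post_options=post or [],
    )
    return BREAK


# Hypothetical request context: "capitalize" is filtered out, "opt_b" is disabled.
ctx = SimpleNamespace(
    src="नमस्ते",
    options={"capitalize": True, "opt_a": True, "opt_b": False},
    dest=None,
)
result = s2r_post_config(ctx, "Devanagari", post=["HindiMarathiRomanLoCFix"])
```

In this sketch the table-level `post` list passes through unchanged, while user-selected options are appended to the `pre` list, mirroring the script-to-Roman direction of the diff above.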
16 changes: 16 additions & 0 deletions scriptshifter/tables/data/gujarati.yml
@@ -0,0 +1,16 @@
+general:
+  name: Gujarati
+
+script_to_roman:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.s2r_post_config
+        - src_script: "Gujarati"
+
+roman_to_script:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.r2s_post_config
+        - dest_script: "Gujarati"
24 changes: 24 additions & 0 deletions scriptshifter/tables/data/index.yml
@@ -58,6 +58,8 @@ greek_classical:
   name: Greek (classical)
 greek_modern:
   name: Greek (modern)
+gujarati:
+  name: Gujarati
 hebrew:
   name: Hebrew
 hindi:
@@ -68,6 +70,8 @@ katakana:
   name: Japanese (Katakana)
 kalmyk_cyrillic:
   name: Kalmyk (Cyrillic)
+kannada:
+  name: Kannada
 kara-kalpak_cyrillic:
   name: Kara-Kalpak (Cyrillic)
 karachai-balkar_cyrillic:
@@ -80,6 +84,8 @@ khakass_cyrillic:
   name: Khakass (Cyrillic)
 khanty_cyrillic:
   name: Khanty (Cyrillic)
+khmer:
+  name: Khmer
 komi_cyrillic:
   name: Komi (Cyrillic)
 korean_nonames:
@@ -96,8 +102,12 @@ lithuanian_cyrillic:
   name: Lithuanian (Cyrillic)
 macedonian:
   name: Macedonian
+marathi:
+  name: Marathi (Devanagari)
 mansi_cyrillic:
   name: Mansi (Cyrillic)
+malayalam:
+  name: Malayalam
 moldovan_cyrillic:
   name: Moldovan (Cyrillic)
 mongolian_cyrillic:
@@ -108,8 +118,16 @@ mordvin_cyrillic:
   name: Mordvin (Cyrillic)
 nenets_cyrillic:
   name: Nenets (Cyrillic)
+oriya:
+  name: Oriya
 ossetic_cyrillic:
   name: Ossetic (Cyrillic)
+pali:
+  name: Pali
+panjabi:
+  name: Panjabi
+prakrit:
+  name: Prakrit (Devanagari)
 pulaar:
   name: Pulaar (Adlam)
 gurmukhi:
@@ -118,10 +136,14 @@ romani_cyrillic:
   name: Romani (Cyrillic)
 russian:
   name: Russian
+sanskrit:
+  name: Sanskrit (Devanagari)
 serbian:
   name: Serbian
 shor_cyrillic:
   name: Shor (Cyrillic)
+sinhalese:
+  name: Sinhalese
 syriac_cyrillic:
   name: Syriac (Cyrillic)
 tajik_cyrillic:
@@ -132,6 +154,8 @@ tamil_brahmi:
   name: Tamil Brahmi
 tamil_extended:
   name: Tamil (extended)
+telugu:
+  name: Telugu
 thai:
   name: Thai
 tatar-kryashen_cyrillic:
16 changes: 16 additions & 0 deletions scriptshifter/tables/data/kannada.yml
@@ -0,0 +1,16 @@
+general:
+  name: Kannada
+
+script_to_roman:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.s2r_post_config
+        - src_script: "Kannada"
+
+roman_to_script:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.r2s_post_config
+        - dest_script: "Kannada"
16 changes: 16 additions & 0 deletions scriptshifter/tables/data/khmer.yml
@@ -0,0 +1,16 @@
+general:
+  name: Khmer
+
+script_to_roman:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.s2r_post_config
+        - src_script: "Khmer"
+
+roman_to_script:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.r2s_post_config
+        - dest_script: "Khmer"
16 changes: 16 additions & 0 deletions scriptshifter/tables/data/malayalam.yml
@@ -0,0 +1,16 @@
+general:
+  name: Malayalam
+
+script_to_roman:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.s2r_post_config
+        - src_script: "Malayalam"
+
+roman_to_script:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.r2s_post_config
+        - dest_script: "Malayalam"
18 changes: 18 additions & 0 deletions scriptshifter/tables/data/marathi.yml
@@ -0,0 +1,18 @@
+general:
+  name: Marathi (Devanagari)
+
+script_to_roman:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.s2r_post_config
+        - src_script: "Devanagari"
+        - post: ["HindiMarathiRomanLoCFix"]
+
+roman_to_script:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.r2s_post_config
+        - dest_script: "Devanagari"
+        - pre: ["HindiMarathiRomanLoCFix"]
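Each hook entry in these table files is a list: a dotted function path followed by one or more single-key mappings. This commit does not show how the table loader consumes such entries, so the following is only a sketch under the assumption that all mappings after the function path are merged into one set of keyword arguments. `split_hook_entry` is a hypothetical helper, not part of scriptshifter.

```python
# Parsed form of the marathi.yml script_to_roman hook entry,
# written out as the Python structure a YAML loader would return.
hook_entry = [
    "aksharamukha.romanizer.s2r_post_config",
    {"src_script": "Devanagari"},
    {"post": ["HindiMarathiRomanLoCFix"]},
]


def split_hook_entry(entry):
    """Hypothetical helper: return (dotted function path, merged kwargs)."""
    fn_path, *kwarg_maps = entry
    kwargs = {}
    for mapping in kwarg_maps:
        # Assumption: later mappings extend (and may override) earlier ones.
        kwargs.update(mapping)
    return fn_path, kwargs


fn_path, kwargs = split_hook_entry(hook_entry)
```

Under that assumption, the Marathi entry would call `s2r_post_config` with `src_script="Devanagari"` and `post=["HindiMarathiRomanLoCFix"]`. The Roman-to-script direction passes the same fix name as `pre`, so it is applied on the Roman side in both directions.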
16 changes: 16 additions & 0 deletions scriptshifter/tables/data/oriya.yml
@@ -0,0 +1,16 @@
+general:
+  name: Oriya
+
+script_to_roman:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.s2r_post_config
+        - src_script: "Oriya"
+
+roman_to_script:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.r2s_post_config
+        - dest_script: "Oriya"
16 changes: 16 additions & 0 deletions scriptshifter/tables/data/pali.yml
@@ -0,0 +1,16 @@
+general:
+  name: Pali
+
+script_to_roman:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.s2r_post_config
+        - src_script: "Pali"
+
+roman_to_script:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.r2s_post_config
+        - dest_script: "Pali"
16 changes: 16 additions & 0 deletions scriptshifter/tables/data/panjabi.yml
@@ -0,0 +1,16 @@
+general:
+  name: Panjabi
+
+script_to_roman:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.s2r_post_config
+        - src_script: "Punjabi"
+
+roman_to_script:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.r2s_post_config
+        - dest_script: "Punjabi"
16 changes: 16 additions & 0 deletions scriptshifter/tables/data/prakrit.yml
@@ -0,0 +1,16 @@
+general:
+  name: Prakrit (Devanagari)
+
+script_to_roman:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.s2r_post_config
+        - src_script: "Devanagari"
+
+roman_to_script:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.r2s_post_config
+        - dest_script: "Devanagari"
16 changes: 16 additions & 0 deletions scriptshifter/tables/data/sanskrit.yml
@@ -0,0 +1,16 @@
+general:
+  name: Sanskrit (Devanagari)
+
+script_to_roman:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.s2r_post_config
+        - src_script: "Devanagari"
+
+roman_to_script:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.r2s_post_config
+        - dest_script: "Devanagari"
16 changes: 16 additions & 0 deletions scriptshifter/tables/data/sinhalese.yml
@@ -0,0 +1,16 @@
+general:
+  name: Sinhalese
+
+script_to_roman:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.s2r_post_config
+        - src_script: "Sinhala"
+
+roman_to_script:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.r2s_post_config
+        - dest_script: "Sinhala"
16 changes: 16 additions & 0 deletions scriptshifter/tables/data/telugu.yml
@@ -0,0 +1,16 @@
+general:
+  name: Telugu
+
+script_to_roman:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.s2r_post_config
+        - src_script: "Telugu"
+
+roman_to_script:
+  hooks:
+    post_config:
+      -
+        - aksharamukha.romanizer.r2s_post_config
+        - dest_script: "Telugu"
