Test (#111)
* Yiddish transliteration via submodules.

* Update checkout workflow.

* Change refs for Yiddish submodules.

* Fix WORKDIR in Dockerfile

* Do not remove yiddish module.

* Manually add yiddish submodules.

* Use git clone instead of submodule.

* Move ext checkout to github actions.

* Chinese numerals (#97)

* WIP Parse Chinese numerals.

* WIP complete number parsing.

* Complete Chinese numerals:

* Use standard table override instead of pre-config hooks.
* Add few test strings.

* Complete numerals:

* Transliterate all numeric examples correctly
* Modify hook return logic for consistency
* WIP partial spacing fix.

* Some cleanup; upgrade docker OS.

* Add dependency for uwsgi.

* Squashed commit of the following: (#98)

commit 30859a5
Author: scossu <stefano@cossu.cc>
Date:   Wed Feb 28 22:17:36 2024 -0500

    Move ext checkout to github actions.

commit 6d8da6d
Author: scossu <stefano@cossu.cc>
Date:   Wed Feb 28 21:45:01 2024 -0500

    Use git clone instead of submodule.

commit ade9da5
Author: scossu <stefano@cossu.cc>
Date:   Wed Feb 28 21:42:45 2024 -0500

    Manually add yiddish submodules.

commit 77cb9ef
Author: scossu <stefano@cossu.cc>
Date:   Wed Feb 28 21:23:37 2024 -0500

    Do not remove yiddish module.

commit e405b36
Author: scossu <stefano@cossu.cc>
Date:   Wed Feb 28 09:11:41 2024 -0500

    Fix WORKDIR in Dockerfile

commit 95445ba
Author: scossu <stefano@cossu.cc>
Date:   Wed Feb 28 09:07:50 2024 -0500

    Change refs for Yiddish submodules.

commit 208ea09
Author: scossu <stefano@cossu.cc>
Date:   Wed Feb 28 08:45:58 2024 -0500

    Update checkout workflow.

* Add debug output to /trans response.

* Split docker files and requirements.

* Add bad request debug handler.

* Adjust CI workflows.

* Fix image name typo.

* Refine triggers.

* Fix typo on test workflow trigger.

* Use JSON in POST body.

* Also use JSON in feedback request; update docs.

* Return json data in 400 debug.

* Update Aksharamukha.

* Add new set of languages; separate pre and post options in Aksharamukha. (#102)

* Add all remaining Devanagari scripts. (#107)

* Add R2S for Kurdish, Persian, Pushto, Urdu, and bidirectional Divehi.

* Add R2S for Kurdish, Persian, Pushto, Urdu, and bidirectional Divehi. (#108)

* Fix YAML syntax errors.

* P3 legacy mappings (#109)

* Add R2S for Kurdish, Persian, Pushto, Urdu, and bidirectional Divehi.

* Fix YAML syntax errors.

* Fix table section for Divehi.

* P3 legacy mappings (#110)

* Add R2S for Kurdish, Persian, Pushto, Urdu, and bidirectional Divehi.

* Fix YAML syntax errors.

* Fix table section for Divehi.

* Fix mapping for Divehi.
scossu authored Jun 10, 2024
1 parent 0f0bb3a commit 0aad22c
Showing 19 changed files with 2,501 additions and 6 deletions.
573 changes: 573 additions & 0 deletions legacy/data/DivehiThaanaRomanization.cfg

Large diffs are not rendered by default.

125 changes: 125 additions & 0 deletions legacy/data/KurdishRomanization.cfg
@@ -0,0 +1,125 @@
# version 0.9.1
# Original table by William Kopycki
# Last updated 08 July 2009

[General]
Name=Kurdish
ScriptCode=(3
Truncation=%

[RomanToScript]
FieldsIncluded=100 110 111 130 240 245 246 250 260 440 490 600 610 611 630 651 700 710 711 730 740 800 830
SubfieldsAlwaysExcluded=uvxy0123456789
OtherSubfieldsExcludedByTag=650/a 260/c 246/i
IncludeFormattingCharactersLcPattern=True

# "Authorized" names:

# Punctuation marks:
# %=U+066A ; cannot transliterate the truncation character
*=U+066D
,=U+060C
;=U+061B
?=U+061F

# Numbers (these should be Arabic-Indic digits from 0660-0669. We will use 06F0-06F9 for Persian and Urdu--WK)
0=U+0660
1=U+0661
2=U+0662
3=U+0663
4=U+0664
5=U+0665
6=U+0666
7=U+0667
8=U+0668
9=U+0669

# Vowels and vowel/consonant combinations
U+02BBE=U+0639U+0647U+200C
U+02BBe=U+0639U+0647U+200C
A=U+0626U+0627
a=U+0627
EU+0302=U+0626U+0647U+200C
eU+0302=U+06CE
E=U+0626U+0647U+200C
e=U+0647U+200C
IU+0302=U+0626U+064A

# here is the "alif maksura" which otherwise serves as the "Persian yah U+06CC which is not valid in MARC-8 character set.

# THIS NEEDS TO BE ADJUSTED FOR "i[circumflex]y" and probably "e[circumflex]y combinations to = U+0649

iU+0302U+0020=U+0649
iU+0302=U+064A
I=
i=
O=U+06C6
o=U+06C6
uU+0302=U+0648U+0648
U=U+0626U+0648
u=U+0648

# Consonants:
B=U+0628
b=U+0628
CU+0327=U+0686
cU+0327=U+0686
C=U+062C
c=U+062C
DU+0323=U+0636
dU+0323=U+0636
D=U+062F
d=U+062F
F=U+0641
f=U+0641
G=U+06AF
g=U+06AF
HU+0308=U+062D
hU+0308=U+062D
H=U+0647
h=U+0647
J=U+0698
j=U+0698
K=U+06A9
k=U+06A9
#L and l with stroke
U+0141=U+06B5
U+0142=U+06B5
L=U+0644
l=U+0644
M=U+0645
m=U+0645
N=U+0646
n=U+0646
P=U+067E
p=U+067E
Q=U+0642
q=U+0642
RU+0304=U+0695
rU+0304=U+0695
R=U+0631
r=U+0631
SU+0323=U+0635
sU+0323=U+0635
SU+0327=U+0634
sU+0327=U+0634
S=U+0633
s=U+0633
TU+0323=U+0637
tU+0323=U+0637
T=U+062A
t=U+062A
V=U+06A8
v=U+06A8
W=U+0648
w=U+0648
XU+0308=U+063A
xU+0308=U+063A
X=U+062E
x=U+062E
Y=U+064A
y=U+064A
Z=U+0632
z=U+0632

[ScriptToRoman]
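
The Kurdish table above writes each mapping as `roman=script`, mixing literal Latin letters with `U+XXXX` escapes (for example `eU+0302=U+06CE`), and lists longer keys such as `eU+0302` before the bare `e`. The following is a minimal sketch of how such lines might be decoded and applied, not the project's actual loader. It assumes that an escape is always `U+` followed by exactly four hex digits (so `U+02BBE` reads as U+02BB followed by `E`) and that the longest matching key wins. The names `decode`, `parse_table`, and `transliterate` are hypothetical, and only mapping lines are passed in (no `[General]` settings such as `Name=Kurdish`).

```python
import re

# Assumption: a code point is "U+" followed by exactly four hex digits.
_CODEPOINT = re.compile(r"U\+([0-9A-Fa-f]{4})")


def decode(token):
    """Replace every U+XXXX escape in token with the character it names."""
    return _CODEPOINT.sub(lambda m: chr(int(m.group(1), 16)), token)


def parse_table(lines):
    """Build a {roman: script} dict from 'key=value' mapping lines.

    Blank lines, '#' comments, and '[Section]' headers are skipped.
    """
    table = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key:
            continue  # not a mapping line
        table[decode(key)] = decode(value)
    return table


def transliterate(text, table):
    """Apply the table left to right, trying the longest keys first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # unmapped characters pass through unchanged
            i += 1
    return "".join(out)


# A few mapping lines copied from the Kurdish table above.
SAMPLE = """
# Vowels and vowel/consonant combinations
eU+0302=U+06CE
e=U+0647U+200C
i=
u=U+0648
# Consonants:
d=U+062F
k=U+06A9
r=U+0631
""".splitlines()

table = parse_table(SAMPLE)
print(ascii(transliterate("kurd", table)))  # '\u06a9\u0648\u0631\u062f'
```

Sorting keys by length is one way to keep `e` plus a combining circumflex from being consumed as a plain `e`; a loader that simply honours the order of lines in the file would give the same result for this table, since the longer keys are listed first.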
2 changes: 1 addition & 1 deletion legacy/data/PersianRomanization.cfg
@@ -5,13 +5,13 @@

[General]
Name=Persian
ScriptCode=(3
Truncation=%

[RomanToScript]
FieldsIncluded=100 110 111 130 240 245 246 250 260 264 440 490 600 610 611 630 651 700 710 711 730 740 800 830
SubfieldsAlwaysExcluded=uvxy0123456789
OtherSubfieldsExcludedByTag=100/e 110/e 111/j 246/i 260/c 264/c 650/a 700/e 700/i 710/e 710/i 711/i 711/j 730/i
Subfield6Code=(3
IncludeFormattingCharactersLcPattern=True

# RDA boilerplate phrases not transliterated:
2 changes: 1 addition & 1 deletion legacy/data/PushtoRomanization.cfg
@@ -5,13 +5,13 @@

[General]
Name=Pushto
ScriptCode=(3
Truncation=%

[RomanToScript]
FieldsIncluded=100 110 111 130 245 246 250 260 264 440 490 505 600 610 611 630 651 700 710 711 730 740 800 830
SubfieldsAlwaysExcluded=uvxy0123456789
OtherSubfieldsExcludedByTag=100/e 110/e 111/j 246/i 260/c 264/c 650/a 700/e 700/i 710/e 710/i 711/i 711/j 730/i
Subfield6Code=(3
IncludeFormattingCharactersLcPattern=True

# RDA boilerplate phrases not transliterated:
2 changes: 1 addition & 1 deletion legacy/data/UrduRomanization.cfg
@@ -5,13 +5,13 @@

[General]
Name=Urdu
ScriptCode=(3
Truncation=%

[RomanToScript]
FieldsIncluded=100 110 111 130 240 245 246 250 260 264 440 490 505 600 610 611 630 651 700 710 711 730 740 800 830
SubfieldsAlwaysExcluded=uvxy0123456789
OtherSubfieldsExcludedByTag=100/e 110/e 111/j 246/i 260/c 264/c 650/a 700/e 700/i 710/e 710/i 711/i 711/j 730/i
Subfield6Code=(3
IncludeFormattingCharactersLcPattern=True

# RDA boilerplate phrases not transliterated: