Skip to content

Latest commit

 

History

History
27 lines (18 loc) · 834 Bytes

README.md

File metadata and controls

27 lines (18 loc) · 834 Bytes

Syllable Encoder

Decodes syllables inside a word. Applies vocabulary limit to low-frequency syllables. Can learn sylable frequencies from scratch.

This syllable encoder currently supports Turkish only.

Usage

$ pip install git+https://github.com/ftkurt/python-syllable.git@master
from syllable import Encoder


encoder = Encoder(lang="tr", limitby="vocabulary", limit=3000)  # params chosen for demonstration purposes

example = "İki kürkü yırtık kel kör kirpi dadanmış."
print(encoder.tokenize(example))
# i ki kür kü yır tık kel kör kir pi da dan mış
print(next(encoder.transform([example])))
# [10, 11, 713, 161, 859, 347, 349, 1081, 639, 384, 4, 49, 156]
print(next(encoder.inverse_transform(encoder.transform([example]))))
# i ki kür kü yır tık kel kör kir pi da dan mış