All operations depending on tokenizing should be eliminated #136

neelsmith · 2019-08-30T11:27:27Z

They should be cleanly separated from the OHCO2 functionality, since tokenization depends on specifying an orthographic system.

OHCO2 should focus exclusively on citation-based operations.
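As a rough illustration of what "citation-based operations" could mean, here is a minimal Python sketch of a corpus that retrieves passages purely by CTS URN, with no notion of words or tokens. The names `CitableNode`, `Corpus`, `retrieve`, and `contains` are hypothetical and are not the OHCO2 API. The sketch assumes a simplification: the text-level part of the URN must match exactly, so version-less queries are not handled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CitableNode:
    urn: str   # CTS URN of a citable passage, e.g. "urn:cts:greekLit:tlg0012.tlg001.msA:1.1"
    text: str


def split_urn(urn: str) -> Tuple[str, str]:
    """Split a CTS URN into (everything before the passage, passage reference)."""
    base, psg = urn.rsplit(":", 1)
    return base, psg


def contains(ref: str, psg: str) -> bool:
    """True if psg equals ref or sits below it in the citation hierarchy ("1" contains "1.2", not "10.1")."""
    return psg == ref or psg.startswith(ref + ".")


class Corpus:
    """Hypothetical citation-only corpus: every operation works on URNs, never on tokens."""

    def __init__(self, nodes: List[CitableNode]):
        self.nodes = list(nodes)

    def retrieve(self, urn: str) -> List[CitableNode]:
        """Return nodes whose text-level URN matches exactly and whose passage falls under the requested reference."""
        base, ref = split_urn(urn)
        return [
            n for n in self.nodes
            if split_urn(n.urn)[0] == base and contains(ref, split_urn(n.urn)[1])
        ]
```

Nothing here depends on an orthography, which is the point: splitting text into words would be a separate concern layered on top.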

Eumaeus · 2019-11-02T04:02:03Z

Agree!

For Iliads, the NGram function is of limited use: in our current code it works only within a single poetic line, and it does not work at all with tokenized editions. I've had my CS students tokenize their texts and write new Ngram functions that operate on citable tokens, producing a range URN for each n-gram. This approach works better.
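The token-based approach can be sketched roughly as follows. This is a minimal Python sketch, not the students' code or any cite library API. It assumes a tokenized edition in which each token is cited by a hypothetical exemplar-level URN such as `urn:cts:greekLit:tlg0012.tlg001.msA.tokens:1.1.3`; the names `CitableToken`, `range_urn`, and `ngrams` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CitableToken:
    urn: str   # hypothetical token-level CTS URN, e.g. "...tlg001.msA.tokens:1.1.3"
    text: str


def split_urn(urn: str) -> Tuple[str, str]:
    """Split a CTS URN into (everything before the passage, passage reference)."""
    base, psg = urn.rsplit(":", 1)
    return base, psg


def range_urn(first: CitableToken, last: CitableToken) -> str:
    """Build a CTS range URN running from the first token to the last (same version assumed)."""
    base, start = split_urn(first.urn)
    _, end = split_urn(last.urn)
    return f"{base}:{start}-{end}"


def ngrams(tokens: List[CitableToken], n: int) -> List[Tuple[str, str]]:
    """Return (n-gram text, range URN) for every run of n consecutive tokens.

    The token list covers the whole edition, so n-grams may cross poetic lines.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    grams = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        text = " ".join(t.text for t in window)
        grams.append((text, range_urn(window[0], window[-1])))
    return grams
```

For example, with tokens cited `1.1.1` through `1.1.5` (the five words of Iliad 1.1) followed by `1.2.1`, `ngrams(tokens, 3)` yields four trigrams; the last one spans the line break and is cited as `...:1.1.4-1.2.1`, which a line-bound n-gram function cannot produce.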

neelsmith · 2019-11-04T18:36:39Z

I've defined tokenizing as part of an orthography trait and implemented it in a couple of concrete orthographies.
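The idea of attaching tokenizing to an orthography rather than to the corpus can be sketched in Python with an abstract base class standing in for the trait. This is only an illustration: `Orthography`, `words`, `tokenize`, `SimpleGreekOrthography`, and the `.tokens` exemplar naming are hypothetical and are not the names used in the actual implementations.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CitableNode:
    urn: str   # CTS URN of a citable passage or token
    text: str


class Orthography(ABC):
    """Hypothetical orthography interface: tokenizing belongs here, not in the OHCO2 corpus model."""

    @abstractmethod
    def words(self, text: str) -> List[str]:
        """Split a passage's text into word tokens according to this orthographic system."""

    def tokenize(self, node: CitableNode, exemplar: str = "tokens") -> List[CitableNode]:
        """Return one citable node per token, citing each token by extending the passage reference."""
        base, psg = node.urn.rsplit(":", 1)
        return [
            CitableNode(f"{base}.{exemplar}:{psg}.{i}", word)
            for i, word in enumerate(self.words(node.text), start=1)
        ]


class SimpleGreekOrthography(Orthography):
    """Toy orthography: splits on whitespace and strips a few punctuation marks from word edges."""

    PUNCTUATION = ",.;:\u0387\u00b7"

    def words(self, text: str) -> List[str]:
        stripped = (w.strip(self.PUNCTUATION) for w in text.split())
        return [w for w in stripped if w]
```

Because `tokenize` lives on the orthography, a corpus never needs to know how words are split; a different orthography could return different tokens for the same passage without any change to citation-based code.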

(Sent by email in reply to Christopher W. Blackwell's comment of Sat, Nov 2, 2019, 12:02 AM.)

neelsmith added the enhancement and question labels Aug 30, 2019

neelsmith added this to the analytical corpus milestone Jun 23, 2020
