All operations depending on tokenizing should be eliminated #136

Open
neelsmith opened this issue Aug 30, 2019 · 2 comments

Comments

@neelsmith
Contributor

They should be cleanly separated from the OHCO2 functionality, since tokenization depends on specifying an orthographic system.

OHCO2 should focus exclusively on citation-based operations.

@Eumaeus
Contributor

Eumaeus commented Nov 2, 2019

Agree!

For Iliads, the NGram function is limited: in our current code it works only within poetic lines, and it does not work at all with tokenized editions. I've had my CS students tokenize their texts and write new NGram functions that use citable tokens, producing a range-URN for each n-gram. That approach works better.
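A minimal sketch of the citable-token approach described above, written in Python for illustration only. It is not the students' code and not part of the OHCO2 library: the names `CitableToken`, `range_urn`, and `ngrams`, and the `example.tokens` version identifier in the sample URNs, are all hypothetical. It assumes each token already carries its own CTS URN and that the tokens are in document order, so n-grams can cross poetic line boundaries.

```python
from typing import List, NamedTuple, Tuple


class CitableToken(NamedTuple):
    """One token with its own CTS URN (hypothetical structure)."""
    urn: str
    text: str


def range_urn(first: str, last: str) -> str:
    """Combine two token URNs from the same text into one range URN."""
    # Split off the passage component, which follows the final colon.
    first_work, first_passage = first.rsplit(":", 1)
    last_work, last_passage = last.rsplit(":", 1)
    if first_work != last_work:
        raise ValueError("tokens must belong to the same text")
    if first_passage == last_passage:
        return first  # a single token needs no range
    return f"{first_work}:{first_passage}-{last_passage}"


def ngrams(tokens: List[CitableToken], n: int) -> List[Tuple[str, str]]:
    """Return (range URN, text) pairs for every run of n consecutive tokens."""
    if n < 1:
        raise ValueError("n must be at least 1")
    result = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        urn = range_urn(window[0].urn, window[-1].urn)
        result.append((urn, " ".join(t.text for t in window)))
    return result


# Hypothetical sample: the last two tokens of Iliad 1.1 and the first of 1.2.
BASE = "urn:cts:greekLit:tlg0012.tlg001.example.tokens"
sample = [
    CitableToken(f"{BASE}:1.1.4", "Πηληϊάδεω"),
    CitableToken(f"{BASE}:1.1.5", "Ἀχιλῆος"),
    CitableToken(f"{BASE}:1.2.1", "οὐλομένην"),
]

if __name__ == "__main__":
    for urn, text in ngrams(sample, 2):
        print(urn, text)
```

Because the function works only on the sequence of tokens and their URNs, it needs no knowledge of the orthographic system that produced them, which is the separation this issue asks for.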

@neelsmith
Contributor Author

neelsmith commented Nov 4, 2019 via email

@neelsmith neelsmith added this to the analytical corpus milestone Jun 23, 2020