@npmccallum
The Isocrates appears incorrect to me (sections were not encoded) but this was very early conversion work and will be revisited as part of the workflow.
For the Plutarch, smaller sections (unless there are errors lurking here) would require imposing another level of CTS structure on the texts. That would be an arbitrary imposition on the standard structure. (I also do not believe another layer is possible for works that already have 3 levels.)
Depending on the type of work being done, the plain text versions of the texts might be a better option. I know others have done post-processing text chunking as needed using those versions.
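As a rough illustration of that kind of post-processing, here is a minimal sketch of chunking a plain-text section so that each piece stays under a size budget. The function name `chunk_text`, the `max_words` parameter, and the use of a word count as a stand-in for an LLM token count are all assumptions made for this example; none of it comes from the Perseus tooling.

```python
import re


def chunk_text(text, max_words=200):
    """Split text into chunks of at most max_words whitespace-separated words.

    Breaks fall at sentence ends where possible. Word count is only a
    rough proxy for model tokens (an assumption of this sketch).
    """
    # Split after sentence-final punctuation, including the Greek
    # question mark (;) and raised dot (·), when followed by whitespace.
    sentences = re.split(r"(?<=[.;·!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue
        # A sentence longer than the budget is cut on word boundaries.
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current, count = [], 0
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk if this sentence would overflow the current one.
        if current and count + len(words) > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.extend(words)
        count += len(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Chunks produced this way have no CTS identity of their own, so anyone who needs citable references would have to record which section each chunk came from.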
Because CTS referencing permits designating any span of text, so that smaller portions or subsets of a work can be referenced, we haven't been adding more containers top-down as a general practice, except for some more obscure works here and there.
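To make the span idea concrete, here is a small sketch that builds a CTS URN for a passage range by appending a `start-end` passage component to a work-level URN. The helper name `cts_range` and the URN in the usage note are placeholders chosen for illustration, not references checked against the repository. The passage component of a CTS URN can also carry `@` subreferences for spans smaller than a citable unit, which this sketch does not handle.

```python
def cts_range(work_urn, start, end=None):
    """Return a CTS URN citing one passage or a start-end range.

    work_urn is expected to look like
    'urn:cts:<namespace>:<textgroup>.<work>.<version>' with no passage part.
    """
    if not work_urn.startswith("urn:cts:") or work_urn.count(":") != 3:
        raise ValueError("expected a work-level CTS URN without a passage")
    # A range is written as the two passage references joined by a hyphen.
    passage = start if end is None else f"{start}-{end}"
    return f"{work_urn}:{passage}"
```

For example, `cts_range("urn:cts:greekLit:tlg0007.tlg001.perseus-grc2", "1.1", "1.3")` would yield a URN ending in `:1.1-1.3`, where the work URN is only an illustrative placeholder.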
This is really beyond my role and would be something others should decide.
Some sections of the text are too long to be held in the context window of LLMs. Here's a list of the sections (the encodings of Plutarch are particularly bad offenders):
The worst, by far, is this one:
http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A2008.01.0301%3Asection%3D22
Would it be possible to break up these sections into smaller sizes?