v2.0.0 2020-10-29
Second major release of manuscript-object with significant changes to the core object code.
- All of the central code is now in two files, with one auxiliary file (utils.py).
- update.py usage remains unchanged
- The titular "manuscript object" is now a class called Manuscript inside a file called manuscript.py. Each entry is turned into an object of the Entry class inside a file called entry.py.
Some highlights:
- much faster (update_entries() is, as before, the longest step)
- increased verbosity during generation
- manuscript and entry modules are importable and interactable
- Manuscript and Entry classes control their own behavior
- e.g. generating and updating derivatives happens inside the Manuscript class
- update.py works as before, but simply calls the update methods inside Manuscript
- this means if you want to generate the derivative output in a Python shell and interact with it as a string or table, you can do so by importing manuscript and running one of the derivative generation methods
- derivative generation takes place in 2 steps: generation and then writing. This enables checks for correctness before writing to disk
- All xml is converted to lxml.etree objects for easier and more consistent parsing
- text renditions of editorial tags are created using an XSLT stylesheet
- this stylesheet takes parameters, so if you don't want to render del tags as <-TEXT->, for example, you can just set the corresponding parameter to "false()"
- Where possible, functions are reused rather than duplicated, so that a bug only has to be found and fixed in one place, e.g., there's only one function which tells you how to convert a string to an lxml.etree Element.
- the Entry class is very flexible:
- there are different methods to take a valid lxml.etree Element, a string of well-formed XML, or a filepath to a valid XML file
- folio and identity arguments are optional
- only one version of each entry is given at a time (handling tc, tcn, and tl versions is done by the Manuscript object, not the Entry)
- if you want to test or inspect the contents of a txt or xml file, instead of manually opening the file you can simply load it as an Entry object in a Python shell and look at the text and the properties that way
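A rough sketch of how the ideas above fit together: one shared string-to-Element helper, an entry object that can be built from an Element, a string, or a filepath, and derivative output that is generated in memory and checked before it is written. Everything named here (SketchEntry, to_element, generate_derivative, write_derivative, the sample XML) is hypothetical and is not the actual manuscript-object API. The sketch also assumes the standard library's xml.etree.ElementTree as a stand-in for lxml.etree so that it runs without third-party packages.

```python
import tempfile
import xml.etree.ElementTree as ET  # assumption: stand-in for lxml.etree
from pathlib import Path


def to_element(xml_string):
    """The single place where a string becomes an Element (hypothetical helper)."""
    return ET.fromstring(xml_string)


class SketchEntry:
    """Hypothetical stand-in for Entry; not the project's actual class."""

    def __init__(self, element, folio="", identity=""):
        self.element = element
        self.folio = folio        # optional
        self.identity = identity  # optional

    @classmethod
    def from_string(cls, xml_string, **kwargs):
        # Reuse the shared helper instead of parsing here a second time.
        return cls(to_element(xml_string), **kwargs)

    @classmethod
    def from_file(cls, path, **kwargs):
        return cls.from_string(Path(path).read_text(encoding="utf-8"), **kwargs)

    @property
    def text(self):
        # Concatenate every text node in document order, dropping the tags.
        return "".join(self.element.itertext())


def generate_derivative(entries):
    """Step 1: build the output in memory so it can be inspected."""
    return "\n".join(entry.text for entry in entries)


def write_derivative(content, path):
    """Step 2: write to disk only once the content has been checked."""
    Path(path).write_text(content, encoding="utf-8")


# Made-up sample entry, for illustration only.
xml = "<div><ab>Take <m>gold</m> and <del>melt</del> grind it</ab></div>"
from_str = SketchEntry.from_string(xml, folio="001r")
from_elem = SketchEntry(to_element(xml))

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "entry.xml"
    source.write_text(xml, encoding="utf-8")
    from_path = SketchEntry.from_file(source)

    output = generate_derivative([from_str, from_elem, from_path])
    assert output.count("gold") == 3  # correctness check before writing
    target = Path(tmp) / "derivative.txt"
    write_derivative(output, target)
    written = target.read_text(encoding="utf-8")
```

Because generation returns a plain string, the same value can be inspected in a Python shell, compared against an expected result, or discarded without touching the disk.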
To do:
- implementing more automated spot- and unit-tests
- sophisticated search function for Manuscript
- making type annotations useful and correct (e.g., specificity of "xml") - see use in https://github.com/cu-mkp/manuscript-object/blob/94d158d814bf9a62071a11845a9b2938d561ab3e/entry.py#L10
- optional arguments to Manuscript specifying which entries you want to generate
- function to inspect the context around a particular term
- visualization engine
- thesaurus
see also any open issues: https://github.com/cu-mkp/manuscript-object/issues