Glossary
Glossary of terms from the domain of image processing/OCR as used within the OCR-D framework
A block is a polygon inside a page.
The semantics or function of a block, such as heading, page number, column, or print space.
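As an illustration, a block could be modelled as a polygon plus an optional label for its function. The following is only a minimal sketch: the names `Block`, `points`, `block_type` and `bounding_box` are hypothetical and not part of OCR-D, and integer pixel coordinates are an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[int, int]  # (x, y) in pixels; an assumption for this sketch


@dataclass
class Block:
    """Hypothetical model of a block: a polygon inside a page."""
    points: List[Point]               # polygon outline, in drawing order
    block_type: Optional[str] = None  # e.g. "heading", "page number", "column"

    def bounding_box(self) -> Tuple[int, int, int, int]:
        """Return (x_min, y_min, x_max, y_max) enclosing the polygon."""
        xs = [x for x, _ in self.points]
        ys = [y for _, y in self.points]
        return min(xs), min(ys), max(xs), max(ys)


# A rectangular heading block near the top of a page.
heading = Block(
    points=[(100, 50), (900, 50), (900, 120), (100, 120)],
    block_type="heading",
)
```

Keeping the polygon and the label as separate fields mirrors the glossary, which defines the shape of a block and its function as two distinct terms.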
TODO
See Glyph
See TextLine
See Region
See Glyph
A TextLine is a block of text without line breaks.
A word is a sequence of glyphs not containing any word-bounding whitespace.
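The relation between a TextLine and its words can be sketched in a few lines of Python. This is an illustration only: the function name `split_words` is hypothetical, and treating every whitespace character as word-bounding is an assumption that the glossary does not confirm.

```python
from typing import List


def split_words(text_line: str) -> List[str]:
    """Split the text of one line into words at runs of whitespace."""
    # str.split() without arguments splits on any run of whitespace
    # and never returns empty strings.
    return text_line.split()


words = split_words("Glossary of  terms")
```

Here `words` holds three entries, because the double space counts as a single boundary. A real implementation would also need to decide which whitespace characters are word-bounding, for example whether a no-break space separates words.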
TODO
TODO
TODO
TODO
TODO
TODO
The software repository contains all document analysis algorithms developed during the project, including tests. It will also contain the documentation and the installation instructions for deploying a document analysis workflow.
The research data repository contains the results of all activities during document analysis. At a minimum, it contains the end results for every processed document and their full provenance. The research data repository must be available locally.
The model repository contains all trained (OCR) models for document analysis. The model repository must be available locally. Ideally, a publicly available model repository will be developed.