An extensible viewer for OCR-D mets.xml files

bertsky/browse-ocrd

This branch is 238 commits behind hnesk/browse-ocrd:master.


Latest commit

650adcf · Nov 5, 2020

OCR-D Browser

An extensible viewer for OCR-D mets.xml files

Screenshot

OCR-D Browser with two image views and one XML view

Installation on Ubuntu 18.04

sudo make deps-ubuntu
pip install browse-ocrd

Usage

browse-ocrd ./path/to/mets.xml # or open interactively

Features

  • Browse fileGrps and pages, arranging views next to each other for comparison
  • Show original or derived images (AlternativeImage on any level of the structural hierarchy)
  • Show multiple images at once for different pages (horizontally) or different segments (vertically), zooming freely
  • Show raw PAGE-XML with syntax highlighting, open with PageViewer
  • Show concatenated PAGE-XML text annotation
  • Show rendered HTML comparison from dinglehopper evaluations

Configuration

Configuration file locations

At startup, the following directories are searched for a config file named ocrd-browser.conf:

# directories and their default values under Ubuntu 20.04
GLib.get_system_config_dirs()  # '/etc/xdg/xdg-ubuntu/ocrd-browser.conf', '/etc/xdg/ocrd-browser.conf'
GLib.get_user_config_dir()     # '/home/jk/.config/ocrd-browser.conf'  
os.getcwd()                    # './ocrd-browser.conf'
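The lookup above can be approximated without GLib. The following is only a sketch, assuming that GLib follows the XDG Base Directory variables (`XDG_CONFIG_DIRS`, `XDG_CONFIG_HOME`) on Linux; the function name `candidate_config_paths` is hypothetical and not part of browse-ocrd, and the sketch says nothing about which file takes precedence when several exist.

```python
import os
from pathlib import Path

CONFIG_NAME = "ocrd-browser.conf"


def candidate_config_paths(environ=os.environ, cwd=None):
    """List the config file candidates in the order given above.

    Hypothetical helper: it mimics GLib's directory lookup through the
    XDG environment variables instead of calling GLib itself.
    """
    # System-wide directories: colon-separated XDG_CONFIG_DIRS, else /etc/xdg
    system_dirs = [d for d in environ.get("XDG_CONFIG_DIRS", "").split(":") if d]
    if not system_dirs:
        system_dirs = ["/etc/xdg"]
    # Per-user directory: XDG_CONFIG_HOME, else ~/.config
    user_dir = environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    # Current working directory comes last
    work_dir = Path(cwd) if cwd is not None else Path.cwd()
    return [Path(d) / CONFIG_NAME for d in [*system_dirs, user_dir, work_dir]]


if __name__ == "__main__":
    for path in candidate_config_paths():
        print(path, "(found)" if path.is_file() else "(missing)")
```

Running the script prints each candidate path and whether a file exists there, which can help when a setting does not seem to take effect.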

Configuration file syntax

The ocrd-browser.conf file is an ini-file with the following keys:

[FileGroups]
# Preferred fileGrp names for thumbnail display in the Page Browser 
# Comma-separated list of regular expressions
preferredImages = OCR-D-IMG, OCR-D-IMG.*, ORIGINAL

# Each Tool has a section header [Tool XYZ]
# At the moment the only defined tool is "PageViewer"  
[Tool PageViewer]
# (ba)sh commandline to execute with placeholders  
commandline = /usr/bin/java -jar /home/jk/bin/JPageViewer/JPageViewer.jar --resolve-dir {workspace.directory} {file.path.absolute}
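To illustrate how a file in this syntax can be read, here is a minimal sketch using Python's standard `configparser`. The helper names `load_settings` and `pick_preferred` are hypothetical, and matching each pattern against the whole fileGrp name is an assumption; the browser's own matching rules may differ.

```python
import configparser
import re

SAMPLE = """\
[FileGroups]
preferredImages = OCR-D-IMG, OCR-D-IMG.*, ORIGINAL

[Tool PageViewer]
commandline = /usr/bin/java -jar /home/jk/bin/JPageViewer/JPageViewer.jar --resolve-dir {workspace.directory} {file.path.absolute}
"""


def load_settings(text):
    """Return (compiled fileGrp patterns, {tool name: commandline})."""
    # Interpolation is disabled so the command line is kept verbatim
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    raw = parser.get("FileGroups", "preferredImages", fallback="")
    patterns = [re.compile(p.strip()) for p in raw.split(",") if p.strip()]
    # Every section named "Tool XYZ" defines one tool
    tools = {
        section[len("Tool "):]: parser.get(section, "commandline")
        for section in parser.sections()
        if section.startswith("Tool ")
    }
    return patterns, tools


def pick_preferred(file_groups, patterns):
    """Return the first fileGrp matched by the earliest pattern (assumed rule)."""
    for pattern in patterns:
        for name in file_groups:
            if pattern.fullmatch(name):
                return name
    return None


if __name__ == "__main__":
    patterns, tools = load_settings(SAMPLE)
    print(pick_preferred(["OCR-D-SEG", "OCR-D-IMG-BIN", "ORIGINAL"], patterns))
    print(tools["PageViewer"])
```

In this sketch the order of the patterns expresses priority: a fileGrp matching an earlier pattern wins over one matching a later pattern.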

The commandline string is used as a Python format string with the following keyword arguments:

  • workspace : The current ocrd.Workspace. All properties are shell-escaped automatically (with shlex.quote).
  • file : The current ocrd_models.OcrdFile. All properties are shell-escaped automatically (with shlex.quote). An additional property path provides the properties absolute and relative, so {file.path.absolute} is replaced by the shell-quoted absolute path of the file.
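The combination of format-string placeholders and automatic quoting can be sketched as follows. This is not the browser's actual implementation: the `ShellQuoted` wrapper is hypothetical, the `viewer` command is a placeholder, and `SimpleNamespace` objects stand in for the real `ocrd.Workspace` and `ocrd_models.OcrdFile`.

```python
import shlex
from types import SimpleNamespace


class ShellQuoted:
    """Hypothetical wrapper: attribute values read through it are shell-quoted."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        value = getattr(self._wrapped, name)
        # Nested namespaces (like file.path) are wrapped again so that
        # {file.path.absolute} resolves step by step
        if isinstance(value, SimpleNamespace):
            return ShellQuoted(value)
        return shlex.quote(str(value))


# Stand-ins for the real workspace and file objects (assumed attribute names
# taken from the placeholders described above)
workspace = SimpleNamespace(directory="/data/my books")
file = SimpleNamespace(
    path=SimpleNamespace(
        absolute="/data/my books/OCR-D-SEG/page 1.xml",
        relative="OCR-D-SEG/page 1.xml",
    )
)

template = "viewer --resolve-dir {workspace.directory} {file.path.absolute}"
command = template.format(workspace=ShellQuoted(workspace), file=ShellQuoted(file))

if __name__ == "__main__":
    print(command)
```

Because every value is quoted before substitution, paths containing spaces or shell metacharacters reach the tool as single arguments, so the placeholders in the config file should not be wrapped in additional quotes.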

Note: You can get PRImA's PageViewer at GitHub.
