An Interpretable Deep Learning Approach for Morphological Script Type Analysis (IWCP 2024)
Malamatenia Vlachou Efstathiou, Ioannis Siglidis, Dominique Stutzmann and Mathieu Aubry
For minimal inference on pre-trained and finetuned models without a local installation, we provide a standalone Colab
notebook, also available as inference.ipynb.
A figures.ipynb notebook is provided to reproduce the paper results and graphs. You'll need to download & extract and in the base folder first.
conda create --name ltw pytorch==2.1.1 torchvision==0.15.0 cudatoolkit=11.3 -c pytorch -c conda-forge
conda activate ltw
python -m pip install -r requirements.txt
In this case you'll need to download & extract only the
python scripts/ iwcp_south_north.yaml
python scripts/ -i runs/iwcp_south_north/train/ -o runs/iwcp_south_north/finetune/ --mode g_theta --max_steps 2500 --invert_sprites --script Northern_Textualis Southern_Textualis -a datasets/iwcp_south_north/annotation.json -d datasets/iwcp_south_north/ --split train
python scripts/ -i runs/iwcp_south_north/train/ -o runs/iwcp_south_north/finetune/ --mode g_theta --max_steps 2500 --invert_sprites -a datasets/iwcp_south_north/annotation.json -d datasets/iwcp_south_north/ --split all
Create your config files:
sep: '' # How the character separator is denoted in the annotation.
space: ' ' # How the space is denoted in the annotation.
For its structure, see the config file provided for our experiment.
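To make the `sep` and `space` options concrete, here is a minimal sketch of how a label could be cut into characters. `split_label` is a hypothetical helper written for illustration, not a function of this repository, and the actual parsing may differ.

```python
from __future__ import annotations


def split_label(label: str, sep: str = "", space: str = " ") -> list[str]:
    """Split an annotation label into a list of characters (illustrative only).

    Assumptions: an empty `sep` means every character of the label is one
    symbol; otherwise symbols are delimited by `sep`. Any symbol equal to
    `space` is mapped to a plain " ".
    """
    symbols = list(label) if sep == "" else label.split(sep)
    # Drop empty pieces (e.g. from a trailing separator) and map the space token.
    return [" " if s == space else s for s in symbols if s != ""]


if __name__ == "__main__":
    print(split_label("A cat"))                          # default: sep '' and space ' '
    print(split_label("A|_|c|a|t", sep="|", space="_"))  # explicit separator and space token
```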
Create your dataset folder:
├── annotation.json
└── images
    ├── <image_id>.png
    └── ...
The annotation.json file should be a dictionary with entries of the form:
"<image_id>": {
    "split": "train",                    # {"train", "val", "test"} - "val" is ignored in the unsupervised case.
    "label": "A beautiful calico cat.",  # The text that corresponds to this line.
    "script": "Times_New_Roman"          # (optional) Corresponds to the script type of the image.
}
You can completely ignore the annotation.json file in the case of unsupervised training without evaluation.
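Before training with annotations, it can help to sanity-check the dataset folder. The sketch below is not part of the repository: `check_annotation` is a hypothetical helper that assumes the layout described above (an `annotation.json` next to an `images` folder of `<image_id>.png` files) and that every entry carries a string `label`.

```python
import json
from collections import Counter
from pathlib import Path

VALID_SPLITS = {"train", "val", "test"}


def check_annotation(dataset_dir):
    """Return (problems, split_counts, script_counts) for a dataset folder."""
    dataset_dir = Path(dataset_dir)
    with open(dataset_dir / "annotation.json", encoding="utf-8") as f:
        annotation = json.load(f)

    problems = []
    split_counts, script_counts = Counter(), Counter()
    for image_id, entry in annotation.items():
        split = entry.get("split")
        if split not in VALID_SPLITS:
            problems.append(f"{image_id}: unexpected split {split!r}")
        if not isinstance(entry.get("label"), str):
            problems.append(f"{image_id}: missing or non-string label")
        # Assumption: images are stored as images/<image_id>.png.
        if not (dataset_dir / "images" / f"{image_id}.png").is_file():
            problems.append(f"{image_id}: image file not found")
        split_counts[split] += 1
        if "script" in entry:  # "script" is optional
            script_counts[entry["script"]] += 1
    return problems, split_counts, script_counts
```

The per-script counts are also a quick way to see which script names are available for finetuning on a group of documents.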
Train and finetune
python scripts/ <CONFIG_NAME>.yaml
- On a group of documents defined by their "script" type with:
python scripts/ -i runs/<MODEL_PATH> -o <OUTPUT_PATH> --mode g_theta --max_steps <int> --invert_sprites --script '<SCRIPT_NAME>' -a <DATASET_PATH>/annotation.json -d <DATASET_PATH> --split <train or all>
- On individual documents with:
python scripts/ -i runs/<MODEL_PATH> -o <OUTPUT_PATH> --mode g_theta --max_steps <int> --invert_sprites -a <DATASET_PATH>/annotation.json -d <DATASET_PATH> --split <train or all>
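When finetuning many models, the flags above can be assembled programmatically. This is a minimal sketch using only the options shown in the commands above; `finetune_args` is a hypothetical helper, and the finetuning script itself still has to be prepended by the caller.

```python
import shlex


def finetune_args(model_path, output_path, annotation, dataset, *,
                  max_steps, split="train", scripts=None):
    """Assemble the finetuning flags as an argument list (script name not included)."""
    if split not in {"train", "all"}:
        raise ValueError("split must be 'train' or 'all'")
    args = ["-i", str(model_path), "-o", str(output_path),
            "--mode", "g_theta", "--max_steps", str(max_steps),
            "--invert_sprites"]
    if scripts:  # omit --script to finetune on individual documents
        args += ["--script", *scripts]
    args += ["-a", str(annotation), "-d", str(dataset), "--split", split]
    return args


if __name__ == "__main__":
    args = finetune_args(
        "runs/iwcp_south_north/train/",
        "runs/iwcp_south_north/finetune/",
        "datasets/iwcp_south_north/annotation.json",
        "datasets/iwcp_south_north/",
        max_steps=2500,
        scripts=["Northern_Textualis", "Southern_Textualis"],
    )
    # Print the flags as a shell-quoted string, ready to append to the script call.
    print(shlex.join(args))
```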
[!NOTE] To ensure a consistent set of characters for our analysis regardless of the annotation source, we internally use choco-mufin with a disambiguation-table.csv to normalize or exclude characters from the annotations. The current configuration suppresses allographs and edition signs (e.g., modern punctuation) for a graphetic result.
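As a rough illustration of what "normalize or exclude" means here, the sketch below applies a character table to a label. It does not use choco-mufin or its table format: `normalize` and `TABLE` are hypothetical, and the entries are examples only.

```python
def normalize(text, table):
    """Replace or drop characters according to `table`.

    `table` maps a character to its replacement; an empty replacement
    excludes the character. Characters absent from the table are kept.
    """
    return "".join(table.get(ch, ch) for ch in text)


# Hypothetical entries: merge the long s allograph into s, drop modern punctuation.
TABLE = {"ſ": "s", ",": "", ".": ""}

if __name__ == "__main__":
    print(normalize("ſic, et.", TABLE))
```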
title = {An Interpretable Deep Learning Approach for Morphological Script Type Analysis},
author = {Vlachou-Efstathiou, Malamatenia and Siglidis, Ioannis and Stutzmann, Dominique and Aubry, Mathieu},
publisher = {Document Analysis and Recognition--ICDAR 2021 Workshops: Athens, Greece, August 30--September 4, 2023, Proceedings},
year = {2024},
Check out also: Siglidis, I., Gonthier, N., Gaubil, J., Monnier, T., & Aubry, M. (2023). The Learnable Typewriter: A Generative Approach to Text Analysis.
This study was supported by the CNRS through MITI and the 80|Prime program (CrEMe Caractérisation des écritures médiévales), and by the European Research Council (ERC project DISCOVER, number 101076028). We thank Ségolène Albouy, Raphaël Baena, Sonat Baltacı, Syrine Kalleli, and Elliot Vincent for valuable feedback on the paper.