OCR Training Data Collection and Annotation Tutorial

This guide walks you through collecting OCR training data and annotating it with the "Training Dojo" tool.

Collecting OCR Training Data

To start collecting OCR training data, click the button in the top right corner of the main interface.
The system will begin capturing data for OCR training.

Managing OCR Training Data

Once you've collected some data, go to the File menu.
Click on "OCR Training Data Settings" to access data management options.

In the settings window, you'll find:
- The folder where data is stored
- A button to open this folder directly
- A button to save the collected data as a ZIP file
- An option to set the maximum size of saved data in MB
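The application enforces the maximum size itself, and this guide does not describe how. To illustrate what a cap in MB usually means, the sketch below assumes an oldest-first policy: it deletes files by oldest modification time until the folder fits under the limit. The function name `prune_to_max_size`, the oldest-first rule, and the restriction to top-level files are assumptions. They are not the tool's documented behavior.

```python
from pathlib import Path


def prune_to_max_size(folder: str, max_size_mb: float) -> list[str]:
    """Delete the oldest top-level files in `folder` until the total size is <= max_size_mb.

    Hypothetical helper: the real application's pruning policy is not documented.
    Returns the names of the removed files, oldest first.
    """
    max_bytes = int(max_size_mb * 1024 * 1024)
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    # Assumed policy: oldest modification time is removed first.
    files.sort(key=lambda p: p.stat().st_mtime)
    total = sum(p.stat().st_size for p in files)
    removed = []
    for p in files:
        if total <= max_bytes:
            break
        total -= p.stat().st_size  # read the size before deleting the file
        p.unlink()
        removed.append(p.name)
    return removed
```

With this policy, a larger cap keeps more history, and lowering it discards the oldest captures first.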

Using the Training Dojo Tool

In the OCR Training Data Settings, click the button to open the Training Dojo tool.
The tool will display all collected files for annotation.

Interface Overview

Left side: List of all files
- Files with approved annotations have a checkmark
Right side: Current image and annotation interface
- Line edit field for entering or approving annotations
- Approved annotations are highlighted in green

Annotation Process

For each image, enter the correct text in the line edit field.

Press Enter to approve the annotation and move to the next image.
Use Up/Down arrow keys to navigate through the list.
Use Ctrl+Down and Ctrl+Up to jump to the next or previous unapproved image.
Click the filter button to show only unapproved images.

Keyboard Shortcuts

Enter: Approve annotation and move to next image
Up/Down Arrows: Navigate through the list
Ctrl+Down: Jump to next unapproved image
Ctrl+Up: Jump to previous unapproved image
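The navigation rules above can be modeled in a few lines. This is a hypothetical sketch, not the Training Dojo source. The `Item` class, the helper names, and the choice not to wrap around at either end of the list are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    filename: str
    text: str = ""
    approved: bool = False


def next_unapproved(items: list[Item], current: int, step: int = 1) -> Optional[int]:
    """Ctrl+Down (step=1) or Ctrl+Up (step=-1): index of the nearest unapproved item.

    Returns None when no unapproved item exists in that direction (no wrap-around assumed).
    """
    i = current + step
    while 0 <= i < len(items):
        if not items[i].approved:
            return i
        i += step
    return None


def approve_and_advance(items: list[Item], current: int, text: str) -> int:
    """Enter: store the text, mark it approved, and move to the next image (stays on the last one)."""
    items[current].text = text
    items[current].approved = True
    return min(current + 1, len(items) - 1)


def unapproved_only(items: list[Item]) -> list[Item]:
    """Filter button: keep only the images that still need approval."""
    return [it for it in items if not it.approved]
```

Because Enter always advances by one, Ctrl+Down is the faster way to skip over runs of images that are already approved.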

Exporting Annotated Data

Once you've finished annotating, exit the Training Dojo tool.
In the OCR Training Data Settings:
- Click the button to save data as a ZIP file
- Choose a location to save the file
Send the exported ZIP file to our team at support@scoresight.live for processing. We will follow up with a new OCR model.
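Before emailing the archive, you may want to confirm that it opens cleanly and contains what you expect. The helper below is a hypothetical convenience script and is not part of the tool. It uses Python's standard `zipfile` module to verify each member's CRC with `ZipFile.testzip()`, then counts the files by extension. It makes no assumption about which file types the export contains.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_zip(path: str) -> Counter:
    """Count the files in an exported ZIP by extension, after checking every member's CRC.

    ZipFile.testzip() returns the name of the first corrupt member, or None if all are intact.
    """
    with zipfile.ZipFile(path) as zf:
        bad = zf.testzip()
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
        return Counter(
            PurePosixPath(name).suffix.lower() or "(none)"
            for name in zf.namelist()
            if not name.endswith("/")  # skip directory entries
        )
```

For example, `print(summarize_zip("ocr_training_data.zip"))` prints the per-extension counts. The file name here is only a placeholder for wherever you saved the export.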

Best Practices

Annotate regularly to maintain a manageable workload.
Use keyboard shortcuts for faster navigation and annotation.
Double-check your annotations before approving.
Use the filter option to focus on unapproved images when nearing completion.
Set a reasonable maximum size for saved data so the data folder and the exported ZIP file stay manageable.

By following this process, you'll contribute valuable training data to improve the OCR system's accuracy and performance. Thank you for your efforts in enhancing our OCR capabilities!
