OCR result variations: saving vs. in-memory processing #89

wsyxbcl · 2024-06-25T02:39:17Z

I'm not sure if this is the right place to ask since it's not directly related to the OCRS itself, but I'd appreciate your advice.

I'm performing OCR on video files with the following workflow: extract the first frame, load it into memory, crop it, and then feed it to the OCR engine. However, I get different recognition results if I save the cropped image to disk and load it again before running OCR. I understand that the image bytes differ under the hood, but I didn't expect that to have such a significant impact on the OCR results.

Any advice would be appreciated.

Here's a brief demonstration:
The initial workflow

let first_frame_data = extract_first_frame(media_path)?;
let img = image::load_from_memory(&first_frame_data)?;
let cropped_img = crop_image(&mut img.into_rgb8(), 0.5, 0.9, 0.5, 0.1);

let img_source = ImageSource::from_bytes(cropped_img.as_raw(), cropped_img.dimensions())?;

let ocr_input = engine.prepare_input(img_source)?;
let ocr_text = engine.get_text(&ocr_input)?;

Save and load

let first_frame_data = extract_first_frame(media_path)?;
let img = image::load_from_memory(&first_frame_data)?;
let cropped_img = crop_image(&mut img.into_rgb8(), 0.5, 0.9, 0.5, 0.1);

cropped_img.save("cropped_img_demo.jpg")?;
let img_data = image::open("cropped_img_demo.jpg")?.to_rgb8();
let img_source = ImageSource::from_bytes(img_data.as_raw(), img_data.dimensions())?;

let ocr_input = engine.prepare_input(img_source)?;
let ocr_text = engine.get_text(&ocr_input)?;

And some helper functions just in case you're curious:

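use std::io::Read;
use std::path::PathBuf;
use std::process::{Command, Stdio};

use anyhow::Context;

/// Decode the first frame of a video with ffmpeg and return it as PNG bytes.
pub fn extract_first_frame(video_path: PathBuf) -> anyhow::Result<Vec<u8>> {
    let mut child = Command::new("ffmpeg")
        .arg("-i")
        .arg(&video_path) // Pass the path as an OsStr so non-UTF-8 paths don't panic
        .args([
            "-vf", "select=eq(n\\,0)",
            "-vframes", "1",
            "-f", "image2pipe",
            "-vcodec", "png", // PNG is lossless, so the decoded frame is piped unchanged
            "-",
        ])
        .stdout(Stdio::piped())
        .stderr(Stdio::null()) // Suppress ffmpeg output
        .spawn()?;

    let mut output = child.stdout.take().context("Failed to open FFmpeg stdout")?;
    let mut buffer = Vec::new();
    output.read_to_end(&mut buffer)?;
    // Surface ffmpeg failures instead of returning an empty or partial buffer
    let status = child.wait()?;
    anyhow::ensure!(status.success(), "ffmpeg exited with {status}");
    Ok(buffer)
}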

use image::{imageops::crop, ImageBuffer, Rgb};

/// Crop a region given as fractions of the image size (top-left corner plus extent).
pub fn crop_image(
    image: &mut ImageBuffer<Rgb<u8>, Vec<u8>>,
    x_ratio: f32,
    y_ratio: f32,
    width_ratio: f32,
    height_ratio: f32,
) -> ImageBuffer<Rgb<u8>, Vec<u8>> {
    let (width, height) = image.dimensions();
    // `as u32` truncates towards zero
    let x = (width as f32 * x_ratio) as u32;
    let y = (height as f32 * y_ratio) as u32;
    let width = (width as f32 * width_ratio) as u32;
    let height = (height as f32 * height_ratio) as u32;
    crop(image, x, y, width, height).to_image()
}
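
For readers following the ratio arithmetic: with the arguments used above (0.5, 0.9, 0.5, 0.1), the crop starts at half the width and 90% of the height, and covers half the width and a tenth of the height, i.e. the right half of the bottom strip of the frame. The sketch below repeats the same truncating arithmetic without the `image` crate so it can be run on its own. The name `crop_rect` and the explicit clamping step are illustrative assumptions, not part of the original code.

```rust
/// Pixel rectangle `(x, y, width, height)` for a crop given as fractions of
/// the image size. Hypothetical helper mirroring the arithmetic in `crop_image`.
pub fn crop_rect(
    width: u32,
    height: u32,
    x_ratio: f32,
    y_ratio: f32,
    width_ratio: f32,
    height_ratio: f32,
) -> (u32, u32, u32, u32) {
    // Same truncating conversions as in `crop_image`.
    let x = (width as f32 * x_ratio) as u32;
    let y = (height as f32 * y_ratio) as u32;
    let w = (width as f32 * width_ratio) as u32;
    let h = (height as f32 * height_ratio) as u32;

    // Keep the rectangle inside the image (an added safety step in this sketch).
    let x = x.min(width);
    let y = y.min(height);
    (x, y, w.min(width - x), h.min(height - y))
}
```

For a hypothetical 1920x1080 frame, `crop_rect(1920, 1080, 0.5, 0.5, 0.5, 0.25)` gives `(960, 540, 960, 270)`.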

robertknight · 2024-06-25T08:25:21Z

robertknight
Jun 25, 2024
Maintainer

cropped_img.save("cropped_img.jpg")?;
let img_data = image::open("cropped_img_demo.jpg")?.to_rgb8();

This looks like you're loading from a different path than the one you're saving to. Is that a typo?

My advice here would be to save the images just before you call prepare_input and compare the results when you use an in-memory image vs. one you've loaded from disk. Are there any noticeable differences? If you run the saved images through the ocrs command-line tool you can also get it to visualize inputs to the recognition step via the --text-line-images flag.
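
Besides comparing the saved images by eye, one way to check whether the two inputs really differ is to compare the raw pixel buffers directly. The following is a minimal sketch using only the standard library. `DiffStats` and `diff_stats` are hypothetical names introduced for illustration, and it assumes both buffers have the same dimensions and channel layout, such as the slices returned by `as_raw()`.

```rust
/// Summary of how two equally sized raw pixel buffers differ.
#[derive(Debug, PartialEq)]
pub struct DiffStats {
    /// Number of samples (bytes) that are not identical.
    pub differing: usize,
    /// Largest absolute difference between any pair of samples.
    pub max_abs: u8,
    /// Mean absolute difference over all samples.
    pub mean_abs: f64,
}

/// Compare two raw buffers sample by sample.
/// Returns `None` if the lengths differ (different dimensions or channels).
pub fn diff_stats(a: &[u8], b: &[u8]) -> Option<DiffStats> {
    if a.len() != b.len() {
        return None;
    }
    let mut differing = 0usize;
    let mut max_abs = 0u8;
    let mut total = 0u64;
    for (&x, &y) in a.iter().zip(b) {
        let d = x.abs_diff(y);
        if d != 0 {
            differing += 1;
        }
        max_abs = max_abs.max(d);
        total += u64::from(d);
    }
    let mean_abs = if a.is_empty() {
        0.0
    } else {
        total as f64 / a.len() as f64
    };
    Some(DiffStats { differing, max_abs, mean_abs })
}
```

Called on the in-memory buffer and the buffer reloaded from disk, a non-zero `differing` count would show that the save and load round trip changed the pixels.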

4 replies

wsyxbcl Jun 25, 2024
Author

This looks like you're loading from a different path than the one you're saving to. Is that a typo?

Oops, that is a typo in the post (it is not in the source code).

to save the images just before you call prepare_input

I don't really understand, but here's a clearer version with the output:

let img_data = extract_first_frame(media_path)?;
let img = image::load_from_memory(&img_data)?;
let cropped_img = crop_image(&mut img.into_rgb8(), 0.5, 0.9, 0.5, 0.1);
cropped_img.save("cropped_img_demo.jpg")?;

let img_data_disk = image::open("cropped_img_demo.jpg")?.to_rgb8();
let img_source_disk = ImageSource::from_bytes(img_data_disk.as_raw(), img_data_disk.dimensions())?;
let img_source_memory = ImageSource::from_bytes(cropped_img.as_raw(), cropped_img.dimensions())?;

let ocr_input_disk = engine.prepare_input(img_source_disk)?;
let ocr_text_disk = engine.get_text(&ocr_input_disk)?;
let ocr_timestamp = extract_timestamp(ocr_text_disk.clone())?;
println!("OCR text from disk: {}", ocr_text_disk);

let ocr_input_memory = engine.prepare_input(img_source_memory)?;
let ocr_text_memory = engine.get_text(&ocr_input_memory)?;
println!("OCR text from memory: {}", ocr_text_memory);

Output:

OCR text from disk: 05/30/2020 00:52:18 -001C
G
1
OCR text from memory: 05/30?/2020 00:52:18 -001C
SA
1

The ocrs-cli output matches "OCR text from disk", which makes sense because both read the same image bytes.

> ocrs ./cropped_img_demo.jpg
05/30/2020 00:52:18 -001C
G
1

robertknight Jun 25, 2024
Maintainer

Does it make a difference if you save as PNG rather than JPEG format? PNG is a lossless format whereas JPEG is lossy. If the model is uncertain about a particular character, so that different letters have very close probabilities, then it is possible that JPEG compression might cause the choice to flip between the different outputs.

If you can upload an example of the cropped image in both PNG and JPEG formats I might be able to confirm if that is what is happening.
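
To illustrate the point about close probabilities, here is a toy sketch (not how ocrs decodes text internally; the scores are made-up values, not measurements). It picks the highest-scoring class from a list of scores and shows that when the top two scores are nearly tied, a small shift of the kind lossy compression could introduce is enough to change the winner, whereas a confident prediction is unaffected. `argmax` and `top2_margin` are hypothetical helper names.

```rust
/// Index of the largest score (the first one wins on exact ties).
/// Returns `None` for an empty slice.
pub fn argmax(scores: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &s) in scores.iter().enumerate() {
        match best {
            // Keep the current best unless the new score is strictly larger.
            Some((_, b)) if s <= b => {}
            _ => best = Some((i, s)),
        }
    }
    best.map(|(i, _)| i)
}

/// Gap between the best and second-best score. A small gap means the
/// choice is fragile. Returns `None` if there are fewer than two scores.
pub fn top2_margin(scores: &[f32]) -> Option<f32> {
    if scores.len() < 2 {
        return None;
    }
    let mut sorted = scores.to_vec();
    sorted.sort_by(|a, b| b.total_cmp(a)); // descending order
    Some(sorted[0] - sorted[1])
}
```

With hypothetical scores `[0.51, 0.49]` for two candidate outputs, `argmax` picks index 0. After a shift of 0.02 in each direction the scores become `[0.49, 0.51]` and it picks index 1. With `[0.98, 0.02]` the same shift does not change the result.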

Answer selected by wsyxbcl

wsyxbcl Jun 25, 2024
Author

Does it make a difference if you save as PNG rather than JPEG format?

Yes, that is exactly what I meant by "the image byte difference under the hood".

The PNG output is the same as for the in-memory image.

Interesting that this would cause the swing in the results, though. BTW, do you have any ideas on how cropping might improve performance? For instance, consider an image with "123" in the middle versus another image with "123" in the middle but with a border of background around it (both images have the same PPI). We are only concerned with the recognition accuracy of the string "123", ignoring any potential recognition errors in the background.

an example of the cropped image in both PNG and JPEG formats

Have fun playing around

robertknight Jun 25, 2024
Maintainer

Interesting that this would cause the swing in the results, though. BTW, do you have any ideas on how cropping might improve performance?

Have a look at the outputs generated by the --text-map, --text-mask and --text-line-images debug options in the ocrs CLI tool. The first two generate PNG masks showing the outputs of text detection and the last generates a lines folder containing the inputs to the line recognition process. These might give some clues about why the recognition is working better for one image than another.
