OCR result variations: saving vs. in-memory processing #89
I'm not sure if this is the right place to ask, since it's not directly related to ocrs itself, but I'd appreciate your advice. I'm performing OCR on video files with the following workflow: extract the first frame, load it into memory, crop it, and feed it to the OCR engine. However, I've noticed that the recognition results differ when I save the cropped image to disk and load it again. I understand that the underlying image bytes differ after the round trip, but I didn't expect such a significant impact on the OCR results. Any advice would be appreciated. Here's a brief demonstration:

let first_frame_data = extract_first_frame(media_path)?;
let img = image::load_from_memory(&first_frame_data)?;
let cropped_img = crop_image(&mut img.into_rgb8(), 0.5, 0.9, 0.5, 0.1);
let img_source = ImageSource::from_bytes(cropped_img.as_raw(), cropped_img.dimensions())?;
let ocr_input = engine.prepare_input(img_source)?;
let ocr_text = engine.get_text(&ocr_input)?;

Save and load:

let first_frame_data = extract_first_frame(media_path)?;
let img = image::load_from_memory(&first_frame_data)?;
let cropped_img = crop_image(&mut img.into_rgb8(), 0.5, 0.9, 0.5, 0.1);
cropped_img.save("cropped_img_demo.jpg")?;
let img_data = image::open("cropped_img_demo.jpg")?.to_rgb8();
let img_source = ImageSource::from_bytes(img_data.as_raw(), img_data.dimensions())?;
let ocr_input = engine.prepare_input(img_source)?;
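let ocr_text = engine.get_text(&ocr_input)?;

And some helper functions, just in case you're curious:

use std::io::Read;
use std::path::PathBuf;
use std::process::{Command, Stdio};

use anyhow::Context;
use image::imageops::crop;
use image::{ImageBuffer, Rgb};

// Runs ffmpeg and returns the first video frame as PNG-encoded bytes.
pub fn extract_first_frame(video_path: PathBuf) -> anyhow::Result<Vec<u8>> {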
    let mut child = Command::new("ffmpeg")
        .args([
            "-i", video_path.to_str().unwrap(),
            "-vf", "select=eq(n\\,0)",
            "-vframes", "1",
            "-f", "image2pipe",
            "-vcodec", "png",
            "-",
        ])
        .stdout(Stdio::piped())
        .stderr(Stdio::null()) // Suppress ffmpeg output
        .spawn()?;
    let mut output = child.stdout.take().context("Failed to open FFmpeg stdout")?;
    let mut buffer = Vec::new();
    output.read_to_end(&mut buffer)?;
    child.wait()?;
    Ok(buffer)
}

pub fn crop_image(
    image: &mut ImageBuffer<Rgb<u8>, Vec<u8>>,
    x_ratio: f32,
    y_ratio: f32,
    width_ratio: f32,
    height_ratio: f32,
) -> ImageBuffer<Rgb<u8>, Vec<u8>> {
    let (width, height) = image.dimensions();
    let x = (width as f32 * x_ratio) as u32;
    let y = (height as f32 * y_ratio) as u32;
    let width = (width as f32 * width_ratio) as u32;
    let height = (height as f32 * height_ratio) as u32;
    crop(image, x, y, width, height).to_image()
}
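
As a quick sanity check on the crop arithmetic above, here is a minimal sketch that computes the same rectangle without touching any image data. `crop_rect` and `fits_within` are hypothetical helpers written for illustration only; they are not part of the original code or of the `image` crate.

```rust
/// Pixel rectangle as (x, y, width, height).
pub type CropRect = (u32, u32, u32, u32);

/// Mirrors the ratio arithmetic used in `crop_image`.
/// Hypothetical helper: it only computes coordinates, it does not crop.
pub fn crop_rect(
    (width, height): (u32, u32),
    x_ratio: f32,
    y_ratio: f32,
    width_ratio: f32,
    height_ratio: f32,
) -> CropRect {
    // `as u32` truncates toward zero, so fractional pixels are dropped.
    let x = (width as f32 * x_ratio) as u32;
    let y = (height as f32 * y_ratio) as u32;
    let w = (width as f32 * width_ratio) as u32;
    let h = (height as f32 * height_ratio) as u32;
    (x, y, w, h)
}

/// Returns true if the rectangle lies entirely inside the image.
/// The sums are done in u64 so they cannot overflow.
pub fn fits_within((width, height): (u32, u32), (x, y, w, h): CropRect) -> bool {
    u64::from(x) + u64::from(w) <= u64::from(width)
        && u64::from(y) + u64::from(h) <= u64::from(height)
}
```

Assuming a 1920x1080 frame (the real frame size is not stated in the post), the ratios `0.5, 0.9, 0.5, 0.1` select a strip 960 pixels wide and 108 pixels tall, starting at (960, 972). Text in a region that small may leave the model with little margin, which would make it more sensitive to small pixel changes.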
Replies: 1 comment 4 replies
cropped_img.save("cropped_img.jpg")?;
let img_data = image::open("cropped_img_demo.jpg")?.to_rgb8();

This looks like you're loading from a different path than the one you're saving to. Is that a typo? My advice here would be to save the images just before you call
Does it make a difference if you save as PNG rather than JPEG format? PNG is a lossless format whereas JPEG is lossy. If the model is uncertain about a particular character, so that different letters have very close probabilities, then it is possible that JPEG compression might cause the choice to flip between the different outputs.
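
To illustrate the "close probabilities" point, here is a minimal sketch that assumes a simple pick-the-highest-score decoder. This is an assumption made for illustration and may not match how ocrs actually decodes its output; `argmax` and `pick` are hypothetical helpers, and the scores in the note below are made-up numbers.

```rust
/// Index of the largest score; the first one wins on exact ties.
/// Assumes all scores are finite (no NaN handling).
pub fn argmax(scores: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &score) in scores.iter().enumerate() {
        match best {
            // Keep the current best unless the new score is strictly larger.
            Some((_, top)) if score <= top => {}
            _ => best = Some((i, score)),
        }
    }
    best.map(|(i, _)| i)
}

/// Picks the label whose score is highest, i.e. a greedy choice
/// for a single character position.
pub fn pick(labels: &[char], scores: &[f32]) -> Option<char> {
    argmax(scores).and_then(|i| labels.get(i).copied())
}
```

With labels `['O', '0', 'Q']`, scores of `[0.48, 0.47, 0.05]` select `'O'`, while a slightly perturbed `[0.46, 0.49, 0.05]` selects `'0'`. A shift of a couple of percentage points is enough to change the output when two candidates are nearly tied.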
If you can upload an example of the cropped image in both PNG and JPEG formats, I might be able to confirm whether that is what is happening.
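
In the meantime, one way to check how much the round trip changes the pixels is to compare the two raw buffers directly, for example the slices returned by `as_raw()` in the snippets above. This is a minimal sketch using only the standard library; `PixelDiff` and `pixel_diff` are hypothetical names introduced here, not part of ocrs or the `image` crate.

```rust
/// Summary of how two equally sized raw pixel buffers differ.
#[derive(Debug, PartialEq, Eq)]
pub struct PixelDiff {
    /// Number of byte positions whose values differ.
    pub differing_bytes: usize,
    /// Largest absolute difference seen at any position.
    pub max_abs_diff: u8,
}

/// Compares two raw buffers byte by byte.
/// Returns `None` if the lengths differ, since the images are then
/// not the same size and a positional comparison is meaningless.
pub fn pixel_diff(a: &[u8], b: &[u8]) -> Option<PixelDiff> {
    if a.len() != b.len() {
        return None;
    }
    let mut differing_bytes = 0;
    let mut max_abs_diff = 0u8;
    for (&x, &y) in a.iter().zip(b) {
        let d = x.abs_diff(y);
        if d != 0 {
            differing_bytes += 1;
            max_abs_diff = max_abs_diff.max(d);
        }
    }
    Some(PixelDiff { differing_bytes, max_abs_diff })
}
```

If the PNG round trip reports zero differing bytes and the JPEG round trip does not, that would support the lossy-compression explanation.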