This repository has been archived by the owner on Oct 24, 2024. It is now read-only.

🐛 Fix Tesseract for ingests
This commit updates both IIIF Print and Derivative Rodeo to fix
errors that caused the CreateDerivativesJob to fail and never reach the
Tesseract stage. It also updates the logic in the FileSetIndexer to
ensure we don't get a nil value for `all_text_tsimv`.

Ref:
  - https://github.com/scientist-softserv/adventist-dl/issues/695
  - notch8/iiif_print#309
  - notch8/derivative_rodeo#74
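
To illustrate the indexer half of this change, here is a minimal sketch of why the location of the guard clauses matters. Every name in it (`FakeFileSet`, `BaseIndexer`, `GuardInHelper`, `GuardUpFront`) is a hypothetical stand-in, not part of Hyrax or this repository, and the `pdftotext` call is replaced by a placeholder string. The Flipflop check and the `original_file` lookup from the real decorator are left out for brevity.

```ruby
# Hypothetical stand-in for a file set; not a Hyrax class.
FakeFileSet = Struct.new(:pdf, :content) do
  def pdf?
    pdf
  end
end

# Hypothetical stand-in for the indexer being decorated.
class BaseIndexer
  attr_reader :object

  def initialize(object)
    @object = object
  end

  def generate_solr_document
    { 'id' => 'abc123' }
  end
end

# Guards inside the helper: for a non-PDF, pdf_text returns nil,
# but the assignment still runs, so both keys end up set to nil.
module GuardInHelper
  def generate_solr_document
    super.tap do |solr_doc|
      solr_doc['all_text_timv'] = solr_doc['all_text_tsimv'] = pdf_text
    end
  end

  private

  def pdf_text
    return unless object.pdf?
    return unless object.content.is_a? String

    "extracted text (#{object.content.bytesize} bytes)" # placeholder for pdftotext
  end
end

# Guards up front: for a non-PDF, the parent's document is returned
# untouched and the text keys are never written.
module GuardUpFront
  def generate_solr_document
    return super unless object.pdf?
    return super unless object.content.is_a? String

    super.tap do |solr_doc|
      solr_doc['all_text_timv'] = solr_doc['all_text_tsimv'] = pdf_text
    end
  end

  private

  def pdf_text
    "extracted text (#{object.content.bytesize} bytes)" # placeholder for pdftotext
  end
end

class OldIndexer < BaseIndexer
  prepend GuardInHelper
end

class NewIndexer < BaseIndexer
  prepend GuardUpFront
end
```

Calling `OldIndexer.new(FakeFileSet.new(false, nil)).generate_solr_document` yields a hash that contains `all_text_tsimv` with a nil value, while the `NewIndexer` equivalent returns the parent's hash without that key. Under these assumptions, that is the nil value the commit message refers to.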
kirkkwang committed Dec 7, 2023
1 parent 1014beb commit 0685be9
Showing 2 changed files with 5 additions and 6 deletions.
6 changes: 3 additions & 3 deletions Gemfile.lock
@@ -7,10 +7,10 @@ GIT

 GIT
   remote: https://github.com/scientist-softserv/derivative_rodeo.git
-  revision: e860b62effa0d29f74515fec4056eb5acb009d69
+  revision: f8e2173fc907a2f24db37479679a4a84c840e00c
   branch: main
   specs:
-    derivative-rodeo (0.5.2)
+    derivative-rodeo (0.5.3)
       activesupport (>= 5)
       aws-sdk-s3
       aws-sdk-sqs
@@ -31,7 +31,7 @@ GIT

 GIT
   remote: https://github.com/scientist-softserv/iiif_print.git
-  revision: 9f4b13098ec843f0f8c51c4bdfd0cdcc417c526d
+  revision: e476998ab453afabf1bcb8afa059b4416af9b705
   branch: main
   specs:
     iiif_print (1.0.0)
5 changes: 2 additions & 3 deletions app/indexers/hyrax/file_set_indexer_decorator.rb
@@ -6,6 +6,8 @@ module Hyrax
   module FileSetIndexerDecorator
     def generate_solr_document
       return super unless Flipflop.default_pdf_viewer?
+      return super unless object.pdf?
+      return super unless object.original_file&.content.is_a? String

       super.tap do |solr_doc|
         solr_doc['all_text_timv'] = solr_doc['all_text_tsimv'] = pdf_text
@@ -15,9 +17,6 @@ def generate_solr_document
     private

     def pdf_text
-      return unless object.pdf?
-      return unless object.original_file&.content.is_a? String
-
       text = IO.popen(['pdftotext', '-', '-'], 'r+b') do |pdftotext|
         pdftotext.write(object.original_file.content)
         pdftotext.close_write
