Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PDF worker support #1071

Open
wants to merge 33 commits into
base: master
Choose a base branch
from
Open

Add PDF worker support #1071

wants to merge 33 commits into from

Conversation

mvasilak
Copy link
Contributor

@mvasilak mvasilak commented Feb 13, 2025

TODO:

  • Use recognizer with Retrieve Metadata action for standalone attachments.
  • Use recognizer for standalone files added within the app.
  • Show in UI that there is metadata retrieval in progress for affected items.

To be considered:

  • Add popup window with all metadata retrievals' progress, that will be minimized in the items bottom toolbar.
  • Allow canceling metadata retrievals.

@mvasilak mvasilak force-pushed the pdf-worker branch 2 times, most recently from 0a6f159 to 64b65a1 Compare February 19, 2025 10:59
@mvasilak mvasilak marked this pull request as ready for review February 19, 2025 11:04
@mvasilak
Copy link
Contributor Author

@michalrentka @dstillman this is ready for a first review / user testing. I did a very simple UI iteration, where items that are retrieving metadata, are indicating this by an animating subtitle, as it is empty for attachment type. If functionality is as desired, we can improve the UI as we see fit.

Copy link
Contributor

@michalrentka michalrentka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't go through everything yet, but there will probably be bigger changes, so I'll have to go through most of it again anyway.

@mvasilak mvasilak requested a review from michalrentka March 4, 2025 17:00
@mvasilak mvasilak requested a review from michalrentka March 5, 2025 13:25
Copy link
Contributor

@michalrentka michalrentka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to submit now. Although ideally I'd like to see some changes in RecognizerController, but they would require some further refactoring which might be unnecessary time kill at this point.

Anyway if I understand RecognizerController correctly, the order of events there is one big chain, which could be summed up as: enqueue pdf worker -> start remote recognition -> lookup identifiers -> process identified items (create new items).

So theoretically it could be simply written as something like:

pdfWorkerController.queue(work: PDFWorkerController.PDFWork(file: task.file, kind: .recognizer))
.filter({
    // filter out in-progress states
})
.flatMap({
    // Report errors if failed/cancelled, continue with extracted data
})
.flatMap({
    return apiClient.send(request: RecognizerRequest(parameters: data))
})
.flatMap({
    // extract identifiers
})
.flatMap({
    // This would require a new controller which would loop through all identifiers and report when they are all finished
    // It could be `Observable` and report multiple `onNext` events so that each event is handled separately if needed, but I guess it would be even better to gather all data first and then have one database write with multiple items
    identifierController.lookupIdentifiers()
})
.subscribe(
    // Process lookup results - create/report items
    // Report errors
)
.disposed(by: disposeBag)

It would be neater and much easier to understand quickly what's going on. Of course there would probably have to be some more filters/flatMaps for special cases, but I think it would still be even less verbose than it is currently, where you have to subscribe to and process each individual step.

Anyway this is not required at this point, just a thought for the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants