feat: Intercept incompatible (not UTF-8) text on upload #123

Open: wants to merge 1 commit into base: dev

@nboyse (Collaborator) commented Feb 27, 2025

Context

At the moment, if a user uploads a document containing text that unstructured cannot interpret (i.e. text that is not UTF-8 encoded), the upload fails. The user then flags it to us, we investigate, and we send them a fixed document. This fix should be built into the codebase so that nobody has to repair documents manually.

Changes proposed in this pull request

Amends the upload view so that, before processing, it checks whether the document is compatible with unstructured and converts it to UTF-8 if required.
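The check-then-convert step described above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the code in this PR: the helper names `is_utf8` and `ensure_utf8` and the fallback order (Windows-1252, then Latin-1) are hypothetical, and it assumes the upload is a plain-text file available as raw bytes.

```python
# Hypothetical helpers: names and fallback encodings are illustrative
# assumptions, not taken from the PR.
FALLBACK_ENCODINGS = ("cp1252", "latin-1")


def is_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def ensure_utf8(raw: bytes) -> bytes:
    """Return raw unchanged if it is already UTF-8, otherwise re-encode it as UTF-8."""
    if is_utf8(raw):
        return raw
    for encoding in FALLBACK_ENCODINGS:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            # cp1252 leaves a few byte values undefined, so try the next encoding.
            continue
        return text.encode("utf-8")
    # Not reached with the defaults: latin-1 maps every byte to a character.
    raise ValueError("could not decode document with any fallback encoding")


if __name__ == "__main__":
    legacy = "café".encode("cp1252")  # b"caf\xe9", not valid UTF-8
    print(is_utf8(legacy), ensure_utf8(legacy))
```

Latin-1 decodes any byte sequence, so it acts as a guaranteed last resort, but it can silently mis-decode text that was actually written in another legacy encoding; a charset-detection library could be swapped in if that matters. In a Django view the bytes could come from the uploaded file's `read()` method. Binary formats such as PDF or DOCX should not be passed through this byte-level conversion.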

Guidance to review

Do you agree with the approach? Are there any improvements to be made?

Relevant links

https://uktrade.atlassian.net/jira/software/projects/REDBOX/boards/558?selectedIssue=REDBOX-609

Things to check

  • I have added any new ENV vars in all deployed environments
  • I have tested any code added or changed
  • I have run integration tests
