EREGCSC-2252 Extract text from Outlook messages #1105

Merged: 17 commits, Dec 18, 2023

@cgodwin1 commented Dec 12, 2023

Resolves #2252

Description

This PR implements text extraction for msg, eml, and zip files.

This pull request changes:

  • Text is extracted from msg and eml files
  • Attachments within emails are extracted, as long as the attachments are supported file types
  • Zip files are extracted, including subdirectories, as long as the files they contain are supported file types
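The zip behavior described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this PR: the `SUPPORTED` set and the function names `extract_text` and `extract_zip` are hypothetical, and plain-text decoding stands in for whatever per-type extractors the project actually uses.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical list of supported extensions; the project's real list is not shown in this PR.
SUPPORTED = {".txt"}


def extract_text(name: str, data: bytes) -> str:
    """Return text for a single file, or an empty string for unsupported types."""
    suffix = PurePosixPath(name).suffix.lower()
    if suffix in SUPPORTED:
        # Stand-in for a real per-type extractor.
        return data.decode("utf-8", errors="replace")
    return ""


def extract_zip(data: bytes) -> str:
    """Extract text from every supported file in a zip, including subdirectories."""
    parts = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # infolist() includes entries in subdirectories; directory entries are skipped.
        for info in archive.infolist():
            if info.is_dir():
                continue
            text = extract_text(info.filename, archive.read(info))
            if text:
                parts.append(text)
    return "\n".join(parts)
```

Because zip entries carry their full path (for example `sub/b.txt`), walking `infolist()` covers subdirectories without any explicit recursion.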

Steps to manually verify this change:

  1. Ensure unit tests pass
  2. Upload an eml file with an attachment and verify that its text is extracted
  3. Upload a msg file with an attachment and verify that its text is extracted
  4. Upload a zip file with subdirectories and verify that its text is extracted
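For step 2, a test eml file with an attachment can be generated with Python's standard `email` package. The sketch below is only an illustration under assumptions: `build_sample` and `extract_eml` are hypothetical names, the addresses are placeholders, and the extraction shown handles only plain-text parts rather than the full set of supported types.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_sample() -> bytes:
    """Build a small eml message with a plain-text body and one text attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Test"
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg.set_content("Hello body")
    msg.add_attachment("Attached notes", subtype="plain", filename="notes.txt")
    return msg.as_bytes()


def extract_eml(raw: bytes) -> list[tuple[str, str]]:
    """Return (label, text) pairs for the body and each plain-text attachment."""
    msg = message_from_bytes(raw, policy=policy.default)
    results = []
    body = msg.get_body(preferencelist=("plain",))
    if body is not None:
        results.append(("body", body.get_content().strip()))
    for part in msg.iter_attachments():
        name = part.get_filename() or "unnamed"
        # Only plain text is handled here; other types would need their own extractor.
        if part.get_content_type() == "text/plain":
            results.append((name, part.get_content().strip()))
    return results
```

Writing `build_sample()` to a file with an `.eml` extension gives a fixture that can be uploaded by hand or reused in a unit test.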

✨ See the Django Site in action

@cgodwin1 marked this pull request as ready for review December 14, 2023 21:46
@peggles2 added the "Needs Review" label (This PR needs a code review) Dec 15, 2023
@thwalker6 left a comment

lgtm

@thwalker6 added the "Approved" label and removed the "Needs Review" label Dec 15, 2023
@cgodwin1 merged commit 18db8b0 into main Dec 18, 2023
19 checks passed
peggles2 pushed a commit that referenced this pull request Jan 3, 2024
* Move strip characters to main func

* Add zip extraction

* Linting

* Add eml support

* EREGCSC-2267 sanitize file names

* linter fix

* remove unnecessary comment

* Fix eml recursive payload extractor

* Linting

* Add logging maybe

* Trying to get deployed text extractor to work

* Remove opt

* Remove opt again

* Update README

---------

Co-authored-by: Thomas Walker <[email protected]>