please make get_body_text more robust #1601

Open
josch opened this issue Apr 21, 2022 · 1 comment
josch commented Apr 21, 2022

Hi,

especially when encountering malformed spam email, alot keeps quitting on me with tracebacks like this:

  File "/usr/share/alot/alot/widgets/search.py", line 187, in <genexpr>
    lastcontent = ' '.join(m.get_body_text() for m in msgs)
  File "/usr/share/alot/alot/db/message.py", line 287, in get_body_text
    return extract_body_part(self.get_mime_part())
  File "/usr/share/alot/alot/db/utils.py", line 497, in extract_body_part
    rendered_payload = render_part(
  File "/usr/share/alot/alot/db/utils.py", line 345, in render_part
    raw_payload = remove_cte(part)
  File "/usr/share/alot/alot/db/utils.py", line 440, in remove_cte
    bp = base64.b64decode(payload)
  File "/usr/lib/python3.9/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding

or

  File "/usr/share/alot/alot/widgets/search.py", line 187, in <genexpr>
    lastcontent = ' '.join(m.get_body_text() for m in msgs)
  File "/usr/share/alot/alot/db/message.py", line 287, in get_body_text
    return extract_body_part(self.get_mime_part())
  File "/usr/share/alot/alot/db/utils.py", line 497, in extract_body_part
    rendered_payload = render_part(
  File "/usr/share/alot/alot/db/utils.py", line 345, in render_part
    raw_payload = remove_cte(part)
  File "/usr/share/alot/alot/db/utils.py", line 436, in remove_cte
    bp = quopri.decodestring(payload.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 8114-8123: ordinal not in range(128)
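
Both failures can be reproduced outside alot. The following is a minimal sketch using only the standard library calls that appear in the tracebacks; `try_base64` and `try_quopri` are hypothetical helper names for illustration, not part of alot:

```python
import base64
import binascii
import quopri


def try_base64(payload):
    """Strictly decode base64 text; return the bytes or the exception raised."""
    try:
        return base64.b64decode(payload)
    except binascii.Error as e:
        # Raised, for example, when the trailing '=' padding is missing.
        return e


def try_quopri(payload):
    """Decode quoted-printable text the way the traceback shows; return bytes or the exception."""
    try:
        return quopri.decodestring(payload.encode('ascii'))
    except UnicodeEncodeError as e:
        # Raised when the supposedly 7-bit payload contains non-ASCII characters.
        return e


print(repr(try_base64("YWJ")))        # truncated input: padding is missing
print(repr(try_quopri("caf\u00e9")))  # raw non-ASCII character in the payload
```

Well-formed input such as `"YWJj"` or `"caf=C3=A9"` decodes normally, so only malformed messages trigger the crash.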

I'm currently running alot with the following patch:

--- a/alot/db/message.py	2022-04-21 14:03:34.085067550 +0200
+++ b/alot/db/message.py	2022-04-21 12:17:26.415798127 +0200
@@ -284,7 +284,10 @@
 
     def get_body_text(self):
         """ returns bodystring extracted from this mail """
-        return extract_body_part(self.get_mime_part())
+        try:
+            return extract_body_part(self.get_mime_part())
+        except:
+            return "ERROR"
 
     def matches(self, querystring):
         """tests if this messages is in the resultset for `querystring`"""

This replaces the message body with ERROR, which is fine because those messages are spam anyway, and at least alot doesn't quit. If a message makes alot quit, it's quite time-consuming to find the one spam message that tripped it up. With this patch such messages can be quickly identified and marked as spam. Certainly something more descriptive than ERROR should be returned, maybe even a traceback that helps identify the problem?
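
A more descriptive fallback could look roughly like the sketch below. It is not a proposed patch: `safe_body_text` is a hypothetical wrapper, and the placeholder wording is made up. It catches `Exception` rather than using a bare `except:` so that `KeyboardInterrupt` and `SystemExit` still propagate, and it puts the exception type and message into the returned text.

```python
import traceback


def safe_body_text(extract, *args):
    """Call extract(*args); on failure return a placeholder naming the error.

    `extract` stands in for whatever produces the body text, e.g. a call
    like extract_body_part(self.get_mime_part()) in alot (assumption).
    """
    try:
        return extract(*args)
    except Exception as e:  # deliberately not a bare except
        # Last line of the formatted exception, e.g. "SomeError: message".
        reason = traceback.format_exception_only(type(e), e)[-1].strip()
        return "[body could not be rendered: {}]".format(reason)
```

With this, a message that fails to decode would show the error type and message in place of its body, which makes the offending mail easy to spot in the search view.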

@kbingham

I think I've hit the same issue here too - and probably have something equally bad as a temporary fix.

Even if it's spam, though, it would be better to prepare and display as much of the text as possible, so I think it needs something more, perhaps within extract_body_part() (or get_mime_part()?).
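
As a rough sketch of what "as much as possible" might mean for the two failures above, the decoding step could fall back to lenient variants. `lenient_base64` and `lenient_quopri` are hypothetical names, and where they would be called from (presumably around remove_cte()) is an assumption, not something checked against the alot source:

```python
import base64
import quopri
import re


def lenient_base64(payload):
    """Decode base64 text, tolerating stray characters and missing padding."""
    data = payload.encode('ascii', errors='ignore')
    # Keep only characters from the base64 alphabet (this also drops '=').
    data = re.sub(rb'[^A-Za-z0-9+/]', b'', data)
    # A single leftover character cannot encode a full byte; drop it.
    if len(data) % 4 == 1:
        data = data[:-1]
    # Re-add the padding needed to reach a multiple of four.
    data += b'=' * (-len(data) % 4)
    return base64.b64decode(data)


def lenient_quopri(payload):
    """Decode quoted-printable text that may contain raw non-ASCII characters."""
    # Encode as UTF-8 instead of ASCII; unencodable code points become '?'.
    return quopri.decodestring(payload.encode('utf-8', errors='replace'))
```

Stripping `=` everywhere is a simplification: if a payload contains several concatenated base64 segments, the result after the first segment may be garbage, but the message would still be displayed instead of crashing alot.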
