-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix html_as_text
appearing in every element metadata
#319
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -31,6 +31,7 @@ class LayoutElement(TextRegion): | |||
prob: Optional[float] = None | |||
image_path: Optional[str] = None | |||
parent: Optional[LayoutElement] = None | |||
text_as_html: Optional[str] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was complaining about the missing text_as_html
attribute when I added types to format_table_elements
method. Since we use that attribute then I think it should be included in a class. I was also wondering whether it should go inside this class or the parent TextRegion
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not following this... is this coming from your test text_as_html_attributes = [fe.text_as_html for fe in formatted_elements]
(maybe use if hasattribute(fe, "text_as_html"))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it was coming from models.chipper.UnstructuredChipperModel.format_table_elements
method after I added types to the function definition. If there is a reason for not including it in the class then it can stay as was, I will just use the hasattribute
like you proposed. I can see that it is used like that in unstructured
repo so maybe I should just stick with that? What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to hasattr
This PR fixes the issue described here: Unstructured-IO/unstructured#2463
Now
text_as_html
will only be available for elements that are HTML strings (contain HTML tags)E.g. output for non html element
E.g. output for html element