Add HTML parsing features #11
base: main
Changes from 21 commits
If we need to support both full HTML docs and HTML fragments, then this method should: … `$html` is a full DOM page, then …
I think that in PHP partial HTML is more common than a full document, unless you are implementing this as some kind of middleware that parses the whole HTML response.
But in general it should support both if possible.
Addressed this by using the more general HTML parser, then adding a step that checks whether the input HTML is a full page/document and, if so, selects the `body` from it. Since I was already doing the replacement on an HTML fragment (the body), supporting fragments as input was fairly simple.
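A minimal sketch of the "parse generally, then pick the body" approach, assuming libxml2's usual behaviour of adding implied `<html>`/`<body>` elements around a bare fragment. The helper names `parseHtmlInput()` and `innerHtml()` and the regex used to detect a full page are hypothetical illustrations, not the code in this PR, and charset handling is deliberately left out.

```php
<?php
// Hypothetical helpers (not the PR's code): accept either a full HTML page or a
// fragment, and hand back the <body> element plus a flag saying which it was.

/**
 * @return array{0: DOMDocument, 1: DOMElement, 2: bool} [document, body, isFullDocument]
 */
function parseHtmlInput(string $html): array
{
    // Assumed heuristic: an <html> or <body> start tag means a full page.
    $isFullDocument = preg_match('/<(?:html|body)[\s>]/i', $html) === 1;

    $doc = new DOMDocument();
    // Collect libxml warnings (e.g. about HTML5 tags) instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // libxml2's HTML parser adds implied <html>/<body> around a fragment,
    // so the same <body> lookup below works for both kinds of input.
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if (!$body instanceof DOMElement) {
        throw new RuntimeException('No <body> element found in input.');
    }
    return [$doc, $body, $isFullDocument];
}

/** Serialize the children of $body, i.e. the content without the <body> wrapper. */
function innerHtml(DOMDocument $doc, DOMElement $body): string
{
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

[$doc, $body, $isFull] = parseHtmlInput('<p>Hello <b>world</b></p>');
echo $isFull ? "full document\n" : "fragment\n";
echo innerHtml($doc, $body), "\n";
```

The flag lets the caller decide whether to return the whole page (middleware-style use) or only the transformed fragment.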
This is the "incorrect" way to set the charset to UTF-8.
Unfortunately, `DOMDocument` will not respect it, and emoji will be converted to HTML entities.
I think this can be fixed before merge: we should just wrap all fragments in our own HTML page that uses the correct charset meta tag. This ensures they are encoded correctly, and we can then capture the fragment from our "template".
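A rough sketch of the wrapping idea, assuming the `http-equiv` Content-Type meta tag is the declaration `DOMDocument` (libxml2's HTML parser) honours. The template, the `FRAGMENT_WRAPPER_ID` marker and `parseFragmentUtf8()` are hypothetical names; exact serialization can differ between libxml2 versions.

```php
<?php
// Hypothetical sketch: wrap a fragment in our own UTF-8 page, parse it, then
// serialize only the children of a marker element to get the fragment back.

const FRAGMENT_WRAPPER_ID = 'fragment-root'; // hypothetical marker id

function parseFragmentUtf8(string $fragment): string
{
    // Declare UTF-8 in <head> before any content so the parser decodes correctly.
    $template = '<!DOCTYPE html><html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body><div id="' . FRAGMENT_WRAPPER_ID . '">'
        . $fragment
        . '</div></body></html>';

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep HTML5-tag warnings quiet
    $doc->loadHTML($template);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Locate our wrapper again with XPath.
    $xpath = new DOMXPath($doc);
    $wrapper = $xpath->query('//div[@id="' . FRAGMENT_WRAPPER_ID . '"]')->item(0);
    if ($wrapper === null) {
        throw new RuntimeException('Fragment wrapper not found after parsing.');
    }

    // Serialize only the wrapper's children, i.e. the original fragment.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo parseFragmentUtf8("<p>Emoji: \u{1F600}</p>"), "\n";
```

Serializing node by node with `saveHTML($child)` keeps the template's `<html>`, `<head>` and wrapper `<div>` out of the result.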
This is the "correct" meta tag to set the charset to UTF-8.
It allows `DOMDocument` to preserve the raw emoji in the document.
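The thread does not show the two tags being compared, so this sketch assumes the "incorrect" one is the HTML5 `<meta charset="utf-8">` form and the "correct" one is the older `http-equiv` Content-Type form. The helper `roundTripWithMeta()` is hypothetical. It parses the same page with each tag and serializes the whole document with `saveHTML()`, whose output encoding libxml2 derives from the page's meta tags; results may vary across libxml2 versions.

```php
<?php
// Hypothetical comparison: round-trip one page through DOMDocument with two
// different charset declarations and check whether the raw emoji survives.

function roundTripWithMeta(string $metaTag): string
{
    $page = '<!DOCTYPE html><html><head>' . $metaTag . '</head>'
        . "<body><p>Smile \u{1F600}</p></body></html>";

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($page);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Whole-document serialization: libxml2 picks the output encoding from
    // the meta tags it recognizes; otherwise non-ASCII may be entity-encoded.
    return $doc->saveHTML();
}

$withCharsetAttr = roundTripWithMeta('<meta charset="utf-8">');
$withHttpEquiv = roundTripWithMeta(
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
);

// Per the review thread, the first output is expected to contain an HTML
// entity instead of the raw emoji; the exact output depends on libxml2.
var_dump(str_contains($withCharsetAttr, "\u{1F600}"));
var_dump(str_contains($withHttpEquiv, "\u{1F600}"));
```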