Function to get rid of invalid html #19
Comments
I don't remember offhand. But my recommendation for sanitizing HTML is to parse and render with html-conduit. In practice, that comment is probably out of date, since HTML5 does standardize HTML parsing rules. But rendering something more idiomatic is probably a good move.
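For anyone wanting to try the parse-and-render suggestion, here is a minimal sketch, assuming the `html-conduit` and `xml-conduit` packages are available. `roundTrip` is a hypothetical helper name, not part of either library, and the sample input is made up. It parses leniently with `Text.HTML.DOM.parseLBS` and serializes the resulting tree with `Text.XML.renderText`, so every element that was opened ends up closed.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.Text.Lazy          as TL
import qualified Data.Text.Lazy.Encoding as TLE
import qualified Data.Text.Lazy.IO       as TLIO
import qualified Text.HTML.DOM           as HTML  -- from html-conduit
import qualified Text.XML                as XML   -- from xml-conduit

-- | Hypothetical helper (not a library function): parse possibly
-- malformed HTML into a document tree, then render that tree back
-- to text. Unclosed tags are closed as a side effect of building
-- and serializing the tree.
roundTrip :: TL.Text -> TL.Text
roundTrip = XML.renderText XML.def . HTML.parseLBS . TLE.encodeUtf8

main :: IO ()
main =
  -- Made-up input with an unclosed <p> and <b>.
  TLIO.putStrLn (roundTrip "<div><p>unclosed <b>bold</div>")
```

Note that `Text.XML.renderText` emits XML serialization, so the output may carry an XML declaration and XML-style empty elements. This is only a well-formedness pass; it would still need to be combined with the sanitizer itself, and it does not follow the HTML5 parsing algorithm.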
It does not appear that I actually found some valid html5 that becomes invalid when round-tripped through Specifically anything that contains something like Yeah given all of HTML5's standardized error correction we may just chance it with A fully html5-compliant parser would definitely be ideal though.
Seems like just about all Haskell html5 parsers have the same
Made a separate issue #20 to address html5 parsing issues
I am looking into rendering some user-submitted html, so unsurprisingly I'm planning on using this library to sanitize it.
However:
This makes me pretty nervous. Preventing invalid HTML from breaking the rest of the page is pretty essential.
How difficult would it be to add functionality that gets rid of or fixes invalid html to avoid this problem? Or at least the invalid html that causes problems in practice.
Are there any known examples of html that is problematic even after sanitization? I was hoping that a parent div with some overflow: hidden would be sufficient.