Newlines not collapsed from HTML #31
Comments
Hi! When rendered with markdown
could render as a single line: https://www.markdownguide.org/basic-syntax#line-breaks and https://github.github.com/gfm/#soft-line-breaks
It is up to the markdown parser to handle it the way it wants:
A quick test here shows that the GitHub renderer decides that it should be a hard line break: continuous
I am not sure if we are up to spec here, but it seems like we are. I'm open to all feedback on this issue!
Best,
I'm not very familiar with the spec here; maybe a switch to trigger the line break behaviour would be most fitting.
I'll look into that. On a related note, did you try the source of the generated docs? It's
Related is how headings and paragraphs are handled.

Example 1: md("<h2>Some Heading</h2>\n<p>Some text</p>", heading_style='ATX')
Expected: ## Some Heading\n\nSome text\n\n
Actual: ## Some Heading\n\n\nSome text\n\n

Example 2: md("<p>Paragraph 1</p>\n<p>Paragraph 2</p>")
Expected: Paragraph 1\n\nParagraph 2\n\n
Actual: Paragraph 1\n\n\nParagraph 2\n\n
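Until the converter itself emits the expected output, the extra blank line in the two examples above could be removed in a post-processing step. The following is only a minimal sketch of that idea, not markdownify's own behaviour: `squeeze_blank_lines` is a hypothetical helper name, and it assumes the output contains no code blocks whose blank lines must be preserved.

```python
import re


def squeeze_blank_lines(markdown: str) -> str:
    """Collapse any run of three or more newlines into exactly two.

    Hypothetical post-processing helper; it does not call markdownify
    and operates purely on the already generated Markdown string.
    """
    return re.sub(r"\n{3,}", "\n\n", markdown)


# The "Actual" strings reported in the two examples.
actual_heading = "## Some Heading\n\n\nSome text\n\n"
actual_paragraphs = "Paragraph 1\n\n\nParagraph 2\n\n"

print(repr(squeeze_blank_lines(actual_heading)))
print(repr(squeeze_blank_lines(actual_paragraphs)))
```

Because the substitution is applied blindly to the whole string, it would also shorten intentional blank-line runs inside preformatted content, so it is a workaround rather than a fix.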
I ran into similar issues and now use it like this in my local Feediverse clone:
This somewhat normalises newlines in the input before handing it to markdownify (assuming no
Update: the snippet above breaks whitespace in
@Numerlor - the reflowing enhancement from #169 will allow you to set a wrap width of
print(
    repr(
        markdownify.markdownify("""
continuous
line of
text
""", wrap=True, wrap_width=None)
    )
)
which results in this (I see the leading/trailing spaces do survive though):
Would you consider this issue fixed as a result?
After 97c78ef was merged, the newlines in the parsed HTML are no longer collapsed into normal spaces, resulting in erroneous line breaks in the output.
In 0.6.1 the above code outputs
'continuous line of text'
as it would look when rendered in a browser, while in 0.6.3 it preserves the newlines and outputs
'continuous\nline of\ntext'
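The 0.6.1 result matches how a browser treats whitespace in ordinary inline text: runs of spaces, tabs and newlines are displayed as a single space. A rough sketch of that collapsing rule is shown here for illustration only; `collapse_whitespace` is a hypothetical name, this is not markdownify's implementation, and it must not be applied to content where whitespace is significant, such as `<pre>` blocks.

```python
import re

# Whitespace characters that HTML collapses in normal inline text.
_WHITESPACE_RUN = re.compile(r"[ \t\r\n\f]+")


def collapse_whitespace(text: str) -> str:
    """Replace every run of HTML whitespace with a single space."""
    return _WHITESPACE_RUN.sub(" ", text)


print(repr(collapse_whitespace("continuous\nline of\ntext")))
# 'continuous line of text'
```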
This causes issues when the HTML is wrapped to some length or line breaks are used to separate out tags, for example