In our HTML, block elements are indented:
<html>
  <body>
    <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit,
    sed do eiusmod tempor incididunt ut labore et dolore magna
    aliqua. Ut enim ad minim veniam, quis nostrud exercitation
    ullamco laboris nisi ut aliquip ex ea commodo consequat.
    </p>
  </body>
</html>
When HTML with indented block elements is converted, the indent causes incorrect formatting in the output.
Converting this indented <p> element:
from markdownify import markdownify as md
print(repr(md("""\
    <p>This is
    some text.</p>
""")))
produces this:
' This is\n some text.\n\n\n'
 ^       ^^^
It happens for non-<p> elements too. Converting these indented <h1> elements with the UNDERLINED and ATX heading formats:
As a workaround, we iterate through all text object descendants in all text-containing block elements (<p>, <entry>, <li>, etc.) and convert newlines to spaces, but this is expensive on large document sets.
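A minimal sketch of this kind of preprocessing pass is shown below. It is not the code used in the report: it assumes the input is well-formed XML/XHTML so that the standard-library `xml.etree.ElementTree` can parse it, the `BLOCK_TAGS` set and the `collapse_newlines` helper are hypothetical names chosen for illustration, and it also swallows the indentation next to each newline rather than replacing the newline character alone.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical set of text-containing block elements; extend as needed.
BLOCK_TAGS = {"p", "entry", "li", "h1", "h2", "h3"}

# A newline together with any indentation on either side of it.
_NEWLINE_RUN = re.compile(r"[ \t]*\r?\n[ \t]*")


def collapse_newlines(root):
    """Replace each newline (and adjacent indentation) with one space in
    every text node inside the elements listed in BLOCK_TAGS."""
    for block in root.iter():
        if block.tag not in BLOCK_TAGS:
            continue
        for node in block.iter():
            if node.text:
                node.text = _NEWLINE_RUN.sub(" ", node.text)
            # A tail is part of the block's content only for descendants,
            # not for the block element itself.
            if node is not block and node.tail:
                node.tail = _NEWLINE_RUN.sub(" ", node.tail)
    return root


doc = ET.fromstring(
    "<body>\n"
    "    <p>This is\n"
    "    <b>some</b>\n"
    "    text.</p>\n"
    "</body>"
)
collapse_newlines(doc)
cleaned = ET.tostring(doc, encoding="unicode")
print(cleaned)
```

The cleaned markup can then be passed to the converter. Every text node in every block element is visited, which is consistent with the cost concern on large document sets.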
chrispy-snps changed the title from "Indent in <p> causes indent in Markdown output" to "Indent before HTML block elements causes indent in Markdown output" on Nov 26, 2023.
Possibly related to #31.