Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix penultimate sections of multiline fragments being ignored by the Markdown renderer #201

Merged
merged 2 commits into from
Nov 26, 2023

Conversation

Lordfirespeed
Copy link
Contributor

When using the Markdown renderer, if a fragment containing newline characters is encountered, it is split by \n and each line is yielded in turn.

 for fragment in fragments:
    if "\n" in fragment.text:
        lines = fragment.text.split("\n")
        yield current_line + lines[0]
        for inner_line in lines[1:-2]:
            yield inner_line
        current_line = lines[-1]
    else:
        current_line += fragment.text

However, with a quick check in the python REPL we find:

>>> "abcde"[0] 
'a'
>>> "abcde"[1:-2]
'bc'
>>> "abcde"[-1]
'e'

Notably, d, the penultimate element, is being skipped.
This can be fixed by changing the bounds of the 'inner' slice from [1:-2] to [1:-1].

@Lordfirespeed Lordfirespeed changed the title Fix penultimate sections of multiline fragments being ignored Fix penultimate sections of multiline fragments being ignored by the Markdown renderer Nov 21, 2023
@pbodnar pbodnar added the bug label Nov 24, 2023
@coveralls
Copy link

coveralls commented Nov 24, 2023

Coverage Status

coverage: 94.113% (+0.04%) from 94.073%
when pulling 7aaad0d on Lordfirespeed:patch-1
into 70e8e8d on miyuchina:master.

@pbodnar
Copy link
Collaborator

pbodnar commented Nov 24, 2023

@Lordfirespeed, good catch, thank you. Would you mind to also add a unit test on this? I'm a little bit struggling to find a use case where a Fragment would be created with text including \n, i.e. when the problematic part of code would be called "in real life". (Or maybe @anderskaplan as the author of MarkdownRenderer will know? :))

@Lordfirespeed
Copy link
Contributor Author

Lordfirespeed commented Nov 24, 2023

My case is a bit of an odd one, apologies!

I wanted to parse LaTeX so that I could use regex inside latex blocks to replace command sequences e.g.

  • \R -> \mathbb{R}
  • \rArr -> \Rightarrow

Because I was trying to migrate my notes from KaTeX to MathJax.

I 'wrote' this very basic class:

from mistletoe.markdown_renderer import MarkdownRenderer
from mistletoe.latex_token import Math
from itertools import chain


class MathMarkdownRenderer(MarkdownRenderer):
    def __init__(self, *extras, **kwargs):
        super().__init__(
            *chain(
                (Math,),
                extras,
            ),
            **kwargs
        )

    def render_math(self, token):
        return self.render_raw_text(token)

And used it like so:

from mistletoe import Document as MarkdownDocument
from math_markdown_renderer import MathMarkdownRenderer

with MathMarkdownRenderer() as renderer:
    with open(file, "r") as infile:
        document = MarkdownDocument(infile)
    # I made changes to the parsed document here
    render = renderer.render(document)

I found that multiline LaTeX would almost always contain newline characters, e.g. the following:

$$
a=b
$$

would yield a Math token with the following content:

"$$\na=b\n$$"

So, in short, yes, I can add a unit test; but not without also adding functionality to the library 😅

@anderskaplan
Copy link
Contributor

Good catch!

The easiest way to trigger the bug in a unit test would be to modify one of the existing tests. For example:

    def test_images_and_links(self):
        input = [
            "[a link](#url (title))\n",
            "[another link](<url-in-angle-brackets> '*emphasized\n",
            "multi-line\n",
            "title*')\n",
            '![an \\[*image*\\], escapes and emphasis](#url "title")\n',
            "<http://auto.link>\n",
        ]
        output = self.roundtrip(input)
        self.assertEqual(output, "".join(input))

(The only change is that the line "multi-line" has been added. The bug will only trigger when there are more than two rows.)

@pbodnar
Copy link
Collaborator

pbodnar commented Nov 25, 2023

@anderskaplan, thanks a lot, great to have you back! :)

@Lordfirespeed, so I guess you can just add "multi-line\n", to the existing unit test as Anders suggests, and that's it. I would merge the PR afterwards.

@Lordfirespeed
Copy link
Contributor Author

Lordfirespeed commented Nov 25, 2023

@anderskaplan, thanks a lot, great to have you back! :)

@Lordfirespeed, so I guess you can just add "multi-line\n", to the existing unit test as Anders suggests, and that's it. I would merge the PR afterwards.

I'm not sure that particular example would cause the bug to happen as [0] and [-1] are both handled specifically, so the multiline fragment needs to contain at least two newlines - but I get what you mean! I'll get on that.

Edit: Oh! sorry, never mind. I was confused.

@Lordfirespeed
Copy link
Contributor Author

All done! I have not confirmed the unit test functions as intended as I don't have a codespace open at the moment.

@pbodnar pbodnar merged commit 5f4c550 into miyuchina:master Nov 26, 2023
8 checks passed
@pbodnar
Copy link
Collaborator

pbodnar commented Nov 26, 2023

Thank you!

@pbodnar pbodnar added this to the 1.3.0 milestone Nov 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants