Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Applies a workaround for parser bug dealing with inline comments. #72

Closed
wants to merge 1 commit into from

Conversation

dyoo
Copy link
Collaborator

@dyoo dyoo commented Sep 5, 2023

This is some continuation of work for #63 for implementing special behavior for HTML comments. This PR splits off inline comments from content that immediately follows them, as a workaround for pulldown-cmark/pulldown-cmark#712.

@dyoo dyoo force-pushed the parser-workaround branch 3 times, most recently from c9c53cb to 31c57de Compare September 5, 2023 17:49
This tries to split off inline comments from content that immediately
follows them.  This works around bug
pulldown-cmark/pulldown-cmark#712.
@dyoo dyoo force-pushed the parser-workaround branch from 31c57de to 49b7015 Compare September 5, 2023 17:58
@mgeisler
Copy link
Collaborator

mgeisler commented Sep 7, 2023

Hi @dyoo, thanks for the PR! We could patch this on our side, though I wonder if you've looked into fixing it upstream? I don't really know the internals of pulldown_cmark very well, but it would be super if you could spend an afternoon looking into it and perhaps improve this for everybody 🙂

I think the corner cases affected by this problem (in the skipping code) might be small enough that we don't have to work around them. I just tested with variations of the unit test:

- foo
<!-- mdbook-xgettext: skip -->
- bar
- baz

works as expected and skips only - bar. I would be okay to say that

- foo
- <!-- mdbook-xgettext: skip --> bar
- baz

just isn't supported right now. Especially because the consequences are small and easy to see: translators will see when they suddenly lose access to the rest of the list because both bar and baz are no longer extracted.

@dyoo
Copy link
Collaborator Author

dyoo commented Sep 7, 2023

I did a little more analysis of the situation, and what is happening with the parser is following the CommonMark spec, so there is technically not a bug on their end.

The problematic situation is that, by starting with an HTML comment, we're creating an HTML block. According to the specification, that block lasts till the end of the line. The content of the HTML block is passed, unescaped, into the rendered content.

So the usage of block-level HTML comments is a little imperfect, and we have to be careful to be aware of this end-of-line behavior. That does mean that if we're going to use <!-- i18n ... -> as our escape hatch, we do need to say something in usage notes to the effect that it must be the only thing on a line: otherwise, unintuitive things can happen.

I think we can close this bug. I'll extract out some of the code refactorings I did to make the code a little nicer to read and do that as a separate PR.

@dyoo dyoo closed this Sep 7, 2023
@mgeisler
Copy link
Collaborator

So the usage of block-level HTML comments is a little imperfect, and we have to be careful to be aware of this end-of-line behavior. That does mean that if we're going to use <!-- i18n ... -> as our escape hatch, we do need to say something in usage notes to the effect that it must be the only thing on a line: otherwise, unintuitive things can happen.

Thanks for investigating this!

As you say, we should explain this eventually if/when people run into problems by trying to skip things in complex situations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants