
Attempted conversion of markdown shows HTML-style comments and links get broken #96

Open
wnm3 opened this issue Oct 17, 2024 · 0 comments


wnm3 commented Oct 17, 2024

I'm attaching a Markdown file that I converted by reading it line by line and calling the reshaper method (`arabic_reshaper.reshape`) on each line to switch it from LTR to RTL. Lines like the ones below get broken:

```markdown
<!-- <map "id="FPMap2" "> -->
<!-- </map "id="FPMap2" "> -->

![logos.jpg](https://www.rewity.com/forum/rewity/images/logos.jpg)
```

become:

```markdown
<!-- <map "id="FPMap2" "> --
<!-- </map "id="FPMap2" "> --

![logos.jpg](https://www.rewity.com/forum/rewity/images/logos.jpg
```

It also didn't reverse the order of the table cells (delimited by pipe characters). The image below shows the result using arabic-reshaper on the left and, on the right, the result of simply using

and

to surround the LTR Markdown content:
[Screenshot: arabic-reshaper output (left) vs. wrapped LTR Markdown content (right)]
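As far as I understand, arabic_reshaper only converts Arabic letters into their contextual (presentation) forms and does not reorder anything, so the pipe-delimited cells keep their original order. If reversing the cells is what's wanted, a small pre-processing pass over each line could do it. This is a minimal sketch, not part of arabic_reshaper: `reverse_table_row` is a hypothetical helper, and it assumes simple GitHub-style rows with a leading and trailing `|` and no escaped `\|` inside any cell.

```python
def reverse_table_row(line: str) -> str:
    """Reverse the cell order of one pipe-delimited Markdown table row.

    Hypothetical helper. Assumes the row starts and ends with "|" and that
    no cell contains an escaped pipe ("\\|"). Non-table lines are returned
    unchanged; for table rows, surrounding whitespace is dropped.
    """
    stripped = line.strip()
    if not (len(stripped) > 1 and stripped.startswith("|") and stripped.endswith("|")):
        return line  # not a table row: leave it untouched

    # Split the inner text on "|" and rejoin the cells in reverse order.
    cells = stripped[1:-1].split("|")
    return "|" + "|".join(reversed(cells)) + "|"


if __name__ == "__main__":
    table = [
        "| a | b | c |",
        "| --- | --- | --- |",
        "| 1 | 2 | 3 |",
    ]
    for row in table:
        print(reverse_table_row(row))
```

Cells are reversed as raw text, so their padding is preserved. Alignment markers such as `:---` move with their column but are not mirrored, which may or may not be what an RTL table needs.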

younes.md

Code used to create the output:

```python
import arabic_reshaper
from RAG_Data_Pipeline.utility.DPUtils import DPUtils

lines = DPUtils.loadTextFile(
    "./tests/resources/ragdatapipeline/markdowngenerator/arabic/younes.md"
)
rtl_lines = ""
for line in lines:
    # Strip only a trailing newline, if present. Slicing off the last
    # character unconditionally (line[: len(line) - 1]) would also drop a
    # real character such as ">" or ")" on a line without a newline.
    rtl_line = arabic_reshaper.reshape(line.rstrip("\n"))
    rtl_lines += rtl_line + "\n"
DPUtils.saveTextFile(
    "./tests/resources/ragdatapipeline/markdowngenerator/arabic/rtl_younes.md",
    rtl_lines,
)
```
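In every broken line above, exactly the last character (`>` or `)`) is missing, which is what a `line[: len(line) - 1]` slice does to a line that has no trailing newline. Assuming `DPUtils.loadTextFile` (a project helper whose behavior isn't shown here) returns lines with the newline already removed, that slice may be dropping the character rather than the reshaper. A minimal, library-free sketch to check this; both function names are hypothetical:

```python
def strip_by_slice(line: str) -> str:
    # Mirrors the loop in the script: always removes the final character.
    return line[: len(line) - 1]


def strip_newline(line: str) -> str:
    # Removes a trailing "\n" only when one is actually present.
    return line.rstrip("\n")


if __name__ == "__main__":
    samples = [
        '<!-- </map "id="FPMap2" "> -->',  # as it would look without "\n"
        "![logos.jpg](https://www.rewity.com/forum/rewity/images/logos.jpg)",
        "![logos.jpg](https://www.rewity.com/forum/rewity/images/logos.jpg)\n",
    ]
    for s in samples:
        print(repr(strip_by_slice(s)), "vs", repr(strip_newline(s)))
```

If the first two samples lose their final character only with `strip_by_slice`, the reshaper is likely not the cause of the truncated comments and links.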