Add table 'rowspan' support #121
Comments
I had this issue as well, and I was able to get the desired behavior with a customization. Requires:
from io import StringIO

import pandas as pd
from bs4 import Tag
from markdownify import MarkdownConverter


class MyMarkdownConverter(MarkdownConverter):
    """A custom MarkdownConverter that handles rowspan in tables.

    Subclasses MarkdownConverter from the markdownify library and overrides
    convert_table, convert_th, convert_tr, convert_td, convert_thead, and
    convert_tbody. The <th>, <tr>, <td>, <thead>, and <tbody> handlers are
    pass-throughs that return the tag's HTML unchanged. For <table> tags, the
    table is parsed into a DataFrame, which is then rendered as Markdown.
    This handles rowspan, which markdownify does not.
    """

    def convert_table(self, el, text, convert_as_inline):
        try:
            # read_html needs an HTML parser such as lxml installed
            df = pd.read_html(StringIO(str(el)))[0]
        except Exception as e:
            print(f"Error converting table to DataFrame: {str(el)}")
            print(e)
            # df is undefined here, so fall back to the raw HTML table
            return str(el)
        # Replace NaN with empty string
        df = df.fillna("")
        # Convert DataFrame to Markdown (to_markdown requires the tabulate package)
        return df.to_markdown(index=False)

    def convert_th(self, el: Tag, text, convert_as_inline):
        """Pass-through: return the <th> tag's HTML as is."""
        return str(el)

    def convert_tr(self, el: Tag, text, convert_as_inline):
        """Pass-through: return the <tr> tag's HTML as is."""
        return str(el)

    def convert_td(self, el: Tag, text, convert_as_inline):
        """Pass-through: return the <td> tag's HTML as is."""
        return str(el)

    def convert_thead(self, el: Tag, text, convert_as_inline):
        """Pass-through: return the <thead> tag's HTML as is."""
        return str(el)

    def convert_tbody(self, el: Tag, text, convert_as_inline):
        """Pass-through: return the <tbody> tag's HTML as is."""
        return str(el)
Had a quick look at the code and it seems that there's support for 'colspan' attribute, but not 'rowspan'. Any plans to add support?
HTML example
Parsed MD table
Desired MD output
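To illustrate what rowspan support could look like, here is a minimal standard-library sketch. It is not markdownify's implementation and not the reporter's original example: the names `_CellCollector`, `expand_spans`, `table_to_markdown`, and `SAMPLE_HTML` are hypothetical. It assumes a single, well-formed, non-nested table, and it assumes that repeating the spanned cell's text in every row it covers is the desired output.

```python
from html.parser import HTMLParser

# Hypothetical sample input; not the HTML from the original report.
SAMPLE_HTML = (
    "<table>"
    "<tr><th>Group</th><th>Item</th></tr>"
    "<tr><td rowspan='2'>A</td><td>x</td></tr>"
    "<tr><td>y</td></tr>"
    "</table>"
)


class _CellCollector(HTMLParser):
    """Collect rows of (text, rowspan, colspan) tuples from one simple table."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # [text_parts, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            a = dict(attrs)
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def expand_spans(rows):
    """Turn rows of (text, rowspan, colspan) into a rectangular grid of strings."""
    grid = {}  # (row, col) -> text
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # skip slots already filled by an earlier rowspan
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


def table_to_markdown(html):
    """Render the table as Markdown, treating the first row as the header."""
    parser = _CellCollector()
    parser.feed(html)
    grid = expand_spans(parser.rows)
    if not grid:
        return ""
    header, *body = grid
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    # Note: "|" inside cell text is not escaped in this sketch.
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


if __name__ == "__main__":
    print(table_to_markdown(SAMPLE_HTML))
```

The key step is in `expand_spans`: a cell with `rowspan=2` reserves its column in the following row, so later cells in that row shift right instead of landing in the wrong column.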