Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle relative image URLs #134

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open

Conversation

kaichen
Copy link

@kaichen kaichen commented Jun 23, 2024

Add ability to process non-full image url, such as 'path/to/img.png' or '/path/to/img.png'

@AlexVonB
Copy link
Collaborator

Hey, thanks for your contribution! Any reason why the base_url gets cut into host and protocol, instead of using it as-is as prefix? Maybe the user wants to prefix their URLs with a full locator.

@chrispy-snps
Copy link
Collaborator

@kaichen - could you provide an example use case for this feature? I don't fully understand it from the pull request description.

@kaichen
Copy link
Author

kaichen commented Jan 2, 2025

could you provide an example use case for this feature? I don't fully understand it from the pull request description.

Some webpages might use relative paths for their image URLs. When using this library to download HTML and convert it to Markdown, need the full image URLs to ensure the images render correctly.

@kaichen
Copy link
Author

kaichen commented Jan 2, 2025

Hey, thanks for your contribution! Any reason why the base_url gets cut into host and protocol, instead of using it as-is as prefix? Maybe the user wants to prefix their URLs with a full locator.

just want to make sure base_url join relative correctly.

@chrispy-snps
Copy link
Collaborator

I have mixed feelings about this.

On one hand, I always appreciate a pull request contribution. And on the surface, this provides a nice convenience for this use case.

But on the other hand, Markdownify's job is to render the provided HTML to Markdown, and as the Unix mantra says, "do one thing and do it well." Modifying link content is content modification, not content rendering, which feels more like source preprocessing before Markdownify is called.

Two more random thoughts:

  • <a> links should be given similar consideration.

  • Another approach is to use a link-formatting function in process_img() and process_a():

    def format_link(link_text):
        return link_text;  # default is to use link text as-is
    

    then allow the user to override this, either by an option that takes a callback function, or by a subclassed function override.

Or maybe I am overthinking it, and this is simply a nice convenience that we should implement. :)

@AlexVonB, what are your thoughts on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants