[Feature]: Switch -f to pass the path to the static HTML file #133

Open
janreges opened this issue Jan 5, 2025 · 3 comments
Labels
enhancement New feature or request

Comments

@janreges

janreges commented Jan 5, 2025

Describe the improvement

Hi @JohannesKaufmann,

I love your HTML-to-Markdown converter and, as I mentioned in my email, I want to integrate it into https://github.com/janreges/siteone-crawler.

I can handle the implementation without this feature, but it would be great if I could also pass the HTML as a path to a file on disk, e.g. via -f /path/to/file.html. SiteOne Crawler supports all platforms, and the HTML-to-Markdown conversion will run in the final crawling phase, when the static *.html files are converted.

I can pipe the HTML in with cat or echo (or type on Windows), but support for reading files from disk would be more straightforward.

If files on disk were supported, it would also make sense to add -o /path/to/output.md as an alternative to writing the Markdown to stdout.
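Until dedicated file flags exist, the stdin workaround mentioned above can be scripted for a whole directory of static pages. This is a minimal sketch, assuming html2markdown reads HTML from stdin and writes Markdown to stdout (as the cat/echo piping approach implies); the helper names md_path_for, convert_file and convert_dir are hypothetical.

```shell
#!/bin/sh
# Sketch: convert static *.html files to Markdown via stdin/stdout redirection,
# without relying on any -f/-o flags.
# Assumption: html2markdown reads HTML on stdin and prints Markdown on stdout.

# Map "path/to/page.html" to "path/to/page.md".
md_path_for() {
  printf '%s\n' "${1%.html}.md"
}

# Convert one file: feed the HTML on stdin, redirect stdout to the .md sibling.
# Note: the .md file is created even if the conversion fails.
convert_file() {
  html2markdown < "$1" > "$(md_path_for "$1")"
}

# Convert every *.html file below the given directory.
# Filenames containing newlines are not handled by this simple loop.
convert_dir() {
  find "$1" -type f -name '*.html' | while IFS= read -r html_file; do
    convert_file "$html_file" || printf 'failed: %s\n' "$html_file" >&2
  done
}
```

Usage would be something like `convert_dir ./export` for a hypothetical export directory. On Windows, the single-file equivalent is `type page.html | html2markdown > page.md`, under the same stdin assumption.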

Thank you again for this great tool 💪 From my research, its output is the highest quality of all the tools I have tried.

janreges added the enhancement (New feature or request) label on Jan 5, 2025
@JohannesKaufmann
Owner

Good point, I will add that in the next few days 👍

@janreges
Author

janreges commented Jan 7, 2025

Thank you, man!

Btw, the first integration of html-to-markdown into SiteOne Crawler is ready (see this README section about Markdown conversion), and this README has the first examples:

@JohannesKaufmann
Owner

Nice, that already looks great!

Please do report any bugs you encounter (e.g. the React code block).

We can also move some of the features over at some point (e.g. table support is already in the works).


Short update on what I am working on:

  • single files already work well ☑️
  • supporting multiple files ⏳
  • helpful error messages & test cases ⏳

# Single file
html2markdown --input file.html --output file.md

# Directory
html2markdown --input src/ --output dist/

# Glob pattern
html2markdown --input "src/*.html" --output "dist/"
