Is it possible to support images? #51
Comments
Hi, Thanks for the kind words. I'm glad it's useful to you. Which command are you using? If you use The archive.is support doesn't seem to work anymore, and it was never officially supported, so I don't know if it would be possible to fix that. But for many Web pages (i.e. ones that don't require JavaScript to render), wget does a fine job of archiving the page and its content.
I'm using You've given me an idea though. If I can use It's hard to beat the convenience of org-web-tools though for inserting the content of an article at point!
I see. Yes, theoretically images could be downloaded to a directory, and they could be inserted into the Org content. Maybe a better way to handle that would be to have Wget download them and make the archive, then extract the archive and use Pandoc to convert it to Org content.
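To make the first idea concrete (download images to a directory and link them from the Org content), here is a minimal Python sketch of the bookkeeping only. It is not how org-web-tools works: the names `ImageCollector` and `org_image_links` and the `images` directory are hypothetical, the files are not actually fetched, and two images with the same file name would collide.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def org_image_links(html, base_url, image_dir="images"):
    """Map each absolute image URL to an Org file link for its local copy.

    Assumption: every image is saved under image_dir using the last
    component of its URL path as the file name.
    """
    collector = ImageCollector()
    collector.feed(html)
    links = {}
    for src in collector.sources:
        url = urljoin(base_url, src)  # resolve relative src values
        name = PurePosixPath(urlparse(url).path).name
        links[url] = f"[[file:{image_dir}/{name}]]"
    return links
```

Each key is a URL that would still need to be downloaded into `image_dir`; each value is the link that would replace the image in the Org text.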
I may have discovered an interesting alternative to There's a web instance, readability-bot that accepts a URL and renders the contents with Readability.js. For a test workflow:
This generates a nice, flat mirror of the page in readable mode. Note that I'm using index.html below to refer to the root HTML file I'm interested in, but I haven't found how to convince wget to output to it yet.
The org file now has embedded images as well. Though it's not quite clear how to manage all the assets just yet. Alternatively, invoke pandoc to generate an
This has the benefit of being a fully self-contained document that can be read with nov.el and marked up with org-noter. I had no idea pandoc supports epub generation. What an amazing piece of software. Anyway, this is getting too long. Any thoughts on the best way to manage images when converting to org?
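For anyone who wants to script the workflow described above, here is a rough Python sketch that only assembles the wget and pandoc command lines. The exact commands from the comment were lost, so the flags below are an assumption about what a flat mirror plus conversion could look like, and `wget_command`, `pandoc_command`, `archive`, and the `index.html` default are hypothetical names. It does not solve the open question of getting wget to write the root file to index.html.

```python
import subprocess


def wget_command(url, dest_dir):
    """Build a wget call that saves a page and its assets flat into dest_dir."""
    return [
        "wget",
        "--page-requisites",    # also fetch images, CSS and other assets
        "--convert-links",      # rewrite links to point at the local copies
        "--adjust-extension",   # add .html where the server omits it
        "--span-hosts",         # allow assets hosted on other domains
        "--no-directories",     # keep everything in one flat directory
        f"--directory-prefix={dest_dir}",
        url,
    ]


def pandoc_command(html_file, out_file):
    """Build a pandoc call; the output format follows out_file's extension."""
    return ["pandoc", "--from=html", f"--output={out_file}", html_file]


def archive(url, dest_dir, html_file="index.html"):
    """Mirror url, then convert the root HTML file to Org and EPUB.

    Assumption: the root document ends up as html_file inside dest_dir.
    """
    # wget can exit non-zero when a single asset fails, so do not abort here.
    subprocess.run(wget_command(url, dest_dir), check=False)
    for out_file in ("article.org", "article.epub"):
        # Run inside dest_dir so relative image paths resolve for pandoc.
        subprocess.run(pandoc_command(html_file, out_file), cwd=dest_dir, check=True)
```

Calling `archive(url, "mirror")` would need both tools installed and network access. Keeping the Org file in the same directory as the images is one simple answer to the asset question, since the relative links then keep working.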
Hi folks, I wanted to build an archive of a blog I often refer to (gnuplotting.org) in an org-mode format. I ended up writing this bash script:
In my case, the website I was scraping was nice and simple. For more complex pages, rdrview might be useful to further cut down on cruft. Once the files are in org-mode, I merged them all together using an Lastly, I used macros to clean up the output, such as unfilling all paragraphs of text, positioning captions correctly below figures, and renaming the source language from "prettyprint" to "gnuplot".
Howdy @alphapapa, thanks for another amazing package!
I would love to download images as well. For instance this article works just fine with eww-readable, and a couple of images are critical to understanding the context. Looking at the org-web-tools code, it appears that images are not fetched at all and therefore cannot be displayed. Pandoc support may be the other potential pitfall.
Am I on the right track, or are there other issues for supporting images that I'm not seeing?