How can I do blog to epub on Manjaro Linux or Firefox?
from CoderSupreme@programming.dev to programming@programming.dev on 21 Nov 17:01
https://programming.dev/post/41118638
I’m trying to convert a blog into an EPUB and keep running into issues with existing tools.
I first tried blog2epub, but it fails during parsing with:
lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: meta line 10 and head, line 17, column 8
I then tried WebToEpub on Firefox, providing:
- Content selector: .article-content
- Chapter title selector: .title
It generated an EPUB, but the file wouldn’t open in any reader.
What I’m looking for is a tool where I can point to a blog’s base URL, define CSS selectors for the article title and body, and have it automatically fetch all entries and create one chapter per post. Or something similar.
Does anyone know of a reliable tool, script, or workflow that does this well on Linux?
#programming
I recently learned about abogen and audiblez, and what I actually want to do is blog to audiobook, but I’m still stuck on getting the book out of the blog.
I’m now thinking maybe c/linux would have been a better place to ask since I’m not trying to program anything. Let me know if I should move it there.
No, but could you feed the website with the mismatched tags through something like tidy first? That error looks like the parser is expecting XHTML and getting HTML. Maybe the site declares one and then uses the other. Lots of software won’t care because it’s a pretty common error, but some of it panics.
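To illustrate the diagnosis, here is a minimal sketch using only the Python standard library. The page content and the helper names (`PAGE`, `parses_as_xml`, `TitleGrabber`, `html_title`) are made up for the example and are not taken from blog2epub. It shows an unclosed `<meta>` tag, which is fine in HTML, being rejected by a strict XML parser while a lenient HTML parser reads the same page without complaint.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical page: <meta> is left unclosed, which is valid HTML but not XML.
PAGE = """<html>
<head>
<meta charset="utf-8">
<title>Example post</title>
</head>
<body><h1 class="title">Example post</h1></body>
</html>"""


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TitleGrabber(HTMLParser):
    """Lenient HTML parser that collects the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def html_title(markup: str) -> str:
    """Extract the <title> text with the lenient parser."""
    grabber = TitleGrabber()
    grabber.feed(markup)
    grabber.close()
    return grabber.title.strip()
```

If the blog behaves like this sample, running the pages through tidy (or any HTML-aware parser) before handing them to an XML-based tool is probably the right direction.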
HTML 5 as it is actually used in production is only partially convertible (the conversion is lossy). You need to get hands-on with it. *
But one way around it: get a Markdown editor that can convert content copied and pasted from the web (I know of Typora; it fetches, and optionally saves, images too) and then run the result through pandoc.
* div#main, a.h1, div with naked text, I’ve seen things…
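The pandoc step could be scripted roughly as below. This is only a sketch under assumptions: each post has already been saved as its own Markdown file starting with a level-1 heading, and the file names, title, and helper names (`pandoc_epub_command`, `build_epub`) are hypothetical. Pandoc concatenates multiple input files and, by default, starts a new EPUB chapter at each level-1 heading, so this should give one chapter per post.

```python
import shutil
import subprocess


def pandoc_epub_command(markdown_files, output, title):
    """Build the pandoc argument list; one input file per blog post."""
    return [
        "pandoc",
        *[str(f) for f in markdown_files],
        "--metadata", f"title={title}",  # book title shown by readers
        "--toc",                         # include a table of contents
        "-o", str(output),               # .epub extension selects EPUB output
    ]


def build_epub(markdown_files, output, title):
    """Run pandoc if it is installed; return True if it was run."""
    if shutil.which("pandoc") is None:
        return False  # pandoc not found on PATH
    subprocess.run(pandoc_epub_command(markdown_files, output, title), check=True)
    return True
```

For example, `build_epub(["post-01.md", "post-02.md"], "blog.epub", "My Blog")` would run pandoc over the two (hypothetical) files in order.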