Is there a more efficient way to scrape and download all stories from a forum than my current multi-step process?
from MindfulMaverick@piefed.zip to programming@programming.dev on 26 Feb 23:50
https://piefed.zip/c/programming/p/1151238/is-there-a-more-efficient-way-to-scrape-and-download-all-stories-from-a-forum-than-my-curr
My current workflow paginates through the forum, extracts thread links, and filters them with a Python script before feeding the links to fichub-cli to download every story. The workflow is detailed in this post: https://piefed.zip/post/1151173. I'm looking for a more streamlined, ideally one-command solution that could crawl the forum, extract the thread links, and download them automatically. Any suggestions?
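For context, here is roughly what I'm trying to collapse into one step. This is only a sketch: the pagination scheme (`/index.php?page=N`), the thread-URL marker (`/threads/`), and the base URL are all placeholders that would need adjusting to the actual forum. Stdlib-only, so it runs without extra dependencies:

```python
# Sketch: crawl forum index pages, extract thread links, write them to a
# file that fichub-cli can then consume. URL patterns are assumptions.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ThreadLinkParser(HTMLParser):
    """Collect <a href> values whose path looks like a thread URL."""

    def __init__(self, base_url, thread_marker="/threads/"):
        super().__init__()
        self.base_url = base_url
        self.thread_marker = thread_marker  # assumed thread-URL pattern
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.thread_marker in href:
            self.links.append(urljoin(self.base_url, href))


def extract_thread_links(html, base_url, thread_marker="/threads/"):
    """Return absolute thread URLs found in one page of HTML, deduped."""
    parser = ThreadLinkParser(base_url, thread_marker)
    parser.feed(html)
    return list(dict.fromkeys(parser.links))  # dedupe, keep order


def crawl(base_url, pages, out_path="links.txt"):
    """Fetch each index page and write all thread links to out_path."""
    all_links = []
    for page in range(1, pages + 1):
        # Assumed pagination scheme -- adjust for the real forum.
        with urlopen(f"{base_url}/index.php?page={page}") as resp:
            html = resp.read().decode("utf-8", "replace")
        all_links += extract_thread_links(html, base_url)
    with open(out_path, "w") as f:
        f.write("\n".join(dict.fromkeys(all_links)))
```

The resulting `links.txt` then goes to fichub-cli as before (check `fichub_cli --help` for the exact batch-input flag), so the whole thing is one script plus one download command rather than several manual steps.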
#programming
If you ask nicely, the admin might just give you a database dump.
If your interest is in bulk download of erotic stories and you don’t specifically care about that forum (which I assume is the case, if you just want to dump the entire thing) — like, you’re looking for a training corpus to fine-tune an LLM to generate material along those lines or something in that neighborhood — I suspect that there are considerably-more-substantial archives than “hundreds”.
*checks*
It looks like ftp.asstr.org is still running an anonymous-access public FTP server. They’ll have years of archives from the relevant text erotica Usenet groups. You won’t need to screen-scrape that; just use any client that can recursively download from an FTP server.
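Assuming wget is available (any recursive-capable FTP client works, e.g. lftp), the whole archive can be mirrored with a single command:

```shell
# Recursively mirror the anonymous-access FTP archive.
#   -m  : mirror mode (infinite-depth recursion plus timestamping,
#         so re-running only fetches new/changed files)
#   -np : never ascend to the parent directory
# Append a subdirectory to the URL to grab just one section.
wget -m -np ftp://ftp.asstr.org/
```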
I’m trying to download all fics from that specific forum. Sorry I wasn’t clear.