Is there a more efficient way to scrape and download all stories from a forum than my current multi-step process?
from MindfulMaverick@piefed.zip to programming@programming.dev on 26 Feb 23:50
https://piefed.zip/c/programming/p/1151238/is-there-a-more-efficient-way-to-scrape-and-download-all-stories-from-a-forum-than-my-curr

To download all stories from a specific forum, I currently page through the forum index, extract the thread links, filter them with Python, and then feed the resulting links to fichub-cli. The workflow is detailed in this post: https://piefed.zip/post/1151173 . I’m looking for a more streamlined, ideally single-command solution that can crawl the forum, extract the thread links, and download them automatically. Any suggestions?

#programming

bleistift2@sopuli.xyz on 27 Feb 00:14

If you ask nicely, the admin might just give you a database dump.

tal@lemmy.today on 27 Feb 01:28
  • Start with the comprehensive link collection from Cyb3rNexus’s GitHub Gist – it already contains hundreds of pre-filtered thread links!
  • For more recent stories, navigate to NSFW Creative Writing

If your interest is in bulk-downloading erotic stories and you don’t specifically care about that forum (which I assume is the case if you just want to dump the entire thing), say because you’re looking for a training corpus to fine-tune an LLM to generate similar material, I suspect there are considerably more substantial archives out there than “hundreds”.

checks

It looks like ftp.asstr.org is still running an anonymous-access public FTP server. They’ll have years of archives from the relevant text erotica Usenet groups. You won’t need to screen-scrape that; just use any client that can recursively download from an FTP server.
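For the FTP route no scraping is needed. As a minimal sketch, Python's standard `ftplib` can mirror a directory tree recursively. This assumes the server supports the `MLSD` listing command (not every FTP server does), and the `main()` defaults (starting at `/`, saving into a local folder named `asstr_mirror`) are illustrative guesses, not paths taken from the archive.

```python
import ftplib
import os
import posixpath


def mirror(ftp, remote_dir, local_dir):
    """Recursively copy remote_dir into local_dir using MLSD listings."""
    os.makedirs(local_dir, exist_ok=True)
    for name, facts in ftp.mlsd(remote_dir, facts=["type"]):
        kind = facts.get("type")
        remote_path = posixpath.join(remote_dir, name)
        local_path = os.path.join(local_dir, name)
        if kind == "dir":
            mirror(ftp, remote_path, local_path)
        elif kind == "file":
            if os.path.exists(local_path):
                continue  # finished on a previous run
            part_path = local_path + ".part"
            with open(part_path, "wb") as fh:
                ftp.retrbinary(f"RETR {remote_path}", fh.write)
            os.replace(part_path, local_path)  # only complete files get the real name
        # "cdir"/"pdir" entries (the "." and ".." listings) and links are skipped


def main(host="ftp.asstr.org", remote_dir="/", local_dir="asstr_mirror"):
    """Log in anonymously and mirror remote_dir (defaults are illustrative)."""
    with ftplib.FTP(host, timeout=60) as ftp:
        ftp.login()  # ftplib logs in as "anonymous" when no user is given
        mirror(ftp, remote_dir, local_dir)
```

Because files are written under a temporary `.part` name and renamed only once complete, an interrupted run can simply be restarted and will skip what it already finished.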

MindfulMaverick@piefed.zip on 27 Feb 07:30

I’m trying to download all fics from that specific forum. Sorry I wasn’t clear.