/tech/ - Technology and Computing

Technology, computing, and related topics (like anime)

Robi's MultiScraper tool seems promising but broken (at least for me) Anonymous 03/11/2021 (Thu) 00:59:35 No.3233
Not sure if this is the place to post this, but perhaps someone can give me some pointers. I attempted to use the MultiScraper tool that Robi wrote to do a board migration; however, the script breaks and can only migrate one thread (files are not downloaded, etc.). Link to the project: https://gitgud.io/rb/MultiScraper I've attached some screenshots to show the errors that I'm getting. For reference, I'm using Python 3.8 (Ubuntu 20.04) and have followed the exact instructions on the repo (this is a LynxChan to LynxChan migration). One question that is not answered in the repo is how the target board should be established. In other words, should I make a blank board with the target board name first, or just run the script and let it create the board by default?
Open file (20.07 KB 773x308 ClipboardImage.png)
>>3233 Update: I removed inc from inc.download_file(f.path) on line 307 in lynxchan.py. This seemed to allow files to be downloaded properly, and one whole thread downloaded before I got this error again: "AttributeError: 'NoneType' object has no attribute 'cssselect'". Or check the screenshot.
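For anyone hitting the same thing: that AttributeError means .cssselect() was called on None, i.e. an earlier lookup came back empty and the code carried on as if it had found an element. Below is a minimal sketch of a guard, assuming that is what is happening here. safe_cssselect, FakeElement, and the ".message" selector are made-up names for illustration, not part of MultiScraper or lxml; FakeElement only stands in for a parsed HTML element so the snippet runs without extra packages.

```python
from typing import Any, List, Optional


def safe_cssselect(node: Optional[Any], selector: str) -> List[Any]:
    """Return node.cssselect(selector), or an empty list when node is None.

    Hypothetical helper: it skips the lookup instead of raising
    AttributeError when an earlier step found nothing.
    """
    if node is None:
        return []
    return node.cssselect(selector)


class FakeElement:
    """Stand-in for a parsed HTML element (illustration only)."""

    def __init__(self, matches):
        # Maps a selector string to the list of things it "matches".
        self._matches = matches

    def cssselect(self, selector):
        return self._matches.get(selector, [])


if __name__ == "__main__":
    post = FakeElement({".message": ["first post"]})
    missing = None  # what a failed lookup would hand back

    print(safe_cssselect(post, ".message"))
    print(safe_cssselect(missing, ".message"))
```

A guard like this only stops the crash. It would still be worth logging which thread or post produced the None, since that probably points at markup the scraper does not expect.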
>>3233 Hopefully Robi will post here (or somewhere else) and explain exactly what his code is doing. I don't into Python, but I wrote a C++-based scraper that has been working for about a year and a half now. If I knew what his stuff was doing exactly, then I could probably relate that to my work and explain exactly how I'm doing it. Maybe it would help out some.
>>3235 Post your scraper somewhere?
>>3236 Sure, OK. (It's been readily available all along, actually.) I just recently made a version that supports running on a Raspberry Pi too. BUMP currently supports about 6 different types of imageboard software, and also kind of unifies the archives into a generalized, filesystem-based structure after download. So it's both a multi-scraper and a generalization of the data storage, which should enable easy script-based exporting to any other type of system, with no intermediate database required. >>>/robowaifu/8772
