/tech/ - Technology and Computing

Technology, computing, and related topics (like anime)




Anonymous 07/24/2020 (Fri) 06:04:09 No.3007
What can we do to help mitigate spam on imageboards? In the past few months I have noticed an increase of spam (especially political drama manipulation and website ads) mass posted on literally dozens to hundreds of imageboards at a time. I have no doubt this will increase as we approach the US elections. How can we use technology to help mitigate the destructive impact of spam? One place where this is important is on political boards which often mistake political manipulation spam for legitimate contributions as it fits their board topic. It's also less obvious to people who use only one IB.
Examples of spam you might recognize:
https://html.duckduckgo.com/html?q=well%20now%20it%27s%20here%20and%20you%20pathetic%20incel%20losers%20are%20just%20sitting%20at%20home%20jacking%20off
https://html.duckduckgo.com/html?q=%22bear%20witness%20that%20there%20is%20nothing%20worthy%20of%20worship%20in%20truth%20but%20Allah%22
https://html.duckduckgo.com/html?q=A%20slow%20but%20friendly%20greeting%20from%20Slothchan!
https://html.duckduckgo.com/html?q=%22Antifa%20blm%20planning%20on%20mass%20murdering%20white%20people%20on%20july%22

Some ideas:
>use search engines to identify and call out spam like above.
Anyone can do it without effort, but it's manual volunteer work. This also becomes harder as more imageboards use robots.txt to de-index themselves and because it takes a few hours for search engines to update.
>make a public scraper bot to regularly check various imageboards for identical posts.
It could have a public feed of recent spam topics and archived proof of spamming with a list of sites hit with it. You would just post a link to that site whenever it shows up. Downside: a persistent threat could add random elements to try and defeat automatic identification, cat and mouse game. Unlikely to see that dedication though? This could also be used to ID false-flag spamming of a topic by listing the post time (if it gets posted on a /pol/ board and then spammed six hours later, it's probably inorganic.)
>moderation webring.
Have a way for staff to report spam posts and have it report identical posts on other imageboards if they exist. Downside: potential for abuse.
>prevent bots using captchas/invisible fake forms/etc.
Fake forms won't stop manual spam, not sure how much is manually done. They can cheaply buy API'd bot-ready captcha solutions unless you DIY one that only a person interested in your board could solve.
>block Tor
1) no.
2) Most/all aren't from Tor from what I've seen.
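A rough sketch of the scraper-bot idea, assuming the scraping itself is already done and each post arrives as a dict with hypothetical `site`, `time` and `body` keys. It compares word shingles with Jaccard similarity instead of exact text, so a spammer adding a few random words does not hide the match. The function names, the 0.6 threshold and the three-site minimum are all made up for illustration, not tuned on real spam.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles of a lowercased post body."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_cross_posts(posts, threshold=0.6, min_sites=3):
    """Greedily cluster near-identical posts and keep the clusters that
    show up on at least `min_sites` different sites.

    Posts are processed oldest first, so the first entry of each
    returned cluster is the earliest sighting of that text.
    """
    clusters = []  # each entry: {"shingles": set, "posts": list}
    for post in sorted(posts, key=lambda p: p["time"]):
        s = shingles(post["body"])
        for cluster in clusters:
            # Compare against the earliest post of the cluster.
            if jaccard(s, cluster["shingles"]) >= threshold:
                cluster["posts"].append(post)
                break
        else:
            clusters.append({"shingles": s, "posts": [post]})
    return [
        c["posts"]
        for c in clusters
        if len({p["site"] for p in c["posts"]}) >= min_sites
    ]
```

Since each cluster is ordered by time, the gap between the first sighting and the later copies is right there for the "posted on one board, spammed everywhere six hours later" check. The greedy comparison is quadratic in the worst case, so a real feed covering hundreds of boards would probably want MinHash or a similar index instead.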
>>3007 Spam is a partially solved problem. It has been estimated that over 80% of email traffic is spam. Nobody is buried in it because spam detection solutions are good; they use all kinds of statistical learning filters. Modify rspamd, or cheaply format posts as emails and pipe them through rspamd. Moderators can easily train the rspamd filters by "marking as spam" like you do for emails. Just make sure to disable DNS checks and email-specific modules.
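A minimal sketch of the "format posts as emails" part, assuming rspamd is installed locally and its `rspamc` client is on the PATH. The `imageboard.invalid` addresses, the header layout and both function names are placeholders invented here; the board software would have to call something like this from its posting and moderation hooks.

```python
import subprocess
from email.message import EmailMessage


def post_to_email(board, post_no, body, subject=""):
    """Wrap an imageboard post in a minimal RFC 5322 message (bytes).

    The addresses are placeholders under the reserved .invalid TLD,
    so nothing here points at a real mailbox.
    """
    msg = EmailMessage()
    msg["From"] = "anonymous@imageboard.invalid"
    msg["To"] = f"{board}@imageboard.invalid"
    msg["Subject"] = subject or f"/{board}/ No.{post_no}"
    msg["Message-ID"] = f"<{board}.{post_no}@imageboard.invalid>"
    msg.set_content(body)
    return msg.as_bytes()


def run_rspamc(command, raw_message):
    """Pipe a raw message to rspamc on stdin and return its text output.

    `command` is an rspamc subcommand such as "symbols" (scan),
    "learn_spam" or "learn_ham".
    """
    result = subprocess.run(
        ["rspamc", command],
        input=raw_message,
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Scanning a new post would then be `run_rspamc("symbols", post_to_email("tech", 3007, body))`, and a moderator's "mark as spam" button would call the same thing with `"learn_spam"`. Learning normally goes through the rspamd controller worker, so it may need a password or a different socket depending on the local config.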
>>3009 That's a good idea but I am skeptical of how effective automatic filters could be for less generic spam, such as ones not including links like most of the examples given, especially on random boards or political boards. If they weren't posted on multiple imageboards or in inappropriate places, they could easily be identified as legitimate OP posts. That would still be great for generic link-spam though.
>political drama manipulation
huh, the old trump/russia collusion trope. funny to see it unironically on an imageboard
>>3009 Email is used by basically everyone who is on the internet. The dataset to train spam filters is enormous. I reckon it wouldn't work nearly as well at a place like this.
