findspam.py: Add reason user unregistered or non-existent #4131
Is this actually a good discriminator between spam and non-spam? It appears that this will detect every single post by an unregistered user. What is the approximate volume of those per day across SE? This also detects deleted accounts. SD usually scans new posts fairly quickly after they are posted. How often is a spammer going to have deleted their account prior to SD scanning? It looks like this will detect every post where the user has been deleted. There are a lot of old posts where the user was deleted sometime between when it was posted and now. I suspect the vast majority of those are not spam. I would suggest that this detection exclude any post that is over X age, where X is quite small (maybe a day, or less?).
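To make the age-cutoff suggestion concrete, here is a minimal sketch of what such a check could look like. It is not SmokeDetector's code: the function name `should_flag_owner`, the constant names, and the one-day cutoff are hypothetical, and the post is assumed to be a plain dict shaped like an SE API post object, with `creation_date` in Unix epoch seconds and an `owner` whose `user_type` is `unregistered` or `does_not_exist` for the accounts in question.

```python
import time

# Hypothetical cutoff: the "X" from the comment above, set here to one day.
MAX_POST_AGE_SECONDS = 24 * 60 * 60

# user_type values assumed to mark unregistered and deleted accounts.
SUSPECT_USER_TYPES = {"unregistered", "does_not_exist"}


def should_flag_owner(post, now=None, max_age=MAX_POST_AGE_SECONDS):
    """Return True only for recent posts whose owner is unregistered or missing."""
    now = time.time() if now is None else now
    creation_date = post.get("creation_date")
    if creation_date is None:
        # Age unknown: stay conservative and do not flag.
        return False
    if now - creation_date > max_age:
        # Old post: the owner may have been deleted long after posting.
        return False
    owner = post.get("owner") or {}
    return owner.get("user_type") in SUSPECT_USER_TYPES
```

With a cutoff like this, an old post whose author was deleted years later would be skipped, while a fresh post from an unregistered or missing account would still be flagged. Whether that is a useful signal at all is the open question raised above.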
At least some of the requests to the SE API don't actually ask for the |
This PR is currently not working, for unknown reasons.
The These In fact, we should be able to use a single value for all four of the Note that PR #3885 restructures the code with the |
From what I know, not really. Anonymous edits happen all the time, and not all of them are spam. (Most anon edits tend to lead to unregistered users or registrations anyway, and most of the spam contributors are "Registered" to try to circumvent things.)
I'm assuming this volume is very, very high because of unregistered users making, or attempting to make, edit contributions.
In a few rare instances this has been the case, but usually it is NOT the case that a user is deleted prior to an SD scan. SD is usually faster at detecting than moderators are at destroying users.
This is important as well. This PR hasn't had any changes in a week, and has not yet taken any of the feedback into account. Further:
I'm closing this because it's "not working" currently. If you fix the issues, resubmit a PR, but also keep in mind all the comments made thus far.
Spam posts often show such a pattern.