This bot has been developed in an attempt to help capture possible vandalism by identifying edits that:
- remove all code
- replace content with nonsense or repeated words
- include solutions to questions
- remove large amounts of text from the post
- use certain keywords or offensive language within the edit summary
The point of the bot is to help identify bad edits and/or potential vandalism made to posts in real time so that the changes can be quickly rolled back.
The bot queries the Stack Exchange API every minute to fetch a list of the most recently edited posts. There is logic to check that the post has been edited and that it has been edited by the author.
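The polling step above could be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the `PolledPost` class, its field names, and both helper methods are hypothetical, and it assumes each post carries an owner id, a last editor id, and a last edit timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified view of one post from the API response. */
final class PolledPost {
    final long postId;
    final Long lastEditDate; // epoch seconds; null if the post was never edited
    final long ownerId;
    final long lastEditorId;

    PolledPost(long postId, Long lastEditDate, long ownerId, long lastEditorId) {
        this.postId = postId;
        this.lastEditDate = lastEditDate;
        this.ownerId = ownerId;
        this.lastEditorId = lastEditorId;
    }
}

final class EditPollingSketch {

    /** Keeps posts edited after the previous poll whose last editor is the post's owner. */
    static List<PolledPost> authorEditsSince(List<PolledPost> posts, long previousPollEpochSeconds) {
        List<PolledPost> result = new ArrayList<>();
        for (PolledPost post : posts) {
            boolean edited = post.lastEditDate != null
                    && post.lastEditDate > previousPollEpochSeconds;
            boolean byAuthor = post.ownerId == post.lastEditorId;
            if (edited && byAuthor) {
                result.add(post);
            }
        }
        return result;
    }

    /** Runs the given polling task once a minute on a single background thread. */
    static ScheduledExecutorService pollEveryMinute(Runnable pollTask) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(pollTask, 0, 1, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

In this sketch the task passed to `pollEveryMinute` would fetch the recent posts and hand them to `authorEditsSince`; the HTTP call itself is left out.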
The `post_id` from each post is then extracted and the Stack Exchange API is queried again for a list of revisions. To reduce API calls, multiple ids are sent at once, and logic is in place to ensure we are using the latest revision.
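As a rough sketch of the batching described above: the class and method names below are hypothetical, and the code assumes the API's vectorized form, where ids are joined with semicolons (up to 100 per request) in a `/posts/{ids}/revisions` path. Picking the highest `revision_number` per post is one plausible way to find the latest revision; the bot's own logic may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical, simplified view of one revision from the API response. */
final class PostRevision {
    final long postId;
    final int revisionNumber;

    PostRevision(long postId, int revisionNumber) {
        this.postId = postId;
        this.revisionNumber = revisionNumber;
    }
}

final class RevisionBatching {
    // Assumed limit on the number of ids in one vectorized request.
    static final int MAX_IDS_PER_REQUEST = 100;

    /** Builds one request path per chunk of ids, with ids joined by semicolons. */
    static List<String> buildRevisionPaths(List<Long> postIds) {
        List<String> paths = new ArrayList<>();
        for (int start = 0; start < postIds.size(); start += MAX_IDS_PER_REQUEST) {
            int end = Math.min(start + MAX_IDS_PER_REQUEST, postIds.size());
            String ids = postIds.subList(start, end).stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(";"));
            paths.add("/2.3/posts/" + ids + "/revisions");
        }
        return paths;
    }

    /** Keeps, for every post id, the revision with the highest revision number. */
    static Map<Long, PostRevision> latestPerPost(List<PostRevision> revisions) {
        Map<Long, PostRevision> latest = new HashMap<>();
        for (PostRevision revision : revisions) {
            latest.merge(revision.postId, revision,
                    (a, b) -> a.revisionNumber >= b.revisionNumber ? a : b);
        }
        return latest;
    }
}
```

Because a single response mixes revisions from several posts, grouping by post id before choosing the newest one keeps the batching from confusing edits across posts.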
An edit can change a question's title, body or tags, or the body of an answer. Tags are currently not checked. Instead, the title, question body or answer body (whichever was edited) is run through filters, as is the edit summary.
Title:
- `BlacklistedWords`: certain words are appended to titles. The bot reads a file which holds a list of keywords to watch out for within titles.

Body:
- `TextRemoved`: the bot checks if 80% or more of the body has been removed and whether the Jaro-Winkler score of the diff is less than 0.6.
- `BlacklistedWords`: certain words are appended to posts. The bot reads a separate file for questions and answers. Both hold a list of keywords to watch for.
- `CodeRemoved`: the bot checks if the latest edit removed all code from the post.
- `FewUniqueCharacters`: the bot checks if the post contains few unique characters. This rule is similar to SmokeDetector's "Few unique characters" one.
- `RepeatedWords`: the bot checks whether there are five or fewer unique words in the post.
- `VeryLongWord`: the bot checks the post for a word longer than 50 characters. Code blocks are stripped before the check is performed.

Edit summary:
- `BlacklistedWords`: certain words are used within edit summaries. The bot holds a separate file for question edit summaries and answer edit summaries. Both hold a list of keywords to watch for.
- `OffensiveWord`: the bot checks for offensive language used within the edit summary. This is done via a separate regex file.
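To make a few of the body checks concrete, here is a minimal sketch under stated assumptions. The class and method names are hypothetical. Words are taken to be whitespace-separated tokens, the removed share is measured by character count, and both the Jaro-Winkler half of the text-removed check and the stripping of code blocks are left out. The thresholds (five unique words, 50 characters, 80%) come from the descriptions above.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class BodyFilterSketch {

    /** Counts distinct whitespace-separated words, ignoring case. */
    static int countUniqueWords(String text) {
        Set<String> unique = new HashSet<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                unique.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return unique.size();
    }

    /** Repeated-words style check: five or fewer unique words in the text. */
    static boolean hasRepeatedWords(String text) {
        return countUniqueWords(text) <= 5;
    }

    /** Very-long-word style check: any single word longer than 50 characters. */
    static boolean hasVeryLongWord(String text) {
        for (String word : text.trim().split("\\s+")) {
            if (word.length() > 50) {
                return true;
            }
        }
        return false;
    }

    /** Share of characters removed by the edit, in the range 0.0 to 1.0. */
    static double removedFraction(String before, String after) {
        if (before.isEmpty()) {
            return 0.0;
        }
        double removed = Math.max(0, before.length() - after.length());
        return removed / before.length();
    }

    /** First half of the text-removed check: at least 80% of the body is gone. */
    static boolean mostTextRemoved(String before, String after) {
        return removedFraction(before, after) >= 0.8;
    }
}
```

For the similarity half of the text-removed check, a library implementation of Jaro-Winkler could be combined with `mostTextRemoved`; which one the bot uses is not stated here.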
The project is running under the user Belisarius in the SOBotics room. A more detailed presentation is at http://belisarius.sobotics.org/ including a list of commands.
Currently feedback is taken by replying to the chat message with either `tp` (True Positive) or `fp` (False Positive).
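Interpreting such a reply could look like the sketch below. The `ReportFeedback` enum and its `parse` method are hypothetical, and ignoring case and surrounding whitespace is an assumption rather than documented behaviour.

```java
import java.util.Locale;
import java.util.Optional;

enum ReportFeedback {
    TRUE_POSITIVE,
    FALSE_POSITIVE;

    /** Maps a chat reply to a feedback value; empty if the reply is not tp or fp. */
    static Optional<ReportFeedback> parse(String reply) {
        if (reply == null) {
            return Optional.empty();
        }
        switch (reply.trim().toLowerCase(Locale.ROOT)) {
            case "tp":
                return Optional.of(TRUE_POSITIVE);
            case "fp":
                return Optional.of(FALSE_POSITIVE);
            default:
                return Optional.empty();
        }
    }
}
```

Returning an empty `Optional` for anything else lets the caller ignore ordinary chat replies instead of treating them as errors.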
A sample image of a report is: [Image: sample report]

Requirements for building and running the bot:
- Maven 3.6
- Java 11
- SQLite for reading the database - instructions to install for Windows, Linux and macOS
- Clone the repository:

      git clone https://github.com/SOBotics/Belisarius
      cd Belisarius

- Install dependencies:

      mvn clean install

- Run `cp properties/login.example.properties properties/login.properties` and fill in `properties/login.properties`.

- Start the bot by running:

      java -cp target/belisarius-1.8.0.jar:./lib/* bugs.stackoverflow.belisarius.Application
If you want to change the location of the log file, edit `src/main/resources/log4j.xml`. The project must be rebuilt (`mvn install`) for the changes to be applied.