This bot fetches posts from a specified subreddit, searches for book titles mentioned by users in {{double braces}}, queries a database to find info on those books, and replies to the posts with additional details.
This is a revamp of Goodreads-bot, which launched first but died after Goodreads shut down API access. This version relies on a private database.
The bot has a 3-step workflow:
- Crawling - Fetches recent posts from the subreddit and saves new post IDs to a BigQuery table.
- Matching - Checks the BigQuery table for new post IDs, extracts book titles from the post text, queries the database to find book info, and prepares a reply.
- Replying - Posts the prepared reply as a comment on the original post.
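The title-extraction part of the matching step can be sketched as follows. This is only an illustration, assuming titles are summoned with plain `{{title}}` markers; the names `SUMMON_PATTERN` and `extract_titles` are hypothetical and not taken from the bot's code.

```python
import re
from typing import List

# Hypothetical pattern: anything between {{ and }} that contains no braces.
SUMMON_PATTERN = re.compile(r"\{\{([^{}]+)\}\}")


def extract_titles(text: str) -> List[str]:
    """Return the trimmed, non-empty strings found inside {{double braces}}."""
    titles = []
    for raw in SUMMON_PATTERN.findall(text):
        # Collapse runs of whitespace and strip the edges.
        title = " ".join(raw.split())
        if title:
            titles.append(title)
    return titles
```

For example, `extract_titles("Loved {{The Hobbit}} and {{ Dune }}!")` returns the two titles with surrounding spaces removed, and a post without any braces yields an empty list.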
The main classes handling each step are:
- Reader - Crawls the subreddit looking for the summoning pattern {{something}} and saves the IDs of the posts to reply to.
- Matcher - Matches book titles, fetches additional information, formats the reply, and posts it on Reddit.
- Bot - Initializes the Reader and Matcher objects and runs the workflow.
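As a rough illustration of how the Matcher's title matching could work, the sketch below scores a requested title against candidate titles on a 0 to 100 scale and rejects anything under a threshold. It uses the standard library's `difflib` purely as a stand-in; the scoring method the bot actually uses is not stated here, and the function names and the `min_ratio` parameter are assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def match_score(query: str, candidate: str) -> float:
    """Case-insensitive similarity between two titles, on a 0-100 scale."""
    a = query.lower().strip()
    b = candidate.lower().strip()
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def best_match(
    query: str, candidates: Iterable[str], min_ratio: float
) -> Optional[Tuple[str, float]]:
    """Return (title, score) for the best candidate, or None if below min_ratio."""
    best: Optional[Tuple[str, float]] = None
    for candidate in candidates:
        score = match_score(query, candidate)
        if best is None or score > best[1]:
            best = (candidate, score)
    if best is None or best[1] < min_ratio:
        return None
    return best
```

With this sketch, a slightly misspelled request such as `"the hobit"` still matches `"The Hobbit"` at a threshold of 80, while an unrelated string returns `None` and would produce no reply.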
The bot behavior is configured via the config.json file:
- subreddit - Subreddit name to crawl
- limit - Max number of posts to fetch per crawl
- min_ratio - Minimum matching score (/100) to accept a book title match
- table_* - Names of the BigQuery tables
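A configuration with these keys could be loaded and sanity-checked along the lines below. This is a sketch under assumptions: every value in `EXAMPLE_CONFIG` is a placeholder, the `table_*` key names are guessed from the table names used in this README, and `load_config` is a hypothetical helper rather than part of the bot.

```python
import json

# Keys that this sketch treats as mandatory (an assumption).
REQUIRED_KEYS = ("subreddit", "limit", "min_ratio")

# Placeholder values only; dataset and table names are made up.
EXAMPLE_CONFIG = """
{
  "subreddit": "books",
  "limit": 100,
  "min_ratio": 85,
  "table_dim_books": "my_dataset.dim_books",
  "table_crawl_dates": "my_dataset.crawl_dates",
  "table_to_match": "my_dataset.to_match",
  "table_reply_logs": "my_dataset.reply_logs"
}
"""


def load_config(raw_json: str) -> dict:
    """Parse a JSON config string and check the basic constraints."""
    config = json.loads(raw_json)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    if not 0 <= config["min_ratio"] <= 100:
        raise ValueError("min_ratio must be between 0 and 100")
    if config["limit"] <= 0:
        raise ValueError("limit must be positive")
    return config
```

Failing early on a missing key or an out-of-range `min_ratio` keeps a misconfigured bot from crawling or replying with unintended settings.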
The bot mainly uses the following BigQuery tables:
- table_dim_books - Main book info table
  - master_grlink - Goodreads link for the book
  - short_title - Title without the series name (if any)
  - first_author - Main author
  - series_title - Series name if part of a series, NULL if not
  - book_number - Book number within the series, NULL if not part of a series
  - tags - Array of topic tags
  - summary - Book summary text
  - ...plus other metadata fields
- table_crawl_dates - Tracks last crawl timestamp per subreddit
  - subreddit - Subreddit name
  - crawl_timestamp - Last crawl time for subreddit
- table_to_match - New post IDs to process
  - subreddit - Subreddit of post
  - post_id - Reddit post ID
  - post_timestamp - Post creation time
  - post_type - submission or comment
- table_reply_logs - Logs replies posted by bot
  - post_id - Reddit post ID
  - reply_id - Reddit reply ID
  - score - Book title match score
  - master_grlink - Link to book page
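To make the relationship between table_crawl_dates and table_to_match concrete, here is a small in-memory sketch. It assumes, without confirmation from the bot's code, that only posts created after the last recorded crawl time are queued for matching. `PostToMatch` mirrors the table_to_match columns listed above; `select_new_posts` is a hypothetical helper and no BigQuery call is made.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class PostToMatch:
    """One row of table_to_match, as described in this README."""
    subreddit: str
    post_id: str
    post_timestamp: datetime
    post_type: str  # "submission" or "comment"


def select_new_posts(
    posts: Iterable[PostToMatch], last_crawl: Optional[datetime]
) -> List[PostToMatch]:
    """Keep posts created strictly after the last crawl (all posts if none yet)."""
    if last_crawl is None:
        return list(posts)
    return [post for post in posts if post.post_timestamp > last_crawl]
```

Passing `None` as `last_crawl` models a subreddit that has never been crawled, in which case every fetched post is kept.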
To start the bot:
- Set up BigQuery credentials
- Configure config.json
- Run the command:
python main.py --config config.json
The main.py script will initialize the Reader and Matcher objects and run through the workflow ONCE: one crawling pass and one matching pass. Adapt the file for more advanced behavior.
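One possible adaptation is to repeat the single pass on a fixed interval. The sketch below is an assumption about how that could look, not code from main.py: `run_once` stands for whatever callable performs one crawling and matching pass, and `run_repeatedly` with its parameters is hypothetical.

```python
import time
from typing import Callable, Optional


def run_repeatedly(
    run_once: Callable[[], None],
    interval_seconds: float = 60.0,
    max_iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run_once repeatedly, pausing between passes; return the pass count."""
    completed = 0
    while max_iterations is None or completed < max_iterations:
        try:
            run_once()
        except Exception as exc:
            # Log and continue so one failed pass does not stop the bot.
            print(f"workflow pass failed: {exc!r}")
        completed += 1
        # Skip the pause after the final pass.
        if max_iterations is None or completed < max_iterations:
            sleep(interval_seconds)
    return completed
```

Leaving `max_iterations` as `None` loops until the process is stopped. The injectable `sleep` argument exists only so the loop can be exercised in tests without waiting.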