At the time of development, Nextdoor could not be scraped easily with more traditional tools (e.g. Scrapy or Beautiful Soup) because requests for the next set of posts include a "random" number as a parameter.
Thus, this is a simple Python script that uses Selenium to simulate user input and scrape relevant data from nextdoor.com. It uses ChromeDriver (included in this repo) to drive the browser.
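As a rough illustration of what "simulating user input" can look like, the sketch below scrolls a page until its height stops growing, which is one common way to trigger lazily loaded posts. It is not the code from nextdoor.py: the function name `scroll_to_bottom`, its parameters, and the assumption that the feed loads more posts on scroll are all hypothetical. The only Selenium call it relies on is `driver.execute_script`.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_scrolls=50):
    """Scroll until the page height stops growing or max_scrolls is reached.

    `driver` is any object exposing Selenium's `execute_script` method.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        # Jump to the bottom of the page, as a user scrolling down would.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # assumed delay to let new content load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so stop
        last_height = new_height
    return scrolls
```

With a real browser, `driver` would typically be a `selenium.webdriver.Chrome` instance. The fixed pause is a simplification; a longer pause or an explicit wait may be more reliable on slow connections.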
As of the last update, this script works with Python 3.8.0+. Using a virtual environment for this script is highly recommended.
Once the virtual environment is built, run `pip install -r requirements.txt` in a command prompt within the Nextdoor_Script directory to install the required packages. The script will not work without these libraries.
Feel free to fork this repo and make it your own! This was just a personal project of mine, but if it is useful to anyone else, I'm happy to share this project. If you'd like to use it as is:
- Clone the repository into a directory of your choice.
- Create your own `.env` file and fill out the variables.
- Open a command prompt, navigate to the Nextdoor_Scraper directory, and run one of the following:
  - `python nextdoor.py` if you don't want to save the HTML files separately (as a backup in case of failure)
  - `python html_saver.py` if you want to save the HTML files, followed by `python html_scraper.py` to scrape the local files separately (more stable for longer scrapes, since the files are saved first)
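
The two-step option above (save the HTML first, scrape the saved files later) can be sketched as follows. This is a minimal illustration using only the standard library, not the contents of html_saver.py or html_scraper.py. The function names, the `page_<index>.html` file naming, and the choice to extract `<p>` elements are all assumptions made for the example; Nextdoor's real markup will differ.

```python
from html.parser import HTMLParser
from pathlib import Path


def save_html(html, out_dir, index):
    """Write one page's HTML to out_dir/page_<index>.html and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"page_{index}.html"
    path.write_text(html, encoding="utf-8")
    return path


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element (a stand-in for real post markup)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._buffer).strip())
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def scrape_saved_files(in_dir):
    """Parse every saved page in in_dir (in filename order) and return the texts."""
    texts = []
    for path in sorted(Path(in_dir).glob("page_*.html")):
        parser = ParagraphExtractor()
        parser.feed(path.read_text(encoding="utf-8"))
        parser.close()
        texts.extend(parser.paragraphs)
    return texts
```

In a Selenium session, the `html` argument would typically come from `driver.page_source`. Because parsing reads only local files, a parsing bug can be fixed and the scrape re-run without loading the site again, which is the stability benefit of splitting the work in two.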