Use this Scrapy application to obtain bill data from the NC Legislature website.
The scraper extracts each bill's data into an object; use the scrapy command to output a JSON list of bill objects.
The objective of this project is to export this data into a more usable format for presentation to the citizens of North Carolina. The data can also be migrated into other apps or made available for further analysis.
Requires Python 3 and Scrapy.
- Install Python 3.
- Install Scrapy, for example with pip:
pip install scrapy
- Clone this repository and navigate into the ncleg scraper directory:
ncleg/
- Copy example.settings.py to settings.py and adjust the Scrapy configuration to your needs.
- Tell Scrapy to crawl "bills" from the command line, passing the "session" and "chamber" options ("chamber" is optional; omitting it scrapes both chambers). For example, to scrape Senate bills from the 2017-2018 session into a JSON file:
scrapy crawl bills -a chamber=S -a session=2017 -o <filename>.json
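Because the -o export is a single JSON list, other apps can read it with the standard library alone. The sketch below is a minimal loader under that assumption; the function name load_items is hypothetical. One caveat worth knowing: -o appends to an existing file, which for JSON produces an invalid file on a second run, while Scrapy 2.4 and later also accept -O to overwrite instead.

```python
import json
from pathlib import Path


def load_items(path):
    """Load a JSON feed written by `scrapy crawl <spider> -o <filename>.json`.

    Scrapy's JSON exporter writes one top-level list of objects, so anything
    else is treated as an error rather than silently accepted.
    """
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except json.JSONDecodeError as err:
        # Typical cause: re-running with -o, which appends a second JSON list
        # to the same file. Re-export with -O (overwrite) instead.
        raise ValueError(f"{path} is not valid JSON: {err}") from err
    if not isinstance(data, list) or not all(isinstance(item, dict) for item in data):
        raise ValueError(f"{path} does not contain a JSON list of objects")
    return data
```

For example, bills = load_items("senate_bills_2017.json") returns the bill objects as a list of dicts (the filename here is hypothetical).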
Available spiders:
- bills: retrieves individual bill information
- members: retrieves NCGA representative data
- membersvotes: retrieves every member vote from the specified session, along with some basic member info
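Any of the spiders above can also be started by name from a Python script. The sketch below uses Scrapy's CrawlerProcess and get_project_settings; the helper names crawl_kwargs and run_spider are hypothetical, and it assumes the script runs from inside the ncleg/ project directory so the project settings are found.

```python
def crawl_kwargs(session, chamber=None):
    """Spider arguments equivalent to `-a session=... [-a chamber=...]`."""
    kwargs = {"session": str(session)}  # -a values always arrive as strings
    if chamber is not None:
        kwargs["chamber"] = chamber  # omit to scrape both chambers
    return kwargs


def run_spider(spider_name, output_path, session, chamber=None):
    """Run a project spider by name and write its items to a JSON feed.

    Must be called from inside the ncleg/ project directory so that
    get_project_settings() locates scrapy.cfg and settings.py.
    """
    # Imported lazily so crawl_kwargs() works without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    settings = get_project_settings()
    # FEEDS (Scrapy 2.1+) is the settings equivalent of -o; the "overwrite"
    # option (Scrapy 2.4+) behaves like -O instead of appending.
    settings.set("FEEDS", {output_path: {"format": "json", "overwrite": True}})
    process = CrawlerProcess(settings)
    process.crawl(spider_name, **crawl_kwargs(session, chamber))
    process.start()  # blocks until the crawl finishes
```

Calling run_spider("bills", "senate_bills_2017.json", session=2017, chamber="S") should behave roughly like the command-line example, except that it overwrites the output file. Call it only once per script: the Twisted reactor behind CrawlerProcess cannot be restarted.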
To be polite to this public data resource, please configure the AutoThrottle settings in settings.py appropriately. For more information, read Scrapy's AutoThrottle documentation.
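As a rough illustration, the excerpt below shows Scrapy's standard politeness settings as they might appear in settings.py. The setting names are Scrapy's own; the values are assumptions chosen for illustration and are not taken from example.settings.py.

```python
# settings.py (excerpt): polite crawling for the NCGA site.
# Values are illustrative; tune them to your needs.

# Respect the site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Let Scrapy adapt the delay between requests to the server's latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5.0   # initial download delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 60.0    # upper bound on the delay when the server is slow
# Average number of parallel requests Scrapy aims to send to the site.
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
AUTOTHROTTLE_DEBUG = False       # True logs throttling stats for each response

# Lower bound: AutoThrottle never sets the delay below DOWNLOAD_DELAY.
DOWNLOAD_DELAY = 1.0
# Hard cap on parallel requests to one domain; AutoThrottle does not raise it.
CONCURRENT_REQUESTS_PER_DOMAIN = 2
```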
If you want to seed a database with the data parsed by these spiders, use the MongoPipeline. Enable the pipeline in settings.py and set MONGO_URI and MONGO_DATABASE in the same file. Collection names default to the spider name.
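A minimal settings.py excerpt for enabling the pipeline might look like the following. ITEM_PIPELINES is Scrapy's standard setting and MONGO_URI and MONGO_DATABASE are the names given above; the import path ncleg.pipelines.MongoPipeline, the URI, and the database name are assumptions, so substitute the repository's actual module path and your own connection details.

```python
# settings.py (excerpt): enable the MongoPipeline.
# "ncleg.pipelines.MongoPipeline" is an assumed import path; use the module
# path where the pipeline actually lives in this repository.
ITEM_PIPELINES = {
    "ncleg.pipelines.MongoPipeline": 300,  # lower values run earlier (0-1000)
}

# Connection details read by the pipeline (both values are hypothetical).
MONGO_URI = "mongodb://localhost:27017"  # assumes a local MongoDB instance
MONGO_DATABASE = "ncleg"
```

With these settings, items from a crawl such as the bills example would be written to a collection named after the spider.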
Drop by Code for Charlotte's weekly Community Action Nights. Code for Charlotte is where this project originated, along with many other wonderful civic-minded projects.
Visit the VoterSmarterNC JIRA Tracker for the list of desired features. Feel free to fork and use for your own project and needs!
See the Scrapy documentation for more on extending Scrapy functionality and output formats.