Welcome to the Hypertext Corpus Initiative (HCI) project.
This project consists of the following components:
- HCI core
- HCI crawler
TBD
The HCI crawler is implemented as a Scrapy project. For more information see: http://jiminy.medialab.sciences-po.fr/hci/index.php/Scrapy_implementation_proposal
Code is in the hcicrawler/ directory.
Requirements:
- Scrapy >= 0.14
- pymongo >= 2.0
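
The requirements above can be checked before running the crawler. The following is only a minimal sketch and not part of the project: the helper names (parse_version, meets_minimum, check_requirements) are hypothetical, and it assumes each module exposes its version string as `__version__` or `version`.

```python
import re

# Minimum versions taken from the requirements list above.
REQUIREMENTS = {"scrapy": "0.14", "pymongo": "2.0"}


def parse_version(text):
    """Turn '0.14.4' into (0, 14, 4); stop at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    have = parse_version(installed)
    need = parse_version(required)
    # Pad with zeros so that '2' and '2.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_requirements():
    """Map each required module name to True/False; a missing module is False."""
    results = {}
    for name, minimum in REQUIREMENTS.items():
        try:
            module = __import__(name)
        except ImportError:
            results[name] = False
            continue
        # Assumption: the version is exposed as `__version__` or `version`.
        installed = getattr(module, "__version__", None) or getattr(module, "version", "0")
        results[name] = meets_minimum(str(installed), minimum)
    return results


if __name__ == "__main__":
    for name, ok in sorted(check_requirements().items()):
        print("%s: %s" % (name, "ok" if ok else "missing or too old"))
```

Saved under any file name and run with the Python interpreter, the script prints one line per requirement.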