heikkidoeleman/Hypertext-Corpus-Initiative

Hypertext Corpus Initiative

Welcome to the Hypertext Corpus Initiative (HCI) project.

This project consists of the following components:

  • HCI core
  • HCI crawler

HCI core

TBD

HCI crawler

The HCI crawler is implemented as a Scrapy project. For more information, see: http://jiminy.medialab.sciences-po.fr/hci/index.php/Scrapy_implementation_proposal

The code is in the hcicrawler/ directory.

Requirements

  • Scrapy >= 0.14
  • pymongo >= 2.0
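
As a quick way to confirm that an environment satisfies the minimum versions listed above, a small check along the following lines may help. This is only a sketch and not part of the project: the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are hypothetical, and it assumes Python 3.8 or later so that the standard-library `importlib.metadata` module is available.

```python
from importlib import metadata  # standard library, Python 3.8+

# Minimum versions taken from the requirements list above.
MINIMUM_VERSIONS = {"Scrapy": (0, 14), "pymongo": (2, 0)}


def parse_version(text):
    """Turn '0.14.4' into (0, 14, 4); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(name, installed):
    """Return True if the installed version string satisfies the minimum."""
    minimum = MINIMUM_VERSIONS[name]
    version = parse_version(installed)
    # Pad short versions such as '2' with zeros so that '2' compares as '2.0'.
    version += (0,) * (len(minimum) - len(version))
    return version >= minimum


def check_requirements():
    """Map each required package to 'ok', 'too old', or 'missing'."""
    report = {}
    for name in MINIMUM_VERSIONS:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(name, installed) else "too old"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(name, status)
```

Run as a script, it prints one line per required package. The comparison only looks at the leading numeric components of a version string, so pre-release suffixes are ignored.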
