chrisalbright edited this page Aug 29, 2013 · 10 revisions


Cerebro

Cerebro is an experimental social web mining and analysis tool. Its nominal purpose is to search Meetup, LinkedIn, GitHub, and other sources to find developers who may want to join West Coast Code Choppers. Think of it as the tool Professor Charles Francis Xavier would use to search for X-Men (http://en.wikipedia.org/wiki/X-men); in this case, the search is for individuals who have keen logical perception and feel compelled to write software as an art.

A sample query to Cerebro could be something like "who are the local functional programming geeks within 10 miles of here who are likely to be interested in participating in further development of Cerebro?" Answering this sort of query requires data mining and analytics, as well as work in a number of other areas.
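As a rough illustration of the "within 10 miles" and "functional programming" parts of such a query, here is a minimal sketch. It assumes records have already been collected and carry a latitude, a longitude, and a set of interest tags; the `Developer` type, its fields, and the function names are hypothetical and not part of Cerebro. Estimating who is "likely to be interested" is the harder analytics problem and is not attempted here.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass
class Developer:
    # Hypothetical record shape, for illustration only.
    name: str
    lat: float
    lon: float
    interests: set = field(default_factory=set)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearby_with_interest(developers, here, interest, radius_miles=10.0):
    """Return developers tagged with `interest` within `radius_miles` of `here`."""
    lat0, lon0 = here
    return [
        d for d in developers
        if interest in d.interests
        and haversine_miles(lat0, lon0, d.lat, d.lon) <= radius_miles
    ]
```

A call such as `nearby_with_interest(records, (lat, lon), "functional programming")` would then return the matching subset of `records`.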

Once Cerebro is built, this sort of technology can be used for other purposes, such as collecting text data from congressional websites and correlating political stances with money sources (perhaps by statistical analysis of political language, such as described in Natural Language Processing in Python).

Since West Code Choppers is an open-source developer group, Cerebro is developed as an open-source testbed to explore technology and its consequences. Development sessions on Cerebro involve teaching, discussion, and training, as well as coding.

Cerebro is the West Code Choppers social web mining tool. It is a public experiment to explore the following issues:

  1. What kind of personal information can be retrieved from the web?
  2. How would such a system be developed?
  3. How invasive would such a program be?
  4. What sort of due diligence is required from individuals who want to limit their exposure?

Structure

Cerebro is expected to consist of the following:

  1. A set of scrapers that parse public web sites.
  2. A spider that distributes and coordinates the scraping activity (with low observability).
  3. A data conversion utility to convert the incoming data into a common format.
  4. A set of knowledge bases to contain
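The scraper and data conversion pieces listed above could look roughly like the following. This is a sketch using only the Python standard library, not the project's actual code. The `<li class="member">` markup, the `MemberScraper` class, and the `Record` fields are all assumptions made for illustration. The spider and the knowledge bases are not covered.

```python
from dataclasses import dataclass, asdict
from html.parser import HTMLParser


class MemberScraper(HTMLParser):
    """Collects the text of <li class="member"> elements (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_member = False
        self.members = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "li" and ("class", "member") in attrs:
            self._in_member = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_member = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_member and text:
            self.members.append(text)


@dataclass
class Record:
    # Hypothetical common format shared by all scrapers.
    source: str
    name: str


def scrape_members(html):
    """Parse one page of HTML and return the member names found in it."""
    parser = MemberScraper()
    parser.feed(html)
    parser.close()
    return parser.members


def to_common_format(source, names):
    """Convert scraper output into plain dicts in the common format."""
    return [asdict(Record(source=source, name=n)) for n in names]
```

Each site would get its own scraper class, while `to_common_format` gives downstream components a single shape to work with regardless of where the data came from.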

Planning

8/28/13

  1. Overview of Cerebro concept.
  2. Overview of development process
  3. Introduction to scrapers
  4. Team coding of scrapers (Basic scraper will be provided for adaptation)
  5. Discussion of results

Engineering Resources

  1. DataSources
  2. References
  3. Scrapy