Retrieves user profiles from social networks simultaneously. Send spiders to the web and gather social content therein!
- python setup.py install
- install Celery
- install Redis
- edit social_scraper/settings.py and add Facebook & Twitter auth tokens (see the sketch after this list)
- python run_tests.py
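The exact setting names are not specified here; a minimal sketch of what the auth section of social_scraper/settings.py might look like (the variable names below are assumptions, so check the file for the real keys):

```python
# social_scraper/settings.py (sketch; the setting names are illustrative
# assumptions, not necessarily the project's actual keys)

# Facebook Graph API access token
FACEBOOK_AUTH_TOKEN = "your-facebook-access-token"

# Twitter OAuth credentials
TWITTER_CONSUMER_KEY = "your-consumer-key"
TWITTER_CONSUMER_SECRET = "your-consumer-secret"
TWITTER_ACCESS_TOKEN = "your-access-token"
TWITTER_ACCESS_TOKEN_SECRET = "your-access-token-secret"
```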
- start_scraper

The server runs on port 8080 by default.
Be sure to run a Celery worker before you start:
celery -A social_scraper.webapi.celery worker
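The `-A` target above implies a `celery` application object living in social_scraper.webapi; a minimal sketch of that wiring, assuming the web API is a Flask app and that Redis (installed above) serves as the broker:

```python
# social_scraper/webapi.py (sketch; the Flask app and broker URL are assumptions)
from celery import Celery
from flask import Flask

app = Flask(__name__)

# This is the object the worker command points at with
# `-A social_scraper.webapi.celery`
celery = Celery(app.import_name, broker="redis://localhost:6379/0")
```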
curl -i http://localhost:8080/api/v0.1/users/twitter/sikorskiradek
curl -i http://localhost:8080/api/v0.1/users/facebook/barackobama
You may also access user profiles from a JS client or a web browser.
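For instance, a minimal Python client for the endpoint shown above (assuming the API responds with JSON; the response fields are not documented here):

```python
import requests

# Fetch a Twitter user profile through the scraper's REST API
response = requests.get(
    "http://localhost:8080/api/v0.1/users/twitter/sikorskiradek"
)
response.raise_for_status()
print(response.json())
```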
To run just a spider, type:
- scrapy crawl twitter -a username=<username>
- scrapy crawl facebook -a username=<username>
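In Scrapy, arguments passed with `-a` become keyword arguments of the spider's constructor, so the Twitter spider plausibly looks something like this sketch (the class body and URLs are illustrative assumptions, not the project's code):

```python
import scrapy


class TwitterSpider(scrapy.Spider):
    name = "twitter"

    def __init__(self, username=None, *args, **kwargs):
        super(TwitterSpider, self).__init__(*args, **kwargs)
        # `-a username=<username>` on the command line lands here
        self.username = username
        self.start_urls = ["https://twitter.com/%s" % username]

    def parse(self, response):
        # Illustrative extraction only; the real spider's fields are unknown
        yield {
            "username": self.username,
            "page_title": response.css("title::text").get(),
        }
```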
Scrapyd allows deploying spiders and starting and stopping them via a JSON web service.
- pip install scrapyd scrapyd-client
- scrapyd-deploy -p social_scraper
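Once deployed, jobs can be scheduled through Scrapyd's schedule.json endpoint (Scrapyd listens on port 6800 by default); extra POST fields are passed through to the spider as arguments:

```python
import requests

# Schedule the twitter spider on a local Scrapyd instance
response = requests.post(
    "http://localhost:6800/schedule.json",
    data={
        "project": "social_scraper",
        "spider": "twitter",
        "username": "sikorskiradek",  # forwarded as a spider argument
    },
)
print(response.json())  # e.g. {"status": "ok", "jobid": "..."}
```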
Job requests (spider runs) are initiated from the web server using Celery and sent to the Scrapy ecosystem, as sketched below.
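A hedged sketch of that hand-off, tying the pieces above together: the web server enqueues a Celery task, and the worker relays the job to Scrapyd (the route, task, and function names are illustrative assumptions):

```python
import requests
from celery import Celery
from flask import Flask, jsonify

app = Flask(__name__)
celery = Celery(app.import_name, broker="redis://localhost:6379/0")


@celery.task
def run_spider(network, username):
    # Worker side: relay the job request to Scrapyd's JSON API
    return requests.post(
        "http://localhost:6800/schedule.json",
        data={"project": "social_scraper", "spider": network, "username": username},
    ).json()


@app.route("/api/v0.1/users/<network>/<username>")
def get_user(network, username):
    # Web server side: enqueue the job and return without blocking
    run_spider.delay(network, username)
    return jsonify({"status": "scheduled"})
```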
Scrapy is written with Twisted, a popular event-driven networking framework for Python. Thus, it is implemented using non-blocking (aka asynchronous) code for concurrency.
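A toy illustration of the non-blocking model (not project code): two delayed calls are scheduled up front, and the single-threaded reactor interleaves them instead of waiting on each in turn:

```python
from twisted.internet import reactor, task


def done(name, delay):
    print("%s finished after %.1fs" % (name, delay))


# Both "jobs" are scheduled before either runs; neither blocks the other.
task.deferLater(reactor, 1.0, done, "facebook", 1.0)
task.deferLater(reactor, 0.5, done, "twitter", 0.5)

reactor.callLater(1.5, reactor.stop)
reactor.run()
```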
- LinkedIn spider