Uses: Scrapy, Selenium WebDriver, headless Chromium, Docker, and Python 3.
The first spider aims to visit as many LinkedIn user pages as possible :-D. The objective is to gain visibility with your account, since LinkedIn notifies users when someone visits their page.
The second spider collects all the users working for a given company on LinkedIn:
- It goes to the company's front page;
- Clicks on the "See all 1M employees" button;
- Starts collecting User-related Scrapy items.
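To picture what the spider collects, here is a plain-Python sketch of a User item; the field names below are illustrative assumptions, not the project's actual Scrapy item definition:

```python
from dataclasses import dataclass


# Illustrative sketch only: these fields are assumptions about what a
# User item might carry, not the project's real item class.
@dataclass
class UserItem:
    name: str
    title: str
    company: str
    profile_url: str


user = UserItem(
    name="Jane Doe",
    title="Software Engineer",
    company="ACME Corp",
    profile_url="https://www.linkedin.com/in/janedoe",
)
print(user.name)  # -> Jane Doe
```

In the real project the fields live in a Scrapy Item subclass so they can flow through item pipelines; a dataclass is used here only to keep the sketch dependency-free.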
Needed:
- docker;
- docker-compose;
- VNC viewer, like vinagre (ubuntu);
- python3.6;
- virtualenv;
Install Docker from the official website: https://www.docker.com/
Install a VNC viewer if you do not have one. For Ubuntu, go for vinagre:
sudo apt-get update
sudo apt-get install vinagre
Copy conf_template.py to conf.py and fill in the quotes with your credentials.
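A minimal sketch of what a filled-in conf.py might look like; the variable names here are assumptions, so check conf_template.py for the real ones:

```python
# conf.py -- illustrative sketch; the actual variable names come
# from conf_template.py and may differ from these assumptions.
EMAIL = "your_linkedin_email@example.com"
PASSWORD = "your_linkedin_password"
```

Keep conf.py out of version control, since it holds your LinkedIn credentials in plain text.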
This runs only the linkedin spider, not the companies spider. Open your terminal, move to the project folder, and type:
docker-compose up -d --build
Open vinagre and enter the address and port localhost:5900. The password is secret.
Alternatively:
vinagre localhost:5900
or
make view
To stop, use your terminal again and type in the same window:
docker-compose down
Set up your Python virtual environment (trivial but mandatory):
virtualenv -p python3.6 .venv
source .venv/bin/activate
pip install -r requirements.txt
Create the Selenium server, open the VNC window, and launch the tests; type these in three different terminals in the project folder:
make dev
make view
make tests
For more details, have a look at the Makefile (here it is used for shortcuts, not for building).
- Development:
scrapy crawl companies -a selenium_hostname=localhost
or
scrapy crawl linkedin -a selenium_hostname=localhost
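The -a flag passes keyword arguments to the spider's constructor, which is how selenium_hostname reaches the spider. A minimal sketch of that Scrapy pattern, using a stand-in class rather than the project's actual spider code:

```python
# Stand-in class showing the Scrapy convention of receiving
# `-a name=value` command-line arguments as constructor kwargs.
# This is not the project's real spider, just the pattern it relies on.
class LinkedinSpiderSketch:
    name = "linkedin"

    def __init__(self, selenium_hostname="localhost", **kwargs):
        # Scrapy forwards `-a selenium_hostname=...` here as a kwarg,
        # so the spider knows which host runs the Selenium server.
        self.selenium_hostname = selenium_hostname


spider = LinkedinSpiderSketch(selenium_hostname="localhost")
print(spider.selenium_hostname)  # -> localhost
```

Running `scrapy crawl linkedin -a selenium_hostname=localhost` is therefore equivalent to constructing the spider with `selenium_hostname="localhost"`.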