
# LinkedIn Scraper using Scrapy

- Scrape a chosen number of profiles from the results of a LinkedIn search URL.
- Export the scraped profile content to Excel and JSON files.

## Installation

1. Clone the project:

   ```shell
   git clone https://github.com/khaleddallah/GoogleImageScrapyDownloader.git
   ```

2. Install the dependencies with the pip package manager (Anaconda recommended):

   ```shell
   cd LinkedinScraperProject
   pip install -r requirements.txt
   ```

## Usage

- Enter the project directory:

  ```shell
  cd LinkedinScraperProject
  ```

- Show the help:

  ```shell
  python LinkedinScraper -h
  ```

  ```
  usage:
  python LinkedinScraper [-h] [-n NUM] [-o OUTPUT] [-p] [-f FORMAT] [-m EXCELMODE] (searchUrl or profilesUrl)

  positional arguments:
    searchUrl     URL of a LinkedIn search, or one or more profile URLs

  optional arguments:
    -h, --help    show this help message and exit
    -n NUM        number of profiles to scrape
                  ** must be less than or equal to the number of search results
                  'page' scrapes the profiles of one result page (10 profiles) (default)
    -o OUTPUT     output file name
    -p            enable parsing of profiles
    -f FORMAT     json   JSON output file
                  excel  Excel output file
                  all    both JSON and Excel output files
    -m EXCELMODE  1      each profile occupies one row in the Excel file
                  m      each profile spans multiple rows in the Excel file
  ```
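The difference between the two Excel modes can be sketched with plain row layouts. This is a minimal illustration, not the scraper's actual code; the field names (`name`, `positions`) are hypothetical, since the real columns depend on the scraper's output schema:

```python
# Hypothetical profile record; real field names depend on the scraper's schema.
profile = {"name": "Jane Doe", "positions": ["Engineer at A", "Intern at B"]}

# -m 1: one row per profile; multi-valued fields are joined into a single cell.
one_row = [[profile["name"], "; ".join(profile["positions"])]]

# -m m: one profile spans multiple rows, one row per position.
multi_row = [[profile["name"], pos] for pos in profile["positions"]]

print(one_row)    # one row for the whole profile
print(multi_row)  # one row per position
```

The one-row mode is convenient for filtering and sorting profiles as units, while the multi-row mode keeps each repeated field (such as positions) in its own cell.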


## Examples

```shell
python LinkedinScraper -p -o 'ABC' 'https://www.linkedin.com/in/khaled-dallah/' 'https://www.linkedin.com/in/linustorvalds/'
python LinkedinScraper -n 23 'https://www.linkedin.com/search/results/all/?keywords=Robotic&origin=GLOBAL_SEARCH_HEADER'
python LinkedinScraper -n 17 -f excel -m 1 'https://www.linkedin.com/search/results/all/?keywords=Robotic&origin=GLOBAL_SEARCH_HEADER'
```
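The JSON export can be post-processed with standard tooling. A minimal sketch, assuming the first example's `-o 'ABC'` with JSON output produces a file named `ABC.json` containing a list of profile objects (the records and field names below are made up for illustration; the real schema is whatever the scraper emits):

```python
import json
import os
import tempfile

# Stand-in records mimicking what a profile export might contain;
# the actual field names depend on the scraper's output schema.
profiles = [
    {"name": "Jane Doe", "headline": "Robotics Engineer"},
    {"name": "John Roe", "headline": "Kernel Developer"},
]

# Write a sample file, then read it back the way a scraped ABC.json
# could be consumed by downstream scripts.
path = os.path.join(tempfile.gettempdir(), "ABC.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(profiles, f, ensure_ascii=False, indent=2)

with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

for p in loaded:
    print(p["name"], "-", p["headline"])
```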

## Built with

- Python 3.7
- Scrapy
- openpyxl

## Author

## Issues

Report bugs and feature requests here.

## Contribute

Contributions are always welcome!

## License

This project is licensed under the LGPL-3.0 license. See the LICENSE.md file for details.