GitHub - bltomlin/WebScraper: Python web scraper that uses the BeautifulSoup library to parse web data and save it to local files.

WebScraper

A program that demonstrates sending HTTP requests and processing the responses, working with an external library and its documentation, managing the file system, and parsing website data.

Installation

# clone the repo
$ git clone https://github.com/bltomlin/WebScraper

# change the working directory to WebScraper
$ cd WebScraper

# install beautifulsoup4
$ pip install beautifulsoup4

Usage

The program first prompts for the number of pages to scan from nature.com's 2020 articles list:

Enter the number of pages you wish to scrape:

The program then prompts for an article type:

Enter the article type you would like results for:

You can enter any of these types:

Article
Author Correction
Book Review
Career Column
Comment
Correspondence
Editorial
Futures
Nature Briefing
Nature Index
Nature Podcast
News
News & Views
News Feature
News Round-Up
Outlook
Publisher Correction
Research Highlight
Where I Work
World View
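
One way to match what the user types against the list above is sketched here. This is an illustration only, not the code in scraper.py: the `ARTICLE_TYPES` constant and the `normalize_article_type` helper are hypothetical names, and ignoring case and extra spaces is an assumption about how forgiving the prompt should be.

```python
# Hypothetical constant: the article types listed in this README.
ARTICLE_TYPES = [
    "Article", "Author Correction", "Book Review", "Career Column",
    "Comment", "Correspondence", "Editorial", "Futures",
    "Nature Briefing", "Nature Index", "Nature Podcast", "News",
    "News & Views", "News Feature", "News Round-Up", "Outlook",
    "Publisher Correction", "Research Highlight", "Where I Work",
    "World View",
]


def normalize_article_type(user_input):
    """Return the canonical type name matching user_input, or None.

    Matching ignores case and collapses runs of whitespace
    (an assumption, not confirmed behaviour of scraper.py).
    """
    wanted = " ".join(user_input.split()).casefold()
    for article_type in ARTICLE_TYPES:
        if article_type.casefold() == wanted:
            return article_type
    return None
```

With this helper, an entry such as `news & views` would resolve to `News & Views`, while an unknown type returns `None` so the program could ask again.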

The program will then scan each page for the desired article type and save the articles to a text file locally in the WebScraper directory.
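
The scan-and-save step could look roughly like the sketch below. It is not taken from scraper.py. The HTML in `SAMPLE_PAGE`, the CSS class names, and the helpers `find_articles`, `title_to_filename`, and `save_articles` are all hypothetical, and writing one file per article named after its title is an assumption. The real nature.com markup may differ, and fetching the page over HTTP is left out so the example runs offline.

```python
import string
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_PAGE = """
<article>
  <a class="title" href="/articles/a1">Moon dust: a new analysis</a>
  <span class="type">News</span>
  <p class="summary">Samples reveal surprising chemistry.</p>
</article>
<article>
  <a class="title" href="/articles/a2">A correction</a>
  <span class="type">Author Correction</span>
  <p class="summary">An amendment to an earlier paper.</p>
</article>
"""


def find_articles(html, article_type):
    """Return (title, summary) pairs for articles of the given type."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for article in soup.find_all("article"):
        type_tag = article.find("span", class_="type")
        link = article.find("a", class_="title")
        summary = article.find("p", class_="summary")
        if not (type_tag and link and summary):
            continue  # skip entries missing any expected element
        if type_tag.get_text(strip=True) == article_type:
            found.append((link.get_text(strip=True),
                          summary.get_text(strip=True)))
    return found


def title_to_filename(title):
    """Drop ASCII punctuation and join the remaining words with underscores."""
    cleaned = title.translate(str.maketrans("", "", string.punctuation))
    return "_".join(cleaned.split()) + ".txt"


def save_articles(html, article_type, directory="."):
    """Write each matching article to its own text file; return the paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    saved = []
    for title, summary in find_articles(html, article_type):
        path = directory / title_to_filename(title)
        path.write_text(summary, encoding="utf-8")
        saved.append(path)
    return saved
```

Calling `save_articles(SAMPLE_PAGE, "News")` would write a single file for the one matching article in the sample and skip the Author Correction entry.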

