Python web scraper that uses the BeautifulSoup library to parse web data into local files.

bltomlin/WebScraper


WebScraper

A program that demonstrates sending HTTP requests and processing the responses, working with an external library and its documentation, managing files on the local file system, and parsing website data.

Installation

# clone the repo
$ git clone https://github.com/bltomlin/WebScraper

# change the working directory to WebScraper
$ cd WebScraper

# install beautifulsoup4
$ pip install beautifulsoup4

Usage

The program first asks how many pages to scan from nature.com's 2020 articles list:

Enter the number of pages you wish to scrape:
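
As a rough illustration of this first step, the sketch below turns the answer into a page count and builds one listing URL per page. The base URL and the `year`/`page` query parameters are assumptions (the README only names "nature.com's 2020 articles list"), and `parse_page_count` and `page_urls` are hypothetical helper names, not functions from this repository.

```python
from urllib.parse import urlencode

# Assumed listing URL; the README does not state the exact address.
BASE_URL = "https://www.nature.com/nature/articles"


def parse_page_count(raw):
    """Convert the user's answer into a positive page count."""
    count = int(raw.strip())  # raises ValueError for non-numeric input
    if count < 1:
        raise ValueError("page count must be at least 1")
    return count


def page_urls(count, year=2020):
    """Build one listing URL per page, numbered from 1 (assumed query format)."""
    return [
        f"{BASE_URL}?{urlencode({'year': year, 'page': page})}"
        for page in range(1, count + 1)
    ]
```

A caller could pass the text returned by `input("Enter the number of pages you wish to scrape: ")` to `parse_page_count` and then request each URL in turn.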

The program then asks for an article type:

Enter the article type you would like results for:

You can enter any of these types:

  • Article
  • Author Correction
  • Book Review
  • Career Column
  • Comment
  • Correspondence
  • Editorial
  • Futures
  • Nature Briefing
  • Nature Index
  • Nature Podcast
  • News
  • News & Views
  • News Feature
  • News Round-Up
  • Outlook
  • Publisher Correction
  • Research Highlight
  • Where I Work
  • World View
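
One way to check the answer against the list above is a case-insensitive lookup. This is a minimal sketch: the README does not say whether the program validates or normalizes the input, and `normalize_article_type` is a hypothetical helper.

```python
# The article types listed in this README.
ARTICLE_TYPES = [
    "Article", "Author Correction", "Book Review", "Career Column",
    "Comment", "Correspondence", "Editorial", "Futures",
    "Nature Briefing", "Nature Index", "Nature Podcast", "News",
    "News & Views", "News Feature", "News Round-Up", "Outlook",
    "Publisher Correction", "Research Highlight", "Where I Work",
    "World View",
]


def normalize_article_type(raw):
    """Return the listed spelling of a type, or None if it is not in the list."""
    # Collapse runs of whitespace and ignore case before comparing.
    wanted = " ".join(raw.split()).casefold()
    for name in ARTICLE_TYPES:
        if name.casefold() == wanted:
            return name
    return None
```

Returning `None` for unknown input lets the caller decide whether to ask again or stop.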

The program then scans each page for the requested article type and saves the matching articles to a text file locally in the WebScraper directory.
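
The scan-and-save step might look roughly like the sketch below, which filters listing entries with BeautifulSoup and writes text to disk. Several details are assumptions rather than facts about this repository: the markup (an `article` element containing a `span` with `data-test="article.type"`), writing one file per article, the file naming scheme, and the helper names `find_articles`, `safe_filename`, and `save_article`.

```python
import re
from pathlib import Path

from bs4 import BeautifulSoup


def find_articles(html, article_type):
    """Return (title, href) pairs for listing entries of the requested type."""
    soup = BeautifulSoup(html, "html.parser")
    matches = []
    for item in soup.find_all("article"):
        # Hypothetical markup: the type label sits in a span with this attribute.
        label = item.find("span", attrs={"data-test": "article.type"})
        link = item.find("a", href=True)
        if label is None or link is None:
            continue  # skip entries that do not have the expected shape
        if label.get_text(strip=True) == article_type:
            matches.append((link.get_text(strip=True), link["href"]))
    return matches


def safe_filename(title):
    """Turn a title into a file name: drop punctuation, join words with '_'."""
    words = re.findall(r"\w+", title)
    return "_".join(words) + ".txt"


def save_article(directory, title, body):
    """Write one article body to a text file in directory and return its path."""
    path = Path(directory) / safe_filename(title)
    path.write_text(body, encoding="utf-8")
    return path
```

Taking the HTML as a string keeps the parsing logic independent of how the page is fetched, so it can be tried on a saved copy of a listing page before any requests are sent.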
