Scraping is an art
Scraping is a word that is trending these days. Formally, scraping means collecting or gathering all the data that is required. It is also termed web mining. Scraping is an “art in modern technology” that pulls out all the information which is available or required. It is an “art” because it is the final trick in the pocket for scraping through all the mess (the data).
Web scraping/mining is what you do when you use a snippet of code to fetch a web page from a website and parse its contents to extract some meaningful data. The important piece in scraping is HTML parsing. Because most browsers do not require clean HTML in order to render a page, much of the HTML on the web is not well formed, so you need an HTML parser that can make sense of it. The most common questions in web scraping are: what is the best, fastest, most reliable, or most efficient language for web scraping? What modules are necessary? And so on. These questions are answered below.
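To make the parsing step concrete, here is a minimal sketch that pulls links out of deliberately messy HTML. It uses Python's built-in html.parser module so that it runs without installing anything. The names LinkCollector, extract_links and MESSY_HTML are made up for this illustration, and the page is an inline string rather than a real download. In a real scraper the string would come from an actual HTTP request.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag we are currently inside, if any
        self._text = []     # pieces of text seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) tuples found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sloppy markup: unclosed <p> and <b>, an unquoted attribute,
# and no closing </html>. A browser would still render this.
MESSY_HTML = """
<html><body>
<p>First paragraph <b>bold
<p>See <a href=/docs>the docs</a> and <a href="/blog">the blog</a>
</body>
"""

links = extract_links(MESSY_HTML)
for href, text in links:
    print(href, "->", text)
```

The parser does not complain about the unclosed tags. It simply reports the tags it sees, which is the kind of tolerance a scraper needs.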
Scraping language:
A beginner wants to be efficient with learning time and to pick a language that will help with scraping the web. Without hesitation, I strongly recommend Python, and I will explain why.
Speed does not matter much for this kind of application, but ease of programming does, so Python and Beautiful Soup are the place to start.
Python is used throughout the world as a technical language because its syntax and semantics lend themselves to simplicity.
Let me also point out that processor performance is rarely the bottleneck in web mining/scraping; the input/output operations, mostly waiting on the network, are. If the goal is simply to build and maintain a scraping project, the choice of language is not that important: there are plenty of good scripting languages with capable libraries that perform efficiently.
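The point about input/output being the real cost can be seen in a small experiment. This is only a sketch under assumptions: fake_fetch is a made-up stand-in that sleeps instead of making a real request, and the example.com URLs are placeholders. It compares fetching pages one after another with fetching them from a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, delay=0.2):
    """Hypothetical stand-in for a network request: the sleep simulates I/O wait."""
    time.sleep(delay)
    return f"<html>{url}</html>"


def fetch_sequential(urls):
    """Fetch the pages one after another."""
    return [fake_fetch(u) for u in urls]


def fetch_concurrent(urls, workers=5):
    """Fetch the pages from a thread pool so the waits overlap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_fetch, urls))


urls = [f"https://example.com/page/{i}" for i in range(5)]

start = time.perf_counter()
pages_seq = fetch_sequential(urls)
sequential_seconds = time.perf_counter() - start

start = time.perf_counter()
pages_con = fetch_concurrent(urls)
concurrent_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f}s, concurrent: {concurrent_seconds:.2f}s")
```

Almost all of the time goes to waiting, not computing, so overlapping the waits helps far more than a faster language would.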
Conclusion, why Python:
There is nothing special about Python as a language that makes it specially suited to, or a better option for, this kind of operation. The popularity of a language in a specific domain is usually explained by the availability of libraries written to address that domain's problems. The good thing about Python is that it has a library for almost everything. This is why it is a handy go-to language for many developers: it lets you build a working project quickly.