Web Scraping using Python

These are the practice scripts I used to practice web scraping.


REGEX Cheatsheet

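| Character | Example | Definition |
| --- | --- | --- |
| `*` | `a*b*` | Matches the previous character 0 or more times |
| `+` | `a+b+` | Matches the previous character 1 or more times |
| `[ ]` | `[a-z]` | Matches any one character from a to z |
| `[^ ]` | `[^a-z]` | Matches any one character that is not from a to z |
| `()` | `(ab)` | A grouped subexpression; groups are evaluated first |
| `\|` | `(foo\|foot)s` | Or: matches either one of the alternative expressions |
| `{m,n}` | `a{2,3}` | Matches the preceding character between m and n times, inclusive |
| `.` | `b.d` | Matches any single character (except a newline, by default) |
| `^` | `^a` | Anchors the expression at the beginning of the string |
| `\` | `\^` | An escape character; the next special character is treated as a literal |
| `$` | `[A-Z]*$` | Usually placed at the end of an expression; matches the end of the string |
| `?!` | `^((?![A-Z]).)*$` | Does not contain: this example matches only strings with no uppercase letter |
| `?` | `(swimming )?pool` | Makes the previous expression optional |
| `??` | `(swimming )??pool` | Lazy version of `?`: tries to match zero occurrences first |
| `(?=)` | `A(?=B)` | Positive lookahead: matches an A only when it is followed by a B, as in AB or ABC |
| `(?!)` | `A(?!B)` | Negative lookahead: matches an A only when B does not follow it |
| `(?<=)` | `(?<=B)A` | Positive lookbehind: matches an A only when B precedes it |
| `(?<!)` | `(?<!B)A` | Negative lookbehind: matches an A only when B does not precede it |
| `(?>)` | `(?>foo\|foot)s` | Atomic group: once an alternative has matched, the remaining alternatives are thrown away and the engine will not backtrack into the group |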
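A few rows of the cheatsheet can be checked directly with Python's built-in `re` module. The sketch below is only an illustration: the sample strings and variable names are made up for this example and do not come from the practice scripts.

```python
import re

# Quantifiers: * allows zero occurrences, + requires at least one.
star_empty = bool(re.fullmatch(r"a*b*", ""))   # zero a's and zero b's is fine
plus_needs_a = bool(re.fullmatch(r"a+b+", "b"))  # fails, there is no 'a'

# {m,n}: the preceding character repeated between m and n times.
counted = [bool(re.fullmatch(r"a{2,3}", s)) for s in ("a", "aa", "aaa", "aaaa")]

# Character class and negated character class.
lower = re.findall(r"[a-z]", "aB3c")
not_lower = re.findall(r"[^a-z]", "aB3c")

# Alternation inside a group: the engine backtracks from 'foo' to 'foot'.
alternation = bool(re.fullmatch(r"(foo|foot)s", "foots"))

# Optional group, and greedy versus lazy '?'.
optional = [bool(re.fullmatch(r"(swimming )?pool", s))
            for s in ("pool", "swimming pool")]
greedy = re.match(r"a?", "a").group()   # takes the 'a'
lazy = re.match(r"a??", "a").group()    # prefers the empty match

# "Does not contain": no position may start with an uppercase letter.
no_upper = [bool(re.fullmatch(r"^((?![A-Z]).)*$", s))
            for s in ("no capitals", "Has Capital")]

# Lookarounds: record where each A is matched.
lookahead = [m.start() for m in re.finditer(r"A(?=B)", "AB AC")]
neg_lookahead = [m.start() for m in re.finditer(r"A(?!B)", "AB AC")]
lookbehind = [m.start() for m in re.finditer(r"(?<=B)A", "BA CA")]
neg_lookbehind = [m.start() for m in re.finditer(r"(?<!B)A", "BA CA")]

print(counted, lower, not_lower, no_upper)
print(lookahead, neg_lookahead, lookbehind, neg_lookbehind)
```

Atomic groups are left out of the sketch because Python's `re` module only accepts the `(?>...)` syntax from version 3.11 onward.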

### BeautifulSoup4

It is a Python library used for scraping websites.

It may need to be installed first. I used `pip-3.6 install beautifulsoup4`.

The BeautifulSoup library builds a data structure out of the HTML document, enabling the user to manipulate HTML tags as data objects. This is very useful when one wants to traverse links.

One can create a BeautifulSoup object by passing the HTML document and a parser.
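As a rough illustration of tags behaving like data objects, the sketch below parses a small hand-written page and collects its links. The HTML snippet and the variable names are invented for this example; it assumes `beautifulsoup4` is installed and uses the `html.parser` parser that ships with Python.

```python
from bs4 import BeautifulSoup

# A tiny made-up page; a real script would load downloaded HTML instead.
html_doc = """
<html><body>
  <h1>Example page</h1>
  <a href="/chapter_1">Chapter 1</a>
  <a href="/chapter_2">Chapter 2</a>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# A tag is an object with a name, text, and attributes.
heading = soup.h1
print(heading.name)
print(heading.get_text())

# find_all returns every matching tag; attributes are read like dict keys.
links = [a["href"] for a in soup.find_all("a")]
print(links)
```

Collecting the `href` values this way is usually the first step of a crawler, since each value can then be requested and parsed in turn.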

soup = BeautifulSoup(html_doc, 'html.parser')

One can view the HTML page with:

print(soup.prettify())
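Put together, the two calls above might look like the following minimal sketch. The one-line `html_doc` string is a stand-in made up for this example, and `html.parser` is assumed as the parser because it needs no extra install.

```python
from bs4 import BeautifulSoup

# Stand-in document; in practice this would be the HTML of the target site.
html_doc = "<html><body><p>Hello</p></body></html>"

# Parse the document with Python's built-in HTML parser.
soup = BeautifulSoup(html_doc, "html.parser")

# prettify() returns the parsed tree as a string, one tag per line, indented.
pretty = soup.prettify()
print(pretty)
```

Printing the prettified tree is a quick way to check what the parser actually understood before writing any selectors against it.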

GoranTopic/Web-Scrapping-with-Python

Folders and files

Latest commit

History

Repository files navigation

Web Scrappoing usin Python

This are the parctice scrips used to practice web scapping

REGEX Cheatsheet

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages