tripadvisor-singapore

A Python script to scrape reviews from TripAdvisor, then perform sentiment analysis and a word frequency count. It builds on the CS50 Sentiments assignment.
https://captmomo.github.io/tripadvisor-singapore-zoo/

Description

I wrote one Python script to scrape reviews from TripAdvisor and process the raw text, and another to run a sentiment analysis and word frequency count. I used the CS50 Sentiments project as a starting point. Scraping was done with a combination of Beautiful Soup and Selenium; the analysis was done with NLTK.

The results are quite different from what is shown on the TripAdvisor page.

[Image: tripadvisor]

Sentiment analysis pie chart

Results using VADER sentiment intensity analyzer

[Image: vader plot]

Results from totaling the score for each review:

[Image: individual reviews]

Results from analyzing the entire text as a whole:

[Image: entire text]
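
The two non-VADER charts can give different pictures because they aggregate at different levels. The sketch below is only an illustration of that difference, not the script in this repo: it assumes a CS50-style scorer (+1 for each positive word, -1 for each negative word), uses tiny made-up word lists in place of the real lexicons, and treats "analyzing the entire text as a whole" as counting sentiment words across all reviews joined together, which is one plausible reading. The names `score`, `per_review_labels` and `whole_text_counts` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons; the real word lists are far larger.
POSITIVE = {"great", "good", "amazing", "best", "nice"}
NEGATIVE = {"bad", "crowded", "expensive", "hot", "boring"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score(text):
    """Number of positive words minus number of negative words."""
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokenize(text))


def label(value):
    """Map a numeric score to a sentiment label."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"


def per_review_labels(reviews):
    """Score each review separately, then count the labels."""
    return Counter(label(score(r)) for r in reviews)


def whole_text_counts(reviews):
    """Join all reviews and count positive and negative words overall."""
    tokens = tokenize(" ".join(reviews))
    return Counter(
        "positive" if t in POSITIVE else "negative"
        for t in tokens
        if t in POSITIVE or t in NEGATIVE
    )


reviews = [
    "Great zoo, amazing animals",
    "Too hot and crowded but good food",
    "We took the tram",
]
print(per_review_labels(reviews))
print(whole_text_counts(reviews))
```

In the per-review view a long, mildly negative review counts as much as a short glowing one, while in the whole-text view every sentiment word carries equal weight, so the proportions in the two pie charts need not agree.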

Top 50 most frequently used words

You may notice a difference between my results and TripAdvisor's most talked about topics. I think this is because they count the frequency of n-grams, whereas I counted single words. For the animal counts, after tokenizing the text I singularized (is that a word?) each word and compared it to a list of animals I obtained from the Singapore Zoo Wikipedia page.

Word Occurrences
zoo 20561
animals 12605
singapore 6001
see 5729
day 5504
well 5305
great 4819
get 4554
good 4076
time 3998
around 3962
one 3916
visit 3899
safari 3711
kids 3350
night 3180
place 3174
also 3099
breakfast 2854
really 2773
best 2722
shows 2633
would 2539
take 2520
go 2513
like 2493
tram 2393
show 2383
many 2364
animal 2349
experience 2228
food 2212
worth 2089
close 1942
water 1901
park 1899
orangutans 1783
zoos 1781
walk 1777
much 1758
must 1723
feeding 1691
enclosures 1570
amazing 1540
lot 1538
nice 1532
area 1530
children 1488
lots 1485
bus 1483
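
To illustrate why single-word counts like the ones above can differ from n-gram based topics, here is a minimal sketch that counts both unigrams and bigrams over the same tokens. It uses only the standard library rather than NLTK, the stopword set is a small made-up stand-in, and `tokenize` and `ngram_counts` are hypothetical names, not functions from this repo. Note that removing stopwords first means some bigrams span words that were not adjacent in the original sentence.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real run would use a much fuller one.
STOPWORDS = {"the", "a", "is", "and", "was", "at", "to", "of", "we"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


text = "The night safari was great. We loved the night safari and the tram."
tokens = tokenize(text)
print(ngram_counts(tokens, 1).most_common(3))
print(ngram_counts(tokens, 2).most_common(3))
```

With single words, "night" and "safari" are counted as separate entries; with bigrams, "night safari" shows up as one topic, which is closer to how a "most talked about" list tends to read.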

Animals mentioned

Animal Occurrences
orangutan 2160
bear 1523
monkey 1057
lion 674
lemur 271
snake 234
kangaroo 176
baboon 174
hippo 147
zebra 130
penguin 115
cheetah 112
leopard 100
otter 90
komodo 80
sloth 80
dog 76
deer 74
goat 66
fox 66
parrot 47
lizard 42
python 42
tapir 33
tamarin 32
meerkat 31
flamingo 27
gibbon 23
rabbit 19
rat 19
cobra 16
mole 14
warthog 12
arapaima 10
iguana 8
panther 8
raccoon 8
pig 7
babirusa 7
hog 7
boa 7
saki 5
giraffe 4
falabella 3
rhinoceros 3
hippopotamus 1
terrapin 1
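
The animal matching described earlier (tokenize, singularize, compare against a list of animals) can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual script: the animal set is a small hand-picked subset, `singularize` is a deliberately naive stand-in for a proper inflection library, and `animal_counts` is a hypothetical name. A token is checked as-is before it is singularized, so that names already ending in "s", such as "rhinoceros", are not mangled.

```python
import re
from collections import Counter

# Hypothetical subset; the real list came from the Singapore Zoo Wikipedia page.
ANIMALS = {"orangutan", "monkey", "lion", "zebra", "fox", "rhinoceros"}


def singularize(word):
    """Very naive singularizer; handles only a few common English plurals."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith(("xes", "ches", "shes")):
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def animal_counts(text, animals=ANIMALS):
    """Count mentions of known animals, folding plurals into singulars."""
    counts = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in animals:
            counts[token] += 1
        else:
            singular = singularize(token)
            if singular in animals:
                counts[singular] += 1
    return counts


text = (
    "The orangutans were great, one orangutan waved. "
    "Monkeys, lions and foxes everywhere; the zebras hid near the rhinoceros."
)
print(animal_counts(text).most_common())
```

Because matching is done on exact names, a short form and a full form (for example "hippo" and "hippopotamus" in the table above) are counted as separate entries.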