Python script to scrape reviews from Tripadvisor, then run a sentiment analysis and word frequency count on them.
This builds on the CS50 Sentiments assignment.
https://captmomo.github.io/tripadvisor-singapore-zoo/
I wrote one Python script to scrape reviews from Tripadvisor and process the raw text, and another to run a sentiment analysis and word frequency count. I used the CS50 Sentiments project as a starting point. Scraping was done with a combination of Beautiful Soup and Selenium, and the analysis was done with NLTK.
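To make the analysis step concrete, here is a minimal sketch of word-list sentiment scoring in the style of the CS50 Sentiments assignment, plus a word frequency count. It is not the actual script: the `POSITIVE`, `NEGATIVE` and `STOPWORDS` sets and the sample reviews are tiny hypothetical stand-ins, and a regex tokenizer replaces NLTK so the example runs without extra downloads.

```python
import re
from collections import Counter

# Hypothetical, tiny word lists (assumption); the real lists are much larger.
POSITIVE = {"great", "good", "amazing", "nice", "best"}
NEGATIVE = {"bad", "boring", "crowded", "expensive", "dirty"}

# Hypothetical stop-word subset (assumption); NLTK provides a fuller list.
STOPWORDS = {"the", "a", "an", "and", "is", "was", "were", "to", "of", "in", "it", "but"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_score(text):
    """Return (# positive words) - (# negative words) for one review."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def word_frequencies(reviews):
    """Count every non-stop-word token across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(t for t in tokenize(review) if t not in STOPWORDS)
    return counts


# Made-up sample reviews, for illustration only.
reviews = [
    "The zoo was great and the animals were amazing.",
    "Crowded and expensive, but the zoo is nice.",
]

for review in reviews:
    print(sentiment_score(review), review)
print(word_frequencies(reviews).most_common(3))
```

A score above zero reads as positive, below zero as negative. Swapping `tokenize` for an NLTK tokenizer and `STOPWORDS` for NLTK's English stop-word list would bring the sketch closer to the setup described above.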
You may notice that my results are quite different from Tripadvisor’s most talked about topics on the review page. I think this is because they are counting the frequency of n-grams. What I did instead was, after tokenizing the text, singularize (is that a word?) each word and compare it to a list of animals I obtained from the Singapore Zoo Wikipedia page.
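The difference between the two approaches can be sketched as follows. This is an illustration under assumptions, not the original script: `ANIMALS` is a hypothetical subset of the Wikipedia list, `singularize` is a deliberately naive rule-based stand-in for whatever singularizer was actually used, and the reviews are made up.

```python
import re
from collections import Counter

# Hypothetical subset of the animal list (assumption).
ANIMALS = {"orangutan", "monkey", "lion", "zebra", "fox"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def singularize(word):
    """Naive singularizer (assumption); it mangles words such as 'bus'."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith(("xes", "ches", "shes", "sses")):
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def count_animals(reviews):
    """Count singularized tokens that appear in the animal list."""
    counts = Counter()
    for review in reviews:
        for token in tokenize(review):
            singular = singularize(token)
            if singular in ANIMALS:
                counts[singular] += 1
    return counts


def count_bigrams(reviews):
    """Count adjacent word pairs (2-grams) within each review."""
    counts = Counter()
    for review in reviews:
        tokens = tokenize(review)
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Made-up sample reviews, for illustration only.
reviews = [
    "We loved the orangutans and the monkeys.",
    "Breakfast with an orangutan, then the night safari.",
    "The night safari had lions and foxes.",
]

print(count_animals(reviews).most_common())
print(count_bigrams(reviews).most_common(2))
```

Matching single words against a fixed list merges "orangutans" and "orangutan" into one entry, while n-gram counting surfaces phrases such as "night safari" that a single-word list never sees. That would be one plausible reason the two rankings diverge.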

Word | Occurrences |
---|---|
zoo | 20561 |
animals | 12605 |
singapore | 6001 |
see | 5729 |
day | 5504 |
well | 5305 |
great | 4819 |
get | 4554 |
good | 4076 |
time | 3998 |
around | 3962 |
one | 3916 |
visit | 3899 |
safari | 3711 |
kids | 3350 |
night | 3180 |
place | 3174 |
also | 3099 |
breakfast | 2854 |
really | 2773 |
best | 2722 |
shows | 2633 |
would | 2539 |
take | 2520 |
go | 2513 |
like | 2493 |
tram | 2393 |
show | 2383 |
many | 2364 |
animal | 2349 |
experience | 2228 |
food | 2212 |
worth | 2089 |
close | 1942 |
water | 1901 |
park | 1899 |
orangutans | 1783 |
zoos | 1781 |
walk | 1777 |
much | 1758 |
must | 1723 |
feeding | 1691 |
enclosures | 1570 |
amazing | 1540 |
lot | 1538 |
nice | 1532 |
area | 1530 |
children | 1488 |
lots | 1485 |
bus | 1483 |

Animal | Occurrences |
---|---|
orangutan | 2160 |
bear | 1523 |
monkey | 1057 |
lion | 674 |
lemur | 271 |
snake | 234 |
kangaroo | 176 |
baboon | 174 |
hippo | 147 |
zebra | 130 |
penguin | 115 |
cheetah | 112 |
leopard | 100 |
otter | 90 |
komodo | 80 |
sloth | 80 |
dog | 76 |
deer | 74 |
goat | 66 |
fox | 66 |
parrot | 47 |
lizard | 42 |
python | 42 |
tapir | 33 |
tamarin | 32 |
meerkat | 31 |
flamingo | 27 |
gibbon | 23 |
rabbit | 19 |
rat | 19 |
cobra | 16 |
mole | 14 |
warthog | 12 |
arapaima | 10 |
iguana | 8 |
panther | 8 |
raccoon | 8 |
pig | 7 |
babirusa | 7 |
hog | 7 |
boa | 7 |
saki | 5 |
giraffe | 4 |
falabella | 3 |
rhinoceros | 3 |
hippopotamus | 1 |
terrapin | 1 |