This repo contains course projects for Professor Torsten Suel's Web Search Engines course.
-
Jcrawler: a primitive multi-threaded focused web crawler that collects pages from the web, concentrating on given keywords. Language: Python.
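
A focused crawler usually visits the most promising pages first. The following is only a minimal sketch of that idea, not Jcrawler's actual code: `keyword_score` and `Frontier` are hypothetical names, the score is a plain keyword count, and fetching and threading are left out.

```python
import heapq


def keyword_score(text, keywords):
    """Count case-insensitive occurrences of each keyword in the text."""
    lowered = text.lower()
    return sum(lowered.count(k.lower()) for k in keywords)


class Frontier:
    """Priority queue of URLs; higher-scoring URLs are popped first.

    Hypothetical helper for illustration, not part of Jcrawler.
    """

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker: equal scores pop in insertion order

    def push(self, url, score):
        # Ignore URLs that were already queued once.
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so negate the score to pop the largest first.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```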
-
indexer: a C++ program that parses web pages, builds an inverted index, and generates the final index for later query processing. It involves large-scale data processing and file compression (var-byte).
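
As an illustration of var-byte compression, here is a minimal Python sketch (the indexer itself is written in C++). It assumes one common convention: 7 payload bits per byte, low-order bits first, with the high bit set on every byte except the last of a number. The convention used by the indexer may differ, and the function names are hypothetical.

```python
def varbyte_encode(numbers):
    """Encode non-negative integers as variable-length bytes."""
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("var-byte encodes non-negative integers only")
        # Emit 7 bits at a time; the high bit marks "more bytes follow".
        while n >= 128:
            out.append((n & 0x7F) | 0x80)
            n >>= 7
        out.append(n)  # final byte of this number has the high bit clear
    return bytes(out)


def varbyte_decode(data):
    """Decode bytes produced by varbyte_encode back into integers."""
    numbers = []
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation byte: keep accumulating
        else:
            numbers.append(value)
            value = 0
            shift = 0
    return numbers
```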
-
query processor: uses the previously built inverted index to answer users' search queries.
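
One way to answer a query from an inverted index is to intersect the posting lists of its terms. The sketch below assumes AND semantics, sorted lists of document IDs, and an in-memory dict as the index; none of these are confirmed details of this project, and the function names are hypothetical.

```python
def intersect(a, b):
    """Intersect two sorted lists of document IDs with a linear merge."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def conjunctive_query(index, terms):
    """Return IDs of documents that contain every term (AND semantics)."""
    postings = [index.get(term, []) for term in terms]
    if not postings:
        return []
    # Start from the shortest list so intermediate results stay small.
    postings.sort(key=len)
    result = list(postings[0])
    for plist in postings[1:]:
        result = intersect(result, plist)
        if not result:
            break  # no document can match any more
    return result
```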
-
Foursquare crawler and recommendation system: includes a crawler that collects user, venue, rating, and check-in information from Foursquare, Twitter, and Facebook, then applies machine learning algorithms (collaborative filtering, SVD, etc.) to recommend friends and venues to users.
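
To give a feel for collaborative filtering, here is a minimal user-based sketch in plain Python. It is an assumption-laden illustration, not this project's code: ratings are a dict of dicts, similarity is cosine over sparse rating vectors, and the names `cosine` and `recommend` are hypothetical. SVD is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[item] * v[item] for item in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(ratings, user, top_n=3):
    """Rank venues the user has not rated by similarity-weighted average rating."""
    own = ratings[user]
    scores = {}
    weights = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(own, other_ratings)
        if sim <= 0:
            continue  # skip users with nothing in common
        for venue, rating in other_ratings.items():
            if venue in own:
                continue  # only recommend unseen venues
            scores[venue] = scores.get(venue, 0.0) + sim * rating
            weights[venue] = weights.get(venue, 0.0) + sim
    predicted = {venue: scores[venue] / weights[venue] for venue in scores}
    # Highest predicted rating first; ties broken alphabetically.
    return sorted(predicted, key=lambda venue: (-predicted[venue], venue))[:top_n]
```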