Note: The crawler folder is deprecated. All code has been moved into the website folder. Be careful to configure the paths if you want to rerun the code.
- Text Search Mode
- Faceted search
- Search statistics
- Spell check
- Price intervals, multiple ranking rules, and filter rules
- Image Search Mode
- Combination Search Mode: Text with Image
- Smart Recommendation System
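The text-search features above (full-text matching, a price-interval filter, and facet counts) map naturally onto a single Elasticsearch query body. The sketch below builds such a body; the index and field names (`products`, `title`, `price`, `category`) are illustrative assumptions, not taken from the project code.

```python
# Sketch of an Elasticsearch query body combining full-text search,
# a price-interval filter, and a category facet (terms aggregation).
# Field names are assumptions; see the project's server code for the
# actual mapping.

def build_query(text, min_price=None, max_price=None):
    """Return an Elasticsearch query body for text search with facets."""
    price_range = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price

    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": (
                    [{"range": {"price": price_range}}] if price_range else []
                ),
            }
        },
        # Facet counts per product category.
        "aggs": {"categories": {"terms": {"field": "category"}}},
    }

body = build_query("mountain bike", min_price=100, max_price=500)
```

Passing the body to the search endpoint returns both the ranked hits and the per-category counts used to render the facet sidebar.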
- Ubuntu 16.04
- Python 2.7
- Pytorch, torchvision
- Elasticsearch

Then install the remaining Python dependencies with pip:
pip install -r pip_list.txt
Use crawler.py to crawl data from the Craigslist website: 35 classes in total (~80,000 products). The crawled files are stored in the website/data/ directory. A tar archive of the raw data is also provided.
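The crawling step boils down to extracting (title, price) pairs from each results page. The standard-library sketch below shows the idea; the class names (`result-title`, `result-price`) mirror Craigslist markup of that era but may have changed, so crawler.py remains the authoritative implementation.

```python
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    """Extract (title, price) pairs from a Craigslist-style results page.
    The CSS class names below are assumptions about the page markup."""

    def __init__(self):
        super().__init__()
        self.listings = []
        self._field = None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if "result-title" in cls:
            self._field = "title"
        elif "result-price" in cls:
            self._field = "price"

    def handle_data(self, data):
        if self._field == "title":
            self.listings.append({"title": data.strip(), "price": None})
        elif self._field == "price" and self.listings:
            self.listings[-1]["price"] = data.strip()
        self._field = None

html = '<a class="result-title">Road bike</a><span class="result-price">$120</span>'
parser = ListingParser()
parser.feed(html)
```

Each parsed record would then be serialized as JSON under website/data/.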
Use crawImages_mpi.py to crawl the images and store them locally. The download log is written to downloadImages.log, and the images are stored in website/images/. A filtered JSON file of the remaining products is also generated and stored.
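The image-download step is embarrassingly parallel. The real script distributes work via MPI; the sketch below uses a thread pool instead, and takes the fetch function as a parameter so the success/failure bookkeeping (which drives the filtered JSON) can be seen without network access. Only the log-file name comes from the README; everything else is an assumption.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

# Log file name taken from the README; the MPI distribution of the real
# crawImages_mpi.py is replaced here by a thread pool for illustration.
logging.basicConfig(filename="downloadImages.log", level=logging.INFO)

def download_all(urls, fetch, max_workers=8):
    """Download every URL with `fetch(url) -> bytes`; return the URLs that
    succeeded, so product records whose images failed can be filtered out."""
    ok = []

    def worker(url):
        try:
            data = fetch(url)
            # The real script would write `data` under website/images/ here.
            logging.info("downloaded %s (%d bytes)", url, len(data))
            ok.append(url)
        except Exception as exc:
            logging.warning("failed %s: %s", url, exc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(worker, urls))
    return ok
```

Products whose URLs are missing from the returned list would be dropped when writing the filtered JSON.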
Use extractFeatureFinal.py to extract the image features. Put the resulting feature file totalRes18feat.txt under website/.
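The file name suggests ResNet-18 features (512 values per image). The loader below assumes a plain-text format of one product per line, an id followed by whitespace-separated feature values; this format is a guess, so check extractFeatureFinal.py for the real layout.

```python
# Assumed format of totalRes18feat.txt: "<product_id> <f1> <f2> ... <fN>"
# per line. This is an illustrative guess, not taken from the project code.

def load_features(lines):
    """Parse feature lines into {product_id: [float, ...]}."""
    feats = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        feats[parts[0]] = [float(x) for x in parts[1:]]
    return feats

sample = ["p001 0.1 0.5 0.25", ""]
feats = load_features(sample)
```

At query time these vectors would be compared (e.g. by cosine similarity) against the feature vector of an uploaded image.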
python insert.py
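Indexing the crawled records is typically done through Elasticsearch's bulk API, which alternates action and source lines. The sketch below shows that shape; the index name `products` and the record fields are illustrative assumptions about what insert.py does.

```python
import json

# Turn crawled product records into newline-delimited bulk-API actions.
# Index name and field names are assumptions; see insert.py for the
# actual indexing logic.

def bulk_actions(records, index="products"):
    """Yield alternating action/source JSON lines for the ES _bulk endpoint."""
    for rec in records:
        yield json.dumps({"index": {"_index": index, "_id": rec["id"]}})
        yield json.dumps({"title": rec["title"], "price": rec["price"]})

lines = list(bulk_actions([{"id": "1", "title": "Lamp", "price": 20}]))
```

Joining the lines with "\n" (plus a trailing newline) gives a body that can be POSTed to the `_bulk` endpoint.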
elasticsearch # remember to add the Elasticsearch bin directory to PATH in ~/.bashrc
python server.py
Then open the site at: localhost:8080/
Ruiyang Ma, Zesen Wang, Zehang Weng, Zitao Zhang