- simhash
- e2lsh
- gqr + pca hashing
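For readers new to the algorithms listed above, here is a minimal sketch of the textbook E2LSH hash function, h(v) = floor((a . v + b) / w), in plain Python. It is an illustration only and is not LoSHa's implementation; the names `make_e2lsh_hash` and `signature` and the parameters `w` and `seed` are assumptions made for this example.

```python
import math
import random


def make_e2lsh_hash(dim, w=4.0, seed=0):
    """Return one E2LSH hash function h(v) = floor((a . v + b) / w)."""
    rng = random.Random(seed)
    # Projection vector with i.i.d. standard Gaussian entries.
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Random offset drawn uniformly from [0, w).
    b = rng.uniform(0.0, w)

    def h(v):
        if len(v) != dim:
            raise ValueError("expected a vector of length %d" % dim)
        projection = sum(ai * vi for ai, vi in zip(a, v))
        # Bucket index: which width-w slot the shifted projection falls into.
        return math.floor((projection + b) / w)

    return h


def signature(hashes, v):
    """Concatenate several hash values into one bucket signature."""
    return tuple(h(v) for h in hashes)


# Four independent hash functions over 3-dimensional vectors.
hashes = [make_e2lsh_hash(dim=3, w=4.0, seed=s) for s in range(4)]
sig_a = signature(hashes, [1.0, 2.0, 3.0])
sig_b = signature(hashes, [1.0, 2.0, 3.0])
sig_zero = signature(hashes, [0.0, 0.0, 0.0])
```

Identical vectors always receive the same signature, and nearby vectors are likely to share one, which is what lets a query be compared only against the items in its own bucket.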
We assume you have set up an NFS directory (denoted as /data) that every machine can access.
$ cd /data
$ git clone --recursive https://github.com/lijinf2/losha.git
$ cd losha
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release ..
$ make help
$ make -j4 Master
$ make -j4 e2lsh
$ cd ../script
$ python idfvecs.py
$ hadoop dfs -mkdir /losha /losha/audio
$ hadoop dfs -Ddfs.blocksize=8388608 -put audio_base.idfvecs /losha/audio
$ hadoop dfs -Ddfs.blocksize=8388608 -put audio_query.idfvecs /losha/audio
$ cd ../gqr/ && mkdir build && cd ./script
$ sh cal_groundtruth.sh
Run on a single machine (assuming the hostname is master; Master runs in one terminal and e2lsh runs in another)
$ cd ../../script
$ ../build/husky/Master --conf ../conf/e2lsh.conf
$ # open another shell to run e2lsh
$ ../build/e2lsh --conf ../conf/e2lsh.conf
$ sh evaluate.sh
Run on multiple machines (using the slaves configuration, e2lsh-slaves.conf)
$ ../build/husky/Master --conf ../conf/e2lsh-slaves.conf
$ ./exec.sh ../build/e2lsh --conf ../conf/e2lsh-slaves.conf
$ sh evaluate.sh
- Master and e2lsh run in two different shells.
- Always remove the output files on HDFS before the next run.
- We suggest setting up GQR and Husky before using LoSHa.
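The cleanup step in the notes above could be scripted roughly as follows. This is a sketch under stated assumptions: the output path `/losha/audio/output` is hypothetical (use the output location from your own configuration), the helper names are made up for this example, and `-rm -r` is the recursive delete form in Hadoop 2 and later (older releases use `-rmr`). By default the helper only prints the command.

```python
import subprocess


def hdfs_rm_command(output_path):
    """Build the command that recursively deletes output_path on HDFS."""
    # Guard against relative paths and against deleting the HDFS root.
    if not output_path.startswith("/") or output_path == "/":
        raise ValueError("refusing to delete: %r" % output_path)
    # Assumption: Hadoop 2+ syntax; older releases use "-rmr" instead.
    return ["hadoop", "dfs", "-rm", "-r", output_path]


def remove_hdfs_output(output_path, dry_run=True):
    """Print the delete command, or run it when dry_run is False."""
    cmd = hdfs_rm_command(output_path)
    if dry_run:
        print(" ".join(cmd))
        return 0
    # Returns the exit status of the hadoop command.
    return subprocess.call(cmd)


# Hypothetical output path; dry run only, nothing is deleted.
status = remove_hdfs_output("/losha/audio/output")
```

Pass `dry_run=False` once the printed command points at the right directory.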
LoSHa: A General Framework for Scalable Locality Sensitive Hashing (SIGIR 2017)