-
Notifications
You must be signed in to change notification settings - Fork 19
/
Copy pathREADME
56 lines (48 loc) · 2.5 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
Example code for "Web-scale computer vision using mapreduce for multimedia data mining"
Notes about these examples
- Goal: Ground the examples in the paper with actual implementations to improve understanding
- As similar in function to the paper as possible
- Variable names may differ if it improves readability (often the 'type' as it is more descriptive)
- These are not the implementations used for the performance tests as those mix C/Python, use more complex Python libraries, use the TypedBytes input format, etc. The performance test code is in a folder called "performance" and is less documented than the others. This shows how to integrate C code, etc.
- Portions of the algorithms may be 'mocked' out if their functionality is not of focus
- Independent to make them easier to understand at the expense of duplicative code
- A copy of the paper is provided in the project root
- Any external test data is in the "input" folder in the project root
Requirements
- python (2.6.5)
- hadoopy (0.1)
- numpy (1.3.0)
- PIL (1.1.7)
- nose (0.11.1)
Additional Requirements (if you want to run Hadoop cluster examples)
- cxfreeze (4.0.1)
- hadoop (Cloudera CDH3 0.20.2+228)
Running Tests
At the project root you can run "nosetest" from the project root if you have it installed. Otherwise, each test can be run individually from the project root.
For example (in BASH shell in project root)
$ python kmeans/kmeans_test.py
..
----------------------------------------------------------------------
Ran 2 tests in 0.013s
OK
Mapping from paper Algorithm #'s to code
Written PEP8 Tests Hadoop Example Run/Data
Algorithm 1: wordcount/wordcount.py X X X
Algorithm 2: normalize/normalize.py X X X
Algorithm 3: classtrain/classtrain.py X X X
Algorithm 4: slidingwindow/slidingwindow.py X X X
Algorithm 5: kmeans/kmeans.py X X X
Algorithm 6: kmeans/kmeans_imc.py X X X
Algorithm 7: bof/bof_maponly.py X X X
Algorithm 8: bof/bof.py X X X
Algorithm 9: bgsub/bgsub.py X X X
Algorithm 10: compose/compose.py X X X
If you use this in your research, please cite as
B. White, T. Yeh, J. Lin, and L. Davis, "Web-scale computer vision using mapreduce for multimedia data mining," in MDMKDD, 2010.
Bibtex
@inproceedings{white10kdd,
author = {Brandyn White and Tom Yeh and Jimmy Lin and Larry Davis},
booktitle = {MDMKDD},
title = {Web-Scale Computer Vision using MapReduce for Multimedia Data Mining},
year = {2010}
}