-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathREADME
59 lines (44 loc) · 2.47 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
Tool for TREC Entity Track 2011
This tool is a demo for indexing and querying the Sindice dataset 2011 available
at [1]. Its purpose is to show how to extract entities from the dataset and to
index them using SIREn [2]. Some example queries are written in the JUnit tests.
The tool is composed of the following classes:
- Indexing: it takes care of creating the index;
- Sindice{ED,DE}Indexing: it extracts entities from the dataset, depending on
the format;
- Entity: it is the class representing an Entity;
- Utils: it contains utility methods used for indexing.
The command line interface is the class org.sindice.siren.index.IndexingCLI.
### Basic build steps ###
0) Install JDK 1.6 (or greater), Maven 2.0.9 (or greater)
1) Move to the root folder of the tool
2) Run maven
Step 0) Set up your development environment (JDK 1.6 or greater, Maven 2.0.9 or
greater)
We'll assume that you know how to get and set up the JDK - if you don't, then we
suggest starting at http://java.sun.com and learning more about Java, before
continuing as this tool is based on SIREn. SIREn runs with JDK 1.6 and later.
Like many Open Source java projects, SIREn uses Apache Maven for build control.
Specifically, you MUST use Maven version 2.0.9 or greater.
Step 1) From the command line, change (cd) into the root directory of the tool
The root directory contains the project pom.xml file. By default, you do not
need to change any of the settings in this file, but you do need to run maven
from this location so it knows where to find pom.xml.
Step 2) Run maven
Assuming you have maven in your PATH, typing the following commands at the shell
prompt should run maven.
1- To create a jar with maven:
$ mvn assembly:assembly
this will create the jar target/siren-eostool-0.1-SNAPSHOT-assembly.jar
2- To run the JUnit tests with maven:
$ mvn test
3- To create an index for the Sindice-ED dataset format:
$ java -cp $JAR org.sindice.siren.index.IndexingCLI \
--dumps-dir src/test/resources/ \
--index-dir /tmp/test-ED \
--format SINDICE_ED
where $JAR=target/siren-eostool-0.1-SNAPSHOT-assembly.jar
This will create an index at /tmp/test-ED from the files at
src/test/resources/ corresponding to the dataset format SINDICE_ED.
[1] http://data.sindice.com/trec2011/index.html
[2] https://github.com/rdelbru/SIREn