This repository contains the implementation of DICE, a framework developed for the paper Joint Reasoning for Multi-Faceted Commonsense Knowledge by Yohan Chalier, Simon Razniewski and Gerhard Weikum at the Max Planck Institute for Informatics, available on arXiv.
Install Anaconda. Create a new environment using Python v3.6:
conda create -n dice python=3.6
Then install the dependencies.
source activate dice
pip install -r requirements.txt
Now install the Gurobi solver. Follow the instructions from the documentation to install it and retrieve a license. Here is an example of the `~/.bashrc` configuration:
export GUROBI_HOME="${HOME}/gurobi811/linux64"
export PATH="${PATH}:${GUROBI_HOME}/bin"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${GUROBI_HOME}/lib"
export GRB_LICENSE_FILE="${HOME}/gurobi.lic.${HOSTNAME}"
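The exports above can be sanity-checked before running DICE. The following is a minimal sketch, not part of the repository: the function name `check_gurobi_env` is hypothetical, and it only assumes the four variables shown in the example configuration.

```python
import os

# Variables that the example ~/.bashrc configuration defines explicitly.
REQUIRED_VARS = ("GUROBI_HOME", "GRB_LICENSE_FILE")


def check_gurobi_env(env=None):
    """Return a list of human-readable problems found in the environment.

    Hypothetical helper: it mirrors the example exports, nothing more.
    """
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")

    home = env.get("GUROBI_HOME")
    if home:
        # The example appends ${GUROBI_HOME}/bin and ${GUROBI_HOME}/lib
        # to PATH and LD_LIBRARY_PATH respectively.
        bin_dir = os.path.join(home, "bin")
        lib_dir = os.path.join(home, "lib")
        if bin_dir not in env.get("PATH", "").split(os.pathsep):
            problems.append(f"{bin_dir} is not on PATH")
        if lib_dir not in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
            problems.append(f"{lib_dir} is not on LD_LIBRARY_PATH")

    license_file = env.get("GRB_LICENSE_FILE")
    if license_file and not os.path.isfile(license_file):
        problems.append(f"license file not found: {license_file}")
    return problems


if __name__ == "__main__":
    for problem in check_gurobi_env():
        print("WARNING:", problem)
```

An empty list means the environment matches the example; it does not prove that the license itself is valid.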
Now download the required external resources and put them in a `data/` subfolder. Paths must match those written in the `dice.constants.Data` class. Here is an overview of them:
Resource | Path | Required | Source |
---|---|---|---|
Decomposable attention model | attention.tar.gz | yes | AllenAI pre-trained model for Textual Entailment, based on Parikh et al, 2017 |
Word2vec model | word2vec.bin | only to generate the MMAP model | GoogleNews pre-trained model |
Word2vec model (memory-mapped) | word2vec.model | yes | manually generated |
Quasimodo KB | quasimodo.tsv | only for Quasimodo KB builder | Full Quasimodo statements |
ConceptNet KB | conceptnet-kb.tsv | only for ConceptNet KB builder | Commonsense Knowledge Representation resources for Li et al. (2016), top 300k ConceptNet statements |
Tuple-KB | tuple-kb.tsv | only for TupleKb KB builder | Aristo Tuple KB v5 (March 2017) |
English word frequencies | english-frequencies.txt | only for disambiguation into WordNet senses | Invoke IT Word Frequency Lists for English, 2012 |
ConceptNet taxonomy | conceptnet-taxonomy.json | only for ConceptNet taxonomy builder | manually gathered |
WebIsALOD | webisalod.json | only for WebIsALOD taxonomy builder | manually gathered |
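Since a missing file usually only surfaces when a module tries to load it, it can help to check the `data/` folder up front. The sketch below is an illustration only: the file names are copied from the table, the helper `missing_resources` is hypothetical, and the authoritative paths remain those in `dice.constants.Data`.

```python
import os

# File names copied from the table above. "always" marks the rows whose
# "Required" column says yes; the others are needed by specific modules only.
RESOURCES = {
    "attention.tar.gz": "always",
    "word2vec.bin": "MMAP model generation",
    "word2vec.model": "always",
    "quasimodo.tsv": "Quasimodo KB builder",
    "conceptnet-kb.tsv": "ConceptNet KB builder",
    "tuple-kb.tsv": "TupleKb KB builder",
    "english-frequencies.txt": "WordNet sense disambiguation",
    "conceptnet-taxonomy.json": "ConceptNet taxonomy builder",
    "webisalod.json": "WebIsALOD taxonomy builder",
}


def missing_resources(data_dir="data"):
    """Return {file name: what needs it} for each file absent from data_dir."""
    return {
        name: needed_for
        for name, needed_for in RESOURCES.items()
        if not os.path.isfile(os.path.join(data_dir, name))
    }


if __name__ == "__main__":
    for name, needed_for in missing_resources().items():
        print(f"missing {name} (needed: {needed_for})")
```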
The manually generated/gathered files can be retrieved using the `gather.py` script.
python gather.py
An archive containing everything needed is available for download here (2.5 GB).
Basic usage has the following form:
python dice.py [FLAGS] <MODULE> [ARGUMENTS]
Use the `--help` flag for a detailed list of available modules and how to use them.
A common scenario starts by formatting a knowledge base with one of the KB builder modules (currently `conceptnet_kb`, `tuple_kb` and `quasimodo_kb`), and then runs the full `pipeline` on it.
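The two-step scenario can be scripted by chaining two invocations of `dice.py`. This is a rough sketch under stated assumptions: `dice_command` and `run_scenario` are hypothetical helpers, `pipeline` is assumed to be the module name, and the arguments each module expects are deliberately left to the caller because they are not listed here (check `--help` for them).

```python
import subprocess
import sys

# KB builder modules named in the text above.
KB_BUILDERS = ("conceptnet_kb", "tuple_kb", "quasimodo_kb")


def dice_command(module, arguments=(), flags=()):
    """Build the argv list for `python dice.py [FLAGS] <MODULE> [ARGUMENTS]`."""
    return [sys.executable, "dice.py", *flags, module, *arguments]


def run_scenario(kb_builder, builder_args=(), pipeline_args=()):
    """Run a KB builder, then the pipeline module, stopping on the first failure.

    The argument lists are placeholders supplied by the caller; this sketch
    does not know which arguments the modules actually accept.
    """
    if kb_builder not in KB_BUILDERS:
        raise ValueError(f"unknown KB builder: {kb_builder}")
    commands = (
        dice_command(kb_builder, builder_args),
        dice_command("pipeline", pipeline_args),
    )
    for command in commands:
        # check=True raises CalledProcessError if dice.py exits with an error.
        subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell quoting issues when paths contain spaces.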