Text to Knowledge Graph (Text2KG) is a dockerized set of services built to generate Knowledge Graphs; the generated statements are expressed as RDF triples extracted from prose text sources. The services can be consumed via a REST API.
This work was presented at the European Semantic Web Conference (ESWC 2022) as part of the Knowledge Graph Generation from Text (TEXT2KG) workshop and is still a work in progress.
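The endpoint paths and payload schema of the REST API are not documented in this README, so the snippet below is only a rough sketch of how such a service could be consumed from Python; the `/triples` path, the `text` field, and the response shape are placeholders rather than the published interface (only the default port 9002 is taken from the deployment instructions below).

```python
import requests

# Placeholder sketch: the endpoint path and payload field are hypothetical,
# NOT the documented Text2KG API; only the default port 9002 is taken from
# the deployment instructions further down in this README.
API_URL = "http://localhost:9002"

payload = {"text": "Frodo Baggins lives in the Shire."}
response = requests.post(f"{API_URL}/triples", json=payload, timeout=60)
response.raise_for_status()
print(response.json())  # expected: RDF triples extracted from the input text
```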
High-level view of the proposed pipeline stages for the Knowledge Base Construction process.
To run the entire system, perform the following steps:
- Download the repository and execute the docker compose file:
git clone https://github.com/d1egoprog/Text2KG.git
docker-compose -p text2kg -f deploy.yml up
- Open the NEO4J web GUI on localhost (default port 7474) and execute the following in the Cypher console:
CREATE CONSTRAINT n10s_unique_uri ON (r:Resource) ASSERT r.uri IS UNIQUE;
call n10s.graphconfig.init();
call n10s.graphconfig.init( { handleMultival: "ARRAY" })
call n10s.graphconfig.set( { keepLangTag: true, handleRDFTypes: "LABELS_AND_NODES" })
- Check the status of the REST API on localhost (default port 9002); the port can be changed in the docker compose file. A scripted version of this setup step is sketched just after this list.
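The Neo4j setup and the status check above can also be run from a script. Below is a minimal sketch using the official neo4j Python driver and requests; the Bolt URI, the credentials, and the assumption that the REST API answers a plain GET on its root path are not taken from the repository and should be checked against deploy.yml.

```python
import requests
from neo4j import GraphDatabase

# Assumptions: Bolt on the default port 7687 and default credentials;
# check deploy.yml for the values used by your deployment.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4j"))

# Same statements as the Cypher console setup above.
setup_statements = [
    "CREATE CONSTRAINT n10s_unique_uri ON (r:Resource) ASSERT r.uri IS UNIQUE",
    "CALL n10s.graphconfig.init()",
    "CALL n10s.graphconfig.init({handleMultival: 'ARRAY'})",
    "CALL n10s.graphconfig.set({keepLangTag: true, handleRDFTypes: 'LABELS_AND_NODES'})",
]

with driver.session() as session:
    for statement in setup_statements:
        session.run(statement).consume()
driver.close()

# Reachability check of the REST API on its default port (9002); the root
# path is an assumption, any documented status endpoint would work as well.
print(requests.get("http://localhost:9002", timeout=10).status_code)
```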
To explain the usage of this component, Jupyter notebooks were prepared to showcase the functionality of each step in the proposed methodology. First, log into the provided JupyterLab interface and open the text2kg folder, where the notebooks are located.
The notebooks are located under the examples folder in this repository. NOTE: The examples will be mounted in the Text2KG folder in the JupyterLab environment.
To check the functionality, open a web browser window to your docker-engine IP and the chosen service port, e.g., PORT=9988; if you run this on your own machine, it should be available at localhost:9988/lab. If the deployment went correctly, the JupyterLab landing page should appear, asking for the session token. To obtain the token, query the system log using the command:
docker logs text2kg_tfgpu_jupyter_1
An output similar to this one should appear:
To access the server, open this file in a browser:
    file:///home/jupyter/.local/share/jupyter/runtime/jpserver-1-open.html
Or copy and paste one of these URLs:
    http://3538c43d20f3:8888/lab?token=<TOKEN>
 or http://127.0.0.1:8888/lab?token=<TOKEN>
Take the value of the token variable from the URL in this example and paste it into the token textbox displayed in the browser.
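If you prefer to extract the token from a script instead of scanning the log output by hand, the following small sketch runs the same docker logs command as above and pulls the token= value out with a regular expression.

```python
import re
import subprocess

# Read the JupyterLab container logs (Jupyter writes them to stderr) and
# extract the session token from the printed "?token=..." URL.
result = subprocess.run(
    ["docker", "logs", "text2kg_tfgpu_jupyter_1"],
    capture_output=True, text=True,
)
logs = result.stdout + result.stderr

match = re.search(r"token=([0-9a-f]+)", logs)
if match:
    print("JupyterLab token:", match.group(1))
else:
    print("No token found; check that the container is running.")
```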
Happy hacking!! 🖖🖖.
- Stanford CoreNLP: A Docker image has been prepared to run as an external service that can be consumed by the stanza Python library; for more detailed information, check the GitHub repository. A usage sketch is shown after this list.
- NEO4J: An enterprise graph database solution with the neosemantics plugin; for more detailed information, check the GitHub repository and the Python library repository.
- TensorFlow: The dockerized version of the popular framework, selected for working with GPUs, with an additional JupyterLab installation. For more information, a GitHub repository and Docker images have been prepared.
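As an illustration of how the dockerized CoreNLP service can be consumed through stanza, here is a minimal sketch; the endpoint port (9000, CoreNLP's default) and the annotator list are assumptions, so check deploy.yml for the port the container actually exposes.

```python
from stanza.server import CoreNLPClient, StartServer

# Connect to the already-running CoreNLP container instead of spawning a
# local server; the endpoint port is an assumption (CoreNLP's default).
with CoreNLPClient(
    endpoint="http://localhost:9000",
    start_server=StartServer.DONT_START,
    annotators=["tokenize", "ssplit", "pos", "lemma", "ner", "depparse"],
) as client:
    doc = client.annotate("Frodo Baggins lives in the Shire.")
    for sentence in doc.sentence:
        for token in sentence.token:
            print(token.word, token.pos, token.ner)
```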
If this work is of interest to you, you can read the presented paper, and if you use it in your research, please don't forget to cite 👍 this work. The suggested citation in BibTeX format is:
@inproceedings{Rincon-Yanez2022,
author = {Rincon-Yanez, Diego and Senatore, Sabrina},
title = {{FAIR Knowledge Graph construction from text, an approach applied to fictional novels}},
booktitle = {Proceedings of the 1st International Workshop on Knowledge Graph Generation From Text and the 1st International Workshop on Modular Knowledge co-located with the 19th Extended Semantic Web Conference (ESWC 2022)},
issn = {1613-0073},
pages = {94--108},
address = {Hersonissos, Greece},
publisher = {CEUR-WS},
url = {http://ceur-ws.org/Vol-3184/TEXT2KG_Paper_7.pdf},
year = {2022}
}
If you have any questions about deployment or find any errors, please contact me by opening an issue on the GitHub repository Issues page. Contributions are always welcome.
This system is intended for demo purposes and is released under a free-to-use policy. At the moment the system is open for use in all environments, but the source of the API component has not been released yet. However, any results produced from the execution of the pipeline must be released under a CC-BY-SA license.