Text to Knowledge Graph (Text2KG) is a dockerized set of services built to generate Knowledge Graphs; the generated statements are expressed as RDF triples extracted from prose text sources. The services can be consumed via a REST API.
This work was presented at the European Semantic Web Conference (ESWC 2022) as part of the Knowledge Graph Generation from Text (TEXT2KG) workshop and is still a work in progress.
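The endpoint paths and payload schema of the REST API are not documented in this README, so the snippet below is only a rough sketch of how such a service could be consumed from Python; the `/triples` path, the `text` field, and the response shape are placeholders rather than the published interface (only the default port 9002 is taken from the deployment instructions below).

```python
import requests

# Placeholder sketch: the endpoint path and payload field are hypothetical,
# NOT the documented Text2KG API; only the default port 9002 is taken from
# the deployment instructions further down in this README.
API_URL = "http://localhost:9002"

payload = {"text": "Frodo Baggins lives in the Shire."}
response = requests.post(f"{API_URL}/triples", json=payload, timeout=60)
response.raise_for_status()
print(response.json())  # expected: RDF triples extracted from the input text
```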
High-level view of the proposed pipeline stages for the Knowledge Base Construction process.
To run the entire system, perform the following steps:
- Download the repository and execute the docker compose file:
git clone https://github.com/d1egoprog/Text2KG.git
docker-compose -p text2kg -f deploy.yml up
- Open the NEO4J web GUI on localhost (default port 7474) and execute the following in the Cypher console:
CREATE CONSTRAINT n10s_unique_uri ON (r:Resource) ASSERT r.uri IS UNIQUE;
call n10s.graphconfig.init();
call n10s.graphconfig.init( { handleMultival: "ARRAY" })
call n10s.graphconfig.set( { keepLangTag: true, handleRDFTypes: "LABELS_AND_NODES" })
- Check the status of the REST API on localhost (default port 9002); the port can be changed in the docker compose file. A scripted version of this setup step is sketched just after this list.
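The Neo4j setup and the status check above can also be run from a script. Below is a minimal sketch using the official neo4j Python driver and requests; the Bolt URI, the credentials, and the assumption that the REST API answers a plain GET on its root path are not taken from the repository and should be checked against deploy.yml.

```python
import requests
from neo4j import GraphDatabase

# Assumptions: Bolt on the default port 7687 and default credentials;
# check deploy.yml for the values used by your deployment.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4j"))

# Same statements as the Cypher console setup above.
setup_statements = [
    "CREATE CONSTRAINT n10s_unique_uri ON (r:Resource) ASSERT r.uri IS UNIQUE",
    "CALL n10s.graphconfig.init()",
    "CALL n10s.graphconfig.init({handleMultival: 'ARRAY'})",
    "CALL n10s.graphconfig.set({keepLangTag: true, handleRDFTypes: 'LABELS_AND_NODES'})",
]

with driver.session() as session:
    for statement in setup_statements:
        session.run(statement).consume()
driver.close()

# Reachability check of the REST API on its default port (9002); the root
# path is an assumption, any documented status endpoint would work as well.
print(requests.get("http://localhost:9002", timeout=10).status_code)
```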
To explain the usage of this component, Jupyter notebooks were prepared to showcase the functionality of each step in the proposed methodology. First, log into the provided JupyterLab interface and open the text2kg folder, where the notebooks are located.
The notebooks are located under the examples folder in this repository. NOTE: The examples will be mounted in the Text2KG folder in the JupyterLab environment.
To check the functionality, open a web browser window to your docker-engine IP and the chosen service port, e.g., PORT=9988; if you run this on your own machine, it should be available at localhost:9988/lab. If the deployment went correctly, the JupyterLab landing page should appear, asking for the session token. To obtain the token, query the system log using the command:
docker logs text2kg_tfgpu_jupyter_1
An output similar to this one should appear:
To access the server, open this file in a browser:
    file:///home/jupyter/.local/share/jupyter/runtime/jpserver-1-open.html
Or copy and paste one of these URLs:
    http://3538c43d20f3:8888/lab?token=<TOKEN>
 or http://127.0.0.1:8888/lab?token=<TOKEN>
Take the value of the token variable from the URL in this example and paste it into the token textbox displayed in the browser.
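If you prefer to extract the token from a script instead of scanning the log output by hand, the following small sketch runs the same docker logs command as above and pulls the token= value out with a regular expression.

```python
import re
import subprocess

# Read the JupyterLab container logs (Jupyter writes them to stderr) and
# extract the session token from the printed "?token=..." URL.
result = subprocess.run(
    ["docker", "logs", "text2kg_tfgpu_jupyter_1"],
    capture_output=True, text=True,
)
logs = result.stdout + result.stderr

match = re.search(r"token=([0-9a-f]+)", logs)
if match:
    print("JupyterLab token:", match.group(1))
else:
    print("No token found; check that the container is running.")
```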
Happy hacking!! 🖖🖖.
- Stanford CoreNLP: A Docker image has been prepared to run as an external service that can be consumed by the stanza Python library; for more detailed information, check the GitHub repository. A usage sketch is shown after this list.
- NEO4J: An enterprise graph database solution with the neosemantics plugin; for more detailed information, check the GitHub repository and the Python library repository.
- TensorFlow: The dockerized version of the popular framework, selected for working with GPUs, with an additional JupyterLab installation. For more information, a GitHub repository and Docker images have been prepared.
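As an illustration of how the dockerized CoreNLP service can be consumed through stanza, here is a minimal sketch; the endpoint port (9000, CoreNLP's default) and the annotator list are assumptions, so check deploy.yml for the port the container actually exposes.

```python
from stanza.server import CoreNLPClient, StartServer

# Connect to the already-running CoreNLP container instead of spawning a
# local server; the endpoint port is an assumption (CoreNLP's default).
with CoreNLPClient(
    endpoint="http://localhost:9000",
    start_server=StartServer.DONT_START,
    annotators=["tokenize", "ssplit", "pos", "lemma", "ner", "depparse"],
) as client:
    doc = client.annotate("Frodo Baggins lives in the Shire.")
    for sentence in doc.sentence:
        for token in sentence.token:
            print(token.word, token.pos, token.ner)
```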
If this work is of interest to you, you can read the presented paper, and if you use it in your research, please don't forget to cite 👍 this work. The suggested citation in BibTeX format is:
@inproceedings{Rincon-Yanez2022,
author = {Rincon-Yanez, Diego and Senatore, Sabrina},
title = {{FAIR Knowledge Graph construction from text, an approach applied to fictional novels}},
booktitle = {Proceedings of the 1st International Workshop on Knowledge Graph Generation From Text and the 1st International Workshop on Modular Knowledge co-located with the 19th Extended Semantic Web Conference (ESWC 2022)},
issn = {1613-0073},
pages = {94--108},
address = {Hersonissos, Greece},
publisher = {CEUR-WS},
url = {http://ceur-ws.org/Vol-3184/TEXT2KG_Paper_7.pdf},
year = {2022}
}
If you have any questions about deployment or find any errors, please contact me by opening an issue on the GitHub repository Issues page. Contributions are always welcome.
This system is intended for demo purposes and is released under a free-to-use policy. At the moment the system is open for use in all environments, but the source of the API component has not been released yet. However, any results produced from the execution of the pipeline must be released under a CC-BY-SA license.