JenTab

Previous Releases
- SemTab 2020 and Knowledge Graph Construction Workshop @ESWC 2021

Matching Tabular Data to Knowledge Graphs

Target Knowledge Graphs (KGs) are Wikidata & DBpedia
Participant at SemTab 2021: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching
Solves the Semantic Table Annotation (STA) tasks
- Cell Entity Annotation (CEA)
- Column Type Annotation (CTA)
- Column Property Annotation (CPA)
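To make the three tasks concrete, the following minimal sketch annotates a toy table against Wikidata. The table, the variable names and the dictionary layout are assumptions chosen for illustration; they are neither JenTab's internal representation nor the SemTab submission format. Only the Wikidata identifiers are real.

```python
# Toy table: row 0 is the header, rows and columns are indexed from 0.
table = [
    ["Country", "Capital"],
    ["Germany", "Berlin"],
    ["France", "Paris"],
]

# CEA: map individual cells, addressed as (row, column), to KG entities.
cea = {
    (1, 0): "Q183",  # Germany
    (1, 1): "Q64",   # Berlin
    (2, 0): "Q142",  # France
    (2, 1): "Q90",   # Paris
}

# CTA: map each column to a KG class describing its cells.
cta = {
    0: "Q6256",  # country
    1: "Q515",   # city
}

# CPA: map a pair of columns (subject, object) to a KG property.
cpa = {
    (0, 1): "P36",  # capital
}


def cell_annotations(table, cea):
    """Pair the text of each annotated cell with its entity ID."""
    return {table[row][col]: entity for (row, col), entity in cea.items()}


print(cell_annotations(table, cea))
```

In short, CEA works on cells, CTA on whole columns, and CPA on the relation between two columns.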

Architecture

The image above shows the distributed architecture of JenTab. Here is a brief description of each service:

Manager: the central node; it is responsible for load balancing and collects results, errors and audit records.
Runner: a client node that handles the communication among
- pre-processing services (Clean Cells, Type Prediction)
- Approach
- Manager
Generic Lookup: a pre-computed lookup service; our primary solution for handling misspellings.
Solver: encapsulates our pipeline as a series of calls across the dependent services.
Wikidata_Proxy: encapsulates the lookup and SPARQL query endpoint for Wikidata.
DBpedia_Proxy: encapsulates the lookup and SPARQL query endpoint for DBpedia.
Caching Server: a centralized caching server.
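The division of labour between Manager and Runner can be pictured with a small in-memory sketch. Everything in it is hypothetical: the class, method and function names are invented for illustration, and the real services communicate over HTTP rather than through direct method calls. It only shows the general pattern of a central node handing out tables and collecting results and errors.

```python
from collections import deque


class Manager:
    """Hypothetical central node: hands out tables and collects outcomes."""

    def __init__(self, table_ids):
        self.pending = deque(table_ids)
        self.results = {}
        self.errors = {}

    def get_work(self):
        """Return the next unprocessed table ID, or None when nothing is left."""
        return self.pending.popleft() if self.pending else None

    def submit_result(self, table_id, annotations):
        self.results[table_id] = annotations

    def submit_error(self, table_id, message):
        self.errors[table_id] = message


def run_runner(manager, pipeline):
    """Hypothetical client loop: pull work, run the pipeline, report back."""
    while (table_id := manager.get_work()) is not None:
        try:
            manager.submit_result(table_id, pipeline(table_id))
        except Exception as exc:  # report the failure instead of crashing
            manager.submit_error(table_id, str(exc))


def toy_pipeline(table_id):
    """Stand-in for the annotation pipeline; fails on one table on purpose."""
    if table_id == "bad":
        raise ValueError("could not parse table")
    return {"cea": [], "cta": [], "cpa": []}


manager = Manager(["t1", "bad", "t2"])
run_runner(manager, toy_pipeline)
print(sorted(manager.results), manager.errors)
```

Because each Runner only pulls work when it is free, several Runners could share one Manager, which is one simple way to balance load.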

Quick Setup

The first step of the JenTab setup is to structure the assets folder. For demonstration, we set up Round 1 of the 2020 dataset here:

Input configuration (dataset)
- 2020 Dataset per Round
- Download tables and targets for Round 1
- Your downloaded tables should go under
  - /assets/data/input/2020/Round 1/
- Your downloaded CEA_Round1_Targets.csv, CTA_Round1_Targets.csv and CPA_Round1_Targets.csv should go under
  - /assets/data/input/2020/Round 1/targets/
Pre-computed Generic_Lookup db3 files
- Generic_Lookup per Round
- Download the db3 file for R1
- Your downloaded lookup.db3 should go under
  - /assets/cache/Generic_Lookup/
Baseline_Approach requires the stopwords
- Download stopwords.txt
- Rename the downloaded file to stopwords.txt
- Download the fastText language identification model lid.176.ftz
- Your files should go under:
  - /assets/Baseline_Approach/
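Before starting the services, it can help to confirm that the downloaded lookup file opens correctly. The sketch below assumes that lookup.db3 is an SQLite database, which the extension suggests but this README does not state; the function name is hypothetical, and no assumption is made about the table names inside the file.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in an SQLite database file."""
    # Open read-only so that inspecting the file can never modify it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Example call (path as used in this setup, adjust to your checkout):
# print(list_tables("assets/cache/Generic_Lookup/lookup.db3"))
```

If the file is missing, opening it in read-only mode raises sqlite3.OperationalError instead of silently creating an empty database.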

After the previous steps, assets must have the following directory structure:

+--assets
\----data
|   \----cache
|   |   \----Generic_Lookup
|   |           lookup.db3
|   |           
|   \----input
|       \----2020
|           +----Round 1
|           |   +----tables
|           |   \----targets
|                       CEA_Round1_Targets.csv
|                       CTA_Round1_Targets.csv
|                       CPA_Round1_Targets.csv
|
\---Baseline_Approach
|       stopwords.txt
|       lid.176.ftz
|       
\---Wikidata_Endpoint
        excluded_classes.csv
        excluded_colheaders.csv
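A quick way to verify this layout is a short script that reports which expected files and folders are still missing. This is a minimal sketch and not part of JenTab: the function and constant names are hypothetical, and the relative paths are simply copied from the directory tree above.

```python
from pathlib import Path

# Relative paths (below the assets folder) copied from the directory tree.
EXPECTED_PATHS = [
    "data/cache/Generic_Lookup/lookup.db3",
    "data/input/2020/Round 1/tables",
    "data/input/2020/Round 1/targets/CEA_Round1_Targets.csv",
    "data/input/2020/Round 1/targets/CTA_Round1_Targets.csv",
    "data/input/2020/Round 1/targets/CPA_Round1_Targets.csv",
    "Baseline_Approach/stopwords.txt",
    "Baseline_Approach/lid.176.ftz",
    "Wikidata_Endpoint/excluded_classes.csv",
    "Wikidata_Endpoint/excluded_colheaders.csv",
]


def missing_assets(assets_root):
    """Return the expected paths that do not exist below `assets_root`."""
    root = Path(assets_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


# Example call (adjust the path to your checkout):
# for rel in missing_assets("assets"):
#     print("missing:", rel)
```

An empty result means every entry of the tree is in place.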

After the assets are ready, the fastest way to get JenTab up and running is via Docker, starting the services in the following order:

cd /services
Manager
- Change the default credentials in services/Manager/config.py to yours
  - username: YourManagerUsername
  - password: YourManagerPassword
- Make sure that the dataset configuration in services/Manager/config.py is set to:
  - ROUND = 1
  - YEAR = 2020
- Use the following command to launch the Manager node
  - docker-compose -f docker-compose.manager.yml up
- The Manager is supposed to run at http://localhost:5100
All other services: docker-compose -f docker-compose.yml up
Runner
- cd /Runner
- Change the Manager credentials in services/Runner/config.py to the ones you selected
- Make sure that manager_url = 'http://127.0.0.1:5100' (local) is set in services/Runner/config.py
- Build an image for the Runner: docker build -t runner .
- Run it: docker run --network="host" runner
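Since the Runner depends on the Manager, it can be useful to check that the Manager answers on its port before starting the Runner. The following sketch uses only the Python standard library; the function name is hypothetical, and the check makes no assumption about the Manager's routes, so any HTTP response (including an error status) counts as "up".

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP server answers at `url`, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (e.g. with 401 or 404), so it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar problems.
        return False


# Example call with the Manager address used in this setup:
# print(is_reachable("http://127.0.0.1:5100"))
```

HTTPError is handled before URLError on purpose: it is a subclass of URLError, and catching it first lets a password-protected Manager still count as reachable.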

Note 1: For a basic understanding of the Docker commands, please visit the official Docker documentation.
Note 2: We also support native execution; in this case, you have to set up each service on its own. For details, refer to:
- the folder of each service under services.
- services.md, which summarizes the currently used services and their ports.

Results

Materials

Nora Abdelmageed, Sirko Schindler. JenTab: A Toolkit for Semantic Table Annotations. (Accepted Paper)
Nora Abdelmageed, Sirko Schindler. JenTab: Matching Tabular Data to Knowledge Graphs. (paper)
Ontology Matching workshop on 2 November 2020 (video slides)

Citation

@inproceedings{abdelmageed_semtab2021, title={{JenTab Meets SemTab 2021's New Challenges}}, author={Abdelmageed, Nora and Schindler, Sirko}, booktitle={The 20th International Semantic Web Conference (ISWC)}, year={2021} }

@inproceedings{abdelmageed2021jentab, title={JenTab: A Toolkit for Semantic Table Annotations}, author={Abdelmageed, Nora and Schindler, Sirko}, booktitle={Knowledge Graph Construction (KGC) Workshop ESWC 2021, Accepted}, year={2021} }

@inproceedings{abdelmageed2020jentab, title={Jentab: Matching tabular data to knowledge graphs}, author={Abdelmageed, Nora and Schindler, Sirko}, booktitle={The 19th International Semantic Web Conference (ISWC)}, year={2020} }
