Matching Tabular Data to Knowledge Graphs
- Target Knowledge Graphs (KGs) are Wikidata & DBpedia
- Participant at SemTab 2021: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching
- Solves the Semantic Table Annotation (STA) tasks:
  - Cell Entity Annotation (CEA)
  - Column Type Annotation (CTA)
  - Column Property Annotation (CPA)
The image above shows the distributed architecture of JenTab. Here is a brief description of each service:
- Manager: the central node; it is responsible for load balancing and collects results, errors, and audit records.
- Runner: a client node that handles the communication among
  - the pre-processing services (Clean Cells, Type Prediction)
  - the Approach
  - the Manager
- Generic Lookup: a pre-computed service; our primary solution for handling misspellings.
- Solver: encapsulates our pipeline as a series of calls across the dependent services.
- Wikidata_Proxy: encapsulates the lookup and SPARQL query endpoint for Wikidata.
- DBpedia_Proxy: encapsulates the lookup and SPARQL query endpoint for DBpedia.
- Caching Server: a centralized caching server.
The first step of the JenTab setup is to structure the assets folder. For demonstration, we set up the first round of the 2020 dataset here.
- Input configuration (dataset)
  - 2020 Dataset per Round
  - Download tables and targets for Round 1
  - Your downloaded tables should go under /assets/data/input/2020/Round 1/
  - Your downloaded CEA_Round1_Targets.csv, CTA_Round1_Targets.csv and CPA_Round1_Targets.csv should go under /assets/data/input/2020/Round 1/targets/
- Pre-computed Generic_Lookup db3 files
  - Generic_Lookup per Round
  - Download the db3 file for R1
  - Your downloaded lookup.db3 should go under /assets/cache/Generic_Lookup/
- Baseline_Approach requires the stopwords
  - Download stopwords.txt
  - Rename the downloaded file to stopwords.txt
  - Download lid.176.ftz for the fastText model
  - Your files should go under /assets/Baseline_Approach/
After the previous steps, assets must have the following directory structure:
    +--assets
       \----data
       |    \----cache
       |    |    \----Generic_Lookup
       |    |             lookup.db3
       |    |
       |    \----input
       |         \----2020
       |              +----Round 1
       |              |    +----tables
       |              |    \----targets
       |              |             CEA_Round1_Targets.csv
       |              |             CTA_Round1_Targets.csv
       |              |             CPA_Round1_Targets.csv
       |
       \---Baseline_Approach
       |        stopwords.txt
       |        lid.176.ftz
       |
       \---Wikidata_Endpoint
                excluded_classes.csv
                excluded_colheaders.csv
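Before starting any service, it can help to confirm that this layout is actually in place. The following is a minimal sketch and not part of JenTab: the helper names `expected_assets` and `missing_assets` are hypothetical, and the relative paths are read off the directory tree above, which places `cache` under `data`. Adjust the list if your checkout differs.

```python
from pathlib import Path
from typing import List


def expected_assets(year: int = 2020, round_no: int = 1) -> List[str]:
    """Relative paths (below the assets folder) produced by the setup steps.

    Assumption: the paths follow the directory tree shown in this README.
    """
    round_dir = f"data/input/{year}/Round {round_no}"
    return [
        "data/cache/Generic_Lookup/lookup.db3",
        f"{round_dir}/tables",
        f"{round_dir}/targets/CEA_Round{round_no}_Targets.csv",
        f"{round_dir}/targets/CTA_Round{round_no}_Targets.csv",
        f"{round_dir}/targets/CPA_Round{round_no}_Targets.csv",
        "Baseline_Approach/stopwords.txt",
        "Baseline_Approach/lid.176.ftz",
        "Wikidata_Endpoint/excluded_classes.csv",
        "Wikidata_Endpoint/excluded_colheaders.csv",
    ]


def missing_assets(assets_root: str, year: int = 2020, round_no: int = 1) -> List[str]:
    """Return the expected relative paths that do not exist under assets_root."""
    root = Path(assets_root)
    return [
        rel for rel in expected_assets(year, round_no)
        if not (root / rel).exists()
    ]
```

Calling `missing_assets("assets")` from the repository root returns an empty list once every file and folder is where the tree expects it; otherwise it lists what still needs to be downloaded or moved.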
Once the assets are ready, the fastest way to get JenTab up and running is the Docker setup. Start the services in the following order.
cd /services
- Manager
- Change the default credentials in services/Manager/config.py to yours
- username: YourManagerUsername
- password: YourManagerPassword
- Make sure that the dataset configuration in services/Manager/config.py is set to:
ROUND = 1
YEAR = 2020
- Use the following command to launch the Manager node
docker-compose -f docker-compose.manager.yml up
- The Manager is supposed to run at http://localhost:5100
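To confirm that the Manager is up before starting the remaining services, a quick reachability check can be scripted. This is a minimal sketch using only the Python standard library; `is_reachable` and `wait_for_manager` are hypothetical helpers, not part of JenTab. The check only tests that something answers HTTP at the URL, so any status code, including an authentication error, counts as reachable.

```python
import time
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if any HTTP response comes back from url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status (e.g. 401 or 404).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


def wait_for_manager(url: str = "http://localhost:5100",
                     attempts: int = 10, delay: float = 2.0) -> bool:
    """Poll url until it responds or the attempts run out."""
    for _ in range(attempts):
        if is_reachable(url):
            return True
        time.sleep(delay)
    return False
```

With the defaults, `wait_for_manager()` polls http://localhost:5100 up to ten times, two seconds apart, and returns `False` if the Manager never answers.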
- All other services
docker-compose -f docker-compose.yml up
- Runner
cd /Runner
- Change the Manager credentials in services/Runner/config.py to the ones you selected
- Make sure that services/Runner/config.py contains
manager_url = 'http://127.0.0.1:5100' #local
- Build an image for the Runner
docker build -t runner .
- Run
docker run --network="host" runner
- Note 1: For a basic understanding of Docker commands, please visit the official Docker documentation.
- Note 2: We also support native execution, but in this case you have to set up each service on its own. For details, we refer to:
  - the folder of each service under services.
  - services.md, which summarizes the currently used services and their ports.
- Nora Abdelmageed, Sirko Schindler. JenTab: A Toolkit for Semantic Table Annotations. (Accepted Paper)
- Nora Abdelmageed, Sirko Schindler. JenTab: Matching Tabular Data to Knowledge Graphs. (paper)
- Ontology Matching workshop on 2 November 2020 (video slides)
@inproceedings{abdelmageed_semtab2021, title={{JenTab Meets SemTab 2021's New Challenges}}, author={Abdelmageed, Nora and Schindler, Sirko}, booktitle={The 20th International Semantic Web Conference (ISWC)}, year={2021} }
@inproceedings{abdelmageed2021jentab, title={JenTab: A Toolkit for Semantic Table Annotations}, author={Abdelmageed, Nora and Schindler, Sirko}, booktitle={Knowledge Graph Construction (KGC) Workshop ESWC 2021, Accepted}, year={2021} }
@inproceedings{abdelmageed2020jentab, title={Jentab: Matching tabular data to knowledge graphs}, author={Abdelmageed, Nora and Schindler, Sirko}, booktitle={The 19th International Semantic Web Conference (ISWC)}, year={2020} }