HackaLOD 2024 - LOD-Linker

This repository contains scripts for linking people records extracted from three LOD sources using Knowledge Graph Embeddings (KGE).
It is our contribution to HackaLOD 2024.
Team:
- Nora Abdelmageed - LOD Expert at Nieuwe Instituut
- Lois Hutubessy - Program Manager at Nieuwe Instituut
- Rianne Piening - Application Manager at RKD
- Kelly James - Collection Manager at Nieuwe Instituut

Data Sources and Criteria

We retrieved people records from three LOD sources using SPARQL queries. You are not limited to SPARQL, however; you can also export CSV files from another application, e.g., a collection management system. The sources, the fields we retrieved, and the filters applied to each source are listed below. Our queries and the retrieved data are in the Queries and Data folders of this repository.

Sources

Nieuwe Instituut (NI)
Wikidata
RKD

Fields

Names (including all alternative names)
Birth dates and birthplaces
Death dates and death places (optional)
Occupations

Filters

Nieuwe Instituut (NI)

Retrieved labels for all places and occupations are in English

Wikidata

Country of birth is the Netherlands
Occupation is architect, photographer, or designer, corresponding to wd:Q42973, wd:Q33231, and wd:Q5322166
Retrieved labels for all places and occupations are in English
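
The Wikidata filters above could be expressed as a SPARQL query roughly as follows. This is a minimal sketch, not the query stored in the Queries folder: the helper name build_wikidata_query is hypothetical, "country of birth" is modeled here as the country (P17) of the place of birth (P19), and restricting person names to English is an extra assumption.

```python
# Hypothetical helper: builds a Wikidata SPARQL query string from the
# filters listed above. It only builds text; it does not run the query.
OCCUPATIONS = {
    "architect": "wd:Q42973",
    "photographer": "wd:Q33231",
    "designer": "wd:Q5322166",
}


def build_wikidata_query(occupations=OCCUPATIONS, language="en", limit=None):
    values = " ".join(occupations.values())
    lang = '"' + language + '"'
    lines = [
        "SELECT ?person ?name ?altName ?birthDate ?birthPlaceLabel",
        "       ?deathDate ?deathPlaceLabel ?occupationLabel",
        "WHERE {",
        f"  VALUES ?occupation {{ {values} }}",
        "  ?person wdt:P106 ?occupation ;",   # occupation
        "          wdt:P569 ?birthDate ;",    # date of birth
        "          wdt:P19 ?birthPlace ;",    # place of birth
        "          rdfs:label ?name .",
        # Assumption: birthplace located in the Netherlands (wd:Q55).
        "  ?birthPlace wdt:P17 wd:Q55 ;",
        "              rdfs:label ?birthPlaceLabel .",
        "  ?occupation rdfs:label ?occupationLabel .",
        # Assumption: names are restricted to the chosen language as well.
        f"  FILTER(LANG(?name) = {lang})",
        f"  FILTER(LANG(?birthPlaceLabel) = {lang})",
        f"  FILTER(LANG(?occupationLabel) = {lang})",
        "  OPTIONAL { ?person skos:altLabel ?altName . }",
        "  OPTIONAL { ?person wdt:P570 ?deathDate . }",  # date of death
        "  OPTIONAL {",
        "    ?person wdt:P20 ?deathPlace .",             # place of death
        "    ?deathPlace rdfs:label ?deathPlaceLabel .",
        f"    FILTER(LANG(?deathPlaceLabel) = {lang})",
        "  }",
        "}",
    ]
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_wikidata_query(limit=10))
```

The printed query can be pasted into the Wikidata Query Service, where the wd, wdt, rdfs, and skos prefixes are predefined.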

RKD

Retrieved labels for all places and occupations are in English
Death dates and places are mandatory

Methodology

LOD-Linker relies on Knowledge Graph Embeddings (KGE), specifically the TransH model.
The implementation uses the pykeen library to train the embeddings.
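
As a rough illustration of how tabular people records can feed a TransH model in pykeen, the sketch below turns each record into (subject, predicate, object) triples and passes them to pykeen's pipeline. The function names, the record layout, the hyperparameter defaults, and the sample record are all hypothetical; the actual experiments are in the Code folder.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def records_to_triples(records: List[Dict[str, str]]) -> List[Triple]:
    """Turn each non-empty field of a record into one triple.

    Assumes every record has an "id" key identifying the person.
    """
    triples: List[Triple] = []
    for record in records:
        subject = record["id"]
        for field, value in record.items():
            if field == "id" or not value:
                continue  # skip the identifier itself and empty fields
            triples.append((subject, field, value))
    return triples


def train_transh(triples: List[Triple], embedding_dim: int = 50,
                 num_epochs: int = 100, seed: int = 42):
    """Train TransH with pykeen; the defaults here are illustrative only."""
    # Imported lazily so the triple conversion works without pykeen installed.
    import numpy as np
    from pykeen.pipeline import pipeline
    from pykeen.triples import TriplesFactory

    factory = TriplesFactory.from_labeled_triples(np.array(triples, dtype=str))
    return pipeline(
        training=factory,
        testing=factory,  # this sketch has no held-out split
        model="TransH",
        model_kwargs=dict(embedding_dim=embedding_dim),
        training_kwargs=dict(num_epochs=num_epochs),
        random_seed=seed,
    )


if __name__ == "__main__":
    # Made-up placeholder record, not taken from any of the three sources.
    sample = [{"id": "src:1", "name": "Example Person",
               "birth_place": "Rotterdam", "death_place": ""}]
    print(records_to_triples(sample))
```

Records from different sources that share attribute values (the same birthplace or occupation label, for instance) end up connected to the same object nodes in the graph, so the embeddings have shared structure to learn from.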
We conducted four experiments, each addressing one of the questions below. All of them are in the Code folder of this repository:

Language & Data Points
- Should all retrieved labels be in the same language?
- Can we match English labels with Dutch ones?
Embedding Size
- Is a small dimension sufficient, or are large embeddings required?
Data Cleaning
- Should we normalize all names, i.e., remove all punctuation and unify letter case?
- Or can we keep the original format of the names? (Remember that this is one of the downsides of the baselines.)
Data Balance
- Should each source contribute nearly the same number of records?
- Or can one source be dominant (with a risk of bias)?
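
For the Data Cleaning question, name normalization could look like the sketch below. It covers only the two steps named above, removing punctuation and unifying letter case; the function name normalize_name is hypothetical and the code used in the experiments may differ.

```python
import re


def normalize_name(name: str) -> str:
    """Lowercase a name and replace punctuation with spaces."""
    # Replace every character that is not a letter, digit, or whitespace.
    # \w also matches "_", so underscores are handled separately.
    no_punct = re.sub(r"[^\w\s]", " ", name).replace("_", " ")
    # Lowercase and collapse runs of whitespace into single spaces.
    return " ".join(no_punct.lower().split())


if __name__ == "__main__":
    print(normalize_name("Rietveld, Gerrit Th."))
```

Accented letters are kept as they are, since Python's \w matches Unicode letters by default.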

NoYo25/Hackalod2024-LOD-linker
