Skip to content

This repository contains scripts that are developed to link People records extracted from various LOD sources and how we linked them using Knowledge Graph Embeddings (KGE)

License

Notifications You must be signed in to change notification settings

NoYo25/Hackalod2024-LOD-linker

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HackaLOD 2024 - LOD-Linker

  • This repository contains scripts that are developed to link People records extracted from three LOD sources and how we linked them using Knowledge Graph Embeddings (KGE)
  • This is our contribution for HackaLOD 2024
  • Team:
    • Nora Abdelmageed - LOD Expert at Nieuwe Instituut
    • Lois Hutubessy - Program Manager at Nieuwe Instituut
    • Rianne Piening - Application Manager at RKD
    • Kelly James - Collectiebeheerder at Nieuwe Instituut

Data Sources and Criteria

We retrieved people records from three LOD sources using SPARQL queries. However, you are not limited to SPARQL; you can extract CSV files from your favorite application, e.g., a collection management system. The following are the data sources where we explain the fields and filter functions applied for each. You can find our Queries and retrieved Data in this repository as well.

Sources

  1. Nieuwe Instituut (NI)
  2. Wikidata
  3. RKD

Fields:

  • Names (including all alternative names)
  • birth dates, birthplaces
  • OPTIONAL(death dates, death places)
  • occupations

Filters

  1. Nieuwe Instituut (NI)
  • retrieved labels for all places and occupations are English
  1. Wikidata
  • Country of birth is Netherlands
  • Occupation: architect, photographer, designer that corresponds to wd:Q42973 wd:Q33231 wd:Q5322166
  • Retrieved labels for all places and occupations are English
  1. RKD
  • Retrieved labels for all places and occupations are English
  • death dates and places are mandatory.

Methodology

  • LOD-Linker relies on Knowledge Graph Embeddings (KGE), TransH model.
  • The implementation uses the pykeen library for KGE.
  • We conducted 4 experiments to test. All are found in the Code folder in this repository:
  1. Language & Data Points
    • All retrieved labels should be in the same language?
    • Can we match English with Dutch
  2. Embedding Size
    • Is a small dimension sufficient, or are large embeddings required?
  3. Data Cleaning
    • Should we normalize all names? Could you get rid of all punctuation and make sure letter case?
    • Can we keep the original format of the names (remember it is one of the downsides of baselines)
  4. Data Balance
    • Each source should contribute nearly the same amount of record
    • One source could be dominant (biases?)

About

This repository contains scripts that are developed to link People records extracted from various LOD sources and how we linked them using Knowledge Graph Embeddings (KGE)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published