- Clone the xMEN repository (https://github.com/hpi-dhc/xmen) to obtain the dataloaders and benchmark script:

  ```
  git clone https://github.com/hpi-dhc/xmen
  cd xmen
  poetry install
  ```
- Get the latest version of the SympTEMIST gazetteer from Zenodo (https://zenodo.org/records/10635215) and put the TSV file into `xmen/local_files`
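Before wiring the gazetteer into the pipeline, it can help to sanity-check the TSV. The exact column names in the Zenodo file may differ from the hypothetical `code`/`term` used below — check the header line of the downloaded file first. A minimal sketch using only the standard library:

```python
import csv
import io

# In-memory sample mimicking a gazetteer TSV; the real file's columns
# ("code", "term" here) are an assumption -- inspect the actual header.
sample_tsv = "code\tterm\n22253000\tdolor\n386661006\tfiebre\n"

def load_gazetteer(handle):
    """Read a TSV gazetteer into a list of {column: value} dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))

rows = load_gazetteer(io.StringIO(sample_tsv))
print(len(rows), "entries; first term:", rows[0]["term"])
```

For the real file, replace the `StringIO` sample with `open("xmen/local_files/<gazetteer>.tsv", encoding="utf-8")`.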
- Prepare the KB and indices for candidate generation:

  ```
  xmen dict benchmarks/benchmark/symptemist.yaml --code examples/dicts/bsc_gazetteer.py
  xmen dict benchmarks/benchmark/symptemist.yaml --all
  ```
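The indices built above let xMEN retrieve gazetteer entries that resemble a given mention. xMEN's actual candidate generators (TF-IDF n-gram and dense SapBERT retrieval) are far more capable; purely to illustrate the idea of fuzzy-matching mentions against gazetteer terms, here is a toy character-trigram sketch (codes and terms are illustrative only):

```python
def trigrams(s: str) -> set:
    s = f"  {s.lower()} "  # pad so short strings still yield trigrams
    return {s[i:i + 3] for i in range(len(s) - 2)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

# Toy gazetteer: code -> term (illustrative entries, not the real file)
gazetteer = {
    "22253000": "dolor",
    "386661006": "fiebre",
    "271807003": "erupcion cutanea",
}

def candidates(mention: str, k: int = 2) -> list:
    """Rank gazetteer entries by trigram overlap with the mention."""
    grams = trigrams(mention)
    scored = sorted(
        ((jaccard(grams, trigrams(term)), code) for code, term in gazetteer.items()),
        reverse=True,
    )
    return [code for _, code in scored[:k]]

print(candidates("dolores"))
```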
| Notebook | Description |
|---|---|
| 0_Dataset.ipynb | Statistics for the SympTEMIST shared task and comparable datasets |
| 1_LLM_Simplification.ipynb | Applying LLM-based text simplification |
1_LLM_Simplification.ipynb | Applying LLM-based text simplification |
Executing `1_LLM_Simplification.ipynb` produces a dataset of candidates based on simplified mentions (`symptemist_candidates_simplified_cutoff`) in the current folder. It can be used as the candidate set for running the full SympTEMIST entity linking pipeline with a trainable re-ranker. The BERT checkpoint used to initialize the cross-encoder can also be adapted.
```
cd xmen/benchmarks
python run_benchmark.py benchmark=symptemist output=./training
python run_benchmark.py benchmark=symptemist output=./training +candidates_path=../../symptemist/symptemist_candidates_simplified_cutoff
```
For example, to initialize the cross-encoder from `PlanTL-GOB-ES/roberta-base-bne`:

```
python run_benchmark.py benchmark=symptemist output=./training +candidates_path=../../symptemist/symptemist_candidates_simplified_cutoff linker.reranking.training.model_name=PlanTL-GOB-ES/roberta-base-bne
```
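Conceptually, the trainable re-ranker scores each (mention, candidate) pair jointly and re-orders the candidate list. The real pipeline does this with a BERT cross-encoder; the sketch below uses a stand-in lexical scorer purely to show the control flow of the re-ranking step:

```python
# Toy re-ranking step: a stand-in pair scorer replaces the cross-encoder.
def overlap_score(mention: str, term: str) -> float:
    """Stand-in pair scorer: Jaccard overlap of lowercase tokens."""
    m, t = set(mention.lower().split()), set(term.lower().split())
    return len(m & t) / max(len(m | t), 1)

def rerank(mention, candidates):
    """Re-order (code, term) candidates by joint pair score, best first."""
    return sorted(candidates, key=lambda ct: overlap_score(mention, ct[1]), reverse=True)

# Illustrative candidates only -- not taken from the real gazetteer
cands = [("271807003", "erupcion cutanea"), ("22253000", "dolor abdominal")]
print(rerank("dolor abdominal agudo", cands)[0][0])
```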