2024 TREC Biogen Validation Script

This script validates submissions to the 2024 TREC Biogen task against the rules described on the task website.

Installation Instructions

You will probably want to set up a virtual/conda environment for this.

Clone the repository
Install dependencies using pip: `pip install -r requirements.txt`
Install the en_core_web_lg spaCy model: `python -m spacy download en_core_web_lg`
Make sure you have the necessary data files handy:
- BioGen2024topics-json.txt
- pubmed_ids_last_20_years.json.gz
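
Before running the validator, it can help to confirm that both data files are where you expect them. The following is a minimal sketch, assuming the files sit together in one directory; `missing_data_files` is a hypothetical helper written for illustration and is not part of the package:

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_DATA_FILES = (
    "BioGen2024topics-json.txt",
    "pubmed_ids_last_20_years.json.gz",
)


def missing_data_files(data_dir):
    """Return the names of required data files not found in data_dir.

    Hypothetical helper for illustration; not part of trec_biogen_validator.
    """
    base = Path(data_dir)
    return [name for name in REQUIRED_DATA_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files(".")
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("All data files found.")
```

This only checks that the files exist; it does not inspect their contents.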

Instructions for Use

The simplest way to run the script is via the command line:

python -m trec_biogen_validator \
    --path_to_submission PATH_TO_SUBMISSION_JSON_FILE \
    --path_to_valid_pmids PATH_TO_PUBMED_IDS_JSON_FILE \
    --path_to_topics PATH_TO_TOPICS_JSON_FILE

The program will open the submission file, go through each topic, and perform appropriate validations, printing out information about errors and warnings along the way.

The program has an optional argument, --dump_sentence_tokenization; if this is set, topics that have errors or warnings will be accompanied by a dump of the spaCy sentence tokenization, to aid in debugging.
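
If you want to call the validator from another Python program (for example, to check several submissions in a loop), one option is to assemble the same command line and run it in a subprocess. This is a rough sketch, not part of the package: `build_command` and `run_validator` are hypothetical names, and passing an explicit `true` to `--dump_sentence_tokenization` is an assumption based on the config-file form of that option, so check `--help` for the exact syntax.

```python
import subprocess
import sys


def build_command(submission, valid_pmids, topics, dump_tokenization=False):
    """Assemble the argv list for `python -m trec_biogen_validator`.

    Hypothetical helper for illustration; only the flag names come from
    the README.
    """
    cmd = [
        sys.executable, "-m", "trec_biogen_validator",
        "--path_to_submission", str(submission),
        "--path_to_valid_pmids", str(valid_pmids),
        "--path_to_topics", str(topics),
    ]
    if dump_tokenization:
        # Assumption: the flag accepts an explicit boolean value, mirroring
        # `dump_sentence_tokenization: true` in a config file.
        cmd += ["--dump_sentence_tokenization", "true"]
    return cmd


def run_validator(submission, valid_pmids, topics, dump_tokenization=False):
    """Run the validator in a subprocess and return its exit code."""
    cmd = build_command(submission, valid_pmids, topics, dump_tokenization)
    # Output (errors and warnings) goes straight to the terminal.
    return subprocess.run(cmd).returncode
```

Using `sys.executable` keeps the subprocess in the same virtual/conda environment as the calling program.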

Besides passing arguments on the command line, you can also run the program via a config file. The script uses jsonargparse, so any config format it supports will work; for example, YAML is an option:

path_to_submission: PATH_TO_SUBMISSION_JSON_FILE
path_to_valid_pmids: PATH_TO_PUBMED_IDS_JSON_FILE
path_to_topics: PATH_TO_TOPICS_JSON_FILE
dump_sentence_tokenization: true

And then from the terminal:

python -m trec_biogen_validator --config=PATH_TO_CONFIG_FILE
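
If you validate many submissions, you may prefer to generate the config file rather than write it by hand. The sketch below uses only the standard library and writes the config as JSON, on the assumption that a simple JSON mapping like this one is also accepted wherever YAML is; `write_config` is a hypothetical helper, not part of the package:

```python
import json
from pathlib import Path


def write_config(config_path, submission, valid_pmids, topics,
                 dump_tokenization=False):
    """Write a validator config file and return its path.

    Hypothetical helper for illustration; the keys are the ones shown in
    the YAML example.
    """
    config = {
        "path_to_submission": str(submission),
        "path_to_valid_pmids": str(valid_pmids),
        "path_to_topics": str(topics),
        "dump_sentence_tokenization": dump_tokenization,
    }
    path = Path(config_path)
    # Assumption: a flat JSON mapping parses the same way as the YAML form.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path
```

The resulting file can then be passed to the script with --config, as in the command above.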

API usage

The real action is happening in the trec_biogen_validator.util.validator.Validator class; if you want to "roll your own" wrapper around this functionality, it should be pretty straightforward. Take a look at the unit tests or at trec_biogen_validator/__main__.py for examples of how the class works.
