What is the difference between the cdb.dat files provided by the UMLS Small/Full models versus creating own cdb.dat from UMLS DB #137

Open
stefanhgm opened this issue Jun 23, 2023 · 1 comment

Comments

@stefanhgm

Hi everyone,

First of all, thanks for your great work! I want to set up MedCATTrainer for UMLS tagging. When using one of the prepared UMLS models (small/large), I get the tagging including the CUI, but no other information. I think the problem is the missing "concepts imported" step (red cross for the project). Hence, I tried to upload the necessary concepts (cdb.dat).

I wondered whether the cdb.dat files of the prepared UMLS models already come with the necessary information (i.e. UMLS name, types, ...) or whether it is still necessary to load the UMLS into Postgres, execute the script from the MedCAT paper to get a CSV, and build the CDB. I tried the latter process, but it is very cumbersome and the documentation seems outdated (e.g. prep_cdb.prepare_csvs(paths), as used here https://towardsdatascience.com/medcat-extracting-diseases-from-electronic-health-records-f53c45b3d1c1, does not exist anymore).

In case a newly generated cdb.dat file is needed, I wondered if you could also provide it via the NIH links, as you do for the UMLS models. I think this would save a lot of hassle for anyone trying to use the UMLS as the backbone of MedCAT.

Cheers!

@tomolopolis
Member

Hi @stefanhgm, thanks, and apologies for the slow response. We don't use UMLS much and mostly rely on SNOMED CT backed MedCAT models, but UMLS should still work in the trainer. I assume you've got the trainer running and are annotating documents with a UMLS model, since you say you're seeing the concept IDs but no metadata on the right-hand side, and a red cross instead of a green check mark.

Did you follow the steps in the docs to get a green check mark?

Yes, the UMLS models come with the associated metadata needed to populate the MedCATTrainer Solr search service.

If you execute:

from medcat.cdb import CDB

# Replace the placeholder with the path to the cdb.dat of the UMLS small model
cdb = CDB.load('<<umls small file>>')
cdb.cui2names
# << a populated dict of names >>
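
To check more than just the names, something like the following rough sketch could be used. It counts, for a few metadata mappings, how many CUIs have a non-empty entry. It assumes the loaded CDB exposes cui2names, cui2preferred_name and cui2type_ids as dict attributes, which may differ between MedCAT versions. The helper summarize_cdb_metadata is hypothetical (not part of MedCAT), and the stand-in object with made-up CUIs is only there so the snippet runs without a model file; in practice you would pass the object returned by CDB.load.

```python
from types import SimpleNamespace

# Attribute names assumed to exist on a loaded CDB; adjust for your MedCAT version.
METADATA_FIELDS = ("cui2names", "cui2preferred_name", "cui2type_ids")


def summarize_cdb_metadata(cdb, fields=METADATA_FIELDS):
    """Return {field: number of CUIs with a non-empty entry} for each field.

    Hypothetical helper, not part of MedCAT. A missing attribute counts as 0.
    """
    summary = {}
    for field in fields:
        mapping = getattr(cdb, field, None) or {}
        summary[field] = sum(1 for value in mapping.values() if value)
    return summary


# Stand-in for a real CDB so the sketch runs on its own; the CUIs are made up.
fake_cdb = SimpleNamespace(
    cui2names={"C0000001": {"example~name"}, "C0000002": set()},
    cui2preferred_name={"C0000001": "Example name"},
    cui2type_ids={"C0000001": {"T000"}},
)

summary = summarize_cdb_metadata(fake_cdb)
print(summary)
```

If all counts come back as zero for the real cdb.dat, the file itself lacks the metadata; if they are populated, the missing information in the trainer is more likely down to the concept import step not having run.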
