Medical Concept Annotation Tool

MedCAT can be used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT and UMLS. Paper on arXiv.

Official Docs here

Discussion Forum discourse

Available Models

We have 4 public models available:

UMLS Small (A modelpack containing a subset of UMLS (disorders, symptoms, medications...). Trained on MIMIC-III)
SNOMED International (Full SNOMED modelpack trained on MIMIC-III)
UMLS Dutch v1.10 (a modelpack provided by UMC Utrecht containing UMLS entities with Dutch names trained on Dutch medical wikipedia articles and a negation detection model repository/paper trained on EMC Dutch Clinical Corpus).
UMLS Full (more than 4 million concepts from v2022AA of UMLS, trained self-supervised on MIMIC-III).

To download any of these models, please follow this link and sign into your NIH profile / UMLS license. You will then be redirected to the MedCAT model download form. Please complete this form and you will be provided a download link.
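Once a model pack has been downloaded, annotating a document could look roughly like the sketch below. It assumes MedCAT v1's `CAT.load_model_pack` and `get_entities` calls and assumes the result is a dictionary whose `"entities"` entry maps an id to a dictionary with `cui`, `pretty_name`, `start` and `end` keys; please check the official docs for the version you have installed. The function names `annotate` and `summarise` and the model pack path are hypothetical and only used for illustration.

```python
from typing import Any, Dict, List, Tuple


def annotate(text: str, model_pack_path: str) -> Dict[str, Any]:
    """Load a model pack and annotate a single document."""
    # Imported lazily so that summarise() below works without MedCAT installed.
    from medcat.cat import CAT

    cat = CAT.load_model_pack(model_pack_path)
    return cat.get_entities(text)


def summarise(result: Dict[str, Any]) -> List[Tuple[str, str, int, int]]:
    """Flatten an annotation result into (cui, name, start, end) tuples.

    Assumption: result["entities"] maps an id to a dict with the keys below.
    """
    rows = [
        (ent["cui"], ent["pretty_name"], ent["start"], ent["end"])
        for ent in result.get("entities", {}).values()
    ]
    # Order the entities by where they start in the text.
    return sorted(rows, key=lambda row: row[2])


# Hypothetical usage; the path is a placeholder for a downloaded model pack.
# result = annotate("Patient has type 2 diabetes.", "path/to/modelpack.zip")
# for cui, name, start, end in summarise(result):
#     print(cui, name, start, end)
```

Loading a model pack can take a while, so in practice the `CAT` object would usually be created once and reused across documents rather than reloaded per call as in this sketch.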

News

Paper van Es, B., Reteig, L.C., Tan, S.C. et al. Negation detection in Dutch clinical texts: an evaluation of rule-based and machine learning methods. BMC Bioinformatics 24, 10 (2023).
New tool in the CogStack ecosystem [19. December 2022]: Foresight -- Deep Generative Modelling of Patient Timelines using Electronic Health Records
New Paper using MedCAT [21. October 2022]: A New Public Corpus for Clinical Section Identification: MedSecId.
Major Change to the Permissions of Use [4. August 2022] MedCAT now uses the Elastic License 2.0. For further information please click here.
New Downloader [15. March 2022]: You can now download the latest SNOMED-CT and UMLS model packs via UMLS user authentication.
New Feature and Tutorial [7. December 2021]: Exploring Electronic Health Records with MedCAT and Neo4j
New Minor Release [20. October 2021] Introducing model packs, new faster multiprocessing for large datasets (100M+ documents) and improved MetaCAT.
New Release [1. August 2021]: Upgraded MedCAT to use spaCy v3. New scispaCy models have to be downloaded; all old CDBs (compatible with MedCAT v1) will work without any changes.
New Feature and Tutorial [8. July 2021]: Integrating 🤗 Transformers with MedCAT for biomedical NER+L
General [1. April 2021]: MedCAT is upgraded to v1. Unfortunately, this introduces breaking changes with older models (MedCAT v0.4), as well as potential problems with all code that used the MedCAT package. MedCAT v0.4 is available on the legacy branch and will be supported (with respect to potential bug fixes) until 1. July 2021, after which it will remain available but will no longer be updated.
Paper: What’s in a Summary? Laying the Groundwork for Advances in Hospital-Course Summarization
(more...)

Installation

To install the latest version of MedCAT run the following command:

pip install medcat

To install the latest version of MedCAT with the CPU-only build of torch (no GPU support) run the following command:

pip install medcat --extra-index-url https://download.pytorch.org/whl/cpu/

Demo

A demo application is available at MedCAT. This was trained on MIMIC-III and all of SNOMED-CT.

Tutorials

A guide on how to use MedCAT is available at MedCAT Tutorials. Read more about MedCAT on Towards Data Science.

Logging

Since MedCAT is primarily a library, logging is effectively disabled by default. Users of a library should be able to choose what, where, and how it logs.

Users can directly modify the logging behaviour of either the entire library or a chosen set of modules within it. We have provided a convenience method, medcat.add_default_log_handlers, that adds default handlers which log to the console as well as to medcat.log.

Some details as to how one can configure the logging are described in the MedCAT Tutorials.
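As a minimal sketch of the per-library and per-module control described above, the example below uses only Python's standard logging module. It assumes that MedCAT's loggers live under the "medcat" namespace (the usual result of `logging.getLogger(__name__)`); the function `configure_library_logging` and the child logger name "medcat.cat" are illustrative and not part of MedCAT's API.

```python
import logging
import sys


def configure_library_logging(name: str = "medcat",
                              level: int = logging.INFO) -> logging.Logger:
    """Attach one console handler to the named library logger.

    Assumption: the library's loggers are children of the `name` logger.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Only add our handler once, even if this function is called repeatedly.
    if not any(getattr(h, "_added_by_example", False) for h in logger.handlers):
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        handler._added_by_example = True
        logger.addHandler(handler)
    return logger


# Whole library at INFO level.
medcat_logger = configure_library_logging("medcat", logging.INFO)

# One module made more verbose; its records still propagate to the
# handler attached to the parent "medcat" logger.
logging.getLogger("medcat.cat").setLevel(logging.DEBUG)
```

Setting levels on the parent and on selected children keeps the choice of handlers in one place while still allowing a single noisy or interesting module to be tuned on its own.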

Acknowledgements

Entity extraction was trained on MedMentions. In total it has ~35K entities from UMLS.

The vocabulary was compiled from Wiktionary. In total it has ~800K unique words.

Powered By

A big thank you goes to spaCy and Hugging Face - who made life a million times easier.

Citation

@ARTICLE{Kraljevic2021-ln,
  title="Multi-domain clinical natural language processing with {MedCAT}: The Medical Concept Annotation Toolkit",
  author="Kraljevic, Zeljko and Searle, Thomas and Shek, Anthony and Roguski, Lukasz and Noor, Kawsar and Bean, Daniel and Mascio, Aurelie and Zhu, Leilei and Folarin, Amos A and Roberts, Angus and Bendayan, Rebecca and Richardson, Mark P and Stewart, Robert and Shah, Anoop D and Wong, Wai Keong and Ibrahim, Zina and Teo, James T and Dobson, Richard J B",
  journal="Artif. Intell. Med.",
  volume=117,
  pages="102083",
  month=jul,
  year=2021,
  issn="0933-3657",
  doi="10.1016/j.artmed.2021.102083"
}
