deduce

Installation - Versions - Getting Started - Documentation - Contributiong - Authors - License

Deduce 2.0.0 has been released! It includes a 10x speedup, and way more features for customizing and tailoring. Some small changes are needed to keep going from version 1, read more about it here: docs/migrating-to-v2

De-identify clinial text written in Dutch using deduce, a rule-based de-identification method for Dutch clinical text.

The development, principles and validation of deduce were initially described in Menger et al. (2017). De-identification of clinical text is needed for using text data for analysis, to comply with legal requirements and to protect the privacy of patients. Our rule-based method removes Protected Health Information (PHI) in the following categories:

Person names, including initials
Geographical locations smaller than a country
Names of institutions that are related to patient treatment
Dates
Ages
Patient numbers
Telephone numbers
E-mail addresses and URLs

If you use deduce, please cite the following paper:

Menger, V.J., Scheepers, F., van Wijk, L.M., Spruit, M. (2017). DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text, Telematics and Informatics, 2017, ISSN 0736-5853

Installation

pip install deduce

Versions

For most cases the latest version is suitable, but some specific milestones are:

2.0.0 - Major refactor, with speedups, many new options for customizing, functionally very similar to original
1.0.8 - Small bugfixes compared to original release
1.0.1 - Original release with Menger et al. (2017)

Detailed versioning information is accessible in the changelog.

Getting started

The basic way to use deduce, is to pass text to the deidentify method of a Deduce object:

from deduce import Deduce

deduce = Deduce()

text = """Dit is stukje tekst met daarin de naam Jan Jansen. De patient J. Jansen 
        (e: [email protected], t: 06-12345678) is 64 jaar oud en woonachtig 
        in Utrecht. Hij werd op 10 oktober door arts Peter de Visser ontslagen 
        van de kliniek van het UMCU."""

doc = deduce.deidentify(text)

The output is available in the Document object:

from pprint import pprint

pprint(doc.annotations)

AnnotationSet({Annotation(text='Jan Jansen', start_char=39, end_char=49, tag='persoon', length=10),
               Annotation(text='Peter de Visser', start_char=185, end_char=200, tag='persoon', length=15),
               Annotation(text='[email protected]', start_char=76, end_char=93, tag='url', length=17),
               Annotation(text='10 oktober', start_char=164, end_char=174, tag='datum', length=10),
               Annotation(text='patient J. Jansen', start_char=54, end_char=71, tag='persoon', length=17),
               Annotation(text='64', start_char=114, end_char=116, tag='leeftijd', length=2),
               Annotation(text='UMCU', start_char=234, end_char=238, tag='instelling', length=4),
               Annotation(text='06-12345678', start_char=98, end_char=109, tag='telefoonnummer', length=11),
               Annotation(text='Utrecht', start_char=143, end_char=150, tag='locatie', length=7)})

print(doc.deidentified_text)

"""Dit is stukje tekst met daarin de naam <PERSOON-1>. De <PERSOON-2> 
(e: <URL-1>, t: <TELEFOONNUMMER-1>) is <LEEFTIJD-1> jaar oud en woonachtig 
in <LOCATIE-1>. Hij werd op <DATUM-1> door arts <PERSOON-3> ontslagen 
van de kliniek van het <INSTELLING-1>."""

Aditionally, if the names of the patient are known, they may be added as metadata, where they will be picked up by deduce:

from deduce.person import Person

patient = Person(first_names=["Jan"], initials="JJ", surname="Jansen")
doc = deduce.deidentify(text, metadata={'patient': patient})

print (doc.deidentified_text)

"""Dit is stukje tekst met daarin de naam <PATIENT>. De <PATIENT> 
(e: <URL-1>, t: <TELEFOONNUMMER-1>) is <LEEFTIJD-1> jaar oud en woonachtig 
in <LOCATIE-1>. Hij werd op <DATUM-1> door arts <PERSOON-1> ontslagen 
van de kliniek van het <INSTELLING-1>."""

As you can see, adding known names keeps references to <PATIENT> in text. It also increases recall, as not all known names are contained in the lookup lists.

Documentation

A more extensive tutorial on using, configuring and modifying deduce is available at: docs/tutorial

Basic documentation and API are available at: docs

Contributing

For setting up the dev environment and contributing guidelines, see: docs/contributing

Authors

Vincent Menger - Initial work
Jonathan de Bruin - Code review
Pablo Mosteiro - Bug fixes, structured annotations

License

This project is licensed under the GNU LGPLv3 license - see the LICENSE.md file for details

Name		Name	Last commit message	Last commit date
Latest commit History 361 Commits
.github/workflows		.github/workflows
data/lookup_lists		data/lookup_lists
deduce		deduce
docs		docs
tests		tests
.gitignore		.gitignore
.readthedocs.yaml		.readthedocs.yaml
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE.md		LICENSE.md
Makefile		Makefile
README.md		README.md
config.json		config.json
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

deduce

Installation

Versions

Getting started

Documentation

Contributing

Authors

License

About

Releases

Packages

Languages

License

jacob-rousseau/deduce

Folders and files

Latest commit

History

Repository files navigation

deduce

Installation

Versions

Getting started

Documentation

Contributing

Authors

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages