The Greater Plains Collaborative (GPC), led by KUMC Medical Informatics, is a PCORI Clinical Data Research Network (CDRN). As explained in the GROUSE executive summary:
> ... we seek to expand the data completeness of our patients’ health care processes and outcomes and understand the information gain for our complete and comprehensive population by comparing correlations between the Medicare and Medicaid claims data with the data in our CDRN that includes the electronic health record and billing data from each of our component health systems, clinical registry data (e.g. hospital tumor registries), private payer claims data, and patient-reported outcomes, as available, and work with our GPC and CDRN investigators to answer specific cohort questions to achieve our overarching aims.
This software supports the project as follows:
1. GPC sites will send NewWave-GDIT (the Research Data Distribution Center under contract with CMS) encrypted beneficiary identifiers (e.g., SSN, HICs, name, DOB) along with a randomly generated ID for each of their patients.
2. NewWave-GDIT will generate the finder files that contain mappings
   between the random IDs and CMS IDs and send the crosswalk files
   and CMS data back to KUMC.
   - grouse_tables.csv summarizes the CMS data.
   - staging supports decrypting the CMS data from the encrypted archives and loading it into Oracle.
3. KUMC will integrate the crosswalk and CMS data with individual
   site EMR data (limited data set) by linking on the random
   IDs. This will allow KUMC to achieve record linkage of patients’
   EMR and claims data without obtaining or retaining actual
   patient-level PII from individual GPC sites.
   - deid supports creating a de-identified copy of the CMS tables.
   - site_integration supports integrating site EMR data.
4. The merged dataset will be de-identified and made available to project team members for running analyses or using the i2b2 client.
[//]: # (TODO: add obfuscate.py, which supports step 1.)
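
The random-ID linkage described in the workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's code: the function names `assign_random_ids` and `link_records`, the field names `random_id` and `cms_id`, and the in-memory dictionaries are all hypothetical, and the actual pipeline works against Oracle tables and the finder-file layout supplied by NewWave-GDIT.

```python
import secrets


def assign_random_ids(patient_keys):
    """Site side (hypothetical): give each local patient key a fresh random ID."""
    ids = {}
    used = set()
    for key in patient_keys:
        rid = secrets.token_hex(8)
        while rid in used:  # collisions are very unlikely, but guard anyway
            rid = secrets.token_hex(8)
        used.add(rid)
        ids[key] = rid
    return ids


def link_records(site_rows, crosswalk, claims_rows):
    """KUMC side (hypothetical): join site EMR rows to CMS claims via the crosswalk.

    site_rows:   dicts with 'random_id' plus EMR fields (no direct identifiers)
    crosswalk:   dict mapping random_id -> cms_id (assumed finder-file content)
    claims_rows: dicts with 'cms_id' plus claim fields
    """
    # Index claims by CMS ID so each site row is matched in one lookup.
    claims_by_cms_id = {}
    for claim in claims_rows:
        claims_by_cms_id.setdefault(claim["cms_id"], []).append(claim)

    linked = []
    for row in site_rows:
        cms_id = crosswalk.get(row["random_id"])
        if cms_id is None:
            continue  # no match came back for this patient
        for claim in claims_by_cms_id.get(cms_id, []):
            merged = dict(row)
            # Carry over the claim fields but leave the CMS ID out of the result.
            merged.update({k: v for k, v in claim.items() if k != "cms_id"})
            linked.append(merged)
    return linked
```

The point of the design is that the linked rows carry only the random ID: the site never sends KUMC the identifiers it shared with the distribution center, and the sketch drops the CMS ID once the join is done.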
The platform used at the University of Kansas Medical Center Division of Medical Informatics includes:
- Linux (SLES 12)
- Python
- Oracle database 12c
- Jenkins (to run scripts in a repeatable way while maintaining logs)