
Andromeda object with JSON files instead of RDS files #22

Open
lhjohn opened this issue Jun 29, 2021 · 6 comments

Comments

@lhjohn

lhjohn commented Jun 29, 2021

Hi,

I am looking into the possibility of writing and reading Andromeda data objects directly from other programming languages or web interfaces such as C++ or Python. Currently, we can read and write the covariate data in Python or C++ using SQLite (example).

However, there are a number of RDS files in the Andromeda object (cohort.rds, metaData.rds, outcomes.rds, etc.), which can only be read by R.

How feasible would it be to natively support a JSON version of these RDS files in the Andromeda object?

This would allow us to implement our "own version of Andromeda" using C++ or Python but still keep compatibility with OHDSI's R ecosystem.

@msuchard
Member

As an alternative to re-engineering Andromeda to support yet another language, maybe we could try:

https://github.com/ofajardo/pyreadr

@ablack3
Collaborator

ablack3 commented Jul 1, 2021

I certainly like the idea of being able to read covariate data in Python. Are there any OHDSI Python packages where a loadAndromeda function could live? I guess it would be in https://github.com/OHDSI/DeepPatientLevelPrediction.

@lhjohn
Author

lhjohn commented Jul 1, 2021

We are working on an initial implementation in OHDSI/DeepPatientLevelPrediction, as this package will be built primarily using PyTorch.

I am using the pyreadr package that @msuchard suggested. It seems to work fine for unnested, standard data frames.

@schuemie
Member

schuemie commented Jul 6, 2021

It might be important to separate two things:

On the one hand we have Andromeda objects that are mainly a SQLite database (zipped when stored) and some R attributes (typically an R list object with some meta-data).

On the other hand there are more complex objects that include Andromeda objects. I think the PlpData object is actually an R list where one member is an Andromeda object.

In packages that I've been involved in (FeatureExtraction, CohortMethod, SelfControlledCaseSeries) the data objects inherit from Andromeda objects, and are therefore still just a SQLite database with some meta-data as attributes. For example, the covariateData object created by FeatureExtraction is an Andromeda object with several tables in the SQLite database and a single 'metaData' attribute that is a list with two members: populationSize (a numeric value) and cohortId (a numeric vector).
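
As a rough illustration of the first case, the sketch below reads one table from a zipped SQLite file using only the Python standard library. It assumes the archive contains a single member with a `.sqlite` extension and a `covariates` table with `rowId`, `covariateId`, and `covariateValue` columns. The member name, the helpers `read_andromeda_table` and `make_demo_archive`, and the demo rows are all hypothetical, and the R attributes are not handled here.

```python
import os
import sqlite3
import tempfile
import zipfile


def read_andromeda_table(archive_path, table):
    """Return (column_names, rows) for one table in a zipped SQLite file.

    Assumption: the archive holds exactly one member ending in '.sqlite'.
    """
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(archive_path) as zf:
            members = [m for m in zf.namelist() if m.endswith(".sqlite")]
            if len(members) != 1:
                raise ValueError("expected exactly one .sqlite member")
            db_path = zf.extract(members[0], tmp)
        con = sqlite3.connect(db_path)
        try:
            # Table names cannot be bound as SQL parameters, so check the
            # requested name against the tables that actually exist.
            names = [r[0] for r in con.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'")]
            if table not in names:
                raise KeyError(table)
            cur = con.execute(f'SELECT * FROM "{table}"')
            columns = [d[0] for d in cur.description]
            rows = cur.fetchall()
        finally:
            con.close()
    return columns, rows


def make_demo_archive(archive_path):
    """Build a small stand-in archive so the sketch can run end to end."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "data.sqlite")
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE covariates "
                    "(rowId INTEGER, covariateId INTEGER, covariateValue REAL)")
        con.executemany("INSERT INTO covariates VALUES (?, ?, ?)",
                        [(1, 1002, 1.0), (2, 1002, 1.0), (2, 2005, 0.5)])
        con.commit()
        con.close()
        with zipfile.ZipFile(archive_path, "w") as zf:
            zf.write(db_path, arcname="data.sqlite")
```

Calling `make_demo_archive(path)` and then `read_andromeda_table(path, "covariates")` returns the column names and the rows as tuples. A real file written by R may lay out the archive differently, so the member lookup would need to be adapted.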

@ablack3
Collaborator

ablack3 commented Oct 25, 2021

I think Andromeda is responsible for saving and restoring user-defined attributes, which could be any R object. @schuemie Would it be reasonable to restrict what types of attributes can be assigned to an Andromeda object? For example, is a fitted model a valid Andromeda attribute?

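library(Andromeda)
and <- andromeda(cars = cars)
# Andromeda tables are lazy database tables, so collect into a data frame before fitting
attr(and, "model") <- lm(speed ~ dist, data = dplyr::collect(and$cars))

# I don't think I can convert a fitted model to JSON easily
jsonlite::toJSON(attr(and, "model"))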

@schuemie
Member

Yes, I don't see an issue with restricting the attributes to objects that can be converted to JSON.

One annoyance I've found when converting R objects to JSON is that the object class attribute is lost. I've written some code that preserves object attributes like these in the JSON, as you can see here. I recommend using that here as well.
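
To illustrate the idea of carrying the class alongside the value, here is a minimal Python sketch of one possible JSON envelope, together with a check for whether a value can be converted to JSON at all. The `attributes`/`value` layout and the function names are hypothetical and are not the format used by the code linked above. The metaData fields in the usage note follow the covariateData example earlier in the thread.

```python
import json


def to_json_with_attributes(value, attributes=None):
    """Serialize value together with side attributes such as an R class.

    Raises TypeError if value cannot be represented as JSON, which is one
    way to enforce a "JSON-convertible attributes only" restriction.
    """
    envelope = {"attributes": attributes or {}, "value": value}
    return json.dumps(envelope, sort_keys=True)


def from_json_with_attributes(text):
    """Return (value, attributes) from an envelope written above."""
    envelope = json.loads(text)
    return envelope["value"], envelope["attributes"]


def is_json_convertible(value):
    """Check whether value survives json.dumps without a custom encoder."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True
```

For example, `to_json_with_attributes({"populationSize": 1000, "cohortId": [1, 2]}, {"class": ["list"]})` produces a string that `from_json_with_attributes` turns back into the same value and attributes. An arbitrary object such as a fitted model would fail the `is_json_convertible` check.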
