How to design and distribute a Strategus network study #148
As I mentioned, for now I would just keep the renv lock file and the specifications JSON separate. A strawman proposal on how to distribute them in the short term:
# Instantiate R environment:
install.packages("renv")
download.file("https://raw.githubusercontent.com/ohdsi-studies/<repo name>/main/renv.lock", "renv.lock")
renv::init()
# Run study:
download.file("https://raw.githubusercontent.com/ohdsi-studies/<repo name>/main/specs.json", "specs.json")
analysisSpecifications <- ParallelLogger::loadSettingsFromJson("specs.json")
executionSettings <- Strategus::createCdmExecutionSettings(
  workDatabaseSchema = "<work schema>",
  cdmDatabaseSchema = "<cdm schema>",
  cohortTableNames = CohortGenerator::getCohortTableNames(cohortTable = "<cohort table name>"),
  workFolder = "<work folder>",
  resultsFolder = "<results folder>",
  minCellCount = 5
)
connectionDetails <- DatabaseConnector::createConnectionDetails(...)
Strategus::execute(
  analysisSpecifications = analysisSpecifications,
  executionSettings = executionSettings,
  connectionDetails = connectionDetails
)
Alternatively, or additionally, we could have the repo contain an RStudio project with the renv lock file, specs JSON, and a single R script that instantiates the R environment and executes the study. People could clone the repo, then modify and run the R script. In the future we'd expect an execution engine to handle all of this.
Thanks @schuemie - I'd support keeping the renv.lock file and the analysis specification as separate documents since they serve two different functional purposes. Having the renv.lock file describe the configuration of the R environment for the study enables us to use renv as that package intends vs. trying to expose renv functionality inside of Strategus.
I'd support using an RStudio project for distributing a Strategus study since it would also allow us to bundle together additional resources as mentioned in #98 and provide support for viewing results at a specific site per #78.
I'll tag @konstjar for his thoughts here. Arachne supports uploading a study .zip file that contains a script used to execute the study and supports supplying that script with parameters for execution. So I think that having either a script as you showed in your previous post or an R Project would work well to run via Arachne.
Bringing over this from #51 so that we can discuss in the context of how we'd propose the design and distribution of a Strategus study.
A potential pitfall with the workflow above is that the renv.lock file that was used to create the analysis specification could get out of sync with the packages needed to run the study. @schuemie suggested here that we include the renv.lock file in the analysis specification so that we have a hard link between the lock file and specifications. Additionally, @chrisknoll expressed a desire to have each release of Strategus come with a published renv.lock file that lists which versions of package dependencies have been tested with the given version of Strategus, noting that within a single release of Strategus you may have multiple updates to underlying packages. So, if we adopt the ideas above (include renv.lock in the analysis spec and have an renv.lock file that comes with Strategus), what does a developer workflow look like to design a study using Strategus? Here's how I was thinking about it at the moment, sticking with the idea that we're still distributing an R project:
What makes me uneasy about this approach is that if we need to change an R dependency, we'd have to update the renv.lock file in the root of the project AND the analysis specification. We'd potentially need methods inside of Strategus to keep the renv.lock file in the root of the project in sync with the one that ultimately winds up in the analysis specification, to check that the environment used to execute the study is consistent with what is declared in the analysis specification, etc. I fully agree that we need to have a hard link between the lock file and specifications. Is that not what the R project is providing in this case, since it includes the renv.lock file? Also tagging @mdlavallee92 as I think setting up the Strategus development environment was a topic in Ulysses here: OHDSI/Ulysses#17. We'd need to decide on how we want developers to design a study using Strategus, and where needed we can use Ulysses to help with the setup.
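To make the consistency check mentioned above more concrete, here is a minimal sketch of what comparing an R library against a lock file could look like. The function name `checkLockFileConsistency` is hypothetical and not part of Strategus or renv, and the use of jsonlite to read the lock file is an assumption; the sketch relies only on the `Packages` section of the renv.lock format, where each entry records a `Version`.

```r
library(jsonlite)

# Hypothetical helper, not part of Strategus or renv.
# Returns a data frame listing the packages whose installed version differs
# from the version recorded in the lock file, or that are not installed.
checkLockFileConsistency <- function(lockFile = "renv.lock", installed = NULL) {
  if (is.null(installed)) {
    # Named character vector: names are package names, values are versions
    installed <- utils::installed.packages()[, "Version"]
  }
  lock <- jsonlite::fromJSON(lockFile, simplifyVector = FALSE)
  # Versions declared in the lock file, named by package
  required <- vapply(lock$Packages, function(p) p$Version, character(1))
  # Installed versions for the same packages (NA when a package is missing)
  found <- unname(installed[names(required)])
  mismatch <- is.na(found) | found != unname(required)
  data.frame(
    package = names(required)[mismatch],
    required = unname(required)[mismatch],
    installed = found[mismatch],
    stringsAsFactors = FALSE
  )
}

# Possible usage at a study site, assuming renv.lock is in the working folder:
# problems <- checkLockFileConsistency("renv.lock")
# if (nrow(problems) > 0) print(problems)
```

renv itself offers `renv::status()` for reporting differences between a project's library and its lock file, so a check like this may not need to live inside Strategus at all.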
As discussed yesterday, perhaps we could create a StrategusBootstrap package (might come up with a cooler name). This would be a very light package, with only a dependency on
The first is aimed at the time when we start to design our study. The user might do something like:
install.packages("StrategusBootstrap")
StrategusBootstrap::createStrategusEnvironment()
After this, Strategus and its dependencies will be installed and ready to start defining the study. The second is aimed at network study execution. The user might do something like:
install.packages("StrategusBootstrap")
StrategusBootstrap::createStrategusEnvironment("ohdsi-studies/SemaglutideNaion")
After this, Strategus and its dependencies are installed and ready to execute the study. Both functionalities could write some additional R file(s) that the user can use for their respective task (designing the study or executing the study).
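StrategusBootstrap does not exist yet, so the following is only a rough sketch of what the study-execution variant might do internally, under stated assumptions: the function names `buildStudyFileUrl` and `createStudyEnvironment` are hypothetical, the study repo is assumed to keep `renv.lock` and `specs.json` in the root of its `main` branch (as in the strawman earlier in this thread), and `renv::restore()` is used to install the locked package versions.

```r
# Hypothetical helper: build the raw GitHub URL for a file in a study repo.
buildStudyFileUrl <- function(repo, file, branch = "main") {
  # Expect "<organization>/<repository>", e.g. "ohdsi-studies/SemaglutideNaion"
  stopifnot(grepl("^[^/\\\\]+/[^/\\\\]+$", repo))
  sprintf("https://raw.githubusercontent.com/%s/%s/%s", repo, branch, file)
}

# Hypothetical sketch of the bootstrap step for study execution.
createStudyEnvironment <- function(repo, projectFolder = getwd(), branch = "main") {
  lockFile <- file.path(projectFolder, "renv.lock")
  specFile <- file.path(projectFolder, "specs.json")
  # Fetch the lock file and the analysis specifications from the study repo
  utils::download.file(buildStudyFileUrl(repo, "renv.lock", branch), lockFile)
  utils::download.file(buildStudyFileUrl(repo, "specs.json", branch), specFile)
  # Install the exact package versions recorded in the lock file
  renv::restore(project = projectFolder, lockfile = lockFile, prompt = FALSE)
  invisible(projectFolder)
}

# Possible usage (requires renv and network access):
# createStudyEnvironment("ohdsi-studies/SemaglutideNaion", projectFolder = "myStudy")
```

Keeping the bootstrap logic this thin would mean the package itself only needs renv, with Strategus and everything else coming from the study's lock file.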
I'm going to remove this from the v1.0 milestone and will leave this issue open since we have not fully addressed all of the points from this discussion. For the v1.0 release, this will be documented and tracked via ohdsi-studies/StrategusStudyRepoTemplate#4.
Adding this issue since it is related to a number of other issues that are part of the v1.0 milestone such as #98, #78, #29.
Per discussion with @schuemie: As discussed at the last Global Symposium, one idea is to combine the renv.lock file with the study analysis specifications into a single JSON object, and that renv.lock file can be anything.
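As an illustration of the single-JSON idea, the sketch below embeds a lock file into a specifications file and extracts it again before the environment is restored. Everything here is an assumption made for illustration: the helper names, the `renvLock` field name, and the use of jsonlite for reading and writing. It has not been checked against how ParallelLogger serializes Strategus settings, so the combined file may need different handling in practice.

```r
library(jsonlite)

# Hypothetical helper: store the contents of renv.lock under a "renvLock"
# field (an assumed name) inside the analysis specifications JSON.
embedLockFile <- function(specFile, lockFile, outFile) {
  spec <- jsonlite::fromJSON(specFile, simplifyVector = FALSE)
  spec$renvLock <- jsonlite::fromJSON(lockFile, simplifyVector = FALSE)
  jsonlite::write_json(spec, outFile, auto_unbox = TRUE, pretty = TRUE, null = "null")
  invisible(outFile)
}

# Hypothetical helper: write the embedded lock file back to disk so that
# renv can use it to instantiate the R environment.
extractLockFile <- function(combinedFile, lockFile = "renv.lock") {
  combined <- jsonlite::fromJSON(combinedFile, simplifyVector = FALSE)
  if (is.null(combined$renvLock)) {
    stop("No embedded lock file found in ", combinedFile)
  }
  jsonlite::write_json(combined$renvLock, lockFile, auto_unbox = TRUE, pretty = TRUE, null = "null")
  invisible(lockFile)
}

# Possible usage:
# embedLockFile("specs.json", "renv.lock", "study.json")
# extractLockFile("study.json", "renv.lock")
```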
We imagined several scenarios, including:
For now, we could just have the lock file and the specifications be separate (although they definitely belong together). Anyone wanting to run the study would use the lock file to instantiate their R environment, and at that point the right version of Strategus would be installed and ready to run the specifications.
This does raise the question: how do we want to design and distribute a Strategus network study? If we look at previous studies that used Strategus (anti-VEGF SOS Study), we are using an R Project to encapsulate the entire study. We've also discussed only requiring the analysis specification JSON as the means to exchange a full study amongst collaborators. Let's use this issue to discuss this important topic.