Running a job across multiple CDMs does not take advantage of the multi-process execution that targets provides (which would also support clusters).
As a result, job execution often spends time waiting on CDM I/O, or users have to configure execution across the CDMs manually.
The individual tasks are also not fine-grained enough to separate I/O-bound tasks from CPU-intensive ones (such as PLP, SCCS or CohortMethod, which operate on local Andromeda objects with multiple processes).
Some of this may stem from individual analytics packages having multiple internal steps, which would be difficult to resolve with the way tasks are currently set up. Even so, execution could be significantly improved by exposing the targets workflow, using meta-targets and internal targets functions to spawn multiple jobs.
This would also give us the advanced functionality of targets (e.g. using SLURM clusters to execute multiple jobs). Even if it didn't, it would be healthy to decouple our execution infrastructure from targets, which is currently used only for dependency trees.
Current approach
The study execution is sent to Strategus once per CDM
Strategus creates the targets task script internally
Strategus executes the tasks by calling targets
Proposed approach
Strategus takes 1) an analysis script and 2) the CDMs to execute on
Strategus creates the list of targets for the targets file across the configured CDMs
The user executes targets::tar_make in a custom way (or the call is just masked by targets).
Note that in both cases, the tasks that upload results to a results database and the meta-analysis step are still optional target types. However, in the latter case we will be able to see clearly that they are dependent tasks of all CDM executions.
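The proposed approach could look roughly like the following `_targets.R` sketch, which generates one target per CDM and a meta-analysis target that depends on all of them. This is only an illustration, not how Strategus is implemented: the CDM identifiers and the functions `run_cdm_analysis()` and `combine_results()` are hypothetical placeholders. `targets::tar_target_raw()` is used because it accepts the target name as a string and the command as an expression, which suits programmatic generation.

```r
library(targets)

# Hypothetical CDM identifiers; in practice these would come from configuration.
cdm_ids <- c("cdm_a", "cdm_b")

# Hypothetical placeholder for executing the analysis specification on one CDM.
run_cdm_analysis <- function(cdm_id) {
  list(cdm = cdm_id, status = "done")
}

# Hypothetical placeholder for a step that needs every CDM result (e.g. meta-analysis).
combine_results <- function(...) {
  list(...)
}

target_names <- paste0("analysis_", cdm_ids)

# One independent target per CDM, so targets can schedule them separately.
per_cdm_targets <- lapply(seq_along(cdm_ids), function(i) {
  tar_target_raw(
    name = target_names[i],
    command = bquote(run_cdm_analysis(.(cdm_ids[i])))
  )
})

# Fan-in target: its command refers to every per-CDM target by name,
# which makes the dependency on all CDM executions explicit in the graph.
meta_target <- tar_target_raw(
  name = "meta_analysis",
  command = as.call(c(as.symbol("combine_results"), lapply(target_names, as.symbol)))
)

# A _targets.R file ends with the list of targets.
pipeline <- c(per_cdm_targets, list(meta_target))
pipeline
```

With a file like this in place, the user would call `targets::tar_make()` themselves. Whether the per-CDM targets run in parallel depends on how the user configures targets (for example, a crew controller set through `tar_option_set()`), which is the part this proposal leaves to the user.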
Stretch goal
Allow package maintainers to split out tasks within analytics packages by exposing an interface that allows targets to see them. For example, in PLP there can be a single-process task "pull covariates" and a multi-process task "train models". Internally, we still take advantage of multithreaded calls (e.g. in C++ code or external libraries), but in this case there are multiple models that use independent parameters and/or hyperparameters and so will finish execution at different times. The same applies to any study with many Target/Comparator/Indication comparisons, each of which needs an independent propensity score model, for example.
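As a rough sketch of what such a split might look like to targets, the example below uses dynamic branching (`pattern = map(...)`) so that each model becomes its own branch and can finish independently. The functions `pull_covariates()` and `train_model()` and the model identifiers are hypothetical stand-ins, not the PatientLevelPrediction API.

```r
library(targets)

# Hypothetical stand-in for the single-process, I/O-bound extraction step.
pull_covariates <- function() {
  data.frame(person_id = 1:3, covariate_value = c(0.1, 0.5, 0.9))
}

# Hypothetical stand-in for fitting one model on the shared covariates.
train_model <- function(model_id, covariates) {
  paste(model_id, "trained on", nrow(covariates), "rows")
}

pipeline <- list(
  # Runs once: pull the covariate data.
  tar_target(covariates, pull_covariates()),
  # Hypothetical model identifiers, one per independent model setting.
  tar_target(model_id, c("lasso", "gradient_boosting")),
  # Dynamic branching: one branch per element of model_id.
  tar_target(model, train_model(model_id, covariates), pattern = map(model_id))
)
pipeline
```

Each branch of `model` depends only on `covariates` and its own `model_id`, so a slow model does not hold up the others when targets is configured with multiple workers.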