Structural change suggestion to use of Targets #128

Open

azimov opened this issue Mar 21, 2024 · 0 comments

This is a suggestion about how the execution works and how it could be improved.

Currently, the way I see this happening is roughly as follows:

  • Take the analysis spec and work out the dependencies
  • Create targets from this dependency tree
  • Run the targets pipeline from inside Strategus, in an internal work folder

My problem with this approach is that (as noted in #63) it makes things difficult when something goes wrong. For example, there are manual workarounds when a module fails (because of a bug in the package), but it's very difficult to invalidate the cache entry that causes it to rerun, so you can't skip the step, etc.
Furthermore, it means that execution can only happen in one rigid way, which may not be how users wish to work. For example, there may be simple custom scripts that run on the data set that could easily be added to the workflow.
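For contrast, under the proposed model (below) the standard targets tooling would apply to exactly these situations. A minimal sketch, assuming a pipeline with a target named cohortMethodModule (a made-up name):

```r
library(targets)

# Inspect which targets recorded an error on the last run:
tar_meta(fields = error, complete_only = TRUE)

# Force the failed target to rerun (or edit _targets.R to drop it entirely):
tar_invalidate(cohortMethodModule)  # hypothetical target name

# Rebuild; only invalidated or changed targets rerun:
tar_make()
```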

Proposed change
Instead I propose the following change:

  • An analysis spec still creates a targets workflow for the user, but this is largely a convenience that creates a _targets.R file in the user's space.
  • RunStrategus will create this file by default, but users can also call targets::tar_make directly, outside of Strategus (a sketch of what the generated file might look like follows)
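To be concrete, here is a hypothetical sketch of the _targets.R Strategus could write into the user's project; the execute* helpers and the spec file name are placeholders, not real Strategus functions:

```r
# _targets.R (generated by Strategus; placeholder function names throughout)
library(targets)

tar_option_set(packages = c("Strategus"))

list(
  # Track the spec file itself so edits invalidate downstream targets:
  tar_target(specFile, "analysisSpecification.json", format = "file"),
  tar_target(analysisSpec, jsonlite::read_json(specFile)),
  tar_target(cohorts, executeCohortGeneratorModule(analysisSpec)),
  tar_target(cmResults, executeCohortMethodModule(analysisSpec, cohorts))
)
```

RunStrategus would then amount to writing this file and calling targets::tar_make(), while users remain free to run any targets verb against the same file themselves.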

We should also try to follow patterns like those in the tarchetypes package.
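For instance, tarchetypes::tar_map could statically branch the pipeline over several databases; runAnalysisOnDatabase and the database ids below are made-up illustrations:

```r
library(targets)
library(tarchetypes)

# Inside the target list in _targets.R: one results target per database id.
tar_map(
  values = list(databaseId = c("synpuf", "ccae")),
  tar_target(results, runAnalysisOnDatabase(analysisSpec, databaseId))
)
```

This pattern would also cover the multi-database point among the benefits below.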

**Benefits of this approach**

  • Users can customize their own execution around targets (e.g. are they using a SLURM cluster, or do they just want parallel execution because they have a 32-core machine and CohortMethod and SelfControlledCaseSeries can execute simultaneously?); see the sketch after this list
  • Users can easily configure this to run on multiple databases, where at the moment this is sequential
  • Multiprocess targets will allow some modules to fail while execution continues on other, non-dependent parts of the workflow DAG
  • Users can add extra steps (e.g. I might have a custom utility that pipes the CSV files into a report that I publish)
  • Users can better interrogate failed steps and remove them from the workflow
  • Output in the terminal will look nicer
  • Users will be able to use all targets features (such as visualizing the workflow)
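As a sketch of that user-side customization, assuming the generated _targets.R calls tar_option_set() somewhere the user can edit (the worker count is arbitrary):

```r
library(targets)

# Run non-dependent modules in parallel on one machine, and keep building
# the rest of the DAG when an individual target errors:
tar_option_set(
  controller = crew::crew_controller_local(workers = 32),
  error = "continue"
)

# Or dispatch each target as a SLURM job instead:
# tar_option_set(controller = crew.cluster::crew_controller_slurm(workers = 32))

# Visualize the workflow DAG and its up-to-date/outdated status:
tar_visnetwork()
```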

Issues we will need to figure out:

  1. I see there being problems when users customize this targets file and then change the analysis spec. I think creating targets dynamically from the analysis spec is possible, though (so the generated workflow is itself a visual representation of the spec); see the sketch after this list

  2. Will other things still work? (e.g. loading code in modules could be tricky)
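One possible shape for issue (1) is to build the target list from the spec at pipeline-definition time, so spec edits propagate without hand-editing _targets.R; specToTargets is a hypothetical helper mapping one module specification to a tar_target():

```r
# _targets.R
library(targets)

spec <- jsonlite::read_json("analysisSpecification.json")

# Returns one target (or list of targets) per module in the spec:
lapply(spec$moduleSpecifications, specToTargets)
```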
