
Instrument torchsupport with logging framework that supports MLflow #6

Open
wants to merge 3 commits into base: master

Conversation


@ankmathur96 ankmathur96 commented May 12, 2020

Context
There's a desire to support MLflow logging in torchsupport's tooling for the SCGPM COVID-19 effort. torchsupport is currently structured around hardcoded support for a Tensorboard SummaryWriter, so as written, supporting MLflow would require significant code changes that would largely duplicate what already exists for Tensorboard.

Proposal for Review
I propose that the AbstractVAETraining class holds a "Logger" instead of a Tensorboard writer. The logger exposes a generalized "log" API, which is implemented by backends that satisfy the Logger interface. For example, a TensorboardLogger would implement the interface by writing Tensorboard events, and an MLflowLogger would log to MLflow through the same API. This way, a custom training workflow has one consistent logging interface to call, regardless of which service it logs to.
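A minimal sketch of the proposed interface (the class names and `log` signature here are illustrative assumptions, not the actual torchsupport API):

```python
from abc import ABC, abstractmethod

class Logger(ABC):
    """Generalized logging interface; backends implement log()."""

    @abstractmethod
    def log(self, name, value, step):
        """Record a named scalar value at a given training step."""

class InMemoryLogger(Logger):
    """Trivial backend, used here only to illustrate the interface."""

    def __init__(self):
        self.records = []

    def log(self, name, value, step):
        self.records.append((name, value, step))

# In the same spirit, a TensorboardLogger would wrap a SummaryWriter
# (calling writer.add_scalar(name, value, step)), and an MLflowLogger
# would call mlflow.log_metric(name, value, step=step) behind the
# identical log() signature.
```

Training code would then call `self.logger.log("loss", loss_value, step)` without knowing which backend is attached.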

There are two open questions here:

  1. Where should such a logging system sit in the code? Should it be in the abstract Training class? For now, it has been put in AbstractVAETraining, since that is where the writer was defined.
  2. Should writer be removed? For now, it is not, since other training workflows may have overridden functions like step/train/checkpoint and assumed the existence of self.writer.

TODOs:

  1. Remove the self.writer calls in the pre-defined VAE workflows. They are left in for now so reviewers have context for where changes were made.
  2. Add a way to set a tracking URI for MLflow. It can be passed in via an environment variable, but ideally it would also be settable as a parameter.
  3. Save model parameters. The code currently seems to package up parameters from the optimizers and models through some mechanism; it would be good to find a way to log these to MLflow as well.
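For TODO 2, the tracking URI could be taken as a constructor parameter rather than read from the MLFLOW_TRACKING_URI environment variable. A sketch, assuming an MLflowLogger class along the lines proposed above (`mlflow.set_tracking_uri` and `mlflow.log_metric` are real MLflow APIs; the class itself is hypothetical):

```python
class MLflowLogger:
    """Hypothetical MLflow backend for the Logger interface.

    The tracking URI is an explicit parameter; mlflow is imported
    lazily so the class can be defined without the package installed."""

    def __init__(self, tracking_uri=None):
        self.tracking_uri = tracking_uri

    def _mlflow(self):
        import mlflow  # deferred import: only needed when logging
        if self.tracking_uri is not None:
            mlflow.set_tracking_uri(self.tracking_uri)
        return mlflow

    def log(self, name, value, step):
        self._mlflow().log_metric(name, value, step=step)
```

Callers could then write `MLflowLogger(tracking_uri="http://my-server:5000")` instead of exporting an environment variable before launching training.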
