
Instrument torchsupport with logging framework that supports MLflow #6

Open
wants to merge 3 commits into base: master

Conversation


@ankmathur96 ankmathur96 commented May 12, 2020

Context
There's a desire to support MLflow logging in torchsupport's tooling for the SCGPM COVID-19 effort. torchsupport is currently structured around hardcoded support for a Tensorboard SummaryWriter, so as written, supporting MLflow would require significant code changes that would largely duplicate what already exists for Tensorboard.

Proposal for Review
I propose that the AbstractVAETraining class holds a "Logger" instead of a Tensorboard writer. The logger exposes a generalized "log" API, which is implemented by backends that satisfy the Logger interface. For example, a TensorboardLogger would implement the interface by writing Tensorboard events, and an MLflowLogger would log to MLflow through the same API. This way, a custom training workflow has one consistent logging interface to call, regardless of which service it logs to.
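A minimal sketch of the proposed interface (the class names and `log` signature here are illustrative assumptions, not the actual torchsupport API):

```python
from abc import ABC, abstractmethod

class Logger(ABC):
    """Generalized logging interface; backends implement log()."""

    @abstractmethod
    def log(self, name, value, step):
        """Record a named scalar value at a given training step."""

class InMemoryLogger(Logger):
    """Trivial backend, used here only to illustrate the interface."""

    def __init__(self):
        self.records = []

    def log(self, name, value, step):
        self.records.append((name, value, step))

# In the same spirit, a TensorboardLogger would wrap a SummaryWriter
# (calling writer.add_scalar(name, value, step)), and an MLflowLogger
# would call mlflow.log_metric(name, value, step=step) behind the
# identical log() signature.
```

Training code would then call `self.logger.log("loss", loss_value, step)` without knowing which backend is attached.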

There are two open questions here:

  1. Where should such a logging system sit in the code? Should it be in the abstract Training class? For now, it has been put in AbstractVAETraining, since that is where the writer was defined.
  2. Should writer be removed? For now, it is not, since other training workflows may have overridden functions like step/train/checkpoint and assumed the existence of self.writer.

TODOs:

  1. Remove the self.writer calls in the pre-defined VAE workflows. They are left in for now so reviewers have context for where changes were made.
  2. Add a way to set a tracking URI for MLflow. It can be passed in via an environment variable, but ideally it would also be settable as a parameter.
  3. Save model parameters. The code currently seems to package up parameters from the optimizers and models through some mechanism; it would be good to find a way to log these to MLflow as well.
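For TODO 2, the tracking URI could be taken as a constructor parameter rather than read from the MLFLOW_TRACKING_URI environment variable. A sketch, assuming an MLflowLogger class along the lines proposed above (`mlflow.set_tracking_uri` and `mlflow.log_metric` are real MLflow APIs; the class itself is hypothetical):

```python
class MLflowLogger:
    """Hypothetical MLflow backend for the Logger interface.

    The tracking URI is an explicit parameter; mlflow is imported
    lazily so the class can be defined without the package installed."""

    def __init__(self, tracking_uri=None):
        self.tracking_uri = tracking_uri

    def _mlflow(self):
        import mlflow  # deferred import: only needed when logging
        if self.tracking_uri is not None:
            mlflow.set_tracking_uri(self.tracking_uri)
        return mlflow

    def log(self, name, value, step):
        self._mlflow().log_metric(name, value, step=step)
```

Callers could then write `MLflowLogger(tracking_uri="http://my-server:5000")` instead of exporting an environment variable before launching training.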
