DVC integration to track data artifacts & support for MLOps integration #250
DVC (Data Version Control) is a data and ML experiment management tool that builds on the DevOps and version-control toolset most developers already know (Git, IDEs, CI/CD, etc.).

There are several reasons to integrate DVC into ML/DL codebases. Git is not designed to hold the training and validation logs of long, complex experiments alongside the trained models, and managing binary artifacts in Git/GitHub is still difficult and is discouraged. DVC makes this much easier.

DVC is storage agnostic: backends such as an S3 bucket, Azure, SSH, Google Drive, etc. can be used to store and track all binary artifacts. DVC is also easy to pick up, since most of its commands mirror Git's.

I can update this PR further if the author is interested in integrating DVC into this codebase. Thank you!
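
To illustrate how closely the workflow follows Git, here is a minimal sketch of a first-time setup, written as a small Python helper that prints the commands (a dry run by default). The data path `data/train`, the remote name `storage`, and the bucket URL `s3://example-bucket/dvc-store` are placeholders I made up for illustration and are not taken from this repository. It assumes the commands run from the root of an existing Git repository with DVC installed.

```python
import posixpath
import subprocess


def dvc_setup_commands(data_path, remote_name, remote_url):
    """Return the CLI calls for a first-time DVC setup, in order."""
    # `dvc add <path>` writes a small `<path>.dvc` pointer file and a
    # `.gitignore` next to the data; only those two go into Git.
    pointer_file = data_path + ".dvc"
    ignore_file = posixpath.join(posixpath.dirname(data_path), ".gitignore")
    return [
        ["dvc", "init"],                                          # create .dvc/ in the Git repo
        ["dvc", "remote", "add", "-d", remote_name, remote_url],  # set the default remote
        ["dvc", "add", data_path],                                # start tracking the data
        ["git", "add", pointer_file, ignore_file, ".dvc/config"],
        ["git", "commit", "-m", "Track data with DVC"],
        ["dvc", "push"],                                          # upload the data to the remote
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$ " + " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Hypothetical values, for illustration only.
COMMANDS = dvc_setup_commands("data/train", "storage", "s3://example-bucket/dvc-store")
run(COMMANDS)
```

Collaborators would then fetch the same artifacts with `git pull` followed by `dvc pull`. Passing `dry_run=False` executes the commands instead of only printing them.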