refactor: benchmark sanity checks #271
Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@                Coverage Diff                  @@
##     dilya-bench-refactor     #271      +/-   ##
===================================================
+ Coverage              90.64%   91.07%   +0.42%
===================================================
  Files                     60       60
  Lines                   2203     2253      +50
===================================================
+ Hits                    1997     2052      +55
+ Misses                   206      201       -5

☔ View full report in Codecov by Sentry.
Thank you @gumityolcu for your hard work. I have:
- done some refactoring to isolate logging from the main library's functionality
- added tests for both wandb (set to offline) and tensorboard (see the test sketch after this list)
- removed duplicated model-to-device placement
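A rough sketch of how such logger tests could be parametrized; the test name, fixtures, and metric key are illustrative assumptions, not the actual test code:

```python
# Hypothetical test sketch: exercise both loggers without network access.
import pytest
from pytorch_lightning.loggers import TensorBoardLogger, WandbLogger


@pytest.mark.parametrize("logger_name", ["tensorboard", "wandb"])
def test_logger_creation(tmp_path, logger_name):
    if logger_name == "wandb":
        # offline=True avoids requiring/exposing a W&B API key in CI
        logger = WandbLogger(save_dir=str(tmp_path), offline=True)
    else:
        logger = TensorBoardLogger(save_dir=str(tmp_path))
    # Both loggers expose the same log_metrics interface
    logger.log_metrics({"train_acc": 1.0}, step=0)
```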
Logging is done through a Lightning logger determined by the trainer config; either "wandb" or "tensorboard" can be given. The same logger is used to log the training statistics and the sanity-check results in the bench_prept/train.py script. I used "tensorboard" in the tests because wandb requires creating and exposing an API key. I did not use Hydra's instantiate, since logger creation is already handled through the configs and Lightning.
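For context, a minimal sketch of what such config-driven logger selection could look like; the config keys, paths, and function name are assumptions, not the actual code in this PR:

```python
# Hypothetical sketch: choose a Lightning logger from a trainer config dict.
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger, WandbLogger


def build_logger(cfg: dict):
    """Return a Lightning logger based on the trainer config (illustrative)."""
    name = cfg.get("logger", "tensorboard")
    if name == "tensorboard":
        return TensorBoardLogger(save_dir=cfg.get("log_dir", "logs"))
    if name == "wandb":
        # offline mode avoids needing a W&B API key (useful in tests/CI)
        return WandbLogger(project=cfg.get("project", "bench"),
                           offline=cfg.get("offline", False))
    raise ValueError(f"Unknown logger: {name}")


logger = build_logger({"logger": "tensorboard", "log_dir": "runs"})
trainer = Trainer(logger=logger, max_epochs=1)
# The same logger instance can also receive the sanity-check scores:
logger.log_metrics({"sanity/train_accuracy": 0.93}, step=0)
```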
Sanity checks are implemented as a separate function that returns a dictionary of scores. The base class computes train and validation accuracy, and subclasses build on that. A test for the sanity-check functionality has been added.
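As a rough illustration of that structure; the class names, method signature, and the subclass's extra metric are hypothetical, not the library's actual API:

```python
# Hypothetical sketch: base class returns train/val accuracy as a score dict,
# subclasses extend the dictionary with their own checks.
from typing import Dict
import torch
from torch.utils.data import DataLoader


class BenchmarkBase:
    def __init__(self, model: torch.nn.Module):
        self.model = model

    @torch.no_grad()
    def _accuracy(self, loader: DataLoader) -> float:
        correct, total = 0, 0
        for x, y in loader:
            preds = self.model(x).argmax(dim=-1)
            correct += (preds == y).sum().item()
            total += y.numel()
        return correct / max(total, 1)

    def sanity_checks(self, train_loader: DataLoader,
                      val_loader: DataLoader) -> Dict[str, float]:
        return {
            "train_accuracy": self._accuracy(train_loader),
            "val_accuracy": self._accuracy(val_loader),
        }


class ExampleBenchmark(BenchmarkBase):
    def sanity_checks(self, train_loader, val_loader) -> Dict[str, float]:
        scores = super().sanity_checks(train_loader, val_loader)
        # Subclasses add benchmark-specific checks on top of the base scores
        scores["extra_check"] = 0.0  # placeholder for a real metric
        return scores
```

The returned dictionary can then be passed straight to the Lightning logger's log_metrics, so the sanity-check results end up next to the training statistics.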