Implement syncbn for TensorFlow #18671

edwardyehuang · 2023-10-23T07:54:04Z

Add a SyncBN implementation (via the `synchronized` argument in `layers.BatchNormalization`) for TensorFlow. Note that I don't know how to write a test for it in the Keras online tests, since it requires multiple GPUs.
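For readers unfamiliar with synchronized batch normalization, here is a minimal NumPy sketch of the idea: each replica contributes partial sums, and the batch moments are computed from the combined totals. This is not the code from this PR. The helper names `local_stats` and `synced_moments` are hypothetical, and the all-reduce step is simulated with a plain Python `sum` over a list of shards.

```python
import numpy as np


def local_stats(x):
    """Partial sums for one replica, reduced over every axis except the last."""
    axes = tuple(range(x.ndim - 1))
    count = int(np.prod([x.shape[a] for a in axes]))
    return x.sum(axis=axes), (x ** 2).sum(axis=axes), count


def synced_moments(shards):
    """Combine per-replica partial sums, standing in for an all-reduce sum."""
    sums, sq_sums, counts = zip(*(local_stats(s) for s in shards))
    total = float(sum(counts))
    mean = sum(sums) / total
    # Var[x] = E[x^2] - E[x]^2, using the globally aggregated sums.
    var = sum(sq_sums) / total - mean ** 2
    return mean, var


rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 4))
shards = np.split(batch, 2)  # two hypothetical replicas, 4 samples each

mean, var = synced_moments(shards)
print(np.allclose(mean, batch.mean(axis=0)), np.allclose(var, batch.var(axis=0)))
```

The synced moments match the moments of the full, unsplit batch, which is the property a synchronized layer is expected to provide across devices.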

#18667

edwardyehuang · 2023-10-23T07:59:55Z

Also, I am not sure whether it would be better to move the framework-specific code (e.g. TensorFlow) to the backend; the current distribution_lib.py states "!!!DO NOT USE!!!".

codecov-commenter · 2023-10-23T08:05:58Z

Codecov Report

Attention: 27 lines in your changes are missing coverage. Please review.

Comparison is base (cb65582) 78.53% compared to head (e7a442f) 78.47%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #18671      +/-   ##
==========================================
- Coverage   78.53%   78.47%   -0.07%     
==========================================
  Files         335      335              
  Lines       32943    32975      +32     
  Branches     6450     6454       +4     
==========================================
+ Hits        25873    25878       +5     
- Misses       5512     5538      +26     
- Partials     1558     1559       +1

Flag	Coverage Δ
keras	`78.37% <18.18%> (-0.07%)`	⬇️
keras-jax	`63.39% <18.18%> (-0.05%)`	⬇️
keras-numpy	`57.70% <18.18%> (-0.05%)`	⬇️
keras-tensorflow	`64.49% <18.18%> (-0.05%)`	⬇️
keras-torch	`65.20% <18.18%> (-0.05%)`	⬇️

Files	Coverage Δ
keras/layers/normalization/batch_normalization.py	`74.76% <18.18%> (-25.24%)`	⬇️

fchollet

Thanks for the PR!

@qlzh727 can you advise on how to proceed for testing the feature with 2 devices?

keras/layers/normalization/batch_normalization.py

fchollet

LGTM, thanks. Will post-process this on our side.

qlzh727 · 2023-10-23T17:26:32Z

For the sync batch norm test, you can configure 2 virtual CPUs and use them for a mirrored strategy test.

https://www.tensorflow.org/api_docs/python/tf/config/set_logical_device_configuration
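The suggestion above could look roughly like the following. This is a sketch under some assumptions: TensorFlow is installed, the code runs in a fresh process (logical devices must be configured before the runtime initializes), and the test that was eventually added may be structured differently.

```python
import tensorflow as tf

# Split the single physical CPU into two logical (virtual) CPU devices.
# This must happen before TensorFlow initializes its runtime.
physical_cpus = tf.config.list_physical_devices("CPU")
tf.config.set_logical_device_configuration(
    physical_cpus[0],
    [
        tf.config.LogicalDeviceConfiguration(),
        tf.config.LogicalDeviceConfiguration(),
    ],
)

# Build a MirroredStrategy over the two virtual CPUs.
logical_cpus = tf.config.list_logical_devices("CPU")
strategy = tf.distribute.MirroredStrategy([d.name for d in logical_cpus])

print(strategy.num_replicas_in_sync)
```

A layer created inside `strategy.scope()` would then see two replicas, so a multi-device code path can be exercised without GPUs.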

fchollet · 2023-10-23T17:49:28Z

@qlzh727 the test for this feature is now moved to nn_test.py:test_moments_sync: https://github.com/keras-team/keras/blob/master/keras/ops/nn_test.py#L1441

How should we modify the test to ensure correctness? (right now I think the sync branch is actually never run)

qlzh727 · 2023-10-23T17:52:24Z

Ack, I will add a test for that. The sync logic only behaves differently under a distribution setting.
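To illustrate why a single-device test cannot tell the two branches apart, here is a small NumPy sketch. The shard names and values are made up for illustration, and concatenation stands in for cross-replica aggregation.

```python
import numpy as np

rng = np.random.default_rng(1)
shard_a = rng.normal(loc=0.0, size=(4, 3))  # data seen by a hypothetical replica 0
shard_b = rng.normal(loc=5.0, size=(4, 3))  # data seen by a hypothetical replica 1

# One replica: the local batch is the global batch, so syncing changes nothing.
single_replica_same = np.allclose(
    shard_a.mean(axis=0), np.concatenate([shard_a]).mean(axis=0)
)

# Two replicas: the local mean of replica 0 differs from the global mean.
global_mean = np.concatenate([shard_a, shard_b]).mean(axis=0)
two_replicas_differ = not np.allclose(shard_a.mean(axis=0), global_mean)

print(single_replica_same, two_replicas_differ)
```

A test that asserts on synced moments therefore needs at least two replicas holding different data; otherwise the synchronized and unsynchronized results coincide.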

Implement syncbn for TensorFlow

e7a442f

google-ml-butler bot added the size:M label Oct 23, 2023

google-ml-butler bot assigned gbaned Oct 23, 2023

edwardyehuang mentioned this pull request Oct 23, 2023

Missing SyncBatchNormalization in keras core #18667

Closed

fchollet reviewed Oct 23, 2023

edwardyehuang added 2 commits October 23, 2023 16:38

code refined

ca38326

fix : make calculate_mean_and_var private

4ac3428

fchollet approved these changes Oct 23, 2023

fchollet merged commit 666b8d3 into keras-team:master Oct 23, 2023
5 of 6 checks passed

qlzh727 mentioned this pull request Oct 23, 2023

Add unit test for sync batch norm under distribution strategy. #18677

Merged

edwardyehuang mentioned this pull request Mar 4, 2024

[BUG][all_reduce] INVALID_ARGUMENT: You must feed a value for placeholder tensor #19246

Open
