fix: fix seed with multiple ranks #4479

njzjz · 2024-12-19T08:07:29Z

Fix #4440.

Summary by CodeRabbit

New Features
- Enhanced seed handling to support both single integers and lists for improved randomness in distributed training.
- Added logging for neighbor statistics calculation during training.
Bug Fixes
- Improved error handling in data loading processes to ensure robustness.
Documentation
- Updated documentation for methods related to seed and batch size management for clarity.

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

deepmd/pt/utils/dataloader.py

coderabbitai · 2024-12-19T08:09:15Z

Walkthrough

The pull request changes how random seeds are derived in several DeePMD-kit files so that seeding is safe in distributed training. Instead of adding the rank to the seed, the new implementation passes a list containing the rank and the seed (reduced modulo `2**32` in the PyTorch and Paddle entrypoints). The change is applied to the TensorFlow, PyTorch, and Paddle entrypoints so that each rank gets a distinct, reproducible random stream.

Changes

| File | Change Summary |
| --- | --- |
| `deepmd/pd/entrypoints/main.py` | Modified `prepare_trainer_input_single` to use a list for `rank_seed` |
| `deepmd/pt/entrypoints/main.py` | Updated `prepare_trainer_input_single` with new `rank_seed` calculation |
| `deepmd/pt/utils/dataloader.py` | Enhanced `setup_seed` to handle list/tuple seeds, improved batch size handling |
| `deepmd/tf/entrypoints/train.py` | Modified seed initialization logic, added logging for neighbor statistics |
| `deepmd/utils/random.py` | Updated `seed` function signature to accept integer or list of integers |
| `deepmd/pd/utils/dataloader.py` | Updated `setup_seed` to accept list/tuple seeds and added `mix_entropy` function |

Assessment against linked issues

| Objective | Addressed | Explanation |
| --- | --- | --- |
| Fix unsafe seed generation in parallel training [#4440] | ✅ | |
| Prevent overlapping worker seeds | ✅ | |

Possibly related PRs

feat: support seed for pt/dp models #3773: Adds seed parameter for random seed initialization, which is conceptually similar to the current changes in seed handling.
feat: add seeds to dpmodel and fix seeds in tf & pt #3880: Fixes seed handling in TensorFlow and PyTorch, aligning with the main PR's focus on modifying seed handling in distributed training scenarios.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

deepmd/utils/random.py (1)
Line range hint 59-65: Update docstring to reflect new parameter type

The docstring should be updated to document that val can now be either an integer or a list of integers.
 def seed(val: Optional[Union[int, list[int]]] = None) -> None:
     """Seed the generator.
 
     Parameters
     ----------
-    val : int
+    val : Optional[Union[int, list[int]]]
         Seed.
     """
     _RANDOM_GENERATOR.seed(val)
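To illustrate the widened signature, here is a minimal sketch that assumes the module-level generator behaves like NumPy's legacy `RandomState`, whose `seed` method accepts either an integer or a sequence of integers. The helper `make_stream` and the sample seed values are illustrative and do not come from the PR.

```python
import numpy as np


def make_stream(val, n=8):
    """Draw n integers from a RandomState seeded with an int or a list of ints."""
    rng = np.random.RandomState()
    rng.seed(val)  # accepts an int or a 1-D sequence of ints
    return rng.randint(0, 1000, size=n).tolist()


# The same list seed always reproduces the same stream.
same_a = make_stream([0, 42])
same_b = make_stream([0, 42])

# Two ranks sharing the base seed 42 draw from different streams.
rank0 = make_stream([0, 42])
rank1 = make_stream([1, 42])
```

Legacy `RandomState` seeding only accepts values in the range 0 to `2**32 - 1`, which is presumably why the entrypoints reduce the user seed modulo `2**32` before building the list.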
deepmd/pt/utils/dataloader.py (1)
53-57: Simplify code using ternary operator

The if-else block can be simplified using a ternary operator.
-    if isinstance(seed, (list, tuple)):
-        mixed_seed = mix_entropy(seed)
-    else:
-        mixed_seed = seed
+    mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed
🧰 Tools

🪛 Ruff (0.8.2)

53-56: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)

54-54: Undefined name mix_entropy

(F821)
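The branch under discussion reduces a list or tuple seed to a single integer before passing it to an RNG that expects a scalar. A minimal sketch of that flow follows. `mix_entropy_sketch` is a hypothetical SHA-256 based stand-in, not DeePMD-kit's `mix_entropy`, and Python's `random` module stands in for the framework seeding calls.

```python
import hashlib
import random


def mix_entropy_sketch(values):
    """Hypothetical stand-in: fold a sequence of non-negative ints into one 64-bit int."""
    digest = hashlib.sha256()
    for value in values:
        # Fixed-width encoding keeps sequences such as [1, 23] and [12, 3] distinct.
        digest.update(int(value).to_bytes(8, "little"))
    return int.from_bytes(digest.digest()[:8], "little")


def setup_seed_sketch(seed):
    """Seed a scalar-only RNG from either an int or a list/tuple of ints."""
    mixed_seed = mix_entropy_sketch(seed) if isinstance(seed, (list, tuple)) else seed
    random.seed(mixed_seed)  # stand-in for the framework's scalar seeding call
    return mixed_seed


scalar = setup_seed_sketch(7)       # plain ints pass through unchanged
rank0 = setup_seed_sketch([0, 7])   # list seeds are mixed into one integer
rank1 = setup_seed_sketch([1, 7])
```

Whether the ternary or the explicit if-else form is used does not change this behavior; the choice is purely stylistic.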
deepmd/pd/entrypoints/main.py (1)

126-126: Improved seed handling for distributed training

The new seed calculation `[rank, seed % (2**32)]` is a robust improvement that:

- Prevents potential seed collisions between ranks that could occur with simple addition
- Ensures reproducibility while maintaining unique seeds per rank
- Guards against integer overflow with modulo operation

Consider documenting this seed handling pattern in the codebase's distributed training guide to ensure consistent usage across different implementations.
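The contrast with additive seeding can be sketched in a few lines. `additive_seed` and `rank_seed` are illustrative helper names; the list expression mirrors the one quoted in this review.

```python
def additive_seed(seed, rank):
    """Additive scheme: different (seed, rank) pairs can map to the same value."""
    return seed + rank


def rank_seed(seed, rank):
    """List-based scheme: rank and seed stay in separate slots."""
    return [rank, seed % (2**32)] if seed is not None else None


# Run A uses seed 1 on rank 1; run B uses seed 2 on rank 0.
collide_a = additive_seed(1, 1)
collide_b = additive_seed(2, 0)

distinct_a = rank_seed(1, 1)
distinct_b = rank_seed(2, 0)

# A seed wider than 32 bits is wrapped into range rather than passed through.
wrapped = rank_seed(2**32 + 5, 3)
```

With addition, both runs end up seeding with the same integer, so their random streams coincide. The list form keeps the two pairs distinguishable, and a missing seed stays `None`.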

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0914e1 and 124f432.

📒 Files selected for processing (5)

deepmd/pd/entrypoints/main.py (1 hunks)
deepmd/pt/entrypoints/main.py (1 hunks)
deepmd/pt/utils/dataloader.py (1 hunks)
deepmd/tf/entrypoints/train.py (1 hunks)
deepmd/utils/random.py (1 hunks)

🔇 Additional comments (3)

deepmd/tf/entrypoints/train.py (1)

224-226: LGTM: Improved seed handling for distributed training

The change ensures unique seeds across different ranks by combining rank and seed into a list. This prevents potential seed collisions in distributed training scenarios.

deepmd/pt/entrypoints/main.py (2)

141-141: LGTM: Consistent seed handling across backends

The implementation matches the PaddlePaddle version, ensuring consistent seed handling behavior across different backends. This is crucial for reproducibility in distributed training scenarios.

141-141: Verify consistent seed handling across codebase

Let's verify that all seed handling implementations across the codebase follow this new pattern.

✅ Verification successful

Seed handling is consistent across implementations

The seed handling implementation is consistent between PyTorch and PaddlePaddle implementations:

Both deepmd/pt/entrypoints/main.py and deepmd/pd/entrypoints/main.py use identical seed handling:
```
rank_seed = [rank, seed % (2**32)] if seed is not None else None
```
The seed is correctly passed to DpLoaderSet in both implementations with the same pattern
No other locations in the codebase require similar rank-based seed handling updates

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for other potential seed handling implementations that might need updates

# Look for seed-related patterns in Python files
echo "Searching for seed handling patterns..."
rg -l "seed.*=.*rank" --type py

# Look for DpLoaderSet usage with seed parameter
echo "Searching for DpLoaderSet instantiations..."
rg "DpLoaderSet\(.*seed.*\)" --type py

Length of output: 320

Script:

#!/bin/bash
# Let's examine the seed handling in both files to verify consistency

echo "Checking seed handling in PyTorch implementation:"
rg -A 3 -B 3 "seed.*=.*rank" deepmd/pt/entrypoints/main.py

echo -e "\nChecking seed handling in PaddlePaddle implementation:"
rg -A 3 -B 3 "seed.*=.*rank" deepmd/pd/entrypoints/main.py

# Let's also check for any other seed-related patterns
echo -e "\nChecking other seed-related patterns:"
rg "seed" --type py -C 2

Length of output: 74170

deepmd/pt/utils/dataloader.py

for more information, see https://pre-commit.ci

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

deepmd/pt/utils/dataloader.py (2)
56-61: Add input validation for seed parameter

While the seed handling logic is correct, consider adding validation for None or invalid types to make the function more robust.
 def setup_seed(seed) -> None:
+    if seed is None:
+        return
     if isinstance(seed, (list, tuple)):
         mixed_seed = mix_entropy(seed)
     else:
+        if not isinstance(seed, (int, np.integer)):
+            raise TypeError(f"Seed must be an integer or a collection of integers, got {type(seed)}")
         mixed_seed = seed
     torch.manual_seed(mixed_seed)
     torch.cuda.manual_seed_all(mixed_seed)
🧰 Tools

🪛 Ruff (0.8.2)

56-59: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)
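The validation idea can be sketched independently of any framework. `check_seed` is a hypothetical helper. It uses `numbers.Integral` instead of the `np.integer` check in the suggestion so that the sketch needs only the standard library, and it rejects `bool` explicitly, which is an added assumption rather than something the review asks for.

```python
import numbers


def check_seed(seed):
    """Return the seed unchanged if it is None, an int, or a list/tuple of ints."""

    def is_int(value):
        # bool is a subclass of int; rejecting it is a design choice of this sketch.
        return isinstance(value, numbers.Integral) and not isinstance(value, bool)

    if seed is None:
        return None
    if isinstance(seed, (list, tuple)):
        if not all(is_int(item) for item in seed):
            raise TypeError(f"Seed sequence must contain only integers, got {seed!r}")
        return seed
    if not is_int(seed):
        raise TypeError(
            f"Seed must be an integer or a collection of integers, got {type(seed)}"
        )
    return seed
```

Failing early with a `TypeError` gives a clearer message than whatever error the downstream seeding call would raise on a string or float.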

56-59: Simplify code using ternary operator

The if-else block can be simplified using a ternary operator for better readability.
-    if isinstance(seed, (list, tuple)):
-        mixed_seed = mix_entropy(seed)
-    else:
-        mixed_seed = seed
+    mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 124f432 and cf6e6e2.

📒 Files selected for processing (1)

deepmd/pt/utils/dataloader.py (2 hunks)

🔇 Additional comments (2)

deepmd/pt/utils/dataloader.py (2)

39-41: LGTM: Clean import addition

The import of mix_entropy is properly placed and necessary for the enhanced seed handling functionality.

56-61: Verify seed consistency across distributed processes

The seed handling changes look good, but let's verify the consistency across distributed processes.

✅ Verification successful

Seed handling is properly implemented for distributed training

The changes ensure consistent random number generation across distributed processes by:

- Using mix_entropy to handle seed lists/tuples in distributed scenarios
- Setting both CPU and CUDA seeds consistently via torch.manual_seed and torch.cuda.manual_seed_all
- Properly integrating with PyTorch's distributed sampler which handles seed synchronization across ranks

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential race conditions or inconsistencies in seed handling across distributed processes

# Search for other seed-related code in the codebase
echo "Checking for other seed-related code:"
rg "setup_seed|seed\s*=|\.seed\(" --type python

# Search for distributed training related code that might interact with seed
echo "Checking distributed training code that might affect seed:"
rg "DistributedSampler|dist\.is_initialized|get_rank" --type python -A 3

Length of output: 398

Script:

#!/bin/bash
# Let's try without file type filtering since python extension isn't recognized
echo "Checking for other seed-related code:"
rg "setup_seed|seed\s*=|\.seed\("

echo -e "\nChecking distributed training code that might affect seed:"
rg "DistributedSampler|dist\.is_initialized|get_rank"

# Also check for any other random state management
echo -e "\nChecking for random state management:"
rg "random\.(seed|get_state|set_state)|numpy\.(random|seed)|torch\.(manual_seed|cuda\.manual_seed)"

Length of output: 52957

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

deepmd/pd/utils/dataloader.py (1)

33-33: Consider consolidating common utilities

The import of mix_entropy from PyTorch utils (deepmd.pt.utils.utils) into a PaddlePaddle module suggests shared functionality. Consider moving common utilities to a framework-agnostic location (e.g., deepmd.utils.common) to improve maintainability.

Also applies to: 40-42

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf6e6e2 and 63dc3d3.

📒 Files selected for processing (1)

deepmd/pd/utils/dataloader.py (2 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

deepmd/pd/utils/dataloader.py

57-60: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)

62-62: Use capitalized environment variable FLAGS_CUDNN_DETERMINISTIC instead of FLAGS_cudnn_deterministic

Replace FLAGS_cudnn_deterministic with FLAGS_CUDNN_DETERMINISTIC

(SIM112)

🔇 Additional comments (1)

deepmd/pd/utils/dataloader.py (1)

56-63: Seed handling implementation looks good

The implementation properly handles both scalar and composite seeds, ensuring deterministic behavior across PaddlePaddle, cuDNN, and DeePMD components.

Consider these style improvements:

-    if isinstance(seed, (list, tuple)):
-        mixed_seed = mix_entropy(seed)
-    else:
-        mixed_seed = seed
+    mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed
-    os.environ["FLAGS_cudnn_deterministic"] = "True"
+    os.environ["FLAGS_CUDNN_DETERMINISTIC"] = "True"

Let's verify the seed handling consistency across frameworks:

✅ Verification successful

Seed handling is consistent across frameworks with proper deterministic behavior

The verification confirms that seed handling is implemented consistently across frameworks:

PaddlePaddle implementation (deepmd/pd/utils/dataloader.py):
- Uses mix_entropy for composite seeds
- Sets paddle.seed and dp_random.seed
- Enables cuDNN deterministic mode
PyTorch implementation (deepmd/pt/utils/dataloader.py):
- Identical logic for composite seeds
- Sets torch.manual_seed, torch.cuda.manual_seed_all, and dp_random.seed
- Enables cuDNN deterministic mode

Both implementations follow the same pattern and ensure deterministic behavior across all components.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check seed handling implementation across frameworks
echo "Checking seed handling in TensorFlow, PyTorch, and PaddlePaddle implementations..."

# Search for setup_seed function implementations
echo "=== Setup Seed Implementations ==="
rg -A 10 "def setup_seed" --type py

# Search for seed-related function calls
echo "=== Seed Usage Patterns ==="
rg "setup_seed|mix_entropy|dp_random.seed" --type py

Length of output: 3541

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

deepmd/pd/utils/dataloader.py (4)
39-44: Consider implications of cross-module dependency

The import of mix_entropy from the PyTorch utilities (deepmd.pt.utils) into a Paddle module creates a cross-framework dependency. While this might be intentional for code reuse, it could make maintenance more challenging.

Consider moving shared utilities like mix_entropy to a common module (e.g., deepmd.utils) to maintain better separation between framework-specific code.

58-65: Add type hints and update docstring

The function now accepts both scalar and sequence seeds, but this isn't documented. Consider adding type hints and updating the docstring.
-def setup_seed(seed):
+def setup_seed(seed: Union[int, list, tuple]) -> None:
+    """Set up random seeds for reproducibility.
+    
+    Parameters
+    ----------
+    seed : Union[int, list, tuple]
+        Random seed. If a sequence is provided, it will be mixed using entropy
+        mixing to generate a single seed, useful for distributed training.
+    """
🧰 Tools

🪛 Ruff (0.8.2)

59-62: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)

64-64: Use capitalized environment variable FLAGS_CUDNN_DETERMINISTIC instead of FLAGS_cudnn_deterministic

Replace FLAGS_cudnn_deterministic with FLAGS_CUDNN_DETERMINISTIC

(SIM112)

63-64: Document the environment variable

The FLAGS_cudnn_deterministic environment variable's purpose and impact should be documented.

Consider adding a comment explaining why this flag is set and its performance implications:
     paddle.seed(mixed_seed)
+    # Enable deterministic cudnn algorithms for reproducibility
+    # Note: This may impact performance
     os.environ["FLAGS_cudnn_deterministic"] = "True"

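64-64: Check the environment variable name against Paddle's convention

Ruff's SIM112 rule asks for a capitalized environment variable name. Check this against Paddle's flag naming before applying it: Paddle's documented flag is spelled `FLAGS_cudnn_deterministic`, and environment variable names are case-sensitive on POSIX systems, so the rename may stop the flag from taking effect.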
-    os.environ["FLAGS_cudnn_deterministic"] = "True"
+    os.environ["FLAGS_CUDNN_DETERMINISTIC"] = "True"
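Before renaming the variable, it may help to recall that environment variable names are case-sensitive on POSIX systems, so the two spellings are separate variables. The sketch below only demonstrates that point. Which spelling Paddle actually reads is not shown here and should be checked against Paddle's flag documentation.

```python
import os

# Start from a clean state so the demonstration does not depend on the caller's shell.
os.environ.pop("FLAGS_cudnn_deterministic", None)
os.environ.pop("FLAGS_CUDNN_DETERMINISTIC", None)

os.environ["FLAGS_cudnn_deterministic"] = "True"

lower_value = os.environ.get("FLAGS_cudnn_deterministic")
upper_value = os.environ.get("FLAGS_CUDNN_DETERMINISTIC")

# On POSIX the uppercase name stays unset; Windows treats names case-insensitively.
names_are_distinct = upper_value is None
```

If a library looks up only one spelling, setting the other has no effect on POSIX, which is why a lint-driven rename of this kind deserves a manual check.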

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63dc3d3 and c2884d0.

📒 Files selected for processing (1)

deepmd/pd/utils/dataloader.py (2 hunks)

🔇 Additional comments (1)

deepmd/pd/utils/dataloader.py (1)

65-65: Verify seed propagation in distributed training

The seed propagation to dp_random needs verification in distributed training scenarios.

✅ Verification successful

Seed propagation is properly handled in distributed scenarios

The codebase shows proper seed handling across different distributed training implementations:

- TensorFlow (Horovod): Seeds are properly set with rank awareness via dp_random.seed([run_opt.my_rank, seed]) in distributed mode
- PyTorch (DDP): Seeds are comprehensively set through torch.manual_seed, torch.cuda.manual_seed_all, and dp_random.seed
- PaddlePaddle: Seeds are properly set through paddle.seed and dp_random.seed

Each framework ensures deterministic behavior by:

- Using rank-aware seeding in distributed mode
- Setting framework-specific RNG seeds
- Enabling deterministic operations (e.g., cudnn.deterministic = True)

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if dp_random seed is used consistently across the codebase

# Search for dp_random.seed usage
echo "Checking dp_random.seed usage:"
rg "dp_random\.seed"

# Search for potential random number generation
echo "\nChecking random number generation:"
rg "random\.(rand|randn|choice|shuffle)"

Length of output: 2843

Script:

#!/bin/bash
# Check distributed training implementation and seed handling

# Search for distributed training related code
echo "Checking distributed training implementation:"
rg -A 5 "(DistributedDataParallel|distributed\.launch|distributed\.init|horovod)"

# Search for rank or world_size related code near dp_random
echo "\nChecking rank handling near dp_random:"
rg -B 3 -A 3 "dp_random" --context-separator="---"

# Search for seed handling in training files
echo "\nChecking seed handling in training files:"
fd "train.*\.py" --exec rg -l "seed|random"

Length of output: 28728

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

deepmd/pd/utils/dataloader.py (2)
39-42: Consider consolidating shared utilities

The import of mix_entropy from the PyTorch utilities (deepmd.pt) into the Paddle module suggests shared functionality. Consider moving shared utilities to a common module to maintain better separation of concerns.

62-62: Use uppercase for environment variable name

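Ruff's SIM112 rule prefers uppercase environment variable names. Names are case-sensitive on POSIX systems, so this rename is only safe if Paddle also reads the uppercase spelling.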
-    os.environ["FLAGS_cudnn_deterministic"] = "True"
+    os.environ["FLAGS_CUDNN_DETERMINISTIC"] = "True"
🧰 Tools

🪛 Ruff (0.8.2)

62-62: Use capitalized environment variable FLAGS_CUDNN_DETERMINISTIC instead of FLAGS_cudnn_deterministic

Replace FLAGS_cudnn_deterministic with FLAGS_CUDNN_DETERMINISTIC

(SIM112)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c2884d0 and 2d4a8e9.

📒 Files selected for processing (1)

deepmd/pd/utils/dataloader.py (2 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

deepmd/pd/utils/dataloader.py

57-60: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)

62-62: Use capitalized environment variable FLAGS_CUDNN_DETERMINISTIC instead of FLAGS_cudnn_deterministic

Replace FLAGS_cudnn_deterministic with FLAGS_CUDNN_DETERMINISTIC

(SIM112)

🔇 Additional comments (1)

deepmd/pd/utils/dataloader.py (1)

56-63: Verify seed consistency across distributed processes

The implementation looks correct for handling both single and collection seeds. However, we should verify that this provides consistent results across distributed processes.

✅ Verification successful

Seed handling implementation is correct and consistent across distributed processes

The verification shows that:

- Both PyTorch and Paddle implementations use identical mix_entropy functions for handling seed collections
- The seed handling is framework-agnostic and properly integrated with distributed training
- The implementation maintains consistency by:
  - Using mixed seed for framework-specific RNG (paddle.seed/torch.manual_seed)
  - Using original seed for dp_random
  - Setting deterministic behavior flags

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for other seed handling implementations to ensure consistency
# Look for other setup_seed implementations
rg -A 5 "def setup_seed" 

# Look for other uses of mix_entropy
rg "mix_entropy\("

Length of output: 1166

Script:

#!/bin/bash
# Check mix_entropy implementation in both pd and pt utils
ast-grep --pattern 'def mix_entropy($$$)'

# Check for any distributed training related seed handling
rg -B 2 -A 5 "distributed|world_size|local_rank" deepmd/pd/utils/dataloader.py deepmd/pt/utils/dataloader.py

Length of output: 3903

Script:

#!/bin/bash
# Check mix_entropy implementation in both utils files
cat deepmd/pd/utils/utils.py deepmd/pt/utils/utils.py | rg -A 10 "def mix_entropy"

# Check if there's any rank-specific seed handling
rg -B 2 -A 5 "get_rank|local_rank.*seed" deepmd/pd/utils/dataloader.py deepmd/pt/utils/dataloader.py

Length of output: 3229

codecov · 2024-12-19T10:08:46Z

Codecov Report

Attention: Patch coverage is 94.44444% with 1 line in your changes missing coverage. Please review.

Project coverage is 84.42%. Comparing base (c0914e1) to head (2d4a8e9).
Report is 9 commits behind head on devel.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| deepmd/tf/entrypoints/train.py | 50.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@           Coverage Diff           @@
##            devel    #4479   +/-   ##
=======================================
  Coverage   84.41%   84.42%           
=======================================
  Files         670      670           
  Lines       62147    62158   +11     
  Branches     3487     3486    -1     
=======================================
+ Hits        52464    52474   +10     
- Misses       8556     8558    +2     
+ Partials     1127     1126    -1

fix: fix seed with multiple ranks

b95718a

Fix deepmodeling#4440.

njzjz requested review from caic99, wanghan-iapcm and Copilot December 19, 2024 08:07

github-actions bot added the Python label Dec 19, 2024

Fix setup_seed function to handle non-list seed

124f432

Copilot AI reviewed Dec 19, 2024

coderabbitai bot reviewed Dec 19, 2024

import mix_entropy

15af914

njzjz linked an issue Dec 19, 2024 that may be closed by this pull request

[BUG] seed is unsafe in TF parallel training #4440

Open

[pre-commit.ci] auto fixes from pre-commit.com hooks

cf6e6e2

for more information, see https://pre-commit.ci

coderabbitai bot reviewed Dec 19, 2024

njzjz and others added 2 commits December 19, 2024 04:08

fix pd

63dc3d3

[pre-commit.ci] auto fixes from pre-commit.com hooks

c2884d0

for more information, see https://pre-commit.ci

coderabbitai bot reviewed Dec 19, 2024

njzjz and others added 2 commits December 19, 2024 04:24

fix import

de6d22b

[pre-commit.ci] auto fixes from pre-commit.com hooks

2d4a8e9

for more information, see https://pre-commit.ci

coderabbitai bot reviewed Dec 19, 2024

wanghan-iapcm approved these changes Dec 19, 2024

njzjz added this to the v3.0.1 milestone Dec 21, 2024

caic99 approved these changes Dec 25, 2024

wanghan-iapcm added this pull request to the merge queue Dec 25, 2024

njzjz removed this from the v3.0.1 milestone Dec 25, 2024

Merged via the queue into deepmodeling:devel with commit f8605ee Dec 25, 2024
60 checks passed
