chore: use functools.cached_property for cached properties #4187

Merged
merged 3 commits into deepmodeling:devel on Oct 8, 2024

Conversation

@njzjz njzjz (Member) commented Oct 7, 2024

functools.cached_property (new in Python 3.8) is more suitable for cached properties.
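
A minimal sketch (not from this PR's diff; the class and method names are illustrative) of the pattern being replaced and its cached_property equivalent:

from functools import cached_property, lru_cache

class Example:
    # Old pattern: stack @property on top of @lru_cache; the cache is keyed
    # on self, so it also keeps the instance alive in the function's cache.
    @property
    @lru_cache(maxsize=None)
    def value_old(self):
        return self._compute()

    # New pattern: @cached_property runs the getter once per instance and
    # stores the result in the instance's __dict__ on first access.
    @cached_property
    def value(self):
        return self._compute()

    def _compute(self):
        return 42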

Summary by CodeRabbit

  • New Features

    • Introduced a new parameter neighbor_list for enhanced neighbor list handling in model evaluation.
    • Added support for percentage strings in the test_size parameter for flexible test size configuration.
    • New method _make_auto_ts to facilitate test size calculations based on specified percentages.
  • Bug Fixes

    • Improved caching mechanisms for properties, enhancing performance and memory management.
  • Documentation

    • Added comments and clarifications in the code to improve understanding of batch and test size handling.

njzjz and others added 3 commits October 6, 2024 22:31
`functools.cached_property` (new in Python 3.8) is more suitable for cached properties.
Signed-off-by: Jinzhe Zeng <[email protected]>
coderabbitai bot (Contributor) commented Oct 7, 2024

📝 Walkthrough

The pull request introduces several modifications across three files, focusing on enhancing property management through the use of cached_property instead of other caching methods. Key changes include updates to the DeepEval, DPTabulate, and DeepmdDataSystem classes, with the addition of new parameters and methods to improve flexibility and usability. Specifically, the DeepEval class constructor now accepts a neighbor_list parameter, and the DeepmdDataSystem class can handle test_size as a percentage string. Overall, the changes streamline property access and enhance configurability.

Changes

deepmd/tf/infer/deep_eval.py
  • Updated DeepEval class to use @cached_property for model_type, model_version, and sess.
  • Added neighbor_list parameter to the __init__ method.
  • Adjusted _init_tensors and _init_attr methods for the new structure.

deepmd/tf/utils/tabulate.py
  • Modified DPTabulate class to use @cached_property for _n_all_excluded instead of @lru_cache.

deepmd/utils/data_system.py
  • Changed DeepmdDataSystem class to use @cached_property for default_mesh.
  • Updated __init__ to accept test_size as a percentage string.
  • Added _make_auto_ts method and deprecated get_test.

Possibly related PRs

Suggested labels

Python

Suggested reviewers

  • wanghan-iapcm

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (7)
deepmd/tf/utils/tabulate.py (3)

Line range hint 779-792: Consider using functools.cache for the _all_excluded method.

While the change to use @cached_property for _n_all_excluded is good, we could further improve the caching strategy for the _all_excluded method.

Suggestion:
Replace @lru_cache with @functools.cache for the _all_excluded method. @functools.cache is a simpler unbounded cache introduced in Python 3.9, which is more appropriate for this use case as it doesn't require specifying a max size.

from functools import cache

@cache
def _all_excluded(self, ii: int) -> bool:
    """Check if type ii excludes all types."""
    return all((ii, type_i) in self.exclude_types for type_i in range(self.ntypes))

This change would:

  1. Simplify the caching mechanism
  2. Potentially improve performance by removing the need for LRU tracking
  3. Make the code more consistent with the use of modern Python features
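
For reference (not part of the review diff): on Python 3.8, where functools.cache is not yet available, the equivalent unbounded cache can be spelled with lru_cache(maxsize=None); a minimal sketch with an illustrative stand-in class:

from functools import lru_cache

class TabulateSketch:  # illustrative stand-in, not the real DPTabulate
    def __init__(self, ntypes, exclude_types):
        self.ntypes = ntypes
        self.exclude_types = exclude_types

    # lru_cache(maxsize=None) behaves like functools.cache on Python 3.8;
    # note that it keeps a reference to self in the function's cache.
    @lru_cache(maxsize=None)
    def _all_excluded(self, ii: int) -> bool:
        return all((ii, t) in self.exclude_types for t in range(self.ntypes))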

Line range hint 1-892: Consider applying caching to other methods for performance optimization.

While the current changes improve the caching strategy for _n_all_excluded, there are other methods in the class that might benefit from similar optimizations. Consider applying @cached_property or @functools.cache to methods that:

  1. Have deterministic output based solely on their inputs
  2. Are called frequently
  3. Have computationally expensive operations

Potential candidates for optimization:

  • _get_layer_size
  • _get_table_size
  • _get_data_type
  • _get_last_layer_size

Example:

@cached_property
def _layer_size(self):
    # Existing implementation of _get_layer_size
    ...

@cached_property
def _table_size(self):
    # Existing implementation of _get_table_size
    ...

This refactoring could potentially improve the overall performance of the class by reducing redundant computations.


Line range hint 1-892: Overall good changes, but consider a more comprehensive optimization pass.

The current changes to improve caching for the _n_all_excluded method are a step in the right direction. However, given the complexity of the DPTabulate class and its critical role in model compression, it might be beneficial to conduct a more comprehensive optimization pass.

Suggestions for future improvements:

  1. Consistently apply modern caching strategies (@cached_property, @functools.cache) throughout the class where appropriate.
  2. Consider refactoring large methods (e.g., build, _build_lower) into smaller, more manageable functions.
  3. Evaluate the use of NumPy operations for performance-critical sections, especially in methods like _make_data.
  4. Add more comprehensive type hints to improve code readability and catch potential type-related issues early.
  5. Consider adding more detailed docstrings for complex methods to improve maintainability.

These changes could significantly improve the performance, readability, and maintainability of this critical component. Would you like assistance in planning or implementing any of these suggestions?

deepmd/tf/infer/deep_eval.py (4)

Line range hint 308-313: Ensure proper management of cached TensorFlow session

While caching the sess property using @cached_property avoids re-creating the TensorFlow session multiple times, it's important to ensure that the session is properly closed when it's no longer needed to prevent resource leaks.

Consider adding a method to explicitly close the session when the DeepEval instance is no longer in use or implementing context management to handle the session's lifecycle.
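
A minimal sketch of what such lifecycle management could look like (the class and method names below are illustrative, not DeepEval's actual API; assumes the TF1-compat session interface):

from functools import cached_property

import tensorflow as tf

class EvalWithSession:  # illustrative stand-in, not the real DeepEval
    @cached_property
    def sess(self):
        # The real class builds the session from its loaded graph; this
        # sketch only shows the caching-plus-cleanup pattern.
        return tf.compat.v1.Session()

    def close(self):
        # cached_property stores the value in the instance __dict__, so we
        # can tell whether the session was ever created before closing it.
        sess = self.__dict__.pop("sess", None)
        if sess is not None:
            sess.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()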


Line range hint 340-340: Correct method name spelling: _graph_compatable → _graph_compatible

There is a typo in the method name _graph_compatable. The correct spelling should be _graph_compatible.

Apply this diff to correct the typo:

-def _graph_compatable(self) -> bool:
+def _graph_compatible(self) -> bool:

Line range hint 1220-1224: Ensure proper management of cached TensorFlow session in DeepEvalOld

Caching the sess property in DeepEvalOld using @cached_property is efficient but requires careful management to prevent resource leaks. Ensure that the TensorFlow session is properly closed when the instance is no longer in use.

Consider implementing a mechanism to manage the session's lifecycle, such as a method to close the session explicitly or using a context manager.


1220-1220: Correct method name spelling in DeepEvalOld: _graph_compatable → _graph_compatible

There is a typo in the method name _graph_compatable in the DeepEvalOld class. It should be corrected to _graph_compatible.

Apply this diff to correct the typo:

-def _graph_compatable(self) -> bool:
+def _graph_compatible(self) -> bool:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed between the base of the PR (a0747b9) and the head (b8785da).

📒 Files selected for processing (3)
  • deepmd/tf/infer/deep_eval.py (7 hunks)
  • deepmd/tf/utils/tabulate.py (2 hunks)
  • deepmd/utils/data_system.py (2 hunks)
🧰 Additional context used
🔇 Additional comments (8)
deepmd/tf/utils/tabulate.py (1)

774-777: Improved caching strategy for _n_all_excluded method.

The change from @property @lru_cache to @cached_property for the _n_all_excluded method is a good optimization. @cached_property combines the functionality of @property and @lru_cache(maxsize=None), providing a more efficient and cleaner implementation.

Benefits of this change:

  1. Reduced memory usage: @cached_property stores the result directly on the instance, unlike @lru_cache which maintains a separate cache.
  2. Thread-safety: @cached_property is thread-safe by default, which is beneficial in multi-threaded environments.
  3. Simplified code: The decorator combines two decorators into one, making the code more readable.
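
A quick illustration (assumed example, not from the diff) of point 1: the cached value lives in the instance's __dict__, so it can be inspected or invalidated with del:

from functools import cached_property

class Counter:  # illustrative only
    def __init__(self):
        self.calls = 0

    @cached_property
    def expensive(self):
        self.calls += 1
        return self.calls

c = Counter()
assert c.expensive == 1
assert c.expensive == 1           # cached: the getter is not re-run
assert "expensive" in c.__dict__  # the value is stored on the instance
del c.expensive                   # invalidate the cache
assert c.expensive == 2           # recomputed on next access
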
deepmd/utils/data_system.py (2)

6-6: Confirm cached_property compatibility with project requirements

The use of functools.cached_property requires Python 3.8 or higher. Please ensure that the project's minimum Python version is 3.8 or newer to maintain compatibility.


Line range hint 241-246: Efficient caching with @cached_property for default_mesh

Utilizing @cached_property for the default_mesh property effectively caches its value, enhancing performance by avoiding unnecessary recomputations.

deepmd/tf/infer/deep_eval.py (5)

4-4: Use of functools.cached_property is appropriate

The import of cached_property from the functools module allows the use of the @cached_property decorator, which efficiently caches property values. This change aligns with the PR objective to utilize functools.cached_property for cached properties.


Line range hint 266-288: Appropriate use of @cached_property for model_type

Decorating the model_type method with @cached_property is appropriate since the model type is a constant attribute that does not change during the lifetime of the DeepEval instance. Caching it improves performance by preventing redundant computations.


Line range hint 290-306: Appropriate use of @cached_property for model_version

Applying @cached_property to the model_version method is suitable because the model version remains constant. This change enhances efficiency by caching the value after the first retrieval.


Line range hint 1192-1200: Appropriate use of @cached_property for model_type in DeepEvalOld

Decorating the model_type method with @cached_property in the DeepEvalOld class is acceptable since the model type remains constant. This change improves performance by caching the value.


Line range hint 1202-1218: Appropriate use of @cached_property for model_version in DeepEvalOld

Applying @cached_property to the model_version method in DeepEvalOld is appropriate. Caching this property enhances efficiency by avoiding repeated computations of a constant value.

@github-actions github-actions bot added the Python label Oct 7, 2024
codecov bot commented Oct 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.45%. Comparing base (a0747b9) to head (b8785da).
Report is 192 commits behind head on devel.

Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #4187      +/-   ##
==========================================
- Coverage   83.45%   83.45%   -0.01%     
==========================================
  Files         537      537              
  Lines       52148    52140       -8     
  Branches     3047     3047              
==========================================
- Hits        43519    43511       -8     
  Misses       7683     7683              
  Partials      946      946              

☔ View full report in Codecov by Sentry.

@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue Oct 8, 2024
Merged via the queue into deepmodeling:devel with commit dcdd804 Oct 8, 2024
59 checks passed