Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENG-2320] Refactor browse method to use worker pattern #121

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

kanapuli
Copy link
Contributor

@kanapuli kanapuli commented Jan 31, 2025

Description

Benchmark results for the old recursive approach

image

Benchmark results for the new worker pattern

image

Summary by CodeRabbit

Summary by CodeRabbit

  • Refactor

    • Implemented a worker pool pattern for node processing.
    • Removed level parameter from the Browse function.
    • Introduced a new NodeTask type for better node management.
    • Added ServerMetrics struct for monitoring OPCUA server performance.
  • Bug Fixes

    • Improved concurrency and error handling in node browsing.
    • Enhanced node processing efficiency.
  • Chores

    • Updated function signatures across multiple files.
    • Renamed browse function to Browse for consistency.

@kanapuli kanapuli self-assigned this Jan 31, 2025
Copy link
Contributor

coderabbitai bot commented Jan 31, 2025

Walkthrough

The pull request introduces a significant refactoring of the OPC UA browsing functionality in the opcua_plugin package. The primary change is the implementation of a worker pool pattern for processing nodes, which aims to control the number of goroutines created during node browsing. The Browse function's signature has been modified to remove the level parameter, and a new NodeTask structure has been introduced to manage node processing more efficiently. The changes enhance the concurrency model of the browsing operation while maintaining the existing functionality.

Changes

File Change Summary
opcua_plugin/browse.go - Refactored Browse and browse functions to use worker pool pattern
- Removed level parameter from function signatures
- Added NodeTask type to encapsulate node browsing parameters
- Implemented concurrent node processing with a dynamic number of workers
- Added browseChildren helper function
opcua_plugin/browse_frontend.go, opcua_plugin/server_info.go - Renamed browse function to Browse
opcua_plugin/opcua_opc-plc_test.go, opcua_plugin/opcua_unittest_test.go - Updated Browse function calls to remove level parameter

Assessment against linked issues

Objective Addressed Explanation
Control number of Goroutines [ENG-2320]

Possibly related PRs

  • Eng 2008 benthos umh struggles to read out arrays in opc ua v2 #84: The changes in the main PR are related to those in the retrieved PR as both involve modifications to the browse and Browse functions in the opcua_plugin/browse.go file, specifically in how parameters are handled and the overall browsing logic.
  • [ENG-2320] Reduce cyclomatic complexity of the browse function #119: The changes in the main PR are related to the modifications in the browse function within the retrieved PR, as both involve significant alterations to the same function's signature and logic in opcua_plugin/browse.go.
  • [ENG-2074] Remove duplicate browse logic #91: The changes in the main PR are related to the modifications in the Browse and browse functions in the retrieved PR, as both involve updates to their signatures and the handling of parameters, specifically the removal of the level parameter in favor of a new structure or channel.

Suggested reviewers

  • JeremyTheocharis
  • led0nk
  • Scarjit

Poem

🐰 Browsing nodes with worker might,
Goroutines dancing, oh so light!
A pool of threads, efficient and neat,
OPC UA's performance now a treat!
Code refactored with rabbit's delight! 🌟

✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@kanapuli kanapuli marked this pull request as ready for review January 31, 2025 13:17
@kanapuli kanapuli requested a review from a team January 31, 2025 13:17
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (7)
opcua_plugin/browse.go (6)

65-66: Clarify intended usage or visibility

Although documented as "avoid using this function directly," it is exported. Consider renaming or adding explicit comments if its usage is truly meant to be restricted to test scenarios.


69-75: Document struct fields

The NodeTask struct plays a crucial role in controlling recursion depth and mapping parent IDs. Consider adding inline doc comments for each field to enhance maintainability and readability.


77-79: Reduce parameter overload

The browse function’s signature has many parameters. Grouping them into a dedicated configuration struct would improve clarity and maintainability.


81-84: Make worker count configurable

Hardcoding numWorkers = 100 may limit flexibility. Consider allowing a configurable value or an adaptive strategy based on workload.


110-121: Simplify worker signature

Similar to browse, the worker function accepts numerous parameters. You could encapsulate them in a WorkerContext struct to reduce parameter churn.


323-323: Check concurrency impact

Each browse invocation spawns a worker pool of 100 goroutines. Repeated calls in a loop can multiply resource usage. Consider a shared pool or dynamic scaling if performance is a concern.

opcua_plugin/server_info.go (1)

40-42: Potential worker oversubscription

Each parallel Browse call starts a pool of 100 workers, possibly leading to hundreds of concurrent goroutines. Consider running them sequentially or unifying them under one task pool if resource constraints become an issue.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78dd8fe and 10d85b3.

📒 Files selected for processing (5)
  • opcua_plugin/browse.go (3 hunks)
  • opcua_plugin/browse_frontend.go (1 hunks)
  • opcua_plugin/opcua_opc-plc_test.go (1 hunks)
  • opcua_plugin/opcua_unittest_test.go (8 hunks)
  • opcua_plugin/server_info.go (1 hunks)
🔇 Additional comments (15)
opcua_plugin/browse.go (6)

86-89: Worker pool initialization

This concurrency pattern matches a standard worker-pool approach and appears implemented correctly.


92-98: Initial task scheduling

Enqueuing the start node with level 0 is clear and logically sound.


101-108: Graceful shutdown logic

Closing taskChan after taskWg.Wait() and then waiting for workerWg ensures an orderly teardown.


222-251: Child task creation

The recursive scheduling of child nodes via browseChildren is well-structured and ensures deeper levels can be explored.


399-399: Heartbeat browsing

Spawning a separate goroutine for the heartbeat node is consistent with the existing approach. No issues detected.


122-220: Non-blocking send can drop data

Using a default case when writing to opcuaBrowserChan means data could be discarded if the channel is full. Verify whether dropping node definitions is acceptable. If not, remove the default to block until the channel is available.

-   select {
-   case opcuaBrowserChan <- browserDef:
-   default:
-       logger.Debugf("Worker %%d: opcuaBrowserChan blocked, skipping", id)
-   }
+   opcuaBrowserChan <- browserDef
opcua_plugin/browse_frontend.go (1)

44-44: Consistent usage of Browse

Invoking Browse here aligns with the refactored concurrency model. No additional concerns apparent.

opcua_plugin/opcua_unittest_test.go (7)

106-106: LGTM!

The function call has been correctly updated to match the new signature by removing the level parameter.


133-133: LGTM!

The function call has been correctly updated to match the new signature by removing the level parameter.


171-171: LGTM!

The function call has been correctly updated to match the new signature by removing the level parameter.


211-211: LGTM!

The function call has been correctly updated to match the new signature by removing the level parameter.


251-251: LGTM!

The function call has been correctly updated to match the new signature by removing the level parameter.


294-294: LGTM!

The function call has been correctly updated to match the new signature by removing the level parameter.


695-695: LGTM!

The function call has been correctly updated to match the new signature by removing the level parameter.

opcua_plugin/opcua_opc-plc_test.go (1)

1433-1433: LGTM!

The function call has been correctly updated to match the new signature by removing the level parameter.

opcua_plugin/browse.go Outdated Show resolved Hide resolved
opcua_plugin/browse.go Show resolved Hide resolved
opcua_plugin/browse.go Show resolved Hide resolved
@kanapuli kanapuli requested a review from Scarjit January 31, 2025 14:41
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
opcua_plugin/browse.go (2)

77-108: Consider making worker pool size configurable.

While the implementation is solid, hardcoding the number of workers and buffer size might not be optimal for all scenarios. Consider making these values configurable based on system resources or user requirements.

-const numWorkers = 100 // Adjust based on your needs
+// WorkerPoolConfig holds worker pool configuration
+type WorkerPoolConfig struct {
+    NumWorkers int
+    BufferSize int
+}
+
+// DefaultWorkerPoolConfig returns default configuration
+func DefaultWorkerPoolConfig() WorkerPoolConfig {
+    return WorkerPoolConfig{
+        NumWorkers: 100,
+        BufferSize: 1000,
+    }
+}

138-142: Consider making max depth configurable.

The maximum browse depth is hardcoded to 25 levels. Consider making this configurable to accommodate different OPC UA server hierarchies.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10d85b3 and ef6a8e7.

📒 Files selected for processing (1)
  • opcua_plugin/browse.go (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: go-test-opcua-plc
🔇 Additional comments (4)
opcua_plugin/browse.go (4)

65-66: LGTM! Clean wrapper implementation.

The public wrapper function is well-documented and correctly implements the new worker pattern interface.


69-75: LGTM! Well-structured task encapsulation.

The NodeTask struct effectively encapsulates all necessary parameters for node processing, making the code more maintainable and easier to understand.


234-252: LGTM! Well-implemented helper functions.

The helper functions are well-structured with proper error handling and task management. The browseChildren function effectively manages the hierarchical browsing process.


187-193: Verify potential data loss in opcuaBrowserChan.

The non-blocking send to opcuaBrowserChan could silently drop nodes if the channel is full. Consider logging metrics about dropped nodes or implementing a backpressure mechanism.

✅ Verification successful

Current implementation of opcuaBrowserChan is appropriate

The non-blocking send with debug logging is an intentional design choice in this worker pool architecture. The implementation:

  • Prevents worker deadlocks
  • Provides visibility through debug logging
  • Operates in a browse context where skipping occasional nodes is preferable to blocking workers
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other instances of opcuaBrowserChan usage to understand the impact
rg -A 5 "opcuaBrowserChan.*<-"

Length of output: 333


Script:

#!/bin/bash
# Find channel declaration and initialization
rg -A 2 "opcuaBrowserChan\s*:?=.*chan"

# Find consumer/receiver code
rg -B 2 "<-\s*opcuaBrowserChan"

# Get broader context of browse operation
ast-grep --pattern 'func browse($$$) {
  $$$
}'

Length of output: 2739

@kanapuli kanapuli marked this pull request as draft January 31, 2025 15:54
opcua_plugin/browse.go Outdated Show resolved Hide resolved
opcua_plugin/browse.go Outdated Show resolved Hide resolved
return
}
// browse uses a worker pool pattern to process nodes
func browse(ctx context.Context, startNode NodeBrowser, startPath string, logger Logger, parentNodeId string,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if someone has multiple "parent nodes"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JeremyTheocharis It is handled already inside the constructNodeHierarchy function. As you can see in the image, I have modelled an OPCUA server with a node called Temperature (nodeid= ns=2;i=2) has two parents nodes 1. DMK , 2. Boiler. The Browse operation correctly shows the temperature node then under their corresponding parent nodes.
image

@kanapuli kanapuli marked this pull request as ready for review February 7, 2025 14:11
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (7)
opcua_plugin/browse.go (3)

64-68: Enhance function documentation.

The function documentation should be expanded to:

  1. Explain the purpose of each parameter
  2. Document the worker pool pattern implementation
  3. Clarify why direct usage should be avoided

70-76: Add field-level documentation to NodeTask struct.

Each field in the struct should be documented to explain its purpose and any constraints.

 type NodeTask struct {
+    // node represents the OPC UA node to be processed
     node         NodeBrowser
+    // path represents the hierarchical path to this node
     path         string
+    // level represents the current depth in the node hierarchy
     level        int
+    // parentNodeId represents the ID of the parent node
     parentNodeId string
 }

157-281: Consider breaking down the worker function for better maintainability.

The worker function handles multiple responsibilities. Consider extracting these into separate functions:

  1. Node attribute processing
  2. Node class handling
  3. Task processing logic
+func processNodeTask(task NodeTask, def *NodeDef) error {
+    // Extract node attribute processing logic
+}

+func handleNodeClass(def NodeDef, task NodeTask) error {
+    // Extract node class handling logic
+}

 func worker(
     ctx context.Context,
     id uuid.UUID,
     // ... other parameters ...
 ) {
     // ... existing setup ...
     
     for {
         select {
         case task, ok := <-taskChan:
             if !ok {
                 return
             }
             
-            // Current implementation
+            if err := processNodeTask(task, &def); err != nil {
+                sendError(ctx, err, errChan, logger)
+                taskWg.Done()
+                continue
+            }
+            
+            if err := handleNodeClass(def, task); err != nil {
+                sendError(ctx, err, errChan, logger)
+            }
         }
     }
 }
opcua_plugin/browse_workers.go (4)

14-14: Correct the spelling in the comment.

There's a minor typo in the comment for SampleSize: "requsts" should be "requests".

Apply this diff to fix the typo:

-	SampleSize     = 5 // Number of requsts to measure response time
+	SampleSize     = 5 // Number of requests to measure response time

56-84: Consider preserving partial history of response times.

Currently, the responseTimes slice is cleared after each adjustment, discarding historical data. Retaining a small rolling window (instead of resetting to an empty slice) could improve the stability of your average measurements, preventing abrupt changes.


75-82: Refine debug messages for clarity.

Although these debug logs are developer-focused (rather than user-facing error messages), you might consider adding minimal guidance or referencing next steps for developers, e.g., “Consider analyzing network conditions or server load.” This aligns with best practices for actionable logging.


92-101: Optimize the sample buffer handling.

The recordResponseTime function discards only the oldest measurement when capacity is reached. For high-throughput systems, consider using a ring buffer or specialized data structure to manage these durations more efficiently and reduce slice overhead.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef6a8e7 and 1d84ab2.

📒 Files selected for processing (3)
  • opcua_plugin/browse.go (4 hunks)
  • opcua_plugin/browse_frontend.go (1 hunks)
  • opcua_plugin/browse_workers.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • opcua_plugin/browse_frontend.go
🧰 Additional context used
📓 Path-based instructions (2)
`**/*`: When reviewing an error message, ensure it meets the...

**/*: When reviewing an error message, ensure it meets these guidelines. If a message is lacking (e.g., missing clear next steps or context), provide an example of the correct format to help improve clarity and usability.

Quick Guidelines for Writing Effective Error Messages

  1. Structure Your Message with Three Key Parts
    • Context:
    Explain what the system was doing.
    Example: “Couldn’t parse config file: /etc/sample-config.properties.”
    • Error Description:
    Clearly state what failed without generic wording.
    Example: “Invalid snapshot mode ‘nevr’ provided.”
    • Mitigation/Next Steps:
    Tell the user what to do next. For complex issues, include an error code or documentation link.
    Example: “Please update the mode; refer to our docs for valid options.”
  2. Clarity & Conciseness
    • Use plain, simple language and avoid unnecessary jargon (unless targeting technical users).
    • Aim for short sentences (ideally 8–14 words) for better understanding.
  3. Consistent Tone & Style
    • Uniformity: Use the same voice, terminology, and punctuation across all messages.
    • Empathy: Maintain a neutral, supportive tone; avoid blaming the user.
    • Avoid overly casual language (skip “oops”/“whoops” and excessive exclamation points).
  4. Visibility & Placement (User-Facing Messages)
    • Proximity: Place error messages close to the related UI element to reduce cognitive load.
    • Design: Use high-contrast text, icons, or subtle animations; follow accessibility guidelines (don’t rely solely on color).
  5. Actionability & Efficiency
    • Clearly indicate the next steps (e.g., “Retry later”, “Log in again”, “Check your input”).
    • Preserve user input when possible to minimize frustration.
    • Where relevant, provide error codes or links for further troubleshooting.
  6. Security & Privacy Considerations
    • User-Facing: Avoid exposing internal details or sensitive data.
    • Developer/Logging: Include detailed context (like file paths or exception chains) for debugging, but be cautious about sensitive information.
  7. Logging & Exception Best Practices (for Developer/Operations Messages)
    • Log the full context—including exception chains—to aid troubleshooting.
    • Use structured logging (e.g., JSON) for filtering and analysis.
    • Choose a consistent method: either raise an exception or log an error (not both) to avoid redundancy.
    • Don’t make error messages an API contract; expose machine-readable error codes or types instead.
  8. Additional Best Practices
    • Fail Early: Report errors as soon as possible (e.g., during input validation or startup).
    • One Concept, One Term: Use consistent terminology across messages and documentation.
    • Localization: For internal logs, use English; if localizing for users, follow a vetted process.
  • opcua_plugin/browse.go
  • opcua_plugin/browse_workers.go
`**/*`: Whenever in a new PR a new function or code is intro...

**/*: Whenever in a new PR a new function or code is introduced, ensure that there is at least somewhere as a code comment, in the README or wherer ever explained the "why" behind the decision. So e.g., why something was preferred over other options, or why a certain loop needs to be executed.

  • opcua_plugin/browse.go
  • opcua_plugin/browse_workers.go
🔇 Additional comments (5)
opcua_plugin/browse.go (2)

94-96: Address TODO comment about backpressure mechanism.

The current implementation uses a large buffer to avoid blocking, but this might lead to memory issues under high load. Consider implementing a backpressure mechanism using:

  1. A semaphore to limit concurrent tasks
  2. A sliding window approach
  3. Rate limiting based on server response times

Would you like me to propose a backpressure implementation?


283-323: LGTM! Well-documented helper functions.

The helper functions are well-implemented with:

  • Clear documentation
  • Proper error handling
  • Good separation of concerns
opcua_plugin/browse_workers.go (3)

18-25: Explain the rationale behind ServerMetrics.

This newly introduced struct benefits from a brief explanation of “why” the worker-based concurrency approach is chosen, along with any design tradeoffs or alternatives considered. Such context helps maintainers and contributors understand the long-term maintainability and reasoning behind collecting these metrics.

Would you like to add a code comment or README snippet explaining your decision to implement ServerMetrics?


36-44: Verify behavior when adding a worker with an existing ID.

The addWorker method overwrites the stop channel if the given ID already exists in the map, potentially leading to a stale channel or worker confusion. Consider checking for an existing entry before overwriting.


45-54: Removal logic looks good.

The removeWorker function correctly closes the associated channel and removes it from the map, preventing potential leaks. No changes needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants