Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport PR #2665 to to release/v1.7 for Fix gRPC error handling for mirror-gateway handler #2681

Merged

Conversation

hlts2
Copy link
Collaborator

@hlts2 hlts2 commented Oct 8, 2024

Description

SSIA

Related Issue

Versions

  • Vald Version: v1.7.13
  • Go Version: v1.23.1
  • Rust Version: v1.81.0
  • Docker Version: v27.3.1
  • Kubernetes Version: v1.31.1
  • Helm Version: v3.16.1
  • NGT Version: v2.2.4
  • Faiss Version: v1.8.0

Checklist

Special notes for your reviewer

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Enhanced error handling across multiple gRPC methods for clearer and more consistent error reporting.
    • Streamlined error parsing to simplify error handling processes.
  • Bug Fixes

    • Improved logging for better context and clarity in error messages.
    • Adjusted handling of unimplemented errors to ensure proper disconnection.
  • Refactor

    • Updated method logic for handling proxied requests to maintain consistency with new error handling patterns.
    • Simplified error handling in the registers, Connect, and Disconnect methods for improved robustness.

* fix: use FromError function when stream rpc error occurs

Signed-off-by: hlts2 <[email protected]>

* fix: delete ParseError of Register RPC error handling

Signed-off-by: hlts2 <[email protected]>

* fix: crud rpc error handling

Signed-off-by: hlts2 <[email protected]>

* fix: add code comment and refactor multi crud error handling

Signed-off-by: hlts2 <[email protected]>

* fix: deleted unnecessary error resource info

Signed-off-by: hlts2 <[email protected]>

* fix: typo

Signed-off-by: hlts2 <[email protected]>

* fix: status code when the parse is failed

Signed-off-by: hlts2 <[email protected]>

* fix: ignore parse success variable

Signed-off-by: hlts2 <[email protected]>

* fix: use FromError function for stream error

Signed-off-by: hlts2 <[email protected]>

* fix: error status handling

Signed-off-by: hlts2 <[email protected]>

* fix: typo

Signed-off-by: hlts2 <[email protected]>

---------

Signed-off-by: hlts2 <[email protected]>
Co-authored-by: Kosuke Morimoto <[email protected]>
Copy link
Contributor

coderabbitai bot commented Oct 8, 2024

📝 Walkthrough

Walkthrough

The changes in this pull request primarily enhance error handling and logging mechanisms within the gRPC server methods of the pkg/gateway/mirror/handler/grpc/handler.go file and the pkg/gateway/mirror/service/mirror.go file. Key modifications include a shift to using status.FromError(err) for error handling, standardization of error wrapping, and improvements in logging practices. Additionally, the import of errdetails has been removed from the mirror.go file, simplifying error reporting by focusing on the error type and message without additional context.

Changes

File Path Change Summary
pkg/gateway/mirror/handler/grpc/handler.go - Enhanced error handling using status.FromError(err).
- Standardized error wrapping across methods.
- Improved logging with context.
- Refactored errorState struct for better error management.
- Specific method changes for error handling in streaming and proxied requests.
pkg/gateway/mirror/service/mirror.go - Removed import of github.com/vdaas/vald/internal/net/grpc/errdetails.
- Simplified error handling in registers method.
- Updated logging to use log.Errorf.
- Streamlined error handling logic for various conditions.

Possibly related PRs

Suggested labels

type/refactoring, actions/e2e-deploy, actions/fossa, actions/e2e-chaos, actions/e2e-profiling, actions/e2e-max-dim, actions/backport/release/v1.7

Suggested reviewers

  • hlts2
  • vankichi
  • kpango

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@vdaas-ci
Copy link
Collaborator

vdaas-ci commented Oct 8, 2024

[CHATOPS:HELP] ChatOps commands.

  • 🙆‍♀️ /approve - approve
  • 🍱 /format - format codes and add licenses
  • /gen-test - generate test codes
  • 🏷️ /label - add labels
  • 🔚 2️⃣ 🔚 /label actions/e2e-deploy - run E2E deploy & integration test

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (5)
pkg/gateway/mirror/service/mirror.go (3)

Line range hint 228-239: Handle Errors When Disconnecting on Unimplemented Code

When an Unimplemented error is received (line 228), the code attempts to disconnect from the target. However, if net.SplitHostPort(target) fails, the error handling could be improved by providing more context.

Enhance error handling and logging:

 if st.Code() == codes.Unimplemented {
     host, port, err := net.SplitHostPort(target)
     if err != nil {
-        log.Warn(err)
+        log.Warnf("Failed to split host and port for target '%s': %v", target, err)
     } else {
         if err := m.Disconnect(ctx, &payload.Mirror_Target{
             Host: host,
             Port: uint32(port),
         }); err != nil {
-            log.Errorf("failed to disconnect %s, err: %v", target, err)
+            log.Errorf("Failed to disconnect from target '%s': %v", target, err)
         }
     }
 }

Line range hint 242-246: Standardize Error Logging Messages

The log message on line 242 uses inconsistent formatting. Standardizing log messages improves readability and maintainability.

Update the log message format:

-log.Errorf("failed to send Register API to %s\t: %v", target, err)
+log.Errorf("Failed to send Register API to target '%s': %v", target, err)

Line range hint 273-283: Ensure All Targets are Returned Despite Errors

In lines 273-283, when handling errors.ErrGRPCClientConnNotFound, the function returns nil for resTgts, even though some targets might have been successfully processed. This could lead to loss of valid data.

Modify the return statement to ensure all gathered targets are returned:

 if errors.Is(err, errors.ErrGRPCClientConnNotFound("*")) {
     err = status.WrapWithInternal(
         mirror.RegisterRPCName+" API connection not found", err,
     )
     log.Warn(err)
     if span != nil {
         span.RecordError(err)
         span.SetAttributes(trace.StatusCodeInternal(err.Error())...)
         span.SetStatus(trace.StatusError, err.Error())
     }
-    return nil, err
+    return resTgts, err
 }

By returning resTgts instead of nil, the function preserves any successfully retrieved targets despite the connection error.

pkg/gateway/mirror/handler/grpc/handler.go (2)

394-398: Simplify unlikely nil check condition

The condition st == nil || st.Message() == "" is unlikely to occur, as noted in the comment. Simplifying or removing this check can streamline the code.


1591-1602: Ensure consistent error logging

In lines 1591-1602, after wrapping the error, the log message lacks additional context compared to similar logging elsewhere. Including more context can aid in debugging.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 4dc9354 and 2a1b476.

📒 Files selected for processing (2)
  • pkg/gateway/mirror/handler/grpc/handler.go (37 hunks)
  • pkg/gateway/mirror/service/mirror.go (4 hunks)
🧰 Additional context used

pkg/gateway/mirror/service/mirror.go Show resolved Hide resolved
pkg/gateway/mirror/service/mirror.go Show resolved Hide resolved
pkg/gateway/mirror/handler/grpc/handler.go Show resolved Hide resolved
pkg/gateway/mirror/handler/grpc/handler.go Show resolved Hide resolved
pkg/gateway/mirror/handler/grpc/handler.go Show resolved Hide resolved
pkg/gateway/mirror/handler/grpc/handler.go Show resolved Hide resolved
Copy link

codecov bot commented Oct 8, 2024

Codecov Report

Attention: Patch coverage is 3.43137% with 197 lines in your changes missing coverage. Please review.

Please upload report for BASE (release/v1.7@709d00e). Learn more about missing BASE report.

Files with missing lines Patch % Lines
pkg/gateway/mirror/handler/grpc/handler.go 3.68% 176 Missing and 7 partials ⚠️
pkg/gateway/mirror/service/mirror.go 0.00% 14 Missing ⚠️
Additional details and impacted files
@@               Coverage Diff               @@
##             release/v1.7    #2681   +/-   ##
===============================================
  Coverage                ?   23.92%           
===============================================
  Files                   ?      539           
  Lines                   ?    46937           
  Branches                ?        0           
===============================================
  Hits                    ?    11231           
  Misses                  ?    34928           
  Partials                ?      778           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (7)
pkg/gateway/mirror/service/mirror.go (5)

199-214: Improve error context in status.WrapWith* functions

In the error handling blocks at lines 199, 204, 209, and 214, you are wrapping errors using status.WrapWith* functions, which is appropriate. However, consider adding more contextual information to the error messages to aid in debugging. Including details about the request or target could provide valuable insights when investigating issues.


218-224: Ensure comprehensive logging of non-gRPC errors

At lines 218-224, when status.FromError(err) returns a nil status or empty message, you log an error indicating that the call did not return a gRPC status error. While you handle this scenario to prevent nil pointer dereferences, consider logging the original error with more context to aid in troubleshooting unexpected error types.


273-274: Consistent error wrapping for connection not found

At lines 273-274, you wrap the error when a connection is not found:

err = status.WrapWithInternal(
    mirror.RegisterRPCName+" API connection not found", err,
)

Ensure that this error wrapping is consistent with how similar errors are handled elsewhere in the codebase. Consistent error messages improve maintainability and debugging efficiency.


283-283: Provide additional context in error logs

At line 283, you log the error without additional context:

log.Error(err)

Consider providing more information in the log message to clarify where and why the error occurred. For example:

log.Errorf("failed during %s: %v", mirror.RegisterRPCName, err)

This adds valuable context that can assist in debugging.


286-289: Enhance error message clarity in status.ParseError

In the error wrapping at lines 286-289:

st, msg, err := status.ParseError(err, codes.Internal,
    "failed to parse "+mirror.RegisterRPCName+" gRPC error response",
)

Ensure that the error message clearly conveys the nature of the failure. Including specific details from the original error can make troubleshooting more straightforward.

pkg/gateway/mirror/handler/grpc/handler.go (2)

120-131: Increase test coverage for new error handling blocks

The newly added error handling blocks in the Register method are not covered by unit tests. Improving test coverage ensures that these error scenarios are properly handled and prevents regressions in the future.

Consider adding unit tests that:

  • Trigger the default case in the error handling switch statement by simulating unexpected errors.
  • Verify that the status.WrapWithInternal function wraps errors correctly.
  • Check that the error details (BadRequest, FieldViolations) are populated as expected.

Would you like assistance in creating these tests or opening a GitHub issue to track this task?

Also applies to: 145-145

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 120-123: pkg/gateway/mirror/handler/grpc/handler.go#L120-L123
Added lines #L120 - L123 were not covered by tests


[warning] 125-129: pkg/gateway/mirror/handler/grpc/handler.go#L125-L129
Added lines #L125 - L129 were not covered by tests


[warning] 131-131: pkg/gateway/mirror/handler/grpc/handler.go#L131
Added line #L131 was not covered by tests


393-401: Increase test coverage for stream error handling

The added error handling code within streaming methods like StreamSearch, StreamSearchByID, and others is not covered by tests. Ensuring these paths are tested is crucial for robust error handling in streaming scenarios.

Consider:

  • Adding integration tests that simulate errors during streaming operations to trigger the new error handling code.
  • Verifying that the server responds with appropriate gRPC status codes and error messages when errors occur.

Would you like assistance in writing these tests or creating a GitHub issue to address this?

Also applies to: 418-426, 452-460, 477-485, 781-789, 806-814, 845-853, 870-878

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 393-394: pkg/gateway/mirror/handler/grpc/handler.go#L393-L394
Added lines #L393 - L394 were not covered by tests


[warning] 396-397: pkg/gateway/mirror/handler/grpc/handler.go#L396-L397
Added lines #L396 - L397 were not covered by tests


[warning] 401-401: pkg/gateway/mirror/handler/grpc/handler.go#L401
Added line #L401 was not covered by tests

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 4dc9354 and 2a1b476.

📒 Files selected for processing (2)
  • pkg/gateway/mirror/handler/grpc/handler.go (37 hunks)
  • pkg/gateway/mirror/service/mirror.go (4 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
pkg/gateway/mirror/handler/grpc/handler.go

[warning] 120-123: pkg/gateway/mirror/handler/grpc/handler.go#L120-L123
Added lines #L120 - L123 were not covered by tests


[warning] 125-129: pkg/gateway/mirror/handler/grpc/handler.go#L125-L129
Added lines #L125 - L129 were not covered by tests


[warning] 131-131: pkg/gateway/mirror/handler/grpc/handler.go#L131
Added line #L131 was not covered by tests


[warning] 145-145: pkg/gateway/mirror/handler/grpc/handler.go#L145
Added line #L145 was not covered by tests


[warning] 393-394: pkg/gateway/mirror/handler/grpc/handler.go#L393-L394
Added lines #L393 - L394 were not covered by tests


[warning] 396-397: pkg/gateway/mirror/handler/grpc/handler.go#L396-L397
Added lines #L396 - L397 were not covered by tests


[warning] 401-401: pkg/gateway/mirror/handler/grpc/handler.go#L401
Added line #L401 was not covered by tests


[warning] 418-419: pkg/gateway/mirror/handler/grpc/handler.go#L418-L419
Added lines #L418 - L419 were not covered by tests


[warning] 421-422: pkg/gateway/mirror/handler/grpc/handler.go#L421-L422
Added lines #L421 - L422 were not covered by tests


[warning] 426-426: pkg/gateway/mirror/handler/grpc/handler.go#L426
Added line #L426 was not covered by tests


[warning] 452-453: pkg/gateway/mirror/handler/grpc/handler.go#L452-L453
Added lines #L452 - L453 were not covered by tests


[warning] 455-456: pkg/gateway/mirror/handler/grpc/handler.go#L455-L456
Added lines #L455 - L456 were not covered by tests


[warning] 460-460: pkg/gateway/mirror/handler/grpc/handler.go#L460
Added line #L460 was not covered by tests


[warning] 477-478: pkg/gateway/mirror/handler/grpc/handler.go#L477-L478
Added lines #L477 - L478 were not covered by tests


[warning] 480-481: pkg/gateway/mirror/handler/grpc/handler.go#L480-L481
Added lines #L480 - L481 were not covered by tests


[warning] 485-485: pkg/gateway/mirror/handler/grpc/handler.go#L485
Added line #L485 was not covered by tests


[warning] 781-782: pkg/gateway/mirror/handler/grpc/handler.go#L781-L782
Added lines #L781 - L782 were not covered by tests


[warning] 784-785: pkg/gateway/mirror/handler/grpc/handler.go#L784-L785
Added lines #L784 - L785 were not covered by tests


[warning] 789-789: pkg/gateway/mirror/handler/grpc/handler.go#L789
Added line #L789 was not covered by tests


[warning] 806-807: pkg/gateway/mirror/handler/grpc/handler.go#L806-L807
Added lines #L806 - L807 were not covered by tests


[warning] 809-810: pkg/gateway/mirror/handler/grpc/handler.go#L809-L810
Added lines #L809 - L810 were not covered by tests


[warning] 814-814: pkg/gateway/mirror/handler/grpc/handler.go#L814
Added line #L814 was not covered by tests


[warning] 845-846: pkg/gateway/mirror/handler/grpc/handler.go#L845-L846
Added lines #L845 - L846 were not covered by tests


[warning] 848-849: pkg/gateway/mirror/handler/grpc/handler.go#L848-L849
Added lines #L848 - L849 were not covered by tests


[warning] 853-853: pkg/gateway/mirror/handler/grpc/handler.go#L853
Added line #L853 was not covered by tests


[warning] 870-871: pkg/gateway/mirror/handler/grpc/handler.go#L870-L871
Added lines #L870 - L871 were not covered by tests


[warning] 873-874: pkg/gateway/mirror/handler/grpc/handler.go#L873-L874
Added lines #L873 - L874 were not covered by tests


[warning] 878-878: pkg/gateway/mirror/handler/grpc/handler.go#L878
Added line #L878 was not covered by tests


[warning] 1045-1046: pkg/gateway/mirror/handler/grpc/handler.go#L1045-L1046
Added lines #L1045 - L1046 were not covered by tests


[warning] 1048-1049: pkg/gateway/mirror/handler/grpc/handler.go#L1048-L1049
Added lines #L1048 - L1049 were not covered by tests

🔇 Additional comments (1)
pkg/gateway/mirror/service/mirror.go (1)

Line range hint 228-236: Verify handling of codes.Unimplemented to prevent unintended disconnections

At lines 228-236, when the status code is codes.Unimplemented, the code attempts to disconnect from the target:

if st.Code() == codes.Unimplemented {
    // ...
    if err := m.Disconnect(ctx, &payload.Mirror_Target{
        Host: host,
        Port: uint32(port),
    }); err != nil {
        log.Errorf("failed to disconnect %s, err: %v", target, err)
    }
}

Ensure that this behavior is appropriate and that codes.Unimplemented errors only occur in scenarios where disconnection is intended. It's important to confirm that valid connections are not being disconnected due to unexpected Unimplemented errors.

Run the following script to identify where codes.Unimplemented is returned in the codebase:

This will help verify if there are any cases where codes.Unimplemented could be returned unintentionally, leading to undesired disconnections.

✅ Verification successful

Handling of codes.Unimplemented Verified

The usage of codes.Unimplemented is confined to auto-generated gRPC server interfaces for unimplemented methods. Therefore, the disconnection logic in mirror.go triggered by codes.Unimplemented is appropriate and does not lead to unintended disconnections.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for all occurrences of `codes.Unimplemented` being returned.

# Test: Find all instances where `codes.Unimplemented` is used in returning errors.
rg --type go 'status\.New\(codes\.Unimplemented' -A 5 -B 5

# Test: Find all instances where `status.Error` or `status.Errorf` is called with `codes.Unimplemented`.
rg --type go 'status\.Errorf?\(codes\.Unimplemented' -A 5 -B 5

Length of output: 35012

pkg/gateway/mirror/service/mirror.go Show resolved Hide resolved
pkg/gateway/mirror/handler/grpc/handler.go Show resolved Hide resolved
pkg/gateway/mirror/handler/grpc/handler.go Show resolved Hide resolved
Copy link

cloudflare-workers-and-pages bot commented Oct 8, 2024

Deploying vald with  Cloudflare Pages  Cloudflare Pages

Latest commit: 011b2ad
Status: ✅  Deploy successful!
Preview URL: https://358c835f.vald.pages.dev
Branch Preview URL: https://backport-release-v1-7-fix-mi-xwc2.vald.pages.dev

View logs

@kpango kpango merged commit 5d12e94 into release/v1.7 Oct 8, 2024
31 of 32 checks passed
@kpango kpango deleted the backport/release/v1.7/fix/mirror/error-handling branch October 8, 2024 23:25
@kpango kpango mentioned this pull request Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants