
Add rollout restart agent e2e test #2799

Open · wants to merge 13 commits into base: main
Conversation

@vankichi (Contributor) commented Jan 10, 2025

Description

  • Implement the E2E test and run it on GitHub Actions
    • Background: Search fails with DEADLINE_EXCEEDED during agent rolling updates
  • Function
    • Set up a Vald cluster with PV and in-memory mode disabled
    • Rollout-restart the agent after createIndex completes
    • Continue search operations while the agent pods are rollout-restarted
    • Catch the DEADLINE_EXCEEDED error if it occurs during the search operation
      • If the error occurs, the E2E test ends with Fatal

Related Issue

Versions

  • Vald Version: v1.7.15
  • Go Version: v1.23.4
  • Rust Version: v1.83.0
  • Docker Version: v27.4.0
  • Kubernetes Version: v1.32.0
  • Helm Version: v3.16.3
  • NGT Version: v2.3.5
  • Faiss Version: v1.9.0

Checklist

Special notes for your reviewer

Summary by CodeRabbit

Release Notes

  • New Features

    • Added configuration options for K3D storage setup in GitHub Actions workflows.
    • Introduced new end-to-end (E2E) test for agent rollout restart functionality.
    • New input parameters for K3D setup to enhance configurability.
    • New Helm values configuration for rollout agent.
    • Added a new YAML configuration file for rollout agent settings.
    • Enhanced Docker image build configurations to support multiple architectures (linux/amd64 and linux/arm64).
  • Testing Improvements

    • Enhanced Kubernetes client with StatefulSet readiness check.
    • Added new E2E test scenario for system resilience during agent restart.
    • Increased timeout for gRPC connection to improve reliability.
    • Improved error handling and synchronization in streaming operations.
  • Configuration Updates

    • Added configurable wait time for resource readiness in E2E tests.
    • Introduced new rollout resource management function in the kubectl package.


coderabbitai bot commented Jan 10, 2025

📝 Walkthrough

Walkthrough

This pull request introduces enhancements to the end-to-end (E2E) testing infrastructure for the Vald project. The changes focus on adding configurability to K3D storage setup, creating a new rollout agent configuration, and implementing a new E2E test scenario for agent rollout restart. The modifications span multiple files, including GitHub Actions workflows, Makefiles, Kubernetes client implementations, and test scripts, with the primary goal of improving testing capabilities and system resilience.

Changes

File Change Summary
.github/actions/setup-e2e/action.yaml Added new input require_k3d_storage to control K3D storage setup
.github/actions/setup-k3d/action.yaml Introduced storage input to enable local-path-storage deployment
.github/helm/values/values-rollout-agent.yaml New configuration file for rollout agent with comprehensive settings
.github/workflows/e2e.yaml Added new job e2e-stream-crud-with-rollout-restart-agent for E2E testing
Makefile Added E2E_WAIT_FOR_RESOURCE_READY variable with 3m default timeout
Makefile.d/e2e.mk Created new target e2e/rollout/restart/agent for agent rollout restart testing
Makefile.d/functions.mk Updated run-e2e-crud-test function with new -wait-resource-ready parameter
tests/e2e/crud/crud_test.go Added waitResourceReadyDuration and new test TestE2EAgentRolloutRestart
tests/e2e/kubernetes/client/client.go Introduced WaitForStatefulSetReady method to check StatefulSet readiness
tests/e2e/kubernetes/kubectl/kubectl.go Added RolloutResourceName function for resource rollout management
tests/e2e/operation/operation.go Increased gRPC connection timeout from 5s to 60s
tests/e2e/operation/stream.go Enhanced error handling in UpsertWithParameters method using mutex
.github/workflows/dockers-*.yaml Added platforms: linux/amd64,linux/arm64 parameter to multiple Docker image build workflows
dockers/*/Dockerfile Reformatted ENTRYPOINT lines in various Dockerfiles without functional changes

Possibly related PRs

  • Multi-PF build for example-client image #2713: The changes in this PR involve adding a new input parameter for building the Docker image, which is related to the configurability aspect of the GitHub Actions workflow, similar to the require_k3d_storage parameter introduced in the main PR.

Suggested Labels

priority/medium, area/internal, actions/backport/release/v1.7, area/index/job/correction, area/index/job/readreplica

Suggested Reviewers

  • kpango
  • kmrmt
@vdaas-ci (Collaborator) commented:

[CHATOPS:HELP] ChatOps commands.

  • 🙆‍♀️ /approve - approve
  • 🍱 /format - format codes and add licenses
  • /gen-test - generate test codes
  • 🏷️ /label - add labels
  • /label actions/e2e-deploy - run E2E deploy & integration test

@vankichi vankichi force-pushed the test/e2e/add-rolloutrestart-agent-test branch from 4ae3d97 to 9985751 Compare January 10, 2025 15:45
cloudflare-workers-and-pages bot commented Jan 10, 2025

Deploying vald with Cloudflare Pages

Latest commit: 625c035
Status: ✅  Deploy successful!
Preview URL: https://38d1d8ff.vald.pages.dev
Branch Preview URL: https://test-e2e-add-rolloutrestart.vald.pages.dev

Signed-off-by: vankichi <[email protected]>
Signed-off-by: vankichi <[email protected]>
Signed-off-by: vankichi <[email protected]>
@vankichi vankichi force-pushed the test/e2e/add-rolloutrestart-agent-test branch from 573283f to bd747ff Compare January 14, 2025 14:46
Signed-off-by: vankichi <[email protected]>
@vankichi vankichi requested review from kmrmt and kpango January 15, 2025 02:39
@vankichi vankichi changed the title from "[WIP] Add rollout restart agent e2e test" to "Add rollout restart agent e2e test" Jan 15, 2025
@vankichi (Contributor, Author) commented:

@coderabbitai review


coderabbitai bot commented Jan 15, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
tests/e2e/kubernetes/kubectl/kubectl.go (1)

45-86: LGTM! Well-structured implementation of resource rollout management.

The implementation is robust with:

  • Proper error handling and resource cleanup with deferred Close() calls
  • Concurrent output handling using goroutines and bufio.Scanner
  • Clear command construction with timeout parameter

However, consider adding debug logging using t.Logf for better test observability.

 func RolloutResourceName(
 	ctx context.Context, t *testing.T, resource string, name string, timeout string,
 ) error {
 	t.Helper()
+	t.Logf("Rolling out resource %s/%s with timeout %s", resource, name, timeout)
 	cmd := exec.CommandContext(ctx, "kubectl", "rollout", "restart", resource, name)
 	if err := runCmd(t, cmd); err != nil {
+		t.Logf("Failed to restart resource: %v", err)
 		return err
 	}

 	r := strings.Join([]string{resource, name}, "/")
 	to := strings.Join([]string{"--timeout", timeout}, "=")
 	cmd = exec.CommandContext(ctx, "kubectl", "rollout", "status", r, "--watch", to)
tests/e2e/kubernetes/client/client.go (1)

210-233: LGTM! Robust implementation of StatefulSet readiness check.

The implementation includes:

  • Proper context handling with timeout
  • Clear readiness condition check
  • Efficient polling with ticker

Consider adding exponential backoff to reduce API server load.

+import "k8s.io/client-go/util/retry"

 func (cli *client) WaitForStatefulSetReady(
 	ctx context.Context, namespace, name string, timeout time.Duration,
 ) (ok bool, err error) {
 	ctx, cancel := context.WithTimeout(ctx, timeout)
 	defer cancel()

-	tick := time.NewTicker(time.Second)
-	defer tick.Stop()
-
-	for {
+	err = retry.OnError(retry.DefaultBackoff, func() error {
 		ss, err := cli.clientset.AppsV1().StatefulSets(namespace).Get(ctx, name, metav1.GetOptions{})
 		if err != nil {
-			return false, err
+			return err
 		}
 		if ss.Status.UpdatedReplicas == ss.Status.Replicas && ss.Status.ReadyReplicas == ss.Status.Replicas {
-			return true, nil
+			return nil
 		}
-		select {
-		case <-ctx.Done():
-			return false, ctx.Err()
-		case <-tick.C:
-		}
-	}
+		return fmt.Errorf("statefulset %s/%s not ready", namespace, name)
+	})
+	return err == nil, err
 }
Makefile.d/e2e.mk (1)

92-96: LGTM! Well-structured test target for agent rollout restart.

The target follows the established pattern for e2e test targets.

Note: There's a typo in the target name: e2e/rollaout/restart/agent should be e2e/rollout/restart/agent.

-.PHONY: e2e/rollaout/restart/agent
+.PHONY: e2e/rollout/restart/agent
 ## run rollout-restart agent e2e
-e2e/rollaout/restart/agent:
+e2e/rollout/restart/agent:
 	$(call run-e2e-crud-test,-run TestE2EAgentRolloutRestart)
tests/e2e/crud/crud_test.go (1)

999-1103: Consider improving error handling and test reliability.

The test implementation has a few areas that could be enhanced:

  1. The sleep duration between search attempts is hardcoded to 10 seconds.
  2. The error aggregation could potentially consume a lot of memory if many errors occur.
  3. The test might not catch all DEADLINE_EXCEEDED errors due to the sleep interval.

Consider these improvements:

-				time.Sleep(10 * time.Second)
+				time.Sleep(time.Second) // More frequent checks to catch errors
+				if serr != nil && errors.Is(serr, context.DeadlineExceeded) {
+					// Early exit on first deadline exceeded error
+					return
+				}

Also, consider adding a timeout context for the entire test to prevent it from running indefinitely if something goes wrong:

-	ctx, cancel := context.WithCancel(context.Background())
+	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6065fd9 and 6141552.

📒 Files selected for processing (10)
  • .github/actions/setup-e2e/action.yaml (2 hunks)
  • .github/actions/setup-k3d/action.yaml (2 hunks)
  • .github/helm/values/values-rollout-agent.yaml (1 hunks)
  • .github/workflows/e2e.yaml (2 hunks)
  • Makefile (1 hunks)
  • Makefile.d/e2e.mk (1 hunks)
  • Makefile.d/functions.mk (1 hunks)
  • tests/e2e/crud/crud_test.go (5 hunks)
  • tests/e2e/kubernetes/client/client.go (2 hunks)
  • tests/e2e/kubernetes/kubectl/kubectl.go (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Run formatter
.github/helm/values/values-rollout-agent.yaml

[error] 2-2: File needs formatting. Copyright year needs to be updated from 2024 to 2025. Please execute make format locally to fix this issue.

🪛 actionlint (1.7.4)
.github/workflows/e2e.yaml

362-362: shellcheck reported issue in this script: SC2086:info:1:42: Double quote to prevent globbing and word splitting

(shellcheck)


377-377: shellcheck reported issue in this script: SC2086:info:12:23: Double quote to prevent globbing and word splitting

(shellcheck)

🔇 Additional comments (12)
tests/e2e/kubernetes/client/client.go (1)

71-75: LGTM! Clear interface definition for StatefulSet readiness check.

The interface method signature is well-defined with appropriate parameters.

Makefile.d/functions.mk (1)

166-166: LGTM! Good addition of resource readiness wait parameter.

The parameter follows the established pattern and integrates well with the existing function.

tests/e2e/crud/crud_test.go (1)

66-67: LGTM! New duration variable for resource readiness.

The addition of waitResourceReadyDuration aligns with the PR's objective to handle agent rolling updates effectively.

.github/helm/values/values-rollout-agent.yaml (1)

34-60: Review agent configuration for production readiness.

The agent configuration has several points to consider:

  1. The terminationGracePeriodSeconds of 600 seconds might be too long for CI/CD environments.
  2. The storage size of 500Mi might be insufficient for production workloads.
  3. The auto index duration limit of 2m might be too short for large datasets.

Please verify these settings in your environment:

.github/actions/setup-e2e/action.yaml (1)

35-38: LGTM! Storage configuration for k3d setup.

The addition of require_k3d_storage parameter and its usage in the k3d setup step is well-implemented and properly documented.

Also applies to: 92-92

.github/actions/setup-k3d/action.yaml (1)

39-42: LGTM! K3d storage deployment configuration.

The implementation of storage deployment is clean and follows the action's existing patterns.

Also applies to: 131-135

.github/workflows/e2e.yaml (5)

351-359: LGTM! Job configuration is well-structured.

The job configuration follows GitHub Actions best practices with appropriate timeout and container setup.


364-368: Good use of persistent storage for test stability.

The require_k3d_storage: true configuration aligns with the PR objective of using persistent volumes and disabling in-memory mode.


376-393: Test configuration looks comprehensive.

The E2E test configuration includes:

  • Appropriate dataset setup
  • Balanced operation counts (1000 for search operations, 100 for others)
  • Reasonable wait duration for index creation (3m)
  • Proper environment variable passing
🧰 Tools
🪛 actionlint (1.7.4)

377-377: shellcheck reported issue in this script: SC2086:info:12:23: Double quote to prevent globbing and word splitting

(shellcheck)


463-463: Good job including the new test in slack notifications.

The new test job is correctly added to the slack-notification job dependencies.


369-375: Verify the Helm values file for rollout agent configuration.

The deployment uses a specific values file for rollout agent configuration.

Run this script to inspect the Helm values:

✅ Verification successful

✓ Helm values file for rollout agent is properly configured

The values file contains appropriate configurations including:

  • Persistent volume settings with 500Mi storage and ReadWriteOncePod access mode
  • Resource allocations for all components
  • Agent replica configuration with parallel pod management
  • NGT indexing parameters
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the Helm values file for rollout agent configuration
# Check if the values file exists and inspect its content for PV and memory settings

if [ -f ".github/helm/values/values-rollout-agent.yaml" ]; then
    echo "Checking values-rollout-agent.yaml content:"
    cat .github/helm/values/values-rollout-agent.yaml
else
    echo "Warning: values-rollout-agent.yaml not found"
fi

Length of output: 2351

Makefile (1)

370-370: LGTM! Good addition of resource wait timeout.

The new E2E_WAIT_FOR_RESOURCE_READY variable with a default value of 3m is a good practice for ensuring test stability.

.github/helm/values/values-rollout-agent.yaml: review thread (outdated, resolved)
Signed-off-by: vankichi <[email protected]>
coderabbitai bot previously approved these changes Jan 15, 2025

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (6)
tests/e2e/operation/operation.go (1)

204-204: Consider making the timeout configurable.

Since different E2E test scenarios might require different timeout values (e.g., regular operations vs rolling updates), consider making this timeout configurable through an environment variable.

Example implementation:

+ const defaultGRPCTimeout = 60 * time.Second
+
+ func getGRPCTimeout() time.Duration {
+   if timeout := os.Getenv("E2E_GRPC_TIMEOUT"); timeout != "" {
+     if d, err := time.ParseDuration(timeout); err == nil {
+       return d
+     }
+   }
+   return defaultGRPCTimeout
+ }

  func (c *client) getGRPCConn() (*grpc.ClientConn, error) {
    return grpc.NewClient(
      c.host+":"+strconv.Itoa(c.port),
      grpc.WithInsecure(),
      grpc.WithKeepaliveParams(
        keepalive.ClientParameters{
          Time:                time.Second,
-         Timeout:             60 * time.Second,
+         Timeout:             getGRPCTimeout(),
          PermitWithoutStream: true,
        },
      ),
    )
  }
tests/e2e/operation/stream.go (2)

875-885: Consider using defer for mutex unlock.

While the error handling is correct, consider using defer mu.Unlock() right after the lock to prevent potential lock leaks in case of panics.

-				mu.Lock()
-				rerr = ierr
-				mu.Unlock()
+				mu.Lock()
+				defer mu.Unlock()
+				rerr = ierr

924-927: Consider consolidating error handling patterns.

The error handling pattern here differs from the goroutine's error handling. Consider using the same pattern with a local error variable for consistency.

-			mu.Lock()
-			rerr = errors.Join(rerr, err)
-			mu.Unlock()
-			return
+			mu.Lock()
+			defer mu.Unlock()
+			rerr = errors.Join(rerr, err)
+			return
tests/e2e/crud/crud_test.go (3)

999-1103: Comprehensive test implementation with proper concurrency handling.

The test implementation is well-structured with:

  • Proper context handling with cancellation
  • Concurrent search operations during rollout
  • Thread-safe error aggregation
  • Clean resource management with WaitGroup and channels

However, consider adding a timeout context for the entire test to prevent indefinite hanging.

-	ctx, cancel := context.WithCancel(context.Background())
+	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)

Also, consider adding a constant for the sleep duration:

+	const searchRetryInterval = 10 * time.Second
 				}
-				time.Sleep(10 * time.Second)
+				time.Sleep(searchRetryInterval)

1028-1034: Consider adding retry mechanism with backoff.

The search function could benefit from a retry mechanism with exponential backoff to handle transient failures more gracefully.

+	searchWithRetry := func() error {
+		backoff := time.Second
+		maxBackoff := 30 * time.Second
+		for i := 0; i < 3; i++ {
+			err := searchFunc()
+			if err == nil {
+				return nil
+			}
+			if st, ok := status.FromError(err); ok && st.Code() != codes.DeadlineExceeded {
+				return err
+			}
+			time.Sleep(backoff)
+			backoff *= 2
+			if backoff > maxBackoff {
+				backoff = maxBackoff
+			}
+		}
+		return searchFunc()
+	}

1047-1057: Consider adding metrics or logging for error tracking.

The error handling in the search loop could benefit from metrics or logging to track the frequency and patterns of deadline exceeded errors.

 				err = searchFunc()
 				if err != nil {
 					st, ok := status.FromError(err)
 					if ok && st.Code() == codes.DeadlineExceeded {
+						t.Logf("Search operation timed out: %v", err)
 						_, _, rerr := status.ParseError(err, codes.DeadlineExceeded, "an error occurred")
 						mu.Lock()
 						serr = errors.Join(serr, rerr)
 						mu.Unlock()
 					}
+				} else {
+					t.Log("Search operation completed successfully")
 				}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f438c7c and 11cd40d.

📒 Files selected for processing (3)
  • tests/e2e/crud/crud_test.go (5 hunks)
  • tests/e2e/operation/operation.go (1 hunks)
  • tests/e2e/operation/stream.go (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: runner / go build (×10)
🔇 Additional comments (5)
tests/e2e/operation/operation.go (1)

204-204: LGTM! Increasing the timeout to handle rolling updates.

The increase from 5s to 60s timeout is a reasonable change to handle potential delays during agent rolling updates and should help prevent DEADLINE_EXCEEDED errors in the E2E tests.

tests/e2e/operation/stream.go (2)

858-869: Improve error handling synchronization.

The addition of mutex synchronization and local error variable improves thread safety when handling errors in the streaming operation.


898-898: LGTM: Thread-safe error aggregation.

The error joining operation is correctly using the local error variable before synchronizing with the shared error.

tests/e2e/crud/crud_test.go (2)

66-67: LGTM: Clear variable naming.

The new duration variables have clear, descriptive names that indicate their purpose.


137-141: LGTM: Proper error handling in initialization.

The initialization of waitResourceReadyDuration includes appropriate error handling.

coderabbitai bot previously approved these changes Jan 15, 2025
coderabbitai bot previously approved these changes Jan 15, 2025
@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/e2e/crud/crud_test.go (1)

1035-1062: Well-implemented concurrent search with proper error handling.

The implementation correctly uses sync primitives (WaitGroup, Mutex) and channels for coordination. The error handling specifically targets DEADLINE_EXCEEDED errors as required.

Consider making the sleep duration (10s) configurable via a flag parameter for better test flexibility:

+var searchRetryInterval = flag.Duration(
+    "search-retry-interval", 10*time.Second,
+    "interval between search retries during rollout",
+)

Then update line 1059 (the testing framework parses flags before the test body runs, so the value is safe to dereference there):

-                time.Sleep(10 * time.Second)
+                time.Sleep(*searchRetryInterval)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be7b4de and fdc3b87.

📒 Files selected for processing (2)
  • tests/e2e/crud/crud_test.go (5 hunks)
  • tests/e2e/operation/operation.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/e2e/operation/operation.go
⏰ Context from checks skipped due to timeout of 90000ms (70)
  • GitHub Check: CodeQL (×10)
  • GitHub Check: runner / go build (×10)
  • GitHub Check: check-format-diff (×10)
  • GitHub Check: build / build (×40)
🔇 Additional comments (3)
tests/e2e/crud/crud_test.go (3)

66-67: LGTM! Well-structured variable initialization.

The addition of waitResourceReadyDuration with proper error handling and a reasonable default timeout value of 3 minutes is appropriate for managing resource readiness checks.

Also applies to: 101-141


1071-1099: Comprehensive post-rollout verification.

The test properly verifies the system state after rollout by:

  1. Checking index information
  2. Verifying object existence
  3. Testing object retrieval
  4. Confirming remove operations

1064-1069: Verify rollout restart command execution.

The implementation uses kubectl for rollout restart. Let's verify the command execution and logging.

✅ Verification successful

Implementation of rollout restart command is correct and complete

The kubectl rollout restart implementation includes proper command execution, error handling, status monitoring with timeout, and appropriate logging. The code follows best practices for both the command execution and test implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify kubectl rollout restart command usage
# Test: Search for kubectl rollout restart command usage
rg -A 5 'kubectl.*rollout.*restart'

Length of output: 916


Script:

#!/bin/bash
# Get the complete implementation of kubectl package functions
ast-grep --pattern 'func RolloutResourceName($$$) {
  $$$
}'

echo "---"

ast-grep --pattern 'func runCmd($$$) {
  $$$
}'

Length of output: 127


Script:

#!/bin/bash
# Search for RolloutResourceName function implementation
rg -A 15 "^func RolloutResourceName" tests/e2e/kubernetes/kubectl/

echo "---"

# Search for runCmd function implementation
rg -A 10 "^func runCmd" tests/e2e/kubernetes/kubectl/

Length of output: 2022

coderabbitai bot previously approved these changes Jan 16, 2025
Signed-off-by: kpango <[email protected]>
deepsource-autofix bot added a commit that referenced this pull request Jan 16, 2025
This commit fixes the style issues introduced in ef92673 according to the output
from Gofumpt and Prettier.

Details: #2799
@@ -141,4 +141,4 @@ RUN --mount=type=bind,target=.,rw \
&& make faiss/install \
&& rm -rf ${GOPATH}/src/github.com/${ORG}/${REPO}/*
# skipcq: DOK-DL3002
USER root:root
USER root:root

⚠️ [hadolint] <DL3002> reported by reviewdog 🐶
Last USER should not be root

Signed-off-by: kpango <[email protected]>
@kpango kpango force-pushed the test/e2e/add-rolloutrestart-agent-test branch from da3dccf to 0289138 Compare January 16, 2025 07:53
@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
.github/workflows/dockers-index-creation-image.yaml (1)

246-246: LGTM! Consider updating documentation for multi-platform support.

The change consistently implements multi-platform support. Consider adding a note about the new multi-architecture support to the project's documentation, particularly the deployment guide.

.github/workflows/dockers-index-correction-image.yaml (1)

252-252: LGTM! Consider implementing cross-platform E2E tests.

The multi-platform support is consistently implemented. To ensure reliability across architectures:

  1. Consider running E2E tests on both AMD64 and ARM64 platforms
  2. Verify that the agent rollout restart functionality works correctly on both architectures

Consider implementing a matrix strategy in the E2E test workflow to validate the rollout restart functionality across both architectures:

strategy:
  matrix:
    platform: [linux/amd64, linux/arm64]
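To make this concrete, a matrix entry could be threaded into the job like so (illustrative only — the job name, make target, and environment variable are assumptions, not the repository's actual E2E workflow):

```yaml
jobs:
  e2e-rollout-restart:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        platform: [linux/amd64, linux/arm64]
    steps:
      - uses: actions/checkout@v4
      - name: Run rollout-restart E2E
        run: make e2e/rollout-restart   # hypothetical target
        env:
          TARGET_PLATFORM: ${{ matrix.platform }}
```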
.github/workflows/dockers-gateway-mirror-image.yaml (1)

Line range hint 266-266: Consistent multi-platform support across all Docker image workflows.

The addition of platforms: linux/amd64,linux/arm64 is consistently applied across all Docker image workflow files:

  • dockers-manager-index-image.yaml:266
  • dockers-agent-faiss-image.yaml:268
  • dockers-agent-ngt-image.yaml:272
  • dockers-agent-sidecar-image.yaml:298

This systematic update ensures that all components support both AMD64 and ARM64 architectures, which is particularly important for:

  1. Running e2e tests across different architectures
  2. Supporting diverse deployment environments
  3. Ensuring consistent behavior during agent rollout tests

Consider documenting the multi-architecture support in the project's deployment guide, including any specific considerations for running e2e tests on different architectures.

Also applies to: 268-268, 272-272, 298-298

.github/workflows/dockers-agent-sidecar-image.yaml (1)

298-298: LGTM! Platform configuration is consistent.

The multi-arch support configuration matches the pattern used across other workflow files.

Consider implementing a CI test to verify that the built images work correctly on both architectures, especially for components with architecture-specific optimizations like NGT and FAISS.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fdc3b87 and 0289138.

⛔ Files ignored due to path filters (1)
  • hack/docker/gen/main.go is excluded by !**/gen/**
📒 Files selected for processing (46)
  • .github/workflows/dockers-agent-faiss-image.yaml (1 hunks)
  • .github/workflows/dockers-agent-image.yaml (1 hunks)
  • .github/workflows/dockers-agent-ngt-image.yaml (1 hunks)
  • .github/workflows/dockers-agent-sidecar-image.yaml (1 hunks)
  • .github/workflows/dockers-benchmark-job-image.yaml (1 hunks)
  • .github/workflows/dockers-benchmark-operator-image.yaml (1 hunks)
  • .github/workflows/dockers-dev-container-image.yaml (1 hunks)
  • .github/workflows/dockers-discoverer-k8s-image.yaml (1 hunks)
  • .github/workflows/dockers-example-client-image.yaml (1 hunks)
  • .github/workflows/dockers-gateway-filter-image.yaml (1 hunks)
  • .github/workflows/dockers-gateway-lb-image.yaml (1 hunks)
  • .github/workflows/dockers-gateway-mirror-image.yaml (1 hunks)
  • .github/workflows/dockers-helm-operator-image.yaml (1 hunks)
  • .github/workflows/dockers-index-correction-image.yaml (1 hunks)
  • .github/workflows/dockers-index-creation-image.yaml (1 hunks)
  • .github/workflows/dockers-index-deletion-image.yaml (1 hunks)
  • .github/workflows/dockers-index-operator-image.yaml (1 hunks)
  • .github/workflows/dockers-index-save-image.yaml (1 hunks)
  • .github/workflows/dockers-manager-index-image.yaml (1 hunks)
  • .github/workflows/dockers-readreplica-rotate-image.yaml (1 hunks)
  • dockers/agent/core/agent/Dockerfile (1 hunks)
  • dockers/agent/core/faiss/Dockerfile (1 hunks)
  • dockers/agent/core/ngt/Dockerfile (1 hunks)
  • dockers/agent/sidecar/Dockerfile (1 hunks)
  • dockers/binfmt/Dockerfile (1 hunks)
  • dockers/buildbase/Dockerfile (1 hunks)
  • dockers/buildkit/Dockerfile (1 hunks)
  • dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
  • dockers/ci/base/Dockerfile (1 hunks)
  • dockers/dev/Dockerfile (1 hunks)
  • dockers/discoverer/k8s/Dockerfile (1 hunks)
  • dockers/example/client/Dockerfile (1 hunks)
  • dockers/gateway/filter/Dockerfile (1 hunks)
  • dockers/gateway/lb/Dockerfile (1 hunks)
  • dockers/gateway/mirror/Dockerfile (1 hunks)
  • dockers/index/job/correction/Dockerfile (1 hunks)
  • dockers/index/job/creation/Dockerfile (1 hunks)
  • dockers/index/job/deletion/Dockerfile (1 hunks)
  • dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
  • dockers/index/job/save/Dockerfile (1 hunks)
  • dockers/index/operator/Dockerfile (1 hunks)
  • dockers/manager/index/Dockerfile (1 hunks)
  • dockers/operator/helm/Dockerfile (1 hunks)
  • dockers/tools/benchmark/job/Dockerfile (1 hunks)
  • dockers/tools/benchmark/operator/Dockerfile (1 hunks)
  • dockers/tools/cli/loadtest/Dockerfile (1 hunks)
✅ Files skipped from review due to trivial changes (26)
  • dockers/binfmt/Dockerfile
  • dockers/tools/benchmark/operator/Dockerfile
  • dockers/gateway/filter/Dockerfile
  • dockers/buildkit/Dockerfile
  • dockers/buildkit/syft/scanner/Dockerfile
  • dockers/operator/helm/Dockerfile
  • dockers/index/job/deletion/Dockerfile
  • dockers/dev/Dockerfile
  • dockers/agent/core/agent/Dockerfile
  • dockers/manager/index/Dockerfile
  • dockers/example/client/Dockerfile
  • dockers/tools/cli/loadtest/Dockerfile
  • dockers/gateway/mirror/Dockerfile
  • dockers/index/job/save/Dockerfile
  • dockers/index/job/creation/Dockerfile
  • dockers/discoverer/k8s/Dockerfile
  • dockers/index/job/correction/Dockerfile
  • dockers/gateway/lb/Dockerfile
  • dockers/index/operator/Dockerfile
  • dockers/ci/base/Dockerfile
  • dockers/agent/sidecar/Dockerfile
  • dockers/agent/core/faiss/Dockerfile
  • dockers/buildbase/Dockerfile
  • dockers/tools/benchmark/job/Dockerfile
  • dockers/agent/core/ngt/Dockerfile
  • dockers/index/job/readreplica/rotate/Dockerfile
⏰ Context from checks skipped due to timeout of 90000ms (110)
  • GitHub Check: build / build (repeated ×110)
🔇 Additional comments (17)
.github/workflows/dockers-dev-container-image.yaml (1)

58-58: LGTM! Consider monitoring build times.

The addition of multi-architecture support is well-structured. However, building for multiple architectures may increase build duration.

Run this script to check current build times:

✅ Verification successful

Multi-arch builds show expected duration patterns

The workflow is well-optimized with event-based build configurations:

  • Full builds (27 minutes) run only for pull request events
  • Quick validation builds (0-1 minutes) run for pull request target events
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check workflow run times before and after the platform addition
gh run list --workflow=dockers-dev-container-image.yaml --limit 10 --json durationInMinutes,status,conclusion \
  | jq -r '.[] | select(.conclusion=="success") | "Duration: \(.durationInMinutes) minutes"'

Length of output: 445


Script:

#!/bin/bash
# Description: Check workflow run times using timestamp differences
gh run list --workflow=dockers-dev-container-image.yaml --limit 10 --json startedAt,updatedAt,conclusion \
  | jq -r '.[] | select(.conclusion=="success") | . as $run | 
    ((($run.updatedAt | fromdateiso8601) - ($run.startedAt | fromdateiso8601)) / 60) | 
    "Duration: \(. | floor) minutes"'

Length of output: 415


Script:

#!/bin/bash
# Description: Get detailed workflow run information
gh run list --workflow=dockers-dev-container-image.yaml --limit 10 \
  --json startedAt,conclusion,event,headBranch \
  | jq -r '.[] | select(.conclusion=="success") | "\(.startedAt): Branch=\(.headBranch) Event=\(.event)"'

Length of output: 753

.github/workflows/dockers-example-client-image.yaml (1)

68-68: LGTM! Multi-architecture support added correctly.

The platform specification for both amd64 and arm64 architectures is properly configured.

.github/workflows/dockers-agent-image.yaml (1)

74-74: LGTM! Verify Rust compilation for both architectures.

The multi-architecture support is correctly configured. Since this is a Rust-based core component, ensure that all Rust dependencies and build processes support both architectures.

Run this script to check Rust-related files for architecture-specific code:

✅ Verification successful

Multi-architecture support is properly configured

The Rust codebase is architecture-agnostic and uses standard system libraries (BLAS, LAPACK) that are well-supported on both amd64 and arm64. The build configuration uses platform-independent flags and common C++ features (C++20, OpenMP) that work reliably across architectures.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check Rust files for architecture-specific code or dependencies
rg -g 'rust/**/*.rs' -g 'rust/**/Cargo.toml' 'target_arch|cfg\(target_arch|platform-specific|target_os'

Length of output: 105


Script:

#!/bin/bash
# First, find all Rust-related files
echo "=== Searching for Rust files ==="
fd -e rs -e toml

# Then, specifically look at Cargo.toml contents
echo -e "\n=== Checking Cargo.toml contents ==="
fd -g 'Cargo.toml' --exec cat {}

Length of output: 9998


Script:

#!/bin/bash
# Check build scripts for architecture-specific configurations
echo "=== Checking build.rs files ==="
fd build.rs --exec cat {}

Length of output: 2674

.github/workflows/dockers-helm-operator-image.yaml (1)

68-68: LGTM! Multi-architecture support added consistently.

The platform configuration aligns with the project-wide standardization of multi-architecture support.

.github/workflows/dockers-index-operator-image.yaml (1)

244-244: LGTM! Verify Go package compatibility.

The multi-architecture support is correctly configured. Given the extensive list of Go package dependencies, ensure all packages support cross-compilation for both architectures.

Run this script to check Go dependencies for potential architecture-specific code:

✅ Verification successful

✅ Dependencies are cross-platform compatible

The codebase uses standard Go syscalls and has no mandatory CGO dependencies or architecture-specific code that would prevent cross-compilation. All platform-specific operations are properly abstracted through Go's standard library.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check Go files for architecture-specific code or CGO usage
rg -g '*.go' 'runtime\.GOARCH|build constraints|CGO|syscall\.' ./internal/ ./pkg/

Length of output: 5915

.github/workflows/dockers-readreplica-rotate-image.yaml (1)

242-242: LGTM! Addition of multi-platform support enhances deployment flexibility.

The addition of AMD64 and ARM64 platform support is a good practice that increases deployment flexibility across different architectures.

Please verify that the CI/CD infrastructure has the necessary QEMU emulation support for multi-platform builds:

✅ Verification successful

Multi-platform build infrastructure is properly configured

The reusable workflow has the necessary setup for multi-platform builds:

  • QEMU emulation support is configured via docker/setup-qemu-action
  • Docker Buildx is properly set up via docker/setup-buildx-action
  • Both are configured to support the specified platforms (linux/amd64,linux/arm64)
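The pairing described above typically appears in a reusable workflow as steps like the following (a minimal sketch using the upstream Docker actions, not a verbatim excerpt of `_docker-image.yaml`):

```yaml
steps:
  - uses: docker/setup-qemu-action@v3     # registers QEMU binfmt handlers for emulation
  - uses: docker/setup-buildx-action@v3   # provisions a buildx builder
  - uses: docker/build-push-action@v6
    with:
      platforms: linux/amd64,linux/arm64  # both targets built in one invocation
```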
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify QEMU setup for multi-platform builds
# Check if QEMU is registered for multi-platform builds
docker buildx ls

Length of output: 115


Script:

#!/bin/bash
# Check for QEMU and buildx setup in the workflow
cat .github/workflows/dockers-readreplica-rotate-image.yaml | grep -A 5 -B 5 "qemu\|buildx"

Length of output: 93


Script:

#!/bin/bash
# View the complete workflow file
cat .github/workflows/dockers-readreplica-rotate-image.yaml

Length of output: 8708


Script:

#!/bin/bash
# Check the reusable workflow for QEMU and buildx setup
cat .github/workflows/_docker-image.yaml

Length of output: 4523

.github/workflows/dockers-index-save-image.yaml (1)

246-246: LGTM! Consistent multi-platform support across workflows.

The change maintains consistency with other workflow files by adding the same platform support.

.github/workflows/dockers-index-deletion-image.yaml (1)

246-246: LGTM! Maintains consistent platform support.

The addition aligns with the project-wide effort to support multiple architectures.

.github/workflows/dockers-gateway-lb-image.yaml (1)

260-260: Excellent addition of multi-architecture support!

The addition of platforms: linux/amd64,linux/arm64 across all Docker image build workflows is a strategic enhancement that:

  1. Enables cross-platform compatibility by building images for both AMD64 and ARM64 architectures
  2. Improves accessibility for users running on different hardware platforms
  3. Aligns with modern containerization practices for broader platform support

This change is consistently applied across all Docker image workflows, ensuring uniform multi-architecture support throughout the project.

Let's verify the reusable workflow supports this parameter:

✅ Verification successful

Reusable workflow fully supports multi-architecture builds

The reusable workflow properly defines and handles the platforms parameter, correctly configuring buildx for multi-architecture support and passing the parameter to the build step.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the reusable workflow supports the platforms parameter

# Check if the reusable workflow file exists and contains the platforms parameter
rg -A 5 'platforms:' .github/workflows/_docker-image.yaml

Length of output: 942

.github/workflows/dockers-discoverer-k8s-image.yaml (1)

262-262: Same change as in gateway-lb workflow.

.github/workflows/dockers-gateway-filter-image.yaml (1)

262-262: Same change as in gateway-lb workflow.

.github/workflows/dockers-benchmark-operator-image.yaml (1)

258-258: Same change as in gateway-lb workflow.

.github/workflows/dockers-benchmark-job-image.yaml (1)

264-264: Same change as in gateway-lb workflow.

.github/workflows/dockers-gateway-mirror-image.yaml (1)

264-264: LGTM! Multi-platform support added correctly.

The addition of platforms: linux/amd64,linux/arm64 enables building Docker images for both AMD64 and ARM64 architectures, improving cross-platform compatibility.

Since this is a generated file, let's verify the generator script:

✅ Verification successful

✅ Platform configuration properly implemented in generator

The multi-platform support is correctly implemented in the generator script (hack/docker/gen/main.go) with proper constants and fallback logic, ensuring consistent platform configuration (linux/amd64,linux/arm64) across all 24 Docker image workflow files.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the platform parameter is consistently added across all generated workflow files
# and that it originates from the generator script

# Check the generator script for the platform parameter
echo "Checking generator script for platform parameter..."
rg -A 5 "platforms.*linux/amd64,linux/arm64" hack/docker/gen/main.go

# Verify consistency across all generated workflow files
echo "Verifying consistency across workflow files..."
rg -l "platforms.*linux/amd64,linux/arm64" .github/workflows/dockers-*-image.yaml | wc -l

Length of output: 1706


Script:

#!/bin/bash
# Search for any platform-related configuration in the generator
echo "Searching for platform-related code in generator..."
rg -i "platform" hack/docker/gen/main.go

# Check for template files that might contain the configuration
echo "Checking for template files..."
fd -t f "template" hack/docker/gen/

# If template files exist, search them for platform configuration
echo "Searching template files for platform configuration..."
rg -i "platform" hack/docker/gen/

Length of output: 1535

.github/workflows/dockers-manager-index-image.yaml (1)

266-266: LGTM! Verify compatibility with the reusable workflow.

The platform specification for multi-arch support is correctly formatted and properly placed.

Run this script to verify that the reusable workflow supports the platforms parameter:

✅ Verification successful

Platforms parameter is properly supported in the reusable workflow

The reusable workflow fully supports the platforms parameter through:

  • Input parameter definition with proper type and optionality
  • Buildx setup with multi-arch support
  • Correct usage in the build step
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the reusable workflow supports the platforms parameter
# Test: Look for platform-related configuration in the reusable workflow

rg -A 5 'platforms:' .github/workflows/_docker-image.yaml

Length of output: 942

.github/workflows/dockers-agent-faiss-image.yaml (1)

268-268: LGTM! Platform configuration is consistent.

The multi-arch support configuration matches the pattern used across other workflow files.

.github/workflows/dockers-agent-ngt-image.yaml (1)

272-272: LGTM! Platform configuration is consistent.

The multi-arch support configuration matches the pattern used across other workflow files.

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 16, 2025
Signed-off-by: vankichi <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (6)
tests/e2e/operation/stream.go (2)

272-279: Consider making the timeout duration configurable.

The hardcoded 3-second timeout might be too restrictive for larger datasets or high-latency environments. Consider making this timeout configurable through a parameter.

-	to := time.Second * 3
+	to := c.config.SearchTimeout

Line range hint 652-721: LGTM! Consider extracting common error handling pattern.

The mutex-protected error handling improvements are well-implemented and thread-safe. However, the same pattern is repeated across multiple methods.

Consider extracting this common error handling pattern into a helper function to reduce code duplication:

type streamErrorHandler struct {
    mu   sync.Mutex
    rerr error
}

func (h *streamErrorHandler) handleError(err error) {
    h.mu.Lock()
    defer h.mu.Unlock()
    h.rerr = errors.Join(h.rerr, err)
}

func (h *streamErrorHandler) getError() error {
    h.mu.Lock()
    defer h.mu.Unlock()
    return h.rerr
}

Also applies to: 766-836, 882-951, 994-1062, 1155-1230

tests/e2e/crud/crud_test.go (4)

1031-1036: Add test progress logging to searchFunc.

Consider adding more detailed logging to help diagnose test failures.

 searchFunc := func(ctx context.Context) error {
+    t.Logf("Executing search operation with %d vectors", len(ds.Test[searchFrom:searchFrom+searchNum]))
     return op.Search(t, ctx, operation.Dataset{
         Test:      ds.Test[searchFrom : searchFrom+searchNum],
         Neighbors: ds.Neighbors[searchFrom : searchFrom+searchNum],
     })
 }

1071-1071: Make the sleep duration configurable.

The hardcoded 10-second sleep duration should be configurable to accommodate different test environments.

-				time.Sleep(10 * time.Second)
+				time.Sleep(searchRetryInterval)

1055-1063: Improve error handling specificity.

The error handling could be more specific about the nature of deadline exceeded errors to help with debugging.

 if ierr != nil {
     st, ok := status.FromError(ierr)
     if ok && st.Code() == codes.DeadlineExceeded {
+        t.Logf("Deadline exceeded during search: %v", ierr)
         _, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred")
         mu.Lock()
         e = errors.Join(e, rerr)
         mu.Unlock()
+    } else if ok {
+        t.Logf("Non-deadline error occurred: code=%v, message=%v", st.Code(), st.Message())
     }
 }

1111-1112: Document the commented out Flush operation.

Add a comment explaining why the Flush operation is commented out and why RemoveByTimestamp is used instead.

-	// err = op.Flush(t, ctx)
+	// TODO: Flush operation is temporarily disabled due to potential issues during agent rollout.
+	// Using RemoveByTimestamp as an alternative cleanup method.
 	err = op.RemoveByTimestamp(t, ctx, time.Now().Add(-time.Hour).UnixNano())
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0289138 and 625c035.

📒 Files selected for processing (4)
  • .github/helm/values/values-rollout-agent.yaml (1 hunks)
  • pkg/agent/core/ngt/service/ngt.go (1 hunks)
  • tests/e2e/crud/crud_test.go (6 hunks)
  • tests/e2e/operation/stream.go (17 hunks)
✅ Files skipped from review due to trivial changes (1)
  • pkg/agent/core/ngt/service/ngt.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/helm/values/values-rollout-agent.yaml
⏰ Context from checks skipped due to timeout of 90000ms (180)
  • GitHub Check: build / build (repeated ×180)
🔇 Additional comments (2)
tests/e2e/crud/crud_test.go (2)

59-69: LGTM! Well-structured configuration variables.

The new configuration variables are well-named and appropriately scoped for their intended use.


88-88: LGTM! Proper flag initialization and error handling.

The flags are well-configured with sensible defaults and proper error handling for duration parsing.

Also applies to: 104-144


codecov bot commented Jan 22, 2025

Codecov Report

Attention: Patch coverage is 25.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 23.93%. Comparing base (6065fd9) to head (625c035).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
hack/docker/gen/main.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2799      +/-   ##
==========================================
- Coverage   23.93%   23.93%   -0.01%     
==========================================
  Files         546      546              
  Lines       54555    54558       +3     
==========================================
  Hits        13058    13058              
- Misses      40712    40713       +1     
- Partials      785      787       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.
