-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUGFIX] resolve agent GetGraphStatistics API double-free error problem #2733
base: main
Are you sure you want to change the base?
Conversation
📝 Walkthrough📝 WalkthroughWalkthroughThe pull request introduces a series of modifications that include the addition and renaming of various files, particularly in the Docker and workflow configurations. New files for job deletion and Docker configurations have been added, while some existing files have been renamed or removed. Additionally, several dependency version updates have been made across various modules and templates, reflecting a routine maintenance effort to keep the project up to date. Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
[CHATOPS:HELP] ChatOps commands.
|
54ce8b2
to
a9ce087
Compare
Deploying vald with Cloudflare Pages
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2733 +/- ##
==========================================
- Coverage 23.95% 23.24% -0.71%
==========================================
Files 545 539 -6
Lines 54322 53298 -1024
==========================================
- Hits 13011 12390 -621
+ Misses 40537 40170 -367
+ Partials 774 738 -36 ☔ View full report in Codecov by Sentry. |
a9ce087
to
50cfcd3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (1)
pkg/agent/core/ngt/service/ngt.go (1)
Line range hint
1456-1477
: Avoid redundant error logging inloadStatistics
In the
loadStatistics
function, the error is logged usinglog.Errorf
and then returned. This can lead to redundant logging if the caller also logs the error. To prevent duplicate log entries and adhere to logging best practices, consider removing the log statement here and allowing the caller to handle the error logging.Apply this diff to adjust the logging:
func (n *ngt) loadStatistics(ctx context.Context) error { if n.IsStatisticsEnabled() { log.Info("loading index statistics to cache") stats, err := n.core.GetGraphStatistics(ctx, core.AdditionalStatistics) if err != nil { - log.Errorf("failed to load index statistics to cache: %v", err) return err } n.statisticsCache.Store(&payload.Info_Index_Statistics{ Valid: stats.Valid, // ... }) } return nil }
🛑 Comments failed to post (3)
internal/core/algorithm/ngt/ngt.go (1)
1149-1182:
⚠️ Potential issuePotential infinite recursion in
GetGraphStatistics
The recursive call to
n.GetGraphStatistics
at line 1178 may lead to infinite recursion ifshared
remainstrue
and the error is notErrNGTIndexStatisticsNotReady
. This can occur if the error persists across retries. Consider revising the error handling logic to prevent infinite recursion, perhaps by limiting the number of retries or handling specific errors differently.pkg/agent/core/ngt/service/ngt.go (2)
1309-1309:
⚠️ Potential issueHandle potential error from
loadStatistics
inRegenerateIndexes
The
RegenerateIndexes
function now returns the result ofn.loadStatistics(ctx)
, which introduces new error propagation. Previously,RegenerateIndexes
might not have returned an error at this point. Ensure that all callers ofRegenerateIndexes
are prepared to handle this potential error, or consider handling the error within the function to maintain existing behavior.
1453-1453:
⚠️ Potential issueHandle potential error from
loadStatistics
inCreateIndex
Similarly,
CreateIndex
now returns the result ofn.loadStatistics(ctx)
. This change may affect the function's contract by introducing an error return where it might not have been expected before. Review and update any code that callsCreateIndex
to handle the potential error appropriately, or handle the error withinCreateIndex
if that aligns better with the intended design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (7)
dockers/index/job/deletion/Dockerfile (1)
85-85
: Ensure proper permissions for the config file.The configuration file is being copied after the binary. Consider:
- Setting appropriate read permissions for the nonroot user
- Moving this COPY instruction before the USER instruction to ensure proper ownership
-COPY cmd/index/job/deletion/sample.yaml /etc/server/config.yaml -USER nonroot:nonroot +COPY --chown=nonroot:nonroot cmd/index/job/deletion/sample.yaml /etc/server/config.yaml +USER nonroot:nonrootinternal/core/algorithm/ngt/ngt_test.go (1)
Line range hint
733-784
: Consider adding test cases for concurrent accessWhile the mutex field has been added, there are no explicit test cases verifying concurrent access patterns and the effectiveness of the mutex in preventing the double-free issue.
Consider adding test cases like:
+func Test_ngt_ConcurrentAccess(t *testing.T) { + // Setup test NGT instance + n, err := New(...) + if err != nil { + t.Fatal(err) + } + defer n.Close() + + // Run concurrent operations + var wg sync.WaitGroup + for i := 0; i < 10; i++ { + wg.Add(1) + go func() { + defer wg.Done() + // Perform operations that could trigger double-free + // e.g., GetGraphStatistics, Insert, Remove etc. + }() + } + wg.Wait() +}pkg/agent/core/ngt/service/ngt.go (5)
1309-1309
: Consider handling errors fromloadStatistics
without altering function behaviorAdding
return n.loadStatistics(ctx)
changes the error propagation ofRegenerateIndexes
, which might lead to unintended side effects if callers are not expecting additional errors. Consider logging the error instead of returning it to maintain existing behavior.Apply this diff to handle the error internally:
// regenerate indexes logic ... - return n.loadStatistics(ctx) + if err := n.loadStatistics(ctx); err != nil { + log.Errorf("failed to load statistics: %v", err) + } + return nil }
1453-1453
: Ensure consistent error handling inCreateIndex
Similarly, returning
n.loadStatistics(ctx)
inCreateIndex
may alter the function's error behavior. Verify that this change does not impact callers unexpectedly. Consider handling the error internally to maintain consistency.Apply this diff to adjust error handling:
atomic.AddUint64(&n.nocie, 1) n.nopvq.Add(vqProcessedCnt) log.Debug("create graph and tree phase finished") log.Info("create index operation finished") - return n.loadStatistics(ctx) + if err := n.loadStatistics(ctx); err != nil { + log.Errorf("failed to load statistics: %v", err) + } + return nil }
Line range hint
1456-1480
: Add early exit inloadStatistics
when statistics are disabledCurrently,
loadStatistics
assumes that statistics are enabled. To avoid unnecessary processing, add a check at the beginning of the function to exit early when statistics gathering is disabled.Apply this diff:
func (n *ngt) loadStatistics(ctx context.Context) error { + if !n.IsStatisticsEnabled() { + return nil + } log.Info("loading index statistics to cache") stats, err := n.core.GetGraphStatistics(ctx, core.AdditionalStatistics) if err != nil { log.Errorf("failed to load index statistics to cache: %v", err) return err } n.statisticsCache.Store(&payload.Info_Index_Statistics{ Valid: stats.Valid, // ... other fields ... }) return nil }
Line range hint
1459-1477
: Handle potential nilstats
fromGetGraphStatistics
Ensure that
stats
is not nil before using it to prevent possible nil pointer dereference. Add a nil check after callingGetGraphStatistics
.Apply this diff:
stats, err := n.core.GetGraphStatistics(ctx, core.AdditionalStatistics) if err != nil { log.Errorf("failed to load index statistics to cache: %v", err) return err } + if stats == nil { + log.Warn("received nil statistics from GetGraphStatistics") + return errors.New("received nil statistics") + } n.statisticsCache.Store(&payload.Info_Index_Statistics{ Valid: stats.Valid, // ... other fields ... })
1456-1458
: Provide documentation forloadStatistics
functionThe
loadStatistics
function lacks comments explaining its purpose and usage. Adding a function-level comment would improve code readability and maintainability.Example:
+// loadStatistics loads index statistics into the cache if statistics gathering is enabled. func (n *ngt) loadStatistics(ctx context.Context) error { if !n.IsStatisticsEnabled() { return nil
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (3)
example/client/go.sum
is excluded by!**/*.sum
go.sum
is excluded by!**/*.sum
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (51)
.gitfiles
(9 hunks).github/ISSUE_TEMPLATE/bug_report.md
(1 hunks).github/ISSUE_TEMPLATE/security_issue_report.md
(1 hunks).github/PULL_REQUEST_TEMPLATE.md
(1 hunks)dockers/agent/core/agent/Dockerfile
(1 hunks)dockers/agent/core/faiss/Dockerfile
(1 hunks)dockers/agent/core/ngt/Dockerfile
(1 hunks)dockers/agent/sidecar/Dockerfile
(1 hunks)dockers/binfmt/Dockerfile
(1 hunks)dockers/buildbase/Dockerfile
(1 hunks)dockers/buildkit/Dockerfile
(1 hunks)dockers/buildkit/syft/scanner/Dockerfile
(1 hunks)dockers/ci/base/Dockerfile
(1 hunks)dockers/dev/Dockerfile
(1 hunks)dockers/discoverer/k8s/Dockerfile
(1 hunks)dockers/example/client/Dockerfile
(1 hunks)dockers/gateway/filter/Dockerfile
(1 hunks)dockers/gateway/lb/Dockerfile
(1 hunks)dockers/gateway/mirror/Dockerfile
(1 hunks)dockers/index/job/correction/Dockerfile
(1 hunks)dockers/index/job/creation/Dockerfile
(1 hunks)dockers/index/job/deletion/Dockerfile
(1 hunks)dockers/index/job/readreplica/rotate/Dockerfile
(1 hunks)dockers/index/job/save/Dockerfile
(1 hunks)dockers/index/operator/Dockerfile
(1 hunks)dockers/manager/index/Dockerfile
(1 hunks)dockers/operator/helm/Dockerfile
(1 hunks)dockers/tools/benchmark/job/Dockerfile
(1 hunks)dockers/tools/benchmark/operator/Dockerfile
(1 hunks)dockers/tools/cli/loadtest/Dockerfile
(1 hunks)example/client/go.mod
(2 hunks)go.mod
(21 hunks)internal/core/algorithm/ngt/ngt.go
(6 hunks)internal/core/algorithm/ngt/ngt_test.go
(2 hunks)pkg/agent/core/ngt/service/ngt.go
(2 hunks)rust/rust-toolchain
(1 hunks)versions/BUF_VERSION
(1 hunks)versions/K3S_VERSION
(1 hunks)versions/KUBECTL_VERSION
(1 hunks)versions/NGT_VERSION
(1 hunks)versions/OPERATOR_SDK_VERSION
(1 hunks)versions/PROMETHEUS_STACK_VERSION
(1 hunks)versions/PROTOBUF_VERSION
(1 hunks)versions/RUST_VERSION
(1 hunks)versions/TELEPRESENCE_VERSION
(1 hunks)versions/USEARCH_VERSION
(1 hunks)versions/actions/ACTIONS_SETUP_GO
(1 hunks)versions/actions/ACTIONS_SETUP_NODE
(1 hunks)versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
(1 hunks)versions/actions/GITHUB_ISSUE_METRICS
(1 hunks)versions/actions/SOFTPROPS_ACTION_GH_RELEASE
(1 hunks)
✅ Files skipped from review due to trivial changes (44)
- .github/ISSUE_TEMPLATE/bug_report.md
- .github/ISSUE_TEMPLATE/security_issue_report.md
- .github/PULL_REQUEST_TEMPLATE.md
- dockers/agent/core/agent/Dockerfile
- dockers/agent/core/faiss/Dockerfile
- dockers/agent/core/ngt/Dockerfile
- dockers/agent/sidecar/Dockerfile
- dockers/binfmt/Dockerfile
- dockers/buildbase/Dockerfile
- dockers/buildkit/Dockerfile
- dockers/buildkit/syft/scanner/Dockerfile
- dockers/ci/base/Dockerfile
- dockers/discoverer/k8s/Dockerfile
- dockers/example/client/Dockerfile
- dockers/gateway/filter/Dockerfile
- dockers/gateway/lb/Dockerfile
- dockers/gateway/mirror/Dockerfile
- dockers/index/job/correction/Dockerfile
- dockers/index/job/creation/Dockerfile
- dockers/index/job/readreplica/rotate/Dockerfile
- dockers/index/job/save/Dockerfile
- dockers/index/operator/Dockerfile
- dockers/manager/index/Dockerfile
- dockers/operator/helm/Dockerfile
- dockers/tools/benchmark/job/Dockerfile
- dockers/tools/benchmark/operator/Dockerfile
- dockers/tools/cli/loadtest/Dockerfile
- example/client/go.mod
- rust/rust-toolchain
- versions/BUF_VERSION
- versions/K3S_VERSION
- versions/KUBECTL_VERSION
- versions/NGT_VERSION
- versions/OPERATOR_SDK_VERSION
- versions/PROMETHEUS_STACK_VERSION
- versions/PROTOBUF_VERSION
- versions/RUST_VERSION
- versions/TELEPRESENCE_VERSION
- versions/USEARCH_VERSION
- versions/actions/ACTIONS_SETUP_GO
- versions/actions/ACTIONS_SETUP_NODE
- versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
- versions/actions/GITHUB_ISSUE_METRICS
- versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🔇 Additional comments (12)
go.mod (3)
279-279
: LGTM: Synchronized OpenTelemetry version updates.
The OpenTelemetry packages have been consistently updated to version 1.31.0, which is a good practice to maintain compatibility across the components.
Also applies to: 282-282, 283-283, 284-284, 285-285, 286-286, 287-287, 288-288
48-48
: LGTM: Coordinated AWS SDK version updates.
The AWS SDK packages have been updated consistently across both v1 and v2 versions, maintaining compatibility between different AWS services.
Also applies to: 50-50, 51-51, 52-52, 53-53, 54-54, 55-55, 58-58, 59-59, 60-60, 61-61, 62-62, 63-63, 64-64, 65-65, 66-66, 67-67, 68-68
7-7
: Verify compatibility with updated Google Cloud dependencies.
Multiple Google Cloud packages have been updated. While most changes are minor version bumps, some are more significant. Please ensure that these updates are compatible with the existing codebase, particularly:
cloud.google.com/go/bigquery
v1.64.0cloud.google.com/go/compute
v1.28.2cloud.google.com/go/monitoring
v1.21.2
#!/bin/bash
# Description: Check for any Google Cloud API usage that might be affected by the version updates
# Search for bigquery usage
echo "Checking BigQuery API usage..."
rg -l 'cloud\.google\.com/go/bigquery'
# Search for compute usage
echo "Checking Compute API usage..."
rg -l 'cloud\.google\.com/go/compute'
# Search for monitoring usage
echo "Checking Monitoring API usage..."
rg -l 'cloud\.google\.com/go/monitoring'
Also applies to: 8-8, 9-9, 11-11, 12-12, 13-13, 14-14, 15-15, 16-16, 17-17
.gitfiles (1)
1800-1803
: Verify index deletion implementation completeness
The index deletion feature follows the project's standard job pattern with config, service, and usecase components. Please ensure all required components are implemented:
- Configuration:
- pkg/index/job/deletion/config/config.go
- Service layer:
- pkg/index/job/deletion/service/{deleter.go,options.go}
- Use case layer:
- pkg/index/job/deletion/usecase/deletion.go
Let's verify the implementation matches other similar jobs:
#!/bin/bash
# Description: Compare index deletion implementation with other jobs
# to ensure consistent structure and completeness
echo "Comparing file structure with other jobs..."
for job in creation save correction; do
echo "=== Comparing with $job job ==="
diff -q \
<(find pkg/index/job/$job -type f | sort) \
<(find pkg/index/job/deletion -type f | sort)
done
echo "Checking for common interfaces and types..."
rg -l "interface|type|struct" pkg/index/job/deletion/
internal/core/algorithm/ngt/ngt_test.go (2)
733-734
: LGTM: New mutex field addition for synchronization
The addition of smu *sync.Mutex
field in the test struct aligns with the PR's objective of fixing the double-free error by providing additional synchronization control.
783-784
: LGTM: Consistent mutex field initialization
The mutex field is consistently initialized in the test case, maintaining proper synchronization setup.
internal/core/algorithm/ngt/ngt.go (6)
39-39
: Import 'singleflight' package for request deduplication
The addition of the singleflight
package allows for coalescing concurrent calls to GetGraphStatistics
, improving efficiency and reducing redundant computations.
89-89
: Update 'GetGraphStatistics' method signature to include context
Adding ctx context.Context
to the method signature enables better control over cancellation and timeouts, enhancing the method's flexibility and robustness.
118-121
: Introduce synchronization fields in 'ngt' struct
group singleflight.Group[*GraphStatistics]
is added to prevent duplicate computations of graph statistics.smu *sync.Mutex
is introduced to ensure thread-safe access when retrieving statistics.
485-485
: Initialize 'smu' mutex to prevent nil pointer dereference
Properly initializing the smu
mutex ensures that it is ready for use, preventing potential runtime panics due to nil pointer dereferences.
1086-1088
: Ensure 'cstats' is not nil before processing
Adding a nil check for cstats
in fromCGraphStatistics
prevents potential nil pointer dereferences and enhances the robustness of the function.
1170-1173
: Handle potential nil return from 'fromCGraphStatistics'
If fromCGraphStatistics
returns nil, it indicates that the statistics are not ready. Ensure this condition is appropriately handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (5)
go.mod (2)
Line range hint
1-3
: Invalid Go version specifiedThe specified Go version
1.23.2
is invalid. The latest stable version of Go is 1.22.x.Update the Go version to a valid stable release:
-go 1.23.2 +go 1.22.1
Line range hint
350-422
: Resolve placeholder dependency versionsSeveral dependencies are using placeholder versions (00010101000000-000000000000), which could lead to build issues:
- github.com/akrylysov/pogreb
- github.com/go-redis/redis/v8
- github.com/gocraft/dbr/v2
- github.com/gorilla/mux
Please specify explicit versions for these dependencies to ensure reproducible builds.
internal/core/algorithm/ngt/ngt.go (2)
118-121
: Consider simplifying synchronization by removingsmu
mutexSince
singleflight.Group
ensures that only one execution with the same key is in-flight, the use ofsmu *sync.Mutex
in conjunction might be redundant. Consider removingsmu
to simplify the synchronization mechanism.
1149-1182
: Utilize context within the anonymous function inGetGraphStatistics
The anonymous function passed to
n.group.Do
accepts acontext.Context
parameter but does not use it. Consider propagating the context to function calls within the anonymous function to handle cancellation and timeout more effectively.pkg/agent/core/ngt/service/ngt.go (1)
Line range hint
1456-1475
: Add unit tests for the newloadStatistics
functionTo ensure the correctness and reliability of the
loadStatistics
function, consider adding unit tests that cover various scenarios, including:
- Statistics enabled and disabled.
- Successful loading of statistics.
- Handling errors returned by
n.core.GetGraphStatistics
.This will help in maintaining code quality and catching potential issues early.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (3)
example/client/go.sum
is excluded by!**/*.sum
go.sum
is excluded by!**/*.sum
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (51)
.gitfiles
(9 hunks).github/ISSUE_TEMPLATE/bug_report.md
(1 hunks).github/ISSUE_TEMPLATE/security_issue_report.md
(1 hunks).github/PULL_REQUEST_TEMPLATE.md
(1 hunks)dockers/agent/core/agent/Dockerfile
(1 hunks)dockers/agent/core/faiss/Dockerfile
(1 hunks)dockers/agent/core/ngt/Dockerfile
(1 hunks)dockers/agent/sidecar/Dockerfile
(1 hunks)dockers/binfmt/Dockerfile
(1 hunks)dockers/buildbase/Dockerfile
(1 hunks)dockers/buildkit/Dockerfile
(1 hunks)dockers/buildkit/syft/scanner/Dockerfile
(1 hunks)dockers/ci/base/Dockerfile
(1 hunks)dockers/dev/Dockerfile
(1 hunks)dockers/discoverer/k8s/Dockerfile
(1 hunks)dockers/example/client/Dockerfile
(1 hunks)dockers/gateway/filter/Dockerfile
(1 hunks)dockers/gateway/lb/Dockerfile
(1 hunks)dockers/gateway/mirror/Dockerfile
(1 hunks)dockers/index/job/correction/Dockerfile
(1 hunks)dockers/index/job/creation/Dockerfile
(1 hunks)dockers/index/job/deletion/Dockerfile
(1 hunks)dockers/index/job/readreplica/rotate/Dockerfile
(1 hunks)dockers/index/job/save/Dockerfile
(1 hunks)dockers/index/operator/Dockerfile
(1 hunks)dockers/manager/index/Dockerfile
(1 hunks)dockers/operator/helm/Dockerfile
(1 hunks)dockers/tools/benchmark/job/Dockerfile
(1 hunks)dockers/tools/benchmark/operator/Dockerfile
(1 hunks)dockers/tools/cli/loadtest/Dockerfile
(1 hunks)example/client/go.mod
(2 hunks)go.mod
(21 hunks)internal/core/algorithm/ngt/ngt.go
(6 hunks)internal/core/algorithm/ngt/ngt_test.go
(2 hunks)pkg/agent/core/ngt/service/ngt.go
(2 hunks)rust/rust-toolchain
(1 hunks)versions/BUF_VERSION
(1 hunks)versions/K3S_VERSION
(1 hunks)versions/KUBECTL_VERSION
(1 hunks)versions/NGT_VERSION
(1 hunks)versions/OPERATOR_SDK_VERSION
(1 hunks)versions/PROMETHEUS_STACK_VERSION
(1 hunks)versions/PROTOBUF_VERSION
(1 hunks)versions/RUST_VERSION
(1 hunks)versions/TELEPRESENCE_VERSION
(1 hunks)versions/USEARCH_VERSION
(1 hunks)versions/actions/ACTIONS_SETUP_GO
(1 hunks)versions/actions/ACTIONS_SETUP_NODE
(1 hunks)versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
(1 hunks)versions/actions/GITHUB_ISSUE_METRICS
(1 hunks)versions/actions/SOFTPROPS_ACTION_GH_RELEASE
(1 hunks)
✅ Files skipped from review due to trivial changes (43)
- .github/ISSUE_TEMPLATE/bug_report.md
- .github/ISSUE_TEMPLATE/security_issue_report.md
- .github/PULL_REQUEST_TEMPLATE.md
- dockers/agent/core/agent/Dockerfile
- dockers/agent/core/faiss/Dockerfile
- dockers/agent/core/ngt/Dockerfile
- dockers/agent/sidecar/Dockerfile
- dockers/binfmt/Dockerfile
- dockers/buildbase/Dockerfile
- dockers/buildkit/Dockerfile
- dockers/buildkit/syft/scanner/Dockerfile
- dockers/ci/base/Dockerfile
- dockers/discoverer/k8s/Dockerfile
- dockers/example/client/Dockerfile
- dockers/gateway/filter/Dockerfile
- dockers/gateway/lb/Dockerfile
- dockers/gateway/mirror/Dockerfile
- dockers/index/job/correction/Dockerfile
- dockers/index/job/creation/Dockerfile
- dockers/index/job/readreplica/rotate/Dockerfile
- dockers/index/job/save/Dockerfile
- dockers/index/operator/Dockerfile
- dockers/manager/index/Dockerfile
- dockers/operator/helm/Dockerfile
- dockers/tools/benchmark/job/Dockerfile
- dockers/tools/benchmark/operator/Dockerfile
- dockers/tools/cli/loadtest/Dockerfile
- rust/rust-toolchain
- versions/BUF_VERSION
- versions/K3S_VERSION
- versions/KUBECTL_VERSION
- versions/NGT_VERSION
- versions/OPERATOR_SDK_VERSION
- versions/PROMETHEUS_STACK_VERSION
- versions/PROTOBUF_VERSION
- versions/RUST_VERSION
- versions/TELEPRESENCE_VERSION
- versions/USEARCH_VERSION
- versions/actions/ACTIONS_SETUP_GO
- versions/actions/ACTIONS_SETUP_NODE
- versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
- versions/actions/GITHUB_ISSUE_METRICS
- versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🔇 Additional comments (19)
example/client/go.mod (3)
32-32
: LGTM! Indirect dependency updates are consistent.
The updates to indirect dependencies align with the direct dependency changes and follow proper versioning patterns.
Also applies to: 40-40
10-10
: Verify dependency updates impact.
The updates to sftp
and genproto
packages appear to be routine version bumps. While these changes seem safe, please ensure they don't introduce any regressions, especially since they don't appear directly related to the double-free error fix mentioned in the PR objective.
Also applies to: 14-16
26-26
: Verify that this version includes the double-free fix.
The update to vald-client-go v1.7.14
should contain the fix for the GetGraphStatistics API double-free error.
✅ Verification successful
Let me gather more information about the changes in v1.7.14 by looking at the commit history and changelog.
Based on the output, I can now generate the final response since I have found clear evidence that v1.7.14 includes the fix for the double-free error in GetGraphStatistics API.
Confirmed: v1.7.14 includes the double-free fix
The commit 50cfcd3 in v1.7.14 specifically addresses the GetGraphStatistics API double-free error by:
- Adding mutex synchronization to prevent concurrent access
- Implementing singleflight pattern to deduplicate concurrent requests
- Improving error handling and memory management in the statistics collection
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check the changelog or commit history for mentions of the double-free fix
gh release view v1.7.14 --repo vdaas/vald-client-go 2>/dev/null || echo "Failed to fetch vald-client-go release notes"
# Look for related changes in the GetGraphStatistics implementation
rg -A 5 "GetGraphStatistics"
Length of output: 2154
Script:
#!/bin/bash
# Check the changelog or release notes for v1.7.14
gh api repos/vdaas/vald-client-go/commits?per_page=50 | jq -r '.[] | select(.commit.message | contains("v1.7.14")) | .commit.message'
# Look for any recent changes or fixes in GetGraphStatistics implementation
git log -p --grep="GetGraphStatistics" --since="2024-01-01"
# Check for any memory-related fixes in recent commits
git log --grep="double-free\|memory leak\|memory corruption" --since="2024-01-01"
Length of output: 158701
dockers/index/job/deletion/Dockerfile (2)
85-85
: Verify the existence of the sample configuration file.
The new COPY instruction assumes the existence of cmd/index/job/deletion/sample.yaml
. Let's verify this file exists in the repository.
✅ Verification successful
Sample configuration file exists and appears valid
The file cmd/index/job/deletion/sample.yaml
exists and contains a comprehensive configuration for the index deletion job, including:
- Server configurations (gRPC and health check)
- Deleter settings with agent connection details
- Observability configurations
- TLS settings
- Resource limits and timeouts
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if the sample.yaml exists and examine its contents
# Expected: File should exist and contain relevant configuration
# Check if the file exists
fd -t f "sample.yaml" cmd/index/job/deletion/
# If found, examine its contents
if [ $? -eq 0 ]; then
echo "File found. Examining contents..."
cat cmd/index/job/deletion/sample.yaml
fi
Length of output: 6303
85-85
: Please clarify the relationship between this configuration change and the double-free fix.
The PR description mentions fixing a double-free error in GetGraphStatistics API, but it's not immediately clear how adding this default configuration file helps resolve that issue. Could you please:
- Explain if this configuration file contains settings that prevent the double-free error
- Document any new configuration parameters that were added to address the issue
go.mod (3)
278-288
: OpenTelemetry dependency updates are well-aligned
All OpenTelemetry components have been consistently updated to v1.31.0, which is a good practice for maintaining compatibility between components.
Line range hint 332-344
: Core infrastructure updates are consistent
The Kubernetes ecosystem dependencies are consistently updated to v0.31.1, and gRPC is updated to v1.67.1. These versions are stable and compatible.
7-17
: Cloud provider dependency updates look good
The updates to Google Cloud libraries are minor version bumps that maintain backward compatibility. The changes include:
- bigquery: v1.63.1 → v1.64.0
- compute: v1.28.1 → v1.28.2
- datastore: v1.19.0 → v1.20.0
✅ Verification successful
Based on the provided output, I can see that the codebase is using Google Cloud libraries, and the review comment's concern about version compatibility can be verified. The go.mod
file shows the updated versions, and the rg
search confirms the actual usage of these libraries in the codebase.
Cloud provider dependency updates are safe to proceed
The version updates in the go.mod
file are indeed minor version bumps that maintain backward compatibility:
- All updates follow semantic versioning principles
- The changes are incremental (e.g., v1.28.1 → v1.28.2)
- The actual usage in the codebase is limited to stable APIs (storage, monitoring, auth)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify that all Google Cloud library versions are compatible
rg -A 1 'cloud\.google\.com/go/[^/]+' | grep -v '=>' | sort | uniq
Length of output: 17654
.gitfiles (2)
1800-1803
: Verify the implementation of the new deletion functionality.
The new deletion functionality appears to follow the project's layered architecture pattern. However, given this is a significant new feature:
-
Ensure unit tests are implemented for:
pkg/index/job/deletion/config/config.go
pkg/index/job/deletion/service/deleter.go
pkg/index/job/deletion/service/options.go
pkg/index/job/deletion/usecase/deletion.go
-
Verify integration with existing components
#!/bin/bash
# Description: Check for corresponding test files and documentation
# Test: Search for test files and documentation. Expect: Test files to exist.
echo "Checking for test files..."
fd -e go -e md "deletion.*_test\.go$" "pkg/index/job/deletion/"
echo "Checking for documentation..."
fd -e md "deletion" "docs/"
128-128
: Verify the impact of renaming the workflow file.
The workflow file has been renamed from update-actions.yaml
to update-deps.yaml
. This change could impact:
- Existing CI/CD pipelines
- GitHub Actions that reference this workflow
- Documentation that references the old filename
✅ Verification successful
No impact found from workflow file renaming
The comprehensive search across the codebase shows no references to the old workflow filename update-actions.yaml
in:
- Direct file references
- GitHub Actions workflow references
- Documentation
- Git history
This suggests this is either a new workflow file or the renaming was done cleanly with no remaining references to update. The change appears to be safe.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for references to the old workflow file name
# Test: Search for references to the old workflow file. Expect: No references to remain.
echo "Checking for references to old workflow file..."
rg -l "update-actions\.ya?ml"
echo "Checking for workflow references in GitHub Actions..."
rg -l "workflow: update-actions"
Length of output: 316
Script:
#!/bin/bash
# Let's try a more comprehensive search for any references to the old workflow name
# Search for both yaml and yml extensions, and various ways it might be referenced
# Search for direct file references
echo "Searching for direct references to update-actions.yaml/yml..."
rg -i "update-actions\.(yaml|yml)"
# Search for workflow references in GitHub Actions
echo "Searching for workflow references in .github/workflows..."
find .github/workflows -type f -exec grep -l "uses: .*/update-actions@" {} \;
# Search for workflow references in markdown documentation
echo "Searching in documentation..."
fd -e md . -x grep -l "update-actions" {} \;
# Check if the old file exists in git history
echo "Checking git history for the old filename..."
git log --all --full-history -- "*update-actions.yml" "*update-actions.yaml"
Length of output: 672
internal/core/algorithm/ngt/ngt_test.go (2)
733-734
: LGTM: Test case updated to include new mutex field.
The test case correctly includes the new smu
field of type *sync.Mutex
in the expected ngt
struct, which aligns with the synchronization improvements for fixing the double-free error.
783-784
: LGTM: Test case for backup loading updated consistently.
The test case for loading backup data is properly updated to include the new smu
field, maintaining consistency with other test cases and ensuring the mutex is correctly initialized during backup restoration.
internal/core/algorithm/ngt/ngt.go (5)
39-39
: Approved import of singleflight
package
The addition of the singleflight
package is appropriate for deduplicating concurrent calls in GetGraphStatistics
.
89-89
: Addition of context parameter to GetGraphStatistics
Introducing ctx context.Context
enhances cancellation and timeout handling for the GetGraphStatistics
method.
485-485
: Approved initialization of smu
mutex
The initialization of n.smu
ensures that the mutex is properly set up before use.
1086-1088
: Nil check for cstats
enhances robustness
Adding a nil check for cstats
in fromCGraphStatistics
prevents potential nil pointer dereferences.
1090-1093
: Expanded GraphStatistics
with additional fields
Including C1Indegree
, C5Indegree
, C95Outdegree
, and C99Outdegree
enriches the statistical data returned, providing deeper insights.
pkg/agent/core/ngt/service/ngt.go (2)
1309-1309
: Confirm error propagation when returning n.loadStatistics(ctx)
By returning n.loadStatistics(ctx)
, any error from loadStatistics
will now be propagated to the caller of RegenerateIndexes
. Ensure that this change in error propagation is intentional and that upstream code is prepared to handle these errors appropriately.
1453-1453
: Confirm error propagation when returning n.loadStatistics(ctx)
By returning n.loadStatistics(ctx)
, any error from loadStatistics
will now be propagated to the caller of CreateIndex
. Ensure that this change in error propagation is intentional and that upstream code is prepared to handle these errors appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (4)
go.mod (1)
Line range hint
357-357
: Replace placeholder versions with specific versionsSeveral dependencies are using placeholder versions (
00010101000000-000000000000
). This practice can lead to build reproducibility issues and should be avoided in production code.Affected packages include:
- github.com/akrylysov/pogreb
- github.com/go-redis/redis/v8
- github.com/gocraft/dbr/v2
- github.com/gorilla/mux
- github.com/grafana/pyroscope-go/godeltaprof
Please specify exact versions for these dependencies to ensure build reproducibility.
Also applies to: 358-358, 359-359, 360-360, 361-361, 362-362, 363-363, 364-364, 365-365, 366-366, 367-367, 368-368, 369-369, 370-370
internal/core/algorithm/ngt/ngt.go (3)
89-89
: Addition ofcontext.Context
toGetGraphStatistics
methodIncluding
context.Context
enhances cancellation and timeout handling, aligning with best practices for resource management.Consider adding unit tests to verify that context cancellations are handled appropriately.
1090-1093
: Extended statistics inGraphStatistics
The addition of
C1Indegree
,C5Indegree
,C95Outdegree
, andC99Outdegree
enriches the statistical data provided to users.Consider updating the documentation and adding unit tests to cover the new statistical fields.
1153-1175
: Utilize the context within the closureThe closure passed to
n.group.Do
accepts acontext.Context
but does not use it. Propagate the context to any functions or API calls that can handle cancellations or timeouts.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (3)
example/client/go.sum
is excluded by!**/*.sum
go.sum
is excluded by!**/*.sum
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (51)
.gitfiles
(9 hunks).github/ISSUE_TEMPLATE/bug_report.md
(1 hunks).github/ISSUE_TEMPLATE/security_issue_report.md
(1 hunks).github/PULL_REQUEST_TEMPLATE.md
(1 hunks)dockers/agent/core/agent/Dockerfile
(1 hunks)dockers/agent/core/faiss/Dockerfile
(1 hunks)dockers/agent/core/ngt/Dockerfile
(1 hunks)dockers/agent/sidecar/Dockerfile
(1 hunks)dockers/binfmt/Dockerfile
(1 hunks)dockers/buildbase/Dockerfile
(1 hunks)dockers/buildkit/Dockerfile
(1 hunks)dockers/buildkit/syft/scanner/Dockerfile
(1 hunks)dockers/ci/base/Dockerfile
(1 hunks)dockers/dev/Dockerfile
(1 hunks)dockers/discoverer/k8s/Dockerfile
(1 hunks)dockers/example/client/Dockerfile
(1 hunks)dockers/gateway/filter/Dockerfile
(1 hunks)dockers/gateway/lb/Dockerfile
(1 hunks)dockers/gateway/mirror/Dockerfile
(1 hunks)dockers/index/job/correction/Dockerfile
(1 hunks)dockers/index/job/creation/Dockerfile
(1 hunks)dockers/index/job/deletion/Dockerfile
(1 hunks)dockers/index/job/readreplica/rotate/Dockerfile
(1 hunks)dockers/index/job/save/Dockerfile
(1 hunks)dockers/index/operator/Dockerfile
(1 hunks)dockers/manager/index/Dockerfile
(1 hunks)dockers/operator/helm/Dockerfile
(1 hunks)dockers/tools/benchmark/job/Dockerfile
(1 hunks)dockers/tools/benchmark/operator/Dockerfile
(1 hunks)dockers/tools/cli/loadtest/Dockerfile
(1 hunks)example/client/go.mod
(2 hunks)go.mod
(21 hunks)internal/core/algorithm/ngt/ngt.go
(6 hunks)internal/core/algorithm/ngt/ngt_test.go
(2 hunks)pkg/agent/core/ngt/service/ngt.go
(2 hunks)rust/rust-toolchain
(1 hunks)versions/BUF_VERSION
(1 hunks)versions/K3S_VERSION
(1 hunks)versions/KUBECTL_VERSION
(1 hunks)versions/NGT_VERSION
(1 hunks)versions/OPERATOR_SDK_VERSION
(1 hunks)versions/PROMETHEUS_STACK_VERSION
(1 hunks)versions/PROTOBUF_VERSION
(1 hunks)versions/RUST_VERSION
(1 hunks)versions/TELEPRESENCE_VERSION
(1 hunks)versions/USEARCH_VERSION
(1 hunks)versions/actions/ACTIONS_SETUP_GO
(1 hunks)versions/actions/ACTIONS_SETUP_NODE
(1 hunks)versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
(1 hunks)versions/actions/GITHUB_ISSUE_METRICS
(1 hunks)versions/actions/SOFTPROPS_ACTION_GH_RELEASE
(1 hunks)
✅ Files skipped from review due to trivial changes (43)
- .github/ISSUE_TEMPLATE/bug_report.md
- .github/ISSUE_TEMPLATE/security_issue_report.md
- .github/PULL_REQUEST_TEMPLATE.md
- dockers/agent/core/agent/Dockerfile
- dockers/agent/core/faiss/Dockerfile
- dockers/agent/core/ngt/Dockerfile
- dockers/agent/sidecar/Dockerfile
- dockers/binfmt/Dockerfile
- dockers/buildbase/Dockerfile
- dockers/buildkit/Dockerfile
- dockers/buildkit/syft/scanner/Dockerfile
- dockers/ci/base/Dockerfile
- dockers/discoverer/k8s/Dockerfile
- dockers/example/client/Dockerfile
- dockers/gateway/filter/Dockerfile
- dockers/gateway/lb/Dockerfile
- dockers/gateway/mirror/Dockerfile
- dockers/index/job/correction/Dockerfile
- dockers/index/job/creation/Dockerfile
- dockers/index/job/readreplica/rotate/Dockerfile
- dockers/index/job/save/Dockerfile
- dockers/index/operator/Dockerfile
- dockers/manager/index/Dockerfile
- dockers/operator/helm/Dockerfile
- dockers/tools/benchmark/job/Dockerfile
- dockers/tools/benchmark/operator/Dockerfile
- dockers/tools/cli/loadtest/Dockerfile
- rust/rust-toolchain
- versions/BUF_VERSION
- versions/K3S_VERSION
- versions/KUBECTL_VERSION
- versions/NGT_VERSION
- versions/OPERATOR_SDK_VERSION
- versions/PROMETHEUS_STACK_VERSION
- versions/PROTOBUF_VERSION
- versions/RUST_VERSION
- versions/TELEPRESENCE_VERSION
- versions/USEARCH_VERSION
- versions/actions/ACTIONS_SETUP_GO
- versions/actions/ACTIONS_SETUP_NODE
- versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
- versions/actions/GITHUB_ISSUE_METRICS
- versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🔇 Additional comments (19)
example/client/go.mod (3)
10-10
: LGTM! Consistent version updates for related packages.
The replace directives maintain consistency across related packages, particularly for google.golang.org/genproto and its subpackages.
Also applies to: 14-16
32-32
: LGTM! Aligned indirect dependency updates.
The indirect dependency updates are properly aligned with the direct dependency changes, maintaining compatibility across the protobuf ecosystem.
Also applies to: 40-40
26-26
: Verify if this version contains the double-free fix.
The update to vald-client-go v1.7.14 should contain the fix for the GetGraphStatistics API double-free error.
✅ Verification successful
Version v1.7.14 contains the double-free fix for GetGraphStatistics API
The git history shows commit 50cfcd3 with the message "[BUGFIX] resolve agent GetGraphStatistics API double-free error problem", confirming that this version includes the fix for the double-free issue in the GetGraphStatistics API.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check the changelog or commit history for mentions of the double-free fix
# Test: Look for relevant commits or PR references
rg -i "double.?free|GetGraphStatistics" $(fd -t f "CHANGELOG|NEWS|HISTORY" -X echo {})
Length of output: 134
Script:
#!/bin/bash
# Let's check the git history for any mentions of double-free or GetGraphStatistics
git log --grep="double.?free\|GetGraphStatistics" -i --pretty=format:"%h %s"
# Also check the release notes or tags for v1.7.14
git tag -l "v1.7.14" --format='%(contents:subject)'
# Let's also check the diff between v1.7.13 and v1.7.14 in relevant Go files
git diff v1.7.13..v1.7.14 -- '*.go' | grep -i "GetGraphStatistics"
Length of output: 315
dockers/index/job/deletion/Dockerfile (2)
Line range hint 1-84
: LGTM! Dockerfile follows security best practices
The Dockerfile implements several security best practices:
- Uses multi-stage build to minimize the attack surface
- Uses distroless base image
- Runs as nonroot user
- Has proper documentation and labels
85-85
: Reconsider using sample config as actual config
Copying sample.yaml
directly as the production config.yaml
might not be ideal:
- Sample configs often contain default values that may not be suitable for production
- This approach makes it harder to customize the configuration for different environments
Let's verify the contents and usage of this sample config:
Consider these alternatives:
- Use environment variables for configuration
- Mount the config file as a volume in Kubernetes
- Use a ConfigMap to manage the configuration
dockers/dev/Dockerfile (1)
148-148
:
Security Issue: Container running as root user
Running containers as root is a security risk as it gives unnecessary privileges to the container processes. This was also flagged by hadolint (DL3002).
Consider creating and using a non-root user:
-USER root:root
+RUN useradd -r -u 1001 -g root vald
+USER vald:root
go.mod (2)
279-279
: Consistent OpenTelemetry version upgrade to v1.31.0
The synchronized update of all OpenTelemetry packages to v1.31.0 is a good practice. This version includes improvements in memory management and stability, which aligns with addressing the double-free issue mentioned in the PR objectives.
Also applies to: 282-282, 283-283, 284-284, 285-285, 286-286, 287-287, 288-288
48-48
: Well-coordinated AWS SDK v2 updates
The AWS SDK v2 packages have been updated consistently, maintaining version alignment across all components. This systematic approach helps prevent version compatibility issues.
Also applies to: 49-49, 50-50, 51-51, 52-52, 53-53, 54-54, 55-55, 56-56, 57-57, 58-58, 59-59, 60-60, 61-61, 62-62, 63-63, 64-64, 65-65, 66-66, 67-67, 68-68
.gitfiles (4)
494-495
: Index deletion feature implementation looks well-structured.
The index deletion feature follows the project's standard layered architecture pattern with:
- Configuration files (
internal/config/index_deleter.go
) - Implementation files (
pkg/index/job/deletion/*
) - Kubernetes deployment files (
k8s/index/job/deletion/*
) - Command-line interface (
cmd/index/job/deletion/*
)
Also applies to: 846-846, 1464-1465, 1800-1803
92-92
: Example client implementation follows best practices.
The example client implementation includes:
- GitHub workflow for CI/CD (
dockers-example-client-image.yaml
) - Containerization support (
dockers/example/client/Dockerfile
)
Also applies to: 543-543
128-128
: Workflow rename improves clarity.
The rename from update-actions.yaml
to update-deps.yaml
better reflects the workflow's purpose of updating dependencies.
Line range hint 1-2000
: Verify the completeness of deletion feature implementation.
Let's verify that all necessary components for the deletion feature are present and no files are missing.
✅ Verification successful
Let me gather more information about the deletion feature implementation to verify its completeness.
Let me check the actual implementation details of the deletion feature to verify its completeness.
Based on the comprehensive verification of the deletion feature implementation, I can now provide the final response.
All required components for the deletion feature are present and properly implemented
The verification confirms that:
-
Core implementation is complete with:
- Service implementation with proper error handling and tracing
- Configuration management
- Usecase implementation with proper runner setup
- Kubernetes resources (ConfigMap and CronJob)
- Docker build configuration
-
The feature follows the standard project structure with:
- Main deletion logic in
pkg/index/job/deletion/
- Kubernetes manifests in
k8s/index/job/deletion/
- Docker configuration in
dockers/index/job/deletion/
- Main deletion logic in
-
The implementation includes proper integration with:
- Vald's discovery system
- Observability features (metrics, tracing)
- Error handling and logging
- Kubernetes deployment lifecycle
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for expected file patterns in deletion feature
# Test 1: Check for standard file structure
echo "Checking for standard file structure..."
for pattern in \
"cmd/index/job/deletion/main" \
"internal/config/.*delet" \
"pkg/index/job/deletion/(config|service|usecase)" \
"k8s/index/job/deletion/(configmap|cronjob)" \
"dockers/index/job/deletion/Dockerfile"
do
echo "Checking pattern: $pattern"
fd -t f "$pattern"
done
# Test 2: Check for potential missing test files
echo -e "\nChecking for test files..."
for file in \
"pkg/index/job/deletion/config/config_test.go" \
"pkg/index/job/deletion/service/deleter_test.go" \
"pkg/index/job/deletion/service/options_test.go" \
"pkg/index/job/deletion/usecase/deletion_test.go"
do
if ! fd -t f "$file" >/dev/null; then
echo "Missing test file: $file"
fi
done
Length of output: 3084
Script:
#!/bin/bash
# Check for deletion-related code implementations
# Search for deletion-related files and their content
echo "Searching for deletion implementation files..."
fd -t f "deletion"
echo -e "\nChecking deletion service implementation..."
ast-grep --pattern 'type Deleter interface {
$$$
}'
echo -e "\nChecking deletion usecase implementation..."
ast-grep --pattern 'func $_($$) {
$$$
Delete($$$)
$$$
}'
echo -e "\nChecking for deletion-related proto definitions..."
fd -t f "proto" | xargs grep -l "delete"
Length of output: 1475
Script:
#!/bin/bash
# Check deletion implementation details
echo "Checking deletion usecase implementation..."
cat pkg/index/job/deletion/usecase/deletion.go
echo -e "\nChecking deletion service implementation..."
cat pkg/index/job/deletion/service/deleter.go
echo -e "\nChecking deletion config..."
cat pkg/index/job/deletion/config/config.go
echo -e "\nChecking k8s resources..."
cat k8s/index/job/deletion/configmap.yaml 2>/dev/null || echo "configmap.yaml not found"
cat k8s/index/job/deletion/cronjob.yaml 2>/dev/null || echo "cronjob.yaml not found"
echo -e "\nChecking Dockerfile..."
cat dockers/index/job/deletion/Dockerfile 2>/dev/null || echo "Dockerfile not found"
Length of output: 37330
internal/core/algorithm/ngt/ngt_test.go (2)
783-784
: LGTM: Consistent initialization of sync.Mutex
The smu
field is consistently initialized across test cases, which is important for maintaining test reliability.
733-734
: LGTM: Addition of sync.Mutex for thread safety
The addition of smu
field as a synchronization primitive is a good approach to prevent double-free issues. The initialization is properly handled in the test case.
Let's verify the synchronization pattern usage:
✅ Verification successful
The verification shows that smu
mutex is consistently used across the codebase for synchronization, particularly in the GetGraphStatistics
method of the NGT implementation (internal/core/algorithm/ngt/ngt.go
). The test initialization aligns with the actual implementation.
LGTM: Mutex initialization in tests matches production code
The smu
mutex initialization in the test file correctly mirrors its usage in the production code, where it's used to protect critical sections in GetGraphStatistics
. The synchronization pattern is consistently implemented across the codebase, including in NGT, FAISS, and gRPC handler components.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check the usage pattern of smu mutex in the codebase
# Expect: The mutex should be used in GetGraphStatistics and related methods
# Search for GetGraphStatistics method implementation
ast-grep --pattern 'func $_ GetGraphStatistics($$$) {
$$$
}'
# Search for smu mutex usage
rg -p 'smu\.' -A 3 -B 3
Length of output: 1246
internal/core/algorithm/ngt/ngt.go (4)
39-39
: Importing singleflight
package
The addition of singleflight
improves efficiency by preventing duplicate GetGraphStatistics
calls for the same request.
118-121
: Concurrency control with group
and smu
fields
Adding group singleflight.Group[*GraphStatistics]
and smu *sync.Mutex
to the ngt
struct enhances concurrency handling and prevents duplicate processing.
485-485
: Initialization of the smu
mutex
Proper initialization of n.smu
ensures thread-safe operations within GetGraphStatistics
.
1086-1088
: Nil check for cstats
in fromCGraphStatistics
Adding a nil check prevents potential nil
pointer dereferences, enhancing the robustness of the function.
pkg/agent/core/ngt/service/ngt.go (1)
Line range hint 1456-1468
: Proper use of context in loadStatistics
method
The loadStatistics
method now accepts a context.Context
parameter and correctly uses it when calling n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)
. This allows the method to handle cancellations and timeouts appropriately, which is a good practice for robust and responsive applications.
This commit fixes the style issues introduced in 50cfcd3 according to the output from Gofumpt and Prettier. Details: #2733 Signed-off-by: kpango <[email protected]>
4946ddf
to
b41af56
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (7)
internal/cache/gache/option.go (3)
Line range hint
45-51
: Consider documenting the hook parameter usage.The new value parameter
V
in the hook function is important for preventing double-free issues, but its purpose and proper usage should be documented.Add documentation about the value parameter:
// WithExpiredHook returns Option after set expiredHook when f is not nil. +// The hook function receives the context, key, and the expired value, +// allowing proper cleanup of resources before the value is removed from cache. func WithExpiredHook[V any](f func(context.Context, string, V)) Option[V] {
Line range hint
45-51
: Consider standardizing resource cleanup patterns.This improvement to provide the expired value to cleanup hooks is a good pattern for preventing resource leaks and double-free issues. Consider applying similar patterns to other cache implementations or resource cleanup scenarios in the codebase.
Breaking change in WithExpiredHook signature requires updates
The verification revealed two issues that need attention:
- In
internal/net/dialer.go
: ThecacheExpireHook
method has an outdated signaturefunc(ctx context.Context, addr string, _ *dialerCache)
but is being used withWithExpiredHook[*dialerCache]
- In
internal/cache/gache/gache_test.go
: The test'sexpiredHook
function has an outdated signaturefunc(context.Context, string)
but is being used withWithExpiredHook[any]
These implementations need to be updated to match the new signature
func(context.Context, string, V)
.🔗 Analysis chain
Line range hint
45-51
: Verify all callers of WithExpiredHook are updated.The signature change of
WithExpiredHook
to include the expired value parameter is a good improvement as it provides more context to the hook function. However, this is a breaking change that requires updates to all existing callers.Let's verify all callers are updated:
Let me gather more information about the implementation of the hook functions to ensure they're properly updated.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Find all usages of WithExpiredHook to ensure they're updated with the new signature # Search for WithExpiredHook usage patterns echo "Searching for WithExpiredHook usage patterns..." rg -A 3 "WithExpiredHook\[.*\]\(" # Search for expiredHook function declarations that might need updating echo "Searching for expiredHook function declarations..." ast-grep --pattern 'expiredHook := func(ctx context.Context, key string'Length of output: 2286
Script:
#!/bin/bash # Search for cacheExpireHook implementation in dialer.go echo "Searching for cacheExpireHook implementation in dialer.go..." rg -A 5 "cacheExpireHook.*\(" internal/net/dialer.go # Search for expiredHook implementations in test files echo "Searching for expiredHook implementations in test files..." rg -A 5 "expiredHook.*:=.*func" internal/cache/gache/gache_test.goLength of output: 894
internal/cache/option.go (2)
Line range hint
39-45
: Add documentation about hook execution timing.Consider adding a comment to document when exactly this hook is executed (e.g., before or after removal from cache) to help implementers handle cleanup correctly.
// WithExpiredHook returns Option after set expiredHook when f is not nil. +// The hook is called just before the item is removed from the cache. +// The value parameter contains the expired item's value. func WithExpiredHook[V any](f func(context.Context, string, V)) Option[V] {
Tests need to be updated to match the new expired hook signature
The verification revealed that while the main cache implementations (
internal/cache/cache.go
andinternal/cache/gache/gache.go
) correctly use the new signature with value parameter:expiredHook func(context.Context, string, V)The test files in
internal/cache/gache/gache_test.go
still use the old signature:expiredHook func(context.Context, string)
internal/cache/gache/gache_test.go
: Update all test cases to use the new hook signature with the value parameter🔗 Analysis chain
Line range hint
39-45
: LGTM! Verify consistency across cache implementations.The signature change to include the cached value in the expired hook is a good improvement. This allows proper cleanup or logging of expired values.
Let's verify that all cache implementations handle this new signature correctly:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for consistent hook signature across cache implementations # Expected: All expired hooks should have the same signature with value parameter # Search for expired hook declarations and implementations rg -A 5 "expiredHook.*context\.Context.*string" # Search specifically in cache implementations ast-grep --pattern 'type $_ struct { $$$ expiredHook $$$ $$$ }'Length of output: 4871
internal/net/dialer.go (1)
Line range hint
457-463
: LGTM! Consider documenting the new parameter.The addition of the
*dialerCache
parameter is a good change that provides access to the expired cache entry if needed in the future. The implementation correctly handles errors and uses panic recovery.Consider adding a comment explaining the purpose of the new
*dialerCache
parameter and why it's currently unused. This will help future maintainers understand the design decision.+// cacheExpireHook is called when a DNS cache entry expires. +// The dialerCache parameter provides access to the expired cache entry for future use. func (d *dialer) cacheExpireHook(ctx context.Context, addr string, _ *dialerCache) {internal/core/algorithm/ngt/ngt_test.go (1)
Line range hint
1-2000
: Consider adding specific test cases for concurrent accessWhile the mutex field has been added, it would be beneficial to add test cases that specifically verify the thread-safety improvements and absence of double-free errors under concurrent access.
Consider adding test cases like:
func Test_ngt_ConcurrentAccess(t *testing.T) { // Setup NGT instance n, err := New(...) if err != nil { t.Fatal(err) } defer n.Close() // Run multiple goroutines accessing GetGraphStatistics var wg sync.WaitGroup for i := 0; i < 100; i++ { wg.Add(1) go func() { defer wg.Done() ctx := context.Background() _, err := n.GetGraphStatistics(ctx) if err != nil { t.Error(err) } }() } wg.Wait() }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (7)
internal/cache/cache.go
(1 hunks)internal/cache/gache/gache.go
(1 hunks)internal/cache/gache/option.go
(1 hunks)internal/cache/option.go
(1 hunks)internal/core/algorithm/ngt/ngt_test.go
(2 hunks)internal/net/dialer.go
(1 hunks)internal/net/dialer_test.go
(2 hunks)
🔇 Additional comments (7)
internal/cache/cache.go (2)
33-33
: LGTM! Enhanced hook signature improves resource management.
The addition of the value parameter V
to the expiredHook
signature is a good improvement. This change allows the hook to properly manage resources associated with the expired value, which could help prevent memory-related issues like double-free errors.
33-33
: Verify the effectiveness of the double-free fix.
Since this change is part of fixing a double-free error in the GetGraphStatistics API, let's verify the implementation across the codebase.
✅ Verification successful
Let me gather more context about the cache implementation and its usage to better understand the double-free fix.
Let me analyze the code further to understand the double-free fix implementation.
Let me search for more specific information about the GetGraphStatistics implementation and cache usage.
Let me try one more search to understand the cache implementation and double-free fix.
Based on the code analysis, I can now provide a conclusion about the double-free fix:
The expiredHook implementation is safe from double-free issues
The cache implementation shows that:
- The expiredHook is properly scoped as a function field in the cache struct
- The cache Delete operation is atomic and only called once per key
- The GetGraphStatistics uses singleflight group.Do to prevent concurrent execution
- The expiredHook is only triggered through the cache's internal mechanisms
The changes made to the expiredHook signature:
expiredHook func(context.Context, string, V)
ensures type safety and proper cleanup when cache entries expire. The singleflight usage in GetGraphStatistics prevents any race conditions that could lead to double-free scenarios.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search for GetGraphStatistics implementation and related cache usage
# to verify the fix addresses the double-free issue.
echo "Searching for GetGraphStatistics implementation..."
ast-grep --pattern 'func $_ GetGraphStatistics($$$) {
$$$
}'
echo -e "\nSearching for cache usage in GetGraphStatistics..."
rg -A 5 'GetGraphStatistics.*\{'
echo -e "\nSearching for related expiredHook implementations..."
rg -A 5 'expiredHook.*=.*func.*Context.*string.*\{'
Length of output: 1346
Script:
#!/bin/bash
# Search for cache implementation details and usage
echo "Searching for cache implementation in internal/cache..."
rg -A 10 'type.*Cache.*struct' internal/cache/
echo -e "\nSearching for cache value deletion or removal..."
rg -A 5 'Delete|Remove.*\{'
echo -e "\nSearching for cache value setting..."
ast-grep --pattern 'func $_ Set($$$) {
$$$
}'
echo -e "\nSearching for singleflight group usage..."
rg -A 5 'group\.Do.*\{'
Length of output: 323083
Script:
#!/bin/bash
# Search for cache implementation and expiredHook usage
echo "Searching for cache implementation..."
rg -A 10 'type.*Cache.*struct.*{' internal/cache/
echo -e "\nSearching for expiredHook usage..."
rg -A 5 'expiredHook.*\(' internal/cache/
echo -e "\nSearching for GetGraphStatistics implementation..."
ast-grep --pattern 'func $_ GetGraphStatistics($$$) {
$$$
}'
echo -e "\nSearching for singleflight group usage in GetGraphStatistics..."
rg -A 10 'GetGraphStatistics.*group\.Do.*\{'
Length of output: 4703
Script:
#!/bin/bash
# Search for GetGraphStatistics implementation and cache usage
echo "Searching for GetGraphStatistics method..."
rg -A 10 'func.*GetGraphStatistics'
echo -e "\nSearching for cache expiration hook implementation..."
rg -A 10 'expiredHook.*=.*func' internal/cache/
echo -e "\nSearching for cache value deletion..."
rg -A 5 'Delete.*\{.*\n.*key.*string'
echo -e "\nSearching for double free related comments..."
rg -A 5 'double.*free|double-free'
Length of output: 2279
Script:
#!/bin/bash
# Search for cache implementation and singleflight usage
echo "Searching for cache struct implementation..."
rg -A 10 'type cache\[V\] struct' internal/cache/
echo -e "\nSearching for cache Set implementation..."
rg -A 10 'func.*\(c.*cache\[V\]\).*Set\(' internal/cache/
echo -e "\nSearching for GetGraphStatistics singleflight usage..."
rg -A 10 'GetGraphStatistics.*group\.Do' internal/core/algorithm/ngt/
echo -e "\nSearching for expiredHook in cache implementation..."
rg -A 5 'expiredHook.*=.*func' internal/cache/gache/
Length of output: 1769
internal/net/dialer.go (1)
Line range hint 457-463
: Verify cache cleanup to prevent double-free.
Let's verify that the cache cleanup is properly handled and there are no potential memory leaks or double-free scenarios.
✅ Verification successful
Let me analyze the code further to verify the cache cleanup concern.
Let me check one more aspect to fully understand the cache implementation.
Based on the analysis of the code, I can now provide a final response about the cache cleanup concern:
Cache cleanup is properly implemented with no double-free risk.
The implementation shows several safety mechanisms:
- The DNS cache uses a proper expiration hook system through
cacheExpireHook
that safely handles cache cleanup - Cache deletions are handled atomically through the cache implementation
- The cache expiration system uses proper error handling and recovery:
if err := safety.RecoverFunc(func() (err error) {
_, err = d.lookup(ctx, addr)
return
})(); err != nil {
log.Errorf("dns cache expiration hook process returned error: %v\tfor addr:\t%s", err, addr)
}
- Cache deletions are only performed in specific error cases and don't overlap with expiration cleanup
The cache implementation follows a safe pattern where:
- Expired entries are handled through hooks
- Manual deletions are performed only when needed (e.g., on DNS errors)
- The operations are properly synchronized through the cache implementation
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for proper cache cleanup and potential double-free scenarios
# Test 1: Search for cache deletion calls to ensure proper cleanup
echo "Checking cache deletion patterns..."
rg "dnsCache\.Delete|cache\.Delete" -A 2
# Test 2: Search for cache cleanup in error handling
echo "Checking error handling patterns..."
rg "if err.*{.*Delete" -A 3
# Test 3: Search for potential concurrent access to cache
echo "Checking concurrent access patterns..."
rg "sync\.(RW)?Mutex|sync\.Map|atomic\." -A 2
Length of output: 320200
Script:
#!/bin/bash
# Let's check the dialer cache implementation and usage
# Check the dialer cache implementation
echo "Checking dialer cache implementation..."
rg "type dialerCache" -A 10
# Check dialer cache usage
echo "Checking dialer cache usage..."
rg "cacheExpireHook" -A 10
# Check DNS cache operations
echo "Checking DNS cache operations..."
rg "dnsCache\." -A 2
Length of output: 5125
Script:
#!/bin/bash
# Check the cache implementation and safety mechanisms
# Check the cache package implementation
echo "Checking cache package implementation..."
rg "type Cache\[" -A 10 internal/cache/
# Check cache expiration and cleanup mechanisms
echo "Checking cache expiration mechanisms..."
rg "WithExpiredHook|WithExpiration" -A 5 internal/cache/
# Check safety mechanisms around cache operations
echo "Checking safety mechanisms..."
rg "sync\.(RW)?Mutex|sync\.Map" internal/cache/
Length of output: 4248
internal/core/algorithm/ngt/ngt_test.go (2)
733-734
: LGTM: Addition of sync.Mutex for thread safety
The addition of smu
field of type *sync.Mutex
aligns with the PR's objective of resolving the double-free error by providing additional synchronization.
783-784
: LGTM: Consistent mutex field addition in test cases
The smu
field is consistently added to the test struct definition, ensuring proper test coverage for the synchronization changes.
internal/net/dialer_test.go (2)
1519-1520
: Addition of cache
parameter to the args
struct
The inclusion of cache *dialerCache
in the args
struct appropriately reflects the updated signature of the cacheExpireHook
method, ensuring that the test cases are aligned with the changes.
1589-1589
: Update of cacheExpireHook
invocation with new parameter
The test now correctly passes test.args.cache
to d.cacheExpireHook
, matching the updated method signature. Since the additional parameter is currently unused in the function (as indicated by the _
prefix), passing nil
in the tests is acceptable.
This commit fixes the style issues introduced in 50cfcd3 according to the output from Gofumpt and Prettier. Details: #2733 Signed-off-by: kpango <[email protected]>
9ebd47c
to
e3c565a
Compare
f9836b8
to
2981dce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
internal/safety/safety.go (2)
40-42
: Add test coverage for nil function handlingThe static analysis indicates that the nil check is not covered by tests. Please add test cases to verify this defensive check.
Would you like me to help generate test cases for the nil function scenario?
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 41-42: internal/safety/safety.go#L41-L42
Added lines #L41 - L42 were not covered by tests
45-46
: LGTM: Enhanced error reporting with stack tracesThe addition of stack traces to the panic recovery logging is a valuable improvement that will help debug issues like the double-free error more effectively. The implementation properly captures and formats the stack trace information.
Consider adding log sampling or truncation for very large stack traces in production to prevent log flooding while maintaining debugging capability.
internal/net/dialer_test.go (1)
Line range hint
1891-2094
: Consider removing or implementing the commented-out test.The
Test_dialer_lookupIPAddrs
function is commented out with TODO markers. Either implement the test cases or remove the commented code to maintain code cleanliness.Would you like me to help implement the test cases for
lookupIPAddrs
or create a GitHub issue to track this task?pkg/agent/core/ngt/service/ngt.go (1)
1456-1459
: Consider adding defensive programming checks.While the implementation is generally sound, consider these improvements:
- Early return if statistics are disabled to avoid unnecessary logging
- Consider wrapping the atomic store in a defer to ensure it's executed even if a panic occurs during field mapping
Consider this refactor:
func (n *ngt) loadStatistics(ctx context.Context) error { + if !n.IsStatisticsEnabled() { + return nil + } log.Info("loading index statistics to cache") stats, err := n.core.GetGraphStatistics(ctx, core.AdditionalStatistics) if err != nil { log.Errorf("failed to load index statistics to cache: %v", err) return err } + defer func() { n.statisticsCache.Store(&payload.Info_Index_Statistics{ Valid: stats.Valid, // ... other fields ... }) + }() return nil }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (3)
example/client/go.sum
is excluded by!**/*.sum
go.sum
is excluded by!**/*.sum
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (59)
.gitfiles
(9 hunks).github/ISSUE_TEMPLATE/bug_report.md
(1 hunks).github/ISSUE_TEMPLATE/security_issue_report.md
(1 hunks).github/PULL_REQUEST_TEMPLATE.md
(1 hunks)dockers/agent/core/agent/Dockerfile
(1 hunks)dockers/agent/core/faiss/Dockerfile
(1 hunks)dockers/agent/core/ngt/Dockerfile
(1 hunks)dockers/agent/sidecar/Dockerfile
(1 hunks)dockers/binfmt/Dockerfile
(1 hunks)dockers/buildbase/Dockerfile
(1 hunks)dockers/buildkit/Dockerfile
(1 hunks)dockers/buildkit/syft/scanner/Dockerfile
(1 hunks)dockers/ci/base/Dockerfile
(1 hunks)dockers/dev/Dockerfile
(1 hunks)dockers/discoverer/k8s/Dockerfile
(1 hunks)dockers/example/client/Dockerfile
(1 hunks)dockers/gateway/filter/Dockerfile
(1 hunks)dockers/gateway/lb/Dockerfile
(1 hunks)dockers/gateway/mirror/Dockerfile
(1 hunks)dockers/index/job/correction/Dockerfile
(1 hunks)dockers/index/job/creation/Dockerfile
(1 hunks)dockers/index/job/deletion/Dockerfile
(1 hunks)dockers/index/job/readreplica/rotate/Dockerfile
(1 hunks)dockers/index/job/save/Dockerfile
(1 hunks)dockers/index/operator/Dockerfile
(1 hunks)dockers/manager/index/Dockerfile
(1 hunks)dockers/operator/helm/Dockerfile
(1 hunks)dockers/tools/benchmark/job/Dockerfile
(1 hunks)dockers/tools/benchmark/operator/Dockerfile
(1 hunks)dockers/tools/cli/loadtest/Dockerfile
(1 hunks)example/client/go.mod
(2 hunks)go.mod
(22 hunks)internal/cache/cache.go
(1 hunks)internal/cache/gache/gache.go
(1 hunks)internal/cache/gache/option.go
(1 hunks)internal/cache/option.go
(1 hunks)internal/core/algorithm/ngt/ngt.go
(6 hunks)internal/core/algorithm/ngt/ngt_test.go
(2 hunks)internal/net/dialer.go
(1 hunks)internal/net/dialer_test.go
(2 hunks)internal/safety/safety.go
(2 hunks)pkg/agent/core/ngt/service/ngt.go
(2 hunks)rust/rust-toolchain
(1 hunks)rust/rust-toolchain.toml
(1 hunks)versions/BUF_VERSION
(1 hunks)versions/K3S_VERSION
(1 hunks)versions/KUBECTL_VERSION
(1 hunks)versions/NGT_VERSION
(1 hunks)versions/OPERATOR_SDK_VERSION
(1 hunks)versions/PROMETHEUS_STACK_VERSION
(1 hunks)versions/PROTOBUF_VERSION
(1 hunks)versions/RUST_VERSION
(1 hunks)versions/TELEPRESENCE_VERSION
(1 hunks)versions/USEARCH_VERSION
(1 hunks)versions/actions/ACTIONS_SETUP_GO
(1 hunks)versions/actions/ACTIONS_SETUP_NODE
(1 hunks)versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
(1 hunks)versions/actions/GITHUB_ISSUE_METRICS
(1 hunks)versions/actions/SOFTPROPS_ACTION_GH_RELEASE
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- rust/rust-toolchain.toml
🚧 Files skipped from review as they are similar to previous changes (53)
- .gitfiles
- .github/ISSUE_TEMPLATE/bug_report.md
- .github/ISSUE_TEMPLATE/security_issue_report.md
- .github/PULL_REQUEST_TEMPLATE.md
- dockers/agent/core/agent/Dockerfile
- dockers/agent/core/faiss/Dockerfile
- dockers/agent/core/ngt/Dockerfile
- dockers/agent/sidecar/Dockerfile
- dockers/binfmt/Dockerfile
- dockers/buildbase/Dockerfile
- dockers/buildkit/Dockerfile
- dockers/buildkit/syft/scanner/Dockerfile
- dockers/ci/base/Dockerfile
- dockers/dev/Dockerfile
- dockers/discoverer/k8s/Dockerfile
- dockers/example/client/Dockerfile
- dockers/gateway/filter/Dockerfile
- dockers/gateway/lb/Dockerfile
- dockers/gateway/mirror/Dockerfile
- dockers/index/job/correction/Dockerfile
- dockers/index/job/creation/Dockerfile
- dockers/index/job/deletion/Dockerfile
- dockers/index/job/readreplica/rotate/Dockerfile
- dockers/index/job/save/Dockerfile
- dockers/index/operator/Dockerfile
- dockers/manager/index/Dockerfile
- dockers/operator/helm/Dockerfile
- dockers/tools/benchmark/job/Dockerfile
- dockers/tools/benchmark/operator/Dockerfile
- dockers/tools/cli/loadtest/Dockerfile
- example/client/go.mod
- go.mod
- internal/cache/cache.go
- internal/cache/gache/gache.go
- internal/cache/gache/option.go
- internal/cache/option.go
- internal/net/dialer.go
- rust/rust-toolchain
- versions/BUF_VERSION
- versions/K3S_VERSION
- versions/KUBECTL_VERSION
- versions/NGT_VERSION
- versions/OPERATOR_SDK_VERSION
- versions/PROMETHEUS_STACK_VERSION
- versions/PROTOBUF_VERSION
- versions/RUST_VERSION
- versions/TELEPRESENCE_VERSION
- versions/USEARCH_VERSION
- versions/actions/ACTIONS_SETUP_GO
- versions/actions/ACTIONS_SETUP_NODE
- versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
- versions/actions/GITHUB_ISSUE_METRICS
- versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🧰 Additional context used
🪛 GitHub Check: codecov/patch
internal/safety/safety.go
[warning] 41-42: internal/safety/safety.go#L41-L42
Added lines #L41 - L42 were not covered by tests
🔇 Additional comments (16)
internal/safety/safety.go (2)
22-22
: LGTM: Import addition is appropriate
The addition of "runtime/debug" import is necessary for the enhanced stack trace functionality.
40-42
: LGTM: Defensive nil check prevents potential panics
The nil check is a good defensive programming practice that prevents panics from attempting to call a nil function.
Let's verify if this change helps prevent the double-free issue:
✅ Verification successful
Let me gather more context about the nil check and its surrounding code to better understand its purpose and impact.
Based on the context gathered, I can now generate the final response:
Verified: Nil check is correctly placed in recovery wrapper function
The nil check is appropriately placed in the recoverFn
wrapper function that handles panic recovery. This is a defensive measure that:
- Prevents nil pointer dereference before panic recovery setup
- Maintains consistent error handling by returning nil instead of panicking
- Protects both
RecoverFunc
andRecoverWithoutPanicFunc
callers
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for potential nil function pointer usage in GetGraphStatistics
rg -A 5 "GetGraphStatistics.*\bnil\b" || echo "No direct nil usage found in GetGraphStatistics"
# Search for recoverFn usage in relation to GetGraphStatistics
rg -A 5 "GetGraphStatistics.*recoverFn" || echo "No direct recoverFn usage found in GetGraphStatistics"
Length of output: 303
Script:
#!/bin/bash
# Get the context around the nil check in safety.go
rg -B 10 -A 10 "if fn == nil" internal/safety/safety.go
# Look for tests related to this nil check
fd "safety.*test.*go" --exec rg -l "fn == nil"
# Find any related function calls or usages
ast-grep --pattern 'func $NAME($ARGS) {
if fn == nil {
$$$
}
$$$
}'
Length of output: 791
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 41-42: internal/safety/safety.go#L41-L42
Added lines #L41 - L42 were not covered by tests
internal/net/dialer_test.go (2)
1519-1520
: LGTM: Parameter addition aligns with function signature change.
The addition of the cache *dialerCache
parameter to the args struct is consistent with the changes to the cacheExpireHook
method's signature.
1589-1589
: LGTM: Test updated to use new parameter.
The test execution has been properly updated to pass the cache parameter to the cacheExpireHook
method, maintaining consistency with the signature change.
pkg/agent/core/ngt/service/ngt.go (3)
1309-1309
: LGTM: Appropriate addition of statistics reload after index regeneration.
The addition of loadStatistics(ctx)
call ensures that the statistics cache is updated after the index is regenerated, maintaining consistency between the index state and its statistics.
1453-1453
: LGTM: Proper statistics reload after index creation.
The addition of loadStatistics(ctx)
call ensures that the statistics cache is updated after index creation, maintaining consistency with the newly created index structure.
1456-1459
: Verify thread safety of statistics cache updates.
The current implementation updates the statistics cache using atomic operations, but there could be a brief moment where readers might get partial data during the Store operation. Consider using a temporary variable and atomic.Value's Store for a more atomic update.
Consider implementing a double-buffering pattern or using sync.Mutex for the statistics update if absolute consistency is required during the update process.
internal/core/algorithm/ngt/ngt_test.go (2)
733-734
: LGTM: Proper initialization of new sync mutex field.
The addition of smu
field with proper initialization as &sync.Mutex{}
in the test case is consistent with the changes needed to fix the double-free error in GetGraphStatistics API.
783-784
: LGTM: Consistent initialization of sync mutex field in backup data test.
The smu
field is consistently initialized in the test case for loading NGT with backup data, maintaining synchronization capabilities across different initialization paths.
internal/core/algorithm/ngt/ngt.go (7)
39-39
: Imported singleflight package successfully
The import of "github.com/vdaas/vald/internal/sync/singleflight"
is appropriate for managing concurrent statistics retrieval.
89-89
: Updated GetGraphStatistics
method signature
The method signature of GetGraphStatistics
has been updated to include context.Context
, which enhances cancellation and timeout handling.
118-121
: Added synchronization fields to ngt
struct
Adding group
and smu
to the ngt
struct improves concurrency control and synchronization during graph statistics retrieval.
485-485
: Initialized smu
mutex properly
The smu
mutex is correctly initialized with n.smu = new(sync.Mutex)
, ensuring thread-safe operations.
1086-1093
: Properly handle nil
cstats
in fromCGraphStatistics
The function fromCGraphStatistics
now checks for nil
cstats
and returns nil
appropriately, preventing potential nil
pointer dereferences.
1164-1166
: Add nil
check for cstats
after C function call
The previous review comment regarding adding a nil
check for cstats
after the C.ngt_get_graph_statistics
call is still valid and applicable to the current code segment.
1177-1180
: Prevent potential infinite recursion in error handling
The previous review comment about potential infinite recursion in the error handling logic of GetGraphStatistics
is still valid and applicable to the current code segment.
0b49576
to
75897f5
Compare
b76192b
to
9eff890
Compare
96c43bd
to
e290c99
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (7)
internal/safety/safety.go (1)
45-46
: Consider stack trace size optimization.The addition of stack traces enhances debugging capabilities significantly. However, stack traces can be large, especially in deep call hierarchies.
Consider adding a configuration option to limit stack trace size or make it configurable in production environments. Example implementation:
+ const maxStackSize = 4096 // Configurable maximum stack size ds := debug.Stack() + if len(ds) > maxStackSize { + ds = ds[:maxStackSize] + } infoStr := info.Get().String() + "\r\n" + conv.Btoa(ds)pkg/agent/core/ngt/service/ngt.go (4)
1309-1309
: Consider improving error handling for loadStatisticsThe error from
loadStatistics
is returned directly, which could prevent successful index regeneration even if the statistics loading fails. Consider logging the error and continuing, since statistics are not critical for core functionality.-return n.loadStatistics(ctx) +if err := n.loadStatistics(ctx); err != nil { + log.Warnf("failed to load statistics after index regeneration: %v", err) +} +return nil
1453-1453
: Consider improving error handling for loadStatisticsSimilar to RegenerateIndexes, the error from
loadStatistics
is returned directly. Consider logging the error instead of failing the entire index creation operation.-return n.loadStatistics(ctx) +if err := n.loadStatistics(ctx); err != nil { + log.Warnf("failed to load statistics after index creation: %v", err) +} +return nil
1456-1459
: Add context timeout handling for GetGraphStatisticsThe
GetGraphStatistics
call could potentially block for a long time. Consider adding a timeout to prevent hanging.+ctx, cancel := context.WithTimeout(ctx, 30*time.Second) +defer cancel() stats, err := n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)
Line range hint
1459-1503
: Add validation for critical statistics valuesThe statistics values are stored without validation. Consider adding checks for critical values to ensure data integrity.
+// Validate critical statistics +if stats.NumberOfObjects < 0 || stats.NumberOfIndexedObjects < 0 { + return errors.New("invalid statistics: negative object count") +} n.statisticsCache.Store(&payload.Info_Index_Statistics{ Valid: stats.Valid, // ... existing fields ... })internal/core/algorithm/ngt/ngt_test.go (1)
Line range hint
1-2733
: Consider adding concurrent operation tests.While the test coverage is comprehensive for individual operations, consider adding specific test cases that verify the behavior of concurrent operations using the new
smu
mutex. This would help ensure thread safety is maintained under high concurrency.Example scenarios to test:
- Concurrent calls to GetGraphStatistics
- Interaction between GetGraphStatistics and other operations using
mu
/cmu
internal/core/algorithm/ngt/ngt.go (1)
1153-1154
: Extract retry limit into a constant for maintainabilityHardcoding the retry limit makes future adjustments less manageable. Consider defining the retry count as a constant to enhance readability and ease of modification.
Apply this diff to extract the retry limit:
+const maxGetGraphStatisticsRetries = 10 func (n *ngt) GetGraphStatistics( ctx context.Context, m statisticsType, ) (stats *GraphStatistics, err error) { - return n.getGraphStatistics(ctx, m, 10) + return n.getGraphStatistics(ctx, m, maxGetGraphStatisticsRetries) } func (n *ngt) getGraphStatistics( ctx context.Context, m statisticsType, cnt int, ) (stats *GraphStatistics, err error) { // Existing logic... - if shared && cnt > 0 && !errors.Is(err, errors.ErrNGTIndexStatisticsNotReady) { + if shared && cnt > 0 && !errors.Is(err, errors.ErrNGTIndexStatisticsNotReady) { cnt-- - return n.getGraphStatistics(ctx, m, cnt) + return n.getGraphStatistics(ctx, m, cnt) } // Existing logic... }Also applies to: 1190-1193
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (3)
example/client/go.sum
is excluded by!**/*.sum
go.sum
is excluded by!**/*.sum
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (60)
.gitfiles
(9 hunks).github/ISSUE_TEMPLATE/bug_report.md
(1 hunks).github/ISSUE_TEMPLATE/security_issue_report.md
(1 hunks).github/PULL_REQUEST_TEMPLATE.md
(1 hunks)dockers/agent/core/agent/Dockerfile
(1 hunks)dockers/agent/core/faiss/Dockerfile
(1 hunks)dockers/agent/core/ngt/Dockerfile
(1 hunks)dockers/agent/sidecar/Dockerfile
(1 hunks)dockers/binfmt/Dockerfile
(1 hunks)dockers/buildbase/Dockerfile
(1 hunks)dockers/buildkit/Dockerfile
(1 hunks)dockers/buildkit/syft/scanner/Dockerfile
(1 hunks)dockers/ci/base/Dockerfile
(1 hunks)dockers/dev/Dockerfile
(1 hunks)dockers/discoverer/k8s/Dockerfile
(1 hunks)dockers/example/client/Dockerfile
(1 hunks)dockers/gateway/filter/Dockerfile
(1 hunks)dockers/gateway/lb/Dockerfile
(1 hunks)dockers/gateway/mirror/Dockerfile
(1 hunks)dockers/index/job/correction/Dockerfile
(1 hunks)dockers/index/job/creation/Dockerfile
(1 hunks)dockers/index/job/deletion/Dockerfile
(1 hunks)dockers/index/job/readreplica/rotate/Dockerfile
(1 hunks)dockers/index/job/save/Dockerfile
(1 hunks)dockers/index/operator/Dockerfile
(1 hunks)dockers/manager/index/Dockerfile
(1 hunks)dockers/operator/helm/Dockerfile
(1 hunks)dockers/tools/benchmark/job/Dockerfile
(1 hunks)dockers/tools/benchmark/operator/Dockerfile
(1 hunks)dockers/tools/cli/loadtest/Dockerfile
(1 hunks)example/client/go.mod
(2 hunks)go.mod
(21 hunks)hack/go.mod.default
(1 hunks)internal/cache/cache.go
(1 hunks)internal/cache/gache/gache.go
(1 hunks)internal/cache/gache/option.go
(1 hunks)internal/cache/option.go
(1 hunks)internal/core/algorithm/ngt/ngt.go
(6 hunks)internal/core/algorithm/ngt/ngt_test.go
(2 hunks)internal/net/dialer.go
(1 hunks)internal/net/dialer_test.go
(2 hunks)internal/safety/safety.go
(2 hunks)pkg/agent/core/ngt/service/ngt.go
(2 hunks)rust/rust-toolchain
(1 hunks)rust/rust-toolchain.toml
(1 hunks)versions/BUF_VERSION
(1 hunks)versions/K3S_VERSION
(1 hunks)versions/KUBECTL_VERSION
(1 hunks)versions/NGT_VERSION
(1 hunks)versions/OPERATOR_SDK_VERSION
(1 hunks)versions/PROMETHEUS_STACK_VERSION
(1 hunks)versions/PROTOBUF_VERSION
(1 hunks)versions/RUST_VERSION
(1 hunks)versions/TELEPRESENCE_VERSION
(1 hunks)versions/USEARCH_VERSION
(1 hunks)versions/actions/ACTIONS_SETUP_GO
(1 hunks)versions/actions/ACTIONS_SETUP_NODE
(1 hunks)versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
(1 hunks)versions/actions/GITHUB_ISSUE_METRICS
(1 hunks)versions/actions/SOFTPROPS_ACTION_GH_RELEASE
(1 hunks)
✅ Files skipped from review due to trivial changes (2)
- dockers/agent/core/ngt/Dockerfile
- rust/rust-toolchain.toml
🚧 Files skipped from review as they are similar to previous changes (53)
- .gitfiles
- .github/ISSUE_TEMPLATE/bug_report.md
- .github/ISSUE_TEMPLATE/security_issue_report.md
- .github/PULL_REQUEST_TEMPLATE.md
- dockers/agent/core/agent/Dockerfile
- dockers/agent/core/faiss/Dockerfile
- dockers/agent/sidecar/Dockerfile
- dockers/binfmt/Dockerfile
- dockers/buildbase/Dockerfile
- dockers/buildkit/Dockerfile
- dockers/buildkit/syft/scanner/Dockerfile
- dockers/ci/base/Dockerfile
- dockers/dev/Dockerfile
- dockers/discoverer/k8s/Dockerfile
- dockers/example/client/Dockerfile
- dockers/gateway/filter/Dockerfile
- dockers/gateway/lb/Dockerfile
- dockers/gateway/mirror/Dockerfile
- dockers/index/job/correction/Dockerfile
- dockers/index/job/creation/Dockerfile
- dockers/index/job/deletion/Dockerfile
- dockers/index/job/readreplica/rotate/Dockerfile
- dockers/index/job/save/Dockerfile
- dockers/index/operator/Dockerfile
- dockers/manager/index/Dockerfile
- dockers/operator/helm/Dockerfile
- dockers/tools/benchmark/job/Dockerfile
- dockers/tools/benchmark/operator/Dockerfile
- dockers/tools/cli/loadtest/Dockerfile
- example/client/go.mod
- go.mod
- internal/cache/cache.go
- internal/cache/gache/gache.go
- internal/cache/gache/option.go
- internal/cache/option.go
- internal/net/dialer.go
- internal/net/dialer_test.go
- rust/rust-toolchain
- versions/BUF_VERSION
- versions/K3S_VERSION
- versions/KUBECTL_VERSION
- versions/NGT_VERSION
- versions/OPERATOR_SDK_VERSION
- versions/PROMETHEUS_STACK_VERSION
- versions/PROTOBUF_VERSION
- versions/RUST_VERSION
- versions/TELEPRESENCE_VERSION
- versions/USEARCH_VERSION
- versions/actions/ACTIONS_SETUP_GO
- versions/actions/ACTIONS_SETUP_NODE
- versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
- versions/actions/GITHUB_ISSUE_METRICS
- versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🔇 Additional comments (14)
internal/safety/safety.go (3)
22-22
: LGTM: Import addition is appropriate.
The addition of the runtime/debug
package is necessary for the new stack trace functionality and follows Go's import conventions.
40-42
: Great addition: Early nil check prevents potential panics.
The nil check is a crucial safety improvement that prevents potential panics from dereferencing a nil function pointer. Following the fail-fast principle by checking at the start of the function is the correct approach.
40-46
: Verify consistent implementation across similar functions.
The safety improvements look good, but let's ensure similar functions in the codebase follow the same pattern.
✅ Verification successful
The previous pattern might have been too specific. Let's try a broader search to find panic recovery patterns and debug stack usage in the codebase.
Safety implementation is self-contained and properly implemented
The safety package implements a comprehensive panic recovery mechanism in a single, well-structured file. The implementation:
- Has proper nil checks before function execution
- Captures stack traces and detailed info for debugging
- Handles different types of panic values (runtime.Error, string, error)
- Provides options for both recovering and re-panicking (via withPanic parameter)
- Includes proper logging at appropriate severity levels
Since this is a self-contained utility package that other parts of the codebase use through its exported functions (RecoverFunc
and RecoverWithoutPanicFunc
), there's no need for similar implementations elsewhere in the codebase. The centralized implementation ensures consistency in panic handling across the application.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for similar recovery patterns that might need the same improvements
ast-grep --pattern 'func $name($_) {
defer func() {
if r := recover(); r != nil {
$$$
}
}()
$$$
}'
Length of output: 120
Script:
#!/bin/bash
# Let's try a broader search for panic recovery patterns
rg -U "defer func\(\).*recover\(\)" -A 5
# Also search for specific debug stack usage
rg "debug\.Stack\(\)" -A 2
Length of output: 311
Script:
#!/bin/bash
# Let's look for the entire safety function implementation to understand the context better
rg -U "func Safe" -A 20 internal/safety/safety.go
# Also search for any other functions that might be using this safety wrapper
rg "Safe\(" -A 2
Length of output: 70
Script:
#!/bin/bash
# Let's try with case-insensitive search and look at the entire file content
cat internal/safety/safety.go
# Also search for any panic recovery patterns in the entire codebase with more context
rg -U --type go "defer\s+func\(\)\s*{" -A 10
Length of output: 2291
hack/go.mod.default (3)
332-341
: LGTM: Kubernetes components version update.
The update of Kubernetes components from v0.31.1 to v0.31.2 is a patch version increment, which typically includes bug fixes and security updates. This is a safe change.
344-344
: LGTM: Supporting components version update.
The updates to:
- controller-runtime (v0.19.1)
- structured-merge-diff/v4 (v4.4.1)
are compatible with the Kubernetes version update.
Also applies to: 347-347
Line range hint 1-331
: Verify the generic upgrade markers.
Most dependencies are marked with a generic "upgrade" directive. While this is valid syntax, it would be more maintainable to specify exact versions.
Consider explicitly specifying versions for critical dependencies instead of using generic "upgrade" markers. This helps in:
- Better version control
- Easier tracking of breaking changes
- More predictable builds
Also applies to: 342-343, 345-346, 348-349
✅ Verification successful
Let me try another verification approach since the previous script didn't provide useful information about version changes.
This script will help us understand:
- If there are any replace directives with specific versions in other go.mod files
- If there are any version requirements specified in go.mod files
This will give us context about whether using generic "upgrade" markers is consistent with the project's dependency management approach.
Let me try one final verification approach to check if this is a template/default file used for dependency management.
This will help us understand if this is a template/default file used for dependency management, which would explain the generic upgrade markers.
Let me check how this default file is used in the build process.
This will help us understand the intended usage of the go.mod.default file and the project's dependency management approach.
Based on the verification results, I can now provide a final response:
Using generic "upgrade" markers in go.mod.default is intentional and correct
The go.mod.default
file in the hack directory is a template file used during the build process, as evidenced by:
- It's located in the
hack/
directory alongside other build tools and templates - The
dependencies.mk
shows it's used as a base template that gets copied to create the actualgo.mod
:cp $(ROOTDIR)/hack/go.mod.default $(ROOTDIR)/go.mod
- The build process handles the actual version resolution through
go mod tidy
andgo get -u all
commands after copying the template
The generic "upgrade" markers are intentionally used as placeholders that get resolved to actual versions during the build process, making this an expected and correct pattern for a template file.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if any of the generically marked dependencies have breaking changes
# Look for major version bumps in go.sum (if it exists) for these dependencies
# Search for go.sum in the repository
if fd -t f "go.sum$"; then
# Extract current versions from go.sum for dependencies marked as "upgrade"
rg "=> \w+.* upgrade$" -o | while read -r line; do
dep=$(echo "$line" | cut -d ' ' -f1)
echo "Checking $dep..."
rg "^$dep " "go.sum" || true
done
fi
Length of output: 83942
Script:
#!/bin/bash
# Check if there are any go.mod files in the repository
fd -t f "go.mod$" | while read -r mod_file; do
echo "=== Checking $mod_file ==="
# Look for any replace directives with specific versions
rg "^replace.*=>.*v[0-9]" "$mod_file" || true
done
# Also check if there are any version constraints in go.mod files
fd -t f "go.mod$" | while read -r mod_file; do
echo "=== Version requirements in $mod_file ==="
rg "^require.*v[0-9]" "$mod_file" || true
done
Length of output: 691
Script:
#!/bin/bash
# Check if this is a template file used for dependency management
echo "=== Checking for template/default go.mod files ==="
fd -t f "go.mod" -x dirname {} \; | grep -i "template\|default\|hack"
# Check if there are any scripts that use this file
echo "=== Checking for scripts that reference go.mod.default ==="
rg "go.mod.default" -l
# Check the directory structure to understand the context
echo "=== Directory structure around hack/ ==="
ls -la hack/ 2>/dev/null || true
Length of output: 1263
Script:
#!/bin/bash
# Check how go.mod.default is used in the build process
echo "=== Checking Makefile.d/dependencies.mk usage ==="
cat Makefile.d/dependencies.mk
# Check if there are any version update scripts
echo "=== Checking for version update tooling ==="
fd -t f "upgrade|update|version" hack/tools/
# Check changelog for context about dependency management
echo "=== Recent dependency management changes ==="
head -n 50 CHANGELOG.md
Length of output: 14605
internal/core/algorithm/ngt/ngt_test.go (2)
783-784
: Consistent initialization of synchronization primitives.
The smu
field is properly initialized in the test struct. This matches the pattern used for other mutex fields (mu
and cmu
).
733-734
: New synchronization field added for concurrent access control.
The addition of smu *sync.Mutex
field enhances thread safety by providing exclusive locking capabilities. However, since this is used alongside existing mu *sync.RWMutex
and cmu *sync.RWMutex
, ensure proper lock ordering to prevent potential deadlocks.
✅ Verification successful
Mutex usage is safe and properly synchronized
Based on the search results, the smu
mutex is used correctly with proper Lock/Unlock pairs and follows consistent locking patterns. The mutex is used in isolation without nested locks that could cause deadlocks. The usage appears in save/statistics operations where exclusive access is required:
- In NGT core: Used for statistics mode operations
- In service layer: Used during save operations with proper defer unlock
- No evidence of lock ordering issues or potential deadlocks with other mutexes
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for potential deadlock scenarios in lock acquisition
ast-grep --pattern 'smu.Lock()
$$$
mu.Lock()'
# Check reverse order
ast-grep --pattern 'mu.Lock()
$$$
smu.Lock()'
Length of output: 103
Script:
#!/bin/bash
# Let's search for all mutex lock/unlock patterns in the codebase
rg -A 5 "smu\." --type go
# Also check for any concurrent test scenarios
rg -B 2 -A 5 "go func" --type go internal/core/algorithm/ngt/
Length of output: 2128
internal/core/algorithm/ngt/ngt.go (6)
38-40
: Imports added for enhanced functionality
The new imports safety
and singleflight
are necessary for improved error handling and to prevent duplicate concurrent requests when retrieving graph statistics.
90-90
: Updated GetGraphStatistics
method to accept context.Context
Including context.Context
in the method signature is a good practice for managing cancellations and timeouts, enhancing the method's flexibility and control.
122-122
: Added mutex smu
for thread-safe statistics retrieval
Introducing smu *sync.Mutex
ensures synchronization when accessing shared resources during graph statistics retrieval, preventing race conditions.
486-487
: Proper initialization of synchronization primitives
Initializing smu
and group
in the gen
function prepares the ngt
instance for safe concurrent operations and efficient request coalescing.
1087-1089
: Added nil check for cstats
to prevent nil pointer dereference
The nil check in fromCGraphStatistics
enhances robustness by ensuring that cstats
is valid before accessing its fields, preventing potential segmentation faults.
1150-1196
: Refactored GetGraphStatistics
with retry mechanism and singleflight
The GetGraphStatistics
method now delegates to getGraphStatistics
, utilizing a retry mechanism with a maximum of 10 attempts and employing singleflight
to prevent duplicate concurrent requests. This refactoring addresses previous concerns about infinite recursion and improves concurrency handling.
e290c99
to
6023a18
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
pkg/agent/core/ngt/service/ngt.go (2)
1309-1309
: Consider improving error handling for loadStatisticsThe error from loadStatistics is directly returned without logging. Consider logging the error before returning to help with debugging and monitoring.
- return n.loadStatistics(ctx) + err = n.loadStatistics(ctx) + if err != nil { + log.Errorf("failed to load statistics after index regeneration: %v", err) + } + return err
1453-1453
: Maintain consistent error handling styleThe error handling for loadStatistics should follow the same pattern as other error handling in this function.
- return n.loadStatistics(ctx) + if err = n.loadStatistics(ctx); err != nil { + log.Errorf("failed to load statistics after creating index: %v", err) + return err + } + return nilinternal/core/algorithm/ngt/ngt_test.go (1)
Line range hint
1-2000
: Consider adding specific test cases for concurrent GetGraphStatistics callsWhile the synchronization mechanism is added and tested in the basic initialization test cases, it would be valuable to add specific test cases that verify the thread safety of GetGraphStatistics under concurrent access.
Consider adding test cases like:
func TestNGT_GetGraphStatisticsConcurrent(t *testing.T) { // Initialize NGT n, err := New(...) if err != nil { t.Fatal(err) } defer n.Close() // Run multiple goroutines calling GetGraphStatistics var wg sync.WaitGroup for i := 0; i < 10; i++ { wg.Add(1) go func() { defer wg.Done() // Call GetGraphStatistics and verify no panics/races _, err := n.GetGraphStatistics() if err != nil { t.Error(err) } }() } wg.Wait() }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (3)
example/client/go.sum
is excluded by!**/*.sum
go.sum
is excluded by!**/*.sum
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (61)
.gitfiles
(9 hunks).github/ISSUE_TEMPLATE/bug_report.md
(1 hunks).github/ISSUE_TEMPLATE/security_issue_report.md
(1 hunks).github/PULL_REQUEST_TEMPLATE.md
(1 hunks)Makefile
(1 hunks)dockers/agent/core/agent/Dockerfile
(1 hunks)dockers/agent/core/faiss/Dockerfile
(1 hunks)dockers/agent/core/ngt/Dockerfile
(1 hunks)dockers/agent/sidecar/Dockerfile
(1 hunks)dockers/binfmt/Dockerfile
(1 hunks)dockers/buildbase/Dockerfile
(1 hunks)dockers/buildkit/Dockerfile
(1 hunks)dockers/buildkit/syft/scanner/Dockerfile
(1 hunks)dockers/ci/base/Dockerfile
(1 hunks)dockers/dev/Dockerfile
(1 hunks)dockers/discoverer/k8s/Dockerfile
(1 hunks)dockers/example/client/Dockerfile
(1 hunks)dockers/gateway/filter/Dockerfile
(1 hunks)dockers/gateway/lb/Dockerfile
(1 hunks)dockers/gateway/mirror/Dockerfile
(1 hunks)dockers/index/job/correction/Dockerfile
(1 hunks)dockers/index/job/creation/Dockerfile
(1 hunks)dockers/index/job/deletion/Dockerfile
(1 hunks)dockers/index/job/readreplica/rotate/Dockerfile
(1 hunks)dockers/index/job/save/Dockerfile
(1 hunks)dockers/index/operator/Dockerfile
(1 hunks)dockers/manager/index/Dockerfile
(1 hunks)dockers/operator/helm/Dockerfile
(1 hunks)dockers/tools/benchmark/job/Dockerfile
(1 hunks)dockers/tools/benchmark/operator/Dockerfile
(1 hunks)dockers/tools/cli/loadtest/Dockerfile
(1 hunks)example/client/go.mod
(2 hunks)go.mod
(21 hunks)hack/go.mod.default
(1 hunks)internal/cache/cache.go
(1 hunks)internal/cache/gache/gache.go
(1 hunks)internal/cache/gache/option.go
(1 hunks)internal/cache/option.go
(1 hunks)internal/core/algorithm/ngt/ngt.go
(6 hunks)internal/core/algorithm/ngt/ngt_test.go
(2 hunks)internal/net/dialer.go
(1 hunks)internal/net/dialer_test.go
(2 hunks)internal/safety/safety.go
(2 hunks)pkg/agent/core/ngt/service/ngt.go
(2 hunks)rust/rust-toolchain
(1 hunks)rust/rust-toolchain.toml
(1 hunks)versions/BUF_VERSION
(1 hunks)versions/K3S_VERSION
(1 hunks)versions/KUBECTL_VERSION
(1 hunks)versions/NGT_VERSION
(1 hunks)versions/OPERATOR_SDK_VERSION
(1 hunks)versions/PROMETHEUS_STACK_VERSION
(1 hunks)versions/PROTOBUF_VERSION
(1 hunks)versions/RUST_VERSION
(1 hunks)versions/TELEPRESENCE_VERSION
(1 hunks)versions/USEARCH_VERSION
(1 hunks)versions/actions/ACTIONS_SETUP_GO
(1 hunks)versions/actions/ACTIONS_SETUP_NODE
(1 hunks)versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
(1 hunks)versions/actions/GITHUB_ISSUE_METRICS
(1 hunks)versions/actions/SOFTPROPS_ACTION_GH_RELEASE
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (57)
- .gitfiles
- .github/ISSUE_TEMPLATE/bug_report.md
- .github/ISSUE_TEMPLATE/security_issue_report.md
- .github/PULL_REQUEST_TEMPLATE.md
- dockers/agent/core/agent/Dockerfile
- dockers/agent/core/faiss/Dockerfile
- dockers/agent/core/ngt/Dockerfile
- dockers/agent/sidecar/Dockerfile
- dockers/binfmt/Dockerfile
- dockers/buildbase/Dockerfile
- dockers/buildkit/Dockerfile
- dockers/buildkit/syft/scanner/Dockerfile
- dockers/ci/base/Dockerfile
- dockers/dev/Dockerfile
- dockers/discoverer/k8s/Dockerfile
- dockers/example/client/Dockerfile
- dockers/gateway/filter/Dockerfile
- dockers/gateway/lb/Dockerfile
- dockers/gateway/mirror/Dockerfile
- dockers/index/job/correction/Dockerfile
- dockers/index/job/creation/Dockerfile
- dockers/index/job/deletion/Dockerfile
- dockers/index/job/readreplica/rotate/Dockerfile
- dockers/index/job/save/Dockerfile
- dockers/index/operator/Dockerfile
- dockers/manager/index/Dockerfile
- dockers/operator/helm/Dockerfile
- dockers/tools/benchmark/job/Dockerfile
- dockers/tools/benchmark/operator/Dockerfile
- dockers/tools/cli/loadtest/Dockerfile
- example/client/go.mod
- go.mod
- hack/go.mod.default
- internal/cache/cache.go
- internal/cache/gache/gache.go
- internal/cache/gache/option.go
- internal/cache/option.go
- internal/net/dialer.go
- internal/net/dialer_test.go
- internal/safety/safety.go
- rust/rust-toolchain
- rust/rust-toolchain.toml
- versions/BUF_VERSION
- versions/K3S_VERSION
- versions/KUBECTL_VERSION
- versions/NGT_VERSION
- versions/OPERATOR_SDK_VERSION
- versions/PROMETHEUS_STACK_VERSION
- versions/PROTOBUF_VERSION
- versions/RUST_VERSION
- versions/TELEPRESENCE_VERSION
- versions/USEARCH_VERSION
- versions/actions/ACTIONS_SETUP_GO
- versions/actions/ACTIONS_SETUP_NODE
- versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
- versions/actions/GITHUB_ISSUE_METRICS
- versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🔇 Additional comments (8)
Makefile (1)
674-675
: Verify the NGT bugfix branch status.
The change from version-tagged NGT to a specific bugfix branch appears to address the double-free error. However, using an untagged branch instead of a stable version requires verification.
Let's verify the branch status:
Also, there appears to be a typo in the branch name ("statisitcs" instead of "statistics"). This doesn't affect functionality but might want to be corrected in the upstream repository.
✅ Verification successful
Let me gather more information about this branch to ensure it's actively maintained and contains the necessary fixes.
Branch is active but consider future version upgrade
The bugfix branch exists and is actively maintained:
- Latest commit from November 6, 2024, specifically addresses the null pointer free problem
- Branch is not yet merged into any release (latest release is v2.3.3)
- Commit message confirms it targets the exact issue: "investigate null pointer free problem in getGraphStatistics API"
While using this branch is appropriate for now to fix the critical issue, consider upgrading to the next stable release once this fix is merged upstream.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if the bugfix branch exists and its latest commit
# Test: Verify branch existence and last commit
git ls-remote --heads https://github.com/yahoojapan/NGT.git bugfix/get-graph-statisitcs/nullpointer-free-problem
# Note: Consider switching back to version-tagged releases once the fix is merged upstream
Length of output: 219
Script:
#!/bin/bash
# Get more details about the branch and its relationship to main/master
# First, get the latest commit date and message for the bugfix branch
git ls-remote --refs https://github.com/yahoojapan/NGT.git bugfix/get-graph-statisitcs/nullpointer-free-problem | cut -f1 | xargs -I{} curl -s -H "Accept: application/vnd.github.v3+json" https://api.github.com/repos/yahoojapan/NGT/commits/{} | jq -r '[.commit.author.date, .commit.message] | @tsv'
# Check the latest tag/release version
git ls-remote --tags https://github.com/yahoojapan/NGT.git | sort -t '/' -k 3 -V | tail -n 1
# Check if this commit exists in any release tags
git ls-remote --tags https://github.com/yahoojapan/NGT.git | grep 873be2f0d01eb11d685821772ec48ed362847232
Length of output: 704
internal/core/algorithm/ngt/ngt.go (4)
38-40
: LGTM: Well-structured concurrency control additions
The new imports and struct fields provide robust concurrency control:
singleflight.Group
prevents redundant concurrent callssync.Mutex
ensures thread-safe access to shared resources
Also applies to: 119-122
1087-1089
: LGTM: Added nil pointer check
The addition of the nil check for cstats
parameter addresses the previous review comment about potential nil pointer issues.
1091-1119
: LGTM: Comprehensive C to Go struct conversion
The implementation properly maps all C struct fields to their Go counterparts with appropriate type conversions.
1150-1196
: LGTM: Robust implementation with proper concurrency control
The new implementation addresses previous concerns and adds several improvements:
- Uses singleflight to prevent duplicate concurrent calls
- Implements a bounded retry mechanism
- Adds proper synchronization with mutex
- Includes panic recovery
- Properly handles context cancellation
The retry mechanism now has a counter to prevent infinite recursion, addressing the previous review comment.
pkg/agent/core/ngt/service/ngt.go (1)
Line range hint 1456-1499
: Consider adding context timeout and refactoring statistics mapping
The implementation could be improved in two areas:
- Add context timeout handling to prevent long-running operations
- Move statistics mapping to a separate helper function to improve readability
func (n *ngt) loadStatistics(ctx context.Context) error {
if n.IsStatisticsEnabled() {
log.Info("loading index statistics to cache")
+ ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
+ defer cancel()
stats, err := n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)
if err != nil {
log.Errorf("failed to load index statistics to cache: %v", err)
return err
}
- n.statisticsCache.Store(&payload.Info_Index_Statistics{
- Valid: stats.Valid,
- MedianIndegree: stats.MedianIndegree,
- // ... other fields
- })
+ n.statisticsCache.Store(n.mapToIndexStatistics(stats))
}
return nil
}
+
+func (n *ngt) mapToIndexStatistics(stats *core.GraphStatistics) *payload.Info_Index_Statistics {
+ return &payload.Info_Index_Statistics{
+ Valid: stats.Valid,
+ MedianIndegree: stats.MedianIndegree,
+ // ... other fields
+ }
+}
internal/core/algorithm/ngt/ngt_test.go (2)
733-734
: LGTM: Addition of sync.Mutex for thread safety
The addition of smu
field in the test case aligns with the PR's objective of fixing the double-free error by introducing proper synchronization.
783-784
: LGTM: Consistent sync.Mutex initialization in backup test case
The smu
field is consistently added to the test case that loads backup data, maintaining synchronization across different initialization paths.
Signed-off-by: kpango <[email protected]>
6023a18
to
795ac5c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
internal/core/algorithm/ngt/ngt.go (2)
1099-1131
: Consider moving C.ngt_free_graph_statistics callWhile the nil check is good, consider moving the
C.ngt_free_graph_statistics
call to this function to ensure cleanup happens immediately after conversion, preventing potential memory leaks.-func fromCGraphStatistics(cstats *C.NGTGraphStatistics) *GraphStatistics { +func fromCGraphStatistics(cstats *C.NGTGraphStatistics) (*GraphStatistics, error) { if cstats == nil { - return nil + return nil, nil } + defer C.ngt_free_graph_statistics(cstats) goStats := &GraphStatistics{ // ... existing fields ... } - return goStats + return goStats, nil }
1162-1208
: Consider making retry count configurableThe implementation is robust with good concurrency control and panic recovery. However, the hardcoded retry count (10) could be made configurable through an option or configuration parameter for better flexibility.
Consider adding a field to the ngt struct:
type ngt struct { // ... existing fields ... statsRetryCount int }Then update the GetGraphStatistics call:
- return n.getGraphStatistics(ctx, m, 10) + return n.getGraphStatistics(ctx, m, n.statsRetryCount)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (3)
example/client/go.sum
is excluded by!**/*.sum
go.sum
is excluded by!**/*.sum
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (61)
.gitfiles
(9 hunks).github/ISSUE_TEMPLATE/bug_report.md
(1 hunks).github/ISSUE_TEMPLATE/security_issue_report.md
(1 hunks).github/PULL_REQUEST_TEMPLATE.md
(1 hunks)Makefile
(1 hunks)dockers/agent/core/agent/Dockerfile
(1 hunks)dockers/agent/core/faiss/Dockerfile
(1 hunks)dockers/agent/core/ngt/Dockerfile
(1 hunks)dockers/agent/sidecar/Dockerfile
(1 hunks)dockers/binfmt/Dockerfile
(1 hunks)dockers/buildbase/Dockerfile
(1 hunks)dockers/buildkit/Dockerfile
(1 hunks)dockers/buildkit/syft/scanner/Dockerfile
(1 hunks)dockers/ci/base/Dockerfile
(1 hunks)dockers/dev/Dockerfile
(1 hunks)dockers/discoverer/k8s/Dockerfile
(1 hunks)dockers/example/client/Dockerfile
(1 hunks)dockers/gateway/filter/Dockerfile
(1 hunks)dockers/gateway/lb/Dockerfile
(1 hunks)dockers/gateway/mirror/Dockerfile
(1 hunks)dockers/index/job/correction/Dockerfile
(1 hunks)dockers/index/job/creation/Dockerfile
(1 hunks)dockers/index/job/deletion/Dockerfile
(1 hunks)dockers/index/job/readreplica/rotate/Dockerfile
(1 hunks)dockers/index/job/save/Dockerfile
(1 hunks)dockers/index/operator/Dockerfile
(1 hunks)dockers/manager/index/Dockerfile
(1 hunks)dockers/operator/helm/Dockerfile
(1 hunks)dockers/tools/benchmark/job/Dockerfile
(1 hunks)dockers/tools/benchmark/operator/Dockerfile
(1 hunks)dockers/tools/cli/loadtest/Dockerfile
(1 hunks)example/client/go.mod
(2 hunks)go.mod
(21 hunks)hack/go.mod.default
(1 hunks)internal/cache/cache.go
(1 hunks)internal/cache/gache/gache.go
(1 hunks)internal/cache/gache/option.go
(1 hunks)internal/cache/option.go
(1 hunks)internal/core/algorithm/ngt/ngt.go
(12 hunks)internal/core/algorithm/ngt/ngt_test.go
(2 hunks)internal/net/dialer.go
(1 hunks)internal/net/dialer_test.go
(2 hunks)internal/safety/safety.go
(2 hunks)pkg/agent/core/ngt/service/ngt.go
(2 hunks)rust/rust-toolchain
(1 hunks)rust/rust-toolchain.toml
(1 hunks)versions/BUF_VERSION
(1 hunks)versions/K3S_VERSION
(1 hunks)versions/KUBECTL_VERSION
(1 hunks)versions/NGT_VERSION
(1 hunks)versions/OPERATOR_SDK_VERSION
(1 hunks)versions/PROMETHEUS_STACK_VERSION
(1 hunks)versions/PROTOBUF_VERSION
(1 hunks)versions/RUST_VERSION
(1 hunks)versions/TELEPRESENCE_VERSION
(1 hunks)versions/USEARCH_VERSION
(1 hunks)versions/actions/ACTIONS_SETUP_GO
(1 hunks)versions/actions/ACTIONS_SETUP_NODE
(1 hunks)versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
(1 hunks)versions/actions/GITHUB_ISSUE_METRICS
(1 hunks)versions/actions/SOFTPROPS_ACTION_GH_RELEASE
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (58)
- .gitfiles
- .github/ISSUE_TEMPLATE/bug_report.md
- .github/ISSUE_TEMPLATE/security_issue_report.md
- .github/PULL_REQUEST_TEMPLATE.md
- Makefile
- dockers/agent/core/agent/Dockerfile
- dockers/agent/core/faiss/Dockerfile
- dockers/agent/core/ngt/Dockerfile
- dockers/agent/sidecar/Dockerfile
- dockers/binfmt/Dockerfile
- dockers/buildbase/Dockerfile
- dockers/buildkit/Dockerfile
- dockers/buildkit/syft/scanner/Dockerfile
- dockers/ci/base/Dockerfile
- dockers/dev/Dockerfile
- dockers/discoverer/k8s/Dockerfile
- dockers/example/client/Dockerfile
- dockers/gateway/filter/Dockerfile
- dockers/gateway/lb/Dockerfile
- dockers/gateway/mirror/Dockerfile
- dockers/index/job/correction/Dockerfile
- dockers/index/job/creation/Dockerfile
- dockers/index/job/deletion/Dockerfile
- dockers/index/job/readreplica/rotate/Dockerfile
- dockers/index/job/save/Dockerfile
- dockers/index/operator/Dockerfile
- dockers/manager/index/Dockerfile
- dockers/operator/helm/Dockerfile
- dockers/tools/benchmark/job/Dockerfile
- dockers/tools/benchmark/operator/Dockerfile
- dockers/tools/cli/loadtest/Dockerfile
- example/client/go.mod
- go.mod
- hack/go.mod.default
- internal/cache/cache.go
- internal/cache/gache/gache.go
- internal/cache/gache/option.go
- internal/cache/option.go
- internal/core/algorithm/ngt/ngt_test.go
- internal/net/dialer.go
- internal/safety/safety.go
- rust/rust-toolchain
- rust/rust-toolchain.toml
- versions/BUF_VERSION
- versions/K3S_VERSION
- versions/KUBECTL_VERSION
- versions/NGT_VERSION
- versions/OPERATOR_SDK_VERSION
- versions/PROMETHEUS_STACK_VERSION
- versions/PROTOBUF_VERSION
- versions/RUST_VERSION
- versions/TELEPRESENCE_VERSION
- versions/USEARCH_VERSION
- versions/actions/ACTIONS_SETUP_GO
- versions/actions/ACTIONS_SETUP_NODE
- versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
- versions/actions/GITHUB_ISSUE_METRICS
- versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🔇 Additional comments (6)
internal/core/algorithm/ngt/ngt.go (3)
119-122
: LGTM: Good synchronization primitives added
The addition of singleflight.Group
and sync.Mutex
improves thread safety and prevents redundant concurrent operations.
486-487
: LGTM: Proper initialization of synchronization primitives
The mutex and singleflight group are properly initialized with appropriate error handling through deferred Close.
576-577
: LGTM: Consistent synchronization pattern
The mutex locks are consistently applied around critical sections that access shared resources in create, save, and close operations.
Also applies to: 915-916, 935-936, 1089-1090
internal/net/dialer_test.go (2)
1519-1520
: LGTM: Test args updated to match new method signature.
The addition of the cache *dialerCache
parameter to the args struct correctly reflects the changes to the cacheExpireHook
method signature.
1589-1589
: LGTM: Test implementation updated correctly.
The test function call has been properly updated to include the new cache parameter, maintaining consistency with the method signature change.
pkg/agent/core/ngt/service/ngt.go (1)
Line range hint 1456-1479
: Implementation of loadStatistics
method is correct
The loadStatistics
method is properly implemented. It checks if statistics are enabled before attempting to load them, handles any errors returned by GetGraphStatistics
, and stores the retrieved statistics in n.statisticsCache
. This ensures that index statistics are managed correctly without impacting the system when statistics are disabled.
Description
SSIA
Related Issue
Versions
Checklist
Special notes for your reviewer
Summary by CodeRabbit
New Features
Bug Fixes
Dependency Updates
These updates improve the responsiveness and reliability of the application during long-running operations.