Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGFIX] resolve agent GetGraphStatistics API double-free error problem #2733

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kpango
Copy link
Collaborator

@kpango kpango commented Nov 5, 2024

Description

SSIA

Related Issue

Versions

  • Vald Version: v1.7.13
  • Go Version: v1.23.2
  • Rust Version: v1.81.0
  • Docker Version: v27.3.1
  • Kubernetes Version: v1.31.1
  • Helm Version: v3.16.2
  • NGT Version: v2.2.4
  • Faiss Version: v1.9.0

Checklist

Special notes for your reviewer

Summary by CodeRabbit

  • New Features

    • Enhanced methods for retrieving graph statistics and managing indexes now support cancellation and timeout capabilities via context parameters.
    • Improved error handling for loading and saving indexes.
    • New configuration files added for job deletion functionality.
  • Bug Fixes

    • Added synchronization mechanisms to prevent race conditions during concurrent access to graph statistics.
  • Dependency Updates

    • Updated various dependencies, including Rust (1.81.0 → 1.82.0), NGT (2.2.4 → 2.3.3), Kubernetes (1.31.1 → 1.31.2), and other libraries, ensuring the application uses the latest features and fixes.
    • Version updates for components such as Vald (1.7.13 → 1.7.14), Prometheus Stack (65.1.1 → 65.8.0), and others.

These updates improve the responsiveness and reliability of the application during long-running operations.

Copy link
Contributor

coderabbitai bot commented Nov 5, 2024

📝 Walkthrough
📝 Walkthrough

Walkthrough

The pull request introduces a series of modifications that include the addition and renaming of various files, particularly in the Docker and workflow configurations. New files for job deletion and Docker configurations have been added, while some existing files have been renamed or removed. Additionally, several dependency version updates have been made across various modules and templates, reflecting a routine maintenance effort to keep the project up to date.

Changes

File/Path Change Summary
.github/workflows/dockers-example-client-image.yaml New file added
.github/workflows/dockers-index-deletion.yaml New file added
cmd/index/job/deletion/main.go New file added
cmd/index/job/deletion/sample.yaml New file added
dockers/index/job/deletion/Dockerfile New file added
.github/workflows/update-actions.yaml Renamed to .github/workflows/update-deps.yaml
k8s/index/job/deletion/configmap.yaml New file added
k8s/index/job/deletion/cronjob.yaml New file added
apis/proto/buf.lock, apis/proto/buf.yaml Removed, replaced with buf.lock, buf.yaml
.github/ISSUE_TEMPLATE/*.md Version updates for Vald, Rust, Kubernetes, and NGT in templates
dockers/*/Dockerfile Various minor modifications, primarily adding newlines or adjusting ENTRYPOINT formatting
example/client/go.mod Dependency version updates for multiple packages
go.mod Dependency version updates for various Google Cloud libraries and others
internal/core/algorithm/ngt/ngt.go Enhanced GetGraphStatistics method with context handling and additional fields
internal/core/algorithm/ngt/ngt_test.go Added smu field to ngt struct and updated tests
pkg/agent/core/ngt/service/ngt.go Added context parameters to several methods for better resource management
rust/rust-toolchain, versions/* Multiple version updates across various files

Possibly related PRs

Suggested reviewers

  • vankichi
  • hlts2

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@vdaas-ci
Copy link
Collaborator

vdaas-ci commented Nov 5, 2024

[CHATOPS:HELP] ChatOps commands.

  • 🙆‍♀️ /approve - approve
  • 🍱 /format - format codes and add licenses
  • /gen-test - generate test codes
  • 🏷️ /label - add labels
  • 🔚 2️⃣ 🔚 /label actions/e2e-deploy - run E2E deploy & integration test

Copy link

cloudflare-workers-and-pages bot commented Nov 5, 2024

Deploying vald with  Cloudflare Pages  Cloudflare Pages

Latest commit: 795ac5c
Status: ✅  Deploy successful!
Preview URL: https://d3532178.vald.pages.dev
Branch Preview URL: https://bugfix-agent-statistics-doub.vald.pages.dev

View logs

Copy link

codecov bot commented Nov 5, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 5 lines in your changes missing coverage. Please review.

Project coverage is 23.24%. Comparing base (467ab10) to head (795ac5c).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
internal/safety/safety.go 40.00% 2 Missing and 1 partial ⚠️
pkg/agent/core/ngt/service/ngt.go 50.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2733      +/-   ##
==========================================
- Coverage   23.95%   23.24%   -0.71%     
==========================================
  Files         545      539       -6     
  Lines       54322    53298    -1024     
==========================================
- Hits        13011    12390     -621     
+ Misses      40537    40170     -367     
+ Partials      774      738      -36     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kpango kpango force-pushed the bugfix/agent/statistics-double-free-problem branch from a9ce087 to 50cfcd3 Compare November 5, 2024 11:52
deepsource-autofix bot added a commit that referenced this pull request Nov 5, 2024
This commit fixes the style issues introduced in 50cfcd3 according to the output
from Gofumpt and Prettier.

Details: #2733
@github-actions github-actions bot removed the size/L label Nov 5, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (1)
pkg/agent/core/ngt/service/ngt.go (1)

Line range hint 1456-1477: Avoid redundant error logging in loadStatistics

In the loadStatistics function, the error is logged using log.Errorf and then returned. This can lead to redundant logging if the caller also logs the error. To prevent duplicate log entries and adhere to logging best practices, consider removing the log statement here and allowing the caller to handle the error logging.

Apply this diff to adjust the logging:

 func (n *ngt) loadStatistics(ctx context.Context) error {
     if n.IsStatisticsEnabled() {
         log.Info("loading index statistics to cache")
         stats, err := n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)
         if err != nil {
-            log.Errorf("failed to load index statistics to cache: %v", err)
             return err
         }
         n.statisticsCache.Store(&payload.Info_Index_Statistics{
             Valid:                            stats.Valid,
             // ...
         })
     }
     return nil
 }
🛑 Comments failed to post (3)
internal/core/algorithm/ngt/ngt.go (1)

1149-1182: ⚠️ Potential issue

Potential infinite recursion in GetGraphStatistics

The recursive call to n.GetGraphStatistics at line 1178 may lead to infinite recursion if shared remains true and the error is not ErrNGTIndexStatisticsNotReady. This can occur if the error persists across retries. Consider revising the error handling logic to prevent infinite recursion, perhaps by limiting the number of retries or handling specific errors differently.

pkg/agent/core/ngt/service/ngt.go (2)

1309-1309: ⚠️ Potential issue

Handle potential error from loadStatistics in RegenerateIndexes

The RegenerateIndexes function now returns the result of n.loadStatistics(ctx), which introduces new error propagation. Previously, RegenerateIndexes might not have returned an error at this point. Ensure that all callers of RegenerateIndexes are prepared to handle this potential error, or consider handling the error within the function to maintain existing behavior.


1453-1453: ⚠️ Potential issue

Handle potential error from loadStatistics in CreateIndex

Similarly, CreateIndex now returns the result of n.loadStatistics(ctx). This change may affect the function's contract by introducing an error return where it might not have been expected before. Review and update any code that calls CreateIndex to handle the potential error appropriately, or handle the error within CreateIndex if that aligns better with the intended design.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (7)
dockers/index/job/deletion/Dockerfile (1)

85-85: Ensure proper permissions for the config file.

The configuration file is being copied after the binary. Consider:

  1. Setting appropriate read permissions for the nonroot user
  2. Moving this COPY instruction before the USER instruction to ensure proper ownership
-COPY cmd/index/job/deletion/sample.yaml /etc/server/config.yaml
-USER nonroot:nonroot
+COPY --chown=nonroot:nonroot cmd/index/job/deletion/sample.yaml /etc/server/config.yaml
+USER nonroot:nonroot
internal/core/algorithm/ngt/ngt_test.go (1)

Line range hint 733-784: Consider adding test cases for concurrent access

While the mutex field has been added, there are no explicit test cases verifying concurrent access patterns and the effectiveness of the mutex in preventing the double-free issue.

Consider adding test cases like:

+func Test_ngt_ConcurrentAccess(t *testing.T) {
+    // Setup test NGT instance
+    n, err := New(...)
+    if err != nil {
+        t.Fatal(err)
+    }
+    defer n.Close()
+
+    // Run concurrent operations
+    var wg sync.WaitGroup
+    for i := 0; i < 10; i++ {
+        wg.Add(1)
+        go func() {
+            defer wg.Done()
+            // Perform operations that could trigger double-free
+            // e.g., GetGraphStatistics, Insert, Remove etc.
+        }()
+    }
+    wg.Wait()
+}
pkg/agent/core/ngt/service/ngt.go (5)

1309-1309: Consider handling errors from loadStatistics without altering function behavior

Adding return n.loadStatistics(ctx) changes the error propagation of RegenerateIndexes, which might lead to unintended side effects if callers are not expecting additional errors. Consider logging the error instead of returning it to maintain existing behavior.

Apply this diff to handle the error internally:

 	// regenerate indexes logic
 	...
-	return n.loadStatistics(ctx)
+	if err := n.loadStatistics(ctx); err != nil {
+		log.Errorf("failed to load statistics: %v", err)
+	}
+	return nil
 }

1453-1453: Ensure consistent error handling in CreateIndex

Similarly, returning n.loadStatistics(ctx) in CreateIndex may alter the function's error behavior. Verify that this change does not impact callers unexpectedly. Consider handling the error internally to maintain consistency.

Apply this diff to adjust error handling:

 	atomic.AddUint64(&n.nocie, 1)
 	n.nopvq.Add(vqProcessedCnt)
 	log.Debug("create graph and tree phase finished")
 	log.Info("create index operation finished")
 
-	return n.loadStatistics(ctx)
+	if err := n.loadStatistics(ctx); err != nil {
+		log.Errorf("failed to load statistics: %v", err)
+	}
+	return nil
 }

Line range hint 1456-1480: Add early exit in loadStatistics when statistics are disabled

Currently, loadStatistics assumes that statistics are enabled. To avoid unnecessary processing, add a check at the beginning of the function to exit early when statistics gathering is disabled.

Apply this diff:

 func (n *ngt) loadStatistics(ctx context.Context) error {
+	if !n.IsStatisticsEnabled() {
+		return nil
+	}
 	log.Info("loading index statistics to cache")
 	stats, err := n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)
 	if err != nil {
 		log.Errorf("failed to load index statistics to cache: %v", err)
 		return err
 	}
 	n.statisticsCache.Store(&payload.Info_Index_Statistics{
 		Valid:                            stats.Valid,
 		// ... other fields ...
 	})
 	return nil
 }

Line range hint 1459-1477: Handle potential nil stats from GetGraphStatistics

Ensure that stats is not nil before using it to prevent possible nil pointer dereference. Add a nil check after calling GetGraphStatistics.

Apply this diff:

 	stats, err := n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)
 	if err != nil {
 		log.Errorf("failed to load index statistics to cache: %v", err)
 		return err
 	}
+	if stats == nil {
+		log.Warn("received nil statistics from GetGraphStatistics")
+		return errors.New("received nil statistics")
+	}
 	n.statisticsCache.Store(&payload.Info_Index_Statistics{
 		Valid:                            stats.Valid,
 		// ... other fields ...
 	})

1456-1458: Provide documentation for loadStatistics function

The loadStatistics function lacks comments explaining its purpose and usage. Adding a function-level comment would improve code readability and maintainability.

Example:

+// loadStatistics loads index statistics into the cache if statistics gathering is enabled.
 func (n *ngt) loadStatistics(ctx context.Context) error {
 	if !n.IsStatisticsEnabled() {
 		return nil
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a583f09 and 4946ddf.

⛔ Files ignored due to path filters (3)
  • example/client/go.sum is excluded by !**/*.sum
  • go.sum is excluded by !**/*.sum
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (51)
  • .gitfiles (9 hunks)
  • .github/ISSUE_TEMPLATE/bug_report.md (1 hunks)
  • .github/ISSUE_TEMPLATE/security_issue_report.md (1 hunks)
  • .github/PULL_REQUEST_TEMPLATE.md (1 hunks)
  • dockers/agent/core/agent/Dockerfile (1 hunks)
  • dockers/agent/core/faiss/Dockerfile (1 hunks)
  • dockers/agent/core/ngt/Dockerfile (1 hunks)
  • dockers/agent/sidecar/Dockerfile (1 hunks)
  • dockers/binfmt/Dockerfile (1 hunks)
  • dockers/buildbase/Dockerfile (1 hunks)
  • dockers/buildkit/Dockerfile (1 hunks)
  • dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
  • dockers/ci/base/Dockerfile (1 hunks)
  • dockers/dev/Dockerfile (1 hunks)
  • dockers/discoverer/k8s/Dockerfile (1 hunks)
  • dockers/example/client/Dockerfile (1 hunks)
  • dockers/gateway/filter/Dockerfile (1 hunks)
  • dockers/gateway/lb/Dockerfile (1 hunks)
  • dockers/gateway/mirror/Dockerfile (1 hunks)
  • dockers/index/job/correction/Dockerfile (1 hunks)
  • dockers/index/job/creation/Dockerfile (1 hunks)
  • dockers/index/job/deletion/Dockerfile (1 hunks)
  • dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
  • dockers/index/job/save/Dockerfile (1 hunks)
  • dockers/index/operator/Dockerfile (1 hunks)
  • dockers/manager/index/Dockerfile (1 hunks)
  • dockers/operator/helm/Dockerfile (1 hunks)
  • dockers/tools/benchmark/job/Dockerfile (1 hunks)
  • dockers/tools/benchmark/operator/Dockerfile (1 hunks)
  • dockers/tools/cli/loadtest/Dockerfile (1 hunks)
  • example/client/go.mod (2 hunks)
  • go.mod (21 hunks)
  • internal/core/algorithm/ngt/ngt.go (6 hunks)
  • internal/core/algorithm/ngt/ngt_test.go (2 hunks)
  • pkg/agent/core/ngt/service/ngt.go (2 hunks)
  • rust/rust-toolchain (1 hunks)
  • versions/BUF_VERSION (1 hunks)
  • versions/K3S_VERSION (1 hunks)
  • versions/KUBECTL_VERSION (1 hunks)
  • versions/NGT_VERSION (1 hunks)
  • versions/OPERATOR_SDK_VERSION (1 hunks)
  • versions/PROMETHEUS_STACK_VERSION (1 hunks)
  • versions/PROTOBUF_VERSION (1 hunks)
  • versions/RUST_VERSION (1 hunks)
  • versions/TELEPRESENCE_VERSION (1 hunks)
  • versions/USEARCH_VERSION (1 hunks)
  • versions/actions/ACTIONS_SETUP_GO (1 hunks)
  • versions/actions/ACTIONS_SETUP_NODE (1 hunks)
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG (1 hunks)
  • versions/actions/GITHUB_ISSUE_METRICS (1 hunks)
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE (1 hunks)
✅ Files skipped from review due to trivial changes (44)
  • .github/ISSUE_TEMPLATE/bug_report.md
  • .github/ISSUE_TEMPLATE/security_issue_report.md
  • .github/PULL_REQUEST_TEMPLATE.md
  • dockers/agent/core/agent/Dockerfile
  • dockers/agent/core/faiss/Dockerfile
  • dockers/agent/core/ngt/Dockerfile
  • dockers/agent/sidecar/Dockerfile
  • dockers/binfmt/Dockerfile
  • dockers/buildbase/Dockerfile
  • dockers/buildkit/Dockerfile
  • dockers/buildkit/syft/scanner/Dockerfile
  • dockers/ci/base/Dockerfile
  • dockers/discoverer/k8s/Dockerfile
  • dockers/example/client/Dockerfile
  • dockers/gateway/filter/Dockerfile
  • dockers/gateway/lb/Dockerfile
  • dockers/gateway/mirror/Dockerfile
  • dockers/index/job/correction/Dockerfile
  • dockers/index/job/creation/Dockerfile
  • dockers/index/job/readreplica/rotate/Dockerfile
  • dockers/index/job/save/Dockerfile
  • dockers/index/operator/Dockerfile
  • dockers/manager/index/Dockerfile
  • dockers/operator/helm/Dockerfile
  • dockers/tools/benchmark/job/Dockerfile
  • dockers/tools/benchmark/operator/Dockerfile
  • dockers/tools/cli/loadtest/Dockerfile
  • example/client/go.mod
  • rust/rust-toolchain
  • versions/BUF_VERSION
  • versions/K3S_VERSION
  • versions/KUBECTL_VERSION
  • versions/NGT_VERSION
  • versions/OPERATOR_SDK_VERSION
  • versions/PROMETHEUS_STACK_VERSION
  • versions/PROTOBUF_VERSION
  • versions/RUST_VERSION
  • versions/TELEPRESENCE_VERSION
  • versions/USEARCH_VERSION
  • versions/actions/ACTIONS_SETUP_GO
  • versions/actions/ACTIONS_SETUP_NODE
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
  • versions/actions/GITHUB_ISSUE_METRICS
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🔇 Additional comments (12)
go.mod (3)

279-279: LGTM: Synchronized OpenTelemetry version updates.

The OpenTelemetry packages have been consistently updated to version 1.31.0, which is a good practice to maintain compatibility across the components.

Also applies to: 282-282, 283-283, 284-284, 285-285, 286-286, 287-287, 288-288


48-48: LGTM: Coordinated AWS SDK version updates.

The AWS SDK packages have been updated consistently across both v1 and v2 versions, maintaining compatibility between different AWS services.

Also applies to: 50-50, 51-51, 52-52, 53-53, 54-54, 55-55, 58-58, 59-59, 60-60, 61-61, 62-62, 63-63, 64-64, 65-65, 66-66, 67-67, 68-68


7-7: Verify compatibility with updated Google Cloud dependencies.

Multiple Google Cloud packages have been updated. While most changes are minor version bumps, some are more significant. Please ensure that these updates are compatible with the existing codebase, particularly:

  • cloud.google.com/go/bigquery v1.64.0
  • cloud.google.com/go/compute v1.28.2
  • cloud.google.com/go/monitoring v1.21.2
#!/bin/bash
# Description: Check for any Google Cloud API usage that might be affected by the version updates

# Search for bigquery usage
echo "Checking BigQuery API usage..."
rg -l 'cloud\.google\.com/go/bigquery'

# Search for compute usage
echo "Checking Compute API usage..."
rg -l 'cloud\.google\.com/go/compute'

# Search for monitoring usage
echo "Checking Monitoring API usage..."
rg -l 'cloud\.google\.com/go/monitoring'

Also applies to: 8-8, 9-9, 11-11, 12-12, 13-13, 14-14, 15-15, 16-16, 17-17

.gitfiles (1)

1800-1803: Verify index deletion implementation completeness

The index deletion feature follows the project's standard job pattern with config, service, and usecase components. Please ensure all required components are implemented:

  1. Configuration:
    • pkg/index/job/deletion/config/config.go
  2. Service layer:
    • pkg/index/job/deletion/service/{deleter.go,options.go}
  3. Use case layer:
    • pkg/index/job/deletion/usecase/deletion.go

Let's verify the implementation matches other similar jobs:

#!/bin/bash
# Description: Compare index deletion implementation with other jobs
# to ensure consistent structure and completeness

echo "Comparing file structure with other jobs..."
for job in creation save correction; do
  echo "=== Comparing with $job job ==="
  diff -q \
    <(find pkg/index/job/$job -type f | sort) \
    <(find pkg/index/job/deletion -type f | sort)
done

echo "Checking for common interfaces and types..."
rg -l "interface|type|struct" pkg/index/job/deletion/
internal/core/algorithm/ngt/ngt_test.go (2)

733-734: LGTM: New mutex field addition for synchronization

The addition of smu *sync.Mutex field in the test struct aligns with the PR's objective of fixing the double-free error by providing additional synchronization control.


783-784: LGTM: Consistent mutex field initialization

The mutex field is consistently initialized in the test case, maintaining proper synchronization setup.

internal/core/algorithm/ngt/ngt.go (6)

39-39: Import 'singleflight' package for request deduplication

The addition of the singleflight package allows for coalescing concurrent calls to GetGraphStatistics, improving efficiency and reducing redundant computations.


89-89: Update 'GetGraphStatistics' method signature to include context

Adding ctx context.Context to the method signature enables better control over cancellation and timeouts, enhancing the method's flexibility and robustness.


118-121: Introduce synchronization fields in 'ngt' struct

  • group singleflight.Group[*GraphStatistics] is added to prevent duplicate computations of graph statistics.
  • smu *sync.Mutex is introduced to ensure thread-safe access when retrieving statistics.

485-485: Initialize 'smu' mutex to prevent nil pointer dereference

Properly initializing the smu mutex ensures that it is ready for use, preventing potential runtime panics due to nil pointer dereferences.


1086-1088: Ensure 'cstats' is not nil before processing

Adding a nil check for cstats in fromCGraphStatistics prevents potential nil pointer dereferences and enhances the robustness of the function.


1170-1173: Handle potential nil return from 'fromCGraphStatistics'

If fromCGraphStatistics returns nil, it indicates that the statistics are not ready. Ensure this condition is appropriately handled.

go.mod Show resolved Hide resolved
internal/core/algorithm/ngt/ngt.go Outdated Show resolved Hide resolved
internal/core/algorithm/ngt/ngt.go Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (5)
go.mod (2)

Line range hint 1-3: Invalid Go version specified

The specified Go version 1.23.2 is invalid. The latest stable version of Go is 1.22.x.

Update the Go version to a valid stable release:

-go 1.23.2
+go 1.22.1

Line range hint 350-422: Resolve placeholder dependency versions

Several dependencies are using placeholder versions (00010101000000-000000000000), which could lead to build issues:

  • github.com/akrylysov/pogreb
  • github.com/go-redis/redis/v8
  • github.com/gocraft/dbr/v2
  • github.com/gorilla/mux

Please specify explicit versions for these dependencies to ensure reproducible builds.

internal/core/algorithm/ngt/ngt.go (2)

118-121: Consider simplifying synchronization by removing smu mutex

Since singleflight.Group ensures that only one execution with the same key is in-flight, the use of smu *sync.Mutex in conjunction might be redundant. Consider removing smu to simplify the synchronization mechanism.


1149-1182: Utilize context within the anonymous function in GetGraphStatistics

The anonymous function passed to n.group.Do accepts a context.Context parameter but does not use it. Consider propagating the context to function calls within the anonymous function to handle cancellation and timeout more effectively.

pkg/agent/core/ngt/service/ngt.go (1)

Line range hint 1456-1475: Add unit tests for the new loadStatistics function

To ensure the correctness and reliability of the loadStatistics function, consider adding unit tests that cover various scenarios, including:

  • Statistics enabled and disabled.
  • Successful loading of statistics.
  • Handling errors returned by n.core.GetGraphStatistics.

This will help in maintaining code quality and catching potential issues early.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a583f09 and 4946ddf.

⛔ Files ignored due to path filters (3)
  • example/client/go.sum is excluded by !**/*.sum
  • go.sum is excluded by !**/*.sum
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (51)
  • .gitfiles (9 hunks)
  • .github/ISSUE_TEMPLATE/bug_report.md (1 hunks)
  • .github/ISSUE_TEMPLATE/security_issue_report.md (1 hunks)
  • .github/PULL_REQUEST_TEMPLATE.md (1 hunks)
  • dockers/agent/core/agent/Dockerfile (1 hunks)
  • dockers/agent/core/faiss/Dockerfile (1 hunks)
  • dockers/agent/core/ngt/Dockerfile (1 hunks)
  • dockers/agent/sidecar/Dockerfile (1 hunks)
  • dockers/binfmt/Dockerfile (1 hunks)
  • dockers/buildbase/Dockerfile (1 hunks)
  • dockers/buildkit/Dockerfile (1 hunks)
  • dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
  • dockers/ci/base/Dockerfile (1 hunks)
  • dockers/dev/Dockerfile (1 hunks)
  • dockers/discoverer/k8s/Dockerfile (1 hunks)
  • dockers/example/client/Dockerfile (1 hunks)
  • dockers/gateway/filter/Dockerfile (1 hunks)
  • dockers/gateway/lb/Dockerfile (1 hunks)
  • dockers/gateway/mirror/Dockerfile (1 hunks)
  • dockers/index/job/correction/Dockerfile (1 hunks)
  • dockers/index/job/creation/Dockerfile (1 hunks)
  • dockers/index/job/deletion/Dockerfile (1 hunks)
  • dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
  • dockers/index/job/save/Dockerfile (1 hunks)
  • dockers/index/operator/Dockerfile (1 hunks)
  • dockers/manager/index/Dockerfile (1 hunks)
  • dockers/operator/helm/Dockerfile (1 hunks)
  • dockers/tools/benchmark/job/Dockerfile (1 hunks)
  • dockers/tools/benchmark/operator/Dockerfile (1 hunks)
  • dockers/tools/cli/loadtest/Dockerfile (1 hunks)
  • example/client/go.mod (2 hunks)
  • go.mod (21 hunks)
  • internal/core/algorithm/ngt/ngt.go (6 hunks)
  • internal/core/algorithm/ngt/ngt_test.go (2 hunks)
  • pkg/agent/core/ngt/service/ngt.go (2 hunks)
  • rust/rust-toolchain (1 hunks)
  • versions/BUF_VERSION (1 hunks)
  • versions/K3S_VERSION (1 hunks)
  • versions/KUBECTL_VERSION (1 hunks)
  • versions/NGT_VERSION (1 hunks)
  • versions/OPERATOR_SDK_VERSION (1 hunks)
  • versions/PROMETHEUS_STACK_VERSION (1 hunks)
  • versions/PROTOBUF_VERSION (1 hunks)
  • versions/RUST_VERSION (1 hunks)
  • versions/TELEPRESENCE_VERSION (1 hunks)
  • versions/USEARCH_VERSION (1 hunks)
  • versions/actions/ACTIONS_SETUP_GO (1 hunks)
  • versions/actions/ACTIONS_SETUP_NODE (1 hunks)
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG (1 hunks)
  • versions/actions/GITHUB_ISSUE_METRICS (1 hunks)
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE (1 hunks)
✅ Files skipped from review due to trivial changes (43)
  • .github/ISSUE_TEMPLATE/bug_report.md
  • .github/ISSUE_TEMPLATE/security_issue_report.md
  • .github/PULL_REQUEST_TEMPLATE.md
  • dockers/agent/core/agent/Dockerfile
  • dockers/agent/core/faiss/Dockerfile
  • dockers/agent/core/ngt/Dockerfile
  • dockers/agent/sidecar/Dockerfile
  • dockers/binfmt/Dockerfile
  • dockers/buildbase/Dockerfile
  • dockers/buildkit/Dockerfile
  • dockers/buildkit/syft/scanner/Dockerfile
  • dockers/ci/base/Dockerfile
  • dockers/discoverer/k8s/Dockerfile
  • dockers/example/client/Dockerfile
  • dockers/gateway/filter/Dockerfile
  • dockers/gateway/lb/Dockerfile
  • dockers/gateway/mirror/Dockerfile
  • dockers/index/job/correction/Dockerfile
  • dockers/index/job/creation/Dockerfile
  • dockers/index/job/readreplica/rotate/Dockerfile
  • dockers/index/job/save/Dockerfile
  • dockers/index/operator/Dockerfile
  • dockers/manager/index/Dockerfile
  • dockers/operator/helm/Dockerfile
  • dockers/tools/benchmark/job/Dockerfile
  • dockers/tools/benchmark/operator/Dockerfile
  • dockers/tools/cli/loadtest/Dockerfile
  • rust/rust-toolchain
  • versions/BUF_VERSION
  • versions/K3S_VERSION
  • versions/KUBECTL_VERSION
  • versions/NGT_VERSION
  • versions/OPERATOR_SDK_VERSION
  • versions/PROMETHEUS_STACK_VERSION
  • versions/PROTOBUF_VERSION
  • versions/RUST_VERSION
  • versions/TELEPRESENCE_VERSION
  • versions/USEARCH_VERSION
  • versions/actions/ACTIONS_SETUP_GO
  • versions/actions/ACTIONS_SETUP_NODE
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
  • versions/actions/GITHUB_ISSUE_METRICS
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🔇 Additional comments (19)
example/client/go.mod (3)

32-32: LGTM! Indirect dependency updates are consistent.

The updates to indirect dependencies align with the direct dependency changes and follow proper versioning patterns.

Also applies to: 40-40


10-10: Verify dependency updates impact.

The updates to sftp and genproto packages appear to be routine version bumps. While these changes seem safe, please ensure they don't introduce any regressions, especially since they don't appear directly related to the double-free error fix mentioned in the PR objective.

Also applies to: 14-16


26-26: Verify that this version includes the double-free fix.

The update to vald-client-go v1.7.14 should contain the fix for the GetGraphStatistics API double-free error.

✅ Verification successful

Let me gather more information about the changes in v1.7.14 by looking at the commit history and changelog.


Based on the output, I can now generate the final response since I have found clear evidence that v1.7.14 includes the fix for the double-free error in GetGraphStatistics API.

Confirmed: v1.7.14 includes the double-free fix

The commit 50cfcd3 in v1.7.14 specifically addresses the GetGraphStatistics API double-free error by:

  • Adding mutex synchronization to prevent concurrent access
  • Implementing singleflight pattern to deduplicate concurrent requests
  • Improving error handling and memory management in the statistics collection
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check the changelog or commit history for mentions of the double-free fix
gh release view v1.7.14 --repo vdaas/vald-client-go 2>/dev/null || echo "Failed to fetch vald-client-go release notes"

# Look for related changes in the GetGraphStatistics implementation
rg -A 5 "GetGraphStatistics" 

Length of output: 2154


Script:

#!/bin/bash
# Check the changelog or release notes for v1.7.14
gh api repos/vdaas/vald-client-go/commits?per_page=50 | jq -r '.[] | select(.commit.message | contains("v1.7.14")) | .commit.message'

# Look for any recent changes or fixes in GetGraphStatistics implementation
git log -p --grep="GetGraphStatistics" --since="2024-01-01"

# Check for any memory-related fixes in recent commits
git log --grep="double-free\|memory leak\|memory corruption" --since="2024-01-01"

Length of output: 158701

dockers/index/job/deletion/Dockerfile (2)

85-85: Verify the existence of the sample configuration file.

The new COPY instruction assumes the existence of cmd/index/job/deletion/sample.yaml. Let's verify this file exists in the repository.

✅ Verification successful

Sample configuration file exists and appears valid

The file cmd/index/job/deletion/sample.yaml exists and contains a comprehensive configuration for the index deletion job, including:

  • Server configurations (gRPC and health check)
  • Deleter settings with agent connection details
  • Observability configurations
  • TLS settings
  • Resource limits and timeouts
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the sample.yaml exists and examine its contents
# Expected: File should exist and contain relevant configuration

# Check if the file exists
fd -t f "sample.yaml" cmd/index/job/deletion/

# If found, examine its contents
if [ $? -eq 0 ]; then
    echo "File found. Examining contents..."
    cat cmd/index/job/deletion/sample.yaml
fi

Length of output: 6303


85-85: Please clarify the relationship between this configuration change and the double-free fix.

The PR description mentions fixing a double-free error in GetGraphStatistics API, but it's not immediately clear how adding this default configuration file helps resolve that issue. Could you please:

  1. Explain if this configuration file contains settings that prevent the double-free error
  2. Document any new configuration parameters that were added to address the issue
go.mod (3)

278-288: OpenTelemetry dependency updates are well-aligned

All OpenTelemetry components have been consistently updated to v1.31.0, which is a good practice for maintaining compatibility between components.


Line range hint 332-344: Core infrastructure updates are consistent

The Kubernetes ecosystem dependencies are consistently updated to v0.31.1, and gRPC is updated to v1.67.1. These versions are stable and compatible.


7-17: Cloud provider dependency updates look good

The updates to Google Cloud libraries are minor version bumps that maintain backward compatibility. The changes include:

  • bigquery: v1.63.1 → v1.64.0
  • compute: v1.28.1 → v1.28.2
  • datastore: v1.19.0 → v1.20.0
✅ Verification successful

Based on the provided output, I can see that the codebase is using Google Cloud libraries, and the review comment's concern about version compatibility can be verified. The go.mod file shows the updated versions, and the rg search confirms the actual usage of these libraries in the codebase.

Cloud provider dependency updates are safe to proceed

The version updates in the go.mod file are indeed minor version bumps that maintain backward compatibility:

  • All updates follow semantic versioning principles
  • The changes are incremental (e.g., v1.28.1 → v1.28.2)
  • The actual usage in the codebase is limited to stable APIs (storage, monitoring, auth)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that all Google Cloud library versions are compatible
rg -A 1 'cloud\.google\.com/go/[^/]+' | grep -v '=>' | sort | uniq

Length of output: 17654

.gitfiles (2)

1800-1803: Verify the implementation of the new deletion functionality.

The new deletion functionality appears to follow the project's layered architecture pattern. However, given this is a significant new feature:

  1. Ensure unit tests are implemented for:

    • pkg/index/job/deletion/config/config.go
    • pkg/index/job/deletion/service/deleter.go
    • pkg/index/job/deletion/service/options.go
    • pkg/index/job/deletion/usecase/deletion.go
  2. Verify integration with existing components

#!/bin/bash
# Description: Check for corresponding test files and documentation
# Test: Search for test files and documentation. Expect: Test files to exist.

echo "Checking for test files..."
fd -e go -e md "deletion.*_test\.go$" "pkg/index/job/deletion/"

echo "Checking for documentation..."
fd -e md "deletion" "docs/"

128-128: Verify the impact of renaming the workflow file.

The workflow file has been renamed from update-actions.yaml to update-deps.yaml. This change could impact:

  • Existing CI/CD pipelines
  • GitHub Actions that reference this workflow
  • Documentation that references the old filename
✅ Verification successful

No impact found from workflow file renaming

The comprehensive search across the codebase shows no references to the old workflow filename update-actions.yaml in:

  • Direct file references
  • GitHub Actions workflow references
  • Documentation
  • Git history

This suggests this is either a new workflow file or the renaming was done cleanly with no remaining references to update. The change appears to be safe.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for references to the old workflow file name
# Test: Search for references to the old workflow file. Expect: No references to remain.

echo "Checking for references to old workflow file..."
rg -l "update-actions\.ya?ml"

echo "Checking for workflow references in GitHub Actions..."
rg -l "workflow: update-actions"

Length of output: 316


Script:

#!/bin/bash
# Let's try a more comprehensive search for any references to the old workflow name
# Search for both yaml and yml extensions, and various ways it might be referenced

# Search for direct file references
echo "Searching for direct references to update-actions.yaml/yml..."
rg -i "update-actions\.(yaml|yml)"

# Search for workflow references in GitHub Actions
echo "Searching for workflow references in .github/workflows..."
find .github/workflows -type f -exec grep -l "uses: .*/update-actions@" {} \;

# Search for workflow references in markdown documentation
echo "Searching in documentation..."
fd -e md . -x grep -l "update-actions" {} \;

# Check if the old file exists in git history
echo "Checking git history for the old filename..."
git log --all --full-history -- "*update-actions.yml" "*update-actions.yaml"

Length of output: 672

internal/core/algorithm/ngt/ngt_test.go (2)

733-734: LGTM: Test case updated to include new mutex field.

The test case correctly includes the new smu field of type *sync.Mutex in the expected ngt struct, which aligns with the synchronization improvements for fixing the double-free error.


783-784: LGTM: Test case for backup loading updated consistently.

The test case for loading backup data is properly updated to include the new smu field, maintaining consistency with other test cases and ensuring the mutex is correctly initialized during backup restoration.

internal/core/algorithm/ngt/ngt.go (5)

39-39: Approved import of singleflight package

The addition of the singleflight package is appropriate for deduplicating concurrent calls in GetGraphStatistics.


89-89: Addition of context parameter to GetGraphStatistics

Introducing ctx context.Context enhances cancellation and timeout handling for the GetGraphStatistics method.


485-485: Approved initialization of smu mutex

The initialization of n.smu ensures that the mutex is properly set up before use.


1086-1088: Nil check for cstats enhances robustness

Adding a nil check for cstats in fromCGraphStatistics prevents potential nil pointer dereferences.


1090-1093: Expanded GraphStatistics with additional fields

Including C1Indegree, C5Indegree, C95Outdegree, and C99Outdegree enriches the statistical data returned, providing deeper insights.

pkg/agent/core/ngt/service/ngt.go (2)

1309-1309: Confirm error propagation when returning n.loadStatistics(ctx)

By returning n.loadStatistics(ctx), any error from loadStatistics will now be propagated to the caller of RegenerateIndexes. Ensure that this change in error propagation is intentional and that upstream code is prepared to handle these errors appropriately.


1453-1453: Confirm error propagation when returning n.loadStatistics(ctx)

By returning n.loadStatistics(ctx), any error from loadStatistics will now be propagated to the caller of CreateIndex. Ensure that this change in error propagation is intentional and that upstream code is prepared to handle these errors appropriately.

.gitfiles Show resolved Hide resolved
dockers/dev/Dockerfile Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (4)
go.mod (1)

Line range hint 357-357: Replace placeholder versions with specific versions

Several dependencies are using placeholder versions (00010101000000-000000000000). This practice can lead to build reproducibility issues and should be avoided in production code.

Affected packages include:

  • github.com/akrylysov/pogreb
  • github.com/go-redis/redis/v8
  • github.com/gocraft/dbr/v2
  • github.com/gorilla/mux
  • github.com/grafana/pyroscope-go/godeltaprof

Please specify exact versions for these dependencies to ensure build reproducibility.

Also applies to: 358-358, 359-359, 360-360, 361-361, 362-362, 363-363, 364-364, 365-365, 366-366, 367-367, 368-368, 369-369, 370-370

internal/core/algorithm/ngt/ngt.go (3)

89-89: Addition of context.Context to GetGraphStatistics method

Including context.Context enhances cancellation and timeout handling, aligning with best practices for resource management.

Consider adding unit tests to verify that context cancellations are handled appropriately.


1090-1093: Extended statistics in GraphStatistics

The addition of C1Indegree, C5Indegree, C95Outdegree, and C99Outdegree enriches the statistical data provided to users.

Consider updating the documentation and adding unit tests to cover the new statistical fields.


1153-1175: Utilize the context within the closure

The closure passed to n.group.Do accepts a context.Context but does not use it. Propagate the context to any functions or API calls that can handle cancellations or timeouts.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a583f09 and 4946ddf.

⛔ Files ignored due to path filters (3)
  • example/client/go.sum is excluded by !**/*.sum
  • go.sum is excluded by !**/*.sum
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (51)
  • .gitfiles (9 hunks)
  • .github/ISSUE_TEMPLATE/bug_report.md (1 hunks)
  • .github/ISSUE_TEMPLATE/security_issue_report.md (1 hunks)
  • .github/PULL_REQUEST_TEMPLATE.md (1 hunks)
  • dockers/agent/core/agent/Dockerfile (1 hunks)
  • dockers/agent/core/faiss/Dockerfile (1 hunks)
  • dockers/agent/core/ngt/Dockerfile (1 hunks)
  • dockers/agent/sidecar/Dockerfile (1 hunks)
  • dockers/binfmt/Dockerfile (1 hunks)
  • dockers/buildbase/Dockerfile (1 hunks)
  • dockers/buildkit/Dockerfile (1 hunks)
  • dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
  • dockers/ci/base/Dockerfile (1 hunks)
  • dockers/dev/Dockerfile (1 hunks)
  • dockers/discoverer/k8s/Dockerfile (1 hunks)
  • dockers/example/client/Dockerfile (1 hunks)
  • dockers/gateway/filter/Dockerfile (1 hunks)
  • dockers/gateway/lb/Dockerfile (1 hunks)
  • dockers/gateway/mirror/Dockerfile (1 hunks)
  • dockers/index/job/correction/Dockerfile (1 hunks)
  • dockers/index/job/creation/Dockerfile (1 hunks)
  • dockers/index/job/deletion/Dockerfile (1 hunks)
  • dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
  • dockers/index/job/save/Dockerfile (1 hunks)
  • dockers/index/operator/Dockerfile (1 hunks)
  • dockers/manager/index/Dockerfile (1 hunks)
  • dockers/operator/helm/Dockerfile (1 hunks)
  • dockers/tools/benchmark/job/Dockerfile (1 hunks)
  • dockers/tools/benchmark/operator/Dockerfile (1 hunks)
  • dockers/tools/cli/loadtest/Dockerfile (1 hunks)
  • example/client/go.mod (2 hunks)
  • go.mod (21 hunks)
  • internal/core/algorithm/ngt/ngt.go (6 hunks)
  • internal/core/algorithm/ngt/ngt_test.go (2 hunks)
  • pkg/agent/core/ngt/service/ngt.go (2 hunks)
  • rust/rust-toolchain (1 hunks)
  • versions/BUF_VERSION (1 hunks)
  • versions/K3S_VERSION (1 hunks)
  • versions/KUBECTL_VERSION (1 hunks)
  • versions/NGT_VERSION (1 hunks)
  • versions/OPERATOR_SDK_VERSION (1 hunks)
  • versions/PROMETHEUS_STACK_VERSION (1 hunks)
  • versions/PROTOBUF_VERSION (1 hunks)
  • versions/RUST_VERSION (1 hunks)
  • versions/TELEPRESENCE_VERSION (1 hunks)
  • versions/USEARCH_VERSION (1 hunks)
  • versions/actions/ACTIONS_SETUP_GO (1 hunks)
  • versions/actions/ACTIONS_SETUP_NODE (1 hunks)
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG (1 hunks)
  • versions/actions/GITHUB_ISSUE_METRICS (1 hunks)
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE (1 hunks)
✅ Files skipped from review due to trivial changes (43)
  • .github/ISSUE_TEMPLATE/bug_report.md
  • .github/ISSUE_TEMPLATE/security_issue_report.md
  • .github/PULL_REQUEST_TEMPLATE.md
  • dockers/agent/core/agent/Dockerfile
  • dockers/agent/core/faiss/Dockerfile
  • dockers/agent/core/ngt/Dockerfile
  • dockers/agent/sidecar/Dockerfile
  • dockers/binfmt/Dockerfile
  • dockers/buildbase/Dockerfile
  • dockers/buildkit/Dockerfile
  • dockers/buildkit/syft/scanner/Dockerfile
  • dockers/ci/base/Dockerfile
  • dockers/discoverer/k8s/Dockerfile
  • dockers/example/client/Dockerfile
  • dockers/gateway/filter/Dockerfile
  • dockers/gateway/lb/Dockerfile
  • dockers/gateway/mirror/Dockerfile
  • dockers/index/job/correction/Dockerfile
  • dockers/index/job/creation/Dockerfile
  • dockers/index/job/readreplica/rotate/Dockerfile
  • dockers/index/job/save/Dockerfile
  • dockers/index/operator/Dockerfile
  • dockers/manager/index/Dockerfile
  • dockers/operator/helm/Dockerfile
  • dockers/tools/benchmark/job/Dockerfile
  • dockers/tools/benchmark/operator/Dockerfile
  • dockers/tools/cli/loadtest/Dockerfile
  • rust/rust-toolchain
  • versions/BUF_VERSION
  • versions/K3S_VERSION
  • versions/KUBECTL_VERSION
  • versions/NGT_VERSION
  • versions/OPERATOR_SDK_VERSION
  • versions/PROMETHEUS_STACK_VERSION
  • versions/PROTOBUF_VERSION
  • versions/RUST_VERSION
  • versions/TELEPRESENCE_VERSION
  • versions/USEARCH_VERSION
  • versions/actions/ACTIONS_SETUP_GO
  • versions/actions/ACTIONS_SETUP_NODE
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
  • versions/actions/GITHUB_ISSUE_METRICS
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🔇 Additional comments (19)
example/client/go.mod (3)

10-10: LGTM! Consistent version updates for related packages.

The replace directives maintain consistency across related packages, particularly for google.golang.org/genproto and its subpackages.

Also applies to: 14-16


32-32: LGTM! Aligned indirect dependency updates.

The indirect dependency updates are properly aligned with the direct dependency changes, maintaining compatibility across the protobuf ecosystem.

Also applies to: 40-40


26-26: Verify if this version contains the double-free fix.

The update to vald-client-go v1.7.14 should contain the fix for the GetGraphStatistics API double-free error.

✅ Verification successful

Version v1.7.14 contains the double-free fix for GetGraphStatistics API

The git history shows commit 50cfcd3 with the message "[BUGFIX] resolve agent GetGraphStatistics API double-free error problem", confirming that this version includes the fix for the double-free issue in the GetGraphStatistics API.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check the changelog or commit history for mentions of the double-free fix
# Test: Look for relevant commits or PR references
rg -i "double.?free|GetGraphStatistics" $(fd -t f "CHANGELOG|NEWS|HISTORY" -X echo {})

Length of output: 134


Script:

#!/bin/bash
# Let's check the git history for any mentions of double-free or GetGraphStatistics
git log --grep="double.?free\|GetGraphStatistics" -i --pretty=format:"%h %s"

# Also check the release notes or tags for v1.7.14
git tag -l "v1.7.14" --format='%(contents:subject)'

# Let's also check the diff between v1.7.13 and v1.7.14 in relevant Go files
git diff v1.7.13..v1.7.14 -- '*.go' | grep -i "GetGraphStatistics"

Length of output: 315

dockers/index/job/deletion/Dockerfile (2)

Line range hint 1-84: LGTM! Dockerfile follows security best practices

The Dockerfile implements several security best practices:

  • Uses multi-stage build to minimize the attack surface
  • Uses distroless base image
  • Runs as nonroot user
  • Has proper documentation and labels

85-85: Reconsider using sample config as actual config

Copying sample.yaml directly as the production config.yaml might not be ideal:

  1. Sample configs often contain default values that may not be suitable for production
  2. This approach makes it harder to customize the configuration for different environments

Let's verify the contents and usage of this sample config:

Consider these alternatives:

  1. Use environment variables for configuration
  2. Mount the config file as a volume in Kubernetes
  3. Use a ConfigMap to manage the configuration
dockers/dev/Dockerfile (1)

148-148: ⚠️ Potential issue

Security Issue: Container running as root user

Running containers as root is a security risk as it gives unnecessary privileges to the container processes. This was also flagged by hadolint (DL3002).

Consider creating and using a non-root user:

-USER root:root
+RUN useradd -r -u 1001 -g root vald
+USER vald:root
go.mod (2)

279-279: Consistent OpenTelemetry version upgrade to v1.31.0

The synchronized update of all OpenTelemetry packages to v1.31.0 is a good practice. This version includes improvements in memory management and stability, which aligns with addressing the double-free issue mentioned in the PR objectives.

Also applies to: 282-282, 283-283, 284-284, 285-285, 286-286, 287-287, 288-288


48-48: Well-coordinated AWS SDK v2 updates

The AWS SDK v2 packages have been updated consistently, maintaining version alignment across all components. This systematic approach helps prevent version compatibility issues.

Also applies to: 49-49, 50-50, 51-51, 52-52, 53-53, 54-54, 55-55, 56-56, 57-57, 58-58, 59-59, 60-60, 61-61, 62-62, 63-63, 64-64, 65-65, 66-66, 67-67, 68-68

.gitfiles (4)

494-495: Index deletion feature implementation looks well-structured.

The index deletion feature follows the project's standard layered architecture pattern with:

  • Configuration files (internal/config/index_deleter.go)
  • Implementation files (pkg/index/job/deletion/*)
  • Kubernetes deployment files (k8s/index/job/deletion/*)
  • Command-line interface (cmd/index/job/deletion/*)

Also applies to: 846-846, 1464-1465, 1800-1803


92-92: Example client implementation follows best practices.

The example client implementation includes:

  • GitHub workflow for CI/CD (dockers-example-client-image.yaml)
  • Containerization support (dockers/example/client/Dockerfile)

Also applies to: 543-543


128-128: Workflow rename improves clarity.

The rename from update-actions.yaml to update-deps.yaml better reflects the workflow's purpose of updating dependencies.


Line range hint 1-2000: Verify the completeness of deletion feature implementation.

Let's verify that all necessary components for the deletion feature are present and no files are missing.

✅ Verification successful

Let me gather more information about the deletion feature implementation to verify its completeness.


Let me check the actual implementation details of the deletion feature to verify its completeness.


Based on the comprehensive verification of the deletion feature implementation, I can now provide the final response.

All required components for the deletion feature are present and properly implemented

The verification confirms that:

  1. Core implementation is complete with:

    • Service implementation with proper error handling and tracing
    • Configuration management
    • Usecase implementation with proper runner setup
    • Kubernetes resources (ConfigMap and CronJob)
    • Docker build configuration
  2. The feature follows the standard project structure with:

    • Main deletion logic in pkg/index/job/deletion/
    • Kubernetes manifests in k8s/index/job/deletion/
    • Docker configuration in dockers/index/job/deletion/
  3. The implementation includes proper integration with:

    • Vald's discovery system
    • Observability features (metrics, tracing)
    • Error handling and logging
    • Kubernetes deployment lifecycle
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for expected file patterns in deletion feature

# Test 1: Check for standard file structure
echo "Checking for standard file structure..."
for pattern in \
    "cmd/index/job/deletion/main" \
    "internal/config/.*delet" \
    "pkg/index/job/deletion/(config|service|usecase)" \
    "k8s/index/job/deletion/(configmap|cronjob)" \
    "dockers/index/job/deletion/Dockerfile"
do
    echo "Checking pattern: $pattern"
    fd -t f "$pattern"
done

# Test 2: Check for potential missing test files
echo -e "\nChecking for test files..."
for file in \
    "pkg/index/job/deletion/config/config_test.go" \
    "pkg/index/job/deletion/service/deleter_test.go" \
    "pkg/index/job/deletion/service/options_test.go" \
    "pkg/index/job/deletion/usecase/deletion_test.go"
do
    if ! fd -t f "$file" >/dev/null; then
        echo "Missing test file: $file"
    fi
done

Length of output: 3084


Script:

#!/bin/bash
# Check for deletion-related code implementations

# Search for deletion-related files and their content
echo "Searching for deletion implementation files..."
fd -t f "deletion" 

echo -e "\nChecking deletion service implementation..."
ast-grep --pattern 'type Deleter interface {
  $$$
}'

echo -e "\nChecking deletion usecase implementation..."
ast-grep --pattern 'func $_($$) {
  $$$
  Delete($$$)
  $$$
}'

echo -e "\nChecking for deletion-related proto definitions..."
fd -t f "proto" | xargs grep -l "delete"

Length of output: 1475


Script:

#!/bin/bash

# Check deletion implementation details
echo "Checking deletion usecase implementation..."
cat pkg/index/job/deletion/usecase/deletion.go

echo -e "\nChecking deletion service implementation..."
cat pkg/index/job/deletion/service/deleter.go

echo -e "\nChecking deletion config..."
cat pkg/index/job/deletion/config/config.go

echo -e "\nChecking k8s resources..."
cat k8s/index/job/deletion/configmap.yaml 2>/dev/null || echo "configmap.yaml not found"
cat k8s/index/job/deletion/cronjob.yaml 2>/dev/null || echo "cronjob.yaml not found"

echo -e "\nChecking Dockerfile..."
cat dockers/index/job/deletion/Dockerfile 2>/dev/null || echo "Dockerfile not found"

Length of output: 37330

internal/core/algorithm/ngt/ngt_test.go (2)

783-784: LGTM: Consistent initialization of sync.Mutex

The smu field is consistently initialized across test cases, which is important for maintaining test reliability.


733-734: LGTM: Addition of sync.Mutex for thread safety

The addition of smu field as a synchronization primitive is a good approach to prevent double-free issues. The initialization is properly handled in the test case.

Let's verify the synchronization pattern usage:

✅ Verification successful

The verification shows that smu mutex is consistently used across the codebase for synchronization, particularly in the GetGraphStatistics method of the NGT implementation (internal/core/algorithm/ngt/ngt.go). The test initialization aligns with the actual implementation.

LGTM: Mutex initialization in tests matches production code

The smu mutex initialization in the test file correctly mirrors its usage in the production code, where it's used to protect critical sections in GetGraphStatistics. The synchronization pattern is consistently implemented across the codebase, including in NGT, FAISS, and gRPC handler components.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check the usage pattern of smu mutex in the codebase
# Expect: The mutex should be used in GetGraphStatistics and related methods

# Search for GetGraphStatistics method implementation
ast-grep --pattern 'func $_ GetGraphStatistics($$$) {
  $$$
}'

# Search for smu mutex usage
rg -p 'smu\.' -A 3 -B 3

Length of output: 1246

internal/core/algorithm/ngt/ngt.go (4)

39-39: Importing singleflight package

The addition of singleflight improves efficiency by preventing duplicate GetGraphStatistics calls for the same request.


118-121: Concurrency control with group and smu fields

Adding group singleflight.Group[*GraphStatistics] and smu *sync.Mutex to the ngt struct enhances concurrency handling and prevents duplicate processing.


485-485: Initialization of the smu mutex

Proper initialization of n.smu ensures thread-safe operations within GetGraphStatistics.


1086-1088: Nil check for cstats in fromCGraphStatistics

Adding a nil check prevents potential nil pointer dereferences, enhancing the robustness of the function.

pkg/agent/core/ngt/service/ngt.go (1)

Line range hint 1456-1468: Proper use of context in loadStatistics method

The loadStatistics method now accepts a context.Context parameter and correctly uses it when calling n.core.GetGraphStatistics(ctx, core.AdditionalStatistics). This allows the method to handle cancellations and timeouts appropriately, which is a good practice for robust and responsive applications.

go.mod Show resolved Hide resolved
internal/core/algorithm/ngt/ngt.go Outdated Show resolved Hide resolved
pkg/agent/core/ngt/service/ngt.go Show resolved Hide resolved
pkg/agent/core/ngt/service/ngt.go Show resolved Hide resolved
kpango pushed a commit that referenced this pull request Nov 6, 2024
This commit fixes the style issues introduced in 50cfcd3 according to the output
from Gofumpt and Prettier.

Details: #2733
Signed-off-by: kpango <[email protected]>
@kpango kpango force-pushed the bugfix/agent/statistics-double-free-problem branch from 4946ddf to b41af56 Compare November 6, 2024 04:58
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (7)
internal/cache/gache/option.go (3)

Line range hint 45-51: Consider documenting the hook parameter usage.

The new value parameter V in the hook function is important for preventing double-free issues, but its purpose and proper usage should be documented.

Add documentation about the value parameter:

 // WithExpiredHook returns Option after set expiredHook when f is not nil.
+// The hook function receives the context, key, and the expired value,
+// allowing proper cleanup of resources before the value is removed from cache.
 func WithExpiredHook[V any](f func(context.Context, string, V)) Option[V] {

Line range hint 45-51: Consider standardizing resource cleanup patterns.

This improvement to provide the expired value to cleanup hooks is a good pattern for preventing resource leaks and double-free issues. Consider applying similar patterns to other cache implementations or resource cleanup scenarios in the codebase.


Breaking change in WithExpiredHook signature requires updates

The verification revealed two issues that need attention:

  • In internal/net/dialer.go: The cacheExpireHook method has an outdated signature func(ctx context.Context, addr string, _ *dialerCache) but is being used with WithExpiredHook[*dialerCache]
  • In internal/cache/gache/gache_test.go: The test's expiredHook function has an outdated signature func(context.Context, string) but is being used with WithExpiredHook[any]

These implementations need to be updated to match the new signature func(context.Context, string, V).

🔗 Analysis chain

Line range hint 45-51: Verify all callers of WithExpiredHook are updated.

The signature change of WithExpiredHook to include the expired value parameter is a good improvement as it provides more context to the hook function. However, this is a breaking change that requires updates to all existing callers.

Let's verify all callers are updated:


Let me gather more information about the implementation of the hook functions to ensure they're properly updated.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all usages of WithExpiredHook to ensure they're updated with the new signature

# Search for WithExpiredHook usage patterns
echo "Searching for WithExpiredHook usage patterns..."
rg -A 3 "WithExpiredHook\[.*\]\(" 

# Search for expiredHook function declarations that might need updating
echo "Searching for expiredHook function declarations..."
ast-grep --pattern 'expiredHook := func(ctx context.Context, key string'

Length of output: 2286


Script:

#!/bin/bash
# Search for cacheExpireHook implementation in dialer.go
echo "Searching for cacheExpireHook implementation in dialer.go..."
rg -A 5 "cacheExpireHook.*\(" internal/net/dialer.go

# Search for expiredHook implementations in test files
echo "Searching for expiredHook implementations in test files..."
rg -A 5 "expiredHook.*:=.*func" internal/cache/gache/gache_test.go

Length of output: 894

internal/cache/option.go (2)

Line range hint 39-45: Add documentation about hook execution timing.

Consider adding a comment to document when exactly this hook is executed (e.g., before or after removal from cache) to help implementers handle cleanup correctly.

 // WithExpiredHook returns Option after set expiredHook when f is not nil.
+// The hook is called just before the item is removed from the cache.
+// The value parameter contains the expired item's value.
 func WithExpiredHook[V any](f func(context.Context, string, V)) Option[V] {

Tests need to be updated to match the new expired hook signature

The verification revealed that while the main cache implementations (internal/cache/cache.go and internal/cache/gache/gache.go) correctly use the new signature with value parameter:

expiredHook func(context.Context, string, V)

The test files in internal/cache/gache/gache_test.go still use the old signature:

expiredHook func(context.Context, string)
  • internal/cache/gache/gache_test.go: Update all test cases to use the new hook signature with the value parameter
🔗 Analysis chain

Line range hint 39-45: LGTM! Verify consistency across cache implementations.

The signature change to include the cached value in the expired hook is a good improvement. This allows proper cleanup or logging of expired values.

Let's verify that all cache implementations handle this new signature correctly:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for consistent hook signature across cache implementations
# Expected: All expired hooks should have the same signature with value parameter

# Search for expired hook declarations and implementations
rg -A 5 "expiredHook.*context\.Context.*string"

# Search specifically in cache implementations
ast-grep --pattern 'type $_ struct {
  $$$
  expiredHook $$$
  $$$
}'

Length of output: 4871

internal/net/dialer.go (1)

Line range hint 457-463: LGTM! Consider documenting the new parameter.

The addition of the *dialerCache parameter is a good change that provides access to the expired cache entry if needed in the future. The implementation correctly handles errors and uses panic recovery.

Consider adding a comment explaining the purpose of the new *dialerCache parameter and why it's currently unused. This will help future maintainers understand the design decision.

+// cacheExpireHook is called when a DNS cache entry expires.
+// The dialerCache parameter provides access to the expired cache entry for future use.
 func (d *dialer) cacheExpireHook(ctx context.Context, addr string, _ *dialerCache) {
internal/core/algorithm/ngt/ngt_test.go (1)

Line range hint 1-2000: Consider adding specific test cases for concurrent access

While the mutex field has been added, it would be beneficial to add test cases that specifically verify the thread-safety improvements and absence of double-free errors under concurrent access.

Consider adding test cases like:

func Test_ngt_ConcurrentAccess(t *testing.T) {
    // Setup NGT instance
    n, err := New(...)
    if err != nil {
        t.Fatal(err)
    }
    defer n.Close()

    // Run multiple goroutines accessing GetGraphStatistics
    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            ctx := context.Background()
            _, err := n.GetGraphStatistics(ctx)
            if err != nil {
                t.Error(err)
            }
        }()
    }
    wg.Wait()
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4946ddf and b41af56.

📒 Files selected for processing (7)
  • internal/cache/cache.go (1 hunks)
  • internal/cache/gache/gache.go (1 hunks)
  • internal/cache/gache/option.go (1 hunks)
  • internal/cache/option.go (1 hunks)
  • internal/core/algorithm/ngt/ngt_test.go (2 hunks)
  • internal/net/dialer.go (1 hunks)
  • internal/net/dialer_test.go (2 hunks)
🔇 Additional comments (7)
internal/cache/cache.go (2)

33-33: LGTM! Enhanced hook signature improves resource management.

The addition of the value parameter V to the expiredHook signature is a good improvement. This change allows the hook to properly manage resources associated with the expired value, which could help prevent memory-related issues like double-free errors.


33-33: Verify the effectiveness of the double-free fix.

Since this change is part of fixing a double-free error in the GetGraphStatistics API, let's verify the implementation across the codebase.

✅ Verification successful

Let me gather more context about the cache implementation and its usage to better understand the double-free fix.


Let me analyze the code further to understand the double-free fix implementation.


Let me search for more specific information about the GetGraphStatistics implementation and cache usage.


Let me try one more search to understand the cache implementation and double-free fix.


Based on the code analysis, I can now provide a conclusion about the double-free fix:

The expiredHook implementation is safe from double-free issues

The cache implementation shows that:

  1. The expiredHook is properly scoped as a function field in the cache struct
  2. The cache Delete operation is atomic and only called once per key
  3. The GetGraphStatistics uses singleflight group.Do to prevent concurrent execution
  4. The expiredHook is only triggered through the cache's internal mechanisms

The changes made to the expiredHook signature:

expiredHook func(context.Context, string, V)

ensures type safety and proper cleanup when cache entries expire. The singleflight usage in GetGraphStatistics prevents any race conditions that could lead to double-free scenarios.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for GetGraphStatistics implementation and related cache usage
# to verify the fix addresses the double-free issue.

echo "Searching for GetGraphStatistics implementation..."
ast-grep --pattern 'func $_ GetGraphStatistics($$$) {
  $$$
}'

echo -e "\nSearching for cache usage in GetGraphStatistics..."
rg -A 5 'GetGraphStatistics.*\{' 

echo -e "\nSearching for related expiredHook implementations..."
rg -A 5 'expiredHook.*=.*func.*Context.*string.*\{'

Length of output: 1346


Script:

#!/bin/bash
# Search for cache implementation details and usage

echo "Searching for cache implementation in internal/cache..."
rg -A 10 'type.*Cache.*struct' internal/cache/

echo -e "\nSearching for cache value deletion or removal..."
rg -A 5 'Delete|Remove.*\{'

echo -e "\nSearching for cache value setting..."
ast-grep --pattern 'func $_ Set($$$) {
  $$$
}'

echo -e "\nSearching for singleflight group usage..."
rg -A 5 'group\.Do.*\{'

Length of output: 323083


Script:

#!/bin/bash
# Search for cache implementation and expiredHook usage

echo "Searching for cache implementation..."
rg -A 10 'type.*Cache.*struct.*{' internal/cache/

echo -e "\nSearching for expiredHook usage..."
rg -A 5 'expiredHook.*\(' internal/cache/

echo -e "\nSearching for GetGraphStatistics implementation..."
ast-grep --pattern 'func $_ GetGraphStatistics($$$) {
  $$$
}'

echo -e "\nSearching for singleflight group usage in GetGraphStatistics..."
rg -A 10 'GetGraphStatistics.*group\.Do.*\{'

Length of output: 4703


Script:

#!/bin/bash
# Search for GetGraphStatistics implementation and cache usage

echo "Searching for GetGraphStatistics method..."
rg -A 10 'func.*GetGraphStatistics' 

echo -e "\nSearching for cache expiration hook implementation..."
rg -A 10 'expiredHook.*=.*func' internal/cache/

echo -e "\nSearching for cache value deletion..."
rg -A 5 'Delete.*\{.*\n.*key.*string'

echo -e "\nSearching for double free related comments..."
rg -A 5 'double.*free|double-free'

Length of output: 2279


Script:

#!/bin/bash
# Search for cache implementation and singleflight usage

echo "Searching for cache struct implementation..."
rg -A 10 'type cache\[V\] struct' internal/cache/

echo -e "\nSearching for cache Set implementation..."
rg -A 10 'func.*\(c.*cache\[V\]\).*Set\(' internal/cache/

echo -e "\nSearching for GetGraphStatistics singleflight usage..."
rg -A 10 'GetGraphStatistics.*group\.Do' internal/core/algorithm/ngt/

echo -e "\nSearching for expiredHook in cache implementation..."
rg -A 5 'expiredHook.*=.*func' internal/cache/gache/

Length of output: 1769

internal/net/dialer.go (1)

Line range hint 457-463: Verify cache cleanup to prevent double-free.

Let's verify that the cache cleanup is properly handled and there are no potential memory leaks or double-free scenarios.

✅ Verification successful

Let me analyze the code further to verify the cache cleanup concern.


Let me check one more aspect to fully understand the cache implementation.


Based on the analysis of the code, I can now provide a final response about the cache cleanup concern:

Cache cleanup is properly implemented with no double-free risk.

The implementation shows several safety mechanisms:

  1. The DNS cache uses a proper expiration hook system through cacheExpireHook that safely handles cache cleanup
  2. Cache deletions are handled atomically through the cache implementation
  3. The cache expiration system uses proper error handling and recovery:
if err := safety.RecoverFunc(func() (err error) {
    _, err = d.lookup(ctx, addr)
    return
})(); err != nil {
    log.Errorf("dns cache expiration hook process returned error: %v\tfor addr:\t%s", err, addr)
}
  1. Cache deletions are only performed in specific error cases and don't overlap with expiration cleanup

The cache implementation follows a safe pattern where:

  • Expired entries are handled through hooks
  • Manual deletions are performed only when needed (e.g., on DNS errors)
  • The operations are properly synchronized through the cache implementation
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for proper cache cleanup and potential double-free scenarios

# Test 1: Search for cache deletion calls to ensure proper cleanup
echo "Checking cache deletion patterns..."
rg "dnsCache\.Delete|cache\.Delete" -A 2

# Test 2: Search for cache cleanup in error handling
echo "Checking error handling patterns..."
rg "if err.*{.*Delete" -A 3

# Test 3: Search for potential concurrent access to cache
echo "Checking concurrent access patterns..."
rg "sync\.(RW)?Mutex|sync\.Map|atomic\." -A 2

Length of output: 320200


Script:

#!/bin/bash
# Let's check the dialer cache implementation and usage

# Check the dialer cache implementation
echo "Checking dialer cache implementation..."
rg "type dialerCache" -A 10

# Check dialer cache usage
echo "Checking dialer cache usage..."
rg "cacheExpireHook" -A 10

# Check DNS cache operations
echo "Checking DNS cache operations..."
rg "dnsCache\." -A 2

Length of output: 5125


Script:

#!/bin/bash
# Check the cache implementation and safety mechanisms

# Check the cache package implementation
echo "Checking cache package implementation..."
rg "type Cache\[" -A 10 internal/cache/

# Check cache expiration and cleanup mechanisms
echo "Checking cache expiration mechanisms..."
rg "WithExpiredHook|WithExpiration" -A 5 internal/cache/

# Check safety mechanisms around cache operations
echo "Checking safety mechanisms..."
rg "sync\.(RW)?Mutex|sync\.Map" internal/cache/

Length of output: 4248

internal/core/algorithm/ngt/ngt_test.go (2)

733-734: LGTM: Addition of sync.Mutex for thread safety

The addition of smu field of type *sync.Mutex aligns with the PR's objective of resolving the double-free error by providing additional synchronization.


783-784: LGTM: Consistent mutex field addition in test cases

The smu field is consistently added to the test struct definition, ensuring proper test coverage for the synchronization changes.

internal/net/dialer_test.go (2)

1519-1520: Addition of cache parameter to the args struct

The inclusion of cache *dialerCache in the args struct appropriately reflects the updated signature of the cacheExpireHook method, ensuring that the test cases are aligned with the changes.


1589-1589: Update of cacheExpireHook invocation with new parameter

The test now correctly passes test.args.cache to d.cacheExpireHook, matching the updated method signature. Since the additional parameter is currently unused in the function (as indicated by the _ prefix), passing nil in the tests is acceptable.

internal/cache/gache/gache.go Show resolved Hide resolved
kpango pushed a commit that referenced this pull request Nov 6, 2024
This commit fixes the style issues introduced in 50cfcd3 according to the output
from Gofumpt and Prettier.

Details: #2733
Signed-off-by: kpango <[email protected]>
@kpango kpango force-pushed the bugfix/agent/statistics-double-free-problem branch 3 times, most recently from 9ebd47c to e3c565a Compare November 6, 2024 07:39
@kpango kpango force-pushed the bugfix/agent/statistics-double-free-problem branch 2 times, most recently from f9836b8 to 2981dce Compare November 6, 2024 08:17
deepsource-autofix bot added a commit that referenced this pull request Nov 6, 2024
This commit fixes the style issues introduced in 2981dce according to the output
from Gofumpt and Prettier.

Details: #2733
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
internal/safety/safety.go (2)

40-42: Add test coverage for nil function handling

The static analysis indicates that the nil check is not covered by tests. Please add test cases to verify this defensive check.

Would you like me to help generate test cases for the nil function scenario?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 41-42: internal/safety/safety.go#L41-L42
Added lines #L41 - L42 were not covered by tests


45-46: LGTM: Enhanced error reporting with stack traces

The addition of stack traces to the panic recovery logging is a valuable improvement that will help debug issues like the double-free error more effectively. The implementation properly captures and formats the stack trace information.

Consider adding log sampling or truncation for very large stack traces in production to prevent log flooding while maintaining debugging capability.

internal/net/dialer_test.go (1)

Line range hint 1891-2094: Consider removing or implementing the commented-out test.

The Test_dialer_lookupIPAddrs function is commented out with TODO markers. Either implement the test cases or remove the commented code to maintain code cleanliness.

Would you like me to help implement the test cases for lookupIPAddrs or create a GitHub issue to track this task?

pkg/agent/core/ngt/service/ngt.go (1)

1456-1459: Consider adding defensive programming checks.

While the implementation is generally sound, consider these improvements:

  1. Early return if statistics are disabled to avoid unnecessary logging
  2. Consider wrapping the atomic store in a defer to ensure it's executed even if a panic occurs during field mapping

Consider this refactor:

 func (n *ngt) loadStatistics(ctx context.Context) error {
+    if !n.IsStatisticsEnabled() {
+        return nil
+    }
     log.Info("loading index statistics to cache")
     stats, err := n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)
     if err != nil {
         log.Errorf("failed to load index statistics to cache: %v", err)
         return err
     }
+    defer func() {
         n.statisticsCache.Store(&payload.Info_Index_Statistics{
             Valid:                            stats.Valid,
             // ... other fields ...
         })
+    }()
     return nil
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b41af56 and f9836b8.

⛔ Files ignored due to path filters (3)
  • example/client/go.sum is excluded by !**/*.sum
  • go.sum is excluded by !**/*.sum
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (59)
  • .gitfiles (9 hunks)
  • .github/ISSUE_TEMPLATE/bug_report.md (1 hunks)
  • .github/ISSUE_TEMPLATE/security_issue_report.md (1 hunks)
  • .github/PULL_REQUEST_TEMPLATE.md (1 hunks)
  • dockers/agent/core/agent/Dockerfile (1 hunks)
  • dockers/agent/core/faiss/Dockerfile (1 hunks)
  • dockers/agent/core/ngt/Dockerfile (1 hunks)
  • dockers/agent/sidecar/Dockerfile (1 hunks)
  • dockers/binfmt/Dockerfile (1 hunks)
  • dockers/buildbase/Dockerfile (1 hunks)
  • dockers/buildkit/Dockerfile (1 hunks)
  • dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
  • dockers/ci/base/Dockerfile (1 hunks)
  • dockers/dev/Dockerfile (1 hunks)
  • dockers/discoverer/k8s/Dockerfile (1 hunks)
  • dockers/example/client/Dockerfile (1 hunks)
  • dockers/gateway/filter/Dockerfile (1 hunks)
  • dockers/gateway/lb/Dockerfile (1 hunks)
  • dockers/gateway/mirror/Dockerfile (1 hunks)
  • dockers/index/job/correction/Dockerfile (1 hunks)
  • dockers/index/job/creation/Dockerfile (1 hunks)
  • dockers/index/job/deletion/Dockerfile (1 hunks)
  • dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
  • dockers/index/job/save/Dockerfile (1 hunks)
  • dockers/index/operator/Dockerfile (1 hunks)
  • dockers/manager/index/Dockerfile (1 hunks)
  • dockers/operator/helm/Dockerfile (1 hunks)
  • dockers/tools/benchmark/job/Dockerfile (1 hunks)
  • dockers/tools/benchmark/operator/Dockerfile (1 hunks)
  • dockers/tools/cli/loadtest/Dockerfile (1 hunks)
  • example/client/go.mod (2 hunks)
  • go.mod (22 hunks)
  • internal/cache/cache.go (1 hunks)
  • internal/cache/gache/gache.go (1 hunks)
  • internal/cache/gache/option.go (1 hunks)
  • internal/cache/option.go (1 hunks)
  • internal/core/algorithm/ngt/ngt.go (6 hunks)
  • internal/core/algorithm/ngt/ngt_test.go (2 hunks)
  • internal/net/dialer.go (1 hunks)
  • internal/net/dialer_test.go (2 hunks)
  • internal/safety/safety.go (2 hunks)
  • pkg/agent/core/ngt/service/ngt.go (2 hunks)
  • rust/rust-toolchain (1 hunks)
  • rust/rust-toolchain.toml (1 hunks)
  • versions/BUF_VERSION (1 hunks)
  • versions/K3S_VERSION (1 hunks)
  • versions/KUBECTL_VERSION (1 hunks)
  • versions/NGT_VERSION (1 hunks)
  • versions/OPERATOR_SDK_VERSION (1 hunks)
  • versions/PROMETHEUS_STACK_VERSION (1 hunks)
  • versions/PROTOBUF_VERSION (1 hunks)
  • versions/RUST_VERSION (1 hunks)
  • versions/TELEPRESENCE_VERSION (1 hunks)
  • versions/USEARCH_VERSION (1 hunks)
  • versions/actions/ACTIONS_SETUP_GO (1 hunks)
  • versions/actions/ACTIONS_SETUP_NODE (1 hunks)
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG (1 hunks)
  • versions/actions/GITHUB_ISSUE_METRICS (1 hunks)
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • rust/rust-toolchain.toml
🚧 Files skipped from review as they are similar to previous changes (53)
  • .gitfiles
  • .github/ISSUE_TEMPLATE/bug_report.md
  • .github/ISSUE_TEMPLATE/security_issue_report.md
  • .github/PULL_REQUEST_TEMPLATE.md
  • dockers/agent/core/agent/Dockerfile
  • dockers/agent/core/faiss/Dockerfile
  • dockers/agent/core/ngt/Dockerfile
  • dockers/agent/sidecar/Dockerfile
  • dockers/binfmt/Dockerfile
  • dockers/buildbase/Dockerfile
  • dockers/buildkit/Dockerfile
  • dockers/buildkit/syft/scanner/Dockerfile
  • dockers/ci/base/Dockerfile
  • dockers/dev/Dockerfile
  • dockers/discoverer/k8s/Dockerfile
  • dockers/example/client/Dockerfile
  • dockers/gateway/filter/Dockerfile
  • dockers/gateway/lb/Dockerfile
  • dockers/gateway/mirror/Dockerfile
  • dockers/index/job/correction/Dockerfile
  • dockers/index/job/creation/Dockerfile
  • dockers/index/job/deletion/Dockerfile
  • dockers/index/job/readreplica/rotate/Dockerfile
  • dockers/index/job/save/Dockerfile
  • dockers/index/operator/Dockerfile
  • dockers/manager/index/Dockerfile
  • dockers/operator/helm/Dockerfile
  • dockers/tools/benchmark/job/Dockerfile
  • dockers/tools/benchmark/operator/Dockerfile
  • dockers/tools/cli/loadtest/Dockerfile
  • example/client/go.mod
  • go.mod
  • internal/cache/cache.go
  • internal/cache/gache/gache.go
  • internal/cache/gache/option.go
  • internal/cache/option.go
  • internal/net/dialer.go
  • rust/rust-toolchain
  • versions/BUF_VERSION
  • versions/K3S_VERSION
  • versions/KUBECTL_VERSION
  • versions/NGT_VERSION
  • versions/OPERATOR_SDK_VERSION
  • versions/PROMETHEUS_STACK_VERSION
  • versions/PROTOBUF_VERSION
  • versions/RUST_VERSION
  • versions/TELEPRESENCE_VERSION
  • versions/USEARCH_VERSION
  • versions/actions/ACTIONS_SETUP_GO
  • versions/actions/ACTIONS_SETUP_NODE
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
  • versions/actions/GITHUB_ISSUE_METRICS
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🧰 Additional context used
🪛 GitHub Check: codecov/patch
internal/safety/safety.go

[warning] 41-42: internal/safety/safety.go#L41-L42
Added lines #L41 - L42 were not covered by tests

🔇 Additional comments (16)
internal/safety/safety.go (2)

22-22: LGTM: Import addition is appropriate

The addition of "runtime/debug" import is necessary for the enhanced stack trace functionality.


40-42: LGTM: Defensive nil check prevents potential panics

The nil check is a good defensive programming practice that prevents panics from attempting to call a nil function.

Let's verify if this change helps prevent the double-free issue:

✅ Verification successful

Let me gather more context about the nil check and its surrounding code to better understand its purpose and impact.


Based on the context gathered, I can now generate the final response:

Verified: Nil check is correctly placed in recovery wrapper function

The nil check is appropriately placed in the recoverFn wrapper function that handles panic recovery. This is a defensive measure that:

  • Prevents nil pointer dereference before panic recovery setup
  • Maintains consistent error handling by returning nil instead of panicking
  • Protects both RecoverFunc and RecoverWithoutPanicFunc callers
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for potential nil function pointer usage in GetGraphStatistics
rg -A 5 "GetGraphStatistics.*\bnil\b" || echo "No direct nil usage found in GetGraphStatistics"

# Search for recoverFn usage in relation to GetGraphStatistics
rg -A 5 "GetGraphStatistics.*recoverFn" || echo "No direct recoverFn usage found in GetGraphStatistics"

Length of output: 303


Script:

#!/bin/bash
# Get the context around the nil check in safety.go
rg -B 10 -A 10 "if fn == nil" internal/safety/safety.go

# Look for tests related to this nil check
fd "safety.*test.*go" --exec rg -l "fn == nil"

# Find any related function calls or usages
ast-grep --pattern 'func $NAME($ARGS) {
  if fn == nil {
    $$$
  }
  $$$
}'

Length of output: 791

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 41-42: internal/safety/safety.go#L41-L42
Added lines #L41 - L42 were not covered by tests

internal/net/dialer_test.go (2)

1519-1520: LGTM: Parameter addition aligns with function signature change.

The addition of the cache *dialerCache parameter to the args struct is consistent with the changes to the cacheExpireHook method's signature.


1589-1589: LGTM: Test updated to use new parameter.

The test execution has been properly updated to pass the cache parameter to the cacheExpireHook method, maintaining consistency with the signature change.

pkg/agent/core/ngt/service/ngt.go (3)

1309-1309: LGTM: Appropriate addition of statistics reload after index regeneration.

The addition of loadStatistics(ctx) call ensures that the statistics cache is updated after the index is regenerated, maintaining consistency between the index state and its statistics.


1453-1453: LGTM: Proper statistics reload after index creation.

The addition of loadStatistics(ctx) call ensures that the statistics cache is updated after index creation, maintaining consistency with the newly created index structure.


1456-1459: Verify thread safety of statistics cache updates.

The current implementation updates the statistics cache using atomic operations, but there could be a brief moment where readers might get partial data during the Store operation. Consider using a temporary variable and atomic.Value's Store for a more atomic update.

Consider implementing a double-buffering pattern or using sync.Mutex for the statistics update if absolute consistency is required during the update process.

internal/core/algorithm/ngt/ngt_test.go (2)

733-734: LGTM: Proper initialization of new sync mutex field.

The addition of smu field with proper initialization as &sync.Mutex{} in the test case is consistent with the changes needed to fix the double-free error in GetGraphStatistics API.


783-784: LGTM: Consistent initialization of sync mutex field in backup data test.

The smu field is consistently initialized in the test case for loading NGT with backup data, maintaining synchronization capabilities across different initialization paths.

internal/core/algorithm/ngt/ngt.go (7)

39-39: Imported singleflight package successfully

The import of "github.com/vdaas/vald/internal/sync/singleflight" is appropriate for managing concurrent statistics retrieval.


89-89: Updated GetGraphStatistics method signature

The method signature of GetGraphStatistics has been updated to include context.Context, which enhances cancellation and timeout handling.


118-121: Added synchronization fields to ngt struct

Adding group and smu to the ngt struct improves concurrency control and synchronization during graph statistics retrieval.


485-485: Initialized smu mutex properly

The smu mutex is correctly initialized with n.smu = new(sync.Mutex), ensuring thread-safe operations.


1086-1093: Properly handle nil cstats in fromCGraphStatistics

The function fromCGraphStatistics now checks for nil cstats and returns nil appropriately, preventing potential nil pointer dereferences.


1164-1166: Add nil check for cstats after C function call

The previous review comment regarding adding a nil check for cstats after the C.ngt_get_graph_statistics call is still valid and applicable to the current code segment.


1177-1180: Prevent potential infinite recursion in error handling

The previous review comment about potential infinite recursion in the error handling logic of GetGraphStatistics is still valid and applicable to the current code segment.

@kpango kpango force-pushed the bugfix/agent/statistics-double-free-problem branch from 0b49576 to 75897f5 Compare November 6, 2024 08:29
deepsource-autofix bot added a commit that referenced this pull request Nov 6, 2024
This commit fixes the style issues introduced in 75897f5 according to the output
from Gofumpt and Prettier.

Details: #2733
@kpango kpango force-pushed the bugfix/agent/statistics-double-free-problem branch 2 times, most recently from b76192b to 9eff890 Compare November 6, 2024 08:44
coderabbitai[bot]
coderabbitai bot previously approved these changes Nov 6, 2024
@kpango kpango force-pushed the bugfix/agent/statistics-double-free-problem branch 2 times, most recently from 96c43bd to e290c99 Compare November 6, 2024 09:59
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (7)
internal/safety/safety.go (1)

45-46: Consider stack trace size optimization.

The addition of stack traces enhances debugging capabilities significantly. However, stack traces can be large, especially in deep call hierarchies.

Consider adding a configuration option to limit stack trace size or make it configurable in production environments. Example implementation:

+ const maxStackSize = 4096 // Configurable maximum stack size

  ds := debug.Stack()
+ if len(ds) > maxStackSize {
+   ds = ds[:maxStackSize]
+ }
  infoStr := info.Get().String() + "\r\n" + conv.Btoa(ds)
pkg/agent/core/ngt/service/ngt.go (4)

1309-1309: Consider improving error handling for loadStatistics

The error from loadStatistics is returned directly, which could prevent successful index regeneration even if the statistics loading fails. Consider logging the error and continuing, since statistics are not critical for core functionality.

-return n.loadStatistics(ctx)
+if err := n.loadStatistics(ctx); err != nil {
+    log.Warnf("failed to load statistics after index regeneration: %v", err)
+}
+return nil

1453-1453: Consider improving error handling for loadStatistics

Similar to RegenerateIndexes, the error from loadStatistics is returned directly. Consider logging the error instead of failing the entire index creation operation.

-return n.loadStatistics(ctx)
+if err := n.loadStatistics(ctx); err != nil {
+    log.Warnf("failed to load statistics after index creation: %v", err)
+}
+return nil

1456-1459: Add context timeout handling for GetGraphStatistics

The GetGraphStatistics call could potentially block for a long time. Consider adding a timeout to prevent hanging.

+ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
+defer cancel()
 stats, err := n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)

Line range hint 1459-1503: Add validation for critical statistics values

The statistics values are stored without validation. Consider adding checks for critical values to ensure data integrity.

+// Validate critical statistics
+if stats.NumberOfObjects < 0 || stats.NumberOfIndexedObjects < 0 {
+    return errors.New("invalid statistics: negative object count")
+}
 n.statisticsCache.Store(&payload.Info_Index_Statistics{
     Valid:                            stats.Valid,
     // ... existing fields ...
 })
internal/core/algorithm/ngt/ngt_test.go (1)

Line range hint 1-2733: Consider adding concurrent operation tests.

While the test coverage is comprehensive for individual operations, consider adding specific test cases that verify the behavior of concurrent operations using the new smu mutex. This would help ensure thread safety is maintained under high concurrency.

Example scenarios to test:

  • Concurrent calls to GetGraphStatistics
  • Interaction between GetGraphStatistics and other operations using mu/cmu
internal/core/algorithm/ngt/ngt.go (1)

1153-1154: Extract retry limit into a constant for maintainability

Hardcoding the retry limit makes future adjustments less manageable. Consider defining the retry count as a constant to enhance readability and ease of modification.

Apply this diff to extract the retry limit:

+const maxGetGraphStatisticsRetries = 10

 func (n *ngt) GetGraphStatistics(
     ctx context.Context, m statisticsType,
 ) (stats *GraphStatistics, err error) {
-    return n.getGraphStatistics(ctx, m, 10)
+    return n.getGraphStatistics(ctx, m, maxGetGraphStatisticsRetries)
 }

 func (n *ngt) getGraphStatistics(
     ctx context.Context, m statisticsType, cnt int,
 ) (stats *GraphStatistics, err error) {
     // Existing logic...
-    if shared && cnt > 0 && !errors.Is(err, errors.ErrNGTIndexStatisticsNotReady) {
+    if shared && cnt > 0 && !errors.Is(err, errors.ErrNGTIndexStatisticsNotReady) {
         cnt--
-        return n.getGraphStatistics(ctx, m, cnt)
+        return n.getGraphStatistics(ctx, m, cnt)
     }
     // Existing logic...
 }

Also applies to: 1190-1193

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b41af56 and e290c99.

⛔ Files ignored due to path filters (3)
  • example/client/go.sum is excluded by !**/*.sum
  • go.sum is excluded by !**/*.sum
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (60)
  • .gitfiles (9 hunks)
  • .github/ISSUE_TEMPLATE/bug_report.md (1 hunks)
  • .github/ISSUE_TEMPLATE/security_issue_report.md (1 hunks)
  • .github/PULL_REQUEST_TEMPLATE.md (1 hunks)
  • dockers/agent/core/agent/Dockerfile (1 hunks)
  • dockers/agent/core/faiss/Dockerfile (1 hunks)
  • dockers/agent/core/ngt/Dockerfile (1 hunks)
  • dockers/agent/sidecar/Dockerfile (1 hunks)
  • dockers/binfmt/Dockerfile (1 hunks)
  • dockers/buildbase/Dockerfile (1 hunks)
  • dockers/buildkit/Dockerfile (1 hunks)
  • dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
  • dockers/ci/base/Dockerfile (1 hunks)
  • dockers/dev/Dockerfile (1 hunks)
  • dockers/discoverer/k8s/Dockerfile (1 hunks)
  • dockers/example/client/Dockerfile (1 hunks)
  • dockers/gateway/filter/Dockerfile (1 hunks)
  • dockers/gateway/lb/Dockerfile (1 hunks)
  • dockers/gateway/mirror/Dockerfile (1 hunks)
  • dockers/index/job/correction/Dockerfile (1 hunks)
  • dockers/index/job/creation/Dockerfile (1 hunks)
  • dockers/index/job/deletion/Dockerfile (1 hunks)
  • dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
  • dockers/index/job/save/Dockerfile (1 hunks)
  • dockers/index/operator/Dockerfile (1 hunks)
  • dockers/manager/index/Dockerfile (1 hunks)
  • dockers/operator/helm/Dockerfile (1 hunks)
  • dockers/tools/benchmark/job/Dockerfile (1 hunks)
  • dockers/tools/benchmark/operator/Dockerfile (1 hunks)
  • dockers/tools/cli/loadtest/Dockerfile (1 hunks)
  • example/client/go.mod (2 hunks)
  • go.mod (21 hunks)
  • hack/go.mod.default (1 hunks)
  • internal/cache/cache.go (1 hunks)
  • internal/cache/gache/gache.go (1 hunks)
  • internal/cache/gache/option.go (1 hunks)
  • internal/cache/option.go (1 hunks)
  • internal/core/algorithm/ngt/ngt.go (6 hunks)
  • internal/core/algorithm/ngt/ngt_test.go (2 hunks)
  • internal/net/dialer.go (1 hunks)
  • internal/net/dialer_test.go (2 hunks)
  • internal/safety/safety.go (2 hunks)
  • pkg/agent/core/ngt/service/ngt.go (2 hunks)
  • rust/rust-toolchain (1 hunks)
  • rust/rust-toolchain.toml (1 hunks)
  • versions/BUF_VERSION (1 hunks)
  • versions/K3S_VERSION (1 hunks)
  • versions/KUBECTL_VERSION (1 hunks)
  • versions/NGT_VERSION (1 hunks)
  • versions/OPERATOR_SDK_VERSION (1 hunks)
  • versions/PROMETHEUS_STACK_VERSION (1 hunks)
  • versions/PROTOBUF_VERSION (1 hunks)
  • versions/RUST_VERSION (1 hunks)
  • versions/TELEPRESENCE_VERSION (1 hunks)
  • versions/USEARCH_VERSION (1 hunks)
  • versions/actions/ACTIONS_SETUP_GO (1 hunks)
  • versions/actions/ACTIONS_SETUP_NODE (1 hunks)
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG (1 hunks)
  • versions/actions/GITHUB_ISSUE_METRICS (1 hunks)
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • dockers/agent/core/ngt/Dockerfile
  • rust/rust-toolchain.toml
🚧 Files skipped from review as they are similar to previous changes (53)
  • .gitfiles
  • .github/ISSUE_TEMPLATE/bug_report.md
  • .github/ISSUE_TEMPLATE/security_issue_report.md
  • .github/PULL_REQUEST_TEMPLATE.md
  • dockers/agent/core/agent/Dockerfile
  • dockers/agent/core/faiss/Dockerfile
  • dockers/agent/sidecar/Dockerfile
  • dockers/binfmt/Dockerfile
  • dockers/buildbase/Dockerfile
  • dockers/buildkit/Dockerfile
  • dockers/buildkit/syft/scanner/Dockerfile
  • dockers/ci/base/Dockerfile
  • dockers/dev/Dockerfile
  • dockers/discoverer/k8s/Dockerfile
  • dockers/example/client/Dockerfile
  • dockers/gateway/filter/Dockerfile
  • dockers/gateway/lb/Dockerfile
  • dockers/gateway/mirror/Dockerfile
  • dockers/index/job/correction/Dockerfile
  • dockers/index/job/creation/Dockerfile
  • dockers/index/job/deletion/Dockerfile
  • dockers/index/job/readreplica/rotate/Dockerfile
  • dockers/index/job/save/Dockerfile
  • dockers/index/operator/Dockerfile
  • dockers/manager/index/Dockerfile
  • dockers/operator/helm/Dockerfile
  • dockers/tools/benchmark/job/Dockerfile
  • dockers/tools/benchmark/operator/Dockerfile
  • dockers/tools/cli/loadtest/Dockerfile
  • example/client/go.mod
  • go.mod
  • internal/cache/cache.go
  • internal/cache/gache/gache.go
  • internal/cache/gache/option.go
  • internal/cache/option.go
  • internal/net/dialer.go
  • internal/net/dialer_test.go
  • rust/rust-toolchain
  • versions/BUF_VERSION
  • versions/K3S_VERSION
  • versions/KUBECTL_VERSION
  • versions/NGT_VERSION
  • versions/OPERATOR_SDK_VERSION
  • versions/PROMETHEUS_STACK_VERSION
  • versions/PROTOBUF_VERSION
  • versions/RUST_VERSION
  • versions/TELEPRESENCE_VERSION
  • versions/USEARCH_VERSION
  • versions/actions/ACTIONS_SETUP_GO
  • versions/actions/ACTIONS_SETUP_NODE
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
  • versions/actions/GITHUB_ISSUE_METRICS
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🔇 Additional comments (14)
internal/safety/safety.go (3)

22-22: LGTM: Import addition is appropriate.

The addition of the runtime/debug package is necessary for the new stack trace functionality and follows Go's import conventions.


40-42: Great addition: Early nil check prevents potential panics.

The nil check is a crucial safety improvement that prevents potential panics from dereferencing a nil function pointer. Following the fail-fast principle by checking at the start of the function is the correct approach.


40-46: Verify consistent implementation across similar functions.

The safety improvements look good, but let's ensure similar functions in the codebase follow the same pattern.

✅ Verification successful

The previous pattern might have been too specific. Let's try a broader search to find panic recovery patterns and debug stack usage in the codebase.


Safety implementation is self-contained and properly implemented

The safety package implements a comprehensive panic recovery mechanism in a single, well-structured file. The implementation:

  • Has proper nil checks before function execution
  • Captures stack traces and detailed info for debugging
  • Handles different types of panic values (runtime.Error, string, error)
  • Provides options for both recovering and re-panicking (via withPanic parameter)
  • Includes proper logging at appropriate severity levels

Since this is a self-contained utility package that other parts of the codebase use through its exported functions (RecoverFunc and RecoverWithoutPanicFunc), there's no need for similar implementations elsewhere in the codebase. The centralized implementation ensures consistency in panic handling across the application.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for similar recovery patterns that might need the same improvements
ast-grep --pattern 'func $name($_) {
  defer func() {
    if r := recover(); r != nil {
      $$$
    }
  }()
  $$$
}'

Length of output: 120


Script:

#!/bin/bash
# Let's try a broader search for panic recovery patterns
rg -U "defer func\(\).*recover\(\)" -A 5

# Also search for specific debug stack usage
rg "debug\.Stack\(\)" -A 2

Length of output: 311


Script:

#!/bin/bash
# Let's look for the entire safety function implementation to understand the context better
rg -U "func Safe" -A 20 internal/safety/safety.go

# Also search for any other functions that might be using this safety wrapper
rg "Safe\(" -A 2

Length of output: 70


Script:

#!/bin/bash
# Let's try with case-insensitive search and look at the entire file content
cat internal/safety/safety.go

# Also search for any panic recovery patterns in the entire codebase with more context
rg -U --type go "defer\s+func\(\)\s*{" -A 10

Length of output: 2291

hack/go.mod.default (3)

332-341: LGTM: Kubernetes components version update.

The update of Kubernetes components from v0.31.1 to v0.31.2 is a patch version increment, which typically includes bug fixes and security updates. This is a safe change.


344-344: LGTM: Supporting components version update.

The updates to:

  • controller-runtime (v0.19.1)
  • structured-merge-diff/v4 (v4.4.1)
    are compatible with the Kubernetes version update.

Also applies to: 347-347


Line range hint 1-331: Verify the generic upgrade markers.

Most dependencies are marked with a generic "upgrade" directive. While this is valid syntax, it would be more maintainable to specify exact versions.

Consider explicitly specifying versions for critical dependencies instead of using generic "upgrade" markers. This helps in:

  • Better version control
  • Easier tracking of breaking changes
  • More predictable builds

Also applies to: 342-343, 345-346, 348-349

✅ Verification successful

Let me try another verification approach since the previous script didn't provide useful information about version changes.

This script will help us understand:

  1. If there are any replace directives with specific versions in other go.mod files
  2. If there are any version requirements specified in go.mod files
    This will give us context about whether using generic "upgrade" markers is consistent with the project's dependency management approach.

Let me try one final verification approach to check if this is a template/default file used for dependency management.

This will help us understand if this is a template/default file used for dependency management, which would explain the generic upgrade markers.


Let me check how this default file is used in the build process.

This will help us understand the intended usage of the go.mod.default file and the project's dependency management approach.


Based on the verification results, I can now provide a final response:

Using generic "upgrade" markers in go.mod.default is intentional and correct

The go.mod.default file in the hack directory is a template file used during the build process, as evidenced by:

  1. It's located in the hack/ directory alongside other build tools and templates
  2. The dependencies.mk shows it's used as a base template that gets copied to create the actual go.mod:
    cp $(ROOTDIR)/hack/go.mod.default $(ROOTDIR)/go.mod
  3. The build process handles the actual version resolution through go mod tidy and go get -u all commands after copying the template

The generic "upgrade" markers are intentionally used as placeholders that get resolved to actual versions during the build process, making this an expected and correct pattern for a template file.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if any of the generically marked dependencies have breaking changes
# Look for major version bumps in go.sum (if it exists) for these dependencies

# Search for go.sum in the repository
if fd -t f "go.sum$"; then
    # Extract current versions from go.sum for dependencies marked as "upgrade"
    rg "=> \w+.* upgrade$" -o | while read -r line; do
        dep=$(echo "$line" | cut -d ' ' -f1)
        echo "Checking $dep..."
        rg "^$dep " "go.sum" || true
    done
fi

Length of output: 83942


Script:

#!/bin/bash
# Check if there are any go.mod files in the repository
fd -t f "go.mod$" | while read -r mod_file; do
  echo "=== Checking $mod_file ==="
  # Look for any replace directives with specific versions
  rg "^replace.*=>.*v[0-9]" "$mod_file" || true
done

# Also check if there are any version constraints in go.mod files
fd -t f "go.mod$" | while read -r mod_file; do
  echo "=== Version requirements in $mod_file ==="
  rg "^require.*v[0-9]" "$mod_file" || true
done

Length of output: 691


Script:

#!/bin/bash
# Check if this is a template file used for dependency management
echo "=== Checking for template/default go.mod files ==="
fd -t f "go.mod" -x dirname {} \; | grep -i "template\|default\|hack"

# Check if there are any scripts that use this file
echo "=== Checking for scripts that reference go.mod.default ==="
rg "go.mod.default" -l

# Check the directory structure to understand the context
echo "=== Directory structure around hack/ ==="
ls -la hack/ 2>/dev/null || true

Length of output: 1263


Script:

#!/bin/bash
# Check how go.mod.default is used in the build process
echo "=== Checking Makefile.d/dependencies.mk usage ==="
cat Makefile.d/dependencies.mk

# Check if there are any version update scripts
echo "=== Checking for version update tooling ==="
fd -t f "upgrade|update|version" hack/tools/

# Check changelog for context about dependency management
echo "=== Recent dependency management changes ==="
head -n 50 CHANGELOG.md

Length of output: 14605

internal/core/algorithm/ngt/ngt_test.go (2)

783-784: Consistent initialization of synchronization primitives.

The smu field is properly initialized in the test struct. This matches the pattern used for other mutex fields (mu and cmu).


733-734: New synchronization field added for concurrent access control.

The addition of smu *sync.Mutex field enhances thread safety by providing exclusive locking capabilities. However, since this is used alongside existing mu *sync.RWMutex and cmu *sync.RWMutex, ensure proper lock ordering to prevent potential deadlocks.

✅ Verification successful

Mutex usage is safe and properly synchronized

Based on the search results, the smu mutex is used correctly with proper Lock/Unlock pairs and follows consistent locking patterns. The mutex is used in isolation without nested locks that could cause deadlocks. The usage appears in save/statistics operations where exclusive access is required:

  • In NGT core: Used for statistics mode operations
  • In service layer: Used during save operations with proper defer unlock
  • No evidence of lock ordering issues or potential deadlocks with other mutexes
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for potential deadlock scenarios in lock acquisition
ast-grep --pattern 'smu.Lock()
  $$$
  mu.Lock()' 

# Check reverse order
ast-grep --pattern 'mu.Lock()
  $$$
  smu.Lock()'

Length of output: 103


Script:

#!/bin/bash
# Let's search for all mutex lock/unlock patterns in the codebase
rg -A 5 "smu\." --type go

# Also check for any concurrent test scenarios
rg -B 2 -A 5 "go func" --type go internal/core/algorithm/ngt/

Length of output: 2128

internal/core/algorithm/ngt/ngt.go (6)

38-40: Imports added for enhanced functionality

The new imports safety and singleflight are necessary for improved error handling and to prevent duplicate concurrent requests when retrieving graph statistics.


90-90: Updated GetGraphStatistics method to accept context.Context

Including context.Context in the method signature is a good practice for managing cancellations and timeouts, enhancing the method's flexibility and control.


122-122: Added mutex smu for thread-safe statistics retrieval

Introducing smu *sync.Mutex ensures synchronization when accessing shared resources during graph statistics retrieval, preventing race conditions.


486-487: Proper initialization of synchronization primitives

Initializing smu and group in the gen function prepares the ngt instance for safe concurrent operations and efficient request coalescing.


1087-1089: Added nil check for cstats to prevent nil pointer dereference

The nil check in fromCGraphStatistics enhances robustness by ensuring that cstats is valid before accessing its fields, preventing potential segmentation faults.


1150-1196: Refactored GetGraphStatistics with retry mechanism and singleflight

The GetGraphStatistics method now delegates to getGraphStatistics, utilizing a retry mechanism with a maximum of 10 attempts and employing singleflight to prevent duplicate concurrent requests. This refactoring addresses previous concerns about infinite recursion and improves concurrency handling.

coderabbitai[bot]
coderabbitai bot previously approved these changes Nov 6, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
pkg/agent/core/ngt/service/ngt.go (2)

1309-1309: Consider improving error handling for loadStatistics

The error from loadStatistics is directly returned without logging. Consider logging the error before returning to help with debugging and monitoring.

-  return n.loadStatistics(ctx)
+  err = n.loadStatistics(ctx)
+  if err != nil {
+    log.Errorf("failed to load statistics after index regeneration: %v", err)
+  }
+  return err

1453-1453: Maintain consistent error handling style

The error handling for loadStatistics should follow the same pattern as other error handling in this function.

-  return n.loadStatistics(ctx)
+  if err = n.loadStatistics(ctx); err != nil {
+    log.Errorf("failed to load statistics after creating index: %v", err)
+    return err
+  }
+  return nil
internal/core/algorithm/ngt/ngt_test.go (1)

Line range hint 1-2000: Consider adding specific test cases for concurrent GetGraphStatistics calls

While the synchronization mechanism is added and tested in the basic initialization test cases, it would be valuable to add specific test cases that verify the thread safety of GetGraphStatistics under concurrent access.

Consider adding test cases like:

func TestNGT_GetGraphStatisticsConcurrent(t *testing.T) {
    // Initialize NGT
    n, err := New(...)
    if err != nil {
        t.Fatal(err)
    }
    defer n.Close()

    // Run multiple goroutines calling GetGraphStatistics
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // Call GetGraphStatistics and verify no panics/races
            _, err := n.GetGraphStatistics()
            if err != nil {
                t.Error(err)
            }
        }()
    }
    wg.Wait()
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e290c99 and 6023a18.

⛔ Files ignored due to path filters (3)
  • example/client/go.sum is excluded by !**/*.sum
  • go.sum is excluded by !**/*.sum
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (61)
  • .gitfiles (9 hunks)
  • .github/ISSUE_TEMPLATE/bug_report.md (1 hunks)
  • .github/ISSUE_TEMPLATE/security_issue_report.md (1 hunks)
  • .github/PULL_REQUEST_TEMPLATE.md (1 hunks)
  • Makefile (1 hunks)
  • dockers/agent/core/agent/Dockerfile (1 hunks)
  • dockers/agent/core/faiss/Dockerfile (1 hunks)
  • dockers/agent/core/ngt/Dockerfile (1 hunks)
  • dockers/agent/sidecar/Dockerfile (1 hunks)
  • dockers/binfmt/Dockerfile (1 hunks)
  • dockers/buildbase/Dockerfile (1 hunks)
  • dockers/buildkit/Dockerfile (1 hunks)
  • dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
  • dockers/ci/base/Dockerfile (1 hunks)
  • dockers/dev/Dockerfile (1 hunks)
  • dockers/discoverer/k8s/Dockerfile (1 hunks)
  • dockers/example/client/Dockerfile (1 hunks)
  • dockers/gateway/filter/Dockerfile (1 hunks)
  • dockers/gateway/lb/Dockerfile (1 hunks)
  • dockers/gateway/mirror/Dockerfile (1 hunks)
  • dockers/index/job/correction/Dockerfile (1 hunks)
  • dockers/index/job/creation/Dockerfile (1 hunks)
  • dockers/index/job/deletion/Dockerfile (1 hunks)
  • dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
  • dockers/index/job/save/Dockerfile (1 hunks)
  • dockers/index/operator/Dockerfile (1 hunks)
  • dockers/manager/index/Dockerfile (1 hunks)
  • dockers/operator/helm/Dockerfile (1 hunks)
  • dockers/tools/benchmark/job/Dockerfile (1 hunks)
  • dockers/tools/benchmark/operator/Dockerfile (1 hunks)
  • dockers/tools/cli/loadtest/Dockerfile (1 hunks)
  • example/client/go.mod (2 hunks)
  • go.mod (21 hunks)
  • hack/go.mod.default (1 hunks)
  • internal/cache/cache.go (1 hunks)
  • internal/cache/gache/gache.go (1 hunks)
  • internal/cache/gache/option.go (1 hunks)
  • internal/cache/option.go (1 hunks)
  • internal/core/algorithm/ngt/ngt.go (6 hunks)
  • internal/core/algorithm/ngt/ngt_test.go (2 hunks)
  • internal/net/dialer.go (1 hunks)
  • internal/net/dialer_test.go (2 hunks)
  • internal/safety/safety.go (2 hunks)
  • pkg/agent/core/ngt/service/ngt.go (2 hunks)
  • rust/rust-toolchain (1 hunks)
  • rust/rust-toolchain.toml (1 hunks)
  • versions/BUF_VERSION (1 hunks)
  • versions/K3S_VERSION (1 hunks)
  • versions/KUBECTL_VERSION (1 hunks)
  • versions/NGT_VERSION (1 hunks)
  • versions/OPERATOR_SDK_VERSION (1 hunks)
  • versions/PROMETHEUS_STACK_VERSION (1 hunks)
  • versions/PROTOBUF_VERSION (1 hunks)
  • versions/RUST_VERSION (1 hunks)
  • versions/TELEPRESENCE_VERSION (1 hunks)
  • versions/USEARCH_VERSION (1 hunks)
  • versions/actions/ACTIONS_SETUP_GO (1 hunks)
  • versions/actions/ACTIONS_SETUP_NODE (1 hunks)
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG (1 hunks)
  • versions/actions/GITHUB_ISSUE_METRICS (1 hunks)
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (57)
  • .gitfiles
  • .github/ISSUE_TEMPLATE/bug_report.md
  • .github/ISSUE_TEMPLATE/security_issue_report.md
  • .github/PULL_REQUEST_TEMPLATE.md
  • dockers/agent/core/agent/Dockerfile
  • dockers/agent/core/faiss/Dockerfile
  • dockers/agent/core/ngt/Dockerfile
  • dockers/agent/sidecar/Dockerfile
  • dockers/binfmt/Dockerfile
  • dockers/buildbase/Dockerfile
  • dockers/buildkit/Dockerfile
  • dockers/buildkit/syft/scanner/Dockerfile
  • dockers/ci/base/Dockerfile
  • dockers/dev/Dockerfile
  • dockers/discoverer/k8s/Dockerfile
  • dockers/example/client/Dockerfile
  • dockers/gateway/filter/Dockerfile
  • dockers/gateway/lb/Dockerfile
  • dockers/gateway/mirror/Dockerfile
  • dockers/index/job/correction/Dockerfile
  • dockers/index/job/creation/Dockerfile
  • dockers/index/job/deletion/Dockerfile
  • dockers/index/job/readreplica/rotate/Dockerfile
  • dockers/index/job/save/Dockerfile
  • dockers/index/operator/Dockerfile
  • dockers/manager/index/Dockerfile
  • dockers/operator/helm/Dockerfile
  • dockers/tools/benchmark/job/Dockerfile
  • dockers/tools/benchmark/operator/Dockerfile
  • dockers/tools/cli/loadtest/Dockerfile
  • example/client/go.mod
  • go.mod
  • hack/go.mod.default
  • internal/cache/cache.go
  • internal/cache/gache/gache.go
  • internal/cache/gache/option.go
  • internal/cache/option.go
  • internal/net/dialer.go
  • internal/net/dialer_test.go
  • internal/safety/safety.go
  • rust/rust-toolchain
  • rust/rust-toolchain.toml
  • versions/BUF_VERSION
  • versions/K3S_VERSION
  • versions/KUBECTL_VERSION
  • versions/NGT_VERSION
  • versions/OPERATOR_SDK_VERSION
  • versions/PROMETHEUS_STACK_VERSION
  • versions/PROTOBUF_VERSION
  • versions/RUST_VERSION
  • versions/TELEPRESENCE_VERSION
  • versions/USEARCH_VERSION
  • versions/actions/ACTIONS_SETUP_GO
  • versions/actions/ACTIONS_SETUP_NODE
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
  • versions/actions/GITHUB_ISSUE_METRICS
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🔇 Additional comments (8)
Makefile (1)

674-675: Verify the NGT bugfix branch status.

The change from version-tagged NGT to a specific bugfix branch appears to address the double-free error. However, using an untagged branch instead of a stable version requires verification.

Let's verify the branch status:

Also, there appears to be a typo in the branch name ("statisitcs" instead of "statistics"). This doesn't affect functionality but might want to be corrected in the upstream repository.

✅ Verification successful

Let me gather more information about this branch to ensure it's actively maintained and contains the necessary fixes.


Branch is active but consider future version upgrade

The bugfix branch exists and is actively maintained:

  • Latest commit from November 6, 2024, specifically addresses the null pointer free problem
  • Branch is not yet merged into any release (latest release is v2.3.3)
  • Commit message confirms it targets the exact issue: "investigate null pointer free problem in getGraphStatistics API"

While using this branch is appropriate for now to fix the critical issue, consider upgrading to the next stable release once this fix is merged upstream.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the bugfix branch exists and its latest commit

# Test: Verify branch existence and last commit
git ls-remote --heads https://github.com/yahoojapan/NGT.git bugfix/get-graph-statisitcs/nullpointer-free-problem

# Note: Consider switching back to version-tagged releases once the fix is merged upstream

Length of output: 219


Script:

#!/bin/bash
# Get more details about the branch and its relationship to main/master
# First, get the latest commit date and message for the bugfix branch
git ls-remote --refs https://github.com/yahoojapan/NGT.git bugfix/get-graph-statisitcs/nullpointer-free-problem | cut -f1 | xargs -I{} curl -s -H "Accept: application/vnd.github.v3+json" https://api.github.com/repos/yahoojapan/NGT/commits/{} | jq -r '[.commit.author.date, .commit.message] | @tsv'

# Check the latest tag/release version
git ls-remote --tags https://github.com/yahoojapan/NGT.git | sort -t '/' -k 3 -V | tail -n 1

# Check if this commit exists in any release tags
git ls-remote --tags https://github.com/yahoojapan/NGT.git | grep 873be2f0d01eb11d685821772ec48ed362847232

Length of output: 704

internal/core/algorithm/ngt/ngt.go (4)

38-40: LGTM: Well-structured concurrency control additions

The new imports and struct fields provide robust concurrency control:

  • singleflight.Group prevents redundant concurrent calls
  • sync.Mutex ensures thread-safe access to shared resources

Also applies to: 119-122


1087-1089: LGTM: Added nil pointer check

The addition of the nil check for cstats parameter addresses the previous review comment about potential nil pointer issues.


1091-1119: LGTM: Comprehensive C to Go struct conversion

The implementation properly maps all C struct fields to their Go counterparts with appropriate type conversions.


1150-1196: LGTM: Robust implementation with proper concurrency control

The new implementation addresses previous concerns and adds several improvements:

  1. Uses singleflight to prevent duplicate concurrent calls
  2. Implements a bounded retry mechanism
  3. Adds proper synchronization with mutex
  4. Includes panic recovery
  5. Properly handles context cancellation

The retry mechanism now has a counter to prevent infinite recursion, addressing the previous review comment.

pkg/agent/core/ngt/service/ngt.go (1)

Line range hint 1456-1499: Consider adding context timeout and refactoring statistics mapping

The implementation could be improved in two areas:

  1. Add context timeout handling to prevent long-running operations
  2. Move statistics mapping to a separate helper function to improve readability
 func (n *ngt) loadStatistics(ctx context.Context) error {
     if n.IsStatisticsEnabled() {
         log.Info("loading index statistics to cache")
+        ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
+        defer cancel()
         
         stats, err := n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)
         if err != nil {
             log.Errorf("failed to load index statistics to cache: %v", err)
             return err
         }
-        n.statisticsCache.Store(&payload.Info_Index_Statistics{
-            Valid:                            stats.Valid,
-            MedianIndegree:                   stats.MedianIndegree,
-            // ... other fields
-        })
+        n.statisticsCache.Store(n.mapToIndexStatistics(stats))
     }
     return nil
 }
+
+func (n *ngt) mapToIndexStatistics(stats *core.GraphStatistics) *payload.Info_Index_Statistics {
+    return &payload.Info_Index_Statistics{
+        Valid:                            stats.Valid,
+        MedianIndegree:                   stats.MedianIndegree,
+        // ... other fields
+    }
+}
internal/core/algorithm/ngt/ngt_test.go (2)

733-734: LGTM: Addition of sync.Mutex for thread safety

The addition of smu field in the test case aligns with the PR's objective of fixing the double-free error by introducing proper synchronization.


783-784: LGTM: Consistent sync.Mutex initialization in backup test case

The smu field is consistently added to the test case that loads backup data, maintaining synchronization across different initialization paths.

coderabbitai[bot]
coderabbitai bot previously approved these changes Nov 6, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
internal/core/algorithm/ngt/ngt.go (2)

1099-1131: Consider moving C.ngt_free_graph_statistics call

While the nil check is good, consider moving the C.ngt_free_graph_statistics call to this function to ensure cleanup happens immediately after conversion, preventing potential memory leaks.

-func fromCGraphStatistics(cstats *C.NGTGraphStatistics) *GraphStatistics {
+func fromCGraphStatistics(cstats *C.NGTGraphStatistics) (*GraphStatistics, error) {
 	if cstats == nil {
-		return nil
+		return nil, nil
 	}
+	defer C.ngt_free_graph_statistics(cstats)
 	goStats := &GraphStatistics{
 		// ... existing fields ...
 	}
-	return goStats
+	return goStats, nil
}

1162-1208: Consider making retry count configurable

The implementation is robust with good concurrency control and panic recovery. However, the hardcoded retry count (10) could be made configurable through an option or configuration parameter for better flexibility.

Consider adding a field to the ngt struct:

type ngt struct {
    // ... existing fields ...
    statsRetryCount int
}

Then update the GetGraphStatistics call:

-	return n.getGraphStatistics(ctx, m, 10)
+	return n.getGraphStatistics(ctx, m, n.statsRetryCount)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6023a18 and 795ac5c.

⛔ Files ignored due to path filters (3)
  • example/client/go.sum is excluded by !**/*.sum
  • go.sum is excluded by !**/*.sum
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (61)
  • .gitfiles (9 hunks)
  • .github/ISSUE_TEMPLATE/bug_report.md (1 hunks)
  • .github/ISSUE_TEMPLATE/security_issue_report.md (1 hunks)
  • .github/PULL_REQUEST_TEMPLATE.md (1 hunks)
  • Makefile (1 hunks)
  • dockers/agent/core/agent/Dockerfile (1 hunks)
  • dockers/agent/core/faiss/Dockerfile (1 hunks)
  • dockers/agent/core/ngt/Dockerfile (1 hunks)
  • dockers/agent/sidecar/Dockerfile (1 hunks)
  • dockers/binfmt/Dockerfile (1 hunks)
  • dockers/buildbase/Dockerfile (1 hunks)
  • dockers/buildkit/Dockerfile (1 hunks)
  • dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
  • dockers/ci/base/Dockerfile (1 hunks)
  • dockers/dev/Dockerfile (1 hunks)
  • dockers/discoverer/k8s/Dockerfile (1 hunks)
  • dockers/example/client/Dockerfile (1 hunks)
  • dockers/gateway/filter/Dockerfile (1 hunks)
  • dockers/gateway/lb/Dockerfile (1 hunks)
  • dockers/gateway/mirror/Dockerfile (1 hunks)
  • dockers/index/job/correction/Dockerfile (1 hunks)
  • dockers/index/job/creation/Dockerfile (1 hunks)
  • dockers/index/job/deletion/Dockerfile (1 hunks)
  • dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
  • dockers/index/job/save/Dockerfile (1 hunks)
  • dockers/index/operator/Dockerfile (1 hunks)
  • dockers/manager/index/Dockerfile (1 hunks)
  • dockers/operator/helm/Dockerfile (1 hunks)
  • dockers/tools/benchmark/job/Dockerfile (1 hunks)
  • dockers/tools/benchmark/operator/Dockerfile (1 hunks)
  • dockers/tools/cli/loadtest/Dockerfile (1 hunks)
  • example/client/go.mod (2 hunks)
  • go.mod (21 hunks)
  • hack/go.mod.default (1 hunks)
  • internal/cache/cache.go (1 hunks)
  • internal/cache/gache/gache.go (1 hunks)
  • internal/cache/gache/option.go (1 hunks)
  • internal/cache/option.go (1 hunks)
  • internal/core/algorithm/ngt/ngt.go (12 hunks)
  • internal/core/algorithm/ngt/ngt_test.go (2 hunks)
  • internal/net/dialer.go (1 hunks)
  • internal/net/dialer_test.go (2 hunks)
  • internal/safety/safety.go (2 hunks)
  • pkg/agent/core/ngt/service/ngt.go (2 hunks)
  • rust/rust-toolchain (1 hunks)
  • rust/rust-toolchain.toml (1 hunks)
  • versions/BUF_VERSION (1 hunks)
  • versions/K3S_VERSION (1 hunks)
  • versions/KUBECTL_VERSION (1 hunks)
  • versions/NGT_VERSION (1 hunks)
  • versions/OPERATOR_SDK_VERSION (1 hunks)
  • versions/PROMETHEUS_STACK_VERSION (1 hunks)
  • versions/PROTOBUF_VERSION (1 hunks)
  • versions/RUST_VERSION (1 hunks)
  • versions/TELEPRESENCE_VERSION (1 hunks)
  • versions/USEARCH_VERSION (1 hunks)
  • versions/actions/ACTIONS_SETUP_GO (1 hunks)
  • versions/actions/ACTIONS_SETUP_NODE (1 hunks)
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG (1 hunks)
  • versions/actions/GITHUB_ISSUE_METRICS (1 hunks)
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (58)
  • .gitfiles
  • .github/ISSUE_TEMPLATE/bug_report.md
  • .github/ISSUE_TEMPLATE/security_issue_report.md
  • .github/PULL_REQUEST_TEMPLATE.md
  • Makefile
  • dockers/agent/core/agent/Dockerfile
  • dockers/agent/core/faiss/Dockerfile
  • dockers/agent/core/ngt/Dockerfile
  • dockers/agent/sidecar/Dockerfile
  • dockers/binfmt/Dockerfile
  • dockers/buildbase/Dockerfile
  • dockers/buildkit/Dockerfile
  • dockers/buildkit/syft/scanner/Dockerfile
  • dockers/ci/base/Dockerfile
  • dockers/dev/Dockerfile
  • dockers/discoverer/k8s/Dockerfile
  • dockers/example/client/Dockerfile
  • dockers/gateway/filter/Dockerfile
  • dockers/gateway/lb/Dockerfile
  • dockers/gateway/mirror/Dockerfile
  • dockers/index/job/correction/Dockerfile
  • dockers/index/job/creation/Dockerfile
  • dockers/index/job/deletion/Dockerfile
  • dockers/index/job/readreplica/rotate/Dockerfile
  • dockers/index/job/save/Dockerfile
  • dockers/index/operator/Dockerfile
  • dockers/manager/index/Dockerfile
  • dockers/operator/helm/Dockerfile
  • dockers/tools/benchmark/job/Dockerfile
  • dockers/tools/benchmark/operator/Dockerfile
  • dockers/tools/cli/loadtest/Dockerfile
  • example/client/go.mod
  • go.mod
  • hack/go.mod.default
  • internal/cache/cache.go
  • internal/cache/gache/gache.go
  • internal/cache/gache/option.go
  • internal/cache/option.go
  • internal/core/algorithm/ngt/ngt_test.go
  • internal/net/dialer.go
  • internal/safety/safety.go
  • rust/rust-toolchain
  • rust/rust-toolchain.toml
  • versions/BUF_VERSION
  • versions/K3S_VERSION
  • versions/KUBECTL_VERSION
  • versions/NGT_VERSION
  • versions/OPERATOR_SDK_VERSION
  • versions/PROMETHEUS_STACK_VERSION
  • versions/PROTOBUF_VERSION
  • versions/RUST_VERSION
  • versions/TELEPRESENCE_VERSION
  • versions/USEARCH_VERSION
  • versions/actions/ACTIONS_SETUP_GO
  • versions/actions/ACTIONS_SETUP_NODE
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
  • versions/actions/GITHUB_ISSUE_METRICS
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🔇 Additional comments (6)
internal/core/algorithm/ngt/ngt.go (3)

119-122: LGTM: Good synchronization primitives added

The addition of singleflight.Group and sync.Mutex improves thread safety and prevents redundant concurrent operations.


486-487: LGTM: Proper initialization of synchronization primitives

The mutex and singleflight group are properly initialized with appropriate error handling through deferred Close.


576-577: LGTM: Consistent synchronization pattern

The mutex locks are consistently applied around critical sections that access shared resources in create, save, and close operations.

Also applies to: 915-916, 935-936, 1089-1090

internal/net/dialer_test.go (2)

1519-1520: LGTM: Test args updated to match new method signature.

The addition of the cache *dialerCache parameter to the args struct correctly reflects the changes to the cacheExpireHook method signature.


1589-1589: LGTM: Test implementation updated correctly.

The test function call has been properly updated to include the new cache parameter, maintaining consistency with the method signature change.

pkg/agent/core/ngt/service/ngt.go (1)

Line range hint 1456-1479: Implementation of loadStatistics method is correct

The loadStatistics method is properly implemented. It checks if statistics are enabled before attempting to load them, handles any errors returned by GetGraphStatistics, and stores the retrieved statistics in n.statisticsCache. This ensures that index statistics are managed correctly without impacting the system when statistics are disabled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants