
Add LLM Support #160

Merged: 2 commits merged into main from neil/addLLmSupport on Dec 3, 2024

Conversation

@neilnaveen (Member) commented Dec 3, 2024

  • Added an OpenAI LLM to the CLI that can run leaderboards, queries, and glob searches from user input. It also runs as a chat session, so the LLM can create queries based on the conversation.
  • Batched SQLite writes so they do not overflow on large amounts of data.
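The batching approach described above can be sketched as a simple chunking loop. This is a minimal illustration only; the actual SaveCaches in pkg/storages/sql.go marshals KV entries and saves them via GORM, and the `chunk` helper here is hypothetical:

```go
package main

import "fmt"

// chunk splits items into slices of at most size elements,
// mirroring the batched-save approach used to stay under
// SQLite's variable limits.
func chunk(items []int, size int) [][]int {
	var batches [][]int
	for i := 0; i < len(items); i += size {
		end := i + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[i:end])
	}
	return batches
}

func main() {
	items := make([]int, 1250)
	batches := chunk(items, 500)
	fmt.Println(len(batches))    // 3 batches: 500, 500, 250
	fmt.Println(len(batches[2])) // 250
}
```

Each batch is then written in its own Save call, so a single oversized statement never reaches SQLite.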

Summary by CodeRabbit

  • New Features

    • Introduced a command-line interface (CLI) for querying a graph database using natural language through a large language model (LLM).
    • Added options for configuring OpenAI's LLM and vector database within the server settings.
  • Bug Fixes

    • Enhanced error handling during database initialization and document addition for improved reliability.
  • Chores

    • Updated dependencies to include new libraries for enhanced functionality.
    • Implemented a batching approach for saving caches to improve performance and reliability.

@coderabbitai bot (Contributor) commented Dec 3, 2024

Caution

Review failed

The pull request is closed.

Walkthrough

The changes introduce a command-line interface (CLI) for querying a graph database focused on supply chain security artifacts. The new CLI, encapsulated in llm.go, allows users to interact with a large language model (LLM) using natural language queries. It supports various command options and manages interactions with the OpenAI API and ChromaDB. The root command structure is updated to include this new command, and enhancements are made to the server configuration to support the LLM feature. Additionally, the go.mod file is updated with new dependencies, and the SQL storage method is modified to save caches in batches.
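The chat-session behavior the walkthrough describes can be sketched as a message-history loop. This is an illustrative skeleton only, not the project's actual llm.go: the `message` type mirrors the role/content shape of an OpenAI chat message, and `appendTurn` is a hypothetical helper; the real code also dispatches queries against the graph database:

```go
package main

import (
	"fmt"
	"strings"
)

// message mirrors the role/content shape of an OpenAI chat message.
type message struct {
	Role, Content string
}

// appendTurn records a user input and the model's reply in the running
// history, so later queries can be generated from the conversation.
func appendTurn(history []message, input, reply string) []message {
	history = append(history, message{Role: "user", Content: input})
	history = append(history, message{Role: "assistant", Content: reply})
	return history
}

func main() {
	var history []message
	// Simulated session input; the real CLI reads from stdin.
	for _, input := range []string{"show leaderboard", "exit"} {
		input = strings.TrimSpace(input)
		if strings.ToLower(input) == "exit" {
			fmt.Println("Ending chat session. Goodbye!")
			break
		}
		history = appendTurn(history, input, "(model reply)")
	}
	fmt.Println(len(history)) // 2: one user turn plus one assistant turn
}
```

Keeping the full history in the request payload is what lets the model build follow-up queries from earlier turns.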

Changes

File | Change Summary
cmd/llm/llm.go | Introduced a CLI for querying a graph database with a Cobra command named llm; added functions and types for command options and execution.
cmd/root/root.go | Added the llm.New() command to the root command structure.
cmd/server/server.go | Enhanced the options struct with UseOpenAILLM and VectorDBPath; updated the AddFlags and startServer methods for new functionality with ChromaDB.
go.mod | Added new direct dependencies: github.com/philippgille/chromem-go v0.7.0 and github.com/sashabaranov/go-openai v1.36.0.
pkg/storages/sql.go | Modified the SaveCaches method to process caches in batches of 500, enhancing error handling and performance for SQLite operations.

Suggested reviewers

  • neilnaveen

Poem

🐰 In the garden of code, new paths we weave,
A CLI blooms, for queries to achieve.
With LLM's wisdom, we chat and explore,
In the realm of data, there's always more!
Batches of caches, we save with delight,
Together we venture, into the night! 🌙


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9d87ed8 and c8f8522.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • cmd/llm/llm.go (1 hunks)
  • cmd/root/root.go (2 hunks)
  • cmd/server/server.go (4 hunks)
  • go.mod (1 hunks)
  • pkg/storages/sql.go (1 hunks)


@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (9)
cmd/server/server.go (2)

36-37: Ensure consistent field naming in the options struct

The field UseOpenAILLM has inconsistent capitalization. For readability and to follow Go naming conventions, consider renaming it to UseOpenAiLLM or UseOpenAILlm.

Apply this diff:

type options struct {
    // ...
-   UseOpenAILLM  bool
+   UseOpenAiLLM  bool
    VectorDBPath  string
}

Don't forget to update all references to this field throughout the codebase.


60-61: Align flag names with field names for clarity

In the AddFlags method, ensure that the flag names accurately reflect the field names. Since the field is named UseOpenAiLLM, consider renaming the flag to --use-openai-llm for consistency.

Apply this diff:

cmd.Flags().BoolVar(&o.UseOpenAILLM, "use-openai-llm", false, "Use OpenAI LLM for graph analysis")

Adjust the variable name if you rename the field as suggested.

pkg/storages/sql.go (1)

282-309: Optimize batch processing logic in SaveCaches method

The current implementation of batching in SaveCaches is functional but can be optimized for clarity and efficiency. Consider the following improvements:

  • Use min(i + batchSize, len(caches)) to calculate the end index without an explicit if statement.
  • Preallocate the kvCaches slice with batchSize to improve memory allocation.

Apply this diff:

const batchSize = 500 // Safe batch size considering SQLite's limits

// Process caches in batches
for i := 0; i < len(caches); i += batchSize {
-   end := i + batchSize
-   if end > len(caches) {
-       end = len(caches)
-   }
+   end := min(i+batchSize, len(caches)) // built-in min (Go 1.21+)

    batch := caches[i:end]
    kvCaches := make([]KVStore, 0, len(batch))
    for _, cache := range batch {
        cacheKey := fmt.Sprintf("%s%d", CacheKeyPrefix, cache.ID)
        data, err := cache.MarshalJSON()
        if err != nil {
            return fmt.Errorf("failed to marshal cache: %w", err)
        }
        kvCaches = append(kvCaches, KVStore{
            Key:   cacheKey,
            Value: string(data),
        })
    }

    if err := s.DB.Save(&kvCaches).Error; err != nil {
        return fmt.Errorf("failed to save caches batch: %w", err)
    }
}
cmd/llm/llm.go (4)

61-61: Remove unused field saveQuery

The field saveQuery is declared but not used anywhere in the code. To keep the codebase clean, consider removing it.

Apply this diff:

type options struct {
    maxOutput          int
    showInfo           bool
-   saveQuery          string
    addr               string
    // ...
}
🧰 Tools
🪛 golangci-lint (1.62.2)

61-61: field saveQuery is unused

(unused)


82-84: Handle missing OPENAI_API_KEY environment variable gracefully

Instead of returning an error when the OPENAI_API_KEY is not set, consider providing a more user-friendly message or guidance on how to set it.

Apply this diff:

if os.Getenv("OPENAI_API_KEY") == "" {
-   return fmt.Errorf("OPENAI_API_KEY environment variable is not set")
+   fmt.Println("Please set the OPENAI_API_KEY environment variable to use this feature.")
+   return nil
}

139-143: Trim and validate user input before processing

Add input validation to handle empty inputs or potential unintended commands.

Apply this diff:

input = strings.TrimSpace(input)
+ if input == "" {
+     fmt.Println("Input cannot be empty. Please enter a valid query.")
+     continue
+ }
if strings.ToLower(input) == "exit" {
    fmt.Println("Ending chat session. Goodbye!")
    return nil
}

278-285: Refine assistant's feedback to the user

The assistant's feedback message could be more conversational and helpful by directly incorporating the query result without a repetitive prompt.

Apply this diff:

feedbackMsg := fmt.Sprintf("%s", queryResult)
messages = append(messages, openai.ChatCompletionMessage{
    Role:    openai.ChatMessageRoleAssistant,
    Content: feedbackMsg,
})
cmd/root/root.go (1)

58-58: Maintain alphabetical order of commands in rootCmd

For better readability and organization, consider adding the llm command in alphabetical order within the list of commands.

Apply this diff:

rootCmd.AddCommand(cache.New())
rootCmd.AddCommand(leaderboard.New())
+ rootCmd.AddCommand(llm.New())
rootCmd.AddCommand(query.New())
rootCmd.AddCommand(server.New())
go.mod (1)

52-54: Group new dependencies with existing ones

For clarity, place the new direct dependencies (chromem-go and go-openai) within the existing require block, keeping indirect dependencies separate.

Apply this diff:

require (
    // Existing dependencies...
+   github.com/philippgille/chromem-go v0.7.0
+   github.com/sashabaranov/go-openai v1.36.0
)

require (
    // Indirect dependencies...
)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between fbc0864 and 9d87ed8.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • cmd/llm/llm.go (1 hunks)
  • cmd/root/root.go (2 hunks)
  • cmd/server/server.go (4 hunks)
  • go.mod (1 hunks)
  • pkg/storages/sql.go (1 hunks)
🧰 Additional context used
🪛 golangci-lint (1.62.2)
cmd/llm/llm.go

61-61: field saveQuery is unused

(unused)

(Three review comments on cmd/server/server.go were marked outdated and resolved.)
neilnaveen and others added 2 commits December 3, 2024 15:11
- Added an OpenAI LLM to the CLI that can run leaderboards, queries,
  and glob searches from user input; this is also a chat session, so
  the LLM can create queries based on the conversation
- Batched SQLite writes so they do not overflow on large amounts of data

Signed-off-by: neilnaveen <[email protected]>
Signed-off-by: naveensrinivasan <[email protected]>
@naveensrinivasan naveensrinivasan enabled auto-merge (squash) December 3, 2024 21:14
@naveensrinivasan naveensrinivasan merged commit 11c9968 into main Dec 3, 2024
12 of 14 checks passed
@naveensrinivasan naveensrinivasan deleted the neil/addLLmSupport branch December 3, 2024 21:16
Copy link

codecov bot commented Dec 3, 2024

Codecov Report

Attention: Patch coverage is 4.63918% with 370 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
cmd/llm/llm.go | 0.00% | 278 Missing ⚠️
cmd/server/server.go | 2.27% | 86 Missing ⚠️
pkg/storages/sql.go | 76.19% | 3 Missing and 2 partials ⚠️
cmd/root/root.go | 0.00% | 1 Missing ⚠️

Files with missing lines | Coverage Δ
cmd/root/root.go | 0.00% <0.00%> (ø)
pkg/storages/sql.go | 56.94% <76.19%> (ø)
cmd/server/server.go | 38.60% <2.27%> (ø)
cmd/llm/llm.go | 0.00% <0.00%> (ø)

Labels: none · Projects: none · 2 participants