
Add LLM Support #160

Merged: 2 commits merged into main from neil/addLLmSupport on Dec 3, 2024

Conversation

@neilnaveen (Member) commented Dec 3, 2024

  • Added an OpenAI LLM to the CLI that can run leaderboards, queries, and glob searches from user input. It also runs as a chat session, so the LLM can create queries based on the conversation.
  • Batched SQLite writes so they do not overflow on large amounts of data.
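The batching approach described above can be sketched as a simple chunking loop. This is a minimal illustration only; the actual SaveCaches in pkg/storages/sql.go marshals KV entries and saves them via GORM, and the `chunk` helper here is hypothetical:

```go
package main

import "fmt"

// chunk splits items into slices of at most size elements,
// mirroring the batched-save approach used to stay under
// SQLite's variable limits.
func chunk(items []int, size int) [][]int {
	var batches [][]int
	for i := 0; i < len(items); i += size {
		end := i + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[i:end])
	}
	return batches
}

func main() {
	items := make([]int, 1250)
	batches := chunk(items, 500)
	fmt.Println(len(batches))    // 3 batches: 500, 500, 250
	fmt.Println(len(batches[2])) // 250
}
```

Each batch is then written in its own Save call, so a single oversized statement never reaches SQLite.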

Summary by CodeRabbit

  • New Features

    • Introduced a command-line interface (CLI) for querying a graph database using natural language through a large language model (LLM).
    • Added options for configuring OpenAI's LLM and vector database within the server settings.
  • Bug Fixes

    • Enhanced error handling during database initialization and document addition for improved reliability.
  • Chores

    • Updated dependencies to include new libraries for enhanced functionality.
    • Implemented a batching approach for saving caches to improve performance and reliability.

@coderabbitai bot (Contributor) commented Dec 3, 2024

Caution

Review failed

The pull request is closed.

Walkthrough

The changes introduce a command-line interface (CLI) for querying a graph database focused on supply chain security artifacts. The new CLI, encapsulated in llm.go, allows users to interact with a large language model (LLM) using natural language queries. It supports various command options and manages interactions with the OpenAI API and ChromaDB. The root command structure is updated to include this new command, and enhancements are made to the server configuration to support the LLM feature. Additionally, the go.mod file is updated with new dependencies, and the SQL storage method is modified to save caches in batches.
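The chat-session behavior the walkthrough describes can be sketched as a message-history loop. This is an illustrative skeleton only, not the project's actual llm.go: the `message` type mirrors the role/content shape of an OpenAI chat message, and `appendTurn` is a hypothetical helper; the real code also dispatches queries against the graph database:

```go
package main

import (
	"fmt"
	"strings"
)

// message mirrors the role/content shape of an OpenAI chat message.
type message struct {
	Role, Content string
}

// appendTurn records a user input and the model's reply in the running
// history, so later queries can be generated from the conversation.
func appendTurn(history []message, input, reply string) []message {
	history = append(history, message{Role: "user", Content: input})
	history = append(history, message{Role: "assistant", Content: reply})
	return history
}

func main() {
	var history []message
	// Simulated session input; the real CLI reads from stdin.
	for _, input := range []string{"show leaderboard", "exit"} {
		input = strings.TrimSpace(input)
		if strings.ToLower(input) == "exit" {
			fmt.Println("Ending chat session. Goodbye!")
			break
		}
		history = appendTurn(history, input, "(model reply)")
	}
	fmt.Println(len(history)) // 2: one user turn plus one assistant turn
}
```

Keeping the full history in the request payload is what lets the model build follow-up queries from earlier turns.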

Changes

File | Change Summary
cmd/llm/llm.go | Introduced a CLI for querying a graph database with a Cobra command named llm; added functions and types for command options and execution.
cmd/root/root.go | Added the llm.New() command to the root command structure.
cmd/server/server.go | Enhanced the options struct with UseOpenAILLM and VectorDBPath; updated the AddFlags and startServer methods for new functionality with ChromaDB.
go.mod | Added new direct dependencies: github.com/philippgille/chromem-go v0.7.0 and github.com/sashabaranov/go-openai v1.36.0.
pkg/storages/sql.go | Modified the SaveCaches method to process caches in batches of 500, enhancing error handling and performance for SQLite operations.

Suggested reviewers

  • neilnaveen

Poem

🐰 In the garden of code, new paths we weave,
A CLI blooms, for queries to achieve.
With LLM's wisdom, we chat and explore,
In the realm of data, there's always more!
Batches of caches, we save with delight,
Together we venture, into the night! 🌙


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9d87ed8 and c8f8522.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • cmd/llm/llm.go (1 hunks)
  • cmd/root/root.go (2 hunks)
  • cmd/server/server.go (4 hunks)
  • go.mod (1 hunks)
  • pkg/storages/sql.go (1 hunks)


@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (9)
cmd/server/server.go (2)

36-37: Ensure consistent field naming in the options struct

The field UseOpenAILLM has inconsistent capitalization. For readability and to follow Go naming conventions, consider renaming it to UseOpenAiLLM or UseOpenAILlm.

Apply this diff:

type options struct {
    // ...
-   UseOpenAILLM  bool
+   UseOpenAiLLM  bool
    VectorDBPath  string
}

Don't forget to update all references to this field throughout the codebase.


60-61: Align flag names with field names for clarity

In the AddFlags method, ensure that the flag names accurately reflect the field names. Since the field is named UseOpenAiLLM, consider renaming the flag to --use-openai-llm for consistency.

Apply this diff:

cmd.Flags().BoolVar(&o.UseOpenAILLM, "use-openai-llm", false, "Use OpenAI LLM for graph analysis")

Adjust the variable name if you rename the field as suggested.

pkg/storages/sql.go (1)

282-309: Optimize batch processing logic in SaveCaches method

The current implementation of batching in SaveCaches is functional but can be optimized for clarity and efficiency. Consider the following improvements:

  • Use min(i + batchSize, len(caches)) to calculate the end index without an explicit if statement.
  • Preallocate the kvCaches slice with batchSize to improve memory allocation.

Apply this diff:

const batchSize = 500 // Safe batch size considering SQLite's limits

// Process caches in batches
for i := 0; i < len(caches); i += batchSize {
-   end := i + batchSize
-   if end > len(caches) {
-       end = len(caches)
-   }
+   end := min(i+batchSize, len(caches)) // built-in min (Go 1.21+)

    batch := caches[i:end]
    kvCaches := make([]KVStore, 0, len(batch))
    for _, cache := range batch {
        cacheKey := fmt.Sprintf("%s%d", CacheKeyPrefix, cache.ID)
        data, err := cache.MarshalJSON()
        if err != nil {
            return fmt.Errorf("failed to marshal cache: %w", err)
        }
        kvCaches = append(kvCaches, KVStore{
            Key:   cacheKey,
            Value: string(data),
        })
    }

    if err := s.DB.Save(&kvCaches).Error; err != nil {
        return fmt.Errorf("failed to save caches batch: %w", err)
    }
}
cmd/llm/llm.go (4)

61-61: Remove unused field saveQuery

The field saveQuery is declared but not used anywhere in the code. To keep the codebase clean, consider removing it.

Apply this diff:

type options struct {
    maxOutput          int
    showInfo           bool
-   saveQuery          string
    addr               string
    // ...
}
🧰 Tools
🪛 golangci-lint (1.62.2)

61-61: field saveQuery is unused

(unused)


82-84: Handle missing OPENAI_API_KEY environment variable gracefully

Instead of returning an error when the OPENAI_API_KEY is not set, consider providing a more user-friendly message or guidance on how to set it.

Apply this diff:

if os.Getenv("OPENAI_API_KEY") == "" {
-   return fmt.Errorf("OPENAI_API_KEY environment variable is not set")
+   fmt.Println("Please set the OPENAI_API_KEY environment variable to use this feature.")
+   return nil
}

139-143: Trim and validate user input before processing

Add input validation to handle empty inputs or potential unintended commands.

Apply this diff:

input = strings.TrimSpace(input)
+ if input == "" {
+     fmt.Println("Input cannot be empty. Please enter a valid query.")
+     continue
+ }
if strings.ToLower(input) == "exit" {
    fmt.Println("Ending chat session. Goodbye!")
    return nil
}

278-285: Refine assistant's feedback to the user

The assistant's feedback message could be more conversational and helpful by directly incorporating the query result without a repetitive prompt.

Apply this diff:

feedbackMsg := fmt.Sprintf("%s", queryResult)
messages = append(messages, openai.ChatCompletionMessage{
    Role:    openai.ChatMessageRoleAssistant,
    Content: feedbackMsg,
})
cmd/root/root.go (1)

58-58: Maintain alphabetical order of commands in rootCmd

For better readability and organization, consider adding the llm command in alphabetical order within the list of commands.

Apply this diff:

rootCmd.AddCommand(cache.New())
rootCmd.AddCommand(leaderboard.New())
+ rootCmd.AddCommand(llm.New())
rootCmd.AddCommand(query.New())
rootCmd.AddCommand(server.New())
go.mod (1)

52-54: Group new dependencies with existing ones

For clarity, place the new direct dependencies (chromem-go and go-openai) within the existing require block, keeping indirect dependencies separate.

Apply this diff:

require (
    // Existing dependencies...
+   github.com/philippgille/chromem-go v0.7.0
+   github.com/sashabaranov/go-openai v1.36.0
)

require (
    // Indirect dependencies...
)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between fbc0864 and 9d87ed8.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • cmd/llm/llm.go (1 hunks)
  • cmd/root/root.go (2 hunks)
  • cmd/server/server.go (4 hunks)
  • go.mod (1 hunks)
  • pkg/storages/sql.go (1 hunks)
🧰 Additional context used
🪛 golangci-lint (1.62.2)
cmd/llm/llm.go

61-61: field saveQuery is unused

(unused)

(Three review comments on cmd/server/server.go were marked outdated and resolved.)
neilnaveen and others added 2 commits December 3, 2024 15:11
- Added an OpenAI LLM to the CLI that can run leaderboards, queries,
  and glob searches from user input; this is also a chat session, so
  the LLM can create queries based on the conversation
- Batched SQLite writes so they do not overflow on large amounts of data

Signed-off-by: neilnaveen <[email protected]>
Signed-off-by: naveensrinivasan <[email protected]>
@naveensrinivasan naveensrinivasan enabled auto-merge (squash) December 3, 2024 21:14
@naveensrinivasan naveensrinivasan merged commit 11c9968 into main Dec 3, 2024
12 of 14 checks passed
@naveensrinivasan naveensrinivasan deleted the neil/addLLmSupport branch December 3, 2024 21:16
Copy link

codecov bot commented Dec 3, 2024

Codecov Report

Attention: Patch coverage is 4.63918% with 370 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
cmd/llm/llm.go | 0.00% | 278 Missing ⚠️
cmd/server/server.go | 2.27% | 86 Missing ⚠️
pkg/storages/sql.go | 76.19% | 3 Missing and 2 partials ⚠️
cmd/root/root.go | 0.00% | 1 Missing ⚠️

Files with missing lines | Coverage Δ
cmd/root/root.go | 0.00% <0.00%> (ø)
pkg/storages/sql.go | 56.94% <76.19%> (ø)
cmd/server/server.go | 38.60% <2.27%> (ø)
cmd/llm/llm.go | 0.00% <0.00%> (ø)

Labels: none · Projects: none · 2 participants