feat(wren-ai-service): change sql through reasoning #1273
Conversation
Walkthrough

The changes update SQL processing across the project by adding and modifying pipeline entries and functions related to SQL regeneration while removing the legacy SQL explanation functionality. Configuration files now include a new `sql_regeneration` pipeline entry.
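For reference, the pipeline entry takes this shape in the example configs reviewed below (the model value varies per provider; the OpenAI variant shown in config.example.yaml is used here):

```yaml
- name: sql_regeneration
  llm: litellm_llm.gpt-4o-mini-2024-07-18
  engine: wren_ui
```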
Force-pushed from 7024507 to 3db391b (Compare)
Force-pushed from 3db391b to f61a024 (Compare)
Actionable comments posted: 2
🧹 Nitpick comments (10)
wren-ai-service/src/pipelines/generation/sql_regeneration.py (2)
24-38: Validate the output format.
Including a JSON format requirement is good practice, but consider verifying or parsing the final LLM-generated string to confirm it strictly follows the expected JSON shape.
132-135: Default mutable parameter usage.
Passing a mutable default (`Configuration()`) can cause unexpected behavior if the object is modified. Consider defaulting to `None` and instantiating inside the method to avoid shared state, as in the sketch below.
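A minimal sketch of the suggested fix; the method and parameter names here are illustrative, only `Configuration` comes from the code under review:

```python
from typing import Optional

class SQLRegeneration(BasicPipeline):
    async def run(
        self,
        sql: str,
        sql_generation_reasoning: str,
        configuration: Optional[Configuration] = None,
    ):
        # instantiate per call so state is never shared across invocations
        configuration = configuration or Configuration()
        ...
```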
wren-ai-service/src/web/v1/services/ask.py (3)
101-115: Consider direct property definition instead of private `_query_id`.
This class replicates the getter-setter pattern from `AskRequest`, but you might simplify by storing `query_id` directly unless there's a special need for these property methods.

122-134: Similar approach to `StopAskRequest`.
No immediate concerns, but consider normalizing naming and usage if you integrate both for consistency.

551-706: Unified logic opportunity in `ask_feedback`.
Most logic in `ask_feedback` strongly resembles the main `ask` method. Extract common parts into a shared function or module to reduce duplication and potential inconsistency. Also be mindful of concurrency if multiple feedback requests for the same query ID arrive simultaneously.

wren-ai-service/demo/utils.py (2)
587-624: Refactor common error handling pattern.
The error handling pattern is duplicated across multiple functions (`ask_feedback`, `save_sql_pair`, etc.). Consider extracting it into a reusable helper function.

Create a helper function like this:

```python
def handle_api_response(response_future, operation_name: str):
    """
    Generic handler for API responses with status polling.

    Args:
        response_future: Callable that returns a response carrying a status field
        operation_name: Name of the operation for toast messages
    """
    status = None
    while not status or (
        status != "finished" and status != "failed" and status != "stopped"
    ):
        response = response_future()
        assert response.status_code == 200
        status = response.json()["status"]
        st.toast(f"The {operation_name} status: {status}")
        time.sleep(POLLING_INTERVAL)

    if status == "finished":
        return response.json()
    if status == "failed":
        st.error(
            f"An error occurred while processing the {operation_name}: {response.json()['error']}",
            icon="🚨",
        )
    return None
```
626-662: Add retry mechanism for API calls.
The `save_sql_pair` function makes API calls without any retry mechanism. Consider adding retries for better reliability.

Add a retry decorator:

```python
from functools import wraps
import time


def retry_on_exception(retries=3, delay=1):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for i in range(retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if i == retries - 1:
                        raise
                    time.sleep(delay)
            return None
        return wrapper
    return decorator


@retry_on_exception()
def save_sql_pair(question: str, sql: str):
    ...  # existing implementation
```

wren-ai-service/docs/config_examples/config.ollama.yaml (1)
118-121: Fix trailing whitespace.
Remove trailing whitespace on line 121 to maintain consistent formatting.

Apply this diff to fix the formatting:

```yaml
- name: sql_regeneration
  llm: litellm_llm.openai/phi4:14b
  engine: wren_ui
```
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 121-121: trailing spaces (trailing-spaces)
wren-ai-service/tools/config/config.example.yaml (1)
145-147: Consider adding documentation for the sql_regeneration pipeline.
Since this is an example configuration file that serves as a template, consider adding comments explaining the purpose and requirements of the sql_regeneration pipeline.

Add documentation above the sql_regeneration entry:

```diff
+ # SQL regeneration pipeline for improving SQL queries through reasoning
+ # Requires:
+ # - LLM model with SQL generation capabilities
+ # - wren_ui engine for SQL execution
  - name: sql_regeneration
    llm: litellm_llm.gpt-4o-mini-2024-07-18
    engine: wren_ui
```

deployment/kustomizations/base/cm.yaml (1)
179-181: Verify the intentional model difference between configurations.
The `sql_regeneration` pipeline entry is using a different model compared to `config.deepseek.yaml`:

- cm.yaml: `gpt-4o-mini-2024-07-18`
- config.deepseek.yaml: `deepseek/deepseek-coder`

While this might be intentional for different environments, please confirm this is the desired configuration.
Consider documenting the rationale for using different models in different environments to help with maintenance.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (28)
- deployment/kustomizations/base/cm.yaml (1 hunks)
- docker/config.example.yaml (1 hunks)
- wren-ai-service/demo/app.py (1 hunks)
- wren-ai-service/demo/utils.py (6 hunks)
- wren-ai-service/docs/config_examples/config.azure.yaml (1 hunks)
- wren-ai-service/docs/config_examples/config.deepseek.yaml (1 hunks)
- wren-ai-service/docs/config_examples/config.google_ai_studio.yaml (1 hunks)
- wren-ai-service/docs/config_examples/config.groq.yaml (1 hunks)
- wren-ai-service/docs/config_examples/config.ollama.yaml (1 hunks)
- wren-ai-service/src/globals.py (1 hunks)
- wren-ai-service/src/pipelines/generation/__init__.py (1 hunks)
- wren-ai-service/src/pipelines/generation/followup_sql_generation.py (2 hunks)
- wren-ai-service/src/pipelines/generation/sql_correction.py (4 hunks)
- wren-ai-service/src/pipelines/generation/sql_expansion.py (2 hunks)
- wren-ai-service/src/pipelines/generation/sql_explanation.py (0 hunks)
- wren-ai-service/src/pipelines/generation/sql_generation.py (1 hunks)
- wren-ai-service/src/pipelines/generation/sql_regeneration.py (2 hunks)
- wren-ai-service/src/pipelines/generation/utils/sql.py (3 hunks)
- wren-ai-service/src/web/v1/routers/__init__.py (0 hunks)
- wren-ai-service/src/web/v1/routers/ask.py (2 hunks)
- wren-ai-service/src/web/v1/routers/sql_explanations.py (0 hunks)
- wren-ai-service/src/web/v1/routers/sql_pairs.py (1 hunks)
- wren-ai-service/src/web/v1/routers/sql_regenerations.py (0 hunks)
- wren-ai-service/src/web/v1/services/__init__.py (0 hunks)
- wren-ai-service/src/web/v1/services/ask.py (11 hunks)
- wren-ai-service/tests/pytest/test_main.py (0 hunks)
- wren-ai-service/tools/config/config.example.yaml (1 hunks)
- wren-ai-service/tools/config/config.full.yaml (1 hunks)
💤 Files with no reviewable changes (6)

- wren-ai-service/src/web/v1/routers/__init__.py
- wren-ai-service/tests/pytest/test_main.py
- wren-ai-service/src/web/v1/services/__init__.py
- wren-ai-service/src/web/v1/routers/sql_regenerations.py
- wren-ai-service/src/web/v1/routers/sql_explanations.py
- wren-ai-service/src/pipelines/generation/sql_explanation.py
🧰 Additional context used
🪛 YAMLlint (1.35.1)
wren-ai-service/docs/config_examples/config.ollama.yaml
[error] 121-121: trailing spaces (trailing-spaces)
🔇 Additional comments (34)
wren-ai-service/src/pipelines/generation/sql_regeneration.py (8)
3-3: No concerns with the updated import statement.

13-18: No issues with these newly imported symbols.

19-19: Import of Configuration looks fine.

78-79: No concerns with the new `regenerate_sql` function definition.

82-82: No issues with returning the generator result here.

86-94: Add fallback when `replies` is missing.
If `regenerate_sql` doesn't contain a `replies` key, `post_processor.run()` might fail. A defensive check ensures robust error handling; see the sketch below.
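A minimal defensive sketch; the surrounding signature is an assumption, only the `replies` key and `post_processor.run()` come from the review above:

```python
async def post_process(regenerate_sql: dict, post_processor, **kwargs) -> dict:
    replies = regenerate_sql.get("replies")
    if not replies:
        # fail soft instead of letting post_processor.run() crash on None
        return {"results": [], "error": "generator returned no replies"}
    return await post_processor.run(replies, **kwargs)
```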
129-129: Observer decorator rename is consistent.

161-162: No issues with the dry-run usage for testing.

wren-ai-service/src/web/v1/services/ask.py (5)
95-95: Renaming to `sql_generation_reasoning` is consistent with the new structure.

118-119: No issues with the simple response model.

136-138: No concerns with the `StopAskFeedbackResponse`.

140-156: No issues with the new feedback result request/response models.

171-173: Caution with concurrent updates to `_ask_feedback_results`.
TTLCache is not thread-safe. Consider adding locks or ensuring single-thread usage to avoid race conditions in high-concurrency environments; a sketch follows.
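A minimal sketch of lock-guarded access, assuming a cachetools `TTLCache` (the size and TTL values are placeholders):

```python
import asyncio
from cachetools import TTLCache

_ask_feedback_results: TTLCache = TTLCache(maxsize=1_000_000, ttl=120)
_results_lock = asyncio.Lock()

async def set_feedback_result(query_id: str, result) -> None:
    # serialize writers so concurrent feedback requests cannot interleave
    async with _results_lock:
        _ask_feedback_results[query_id] = result
```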
wren-ai-service/src/pipelines/generation/__init__.py (1)

36-36: Re-adding `SQLRegeneration` to `__all__`.
No objections; this aligns with the recent pipeline changes focusing on regeneration.

wren-ai-service/src/pipelines/generation/sql_correction.py (3)
28-28: LGTM! Improved prompt clarity.
The system prompt has been simplified by combining two sentences into one, making it more concise while maintaining the same meaning.

41-46: LGTM! Added conditional rendering of database schema.
The addition of the conditional block ensures that the database schema section is only included when documents are available, improving prompt relevance.

109-111: Verify the impact of using shared model kwargs.
The switch to `SQL_GENERATION_MODEL_KWARGS` suggests a standardization of model configuration. Ensure this change maintains the necessary response format for SQL correction.

✅ Verification successful

🏁 Script executed:

```bash
#!/bin/bash
# Description: Check if SQL_GENERATION_MODEL_KWARGS maintains required response format

# Search for SQL_GENERATION_MODEL_KWARGS definition
rg -A 5 "SQL_GENERATION_MODEL_KWARGS\s*=" src/
```

Length of output: 93

I've noticed that the initial search command looked in the wrong directory (it used `src/` rather than the correct path) and, as a result, returned no results. To verify the impact of using shared model kwargs, we need to locate the definition of `SQL_GENERATION_MODEL_KWARGS` in its proper context and inspect its settings to ensure it preserves the expected SQL correction response format.

Let's re-run the search in the correct directory. Please execute the script and provide the output so we can continue our verification.

🏁 Script executed:

```bash
#!/bin/bash
# Search for the definition of SQL_GENERATION_MODEL_KWARGS in the appropriate directory (wren-ai-service/src/)
rg -A 5 "SQL_GENERATION_MODEL_KWARGS\s*=" wren-ai-service
```

Length of output: 609
Verification of Shared Model Kwargs for SQL Correction Response

- `SQL_GENERATION_MODEL_KWARGS` is defined in `wren-ai-service/src/pipelines/generation/utils/sql.py`.
- It specifies a `"response_format"` with `"type": "json_schema"`.
- The `"json_schema"` is set using `SqlGenerationResult.model_json_schema()`, ensuring the expected SQL generation result format is maintained.
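Based on that description, the constant presumably looks something like this sketch; the fields inside the model are assumptions, only the `json_schema` wiring is confirmed above:

```python
from pydantic import BaseModel

class SqlGenerationResult(BaseModel):
    # illustrative field; the real model lives in utils/sql.py
    sql: str

SQL_GENERATION_MODEL_KWARGS = {
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "sql_generation_result",
            "schema": SqlGenerationResult.model_json_schema(),
        },
    }
}
```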
wren-ai-service/src/pipelines/generation/sql_expansion.py (2)

13-16: LGTM! Simplified imports.
The imports have been consolidated to include only the necessary components from the SQL utils module.

104-104: Verify the impact of using shared model kwargs.
Similar to the correction pipeline, ensure that using `SQL_GENERATION_MODEL_KWARGS` maintains the required response format for SQL expansion.

✅ Verification successful
🏁 Script executed:

```bash
#!/bin/bash
# Description: Check if SQL_GENERATION_MODEL_KWARGS is used consistently

# Search for all usages of SQL_GENERATION_MODEL_KWARGS
rg "SQL_GENERATION_MODEL_KWARGS" src/
```

Length of output: 82

I'll run a new shell script to search for the definition and usage of `SQL_GENERATION_MODEL_KWARGS` specifically in the `wren-ai-service` directory. This should help us determine whether it's defined correctly and maintained consistently.

🏁 Script executed:

```bash
#!/bin/bash
# Search for all occurrences of SQL_GENERATION_MODEL_KWARGS in the wren-ai-service directory
rg -n "SQL_GENERATION_MODEL_KWARGS" wren-ai-service
```

Length of output: 1266

Shared Model Kwargs Usage is Consistent Across Pipelines

The constant `SQL_GENERATION_MODEL_KWARGS` is defined centrally in `wren-ai-service/src/pipelines/generation/utils/sql.py` and is used in the same manner in the SQL expansion pipeline as in the correction, generation, and regeneration pipelines. This confirms that the shared model kwargs maintain the required response format for SQL expansion.

wren-ai-service/src/pipelines/generation/sql_generation.py (1)
13-18: LGTM! Consistent model configuration.
The import changes align with the standardization of SQL generation model configuration across pipelines.

wren-ai-service/src/web/v1/routers/sql_pairs.py (1)

147-147: LGTM! Enhanced status tracking.
The addition of the "indexing" status to the Literal type provides better visibility into the SQL pairs preparation process, aligning with the status set in the prepare endpoint (line 100).
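A sketch of the pattern; only the "indexing" value is taken from the review, the rest of the Literal set is assumed:

```python
from typing import Literal
from pydantic import BaseModel

class SqlPairsStatusResponse(BaseModel):
    # "indexing" is the newly added in-progress state
    status: Literal["indexing", "finished", "failed"]
```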
wren-ai-service/src/pipelines/generation/followup_sql_generation.py (1)
13-18: LGTM! Improved code organization by using shared constants.
The change to use `SQL_GENERATION_MODEL_KWARGS` from `sql.py` instead of a local constant improves maintainability by centralizing the configuration.

wren-ai-service/src/web/v1/routers/ask.py (3)
146-166: LGTM! Well-structured feedback endpoint implementation.
The POST endpoint follows the same pattern as the existing `/asks` endpoint, with proper background task handling and status initialization.

169-181: LGTM! Consistent implementation of feedback cancellation.
The PATCH endpoint maintains consistency with the existing stop functionality in `/asks/{query_id}`.

184-191: LGTM! Clean implementation of feedback result retrieval.
The GET endpoint follows the same pattern as the existing result retrieval endpoints.
wren-ai-service/demo/app.py (1)
61-62: LGTM! Added session state for SQL generation reasoning.
The addition of `sql_generation_reasoning` aligns with the PR's objective to change SQL through reasoning.

wren-ai-service/src/globals.py (1)

111-114: LGTM! Consolidated SQL regeneration into ask service.
The addition of the `sql_regeneration` pipeline to `ask_service` improves service organization by consolidating related functionality.

wren-ai-service/src/pipelines/generation/utils/sql.py (1)

556-564: LGTM! Well-structured model kwargs.
The `SQL_GENERATION_MODEL_KWARGS` constant is well-defined using the Pydantic model's JSON schema, ensuring type safety and validation.

docker/config.example.yaml (1)
131-133: LGTM! Consistent pipeline configuration.
The `sql_regeneration` pipeline entry is well-configured with appropriate model and engine settings.

wren-ai-service/docs/config_examples/config.azure.yaml (1)

125-127: LGTM! Verify Azure model availability.
The SQL regeneration pipeline configuration is correctly structured with the appropriate model and engine specifications.
✅ Verification successful
🌐 Web query:
Is GPT-4 available through Azure OpenAI Service with API version 2024-02-15-preview?
💡 Result:
GPT-4 is available through Azure OpenAI Service using API version `2024-02-15-preview`, though with specific access requirements[1][7]. Microsoft introduced GPT-4 integration into Azure OpenAI Service in mid-May 2023[1][7], though initial access required submitting a business case for approval during its private preview phase[7][9].

The `2024-02-15-preview` API version specifically supports:

- Assistants API for building AI-powered conversational agents[8]
- Text-to-speech capabilities[8]
- GPT models including GPT-3.5-Turbo and GPT-4[2][7]

Key implementation details:

- Access requires an approved Azure OpenAI Service subscription[7]
- Users must deploy GPT-4 as a separate model within their Azure OpenAI resource[3][7]
- Latest updates recommend migrating to newer API versions like `2025-01-01-preview` for enhanced features[5][8]

For current implementations:

```python
# Sample API call using Python SDK
from openai import AzureOpenAI

client = AzureOpenAI(
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key="your-api-key",
)

response = client.chat.completions.create(
    model="gpt-4",  # Your deployment name
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
```

Microsoft continues expanding capabilities through newer API versions while maintaining backward compatibility with `2024-02-15-preview` for existing implementations[5][8].

Citations:
- 1: https://www.webpronews.com/gpt-4o-is-available-on-microsoft-azure-ai/
- 2: https://learn.microsoft.com/en-us/answers/questions/1640277/update-your-code-to-use-the-latest-azure-openai-se
- 3: https://trailheadtechnology.com/deploying-a-gpt-4o-model-to-azure-openai-service/
- 4: https://www.serverless-solutions.com/gpt-4-comes-to-azure-openai-service/
- 5: https://learn.microsoft.com/nb-no/azure/ai-services/openai/api-version-deprecation
- 6: https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/preview/2024-02-15-preview/inference.json
- 7: https://team-gpt.com/blog/how-to-setup-microsoft-azure-openai-service/
- 8: https://learn.microsoft.com/en-us/azure/ai-services/openai/reference-preview
- 9: https://www.dev4side.com/en/blog/azure-openai-service
- 10: https://community.openai.com/t/openai-azure-openai-service-openai-ai-assistant-api/620744
SQL Regeneration Pipeline Configuration is Correct & Azure GPT-4 is Available

The Azure model configuration in the SQL regeneration pipeline is properly structured. The web query confirms that GPT-4 is available through Azure OpenAI Service with API version `2024-02-15-preview` (with the noted access requirements). No follow-up changes are needed.

wren-ai-service/tools/config/config.full.yaml (1)
145-147: LGTM! Verify OpenAI model availability.
The SQL regeneration pipeline configuration is correctly structured with the appropriate model and engine specifications.

✅ Verification successful

🌐 Web query: Is gpt-4o-mini-2024-07-18 a valid OpenAI model?
💡 Result:
Yes, gpt-4o-mini-2024-07-18 is a valid OpenAI model. Here's what the evidence shows:
Official Documentation Confirmation

OpenAI's API documentation explicitly lists `gpt-4o-mini-2024-07-18` as a valid model ID under its GPT‑4o mini section[7][10], and includes it in its model tables:

| Model | Context Window | Max Output Tokens |
| --- | --- | --- |
| gpt‑4o‑mini‑2024‑07‑18 | 128,000 tokens | 16,384 tokens |

Key Features

- Multimodal Inputs: Accepts text and image inputs (with plans for audio/video)[1][3][10].
- Performance: Outperforms GPT‑3.5 Turbo and rivals Claude Haiku/Gemini Flash on benchmarks like MMLU (82%) and multimodal reasoning (59.4% on MMMU)[1][3].
- Cost Efficiency: Priced at $0.15/million input tokens and $0.60/million output tokens, 60% cheaper than GPT‑3.5 Turbo[1][7].
Use Cases

The model is designed for:

- Applications requiring parallel API calls or large context windows (e.g., full codebases)[1].
- Fine-tuning workflows where outputs from larger models like GPT‑4o can be distilled into GPT‑4o mini for cost savings[7][10].

Community Validation

Users have reported fine-tuning experiments with `gpt‑4o‑mini‑2024‑07‑18`, noting its stability compared to larger models like GPT‑4o[6]. While some faced temporary API errors (e.g., "Unknown model" issues), these appear resolved[9].

In summary, gpt‑4o‑mini‑2024‑07‑18 is both officially documented and actively used in real-world applications[1][7][10].
Citations:
- 1: https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/
- 2: https://www.restack.io/p/openai-answer-model-naming-guidelines-cat-ai
- 3: https://openrouter.ai/openai/gpt-4o-mini-2024-07-18
- 4: https://cookbook.openai.com/examples/gpt4o/introduction_to_gpt4o
- 5: https://community.openai.com/t/naming-convention-for-gpt-4-0/1096783
- 6: https://community.openai.com/t/inconsistent-fine-tuning-results-gpt-4o-vs-gpt-4o-mini/947573
- 7: https://platform.openai.com/docs/models
- 8: https://openai.com/brand
- 9: https://community.openai.com/t/unknown-model-gpt-4o-mini/946600
- 10: https://platform.openai.com/docs/models/gpt-4o
SQL Regeneration Configuration Verified

The configuration at lines 145-147 is correctly structured, and we've confirmed that the OpenAI model gpt-4o-mini-2024-07-18 is valid per OpenAI's official documentation.

wren-ai-service/docs/config_examples/config.groq.yaml (1)

119-121: LGTM! Verify Groq model availability.
The SQL regeneration pipeline configuration is correctly structured with the appropriate model and engine specifications.
✅ Verification successful
🌐 Web query:
Is llama-3.3-70b-specdec available through Groq's API?
💡 Result:
Yes, llama-3.3-70b-specdec is available through Groq's API[1][4]. The model was released on GroqCloud as part of a performance enhancement initiative, offering >6x speed improvements over previous versions by leveraging speculative decoding techniques[4]. Developers can access it via:
- GroqCloud Developer Console: Available to paying customers with an API key[4][7].
- On-demand pricing: $0.59 per million input tokens and $0.99 per million output tokens[4].

Key details:

- Performance: Achieves 1,660 tokens per second on Groq's 14nm LPU architecture[4].
- Use cases: Agentic workflows, real-time conversational AI, and decision-making applications[4][7].
- Integration: Supported in tools like Promptfoo and Helicone for observability and testing[6][7].

For free-tier access, some platforms like App Inventor's GroqText extension include limited daily tokens for experimentation[10].
Citations:
- 1: https://groq.com/a-new-scaling-paradigm-metas-llama-3-3-70b-challenges-death-of-scaling-law/
- 2: https://aws.amazon.com/blogs/machine-learning/llama-3-3-70b-now-available-in-amazon-sagemaker-jumpstart/
- 3: https://huggingface.co/Groq/Llama-3-Groq-70B-Tool-Use
- 4: https://groq.com/groq-first-generation-14nm-chip-just-got-a-6x-speed-boost-introducing-llama-3-1-70b-speculative-decoding-on-groqcloud/
- 5: https://aws.amazon.com/about-aws/whats-new/2024/12/metas-llama-3-3-70b-model-amazon-bedrock/
- 6: https://www.helicone.ai/blog/meta-llama-3-3-70-b-instruct
- 7: https://www.promptfoo.dev/docs/providers/groq/
- 8: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct
- 9: https://sambanova.ai/blog/meta-llama-3.3-70b-now-available-today-for-developers-and-enterprises
- 10: https://community.appinventor.mit.edu/t/paid-groqtext-extension-integrate-deepseek-llama-and-gemma-8-more-ai-llm-with-daily-free-tokens/136567
SQL Regeneration Pipeline and Groq Model Availability Verified

The configuration at lines 119-121, which adds the `sql_regeneration` pipeline using the `llama-3.3-70b-specdec` model with the `wren_ui` engine, is correct. The Groq API confirms that `llama-3.3-70b-specdec` is indeed available, with notable performance improvements as detailed in the web query results.

wren-ai-service/docs/config_examples/config.google_ai_studio.yaml (1)

126-128: LGTM! Verify Gemini model availability.
The SQL regeneration pipeline configuration is correctly structured with the appropriate model and engine specifications.
✅ Verification successful
🌐 Web query:
Is gemini-2.0-flash-exp available through Google AI Studio?
💡 Result:
Yes, gemini-2.0-flash-exp is available through Google AI Studio as of February 2025[3][5]. Here are key details:
Availability status

- General availability: The updated Gemini 2.0 Flash (including `gemini-2.0-flash-exp`) became generally available via Google AI Studio on February 6, 2025[3].
- Experimental access: Earlier experimental versions were first made available in December 2024 through Google AI Studio[5][10].

Key features

- 1M token context window for processing large inputs[2][8]
- Multimodal input (text/images/audio/video)[2][8]
- Native tool integration with Google Search and code execution[5][8]
- Free tier limits: 15 requests/minute, 1M tokens/minute, 1,500 requests/day[1]

How to access

- Visit Google AI Studio
- Obtain a free API key
- Use model ID `gemini-2.0-flash-exp` in your API calls[1][7]

The model remains experimental for certain features like text-to-speech generation and image output, which require allowlist access[9]. Developers can transition from experimental to production use through Vertex AI's Gen AI SDK[6].
Citations:
- 1: https://forum.cursor.com/t/gemini-flash-2-0-thinking-new-and-free/34390
- 2: https://ai.google.dev/gemini-api/docs/models/gemini
- 3: https://blog.google/technology/google-deepmind/gemini-model-updates-february-2025/
- 4: https://www.datacamp.com/blog/gemini-2-0-flash-experimental
- 5: https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/
- 6: https://cloud.google.com/vertex-ai/generative-ai/docs/gemini-v2
- 7: https://ai.google.dev/gemini-api/docs/thinking
- 8: https://deepmind.google/technologies/gemini/flash/
- 9: https://ai.google.dev/gemini-api/docs/models/experimental-models
- 10: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/gemini-experimental
Gemini Model Verified

The SQL regeneration pipeline configuration is correct, and the `gemini-2.0-flash-exp` model is confirmed to be available through Google AI Studio as of February 2025.

wren-ai-service/docs/config_examples/config.deepseek.yaml (1)

137-139: LGTM! Configuration is consistent with other SQL-related pipelines.
The new `sql_regeneration` pipeline entry follows the established pattern and uses appropriate model and engine choices.

Let's verify that this configuration is consistent across all config files:
❌ Verification inconclusive

🏁 Script executed:

```bash
#!/bin/bash
# Description: Check for sql_regeneration configuration consistency across config files
rg -A 2 "sql_regeneration:" deployment/ docker/ wren-ai-service/
```

Length of output: 64

SQL Regeneration Configuration Verification

The configured `sql_regeneration` pipeline in `wren-ai-service/docs/config_examples/config.deepseek.yaml` appears to follow the established pattern. However, the initial search across directories (deployment/, docker/, and wren-ai-service/) returned no additional instances. This might be expected if the new configuration is only meant to reside in the provided example file.

Please verify manually whether the new pipeline entry should also appear in other related config files or environments.
Actionable comments posted: 2
🧹 Nitpick comments (4)
wren-ai-service/src/pipelines/generation/sql_regeneration.py (2)
24-38: Enhance SQL standards specification in the system prompt.
While the system prompt is clear, it could be more specific about SQL standards to ensure consistent output.

Consider updating the system prompt to explicitly mention supported SQL standards and any specific requirements:

```diff
 ### TASK ###
-You are a great ANSI SQL expert. Now you are given a SQL generation reasoning and an original SQL query,
+You are a great ANSI SQL expert specializing in SQL:2016 standard. Now you are given a SQL generation reasoning and an original SQL query,
 please carefully review the reasoning, and then generate a new SQL query that matches the reasoning.
 While generating the new SQL query, you should use the original SQL query as a reference.
```
158-163: Use more realistic test data in dry run.
The current test data is overly simplistic. Consider using more realistic examples that cover common use cases.

Example:

```diff
 dry_run_pipeline(
     SQLRegeneration,
     "sql_regeneration",
-    sql_generation_reasoning="this is a test query",
-    sql="select * from users",
+    sql_generation_reasoning="Add a WHERE clause to filter active users with more than 100 points",
+    sql="SELECT user_id, username, points FROM users",
 )
```

wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (1)
131-131: Consider addressing the TODO comment.
The comment suggests adding an LLM filter as a backup for the ScoreFilter. With the recent changes to make the scoring threshold configurable, this might be a good time to implement this enhancement.
Would you like me to help implement the LLM filter or create an issue to track this task?
wren-ai-service/src/web/v1/services/ask.py (1)
554-677: Consider refactoring error handling logic.
The error handling logic in `ask_feedback` is very similar to the logic in `ask`. Consider extracting common error handling patterns into a shared method to improve maintainability.

Example refactor:

```python
def _handle_error(self, error: Exception, container: dict, query_id: str, results: dict):
    logger.exception(f"pipeline - OTHERS: {error}")
    container[query_id] = AskFeedbackResultResponse(
        status="failed",
        error=AskError(
            code="OTHERS",
            message=str(error),
        ),
    )
    results["metadata"]["error_type"] = "OTHERS"
    results["metadata"]["error_message"] = str(error)
    return results
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- wren-ai-service/src/config.py (1 hunks)
- wren-ai-service/src/globals.py (2 hunks)
- wren-ai-service/src/pipelines/common.py (2 hunks)
- wren-ai-service/src/pipelines/generation/sql_generation_reasoning.py (6 hunks)
- wren-ai-service/src/pipelines/generation/sql_regeneration.py (2 hunks)
- wren-ai-service/src/pipelines/generation/utils/sql.py (3 hunks)
- wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (4 hunks)
- wren-ai-service/src/web/v1/services/ask.py (13 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- wren-ai-service/src/pipelines/generation/utils/sql.py
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Analyze (python)
- GitHub Check: pytest
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (14)
wren-ai-service/src/pipelines/generation/sql_regeneration.py (1)
1-20: LGTM! Well-organized imports.
The imports are properly organized and include all necessary dependencies for the SQL regeneration functionality.

wren-ai-service/src/pipelines/common.py (2)

49-51: LGTM!
The function call has been reformatted for better readability while maintaining the same functionality.

67-72: LGTM!
The addition of the `max_size` parameter enhances the filtering functionality by allowing control over the number of documents returned. The implementation correctly applies the limit after sorting to ensure the top-scoring documents are returned. (Also applies to: 78-78.) A sketch of the pattern appears below.
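A sketch of the described behavior; the function name and document shape are assumptions, only the sort-then-cap ordering comes from the review:

```python
def score_filter(documents: list, score: float = 0.9, max_size: int = 10) -> list:
    # drop documents below the threshold first
    kept = [doc for doc in documents if doc.score >= score]
    # sort by score descending, then cap, so the top-scoring docs survive
    kept.sort(key=lambda doc: doc.score, reverse=True)
    return kept[:max_size]
```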
wren-ai-service/src/config.py (1)
33-34: LGTM!
The new configuration fields are well-placed in the indexing and retrieval section, properly typed, and have sensible default values that align with their usage in the SqlPairsRetrieval class.
wren-ai-service/src/pipelines/generation/sql_generation_reasoning.py (3)
30-30: LGTM!
The changes effectively integrate SQL examples into the prompt structure, with proper conditional handling to include the examples section only when examples are provided. (Also applies to: 46-51.)

67-67: LGTM!
The sql_samples parameter is properly integrated into the function signature and correctly passed to the prompt builder. (Also applies to: 74-74.)

138-138: LGTM!
The changes properly handle the optional sql_samples parameter with appropriate default values, ensuring backward compatibility through the empty list fallback. (Also applies to: 147-147.)
wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (2)
85-90: LGTM!
The function properly integrates the configurable threshold and max size parameters, correctly passing them to the score filter. (Also applies to: 92-96.)

119-120: LGTM!
The constructor properly handles the new parameters with default values that match the global settings, storing them appropriately in the _configs dictionary. (Also applies to: 134-137.)
wren-ai-service/src/globals.py (1)
92-93: LGTM! Configuration parameters added for SQL pairs retrieval.
The addition of `sql_pairs_similarity_threshold` and `sql_pairs_retrieval_max_size` parameters enhances the configurability of the SQL pairs retrieval pipeline. (Also applies to: 113-115.)
wren-ai-service/src/web/v1/services/ask.py (4)
102-156: LGTM! Well-structured feedback-related model classes.
The new model classes maintain consistency with existing patterns and provide a clear structure for handling SQL feedback functionality.

177-183: LGTM! Improved reusability of the stop check logic.
The `_is_stopped` method has been refactored to accept a container parameter, making it reusable for both ask and feedback results; a sketch of the pattern follows.
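A hypothetical sketch of the refactored check; the names are inferred from the surrounding review, not copied from the code:

```python
def _is_stopped(self, query_id: str, container: dict) -> bool:
    # works for both the ask and the feedback result caches
    result = container.get(query_id)
    return result is not None and result.status == "stopped"
```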
343-348: LGTM! Enhanced SQL generation with similar SQL pairs.
The addition of SQL samples from similar queries can improve the quality of generated SQL.

617-617: Consider providing context for SQL correction.
The SQL correction pipeline is called with an empty contexts list, which might limit its effectiveness.

❌ Verification inconclusive

🏁 Script executed:

```bash
#!/bin/bash
# Description: Check if the sql_correction pipeline requires contexts.
ast-grep --pattern $'sql_correction = generation.SQLCorrection(
  $$$
  contexts=$_,
  $$$
)'
```

Length of output: 94

SQL Correction Pipeline: Manual Verification Needed

The current implementation instantiates the SQL correction pipeline with an empty contexts list. Our automated search (using ast-grep for a pattern like `contexts=$_`) didn't produce evidence to confirm whether other parts of the codebase populate this parameter or if it's expected to remain empty. Please manually verify if non-empty context data is needed for effective SQL regeneration.
Actionable comments posted: 0
🧹 Nitpick comments (1)
wren-ai-service/src/pipelines/indexing/sql_pairs.py (1)
125-135: Consider adding error handling for the cleaner.run call.
While the empty check is good, the function should handle potential errors from the document store operation to ensure pipeline robustness.

Consider wrapping the call in a try-catch block:

```diff
 async def clean(
     cleaner: SqlPairsCleaner,
     sql_pairs: List[SqlPair],
     embedding: Dict[str, Any],
     project_id: Optional[str] = "",
 ) -> Dict[str, Any]:
     sql_pair_ids = [sql_pair.id for sql_pair in sql_pairs]
     if sql_pair_ids:
-        await cleaner.run(sql_pair_ids=sql_pair_ids, project_id=project_id)
+        try:
+            await cleaner.run(sql_pair_ids=sql_pair_ids, project_id=project_id)
+        except Exception as e:
+            logger.error(f"Error cleaning SQL pairs: {e}")
+            # Consider whether to re-raise or continue with embedding

     return embedding
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- wren-ai-service/src/pipelines/generation/sql_generation_reasoning.py (6 hunks)
- wren-ai-service/src/pipelines/indexing/sql_pairs.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- wren-ai-service/src/pipelines/generation/sql_generation_reasoning.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: pytest
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
wren-ai-service/src/pipelines/indexing/sql_pairs.py (1)
132-133: LGTM! Good defensive programming practice.
The added check prevents unnecessary calls to the document store when there are no SQL pairs to clean, which is a good practice for robustness.
Actionable comments posted: 2
🧹 Nitpick comments (5)
wren-ai-service/src/pipelines/generation/sql_generation.py (2)
95-105: Enhance error handling in post_process function.
The function could benefit from explicit error handling for timeout and SQL validation scenarios.

Consider adding try-catch blocks:

```diff
 @observe(capture_input=False)
 async def post_process(
     generate_sql: dict,
     post_processor: SQLGenPostProcessor,
     engine_timeout: float,
     project_id: str | None = None,
 ) -> dict:
+    try:
         return await post_processor.run(
             generate_sql.get("replies"),
             timeout=engine_timeout,
             project_id=project_id,
         )
+    except TimeoutError:
+        logger.error("SQL validation timed out")
+        raise
+    except Exception as e:
+        logger.error(f"Error in post processing: {str(e)}")
+        raise
```
111-166: Enhance type safety with additional type hints.
The class could benefit from more explicit type hints and validation.

Consider these improvements:

```diff
 class SQLGeneration(BasicPipeline):
     def __init__(
         self,
         llm_provider: LLMProvider,
         engine: Engine,
         engine_timeout: Optional[float] = 30.0,
         **kwargs,
-    ):
+    ) -> None:
         self._components = {
             "generator": llm_provider.get_generator(
                 system_prompt=sql_generation_system_prompt,
                 generation_kwargs=SQL_GENERATION_MODEL_KWARGS,
             ),
             "prompt_builder": PromptBuilder(
                 template=sql_generation_user_prompt_template
             ),
             "post_processor": SQLGenPostProcessor(engine=engine),
         }

         self._configs = {
             "engine_timeout": engine_timeout,
         }

         super().__init__(
             AsyncDriver({}, sys.modules[__name__], result_builder=base.DictResult())
         )

     @observe(name="SQL Generation")
     async def run(
         self,
         query: str,
         contexts: List[str],
         sql_generation_reasoning: Optional[str] = None,
         configuration: Configuration = Configuration(),
         sql_samples: Optional[List[Dict[str, str]]] = None,
         project_id: Optional[str] = None,
         has_calculated_field: bool = False,
         has_metric: bool = False,
-    ):
+    ) -> Dict[str, Any]:
```

wren-ai-service/eval/utils.py (2)
220-229: Swap hard-coded print statements for logging.
Using `print("datasource is duckdb")` may be too rigid. If WrenEngine supports other data sources, it might be misleading as well. Consider Python's `logging` module.

```diff
- print("datasource is duckdb")
+ import logging
+ logger = logging.getLogger(__name__)
+ logger.info("Detected WrenEngine; defaulting to DuckDB datasource.")
```

570-587: Rewriting YAML may discard comments & formatting.
The approach with `yaml.safe_dump_all` can lose existing inline comments or formatting. Consider whether preserving these is necessary and use a more selective editing strategy if so.
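One option is a round-trip YAML loader such as ruamel.yaml, which preserves comments, ordering, and quoting; a minimal sketch (the file name and key path are placeholders):

```python
from ruamel.yaml import YAML

yaml_rt = YAML()  # round-trip mode: keeps comments and formatting intact

with open("config.yaml") as f:
    data = yaml_rt.load(f)

# edit only the key that actually changed
data["pipes"][0]["llm"] = "litellm_llm.gpt-4o-mini-2024-07-18"

with open("config.yaml", "w") as f:
    yaml_rt.dump(data, f)
```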
wren-ai-service/eval/prediction.py (1)

109-114: Substring check for "spider_" may be brittle.
Using `"spider_" in path` works, but a more robust detection approach or a dedicated CLI argument might be preferable; a sketch follows.
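A sketch of the dedicated-flag alternative; the flag name and choices are illustrative:

```python
import argparse
from pathlib import Path

parser = argparse.ArgumentParser()
# explicit flag instead of inferring the dataset family from the path string
parser.add_argument("--dataset-family", choices=["spider", "other"], default="other")
args = parser.parse_args()

is_spider = args.dataset_family == "spider"
# or, as a narrower heuristic than a bare substring test:
# is_spider = Path(path).name.startswith("spider_")
```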
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- wren-ai-service/eval/__init__.py (1 hunks)
- wren-ai-service/eval/pipelines.py (8 hunks)
- wren-ai-service/eval/prediction.py (3 hunks)
- wren-ai-service/eval/utils.py (4 hunks)
- wren-ai-service/src/pipelines/generation/sql_generation.py (2 hunks)
- wren-ai-service/src/pipelines/generation/sql_generation_reasoning.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- wren-ai-service/src/pipelines/generation/sql_generation_reasoning.py
🔇 Additional comments (13)
wren-ai-service/src/pipelines/generation/sql_generation.py (2)
42-42: Verify the purpose of the empty line change.
An empty line was modified at line 42 between the SQL sample sections. While this might be intentional for formatting, ensure it doesn't affect the template rendering.

13-14: ⚠️ Potential issue
Critical: Resolve the missing `SQL_GENERATION_MODEL_KWARGS` dependency.
The code still imports and uses `SQL_GENERATION_MODEL_KWARGS`, which has been removed according to the AI summary. This will cause runtime errors.

Let's verify if the variable exists in the utils file:

Consider one of these solutions:

- Restore the variable if it's still needed
- Replace the usage with inline configuration
- Move the configuration to a dedicated config file

Example of inline configuration:

```python
generation_kwargs={
    "response_format": {
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "sql": {"type": "string"},
                "reasoning": {"type": "string"}
            },
            "required": ["sql", "reasoning"]
        }
    }
}
```

(Also applies to: 122-122.)
❌ Incorrect review comment
🏁 Script executed:

```bash
#!/bin/bash
# Check if SQL_GENERATION_MODEL_KWARGS exists in the utils file
rg -A 5 "SQL_GENERATION_MODEL_KWARGS" "src/pipelines/generation/utils/sql.py"
```

Length of output: 153

I'll further investigate by searching the entire repository for any occurrence of `SQL_GENERATION_MODEL_KWARGS`. This will help confirm whether the variable was indeed removed and if it's referenced elsewhere.

🏁 Script executed:

```bash
#!/bin/bash
# Recursively search the repository for SQL_GENERATION_MODEL_KWARGS
rg "SQL_GENERATION_MODEL_KWARGS"
```

Length of output: 1208
Clarification: `SQL_GENERATION_MODEL_KWARGS` is defined in the repository.

A recursive search shows that the variable is indeed defined in `wren-ai-service/src/pipelines/generation/utils/sql.py` and is used in multiple pipeline modules (e.g., in SQL regeneration, correction, and generation). The concern that it was removed is incorrect. Likely an incorrect or invalid review comment.

wren-ai-service/eval/__init__.py (1)

13-13: LGTM! Well-structured configuration addition.
The new `allow_sql_samples` boolean flag is a clean addition that maintains backward compatibility with its default value of `True`. The attribute name is descriptive and follows the class's naming convention; a sketch of the field appears below.
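A sketch of the field as described; the base class and import are assumptions:

```python
from pydantic_settings import BaseSettings

class EvalSettings(BaseSettings):
    # toggles whether retrieved SQL samples are fed into prediction prompts
    allow_sql_samples: bool = True
```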
wren-ai-service/eval/pipelines.py (4)

23-24: LGTM! Enhanced metrics coverage.
The addition of `ExactMatchAccuracy` and `ExecutionAccuracy` metrics enriches the evaluation capabilities.

229-230: LGTM! Clean settings integration.
The `settings: EvalSettings` parameter is properly integrated into the constructor, and the `_allow_sql_samples` flag is correctly initialized from settings. (Also applies to: 238-239.)

254-254: LGTM! Robust SQL samples handling.
The conditional inclusion of SQL samples using `prediction.get("samples", []) if self._allow_sql_samples else []` is a clean implementation that respects the settings flag.
: Verify Spider dataset compatibility.The comment indicates these metrics are for the Spider dataset but are temporarily disabled. However, the metrics are now uncommented and active.
Please confirm if these metrics should be enabled for the Spider dataset evaluation:
- Are there any specific requirements or considerations for the Spider dataset?
- Could enabling these metrics affect other dataset evaluations?
✅ Verification successful
🌐 Web query:
What is the Spider dataset in SQL and how is it typically evaluated?
💡 Result:
The Spider dataset is a large-scale benchmark designed for complex and cross-domain text-to-SQL tasks, focusing on translating natural language questions into SQL queries across diverse database schemas. Here’s an overview of its structure and typical evaluation methods:
Overview of the Spider Dataset
- Scope: Contains 10,181 questions paired with 5,693 unique SQL queries across 200 databases spanning 138 domains (e.g., college admissions, flight tracking)[1][2][4].
- Complexity: SQL queries include advanced features like:
  - Multi-table joins
  - Nested subqueries
  - Aggregations (`GROUP BY`, `HAVING`)
  - Sorting (`ORDER BY`)[1][2]
- Cross-Domain Split: Databases in training and test sets do not overlap, forcing models to generalize to unseen schemas and query structures[1][2].
Key Features
- Schema Diversity: Each database averages 28 columns and 9 foreign keys, with domain-specific schemas[2].
- Human Annotation: Created by 11 annotators over 1,000 hours to ensure quality[1][4].
- Task Design: Tests generalization to new databases and new SQL structures[1][4].
Evaluation Metrics
Spider uses three primary evaluation methods:
- Exact Matching (EM) Accuracy:
  - Checks if the predicted SQL exactly matches the gold query.
  - State-of-the-art models initially achieved only 9.7–12.4% EM accuracy[1][2][4].
- Component Matching:
  - Decomposes SQL into clauses (e.g., `SELECT`, `WHERE`) and compares sets of components[8].
- Execution Accuracy:
  - Runs predicted SQL against databases to verify correctness of results[8].
  - Limited by reliance on value prediction in some cases[8].
Challenges Highlighted by Spider
- Models struggle with:
  - Schema linking (matching questions to tables/columns).
  - Handling unseen database structures.
  - Generating complex nested queries[1][2][5].
- Subsequent benchmarks like Spider 2.0 and Dr.Spider further test robustness against perturbations (e.g., schema variations or paraphrased questions)[3][5].
Example Use Case
For a question like “List all TV shows with more than 10 episodes”, models must:
- Identify relevant tables (`TV_show`, `Episodes`).
- Join tables using foreign keys.
- Generate:

```sql
SELECT TV_show.name
FROM TV_show
JOIN Episodes ON TV_show.id = Episodes.show_id
GROUP BY TV_show.id
HAVING COUNT(*) > 10;
```
The Spider dataset remains a critical benchmark for advancing text-to-SQL systems due to its emphasis on real-world complexity and cross-domain generalization[1][2][4].
Citations:
- 1: https://ar5iv.labs.arxiv.org/html/1809.08887
- 2: https://aclanthology.org/D18-1425.pdf
- 3: https://openreview.net/pdf/7a17aa3930ef4d006ff699f1ae9c2616fe8c6e60.pdf
- 4: https://paperswithcode.com/paper/spider-a-large-scale-human-labeled-dataset
- 5: https://openreview.net/forum?id=Wc5bmZZU9cy
- 6: https://www.databricks.com/blog/improving-text2sql-performance-ease-databricks
- 7: https://github.com/taoyds/spider/blob/master/README.md
- 8: https://github.com/taoyds/spider
- 9: https://www.castordoc.com/ai-strategy/how-to-download-and-use-the-spider-dataset
- 10: https://paperswithcode.com/dataset/spider-realistic
Verification of Spider Dataset Compatibility in Metrics Activation
Based on the Spider dataset overview and its standard evaluation practices, here are the key points:
Spider Evaluation Standards: The Spider dataset is a benchmark for complex text-to-SQL tasks that typically evaluates predictions via:
- Exact Matching Accuracy: Ensuring the predicted SQL exactly matches the gold query.
- Execution Accuracy: Validating correctness by running the SQL query.
These are standard and expected metrics for Spider dataset evaluation.
No Special Requirements: There are no additional, unique configuration requirements for Spider beyond accommodating its inherent complexity (e.g., handling multi-table joins, nested subqueries, etc.). What’s essential is ensuring that the evaluation pipeline applies these metrics specifically when the Spider dataset is used.
Impact on Other Evaluations: Enabling these metrics in the Spider evaluation pipeline should not affect evaluations for other datasets, provided that the pipeline is set up to conditionally apply dataset-specific metrics. Each dataset’s evaluation should only trigger the metrics intended for it, and there is no intrinsic conflict between these metrics and other evaluation routines.
Action: The activated metrics (ExactMatchAccuracy and ExecutionAccuracy) are in line with Spider’s evaluation protocol. As long as the pipeline correctly segregates evaluation per dataset, they should remain enabled for Spider without adverse effects on other dataset evaluations.
wren-ai-service/eval/utils.py (2)
13-13: New YAML import looks fine.
No concerns with adding `yaml` as a dependency for reading/writing YAML files.

18-18: Import usage is correct.
No issues importing `WrenEngine` for specialized engine checks.

wren-ai-service/eval/prediction.py (4)

2-2: New import for base64.
No concerns; used properly for encoding.

9-9: Use of orjson for faster serialization.
No issues; it's generally a performant JSON library.

19-19: Imported functions align with new logic.
No concerns about referencing `parse_toml` and `replace_wren_engine_env_variables`.

105-105: Parsing dataset earlier in the flow.
No issues. This can simplify subsequent logic by ensuring the dataset is available upfront.
Actionable comments posted: 0
🧹 Nitpick comments (2)
wren-ai-service/eval/metrics/spider/__init__.py (2)
567-567: Consider using more specific exception types.
While changing from bare `except` to `except Exception` is an improvement, consider catching specific exceptions that you expect (e.g., `ValueError`, `KeyError`) for better error handling.

```diff
-    except Exception:
+    except (ValueError, KeyError) as e:
+        print(f"Error parsing SQL structure: {e}")
```
628-639: Consider using logging instead of print for error messages.
While the error handling is good, using the logging module would be more appropriate for error tracking in a production environment.

```diff
+import logging
+
 def build_foreign_key_map_from_json(table):
     try:
         import json

         with open(table) as f:
             data = json.load(f)
         tables = {}
         for entry in data:
             tables[entry["db_id"]] = build_foreign_key_map(entry)
         return tables
     except Exception as e:
-        print(f"Error building foreign key map from JSON: {e}")
+        logging.error(f"Error building foreign key map from JSON: {e}")
         return {}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- wren-ai-service/eval/metrics/spider/__init__.py (4 hunks)
- wren-ai-service/eval/prediction.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- wren-ai-service/eval/prediction.py
🔇 Additional comments (2)
wren-ai-service/eval/metrics/spider/__init__.py (2)
159-162: LGTM! Improved type checking using `isinstance()`.
The change from `type() is` to `isinstance()` follows Python best practices and makes the code more robust by properly handling inheritance and subclassing; a tiny illustration follows.
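A quick illustration of the difference:

```python
class Base:
    pass

class Child(Base):
    pass

obj = Child()
print(type(obj) is Base)      # False: exact-type comparison ignores inheritance
print(isinstance(obj, Base))  # True: respects subclassing
```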
501-509: LGTM! Consistent type checking improvements.
The change to `isinstance()` maintains consistency with other type checking improvements and follows Python best practices.
overall lgtm. but a test case needs to be modified: the assertion in test_sql_pairs::test_sql_paris_deletion now fails:

```
await pipe.clean(sql_pairs=[], project_id="fake-id")
> assert await store.count_documents() == 0
E assert 2 == 0
```

fixed
- removed outdated `sql_explanation` and `sql_regeneration` related code
- added new `sql_regeneration` api, service, and pipeline for the feedback loop
- centralized `SQL_GENERATION_MODEL_KWARGS` for the SQL generation pipelines

Summary by CodeRabbit

- New Features
- Refactor
- Tests