feat(wren-ai-service): Refactor Semantics Description Pipeline with Output Filtering and Request Model Updates #1272
Conversation
Walkthrough

The changes update the semantics description processing pipeline: a new `output` function filters and validates columns, and the request model used by the service is updated.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Router
    participant Service
    participant Pipeline
    Client->>Router: HTTP Request (PostRequest)
    Router->>Router: Convert PostRequest to GenerateRequest using model_dump()
    Router->>Service: Invoke generate(generate_request)
    Service->>Pipeline: Process data through output(normalize, picked_models)
    Pipeline-->>Service: Return filtered model data (output)
    Service-->>Router: Return response with output key
    Router-->>Client: HTTP Response (PostResponse)
```
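The PostRequest-to-GenerateRequest conversion in the diagram can be sketched with pydantic's `model_dump()`. The field names below are illustrative assumptions, not the service's actual schema:

```python
from typing import Optional

from pydantic import BaseModel


# Hypothetical request models; the field names are assumptions made
# for illustration, not the service's actual schema.
class PostRequest(BaseModel):
    mdl: str
    project_id: Optional[str] = None


class GenerateRequest(BaseModel):
    mdl: str
    project_id: Optional[str] = None
    configuration: Optional[dict] = None


# The router converts the public request into the service's request
# without listing every field by hand.
post = PostRequest(mdl='{"models": []}', project_id="p1")
generate = GenerateRequest(**post.model_dump())
print(generate.project_id)  # p1
```

Because `model_dump()` returns a plain dict, fields added to both models later are forwarded automatically, which is the maintainability win the review notes.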
Actionable comments posted: 0
🧹 Nitpick comments (2)
wren-ai-service/tests/pytest/pipelines/generation/test_semantics_enrichment.py (1)
Lines 48-67: Consider adding more edge cases.

While the current test covers the case where all columns are hallucinated, consider adding tests for:
- Empty input models
- Models with mixed valid and invalid column types
- Models with special characters in column names
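The suggested edge-case tests could look like the sketch below. The real tests would import `output` from the pipeline module; a minimal stand-in with the same filtering contract is defined inline here so the sketch is self-contained (an assumption, not the project's code):

```python
# Stand-in for the pipeline's output function, assumed to keep only
# models and columns that exist in picked_models.
def output(normalize: dict, picked_models: list) -> dict:
    valid = {m["name"]: {c["name"] for c in m["columns"]} for m in picked_models}
    return {
        name: {
            **data,
            "columns": [c for c in data["columns"] if c["name"] in valid.get(name, set())],
        }
        for name, data in normalize.items()
        if name in valid
    }


def test_empty_input_models():
    # No models at all should yield an empty result, not an error.
    assert output({}, []) == {}


def test_mixed_valid_and_invalid_columns():
    # Only the column present in picked_models survives.
    normalize = {"orders": {"columns": [{"name": "id"}, {"name": "ghost"}]}}
    picked = [{"name": "orders", "columns": [{"name": "id"}]}]
    result = output(normalize, picked)
    assert [c["name"] for c in result["orders"]["columns"]] == ["id"]


def test_special_characters_in_column_names():
    # Special characters should not break exact-name matching.
    normalize = {"t": {"columns": [{"name": "a b-c$"}]}}
    picked = [{"name": "t", "columns": [{"name": "a b-c$"}]}]
    result = output(normalize, picked)
    assert [c["name"] for c in result["t"]["columns"]] == ["a b-c$"]
```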
wren-ai-service/src/pipelines/generation/semantics_description.py (1)
Lines 91-104: Consider optimizing the filtering logic.

The current implementation creates multiple intermediate dictionaries. Consider using a more memory-efficient approach for large datasets.
```diff
-    models = {model["name"]: model for model in picked_models}
-
-    return {
-        name: {**data, "columns": _filter(data["columns"], models[name]["columns"])}
-        for name, data in normalize.items()
-        if name in models
-    }
+    models = {model["name"]: set(col["name"] for col in model["columns"])
+              for model in picked_models}
+
+    return {
+        name: {**data, "columns": [
+            col for col in data["columns"]
+            if col["name"] in models.get(name, set())
+        ]}
+        for name, data in normalize.items()
+        if name in models
+    }
```
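As a runnable illustration of the suggested set-based approach: the sketch below assumes `normalize` maps model names to dicts carrying a `columns` list, and `picked_models` is a list of dicts keyed by `name` and `columns` (the function name `filter_output` is made up for the example):

```python
def filter_output(normalize: dict, picked_models: list) -> dict:
    # Precompute the set of valid column names per model once, so each
    # membership check is O(1) instead of scanning the column list.
    models = {
        model["name"]: {col["name"] for col in model["columns"]}
        for model in picked_models
    }
    return {
        name: {
            **data,
            "columns": [
                col for col in data["columns"]
                if col["name"] in models.get(name, set())
            ],
        }
        for name, data in normalize.items()
        if name in models
    }


normalize = {
    "users": {
        "description": "User table",
        "columns": [{"name": "id"}, {"name": "hallucinated$"}],
    },
    "unknown_model": {"description": "not picked", "columns": []},
}
picked_models = [{"name": "users", "columns": [{"name": "id"}, {"name": "email"}]}]

# Hallucinated columns and unpicked models are dropped:
print(filter_output(normalize, picked_models))
# {'users': {'description': 'User table', 'columns': [{'name': 'id'}]}}
```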
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- wren-ai-service/src/pipelines/generation/semantics_description.py (2 hunks)
- wren-ai-service/src/web/v1/routers/semantics_description.py (1 hunks)
- wren-ai-service/src/web/v1/services/semantics_description.py (3 hunks)
- wren-ai-service/tests/pytest/pipelines/generation/test_semantics_enrichment.py (1 hunks)
- wren-ai-service/tests/pytest/services/test_semantics_description.py (11 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: pytest
- GitHub Check: pytest
🔇 Additional comments (11)
wren-ai-service/tests/pytest/pipelines/generation/test_semantics_enrichment.py (2)
Lines 4-24: LGTM! Well-structured test for the base case.

The test effectively validates the output function's behavior when no hallucinated columns are present.

Lines 26-46: LGTM! Good test coverage for hallucination filtering.

The test properly verifies that hallucinated columns (marked with `$`) are filtered out while preserving valid columns.

wren-ai-service/src/web/v1/services/semantics_description.py (3)
Line 35: LGTM! Type annotation improvement.

The type annotation `Dict[str, self.Resource]` is more precise than the previous version.

Lines 50-57: LGTM! Well-structured request model.

The GenerateRequest model clearly defines all required fields and includes optional configuration.

Lines 87-98: Consider adding validation for column names.

The code processes column names but doesn't validate their format. Consider adding validation to ensure column names don't contain special characters that could cause issues.
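A minimal sketch of such validation, assuming a conventional identifier rule (letters, digits, underscores, not starting with a digit) — the actual rules would be the service's decision, and the helper name is hypothetical:

```python
import re

# Hypothetical pattern: allow letters, digits, and underscores only,
# and require the name not to start with a digit.
_VALID_COLUMN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_column_name(name: str) -> str:
    """Raise ValueError for column names that could cause issues downstream."""
    if not _VALID_COLUMN.fullmatch(name):
        raise ValueError(f"invalid column name: {name!r}")
    return name


validate_column_name("user_id")        # passes through unchanged
# validate_column_name("drop table;")  # would raise ValueError
```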
wren-ai-service/src/web/v1/routers/semantics_description.py (1)
Lines 114-116: LGTM! Clean request object creation.

Using `model_dump()` for request creation is a cleaner approach than manual parameter passing.

wren-ai-service/src/pipelines/generation/semantics_description.py (1)
Line 218: LGTM! Pipeline step update.

Updating the final step to "output" aligns with the new filtering functionality.
wren-ai-service/tests/pytest/services/test_semantics_description.py (4)
Lines 13-14: LGTM! Mock response structure updated to match pipeline changes.

The mock response structure has been correctly updated to use "output" instead of "normalize", aligning with the pipeline's final step update.

Lines 31-36: LGTM! Request model consistently updated across all test cases.

All test cases have been properly updated to use `GenerateRequest` instead of `Input` while maintaining the same request structure and parameters. Also applies to lines 57-62, 79-84, 135-140, 171-176, 194-199, 223-228.

Lines 143-146: LGTM! Batch processing mock responses updated correctly.

The mock responses for batch processing have been properly updated to use the new "output" structure while maintaining the test's logical integrity.

Lines 233-237: LGTM! Concurrent test mock response structure updated.

The delayed_response function has been correctly updated to use the new "output" structure while preserving the concurrent testing logic.
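The mocking pattern these tests rely on can be sketched as follows; the fixture shape is illustrative, not the suite's exact code, but it shows the key change — the mocked pipeline now resolves to a dict keyed by "output" rather than "normalize":

```python
import asyncio
from unittest.mock import AsyncMock

# Mock the pipeline so its async run() resolves to the new structure.
pipeline = AsyncMock()
pipeline.run.return_value = {
    "output": {"users": {"columns": [{"name": "id"}]}}
}


async def generate(pipeline) -> dict:
    # The service now reads the "output" key (previously "normalize").
    result = await pipeline.run()
    return result["output"]


resolved = asyncio.run(generate(pipeline))
print(resolved)  # {'users': {'columns': [{'name': 'id'}]}}
```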
lgtm
This PR introduces improvements to the Semantics Description pipeline by adding output filtering capabilities and restructuring request handling.
Key Changes
- Added an `output` function in the semantics description pipeline to filter and validate columns
- Replaced the `Input` model with a `GenerateRequest` model in the semantics description service
- Renamed the pipeline's final step from `normalize` to `output`
- Updated response handling to read `output` instead of `normalize`
Technical Details
Testing

- Added test_semantics_enrichment.py covering the new output filtering
Try it on the Web endpoint

POST /v1/semantics-descriptions

Impact
These changes improve the reliability of the semantics description pipeline by filtering out hallucinated columns and streamlining request handling.