-
Notifications
You must be signed in to change notification settings - Fork 588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(wren-ai-service): Consolidate SQL Pairs Service and Remove Redundant Code #1268
Conversation
WalkthroughThis pull request updates the SQL pairs configuration and related service logic. The default file path for SQL pairs is renamed, and service imports in the ServiceContainer are consolidated. Pipeline functions are modified to support dynamic method invocation and improved logging. In addition, legacy SQL pairs deletion and preparation functionality is removed in favor of a new SQL pairs API and service that handle indexing, cleaning, and deletion. Associated tests have been added, and outdated tests removed. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Router
participant Service as SqlPairsService
participant Pipeline
participant Cache
Client->>Router: POST /sql-pairs (payload)
Router->>Service: index(request)
Service->>Pipeline: Execute SQL pairs indexing
Pipeline-->>Service: Return result
Service->>Cache: Update status (finished/failed)
Service-->>Router: Response with tracking ID
Router-->>Client: Return response
sequenceDiagram
participant Client
participant Router
participant Service as SqlPairsService
participant Pipeline
participant Cache
Client->>Router: DELETE /sql-pairs (payload)
Router->>Service: delete(request)
Service->>Pipeline: Execute SQL pairs deletion (clean)
Pipeline-->>Service: Return deletion result
Service->>Cache: Update status (finished/failed)
Service-->>Router: Response with deletion ID
Router-->>Client: Return deletion response
Suggested reviewers
Poem
Tip 🌐 Web search-backed reviews and chat
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (9)
💤 Files with no reviewable changes (9)
⏰ Context from checks skipped due to timeout of 90000ms (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
wren-ai-service/src/web/v1/routers/sql_pairs.py (1)
145-162
: Return a unified status.The GET endpoint returns
"deleting"
,"finished"
, or"failed"
. If you anticipate indexing and deleting statuses in the same ID’s lifecycle, consider also supporting"indexing"
. Otherwise, the returned data might look incomplete for those using the GET endpoint to track indexing progress.- status: Literal["deleting", "finished", "failed"] + status: Literal["indexing", "deleting", "finished", "failed"]
🧹 Nitpick comments (9)
wren-ai-service/src/web/v1/services/sql_pairs.py (4)
16-25
: Consider using a more descriptive error code mechanism.Currently, the error field only supports
"OTHERS"
as a literal. If you anticipate more nuanced error categories (e.g.,"PIPELINE_ERROR"
,"DATA_ERROR"
, etc.), consider refactoring to anEnum
or a broader set of Literals for better clarity and extensibility.
35-47
: Centralize logging or re-raise exceptions for broader monitoring.While
_handle_exception
sets the Resource to"failed"
and logs the error, consider whether additional handling (like re-raising or alerting) might be beneficial, especially if you have a monitoring or alerting pipeline that depends on raised exceptions rather than mere log outputs.
90-111
: Ensure consistent error handling with the delete operation.Deletion logic mirrors the indexing flow, but if partial deletions fail, you may want to confirm whether the entire request should be considered failed. Currently, any thrown exception marks the entire operation as
"failed"
. If partial or bulk deletes are allowed, consider different statuses or more granular error reporting.
113-119
: Avoid usinglogger.exception
for resource-not-found scenarios.If the resource was never created or has expired from the TTL cache, this often isn’t truly an “exceptional” scenario. Using
logger.exception
can confuse operational metrics or log scrapers into thinking an error occurred. Consider switching tologger.info
orlogger.warning
.wren-ai-service/src/web/v1/routers/sql_pairs.py (1)
91-110
: BackgroundTasks error monitoring.The indexing operation is offloaded to a background task. If an unhandled exception occurs there, it may never surface in the main request logs. Ensure you have a robust logging or error collection mechanism to capture background task exceptions.
wren-ai-service/src/pipelines/indexing/sql_pairs.py (2)
52-74
: Check for double-deletion or missing docs.
SqlPairsCleaner.run
callsdelete_documents
on matching records. If a record doesn’t exist, there’s no error. Depending on your logic, you may prefer more explicit reporting (for partial or total misses). Also ensure the DocumentStore’s filter logic aligns with your ID or project scoping.
213-225
: Return confirmation on successful deletion.Your
clean
method isasync
and returnsNone
, but you might want to return an object or success boolean. That can help the caller confirm that the cleaning ran without error instead of having to rely solely on logs or exceptions.wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs.py (1)
73-96
: Consider adding edge case tests.While the test covers basic deletion scenarios, consider adding tests for:
- Deleting non-existent SQL pairs
- Deleting with invalid project IDs
- Concurrent deletions
@pytest.mark.asyncio async def test_sql_pairs_deletion_edge_cases(): # Setup code... # Test deleting non-existent pairs await pipe.clean(sql_pairs=[], project_id="non-existent-id") assert await store.count_documents() == 2 # Should not affect existing documents # Test invalid project ID with pytest.raises(ValueError): await pipe.clean(sql_pairs=[], project_id="") # Test concurrent deletions await asyncio.gather( pipe.clean(sql_pairs=[], project_id="fake-id"), pipe.clean(sql_pairs=[], project_id="fake-id-2") ) assert await store.count_documents() == 0wren-ai-service/tests/pytest/services/test_sql_pairs.py (1)
28-229
: Consider adding more edge cases to improve test coverage.The current test suite covers basic functionality well. Consider adding these scenarios:
- Test with invalid SQL queries
- Test with duplicate SQL pair IDs
- Test with non-existent project IDs
- Test concurrent indexing operations
- Test error handling for database connection failures
Would you like me to help implement these additional test cases?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
wren-ai-service/src/config.py
(1 hunks)wren-ai-service/src/globals.py
(11 hunks)wren-ai-service/src/pipelines/common.py
(1 hunks)wren-ai-service/src/pipelines/indexing/__init__.py
(0 hunks)wren-ai-service/src/pipelines/indexing/sql_pairs.py
(4 hunks)wren-ai-service/src/pipelines/indexing/sql_pairs_deletion.py
(0 hunks)wren-ai-service/src/web/v1/routers/__init__.py
(2 hunks)wren-ai-service/src/web/v1/routers/question_recommendation.py
(1 hunks)wren-ai-service/src/web/v1/routers/relationship_recommendation.py
(1 hunks)wren-ai-service/src/web/v1/routers/semantics_description.py
(1 hunks)wren-ai-service/src/web/v1/routers/sql_pairs.py
(1 hunks)wren-ai-service/src/web/v1/routers/sql_pairs_preparation.py
(0 hunks)wren-ai-service/src/web/v1/services/__init__.py
(2 hunks)wren-ai-service/src/web/v1/services/sql_pairs.py
(1 hunks)wren-ai-service/src/web/v1/services/sql_pairs_preparation.py
(0 hunks)wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs.py
(1 hunks)wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs_deletion.py
(0 hunks)wren-ai-service/tests/pytest/services/test_sql_pairs.py
(1 hunks)wren-ai-service/tests/pytest/services/test_sql_pairs_preparation.py
(0 hunks)
💤 Files with no reviewable changes (6)
- wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs_deletion.py
- wren-ai-service/src/pipelines/indexing/init.py
- wren-ai-service/src/pipelines/indexing/sql_pairs_deletion.py
- wren-ai-service/src/web/v1/services/sql_pairs_preparation.py
- wren-ai-service/src/web/v1/routers/sql_pairs_preparation.py
- wren-ai-service/tests/pytest/services/test_sql_pairs_preparation.py
✅ Files skipped from review due to trivial changes (3)
- wren-ai-service/src/web/v1/routers/semantics_description.py
- wren-ai-service/src/web/v1/routers/relationship_recommendation.py
- wren-ai-service/src/web/v1/routers/question_recommendation.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: pytest
- GitHub Check: pytest
- GitHub Check: Analyze (go)
🔇 Additional comments (16)
wren-ai-service/src/web/v1/services/sql_pairs.py (3)
26-34
: Validate pipeline keys before use.You're referencing
self._pipelines["sql_pairs"]
later on; ensure that"sql_pairs"
key is always present. A defensive check or a try-except for missing pipeline keys could enhance reliability if there's a configuration or registration issue.
60-81
: Guard against uninitialized or empty requests.When processing
.sql_pairs
inindex
, consider validating thatrequest.sql_pairs
is non-empty and well-formed. Empty or erroneous input might cause an unnecessary pipeline run or partial failure.
127-129
: Consider concurrency safety for cache writes.
__setitem__
is straightforward, but if you anticipate concurrent writes to the same key from multiple tasks, data synchronization or checks might be needed. ATTLCache
could still face race conditions in extreme cases.wren-ai-service/src/web/v1/routers/sql_pairs.py (2)
20-39
: Optional input validation for SQL pairs.When receiving SQL pairs in the POST request, ensure the list is not excessively large or malformed. A pre-validation step (e.g., restricting the size or performing schema checks) can safeguard against unexpected overhead or pipeline failures.
121-143
: Symmetry in endpoint naming and handling.The DELETE endpoint mirrors the POST flow. This consistency is good, but also be sure each endpoint’s background task exceptions are tracked consistently. If you want partial deletes or more granular feedback, you might add additional metadata (e.g., which IDs succeeded/failed).
wren-ai-service/src/pipelines/indexing/sql_pairs.py (2)
24-25
: Clarify default values.
sql: str = ""
andquestion: str = ""
are convenient defaults, but consider whether empty strings might cause confusion downstream. If an empty query is invalid, it might be safer to raise an error during initialization.
184-185
: Validate existence of local SQL pairs file.
_load_sql_pairs
warns if the file is not found, but the subsequent pipeline logic always mergesself._external_pairs
. If the file is missing or corrupt, you might define a fallback to prevent silent partial merges or duplicated keys.wren-ai-service/src/web/v1/services/__init__.py (2)
41-41
: LGTM! String formatting improvement.Changed from single quotes to double quotes for better consistency.
61-92
: LGTM! Well-organized service imports.Good practice to:
- Consolidate service imports in
__init__.py
to avoid circular imports- Use
__all__
to explicitly control exports- Include
# noqa: E402
to suppress import order warningswren-ai-service/src/pipelines/common.py (2)
36-41
: LGTM! Enhanced pipeline flexibility.Added
method
parameter enables dynamic invocation of pipeline methods, supporting the consolidated SQL pairs functionality.
49-49
: Verify the impact of dev logging in production.Setting
is_dev=True
might expose sensitive information in logs.wren-ai-service/src/web/v1/routers/__init__.py (1)
15-15
: LGTM! Router consolidation aligns with PR objectives.Successfully replaced
sql_pairs_preparation
with consolidatedsql_pairs
router while maintaining import and inclusion order.Also applies to: 33-33
wren-ai-service/src/config.py (1)
63-63
: Verify the impact of the configuration path change.The default path for SQL pairs has been updated from "pairs.json" to "sql_pairs.json" for better naming consistency. This is a breaking change that requires users to update their configuration files.
Run this script to find any hardcoded references to the old path:
✅ Verification successful
🏁 Script executed:
#!/bin/bash # Description: Search for hardcoded references to the old path. rg -i "pairs\.json"Length of output: 681
Configuration Change Impact Verified
- The default
sql_pairs_path
inwren-ai-service/src/config.py
has been updated to"sql_pairs.json"
, and this change propagates to the corresponding module inwren-ai-service/src/pipelines/indexing/sql_pairs.py
.- Test cases in
wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs.py
explicitly override the path using"tests/data/pairs.json"
, so they remain unaffected.- This confirms that production users will need to update their configuration files accordingly, as the breaking change is applied only to the default and not in tests.
wren-ai-service/tests/pytest/services/test_sql_pairs.py (1)
12-26
: LGTM! Well-structured fixture setup.The fixture correctly initializes the service container and document store provider with proper cleanup between tests.
wren-ai-service/src/globals.py (2)
10-10
: LGTM! Improved import organization.Consolidating service imports into a single namespace improves code organization and maintainability.
17-30
: LGTM! Service container updates align with the PR objectives.The changes correctly:
- Update service type annotations to use the services namespace
- Rename SqlPairsPreparationService to SqlPairsService
- Update pipeline configuration for SQL pairs
Also applies to: 232-240
|
||
return await self._pipe.execute( | ||
["write"], | ||
inputs={ | ||
"mdl_str": mdl_str, | ||
"project_id": project_id, | ||
**self._components, | ||
input = { | ||
"mdl_str": mdl_str, | ||
"project_id": project_id, | ||
"external_pairs": { | ||
**self._external_pairs, | ||
**external_pairs, | ||
}, | ||
**self._components, | ||
} | ||
|
||
return await self._pipe.execute(["write"], inputs=input) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider error handling in pipeline execution.
The pipeline self._pipe.execute(["write"], inputs=input)
might fail if the store or embedding process errors. Currently, that error is not caught here. For user-facing reliability, consider a try-except to either log or handle partial writes more gracefully.
+ try:
+ return await self._pipe.execute(["write"], inputs=input)
+ except Exception as e:
+ logger.error(f"Pipeline execution failed: {e}")
+ raise
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
return await self._pipe.execute( | |
["write"], | |
inputs={ | |
"mdl_str": mdl_str, | |
"project_id": project_id, | |
**self._components, | |
input = { | |
"mdl_str": mdl_str, | |
"project_id": project_id, | |
"external_pairs": { | |
**self._external_pairs, | |
**external_pairs, | |
}, | |
**self._components, | |
} | |
return await self._pipe.execute(["write"], inputs=input) | |
input = { | |
"mdl_str": mdl_str, | |
"project_id": project_id, | |
"external_pairs": { | |
**self._external_pairs, | |
**external_pairs, | |
}, | |
**self._components, | |
} | |
try: | |
return await self._pipe.execute(["write"], inputs=input) | |
except Exception as e: | |
logger.error(f"Pipeline execution failed: {e}") | |
raise |
|
||
pipe_components = generate_components(settings.components) | ||
pipeline = pipeline_cls(**pipe_components[pipeline_name]) | ||
init_langfuse(settings) | ||
|
||
async_validate(lambda: pipeline.run(**kwargs)) | ||
async_validate(lambda: getattr(pipeline, method)(**kwargs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Verify error handling for dynamic method invocation.
Using getattr
without checking if the method exists could raise AttributeError
.
- async_validate(lambda: getattr(pipeline, method)(**kwargs))
+ if not hasattr(pipeline, method):
+ raise ValueError(f"Method '{method}' not found in pipeline")
+ async_validate(lambda: getattr(pipeline, method)(**kwargs))
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
async_validate(lambda: getattr(pipeline, method)(**kwargs)) | |
if not hasattr(pipeline, method): | |
raise ValueError(f"Method '{method}' not found in pipeline") | |
async_validate(lambda: getattr(pipeline, method)(**kwargs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
This PR refactors the SQL Pairs functionality to improve code organization and remove redundancy:
Key Changes
SqlPairsService
) and removed redundantSqlPairsPreparationService
__init__.py
to improve code organization and avoid circular importssql_pairs.json
instead ofpairs.json
)Technical Details
SqlPairsDeletion
pipeline and integrated deletion functionality intoSqlPairs
pipelineSqlPairsService
Testing
Breaking Changes
SqlPairsPreparationService
in favor ofSqlPairsService
pairs.json
tosql_pairs.json
Summary by CodeRabbit
New Features
Refactor
Removed Features
Tests
Chores