Add faq consistency check #209

Open
wants to merge 6 commits into main

Conversation

Contributor

@cremertim cremertim commented Mar 3, 2025

Adds a consistency check for FAQ rewriting. Returns the inconsistencies that were found, as well as a new proposal for the FAQ.

Summary by CodeRabbit

  • New Features

    • Enhanced rewriting outputs with detailed feedback—including improvement suggestions and alerts on content inconsistencies—to provide clearer status updates.
    • Introduced course-specific identification and improved FAQ integration, enabling tailored content retrieval via advanced search techniques.
    • Added new attributes for more comprehensive status updates, including suggestions, inconsistencies, and improvement.
    • Implemented a new method for fetching FAQs from the database with flexible search capabilities.
    • Added a new prompt for verifying FAQ consistency against final results.
  • Chores

    • Removed an outdated FAQ rewriting prompt to streamline the overall process.

Contributor

coderabbitai bot commented Mar 3, 2025

Walkthrough

This pull request introduces several changes to the rewriting pipeline and its associated modules. It adds a new course_id field to the RewritingPipelineExecutionDTO class and new attributes (suggestions, inconsistencies, improvement) to the RewritingStatusUpdateDTO class. A new prompt for FAQ consistency checking is introduced, while an outdated FAQ rewriting prompt is removed. The RewritingPipeline class is enhanced with FAQ retrieval and consistency checking functionalities, and the StatusCallback class's done method is updated to accommodate new parameters for reporting inconsistencies and improvements.
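For orientation, here is a minimal sketch of the two updated DTOs, assuming pydantic models. The field names, the courseId alias, and the improvement default come from this PR; the remaining field and the exact list types are assumptions.

from typing import List, Optional

from pydantic import BaseModel, Field


class RewritingPipelineExecutionDTO(BaseModel):
    to_be_rewritten: str  # pre-existing field, referenced elsewhere in this PR
    course_id: int = Field(alias="courseId")  # new in this PR


class RewritingStatusUpdateDTO(BaseModel):
    suggestions: Optional[List[str]] = None  # new; exact type is an assumption
    inconsistencies: Optional[List[str]] = None  # new; exact type is an assumption
    improvement: str = ""  # new; default taken from the diff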

Changes

File(s) → Change Summary
  • app/domain/rewriting_pipeline_execution_dto.py, app/domain/status/rewriting_status_update_dto.py: Added new course_id field in RewritingPipelineExecutionDTO and new attributes (suggestions, inconsistencies, improvement) in RewritingStatusUpdateDTO.
  • app/pipeline/prompts/faq_consistency_prompt.py, app/pipeline/prompts/faq_rewriting.py: Introduced faq_consistency_prompt variable; removed legacy faq_rewriting prompt file.
  • app/pipeline/rewriting_pipeline.py: Extended the pipeline with new dependencies (FaqRetrieval, VectorDatabase), added FAQ retrieval, consistency checking via check_faq_consistency, and inconsistency parsing via parse_inconsistencies; updated the __call__ method to handle FAQs.
  • app/retrieval/faq_retrieval.py: Added get_faqs_from_db method to fetch FAQs from the database using course ID and hybrid search parameters.
  • app/web/status/status_update.py: Updated the done method in StatusCallback to handle additional parameters (inconsistencies, improvement) for enhanced status reporting.

Possibly related PRs

  • FAQ rewriting pipeline #191: Related to modifications in the RewritingPipelineExecutionDTO class, specifically its field structure and introduction of new fields.

Suggested labels

component:LLM

Suggested reviewers

  • bassner
  • isabellagessl

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (14)
app/pipeline/prompts/faq_consistency_prompt.py (5)

7-9: Consider removing unnecessary empty lines.

There are several consecutive empty lines here that don't serve a purpose and could be removed to improve readability.

-
-
-

19-27: Improve clarity of inconsistencies detection instructions.

The instructions for the "faqs" entry could be more concise and better organized. Consider restructuring for clarity while maintaining all requirements.

-"faqs": This entry should be a list of Strings, each string represents an FAQ.
--Make sure each faq is separated by comma.
--Also end each faq with a newline character.
--The fields are exactly named faq_id, faq_question_title and faq_question_answer
-and reside within properties dict of each list entry.
--Make sure to only include inconsistent faqs
--Do not include any additional FAQs that are consistent with the final_result.
+"faqs": This entry should be a list of strings, each representing an inconsistent FAQ.
+- Each FAQ must be comma-separated and end with a newline character
+- Only include fields named faq_id, faq_question_title, and faq_question_answer within the properties dict
+- Only include FAQs inconsistent with the final_result, exclude consistent ones

35-40: Simplify suggestion formatting instructions.

The instructions for the "suggestion" entry contain unnecessary newline characters and could be more concise.

-"suggestion": This entry is a list of strings, each string represents a suggestion to improve the final result.\n
-- Each suggestion should focus on a different inconsistency.
-- Each suggestions highlights what is the inconsistency and how it can be improved. 
-- Do not mention the term final result, call it provided text
-- Please ensure that at no time, you have a different amount of suggestions than inconsistencies.\n
-Both should have the same amount of entries.
+"suggestion": This entry is a list of strings, each representing a suggestion to improve the final result.
+- Each suggestion should focus on a unique inconsistency
+- Highlight both the inconsistency and how it can be improved
+- Refer to "final result" as "provided text"
+- Ensure the number of suggestions exactly matches the number of inconsistencies

42-44: Remove unnecessary empty lines after "improved version" explanation.

There are consecutive empty lines that don't serve a purpose in the prompt.

-"improved version": This entry should be a string that represents the improved version of the final result.
-
-
+"improved version": This entry should be a string that represents the improved version of the final result.

47-47: Missing newline at end of file.

The file is missing a newline character at the end, which is a common convention for Python files.

-"""
-47
+"""
app/domain/status/rewriting_status_update_dto.py (1)

10-10: Fix spacing in type annotation.

There's a missing space between the type annotation and the assignment operator.

-    improvement: str= ""
+    improvement: str = ""
app/retrieval/faq_retrieval.py (4)

77-77: Translate German comment to English for consistency.

The codebase appears to use English for code and comments, but this line contains a German comment.

-        hybrid_factor: float = 0.75,  # Gewichtung zwischen Vektor- und Textsuche
+        hybrid_factor: float = 0.75,  # Weighting between vector and keyword search

79-87: Translate German docstring to English for consistency.

The method docstring is written in German while the rest of the codebase appears to be in English. Consider translating the docstring to maintain language consistency.

-        """
-        Holt FAQs direkt aus der Datenbank, optional mit einer Ähnlichkeitssuche auf question_title und question_answer.
-
-        :param course_id: ID des Kurses, um nur FAQs eines bestimmten Kurses zu holen.
-        :param search_text: Optionaler Suchtext, der für eine semantische Suche verwendet wird.
-        :param result_limit: Anzahl der gewünschten Ergebnisse.
-        :param hybrid_factor: Gewichtung zwischen vektorbasierten und keywordbasierten Ergebnissen (0 = nur Vektor, 1 = nur Keywords).
-        :return: Liste der gefundenen FAQs.
+        """
+        Retrieves FAQs directly from the database, optionally with similarity search on question_title and question_answer.
+
+        :param course_id: ID of the course to filter FAQs for a specific course.
+        :param search_text: Optional search text used for semantic searching.
+        :param result_limit: Number of desired results.
+        :param hybrid_factor: Weighting between vector-based and keyword-based results (0 = vector only, 1 = keywords only).
+        :return: List of retrieved FAQs.

89-90: Consider adding a comment explaining the Filter usage.

Adding a brief English comment explaining what the filter does would improve code readability.

-        # Filter für den Kurs
+        # Filter results by course ID
         filter_weaviate = Filter.by_property("course_id").equal(course_id)

95-98: Add explanatory comments for hybrid search parameters.

The hybrid search parameters could benefit from brief explanations to help future maintainers understand their purpose.

         response = self.collection.query.hybrid(
-            query=search_text,  # Keyword-Suche
-            vector=vec,  # Vektorbasierte Ähnlichkeitssuche
-            alpha=hybrid_factor,  # Mischung aus Vektor- und Textsuche
+            query=search_text,  # Keyword search
+            vector=vec,  # Vector-based similarity search
+            alpha=hybrid_factor,  # Balance between vector and keyword search (0-1)
             return_properties=self.get_schema_properties(),
app/web/status/status_update.py (2)

113-114: Consider extending the docstring to include inconsistencies and improvement.

The method signature adds new parameters inconsistencies and improvement without updating the docstring to reflect them. Adding these parameters to the docstring would promote clarity and maintain consistency.


128-131: Use consistent approach for setting new attributes.

Using hasattr(self.status, ...) before assigning inconsistencies and improvement is sensible. Ensure all new attributes added in the status DTO (e.g., inconsistencies, improvement) follow a similar pattern, so that the presence or absence of corresponding fields in child DTOs won't cause errors.
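As a self-contained illustration of that guarded-assignment pattern (the names mirror this PR, but the helper itself is hypothetical):

from types import SimpleNamespace


def apply_optional_updates(status, **updates):
    # Only set attributes the status DTO actually declares, mirroring the
    # hasattr guard used in the done method.
    for name, value in updates.items():
        if hasattr(status, name):
            setattr(status, name, value)


status = SimpleNamespace(suggestions=None, inconsistencies=None, improvement="")
apply_optional_updates(status, inconsistencies=["FAQ 42 conflicts"], improvement="...")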

app/pipeline/rewriting_pipeline.py (2)

46-49: Handle VectorDatabase instantiation failures.

Calls to VectorDatabase() and FaqRetrieval(self.db.client) may fail if the database connection is misconfigured or offline. Consider adding try-except blocks or fallback handlers for reliability.


87-90: Initialize newly introduced fields more descriptively.

inconsistencies = [], improvement = "", and suggestions = [] are clear placeholders. Consider adding in-code comments or docstrings around them to clarify their usage, aiding future maintainers.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4337c6 and fcf8a5a.

📒 Files selected for processing (7)
  • app/domain/rewriting_pipeline_execution_dto.py (1 hunks)
  • app/domain/status/rewriting_status_update_dto.py (1 hunks)
  • app/pipeline/prompts/faq_consistency_prompt.py (1 hunks)
  • app/pipeline/prompts/faq_rewriting.py (0 hunks)
  • app/pipeline/rewriting_pipeline.py (5 hunks)
  • app/retrieval/faq_retrieval.py (2 hunks)
  • app/web/status/status_update.py (3 hunks)
💤 Files with no reviewable changes (1)
  • app/pipeline/prompts/faq_rewriting.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Push Docker Image
🔇 Additional comments (12)
app/domain/rewriting_pipeline_execution_dto.py (1)

7-7: LGTM: Course ID field correctly added.

The new course_id field is properly typed as an integer with the appropriate JSON alias "courseId", which aligns well with the PR objective of enhancing FAQ consistency checking by allowing course-specific operations.

app/pipeline/prompts/faq_consistency_prompt.py (1)

1-46: Well-structured prompt with clear guidelines for FAQ consistency checking.

The prompt effectively outlines the task of verifying consistency between FAQs and a final result, with detailed instructions on the expected output format including consistency status, inconsistencies, suggestions, and improvement options.
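For reference, the response shape the pipeline expects back from the LLM looks roughly like this; the keys are taken from the parsing code reviewed below, while the values are invented for illustration.

expected_response = {
    "type": "inconsistent",  # or "consistent"
    "message": "Found one FAQ that contradicts the provided text.",
    "faqs": [
        {
            "faq_id": "42",
            "faq_question_title": "When does the course start?",
            "faq_question_answer": "The course starts in April.",
        }
    ],
    "suggestion": ["Align the start date with FAQ 42."],
    "improved version": "An improved version of the final result.",
}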

app/domain/status/rewriting_status_update_dto.py (1)

8-10: New status attributes added for consistency reporting.

The new attributes for suggestions, inconsistencies, and improvement are properly typed and initialized with appropriate default values.

app/retrieval/faq_retrieval.py (2)

5-6: Added appropriate import for the new functionality.

The import of Filter from Weaviate is correctly added for the new database filtering functionality.


72-114: Well-implemented database retrieval method for FAQs.

The new get_faqs_from_db method provides a robust way to fetch FAQs filtered by course ID with optional semantic search capabilities. The method handles both cases (with and without search text) appropriately and returns properly formatted results.
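Pieced together from the fragments quoted in this review, the method plausibly looks like the sketch below. Only the hybrid query and the Filter usage are quoted from the diff; the embedding helper, the no-search-text branch, and the result conversion are assumptions.

from weaviate.classes.query import Filter


def get_faqs_from_db(
    self,
    course_id: int,
    search_text: str = None,
    result_limit: int = 10,
    hybrid_factor: float = 0.75,  # weighting between vector and keyword search
):
    # Filter results by course ID
    filter_weaviate = Filter.by_property("course_id").equal(course_id)

    if search_text:
        vec = self.llm_embedding.embed(search_text)  # hypothetical embedding helper
        response = self.collection.query.hybrid(
            query=search_text,  # keyword search
            vector=vec,  # vector-based similarity search
            alpha=hybrid_factor,  # balance between vector and keyword search (0-1)
            return_properties=self.get_schema_properties(),
            filters=filter_weaviate,
            limit=result_limit,
        )
    else:
        # Assumed fallback: plain filtered fetch when no search text is given
        response = self.collection.query.fetch_objects(
            filters=filter_weaviate, limit=result_limit
        )

    # The pipeline later reads entry["properties"], so return dicts of that shape
    return [{"properties": obj.properties} for obj in response.objects]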

app/web/status/status_update.py (1)

143-144: Resetting inconsistencies but not improvement may be inconsistent.

Currently, self.status.suggestions and self.status.inconsistencies are reset to None at the end of the done method, but improvement is not. If the improvement field is also intended to be ephemeral, consider resetting it as well for consistency or provide a rationale for preserving it.
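A minimal sketch of the symmetric reset at the end of done, if improvement is indeed meant to be ephemeral:

# Fragment of done(); assumes improvement is transient like the other two fields.
self.status.suggestions = None
self.status.inconsistencies = None
self.status.improvement = None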

app/pipeline/rewriting_pipeline.py (6)

1-3: Imports look good.

New imports for json and expanded typing support are appropriate for handling the JSON responses and improved type hints.


21-22: Check usage of FaqRetrieval and VectorDatabase.

These additions expand the pipeline’s retrieval layer. Ensure that database credentials or connections are handled securely and that usage is tested, especially if running in production or with real data.


65-68: Good approach: Reusing stored prompts for rewriting.

Using a standard prompt template helps maintain consistency. Just ensure that any special characters in dto.to_be_rewritten are escaped or handled if needed (e.g., backticks or braces).


91-104: Check potential issues if dto.course_id is missing or invalid.

When variant == "faq", you retrieve FAQs using dto.course_id. Validate that dto.course_id is not None or empty to avoid unexpected behavior from the database call.


147-154: Validate data structure for parse_inconsistencies.

The function directly accesses faq_id, faq_question_title, and faq_question_answer. If the JSON structure lacks these fields, this will raise a KeyError. A defensive check or default value would make the parsing more robust.
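A tolerant variant using dict.get, as a sketch:

from typing import Dict, List


def parse_inconsistencies(inconsistencies: List[Dict[str, str]]) -> List[str]:
    # Fall back to a placeholder instead of raising KeyError when a field is
    # missing from the LLM's JSON.
    return [
        f"FAQ ID: {entry.get('faq_id', 'unknown')}, "
        f"Title: {entry.get('faq_question_title', 'unknown')}, "
        f"Answer: {entry.get('faq_question_answer', 'unknown')}"
        for entry in inconsistencies
    ]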


16-16:

❓ Verification inconclusive

Ensure the new prompt module is committed and tested.

faq_consistency_prompt is imported here. Verify that this module is properly tracked in the repository and tested to confirm that expected content exists in the prompt.


🏁 Script executed:

#!/bin/bash
# Confirm that faq_consistency_prompt.py exists and is tested
fd faq_consistency_prompt.py
rg -A 3 "test_faq_consistency_prompt"

Length of output: 115


I didn't find any direct test references using the string "test_faq_consistency_prompt". To be extra sure, let's run an additional check for any test files that reference faq_consistency_prompt:

Please run this script and share the output so we can confirm whether tests for the new prompt module exist.


🏁 Script executed:

#!/bin/bash
# Searching for any occurrence of "faq_consistency_prompt" in files likely to be tests.
rg -l "faq_consistency_prompt" --glob '*test*'

Length of output: 195


ACTION: Verify prompt module tests are in place

The module faq_consistency_prompt is properly tracked at app/pipeline/prompts/faq_consistency_prompt.py. However, our automated checks did not locate any tests explicitly referencing this module. Please manually verify that tests exist (or add tests if they’re missing) to confirm that the prompt's expected content is validated.
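If tests are missing, even a minimal placeholder check would help. A sketch (the test file location is hypothetical; it only assumes the two placeholders the pipeline fills in via format):

# tests/test_faq_consistency_prompt.py (hypothetical location)
from app.pipeline.prompts.faq_consistency_prompt import faq_consistency_prompt


def test_prompt_declares_expected_placeholders():
    # The pipeline calls faq_consistency_prompt.format(faqs=..., final_result=...),
    # so both placeholders must appear in the template.
    assert "{faqs}" in faq_consistency_prompt
    assert "{final_result}" in faq_consistency_prompt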

coderabbitai bot previously approved these changes Mar 3, 2025
@cremertim changed the title from "Feature/faq/add rewrite consistency" to "Add faq consistency check" on Mar 3, 2025
coderabbitai bot previously approved these changes Mar 3, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
app/pipeline/rewriting_pipeline.py (1)

111-151: ⚠️ Potential issue

Add error handling for JSON parsing and missing keys.

There's no error handling for JSON parsing or accessing dictionary keys, which could lead to unhandled exceptions if the LLM returns malformed responses.

Add error handling for JSON parsing and missing keys:

-        result = response.contents[0].text_content
-        data = json.loads(result)
-
-        result_dict = {
-            "type": data["type"],
-            "message": data["message"],
-            "faqs": data["faqs"],
-            "suggestion": data["suggestion"],
-            "improved version": data["improved version"],
-        }
+        result = response.contents[0].text_content
+        try:
+            data = json.loads(result)
+            
+            # Default values for keys that might be missing
+            result_dict = {
+                "type": data.get("type", "unknown"),
+                "message": data.get("message", ""),
+                "faqs": data.get("faqs", []),
+                "suggestion": data.get("suggestion", []),
+                "improved version": data.get("improved version", ""),
+            }
+        except json.JSONDecodeError as e:
+            logging.error(f"Failed to parse JSON from LLM response: {str(e)}")
+            # Return a default dictionary indicating the error
+            result_dict = {
+                "type": "error",
+                "message": f"Failed to parse response: {str(e)}",
+                "faqs": [],
+                "suggestion": [],
+                "improved version": "",
+            }

Also, update the docstring to accurately reflect the return value structure:

        """
        Checks the consistency of the given FAQs with the provided final_result.
        Returns "consistent" if there are no inconsistencies, otherwise returns "inconsistent".

        :param faqs: List of retrieved FAQs.
        :param final_result: The result to compare the FAQs against.
+       :return: A dictionary containing:
+           - type: "consistent" or "inconsistent"
+           - message: A message explaining the result
+           - faqs: List of inconsistent FAQs with details
+           - suggestion: List of suggestions for improvement
+           - improved version: An improved version of the final_result
        """
🧹 Nitpick comments (4)
app/pipeline/rewriting_pipeline.py (4)

1-23: Imports look good, though consider organizing them in logical groups.

The imports are correctly updated to include the necessary modules and types for the new FAQ consistency checking functionality.

Consider grouping imports more logically - standard library imports, then third-party libraries, then application imports - with a blank line between each group:

import json
import logging
from typing import Literal, Optional, List, Dict

from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import (
    ChatPromptTemplate,
)

from app.common.PipelineEnum import PipelineEnum
from app.common.pyris_message import PyrisMessage, IrisMessageRole
from app.domain.data.text_message_content_dto import TextMessageContentDTO
from app.domain.rewriting_pipeline_execution_dto import RewritingPipelineExecutionDTO
from app.llm import CapabilityRequestHandler, RequirementList, CompletionArguments
from app.pipeline import Pipeline
from app.pipeline.prompts.faq_consistency_prompt import faq_consistency_prompt
from app.pipeline.prompts.rewriting_prompts import (
    system_prompt_faq,
    system_prompt_problem_statement,
)
from app.retrieval.faq_retrieval import FaqRetrieval
from app.vector_database.database import VectorDatabase
from app.web.status.status_update import RewritingCallback

133-135: Consider using a variable for temperature value.

The temperature value is hardcoded, which may be fine for this specific case, but a constant or configuration parameter would be more maintainable.

Consider defining the temperature as a class constant or configuration parameter:

+    # Constants for LLM parameters
+    FAQ_CONSISTENCY_TEMPERATURE = 0.0
+    REWRITING_TEMPERATURE = 0.4

    def __init__(
        self, callback: RewritingCallback, variant: Literal["faq", "problem_statement"]
    ):
        # ...existing code...

    # Then in the __call__ method:
    response = self.request_handler.chat(
-        [prompt], CompletionArguments(temperature=0.4), tools=None
+        [prompt], CompletionArguments(temperature=self.REWRITING_TEMPERATURE), tools=None
    )

    # And in check_faq_consistency:
    response = self.request_handler.chat(
-        [prompt], CompletionArguments(temperature=0.0), tools=None
+        [prompt], CompletionArguments(temperature=self.FAQ_CONSISTENCY_TEMPERATURE), tools=None
    )

153-159: Remove debug logging and improve function documentation.

The debug logging statement should be removed, and the function could benefit from better documentation.

Improve the parse_inconsistencies function:

def parse_inconsistencies(inconsistencies: List[Dict[str, str]]) -> List[str]:
-   logging.info("parse consistency")
+   """
+   Formats a list of inconsistency dictionaries into human-readable strings.
+   
+   :param inconsistencies: List of dictionaries containing inconsistency details
+   :return: List of formatted strings describing each inconsistency
+   """
    parsed_inconsistencies = [
        f"FAQ ID: {entry['faq_id']}, Title: {entry['faq_question_title']}, Answer: {entry['faq_question_answer']}"
        for entry in inconsistencies
    ]
    return parsed_inconsistencies

149-149: Consistent logging level for similar events.

There's inconsistent use of logging levels - warning for inconsistencies but info for the consistency check response.

Use consistent logging levels for related information:

-        logging.info(f"Consistency FAQ consistency check response: {result_dict}")
+        if "inconsistent" in result_dict["type"].lower():
+            logging.warning(f"Consistency FAQ consistency check response: {result_dict}")
+        else:
+            logging.info(f"Consistency FAQ consistency check response: {result_dict}")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c5617e and 8be2f5c.

📒 Files selected for processing (3)
  • app/domain/status/rewriting_status_update_dto.py (1 hunks)
  • app/pipeline/rewriting_pipeline.py (5 hunks)
  • app/web/status/status_update.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • app/domain/status/rewriting_status_update_dto.py
  • app/web/status/status_update.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Push Docker Image
🔇 Additional comments (3)
app/pipeline/rewriting_pipeline.py (3)

65-67: Refactored prompt formatting looks good.

The code now uses a format_args dictionary to pass variables to the prompt template, which is a cleaner approach.


93-94:

✅ Verification successful

Verify course_id existence.

The code assumes that dto.course_id exists, but this might not be the case if the DTO wasn't properly initialized or if this field was recently added.

Ensure the RewritingPipelineExecutionDTO class has the course_id field:


🏁 Script executed:

#!/bin/bash
# Check the RewritingPipelineExecutionDTO class for the course_id field
rg -A 20 "class RewritingPipelineExecutionDTO" app/domain/rewriting_pipeline_execution_dto.py

Length of output: 280


Course ID Field Verified

The RewritingPipelineExecutionDTO class has a defined course_id field (with the alias "courseId"), so the assumption in the code is correct. No changes are needed in this part.


46-49:

❓ Verification inconclusive

Database initialization looks good, but verify connection lifecycle.

The Vector Database initialization and FAQ retriever setup are properly implemented.

Make sure the database connection is properly managed throughout the application lifecycle. Consider if the database connection should be created in the constructor or passed in as a dependency.


🏁 Script executed:

#!/bin/bash
# Check whether the VectorDatabase connection is properly closed elsewhere
rg -A 5 "def __del__" app/vector_database/database.py
# Or check if there's a close method that should be called
rg -A 5 "def close" app/vector_database/database.py

Length of output: 109


Ensure Proper Lifecycle Management for Database Connection

The initialization of VectorDatabase and the FAQ retriever in app/pipeline/rewriting_pipeline.py appears fine. However, our checks did not reveal any explicit teardown methods (like __del__ or close) in the VectorDatabase class (referenced from app/vector_database/database.py). Please manually verify that the database connection is properly closed or otherwise managed during the application lifecycle. Consider whether the connection should be instantiated in the constructor or injected as a dependency to ensure resource cleanup.
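One way to make cleanup explicit, sketched under the assumption that VectorDatabase stores a weaviate v4 client on self.client (v4 clients do expose close()):

class VectorDatabase:
    # ... existing initialization ...

    def close(self) -> None:
        self.client.close()  # weaviate v4 clients provide close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()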

Comment on lines +123 to +126

consistency_prompt = faq_consistency_prompt.format(
    faqs=properties_list, final_result=final_result
)

🛠️ Refactor suggestion

Check for empty FAQs list before processing.

There's no check to see if the FAQs list is empty before trying to process it, which could lead to unexpected behavior in the consistency checking.

Add a check for an empty FAQs list:

        properties_list = [entry["properties"] for entry in faqs]
+        
+        # Return early if no FAQs to check against
+        if not properties_list:
+            return {
+                "type": "consistent",
+                "message": "No FAQs found to check consistency against.",
+                "faqs": [],
+                "suggestion": [],
+                "improved version": "",
+            }

        consistency_prompt = faq_consistency_prompt.format(
            faqs=properties_list, final_result=final_result
        )

:param final_result: The result to compare the FAQs against.

"""
properties_list = [entry["properties"] for entry in faqs]

🛠️ Refactor suggestion

Handle potential KeyError for 'properties'.

There's no error handling if 'properties' key is missing from a FAQ entry.

Add error handling for missing 'properties' key in FAQ entries:

-        properties_list = [entry["properties"] for entry in faqs]
+        properties_list = []
+        for entry in faqs:
+            if "properties" in entry:
+                properties_list.append(entry["properties"])
+            else:
+                logging.warning(f"FAQ entry missing 'properties' key: {entry}")

Comment on lines +87 to +109
inconsistencies = []
improvement = ""
suggestions = []

if self.variant == "faq":
    faqs = self.faq_retriever.get_faqs_from_db(
        course_id=dto.course_id, search_text=response, result_limit=10
    )
    consistency_result = self.check_faq_consistency(faqs, final_result)

    if "inconsistent" in consistency_result["type"].lower():
        logging.warning("Detected inconsistencies in FAQ retrieval.")
        inconsistencies = parse_inconsistencies(consistency_result["faqs"])
        improvement = consistency_result["improved version"]
        suggestions = consistency_result["suggestion"]

self.callback.done(
    final_result=final_result,
    tokens=self.tokens,
    inconsistencies=inconsistencies,
    improvement=improvement,
    suggestions=suggestions,
)

🛠️ Refactor suggestion

Callback parameters usage is comprehensive, but consider adding error handling.

The code now properly initializes and passes inconsistency-related parameters to the callback. However, there's no error handling for potential issues in the FAQ retrieval or consistency checking process.

Add error handling around the FAQ retrieval and consistency checking to ensure that the pipeline doesn't fail if there are issues with these operations:

        final_result = response
        inconsistencies = []
        improvement = ""
        suggestions = []

        if self.variant == "faq":
-           faqs = self.faq_retriever.get_faqs_from_db(
-               course_id=dto.course_id, search_text=response, result_limit=10
-           )
-           consistency_result = self.check_faq_consistency(faqs, final_result)
-
-           if "inconsistent" in consistency_result["type"].lower():
-               logging.warning("Detected inconsistencies in FAQ retrieval.")
-               inconsistencies = parse_inconsistencies(consistency_result["faqs"])
-               improvement = consistency_result["improved version"]
-               suggestions = consistency_result["suggestion"]
+           try:
+               faqs = self.faq_retriever.get_faqs_from_db(
+                   course_id=dto.course_id, search_text=response, result_limit=10
+               )
+               if faqs:  # Only check consistency if we have FAQs to check against
+                   consistency_result = self.check_faq_consistency(faqs, final_result)
+                   
+                   if "inconsistent" in consistency_result["type"].lower():
+                       logging.warning("Detected inconsistencies in FAQ retrieval.")
+                       inconsistencies = parse_inconsistencies(consistency_result["faqs"])
+                       improvement = consistency_result["improved version"]
+                       suggestions = consistency_result["suggestion"]
+               else:
+                   logging.info("No FAQs retrieved for consistency check.")
+           except Exception as e:
+               logging.error(f"Error during FAQ consistency checking: {str(e)}")
+               # Continue with the pipeline even if FAQ checking fails
