New Prompt to reduce false positives #8

Draft · wants to merge 1 commit into main
62 changes: 59 additions & 3 deletions validator/main.py
@@ -1,4 +1,5 @@
import os
import re
import itertools
import warnings
from functools import partial
@@ -8,7 +9,6 @@
import nltk
import numpy as np
from guardrails.utils.docs_utils import get_chunks_from_text
from guardrails.utils.validator_utils import PROVENANCE_V1_PROMPT
from guardrails.validator_base import (
FailResult,
PassResult,
@@ -21,6 +21,47 @@
from tenacity import retry, stop_after_attempt, wait_random_exponential
from sentence_transformers import SentenceTransformer

PROVENANCE_V1_PROMPT = """Instruction:
As an Attribution Validator, your task is to determine if the given contexts provide irrefutable evidence to support the claim. Follow these strict guidelines:

Respond "Yes" ONLY if ALL of the following conditions are met:

The contexts explicitly and unambiguously state information that fully confirms ALL aspects of the claim.
There is NO room for alternative interpretations or assumptions.
The support is direct and doesn't require complex inference chains.
If numbers or specific details are mentioned in the claim, they MUST be exactly matched in the contexts.


Respond "No" if ANY of the following are true:

The contexts do not provide explicit information that fully substantiates every part of the claim.
The claim requires any degree of inference or assumption not directly stated in the contexts.
The contexts only partially support the claim or support it with slight differences in details.
There is any ambiguity, vagueness, or room for interpretation in how the contexts relate to the claim.
The claim includes any information not present in the contexts, even if it seems common knowledge.
The contexts contradict any part of the claim, no matter how minor.


Treat the contexts as the ONLY source of truth. Do not use any outside knowledge or assumptions.
For multi-part claims, EVERY single part must be explicitly supported by the contexts for a "Yes" response.
If there is ANY doubt whatsoever, respond with "No".
Be extremely literal in your interpretation. Do not extrapolate or generalize from the given information.

Provide your analysis in this format:
<reasoning>

Point 1
Point 2
Point 3 (if needed)
</reasoning>


<decision>Yes</decision> OR <decision>No</decision>
Claim:
{}
Contexts:
{}
Response:"""
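# Illustrative only, not part of this diff: a minimal sketch of how the two
# positional placeholders above are expected to be filled, assuming the claim
# goes into the first slot and the retrieved contexts into the second. The
# claim and context strings below are made up for the example.
example_prompt = PROVENANCE_V1_PROMPT.format(
"The Eiffel Tower is 330 metres tall.",
"Context 1: The Eiffel Tower stands 330 metres (1,083 ft) tall.",
)
# A conforming LLM reply ends with exactly one tag, e.g. <decision>Yes</decision>.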

@register_validator(name="guardrails/provenance_llm", data_type="string")
class ProvenanceLLM(Validator):
@@ -187,6 +228,18 @@ def validate_each_sentence(
fix_value="\n".join(supported_sentences),
)
return PassResult(metadata=metadata)

def parse_response(self, response: str) -> bool:
"""Map the LLM's <decision> tag to a boolean verdict."""
response = response.lower()
# Extract the decision tag; a missing or malformed tag fails closed,
# i.e. it is treated the same as an explicit "no".
decision_match = re.search(r'<decision>(yes|no)</decision>', response)
decision = decision_match.group(1) if decision_match else None
return decision == 'yes'
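
# Illustrative only, not part of this diff: a standalone sketch of the parsing
# behaviour above. The <decision> tag is the only part of the reply that is
# inspected; a missing or malformed tag is treated the same as "No".
def _parse_decision_sketch(response: str) -> bool:
match = re.search(r'<decision>(yes|no)</decision>', response.lower())
return match is not None and match.group(1) == 'yes'

assert _parse_decision_sketch("<reasoning>supported</reasoning>\n<decision>Yes</decision>")
assert not _parse_decision_sketch("<decision>No</decision>")
assert not _parse_decision_sketch("I am not sure.")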

def validate_full_text(
self, value: Any, query_function: Callable, metadata: Dict[str, Any]
@@ -196,9 +249,12 @@

# Self-evaluate LLM with entire text
eval_response = self.evaluate_with_llm(value, query_function)
# Parse the <decision> tag instead of comparing the raw reply to "yes"/"no".
passed = self.parse_response(eval_response)
if passed:
return PassResult(metadata=metadata)
if not passed:
return FailResult(
metadata=metadata,
error_message=(