The LLM-as-a-Judge approach uses large language models (LLMs) to evaluate AI-generated text based on predefined criteria.
- Implemented by Semantic Kernel
- Prompts are from autoevals
  - Battle
  - ClosedQA
  - Humor
  - Factuality
  - Moderation
  - Security
  - Summarization
  - SQL
  - Translation
  - Fine-tuned binary classifiers
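Note that each key in the JSON input must match the name of one of the prompts listed above, and the parameters under that key must match the parameters that prompt expects.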

    // Set up Semantic Kernel with ChatCompletion first
    // Create PromptExecutionSettings and set 'Temperature'
    const string isThisFunny = "I am a brown fox";

    var json =
        $$"""
        {
            "humor" : {
                "output" : "{{isThisFunny}}"
            },
            "factuality" : {
                "input" : "What color was Cotton?",
                "output": "white",
                "expected": "white"
            }
        }
        """;

    await foreach (var result in
        kernel.Run(json, executionSettings: executionSettings))
    {
        Console.WriteLine($"[{result.Key}]: result: {result.Value?.Item1}, score: {result.Value?.Item2}");
    }

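
Because names and parameters have to line up with the prompts, it can help to check the JSON before sending it to the model. The following is a minimal sketch, not part of this library: `EvalInputCheck` and its `RequiredParameters` table are hypothetical, and the parameter names are taken only from the `humor` and `factuality` entries in the example above. The lowercase, case-sensitive key matching is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical helper, not part of the library: checks an evaluation payload
// before it is handed to the kernel.
public static class EvalInputCheck
{
    // Assumed parameter names per prompt, copied from the example in this README.
    // Other prompts are left out because their parameters are not listed here.
    private static readonly Dictionary<string, string[]> RequiredParameters = new()
    {
        ["humor"] = new[] { "output" },
        ["factuality"] = new[] { "input", "output", "expected" },
    };

    // Returns one message per problem found; an empty list means nothing was flagged.
    // Assumes the JSON root is an object, as in the example above.
    public static List<string> Validate(string json)
    {
        var problems = new List<string>();
        using var document = JsonDocument.Parse(json);

        foreach (var evaluator in document.RootElement.EnumerateObject())
        {
            if (!RequiredParameters.TryGetValue(evaluator.Name, out var required))
            {
                problems.Add($"Unknown prompt name: {evaluator.Name}");
                continue;
            }

            if (evaluator.Value.ValueKind != JsonValueKind.Object)
            {
                problems.Add($"{evaluator.Name} must be a JSON object");
                continue;
            }

            var supplied = evaluator.Value.EnumerateObject().Select(p => p.Name).ToHashSet();
            foreach (var missing in required.Where(name => !supplied.Contains(name)))
            {
                problems.Add($"{evaluator.Name} is missing parameter: {missing}");
            }
        }

        return problems;
    }
}
```

Calling `EvalInputCheck.Validate(json)` on the payload from the example returns an empty list, while a payload whose `humor` entry has no `output` field produces the message `humor is missing parameter: output`.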
Complete example here