
Lightweight AI application evaluation templates

The LLM-as-a-Judge approach uses large language models (LLMs) to evaluate AI-generated text based on predefined criteria.

Implemented with Semantic Kernel.
Prompts are taken from autoevals.
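The judging step can be pictured roughly as follows: a prompt template is filled with the values under test, a judge model answers with one of a fixed set of labels, and that label is mapped to a numeric score. This is a minimal sketch under stated assumptions, not TinyToolBox.AI's implementation: `Judge`, the `{{name}}` placeholder syntax, the `Y`/`N` labels and their scores are all hypothetical, and the model call is a plain delegate so the example runs without an LLM.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the LLM-as-a-Judge pattern. The judge model is
// abstracted as a delegate; in a real setup this would be a chat completion call.
static (string Choice, double Score)? Judge(
    string template,
    IReadOnlyDictionary<string, string> parameters,
    IReadOnlyDictionary<string, double> choiceScores,
    Func<string, string> callModel)
{
    // Fill hypothetical {{name}} placeholders with the values under test.
    var prompt = template;
    foreach (var (name, value) in parameters)
        prompt = prompt.Replace("{{" + name + "}}", value);

    // Ask the judge model and treat its trimmed reply as the chosen label.
    var choice = callModel(prompt).Trim();

    // Map the label to a numeric score; an unknown label yields no result.
    if (choiceScores.TryGetValue(choice, out var score))
        return (choice, score);
    return null;
}

// Hypothetical humor template and label scores, for illustration only.
const string template = "Is the following text funny? Answer Y or N.\n{{output}}";
var scores = new Dictionary<string, double> { ["Y"] = 1.0, ["N"] = 0.0 };

// Stub judge that always answers "N"; replace with a real model call.
var result = Judge(
    template,
    new Dictionary<string, string> { ["output"] = "I am a brown fox" },
    scores,
    _ => "N");

Console.WriteLine(result is { } r
    ? $"choice: {r.Choice}, score: {r.Score}"
    : "unparseable judgment");
```

Returning `null` for an unrecognized label keeps a malformed judge reply from being silently scored as 0.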

Battle
ClosedQA
Humor
Factuality
Moderation
Security
Summarization
SQL
Translation
Fine-tuned binary classifiers

Example: a quick test of AI application output

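Note that each top-level key must name one of the templates listed above, and its fields must match that template's prompt parameters (for example, "factuality" takes "input", "output", and "expected").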

// Set up a Semantic Kernel `kernel` with a chat completion service first.
// Create PromptExecutionSettings as `executionSettings` and set its Temperature.
const string isThisFunny = "I am a brown fox";
var json = 
    $$"""
    {
        "humor" : {
            "output" : "{{isThisFunny}}"
        },
        "factuality" : {
            "input" : "What color was Cotton?",
            "output": "white",
            "expected": "white"
        }
    }
    """;
await foreach (var result in kernel.Run(json, executionSettings: executionSettings))
{
    Console.WriteLine($"[{result.Key}]: result: {result.Value?.Item1}, score: {result.Value?.Item2}");
}

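The interpolated raw string in the example inserts values verbatim, so an output containing a double quote or a newline (common in LLM responses) would yield invalid JSON. A minimal sketch of building the same payload with `System.Text.Json.Nodes` (.NET 6 or later), which escapes values correctly, is shown here. `BuildEvaluationJson` is a hypothetical helper, not part of TinyToolBox.AI; its string result would be passed to `kernel.Run` as in the example.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

// Hypothetical helper: builds the evaluation payload with proper JSON escaping
// instead of raw string interpolation. Keys mirror the README example.
static string BuildEvaluationJson(string humorOutput, string question, string answer, string expected)
{
    var payload = new JsonObject
    {
        ["humor"] = new JsonObject
        {
            ["output"] = humorOutput
        },
        ["factuality"] = new JsonObject
        {
            ["input"] = question,
            ["output"] = answer,
            ["expected"] = expected
        }
    };
    return payload.ToJsonString(new JsonSerializerOptions { WriteIndented = true });
}

// Embedded quotes would break the interpolated raw string, but are escaped here.
var json = BuildEvaluationJson("He said \"I am a brown fox\"", "What color was Cotton?", "white", "white");
Console.WriteLine(json);

// Round-trip check: the payload parses and the quoted text survives intact.
var parsed = JsonNode.Parse(json)!;
Console.WriteLine(parsed["humor"]!["output"]!.GetValue<string>());
```

The default encoder may write the quotes as `\u0022`; that is still valid JSON and decodes back to the original text.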
A complete example is available in the repository.

StormHub/TinyToolBox.AI