promptfoo / redteam #1114

Merged
merged 24 commits into from
Feb 10, 2025
Changes from 22 commits
4e84038
add redteam options
pelikhan Feb 9, 2025
ec80141
better support for promptfood reporting
pelikhan Feb 9, 2025
e420126
better promptfoo reporting
pelikhan Feb 9, 2025
ace7a05
adding more options
pelikhan Feb 10, 2025
dae8f68
added provider to cli
pelikhan Feb 10, 2025
aebee32
✨ feat: Add options parameter to callApi method
pelikhan Feb 10, 2025
9192f6f
updated test
pelikhan Feb 10, 2025
6aecf5c
add fileContent field support
pelikhan Feb 10, 2025
421c50d
a few more redteam values
pelikhan Feb 10, 2025
536d817
add red team
pelikhan Feb 10, 2025
16fadd1
a few more updates around default setup
pelikhan Feb 10, 2025
184be71
add more connection
pelikhan Feb 10, 2025
fec5743
:sparkles: Add redteam option in prompt configuration
pelikhan Feb 10, 2025
f80a2ac
add a few more options
pelikhan Feb 10, 2025
38035af
✨ feat: Enhance red team config with language targeting
pelikhan Feb 10, 2025
1a1d00e
refactor type
pelikhan Feb 10, 2025
13cb79c
add redteam option
pelikhan Feb 10, 2025
e697408
✨ feat: update options and concurrency settings for tests
pelikhan Feb 10, 2025
72b1a55
add redteam invocation
pelikhan Feb 10, 2025
74c558b
✨ Enhance redteam features and prompt handling
pelikhan Feb 10, 2025
61d47e8
✨ feat: add "eval" alias for "test" command
pelikhan Feb 10, 2025
55d2deb
revert adding provider to cli
pelikhan Feb 10, 2025
0fd588f
⚡️ refactor: streamline script by removing redteam config
pelikhan Feb 10, 2025
9b6cd48
✨ add CI awareness and confirmation prompts
pelikhan Feb 10, 2025
4 changes: 3 additions & 1 deletion docs/src/content/docs/reference/cli/commands.md
@@ -84,8 +84,8 @@

## `test`

```

Check warning on line 87 in docs/src/content/docs/reference/cli/commands.md

GitHub Actions / build

The command usage should be consistent. Consider using a single pipe symbol (`|`) to separate the commands for clarity.

AI-generated content by pr-docs-review-commit command_usage may be incorrect

-Usage: genaiscript test [options] [command]
+Usage: genaiscript test|eval [options] [command]

Options:
  -h, --help display help for command
@@ -106,9 +106,10 @@

Arguments:
script Script ids. If not provided, all scripts
are tested

Check warning on line 109 in docs/src/content/docs/reference/cli/commands.md

GitHub Actions / build

Options should be grouped logically and consistently ordered. Consider moving `--redteam` options closer to their respective commands.

AI-generated content by pr-docs-review-commit option_order may be incorrect


Options:
+  --redteam run red team tests
  -p, --provider <string> Preferred LLM provider aliases (choices:
"openai", "azure", "azure_serverless",
"azure_serverless_models", "anthropic",
@@ -144,9 +145,10 @@
```
Usage: genaiscript test list [options]

List available tests in workspace

Check warning on line 148 in docs/src/content/docs/reference/cli/commands.md

GitHub Actions / build

Options should be grouped logically and consistently ordered. Consider moving `--redteam` options closer to their respective commands.

AI-generated content by pr-docs-review-commit option_order may be incorrect


Options:
+  --redteam list red team tests
  -g, --groups <groups...> groups to include or exclude. Use :! prefix to
exclude
  -h, --help display help for command
6 changes: 4 additions & 2 deletions packages/cli/src/cli.ts
@@ -220,7 +220,7 @@ export async function cli() {
.action(runScriptWithExitCode) // Action to execute the script with exit code

// Define 'test' command group for running tests
-const test = program.command("test")
+const test = program.command("test").alias("eval")

const testRun = test
.command("run", { isDefault: true })
@@ -229,6 +229,7 @@ export async function cli() {
"[script...]",
"Script ids. If not provided, all scripts are tested"
)
+.option("--redteam", "run red team tests")
addModelOptions(testRun) // Add model options to the command
.option(
"--models <models...>",
@@ -254,11 +255,12 @@ export async function cli() {
// List available tests
test.command("list")
.description("List available tests in workspace")
-.action(scriptTestList) // Action to list the tests
+.option("--redteam", "list red team tests")
.option(
"-g, --groups <groups...>",
"groups to include or exclude. Use :! prefix to exclude"
)
+.action(scriptTestList) // Action to list the tests

// Launch test viewer
test.command("view")
39 changes: 26 additions & 13 deletions packages/cli/src/test.ts
@@ -18,6 +18,7 @@ import {
EMOJI_FAIL,
TEST_RUNS_DIR_NAME,
PROMPTFOO_REMOTE_API_PORT,
+PROMPTFOO_TEST_MAX_CONCURRENCY,
} from "../../core/src/constants"
import { promptFooDriver } from "../../core/src/default_prompts"
import { serializeError } from "../../core/src/error"
@@ -36,7 +37,7 @@ import {
PromptScriptTestRunResponse,
PromptScriptTestResult,
} from "../../core/src/server/messages"
-import { generatePromptFooConfiguration } from "../../core/src/test"
+import { generatePromptFooConfiguration } from "../../core/src/promptfoo"
import { delay } from "es-toolkit"
import { resolveModelConnectionInfo } from "../../core/src/models"
import { filterScripts } from "../../core/src/ast"
@@ -48,6 +49,7 @@ import {
CancellationOptions,
checkCancelled,
} from "../../core/src/cancellation"
+import { CORE_VERSION } from "../../core/src/version"

/**
* Parses model specifications from a string and returns a ModelOptions object.
@@ -107,13 +109,14 @@ export async function runPromptScriptTests(
cache?: boolean
verbose?: boolean
write?: boolean
+redteam?: boolean
promptfooVersion?: string
outSummary?: string
testDelay?: string
} & CancellationOptions
): Promise<PromptScriptTestRunResponse> {
applyModelOptions(options, "cli")
-const { cancellationToken } = options || {}
+const { cancellationToken, redteam } = options || {}
const scripts = await listTests({ ids, ...(options || {}) })
if (!scripts.length)
return {
@@ -155,7 +158,7 @@ export async function runPromptScriptTests(
 - Run this command to launch the promptfoo test viewer.

\`\`\`sh
-genaiscript test view
+npx --yes genaiscript@${CORE_VERSION} test view
\`\`\`

`
@@ -187,29 +190,30 @@ genaiscript test view
provider: "provider.mjs",
chatInfo,
embeddingsInfo,
+redteam,
})
const yaml = YAMLStringify(config)
await writeFile(fn, yaml)
configurations.push({ script, configuration: fn })
}

+const promptFooVersion = options.promptfooVersion || PROMPTFOO_VERSION
const results: PromptScriptTestResult[] = []
// Execute each configuration and gather results
for (const config of configurations) {
checkCancelled(cancellationToken)
const { script, configuration } = config
const outJson = configuration.replace(/\.yaml$/, ".res.json")
const cmd = "npx"
-const args = [
-"--yes",
-`promptfoo@${options.promptfooVersion || PROMPTFOO_VERSION}`,
-"eval",
+const args = ["--yes", `promptfoo@${promptFooVersion}`]
+if (redteam) args.push("redteam", "run", "--force")
+else args.push("eval", "--no-progress-bar")
+args.push(
"--config",
configuration,
"--max-concurrency",
-"1",
-"--no-progress-bar",
-]
+String(PROMPTFOO_TEST_MAX_CONCURRENCY)
+)
if (options.cache) args.push("--cache")
if (options.verbose) args.push("--verbose")
args.push("--output", outJson)
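The branching between `eval` and `redteam run` in the hunk above can be read as a small pure function. The following is a minimal sketch rather than the code from this PR: `buildPromptfooArgs` and `PromptfooRunOptions` are hypothetical names introduced for illustration, and the version string is a placeholder. Only the flags themselves are taken from the diff.

```typescript
// Hypothetical option shape for this sketch; field names mirror the diff.
interface PromptfooRunOptions {
    redteam?: boolean
    cache?: boolean
    verbose?: boolean
    promptfooVersion: string
    maxConcurrency: number
}

// Builds the argument list handed to `npx`, in the same order as the diff.
function buildPromptfooArgs(
    configuration: string,
    outJson: string,
    options: PromptfooRunOptions
): string[] {
    const args = ["--yes", `promptfoo@${options.promptfooVersion}`]
    // Red team runs use a different promptfoo subcommand than evaluations.
    if (options.redteam) args.push("redteam", "run", "--force")
    else args.push("eval", "--no-progress-bar")
    args.push(
        "--config",
        configuration,
        "--max-concurrency",
        String(options.maxConcurrency)
    )
    if (options.cache) args.push("--cache")
    if (options.verbose) args.push("--verbose")
    args.push("--output", outJson)
    return args
}

// "0.0.0" is a placeholder, not the real PROMPTFOO_VERSION constant.
const evalArgs = buildPromptfooArgs("poem.yaml", "poem.res.json", {
    promptfooVersion: "0.0.0",
    maxConcurrency: 1,
})
const redteamArgs = buildPromptfooArgs("poem.yaml", "poem.res.json", {
    promptfooVersion: "0.0.0",
    maxConcurrency: 1,
    redteam: true,
})
console.log(evalArgs.join(" "))
console.log(redteamArgs.join(" "))
```

Keeping the argument construction separate from process spawning makes the two modes easy to compare: they differ only in the subcommand and in whether `--no-progress-bar` is passed.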
@@ -277,11 +281,16 @@ genaiscript test view
* @param options - Options to filter the test scripts by IDs or groups.
* @returns A Promise resolving to an array of filtered scripts.
*/
-async function listTests(options: { ids?: string[]; groups?: string[] }) {
+async function listTests(options: {
+ids?: string[]
+groups?: string[]
+redteam?: boolean
+}) {
const prj = await buildProject()
const scripts = filterScripts(prj.scripts, {
...(options || {}),
-test: true,
+test: options.redteam ? undefined : true,
+redteam: options.redteam,
})
return scripts
}
@@ -300,6 +309,7 @@ export async function scriptsTest(
cache?: boolean
verbose?: boolean
write?: boolean
+redteam?: boolean
promptfooVersion?: string
outSummary?: string
testDelay?: string
@@ -320,7 +330,10 @@ export async function scriptsTest(
* Lists available test scripts, printing their IDs and filenames.
* @param options - Options to filter the scripts by groups.
*/
-export async function scriptTestList(options: { groups?: string[] }) {
+export async function scriptTestList(options: {
+groups?: string[]
+redteam?: boolean
+}) {
const scripts = await listTests(options)
console.log(scripts.map((s) => toStringList(s.id, s.filename)).join("\n"))
}
4 changes: 3 additions & 1 deletion packages/core/src/ast.ts
@@ -89,15 +89,17 @@ export interface ScriptFilterOptions {
ids?: string[]
groups?: string[]
test?: boolean
+redteam?: boolean
}

export function filterScripts(
scripts: PromptScript[],
options: ScriptFilterOptions
) {
-const { ids, groups, test } = options || {}
+const { ids, groups, test, redteam } = options || {}
return scripts
.filter((t) => !test || arrayify(t.tests)?.length)
+.filter((t) => !redteam || t.redteam)
.filter((t) => !ids?.length || ids.includes(t.id))
.filter((t) => tagFilter(groups, t.group))
}
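To see how the `test` and `redteam` flags interact, here is a minimal sketch of the filtering logic, assuming simplified stand-in types. `ScriptLike`, `FilterOptions`, `filterScriptsSketch` and `listTestsSketch` are hypothetical names, and the group filter (`tagFilter`) is left out on purpose.

```typescript
// Minimal stand-ins for the real types; only the fields the filter reads.
interface ScriptLike {
    id: string
    tests?: unknown[]
    redteam?: object
}
interface FilterOptions {
    ids?: string[]
    test?: boolean | undefined
    redteam?: boolean | undefined
}

// Each filter is a no-op when its option is falsy.
function filterScriptsSketch(
    scripts: ScriptLike[],
    options: FilterOptions
): ScriptLike[] {
    const { ids, test, redteam } = options
    return scripts
        .filter((s) => !test || !!s.tests?.length)
        .filter((s) => !redteam || !!s.redteam)
        .filter((s) => !ids?.length || ids.includes(s.id))
}

// Mirrors listTests: the `test` requirement is dropped when `redteam` is set,
// so a script with a red team configuration but no tests is still selected.
function listTestsSketch(
    scripts: ScriptLike[],
    options: { ids?: string[]; redteam?: boolean }
): ScriptLike[] {
    return filterScriptsSketch(scripts, {
        ...options,
        test: options.redteam ? undefined : true,
        redteam: options.redteam,
    })
}

// Illustrative data only.
const sampleScripts: ScriptLike[] = [
    { id: "poem", tests: [{}] },
    { id: "guard", redteam: {} },
    { id: "both", tests: [{}], redteam: {} },
    { id: "plain" },
]
const regularIds = listTestsSketch(sampleScripts, {}).map((s) => s.id)
const redteamIds = listTestsSketch(sampleScripts, { redteam: true }).map(
    (s) => s.id
)
console.log(regularIds, redteamIds)
```

With this data, a regular run selects the scripts that declare tests, while a red team run selects the scripts that declare a red team configuration, whether or not they also have tests.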
2 changes: 2 additions & 0 deletions packages/core/src/constants.ts
@@ -135,6 +135,8 @@ export const WHISPERASR_API_BASE = "http://localhost:9000"
export const PROMPTFOO_CACHE_PATH = ".genaiscript/cache/tests"
export const PROMPTFOO_CONFIG_DIR = ".genaiscript/config/tests"
export const PROMPTFOO_REMOTE_API_PORT = 15500
+export const PROMPTFOO_REDTEAM_NUM_TESTS = 5
+export const PROMPTFOO_TEST_MAX_CONCURRENCY = 1

export const RUNS_DIR_NAME = "runs"
export const CONVERTS_DIR_NAME = "converts"
50 changes: 44 additions & 6 deletions packages/core/src/genaiscript-api-provider.mjs
@@ -1,5 +1,11 @@
import { pathToFileURL } from "node:url"

+function deleteUndefinedValues(o) {
+if (typeof o === "object" && !Array.isArray(o))
+for (const k in o) if (o[k] === undefined) delete o[k]
+return o
+}
+
/**
* GenAiScript PromptFoo Custom Provider
*
@@ -8,7 +14,7 @@ import { pathToFileURL } from "node:url"
*/
class GenAIScriptApiProvider {
constructor(options) {
-this.config = options.config
+this.config = options.config || {}
this.providerId =
options.id ||
`genaiscript/${this.config.model || "large"}/${this.config.smallModel || "small"}/${this.config.visionModel || "vision"}`
@@ -19,15 +25,17 @@ class GenAIScriptApiProvider {
return this.providerId
}

-async callApi(scriptId, context) {
+async callApi(scriptId, context, callOptions) {
const { logger } = context
try {
const files = context.vars.files // string or string[]
const workspaceFiles = context.vars.workspaceFiles // WorkspaceFile or WorkspaceFile[]
+const fileContent = context.vars.fileContent // string

let { cli, ...options } = structuredClone(this.config)
options.runTries = 2
options.runTrace = false
+options.logprobs = !!callOptions?.includeLogProbs

const testVars = context.vars.vars // {}
if (testVars && typeof testVars === "object")
@@ -38,13 +46,43 @@ class GenAIScriptApiProvider {
options.workspaceFiles = Array.isArray(workspaceFiles)
? workspaceFiles
: [workspaceFiles]
+if (fileContent) {
+if (!options.workspaceFiles) options.workspaceFiles = []
+options.workspaceFiles.push({
+filename: "",
+content: fileContent,
+})
+}
const api = await import(cli ?? "genaiscript/api")
const res = await api.run(scriptId, files, options)
-logger.debug(res)
-return {
+//logger.debug(res)
+const { error, stats, logprobs, finishReason } = res || {}
+const cost = stats?.cost
+const logProbs = logprobs?.length
+? logprobs.map((lp) => lp.logprob)
+: undefined
+const isRefusal =
+finishReason === "refusal" || finishReason === "content_filter"
+
+/*
+https://www.promptfoo.dev/docs/configuration/reference/#providerresponse
+*/
+const pres = deleteUndefinedValues({
+error,
+cost,
+tokenUsage: stats
+? deleteUndefinedValues({
+total: stats.total_tokens,
+prompt: stats.prompt_tokens,
+completion: stats.completion_tokens,
+cached: stats.prompt_tokens_details?.cached_tokens,
+})
+: undefined,
+logProbs,
+isRefusal,
output: res,
-error: res?.error,
-}
+})
+return pres
} catch (e) {
logger.error(e)
return {
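The response mapping in this hunk can be sketched in isolation. This is an illustrative approximation under stated assumptions, not the provider code itself: `RunResult`, `RunStats` and `toProviderResponse` are hypothetical names, the result shape is reduced to the fields the diff reads, and the `cached` token count is omitted for brevity.

```typescript
// Hypothetical, reduced shapes of the script run result used by this sketch.
type RunStats = {
    cost?: number
    total_tokens?: number
    prompt_tokens?: number
    completion_tokens?: number
}
type RunResult = {
    error?: unknown
    stats?: RunStats
    finishReason?: string
    logprobs?: { logprob: number }[]
}

// Removes keys whose value is undefined so they do not appear in the response.
function deleteUndefinedValues(
    o: Record<string, unknown>
): Record<string, unknown> {
    for (const k of Object.keys(o)) if (o[k] === undefined) delete o[k]
    return o
}

// Maps a run result to a provider response object, following the diff.
function toProviderResponse(
    res: RunResult | undefined
): Record<string, unknown> {
    const r: RunResult = res ?? {}
    const { error, stats, logprobs, finishReason } = r
    // Both finish reasons are treated as a refusal, as in the diff.
    const isRefusal =
        finishReason === "refusal" || finishReason === "content_filter"
    return deleteUndefinedValues({
        error,
        cost: stats?.cost,
        tokenUsage: stats
            ? deleteUndefinedValues({
                  total: stats.total_tokens,
                  prompt: stats.prompt_tokens,
                  completion: stats.completion_tokens,
              })
            : undefined,
        logProbs: logprobs?.length
            ? logprobs.map((lp) => lp.logprob)
            : undefined,
        isRefusal,
        output: res,
    })
}

// Illustrative inputs only.
const completed = toProviderResponse({
    stats: { cost: 0.01, total_tokens: 10, prompt_tokens: 7, completion_tokens: 3 },
    finishReason: "stop",
})
const refused = toProviderResponse({ finishReason: "content_filter" })
console.log(Object.keys(completed), Object.keys(refused))
```

Stripping undefined keys keeps the response limited to fields that were actually reported, so a run without statistics produces no `cost` or `tokenUsage` entry at all.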
12 changes: 11 additions & 1 deletion packages/core/src/indent.ts
@@ -7,4 +7,14 @@ export function indent(text: string, indentation: string) {
.join("\n")
}

-export const dedent = tsDedent
+/**
+* Unindents a string
+*/
+export function dedent(
+templ: TemplateStringsArray | string,
+...values: unknown[]
+): string {
+if (templ === undefined) return undefined
+if (templ === null) return null
+return tsDedent(templ, ...values)
+}
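The new `dedent` wrapper passes `undefined` and `null` through instead of handing them to the dedent library. A minimal sketch of that pattern follows, assuming a simplified stand-in for the library: `simpleDedent` and `safeDedent` are hypothetical names, and `simpleDedent` handles only plain strings, not tagged templates.

```typescript
// Hypothetical stand-in for the dedent library: drops one leading newline,
// trailing blank space, and the indentation shared by all non-blank lines.
function simpleDedent(text: string): string {
    const lines = text.replace(/^\n/, "").replace(/\n\s*$/, "").split("\n")
    const indents = lines
        .filter((l) => l.trim().length > 0)
        .map((l) => l.search(/\S/))
    const min = indents.length ? Math.min(...indents) : 0
    return lines.map((l) => l.slice(min)).join("\n")
}

// Null-safe wrapper: missing input is returned unchanged.
function safeDedent(
    text: string | undefined | null
): string | undefined | null {
    if (text === undefined) return undefined
    if (text === null) return null
    return simpleDedent(text)
}

const dedented = safeDedent("\n    a\n      b\n    ")
console.log(JSON.stringify(dedented))
console.log(safeDedent(undefined), safeDedent(null))
```

Returning the missing value as-is lets callers forward optional strings without guarding every call site.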