feat: Evals
bracesproul committed Nov 27, 2024
1 parent 50ef560 commit bc90408
Showing 4 changed files with 144 additions and 0 deletions.
36 changes: 36 additions & 0 deletions src/evals/general/index.ts
@@ -0,0 +1,36 @@
import { type Example, Run } from "langsmith";
import { evaluate, EvaluationResult } from "langsmith/evaluation";
import "dotenv/config";
import { generatePostGraph } from "../../agent/subgraphs/generate-post/graph.js";

const runGraph = async (
  input: Record<string, any>
): Promise<Record<string, any>> => {
  return await generatePostGraph.invoke(input);
};

const evaluatePost = (run: Run, example?: Example): EvaluationResult => {
  if (!example) {
    throw new Error("No example provided");
  }
  if (!example.outputs) {
    throw new Error("No example outputs provided");
  }
  if (!run.outputs) {
    throw new Error("No run outputs provided");
  }

  // TODO: Implement evaluation logic
  throw new Error("Evaluation logic not implemented");
};

async function runEval() {
  const datasetName = "sma:generate-post:general";
  await evaluate(runGraph, {
    data: datasetName,
    evaluators: [evaluatePost],
    experimentPrefix: "Post Generation - General",
  });
}

void runEval();
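The `evaluatePost` stub above still throws at its TODO. As a minimal sketch of what could go there, assuming (hypothetically) that the graph writes the generated text to `outputs.post`, an evaluator can return an object with a `key` and a numeric `score`, which are fields a langsmith `EvaluationResult` carries. The `RunLike`/`ExampleLike`/`EvaluationResultLike` interfaces, the `post_quality` key, and the 280-character limit are illustrative stand-ins, not part of this commit.

```typescript
// Stand-ins for the langsmith `Run`, `Example`, and `EvaluationResult` types,
// modelling only the fields this sketch touches (assumption).
interface RunLike {
  outputs?: Record<string, unknown>;
}

interface ExampleLike {
  outputs?: Record<string, unknown>;
}

interface EvaluationResultLike {
  key: string;
  score: number;
  comment?: string;
}

// Hypothetical limit, chosen only for illustration.
const MAX_POST_LENGTH = 280;

const evaluatePostSketch = (
  run: RunLike,
  example?: ExampleLike,
): EvaluationResultLike => {
  // Same guards as the stub in this commit.
  if (!example) {
    throw new Error("No example provided");
  }
  if (!example.outputs) {
    throw new Error("No example outputs provided");
  }
  if (!run.outputs) {
    throw new Error("No run outputs provided");
  }

  // Hypothetical: assumes the generated post is stored under `outputs.post`.
  const post = run.outputs.post;
  if (typeof post !== "string" || post.trim().length === 0) {
    return { key: "post_quality", score: 0, comment: "No post was generated" };
  }
  if (post.length > MAX_POST_LENGTH) {
    return {
      key: "post_quality",
      score: 0,
      comment: `Post is ${post.length} characters long`,
    };
  }
  return { key: "post_quality", score: 1 };
};
```

Swapping the stand-in types for langsmith's `Run`, `Example`, and `EvaluationResult` should be enough to pass a function like this in the `evaluators` array; the scoring rule itself is only a placeholder.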
36 changes: 36 additions & 0 deletions src/evals/github/index.ts
@@ -0,0 +1,36 @@
import { type Example, Run } from "langsmith";
import { evaluate, EvaluationResult } from "langsmith/evaluation";
import "dotenv/config";
import { generatePostGraph } from "../../agent/subgraphs/generate-post/graph.js";

const runGraph = async (
  input: Record<string, any>
): Promise<Record<string, any>> => {
  return await generatePostGraph.invoke(input);
};

const evaluatePost = (run: Run, example?: Example): EvaluationResult => {
  if (!example) {
    throw new Error("No example provided");
  }
  if (!example.outputs) {
    throw new Error("No example outputs provided");
  }
  if (!run.outputs) {
    throw new Error("No run outputs provided");
  }

  // TODO: Implement evaluation logic
  throw new Error("Evaluation logic not implemented");
};

async function runEval() {
  const datasetName = "sma:generate-post:github";
  await evaluate(runGraph, {
    data: datasetName,
    evaluators: [evaluatePost],
    experimentPrefix: "Post Generation - Github",
  });
}

void runEval();
36 changes: 36 additions & 0 deletions src/evals/twitter/index.ts
@@ -0,0 +1,36 @@
import { type Example, Run } from "langsmith";
import { evaluate, EvaluationResult } from "langsmith/evaluation";
import "dotenv/config";
import { generatePostGraph } from "../../agent/subgraphs/generate-post/graph.js";

const runGraph = async (
  input: Record<string, any>
): Promise<Record<string, any>> => {
  return await generatePostGraph.invoke(input);
};

const evaluatePost = (run: Run, example?: Example): EvaluationResult => {
  if (!example) {
    throw new Error("No example provided");
  }
  if (!example.outputs) {
    throw new Error("No example outputs provided");
  }
  if (!run.outputs) {
    throw new Error("No run outputs provided");
  }

  // TODO: Implement evaluation logic
  throw new Error("Evaluation logic not implemented");
};

async function runEval() {
  const datasetName = "sma:generate-post:twitter";
  await evaluate(runGraph, {
    data: datasetName,
    evaluators: [evaluatePost],
    experimentPrefix: "Post Generation - Twitter",
  });
}

void runEval();
36 changes: 36 additions & 0 deletions src/evals/youtube/index.ts
@@ -0,0 +1,36 @@
import { type Example, Run } from "langsmith";
import { evaluate, EvaluationResult } from "langsmith/evaluation";
import "dotenv/config";
import { generatePostGraph } from "../../agent/subgraphs/generate-post/graph.js";

const runGraph = async (
  input: Record<string, any>
): Promise<Record<string, any>> => {
  return await generatePostGraph.invoke(input);
};

const evaluatePost = (run: Run, example?: Example): EvaluationResult => {
  if (!example) {
    throw new Error("No example provided");
  }
  if (!example.outputs) {
    throw new Error("No example outputs provided");
  }
  if (!run.outputs) {
    throw new Error("No run outputs provided");
  }

  // TODO: Implement evaluation logic
  throw new Error("Evaluation logic not implemented");
};

async function runEval() {
  const datasetName = "sma:generate-post:youtube";
  await evaluate(runGraph, {
    data: datasetName,
    evaluators: [evaluatePost],
    experimentPrefix: "Post Generation - YouTube",
  });
}

void runEval();
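The four entry points in this commit differ only in the dataset name (`sma:generate-post:<source>`) and the experiment prefix. A minimal sketch of deriving both from one table follows; `PostSource`, `EXPERIMENT_LABELS`, `EvalTarget`, and `evalTargetFor` are hypothetical names, and the labels simply mirror the strings used in the four files above.

```typescript
// Hypothetical: the sources covered by the four eval files in this commit.
type PostSource = "general" | "github" | "twitter" | "youtube";

// Labels copied from the experimentPrefix strings above.
const EXPERIMENT_LABELS: Record<PostSource, string> = {
  general: "General",
  github: "Github",
  twitter: "Twitter",
  youtube: "YouTube",
};

interface EvalTarget {
  data: string; // LangSmith dataset name, e.g. "sma:generate-post:twitter"
  experimentPrefix: string;
}

// Builds the dataset name and experiment prefix for one source.
function evalTargetFor(source: PostSource): EvalTarget {
  return {
    data: `sma:generate-post:${source}`,
    experimentPrefix: `Post Generation - ${EXPERIMENT_LABELS[source]}`,
  };
}

// Print every target a single script could loop over.
const ALL_SOURCES: PostSource[] = ["general", "github", "twitter", "youtube"];
for (const source of ALL_SOURCES) {
  const { data, experimentPrefix } = evalTargetFor(source);
  console.log(`${data} -> ${experimentPrefix}`);
}
```

Each target could then be spread into a call such as `evaluate(runGraph, { ...evalTargetFor("twitter"), evaluators: [evaluatePost] })`, so one script could cover every source; whether that beats four separate entry points depends on how the evals are meant to be run.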