Structured Outputs vs Function Calling - Performance & Accuracy Analysis #282

shreyashankar · 2025-01-17T18:02:04Z

Context

We currently implement extraction tasks using function calling, but some APIs support structured outputs. We need data to make an informed decision between these approaches.

Proposed Methodology

1. Create Synthetic Test Dataset

Take Wikipedia articles as base text (e.g., articles about cities, history)
Randomly insert known car names throughout the text (from a predefined list of 100+ cars)
Generate 50+ documents with varying:
- Document length (short/medium/long)
- Density of car mentions
- Complexity of surrounding context
Store ground truth: exact positions and names of inserted cars

2. Benchmark Implementation

Test both approaches.
Measure:

Latency (p50, p90, p99)
Token usage
Accuracy (precision/recall against ground truth)
Error rates/edge cases
Cost implications

3. Model Support Documentation

Create compatibility matrix, like as follows (this might not be accurate:

Model	Function Calling	Structured Outputs
GPT-4o	✓	✓
GPT-4o-mini	✓	✓
Claude	✓	✓
Llama-2	?	?
Mistral	?	?
[Add others]

Action Items

Create data generation script
Implement both extraction methods
Build benchmarking harness
Run tests across different models
Document findings & recommendations
Update codebase to use optimal method with fallback

Expected Output

Public benchmark results & methodology
Clear recommendation with data backing
PR to update implementation based on findings

shreyashankar · 2025-01-29T17:21:14Z

Closed via #291

shreyashankar added good first engineering issue Engineering-focused issue for newcomers good first research issue Good for newcomers who want to get involved in research labels Jan 17, 2025

shreyashankar changed the title ~~Benchmark: Structured Outputs vs Function Calling - Performance & Accuracy Analysis~~ Structured Outputs vs Function Calling - Performance & Accuracy Analysis Jan 17, 2025

shreyashankar mentioned this issue Jan 23, 2025

Replace Default Tool Implementation in APIWrapper with a Structured Output Request #286

Closed

shreyashankar closed this as completed Jan 29, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Structured Outputs vs Function Calling - Performance & Accuracy Analysis #282

Structured Outputs vs Function Calling - Performance & Accuracy Analysis #282

shreyashankar commented Jan 17, 2025

shreyashankar commented Jan 29, 2025

Structured Outputs vs Function Calling - Performance & Accuracy Analysis #282

Structured Outputs vs Function Calling - Performance & Accuracy Analysis #282

Comments

shreyashankar commented Jan 17, 2025

Context

Proposed Methodology

1. Create Synthetic Test Dataset

2. Benchmark Implementation

3. Model Support Documentation

Action Items

Expected Output

shreyashankar commented Jan 29, 2025