Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structured Outputs vs Function Calling - Performance & Accuracy Analysis #282

Closed
shreyashankar opened this issue Jan 17, 2025 · 1 comment
Labels
good first engineering issue Engineering-focused issue for newcomers good first research issue Good for newcomers who want to get involved in research

Comments

@shreyashankar
Copy link
Collaborator

Context

We currently implement extraction tasks using function calling, but some APIs support structured outputs. We need data to make an informed decision between these approaches.

Proposed Methodology

1. Create Synthetic Test Dataset

  • Take Wikipedia articles as base text (e.g., articles about cities, history)
  • Randomly insert known car names throughout the text (from a predefined list of 100+ cars)
  • Generate 50+ documents with varying:
    • Document length (short/medium/long)
    • Density of car mentions
    • Complexity of surrounding context
  • Store ground truth: exact positions and names of inserted cars

2. Benchmark Implementation

Test both approaches.
Measure:

  • Latency (p50, p90, p99)
  • Token usage
  • Accuracy (precision/recall against ground truth)
  • Error rates/edge cases
  • Cost implications

3. Model Support Documentation

Create compatibility matrix, like as follows (this might not be accurate:

Model Function Calling Structured Outputs
GPT-4o
GPT-4o-mini
Claude
Llama-2 ? ?
Mistral ? ?
[Add others]

Action Items

  1. Create data generation script
  2. Implement both extraction methods
  3. Build benchmarking harness
  4. Run tests across different models
  5. Document findings & recommendations
  6. Update codebase to use optimal method with fallback

Expected Output

  • Public benchmark results & methodology
  • Clear recommendation with data backing
  • PR to update implementation based on findings
@shreyashankar shreyashankar added good first engineering issue Engineering-focused issue for newcomers good first research issue Good for newcomers who want to get involved in research labels Jan 17, 2025
@shreyashankar shreyashankar changed the title Benchmark: Structured Outputs vs Function Calling - Performance & Accuracy Analysis Structured Outputs vs Function Calling - Performance & Accuracy Analysis Jan 17, 2025
@shreyashankar
Copy link
Collaborator Author

Closed via #291

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first engineering issue Engineering-focused issue for newcomers good first research issue Good for newcomers who want to get involved in research
Projects
None yet
Development

No branches or pull requests

1 participant