CHASE: Challenging AI with Synthetic Evaluations

The pace of evolution of Large Language Models (LLMs) necessitates new approaches for rigorous and comprehensive evaluation. Traditional human annotation is increasingly impracticable due to the complexities and costs involved in generating high-quality, challenging problems. In this work, we introduce **CHASE**, a unified framework to synthetically generate challenging problems using LLMs without human involvement. For a given task, our approach builds a hard problem in a bottom-up manner from simpler components. Moreover, our framework decomposes the generation process into independently verifiable sub-tasks, thereby ensuring a high level of quality and correctness. We implement CHASE to create evaluation benchmarks across three diverse domains: (1) document-based question answering, (2) repository-level code completion, and (3) math reasoning. The performance of state-of-the-art LLMs on these synthetic benchmarks lies in the range of 40-60% accuracy, thereby demonstrating the effectiveness of our framework at generating challenging problems.

Setup

Install virtualenv (optional):

$ [sudo] pip install virtualenv

Create and activate your virtual environment (optional):

$ virtualenv -p python3 chasenv
$ source chasenv/bin/activate

Depending on your machine, you may instead need to create the environment with Python's built-in venv module:

$ python3 -m venv chasenv
$ source chasenv/bin/activate

Dependencies

Compatible with Python 3.
Dependencies can be installed using CHASE/requirements.txt.
Works best with CUDA 12.5; with other versions, some libraries may be difficult to install.

Install all the required packages from the CHASE/ directory:

$ pip install -r requirements.txt
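
After installing, it can be useful to confirm that the packages actually landed in the active environment. The snippet below is a minimal sketch and not part of the CHASE codebase: the helper names `requirement_names` and `missing_packages` are hypothetical, and the parser assumes simple requirement lines such as `name==version` (it does not handle URLs, extras, or environment markers).

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract bare distribution names from simple requirements-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # Assumption: the line starts with the package name (e.g. "name==1.0").
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

To use it, open requirements.txt from CHASE/, pass the file object to `requirement_names`, and pass the result to `missing_packages`. An empty list suggests every listed package is present, though it does not check that the installed versions match the pinned ones.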

Usage

Citation

If you use our data or code, please cite our work:

@misc{patel2025llmgeneratechallengingproblems,
      title={How to Get Your LLM to Generate Challenging Problems for Evaluation}, 
      author={Arkil Patel and Siva Reddy and Dzmitry Bahdanau},
      year={2025},
      eprint={2502.14678},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14678}, 
}

For any clarification, comments, or suggestions please contact Arkil.
