Scrolls branch #2309 (Open)

Wants to merge 125 commits into base: main.

Commits (125):
d1b6d12
v1
Feb 12, 2023
5ef6ea8
Add to registry
Feb 12, 2023
2aed125
Fix eval
Feb 12, 2023
31a8b83
Add splits
Feb 12, 2023
48c6bd6
fix mmlu task, set updated dataset name and make the prompt identical…
ollmer May 12, 2023
7b8d9b7
add comment
ollmer May 12, 2023
9a29afb
Add CIT
Muennighoff May 20, 2023
92ec976
add SCROLLS (GovReport, SummScreenFD, QMSum, NarrativeQA, Qasper, QuA…
jquesnelle May 25, 2023
b1af197
update task_table.md for SCROLLS
jquesnelle May 25, 2023
3a424af
bump eval version
ollmer May 28, 2023
d8bf52c
fix p-tuning inaccuracy, because output logit contains virtual token …
sywangyi May 30, 2023
f034974
address review comments
ollmer Jun 6, 2023
c117e78
fix unreachable return
ollmer Jun 7, 2023
5ba0c5f
Merge remote-tracking branch 'upstream/master' into mmlu_fix
ollmer Jun 13, 2023
babb83b
Make Hailey and Lintang the code owners.
StellaAthena Jun 15, 2023
2b70a7c
Update README.md
haileyschoelkopf Jun 15, 2023
3e2e6d8
Merge pull request #497 from ollmer/mmlu_fix
StellaAthena Jun 15, 2023
ecba73d
Update README.md
StellaAthena Jun 15, 2023
b281b09
Update README.md
StellaAthena Jun 15, 2023
a28c019
Allow HFLM model to be initialized with transformers.PreTrainedModel …
svenhendrikx Jun 16, 2023
42caa66
Turn bigbench_resources into module such that it's included in the build
svenhendrikx Jun 16, 2023
ddc634f
Add logic to simple_evaluate to instantiate HFLM from transformers.Pr…
svenhendrikx Jun 16, 2023
e6960b9
Merge branch 'master' into instantiate-model-from-Automodel
svenhendrikx Jun 18, 2023
0570991
[triviaqa] The ground truth must be a *substring* of the generated an…
Vermeille Jun 21, 2023
a6952a0
Change collation order for greedy_until
haileyschoelkopf Jun 25, 2023
f6b81c6
Update README.md
lintangsutawika Jun 26, 2023
5e56fbf
Merge pull request #517 from jquesnelle/scrolls
haileyschoelkopf Jun 26, 2023
0dd4519
Merge pull request #610 from Vermeille/patch-1
StellaAthena Jun 26, 2023
f2ae05a
Merge pull request #616 from EleutherAI/patch-sort-descending
lintangsutawika Jun 27, 2023
ac0cb7e
Revert "[triviaqa] The ground truth must be a *substring* of the gene…
haileyschoelkopf Jun 27, 2023
9f4862f
Merge pull request #621 from EleutherAI/revert-610-patch-1
haileyschoelkopf Jun 27, 2023
e0498dd
change self.gpt2 -> self.model
haileyschoelkopf Jun 27, 2023
2a92cde
fix issue with dumping model_name to results
haileyschoelkopf Jun 27, 2023
5f1d18d
remove tokenizer assert
haileyschoelkopf Jun 27, 2023
d362dfe
pass batch size
haileyschoelkopf Jun 27, 2023
4221d30
switch to using get_model()
haileyschoelkopf Jun 27, 2023
1e98c74
update docstring
haileyschoelkopf Jun 27, 2023
72ee34d
Merge pull request #1 from EleutherAI/pass-automodel
svenhendrikx Jun 27, 2023
72b7f0c
Merge pull request #601 from svenhendrikx/instantiate-model-from-Auto…
haileyschoelkopf Jun 27, 2023
6028177
Dockerfile added
Shas3011 Jun 28, 2023
9923190
fix 'Descriptors cannot not be created directly.'
tothemoon96 Jun 28, 2023
c9c141d
add err handling for multi-tok stopseq
haileyschoelkopf Jun 29, 2023
bc10a39
Merge pull request #630 from EleutherAI/fix-stopseq
haileyschoelkopf Jun 29, 2023
e9f1af3
add trust_remote_code to tokenizer.from_pretrained
if001 Jun 30, 2023
13014b2
Merge pull request #637 from if001/trust_remote_code
haileyschoelkopf Jun 30, 2023
e26cfb0
Merge branch 'EleutherAI:master' into master
Shas3011 Jun 30, 2023
a946c6c
Fix trust_remote_code, bnb_4bit_*, max_batch_size for gpt2
gakada Jul 2, 2023
d153705
Merge pull request #643 from gakada/master
haileyschoelkopf Jul 3, 2023
44363c3
fix typo
haileyschoelkopf Jul 3, 2023
a8996a2
Merge pull request #623 from Shas3011/master
haileyschoelkopf Jul 3, 2023
caadcbc
Update babi.py
haileyschoelkopf Jul 3, 2023
9ad2fc3
Account for padding in inplen calculation
haileyschoelkopf Jul 3, 2023
35f1b5a
Update base.py
haileyschoelkopf Jul 3, 2023
25dfd3f
Merge pull request #389 from Muennighoff/babi
haileyschoelkopf Jul 3, 2023
318bd98
Merge branch 'EleutherAI:master' into fix_ptun
sywangyi Jul 4, 2023
491ec98
Merge pull request #533 from sywangyi/fix_ptun
haileyschoelkopf Jul 4, 2023
8c5117b
Add json files in package
SingL3 Jul 5, 2023
1736d78
Merge pull request #653 from SingL3/master
lintangsutawika Jul 5, 2023
6345d4e
implement C-Eval
HYZ17 Jul 8, 2023
c1e722d
implement C-Eval
HYZ17 Jul 8, 2023
3863ecd
implement C-Eval
HYZ17 Jul 8, 2023
eea0300
add C-Eval to task table
HYZ17 Jul 8, 2023
4c7f3e0
Update README.md
jiminHuang Jul 13, 2023
38bdf92
Update huggingface.py
jiminHuang Jul 13, 2023
e9ac57a
Update task_table.md
haileyschoelkopf Jul 13, 2023
5023fa0
Update ceval.py
haileyschoelkopf Jul 13, 2023
42f8206
Merge pull request #664 from HYZ17/ceval
haileyschoelkopf Jul 13, 2023
3557720
Merge pull request #1 from jiminHuang/master
tothemoon96 Jul 14, 2023
11e650d
add csatqa
guijinSON Jul 16, 2023
f6afabd
add csatqa
guijinSON Jul 16, 2023
b18c86f
Update csatqa.py
guijinSON Jul 17, 2023
89855a3
Delete .DS_Store
guijinSON Jul 17, 2023
8e95e13
Delete .DS_Store
guijinSON Jul 17, 2023
d3cb0c8
Delete .DS_Store
haileyschoelkopf Jul 17, 2023
df3da98
update csatqa.py
guijinSON Jul 17, 2023
6cfbdf9
Update huggingface.py
jiminHuang Jul 20, 2023
062549c
Update __init__.py
jiminHuang Jul 20, 2023
9ecf896
Merge branch 'chancefocus:master' into master
jiminHuang Jul 20, 2023
3920369
Merge pull request #2 from jiminHuang/master
tothemoon96 Jul 20, 2023
11ab205
add supports for python3.8
tothemoon96 Jul 21, 2023
ebe7ac5
Merge branch 'master' of https://github.com/chancefocus/financial-eva…
tothemoon96 Jul 21, 2023
21d1ebf
merge from lm-evaluation-harness
tothemoon96 Jul 21, 2023
fff14df
add use_fast
tothemoon96 Jul 21, 2023
f7ba252
add support for llama-based model
tothemoon96 Jul 22, 2023
1a2d51a
Update huggingface.py
jiminHuang Jul 26, 2023
298f777
Update __init__.py
jiminHuang Jul 26, 2023
7b67fca
Merge pull request #3 from jiminHuang/master
tothemoon96 Jul 26, 2023
7a1ec14
Create haerae.py
guijinSON Jul 27, 2023
9799eb1
Update __init__.py
guijinSON Jul 27, 2023
602cf79
Update haerae.py
guijinSON Jul 27, 2023
e961047
Update csatqa.py
guijinSON Jul 27, 2023
ca830da
Update haerae.py
guijinSON Jul 27, 2023
948e741
Update haerae.py
guijinSON Jul 27, 2023
693d256
Update setup.py
pminervini Jul 27, 2023
c195583
Merge pull request #707 from pminervini/master
haileyschoelkopf Jul 30, 2023
5e59782
Merge pull request #706 from guijinSON/master
guijinSON Jul 31, 2023
1b9833d
remove results folder
haileyschoelkopf Aug 1, 2023
4fbbd60
Merge pull request #718 from EleutherAI/remove-results-folder
haileyschoelkopf Aug 1, 2023
f580860
add bnb_4bit_use_double_quant and low_cpu_mem_usage
jiqing-feng Aug 2, 2023
4c6d15f
revise huggingface.py
Aug 2, 2023
5fbbb75
revise huggingface.py
Aug 2, 2023
f056ed6
Merge pull request #4 from Dai-shen/huggingface_revise
jiminHuang Aug 2, 2023
fe803c2
Merge pull request #722 from jiqing-feng/4bit_double_quant
haileyschoelkopf Aug 4, 2023
4a936ef
update hf
tothemoon96 Aug 6, 2023
60ca6ed
merge raw in main
tothemoon96 Aug 6, 2023
d504944
updated syntax to new anthropic API
baberabb Aug 6, 2023
5a49b2a
Merge pull request #738 from baberabb/master_anthropic
haileyschoelkopf Aug 6, 2023
d6aa3b6
fix llama tokenizer
tothemoon96 Aug 7, 2023
dfaee42
fix llama
tothemoon96 Aug 7, 2023
f08f7c7
Update triviaqa.py
StellaAthena Aug 8, 2023
b952a20
Merge pull request #746 from EleutherAI/StellaAthena-patch-2
lintangsutawika Aug 8, 2023
9ab8538
Update crowspairs.py
haileyschoelkopf Aug 10, 2023
24e5972
Add CMMLU Benchmark
haonan-li Aug 11, 2023
fd1c719
Merge pull request #765 from EleutherAI/haileyschoelkopf-patch-1
haileyschoelkopf Aug 13, 2023
aaef2c4
Merge pull request #772 from haonan-li/master
haileyschoelkopf Aug 18, 2023
da545ee
update base lib
tothemoon96 Aug 21, 2023
dac9073
Merge remote-tracking branch 'raw/master'
tothemoon96 Aug 21, 2023
736411f
Update huggingface.py
jiminHuang Aug 25, 2023
b754ab2
feature: add VLLM interface
jiminHuang Oct 1, 2023
4ce8801
feature: add VLLM interface
jiminHuang Oct 1, 2023
a2a3bec
Merge pull request #6 from jiminHuang/master
tothemoon96 Oct 1, 2023
9d6eb9d
add tensor parallel to vllm
tothemoon96 Oct 13, 2023
d46cbfe
Update huggingface.py
Me1oyy May 29, 2024
03b90db
Merge pull request #7 from Me1oyy/master
Me1oyy May 29, 2024
afea752
Update scrolls.py
blitzionic Sep 16, 2024
2 changes: 2 additions & 0 deletions .gitignore
@@ -3,3 +3,5 @@ env
data/
lm_cache
.idea

*.egg-info/
5 changes: 0 additions & 5 deletions .pre-commit-config.yaml
@@ -15,7 +15,6 @@ repos:
      - id: destroyed-symlinks
      - id: detect-private-key
      - id: end-of-file-fixer
      - id: no-commit-to-branch
      - id: requirements-txt-fixer
      - id: trailing-whitespace
      - id: fix-byte-order-marker
@@ -24,10 +23,6 @@
        args: [--remove]
      - id: mixed-line-ending
        args: [--fix=lf]
  - repo: https://github.com/pycqa/flake8
    rev: 3.7.9
    hooks:
      - id: flake8
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
2 changes: 1 addition & 1 deletion CODEOWNERS
@@ -1 +1 @@
* @jon-tow @StellaAthena @haileyschoelkopf @lintangsutawika
* @haileyschoelkopf @lintangsutawika
30 changes: 30 additions & 0 deletions Dockerfile
@@ -0,0 +1,30 @@
FROM nvidia/cuda:11.2.0-cudnn8-runtime-ubuntu20.04


### Install python 3.10 and set it as default python interpreter
RUN apt update && apt install software-properties-common -y && \
add-apt-repository ppa:deadsnakes/ppa -y && apt update && \
apt install curl -y && \
apt install python3.10 -y && \
update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1 && \
update-alternatives --install /usr/bin/python python /usr/bin/python3.10 1 && \
apt install python3.10-venv python3.10-dev -y && \
curl -Ss https://bootstrap.pypa.io/get-pip.py | python3.10 && \
apt-get clean && rm -rf /var/lib/apt/lists/


### Copy files
COPY . /lm-evaluation-harness/

### Set working directory

WORKDIR /lm-evaluation-harness


### Install requirements
RUN pip install --no-cache-dir -e .
### Run bash
CMD ["/bin/bash"]



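A typical way to build and use the image added above (not part of the diff; the image tag is illustrative, and GPU access assumes the NVIDIA Container Toolkit is installed on the host):

```bash
# Build the image from the repository root
docker build -t lm-eval .

# Open an interactive shell in the container with GPU access
docker run --gpus all -it lm-eval
```
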
189 changes: 2 additions & 187 deletions README.md
@@ -1,188 +1,3 @@
# Language Model Evaluation Harness
# Financial Evaluation Framework

## Overview

This project provides a unified framework to test generative language models on a large number of different evaluation tasks.

Features:

- 200+ tasks implemented. See the [task-table](./docs/task_table.md) for a complete list.
- Support for models loaded via [transformers](https://github.com/huggingface/transformers/) (including quantization via [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ)), [GPT-NeoX](https://github.com/EleutherAI/gpt-neox), and [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed/), with a flexible tokenization-agnostic interface.
- Support for commercial APIs including [OpenAI](https://openai.com), [goose.ai](https://goose.ai), and [TextSynth](https://textsynth.com/).
- Support for evaluation on adapters (e.g. LoRA) supported in [HuggingFace's PEFT library](https://github.com/huggingface/peft).
- Evaluating with publicly available prompts ensures reproducibility and comparability between papers.
- Task versioning to ensure reproducibility when tasks are updated.

## Install

To install `lm-eval` from the github repository main branch, run:

```bash
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
```

To install additional multilingual tokenization and text segmentation packages, you must install the package with the `multilingual` extra:

```bash
pip install -e ".[multilingual]"
```

To support loading GPTQ quantized models, install the package with the `auto-gptq` extra:

```bash
pip install -e ".[auto-gptq]"
```

## Basic Usage

> **Note**: When reporting results from eval harness, please include the task versions (shown in `results["versions"]`) for reproducibility. This allows bug fixes to tasks while also ensuring that previously reported scores are reproducible. See the [Task Versioning](#task-versioning) section for more info.

### Hugging Face `transformers`

To evaluate a model hosted on the [HuggingFace Hub](https://huggingface.co/models) (e.g. GPT-J-6B) on `hellaswag` you can use the following command:


```bash
python main.py \
--model hf-causal \
--model_args pretrained=EleutherAI/gpt-j-6B \
--tasks hellaswag \
--device cuda:0
```

Additional arguments can be provided to the model constructor using the `--model_args` flag. Most notably, this supports the common practice of using the `revisions` feature on the Hub to store partially trained checkpoints, or to specify the datatype for running a model:

```bash
python main.py \
--model hf-causal \
--model_args pretrained=EleutherAI/pythia-160m,revision=step100000,dtype="float" \
--tasks lambada_openai,hellaswag \
--device cuda:0
```

To evaluate models that are loaded via `AutoSeq2SeqLM` in Hugging Face `transformers`, use `hf-seq2seq` instead. *To evaluate (causal) models across multiple GPUs, use `--model hf-causal-experimental`.*

> **Warning**: Choosing the wrong model type may produce erroneous outputs without raising an error.
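
For instance, a seq2seq model such as FLAN-T5 would be evaluated with `hf-seq2seq` rather than `hf-causal` (a minimal sketch; the model and task shown are illustrative):

```bash
python main.py \
    --model hf-seq2seq \
    --model_args pretrained=google/flan-t5-small \
    --tasks hellaswag \
    --device cuda:0
```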

### Commercial APIs

Our library also supports language models served via the OpenAI API:

```bash
export OPENAI_API_SECRET_KEY=YOUR_KEY_HERE
python main.py \
--model gpt3 \
--model_args engine=davinci \
--tasks lambada_openai,hellaswag
```

While this functionality is only officially maintained for the OpenAI API, it tends to also work for other hosting services that expose the same API, such as [goose.ai](https://goose.ai), with minor modifications. We also have an implementation for the [TextSynth](https://textsynth.com/index.html) API, using `--model textsynth`.
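
A minimal sketch of a TextSynth invocation, assuming the `textsynth` model reads its key from the `TEXTSYNTH_API_SECRET_KEY` environment variable and takes an `engine` argument (check `lm_eval/models/textsynth.py` for the exact names; the engine shown is illustrative):

```bash
export TEXTSYNTH_API_SECRET_KEY=YOUR_KEY_HERE
python main.py \
    --model textsynth \
    --model_args engine=gptj_6B \
    --tasks lambada_openai
```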

To verify the data integrity of the tasks you're performing in addition to running the tasks themselves, you can use the `--check_integrity` flag:

```bash
python main.py \
--model gpt3 \
--model_args engine=davinci \
--tasks lambada_openai,hellaswag \
--check_integrity
```

### Other Frameworks

A number of other libraries provide scripts for calling the eval harness from within their own codebases. These include [GPT-NeoX](https://github.com/EleutherAI/gpt-neox/blob/main/eval_tasks/eval_adapter.py), [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed/blob/main/examples/MoE/readme_evalharness.md), and [mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/blob/master/eval_harness.py).

💡 **Tip**: You can inspect what the LM inputs look like by running the following command:

```bash
python write_out.py \
--tasks all_tasks \
--num_fewshot 5 \
--num_examples 10 \
--output_base_path /path/to/output/folder
```

This will write out one text file for each task.

## Advanced Usage

For models loaded with the HuggingFace `transformers` library, any arguments provided via `--model_args` get passed to the relevant constructor directly. This means that anything you can do with `AutoModel` can be done with our library. For example, you can pass a local path via `pretrained=` or use models finetuned with [PEFT](https://github.com/huggingface/peft) by taking the call you would run to evaluate the base model and add `,peft=PATH` to the `model_args` argument:
```bash
python main.py \
--model hf-causal-experimental \
--model_args pretrained=EleutherAI/gpt-j-6b,peft=nomic-ai/gpt4all-j-lora \
--tasks openbookqa,arc_easy,winogrande,hellaswag,arc_challenge,piqa,boolq \
--device cuda:0
```

GPTQ quantized models can be loaded by specifying their file names in `,quantized=NAME` (or `,quantized=True` for default names) in the `model_args` argument:

```bash
python main.py \
--model hf-causal-experimental \
--model_args pretrained=model-name-or-path,quantized=model.safetensors,gptq_use_triton=True \
--tasks hellaswag
```

We support wildcards in task names; for example, you can run all of the machine-translated LAMBADA tasks via `--tasks lambada_openai_mt_*`.
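
As a concrete invocation (a sketch; the model is illustrative), quote the pattern so the shell passes the wildcard through to the harness instead of expanding it:

```bash
python main.py \
    --model hf-causal \
    --model_args pretrained=EleutherAI/pythia-160m \
    --tasks "lambada_openai_mt_*" \
    --device cuda:0
```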

We currently only support one prompt per task, which we strive to make the "standard" as defined by the benchmark's authors. If you would like to study how varying prompts causes changes in the evaluation score, check out the [BigScience fork](https://github.com/bigscience-workshop/lm-evaluation-harness) of this repo. We are currently working on upstreaming this capability to `main`.

## Implementing new tasks

To implement a new task in the eval harness, see [this guide](./docs/task_guide.md).

## Task Versioning

To help improve reproducibility, all tasks have a `VERSION` field. When run from the command line, this is reported in a column in the table, or in the "version" field in the evaluator return dict. The purpose of the version is so that if the task definition changes (e.g., to fix a bug), we can know exactly which metrics were computed using the old, buggy implementation and avoid unfair comparisons. To enforce this, there are unit tests that make sure the behavior of all tasks remains the same as when they were first implemented. Task versions start at 0, and each time a breaking change is made, the version is incremented by one.

When reporting eval harness results, please also report the version of each task. This can be done either with a separate column in the table, or by appending the version to the task name, e.g. `taskname-v0`.
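
For example, versions can be recovered from a saved results file; this sketch assumes `main.py`'s `--output_path` flag writes the results dict (including its `versions` field) as JSON, and that `jq` is installed:

```bash
python main.py \
    --model gpt2 \
    --tasks sciq \
    --output_path results.json

# Print the per-task versions recorded alongside the scores
jq '.versions' results.json
```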

## Test Set Decontamination

To address concerns about train/test contamination, we provide utilities for comparing results on a benchmark using only the data points not found in the model's training set. Unfortunately, outside of models trained on the Pile and C4, it's very rare that people who train models disclose the contents of the training data. However, this utility can be useful for evaluating models you have trained on private data, provided you are willing to pre-compute the necessary indices. We provide computed indices for 13-gram exact-match deduplication against the Pile, and plan to add additional precomputed dataset indices in the future (including C4 and min-hash LSH deduplication).

For details on text decontamination, see the [decontamination guide](./docs/decontamination.md).

Note that the directory provided to the `--decontamination_ngrams_path` argument should contain the ngram files and `info.json`. See the guide above for ngram generation for the Pile; this can be adapted for other training sets.

```bash
python main.py \
--model gpt2 \
--tasks sciq \
--decontamination_ngrams_path path/containing/training/set/ngrams \
--device cuda:0
```

## Cite as

```
@software{eval-harness,
author = {Gao, Leo and
Tow, Jonathan and
Biderman, Stella and
Black, Sid and
DiPofi, Anthony and
Foster, Charles and
Golding, Laurence and
Hsu, Jeffrey and
McDonell, Kyle and
Muennighoff, Niklas and
Phang, Jason and
Reynolds, Laria and
Tang, Eric and
Thite, Anish and
Wang, Ben and
Wang, Kevin and
Zou, Andy},
title = {A framework for few-shot language model evaluation},
month = sep,
year = 2021,
publisher = {Zenodo},
version = {v0.0.1},
doi = {10.5281/zenodo.5371628},
url = {https://doi.org/10.5281/zenodo.5371628}
}
```
Welcome to the Financial Evaluation Framework! This project is a fork of the Language Model Evaluation Harness, tailored specifically to the financial sector. Our goal is to adapt and expand the original framework to assess generative language models on evaluation tasks unique to financial data analysis and forecasting.
59 changes: 59 additions & 0 deletions docs/task_table.md
@@ -286,6 +286,13 @@
|reversed_words | |✓ | | 10000|acc |
|rte |✓ |✓ | | 277|acc |
|sciq |✓ |✓ |✓ | 1000|acc, acc_norm |
|scrolls_contractnli |✓ |✓ | | 1037|em, acc, acc_norm |
|scrolls_govreport |✓ |✓ | | 972|rouge1, rouge2, rougeL |
|scrolls_narrativeqa |✓ |✓ | | 3425|f1 |
|scrolls_qasper |✓ |✓ | | 984|f1 |
|scrolls_qmsum |✓ |✓ | | 272|rouge1, rouge2, rougeL |
|scrolls_quality |✓ |✓ | | 2086|em, acc, acc_norm |
|scrolls_summscreenfd |✓ |✓ | | 338|rouge1, rouge2, rougeL |
|squad2 |✓ |✓ | | 11873|exact, f1, HasAns_exact, HasAns_f1, NoAns_exact, NoAns_f1, best_exact, best_f1 |
|sst |✓ |✓ | | 872|acc |
|swag |✓ |✓ | | 20006|acc, acc_norm |
@@ -371,3 +378,55 @@
|xwinograd_pt | | |✓ | 263|acc |
|xwinograd_ru | | |✓ | 315|acc |
|xwinograd_zh | | |✓ | 504|acc |
| Ceval-valid-computer_network | | ✓ | | 19 | acc |
| Ceval-valid-operating_system | | ✓ | | 19 | acc |
| Ceval-valid-computer_architecture | | ✓ | | 21 | acc |
| Ceval-valid-college_programming | | ✓ | | 37 | acc |
| Ceval-valid-college_physics | | ✓ | | 19 | acc |
| Ceval-valid-college_chemistry | | ✓ | | 24 | acc |
| Ceval-valid-advanced_mathematics | | ✓ | | 19 | acc |
| Ceval-valid-probability_and_statistics | | ✓ | | 18 | acc |
| Ceval-valid-discrete_mathematics | | ✓ | | 16 | acc |
| Ceval-valid-electrical_engineer | | ✓ | | 37 | acc |
| Ceval-valid-metrology_engineer | | ✓ | | 24 | acc |
| Ceval-valid-high_school_mathematics | | ✓ | | 18 | acc |
| Ceval-valid-high_school_physics | | ✓ | | 19 | acc |
| Ceval-valid-high_school_chemistry | | ✓ | | 19 | acc |
| Ceval-valid-high_school_biology | | ✓ | | 19 | acc |
| Ceval-valid-middle_school_mathematics | | ✓ | | 19 | acc |
| Ceval-valid-middle_school_biology | | ✓ | | 21 | acc |
| Ceval-valid-middle_school_physics | | ✓ | | 19 | acc |
| Ceval-valid-middle_school_chemistry | | ✓ | | 20 | acc |
| Ceval-valid-veterinary_medicine | | ✓ | | 23 | acc |
| Ceval-valid-college_economics | | ✓ | | 55 | acc |
| Ceval-valid-business_administration | | ✓ | | 33 | acc |
| Ceval-valid-marxism | | ✓ | | 19 | acc |
| Ceval-valid-mao_zedong_thought | | ✓ | | 24 | acc |
| Ceval-valid-education_science | | ✓ | | 29 | acc |
| Ceval-valid-teacher_qualification | | ✓ | | 44 | acc |
| Ceval-valid-high_school_politics | | ✓ | | 19 | acc |
| Ceval-valid-high_school_geography | | ✓ | | 19 | acc |
| Ceval-valid-middle_school_politics | | ✓ | | 21 | acc |
| Ceval-valid-middle_school_geography | | ✓ | | 12 | acc |
| Ceval-valid-modern_chinese_history | | ✓ | | 23 | acc |
| Ceval-valid-ideological_and_moral_cultivation | | ✓ | | 19 | acc |
| Ceval-valid-logic | | ✓ | | 22 | acc |
| Ceval-valid-law | | ✓ | | 24 | acc |
| Ceval-valid-chinese_language_and_literature | | ✓ | | 23 | acc |
| Ceval-valid-art_studies | | ✓ | | 33 | acc |
| Ceval-valid-professional_tour_guide | | ✓ | | 29 | acc |
| Ceval-valid-legal_professional | | ✓ | | 23 | acc |
| Ceval-valid-high_school_chinese | | ✓ | | 19 | acc |
| Ceval-valid-high_school_history | | ✓ | | 20 | acc |
| Ceval-valid-middle_school_history | | ✓ | | 22 | acc |
| Ceval-valid-civil_servant | | ✓ | | 47 | acc |
| Ceval-valid-sports_science | | ✓ | | 19 | acc |
| Ceval-valid-plant_protection | | ✓ | | 22 | acc |
| Ceval-valid-basic_medicine | | ✓ | | 19 | acc |
| Ceval-valid-clinical_medicine | | ✓ | | 22 | acc |
| Ceval-valid-urban_and_rural_planner | | ✓ | | 46 | acc |
| Ceval-valid-accountant | | ✓ | | 49 | acc |
| Ceval-valid-fire_engineer | | ✓ | | 31 | acc |
| Ceval-valid-environmental_impact_assessment_engineer | | ✓ | | 31 | acc |
| Ceval-valid-tax_accountant | | ✓ | | 49 | acc |
| Ceval-valid-physician | | ✓ | | 49 | acc |