Apply min_new_tokens=2 to mixtral-8x7b, address #1777 (#1884)
nvzhihanj authored Oct 22, 2024
1 parent ecb8801 commit f74d16f
Showing 4 changed files with 11 additions and 9 deletions.
2 changes: 1 addition & 1 deletion compliance/nvidia/TEST06/README.md
@@ -10,7 +10,7 @@ This repository provides the config files and scripts to run and verify TEST 06

The purpose of this test is to ensure the consistency of the output of the LLM (Llama2 and Mixtral) model and avoid a potential EOS exploit. This test will make a performance run, with a limit of 100 samples and logging them into `mlperf_log_accuracy.json`. To achieve a passing result in this test, three criteria must be met:
- In the case the first token is reported independently (not applicable for Offline scenario), it should match for every query with the first token of the model output.
-- For each query, the model output should only end with zero or one EOS token. The only exception for 2 EOS tokens is when the entire output sequences are EOS tokens (i.e. output is [eos_token_id, eos_token_id])
+- For each query, the model output should only end with zero or one EOS token.
- The number of reported tokens should match with the length of output sequence.
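The three pass criteria can be sketched as a standalone per-query validator (a minimal illustration with hypothetical argument names, not the official TEST06 verification script):

```python
def passes_test06(first_token, output_tokens, reported_len, eos_token_id=2):
    """Illustrative per-query check of the three TEST06 criteria."""
    # 1. If the first token is reported independently, it must match the
    #    first token of the model output (pass None for the Offline scenario).
    if first_token is not None and first_token != output_tokens[0]:
        return False
    # 2. The output must end with zero or one EOS token.
    trailing_eos = 0
    for tok in reversed(output_tokens):
        if tok != eos_token_id:
            break
        trailing_eos += 1
    if trailing_eos >= 2:
        return False
    # 3. The reported token count must match the output length.
    return reported_len == len(output_tokens)

print(passes_test06(5, [5, 7, 9, 2], 4))  # → True  (one trailing EOS)
print(passes_test06(5, [5, 7, 2, 2], 4))  # → False (two trailing EOS)
```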

## Requisites
3 changes: 1 addition & 2 deletions compliance/nvidia/TEST06/run_verification.py
@@ -51,8 +51,7 @@ def eos_check(acc_data, dtype, eos_token_id=2):
            if data[i] == eos_token_id:
                n_eos_tokens += 1
                if n_eos_tokens >= 2:
-                    # Allow output to be [eos_token_id, eos_token_id]
-                    return len(data) == 2
+                    return False
            if data[i] != eos_token_id:
                break
            i-=1
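With the exemption removed, the trailing-EOS scan fails any output whose last two tokens are both EOS. A self-contained sketch of the updated loop (paraphrasing the diff; the surrounding per-sample iteration and buffer decoding are assumed):

```python
def eos_check(data, eos_token_id=2):
    """Return False if the token sequence ends with two or more EOS tokens."""
    i = len(data) - 1
    n_eos_tokens = 0
    while i >= 0:
        if data[i] == eos_token_id:
            n_eos_tokens += 1
            if n_eos_tokens >= 2:
                # The old code returned len(data) == 2 here to allow
                # [eos_token_id, eos_token_id]; that exemption is gone.
                return False
        if data[i] != eos_token_id:
            break
        i -= 1
    return True

print(eos_check([11, 13, 2]))  # → True
print(eos_check([2, 2]))       # → False (no longer exempt)
```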
13 changes: 8 additions & 5 deletions language/mixtral-8x7b/README.md
@@ -109,6 +109,9 @@ rclone copyurl https://inference.mlcommons-storage.org/mixtral_8x7b%2F2024.06.06
#### Using wget

Alternatively, you can simply cd into the folder where you want to place the dataset and run

+TBD: The dataset is being replaced in v5.0 due to https://github.com/mlcommons/inference/issues/1777

```bash
wget https://inference.mlcommons-storage.org/mixtral_8x7b%2F2024.06.06_mixtral_15k_v4.pkl
```
@@ -261,17 +264,17 @@ python -u evaluate-accuracy.py --checkpoint-path [path_to_model_checkpoint] \
Reference scores:
Open Orca:
```json
-{'rouge1': 45.4911, 'rouge2': 23.2829, 'rougeL': 30.3615}
+{'rouge1': 45.5989, 'rouge2': 23.3526, 'rougeL': 30.4608}
```
GSM8K:
```json
-{'gsm8k': 73.78}
+{'gsm8k': 73.66}
```
MBXP:
```json
-{'mbxp': 60.12}
+{'mbxp': 60.16}
```
-For official submissions, 99% of each reference score is enforced. Additionally, 90%-110% of the generated tokens_per_samples:
+For official submissions, 99% of each reference score is enforced. Additionally, 90%-110% of the generated tokens_per_samples (counting all the non-EOS tokens):
```json
-{'tokens_per_sample': 145.9}
+{'tokens_per_sample': 144.84}
```
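The submission gates described above can be expressed as simple checks (illustrative only; the thresholds are the updated reference scores from this diff):

```python
# Updated reference scores from this commit (v4 dataset).
REFERENCE = {'rouge1': 45.5989, 'rouge2': 23.3526, 'rougeL': 30.4608,
             'gsm8k': 73.66, 'mbxp': 60.16}
REF_TOKENS_PER_SAMPLE = 144.84

def accuracy_ok(results, ratio=0.99):
    """Each metric must reach at least 99% of its reference score."""
    return all(results[k] >= ratio * REFERENCE[k] for k in REFERENCE)

def tokens_per_sample_ok(tps):
    """Generated non-EOS tokens per sample must fall within 90%-110% of reference."""
    return 0.90 * REF_TOKENS_PER_SAMPLE <= tps <= 1.10 * REF_TOKENS_PER_SAMPLE

print(accuracy_ok(REFERENCE))        # → True
print(tokens_per_sample_ok(144.84))  # → True
print(tokens_per_sample_ok(100.0))   # → False
```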
2 changes: 1 addition & 1 deletion language/mixtral-8x7b/SUT.py
@@ -27,7 +27,7 @@
gen_kwargs = {
    "early_stopping": True,
    "max_new_tokens": 1024,
-    "min_new_tokens": 1,
+    "min_new_tokens": 2,
    "num_beams": 1,
    "do_sample": False
}
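Raising `min_new_tokens` from 1 to 2 guarantees that at least two non-EOS tokens are generated before the model may stop, which is what makes the TEST06 exemption for `[eos_token_id, eos_token_id]` outputs unnecessary. A toy greedy-decoding loop mimicking the effect of Hugging Face's `MinNewTokensLengthLogitsProcessor`, which masks the EOS logit until the minimum number of new tokens has been produced (the function name and ranked-candidate encoding here are hypothetical):

```python
def greedy_generate(ranked_candidates, min_new_tokens, eos_token_id=2):
    """ranked_candidates[i] lists the model's preferred tokens at step i."""
    out = []
    for ranked in ranked_candidates:
        # Mask EOS while fewer than min_new_tokens tokens have been generated,
        # so the next-best candidate is emitted instead.
        allowed = [t for t in ranked
                   if t != eos_token_id or len(out) >= min_new_tokens]
        tok = allowed[0]
        out.append(tok)
        if tok == eos_token_id:
            break
    return out

# A model that would emit EOS immediately at every step:
ranked = [[2, 7], [2, 9], [2, 5]]
print(greedy_generate(ranked, min_new_tokens=1))  # → [7, 2]
print(greedy_generate(ranked, min_new_tokens=2))  # → [7, 9, 2]
```

With `min_new_tokens=2` even a model that wants to terminate immediately emits two real tokens before the EOS, so an all-EOS output can no longer occur.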
