I was evaluating how well (Long)LLMLingua is able to achieve the requested compression rate (focusing on the `rate` parameter, not `target_tokens`) and came to these conclusions:
- For smaller prompts (< 150 tokens), barely any compression is achieved, if any at all.
- The requested compression rate is met best for prompts of around 2000 tokens.
- For longer prompts (> 5000 tokens), the requested rate is overshot (or undershot).
More detailed results are below.
My question is: am I doing something wrong when invoking LLMLingua, or is this behaviour normal?
I adhered to the usage examples in README.md:
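Roughly like this (a minimal sketch of my invocation; the placeholder prompt and the exact `rate` value stand in for my actual inputs):

```python
from llmlingua import PromptCompressor

prompt = "..."  # placeholder for one of the truncated GovReport prompts

# Default compressor (Llama 2 7B); for the GPT-2 runs I pass model_name="gpt2"
llm_lingua = PromptCompressor()

# Request the compression via the `rate` parameter (e.g. keep ~50% of tokens)
result = llm_lingua.compress_prompt(prompt, rate=0.5)

print(result["compressed_prompt"])
print(result["origin_tokens"], "->", result["compressed_tokens"])
```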
I tested with the default Llama 2 7B as well as with GPT-2. It seems that the overall deviation is smaller with the smaller model (GPT-2) than with the bigger model.
(Prompt lengths measured using the GPT-3.5 tokenizer)
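The achieved rate is measured as the ratio of token counts after and before compression; here is a small sketch of that measurement, assuming tiktoken's `cl100k_base` encoding as the GPT-3.5 tokenizer:

```python
import tiktoken

# GPT-3.5 tokenizer, used only to measure prompt lengths
enc = tiktoken.get_encoding("cl100k_base")

def achieved_rate(original: str, compressed: str) -> float:
    """Fraction of tokens kept after compression (1.0 means no compression)."""
    return len(enc.encode(compressed)) / len(enc.encode(original))
```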
(Charts: LLMLingua with Llama 2, LLMLingua with GPT-2, LongLLMLingua with Llama 2, LongLLMLingua with GPT-2)
In contrast, LLMLingua-2 adheres to the requested compression rate quite well, only slightly overshooting it:
(Chart: LLMLingua-2)
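For LLMLingua-2 the only change is how the compressor is constructed (a sketch following the LLMLingua-2 usage example; the model name is the one suggested in the README):

```python
from llmlingua import PromptCompressor

# LLMLingua-2 uses a token-classification model instead of a causal LM
llm_lingua2 = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

# `prompt` is the same truncated GovReport prompt as above
result = llm_lingua2.compress_prompt(prompt, rate=0.5)
```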
The prompts I used are truncated from the longest prompt in the LongBench GovReport task (link).
Reopening as I cannot figure out how to correctly use LLMLingua without overshooting the target compression rate.
No matter how I set `iterative_size`, large prompts (2K+ tokens) are overcompressed.
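For example, variations along these lines made no real difference (a sketch; the `iterative_size` values are just examples of what I tried):

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()  # default Llama 2 7B compressor
prompt = "..."  # a 2K+ token prompt (placeholder)

# Sweep iterative_size while keeping the requested rate fixed
for iterative_size in (100, 200, 400):
    result = llm_lingua.compress_prompt(prompt, rate=0.5, iterative_size=iterative_size)
    print(iterative_size, result["origin_tokens"], "->", result["compressed_tokens"])
```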
cornzz changed the title from "[Question]: Achieved compression rate with (Long)LLMLingua not meeting expectations?" to "[Bug]: Achieved compression rate with (Long)LLMLingua overshot" on Jan 16, 2025.
cornzz added a commit to cornzz/LLMLingua that referenced this issue on Jan 16, 2025.