I was evaluating how well (Long)LLMLingua is able to achieve the requested compression rate (focusing on the `rate` parameter, not `target_tokens`) and came to these conclusions:
- For smaller prompts (< 150 tokens), barely any compression is achieved, if any at all.
- The requested compression rate is met best for prompts of around 2000 tokens.
- For longer prompts (> 5000 tokens), the requested rate is overshot (or undershot).
More detailed results are below.
My question is: am I doing something wrong when invoking LLMLingua, or is this behaviour normal?
I adhered to the usage examples in README.md:
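Roughly like this (a minimal sketch of my invocation; the placeholder prompt and the exact `rate` value stand in for my actual inputs):

```python
from llmlingua import PromptCompressor

prompt = "..."  # placeholder for one of the truncated GovReport prompts

# Default compressor (Llama 2 7B); for the GPT-2 runs I pass model_name="gpt2"
llm_lingua = PromptCompressor()

# Request the compression via the `rate` parameter (e.g. keep ~50% of tokens)
result = llm_lingua.compress_prompt(prompt, rate=0.5)

print(result["compressed_prompt"])
print(result["origin_tokens"], "->", result["compressed_tokens"])
```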
I tested with the default Llama 2 7B as well as with GPT-2. It seems that the overall deviation is smaller with the smaller model (GPT-2) than with the bigger model.
(Prompt lengths measured using the GPT-3.5 tokenizer)
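The achieved rate is measured as the ratio of token counts after and before compression; here is a small sketch of that measurement, assuming tiktoken's `cl100k_base` encoding as the GPT-3.5 tokenizer:

```python
import tiktoken

# GPT-3.5 tokenizer, used only to measure prompt lengths
enc = tiktoken.get_encoding("cl100k_base")

def achieved_rate(original: str, compressed: str) -> float:
    """Fraction of tokens kept after compression (1.0 means no compression)."""
    return len(enc.encode(compressed)) / len(enc.encode(original))
```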
(Charts: LLMLingua with Llama 2, LLMLingua with GPT-2, LongLLMLingua with Llama 2, LongLLMLingua with GPT-2)
In contrast, LLMLingua-2 adheres to the requested compression rate quite well, only slightly overshooting it:
(Chart: LLMLingua-2)
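For LLMLingua-2 the only change is how the compressor is constructed (a sketch following the LLMLingua-2 usage example; the model name is the one suggested in the README):

```python
from llmlingua import PromptCompressor

# LLMLingua-2 uses a token-classification model instead of a causal LM
llm_lingua2 = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

# `prompt` is the same truncated GovReport prompt as above
result = llm_lingua2.compress_prompt(prompt, rate=0.5)
```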
The prompts I used are truncated from the longest prompt in the LongBench GovReport task (link).
Reopening as I cannot figure out how to correctly use LLMLingua without overshooting the target compression rate.
No matter how I set `iterative_size`, large prompts (2K+ tokens) are overcompressed.
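For example, variations along these lines made no real difference (a sketch; the `iterative_size` values are just examples of what I tried):

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()  # default Llama 2 7B compressor
prompt = "..."  # a 2K+ token prompt (placeholder)

# Sweep iterative_size while keeping the requested rate fixed
for iterative_size in (100, 200, 400):
    result = llm_lingua.compress_prompt(prompt, rate=0.5, iterative_size=iterative_size)
    print(iterative_size, result["origin_tokens"], "->", result["compressed_tokens"])
```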
cornzz changed the title from "[Question]: Achieved compression rate with (Long)LLMLingua not meeting expectations?" to "[Bug]: Achieved compression rate with (Long)LLMLingua overshot" on Jan 16, 2025.
cornzz added a commit to cornzz/LLMLingua that referenced this issue on Jan 16, 2025.