
Understanding the interplay between ratio and iterative_size #61

Closed

acnagle opened this issue Jan 17, 2024 · 5 comments

Labels: question (Further information is requested)


acnagle commented Jan 17, 2024

Thank you for the interesting work and for making the code easily accessible. I have some confusion about the relationship between the ratio and iterative_size parameters.

In the case I am interested in, there is a single demonstration that I want to compress using only the token-level compression approach. I've noticed that, in general, the final ratio between the compressed and original lengths can vary quite a bit for large enough ratio values. However, when I make the iterative_size parameter small, e.g. 10, the final compression ratio is much more faithful to the value specified for the ratio parameter.

I'm confused as to why this is the case. From the paper, my understanding was that the \gamma_j threshold for segment s_j (whose length is set by the iterative_size parameter) was based primarily on the ratio parameter, meaning that, regardless of the iterative_size, LLMLingua would always prune ratio percent of the tokens in that segment.

Any clarification of this would be useful, including where in the code \gamma_j is computed.

@iofu728 iofu728 self-assigned this Jan 18, 2024
@iofu728 iofu728 added the question Further information is requested label Jan 18, 2024

iofu728 commented Jan 18, 2024

Hi @acnagle, thank you for your support of LLMLingua.

This is a great question, and I believe other users may have similar queries.

The actual compression ratio indeed has a certain relationship with the iterative_size. Specifically, \gamma_j is calculated from the global perplexity (PPL) distribution by taking the quantile that corresponds to the compression ratio, as detailed in https://github.com/microsoft/LLMLingua/blob/main/llmlingua/prompt_compressor.py#L621.

However, since it's challenging to directly obtain the actual global PPL distribution after compression, it's also difficult to get a true and accurate quantile. Thus, we update the PPL distribution per segment. A smaller iterative_size results in more sampling points for this estimation, leading to more accurate compression. This is particularly evident when the original prompt size is smaller.
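To make the quantile step concrete, here is a minimal sketch of the idea (hypothetical illustration code, not the actual get_estimate_threshold_base_distribution implementation; it assumes ratio denotes the fraction of tokens to keep):

```python
import torch

def estimate_threshold_sketch(token_loss: torch.Tensor, ratio: float) -> float:
    # The threshold gamma_j is taken as a quantile of the token-loss (PPL)
    # distribution: roughly a `ratio` fraction of tokens lie above it and
    # would be kept, the rest pruned.
    return torch.quantile(token_loss.float(), 1.0 - ratio).item()

# In the iterative scheme the distribution estimate is refreshed segment by
# segment, so a smaller iterative_size gives more sampling points / updates.
segment_loss = torch.tensor([2.1, 0.3, 4.7, 1.2, 0.8, 3.5])
gamma_j = estimate_threshold_sketch(segment_loss, ratio=0.5)
keep_mask = segment_loss > gamma_j
```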

I believe that by improving get_estimate_threshold_base_distribution, the estimation of the PPL distribution could be made more accurate.

I hope this answers your question, and thank you again for your support.

@acnagle acnagle closed this as completed Jan 25, 2024

acnagle commented Feb 23, 2024

Hello again. I was reviewing the code and realized that I might not have understood your explanation as well as I thought. Just to make sure I understand: by using a smaller iterative_size, we are able to get a better approximation of p(\tilde{s}_j) according to eq. 5 in the paper, and if we have a better approximation, then our chosen \gamma_j quantile will be more faithful to the compression ratio that we passed in. Is this correct?

You also mentioned in your comment that there might be a way to improve get_estimate_threshold_base_distribution. Do you have any suggestions in this direction?

@acnagle acnagle reopened this Feb 23, 2024

iofu728 commented Feb 26, 2024

Hi @acnagle,

Yes, the purpose of iterative compression is to minimize the approximation loss in eq. 5. This approximation can be improved in two ways:

  1. By explicitly learning the compressed results through training; we will release work based on this perspective next month.
  2. By fitting a curve to compensate for the gap (see the sketch below).
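As a rough illustration of the curve-fitting idea (this is not part of LLMLingua; the calibration numbers and names below are made up), one could calibrate the requested ratio against the ratio actually achieved and invert that mapping:

```python
import numpy as np

# Hypothetical calibration data: ratio requested vs. ratio actually achieved.
requested = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
achieved = np.array([0.12, 0.20, 0.29, 0.38, 0.50, 0.63])

# Fit a low-degree polynomial mapping achieved -> requested, so that for a
# desired final ratio we can look up the value to actually pass in.
compensate = np.poly1d(np.polyfit(achieved, requested, deg=2))

target = 0.5                          # ratio we want after compression
adjusted = float(compensate(target))  # ratio to pass to the compressor instead
```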


acnagle commented Feb 26, 2024

Thank you for your response. I'm looking forward to the follow-up work!


cornzz commented Jan 16, 2025

Edit: I opened PR #208. I am not 100% sure it is entirely correct, but it does fix / reduce the overcompression. See the diff for an 8k prompt compressed at rate=0.5 before and after the fix; note how, in the former, compression is way too aggressive for much of the prompt.

@acnagle @iofu728 Maybe I have a major misunderstanding, but I am fairly certain that there is a bug in the threshold estimation. Note that I am not talking about the perplexity calculation; I understand that the previous segments must be included there. From my understanding, the PPL threshold should be calculated based on the PPL values of the current segment, but the loss tensor includes not only the PPL of the current segment but also that of all previous segments, which have already been compressed (as well as the following segments, up to the context length of the model):

```python
if condition_compare:
    self_loss = self_past_loss
    threshold = self.get_estimate_threshold_base_distribution(
        self_loss[: loss[start:].shape[0]] - loss[start:], ratio, False
    )
else:
    threshold = self.get_estimate_threshold_base_distribution(
        loss, ratio, False
    )
```

Therefore the target compression ratio is way overshot.
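A toy sketch of the change being suggested (the function and index names are hypothetical, and this is not necessarily what PR #208 does): restrict the loss values that feed the quantile to the current segment before estimating the threshold.

```python
import torch

def segment_threshold(loss: torch.Tensor, seg_start: int, seg_end: int, ratio: float) -> float:
    # Use only the current segment's token losses for the quantile, rather than
    # the full loss tensor that also covers previous / following segments.
    segment_loss = loss[seg_start:seg_end].float()
    return torch.quantile(segment_loss, 1.0 - ratio).item()
```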

This also explains the behaviour I have described in #195. It gets worse with longer prompts, naturally. I have also checked diffs of uncompressed and compressed prompts and noticed that compression gets more and more aggressive towards the end of the prompt, only leaving a few words from every other sentence: https://www.diffchecker.com/vSaXla9g/
