
Understanding the interplay between ratio and iterative_size #61

Closed

acnagle opened this issue Jan 17, 2024 · 5 comments

Labels: question (Further information is requested)


acnagle commented Jan 17, 2024

Thank you for the interesting work and for making the code easily accessible. I have some confusion about the relationship between the ratio and iterative_size parameters.

In the case I am interested in, there is a single demonstration that I want to compress using only the token-level compression approach. I've noticed that, in general, the final ratio between the compressed and original lengths can vary quite a bit for large enough ratio values. However, when I make the iterative_size parameter small, e.g. 10, the final compression ratio is much more faithful to the value specified for the ratio parameter.

I'm confused as to why this is the case. From the paper, my understanding was that the \gamma_j threshold for segment s_j (whose length is set by the iterative_size parameter) was based primarily on the ratio parameter, meaning that, regardless of the iterative_size, LLMLingua would always prune ratio percent of the tokens in that segment.

Any clarification of this would be useful, including where in the code \gamma_j is computed.

@iofu728 iofu728 self-assigned this Jan 18, 2024
@iofu728 iofu728 added the question Further information is requested label Jan 18, 2024

iofu728 commented Jan 18, 2024

Hi @acnagle, thank you for your support of LLMLingua.

This is a great question, and I believe other users may have similar queries.

The actual compression ratio indeed has a certain relationship with the iterative_size. Specifically, \gamma_j is calculated from the global perplexity (PPL) distribution by taking the quantile that corresponds to the compression ratio, as detailed in https://github.com/microsoft/LLMLingua/blob/main/llmlingua/prompt_compressor.py#L621.

However, since it's challenging to directly obtain the actual global PPL distribution after compression, it's also difficult to get a true and accurate quantile. Thus, we update the PPL distribution per segment. A smaller iterative_size results in more sampling points for this estimation, leading to more accurate compression. This is particularly evident when the original prompt size is smaller.
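To make the quantile step concrete, here is a minimal sketch of the idea (hypothetical illustration code, not the actual get_estimate_threshold_base_distribution implementation; it assumes ratio denotes the fraction of tokens to keep):

```python
import torch

def estimate_threshold_sketch(token_loss: torch.Tensor, ratio: float) -> float:
    # The threshold gamma_j is taken as a quantile of the token-loss (PPL)
    # distribution: roughly a `ratio` fraction of tokens lie above it and
    # would be kept, the rest pruned.
    return torch.quantile(token_loss.float(), 1.0 - ratio).item()

# In the iterative scheme the distribution estimate is refreshed segment by
# segment, so a smaller iterative_size gives more sampling points / updates.
segment_loss = torch.tensor([2.1, 0.3, 4.7, 1.2, 0.8, 3.5])
gamma_j = estimate_threshold_sketch(segment_loss, ratio=0.5)
keep_mask = segment_loss > gamma_j
```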

I believe that by improving get_estimate_threshold_base_distribution, the estimation of the PPL distribution could be made more accurate.

I hope this answers your question, and thank you again for your support.

@acnagle acnagle closed this as completed Jan 25, 2024

acnagle commented Feb 23, 2024

Hello again. I was reviewing the code and realized that I might not have understood your explanation as well as I thought. Just to make sure I understand: by using a smaller iterative_size, we are able to get a better approximation of p(\tilde{s}_j) according to eq. 5 in the paper, and if we have a better approximation, then our chosen \gamma_j quantile will be more faithful to the compression ratio that we passed in. Is this correct?

You also mentioned in your comment that there might be a way to improve get_estimate_threshold_base_distribution. Do you have any suggestions in this direction?

@acnagle acnagle reopened this Feb 23, 2024

iofu728 commented Feb 26, 2024

Hi @acnagle,

Yes, the purpose of iterative compression is to minimize the approximation loss in eq. 5. This approximation can be improved in two ways:

  1. By explicitly learning the compressed results through training; we will release work based on this perspective next month.
  2. By fitting a curve to compensate for the gap (see the sketch below).
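As a rough illustration of the curve-fitting idea (this is not part of LLMLingua; the calibration numbers and names below are made up), one could calibrate the requested ratio against the ratio actually achieved and invert that mapping:

```python
import numpy as np

# Hypothetical calibration data: ratio requested vs. ratio actually achieved.
requested = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
achieved = np.array([0.12, 0.20, 0.29, 0.38, 0.50, 0.63])

# Fit a low-degree polynomial mapping achieved -> requested, so that for a
# desired final ratio we can look up the value to actually pass in.
compensate = np.poly1d(np.polyfit(achieved, requested, deg=2))

target = 0.5                          # ratio we want after compression
adjusted = float(compensate(target))  # ratio to pass to the compressor instead
```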


acnagle commented Feb 26, 2024

Thank you for your response. I'm looking forward to the follow-up work!


cornzz commented Jan 16, 2025

Edit: I opened PR #208. I am not 100% sure it is entirely correct, but it does fix / reduce the overcompression. See the diff for an 8k prompt compressed at rate=0.5 before and after the fix; note how, in the former, compression is way too aggressive for much of the prompt.

@acnagle @iofu728 Maybe I have a major misunderstanding, but I am fairly certain that there is a bug in the threshold estimation. Note that I am not talking about the perplexity calculation; I understand that the previous segments must be included there. From my understanding, the PPL threshold should be calculated based on the PPL values of the current segment, but the loss tensor includes not only the PPL of the current segment but also that of all previous segments, which have already been compressed (as well as the following segments, up to the context length of the model):

```python
if condition_compare:
    self_loss = self_past_loss
    threshold = self.get_estimate_threshold_base_distribution(
        self_loss[: loss[start:].shape[0]] - loss[start:], ratio, False
    )
else:
    threshold = self.get_estimate_threshold_base_distribution(
        loss, ratio, False
    )
```

Therefore the target compression ratio is way overshot.
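A toy sketch of the change being suggested (the function and index names are hypothetical, and this is not necessarily what PR #208 does): restrict the loss values that feed the quantile to the current segment before estimating the threshold.

```python
import torch

def segment_threshold(loss: torch.Tensor, seg_start: int, seg_end: int, ratio: float) -> float:
    # Use only the current segment's token losses for the quantile, rather than
    # the full loss tensor that also covers previous / following segments.
    segment_loss = loss[seg_start:seg_end].float()
    return torch.quantile(segment_loss, 1.0 - ratio).item()
```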

This also explains the behaviour I have described in #195. It gets worse with longer prompts, naturally. I have also checked diffs of uncompressed and compressed prompts and noticed that compression gets more and more aggressive towards the end of the prompt, only leaving a few words from every other sentence: https://www.diffchecker.com/vSaXla9g/
