-
Notifications
You must be signed in to change notification settings - Fork 267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Understanding the interplay between ratio
and iterative_size
#61
Comments
Hi @acnagle, thank you for your support of LLMLingua. This is a great question, and I believe other users may have similar queries. The actual compression ratio indeed has a certain relationship with the However, since it's challenging to directly obtain the actual global PPL distribution after compression, it's also difficult to get a true and accurate quantile. Thus, we update the PPL distribution per segment. A smaller I believe that by improving the I hope this answers your question, and thank you again for your support. |
Hello again, I was reviewing the code and realized that I might not have understood your explanation as well as I thought. Just to make sure I understand: by using a smaller You also mentioned in your comment that there might be a way to improve the |
Hi @acnagle, Yes, the purpose of iterative compression is to minimize the approximation loss in eq. 5. This approximation can be improved in two ways:
|
Thank you for your response. I'm looking forward to the followup work! |
Edit: I opened a PR #208 I am not 100% sure this is entirely correct, but it does fix / reduce the overcompression. Diff for 8k prompt compressed at rate=0.5 before and after the fix. Note how in the former for much of the prompt, compression is way too aggressive. @acnagle @iofu728 Maybe I have a major misunderstanding, but I am fairly certain that there is a bug in the threshold estimation. Note that I am not talking about the perplexity calculation, I understand that there the previous segments must be included. From my understanding, the PPL threshold should be calculated based on the PPL values of the current segment, but the LLMLingua/llmlingua/prompt_compressor.py Lines 1705 to 1713 in 7a440a1
Therefore the target compression ratio is way overshot. This also explains the behaviour I have described in #195. It gets worse with longer prompts, naturally. I have also checked diffs of uncompressed and compressed prompts and noticed that compression gets more and more aggressive towards the end of the prompt, only leaving a few words from every other sentence: https://www.diffchecker.com/vSaXla9g/ |
Thank you for the interesting work, and making the code easily accessible. I have some confusion on the relationship between the
ratio
anditerative_size
parameters.In the case I am interested, there is a single demonstration that I want to compress using only the token-level compression approach. I've noticed that, in general, the final ratio between the compressed and original length can vary quite a bit for large enough' ratio' values. However, I noticed that when I make the
iterative_size
parameter small, e.g. 10, the final compressed ratio is more truthful to the value specified for theratio
parameter.I'm confused as to why this is the case. From the paper, my understanding was that \gamma_j threshold for segment s_j (whose length is defined by the
iterative_size
parameter), was based primarily on theratio
parameter. Meaning that, regardless of theiterative_size
, LLMLingua would always pruneratio
percentage of the tokens in that segment.Any clarifications of this would be useful, including where in the code \gamma_j is computed.
The text was updated successfully, but these errors were encountered: