Hi, I have a question about how the KV cache budget is set. The code uses a fixed HH_SIZE and RECENT_SIZE, but the experiments in the paper are run under various total KV cache budgets (%). Given a KV cache budget such as 20%, how is the cache split between the heavy-hitter tokens and the recent tokens?
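To make the question concrete, here is a minimal sketch of the conversion I am asking about. The function `split_kv_budget` and its `hh_fraction` parameter are hypothetical names of mine, not taken from the repository, and I am assuming the budget is a fraction of the sequence length. The even split used as the default is only a guess; whether the paper uses it is what I would like to confirm.

```python
def split_kv_budget(seq_len: int, budget: float, hh_fraction: float = 0.5) -> tuple[int, int]:
    """Turn a total KV cache budget into (hh_size, recent_size).

    Assumptions (not confirmed by the paper or the code):
    - `budget` is a fraction of the sequence length, e.g. 0.2 for 20%.
    - `hh_fraction` is the share of that budget given to heavy-hitter
      tokens; the remainder goes to the most recent tokens.
    """
    if not 0.0 < budget <= 1.0:
        raise ValueError("budget must be in (0, 1]")
    if not 0.0 <= hh_fraction <= 1.0:
        raise ValueError("hh_fraction must be in [0, 1]")

    # Total number of KV entries that may be kept.
    total = round(seq_len * budget)
    # Entries reserved for heavy hitters; the rest are recent tokens.
    hh_size = round(total * hh_fraction)
    recent_size = total - hh_size
    return hh_size, recent_size


# Example: a 20% budget on a 1000-token sequence with an even split.
hh_size, recent_size = split_kv_budget(1000, 0.2)
```

With these assumptions, a 20% budget on 1000 tokens would keep 100 heavy-hitter entries and 100 recent entries. If the paper uses a different ratio, only `hh_fraction` would need to change.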