vllm serve THUDM/glm-4-9b-chat --api-key token-abc123 --tensor-parallel-size 4 --gpu-memory-utilization 0.95 --max_model_len 131072 --trust-remote-code
In the README, the service is launched with a maximum length of 128k, but in the paper the Long datasets are all longer than 128k. Could you share the exact command used to launch the service and the RoPE configuration? I cannot reproduce the Long metrics reported in the paper.
Hi, Long (>128k) is just a subset of the evaluation data, namely the set of test samples whose length exceeds 128k tokens. For evaluation on all data we used --max_model_len 131072, and sequences longer than 128k tokens were truncated.
--max_model_len 131072
The dataset description says lengths go up to 2M tokens, so I'd like to ask:
I looked at the paper, and the truncation is done from the middle. How does this approach ensure that the answer information in the context is preserved? Also, the expert annotators see the full text while the model only sees the truncated 128k context, so wouldn't this comparison introduce a discrepancy?
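For concreteness, "truncating from the middle" usually means keeping the head and tail of the token sequence and dropping the tokens in between. Below is a minimal sketch of that idea, assuming token-level truncation with the GLM-4 tokenizer and an even head/tail split; the helper name truncate_middle is illustrative and not taken from the repo, and the exact split used in the official evaluation may differ.

```python
from transformers import AutoTokenizer

MAX_MODEL_LEN = 131072  # matches --max_model_len used when serving

# Tokenizer name taken from the serve command above; trust_remote_code is
# required to load the GLM-4 tokenizer.
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True)

def truncate_middle(text: str, max_tokens: int = MAX_MODEL_LEN) -> str:
    """Keep the head and tail of an over-long context and drop the middle."""
    ids = tokenizer.encode(text, add_special_tokens=False)
    if len(ids) <= max_tokens:
        return text
    half = max_tokens // 2
    # Keep the first `half` tokens and the last `max_tokens - half` tokens.
    kept = ids[:half] + ids[-(max_tokens - half):]
    return tokenizer.decode(kept)
```

With this scheme, answer-bearing evidence located near the dropped middle span can indeed be lost, which is the concern raised above.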