Decode Wrong Token #16
Comments
For Llama 2, you do not need to download the weights yourself. Just launch the api_server with
Is there a difference between the two methods? The Llama model I used was also downloaded from Hugging Face.
You may refer to the downloader code to see if you have missed some details during converting.
@PKUFlyingPig my environment is the same as above. When max_token is small, e.g. 20, it runs well, but when it is 512 it crashes as above. Can you help us fix it, please?
Same problem here.
This error occurs when the prompt is too long.
I ran into this problem as well. It appears that max_idx in SwiftTransformer's findmax function is not initialized, so it can return an invalid token_idx; initializing it to a value inside the vocabulary should fix the crash. In principle, max_idx should always be assigned while computing the maximum vocabulary probability, yet sometimes it apparently is not. That would mean SwiftTransformer's results are wrong in some cases, which also leads to garbled generation, as shown in issue #40. @PKUFlyingPig did the authors encounter this situation in their experiments?
Same problem. It seems the authors only ran validation experiments on OPT in the paper and did not extend them to other models.
model: Llama-2-7b-hf
steps:
1. python3 converter.py --input "Llama-2-7b-hf/*.bin" --output /datasets/distserve/llama-7b --dtype float16 --model llama
2. python3 api_server/distserve_api_server.py --port 6902 --model /datasets/distserve/llama-7b --context-tensor-parallel-size 1 --decoding-tensor-parallel-size 1
3. python3 evaluation/2-benchmark-serving/0-prepare-dataset.py --dataset-path Sharegpt
4. python3 evaluation/2-benchmark-serving/2-benchmark-serving.py --port 6902
The error message:
SwiftTransformer/src/csrc/model/gpt/gpt.cc:278 'cudaMemcpy(ith_context_req_req_index.ptr, ith_context_req_req_index_cpu, sizeof(int64_t) * batch_size, cudaMemcpyHostToDevice)': (700) an illegal memory access was encountered