Replies: 2 comments 1 reply
-
Hi @slewie, one guess here is that your transformer models are running on CPU instead of GPU when loaded through guidance. For reference, you can pass a device map explicitly:

```python
from guidance import models

gpt = models.Transformers('gpt2', device_map='auto')  # 'auto' or any other device map
```
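Before loading, it can help to confirm a GPU is actually visible to PyTorch. A minimal sanity check (assuming PyTorch is installed; this is not part of the guidance API):

```python
import torch

# If no CUDA device is visible, device_map='auto' falls back to CPU,
# which would explain generation that is slow and roughly
# size-independent across models.
if torch.cuda.is_available():
    print(f"{torch.cuda.device_count()} GPU(s) available")
else:
    print("No GPU visible -- the model will run on CPU")
```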
-
I think there is a more interesting problem: model size has no impact on computation time. I loaded two models:

```python
model_llama34 = models.Transformers("Phind/Phind-CodeLlama-34B-v2", echo=False, device_map='balanced')
model_llama7 = models.Transformers("codellama/CodeLlama-7b-Instruct-hf", echo=False, device_map='balanced')
```

and tried prompts of different lengths:
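One way to make the size comparison concrete is to time each model on the same prompts. The helper below is a generic sketch (the `generate` callable is a placeholder standing in for a call into one of the loaded models, not a guidance API):

```python
import time

def time_generation(generate, prompts):
    """Return wall-clock seconds per prompt for a generation callable.

    `generate` is any callable taking a prompt string; in practice it
    would wrap a call into model_llama7 or model_llama34. If the 34B
    timings are not clearly larger than the 7B timings, something other
    than the forward pass (e.g. CPU execution) dominates.
    """
    timings = []
    for prompt in prompts:
        start = time.perf_counter()
        generate(prompt)
        timings.append(time.perf_counter() - start)
    return timings

# Stub callable for illustration; real use passes the model call.
print(time_generation(len, ["short prompt", "a much longer prompt than the first"]))
```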
-
Hello! I tried to classify some texts using guidance, but it was slow, so I tried the same task with plain transformers, and it turned out that guidance was about 5 times slower.
Code:
Results for that model:
Guidance
![image](https://private-user-images.githubusercontent.com/57148398/306186786-42b23f82-d21d-407e-a253-2210ab861319.png)
transformers
![image](https://private-user-images.githubusercontent.com/57148398/306186903-8d121d4a-ea05-427f-a1c5-ce7bbc0fa1e6.png)
Also, with guidance there is practically no difference between model sizes; they all take about the same time. I tried the large CodeLlama and got the same results.
Why does this problem occur?