
Support bfloat16 execution #1121

Closed
guillaumekln opened this issue Mar 9, 2023 · 2 comments · Fixed by #1316
Labels
enhancement New feature or request

Comments

@guillaumekln
Collaborator

Models that are trained with bfloat16 can have numerical issues when run with float16. See #1074 for example.

We should consider supporting bfloat16 execution which is supported on recent Intel CPUs and NVIDIA GPUs.
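The tradeoff can be illustrated with a small sketch (pure-Python bit manipulation, not any CTranslate2 API): bfloat16 keeps float32's 8 exponent bits at the cost of mantissa precision, while float16 has only 5 exponent bits, so large activations from a bfloat16-trained model can overflow to infinity when run in float16.

```python
import math
import struct

def to_float16(x: float) -> float:
    """Round-trip a value through IEEE float16 (5 exponent bits, 10 mantissa bits)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # Values beyond float16's maximum (~65504) overflow to infinity.
        return math.copysign(math.inf, x)

def to_bfloat16(x: float) -> float:
    """Round-trip a value through bfloat16 (8 exponent bits, 7 mantissa bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even, then drop the low 16 bits of the float32 encoding.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

x = 1e5  # a large activation, comfortably inside float32/bfloat16 range
print(to_float16(x))   # overflows to inf in float16
print(to_bfloat16(x))  # coarse but finite in bfloat16
```

This is why a model whose weights and activation statistics were shaped by bfloat16 training can produce NaN/inf garbage under float16 while remaining usable under bfloat16.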

@wsxiaoys
Contributor

wsxiaoys commented Jun 13, 2023

To add another potentially related data point: I converted SantaCoder (a float16-trained model) to CTranslate2 for inference. With beam_size = 1:

  1. int8 / int8_float16 generate reasonable outputs.
  2. float16 generates unusable outputs.

The converted model is located here: https://huggingface.co/TabbyML/SantaCoder-1B/tree/main/ctranslate2

The reported issue: TabbyML/tabby#236

@guillaumekln
Collaborator Author

The linked PR adds support for bfloat16.

Just like the other types, you can select it during conversion with --quantization bfloat16 or when loading the model with compute_type="bfloat16". The quantization mode int8_bfloat16 is also supported.
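As a sketch of the loading path described above (the model path is a placeholder for any converted model directory, and this needs a suitable GPU, so it is illustrative rather than copy-paste runnable):

```python
import ctranslate2

# Load a converted generation model with bfloat16 compute.
# "model-ct2" is a placeholder for your --output_dir from the converter.
generator = ctranslate2.Generator(
    "model-ct2",
    device="cuda",
    compute_type="bfloat16",  # or "int8_bfloat16"
)

# Inputs are token lists produced by the model's own tokenizer.
results = generator.generate_batch([["<placeholder>", "tokens"]], beam_size=1)
```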

It would be helpful if anyone watching this issue could test the implementation and give feedback. To install the development build:

  1. Go to this build page
  2. Download the artifact "python-wheels"
  3. Extract the archive
  4. Install the wheel matching your system and Python version with pip install --force-reinstall <wheel>

Note that a GPU with Compute Capability 8 or greater is required.
