Support for LLM.int8

@pszemraj pszemraj released this 31 Jan 04:11
9108f66

On GPU, you can now load models with LLM.int8 quantization to reduce memory usage:

```python
from textsum.summarize import Summarizer

# loads the default model with LLM.int8 quantization,
# using roughly 1/4 of the memory of full fp32 weights
summarizer = Summarizer(load_in_8bit=True)
```
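The "1/4 of the memory" figure follows from weight storage alone: LLM.int8 keeps each weight in 1 byte rather than the 4 bytes of an fp32 weight. A minimal arithmetic sketch, using a hypothetical parameter count (not the actual size of the default model):

```python
# Back-of-the-envelope check of the weight-memory savings from LLM.int8:
# 1 byte per int8 weight vs. 4 bytes per fp32 weight.
n_params = 250_000_000          # hypothetical parameter count
fp32_bytes = n_params * 4       # 4 bytes per fp32 weight
int8_bytes = n_params * 1       # 1 byte per int8 weight
print(int8_bytes / fp32_bytes)  # → 0.25
```

Note that this covers only the model weights; activations and other runtime buffers are unaffected, so total memory savings in practice will be somewhat smaller.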

What's Changed

Full Changelog: v0.1.3...v0.1.5