Our full pipeline depends on Mixtral-8x7B-Instruct-v0.1. This model, even in GGUF form, is massive (Q4_K_M is 24 GB), and it creates a bottleneck in the `LLMBlock` portion of the process. Specifically, when running on a laptop, or even a server, the `gen_spellcheck` and `gen_knowledge` blocks take forever because we are asking already limited hardware to serve a model too big for the system without using swap.

Another positive aspect is that Mixtral and Mistral Instruct use the same prompt templates, meaning we only need to add `mistral` as an entry in our profile mappings so that when users provide a `mistral-...` model, we use the already existing Mixtral prompt template!

This issue also encompasses some logging enhancements for people who want to see the progress of their generation task (loading bars, printing the prompts, etc.).
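The template-sharing idea above can be sketched as a prefix lookup. This is a hypothetical illustration, not the project's actual code: `MODEL_TEMPLATES`, `resolve_template`, and the template string are stand-in names, and the template shown is just the basic `[INST] ... [/INST]` format both model families use.

```python
# Hypothetical sketch: resolve a chat template by model-filename prefix.
# Mixtral and Mistral Instruct share the [INST] ... [/INST] format, so
# both prefixes can point at the one existing template entry.

MIXTRAL_CHAT_TEMPLATE = "<s>[INST] {prompt} [/INST]"

MODEL_TEMPLATES = {
    "mixtral-": MIXTRAL_CHAT_TEMPLATE,
    "mistral-": MIXTRAL_CHAT_TEMPLATE,  # new entry: reuse the Mixtral template
}

def resolve_template(model_name: str) -> str:
    """Return the chat template for a model, matched by filename prefix."""
    name = model_name.lower()
    for prefix, template in MODEL_TEMPLATES.items():
        if name.startswith(prefix):
            return template
    raise ValueError(f"no chat template registered for {model_name!r}")
```

With this, `mistral-7b-instruct-v0.2.Q4_K_M.gguf` resolves to the same template as `mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf`.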
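A minimal sketch of what that logging could look like, assuming a per-block loop over samples; `run_block`, `generate`, and the prompt string are illustrative stand-ins, not the pipeline's API:

```python
# Hypothetical sketch of the requested logging: a simple progress readout
# over the samples a block processes, plus opt-in prompt echoing at debug
# level. A library like tqdm would render a nicer loading bar.
import logging
import sys

logger = logging.getLogger(__name__)

def run_block(samples, generate, block_name="gen_spellcheck"):
    """Run one LLMBlock-style step with progress output on stderr."""
    total = len(samples)
    outputs = []
    for i, sample in enumerate(samples, start=1):
        prompt = f"[INST] Correct the spelling: {sample} [/INST]"
        logger.debug("prompt for %s: %s", block_name, prompt)
        outputs.append(generate(prompt))
        print(f"\r{block_name}: {i}/{total}", end="", file=sys.stderr)
    print(file=sys.stderr)
    return outputs
```

Running with `logging.DEBUG` enabled would also print each prompt as it is sent, which helps when a block appears to hang on slow hardware.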
Switching to https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/tree/main, which is derived from https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2, solves this problem and cuts down the time for each block significantly. A run that took me 28 hours on a 10-core, 64 GB machine now takes roughly 2 hours. These are significant gains.