
--pipeline full optimizations for Laptop performance #301

Closed
cdoern opened this issue Oct 9, 2024 · 0 comments
cdoern commented Oct 9, 2024

Our full pipeline depends on Mixtral-8x7B-Instruct-v0.1. Even in GGUF form, this model is massive (Q4_K_M is 24 GB). This causes a bottleneck in the LLMBlock portion of the process, specifically:

    # a single batched completion request against the serving backend
    response = self.ctx.client.completions.create(
        prompt=prompts, **self.gen_kwargs
    )

When running on a laptop, or even a server, the gen_spellcheck and gen_knowledge blocks take forever: we are asking already limited hardware to process large batches of prompts while serving a model too big for the system to hold without swapping.

Switching to https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/tree/main, which is derived from https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2, solves this problem and cuts the time for each block significantly. A run that took me 28 hours on a 10-core, 64 GB machine now takes roughly 2 hours. These are significant gains.

Another positive aspect is that Mixtral and Mistral Instruct use the same prompt template, meaning we only need to add mistral as an entry in our profile mappings so that when users provide a mistral-... model, we reuse the existing Mixtral prompt template.
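A minimal sketch of what that mapping change could look like (the `MODEL_FAMILY_MAPPINGS` dict and `resolve_model_family` helper are illustrative names, not the actual code):

```python
# Hypothetical sketch: route a model name to an existing prompt-template family
# by filename prefix. Names here are illustrative, not real instructlab code.

MODEL_FAMILY_MAPPINGS = {
    "mixtral": "mixtral",  # existing entry: Mixtral prompt template
    "mistral": "mixtral",  # new entry: Mistral reuses the Mixtral template
}

def resolve_model_family(model_name: str) -> str:
    """Return the prompt-template family for a model based on its name prefix."""
    name = model_name.lower()
    for prefix, family in MODEL_FAMILY_MAPPINGS.items():
        if name.startswith(prefix):
            return family
    raise ValueError(f"no prompt template known for model {model_name!r}")
```

With this in place, a user-supplied `mistral-7b-instruct-v0.2.Q4_K_M.gguf` resolves to the same template family as Mixtral, with no new template needed.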

This issue also encompasses some logging enhancements for people who want to see the progress of their generation task (loading bars, printing the prompts, etc.).
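As a sketch of the kind of progress reporting this could add (the `track` helper is hypothetical and uses only the standard library rather than a dependency like tqdm):

```python
# Hypothetical sketch: wrap an iterable of prompts with a simple text progress
# bar on stderr, so long-running blocks like gen_knowledge show their progress.
import sys
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def track(items: Iterable[T], label: str, total: int) -> Iterator[T]:
    """Yield items unchanged while rendering a 20-char progress bar to stderr."""
    start = time.monotonic()
    for i, item in enumerate(items, start=1):
        filled = (i * 20) // total
        bar = "#" * filled + "-" * (20 - filled)
        elapsed = time.monotonic() - start
        sys.stderr.write(f"\r{label} [{bar}] {i}/{total} ({elapsed:.0f}s)")
        sys.stderr.flush()
        yield item
    sys.stderr.write("\n")
```

A block could then iterate with `for prompt in track(prompts, "gen_knowledge", len(prompts)):` and users would see per-prompt progress instead of a silent multi-hour run.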

@cdoern cdoern self-assigned this Oct 9, 2024
@ktam3 ktam3 closed this as completed Oct 22, 2024