
--pipeline full optimizations for Laptop performance #301

Closed
cdoern opened this issue Oct 9, 2024 · 0 comments
cdoern commented Oct 9, 2024

Our full pipeline depends on Mixtral-8x7B-Instruct-v0.1. Even in GGUF form, this model is massive (Q4_K_M is 24 GB). This causes a bottleneck in the LLMBlock portion of the process, specifically:

    # a single batched completion request against the serving backend
    response = self.ctx.client.completions.create(
        prompt=prompts, **self.gen_kwargs
    )

When running on a laptop, or even a server, the gen_spellcheck and gen_knowledge blocks take forever: we are asking already limited hardware to process large batches of prompts while serving a model too big for the system to hold without swapping.

Switching to https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/tree/main, which is derived from https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2, solves this problem and cuts the time for each block significantly. A run that took me 28 hours on a 10-core, 64 GB machine now takes roughly 2 hours. These are significant gains.

Another positive aspect is that Mixtral and Mistral Instruct use the same prompt template, meaning we only need to add mistral as an entry in our profile mappings so that when users provide a mistral-... model, we reuse the existing Mixtral prompt template.
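A minimal sketch of what that mapping change could look like (the `MODEL_FAMILY_MAPPINGS` dict and `resolve_model_family` helper are illustrative names, not the actual code):

```python
# Hypothetical sketch: route a model name to an existing prompt-template family
# by filename prefix. Names here are illustrative, not real instructlab code.

MODEL_FAMILY_MAPPINGS = {
    "mixtral": "mixtral",  # existing entry: Mixtral prompt template
    "mistral": "mixtral",  # new entry: Mistral reuses the Mixtral template
}

def resolve_model_family(model_name: str) -> str:
    """Return the prompt-template family for a model based on its name prefix."""
    name = model_name.lower()
    for prefix, family in MODEL_FAMILY_MAPPINGS.items():
        if name.startswith(prefix):
            return family
    raise ValueError(f"no prompt template known for model {model_name!r}")
```

With this in place, a user-supplied `mistral-7b-instruct-v0.2.Q4_K_M.gguf` resolves to the same template family as Mixtral, with no new template needed.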

This issue also encompasses some logging enhancements for people who want to see the progress of their generation task (loading bars, printing the prompts, etc.).
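As a sketch of the kind of progress reporting this could add (the `track` helper is hypothetical and uses only the standard library rather than a dependency like tqdm):

```python
# Hypothetical sketch: wrap an iterable of prompts with a simple text progress
# bar on stderr, so long-running blocks like gen_knowledge show their progress.
import sys
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def track(items: Iterable[T], label: str, total: int) -> Iterator[T]:
    """Yield items unchanged while rendering a 20-char progress bar to stderr."""
    start = time.monotonic()
    for i, item in enumerate(items, start=1):
        filled = (i * 20) // total
        bar = "#" * filled + "-" * (20 - filled)
        elapsed = time.monotonic() - start
        sys.stderr.write(f"\r{label} [{bar}] {i}/{total} ({elapsed:.0f}s)")
        sys.stderr.flush()
        yield item
    sys.stderr.write("\n")
```

A block could then iterate with `for prompt in track(prompts, "gen_knowledge", len(prompts)):` and users would see per-prompt progress instead of a silent multi-hour run.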

@cdoern cdoern self-assigned this Oct 9, 2024
@ktam3 ktam3 closed this as completed Oct 22, 2024