-
Hi, thanks for making this awesome framework available! I followed a tutorial and was able to fine-tune the Gemma 2 2B model using a dataset from HF. The process seems to have worked, since I can test inference with and without the generated adapter and get the expected results. Now I could fuse the model and adapter and convert the result to GGUF.
Can you please give some input on how the adapter can be used in llama.cpp without fusing it into the base model? Thanks a lot!
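For reference, testing the adapter in MLX before any conversion looks roughly like this; the model id and adapter path are illustrative placeholders, not necessarily the exact setup used here:

```python
# Rough sketch of testing a LoRA adapter in MLX before any conversion.
# The model id and adapter path are illustrative placeholders.
from mlx_lm import load, generate

# Load the base model together with the trained adapter;
# drop adapter_path to compare against the plain base model.
model, tokenizer = load("google/gemma-2-2b-it", adapter_path="adapters")

print(generate(model, tokenizer, prompt="Tell me about MLX.", max_tokens=100))
```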
-
I don't know what format llama.cpp expects for adapters, but whatever it is, it will likely require a conversion to make it work. I think it should be quite doable, though. You could file an issue in their repo, here, or both, to see if someone can write a script that converts MLX safetensors adapters for use with llama.cpp.
-
I'm not 100% sure, but I think llama.cpp expects the "Hugging Face PEFT adapter format", which is somewhat documented here. My assumption is based on reading `convert_lora_to_gguf.py`.
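For anyone unfamiliar with that layout: a PEFT adapter is a folder containing an `adapter_config.json` plus an `adapter_model.safetensors` whose keys follow the `base_model.model.<module path>.lora_A.weight` / `lora_B.weight` naming. A quick way to inspect one (the path below is illustrative):

```python
# Quick sketch for inspecting a PEFT-format adapter; the path is a placeholder.
from safetensors import safe_open

with safe_open("some_peft_adapter/adapter_model.safetensors", framework="pt") as f:
    for name in f.keys():
        # Keys look like:
        #   base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight
        print(name, f.get_slice(name).get_shape())
```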
-
Hi @awni

```python
import json
from safetensors.torch import load_file, save_file
from pathlib import Path

loaded_state_dict = load_file("adapters/adapters.safetensors")


def rename_key(old_key):
    # Prepend the PEFT prefix
    new_key = f"base_model.model.{old_key}"
    # lora_a -> lora_A.weight
    new_key = new_key.replace('lora_a', 'lora_A.weight')
    # lora_b -> lora_B.weight
    new_key = new_key.replace('lora_b', 'lora_B.weight')
    return new_key


def convert_value(old_value):
    # MLX stores lora_a as (in_features, r) and lora_b as (r, out_features);
    # PEFT expects the transposed layout, hence the axis swap.
    return old_value.transpose(0, 1).contiguous()


new_state_dict = {
    rename_key(k): convert_value(v) for k, v in loaded_state_dict.items()
}

# Print shapes for verification
for key, value in new_state_dict.items():
    print(f"{key}: {value.shape} ({value.dtype})")

path = Path("converted")
path.mkdir(exist_ok=True)
save_file(new_state_dict, "converted/adapter_model.safetensors")


def get_target_modules_from_state_dict(state_dict):
    modules = set()
    for key in state_dict.keys():
        # Example: "model.layers.1.self_attn.q_proj.lora_a"
        # We want the "q_proj" part
        parts = key.split('.')
        for part in parts:
            if part.endswith('_proj'):
                modules.add(part)
    return sorted(list(modules))


target_modules = get_target_modules_from_state_dict(loaded_state_dict)


def mlx_to_peft_config(mlx_config, target_modules, output_path):
    # Base PEFT configuration
    peft_config = {
        "alpha_pattern": {},
        "auto_mapping": None,
        "base_model_name_or_path": mlx_config["model"],
        "bias": "none",
        "fan_in_fan_out": False,
        "inference_mode": True,
        "init_lora_weights": True,
        "layers_pattern": None,
        "layers_to_transform": None,
        "loftq_config": {},
        "lora_alpha": mlx_config["lora_parameters"]["alpha"],
        "lora_dropout": mlx_config["lora_parameters"]["dropout"],
        "megatron_config": None,
        "megatron_core": "megatron.core",
        "modules_to_save": None,
        "peft_type": "LORA",
        "r": mlx_config["lora_parameters"]["rank"],
        "rank_pattern": {},
        "revision": None,
        "target_modules": target_modules,
        "task_type": "CAUSAL_LM",
        "use_rslora": False
    }

    output_dir = Path(output_path).parent
    output_dir.mkdir(parents=True, exist_ok=True)

    with open(output_path, 'w') as f:
        json.dump(peft_config, f, indent=2)

    return peft_config


mlx_config_path = "adapters/adapter_config.json"
with open(mlx_config_path) as f:
    mlx_config = json.load(f)

mlx_to_peft_config(mlx_config, target_modules, "converted/adapter_config.json")
```

It assumes your MLX-generated fine-tuning adapter is stored in `adapters/` and writes the converted adapter to `converted/`. The script performs these main operations:

- renames the keys to the PEFT naming scheme (prepends `base_model.model.` and maps `lora_a`/`lora_b` to `lora_A.weight`/`lora_B.weight`),
- transposes the LoRA matrices to match PEFT's weight layout,
- derives `target_modules` from the module names found in the state dict,
- writes `adapter_model.safetensors` and a matching `adapter_config.json` (built from the MLX `adapter_config.json`) to `converted/`.
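As a sanity check, the converted folder should load with the regular Hugging Face PEFT tooling. A minimal sketch, assuming `transformers` and `peft` are installed and using an illustrative model id:

```python
# Minimal sanity check: load the converted adapter with Hugging Face PEFT.
# The model id is a placeholder; "converted" is the output of the script above.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

# Attach the converted adapter to the base model
model = PeftModel.from_pretrained(base, "converted")

inputs = tokenizer("Tell me about MLX.", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```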
The resulting adapter can be converted to GGUF using the `convert_lora_to_gguf.py` script from llama.cpp:

```shell
python convert_lora_to_gguf.py --base <path to your base model> --outtype q8_0 <path to the converted adapter>
```

where `<path to your base model>` points to a model created with:

```shell
mlx_lm.fuse --model <HF-vendor/model> --save-path <path to your base model> --de-quantize
```

How do we proceed from here?
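In the meantime, for completeness: once the GGUF adapter exists it can already be applied at run time without fusing, e.g. via llama.cpp's `--lora` option or, as a rough, untested sketch, via the `llama-cpp-python` bindings. File names below are placeholders, and a reasonably recent llama.cpp build is assumed:

```python
# Rough sketch: apply a GGUF LoRA adapter at run time without fusing,
# using the llama-cpp-python bindings. File names are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-2-2b-it.gguf",       # base model converted to GGUF
    lora_path="converted-adapter-q8_0.gguf",  # GGUF adapter from convert_lora_to_gguf.py
    n_ctx=4096,
)

out = llm("Tell me about MLX.", max_tokens=100)
print(out["choices"][0]["text"])
```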
-
Hi @awni
After understanding more about how LoRA works and using tools like gguf-tools, I arrived at the following conversion script: