-
Hi, thanks for making this awesome framework available! I followed a tutorial and was able to fine-tune the Gemma 2 2B model using a dataset from HF. The process seems to have worked, since I can test inference with and without the generated adapter and get the expected results. Now I could fuse the model and adapter and convert the result to GGUF.
Can you please give some input on how the adapter can be used in llama.cpp without fusing it into the base model? Thanks a lot!
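For reference, testing the adapter in MLX before any conversion looks roughly like this; the model id and adapter path are illustrative placeholders, not necessarily the exact setup used here:

```python
# Rough sketch of testing a LoRA adapter in MLX before any conversion.
# The model id and adapter path are illustrative placeholders.
from mlx_lm import load, generate

# Load the base model together with the trained adapter;
# drop adapter_path to compare against the plain base model.
model, tokenizer = load("google/gemma-2-2b-it", adapter_path="adapters")

print(generate(model, tokenizer, prompt="Tell me about MLX.", max_tokens=100))
```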
-
I don't know what format llama.cpp expects for adapters, but whatever it is, it will likely require a conversion to make it work. I think it should be quite doable, though. You could file an issue in their repo, here, or both, to see if someone can write a script that converts MLX safetensors adapters for use with llama.cpp.
-
I'm not 100% sure, but I think llama.cpp expects the "Hugging Face PEFT adapter format", which is somewhat documented here. My assumption is based on reading `convert_lora_to_gguf.py`.
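For anyone unfamiliar with that layout: a PEFT adapter is a folder containing an `adapter_config.json` plus an `adapter_model.safetensors` whose keys follow the `base_model.model.<module path>.lora_A.weight` / `lora_B.weight` naming. A quick way to inspect one (the path below is illustrative):

```python
# Quick sketch for inspecting a PEFT-format adapter; the path is a placeholder.
from safetensors import safe_open

with safe_open("some_peft_adapter/adapter_model.safetensors", framework="pt") as f:
    for name in f.keys():
        # Keys look like:
        #   base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight
        print(name, f.get_slice(name).get_shape())
```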
-
Hi @awni

```python
import json
from safetensors.torch import load_file, save_file
from pathlib import Path

loaded_state_dict = load_file("adapters/adapters.safetensors")


def rename_key(old_key):
    # Prepend the PEFT prefix
    new_key = f"base_model.model.{old_key}"
    # lora_a -> lora_A.weight
    new_key = new_key.replace('lora_a', 'lora_A.weight')
    # lora_b -> lora_B.weight
    new_key = new_key.replace('lora_b', 'lora_B.weight')
    return new_key


def convert_value(old_value):
    # MLX stores lora_a as (in_features, r) and lora_b as (r, out_features);
    # PEFT expects the transposed layout, hence the axis swap.
    return old_value.transpose(0, 1).contiguous()


new_state_dict = {
    rename_key(k): convert_value(v) for k, v in loaded_state_dict.items()
}

# Print shapes for verification
for key, value in new_state_dict.items():
    print(f"{key}: {value.shape} ({value.dtype})")

path = Path("converted")
path.mkdir(exist_ok=True)
save_file(new_state_dict, "converted/adapter_model.safetensors")


def get_target_modules_from_state_dict(state_dict):
    modules = set()
    for key in state_dict.keys():
        # Example: "model.layers.1.self_attn.q_proj.lora_a"
        # We want the "q_proj" part
        parts = key.split('.')
        for part in parts:
            if part.endswith('_proj'):
                modules.add(part)
    return sorted(list(modules))


target_modules = get_target_modules_from_state_dict(loaded_state_dict)


def mlx_to_peft_config(mlx_config, target_modules, output_path):
    # Base PEFT configuration
    peft_config = {
        "alpha_pattern": {},
        "auto_mapping": None,
        "base_model_name_or_path": mlx_config["model"],
        "bias": "none",
        "fan_in_fan_out": False,
        "inference_mode": True,
        "init_lora_weights": True,
        "layers_pattern": None,
        "layers_to_transform": None,
        "loftq_config": {},
        "lora_alpha": mlx_config["lora_parameters"]["alpha"],
        "lora_dropout": mlx_config["lora_parameters"]["dropout"],
        "megatron_config": None,
        "megatron_core": "megatron.core",
        "modules_to_save": None,
        "peft_type": "LORA",
        "r": mlx_config["lora_parameters"]["rank"],
        "rank_pattern": {},
        "revision": None,
        "target_modules": target_modules,
        "task_type": "CAUSAL_LM",
        "use_rslora": False
    }

    output_dir = Path(output_path).parent
    output_dir.mkdir(parents=True, exist_ok=True)

    with open(output_path, 'w') as f:
        json.dump(peft_config, f, indent=2)

    return peft_config


mlx_config_path = "adapters/adapter_config.json"
with open(mlx_config_path) as f:
    mlx_config = json.load(f)

mlx_to_peft_config(mlx_config, target_modules, "converted/adapter_config.json")
```

It assumes your MLX-generated fine-tuning adapter is stored in `adapters/` and writes the converted adapter to `converted/`. The script performs these main operations:

- renames the keys to the PEFT naming scheme (prepends `base_model.model.` and maps `lora_a`/`lora_b` to `lora_A.weight`/`lora_B.weight`),
- transposes the LoRA matrices to match PEFT's weight layout,
- derives `target_modules` from the module names found in the state dict,
- writes `adapter_model.safetensors` and a matching `adapter_config.json` (built from the MLX `adapter_config.json`) to `converted/`.
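As a sanity check, the converted folder should load with the regular Hugging Face PEFT tooling. A minimal sketch, assuming `transformers` and `peft` are installed and using an illustrative model id:

```python
# Minimal sanity check: load the converted adapter with Hugging Face PEFT.
# The model id is a placeholder; "converted" is the output of the script above.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

# Attach the converted adapter to the base model
model = PeftModel.from_pretrained(base, "converted")

inputs = tokenizer("Tell me about MLX.", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```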
The resulting adapter can be converted to GGUF using the `convert_lora_to_gguf.py` script from llama.cpp:

```shell
python convert_lora_to_gguf.py --base <path to your base model> --outtype q8_0 <path to the converted adapter>
```

where `<path to your base model>` points to a model created with:

```shell
mlx_lm.fuse --model <HF-vendor/model> --save-path <path to your base model> --de-quantize
```

How do we proceed from here?
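In the meantime, for completeness: once the GGUF adapter exists it can already be applied at run time without fusing, e.g. via llama.cpp's `--lora` option or, as a rough, untested sketch, via the `llama-cpp-python` bindings. File names below are placeholders, and a reasonably recent llama.cpp build is assumed:

```python
# Rough sketch: apply a GGUF LoRA adapter at run time without fusing,
# using the llama-cpp-python bindings. File names are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-2-2b-it.gguf",       # base model converted to GGUF
    lora_path="converted-adapter-q8_0.gguf",  # GGUF adapter from convert_lora_to_gguf.py
    n_ctx=4096,
)

out = llm("Tell me about MLX.", max_tokens=100)
print(out["choices"][0]["text"])
```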
-
Hi @awni
After understanding more about how LoRA works and using tools like gguf-tools, I arrived at the following conversion script: