
Can't create a session (local model) #979

Open · 1 of 5 tasks
djannot opened this issue Oct 16, 2024 · 11 comments

Labels: bug (Something isn't working)

Comments


djannot commented Oct 16, 2024

System Info

transformers.js 2.17.2

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

I have my model available under the /models/tokenizer and /models/onnx/onnx paths.

I'm loading the tokenizer with:

        const tokenizerPath = '/tokenizer/';
        tokenizer = await AutoTokenizer.from_pretrained(tokenizerPath);

And the model with:

        const modelPath = '/onnx/';
        model = await AutoModel.from_pretrained(modelPath, {
            model_file_name: "model",
            quantized: false
        });

I have my model under these two paths because the library always appends /onnx a second time. I also had to set quantized to false because otherwise it appended a suffix to the file name.

I found it very difficult to work out how to get it to load the model from the right path.
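
From what I can tell, transformers.js resolves a local model as env.localModelPath + model id and then appends onnx/ (and a _quantized suffix) on its own, which would explain the doubled path. A sketch of the layout I believe it expects ('my-model' is just a placeholder folder name):

import { env, AutoModel, AutoTokenizer } from '@xenova/transformers';

// Assumed layout: public/models/my-model/{config.json, tokenizer.json, onnx/model.onnx}
env.localModelPath = '/models/'; // this is already the default
env.allowLocalModels = true;
env.allowRemoteModels = false;   // don't fall back to the Hugging Face Hub

// The library appends 'onnx/' and the '_quantized' suffix itself,
// so only the folder name is passed here.
const tokenizer = await AutoTokenizer.from_pretrained('my-model');
const model = await AutoModel.from_pretrained('my-model', { quantized: false });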

Anyway, it's now fetching the tokenizer and model files correctly, but then I get this error in the browser console (same in Chrome and Firefox).

Error: Can't create a session

Reproduction

I can't provide full reproduction steps because I'm using a local model.

But the code is:

import { AutoModel, AutoTokenizer, env } from '@xenova/transformers';

let model;
let tokenizer;

// Function to load the tokenizer
async function loadTokenizer() {
    try {
        // Path to your tokenizer files
        const tokenizerPath = '/tokenizer/';

        // Initialize tokenizer
        tokenizer = await AutoTokenizer.from_pretrained(tokenizerPath);

        console.log('Tokenizer loaded successfully.');
    } catch (error) {
        console.error('Failed to load the tokenizer:', error);
    }
}

// Function to load the ONNX model
async function loadModel() {
    try {
        // Path to your ONNX model
        const modelPath = '/onnx/';

        // Initialize model
        //env.remoteHost = 'https://hf-mirror.com';
        model = await AutoModel.from_pretrained(modelPath, {
            model_file_name: "model",
            quantized: false
        });

        console.log('ONNX Model loaded successfully.');
    } catch (error) {
        console.error('Failed to load the model:', error);
    }
}

djannot commented Oct 17, 2024

I've created a small snippet to check that my ONNX model is fine:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("./public/models/onnx/onnx")
model = ORTModelForCausalLM.from_pretrained("./public/models/onnx/onnx", use_cache=False, use_io_binding=False)

inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="pt")

gen_tokens = model.generate(**inputs, do_sample=True, temperature=0.9, min_length=20, max_length=20)
response = tokenizer.batch_decode(gen_tokens)
print("Generated text:", response)

So it works with optimum.onnxruntime, and I'd like to understand how to make it work with transformers.js.
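
The browser-side equivalent I'm aiming for would be roughly the sketch below (not working yet; 'my-model' is a placeholder folder under public/models/, and I believe transformers.js looks for a decoder_model_merged.onnx rather than model.onnx for causal LMs exported with optimum):

import { env, pipeline } from '@xenova/transformers';

env.allowLocalModels = true;
env.allowRemoteModels = false;

// Assumed layout: public/models/my-model/ containing the tokenizer files and
// the ONNX export produced by optimum (e.g. onnx/decoder_model_merged.onnx).
const generator = await pipeline('text-generation', 'my-model', { quantized: false });
const output = await generator('My name is Philipp and I live in Germany.', {
    max_new_tokens: 20,
    do_sample: true,
    temperature: 0.9,
});
console.log('Generated text:', output);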

@BritishWerewolf (Contributor)

Hey, I've done some work with custom models.
My workflow looks something like this.

The most important thing to note when using pretrained models is to ensure the structure is correct.
This is just public_root/models/[name_of_model]/onnx/model.onnx.
My folder structure for the model (which is U2Net) looks like this.

public/models/u2netp/onnx/model.onnx
public/models/u2netp/config.json
public/models/u2netp/preprocessor_config.json

It's also possible to have a model_quantized.onnx, which AutoModel will load by default.

Then to reference that model I have the following.

import { env, AutoModel, AutoProcessor, RawImage } from '@xenova/transformers';

// Force transformers to only look locally and not make any fetch requests.
env.allowLocalModels = true;
env.allowRemoteModels = false;

async function main() {
    // Create the processor.
    // The name here should match the name of the folder in `models`.
    const processor = await AutoProcessor.from_pretrained('u2netp')
    .catch(error => new Error(error));
    if (processor instanceof Error) {
        console.log(processor.message);
        return;
    }
    
    // U2Net is an image based model, so you might skip this step.
    const url = 'https://example.com/test.png';
    const image = await RawImage.fromURL(url)
    .catch(error => new Error(error));
    if (image instanceof Error) {
        console.error(image.message);
        return;
    }

    // Preprocess the image.
    const processed = await processor(image);

    // Create the model, again the name should match the name of the folder.
    // I am passing quantized: false because I do not have a
    // `model_quantized.onnx` within the folder.
    const model = await AutoModel.from_pretrained('u2netp', {
        quantized: false,
    });

    // Get the outputs of the model.
    const outputs = await model({ 'input': processed });
}
main();


djannot commented Oct 18, 2024

Thanks @BritishWerewolf

I have the right structure now, but I still get this error. I think it's caused by the way the model is converted.

I also get this warning before the error:

onnxruntime::model_load_utils::ValidateOpsetForDomain(const std::unordered_map<std::string, int> &, const logging::Logger &, bool, const std::string &, int) ONNX Runtime only *guarantees* support for models stamped with official released onnx opset versions. Opset 5 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain ai.onnx.ml is till opset 3.

I've also tried changing the opset value when exporting my model to ONNX (from gguf) with optimum-cli, but I couldn't find a way to make it work.

So maybe the problem comes from the way I generate my original gguf model. I'm using unsloth to train the model and export it to gguf.

@BritishWerewolf (Contributor)

It looks like you nearly have everything there, but your ONNX model is using an under-development opset.

To be clear, are you saying you did something like this:

optimum-cli export onnx --model path_to_gguf_model --output path_to_output_directory --opset 3

I think setting the opset to 3 is important.

Other than that, I have not worked with GGUF, so I'm not entirely sure how to help.
It seems that if you can export or convert to ONNX then the rest of the code will work.


djannot commented Oct 18, 2024

I used --opset 18.

I came to the conclusion that I should use this value after reading https://onnxruntime.ai/docs/reference/compatibility.html#onnx-opset-support

I've also tried with 3 because of the warning, but in that case I got:

Opset 3 is lower than the recommended minmum opset (14) to export llama. The ONNX export may fail or the exported model may be suboptimal.
...
ValueError: Unsupported ONNX opset version: 3


djannot commented Oct 18, 2024

I switched to @huggingface/transformers and now I get:

app.js:46 Failed to load the model: 20419424

I'm also getting the same error if I try to use the onnxruntime-web package directly.
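
For context, the direct onnxruntime-web check looks roughly like this sketch (the path is illustrative):

import * as ort from 'onnxruntime-web';

// If InferenceSession.create already fails here, the problem is in the ONNX
// file itself rather than in transformers.js.
try {
    const session = await ort.InferenceSession.create('/models/my-model/onnx/model.onnx');
    console.log('Session created, inputs:', session.inputNames);
} catch (e) {
    console.error('Failed to create the session:', e);
}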

@BritishWerewolf (Contributor)

What are the specifics of the model?

Looking at ONNX Runtime, I found the following issue:
microsoft/onnxruntime-genai#761 (comment)

Currently, ONNX Runtime GenAI's model builder only supports converting float16/float32 GGUF models and not already-quantized GGUF models. If you have the original float16/float32 weights in a GGUF file, you can try using that to get the ONNX model.

Does any of that help?
Are you using float16?


djannot commented Oct 18, 2024

Thanks @BritishWerewolf

I'm going to try to disable quantization.


djannot commented Oct 18, 2024

Same error without quantization in the original gguf model.

@BritishWerewolf (Contributor)

Can you help me to understand GGUF?
How do you create models? I am wondering if I can replicate something on my machine.


djannot commented Oct 19, 2024

I'm using unsloth to train my model and export it to gguf.

Here is the code I use:

from unsloth import FastLanguageModel
import torch
import json
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = False # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

from datasets import Dataset, DatasetDict

# Load the JSON file manually
with open("prompts.json", "r") as file:
    data = json.load(file)

# Function to flatten and format the conversations
def format_conversations(data):
    formatted_convos = []
    for convo in data:
        formatted_convo = [{"from": message["from"], "value": message["value"]} for message in convo]
        formatted_convos.append({"conversations": formatted_convo})
    return formatted_convos

# Apply formatting
formatted_data = format_conversations(data)

# Convert to Dataset
dataset = Dataset.from_list(formatted_data)

from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3",
    mapping = {"role" : "from", "content" : "value", "user" : "user", "assistant" : "assistant"},
    map_eos_token = True,
)

def formatting_prompts_func(convos):
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos["conversations"]]
    return { "text" : texts, }
pass

dataset = dataset.map(formatting_prompts_func, batched = True,)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 1000,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

trainer_stats = trainer.train()
model.save_pretrained("model")
tokenizer.save_pretrained("model")
model.save_pretrained_gguf("model_gguf", tokenizer, quantization_method="not_quantized")
