The 'stop' argument causes 'pyo3_runtime.PanicException' in some cases. #1131

Open
Six6stRINgs opened this issue Feb 21, 2025 · 7 comments

@Six6stRINgs
The bug
Hello, thanks for your great work. Recently I have run into a weird bug and don't know how to fix it.
I want to generate a JSON-formatted string using gen with a regex argument.
Using the stop argument in gen causes a crash in certain cases.

Code

wmeta_key = wordmeta.get_attr_name_list()  # BaseModel from pydantic
input_text = f"task:{task}\nprompt:{prompt}\n```json\n{{\n"
for key in wmeta_key:
    wmeta_value = getattr(wordmeta, key)
    wdecision_value = getattr(wdecision, f"need_{key}")

    if isinstance(wmeta_value, int | float):
        reg = r"^\d+$"
        m_tokens = 10
    elif isinstance(wmeta_value, str):
        reg = None
        m_tokens = 50

    gen_res = gen_decision(
        lm + f"{task} {key}: ",
        need_llm=wdecision_value,
        ori_text=wmeta_value,
        name="res",
        regex=reg,
        stop=["'", '"', ".", "\n"],
        max_tokens=m_tokens,
    )

def gen_decision(lm, need_llm: bool, ori_text: str, **gen_args):
    return lm + gen(**gen_args) if need_llm else ori_text

......

When gen_decision runs, the following error is reported. If I remove the stop argument, the crash does not occur, but then the output generated by the LLM is not satisfactory.

assertion failed: self.state.byte_to_token_idx.len() >= n_bytes
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/work/llm.py", line 65, in gen_decision
    return lm + gen(**gen_args) if need_llm else ori_text
  File "/root/anaconda3/envs/guidance/lib/python3.10/site-packages/guidance/models/_model.py", line 1207, in __add__
    out = lm._run_stateless(value)
  File "/root/anaconda3/envs/guidance/lib/python3.10/site-packages/guidance/models/_model.py", line 1413, in _run_stateless
    for chunk in gen_obj:
  File "/root/anaconda3/envs/guidance/lib/python3.10/site-packages/guidance/models/_model.py", line 431, in __call__
    tokens, mask_fut, backtrack = parser.advance(engine_output)
  File "/root/anaconda3/envs/guidance/lib/python3.10/site-packages/guidance/_parser.py", line 78, in advance
    return self._generator.send(engine_output)
  File "/root/anaconda3/envs/guidance/lib/python3.10/site-packages/guidance/_parser.py", line 153, in _parse
    backtrack, ff_tokens = self.ll_interpreter.commit_token(
pyo3_runtime.PanicException: assertion failed: self.state.byte_to_token_idx.len() >= n_bytes
[ERROR] 2025-02-21-07:57:02 (PID:3017059, Device:0, RankID:-1) ERR99999 UNKNOWN application exception

But when I run some simpler code, it works fine:

lm = create_lm()  # lm = models.Transformers(model_path, device_map=device_map)
print(
    lm
    + "Hello "
    + gen(
        name="res",
        regex=None,
        stop=["'", '"', ";", ":", ",", ".", "\n"],
        max_tokens=50,
        temperature=0.5,
    )
)

Response

Hello I am trying to create a simple program that will take a string and print out the number of times each character appears in the string

To Reproduce
Loader: models.Transformers
Model: Qwen2.5-1.5b-instruct, Qwen1.5-7b
Execute the code above

System info (please complete the following information):

  • OS (e.g. Ubuntu, Windows 11, Mac OS, etc.): Ubuntu
  • Guidance Version (guidance.__version__): 0.2.0
  • Device: npu
@Harsha-Nori
Member

Hi @Six6stRINgs, thanks for reporting this... definitely odd! I'll look into reproducing it this week.

@Six6stRINgs
Author

@Harsha-Nori Hello, thanks for your reply. I spent a few days simplifying the code to make this bug easier to reproduce.
Here is the code:

import guidance
from guidance import gen
from pydantic import BaseModel

class bm(BaseModel):
    a: str
    text: str
    ccc: str

    def get_attr_name_list(self) -> list[str]:
        return [attr for attr, _ in self.model_fields.items()]


class bm2(BaseModel):
    ttt: str
    attrr: str
    bo: str

    def get_attr_name_list(self) -> list[str]:
        return [attr for attr, _ in self.model_fields.items()]


if __name__ == "__main__":
    model_path = "/data/weights/qwen2_5-1.5b-instruct/"
    device_map = {"": 2}

    m = bm(
        a="5",
        text="world",
        ccc="hello world",
    )

    m2 = bm2(ttt="123", attrr="2", bo="True")

    model = guidance.models.Transformers(model_path, device_map=device_map)

    bm_list1 = ["a", "text", "ccc"]  # crash at 2nd str
    bm_list2 = ["222", "333", "444"]  # normal
    bm_list3 = ["a", "666", "ccc"]  # normal
    bm_list4 = ["text", "ccc", "a"]  # crash at 1st str

    bm2_list1 = ["ttt", "attrr", "bo"]  # normal
    bm2_list2 = ["t1", "a2", "b3"]  # normal
    bm2_list3 = ["ttt", "attrr", "ta"]  # normal

    bm_keys: list[str] = m.get_attr_name_list()  # crash at 2nd str, same as bm_list1
    bm2_keys: list[str] = m2.get_attr_name_list()  # normal

    # the list/keys from above
    for key in bm_keys:
        print(f"key: {key}")
        res = (
            model
            + f"Hello, {key}: "
            + gen(
                name="res",
                stop=["'", '"', ";", ":", ",", ".", "\n"],
                max_tokens=15,
            )
        )
        print(f"LLM res: {res}")

I ran this code on my Linux server.

The key part is the list used in the for loop.

It seems that certain strings in the list can crash the program: if a string is exactly an attribute name of the BaseModel, the same bug is reported.
In the code above, bm_list1, bm_list4, and bm_keys crash the program when the model begins to generate, while the other lists work fine.

thread '<unnamed>' panicked at parser/src/earley/parser.rs:2043:9:
assertion failed: self.state.byte_to_token_idx.len() >= n_bytes
......

However, the odd part is that bm2_list1 and bm2_keys, which likewise contain the attribute names of their class (bm2), work without error.
If I remove the stop arg, all the lists run fine.
I have no idea what causes this bug T_T.

@Harsha-Nori
Member

Harsha-Nori commented Mar 5, 2025 via email

@hudson-ai
Collaborator

hudson-ai commented Mar 5, 2025

@mmoskal I have a hypothesis -- we crash if stop backtracks during token healing.

Minimal repro:

model += "Hello, text: "
model += gen(stop='"')

If we token heal and allow the first token to be ' "' (which it seems Qwen wants to do in this case), something breaks. Note that we don't get any errors if the above were this instead:

model += "Hello, text: " + gen(stop='"')

@mmoskal
Collaborator

mmoskal commented Mar 5, 2025

Yeah, this crashes:

#[test]
fn test_ll_stop_heal() {
    // https://github.com/guidance-ai/guidance/issues/1131
    check_lark_grammar_prompt(
        r#"
            start: gen
            gen[stop=/"/]: /.*/
        "#,
        "Hello, text: ",
        &["Hello‧,‧ text‧:", " \""],
    );
}

@hudson-ai
Collaborator

@Six6stRINgs thanks for opening the issue and finding this bug!

While we work on a fix, here is a workaround: group the f"Hello, {key}: " prompt with the gen expression.

There is a subtle difference between

model + "Hello" + gen(...)

and

model + ("Hello" + gen(...))

Using the second version here should get around the crash.
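
Applied to the loop from the repro above, the grouped version would look roughly like this (a sketch reusing the model, keys, and stop list from the earlier snippet):

for key in bm_keys:
    # Group the prompt with gen() so both are committed to the model in a
    # single step; this avoids the token-healing backtrack across the
    # prompt boundary that triggers the panic.
    res = model + (
        f"Hello, {key}: "
        + gen(
            name="res",
            stop=["'", '"', ";", ":", ",", ".", "\n"],
            max_tokens=15,
        )
    )
    print(f"LLM res: {res}")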

@mmoskal
Collaborator

mmoskal commented Mar 5, 2025

Keeping this open until llguidance is updated in guidance.

@mmoskal mmoskal reopened this Mar 5, 2025