bug: Token usage is not computed correctly for some models #947

Closed
4 tasks done
trebedea opened this issue Jan 21, 2025 · 2 comments · Fixed by #953
Labels: bug (Something isn't working)

@trebedea (Collaborator)

Did you check docs and existing issues?

  • I have read all the NeMo-Guardrails docs
  • I have updated the package to the latest version before submitting this issue
  • (optional) I have used the develop branch
  • I have searched the existing issues of NeMo-Guardrails

Python version (python --version)

Python 3.10.8

Operating system/version

Windows 10

NeMo-Guardrails version (if you must use a specific version and not the latest)

0.11.0

Describe the bug

Token stats are not computed correctly (they are missing from the GenerationResponse log) for some models, e.g. NIMs or NVAIE, but probably for others as well. This makes token usage reporting for these models incorrect and causes evaluations (nemoguardrails eval) to fail completely.

For example, running an evaluation for NVIDIA models (NIMs or NVAIE) fails with an exception for any guardrail config (e.g. the abc bot):

python.exe -m nemoguardrails eval run --guardrail-config-path=examples/bots/TR/botname/gr-config --eval-config-path=examples/bots/TR/botname/config --output-path=examples/bots/TR/botname/output 
Loading eval configuration from \projects\nemoguardrails-1224\NeMo-Guardrails\examples\bots\TR\botname\config.
Starting the evaluation for examples/bots/TR/botname/gr-config.
Writing results to examples/bots/TR/botname/output.
Loading eval configuration \projects\nemoguardrails-1224\NeMo-Guardrails\examples\bots\TR\botname\config ...
Loaded 7 policies and 215 interactions.
Loading guardrail configuration examples/bots/TR/botname/gr-config ...
Fetching 5 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 5102.56it/s]
[1] "Hello!"
Running 215 interactions ... ----------------------------------------   0% -:--:--
Traceback (most recent call last):
  File "\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "\projects\nemoguardrails-1224\NeMo-Guardrails\nemoguardrails\__main__.py", line 20, in <module>
    app()
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\typer\main.py", line 338, in __call__
    raise e
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\typer\main.py", line 321, in __call__
    return get_command(self)(*args, **kwargs)
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\typer\core.py", line 728, in main
    return _main(
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\typer\core.py", line 197, in _main
    rv = self.invoke(ctx)
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\typer\main.py", line 703, in wrapper
    return callback(**use_params)
  File "\projects\nemoguardrails-1224\NeMo-Guardrails\nemoguardrails\eval\cli.py", line 87, in run
    asyncio.run(
  File "\AppData\Local\Programs\Python\Python310\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 649, in run_until_complete
    return future.result()
  File "\projects\nemoguardrails-1224\NeMo-Guardrails\nemoguardrails\eval\eval.py", line 314, in run_eval
    await asyncio.gather(*[_worker() for _ in range(parallel)])
  File "\projects\nemoguardrails-1224\NeMo-Guardrails\nemoguardrails\eval\eval.py", line 301, in _worker
    metrics = _collect_span_metrics(interaction_log.trace)
  File "\projects\nemoguardrails-1224\NeMo-Guardrails\nemoguardrails\eval\utils.py", line 161, in _collect_span_metrics
    metrics[metric] = metrics.get(metric, 0) + span.metrics[metric]
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

Process finished with exit code 1
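
For context, the TypeError comes from the metric-aggregation loop in nemoguardrails/eval/utils.py (_collect_span_metrics), which assumes every span metric is numeric; for these models the token counts arrive as None. Below is a minimal sketch of a defensive variant that skips None values, written against plain dicts for illustration. It is hypothetical and not the actual change from #953.

```python
from typing import Dict, List


def collect_span_metrics(spans: List[dict]) -> Dict[str, float]:
    """Aggregate metrics across spans, skipping metrics reported as None."""
    metrics: Dict[str, float] = {}
    for span in spans:
        for metric, value in span.get("metrics", {}).items():
            if value is None:
                # Some providers report token counts as None; skipping them avoids
                # "unsupported operand type(s) for +: 'int' and 'NoneType'".
                continue
            metrics[metric] = metrics.get(metric, 0) + value
    return metrics


# Example: the second span reports total_tokens=None, as seen with NIM models.
spans = [
    {"metrics": {"total_tokens": 120, "llm_calls": 1}},
    {"metrics": {"total_tokens": None, "llm_calls": 1}},
]
print(collect_span_metrics(spans))  # {'total_tokens': 120, 'llm_calls': 2}
```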

When running a simple test.py, we get empty token stats:

from nemoguardrails import RailsConfig
from nemoguardrails import LLMRails

config = RailsConfig.from_path("./botname/gr-config")
rails = LLMRails(config)

# Request detailed logging so token stats are included in the response log.
response = rails.generate(
    messages=[{"role": "user", "content": "Hi! How are you?"}],
    options={
        "log": {
            "activated_rails": True,
            "llm_calls": True,
            "internal_events": True,
            "colang_history": True,
        }
    },
)
print(response)
response.log.print_summary()
info = rails.explain()
info.print_llm_calls_summary()

The output is below - note the 0 tokens in the General stats summary and total_tokens=None in the GenerationLog:

response=[{'role': 'assistant', 'content': "I'm functioning properly, thanks for asking. What brings you here today? Are you looking for answers about ABC Company?"}] llm_output=None output_data=None log=GenerationLog(activated_rails=[ActivatedRail(type='input', name='self check input', decisions=['execute self_check_input'], executed_actions=[ExecutedAction(action_name='self_check_input', action_params={}, return_value=True, llm_calls=[LLMCallInfo(task='self_check_input', duration=0.33532238006591797, total_tokens=None, prompt_tokens=None, completion_tokens=None, started_at=1737474372.5013032, finished_at=1737474372.8366256, id='e3663627-2951-431b-9a8a-7e05072fba0b', prompt= ......... 
// removed long response logs for brevity

# General stats

- Total time: 1.42s
  - [0.34s][24.12%]: INPUT Rails
  - [0.73s][51.23%]: GENERATION Rails
  - [0.35s][24.44%]: OUTPUT Rails
- 3 LLM calls, 1.40s total duration, 0 total prompt tokens, 0 total completion tokens, 0 total tokens.
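
To see exactly which calls are missing their counts, a small hypothetical helper (field names taken from the GenerationLog structure printed above: activated_rails, executed_actions, and LLMCallInfo) can walk the detailed log:

```python
def dump_token_usage(log) -> None:
    """Hypothetical helper, not part of NeMo-Guardrails: print per-call token counts."""
    for rail in log.activated_rails:
        for action in rail.executed_actions:
            for call in action.llm_calls:
                print(
                    f"{rail.name} / {call.task}: "
                    f"prompt={call.prompt_tokens}, "
                    f"completion={call.completion_tokens}, "
                    f"total={call.total_tokens}"
                )


# For this run, every call prints total=None instead of a count.
dump_token_usage(response.log)
```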

Steps To Reproduce

  1. Run test.py above with the following model (and probably others as well) in config.yml:

     models:
       - type: main
         engine: nvidia_ai_endpoints
         model: meta/llama-3.1-8b-instruct

  2. Alternatively, run nemoguardrails eval run with the same model.

Expected Behavior

  1. Token stats are not None in GenerationLog.
  2. No exception is thrown when running nemoguardrails eval run.

Actual Behavior

  1. Token stats are None in GenerationLog.
  2. An exception is thrown when running nemoguardrails eval run.
@trebedea added the "bug" and "status: needs triage" labels on Jan 21, 2025
@Pouyanpi removed the "status: needs triage" label on Jan 22, 2025
@Pouyanpi (Collaborator)

@trebedea I noticed the same when running the abc bot with gpt-3.5-instruct-turbo: token_usage is always empty.

@trebedea (Collaborator, Author)

Synced with @Pouyanpi: this happens only when using streaming. Confirmed, token stats are broken for all models when streaming is enabled.
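
For anyone trying to reproduce this case: assuming the top-level streaming flag from the NeMo-Guardrails configuration docs (treat the exact key as an assumption if your version differs), a minimal config.yml that exercises the streaming path would look like:

```yaml
# Minimal sketch; streaming is the condition under which token stats
# come back as None, per the comment above.
models:
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama-3.1-8b-instruct

streaming: True
```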
