bug: Token usage is not computed correctly for some models #947

Closed
4 tasks done
trebedea opened this issue Jan 21, 2025 · 2 comments · Fixed by #953
Labels: bug (Something isn't working)

@trebedea (Collaborator)

Did you check docs and existing issues?

  • I have read all the NeMo-Guardrails docs
  • I have updated the package to the latest version before submitting this issue
  • (optional) I have used the develop branch
  • I have searched the existing issues of NeMo-Guardrails

Python version (python --version)

Python 3.10.8

Operating system/version

Windows 10

NeMo-Guardrails version (if you must use a specific version and not the latest)

0.11.0

Describe the bug

Token stats are not computed correctly (they are missing from the GenerationResponse log) for some models, e.g. NIMs or NVAIE, but probably for others as well. This makes token usage reporting for these models incorrect and causes evaluations (nemoguardrails eval) to fail completely.

For example, running an evaluation for NVIDIA models (NIMs or NVAIE) fails with an exception for any guardrail config (e.g. the abc bot):

python.exe -m nemoguardrails eval run --guardrail-config-path=examples/bots/TR/botname/gr-config --eval-config-path=examples/bots/TR/botname/config --output-path=examples/bots/TR/botname/output 
Loading eval configuration from \projects\nemoguardrails-1224\NeMo-Guardrails\examples\bots\TR\botname\config.
Starting the evaluation for examples/bots/TR/botname/gr-config.
Writing results to examples/bots/TR/botname/output.
Loading eval configuration \projects\nemoguardrails-1224\NeMo-Guardrails\examples\bots\TR\botname\config ...
Loaded 7 policies and 215 interactions.
Loading guardrail configuration examples/bots/TR/botname/gr-config ...
Fetching 5 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 5102.56it/s]
[1] "Hello!"
Running 215 interactions ... ----------------------------------------   0% -:--:--
Traceback (most recent call last):
  File "\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "\projects\nemoguardrails-1224\NeMo-Guardrails\nemoguardrails\__main__.py", line 20, in <module>
    app()
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\typer\main.py", line 338, in __call__
    raise e
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\typer\main.py", line 321, in __call__
    return get_command(self)(*args, **kwargs)
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\typer\core.py", line 728, in main
    return _main(
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\typer\core.py", line 197, in _main
    rv = self.invoke(ctx)
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "\AppData\Local\pypoetry\Cache\virtualenvs\nemoguardrails-8Un37jLR-py3.10\lib\site-packages\typer\main.py", line 703, in wrapper
    return callback(**use_params)
  File "\projects\nemoguardrails-1224\NeMo-Guardrails\nemoguardrails\eval\cli.py", line 87, in run
    asyncio.run(
  File "\AppData\Local\Programs\Python\Python310\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 649, in run_until_complete
    return future.result()
  File "\projects\nemoguardrails-1224\NeMo-Guardrails\nemoguardrails\eval\eval.py", line 314, in run_eval
    await asyncio.gather(*[_worker() for _ in range(parallel)])
  File "\projects\nemoguardrails-1224\NeMo-Guardrails\nemoguardrails\eval\eval.py", line 301, in _worker
    metrics = _collect_span_metrics(interaction_log.trace)
  File "\projects\nemoguardrails-1224\NeMo-Guardrails\nemoguardrails\eval\utils.py", line 161, in _collect_span_metrics
    metrics[metric] = metrics.get(metric, 0) + span.metrics[metric]
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

Process finished with exit code 1
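
For context, the TypeError comes from the metric-aggregation loop in nemoguardrails/eval/utils.py (_collect_span_metrics), which assumes every span metric is numeric; for these models the token counts arrive as None. Below is a minimal sketch of a defensive variant that skips None values, written against plain dicts for illustration. It is hypothetical and not the actual change from #953.

```python
from typing import Dict, List


def collect_span_metrics(spans: List[dict]) -> Dict[str, float]:
    """Aggregate metrics across spans, skipping metrics reported as None."""
    metrics: Dict[str, float] = {}
    for span in spans:
        for metric, value in span.get("metrics", {}).items():
            if value is None:
                # Some providers report token counts as None; skipping them avoids
                # "unsupported operand type(s) for +: 'int' and 'NoneType'".
                continue
            metrics[metric] = metrics.get(metric, 0) + value
    return metrics


# Example: the second span reports total_tokens=None, as seen with NIM models.
spans = [
    {"metrics": {"total_tokens": 120, "llm_calls": 1}},
    {"metrics": {"total_tokens": None, "llm_calls": 1}},
]
print(collect_span_metrics(spans))  # {'total_tokens': 120, 'llm_calls': 2}
```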

When running a simple test.py, we get empty token stats:

from nemoguardrails import RailsConfig
from nemoguardrails import LLMRails

config = RailsConfig.from_path("./botname/gr-config")
rails = LLMRails(config)

# Request detailed logging so token stats are included in the response log.
response = rails.generate(
    messages=[{"role": "user", "content": "Hi! How are you?"}],
    options={
        "log": {
            "activated_rails": True,
            "llm_calls": True,
            "internal_events": True,
            "colang_history": True,
        }
    },
)
print(response)
response.log.print_summary()
info = rails.explain()
info.print_llm_calls_summary()

The output is below - note the 0 tokens in the General stats summary and total_tokens=None in the GenerationLog:

response=[{'role': 'assistant', 'content': "I'm functioning properly, thanks for asking. What brings you here today? Are you looking for answers about ABC Company?"}] llm_output=None output_data=None log=GenerationLog(activated_rails=[ActivatedRail(type='input', name='self check input', decisions=['execute self_check_input'], executed_actions=[ExecutedAction(action_name='self_check_input', action_params={}, return_value=True, llm_calls=[LLMCallInfo(task='self_check_input', duration=0.33532238006591797, total_tokens=None, prompt_tokens=None, completion_tokens=None, started_at=1737474372.5013032, finished_at=1737474372.8366256, id='e3663627-2951-431b-9a8a-7e05072fba0b', prompt= ......... 
// removed long response logs for brevity

# General stats

- Total time: 1.42s
  - [0.34s][24.12%]: INPUT Rails
  - [0.73s][51.23%]: GENERATION Rails
  - [0.35s][24.44%]: OUTPUT Rails
- 3 LLM calls, 1.40s total duration, 0 total prompt tokens, 0 total completion tokens, 0 total tokens.
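
To see exactly which calls are missing their counts, a small hypothetical helper (field names taken from the GenerationLog structure printed above: activated_rails, executed_actions, and LLMCallInfo) can walk the detailed log:

```python
def dump_token_usage(log) -> None:
    """Hypothetical helper, not part of NeMo-Guardrails: print per-call token counts."""
    for rail in log.activated_rails:
        for action in rail.executed_actions:
            for call in action.llm_calls:
                print(
                    f"{rail.name} / {call.task}: "
                    f"prompt={call.prompt_tokens}, "
                    f"completion={call.completion_tokens}, "
                    f"total={call.total_tokens}"
                )


# For this run, every call prints total=None instead of a count.
dump_token_usage(response.log)
```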

Steps To Reproduce

  1. Run test.py above with the following model (and probably others as well) in config.yml:

     models:
       - type: main
         engine: nvidia_ai_endpoints
         model: meta/llama-3.1-8b-instruct

  2. Alternatively, run nemoguardrails eval run with the same model.

Expected Behavior

  1. Token stats are not None in GenerationLog.
  2. No exception is thrown when running nemoguardrails eval run.

Actual Behavior

  1. Token stats are None in GenerationLog.
  2. An exception is thrown when running nemoguardrails eval run.
@trebedea added the "bug" and "status: needs triage" labels on Jan 21, 2025
@Pouyanpi removed the "status: needs triage" label on Jan 22, 2025
@Pouyanpi (Collaborator)

@trebedea I noticed the same when running the abc bot with gpt-3.5-instruct-turbo: token_usage is always empty.

@trebedea (Collaborator, Author)

Synced with @Pouyanpi: this happens only when using streaming. Confirmed, token stats are broken for all models when streaming is enabled.
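
For anyone trying to reproduce this case: assuming the top-level streaming flag from the NeMo-Guardrails configuration docs (treat the exact key as an assumption if your version differs), a minimal config.yml that exercises the streaming path would look like:

```yaml
# Minimal sketch; streaming is the condition under which token stats
# come back as None, per the comment above.
models:
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama-3.1-8b-instruct

streaming: True
```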
