
Tokenizer Import Error When Using Ollama Models #766

Open
AnukaMithara opened this issue Oct 24, 2024 · 2 comments

Comments

@AnukaMithara

Description

When attempting to use Ollama models (llama3, llama3.1, mistral), the application fails with a tokenizer import error. The error occurs when calculating tokens for text-chunking operations. Everything works fine with OpenAI models.

Environment

  • Python 3.12
  • LangChain Core
  • Ollama Models tested:
    • ollama/llama3
    • ollama/llama3.1
    • ollama/mistral

Error Message

ImportError: cannot import name 'GPT2TokenizerFast' from 'transformers'

The root cause appears to be LangChain attempting to use GPT2TokenizerFast for token-counting operations with Ollama models.

Full Traceback

The error originates in langchain_core/language_models/base.py and propagates through the text chunking functionality:

  1. Initial error in token calculation:
from transformers import GPT2TokenizerFast
ImportError: cannot import name 'GPT2TokenizerFast' from 'transformers'
  2. This causes a cascade through:
    • scrapegraphai/utils/tokenizer.py
    • scrapegraphai/utils/split_text_into_chunks.py
    • semchunk/semchunk.py
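Based on the traceback, the default tokenizer lookup in langchain_core can be sketched roughly as follows. This is a simplified reconstruction for illustration, not the exact library source:

```python
# Simplified reconstruction (an assumption based on the traceback, not the
# exact langchain_core source) of the default tokenizer lookup that fails
# when the transformers package is missing or broken.
def get_tokenizer():
    try:
        from transformers import GPT2TokenizerFast
    except ImportError as e:
        msg = (
            "Could not import transformers python package. "
            "This is needed in order to calculate get_token_ids. "
            "Please install it with `pip install transformers`."
        )
        raise ImportError(msg) from e
    # The default method tokenizes every string with a GPT-2 tokenizer,
    # even for non-OpenAI models such as Ollama's llama3 or mistral.
    return GPT2TokenizerFast.from_pretrained("gpt2")
```

Since Ollama models never override `get_token_ids`, any token count request falls through to this GPT-2 path, which is why the error only shows up with Ollama and not with OpenAI models (which use tiktoken instead).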

Steps to Reproduce

  1. Set up a project using LangChain with Ollama models
  2. Attempt to perform operations requiring token counting (e.g., text chunking)
  3. The operation fails with the above tokenizer import error

Expected Behavior

The application should handle token-counting operations with Ollama models without requiring the transformers package, or should fall back to an alternative tokenizer implementation.

Current Behavior

The application fails with an import error, suggesting installation of the transformers package, which may not be the correct solution for Ollama models.

Possible Solutions

  1. Implement a specific tokenizer for Ollama models
  2. Use a different token counting mechanism for these models
  3. Add proper fallback behavior when transformers package isn't available

Additional Notes

This appears to be a broader issue with how LangChain handles tokenization for Ollama models, as the error is consistent across multiple Ollama models (llama3, llama3.1, mistral).

@AnukaMithara
Author

2024-10-24 13:14:26,930 app.service.search_graph_service:66 -> (ERROR) Error in searching process: Could not import transformers python package. This is needed in order to calculate get_token_ids. Please install it with pip install transformers.
Traceback (most recent call last):
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/langchain_core/language_models/base.py", line 61, in get_tokenizer
from transformers import GPT2TokenizerFast # type: ignore[import]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ImportError: cannot import name 'GPT2TokenizerFast' from 'transformers' (/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/transformers/__init__.py)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/app/service/search_graph_service.py", line 46, in search
result = search_graph.run()
^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/graphs/search_graph.py", line 115, in run
self.final_state, self.execution_info = self.graph.execute(inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/graphs/base_graph.py", line 279, in execute
return self._execute_standard(initial_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/graphs/base_graph.py", line 195, in _execute_standard
raise e
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/graphs/base_graph.py", line 179, in _execute_standard
result = current_node.execute(state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/nodes/graph_iterator_node.py", line 73, in execute
state = asyncio.run(self._async_execute(state, batchsize))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/nodes/graph_iterator_node.py", line 136, in _async_execute
answers = await tqdm.gather(
^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/tqdm/asyncio.py", line 79, in gather
res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
^^^^^^^
File "/usr/lib/python3.12/asyncio/tasks.py", line 631, in _wait_for_one
return f.result() # May raise f.exception().
^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/tqdm/asyncio.py", line 76, in wrap_awaitable
return i, await f
^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/nodes/graph_iterator_node.py", line 126, in _async_run
return await asyncio.to_thread(graph.run)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/graphs/smart_scraper_graph.py", line 184, in run
self.final_state, self.execution_info = self.graph.execute(inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/graphs/base_graph.py", line 279, in execute
return self._execute_standard(initial_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/graphs/base_graph.py", line 195, in _execute_standard
raise e
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/graphs/base_graph.py", line 179, in _execute_standard
result = current_node.execute(state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/nodes/parse_node.py", line 83, in execute
chunks = split_text_into_chunks(text=docs_transformed.page_content,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/utils/split_text_into_chunks.py", line 28, in split_text_into_chunks
chunks = chunk(text=text,
^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/semchunk/semchunk.py", line 129, in chunk
if token_counter(split) > chunk_size:
^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/utils/split_text_into_chunks.py", line 24, in count_tokens
return num_tokens_calculus(text, model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/utils/tokenizer.py", line 30, in num_tokens_calculus
num_tokens = num_tokens_fn(string, llm_model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/scrapegraphai/utils/tokenizers/tokenizer_ollama.py", line 26, in num_tokens_ollama
tokens = llm_model.get_num_tokens(text)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/langchain_core/language_models/base.py", line 365, in get_num_tokens
return len(self.get_token_ids(text))
^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/langchain_core/language_models/base.py", line 352, in get_token_ids
return _get_token_ids_default_method(text)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/langchain_core/language_models/base.py", line 76, in _get_token_ids_default_method
tokenizer = get_tokenizer()
^^^^^^^^^^^^^^^
File "/mnt/c/Users/dell/Desktop/Projects/datafab-scrapper-gen1-srv/venv/lib/python3.12/site-packages/langchain_core/language_models/base.py", line 68, in get_tokenizer
raise ImportError(msg) from e
ImportError: Could not import transformers python package. This is needed in order to calculate get_token_ids. Please install it with pip install transformers.

@AnukaMithara
Author

Configuration

DEFAULT_SEARCH_GRAPH_CONFIG = {
    "llm": {
        "model": "ollama/llama3",
        "temperature": 0,
        "base_url": OLLAMA_BASE_URL,
    },
    "max_results": 2,
    "verbose": True,
}
