Generate Your First Golden - Using Custom model #1216

Open
pratikchhapolika opened this issue Dec 8, 2024 · 7 comments
pratikchhapolika commented Dec 8, 2024

I am following this link: https://docs.confident-ai.com/docs/synthesizer-introduction#:~:text=begin%20generating%20goldens.-,from%20deepeval.synthesizer%20import%20Synthesizer,-...

Browser: Chrome
Python: 3.12
deepeval version: '2.0.3'
Jupyter Notebook on a MacBook Pro 16

Note: This custom model works fine when evaluating on metrics.

Custom model using AzureOpenAI

from deepeval.models.base_model import DeepEvalBaseLLM
from langchain_openai.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings

class AzureOpenAI(DeepEvalBaseLLM):
    def __init__(
        self,
        model
    ):
        self.model = model

    def load_model(self):
        return self.model

    def generate(self, prompt: str) -> str:
        chat_model = self.load_model()
        return chat_model.invoke(prompt).content

    async def a_generate(self, prompt: str) -> str:
        chat_model = self.load_model()
        res = await chat_model.ainvoke(prompt)
        return res.content

    def get_model_name(self):
        return "Custom Azure OpenAI Model"

# Replace these with real values
custom_model = AzureChatOpenAI(
    api_version=config.pf_api_version,
    azure_endpoint=config.pf_oa_endpoint,
    azure_ad_token=token,
    max_tokens=config.max_tokens,
    model=config.pf_llm_deployment,
)
# init the embeddings for answer_relevancy, answer_correctness and answer_similarity
azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_version=config.pf_api_version,
    azure_endpoint=config.pf_oa_endpoint_embed,
    azure_ad_token=token,
    model=config.pf_embedding_engine,
)
azure_openai = AzureOpenAI(model=custom_model)

Generate Your First Golden

from deepeval.synthesizer import Synthesizer

...
synthesizer = Synthesizer(model=azure_openai)
synthesizer.generate_goldens_from_docs(
    document_paths=['abc.pdf'],
    include_expected_output=True
)
print(synthesizer.synthetic_goldens)

ERROR TRACE

---------------------------------------------------------------------------
OpenAIError                               Traceback (most recent call last)
Cell In[9], line 2
      1 # Use gpt-3.5-turbo instead
----> 2 synthesizer = Synthesizer(model=azure_openai)
      4 synthesizer.generate_goldens_from_docs(
      5     document_paths=['abc.pdf'],
      6     include_expected_output=True,
      7     max_goldens_per_document=2
      8 )
      9 print(synthesizer.synthetic_goldens)

File ~/Library/Python/3.12/lib/python/site-packages/deepeval/synthesizer/synthesizer.py:93, in Synthesizer.__init__(self, model, async_mode, max_concurrent, filtration_config, evolution_config, styling_config)
     88 self.synthetic_goldens: List[Golden] = []
     89 self.context_generator = None
     90 self.filtration_config = (
     91     filtration_config
     92     if filtration_config is not None
---> 93     else FiltrationConfig()
     94 )
     95 self.evolution_config = (
     96     evolution_config
     97     if evolution_config is not None
     98     else EvolutionConfig()
     99 )
    100 self.styling_config = (
    101     styling_config if styling_config is not None else StylingConfig()
    102 )

File <string>:6, in __init__(self, synthetic_input_quality_threshold, max_quality_retries, critic_model)

File ~/Library/Python/3.12/lib/python/site-packages/deepeval/synthesizer/config.py:18, in FiltrationConfig.__post_init__(self)
     17 def __post_init__(self):
---> 18     self.critic_model, _ = initialize_model(self.critic_model)

File ~/Library/Python/3.12/lib/python/site-packages/deepeval/metrics/utils.py:269, in initialize_model(model)
    267     return model, False
    268 # Otherwise (the model is a string or None), we initialize a GPTModel and use as a native model
--> 269 return GPTModel(model=model), True

File ~/Library/Python/3.12/lib/python/site-packages/deepeval/models/gpt_model.py:103, in GPTModel.__init__(self, model, _openai_api_key, base_url, *args, **kwargs)
    101 self.args = args
    102 self.kwargs = kwargs
--> 103 super().__init__(model_name)

File ~/Library/Python/3.12/lib/python/site-packages/deepeval/models/base_model.py:35, in DeepEvalBaseLLM.__init__(self, model_name, *args, **kwargs)
     33 def __init__(self, model_name: Optional[str] = None, *args, **kwargs):
     34     self.model_name = model_name
---> 35     self.model = self.load_model(*args, **kwargs)

File ~/Library/Python/3.12/lib/python/site-packages/deepeval/models/gpt_model.py:156, in GPTModel.load_model(self)
    144     return CustomChatOpenAI(
    145         model_name=model_name,
    146         openai_api_key=openai_api_key,
   (...)
    153         **self.kwargs,
    154     )
    155 else:
--> 156     return ChatOpenAI(
    157         model_name=self.model_name,
    158         openai_api_key=self._openai_api_key,
    159         *self.args,
    160         **self.kwargs,
    161     )

File ~/Library/Python/3.12/lib/python/site-packages/langchain_core/load/serializable.py:125, in Serializable.__init__(self, *args, **kwargs)
    123 def __init__(self, *args: Any, **kwargs: Any) -> None:
    124     """"""
--> 125     super().__init__(*args, **kwargs)

    [... skipping hidden 1 frame]

File ~/Library/Python/3.12/lib/python/site-packages/langchain_openai/chat_models/base.py:551, in BaseChatOpenAI.validate_environment(self)
    549         self.http_client = httpx.Client(proxy=self.openai_proxy)
    550     sync_specific = {"http_client": self.http_client}
--> 551     self.root_client = openai.OpenAI(**client_params, **sync_specific)  # type: ignore[arg-type]
    552     self.client = self.root_client.chat.completions
    553 if not self.async_client:

File ~/Library/Python/3.12/lib/python/site-packages/openai/_client.py:105, in OpenAI.__init__(self, api_key, organization, project, base_url, timeout, max_retries, default_headers, default_query, http_client, _strict_response_validation)
    103     api_key = os.environ.get("OPENAI_API_KEY")
    104 if api_key is None:
--> 105     raise OpenAIError(
    106         "The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable"
    107     )
    108 self.api_key = api_key
    110 if organization is None:

OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
kritinv commented Dec 9, 2024

Hey @pratikchhapolika , if you don't supply an OpenAI API key, DeepEval uses the OpenAI model as the critic model for filtering unqualified goldens. You can easily avoid this by defining your own custom FiltrationConfig with the custom model you've defined for generation.
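
For example, a minimal sketch assuming the azure_openai wrapper defined above:

from deepeval.synthesizer import Synthesizer
from deepeval.synthesizer.config import FiltrationConfig

# Point the critic at the custom model so DeepEval never falls back to OpenAI for filtering
filtration_config = FiltrationConfig(critic_model=azure_openai)
synthesizer = Synthesizer(model=azure_openai, filtration_config=filtration_config)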

pratikchhapolika commented Dec 10, 2024

Hey @pratikchhapolika , if you don't supply an OpenAI API key, DeepEval uses the OpenAI model as the critic model for filtering unqualified goldens. You can easily avoid this by defining your own custom FiltrationConfig with the custom model you've defined for generation.

from deepeval.synthesizer import Synthesizer
from deepeval.synthesizer.config import FiltrationConfig

filtration_config = FiltrationConfig(critic_model=azure_openai, synthetic_input_quality_threshold=0.6)
synthesizer = Synthesizer(filtration_config=filtration_config, model=azure_openai)

synthesizer.generate_goldens_from_docs(
    document_paths=['abc.pdf'],
    include_expected_output=True,
)
print(synthesizer.synthetic_goldens)
df = synthesizer.to_pandas()

I am seeing the same error @kritinv

---------------------------------------------------------------------------
OpenAIError                               Traceback (most recent call last)
Cell In[5], line 4
      1 filtration_config = FiltrationConfig(critic_model=azure_openai,synthetic_input_quality_threshold=0.6)
      2 synthesizer = Synthesizer(filtration_config=filtration_config,model=azure_openai)
----> 4 synthesizer.generate_goldens_from_docs(
      5     document_paths=['abc.pdf'],
      6     include_expected_output=True,
      7 )
      8 print(synthesizer.synthetic_goldens)
      9 df = synthesizer.to_pandas()

File ~/Library/Python/3.12/lib/python/site-packages/deepeval/synthesizer/synthesizer.py:117, in Synthesizer.generate_goldens_from_docs(self, document_paths, include_expected_output, max_goldens_per_context, context_construction_config, _send_data)
    108 def generate_goldens_from_docs(
    109     self,
    110     document_paths: List[str],
   (...)
    114     _send_data=True,
    115 ):
    116     if context_construction_config is None:
--> 117         context_construction_config = ContextConstructionConfig()
    119     if self.async_mode:
    120         loop = get_or_create_event_loop()

File <string>:11, in __init__(self, embedder, critic_model, max_contexts_per_document, chunk_size, chunk_overlap, context_quality_threshold, context_similarity_threshold, max_retries)

File ~/Library/Python/3.12/lib/python/site-packages/deepeval/synthesizer/config.py:57, in ContextConstructionConfig.__post_init__(self)
     56 def __post_init__(self):
---> 57     self.critic_model, _ = initialize_model(self.critic_model)
     58     if self.embedder is None:
     59         self.embedder = OpenAIEmbeddingModel()

File ~/Library/Python/3.12/lib/python/site-packages/deepeval/metrics/utils.py:269, in initialize_model(model)
    267     return model, False
    268 # Otherwise (the model is a string or None), we initialize a GPTModel and use as a native model
--> 269 return GPTModel(model=model), True

File ~/Library/Python/3.12/lib/python/site-packages/deepeval/models/gpt_model.py:103, in GPTModel.__init__(self, model, _openai_api_key, base_url, *args, **kwargs)
    101 self.args = args
    102 self.kwargs = kwargs
--> 103 super().__init__(model_name)

File ~/Library/Python/3.12/lib/python/site-packages/deepeval/models/base_model.py:35, in DeepEvalBaseLLM.__init__(self, model_name, *args, **kwargs)
     33 def __init__(self, model_name: Optional[str] = None, *args, **kwargs):
     34     self.model_name = model_name
---> 35     self.model = self.load_model(*args, **kwargs)

File ~/Library/Python/3.12/lib/python/site-packages/deepeval/models/gpt_model.py:156, in GPTModel.load_model(self)
    144     return CustomChatOpenAI(
    145         model_name=model_name,
    146         openai_api_key=openai_api_key,
   (...)
    153         **self.kwargs,
    154     )
    155 else:
--> 156     return ChatOpenAI(
    157         model_name=self.model_name,
    158         openai_api_key=self._openai_api_key,
    159         *self.args,
    160         **self.kwargs,
    161     )

File ~/Library/Python/3.12/lib/python/site-packages/langchain_core/load/serializable.py:125, in Serializable.__init__(self, *args, **kwargs)
    123 def __init__(self, *args: Any, **kwargs: Any) -> None:
    124     """"""
--> 125     super().__init__(*args, **kwargs)

    [... skipping hidden 1 frame]

File ~/Library/Python/3.12/lib/python/site-packages/langchain_openai/chat_models/base.py:551, in BaseChatOpenAI.validate_environment(self)
    549         self.http_client = httpx.Client(proxy=self.openai_proxy)
    550     sync_specific = {"http_client": self.http_client}
--> 551     self.root_client = openai.OpenAI(**client_params, **sync_specific)  # type: ignore[arg-type]
    552     self.client = self.root_client.chat.completions
    553 if not self.async_client:

File ~/Library/Python/3.12/lib/python/site-packages/openai/_client.py:105, in OpenAI.__init__(self, api_key, organization, project, base_url, timeout, max_retries, default_headers, default_query, http_client, _strict_response_validation)
    103     api_key = os.environ.get("OPENAI_API_KEY")
    104 if api_key is None:
--> 105     raise OpenAIError(
    106         "The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable"
    107     )
    108 self.api_key = api_key
    110 if organization is None:

OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

@penguine-ip

pratikchhapolika commented Dec 10, 2024

I also find this doc to be misleading: https://docs.confident-ai.com/docs/guides-using-custom-embedding-models#:~:text=from%20deepeval.synthesizer%20import%20Synthesizer%0A...%0A%0Asynthesizer%20%3D%20Synthesizer(embedder%3DCustomEmbeddingModel())


from deepeval.synthesizer import Synthesizer
...

synthesizer = Synthesizer(embedder=CustomEmbeddingModel())

Should we pass the chat model to both Synthesizer and FiltrationConfig, or the embedding model?
Also, there is no embedder parameter on Synthesizer anymore.

Which model does it use to convert the PDF to text?
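
(For reference, a minimal sketch: in deepeval 2.x the embedder appears to be passed via ContextConstructionConfig rather than the Synthesizer constructor, and it must be a DeepEvalBaseEmbeddingModel wrapper rather than a raw LangChain embeddings object — see the CustomEmbeddingModel in a later comment. Assuming such a wrapper, here called azure_embeddings_wrapper, plus the azure_openai model above:)

from deepeval.synthesizer import Synthesizer
from deepeval.synthesizer.config import ContextConstructionConfig

# Document chunking, context selection, and embedding are configured here,
# not on Synthesizer itself
context_construction_config = ContextConstructionConfig(
    critic_model=azure_openai,          # LLM that scores context quality
    embedder=azure_embeddings_wrapper,  # a DeepEvalBaseEmbeddingModel (hypothetical name)
)

synthesizer = Synthesizer(model=azure_openai)
synthesizer.generate_goldens_from_docs(
    document_paths=['abc.pdf'],
    context_construction_config=context_construction_config,
)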

3051360 commented Jan 21, 2025

I used Azure OpenAI LLM and Embeddings (custom models) for synthetic dataset generation. It used to work without errors with version 1.3.2:

✨ 🚀 ✨ Loading Documents: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.22it/s]
✨ 🧩 ✨ Generating Contexts: 100%|██████████████████████████████████████████████████████████████████████| 1/1 [00:13<00:00, 13.19s/it]
Utilizing 3 out of 5 chunks.
✨ Generating up to 6 goldens using DeepEval (using Custom Azure OpenAI Model and Custom Azure OpenAI Embedding Model, use case=QA, met

but with the latest version 2.1.9 it throws the following error:

✨ 🚀 ✨ Loading Documents: 100%|███████████████████████████████████████████████| 1/1 [00:00<00:00,  1.81it/s]
✨ 🧩 ✨ Generating Contexts: 100%|█████████████████████████████████████████████| 9/9 [00:08<00:00,  1.08it/s]
Utilizing 7 out of 9 chunks.
✨ Generating up to 6 goldens using DeepEval (using gpt-4o, method=docs):   0%|         | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):

File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/deepeval/synthesizer/synthesizer.py", line 122, in generate_goldens_from_docs
goldens = loop.run_until_complete(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/[email protected]/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 686, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/deepeval/synthesizer/synthesizer.py", line 222, in a_generate_goldens_from_docs
goldens = await self.a_generate_goldens_from_contexts(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/deepeval/synthesizer/synthesizer.py", line 396, in a_generate_goldens_from_contexts
await asyncio.gather(*tasks)
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/deepeval/synthesizer/synthesizer.py", line 964, in task_wrapper
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/deepeval/synthesizer/synthesizer.py", line 419, in _a_generate_from_context
synthetic_inputs: List[SyntheticData] = await self._a_generate_inputs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/deepeval/synthesizer/synthesizer.py", line 693, in _a_generate_inputs
res: SyntheticDataList = await self._a_generate_schema(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/deepeval/synthesizer/synthesizer.py", line 908, in _a_generate_schema
res, cost = await model.a_generate(prompt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/tenacity/asyncio/init.py", line 189, in async_wrapped
return await copy(fn, *args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/tenacity/asyncio/init.py", line 111, in call
do = await self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/tenacity/asyncio/init.py", line 153, in iter
result = await action(retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/tenacity/_utils.py", line 99, in inner
return call(*args, kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/tenacity/init.py", line 398, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/[email protected]/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/[email protected]/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/tenacity/asyncio/init.py", line 114, in call
result = await fn(*args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/deepeval/models/gpt_model.py", line 304, in a_generate
res = await chat_model.ainvoke(prompt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 307, in ainvoke
llm_result = await self.agenerate_prompt(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 796, in agenerate_prompt
return await self.agenerate(
^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 756, in agenerate
raise exceptions[0]
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 924, in _agenerate_with_cache
result = await self._agenerate(
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/langchain_openai/chat_models/base.py", line 960, in _agenerate
response = await self.async_client.create(payload)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/openai/resources/chat/completions.py", line 1720, in create
return await self._post(
^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/openai/_base_client.py", line 1849, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/openai/_base_client.py", line 1543, in request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/Users/L051360/.virtualenvs/temp_env/lib/python3.12/site-packages/openai/_base_client.py", line 1644, in _request
raise self._make_status_error_from_response(err.response) from None
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: 1********************82. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

My point is: why is it falling back to OpenAI keys at all when a custom model is supplied?
In its simplest form, my test code is below.

For obvious reasons, I have omitted my Azure OpenAI credentials in the code below.


from deepeval.synthesizer import Synthesizer
from deepeval.models.base_model import DeepEvalBaseLLM
from deepeval.synthesizer.config import ContextConstructionConfig
from deepeval.models import DeepEvalBaseEmbeddingModel
from langchain_openai import AzureOpenAIEmbeddings, AzureChatOpenAI
from typing import List

class AzureOpenAI(DeepEvalBaseLLM):
    def __init__(self, model):
        self.model = model

    def load_model(self):
        return self.model

    def generate(self, prompt: str) -> str:
        chat_model = self.load_model()
        return chat_model.invoke(prompt).content

    async def a_generate(self, prompt: str) -> str:
        chat_model = self.load_model()
        res = await chat_model.ainvoke(prompt)
        return res.content

    def get_model_name(self):
        return "Custom Azure OpenAI Model"

custom_model = AzureChatOpenAI(
    openai_api_version="...",
    azure_deployment="...",
    azure_endpoint="...",
    openai_api_key="...",
)
azure_openai = AzureOpenAI(model=custom_model)

class CustomEmbeddingModel(DeepEvalBaseEmbeddingModel):
    def __init__(self):
        pass

    def load_model(self):
        return AzureOpenAIEmbeddings(
            openai_api_version="...",
            azure_deployment="...",
            azure_endpoint="...",
            openai_api_key="...",
        )

    def embed_text(self, text: str) -> List[float]:
        embedding_model = self.load_model()
        return embedding_model.embed_query(text)

    def embed_texts(self, texts: List[str]) -> List[List[float]]:
        embedding_model = self.load_model()
        return embedding_model.embed_documents(texts)

    async def a_embed_text(self, text: str) -> List[float]:
        embedding_model = self.load_model()
        return await embedding_model.aembed_query(text)

    async def a_embed_texts(self, texts: List[str]) -> List[List[float]]:
        embedding_model = self.load_model()
        return await embedding_model.aembed_documents(texts)

    def get_model_name(self):
        return "Custom Azure Embedding Model"


azure_embeddings = CustomEmbeddingModel()

conf = ContextConstructionConfig(
    chunk_overlap=0,
    chunk_size=100,
    critic_model=azure_openai,
    embedder=azure_embeddings,
)

docs = ["uploaded_document.pdf"]

synth = Synthesizer()
synth.generate_goldens_from_docs(context_construction_config=conf, document_paths=docs)

RajeswariKumaran commented:
I am trying a custom LLM using a model from Amazon Bedrock, and I get the same error with the latest version. So I downgraded DeepEval to 1.3.2, where it works without error. But there is no FiltrationConfig or ContextConstructionConfig in that version; you simply pass the model, critic_model, and embedder while initializing the synthesizer, like:

synthesizer = Synthesizer(model=awsbedrock, critic_model=awsbedrock, embedder=awsbedrockembed)

Does anyone know if this issue is on DeepEval's list to be fixed?

penguine-ip (Contributor) commented:
Hey @pratikchhapolika @3051360 @RajeswariKumaran, totally missed this thread. We're active every minute in our Discord, but issues on GitHub can get very fragmented, so in the future, if you want things fixed immediately, go to Discord.

@pratikchhapolika if you're still around, can you try again on the latest version?

@3051360 it is because you didn't supply a model to the Synthesizer. The generation pipeline uses multiple LLMs; you can learn more about it here: https://docs.confident-ai.com/docs/synthesizer-introduction#how-does-it-work
Can you try supplying the model, as in the sketch below, and let us know if it works?
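
A minimal sketch of supplying custom models to every stage of the pipeline (assuming the azure_openai and embedding wrappers from the earlier comments; parameter values are illustrative):

from deepeval.synthesizer import Synthesizer
from deepeval.synthesizer.config import FiltrationConfig, ContextConstructionConfig

# Generation LLM plus a critic LLM for filtering unqualified goldens
filtration_config = FiltrationConfig(critic_model=azure_openai)
synthesizer = Synthesizer(model=azure_openai, filtration_config=filtration_config)

# Critic LLM and embedder for building contexts from documents
context_construction_config = ContextConstructionConfig(
    critic_model=azure_openai,
    embedder=azure_embeddings,
)

synthesizer.generate_goldens_from_docs(
    document_paths=["uploaded_document.pdf"],
    context_construction_config=context_construction_config,
)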

@RajeswariKumaran How did you define your synthesizer before you downgraded? I can't comment without seeing your code, but there are a few places where LLMs are used, so not specifying custom models for every part of your pipeline might be the error.

RajeswariKumaran commented Jan 24, 2025

@penguine-ip, thanks for getting back on this. I tried again with the current version of deepeval, v2.2.5, made the necessary changes, and it worked fine! Earlier I got the error "AssertionError: n_contexts_per_doc must be a positive integer." I am not sure what I changed in the interim, but I don't see this issue now with the Amazon Bedrock custom LLM.

After defining classes using DeepEvalBaseLLM (awsbedrock) and DeepEvalBaseEmbeddingModel (awsbedrockembed) for Amazon Bedrock, I used the following:

filtration_config = FiltrationConfig(
    critic_model=awsbedrock,
    synthetic_input_quality_threshold=0.5,
)
synthesizer = Synthesizer(model=awsbedrock, filtration_config=filtration_config)

context_construction_config = ContextConstructionConfig(
    max_contexts_per_document=5,
    critic_model=awsbedrock,
    embedder=awsbedrockembed,
)

synthesizer.generate_goldens_from_docs(
    document_paths=[doc],
    include_expected_output=True,
    context_construction_config=context_construction_config,
    max_goldens_per_context=5,
)
