Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation fix for Custom Embedding Functions #1886

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mlinmg
Copy link

@mlinmg mlinmg commented Jan 13, 2025

Motivation

When I tried to set up custom embedder, I could not pass the OAI embedding funct like it is said in the documentation, it would gave

Traceback (most recent call last):
...
pydantic_core._pydantic_core.ValidationError: 1 validation error for MarketingCopyCrew
embedder
  Input should be a valid dictionary [type=dict_type, input_value=<chromadb.utils.embedding...bject at 0x74ebaabfbeb0>, input_type=OpenAIEmbeddingFunction]
    For further information visit https://errors.pydantic.dev/2.10/v/dict_type

Even by setting a custom config it would still only route to the default OAI url, not using the user-defined base url, with no possibility of having a custom one.
Added OpenAiEmbeddingFunction as an allowed type to be in line with the documentation and to accept custom OAI-like embedder.

EDIT: I later found the part where you use the custom fn, SO i changed this PR to a documentation fix pull request

Added OpenAiEmbeddingFunction as an allowed type to be in line with the documentation and to accept custom OAI-like embedder
@mlinmg mlinmg marked this pull request as draft January 13, 2025 12:51
@mlinmg mlinmg changed the title Update crew.py Documentation fix for Custom Embedding Functions Jan 13, 2025
@mlinmg mlinmg marked this pull request as ready for review January 13, 2025 13:16
@joaomdmoura
Copy link
Collaborator

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment for PR #1886

Overview

The changes implemented in this PR address documentation alignment and type definitions for the embedding functions in the CrewAI framework. Overall, the updates are beneficial as they enhance the clarity and usability of the embedder configurations.

Code Quality Findings

1. src/crewai/crew.py

  • Type Definition Inconsistency: The initial attempt to introduce OpenAIEmbeddingFunction as a valid type for the embedder was ultimately reverted. This change seems to stem from an ambiguity regarding the intended flexibility of the embedding configuration.
  • Import Management: The transient inclusion and subsequent removal of the OpenAI embedding import raises concerns about maintaining consistency and clarity in the codebase.

Suggestions for Improvement

  • Maintain Dictionary Configuration: Instead of allowing direct instances of embedding functions, it is advisable to retain a dictionary-based configuration. Consider adding type validations for this dictionary:
embedder: Optional[dict] = Field(
    default=None,
    description="Configuration for the embedder to be used for the crew. Must contain 'provider' key with embdder instance.",
    # Suggested validation logic if using Pydantic
    validators=[
        Dict({
            'provider': Any,
            Optional('additional_config'): Any
        })
    ]
)
  • Enhance Docstrings: Including type hints in the docstring will improve the developer experience:
embedder: Optional[dict] = Field(
    default=None,
    description="Configuration for the embedder to be used for the crew.\nExample:\n  {'provider': OpenAIEmbeddingFunction(), 'additional_config': {...}}",
)

2. docs/concepts/memory.mdx

  • Documentation Accuracy: Well-done on updating the documentation to reflect the proper configuration methods using a dictionary format with a 'provider' key.
  • Formatting Inconsistencies: The presence of trailing spaces and varied formatting can detract from the professional presentation of the documentation.

Improvements Needed

  • Expand on Examples: Enhance the documentation with more comprehensive examples covering various embedding scenarios:
### Embedding Configuration Examples

#### Using OpenAI Embeddings
```python
embedder={
    'provider': OpenAIEmbeddingFunction(
        api_key=os.getenv("OPENAI_API_KEY"),
        model_name="text-embedding-3-small"
    )
}

Using Custom Embeddings

embedder={
    'provider': CustomEmbeddingFunction(),
    'additional_config': {
        'dimension': 1536,
        'normalize': True
    }
}

- **Clean Up Documentation**: Remove trailing spaces and ensure a consistent format throughout.

## Historical Context and Related PRs
Reflecting on previous pull requests that have dealt with embedding functions can offer valuable insights into best practices and design considerations. Reviewing how similar changes were approached can facilitate better decision-making regarding type management and documentation consistency.

## Conclusion
The changes made in this pull request improve the accuracy of documentation concerning embedder configuration, preventing misconfigurations and enhancing the developer experience. However, there are still opportunities to bolster type safety and introduce robust validation mechanisms. Moving forward, maintaining a clear and coherent structure both in code and documentation will significantly improve the usability of embedding functions within the CrewAI framework.

Let’s ensure continued alignment of our documentation with the implementation, and consider implementing refined type safety mechanisms to promote robustness and clarity in embedding configurations. 

Thank you for the effort on this PR!

@joaomdmoura
Copy link
Collaborator

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment for PR #1886

Overview

The proposed changes to src/crewai/crew.py and docs/concepts/memory.mdx focus on refining the embedder configuration handling and updating relevant documentation. Below is a summary of findings, suggestions for improvements, and the context garnered from related discussions.

1. src/crewai/crew.py Analysis

Review Comments

  1. Type Definition Changes:

    • The reversion to a strict dict typing for the embedder field enhances encapsulation and future flexibility. The initial attempt to integrate OpenAIEmbeddingFunction was wise but the final decision checks for broader compatibility.
  2. Import Cleanup:

    • Excellent practice in maintaining clean imports by removing unused references.

Suggestions for Improvement

  • Enhanced Type Documentation:

    • Specify the expected format more explicitly for the embedder field:
      embedder: Optional[dict] = Field(
          default=None,
          description="""Configuration for the embedder.
          Expected format: {
              'provider': <embedding_provider>,
              'config': <provider_specific_config>
          }""",
      )
  • Validation Additions:

    • A potential validator would ensure passed configurations have the required structure:
      @validator('embedder')
      def validate_embedder(cls, v):
          if v is not None and not isinstance(v, dict):
              raise ValueError("Embedder must be a dictionary configuration")
          if 'provider' not in v:
              raise ValueError("Embedder configuration must include 'provider' key")
          return v

2. docs/concepts/memory.mdx Analysis

Review Comments

  1. Documentation Alignment:
    • All examples now correctly showcase the dictionary format for embedder configurations. This alignment greatly enhances user understanding.

Suggestions for Improvement

  • Add Type Hints in Examples:

    • Clarifying type expectations in the docs would significantly help users:
      embedder: dict = {
          'provider': OpenAIEmbeddingFunction(
              api_key=os.getenv("OPENAI_API_KEY"),
              model_name="text-embedding-3-small"
          )
      }
  • Include Configuration Schema:

    • Establishing a clear schema for configuration usage would improve developer experience:
      ## Embedder Configuration Schema
      ```python
      {
          'provider': <EmbeddingFunction>,  # Required
          'config': {                       # Optional
              'collection_name': str,       # Optional
              'distance_function': str      # Optional
          }
      }
      
      

Overall Recommendations

  1. Code Structure:

    • Retain the strict dict typing for the embedder in crew.py.
    • Elevate validation and consider creating a Pydantic model for embedding configuration.
  2. Documentation:

    • Introduce troubleshooting guidance for embedder configurations, including common errors and solutions.
    • Provide diverse real-world examples to bolster clarity.
  3. Testing:

    • Integrate unit and integration tests to cover various scenarios pertaining to embedder configurations.

4. Security Considerations

  • Ensure that API keys are appropriately managed in examples to uphold security best practices.
  • Input sanitization for custom embedding configurations ought to be considered to mitigate potential vulnerabilities.

This review demonstrates a well-considered and holistic enhancement of the functionality and clarity surrounding the embedder's configuration, successfully maintaining backward compatibility while fostering a better development experience.

@bhancockio bhancockio requested a review from lorenzejay January 14, 2025 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants