
Commit

Merge branch 'main' into devin/1735621211-fix-llm-parameter-case-normalization
theCyberTech authored Jan 5, 2025
2 parents 4c3253e + 440883e commit 56fb691
Showing 55 changed files with 2,681 additions and 1,049 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -21,3 +21,4 @@ crew_tasks_output.json
.mypy_cache
.ruff_cache
.venv
agentops.log
2 changes: 1 addition & 1 deletion docs/concepts/flows.mdx
@@ -138,7 +138,7 @@ print("---- Final Output ----")
print(final_output)
````

``` text Output
```text Output
---- Final Output ----
Second method received: Output from first_method
````
228 changes: 209 additions & 19 deletions docs/concepts/knowledge.mdx
@@ -4,8 +4,6 @@ description: What is knowledge in CrewAI and how to use it.
icon: book
---

# Using Knowledge in CrewAI

## What is Knowledge?

Knowledge in CrewAI is a powerful system that allows AI agents to access and utilize external information sources during their tasks.
@@ -36,7 +34,20 @@ CrewAI supports various types of knowledge sources out of the box:
</Card>
</CardGroup>

## Supported Knowledge Parameters

| Parameter | Type | Required | Description |
| :--------------------------- | :---------------------------------- | :------- | :---------------------------------------------------------------------------------------------------------------------------------------------------- |
| `sources` | **List[BaseKnowledgeSource]** | Yes | List of knowledge sources that provide content to be stored and queried. Can include PDF, CSV, Excel, JSON, text files, or string content. |
| `collection_name` | **str** | No | Name of the collection where the knowledge will be stored. Used to identify different sets of knowledge. Defaults to "knowledge" if not provided. |
| `storage` | **Optional[KnowledgeStorage]** | No | Custom storage configuration for managing how the knowledge is stored and retrieved. If not provided, a default storage will be created. |
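
These parameters apply when a knowledge store is constructed directly rather than attached through the agent- or crew-level `knowledge_sources` shown below. A minimal sketch, assuming a `Knowledge` class importable from `crewai.knowledge.knowledge` that accepts the parameters in the table (the import path and constructor signature are assumptions, not confirmed by this page):

```python Code
from crewai.knowledge.knowledge import Knowledge  # assumed import path
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Build a named knowledge collection from a string source;
# `storage` is omitted, so a default storage would be created.
product_knowledge = Knowledge(
    collection_name="product_docs",
    sources=[
        StringKnowledgeSource(content="Our flagship product ships every Tuesday."),
    ],
)
```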

## Quickstart Example

<Tip>
For file-based knowledge sources, make sure to place your files in a `knowledge` directory at the root of your project.
Also, use relative paths from the `knowledge` directory when creating the source.
</Tip>
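
For reference, the layout the tip assumes looks like this (file names are placeholders):

```text
your_project/
├── knowledge/
│   ├── report.pdf    # referenced as file_paths=["report.pdf"]
│   └── notes.txt     # referenced as file_paths=["notes.txt"]
└── src/
    └── main.py
```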

Here's an example using string-based knowledge:

@@ -80,7 +91,8 @@ result = crew.kickoff(inputs={"question": "What city does John live in and how o
```


Here's another example using `CrewDoclingSource`, which is quite versatile and can handle multiple file formats, including TXT, PDF, DOCX, HTML, and more.

```python Code
from crewai import LLM, Agent, Crew, Process, Task
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
@@ -128,39 +140,217 @@ result = crew.kickoff(
)
```

## More Examples

Here are examples of how to use different types of knowledge sources:

### Text File Knowledge Source
```python
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# Create a text file knowledge source
text_source = CrewDoclingSource(
    file_paths=["document.txt", "another.txt"]
)

# Create crew with text file source on agents or crew level
agent = Agent(
    ...
    knowledge_sources=[text_source]
)

crew = Crew(
    ...
    knowledge_sources=[text_source]
)
```

### PDF Knowledge Source
```python
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

# Create a PDF knowledge source
pdf_source = PDFKnowledgeSource(
    file_paths=["document.pdf", "another.pdf"]
)

# Create crew with PDF knowledge source on agents or crew level
agent = Agent(
    ...
    knowledge_sources=[pdf_source]
)

crew = Crew(
    ...
    knowledge_sources=[pdf_source]
)
```

### CSV Knowledge Source
```python
from crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSource

# Create a CSV knowledge source
csv_source = CSVKnowledgeSource(
    file_paths=["data.csv"]
)

# Create crew with CSV knowledge source on agents or crew level
agent = Agent(
    ...
    knowledge_sources=[csv_source]
)

crew = Crew(
    ...
    knowledge_sources=[csv_source]
)
```

### Excel Knowledge Source
```python
from crewai.knowledge.source.excel_knowledge_source import ExcelKnowledgeSource

# Create an Excel knowledge source
excel_source = ExcelKnowledgeSource(
    file_paths=["spreadsheet.xlsx"]
)

# Create crew with Excel knowledge source on agents or crew level
agent = Agent(
    ...
    knowledge_sources=[excel_source]
)

crew = Crew(
    ...
    knowledge_sources=[excel_source]
)
```

### JSON Knowledge Source
```python
from crewai.knowledge.source.json_knowledge_source import JSONKnowledgeSource

# Create a JSON knowledge source
json_source = JSONKnowledgeSource(
    file_paths=["data.json"]
)

# Create crew with JSON knowledge source on agents or crew level
agent = Agent(
    ...
    knowledge_sources=[json_source]
)

crew = Crew(
    ...
    knowledge_sources=[json_source]
)
```
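
### Combining Multiple Sources

Because `knowledge_sources` is a list, the source types above can be combined on a single agent or crew. A brief sketch (file names are placeholders):

```python
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource
from crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSource

# Mix source types in one knowledge list
pdf_source = PDFKnowledgeSource(file_paths=["manual.pdf"])
csv_source = CSVKnowledgeSource(file_paths=["metrics.csv"])

crew = Crew(
    ...
    knowledge_sources=[pdf_source, csv_source]
)
```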

## Knowledge Configuration

### Chunking Configuration

Knowledge sources automatically chunk content for better processing.
You can configure chunking behavior in your knowledge sources:

```python
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

source = StringKnowledgeSource(
    content="Your content here",
    chunk_size=4000,      # Maximum size of each chunk (default: 4000)
    chunk_overlap=200     # Overlap between chunks (default: 200)
)
```

The chunking configuration helps in:
- Breaking down large documents into manageable pieces
- Maintaining context through chunk overlap
- Optimizing retrieval accuracy
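
To build intuition for how `chunk_size` and `chunk_overlap` interact, here is a standalone illustration of overlap-based character chunking. This is not CrewAI's internal splitter, only a simplified model of the idea:

```python
def simple_chunk(text: str, chunk_size: int = 4000, chunk_overlap: int = 200) -> list[str]:
    """Illustrative chunker: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so consecutive chunks share an overlap."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = simple_chunk("A" * 10_000)
print([len(c) for c in chunks])  # [4000, 4000, 2400]
```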

### Embeddings Configuration

You can also configure the embedder for the knowledge store.
This is useful if you want to use a different embedder for the knowledge store than the one used for the agents.
The `embedder` parameter supports various embedding model providers, including:
- `openai`: OpenAI's embedding models
- `google`: Google's text embedding models
- `azure`: Azure OpenAI embeddings
- `ollama`: Local embeddings with Ollama
- `vertexai`: Google Cloud VertexAI embeddings
- `cohere`: Cohere's embedding models
- `bedrock`: AWS Bedrock embeddings
- `huggingface`: Hugging Face models
- `watson`: IBM Watson embeddings

Here's an example of how to configure the embedder for the knowledge store using Google's `text-embedding-004` model:
<CodeGroup>
```python Example
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
import os

# Get the GEMINI API key
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")

# Create a knowledge source
content = "Users name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(
    content=content,
)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
gemini_llm = LLM(
    model="gemini/gemini-1.5-pro-002",
    api_key=GEMINI_API_KEY,
    temperature=0,
)

# Create an agent with the knowledge store
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="""You are a master at understanding people and their preferences.""",
    verbose=True,
    allow_delegation=False,
    llm=gemini_llm,
)

task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source],
    embedder={
        "provider": "google",
        "config": {
            "model": "models/text-embedding-004",
            "api_key": GEMINI_API_KEY,
        }
    }
)

result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})
```
```text Output
# Agent: About User
## Task: Answer the following questions about the user: What city does John live in and how old is he?
# Agent: About User
## Final Answer:
John is 30 years old and lives in San Francisco.
```
</CodeGroup>
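
The same pattern applies to the other providers listed above; only the `provider` key and its provider-specific `config` entries change. For example, a local Ollama embedder might be configured along these lines (the model name and config keys here are assumptions, not taken from this page):

```python Code
crew = Crew(
    ...
    knowledge_sources=[string_source],
    embedder={
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text",  # assumed local embedding model name
        },
    },
)
```
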
## Clearing Knowledge

If you need to clear the knowledge stored in CrewAI, you can use the `crewai reset-memories` command with the `--knowledge` option.

