Merge branch 'main' into siddhi-embedding
siddhivelankar23 committed Aug 19, 2024
2 parents 64d1ecc + 60cc0b0 commit 08fc6c6
Showing 134 changed files with 674 additions and 508 deletions.
5 changes: 4 additions & 1 deletion .github/workflows/_get-test-matrix.yml
@@ -38,7 +38,10 @@ jobs:
run: |
set -xe
if [ "${{ github.event_name }}" == "pull_request" ] || [ "${{ github.event_name }}" == "pull_request_target" ]; then
base_commit=${{ github.event.pull_request.base.sha }}
LATEST_COMMIT_SHA=$(curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" \
"https://api.github.com/repos/opea-project/GenAIComps/commits?sha=main" | jq -r '.[0].sha')
echo "Latest commit SHA is $LATEST_COMMIT_SHA"
base_commit=$LATEST_COMMIT_SHA
else
base_commit=$(git rev-parse HEAD~1) # push event
fi
1 change: 1 addition & 0 deletions .github/workflows/pr-examples-test.yml
@@ -16,6 +16,7 @@ on:
- comps/llms/text-generation/tgi/**
- comps/dataprep/redis/langchain/**
- requirements.txt
- "!**.md"

# If there is a new commit, the previous jobs will be canceled
concurrency:
2 changes: 0 additions & 2 deletions README.md
@@ -173,7 +173,6 @@ A `Microservice` can be created by using the decorator `register_microservice`.

```python
from langchain_community.embeddings import HuggingFaceHubEmbeddings
from langsmith import traceable

from comps import register_microservice, EmbedDoc, ServiceType, TextDoc

@@ -187,7 +186,6 @@ from comps import register_microservice, EmbedDoc, ServiceType, TextDoc
input_datatype=TextDoc,
output_datatype=EmbedDoc,
)
@traceable(run_type="embedding")
def embedding(input: TextDoc) -> EmbedDoc:
embed_vector = embeddings.embed_query(input.text)
res = EmbedDoc(text=input.text, embedding=embed_vector)
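For reference, a minimal sketch of invoking such a registered embedding microservice once it is running; the port (6000) and the `/v1/embeddings` route are assumptions here, and the payload simply mirrors the `TextDoc` schema shown above:

```python
# Hedged sketch: call a running embedding microservice over HTTP.
# Port 6000 and the /v1/embeddings route are assumptions for illustration;
# check the actual @register_microservice(endpoint=..., port=...) arguments.
import requests

payload = {"text": "What is OPEA?"}  # mirrors the TextDoc input schema
response = requests.post("http://localhost:6000/v1/embeddings", json=payload, timeout=30)
response.raise_for_status()
result = response.json()  # an EmbedDoc with the original text and its embedding
print(len(result["embedding"]), "dimensions for:", result["text"])
```
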
16 changes: 8 additions & 8 deletions comps/agent/langchain/README.md
@@ -4,32 +4,32 @@ The langchain agent model refers to a framework that integrates the reasoning ca

![Architecture Overview](agent_arch.jpg)

# 🚀1. Start Microservice with Python(Option 1)
## 🚀1. Start Microservice with Python(Option 1)

## 1.1 Install Requirements
### 1.1 Install Requirements

```bash
cd comps/agent/langchain/
pip install -r requirements.txt
```

## 1.2 Start Microservice with Python Script
### 1.2 Start Microservice with Python Script

```bash
cd comps/agent/langchain/
python agent.py
```

# 🚀2. Start Microservice with Docker (Option 2)
## 🚀2. Start Microservice with Docker (Option 2)

## Build Microservices
### Build Microservices

```bash
cd GenAIComps/ # back to GenAIComps/ folder
docker build -t opea/comps-agent-langchain:latest -f comps/agent/langchain/docker/Dockerfile .
```

## start microservices
### start microservices

```bash
export ip_address=$(hostname -I | awk '{print $1}')
@@ -56,7 +56,7 @@ docker logs comps-langchain-agent-endpoint
> docker run --rm --runtime=runc --name="comps-langchain-agent-endpoint" -v ./comps/agent/langchain/:/home/user/comps/agent/langchain/ -p 9090:9090 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} --env-file ${agent_env} opea/comps-agent-langchain:latest
> ```
# 🚀3. Validate Microservice
## 🚀3. Validate Microservice
Once the microservice starts, users can use the script below to invoke it.
@@ -73,7 +73,7 @@ data: [DONE]
```
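
For reference, a minimal sketch of that validation call (the actual script is collapsed in this diff); port 9090 comes from the docker run command above, while the `/v1/chat/completions` route and the payload fields are assumptions:

```python
# Hedged sketch: stream the agent microservice's server-sent events.
# The route and payload shape are assumptions; only port 9090 is taken
# from the docker run command shown earlier in this README.
import requests

url = "http://localhost:9090/v1/chat/completions"
payload = {"query": "What is OPEA?"}

with requests.post(url, json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(line)  # the stream ends with "data: [DONE]"
```
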
# 🚀4. Provide your own tools
## 🚀4. Provide your own tools

- Define tools

1 change: 0 additions & 1 deletion comps/agent/langchain/requirements.txt
@@ -11,7 +11,6 @@ langchain-openai
langchain_community
langchainhub
langgraph
langsmith
numpy

# used by cloud native
24 changes: 12 additions & 12 deletions comps/asr/README.md
@@ -2,17 +2,17 @@

The ASR (Audio-Speech-Recognition) microservice helps users convert speech to text. When building a talking bot with an LLM, users need to convert their audio inputs (what they say, or input audio from other sources) to text so the LLM can tokenize the text and generate an answer. This microservice is built for that conversion stage.

# 🚀1. Start Microservice with Python (Option 1)
## 🚀1. Start Microservice with Python (Option 1)

To start the ASR microservice with Python, you need to first install python packages.

## 1.1 Install Requirements
### 1.1 Install Requirements

```bash
pip install -r requirements.txt
```

## 1.2 Start Whisper Service/Test
### 1.2 Start Whisper Service/Test

- Xeon CPU

@@ -40,7 +40,7 @@ nohup python whisper_server.py --device=hpu &
python check_whisper_server.py
```

## 1.3 Start ASR Service/Test
### 1.3 Start ASR Service/Test

```bash
cd ../
@@ -54,13 +54,13 @@ While the Whisper service is running, you can start the ASR service. If the ASR
{'id': '0e686efd33175ce0ebcf7e0ed7431673', 'text': 'who is pat gelsinger'}
```

# 🚀2. Start Microservice with Docker (Option 2)
## 🚀2. Start Microservice with Docker (Option 2)

Alternatively, you can also start the ASR microservice with Docker.

## 2.1 Build Images
### 2.1 Build Images

### 2.1.1 Whisper Server Image
#### 2.1.1 Whisper Server Image

- Xeon CPU

@@ -76,15 +76,15 @@ cd ../..
docker build -t opea/whisper-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile_hpu .
```

### 2.1.2 ASR Service Image
#### 2.1.2 ASR Service Image

```bash
docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/Dockerfile .
```

## 2.2 Start Whisper and ASR Service
### 2.2 Start Whisper and ASR Service

### 2.2.1 Start Whisper Server
#### 2.2.1 Start Whisper Server

- Xeon

@@ -98,15 +98,15 @@ docker run -p 7066:7066 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$htt
docker run -p 7066:7066 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy opea/whisper-gaudi:latest
```

### 2.2.2 Start ASR service
#### 2.2.2 Start ASR service

```bash
ip_address=$(hostname -I | awk '{print $1}')

docker run -d -p 9099:9099 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e ASR_ENDPOINT=http://$ip_address:7066 opea/asr:latest
```

### 2.2.3 Test
#### 2.2.3 Test

```bash
# Use curl or python
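For reference, a hedged sketch of the test step (the curl/python command is collapsed in this diff); port 9099 matches the docker run command above, while the route, the `byte_str` field, and the sample file name are assumptions:

```python
# Hedged sketch: send base64-encoded audio to the ASR microservice.
# Port 9099 is from the docker run command above; the route and the
# "byte_str" field name are assumptions, and sample.wav is a placeholder.
import base64
import requests

with open("sample.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:9099/v1/audio/transcriptions",
    json={"byte_str": audio_b64},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # e.g. {'id': '...', 'text': 'who is pat gelsinger'}
```
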
8 changes: 4 additions & 4 deletions comps/chathistory/mongo/README.md
@@ -17,16 +17,16 @@ export DB_NAME=${DB_NAME}
export COLLECTION_NAME=${COLLECTION_NAME}
```

# 🚀Start Microservice with Docker
## 🚀Start Microservice with Docker

## Build Docker Image
### Build Docker Image

```bash
cd ../../../../
docker build -t opea/chathistory-mongo-server:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/chathistory/mongo/docker/Dockerfile .
```

## Run Docker with CLI
### Run Docker with CLI

- Run mongoDB image

@@ -40,7 +40,7 @@ docker run -d -p 27017:27017 --name=mongo mongo:latest
docker run -d --name="chathistory-mongo-server" -p 6013:6013 -p 6012:6012 -p 6014:6014 -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy -e MONGO_HOST=${MONGO_HOST} -e MONGO_PORT=${MONGO_PORT} -e DB_NAME=${DB_NAME} -e COLLECTION_NAME=${COLLECTION_NAME} opea/chathistory-mongo-server:latest
```

# Invoke Microservice
## Invoke Microservice

Once the chathistory service is up and running, users can update the database by using the API endpoint below. The API returns a unique UUID for the saved conversation.

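For reference, a hedged sketch of saving a conversation; port 6012 is one of the ports published in the docker run command above, while the route and payload fields are assumptions, so check the service's actual API:

```python
# Hedged sketch: store a conversation in the chat-history microservice.
# The /v1/chathistory/create route and the payload shape are assumptions;
# port 6012 is one of the ports published by the docker run command above.
import requests

payload = {"data": {"messages": "test message", "user": "test_user"}}
resp = requests.post(
    "http://localhost:6012/v1/chathistory/create",
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print("saved conversation id:", resp.text)  # the API returns a unique UUID
```
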
10 changes: 5 additions & 5 deletions comps/dataprep/README.md
@@ -17,22 +17,22 @@ Occasionally unstructured data will contain image data, to convert the image dat
export SUMMARIZE_IMAGE_VIA_LVM=1
```

# Dataprep Microservice with Redis
## Dataprep Microservice with Redis

For details, please refer to this [readme](redis/README.md)

# Dataprep Microservice with Milvus
## Dataprep Microservice with Milvus

For details, please refer to this [readme](milvus/README.md)

# Dataprep Microservice with Qdrant
## Dataprep Microservice with Qdrant

For details, please refer to this [readme](qdrant/README.md)

# Dataprep Microservice with Pinecone
## Dataprep Microservice with Pinecone

For details, please refer to this [readme](pinecone/README.md)

# Dataprep Microservice with PGVector
## Dataprep Microservice with PGVector

For details, please refer to this [readme](pgvector/README.md)
18 changes: 9 additions & 9 deletions comps/dataprep/milvus/README.md
@@ -1,8 +1,8 @@
# Dataprep Microservice with Milvus

# 🚀Start Microservice with Python
## 🚀Start Microservice with Python

## Install Requirements
### Install Requirements

```bash
pip install -r requirements.txt
@@ -11,11 +11,11 @@ apt-get install libtesseract-dev -y
apt-get install poppler-utils -y
```

## Start Milvus Server
### Start Milvus Server

Please refer to this [readme](../../../vectorstores/langchain/milvus/README.md).

## Setup Environment Variables
### Setup Environment Variables

```bash
export no_proxy=${your_no_proxy}
@@ -27,30 +27,30 @@ export COLLECTION_NAME=${your_collection_name}
export MOSEC_EMBEDDING_ENDPOINT=${your_embedding_endpoint}
```

## Start Document Preparation Microservice for Milvus with Python Script
### Start Document Preparation Microservice for Milvus with Python Script

Start document preparation microservice for Milvus with below command.

```bash
python prepare_doc_milvus.py
```

# 🚀Start Microservice with Docker
## 🚀Start Microservice with Docker

## Build Docker Image
### Build Docker Image

```bash
cd ../../../../
docker build -t opea/dataprep-milvus:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy --build-arg no_proxy=$no_proxy -f comps/dataprep/milvus/docker/Dockerfile .
```

## Run Docker with CLI
### Run Docker with CLI

```bash
docker run -d --name="dataprep-milvus-server" -p 6010:6010 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy -e MOSEC_EMBEDDING_ENDPOINT=${your_embedding_endpoint} -e MILVUS=${your_milvus_host_ip} opea/dataprep-milvus:latest
```

# Invoke Microservice
## Invoke Microservice

Once the document preparation microservice for Milvus is started, users can use the command below to invoke the microservice, which converts the document to embeddings and saves them to the database.

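For reference, a sketch of invoking the ingestion endpoint; the `/v1/dataprep` route, port 6010, and the `files` form field match the `register_microservice` call in `prepare_doc_milvus.py` later in this diff, while the sample file name is a placeholder:

```python
# Sketch: upload a document to the Milvus dataprep microservice for ingestion.
# Route /v1/dataprep, port 6010 and the "files" multipart field come from
# prepare_doc_milvus.py; sample.pdf is just a placeholder file name.
import requests

url = "http://localhost:6010/v1/dataprep"
with open("sample.pdf", "rb") as f:
    files = {"files": ("sample.pdf", f, "application/pdf")}
    resp = requests.post(url, files=files, timeout=300)

resp.raise_for_status()
print(resp.json())
```

A list of URLs can also be ingested through the same endpoint via the `link_list` form field shown in `prepare_doc_milvus.py`.
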
4 changes: 0 additions & 4 deletions comps/dataprep/milvus/prepare_doc_milvus.py
@@ -22,7 +22,6 @@
from langchain_core.documents import Document
from langchain_milvus.vectorstores import Milvus
from langchain_text_splitters import HTMLHeaderTextSplitter
from langsmith import traceable
from pyspark import SparkConf, SparkContext

from comps import DocPath, opea_microservices, register_microservice
@@ -167,7 +166,6 @@ async def ingest_link_to_milvus(link_list: List[str]):


@register_microservice(name="opea_service@prepare_doc_milvus", endpoint="/v1/dataprep", host="0.0.0.0", port=6010)
@traceable(run_type="tool")
async def ingest_documents(
files: Optional[Union[UploadFile, List[UploadFile]]] = File(None),
link_list: Optional[str] = Form(None),
@@ -239,7 +237,6 @@ def process_files_wrapper(files):
@register_microservice(
name="opea_service@prepare_doc_milvus_file", endpoint="/v1/dataprep/get_file", host="0.0.0.0", port=6011
)
@traceable(run_type="tool")
async def rag_get_file_structure():
print("[ dataprep - get file ] start to get file structure")

@@ -270,7 +267,6 @@ def delete_by_partition_field(my_milvus, partition_field):
@register_microservice(
name="opea_service@prepare_doc_milvus_del", endpoint="/v1/dataprep/delete_file", host="0.0.0.0", port=6012
)
@traceable(run_type="tool")
async def delete_single_file(file_path: str = Body(..., embed=True)):
"""Delete file according to `file_path`.
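For reference, a sketch of calling the delete endpoint registered above; `Body(..., embed=True)` means the file path must be wrapped in a JSON object, and the file name here is a placeholder:

```python
# Sketch: delete a previously ingested file via the dataprep delete endpoint.
# Route /v1/dataprep/delete_file and port 6012 come from the
# register_microservice call above; sample.pdf is a placeholder file name.
import requests

resp = requests.post(
    "http://localhost:6012/v1/dataprep/delete_file",
    json={"file_path": "sample.pdf"},  # embed=True wraps the value in an object
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```
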
1 change: 0 additions & 1 deletion comps/dataprep/milvus/requirements.txt
@@ -9,7 +9,6 @@ langchain
langchain-community
langchain-text-splitters
langchain_milvus
langsmith
markdown
numpy
openai