diff --git a/comps/dataprep/vdms/README.md b/comps/dataprep/vdms/README.md
index 7c4d8e86f..132a8816b 100644
--- a/comps/dataprep/vdms/README.md
+++ b/comps/dataprep/vdms/README.md
@@ -6,9 +6,9 @@ For dataprep microservice, we currently provide one framework: `Langchain`.

 We organized the folders in the same way, so you can use either framework for dataprep microservice with the following constructions.

-# 🚀1. Start Microservice with Python (Option 1)
+## 🚀1. Start Microservice with Python (Option 1)

-## 1.1 Install Requirements
+### 1.1 Install Requirements

 Install Single-process version (for 1-10 files processing)

@@ -25,11 +25,11 @@ pip install -r requirements.txt
 cd langchain_ray; pip install -r requirements_ray.txt
 ``` -->

-## 1.2 Start VDMS Server
+### 1.2 Start VDMS Server

-Please refer to this [readme](../../vectorstores/vdms/README.md).
+Refer to this [readme](../../vectorstores/vdms/README.md).

-## 1.3 Setup Environment Variables
+### 1.3 Setup Environment Variables

 ```bash
 export http_proxy=${your_http_proxy}
@@ -40,7 +40,7 @@ export COLLECTION_NAME=${your_collection_name}
 export PYTHONPATH=${path_to_comps}
 ```

-## 1.4 Start Document Preparation Microservice for VDMS with Python Script
+### 1.4 Start Document Preparation Microservice for VDMS with Python Script

 Start document preparation microservice for VDMS with below command.

@@ -56,13 +56,13 @@ python prepare_doc_vdms.py
 python prepare_doc_redis_on_ray.py
 ``` -->

-# 🚀2. Start Microservice with Docker (Option 2)
+## 🚀2. Start Microservice with Docker (Option 2)

-## 2.1 Start VDMS Server
+### 2.1 Start VDMS Server

-Please refer to this [readme](../../vectorstores/vdms/README.md).
+Refer to this [readme](../../vectorstores/vdms/README.md).

-## 2.2 Setup Environment Variables
+### 2.2 Setup Environment Variables

 ```bash
 export http_proxy=${your_http_proxy}
@@ -76,16 +76,16 @@ export DISTANCE_STRATEGY="L2"
 export PYTHONPATH=${path_to_comps}
 ```

-## 2.3 Build Docker Image
+### 2.3 Build Docker Image
 - Build docker image with langchain
-Start single-process version (for 1-10 files processing)
+  Start single-process version (for 1-10 files processing)

-```bash
-cd ../../../
-docker build -t opea/dataprep-vdms:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/vdms/langchain/Dockerfile .
-```
+  ```bash
+  cd ../../../
+  docker build -t opea/dataprep-vdms:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/vdms/langchain/Dockerfile .
+  ```

-## 2.4 Run Docker with CLI
+### 2.4 Run Docker with CLI

 Start single-process version (for 1-10 files processing)

@@ -113,13 +113,13 @@ docker run -d --name="dataprep-vdms-server" -p 6007:6007 --runtime=runc --ipc=ho
 -e TIMEOUT_SECONDS=600 opea/dataprep-on-ray-vdms:latest
 ``` -->

-# 🚀3. Status Microservice
+## 🚀3. Status Microservice

 ```bash
 docker container logs -f dataprep-vdms-server
 ```

-# 🚀4. Consume Microservice
+## 🚀4. Consume Microservice

 Once document preparation microservice for VDMS is started, user can use below command to invoke the microservice to convert the document to embedding and save to the database.

@@ -127,61 +127,61 @@ Make sure the file path after `files=@` is correct.

 - Single file upload

-```bash
-curl -X POST \
-    -H "Content-Type: multipart/form-data" \
-    -F "files=@./file1.txt" \
-    http://localhost:6007/v1/dataprep
-```
+  ```bash
+  curl -X POST \
+      -H "Content-Type: multipart/form-data" \
+      -F "files=@./file1.txt" \
+      http://localhost:6007/v1/dataprep
+  ```

-You can specify chunk_size and chunk_size by the following commands.
+  You can specify `chunk_size` and `chunk_overlap` by the following commands.

-```bash
-curl -X POST \
-    -H "Content-Type: multipart/form-data" \
-    -F "files=@./LLAMA2_page6.pdf" \
-    -F "chunk_size=1500" \
-    -F "chunk_overlap=100" \
-    http://localhost:6007/v1/dataprep
-```
+  ```bash
+  curl -X POST \
+      -H "Content-Type: multipart/form-data" \
+      -F "files=@./LLAMA2_page6.pdf" \
+      -F "chunk_size=1500" \
+      -F "chunk_overlap=100" \
+      http://localhost:6007/v1/dataprep
+  ```

 - Multiple file upload

-```bash
-curl -X POST \
-    -H "Content-Type: multipart/form-data" \
-    -F "files=@./file1.txt" \
-    -F "files=@./file2.txt" \
-    -F "files=@./file3.txt" \
-    http://localhost:6007/v1/dataprep
-```
-
-- Links upload (not supported for llama_index now)
-
-```bash
-curl -X POST \
-    -F 'link_list=["https://www.ces.tech/"]' \
-    http://localhost:6007/v1/dataprep
-```
-
-or
-
-```python
-import requests
-import json
-
-proxies = {"http": ""}
-url = "http://localhost:6007/v1/dataprep"
-urls = [
-    "https://towardsdatascience.com/no-gpu-no-party-fine-tune-bert-for-sentiment-analysis-with-vertex-ai-custom-jobs-d8fc410e908b?source=rss----7f60cf5620c9---4"
-]
-payload = {"link_list": json.dumps(urls)}
-
-try:
-    resp = requests.post(url=url, data=payload, proxies=proxies)
-    print(resp.text)
-    resp.raise_for_status()  # Raise an exception for unsuccessful HTTP status codes
-    print("Request successful!")
-except requests.exceptions.RequestException as e:
-    print("An error occurred:", e)
-```
+  ```bash
+  curl -X POST \
+      -H "Content-Type: multipart/form-data" \
+      -F "files=@./file1.txt" \
+      -F "files=@./file2.txt" \
+      -F "files=@./file3.txt" \
+      http://localhost:6007/v1/dataprep
+  ```
+
+- Links upload (not supported for `llama_index` now)
+
+  ```bash
+  curl -X POST \
+      -F 'link_list=["https://www.ces.tech/"]' \
+      http://localhost:6007/v1/dataprep
+  ```
+
+  or
+
+  ```python
+  import requests
+  import json
+
+  proxies = {"http": ""}
+  url = "http://localhost:6007/v1/dataprep"
+  urls = [
+      "https://towardsdatascience.com/no-gpu-no-party-fine-tune-bert-for-sentiment-analysis-with-vertex-ai-custom-jobs-d8fc410e908b?source=rss----7f60cf5620c9---4"
+  ]
+  payload = {"link_list": json.dumps(urls)}
+
+  try:
+      resp = requests.post(url=url, data=payload, proxies=proxies)
+      print(resp.text)
+      resp.raise_for_status()  # Raise an exception for unsuccessful HTTP status codes
+      print("Request successful!")
+  except requests.exceptions.RequestException as e:
+      print("An error occurred:", e)
+  ```
diff --git a/comps/dataprep/vdms/multimodal_langchain/README.md b/comps/dataprep/vdms/multimodal_langchain/README.md
index 0b5b721fa..2d86c28b1 100644
--- a/comps/dataprep/vdms/multimodal_langchain/README.md
+++ b/comps/dataprep/vdms/multimodal_langchain/README.md
@@ -2,25 +2,25 @@

 For dataprep microservice, we currently provide one framework: `Langchain`.

-# 🚀1. Start Microservice with Python (Option 1)
+## 🚀1. Start Microservice with Python (Option 1)

-## 1.1 Install Requirements
+### 1.1 Install Requirements

 - option 1: Install Single-process version (for 1-10 files processing)

-```bash
-apt-get update
-apt-get install -y default-jre tesseract-ocr libtesseract-dev poppler-utils
-pip install -r requirements.txt
-```
+  ```bash
+  apt-get update
+  apt-get install -y default-jre tesseract-ocr libtesseract-dev poppler-utils
+  pip install -r requirements.txt
+  ```

-## 1.2 Start VDMS Server
+### 1.2 Start VDMS Server

 ```bash
 docker run -d --name="vdms-vector-db" -p 55555:55555 intellabs/vdms:latest
 ```

-## 1.3 Setup Environment Variables
+### 1.3 Setup Environment Variables

 ```bash
 export http_proxy=${your_http_proxy}
@@ -33,7 +33,7 @@ export your_hf_api_token="{your_hf_token}"
 export PYTHONPATH=${path_to_comps}
 ```

-## 1.4 Start Data Preparation Microservice for VDMS with Python Script
+### 1.4 Start Data Preparation Microservice for VDMS with Python Script

 Start document preparation microservice for VDMS with below command.

@@ -41,15 +41,15 @@ Start document preparation microservice for VDMS with below command.
 python ingest_videos.py
 ```

-# 🚀2. Start Microservice with Docker (Option 2)
+## 🚀2. Start Microservice with Docker (Option 2)

-## 2.1 Start VDMS Server
+### 2.1 Start VDMS Server

 ```bash
 docker run -d --name="vdms-vector-db" -p 55555:55555 intellabs/vdms:latest
 ```

-## 2.1 Setup Environment Variables
+### 2.1 Setup Environment Variables

 ```bash
 export http_proxy=${your_http_proxy}
@@ -61,29 +61,29 @@ export INDEX_NAME="rag-vdms"
 export your_hf_api_token="{your_hf_token}"
 ```

-## 2.3 Build Docker Image
+### 2.3 Build Docker Image

 - Build docker image

-```bash
-cd ../../../
-    docker build -t opea/dataprep-vdms:latest --network host --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/vdms/multimodal_langchain/Dockerfile .
+  ```bash
+  cd ../../../
+      docker build -t opea/dataprep-vdms:latest --network host --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/vdms/multimodal_langchain/Dockerfile .

-```
+  ```

-## 2.4 Run Docker Compose
+### 2.4 Run Docker Compose

 ```bash
 docker compose -f comps/dataprep/vdms/multimodal_langchain/docker-compose-dataprep-vdms.yaml up -d
 ```

-# 🚀3. Status Microservice
+## 🚀3. Status Microservice

 ```bash
 docker container logs -f dataprep-vdms-server
 ```

-# 🚀4. Consume Microservice
+## 🚀4. Consume Microservice

 Once data preparation microservice for VDMS is started, user can use below command to invoke the microservice to convert the videos to embedding and save to the database.

@@ -91,34 +91,34 @@ Make sure the file path after `files=@` is correct.

 - Single file upload

-```bash
-curl -X POST \
-    -H "Content-Type: multipart/form-data" \
-    -F "files=@./file1.mp4" \
-    http://localhost:6007/v1/dataprep
-```
+  ```bash
+  curl -X POST \
+      -H "Content-Type: multipart/form-data" \
+      -F "files=@./file1.mp4" \
+      http://localhost:6007/v1/dataprep
+  ```

 - Multiple file upload

-```bash
-curl -X POST \
-    -H "Content-Type: multipart/form-data" \
-    -F "files=@./file1.mp4" \
-    -F "files=@./file2.mp4" \
-    -F "files=@./file3.mp4" \
-    http://localhost:6007/v1/dataprep
-```
+  ```bash
+  curl -X POST \
+      -H "Content-Type: multipart/form-data" \
+      -F "files=@./file1.mp4" \
+      -F "files=@./file2.mp4" \
+      -F "files=@./file3.mp4" \
+      http://localhost:6007/v1/dataprep
+  ```

 - List of uploaded files

-```bash
-curl -X GET http://localhost:6007/v1/dataprep/get_videos
-```
+  ```bash
+  curl -X GET http://localhost:6007/v1/dataprep/get_videos
+  ```

 - Download uploaded files

-Please use the file name from the list
+  Use the file name from the list

-```bash
-curl -X GET http://localhost:6007/v1/dataprep/get_file/${filename}
-```
+  ```bash
+  curl -X GET http://localhost:6007/v1/dataprep/get_file/${filename}
+  ```
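
The multi-file `curl` uploads documented in the READMEs touched by this diff can also be driven from Python. The following is a minimal sketch, not part of the patch, assuming the `requests` package (already used by the README's link-upload example) and the `http://localhost:6007/v1/dataprep` endpoint from the `curl` commands. The helper names `build_form` and `upload` are hypothetical, and the 60-second timeout is an arbitrary assumption.

```python
from pathlib import Path

# Endpoint copied from the curl examples in the README; adjust for your deployment.
DATAPREP_URL = "http://localhost:6007/v1/dataprep"


def build_form(paths, chunk_size=None, chunk_overlap=None):
    """Build the multipart parts for a dataprep upload (hypothetical helper).

    Returns (files, data):
      files: one ("files", (filename, bytes)) tuple per path, which mirrors
             repeating `-F "files=@./..."` on the curl command line.
      data:  optional form fields, mirroring `-F "chunk_size=..."` and
             `-F "chunk_overlap=..."`.
    """
    files = [("files", (Path(p).name, Path(p).read_bytes())) for p in paths]
    data = {}
    if chunk_size is not None:
        data["chunk_size"] = str(chunk_size)
    if chunk_overlap is not None:
        data["chunk_overlap"] = str(chunk_overlap)
    return files, data


def upload(paths, chunk_size=None, chunk_overlap=None, url=DATAPREP_URL, timeout=60):
    """POST the files to the dataprep endpoint and return the response body.

    The timeout default is an assumption, not a value taken from the README.
    """
    import requests  # third-party; imported lazily so build_form works without it

    files, data = build_form(paths, chunk_size, chunk_overlap)
    # Content-Type is left unset on purpose: requests generates the
    # multipart/form-data header, including the boundary, when `files` is given.
    resp = requests.post(url, files=files, data=data, timeout=timeout)
    resp.raise_for_status()  # raise on 4xx/5xx responses
    return resp.text
```

Under these assumptions, `upload(["./file1.txt", "./file2.txt"], chunk_size=1500, chunk_overlap=100)` would send the same form fields as the multi-file `curl` command combined with the chunking parameters. Whether the service accepts chunking parameters together with several files is not confirmed by the README.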