Commit 0049288: update README

guokan-shang committed Apr 12, 2023
1 parent 5a1b5d3 commit 0049288

Showing 2 changed files with 58 additions and 20 deletions.
5 changes: 1 addition & 4 deletions .envdefault
@@ -2,10 +2,7 @@
APP_LANG=fr en
ASSETS_PATH_ON_HOST=./assets
ASSETS_PATH_IN_CONTAINER=/app/assets
LM_MAP = "{
'fr': 'sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2',
'en': 'sentence-transformers/all-MiniLM-L6-v2'
}"
LM_MAP={"fr":"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2","en":"sentence-transformers/all-MiniLM-L6-v2"}

# SERVING PARAMETERS
SERVICE_MODE=http
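
Since `LM_MAP` is now a single-line JSON string, a service can parse it straight from the environment. A minimal sketch of such parsing (illustrative only, not this repository's actual loading code):

```python
import json
import os

# Read the language -> model mapping from the environment;
# the default mirrors the value in .envdefault.
lm_map = json.loads(os.environ.get(
    "LM_MAP",
    '{"fr":"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",'
    '"en":"sentence-transformers/all-MiniLM-L6-v2"}'
))

# APP_LANG is a space-separated list, e.g. "fr en"
for lang in os.environ.get("APP_LANG", "fr en").split():
    print(lang, "->", lm_map[lang])
```
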
73 changes: 57 additions & 16 deletions README.md
@@ -3,12 +3,9 @@
## Description
This repository is for building a Docker image for LinTO's NLP service: Keyphrase Extraction, on the basis of [linto-platform-nlp-core](https://github.com/linto-ai/linto-platform-nlp-core). It can be deployed along with the [LinTO stack](https://github.com/linto-ai/linto-platform-stack) or in a standalone way (see the Develop section below).

LinTO's NLP services adopt the basic design concept of spaCy: [component and pipeline](https://spacy.io/usage/processing-pipelines). Components (located under the folder `components/`) are decoupled from the service and can be easily re-used in other spaCy projects; components are organised into pipelines to realise specific NLP tasks.

This service can be launched in two ways: REST API and Celery task, with and without GPU support.

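To illustrate the component-and-pipeline idea with a generic spaCy example (a sketch of the mechanism, not this repository's actual `kpe` component):

```python
import spacy
from spacy.language import Language

# Registering a component makes it reusable: any spaCy pipeline
# can add it by name, independently of the serving code.
@Language.component("token_counter")
def token_counter(doc):
    # An illustrative processing step; a real component such as kpe
    # would attach its results to the Doc instead of printing.
    print(f"{len(doc)} tokens")
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("token_counter")
nlp("LinTO services organise such components into task-specific pipelines.")
```
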
## Usage

@@ -29,14 +26,22 @@ bash scripts/download_models.sh

2 Configure the runtime environment variables
```bash
cp .envdefault .env
```

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `APP_LANG` | A space-separated list of supported languages for the application | fr en |
| `ASSETS_PATH_ON_HOST` | The path to the assets folder on the host machine | ./assets |
| `ASSETS_PATH_IN_CONTAINER` | The volume mount point of models in the container | /app/assets |
| `LM_MAP` | A JSON string that maps each supported language to its corresponding language model | {"fr":"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2","en":"sentence-transformers/all-MiniLM-L6-v2"} |
| `SERVICE_MODE` | The mode in which the service is served, either "http" (REST API) or "task" (Celery task) | "http" |
| `CONCURRENCY` | The maximum number of requests that can be handled concurrently | 1 |
| `USE_GPU` | A flag indicating whether to use GPU for computation or not, either "True" or "False" | True |
| `SERVICE_NAME` | The name of the micro-service | kpe |
| `SERVICES_BROKER` | The URL of the broker server used for communication between micro-services | "redis://localhost:6379" |
| `BROKER_PASS` | The password for accessing the broker server | None |
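
For example, a `.env` for an English-only, CPU-only HTTP deployment could look like this (values assembled from the table above):

```bash
APP_LANG=en
ASSETS_PATH_ON_HOST=./assets
ASSETS_PATH_IN_CONTAINER=/app/assets
LM_MAP={"en":"sentence-transformers/all-MiniLM-L6-v2"}
SERVICE_MODE=http
CONCURRENCY=1
USE_GPU=False
```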

4 Build image
```bash
sudo docker build --tag lintoai/linto-platform-nlp-keyphrase-extraction:latest .
@@ -52,22 +57,29 @@ sudo docker run --gpus all \
--rm -p 80:80 \
-v $PWD/assets:/app/assets:ro \
--env-file .env \
lintoai/linto-platform-nlp-keyphrase-extraction:latest
```
<details>
<summary>Running with CPU only</summary>

- remove `--gpus all` from the first command.
- set `USE_GPU=False` in the `.env`.
</details>

or

```bash
sudo docker-compose up
```
<details>
<summary>Running with CPU only</summary>

- remove `runtime: nvidia` from the `docker-compose.yml` file.
- set `USE_GPU=False` in the `.env`.
</details>
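
For reference, a GPU-enabled `docker-compose.yml` for this service would follow this general shape (a sketch assembled from the variables above; the file shipped in the repository is authoritative):

```yaml
version: "3"
services:
  kpe:
    image: lintoai/linto-platform-nlp-keyphrase-extraction:latest
    runtime: nvidia   # remove for CPU only, as noted above
    ports:
      - "80:80"
    env_file: .env
    volumes:
      - ./assets:/app/assets:ro
```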


6 If running under `SERVICE_MODE=http`, navigate to `http://localhost/docs` or `http://localhost/redoc` in your browser to explore the REST API interactively. See the examples for how to query the API. If running under `SERVICE_MODE=task`, please refer to the dedicated section at the end of this README.
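
As a quick smoke test of the HTTP mode, a request might look like the following (a sketch: the exact request schema should be taken from the interactive docs; the body shape here mirrors the Celery example at the end of this README and is an assumption):

```bash
curl -X POST "http://localhost/kpe/en" \
  -H "Content-Type: application/json" \
  -d '{"articles": [{"text": "Apple Inc. is an American multinational technology company."}], "component_cfg": {"kpe": {"top_n": 3}}}'
```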


## Specification for `http://localhost/kpe/{lang}`
@@ -198,4 +210,33 @@ Component's config can be modified in [`components/config.cfg`](components/config.cfg)
```

### Advanced usage
For advanced usage, such as Max Sum Similarity and Maximal Marginal Relevance for diversifying extraction results, please refer to the [KeyBERT documentation](https://maartengr.github.io/KeyBERT/guides/quickstart.html#usage) and this [medium post](https://towardsdatascience.com/keyword-extraction-with-bert-724efca412ea) to see how it works.
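
As a standalone illustration of those two diversification strategies with plain KeyBERT (outside this service, using the same English model as above):

```python
from keybert import KeyBERT

doc = ("Unsupervised learning is a type of machine learning in which the algorithm "
       "is not provided with any pre-assigned labels or scores for the training data.")

kw_model = KeyBERT(model="sentence-transformers/all-MiniLM-L6-v2")

# Maximal Marginal Relevance: trades off relevance against diversity (0..1)
print(kw_model.extract_keywords(doc, use_mmr=True, diversity=0.7, top_n=3))

# Max Sum Similarity: picks the least mutually similar combination
# among the nr_candidates most relevant candidates
print(kw_model.extract_keywords(doc, use_maxsum=True, nr_candidates=20, top_n=3))
```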


## Testing Celery mode locally
1 Install Redis on your local machine, and run it with:
```bash
redis-server --protected-mode no --bind 0.0.0.0 --loglevel debug
```

2 Make sure these two variables are set correctly in your `.env`: `SERVICE_MODE=task` and `SERVICES_BROKER=redis://172.17.0.1:6379` (172.17.0.1 is typically the address of the Docker bridge, which lets the container reach the Redis server running on the host).

Then start your docker container with either `docker run` or `docker-compose up` as shown in the previous section.

3 On your local computer, run this Python script:
```python
from celery import Celery

# Point the client at the same Redis instance the service uses as broker/backend
celery = Celery(broker='redis://localhost:6379/0', backend='redis://localhost:6379/1')

# Submit a task to the 'kpe' queue; the arguments are
# (language, list of texts, per-component configuration)
r = celery.send_task(
    'kpe_task',
    (
        'en',
        [
            "Apple Inc. is an American multinational technology company that specializes in consumer electronics, computer software and online services.",
            "Unsupervised learning is a type of machine learning in which the algorithm is not provided with any pre-assigned labels or scores for the training data. As a result, unsupervised learning algorithms must first self-discover any naturally occurring patterns in that training data set."
        ],
        {"kpe": {"top_n": 3}}
    ),
    queue='kpe',
)

# Block until the worker returns the extracted keyphrases
r.get()
```
