Commit 0049288: update README

guokan-shang committed Apr 12, 2023
1 parent 5a1b5d3 commit 0049288

Showing 2 changed files with 58 additions and 20 deletions.
5 changes: 1 addition & 4 deletions .envdefault
@@ -2,10 +2,7 @@
APP_LANG=fr en
ASSETS_PATH_ON_HOST=./assets
ASSETS_PATH_IN_CONTAINER=/app/assets
LM_MAP = "{
'fr': 'sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2',
'en': 'sentence-transformers/all-MiniLM-L6-v2'
}"
LM_MAP={"fr":"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2","en":"sentence-transformers/all-MiniLM-L6-v2"}

# SERVING PARAMETERS
SERVICE_MODE=http
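
Since `LM_MAP` is now a single-line JSON string, a service can parse it straight from the environment. A minimal sketch of such parsing (illustrative only, not this repository's actual loading code):

```python
import json
import os

# Read the language -> model mapping from the environment;
# the default mirrors the value in .envdefault.
lm_map = json.loads(os.environ.get(
    "LM_MAP",
    '{"fr":"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",'
    '"en":"sentence-transformers/all-MiniLM-L6-v2"}'
))

# APP_LANG is a space-separated list, e.g. "fr en"
for lang in os.environ.get("APP_LANG", "fr en").split():
    print(lang, "->", lm_map[lang])
```
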
73 changes: 57 additions & 16 deletions README.md
@@ -3,12 +3,9 @@
## Description
This repository is for building a Docker image for LinTO's NLP service: Keyphrase Extraction, on the basis of [linto-platform-nlp-core](https://github.com/linto-ai/linto-platform-nlp-core). It can be deployed along with the [LinTO stack](https://github.com/linto-ai/linto-platform-stack) or in a standalone way (see the Develop section below).

LinTO's NLP services adopt the basic design concept of spaCy: [component and pipeline](https://spacy.io/usage/processing-pipelines). Components (located under the folder `components/`) are decoupled from the service and can be easily re-used in other spaCy projects; components are organised into pipelines to realise specific NLP tasks.

This service can be launched in two ways: REST API and Celery task, with and without GPU support.

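To illustrate the component-and-pipeline idea with a generic spaCy example (a sketch of the mechanism, not this repository's actual `kpe` component):

```python
import spacy
from spacy.language import Language

# Registering a component makes it reusable: any spaCy pipeline
# can add it by name, independently of the serving code.
@Language.component("token_counter")
def token_counter(doc):
    # An illustrative processing step; a real component such as kpe
    # would attach its results to the Doc instead of printing.
    print(f"{len(doc)} tokens")
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("token_counter")
nlp("LinTO services organise such components into task-specific pipelines.")
```
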
## Usage

@@ -29,14 +26,22 @@ bash scripts/download_models.sh

2 Configure the runtime environment variables
```bash
cp .envdefault .env
```

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `APP_LANG` | A space-separated list of supported languages for the application | fr en |
| `ASSETS_PATH_ON_HOST` | The path to the assets folder on the host machine | ./assets |
| `ASSETS_PATH_IN_CONTAINER` | The volume mount point of models in the container | /app/assets |
| `LM_MAP` | A JSON string that maps each supported language to its corresponding language model | {"fr":"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2","en":"sentence-transformers/all-MiniLM-L6-v2"} |
| `SERVICE_MODE` | The mode in which the service is served, either "http" (REST API) or "task" (Celery task) | "http" |
| `CONCURRENCY` | The maximum number of requests that can be handled concurrently | 1 |
| `USE_GPU` | A flag indicating whether to use GPU for computation or not, either "True" or "False" | True |
| `SERVICE_NAME` | The name of the micro-service | kpe |
| `SERVICES_BROKER` | The URL of the broker server used for communication between micro-services | "redis://localhost:6379" |
| `BROKER_PASS` | The password for accessing the broker server | None |
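
For example, a `.env` for an English-only, CPU-only HTTP deployment could look like this (values assembled from the table above):

```bash
APP_LANG=en
ASSETS_PATH_ON_HOST=./assets
ASSETS_PATH_IN_CONTAINER=/app/assets
LM_MAP={"en":"sentence-transformers/all-MiniLM-L6-v2"}
SERVICE_MODE=http
CONCURRENCY=1
USE_GPU=False
```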

4 Build image
```bash
sudo docker build --tag lintoai/linto-platform-nlp-keyphrase-extraction:latest .
@@ -52,22 +57,29 @@ sudo docker run --gpus all \
--rm -p 80:80 \
-v $PWD/assets:/app/assets:ro \
--env-file .env \
lintoai/linto-platform-nlp-keyphrase-extraction:latest
```
<details>
<summary>Running with CPU only</summary>

- remove `--gpus all` from the first command.
- set `USE_GPU=False` in the `.env`.
</details>

or

```bash
sudo docker-compose up
```
<details>
<summary>Running with CPU only</summary>

- remove `runtime: nvidia` from the `docker-compose.yml` file.
- set `USE_GPU=False` in the `.env`.
</details>
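
For reference, a GPU-enabled `docker-compose.yml` for this service would follow this general shape (a sketch assembled from the variables above; the file shipped in the repository is authoritative):

```yaml
version: "3"
services:
  kpe:
    image: lintoai/linto-platform-nlp-keyphrase-extraction:latest
    runtime: nvidia   # remove for CPU only, as noted above
    ports:
      - "80:80"
    env_file: .env
    volumes:
      - ./assets:/app/assets:ro
```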


6 If running under `SERVICE_MODE=http`, navigate to `http://localhost/docs` or `http://localhost/redoc` in your browser to explore the REST API interactively. See the examples for how to query the API. If running under `SERVICE_MODE=task`, please refer to the dedicated section at the end of this README.
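
As a quick smoke test of the HTTP mode, a request might look like the following (a sketch: the exact request schema should be taken from the interactive docs; the body shape here mirrors the Celery example at the end of this README and is an assumption):

```bash
curl -X POST "http://localhost/kpe/en" \
  -H "Content-Type: application/json" \
  -d '{"articles": [{"text": "Apple Inc. is an American multinational technology company."}], "component_cfg": {"kpe": {"top_n": 3}}}'
```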


## Specification for `http://localhost/kpe/{lang}`
@@ -198,4 +210,33 @@ Component's config can be modified in [`components/config.cfg`](components/config.cfg)
```

### Advanced usage
For advanced usage, such as Max Sum Similarity and Maximal Marginal Relevance for diversifying extraction results, please refer to the [KeyBERT documentation](https://maartengr.github.io/KeyBERT/guides/quickstart.html#usage) and this [medium post](https://towardsdatascience.com/keyword-extraction-with-bert-724efca412ea) to see how it works.
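
As a standalone illustration of those two diversification strategies with plain KeyBERT (outside this service, using the same English model as above):

```python
from keybert import KeyBERT

doc = ("Unsupervised learning is a type of machine learning in which the algorithm "
       "is not provided with any pre-assigned labels or scores for the training data.")

kw_model = KeyBERT(model="sentence-transformers/all-MiniLM-L6-v2")

# Maximal Marginal Relevance: trades off relevance against diversity (0..1)
print(kw_model.extract_keywords(doc, use_mmr=True, diversity=0.7, top_n=3))

# Max Sum Similarity: picks the least mutually similar combination
# among the nr_candidates most relevant candidates
print(kw_model.extract_keywords(doc, use_maxsum=True, nr_candidates=20, top_n=3))
```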


## Testing Celery mode locally
1 Install Redis on your local machine, and run it with:
```bash
redis-server --protected-mode no --bind 0.0.0.0 --loglevel debug
```

2 Make sure these two variables are set correctly in your `.env`: `SERVICE_MODE=task` and `SERVICES_BROKER=redis://172.17.0.1:6379` (172.17.0.1 is typically the address of the Docker bridge, which lets the container reach the Redis server running on the host).

Then start your docker container with either `docker run` or `docker-compose up` as shown in the previous section.

3 On your local computer, run this Python script:
```python
from celery import Celery

# Point the client at the same Redis instance the service uses as broker/backend
celery = Celery(broker='redis://localhost:6379/0', backend='redis://localhost:6379/1')

# Submit a task to the 'kpe' queue; the arguments are
# (language, list of texts, per-component configuration)
r = celery.send_task(
    'kpe_task',
    (
        'en',
        [
            "Apple Inc. is an American multinational technology company that specializes in consumer electronics, computer software and online services.",
            "Unsupervised learning is a type of machine learning in which the algorithm is not provided with any pre-assigned labels or scores for the training data. As a result, unsupervised learning algorithms must first self-discover any naturally occurring patterns in that training data set."
        ],
        {"kpe": {"top_n": 3}}
    ),
    queue='kpe',
)

# Block until the worker returns the extracted keyphrases
r.get()
```
