From 556f2d0c0271a7595ec174c3bbe5f15638368069 Mon Sep 17 00:00:00 2001
From: Fiete Ostkamp
Date: Sun, 30 May 2021 11:20:53 +0200
Subject: [PATCH 1/2] add index_to_name.json, update torch-model-archiver command to include the file

---
 README.md          | 15 ++++++++-------
 index_to_name.json | 28 ++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 7 deletions(-)
 create mode 100644 index_to_name.json

diff --git a/README.md b/README.md
index b69aa12..946d02e 100644
--- a/README.md
+++ b/README.md
@@ -113,16 +113,17 @@ data = _service.preprocess(data)
         return data
     except Exception as e:
         raise e
-
-```
-TorcheServe uses a format called MAR (Model Archive). We can convert our PyTorch model to a .mar file using this command:
 ```
-torch-model-archiver --model-name "bert" --version 1.0 --serialized-file ./bert_model/pytorch_model.bin --extra-files "./bert_model/config.json,./bert_model/vocab.txt" --handler "./handler.py"
+Create a new model directory:
+``` bash
+mkdir model_store
 ```
-Move the .mar file into a new directory:
-```
-mkdir model_store && mv bert.mar model_store
+TorcheServe uses a format called `MAR` (Model Archive). We can convert our PyTorch model to a `.mar` file using this command:
+``` bash
+torch-model-archiver --model-name "bert" --version 1.0 --serialized-file ./bert_model/pytorch_model.bin --extra-files "./bert_model/config.json,./bert_model/vocab.txt,./bert_model/index_to_name.json" --handler "./handler.py" --export-path "model_store/"
 ```
+The resulting mar file will be stored in the `model_store` directory we just created.
+
 Finally, we can start TorchServe using the command:
 ```
 torchserve --start --model-store model_store --models bert=bert.mar
diff --git a/index_to_name.json b/index_to_name.json
new file mode 100644
index 0000000..089ef20
--- /dev/null
+++ b/index_to_name.json
@@ -0,0 +1,28 @@
+{
+  "0":"Business Ethics",
+  "1":"Data Security",
+  "2":"Access and Affordability",
+  "3":"Business Model Resilience",
+  "4":"Competitive Behavior",
+  "5":"Critical Incident Risk Management",
+  "6":"Customer Welfare",
+  "7":"Director Removal",
+  "8":"Employee Engagement Inclusion And Diversity",
+  "9":"Employee Health And Safety",
+  "10":"Human Rights And Community Relations",
+  "11":"Labor Practices",
+  "12":"Management Of Legal And Regulatory Framework",
+  "13":"Physical Impacts Of Climate Change",
+  "14":"Product Quality And Safety",
+  "15":"Product Design And Lifecycle Management",
+  "16":"Selling Practices And Product Labeling",
+  "17":"Supply Chain Management",
+  "18":"Systemic Risk Management",
+  "19":"Waste And Hazardous Materials Management",
+  "20":"Water And Wastewater Management",
+  "21":"Air Quality",
+  "22":"Customer Privacy",
+  "23":"Ecological Impacts",
+  "24":"Energy Management",
+  "25":"GHG Emissions"
+}
\ No newline at end of file

From 2f786a3074dc69a42af78ad2ad1f1cf2f10ad50d Mon Sep 17 00:00:00 2001
From: Fiete Ostkamp
Date: Sun, 30 May 2021 11:22:42 +0200
Subject: [PATCH 2/2] structure the README with headings, highlight file names

---
 README.md | 73 ++++++++++++++++++++++---------------------------------
 1 file changed, 29 insertions(+), 44 deletions(-)

diff --git a/README.md b/README.md
index 946d02e..62d9b8f 100644
--- a/README.md
+++ b/README.md
@@ -5,32 +5,26 @@ Read more about this pre-trained model [here.](https://towardsdatascience.com/nl
 
 **In collaboration with [Charan Pothireddi](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/) and [Parabole.ai](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/)**
 
+## Prerequisites
 The further pre-trained ESG-BERT model can be found [here](https://drive.google.com/drive/folders/1yfNpMvByz3fJMsOqir3SerS6PwsRS2rt?usp=sharing) at this GitHub repository. It is a PyTorch model but it can be converted into a Tensorflow model. They can be fine-tuned using either framework. I found the PyTorch framework to be a lot cleaner, and easier to replicate with other models. However, serving the final fine-tuned model is a lot easier on TensorFlow, than on PyTorch.
 
-You can download the ESG-BERT model (named pytorch_model.bin) along with config.json and vovab.txt fles here. BERT base model was further pre-trained on Sustainable Investing text corpus, resulting in a domain specific model. You need the all of those 3 files for fine-tuning.
+You can download the ESG-BERT model (named `pytorch_model.bin`) along with `config.json` and `vocab.txt` files here. The BERT base model was further pre-trained on Sustainable Investing text corpus, resulting in a domain specific model. You need the all of those 3 files for fine-tuning.
 
-For fine-tuning the model, you can use this command to load it into PyTorch.
-```
-model = BertForSequenceClassification.from_pretrained(
-    'path/to/dir/containing/ESG-BERT',
-    num_labels = num, #number of classifications
-    output_attentions = False, # Whether the model returns attentions weights.
-    output_hidden_states = False, # Whether the model returns all hidden-states.
-)
-model.to(device)
-
-```
 The fine-tuned model for text classification is also available [here](https://drive.google.com/drive/folders/1Qz4HP3xkjLfJ6DGCFNeJ7GmcPq65_HVe?usp=sharing). It can be used directly to make predictions using just a few steps.
 
-First, download the fine-tuned pytorch_model.bin, config.json, and vocab.txt into your local directory. Make sure to place all of them into the same directory, mine is called "bert_model".
+First, download the fine-tuned `pytorch_model.bin`, `config.json`, and `vocab.txt` into your local directory. Make sure to place all of them into the same directory, mine is called `bert_model`.
+
+### Install dependencies
 JDK 11 is needed to serve the model. Go ahead and install it from the Oracle downloads page. Now we are ready to set up TorcheServe. TorchServe is a model serving architecture for PyTorch models, go ahead and install that using pip. You can also use conda for the installation. We also need pytorch and transformers installed.
-```
+``` bash
 pip install torchserve torch-model-archiver
 pip install torchvision
 pip install transformers
 ```
-Next up, we'll set up the handler script. It is a basic handler for text classification that can be improved upon. Save this script as "handler.py" in your directory. [1]
-```
+
+### Set up the handler script
+Next up, we'll set up the handler script. It is a basic handler for text classification that can be improved upon. Save this script as `handler.py` in your directory. [1]
+``` python
 from abc import ABC
 import json
 import logging
@@ -113,7 +107,9 @@ data = _service.preprocess(data)
         return data
     except Exception as e:
         raise e
+
 ```
+## Creating a torchserve model archive
 Create a new model directory:
 ``` bash
 mkdir model_store
@@ -124,44 +120,33 @@ torch-model-archiver --model-name "bert" --version 1.0 --serialized-file ./bert_
 ```
 The resulting mar file will be stored in the `model_store` directory we just created.
 
+## Serve the model
 Finally, we can start TorchServe using the command:
 ```
 torchserve --start --model-store model_store --models bert=bert.mar
 ```
+
+## Test the model
 We can now query the model from another terminal window using the Inference API. We pass a text file containing text that the model will try to classify.
 ```
 curl -X POST http://127.0.0.1:8080/predictions/bert -T predict.txt
 ```
-This returns a label number which correlates to a textual label. This is stored in the label_dict.txt dictionary file.
-```
-__label__Business_Ethics : 0
-__label__Data_Security : 1
-__label__Access_And_Affordability : 2
-__label__Business_Model_Resilience : 3
-__label__Competitive_Behavior : 4
-__label__Critical_Incident_Risk_Management : 5
-__label__Customer_Welfare : 6
-__label__Director_Removal : 7
-__label__Employee_Engagement_Inclusion_And_Diversity : 8
-__label__Employee_Health_And_Safety : 9
-__label__Human_Rights_And_Community_Relations : 10
-__label__Labor_Practices : 11
-__label__Management_Of_Legal_And_Regulatory_Framework : 12
-__label__Physical_Impacts_Of_Climate_Change : 13
-__label__Product_Quality_And_Safety : 14
-__label__Product_Design_And_Lifecycle_Management : 15
-__label__Selling_Practices_And_Product_Labeling : 16
-__label__Supply_Chain_Management : 17
-__label__Systemic_Risk_Management : 18
-__label__Waste_And_Hazardous_Materials_Management : 19
-__label__Water_And_Wastewater_Management : 20
-__label__Air_Quality : 21
-__label__Customer_Privacy : 22
-__label__Ecological_Impacts : 23
-__label__Energy_Management : 24
-__label__GHG_Emissions : 25
+This returns a textual label, defined in the `index_to_name.json` file.
+
+## Fine-tuning the model yourself
+For fine-tuning the model, you can use this command to load it into PyTorch.
+``` python
+model = BertForSequenceClassification.from_pretrained(
+    'path/to/dir/containing/ESG-BERT',
+    num_labels = num, #number of classifications
+    output_attentions = False, # Whether the model returns attentions weights.
+    output_hidden_states = False, # Whether the model returns all hidden-states.
+)
+model.to(device)
 ```
+
+
 References:
 [1] -
---
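The `index_to_name.json` file added in PATCH 1/2 maps the classifier's integer output indices to human-readable ESG labels, replacing the old `label_dict.txt` lookup. A minimal sketch of that lookup follows, assuming the model emits one logit per class in the same order as the JSON keys. `load_mapping` and `predict_label` are hypothetical helpers for illustration only; they are not part of TorchServe or of the README's `handler.py`, and only the first three entries of the mapping are reproduced.

```python
import json
from typing import Dict, List

# Excerpt of index_to_name.json (first three entries only).
# JSON object keys are always strings, e.g. "0", "1", "2".
INDEX_TO_NAME_JSON = """{
  "0": "Business Ethics",
  "1": "Data Security",
  "2": "Access and Affordability"
}"""


def load_mapping(text: str) -> Dict[str, str]:
    """Hypothetical helper: parse index_to_name.json into {index string: label}."""
    return json.loads(text)


def predict_label(logits: List[float], mapping: Dict[str, str]) -> str:
    """Hypothetical helper: return the label of the highest-scoring class.

    Assumes `logits` holds one score per class, ordered like the JSON indices.
    """
    if not logits:
        raise ValueError("logits must not be empty")
    # Arg-max over the class scores (ties resolve to the lowest index).
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    # Convert the integer index to a string before the dictionary lookup.
    return mapping[str(best_index)]


if __name__ == "__main__":
    mapping = load_mapping(INDEX_TO_NAME_JSON)
    print(predict_label([0.1, 2.3, -0.5], mapping))
```

The string conversion is the step most easily forgotten: looking up `mapping[1]` instead of `mapping["1"]` raises a `KeyError`.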