diff --git a/docs/module_usage/tutorials/cv_modules/face_detection.md b/docs/module_usage/tutorials/cv_modules/face_detection.md index d5b7084ac..18aa38c00 100644 --- a/docs/module_usage/tutorials/cv_modules/face_detection.md +++ b/docs/module_usage/tutorials/cv_modules/face_detection.md @@ -10,11 +10,16 @@
👉模型列表详情

-|模型|mAP(%)|GPU推理耗时(ms)|CPU推理耗时 (ms)|模型存储大小(M)|介绍|
-|-|-|-|-|-|-|
-|PicoDet_LCNet_x2_5_face|35.8|33.7|537.0|28.9|基于PicoDet_LCNet_x2_5的人脸检测模型|
+| 模型 | AP (%)<br/>Easy/Medium/Hard | GPU推理耗时 (ms) | CPU推理耗时 (ms) | 模型存储大小 (M) | 介绍 |
+|:-:|:-:|:-:|:-:|:-:|:-:|
+| BlazeFace | 77.7/73.4/49.5 | | | 0.447 | |
+| BlazeFace-FPN-SSH | 83.2/80.5/60.5 | | | 0.606 | BlazeFace的改进模型,增加FPN和SSH结构 |
+| PicoDet_LCNet_x2_5_face | 93.7/90.7/68.1 | | | 28.9 | 基于PicoDet_LCNet_x2_5的人脸检测模型 |
+| PP-YOLOE_plus-S_face | 93.9/91.8/79.8 | | | 26.5 | 基于PP-YOLOE_plus-S的人脸检测模型 |
+
+注:以上精度指标是在WIDER-FACE验证集上,以640*640作为输入尺寸评估得到的。所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。
-注:以上精度指标为wider_face数据集 mAP(0.5:0.95)。所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。
## 三、快速集成 @@ -245,7 +250,7 @@ python main.py -c paddlex/configs/face_detection/PicoDet_LCNet_x2_5_face.yaml \ 1.**产线集成** -人脸目标检测模块可以集成的PaddleX产线有**人脸识别**(comming soon),只需要替换模型路径即可完成相关产线的人脸检测模块的模型更新。在产线集成中,你可以使用高性能部署和服务化部署来部署你得到的模型。 +人脸目标检测模块可以集成的PaddleX产线有[**人脸识别**](../../../pipeline_usage/tutorials/face_recognition_pipelines/face_recognition.md),只需要替换模型路径即可完成相关产线的人脸检测模块的模型更新。在产线集成中,你可以使用高性能部署和服务化部署来部署你得到的模型。 2.**模块集成** diff --git a/docs/module_usage/tutorials/cv_modules/face_detection_en.md b/docs/module_usage/tutorials/cv_modules/face_detection_en.md index 746cb9ecd..cccab5c6b 100644 --- a/docs/module_usage/tutorials/cv_modules/face_detection_en.md +++ b/docs/module_usage/tutorials/cv_modules/face_detection_en.md @@ -10,11 +10,14 @@ Face detection is a fundamental task in object detection, aiming to automaticall
👉Model List Details

-| Model | mAP(%)| GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
-|-|-|-|-|-|-|
-| PicoDet_LCNet_x2_5_face | 35.8 | 33.7 | 537.0 | 28.9 | Face detection model based on PicoDet_LCNet_x2_5 |
-
-**Note: The evaluation set for the above accuracy metrics is wider_face dataset mAP(0.5:0.95). GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
+| Model | AP (%)<br/>Easy/Medium/Hard | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
+|:-:|:-:|:-:|:-:|:-:|:-:|
+| BlazeFace | 77.7/73.4/49.5 | | | 0.447 | |
+| BlazeFace-FPN-SSH | 83.2/80.5/60.5 | | | 0.606 | An improved model of BlazeFace, incorporating FPN and SSH structures |
+| PicoDet_LCNet_x2_5_face | 93.7/90.7/68.1 | | | 28.9 | Face detection model based on PicoDet_LCNet_x2_5 |
+| PP-YOLOE_plus-S_face | 93.9/91.8/79.8 | | | 26.5 | Face detection model based on PP-YOLOE_plus-S |
+
+**Note: The above accuracy metrics are evaluated on the WIDER-FACE validation set with an input size of 640*640. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
## III. Quick Integration @@ -242,7 +245,7 @@ The model can be directly integrated into the PaddleX pipeline or into your own 1. **Pipeline Integration** -The face detection module can be integrated into PaddleX pipelines such as **Face Recognition** (coming soon). Simply replace the model path to update the face detection module of the relevant pipeline. In pipeline integration, you can use high-performance inference and service-oriented deployment to deploy your model. +The face detection module can be integrated into PaddleX pipelines such as [**Face Recognition**](../../../pipeline_usage/tutorials/face_recognition_pipelines/face_recognition_en.md). Simply replace the model path to update the face detection module of the relevant pipeline. In pipeline integration, you can use high-performance inference and service-oriented deployment to deploy your model. 2. **Module Integration** diff --git a/docs/module_usage/tutorials/cv_modules/face_recognition.md b/docs/module_usage/tutorials/cv_modules/face_recognition.md new file mode 100644 index 000000000..22655040d --- /dev/null +++ b/docs/module_usage/tutorials/cv_modules/face_recognition.md @@ -0,0 +1,235 @@ +简体中文 | [English](face_recognition_en.md) + +# 人脸识别模块使用教程 + +## 一、概述 +人脸识别模型通常以经过检测提取和关键点矫正处理的标准化人脸图像作为输入。人脸识别模型从这些图像中提取具有高度辨识性的人脸特征,以便供后续模块使用,如人脸匹配和验证等任务。 + +## 二、支持模型列表 + +
+  👉模型列表详情
+
+| 模型 | 输出特征维度 | Acc (%)<br/>AgeDB-30/CFP-FP/LFW | GPU推理耗时 (ms) | CPU推理耗时 (ms) | 模型存储大小 (M) | 介绍 |
+|---------------|--------|-------------------------------|--------------|---------|------------|-------------------------------------|
+| MobileFaceNet | 128 | 96.28/96.71/99.58 | | | 4.1 | 基于MobileFaceNet在MS1Mv3数据集上训练的人脸识别模型 |
+| ResNet50 | 512 | 98.12/98.56/99.77 | | | 87.2 | 基于ResNet50在MS1Mv3数据集上训练的人脸识别模型 |
+
+注:以上精度指标是分别在AgeDB-30、CFP-FP和LFW数据集上测得的Accuracy。所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。
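+
+在后续的人脸匹配环节,通常通过计算两个人脸特征向量的余弦相似度来判断是否为同一个人。下面是一个最小示例(纯 NumPy 演示;其中的特征向量为随机生成的假设数据,0.4 的判定阈值也仅为示意,实际阈值需在您自己的数据上标定):
+
+```python
+import numpy as np
+
+def cosine_similarity(feat1: np.ndarray, feat2: np.ndarray) -> float:
+    # 余弦相似度 = 向量内积 / 两个向量模长的乘积
+    return float(np.dot(feat1, feat2) / (np.linalg.norm(feat1) * np.linalg.norm(feat2)))
+
+# 假设已分别提取得到两张人脸图像的 128 维特征(如 MobileFaceNet 的输出特征维度)
+feat_a = np.random.rand(128).astype(np.float32)
+feat_b = np.random.rand(128).astype(np.float32)
+
+sim = cosine_similarity(feat_a, feat_b)
+print(f"cosine similarity: {sim:.4f}")
+print("same person" if sim > 0.4 else "different person")  # 阈值为假设值
+```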
+ +## 三、快速集成 +> ❗ 在快速集成前,请先安装 PaddleX 的 wheel 包,详细请参考 [PaddleX本地安装教程](../../../installation/installation.md) + +完成whl包的安装后,几行代码即可完成人脸识别模块的推理,可以任意切换该模块下的模型,您也可以将人脸识别的模块中的模型推理集成到您的项目中。运行以下代码前,请您下载[示例图片](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/face_classification_001.jpg)到本地。 + +```python +from paddlex import create_model + +model_name = "MobileFaceNet" + +model = create_model(model_name) +output = model.predict("face_classification_001.jpg", batch_size=1) + +for res in output: + res.print(json_format=False) + res.save_to_json("./output/res.json") +``` + +关于更多 PaddleX 的单模型推理的 API 的使用方法,可以参考[PaddleX单模型Python脚本使用说明](../../instructions/model_python_API.md)。 +## 四、二次开发 +如果你追求更高精度的现有模型,可以使用PaddleX的二次开发能力,开发更好的人脸识别模型。在使用PaddleX开发人脸识别模型之前,请务必安装PaddleX的PaddleClas插件,安装过程可以参考 [PaddleX本地安装教程](../../../installation/installation.md) + +### 4.1 数据准备 +在进行模型训练前,需要准备相应任务模块的数据集。PaddleX 针对每一个模块提供了数据校验功能,**只有通过数据校验的数据才可以进行模型训练**。此外,PaddleX为每一个模块都提供了demo数据集,您可以基于官方提供的 Demo 数据完成后续的开发。若您希望用私有数据集进行后续的模型训练,人脸识别模块的训练数据集采取通用图像分类数据集格式组织,可以参考[PaddleX图像分类任务模块数据标注教程](../../../data_annotations/cv_modules/image_classification.md)。若您希望用私有数据集进行后续的模型评估,请注意人脸识别模块的验证数据集格式与训练数据集的方式有所不同,请参考[4.1.4节 人脸识别模块验证集数据组织方式](#414-人脸识别模块验证集数据组织方式) + +#### 4.1.1 Demo 数据下载 +您可以参考下面的命令将 Demo 数据集下载到指定文件夹: + +```bash +cd /path/to/paddlex +wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/face_rec_examples.tar -P ./dataset +tar -xf ./dataset/face_rec_examples.tar -C ./dataset/ +``` +#### 4.1.2 数据校验 +一行命令即可完成数据校验: + +```bash +python main.py -c paddlex/configs/face_recognition/MobileFaceNet.yaml \ + -o Global.mode=check_dataset \ + -o Global.dataset_dir=./dataset/face_rec_examples +``` +执行上述命令后,PaddleX 会对数据集进行校验,并统计数据集的基本信息,命令运行成功后会在log中打印出`Check dataset passed !`信息。校验结果文件保存在`./output/check_dataset_result.json`,同时相关产出会保存在当前目录的`./output/check_dataset`目录下,产出目录中包括可视化的示例样本图片。 + +
+ 👉 校验结果详情(点击展开) + + +校验结果文件具体内容为: + +```bash +{ + "done_flag": true, + "check_pass": true, + "attributes": { + "train_label_file": "../../dataset/face_rec_examples/train/label.txt", + "train_num_classes": 995, + "train_samples": 1000, + "train_sample_paths": [ + "check_dataset/demo_img/01378592.jpg", + "check_dataset/demo_img/04331410.jpg", + "check_dataset/demo_img/03485713.jpg", + "check_dataset/demo_img/02382123.jpg", + "check_dataset/demo_img/01722397.jpg", + "check_dataset/demo_img/02682349.jpg", + "check_dataset/demo_img/00272794.jpg", + "check_dataset/demo_img/03151987.jpg", + "check_dataset/demo_img/01725764.jpg", + "check_dataset/demo_img/02580369.jpg" + ], + "val_label_file": "../../dataset/face_rec_examples/val/pair_label.txt", + "val_num_classes": 2, + "val_samples": 500 + }, + "analysis": {}, + "dataset_path": "./dataset/face_rec_examples", + "show_type": "image", + "dataset_type": "ClsDataset" +} +``` +上述校验结果中,`check_pass` 为 `True` 表示数据集格式符合要求,其他部分指标的说明如下: + +* `attributes.train_num_classes`:该数据集训练类别数为 995; +* `attributes.val_num_classes`:该数据集验证类别数为 2; +* `attributes.train_samples`:该数据集训练集样本数量为 1000; +* `attributes.val_samples`:该数据集验证集样本数量为 500; +* `attributes.train_sample_paths`:该数据集训练集样本可视化图片相对路径列表; + + +
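+
+校验结果文件本身是标准 JSON,也可以在脚本中直接读取,便于在自动化流程中判断校验是否通过(最小示例,假设已按上文执行过数据校验):
+
+```python
+import json
+
+with open("./output/check_dataset_result.json", "r", encoding="utf-8") as f:
+    result = json.load(f)
+
+# check_pass 为 True 表示数据集格式符合要求
+assert result["check_pass"], "数据集校验未通过,请检查数据格式"
+print("训练集样本数:", result["attributes"]["train_samples"])
+print("验证集样本数:", result["attributes"]["val_samples"])
+```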
+ +#### 4.1.3 数据集格式转换/数据集划分(可选) +在您完成数据校验之后,可以通过**修改配置文件**或是**追加超参数**的方式对数据集的格式进行转换,也可以对数据集的训练/验证比例进行重新划分。 + +
+ 👉 格式转换/数据集划分详情(点击展开) + +人脸识别模块不支持数据格式转换与数据集划分。 + +
+
+#### 4.1.4 人脸识别模块验证集数据组织方式
+
+人脸识别模块验证数据集与训练数据集格式不同,若需要在私有数据上评估模型精度,请按照如下方式组织自己的数据集:
+
+```bash
+face_rec_dataroot      # 数据集根目录,目录名称可以改变
+├── train              # 训练数据集的保存目录,目录名称不可以改变
+   ├── images          # 图像的保存目录,目录名称可以改变,但要注意与label.txt中的内容对应
+      ├── xxx.jpg      # 人脸图像文件
+      ├── xxx.jpg      # 人脸图像文件
+      ...
+   ├── label.txt       # 训练集标注文件,文件名称不可改变。每行给出图像相对`train`的路径和人脸图像类别(人脸身份)id,使用空格分隔,内容举例:images/image_06765.jpg 0
+├── val                # 验证数据集的保存目录,目录名称不可以改变
+   ├── images          # 图像的保存目录,目录名称可以改变,但要注意与pair_label.txt中的内容对应
+      ├── xxx.jpg      # 人脸图像文件
+      ├── xxx.jpg      # 人脸图像文件
+      ...
+   └── pair_label.txt  # 验证数据集标注文件,文件名称不可改变。每行给出两个要对比的图像路径和一个表示该对图像是否属于同一个人的0、1标签,使用空格分隔。
+```
+
+验证集标注文件`pair_label.txt`的内容示例:
+
+```bash
+# 人脸图像1.jpg 人脸图像2.jpg 标签(0表示该行的两个人脸图像文件不属于同一个人,1表示属于同一个人)
+images/Angela_Merkel_0001.jpg images/Angela_Merkel_0002.jpg 1
+images/Bruce_Gebhardt_0001.jpg images/Masao_Azuma_0001.jpg 0
+images/Francis_Ford_Coppola_0001.jpg images/Francis_Ford_Coppola_0002.jpg 1
+images/Jason_Kidd_0006.jpg images/Jason_Kidd_0008.jpg 1
+images/Miyako_Miyazaki_0002.jpg images/Munir_Akram_0002.jpg 0
+```
+
+### 4.2 模型训练
+一条命令即可完成模型的训练,以此处MobileFaceNet的训练为例:
+
+```bash
+python main.py -c paddlex/configs/face_recognition/MobileFaceNet.yaml \
+    -o Global.mode=train \
+    -o Global.dataset_dir=./dataset/face_rec_examples
+```
+需要如下几步:
+
+* 指定模型的`.yaml` 配置文件路径(此处为`MobileFaceNet.yaml`)
+* 指定模式为模型训练:`-o Global.mode=train`
+* 指定训练数据集路径:`-o Global.dataset_dir`
+其他相关参数均可通过修改`.yaml`配置文件中的`Global`和`Train`下的字段来进行设置,也可以通过在命令行中追加参数来进行调整。如指定前 2 卡 gpu 训练:`-o Global.device=gpu:0,1`;设置训练轮次数为 10:`-o Train.epochs_iters=10`。更多可修改的参数及其详细解释,可以查阅模型对应任务模块的配置文件说明[PaddleX通用模型配置文件参数说明](../../instructions/config_parameters_common.md)。
+
👉 更多说明(点击展开)
+
+
+* 模型训练过程中,PaddleX 会自动保存模型权重文件,默认为`output`,如需指定保存路径,可通过配置文件中 `-o Global.output` 字段进行设置。
+* PaddleX 对您屏蔽了动态图权重和静态图权重的概念。在模型训练的过程中,会同时产出动态图和静态图的权重,在模型推理时,默认选择静态图权重推理。
+* 训练其他模型时,需要指定相应的配置文件,模型和配置文件的对应关系,可以查阅[PaddleX模型列表(CPU/GPU)](../../../support_list/models_list.md)。
+在完成模型训练后,所有产出保存在指定的输出目录(默认为`./output/`)下,通常有以下产出:
+
+* `train_result.json`:训练结果记录文件,记录了训练任务是否正常完成,以及产出的权重指标、相关文件路径等;
+* `train.log`:训练日志文件,记录了训练过程中的模型指标变化、loss 变化等;
+* `config.yaml`:训练配置文件,记录了本次训练的超参数的配置;
+* `.pdparams`、`.pdema`、`.pdopt.pdstate`、`.pdiparams`、`.pdmodel`:模型权重相关文件,包括网络参数、优化器、EMA、静态图网络参数、静态图网络结构等;
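+
+在运行模型评估之前,可以用如下脚本对 4.1.4 节描述的验证集标注文件 `pair_label.txt` 做一次格式自查(最小示例,假设数据已按上文方式组织在 `./dataset/face_rec_examples` 下):
+
+```python
+import os
+
+dataset_root = "./dataset/face_rec_examples/val"
+label_file = os.path.join(dataset_root, "pair_label.txt")
+
+with open(label_file, "r", encoding="utf-8") as f:
+    for line_no, line in enumerate(f, start=1):
+        line = line.strip()
+        if not line or line.startswith("#"):  # 跳过空行和注释行
+            continue
+        parts = line.split()
+        # 每行应为:图像1路径 图像2路径 0/1标签
+        assert len(parts) == 3, f"第 {line_no} 行字段数不为 3: {line!r}"
+        img1, img2, label = parts
+        assert label in ("0", "1"), f"第 {line_no} 行标签非法: {label}"
+        for img in (img1, img2):
+            assert os.path.exists(os.path.join(dataset_root, img)), f"图像不存在: {img}"
+print("pair_label.txt 格式检查通过")
+```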
+
+### **4.3 模型评估**
+在完成模型训练后,可以对指定的模型权重文件在验证集上进行评估,验证模型精度。使用 PaddleX 进行模型评估,一条命令即可完成模型的评估:
+
+```bash
+python main.py -c paddlex/configs/face_recognition/MobileFaceNet.yaml \
+    -o Global.mode=evaluate \
+    -o Global.dataset_dir=./dataset/face_rec_examples
+```
+与模型训练类似,需要如下几步:
+
+* 指定模型的`.yaml` 配置文件路径(此处为`MobileFaceNet.yaml`)
+* 指定模式为模型评估:`-o Global.mode=evaluate`
+* 指定验证数据集路径:`-o Global.dataset_dir`
+其他相关参数均可通过修改`.yaml`配置文件中的`Global`和`Evaluate`下的字段来进行设置,详细请参考[PaddleX通用模型配置文件参数说明](../../instructions/config_parameters_common.md)。
+
👉 更多说明(点击展开)
+
+
+在模型评估时,需要指定模型权重文件路径,每个配置文件中都内置了默认的权重保存路径,如需要改变,只需要通过追加命令行参数的形式进行设置即可,如`-o Evaluate.weight_path="./output/best_model/best_model/model.pdparams"`。
+
+在完成模型评估后,会产出`evaluate_result.json`,其中记录了评估的结果,具体来说,记录了评估任务是否正常完成,以及模型的评估指标,包含 Accuracy;
+
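+
+`evaluate_result.json` 同样可以在脚本中直接读取,用于在训练流水线中自动比较不同权重的精度(最小示例;具体的指标字段以实际产出文件为准):
+
+```python
+import json
+
+with open("./output/evaluate_result.json", "r", encoding="utf-8") as f:
+    eval_result = json.load(f)
+
+# 打印全部评估结果,其中包含 Accuracy 等指标
+print(json.dumps(eval_result, indent=4, ensure_ascii=False))
+```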
+
+### **4.4 模型推理**
+在完成模型的训练和评估后,即可使用训练好的模型权重进行推理预测。在PaddleX中实现模型推理预测可以通过两种方式:命令行和wheel 包。
+
+#### 4.4.1 模型推理
+* 通过命令行的方式进行推理预测,只需如下一条命令。运行以下代码前,请您下载[示例图片](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/face_classification_001.jpg)到本地。
+```bash
+python main.py -c paddlex/configs/face_recognition/MobileFaceNet.yaml \
+    -o Global.mode=predict \
+    -o Predict.model_dir="./output/best_model/inference" \
+    -o Predict.input="face_classification_001.jpg"
+```
+与模型训练和评估类似,需要如下几步:
+
+* 指定模型的`.yaml` 配置文件路径(此处为`MobileFaceNet.yaml`)
+* 指定模式为模型推理预测:`-o Global.mode=predict`
+* 指定模型权重路径:`-o Predict.model_dir="./output/best_model/inference"`
+* 指定输入数据路径:`-o Predict.input="..."`
+其他相关参数均可通过修改`.yaml`配置文件中的`Global`和`Predict`下的字段来进行设置,详细请参考[PaddleX通用模型配置文件参数说明](../../instructions/config_parameters_common.md)。
+
+#### 4.4.2 模型集成
+模型可以直接集成到 PaddleX 产线中,也可以直接集成到您自己的项目中。
+
+1.**产线集成**
+
+人脸识别模块可以集成的PaddleX产线有[**人脸识别**](../../../pipeline_usage/tutorials/face_recognition_pipelines/face_recognition.md),只需要替换模型路径即可完成相关产线的人脸识别模块的模型更新。在产线集成中,你可以使用高性能部署和服务化部署来部署你得到的模型。
+
+2.**模块集成**
+
+您产出的权重可以直接集成到人脸识别模块中,可以参考[快速集成](#三快速集成)的 Python 示例代码,只需要将模型替换为你训练得到的模型路径即可。
diff --git a/docs/module_usage/tutorials/cv_modules/face_recognition_en.md b/docs/module_usage/tutorials/cv_modules/face_recognition_en.md
new file mode 100644
index 000000000..fbb386bdd
--- /dev/null
+++ b/docs/module_usage/tutorials/cv_modules/face_recognition_en.md
@@ -0,0 +1,233 @@
+English | [简体中文](face_recognition.md)
+
+# Face Recognition Module Usage Tutorial
+
+## I. Overview
+Face recognition models typically take standardized face images processed through detection, extraction, and keypoint correction as input. These models extract highly discriminative facial features from these images for subsequent modules, such as face matching and verification tasks.
+
+## II. Supported Model List
+
+  👉Details of Model List
+
+| Model | Output Feature Dimension | Acc (%)<br/>AgeDB-30/CFP-FP/LFW | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
+|---------------|--------|-------------------------------|--------------|---------|------------|-------------------------------------|
+| MobileFaceNet | 128 | 96.28/96.71/99.58 | | | 4.1 | Face recognition model trained on the MS1Mv3 dataset based on MobileFaceNet |
+| ResNet50 | 512 | 98.12/98.56/99.77 | | | 87.2 | Face recognition model trained on the MS1Mv3 dataset based on ResNet50 |
+
+Note: The above accuracy metrics are Accuracy scores measured on the AgeDB-30, CFP-FP, and LFW datasets, respectively. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
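+
+In downstream face matching, whether two faces belong to the same person is typically decided by the cosine similarity of their feature vectors. Below is a minimal sketch (pure NumPy; the feature vectors are randomly generated placeholders and the 0.4 threshold is a made-up value for illustration, so a real threshold should be calibrated on your own data):
+
+```python
+import numpy as np
+
+def cosine_similarity(feat1: np.ndarray, feat2: np.ndarray) -> float:
+    # Cosine similarity = dot product / product of the L2 norms
+    return float(np.dot(feat1, feat2) / (np.linalg.norm(feat1) * np.linalg.norm(feat2)))
+
+# Assume two 128-dim features have been extracted (e.g., the output dimension of MobileFaceNet)
+feat_a = np.random.rand(128).astype(np.float32)
+feat_b = np.random.rand(128).astype(np.float32)
+
+sim = cosine_similarity(feat_a, feat_b)
+print(f"cosine similarity: {sim:.4f}")
+print("same person" if sim > 0.4 else "different person")  # hypothetical threshold
+```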
+ +## III. Quick Integration +> ❗ Before quick integration, please install the PaddleX wheel package. For details, refer to the [PaddleX Local Installation Tutorial](../../../installation/installation_en.md) + +After installing the whl package, a few lines of code can complete the inference of the face recognition module. You can switch models under this module freely, and you can also integrate the model inference of the face recognition module into your project. Before running the following code, please download the [example image](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/face_classification_001.jpg) to your local machine. + +```python +from paddlex import create_model + +model_name = "MobileFaceNet" + +model = create_model(model_name) +output = model.predict("face_classification_001.jpg", batch_size=1) + +for res in output: + res.print(json_format=False) + res.save_to_json("./output/res.json") +``` + +For more information on using the PaddleX single-model inference API, refer to the [PaddleX Single Model Python Script Usage Instructions](../../instructions/model_python_API_en.md). + +## IV. Custom Development +If you aim for higher accuracy with existing models, you can leverage PaddleX's custom development capabilities to develop better face recognition models. Before developing face recognition models with PaddleX, ensure you have installed the PaddleX PaddleClas plugin. The installation process can be found in the [PaddleX Local Installation Tutorial](../../../installation/installation_en.md) + +### 4.1 Data Preparation +Before model training, you need to prepare the dataset for the corresponding task module. PaddleX provides data validation functionality for each module, and **only data that passes validation can be used for model training**. Additionally, PaddleX provides demo datasets for each module, allowing you to complete subsequent development based on the official demo data. If you wish to use a private dataset for subsequent model training, the training dataset for the face recognition module is organized in a general image classification dataset format. You can refer to the [PaddleX Image Classification Task Module Data Annotation Tutorial](../../../data_annotations/cv_modules/image_classification_en.md). If you wish to use a private dataset for subsequent model evaluation, note that the validation dataset format for the face recognition module differs from the training dataset format. Please refer to [Section 4.1.4 Face Recognition Module Validation Set Data Organization](#414-face-recognition-module-validation-set-data-organization) + +#### 4.1.1 Demo Data Download +You can use the following commands to download the demo dataset to a specified folder: + +```bash +cd /path/to/paddlex +wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/face_rec_examples.tar -P ./dataset +tar -xf ./dataset/face_rec_examples.tar -C ./dataset/ +``` +#### 4.1.2 Data Validation +A single command can complete data validation: + +```bash +python main.py -c paddlex/configs/face_recognition/MobileFaceNet.yaml \ + -o Global.mode=check_dataset \ + -o Global.dataset_dir=./dataset/face_rec_examples +``` + +After executing the above command, PaddleX will validate the dataset and collect its basic information. Upon successful execution, the log will print the message `Check dataset passed !`. 
The validation result file will be saved in `./output/check_dataset_result.json`, and related outputs will be saved in the `./output/check_dataset` directory of the current directory. The output directory includes visualized example sample images.
+
+ 👉 Validation Result Details (Click to Expand) + +The specific content of the validation result file is: + +```bash +{ + "done_flag": true, + "check_pass": true, + "attributes": { + "train_label_file": "../../dataset/face_rec_examples/train/label.txt", + "train_num_classes": 995, + "train_samples": 1000, + "train_sample_paths": [ + "check_dataset/demo_img/01378592.jpg", + "check_dataset/demo_img/04331410.jpg", + "check_dataset/demo_img/03485713.jpg", + "check_dataset/demo_img/02382123.jpg", + "check_dataset/demo_img/01722397.jpg", + "check_dataset/demo_img/02682349.jpg", + "check_dataset/demo_img/00272794.jpg", + "check_dataset/demo_img/03151987.jpg", + "check_dataset/demo_img/01725764.jpg", + "check_dataset/demo_img/02580369.jpg" + ], + "val_label_file": "../../dataset/face_rec_examples/val/pair_label.txt", + "val_num_classes": 2, + "val_samples": 500 + }, + "analysis": {}, + "dataset_path": "./dataset/face_rec_examples", + "show_type": "image", + "dataset_type": "ClsDataset" +} +``` + +The verification results mentioned above indicate that `check_pass` being `True` means the dataset format meets the requirements. Details of other indicators are as follows: + +* `attributes.train_num_classes`: The number of classes in this training dataset is 995; +* `attributes.val_num_classes`: The number of classes in this validation dataset is 2; +* `attributes.train_samples`: The number of training samples in this dataset is 1000; +* `attributes.val_samples`: The number of validation samples in this dataset is 500; +* `attributes.train_sample_paths`: The list of relative paths to the visualization images of training samples in this dataset; + +
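+
+Since the validation result file is plain JSON, it can also be read in a script to gate an automated workflow on the validation outcome (a minimal sketch, assuming the validation command above has been run):
+
+```python
+import json
+
+with open("./output/check_dataset_result.json", "r", encoding="utf-8") as f:
+    result = json.load(f)
+
+# check_pass being True means the dataset format meets the requirements
+assert result["check_pass"], "Dataset validation failed, please check the data format"
+print("Training samples:", result["attributes"]["train_samples"])
+print("Validation samples:", result["attributes"]["val_samples"])
+```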
+ +#### 4.1.3 Dataset Format Conversion / Dataset Splitting (Optional) +After completing the data validation, you can convert the dataset format and re-split the training/validation ratio by **modifying the configuration file** or **adding hyperparameters**. + +
+ 👉 Details on Format Conversion / Dataset Splitting (Click to Expand) + +The Face Recognition module does not support data format conversion or dataset splitting. + +
+ +#### 4.1.4 Data Organization for Face Recognition Module Validation Set + +The format of the validation dataset for the Face Recognition module differs from the training dataset. If you need to evaluate model accuracy on private data, please organize your dataset as follows: + +```bash +face_rec_dataroot # Root directory of the dataset, the directory name can be changed +├── train # Directory for saving the training dataset, the directory name cannot be changed + ├── images # Directory for saving images, the directory name can be changed but should correspond to the content in label.txt + ├── xxx.jpg # Face image file + ├── xxx.jpg # Face image file + ... + ├── label.txt # Training set annotation file, the file name cannot be changed. Each line gives the relative path of the image to `train` and the face image class (face identity) id, separated by a space. Example content: images/image_06765.jpg 0 +├── val # Directory for saving the validation dataset, the directory name cannot be changed + ├── images # Directory for saving images, the directory name can be changed but should correspond to the content in pair_label.txt + ├── xxx.jpg # Face image file + ├── xxx.jpg # Face image file + ... + └── pair_label.txt # Validation dataset annotation file, the file name cannot be changed. Each line gives the paths of two images to be compared and a 0 or 1 label indicating whether the pair of images belong to the same person, separated by spaces. +``` + +Example content of the validation set annotation file `pair_label.txt`: + +```bash +# Face image 1.jpg Face image 2.jpg Label (0 indicates the two face images do not belong to the same person, 1 indicates they do) +images/Angela_Merkel_0001.jpg images/Angela_Merkel_0002.jpg 1 +images/Bruce_Gebhardt_0001.jpg images/Masao_Azuma_0001.jpg 0 +images/Francis_Ford_Coppola_0001.jpg images/Francis_Ford_Coppola_0002.jpg 1 +images/Jason_Kidd_0006.jpg images/Jason_Kidd_0008.jpg 1 +images/Miyako_Miyazaki_0002.jpg images/Munir_Akram_0002.jpg 0 +``` + +### 4.2 Model Training +Model training can be completed with a single command. Here is an example of training MobileFaceNet: + +```bash +python main.py -c paddlex/configs/face_recognition/MobileFaceNet.yaml \ + -o Global.mode=train \ + -o Global.dataset_dir=./dataset/face_rec_examples +``` +The steps required are: + +* Specify the path to the `.yaml` configuration file for the model (here it is `MobileFaceNet.yaml`) +* Specify the mode as model training: `-o Global.mode=train` +* Specify the path to the training dataset: `-o Global.dataset_dir` +Other related parameters can be set by modifying the `Global` and `Train` fields in the `.yaml` configuration file or by appending parameters in the command line. For example, to specify the first two GPUs for training: `-o Global.device=gpu:0,1`; to set the number of training epochs to 10: `-o Train.epochs_iters=10`. For more modifiable parameters and their detailed explanations, refer to the configuration file instructions for the corresponding task module [PaddleX Common Configuration Parameters for Model Tasks](../../instructions/config_parameters_common_en.md). + +
+ 👉 More Details (Click to Expand) + +* During model training, PaddleX automatically saves model weight files, defaulting to `output`. To specify a save path, use the `-o Global.output` field in the configuration file. +* PaddleX shields you from the concepts of dynamic graph weights and static graph weights. During model training, both dynamic and static graph weights are produced, and static graph weights are selected by default for model inference. +* When training other models, specify the corresponding configuration file. The correspondence between models and configuration files can be found in the [PaddleX Model List (CPU/GPU)](../../../support_list/models_list_en.md). +After completing model training, all outputs are saved in the specified output directory (default is `./output/`). Typically, the following outputs are included: +* `train_result.json`: A file that records the training results, indicating whether the training task was successfully completed, and includes metrics, paths to related files, etc. +* `train.log`: A log file that records changes in model metrics, loss variations, and other details during the training process. +* `config.yaml`: A configuration file that logs the hyperparameter settings for the current training session. +* `.pdparams`, `.pdema`, `.pdopt.pdstate`, `.pdiparams`, `.pdmodel`: Files related to model weights, including network parameters, optimizer, EMA (Exponential Moving Average), static graph network parameters, and static graph network structure. +
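+
+Before running the evaluation, you can sanity-check the validation annotation file `pair_label.txt` described in Section 4.1.4 with a script like the following (a minimal sketch, assuming the data is organized under `./dataset/face_rec_examples` as above):
+
+```python
+import os
+
+dataset_root = "./dataset/face_rec_examples/val"
+label_file = os.path.join(dataset_root, "pair_label.txt")
+
+with open(label_file, "r", encoding="utf-8") as f:
+    for line_no, line in enumerate(f, start=1):
+        line = line.strip()
+        if not line or line.startswith("#"):  # skip blank and comment lines
+            continue
+        parts = line.split()
+        # Each line should be: image1_path image2_path 0/1 label
+        assert len(parts) == 3, f"line {line_no} does not have 3 fields: {line!r}"
+        img1, img2, label = parts
+        assert label in ("0", "1"), f"invalid label on line {line_no}: {label}"
+        for img in (img1, img2):
+            assert os.path.exists(os.path.join(dataset_root, img)), f"missing image: {img}"
+print("pair_label.txt format check passed")
+```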
+
+### **4.3 Model Evaluation**
+After completing model training, you can evaluate the specified model weight file on the validation set to verify the model's accuracy. Using PaddleX for model evaluation, you can complete the evaluation with a single command:
+
+```bash
+python main.py -c paddlex/configs/face_recognition/MobileFaceNet.yaml \
+    -o Global.mode=evaluate \
+    -o Global.dataset_dir=./dataset/face_rec_examples
+```
+Similar to model training, the process involves the following steps:
+
+* Specify the path to the `.yaml` configuration file for the model (here it is `MobileFaceNet.yaml`)
+* Set the mode to model evaluation: `-o Global.mode=evaluate`
+* Specify the path to the validation dataset: `-o Global.dataset_dir`
+Other related parameters can be configured by modifying the fields under `Global` and `Evaluate` in the `.yaml` configuration file. For detailed information, please refer to [PaddleX Common Configuration Parameters for Models](../../instructions/config_parameters_common_en.md).
+
+ 👉 More Details (Click to Expand) + +During model evaluation, the path to the model weights file needs to be specified. Each configuration file has a default weight save path built in. If you need to change it, you can set it by appending a command line parameter, such as `-o Evaluate.weight_path="./output/best_model/best_model/model.pdparams"`. + +After completing the model evaluation, an `evaluate_result.json` file will be produced, which records the evaluation results. Specifically, it records whether the evaluation task was completed normally and the model's evaluation metrics, including Accuracy. + +
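+
+`evaluate_result.json` can likewise be read in a script, for example to automatically compare the accuracy of different weights in a training workflow (a minimal sketch; the exact metric fields depend on the actual output file):
+
+```python
+import json
+
+with open("./output/evaluate_result.json", "r", encoding="utf-8") as f:
+    eval_result = json.load(f)
+
+# Print the full evaluation results, which include metrics such as Accuracy
+print(json.dumps(eval_result, indent=4, ensure_ascii=False))
+```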
+
+### **4.4 Model Inference**
+After completing model training and evaluation, you can use the trained model weights for inference predictions. In PaddleX, model inference predictions can be implemented through two methods: command line and wheel package.
+
+#### 4.4.1 Model Inference
+* To perform inference predictions through the command line, you only need the following command. Before running the following code, please download the [example image](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/face_classification_001.jpg) to your local machine.
+```bash
+python main.py -c paddlex/configs/face_recognition/MobileFaceNet.yaml \
+    -o Global.mode=predict \
+    -o Predict.model_dir="./output/best_model/inference" \
+    -o Predict.input="face_classification_001.jpg"
+```
+Similar to model training and evaluation, the following steps are required:
+
+* Specify the path to the model's `.yaml` configuration file (here it is `MobileFaceNet.yaml`)
+* Specify the mode as model inference prediction: `-o Global.mode=predict`
+* Specify the path to the model weights: `-o Predict.model_dir="./output/best_model/inference"`
+* Specify the path to the input data: `-o Predict.input="..."`
+Other related parameters can be set by modifying the fields under `Global` and `Predict` in the `.yaml` configuration file. For details, please refer to [PaddleX Common Model Configuration File Parameter Description](../../instructions/config_parameters_common_en.md).
+
+#### 4.4.2 Model Integration
+The model can be directly integrated into the PaddleX pipeline or into your own project.
+
+1. **Pipeline Integration**
+
+The face recognition module can be integrated into the PaddleX pipeline for [**Face Recognition**](../../../pipeline_usage/tutorials/face_recognition_pipelines/face_recognition_en.md). You only need to replace the model path to update the face recognition module of the relevant pipeline. In pipeline integration, you can use high-performance deployment and service-oriented deployment to deploy the model you obtained.
+
+2. **Module Integration**
+
+The weights you produced can be directly integrated into the face recognition module. You can refer to the Python example code in [Quick Integration](#iii-quick-integration) and only need to replace the model with the path to the model you trained.
diff --git a/docs/pipeline_usage/tutorials/face_recognition_pipelines/face_recognition.md b/docs/pipeline_usage/tutorials/face_recognition_pipelines/face_recognition.md
new file mode 100644
index 000000000..66a2039c4
--- /dev/null
+++ b/docs/pipeline_usage/tutorials/face_recognition_pipelines/face_recognition.md
@@ -0,0 +1,711 @@
+简体中文 | [English](face_recognition_en.md)
+
+# 人脸识别产线使用教程
+
+## 1. 人脸识别产线介绍
+人脸识别任务是计算机视觉领域的重要组成部分,旨在通过分析和比较人脸特征,实现对个人身份的自动识别。该任务不仅需要检测图像中的人脸,还需要对人脸图像进行特征提取和匹配,从而在数据库中找到对应的身份信息。人脸识别广泛应用于安全认证、监控系统、社交媒体和智能设备等场景。
+
+人脸识别产线是专注于解决人脸定位和识别任务的端到端串联系统,可以从图像中快速准确地定位人脸区域、提取人脸特征,并与特征库中预先建立的特征做检索比对,从而确认身份信息。
+
+![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/face_recognition/01.png)
+
+**人脸识别产线中包含了人脸检测模块和人脸识别模块**,每个模块中包含了若干模型,具体使用哪些模型,您可以根据下边的 benchmark 数据来选择。**如您更考虑模型精度,请选择精度较高的模型,如您更考虑模型推理速度,请选择推理速度较快的模型,如您更考虑模型存储大小,请选择存储大小较小的模型**。
+
+  👉模型列表详情
+
+**人脸检测模块:**
+
+| 模型 | AP (%)<br/>Easy/Medium/Hard | GPU推理耗时 (ms) | CPU推理耗时 (ms) | 模型存储大小 (M) | 介绍 |
+|--------------------------|-----------------|--------------|---------|------------|-----------------------------|
+| BlazeFace | 77.7/73.4/49.5 | | | 0.447 | |
+| BlazeFace-FPN-SSH | 83.2/80.5/60.5 | | | 0.606 | BlazeFace的改进模型,增加FPN和SSH结构 |
+| PicoDet_LCNet_x2_5_face | 93.7/90.7/68.1 | | | 28.9 | 基于PicoDet_LCNet_x2_5的人脸检测模型 |
+| PP-YOLOE_plus-S_face | 93.9/91.8/79.8 | | | 26.5 | 基于PP-YOLOE_plus-S的人脸检测模型 |
+
+注:以上精度指标是在WIDER-FACE验证集上,以640*640作为输入尺寸评估得到的。所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。
+
+**人脸识别模块:**
+
+| 模型 | 输出特征维度 | Acc (%)<br/>AgeDB-30/CFP-FP/LFW | GPU推理耗时 (ms) | CPU推理耗时 (ms) | 模型存储大小 (M) | 介绍 |
+|---------------|--------|-------------------------------|--------------|---------|------------|-------------------------------------|
+| MobileFaceNet | 128 | 96.28/96.71/99.58 | | | 4.1 | 基于MobileFaceNet在MS1Mv3数据集上训练的人脸识别模型 |
+| ResNet50 | 512 | 98.12/98.56/99.77 | | | 87.2 | 基于ResNet50在MS1Mv3数据集上训练的人脸识别模型 |
+
+注:以上精度指标是分别在 AgeDB-30、CFP-FP 和 LFW 数据集上测得的 Accuracy。所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。
+
+
+## 2. 快速开始
+PaddleX 所提供的预训练的模型产线均可以快速体验效果,你可以在线体验人脸识别产线的效果,也可以在本地使用命令行或 Python 体验人脸识别产线的效果。
+
+### 2.1 在线体验
+您可以[在线体验](https://aistudio.baidu.com/community/app/91660/webUI?source=appMineRecent)人脸识别产线的效果,用官方提供的 Demo 图片进行识别,例如:
+
+![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/face_recognition/02.png)
+
+如果您对产线运行的效果满意,可以直接对产线进行集成部署,您可以直接从云端下载部署包,也可以使用[2.2节本地体验](#22-本地体验)的方式。如果不满意,您也可以利用私有数据**对产线中的模型进行在线微调**。
+
+### 2.2 本地体验
+> ❗ 在本地使用人脸识别产线前,请确保您已经按照[PaddleX安装教程](../../../installation/installation.md)完成了PaddleX的wheel包安装。
+
+#### 2.2.1 命令行方式体验
+
+暂不支持命令行体验。
+#### 2.2.2 Python脚本方式集成
+请下载[测试图像](https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_data/friends1.jpg)进行测试。
+在该产线的运行示例中需要预先构建人脸特征库,您可以参考如下指令下载官方提供的demo数据[]( )用来后续构建人脸特征库。
+您可以参考下面的命令将 Demo 数据集下载到指定文件夹:
+
+```bash
+cd /path/to/paddlex
+wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/face_demo_gallery.tar
+tar -xf ./face_demo_gallery.tar
+```
+
+若您希望用私有数据集建立人脸特征库,可以参考[2.3节 构建特征库的数据组织方式](#23-构建特征库的数据组织方式)。之后通过几行代码即可完成人脸特征库建立和人脸识别产线的快速推理。
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(pipeline="face_recognition")
+
+pipeline.build_index(data_root="face_demo_gallery", index_dir="face_gallery_index")
+
+output = pipeline.predict("friends1.jpg")
+for res in output:
+    res.print()
+    res.save_to_img("./output/")
+```
+
+在上述 Python 脚本中,执行了如下几个步骤:
+
+(1)调用 `create_pipeline` 实例化人脸识别产线对象。具体参数说明如下:
+
+|参数|参数说明|参数类型|默认值|
+|-|-|-|-|
+|`pipeline`|产线名称或是产线配置文件路径。如为产线名称,则必须为 PaddleX 所支持的产线。|`str`|无|
+|`device`|产线模型推理设备。支持:“gpu”,“cpu”。|`str`|`gpu`|
+|`use_hpip`|是否启用高性能推理,仅当该产线支持高性能推理时可用。|`bool`|`False`|
+
+(2)调用人脸识别产线对象的 `build_index` 方法,构建人脸特征库。具体参数说明如下:
+
+|参数|参数说明|参数类型|默认值|
+|-|-|-|-|
+|`data_root`|数据集的根目录,数据组织方式参考[2.3节 构建特征库的数据组织方式](#23-构建特征库的数据组织方式)|`str`|无|
+|`index_dir`|特征库的保存路径。成功调用`build_index`方法后会在该路径下生成两个文件:<br/>`"id_map.pkl"` 保存了图像ID与图像特征标签之间的映射关系;<br/>`"vector.index"`存储了每张图像的特征向量|`str`|无|
+
+(3)调用人脸识别产线对象的 `predict` 方法进行推理预测:`predict` 方法参数为`x`,用于输入待预测数据,支持多种输入方式,具体示例如下:
+
+| 参数类型 | 参数说明 |
+|---------------|-----------------------------------------------------------------------------------------------------------|
+| Python Var | 支持直接传入Python变量,如numpy.ndarray表示的图像数据。 |
+| str | 支持传入待预测数据文件路径,如图像文件的本地路径:`/root/data/img.jpg`。 |
+| str | 支持传入待预测数据文件URL,如图像文件的网络URL:[示例](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_001.png)。|
+| str | 支持传入本地目录,该目录下需包含待预测数据文件,如本地路径:`/root/data/`。 |
+| dict | 支持传入字典类型,字典的key需与具体任务对应,如图像分类任务对应\"img\",字典的val支持上述类型数据,例如:`{\"img\": \"/root/data1\"}`。|
+| list | 支持传入列表,列表元素需为上述类型数据,如`[numpy.ndarray, numpy.ndarray],[\"/root/data/img1.jpg\", \"/root/data/img2.jpg\"]`,`[\"/root/data1\", \"/root/data2\"]`,`[{\"img\": \"/root/data1\"}, {\"img\": \"/root/data2/img.jpg\"}]`。|
+
+(4)调用`predict`方法获取预测结果:`predict` 方法为`generator`,因此需要通过迭代来获取预测结果,`predict`方法以batch为单位对数据进行预测,因此预测结果为list形式表示的一组预测结果。
+
+(5)对预测结果进行处理:每个样本的预测结果均为`dict`类型,且支持打印,或保存为文件,支持保存的类型与具体产线相关,如:
+
+| 方法 | 说明 | 方法参数 |
+|--------------|-----------------------------|--------------------------------------------------------------------------------------------------------|
+| print | 打印结果到终端 | `- format_json`:bool类型,是否对输出内容使用json缩进格式化,默认为True;
`- indent`:int类型,json格式化设置,仅当format_json为True时有效,默认为4;
`- ensure_ascii`:bool类型,json格式化设置,仅当format_json为True时有效,默认为False; | +| save_to_json | 将结果保存为json格式的文件 | `- save_path`:str类型,保存的文件路径,当为目录时,保存文件命名与输入文件类型命名一致;
`- indent`:int类型,json格式化设置,默认为4;
`- ensure_ascii`:bool类型,json格式化设置,默认为False; | +| save_to_img | 将结果保存为图像格式的文件 | `- save_path`:str类型,保存的文件路径,当为目录时,保存文件命名与输入文件类型命名一致; | + +若您获取了配置文件,即可对人脸识别产线各项配置进行自定义,只需要修改 `create_pipeline` 方法中的 `pipeline` 参数值为产线配置文件路径即可。 + +例如,若您的配置文件保存在 `./my_path/face_recognition.yaml` ,则只需执行: + +```python +from paddlex import create_pipeline +pipeline = create_pipeline(pipeline="./my_path/face_recognition.yaml") +pipeline.build_index(data_root="face_demo_gallery", index_dir="face_gallery_index") +output = pipeline.predict("friends1.jpg") +for res in output: + res.print() + res.save_to_img("./output/") +``` +### 2.3 构建特征库的数据组织方式 + +PaddleX的人脸识别产线示例需要使用预先构建好的特征库进行人脸特征检索。如果您希望用私有数据构建人脸特征库,则需要按照如下方式组织数据: + +```bash +data_root # 数据集根目录,目录名称可以改变 +├── images # 图像的保存目录,目录名称可以改变 +│ ├── ID0 # 身份ID名字,最好是有意义的名字,比如人名 +│ │ ├── xxx.jpg # 图片,此处支持层级嵌套 +│ │ ├── xxx.jpg # 图片,此处支持层级嵌套 +│ │ ... +│ ├── ID1 # 身份ID名字,最好是有意义的名字,比如人名 +│ │ ... +└── gallery.txt # 特征库数据集标注文件,文件名称不可改变。每行给出待检索人脸图像路径和图像特征标签,使用空格分隔,内容举例:images/Chandler/Chandler00037.jpg Chandler +``` +## 3. 开发集成/部署 +如果人脸识别产线可以达到您对产线推理速度和精度的要求,您可以直接进行开发集成/部署。 + +若您需要将人脸识别产线直接应用在您的Python项目中,可以参考 [2.2.2 Python脚本方式](#222-python脚本方式集成)中的示例代码。 + +此外,PaddleX 也提供了其他三种部署方式,详细说明如下: + +🚀 **高性能推理**:在实际生产环境中,许多应用对部署策略的性能指标(尤其是响应速度)有着较严苛的标准,以确保系统的高效运行与用户体验的流畅性。为此,PaddleX 提供高性能推理插件,旨在对模型推理及前后处理进行深度性能优化,实现端到端流程的显著提速,详细的高性能推理流程请参考[PaddleX高性能推理指南](../../../pipeline_deploy/high_performance_inference.md)。 + +☁️ **服务化部署**:服务化部署是实际生产环境中常见的一种部署形式。通过将推理功能封装为服务,客户端可以通过网络请求来访问这些服务,以获取推理结果。PaddleX 支持用户以低成本实现产线的服务化部署,详细的服务化部署流程请参考[PaddleX服务化部署指南](../../../pipeline_deploy/service_deploy.md)。 + +下面是API参考和多语言服务调用示例: + +
+API参考 + +对于服务提供的所有操作: + +- 响应体以及POST请求的请求体均为JSON数据(JSON对象)。 +- 当请求处理成功时,响应状态码为`200`,响应体的属性如下: + + |名称|类型|含义| + |-|-|-| + |`errorCode`|`integer`|错误码。固定为`0`。| + |`errorMsg`|`string`|错误说明。固定为`"Success"`。| + + 响应体还可能有`result`属性,类型为`object`,其中存储操作结果信息。 + +- 当请求处理未成功时,响应体的属性如下: + + |名称|类型|含义| + |-|-|-| + |`errorCode`|`integer`|错误码。与响应状态码相同。| + |`errorMsg`|`string`|错误说明。| + +服务提供的操作如下: + +- **`infer`** + + 获取图像OCR结果。 + + `POST /ocr` + + - 请求体的属性如下: + + |名称|类型|含义|是否必填| + |-|-|-|-| + |`image`|`string`|服务可访问的图像文件的URL或图像文件内容的Base64编码结果。|是| + |`inferenceParams`|`object`|推理参数。|否| + + `inferenceParams`的属性如下: + + |名称|类型|含义|是否必填| + |-|-|-|-| + |`maxLongSide`|`integer`|推理时,若文本检测模型的输入图像较长边的长度大于`maxLongSide`,则将对图像进行缩放,使其较长边的长度等于`maxLongSide`。|否| + + - 请求处理成功时,响应体的`result`具有如下属性: + + |名称|类型|含义| + |-|-|-| + |`texts`|`array`|文本位置、内容和得分。| + |`image`|`string`|OCR结果图,其中标注检测到的文本位置。图像为JPEG格式,使用Base64编码。| + + `texts`中的每个元素为一个`object`,具有如下属性: + + |名称|类型|含义| + |-|-|-| + |`poly`|`array`|文本位置。数组中元素依次为包围文本的多边形的顶点坐标。| + |`text`|`string`|文本内容。| + |`score`|`number`|文本识别得分。| + + `result`示例如下: + + ```json + { + "texts": [ + { + "poly": [ + [ + 444, + 244 + ], + [ + 705, + 244 + ], + [ + 705, + 311 + ], + [ + 444, + 311 + ] + ], + "text": "北京南站", + "score": 0.9 + }, + { + "poly": [ + [ + 992, + 248 + ], + [ + 1263, + 251 + ], + [ + 1263, + 318 + ], + [ + 992, + 315 + ] + ], + "text": "天津站", + "score": 0.5 + } + ], + "image": "xxxxxx" + } + ``` + +
+ +
+多语言调用服务示例 + +
+Python + +```python +import base64 +import requests + +API_URL = "http://localhost:8080/ocr" # 服务URL +image_path = "./demo.jpg" +output_image_path = "./out.jpg" + +# 对本地图像进行Base64编码 +with open(image_path, "rb") as file: + image_bytes = file.read() + image_data = base64.b64encode(image_bytes).decode("ascii") + +payload = {"image": image_data} # Base64编码的文件内容或者图像URL + +# 调用API +response = requests.post(API_URL, json=payload) + +# 处理接口返回数据 +assert response.status_code == 200 +result = response.json()["result"] +with open(output_image_path, "wb") as file: + file.write(base64.b64decode(result["image"])) +print(f"Output image saved at {output_image_path}") +print("\nDetected texts:") +print(result["texts"]) +``` + +
+ +
+C++
+
+```cpp
+#include <iostream>
+#include <fstream>
+#include <vector>
+#include <string>
+#include "cpp-httplib/httplib.h" // https://github.com/Huiyicc/cpp-httplib
+#include "nlohmann/json.hpp" // https://github.com/nlohmann/json
+#include "base64.hpp" // https://github.com/tobiaslocker/base64
+
+int main() {
+    httplib::Client client("localhost:8080");
+    const std::string imagePath = "./demo.jpg";
+    const std::string outputImagePath = "./out.jpg";
+
+    httplib::Headers headers = {
+        {"Content-Type", "application/json"}
+    };
+
+    // 对本地图像进行Base64编码
+    std::ifstream file(imagePath, std::ios::binary | std::ios::ate);
+    std::streamsize size = file.tellg();
+    file.seekg(0, std::ios::beg);
+
+    std::vector<char> buffer(size);
+    if (!file.read(buffer.data(), size)) {
+        std::cerr << "Error reading file." << std::endl;
+        return 1;
+    }
+    std::string bufferStr(reinterpret_cast<const char*>(buffer.data()), buffer.size());
+    std::string encodedImage = base64::to_base64(bufferStr);
+
+    nlohmann::json jsonObj;
+    jsonObj["image"] = encodedImage;
+    std::string body = jsonObj.dump();
+
+    // 调用API
+    auto response = client.Post("/ocr", headers, body, "application/json");
+    // 处理接口返回数据
+    if (response && response->status == 200) {
+        nlohmann::json jsonResponse = nlohmann::json::parse(response->body);
+        auto result = jsonResponse["result"];
+
+        encodedImage = result["image"];
+        std::string decodedString = base64::from_base64(encodedImage);
+        std::vector<unsigned char> decodedImage(decodedString.begin(), decodedString.end());
+        std::ofstream outputImage(outputImagePath, std::ios::binary | std::ios::out);
+        if (outputImage.is_open()) {
+            outputImage.write(reinterpret_cast<char*>(decodedImage.data()), decodedImage.size());
+            outputImage.close();
+            std::cout << "Output image saved at " << outputImagePath << std::endl;
+        } else {
+            std::cerr << "Unable to open file for writing: " << outputImagePath << std::endl;
+        }
+
+        auto texts = result["texts"];
+        std::cout << "\nDetected texts:" << std::endl;
+        for (const auto& text : texts) {
+            std::cout << text << std::endl;
+        }
+    } else {
+        std::cout << "Failed to send HTTP request." << std::endl;
+        return 1;
+    }
+
+    return 0;
+}
+```
+
+ +
+Java + +```java +import okhttp3.*; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.node.ObjectNode; + +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.util.Base64; + +public class Main { + public static void main(String[] args) throws IOException { + String API_URL = "http://localhost:8080/ocr"; // 服务URL + String imagePath = "./demo.jpg"; // 本地图像 + String outputImagePath = "./out.jpg"; // 输出图像 + + // 对本地图像进行Base64编码 + File file = new File(imagePath); + byte[] fileContent = java.nio.file.Files.readAllBytes(file.toPath()); + String imageData = Base64.getEncoder().encodeToString(fileContent); + + ObjectMapper objectMapper = new ObjectMapper(); + ObjectNode params = objectMapper.createObjectNode(); + params.put("image", imageData); // Base64编码的文件内容或者图像URL + + // 创建 OkHttpClient 实例 + OkHttpClient client = new OkHttpClient(); + MediaType JSON = MediaType.Companion.get("application/json; charset=utf-8"); + RequestBody body = RequestBody.Companion.create(params.toString(), JSON); + Request request = new Request.Builder() + .url(API_URL) + .post(body) + .build(); + + // 调用API并处理接口返回数据 + try (Response response = client.newCall(request).execute()) { + if (response.isSuccessful()) { + String responseBody = response.body().string(); + JsonNode resultNode = objectMapper.readTree(responseBody); + JsonNode result = resultNode.get("result"); + String base64Image = result.get("image").asText(); + JsonNode texts = result.get("texts"); + + byte[] imageBytes = Base64.getDecoder().decode(base64Image); + try (FileOutputStream fos = new FileOutputStream(outputImagePath)) { + fos.write(imageBytes); + } + System.out.println("Output image saved at " + outputImagePath); + System.out.println("\nDetected texts: " + texts.toString()); + } else { + System.err.println("Request failed with code: " + response.code()); + } + } + } +} +``` + +
+ +
+Go + +```go +package main + +import ( + "bytes" + "encoding/base64" + "encoding/json" + "fmt" + "io/ioutil" + "net/http" +) + +func main() { + API_URL := "http://localhost:8080/ocr" + imagePath := "./demo.jpg" + outputImagePath := "./out.jpg" + + // 对本地图像进行Base64编码 + imageBytes, err := ioutil.ReadFile(imagePath) + if err != nil { + fmt.Println("Error reading image file:", err) + return + } + imageData := base64.StdEncoding.EncodeToString(imageBytes) + + payload := map[string]string{"image": imageData} // Base64编码的文件内容或者图像URL + payloadBytes, err := json.Marshal(payload) + if err != nil { + fmt.Println("Error marshaling payload:", err) + return + } + + // 调用API + client := &http.Client{} + req, err := http.NewRequest("POST", API_URL, bytes.NewBuffer(payloadBytes)) + if err != nil { + fmt.Println("Error creating request:", err) + return + } + + res, err := client.Do(req) + if err != nil { + fmt.Println("Error sending request:", err) + return + } + defer res.Body.Close() + + // 处理接口返回数据 + body, err := ioutil.ReadAll(res.Body) + if err != nil { + fmt.Println("Error reading response body:", err) + return + } + type Response struct { + Result struct { + Image string `json:"image"` + Texts []map[string]interface{} `json:"texts"` + } `json:"result"` + } + var respData Response + err = json.Unmarshal([]byte(string(body)), &respData) + if err != nil { + fmt.Println("Error unmarshaling response body:", err) + return + } + + outputImageData, err := base64.StdEncoding.DecodeString(respData.Result.Image) + if err != nil { + fmt.Println("Error decoding base64 image data:", err) + return + } + err = ioutil.WriteFile(outputImagePath, outputImageData, 0644) + if err != nil { + fmt.Println("Error writing image to file:", err) + return + } + fmt.Printf("Image saved at %s.jpg\n", outputImagePath) + fmt.Println("\nDetected texts:") + for _, text := range respData.Result.Texts { + fmt.Println(text) + } +} +``` + +
+ +
+C# + +```csharp +using System; +using System.IO; +using System.Net.Http; +using System.Net.Http.Headers; +using System.Text; +using System.Threading.Tasks; +using Newtonsoft.Json.Linq; + +class Program +{ + static readonly string API_URL = "http://localhost:8080/ocr"; + static readonly string imagePath = "./demo.jpg"; + static readonly string outputImagePath = "./out.jpg"; + + static async Task Main(string[] args) + { + var httpClient = new HttpClient(); + + // 对本地图像进行Base64编码 + byte[] imageBytes = File.ReadAllBytes(imagePath); + string image_data = Convert.ToBase64String(imageBytes); + + var payload = new JObject{ { "image", image_data } }; // Base64编码的文件内容或者图像URL + var content = new StringContent(payload.ToString(), Encoding.UTF8, "application/json"); + + // 调用API + HttpResponseMessage response = await httpClient.PostAsync(API_URL, content); + response.EnsureSuccessStatusCode(); + + // 处理接口返回数据 + string responseBody = await response.Content.ReadAsStringAsync(); + JObject jsonResponse = JObject.Parse(responseBody); + + string base64Image = jsonResponse["result"]["image"].ToString(); + byte[] outputImageBytes = Convert.FromBase64String(base64Image); + + File.WriteAllBytes(outputImagePath, outputImageBytes); + Console.WriteLine($"Output image saved at {outputImagePath}"); + Console.WriteLine("\nDetected texts:"); + Console.WriteLine(jsonResponse["result"]["texts"].ToString()); + } +} +``` + +
+ +
+Node.js + +```js +const axios = require('axios'); +const fs = require('fs'); + +const API_URL = 'http://localhost:8080/ocr' +const imagePath = './demo.jpg' +const outputImagePath = "./out.jpg"; + +let config = { + method: 'POST', + maxBodyLength: Infinity, + url: API_URL, + data: JSON.stringify({ + 'image': encodeImageToBase64(imagePath) // Base64编码的文件内容或者图像URL + }) +}; + +// 对本地图像进行Base64编码 +function encodeImageToBase64(filePath) { + const bitmap = fs.readFileSync(filePath); + return Buffer.from(bitmap).toString('base64'); +} + +// 调用API +axios.request(config) +.then((response) => { + // 处理接口返回数据 + const result = response.data["result"]; + const imageBuffer = Buffer.from(result["image"], 'base64'); + fs.writeFile(outputImagePath, imageBuffer, (err) => { + if (err) throw err; + console.log(`Output image saved at ${outputImagePath}`); + }); + console.log("\nDetected texts:"); + console.log(result["texts"]); +}) +.catch((error) => { + console.log(error); +}); +``` + +
+ +
+PHP
+
+```php
+<?php
+
+$API_URL = "http://localhost:8080/ocr"; // 服务URL
+$image_path = "./demo.jpg";
+$output_image_path = "./out.jpg";
+
+// 对本地图像进行Base64编码
+$image_data = base64_encode(file_get_contents($image_path));
+$payload = array("image" => $image_data); // Base64编码的文件内容或者图像URL
+
+// 调用API
+$ch = curl_init($API_URL);
+curl_setopt($ch, CURLOPT_POST, true);
+curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($payload));
+curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
+$response = curl_exec($ch);
+curl_close($ch);
+
+// 处理接口返回数据
+$result = json_decode($response, true)["result"];
+file_put_contents($output_image_path, base64_decode($result["image"]));
+echo "Output image saved at " . $output_image_path . "\n";
+echo "\nDetected texts:\n";
+print_r($result["texts"]);
+
+?>
+```
+
+
+
+
+📱 **端侧部署**:端侧部署是一种将计算和数据处理功能放在用户设备本身上的方式,设备可以直接处理数据,而不需要依赖远程的服务器。PaddleX 支持将模型部署在 Android 等端侧设备上,详细的端侧部署流程请参考[PaddleX端侧部署指南](../../../pipeline_deploy/edge_deploy.md)。
+您可以根据需要选择合适的方式部署模型产线,进而进行后续的 AI 应用集成。
+
+
+## 4. 二次开发
+如果人脸识别产线提供的默认模型权重在您的场景中精度或速度不满足要求,您可以尝试利用**您自己拥有的特定领域或应用场景的数据**对现有模型进行进一步的**微调**,以提升该产线在您的场景中的识别效果。
+
+### 4.1 模型微调
+由于人脸识别产线包含两个模块(人脸检测和人脸识别),模型产线的效果不及预期可能来自于其中任何一个模块。
+
+您可以对识别效果差的图片进行分析,如果在分析过程中发现有较多的人脸未被检测出来,那么可能是人脸检测模型存在不足,您需要参考[人脸检测模块开发教程](../../../module_usage/tutorials/cv_modules/face_detection.md)中的[二次开发](../../../module_usage/tutorials/cv_modules/face_detection.md#四二次开发)章节,使用您的私有数据集对人脸检测模型进行微调;如果已检测到的人脸出现匹配错误,这表明人脸识别模型需要进一步改进,您需要参考[人脸识别模块开发教程](../../../module_usage/tutorials/cv_modules/face_recognition.md)中的[二次开发](../../../module_usage/tutorials/cv_modules/face_recognition.md#四二次开发)章节,对人脸识别模型进行微调。
+
+### 4.2 模型应用
+当您使用私有数据集完成微调训练后,可获得本地模型权重文件。
+
+若您需要使用微调后的模型权重,只需对产线配置文件做修改,将微调后模型权重的本地路径替换至产线配置文件中的对应位置即可:
+
+```bash
+
+......
+Pipeline:
+  device: "gpu:0"
+  det_model: "BlazeFace"        #可修改为微调后人脸检测模型的本地路径
+  rec_model: "MobileFaceNet"    #可修改为微调后人脸识别模型的本地路径
+  det_batch_size: 1
+  rec_batch_size: 1
+......
+```
+随后, 参考[2.2 本地体验](#22-本地体验)中的命令行方式或Python脚本方式,加载修改后的产线配置文件即可。
+注:目前暂不支持为人脸检测和人脸识别模型设置单独的batch_size。
+
+## 5. 多硬件支持
+PaddleX 支持英伟达 GPU、昆仑芯 XPU、昇腾 NPU和寒武纪 MLU 等多种主流硬件设备,**仅需修改 `--device`参数**即可完成不同硬件之间的无缝切换。
+
+例如,使用Python运行人脸识别产线时,将运行设备从英伟达 GPU 更改为昇腾 NPU,仅需将脚本中的 `device` 修改为 npu 即可:
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(
+    pipeline="face_recognition",
+    device="npu:0" # gpu:0 --> npu:0
+    )
+```
+若您想在更多种类的硬件上使用人脸识别产线,请参考[PaddleX多硬件使用指南](../../../other_devices_support/multi_devices_use_guide.md)。
diff --git a/docs/pipeline_usage/tutorials/face_recognition_pipelines/face_recognition_en.md b/docs/pipeline_usage/tutorials/face_recognition_pipelines/face_recognition_en.md
new file mode 100644
index 000000000..913d5aa50
--- /dev/null
+++ b/docs/pipeline_usage/tutorials/face_recognition_pipelines/face_recognition_en.md
@@ -0,0 +1,598 @@
+English | [简体中文](face_recognition.md)
+
+# Face Recognition Pipeline Tutorial
+
+## 1. Introduction to the Face Recognition Pipeline
+Face recognition is a crucial component in the field of computer vision, aiming to automatically identify individuals by analyzing and comparing facial features. This task involves not only detecting faces in images but also extracting and matching facial features to find corresponding identity information in a database. Face recognition is widely used in security authentication, surveillance systems, social media, smart devices, and other scenarios.
+
+The face recognition pipeline is an end-to-end system dedicated to solving face detection and recognition tasks. It can quickly and accurately locate face regions in images, extract facial features, and retrieve and compare them with pre-established features in a feature database to confirm identity information.
+
+![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/face_recognition/01.png)
+
+**The face recognition pipeline includes a face detection module and a face recognition module**, with several models in each module. Which models to use can be selected based on the benchmark data below. **If you prioritize model accuracy, choose models with higher accuracy; if you prioritize inference speed, choose models with faster inference; if you prioritize model size, choose models with smaller storage requirements**.
+
+  👉Model List Details
+
+**Face Detection Module**:
+
+| Model | AP (%)<br/>Easy/Medium/Hard | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
+|--------------------------|-----------------|--------------|---------|------------|-----------------------------|
+| BlazeFace | 77.7/73.4/49.5 | | | 0.447 | |
+| BlazeFace-FPN-SSH | 83.2/80.5/60.5 | | | 0.606 | Improved BlazeFace with FPN and SSH structures |
+| PicoDet_LCNet_x2_5_face | 93.7/90.7/68.1 | | | 28.9 | Face detection model based on PicoDet_LCNet_x2_5 |
+| PP-YOLOE_plus-S_face | 93.9/91.8/79.8 | | | 26.5 | Face detection model based on PP-YOLOE_plus-S |
+
+Note: The above accuracy metrics are evaluated on the WIDER-FACE validation set with an input size of 640x640. All GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
+
+**Face Recognition Module**:
+
+| Model | Output Feature Dimension | Acc (%)<br/>AgeDB-30/CFP-FP/LFW | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
+|---------------|--------|-------------------------------|--------------|---------|------------|-------------------------------------|
+| MobileFaceNet | 128 | 96.28/96.71/99.58 | | | 4.1 | Face recognition model trained on MS1Mv3 based on MobileFaceNet |
+| ResNet50 | 512 | 98.12/98.56/99.77 | | | 87.2 | Face recognition model trained on MS1Mv3 based on ResNet50 |
+
+Note: The above accuracy metrics are Accuracy scores measured on the AgeDB-30, CFP-FP, and LFW datasets, respectively. All GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
+
+
+## 2. Quick Start
+The pre-trained model pipelines provided by PaddleX can be quickly experienced. You can experience the effects of the face recognition pipeline online or locally using command-line or Python.
+
+### 2.1 Online Experience
+You can [experience online](https://aistudio.baidu.com/community/app/91660/webUI?source=appMineRecent) the effects of the face recognition pipeline, using the official demo images for recognition.
+
+### 2.2 Local Experience
+> ❗ Before using the facial recognition pipeline locally, please ensure that you have completed the installation of the PaddleX wheel package according to the [PaddleX Installation Guide](../../../installation/installation_en.md).
+
+#### 2.2.1 Command Line Experience
+
+Command line experience is not supported at the moment.
+#### 2.2.2 Integration via Python Script
+
+Please download the [test image](https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_data/friends1.jpg) for testing. In the example of running this pipeline, you need to pre-build a facial feature library. You can refer to the following instructions to download the official demo data []() to be used for subsequent construction of the facial feature library. You can use the following command to download the demo dataset to a specified folder:
+
+```bash
+cd /path/to/paddlex
+wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/face_demo_gallery.tar
+tar -xf ./face_demo_gallery.tar
+```
+
+If you wish to build a facial feature library using a private dataset, please refer to [Section 2.3: Data Organization for Building a Feature Library](#23-data-organization-for-building-a-feature-library). Afterward, you can complete the establishment of the facial feature library and quickly perform inference with the facial recognition pipeline using just a few lines of code.
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(pipeline="face_recognition")
+
+pipeline.build_index(data_root="face_demo_gallery", index_dir="face_gallery_index")
+
+output = pipeline.predict("friends1.jpg")
+for res in output:
+    res.print()
+    res.save_to_img("./output/")
+```
+
+In the above Python script, the following steps are executed:
+
+(1) Call `create_pipeline` to instantiate a face recognition pipeline object. The specific parameter descriptions are as follows:
+
+| Parameter | Description | Type | Default |
+|-|-|-|-|
+| `pipeline` | The name of the pipeline or the path to the pipeline configuration file. If it is the pipeline name, it must be a pipeline supported by PaddleX. | `str` | None |
+| `device` | The device for pipeline model inference. Supports: "gpu", "cpu". | `str` | "gpu" |
+| `use_hpip` | Whether to enable high-performance inference, only available when the pipeline supports high-performance inference. | `bool` | `False` |
+
+(2) Call the `build_index` method of the face recognition pipeline object to build the facial feature library. The specific parameters are described as follows:
+
+| Parameter | Description | Type | Default |
+|-|-|-|-|
+| `data_root` | The root directory of the dataset, with data organization referring to [Section 2.3: Data Organization for Building a Feature Library](#23-data-organization-for-building-a-feature-library) | `str` | None |
+| `index_dir` | The save path for the feature library. After successfully calling the `build_index` method, two files will be generated in this path:<br/>`"id_map.pkl"` saves the mapping relationship between image IDs and image feature labels;<br/>`"vector.index"` stores the feature vectors of each image. | `str` | None |
+
+(3) Call the `predict` method of the face recognition pipeline object for inference prediction: The `predict` method parameter is `x`, used to input data to be predicted, supporting multiple input methods, as shown in the following examples:
+
+| Parameter Type | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------|
+| Python Var | Supports directly passing in Python variables, such as image data represented by `numpy.ndarray`. |
+| `str` | Supports passing in the file path of the data to be predicted, such as the local path of an image file: `/root/data/img.jpg`. |
+| `str` | Supports passing in the URL of the data file to be predicted, such as the network URL of an image file: [Example](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_001.png). |
+| `str` | Supports passing in a local directory containing the data files to be predicted, such as the local path: `/root/data/`. |
+| `dict` | Supports passing in a dictionary type, where the key needs to correspond to the specific task, such as "img" for image classification tasks, and the value of the dictionary supports the above types of data, for example: `{"img": "/root/data1"}`. |
+| `list` | Supports passing in a list, where the list elements need to be the above types of data, such as `[numpy.ndarray, numpy.ndarray], ["/root/data/img1.jpg", "/root/data/img2.jpg"], ["/root/data1", "/root/data2"], [{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]`. |
+
+(4) Obtain the prediction results by calling the `predict` method: The `predict` method is a `generator`, so prediction results need to be obtained through iteration. The `predict` method predicts data in batches, so the prediction results are in the form of a list.
+
+(5) Process the prediction results: The prediction result for each sample is of type `dict`, and it supports printing or saving to a file. The supported file types depend on the specific pipeline, such as:
+
+| Method | Description | Method Parameters |
+|--------------|-----------------------------|--------------------------------------------------------------------------------------------------------|
+| print | Print results to the terminal | `- format_json`: Boolean, whether to format the output with JSON indentation, default is True;
`- indent`: Integer, JSON formatting setting, effective only when format_json is True, default is 4;
`- ensure_ascii`: Boolean, JSON formatting setting, effective only when format_json is True, default is False; | +| save_to_json | Save results as a JSON file | `- save_path`: String, file path for saving; if it's a directory, the saved file name matches the input file name;
`- indent`: Integer, JSON formatting setting, default is 4;
`- ensure_ascii`: Boolean, JSON formatting setting, default is False; |
| save_to_img | Save results as an image file | `- save_path`: String, file path for saving; if it's a directory, the saved file name matches the input file name; |

If you have obtained the configuration file, you can customize various settings of the face recognition pipeline simply by changing the `pipeline` parameter value in the `create_pipeline` method to the path of your pipeline configuration file.

For example, if your configuration file is saved at `./my_path/face_recognition.yaml`, you just need to execute:

```python
from paddlex import create_pipeline
pipeline = create_pipeline(pipeline="./my_path/face_recognition.yaml")
pipeline.build_index(data_root="face_demo_gallery", index_dir="face_gallery_index")
output = pipeline.predict("friends1.jpg")
for res in output:
    res.print()
    res.save_to_img("./output/")
```

### 2.3 Data Organization for Feature Library Construction

The face recognition pipeline example in PaddleX requires a pre-constructed feature library for face feature retrieval. If you wish to build a face feature library with private data, you need to organize the data as follows:

```bash
data_root             # Root directory of the dataset, the directory name can be changed
├── images            # Directory for saving images, the directory name can be changed
│   ├── ID0           # Identity ID name, preferably meaningful, such as a person's name
│   │   ├── xxx.jpg   # Image, nested directories are supported
│   │   ├── xxx.jpg   # Image, nested directories are supported
│   │   ...
│   ├── ID1           # Identity ID name, preferably meaningful, such as a person's name
│   │   ...
└── gallery.txt       # Annotation file for the feature library dataset, the file name cannot be changed. Each line gives the path of the face image to be retrieved and the image feature label, separated by a space. Example content: images/Chandler/Chandler00037.jpg Chandler
```

## 3. Development Integration/Deployment
If the face recognition pipeline meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.

If you need to directly apply the face recognition pipeline in your Python project, you can refer to the example code in [2.2.2 Integration via Python Script](#222-integration-via-python-script).

Additionally, PaddleX provides three other deployment methods, detailed as follows:

🚀 **High-Performance Inference**: In actual production environments, many applications have stringent standards for the performance metrics of deployment strategies (especially response speed) to ensure efficient system operation and smooth user experience. To this end, PaddleX provides high-performance inference plugins aimed at deeply optimizing model inference and pre/post-processing to significantly speed up the end-to-end process. For detailed high-performance inference procedures, please refer to the [PaddleX High-Performance Inference Guide](../../../pipeline_deploy/high_performance_inference.md).

☁️ **Service-Oriented Deployment**: Service-oriented deployment is a common deployment form in actual production environments. By encapsulating inference functionality as services, clients can access these services through network requests to obtain inference results. PaddleX supports users in achieving service-oriented deployment of pipelines at low cost. 
For detailed service-oriented deployment procedures, please refer to the [PaddleX Service-Oriented Deployment Guide](../../../pipeline_deploy/service_deploy.md). + +Below are the API reference and multi-language service invocation examples: + +
API Reference

For all operations provided by the service:

- The response body and the request body of POST requests are both JSON data (JSON objects).
- When the request is successfully processed, the response status code is `200`, and the attributes of the response body are as follows:

    | Name | Type | Meaning |
    |-|-|-|
    |`errorCode`|`integer`|Error code. Fixed to `0`. |
    |`errorMsg`|`string`|Error description. Fixed to `"Success"`. |

    The response body may also have a `result` attribute of type `object`, which stores the operation result information.

- When the request is not successfully processed, the attributes of the response body are as follows:

    | Name | Type | Meaning |
    |-|-|-|
    |`errorCode`|`integer`|Error code. Same as the response status code. |
    |`errorMsg`|`string`|Error description. |

The operations provided by the service are as follows:

- **`infer`**

    Obtain OCR results for an image.

    `POST /ocr`

    - The attributes of the request body are as follows:

        | Name | Type | Meaning | Required |
        |-|-|-|-|
        |`image`|`string`|The URL of an accessible image file or the Base64-encoded content of the image file. |Yes|
        |`inferenceParams`|`object`|Inference parameters. |No|

    - When the request is successfully processed, the `result` of the response body has the following attributes, as consumed by the invocation examples below:

        | Name | Type | Meaning |
        |-|-|-|
        |`image`|`string`|The result image, Base64-encoded. |
        |`texts`|`array`|The detected text results. |
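The contract above can be checked generically before any result is used; below is a minimal Python sketch of that check (the service address and image URL are placeholders, assuming a locally deployed service):

```python
import requests

API_URL = "http://localhost:8080/ocr"  # placeholder service URL

# `image` accepts an accessible URL or the Base64-encoded file content.
payload = {"image": "https://example.com/demo.jpg"}  # placeholder image URL

response = requests.post(API_URL, json=payload)
body = response.json()

# Per the contract above: a successful request returns status code 200,
# errorCode 0 and errorMsg "Success", with the payload stored under "result".
if response.status_code == 200 and body["errorCode"] == 0:
    result = body["result"]
    print("Detected texts:", result["texts"])
else:
    print("Request failed:", body["errorCode"], body["errorMsg"])
```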
+Python + +```python +import base64 +import requests + +API_URL = "http://localhost:8080/ocr" # Service URL +image_path = "./demo.jpg" +output_image_path = "./out.jpg" + +# Encode the local image to Base64 +with open(image_path, "rb") as file: + image_bytes = file.read() + image_data = base64.b64encode(image_bytes).decode("ascii") + +payload = {"image": image_data} # Base64 encoded file content or image URL + +# Call the API +response = requests.post(API_URL, json=payload) + +# Process the response data +assert response.status_code == 200 +result = response.json()["result"] +with open(output_image_path, "wb") as file: + file.write(base64.b64decode(result["image"])) +print(f"Output image saved at {output_image_path}") +print("\nDetected texts:") +print(result["texts"]) +``` + +
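As the payload comment above notes, `image` may also be an accessible image URL instead of Base64-encoded file content, in which case the encoding step can be skipped; a minimal variant of the request above, reusing the example image URL given earlier in this document:

```python
import requests

API_URL = "http://localhost:8080/ocr"  # Service URL

# Pass an accessible image URL instead of Base64-encoded file content.
payload = {
    "image": "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_001.png"
}

response = requests.post(API_URL, json=payload)
assert response.status_code == 200
result = response.json()["result"]
print("\nDetected texts:")
print(result["texts"])
```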
+ +
+C++ + +```cpp +#include +#include "cpp-httplib/httplib.h" // https://github.com/Huiyicc/cpp-httplib +#include "nlohmann/json.hpp" // https://github.com/nlohmann/json +#include "base64.hpp" // https://github.com/tobiaslocker/base64 + +int main() { + httplib::Client client("localhost:8080"); + const std::string imagePath = "./demo.jpg"; + const std::string outputImagePath = "./out.jpg"; + + httplib::Headers headers = { + {"Content-Type", "application/json"} + }; + + // Encode the local image to Base64 + std::ifstream file(imagePath, std::ios::binary | std::ios::ate); + std::streamsize size = file.tellg(); + file.seekg(0, std::ios::beg); + + std::vector buffer(size); + if (!file.read(buffer.data(), size)) { + std::cerr << "Error reading file." << std::endl; + return 1; + } + std::string bufferStr(reinterpret_cast(buffer.data()), buffer.size()); + std::string encodedImage = base64::to_base64(bufferStr); + + nlohmann::json jsonObj; + jsonObj["image"] = encodedImage; + std::string body = jsonObj.dump(); + + // Call the API + auto response = client.Post("/ocr", headers, body, "application/json"); + // Process the response data + if (response && response->status == 200) { + nlohmann::json jsonResponse = nlohmann::json::parse(response->body); + auto result = jsonResponse["result"]; + + encodedImage = result["image"]; + std::string decodedString = base64::from_base64(encodedImage); + std::vector decodedImage(decodedString.begin(), decodedString.end()); + std::ofstream outputImage(outputImagePath, std::ios::binary | std::ios::out); + if (outputImage.is_open()) { + outputImage.write(reinterpret_cast(decodedImage.data()), decodedImage.size()); + outputImage.close(); + std::cout << "Output image saved at " << outputImagePath << std::endl; + } else { + std::cerr << "Unable to open file for writing: " << outputImagePath << std::endl; + } + + auto texts = result["texts"]; + std::cout << "\nDetected texts:" << std::endl; + for (const auto& text : texts) { + std::cout << text << std::endl; + } + } else { + std::cout << "Failed to send HTTP request." << std::endl; + return 1; + } + + return 0; +} + +``` + +
+``````markdown +# Tutorial on Artificial Intelligence and Computer Vision + +This tutorial, intended for numerous developers, covers the basics and applications of AI and Computer Vision. + +
+Java + +```java +import okhttp3.*; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.node.ObjectNode; + +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.util.Base64; + +public class Main { + public static void main(String[] args) throws IOException { + String API_URL = "http://localhost:8080/ocr"; // Service URL + String imagePath = "./demo.jpg"; // Local image path + String outputImagePath = "./out.jpg"; // Output image path + + // Encode the local image to Base64 + File file = new File(imagePath); + byte[] fileContent = java.nio.file.Files.readAllBytes(file.toPath()); + String imageData = Base64.getEncoder().encodeToString(fileContent); + + ObjectMapper objectMapper = new ObjectMapper(); + ObjectNode params = objectMapper.createObjectNode(); + params.put("image", imageData); // Base64-encoded file content or image URL + + // Create an OkHttpClient instance + OkHttpClient client = new OkHttpClient(); + MediaType JSON = MediaType.get("application/json; charset=utf-8"); + RequestBody body = RequestBody.create(params.toString(), JSON); + Request request = new Request.Builder() + .url(API_URL) + .post(body) + .build(); + + // Call the API and process the response + try (Response response = client.newCall(request).execute()) { + if (response.isSuccessful()) { + String responseBody = response.body().string(); + JsonNode resultNode = objectMapper.readTree(responseBody); + JsonNode result = resultNode.get("result"); + String base64Image = result.get("image").asText(); + JsonNode texts = result.get("texts"); + + byte[] imageBytes = Base64.getDecoder().decode(base64Image); + try (FileOutputStream fos = new FileOutputStream(outputImagePath)) { + fos.write(imageBytes); + } + System.out.println("Output image saved at " + outputImagePath); + System.out.println("\nDetected texts: " + texts.toString()); + } else { + System.err.println("Request failed with code: " + response.code()); + } + } + } +} +``` + +
+ +
Go

```go
package main

import (
    "bytes"
    "encoding/base64"
    "encoding/json"
    "fmt"
    "io/ioutil"
    "net/http"
)

func main() {
    API_URL := "http://localhost:8080/ocr"
    imagePath := "./demo.jpg"
    outputImagePath := "./out.jpg"

    // Encode the local image to Base64
    imageBytes, err := ioutil.ReadFile(imagePath)
    if err != nil {
        fmt.Println("Error reading image file:", err)
        return
    }
    imageData := base64.StdEncoding.EncodeToString(imageBytes)

    payload := map[string]string{"image": imageData} // Base64-encoded file content or image URL
    payloadBytes, err := json.Marshal(payload)
    if err != nil {
        fmt.Println("Error marshaling payload:", err)
        return
    }

    // Call the API
    client := &http.Client{}
    req, err := http.NewRequest("POST", API_URL, bytes.NewBuffer(payloadBytes))
    if err != nil {
        fmt.Println("Error creating request:", err)
        return
    }
    req.Header.Set("Content-Type", "application/json")

    res, err := client.Do(req)
    if err != nil {
        fmt.Println("Error sending request:", err)
        return
    }
    defer res.Body.Close()

    // Process the response data
    body, err := ioutil.ReadAll(res.Body)
    if err != nil {
        fmt.Println("Error reading response body:", err)
        return
    }
    var respData struct {
        Result struct {
            Image string        `json:"image"`
            Texts []interface{} `json:"texts"`
        } `json:"result"`
    }
    if err := json.Unmarshal(body, &respData); err != nil {
        fmt.Println("Error unmarshaling response body:", err)
        return
    }

    // Decode the Base64 result image and save it to disk
    outputImageData, err := base64.StdEncoding.DecodeString(respData.Result.Image)
    if err != nil {
        fmt.Println("Error decoding Base64 image data:", err)
        return
    }
    if err := ioutil.WriteFile(outputImagePath, outputImageData, 0644); err != nil {
        fmt.Println("Error writing image to file:", err)
        return
    }
    fmt.Println("Output image saved at " + outputImagePath)
    fmt.Println("\nDetected texts:")
    for _, text := range respData.Result.Texts {
        fmt.Println(text)
    }
}
```

</details>

<details>
+C# + +```csharp +using System; +using System.IO; +using System.Net.Http; +using System.Net.Http.Headers; +using System.Text; +using System.Threading.Tasks; +using Newtonsoft.Json.Linq; + +class Program +{ + static readonly string API_URL = "http://localhost:8080/ocr"; + static readonly string imagePath = "./demo.jpg"; + static readonly string outputImagePath = "./out.jpg"; + + static async Task Main(string[] args) + { + var httpClient = new HttpClient(); + + // Encode the local image to Base64 + byte[] imageBytes = File.ReadAllBytes(imagePath); + string image_data = Convert.ToBase64String(imageBytes); + + var payload = new JObject{ { "image", image_data } }; // Base64 encoded file content or image URL + var content = new StringContent(payload.ToString(), Encoding.UTF8, "application/json"); + + // Call the API + HttpResponseMessage response = await httpClient.PostAsync(API_URL, content); + response.EnsureSuccessStatusCode(); + + // Process the API response + string responseBody = await response.Content.ReadAsStringAsync(); + JObject jsonResponse = JObject.Parse(responseBody); + + string base64Image = jsonResponse["result"]["image"].ToString(); + byte[] outputImageBytes = Convert.FromBase64String(base64Image); + + File.WriteAllBytes(outputImagePath, outputImageBytes); + Console.WriteLine($"Output image saved at {outputImagePath}"); + Console.WriteLine("\nDetected texts:"); + Console.WriteLine(jsonResponse["result"]["texts"].ToString()); + } +} +``` + +
+ +
+Node.js + +```js +const axios = require('axios'); +const fs = require('fs'); + +const API_URL = 'http://localhost:8080/ocr'; +const imagePath = './demo.jpg'; +const outputImagePath = "./out.jpg"; + +let config = { + method: 'POST', + maxBodyLength: Infinity, + url: API_URL, + data: JSON.stringify({ + 'image': encodeImageToBase64(imagePath) // Base64 encoded file content or image URL + }) +}; + +// Encode the local image to Base64 +function encodeImageToBase64(filePath) { + const bitmap = fs.readFileSync(filePath); + return Buffer.from(bitmap).toString('base64'); +} + +// Call the API +axios.request(config) +.then((response) => { + // Process the API response + const result = response.data["result"]; + const imageBuffer = Buffer.from(result["image"], 'base64'); + fs.writeFile(outputImagePath, imageBuffer, (err) => { + if (err) throw err; + console.log(`Output image saved at ${outputImagePath}`); + }); + console.log("\nDetected texts:"); + console.log(result["texts"]); +}) +.catch((error) => { + console.log(error); +}); +``` + +
+ +
PHP

```php
<?php

$API_URL = "http://localhost:8080/ocr"; // Service URL
$image_path = "./demo.jpg";
$output_image_path = "./out.jpg";

// Encode the local image to Base64
$image_data = base64_encode(file_get_contents($image_path));
$payload = array("image" => $image_data); // Base64 encoded file content or image URL

// Call the API
$ch = curl_init($API_URL);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($payload));
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

// Process the API response
$result = json_decode($response, true)["result"];
file_put_contents($output_image_path, base64_decode($result["image"]));
echo "Output image saved at " . $output_image_path . "\n";
echo "\nDetected texts:\n";
print_r($result["texts"]);
?>
```
+
+
+ +📱 **Edge Deployment**: Edge deployment is a method where computing and data processing functions are placed on the user's device itself, allowing the device to process data directly without relying on remote servers. PaddleX supports deploying models on edge devices such as Android. For detailed edge deployment procedures, please refer to the [PaddleX Edge Deployment Guide](../../../pipeline_deploy/edge_deploy_en.md). +You can choose an appropriate method to deploy your model pipeline based on your needs, and proceed with subsequent AI application integration. + + +## 4. Custom Development +If the default model weights provided by the Face Recognition Pipeline do not meet your expectations in terms of accuracy or speed for your specific scenario, you can try to further **fine-tune** the existing models using **your own domain-specific or application-specific data** to enhance the recognition performance of the pipeline in your scenario. + +### 4.1 Model Fine-tuning +Since the Face Recognition Pipeline consists of two modules (face detection and face recognition), the suboptimal performance of the pipeline may stem from either module. + +You can analyze images with poor recognition results. If you find that many faces are not detected during the analysis, it may indicate deficiencies in the face detection model. In this case, you need to refer to the [Custom Development](../../../module_usage/tutorials/cv_modules/face_detection_en.md#IV.-Custom-Development) section in the [Face Detection Module Development Tutorial](../../../module_usage/tutorials/cv_modules/face_detection_en.md) and use your private dataset to fine-tune the face detection model. If matching errors occur in detected faces, it suggests that the face recognition model needs further improvement. You should refer to the [Custom Development](../../../module_usage/tutorials/cv_modules/face_recognition_en.md#IV.-Custom-Development) section in the [Face Recognition Module Development Tutorial](../../../module_usage/tutorials/cv_modules/face_recognition_en.md) to fine-tune the face recognition model. + +### 4.2 Model Application +After completing fine-tuning training with your private dataset, you will obtain local model weight files. + +To use the fine-tuned model weights, you only need to modify the pipeline configuration file by replacing the local paths of the fine-tuned model weights with the corresponding paths in the pipeline configuration file: + +```bash + +...... +Pipeline: + device: "gpu:0" + det_model: "BlazeFace" # Can be modified to the local path of the fine-tuned face detection model + rec_model: "MobileFaceNet" # Can be modified to the local path of the fine-tuned face recognition model + det_batch_size: 1 + rec_batch_size: 1 + device: gpu +...... +``` +Subsequently, refer to the command-line method or Python script method in [2.2 Local Experience](#22-Local-Experience) to load the modified pipeline configuration file. +Note: Currently, setting separate `batch_size` for face detection and face recognition models is not supported. + +## 5. Multi-hardware Support +PaddleX supports various mainstream hardware devices such as NVIDIA GPUs, Kunlun XPU, Ascend NPU, and Cambricon MLU. **Simply modifying the `--device` parameter** allows seamless switching between different hardware. 
+ +For example, when running the face recognition pipeline using Python and changing the running device from an NVIDIA GPU to an Ascend NPU, you only need to modify the `device` in the script to `npu`: + +```python +from paddlex import create_pipeline + +pipeline = create_pipeline( + pipeline="face_recognition", + device="npu:0" # gpu:0 --> npu:0 +) +``` +If you want to use the face recognition pipeline on more types of hardware, please refer to the [PaddleX Multi-device Usage Guide](../../../other_devices_support/multi_devices_use_guide_en.md). diff --git a/paddlex/configs/face_detection/BlazeFace-FPN-SSH.yaml b/paddlex/configs/face_detection/BlazeFace-FPN-SSH.yaml new file mode 100644 index 000000000..0ca8308f6 --- /dev/null +++ b/paddlex/configs/face_detection/BlazeFace-FPN-SSH.yaml @@ -0,0 +1,40 @@ +Global: + model: BlazeFace-FPN-SSH + mode: check_dataset # check_dataset/train/evaluate/predict + dataset_dir: "/paddle/dataset/paddlex/det/widerface_coco_examples" + device: gpu:0,1,2,3 + output: "output" + +CheckDataset: + convert: + enable: False + src_dataset_type: null + split: + enable: False + train_percent: null + val_percent: null + +Train: + num_classes: 1 + epochs_iters: 50 + batch_size: 4 + learning_rate: 0.001 + pretrain_weight_path: null + warmup_steps: 500 + resume_path: null + log_interval: 10 + eval_interval: 1 + +Evaluate: + weight_path: "output/best_model/best_model.pdparams" + log_interval: 10 + +Export: + weight_path: https://paddledet.bj.bcebos.com/models/blazeface_fpn_ssh_1000e.pdparams + +Predict: + batch_size: 1 + model_dir: "output/best_model/inference" + input: "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/face_detection.png" + kernel_option: + run_mode: paddle diff --git a/paddlex/configs/face_detection/BlazeFace.yaml b/paddlex/configs/face_detection/BlazeFace.yaml new file mode 100644 index 000000000..73ccdc663 --- /dev/null +++ b/paddlex/configs/face_detection/BlazeFace.yaml @@ -0,0 +1,40 @@ +Global: + model: BlazeFace + mode: check_dataset # check_dataset/train/evaluate/predict + dataset_dir: "/paddle/dataset/paddlex/det/widerface_coco_examples" + device: gpu:0,1,2,3 + output: "output" + +CheckDataset: + convert: + enable: False + src_dataset_type: null + split: + enable: False + train_percent: null + val_percent: null + +Train: + num_classes: 1 + epochs_iters: 50 + batch_size: 4 + learning_rate: 0.001 + pretrain_weight_path: null + warmup_steps: 500 + resume_path: null + log_interval: 10 + eval_interval: 1 + +Evaluate: + weight_path: "output/best_model/best_model.pdparams" + log_interval: 10 + +Export: + weight_path: https://paddledet.bj.bcebos.com/models/blazeface_1000e.pdparams + +Predict: + batch_size: 1 + model_dir: "output/best_model/inference" + input: "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/face_detection.png" + kernel_option: + run_mode: paddle diff --git a/paddlex/configs/face_detection/PP-YOLOE_plus-S_face.yaml b/paddlex/configs/face_detection/PP-YOLOE_plus-S_face.yaml new file mode 100644 index 000000000..63fdd4aff --- /dev/null +++ b/paddlex/configs/face_detection/PP-YOLOE_plus-S_face.yaml @@ -0,0 +1,40 @@ +Global: + model: PP-YOLOE_plus-S_face + mode: check_dataset # check_dataset/train/evaluate/predict + dataset_dir: "/paddle/dataset/paddlex/det/widerface_coco_examples" + device: gpu:0,1,2,3 + output: "output" + +CheckDataset: + convert: + enable: False + src_dataset_type: null + split: + enable: False + train_percent: null + val_percent: null + +Train: + num_classes: 1 + epochs_iters: 50 + 
batch_size: 4 + learning_rate: 0.0001 + pretrain_weight_path: null + warmup_steps: 100 + resume_path: null + log_interval: 10 + eval_interval: 1 + +Evaluate: + weight_path: "output/best_model/best_model.pdparams" + log_interval: 10 + +Export: + weight_path: https://paddledet.bj.bcebos.com/models/ppyoloe_plus-s_face.pdparams + +Predict: + batch_size: 1 + model_dir: "output/best_model/inference" + input: "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/face_detection.png" + kernel_option: + run_mode: paddle diff --git a/paddlex/configs/face_recognition/MobileFaceNet.yaml b/paddlex/configs/face_recognition/MobileFaceNet.yaml new file mode 100644 index 000000000..806ecfaba --- /dev/null +++ b/paddlex/configs/face_recognition/MobileFaceNet.yaml @@ -0,0 +1,41 @@ +Global: + model: MobileFaceNet + mode: check_dataset # check_dataset/train/evaluate/predict + dataset_dir: "/paddle/dataset/paddlex/cls/face_train_examples" + device: gpu:0,1,2,3 + output: "output" + +CheckDataset: + convert: + enable: False + src_dataset_type: null + split: + enable: False + train_percent: null + val_percent: null + +Train: + num_classes: 995 + epochs_iters: 25 + batch_size: 128 + learning_rate: 0.002 + pretrain_weight_path: null + warmup_steps: 1 + resume_path: null + log_interval: 1 + eval_interval: 1 + save_interval: 1 + +Evaluate: + weight_path: "output/best_model/best_model.pdparams" + log_interval: 1 + +Export: + weight_path: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/foundation_models/mobilefacenet.pdparams + +Predict: + batch_size: 1 + model_dir: "output/best_model/inference" + input: "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/face_classification_001.jpg" + kernel_option: + run_mode: paddle diff --git a/paddlex/configs/face_recognition/ResNet50_face.yaml b/paddlex/configs/face_recognition/ResNet50_face.yaml new file mode 100644 index 000000000..001fb175f --- /dev/null +++ b/paddlex/configs/face_recognition/ResNet50_face.yaml @@ -0,0 +1,41 @@ +Global: + model: ResNet50_face + mode: check_dataset # check_dataset/train/evaluate/predict + dataset_dir: "/paddle/dataset/paddlex/cls/face_rec_examples" + device: gpu:0,1,2,3 + output: "output" + +CheckDataset: + convert: + enable: False + src_dataset_type: null + split: + enable: False + train_percent: null + val_percent: null + +Train: + num_classes: 995 + epochs_iters: 25 + batch_size: 64 + learning_rate: 0.004 + pretrain_weight_path: null + warmup_steps: 1 + resume_path: null + log_interval: 1 + eval_interval: 1 + save_interval: 1 + +Evaluate: + weight_path: "output/best_model/best_model.pdparams" + log_interval: 1 + +Export: + weight_path: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/foundation_models/resnet50_face.pdparams + +Predict: + batch_size: 1 + model_dir: "output/best_model/inference" + input: "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/face_classification_001.jpg" + kernel_option: + run_mode: paddle diff --git a/paddlex/inference/components/__init__.py b/paddlex/inference/components/__init__.py index b8bfb17d5..072d09e60 100644 --- a/paddlex/inference/components/__init__.py +++ b/paddlex/inference/components/__init__.py @@ -15,3 +15,4 @@ from .transforms import * from .paddle_predictor import * from .task_related import * +from .retrieval import * diff --git a/paddlex/inference/components/retrieval/__init__.py b/paddlex/inference/components/retrieval/__init__.py new file mode 100644 index 000000000..7cfcb5767 --- /dev/null +++ 
b/paddlex/inference/components/retrieval/__init__.py @@ -0,0 +1,15 @@ +# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from .faiss import FaissIndexer diff --git a/paddlex/inference/components/retrieval/faiss.py b/paddlex/inference/components/retrieval/faiss.py new file mode 100644 index 000000000..79c6a3629 --- /dev/null +++ b/paddlex/inference/components/retrieval/faiss.py @@ -0,0 +1,256 @@ +# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import pickle +from pathlib import Path +import faiss +import numpy as np + +from ....utils import logging +from ..base import BaseComponent + + +class FaissIndexer(BaseComponent): + + INPUT_KEYS = "feature" + OUTPUT_KEYS = ["label", "score"] + DEAULT_INPUTS = {"feature": "feature"} + DEAULT_OUTPUTS = {"label": "label", "score": "score", "unique_id": "unique_id"} + + ENABLE_BATCH = True + + def __init__( + self, + index_dir, + metric_type="IP", + return_k=1, + score_thres=None, + hamming_radius=None, + ): + super().__init__() + index_dir = Path(index_dir) + vector_path = (index_dir / "vector.index").as_posix() + id_map_path = (index_dir / "id_map.pkl").as_posix() + + if metric_type == "hamming": + self._indexer = faiss.read_index_binary(vector_path) + self.hamming_radius = hamming_radius + else: + self._indexer = faiss.read_index(vector_path) + self.score_thres = score_thres + with open(id_map_path, "rb") as fd: + self.id_map = pickle.load(fd) + self.metric_type = metric_type + self.return_k = return_k + self.unique_id_map = {k: v+1 for v, k in enumerate(sorted(set(self.id_map.values())))} + + def apply(self, feature): + """apply""" + scores_list, ids_list = self._indexer.search(np.array(feature), self.return_k) + preds = [] + for scores, ids in zip(scores_list, ids_list): + labels = [] + for id in ids: + if id > 0: + labels.append(self.id_map[id]) + preds.append({"score": scores, + "label": labels, + "unique_id": [self.unique_id_map[l] for l in labels]}) + + if self.metric_type == "hamming": + idxs = np.where(scores_list[:, 0] > self.hamming_radius)[0] + else: + idxs = np.where(scores_list[:, 0] < self.score_thres)[0] + for idx in idxs: + preds[idx] = {"score": None, "label": None, "unique_id": None} + return preds + + +class FaissBuilder: + + SUPPORT_MODE = ("new", "remove", "append") + SUPPORT_METRIC_TYPE = ("hamming", "IP", "L2") + SUPPORT_INDEX_TYPE = ("Flat", "IVF", "HNSW32") + BINARY_METRIC_TYPE = ("hamming", 
"jaccard") + BINARY_SUPPORT_INDEX_TYPE = ("Flat", "IVF", "BinaryHash") + + def __init__(self, predict, mode="new", index_type="HNSW32", metric_type="IP"): + super().__init__() + assert mode in self.SUPPORT_MODE, f"Supported modes only: {self.SUPPORT_MODE}!" + assert ( + metric_type in self.SUPPORT_METRIC_TYPE + ), f"Supported metric types only: {self.SUPPORT_METRIC_TYPE}!" + assert ( + index_type in self.SUPPORT_INDEX_TYPE + ), f"Supported index types only: {self.SUPPORT_INDEX_TYPE}!" + + self._predict = predict + self._mode = mode + self._metric_type = metric_type + self._index_type = index_type + + def _get_index_type(self, num=None): + if self._metric_type in self.BINARY_METRIC_TYPE: + assert ( + self._index_type in self.BINARY_SUPPORT_INDEX_TYPE + ), f"The metric type({self._metric_type}) only support {self.BINARY_SUPPORT_INDEX_TYPE} index types!" + + # if IVF method, cal ivf number automaticlly + if self._index_type == "IVF": + index_type = self._index_type + str(min(int(num // 8), 65536)) + if self._metric_type in self.BINARY_METRIC_TYPE: + index_type += ",BFlat" + else: + index_type += ",Flat" + + # for binary index, add B at head of index_type + if self._metric_type in self.BINARY_METRIC_TYPE: + return "B" + index_type + + if self._index_type == "HNSW32": + index_type = self._index_type + logging.warning("The HNSW32 method dose not support 'remove' operation") + return index_type + + def _get_metric_type(self): + if self._metric_type == "hamming": + return faiss.METRIC_Hamming + elif self._metric_type == "jaccard": + return faiss.METRIC_Jaccard + elif self._metric_type == "IP": + return faiss.METRIC_INNER_PRODUCT + elif self._metric_type == "L2": + return faiss.METRIC_L2 + + def build( + self, + label_file, + image_root, + index_dir, + ): + file_list, gallery_docs = get_file_list(label_file, image_root) + if self._mode != "remove": + features = [res["feature"] for res in self._predict(file_list)] + dtype = ( + np.uint8 if self._metric_type in self.BINARY_METRIC_TYPE else np.float32 + ) + features = np.array(features).astype(dtype) + vector_num, vector_dim = features.shape + + if self._mode in ["remove", "append"]: + # if remove or append, load vector.index and id_map.pkl + index, ids = self._load_index(index_dir) + else: + # build index + if self._metric_type in self.BINARY_METRIC_TYPE: + index = faiss.index_binary_factory( + vector_dim, + self._get_index_type(vector_num), + self._get_metric_type(), + ) + else: + index = faiss.index_factory( + vector_dim, + self._get_index_type(vector_num), + self._get_metric_type(), + ) + index = faiss.IndexIDMap2(index) + ids = {} + + if self._mode != "remove": + # calculate id for new data + index, ids = self._add_gallery(index, ids, features, gallery_docs) + else: + if self._index_type == "HNSW32": + raise RuntimeError( + "The index_type: HNSW32 dose not support 'remove' operation" + ) + # remove ids in id_map, remove index data in faiss index + index, ids = self._rm_id_in_galllery(index, ids, gallery_docs) + + # store faiss index file and id_map file + self._save_gallery(index, ids, index_dir) + + def _load_index(self, index_dir): + assert os.path.join( + index_dir, "vector.index" + ), "The vector.index dose not exist in {} when 'index_operation' is not None".format( + index_dir + ) + assert os.path.join( + index_dir, "id_map.pkl" + ), "The id_map.pkl dose not exist in {} when 'index_operation' is not None".format( + index_dir + ) + index = faiss.read_index(os.path.join(index_dir, "vector.index")) + with open(os.path.join(index_dir, 
"id_map.pkl"), "rb") as fd: + ids = pickle.load(fd) + assert index.ntotal == len( + ids.keys() + ), "data number in index is not equal in in id_map" + return index, ids + + def _add_gallery(self, index, ids, gallery_features, gallery_docs): + start_id = max(ids.keys()) + 1 if ids else 0 + ids_now = (np.arange(0, len(gallery_docs)) + start_id).astype(np.int64) + + # only train when new index file + if self._mode == "new": + if self._metric_type in self.BINARY_METRIC_TYPE: + index.add(gallery_features) + else: + index.train(gallery_features) + + if not self._metric_type in self.BINARY_METRIC_TYPE: + index.add_with_ids(gallery_features, ids_now) + + for i, d in zip(list(ids_now), gallery_docs): + ids[i] = d + return index, ids + + def _rm_id_in_galllery(self, index, ids, gallery_docs): + remove_ids = list(filter(lambda k: ids.get(k) in gallery_docs, ids.keys())) + remove_ids = np.asarray(remove_ids) + index.remove_ids(remove_ids) + for k in remove_ids: + del ids[k] + + return index, ids + + def _save_gallery(self, index, ids, index_dir): + Path(index_dir).mkdir(parents=True, exist_ok=True) + if self._metric_type in self.BINARY_METRIC_TYPE: + faiss.write_index_binary(index, os.path.join(index_dir, "vector.index")) + else: + faiss.write_index(index, os.path.join(index_dir, "vector.index")) + + with open(os.path.join(index_dir, "id_map.pkl"), "wb") as fd: + pickle.dump(ids, fd) + + +def get_file_list(data_file, root_dir, delimiter="\t"): + root_dir = Path(root_dir) + files = [] + labels = [] + lines = [] + with open(data_file, "r", encoding="utf-8") as f: + lines = f.readlines() + for line in lines: + path, label = line.strip().split(delimiter) + file_path = root_dir / path + files.append(file_path.as_posix()) + labels.append(label) + + return files, labels diff --git a/paddlex/inference/components/task_related/clas.py b/paddlex/inference/components/task_related/clas.py index 1ecf1d2e4..f1e1356e8 100644 --- a/paddlex/inference/components/task_related/clas.py +++ b/paddlex/inference/components/task_related/clas.py @@ -113,12 +113,12 @@ class NormalizeFeatures(BaseComponent): """Normalize Features Transform""" INPUT_KEYS = ["pred"] - OUTPUT_KEYS = ["rec_feature"] + OUTPUT_KEYS = ["feature"] DEAULT_INPUTS = {"pred": "pred"} - DEAULT_OUTPUTS = {"rec_feature": "rec_feature"} + DEAULT_OUTPUTS = {"feature": "feature"} def apply(self, pred): """apply""" feas_norm = np.sqrt(np.sum(np.square(pred[0]), axis=0, keepdims=True)) - rec_feature = np.divide(pred[0], feas_norm) - return {"rec_feature": rec_feature} + feature = np.divide(pred[0], feas_norm) + return {"feature": feature} diff --git a/paddlex/inference/models/__init__.py b/paddlex/inference/models/__init__.py index 49143fa4a..07feeb58f 100644 --- a/paddlex/inference/models/__init__.py +++ b/paddlex/inference/models/__init__.py @@ -34,6 +34,7 @@ from .multilabel_classification import MLClasPredictor from .anomaly_detection import UadPredictor from .formula_recognition import LaTeXOCRPredictor +from .face_recognition import FaceRecPredictor def _create_hp_predictor( diff --git a/paddlex/inference/models/face_recognition.py b/paddlex/inference/models/face_recognition.py new file mode 100644 index 000000000..075f0187f --- /dev/null +++ b/paddlex/inference/models/face_recognition.py @@ -0,0 +1,99 @@ +# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np + +from paddlex.utils.func_register import FuncRegister +from paddlex.modules.face_recognition.model_list import MODELS +from ..components import * +from ..results import BaseResult +from .base import BasicPredictor + + +class FaceRecPredictor(BasicPredictor): + + entities = MODELS + + _FUNC_MAP = {} + register = FuncRegister(_FUNC_MAP) + + def _build_components(self): + self._add_component(ReadImage(format="RGB")) + for cfg in self.config["PreProcess"]["transform_ops"]: + tf_key = list(cfg.keys())[0] + func = self._FUNC_MAP[tf_key] + args = cfg.get(tf_key, {}) + op = func(self, **args) if args else func(self) + self._add_component(op) + + predictor = ImagePredictor( + model_dir=self.model_dir, + model_prefix=self.MODEL_FILE_PREFIX, + option=self.pp_option, + ) + self._add_component(predictor) + + post_processes = self.config["PostProcess"] + for key in post_processes: + func = self._FUNC_MAP.get(key) + args = post_processes.get(key, {}) + op = func(self, **args) if args else func(self) + self._add_component(op) + + @register("ResizeImage") + # TODO(gaotingquan): backend & interpolation + def build_resize( + self, + resize_short=None, + size=None, + backend="cv2", + interpolation="LINEAR", + return_numpy=False, + ): + assert resize_short or size + if resize_short: + op = ResizeByShort( + target_short_edge=resize_short, size_divisor=None, interp="LINEAR" + ) + else: + op = Resize(target_size=size) + return op + + @register("CropImage") + def build_crop(self, size=224): + return Crop(crop_size=size) + + @register("NormalizeImage") + def build_normalize( + self, + mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225], + scale=1 / 255, + order="", + channel_num=3, + ): + assert channel_num == 3 + return Normalize(mean=mean, std=std) + + @register("ToCHWImage") + def build_to_chw(self): + return ToCHWImage() + + @register("NormalizeFeatures") + def build_normalize_features(self): + return NormalizeFeatures() + + def _pack_res(self, data): + keys = ["input_path", "feature"] + return BaseResult({key: data[key] for key in keys}) diff --git a/paddlex/inference/models/object_detection.py b/paddlex/inference/models/object_detection.py index 49d8cb904..533b82008 100644 --- a/paddlex/inference/models/object_detection.py +++ b/paddlex/inference/models/object_detection.py @@ -53,7 +53,7 @@ def _build_components(self): } ) - if self.model_name == "Blazeface": + if self.model_name in ["BlazeFace", "BlazeFace-FPN-SSH"]: predictor.set_inputs( { "img": "img", diff --git a/paddlex/inference/pipelines/__init__.py b/paddlex/inference/pipelines/__init__.py index 8f1da24d4..e2aee8822 100644 --- a/paddlex/inference/pipelines/__init__.py +++ b/paddlex/inference/pipelines/__init__.py @@ -34,6 +34,7 @@ from .ocr import OCRPipeline from .formula_recognition import FormulaRecognitionPipeline from .table_recognition import TableRecPipeline +from .face_recognition import FaceRecPipeline from .seal_recognition import SealOCRPipeline from .ppchatocrv3 import PPChatOCRPipeline from .layout_parsing import LayoutParsingPipeline diff --git 
a/paddlex/inference/pipelines/face_recognition/__init__.py b/paddlex/inference/pipelines/face_recognition/__init__.py new file mode 100644 index 000000000..d4e1ff6dc --- /dev/null +++ b/paddlex/inference/pipelines/face_recognition/__init__.py @@ -0,0 +1,15 @@ +# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from .face_recognition import FaceRecPipeline diff --git a/paddlex/inference/pipelines/face_recognition/face_recognition.py b/paddlex/inference/pipelines/face_recognition/face_recognition.py new file mode 100644 index 000000000..0f66c13a7 --- /dev/null +++ b/paddlex/inference/pipelines/face_recognition/face_recognition.py @@ -0,0 +1,138 @@ +# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os + +import numpy as np +from paddlex.inference.components import CropByBoxes, FaissIndexer +from paddlex.inference.components.retrieval.faiss import FaissBuilder +from paddlex.inference.results import FaceRecResult +from paddlex.inference.utils.io import ImageReader +from ..base import BasePipeline + + +class FaceRecPipeline(BasePipeline): + """Face Recognition Pipeline""" + + entities = "face_recognition" + + def __init__( + self, + det_model, + rec_model, + det_batch_size=1, + rec_batch_size=1, + index_dir=None, + metric_type="IP", + score_thres=None, + hamming_radius=None, + return_k=5, + device=None, + predictor_kwargs=None, + ): + super().__init__(device, predictor_kwargs) + self._build_predictor(det_model, rec_model) + self.set_predictor(det_batch_size, rec_batch_size, device) + self.indexer_kargs = { + "return_k": return_k, + "metric_type": metric_type, + "score_thres": score_thres, + "hamming_radius": hamming_radius, + } + self._indexer = ( + FaissIndexer(index_dir, metric_type, return_k, score_thres, hamming_radius) + if index_dir + else None + ) + + def _build_predictor(self, det_model, rec_model): + self.det_model = self._create(model=det_model) + self.rec_model = self._create(model=rec_model) + self._crop_by_boxes = CropByBoxes() + self._img_reader = ImageReader(backend="opencv") + + def set_predictor(self, det_batch_size=None, rec_batch_size=None, device=None): + if det_batch_size: + self.det_model.set_predictor(batch_size=det_batch_size) + if rec_batch_size: + self.rec_model.set_predictor(batch_size=rec_batch_size) + if device: + self.det_model.set_predictor(device=device) + self.rec_model.set_predictor(device=device) + + def predict(self, input, **kwargs): + assert self._indexer + self.set_predictor(**kwargs) + for det_res in self.det_model(input): + rec_res = self.get_rec_result(det_res) + yield self.get_final_result(det_res, rec_res) + + def get_rec_result(self, det_res): + full_img = self._img_reader.read(det_res["input_path"]) + w, h = full_img.shape[:2] + # det_res["boxes"].append( + # {"cls_id": 0, "label": "full_img", "score": 0, "coordinate": [0, 0, h, w]} + # ) + subs_of_img = list(self._crop_by_boxes(det_res)) + img_list = [img["img"] for img in subs_of_img] + all_rec_res = list(self.rec_model(img_list)) + all_rec_res = next(self._indexer(all_rec_res)) + output = {"label": [], "score": [], "unique_id": []} + for res in all_rec_res: + output["label"].append(res["label"]) + output["score"].append(res["score"]) + output["unique_id"].append(res["unique_id"]) + return output + + def get_final_result(self, det_res, rec_res): + single_img_res = {"input_path": det_res["input_path"], "boxes": []} + for i, obj in enumerate(det_res["boxes"]): + rec_scores = rec_res["score"][i] + labels = rec_res["label"][i] + rec_ids = rec_res["unique_id"][i] + single_img_res["boxes"].append( + { + "labels": labels, + "rec_scores": rec_scores, + "rec_ids": rec_ids, + "det_score": obj["score"], + "coordinate": obj["coordinate"], + } + ) + return FaceRecResult(single_img_res) + + def build_index( + self, + data_root, + index_dir, + mode="new", + metric_type="IP", + index_type="HNSW32", + **kwargs, + ): + self.set_predictor(**kwargs) + builder = FaissBuilder( + self.rec_model.predict, + mode=mode, + metric_type=metric_type, + index_type=index_type, + ) + label_file = os.path.join(data_root, "gallery.txt") + assert os.path.exists(label_file), f"{label_file} not exists." 
+ builder.build(label_file, data_root, index_dir) + self._indexer = FaissIndexer(index_dir, metric_type, + return_k=self.indexer_kargs['return_k'], + score_thres=self.indexer_kargs['score_thres'], + hamming_radius=self.indexer_kargs['hamming_radius']) + return diff --git a/paddlex/inference/results/__init__.py b/paddlex/inference/results/__init__.py index 0438063f4..63fe66ce4 100644 --- a/paddlex/inference/results/__init__.py +++ b/paddlex/inference/results/__init__.py @@ -26,3 +26,4 @@ from .ts import TSFcResult, TSAdResult, TSClsResult from .warp import DocTrResult from .chat_ocr import * +from .face_rec import FaceRecResult diff --git a/paddlex/inference/results/face_rec.py b/paddlex/inference/results/face_rec.py new file mode 100644 index 000000000..30118d21b --- /dev/null +++ b/paddlex/inference/results/face_rec.py @@ -0,0 +1,35 @@ +# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np +from .base import CVResult +from .det import draw_box + + +class FaceRecResult(CVResult): + + def _to_img(self): + """apply""" + image = self._img_reader.read(self["input_path"]) + boxes = [ + { + "coordinate": box["coordinate"], + "label": box["labels"][0] if box["labels"] is not None else "Unknown", + "score": box["rec_scores"][0] if box["rec_scores"] is not None else 0, + "cls_id": box["rec_ids"][0] if box["rec_ids"] is not None else 0 # rec ids start from 1 + } + for box in self["boxes"] + ] + image = draw_box(image, boxes) + return image diff --git a/paddlex/inference/utils/official_models.py b/paddlex/inference/utils/official_models.py index e4a39c8c3..b8ab0b3f0 100644 --- a/paddlex/inference/utils/official_models.py +++ b/paddlex/inference/utils/official_models.py @@ -258,6 +258,11 @@ "RT-DETR-H_layout_3cls": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/RT-DETR-H_layout_3cls_infer.tar", "RT-DETR-H_layout_17cls": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/RT-DETR-H_layout_17cls_infer.tar", "PicoDet_LCNet_x2_5_face": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/PicoDet_LCNet_x2_5_face_infer.tar", + "BlazeFace": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/BlazeFace_infer.tar", + "BlazeFace-FPN-SSH": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/BlazeFace-FPN-SSH_infer.tar", + "PP-YOLOE_plus-S_face": "https://paddle-model-ecology.bj.bcebos.com/paddlex/PP-YOLOE_plus-S_face_infer.tar", + "MobileFaceNet": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/MobileFaceNet_infer.tar", + "ResNet50_face": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b1_v2/ResNet50_face_infer.tar" } diff --git a/paddlex/modules/__init__.py b/paddlex/modules/__init__.py index 0369f2d58..27c23eeec 100644 --- a/paddlex/modules/__init__.py +++ 
b/paddlex/modules/__init__.py @@ -95,4 +95,11 @@ TSCLSExportor, ) +from .face_recognition import ( + FaceRecDatasetChecker, + FaceRecTrainer, + FaceRecEvaluator, + FaceRecExportor, +) + from .ts_forecast import TSFCDatasetChecker, TSFCTrainer, TSFCEvaluator diff --git a/paddlex/modules/face_recognition/__init__.py b/paddlex/modules/face_recognition/__init__.py new file mode 100644 index 000000000..c9092df7d --- /dev/null +++ b/paddlex/modules/face_recognition/__init__.py @@ -0,0 +1,18 @@ +# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from .trainer import FaceRecTrainer +from .dataset_checker import FaceRecDatasetChecker +from .evaluator import FaceRecEvaluator +from .exportor import FaceRecExportor diff --git a/paddlex/modules/face_recognition/dataset_checker/__init__.py b/paddlex/modules/face_recognition/dataset_checker/__init__.py new file mode 100644 index 000000000..a7f2fac8a --- /dev/null +++ b/paddlex/modules/face_recognition/dataset_checker/__init__.py @@ -0,0 +1,71 @@ +# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from pathlib import Path + +from ...base import BaseDatasetChecker +from .dataset_src import check_train, check_val +from ..model_list import MODELS + + +class FaceRecDatasetChecker(BaseDatasetChecker): + """Dataset Checker for Image Classification Model""" + + entities = MODELS + sample_num = 10 + + def get_dataset_root(self, dataset_dir: str) -> str: + """find the dataset root dir + + Args: + dataset_dir (str): the directory that contain dataset. + + Returns: + str: the root directory of dataset. + """ + anno_dirs = list(Path(dataset_dir).glob("**/images")) + assert len(anno_dirs) == 2 + dataset_dir = anno_dirs[0].parent.parent.as_posix() + return dataset_dir + + def check_dataset(self, dataset_dir: str, sample_num: int = sample_num) -> dict: + """check if the dataset meets the specifications and get dataset summary + + Args: + dataset_dir (str): the root directory of dataset. + sample_num (int): the number to be sampled. + Returns: + dict: dataset summary. 
+ """ + train_attr = check_train(os.path.join(dataset_dir, "train"), self.output) + val_attr = check_val(os.path.join(dataset_dir, "val"), self.output) + train_attr.update(val_attr) + return train_attr + + def get_show_type(self) -> str: + """get the show type of dataset + + Returns: + str: show type + """ + return "image" + + def get_dataset_type(self) -> str: + """return the dataset type + + Returns: + str: dataset type + """ + return "ClsDataset" diff --git a/paddlex/modules/face_recognition/dataset_checker/dataset_src/__init__.py b/paddlex/modules/face_recognition/dataset_checker/dataset_src/__init__.py new file mode 100644 index 000000000..f807cfecb --- /dev/null +++ b/paddlex/modules/face_recognition/dataset_checker/dataset_src/__init__.py @@ -0,0 +1,16 @@ +# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from .check_dataset import check_train, check_val diff --git a/paddlex/modules/face_recognition/dataset_checker/dataset_src/check_dataset.py b/paddlex/modules/face_recognition/dataset_checker/dataset_src/check_dataset.py new file mode 100644 index 000000000..939a51113 --- /dev/null +++ b/paddlex/modules/face_recognition/dataset_checker/dataset_src/check_dataset.py @@ -0,0 +1,156 @@ +# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +import os.path as osp +import random +import pickle +from PIL import Image, ImageOps +from collections import defaultdict +from tqdm import tqdm + +from .....utils.errors import DatasetFileNotFoundError, CheckFailedError +from .utils.visualizer import draw_label + + +def check_train(dataset_dir, output, sample_num=10): + """check dataset""" + dataset_dir = osp.abspath(dataset_dir) + # Custom dataset + if not osp.exists(dataset_dir) or not osp.isdir(dataset_dir): + raise DatasetFileNotFoundError(file_path=dataset_dir) + + delim = " " + valid_num_parts = 2 + + label_map_dict = dict() + sample_paths = [] + labels = [] + + label_file = osp.join(dataset_dir, "label.txt") + if not osp.exists(label_file): + raise DatasetFileNotFoundError( + file_path=label_file, + solution=f"Ensure that `label.txt` exist in {dataset_dir}", + ) + with open(label_file, "r", encoding="utf-8") as f: + all_lines = f.readlines() + random.seed(123) + random.shuffle(all_lines) + sample_cnts = len(all_lines) + for line in all_lines: + substr = line.strip("\n").split(delim) + if len(substr) != valid_num_parts: + raise CheckFailedError( + f"The number of delimiter-separated items in each row in {label_file} \ + should be {valid_num_parts} (current delimiter is '{delim}')." + ) + file_name = substr[0] + label = substr[1] + + img_path = osp.join(dataset_dir, file_name) + + if not osp.exists(img_path): + raise DatasetFileNotFoundError(file_path=img_path) + + vis_save_dir = osp.join(output, "demo_img") + if not osp.exists(vis_save_dir): + os.makedirs(vis_save_dir) + + try: + label = int(label) + label_map_dict[label] = str(label) + except (ValueError, TypeError) as e: + raise CheckFailedError( + f"Ensure that the second number in each line in {label_file} should be int." + ) from e + + if len(sample_paths) < sample_num: + img = Image.open(img_path) + img = ImageOps.exif_transpose(img) + vis_im = draw_label(img, label, label_map_dict) + vis_path = osp.join(vis_save_dir, osp.basename(file_name)) + vis_im.save(vis_path) + sample_path = osp.join( + "check_dataset", os.path.relpath(vis_path, output) + ) + sample_paths.append(sample_path) + labels.append(label) + if min(labels) != 0: + raise CheckFailedError( + f"Ensure that the index starts from 0 in `{label_file}`." + ) + num_classes = max(labels) + 1 + attrs = {} + attrs["train_label_file"] = osp.relpath(label_file, output) + attrs["train_num_classes"] = num_classes + attrs["train_samples"] = sample_cnts + attrs["train_sample_paths"] = sample_paths + return attrs + +def check_val(dataset_dir, output, sample_num=10): + """check dataset""" + dataset_dir = osp.abspath(dataset_dir) + # Custom dataset + if not osp.exists(dataset_dir) or not osp.isdir(dataset_dir): + raise DatasetFileNotFoundError(file_path=dataset_dir) + + delim = " " + valid_num_parts = 3 + + labels = [] + label_file = osp.join(dataset_dir, "pair_label.txt") + if not osp.exists(label_file): + raise DatasetFileNotFoundError( + file_path=label_file, + solution=f"Ensure that `label.txt` exist in {dataset_dir}", + ) + with open(label_file, "r", encoding="utf-8") as f: + all_lines = f.readlines() + random.seed(123) + random.shuffle(all_lines) + sample_cnts = len(all_lines) + for line in all_lines: + substr = line.strip("\n").split(delim) + if len(substr) != valid_num_parts: + raise CheckFailedError( + f"The number of delimiter-separated items in each row in {label_file} \ + should be {valid_num_parts} (current delimiter is '{delim}')." 
+                )
+            left_file_name = substr[0]
+            right_file_name = substr[1]
+            label = substr[2]
+
+            left_img_path = osp.join(dataset_dir, left_file_name)
+            if not osp.exists(left_img_path):
+                raise DatasetFileNotFoundError(file_path=left_img_path)
+
+            right_img_path = osp.join(dataset_dir, right_file_name)
+            if not osp.exists(right_img_path):
+                raise DatasetFileNotFoundError(file_path=right_img_path)
+
+            try:
+                label = int(label)
+            except (ValueError, TypeError) as e:
+                raise CheckFailedError(
+                    f"Ensure that the third field in each line of {label_file} is an integer."
+                ) from e
+            if label not in (0, 1):
+                raise CheckFailedError(
+                    f"Ensure that pair labels in {label_file} are 0 or 1: "
+                    "the face evaluation dataset only supports two classes."
+                )
+            labels.append(label)
+        num_classes = max(labels) + 1
+        attrs = {}
+        attrs["val_label_file"] = osp.relpath(label_file, output)
+        attrs["val_num_classes"] = num_classes
+        attrs["val_samples"] = sample_cnts
+        return attrs
diff --git a/paddlex/modules/face_recognition/dataset_checker/dataset_src/utils/__init__.py b/paddlex/modules/face_recognition/dataset_checker/dataset_src/utils/__init__.py
new file mode 100644
index 000000000..59372f937
--- /dev/null
+++ b/paddlex/modules/face_recognition/dataset_checker/dataset_src/utils/__init__.py
@@ -0,0 +1,13 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
diff --git a/paddlex/modules/face_recognition/dataset_checker/dataset_src/utils/visualizer.py b/paddlex/modules/face_recognition/dataset_checker/dataset_src/utils/visualizer.py
new file mode 100644
index 000000000..110e0ec6d
--- /dev/null
+++ b/paddlex/modules/face_recognition/dataset_checker/dataset_src/utils/visualizer.py
@@ -0,0 +1,156 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
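+
+# Draws the integer class label as a small colored badge in the top-left
+# corner of a sample image for the dataset-check report. The Pillow version
+# branching below is needed because `ImageDraw.textsize` was removed in
+# Pillow 10.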
+
+import numpy as np
+import PIL
+from PIL import ImageDraw, ImageFont
+
+from ......utils.fonts import PINGFANG_FONT_FILE_PATH
+
+
+def colormap(rgb=False):
+    """Get a 20-entry colormap as an (N, 3) int32 array; RGB order when
+    ``rgb=True``, otherwise BGR."""
+    color_list = np.array(
+        [
+            0xFF, 0x00, 0x00,
+            0xCC, 0xFF, 0x00,
+            0x00, 0xFF, 0x66,
+            0x00, 0x66, 0xFF,
+            0xCC, 0x00, 0xFF,
+            0xFF, 0x4D, 0x00,
+            0x80, 0xFF, 0x00,
+            0x00, 0xFF, 0xB2,
+            0x00, 0x1A, 0xFF,
+            0xFF, 0x00, 0xE5,
+            0xFF, 0x99, 0x00,
+            0x33, 0xFF, 0x00,
+            0x00, 0xFF, 0xFF,
+            0x33, 0x00, 0xFF,
+            0xFF, 0x00, 0x99,
+            0xFF, 0xE5, 0x00,
+            0x00, 0xFF, 0x1A,
+            0x00, 0xB2, 0xFF,
+            0x80, 0x00, 0xFF,
+            0xFF, 0x00, 0x4D,
+        ]
+    ).astype(np.float32)
+    color_list = color_list.reshape((-1, 3))
+    if not rgb:
+        color_list = color_list[:, ::-1]
+    return color_list.astype("int32")
+
+
+def font_colormap(color_index):
+    """Choose a light or dark font color that contrasts with the box color."""
+    dark = np.array([0x14, 0x0E, 0x35])
+    light = np.array([0xFF, 0xFF, 0xFF])
+    light_indexs = [0, 3, 4, 8, 9, 13, 14, 18, 19]
+    if color_index in light_indexs:
+        return light.astype("int32")
+    else:
+        return dark.astype("int32")
+
+
+def _text_size(draw, text, font):
+    """Measure rendered text size; `ImageDraw.textsize` was removed in Pillow 10."""
+    if tuple(map(int, PIL.__version__.split("."))) < (10, 0, 0):
+        return draw.textsize(text, font)
+    left, top, right, bottom = draw.textbbox((0, 0), text, font)
+    return right - left, bottom - top
+
+
+def draw_label(image, label, label_map_dict):
+    """Draw label on image"""
+    image = image.convert("RGB")
+    image_size = image.size
+    draw = ImageDraw.Draw(image)
+    text = label_map_dict[int(label)]
+    # Shrink the font until the label fits within the image width.
+    min_font_size = int(image_size[0] * 0.02)
+    max_font_size = int(image_size[0] * 0.05)
+    for font_size in range(max_font_size, min_font_size - 1, -1):
+        font = ImageFont.truetype(PINGFANG_FONT_FILE_PATH, font_size, encoding="utf-8")
+        text_width_tmp, text_height_tmp = _text_size(draw, text, font)
+        if text_width_tmp <= image_size[0]:
+            break
+    else:
+        font = ImageFont.truetype(PINGFANG_FONT_FILE_PATH, min_font_size)
+    color_list = colormap(rgb=True)
+    color = tuple(color_list[0])
+    font_color = tuple(font_colormap(3))
+    text_width, text_height = _text_size(draw, text, font)
+
+    # Filled background box behind the label text.
+    rect_left = 3
+    rect_top = 3
+    rect_right = rect_left + text_width + 3
+    rect_bottom = rect_top + text_height + 6
+    draw.rectangle([(rect_left, rect_top), (rect_right, rect_bottom)], fill=color)
+
+    text_x = rect_left + 3
+    text_y = rect_top
+    draw.text((text_x, text_y), text, fill=font_color, font=font)
+
+    return image
diff --git a/paddlex/modules/face_recognition/evaluator.py b/paddlex/modules/face_recognition/evaluator.py
new file mode 100644
index 000000000..67875f2b9
--- /dev/null
+++ b/paddlex/modules/face_recognition/evaluator.py
@@ -0,0 +1,51 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+from paddlex.utils.misc import abspath
+from ..base import BaseEvaluator
+from .model_list import MODELS
+
+
+class FaceRecEvaluator(BaseEvaluator):
+    """Face Recognition Model Evaluator"""
+
+    entities = MODELS
+
+    def update_config(self):
+        """update evaluation config"""
+        if self.eval_config.log_interval:
+            self.pdx_config.update_log_interval(self.eval_config.log_interval)
+        self.update_dataset_cfg()
+        self.pdx_config.update_pretrained_weights(self.eval_config.weight_path)
+
+    def update_dataset_cfg(self):
+        """update dataset config for evaluation"""
+        val_dataset_dir = abspath(os.path.join(self.global_config.dataset_dir, "val"))
+        val_list_path = abspath(os.path.join(val_dataset_dir, "pair_label.txt"))
+        ds_cfg = [
+            "DataLoader.Eval.dataset.name=FaceEvalDataset",
+            f"DataLoader.Eval.dataset.dataset_root={val_dataset_dir}",
+            f"DataLoader.Eval.dataset.pair_label_path={val_list_path}",
+        ]
+        self.pdx_config.update(ds_cfg)
+
+    def get_eval_kwargs(self) -> dict:
+        """get key-value arguments of model evaluation function
+
+        Returns:
+            dict: the arguments of evaluation function.
+        """
+        return {
+            "weight_path": self.eval_config.weight_path,
+            "device": self.get_device(using_device_number=1),
+        }
diff --git a/paddlex/modules/face_recognition/exportor.py b/paddlex/modules/face_recognition/exportor.py
new file mode 100644
index 000000000..6f73a84c9
--- /dev/null
+++ b/paddlex/modules/face_recognition/exportor.py
@@ -0,0 +1,22 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from ..base import BaseExportor
+from .model_list import MODELS
+
+
+class FaceRecExportor(BaseExportor):
+    """Face Recognition Model Exporter"""
+
+    entities = MODELS
diff --git a/paddlex/modules/face_recognition/model_list.py b/paddlex/modules/face_recognition/model_list.py
new file mode 100644
index 000000000..f4634163c
--- /dev/null
+++ b/paddlex/modules/face_recognition/model_list.py
@@ -0,0 +1,18 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+MODELS = [
+    "MobileFaceNet",
+    "ResNet50_face",
+]
diff --git a/paddlex/modules/face_recognition/trainer.py b/paddlex/modules/face_recognition/trainer.py
new file mode 100644
index 000000000..bd4ed5a45
--- /dev/null
+++ b/paddlex/modules/face_recognition/trainer.py
@@ -0,0 +1,73 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from pathlib import Path
+
+from paddlex.utils.misc import abspath
+from ..image_classification import ClsTrainer
+from .model_list import MODELS
+
+
+class FaceRecTrainer(ClsTrainer):
+    """Face Recognition Model Trainer"""
+
+    entities = MODELS
+
+    def update_config(self):
+        """update training config"""
+        if self.train_config.log_interval:
+            self.pdx_config.update_log_interval(self.train_config.log_interval)
+        if self.train_config.eval_interval:
+            self.pdx_config.update_eval_interval(self.train_config.eval_interval)
+        if self.train_config.save_interval:
+            self.pdx_config.update_save_interval(self.train_config.save_interval)
+
+        self.update_dataset_cfg()
+        if self.train_config.num_classes is not None:
+            self.pdx_config.update_num_classes(self.train_config.num_classes)
+        if self.train_config.pretrain_weight_path != "":
+            self.pdx_config.update_pretrained_weights(
+                self.train_config.pretrain_weight_path
+            )
+
+        label_dict_path = Path(self.global_config.dataset_dir).joinpath("label.txt")
+        if label_dict_path.exists():
+            self.dump_label_dict(label_dict_path)
+        if self.train_config.batch_size is not None:
+            self.pdx_config.update_batch_size(self.train_config.batch_size)
+        if self.train_config.learning_rate is not None:
+            self.pdx_config.update_learning_rate(self.train_config.learning_rate)
+        if self.train_config.epochs_iters is not None:
+            self.pdx_config._update_epochs(self.train_config.epochs_iters)
+        if self.train_config.warmup_steps is not None:
+            self.pdx_config.update_warmup_epochs(self.train_config.warmup_steps)
+        if self.global_config.output is not None:
+            self.pdx_config._update_output_dir(self.global_config.output)
+
+    def update_dataset_cfg(self):
+        """update dataset config for training and evaluation"""
+        train_dataset_dir = abspath(
+            os.path.join(self.global_config.dataset_dir, "train")
+        )
+        val_dataset_dir = abspath(os.path.join(self.global_config.dataset_dir, "val"))
+        train_list_path = abspath(os.path.join(train_dataset_dir, "label.txt"))
+        val_list_path = abspath(os.path.join(val_dataset_dir, "pair_label.txt"))
+
+        ds_cfg = [
+            "DataLoader.Train.dataset.name=ClsDataset",
+            f"DataLoader.Train.dataset.image_root={train_dataset_dir}",
+            f"DataLoader.Train.dataset.cls_label_path={train_list_path}",
+            "DataLoader.Eval.dataset.name=FaceEvalDataset",
+            f"DataLoader.Eval.dataset.dataset_root={val_dataset_dir}",
+            f"DataLoader.Eval.dataset.pair_label_path={val_list_path}",
+        ]
+        self.pdx_config.update(ds_cfg)
diff --git a/paddlex/modules/object_detection/model_list.py b/paddlex/modules/object_detection/model_list.py
index 1f3172857..e8099ac2d 100644
--- a/paddlex/modules/object_detection/model_list.py
+++ b/paddlex/modules/object_detection/model_list.py
@@ -64,4 +64,7 @@
     "CenterNet-DLA-34",
     "CenterNet-ResNet50",
     "PicoDet_LCNet_x2_5_face",
+    "BlazeFace",
+    "BlazeFace-FPN-SSH",
+    "PP-YOLOE_plus-S_face",
 ]
diff --git a/paddlex/pipelines/face_recognition.yaml b/paddlex/pipelines/face_recognition.yaml
new file mode 100644
index 000000000..ce1cccb27
--- /dev/null
+++ b/paddlex/pipelines/face_recognition.yaml
@@ -0,0 +1,13 @@
+Global:
+  pipeline_name: face_recognition
+  input: https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/friends1.jpg
+
+Pipeline:
+  det_model: "BlazeFace"
+  rec_model: "MobileFaceNet"
+  det_batch_size: 1
+  rec_batch_size: 1
+  device: gpu
+  index_dir: None
+  score_thres: 0.4
+  return_k: 5
diff --git a/paddlex/repo_apis/PaddleClas_api/cls/register.py b/paddlex/repo_apis/PaddleClas_api/cls/register.py
index 98f321164..abdfde2c4 100644
--- a/paddlex/repo_apis/PaddleClas_api/cls/register.py
+++ b/paddlex/repo_apis/PaddleClas_api/cls/register.py
@@ -947,3 +947,25 @@
         "hpi_config_path": None,
     }
 )
+
+register_model_info(
+    {
+        "model_name": "MobileFaceNet",
+        "suite": "Cls",
+        "config_path": osp.join(PDX_CONFIG_DIR, "MobileFaceNet.yaml"),
+        "supported_apis": ["train", "evaluate", "predict", "export", "infer"],
+        "infer_config": "deploy/configs/inference_cls.yaml",
+        "hpi_config_path": None,
+    }
+)
+
+register_model_info(
+    {
+        "model_name": "ResNet50_face",
+        "suite": "Cls",
+        "config_path": osp.join(PDX_CONFIG_DIR, "ResNet50_face.yaml"),
+        "supported_apis": ["train", "evaluate", "predict", "export", "infer"],
+        "infer_config": "deploy/configs/inference_cls.yaml",
+        "hpi_config_path": None,
+    }
+)
diff --git a/paddlex/repo_apis/PaddleClas_api/configs/MobileFaceNet.yaml b/paddlex/repo_apis/PaddleClas_api/configs/MobileFaceNet.yaml
new file mode 100644
index 000000000..c33eba3bb
--- /dev/null
+++ b/paddlex/repo_apis/PaddleClas_api/configs/MobileFaceNet.yaml
@@ -0,0 +1,126 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 25
+  print_batch_step: 20
+  use_visualdl: False
+  eval_mode: face_recognition
+  retrieval_feature_from: backbone
+  flip_test: True
+  feature_normalize: False
+  re_ranking: False
+  use_dali: False
+  # used for static mode and model export
+  image_shape: [3, 112, 112]
+  save_inference_dir: ./inference
+
+AMP:
+  scale_loss: 27648
+  use_dynamic_loss_scaling: True
+  # O1: mixed fp16
+  level: O1
+
+# model architecture
+Arch:
+  name: RecModel
+  infer_output_key: features
+  infer_add_softmax: False
+
+  Backbone:
+    name: MobileFaceNet
+  Head:
+    name: ArcMargin
+    embedding_size: 128
+    class_num: 93431
+    margin: 0.5
+    scale: 64
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+
+Optimizer:
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  one_dim_param_no_weight_decay: True
+  lr:
+    # for 8 cards
+    name: Cosine
+    learning_rate: 4e-3  # lr 4e-3 for total_batch_size 1024
+    eta_min: 1e-6
+    warmup_epoch: 1
+    warmup_start_lr: 0
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: dataset/MS1M_v3/
+      cls_label_path: dataset/MS1M_v3/label.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+            backend: cv2
+        - RandFlipImage:
+            flip_code: 1
+        - ResizeImage:
+            size: [112, 112]
+            return_numpy: False
+            interpolation: bilinear
+            backend: cv2
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.5, 0.5, 0.5]
+            std: [0.5, 0.5, 0.5]
+            order: hwc
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 128
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 8
+      use_shared_memory: True
+
+  Eval:
+    dataset:
+      name: FiveEvalDataset
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+            backend: cv2
+        - ResizeImage:
+            size: [112, 112]
+            return_numpy: False
+            interpolation: bilinear
+            backend: cv2
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.5, 0.5, 0.5]
+            std: [0.5, 0.5, 0.5]
+            order: hwc
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Metric:
+  Eval:
+    - BestAccuracy: {}
diff --git a/paddlex/repo_apis/PaddleClas_api/configs/ResNet50_face.yaml b/paddlex/repo_apis/PaddleClas_api/configs/ResNet50_face.yaml
new file mode 100644
index 000000000..cba9d40c6
--- /dev/null
+++ b/paddlex/repo_apis/PaddleClas_api/configs/ResNet50_face.yaml
@@ -0,0 +1,129 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 10
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 25
+  print_batch_step: 20
+  use_visualdl: False
+  eval_mode: face_recognition
+  retrieval_feature_from: backbone
+  flip_test: True
+  feature_normalize: False
+  re_ranking: False
+  use_dali: False
+  # used for static mode and model export
+  image_shape: [3, 112, 112]
+  save_inference_dir: ./inference
+
+AMP:
+  scale_loss: 27648.0
+  use_dynamic_loss_scaling: True
+  # O1: mixed fp16
+  level: O1
+
+# model architecture
+Arch:
+  name: RecModel
+  infer_output_key: features
+  infer_add_softmax: False
+
+  Backbone:
+    name: ResNet50
+    max_pool: False
+    stride_list: [1, 2, 2, 2, 2]
+    class_num: 512
+  Head:
+    name: ArcMargin
+    embedding_size: 512
+    class_num: 995
+    margin: 0.5
+    scale: 64
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+
+Optimizer:
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  epsilon: 1e-8
+  weight_decay: 0.05
+  one_dim_param_no_weight_decay: True
+  lr:
+    # for 8 cards
+    name: Cosine
+    learning_rate: 4e-3  # lr 4e-3 for total_batch_size 1024
+    eta_min: 1e-6
+    warmup_epoch: 1
+    warmup_start_lr: 0
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: dataset/MS1M_v3/
+      cls_label_path: dataset/MS1M_v3/label.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+            backend: cv2
+        - RandFlipImage:
+            flip_code: 1
+        - ResizeImage:
+            size: [112, 112]
+            return_numpy: False
+            interpolation: bilinear
+            backend: cv2
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.5, 0.5, 0.5]
+            std: [0.5, 0.5, 0.5]
+            order: hwc
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 128
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 8
+      use_shared_memory: True
+
+  Eval:
+    dataset:
+      name: FaceEvalDataset
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+            backend: cv2
+        - ResizeImage:
+            size: [112, 112]
+            return_numpy: False
+            interpolation: bilinear
+            backend: cv2
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.5, 0.5, 0.5]
+            std: [0.5, 0.5, 0.5]
+            order: hwc
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 32
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Metric:
+  Eval:
+    - BestAccuracy: {}
diff --git a/paddlex/repo_apis/PaddleDetection_api/configs/BlazeFace-FPN-SSH.yaml b/paddlex/repo_apis/PaddleDetection_api/configs/BlazeFace-FPN-SSH.yaml
new file mode 100644
index 000000000..c6c11f7e7
--- /dev/null
+++ b/paddlex/repo_apis/PaddleDetection_api/configs/BlazeFace-FPN-SSH.yaml
@@ -0,0 +1,154 @@
+# Runtime
+use_gpu: true
+use_xpu: false
+use_mlu: false
+use_npu: false
+log_iter: 20
+save_dir: output
+print_flops: false
+print_params: false
+weights: output/blazeface_fpn_ssh_1000e/model_final
+snapshot_epoch: 10
+
+# Model
+architecture: BlazeFace
+
+BlazeFace:
+  backbone: BlazeNet
+  neck: BlazeNeck
+  blaze_head: FaceHead
+  post_process: BBoxPostProcess
+
+BlazeNet:
+  blaze_filters: [[24, 24], [24, 24], [24, 48, 2], [48, 48], [48, 48]]
+  double_blaze_filters: [[48, 24, 96, 2], [96, 24, 96], [96, 24, 96],
+                         [96, 24, 96, 2], [96, 24, 96], [96, 24, 96]]
+  act: hard_swish
+
+BlazeNeck:
+  neck_type: fpn_ssh
+  in_channel: [96, 96]
+
+FaceHead:
+  in_channels: [48, 48]
+  anchor_generator: AnchorGeneratorSSD
+  loss: SSDLoss
+
+SSDLoss:
+  overlap_threshold: 0.35
+
+AnchorGeneratorSSD:
+  steps: [8., 16.]
+  aspect_ratios: [[1.], [1.]]
+  min_sizes: [[16., 24.], [32., 48., 64., 80., 96., 128.]]
+  max_sizes: [[], []]
+  offset: 0.5
+  flip: False
+  min_max_aspect_ratios_order: false
+
+BBoxPostProcess:
+  decode:
+    name: SSDBox
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 750
+    score_threshold: 0.01
+    nms_threshold: 0.3
+    nms_top_k: 5000
+    nms_eta: 1.0
+
+# Optimizer
+epoch: 1000
+LearningRate:
+  base_lr: 0.001
+  schedulers:
+    - !PiecewiseDecay
+      gamma: 0.1
+      milestones:
+        - 333
+        - 800
+    - !LinearWarmup
+      start_factor: 0.3333333333333333
+      steps: 500
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.0
+    type: RMSProp
+  regularizer:
+    factor: 0.0005
+    type: L2
+
+# Dataset
+metric: WiderFace
+num_classes: 1
+TrainDataset:
+  name: COCODataSet
+  image_dir: WIDER_train/images
+  anno_path: train.json
+  dataset_dir: data_face
+  data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  name: COCODataSet
+  image_dir: WIDER_val/images
+  anno_path: val.json
+  dataset_dir: data_face
+  allow_empty: true
+
+TestDataset:
+  name: COCODataSet
+  image_dir: WIDER_val/images
+  anno_path: val.json
+  dataset_dir: data_face
+
+# Reader
+worker_num: 8
+TrainReader:
+  inputs_def:
+    num_max_boxes: 90
+  sample_transforms:
+    - Decode: {}
+    - RandomDistort: {brightness: [0.5, 1.125, 0.875], random_apply: False}
+    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
+    - RandomFlip: {}
+    - CropWithDataAchorSampling: {
+        anchor_sampler: [[1, 10, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.2, 0.0]],
+        batch_sampler: [
+          [1, 50, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
+          [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
+          [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
+          [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
+          [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
+        ],
+        target_size: 640}
+    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 1}
+    - NormalizeBox: {}
+    - PadBox: {num_max_boxes: 90}
+  batch_transforms:
+    - NormalizeImage: {mean: [123, 117, 104], std: [127.502231, 127.502231, 127.502231], is_scale: false}
+    - Permute: {}
+  batch_size: 16
+  shuffle: true
+  drop_last: true
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - NormalizeImage: {mean: [123, 117, 104], std: [127.502231, 127.502231, 127.502231], is_scale: false}
+    - Permute: {}
+  batch_size: 1
+  collate_samples: false
+  shuffle: false
+  drop_last: false
+TestReader:
+  sample_transforms:
+    - Decode: {}
+    - NormalizeImage: {mean: [123, 117, 104], std: [127.502231, 127.502231, 127.502231], is_scale: false}
+    - Permute: {}
+  batch_size: 1
+
+# Exporting the model
+export:
+  post_process: True  # Whether post-processing is included in the network when exporting the model.
+  nms: True  # Whether NMS is included in the network when exporting the model.
+  benchmark: False  # Used for testing model performance; if set to `True`, post-processing and NMS will not be exported.
+  fuse_conv_bn: False  # Whether to fuse Conv and BN layers when exporting the model.
diff --git a/paddlex/repo_apis/PaddleDetection_api/configs/BlazeFace.yaml b/paddlex/repo_apis/PaddleDetection_api/configs/BlazeFace.yaml
new file mode 100644
index 000000000..19ff860ed
--- /dev/null
+++ b/paddlex/repo_apis/PaddleDetection_api/configs/BlazeFace.yaml
@@ -0,0 +1,147 @@
+# Runtime
+use_gpu: true
+use_xpu: false
+use_mlu: false
+use_npu: false
+log_iter: 20
+save_dir: output
+print_flops: false
+print_params: false
+weights: output/blazeface_1000e/model_final
+snapshot_epoch: 10
+
+# Model
+architecture: BlazeFace
+
+BlazeFace:
+  backbone: BlazeNet
+  neck: BlazeNeck
+  blaze_head: FaceHead
+  post_process: BBoxPostProcess
+
+BlazeNet:
+  blaze_filters: [[24, 24], [24, 24], [24, 48, 2], [48, 48], [48, 48]]
+  double_blaze_filters: [[48, 24, 96, 2], [96, 24, 96], [96, 24, 96],
+                         [96, 24, 96, 2], [96, 24, 96], [96, 24, 96]]
+  act: relu
+
+BlazeNeck:
+  neck_type: None
+  in_channel: [96, 96]
+
+FaceHead:
+  in_channels: [96, 96]
+  anchor_generator: AnchorGeneratorSSD
+  loss: SSDLoss
+
+SSDLoss:
+  overlap_threshold: 0.35
+
+AnchorGeneratorSSD:
+  steps: [8., 16.]
+  aspect_ratios: [[1.], [1.]]
+  min_sizes: [[16., 24.], [32., 48., 64., 80., 96., 128.]]
+  max_sizes: [[], []]
+  offset: 0.5
+  flip: False
+  min_max_aspect_ratios_order: false
+
+BBoxPostProcess:
+  decode:
+    name: SSDBox
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 750
+    score_threshold: 0.01
+    nms_threshold: 0.3
+    nms_top_k: 5000
+    nms_eta: 1.0
+
+# Optimizer
+epoch: 1000
+LearningRate:
+  base_lr: 0.001
+  schedulers:
+    - !PiecewiseDecay
+      gamma: 0.1
+      milestones:
+        - 333
+        - 800
+    - !LinearWarmup
+      start_factor: 0.3333333333333333
+      steps: 500
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.0
+    type: RMSProp
+  regularizer:
+    factor: 0.0005
+    type: L2
+
+# Dataset
+metric: WiderFace
+num_classes: 1
+TrainDataset:
+  name: COCODataSet
+  image_dir: WIDER_train/images
+  anno_path: train.json
+  dataset_dir: data_face
+  data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  name: COCODataSet
+  image_dir: WIDER_val/images
+  anno_path: val.json
+  dataset_dir: data_face
+  allow_empty: true
+
+TestDataset:
+  name: COCODataSet
+  image_dir: WIDER_val/images
+  anno_path: val.json
+  dataset_dir: data_face
+
+# Reader
+worker_num: 8
+TrainReader:
+  inputs_def:
+    num_max_boxes: 90
+  sample_transforms:
+    - Decode: {}
+    - RandomDistort: {brightness: [0.5, 1.125, 0.875], random_apply: False}
+    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
+    - RandomFlip: {}
+    - CropWithDataAchorSampling: {
+        anchor_sampler: [[1, 10, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.2, 0.0]],
+        batch_sampler: [
+          [1, 50, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
+          [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
+          [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
+          [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
+          [1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
+        ],
+        target_size: 640}
+    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 1}
+    - NormalizeBox: {}
+    - PadBox: {num_max_boxes: 90}
+  batch_transforms:
+    - NormalizeImage: {mean: [123, 117, 104], std: [127.502231, 127.502231, 127.502231], is_scale: false}
+    - Permute: {}
+  batch_size: 16
+  shuffle: true
+  drop_last: true
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - NormalizeImage: {mean: [123, 117, 104], std: [127.502231, 127.502231, 127.502231], is_scale: false}
+    - Permute: {}
+  batch_size: 1
+  collate_samples: false
+  shuffle: false
+  drop_last: false
+TestReader:
+  sample_transforms:
+    - Decode: {}
+    - NormalizeImage: {mean: [123, 117, 104], std: [127.502231, 127.502231, 127.502231], is_scale: false}
+    - Permute: {}
+  batch_size: 1
+
+# Exporting the model
+export:
+  post_process: True  # Whether post-processing is included in the network when exporting the model.
+  nms: True  # Whether NMS is included in the network when exporting the model.
+  benchmark: False  # Used for testing model performance; if set to `True`, post-processing and NMS will not be exported.
+  fuse_conv_bn: False  # Whether to fuse Conv and BN layers when exporting the model.
diff --git a/paddlex/repo_apis/PaddleDetection_api/configs/PP-YOLOE_plus-S_face.yaml b/paddlex/repo_apis/PaddleDetection_api/configs/PP-YOLOE_plus-S_face.yaml
new file mode 100644
index 000000000..0c494f343
--- /dev/null
+++ b/paddlex/repo_apis/PaddleDetection_api/configs/PP-YOLOE_plus-S_face.yaml
@@ -0,0 +1,156 @@
+# Runtime
+epoch: 10
+log_iter: 10
+find_unused_parameters: false
+use_gpu: true
+use_xpu: false
+use_mlu: false
+use_npu: false
+use_ema: True
+save_dir: output
+snapshot_epoch: 1
+print_flops: false
+print_params: false
+
+# Dataset
+metric: WiderFace
+num_classes: 1
+
+worker_num: 4
+eval_height: &eval_height 1088
+eval_width: &eval_width 1088
+eval_size: &eval_size [*eval_height, *eval_width]
+
+TrainDataset:
+  name: COCODataSet
+  image_dir: WIDER_train/images
+  anno_path: train.json
+  dataset_dir: data_face
+  data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  name: COCODataSet
+  image_dir: WIDER_val/images
+  anno_path: val.json
+  dataset_dir: data_face
+  allow_empty: true
+
+TestDataset:
+  name: COCODataSet
+  image_dir: WIDER_val/images
+  anno_path: val.json
+  dataset_dir: data_face
+
+TrainReader:
+  sample_transforms:
+    - Decode: {}
+    - RandomDistort: {}
+    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
+    - RandomCrop: {}
+    - RandomFlip: {}
+  batch_transforms:
+    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False}
+    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
+    - Permute: {}
+    - PadGT: {}
+  batch_size: 8
+  shuffle: true
+  drop_last: true
+  use_shared_memory: true
+  collate_batch: true
+
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
+    - Permute: {}
+  batch_size: 2
+
+TestReader:
+  inputs_def:
+    image_shape: [3, *eval_height, *eval_width]
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
+    - Permute: {}
+  batch_size: 1
+
+# Model
+architecture: YOLOv3
+pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_s_obj365_pretrained.pdparams
+
+norm_type: sync_bn
+ema_decay: 0.9998
+ema_black_list: ['proj_conv.weight']
+custom_black_list: ['reduce_mean']
+depth_mult: 0.33
+width_mult: 0.50
+
+YOLOv3:
+  backbone: CSPResNet
+  neck: CustomCSPPAN
+  yolo_head: PPYOLOEHead
+  post_process: ~
+
+CSPResNet:
+  layers: [3, 6, 6, 3]
+  channels: [64, 128, 256, 512, 1024]
+  return_idx: [1, 2, 3]
+  use_large_stem: true
+  use_alpha: True
+
+CustomCSPPAN:
+  out_channels: [768, 384, 192]
+  stage_num: 1
+  block_num: 3
+  act: 'swish'
+  spp: true
+
+PPYOLOEHead:
+  fpn_strides: [32, 16, 8]
+  grid_cell_scale: 5.0
+  grid_cell_offset: 0.5
+  static_assigner_epoch: 30
+  use_varifocal_loss: true
+  loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
+  static_assigner:
+    name: ATSSAssigner
+    topk: 9
+  assigner:
+    name: TaskAlignedAssigner
+    topk: 13
+    alpha: 1.0
+    beta: 6.0
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 1000
+    keep_top_k: 300
+    score_threshold: 0.01
+    nms_threshold: 0.7
+
+# Optimizer
+LearningRate:
+  base_lr: 0.0001
+  schedulers:
+    - name: CosineDecay
+      max_epochs: 300
+    - name: LinearWarmup
+      start_factor: 0.
+      steps: 100
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.0005
+    type: L2
+
+# Export
+export:
+  post_process: true
+  nms: true
+  benchmark: false
+  fuse_conv_bn: false
diff --git a/paddlex/repo_apis/PaddleDetection_api/object_det/register.py b/paddlex/repo_apis/PaddleDetection_api/object_det/register.py
index 20cb4ea58..fda2e8de3 100644
--- a/paddlex/repo_apis/PaddleDetection_api/object_det/register.py
+++ b/paddlex/repo_apis/PaddleDetection_api/object_det/register.py
@@ -837,3 +837,50 @@
         },
     }
 )
+
+
+register_model_info(
+    {
+        "model_name": "BlazeFace",
+        "suite": "Det",
+        "config_path": osp.join(PDX_CONFIG_DIR, "BlazeFace.yaml"),
+        "supported_apis": ["train", "evaluate", "predict", "export", "infer"],
+        "supported_dataset_types": ["COCODetDataset"],
+        "supported_train_opts": {
+            "device": ["cpu", "gpu_nxcx", "xpu", "npu", "mlu"],
+            "dy2st": False,
+            "amp": ["OFF"],
+        },
+    }
+)
+
+
+register_model_info(
+    {
+        "model_name": "BlazeFace-FPN-SSH",
+        "suite": "Det",
+        "config_path": osp.join(PDX_CONFIG_DIR, "BlazeFace-FPN-SSH.yaml"),
+        "supported_apis": ["train", "evaluate", "predict", "export", "infer"],
+        "supported_dataset_types": ["COCODetDataset"],
+        "supported_train_opts": {
+            "device": ["cpu", "gpu_nxcx", "xpu", "npu", "mlu"],
+            "dy2st": False,
+            "amp": ["OFF"],
+        },
+    }
+)
+
+
+register_model_info(
+    {
+        "model_name": "PP-YOLOE_plus-S_face",
+        "suite": "Det",
+        "config_path": osp.join(PDX_CONFIG_DIR, "PP-YOLOE_plus-S_face.yaml"),
+        "supported_apis": ["train", "evaluate", "predict", "export", "infer"],
+        "supported_dataset_types": ["COCODetDataset"],
+        "supported_train_opts": {
+            "device": ["cpu", "gpu_nxcx", "xpu", "npu", "mlu"],
+            "dy2st": False,
+            "amp": ["OFF"],
+        },
+    }
+)
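
A quick end-to-end sanity check of the new registrations might look like the following sketch. It assumes a matching module config is added under `paddlex/configs/face_detection/` (mirroring the existing `PicoDet_LCNet_x2_5_face.yaml` layout) and that a COCO-format WIDER FACE sample set is available locally; both the config path and the dataset directory are assumptions, not part of this diff.

```bash
# Hypothetical smoke test for one of the newly registered face detection models.
python main.py -c paddlex/configs/face_detection/BlazeFace.yaml \
    -o Global.mode=check_dataset \
    -o Global.dataset_dir=./dataset/face_det_examples
```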