Quantization based on Evolutionary Algorithm (QEA) is an automatic mixed-bit quantization algorithm. It uses an evolutionary strategy to search for the quantization bit width of each layer of a CNN. Taking the automatic quantization and compression of ResNet-20 as an example, the search space consists of the quantization bit widths of the convolution weights and of the activations of each layer (for example, 2-bit/4-bit/8-bit). A population P of N individuals is maintained, where each individual encodes one compressed network model. A population P' of the same size N is generated through crossover and mutation. Each compressed model is trained and validated, and user-specified metrics on the validation set, such as accuracy, FLOPs, and parameter count, are used as optimization objectives to sort and select individuals and update the population.
The search space is constructed from the quantization bit widths of the weights and of the activations of each layer of the network (for example, 2-bit/4-bit/8-bit). Using ResNet-20 as an example, the first and last layers are not quantized, and the search covers the weight/activation bit widths of the middle 18 layers. With the per-layer candidates set to [2 bits, 4 bits, 8 bits], the total search space contains 3^(2×18) = 3^36 ≈ 1.5 × 10^17 configurations.
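As an illustration of the encoding (a sketch, not Vega's actual scheme), one individual is simply a choice of bit width for the weights and for the activations of each searchable layer, and the search-space size follows directly:

```python
# Illustrative encoding for ResNet-20: 18 searchable layers, each with a
# (weight, activation) bit-width pair drawn from the candidates [2, 4, 8].
bit_candidates = [2, 4, 8]
individual = [(4, 8)] * 18          # e.g. w4a8 for every middle layer

# Every one of the 36 positions chooses independently among 3 candidates.
search_space_size = len(bit_candidates) ** (2 * 18)
print(search_space_size)            # 3**36 = 150094635296999121 ≈ 1.5e17
```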
The Pareto front is obtained with the NSGA-III multi-objective evolutionary algorithm:
- Search process: generate the encodings of N compressed models from the population P through evolutionary operations such as crossover and mutation.
- Evaluation process:
  - Build the N compressed models from the encodings produced by the evolutionary operations.
  - Run the evaluation to produce all user-defined metrics, including accuracy, FLOPs, and parameter count.
- Optimization process: invoke the evolutionary algorithm to update the population P.
The search, evaluation, and optimization steps are repeated to complete the evolutionary bit-width search and find the Pareto front. After the search completes, the models on the Pareto front are trained to obtain their final performance. For details about the NSGA-III algorithm, see the original paper [1]; a simplified sketch of the overall loop follows the reference.
[1] Deb, Kalyanmoy, and Himanshu Jain. "An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: solving problems with box constraints." *IEEE Transactions on Evolutionary Computation* 18.4 (2013): 577-601.
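To make the loop concrete, here is a minimal, self-contained sketch. Everything in it is illustrative: `evaluate` returns placeholder metrics instead of actually training a model, and a plain non-dominated selection stands in for NSGA-III's reference-point-based sorting; none of these names are Vega's actual API.

```python
import random

BIT_CANDIDATES = [2, 4, 8]   # per-layer candidate bit widths
NUM_LAYERS = 18              # searchable middle layers of ResNet-20

def random_individual():
    """One weight bit width and one activation bit width per layer."""
    return [random.choice(BIT_CANDIDATES) for _ in range(2 * NUM_LAYERS)]

def evaluate(code):
    """Placeholder for building and training/validating the compressed model.
    Returns (accuracy, FLOPs proxy); real runs use the user-defined metrics."""
    return random.random(), sum(code)

def dominates(m1, m2):
    """m1 dominates m2: no worse on both objectives and strictly better on one
    (accuracy is maximized, the FLOPs proxy is minimized)."""
    (a1, f1), (a2, f2) = m1, m2
    return a1 >= a2 and f1 <= f2 and (a1 > a2 or f1 < f2)

def select(pop, metrics, k):
    """Keep non-dominated individuals first; a simplified stand-in for
    NSGA-III's reference-point-based non-dominated sorting."""
    front = [p for i, p in enumerate(pop)
             if not any(dominates(metrics[j], metrics[i])
                        for j in range(len(pop)) if j != i)]
    rest = [p for p in pop if p not in front]
    return (front + rest)[:k]

def evolve(pop):
    """Crossover and mutation: one child encoding per sampled parent pair."""
    children = []
    for _ in range(len(pop)):
        a, b = random.sample(pop, 2)
        cut = random.randrange(1, len(a))        # one-point crossover
        child = a[:cut] + b[cut:]
        i = random.randrange(len(child))         # point mutation
        child[i] = random.choice(BIT_CANDIDATES)
        children.append(child)
    return children

pop = [random_individual() for _ in range(16)]   # population P
for _ in range(10):
    children = evolve(pop)                       # search process
    everyone = pop + children
    metrics = [evaluate(p) for p in everyone]    # evaluation process
    pop = select(everyone, metrics, 16)          # optimization: update P
# pop now approximates a Pareto front over the placeholder objectives.
```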
- The FP32 model can be quantized to low bit widths to reduce compute and storage overhead.
- The evolutionary algorithm searches for a quantization bit width per layer. Compared with fixed-bit-width quantization such as 8-bit (baseline-w8a8) or 4-bit (baseline-w4a4), the searched model has a lighter computing workload and higher classification accuracy.
- The NSGA-III algorithm finds the Pareto front, producing multiple optimal models under different constraints in a single search.
Quantization bit widths of the weights and activations, configured by bit_candidates in examples/compression/quant_ea/config.yml (for example, [4, 8] means the search space is 4-bit/8-bit).
The current example provides the ResNet series as the basic network. To use a different network, refer to vega/search_space/networks/backbones/quant_resnet.py and replace nn.Conv2d in your network with the quantized convolutional layer QuantConv.
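As a rough sketch of what that swap looks like (the import path, and the assumption that QuantConv mirrors the nn.Conv2d constructor, are inferred from quant_resnet.py's usage and should be verified against your Vega version):

```python
import torch.nn as nn
# Assumed import location; check vega/search_space/networks/backbones/quant_resnet.py
# for where QuantConv actually lives in your installation.
from vega.search_space.networks.backbones.quant_resnet import QuantConv

class BasicBlock(nn.Module):
    """A toy block showing the nn.Conv2d -> QuantConv replacement."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Before: self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # After: the quantized convolution; its weight/activation bit widths
        # are filled in from the searched encoding.
        self.conv = QuantConv(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))
```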
QuantEA can run on the standard CIFAR-10 dataset or on a custom dataset. For details, see the development manual.
The CIFAR-10 dataset is configured in the examples/compression/quant_ea/config.yml configuration file.
The parameters for searching and for training the quantized model correspond to the nas1 and fully_train sections of the examples/compression/quant_ea/config.yml configuration file.
The configuration file is passed to the pipeline through main.py, and the two phases run in sequence: the Pareto front is found during the search phase, and the models on the front are then trained to obtain their final performance.
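In practice main.py reduces to a few lines. This sketch assumes the vega.run entry point used throughout the Vega examples; verify against the main.py shipped with your version:

```python
# Launches the pipeline described by the configuration file: the nas1 search
# phase runs first, then fully_train on the Pareto-front models.
import vega

if __name__ == "__main__":
    vega.run("./examples/compression/quant_ea/config.yml")
```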
The following two files are generated in the specified output directory:
- result.csv contains the encoding, FLOPs, parameter count, and accuracy of every model evaluated during the search.
- pareto_front.csv contains the information of the models on the found Pareto front.
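A quick way to inspect these outputs, assuming CSV headers matching the fields above (check the generated files for the exact column names):

```python
import pandas as pd

results = pd.read_csv("result.csv")         # every model evaluated in the search
pareto = pd.read_csv("pareto_front.csv")    # the non-dominated models

# Example: the most accurate Pareto model under a FLOPs budget.
affordable = pareto[pareto["flops"] <= pareto["flops"].median()]
print(affordable.sort_values("accuracy", ascending=False).head(1))
```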