Edge AI 🦙

[ 🦙 @tinyML ]

Edge AI refers to running artificial intelligence (AI) algorithms directly on devices at the edge of the network (e.g., IoT devices, smartphones, or sensors) rather than relying on cloud servers. Edge AI allows data processing and inference to happen locally, which reduces latency, improves privacy, and minimizes bandwidth usage. TinyML (Tiny Machine Learning) is a subset of Edge AI focused on deploying machine learning models on extremely resource-constrained devices (with as little as a few kilobytes of memory and power budgets measured in milliwatts).

Key benefits of Edge AI and TinyML:

  • Low latency: Since data doesn't need to be sent to the cloud for processing, decisions are made instantly.
  • Reduced energy consumption: TinyML models are designed to run on low-power devices.
  • Privacy and security: Sensitive data stays on the device, reducing the risk of privacy breaches.
  • Offline operation: No need for constant internet connectivity, since models run on local devices.

Quantization in Edge AI and TinyML

Quantization is the process of reducing the numerical precision of a model's weights and activations, which shrinks the model's size and reduces the computational load without significantly impacting accuracy.

Types of Quantization

Post-training quantization: After training a model, it is converted into a lower precision format, such as 8-bit integers instead of 32-bit floating-point numbers.

  • Example: Converting a neural network model trained in 32-bit floating-point precision to 8-bit integers.
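A minimal sketch of post-training int8 quantization using the TensorFlow Lite converter. The tiny Keras model and the random calibration data are illustrative placeholders standing in for a real trained model and a real representative dataset.

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for a trained model; in practice use your own trained network.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(4, activation="softmax"),
])

def representative_dataset():
    # Yield a small number of input samples so the converter can calibrate
    # int8 ranges for activations; random data here is only a placeholder.
    for _ in range(100):
        yield [np.random.rand(1, 8).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]        # enable quantization
converter.representative_dataset = representative_dataset  # calibration data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8                    # fully int8 inputs
converter.inference_output_type = tf.int8                   # fully int8 outputs

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```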

Quantization-aware training (QAT): In this method, quantization is considered during training itself, allowing the model to adapt to reduced precision and maintain better accuracy.

  • Example: Training a model with simulated quantized operations during backpropagation to ensure it performs well when deployed in a quantized form.
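A minimal sketch of quantization-aware training with the TensorFlow Model Optimization Toolkit (assumes the `tensorflow-model-optimization` package is installed); the toy architecture, random training data, and hyperparameters are illustrative assumptions, not a prescribed setup.

```python
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Toy model and data as stand-ins for a real training setup.
base_model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
x_train = np.random.rand(256, 10).astype(np.float32)
y_train = np.random.randint(0, 2, size=(256,))

# Wrap the model so fake-quantization ops are inserted into the forward pass;
# the model then learns weights that remain accurate after real quantization.
q_aware_model = tfmot.quantization.keras.quantize_model(base_model)
q_aware_model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
q_aware_model.fit(x_train, y_train, epochs=3, verbose=0)

# Convert the quantization-aware model into an actual int8 TFLite model.
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_qat_model = converter.convert()
```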

[ Which Quantization Method is Right for You? (GPTQ vs. GGUF vs. AWQ), Quantization vs Pruning vs Distillation: Optimizing NNs for Inference, Transformer Neural Networks Derived from Scratch ]

Pruning for Efficient Model Deployment

Pruning is the process of removing parts of a model (e.g., unnecessary weights, filters, or neurons) that are not critical for its performance. The goal is to reduce the model's complexity and size while maintaining as much accuracy as possible.

Types of Pruning

Weight pruning: Removing small or unimportant weights from the neural network, essentially setting them to zero.

  • Example: After training, weights below a certain threshold in a neural network are pruned, reducing the number of parameters.
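A minimal NumPy sketch of magnitude-based weight pruning; the weight matrix and the threshold are illustrative assumptions rather than values from a real network.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=(128, 64))   # e.g. one dense layer

threshold = 0.1
mask = np.abs(weights) >= threshold              # keep only "important" weights
pruned_weights = weights * mask                  # small weights are set to zero

sparsity = 1.0 - mask.mean()
print(f"Pruned {sparsity:.1%} of the weights")
```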

Neuron pruning: Removing entire neurons or channels in the network that contribute little to the output.

  • Example: Neurons with minimal activations are removed, reducing the number of computations in each layer.
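A minimal NumPy sketch of neuron pruning: hidden units whose average activation magnitude over a calibration batch is small are dropped, shrinking both the layer's outgoing weights and the next layer's incoming weights. Shapes and the keep-count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(32, 100))   # layer 1: 32 inputs  -> 100 hidden units
W2 = rng.normal(size=(100, 10))   # layer 2: 100 hidden -> 10 outputs
x  = rng.normal(size=(256, 32))   # calibration batch

hidden = np.maximum(x @ W1, 0.0)               # ReLU activations
importance = np.abs(hidden).mean(axis=0)       # mean |activation| per neuron
keep = np.sort(np.argsort(importance)[-50:])   # keep the 50 most active neurons

W1_pruned = W1[:, keep]                        # drop columns of pruned neurons
W2_pruned = W2[keep, :]                        # drop the matching downstream rows
print(W1_pruned.shape, W2_pruned.shape)        # (32, 50) (50, 10)
```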

Structured pruning: Pruning entire filters or layers, reducing the complexity of the model in a structured manner.

  • Example: In a convolutional neural network (CNN), redundant filters are identified and pruned, reducing the model's size and computation time.
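A minimal NumPy sketch of structured filter pruning for a convolutional layer: whole filters with the smallest L1 norms are removed, so the output-channel dimension shrinks. The tensor shapes and the number of filters removed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Conv weights laid out as (out_channels, in_channels, kh, kw)
conv_w = rng.normal(size=(64, 32, 3, 3))

l1_per_filter = np.abs(conv_w).sum(axis=(1, 2, 3))   # importance of each filter
keep = np.sort(np.argsort(l1_per_filter)[16:])       # drop the 16 weakest filters

conv_w_pruned = conv_w[keep]                         # shape becomes (48, 32, 3, 3)
print(conv_w_pruned.shape)
```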

Knowledge Distillation for Edge AI

Knowledge distillation is a technique where a smaller, less complex model (called the "student") is trained to mimic a larger, more complex model (the "teacher"). The teacher model, which is often too large for edge deployment, passes knowledge to the student, resulting in a smaller, more efficient model suitable for resource-constrained environments.

Benefits of Knowledge Distillation

  • Smaller model size: The student model is much smaller and lighter than the teacher.
  • Comparable accuracy: Even though the student model is smaller, it can retain a high level of accuracy by learning from the more complex teacher model.
  • Faster inference: The student model is optimized for edge devices, providing faster inference times.

Example of Knowledge Distillation in TinyML: A large pre-trained model for image classification is too big to deploy on a microcontroller. By applying knowledge distillation, a smaller version of the model is created that achieves 95% of the teacher model's accuracy but is small enough to run efficiently on the microcontroller.
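A minimal NumPy sketch of a common distillation loss: the student is trained on a mix of the usual cross-entropy with the hard labels and a KL term that pulls its temperature-softened outputs toward the teacher's. The logits, labels, temperature, and mixing weight below are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Hard-label cross-entropy against the ground-truth classes.
    p_student = softmax(student_logits)
    ce = -np.log(p_student[np.arange(len(labels)), labels] + 1e-12).mean()
    # Soft-label KL divergence at temperature T (scaled by T^2, as is common).
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean()
    return alpha * ce + (1.0 - alpha) * (T ** 2) * kl

teacher_logits = np.array([[8.0, 2.0, 0.5], [0.5, 6.0, 1.0]])
student_logits = np.array([[5.0, 1.5, 0.3], [0.2, 4.0, 0.8]])
labels = np.array([0, 1])
print(distillation_loss(student_logits, teacher_logits, labels))
```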

[ Knowledge Distillation: A Good Teacher is Patient and Consistent, How ChatGPT Cheaps Out Over Time ]

Popular TinyML Frameworks and Libraries:

  • LiteRT / TensorFlow Lite for Microcontrollers (TFLite Micro): A version of TensorFlow Lite designed to run on microcontrollers with limited resources. TFLite Micro enables the deployment of machine learning models directly on tiny, low-power devices.

  • Edge Impulse: A development platform for building, training, and deploying TinyML models. It includes tools for collecting data from edge devices, training machine learning models, and optimizing them for deployment.

  • CMSIS-NN: A library of neural network kernels optimized for ARM Cortex-M microcontrollers. It allows TinyML models to leverage the processing power of ARM-based devices efficiently.

  • uTensor: A lightweight machine learning framework for embedded systems, providing an optimized implementation of common machine learning operations for microcontrollers.

Resources: MIT HanLab projects, tinyml.org, harvard@tinyML, Tiny Machine Learning: Progress and Futures, Building a TinyML Application with TF Micro and SensiML, @github/bitsandbytes, tinyML EMEA, [ @github/tinyml-projects-papers, Seedstudio blog - TinyML, openmv.io, Harvard TinyML: discuss ].