-
I'm using PaddleOCR for a project and I want to limit the resources it uses. The CPU usage of PaddleOCR is tied to the image it scans; I want to limit it, but how? In many cases the CPU usage reaches 100% when doing full-screen OCR. These are my specs. I tried setting the flags but saw no change, and I don't know if it's because I set the flags wrong or because this simply isn't what I should expect.

```python
import tkinter as tk
import paddle
from paddleocr import PaddleOCR

# Custom DebugPaddleOCR class to include debugging capabilities
class DebugPaddleOCR(PaddleOCR):
    ...

# Configure PaddlePaddle memory and CPU parameters
paddle.set_flags({
    ...
})

# Initialize the DebugPaddleOCR instance with additional parameters
ocr = DebugPaddleOCR(lang='id', cpu_threads=1)  # Set language to Indonesian and use 1 CPU thread

def run_ocr():
    ...

# Create the GUI application
root = tk.Tk()

# Add a button to select a file
select_button = tk.Button(root, text="Select Image", command=run_ocr)

# Add a textbox to display OCR results
text_box = tk.Text(root, wrap=tk.WORD, width=80, height=20)

# Run the application
root.mainloop()
```
Replies: 2 comments 2 replies
-
To reduce PaddleOCR's CPU or GPU usage while maintaining performance, you can make the following adjustments:

**1. Adjust CPU Threads**

The number of CPU threads used by PaddleOCR directly affects resource usage. By default, PaddleOCR uses multiple threads, which can spike CPU usage. You can limit this by reducing the number of threads:

```python
ocr = PaddleOCR(cpu_threads=1)  # Restrict to 1 thread
```

In your script, you've already set `cpu_threads=1`.

**2. Optimize PaddlePaddle Flags**

The `paddle.set_flags` call lets you constrain PaddlePaddle's memory behavior:

```python
paddle.set_flags({
    "FLAGS_fraction_of_cpu_memory_to_use": 0.5,  # Limit to 50% of CPU memory
    "FLAGS_use_pinned_memory": False,            # Disable pinned memory for a lower CPU load
})
```
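As a complement to `cpu_threads`, the OpenMP/MKL thread pools used by CPU inference kernels can usually be capped with environment variables. Whether a given PaddlePaddle build honors these variables is an assumption, and they must be set before the framework is imported, so treat this as a sketch:

```python
import os

# Set these BEFORE importing paddle/paddleocr, otherwise the thread
# pools may already have been sized. Whether they are honored depends
# on how PaddlePaddle was built (OpenMP / MKL) -- an assumption.
os.environ["OMP_NUM_THREADS"] = "1"   # OpenMP worker threads
os.environ["MKL_NUM_THREADS"] = "1"   # MKL math-library threads
```

Only after setting these should the script do `from paddleocr import PaddleOCR`.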
**3. Use PaddleLite for Lightweight Inference**

If you want to further reduce resource usage, consider using PaddleLite, which is optimized for edge devices and constrained environments. PaddleLite is a lightweight inference engine that can significantly reduce both CPU and GPU usage. You can convert your PaddleOCR model to a PaddleLite model and run it with optimized performance.

**4. Reduce Model Size**

If high accuracy isn't critical, you can use a lighter OCR model. PaddleOCR supports multiple lightweight models. For example:

```python
ocr = PaddleOCR(rec_model_dir='path/to/lite_model', det_model_dir='path/to/lite_model', lang='en')
```

You can download these models from the PaddleOCR Model Zoo.

**5. Limit Image Size**

The size of the input image directly impacts resource usage. If you're performing OCR on large images, resizing them to a smaller resolution can significantly reduce computation time and memory usage. For example:

```python
from PIL import Image

def preprocess_image(file_path, max_width=1000, max_height=1000):
    image = Image.open(file_path)
    # Image.ANTIALIAS was removed in Pillow 10; Resampling.LANCZOS is the equivalent filter
    image.thumbnail((max_width, max_height), Image.Resampling.LANCZOS)
    return image
```

Resize the image with this helper before passing it to `ocr.ocr()`.

**6. Batch Process Small Regions**

Instead of performing OCR on the entire screen at once, divide the image into smaller regions and process them sequentially. This reduces peak CPU usage:

```python
def split_image(image, num_splits=4):
    width, height = image.size
    split_height = height // num_splits
    regions = [image.crop((0, i * split_height, width, (i + 1) * split_height))
               for i in range(num_splits)]
    return regions
```

Process each region one at a time with `ocr.ocr()`.

**7. Switch to GPU (if available)**

Your hardware includes an AMD Radeon GPU. PaddleOCR typically supports NVIDIA GPUs via CUDA, so with an AMD GPU you might not get native GPU acceleration. You can still try enabling it:

```python
ocr = PaddleOCR(use_gpu=True)
```

Verify first that your PaddlePaddle build actually supports your GPU.

**8. Avoid Multiprocessing Overhead**

The linked discussion highlights that using multiprocessing with PaddleOCR can increase processing time due to overhead. Stick to single-threaded or multithreaded execution unless multiprocessing is truly necessary. If you must use it, make sure the PaddleOCR object is initialized within each worker process.

**9. Measure and Monitor Resource Usage**

To debug and optimize further, monitor CPU and memory usage with a tool like `psutil`:

```python
import psutil

print("CPU usage:", psutil.cpu_percent())
print("Memory usage:", psutil.virtual_memory().percent)
```

**Additional Reference**

The linked GitHub discussion reports that multiprocessing can increase processing times by up to 5x due to initialization overhead. Avoid multiprocessing unless you're processing multiple independent images concurrently and the initialization overhead is acceptable.

By combining these optimizations, you should be able to significantly limit the CPU and GPU usage of PaddleOCR while maintaining reasonable performance.

Response generated by feifei-bot | chatgpt-4o-latest
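As a concrete illustration of points 5 and 6 above, here is a minimal sketch of the split-then-scan loop. `run_ocr_on_region` is a hypothetical stand-in for the real `ocr.ocr()` call so the example stays self-contained:

```python
from PIL import Image

def split_image(image, num_splits=4):
    # Cut the image into horizontal strips of equal height
    width, height = image.size
    split_height = height // num_splits
    return [image.crop((0, i * split_height, width, (i + 1) * split_height))
            for i in range(num_splits)]

def run_ocr_on_region(region):
    # Hypothetical stand-in for ocr.ocr(np.array(region));
    # here it just returns the region's size for demonstration
    return region.size

screenshot = Image.new("RGB", (1920, 1080), "white")  # dummy full-screen image
results = [run_ocr_on_region(r) for r in split_image(screenshot)]
# Each strip is a quarter of the full frame, so peak work per call drops accordingly
```

Processing the strips sequentially means only one quarter-frame is in flight at a time, which is what caps the CPU spike.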
-
@GreatV I am sorry for the very late response...

1. Adjust CPU threads
2. Optimize PaddlePaddle flags
3. PaddleLite for lightweight inference
4. Reduce model size
5. Limit image size
6. Batch process small regions
7. Switch to GPU (I don't have one)
9. Measure and monitor resource usage
Oh yeah, I used psutil to limit the process usage instead. Works like a charm, though only for the CPU; I can't do the same for RAM.
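One way to do this with psutil (a sketch of one possible approach, not necessarily the exact code used in this reply) is to pin the process to a subset of cores and lower its scheduling priority. As noted, this caps CPU only; psutil can observe memory usage but cannot cap it:

```python
import os
import psutil

proc = psutil.Process(os.getpid())

# Pin the process to a single core so OCR can never saturate more than
# one CPU (cpu_affinity is available on Linux/Windows, not macOS)
if hasattr(proc, "cpu_affinity"):
    proc.cpu_affinity([0])

# Lower scheduling priority so other processes win CPU contention
proc.nice(10)
```

With the affinity set to one core, even a 100%-busy OCR pass is bounded to a single core's worth of CPU time.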
```python
import os
import time
import psutil
from PyQt6.QtCore import QThread, pyqtSignal
from paddleocr import PaddleOCR

class OcrWorker(QThread):
    finished = pyqtSignal(list, list, float, float, float)
```