Optimize Memory Usage and Add Cancellation Support in TesseractOcrEngine and ServiceConfiguration #1202

Open
wants to merge 4 commits into
base: main

@DrHazemAli DrHazemAli commented Nov 6, 2024

This pull request introduces key optimizations and enhancements to improve memory handling and performance across two files: shared/Ocr/Tesseract/TesseractOcrEngine.cs and shared/ServiceConfiguration.cs.

  • Memory Efficiency in OCR: Updated ExtractTextFromImageAsync in TesseractOcrEngine to use buffered copying, minimizing memory spikes during large image processing. Implemented IDisposable to ensure proper resource cleanup of TesseractEngine.
  • Enhanced Cancellation Support: Added CancellationToken handling in ExtractTextFromImageAsync, allowing for responsive task cancellation.
  • Optimized GetServiceInstance<T>: Modified the service resolution approach in ServiceConfiguration.cs to avoid redundant IServiceCollection cloning, instead configuring services within a scoped lifecycle. This reduces memory usage by minimizing instance duplication and leverages dependency injection more efficiently.

Motivation and Context

  1. Why this change is required: This change enhances memory efficiency, supports cancellation during OCR extraction, ensures proper resource cleanup in TesseractOcrEngine, and optimizes service resolution in GetServiceInstance<T>.
  2. Problem it solves: Reduces memory usage during OCR processing of large files, improves response to cancellation requests, and prevents redundant service instantiation, which reduces overhead in dependency injection.
  3. Scenarios: Improves performance and memory handling, particularly for large image files in OCR, optimizes service instance management, and helps avoid unexpected crashes.
  4. Fixes an open issue: No. There are no existing issues related to this yet.

Description

  1. Memory Optimization in ExtractTextFromImageAsync (shared/Ocr/Tesseract/TesseractOcrEngine.cs): Introduced buffered copy to handle large images more efficiently, minimizing memory usage during image processing.
  2. Cancellation Support: Updated ExtractTextFromImageAsync to handle cancellationToken properly by passing it into CopyToAsync and handling OperationCanceledException.
  3. Resource Cleanup with IDisposable: Implemented IDisposable in TesseractOcrEngine to ensure TesseractEngine resources are disposed when the instance is no longer needed.
  4. Optimized GetServiceInstance<T> (shared/ServiceConfiguration.cs): Reduced memory footprint by avoiding redundant IServiceCollection cloning and copying. Instead, services are configured within a scoped lifecycle to minimize memory overhead and prevent unnecessary duplication in dependency injection.
  5. Stream Position Reset: Reset the memory stream position after copying so that the image is loaded from the start of the stream.
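
To illustrate items 1, 2, 3, and 5 above, here is a minimal sketch and not the actual diff. `OcrEngineSketch`, `BufferImageAsync`, the `IDisposable` placeholder for the engine, and the 81920-byte chunk size are all hypothetical names and values chosen for illustration; only `Stream.CopyToAsync(Stream, int, CancellationToken)` is a real framework call. Note that the chunk size bounds only the intermediate copy buffer; the `MemoryStream` still grows to the full size of the image.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an OCR engine wrapper; not the PR's TesseractOcrEngine.
public sealed class OcrEngineSketch : IDisposable
{
    // Placeholder for a native resource such as TesseractEngine (assumption).
    private readonly IDisposable _engine;
    private bool _disposed;

    public OcrEngineSketch(IDisposable engine)
    {
        _engine = engine ?? throw new ArgumentNullException(nameof(engine));
    }

    // Copies the input into a seekable MemoryStream in fixed-size chunks,
    // honoring cancellation, then rewinds it so a reader starts at byte 0.
    public async Task<MemoryStream> BufferImageAsync(
        Stream input,
        CancellationToken cancellationToken = default)
    {
        if (_disposed) throw new ObjectDisposedException(nameof(OcrEngineSketch));

        const int BufferSize = 81920; // assumed chunk size, not taken from the PR
        var buffered = new MemoryStream();
        try
        {
            await input.CopyToAsync(buffered, BufferSize, cancellationToken)
                       .ConfigureAwait(false);
            buffered.Position = 0; // stream position reset before image loading
            return buffered;
        }
        catch
        {
            // Also covers OperationCanceledException: free the partial buffer, then rethrow.
            buffered.Dispose();
            throw;
        }
    }

    // Releases the wrapped engine exactly once.
    public void Dispose()
    {
        if (_disposed) return;
        _engine.Dispose();
        _disposed = true;
    }
}
```

A caller would typically wrap the engine in a `using` statement and pass the request's `CancellationToken` through, so that both the copy and the engine are released promptly when the request is cancelled.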

Checklist

@dependabot rebase

This fix minimizes memory usage by avoiding the duplication of services and using scoped resolution, which is ideal for transient, one-time configurations. Additionally, this approach leverages DI lifecycle management to ensure services are automatically cleaned up.
Reduce memory usage during CopyToAsync

@DrHazemAli DrHazemAli left a comment

This PR improves memory handling and resource management across the TesseractOcrEngine and ServiceConfiguration components. Key optimizations include buffered copying for large image processing, enhanced cancellation support, efficient disposal of resources, and optimized service instance resolution. These changes contribute to more efficient memory usage, better responsiveness, and improved dependency injection practices.

@DrHazemAli

@microsoft-github-policy-service agree company="Skytells, Inc."

@DrHazemAli DrHazemAli changed the title from "Optimize Memory Usage and Add Cancellation Support in TesseractOcrEngine and ServiceConfiguration" to "Optimize Memory Usage and Add Cancellation Support in TesseractOcrEngine and ServiceConfiguration [enhancement][memory][.net]" Nov 6, 2024
@DrHazemAli DrHazemAli changed the title from "Optimize Memory Usage and Add Cancellation Support in TesseractOcrEngine and ServiceConfiguration [enhancement][memory][.net]" to "Optimize Memory Usage and Add Cancellation Support in TesseractOcrEngine and ServiceConfiguration [enhancement] [memory] [.net]" Nov 6, 2024
@DrHazemAli DrHazemAli changed the title from "Optimize Memory Usage and Add Cancellation Support in TesseractOcrEngine and ServiceConfiguration [enhancement] [memory] [.net]" to "Optimize Memory Usage and Add Cancellation Support in TesseractOcrEngine and ServiceConfiguration" Nov 6, 2024
@DrHazemAli

@dependabot rebase

@DrHazemAli DrHazemAli left a comment

Reviewed

@DrHazemAli DrHazemAli left a comment

Rechecked ✅

2 participants