Why I am making TFLiteSwift-Vision #3

Open · 4 of 6 tasks
tucan9389 opened this issue Aug 22, 2021 · 0 comments
Labels
documentation Improvements or additions to documentation

tucan9389 (Owner) commented Aug 22, 2021

Goal

Make a vision-specific layer that you can use in a TFLite Swift application.
You can use pre-implemented vision-specific functions such as:

  • various image preprocessing methods (resizing, BWHC ordering, grayscale conversion, normalization, ...)
  • postprocessing examples for various vision tasks (classification, ...)
picture 1. Data flow when using TFLiteSwift-Vision
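
As a rough illustration of that flow, here is a minimal usage sketch. The names `TFLiteVisionInterpreter` and `inference(with:)` are hypothetical for this sketch, not the finalized API:

```swift
import UIKit

// A usage sketch only: `TFLiteVisionInterpreter` and `inference(with:)` are
// hypothetical names for this illustration, not the finalized API.
func classify(_ image: UIImage) throws -> [Float] {
    // 1. Load a .tflite model bundled with the app.
    let interpreter = try TFLiteVisionInterpreter(modelName: "mobilenet_v2")
    // 2. Hand over a UIImage; resizing/normalization happen inside the layer.
    // 3. Read back the output tensor as a flat Float array.
    return try interpreter.inference(with: image)
}
```

The point of the design is that the app code only touches `UIImage` and a flat output array; all vision-specific conversion lives inside the framework.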

I don't know whether this implementation can be merged into tensorflow/tflite-support or tensorflow/examples. I'll maintain this framework for my personal needs first, and then check whether this repo can be used by, or merged into, the TensorFlow repos.

Motivation

Many image preprocessing methods used in vision problems are general-purpose, and I would like other iOS developers to be able to use them without re-implementing them. In TFLiteSwift-Vision, as a first step, I implemented an abstraction that generalizes image pre-processing; after that, I'm going to add image post-processing and post-processing examples for task-specific cases. I expect other researchers and developers will then be able to use TFLite without re-implementing these functions and achieve their goals faster.

Why TFLiteSwift-Vision instead of MLKit?

picture 2. supported tasks of MLKit's custom model
(captured at 21.08.23 from here)

You can consider the domain-specific features in MLKit, but those currently support only image classification and object detection; you cannot use them when you want to implement other tasks like segmentation, pose estimation, style transfer, etc. (If there are other methods that I don't know about, please comment!)

picture 3. The architecture of MLKit and CoreML

As you can see on the right side of picture 3, Apple provides the image pre/post-processing layer through the Vision framework. I expect TFLiteSwift-Vision to play a similar role; I want to support not only TensorFlow models' pre/post-processing, but also tflite models converted from PyTorch.

picture 4. TFLiteSwift-Vision's position in the iOS TFLite architecture

How about TFLite's task library?

As you can see in picture 3, TFLite officially supports the task library, which is a bundle of pre/post-processing implementations for various domains. But it is implemented in C++ (ref), which could be a hurdle for most iOS developers, who are familiar with Swift, when it comes to customization. To let more iOS developers leverage vision tflite models, we split the work into pre-processing and post-processing parts and provide a Swift implementation.

In TFLiteSwift-Vision, we mainly support the pre-processing part, because a great deal of vision research and many applications take an image as input. Models whose output is itself an image are limited to GAN-like tasks, so the image-output post-processing feature is planned for after version 1.0.0. For now, I have implemented the basic feature in which the framework returns a Tensor; a normalization sketch follows below.
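
To make the normalization options concrete, here is a minimal sketch of the "normalize with mean and std" path in plain Swift. The ImageNet mean/std defaults below are an illustrative assumption, not fixed defaults of this framework:

```swift
import Foundation

/// A minimal sketch of the "normalize with mean and std" preprocessing option.
/// The `meanRGB`/`stdRGB` defaults are the common ImageNet statistics and are
/// only an illustrative assumption, not defaults of TFLiteSwift-Vision.
func normalize(rgbBytes: [UInt8],
               meanRGB: [Float] = [0.485, 0.456, 0.406],
               stdRGB: [Float] = [0.229, 0.224, 0.225]) -> [Float] {
    precondition(rgbBytes.count % 3 == 0, "expects interleaved RGB data")
    var tensor = [Float](repeating: 0, count: rgbBytes.count)
    for i in 0..<rgbBytes.count {
        let channel = i % 3                      // interleaved R, G, B
        let scaled = Float(rgbBytes[i]) / 255.0  // scale to 0.0...1.0
        tensor[i] = (scaled - meanRGB[channel]) / stdRGB[channel]
    }
    return tensor
}
```

The "scale to 0.0...1.0" option is the same loop without the mean/std step, and "do not normalize" simply casts the bytes to Float.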

picture 5. supported tasks of the official TFLite task library
(captured at 21.08.23 from here)

Future Works

  • 0.1.0 − basic implementation
    • TFLiteSwift-Vision: converting into Data
    • TFLiteSwift-Vision: normalization -- scale to 0.0...1.0, normalize with mean and std, or no normalization
    • TFLiteSwift-Vision: resizing and cropping
    • Example: inference with a MobileNet image classification model (see the classification sketch after this list)
  • 0.2.0
    • TFLiteSwift-Vision: provide a simpler interface
    • TFLiteSwift-Vision: support UInt8 as an input type
  • 0.2.1
  • 0.2.2
    • Example: inference with a pose estimation model
    • Example: inference with an MNIST grayscale-input model
  • 0.3.0
    • Example: camera support and a real-time example
  • 1.0.0 − video
  • after 1.0.0
    • test code for validation with academic metrics
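
For the classification example above, post-processing mostly reduces to picking the highest-scoring class. A minimal sketch in plain Swift, assuming the label list is loaded separately:

```swift
// A sketch of classification postprocessing: map the model's output scores
// (one Float per class) to the best label. Plain Swift, no framework API.
func topLabel(scores: [Float], labels: [String]) -> (label: String, score: Float)? {
    guard scores.count == labels.count,
          let (index, best) = scores.enumerated().max(by: { $0.element < $1.element })
    else { return nil }
    return (labels[index], best)
}
```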