
compile to wasm #5

Open
kpu opened this issue Sep 23, 2020 · 4 comments
kpu commented Sep 23, 2020

@lhk writes in marian-nmt/marian#343 and I've transferred it here.

Feature description

I would like to embed machine translation in my webapp. There is tensorflow.js but so far I've been unable to find suitable pretrained translation models.

Opus-MT hosts a large repository of pretrained models for many language pairs. It uses marian for the neural machine translation.

The pre- and postprocessing is cheap, so I would be able to host the tokenizer on a server. But marian-decoder is too costly to host myself.
It would be great if it were possible to compile the code to WebAssembly and run it client-side.

I have written small projects in C/C++ and in principle would be happy to dig deeper. But guidance from someone with more experience would be really helpful.

Is this feasible at all?

@kpu kpu added the enhancement New feature or request label Sep 23, 2020

kpu commented Sep 23, 2020

Marian is fast enough to run natively on consumer desktops.

WebAssembly has several limitations that make it slower than native: https://docs.google.com/document/d/1pTl4clEaMHj5n4P0oc5zBckHsK1et4zSARVBPNsQ__M/edit

  1. No 8-bit dot product instruction https://github.com/WebAssembly/simd/issues/328
  2. Only 128-bit SIMD, not 256 or 512-bit yet
  3. We don't know how many registers there are, which makes matrix multiply slow. Matrix multiply is constrained by memory bandwidth so routines use as many registers as possible to make tiles as large as possible.
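To make point 3 concrete, here is a minimal TypeScript sketch of register tiling in matrix multiply. This is an illustration of the technique, not Marian's actual kernel (which lives in C++ intrinsics): the tiled version computes a T x T block of C at a time so every loaded element of A and B is reused across the tile, and native kernels pick T as large as the register file allows, which is exactly what an unknown, small WASM register budget prevents.

```typescript
type Mat = number[][];

// Naive triple loop: each element of B is re-fetched for every row of A.
function matmulNaive(A: Mat, B: Mat): Mat {
  const n = A.length, m = B[0].length, k = B.length;
  const C: Mat = Array.from({ length: n }, () => new Array(m).fill(0));
  for (let i = 0; i < n; i++)
    for (let j = 0; j < m; j++)
      for (let p = 0; p < k; p++) C[i][j] += A[i][p] * B[p][j];
  return C;
}

// Tiled version: one T x T block of C stays "hot" (in registers, ideally)
// while the inner p loop streams over a panel of A and B.
function matmulTiled(A: Mat, B: Mat, T: number): Mat {
  const n = A.length, m = B[0].length, k = B.length;
  const C: Mat = Array.from({ length: n }, () => new Array(m).fill(0));
  for (let i0 = 0; i0 < n; i0 += T)
    for (let j0 = 0; j0 < m; j0 += T)
      for (let p0 = 0; p0 < k; p0 += T)
        for (let i = i0; i < Math.min(i0 + T, n); i++)
          for (let j = j0; j < Math.min(j0 + T, m); j++)
            for (let p = p0; p < Math.min(p0 + T, k); p++)
              C[i][j] += A[i][p] * B[p][j];
  return C;
}
```

Both functions produce the same result; only the memory-access pattern (and hence the register pressure) differs.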

We don't know the exact performance impact on Marian yet, but it is Mozilla's current job in the project to compile Marian to WebAssembly and find out; @mlopatka and @abhi-agg are the Mozilla people currently working towards this.

There's also a proposal to support machine learning better in the browser https://webmachinelearning.github.io/webnn/#api-neuralnetworkcontext-gemm though it's nascent.
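For readers unfamiliar with the gemm operation that proposal covers: it computes D = alpha * (A x B) + beta * C, the workhorse of neural network inference. A plain-TypeScript illustration of those semantics (this is not the WebNN API itself, which builds graph operands rather than computing eagerly):

```typescript
// General matrix multiply: D = alpha * (A x B) + beta * C.
function gemm(
  A: number[][], B: number[][], C: number[][],
  alpha = 1, beta = 0
): number[][] {
  const n = A.length, m = B[0].length, k = B.length;
  return Array.from({ length: n }, (_, i) =>
    Array.from({ length: m }, (_, j) => {
      let acc = 0;
      for (let p = 0; p < k; p++) acc += A[i][p] * B[p][j];
      return alpha * acc + beta * C[i][j];
    })
  );
}
```

A browser-native gemm would let the engine dispatch to whatever SIMD width and register count the hardware actually has, sidestepping the WASM limitations listed above.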


lhk commented Sep 23, 2020

@kpu that sounds awesome! Thanks for the quick response :)

I've taken a look at the browser.mt website and found this blogpost: https://browser.mt/blog/w3c-presentation-post
"Client side Firefox translation demo"

That sounds as if there is already a working configuration. Is it possible to play around with that? I couldn't find the actual demo, only the slideset.


kpu commented Sep 23, 2020

The demo is based on a local Marian server running in the background (from this repo) and a Firefox fork https://github.com/browsermt/firefox that communicate over a REST API on TCP. The fork is outdated and probably a security risk at this point. All of this is meant to be replaced by either native messaging inside an extension or a pure web extension running on WebAssembly / WebGL / WebNN, possibly with rapid implementation of proposed standards that make performance tolerable. It is Mozilla's job to explore this space.
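For context on the native-messaging path mentioned above: the WebExtensions native messaging protocol frames each message as a 32-bit length in native byte order followed by UTF-8 JSON on stdin/stdout. A small Node/TypeScript sketch of that framing (the message shape with `text`/`target` fields is hypothetical, not an actual Marian or project API):

```typescript
// Encode a message the way native messaging hosts expect:
// 4-byte length prefix (native byte order; LE on x86), then UTF-8 JSON.
function encodeNativeMessage(msg: unknown): Buffer {
  const payload = Buffer.from(JSON.stringify(msg), "utf8");
  const header = Buffer.alloc(4);
  header.writeUInt32LE(payload.length, 0);
  return Buffer.concat([header, payload]);
}

// Decode one framed message back into an object.
function decodeNativeMessage(buf: Buffer): unknown {
  const len = buf.readUInt32LE(0);
  return JSON.parse(buf.subarray(4, 4 + len).toString("utf8"));
}
```

Unlike the REST-over-TCP setup of the old demo, this channel is only reachable from the extension, which avoids exposing an open local port.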

So the full answer to your question is you are welcome to play around with the code but don't expect much support or documentation at this time. You can also get a fast model from http://statmt.org/bergamot/models/ .

The current project exposes translation to the user and has a native API (think Marian server plus quality estimation and word alignment). We hadn't even thought of the use case of exposing a JavaScript API for web pages to access translation themselves until you stopped by.
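Purely as a strawman for that page-facing use case, a TypeScript sketch of what such an API could look like, mirroring what the native API already computes per the comment above. Every name here is invented for illustration; nothing like this exists in the project today:

```typescript
// Hypothetical result shape: translated text plus the extras the
// native API computes (quality estimation, word alignment).
interface TranslationResult {
  text: string;
  qualityScore?: number;                 // quality estimation
  alignment?: Array<[number, number]>;   // source->target word indices
}

// Hypothetical interface a page-exposed translator might implement.
interface Translator {
  translate(text: string, from: string, to: string): Promise<TranslationResult>;
}

// Trivial stub so the interface can be exercised without a real model.
const echoTranslator: Translator = {
  async translate(text, _from, _to) {
    return { text };
  },
};
```

The async shape matters: whether backed by WASM, WebGL, or a native-messaging bridge, translation would never be synchronous from the page's point of view.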

@mlopatka

As Kenneth mentions, our current goals include exploring the performance characteristics of alternatives to the client-server architecture described in @kpu's comment.
At the current time, we are resourcing the development effort to implement and assess both a WASM module and a native-messaging solution, and @abhi-agg will be working with a (tbh) developer on those tasks. This thread is the right place to track progress.
