A built-in wasm 8-bit matrix multiply primitive #205

Open
8 of 13 tasks
Tracked by #15
abhi-agg opened this issue Jul 14, 2021 · 4 comments
@abhi-agg
Contributor

abhi-agg commented Jul 14, 2021

A summary of all the tasks that need to be done in this repository (and its submodules) to import a matrix multiply function (based on a 4x8-bit-to-32-bit dot product primitive) from a separate wasm module.

JS code will be able to instantiate a separate wasm module that exports a matrix multiply function, and Bergamot code can then link against that instance to access the function. (Some context here: https://github.com/mozilla-extensions/firefox-translations/issues/75, corresponding bugzilla issue)
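For illustration, here is a minimal sketch of that wiring, assuming hypothetical file and import/export names (gemm.wasm, wasm_gemm, int8_prepare_b, int8_multiply); the real names are whatever the agreed interface ends up defining, and in practice the Bergamot side goes through emscripten-generated glue rather than raw WebAssembly.instantiate:

```js
// Hypothetical sketch, not the actual integration code.

// 1. Instantiate the standalone gemm module.
const gemmBytes = await fetch('gemm.wasm').then(r => r.arrayBuffer());
const { instance: gemm } = await WebAssembly.instantiate(gemmBytes, {});

// 2. Instantiate the Bergamot module, satisfying its gemm imports directly
//    with the exports of the gemm instance, so calls never bounce through JS.
const bergamotBytes = await fetch('bergamot.wasm').then(r => r.arrayBuffer());
const { instance: bergamot } = await WebAssembly.instantiate(bergamotBytes, {
  wasm_gemm: {
    int8_prepare_b: gemm.exports.int8_prepare_b,
    int8_multiply: gemm.exports.int8_multiply,
  },
  // ...plus the usual emscripten imports (env, memory, etc.), elided here.
});
```

A real setup would also need both modules to operate on the same linear memory (for example, the gemm module importing the Bergamot module's memory), which is elided above.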

Tasks Stage-1:

Tasks Stage-2:

Tasks Stage-3:

  • Native implementations of the interface for ARM architecture
  • Test cases for the implementation
  • Land the implementation in Firefox
  • Test and Validate that the solution works and provides performance improvements

cc @yurydelendik @andrenatal @lonnen @kpu @eqrion @lars-t-hansen @julian-seward1

Please suggest if I missed anything. I have created a separate issue for one of the tasks so the specific discussion can happen there. The same can be done for the other tasks as we go ahead.

P.S.: Stage-2 and Stage-3 can be done in either order; Stage-1 has to be executed first.

@jerinphilip
Contributor

Hello, just wondering if some work I'm doing at arm-playground relates to/overlaps with the todos here. Could you let me know if the links mentioned below are related?

  1. Portability
    a. Develop a portable (architecture agnostic) fallback implementation for matrix-multiply function for wasm:
    I assume architecture agnostic means no SIMD at all. Available via ruy as kStandardCppPath?
    b. Package this portable implementation with bergamot binary to serve as a fallback solution
    I'm confused; doesn't compiling the intgemm stuff through emscripten into WebAssembly give you this already (in the WebAssembly binary)? Is this a "native" fallback available via MozIntGemm? @yurydelendik, is this what eliminates the JS hop in WASM -> JS -> WASM?

  2. SSSE3:
    a. Test cases for this implementation: https://github.com/jerinphilip/arm-playground/blob/5163d54a000e86205be32a01b9465e79d3a8af95/src/firefox_interface_test.cpp
    c. Test and Validate for the SSSE3 architecture that this solution works and provides performance improvements as expected: https://github.com/jerinphilip/arm-playground/blob/5163d54a000e86205be32a01b9465e79d3a8af95/src/firefox_interface_test.cpp, as a simple test of the wrapper against a reference multiply, since the underlying intgemm speed remains unchanged.

  3. ARM
    a. Native implementations of the interface for other architectures (Armv8.0/AVX2/Armv8.5? + additional to be decided later)
    Available via ruy.
    b. Test cases for the implementations
    Available via https://github.com/jerinphilip/arm-playground/blob/5163d54a000e86205be32a01b9465e79d3a8af95/src/firefox_interface_test.cpp against reference multiply.

@yurydelendik

> I'm confused, doesn't a compile of the intgemm stuff through emscripten into web-assembly give you this already (in the webassembly binary)?

It is possible to compile intgemm into wasm, yes. But it will not be backed by the AVX code paths, so it will not deliver the desired performance. The ARM solution is not expected to be compiled via emscripten into wasm since there would be no performance win. The ARM64 solution will be used as part of MozIntGemm (notice that it is native code, so we can use AVX or Neon directly).

> Is this a "native" fallback available via MozIntGemm? is this what eliminates the JS in WASM -> JS -> WASM?

WASM -> JS -> WASM is temporary, and "Package this portable implementation with bergamot binary to serve as a fallback solution" looks like the item to remove it. The "fallback" wasm module implements the functionality of MozIntGemm, but in wasm (by comparison, MozIntGemm will use AVX instructions, which are not available in wasm) -- emscripten will be used here, I guess.

@kpu
Member

kpu commented Nov 21, 2021

@jerinphilip Your short-term goal is to provide an implementation of Abhishek's C API using ruy so that it can be integrated into Gecko (part of Firefox) for ARM support. This native code will then be exposed as an intrinsic MozIntGemm that Marian perceives as a C function. Please try to avoid being a bull in a China shop.

There is no requirement for a SIMD-free implementation. My understanding is Firefox supports WebAssembly on ARM and x86. And both of these have SIMD. On an arch without WebAssembly we're sunk anyway.

On browsers without the MozIntGemm intrinsic, Mozilla's proposal is:

  1. Short-term, the C API calls go into JS and then back into WASM (see the sketch after this list). Apparently linking a module back to itself is hard. This will be slow.
  2. Long-term, they've asked for a separate matrix multiply library implementing the C functions, which would run in WASM. The Marian tool could then be linked to it without jumping into JS. This separate matrix multiply library already exists: it is what is included in Gecko, just with different compilation flags (intgemm compiled with WASM defines instead of native defines).
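For contrast, a rough sketch of the short-term trampoline from point 1, again with hypothetical names: the gemm import slot is filled with a plain JS function that forwards back into the module's own wasm fallback once the instance exists, so every gemm call pays a wasm-to-JS-and-back transition:

```js
// Hypothetical sketch of the WASM -> JS -> WASM hop. The circular dependency
// (the module imports a function that calls back into itself) is resolved
// lazily through a JS closure, which is also why this path is slow.
let bergamot; // assigned after instantiation below

const imports = {
  wasm_gemm: {
    int8_multiply: (...args) =>
      bergamot.exports.int8_multiply_fallback(...args),
  },
  // ...emscripten imports elided
};

// bergamotBytes fetched as in the earlier sketch.
bergamot = (await WebAssembly.instantiate(bergamotBytes, imports)).instance;
```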

There is a problem, though: the library will also have to avoid exposing symbols (to prevent conflicts with the residual intgemm in Marian, since we've only abstracted the parts used in Bergamot models), it will take up more space in the artifact, and I don't know if there's some overhead for the linked function. Therefore I am proposing we just make the separate WASM library a detectable dummy implementation that Marian can avoid calling and handle internally. We already have this path for native builds; they don't jump through the C API.

@abhi-agg
Contributor Author

Updated the issue to reflect the current status of the tasks.
