Target agnostic interface for 8-bit matrix multiply for wasm #49

Merged: 27 commits, Aug 23, 2021

Conversation

@abhi-agg commented Aug 4, 2021

This PR adds the necessary interface for an 8-bit matrix multiply function (uint8 * int8) for wasm.

Fixes: browsermt/bergamot-translator#213

I request reviewers to please verify whether the function signatures are correct for the intended use case.

Open issues that will be resolved in this PR incrementally:

  1. Which of the Int8::PrepareB and Int8::PrepareBTransposed functions should be added to make the interface complete while still keeping it target agnostic. I will add them once we reach a decision.
    EDIT: We agreed to add both.

  2. Do we need to add a fused Multiply function (a multiply function with PrepareA fused inside)?
    EDIT: We agreed that it can be added later, depending on whether it provides performance gains.

  3. Should we support int8 * int8 multiplication as well? If so, I will add Int8::Multiply and the corresponding Int8::PrepareA function to support this use case.
    EDIT: We agreed that we don't need separate functions for this; the same Multiply function can handle both.

Checklist

  • I have tested the code manually
  • I have run regression tests
  • I have read and followed CONTRIBUTING.md
  • I have updated CHANGELOG.md

@yurydelendik

Such data organization is not really practical and will take a performance hit on bounds checks: you now have to check whether the QuantizedBuffer& structure is in bounds, then dereference and read the value pointer, as well as dereference Index* cols and perform bounds checks there. We want to keep data for intrinsics in registers as much as possible; any reference/pointer will force data to be read from/written to memory.
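
To make the concern concrete, here is a hypothetical sketch contrasting the two styles; these are not the actual signatures from the proposed header:

```cpp
#include <cstdint>

// Struct-based: the callee receives a reference to the struct, so reaching
// the data costs a bounds check for the struct itself, a load of the value
// pointer, and another bounds check when that pointer is dereferenced.
struct QuantizedBuffer {
  int8_t* value;
  float scale;
  int8_t zero_point;
};
void Multiply(const QuantizedBuffer& A, const QuantizedBuffer& B, float* output);

// Flattened: scale and zero_point travel as wasm locals (registers), so only
// the raw data pointers need bounds-checked memory accesses.
void Multiply(const int8_t* A, float scale_A, int8_t zero_point_A,
              const int8_t* B, float scale_B, int8_t zero_point_B,
              float* output);
```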

@XapaJIaMnu (Collaborator) left a comment

As discussed in the other document, we need at least one more version of the PrepareB functions. Furthermore, it is not really practical to have a struct, as structs may exhibit different alignments on different hardware. Normally what is done is to stick extra metadata at the end of the long array (e.g. at the end of the int8_t* we have one float for the scale of A, one for the scale of B, and two zero points, one corresponding to the A matrix and one to the B matrix).
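
A minimal sketch of that layout, assuming the payload size is a multiple of alignof(float) (which holds given intgemm's size requirements) and that zero points are stored as floats; the helper and field order are illustrative, not an agreed spec:

```cpp
#include <cstdint>
#include <cstdlib>

// Layout: [rows * cols int8 values][scale_A][scale_B][zero_point_A][zero_point_B]
int8_t* AllocPreparedWithMetadata(std::size_t rows, std::size_t cols,
                                  float scale_A, float scale_B,
                                  float zero_point_A, float zero_point_B) {
  const std::size_t payload = rows * cols;               // int8 payload, in bytes
  const std::size_t total = payload + 4 * sizeof(float); // plus packed metadata
  int8_t* buffer = static_cast<int8_t*>(std::malloc(total));
  // Error handling omitted in this sketch. Assumes payload is a multiple of
  // alignof(float); otherwise memcpy the metadata in instead.
  float* meta = reinterpret_cast<float*>(buffer + payload);
  meta[0] = scale_A;
  meta[1] = scale_B;
  meta[2] = zero_point_A;
  meta[3] = zero_point_B;
  return buffer;
}
```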

@XapaJIaMnu (Collaborator)

As for point 2, we will write some experimental kernels where you take a float matrix A and an int8_t matrix B, and see if they are competitive in terms of speed with the two-pass approach.
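
If such a fused kernel were added, its signature might look something like this (purely hypothetical; no such function exists in intgemm today):

```cpp
#include <cstdint>
using Index = uint32_t;  // stand-in for intgemm's Index type

// Hypothetical fused kernel: quantizes A on the fly instead of requiring a
// separate PrepareA pass over the whole matrix.
void MultiplyWithFusedPrepareA(const float* A,   // unquantized activations
                               const int8_t* B,  // already-prepared B
                               float quant_mult_A,
                               Index A_rows, Index width, Index B_cols,
                               float* output);
```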

@abhi-agg (Author) commented Aug 4, 2021

@yurydelendik @XapaJIaMnu If I understood correctly, removing the QuantizedBuffer struct and adding its 3 fields (int8_t* value, float scale, int8_t zero_point) as function parameters will resolve the concern. Right?

@XapaJIaMnu (Collaborator)

@yurydelendik @XapaJIaMnu If I understood correctly, removing the QuantizedBuffer struct and adding its 3 fields (int8_t* value, float scale, int8_t zero_point) as function parameters will resolve the concern. Right?

The zero point should be a float, but yeah. Waiting for feedback from @kpu.
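
So the flattened sketch from above, updated with the zero point carried as a float (still illustrative, not the final header):

```cpp
#include <cstdint>

// Same flattened style as the earlier sketch, with zero_point widened to
// float per the correction above. Names remain illustrative.
void Multiply(const int8_t* A, float scale_A, float zero_point_A,
              const int8_t* B, float scale_B, float zero_point_B,
              float* output);
```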

@abhi-agg (Author) commented Aug 9, 2021

@XapaJIaMnu Added PrepareB function as well 👍

@kpu (Member) commented Aug 18, 2021

I think it would be a helpful exercise for @abhi-agg to write short implementations of the proposed functions in terms of the existing functions, for several reasons:

  1. You'll have a working implementation.
  2. The cost of binding it to an implementation will disincentivize @abhi-agg from the unnecessary changes that appear in the present API.
  3. He is going to be part of auditing it anyway.

I worry that the relatively cheap cost of speccing out a new API (when one already exists) will result in a tax: having to change the implementation.

For example, it is not clear why the function argument order has changed, or what the value is of making the user specify the output shape of a matrix (as opposed to the input shape) when the output is actually an opaque representation that is neither row major nor column major.
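
For instance, a wrapper over the existing API (using the intgemm signatures quoted later in this thread; the wasm-facing name is made up) could be one line per function:

```cpp
#include <cstdint>
#include "intgemm/intgemm.h"

// Sketch: the proposed wasm-facing PrepareA expressed directly in terms of
// intgemm's existing Int8::PrepareA. The wrapper's name and argument order
// are illustrative, not the final interface.
void wasmPrepareA(const float* input, int8_t* output, float quant_mult,
                  intgemm::Index rows, intgemm::Index cols) {
  intgemm::Int8::PrepareA(input, output, quant_mult, rows, cols);
}
```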

@kpu (Member) commented Aug 18, 2021

While it's not explicitly stated, intgemm's interface always talks to the user in row major (unless the function name has Transposed in it), and that's the meaning of A_rows, the inner dimension, and B_cols. These refer to A and B in row-major format. The opaque representation (B) is private to intgemm. These sizes refer to the input. There is a convention, if not a stated one.

Here there is a mix of the output-size and A_rows styles. For PrepareB, the user specifies an output shape. From an API-usability perspective, I feel that callers know the shape of their input better than they know the shape of their output (though obviously they'll need to know both). Is this targeting some practice where it's better to force the user to explicitly size where writes are going than where reads are coming from?
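
Spelled out with concrete (made-up) sizes, the convention reads:

```cpp
#include <cstdint>
using Index = uint32_t;  // stand-in for intgemm's Index type

// The row-major convention described above: A is A_rows x width and B is
// width x B_cols as the user sees them before preparation, and
// Multiply(A_prepared, B_prepared, A_rows, width, B_cols, callback)
// produces C with A_rows x B_cols entries, row major.
const Index A_rows = 8;    // rows of A (and of C)
const Index width  = 256;  // cols of A == rows of B (the inner dimension)
const Index B_cols = 64;   // cols of B (and of C)
```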

@abhi-agg (Author) commented Aug 19, 2021

@kpu @XapaJIaMnu

For example, it is not clear why the function argument order has changed

I can make the order the same as in the intgemm interface 👍

While it's not explicitly stated, intgemm's interface always talks to the user in row major (unless the function name has Transposed in it), and that's the meaning of A_rows, the inner dimension, and B_cols. These refer to A and B in row-major format. The opaque representation (B) is private to intgemm. These sizes refer to the input. There is a convention, if not a stated one.

Taking an example just to make sure that I understood it correctly. Do you mean that:

  1. The argument rows of PrepareA(const float *input, int8_t *output, float quant_mult, Index rows, Index cols) and A_rows of Multiply(const int8_t *A, const int8_t *B, Index A_rows, Index width, Index B_cols, Callback callback) are the same and represent the number of rows of the input matrix A (input)?
  2. The argument rows of (*PrepareB)(const float *input, int8_t *output, float quant_mult, Index rows, Index cols), rows of (*SelectColumnsB)(const int8_t *input, int8_t *output, Index rows, const Index *cols_begin, const Index *cols_end) and width of Multiply(const int8_t *A, const int8_t *B, Index A_rows, Index width, Index B_cols, Callback callback) are all the same and represent the number of rows of the input matrix B?

From an API-usability perspective, I feel that callers know the shape of their input better than they know the shape of their output (though obviously they'll need to know both). Is this targeting some practice where it's better to force the user to explicitly size where writes are going than where reads are coming from?

Could you please explain what you mean by opaque representation? This would be super helpful.

PS: The unstated conventions of intgemm led me to make some assumptions, which I documented here, and that evidently came across to you as an attempt to redesign the APIs. The goal is not to redesign but to document the APIs with all their conventions, keeping in mind that this interface should work for Arm too. We are heading in the right direction thanks to your reviews. I can make the required changes 👍

@XapaJIaMnu (Collaborator)

@abhi-agg opaque representation refers to the fact that after PrepareB, the output B matrix is neither RowMajor nor ColumnMajor, but an opaque binary representation. You can't just iterate over it and make sense of it.

@andrenatal

Can we move faster and expedite this? Our WebAssembly team is blocked by the decision that needs to be made here, and if we keep delaying it, we might lose their support and let this slip through the cracks.

What's still missing to get this landed, @kpu @XapaJIaMnu @abhi-agg?

Thanks

@abhi-agg (Author) commented Aug 19, 2021

@abhi-agg opaque representation refers to the fact that after PrepareB, the output B matrix is neither RowMajor nor ColumnMajor, but an opaque binary representation. You can't just iterate over it and make sense of it.

@XapaJIaMnu Thanks a lot. This clarifies a lot. A few questions:

  1. Is the output of PrepareA and PrepareBias also opaque?
  2. The result of the multiply function (C) is in row-major format. Right?
  1. The argument rows of PrepareA(const float *input, int8_t *output, float quant_mult, Index rows, Index cols) and A_rows of Multiply(const int8_t *A, const int8_t *B, Index A_rows, Index width, Index B_cols, Callback callback) are the same and represent the number of rows of the input matrix A (input)?

  2. The argument rows of (*PrepareB)(const float *input, int8_t *output, float quant_mult, Index rows, Index cols), rows of (*SelectColumnsB)(const int8_t *input, int8_t *output, Index rows, const Index *cols_begin, const Index *cols_end) and width of Multiply(const int8_t *A, const int8_t *B, Index A_rows, Index width, Index B_cols, Callback callback) are all the same and represent the number of rows of the input matrix B?

Could you please also confirm this? This will resolve all the confusion 👍

@XapaJIaMnu (Collaborator)

@abhi-agg

  1. PrepareA and PrepareBias both produce row-major output, but it is difficult to explain to the user what exactly PrepareBias does, as I illustrated to you beforehand.
  2. C is RowMajor.

Yes for PrepareA and Multiply.
For PrepareB and Multiply, it's a bit more complicated. If B is supposed to be transposed while it is being prepared (PrepareBTransposed), width corresponds to the rows of the transposed matrix (or the cols of the original one). See the signature of this function:

https://github.com/kpu/intgemm/blob/master/intgemm/intgemm.h#L339

What you have written holds for the default case, where B doesn't need to be transposed before the operation:

https://github.com/kpu/intgemm/blob/master/intgemm/intgemm.h#L328
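
In other words (example sizes and layout made up for illustration; see the linked signatures for the exact parameter names):

```cpp
// Dimension bookkeeping for the two preparation paths. Multiply ultimately
// needs B in an opaque form equivalent to a width x B_cols matrix.
//
// PrepareB            input: B itself (row major, width x B_cols)
//                     -> Multiply's width == the input's rows
//
// PrepareBTransposed  input: B^T (row major, B_cols x width)
//                     -> Multiply's width == the input's cols, i.e. the
//                        rows of the transposed matrix that actually
//                        enters the multiply
```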

@abhi-agg (Author) commented Aug 20, 2021

@XapaJIaMnu Thanks a lot. Kenneth's and your answers have clarified a lot of the unstated conventions of the intgemm interface.

I have changed the documentation to reflect all of it. However, it is possible that there are a few things I still might have misunderstood, so it would be great if you could give it a final look and point out anything that seems wrong 👍

Additionally:

  1. I have added a separate function for taking the transposed B input. I believe the API stays cleaner that way (instead of a boolean flag for passing a transposed or non-transposed B matrix to the same function).
  2. I haven't changed the order of the function arguments to make it consistent with intgemm, as intgemm doesn't seem to follow a consistent pattern (the order of input and output arguments differs between the functions that use a callback and the ones that don't). The current interface follows a simple pattern, sketched after this list: the output is always the last argument, which keeps the interface consistent.
  3. I have changed the documentation to use the rows and cols of the input matrices and not of the intermediate results (i.e. prepared A, prepared B and prepared bias).
  4. While reviewing, please look out for places where I shouldn't have mentioned the shape of the matrix.
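
As a sketch of that output-last convention (function and parameter names here are illustrative, not the merged header):

```cpp
#include <cstdint>
using Index = uint32_t;  // stand-in for the interface's Index type

// Illustration of the "output is always the last argument" pattern from
// point 2; these declarations are a sketch, not the final interface.
void int8PrepareA(const float* input, float scale, float zero_point,
                  Index rows, Index cols, int8_t* output);
void int8PrepareB(const float* input, float scale, float zero_point,
                  Index rows, Index cols, int8_t* output);
void int8PrepareBTransposed(const float* input, float scale, float zero_point,
                            Index rows, Index cols, int8_t* output);
```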

@andrenatal I believe one final review from Nik should make it possible to land this.

@andrenatal

thanks @abhi-agg and @XapaJIaMnu

@XapaJIaMnu (Collaborator) left a comment

I think it looks good; just some more warnings about the data type of a prepared A.

abhi-agg merged commit 02fa9da into browsermt:master on Aug 23, 2021
abhi-agg deleted the wasm-gemm-interface branch on November 1, 2021
abhi-agg mentioned this pull request on November 1, 2021