Target agnostic interface for 8-bit matrix multiply for wasm #49

Merged: 27 commits, Aug 23, 2021

Conversation

@abhi-agg commented Aug 4, 2021

This PR adds the necessary interface for an 8-bit matrix multiply function (uint8 * int8) for wasm.

Fixes: browsermt/bergamot-translator#213

I request reviewers to please verify whether the function signatures are correct for the intended use case.

Open issues that will be resolved in this PR incrementally:

  1. Which of the Int8::PrepareB and Int8::PrepareBTransposed functions should be added to make the interface complete while still keeping it target agnostic. I will add them once we reach a decision.
    EDIT: We agreed to add both.

  2. Do we need to add a fused Multiply function (a multiply function with PrepareA fused inside)?
    EDIT: We agreed that it can be added later, depending on whether it provides performance gains.

  3. Should we support int8 * int8 multiplication as well? If so, I will add Int8::Multiply and the corresponding Int8::PrepareA function to support this use case.
    EDIT: We agreed that we don't need separate functions for this; the same Multiply function can handle both.

Checklist

  • I have tested the code manually
  • I have run regression tests
  • I have read and followed CONTRIBUTING.md
  • I have updated CHANGELOG.md

@yurydelendik

Such data organization is not really practical and will take a performance hit on bounds checks: you now have to check whether the QuantizedBuffer& structure is in bounds, then dereference and read the value pointer, as well as dereference Index* cols and perform bounds checks there. We want to keep data for intrinsics in registers as much as possible; any reference/pointer will force data to be read from/written to memory.
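
To make the concern concrete, here is a hypothetical sketch contrasting the two styles; these are not the actual signatures from the proposed header:

```cpp
#include <cstdint>

// Struct-based: the callee receives a reference to the struct, so reaching
// the data costs a bounds check for the struct itself, a load of the value
// pointer, and another bounds check when that pointer is dereferenced.
struct QuantizedBuffer {
  int8_t* value;
  float scale;
  int8_t zero_point;
};
void Multiply(const QuantizedBuffer& A, const QuantizedBuffer& B, float* output);

// Flattened: scale and zero_point travel as wasm locals (registers), so only
// the raw data pointers need bounds-checked memory accesses.
void Multiply(const int8_t* A, float scale_A, int8_t zero_point_A,
              const int8_t* B, float scale_B, int8_t zero_point_B,
              float* output);
```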

@XapaJIaMnu (Collaborator) left a comment

As discussed in the other document, we need at least one more version of the PrepareB functions. Furthermore, it is not really practical to have a struct, as structs may exhibit different alignments on different hardware. Normally what is done is to stick extra metadata at the end of the long array (e.g. at the end of the int8_t* we have one float for the scale of A, one for the scale of B, and two zero points, one corresponding to the A matrix and one to the B matrix).
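
A minimal sketch of that layout, assuming the payload size is a multiple of alignof(float) (which holds given intgemm's size requirements) and that zero points are stored as floats; the helper and field order are illustrative, not an agreed spec:

```cpp
#include <cstdint>
#include <cstdlib>

// Layout: [rows * cols int8 values][scale_A][scale_B][zero_point_A][zero_point_B]
int8_t* AllocPreparedWithMetadata(std::size_t rows, std::size_t cols,
                                  float scale_A, float scale_B,
                                  float zero_point_A, float zero_point_B) {
  const std::size_t payload = rows * cols;               // int8 payload, in bytes
  const std::size_t total = payload + 4 * sizeof(float); // plus packed metadata
  int8_t* buffer = static_cast<int8_t*>(std::malloc(total));
  // Error handling omitted in this sketch. Assumes payload is a multiple of
  // alignof(float); otherwise memcpy the metadata in instead.
  float* meta = reinterpret_cast<float*>(buffer + payload);
  meta[0] = scale_A;
  meta[1] = scale_B;
  meta[2] = zero_point_A;
  meta[3] = zero_point_B;
  return buffer;
}
```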

@XapaJIaMnu (Collaborator)

As for point 2, we will write some experimental kernels where you take a float matrix A and an int8_t matrix B, and see if they are competitive in terms of speed with the two-pass approach.
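
If such a fused kernel were added, its signature might look something like this (purely hypothetical; no such function exists in intgemm today):

```cpp
#include <cstdint>
using Index = uint32_t;  // stand-in for intgemm's Index type

// Hypothetical fused kernel: quantizes A on the fly instead of requiring a
// separate PrepareA pass over the whole matrix.
void MultiplyWithFusedPrepareA(const float* A,   // unquantized activations
                               const int8_t* B,  // already-prepared B
                               float quant_mult_A,
                               Index A_rows, Index width, Index B_cols,
                               float* output);
```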

@abhi-agg (Author) commented Aug 4, 2021

@yurydelendik @XapaJIaMnu If I understood correctly, removing the QuantizedBuffer struct and adding its 3 fields (int8_t* value, float scale, int8_t zero_point) as function parameters will resolve the concern. Right?

@XapaJIaMnu (Collaborator)

@yurydelendik @XapaJIaMnu If I understood correctly, removing the QuantizedBuffer struct and adding its 3 fields (int8_t* value, float scale, int8_t zero_point) as function parameters will resolve the concern. Right?

The zero point should be a float, but yeah. Waiting for feedback from @kpu.
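
So the flattened sketch from above, updated with the zero point carried as a float (still illustrative, not the final header):

```cpp
#include <cstdint>

// Same flattened style as the earlier sketch, with zero_point widened to
// float per the correction above. Names remain illustrative.
void Multiply(const int8_t* A, float scale_A, float zero_point_A,
              const int8_t* B, float scale_B, float zero_point_B,
              float* output);
```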

@abhi-agg (Author) commented Aug 9, 2021

@XapaJIaMnu Added PrepareB function as well 👍

@kpu (Member) commented Aug 18, 2021

I think it would be a helpful exercise for @abhi-agg to write short implementations of the proposed functions in terms of the existing functions, for several reasons:

  1. You'll have a working implementation.
  2. The cost of binding it to an implementation will disincentivize @abhi-agg from the unnecessary changes that appear in the present API.
  3. He is going to be part of auditing it anyway.

I worry that the relatively cheap cost of speccing out a new API (when one already exists) will result in a tax: having to change the implementation.

For example, it is not clear why the function argument order has changed, or what the value is of making the user specify the output shape of a matrix (as opposed to the input shape) when the output is actually an opaque representation that is neither row major nor column major.
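
For instance, a wrapper over the existing API (using the intgemm signatures quoted later in this thread; the wasm-facing name is made up) could be one line per function:

```cpp
#include <cstdint>
#include "intgemm/intgemm.h"

// Sketch: the proposed wasm-facing PrepareA expressed directly in terms of
// intgemm's existing Int8::PrepareA. The wrapper's name and argument order
// are illustrative, not the final interface.
void wasmPrepareA(const float* input, int8_t* output, float quant_mult,
                  intgemm::Index rows, intgemm::Index cols) {
  intgemm::Int8::PrepareA(input, output, quant_mult, rows, cols);
}
```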

@kpu (Member) commented Aug 18, 2021

While it's not explicitly stated, intgemm's interface always talks to the user in row major (unless the function name has Transposed in it), and that's the meaning of A_rows, the inner dimension, and B_cols. These refer to A and B in row-major format. The opaque representation (B) is private to intgemm. These sizes refer to the input. There is a convention, if not a stated one.

Here there is a mix of the output-size and A_rows styles. For PrepareB, the user specifies an output shape. From an API-usability perspective, I feel that callers know the shape of their input better than they know the shape of their output (though obviously they'll need to know both). Is this targeting some practice where it's better to force the user to explicitly size where writes are going than where reads are coming from?
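
Spelled out with concrete (made-up) sizes, the convention reads:

```cpp
#include <cstdint>
using Index = uint32_t;  // stand-in for intgemm's Index type

// The row-major convention described above: A is A_rows x width and B is
// width x B_cols as the user sees them before preparation, and
// Multiply(A_prepared, B_prepared, A_rows, width, B_cols, callback)
// produces C with A_rows x B_cols entries, row major.
const Index A_rows = 8;    // rows of A (and of C)
const Index width  = 256;  // cols of A == rows of B (the inner dimension)
const Index B_cols = 64;   // cols of B (and of C)
```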

@abhi-agg (Author) commented Aug 19, 2021

@kpu @XapaJIaMnu

For example, it is not clear why the function argument order has changed

I can make the order the same as in the intgemm interface 👍

While it's not explicitly stated, intgemm's interface always talks to the user in row major (unless the function name has Transposed in it), and that's the meaning of A_rows, the inner dimension, and B_cols. These refer to A and B in row-major format. The opaque representation (B) is private to intgemm. These sizes refer to the input. There is a convention, if not a stated one.

Taking an example just to make sure that I understood it correctly. Do you mean that:

  1. The argument rows of PrepareA(const float *input, int8_t *output, float quant_mult, Index rows, Index cols) and A_rows of Multiply(const int8_t *A, const int8_t *B, Index A_rows, Index width, Index B_cols, Callback callback) are the same and represent the number of rows of the input matrix A (input)?
  2. The argument rows of (*PrepareB)(const float *input, int8_t *output, float quant_mult, Index rows, Index cols), rows of (*SelectColumnsB)(const int8_t *input, int8_t *output, Index rows, const Index *cols_begin, const Index *cols_end) and width of Multiply(const int8_t *A, const int8_t *B, Index A_rows, Index width, Index B_cols, Callback callback) are all the same and represent the number of rows of the input matrix B?

From an API-usability perspective, I feel that callers know the shape of their input better than they know the shape of their output (though obviously they'll need to know both). Is this targeting some practice where it's better to force the user to explicitly size where writes are going than where reads are coming from?

Could you please explain what you mean by opaque representation? This would be super helpful.

PS: The unstated conventions of intgemm led me to make some assumptions, which I documented here, and that evidently came across to you as an attempt to redesign the APIs. The goal is not to redesign but to document the APIs with all their conventions, keeping in mind that this interface should work for Arm too. We are heading in the right direction thanks to your reviews. I can make the required changes 👍

@XapaJIaMnu (Collaborator)

@abhi-agg opaque representation refers to the fact that after PrepareB, the output B matrix is neither RowMajor nor ColumnMajor, but an opaque binary representation. You can't just iterate over it and make sense of it.

@andrenatal

Can we move faster and expedite this? Our WebAssembly team is blocked by the decision that needs to be made here, and if we keep delaying it, we might lose their support and let this slip through the cracks.

What's still missing to get this landed, @kpu @XapaJIaMnu @abhi-agg?

Thanks

@abhi-agg (Author) commented Aug 19, 2021

@abhi-agg opaque representation refers to the fact that after PrepareB, the output B matrix is neither RowMajor nor ColumnMajor, but an opaque binary representation. You can't just iterate over it and make sense of it.

@XapaJIaMnu Thanks a lot. This clarifies a lot. A few questions:

  1. Is the output of PrepareA and PrepareBias also opaque?
  2. The result of the multiply function (C) is in row-major format. Right?
  1. The argument rows of PrepareA(const float *input, int8_t *output, float quant_mult, Index rows, Index cols) and A_rows of Multiply(const int8_t *A, const int8_t *B, Index A_rows, Index width, Index B_cols, Callback callback) are the same and represent the number of rows of the input matrix A (input)?

  2. The argument rows of (*PrepareB)(const float *input, int8_t *output, float quant_mult, Index rows, Index cols), rows of (*SelectColumnsB)(const int8_t *input, int8_t *output, Index rows, const Index *cols_begin, const Index *cols_end) and width of Multiply(const int8_t *A, const int8_t *B, Index A_rows, Index width, Index B_cols, Callback callback) are all the same and represent the number of rows of the input matrix B?

Could you please also confirm this? This will resolve all the confusion 👍

@XapaJIaMnu (Collaborator)

@abhi-agg

  1. PrepareA and PrepareBias both produce row-major output, but it is difficult to explain to the user what exactly PrepareBias does, as I illustrated to you beforehand.
  2. C is RowMajor.

Yes for PrepareA and Multiply.
For PrepareB and Multiply, it's a bit more complicated. If B is supposed to be transposed while it is being prepared (PrepareBTransposed), width corresponds to the rows of the transposed matrix (or the cols of the original one). See the signature of this function:

https://github.com/kpu/intgemm/blob/master/intgemm/intgemm.h#L339

What you have written holds for the default case, where B doesn't need to be transposed before the operation:

https://github.com/kpu/intgemm/blob/master/intgemm/intgemm.h#L328
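
In other words (example sizes and layout made up for illustration; see the linked signatures for the exact parameter names):

```cpp
// Dimension bookkeeping for the two preparation paths. Multiply ultimately
// needs B in an opaque form equivalent to a width x B_cols matrix.
//
// PrepareB            input: B itself (row major, width x B_cols)
//                     -> Multiply's width == the input's rows
//
// PrepareBTransposed  input: B^T (row major, B_cols x width)
//                     -> Multiply's width == the input's cols, i.e. the
//                        rows of the transposed matrix that actually
//                        enters the multiply
```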

@abhi-agg (Author) commented Aug 20, 2021

@XapaJIaMnu Thanks a lot. Kenneth's and your answers have clarified a lot of the unstated conventions of the intgemm interface.

I have changed the documentation to reflect all of it. However, it is possible that there are a few things I still might have misunderstood, so it would be great if you could give it a final look and point out anything that seems wrong 👍

Additionally:

  1. I have added a separate function for taking the transposed B input. I believe the API stays cleaner that way (instead of a boolean flag for passing a transposed or non-transposed B matrix to the same function).
  2. I haven't changed the order of the function arguments to make it consistent with intgemm, as intgemm doesn't seem to follow a consistent pattern (the order of input and output arguments differs between the functions that use a callback and the ones that don't). The current interface follows a simple pattern, sketched after this list: the output is always the last argument, which keeps the interface consistent.
  3. I have changed the documentation to use the rows and cols of the input matrices and not of the intermediate results (i.e. prepared A, prepared B and prepared bias).
  4. While reviewing, please look out for places where I shouldn't have mentioned the shape of the matrix.
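
As a sketch of that output-last convention (function and parameter names here are illustrative, not the merged header):

```cpp
#include <cstdint>
using Index = uint32_t;  // stand-in for the interface's Index type

// Illustration of the "output is always the last argument" pattern from
// point 2; these declarations are a sketch, not the final interface.
void int8PrepareA(const float* input, float scale, float zero_point,
                  Index rows, Index cols, int8_t* output);
void int8PrepareB(const float* input, float scale, float zero_point,
                  Index rows, Index cols, int8_t* output);
void int8PrepareBTransposed(const float* input, float scale, float zero_point,
                            Index rows, Index cols, int8_t* output);
```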

@andrenatal I believe one final review from Nik should make it possible to land this.

@andrenatal

thanks @abhi-agg and @XapaJIaMnu

@XapaJIaMnu (Collaborator) left a comment

I think it looks good; just some more warnings about the data type of a prepared A.

abhi-agg merged commit 02fa9da into browsermt:master on Aug 23, 2021
abhi-agg deleted the wasm-gemm-interface branch on November 1, 2021
abhi-agg mentioned this pull request on November 1, 2021