Target agnostic interface for 8-bit matrix multiply for wasm #49
Conversation
Such data organization is not really practical and will take a performance hit on bounds checks: you have to check now if
As discussed in the other document, we need at least one more version of the PrepareB functions. Furthermore, it is not really practical to have a struct, as structs may exhibit different alignments on different hardware. Normally what is done is to stick extra metadata at the end of the long array (e.g., at the end of the int8_t* we have one float for the scale of A, one for the scale of B, and two zero points, one corresponding to A and one to the B matrix).
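A minimal sketch of the layout being described, assuming (as mentioned later in the thread) that the zero points are also floats; the function name and the choice to append exactly four floats are illustrative, not intgemm's actual layout:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical packing: rows*cols quantized values followed by four floats of
// metadata (scale_A, scale_B, zero_point_A, zero_point_B) appended as raw
// bytes. memcpy is used so no particular alignment is assumed for the floats.
std::vector<int8_t> PackWithMetadata(const int8_t* values, size_t rows, size_t cols,
                                     float scale_A, float scale_B,
                                     float zero_point_A, float zero_point_B) {
  const float meta[4] = {scale_A, scale_B, zero_point_A, zero_point_B};
  std::vector<int8_t> buffer(rows * cols + sizeof(meta));
  std::memcpy(buffer.data(), values, rows * cols);
  std::memcpy(buffer.data() + rows * cols, meta, sizeof(meta));
  return buffer;
}
```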
As for point 2, we will write some experimental kernels where you take
@yurydelendik @XapaJIaMnu If I understood correctly, removing
The zero point should be float, but yeah. Waiting for feedback from @kpu |
- Add all of its fields as function arguments
@XapaJIaMnu Added
I think it would be a helpful exercise for @abhi-agg to write short implementations of the proposed functions in terms of the existing functions for several reasons:
I worry the relatively cheap cost of speccing out a new API (when one already exists) will result in a tax on having to change the implementation. For example, it is not clear why the function argument order has changed, or what the value is in making the user specify the output shape of a matrix (as opposed to the input shape) when it's actually going to an opaque representation that is neither row major nor column major.
While it's not explicitly stated, intgemm's interface always talks to the user in row major (unless the function name has Transposed in it), and that's the meaning of A_rows, the inner dimension, and B_cols. These refer to A and B in row-major format. The opaque representation (B) is private to intgemm. These sizes refer to the input. There is a convention, if not a stated one. Here there is a mix of output size and A_rows style. For prepareB the user specifies an output shape. From an API usability perspective, I feel like callers know the shape of their input better than they know the shape of their output (though obviously they'll need to know both). Is this targeted at some practice that it's better to force the user to explicitly size where writes are going than to explicitly size where reads are coming from?
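For anyone reading along, a minimal end-to-end sketch of that convention as I understand it from the intgemm README; the header paths, the AlignedVector helper, and the UnquantizeAndWrite callback are reproduced from memory and may not match the current tree exactly:

```cpp
#include "intgemm/aligned.h"
#include "intgemm/intgemm.h"

int main() {
  // Caller-side, row-major shapes: A is A_rows x width, B is width x B_cols,
  // and C comes out as A_rows x B_cols. The sizes always describe the inputs.
  // width and B_cols must satisfy intgemm's multiple-of requirements.
  const intgemm::Index A_rows = 8, width = 256, B_cols = 256;
  const float quant_mult = 127.0f / 2.0f;  // example quantization multiplier

  intgemm::AlignedVector<float> A(A_rows * width), B(width * B_cols), C(A_rows * B_cols);
  intgemm::AlignedVector<int8_t> A_prepared(A.size()), B_prepared(B.size());
  // ... fill A and B with values in roughly [-2, 2] ...

  intgemm::Int8::PrepareA(A.begin(), A_prepared.begin(), quant_mult, A_rows, width);
  // B_prepared is an opaque, CPU-specific layout: never index into it directly.
  intgemm::Int8::PrepareB(B.begin(), B_prepared.begin(), quant_mult, width, B_cols);
  intgemm::Int8::Multiply(
      A_prepared.begin(), B_prepared.begin(), A_rows, width, B_cols,
      intgemm::callbacks::UnquantizeAndWrite(1.0f / (quant_mult * quant_mult), C.begin()));
}
```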
I can make the order the same as the intgemm interface 👍
Taking an example just to make sure that I understood it correctly. Do you mean that:
Could you please explain what you mean by
PS: The unstated conventions of intgemm led me to make some assumptions that I documented here, and it evidently came across to you as an attempt to redesign the APIs. The goal is not to redesign but to document the APIs with all their conventions, keeping in mind that this interface should work for Arm too; we are heading in the right direction because of your reviews. I can make the required changes 👍
@abhi-agg opaque representation refers to the fact that after
Can we move this faster and expedite it? Our WebAssembly team is blocked by the decision that needs to be taken here, and if we keep delaying it, we might lose their support and let this slip through the cracks. What's still missing to have this landed, @kpu @XapaJIaMnu @abhi-agg? Thanks
@XapaJIaMnu Thanks a lot. This clarifies a lot. A few questions:
Could you please also confirm this? This will resolve all the confusion 👍
Yes for prepareA and Multiply (https://github.com/kpu/intgemm/blob/master/intgemm/intgemm.h#L339). What you have written holds for the default case where B doesn't need to be transposed before the operation: https://github.com/kpu/intgemm/blob/master/intgemm/intgemm.h#L328
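To spell out the two cases being linked, a hedged sketch of the shapes only, continuing the variables from the sketch above; the argument order of PrepareBTransposed is assumed to mirror PrepareB and should be checked against intgemm.h:

```cpp
// Same logical B (width x B_cols); two possible input layouts.
// PrepareB: B is stored row major, width x B_cols in memory (the default case).
intgemm::Int8::PrepareB(B.begin(), B_prepared.begin(), quant_mult, width, B_cols);
// PrepareBTransposed: the same logical B is already stored transposed,
// i.e. B_cols x width in memory. Both fill the opaque prepared-B buffer.
intgemm::Int8::PrepareBTransposed(B_transposed.begin(), B_prepared.begin(),
                                  quant_mult, width, B_cols);
```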
@XapaJIaMnu Thanks a lot. Kenneth's and your answers have cleared up a lot of the unstated conventions of the intgemm interface. I have changed the documentation to reflect all of it. However, it is possible that there are a few things which I might still have misunderstood, so it would be great if you could give it a final look and point out anything that seems wrong 👍 Additionally:
@andrenatal I believe one final review from Nik should make it possible to land this. |
thanks @abhi-agg and @XapaJIaMnu |
I think it looks good, just some more warnings about the data type of a prepared A.
This PR adds the necessary interface for an 8-bit matrix multiply function (uint8 * int8) for wasm.

Fixes: browsermt/bergamot-translator#213

I request reviewers to please verify whether the function signatures are correct for the intended use case.

Open issues that will be resolved in this PR incrementally (a hedged sketch of what such an interface could look like follows this list):

- Which of the Int8::PrepareB and Int8::PrepareBTransposed functions should be added to make the interface complete while still keeping it target agnostic? I will add them once we reach a decision. EDIT: We agreed to add both.
- Do we need to add a fused Multiply function (a multiply function with PrepareA fused inside)? EDIT: We agreed that it can be added later if it provides performance gains.
- Should we support int8 * int8 multiplication as well? If yes, then I will add Int8::Multiply and the corresponding Int8::PrepareA function to support this use case. EDIT: We agreed that we don't need to add separate functions for this; the same Multiply function can deal with both.
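As referenced above, a purely illustrative sketch of what a target-agnostic, plain-C-style interface along these lines could look like; every name, argument order, and the decision to pass scales and zero points explicitly are hypothetical, not the signatures in this PR's diff:

```cpp
#include <cstdint>

// Hypothetical declarations only; see the actual diff for the real interface.
using Index = uint32_t;

// Quantize row-major float A (A_rows x width) into prepared uint8 form.
void int8PrepareA(const float* A, float scale, float zero_point,
                  Index A_rows, Index width, uint8_t* output);

// Quantize row-major float B (width x B_cols) into an opaque prepared form.
void int8PrepareB(const float* B, float scale, float zero_point,
                  Index width, Index B_cols, int8_t* output);

// Same as above, but B is supplied already transposed (B_cols x width in memory).
void int8PrepareBTransposed(const float* B_transposed, float scale,
                            float zero_point, Index width, Index B_cols,
                            int8_t* output);

// C = A * B, written back as float using the scales and zero points.
void int8Multiply(const uint8_t* A_prepared, float scale_A, float zero_point_A,
                  const int8_t* B_prepared, float scale_B, float zero_point_B,
                  Index A_rows, Index width, Index B_cols, float* output);
```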
Checklist