Target agnostic interface for 8-bit matrix multiply for wasm #49

Merged: 27 commits, merged on Aug 23, 2021.
Changes from 21 commits
Commits
- f08b8db Interface for the integer matrix multiplication for wasm (abhi-agg, Aug 4, 2021)
- 5d4ce49 Refactoring (abhi-agg, Aug 4, 2021)
- f55b39c Removed Int8PrepareBTransposed to leave room for discussions (abhi-agg, Aug 4, 2021)
- bc47163 Small refactoring (abhi-agg, Aug 4, 2021)
- e509979 Removed QuantizedBuffer struct (abhi-agg, Aug 5, 2021)
- 0d3e84e Make Prepared Bias a float instead of int8_t (abhi-agg, Aug 5, 2021)
- 06e59cc Better documentation of each function (abhi-agg, Aug 5, 2021)
- 7b7b57d Small reformatting (abhi-agg, Aug 5, 2021)
- 32d317d Added PrepareB function and removed "Shift" from function names (abhi-agg, Aug 9, 2021)
- faa90cf Improved documentation (abhi-agg, Aug 9, 2021)
- 8d82ae0 Added transpose in PrepareB function, removed Shift from API name (abhi-agg, Aug 10, 2021)
- 4b6f02e Refactor parameter names (abhi-agg, Aug 10, 2021)
- 8724a5d Reformatting: Moved SelectColumnsOfB in the end (abhi-agg, Aug 10, 2021)
- 496d17a Added documentation (abhi-agg, Aug 10, 2021)
- d30a61b Added changelog (abhi-agg, Aug 11, 2021)
- 2f448da Ran clang format (abhi-agg, Aug 11, 2021)
- e5d45b2 Changed name of Multiply to MultiplyAndAddBias (abhi-agg, Aug 11, 2021)
- e76c8f9 camelCase function names (abhi-agg, Aug 11, 2021)
- e63f119 Better documentation for MultiplyAndAddBias function (abhi-agg, Aug 11, 2021)
- 1f7320b More documentation (abhi-agg, Aug 11, 2021)
- 44dea58 Set Index to uint32_t (abhi-agg, Aug 11, 2021)
- 8943249 Changed zero point to float (abhi-agg, Aug 19, 2021)
- 4e3a129 Consistent naming convention for bias argument (abhi-agg, Aug 20, 2021)
- ed6fbcc Removed row-major format and shape from documentation of prepared B (abhi-agg, Aug 20, 2021)
- bc38e4a Final documentation addressing all comments from reviewers (abhi-agg, Aug 20, 2021)
- 10661fc ran clang format (abhi-agg, Aug 20, 2021)
- ae5744e Improved doc for PrepareA (abhi-agg, Aug 23, 2021)
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -38,6 +38,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- Enable compiling marian on wasm platform
- Added capability to compile wasm compatible marian sources (i.e. the sources that compile on wasm successfully) natively.
- Enable loading SentencePiece vocabs from protobuf
- Added a target-agnostic matrix multiply interface for wasm builds

### Fixed
- Segfault of spm_train when compiled with -DUSE_STATIC_LIBS=ON seems to have gone away with update to newer SentencePiece version.
193 changes: 193 additions & 0 deletions src/tensors/cpu/wasm_intgemm_interface.h
@@ -0,0 +1,193 @@
#pragma once

/** Main interface for integer matrix multiplication followed by addition of bias for wasm.
*
* C = A * B + Bias
*
* A typically holds activations. Its number of rows has no restriction (i.e. it only needs to be
* a multiple of 1) and its number of columns should be a multiple of 64.
*
* B typically holds fixed model parameters. Its number of rows should be a multiple of 64 and
* its number of columns should be a multiple of 8.
*
* All matrices A, B and C are in row-major format.
*
* Please note that most of the functions in this interface might have architecture-specific
* implementations.
*/

#include <cstdint>

using Index = uint32_t;

/**
* Prepare B for the Matrix Multiply routine.
*
* B is quantized from floating-point values and prepared in a CPU-dependent format.
* Please note that this interface might have architecture-specific implementations.
*
* @param[in] input_B An array representing the input 2-D matrix.
* Size of the array = `output_rows` * `output_cols`.
*
* If the input matrix is in transposed form then:
* Shape of the matrix: (`output_cols`, `output_rows`)
*
* If the input matrix is NOT in transposed form then:
* Shape of the matrix: (`output_rows`, `output_cols`)
* @param[in] scale The scaling factor (for quantization)
* @param[in] zero_point The zero point (for quantization)
* @param[in] is_input_transposed Whether the input matrix is in transposed form or not.
* @param[in] output_rows No. of rows of output (prepared B) matrix.
* It should be a multiple of 64.
* @param[in] output_cols No. of columns of output (prepared B) matrix.
* It should be a multiple of 8.
* @param[out] output An array representing the prepared B matrix in row-major
* format. Size of the array = `output_rows` * `output_cols`.
* Shape of the matrix: (`output_rows`, `output_cols`)
*/
void int8PrepareB(const float* input_B,
float scale,
int8_t zero_point,
bool is_input_transposed,
Index output_rows,
Index output_cols,
int8_t* output);

/**
* Prepare B for the Matrix Multiply routine from an already quantized and transposed B stored
* in a CPU-independent format.
*
* B is prepared in a CPU-dependent format. This function is useful when loading quantized models
* that are stored on disk in a CPU-independent format.
*
* @param[in] input_B An array representing the input 2-D matrix.
* Size of the array = `output_rows` * `output_cols`.
* Shape of the matrix: (`output_cols`, `output_rows`)
* @param[in] output_rows No. of rows of output (prepared B) matrix.
* It should be a multiple of 64.
* @param[in] output_cols No. of columns of output (prepared B) matrix.
* It should be a multiple of 8.
* @param[out] output An array representing the prepared B matrix in row-major format.
* Size of the array = `output_rows` * `output_cols`.
* Shape of the matrix: (`output_rows`, `output_cols`)
*/
void int8PrepareBQuantizedTransposed(const int8_t* input_B,
Index output_rows,
Index output_cols,
int8_t* output);

/**
* Prepare A for the Matrix Multiply routine.
*
* A is quantized from floating-point values.
* Please note that this interface might have architecture-specific implementations.
*
* @param[in] input_A An array representing the input 2-D matrix in row-major format.
* Size of the array = `output_rows` * `output_cols`.
* Shape of the matrix: (`output_rows`, `output_cols`)
* @param[in] scale The scaling factor (for quantization)
* @param[in] zero_point The zero point (for quantization)
* @param[in] output_rows No. of rows of output (prepared A) matrix.
* No restriction on its size.
* @param[in] output_cols No. of columns of output (prepared A) matrix.
* It should be a multiple of 64.
* @param[out] output An array representing the prepared A matrix in row-major format.
* Size of the array = `output_rows` * `output_cols`.
* Shape of the matrix: (`output_rows`, `output_cols`)
*/
void int8PrepareA(const float* input_A,
float scale,
int8_t zero_point,
Index output_rows,
Index output_cols,
int8_t* output);

/**
* Prepares bias for the Matrix Multiply routine.
*
* It uses the prepared B and a bias input to prepare the final bias.
*
* @param[in] input_B An array representing the prepared B (input) 2-D matrix in row-major
* format. Size of the array = `rows_B` * `cols_B`.
* Shape of the matrix: (`rows_B`, `cols_B`)
* @param[in] scale The scaling factor (for quantization)
* @param[in] zero_point The zero point (for quantization)
* @param[in] rows_B No. of rows of the prepared B matrix. It should be a multiple of 64.
* @param[in] cols_B No. of columns of the prepared B matrix. It should be a multiple of 8.
* @param[in] bias_input An array representing the input bias. Size of the array = 1 * `cols_B`
* @param[out] output An array representing the final prepared bias.
* Size of the array = 1 * `cols_B`
*/
void int8PrepareBias(const int8_t* input_B,
float scale,
int8_t zero_point,
Index rows_B,
Index cols_B,
const float* bias_input,
float* output);

/**
* Perform multiplication of 2 matrices followed by adding a bias.
*
* i.e. Output = A * B + Bias
*
* Please note that:
* 1. This interface might have an architecture-specific implementation.
* 2. Inputs A, B and Bias must be prepared using the corresponding implementations
* of int8Prepare* functions for that architecture.
*
* @param[in] input_A An array representing prepared A (input) 2-D matrix in row-major
* format. Size of the array = `rows_A` * `width`.
* Shape of the matrix: (`rows_A`, `width`)
* @param[in] scale_A The scaling factor (for quantization) of A
* @param[in] zero_point_A The zero point (for quantization) of A
* @param[in] input_B An array representing prepared B (input) 2-D matrix in row-major
* format. Size of the array = `width` * `cols_B`.
* Shape of the matrix: (`width`, `cols_B`)
* @param[in] scale_B The scaling factor (for quantization) of B
* @param[in] zero_point_B The zero point (for quantization) of B
* @param[in] bias_input An array representing the prepared bias.
* Size of the array = 1 * `cols_B`
* @param[in] rows_A No. of rows of the prepared A matrix. No restriction on its size.
* @param[in] width No. of columns of the prepared A matrix (= no. of rows of the prepared B
* matrix). It should be a multiple of 64.
* @param[in] cols_B No. of columns of the prepared B matrix. It should be a multiple of 8.
* @param[out] output An array representing the multiplication result in row-major format.
* Size of the array = `rows_A` * `cols_B`
*/
void int8MultiplyAndAddBias(const int8_t* input_A,
float scale_A,
int8_t zero_point_A,
const int8_t* input_B,
float scale_B,
int8_t zero_point_B,
const float* bias_input,
Index rows_A,
Index width,
Index cols_B,
float* output);

/**
* Select a subset of columns from a prepared B matrix.
*
* Indices of the columns to be selected are specified by an array.
*
* @param[in] input_B An array representing the prepared B (input) 2-D matrix in row-major
* format. Size of the array = `rows_B` * `cols_B`.
* Shape of the matrix: (`rows_B`, `cols_B`)
* @param[in] rows_B No. of rows of input matrix. It should be a multiple of 64.
* @param[in] cols_B No. of columns of input matrix. It should be a multiple of 8.
* @param[in] cols An array of column indices to be selected from the input matrix.
* All indices of the array should be valid,
* i.e. 0 <= cols[N] < `cols_B` where N = 0, 1, 2, ..., (`num_cols` - 1)
* @param[in] num_cols Size of the `cols` array. It should be a multiple of 8.
* @param[out] output An array representing the selected columns of input matrix.
* Size of the array = `rows_B` * `num_cols`.
* Shape of the matrix: (`rows_B`, `num_cols`)
*/
void int8SelectColumnsOfB(const int8_t* input_B,
Index rows_B,
Index cols_B,
const Index* cols,
const Index num_cols,
int8_t* output);
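To make the documented semantics concrete (quantization with a scale and zero point, `Output = A * B + Bias`, and column selection), here is a minimal scalar reference sketch that an architecture-specific implementation could be checked against. It rests on assumptions the interface does not spell out: prepared matrices are taken to be plain row-major int8 (the real prepared B format is CPU-dependent), quantization is assumed to be `q = round(x * scale) + zero_point` saturated to int8, and bias preparation is skipped (the raw bias is added and the zero points are corrected inside the inner product instead). All `ref*` names are hypothetical and not part of this interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Index = uint32_t;

// Hypothetical helper. ASSUMED convention: q = round(x * scale) + zero_point, saturated to int8.
inline int8_t refQuantize(float x, float scale, int8_t zero_point) {
  float q = std::round(x * scale) + static_cast<float>(zero_point);
  q = std::min(127.0f, std::max(-128.0f, q));
  return static_cast<int8_t>(q);
}

// Hypothetical plain row-major quantization. Real int8PrepareA/int8PrepareB may use another layout.
std::vector<int8_t> refQuantizeMatrix(const std::vector<float>& in, float scale, int8_t zero_point) {
  std::vector<int8_t> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = refQuantize(in[i], scale, zero_point);
  return out;
}

// Hypothetical scalar reference for Output = A * B + Bias on row-major int8 inputs.
// A: (rows_A x width), B: (width x cols_B), bias: (1 x cols_B), output: (rows_A x cols_B).
void refMultiplyAndAddBias(const int8_t* A, float scale_A, int8_t zero_point_A,
                           const int8_t* B, float scale_B, int8_t zero_point_B,
                           const float* bias, Index rows_A, Index width, Index cols_B,
                           float* output) {
  // Enforce the shape restrictions documented in the interface.
  if (width % 64 != 0 || cols_B % 8 != 0)
    throw std::invalid_argument("width must be a multiple of 64 and cols_B a multiple of 8");
  // Dequantizing both operands divides the integer dot product by scale_A * scale_B.
  const float unquant = 1.0f / (scale_A * scale_B);
  for (Index i = 0; i < rows_A; ++i) {
    for (Index j = 0; j < cols_B; ++j) {
      int32_t acc = 0;  // accumulate in 32 bits so int8 products cannot overflow
      for (Index k = 0; k < width; ++k) {
        int32_t a = static_cast<int32_t>(A[i * width + k]) - zero_point_A;
        int32_t b = static_cast<int32_t>(B[k * cols_B + j]) - zero_point_B;
        acc += a * b;
      }
      output[i * cols_B + j] = static_cast<float>(acc) * unquant + bias[j];
    }
  }
}

// Hypothetical scalar reference for selecting columns of a row-major (rows_B x cols_B) matrix.
void refSelectColumnsOfB(const int8_t* B, Index rows_B, Index cols_B,
                         const Index* cols, Index num_cols, int8_t* output) {
  if (num_cols % 8 != 0) throw std::invalid_argument("num_cols must be a multiple of 8");
  // Validate every index up front: 0 <= cols[N] < cols_B.
  for (Index c = 0; c < num_cols; ++c)
    if (cols[c] >= cols_B) throw std::out_of_range("column index out of range");
  for (Index r = 0; r < rows_B; ++r)
    for (Index c = 0; c < num_cols; ++c) output[r * num_cols + c] = B[r * cols_B + cols[c]];
}
```

Running the same float inputs through an architecture-specific implementation and comparing its output with such a reference, within a tolerance, is one way to test it; the reference deliberately favours readability over speed.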