[CPU] [ARM] [INT8] FullyConnected #25171
base: master
Conversation
@@ -479,8 +479,10 @@ std::vector<std::string> disabledTestPatterns() {
    retVector.emplace_back(R"(smoke_TestsDFT_(1|2|3|4)d/DFTLayerTest.Inference.*)");
    // Issue 88764, 91647, 108802: accuracy issue
    retVector.emplace_back(R"(MultipleLSTMCellTest/MultipleLSTMCellTest.CompareWithRefs.*)");
#if !defined(OPENVINO_ARCH_ARM64)
Reviewer: Does it mean we want to skip it for ARM32?
@@ -87,6 +88,11 @@ static const TypeMapping aclFCTypeMapping {
    {{_any, _any, _any, _any}, pt(just<f32>(), just<f32>(), just<f32>(), just<f32>())}
};

static const TypeMapping aclLowpFCTypeMapping {
    // {src, wei, bia, dst}        pt<src, wei, bias, dst>
    {{_i8, _i8, _any, _f32}, pt(just<i8>(), just<i8>(), just<f32>(), just<f32>())}
Reviewer: Suggested: pt(bypass(), bypass(), just<f32>(), bypass())
Author: Fixed.
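For reference, a sketch of what the fixed row would look like if the suggestion above is applied as-is (assuming bypass() passes the incoming precision through unchanged, so only the bias is forced to f32):

static const TypeMapping aclLowpFCTypeMapping {
    // {src, wei, bia, dst}        pt<src, wei, bias, dst>
    {{_i8, _i8, _any, _f32}, pt(bypass(), bypass(), just<f32>(), bypass())}
};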
namespace ov {
namespace intel_cpu {

VectorDims acl_fc_executor::makeDummyInputDims(const Shape& inShape, const Shape& wShape) {
Reviewer: Regarding the name of the file: from my understanding, the functions in this file are directly related to the FullyConnected operation, and it also includes functions such as makeDummyInputDims that are not strictly about weights. Can we rename it to something like acl_fullyconnected_utils.cpp (and the header as well)?
Author: Renamed it to acl_fullyconnected_utils.
if (dequantizationScales.empty()) {
    tensor_info->set_quantization_info(arm_compute::QuantizationInfo(1.f));
} else {
    tensor_info->set_quantization_info(arm_compute::QuantizationInfo(dequantizationScales[0]));
}
Reviewer: Same question here.
VERIFY(checkPostOps(config.postOps), UNSUPPORTED_TYPE_OF_POSTOPS);
VERIFY(one_of(srcRank(config), 2U, 3U, 4U), UNSUPPORTED_SRC_RANK);
VERIFY(one_of(weiRank(config), 2U, 3U, 4U), UNSUPPORTED_WEI_RANK);
VERIFY(static_cast<FCAttrs>(config.attrs).dequantizationScales.size() <= 1, UNSUPPORTED_PER_CHANNEL_QUANTIZATION);
Reviewer: Why do we expect that dequantizationScales can be empty? Is it possible that a quantized FullyConnected will not contain dequantization scales at all?
Reviewer: @v-Golubev Could you please review the LPT part?
INSTANTIATE_TEST_SUITE_P(smoke_LPT, FullyConnectedTransformation,
    ::testing::Combine(
        ::testing::ValuesIn(netPrecisions),
        ::testing::ValuesIn(shapes),
-       ::testing::Values(ov::test::utils::DEVICE_GPU),
+       ::testing::Values(ov::test::utils::DEVICE_CPU),
        ::testing::ValuesIn(trasformationParamValues)),
Reviewer: Suggested change:
-        ::testing::Values(ov::test::utils::DEVICE_CPU),
+        ::testing::Values(ov::test::utils::DEVICE_GPU),
Author: Fixed.
@@ -60,14 +61,14 @@ std::string LayerTransformation::get_test_case_name_by_params(

namespace {
template <typename IsNodeF>
-std::string find_node_by_runtime_precision(const ov::CompiledModel& execNet, IsNodeF is_node_f) {
+std::string find_node_by_runtime_precision(const ov::CompiledModel& execNet, IsNodeF is_node_f, const std::string& propertyName = "runtimePrecision") {
Reviewer: Could you please rename this function to match the changes?
Author: Renamed to find_node_by_runtime_property.
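Presumably the renamed helper keeps the same shape apart from the name, i.e. something along these lines (illustrative, inferred from the diff above and the reply):

template <typename IsNodeF>
std::string find_node_by_runtime_property(const ov::CompiledModel& execNet,
                                          IsNodeF is_node_f,
                                          const std::string& propertyName = "runtimePrecision");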
-    const bool transpose2);
+    const bool transpose2,
+    const bool signedWeights,
+    const bool perChannelWeights,
Reviewer: The perChannelWeights name is a bit confusing. Does it mean a dequantization on weights? Could you please rename it?
Author: Renamed to perChannelWeightsDequantization.
template <typename T>
std::vector<T> generate_values(const ov::Shape& shape, float delimiter = 1.f) {
    std::vector<T> values(ov::shape_size(shape));
    for (size_t i = 0; i < values.size(); ++i) {
        values[i] = static_cast<T>(static_cast<T>(i) / delimiter);
    }
    return values;
}
Reviewer: Why can't we use the existing ov::test::utils::make_constant helper? A sketch follows.
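A rough sketch of how the shared helper could replace the hand-rolled generator. This is hedged, not the PR's code: it assumes the common_test_utils constant builder and its InputGenerateData overload, whose field names may differ between OpenVINO versions, and `shape` stands in for the weights shape used in the test.

#include "common_test_utils/node_builders/constant.hpp"

// Build the constant via the shared helper instead of filling a
// std::vector manually; in_data controls the generated value range.
ov::test::utils::InputGenerateData in_data;
in_data.start_from = -128;  // lower bound of generated values (assumption)
in_data.range = 255;        // width of the generated range (assumption)
auto weights = ov::test::utils::make_constant(ov::element::f32, shape, in_data);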
std::vector<float> generate_dequantization_values(
    const ov::Shape& shape,
    const size_t levels,
    const bool low) {
    const auto shape_size = ov::shape_size(shape);
    std::vector<float> values(shape_size);
    for (size_t i = 0; i < shape_size; ++i) {
        values[i] = low ? -128.f / (static_cast<float>(i) + 1.f) : 127.f / (static_cast<float>(i) + 1.f);
    }
    return values;
}
Reviewer:
- The same question: why can't we reuse ov::test::utils::make_constant? We can pass the desired range of values in this helper.
- The levels param is not used.
    weightsConst, precision, 256ul, { 1ul, 1ul },
    { -128.f / 8.f }, { 127.f / 8.f }, { -128.f / 8.f }, { 127.f / 8.f });
fakeQuantizeOnWeights->set_friendly_name("fakeQuantizeOnWeights");
const size_t channel = inputShape2[inputShape2.size() - 2].get_length();
Reviewer: Shouldn't we transpose inputShape2 if transpose2 == true?
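A minimal sketch of the fix the reviewer appears to have in mind (hypothetical; the exact axis convention depends on how this test lays out the weights shape):

// When transpose2 == true the channel dimension sits on the other axis,
// so the index must account for the transposition.
const size_t channel_axis = inputShape2.size() - (transpose2 ? 1 : 2);
const size_t channel = inputShape2[channel_axis].get_length();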
// fq
std::shared_ptr<Node> parentOnWeights;
if (fq) {
    auto weightsConst = std::make_shared<ov::op::v0::Constant>(
Reviewer: Can we support the transpose2=true option for fq?
    return;
}

ov::mark_as_bias(dequantization);
Reviewer: BiasAttribute is Add-specific (please take a look at the BiasAttribute::is_copyable implementation), so I think using it for Multiply ops is not a good idea. Let's discuss alternative options offline.
#ifdef OPENVINO_ARCH_ARM64
    ADD_MATCHER(common, MatMulWithDequantizationTransformation, params)
#else
    ADD_MATCHER(common, MatMulTransformation, params)
#endif
Reviewer: I believe such reconfiguration should be on the plugin's side, not in common LPT code. For the ARM platform, I'd suggest the following in the CPU transformation pipeline instead of the current implementation: keep the MatMul transformation as is (without introducing an inheritor), and add an ARM-specific transformation that matches quantized MatMul + Multiply and marks the Multiply with the needed attribute. We already have additional_main_passes, which allows adding a custom matcher to the LPT pipeline; a rough sketch of such a matcher follows.
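A sketch of the ARM-specific matcher the reviewer describes, written against the standard ov::pass::MatcherPass pattern API. Illustrative only: the class name and the rt_info key are made up here, and the open question above about which attribute to use still applies.

#include "openvino/op/constant.hpp"
#include "openvino/op/matmul.hpp"
#include "openvino/op/multiply.hpp"
#include "openvino/pass/matcher_pass.hpp"
#include "openvino/pass/pattern/op/wrap_type.hpp"

class MarkDequantizationAfterMatMul : public ov::pass::MatcherPass {
public:
    MarkDequantizationAfterMatMul() {
        using namespace ov::pass::pattern;
        auto matmul_m = wrap_type<ov::op::v0::MatMul>();
        auto scales_m = wrap_type<ov::op::v0::Constant>();
        auto multiply_m = wrap_type<ov::op::v1::Multiply>({matmul_m, scales_m});

        ov::matcher_pass_callback callback = [=](Matcher& m) {
            auto multiply = m.get_pattern_value_map().at(multiply_m).get_node_shared_ptr();
            // TODO: verify the matched MatMul is actually quantized before marking.
            // Annotate the dequantization Multiply so the plugin can recognize it;
            // the key name is a placeholder, not an attribute from the PR.
            multiply->get_rt_info()["DequantizationAfterMatMul"] = true;
            return false;  // the graph is only annotated, never rewritten
        };
        register_matcher(std::make_shared<Matcher>(multiply_m, "MarkDequantizationAfterMatMul"), callback);
    }
};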
if (ov::marked_as_bias(node)) {
    SetNodeFusingType(node, NodeFusingType::FusedWithMisc);
}
Reviewer: This looks like a workaround. The node should be marked as FusedWithFC, because isSuitableMatMulParent returns true. If that doesn't happen, we need to fix the existing function.