
[CPU] [ARM] [INT8] FullyConnected #25171

Open. eshoguli wants to merge 6 commits into master from es/aarch64/int8
Conversation

eshoguli (Contributor) commented Jun 22, 2024

Details:

  • [ARM] [INT8] FullyConnected

Tickets:

@eshoguli eshoguli requested review from a team as code owners June 22, 2024 21:57
@github-actions github-actions bot added category: IE Tests OpenVINO Test: plugins and common category: CPU OpenVINO CPU plugin category: LP transformations OpenVINO Low Precision transformations labels Jun 22, 2024
@eshoguli eshoguli changed the title [TEST] [ARM] [INT8] FullyConnected [TEST] [CPU] [ARM] [INT8] FullyConnected Jun 23, 2024
@github-actions github-actions bot removed the category: LP transformations OpenVINO Low Precision transformations label Jun 26, 2024
@eshoguli eshoguli requested review from a team as code owners June 26, 2024 10:53
@github-actions github-actions bot added category: GPU OpenVINO GPU plugin category: build OpenVINO cmake script / infra labels Jun 26, 2024
@eshoguli eshoguli changed the title [TEST] [CPU] [ARM] [INT8] FullyConnected [CPU] [ARM] [INT8] FullyConnected Jun 26, 2024
@eshoguli eshoguli force-pushed the es/aarch64/int8 branch 5 times, most recently from b972f54 to 743281f on July 2, 2024 00:07
@eshoguli eshoguli force-pushed the es/aarch64/int8 branch 5 times, most recently from b56d725 to ea6c2b2 on July 10, 2024 19:32
github-actions bot commented:

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Jul 29, 2024
@eshoguli eshoguli force-pushed the es/aarch64/int8 branch 2 times, most recently from af1105f to 0d7c9ec on July 31, 2024 00:36
@eshoguli eshoguli removed category: IE Tests OpenVINO Test: plugins and common category: GPU OpenVINO GPU plugin category: build OpenVINO cmake script / infra labels Aug 18, 2024
@github-actions github-actions bot added category: IE Tests OpenVINO Test: plugins and common category: GPU OpenVINO GPU plugin category: build OpenVINO cmake script / infra labels Aug 19, 2024
@ilya-lavrenov ilya-lavrenov added the platform: arm OpenVINO on ARM / ARM64 label Aug 29, 2024
github-actions bot commented:

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Sep 13, 2024
@mg-intel mg-intel removed the Stale label Sep 13, 2024
@@ -479,8 +479,10 @@ std::vector<std::string> disabledTestPatterns() {
retVector.emplace_back(R"(smoke_TestsDFT_(1|2|3|4)d/DFTLayerTest.Inference.*)");
// Issue 88764, 91647, 108802: accuracy issue
retVector.emplace_back(R"(MultipleLSTMCellTest/MultipleLSTMCellTest.CompareWithRefs.*)");
#if !defined(OPENVINO_ARCH_ARM64)
Contributor:

Does it mean we want to skip it for ARM32?
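For context, here is how that guard resolves on each platform (an illustration, not part of the diff):

// x86_64: OPENVINO_ARCH_ARM64 undefined -> skip pattern added -> test disabled
// ARM32:  OPENVINO_ARCH_ARM64 undefined -> skip pattern added -> test disabled
// ARM64:  OPENVINO_ARCH_ARM64 defined   -> skip pattern omitted -> test runs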

@@ -87,6 +88,11 @@ static const TypeMapping aclFCTypeMapping {
{{_any, _any, _any, _any}, pt(just<f32>(), just<f32>(), just<f32>(), just<f32>())}
};

static const TypeMapping aclLowpFCTypeMapping {
// {src, wei, bia, dst} pt<src, wei, bias, dst>
{{_i8, _i8, _any, _f32}, pt(just<i8>(), just<i8>(), just<f32>(), just<f32>())}
Contributor:

pt(bypass(), bypass(), just<f32>(), bypass())

Contributor:

Fixed

namespace ov {
namespace intel_cpu {

VectorDims acl_fc_executor::makeDummyInputDims(const Shape& inShape, const Shape& wShape) {
Contributor:

Regarding the name of the file: from my understanding, the functions in this file are directly related to the FullyConnected operation, and it also includes some functions, such as makeDummyInputDims, which are not entirely about weights.
Can we rename it to something like acl_fullyconnected_utils.cpp?
And the header as well.

Contributor:

Renamed it to acl_fullyconnected_utils

Comment on lines +85 to +89
if (dequantizationScales.empty()) {
tensor_info->set_quantization_info(arm_compute::QuantizationInfo(1.f));
} else {
tensor_info->set_quantization_info(arm_compute::QuantizationInfo(dequantizationScales[0]));
}
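For readers unfamiliar with ACL quantization: with an affine per-tensor QuantizationInfo, dequantization follows real = scale * (q - offset), so a scale of 1.f with zero offset is an identity mapping, which makes it a reasonable fallback when no dequantization scales were extracted. A minimal sketch, not part of the PR:

// Sketch of ACL-style per-tensor affine dequantization; with the defaults
// (scale = 1.f, offset = 0) the mapping leaves the quantized value unchanged.
float dequantize(int8_t q, float scale = 1.f, int32_t offset = 0) {
    return scale * static_cast<float>(static_cast<int32_t>(q) - offset);
}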
Contributor:

Same question here

VERIFY(checkPostOps(config.postOps), UNSUPPORTED_TYPE_OF_POSTOPS);
VERIFY(one_of(srcRank(config), 2U, 3U, 4U), UNSUPPORTED_SRC_RANK);
VERIFY(one_of(weiRank(config), 2U, 3U, 4U), UNSUPPORTED_WEI_RANK);
VERIFY(static_cast<FCAttrs>(config.attrs).dequantizationScales.size() <= 1, UNSUPPORTED_PER_CHANNEL_QUANTIZATION);
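To make explicit what that last check admits (illustration only):

// size() <= 1 accepts "no scales" or a single per-tensor scale;
// one scale per output channel is rejected on this path.
std::vector<float> per_tensor  = {0.05f};         // accepted
std::vector<float> per_channel = {0.05f, 0.04f};  // fails the VERIFY above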
Contributor:

Why do we expect that dequantizationScales can be empty?
Is it possible that a quantized FullyConnected will not contain dequantization scales at all?

@EgorDuplensky (Contributor):

@v-Golubev Could you please review the LPT part?

@v-Golubev v-Golubev self-assigned this Sep 18, 2024
INSTANTIATE_TEST_SUITE_P(smoke_LPT, FullyConnectedTransformation,
    ::testing::Combine(
        ::testing::ValuesIn(netPrecisions),
        ::testing::ValuesIn(shapes),
-       ::testing::Values(ov::test::utils::DEVICE_GPU),
+       ::testing::Values(ov::test::utils::DEVICE_CPU),
        ::testing::ValuesIn(trasformationParamValues)),
Contributor:

Suggested change:
-::testing::Values(ov::test::utils::DEVICE_CPU),
+::testing::Values(ov::test::utils::DEVICE_GPU),

Contributor:

Fixed

@@ -60,14 +61,14 @@ std::string LayerTransformation::get_test_case_name_by_params(

namespace {
template <typename IsNodeF>
-std::string find_node_by_runtime_precision(const ov::CompiledModel& execNet, IsNodeF is_node_f) {
+std::string find_node_by_runtime_precision(const ov::CompiledModel& execNet, IsNodeF is_node_f, const std::string& propertyName = "runtimePrecision") {
Contributor:

Could you please rename this function to correspond to the changes?

Contributor:

Renamed to find_node_by_runtime_property

-    const bool transpose2);
+    const bool transpose2,
+    const bool signedWeights,
+    const bool perChannelWeights,
Contributor:

The perChannelWeights name is a bit confusing. Does it mean a dequantization on weights? Could you please rename it?

Contributor:

Renamed to perChannelWeightsDequantization

Comment on lines +53 to +60
template <typename T>
std::vector<T> generate_values(const ov::Shape& shape, float delimiter = 1.f) {
std::vector<T> values(ov::shape_size(shape));
for (size_t i = 0; i < values.size(); ++i) {
values[i] = static_cast<T>(static_cast<T>(i) / delimiter);
}
return values;
}
Contributor:

Why can't we use the existing ov::test::utils::make_constant helper?

Comment on lines +62 to +72
std::vector<float> generate_dequantization_values(
const ov::Shape& shape,
const size_t levels,
const bool low) {
const auto shape_size = ov::shape_size(shape);
std::vector<float> values(shape_size);
for (size_t i = 0; i < shape_size; ++i) {
values[i] = low ? -128.f / (static_cast<float>(i) + 1.f) : 127.f / (static_cast<float>(i) + 1.f);
}
return values;
}
Contributor:

  1. The same question: why can't we reuse ov::test::utils::make_constant? We can pass the desired range of values to this helper.
  2. The levels param is not used.
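For context, a sketch of the suggested helper; the exact overload and the InputGenerateData parameters are assumptions based on the test-utils conventions, not verified against this branch:

#include "common_test_utils/node_builders/constant.hpp"

// Assumed usage: generate a constant with a controlled value range instead
// of the hand-rolled generators above.
ov::test::utils::InputGenerateData gen_data(/*start_from=*/-128, /*range=*/255, /*resolution=*/8);
auto weights = ov::test::utils::make_constant(ov::element::f32, ov::Shape{16, 32}, gen_data);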

weightsConst, precision, 256ul, { 1ul, 1ul },
{ -128.f / 8.f }, { 127.f / 8.f }, { -128.f / 8.f }, { 127.f / 8.f });
fakeQuantizeOnWeights->set_friendly_name("fakeQuantizeOnWeights");
const size_t channel = inputShape2[inputShape2.size() - 2].get_length();
Contributor:

Shouldn't we transpose inputShape2 if transpose2 == true?

// fq
std::shared_ptr<Node> parentOnWeights;
if (fq) {
auto weightsConst = std::make_shared<ov::op::v0::Constant>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we support transpose2=true option for fq?

return;
}

ov::mark_as_bias(dequantization);
Contributor:

BiasAttribute is Add-specific (please take a look at the BiasAttribute::is_copyable implementation); I think using it for Multiply ops is not a good idea. Let's discuss alternative options offline.

Comment on lines +258 to +262
#ifdef OPENVINO_ARCH_ARM64
ADD_MATCHER(common, MatMulWithDequantizationTransformation, params)
#else
ADD_MATCHER(common, MatMulTransformation, params)
#endif
Contributor:

I believe such reconfiguration should live on the plugin's side, not in common LPT code. For the ARM platform, I'd suggest doing the following in the CPU transformation pipeline instead of the current implementation: keep the MatMul transformation as is (and don't introduce an inheritor), but add an ARM-specific transformation that matches quantized MatMul + Multiply and marks the Multiply with the needed attribute.

We already have additional_main_passes, which allows adding a custom matcher to the LPT pipeline.
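A minimal sketch of what such an ARM-specific marking pass could look like; the class name is hypothetical, and ov::mark_as_bias is borrowed from the diff above, so treat this as an illustration of the reviewer's suggestion rather than the PR's actual code:

#include "openvino/pass/graph_rewrite.hpp"
#include "openvino/pass/pattern/op/wrap_type.hpp"
#include "openvino/op/constant.hpp"
#include "openvino/op/matmul.hpp"
#include "openvino/op/multiply.hpp"

// Hypothetical ARM-only matcher: find a quantized MatMul followed by a
// dequantization Multiply and mark the Multiply for later fusing.
class MarkMatMulDequantization : public ov::pass::MatcherPass {
public:
    MarkMatMulDequantization() {
        using namespace ov::pass::pattern;
        auto matmul = wrap_type<ov::op::v0::MatMul>();
        auto scales = wrap_type<ov::op::v0::Constant>();
        auto multiply = wrap_type<ov::op::v1::Multiply>({matmul, scales});

        ov::matcher_pass_callback callback = [](Matcher& m) {
            // mark_as_bias comes from this PR; here it only annotates the
            // dequantization Multiply, the graph itself is not rewritten.
            ov::mark_as_bias(m.get_match_root());
            return false;
        };
        register_matcher(std::make_shared<Matcher>(multiply, "MarkMatMulDequantization"), callback);
    }
};

Such a pass could then be appended through additional_main_passes when LPT is configured for ARM.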

Comment on lines +232 to +234
if (ov::marked_as_bias(node)) {
SetNodeFusingType(node, NodeFusingType::FusedWithMisc);
}
Contributor:

It looks like a workaround. This node should be marked as FusedWithFC because isSuitableMatMulParent returns true. If this doesn't happen, we need to modify the existing function.

github-actions bot commented Oct 5, 2024:

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Oct 5, 2024
@allnes allnes added no_stale Do not mark as stale and removed Stale labels Oct 5, 2024