[CPU] [ARM] [INT8] FullyConnected #25171
base: master
Conversation
@@ -479,8 +479,10 @@ std::vector<std::string> disabledTestPatterns() {
    retVector.emplace_back(R"(smoke_TestsDFT_(1|2|3|4)d/DFTLayerTest.Inference.*)");
    // Issue 88764, 91647, 108802: accuracy issue
    retVector.emplace_back(R"(MultipleLSTMCellTest/MultipleLSTMCellTest.CompareWithRefs.*)");
#if !defined(OPENVINO_ARCH_ARM64)
Reviewer: Does it mean we want to skip it for ARM32?
@@ -87,6 +88,11 @@ static const TypeMapping aclFCTypeMapping {
    {{_any, _any, _any, _any}, pt(just<f32>(), just<f32>(), just<f32>(), just<f32>())}
};

static const TypeMapping aclLowpFCTypeMapping {
    // {src, wei, bia, dst}        pt<src, wei, bias, dst>
    {{_i8, _i8, _any, _f32}, pt(just<i8>(), just<i8>(), just<f32>(), just<f32>())}
Reviewer: Suggested: pt(bypass(), bypass(), just<f32>(), bypass())
Author: Fixed.
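For reference, a sketch of what the fixed row would look like if the suggestion above is applied as-is (assuming bypass() passes the incoming precision through unchanged, so only the bias is forced to f32):

static const TypeMapping aclLowpFCTypeMapping {
    // {src, wei, bia, dst}        pt<src, wei, bias, dst>
    {{_i8, _i8, _any, _f32}, pt(bypass(), bypass(), just<f32>(), bypass())}
};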
namespace ov {
namespace intel_cpu {

VectorDims acl_fc_executor::makeDummyInputDims(const Shape& inShape, const Shape& wShape) {
Reviewer: Regarding the name of the file: from my understanding, the functions in this file are directly related to the FullyConnected operation, and it also includes functions such as makeDummyInputDims that are not strictly about weights. Can we rename it to something like acl_fullyconnected_utils.cpp (and the header as well)?
Author: Renamed it to acl_fullyconnected_utils.
if (dequantizationScales.empty()) {
    tensor_info->set_quantization_info(arm_compute::QuantizationInfo(1.f));
} else {
    tensor_info->set_quantization_info(arm_compute::QuantizationInfo(dequantizationScales[0]));
}
Reviewer: Same question here.
VERIFY(checkPostOps(config.postOps), UNSUPPORTED_TYPE_OF_POSTOPS);
VERIFY(one_of(srcRank(config), 2U, 3U, 4U), UNSUPPORTED_SRC_RANK);
VERIFY(one_of(weiRank(config), 2U, 3U, 4U), UNSUPPORTED_WEI_RANK);
VERIFY(static_cast<FCAttrs>(config.attrs).dequantizationScales.size() <= 1, UNSUPPORTED_PER_CHANNEL_QUANTIZATION);
Reviewer: Why do we expect that dequantizationScales can be empty? Is it possible that a quantized FullyConnected will not contain dequantization scales at all?
Reviewer: @v-Golubev Could you please review the LPT part?
INSTANTIATE_TEST_SUITE_P(smoke_LPT, FullyConnectedTransformation,
    ::testing::Combine(
        ::testing::ValuesIn(netPrecisions),
        ::testing::ValuesIn(shapes),
-       ::testing::Values(ov::test::utils::DEVICE_GPU),
+       ::testing::Values(ov::test::utils::DEVICE_CPU),
        ::testing::ValuesIn(trasformationParamValues)),
Reviewer: Suggested change:
-        ::testing::Values(ov::test::utils::DEVICE_CPU),
+        ::testing::Values(ov::test::utils::DEVICE_GPU),
Author: Fixed.
@@ -60,14 +61,14 @@ std::string LayerTransformation::get_test_case_name_by_params(

namespace {
template <typename IsNodeF>
-std::string find_node_by_runtime_precision(const ov::CompiledModel& execNet, IsNodeF is_node_f) {
+std::string find_node_by_runtime_precision(const ov::CompiledModel& execNet, IsNodeF is_node_f, const std::string& propertyName = "runtimePrecision") {
Reviewer: Could you please rename this function to match the changes?
Author: Renamed to find_node_by_runtime_property.
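Presumably the renamed helper keeps the same shape apart from the name, i.e. something along these lines (illustrative, inferred from the diff above and the reply):

template <typename IsNodeF>
std::string find_node_by_runtime_property(const ov::CompiledModel& execNet,
                                          IsNodeF is_node_f,
                                          const std::string& propertyName = "runtimePrecision");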
-    const bool transpose2);
+    const bool transpose2,
+    const bool signedWeights,
+    const bool perChannelWeights,
Reviewer: The perChannelWeights name is a bit confusing. Does it mean a dequantization on weights? Could you please rename it?
Author: Renamed to perChannelWeightsDequantization.
template <typename T>
std::vector<T> generate_values(const ov::Shape& shape, float delimiter = 1.f) {
    std::vector<T> values(ov::shape_size(shape));
    for (size_t i = 0; i < values.size(); ++i) {
        values[i] = static_cast<T>(static_cast<T>(i) / delimiter);
    }
    return values;
}
Reviewer: Why can't we use the existing ov::test::utils::make_constant helper? A sketch follows.
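A rough sketch of how the shared helper could replace the hand-rolled generator. This is hedged, not the PR's code: it assumes the common_test_utils constant builder and its InputGenerateData overload, whose field names may differ between OpenVINO versions, and `shape` stands in for the weights shape used in the test.

#include "common_test_utils/node_builders/constant.hpp"

// Build the constant via the shared helper instead of filling a
// std::vector manually; in_data controls the generated value range.
ov::test::utils::InputGenerateData in_data;
in_data.start_from = -128;  // lower bound of generated values (assumption)
in_data.range = 255;        // width of the generated range (assumption)
auto weights = ov::test::utils::make_constant(ov::element::f32, shape, in_data);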
std::vector<float> generate_dequantization_values(
    const ov::Shape& shape,
    const size_t levels,
    const bool low) {
    const auto shape_size = ov::shape_size(shape);
    std::vector<float> values(shape_size);
    for (size_t i = 0; i < shape_size; ++i) {
        values[i] = low ? -128.f / (static_cast<float>(i) + 1.f) : 127.f / (static_cast<float>(i) + 1.f);
    }
    return values;
}
Reviewer:
- The same question: why can't we reuse ov::test::utils::make_constant? We can pass the desired range of values in this helper.
- The levels param is not used.
    weightsConst, precision, 256ul, { 1ul, 1ul },
    { -128.f / 8.f }, { 127.f / 8.f }, { -128.f / 8.f }, { 127.f / 8.f });
fakeQuantizeOnWeights->set_friendly_name("fakeQuantizeOnWeights");
const size_t channel = inputShape2[inputShape2.size() - 2].get_length();
Reviewer: Shouldn't we transpose inputShape2 if transpose2 == true?
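A minimal sketch of the fix the reviewer appears to have in mind (hypothetical; the exact axis convention depends on how this test lays out the weights shape):

// When transpose2 == true the channel dimension sits on the other axis,
// so the index must account for the transposition.
const size_t channel_axis = inputShape2.size() - (transpose2 ? 1 : 2);
const size_t channel = inputShape2[channel_axis].get_length();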
// fq
std::shared_ptr<Node> parentOnWeights;
if (fq) {
    auto weightsConst = std::make_shared<ov::op::v0::Constant>(
Reviewer: Can we support the transpose2=true option for fq?
    return;
}

ov::mark_as_bias(dequantization);
Reviewer: BiasAttribute is Add-specific (please take a look at the BiasAttribute::is_copyable implementation), so I think using it for Multiply ops is not a good idea. Let's discuss alternative options offline.
#ifdef OPENVINO_ARCH_ARM64
    ADD_MATCHER(common, MatMulWithDequantizationTransformation, params)
#else
    ADD_MATCHER(common, MatMulTransformation, params)
#endif
Reviewer: I believe such reconfiguration should be on the plugin's side, not in common LPT code. For the ARM platform, I'd suggest the following in the CPU transformation pipeline instead of the current implementation: keep the MatMul transformation as is (without introducing an inheritor), and add an ARM-specific transformation that matches quantized MatMul + Multiply and marks the Multiply with the needed attribute. We already have additional_main_passes, which allows adding a custom matcher to the LPT pipeline; a rough sketch of such a matcher follows.
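A sketch of the ARM-specific matcher the reviewer describes, written against the standard ov::pass::MatcherPass pattern API. Illustrative only: the class name and the rt_info key are made up here, and the open question above about which attribute to use still applies.

#include "openvino/op/constant.hpp"
#include "openvino/op/matmul.hpp"
#include "openvino/op/multiply.hpp"
#include "openvino/pass/matcher_pass.hpp"
#include "openvino/pass/pattern/op/wrap_type.hpp"

class MarkDequantizationAfterMatMul : public ov::pass::MatcherPass {
public:
    MarkDequantizationAfterMatMul() {
        using namespace ov::pass::pattern;
        auto matmul_m = wrap_type<ov::op::v0::MatMul>();
        auto scales_m = wrap_type<ov::op::v0::Constant>();
        auto multiply_m = wrap_type<ov::op::v1::Multiply>({matmul_m, scales_m});

        ov::matcher_pass_callback callback = [=](Matcher& m) {
            auto multiply = m.get_pattern_value_map().at(multiply_m).get_node_shared_ptr();
            // TODO: verify the matched MatMul is actually quantized before marking.
            // Annotate the dequantization Multiply so the plugin can recognize it;
            // the key name is a placeholder, not an attribute from the PR.
            multiply->get_rt_info()["DequantizationAfterMatMul"] = true;
            return false;  // the graph is only annotated, never rewritten
        };
        register_matcher(std::make_shared<Matcher>(multiply_m, "MarkDequantizationAfterMatMul"), callback);
    }
};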
if (ov::marked_as_bias(node)) {
    SetNodeFusingType(node, NodeFusingType::FusedWithMisc);
}
Reviewer: This looks like a workaround. The node should be marked as FusedWithFC, because isSuitableMatMulParent returns true. If that doesn't happen, we need to fix the existing function.