-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Snippets] Add debug caps for dumping snippets parameters #28378
base: master
Are you sure you want to change the base?
Conversation
e397c92
to
046410a
Compare
src/common/snippets/include/snippets/lowered/pass/brgemm_blocking.hpp
Outdated
Show resolved
Hide resolved
src/common/snippets/include/snippets/lowered/pass/brgemm_blocking.hpp
Outdated
Show resolved
Hide resolved
cfaf273
to
053a7b4
Compare
053a7b4
to
4aa19d8
Compare
src/common/snippets/include/snippets/utils/debug_caps_config.hpp
Outdated
Show resolved
Hide resolved
src/common/snippets/include/snippets/lowered/pass/brgemm_blocking.hpp
Outdated
Show resolved
Hide resolved
5ecff19
to
f2ab0f0
Compare
4229be8
to
1fe9e26
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be also useful to dump Subgraph's name as well. Let's discuss possible options offline
src/common/snippets/include/snippets/lowered/pass/brgemm_debug_params.hpp
Outdated
Show resolved
Hide resolved
src/common/snippets/include/snippets/lowered/pass/brgemm_debug_params.hpp
Outdated
Show resolved
Hide resolved
src/common/snippets/include/snippets/lowered/pass/brgemm_debug_params.hpp
Outdated
Show resolved
Hide resolved
src/common/snippets/include/snippets/lowered/pass/brgemm_debug_params.hpp
Outdated
Show resolved
Hide resolved
src/common/snippets/include/snippets/lowered/pass/brgemm_debug_params.hpp
Outdated
Show resolved
Hide resolved
src/common/snippets/include/snippets/lowered/pass/brgemm_debug_params.hpp
Show resolved
Hide resolved
src/common/snippets/include/snippets/lowered/pass/brgemm_debug_params.hpp
Outdated
Show resolved
Hide resolved
Done |
eb64b0b
to
c85ad56
Compare
src/common/snippets/include/snippets/lowered/pass/brgemm_debug_params.hpp
Outdated
Show resolved
Hide resolved
src/common/snippets/include/snippets/lowered/pass/brgemm_debug_params.hpp
Outdated
Show resolved
Hide resolved
821ff78
to
e4751c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
src/common/snippets/include/snippets/lowered/pass/brgemm_debug_params.hpp
Outdated
Show resolved
Hide resolved
src/common/snippets/include/snippets/lowered/pass/brgemm_debug_params.hpp
Outdated
Show resolved
Hide resolved
a75589b
to
cb4be59
Compare
58a6ff1
to
7aa0810
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
OV_SNIPPETS_DUMP_BRGEMM_PARAMS="path=brgemm.csv" binary ... | ||
``` | ||
|
||
Output example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add a line break, so the output will be displayed as a table
Output example: | |
Output example: | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -0,0 +1,19 @@ | |||
# Snippet parameters dump | |||
|
|||
Snippet parameters can be captured during the pass flow, which can be useful for debugging and optimization purposes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The term parameters
here creates an unwanted association with the Parameter
node. It would be better to reformulate it: the pass dumps selected properties of some performance-critical operations in Subgraphs. Only MatMuls are currently supported by this pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed
for (size_t i = 0; i < tensor.size(); ++i) { | ||
const auto& v = tensor[i]; | ||
const auto v_str = | ||
utils::is_full_dim_value(v) ? "FULL_DIM" : utils::is_dynamic_value(v) ? "?" : std::to_string(v); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid nested ternary operators, they are very hard to read. Or at least use indentation to distinguish between outer and inner conditions.
} | ||
if (brgemm_csv_path.empty()) { | ||
auto brgemm_csv_path_it = rt_info.find("brgemm_params_csv_path"); | ||
brgemm_csv_path = brgemm_csv_path_it->second.as<std::string>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if brgemm_csv_path_it == rt_info.end()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
template <typename BRGEMM_TYPE, | ||
typename std::enable_if<std::is_base_of<ov::snippets::op::Brgemm, BRGEMM_TYPE>::value, bool>::type = true> | ||
class BrgemmDebugParams : public snippets::lowered::pass::RangedPass { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to implement this pass as a template? What is the reasoning behind this?
I'm sure that in the absolute majority of use cases we'll be interested in BrgemmCPU.
We can also map onto snippets::op::Brgemm, if we want to support all Brgemm types
// Attach brgemm parameters to PerfCountEnd node | ||
auto csv_path = linear_ir.get_config().debug_config.dumpParams.csv_path; | ||
perf_count_end->get_rt_info()["brgemm_params"] = params; | ||
perf_count_end->get_rt_info()["brgemm_params_csv_path"] = csv_path; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have 2 concerns with respect to csv_path
:
- All of the perf count nodes should have the same csv_path. This assumption is used implicitly in the node implementation (dump once when the last node is destroyed), but never enforced. For example, I can update this pass so different csv_path will be set in rt_info for every counter (or a group of counters), but this won't work as expected.
- It's excessive to store csv_path in every rt_info, when we actually need one (and only one) set. It looks like we indeed need a dedicated class with a proper setter for this static variable.
private: | ||
std::string collect_params(const ov::snippets::lowered::ExpressionPtr& brgemm_expr, | ||
const snippets::lowered::LinearIR& linear_ir) { | ||
auto debug_config = linear_ir.get_config().debug_config; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I miss smth, or we don't really need this debug_config here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed, removed, thanks!
const auto brgemm = ov::as_type_ptr<BRGEMM_TYPE>(brgemm_expr->get_node()); | ||
OPENVINO_ASSERT(brgemm, "Brgemm is nullptr!"); | ||
std::stringstream ss; | ||
ss << m_subgraph_name << ','; | ||
ss << brgemm_expr->get_node()->get_friendly_name() << ','; | ||
for (size_t i = 0; i < brgemm->get_input_size(); ++i) { | ||
ss << brgemm->get_input_element_type(i); | ||
if (i != brgemm->get_input_size() - 1) { | ||
ss << ';'; | ||
} | ||
} | ||
ss << ','; | ||
for (size_t i = 0; i < brgemm->get_output_size(); ++i) { | ||
ss << brgemm->get_output_element_type(i); | ||
if (i != brgemm->get_output_size() - 1) { | ||
ss << ';'; | ||
} | ||
} | ||
ss << ','; | ||
for (size_t i = 0; i < brgemm->inputs().size(); ++i) { | ||
const auto& port_desc = brgemm_expr->get_input_port_descriptor(i); | ||
const auto& shape = ov::snippets::utils::get_planar_vdims(port_desc->get_shape(), port_desc->get_layout()); | ||
ss << utils::tensor2str(shape, " "); | ||
ss << ';'; | ||
} | ||
ss.seekp(-1, ss.cur); | ||
ss << ','; | ||
for (size_t i = 0; i < brgemm->outputs().size(); ++i) { | ||
const auto& port_desc = brgemm_expr->get_output_port_descriptor(i); | ||
const auto& shape = | ||
ov::snippets::utils::get_preordered_vdims(port_desc->get_shape(), port_desc->get_layout()); | ||
ss << utils::tensor2str(shape, " "); | ||
ss << ';'; | ||
} | ||
ss.seekp(-1, ss.cur); | ||
ss << ','; | ||
for (size_t i = 0; i < brgemm->inputs().size(); ++i) { | ||
const auto& port_desc = brgemm_expr->get_input_port_descriptor(i); | ||
ss << utils::tensor2str(port_desc->get_layout(), " "); | ||
ss << ';'; | ||
} | ||
ss << ','; | ||
for (size_t i = 0; i < brgemm->outputs().size(); ++i) { | ||
const auto& port_desc = brgemm_expr->get_output_port_descriptor(i); | ||
ss << utils::tensor2str(port_desc->get_layout(), " "); | ||
ss << ';'; | ||
} | ||
ss << ','; | ||
|
||
const auto& in_0_desc = brgemm_expr->get_input_port_descriptor(0); | ||
const auto& in_1_desc = brgemm_expr->get_input_port_descriptor(1); | ||
const auto& out_desc = brgemm_expr->get_output_port_descriptor(0); | ||
|
||
const auto& in_0_planar_dims = | ||
ov::snippets::utils::get_planar_vdims(in_0_desc->get_shape(), in_0_desc->get_layout()); | ||
const auto& in_1_planar_dims = | ||
ov::snippets::utils::get_planar_vdims(in_1_desc->get_shape(), in_1_desc->get_layout()); | ||
const auto& out_preordered_dims = | ||
ov::snippets::utils::get_preordered_vdims(out_desc->get_shape(), out_desc->get_layout()); | ||
|
||
const auto& m = *++out_preordered_dims.rbegin(); | ||
const auto& n = *out_preordered_dims.rbegin(); | ||
const auto& k0 = *in_0_planar_dims.rbegin(); | ||
const auto& k1 = *++in_1_planar_dims.rbegin(); | ||
size_t k = 0; | ||
OPENVINO_ASSERT(utils::merge_dynamic_dim(k, k0, k1), | ||
"Brgemm input descriptors have incompatible K dimension value."); | ||
ss << static_cast<int64_t>(m) << ',' << static_cast<int64_t>(n) << ',' << static_cast<int64_t>(k) << ','; | ||
|
||
size_t m_block = in_0_desc->get_subtensor().front(); | ||
size_t n_block = in_1_desc->get_subtensor().back(); | ||
size_t k_block = out_desc->get_subtensor().back(); | ||
|
||
auto append_block_info = [&](size_t block) { | ||
if (block == utils::get_full_dim_value()) { | ||
ss << "FULL_DIM"; | ||
} else if (block == utils::get_dynamic_value<size_t>()) { | ||
ss << "?"; | ||
} else { | ||
ss << block; | ||
} | ||
ss << ','; | ||
}; | ||
|
||
append_block_info(m_block); | ||
append_block_info(n_block); | ||
append_block_info(k_block); | ||
return ss.str(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything specific to a particular BRGEMM_TYPE
here, so it's another point to consider mapping onto snippets::op::Brgemm
@@ -109,6 +132,35 @@ void PerfCountEnd::output_perf_count() { | |||
std::cout << "max accumulated time:" << acc_max << "ns" << std::endl; | |||
// max avg | |||
std::cout << "max avg time:" << avg_max << "ns" << std::endl; | |||
|
|||
// Dump brgemm debug parameters to csv file | |||
if (acc_max != 0 && avg_max != 0 && get_friendly_name().find("_DebugParams") != std::string::npos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for these conditions on acc_max
and avg_max
? Why can't they be 0?
Also "_DebugParams" should not be used like this, as discussed in the transformation file.
--nodes_count; | ||
if (nodes_count == 0) { | ||
dump_brgemm_params_to_csv(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's quite elegant solution to minimize dumping overheads 👍
I think that we should align dumping behavior for default and csv modes.
One way to do it is implement csv-related functionality in a derived class and and use counters like here.
But if you think about it, what you do here is trying to mimic the shared poiner functionality in a way. So Why not to make another step and move the dumping functionality to a separate class altogether. It would be more scalable and convenient, since different types of output can be supported, or different groups of nodes can output to different files (might be convenient to dump MatMul params & say generic counters to separate files). How it might look:
- A developer creates a dumper class first, and passes relevant arguments to its constructor (csv path on or case, or nothing for std::cout dumper).
- The dumper instance is then passed to every PerfCountEnd node in constructor (through shared_ptr)
- PerfCountEnd calls dumper::update(this) in destructor to add a row to the table (or m_debug_params_map)
- When the last PerfCountEnd is deleted, the dumper destructor will be called, where we can conveniently write all accumulated table rows to whatever output we support.
What do you think?
7aa0810
to
06eee03
Compare
Details:
OV_SNIPPETS_DUMP_PARAMS
environment variableTickets: