
[Snippets] Add debug caps for dumping snippets parameters #28378

Open

wants to merge 3 commits into master from brgemm-debug-caps

Conversation

@aobolensk (Contributor)

Details:

  • Add infrastructure to dump snippets parameters using the OV_SNIPPETS_DUMP_PARAMS environment variable
  • Enable caps for brgemm snippet

Tickets:

  • 137302

@aobolensk requested a review from v-Golubev January 10, 2025 15:06
@aobolensk requested a review from a team as a code owner January 10, 2025 15:06
@github-actions bot added the category: CPU (OpenVINO CPU plugin) label Jan 10, 2025
@aobolensk requested a review from a team as a code owner January 10, 2025 15:50
@aobolensk requested review from tsavina and removed request for a team January 10, 2025 15:50
@github-actions bot added the category: docs (OpenVINO documentation) label Jan 10, 2025
@aobolensk requested a review from a team as a code owner January 14, 2025 13:51
@aobolensk force-pushed the brgemm-debug-caps branch 2 times, most recently from 4229be8 to 1fe9e26 January 14, 2025 18:13
@aobolensk requested a review from v-Golubev January 15, 2025 08:25
@v-Golubev (Contributor) left a comment

It could also be useful to dump the Subgraph's name. Let's discuss possible options offline.

@aobolensk (Contributor Author)

It could also be useful to dump the Subgraph's name. Let's discuss possible options offline.

Done

@v-Golubev (Contributor) left a comment

LGTM 👍

@aobolensk force-pushed the brgemm-debug-caps branch 3 times, most recently from a75589b to cb4be59 January 17, 2025 11:04
@aobolensk force-pushed the brgemm-debug-caps branch 4 times, most recently from 58a6ff1 to 7aa0810 January 17, 2025 12:52
@a-sidorova (Contributor) left a comment

👍

@a-sidorova removed their assignment Jan 22, 2025
OV_SNIPPETS_DUMP_BRGEMM_PARAMS="path=brgemm.csv" binary ...
```

Output example:
Contributor

Please add a line break so the output will be displayed as a table.

Suggested change: add a blank line after "Output example:".

Contributor Author

Done

@@ -0,0 +1,19 @@
# Snippet parameters dump

Snippet parameters can be captured during the pass flow, which can be useful for debugging and optimization purposes.
Contributor

The term parameters here creates an unwanted association with the Parameter node. It would be better to reformulate it: the pass dumps selected properties of some performance-critical operations in Subgraphs. Only MatMuls are currently supported by this pass

Contributor Author

Renamed

for (size_t i = 0; i < tensor.size(); ++i) {
const auto& v = tensor[i];
const auto v_str =
utils::is_full_dim_value(v) ? "FULL_DIM" : utils::is_dynamic_value(v) ? "?" : std::to_string(v);
Contributor

Please avoid nested ternary operators; they are very hard to read. Or at least use indentation to distinguish the outer and inner conditions.
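
For illustration, a minimal sketch of how the nested ternary could be unfolded (the utils:: predicates are the ones already used in the diff; the helper lambda name and the size_t element type are assumptions):

```cpp
// Hypothetical refactor: one early-return branch per case, so the
// condition precedence is explicit instead of relying on nested ?:.
auto stringify_dim = [](size_t v) -> std::string {
    if (utils::is_full_dim_value(v))
        return "FULL_DIM";
    if (utils::is_dynamic_value(v))
        return "?";
    return std::to_string(v);
};

for (size_t i = 0; i < tensor.size(); ++i) {
    const auto v_str = stringify_dim(tensor[i]);
    // ... v_str is used the same way as before
}
```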

}
if (brgemm_csv_path.empty()) {
auto brgemm_csv_path_it = rt_info.find("brgemm_params_csv_path");
brgemm_csv_path = brgemm_csv_path_it->second.as<std::string>();
Contributor

What if brgemm_csv_path_it == rt_info.end()?
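
A minimal sketch of the missing guard, assuming rt_info behaves like a standard map (the names mirror the diff above):

```cpp
if (brgemm_csv_path.empty()) {
    const auto brgemm_csv_path_it = rt_info.find("brgemm_params_csv_path");
    // Only dereference the iterator when the key is actually present;
    // otherwise keep brgemm_csv_path empty (or report an error).
    if (brgemm_csv_path_it != rt_info.end()) {
        brgemm_csv_path = brgemm_csv_path_it->second.as<std::string>();
    }
}
```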

Contributor Author

Addressed

Comment on lines +26 to +28
template <typename BRGEMM_TYPE,
typename std::enable_if<std::is_base_of<ov::snippets::op::Brgemm, BRGEMM_TYPE>::value, bool>::type = true>
class BrgemmDebugParams : public snippets::lowered::pass::RangedPass {
Contributor

Do we really need to implement this pass as a template? What is the reasoning behind this?
I'm sure that in the absolute majority of use cases we'll be interested in BrgemmCPU.
We can also map onto snippets::op::Brgemm, if we want to support all Brgemm types
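
For illustration, a sketch of what the non-templated variant mapped onto snippets::op::Brgemm might look like (condensed from the diff above; the declaration is an assumption, not the final API):

```cpp
// Hypothetical non-templated pass: it targets the base snippets::op::Brgemm,
// so a single class covers every Brgemm flavour, including BrgemmCPU.
class BrgemmDebugParams : public snippets::lowered::pass::RangedPass {
private:
    std::string collect_params(const ov::snippets::lowered::ExpressionPtr& brgemm_expr,
                               const snippets::lowered::LinearIR& linear_ir) {
        // Cast to the base class instead of the template parameter.
        const auto brgemm = ov::as_type_ptr<ov::snippets::op::Brgemm>(brgemm_expr->get_node());
        OPENVINO_ASSERT(brgemm, "Brgemm is nullptr!");
        // ... same body as in the templated version ...
        return {};
    }
};
```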

// Attach brgemm parameters to PerfCountEnd node
auto csv_path = linear_ir.get_config().debug_config.dumpParams.csv_path;
perf_count_end->get_rt_info()["brgemm_params"] = params;
perf_count_end->get_rt_info()["brgemm_params_csv_path"] = csv_path;
Contributor

I have 2 concerns with respect to csv_path:

  1. All of the perf count nodes should have the same csv_path. This assumption is used implicitly in the node implementation (dump once when the last node is destroyed), but it is never enforced. For example, I could update this pass so that a different csv_path is set in rt_info for every counter (or group of counters), but that would not work as expected.
  2. It's excessive to store csv_path in every rt_info when we actually need one (and only one) value. It looks like we indeed need a dedicated class with a proper setter for this static variable.
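
A minimal sketch of the dedicated holder hinted at in the second point (class and method names are hypothetical):

```cpp
#include <string>
#include <utility>

// Hypothetical single owner of the csv path: set once by the pass, read once
// when the accumulated rows are flushed, instead of duplicating the string in
// every node's rt_info.
class BrgemmParamsDumpConfig {
public:
    static void set_csv_path(std::string path) {
        storage() = std::move(path);
    }
    static const std::string& get_csv_path() {
        return storage();
    }

private:
    static std::string& storage() {
        static std::string csv_path;  // one shared value for all perf counters
        return csv_path;
    }
};
```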

private:
std::string collect_params(const ov::snippets::lowered::ExpressionPtr& brgemm_expr,
const snippets::lowered::LinearIR& linear_ir) {
auto debug_config = linear_ir.get_config().debug_config;
Contributor

Am I missing something, or do we not really need this debug_config here?

Contributor Author

Not needed, removed, thanks!

Comment on lines +73 to +160
const auto brgemm = ov::as_type_ptr<BRGEMM_TYPE>(brgemm_expr->get_node());
OPENVINO_ASSERT(brgemm, "Brgemm is nullptr!");
std::stringstream ss;
ss << m_subgraph_name << ',';
ss << brgemm_expr->get_node()->get_friendly_name() << ',';
for (size_t i = 0; i < brgemm->get_input_size(); ++i) {
ss << brgemm->get_input_element_type(i);
if (i != brgemm->get_input_size() - 1) {
ss << ';';
}
}
ss << ',';
for (size_t i = 0; i < brgemm->get_output_size(); ++i) {
ss << brgemm->get_output_element_type(i);
if (i != brgemm->get_output_size() - 1) {
ss << ';';
}
}
ss << ',';
for (size_t i = 0; i < brgemm->inputs().size(); ++i) {
const auto& port_desc = brgemm_expr->get_input_port_descriptor(i);
const auto& shape = ov::snippets::utils::get_planar_vdims(port_desc->get_shape(), port_desc->get_layout());
ss << utils::tensor2str(shape, " ");
ss << ';';
}
ss.seekp(-1, ss.cur);
ss << ',';
for (size_t i = 0; i < brgemm->outputs().size(); ++i) {
const auto& port_desc = brgemm_expr->get_output_port_descriptor(i);
const auto& shape =
ov::snippets::utils::get_preordered_vdims(port_desc->get_shape(), port_desc->get_layout());
ss << utils::tensor2str(shape, " ");
ss << ';';
}
ss.seekp(-1, ss.cur);
ss << ',';
for (size_t i = 0; i < brgemm->inputs().size(); ++i) {
const auto& port_desc = brgemm_expr->get_input_port_descriptor(i);
ss << utils::tensor2str(port_desc->get_layout(), " ");
ss << ';';
}
ss << ',';
for (size_t i = 0; i < brgemm->outputs().size(); ++i) {
const auto& port_desc = brgemm_expr->get_output_port_descriptor(i);
ss << utils::tensor2str(port_desc->get_layout(), " ");
ss << ';';
}
ss << ',';

const auto& in_0_desc = brgemm_expr->get_input_port_descriptor(0);
const auto& in_1_desc = brgemm_expr->get_input_port_descriptor(1);
const auto& out_desc = brgemm_expr->get_output_port_descriptor(0);

const auto& in_0_planar_dims =
ov::snippets::utils::get_planar_vdims(in_0_desc->get_shape(), in_0_desc->get_layout());
const auto& in_1_planar_dims =
ov::snippets::utils::get_planar_vdims(in_1_desc->get_shape(), in_1_desc->get_layout());
const auto& out_preordered_dims =
ov::snippets::utils::get_preordered_vdims(out_desc->get_shape(), out_desc->get_layout());

const auto& m = *++out_preordered_dims.rbegin();
const auto& n = *out_preordered_dims.rbegin();
const auto& k0 = *in_0_planar_dims.rbegin();
const auto& k1 = *++in_1_planar_dims.rbegin();
size_t k = 0;
OPENVINO_ASSERT(utils::merge_dynamic_dim(k, k0, k1),
"Brgemm input descriptors have incompatible K dimension value.");
ss << static_cast<int64_t>(m) << ',' << static_cast<int64_t>(n) << ',' << static_cast<int64_t>(k) << ',';

size_t m_block = in_0_desc->get_subtensor().front();
size_t n_block = in_1_desc->get_subtensor().back();
size_t k_block = out_desc->get_subtensor().back();

auto append_block_info = [&](size_t block) {
if (block == utils::get_full_dim_value()) {
ss << "FULL_DIM";
} else if (block == utils::get_dynamic_value<size_t>()) {
ss << "?";
} else {
ss << block;
}
ss << ',';
};

append_block_info(m_block);
append_block_info(n_block);
append_block_info(k_block);
return ss.str();
}
Contributor

I don't see anything specific to a particular BRGEMM_TYPE here, so it's another point to consider mapping onto snippets::op::Brgemm

@@ -109,6 +132,35 @@ void PerfCountEnd::output_perf_count() {
std::cout << "max accumulated time:" << acc_max << "ns" << std::endl;
// max avg
std::cout << "max avg time:" << avg_max << "ns" << std::endl;

// Dump brgemm debug parameters to csv file
if (acc_max != 0 && avg_max != 0 && get_friendly_name().find("_DebugParams") != std::string::npos) {
Contributor

What is the reason for these conditions on acc_max and avg_max? Why can't they be 0?
Also "_DebugParams" should not be used like this, as discussed in the transformation file.

Comment on lines +87 to +90
--nodes_count;
if (nodes_count == 0) {
dump_brgemm_params_to_csv();
}
Contributor

It's quite an elegant solution to minimize dumping overheads 👍
I think we should align the dumping behavior for the default and csv modes.
One way to do it is to implement the csv-related functionality in a derived class and use counters like here.

But if you think about it, what you do here essentially mimics shared pointer functionality. So why not take another step and move the dumping functionality to a separate class altogether? It would be more scalable and convenient, since different types of output could be supported, or different groups of nodes could output to different files (it might be convenient to dump MatMul params and, say, generic counters to separate files). Here is how it might look:

  1. A developer first creates a dumper class and passes the relevant arguments to its constructor (the csv path in our case, or nothing for a std::cout dumper).
  2. The dumper instance is then passed to every PerfCountEnd node in its constructor (through a shared_ptr).
  3. PerfCountEnd calls dumper::update(this) in its destructor to add a row to the table (or m_debug_params_map).
  4. When the last PerfCountEnd is deleted, the dumper destructor is called, where we can conveniently write all accumulated table rows to whatever output we support.

What do you think?
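
A rough sketch of this dumper idea, assuming PerfCountEnd can hold the dumper via shared_ptr and call update() from its destructor; all names (PerfCountCsvDumper, update, the csv path handling) are illustrative, not the final API:

```cpp
#include <fstream>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical dumper shared (via shared_ptr) by every PerfCountEnd node.
// Rows are accumulated in update(); the table is written exactly once, when
// the last owner releases its reference and the destructor runs.
class PerfCountCsvDumper {
public:
    explicit PerfCountCsvDumper(std::string csv_path) : m_csv_path(std::move(csv_path)) {}

    ~PerfCountCsvDumper() {
        // Flush all accumulated rows in one go.
        std::ofstream out(m_csv_path);
        for (const auto& row : m_rows)
            out << row << '\n';
    }

    // Called by each PerfCountEnd (e.g. from its destructor) to add a table row.
    void update(const std::string& row) {
        m_rows.push_back(row);
    }

private:
    std::string m_csv_path;
    std::vector<std::string> m_rows;
};

// Usage sketch: one instance is created up front and handed to every node, e.g.
//   auto dumper = std::make_shared<PerfCountCsvDumper>("brgemm.csv");
//   ... pass `dumper` to each PerfCountEnd constructor ...
```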

Labels
category: CPU (OpenVINO CPU plugin), category: docs (OpenVINO documentation)
4 participants