Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement CallProfiler #259

Merged
merged 32 commits into from
Dec 7, 2023
Merged

Implement CallProfiler #259

merged 32 commits into from
Dec 7, 2023

Conversation

tigercosmos
Copy link
Collaborator

@tigercosmos tigercosmos commented Dec 2, 2023

This work is based on the previous #245, and the discussion on #260

This work introduces a new profiler, called CallProfiler, which can print the profiling result of a hierarchical caller stack.

Using the macro for profiler insertion.

void Foo()
{
    modmesh::USE_CALLPROFILER_PROFILE_THIS_FUNCTION();
    // ...
}
Profiling Result
  foo1 - Total Time: 61 ms, Call Count: 1
    foo2 - Total Time: 54 ms, Call Count: 1
      foo3 - Total Time: 19 ms, Call Count: 1
  foo2 - Total Time: 54 ms, Call Count: 1
    foo3 - Total Time: 19 ms, Call Count: 1
  foo3 - Total Time: 38 ms, Call Count: 2

@tigercosmos
Copy link
Collaborator Author

@q40603 would you like to help review this proposal and give some feedback?

@yungyuc yungyuc added the performance Profiling, runtime, and memory consumption label Dec 2, 2023
@q40603
Copy link
Collaborator

q40603 commented Dec 2, 2023

What the proposal wants to solve

The new CallProfiler tends to complete the profiling utility by

  • constructing the radixTree with each node holding CallerProfile that has function name, execution name, hit count.
  • using an STL stack to handle the call stack execution time calculation
  • Using Function macro together with the profiling probe insertion

Comment:

The design of the radix tree is to replace the std::map in the existing profiling util. A user can leverage the TimeRegistry and the entry API could be replaced by the RadixTree entry API to construct the profiling result.

In this case, I think the STL stack could be avoided as the profiling probe is already on the stack frame.
From the performance perspective, I think memory alloc and dealloc could also be reduced during profiling.

@tigercosmos thanks for the proposal and using the radix tree.

@tigercosmos
Copy link
Collaborator Author

In this case, I think the STL stack could be avoided as the profiling probe is already on the stack frame.

I thought the current APIs not be able to get a hierarchical caller stack. Could you demonstrate how to get the same output as the example above with current APIs?

From the performance perspective, I think memory alloc and dealloc could also be reduced during profiling.

It's a good point. I think the overhead is ignorable at least when the function call takes a few milliseconds.

@q40603
Copy link
Collaborator

q40603 commented Dec 2, 2023

@tigercosmos I just created another PR for the demo.
You may refer to profileRadixTree.hpp for more details

@yungyuc
Copy link
Member

yungyuc commented Dec 3, 2023

From the performance perspective, I think memory alloc and dealloc could also be reduced during profiling.

It's a good point. I think the overhead is ignorable at least when the function call takes a few milliseconds.

No it's not. It's very significant when a function is called a million times in a tight loop. A scoped profiler should not be called in a tight loop, but when it is misused in that way, we still want to get reasonable results.

For low-level helper like the profiler, we should avoid STL containers.

@tigercosmos tigercosmos changed the title Draft: Implement CallProfiler Implement CallProfiler Dec 4, 2023
@tigercosmos
Copy link
Collaborator Author

@q40603 I refer to your approach and improve the code. I would like to have your review first.

Copy link
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Use end-marking comment.
  • Do not prefix underscore in symbol names.
  • Delete constructors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is not long and it seems to be OK to put the contents in RadixTree.hpp directly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

std::chrono::nanoseconds total_time; /// use nanoseconds to have higher precision
int call_count = 0;
bool is_running = false;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use end-marking comment /* end struct CallProfiler */.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

private:
CallProfiler() = default;

void _print_profiling_result(const RadixTreeNode<CallerProfile> & node, const int depth, std::ostream & outstream) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not prefix underscore to symbols unless you are a compiler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this function is long. Please put it outside class declaration.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

class CallProfiler
{
public:
/// A singleton.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly delete constructors. (You should delete all of them.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

private:
const char * m_caller_name;
bool m_cancel = false;
CallProfiler & m_profiler;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of this reference member m_profiler and the const member m_caller_name, you should delete implicitly created constructors.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

foo3();
foo3();

// Example:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the examples in the comments.

Copy link
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot. The code is getting cleaner. More issues:

  • CallerProfile constructor
  • CallProfilerTest moving to test file

struct CallerProfile
{
CallerProfile()
: total_time(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you give it a default value and avoid this custom default constructor? By the rule of 5, if you customize one of the 5, you should explicitly spell out all of the 5.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately, I found that std::chrono::nanoseconds needs to explicitly assign 0 to have the correct behavior.
sure, I will add the other 4.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still set them as default, not sure if it is what you want

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to use the default value of a member datum:

std::chrono::nanoseconds total_time = 0; /// use nanoseconds to have higher precision

You already had the following. I do not see why total_time does not use a default value.

int call_count = 0;
bool is_running = false;

If you use the default value of the member datum, you do not need to spell out any of the constructors.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::chrono::nanoseconds total_time = 0 is invalid, so I use std::chrono::nanoseconds total_time = std::chrono::nanoseconds(0);

bool is_running = false;
}; /* end struct CallerProfile */

class CallProfilerTest; // for gtest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you put it in gtest file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we cannot do that, otherwise it cannot build. If I put this line to gtests file, then we get:

  RadixTree.cpp
C:\Users\tiger\modmesh\cpp\modmesh/toggle/RadixTree.hpp(297,28): error C2433: 'modmesh::CallProfilerTest': 'friend' not permitted on data decla
rations [C:\Users\tiger\modmesh\build\cpp\modmesh\modmesh_primary.vcxproj]
C:\Users\tiger\modmesh\cpp\modmesh/toggle/RadixTree.hpp(297,28): error C4430: missing type specifier - int assumed. Note: C++ does not support
default-int [C:\Users\tiger\modmesh\build\cpp\modmesh\modmesh_primary.vcxproj]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, for accessing private members of CallProfiler? Ok, although I don't like it, it's not intolerable.

But I think this CallProfilerTest should be at least put in detail namespace to highlight that it is not meant as part of the API.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return instance;
}

CallProfiler(CallProfiler const &) = delete;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to read because the default constructor is too far away from here. It's more readable to move the private default constructor here, e.g.,

private:
    CallProfiler() = default;

public:
    CallProfiler(CallProfiler const &) = delete;
    //...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

void end_caller(const std::string & caller_name)
{
CallerProfile & callProfile = m_radix_tree.get_current_node()->data();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal preference is to avoid the unnecessary blank lines or even make them one-liner, e.g.,

callProfile.stop_stopwatch(); // Update profiling information
m_radix_tree.move_current_to_parent(); // Pop the caller from the call stack

I feel it is more readable than

// Update profiling information
callProfile.stop_stopwatch();

// Pop the caller from the call stack
m_radix_tree.move_current_to_parent();

But the latter is fine. I don't have a problem with it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, let me change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer local variables to use snake_case. But in the same way, I have not a problem with the style in the local scope.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ops, I missed this one.

Copy link
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Clarify whether to use default value of member datum or spell out constructors of CallerProfile
  • Move CallProfilerTest to detail namespace

And please do make sure all CI including linters are OK every time.

bool is_running = false;
}; /* end struct CallerProfile */

class CallProfilerTest; // for gtest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, for accessing private members of CallProfiler? Ok, although I don't like it, it's not intolerable.

But I think this CallProfilerTest should be at least put in detail namespace to highlight that it is not meant as part of the API.

struct CallerProfile
{
CallerProfile()
: total_time(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to use the default value of a member datum:

std::chrono::nanoseconds total_time = 0; /// use nanoseconds to have higher precision

You already had the following. I do not see why total_time does not use a default value.

int call_count = 0;
bool is_running = false;

If you use the default value of the member datum, you do not need to spell out any of the constructors.

@tigercosmos tigercosmos requested a review from yungyuc December 6, 2023 00:06
@tigercosmos
Copy link
Collaborator Author

there are many unnecessary commits, please squash the commits when the PR is merged.

@yungyuc
Copy link
Member

yungyuc commented Dec 6, 2023

@tigercosmos I saw you click the request for review. Next time when you want to request for review, please leave a global comment (not inline) with a summary of what is expected to be reviewed. The diff is not self-explaining.

@tigercosmos
Copy link
Collaborator Author

@yungyuc Sure, the PR is ready for review, please take a look.

The CallProfiler implementation is done and the test for it is also finished.

In addition, the aforementioned issues are all addressed.

Copy link
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • End marking

namespace detail
{
class CallProfilerTest; // for gtest
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You forgot the end marking for the namespace

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yungyuc sorry, fixed.

@tigercosmos tigercosmos requested a review from yungyuc December 7, 2023 05:01
@tigercosmos
Copy link
Collaborator Author

@yungyuc all issues are clear. Ready for review.

Copy link
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful

@yungyuc yungyuc merged commit a058969 into solvcon:master Dec 7, 2023
11 checks passed
@tigercosmos tigercosmos deleted the testProfiler branch December 7, 2023 05:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Profiling, runtime, and memory consumption
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants