This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

NM Profiler : Update visualize_trace.py #370

Closed
wants to merge 8 commits into from

@varun-sundar-rabindranath varun-sundar-rabindranath commented Jul 9, 2024

Update visualize trace utility.

  • Add ability to plot decode steps
  • Expose the metric to plot as a command line argument
  • Generate two plots, one for prefill and one for the decode steps, and store them in the user-specified output directory
  • Remove the ignore_sampler arg and add a fold_json_node arg instead. This argument collapses the specified node of the JSON tree so the plot has less clutter.
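
The folding described in the last bullet can be pictured with a minimal sketch. It assumes, hypothetically, that each node of the JSON trace is a dict with a `name` and a `children` list. The real trace schema is not shown in this description, and the function name `fold_nodes` is made up for illustration.

```python
def fold_nodes(root: dict, names_to_fold: set) -> dict:
    """Drop the children of every node whose name is in names_to_fold.

    Hypothetical schema: {"name": str, "children": [node, ...]}.
    The tree is modified in place and returned for convenience.
    """
    stack = [root]
    while stack:
        node = stack.pop()
        if node["name"] in names_to_fold:
            # Collapse the subtree: the node stays, its descendants go.
            node["children"] = []
            continue
        stack.extend(node["children"])
    return root


# Illustrative tree; the module and kernel names are invented.
trace = {
    "name": "Model",
    "children": [
        {"name": "Sampler", "children": [
            {"name": "softmax_kernel", "children": []},
            {"name": "topk_kernel", "children": []},
        ]},
        {"name": "Attention", "children": [
            {"name": "flash_attn_kernel", "children": []},
        ]},
    ],
}
folded = fold_nodes(trace, {"Sampler"})
```

After folding, the `Sampler` node would be plotted as a single entry rather than one entry per kernel beneath it, which is the clutter reduction the bullet refers to.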

Usage:

  • python3 neuralmagic/tools/profiler/visualize_trace.py --json-trace profiler_fp8_trace.json --output-directory ./kernel --level kernel
    This command produces two output files, kernel/prefill.png and kernel/decode_steps.png, which are stacked-bar plots. In these plots the operations are grouped by high-level concepts such as gemms, attention, rms-norm, etc.
    [Plot: prefill]
    [Plot: decode_steps]

  • python3 neuralmagic/tools/profiler/visualize_trace.py --json-trace profiler_fp8_trace.json --output-directory ./module --level module --plot-metric pct_cuda_time
    This command also produces two output files, module/prefill.png and module/decode_steps.png, which are stacked-bar plots. In these plots the bars sum to 100 because the requested plot metric is pct_cuda_time.
    [Plot: prefill]
    [Plot: decode_steps]
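
To make the grouping and the pct_cuda_time metric from the two usage examples concrete, here is a small sketch in plain Python. The substring rules, kernel names, and timings are invented for illustration; the script's actual grouping logic and data layout are not shown in this description and may differ.

```python
from collections import defaultdict

# Hypothetical substring -> group rules; the first match wins.
GROUP_RULES = [
    ("gemm", "gemm"),
    ("attn", "attention"),
    ("rms_norm", "rms-norm"),
]


def group_of(kernel_name: str) -> str:
    """Map a kernel name to a high-level group, or 'other' if nothing matches."""
    lowered = kernel_name.lower()
    for needle, group in GROUP_RULES:
        if needle in lowered:
            return group
    return "other"


def group_cuda_time(kernel_times_us: dict) -> dict:
    """Sum CUDA time (microseconds) per high-level group."""
    totals = defaultdict(float)
    for name, time_us in kernel_times_us.items():
        totals[group_of(name)] += time_us
    return dict(totals)


def to_pct(grouped: dict) -> dict:
    """Convert absolute times to percentages of the total."""
    total = sum(grouped.values())
    if total == 0:
        return {group: 0.0 for group in grouped}
    return {group: 100.0 * t / total for group, t in grouped.items()}


# Made-up timings for one step.
kernel_times_us = {
    "cutlass_gemm_fp8": 600.0,
    "sm90_gemm_bf16": 200.0,
    "flash_attn_fwd": 150.0,
    "rms_norm_kernel": 50.0,
}
grouped = group_cuda_time(kernel_times_us)
pct = to_pct(grouped)
```

Each stacked bar would then be built from one such dict: the absolute values in `grouped` for a time-based metric, or the values in `pct`, which add up to 100 for every bar, for a percentage metric.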

@varun-sundar-rabindranath varun-sundar-rabindranath marked this pull request as draft July 9, 2024 13:38
@varun-sundar-rabindranath varun-sundar-rabindranath marked this pull request as ready for review July 9, 2024 18:05
Varun Sundar Rabindranath added 2 commits July 9, 2024 18:51
Comment on lines +73 to +77
def shorten_plot_legend_strings(legend, max_char_len: int):
    for t in legend.get_texts():
        t.set_text(
            trim_string_back(abbreviate_known_names(t.get_text()),
                             max_char_len))
Member

thank you for this :)

Lucas had this already :)

Member

I see now that it was there before this PR. The "thank you" still stands given how wide some of the previous plot legends were :)

'''


def group_trace_by_operations(trace_df: pd.DataFrame) -> pd.DataFrame:
Collaborator

nice this is super useful, thanks!

Collaborator

@LucasWilkinson LucasWilkinson left a comment

LGTM, thanks for improving this!

@LucasWilkinson
Collaborator

The Y-label in the --plot-metric pct_cuda_time graph above appears to be wrong, but in the code it seems to be set correctly. I assume this has been fixed?

@varun-sundar-rabindranath
Author

The Y-label in the --plot-metric pct_cuda_time graph above appears to be wrong, but in the code it seems to be set correctly. I assume this has been fixed?

Hey Lucas. Yes, I noticed that and fixed it. Sorry, should have mentioned it somewhere.

@varun-sundar-rabindranath
Author

Migrated all changes including all of the layer-by-layer profiling code to neuralmagic#3

3 participants