
Add performance tracking machinery #1509

Merged
merged 3 commits into main from serramatutu/perf-tracking on Jan 24, 2025
Conversation

@serramatutu (Contributor) commented Nov 5, 2024

NOTE: I had to significantly refactor the original version of this PR to make it use cProfile, so I changed the description and comments to match.

Summary

This PR introduces some new testing functionality that we can use to avoid performance regressions. From a user perspective, the performance benchmarking workflow is:

  1. Create a new test in tests_metricflow/performance/ that does something we want to track performance for. You can use the measure_compilation_performance fixture to measure the compilation of some SQL based on a query spec and a manifest (see the sketch after this list).
  2. Run make perf to generate a performance-report.json.
  3. Run make perf-compare A=base-report.json B=other-report.json to generate a Markdown file with a report on changes.
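
For concreteness, here is a minimal sketch of such a test, assuming the fixture yields a callable that takes a query spec and a manifest; simple_query_spec and simple_manifest are hypothetical helper fixtures, not names from this PR:

```python
# tests_metricflow/performance/test_simple_manifest.py (illustrative sketch)
def test_simple_query(
    measure_compilation_performance,  # fixture from this PR; exact signature assumed
    simple_query_spec,                # hypothetical fixture providing a query spec
    simple_manifest,                  # hypothetical fixture providing a manifest
) -> None:
    # The fixture yields a callable that compiles the query under cProfile
    # and records the measurements under this test's session ID.
    measure_compilation_performance(
        query_spec=simple_query_spec,
        manifest=simple_manifest,
    )
```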

The performance report

The performance-report.json file contains the "raw" data for a given run. It looks like this:

{
    "sessions": {
        "test_simple_manifest.py::test_simple_query": {
            "session_id": "test_simple_manifest.py::test_simple_query",
            "total_time": 0.04,
            "functions": {
                "source_scan_optimizer.py:115 :: __init__": {
                    "function_name": "source_scan_optimizer.py:115 :: __init__",
                    "base_calls": 1,
                    "total_calls":1,
                    "body_time": 0.01,
                    "per_call_body_time": 0.01,
                    "total_time": 0.03,
                    "per_call_total_time": 0.03
                },
                ...
            }
        },
        ...
    }
}

Each "report" contains a list of "sessions". A session is a run where multiple measurements have been taken, i.e a pytest test. Each session contains a variety of "functions", which contain runtime statistics about each one of them.

Report comparison

I created a script called compare_reports.py which takes in two different report files and calculates the difference between them (in absolute values and in percent). It then generates a Markdown file called performance-comparison.md with a nice little table like the one in my first comment.
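
At its core, the comparison boils down to absolute and percent differences per measurement. A minimal sketch of that calculation (class and field names here are hypothetical, not the PR's actual classes):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricComparison:
    """Compares one measurement between a base report and another report."""
    base: float
    other: float

    @property
    def absolute_difference(self) -> float:
        return self.other - self.base

    @property
    def percent_difference(self) -> float:
        # Guard against division by zero for near-instant base measurements.
        if self.base == 0:
            return float("inf") if self.other != 0 else 0.0
        return (self.other - self.base) / self.base * 100
```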

I added a time.sleep(1) to build_plan() so that I could get a non-negligible result on the first comment.

How it works

When you run make perf, the measure_compilation_performance fixture instantiates a PerformanceTracker and yields a _measure() function that will compile a query while running cProfile.
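
For illustration, a cProfile-backed measurement session could be structured roughly like this (a sketch of the idea only; the shape of PerformanceTracker's actual API is an assumption):

```python
import cProfile
import pstats
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def profiled_session(
    session_id: str, stats_by_session: Dict[str, pstats.Stats]
) -> Iterator[None]:
    """Profile everything executed inside the block with cProfile."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        yield
    finally:
        profiler.disable()
        # pstats exposes per-function call counts plus cumulative and body
        # times, which map onto the fields shown in the JSON report above.
        stats_by_session[session_id] = pstats.Stats(profiler)
```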

All the tests then run, and at the end the fixture compiles the reports by calling perf_tracker.report_set, which returns all the gathered SessionReports containing the data for every profiled function.

After this, it writes the report to the JSON file.

Then, when you run make perf-compare, the script loads both JSON files and uses the compare() methods on each of the report classes. These generate Comparison objects with the percent and absolute differences, which the script then renders to Markdown and writes to a file. The script filters the comparisons to show only the top 10 most relevant ones whose total runtime difference is non-negligible (i.e. the percent difference is higher than 10e-7). If any total runtime has increased by over 10%, it adds warnings ( ⚠️ ) to the file.
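
The filtering and warning rules could look roughly like this (function and attribute names are assumptions; only the thresholds come from the description above):

```python
# Illustrative sketch only; the real comparison classes in the PR may differ.
NEGLIGIBLE_PERCENT_DIFFERENCE = 10e-7
WARNING_PERCENT_INCREASE = 10.0


def top_comparisons(comparisons, limit=10):
    """Keep the `limit` largest non-negligible absolute total-time differences."""
    relevant = [
        c for c in comparisons
        if abs(c.percent_difference) > NEGLIGIBLE_PERCENT_DIFFERENCE
    ]
    relevant.sort(key=lambda c: abs(c.absolute_difference), reverse=True)
    return relevant[:limit]


def needs_warning(session_comparison) -> bool:
    """A session is flagged with ⚠️ if its total runtime grew by over 10%."""
    return session_comparison.percent_difference > WARNING_PERCENT_INCREASE
```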

Next steps

Make a Github Actions bot that runs this on PRs to compare performance against main, and comments PRs with the generated markdown.


github-actions bot commented Nov 5, 2024

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@serramatutu (Contributor, Author) commented Nov 5, 2024

Performance comparison

Comparing performance-report-sleep.json against performance-report.json

Reporting top 10 highest non-negligible absolute total time differences for each session.

Legend

  • Function: the function name
  • Base calls: number of calls, excluding recursion
  • Total calls: number of calls, including recursion
  • Body time: time spent on the function body, excluding sub-function calls
  • Total time: time spent on the function body and in other sub-functions it calls

n.d. means "negligible difference", i.e. the difference between measured outputs is negligible and could be down to imprecision, so the reported value is only the absolute measured value.

Warnings ( ⚠️ ) will be emitted whenever there is more than 10% increase in total runtime of a test.


test_simple_manifest.py::test_simple_query ⚠️ ⚠️

Total time: 1s - 0s = 1s (+96.774%)

| i | Function | Base calls | Total calls | Body time (cumulative) | Body time (per call) | Total time (cumulative) | Total time (per call) |
|---|----------|------------|-------------|------------------------|----------------------|-------------------------|-----------------------|
| #1 | dataflow_plan_builder.py:139 :: build_plan | 1 (n.d.) | 1 (n.d.) | 0s (n.d.) | 0s (n.d.) | 1s - 0s = 1s (+97.987%) | 0s (n.d.) |

test_simple_manifest.py::test_simple_query_2 ⚠️ ⚠️

Total time: 1s - 0s = 1s (+96.967%)

| i | Function | Base calls | Total calls | Body time (cumulative) | Body time (per call) | Total time (cumulative) | Total time (per call) |
|---|----------|------------|-------------|------------------------|----------------------|-------------------------|-----------------------|
| #1 | dataflow_plan_builder.py:139 :: build_plan | 1 (n.d.) | 1 (n.d.) | 0s (n.d.) | 0s (n.d.) | 1s - 0s = 1s (+98.178%) | 0s (n.d.) |

This commit adds the basic functionality needed for measuring performance. This includes:
- classes to measure performance in different sessions using context managers
- a context manager that can be used to instrument certain methods with performance tracking machinery
- test fixtures for automatically generating reports using these classes

This commit adds `Comparison` classes which contain comparisons between performance measurements taken at two different points in time. It uses these classes in a new script that can be called to generate a markdown report of what changed regarding performance.
@serramatutu force-pushed the serramatutu/perf-tracking branch from c7eb66b to cb61a7d on January 24, 2025 10:00
@serramatutu requested a review from a team as a code owner on January 24, 2025 10:00
@serramatutu force-pushed the serramatutu/perf-tracking branch from 2584d73 to e2493f6 on January 24, 2025 15:29
@serramatutu force-pushed the serramatutu/perf-tracking branch from e2493f6 to f598cb0 on January 24, 2025 15:56
@plypaul (Contributor) left a comment

🎉

@plypaul merged commit d10d263 into main on Jan 24, 2025
24 of 25 checks passed
@plypaul deleted the serramatutu/perf-tracking branch on January 24, 2025 18:05