
[Performance] Slowdown Caused by Gelu Fusion Removal #23491

Open · SuhwanSong opened this issue Jan 25, 2025 · 0 comments

Labels: performance (issues related to performance regressions)

SuhwanSong commented Jan 25, 2025

Describe the issue

Starting with commit 2cdc05f, ONNX Runtime (ORT) no longer performs Gelu fusion on this model, resulting in an approximately 4x end-to-end slowdown.

Bisect range: de7a02b .. 2cdc05f.
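
For context on what the fusion buys: ORT's optimizer can collapse a bias Add followed by a Gelu into the single com.microsoft.BiasGelu contrib op, so the Add's intermediate tensor is never written out and re-read. Below is a minimal NumPy sketch of the math the fused kernel corresponds to (a reference implementation of the exact, erf-based GELU; not ORT's actual kernel):

```python
import math
import numpy as np

# math.erf is scalar-only; vectorize it for array inputs.
_erf = np.vectorize(math.erf)

def gelu(x: np.ndarray) -> np.ndarray:
    # Exact (erf-based) GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + _erf(x / math.sqrt(2.0)))

def bias_gelu(x: np.ndarray, bias: np.ndarray) -> np.ndarray:
    # What the fused BiasGelu op computes in one kernel: Gelu(x + bias),
    # instead of a standalone Add producing an intermediate tensor that a
    # separate Gelu kernel then has to re-read.
    return gelu(x + bias)
```

In the kernel timings below, de7a02b runs one BiasGelu kernel where 2cdc05f runs a separate Add and Gelu.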

Optimized model of de7a02b (graph contains a fused BiasGelu node):

[Image]

Optimized model of 2cdc05f (Add and Gelu remain as separate nodes):

[Image]
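
For anyone re-checking this, the optimized graphs above can be reproduced by asking ORT to serialize the model after graph optimizations; a sketch, assuming the paths used in the repro script below:

```python
import onnx
import onnxruntime

# Write out the graph as it looks after ORT's optimization passes.
sess_options = onnxruntime.SessionOptions()
sess_options.optimized_model_filepath = "model.optimized.onnx"
onnxruntime.InferenceSession("model.onnx", sess_options,
                             providers=["CPUExecutionProvider"])

# Inspect which op types survived optimization: a de7a02b build shows a
# fused BiasGelu, while 2cdc05f leaves separate Add and Gelu nodes.
optimized = onnx.load("model.optimized.onnx")
print(sorted({node.op_type for node in optimized.graph.node}))
```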

Performance Comparison

| Key                         | de7a02b | 2cdc05f | Ratio  |
|-----------------------------|--------:|--------:|-------:|
| model_loading_uri           |     611 |     603 | 0.9869 |
| session_initialization      |    4256 |    4236 | 0.9953 |
| /m4/MatMul_kernel_time      |  616211 |  531171 | 0.8623 |
| /m4/Add_kernel_time         |         | 4973509 |        |
| BiasGelu_kernel_time        |  513038 |         |        |
| Gelu_kernel_time            |         |  171279 |        |
| SequentialExecutor::Execute | 1193568 | 5778856 | 4.8418 |
| model_run                   | 1223691 | 5796766 | 4.7372 |

(Ratio = 2cdc05f / de7a02b; blank cells mark kernels that do not exist in that build's optimized graph.)
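
The kernel-level numbers in the table come from ORT's built-in profiler; a sketch of how such timings can be collected (the profiler writes a Chrome-trace JSON file whose durations are in microseconds):

```python
import json
import numpy as np
import onnxruntime

sess_options = onnxruntime.SessionOptions()
sess_options.enable_profiling = True  # dump a Chrome-trace JSON profile

session = onnxruntime.InferenceSession("model.onnx", sess_options,
                                       providers=["CPUExecutionProvider"])
input_data = np.load("input.npy", allow_pickle=True).item()
session.run(None, input_data)

# end_profiling() stops the profiler and returns the trace file's path.
profile_path = session.end_profiling()
with open(profile_path) as f:
    for event in json.load(f):
        if event.get("cat") == "Node":  # per-kernel *_kernel_time entries
            print(event["name"], event["dur"])
```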

To reproduce

  1. Download and unzip "model.zip".
  2. Run the following script.
```python
import time
import onnxruntime
import numpy as np

# Set the random seed
np.random.seed(0)

onnx_model_path = 'model.onnx'

# Load the ONNX model with the CPUExecutionProvider
ort_session = onnxruntime.InferenceSession(onnx_model_path, providers=['CPUExecutionProvider'])
ort_session.get_modelmeta()
inputs = ort_session.get_inputs()

nth = 100000

# Warm-up inference to cache optimizations
input_data = np.load("input.npy", allow_pickle=True).item()
ort_session.run(None, input_data)

# Measure inference time excluding input creation
total_time_ns = 0
for _ in range(nth):
    start_ns = time.perf_counter_ns()
    ort_session.run(None, input_data)
    end_ns = time.perf_counter_ns()
    total_time_ns += end_ns - start_ns

avg_time_ns = total_time_ns / nth
avg_time_ms = avg_time_ns / 1e6

print(f'[{onnxruntime.__version__}] Average inference time: {avg_time_ms:.5f} ms')
```
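
A side note on measurement stability: when comparing the two commits, it may help to pin the intra-op thread pool and state the optimization level explicitly, so both builds are configured identically; a hedged variant of the session setup above:

```python
import onnxruntime

sess_options = onnxruntime.SessionOptions()
sess_options.intra_op_num_threads = 1  # fixed thread count for stable timings
# ORT_ENABLE_ALL is the default; stating it rules out configuration drift.
sess_options.graph_optimization_level = \
    onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL

ort_session = onnxruntime.InferenceSession(
    "model.onnx", sess_options, providers=["CPUExecutionProvider"])
```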

Urgency

No response

Platform

Linux

OS Version

6.8.0

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.20.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

model.zip

Is this a quantized model?

No
