ci: run tests only on 1.10 for now (#975)
* ci: run tests only on 1.10 for now

* ci: try reducing the number of groups
avik-pal authored Oct 8, 2024
1 parent d230834 commit 04deedf
Showing 3 changed files with 21 additions and 50 deletions.
4 changes: 2 additions & 2 deletions .buildkite/benchmarks.yml
@@ -11,7 +11,7 @@ steps:
- "8"
plugins:
- JuliaCI/julia#v1:
-version: "1"
+version: "1.10"
command: |
julia --project=benchmarks -e 'println("--- :julia: Instantiating project")
using Pkg
@@ -61,7 +61,7 @@ steps:
- label: "Combine benchmarks"
plugins:
- JuliaCI/julia#v1:
-version: "1"
+version: "1.10"
command: |
buildkite-agent artifact download "benchmarks/results/*" .
27 changes: 6 additions & 21 deletions .buildkite/testing.yml
@@ -1,7 +1,7 @@
steps:
- group: ":julia: CUDA GPU"
steps:
-- label: ":julia: Julia {{matrix.julia}} + CUDA GPU ({{matrix.group}})"
+- label: ":julia: Julia {{matrix.julia}} + CUDA GPU"
plugins:
- JuliaCI/julia#v1:
version: "{{matrix.julia}}"
@@ -17,25 +17,19 @@ steps:
cuda: "*"
env:
BACKEND_GROUP: "CUDA"
-LUX_TEST_GROUP: "{{matrix.group}}"
if: build.message !~ /\[skip tests\]/ && build.message !~ /\[skip ci\]/
timeout_in_minutes: 60
matrix:
setup:
julia:
-- "1"
-group:
-- "!fluxcompat,distributed,recurrent_layers"
-- "fluxcompat"
-- "distributed"
-- "recurrent_layers"
+- "1.10"

- group: ":telescope: Downstream CUDA"
steps:
- label: ":julia: {{matrix.repo}} (Julia 1 + CUDA GPU)"
plugins:
- JuliaCI/julia#v1:
-version: "1"
+version: "1.10"
- JuliaCI/julia-coverage#v1:
codecov: true
dirs:
@@ -56,7 +50,7 @@ steps:

- group: ":julia: AMD GPU"
steps:
-- label: ":julia: Julia: {{matrix.julia}} + AMD GPU ({{matrix.group}})"
+- label: ":julia: Julia: {{matrix.julia}} + AMD GPU"
plugins:
- JuliaCI/julia#v1:
version: "{{matrix.julia}}"
@@ -69,8 +63,6 @@ steps:
- ext
env:
BACKEND_GROUP: "AMDGPU"
-RETESTITEMS_NWORKERS: 2
-LUX_TEST_GROUP: "{{matrix.group}}"
agents:
queue: "juliagpu"
rocm: "*"
@@ -80,19 +72,14 @@ steps:
matrix:
setup:
julia:
-- "1"
-group:
-- "!fluxcompat,distributed,recurrent_layers"
-- "fluxcompat"
-- "distributed"
-- "recurrent_layers"
+- "1.10"

- group: ":telescope: Downstream AMD GPU"
steps:
- label: ":julia: {{matrix.repo}} (Julia 1 + AMD GPU)"
plugins:
- JuliaCI/julia#v1:
-version: "1"
+version: "1.10"
- JuliaCI/julia-coverage#v1:
codecov: true
dirs:
@@ -103,8 +90,6 @@ steps:
queue: "juliagpu"
rocm: "*"
rocmgpu: "*"
-env:
-RETESTITEMS_NWORKERS: 2
if: build.message !~ /\[skip tests\]/ && build.message !~ /\[skip downstream\]/ && build.message !~ /\[skip ci\]/ && build.pull_request.labels includes "run downstream test"
timeout_in_minutes: 60
matrix:
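The `if:` conditions in this pipeline gate each step on the commit message and, for the downstream jobs, on a pull-request label. Buildkite evaluates these expressions itself; the Python below is only a rough sketch of the same logic, and the function names are hypothetical rather than anything defined in the repository.

```python
import re


def should_run_tests(message: str) -> bool:
    # Mirrors the two "does not match" checks on build.message:
    # the step runs unless the commit message contains [skip tests] or [skip ci].
    return (
        re.search(r"\[skip tests\]", message) is None
        and re.search(r"\[skip ci\]", message) is None
    )


def should_run_downstream(message: str, labels: list[str]) -> bool:
    # Downstream steps add a [skip downstream] check and also require
    # the "run downstream test" label on the pull request.
    return (
        should_run_tests(message)
        and re.search(r"\[skip downstream\]", message) is None
        and "run downstream test" in labels
    )


print(should_run_tests("ci: run tests only on 1.10 for now (#975)"))  # True
print(should_run_tests("docs: fix typo [skip ci]"))                   # False
print(should_run_downstream("ci: tweak", []))                         # False
```

Under this reading, a downstream step never runs without the label, whatever the commit message says.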
40 changes: 13 additions & 27 deletions .github/workflows/CI.yml
@@ -28,11 +28,9 @@ jobs:
fail-fast: false
matrix:
version:
-- "1"
+- "1.10"
os:
- ubuntu-latest
-- macos-latest
-- windows-latest
test_group:
- "core_layers"
- "contrib"
@@ -44,6 +42,13 @@ jobs:
- "recurrent_layers"
- "eltype_match"
- "fluxcompat"
+include:
+- version: "1.10"
+os: macos-latest
+test_group: "all"
+- version: "1.10"
+os: windows-latest
+test_group: "all"
steps:
- uses: actions/checkout@v4
- uses: julia-actions/setup-julia@v2
@@ -76,15 +81,13 @@ jobs:
downstream:
name: Downstream ${{ matrix.package.repo }}/${{ matrix.package.group }}
if: ${{ !contains(github.event.head_commit.message, '[skip tests]') && contains(github.event.pull_request.labels.*.name, 'run downstream test') }}
-runs-on: ${{ matrix.os }}
-timeout-minutes: 60
+runs-on: ubuntu-latest
+timeout-minutes: 240
env:
GROUP: ${{ matrix.package.group }}
strategy:
fail-fast: false
matrix:
-julia-version: ["1"]
-os: [ubuntu-latest]
package:
- { user: SciML, repo: DiffEqFlux.jl, group: BasicNeuralDE }
- { user: SciML, repo: DiffEqFlux.jl, group: AdvancedNeuralDE }
@@ -96,7 +99,7 @@ jobs:
- uses: actions/checkout@v4
- uses: julia-actions/setup-julia@v2
with:
-version: ${{ matrix.julia-version }}
+version: "1.10"
arch: x64
- uses: julia-actions/julia-buildpkg@v1
- name: Clone Downstream
@@ -131,33 +134,16 @@ jobs:

downgrade:
if: ${{ !contains(github.event.head_commit.message, '[skip tests]') && github.base_ref == github.event.repository.default_branch }}
-name: Downgrade Julia ${{ matrix.version }} - ${{ matrix.test_group }}
+name: Downgrade Julia 1.10
runs-on: ubuntu-latest
-strategy:
-fail-fast: false
-matrix:
-version: ["1"]
-test_group:
-- "core_layers"
-- "contrib"
-- "helpers"
-- "distributed"
-- "normalize_layers"
-- "others"
-- "autodiff"
-- "recurrent_layers"
-- "eltype_match"
-- "fluxcompat"
steps:
- uses: actions/checkout@v4
- uses: julia-actions/setup-julia@v2
with:
-version: ${{ matrix.version }}
+version: "1.10"
- uses: julia-actions/julia-downgrade-compat@v1
- uses: julia-actions/julia-buildpkg@v1
- uses: julia-actions/julia-runtest@v1
-env:
-LUX_TEST_GROUP: ${{ matrix.test_group }}
- uses: julia-actions/julia-processcoverage@v1
with:
directories: src,ext
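The CI.yml change drops macOS and Windows from the `os` axis and re-adds them through `include` with `test_group: "all"`, so each of those platforms gets one job instead of one per test group. The sketch below shows how such a matrix expands. It is a simplified model of GitHub Actions' documented `include` behaviour, not the runner's implementation, and `expand_matrix` is a hypothetical helper. The diff collapses part of the `test_group` list, so only the groups visible in the hunk are used here and the job count is illustrative, not the real total.

```python
from itertools import product


def expand_matrix(axes, include=()):
    # Cartesian product of the declared axes gives the base combinations.
    keys = list(axes)
    jobs = [dict(zip(keys, values)) for values in product(*axes.values())]
    base_count = len(jobs)
    for entry in include:
        # An include entry extends a base combination only when it would not
        # overwrite any value that came from the declared axes.
        matches = [
            job for job in jobs[:base_count]
            if all(job[k] == v for k, v in entry.items() if k in axes)
        ]
        if matches:
            for job in matches:
                job.update(entry)
        else:
            # Otherwise it becomes a new, standalone combination.
            jobs.append(dict(entry))
    return jobs


axes = {
    "version": ["1.10"],
    "os": ["ubuntu-latest"],
    # Only the groups visible in the diff; the hidden ones are omitted.
    "test_group": ["core_layers", "contrib", "recurrent_layers",
                   "eltype_match", "fluxcompat"],
}
include = [
    {"version": "1.10", "os": "macos-latest", "test_group": "all"},
    {"version": "1.10", "os": "windows-latest", "test_group": "all"},
]

jobs = expand_matrix(axes, include)
print(len(jobs))  # 7: five Ubuntu jobs plus one each for macOS and Windows
```

Because neither `include` entry matches the single `ubuntu-latest` value on the `os` axis, both are appended as extra jobs rather than merged into existing ones.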

1 comment on commit 04deedf

@github-actions
Contributor


Lux Benchmarks
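The comment does not say how the Ratio column in the table that follows is derived. The rows are consistent with Current divided by Previous, rounded to two decimals, so a value below 1 means the current commit measured faster. A small sketch, with three sample rows copied from the table and the helper name `format_ratio` made up for illustration:

```python
def format_ratio(current_ns: float, previous_ns: float) -> str:
    # Assumed definition: current timing over previous timing, two decimals.
    return f"{current_ns / previous_ns:.2f}"


# (benchmark name, current ns, previous ns), taken from the table below.
rows = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)", 411750, 415291),
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s)", 322271, 243167),
    ("Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s)", 465625, 1221916),
]

for name, current, previous in rows:
    print(format_ratio(current, previous), name)
# 0.99, 1.33 and 0.38 respectively, matching the Ratio column
```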

Benchmark suite Current: 04deedf Previous: d230834 Ratio
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) 411750 ns 415291 ns 0.99
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) 322271 ns 243167 ns 1.33
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) 323042 ns 244625 ns 1.32
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) 749375 ns 740667 ns 1.01
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA 43905 ns 44725 ns 0.98
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) 1306583 ns 1279354.5 ns 1.02
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) 465625 ns 1221916 ns 0.38
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) 13617333 ns 16280791 ns 0.84
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) 2245750 ns 2240458 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA 192831 ns 203277 ns 0.95
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) 1394875 ns 1383187.5 ns 1.01
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) 634729.5 ns 1309667 ns 0.48
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) 14050875 ns 16210875 ns 0.87
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) 2238000 ns 2235875 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1661542 ns 1666375 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1196103.5 ns 1104041.5 ns 1.08
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1534187.5 ns 1509958 ns 1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3005667 ns 2989666 ns 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA 209529 ns 213111 ns 0.98
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12111521 ns 12146875 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9554687 ns 8841167 ns 1.08
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9247000 ns 9243875 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18626583 ns 18585666.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1910271 ns 1936768 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17307250 ns 17311083.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14377958 ns 13983375 ns 1.03
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14526875 ns 14496187.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21836458.5 ns 21837875 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250439041.5 ns 250126228.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 174592521 ns 148997875 ns 1.17
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 115955208.5 ns 116519479.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 447243084 ns 446906458 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5470843 ns 5468434 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1228722500 ns 1223788875 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 543561875 ns 933142709 ns 0.58
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 830623396.5 ns 832839417 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1628878000 ns 1630170292 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 38000637 ns 31512911.5 ns 1.21
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1136994583 ns 1149549375 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 679379084 ns 997374541.5 ns 0.68
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1328113771 ns 1308662646 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1733752146 ns 1731062979.5 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) 1103375 ns 1122500 ns 0.98
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) 823209 ns 1658708 ns 0.50
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) 3578479 ns 3605667 ns 0.99
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) 786500 ns 782708.5 ns 1.00
lenet(28, 28, 1, 32)/forward/GPU/CUDA 266091.5 ns 284470.5 ns 0.94
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) 2986021 ns 2990375 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) 2426000 ns 4122208 ns 0.59
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) 10461250 ns 10934125 ns 0.96
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) 3150042 ns 3140208 ns 1.00
lenet(28, 28, 1, 32)/zygote/GPU/CUDA 1055864 ns 1127614 ns 0.94
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 2335042 ns 2349749.5 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1537708 ns 1366187.5 ns 1.13
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1740000 ns 1585125 ns 1.10
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 4348437.5 ns 4341687 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 212286 ns 211956.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 20266645.5 ns 20292146 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 17701209 ns 16982750 ns 1.04
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 17495416 ns 18160625 ns 0.96
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 26797000 ns 26736042 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1973706 ns 2009275 ns 0.98
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 44317750 ns 44384292 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 42027646 ns 41010166.5 ns 1.02
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 41325000 ns 41252542 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 47734917 ns 47742354 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 4664854 ns 4667229 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2868521.5 ns 2627145.5 ns 1.09
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 3015958 ns 2754166 ns 1.10
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 8658937.5 ns 8646833.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 516555 ns 471691 ns 1.10
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 40579000.5 ns 40759792 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 34830104 ns 34074937.5 ns 1.02
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 34148292 ns 34004708 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 53661812 ns 53724708 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2969951 ns 3235352 ns 0.92
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 109640958 ns 110050750 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 84133666 ns 137101500 ns 0.61
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 255828791 ns 251499542 ns 1.02
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 96388416 ns 96734833 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 270215792 ns 270582500 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 186630271 ns 157462229 ns 1.19
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 128172709 ns 124550542 ns 1.03
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 489605542 ns 489233625 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 7104246 ns 7003527 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1502664042 ns 1494868312.5 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 821183792 ns 1205204209 ns 0.68
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 1092397958.5 ns 1091914979 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 2032173187.5 ns 2033756875 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 33798333 ns 34486848.5 ns 0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 2027767896 ns 2031846083.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1563910958 ns 1856502416 ns 0.84
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 2210346833.5 ns 2218211729 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 2560629834 ns 2563679583 ns 1.00
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) 2006833 ns 2093250 ns 0.96
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) 1257333 ns 3113375 ns 0.40
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) 7451041.5 ns 9724750 ns 0.77
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) 2470458 ns 2446083.5 ns 1.01
lenet(28, 28, 1, 128)/forward/GPU/CUDA 275531 ns 275113 ns 1.00
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) 9463416 ns 9682833 ns 0.98
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) 6552500 ns 12076166 ns 0.54
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) 25529541 ns 24267792 ns 1.05
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) 11734125 ns 11496500 ns 1.02
lenet(28, 28, 1, 128)/zygote/GPU/CUDA 1130415 ns 1185320 ns 0.95
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) 380676854.5 ns 380917104.5 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) 145328000 ns 315455208 ns 0.46
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) 243564083 ns 265045166.5 ns 0.92
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) 452336354.5 ns 453577208.5 ns 1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA 4879283 ns 4825872 ns 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) 1156932333 ns 1157170792 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) 487570458 ns 976146875 ns 0.50
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) 973572458 ns 1071077458 ns 0.91
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) 1399439834 ns 1399279583 ns 1.00
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA 16976929 ns 18526493 ns 0.92
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) 1062687.5 ns 1057416 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) 971124.5 ns 1660750 ns 0.58
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) 6269583 ns 5839187.5 ns 1.07
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) 1393375 ns 1297896 ns 1.07
lenet(28, 28, 1, 64)/forward/GPU/CUDA 277704.5 ns 270186.5 ns 1.03
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) 6494541.5 ns 6497437.5 ns 1.00
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) 4635437.5 ns 13095667 ns 0.35
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) 19450479 ns 19774958 ns 0.98
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) 6080229 ns 6060250 ns 1.00
lenet(28, 28, 1, 64)/zygote/GPU/CUDA 1148981 ns 1207468 ns 0.95
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70442208 ns 70439459 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 35305229 ns 43880645.5 ns 0.80
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39532604 ns 39802542 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132574604 ns 132617229.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1848251 ns 1928198.5 ns 0.96
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 356785937.5 ns 354773521 ns 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 159371854 ns 271527854 ns 0.59
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 254893688 ns 253115833 ns 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 535009020.5 ns 534735167 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 16489529.5 ns 13227623 ns 1.25
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 395707667 ns 395827000 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 245564417 ns 373039667 ns 0.66
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 652089584 ns 703091167 ns 0.93
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 712574333 ns 714378250 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) 1191762375 ns 1187937250 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) 434009729.5 ns 839767834 ns 0.52
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) 631038834 ns 640628833 ns 0.99
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) 1771033395.5 ns 1772779750.5 ns 1.00
vgg16(32, 32, 3, 128)/forward/GPU/CUDA 12471861 ns 12386874 ns 1.01
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) 3670803208.5 ns 3628821667 ns 1.01
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) 1633483458 ns 2842192167 ns 0.57
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) 2737701958 ns 2716722458 ns 1.01
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) 5038709417 ns 5042550875 ns 1.00
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA 49641386 ns 49688646 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3412146 ns 3430062.5 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2094750 ns 2069021 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2533833.5 ns 2518417 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6034292 ns 6032959 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 586721 ns 573246 ns 1.02
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 26096750.5 ns 26098500 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 20315791.5 ns 19045208 ns 1.07
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 19312917 ns 19561125 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 39366625 ns 39345062.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2989473.5 ns 3186388 ns 0.94
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 54095229 ns 55895354 ns 0.97
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 28393083 ns 83953562.5 ns 0.34
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 177757792 ns 177984916 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 45278750 ns 45586542 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1778208 ns 1786000.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1204708 ns 1108812.5 ns 1.09
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1564000 ns 1583271 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3038771 ns 3031458.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 217944 ns 216476 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12531437.5 ns 12561896 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9964292 ns 9222083 ns 1.08
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9707042 ns 9681604.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18974500 ns 18991354 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1963028.5 ns 1983529 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17644270.5 ns 17661854.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14745500 ns 14350708 ns 1.03
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14639333 ns 14571666 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 22173792 ns 22207958 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70409562 ns 70523437 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 34786542 ns 43757146 ns 0.79
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39571499.5 ns 39692875 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132610521 ns 132543875 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1837717 ns 1868597 ns 0.98
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 360588187.5 ns 358019500 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 237608334 ns 348616458.5 ns 0.68
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 299913354 ns 304684062.5 ns 0.98
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 725805833 ns 726741083 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 13956738 ns 14313431.5 ns 0.98
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 418949812.5 ns 420910145.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 251360792 ns 427953667 ns 0.59
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 712732021 ns 711470292 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 717284542 ns 718110625 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) 1912041.5 ns 1783333.5 ns 1.07
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) 1579125 ns 1377417 ns 1.15
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) 1549791.5 ns 1380791 ns 1.12
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) 2657625 ns 2616709 ns 1.02
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA 573525 ns 569443 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) 9220000 ns 9249354 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) 5936166 ns 15832708.5 ns 0.37
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) 31895937.5 ns 32885020.5 ns 0.97
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) 10214937.5 ns 10214250 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA 1399984.5 ns 1406558.5 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) 22182333.5 ns 22309667 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) 19138291.5 ns 28394500 ns 0.67
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) 52527562.5 ns 56878750 ns 0.92
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) 18888042 ns 18878041 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) 791291.5 ns 690833.5 ns 1.15
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) 69958.5 ns 613625 ns 0.11
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) 997167 ns 1078916 ns 0.92
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) 724499.5 ns 724417 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA 48324 ns 47653 ns 1.01
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) 1508042 ns 1550500 ns 0.97
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) 320291 ns 1006604.5 ns 0.32
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) 1445145.5 ns 1431333.5 ns 1.01
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) 2258458.5 ns 2290167 ns 0.99
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA 216350 ns 227007.5 ns 0.95
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) 1537083 ns 1559479 ns 0.99
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) 428792 ns 1065562.5 ns 0.40
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) 1444584 ns 1941250 ns 0.74
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) 2250333 ns 2187500 ns 1.03
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3421750 ns 3412458 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2084312.5 ns 2060333 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2519375.5 ns 2504750 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6015021 ns 6004208.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA 584297 ns 571869.5 ns 1.02
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 24071521.5 ns 24064937.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 18050833 ns 17186562.5 ns 1.05
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 17227375 ns 17163520.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 37583145.5 ns 37576333 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2895440 ns 3169039 ns 0.91
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 52599188 ns 53946459 ns 0.98
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 27644250 ns 83764604.5 ns 0.33
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 170611917 ns 175113292 ns 0.97
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 44514250 ns 44468375 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250102292 ns 250717708 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 174510104 ns 148723729 ns 1.17
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 115645729 ns 116337041.5 ns 0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 448140124.5 ns 447560562.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5446378 ns 5458848 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1105120833 ns 1101190667 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 467780729.5 ns 856965729.5 ns 0.55
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 825455520.5 ns 828981916.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1753431125 ns 1751973959 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 35149612 ns 29300703 ns 1.20
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1021983312.5 ns 1020791479.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 662517187.5 ns 981034709 ns 0.68
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1286071167 ns 1298484958 ns 0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1721665437.5 ns 1724676458.5 ns 1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) 1312041 ns 1192334 ns 1.10
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) 928625 ns 722208.5 ns 1.29
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) 903208 ns 802271 ns 1.13
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) 2032416 ns 2055959 ns 0.99
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA 575428 ns 554738 ns 1.04
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) 5922771 ns 5970125 ns 0.99
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) 2615500 ns 9028833 ns 0.29
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) 24427083.5 ns 27064125 ns 0.90
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) 7104916.5 ns 7113729 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA 1363516 ns 1360766 ns 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) 9705958.5 ns 9717625 ns 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) 6499000 ns 16161979 ns 0.40
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) 31929750 ns 34006416.5 ns 0.94
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) 7614042 ns 7613041 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) 483291 ns 386625 ns 1.25
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) 31750 ns 466375 ns 0.07
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) 1795375 ns 2797833 ns 0.64
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) 91542 ns 91041.5 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA 28996 ns 28215 ns 1.03
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) 392958 ns 410729 ns 0.96
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) 175542 ns 458375 ns 0.38
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) 4708417 ns 4385625 ns 1.07
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) 273000 ns 273062.5 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA 224707.5 ns 212092.5 ns 1.06
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) 666333 ns 682000 ns 0.98
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) 442250 ns 731083.5 ns 0.60
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) 4499167 ns 4635250 ns 0.97
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) 510979.5 ns 510917 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) 430437.5 ns 329062.5 ns 1.31
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) 13583 ns 405333 ns 0.03
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) 709208 ns 775209 ns 0.91
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) 52584 ns 53250 ns 0.99
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA 29296 ns 27988 ns 1.05
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) 337250 ns 358354 ns 0.94
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) 26375 ns 340937.5 ns 0.08
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) 484812.5 ns 667854 ns 0.73
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) 151333 ns 151583 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA 213308.5 ns 199391.5 ns 1.07
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) 352521 ns 372916.5 ns 0.95
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) 45792 ns 354896 ns 0.13
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) 487125 ns 585667 ns 0.83
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) 151000 ns 151375 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) 603223875 ns 600844791 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) 239241354 ns 434479500 ns 0.55
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) 377713896 ns 395023625 ns 0.96
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) 872019458 ns 872456875 ns 1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA 7676104.5 ns 7629063.5 ns 1.01
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) 2005520125 ns 1996796291.5 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) 947653916.5 ns 1637741500 ns 0.58
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) 1551514604.5 ns 1582333333.5 ns 0.98
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) 2653038416 ns 2658961958 ns 1.00
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA 27180094 ns 26619843 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) 525604 ns 532479 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) 168333 ns 405208 ns 0.42
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) 1740625 ns 2880604.5 ns 0.60
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) 875541 ns 877791.5 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA 47837 ns 47573 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) 1943750 ns 1905250 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) 1100208 ns 1799584 ns 0.61
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) 14661875 ns 16464375 ns 0.89
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) 2836709 ns 2818750 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA 232330 ns 239346.5 ns 0.97
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) 2974229 ns 2932000 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) 2208583.5 ns 4975687.5 ns 0.44
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 15024229.5 ns 16759417 ns 0.90
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) 3751750 ns 3748708 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) 1602291.5 ns 1367812.5 ns 1.17
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) 1221084 ns 930041 ns 1.31
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) 1264750 ns 1056709 ns 1.20
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) 2362750 ns 2313729.5 ns 1.02
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA 576709 ns 567030 ns 1.02
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) 5931125 ns 5196958 ns 1.14
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) 2866334 ns 8601584 ns 0.33
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) 25035834 ns 26184083.5 ns 0.96
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) 6650208 ns 7337728.5 ns 0.91
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA 1379411 ns 1330201.5 ns 1.04
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) 11605146 ns 11580375 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) 8767458 ns 18587958.5 ns 0.47
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) 35255000 ns 37621062.5 ns 0.94
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) 9570000.5 ns 9557791 ns 1.00
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) 2541 ns 3041 ns 0.84
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 2292 ns 2792 ns 0.82
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) 3000 ns 3375 ns 0.89
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) 2333 ns 2854 ns 0.82
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA 25379.5 ns 25102 ns 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) 7125 ns 7125 ns 1.00
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) 7083 ns 6958 ns 1.02
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) 7375 ns 7875 ns 0.94
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) 7270.5 ns 7083 ns 1.03
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA 193729.5 ns 201877.5 ns 0.96
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) 8334 ns 8292 ns 1.01
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) 8500 ns 8333 ns 1.02
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) 8417 ns 8542 ns 0.99
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) 6084 ns 5958 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) 10375.5 ns 10813 ns 0.96
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) 14916 ns 13916 ns 1.07
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) 11854 ns 11312.5 ns 1.05
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) 7625 ns 7709 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA 25646 ns 25316 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) 21708 ns 21583 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) 21500 ns 21625 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) 21750 ns 21708 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) 21875 ns 21500 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA 203851 ns 219161 ns 0.93
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) 53417 ns 53500 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) 56583.5 ns 53458 ns 1.06
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) 53583.5 ns 53542 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) 51333 ns 51166.5 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) 26895.5 ns 28292 ns 0.95
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) 28333.5 ns 28792 ns 0.98
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) 29000 ns 28375 ns 1.02
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) 48291 ns 46125 ns 1.05
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA 26739 ns 26235 ns 1.02
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) 220875 ns 229583 ns 0.96
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) 44583 ns 277792 ns 0.16
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) 4132667 ns 4446854.5 ns 0.93
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) 145458 ns 145500 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA 172310 ns 197661 ns 0.87
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) 237312.5 ns 246666.5 ns 0.96
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) 68625 ns 296000 ns 0.23
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) 4360708 ns 4144084 ns 1.05
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) 145917 ns 145750 ns 1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) 2292 ns 1834 ns 1.25
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) 1750 ns 1750 ns 1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) 2166 ns 2500 ns 0.87
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) 1520.5 ns 3750 ns 0.41
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA 23935 ns 23319.5 ns 1.03
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) 5125 ns 5292 ns 0.97
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) 5042 ns 5000 ns 1.01
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) 5458 ns 5416 ns 1.01
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) 5084 ns 5000 ns 1.02
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA 176841 ns 226307 ns 0.78
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) 7292 ns 7459 ns 0.98
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) 8166 ns 7375 ns 1.11
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) 7541 ns 7792 ns 0.97
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) 5167 ns 5042 ns 1.02
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 80940833 ns 81067334 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 41092709 ns 48673125 ns 0.84
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 45570541 ns 43747500 ns 1.04
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 153559792 ns 153700375 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 2660311 ns 2718893 ns 0.98
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 621714834 ns 621060459 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 421739375 ns 430659541 ns 0.98
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 414510667 ns 409758041.5 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 697568292 ns 699041292 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 15148414 ns 15621337.5 ns 0.97
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 872377937.5 ns 875541666.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 706482291.5 ns 845831187.5 ns 0.84
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 1162546146 ns 1160340833.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 1175739375 ns 1177842604 ns 1.00

This comment was automatically generated by workflow using github-action-benchmark.
