ci: run tests only on 1.10 for now (#975)

* ci: run tests only on 1.10 for now
* ci: try reducing the number of groups
Showing 3 changed files with 21 additions and 50 deletions.
04deedf
Lux Benchmarks
| Benchmark | Current (ns) | Previous (ns) | Ratio |
| --- | --- | --- | --- |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 411750 | 415291 | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 322271 | 243167 | 1.33 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 323042 | 244625 | 1.32 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 749375 | 740667 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 43905 | 44725 | 0.98 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 1306583 | 1279354.5 | 1.02 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 465625 | 1221916 | 0.38 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 13617333 | 16280791 | 0.84 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2245750 | 2240458 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 192831 | 203277 | 0.95 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 1394875 | 1383187.5 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 634729.5 | 1309667 | 0.48 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 14050875 | 16210875 | 0.87 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2238000 | 2235875 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1661542 | 1666375 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1196103.5 | 1104041.5 | 1.08 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1534187.5 | 1509958 | 1.02 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3005667 | 2989666 | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 209529 | 213111 | 0.98 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12111521 | 12146875 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9554687 | 8841167 | 1.08 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9247000 | 9243875 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18626583 | 18585666.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1910271 | 1936768 | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17307250 | 17311083.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14377958 | 13983375 | 1.03 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14526875 | 14496187.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21836458.5 | 21837875 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250439041.5 | 250126228.5 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 174592521 | 148997875 | 1.17 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 115955208.5 | 116519479.5 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447243084 | 446906458 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5470843 | 5468434 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1228722500 | 1223788875 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 543561875 | 933142709 | 0.58 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 830623396.5 | 832839417 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1628878000 | 1630170292 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 38000637 | 31512911.5 | 1.21 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1136994583 | 1149549375 | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 679379084 | 997374541.5 | 0.68 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1328113771 | 1308662646 | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1733752146 | 1731062979.5 | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 1103375 | 1122500 | 0.98 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 823209 | 1658708 | 0.50 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 3578479 | 3605667 | 0.99 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 786500 | 782708.5 | 1.00 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 266091.5 | 284470.5 | 0.94 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2986021 | 2990375 | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 2426000 | 4122208 | 0.59 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 10461250 | 10934125 | 0.96 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3150042 | 3140208 | 1.00 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1055864 | 1127614 | 0.94 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2335042 | 2349749.5 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1537708 | 1366187.5 | 1.13 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1740000 | 1585125 | 1.10 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4348437.5 | 4341687 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 212286 | 211956.5 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 20266645.5 | 20292146 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 17701209 | 16982750 | 1.04 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 17495416 | 18160625 | 0.96 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 26797000 | 26736042 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1973706 | 2009275 | 0.98 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 44317750 | 44384292 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 42027646 | 41010166.5 | 1.02 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 41325000 | 41252542 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 47734917 | 47742354 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4664854 | 4667229 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2868521.5 | 2627145.5 | 1.09 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 3015958 | 2754166 | 1.10 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8658937.5 | 8646833.5 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 516555 | 471691 | 1.10 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 40579000.5 | 40759792 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 34830104 | 34074937.5 | 1.02 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 34148292 | 34004708 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 53661812 | 53724708 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2969951 | 3235352 | 0.92 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 109640958 | 110050750 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 84133666 | 137101500 | 0.61 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 255828791 | 251499542 | 1.02 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 96388416 | 96734833 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 270215792 | 270582500 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 186630271 | 157462229 | 1.19 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 128172709 | 124550542 | 1.03 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 489605542 | 489233625 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 7104246 | 7003527 | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1502664042 | 1494868312.5 | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 821183792 | 1205204209 | 0.68 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 1092397958.5 | 1091914979 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2032173187.5 | 2033756875 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 33798333 | 34486848.5 | 0.98 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 2027767896 | 2031846083.5 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1563910958 | 1856502416 | 0.84 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 2210346833.5 | 2218211729 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2560629834 | 2563679583 | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 2006833 | 2093250 | 0.96 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 1257333 | 3113375 | 0.40 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 7451041.5 | 9724750 | 0.77 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2470458 | 2446083.5 | 1.01 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 275531 | 275113 | 1.00 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 9463416 | 9682833 | 0.98 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 6552500 | 12076166 | 0.54 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 25529541 | 24267792 | 1.05 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11734125 | 11496500 | 1.02 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1130415 | 1185320 | 0.95 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 380676854.5 | 380917104.5 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 145328000 | 315455208 | 0.46 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 243564083 | 265045166.5 | 0.92 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 452336354.5 | 453577208.5 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4879283 | 4825872 | 1.01 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 1156932333 | 1157170792 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 487570458 | 976146875 | 0.50 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 973572458 | 1071077458 | 0.91 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1399439834 | 1399279583 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 16976929 | 18526493 | 0.92 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1062687.5 | 1057416 | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 971124.5 | 1660750 | 0.58 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 6269583 | 5839187.5 | 1.07 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1393375 | 1297896 | 1.07 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 277704.5 | 270186.5 | 1.03 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6494541.5 | 6497437.5 | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 4635437.5 | 13095667 | 0.35 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 19450479 | 19774958 | 0.98 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 6080229 | 6060250 | 1.00 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1148981 | 1207468 | 0.95 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70442208 | 70439459 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 35305229 | 43880645.5 | 0.80 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39532604 | 39802542 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132574604 | 132617229.5 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1848251 | 1928198.5 | 0.96 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 356785937.5 | 354773521 | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 159371854 | 271527854 | 0.59 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 254893688 | 253115833 | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 535009020.5 | 534735167 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 16489529.5 | 13227623 | 1.25 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 395707667 | 395827000 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 245564417 | 373039667 | 0.66 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 652089584 | 703091167 | 0.93 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 712574333 | 714378250 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 1191762375 | 1187937250 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 434009729.5 | 839767834 | 0.52 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 631038834 | 640628833 | 0.99 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1771033395.5 | 1772779750.5 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12471861 | 12386874 | 1.01 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 3670803208.5 | 3628821667 | 1.01 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 1633483458 | 2842192167 | 0.57 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 2737701958 | 2716722458 | 1.01 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 5038709417 | 5042550875 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49641386 | 49688646 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3412146 | 3430062.5 | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2094750 | 2069021 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2533833.5 | 2518417 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6034292 | 6032959 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 586721 | 573246 | 1.02 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 26096750.5 | 26098500 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 20315791.5 | 19045208 | 1.07 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 19312917 | 19561125 | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 39366625 | 39345062.5 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2989473.5 | 3186388 | 0.94 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 54095229 | 55895354 | 0.97 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 28393083 | 83953562.5 | 0.34 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 177757792 | 177984916 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45278750 | 45586542 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1778208 | 1786000.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1204708 | 1108812.5 | 1.09 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1564000 | 1583271 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3038771 | 3031458.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 217944 | 216476 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12531437.5 | 12561896 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9964292 | 9222083 | 1.08 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9707042 | 9681604.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18974500 | 18991354 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1963028.5 | 1983529 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17644270.5 | 17661854.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14745500 | 14350708 | 1.03 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14639333 | 14571666 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22173792 | 22207958 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70409562 | 70523437 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 34786542 | 43757146 | 0.79 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39571499.5 | 39692875 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132610521 | 132543875 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1837717 | 1868597 | 0.98 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 360588187.5 | 358019500 | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 237608334 | 348616458.5 | 0.68 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 299913354 | 304684062.5 | 0.98 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 725805833 | 726741083 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13956738 | 14313431.5 | 0.98 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 418949812.5 | 420910145.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 251360792 | 427953667 | 0.59 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 712732021 | 711470292 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 717284542 | 718110625 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1912041.5 | 1783333.5 | 1.07 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1579125 | 1377417 | 1.15 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1549791.5 | 1380791 | 1.12 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2657625 | 2616709 | 1.02 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 573525 | 569443 | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 9220000 | 9249354 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 5936166 | 15832708.5 | 0.37 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 31895937.5 | 32885020.5 | 0.97 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 10214937.5 | 10214250 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1399984.5 | 1406558.5 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 22182333.5 | 22309667 | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 19138291.5 | 28394500 | 0.67 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 52527562.5 | 56878750 | 0.92 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 18888042 | 18878041 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 791291.5 | 690833.5 | 1.15 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 69958.5 | 613625 | 0.11 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 997167 | 1078916 | 0.92 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 724499.5 | 724417 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 48324 | 47653 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 1508042 | 1550500 | 0.97 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 320291 | 1006604.5 | 0.32 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 1445145.5 | 1431333.5 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2258458.5 | 2290167 | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 216350 | 227007.5 | 0.95 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 1537083 | 1559479 | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 428792 | 1065562.5 | 0.40 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 1444584 | 1941250 | 0.74 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2250333 | 2187500 | 1.03 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3421750 | 3412458 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2084312.5 | 2060333 | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2519375.5 | 2504750 | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6015021 | 6004208.5 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 584297 | 571869.5 | 1.02 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 24071521.5 | 24064937.5 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18050833 | 17186562.5 | 1.05 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 17227375 | 17163520.5 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37583145.5 | 37576333 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2895440 | 3169039 | 0.91 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 52599188 | 53946459 | 0.98 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 27644250 | 83764604.5 | 0.33 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 170611917 | 175113292 | 0.97 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44514250 | 44468375 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250102292 | 250717708 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 174510104 | 148723729 | 1.17 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 115645729 | 116337041.5 | 0.99 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 448140124.5 | 447560562.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5446378 | 5458848 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1105120833 | 1101190667 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 467780729.5 | 856965729.5 | 0.55 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 825455520.5 | 828981916.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1753431125 | 1751973959 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 35149612 | 29300703 | 1.20 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1021983312.5 | 1020791479.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 662517187.5 | 981034709 | 0.68 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1286071167 | 1298484958 | 0.99 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1721665437.5 | 1724676458.5 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1312041 | 1192334 | 1.10 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 928625 | 722208.5 | 1.29 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 903208 | 802271 | 1.13 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 2032416 | 2055959 | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 575428 | 554738 | 1.04 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 5922771 | 5970125 | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 2615500 | 9028833 | 0.29 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 24427083.5 | 27064125 | 0.90 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 7104916.5 | 7113729 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1363516 | 1360766 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 9705958.5 | 9717625 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 6499000 | 16161979 | 0.40 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 31929750 | 34006416.5 | 0.94 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 7614042 | 7613041 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 483291 | 386625 | 1.25 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 31750 | 466375 | 0.06807826320021441 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 1795375 | 2797833 | 0.64 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 91542 | 91041.5 | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 28996 | 28215 | 1.03 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 392958 | 410729 | 0.96 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 175542 | 458375 | 0.38 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 4708417 | 4385625 | 1.07 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 273000 | 273062.5 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 224707.5 | 212092.5 | 1.06 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 666333 | 682000 | 0.98 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 442250 | 731083.5 | 0.60 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 4499167 | 4635250 | 0.97 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 510979.5 | 510917 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 430437.5 | 329062.5 | 1.31 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 13583 | 405333 | 0.03351071834763022 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 709208 | 775209 | 0.91 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 52584 | 53250 | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 29296 | 27988 | 1.05 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 337250 | 358354 | 0.94 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 26375 | 340937.5 | 0.0773602199816682 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 484812.5 | 667854 | 0.73 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 151333 | 151583 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 213308.5 | 199391.5 | 1.07 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 352521 | 372916.5 | 0.95 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 45792 | 354896 | 0.13 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 487125 | 585667 | 0.83 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 151000 | 151375 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 603223875 | 600844791 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 239241354 | 434479500 | 0.55 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 377713896 | 395023625 | 0.96 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 872019458 | 872456875 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7676104.5 | 7629063.5 | 1.01 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 2005520125 | 1996796291.5 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 947653916.5 | 1637741500 | 0.58 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1551514604.5 | 1582333333.5 | 0.98 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2653038416 | 2658961958 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 27180094 | 26619843 | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 525604 | 532479 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 168333 | 405208 | 0.42 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 1740625 | 2880604.5 | 0.60 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 875541 | 877791.5 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 47837 | 47573 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1943750 | 1905250 | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 1100208 | 1799584 | 0.61 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 14661875 | 16464375 | 0.89 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2836709 | 2818750 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 232330 | 239346.5 | 0.97 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 2974229 | 2932000 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 2208583.5 | 4975687.5 | 0.44 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 15024229.5 | 16759417 | 0.90 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 3751750 | 3748708 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1602291.5 | 1367812.5 | 1.17 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 1221084 | 930041 | 1.31 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1264750 | 1056709 | 1.20 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2362750 | 2313729.5 | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 576709 | 567030 | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 5931125 | 5196958 | 1.14 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 2866334 | 8601584 | 0.33 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 25035834 | 26184083.5 | 0.96 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 6650208 | 7337728.5 | 0.91 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1379411 | 1330201.5 | 1.04 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 11605146 | 11580375 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 8767458 | 18587958.5 | 0.47 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 35255000 | 37621062.5 | 0.94 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 9570000.5 | 9557791 | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2541 | 3041 | 0.84 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2292 | 2792 | 0.82 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3000 | 3375 | 0.89 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 2333 | 2854 | 0.82 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 25379.5 | 25102 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 7125 | 7125 | 1 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 7083 | 6958 | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7375 | 7875 | 0.94 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7270.5 | 7083 | 1.03 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 193729.5 | 201877.5 | 0.96 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8334 | 8292 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8500 | 8333 | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8417 | 8542 | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 6084 | 5958 | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10375.5 | 10813 | 0.96 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 14916 | 13916 | 1.07 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 11854 | 11312.5 | 1.05 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 7625 | 7709 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 25646 | 25316 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 21708 | 21583 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 21500 | 21625 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 21750 | 21708 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 21875 | 21500 | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 203851 | 219161 | 0.93 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 53417 | 53500 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 56583.5 | 53458 | 1.06 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 53583.5 | 53542 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 51333 | 51166.5 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 26895.5 | 28292 | 0.95 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 28333.5 | 28792 | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 29000 | 28375 | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 48291 | 46125 | 1.05 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 26739 | 26235 | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 220875 | 229583 | 0.96 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 44583 | 277792 | 0.16 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 4132667 | 4446854.5 | 0.93 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 145458 | 145500 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 172310 | 197661 | 0.87 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 237312.5 | 246666.5 | 0.96 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 68625 | 296000 | 0.23 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 4360708 | 4144084 | 1.05 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 145917 | 145750 | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 2292 | 1834 | 1.25 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 1750 | 1750 | 1 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2166 | 2500 | 0.87 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 1520.5 | 3750 | 0.41 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 23935 | 23319.5 | 1.03 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 5125 | 5292 | 0.97 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 5042 | 5000 | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 5458 | 5416 | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 5084 | 5000 | 1.02 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 176841 | 226307 | 0.78 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 7292 | 7459 | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 8166 | 7375 | 1.11 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 7541 | 7792 | 0.97 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 5167 | 5042 | 1.02 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 80940833 | 81067334 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 41092709 | 48673125 | 0.84 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 45570541 | 43747500 | 1.04 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 153559792 | 153700375 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2660311 | 2718893 | 0.98 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 621714834 | 621060459 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 421739375 | 430659541 | 0.98 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 414510667 | 409758041.5 | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 697568292 | 699041292 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 15148414 | 15621337.5 | 0.97 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 872377937.5 | 875541666.5 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 706482291.5 | 845831187.5 | 0.84 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 1162546146 | 1160340833.5 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 1175739375 | 1177842604 | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
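
Each benchmark entry above pairs two timings in nanoseconds with a ratio, and the listed ratios match the first timing divided by the second. The sketch below reproduces that arithmetic for three of the Dense rows and picks out entries that got slower. It assumes the first timing belongs to the current commit and the second to the previous run; the `BenchRow` class, the `slower_than` helper, and the 1.2 threshold are hypothetical names and values chosen for illustration, not part of the benchmark workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchRow:
    """One benchmark entry: name plus two timings in nanoseconds."""
    name: str
    current_ns: float   # assumed: timing for the current commit
    previous_ns: float  # assumed: timing for the previous run

    @property
    def ratio(self) -> float:
        # Above 1.0 means the current run took longer than the previous one.
        return self.current_ns / self.previous_ns


# Three rows copied from the results above.
ROWS = [
    BenchRow("Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)", 411750, 415291),
    BenchRow("Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s)", 322271, 243167),
    BenchRow("Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s)", 465625, 1221916),
]


def slower_than(rows, threshold=1.2):
    """Return rows whose ratio exceeds the (hypothetical) threshold."""
    return [row for row in rows if row.ratio > threshold]


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.ratio:.2f}  {row.name}")
    for row in slower_than(ROWS):
        print(f"slower: {row.name}")
```

Ratios well below 1.0, such as the 4-thread zygote entry, point the other way: the current run was faster than the previous one.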