ci: run tests only on 1.10 for now #975
Merged
avik-pal changed the title from "ci: run tests only on 1.10 for now" to "ci: run tests only on 1.10 for now" on Oct 8, 2024
Benchmark Results (ASV)
Benchmark Plots: A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Lux Benchmarks
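When reading the table below, note that each Ratio value is consistent with the Current time divided by the Previous time (for example, 411645.5 / 415291 ≈ 0.99), so a ratio above 1 means the current commit was slower on that benchmark. The following is a minimal Python sketch of that arithmetic, using three rows copied from the table. The `classify` helper and its 1.25 threshold are hypothetical and chosen only for illustration; they are not taken from the benchmark workflow.

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current / previous; values above 1.0 mean the current run is slower."""
    return current_ns / previous_ns


def classify(r: float, threshold: float = 1.25) -> str:
    """Label a ratio using a hypothetical symmetric threshold (an assumption)."""
    if r > threshold:
        return "slower"
    if r < 1.0 / threshold:
        return "faster"
    return "unchanged"


# (benchmark name, current ns, previous ns), copied from the results table
rows = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)", 411645.5, 415291.0),
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s)", 322937.5, 243167.0),
    ("Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s)", 13875.0, 329062.5),
]

for name, current, previous in rows:
    r = ratio(current, previous)
    print(f"{name}: ratio {r:.2f} ({classify(r)})")
```

Single-run timings on shared CI machines can be noisy, so a large ratio in either direction is better treated as a prompt to re-run the benchmark than as a confirmed change.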
Benchmark suite | Current: 67d259c | Previous: d230834 | Ratio |
---|---|---|---|
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) |
411645.5 ns |
415291 ns |
0.99 |
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) |
322937.5 ns |
243167 ns |
1.33 |
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) |
322041 ns |
244625 ns |
1.32 |
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) |
739541 ns |
740667 ns |
1.00 |
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA |
43450 ns |
44725 ns |
0.97 |
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) |
592583.5 ns |
1279354.5 ns |
0.46 |
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) |
2412000 ns |
1221916 ns |
1.97 |
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) |
13927209 ns |
16280791 ns |
0.86 |
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) |
2248999.5 ns |
2240458 ns |
1.00 |
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA |
188422 ns |
203277 ns |
0.93 |
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) |
734083 ns |
1383187.5 ns |
0.53 |
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) |
2603292 ns |
1309667 ns |
1.99 |
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) |
14021833.5 ns |
16210875 ns |
0.86 |
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) |
2236271.5 ns |
2235875 ns |
1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) |
1631250 ns |
1666375 ns |
0.98 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) |
1080125 ns |
1104041.5 ns |
0.98 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) |
1555750 ns |
1509958 ns |
1.03 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) |
2947354 ns |
2989666 ns |
0.99 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA |
209946.5 ns |
213111 ns |
0.99 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) |
12284250 ns |
12146875 ns |
1.01 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) |
8808604.5 ns |
8841167 ns |
1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) |
9229438 ns |
9243875 ns |
1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) |
18579958 ns |
18585666.5 ns |
1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA |
1913542 ns |
1936768 ns |
0.99 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) |
17282645.5 ns |
17311083.5 ns |
1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) |
13958458 ns |
13983375 ns |
1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) |
14540625.5 ns |
14496187.5 ns |
1.00 |
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) |
21833708 ns |
21837875 ns |
1.00 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) |
121488833 ns |
250126228.5 ns |
0.49 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) |
148322083 ns |
148997875 ns |
1.00 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) |
116063000 ns |
116519479.5 ns |
1.00 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) |
446423041 ns |
446906458 ns |
1.00 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA |
5496616 ns |
5468434 ns |
1.01 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) |
596935833.5 ns |
1223788875 ns |
0.49 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) |
931042833 ns |
933142709 ns |
1.00 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) |
829999521 ns |
832839417 ns |
1.00 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) |
1630204291 ns |
1630170292 ns |
1.00 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA |
35312892 ns |
31512911.5 ns |
1.12 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) |
711570895.5 ns |
1149549375 ns |
0.62 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) |
1015965291.5 ns |
997374541.5 ns |
1.02 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) |
1327562479.5 ns |
1308662646 ns |
1.01 |
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) |
1729988979 ns |
1731062979.5 ns |
1.00 |
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) |
866979.5 ns |
1122500 ns |
0.77 |
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) |
1620750 ns |
1658708 ns |
0.98 |
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) |
3533895.5 ns |
3605667 ns |
0.98 |
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) |
788625 ns |
782708.5 ns |
1.01 |
lenet(28, 28, 1, 32)/forward/GPU/CUDA |
271924 ns |
284470.5 ns |
0.96 |
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) |
2708167 ns |
2990375 ns |
0.91 |
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) |
4121417 ns |
4122208 ns |
1.00 |
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) |
9736417 ns |
10934125 ns |
0.89 |
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) |
3152021 ns |
3140208 ns |
1.00 |
lenet(28, 28, 1, 32)/zygote/GPU/CUDA |
1079439 ns |
1127614 ns |
0.96 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) |
2240062.5 ns |
2349749.5 ns |
0.95 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) |
1473583 ns |
1366187.5 ns |
1.08 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) |
1700333 ns |
1585125 ns |
1.07 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) |
4359917 ns |
4341687 ns |
1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA |
211937.5 ns |
211956.5 ns |
1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) |
20399166 ns |
20292146 ns |
1.01 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) |
16953541.5 ns |
16982750 ns |
1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) |
18292375 ns |
18160625 ns |
1.01 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) |
26726958 ns |
26736042 ns |
1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA |
1984165 ns |
2009275 ns |
0.99 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) |
45070167 ns |
44384292 ns |
1.02 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) |
41013229.5 ns |
41010166.5 ns |
1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) |
41270708.5 ns |
41252542 ns |
1.00 |
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) |
47745792 ns |
47742354 ns |
1.00 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) |
4313167 ns |
4667229 ns |
0.92 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) |
2855708 ns |
2627145.5 ns |
1.09 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) |
3013645.5 ns |
2754166 ns |
1.09 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) |
8659709 ns |
8646833.5 ns |
1.00 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA |
516999 ns |
471691 ns |
1.10 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) |
39932125 ns |
40759792 ns |
0.98 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) |
33968416 ns |
34074937.5 ns |
1.00 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) |
34069417 ns |
34004708 ns |
1.00 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) |
53668500 ns |
53724708 ns |
1.00 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA |
3241641 ns |
3235352 ns |
1.00 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) |
90463791 ns |
110050750 ns |
0.82 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) |
136157500 ns |
137101500 ns |
0.99 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) |
254064979 ns |
251499542 ns |
1.01 |
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) |
95920292 ns |
96734833 ns |
0.99 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) |
142447792 ns |
270582500 ns |
0.53 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) |
160728542 ns |
157462229 ns |
1.02 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) |
128320500 ns |
124550542 ns |
1.03 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) |
489398208 ns |
489233625 ns |
1.00 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA |
7076007 ns |
7003527 ns |
1.01 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) |
878198000 ns |
1494868312.5 ns |
0.59 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) |
1159512542 ns |
1205204209 ns |
0.96 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) |
1091950937.5 ns |
1091914979 ns |
1.00 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) |
2031129145.5 ns |
2033756875 ns |
1.00 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA |
33948505 ns |
34486848.5 ns |
0.98 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) |
1675439708 ns |
2031846083.5 ns |
0.82 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) |
1856136458 ns |
1856502416 ns |
1.00 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) |
2152323750 ns |
2218211729 ns |
0.97 |
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) |
2557724375 ns |
2563679583 ns |
1.00 |
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) |
1489375 ns |
2093250 ns |
0.71 |
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) |
3016271 ns |
3113375 ns |
0.97 |
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) |
7413417 ns |
9724750 ns |
0.76 |
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) |
2315437.5 ns |
2446083.5 ns |
0.95 |
lenet(28, 28, 1, 128)/forward/GPU/CUDA |
269701.5 ns |
275113 ns |
0.98 |
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) |
7569375 ns |
9682833 ns |
0.78 |
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) |
12022729 ns |
12076166 ns |
1.00 |
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) |
25932458 ns |
24267792 ns |
1.07 |
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) |
11703000 ns |
11496500 ns |
1.02 |
lenet(28, 28, 1, 128)/zygote/GPU/CUDA |
1118633.5 ns |
1185320 ns |
0.94 |
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) |
184907416.5 ns |
380917104.5 ns |
0.49 |
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) |
284546083.5 ns |
315455208 ns |
0.90 |
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) |
241426750 ns |
265045166.5 ns |
0.91 |
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) |
452991354 ns |
453577208.5 ns |
1.00 |
vgg16(32, 32, 3, 32)/forward/GPU/CUDA |
5112454 ns |
4825872 ns |
1.06 |
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) |
645024208 ns |
1157170792 ns |
0.56 |
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) |
997483375 ns |
976146875 ns |
1.02 |
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) |
925202750 ns |
1071077458 ns |
0.86 |
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) |
1397551666 ns |
1399279583 ns |
1.00 |
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA |
17373860 ns |
18526493 ns |
0.94 |
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) |
1087083 ns |
1057416 ns |
1.03 |
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) |
2091125 ns |
1660750 ns |
1.26 |
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) |
5668875 ns |
5839187.5 ns |
0.97 |
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) |
1359020.5 ns |
1297896 ns |
1.05 |
lenet(28, 28, 1, 64)/forward/GPU/CUDA |
272985 ns |
270186.5 ns |
1.01 |
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) |
6008312.5 ns |
6497437.5 ns |
0.92 |
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) |
12422354 ns |
13095667 ns |
0.95 |
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) |
19220333 ns |
19774958 ns |
0.97 |
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) |
6086791 ns |
6060250 ns |
1.00 |
lenet(28, 28, 1, 64)/zygote/GPU/CUDA |
1148718.5 ns |
1207468 ns |
0.95 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) |
23312791.5 ns |
70439459 ns |
0.33 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) |
43424291.5 ns |
43880645.5 ns |
0.99 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) |
39621249.5 ns |
39802542 ns |
1.00 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) |
132704688 ns |
132617229.5 ns |
1.00 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA |
1838219.5 ns |
1928198.5 ns |
0.95 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) |
183969250 ns |
354773521 ns |
0.52 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) |
270639125 ns |
271527854 ns |
1.00 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) |
254150458 ns |
253115833 ns |
1.00 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) |
534737354 ns |
534735167 ns |
1.00 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA |
16495205 ns |
13227623 ns |
1.25 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) |
296709063 ns |
395827000 ns |
0.75 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) |
408346313 ns |
373039667 ns |
1.09 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) |
675605375 ns |
703091167 ns |
0.96 |
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) |
713172208 ns |
714378250 ns |
1.00 |
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) |
658745500 ns |
1187937250 ns |
0.55 |
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) |
691800875 ns |
839767834 ns |
0.82 |
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) |
627903750 ns |
640628833 ns |
0.98 |
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) |
1771393041.5 ns |
1772779750.5 ns |
1.00 |
vgg16(32, 32, 3, 128)/forward/GPU/CUDA |
12470963 ns |
12386874 ns |
1.01 |
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) |
1896686208.5 ns |
3628821667 ns |
0.52 |
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) |
2821462667 ns |
2842192167 ns |
0.99 |
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) |
2708179792 ns |
2716722458 ns |
1.00 |
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) |
5031662417 ns |
5042550875 ns |
1.00 |
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA |
49800691 ns |
49688646 ns |
1.00 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) |
3056042 ns |
3430062.5 ns |
0.89 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) |
2073917 ns |
2069021 ns |
1.00 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) |
2534959 ns |
2518417 ns |
1.01 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) |
6039604 ns |
6032959 ns |
1.00 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA |
581352.5 ns |
573246 ns |
1.01 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) |
25564458 ns |
26098500 ns |
0.98 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) |
19122958 ns |
19045208 ns |
1.00 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) |
19457084 ns |
19561125 ns |
0.99 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) |
39399583.5 ns |
39345062.5 ns |
1.00 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA |
3193363 ns |
3186388 ns |
1.00 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) |
35357833.5 ns |
55895354 ns |
0.63 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) |
82465979.5 ns |
83953562.5 ns |
0.98 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) |
173616833.5 ns |
177984916 ns |
0.98 |
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) |
45586854 ns |
45586542 ns |
1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) |
1658104.5 ns |
1786000.5 ns |
0.93 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) |
1103708 ns |
1108812.5 ns |
1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) |
1584042 ns |
1583271 ns |
1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) |
3040583 ns |
3031458.5 ns |
1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA |
215913 ns |
216476 ns |
1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) |
12742166 ns |
12561896 ns |
1.01 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) |
9206125 ns |
9222083 ns |
1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) |
9678249.5 ns |
9681604.5 ns |
1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) |
19021750 ns |
18991354 ns |
1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA |
1940651.5 ns |
1983529 ns |
0.98 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) |
17683791.5 ns |
17661854.5 ns |
1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) |
14325396.5 ns |
14350708 ns |
1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) |
14622000 ns |
14571666 ns |
1.00 |
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) |
22174333 ns |
22207958 ns |
1.00 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) |
23780042 ns |
70523437 ns |
0.34 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) |
43554208.5 ns |
43757146 ns |
1.00 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) |
39504333 ns |
39692875 ns |
1.00 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) |
132670229.5 ns |
132543875 ns |
1.00 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA |
1845256.5 ns |
1868597 ns |
0.99 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) |
191180395.5 ns |
358019500 ns |
0.53 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) |
347504021 ns |
348616458.5 ns |
1.00 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) |
303868146 ns |
304684062.5 ns |
1.00 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) |
726867708 ns |
726741083 ns |
1.00 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA |
13917534 ns |
14313431.5 ns |
0.97 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) |
301366583.5 ns |
420910145.5 ns |
0.72 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) |
420040000 ns |
427953667 ns |
0.98 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) |
725917916.5 ns |
711470292 ns |
1.02 |
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) |
718330750 ns |
718110625 ns |
1.00 |
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) |
1914000 ns |
1783333.5 ns |
1.07 |
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) |
1577999.5 ns |
1377417 ns |
1.15 |
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) |
1563229 ns |
1380791 ns |
1.13 |
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) |
2670709 ns |
2616709 ns |
1.02 |
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA |
575822 ns |
569443 ns |
1.01 |
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) |
6147833 ns |
9249354 ns |
0.66 |
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) |
13077125 ns |
15832708.5 ns |
0.83 |
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) |
31966833 ns |
32885020.5 ns |
0.97 |
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) |
10221083.5 ns |
10214250 ns |
1.00 |
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA |
1411574.5 ns |
1406558.5 ns |
1.00 |
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) |
18784479.5 ns |
22309667 ns |
0.84 |
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) |
23789541 ns |
28394500 ns |
0.84 |
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) |
50722250 ns |
56878750 ns |
0.89 |
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) |
18859041 ns |
18878041 ns |
1.00 |
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) |
72708 ns |
690833.5 ns |
0.11 |
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) |
645208.5 ns |
613625 ns |
1.05 |
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) |
1037333 ns |
1078916 ns |
0.96 |
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) |
723792 ns |
724417 ns |
1.00 |
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA |
48478 ns |
47653 ns |
1.02 |
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) |
321229 ns |
1550500 ns |
0.21 |
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) |
1073041.5 ns |
1006604.5 ns |
1.07 |
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) |
1391333.5 ns |
1431333.5 ns |
0.97 |
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) |
2244937.5 ns |
2290167 ns |
0.98 |
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA |
216787 ns |
227007.5 ns |
0.95 |
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) |
432104 ns |
1559479 ns |
0.28 |
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) |
1120771 ns |
1065562.5 ns |
1.05 |
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) |
1403125 ns |
1941250 ns |
0.72 |
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) |
2259020.5 ns |
2187500 ns |
1.03 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) |
3047209 ns |
3412458 ns |
0.89 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) |
2067438 ns |
2060333 ns |
1.00 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) |
2509271 ns |
2504750 ns |
1.00 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) |
6022208 ns |
6004208.5 ns |
1.00 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA |
582721 ns |
571869.5 ns |
1.02 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) |
23636354 ns |
24064937.5 ns |
0.98 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) |
17233750 ns |
17186562.5 ns |
1.00 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) |
17161270.5 ns |
17163520.5 ns |
1.00 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) |
37565083.5 ns |
37576333 ns |
1.00 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA |
3117456 ns |
3169039 ns |
0.98 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) |
33371978.5 ns |
53946459 ns |
0.62 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) |
87222979.5 ns |
83764604.5 ns |
1.04 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) |
169899604.5 ns |
175113292 ns |
0.97 |
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) |
44498458.5 ns |
44468375 ns |
1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) |
122773083 ns |
250717708 ns |
0.49 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) |
148369750 ns |
148723729 ns |
1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) |
115599833 ns |
116337041.5 ns |
0.99 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) |
447883812 ns |
447560562.5 ns |
1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA |
5473384 ns |
5458848 ns |
1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) |
472111084 ns |
1101190667 ns |
0.43 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) |
855488604.5 ns |
856965729.5 ns |
1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) |
827233750 ns |
828981916.5 ns |
1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) |
1751025791 ns |
1751973959 ns |
1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA |
32267631 ns |
29300703 ns |
1.10 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) |
641085708 ns |
1020791479.5 ns |
0.63 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) |
974367167 ns |
981034709 ns |
0.99 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) |
1299244125 ns |
1298484958 ns |
1.00 |
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) |
1727066979 ns |
1724676458.5 ns |
1.00 |
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) |
1281875 ns |
1192334 ns |
1.08 |
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) |
926208 ns |
722208.5 ns |
1.28 |
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) |
925000 ns |
802271 ns |
1.15 |
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) |
2055687 ns |
2055959 ns |
1.00 |
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA |
572649.5 ns |
554738 ns |
1.03 |
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) |
2956041 ns |
5970125 ns |
0.50 |
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) |
6528541.5 ns |
9028833 ns |
0.72 |
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) |
25140500 ns |
27064125 ns |
0.93 |
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) |
7117667 ns |
7113729 ns |
1.00 |
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA |
1376682 ns |
1360766 ns |
1.01 |
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) |
6629417 ns |
9717625 ns |
0.68 |
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) |
13121959 ns |
16161979 ns |
0.81 |
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) |
31633541.5 ns |
34006416.5 ns |
0.93 |
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) |
7727083 ns |
7613041 ns |
1.01 |
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) |
39000 ns |
386625 ns |
0.10 |
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) |
369125 ns |
466375 ns |
0.79 |
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) |
2060917 ns |
2797833 ns |
0.74 |
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) |
91500 ns |
91041.5 ns |
1.01 |
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA |
28479 ns |
28215 ns |
1.01 |
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) |
175271 ns |
410729 ns |
0.43 |
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) |
456875 ns |
458375 ns |
1.00 |
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) |
4747958 ns |
4385625 ns |
1.08 |
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) |
278792 ns |
273062.5 ns |
1.02 |
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA |
222945.5 ns |
212092.5 ns |
1.05 |
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) |
442208 ns |
682000 ns |
0.65 |
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) |
729125 ns |
731083.5 ns |
1.00 |
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) |
4992042 ns |
4635250 ns |
1.08 |
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) |
510708.5 ns |
510917 ns |
1.00 |
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) |
13875 ns |
329062.5 ns |
0.04 |
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) |
307479 ns |
405333 ns |
0.76 |
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) |
742729 ns |
775209 ns |
0.96 |
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) |
54562.5 ns |
53250 ns |
1.02 |
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA |
28484 ns |
27988 ns |
1.02 |
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) |
25708 ns |
358354 ns |
0.07 |
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) |
338916.5 ns |
340937.5 ns |
0.99 |
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) |
863958 ns |
667854 ns |
1.29 |
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) |
151875 ns |
151583 ns |
1.00 |
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA |
212558.5 ns |
199391.5 ns |
1.07 |
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) |
45958 ns |
372916.5 ns |
0.12 |
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) |
353000 ns |
354896 ns |
0.99 |
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) |
707124.5 ns |
585667 ns |
1.21 |
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) |
151208 ns |
151375 ns |
1.00 |
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) |
319447750 ns |
600844791 ns |
0.53 |
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) |
426610458.5 ns |
434479500 ns |
0.98 |
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) |
375378312 ns |
395023625 ns |
0.95 |
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) |
871258750 ns |
872456875 ns |
1.00 |
vgg16(32, 32, 3, 64)/forward/GPU/CUDA |
7669123 ns |
7629063.5 ns |
1.01 |
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) |
1100131354 ns |
1996796291.5 ns |
0.55 |
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) |
1622570062 ns |
1637741500 ns |
0.99 |
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) |
1650938062.5 ns |
1582333333.5 ns |
1.04 |
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) |
2689218667 ns |
2658961958 ns |
1.01 |
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA |
27018808 ns |
26619843 ns |
1.01 |
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) |
192917 ns |
532479 ns |
0.36 |
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) |
438750 ns |
405208 ns |
1.08 |
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) |
1662083 ns |
2880604.5 ns |
0.58 |
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) |
872375 ns |
877791.5 ns |
0.99 |
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA |
48456 ns |
47573 ns |
1.02 |
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) |
1206375 ns |
1905250 ns |
0.63 |
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) |
2525667 ns |
1799584 ns |
1.40 |
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) |
14712021 ns |
16464375 ns |
0.89 |
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) |
2780833 ns |
2818750 ns |
0.99 |
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA |
229439 ns |
239346.5 ns |
0.96 |
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) |
2304041.5 ns |
2932000 ns |
0.79 |
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) |
5195271 ns |
4975687.5 ns |
1.04 |
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) |
15008833 ns |
16759417 ns |
0.90 |
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) |
3672875 ns |
3748708 ns |
0.98 |
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1605250.5 ns | 1367812.5 ns | 1.17 |
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 1257417 ns | 930041 ns | 1.35 |
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1184583 ns | 1056709 ns | 1.12 |
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2332750 ns | 2313729.5 ns | 1.01 |
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 564450.5 ns | 567030 ns | 1.00 |
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 3206145.5 ns | 5196958 ns | 0.62 |
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 4660438 ns | 8601584 ns | 0.54 |
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 25391500 ns | 26184083.5 ns | 0.97 |
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7348916 ns | 7337728.5 ns | 1.00 |
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1322967.5 ns | 1330201.5 ns | 0.99 |
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 8828437.5 ns | 11580375 ns | 0.76 |
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 14136583 ns | 18587958.5 ns | 0.76 |
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 34282271 ns | 37621062.5 ns | 0.91 |
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 9545875 ns | 9557791 ns | 1.00 |
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2417 ns | 3041 ns | 0.79 |
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2583 ns | 2792 ns | 0.93 |
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3000 ns | 3375 ns | 0.89 |
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 2875 ns | 2854 ns | 1.01 |
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 24442 ns | 25102 ns | 0.97 |
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 7500 ns | 7125 ns | 1.05 |
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 6917 ns | 6958 ns | 0.99 |
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7333 ns | 7875 ns | 0.93 |
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7500 ns | 7083 ns | 1.06 |
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 186841 ns | 201877.5 ns | 0.93 |
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8666 ns | 8292 ns | 1.05 |
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8208 ns | 8333 ns | 0.98 |
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8500 ns | 8542 ns | 1.00 |
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 6000 ns | 5958 ns | 1.01 |
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10708.5 ns | 10813 ns | 0.99 |
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 15917 ns | 13916 ns | 1.14 |
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 10666.5 ns | 11312.5 ns | 0.94 |
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 8583 ns | 7709 ns | 1.11 |
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 24811 ns | 25316 ns | 0.98 |
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 21875 ns | 21583 ns | 1.01 |
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 21375 ns | 21625 ns | 0.99 |
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 21834 ns | 21708 ns | 1.01 |
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 21625 ns | 21500 ns | 1.01 |
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 196501.5 ns | 219161 ns | 0.90 |
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 56667 ns | 53500 ns | 1.06 |
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 53417 ns | 53458 ns | 1.00 |
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 53583 ns | 53542 ns | 1.00 |
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 51125 ns | 51166.5 ns | 1.00 |
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 28584 ns | 28292 ns | 1.01 |
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 28833 ns | 28792 ns | 1.00 |
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 29166.5 ns | 28375 ns | 1.03 |
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46083 ns | 46125 ns | 1.00 |
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 25667 ns | 26235 ns | 0.98 |
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 44000 ns | 229583 ns | 0.19 |
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 272416 ns | 277792 ns | 0.98 |
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 4134937.5 ns | 4446854.5 ns | 0.93 |
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 150250 ns | 145500 ns | 1.03 |
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 167133.5 ns | 197661 ns | 0.85 |
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 68375 ns | 246666.5 ns | 0.28 |
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 290000 ns | 296000 ns | 0.98 |
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 4134250 ns | 4144084 ns | 1.00 |
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 145834 ns | 145750 ns | 1.00 |
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 1708 ns | 1834 ns | 0.93 |
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 1875 ns | 1750 ns | 1.07 |
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2208 ns | 2500 ns | 0.88 |
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 2000 ns | 3750 ns | 0.53 |
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 23056 ns | 23319.5 ns | 0.99 |
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 5542 ns | 5292 ns | 1.05 |
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 5125 ns | 5000 ns | 1.02 |
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 5459 ns | 5416 ns | 1.01 |
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 5250 ns | 5000 ns | 1.05 |
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 171241.5 ns | 226307 ns | 0.76 |
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 8209 ns | 7459 ns | 1.10 |
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 7333 ns | 7375 ns | 0.99 |
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 7708 ns | 7792 ns | 0.99 |
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 5208 ns | 5042 ns | 1.03 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 34121125 ns | 81067334 ns | 0.42 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 49829583 ns | 48673125 ns | 1.02 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 45610000 ns | 43747500 ns | 1.04 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 153620250 ns | 153700375 ns | 1.00 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2656287 ns | 2718893 ns | 0.98 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 398908813 ns | 621060459 ns | 0.64 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 429050459 ns | 430659541 ns | 1.00 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 411940459 ns | 409758041.5 ns | 1.01 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 700408042 ns | 699041292 ns | 1.00 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 15233782 ns | 15621337.5 ns | 0.98 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 746953208 ns | 875541666.5 ns | 0.85 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 839728187.5 ns | 845831187.5 ns | 0.99 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 1151171375 ns | 1160340833.5 ns | 0.99 |
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 1175552896 ns | 1177842604 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
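
For readers checking the table by hand: the Ratio column appears to be the current timing divided by the previous timing, rounded to two decimals, so values below 1.00 mean the current commit was faster. The sketch below reproduces two rows under that assumption; the `ratio` helper is a hypothetical name for illustration and is not part of github-action-benchmark.

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Current time divided by previous time, rounded to two decimals.

    Assumption: this mirrors how the Ratio column was computed; it is
    inferred from the table values, not taken from the tool's source.
    """
    return round(current_ns / previous_ns, 2)


# (current ns, previous ns) pairs copied from two rows of the table.
rows = {
    "vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s)": (1100131354, 1996796291.5),
    "Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s)": (2525667, 1799584),
}

for name, (current, previous) in rows.items():
    print(f"{name}: {ratio(current, previous):.2f}")
```

Under this reading, the first row (0.55) is a speedup and the second (1.40) is a slowdown relative to the previous commit.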
Until EnzymeAD/Enzyme.jl#1358 is resolved, let's run CI only on 1.10.