arm neon optimization for layernorm fp32/bf16s/fp16s #5038
linux-aarch64-cpu-gcc.yml
on: pull_request
linux-gcc: 27m 16s
linux-gcc-arm82: 21m 51s
linux-gcc-arm86: 12m 9s