Simple patched glibc memcpy implementation to fix issues with Cortex-A72 and device memory.

jnettlet/cortex_a72_memcpy

A simple memcpy replacement that only changes the ordering of
2 stores in the 96 byte code path. This fixes an issue seen when
copying from userspace to memory-mapped device memory such as
framebuffers, and possibly when using a toolkit like SPDK.
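
As an illustration of the kind of copy described above, here is a
minimal sketch that copies a 96 byte block from an ordinary buffer into
a shared mapping and reads it back. It makes several assumptions: the
anonymous mapping is only a stand-in for real device memory (a real
test would mmap a device node such as a framebuffer), and the helper
name copy_block_to_mapping is hypothetical and not part of this
repository. It does not reproduce the store ordering issue itself.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define BLOCK_SIZE 96 /* matches the 96 byte code path named in the README */

/* Hypothetical helper: returns 0 if the copied bytes read back intact,
 * -1 if the mapping could not be created, 1 on a mismatch. */
int copy_block_to_mapping(void)
{
    unsigned char src[BLOCK_SIZE];
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        src[i] = (unsigned char)i;

    /* Assumption: an anonymous shared mapping stands in for device memory. */
    size_t map_len = 4096;
    unsigned char *dst = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (dst == MAP_FAILED)
        return -1;

    /* Call through a volatile pointer so the compiler cannot inline the
     * copy; the dynamically resolved memcpy is the one that runs. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;
    copy(dst, src, BLOCK_SIZE);

    int result = memcmp(dst, src, BLOCK_SIZE) == 0 ? 0 : 1;
    munmap(dst, map_len);
    return result;
}
```

Calling copy_block_to_mapping() from a small main and running that
program with and without the preloaded library is one way to exercise
the same copy size against both implementations.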

To build, run make, then copy the resulting libmemcpy.so to any
preferred library path.

The library can be tested with LD_PRELOAD="./libmemcpy.so" application
or made globally available by adding the full path to the library
to the file /etc/ld.so.preload.
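
To check that the preload actually took effect, a program can look for
the library in its own memory map. The sketch below assumes a Linux
system that provides /proc/self/maps; the helper names stream_mentions
and library_is_mapped are hypothetical and not part of this repository.
It only confirms that the shared object is mapped into the process, not
which memcpy a given call resolves to.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if any line of the stream contains needle, else 0. */
int stream_mentions(FILE *fp, const char *needle)
{
    char line[4096];
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, needle) != NULL)
            return 1;
    }
    return 0;
}

/* Returns 1 if needle appears in /proc/self/maps, 0 if it does not,
 * and -1 if the file cannot be opened (for example, not on Linux). */
int library_is_mapped(const char *needle)
{
    FILE *fp = fopen("/proc/self/maps", "r");
    if (fp == NULL)
        return -1;
    int found = stream_mentions(fp, needle);
    fclose(fp);
    return found;
}
```

A program started with LD_PRELOAD="./libmemcpy.so" would be expected to
get 1 back from library_is_mapped("libmemcpy.so"), and 0 when started
without the preload.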

On the memcpy benchmark from Arm's optimized library, this implementation
is now roughly equivalent to or faster than both glibc's memcpy and the
optimized library's memcpy. On the Cortex-A72 it matches or beats the
other implementations in most of the results below.

Random memcpy (bytes/ns):
          __memcpy_a72 32K: 2.74 64K: 2.41 128K: 2.20 256K: 2.04 512K: 1.73 1024K: 1.47 avg 2.01
      __memcpy_aarch64 32K: 2.65 64K: 2.40 128K: 2.22 256K: 2.04 512K: 1.74 1024K: 1.49 avg 2.01
 __memcpy_aarch64_simd 32K: 2.64 64K: 2.39 128K: 2.21 256K: 2.06 512K: 1.75 1024K: 1.50 avg 2.02
                memcpy 32K: 2.58 64K: 2.32 128K: 2.16 256K: 2.00 512K: 1.72 1024K: 1.47 avg 1.97
           memcpy_call 32K: 2.72 64K: 2.42 128K: 2.22 256K: 2.06 512K: 1.75 1024K: 1.48 avg 2.02

Aligned medium memcpy (bytes/ns):
          __memcpy_a72 8B: 1.95 16B: 4.40 32B: 8.79 64B: 12.80 128B: 12.79 256B: 15.58 512B: 16.55 
      __memcpy_aarch64 8B: 1.35 16B: 2.93 32B: 5.86 64B: 10.83 128B: 11.73 256B: 15.57 512B: 16.56 
 __memcpy_aarch64_simd 8B: 1.35 16B: 2.93 32B: 5.86 64B: 11.73 128B: 11.73 256B: 14.81 512B: 16.08 
                memcpy 8B: 1.17 16B: 2.51 32B: 5.03 64B: 9.38 128B: 11.73 256B: 15.21 512B: 16.32 
           memcpy_call 8B: 1.60 16B: 3.52 32B: 7.04 64B: 11.73 128B: 11.73 256B: 15.21 512B: 16.32 

Unaligned medium memcpy (bytes/ns):
          __memcpy_a72 8B: 1.95 16B: 4.40 32B: 8.80 64B: 8.46 128B: 10.42 256B: 13.05 512B: 14.24 
      __memcpy_aarch64 8B: 1.91 16B: 4.40 32B: 8.80 64B: 8.40 128B: 7.42 256B: 13.06 512B: 14.24 
 __memcpy_aarch64_simd 8B: 1.83 16B: 4.40 32B: 8.80 64B: 8.77 128B: 7.22 256B: 10.62 512B: 10.99 
                memcpy 8B: 1.55 16B: 3.52 32B: 7.03 64B: 9.35 128B: 7.77 256B: 11.94 512B: 14.24 
           memcpy_call 8B: 1.60 16B: 3.52 32B: 7.04 64B: 9.60 128B: 7.70 256B: 10.99 512B: 12.94 

Large memcpy (bytes/ns):
          __memcpy_a72 1K: 17.06 2K: 17.32 4K: 16.99 8K: 17.26 16K: 17.11 32K: 15.01 64K: 12.32 
      __memcpy_aarch64 1K: 17.06 2K: 17.31 4K: 16.99 8K: 17.24 16K: 17.13 32K: 15.04 64K: 12.32 
 __memcpy_aarch64_simd 1K: 16.80 2K: 17.19 4K: 16.83 8K: 16.98 16K: 16.96 32K: 14.73 64K: 12.21 
                memcpy 1K: 16.93 2K: 17.25 4K: 16.96 8K: 17.25 16K: 16.90 32K: 14.99 64K: 12.32 
           memcpy_call 1K: 16.93 2K: 17.25 4K: 16.96 8K: 17.20 16K: 16.80 32K: 15.00 64K: 12.32 

Unaligned forwards memmove (bytes/ns):
          __memcpy_a72 1K: 16.10 2K: 16.80 4K: 16.65 8K: 17.04 16K: 16.61 32K: 14.61 64K: 11.96 
      __memcpy_aarch64 1K: 16.09 2K: 16.78 4K: 16.62 8K: 17.03 16K: 16.62 32K: 14.60 64K: 11.96 
 __memcpy_aarch64_simd 1K: 10.76 2K: 10.69 4K: 10.63 8K: 10.63 16K: 10.43 32K: 9.94 64K: 10.10 
                memcpy 1K: 15.89 2K: 16.63 4K: 16.66 8K: 17.04 16K: 16.20 32K: 14.47 64K: 11.98 

Unaligned backwards memmove (bytes/ns):
          __memcpy_a72 1K: 15.80 2K: 16.68 4K: 16.48 8K: 17.01 16K: 16.70 32K: 14.64 64K: 11.69 
      __memcpy_aarch64 1K: 16.15 2K: 16.68 4K: 16.59 8K: 17.05 16K: 16.68 32K: 14.65 64K: 11.67 
 __memcpy_aarch64_simd 1K: 11.96 2K: 12.07 4K: 11.98 8K: 12.11 16K: 12.02 32K: 11.54 64K: 11.05 
                memcpy 1K: 15.95 2K: 16.72 4K: 16.70 8K: 17.14 16K: 16.08 32K: 14.06 64K: 6.54 
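
For readers who want a rough feel for how a bytes/ns figure is derived,
the following is a minimal sketch that times repeated copies of one
size and divides the bytes moved by the elapsed nanoseconds. It is not
the benchmark used for the numbers above. The helper name
memcpy_bytes_per_ns is hypothetical, and the use of the C11
timespec_get wall clock is a simplifying assumption; a monotonic clock
would be a better choice for careful measurements.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copies `size` bytes `iterations` times and returns
 * the throughput in bytes per nanosecond, or -1.0 on failure. */
double memcpy_bytes_per_ns(size_t size, size_t iterations)
{
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL || size == 0 || iterations == 0) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, size);

    /* A volatile function pointer keeps the compiler from inlining or
     * eliding the copies, so the dynamically resolved memcpy is timed. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;

    struct timespec start, end;
    timespec_get(&start, TIME_UTC); /* wall clock: a simplification */
    for (size_t i = 0; i < iterations; i++)
        copy(dst, src, size);
    timespec_get(&end, TIME_UTC);

    double ns = (double)(end.tv_sec - start.tv_sec) * 1e9
              + (double)(end.tv_nsec - start.tv_nsec);
    int intact = memcmp(dst, src, size) == 0;
    free(src);
    free(dst);
    if (!intact || ns <= 0.0)
        return -1.0;
    return (double)size * (double)iterations / ns;
}
```

For example, memcpy_bytes_per_ns(65536, 2000) times 2000 copies of
64K. Results from a sketch like this will vary with the machine, cache
state, and clock resolution, so they are not directly comparable to
the tables above.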
