The current official build of libtorch for macOS runs on only a single thread, regardless of the thread settings:
As we can see, the above resnet process has 23 threads, but only one of them is active. The reason seems to be that the official build of libtorch doesn't link against libmkl.dylib:
$ otool -L macos/libtorch/lib/libtorch_cpu.dylib
macos/libtorch/lib/libtorch_cpu.dylib:
@rpath/libtorch_cpu.dylib (compatibility version 0.0.0, current version 0.0.0)
@rpath/libtensorpipe.dylib (compatibility version 0.0.0, current version 0.0.0)
@rpath/libiomp5.dylib (compatibility version 5.0.0, current version 5.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.50.4)
@rpath/libc10.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 400.9.0)
As a result, resnet with the official libtorch on macOS reaches a throughput of only about 1.55 samples/sec.
In contrast, the Homebrew version of libtorch works well with multiple threads (the Homebrew version links against the Apple Accelerate framework rather than libmkl.dylib):
$ otool -L /usr/local/Cellar/libtorch/1.6.0_1/lib/libtorch_cpu.dylib
/usr/local/Cellar/libtorch/1.6.0_1/lib/libtorch_cpu.dylib:
/usr/local/opt/libtorch/lib/libtorch_cpu.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/local/opt/libomp/lib/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
/usr/local/opt/protobuf/lib/libprotobuf.24.dylib (compatibility version 25.0.0, current version 25.0.0)
/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)
@rpath/libc10.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 902.1.0)
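The two `otool -L` listings can also be compared programmatically. Below is a minimal sketch that parses such output and reports which math or threading backends appear in it. The function names (`linked_libraries`, `detect_backends`) and the `BACKENDS` mapping are hypothetical helpers written for illustration, and the mapping is not exhaustive; the sample strings are shortened copies of the listings above.

```python
import re

# Illustrative, non-exhaustive mapping from a path substring to a backend name.
# This table is an assumption made for the sketch, not something otool provides.
BACKENDS = {
    "libmkl": "Intel MKL",
    "Accelerate.framework": "Apple Accelerate",
    "libiomp5": "Intel OpenMP runtime",
    "libomp": "LLVM OpenMP runtime",
}


def linked_libraries(otool_output):
    """Return the dependency paths listed in `otool -L` output."""
    paths = []
    for line in otool_output.splitlines():
        # Dependency lines look like:
        #   <path> (compatibility version X, current version Y)
        match = re.match(r"\s*(\S.*?)\s+\(compatibility version", line)
        if match:
            paths.append(match.group(1))
    return paths


def detect_backends(otool_output):
    """Return the sorted names of known backends found among the dependencies."""
    found = set()
    for path in linked_libraries(otool_output):
        for needle, name in BACKENDS.items():
            if needle in path:
                found.add(name)
    return sorted(found)


OFFICIAL = """macos/libtorch/lib/libtorch_cpu.dylib:
@rpath/libtorch_cpu.dylib (compatibility version 0.0.0, current version 0.0.0)
@rpath/libiomp5.dylib (compatibility version 5.0.0, current version 5.0.0)
@rpath/libc10.dylib (compatibility version 0.0.0, current version 0.0.0)
"""

HOMEBREW = """/usr/local/Cellar/libtorch/1.6.0_1/lib/libtorch_cpu.dylib:
/usr/local/opt/libomp/lib/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
@rpath/libc10.dylib (compatibility version 0.0.0, current version 0.0.0)
"""

print("official:", detect_backends(OFFICIAL))
print("homebrew:", detect_backends(HOMEBREW))
```

On a real system, the input could come from `subprocess.run(["otool", "-L", path], capture_output=True, text=True).stdout` instead of the hard-coded strings.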
The Homebrew libtorch starts about 50 threads and has 6-10 of them running:
As a result, resnet on macOS with the Homebrew libtorch has a throughput of about 2.21 to 3.01 samples/sec, roughly 43% to 94% faster than the official version.
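For reference, the relative speedup follows directly from the throughput figures reported above. This is a small sketch of that arithmetic; `speedup_percent` is a hypothetical helper name, and the inputs are the measurements quoted in this report.

```python
def speedup_percent(baseline, candidate):
    """Percentage by which `candidate` throughput exceeds `baseline` throughput."""
    return (candidate / baseline - 1.0) * 100.0


official = 1.55                            # samples/sec, official libtorch
homebrew_low, homebrew_high = 2.21, 3.01   # samples/sec, Homebrew libtorch

print(f"{speedup_percent(official, homebrew_low):.0f}%")   # 43%
print(f"{speedup_percent(official, homebrew_high):.0f}%")  # 94%
```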