-
Notifications
You must be signed in to change notification settings - Fork 723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable sha512 instruction extension on Apple silicon #1960
Conversation
Tested on M1 and time consumed by sha512 is reduced to half.
#[cfg(all(target_os = "macos", target_arch = "aarch64"))] | ||
fn detect_features() -> u32 { | ||
1 << 6 // sha512 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
above in the features!
macro invocation, we have:
// Keep in sync with `ARMV8_SHA256`.
"sha2" => SHA256 {
mask: 1 << 4,
},
Apparently in C and in Rust, the "sha3" Aarch64 CPU feature flag is overloaded for both SHA-3 and SHA-512 CPU features. So you should add an entry:
// Keep in sync with `ARMV8_SHA512`.
// "sha3" is overloaded for both SHA-3 and SHA512.
"sha3" => SHA512 {
mask: 1 << 6,
},
Then you will not need to add a detect_features()
because it will be statically detected on aarch64-apple-darwin:
$ rustc --print=cfg --target aarch64-apple-darwin | grep sha
target_feature="sha2"
target_feature="sha3"
# MSRV
$ rustc +1.61.0 --print=cfg --target aarch64-apple-darwin | grep sha
target_feature="sha2"
target_feature="sha3"
Also below, add an assertion analogous to const _AARCH64_APPLE_TARGETS_EXPECTED_FEATURES
called _AARCH64_APPLE_DARWIN_TARGETS_EXPECTED_FEATURES
that asserts that the "sha3" feature is enabled statically for aarch64-apple-darwin.
If you want to do the same for aarch64-apple-ios, then we have to use dynamic feature detection because aarch64-apple-ios doesn't enable it statically:
$ rustc +1.61.0 --print=cfg --target aarch64-apple-ios | grep sha
target_feature="sha2"
The BoringSSL implementation of this at https://boringssl.googlesource.com/boringssl/+/1e15682f1a4bb64c48b84884976a2b5c4201e878%5E%21/ demonstrates how to do that dynamic feature detection for iOS and also for other targets.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1960 +/- ##
=======================================
Coverage 96.29% 96.29%
=======================================
Files 135 135
Lines 20663 20669 +6
Branches 226 226
=======================================
+ Hits 19898 19904 +6
Misses 730 730
Partials 35 35 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for these improvements. This is looking pretty good to me.
I would appreciate it if you could, when you make the requested changes, squash everything into a single commit.
I can see that the code coverage measurement is not going to like this as we don't have CI for non-macos Apple targets. So as a temporary measure, we'll need to add a second commit that filters out SHA512.mask
from ARMCAP_STATIC
if the target is an apple target, so that we ignore the statically-indicated presence of the feature and then fall back to the dynamic detection logic on macosx too (all apple targets). This way we can meaningfully say we are covering this code path to some extent.
let mut value: core::ffi::c_int = 0; | ||
let mut len = core::mem::size_of_val(&value); | ||
let rc = unsafe { | ||
sysctlbyname( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a comment "// TODO(MSRV 1.77): Use c"..."
literal."
oldlenp: *mut crate::c::size_t, | ||
newp: *mut core::ffi::c_void, | ||
newlen: crate::c::size_t, | ||
) -> core::ffi::c_int; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
core::ffi::c_int
is too new for ring to use because of the MSRV, so you have to use c::int
. (c_void
is fine to use.)
I think we should just use the libc crate to call this, if that works. We have a goal to avoid libc dependencies but it seems like this function is provided by the OS's libc so we can't get around libc anyway, just like with getauxval
on Linux/Android. And hopefully using the libc crate to call it will handle any/all linking-related aspects.
let rc = unsafe { | ||
sysctlbyname( | ||
"hw.optional.armv8_2_sha512\0".as_ptr(), | ||
&mut value as *mut _ as _, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use as _
style casts but the clippy job in CI doesn't run on this target. Instead you can add this polyfill to ring::polyfill
:
// TODO(MSRV 1.76): Replace with `core::ptr::from_mut`.
#[inline(always)]
pub fn ptr_from_mut<T: ?Sized>(r: &mut T) -> *mut T {
<*mut T>::from(r)
}
and then use ptr_from_mut(&mut x).cast::<c_void>()
.
|
||
extern "C" { | ||
fn sysctlbyname( | ||
name: *const u8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be c_char
.
let mut len = core::mem::size_of_val(&value); | ||
let rc = unsafe { | ||
sysctlbyname( | ||
"hw.optional.armv8_2_sha512\0".as_ptr(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need a .cast<c_char>()
here.
@@ -172,13 +180,46 @@ fn detect_features() -> u32 { | |||
features | |||
} | |||
|
|||
#[cfg(all(target_os = "ios", target_arch = "aarch64"))] | |||
fn detect_features() -> u32 { | |||
let mut features = ARMCAP_STATIC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let use this:
use core::{mem::size_of_val, ffi:{...}, ...};
use c::{...};
to avoid having to qualify everything.
0, | ||
) | ||
}; | ||
if rc == 0 && len == core::mem::size_of_val(&value) && value != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better for us to have wrapper for sysctlbyname
that looks something like this:
fn detect_feature(null_terminated_name: &[c_char]) -> bool {
debug_assert_eq!(null_terminated_name.last(), Some(0));
let mut value = 0;
let mut len = core::mem::size_of_val(&value);
... yada yada ...
if rc != 0 {
return false;
}
if len != size_of_val(&value) {
return false; // unreachable
}
value != 0
}
This will allow code coverage measurement to show us in more detail if we're triggering the error/impossible condition(s).
This was merged as part of PR #1978. Thanks for submitting this! |
Tested on M1 and time consumed by sha512 is reduced to half.