Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable sha512 instruction extension on Apple silicon #1960

Closed
wants to merge 2 commits into from

Conversation

BusyJay
Copy link
Contributor

@BusyJay BusyJay commented Feb 20, 2024

Tested on M1 and time consumed by sha512 is reduced to half.

digest::oneshot::sha512/0
                        time:   [93.800 ns 94.427 ns 95.059 ns]
                        change: [-55.464% -55.238% -55.020%] (p = 0.00 < 0.05)
                        Performance has improved.

Tested on M1 and time consumed by sha512 is reduced to half.
#[cfg(all(target_os = "macos", target_arch = "aarch64"))]
fn detect_features() -> u32 {
1 << 6 // sha512
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above in the features! macro invocation, we have:

    // Keep in sync with `ARMV8_SHA256`.
    "sha2" => SHA256 {
        mask: 1 << 4,
    },

Apparently in C and in Rust, the "sha3" Aarch64 CPU feature flag is overloaded for both SHA-3 and SHA-512 CPU features. So you should add an entry:

    // Keep in sync with `ARMV8_SHA512`.
    // "sha3" is overloaded for both SHA-3 and SHA512.
    "sha3" => SHA512 {
        mask: 1 << 6,
    },

Then you will not need to add a detect_features() because it will be statically detected on aarch64-apple-darwin:

$ rustc --print=cfg --target aarch64-apple-darwin | grep sha
target_feature="sha2"
target_feature="sha3"
# MSRV
$ rustc +1.61.0 --print=cfg --target aarch64-apple-darwin | grep sha
target_feature="sha2"
target_feature="sha3"

Also below, add an assertion analogous to const _AARCH64_APPLE_TARGETS_EXPECTED_FEATURES called _AARCH64_APPLE_DARWIN_TARGETS_EXPECTED_FEATURES that asserts that the "sha3" feature is enabled statically for aarch64-apple-darwin.

If you want to do the same for aarch64-apple-ios, then we have to use dynamic feature detection because aarch64-apple-ios doesn't enable it statically:

$ rustc +1.61.0 --print=cfg --target aarch64-apple-ios | grep sha
target_feature="sha2"

The BoringSSL implementation of this at https://boringssl.googlesource.com/boringssl/+/1e15682f1a4bb64c48b84884976a2b5c4201e878%5E%21/ demonstrates how to do that dynamic feature detection for iOS and also for other targets.

Copy link

codecov bot commented Feb 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.29%. Comparing base (9cb93e0) to head (09860de).
Report is 13 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1960   +/-   ##
=======================================
  Coverage   96.29%   96.29%           
=======================================
  Files         135      135           
  Lines       20663    20669    +6     
  Branches      226      226           
=======================================
+ Hits        19898    19904    +6     
  Misses        730      730           
  Partials       35       35           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Owner

@briansmith briansmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these improvements. This is looking pretty good to me.

I would appreciate it if you could, when you make the requested changes, squash everything into a single commit.

I can see that the code coverage measurement is not going to like this as we don't have CI for non-macos Apple targets. So as a temporary measure, we'll need to add a second commit that filters out SHA512.mask from ARMCAP_STATIC if the target is an apple target, so that we ignore the statically-indicated presence of the feature and then fall back to the dynamic detection logic on macosx too (all apple targets). This way we can meaningfully say we are covering this code path to some extent.

let mut value: core::ffi::c_int = 0;
let mut len = core::mem::size_of_val(&value);
let rc = unsafe {
sysctlbyname(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment "// TODO(MSRV 1.77): Use c"..." literal."

oldlenp: *mut crate::c::size_t,
newp: *mut core::ffi::c_void,
newlen: crate::c::size_t,
) -> core::ffi::c_int;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

core::ffi::c_int is too new for ring to use because of the MSRV, so you have to use c::int. (c_void is fine to use.)

I think we should just use the libc crate to call this, if that works. We have a goal to avoid libc dependencies but it seems like this function is provided by the OS's libc so we can't get around libc anyway, just like with getauxval on Linux/Android. And hopefully using the libc crate to call it will handle any/all linking-related aspects.

let rc = unsafe {
sysctlbyname(
"hw.optional.armv8_2_sha512\0".as_ptr(),
&mut value as *mut _ as _,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use as _ style casts but the clippy job in CI doesn't run on this target. Instead you can add this polyfill to ring::polyfill:

// TODO(MSRV 1.76): Replace with `core::ptr::from_mut`.
#[inline(always)]
pub fn ptr_from_mut<T: ?Sized>(r: &mut T) -> *mut T {
    <*mut T>::from(r)
}

and then use ptr_from_mut(&mut x).cast::<c_void>().


extern "C" {
fn sysctlbyname(
name: *const u8,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be c_char.

let mut len = core::mem::size_of_val(&value);
let rc = unsafe {
sysctlbyname(
"hw.optional.armv8_2_sha512\0".as_ptr(),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need a .cast<c_char>() here.

@@ -172,13 +180,46 @@ fn detect_features() -> u32 {
features
}

#[cfg(all(target_os = "ios", target_arch = "aarch64"))]
fn detect_features() -> u32 {
let mut features = ARMCAP_STATIC;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let use this:

use core::{mem::size_of_val, ffi:{...}, ...};
use c::{...};

to avoid having to qualify everything.

0,
)
};
if rc == 0 && len == core::mem::size_of_val(&value) && value != 0 {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better for us to have wrapper for sysctlbyname that looks something like this:

fn detect_feature(null_terminated_name: &[c_char]) -> bool {
    debug_assert_eq!(null_terminated_name.last(), Some(0));
    let mut value = 0;
    let mut len = core::mem::size_of_val(&value);
    ... yada yada ...
    if rc != 0 {
      return false;
   }
   if len != size_of_val(&value) {
     return false; // unreachable
   }
   value != 0
}

This will allow code coverage measurement to show us in more detail if we're triggering the error/impossible condition(s).

@briansmith
Copy link
Owner

@BusyJay I have made the tweaks I suggested above in PR #1978. PTAL.

@briansmith
Copy link
Owner

This was merged as part of PR #1978. Thanks for submitting this!

@briansmith briansmith closed this Mar 4, 2024
@BusyJay BusyJay deleted the dev branch March 5, 2024 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants