Add ARM Neon and scalar implementations of SIMD functions #359

Merged 4 commits on Aug 31, 2022

Commits on Jun 17, 2022

  1. Make _mm_load_si128() explicit

    The previous code implicitly caused a load; change it so the load
    intrinsic is explicitly invoked, as the others are. (This in fact
    makes no difference to the generated code.)
    jmarshall committed Jun 17, 2022
    ab01ab4

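As a rough illustration of the first commit, the sketch below contrasts an implicit load (dereferencing a `__m128i` pointer) with an explicit `_mm_load_si128()` call. The function names `sum16_implicit`, `sum16_explicit` and `sum_bytes` are hypothetical and do not come from the patch; the non-SSE2 branch is only a stand-in so that the example builds elsewhere.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>

/* Sum 16 bytes held in a vector: _mm_sad_epu8 against zero leaves one
 * partial sum in each 64-bit half, which are then added together. */
static int sum_bytes(__m128i v)
{
    __m128i s = _mm_sad_epu8(v, _mm_setzero_si128());
    return _mm_cvtsi128_si32(s) + _mm_extract_epi16(s, 4);
}

/* Implicit form: dereferencing the pointer makes the compiler emit the load.
 * p must be 16-byte aligned. */
static int sum16_implicit(const uint8_t *p)
{
    __m128i v = *(const __m128i *)p;
    return sum_bytes(v);
}

/* Explicit form: the aligned-load intrinsic is called by name.
 * p must be 16-byte aligned. */
static int sum16_explicit(const uint8_t *p)
{
    __m128i v = _mm_load_si128((const __m128i *)p);
    return sum_bytes(v);
}

#else
/* Scalar stand-ins (an assumption of this sketch, not part of the commit). */
static int sum16_implicit(const uint8_t *p)
{
    int i, s = 0;
    for (i = 0; i < 16; i++) s += p[i];
    return s;
}

static int sum16_explicit(const uint8_t *p)
{
    return sum16_implicit(p);
}
#endif
```

Both forms return the same value for the same aligned input, which matches the commit's remark that the change is about readability rather than generated code.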
Commits on Jun 20, 2022

  1. On ARM, rewrite SSE2 SIMD calls using Neon intrinsics

    Many Intel intrinsics have a corresponding Neon equivalent.
    Other cases are more interesting:
    
    * Neon's vmaxvq directly selects the maximum entry in a vector,
      so can be used to implement both the __max_16/__max_8 macros
      and the _mm_movemask_epi8 early loop exit. Introduce additional
      helper macros alongside __max_16/__max_8 so that the early loop
      exit can similarly be implemented differently on the two platforms.
    
    * Full-width shifts can be done via vextq. This is defined close to
      the ksw_u8()/ksw_i16() functions (rather than in neon_sse.h) as it
      implicitly uses one of their local variables.
    
    * ksw_i16() uses saturating *signed* 16-bit operations apart from
      _mm_subs_epu16; presumably the data is effectively still signed but
      we wish to keep it non-negative. The ARM intrinsics are more careful
      about type checking, so this requires an extra U16() helper macro.
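The first point above can be sketched roughly as follows. This is not the code from the patch: `vec_u8`, `load_u8`, `hmax_u8` and `all_zero_u8` are hypothetical names, the SSE2 reduction is one common shift-and-max pattern, and the exact form of the early-exit test in ksw.c is an assumption. `vmaxvq_u8` is only available on AArch64, so 32-bit ARM falls through to the scalar branch here.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i vec_u8;

static vec_u8 load_u8(const uint8_t *p)
{
    return _mm_loadu_si128((const __m128i *)p);
}

/* Horizontal maximum by repeatedly folding the vector onto itself. */
static int hmax_u8(vec_u8 x)
{
    x = _mm_max_epu8(x, _mm_srli_si128(x, 8));
    x = _mm_max_epu8(x, _mm_srli_si128(x, 4));
    x = _mm_max_epu8(x, _mm_srli_si128(x, 2));
    x = _mm_max_epu8(x, _mm_srli_si128(x, 1));
    return _mm_extract_epi16(x, 0) & 0xff;
}

/* Early-exit style test: every byte compares equal to zero. */
static int all_zero_u8(vec_u8 x)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, _mm_setzero_si128())) == 0xffff;
}

#elif defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
typedef uint8x16_t vec_u8;

static vec_u8 load_u8(const uint8_t *p) { return vld1q_u8(p); }

/* vmaxvq_u8 returns the largest lane directly. */
static int hmax_u8(vec_u8 x) { return vmaxvq_u8(x); }

/* All unsigned lanes are zero exactly when the largest lane is zero. */
static int all_zero_u8(vec_u8 x) { return vmaxvq_u8(x) == 0; }

#else
/* Scalar fallback (an assumption of this sketch). */
typedef struct { uint8_t b[16]; } vec_u8;

static vec_u8 load_u8(const uint8_t *p)
{
    vec_u8 x;
    memcpy(x.b, p, sizeof x.b);
    return x;
}

static int hmax_u8(vec_u8 x)
{
    int i, m = 0;
    for (i = 0; i < 16; i++)
        if (x.b[i] > m) m = x.b[i];
    return m;
}

static int all_zero_u8(vec_u8 x) { return hmax_u8(x) == 0; }
#endif
```

Keeping the reduction and the zero test behind separate helpers lets each platform use whatever is cheapest for it, which seems to be the motivation for the additional helper macros mentioned in the commit message.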
    jmarshall committed Jun 20, 2022
    165e524

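The other two points from the Neon commit, the full-width shift via vextq and the unsigned saturating subtraction in ksw_i16(), might look something like the sketch below for 16-bit lanes. The names `vec_i16`, `load_i16`, `store_i16`, `shl_lanes` and `subs_u16` are hypothetical, and the `U16()` definition is a guess at what such a helper could be, not the one in the patch. Unlike the patch's shift, which reuses a local variable of the calling function, this version builds its own zero vector.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i vec_i16;
#define load_i16(p)     _mm_loadu_si128((const __m128i *)(p))
#define store_i16(p, x) _mm_storeu_si128((__m128i *)(p), (x))
/* Shift the whole register left by n 16-bit lanes (n is a constant, 1..7). */
#define shl_lanes(x, n) _mm_slli_si128((x), 2 * (n))
/* SSE2 has one untyped register type, so no cast is needed. */
#define subs_u16(a, b)  _mm_subs_epu16((a), (b))

#elif defined(__ARM_NEON)
#include <arm_neon.h>
typedef int16x8_t vec_i16;
#define load_i16(p)     vld1q_s16(p)
#define store_i16(p, x) vst1q_s16((p), (x))
/* vextq takes the top (8 - n) lanes' worth of offset into zero:x,
 * so lanes 0..n-1 become zero and lane i receives x[i - n]. */
#define shl_lanes(x, n) vextq_s16(vdupq_n_s16(0), (x), 8 - (n))
/* Neon vectors are typed, so reinterpret to unsigned and back. */
#define U16(x)          vreinterpretq_u16_s16(x)
#define subs_u16(a, b)  vreinterpretq_s16_u16(vqsubq_u16(U16(a), U16(b)))

#else
/* Scalar fallback (an assumption of this sketch). */
typedef struct { int16_t v[8]; } vec_i16;

static vec_i16 load_i16(const int16_t *p)
{
    vec_i16 x;
    memcpy(x.v, p, sizeof x.v);
    return x;
}

static void store_i16(int16_t *p, vec_i16 x) { memcpy(p, x.v, sizeof x.v); }

static vec_i16 shl_lanes(vec_i16 x, int n)
{
    vec_i16 r;
    int i;
    for (i = 0; i < 8; i++) r.v[i] = i < n ? 0 : x.v[i - n];
    return r;
}

static vec_i16 subs_u16(vec_i16 a, vec_i16 b)
{
    vec_i16 r;
    int i;
    for (i = 0; i < 8; i++) {
        uint16_t ua = (uint16_t)a.v[i], ub = (uint16_t)b.v[i];
        r.v[i] = (int16_t)(ua > ub ? ua - ub : 0);  /* clamp at zero */
    }
    return r;
}
#endif
```

For non-negative inputs the unsigned saturating subtraction clamps the result at zero, which fits the commit message's reading that the data is effectively signed but should never go negative.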
Commits on Jun 26, 2022

  1. ac612b6

Commits on Jun 27, 2022

  1. Use native SSE2 intrinsics on i386 as well as x86-64

    Make the native SSE2 code conditional on __SSE2__, which is defined
    by GCC/Clang/etc on x86-64 by default and on i386 with -msse2 etc.
    jmarshall committed Jun 27, 2022
    c77ace7
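The preprocessor dispatch described in the last commit could be arranged along these lines. This is a minimal sketch: the enum, the `pick_backend()` function and the use of `__ARM_NEON` as the ARM test are assumptions for illustration, and only the `__SSE2__` condition is taken from the commit message.

```c
/* Hypothetical backend tags, for illustration only. */
enum simd_backend { SIMD_SSE2, SIMD_NEON, SIMD_SCALAR };

static enum simd_backend pick_backend(void)
{
#if defined(__SSE2__)
    /* Defined by default on x86-64, and on i386 when built with -msse2. */
    return SIMD_SSE2;
#elif defined(__ARM_NEON)
    /* Assumed ARM test; the patch may use a different condition. */
    return SIMD_NEON;
#else
    /* Anything else, including i386 without SSE2 enabled. */
    return SIMD_SCALAR;
#endif
}
```

Testing the feature macro rather than the architecture means a 32-bit x86 build picks up the native intrinsics whenever the compiler has been told SSE2 is available.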