feat(xxhash3): Support LASX instruction set and refactor LSX implement #996

Merged
merged 2 commits into from
Jan 17, 2025


24bit-xjkp
Contributor

  1. Use __lsx_vmul_d directly instead of emulating a 64-bit multiply with two 32-bit multiplies.
  2. Add LASX support.
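
For readers unfamiliar with the emulation mentioned in item 1, here is a minimal scalar sketch of the arithmetic identity involved. It is not the code from this PR and uses no LSX intrinsics; the function names are hypothetical. It assumes the multiplier is the 32-bit constant `XXH_PRIME32_1` from xxhash.h, as used in the XXH3 scramble step. The emulated path builds the low 64 bits of the product from two 32x32->64 multiplies, which is what SIMD targets without a native 64-bit lane multiply have to do, while the direct path is a single 64-bit multiply.

```c
#include <assert.h>
#include <stdint.h>

/* Value of XXH_PRIME32_1 in xxhash.h. */
#define PRIME32_1 0x9E3779B1U

/* Emulated path: two 32x32->64 multiplies combined into the low 64 bits
 * of x * PRIME32_1. Hypothetical helper, for illustration only. */
static uint64_t mul_prime_emulated(uint64_t x)
{
    uint64_t lo = (uint64_t)(uint32_t)x * PRIME32_1;         /* low half  * prime */
    uint64_t hi = (uint64_t)(uint32_t)(x >> 32) * PRIME32_1; /* high half * prime */
    return lo + (hi << 32);  /* bits shifted past bit 63 are discarded */
}

/* Direct path: one 64-bit multiply, truncated modulo 2^64. */
static uint64_t mul_prime_direct(uint64_t x)
{
    return x * (uint64_t)PRIME32_1;
}

/* Self-check: both paths must agree for any input. */
static void check_mul_equivalence(uint64_t x)
{
    assert(mul_prime_emulated(x) == mul_prime_direct(x));
}
```

Calling `check_mul_equivalence` on a few values (0, `UINT64_MAX`, an arbitrary pattern) should never trip the assertion, since both functions compute the product modulo 2^64.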

@24bit-xjkp
Contributor Author

LSX:

xxhsum 0.8.3 by Yann Collet
compiled as 64-bit loongarch little endian with GCC 15.0.0 20240714 (experimental)
Sample of 100 KB...
 1#XXH32                         :     102400 ->    49781 it/s ( 4861.5 MB/s)
 2#XXH32 unaligned               :     102400 ->    49787 it/s ( 4862.0 MB/s)
 3#XXH64                         :     102400 ->   121442 it/s (11859.6 MB/s)
 4#XXH64 unaligned               :     102400 ->   117311 it/s (11456.1 MB/s)
 5#XXH3_64b                      :     102400 ->   207791 it/s (20292.1 MB/s)
 6#XXH3_64b unaligned            :     102400 ->   190468 it/s (18600.4 MB/s)
 7#XXH3_64b w/seed               :     102400 ->   208711 it/s (20381.9 MB/s)
 8#XXH3_64b w/seed unaligned     :     102400 ->   192219 it/s (18771.4 MB/s)
 9#XXH3_64b w/secret             :     102400 ->   158258 it/s (15454.9 MB/s)
10#XXH3_64b w/secret unaligned   :     102400 ->   146024 it/s (14260.1 MB/s)
11#XXH128                        :     102400 ->   207481 it/s (20261.8 MB/s)
12#XXH128 unaligned              :     102400 ->   190216 it/s (18575.8 MB/s)
13#XXH128 w/seed                 :     102400 ->   207458 it/s (20259.6 MB/s)
14#XXH128 w/seed unaligned       :     102400 ->   191485 it/s (18699.7 MB/s)
15#XXH128 w/secret               :     102400 ->   154919 it/s (15128.8 MB/s)
16#XXH128 w/secret unaligned     :     102400 ->   147794 it/s (14433.0 MB/s)
17#XXH32_stream                  :     102400 ->    60791 it/s ( 5936.6 MB/s)
18#XXH32_stream unaligned        :     102400 ->    59631 it/s ( 5823.3 MB/s)
19#XXH64_stream                  :     102400 ->   121414 it/s (11856.8 MB/s)
20#XXH64_stream unaligned        :     102400 ->   119730 it/s (11692.4 MB/s)
21#XXH3_stream                   :     102400 ->   199493 it/s (19481.7 MB/s)
22#XXH3_stream unaligned         :     102400 ->   193645 it/s (18910.7 MB/s)
23#XXH3_stream w/seed            :     102400 ->   199307 it/s (19463.6 MB/s)
24#XXH3_stream w/seed unaligned  :     102400 ->   193333 it/s (18880.2 MB/s)
25#XXH128_stream                 :     102400 ->   199387 it/s (19471.3 MB/s)
26#XXH128_stream unaligned       :     102400 ->   193626 it/s (18908.8 MB/s)
27#XXH128_stream w/seed          :     102400 ->   199291 it/s (19462.0 MB/s)
28#XXH128_stream w/seed unaligne :     102400 ->   193416 it/s (18888.3 MB/s)

LASX:

xxhsum 0.8.3 by Yann Collet
compiled as 64-bit loongarch little endian with GCC 15.0.0 20240714 (experimental)
Sample of 100 KB...
 1#XXH32                         :     102400 ->    48038 it/s ( 4691.2 MB/s)
 2#XXH32 unaligned               :     102400 ->    49793 it/s ( 4862.6 MB/s)
 3#XXH64                         :     102400 ->   121443 it/s (11859.6 MB/s)
 4#XXH64 unaligned               :     102400 ->   117315 it/s (11456.6 MB/s)
 5#XXH3_64b                      :     102400 ->   277294 it/s (27079.5 MB/s)
 6#XXH3_64b unaligned            :     102400 ->   277269 it/s (27077.1 MB/s)
 7#XXH3_64b w/seed               :     102400 ->   276368 it/s (26989.1 MB/s)
 8#XXH3_64b w/seed unaligned     :     102400 ->   276325 it/s (26984.9 MB/s)
 9#XXH3_64b w/secret             :     102400 ->   265865 it/s (25963.3 MB/s)
10#XXH3_64b w/secret unaligned   :     102400 ->   265864 it/s (25963.3 MB/s)
11#XXH128                        :     102400 ->   276331 it/s (26985.5 MB/s)
12#XXH128 unaligned              :     102400 ->   276379 it/s (26990.1 MB/s)
13#XXH128 w/seed                 :     102400 ->   274741 it/s (26830.2 MB/s)
14#XXH128 w/seed unaligned       :     102400 ->   274736 it/s (26829.7 MB/s)
15#XXH128 w/secret               :     102400 ->   264432 it/s (25823.4 MB/s)
16#XXH128 w/secret unaligned     :     102400 ->   264332 it/s (25813.7 MB/s)
17#XXH32_stream                  :     102400 ->    60791 it/s ( 5936.6 MB/s)
18#XXH32_stream unaligned        :     102400 ->    59625 it/s ( 5822.8 MB/s)
19#XXH64_stream                  :     102400 ->   121413 it/s (11856.7 MB/s)
20#XXH64_stream unaligned        :     102400 ->   119726 it/s (11692.0 MB/s)
21#XXH3_stream                   :     102400 ->   274586 it/s (26815.0 MB/s)
22#XXH3_stream unaligned         :     102400 ->   274407 it/s (26797.6 MB/s)
23#XXH3_stream w/seed            :     102400 ->   273574 it/s (26716.3 MB/s)
24#XXH3_stream w/seed unaligned  :     102400 ->   273352 it/s (26694.6 MB/s)
25#XXH128_stream                 :     102400 ->   275256 it/s (26880.4 MB/s)
26#XXH128_stream unaligned       :     102400 ->   275257 it/s (26880.5 MB/s)
27#XXH128_stream w/seed          :     102400 ->   274371 it/s (26794.1 MB/s)
28#XXH128_stream w/seed unaligne :     102400 ->   274349 it/s (26791.9 MB/s)
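
To compare the two runs, the small sketch below reproduces the arithmetic behind the printed columns. It is an illustration only, with hypothetical helper names; it assumes that the MB/s column is the sample size times iterations per second divided by 2^20, which is consistent with the figures in the logs above.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput in MB/s, assuming 1 MB = 2^20 bytes (an assumption that
 * matches the logged figures). Hypothetical helper. */
static double throughput_mbps(size_t sample_bytes, double iter_per_sec)
{
    assert(sample_bytes > 0);
    return (double)sample_bytes * iter_per_sec / (1024.0 * 1024.0);
}

/* Ratio of two iteration rates measured on the same sample size. */
static double speedup(double new_rate, double old_rate)
{
    assert(old_rate > 0.0);
    return new_rate / old_rate;
}
```

For example, `throughput_mbps(102400, 207791)` gives about 20292.1, the value logged for XXH3_64b with LSX, and `speedup(277294, 207791)` gives roughly 1.33 for the same test under LASX.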

@24bit-xjkp
Contributor Author

The benchmark of the previous LSX implementation is in #981.

@Cyan4973 Cyan4973 self-assigned this Jan 13, 2025
xxhash.h Outdated
@@ -1125,6 +1125,7 @@ XXH_PUBLIC_API XXH_PUREF XXH64_hash_t XXH64_hashFromCanonical(XXH_NOESCAPE const
# define XXH_VSX 5 /*!< VSX and ZVector for POWER8/z13 (64-bit) */
# define XXH_SVE 6 /*!< SVE for some ARMv8-A and ARMv9-A */
# define XXH_LSX 7 /*!< LSX (128-bit SIMD) for LoongArch64 */
# define XXH_LASX 8 /*!< LASX (256-bit SIMD) for LoongArch64 */
Owner

minor nit: alignment

@Cyan4973
Owner

Cyan4973 commented Jan 13, 2025

Great work @24bit-xjkp !

I mostly have some comments to make sure the context is well understood.

  • LSX and LASX are both vector extensions specific to LoongArch. LSX is 128-bit, while LASX is 256-bit.
  • The LSX implementation is improved in this PR, but since the modification only impacts the scrambling stage, performance change is expected to be minor.

Also a request:

compiled as 64-bit loongarch little endian with GCC 15.0.0 20240714 (experimental)

It would be great to have the welcome message indicate which mode (scalar/lsx/lasx) was compiled in. This would help identify with certainty which mode is being benchmarked.

The display logic is mostly there: https://github.com/Cyan4973/xxHash/blob/dev/cli/xsum_arch.h#L165

@24bit-xjkp
Contributor Author

I have added the SIMD extension we used to the welcome message.

The LSX implementation is improved in this PR, but since the modification only impacts the scrambling stage, performance change is expected to be minor.

Yes, I just fixed the improper use of the 32-bit multiply, since LSX and LASX can execute a 64-bit multiply as fast as a 32-bit multiply. This also slightly reduces the size of the binary.

… LoongArch

1. Display the mode in use, as below:
"loongarch64 + lasx" -> LoongArch64 platform with LoongArch Advanced SIMD Extension
"loongarch64 + lsx"  -> LoongArch64 platform with LoongArch SIMD Extension
"loongarch64"        -> LoongArch64 platform, using the scalar implementation
2. Align the defines in xxhash.h
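
The three labels above could be selected at compile time along the following lines. This is a hedged sketch, not the contents of cli/xsum_arch.h: `ARCH_LABEL`, `arch_label` and `print_arch` are hypothetical names, and it assumes the compiler predefines `__loongarch64`, `__loongarch_asx` and `__loongarch_sx` when targeting LoongArch64 with LASX or LSX enabled. The `"other"` fallback exists only so the sketch builds on non-LoongArch hosts.

```c
#include <assert.h>
#include <stdio.h>

/* Pick the label from compiler-predefined macros (assumed names, see above).
 * LASX is checked first because a LASX build normally has LSX enabled too. */
#if defined(__loongarch64)
#  if defined(__loongarch_asx)
#    define ARCH_LABEL "loongarch64 + lasx"
#  elif defined(__loongarch_sx)
#    define ARCH_LABEL "loongarch64 + lsx"
#  else
#    define ARCH_LABEL "loongarch64"
#  endif
#else
#  define ARCH_LABEL "other"  /* fallback for non-LoongArch hosts */
#endif

/* Return the label chosen at compile time. */
static const char *arch_label(void)
{
    return ARCH_LABEL;
}

/* Print a welcome-style line; returns the number of characters written. */
static int print_arch(FILE *out)
{
    assert(out != NULL);
    return fprintf(out, "compiled as %s\n", arch_label());
}
```

Checking the wider extension first keeps the label unambiguous when both extensions are available.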
@24bit-xjkp
Contributor Author

Shall we merge this PR?

@Cyan4973
Owner

Cyan4973 commented Jan 17, 2025

Yes!

@Cyan4973 Cyan4973 merged commit 51fa4ef into Cyan4973:dev Jan 17, 2025
63 checks passed