uint256: Introduce package. #2787

Merged
merged 59 commits into from
Nov 20, 2021

@davecgh davecgh commented Nov 6, 2021

Profiling the CPU usage during an initial chain sync shows that roughly 65-70% of all time is spent in garbage collection operations. This is primarily the result of a large number of in-use allocations. Profiling the in-use allocations shows that almost a quarter of them (~22%) are due to standard library big integers, which require allocations. In other words, eliminating those allocations should lead to a speedup of around 5-10% to the initial chain sync. Note, however, that this series of commits only introduces the package and does not update the relevant code to make use of it, as that will be done separately.

More specifically, the allocations in question are the result of several important calculations which could be done without allocations, and more efficiently in terms of execution time, via fixed precision unsigned 256-bit integers.

Thus, motivated by the previous discussion, this is part of a series of commits that implements highly optimized allocation free fixed precision unsigned 256-bit integer arithmetic that can ultimately be used in place of the standard library big integers.
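To illustrate why a fixed-width representation avoids allocations, here is a minimal sketch, not the package's actual code. It assumes a hypothetical `Uint256` type holding four 64-bit words (least significant first) and chains the carry through `math/bits.Add64`. Since the value is a plain array, nothing needs to be placed on the heap.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Uint256 is a hypothetical fixed-size 256-bit unsigned integer stored as
// four 64-bit words, least significant word first. The real package's
// layout and method set may differ.
type Uint256 struct {
	n [4]uint64
}

// Add sets z = x + y modulo 2^256 and returns z. The carry out of the most
// significant word is intentionally discarded.
func (z *Uint256) Add(x, y *Uint256) *Uint256 {
	var c uint64
	z.n[0], c = bits.Add64(x.n[0], y.n[0], 0)
	z.n[1], c = bits.Add64(x.n[1], y.n[1], c)
	z.n[2], c = bits.Add64(x.n[2], y.n[2], c)
	z.n[3], _ = bits.Add64(x.n[3], y.n[3], c)
	return z
}

func main() {
	// 2^256 - 1 plus 1 wraps around to zero.
	maxVal := Uint256{n: [4]uint64{^uint64(0), ^uint64(0), ^uint64(0), ^uint64(0)}}
	one := Uint256{n: [4]uint64{1}}
	var sum Uint256
	sum.Add(&maxVal, &one)
	fmt.Println(sum.n) // prints [0 0 0 0]
}
```

The pointer receiver mirrors the `big.Int` calling style, so the result is written into storage the caller already owns.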

For the time being, the package is introduced into the internal staging area for initial review.

The following is a brief overview of the main features and benefits:

  • Strong focus on performance and correctness
  • Every operation is faster than the stdlib big.Int equivalent and most operations, including the primary math operations, are significantly faster
  • Allocation free
    • All non-formatting operations with the specialized type are allocation free
  • Supports boolean comparison, bitwise logic, and bitwise shift operations
  • All operations are performed modulo 2^256
  • Ergonomic API with unary-style arguments as well as some binary variants
  • Conversion-free support for interoperation with native uint64 integers
  • Direct conversion to and from little and big endian byte arrays
  • Full support for formatted output and common base conversions
    • Formatted output uses fewer allocations than stdlib big.Int
  • 100% test coverage
  • Comprehensive benchmarks
  • Fully documented in README.md
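As a rough illustration of the byte array conversions mentioned in the list above, the following sketch converts between 32-byte arrays and four 64-bit words using `encoding/binary`. The `words256` type and the function names are hypothetical and chosen only for this example. The package's real API is described in its README.md.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// words256 is a hypothetical stand-in for a 256-bit integer: four 64-bit
// words with the least significant word first.
type words256 [4]uint64

// setBytesBE interprets b as a 256-bit big-endian unsigned integer.
func setBytesBE(b *[32]byte) words256 {
	return words256{
		binary.BigEndian.Uint64(b[24:32]),
		binary.BigEndian.Uint64(b[16:24]),
		binary.BigEndian.Uint64(b[8:16]),
		binary.BigEndian.Uint64(b[0:8]),
	}
}

// bytesBE returns the 32-byte big-endian encoding of n.
func bytesBE(n words256) [32]byte {
	var b [32]byte
	binary.BigEndian.PutUint64(b[0:8], n[3])
	binary.BigEndian.PutUint64(b[8:16], n[2])
	binary.BigEndian.PutUint64(b[16:24], n[1])
	binary.BigEndian.PutUint64(b[24:32], n[0])
	return b
}

// setBytesLE interprets b as a 256-bit little-endian unsigned integer.
func setBytesLE(b *[32]byte) words256 {
	return words256{
		binary.LittleEndian.Uint64(b[0:8]),
		binary.LittleEndian.Uint64(b[8:16]),
		binary.LittleEndian.Uint64(b[16:24]),
		binary.LittleEndian.Uint64(b[24:32]),
	}
}

func main() {
	var in [32]byte
	in[31] = 0x01 // least significant byte in big-endian order
	n := setBytesBE(&in)
	out := bytesBE(n)
	fmt.Println(n[0], out == in) // prints 1 true
}
```

Because both directions work on fixed-size arrays, no slice needs to be allocated for the result.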

The following benchmark results demonstrate the performance of most operations as compared to standard library big.Ints. The benchmarks are from a Ryzen 7 1700 processor and are the result of feeding benchstat 10 iterations of each.

Arithmetic Methods

| Name | big.Int Time/Op | Uint256 Time/Op | Delta vs big.Int |
|------|-----------------|-----------------|------------------|
| Add | 158ns ± 2% | 2ns ± 1% | -98.67% |
| AddUint64 | 44.4ns ± 3% | 3.4ns ± 2% | -92.27% |
| Sub | 53.9ns ± 1% | 2.1ns ± 1% | -96.12% |
| SubUint64 | 44.8ns ± 1% | 3.4ns ± 2% | -92.37% |
| Mul | 419ns ± 1% | 10ns ± 2% | -97.64% |
| MulUint64 | 263ns ± 1% | 4ns ± 1% | -98.30% |
| Square | 418ns ± 0% | 7ns ± 2% | -98.39% |
| Div/num_lt_den | 75.4ns ± 1% | 3.4ns ± 1% | -95.51% |
| Div/num_eq_den | 253ns ± 2% | 4ns ± 3% | -98.56% |
| Div/1_by_1_near | 53.8ns ± 2% | 4.5ns ± 2% | -91.63% |
| Div/1_by_1_far | 31.4ns ± 2% | 14.6ns ± 2% | -53.64% |
| Div/2_by_1_near | 36.9ns ± 1% | 10.1ns ± 2% | -72.63% |
| Div/2_by_1_far | 49.1ns ± 1% | 28.8ns ± 1% | -41.29% |
| Div/3_by_1_near | 43.2ns ± 1% | 13.7ns ± 3% | -68.24% |
| Div/3_by_1_far | 57.0ns ± 1% | 43.6ns ± 1% | -23.59% |
| Div/4_by_1_near | 49.7ns ± 4% | 18.0ns ± 1% | -63.87% |
| Div/4_by_1_far | 65.2ns ± 4% | 57.8ns ± 2% | -11.41% |
| Div/2_by_2_near | 237ns ± 1% | 22ns ± 3% | -90.81% |
| Div/2_by_2_far | 237ns ± 1% | 30ns ± 3% | -87.17% |
| Div/3_by_2_near | 258ns ± 1% | 29ns ± 1% | -88.60% |
| Div/3_by_2_far | 257ns ± 1% | 50ns ± 2% | -80.42% |
| Div/4_by_2_near | 312ns ± 2% | 40ns ± 3% | -87.27% |
| Div/4_by_2_far | 310ns ± 1% | 71ns ± 3% | -77.19% |
| Div/3_by_3_near | 239ns ± 2% | 21ns ± 2% | -91.39% |
| Div/3_by_3_far | 242ns ± 4% | 33ns ± 3% | -86.33% |
| Div/4_by_3_near | 279ns ± 6% | 31ns ± 1% | -89.01% |
| Div/4_by_3_far | 271ns ± 1% | 46ns ± 3% | -82.99% |
| Div/4_by_4_near | 252ns ± 3% | 20ns ± 3% | -91.99% |
| Div/4_by_4_far | 249ns ± 2% | 36ns ± 2% | -85.65% |
| DivRandom | 202ns ± 1% | 23ns ± 1% | -88.43% |
| DivUint64 | 129ns ± 1% | 47ns ± 0% | -63.34% |
| Negate | 47.3ns ± 2% | 1.5ns ± 2% | -96.91% |

Comparison Methods

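| Name | big.Int Time/Op | Uint256 Time/Op | Delta vs big.Int |
|------|-----------------|-----------------|------------------|
| Eq | 12.7ns ± 1% | 2.1ns ± 1% | -83.72% |
| Lt | 12.6ns ± 1% | 3.0ns ± 1% | -75.96% |
| Gt | 12.6ns ± 1% | 3.0ns ± 1% | -75.91% |
| Cmp | 12.6ns ± 1% | 7.7ns ± 1% | -39.01% |
| CmpUint64 | 5.93ns ± 2% | 3.70ns ± 1% | -37.60% |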

Bitwise Methods

| Name | big.Int Time/Op | Uint256 Time/Op | Delta vs big.Int |
|------|-----------------|-----------------|------------------|
| Lsh/bits_0 | 7.15ns ± 3% | 2.58ns ± 1% | -63.94% |
| Lsh/bits_1 | 14.8ns ± 1% | 4.2ns ± 1% | -71.40% |
| Lsh/bits_64 | 16.7ns ± 1% | 2.7ns ± 1% | -84.00% |
| Lsh/bits_128 | 16.9ns ± 2% | 2.7ns ± 0% | -84.21% |
| Lsh/bits_192 | 16.6ns ± 1% | 2.6ns ± 1% | -84.19% |
| Lsh/bits_255 | 16.3ns ± 2% | 2.8ns ± 2% | -83.11% |
| Lsh/bits_256 | 16.9ns ± 2% | 2.6ns ± 2% | -84.77% |
| Rsh/bits_0 | 8.76ns ± 2% | 2.57ns ± 1% | -70.63% |
| Rsh/bits_1 | 14.4ns ± 2% | 4.3ns ± 2% | -70.28% |
| Rsh/bits_64 | 12.8ns ± 1% | 2.9ns ± 2% | -77.31% |
| Rsh/bits_128 | 11.8ns ± 0% | 2.9ns ± 2% | -75.51% |
| Rsh/bits_192 | 10.5ns ± 2% | 2.6ns ± 1% | -75.17% |
| Rsh/bits_255 | 10.5ns ± 3% | 2.8ns ± 2% | -73.89% |
| Rsh/bits_256 | 5.50ns ± 1% | 2.58ns ± 2% | -53.15% |
| Not | 25.4ns ± 2% | 3.3ns ± 2% | -86.79% |
| Or | 17.9ns ± 5% | 3.4ns ± 6% | -80.94% |
| And | 16.7ns ± 2% | 3.4ns ± 0% | -79.93% |
| Xor | 17.9ns ± 1% | 3.4ns ± 2% | -80.91% |
| BitLen/bits_64 | 2.24ns ± 1% | 1.94ns ± 3% | -13.04% |
| BitLen/bits_128 | 2.25ns ± 2% | 1.96ns ± 2% | -13.17% |
| BitLen/bits_192 | 2.25ns ± 1% | 1.60ns ± 1% | -28.65% |
| BitLen/bits_255 | 2.26ns ± 2% | 1.61ns ± 1% | -29.04% |

Conversion Methods

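| Name | big.Int Time/Op | Uint256 Time/Op | Delta vs big.Int |
|------|-----------------|-----------------|------------------|
| SetBytes | 9.09ns ±13% | 3.05ns ± 1% | -66.43% |
| SetBytesLE | 59.9ns ± 4% | 3.1ns ± 2% | -94.76% |
| Bytes | 61.3ns ± 1% | 13.8ns ± 3% | -77.49% |
| BytesLE | 83.5ns ± 2% | 13.9ns ± 2% | -83.32% |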

Misc Convenience Methods

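| Name | big.Int Time/Op | Uint256 Time/Op | Delta vs big.Int |
|------|-----------------|-----------------|------------------|
| Zero | 2.99ns ± 2% | 1.29ns ± 1% | -56.82% |
| IsZero | 1.78ns ± 0% | 1.63ns ± 2% | -8.23% |
| IsOdd | 3.62ns ± 4% | 1.64ns ± 1% | -54.65% |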

Output Formatting Methods

| Name | big.Int Time/Op | Uint256 Time/Op | Delta vs big.Int |
|------|-----------------|-----------------|------------------|
| Text/base_2 | 579ns ± 3% | 496ns ± 2% | -14.37% |
| Text/base_8 | 266ns ± 1% | 227ns ± 1% | -14.58% |
| Text/base_10 | 536ns ± 1% | 458ns ± 2% | -14.58% |
| Text/base_16 | 205ns ± 2% | 180ns ± 4% | -11.90% |
| Format/base_2 | 987ns ±15% | 852ns ± 2% | -13.64% |
| Format/base_8 | 620ns ± 6% | 544ns ± 3% | -12.31% |
| Format/base_10 | 888ns ± 1% | 726ns ± 1% | -18.25% |
| Format/base_16 | 565ns ± 1% | 449ns ± 1% | -20.41% |
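
For readers who want to reproduce a comparison of this kind outside the package's own benchmark suite, the sketch below times a fixed-width addition against `big.Int.Add` using `testing.Benchmark` from a regular program. The `add256` helper is a hypothetical stand-in, not the package's `Add` method, and absolute numbers will vary by machine and Go version.

```go
package main

import (
	"fmt"
	"math/big"
	"math/bits"
	"testing"
)

// add256 adds two 256-bit values held as four 64-bit words (least
// significant first), discarding the final carry. It is a stand-in for
// illustration only.
func add256(x, y [4]uint64) (z [4]uint64) {
	var c uint64
	z[0], c = bits.Add64(x[0], y[0], 0)
	z[1], c = bits.Add64(x[1], y[1], c)
	z[2], c = bits.Add64(x[2], y[2], c)
	z[3], _ = bits.Add64(x[3], y[3], c)
	return z
}

// sink keeps the compiler from optimizing the benchmarked call away.
var sink [4]uint64

func main() {
	x := [4]uint64{1, 2, 3, 4}
	y := [4]uint64{5, 6, 7, 8}
	fixed := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = add256(x, y)
		}
	})

	bx := new(big.Int).Lsh(big.NewInt(1), 200)
	by := new(big.Int).Lsh(big.NewInt(1), 199)
	bz := new(big.Int)
	bigRes := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			bz.Add(bx, by)
		}
	})

	fmt.Println("fixed-width:", fixed.String(), fixed.MemString())
	fmt.Println("big.Int:    ", bigRes.String(), bigRes.MemString())
}
```

A single run like this is noisy. The tables above come from benchstat over 10 iterations, which is the better basis for any conclusion.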

This is work towards #2786.

@davecgh davecgh changed the title uint256: Introduce package infrastructure. uint256: Introduce package. Nov 6, 2021
@davecgh davecgh force-pushed the primitives_uint256 branch from e3f55fb to 8850e5c Compare November 7, 2021 06:55
@davecgh davecgh force-pushed the primitives_uint256 branch from 8850e5c to 41c2fa8 Compare November 8, 2021 07:31
@davecgh davecgh force-pushed the primitives_uint256 branch from 41c2fa8 to a876369 Compare November 8, 2021 22:15
@davecgh davecgh force-pushed the primitives_uint256 branch 2 times, most recently from 82ec71c to 090a6a9 Compare November 12, 2021 02:20
dnldd commented Nov 12, 2021

Need to have another pass over the multiplication and division sections but looks good so far.

@matheusd matheusd left a comment

Halfway through the commits

@JoeGruffins

Pretty amazing work. Learning a lot looking over this.

In the commit message of c8fc7bf:
- Ergonomic API with unary-style arguments are well as some binary variants
"are well as" should maybe be "as well as".

In the commit message of dce5714:
This adds the ability zero an existing uint256 and determine if it is
"ability zero" should maybe be "ability to zero".

benchmarks
goos: linux
goarch: amd64
pkg: github.com/decred/dcrd/internal/staging/primitives/uint256
cpu: AMD Ryzen 9 3900XT 12-Core Processor           
BenchmarkUint256SetBytes-24      	626857800	         1.801 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntSetBytes-24       	192123936	         6.213 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256SetBytesLE-24    	579135675	         1.862 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntSetBytesLE-24     	18501447	        67.09 ns/op	      32 B/op	       1 allocs/op
BenchmarkUint256Bytes-24         	124904251	         9.149 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntBytes-24          	16091646	        79.41 ns/op	      32 B/op	       1 allocs/op
BenchmarkUint256BytesLE-24       	135137600	         8.951 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntBytesLE-24        	13922724	        97.98 ns/op	      32 B/op	       1 allocs/op
BenchmarkUint256Zero-24          	1000000000	         0.7060 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntZero-24           	494819776	         2.159 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256IsZero-24        	806305870	         1.370 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntIsZero-24         	749324908	         1.521 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256IsOdd-24         	1000000000	         1.144 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntIsOdd-24          	477044432	         2.322 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Eq-24            	786457744	         1.518 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntEq-24             	182344155	         6.708 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lt-24            	541656760	         1.980 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLt-24             	186473148	         6.458 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Gt-24            	532842606	         1.977 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntGt-24             	178758286	         6.379 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Cmp-24           	437226667	         2.637 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntCmp-24            	185312695	         6.041 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256CmpUint64-24     	635997724	         1.622 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntCmpUint64-24      	297241944	         4.004 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Add-24           	702670026	         1.439 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntAdd-24            	 9621193	       120.7 ns/op	       3 B/op	       0 allocs/op
BenchmarkUint256AddUint64-24     	487079854	         2.141 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntAddUint64-24      	42218737	        27.30 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Sub-24           	703910578	         1.525 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntSub-24            	33799041	        35.51 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256SubUint64-24     	573150536	         2.141 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntSubUint64-24      	41408493	        28.95 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Mul-24           	205485963	         5.674 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntMul-24            	 2451584	       469.9 ns/op	      64 B/op	       1 allocs/op
BenchmarkUint256MulUint64-24     	378212292	         2.993 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntMulUint64-24      	 4817528	       226.4 ns/op	       8 B/op	       1 allocs/op
BenchmarkUint256Square-24        	273366194	         4.085 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntSquare-24         	 2502544	       493.0 ns/op	      64 B/op	       1 allocs/op
BenchmarkUint256Div/dividend_lt_divisor-24         	447356533	         2.375 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/dividend_eq_divisor-24         	440418166	         2.556 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/1_by_1_near-24                 	357880683	         3.321 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/1_by_1_far-24                  	100000000	        10.99 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/2_by_1_near-24                 	165720555	         7.381 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/2_by_1_far-24                  	57083955	        21.11 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/3_by_1_near-24                 	113972001	         9.993 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/3_by_1_far-24                  	38814946	        32.29 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_1_near-24                 	90064765	        13.06 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_1_far-24                  	28466821	        41.80 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/2_by_2_near-24                 	74131861	        15.55 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/2_by_2_far-24                  	53563335	        23.12 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/3_by_2_near-24                 	56249758	        21.93 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/3_by_2_far-24                  	32042884	        35.83 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_2_near-24                 	42563181	        27.74 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_2_far-24                  	24508135	        49.32 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/3_by_3_near-24                 	81699478	        14.12 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/3_by_3_far-24                  	48410469	        24.48 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_3_near-24                 	53083888	        21.94 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_3_far-24                  	36933514	        32.68 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_4_near-24                 	78725864	        15.36 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_4_far-24                  	44327823	        27.38 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256DivRandom-24                       	79271268	        15.27 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/dividend_lt_divisor-24          	16322353	        74.75 ns/op	      64 B/op	       1 allocs/op
BenchmarkBigIntDiv/dividend_eq_divisor-24          	 4489509	       279.3 ns/op	      80 B/op	       1 allocs/op
BenchmarkBigIntDiv/1_by_1_near-24                  	18364558	        62.23 ns/op	       8 B/op	       1 allocs/op
BenchmarkBigIntDiv/1_by_1_far-24                   	54461589	        21.26 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/2_by_1_near-24                  	37645087	        27.15 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/2_by_1_far-24                   	34748634	        34.90 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/3_by_1_near-24                  	36574906	        32.15 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/3_by_1_far-24                   	28280524	        40.10 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/4_by_1_near-24                  	32926687	        37.17 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/4_by_1_far-24                   	26827346	        47.72 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/2_by_2_near-24                  	 4243052	       276.4 ns/op	      64 B/op	       1 allocs/op
BenchmarkBigIntDiv/2_by_2_far-24                   	 4456932	       276.9 ns/op	      64 B/op	       1 allocs/op
BenchmarkBigIntDiv/3_by_2_near-24                  	 3941820	       307.3 ns/op	      64 B/op	       1 allocs/op
BenchmarkBigIntDiv/3_by_2_far-24                   	 3893846	       300.3 ns/op	      64 B/op	       1 allocs/op
BenchmarkBigIntDiv/4_by_2_near-24                  	 3383415	       342.9 ns/op	      80 B/op	       1 allocs/op
BenchmarkBigIntDiv/4_by_2_far-24                   	 3577161	       338.8 ns/op	      80 B/op	       1 allocs/op
BenchmarkBigIntDiv/3_by_3_near-24                  	 4432884	       279.9 ns/op	      64 B/op	       1 allocs/op
BenchmarkBigIntDiv/3_by_3_far-24                   	 4402246	       284.7 ns/op	      64 B/op	       1 allocs/op
BenchmarkBigIntDiv/4_by_3_near-24                  	 3721740	       312.4 ns/op	      80 B/op	       1 allocs/op
BenchmarkBigIntDiv/4_by_3_far-24                   	 3741186	       315.3 ns/op	      80 B/op	       1 allocs/op
BenchmarkBigIntDiv/4_by_4_near-24                  	 4409409	       265.8 ns/op	      80 B/op	       1 allocs/op
BenchmarkBigIntDiv/4_by_4_far-24                   	 4451058	       264.0 ns/op	      80 B/op	       1 allocs/op
BenchmarkBigIntDivRandom-24                        	 5221705	       241.6 ns/op	      72 B/op	       1 allocs/op
BenchmarkUint256DivUint64-24                       	33089503	        34.19 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDivUint64-24                        	10080028	       133.8 ns/op	       8 B/op	       1 allocs/op
BenchmarkUint256Negate-24                          	992767736	         1.014 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntNegate-24                           	37861459	        30.20 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lsh/bits_0-24                      	729918438	         1.718 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lsh/bits_1-24                      	387857133	         3.003 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lsh/bits_64-24                     	520758091	         2.146 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lsh/bits_128-24                    	568290314	         1.904 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lsh/bits_192-24                    	739967994	         1.633 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lsh/bits_255-24                    	547926159	         1.918 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lsh/bits_256-24                    	733664517	         1.697 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLsh/bits_0-24                       	223699926	         5.348 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLsh/bits_1-24                       	116737645	        10.05 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLsh/bits_64-24                      	105860778	        11.51 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLsh/bits_128-24                     	106117347	        10.90 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLsh/bits_192-24                     	109361703	        11.41 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLsh/bits_255-24                     	105343557	        11.10 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLsh/bits_256-24                     	103054900	        11.43 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Rsh/bits_0-24                      	550372821	         1.939 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Rsh/bits_1-24                      	392731686	         3.109 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Rsh/bits_64-24                     	569238034	         1.968 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Rsh/bits_128-24                    	637000917	         1.690 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Rsh/bits_192-24                    	806247422	         1.432 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Rsh/bits_255-24                    	602076444	         1.939 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Rsh/bits_256-24                    	598181062	         1.718 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntRsh/bits_0-24                       	195011600	         6.259 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntRsh/bits_1-24                       	100000000	        10.35 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntRsh/bits_64-24                      	126069898	         9.517 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntRsh/bits_128-24                     	134734641	         8.795 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntRsh/bits_192-24                     	155518081	         7.415 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntRsh/bits_255-24                     	155382307	         7.437 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntRsh/bits_256-24                     	288011690	         4.045 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Not-24                             	520997385	         2.051 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntNot-24                              	78752209	        15.59 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Or-24                              	484275884	         2.383 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntOr-24                               	98529921	        12.25 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256And-24                             	477428367	         2.260 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntAnd-24                              	100000000	        11.53 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Xor-24                             	475271222	         2.326 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntXor-24                              	96482734	        12.73 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256BitLen/bits_64-24                  	769874131	         1.512 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256BitLen/bits_128-24                 	816861885	         1.430 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256BitLen/bits_192-24                 	1000000000	         1.156 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256BitLen/bits_255-24                 	841902804	         1.231 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntBitLen/bits_64-24                   	628294862	         1.665 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntBitLen/bits_128-24                  	623833395	         1.628 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntBitLen/bits_192-24                  	641823882	         1.665 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntBitLen/bits_255-24                  	660455758	         1.620 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Text/base_2-24                     	 2203741	       544.0 ns/op	     512 B/op	       2 allocs/op
BenchmarkUint256Text/base_8-24                     	 4886019	       245.4 ns/op	     192 B/op	       2 allocs/op
BenchmarkUint256Text/base_10-24                    	 2212734	       565.5 ns/op	     160 B/op	       2 allocs/op
BenchmarkUint256Text/base_16-24                    	 6908899	       180.1 ns/op	     128 B/op	       2 allocs/op
BenchmarkBigIntText/base_2-24                      	 1998284	       608.5 ns/op	     528 B/op	       2 allocs/op
BenchmarkBigIntText/base_8-24                      	 4385248	       271.5 ns/op	     192 B/op	       2 allocs/op
BenchmarkBigIntText/base_10-24                     	 1914667	       629.6 ns/op	     224 B/op	       3 allocs/op
BenchmarkBigIntText/base_16-24                     	 5653011	       213.0 ns/op	     136 B/op	       2 allocs/op
BenchmarkUint256Format/base_2-24                   	 1252276	       958.6 ns/op	     768 B/op	       3 allocs/op
BenchmarkUint256Format/base_8-24                   	 1961947	       609.3 ns/op	     288 B/op	       3 allocs/op
BenchmarkUint256Format/base_10-24                  	 1313508	       932.1 ns/op	     240 B/op	       3 allocs/op
BenchmarkUint256Format/base_16-24                  	 2193850	       545.1 ns/op	     192 B/op	       3 allocs/op
BenchmarkBigIntFormat/base_2-24                    	 1000000	      1049 ns/op	     552 B/op	       5 allocs/op
BenchmarkBigIntFormat/base_8-24                    	 1708375	       698.7 ns/op	     216 B/op	       5 allocs/op
BenchmarkBigIntFormat/base_10-24                   	 1000000	      1038 ns/op	     248 B/op	       6 allocs/op
BenchmarkBigIntFormat/base_16-24                   	 1843668	       663.4 ns/op	     160 B/op	       5 allocs/op
BenchmarkUint256PutBig-24                          	54292158	        21.95 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256SetBig-24                          	44077651	        27.90 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/decred/dcrd/internal/staging/primitives/uint256	236.892s
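
The `allocs/op` column above can also be checked directly. The following is a small sketch, assuming a hypothetical `mulUint64` helper over four 64-bit words rather than the package's real method. It uses `testing.AllocsPerRun` to compare heap allocations against a `big.Int` multiplication into a fresh receiver.

```go
package main

import (
	"fmt"
	"math/big"
	"math/bits"
	"testing"
)

// mulUint64 multiplies a 256-bit value (four 64-bit words, least significant
// first) by a uint64, modulo 2^256. Hypothetical helper for illustration.
func mulUint64(x [4]uint64, y uint64) (z [4]uint64) {
	var hi, lo, c uint64
	carry, low := bits.Mul64(x[0], y)
	z[0] = low
	for i := 1; i < 4; i++ {
		hi, lo = bits.Mul64(x[i], y)
		z[i], c = bits.Add64(lo, carry, 0)
		carry = hi + c // cannot overflow: hi is at most 2^64 - 2
	}
	return z
}

// Package-level sinks keep results alive so the work is not optimized away.
var (
	sinkWords [4]uint64
	sinkBig   *big.Int
)

// fixedAllocs reports the average heap allocations per fixed-width multiply.
func fixedAllocs() float64 {
	x := [4]uint64{1, 2, 3, 4}
	return testing.AllocsPerRun(1000, func() { sinkWords = mulUint64(x, 7) })
}

// bigAllocs reports the average heap allocations per big.Int multiply when
// the result goes into a newly created receiver.
func bigAllocs() float64 {
	x := new(big.Int).Lsh(big.NewInt(1), 255)
	y := big.NewInt(7)
	return testing.AllocsPerRun(1000, func() { sinkBig = new(big.Int).Mul(x, y) })
}

func main() {
	fmt.Println("fixed-width allocs/op:", fixedAllocs())
	fmt.Println("big.Int allocs/op:    ", bigAllocs())
}
```

Reusing a `big.Int` receiver with enough capacity can avoid some of those allocations, which is consistent with several big.Int rows in the output above showing 0 allocs/op.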

@matheusd matheusd left a comment

🎉 🎉 🎉

Excellent work as always! Only caught a few more documentation issues.

davecgh commented Nov 17, 2021

Updated the commit messages per the @JoeGruffins review as well.

@matheusd matheusd left a comment

Great work as always!

Benchmarks
goos: linux
goarch: amd64
pkg: github.com/decred/dcrd/internal/staging/primitives/uint256
cpu: AMD Ryzen 3 2200G with Radeon Vega Graphics    
BenchmarkUint256SetBytes-4     	476069810	         2.482 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntSetBytes-4      	160615147	         7.908 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256SetBytesLE-4   	430822285	         2.687 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntSetBytesLE-4    	19294600	        60.83 ns/op	      32 B/op	       1 allocs/op
BenchmarkUint256Bytes-4        	88200577	        12.33 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntBytes-4         	18895275	        55.32 ns/op	      32 B/op	       1 allocs/op
BenchmarkUint256BytesLE-4      	100000000	        12.18 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntBytesLE-4       	15329814	        68.77 ns/op	      32 B/op	       1 allocs/op
BenchmarkUint256Zero-4         	1000000000	         1.120 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntZero-4          	451413762	         2.716 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256IsZero-4       	681048356	         1.635 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntIsZero-4        	613439154	         1.849 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256IsOdd-4        	797677572	         1.669 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntIsOdd-4         	302746519	         3.462 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Eq-4           	654700365	         1.949 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntEq-4            	112233123	        10.35 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lt-4           	480044094	         2.420 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLt-4            	100000000	        10.72 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Gt-4           	428775073	         2.633 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntGt-4            	112796678	        12.01 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Cmp-4          	162790765	         6.916 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntCmp-4           	117192634	        10.42 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256CmpUint64-4    	375092656	         3.216 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntCmpUint64-4     	241991760	         6.247 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Add-4          	575967142	         1.912 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntAdd-4           	 8228707	       138.5 ns/op	       4 B/op	       0 allocs/op
BenchmarkUint256AddUint64-4    	462350301	         2.604 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntAddUint64-4     	26791534	        45.15 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Sub-4          	651187911	         1.755 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntSub-4           	22559347	        55.86 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256SubUint64-4    	448262576	         2.629 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntSubUint64-4     	25150804	        40.20 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Mul-4          	135067701	         8.883 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntMul-4           	 2939754	       388.9 ns/op	      64 B/op	       1 allocs/op
BenchmarkUint256MulUint64-4    	321874484	         3.923 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntMulUint64-4     	 5118799	       247.4 ns/op	       8 B/op	       1 allocs/op
BenchmarkUint256Square-4       	194034268	         7.041 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntSquare-4        	 2914981	       393.1 ns/op	      64 B/op	       1 allocs/op
BenchmarkUint256Div/dividend_lt_divisor-4         	398487291	         2.945 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/dividend_eq_divisor-4         	364279308	         3.101 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/1_by_1_near-4                 	302580482	         3.855 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/1_by_1_far-4                  	96309692	        13.61 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/2_by_1_near-4                 	129636962	         8.936 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/2_by_1_far-4                  	46576605	        25.07 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/3_by_1_near-4                 	100000000	        11.71 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/3_by_1_far-4                  	32242239	        37.36 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_1_near-4                 	63737451	        15.75 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_1_far-4                  	20309397	        49.48 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/2_by_2_near-4                 	53796532	        20.07 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/2_by_2_far-4                  	42929392	        30.03 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/3_by_2_near-4                 	38020507	        26.79 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/3_by_2_far-4                  	25221169	        47.17 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_2_near-4                 	33020859	        35.22 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_2_far-4                  	16499960	        61.37 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/3_by_3_near-4                 	68166744	        17.98 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/3_by_3_far-4                  	42235713	        29.73 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_3_near-4                 	35219878	        32.05 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_3_far-4                  	28236904	        41.50 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_4_near-4                 	70831348	        17.26 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Div/4_by_4_far-4                  	38107536	        31.37 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256DivRandom-4                       	60160770	        20.02 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/dividend_lt_divisor-4          	15166644	        71.20 ns/op	      64 B/op	       1 allocs/op
BenchmarkBigIntDiv/dividend_eq_divisor-4          	 5455660	       261.5 ns/op	      80 B/op	       1 allocs/op
BenchmarkBigIntDiv/1_by_1_near-4                  	20247571	        58.81 ns/op	       8 B/op	       1 allocs/op
BenchmarkBigIntDiv/1_by_1_far-4                   	42041121	        28.51 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/2_by_1_near-4                  	36022141	        32.12 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/2_by_1_far-4                   	28347584	        42.89 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/3_by_1_near-4                  	30974647	        38.03 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/3_by_1_far-4                   	20732623	        49.20 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/4_by_1_near-4                  	28156827	        44.69 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/4_by_1_far-4                   	17195563	        63.88 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDiv/2_by_2_near-4                  	 5086784	       216.1 ns/op	      64 B/op	       1 allocs/op
BenchmarkBigIntDiv/2_by_2_far-4                   	 5746678	       212.5 ns/op	      64 B/op	       1 allocs/op
BenchmarkBigIntDiv/3_by_2_near-4                  	 5075608	       253.1 ns/op	      64 B/op	       1 allocs/op
BenchmarkBigIntDiv/3_by_2_far-4                   	 4244466	       257.3 ns/op	      64 B/op	       1 allocs/op
BenchmarkBigIntDiv/4_by_2_near-4                  	 4156348	       295.6 ns/op	      80 B/op	       1 allocs/op
BenchmarkBigIntDiv/4_by_2_far-4                   	 4208538	       295.9 ns/op	      80 B/op	       1 allocs/op
BenchmarkBigIntDiv/3_by_3_near-4                  	 4301845	       269.1 ns/op	      64 B/op	       1 allocs/op
BenchmarkBigIntDiv/3_by_3_far-4                   	 5341867	       221.1 ns/op	      64 B/op	       1 allocs/op
BenchmarkBigIntDiv/4_by_3_near-4                  	 4738832	       247.4 ns/op	      80 B/op	       1 allocs/op
BenchmarkBigIntDiv/4_by_3_far-4                   	 4631062	       250.4 ns/op	      80 B/op	       1 allocs/op
BenchmarkBigIntDiv/4_by_4_near-4                  	 5088583	       234.5 ns/op	      80 B/op	       1 allocs/op
BenchmarkBigIntDiv/4_by_4_far-4                   	 5235777	       229.4 ns/op	      80 B/op	       1 allocs/op
BenchmarkBigIntDivRandom-4                        	 5518939	       221.6 ns/op	      72 B/op	       1 allocs/op
BenchmarkUint256DivUint64-4                       	22925576	        44.61 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntDivUint64-4                        	10211529	       124.1 ns/op	       8 B/op	       1 allocs/op
BenchmarkUint256Negate-4                          	850187539	         1.238 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntNegate-4                           	24669282	        51.68 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lsh/bits_0-4                      	507476983	         2.201 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lsh/bits_1-4                      	316203012	         3.606 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lsh/bits_64-4                     	459433334	         2.491 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lsh/bits_128-4                    	484224193	         2.384 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lsh/bits_192-4                    	462848384	         2.296 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lsh/bits_255-4                    	480112038	         2.348 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Lsh/bits_256-4                    	448478752	         2.463 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLsh/bits_0-4                       	172591105	         6.000 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLsh/bits_1-4                       	93923139	        12.78 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLsh/bits_64-4                      	81818536	        14.59 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLsh/bits_128-4                     	79875747	        14.30 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLsh/bits_192-4                     	83981918	        14.12 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLsh/bits_255-4                     	87148645	        13.77 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntLsh/bits_256-4                     	84002761	        14.29 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Rsh/bits_0-4                      	475755678	         2.233 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Rsh/bits_1-4                      	313881753	         3.717 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Rsh/bits_64-4                     	448903030	         2.456 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Rsh/bits_128-4                    	522788181	         2.415 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Rsh/bits_192-4                    	501043711	         2.303 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Rsh/bits_255-4                    	493278232	         2.352 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Rsh/bits_256-4                    	524338425	         2.198 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntRsh/bits_0-4                       	163935820	         7.280 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntRsh/bits_1-4                       	79080118	        13.90 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntRsh/bits_64-4                      	107008660	        10.92 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntRsh/bits_128-4                     	121404361	         9.779 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntRsh/bits_192-4                     	136675233	         8.820 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntRsh/bits_255-4                     	135231632	         9.048 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntRsh/bits_256-4                     	179642262	         6.345 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Not-4                             	455189844	         2.649 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntNot-4                              	56325128	        21.92 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Or-4                              	425439813	         2.828 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntOr-4                               	76605534	        15.22 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256And-4                             	385027156	         3.201 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntAnd-4                              	79475395	        15.53 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Xor-4                             	415490449	         3.001 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntXor-4                              	63178353	        15.93 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256BitLen/bits_64-4                  	688524416	         1.729 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256BitLen/bits_128-4                 	681189094	         1.707 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256BitLen/bits_192-4                 	872857674	         1.380 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256BitLen/bits_255-4                 	865733118	         1.431 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntBitLen/bits_64-4                   	540039884	         2.160 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntBitLen/bits_128-4                  	600470946	         1.953 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntBitLen/bits_192-4                  	575591056	         1.964 ns/op	       0 B/op	       0 allocs/op
BenchmarkBigIntBitLen/bits_255-4                  	570677790	         2.301 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256Text/base_2-4                     	 2326405	       532.3 ns/op	     512 B/op	       2 allocs/op
BenchmarkUint256Text/base_8-4                     	 5222268	       214.1 ns/op	     192 B/op	       2 allocs/op
BenchmarkUint256Text/base_10-4                    	 2746434	       466.3 ns/op	     160 B/op	       2 allocs/op
BenchmarkUint256Text/base_16-4                    	 7155501	       195.4 ns/op	     128 B/op	       2 allocs/op
BenchmarkBigIntText/base_2-4                      	 2267157	       523.1 ns/op	     528 B/op	       2 allocs/op
BenchmarkBigIntText/base_8-4                      	 4910493	       246.6 ns/op	     192 B/op	       2 allocs/op
BenchmarkBigIntText/base_10-4                     	 2231427	       529.7 ns/op	     224 B/op	       3 allocs/op
BenchmarkBigIntText/base_16-4                     	 6620342	       184.6 ns/op	     135 B/op	       2 allocs/op
BenchmarkUint256Format/base_2-4                   	 1265379	       966.8 ns/op	     768 B/op	       3 allocs/op
BenchmarkUint256Format/base_8-4                   	 2023450	       523.9 ns/op	     288 B/op	       3 allocs/op
BenchmarkUint256Format/base_10-4                  	 1683890	       712.5 ns/op	     240 B/op	       3 allocs/op
BenchmarkUint256Format/base_16-4                  	 2705778	       450.0 ns/op	     192 B/op	       3 allocs/op
BenchmarkBigIntFormat/base_2-4                    	 1315701	       919.3 ns/op	     552 B/op	       5 allocs/op
BenchmarkBigIntFormat/base_8-4                    	 2011153	       584.3 ns/op	     216 B/op	       5 allocs/op
BenchmarkBigIntFormat/base_10-4                   	 1255659	       889.5 ns/op	     248 B/op	       6 allocs/op
BenchmarkBigIntFormat/base_16-4                   	 2304246	       522.1 ns/op	     160 B/op	       5 allocs/op
BenchmarkUint256PutBig-4                          	46232820	        23.74 ns/op	       0 B/op	       0 allocs/op
BenchmarkUint256SetBig-4                          	26237026	        43.49 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/decred/dcrd/internal/staging/primitives/uint256	230.729s

@rstaudt2 left a comment

This looks great. The documentation is excellent, making the logic easy to follow. I just had a few very minor comments inline.

@davecgh force-pushed the primitives_uint256 branch 2 times, most recently from e097b4f to 15b8ddc on November 19, 2021 22:23
Profiling the CPU usage during an initial chain sync shows that roughly
65-70% of all time is spent in garbage collection operations.  This is
primarily the result of a large number of in-use allocations.  Profiling
the in-use allocations shows that almost a quarter of all in-use
allocations (~22%) are due to standard library big integers which
require allocations.  In other words, eliminating those allocations
should lead to a speedup of around 10% to the initial chain sync.

More specifically, the allocations in question are the result of several
important calculations which could be done without allocations, and more
efficiently in terms of execution time, via fixed precision unsigned
256-bit integers.

Thus, motivated by the previous discussion, this is part of a series of
commits that implements highly optimized allocation free fixed precision
unsigned 256-bit integer arithmetic that can ultimately be used in place
of the standard library big integers.

For the time being, the package is introduced into the internal staging
area for initial review.

The following is a brief overview of the main features and benefits:

- Strong focus on performance and correctness
- Every operation is faster than the stdlib big.Int equivalent and most
  operations, including the primary math operations, are significantly
  faster
- Allocation free
  - All non-formatting operations with the specialized type are allocation free
- Supports boolean comparison, bitwise logic, and bitwise shift operations
- All operations are performed modulo 2^256
- Ergonomic API with unary-style arguments as well as some binary variants
- Conversion-free support for interoperation with native uint64 integers
- Direct conversion to and from little and big endian byte arrays
- Full support for formatted output and common base conversions
  - Formatted output uses fewer allocations than stdlib big.Int
- 100% test coverage
- Comprehensive benchmarks
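
To make "performed modulo 2^256" concrete, the following is a minimal sketch that assumes a value is held as four 64-bit words, least significant word first. The `u256` type and `add` helper are hypothetical names used only for illustration and are not the package API. The carry out of the top word is simply dropped, so the maximum value plus one wraps around to zero.

```go
package main

import (
	"fmt"
	"math/bits"
)

// u256 is a hypothetical stand-in for a 256-bit unsigned integer: four 64-bit
// words with the least significant word first.
type u256 [4]uint64

// add returns a+b modulo 2^256.  The carry out of the most significant word
// is discarded, which is what provides the modular wrap-around.
func add(a, b u256) u256 {
	var r u256
	var carry uint64
	r[0], carry = bits.Add64(a[0], b[0], 0)
	r[1], carry = bits.Add64(a[1], b[1], carry)
	r[2], carry = bits.Add64(a[2], b[2], carry)
	r[3], _ = bits.Add64(a[3], b[3], carry)
	return r
}

func main() {
	maxVal := u256{^uint64(0), ^uint64(0), ^uint64(0), ^uint64(0)}
	one := u256{1}
	fmt.Println(add(maxVal, one) == u256{}) // prints true
}
```

Because the value is a fixed-size array rather than a slice, a sketch like this never needs to grow storage, which is the property that avoids heap allocations.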

In order to help ease the review process, the full implementation will
be done across many commits.

This commit only contains the basic type definition and ability to set
it to a uint64 or another uint256 along with tests, so it is not very
useful on its own.
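
As a rough sketch of what such a basic type definition with the two setters could look like, assuming a layout of four 64-bit words with the least significant word first. The names `Uint256`, `SetUint64`, and `Set` and the chaining return values are assumptions made for illustration and may differ from the code under review.

```go
package main

import "fmt"

// Uint256 sketches a fixed precision unsigned 256-bit integer stored as four
// 64-bit words, least significant word first (an assumed layout).
type Uint256 struct {
	n [4]uint64
}

// SetUint64 sets the value to the passed uint64 and returns the receiver so
// calls can be chained.
func (n *Uint256) SetUint64(v uint64) *Uint256 {
	n.n = [4]uint64{v}
	return n
}

// Set copies the value of the passed uint256 into the receiver.
func (n *Uint256) Set(other *Uint256) *Uint256 {
	*n = *other
	return n
}

func main() {
	a := new(Uint256).SetUint64(42)
	b := new(Uint256).Set(a)
	fmt.Println(b.n[0], *a == *b) // prints 42 true
}
```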

Future commits will implement support for interpreting and producing big
and little endian bytes, the primary arithmetic operations (addition,
subtraction, multiplication, squaring, division, negation), bitwise
operations (lsh, rsh, not, or, and, xor), comparison operations (equals,
less, greater, cmp), and other convenience methods such as determining
the minimum number of bits required to represent the current value,
whether or not the value can be represented as a uint64 without loss of
precision, and text formatting with base conversion.
This adds the ability for the uint256 to be set by interpreting arrays
and slices as a 256-bit big-endian integer and associated tests to
ensure proper functionality.
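
One way to picture the big-endian interpretation is the sketch below, which assumes the value is stored as four 64-bit words with the least significant word first. The `setBytes` helper is a hypothetical illustration built on the standard `encoding/binary` package, not the method added by this commit.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// setBytes interprets b as a 256-bit big-endian integer and returns it as
// four 64-bit words, least significant word first.
func setBytes(b *[32]byte) [4]uint64 {
	return [4]uint64{
		binary.BigEndian.Uint64(b[24:32]), // bits 0 through 63
		binary.BigEndian.Uint64(b[16:24]), // bits 64 through 127
		binary.BigEndian.Uint64(b[8:16]),  // bits 128 through 191
		binary.BigEndian.Uint64(b[0:8]),   // bits 192 through 255
	}
}

func main() {
	var b [32]byte
	b[31] = 0x01 // least significant byte
	b[0] = 0x80  // most significant byte
	fmt.Printf("%#x\n", setBytes(&b))
}
```

Taking a pointer to a fixed-size array means the length is checked at compile time and no bounds-dependent truncation or padding is needed in this sketch.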

This is part of a series of commits to fully implement the uint256
package.
The following is a comparison between stdlib big integers (old) and the
specialized type (new) averaging 10 runs each:

name       old time/op    new time/op    delta
------------------------------------------------------------------
SetBytes   9.09ns ±13%    3.05ns ± 1%   -66.43%  (p=0.000 n=10+10)

name       old allocs/op  new allocs/op  delta
----------------------------------------------------------
SetBytes   0.00           0.00           ~     (all equal)

This is part of a series of commits to fully implement the uint256
package.
This adds the ability to determine if a uint256 is odd along with
associated tests to ensure proper functionality.

This is part of a series of commits to fully implement the uint256
package.
The following is a comparison between stdlib big integers (old) and the
specialized type (new) averaging 10 runs each:

name    old time/op     new time/op     delta
------------------------------------------------------------------
IsOdd   3.62ns ± 4%     1.64ns ± 1%     -54.65%  (p=0.000 n=10+10)

name    old allocs/op   new allocs/op   delta
---------------------------------------------------------
IsOdd   0.00            0.00            ~     (all equal)

This is part of a series of commits to fully implement the uint256
package.
This adds support for uint256 bitwise left shifting along with
associated tests to ensure proper functionality.

It includes left shifting an existing uint256 (a << b) and assigning the
result of left shifting a uint256 to a second one (a <<= b).
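
A left shift across word boundaries can be sketched as follows, assuming four 64-bit words with the least significant word first. The `lsh` helper is a hypothetical, loop-based illustration; an optimized implementation would likely unroll the cases for each whole-word offset instead.

```go
package main

import "fmt"

// lsh returns a << s modulo 2^256 for a value stored as four 64-bit words,
// least significant word first.  Bits shifted past bit 255 are discarded.
func lsh(a [4]uint64, s uint32) [4]uint64 {
	var r [4]uint64
	if s >= 256 {
		return r
	}
	words, rem := int(s/64), s%64
	for i := 3; i >= words; i-- {
		// Bits that stay within the destination word.
		r[i] = a[i-words] << rem
		// Bits carried up from the next lower source word, if any.
		if rem > 0 && i-words-1 >= 0 {
			r[i] |= a[i-words-1] >> (64 - rem)
		}
	}
	return r
}

func main() {
	one := [4]uint64{1}
	fmt.Println(lsh(one, 64))  // prints [0 1 0 0]
	fmt.Println(lsh(one, 256)) // prints [0 0 0 0]
}
```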

This is part of a series of commits to fully implement the uint256
package.
The following is a comparison between stdlib big integers (old) and the
specialized type (new) averaging 10 runs each:

name           old time/op     new time/op     delta
-------------------------------------------------------------------------
Lsh/bits_0      7.1ns ± 3%     2.58ns ± 1%     -63.94%  (p=0.000 n=10+10)
Lsh/bits_1     14.8ns ± 1%     4.2ns ± 1%      -71.40%  (p=0.000 n=10+10)
Lsh/bits_64    16.7ns ± 1%     2.7ns ± 1%      -84.00%  (p=0.000 n=10+10)
Lsh/bits_128   16.9ns ± 2%     2.7ns ± 0%      -84.21%  (p=0.000 n=10+10)
Lsh/bits_192   16.6ns ± 1%     2.6ns ± 1%      -84.19%  (p=0.000 n=10+10)
Lsh/bits_255   16.3ns ± 2%     2.8ns ± 2%      -83.11%  (p=0.000 n=10+10)
Lsh/bits_256   16.9ns ± 2%     2.6ns ± 2%      -84.77%  (p=0.000 n=10+10)

name           old allocs/op   new allocs/op   delta
----------------------------------------------------------------
Lsh/bits_0     0.00            0.00            ~     (all equal)
Lsh/bits_1     0.00            0.00            ~     (all equal)
Lsh/bits_64    0.00            0.00            ~     (all equal)
Lsh/bits_128   0.00            0.00            ~     (all equal)
Lsh/bits_192   0.00            0.00            ~     (all equal)
Lsh/bits_255   0.00            0.00            ~     (all equal)
Lsh/bits_256   0.00            0.00            ~     (all equal)

This is part of a series of commits to fully implement the uint256
package.
This adds support for uint256 bitwise right shifting along with
associated tests to ensure proper functionality.

It includes right shifting an existing uint256 (a >> b) and assigning
the result of right shifting a uint256 to a second one (a >>= b).

This is part of a series of commits to fully implement the uint256
package.
The following is a comparison between stdlib big integers (old) and the
specialized type (new) averaging 10 runs each:

name           old time/op     new time/op     delta
-------------------------------------------------------------------------
Rsh/bits_0     8.76ns ± 2%     2.57ns ± 1%     -70.63%  (p=0.000 n=10+10)
Rsh/bits_1     14.4ns ± 2%     4.3ns ± 2%      -70.28%  (p=0.000 n=10+10)
Rsh/bits_64    12.8ns ± 1%     2.9ns ± 2%      -77.31%  (p=0.000 n=10+10)
Rsh/bits_128   11.8ns ± 0%     2.9ns ± 2%      -75.51%  (p=0.000 n=10+10)
Rsh/bits_192   10.5ns ± 2%     2.6ns ± 1%      -75.17%  (p=0.000 n=10+10)
Rsh/bits_255   10.5ns ± 3%     2.8ns ± 2%      -73.89%  (p=0.000 n=10+10)
Rsh/bits_256   5.50ns ± 1%     2.58ns ± 2%     -53.15%  (p=0.000 n=10+10)

name           old allocs/op   new allocs/op   delta
----------------------------------------------------------------
Rsh/bits_0     0.00            0.00            ~     (all equal)
Rsh/bits_1     0.00            0.00            ~     (all equal)
Rsh/bits_64    0.00            0.00            ~     (all equal)
Rsh/bits_128   0.00            0.00            ~     (all equal)
Rsh/bits_192   0.00            0.00            ~     (all equal)
Rsh/bits_255   0.00            0.00            ~     (all equal)
Rsh/bits_256   0.00            0.00            ~     (all equal)

This is part of a series of commits to fully implement the uint256
package.
This adds support to compute the bitwise not of a uint256 along with
associated tests to ensure proper functionality.

This is part of a series of commits to fully implement the uint256
package.
The following is a comparison between stdlib big integers (old) and the
specialized type (new) averaging 10 runs each:

name   old time/op     new time/op     delta
-----------------------------------------------------------------
Not    25.4ns ± 2%     3.3ns ± 2%      -86.79%  (p=0.000 n=10+10)

name   old allocs/op   new allocs/op   delta
--------------------------------------------------------
Not    0.00            0.00            ~     (all equal)

This is part of a series of commits to fully implement the uint256
package.
This adds support to compute the bitwise or of two uint256s along with
associated tests to ensure proper functionality.

This is part of a series of commits to fully implement the uint256
package.
The following is a comparison between stdlib big integers (old) and the
specialized type (new) averaging 10 runs each:

name   old time/op     new time/op     delta
-----------------------------------------------------------------
Or     17.9ns ± 5%     3.4ns ± 6%      -80.94%  (p=0.000 n=10+10)

name   old allocs/op   new allocs/op   delta
--------------------------------------------------------
Or     0.00            0.00            ~     (all equal)

This is part of a series of commits to fully implement the uint256
package.
This adds support to compute the bitwise and of two uint256s along with
associated tests to ensure proper functionality.

This is part of a series of commits to fully implement the uint256
package.
The following is a comparison between stdlib big integers (old) and the
specialized type (new) averaging 10 runs each:

name   old time/op     new time/op     delta
-----------------------------------------------------------------
And    16.7ns ± 5%     3.4ns ± 6%      -79.93%  (p=0.000 n=10+10)

name   old allocs/op   new allocs/op   delta
--------------------------------------------------------
And    0.00            0.00            ~     (all equal)

This is part of a series of commits to fully implement the uint256
package.
This adds support to compute the bitwise xor of two uint256s along with
associated tests to ensure proper functionality.
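
The bitwise not, or, and, and xor operations all reduce to applying the corresponding native operator to each 64-bit word independently, since no bits cross word boundaries. The sketch below assumes a four-word layout and uses hypothetical helper names rather than the package methods.

```go
package main

import "fmt"

// u256 is a hypothetical four-word representation, least significant word
// first.
type u256 [4]uint64

// not returns the bitwise complement of a.
func not(a u256) u256 { return u256{^a[0], ^a[1], ^a[2], ^a[3]} }

// or returns the bitwise or of a and b.
func or(a, b u256) u256 { return u256{a[0] | b[0], a[1] | b[1], a[2] | b[2], a[3] | b[3]} }

// and returns the bitwise and of a and b.
func and(a, b u256) u256 { return u256{a[0] & b[0], a[1] & b[1], a[2] & b[2], a[3] & b[3]} }

// xor returns the bitwise exclusive or of a and b.
func xor(a, b u256) u256 { return u256{a[0] ^ b[0], a[1] ^ b[1], a[2] ^ b[2], a[3] ^ b[3]} }

func main() {
	a := u256{0xf0f0, 0, 0, 1}
	fmt.Println(xor(a, a) == u256{})      // prints true
	fmt.Println(and(a, not(a)) == u256{}) // prints true
	fmt.Println(or(a, not(a)) == not(u256{}))
}
```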

This is part of a series of commits to fully implement the uint256
package.
The following is a comparison between stdlib big integers (old) and the
specialized type (new) averaging 10 runs each:

name   old time/op     new time/op     delta
-----------------------------------------------------------------
Xor    17.9ns ± 5%     3.4ns ± 6%      -80.91%  (p=0.000 n=10+10)

name   old allocs/op   new allocs/op   delta
--------------------------------------------------------
Xor    0.00            0.00            ~     (all equal)

This is part of a series of commits to fully implement the uint256
package.
This adds support for determining the minimum number of bits required to
represent the current value of a uint256 along with associated tests to
ensure proper functionality.
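
A minimal sketch of the idea, assuming four 64-bit words with the least significant word first: find the highest non-zero word and add the bit length of that word, computed with the standard `bits.Len64`, to the number of bits in the words below it. The `bitLen` helper is a hypothetical name for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitLen returns the minimum number of bits required to represent the value
// stored in a (least significant word first).  The result is 0 for zero.
func bitLen(a [4]uint64) int {
	for i := 3; i >= 0; i-- {
		if a[i] != 0 {
			return i*64 + bits.Len64(a[i])
		}
	}
	return 0
}

func main() {
	fmt.Println(
		bitLen([4]uint64{}),
		bitLen([4]uint64{1}),
		bitLen([4]uint64{0, 0, 0, 1 << 62}),
	) // prints 0 1 255
}
```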

This is part of a series of commits to fully implement the uint256
package.
The following is a comparison between stdlib big integers (old) and the
specialized type (new) averaging 10 runs each:

name       old time/op     new time/op     delta
---------------------------------------------------------------------
bits_64    2.24ns ± 1%     1.94ns ± 3%     -13.04%  (p=0.000 n=10+10)
bits_128   2.25ns ± 2%     1.96ns ± 2%     -13.17%  (p=0.000 n=10+10)
bits_192   2.25ns ± 1%     1.60ns ± 1%     -28.65%  (p=0.000 n=10+10)
bits_255   2.26ns ± 2%     1.61ns ± 1%     -29.04%  (p=0.000 n=10+10)

name       old allocs/op   new allocs/op   delta
------------------------------------------------------------
bits_64    0.00            0.00            ~     (all equal)
bits_128   0.00            0.00            ~     (all equal)
bits_192   0.00            0.00            ~     (all equal)
bits_255   0.00            0.00            ~     (all equal)

This is part of a series of commits to fully implement the uint256
package.
This adds full support for formatting a uint256 along with associated
tests to ensure proper functionality.

It includes a fmt.Formatter that supports the full suite of the fmt
package format flags for integral types, a fmt.Stringer, and a separate
Text method that accepts an output base directly and produces the
relevant output with fewer allocations than using the standard fmt
methods.

This is part of a series of commits to fully implement the uint256
package.
The following is a comparison between stdlib big integers (old) and the
specialized type (new) averaging 10 runs each:

name             old time/op     new time/op     delta
---------------------------------------------------------------------------
Text/base_2      579ns ± 3%      496ns ± 2%      -14.37%  (p=0.000 n=10+10)
Text/base_8      266ns ± 1%      227ns ± 1%      -14.58%  (p=0.000 n=10+10)
Text/base_10     536ns ± 1%      458ns ± 2%      -14.58%  (p=0.000 n=10+10)
Text/base_16     205ns ± 2%      180ns ± 4%      -11.90%  (p=0.000 n=10+10)
Format/base_2    987ns ±15%      852ns ± 2%      -13.64%  (p=0.000 n=10+10)
Format/base_8    620ns ± 6%      544ns ± 3%      -12.31%  (p=0.000 n=10+10)
Format/base_10   888ns ± 1%      726ns ± 1%      -18.25%  (p=0.000 n=10+10)
Format/base_16   565ns ± 1%      449ns ± 1%      -20.41%  (p=0.000 n=10+10)

name             old allocs/op   new allocs/op   delta
--------------------------------------------------------------------------
Text/base_2      2.00 ± 0%       2.00 ± 0%         ~     (all equal)
Text/base_8      2.00 ± 0%       2.00 ± 0%         ~     (all equal)
Text/base_10     3.00 ± 0%       2.00 ± 0%      -33.33%  (p=0.000 n=10+10)
Text/base_16     2.00 ± 0%       2.00 ± 0%         ~     (all equal)
Format/base_2    5.00 ± 0%       3.00 ± 0%      -40.00%  (p=0.000 n=10+10)
Format/base_8    5.00 ± 0%       3.00 ± 0%      -40.00%  (p=0.000 n=10+10)
Format/base_10   6.00 ± 0%       3.00 ± 0%      -50.00%  (p=0.000 n=10+10)
Format/base_16   5.00 ± 0%       3.00 ± 0%      -40.00%  (p=0.000 n=10+10)

This is part of a series of commits to fully implement the uint256
package.
This adds convenience methods for converting a uint256 to a standard
library big integer along with associated tests to ensure proper
functionality.

It includes a method that allows an existing big integer to be reused
thereby potentially saving allocations as well as a method that returns
a new big integer.  The latter is often more convenient to use, but is
also virtually guaranteed to cause an allocation.

This is part of a series of commits to fully implement the uint256
package.
The following shows the typical performance of converting a uint256 to a
standard library big integer using one that already exists:

Uint256PutBig   43651442   27.29 ns/op   0 B/op   0 allocs/op

This is part of a series of commits to fully implement the uint256
package.
This adds a convenience method for converting a standard library big
integer to a uint256 (modulo 2^256) along with associated tests to
ensure proper functionality.

This is part of a series of commits to fully implement the uint256
package.
The following shows the typical performance of converting a standard
library big integer that has already been reduced modulo 2^256 to a
uint256:

Uint256SetBig   26944130   44.45 ns/op   0 B/op   0 allocs/op

This is part of a series of commits to fully implement the uint256
package.
This adds an example of calculating the result of dividing a max
unsigned 256-bit integer by a max unsigned 128-bit integer and
outputting that result in hex with leading zeros.

This is part of a series of commits to fully implement the uint256
package.
@davecgh davecgh merged commit 8f3fd5a into decred:master Nov 20, 2021
@davecgh davecgh deleted the primitives_uint256 branch November 20, 2021 00:34