-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Faster RANLUX PRNGs #57
Comments
On Sep 26, 2019, at 12:06, Christoph Conrads ***@***.***> wrote:
This is a a cross-post from the boost developers mailing list from App 10.
Boost.Random and the C++11 standard library ship with a 24- and a 48-bit
RANLUX pseudo-random number generator (PRNG). I propose to add 32- and
64-bit RANLUX PRNGs because they are at least 40% faster. Additionally,
the 32-bit RANLUX PRNG can be implemented as follows:
....
typedef br::subtract_with_carry_engine<boost::uint32_t,32,3,17> ranlux32_base;
typedef br::discard_block_engine<ranlux32_base, 389, 16> ranlux32;
I see three separate issues with your proposal.
1) It is very well established that subtract_with_carry + discard_block_engine = RANLUX
only for the very specific parameter sets described in the paper by Luscher. What you have is not RANLUX.
I dont think it is fair to use open-source software names
without permission. I guess you could rename it to address this concern.
2) You say it is 40% faster than RANLUX, but RANLUX itself is 10-20 times slower that the other generators.
As a result, your generators would still be 5-10 times slower than the others.
3) Your paper is not published in a journal. I know, it is a little old fashioned in the age of facebook, but it is still a requirement.
Regards, Kostas
========================================
Institute of Nuclear and Particle Physics
NCSR Demokritos, Athens, Greece
http://mixmax.hepforge.org
https://github.com/kotika/random/
========================================
|
And we're talking about standardizing PCG! I think that ship has sailed. @christoph-conrads: Could you post code to run it through PractRand as well as your results? |
AMDG
On 9/26/19 6:02 AM, Nick wrote:
> 3) Your paper is not published in a journal. I know, it is a little old fashioned in the age of facebook, but it is still a requirement.
And we're talking about standardizing PCG! I think that ship has sailed.
I don't have the expertise to evaluate the quality of
a PRNG algorithm. I'm not convinced that the standards
committee does either.
@christoph-conrads: Could you post code to run it through PractRand as well as your results?
In Christ,
Steven Watanabe
|
On Sep 26, 2019, at 15:02, Nick ***@***.***> wrote:
And we're talking about standardizing PCG! I think that ship has sailed.
This one is slightly better than PCG, but both are far inferior to MT64. Every other generator in boost and the standard are
completely obsolete.
Cheers,
Kostas
|
You don't need to; if the code looks good to you, I'll do the requisite evaluation. |
Compile with
Below you find for the 32-bit generator
Let me know if you anything else.
dieharder results:
Building practrand:
|
On Sep 26, 2019, at 13:10, Kostas Savvidis ***@***.***> wrote:
I see three separate issues with your proposal.
1) It is very well established that subtract_with_carry + discard_block_engine = RANLUX
only for the very specific parameter sets described in the paper by Luscher. What you have is not RANLUX.
I dont think it is fair to use open-source software names
without permission. I guess you could rename it to address this concern.
2) You say it is 40% faster than RANLUX, but RANLUX itself is 10-20 times slower that the other generators.
As a result, your generators would still be 5-10 times slower than the others.
3) Your paper is not published in a journal. I know, it is a little old fashioned in the age of facebook, but it is still a requirement.
On 9/26/19 6:02 AM, Nick wrote:
> 3) Your paper is not published in a journal. I know, it is a little old fashioned in the age of facebook, but it is still a requirement.
And we're talking about standardizing PCG! I think that ship has sailed.
I would still like Christoph Conrads to address the first two points.
Cheers, Kostas
|
Changing parameters is in my opinion not sufficient to warrant renaming the generator. For example quicksort with randomized or three pivots is still called quicksort. The (Francis) QR algorithm is after 50 years of progress in numerical linear algebra and all of its adaptions to modern computer architectures still called the QR algorithm. The proposed generator parameters were derived by applying the methods given in the papers by Marsaglia, Zaman and Lüscher with only minor modifications. I retraced almost all of the steps in their papers. Also, the new parameters preserve all desirable properties of a RANLUX; no corners were cut.
Boost and C++11 ship with RANLUX PRNG implementations. The new 32-bit generator possesses all good properties of the existing generators while being much faster and it can be implemented with a simple type alias. Moreover, in C++ the generator never uses (single-precision) floating-point arithmetic which was originally the intention behind 24-bit numbers (cf. Section 2.4 in Lüscher's paper). |
@christoph-conrads : Your dieharder and practrand results look good; I'll try to reproduce them this afternoon. If people are still unconvinced (not me) you could also try to get TestU01 working. Do you mind writing a googlebenchmark file and posting the speeds you get? Also, could you add a benchmark for Here's an example for timing the normal distribution:
Is there a 64 bit generator as well as 32 bit? Does the generator hold internal state? How much? (Sorry, would read your code to back this out but simply asking seemed faster.) Is this only recommended for simulations? Are all 32-bit integers provably in the codomain of the generator? Can you also verify that the implementation is clean with |
On Sep 28, 2019, at 14:44, Christoph Conrads ***@***.***> wrote:
I retraced almost all of the steps in their papers.
In the field of RNGs every new guy who came and suggested a new LCG with a new modulus and a new multiplier
gave it a new name, e.g. "minstd" is the same old modulus of 2³¹ − 1 which had been used for 50 years before and a new multiplier=16807.
This is what you have done too, made a new modulus and multiplier for an LCG. As you say yourself, your work does not contain any original idea,
except for factoring the large non-prime modulus which Marsaglia and Zaman could not do in 1990.
Anyway, if you insist it is indeed a genuine ranlux and if your invention is of practical use, why dont you try to convince Luscher to include your "improvements" to his distribution?
His package is at http://luscher.web.cern.ch/luscher/ranlux/
and his work is copyrighted. I presume the name is not in the public domain.
* File ranlux.h
* Copyright (C) 2019 Martin Luescher
The last comment is that Luscher himself has improved the performance of his generators by a factor of 2-4 compared to the naive implementations,
when you say you improved it by 40% what are you comparing it to? And for those who are not aware there is already another experimental
new implementations of RANLUX by Sibidanov which are even faster.
Cheers,
Kostas
============================================================================================
Institute of Nuclear and Particle Physics
NCSR Demokritos
https://github.com/kotika/random <https://github.com/kotika/random>
https://mixmax.hepforge.org <https://mixmax.hepforge.org/>
|
Hi @kotika : Some of your suggestions are indeed interesting, but they are couched in somewhat inflammatory language which will not help us all move forward in making the library better. Could you edit your comment to only include technical commentary? |
@kotika : yes, you'll find my email in the git log of boost.math. Also, I recommend editing your original content instead of duplicating it. |
I did, it works (see my blog post from the thread start) but I do not want to post the code calling TestU01 because its license is unclear to me. The website mentions GPLv3 while
The file is below and it benchmarks Results:
Yes but the 64-bit generator requires a small amount of new code in
18 32-bit integers for
The generator aims to create high quality uniform variates. Existing RANLUX flavors have been used successfully in, e.g., computational physics.
Yes, all 32-bit integers are provably in the codomain of Yes, all 64-bit integers are provably in the codomain of
Note that the 32-bit generator is just an alias, there is no new code:
Earlier in this thread, I published
Google Benchmark file (
Build instructions for
|
The paper of Marsaglia of Zaman contains multiple tables of generator parameters so I would not say that this is true in general.
You are mixing up copyright and trademark law. Also, strictly speaking Frederick James would be the trademark owner because he introduced the term, not Lüscher. I would like to publish my findings in Boost.Random.
If you run the benchmark in the initial post, the two fastest generators should be With Google Benchmark (see above), the difference is 36% (44.7ns vs 28.6ns) but their measurement setup is different from mine. |
@christoph-conrads
This is by no means enough testing, think many TB's, not GB's. Basically you'll have to let it run until it fails, or until it starts to look it won't fail, but this can be way out! You should also just get the summary results. See also http://www.pcg-random.org/posts/pcg-passes-practrand.html . There is a flaw in your testing.
The above says you're testing a 32-bit prng on 64-bit input, this means that to generate 1 input, you use 2 outputs of the generator. This will obviously give good results. You should run the test like so:
The bench-marks:
You do see, that There are much faster prng's that dont' fail Practrand quickly, like: https://gist.github.com/imneme/85cff47d4bad8de6bdeb671f9c76c814 , https://gist.github.com/imneme/f1f7821f07cf76504a97f6537c818083 and https://gist.github.com/imneme/6179748664e88ef3c34860f44309fc71. The latter, The fastest of them all is https://gist.github.com/imneme/aeae7628565f15fb3fef54be8533e39c , but it fails |
@degski : Nice catch, yes terabytes is much more appropriate especially since this generator carries enough internal state to make it sensible. |
@NAThompson It seems |
@degski : I don't know how much internal state ranlux24 carries, but could you be seeing a birthday paradox? |
@NAThompson I believe it has a state of 120 bytes. The bdp-problem related only to I think for now the only issue is whether It still leaves the, for most people, most important issue on the table, 'it's slow' and there are better and faster prng's. |
@NAThompson I've got code for a number of |
@degski : I'm too busy at the moment to take over any rng efforts. However, I do have time to review contributions. Your xorshiro code looks incredibly well-done, and even though there might be better RNGs, I think adding your implementation would be a step in the right direction. |
@NAThompson Like I wrote, Steven reviewed it, so it should be good. The problem is in the tests and docs, let's just keep it on the back-burner for now, the code is there and will stay there. |
It [ |
Thanks for finding this.
The goal is to provide a better set of parameters for a high-quality PRNG that is already available to all users of Boost.Random and C++11. No or minimal code changes are required, respectively. All Also, RANLUX may be 25 times slower than the Mersenne Twister but MT is guaranteed to fail empirical tests. It depends on your requirements if this is acceptable to you or not. |
24*24 bits = 576 bits with a period of about 2^568 or 10^171. |
It should if the math is correct because
Marsaglia and Zaman discuss in their original paper generators of the form |
Thank you for running the test. |
Unfortunately we [me and my fellow-villagers] had a power-cut [I live in a 3-rd world country] and the test got whacked. I post the result so far, and do not intend to pursue this any further, this prng is still 25 times slower than MT, so I won't be using it [and nobody should].
|
@swatanabe Are you interested in ranlux32 has 544 bits of state plus a carry bit and a period of 2^534. |
The reason why Iooked into the RANLUX is because I just could not figure out why someone would voluntarily use a PRNG using 24-bit arithmetic in C++ when 32-bit arithmetic is ubiquitous. This forces compilers to generate an extra instruction (a |
On Sep 28, 2019, at 23:57, Christoph Conrads ***@***.***> wrote:
Please not that currently ranlux32 does not work with the GNU C++ Standard Library (libstdc++) due to GCC bug 89979.
=============================================================================================================================
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89979
Reported: 2019-04-04 21:46 UTC by Christoph Conrads
The carry inside `std::subtract_with_carry_engine` is computed incorrectly if word_size == std::numeric_limits<result_type>::digits.
The bug can be triggered neither by std::ranlux24 nor by std::ranlux48.
Assignee: Not yet assigned to anyone
===============================================================================================================================
The facts:
1) a proposal which can be implemented by adding two lines of typedef
2) but which does not work with GCC libstdc++
3) six months later, and noone at GCC has been assigned to look into this
And, today we get another tangible datapoint:
On Oct 3, 2019, at 17:22, degski ***@***.***> wrote:
Unfortunately we had a power-cut [I live in a 3-rd world country] and the test got whacked.
4) the time to run the statistical tests on this "fast" "new" rng is so long that it is longer than the time between power outages in some countries
Cheers,
Kostas
============================================================================================
Institute of Nuclear and Particle Physics
NCSR Demokritos
https://github.com/kotika/random
https://mixmax.hepforge.org
|
I devised new parameters (
The PRNG was started from the all-zero state with the carry bit set. For the test I made the PRNG regularly print its state so if you want to figure out if I really ran this experiment, you read in a random state in this file and let the PRNG generate approximately 8GB of data before you can again compare with the state in the file.
@swatanabe Given that I produce a document explaining how to arrive these parameters and benchmarks, would you consider to accept a pull request for the new parameters? |
This is a a cross-post from the boost developers mailing list from App 10.
Boost.Random and the C++11 standard library ship with a 24- and a 48-bit
RANLUX pseudo-random number generator (PRNG). I propose to add 32- and
64-bit RANLUX PRNGs because they are at least 40% faster. Additionally,
the 32-bit RANLUX PRNG can be implemented as follows:
There is C++98 benchmark code at the end of this e-mail requiring only
Boost and a C++ compiler.
Below I will briefly discuss the theory behind RANLUX PRNGs, where the
new generators are coming from, and why the 64-bit RANLUX PRNG requires
(a few lines of) new code.
Drawing from RANLUX PRNGs is a two-step process. In the first step a
value is drawn from a "subtract-with-borrow" (SWB) generator
implementing b-bit modular arithmetic [1]. Then, values are generously
discarded yielding uniformly distributed values (uniform in a dynamical
systems model) [2]. The RANLUX PRNGs using 24- and 48-bit arithmetic
always gave me an itch so I looked into the motivation behind this. It
turns out [3] that this restriction arose likely from a) computational
limitations in the 1990s and b) Lüscher's interest in the generation of
floating-point numbers (single-precision floating-point numbers have
24-bit significand).
SWB PRNGs come in two flavors (Type I, Type II) and they make use of
three parameters called base b, short lag s, and long lag r. Using
modern prime factorization methods, one can quickly come up with the
32-bit PRNG above, a Type I SWB generator with b=2^32, r=17, and s=3.
This one can be implemented as shown above.
To generate 64-bit values, one should use a Type II SWB with b=2^64,
r=62, and s=5 (all other 64-bit generators with b=2^64 and r < 100 will
be awfully slow). What distinguishes the different types of SWBs is the
recurrence for the computation of variates. For Type I it is
x_n := (x_{n-s} - x_{n-r} + c_n) mod b,
where c_n is a carry bit. For Type II it is
x_n := (x_{n-r} - x_{n-s} + c_n) mod b.
(Observe the order of long lang and short lag in these formulas.) Type
II SWBs are not difficult to implement but it requires either a
modification of
subtract_with_carry_engine
or a new class.Let me know what you think about this proposal.
Christoph Conrads
[1] G. Marsaglia, A. Zaman: "A New Class of Random Number Generators".
1991. URL: https://www.jstor.org/stable/2959748
[2] Martin Lüscher: "A Portable High-Quality Random Number Generator for
Lattice Field Theory Simulations". 1994. URL:
https://arxiv.org/abs/hep-lat/9309020
[3] Christoph Conrads: "Faster RANLUX Pseudo-Random Number Generators".
2019. URL:
https://christoph-conrads.name/faster-ranlux-pseudo-random-number-generators
The text was updated successfully, but these errors were encountered: