Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uniform_real_distribution<double> produces identical values too often #61

Open
danieltuzes opened this issue Nov 26, 2019 · 15 comments
Open

Comments

@danieltuzes
Copy link

If I generate uniformly distributed numbers with boost::random::uniform_real_distribution<double> it produces identical values too often. For further information, please read the original issue on SO and an example code on wandbox. I post my problem here as somebody suggested that it could be a bug. I am still not convinced about that but I am also not sure if there is any bug in my code that produces this strange behavior.

I copy the question from SO here but follow the link for further updates and suggestions.

Problem description

Sometimes I get the same random number from a uniform distribution using a Mersenne Twister engine even I properly used the engine and iterated it. I know that the number of possible states of the engine is finite and number of possible generated values is also finite, but this is not the case now.

Using boost's implementation, 1e6 number of uniformly distributed random values are generated on the range [0; 1e7). That means that there way more possible values than required number of random values. However, I get quite often the same values, sometimes more than 100 times in this range. How is it possible?

Code

A simple code is provided to reproduce the situation. On both platforms I get the same problem:

  • MSVS 2019 with boost-random:x64-windows 1.71.0, and
  • g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609 with libboost-dev 1.58.0.1ubuntu1
#include <iostream>
#include <chrono>

#include <boost/random/mersenne_twister.hpp>          // random number generator
#include <boost/random/uniform_real_distribution.hpp> // uniform distribution generator
using namespace std;

int main()
{
    size_t seed = static_cast<int> (std::chrono::system_clock::now().time_since_epoch().count());
    cout << "seed = " << seed << endl;
    
    boost::random::mt19937 engine(seed);                         // the random number generator engine
    boost::random::uniform_real_distribution<double> u(0, 1e7);  // uniformly distributed double values on the range [0; 1e7)
    cout.precision(20);
    vector<double> history;                                      // stores the generated values for comparison
    for (size_t i = 0; i < 1e6; ++i)
    {
        history.push_back(u(engine));
        for (size_t j = 0; j < i; ++j)
            if (history[i] == history[j])
                cout << "Equal values ("<< history[i] <<") at ID = " << i << " and " << j << endl;
    }
}

Question

Is there a bug in the code that produces the same values? Or is it a bug in boost?

For my task it is important to generate numbers with uniform distribution. Finding identical values is one of the easiest tests but there are many more and I am sure I don't want to do quality analysis on a well-known library like Boost. I didn't want to use the standard library, because it is not guaranteed that two different compilers will give the same sequence for the same seed values, but it was a requirement for the task. What kind of a solution can you suggest?

Note

A strange behavior can be seen if one compares the generated values with the one std::random generates. Example for values from random::boost for seed 4561565448989 is

1755586.0406719148159
3354420.976247638464   <--
3630764.0071026980877
3488445.2889673411846  <--
7920481.4555123448372
8773544.1024415194988  <--

while standard library generates

3354420.9766563926823  <--
3488445.2898126943037  <--
8773544.1042856499553  <--
...

That is, every second generated value in the boost's sequence is very close to a corresponding value in the standard library's implementation. When two values in the boost-sequence are equal, the values in the standard-library-sequence are not equal, but close to each other. The similarity holds for MSVS and g++ compilers too, which have the right to have different implementation for Mersenne Twister and distributions.

It is the distribution generator

If the standard library's MT engine is used, but boost's distribution, the problem persists. But if the engine is the one from boost and the distribution is standard, the problem disappears. The problem is, as Peter pointed out, that the uniform distribution is platform depend for which I use boost.

@kotika
Copy link
Contributor

kotika commented Nov 26, 2019 via email

@danieltuzes
Copy link
Author

How would you than describe the cases I mentioned at It is the distribution generator? I.e., if I use std's MT engine and pass it to std's uniform distribution, I get the proper distribution, but if I pass it to boost's uniform distribution, I get the issue? Statistics are available on SO.

Could you please suggest a way (a complete example with the same features I posted above), using boost, to generate uniformly distributed double values on a range? Bc it seems ATM that it is not possible with boost's (random and) uniform distribution, but with std's (random and) uniform distribution. Or did you mean that it is impossible with boost ATM, and I should use copy your code code and forget about boost's implementation?

@kotika
Copy link
Contributor

kotika commented Nov 26, 2019 via email

@danieltuzes
Copy link
Author

danieltuzes commented Nov 26, 2019

I cannot test your code because my boost doesn't come with mixmax, so #include <boost/random/mixmax.hpp> fails. Or do you mean that boost 1.71 is not suitable to produce random numbers as good as the implementations from MSVS and gcc, and you suggest me to use your yet-3rdparty implementation? I make the requirement more precise: I need a solution that relies only on the available boost and a complete, compact code, let's say, a header file, without additional libraries and folders of sources files.

Your suggestion provides another way to generate numbers from the random number generator engine but uses the same distribution. As I pointed out, there is possibly no error with the engine but with the distribution.

@danieltuzes
Copy link
Author

@kotika You are right about that it is only an effect of the too discrete values produced by the distribution, as it can be deduced from the figure.

The figure shows that std's distribution (using boost's MT engine) follows well the expected cumulative distribution while boost's solution, the solid green line doesn't. The blue line is obtained by shifting the green line by the units q = 2.32831e-10 in x, supposing that an inner floor was applied somewhere, therefore the y points measured at x = x0, x1, ... correspond to x = x0+q, x1+q, ...

But as I measured, the same boost MT engine with std's distribution produces a much better distribution. Both with the std from MSVS and g++.

@kotika
Copy link
Contributor

kotika commented Nov 26, 2019 via email

@jzmaddock
Copy link
Contributor

It's boost's distribution, but I don't see off hand where the issue is.

I tried the following code:

   boost::random::mt19937 dist;
   boost::random::uniform_real_distribution<double> d(0, 100);

   std::map<double, unsigned> m;

   for (unsigned i = 0; ; ++i)
   {
      double dd = d(dist);
      if (m[dd]++)
      {
         std::cout << "Collision on value " << dd << " after " << i << " iterations with collision count " << m[dd] << std::endl;
      }
   }

On MSVC, and there are thousands of collisions, but strangely the collision count on any given value never rises above 2.

If I replace boost::random::uniform_real_distribution with std::uniform_real_distribution but keep the generator the same, then there are NO collisions.

I have scanned through the 2 codebases - uniform_real_distribution isn't actually that complex - but I can't spot the difference between the msvc and boost code at present.

@danieltuzes
Copy link
Author

@peteroupc Have you tried your suggestion? Was it working?

@danieltuzes
Copy link
Author

@kotika Once again, the MT engine is just fine. Boost's MT engine is good enough bc std's distribution produces good results with it. Std's MT engine is also good for the same reason.

Boost's distribution is bad, bc on both good engine it produces bad results, bc the produced small numbers are not evenly distributed even for a 32bit float. You can still use a 64bit engine that hides this problem, but one should make it clear than that uniform_real_distribution doesn't produce even float-uniformly distributed values from a 32bit engine.

@peteroupc
Copy link

@peteroupc Have you tried your suggestion? Was it working?

I took it back, sorry.

@kotika
Copy link
Contributor

kotika commented Nov 27, 2019 via email

@danieltuzes
Copy link
Author

Could you please give this as an answer on my SO question so I can accept it? Or do you mind if I copy it instead of you?

@peteroupc
Copy link

peteroupc commented Nov 27, 2019

I want to point out that boost::random::uniform_real_distribution, at line 33, generates one real number (of type T) from only one integer (of type Engine::result_type) from the random engine, and that the method doesn't care what the size and precision of T and Engine::result_type are. Notably, the fewer bits Engine::result_type has (in comparison to the precision of T, which for double is usually 53 bits), the fewer values will be spread across the range given to the generator. However, if that method took enough values from the random engine to be close to the precision of T (rather than just one value), this problem is reduced.

For example, if T is double the method could detect that the size of Engine::result_type is less than 53 bits, and if so, take two or more values from the random engine (rather than one) to generate each real number. This lessens the risk of using random engines with 32-bit outputs, but there are two disadvantages:

  • This method is not exactly trivial to implement, given that the distribution allows an arbitrary range.
  • The method will not be backward compatible with previous versions.

This is a suggestion that I have not tried, but I believe it may help reduce the risk of using random engines with 32-bit or smaller outputs.

The true and permanent solution
is to deprecate old generators and use better high-resolution generators, which like I said, means at the moment: MT64 and MIXMAX.

The quality of a PRNG does not depend on the size of random numbers it outputs (e.g., 32- or 64-bit numbers), but rather it depends on its state size, its period, and whether the numbers behave like independent uniform random numbers.

@kotika
Copy link
Contributor

kotika commented Nov 27, 2019 via email

@mathemaphysics
Copy link

Maybe this has been discovered already, but I got the same result using boost's uniform_real_distribution with the Mersenne generators definitively because I was accidently re-seeding the generator/engine using srand((unsigned)time(0)) followed by handing rand() to any of the engines as seed.

Once I made sure srand((unsigned)time(0)) was only called once before the boost RNG was initialized with rand() as seed and that I wasn't creating multiple boost RNGs to produce my ensemble averages it worked out nicely. No more odd repeated numbers showing up in the distribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants