`randomInt(min, max)` has biased distribution #2720

dimitri-xyz · 2017-04-25T18:41:45Z

The final definition of randomInt uses a scale-then-floor algorithm to fit the distribution to the range:

      var _randomInt = function(min, max) {
        return Math.floor(min + distribution() * (max - min));
      };

This scaling introduces a tiny bias in the resulting distributions. If N random bits of precision are used in distribution() this bias is only of order 2^(-N), but it may be a problem for some scientific and crypto applications.
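To make the effect concrete, here is a small sketch (not the library's code) that enumerates every output of an idealized source with only 3 bits of precision and counts where scale-then-floor sends each one. The function name `scaleThenFloorCounts` and the choice of 3 bits are assumptions made purely for illustration.

```javascript
// Count how many of the 2^bits equally likely source values land on each
// integer in [min, max) when mapped with scale-then-floor.
// (Hypothetical helper; `bits` models the precision N of distribution().)
function scaleThenFloorCounts(bits, min, max) {
  const steps = 2 ** bits;
  const counts = new Array(max - min).fill(0);
  for (let k = 0; k < steps; k++) {
    const u = k / steps; // a value in [0, 1) with `bits` bits of precision
    counts[Math.floor(min + u * (max - min)) - min] += 1;
  }
  return counts;
}

const counts = scaleThenFloorCounts(3, 0, 3);
console.log(counts); // → [ 3, 3, 2 ]
```

With 8 equally likely inputs and 3 target values, the buckets cannot all be the same size: two values receive 3 inputs each and one receives only 2. Using more bits shrinks the imbalance, but it only disappears when the size of the range divides 2^N exactly.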

I have now seen this problem everywhere, so I wrote a blog post to explain the details.

You may want to consider giving a precision parameter to the definition of distribution and then performing the scaling at that level, so that the bias can always be made as small as desired. (A different solution for uniform distributions is presented in the blog post.)
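For the uniform case, one well-known way to avoid the bias altogether is rejection sampling: draw just enough random bits to cover the range and retry whenever the value falls outside it. The sketch below is not taken from the blog post or from math.js; `rejectionRandomInt` and `mathRandomBits` are hypothetical names, and the `Math.random()`-based bit source is only a stand-in so the example runs anywhere.

```javascript
// Stand-in bit source for demonstration only: NOT cryptographically secure.
// Returns an integer in [0, 2^bits); intended for bits <= 32.
function mathRandomBits(bits) {
  return Math.floor(Math.random() * 2 ** bits);
}

// Hypothetical helper: uniform integer in [min, max) by rejection sampling.
// `randomBits(n)` is assumed to return a uniform integer in [0, 2^n).
function rejectionRandomInt(min, max, randomBits = mathRandomBits) {
  const range = max - min;
  if (!Number.isInteger(min) || !Number.isInteger(max) ||
      range <= 0 || range > 2 ** 32) {
    throw new RangeError('this sketch needs integers with 0 < max - min <= 2^32');
  }
  // Smallest number of bits whose 2^bits values cover the range.
  let bits = 0;
  while (2 ** bits < range) bits++;
  // Retry until the draw lands inside the range. Each attempt succeeds
  // with probability greater than 1/2, so the loop ends quickly.
  for (;;) {
    const x = randomBits(bits);
    if (x < range) return min + x;
  }
}

console.log(rejectionRandomInt(0, 3)); // one of 0, 1, 2
```

Because every accepted value corresponds to exactly one outcome of the bit source, the result is exactly uniform whenever the bits themselves are. The cost is a variable number of draws.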

josdejong (Maintainer) · 2017-05-07T12:26:15Z

Thanks for bringing this up.

If anyone is interested in working out a solution for randomInt please let us know.

dimitri-xyz (Author) · 2017-05-07T14:55:32Z

@josdejong

This comment is for the uniform distribution. I haven't looked at how to avoid this problem for other distributions.

I think the code at the end of the blog post can be used as the basis for an implementation. The question is how large a range we need to support. (According to the docs, randomInt doesn't currently support BigNumber.) We can get a range from 0 to 2^51 with regular numbers, but larger ranges would require BigNumbers.

Is 2^51 large enough? Do we want to add BigNumber support to randomInt?

josdejong (Maintainer) · 2017-05-07T18:41:18Z

So far the random functions only support numbers. Adding support for BigNumbers would be great, but apart from that it would be nice to improve the implementation for regular numbers. Would you be interested in giving this a try, Dimitri?

dimitri-xyz (Author) · 2017-05-08T00:42:37Z

Sure! Let me get acquainted with the error handling strategy used in the library, and with how to write code that works both in the browser (with RandomSource.getRandomValues()) and outside it in Node.js (with crypto.randomBytes()). Then I'll make a pull request. (If you have any tips/docs on the standard ways math.js solves these two problems, let me know.)

josdejong (Maintainer) · 2017-05-08T07:02:04Z

Cool! If you have any questions, just ask. For crypto you will probably have to create a switch that checks whether the code is running in a browser or in Node.js, and we should make sure that the Node.js crypto library isn't bundled with math.js (it's quite large). Maybe there are libraries out there that do this for you, I'm not sure.

And also, feel free to refactor code in distribution.js, it can use some love...
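The environment switch discussed here could look roughly like the sketch below. It is not math.js code: `getRandomBytes` is a hypothetical helper, and it assumes the browser API is reachable as `crypto.getRandomValues()` on the global object, with `require('crypto')` available as the Node.js fallback.

```javascript
// Hypothetical helper: return `n` cryptographically strong random bytes,
// using whichever API the current environment provides.
function getRandomBytes(n) {
  const webCrypto = globalThis.crypto;
  if (webCrypto && typeof webCrypto.getRandomValues === 'function') {
    // Browser path (also present in recent Node.js versions).
    // Note: getRandomValues() rejects requests above 65536 bytes.
    return webCrypto.getRandomValues(new Uint8Array(n));
  }
  // Node.js fallback, loaded lazily so it is only touched when needed.
  const nodeCrypto = require('crypto');
  return new Uint8Array(nodeCrypto.randomBytes(n));
}

const bytes = getRandomBytes(8);
console.log(bytes.length); // → 8
```

This only addresses the runtime check. A bundler may still try to resolve the `require('crypto')` call, so keeping the Node.js module out of the browser bundle would need separate handling in the build configuration.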

dimitri-xyz (Author) · 2017-05-10T23:01:01Z

In my view, this interface:

   * @param {string} name   Name of a distribution. Choose from 'uniform', 'normal'.
   * @return {Object}       Returns a distribution object containing functions:
   *                        `random([size] [, min] [, max])`,
   *                        `randomInt([min] [, max])`,
   *                        `pickRandom(array)`

(defined in distribution.js) is inadequate. It assumes that all distributions behave like the uniform distribution in two ways:

- the samples obtained have upper and lower bounds
- the distribution can be made discrete

Neither of these is true for the normal distribution. A sample from the normal distribution can be any value from -infinity to +infinity and those values are not discrete.

Other distributions are discrete, but not bounded. For example, the geometric distribution is discrete but not bounded. Its samples are in the interval [1, +infinity).

This means the interface defined by random() and randomInt() simply cannot be implemented for these other distributions. For the normal distribution, it has forced us to pick a particular mean and standard deviation so that 99.7% of samples fall in [0, 1]. We should not provide a truncated distribution; we should instead let the user specify the required mean and standard deviation.

In short, I believe we should not have this interface. (randomInt() only makes sense for some distributions.) My suggestion would be to only have one method, sample (or randomSample). Also, parameters to the distributions should be provided. For example:

var normal = math.distribution('normal', 0, 1); // create a normal distribution
                                                // with zero mean and variance 1
var x = normal.sample(); // it's unlikely but possible that x = -11.2345

// create continuous uniform distribution in [2,5)
var uniform = math.distribution('continuous-uniform', 2, 5);
var y = uniform.sample(); // it's possible that y = 4.2867

var disc = math.distribution('discrete-uniform', 2, 5);
var z = disc.sample(); // z can only be one of [2,3,4,5]

This makes clear the difference between a continuous uniform distribution and a discrete one.
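As a rough illustration of how such an interface might be implemented, here is a minimal standalone sketch. It is an assumption-laden example rather than a proposal for the actual code: the free function `distribution`, the optional `rng` argument, the use of the Box-Muller transform, and the reading of the second `'normal'` parameter as a standard deviation are all choices made for this example.

```javascript
// Hypothetical standalone version of the proposed API.
// `rng` is assumed to return uniform values in [0, 1), like Math.random.
function distribution(name, a, b, rng = Math.random) {
  switch (name) {
    case 'normal': // a = mean, b = standard deviation (assumed meaning)
      return {
        sample() {
          // Box-Muller transform; 1 - rng() lies in (0, 1], so log() is finite.
          const u1 = 1 - rng();
          const u2 = rng();
          const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
          return a + b * z; // unbounded: any real value is possible
        }
      };
    case 'continuous-uniform': // samples in [a, b)
      return { sample: () => a + rng() * (b - a) };
    case 'discrete-uniform': // integers a..b, both ends included
      // Scale-then-floor for brevity; it carries the small bias
      // described at the top of this thread.
      return { sample: () => a + Math.floor(rng() * (b - a + 1)) };
    default:
      throw new Error('Unknown distribution: ' + name);
  }
}

const disc = distribution('discrete-uniform', 2, 5);
console.log(disc.sample()); // one of 2, 3, 4, 5
```

Passing the parameters at construction time keeps `sample()` free of arguments, which is what lets bounded, unbounded, continuous, and discrete distributions share one method.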

So, I recommend a few breaking changes:

- change the interface to receive parameters and provide a sample() function
- change the normal distribution to range from -infinity to +infinity
- separate the uniform distribution into continuous and discrete versions

Thoughts?

josdejong (Maintainer) · 2017-05-11T18:51:12Z

Thanks, your suggestion for a new API makes sense to me. It sounds good and simple (from a usage point of view).

Because the idea of distribution() wasn't thought out enough, we removed distribution from the public API at an early stage. It's perfectly fine with me to replace it completely or change its API. So we have complete freedom here, with no breaking changes I think (unless the behavior of math.random, math.randomInt, or math.pickRandom would change).

I'm not sure what the easiest approach to realize this is. Maybe start from scratch and build a completely new implementation of distribution. So... knock yourself out :)

dimitri-xyz (Author) · 2017-06-22T19:11:15Z

@josdejong
I'm sorry. I'm swamped and won't be able to get to this in the near future. I wanted to give you a heads-up.

josdejong (Maintainer) · 2017-06-23T07:01:19Z

Thanks for the update Dimitri, I fully understand :D no worries
