OT: Random Number Generator

I_Automation · Mar 12, 2022

One thing about a random number is that it CAN repeat.

If you have a single digit random number generated and it has reported 7 of the 10 number values possible that does not mean the 8th value reported will be one of the 3 remaining unused values and not repeat a previously used value. It isn't going to log values reported and make sure the next value hasn't been used in the last 9 reports - that would force a very predictable pattern after the first 9 values. The 10th would HAVE to be the only unused number, and the 11th would be the same as the 1st, 12th = 2nd, 13th = 3rd,... 43,235,678th = 8th.

It is entirely possible with a single digit random number to easily get the same number twice, or even 3 times, in a row. For a 2 digit random number even getting the same value twice in a row is easily possible.

The purpose of a random number is to be completely unpredictable, and if you can predict that the value will not be any value previously used then it is not random, but calculated.

Peter Nachtwey · Mar 12, 2022

I_Automation said:
One thing about a random number is that it CAN repeat.

If you have a single digit random number generated and it has reported 7 of the 10 number values possible that does not mean the 8th value reported will be one of the 3 remaining unused values and not repeat a previously used value. It isn't going to log values reported and make sure the next value hasn't been used in the last 9 reports - that would force a very predictable pattern after the first 9 values. The 10th would HAVE to be the only unused number, and the 11th would be the same as the 1st, 12th = 2nd, 13th = 3rd,... 43,235,678th = 8th.

It is entirely possible with a single digit random number to easily get the same number twice, or even 3 times, in a row. For a 2 digit random number even getting the same value twice in a row is easily possible.

The purpose of a random number is to be completely unpredictable, and if you can predict that the value will not be any value previously used then it is not random, but calculated.

Yes.
Currently I am generating all the combinations of 0 to 1000000000 and counting the number of times a number is repeated. This is slow as I am doing it in python. I will have the exact answer.

drbitboy · Mar 13, 2022

I_Automation said:
One thing about a random number is that it CAN repeat. ...

That is the meaning of the phrase "with replacement" that we have been adding to our posts. That declares which of the two conventional sampling methods is used. The basic idea is that there is a set or pool or reservoir of some cardinal number, M, of unique items; in the most recent posts M is 10, and the items are the single decimal digits (0, 1, 2, ..., 9). Sampling "with replacement" means when an item, say the digit 4, is sampled (picked) at random from the set of 10 decimal digits, then that 4 is replaced into the set, so any future random sample may still/also be a 4. Sampling "without replacement" means that when an item is picked from the set, it is not replaced; a side effect of this is that the number of samples is limited to M.

If @Peter Nachtwey gets anything other than 10! for the number of samples with 10 unique digits in the range 0 to 1E10, it's the wrong answer.

P.S he could have saved time by using a near-to-10 power of 2 e.g. 8, so there would be about one-tenth the samples to check, and checking each digit via bit twiddling would be faster; the answer would be 8! (= 40,320) out of 8**10 (2**30 ~ 1G = 1E9), for a probability ~ 3.8E-5, and the mistakenly-applied binomial method would be off by a few orders of magnitude. Again.

drbitboy · Mar 13, 2022

That should be 8**8 =16M, not 8**10.

drbitboy · Mar 13, 2022



#include <stdio.h>

#include <stdlib.h>

#include <string.h>



int main() {

double v,vmod;

double matches;

char s[11];

char* s1 = s+1;

char* s10 = s+10;

char* p;

char* pmatch;

char cmd[] = { "date +%H:%M:%S" };

  system(cmd);

  for (matches=v=0.0; v<1e10; v+=1.0) {

    sprintf(s,"%010.0lf",v);

    pmatch = NULL;

    for (p=s1; NULL==pmatch && p<s10; ++p) {

      if ((pmatch=strchr(p,(int)p[-1]))) break;

    }

    if (NULL==pmatch) { matches += 1.0; }

    vmod += 1.0;

    if (2e8 == vmod) {

      vmod = 0.0;

      fprintf(stdout,"%s %12.1lf %12.1lf %10.8lf  ",s,v,matches,matches/v);

      fflush(stdout);

      system(cmd);

    }

  }

  return 0;

}

And the starting few results are



./09

15:30:55

0199999999  199999999.0      40320.0 0.00020160  15:31:50

0399999999  399999999.0     120960.0 0.00030240  15:32:49

0599999999  599999999.0     201600.0 0.00033600  15:33:50

0799999999  799999999.0     282240.0 0.00035280  15:34:49

0999999999  999999999.0     362880.0 0.00036288  15:35:49

1199999999 1199999999.0     403200.0 0.00033600  15:36:47

1399999999 1399999999.0     483840.0 0.00034560  15:37:48

1599999999 1599999999.0     564480.0 0.00035280  15:38:49

The bold blue italicized line, after 1G samples, is a preview of the what the final answer will be after 10G (10^10) samples, which makes sense because 9! = 10! ÷ 10.

drbitboy · Mar 13, 2022

I'm not always right ...



9999999999 9999999999.0    3628800.0 0.00036288  16:19:37

... but it does usually work out that way.

N.B. 3628800 = 10! and0.00036288 = 10! ÷ 1E10.

Peter Nachtwey · Mar 13, 2022

The binom.pmf function is correct. It will correctly calculate the percentage the number of items appears. To prove it I first tried generating all the combinations from 0 to 10^10 but that was going to take days. I reduced the problem using digits from 0 to 4 using base 5 and using only 5 digit. It is easy to generate all 0 to 625 combinations of base 5 numbers from 0 to 624 and add them all up.

My program is

Code:

import numpy as np
from scipy.stats import binom
from math import *

def digit_to_char(digit):
    if digit < 10: return chr(ord('0') + digit)
    else: return chr(ord('a') + digit - 10)

def str_base(number,base):
    if number < 0:
        return '-' + str_base(-number,base)
    else:
        (d,m) = divmod(number,base)
        if d:
            return str_base(d,base) + digit_to_char(m)
        else:
            return digit_to_char(m)
base = 5
a = np.zeros((base,base+1), dtype=int)
m = int(base**base)
n = int(base)
digset = set("01234")

for i in range(0,m):
    string = f"{str_base(i,base):05s}"  # format with leading zeris
    # print(string)
    for digit in digset:
        digcnt = string.count(digit)
        # print(f"digit = {digit}, digit count = {digcnt}")
        a[int(digit), digcnt] += 1
np.set_printoptions(precision=8, suppress=True)
print(a/m)

for k in range(0,base):
    prob = binom.pmf(k,n,0.2)
    print(f"The probability a digit appears {k} times is {prob:8.6f}")

The output is

Code:

[[0.32768 0.4096  0.2048  0.0512  0.0064  0.00032]
 [0.32768 0.4096  0.2048  0.0512  0.0064  0.00032]
 [0.32768 0.4096  0.2048  0.0512  0.0064  0.00032]
 [0.32768 0.4096  0.2048  0.0512  0.0064  0.00032]
 [0.32768 0.4096  0.2048  0.0512  0.0064  0.00032]]
The probability a digit appears 0 times is 0.327680
The probability a digit appears 1 times is 0.409600
The probability a digit appears 2 times is 0.204800
The probability a digit appears 3 times is 0.051200
The probability a digit appears 4 times is 0.006400

The binom.pmf function correctly computes the probability that any digit will appear a number of times. The 'a' array above has a row for each digit and a column for the percentage of times it will appear in a number from 0 to 624.

If you add up the probabilities of having 0,1,2,3,4 or 5 occurances of a number, they better add up to 1. Mine do.

In my example the probabilities are for a single digit. There are 5 digits. Each has a probabilty of a digit occuring 3 times out of 5 of 0.0512. If you want the probabilty of all digits occuring 3 times out of 5 then add up the probabilites. There are all there.

The point is that the binom.pmf function will accurately calculate the probabiltiy that a digit occurs 4 times in a 10 digit number too. THe binom.pmf function eliminates the need to brute force the calculations.

drbitboy · Mar 14, 2022

Peter Nachtwey said:
The binom.pmf function is correct.

Yes, but it is not relevant here because it does not answer the original question:

Peter Nachtwey said:
Lets say the number range is from 0 to 99 and the probability distribution is flat or even for all numbers. What are the chances of get the same number 2 times when taking only 10 samples?
This is a rather basic probability problem.
A little harder, maybe a lot harder, is what are the chances of getting 4 numbers the same.

I.e. it does not answer the birthday question. Your sampling program proves nothing.

...

Peter Nachtwey · Mar 14, 2022

The birthday problem is a different problem than what percentage of the time one picks 4 of the same out of 10 in 10^10 samples. I don't sew where your solution is correct of the percentages don't add up to 100%.

drbitboy · Mar 14, 2022

Peter Nachtwey said:
The birthday problem is a different problem than what percentage of the time one picks 4 of the same out of 10 in 10^10 samples.

"what are the chances of getting 4 numbers the same" when sampling 10 items with replacement from a set of 100 items is most definitely analogous to the birthday problem.

You have yet to unambiguously state whatever problem you are solving.

Peter Nachtwey said:
I don't sew where your solution is correct of the percentages don't add up to 100%.

Because you did not look, as usual.

Bold blue, see below; btw, 5**5 is 3125, not 625.



$ python 10.py 5 5 --verbose

[(<120; 0.0384; (1, 1, 1, 1, 1)>, 120, 1, 120, [5]),    <== 5 unique numbers:  all singles; e.g. 01234 or 24031

(<1200; 0.384; (1, 1, 1, 2)>, 120, 10, 12, [2, 3]),     <== 4 unique numbers:  four singles; one double; e.g. 01233 or 43324

 (<600; 0.192; (1, 1, 3)>, 60, 10, 12, [3, 2]),          <== 3 unique numbers:  two singles; one triple e.g. 01222 or 43434

 (<900; 0.288; (1, 2, 2)>, 60, 15, 8, [2, 2, 2]),        <== 3 unique numbers:  one single; two doubles; e.g. 01122 or 43423

 (<100; 0.032; (1, 4)>, 20, 5, 24, [4]),                 <== 2 unique numbers:  one single; one quadruple; e.g. 01111 or 44434

 (<200; 0.064; (2, 3)>, 20, 10, 12, [2, 3]),         <== 2 unique numbers:  one double; one triple; e.g. 00111 or 43443

 (<5; 0.0016; (5,)>, 5, 1, 120, [5])]                    <== 5 unique numbers:  one quintuple; e.g. 00000 or 44444

3125                               <== sum of all cases above, equal to 5**5 which is all possible combinations of 5 from 5 with replacement

Sampling 5 items at a time out of 5:

- At least  1 the same:   probability=         100%; count=3125=3125

- At least  2 the same:   probability=       96.16%; count=3005=3005

- At least  3 the same:   probability=       28.96%; count=905=905

- At least  4 the same:   probability=        3.36%; count=105=105

- At least  5 the same:   probability=        0.16%; count=5=5

Probability is hard; be patient, you'll get there.

...

OT: Random Number Generator

I_Automation

Lifetime Supporting Member

Peter Nachtwey

Member

drbitboy

Lifetime Supporting Member

drbitboy

Lifetime Supporting Member

drbitboy

Lifetime Supporting Member

drbitboy

Lifetime Supporting Member

Peter Nachtwey

Member

drbitboy

Lifetime Supporting Member

Peter Nachtwey

Member

drbitboy

Lifetime Supporting Member

Similar Topics