OT: Random Number Generator

drbitboy · Mar 10, 2022

OT (snort)

OT OT OT

Two out of 10k looks about right, but I'll have to think about why my two-the-same validation is off by a percent or so.



$ more 4.py 

from random import randint as ri

counts = [0]*100

last10 = [ri(0,99) for i in range(10)]

for i in last10: counts = 1

inext,total2,total4,totaltest,samples = 0,0.0,0.0,0.0,list()

twos,fours = set(),set()

try:

    while True:

        irand = ri(0,99)

        ilast = last10[inext]

        if 2 == counts[ilast]: twos.remove(ilast)

        counts[ilast] -= 1

        if 3 == counts[ilast]: fours.remove(ilast)

        counts[irand] += 1

        if 2 == counts[irand]: twos.add(irand)

        elif 4 == counts[irand]: fours.add(irand)

        totaltest += 1

        if len(twos):

            total2 += 1

            if len(fours):

                total4 += 1

            tup = (total2,total4 ,totaltest

                  ,total2/totaltest,total4/totaltest,)

            if len(fours) and (0.0 == (total4 % 50)):

                print(tup)

                samples.append(tup)

        last10[inext] = irand

        inext += (inext < 9) and 1 or -8

except KeyboardInterrupt: pass

p2 = 100.**(-9)

for i in range(91,100): p2 *= float(i)

import matplotlib.pyplot as plt

total2s,total4s,totaltests,total2prob,total4prob = zip(*samples)

fix,(ax2,ax4,) = plt.subplots(1,2, sharex=True)

ax2.plot(totaltests,total2prob)

ax2.axhline(1.-p2)

ax4.plot(totaltests,total4prob)

plt.show()

Peter Nachtwey · Mar 10, 2022

At least 4 or just 4. There is a difference.
I just want to make sure before I get into the probabilities. I am assuming drbitboy's answer 4 the same out of 100 is correct but I would like to prove it.
Letting my geek flag fly!

drbitboy · Mar 10, 2022

Peter Nachtwey said:
At least 4 or just 4. There is a difference.
I just want to make sure before I get into the probabilities. I am assuming drbitboy's answer 4 the same out of 100 is correct but I would like to prove it.
Letting my geek flag fly!

To use the approach where backing up to the solution gives the correct answer, I am certain it is "at least four," which is the point I made in your first post with the Birthday-Problem-ish query, and which is also the approach my Python code uses i.e. "at least four" and "at least two."

"Exactly two" or "exactly four," OTOH, gets messy, because you have to consider how you are going to handle multiple instances of duplicate values e.g. [99,98,98,97,97,97,96,96,96] does have both exactly two 98s and exactly two 96s.

For example, say the task was to pick four numbers with replacement from a set of 10; there would be 10k (10**4) possible combinations:

5040 "1 the same" i.e. all unique
- 5040 = 10 * 9 * 8 * 7, i.e.
  - All 10 possible for the 1st pick,
    - which leaves the 9 values not picked 1st as possible for the 2nd pick,
      - which leaves the 8 values picked neither 1st or 2nd for the 3rd pick,
        
        which leaves the 7 values picked neither 1st nor 2nd nor 3rd for the 4th pick
4320 "exactly 2 the same, the others different from each other"
- 2160 = 10 * 9 * 8 * 3
  - 10*9*8 => First three picks are unique,
  - 3 => 4th pick same as one of the first 3
- 1440 = 10 * 9 * 2 * 8
  - 10*9 => First two picks are unique
  - 2 => 2nd pick sam as one fo the first 2
  - 8 => 4th pick unique
- 720 = 10 * 1 * 9 * 8
  - 10*1 => First two picks are the same number
  - 9*8 => 3rd and 4th pick are different both from the first two picks and from each other
360 "Exactly 3 the same"
- 180 = 10 * 9 * 2 * 1
  - 10*9 => First two pick are unique
  - 2 => 3rd pick has to match one of the first two
  - 1 => 4th pick has to match 2rd pic
- 90 = 10 * 1 * 9 * 1
  - 10 => 1st pick is unique
  - 1 => 2nd pick matches 1st pick
  - 9 => 3rd pick is unique from first two picks
  - 1 => 4th pick has to match first two picks
- 90 = 10 * 1 * 1 * 9
  - 10 => 1st pick is unique
  - 1 => 2nd pick matches 1st pick
  - 1 => 3rd pick matches first two picks
  - 9 => 4th pick is unique from first three picks
10 "Exactly four the same"
- 10 = 10 * 1 * 1 * 1
  - 10 => 1st pick is unique
  - 1*1*1 => last three picks match the 1st pick
The total so far is 9730, or 270 short of 10k, because we did not consider:
270 "two the same, and the other two both the same and unique from the first two"
- 180 = 10 * 9 * 2 * 1
  - 10*9 => First two are unique
  - 2 => 3rd pick matches one of the first two
  - 1 => 4th pick matches the other of the first two
- 90 = 10 * 1 * 9 * 1
  - 10 => 1st pick is unique
  - 1 => 2nd pick matches first pick
  - 9 => 3rd pick is unique from first two picks
  - 1 => 4th pick matches third pick

Note that there are four groups, indicated by the bold underlined colors above, each group with the same first three numbers, and the four numbers in each group sum to 10:: 10*9*8*[7,3], 10*9*2*[8,1,1], 10*1*9*[8,1,1], 10*1*1*[9,1].

drbitboy · Mar 10, 2022

drbitboy said:
"Exactly two" or "exactly four," OTOH, gets messy, because you have to consider how you are going to handle multiple instances of duplicate values e.g. [99,98,98,97,97,97,96,96,96] does have both exactly two 98s and exactly four 96s.

Correction in bold above.

drbitboy · Mar 10, 2022



$ cat 10.py

import sys



L = [(int(([10]+sys.argv[1:]).pop()),)]

s=set(L)



while L:

    t = L.pop()

    for i in range(len(t)):

        pfx,ival,sfx = t[:i],t,t[i+1:]

        j = len(pfx) and pfx[-1] or 1

        while (j+j) <= ival:

            k = j

            j += 1

            newt = t[:i] + (k,ival-k,) + t[i+1:]

            if newt in s: continue

            L.append(newt)

            s.add(newt)

import pprint

pprint.pprint(sorted(s))

print(len(s))

Might come in handy. Jes' sayin'

...

drbitboy · Mar 11, 2022

drbitboy said:
Might come in handy. Jes' sayin'

It did; the script is attached.



>

>

> python 10.py

Sampling 10 items at a time out of 100:

- At least  1 the same:   probability=         100%; count=100000000000000000000=1e+20

- At least  2 the same:   probability=     37.1843%; count=37184349044470528000=3.71843e+19

- At least  3 the same:   probability=     1.13653%; count=1136534315040640000=1.13653e+18

- At least  4 the same:   probability=   0.0200126%; count=20012610593260000=2.00126e+16

- At least  5 the same:   probability= 0.000241678%; count=241678430740000=2.41678e+14

- At least  6 the same:   probability= 2.02894e-06%; count=2028939412600=2.02894e+12

- At least  7 the same:   probability= 1.16878e-08%; count=11687791600=1.16878e+10

- At least  8 the same:   probability= 4.42036e-11%; count=44203600=4.42036e+07

- At least  9 the same:   probability=    9.91e-14%; count=99100=99100

- At least 10 the same:   probability=       1e-16%; count=100=100

>

>

> python 10.py 5 10

Sampling 5 items at a time out of 10:

- At least  1 the same:   probability=         100%; count=100000=100000

- At least  2 the same:   probability=       69.76%; count=69760=69760

- At least  3 the same:   probability=        8.56%; count=8560=8560

- At least  4 the same:   probability=        0.46%; count=460=460

- At least  5 the same:   probability=        0.01%; count=10=10

>

>

ojz0r · Mar 11, 2022

ndzied1 said:
Schrödinger's cat disagrees with you ;-)

That darn cat! It always gets into trouble.

plvlce said:
The key being 'all *data* was known and calculated.' If the uncertainty principle is true, it is simply impossible for this to be the case -- limits to our ability to measure (and thus know or calculate) are a fundamental property.
...
Heisenberg tells us that even if you know the starting conditions, there are limits to how accurately current values can be known.

Ofcourse its not very practical, neither is the above cat

drbitboy · Mar 11, 2022

ndzied1 · Mar 11, 2022

drbitboy said:
It did; the script is attached.

Damn, now I need to learn python.... ;-)

Peter Nachtwey · Mar 11, 2022

I think drbitboy is doing it the hard way.
https://www.geeksforgeeks.org/python-binomial-distribution/

drbitboy · Mar 11, 2022

Peter Nachtwey said:
I think drbitboy is doing it the hard way.
https://www.geeksforgeeks.org/python-binomial-distribution/

I think that is for binomial distribution only i.e. at least one pair of two persons having the same birthday in a random sample of N people with equal probabilities of birthdays for each person.

The problem I am solving is a multinomial distribution e.g. see this link.

Peter Nachtwey · Mar 11, 2022

It is hard to tell what you are trying to do because you keep changing the problem.
Can we go back to picking the same 4 numbers out of 10 with replacement? Assume even probability for each digit or p=0.1 This problem can be easily brute forced and compared with a formula.

Basically there are 10*digits*10 numbers so there are 10^10 combination of numbers from 0-9. This can be brute forced by generating all the numbers and counting how many have 4 digits of the same value.

More good stuff
https://www.statology.org/binomial-distribution-python/

drbitboy · Mar 11, 2022

Peter Nachtwey said:
It is hard to tell what you are trying to do because you keep changing the problem.

No, I am not. I have been doing the birthday problem using the "At least N matches" semantic, where N can be an integer other than 2.

I already solved the four out of 10 by hand.

Peter Nachtwey said:
Can we go back to picking the same 4 numbers out of 10 with replacement? Assume even probability for each digit or p=0.1 This problem can be easily brute forced and compared with a formula.

Basically there are 10*digits*10 numbers so there are 10^10 combination of numbers from 0-9. This can be brute forced by generating all the numbers and counting how many have 4 digits of the same value.

That is incoherent. Do you mean "Pick four (N=4) digits out of ten (M=10) with replacement, and determine what is probability of at least 2, or at least 3, or at least 4, the same? Obviously the last case is always * M**(1-N).

Peter Nachtwey · Mar 12, 2022

The number of items sampled makes a difference but if sampling with replacement then it makes no difference if there are 100 items or 1000 items. Only the probability of the picking an item matters.

You are solving a little bit different problem with your "at least" qualifier.

Code:

from scipy.stats import binom

#calculate binomial probability of selecting k items the same
k=0     # the number of items that ARE THE SAME 
n=10    # the number of items in a sample    
p=0.1   # the probability of selecting an item
sum=0.
for k in range(0,n+1):
    prob = binom.pmf(k,n,p)
    sum = sum + prob
    print(f"probability of {k} items being the same {prob} and the sum is {sum}")

The output is

Code:

probability of 0 items being the same 0.3486784401000001 and the sum is 0.3486784401000001
probability of 1 items being the same 0.38742048899999965 and the sum is 0.7360989290999997
probability of 2 items being the same 0.19371024450000007 and the sum is 0.9298091735999998
probability of 3 items being the same 0.05739562799999998 and the sum is 0.9872048015999998
probability of 4 items being the same 0.01116026100000001 and the sum is 0.9983650625999998
probability of 5 items being the same 0.0014880347999999995 and the sum is 0.9998530973999998
probability of 6 items being the same 0.00013778100000000007 and the sum is 0.9999908783999998
probability of 7 items being the same 8.748000000000003e-06 and the sum is 0.9999996263999997
probability of 8 items being the same 3.6449999999999996e-07 and the sum is 0.9999999908999997
probability of 9 items being the same 8.999999999999995e-09 and the sum is 0.9999999998999997
probability of 10 items being the same 1.0000000000000006e-10 and the sum is 0.9999999999999997

python also has a cumulative density function, cdf

Code:

probability of 0 items or less is 0.34867844009999993
probability of 1 items or less is 0.7360989291
probability of 2 items or less is 0.9298091736
probability of 3 items or less is 0.9872048016
probability of 4 items or less is 0.9983650626
probability of 5 items or less is 0.9998530974
probability of 6 items or less is 0.9999908784
probability of 7 items or less is 0.9999996264
probability of 8 items or less is 0.9999999909
probability of 9 items or less is 0.9999999999
probability of 10 items or less is 1.0

[code]
the output is 

[code]
probability of 0 items or less is 0.34867844009999993
probability of 1 items or less is 0.7360989291
probability of 2 items or less is 0.9298091736
probability of 3 items or less is 0.9872048016
probability of 4 items or less is 0.9983650626
probability of 5 items or less is 0.9998530974
probability of 6 items or less is 0.9999908784
probability of 7 items or less is 0.9999996264
probability of 8 items or less is 0.9999999909
probability of 9 items or less is 0.9999999999
probability of 10 items or less is 1.0

These numbers should match the sum numbers above.

drbitboys at least k is 1-binom.cdf(k-1,n,0.1)
It better be. I will let someone else check that out.

for those that like to play table tennis or those that want to dig deeper into python's binom or Mathcad dbinom functions then look at this. This is the only TT handicap table on the internet where the probabilities and handicaps were calculated.

drbitboy · Mar 12, 2022

Peter Nachtwey said:

Code:

probability of 0 items being the same 0.3486784401000001 and the sum is 0.3486784401000001[code][/QUOTE]


As my son used to say, when he had his Volvo 5-speed Turbo S70, and someone would pull up next to him to use the merging right lane on the other side of the the light to get ahead: "I don't think so."
  
[LIST]
[*]The least 10-digit decimal number is 0000000000
[*]The greatest 10-digit decimal number is 999999999
[*]There are 1E10 (= 10 to the 10th power = 10**10 = 10^10) possible elements in the set of all 10-digit decimal numbers
[*]I think we can agree that the 10 digits in each element of that set of numbers is equivalent to exactly one possible sample of 10 items with replacement from a reservoir of items of 10 types
[*]The least 10-digit decimal number with "0 [digits] being the same" is 0123456789 (N.B. that includes the leading zero);
[*]The largest such decimal number is 98765432310;
[*]the count of all such "0 digits the same" decimal numbers is 3628800 (= 10! = 3.6288E6).
[*]So if we sample 10 items with replacement from a reservoir of items of 10 types, the probability of the sample having 10 uniqe items (0 the same) is (3.6E6 ÷ 1E10) = 3.6E-4, or almost three orders of magnitude less than @Peter Nachtwey's 0.3486784401.
[/LIST]
That's a quick off-the-cuff check, and I will not be surprised if it is wrong in some way.  So consider also the following tests:
 
1) Empirical check that Python [B]random [/B]module's .[B]randint[/B](lo,hi) method (cf. [URL="https://docs.python.org/3/library/random.html#random.randint"]this link[/URL]) is uniform:[INDENT][ladder]
### Initialize a dictionary (hash table) with keys 0 through 9

>>> d=dict([(i,0) for i in range(10)])
>>> d
{0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0}

### Take ten million samples of the digits 0 through 9 using random.randint

>>> for i in range(10000000): d[random.randint(0,9)]+=1
...

### Print the histogram of those samples

>>> d
{0: 1001439, 1: 1000790, 2: 1001262, 3: 1000092, 4: 999470, 5: 1000007, 6: 999654, 7: 997713, 8: 999378, 9: 1000195}
[/ladder]
[/INDENT]1.1) Is a distribution of random.randint samples uniform?  Yes, apparently so.





2) Is the probability of getting 0 digits the greater than the one in three, which @Peter Nachtwey predicts, when sampling 10 decimal digits at a time with replacement?[INDENT][ladder]
>>> for i in range(10): print([random.randint(0,9) for i in range(10)])
...
[[COLOR=Blue][B]2[/B][/COLOR], 3, 9, [COLOR=blue][B]2[/B][/COLOR], 1, 4, 1, 5, 1, 9]
[[COLOR=blue][B]0[/B][/COLOR], 8, 6, 9, [COLOR=blue][B]0[/B][/COLOR], 3, 8, 4, 6, 0]
[5, [COLOR=Blue][B]6[/B][/COLOR], 1, [COLOR=blue][B]6[/B][/COLOR], 3, 9, 3, 1, 8, 3]
[[COLOR=blue][B]3[/B][/COLOR], 6, 7, 7, 6, 5, 0, [COLOR=blue][B]3[/B][/COLOR], 7, 9]
[[COLOR=blue][B]8[/B][/COLOR], 9, 5, 5, 6, 4, 7, 9, [COLOR=blue][B]8[/B][/COLOR], 4]
[7, 2, [COLOR=blue][B]0[/B][/COLOR], 9, 6, 8, 1, 4, [COLOR=blue][B]0[/B][/COLOR], 8]
[[COLOR=blue][B]6[/B][/COLOR], 3, 3, 3, 5, [COLOR=blue][B]6[/B][/COLOR], 3, 7, 4, 4]
[0, [COLOR=blue][B]7[/B][/COLOR], 3, 8, 4, 6, 9, [COLOR=blue][B]7[/B][/COLOR], 4, 6]
[7, [COLOR=blue][B]6[/B][/COLOR], [COLOR=blue][B]6[/B][/COLOR], 4, 2, 4, 2, 9, 2, 9]
[[COLOR=blue][B]2[/B][/COLOR], 0, [COLOR=blue][B]2[/B][/COLOR], 1, 1, 1, 9, 5, 1, 9]
>>>
[/ladder]
[/INDENT]2.1) No, it is much less than 1 in 3.

Probability, like measurement, is hard

OT: Random Number Generator

drbitboy

Lifetime Supporting Member

Peter Nachtwey

Member

drbitboy

Lifetime Supporting Member

drbitboy

Lifetime Supporting Member

drbitboy

Lifetime Supporting Member

drbitboy

Lifetime Supporting Member

ojz0r

Member

drbitboy

Lifetime Supporting Member

ndzied1

Lifetime Supporting Member

Peter Nachtwey

Member

drbitboy

Lifetime Supporting Member

Peter Nachtwey

Member

drbitboy

Lifetime Supporting Member

Peter Nachtwey

Member

drbitboy

Lifetime Supporting Member

Similar Topics