Algorithm or Test Method to generate test case for Keno game - algorithm

Keno Game rules: Keno is a lottery-like game which generates random combination of number ranging from 1 to 80 with the size of 20. Player may choose a number game to play (1,2,3,4,5,6,7,8,9,10,15). The payout depends on the number game and number of matches.
I understand the difficulties of generating a complete test case to cover all possible combination not to mention the possibility of matching the random game result. Therefore, I initially applied the Random Combination testing method but later found out it is hard to achieve high coverage of all possible cases (roughly about 10%). By now, I have come across Pure Random Combinatorial, CATS, AETG, K-combination but none is ideal for Keno game.
For now, the inputs are num_game_size, numSelected[num_game_size]. Meanwhile, the outputs are result[20], matchedNum[], matched_num_size, payout. Of course, there are more inputs: continuous_game_toplay_size, bet_amount.
I'm looking forward for any suggestion on any testing method or algorithm that has high coverage on pure random and large combination test case if executed for a month or two. My objective is to test combination of selected numbers and their payout for each different number of matches when the result is pure random generated. For instance:
/* Assume the result is pure random generated */
/* Match 0 */
num_game_size = 2
numSelected[2] = {1,72}
result[20] = {2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21}
matchedNum[] = {}
matched_num_size = 0
payout = 0
/* Match 1 */
num_game_size = 2
numSelected[2] = {1,72}
result[20] = {1,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21}
matchedNum[] = {1}
matched_num_size = 1
payout = 1
/* Match 2 */
num_game_size = 2
numSelected[2] = {1,72}
result[20] = {1,72,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21}
matchedNum[] = {1,72}
matched_num_size = 2
payout = 5
The total possibility will be C(80,2) * C(80,20) = 3160 * 3535316142212174320 = 1.117159900939047e+19. Meaning for each combination of number with the size of two within range of 1 to 80, there are C(80,20) possible results. It will probably takes few years to cover all possibility (including 1,3,4,5,6,7,8,9,10,15 number game) when the result is pure random generated (quantum RNG).
Ps: Most test method I found only consider either random or combination problem and require a tremendous amount of time to complete test case generation. I'm trying to create any program to help me in winning the Keno game IRL.

Related

Positive random number generation with given mean and standard deviation

I am working on a simulation project. Need help on random number generation. I need two sets of random numbers.
Properties:
Primary set (var_a) > Normally distributed, greater than 0, with given mean and std.
Secondary set (var_b) -> Same as a primary set, with an addition, that second set cannot be greater than primary set. The reason being the output of a deterministic function will be in percentage between 0-1 only. For example:
service level calculation
import numpy as np
n = 100000
# Calls Handled
callshandled = np.random.normal(loc=65, scale=97, size=n)
print('Calls handled: ', callshandled)
# Call handled within sl. Has to always be less or equal to Calls Handled
ansinsl = np.random.normal(loc=60, scale=82, size=n)
print('Answered in SL', ansinsl)
# Service Level - Has to be between 0-1. With normal distribution we get values in negative
sl = np.array(ansinsl)/np.array(callshandled)
print('Service level', sl)
Calls handled: [ 43.26825426 129.79198758 31.56460354 ... 37.45059791 1.71420416
-94.87241356]
Answered in SL [-12.72293091 204.28084996 232.25722235 ... 166.03208722 -53.69933624
-36.71949656]
Service level [ -0.29404771 1.57390956 7.35815427 ... 4.43336279 -31.32610312
0.38704082]
There is a well-known “natural” way of generating pairs of Gaussian pseudo-random variates, known as Box-Muller.
You could try it that way:
generate pairs of unit normal variates à la Box-Muller
scale the pairs to whatever your (µ,σ) parameters are
reject the pairs that do not fit your criteria

Generate increasing random number rails

I've a random number generator code:
5.times.map { [*0..9].sample }.join.to_i
It gives me random numbers like 63832, 42337, 34998. As you can see that they are completely random, but how to make than I would get only in an increasing way? Not 63832, 42337, 34998, but 34998, 42337, 63832 (this is just an example, Ideally I would get smth like 00[number] => 0025, where 25 is a random number which was generated.
Hope my explanation is understandable :)
If you have the current / last random number, you can generate a larger one by simply adding a random number to it, e.g:
def generate(base = 0)
base + rand(1_000..10_000)
end
number = generate #=> 9635
number = generate(number) #=> 17761
number = generate(number) #=> 22082
number = generate(number) #=> 31061
Each number is 1,000 to 10,000 larger than its predecessor.
An alternative approach, if you want to generate all random numbers within a known range:
[*1..10000].sample(5).sort
# => [602, 5608, 7912, 8384, 8714]
However, this only works if you want to fetch all random numbers upfront, rather than continuously being able to generate new ones which are larger.
It's also not a good approach if your upper limit is very big - e.g. this will freeze your system and need to be cancelled:
[*1..10000000000].sample(5).sort
...But in that case, since the numbers are so huge, you can surely get away with the tiny risk of having a collision:
5.times.map{ rand(1..10000000000) }.sort
# => [460188573, 555213355, 3576967759, 3994239233, 9570165205]

Input to different attributes values from a random.sample list

so this is what I'm trying to do, and I'm not sure how cause I'm new to python. I've searched for a few options and I'm not sure why this doesn't work.
So I have 6 different nodes, in maya, called aiSwitch. I need to generate random different numbers from 0 to 6 and input that value in the aiSiwtch*.index.
In short the result should be
aiSwitch1.index = (random number from 0 to 5)
aiSwitch2.index = (another random number from 0 to 5 different than the one before)
And so on unil aiSwitch6.index
I tried the following:
import maya.cmds as mc
import random
allswtich = mc.ls('aiSwitch*')
for i in allswitch:
print i
S = range(0,6)
print S
shuffle = random.sample(S, len(S))
print shuffle
for w in shuffle:
print w
mc.setAttr(i + '.index', w)
This is the result I get from the prints:
aiSwitch1 <-- from print i
[0,1,2,3,4,5] <--- from print S
[2,3,5,4,0,1] <--- from print Shuffle (random.sample results)
2
3
5
4
0
1 <--- from print w, every separated item in the random.sample list.
Now, this happens for every aiSwitch, cause it's in a loop of course. And the random numbers are always a different list cause it happens every time the loop runs.
So where is the problem then?
aiSwitch1.index = 1
And all the other aiSwitch*.index always take only the last item in the list but the time I get to do the setAttr. It seems to be that w is retaining the last value of the for loop. I don't quite understand how to
Get a random value from 0 to 5
Input that value in aiSwitch1.index
Get another random value from 0 to 6 different to the one before
Input that value in aiSwitch2.index
Repeat until aiSwitch5.index.
I did get it to work with the following form:
allSwitch = mc.ls('aiSwitch')
for i in allSwitch:
mc.setAttr(i + '.index', random.uniform(0,5))
This gave a random number from 0 to 5 to all aiSwitch*.index, but some of them repeat. I think this works cause the value is being generated every time the loop runs, hence setting the attribute with a random number. But the numbers repeat and I was trying to avoid that. I also tried a shuffle but failed to get any values from it.
My main mistake seems to be that I'm generating a list and sampling it, but I'm failing to assign every different item from that list to different aiSwitch*.index nodes. And I'm running out of ideas for this.
Any clues would be greatly appreciated.
Thanks.
Jonathan.
Here is a somewhat Pythonic way: shuffle the list of indices, then iterate over it using zip (which is useful for iterating over structures in parallel, which is what you need to do here):
import random
index = list(range(6))
random.shuffle(index)
allSwitch = mc.ls('aiSwitch*')
for i,j in zip(allSwitch,index):
mc.setAttr(i + '.index', j)

How to get running maximum in Stata?

I would like to get the running maximum by writing Stata code.
I think I am quite close:
gen ctrhigh`iv' = max(ctr, L1.ctr, L2.ctr, L3.ctr, ..., L`iv'.ctr)
As you can see, my data are time series and `iv' represents the window (e.g. 5, 10 or 200 days)
The only problem is that you cannot pass a varlist or string containing numbers to max. E.g. the following is not possible:
local ivs 5 10 50 100 200
foreach iv in `ivs' {
local vals
local i = 1
while (`i' <= `iv') {
vals "`vals' `i'"
local ++i
}
gen ctrhigh`iv' = max(varlist vals) //not possible
}
How would I achieve this instead?
Example of quickly computing a running standard deviation
* standard deviation of ctr, see http://en.wikipedia.org/wiki/Standard_deviation#Rapid_calculation_methods *
gen ctr_sq = ctr^2
by tid: gen ctr_cum = sum(ctr) if !missing(ctr)
by tid: gen ctr_sq_cum = sum(ctr_sq) if !missing(ctr_sq)
foreach iv in $ivs {
if `iv' == 1 continue
by tid: gen ctr_sum = ctr_cum - L`iv'.ctr_cum if !missing(ctr_cum) & !missing(L`iv'.ctr_cum)
by tid: gen ctr_sq_sum = ctr_sq_cum - L`iv'.ctr_sq_cum if !missing(ctr_sq_cum) & !missing(L`iv'.ctr_sq_cum)
by tid: gen ctrsd`iv' = sqrt((`iv' * ctr_sq_sum - ctr_sum^2) / (`iv'*(`iv'-1))) if !missing(ctr_sq_sum) & !missing(ctr_sum)
label variable ctrsd`iv' "Rolling std dev of close ticker rank by `iv' days."
drop ctr_sum ctr_sq_sum
}
drop ctr_sq ctr_cum ctr_sq_cum
Note: this is not an exact sd, it's an approximation. I realize that this is very different from a maximum, but this may serve as an illustration on how to deal with large data computations.
Your example is time series data and implies that you have tsset the data. You don't say whether you also have panel or longitudinal structure. I will assume the worst and assume the latter as it doesn't make the code much worse. So, suppose tsset id date. In fact, that's irrelevant to the code here except to make explicit my assumption that id is an identifier and date a time variable.
An unattractive way to do this is to loop over observations. Suppose window is set to 42.
local window = 42
gen max = .
tsset id date
quietly forval i = 1/`=_N' {
su ctr if inrange(date, date[`i'] - `window', date[`i']) & id == id[`i'], meanonly
replace max = r(max) in `i'
}
So, in words as well: summarize values of ctr if date within window and it's in the same panel (same id), and put the maximum in the current observation.
The meanonly option is not well named. It calculates some other quantities besides the mean, and the maximum is one. But you do want the meanonly option to make summarize go as fast as possible.
See my 2007 paper on events in intervals, freely available at http://www.stata-journal.com/sjpdf.html?articlenum=pr0033
I say unattractive, but this approach does have the advantage that it is easy to work with once you understand it.
I am not setting up an expression with lots of arguments to max(). You said 200 as an example and nothing stated that you might not ask for more, so far as I can see there may be no upper limit on window length, but there will be a limit on how complicated that expression can be.
If I think of a better way to do it, I'll post it. Or someone else will....
It seems like I can pass a string of arguments to max, like so:
* OPTION 1: compute running max by days *
foreach iv in $ivs {
* does not make sense for less than two days *
if `iv' < 2 continue
di "computing running max for ctr interval `iv'"
* set high for this amount of days *
local vars "ctr"
forval i = 1 / `iv' {
local vars "`vars', L`i'.ctr"
}
by tid: gen ctrh`iv' = max(`vars')
}
* OPTION 2: compute running max by days, ensuring that entire range is nonmissing *
foreach iv in $ivs {
* does not make sense for less than two days *
if `iv' < 2 continue
di "computing running max for ctr interval `iv'"
* set high for this amount of days *
local vars "ctr"
local condition "!missing(ctr)"
forval i = 1 / `iv' {
local vars "`vars', L`i'.ctr"
local condition "`condition' & !missing(L`i'.ctr)"
}
by tid: gen ctrh`iv' = max(`vars') if `condition'
}
This computes very quickly and does exactly what I need.
However, if you need an arbitrarily large window I think you should resort to Nick's answer.

Build fixed interval dataset from random interval dataset using stale data

Update: I've provided a brief analysis of the three answers at the bottom of the question text and explained my choices.
My Question: What is the most efficient method of building a fixed interval dataset from a random interval dataset using stale data?
Some background: The above is a common problem in statistics. Frequently, one has a sequence of observations occurring at random times. Call it Input. But one wants a sequence of observations occurring say, every 5 minutes. Call it Output. One of the most common methods to build this dataset is using stale data, i.e. set each observation in Output equal to the most recently occurring observation in Input.
So, here is some code to build example datasets:
TInput = 100;
TOutput = 50;
InputTimeStamp = 730486 + cumsum(0.001 * rand(TInput, 1));
Input = [InputTimeStamp, randn(TInput, 1)];
OutputTimeStamp = 730486.002 + (0:0.001:TOutput * 0.001 - 0.001)';
Output = [OutputTimeStamp, NaN(TOutput, 1)];
Both datasets start at close to midnight at the turn of the millennium. However, the timestamps in Input occur at random intervals while the timestamps in Output occur at fixed intervals. For simplicity, I have ensured that the first observation in Input always occurs before the first observation in Output. Feel free to make this assumption in any answers.
Currently, I solve the problem like this:
sMax = size(Output, 1);
tMax = size(Input, 1);
s = 1;
t = 2;
%#Loop over input data
while t <= tMax
if Input(t, 1) > Output(s, 1)
%#If current obs in Input occurs after current obs in output then set current obs in output equal to previous obs in input
Output(s, 2:end) = Input(t-1, 2:end);
s = s + 1;
%#Check if we've filled out all observations in output
if s > sMax
break
end
%#This step is necessary in case we need to use the same input observation twice in a row
t = t - 1;
end
t = t + 1;
if t > tMax
%#If all remaining observations in output occur after last observation in input, then use last obs in input for all remaining obs in output
Output(s:end, 2:end) = Input(end, 2:end);
break
end
end
Surely there is a more efficient, or at least, more elegant way to solve this problem? As I mentioned, this is a common problem in statistics. Perhaps Matlab has some in-built function I'm not aware of? Any help would be much appreciated as I use this routine a LOT for some large datasets.
THE ANSWERS: Hi all, I've analyzed the three answers, and as they stand, Angainor's is the best.
ChthonicDaemon's answer, while clearly the easiest to implement, is really slow. This is true even when the conversion to a timeseries object is done outside of the speed test. I'm guessing the resample function has a lot of overhead at the moment. I am running 2011b, so it is possible Mathworks have improved it in the intervening time. Also, this method needs an additional line for the case where Output ends more than one observation after Input.
Rody's answer runs only slightly slower than Angainor's (unsurprising given they both employ the histc approach), however, it seems to have some problems. First, the method of assigning the last observation in Output is not robust to the last observation in Input occurring after the last observation in Output. This is an easy fix. But there is a second problem which I think stems from having InputTimeStamp as the first input to histc instead of the OutputTimeStamp adopted by Angainor. The problem emerges if you change OutputTimeStamp = 730486.002 + (0:0.001:TOutput * 0.001 - 0.001)'; to OutputTimeStamp = 730486.002 + (0:0.0001:TOutput * 0.0001 - 0.0001)'; when setting up the example inputs.
Angainor's appears robust to everything I threw at it, plus it was the fastest.
I did a lot of speed tests for different input specifications - the following numbers are fairly representative:
My naive loop: Elapsed time is 8.579535 seconds.
Angainor: Elapsed time is 0.661756 seconds.
Rody: Elapsed time is 0.913304 seconds.
ChthonicDaemon: Elapsed time is 22.916844 seconds.
I'm +1-ing Angainor's solution and marking the question solved.
This "stale data" approach is known as a zero order hold in signal and timeseries fields. Searching for this quickly brings up many solutions. If you have Matlab 2012b, this is all built in to the timeseries class by using the resample function, so you would simply do
TInput = 100;
TOutput = 50;
InputTimeStamp = 730486 + cumsum(0.001 * rand(TInput, 1));
InputData = randn(TInput, 1);
InputTimeSeries = timeseries(InputData, InputTimeStamp);
OutputTimeStamp = 730486.002 + (0:0.001:TOutput * 0.001 - 0.001);
OutputTimeSeries = resample(InputTimeSeries, OutputTimeStamp, 'zoh'); % zoh stands for zero order hold
Here is my take on the problem. histc is the way to go:
% find Output timestamps in Input bins
N = histc(Output(:,1), Input(:,1));
% find counts in the non-empty bins
counts = N(find(N));
% find Input signal value associated with every bin
val = Input(find(N),2);
% now, replicate every entry entry in val
% as many times as specified in counts
index = zeros(1,sum(counts));
index(cumsum([1 counts(1:end-1)'])) = 1;
index = cumsum(index);
val_rep = val(index)
% finish the signal with last entry from Input, as needed
val_rep(end+1:size(Output,1)) = Input(end,2);
% done
Output(:,2) = val_rep;
I checked against your procedure for a few different input models (I changed the number of Output timestamps) and the results are the same. However, I am still not sure I understood your problem, so if something is wrong here let me know.

Resources