Positive random number generation with given mean and standard deviation

I am working on a simulation project and need help with random number generation. I need two sets of random numbers.
Properties:
Primary set (var_a) -> normally distributed, greater than 0, with a given mean and standard deviation.
Secondary set (var_b) -> same as the primary set, with one addition: a value in the secondary set cannot be greater than the corresponding value in the primary set. This is because the output of a deterministic function must be a percentage between 0 and 1. For example:
Service level calculation:
import numpy as np
n = 100000
# Calls handled
callshandled = np.random.normal(loc=65, scale=97, size=n)
print('Calls handled: ', callshandled)
# Calls answered within SL; must always be less than or equal to calls handled
ansinsl = np.random.normal(loc=60, scale=82, size=n)
print('Answered in SL', ansinsl)
# Service level; must be between 0 and 1, but with a normal distribution we get negative values
sl = ansinsl / callshandled
print('Service level', sl)
Calls handled: [ 43.26825426 129.79198758 31.56460354 ... 37.45059791 1.71420416
-94.87241356]
Answered in SL [-12.72293091 204.28084996 232.25722235 ... 166.03208722 -53.69933624
-36.71949656]
Service level [ -0.29404771 1.57390956 7.35815427 ... 4.43336279 -31.32610312
0.38704082]

There is a well-known "natural" way of generating pairs of Gaussian pseudo-random variates, known as Box-Muller. You could try it this way:
- generate pairs of unit normal variates à la Box-Muller
- scale the pairs to whatever your (µ, σ) parameters are
- reject the pairs that do not fit your criteria
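A minimal sketch of that recipe, assuming NumPy. Note that rejection truncates the distribution, so the accepted sample's mean and standard deviation will differ somewhat from the nominal (µ, σ):

import numpy as np

def box_muller_pairs(n, rng):
    # Generate n pairs of independent standard normal variates via Box-Muller.
    u1 = 1.0 - rng.random(n)            # shift to (0, 1] so log() is safe
    u2 = rng.random(n)
    r = np.sqrt(-2.0 * np.log(u1))
    return r * np.cos(2.0 * np.pi * u2), r * np.sin(2.0 * np.pi * u2)

def constrained_pairs(n, mu_a, sigma_a, mu_b, sigma_b, seed=0):
    # Return n pairs (a, b) with a > 0 and 0 < b <= a, by rejection.
    rng = np.random.default_rng(seed)
    a_out, b_out = [], []
    while len(a_out) < n:
        z1, z2 = box_muller_pairs(n, rng)
        a = mu_a + sigma_a * z1          # scale to the requested (mu, sigma)
        b = mu_b + sigma_b * z2
        keep = (a > 0) & (b > 0) & (b <= a)   # reject pairs breaking the criteria
        a_out.extend(a[keep])
        b_out.extend(b[keep])
    return np.array(a_out[:n]), np.array(b_out[:n])

callshandled, ansinsl = constrained_pairs(100000, 65, 97, 60, 82)
sl = ansinsl / callshandled   # now guaranteed to lie in (0, 1]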

Related

How do I add noise/variability to a dataset in Python, given the CV?

Given a dataset of blood results, say cholesterol level, and knowing that the instrument that produced those results is subject to a known degree of variability, how would I add that variability back into the dataset? That is, I want to assume the result in the original dataset is the true/mean value, and then produce new results that are subject to the known variability of the instrument.
In Excel you use =NORM.INV(RAND(), mean, std_dev), where RAND() provides a random value between 0 and 1, "mean" is the original value, and I have the CV so I can calculate the SD. NORM.INV then provides the inverse of the cumulative normal distribution function.
I've done the following to create a new column with my new values, but would like to know if it is valid (i.e., will each row have a different random number between 0 and 1 as the probability, and is this formula equivalent to NORM.INV?):
df8000['HDL_1'] = norm.ppf(random(), loc = df8000['HDL_0'], scale = TAE_df.loc[0,'HDL'])
Thanks in advance!
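For reference, norm.ppf is indeed the inverse of the cumulative normal distribution (the NORM.INV equivalent), but a bare random() call returns a single scalar, so every row would share the same quantile. A minimal sketch with one uniform draw per row; the DataFrame and SD below are illustrative stand-ins for df8000 and TAE_df:

import numpy as np
import pandas as pd
from scipy.stats import norm

df8000 = pd.DataFrame({'HDL_0': [1.2, 1.5, 0.9, 1.1]})  # toy stand-in
sd = 0.08                                 # plays the role of TAE_df.loc[0, 'HDL']

rng = np.random.default_rng(0)
q = rng.random(len(df8000))               # one uniform draw per row
df8000['HDL_1'] = norm.ppf(q, loc=df8000['HDL_0'], scale=sd)

# Equivalent shortcut: sample the noisy values directly
# df8000['HDL_1'] = rng.normal(loc=df8000['HDL_0'], scale=sd)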

Is it possible to pre-assign values to decision variables in CPLEX OPL

I have a large number of variables (both binary and continuous). I have therefore worked out some logic to fix a subset of the variables to 0 so that they do not become part of the optimisation process.
For example, I have a binary decision variable y[b][t],
where b varies from 1 to 100
and t from 1 to 5.
Using that logic I could determine that y[20][2] through y[100][2] should be 0. I want to assign the fixed value 0 to these variables, thereby reducing the number of variables in my optimisation problem. While y is a binary decision variable, I have continuous variables as well that I would likewise like to set to 0 in advance.
Is there a way to achieve this? I haven't used Python with CPLEX, but I hear this can probably be done by setting the lower and upper bounds of the variables. Is there a similar method in OPL?
---- Added 13th Aug
Maybe I was not very clear, or I could not understand the suggested solution.
What I want is this: say I have the following decision variable Xbmt (I have a few of them), originally declared as:
dvar float+ Xbmt[PitBlocks][Plants][TimePeriods];
For some of the PitBlocks and some time periods I want to fix this decision variable to 0. The combinations to fix are defined in a tuple set nullVariables, whose block_id matches PitBlocks and whose time_period matches TimePeriods. Hence I want something like the declaration below, but I cannot declare the decision variable twice; I need it to be 0 only for the ids in the nullVariables set.
dvar float+ Xbmt[NullVariablesSet.block_id][Plants][NullVariablesSet.time_period] in 0..0;
How can this be achieved, so that some of the Xbmt remain decision variables while others are removed by fixing them to 0?
See https://github.com/AlexFleischerParis/zooopl/blob/master/zoopreassign.mod from Making Decision Optimization Simple:

int nbKids=300;
{int} seats={40,30}; // how many seats; {} means this is a set
float costBus[seats]=[500,400];

// Now let's see how to preassign some decision variables.
// Suppose we know that we have exactly 6 buses with 40 seats.
{int} preassignedseats={40};
int preassignedvalues[preassignedseats]=[6];

dvar int+ nbBus[s in seats] in
  ((s in preassignedseats) ? preassignedvalues[s] : 0)
  ..
  ((s in preassignedseats) ? preassignedvalues[s] : maxint);

minimize sum(b in seats) costBus[b]*nbBus[b];
subject to
{
  sum(b in seats) b*nbBus[b] >= nbKids;
}
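For the Python route mentioned in the question, the same fix-by-bounds idea would look roughly like this docplex sketch (names mirror the question; treat it as an illustration rather than a drop-in):

from docplex.mp.model import Model

mdl = Model(name='preassign')
B = range(1, 101)   # b = 1..100
T = range(1, 6)     # t = 1..5
y = mdl.binary_var_matrix(B, T, name='y')

# Fix y[20][2] .. y[100][2] to 0 by forcing the upper bound to 0;
# the lower bound of a binary variable is already 0, so each of these
# variables is effectively removed from the search.
for b in range(20, 101):
    y[b, 2].ub = 0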

Algorithm or Test Method to generate test case for Keno game

Keno game rules: Keno is a lottery-like game that draws a random combination of 20 numbers from the range 1 to 80. The player may choose a number game to play (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15). The payout depends on the number game and the number of matches.
I understand the difficulty of generating a complete set of test cases covering every possible combination, not to mention the difficulty of matching a random game result. I initially applied the Random Combination testing method, but later found it hard to achieve high coverage of all possible cases (roughly about 10%). By now I have come across Pure Random Combinatorial, CATS, AETG, and K-combination, but none is ideal for the Keno game.
For now, the inputs are num_game_size and numSelected[num_game_size], and the outputs are result[20], matchedNum[], matched_num_size, and payout. Of course, there are more inputs: continuous_game_toplay_size and bet_amount.
I would welcome any suggestion of a testing method or algorithm with high coverage of pure-random, large-combination test cases when executed for a month or two. My objective is to test combinations of selected numbers and their payout for each different number of matches when the result is purely randomly generated. For instance:
/* Assume the result is pure random generated */
/* Match 0 */
num_game_size = 2
numSelected[2] = {1,72}
result[20] = {2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21}
matchedNum[] = {}
matched_num_size = 0
payout = 0
/* Match 1 */
num_game_size = 2
numSelected[2] = {1,72}
result[20] = {1,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21}
matchedNum[] = {1}
matched_num_size = 1
payout = 1
/* Match 2 */
num_game_size = 2
numSelected[2] = {1,72}
result[20] = {1,72,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21}
matchedNum[] = {1,72}
matched_num_size = 2
payout = 5
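For concreteness, a minimal Python sketch that draws a pure-random result and derives the output fields used above (the payout table is illustrative):

import random

def keno_round(num_selected, payout_table, rng=random):
    # Draw 20 distinct numbers from 1..80 and score the player's selection.
    result = rng.sample(range(1, 81), 20)
    matched_num = sorted(set(num_selected) & set(result))
    matched_num_size = len(matched_num)
    payout = payout_table.get(matched_num_size, 0)
    return result, matched_num, matched_num_size, payout

# 2-number game with the payouts from the sample cases above
payout_2 = {0: 0, 1: 1, 2: 5}
result, matched_num, matched_num_size, payout = keno_round([1, 72], payout_2)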
The total number of possibilities is C(80,2) * C(80,20) = 3160 * 3535316142212174320 ≈ 1.117159900939047e+22. That is, for each combination of two numbers within the range 1 to 80, there are C(80,20) possible results. It would probably take a few years to cover every possibility (including the 1, 3, 4, 5, 6, 7, 8, 9, 10, and 15 number games) when the result is purely randomly generated (quantum RNG).
P.S.: Most test methods I found consider only the random problem or only the combination problem, and require a tremendous amount of time to complete test-case generation. I'm trying to create a program to help me win the Keno game IRL.

EasyPredictModelWrapper giving wrong prediction

public BinomialModelPrediction predictBinomial(RowData data) throws PredictException {
    double[] preds = this.preamble(ModelCategory.Binomial, data);
    BinomialModelPrediction p = new BinomialModelPrediction();
    double d = preds[0];
    p.labelIndex = (int) d;
    String[] domainValues = this.m.getDomainValues(this.m.getResponseIdx());
    p.label = domainValues[p.labelIndex];
    p.classProbabilities = new double[this.m.getNumResponseClasses()];
    System.arraycopy(preds, 1, p.classProbabilities, 0, p.classProbabilities.length);
    if (this.m.calibrateClassProbabilities(preds)) {
        p.calibratedClassProbabilities = new double[this.m.getNumResponseClasses()];
        System.arraycopy(preds, 1, p.calibratedClassProbabilities, 0, p.calibratedClassProbabilities.length);
    }
    return p;
}
E.g.: classProbabilities = [0.82333, 0.27666]
labelIndex = 1
label = true
domainValues = [false, true]
What does this labelIndex signify, and is the order of classProbabilities the same as the order of domainValues? If the order is the same, then the probability of false is 0.82333 and the probability of true is 0.27666, so why is labelIndex 1 and label true?
Please help me figure out this issue.
Like Tom commented, the prediction is not "wrong". You can infer from this that the threshold H2O has chosen is less than 0.27666. You probably have imbalanced training data; otherwise H2O would not have picked such a low threshold for classifying a predicted value of 0.27666 as a 1. Does your training set include fewer examples of the positive class than the negative class?
If you don't like that threshold for whatever reason, you can manually choose your own. Just make sure you know how to properly evaluate the effect of different thresholds on the performance of your model; otherwise I'd recommend just using the default threshold.
The name "classProbabilities" is a misnomer. These are not actual probabilities; they are predicted values, though people often use the terms interchangeably. Binary classification algorithms produce "predicted values" that look like probabilities when they're between 0 and 1, but unless a calibration process is performed, they do not represent probabilities. Calibration is not necessarily a straightforward process, and there are many techniques, including methods specifically for imbalanced data. In H2O you can perform calibration using Platt scaling via the calibrate_model option, but this is probably not necessary for what you're trying to do.
The proper way to use the raw output of a binary classification model is to look only at the predicted value for the positive class (you can simply ignore the predicted value for the negative class). Then you choose a threshold that suits your needs, or you use the default threshold in H2O, which is chosen to maximize the F1 score. Some other software uses a hardcoded threshold of 0.5, but that is a terrible choice if you don't have an even number of positive and negative examples in your training data. If you have only a few positive examples in your training data, the best threshold will be much lower than 0.5.
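A sketch of that advice (the threshold value below is illustrative, not H2O's):

def classify(p_positive, threshold):
    # Positive label iff the positive-class predicted value clears the threshold.
    return p_positive >= threshold

# With classProbabilities = [0.82333, 0.27666] and domainValues = [false, true],
# only the positive-class value 0.27666 matters; H2O's chosen threshold must be
# below it for the label to come out as "true".
print(classify(0.27666, threshold=0.25))   # True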

find endpoints for range given a value within the range

I am trying to solve a simple problem, but at the moment I cannot think of a better solution. I am testing an API that is not documented.
There is an ID used to fetch objects; it has a min and max value, with random values missing in between. I'm trying to test the responses I receive for random objects, but to find objects I need valid IDs.
It would be very inefficient to test random numbers and hope that I get an object back. The best I can do is find the range, pick a random number within it, and check that it exists before running the tests.
A sample list of all of the IDs in the database might look like this:
[1005, 25984, 25986, 29587, 30000, ...]
Assuming the deviation from one value to the next never exceeds C, i.e. the difference between consecutive values is never greater than a pre-defined constant, how would you calculate the min/max of the range given only one value in the range?
Starting from a given value and looping until the last value is found is horrible, but that is how the previous devs implemented it. Below is pseudocode that more or less covers what they do.
// "i" can be any valid object ID from the database,
// e.g. any one of [1005, 25984, 25986, 29587, 30000]
var i = givenPredefinedObjectId;
var deviation = 100;

// objectWithIdExists(id) looks up an object with the given ID in the database;
// it returns false if no such object exists, true otherwise
var found = true;
while (found) {
    // walk past the current contiguous run of existing IDs
    while (objectWithIdExists(i)) {
        i++;
    }
    // scan ahead up to "deviation" IDs for the start of the next run
    found = false;
    for (var j = i + 1; j < i + deviation; j++) {
        if (objectWithIdExists(j)) {
            i = j;
            found = true;
            break;
        }
    }
}
// no existing ID within "deviation" of i, so the last existing ID is the endpoint
endPoint = i - 1;
Assuming there is no knowledge about the possible values except that you can check whether a value exists, and you are given one valid value (there is no array with all possible IDs; that was just an example), how would you find the min/max values?
Unbounded binary search is feasible, with a factor-of-C slowdown. Given an unbounded binary search algorithm that, with access to an oracle less_equal(n) for some natural number n, returns n in O(log n) time, implement the oracle on input k by querying all of the IDs C*k, C*k+1, ..., C*k+C-1 and reporting that k is less than or equal to n if and only if at least one ID is found. The running time is O(C*log((max-min)/C)).
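A Python sketch of that idea, under the stated assumption that consecutive IDs differ by at most C; the exists() callback stands in for the API lookup. Finding the min is symmetric, galloping downward instead:

def block_nonempty(k, C, exists):
    # Oracle: does block k (IDs C*k .. C*k+C-1) contain any object?
    return any(exists(C * k + j) for j in range(C))

def find_max_id(start_id, C, exists):
    k = start_id // C                 # block containing the known-valid ID
    # Exponential ("unbounded") search for an empty block above k
    step = 1
    while block_nonempty(k + step, C, exists):
        step *= 2
    lo, hi = k + step // 2, k + step  # last nonempty block lies in [lo, hi)
    # Binary search: lo is always nonempty, hi always empty
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if block_nonempty(mid, C, exists):
            lo = mid
        else:
            hi = mid
    # lo is the last nonempty block; scan it for the largest existing ID
    return max(C * lo + j for j in range(C) if exists(C * lo + j))

ids = {1005, 1080, 1150, 1249, 1300}  # gaps never exceed C = 100
print(find_max_id(1080, 100, lambda i: i in ids))   # -> 1300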
