Choose variable based on weighted probability - random

I have a list of variables with their weighted probability, as below:
Cloudy: 0.25
Sunny and warm: 0.25
Dry and cold: 0.125
Wet and cold: 0.25
Wet and warm: 0.375
Stormy/rainy: 0
In Google Sheets I am using the following formula to select a variable based on the cumulative probabilities:
=INDEX('WeatherChoices'!A2:A7,COUNTIF('WeatherChoices'!D2:D7,"<="&RAND())+1)
where A2:A7 is the list of weather types and D2:D7 holds the cumulative probabilities.
I am now trying to replicate this as a Google Apps Script function but can't figure out an alternative to nested IF statements, i.e. generate a random number and, if it falls between two of the cumulative values, choose the corresponding variable.
Is there a more elegant way of doing this in one step?
I have looked through the documentation but couldn't find anything other than the Math.random() function.

Try this:
function test() {
  for (var i = 0; i < 100; i++) {
    var number = Math.random(); // uniform value in [0, 1)
    switch (true) {
      case number < 0.375:
        Logger.log("wet");
        break;
      case number >= 0.375:
        Logger.log("something else");
        break;
    }
  }
}
This is a very simple version of what you'll need, but it shows you how to use switch in Apps Script. You will need to add a case for each of the probability brackets you have, just as you would with if statements, but this reads as much more organized. If you have any doubts, this link will give you the basics, and this one will show you how to set up more complex cases.
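Alternatively, you can reproduce the cumulative-probability lookup from the spreadsheet in one step by scanning an array of running totals. A minimal sketch, with the choices and cumulative weights hard-coded rather than read from the WeatherChoices sheet (pickWeather is a hypothetical name):
function pickWeather() {
  var choices = ["Cloudy", "Sunny and warm", "Dry and cold",
                 "Wet and cold", "Wet and warm", "Stormy/rainy"];
  // Running totals of the weights from the question; they sum to 1.25,
  // so the random draw is scaled to the total rather than assumed to be 1.
  var cumulative = [0.25, 0.5, 0.625, 0.875, 1.25, 1.25];
  var r = Math.random() * cumulative[cumulative.length - 1];
  for (var i = 0; i < cumulative.length; i++) {
    if (r < cumulative[i]) {
      return choices[i];
    }
  }
  return choices[choices.length - 1]; // guard against floating-point edge cases
}
To read the live values instead, you could fetch columns A and D with SpreadsheetApp.getActiveSpreadsheet().getSheetByName('WeatherChoices').getRange('A2:A7').getValues() and the matching range for column D.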

Related

Calculate statistics on numbers entered by user

This is a tutorial question given by the lecturer. I don't understand the question and need guidance in the right direction.
Write an algorithm to read in a list of basketball scores (non-negative integers) one at a time from the user and output the following statistics:
Total number of games.
Total number of games scoring at least 90 points.
Percentage of games scoring at least 90 points.
The user entering a negative sentinel value indicates the end of the input. Note that the sentinel value is not used in computing the highest, lowest or average game score.
Requirements:
Write pseudo code for how you would solve each statistic
Example: total number of games
For each input score, increment games by one
Determine the variables you will need and figure out the type of each variable
Define and initialize each variable
Determine what type of loop you are going to write
Start with statistic number one (total number of games) and get your loop to compute the total number of games. When you end your loop, output the total number of games, and then move to problem two.
You only need to write one loop.
Write a complete algorithm for the above problem.
I've tried to understand the requirement and tried googling for alternative explanations, but was unable to find any. Here is my attempt:
n = 0 // number of games
o = 0 // total number of games scoring at least 90 points
for( o = 0; o <= 90; o++ )
{
input =get user input for score
n++
o += input
}
percentage = n/o *100
output percentage
Have I correctly understood the question criteria?
EDIT: Answer attempt 1:
int numGames = 0;            // number of games
int numTotalPoints = 0;      // total points scored across all games
int userInput = 0;           // to track input, so a negative number can end the loop
double average = 0.0;        // to get the average of the games
double gameTo90Points = 0.0; // to calculate total games reaching 90 points
double percentage = 0.0;     // to calculate the percentage

Text.put("Input the game score");
userInput = Text.getInt();
while (userInput >= 0)
{
    numTotalPoints += userInput;
    numGames++;
    Text.put("Input the game score");
    userInput = Text.getInt();
}
if (numGames == 0)
{
    Text.put("Not enough scores to tabulate");
}
else
{
    average = ((double)numTotalPoints) / numGames;
    gameTo90Points = 90 / average;
    percentage = (gameTo90Points / 90) * 100;
    Text.put("Total number of games: " + numGames);
    Text.put("Total number of games scoring at least 90 points: " + gameTo90Points);
    Text.put("Percentage of games scoring at least 90 points: " + percentage);
}
As this is a task you must complete yourself, we should not provide you with the full answer to the assignment.
I will, however, provide some comments on your current pseudo-code.
n = 0 // number of games
o = 0 // total number of games scoring at least 90 points
So far this is a good start, but it is better to use variable names that actually say something about what they hold (e.g. numGames and numHighScoringGames would be good candidates). Also, the assignment asks you to "figure out the type of each variable". This is something you have not done yet...
for( o = 0; o <= 90; o++ )
This loop is wrong. After the loop finishes o will be a number greater than 90. But o is supposed to be a particular number of games (with a score of at least 90). This should trigger an alarm... You haven't read any input yet and you already seem to know there will be more than 90 of such games? That's not right.
The value of o should have nothing to do with whether the loop should continue or not.
input =get user input for score
Again, the data type should be determined for the variable input.
n++
This is good, but you did not take into account this part of the assignment:
The user entering a negative sentinel value indicates the end of the input.
Your code should verify if the user entered a negative sentinel value. And if so, you should not ask for more input.
o += input
The variable o is supposed to be a number of games, but now you are adding a score to it... that cannot be right. Also, you add it unconditionally... Should you not first check whether that game is "scoring at least 90 points"?
percentage = n/o *100
Here you use o as it was intended (as a number of games). But think about this... which one of the two will be greater (when not equal)? n or o? Taking that answer into account: Is your formula correct?
Secondly, could the denominator be zero? Should you protect the code from it?
output percentage
OK, but don't forget that the assignment asks for three statistics, not just one.

How to compute blot exposure in backgammon efficiently

I am trying to implement an algorithm for backgammon similar to td-gammon as described here.
As described in the paper, the initial version of td-gammon used only the raw board encoding in the feature space which created a good playing agent, but to get a world-class agent you need to add some pre-computed features associated with good play. One of the most important features turns out to be the blot exposure.
Blot exposure is defined here as:
For a given blot, the number of rolls out of 36 which would allow the opponent to hit the blot. The total blot exposure is the number of rolls out of 36 which would allow the opponent to hit any blot. Blot exposure depends on: (a) the locations of all enemy men in front of the blot; (b) the number and location of blocking points between the blot and the enemy men and (c) the number of enemy men on the bar, and the rolls which allow them to re-enter the board, since men on the bar must re-enter before blots can be hit.
I have tried various approaches to compute this feature efficiently but my computation is still too slow and I am not sure how to speed it up.
Keep in mind that the td-gammon approach evaluates every possible board position for a given dice roll, so on each turn, for each player's dice roll, you need to calculate this feature for every possible resulting board position.
Some rough numbers: assuming there are approximately 30 board positions per turn and an average game lasts 50 turns, running 1,000,000 game simulations takes (x * 30 * 50 * 1,000,000) / (1000 * 60 * 60 * 24) days, where x is the number of milliseconds it takes to compute the feature. Putting x = 0.7, we get approximately 12 days to simulate 1,000,000 games.
I don't really know if that's reasonable timing but I feel there must be a significantly faster approach.
So here's what I've tried:
Approach 1 (By dice roll)
For every one of the 21 possible dice rolls, recursively check whether a hit occurs. Here's the main workhorse for this procedure:
private bool HitBlot(int[] dieValues, Checker.Color checkerColor, ref int depth)
{
    Moves legalMovesOfDie = new Moves();
    if (depth < dieValues.Length)
    {
        legalMovesOfDie = LegalMovesOfDie(dieValues[depth], checkerColor);
    }
    if (depth == dieValues.Length || legalMovesOfDie.Count == 0)
    {
        return false; // all dice used, or no legal moves: no hit on this line of play
    }
    bool hitBlot = false;
    foreach (Move m in legalMovesOfDie.List)
    {
        if (m.HitChecker == true)
        {
            return true;
        }
        // Try the move and recurse on the remaining dice.
        board.ApplyMove(m);
        depth++;
        hitBlot = HitBlot(dieValues, checkerColor, ref depth);
        board.UnapplyMove(m);
        depth--;
        if (hitBlot == true)
        {
            break;
        }
    }
    return hitBlot;
}
What this function does is take as input an array of die values (e.g. if the player rolls 1,1 the array is [1,1,1,1], since doubles are played four times). The function then recursively checks whether there is a hit and, if so, exits with true. The function LegalMovesOfDie computes the legal moves for that particular die value.
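For context, here is a hedged sketch of how such a function might be driven over the 21 distinct rolls (the loop bounds, the counting, and opponentColor are my assumptions, not the original code): non-doubles are tried in both die orders, since blocking can make the orders differ, and count for two of the 36 outcomes.
int exposure = 0;
for (int d1 = 1; d1 <= 6; d1++)
{
    for (int d2 = d1; d2 <= 6; d2++)
    {
        int depth = 0;
        if (d1 == d2)
        {
            // doubles are played four times and are one of the 36 rolls
            if (HitBlot(new[] { d1, d1, d1, d1 }, opponentColor, ref depth))
                exposure += 1;
        }
        else
        {
            bool hit = HitBlot(new[] { d1, d2 }, opponentColor, ref depth);
            if (!hit)
            {
                depth = 0;
                hit = HitBlot(new[] { d2, d1 }, opponentColor, ref depth);
            }
            if (hit)
                exposure += 2; // two of the 36 ordered outcomes
        }
    }
}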
Approach 2 (By blot)
With this approach I first find all the blots, and then for each blot I loop through every possible dice value and see if a hit occurs. The function is optimized so that once a dice value registers a hit, I don't use it again for the next blot. It is also optimized to only consider moves that are in front of the blot. My code:
public int BlotExposure2(Checker.Color checkerColor)
{
    if (DegreeOfContact() == 0 || CountBlots(checkerColor) == 0)
    {
        return 0;
    }
    List<Dice> unusedDice = Dice.GetAllDice();
    List<int> blotPositions = BlotPositions(checkerColor);
    int count = 0;
    for (int i = 0; i < blotPositions.Count; i++)
    {
        int blotPosition = blotPositions[i];
        for (int j = unusedDice.Count - 1; j >= 0; j--)
        {
            Dice dice = unusedDice[j];
            Transitions transitions = new Transitions(this, dice);
            bool hitBlot = transitions.HitBlot2(checkerColor, blotPosition);
            if (hitBlot == true)
            {
                unusedDice.Remove(dice);
                if (dice.ValuesEqual())
                {
                    count = count + 1; // doubles are one of the 36 rolls
                }
                else
                {
                    count = count + 2; // non-doubles occur in two orders
                }
            }
        }
    }
    return count;
}
The method transitions.HitBlot2 takes a blotPosition parameter, which ensures that the only moves considered are those in front of the blot.
Both of these implementations were very slow, and when I used a profiler I discovered that the recursion was the cause, so I then tried refactoring them as follows:
To use for loops instead of recursion (ugly code, but much faster)
To use Parallel.ForEach so that instead of checking one dice value at a time I check them in parallel.
Here are the average timing results of my runs for 50,000 computations of the feature (note the timings for each approach were taken on the same data):
Approach 1 using recursion: 2.28 ms per computation
Approach 2 using recursion: 1.1 ms per computation
Approach 1 using for loops: 1.02 ms per computation
Approach 2 using for loops: 0.57 ms per computation
Approach 1 using Parallel.ForEach: 0.75 ms per computation
Approach 2 using Parallel.ForEach: 0.75 ms per computation
I've found the timings to be quite volatile (maybe dependent on the random initialization of the neural network weights), but around 0.7 ms seems achievable, which, if you recall, leads to 12 days of training for 1,000,000 games.
My questions are: Does anyone know if this is reasonable? Is there a faster algorithm I am not aware of that could reduce the training time?
One last piece of info: I'm running on a fairly new machine, an Intel Core(TM) i7-5500U CPU @ 2.40 GHz.
If any more info is required, please let me know and I will provide it.
Thanks,
Ofir
Yes, calculating these features makes for really hairy code. Look at the GNU Backgammon code: find eval.c and look at lines 1008 to 1267. Yes, that's 260 lines of code. That code calculates the number of rolls that hit at least one checker, and also the number of rolls that hit at least two checkers. As you can see, the code is hairy.
If you find a better way to calculate this, please post your results. To improve, I think you have to look at the board representation. Can you represent the board in a different way that makes this calculation faster?
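To make the representation question concrete, here is a minimal sketch of one possible direction (mine, not GNU Backgammon's approach): measure everything in pips in front of the blot and encode the points the blot's owner holds as a bitmask, so the blocking checks become single bit tests. All names are hypothetical, and it is deliberately simplified, assuming a single attacking checker and no checkers on the bar.
// blockedMask: bit p is set when the point p pips in front of the blot
// is made by the blot's owner, so the attacker cannot land there.
// distance: pips from the attacking checker to the blot.
static int RollsThatHit(int distance, uint blockedMask)
{
    int count = 0;
    for (int d1 = 1; d1 <= 6; d1++)
        for (int d2 = 1; d2 <= 6; d2++)
            if (Hits(distance, d1, d2, blockedMask))
                count++; // counts ordered pairs, i.e. rolls out of 36
    return count;
}

static bool Hits(int distance, int d1, int d2, uint blockedMask)
{
    bool Open(int p) => (blockedMask & (1u << p)) == 0;

    if (distance == d1 || distance == d2)
        return true; // direct shot
    if (d1 != d2)
        // Compound shot: play one die, then the other; either order
        // works if its intermediate landing point is open.
        return distance == d1 + d2 && (Open(d2) || Open(d1));
    // Doubles: up to four moves of d1, all intermediate points open.
    for (int steps = 2; steps <= 4; steps++)
    {
        if (distance != steps * d1) continue;
        for (int s = 1; s < steps; s++)
            if (!Open(distance - s * d1)) return false;
        return true;
    }
    return false;
}
Because the inner loop runs over ordered pairs, non-doubles are naturally counted twice, matching the "rolls out of 36" definition quoted in the question. A real feature would still need to aggregate over all blots and handle the bar, but bit tests of this kind avoid the legal-move generation and recursion that the profiler flagged.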

Linear probability from non-linear one

Here is my problem.
Imagine you've got a function like this (here in C):
int strangeRand() {
    if (rand() % 100 < 70) return 0;
    else return 1;
}
This one returns 0 with a probability of 0.7 and 1 with a probability of 0.3.
Here is what I want to do: create a function that returns 0 with a probability of 0.5 and 1 with a probability of 0.5 too.
I can only use the strangeRand() function [I can't modify it] (plus loops, if statements, etc., but no rand() function).
Does anyone have an idea how to do this?
Thanks.
This is actually a solved problem! It's usually known as getting a fair result from an unfair coin.
The algorithm works as follows:
Call the function twice.
If the results match, start over, forgetting both results.
If the results differ, use the first result, forgetting the second.
The provided link contains an explanation of why the algorithm works.
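For completeness, a minimal sketch in C of the procedure described above; the biased source is reproduced only so the example is self-contained:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* The biased source: returns 0 with probability 0.7 and 1 with 0.3. */
int strangeRand(void) {
    return (rand() % 100 < 70) ? 0 : 1;
}

/* Von Neumann's trick: draw pairs until they differ, return the first.
   The sequences 0,1 and 1,0 each occur with probability p*(1-p),
   so the result is fair regardless of the bias. */
int fairRand(void) {
    for (;;) {
        int a = strangeRand();
        int b = strangeRand();
        if (a != b)
            return a;
    }
}

int main(void) {
    srand((unsigned)time(NULL));
    int ones = 0, trials = 100000;
    for (int i = 0; i < trials; i++)
        ones += fairRand();
    printf("fraction of ones: %f\n", (double)ones / trials); /* approx 0.5 */
    return 0;
}
With p = 0.7, a pair differs with probability 2 * 0.7 * 0.3 = 0.42, so on average about 2.4 pairs (fewer than 5 calls to strangeRand()) are needed per output bit.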

Get N samples given iterator

Given are an iterator it over data points, the number of data points we have n, and the maximum number of samples we want to use to do some calculations (maxSamples).
Imagine a function calculateStatistics(Iterator it, int n, int maxSamples). This function should use the iterator to retrieve the data and do some (heavy) calculations on the data element retrieved.
if n <= maxSamples we will of course use each element we get from the iterator
if n > maxSamples we will have to choose which elements to look at and which to skip
I've been spending quite some time on this. The problem is of course how to choose when to skip an element and when to keep it. My approaches so far:
I don't want to take the first maxSamples coming from the iterator, because the values might not be evenly distributed.
Another idea was to use a random number generator to create maxSamples distinct random numbers between 0 and n and take the elements at these positions. But if e.g. n = 101 and maxSamples = 100, it gets more and more difficult to find a new distinct number not yet in the list, losing a lot of time just in the random number generation.
My last idea was to do the contrary: generate n - maxSamples random numbers and exclude the data elements at these positions. But this also doesn't seem like a very good solution.
Do you have a good idea for this problem? Are there maybe standard known algorithms for this?
To provide some answer: a good way to collect a set of random samples, given a collection size greater than the number of elements needed, is the following (in C++-ish pseudo-code, made concrete below).
EDIT: you may need to iterate over the data and create the someElements vector first. If your elements are large, the vector can hold "pointers" to these elements to save space.
template <typename T>
vector<T> randomCollectionFromVector(vector<T> someElements, int numElementsToGrab) {
    vector<T> resultVector;
    while (numElementsToGrab--) {
        int randPosition = rand() % someElements.size();
        resultVector.push_back(someElements[randPosition]);
        someElements.erase(someElements.begin() + randPosition);
    }
    return resultVector;
}
If you don't care about changing your vector of elements, you could also remove random elements from someElements directly, as you mentioned. The algorithm would look very similar and is conceptually the same idea; you just pass someElements by reference and manipulate it.
Something worth noting is that the quality of pseudo-random distributions, as far as how random they appear, grows as the number of values you draw from them increases. So you may tend to get better results if you pick whichever method results in the use of more random numbers. Example: if you have 100 values and need 99, you should probably pick 99 values, as this will result in you using 99 pseudo-random numbers instead of just 1. Conversely, if you have 1000 values and need 99, you should probably prefer the version where you remove 901 values, because you use more numbers from the pseudo-random distribution. If what you want is a solid random distribution, this is a very simple optimization that will greatly increase the quality of the "fake randomness" you see. Alternatively, if performance matters more than the distribution, you would take the other version, or even just grab the first 99 values.
interval = n / (n - maxSamples)  // integer (Euclidean) division, of course
offset = random(0..(n-1))        // a random number between 0 and n-1
totalSkip = 0
indexSample = 0
FOR it IN samples DO
    indexSample++  // goes from 1 to n
    IF totalSkip < (n - maxSamples) AND (indexSample + offset) % interval == 0 THEN
        // do nothing with this sample
        totalSkip++
    ELSE
        // work with this sample
    ENDIF
ENDFOR
ASSERT(totalSkip == n - maxSamples)  // to be sure
interval represents the distance between two skipped samples.
offset is not mandatory, but it adds a little variety.
Based on the discussion, and a greater understanding of your problem, I suggest the following. You can take advantage of a property of prime numbers that I think will net you a very good solution, one that will appear to grab pseudo-random numbers. It is illustrated in the following code.
#include <iostream>
using namespace std;

int main() {
    const int SOME_LARGE_PRIME = 577; // this prime should be larger than the size of your data set
    const int NUM_ELEMENTS = 100;
    int lastValue = 0;
    for (int i = 0; i < NUM_ELEMENTS; i++) {
        lastValue += SOME_LARGE_PRIME;
        cout << lastValue % NUM_ELEMENTS << endl;
    }
}
Using the logic presented here, you can generate all values from 0 to NUM_ELEMENTS - 1. Because of the properties of prime numbers, you will not get any duplicates until you rotate all the way around, back to the size of your data set. If you then take the first NUM_SAMPLES of these and sort them, you can iterate through your data structure and grab a pseudo-random distribution of numbers (not very good randomness, but more random than a pre-determined interval) without extra space and with only one pass over your data. Better yet, you can change the layout of the distribution by grabbing a random prime number each time; again, it must be larger than your data set, or the following example breaks:
PRIME = 3, data set size = 99. Won't work.
Of course, ultimately this is very similar to the pre-determined interval, but it inserts a level of randomness that you do not get by simply grabbing every (size/num_samples)-th element.
This is called reservoir sampling.
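For reference, a minimal sketch of the classic reservoir-sampling algorithm (Algorithm R) in C++; the function and parameter names are mine, and rand() is used for brevity despite its slight modulo bias:
#include <cstdlib>
#include <iterator>
#include <vector>

// One pass over [begin, end): keep the first k elements, then replace a
// random slot with decreasing probability, so that every element ends up
// in the reservoir with equal probability k/n.
template <typename Iterator>
std::vector<typename std::iterator_traits<Iterator>::value_type>
reservoirSample(Iterator begin, Iterator end, int k) {
    std::vector<typename std::iterator_traits<Iterator>::value_type> reservoir;
    int seen = 0;
    for (Iterator it = begin; it != end; ++it, ++seen) {
        if (seen < k) {
            reservoir.push_back(*it);       // fill the reservoir first
        } else {
            int j = rand() % (seen + 1);    // uniform index in [0, seen]
            if (j < k)
                reservoir[j] = *it;         // replace with probability k/(seen+1)
        }
    }
    return reservoir;
}
This fits the question directly: it needs only the iterator and maxSamples, makes a single pass, and never has to search for distinct random indices.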

How to generate random number with a kind of required probability?

In my game a user should be able to specify, with two integers A_x and B_x, the probability with which one or another random scene is shown. Say, when A_x = 3 and B_x = 6, scene B should in general be shown twice as frequently as scene A.
Are there any ready-to-use formulas? Could you please point me to them?
My first idea is something like saving the previously generated scene id and counting it according to the probability criteria A_x and B_x, but that looks silly.
With just two alternatives, you can work out the probability of A as A_x / (A_x + B_x) = 3 / (3 + 6) = 1/3. If you have a random number generator returning numbers uniformly distributed between 0 and 1, with a call such as rr.nextDouble(), then something like the following should work:
if (rr.nextDouble() <= probA)
{
    show A
}
else
{
    show B
}
This generates A if the random number generator generates something <= probA, which should happen with probability probA.
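As a small self-contained sketch in Java (the class and variable names are mine; only java.util.Random is assumed):
import java.util.Random;

public class SceneChooser {
    public static void main(String[] args) {
        int aX = 3, bX = 6;                      // the user-specified weights
        double probA = (double) aX / (aX + bX);  // probability of scene A = 1/3
        Random rr = new Random();

        for (int i = 0; i < 10; i++) {
            if (rr.nextDouble() <= probA) {
                System.out.println("show A");
            } else {
                System.out.println("show B");
            }
        }
    }
}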
