Binary search is not efficient with traversal costs. What is?

Binary search let me down when I tried to apply it to the real world. The scenario is as follows.
I need to test the range of a device that communicates over radio.
Communication needs to occur quickly, but slow transmission is
tolerable, up to a point (say, about 3 minutes). I need to test
whether transmissions will be successful every 200 feet until failure, up to 1600
feet. Every 200 feet a test will be run which requires 3 minutes to
execute.
I naively assumed that a binary search would be the most efficient way to find the failure point, but consider a travel speed of 200 ft/min and a test time of 3 minutes. If transmission fails at 500 feet, binary search is not the most efficient way to find the failure point, as shown below.
Simply walking along and testing every point would have found the answer sooner, taking only 12 minutes, whereas binary search and testing would take 16 minutes.
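To spell out the arithmetic behind that claim (my own breakdown, assuming you start at 0 feet, walk at 200 ft/min, and each test takes 3 minutes): walking and testing goes 200 ft (1 min) + test (3 min) = 4 min, then 400 ft = 8 min, then 600 ft = 12 min, at which point the first failure (somewhere between 400 and 600 feet) is found. Binary search over the same range goes to 800 ft (4 min) + test (3 min) = 7 min and fails, back to 400 ft (2 min) + test = 12 min and succeeds, forward to 600 ft (1 min) + test = 16 min and fails, arriving at the same answer.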
My question: How do you calculate the most efficient path to the solution when traveling time matters? What is this called (e.g., binary-travel search, etc.)?

Binary search is indeed predicated on O(1) access times; there's little point binary searching a linked list, for example [but see Note 1], and that's essentially what you're doing, since you seem to be assuming that only discrete intervals are worth testing. If you were seeking a more accurate answer, you would find that the binary search allows an arbitrary precision, at the cost of one additional test per bit of precision.
Let's suppose you don't even know what the maximum value might be. Then you couldn't first test in the middle, since you wouldn't know where the middle was. Instead, you might do an exponential search for a limit (which is kind of a binary search inside out); you start by testing at x, then 2x, then 4x, until you reach a point that is beyond the maximum (the signal doesn't reach that far). (Here x is the smallest answer you find interesting; in other words, if the first test at x shows the signal doesn't reach, you will stop there.) At the end of this phase, you'll be at 2^i * x for some integer i, and you will know the answer is between 2^(i-1) * x and 2^i * x.
Now you can actually do the binary search, starting by going backwards by 2^(i-2) * x. From there, you might go either forwards or backwards, but you will definitely travel 2^(i-3) * x, and on the next iteration you'll travel 2^(i-4) * x, and so on.
So in all, in the first phase (the search for a maximum), you walked to 2^i * x and did i tests. In the second phase, the binary refinement, you walk a total of (2^(i-1) - 1) * x and do i-1 tests. You'll end up at some point d between 2^(i-1) * x and 2^i * x, so at worst you'll have walked 3 times d (and at best, 3d/2). The number of tests you will have done is 2*ceil(log2(d/x)) - 1, which is within one test of 2*log2(d/x).
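A minimal, runnable sketch of those two phases (my own code, not from this answer; signal_ok is a hypothetical black-box test, and distances are assumed to be multiples of the precision x):

def range_search(signal_ok, x, walk_cost_per_foot, test_cost):
    # Phase 1: exponential search for an upper bound on the range
    pos, cost, d = 0, 0.0, x
    while True:
        cost += (d - pos) * walk_cost_per_foot + test_cost
        pos = d
        if not signal_ok(d):
            break
        d *= 2
    lo, hi = (0 if d == x else d // 2), d      # the failure point lies in (lo, hi]
    # Phase 2: binary refinement down to precision x
    while hi - lo > x:
        mid = (lo + hi) // 2
        cost += abs(mid - pos) * walk_cost_per_foot + test_cost
        pos = mid
        if signal_ok(mid):
            lo = mid
        else:
            hi = mid
    return lo, hi, cost

# Example: failure just past 500 ft, x = 200 ft, 1 min per 200 ft walked, 3 min per test
print(range_search(lambda d: d <= 500, 200, 1 / 200, 3))   # -> (400, 600, 17.0)

(The cost differs from the question's 16-minute figure because this version does not assume a known 1600-foot maximum and searches for the limit first.)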
Under what circumstances should you use the binary search algorithm, then? Basically, it depends on the ratio of travel time to test time, and on the desired precision of the answer. The simple sequential algorithm finds position d after d/x moves of size x and d/x tests; the binary search algorithm above finds position d after travelling at most 3d but doing only around 2*log2(d/x) tests. Roughly speaking, if a test costs you more than twice as much as travelling the precision distance x, and the expected distance is sufficiently larger than the precision, you should prefer the binary search.
In your example, you appear to want the result with a precision of 200 feet; the travel time is 1 minute per 200 feet and the test time is 3 minutes, which is more than twice the travel time. So you should prefer the binary search, unless you expect the answer to be found within a small number of multiples of the precision (as is the case here). Note that although the binary algorithm uses four tests and 1000 feet of travel (compared with three tests and 600 feet of travel for the sequential algorithm), improving the precision to 50 feet adds only four more tests and 150 feet of travel to the binary algorithm, while the sequential algorithm would then require 20 tests.
Note 1: Actually, it might make sense to binary search a linked list, using precisely the above algorithm, if the cost of the test is high. Assuming the cost of the test is not proportional to the index in the list, the complexity of the search will be O(N) for both a linear search and the binary search, but the binary search will do O(log N) tests and O(N) steps, while the sequential search will do O(N) tests and O(N) steps. For large enough N this doesn't matter asymptotically, but for real-world sized N it can matter a lot.

In reality, binary search can be applied here, but with several changes. We must compute not the center, but an optimal position to visit:
int length = maxUnchecked - minChecked;
whereToGo = minChecked + (int)(length * factorIncrease) + stepIncrease;
Because we need to find the first position where communication fails, we sometimes have to go back; after that, it can be optimal to use a different strategy:
int length = maxUnchecked - minChecked;
int whereToGo = 0;
if ( increase )
whereToGo = minChecked + (int)(length * factorIncrease) + stepIncrease;
else
whereToGo = minChecked + (int)(length * factorDecrease) + stepDecrease;
So our task is to find optimal values of factorIncrease, factorDecrease, stepIncrease, and stepDecrease such that the sum of f(failPos) over all failure positions is minimal. How? A full brute force will do if n (total length / 200.0f) is small; otherwise you can try genetic algorithms or something similarly simple.
Step precision: 1, step range: [0, n).
Factor precision (eps): 1/(4*n), factor range: [0, 1).
Now, some simple code (C#) to demonstrate this:
class Program
{
static double factorIncrease;
static int stepIncrease;
static double factorDecrease;
static int stepDecrease;
static bool debug = false;
static int f(int lastPosition, int minChecked, int maxUnchecked, int last, int failPos, bool increase = true, int depth = 0)
{
if ( depth == 100 )
throw new Exception();
if ( maxUnchecked - minChecked <= 0 ) {
if ( debug )
Console.WriteLine("left: {0} right: {1}", minChecked, maxUnchecked);
return 0;
}
int length = maxUnchecked - minChecked;
int whereToGo = 0;
if ( increase )
whereToGo = minChecked + (int)(length * factorIncrease) + stepIncrease;
else
whereToGo = minChecked + (int)(length * factorDecrease) + stepDecrease;
if ( whereToGo <= minChecked )
whereToGo = minChecked + 1;
if ( whereToGo >= maxUnchecked )
whereToGo = maxUnchecked;
int cur = Math.Abs(whereToGo - lastPosition) + 3;
if ( debug ) {
Console.WriteLine("left: {2} right: {3} whereToGo:{0} cur: {1}", whereToGo, cur, minChecked, maxUnchecked);
}
if ( failPos == whereToGo || whereToGo == maxUnchecked )
return cur + f(whereToGo, minChecked, whereToGo - 1, last, failPos, true & increase, depth + 1);
else if ( failPos < whereToGo )
return cur + f(whereToGo, minChecked, whereToGo, last, failPos, true & increase, depth + 1);
else
return cur + f(whereToGo, whereToGo, maxUnchecked, last, failPos, false, depth + 1);
}
static void Main(string[] args)
{
int n = 20;
int minSum = int.MaxValue;
var minFactorIncrease = 0.0;
var minStepIncrease = 0;
var minFactorDecrease = 0.0;
var minStepDecrease = 0;
var eps = 1 / (4.00 * (double)n);
for ( factorDecrease = 0.0; factorDecrease < 1; factorDecrease += eps )
for ( stepDecrease = 0; stepDecrease < n; stepDecrease++ )
for ( factorIncrease = 0.0; factorIncrease < 1; factorIncrease += eps )
for ( stepIncrease = 0; stepIncrease < n; stepIncrease++ ) {
int cur = 0;
for ( int i = 0; i < n; i++ ) {
try {
cur += f(0, -1, n - 1, n - 1, i);
}
catch {
Console.WriteLine("fail {0} {1} {2} {3} {4}", factorIncrease, stepIncrease, factorDecrease, stepDecrease, i);
return;
}
}
if ( cur < minSum ) {
minSum = cur;
minFactorIncrease = factorIncrease;
minStepIncrease = stepIncrease;
minFactorDecrease = factorDecrease;
minStepDecrease = stepDecrease;
}
}
Console.WriteLine("best - mathmin={4}, f++:{0} s++:{1} f--:{2} s--:{3}", minFactorIncrease, minStepIncrease, minFactorDecrease, minStepDecrease, minSum);
factorIncrease = minFactorIncrease;
factorDecrease = minFactorDecrease;
stepIncrease = minStepIncrease;
stepDecrease = minStepDecrease;
//debug =true;
for ( int i = 0; i < n; i++ )
Console.WriteLine("{0} {1}", 3 + i * 4, f(0, -1, n - 1, n - 1, i));
debug = true;
Console.WriteLine(f(0, -1, n - 1, n - 1, n - 1));
}
}
So, some values (f++ = factorIncrease, s++ = stepIncrease, f-- = factorDecrease, s-- = stepDecrease):
n = 9: mathmin = 144, f++: 0.1(1), s++: 1, f--: 0.2(2), s--: 1
n = 20: mathmin = 562, f++: 0.1125, s++: 2, f--: 0.25, s--: 1

Depending on what you actually want to optimise, there may be a way to work out an optimum search pattern. I presume you don't want to optimise the worst case time, because the slowest case for many search strategies will be when the break is at the very end, and binary search is actually pretty good here - you walk to the end without changing direction, and you don't make very many stops.
You might consider different binary trees, and perhaps work out the average time taken to work your way down to a leaf. Binary search is one sort of tree, and so is walking along and testing as you go - a very unbalanced tree in which each node has at least one leaf attached to it.
When following along such a tree you always start at one end or another of the line you are walking along, walk some distance before making a measurement, and then, depending on the result and the tree, either stop or repeat the process with a shorter line, where you are at one end or another of it.
This gives you something you can attack using dynamic programming. Suppose you have solved the problem for lengths of up to N segments, so that you know the cost for the optimum solutions of these lengths. Now you can work out the optimum solution for N+1 segments. Consider breaking the N+1 segments into two pieces in the N+1 possible ways. For each such way, work out the cost of moving to its decision point and taking a measurement and then add on the cost of the best possible solutions for the two sections of segments on either side of the decision point, possibly weighted to account for the probability of ending up in those sections. By considering those N+1 possible ways, you can work out the best way of splitting up N+1 segments, and its cost, and continue until you work out a best solution for the number of sections you actually have.
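A rough sketch of that dynamic program (my own reading of the idea above, not code from the answer): here C[k] is the minimum expected cost of locating the threshold among k consecutive candidate segments under a uniform prior, assuming you stand at one end of the span and that after a test you stand at the tested point, which is an end of both resulting sub-spans.

def optimal_expected_cost(n, walk_per_segment, test_cost):
    # C[1] = 0: with a single candidate left, the threshold is already determined
    C = [0.0] * (n + 1)
    for k in range(2, n + 1):
        best = float("inf")
        for j in range(1, k):        # walk j segments from our end and test there
            cost = (j * walk_per_segment + test_cost
                    + (j / k) * C[j]              # threshold is at or before the test point
                    + ((k - j) / k) * C[k - j])   # threshold is beyond the test point
            best = min(best, cost)
        C[k] = best
    return C[n]

# Example: 8 candidate points 200 ft apart, 1 minute per segment walked, 3 minutes per test
print(optimal_expected_cost(8, 1, 3))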

Related

Generating Random Numbers for RPG games

I'm wondering if there is an algorithm to generate random numbers that will most likely be low within a range from min to max. For instance, if you generate a random number between 1 and 100 it should most of the time be below 30 if you call the function with f(min: 1, max: 100, avg: 30), but if you call it with f(min: 1, max: 200, avg: 10) the average should be 10. A lot of games do this, but I simply can't find a way to do it with a formula. Most of the examples I have seen use a "drop table" or something like that.
I have come up with a fairly simple way to weight the outcome of a roll, but it is not very efficient and you don't have a lot of control over it
var pseudoRand = function(min, max, n) {
if (n > 0) {
return pseudoRand(min, Math.random() * (max - min) + min, n - 1)
}
return max;
}
rands = []
for (var i = 0; i < 20000; i++) {
rands.push(pseudoRand(0, 100, 1))
}
avg = rands.reduce(function(x, y) { return x + y } ) / rands.length
console.log(avg); // ~50
The function simply picks a random number between min and max N times, where it for every iteration updates the max with the last roll. So if you call it with N = 2, and max = 100 then it must roll 100 two times in a row in order to return 100
I have looked at some distributions on wikipedia, but I don't quite understand them enough to know how I can control the min and max outputs etc.
Any help is very much welcomed
A simple way to generate a random number with a given distribution is to pick a random number from a list where the numbers that should occur more often are repeated according with the desired distribution.
For example if you create a list [1,1,1,2,2,2,3,3,3,4] and pick a random index from 0 to 9 to select an element from that list you will get a number <4 with 90% probability.
Alternatively, using the distribution from the example above, generate an array [2,5,8,9] and pick a random integer from 0 to 9, if it's ≤2 (this will occur with 30% probability) then return 1, if it's >2 and ≤5 (this will also occur with 30% probability) return 2, etc.
Explained here: https://softwareengineering.stackexchange.com/a/150618
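A tiny sketch of the threshold variant (my own helper names; the thresholds follow the 30/30/30/10 example above):

import random

def weighted_pick(values, thresholds):
    # thresholds are cumulative: r <= 2 -> 1, r <= 5 -> 2, r <= 8 -> 3, r <= 9 -> 4
    r = random.randint(0, thresholds[-1])
    for v, t in zip(values, thresholds):
        if r <= t:
            return v

print(weighted_pick([1, 2, 3, 4], [2, 5, 8, 9]))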
A probability distribution function is just a function that, when you put in a value X, will return the probability of getting that value X. A cumulative distribution function is the probability of getting a number less than or equal to X. A CDF is the integral of a PDF. A CDF is almost always a one-to-one function, so it almost always has an inverse.
To generate a PDF, plot the value on the x-axis and the probability on the y-axis. The sum (discrete) or integral (continuous) of all the probabilities should add up to 1. Find some function that models that equation correctly. To do this, you may have to look up some PDFs.
Basic Algorithm
https://en.wikipedia.org/wiki/Inverse_transform_sampling
This algorithm is based off of Inverse Transform Sampling. The idea behind ITS is that you are randomly picking a value on the y-axis of the CDF and finding the x-value it corresponds to. This makes sense because the more likely a value is to be randomly selected, the more "space" it will take up on the y-axis of the CDF.
Come up with some probability distribution formula. For instance, if you want it so that as the numbers get higher the odds of them being chosen increases, you could use something like f(x)=x or f(x)=x^2. If you want something that bulges in the middle, you could use the Gaussian Distribution or 1/(1+x^2). If you want a bounded formula, you can use the Beta Distribution or the Kumaraswamy Distribution.
Integrate the PDF to get the Cumulative Distribution Function.
Find the inverse of the CDF.
Generate a random number and plug it into the inverse of the CDF.
Multiply that result by (max-min) and then add min
Round the result to the nearest integer.
Steps 1 to 3 are things you have to hard code into the game. The only way around that, for any PDF, is to solve for the shape parameters that correspond to the desired mean while satisfying the constraints you want the shape parameters to obey. If you want to use the Kumaraswamy Distribution, you will set it so that the shape parameters a and b are always greater than one.
I would suggest using the Kumaraswamy Distribution because it is bounded and it has a very nice closed form and a closed-form inverse. It only has two parameters, a and b, and it is extremely flexible, as it can model many different scenarios, including polynomial behavior, bell curve behavior, and a basin-like behavior that has a peak at both edges. Also, modeling isn't too hard with this function. The higher the shape parameter b is, the more tilted it will be to the left, and the higher the shape parameter a is, the more tilted it will be to the right. If a and b are both less than one, the distribution will look like a trough or basin. If a or b is equal to one, the distribution will be a polynomial that does not change concavity from 0 to 1. If both a and b equal one, the distribution is a straight line. If a and b are greater than one, then the function will look like a bell curve. The best thing you can do to learn this is to actually graph these functions or just run the Inverse Transform Sampling algorithm.
https://en.wikipedia.org/wiki/Kumaraswamy_distribution
For instance, if I want to have a probability distribution shaped like this with a=2 and b=5 going from 0 to 100:
https://www.wolframalpha.com/input/?i=2*5*x%5E(2-1)*(1-x%5E2)%5E(5-1)+from+x%3D0+to+x%3D1
Its CDF would be:
CDF(x)=1-(1-x^2)^5
Its inverse would be:
CDF^-1(x)=(1-(1-x)^(1/5))^(1/2)
The General Inverse of the Kumaraswamy Distribution is:
CDF^-1(x)=(1-(1-x)^(1/b))^(1/a)
I would then generate a number from 0 to 1, put it into the CDF^-1(x), and multiply the result by 100.
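A short sketch of that step in code (my own helper, using the general inverse CDF given above):

import random

def kumaraswamy_rand(min_val, max_val, a, b):
    u = random.random()                              # uniform in [0, 1)
    x = (1 - (1 - u) ** (1.0 / b)) ** (1.0 / a)      # inverse CDF of Kumaraswamy(a, b)
    return round(min_val + x * (max_val - min_val))

print(kumaraswamy_rand(0, 100, 2, 5))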
Pros
Very accurate
Continuous, not discrete
Uses one formula and very little space
Gives you a lot of control over exactly how the randomness is spread out
Many of these formulas have CDFs with inverses of some sort
There are ways to bound the functions on both ends. For instance, the Kumaraswamy Distribution is bounded from 0 to 1, so you just input a float between zero and one, then multiply the result by (max-min) and add min. The Beta Distribution is bounded differently based on what values you pass into it. For something like PDF(x)=x, the CDF(x)=(x^2)/2, so you can generate a random value from CDF(0) to CDF(max-min).
Cons
You need to come up with the exact distributions and their shapes you plan on using
Every single general formula you plan on using needs to be hard coded into the game. In other words, you can program the general Kumaraswamy Distribution into the game and have a function that generates random numbers based on the distribution and its parameters, a and b, but not a function that generates a distribution for you based on the average. If you wanted to use Distribution x, you would have to find out what values of a and b best fit the data you want to see and hard code those values into the game.
I would use a simple mathematical function for that. From what you describe, you want a power curve like y = x^2. At the average input (x = 0.5, since rand gives you a number from 0 to 1) you would get 0.25. If you want a lower average number, you can use a higher exponent like y = x^3, which gives y = 0.125 at x = 0.5.
Example:
http://www.meta-calculator.com/online/?panel-102-graph&data-bounds-xMin=-2&data-bounds-xMax=2&data-bounds-yMin=-2&data-bounds-yMax=2&data-equations-0=%22y%3Dx%5E2%22&data-rand=undefined&data-hideGrid=false
PS: I adjusted the function to calculate the needed exponent to get the average result.
Code example:
function expRand (min, max, exponent) {
return Math.round( Math.pow( Math.random(), exponent) * (max - min) + min);
}
function averageRand (min, max, average) {
var exponent = Math.log(((average - min) / (max - min))) / Math.log(0.5);
return expRand(min, max, exponent);
}
alert(averageRand(1, 100, 10));
You may combine 2 random processes. For example:
first rand R1 = f(min: 1, max: 20, avg: 10);
second rand R2 = f(min:1, max : 10, avg : 1);
and then multiply R1*R2 to get a result in [1, 200] with an average around 10 (the average will be shifted a bit)
Another option is to find the inverse of the random function you want to use. This option has to be initialized when your program starts but doesn't need to be recomputed. The math used here can be found in a lot of Math libraries. I will explain point by point by taking the example of an unknown random function where only four points are known:
First, fit the four point curve with a polynomial function of order 3 or higher.
You should then have a parametrized function of type : ax+bx^2+cx^3+d.
Find the indefinite integral of the function (it has the form (a/2)x^2 + (b/3)x^3 + (c/4)x^4 + dx; call it quarticEq).
Compute the integral of the polynomial from your min to your max.
Take a uniform random number between 0 and 1, then multiply it by the value of the integral computed in the previous step (we name the result "R").
Now solve the equation R = quarticEq(x) for x.
Hopefully the last part is well known, and you should be able to find a library that can do this computation (see wiki). If the inverse of the integrated function does not have a closed-form solution (as for any general polynomial of degree five or higher), you can use a root-finding method such as Newton's Method (the sketch below uses a bracketing root finder for this step).
This kind of computation may be used to create any kind of random distribution.
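As a rough illustration of the steps above (my own sketch; the cubic here is an arbitrary stand-in for the fitted polynomial, and SciPy's brentq root finder stands in for a closed-form inverse):

import random
from scipy.optimize import brentq

def pdf(x):                       # stand-in for the fitted polynomial a*x + b*x^2 + c*x^3 + d
    return 0.5 + 0.3 * x + 0.2 * x ** 3

def antiderivative(x):            # its indefinite integral ("quarticEq")
    return 0.5 * x + 0.15 * x ** 2 + 0.05 * x ** 4

def sample(lo, hi):
    total = antiderivative(hi) - antiderivative(lo)
    r = random.random() * total   # uniform point under the total area
    # solve antiderivative(x) - antiderivative(lo) = r for x in [lo, hi]
    return brentq(lambda x: antiderivative(x) - antiderivative(lo) - r, lo, hi)

print(sample(1, 100))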
Edit :
You may find the Inverse Transform Sampling described above in wikipedia and I found this implementation (I haven't tried it.)
You can keep a running average of what the function has returned so far and, based on that, use a while loop to pick the next random number so that it keeps the average on target; then adjust the running average and return the number.
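One possible reading of that idea as code (entirely my interpretation; the acceptance rule and the occasional unconditional accept are arbitrary choices to keep the output from collapsing onto the target value):

import random

def avg_tracking_rand(min_val, max_val, target, state):
    # state carries the running sum and count across calls
    while True:
        candidate = random.randint(min_val, max_val)
        cur_avg = state["sum"] / state["count"] if state["count"] else target
        new_avg = (state["sum"] + candidate) / (state["count"] + 1)
        # accept if the candidate keeps the running average at least as close to the target
        if abs(new_avg - target) <= abs(cur_avg - target) or random.random() < 0.1:
            state["sum"] += candidate
            state["count"] += 1
            return candidate

state = {"sum": 0, "count": 0}
print([avg_tracking_rand(1, 100, 30, state) for _ in range(10)])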
Using a drop table permits a very fast roll, which matters in a real-time game. In fact it is only one random generation of a number from a range, then, according to a table of probabilities (say a Gaussian-like distribution over that range), an if statement with multiple choices. Something like this:
num = random.randint(1, 100)
if num < 10:
    case 1
elif num < 20:
    case 2
...
It is not very clean but when you have a finite number of choices it can be very fast.
There are lots of ways to do so, all of which basically boil down to generating from a right-skewed (a.k.a. positive-skewed) distribution. You didn't make it clear whether you want integer or floating point outcomes, but there are both discrete and continuous distributions that fit the bill.
One of the simplest choices would be a discrete or continuous right-triangular distribution, but while that will give you the tapering off you desire for larger values, it won't give you independent control of the mean.
Another choice would be a truncated exponential (for continuous) or geometric (for discrete) distribution. You'd need to truncate because the raw exponential or geometric distribution has a range from zero to infinity, so you'd have to lop off the upper tail. That would in turn require you to do some calculus to find a rate λ which yields the desired mean after truncation.
A third choice would be to use a mixture of distributions, for instance choose a number uniformly in a lower range with some probability p, and in an upper range with probability (1-p). The overall mean is then p times the mean of the lower range + (1-p) times the mean of the upper range, and you can dial in the desired overall mean by adjusting the ranges and the value of p. This approach will also work if you use non-uniform distribution choices for the sub-ranges. It all boils down to how much work you're willing to put into deriving the appropriate parameter choices.
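A minimal sketch of the mixture approach (the ranges and p below are just example values, not derived from the question):

import random

def mixture_rand(min_val, split, max_val, p):
    if random.random() < p:
        return random.randint(min_val, split)       # lower range, probability p
    return random.randint(split + 1, max_val)       # upper range, probability 1 - p

# p = 0.8 puts 80% of the mass in [1, 30]; overall mean = 0.8*15.5 + 0.2*65.5 = 25.5
print(mixture_rand(1, 30, 100, 0.8))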
One method would not be the most precise method, but could be considered "good enough" depending on your needs.
The algorithm would be to pick a number between a min and a sliding max. There would be a guaranteed max g_max and a potential max p_max. Your true max would slide depending on the results of another random call. This will give you a skewed distribution you are looking for. Below is the solution in Python.
import random

def get_roll(min, g_max, p_max):
    max = g_max + (random.random() * (p_max - g_max))
    return random.randint(min, int(max))

get_roll(1, 10, 20)
Below is a histogram of the function run 100,000 times with (1, 10, 20).
private int roll(int minRoll, int avgRoll, int maxRoll) {
// Generating random number #1
int firstRoll = ThreadLocalRandom.current().nextInt(minRoll, maxRoll + 1);
// Iterating 3 times will result in the roll being relatively close to
// the average roll.
if (firstRoll > avgRoll) {
// If the first roll is higher than the (set) average roll:
for (int i = 0; i < 3; i++) {
int verificationRoll = ThreadLocalRandom.current().nextInt(minRoll, maxRoll + 1);
if (firstRoll > verificationRoll && verificationRoll >= avgRoll) {
// If the following condition is met:
// The iteration-roll is closer to 30 than the first roll
firstRoll = verificationRoll;
}
}
} else if (firstRoll < avgRoll) {
// If the first roll is lower than the (set) average roll:
for (int i = 0; i < 3; i++) {
int verificationRoll = ThreadLocalRandom.current().nextInt(minRoll, maxRoll + 1);
if (firstRoll < verificationRoll && verificationRoll <= avgRoll) {
// If the following condition is met:
// The iteration-roll is closer to 30 than the first roll
firstRoll = verificationRoll;
}
}
}
return firstRoll;
}
Explanation:
roll
check if the roll is above, below, or exactly the average (30 here)
if above: reroll 3 times and replace the roll with any reroll that is lower but still >= 30
if below: reroll 3 times and replace the roll with any reroll that is higher but still <= 30
if exactly 30: keep the roll as it is
return the roll
Pros:
simple
effective
performs well
Cons:
You'll naturally have more results in the range 30-40 than in the range 20-30, simply due to the 30/70 split around the average.
Testing:
You can test this by using the following method in conjunction with the roll() method. The data is saved in a HashMap (mapping each number to its number of occurrences).
public void rollTheD100() {
int maxNr = 100;
int minNr = 1;
int avgNr = 30;
Map<Integer, Integer> numberOccurenceMap = new HashMap<>();
// "Initialization" of the map (please don't hit me for calling it initialization)
for (int i = 1; i <= 100; i++) {
numberOccurenceMap.put(i, 0);
}
// Rolling (100k times)
for (int i = 0; i < 100000; i++) {
int dummy = roll(minNr, avgNr, maxNr);
numberOccurenceMap.put(dummy, numberOccurenceMap.get(dummy) + 1);
}
int numberPack = 0;
for (int i = 1; i <= 100; i++) {
numberPack = numberPack + numberOccurenceMap.get(i);
if (i % 10 == 0) {
System.out.println("<" + i + ": " + numberPack);
numberPack = 0;
}
}
}
The results (100,000 rolls):
These were as expected. Note that you can always fine-tune the results simply by modifying the iteration count in the roll() method (the closer the average should be to 30, the more iterations you should include; note that this could hurt performance to a certain degree). Also note that 30 was, as expected, by far the number with the highest number of occurrences.
<10: 4994
<20: 9425
<30: 18184
<40: 29640
<50: 18283
<60: 10426
<70: 5396
<80: 2532
<90: 897
<100: 223
Try this,
generate a random number for the range of numbers below the average and generate a second random number for the range of numbers above the average.
Then randomly select one of those; each range will be selected 50% of the time.
var pseudoRand = function(min, max, avg) {
var upperRand = Math.floor(Math.random() * (max - avg) + avg);
var lowerRand = Math.floor(Math.random() * (avg - min) + min);
if (Math.random() < 0.5)
return lowerRand;
else
return upperRand;
}
Having seen many good explanations and some good ideas, I still think this could help you:
You can take any distribution function f around 0 and map your interval of interest onto your desired interval [1,100]: f -> f'.
Then feed the C++ discrete_distribution with the results of f'.
I've got an example with the normal distribution below, but I can't get my result into this function :-S
#include <iostream>
#include <random>
#include <chrono>
#include <cmath>
using namespace std;
double p1(double x, double mean, double sigma); // p(x|x_avg,sigma)
double p2(int x, int x_min, int x_max, double x_avg, double z_min, double z_max); // transform ("stretch") it to the interval
int plot_ps(int x_avg, int x_min, int x_max, double sigma);
int main()
{
int x_min = 1;
int x_max = 20;
int x_avg = 6;
double sigma = 5;
/*
int p[]={2,1,3,1,2,5,1,1,1,1};
default_random_engine generator (chrono::system_clock::now().time_since_epoch().count());
discrete_distribution<int> distribution {p*};
for (int i=0; i< 10; i++)
cout << i << "\t" << distribution(generator) << endl;
*/
plot_ps(x_avg, x_min, x_max, sigma);
return 0; //*/
}
// Normal distribution function
double p1(double x, double mean, double sigma)
{
return 1/(sigma*sqrt(2*M_PI))
* exp(-(x-mean)*(x-mean) / (2*sigma*sigma));
}
// Transforms intervals to your wishes ;)
// z_min and z_max are the desired values f'(x_min) and f'(x_max)
double p2(int x, int x_min, int x_max, double x_avg, double z_min, double z_max)
{
double y;
double sigma = 1.0;
double y_min = -sigma*sqrt(-2*log(z_min));
double y_max = sigma*sqrt(-2*log(z_max));
if(x < x_avg)
y = -(x-x_avg)/(x_avg-x_min)*y_min;
else
y = -(x-x_avg)/(x_avg-x_max)*y_max;
return p1(y, 0.0, sigma);
}
//plots both distribution functions
int plot_ps(int x_avg, int x_min, int x_max, double sigma)
{
double z = (1.0+x_max-x_min);
// plot p1
for (int i=1; i<=20; i++)
{
cout << i << "\t" <<
string(int(p1(i, x_avg, sigma)*(sigma*sqrt(2*M_PI)*20.0)+0.5), '*')
<< endl;
}
cout << endl;
// plot p2
for (int i=1; i<=20; i++)
{
cout << i << "\t" <<
string(int(p2(i, x_min, x_max, x_avg, 1.0/z, 1.0/z)*(20.0*sqrt(2*M_PI))+0.5), '*')
<< endl;
}
}
With the following result if I let them plot:
1 ************
2 ***************
3 *****************
4 ******************
5 ********************
6 ********************
7 ********************
8 ******************
9 *****************
10 ***************
11 ************
12 **********
13 ********
14 ******
15 ****
16 ***
17 **
18 *
19 *
20
1 *
2 ***
3 *******
4 ************
5 ******************
6 ********************
7 ********************
8 *******************
9 *****************
10 ****************
11 **************
12 ************
13 *********
14 ********
15 ******
16 ****
17 ***
18 **
19 **
20 *
So, if you could feed this result into the discrete_distribution<int> distribution {}, you would have everything you want...
Well, from what I can see of your problem, I would want the solution to meet these criteria:
a) Belong to a single distribution: If we need to "roll" (call math.Random) more than once per function call and then aggregate or discard some results, it stops being truly distributed according to the given function.
b) Not be computationally intensive: Some of the solutions use Integrals, (Gamma distribution, Gaussian Distribution), and those are computationally intensive. In your description, you mention that you want to be able to "calculate it with a formula", which fits this description (basically, you want an O(1) function).
c) Be relatively "well distributed", e.g. not have peaks and valleys, but instead have most results cluster around the mean, with nice, predictable slopes downwards towards the ends, and yet a non-zero probability for both the min and the max.
d) Not to require to store a large array in memory, as in drop tables.
I think this function meets the requirements:
var pseudoRand = function(min, max, avg )
{
var randomFraction = Math.random();
var head = (avg - min);
var tail = (max - avg);
var skewdness = tail / (head + tail);
if (randomFraction < skewdness)
return min + (randomFraction / skewdness) * head;
else
return avg + (1 - randomFraction) / (1 - skewdness) * tail;
}
This will return floats, but you can easily turn them to ints by calling
(int) Math.round(pseudoRand(...))
It returned the correct average in all of my tests, and it is also nicely distributed towards the ends. Hope this helps. Good luck.

Divide N cakes among M people with minimum waste

So here is the question:
In a party there are n cakes of different flavors, with volumes V1, V2, V3 ... Vn. They need to be divided among the K people present at the party such that:
All members of the party get an equal volume of cake (say V, which is the value we are looking for)
Each member gets cake of a single flavor only (you cannot give parts of different-flavored cakes to one member)
Some volume of cake will be wasted after distribution; we want to minimize the waste, or, equivalently, we are after a maximum distribution policy
A known condition: if V is an optimal solution, then at least one cake, X, can be divided by V with no volume left over, i.e., Vx mod V == 0
I am trying to look for a solution with best time complexity (brute force will do it, but I need a quicker way).
Any suggestion would be appreciated.
Thanks
PS: It is not an assignment; it is an interview question. Here is the pseudocode for brute force:
int return_Max_volume(List volumeList)
{
maxVolume = 0;
minimaxLeft = Integer.MAX_VALUE;
for (Volume v : volumeList)
for i = 1 to K people
targetVolume = v / i;
numberOfPeopleWhoCanGetCake = v1/targetVolume +
v2/targetVolume + ... + vn/targetVolume
if (numberOfPeopleWhoCanGetCake >= k)
remainVolume = (v1 mod targetVolume) + (v2 mod targetVolume)
+ (v3 mod targetVolume) + ... + (vn mod targetVolume)
if (remainVolume < minimaxLeft)
update maxVolume to be targetVolume;
update minimaxLeft to be remainVolume
return maxVolume
}
This is a somewhat classic programming-contest problem.
The answer is simple: do a basic binary search on volume V (the final answer).
(Note the title says M people, yet the problem description says K. I'll be using M)
Given a volume V during the search, you iterate through all of the cakes, calculating how many people each cake can "feed" with single-flavor slices (fed += floor(Vi/V)). If you reach M (or 'K') people "fed" before you're out of cakes, this means you can obviously also feed M people with any volume < V with whole single-flavor slices, by simply consuming the same amount of (smaller) slices from each cake. If you run out of cakes before reaching M slices, it means you cannot feed the people with any volume > V either, as that would consume even more cake than what you've already failed with. This satisfies the condition for a binary search, which will lead you to the highest volume V of single-flavor slices that can be given to M people.
The complexity is O(n * log((sum(Vi)/m)/eps) ). Breakdown: the binary search takes log((sum(Vi)/m)/eps) iterations, considering the upper bound of sum(Vi)/m cake for each person (when all the cakes get consumed perfectly). At each iteration, you have to pass through at most all N cakes. eps is the precision of your search and should be set low enough, no higher than the minimum non-zero difference between the volume of two cakes, divided by M*2, so as to guarantee a correct answer. Usually you can just set it to an absolute precision such as 1e-6 or 1e-9.
To speed things up for the average case, you should sort the cakes in decreasing order, such that when you are trying a large volume, you instantly discard all the trailing cakes with total volume < V (e.g. you have one cake of volume 10^6 followed by a bunch of cakes of volume 1.0. If you're testing a slice volume of 2.0, as soon as you reach the first cake of volume 1.0 you can already return that this run failed to provide M slices)
Edit:
The search is actually done with floating point numbers, e.g.:
double mid, lo = 0, hi = sum(Vi)/people;
while(hi - lo > eps){
mid = (lo+hi)/2;
if(works(mid)) lo = mid;
else hi = mid;
}
final_V = lo;
By the end, if you really need more precision than your chosen eps, you can simply take an extra O(n) step:
// (this step is exclusively to retrieve an exact answer from the final
// answer above, if a precision of 'eps' is not acceptable)
foreach (cake_volume vi){
int slices = round(vi/final_V);
double difference = abs(vi-(final_V*slices));
if(difference < best){
best = difference;
volume = vi;
denominator = slices;
}
}
// exact answer is volume/denominator
Here's the approach I would consider:
Let's assume that all of our cakes are sorted in the order of non-decreasing size, meaning that Vn is the largest cake and V1 is the smallest cake.
Generate the first intermediate solution by dividing only the largest cake between all k people. I.e. V = Vn / k.
Immediately discard all cakes that are smaller than V - any intermediate solution that involves these cakes is guaranteed to be worse than our intermediate solution from step 1. Now we are left with cakes Vb, ..., Vn, where b is greater or equal to 1.
If all cakes got discarded except the biggest one, then we are done. V is the solution. END.
Since we have more than one cake left, let's improve our intermediate solution by redistributing some of the slices to the second biggest cake Vn-1, i.e. find the biggest value of V so that floor(Vn / V) + floor(Vn-1 / V) = k. This can be done by performing a binary search between the current value of V and the upper limit (Vn + Vn-1) / k, or by something more clever.
Again, just like we did on step 2, immediately discard all cakes that are smaller than V - any intermediate solution that involves these cakes is guaranteed to be worse than our intermediate solution from step 4.
If all cakes got discarded except the two biggest ones, then we are done. V is the solution. END.
Continue to involve the new "big" cakes in right-to-left direction, improve the intermediate solution, and continue to discard "small" cakes in left-to-right direction until all remaining cakes get used up.
P.S. The complexity of step 4 seems to be equivalent to the complexity of the entire problem, meaning that the above can be seen as an optimization approach, but not a real solution. Oh well, for what it is worth... :)
Here's one approach to a more efficient solution. Your brute force solution in essence generates an implicit list of possible volumes, filters them by feasibility, and returns the largest. We can modify it slightly to materialize the list and sort it so that the first feasible solution found is the largest.
First task for you: find a way to produce the sorted list on demand. In other words, we should do O(n + m log n) work to generate the first m items.
Now, let's assume that the volumes appearing in the list are pairwise distinct. (We can remove this assumption later.) There's an interesting fact about how many people are served by the volume at position k. For example, with volumes 11, 13, 17 and 7 people, the list is 17, 13, 11, 17/2, 13/2, 17/3, 11/2, 13/3, 17/4, 11/3, 17/5, 13/4, 17/6, 11/4, 13/5, 17/7, 11/5, 13/6, 13/7, 11/6, 11/7.
Second task for you: simulate the brute force algorithm on this list. Exploit what you notice.
So here is the algorithm I thought would work:
Sort the volumes from largest to smallest.
Divide the largest cake among 1...k people, i.e., target = volume[0]/i, where i = 1,2,3,4,...,k.
If the target still yields a total number of pieces greater than or equal to k, decrease i and try again.
Find the smallest i that results in a total number of pieces greater than or equal to k while i-1 would yield fewer than k pieces. Record this volume as baseVolume.
For each remaining cake, compute its leftover volume divided by the number of people it serves, i.e., division = (V_cake - baseVolume*Math.floor(V_cake/baseVolume)) / Math.floor(V_cake/baseVolume), and take the smallest such value.
Add this amount to baseVolume (baseVolume += division) and recalculate how many pieces all volumes could provide. If the new volume results in fewer than k pieces, return the previous value; otherwise repeat the previous two steps.
Here is the Java code:
public static int getKonLagestCake(Integer[] sortedVolumesList, int k) {
int result = 0;
for (int i = k; i >= 1; i--) {
double volumeDividedByLargestCake = (double) sortedVolumesList[0]
/ i;
int totalNumber = totalNumberofCakeWithGivenVolumn(
sortedVolumesList, volumeDividedByLargestCake);
if (totalNumber < k) {
result = i + 1;
break;
}
}
return result;
}
public static int totalNumberofCakeWithGivenVolumn(
Integer[] sortedVolumnsList, double givenVolumn) {
int totalNumber = 0;
for (int volume : sortedVolumnsList) {
totalNumber += (int) (volume / givenVolumn);
}
return totalNumber;
}
public static double getMaxVolume(int[] volumesList, int k) {
List<Integer> list = new ArrayList<Integer>();
for (int i : volumesList) {
list.add(i);
}
Collections.sort(list, Collections.reverseOrder());
Integer[] sortedVolumesList = new Integer[list.size()];
list.toArray(sortedVolumesList);
int previousValidK = getKonLagestCake(sortedVolumesList, k);
double baseVolume = (double) sortedVolumesList[0] / (double) previousValidK;
int totalNumberofCakeAvailable = totalNumberofCakeWithGivenVolumn(sortedVolumesList, baseVolume);
if (totalNumberofCakeAvailable == k) {
return baseVolume;
} else {
do
{
double minimumAmountAdded = minimumAmountAdded(sortedVolumesList, baseVolume);
if(minimumAmountAdded == 0)
{
return baseVolume;
}else
{
baseVolume += minimumAmountAdded;
int newTotalNumber = totalNumberofCakeWithGivenVolumn(sortedVolumesList, baseVolume);
if(newTotalNumber == k)
{
return baseVolume;
}else if (newTotalNumber < k)
{
return (baseVolume - minimumAmountAdded);
}else
{
continue;
}
}
}while(true);
}
}
public static double minimumAmountAdded(Integer[] sortedVolumesList, double volume)
{
double mimumAdded = Double.MAX_VALUE;
for(Integer i:sortedVolumesList)
{
int assignedPeople = (int)(i/volume);
if (assignedPeople == 0)
{
continue;
}
double leftPiece = (double)i - assignedPeople*volume;
if(leftPiece == 0)
{
continue;
}
double division = leftPiece / (double)assignedPeople;
if (division < mimumAdded)
{
mimumAdded = division;
}
}
if (mimumAdded == Double.MAX_VALUE)
{
return 0;
}else
{
return mimumAdded;
}
}
Any Comments would be appreciated.
Thanks

Finding the position of a fraction in the Farey sequence

For finding the position of a fraction in the Farey sequence, I tried to implement the algorithm given here http://www.math.harvard.edu/~corina/publications/farey.pdf under "initial algorithm", but I can't understand where I'm going wrong; I am not getting the correct answers. Could someone please point out my mistake?
E.g. for order n = 7 and the fractions 1/7 and 1/6 I get the same answer.
Here's what I've tried for a given order (n) and a fraction a/b:
sum=0;
int A[100000];
A[1]=a;
for(i=2;i<=n;i++)
A[i]=i*a-a;
for(i=2;i<=n;i++)
{
for(j=i+i;j<=n;j+=i)
A[j]-=A[i];
}
for(i=1;i<=n;i++)
sum+=A[i];
ans = sum/b;
Thanks.
Your algorithm doesn't use any particular properties of a and b. In the first part, every relevant entry of the array A is a multiple of a, but the factor is independent of a, b and n. Setting up the array ignoring the factor a, i.e. starting with A[1] = 1, A[i] = i-1 for 2 <= i <= n, after the nested loops, the array contains the totients, i.e. A[i] = phi(i), no matter what a, b, n are. The sum of the totients from 1 to n is the number of elements of the Farey sequence of order n (plus or minus 1, depending on which of 0/1 and 1/1 are included in the definition you use). So your answer is always the approximation (a*number of terms)/b, which is close but not exact.
I've not yet looked at how yours relates to the algorithm in the paper, check back for updates later.
Addendum: Finally had time to look at the paper. Your initialisation is not what they give. In their algorithm, A[q] is initialised to floor(x*q); for a rational x = a/b, the correct initialisation is
for(i = 1; i <= n; ++i){
A[i] = (a*i)/b;
}
In the remainder of your code, only ans = sum/b; has to be changed to ans = sum;.
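Putting the fix together, a runnable version of the whole corrected routine might look like this (my own transcription of the algorithm described above, not code from the paper):

def farey_rank(a, b, n):
    # position of a/b in the Farey sequence of order n, counting fractions in (0, a/b]
    A = [0] * (n + 1)
    for q in range(1, n + 1):
        A[q] = (a * q) // b            # count of p/q <= a/b, ignoring reducibility
    for q in range(2, n + 1):          # sieve out non-reduced fractions
        for m in range(2 * q, n + 1, q):
            A[m] -= A[q]
    return sum(A)

print(farey_rank(1, 7, 7), farey_rank(1, 6, 7))   # 1 2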
A non-algorithmic way of finding the position t of a fraction in the Farey sequence of order n>1 is shown in Remark 7.10(ii)(a) of the paper, under m:=n-1, where mu-bar stands for the number-theoretic Mobius function on positive integers taking values from the set {-1,0,1}.
Here's my Java solution that works. Add head (0/1) and tail (1/1) nodes to a singly linked list.
Then start by passing headNode and tailNode, and set the required orderLevel.
public void generateSequence(Node leftNode, Node rightNode){
Fraction left = (Fraction) leftNode.getData();
Fraction right= (Fraction) rightNode.getData();
FractionNode midNode = null;
int midNum = left.getNum()+ right.getNum();
int midDenom = left.getDenom()+ right.getDenom();
if((midDenom <=getMaxLevel())){
Fraction middle = new Fraction(midNum,midDenom);
midNode = new FractionNode(middle);
}
if(midNode!= null){
leftNode.setNext(midNode);
midNode.setNext(rightNode);
generateSequence(leftNode, midNode);
count++;
}else if(rightNode.next()!=null){
generateSequence(rightNode, rightNode.next());
}
}

mob picking - random selection of multiple items, each with a cost, given a range to spend

I am considering a random mode for a real-time strategy game.
In this mode, the computer opponent needs to generate a random group of attackers (the mob) which will come at the player. Each possible attacker has an associated creation cost, and each turn there is a certain maximum amount to spend. To avoid making it uninteresting, the opponent should always spend at least half of that amount.
The amount to spend is highly dynamic, while creation costs are dynamic but change slower.
I am seeking a routine of the form:
void randomchoice( int N, int * selections, int * costs, int minimum, int maximum )
Such that given:
N = 5 (for example, I expect it to be around 20 or so)
selections is an empty array of 5 positions
costs is the array {11, 13, 17, 19, 23}
minimum and maximum are 83 and 166
Would return:
83 <= selection[0]*11 + selection[1]*13 + selection[2]*17 + selection[3]*19 + selection[4]*23 <= 166
Most importantly, I want a uniformly random selection - all approaches I've tried end up mostly picking a few of the largest attackers, and "zergs" of the small ones are too rare.
While I would prefer solutions in the C/C++ family, any algorithmic hints would be welcome.
Firstly, I suggest you create a random number r between your min and max, and we'll try to approach that number in cost, to simplify this a bit; so min <= r <= max.
Next create a scheme that is uniform to your liking in dispatching your units. If I understand correctly, it would be something like this:
If a unit A has a cost c, then m_a = r / c is the rough number of such units you could buy at most. Now we have units of other types - B, C, etc. - with their own costs and their own numbers m_b, m_c, and so on. Let S = m_a + m_b + .... Generate a random number U between 0 and S. Find the smallest i such that the partial sum m_a + ... + m_i is larger than U. Then create a unit of type i and subtract the unit's cost from r. Repeat while r > 0.
It seems intuitively clear that there should be a more efficient method without recomputations, but for a given meaning of the word uniform, this is passable.
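A small sketch of that scheme (my own code and names; as noted, it approaches the target r rather than hitting the [minimum, maximum] window exactly):

import random

def random_mob(costs, minimum, maximum):
    r = random.randint(minimum, maximum)       # target amount to spend
    selection = [0] * len(costs)
    while True:
        weights = [r // c for c in costs]      # m_i: how many of type i would still fit
        total = sum(weights)
        if total == 0:                         # nothing affordable is left
            break
        u = random.randrange(total)
        for i, w in enumerate(weights):
            if u < w:
                selection[i] += 1
                r -= costs[i]
                break
            u -= w
    return selection

print(random_mob([11, 13, 17, 19, 23], 83, 166))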
Truly uniform? If the number of unit types (N = 20?) and the cost-to-max-spend ratio are relatively small, the search space of valid possibilities is fairly small and you can probably just brute force this one. Java-esque, sorry (more natural for me; it should be easy to port).
List<Integer> choose(int[] costs, int min, int max) {
List<List<Integer>> choices = enumerate(costs, min, max);
return choices.get(new Random().nextInt(choices.size()));
}
// Recursively computes the valid possibilities.
List<List<Integer>> enumerate(int[] costs, int min, int max) {
List<List<Integer>> possibilities = new ArrayList<List<Integer>>();
// Base case
if (costs.length == 1) {
for (int i = min / costs[0]; i < max / costs[0]; i++) {
List<Integer> p = new ArrayList<Integer>();
p.add(i);
possibilities.add(p);
}
return possibilities;
}
// Recursive case - iterate through all possible options for this unit, recursively find
// all remaining solutions.
for (int i = 0; i < max / costs[0]; i++) {
// Pythonism because I'm lazy - your recursive call should be a subarray of the
// cost array from 1-end, since we handled the costs[0] case here.
List<List<Integer>> partial = enumerate(costs[1:], min - (costs[0] * i), max - (costs[0] * i));
for (List<Integer> li : partial) {
possibilities.add(li.add(0, i));
}
}
return possibilities;
}

Pixies in the custard swamp puzzle

(With thanks to Rich Bradshaw)
I'm looking for optimal strategies for the following puzzle.
As the new fairy king, it is your duty to map the kingdom's custard swamp.
The swamp is covered in an ethereal mist, with islands of custard scattered throughout.
You can send your pixies across the swamp, with instructions to fly low or high at each point.
If a pixie swoops down over a custard, it will be distracted and won't complete its sequence.
Since the mist is so thick, all you know is whether a pixie got to the other side or not.
In coding terms..
bool flutter( bool[size] swoop_map );
This returns whether a pixie exited for a given sequence of swoops.
The simplest way is to pass in sequences with only one swoop. That reveals all custard islands in 'size' tries.
I'd rather have something proportional to the number of custards - but I have problems with sequences like:
C......C (that is, custards at beginning and end)
Links to other forms of this puzzle would be welcome as well.
This makes me think of divide and conquer. Maybe something like this (this is slightly broken pseudocode. It may have fence-post errors and the like):
retval[size] check()
{
bool[size] retval = ALLFALSE;
bool[size] flut1 = ALLFALSE;
bool[size] flut2 = ALLFALSE;
for (int i = 0; i < size/2; ++i) flut1[i] = TRUE;
for (int i = size/2; i < size; ++i) flut2[i] = TRUE;
if (flutter(flut1)) retval[0..size/2] = <recurse>check
if (flutter(flut2)) retval[size/2..size] = <recurse>check
}
In plain English, it calls flutter on each half of the custard map. If any half returns false, that whole half has no custard. Otherwise, half of the half has the algorithm applied recursively. I'm not sure if it is possible to do better. However, this algorithm is kind of lame if the swamp is mostly custard.
Idea Two:
int itsize = 1
bool[size] retval = ALLFALSE;
for (int pos = 0; pos < size;)
{
bool[size] nextval = ALLFALSE;
for (int pos2 = pos; pos2 < pos + itsize && pos2 < size; ++pos2) nextval[pos2] = true;
bool flut = flutter(nextval)
if (!flut || itsize == 1)
{
for (int pos2 = pos; pos2 < pos + itsize && pos2 < size; ++pos2) retval[pos2] = flut;
pos+=itsize;
}
if (flut) itsize = 1;
if (!flut) itsize*=2;
}
In plain English, it calls flutter on each element of the custard map, one at a time. If it does not find custard, the next call will be on twice as many elements as the previous call. This is kind of like binary search, except only in one direction since it does not know how many items it is searching for. I have no idea how efficient this is.
Brian's first divide and conquer algorithm is optimal in the following sense: there exists a constant C such that, over all swamps with n squares and at most k custards, no algorithm has a worst case that is more than C times better than Brian's. Brian's algorithm uses O(k log(n/k)) flights, which is within a constant factor of the information-theoretic lower bound of log2(n choose k) >= log2((n/k)^k) = k log2(n/k), i.e., Omega(k log(n/k)) flights. (You need an assumption like k <= n/2 to make the last step rigorous, but at that point we've already reached the maximum of O(n) flights.)
Why does Brian's algorithm use only O(k log(n/k)) flights? At recursion depth i, it makes at most min(2^i, k) flights. The sum for 0 <= i <= log2(k) is O(k). The sum for log2(k) < i <= log2(n) is k (log2(n) - log2(k)) = k (log2(n/k)).
