I'm trying to do a simple 'crowd' model and need to distribute random points within a 2D area. This semi-pseudo code is my best attempt, but I can see big issues even before I run it: for dense crowds, the chance of a new point being too close to an existing one gets very high very quickly, making it inefficient and prone to fail unless the values are fine-tuned. There are probably issues with signed values too, but I'm leaving those out for simplicity.
int numPoints = 100;
int x[numPoints];
int y[numPoints];
int testX, testY;
int tooCloseRadius = 20;
int maxPointChecks = 100;
for (int newPoint = 0; newPoint < numPoints; newPoint++) {
    // Keep checking random points until one is found with no other points
    // in close proximity, or maxPointChecks is reached.
    int pointCheckCount = 0;
    bool tooClose = true;
    while (pointCheckCount < maxPointChecks) {
        tooClose = false;
        // Make a new random point and check it against all previous points.
        testX = random(1000);
        testY = random(1000);
        for (int testPoint = 0; testPoint < newPoint; testPoint++) {
            if (isTooClose(x[testPoint], y[testPoint], testX, testY, tooCloseRadius)) {
                tooClose = true;
                break; // (exit for loop)
            }
        }
        if (tooClose == false) {
            // Yay, found a point with some space!
            x[newPoint] = testX;
            y[newPoint] = testY;
            break; // (exit while loop)
        }
        // Too close to one of the points, start over.
        pointCheckCount++;
    }
    if (tooClose) {
        // maxPointChecks reached without finding a point that has some space.
        // FAILURE DEPARTMENT
    } else {
        // SUCCESS
    }
}
// Simple trig to check whether a point lies within a circle.
bool isTooClose(int centerX, int centerY, int testX, int testY, int testRadius) {
    return (testX - centerX) * (testX - centerX)
         + (testY - centerY) * (testY - centerY) < testRadius * testRadius;
}
After googling the subject, I believe what I've done is called Rejection Sampling (?), and that Adaptive Rejection Sampling could be a better approach, but the math is far too complex.
Are there any elegant methods for achieving this that don't require a degree in statistics?
For the problem you are proposing, the best way to generate random samples is to use Poisson Disk Sampling.
https://www.jasondavies.com/poisson-disc
Now if you want to sample random points in a rectangle the simple way, sample two values per point, each from 0 to the length of the largest dimension.
If the value representing the smaller dimension is larger than that dimension, throw the pair away and try again.
Pseudo code:
while (need more points)
begin
range = max (rect_width, rect_height);
x = uniform_random(0,range);
y = uniform_random(0,range);
if (x > rect_width) or (y > rect_height)
continue;
else
insert point(x,y) into point_list;
end
The reason you sample up to the larger of the two lengths is to make the uniform selection criteria equivalent when the lengths are different.
For example, assume one side is of length K and the other is of length 10K, and that the numeric type used has a resolution of 1/1000 of K. For the shorter side there are only 1000 possible values, whereas for the longer side there are 10000 possible values to choose from. A probability of 1/1000 is not the same as 1/10000. Simply put, each coordinate value for the short side would have a 10x greater probability of occurring than those of the longer side, which means the sampling would not be truly uniform.
Pseudo code for the scenario where you want to ensure that the point generated is not closer than some distance to any already generated point:
while (need more points)
begin
range = max (rect_width, rect_height)
x = uniform_random(0,range);
y = uniform_random(0,range);
if (x > rect_width) or (y > rect_height)
continue;
new_point = point(x,y);
too_close = false;
for (p : all points)
begin
if (distance(p, new_point) < minimum_distance)
begin
too_close = true;
break;
end
end
if (too_close)
continue;
insert point(x,y) into point_list;
end
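For reference, here is a minimal C++ sketch of the same rejection ("dart throwing") idea, assuming a square area like the 1000x1000 one in the question; the names and the maxTriesPerPoint limit are illustrative choices, not part of the pseudo code above.

#include <cmath>
#include <random>
#include <vector>

struct Point { double x, y; };

// Keep drawing uniform points and discard any candidate that lands closer
// than minDist to an already accepted point.
std::vector<Point> scatterPoints(int numPoints, double size, double minDist,
                                 int maxTriesPerPoint = 1000) {
    std::mt19937 rng(std::random_device{}());
    std::uniform_real_distribution<double> coord(0.0, size);
    std::vector<Point> points;
    points.reserve(numPoints);
    for (int i = 0; i < numPoints; ++i) {
        bool placed = false;
        for (int tries = 0; tries < maxTriesPerPoint && !placed; ++tries) {
            Point candidate{coord(rng), coord(rng)};
            bool tooClose = false;
            for (const Point& p : points) {
                double dx = p.x - candidate.x, dy = p.y - candidate.y;
                if (dx * dx + dy * dy < minDist * minDist) { tooClose = true; break; }
            }
            if (!tooClose) { points.push_back(candidate); placed = true; }
        }
        if (!placed) break; // give up: the area is too crowded for this spacing
    }
    return points;
}

Calling scatterPoints(100, 1000.0, 20.0) matches the numbers from the question; if the loop gives up early, the requested spacing is too large for the requested density.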
While the Poisson Disk solution is usually fine and dandy, I would like to point out an alternative using quasi-random numbers. For quasi-random Sobol sequences there is a result which says that there is a minimum positive distance between points, which amounts to 0.5*sqrt(d)/N, where d is the dimension of the problem (2 in your case) and N is the number of points sampled in the hypercube. Paper from the man himself: http://www.sciencedirect.com/science/article/pii/S0378475406002382.
Why did I think it should be Python? Sorry, my bad. For C-like languages it is best to call GSL; the generator type is gsl_qrng_sobol. An example of using it at d=2 is linked here.
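For what it's worth, a minimal sketch of pulling 2D Sobol points out of GSL's quasi-random generator interface might look like the following (assuming GSL is installed and the program is linked with -lgsl -lgslcblas); scaling to the 1000x1000 area from the first question is just a multiplication.

#include <cstdio>
#include <gsl/gsl_qrng.h>

int main() {
    const int d = 2;                          // dimension of the problem
    gsl_qrng* q = gsl_qrng_alloc(gsl_qrng_sobol, d);
    for (int i = 0; i < 100; ++i) {
        double v[d];
        gsl_qrng_get(q, v);                   // next Sobol point in the unit square
        std::printf("%g %g\n", v[0] * 1000.0, v[1] * 1000.0);
    }
    gsl_qrng_free(q);
    return 0;
}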
I'm wondering if there is an algorithm to generate random numbers that will most likely be low in a range from min to max. For instance, if you generate a random number between 1 and 100 it should most of the time be below 30 if you call the function with f(min: 1, max: 100, avg: 30), but if you call it with f(min: 1, max: 200, avg: 10) the average should be 10. A lot of games do this, but I simply can't find a way to do it with a formula. Most of the examples I have seen use a "drop table" or something like that.
I have come up with a fairly simple way to weight the outcome of a roll, but it is not very efficient and you don't have a lot of control over it.
var pseudoRand = function(min, max, n) {
if (n > 0) {
return pseudoRand(min, Math.random() * (max - min) + min, n - 1)
}
return max;
}
rands = []
for (var i = 0; i < 20000; i++) {
rands.push(pseudoRand(0, 100, 1))
}
avg = rands.reduce(function(x, y) { return x + y } ) / rands.length
console.log(avg); // ~50
The function simply picks a random number between min and max N times, updating max with the last roll on every iteration. So if you call it with N = 2 and max = 100, it must roll 100 twice in a row in order to return 100.
I have looked at some distributions on wikipedia, but I don't quite understand them enough to know how I can control the min and max outputs etc.
Any help is very much welcomed
A simple way to generate a random number with a given distribution is to pick a random number from a list where the numbers that should occur more often are repeated according to the desired distribution.
For example, if you create a list [1,1,1,2,2,2,3,3,3,4] and pick a random index from 0 to 9 to select an element from that list, you will get a number <4 with 90% probability.
Alternatively, using the distribution from the example above, generate an array [2,5,8,9] and pick a random integer from 0 to 9: if it's ≤2 (this will occur with 30% probability) return 1, if it's >2 and ≤5 (this will also occur with 30% probability) return 2, etc.
Explained here: https://softwareengineering.stackexchange.com/a/150618
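To illustrate that second variant, a small C++ sketch using the example thresholds [2,5,8,9] and outcomes 1..4 from above (the function name is just illustrative):

#include <random>

// Returns 1..4 with probabilities 30%, 30%, 30% and 10%,
// using the cumulative thresholds [2, 5, 8, 9] over a 0..9 roll.
int weightedPick(std::mt19937& rng) {
    static const int thresholds[] = {2, 5, 8, 9};
    std::uniform_int_distribution<int> roll(0, 9);
    int r = roll(rng);
    for (int i = 0; i < 4; ++i) {
        if (r <= thresholds[i]) return i + 1;
    }
    return 4; // not reachable, kept for safety
}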
A probability distribution function is just a function that, when you put in a value X, will return the probability of getting that value X. A cumulative distribution function is the probability of getting a number less than or equal to X. A CDF is the integral of a PDF. A CDF is almost always a one-to-one function, so it almost always has an inverse.
To generate a PDF, plot the value on the x-axis and the probability on the y-axis. The sum (discrete) or integral (continuous) of all the probabilities should add up to 1. Find some function that models that equation correctly. To do this, you may have to look up some PDFs.
Basic Algorithm
https://en.wikipedia.org/wiki/Inverse_transform_sampling
This algorithm is based on Inverse Transform Sampling. The idea behind ITS is that you randomly pick a value on the y-axis of the CDF and find the x-value it corresponds to. This makes sense because the more likely a value is to be randomly selected, the more "space" it will take up on the y-axis of the CDF.
1. Come up with some probability distribution formula. For instance, if you want it so that as the numbers get higher the odds of them being chosen increases, you could use something like f(x)=x or f(x)=x^2. If you want something that bulges in the middle, you could use the Gaussian Distribution or 1/(1+x^2). If you want a bounded formula, you can use the Beta Distribution or the Kumaraswamy Distribution.
2. Integrate the PDF to get the Cumulative Distribution Function.
3. Find the inverse of the CDF.
4. Generate a random number and plug it into the inverse of the CDF.
5. Multiply that result by (max-min) and then add min.
6. Round the result to the nearest integer.
Steps 1 to 3 are things you have to hard-code into the game. The only way around that, for any PDF, is to solve for the shape parameters that correspond to the desired mean while holding to whatever constraints you want on the shape parameters. If you want to use the Kumaraswamy Distribution, you will set it so that the shape parameters a and b are always greater than one.
I would suggest using the Kumaraswamy Distribution because it is bounded and it has a very nice closed form and closed form inverse. It only has two parameters, a and b, and it is extremely flexible, as it can model many different scenarios, including polynomial behavior, bell curve behavior, and a basin-like behavior that has a peak at both edges. Also, modeling isn't too hard with this function. The higher the shape parameter b is, the more tilted it will be to the left, and the higher the shape parameter a is, the more tilted it will be to the right. If a and b are both less than one, the distribution will look like a trough or basin. If a or b is equal to one, the distribution will be a polynomial that does not change concavity from 0 to 1. If both a and b equal one, the distribution is a straight line. If a and b are greater than one, than the function will look like a bell curve. The best thing you can do to learn this is to actually graph these functions or just run the Inverse Transform Sampling algorithm.
https://en.wikipedia.org/wiki/Kumaraswamy_distribution
For instance, if I want to have a probability distribution shaped like this with a=2 and b=5 going from 0 to 100:
https://www.wolframalpha.com/input/?i=2*5*x%5E(2-1)*(1-x%5E2)%5E(5-1)+from+x%3D0+to+x%3D1
Its CDF would be:
CDF(x)=1-(1-x^2)^5
Its inverse would be:
CDF^-1(x)=(1-(1-x)^(1/5))^(1/2)
The General Inverse of the Kumaraswamy Distribution is:
CDF^-1(x)=(1-(1-x)^(1/b))^(1/a)
I would then generate a number from 0 to 1, put it into the CDF^-1(x), and multiply the result by 100.
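A minimal C++ sketch of those steps, assuming the Kumaraswamy inverse CDF given above (the parameter and function names are just illustrative):

#include <cmath>
#include <random>

// Inverse transform sampling for the Kumaraswamy(a, b) distribution,
// scaled from [0, 1] to [min, max] and rounded to the nearest integer.
int kumaraswamyRand(std::mt19937& rng, double a, double b, double min, double max) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double u = uniform(rng);                              // pick a point on the CDF's y-axis
    double x = std::pow(1.0 - std::pow(1.0 - u, 1.0 / b), // CDF^-1(u)
                        1.0 / a);
    return (int)std::lround(x * (max - min) + min);       // scale and round
}

Calling kumaraswamyRand(rng, 2.0, 5.0, 0.0, 100.0) reproduces the worked example above.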
Pros
Very accurate
Continuous, not discrete
Uses one formula and very little space
Gives you a lot of control over exactly how the randomness is spread out
Many of these formulas have CDFs with inverses of some sort
There are ways to bound the functions on both ends. For instance, the Kumaraswamy Distribution is bounded from 0 to 1, so you just input a float between zero and one, then multiply the result by (max-min) and add min. The Beta Distribution is bounded differently based on what values you pass into it. For something like PDF(x)=x, the CDF(x)=(x^2)/2, so you can generate a random value from CDF(0) to CDF(max-min).
Cons
You need to come up with the exact distributions and their shapes you plan on using
Every single general formula you plan on using needs to be hard coded into the game. In other words, you can program the general Kumaraswamy Distribution into the game and have a function that generates random numbers based on the distribution and its parameters, a and b, but not a function that generates a distribution for you based on the average. If you wanted to use Distribution x, you would have to find out what values of a and b best fit the data you want to see and hard code those values into the game.
I would use a simple mathematical function for that. From what you describe, you need a power progression like y = x^2. At the average (the average input is x = 0.5, since rand gets you a number from 0 to 1) you would get 0.25. If you want a lower average number, you can use a higher exponent like y = x^3, which would result in y = 0.125 at x = 0.5.
Example:
http://www.meta-calculator.com/online/?panel-102-graph&data-bounds-xMin=-2&data-bounds-xMax=2&data-bounds-yMin=-2&data-bounds-yMax=2&data-equations-0=%22y%3Dx%5E2%22&data-rand=undefined&data-hideGrid=false
PS: I adjusted the function to calculate the needed exponent to get the average result.
Code example:
function expRand (min, max, exponent) {
return Math.round( Math.pow( Math.random(), exponent) * (max - min) + min);
}
function averageRand (min, max, average) {
var exponent = Math.log(((average - min) / (max - min))) / Math.log(0.5);
return expRand(min, max, exponent);
}
alert(averageRand(1, 100, 10));
You may combine 2 random processes. For example:
first rand R1 = f(min: 1, max: 20, avg: 10);
second rand R2 = f(min:1, max : 10, avg : 1);
and then multiply R1*R2 to get a result in [1, 200] with an average around 10 (the average will be shifted a bit)
Another option is to find the inverse of the random function you want to use. This option has to be initialized when your program starts but doesn't need to be recomputed. The math used here can be found in a lot of Math libraries. I will explain point by point by taking the example of an unknown random function where only four points are known:
First, fit the four point curve with a polynomial function of order 3 or higher.
You should then have a parametrized function of type : ax+bx^2+cx^3+d.
Find the indefinite integral of the function (the integral has the form (a/2)x^2 + (b/3)x^3 + (c/4)x^4 + dx, which we will call quarticEq).
Compute the integral of the polynomial from your min to your max.
Take a uniform random number between 0 and 1, then multiply it by the value of the integral computed in the previous step (we name the result "R").
Now solve the equation R = quarticEq for x.
Hopefully the last part is well known, and you should be able to find a library that can do this computation (see wiki). If the inverse of the integrated function does not have a closed form solution (like in any general polynomial with degree five or higher), you can use a root finding method such as Newton's Method.
This kind of computation may be used to create any kind of random distribution.
Edit:
You may find the Inverse Transform Sampling described above on Wikipedia, and I found this implementation (I haven't tried it).
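As a rough sketch of the last two steps, assuming the fitted PDF p(x) = a*x + b*x^2 + c*x^3 + d and its antiderivative quarticEq(x) from above, solving R = quarticEq(x) with Newton's method could look like this (the coefficients, tolerance and iteration count are placeholders):

#include <cmath>
#include <random>

struct FittedPdf {
    double a, b, c, d;
    double pdf(double x) const { return a * x + b * x * x + c * x * x * x + d; }
    double quarticEq(double x) const {
        return a / 2 * x * x + b / 3 * x * x * x + c / 4 * x * x * x * x + d * x;
    }
};

// Draw one sample from the fitted distribution on [min, max] by inverting
// R = quarticEq(x) with Newton's method (the derivative of quarticEq is the PDF).
double sampleFitted(const FittedPdf& f, double min, double max, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double total = f.quarticEq(max) - f.quarticEq(min);
    double R = f.quarticEq(min) + u(rng) * total;
    double x = 0.5 * (min + max);              // initial guess
    for (int i = 0; i < 50; ++i) {
        double err = f.quarticEq(x) - R;
        if (std::fabs(err) < 1e-9) break;
        x -= err / f.pdf(x);                   // Newton step
        if (x < min) x = min;                  // keep the iterate inside the range
        if (x > max) x = max;
    }
    return x;
}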
You can keep a running average of what you have returned from the function so far; based on that, use a while loop to get the next random number that keeps the average on target, adjust the running average, and return the number.
Using a drop table permits a very fast roll, which matters in a real-time game. In fact it is only one random generation of a number from a range, then, according to a table of probabilities (a Gaussian distribution for that range), an if statement with multiple branches. Something like that:
num = random.randint(1, 100)
if num <= 10:
    case 1
elif num <= 20:
    case 2
...
It is not very clean but when you have a finite number of choices it can be very fast.
There are lots of ways to do so, all of which basically boil down to generating from a right-skewed (a.k.a. positive-skewed) distribution. You didn't make it clear whether you want integer or floating point outcomes, but there are both discrete and continuous distributions that fit the bill.
One of the simplest choices would be a discrete or continuous right-triangular distribution, but while that will give you the tapering off you desire for larger values, it won't give you independent control of the mean.
Another choice would be a truncated exponential (for continuous) or geometric (for discrete) distribution. You'd need to truncate because the raw exponential or geometric distribution has a range from zero to infinity, so you'd have to lop off the upper tail. That would in turn require you to do some calculus to find a rate λ which yields the desired mean after truncation.
A third choice would be to use a mixture of distributions, for instance choose a number uniformly in a lower range with some probability p, and in an upper range with probability (1-p). The overall mean is then p times the mean of the lower range + (1-p) times the mean of the upper range, and you can dial in the desired overall mean by adjusting the ranges and the value of p. This approach will also work if you use non-uniform distribution choices for the sub-ranges. It all boils down to how much work you're willing to put into deriving the appropriate parameter choices.
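As a rough illustration of that third (mixture) choice, not taken verbatim from the answer above: pick the lower range [min, avg] with probability p and the upper range [avg, max] with probability 1-p; choosing p = (max-avg)/(max-min) makes the overall mean come out to exactly avg.

#include <random>

// Mixture of two uniforms: [min, avg] with probability p, [avg, max] with 1 - p.
// p = (max - avg) / (max - min) makes the overall mean equal to avg.
double skewedRand(std::mt19937& rng, double min, double avg, double max) {
    double p = (max - avg) / (max - min);
    std::uniform_real_distribution<double> pick(0.0, 1.0);
    if (pick(rng) < p) {
        std::uniform_real_distribution<double> lower(min, avg);
        return lower(rng);
    }
    std::uniform_real_distribution<double> upper(avg, max);
    return upper(rng);
}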
This method would not be the most precise, but it could be considered "good enough" depending on your needs.
The algorithm is to pick a number between a min and a sliding max. There is a guaranteed max g_max and a potential max p_max. The true max slides depending on the result of another random call. This gives you the skewed distribution you are looking for. Below is the solution in Python.
import random

def get_roll(min, g_max, p_max):
    max = g_max + (random.random() * (p_max - g_max))
    return random.randint(min, int(max))

get_roll(1, 10, 20)
Below is a histogram of the function run 100,000 times with (1, 10, 20).
private int roll(int minRoll, int avgRoll, int maxRoll) {
// Generating random number #1
int firstRoll = ThreadLocalRandom.current().nextInt(minRoll, maxRoll + 1);
// Iterating 3 times will result in the roll being relatively close to
// the average roll.
if (firstRoll > avgRoll) {
// If the first roll is higher than the (set) average roll:
for (int i = 0; i < 3; i++) {
int verificationRoll = ThreadLocalRandom.current().nextInt(minRoll, maxRoll + 1);
if (firstRoll > verificationRoll && verificationRoll >= avgRoll) {
// If the following condition is met:
// The verification roll is closer to the average roll than the first roll was
firstRoll = verificationRoll;
}
}
} else if (firstRoll < avgRoll) {
// If the first roll is lower than the (set) average roll:
for (int i = 0; i < 3; i++) {
int verificationRoll = ThreadLocalRandom.current().nextInt(minRoll, maxRoll + 1);
if (firstRoll < verificationRoll && verificationRoll <= avgRoll) {
// If the following condition is met:
// The verification roll is closer to the average roll than the first roll was
firstRoll = verificationRoll;
}
}
}
return firstRoll;
}
Explanation:
roll
check if the roll is above, below or exactly 30
if above, reroll 3 times & set the roll to the new roll if it is lower but still >= 30
if below, reroll 3 times & set the roll to the new roll if it is higher but still <= 30
if exactly 30, don't set the roll anew
return the roll
Pros:
simple
effective
performs well
Cons:
You'll naturally have more results in the range of 30-40 than in the range of 20-30, simply due to the 30-70 relation.
Testing:
You can test this by using the following method in conjunction with the roll() method. The data is saved in a HashMap (mapping each number to its number of occurrences).
public void rollTheD100() {
int maxNr = 100;
int minNr = 1;
int avgNr = 30;
Map<Integer, Integer> numberOccurenceMap = new HashMap<>();
// "Initialization" of the map (please don't hit me for calling it initialization)
for (int i = 1; i <= 100; i++) {
numberOccurenceMap.put(i, 0);
}
// Rolling (100k times)
for (int i = 0; i < 100000; i++) {
int dummy = roll(minNr, avgNr, maxNr);
numberOccurenceMap.put(dummy, numberOccurenceMap.get(dummy) + 1);
}
int numberPack = 0;
for (int i = 1; i <= 100; i++) {
numberPack = numberPack + numberOccurenceMap.get(i);
if (i % 10 == 0) {
System.out.println("<" + i + ": " + numberPack);
numberPack = 0;
}
}
}
The results (100,000 rolls):
These were as expected. Note that you can always fine-tune the results simply by modifying the iteration count in the roll() method: the closer to 30 the average should be, the more iterations should be included (note that this could hurt performance to a certain degree). Also note that 30 was, as expected, the number with the highest number of occurrences, by far.
<10: 4994
<20: 9425
<30: 18184
<40: 29640
<50: 18283
<60: 10426
<70: 5396
<80: 2532
<90: 897
<100: 223
Try this,
generate a random number for the range of numbers below the average and generate a second random number for the range of numbers above the average.
Then randomly select one of those; each range will be selected 50% of the time.
var pseudoRand = function(min, max, avg) {
    var upperRand = Math.floor(Math.random() * (max - avg) + avg);
    var lowerRand = Math.floor(Math.random() * (avg - min) + min);
    if (Math.random() < 0.5)
        return lowerRand;
    else
        return upperRand;
}
Having seen many good explanations and some good ideas, I still think this could help you:
You can take any distribution function f around 0 and map your interval of interest onto your desired interval [1,100]: f -> f'.
Then feed the C++ discrete_distribution with the results of f'.
I've got an example with the normal distribution below, but I can't get my result into this function :-S
#include <iostream>
#include <random>
#include <chrono>
#include <cmath>
using namespace std;
double p1(double x, double mean, double sigma); // p(x|x_avg,sigma)
double p2(int x, int x_min, int x_max, double x_avg, double z_min, double z_max); // transform ("stretch") it to the interval
int plot_ps(int x_avg, int x_min, int x_max, double sigma);
int main()
{
int x_min = 1;
int x_max = 20;
int x_avg = 6;
double sigma = 5;
/*
int p[]={2,1,3,1,2,5,1,1,1,1};
default_random_engine generator (chrono::system_clock::now().time_since_epoch().count());
discrete_distribution<int> distribution {p*};
for (int i=0; i< 10; i++)
cout << i << "\t" << distribution(generator) << endl;
*/
plot_ps(x_avg, x_min, x_max, sigma);
return 0; //*/
}
// Normal distribution function
double p1(double x, double mean, double sigma)
{
return 1/(sigma*sqrt(2*M_PI))
* exp(-(x-mean)*(x-mean) / (2*sigma*sigma));
}
// Transforms intervals to your wishes ;)
// z_min and z_max are the desired values f'(x_min) and f'(x_max)
double p2(int x, int x_min, int x_max, double x_avg, double z_min, double z_max)
{
double y;
double sigma = 1.0;
double y_min = -sigma*sqrt(-2*log(z_min));
double y_max = sigma*sqrt(-2*log(z_max));
if(x < x_avg)
y = -(x-x_avg)/(x_avg-x_min)*y_min;
else
y = -(x-x_avg)/(x_avg-x_max)*y_max;
return p1(y, 0.0, sigma);
}
//plots both distribution functions
int plot_ps(int x_avg, int x_min, int x_max, double sigma)
{
double z = (1.0+x_max-x_min);
// plot p1
for (int i=1; i<=20; i++)
{
cout << i << "\t" <<
string(int(p1(i, x_avg, sigma)*(sigma*sqrt(2*M_PI)*20.0)+0.5), '*')
<< endl;
}
cout << endl;
// plot p2
for (int i=1; i<=20; i++)
{
cout << i << "\t" <<
string(int(p2(i, x_min, x_max, x_avg, 1.0/z, 1.0/z)*(20.0*sqrt(2*M_PI))+0.5), '*')
<< endl;
}
}
With the following result if I let them plot:
1 ************
2 ***************
3 *****************
4 ******************
5 ********************
6 ********************
7 ********************
8 ******************
9 *****************
10 ***************
11 ************
12 **********
13 ********
14 ******
15 ****
16 ***
17 **
18 *
19 *
20
1 *
2 ***
3 *******
4 ************
5 ******************
6 ********************
7 ********************
8 *******************
9 *****************
10 ****************
11 **************
12 ************
13 *********
14 ********
15 ******
16 ****
17 ***
18 **
19 **
20 *
So, if you could feed this result into the discrete_distribution<int> distribution {}, you would have everything you want...
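For completeness, a sketch of that missing last step might look like the following: build a vector of weights from a shifted/stretched density and hand it to std::discrete_distribution via its iterator constructor. The Gaussian weighting here is only an assumption standing in for f'.

#include <chrono>
#include <cmath>
#include <iostream>
#include <random>
#include <vector>
using namespace std;

int main() {
    const int x_min = 1, x_max = 100;
    const double x_avg = 30.0, sigma = 20.0;   // assumed shape parameters

    // One weight per integer value in [x_min, x_max], here a Gaussian around x_avg.
    vector<double> weights;
    for (int x = x_min; x <= x_max; ++x)
        weights.push_back(exp(-(x - x_avg) * (x - x_avg) / (2 * sigma * sigma)));

    default_random_engine generator(chrono::system_clock::now().time_since_epoch().count());
    discrete_distribution<int> distribution(weights.begin(), weights.end());

    // distribution(generator) returns an index 0..(x_max - x_min), so shift by x_min.
    for (int i = 0; i < 10; ++i)
        cout << x_min + distribution(generator) << endl;
    return 0;
}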
Well, from what I can see of your problem, I would want for the solution to meet these criteria:
a) Belong to a single distribution: If we need to "roll" (call math.Random) more than once per function call and then aggregate or discard some results, it stops being truly distributed according to the given function.
b) Not be computationally intensive: Some of the solutions use Integrals, (Gamma distribution, Gaussian Distribution), and those are computationally intensive. In your description, you mention that you want to be able to "calculate it with a formula", which fits this description (basically, you want an O(1) function).
c) Be relatively "well distributed", e.g. not have peaks and valleys, but instead have most results cluster around the mean, and have nice predictable slopes downwards towards the ends, and yet have the probability of the min and the max to be not zero.
d) Not to require to store a large array in memory, as in drop tables.
I think this function meets the requirements:
var pseudoRand = function(min, max, avg)
{
    var randomFraction = Math.random();
    var head = (avg - min);
    var tail = (max - avg);
    var skewness = tail / (head + tail);
    if (randomFraction < skewness)
        return min + (randomFraction / skewness) * head;
    else
        return avg + (1 - randomFraction) / (1 - skewness) * tail;
}
This will return floats, but you can easily turn them to ints by calling
Math.round(pseudoRand(...))
It returned the correct average in all of my tests, and it is also nicely distributed towards the ends. Hope this helps. Good luck.
I have some 2d points in space and I need to find the point [xmin, ymax]. I can do it in 2 passes using x and then y, but I want to do it in a single pass.
Is there a way I can combine these values into a single float so I can find the right point by a single sort?
I thought of doing x+y, but I don't know if that's reliable; most likely not, as for [5, 2] and [2, 5] the sum is the same, so the wrong one could end up with higher priority.
Any ideas?
You shouldn't sort your point list to find maximum and minimum values, since that will take about O(n*log(n)) time. Instead, iterate through the list once, keeping a reference to the highest and lowest values you have found. This will take O(n) time.
min_x = myPoints[0].x;
max_y = myPoints[0].y;
for(Point p in myPoints){
if (p.x < min_x){min_x = p.x;}
if (p.y > max_y){max_y = p.y;}
}
Edit: from Wikipedia's Graham scan article:
The first step in this algorithm is to find the point with the lowest y-coordinate. If the lowest y-coordinate exists in more than one point in the set, the point with the lowest x-coordinate out of the candidates should be chosen.
So, finding the min x and max y separately is inappropriate, because the point you're looking for might not have both. We can modify the code from above, to use these new criteria.
candidate = myPoints[0];
for (Point p in myPoints){
if (p.y < candidate.y or (p.y == candidate.y and p.x < candidate.x)){
candidate = p;
}
}
It may be necessary to change some of these "less than" signs to "greater than" signs, depending on your definition of "lowest". In some coordinate systems, (0,0) is in the upper-left corner of the graph, and the lower you go on the screen, the larger Y becomes. In that case, you ought to use if (p.y > candidate.y) instead.
If you are trying to find the minimum point according to some variant of the lexicographic order (or even some other kind of order over 2D points), then simply traverse your set of points once but use a custom comparison to find / keep the minimum. Here is an example in C++, min_element comes from the STL and is just a simple loop (http://www.cplusplus.com/reference/algorithm/min_element/):
#include <algorithm>
#include <iostream>
using namespace std;
struct Point {
int x, y;
};
struct PointCompare {
bool operator()(const Point& p1, const Point& p2) const {
if (p1.x < p2.x)
return true;
if (p1.x == p2.x)
return p1.y > p2.y; // Your order: xmin then ymax?
//return p1.y < p2.y; // Standard lexicographic order
return false;
}
};
int main()
{
// + 3
// + 4
// + 0
// + 2
// + 1
const Point points[] = {
{ 1, 1 }, // 0
{ 2,-1 }, // 1
{ 0, 0 }, // 2
{ 1, 3 }, // 3
{ 0, 2 }, // 4
};
const Point* first = points;
const Point* last = points + sizeof(points) / sizeof(Point);
const Point* it = min_element(first, last, PointCompare());
cout << it->x << ", " << it->y << endl;
}
It looks like you want to find a max-min point - a point with maximal y-coordinate among points with minimal x-coordinate, right?
If yes, you can store all your points in an STL multimap, mapping x-coordinate to y-coordinate. This map is automatically sorted, and there is a chance that the point with the minimal x-coordinate will be the only one with that x in the map. If it isn't a single point, you can scan all points with the same (minimal) x-coordinate to find the one with the maximal y-coordinate. It will still need two passes, but the second pass should statistically be very short.
If you really want a single-pass solution, you can store your points in an STL map, mapping x-coordinate to a set of y-coordinates. It requires more work, but in the end you will have your point: its x-coordinate is at the beginning of the map, and its y-coordinate is at the end of the set corresponding to that x-coordinate.
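A minimal sketch of that second idea (assuming integer coordinates and a small Point struct, analogous to the other answer):

#include <iostream>
#include <map>
#include <set>

struct Point { int x, y; };

int main() {
    const Point points[] = { {1, 1}, {2, -1}, {0, 0}, {1, 3}, {0, 2} };

    // Insert every point; the map keeps the smallest x first,
    // and each set keeps its y values sorted in ascending order.
    std::map<int, std::set<int>> byX;
    for (const Point& p : points)
        byX[p.x].insert(p.y);

    int xmin = byX.begin()->first;             // smallest x
    int ymax = *byX.begin()->second.rbegin();  // largest y among points with that x
    std::cout << xmin << ", " << ymax << std::endl;  // prints "0, 2"
    return 0;
}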
1. You can sort by a combined sort value: x+(MAX*y) or y+(MAX*x),
where MAX is a big enough value: MAX >= abs(max(x)-min(x)) or MAX >= abs(max(y)-min(y)) respectively.
2. If you have enough memory and need freaking speed,
then you can skip sorting and use the sort value directly as an address:
x,y values must be shifted so they are non-negative
x,y differences must be >= 1.0 (ideally = 1.0) so they can be used as an address
no duplicate points are allowed or needed after sorting
Create a big enough array pnt[] of points and mark them all as unused (for example x=-1, y=-1),
then "sort" the input points pnt0[] like this:
for (i = 0; i < points; i++)
{
    x = pnt0[i].x;
    y = pnt0[i].y;
    pnt[x + (MAX * y)] = pnt0[i];
}
If needed you can pack the pnt[] array to remove unused points, but that can be considered another pass;
alternatively you can remember the min and max addresses and cycle only between them.
Note: I've edited this question with the helpful feedback of Keith Randall and Chiyou.
I have an idea to use a 1D array to cache the most used entries of a 2D NxM matrix in a particular processing context. The question is whether there already exists a body of work from which to gain insight, or whether it's just known to people in the field. I will outline the leg-work first and ask the question afterwards. As an additional note, Chiyou already proposed space filling curves, which look conceptually like the thing I'm after, but don't readily answer my specific question (i.e. I can't find a curve that would do what I'm after).
The leg-work:
Currently the most used entries are defined by a combination of loops and a special pointer_array (see code below), which together produce the indices of the most used matrix elements. The pointer_array contains numbers in the range [0, max(N, M)] (the matrix dimensions are defined as NxM above).
The purpose of the code is to produce a suitable (in the context of the program) combination of indices to process some of the matrix entries. It's possible that N = M, i.e. a square matrix. The code (C++) traversing the matrix looks like the following:
for(int i = 0; i < N; i++)
{
for(int j = i + 1; j < M; j++)
{
auto pointer1 = pointer_array[i];
auto pointer2 = pointer_array[i + 1];
auto pointer3 = pointer_array[j];
auto pointer4 = pointer_array[j + 1];
auto entry1 = some_matrix[pointer1][pointer3];
auto entry2 = some_matrix[pointer2][pointer4];
auto entry3 = some_matrix[pointer3][pointer4];
auto entry4 = some_matrix[pointer1][pointer2];
//Computing with the entries...
}
}
Some things worth of separate note:
To make this caching worthwhile, I think it should be dense. That is, something that can be accessed at random and is laid out contiguously in memory, i.e. no space "wasted" (except maybe if there are many separate chunks) and access complexity is O(1). That would mean the cache can't be the whole 2D matrix laid out in its entirety as a 1D row- or column-major array. I feel it should fit into an array of length max(N, M) (see the matrix dimension definition above).
Many of the entries in some_matrix will be ignored during the loops.
The pointer_array ordering records a route through the matrix, so it's used elsewhere too.
I've come across this need on various occasions, but this time I've written a 2-opt algorithm whose memory access pattern I'm seeking to improve. I'm aware there are other ways to write the algorithm (but that's the point: there are other settings in which I've come across this, and I'm now wondering whether there is a general solution).
Outside of the box thinking would be to produce a similar kind of combination of indices to something that is conceptually like accessing a 2D matrix, but only smarter. One option would be to cache matrix rows/columns during looping.
As the entries will be used a lot, it would be ideal to replace the pointer_array indirection by caching the some_matrix accesses, so that acquiring the entry* values in the loops would be faster, i.e. read from a pre-loaded cache instead of a possibly rather big matrix. The other point here is that it would save storage, which in turn means the often used values could very well fit in a faster cache.
Question: Is it possible to devise an indexing scheme so that the matrix indexing would essentially become
//Load the 2D matrix entries to some_1d_cache using the pointer_array
//so that the loop indices can be used directly...
for (int i = 0; i < N; i++)
{
for (int j = i + 1; j < M; j++)
{
//The some_1d_cache(x, y) call would calculate an index into the 1D cache array...
auto entry1 = some_1d_cache(i, j);
auto entry2 = some_1d_cache(i + 1, j + 1);
auto entry3 = some_1d_cache(j, j + 1);
auto entry4 = some_1d_cache(i, i + 1);
//Computing with the entries...
}
}
Or maybe something like the following would do too:
for (int i = 0; i < N; i++)
{
for (int j = i + 1; j < M; j++)
{
auto pointer1 = pointer_array[i];
auto pointer2 = pointer_array[i + 1];
auto pointer3 = pointer_array[j];
auto pointer4 = pointer_array[j + 1];
//The some_1d_cache(x, y) call would calculate an index into the 1D cache array...
auto entry1 = some_1d_cache(pointer1, pointer3);
auto entry2 = some_1d_cache(pointer2, pointer4);
auto entry3 = some_1d_cache(pointer3, pointer4);
auto entry4 = some_1d_cache(pointer1, pointer2);
//Computing with the entries...
}
}
You can index a 2D matrix with a 1D space filling curve. Mathematically it's H(x,y) = (h(x) * h(y)). They are also useful because they basically subdivide the plane, store some locality information and reorder it.
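As one concrete example of such a curve (not necessarily the one this answer has in mind), a Morton / Z-order index interleaves the bits of the two coordinates, so nearby (i, j) pairs tend to land near each other in the 1D array; the sizes and names below are just illustrative.

#include <cstdint>
#include <iostream>
#include <vector>

// Spread the lower 16 bits of v apart so a second coordinate can be interleaved.
static std::uint32_t spreadBits(std::uint32_t v) {
    v &= 0x0000FFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

// Morton (Z-order) index: interleave the bits of x and y.
static std::uint32_t mortonIndex(std::uint32_t x, std::uint32_t y) {
    return spreadBits(x) | (spreadBits(y) << 1);
}

int main() {
    const int N = 8;                            // N x N matrix
    std::vector<double> cache(N * N, 0.0);      // 1D cache addressed by Morton index
    cache[mortonIndex(3, 5)] = 42.0;
    std::cout << cache[mortonIndex(3, 5)] << std::endl;  // prints 42
    return 0;
}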