Generating random data for a scatter plot

Generating random data for a scatter plot - algorithm

I'm testing out different JavaScript graphing frameworks, trying line graphs and scatter plots with generated data. It's all going quite okay, but I've run into trouble while trying to generate data for a scatter plot.
So it would be quite easy to do something like this in PHP, or in any other language:
for ($i=0; $i < $x; $i++) {
$data[] = array(
'x' => mt_rand(0, 10000),
'y' => mt_rand(0, 10000)
);
}
The result is distributed pretty much evenly across the whole chart. So I am trying to think of a way to generate better random data, which would look more like a real scatter plot than evenly distributed dots on a page. And I can't come up with anything.
I would like to end up with something more like this random scatter plot from the Web:
So it is more intense in some part of the plot and pretty much nothing in the corners. But I wouldn't like to make it completely impossible for a dot to make it to the corners.
Any algorithmic ideas?

For something like the image you showed, where you have a line around which you want to scatter data, it's pretty easy. For example, imagine a line in which y = x * 0.75. Given that, you select an x value in the range 0..xMax (whatever your maximum X value is), and then generate a value for y with some variance. For example, if 90% of the time the Y value is within 10% of the expected value, then you'd generate a random value between 0.675x and 0.825x.
Say that 5% of the time, the Y value is within 50% of the expected value and 5% of the time the value is unconstrained. For each of those, you generate a Y value the same way: a random value that is equal to the expected Y value, plus or minus 50% (or, in the latter case, plus or minus some very large number).
You can adjust the probabilities and the variance as appropriate.
You can also adjust the distribution of X values. For example, it looks like most of your data points are between about .15 xMax and .6 xMax. So what you want is a higher percentage of X values in that range. Imagine, then that your X values are broken into three different ranges:
0 to .15 xMax - 20%
.15 xMax to .60 xMax - 70%
.60 xMax to xMax - 10%
Generate a random integer between 0 and 99. Then:
if value < 20, generate an x value between 0 and .15 xMax
if value >= 20 and < 90, generate an x value between .15 xMax and .60 xMax
otherwise, generate an x value between .60 xMax and xMax
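The recipe above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name scatter_around_line, its parameters, and the choice to draw the "unconstrained" 5% uniformly from 0..x_max are mine, not part of the answer. The line y = 0.75x, the 20/70/10 split of x ranges, and the 90/5/5 split of spreads are taken from the text.

```python
import random

def scatter_around_line(count, x_max=10000.0, slope=0.75, seed=None):
    """Generate (x, y) points clustered around the line y = slope * x."""
    rng = random.Random(seed)
    points = []
    for _ in range(count):
        # Choose the x range: 20% low, 70% middle, 10% high.
        roll = rng.random()
        if roll < 0.20:
            x = rng.uniform(0.0, 0.15 * x_max)
        elif roll < 0.90:
            x = rng.uniform(0.15 * x_max, 0.60 * x_max)
        else:
            x = rng.uniform(0.60 * x_max, x_max)

        expected = slope * x
        # Choose the spread: 90% within 10%, 5% within 50%, 5% unconstrained.
        roll = rng.random()
        if roll < 0.90:
            y = expected * rng.uniform(0.90, 1.10)
        elif roll < 0.95:
            y = expected * rng.uniform(0.50, 1.50)
        else:
            # Assumption: "unconstrained" means anywhere in 0..x_max.
            y = rng.uniform(0.0, x_max)
        points.append((x, y))
    return points

points = scatter_around_line(2000, seed=1)
```

Tuning the two sets of probabilities changes how dense the band is and how often a dot lands far away from it, corners included.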

Define a function that becomes the center line of the distribution, for example c(x) = sqrt(x).
Define a function that specifies the maximal allowed deviation from the center line, for example d(x) = 0.1 (x - 5)².
For every x value generate one or a few y values y(x) = c(x) + 2 * (random() - 0.5) * d(x), where random() is a (pseudo) random number generator with values in [0, 1].
For a more realistic look use a (pseudo) random number generator that has a more interesting distribution, for example a normal distribution with standard deviation d(x).
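A minimal Python sketch of these steps, using the example functions c(x) = sqrt(x) and d(x) = 0.1 (x - 5)² from above. The names center, deviation and band_points, and the per_x and gaussian parameters, are made up for illustration.

```python
import math
import random

def center(x):
    """Center line of the distribution, c(x) = sqrt(x)."""
    return math.sqrt(x)

def deviation(x):
    """Maximal allowed deviation from the center line, d(x) = 0.1 (x - 5)^2."""
    return 0.1 * (x - 5.0) ** 2

def band_points(xs, per_x=3, gaussian=False, seed=None):
    """Generate per_x points for every x, scattered around center(x)."""
    rng = random.Random(seed)
    points = []
    for x in xs:
        for _ in range(per_x):
            if gaussian:
                # Normal distribution with standard deviation d(x);
                # points may fall outside the +/- d(x) band here.
                y = rng.gauss(center(x), deviation(x))
            else:
                # Uniform within [c(x) - d(x), c(x) + d(x)].
                y = center(x) + 2.0 * (rng.random() - 0.5) * deviation(x)
            points.append((x, y))
    return points

xs = [i * 0.5 for i in range(21)]  # x from 0 to 10
uniform_points = band_points(xs, seed=7)
normal_points = band_points(xs, gaussian=True, seed=7)
```

With these example functions the band pinches to zero width at x = 5 and widens towards both ends of the range.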

Related

How to create a mask or detect image section based on the intensity value?

I have a matrix named figmat from which I obtain the following pcolor plot (Matlab-Version R 2016b).
Basically I only want to extract the bottom red high intensity line from this plot.
I thought of doing it in some way of extracting the maximum values from the matrix and creating some sort of mask on the main matrix. But I'm not understanding a possible way to achieve this. Can it be accomplished with the help of any edge/image detection algorithms?
I was trying something like this with the following code to create a mask
A=max(figmat);
figmat(figmat~=A)=0;
imagesc(figmat);
But this gives only the boundary of maximum values. I also need the entire red color band.

Okay, I assume that the red line is linear and its values can uniquely be separated from the rest of the picture. Let's generate some test data...
[x,y] = meshgrid(-5:.2:5, -5:.2:5);
n = size(x,1)*size(x,2);
z = -0.2*(y-(0.2*x+1)).^2 + 5 + randn(size(x))*0.1;
figure
surf(x,y,z);
This script generates a surface function. Its set of maximum values (x,y) can be described by a linear function y = 0.2*x+1. I added a bit of noise to it to make it a bit more realistic.
We now select all points where z is larger than a threshold placed at, let's say, 95 % of the way from the minimum to the maximum value. find can be used for this. Later, we want to use one-dimensional data, so we reshape everything.
thresh = min(min(z)) + (max(max(z))-min(min(z)))*0.95;
mask = reshape(z > thresh,1,n);
idx = find(mask>0);
xvec = reshape(x,1,n);
yvec = reshape(y,1,n);
xvec and yvec now contain the coordinates of all values > thresh.
The last step is to fit a linear polynomial through all of these points.
pp = polyfit(xvec(idx),yvec(idx),1)
pp =
0.1946 1.0134
Obviously these are roughly the coefficients of y = 0.2*x+1 as it should be.
I do not know if this also works with your data, since I made some assumptions. The threshold level must be chosen carefully. Maybe some preprocessing must be done to dynamically detect this level if you really want to process your images automatically. There might also be a simpler way to do it... but for me this one was straightforward, without the need for any toolboxes.
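For readers without MATLAB, the same threshold-then-fit idea can be sketched in plain Python. This is a loose transliteration, not the code above: the function name fit_ridge_line and the hand-written least-squares fit (standing in for polyfit) are my own choices, and the test surface is regenerated with Python's random module, so the fitted numbers will differ slightly from the MATLAB output.

```python
import random

def fit_ridge_line(xs, ys, z, fraction=0.95):
    """Fit y = a*x + b through the grid points whose z value lies above
    min(z) + fraction * (max(z) - min(z))."""
    flat = [value for row in z for value in row]
    z_min, z_max = min(flat), max(flat)
    thresh = z_min + (z_max - z_min) * fraction

    # Collect the coordinates of all grid points above the threshold.
    px, py = [], []
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            if z[i][j] > thresh:
                px.append(x)
                py.append(y)
    if len(px) < 2:
        raise ValueError("not enough points above the threshold")

    # Ordinary least-squares line through the selected points.
    mean_x = sum(px) / len(px)
    mean_y = sum(py) / len(py)
    sxx = sum((x - mean_x) ** 2 for x in px)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(px, py))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Test surface with its ridge along y = 0.2*x + 1, plus a bit of noise.
rng = random.Random(0)
xs = [-5 + 0.2 * k for k in range(51)]
ys = list(xs)
z = [[-0.2 * (y - (0.2 * x + 1)) ** 2 + 5 + rng.gauss(0, 0.1) for x in xs]
     for y in ys]
a, b = fit_ridge_line(xs, ys, z)
```

As in the MATLAB version, the result depends heavily on the fraction parameter, so it would need tuning for real data.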

By assuming:
There is only one band to extract.
It always has the maximum values.
It is linear.
I can adapt my previous answer to this case as well, with a few minor changes:
First, we get the distribution of the values in the matrix and look for a population in the top values, that can be distinguished from the smaller values. This is done by finding the maximum value x(i) on the histogram that:
Is a local maximum (its bin is higher than that of x(i+1) and x(i-1))
Has more values above it than within it (the sum of the height of bins x(i+1) to x(end) < the height of bin x):
This is how it is done:
[h,x] = histcounts(figmat); % get the distribution of intensities
d = diff(fliplr(h)); % the difference in bin height from large x to small x
band_min_ind = find(cumsum(d)>size(figmat,2) & d<0, 1); % 1st bin that fits the conditions
flp_val = fliplr(x); % the values of x from large to small
band_min = flp_val(band_min_ind); % the value of x that fits the conditions
Now we continue as before. Mask all the unwanted values, interpolate the linear line:
mA = figmat>band_min; % mask all values below the top value mode
[y1,x1] = find(mA,1); % find the first nonzero row
[y2,x2] = find(mA,1,'last'); % find the last nonzero row
m = (y1-y2)/(x1-x2); % the line slope
n = y1-m*x1; % the intercept
f_line = @(x) m.*x+n; % the line function
And if we plot it we can see the red line where the band for detection was:
Next, we can make this line thicker for a better representation of this line:
thick = max(sum(mA)); % mode thickness of the line
tmp = (1:thick)-ceil(thick/2); % helper vector for expanding
rows = bsxfun(@plus,tmp.',floor(f_line(1:size(figmat,2)))); % all the rows for each column
rows(rows<1) = 1; % make sure to not get out of range
rows(rows>size(figmat,1)) = size(figmat,1); % make sure to not get out of range
inds = sub2ind(size(figmat),rows,repmat(1:size(figmat,2),thick,1)); % convert to linear indices
mA(inds) = true; % add the interpolation to the mask
result = figmat.*mA; % apply the mask on figmat
Finally, we can plot that result after masking, excluding the unwanted areas:
imagesc(result(any(result,2),:))

Plotting two variables function

This question is for learning purpose. I am writing my own function to plot an equation. For example:
function e(x) { return sin(x); }
plot(e);
I wrote a plot function that takes a function as a parameter. The plotting code is simple: x runs from some value to some value and increases by a small step. This is the plot that plot() manages to produce.
But there is a problem. It cannot express the circle equation x^2 + y^2 = 1. So the question is: what should the plot and equation functions look like to be able to handle two variables?
Note that I am not only interested in the circle equation, but in a more general way of plotting equations with two variables.

Well, to plot a non-function 1D equation (x,y variables) you have 3 choices:
1. convert to parametric form
so for example x^2 + y^2 = 1 will become:
x = cos(t);
y = sin(t);
t = <0,2*PI>
So plot each coordinate as a 1D function while t is used as the parameter. But for this you need to exploit mathematical identities and substitutions... That is not easily done programmatically.
2. convert to 1D functions
Non-function means you get more than one y value for some x values. If you split your equation into intervals and cases that together cover the whole plot, then you can plot each derived function instead.
So you derive y algebraically (let's assume the unit circle again):
x^2 + y^2 = 1
y^2 = 1 - x^2
y = +/- sqrt (1 - x^2)
----------------------
y1 = +sqrt (1 - x^2)
y2 = -sqrt (1 - x^2)
x = <-1,+1>
This is also not easily done programmatically, but it is an order of magnitude easier than #1.
3. do a 2D plot using the equation as a predicate
Simply loop through all pixels of your view and render only those for which the equation is true. So again the unit circle:
for (x=-1.0;x<=+1.0;x+=0.001)
for (y=-1.0;y<=+1.0;y+=0.001)
if (fabs((x*x)+(y*y)-1.0)<=1e-3) // threshold comparable to the 0.001 step, otherwise almost no sample passes
plot_pixel(x,y,some_color); // x,y should be rescaled and offset to the actual plot view
So you just convert your equation to implicit form:
x^2 + y^2 = 1
-----------------
x^2 + y^2 - 1 = 0
and compare to zero with some threshold (to avoid FPU accuracy problems):
| x^2 + y^2 - 1 | <= threshold_near_zero
The threshold is half the width of the plotted line. So this way you can easily change the plot width to any pixel size... As you can see this is easily done programmatically, but the plot is slower as you need to loop through all the pixels of the plot view. The step of the x,y for loops should match the pixel size of the view scale.
Also, while using the equation as a predicate you should handle math singularities, as with blind probing you will most likely hit some, like division by zero or domain errors for asin, acos, sqrt, etc.
So for an arbitrary 1D non-function use #3, unless you have some mighty symbolic math engine for #1 or #2.
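Option #3 might look roughly like this in Python. The function name implicit_pixels, its parameters, and the choice to return a set of (column, row) pixel coordinates rather than drawing anything are assumptions made for the sketch. The threshold is given in units of the equation's value, so it has to be tuned together with the pixel size.

```python
def implicit_pixels(f, x_min, x_max, y_min, y_max, width, height, threshold):
    """Return the set of (col, row) pixels where |f(x, y)| <= threshold."""
    pixels = set()
    for row in range(height):
        # Map the pixel row to a y coordinate of the view.
        y = y_min + (y_max - y_min) * row / (height - 1)
        for col in range(width):
            x = x_min + (x_max - x_min) * col / (width - 1)
            try:
                value = f(x, y)
            except (ZeroDivisionError, ValueError, OverflowError):
                # Singularity or domain error at this pixel: skip it.
                continue
            if abs(value) <= threshold:
                pixels.add((col, row))
    return pixels

def unit_circle(x, y):
    """Implicit form of x^2 + y^2 = 1."""
    return x * x + y * y - 1.0

hits = implicit_pixels(unit_circle, -1.5, 1.5, -1.5, 1.5, 61, 61, threshold=0.1)

# Crude text rendering of the result.
for row in range(60, -1, -1):
    print("".join("#" if (col, row) in hits else "." for col in range(61)))
```

Any equation that can be written as f(x, y) = 0 can be passed in the same way, at the cost of evaluating f once per pixel.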

Definition of a function: a function f takes an input x and returns a single output f(x).
This means that for any input there is one and only one output. Take y = sin(x): this is a function of x, and y defines that function.
For an equation like (x*x) + (y*y) = 1 there are two possible values of y for a single value of x, hence it cannot be termed a valid equation for a function.
If you need to draw it, one possible solution is to plot two points for a single value of x, i.e. sqrt(1-(x*x)) and -1*sqrt(1-(x*x)). Plot both values (one will be positive, the other negative, with the same absolute value).

random unit vector in multi-dimensional space

I'm working on a data mining algorithm where I want to pick a random direction from a particular point in the feature space.
If I pick a random number for each of the n dimensions from [-1,1] and then normalize the vector to a length of 1 will I get an even distribution across all possible directions?
I'm speaking only theoretically here since computer generated random numbers are not actually random.

One simple trick is to select each dimension from a gaussian distribution, then normalize:
from random import gauss
def make_rand_vector(dims):
vec = [gauss(0, 1) for i in range(dims)]
mag = sum(x**2 for x in vec) ** .5
return [x/mag for x in vec]
For example, if you want a 7-dimensional random vector, select 7 random values (from a Gaussian distribution with mean 0 and standard deviation 1). Then, compute the magnitude of the resulting vector using the Pythagorean formula (square each value, add the squares, and take the square root of the result). Finally, divide each value by the magnitude to obtain a normalized random vector.
If your number of dimensions is large then this has the strong benefit of always working immediately, while generating random vectors until you find one which happens to have magnitude less than one will cause your computer to simply hang at more than a dozen dimensions or so, because the probability of any of them qualifying becomes vanishingly small.

You will not get a uniformly distributed ensemble of angles with the algorithm you described. The angles will be biased toward the corners of your n-dimensional hypercube.
This can be fixed by eliminating any points with distance greater than 1 from the origin. Then you're dealing with a spherical rather than a cubical (n-dimensional) volume, and your set of angles should then be uniformly distributed over the sample space.
Pseudocode:
Let n be the number of dimensions, K the desired number of vectors:
vec_count=0
while vec_count < K
generate n uniformly distributed values a[0..n-1] over [-1, 1]
r_squared = sum over i=0,n-1 of a[i]^2
if 0 < r_squared <= 1.0
b[i] = a[i]/sqrt(r_squared) ; normalize to length of 1
add vector b[0..n-1] to output list
vec_count = vec_count + 1
else
reject this sample
end while
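The pseudocode translates almost line for line into Python. The function name random_unit_vectors and the seed parameter are my own additions.

```python
import math
import random

def random_unit_vectors(n, k, seed=None):
    """Return k unit vectors in n dimensions using rejection sampling."""
    rng = random.Random(seed)
    vectors = []
    while len(vectors) < k:
        # Candidate point, uniform in the cube [-1, 1]^n.
        a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        r_squared = sum(v * v for v in a)
        # Keep only points inside the unit ball (and not at the origin).
        if 0.0 < r_squared <= 1.0:
            r = math.sqrt(r_squared)
            vectors.append([v / r for v in a])  # normalize to length 1
    return vectors

vectors = random_unit_vectors(3, 100, seed=42)
```

As the other answer points out, the share of accepted samples shrinks quickly as n grows, so this is only practical for a small number of dimensions.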

There is a boost implementation of the algorithm that samples from normal distributions: random::uniform_on_sphere

I had the exact same question when also developing a ML algorithm.
I got to the same conclusion as Jim Lewis after drawing samples for the 2-d case and plotting the resulting distribution of the angle.
Furthermore, if you try to derive the density distribution for the direction in 2D when you draw at random from [-1,1] for the x- and y-axis, you will see that:
f_X(x) = 1/(4*cos²(x)) if 0° < x < 45°
and
f_X(x) = 1/(4*sin²(x)) if x > 45°
where x is the angle, and f_X is the probability density distribution.
I have written about this here:
https://aerodatablog.wordpress.com/2018/01/14/random-hyperplanes/

#define SCL1 (M_SQRT2/2)
#define SCL2 (M_SQRT2*2)
// unitrand in [-1,1].
double u = SCL1 * unitrand();
double v = SCL1 * unitrand();
double w = SCL2 * sqrt(1.0 - u*u - v*v);
double x = w * u;
double y = w * v;
double z = 1.0 - 2.0 * (u*u + v*v);

Confused by the DDA algorithm, need some help

I need help with the DDA algorithm. I'm confused by a tutorial on the DDA algorithm which I found online; here is the link to that tutorial:
http://i.thiyagaraaj.com/tutorials/computer-graphics/basic-drawing-techniques/1-dda-line-algorithm
Example:
xa,ya=>(2,2)
xb,yb=>(8,10)
dx=6
dy=8
xincrement=6/8=0.75
yincrement=8/8=1
1) for(k=0;k<8;k++)
xincrement=0.75+0.75=1.50
yincrement=1+1=2
1=>(2,2)
2) for(k=1;k<8;k++)
xincrement=1.50+0.75=2.25
yincrement=2+1=3
2=>(3,3)
Now I want to ask how this line came about: xincrement=0.75+0.75=1.50, when it is written in the theory that
"If the slope is greater than 1 ,the roles of x any y at the unit y intervals Dy=1 and compute each successive y values.
Dy=1
m= Dy / Dx
m= 1/ ( x2-x1 )
m = 1 / ( xk+1 – xk )
xk+1 = xk + ( 1 / m )
"
Shouldn't it be xincrement = x1 (which is 2) + 0.75 = 2.75?
Or am I understanding it wrong? Can anyone please teach me how it's done?
Thanks a lot)

There seems to be a bit of confusion here.
To start with, let's assume 0 <= slope <= 1. In this case, you advance one pixel at a time in the X direction. At each X step, you have a current Y value. You then figure out whether the "ideal" Y value is closer to your current Y value, or to the next larger Y value. If it's closer to the larger Y value, you increment your current Y value. Phrased slightly differently, you figure out whether the error in using the current Y value is greater than half a pixel, and if it is you increment your Y value.
If slope > 1, then (as mentioned in your question) you swap the roles of X and Y. That is, you advance one pixel at a time in the Y direction, and at each step determine whether you should increment your current X value.
Negative slopes work pretty much the same, except you decrement instead of incrementing.

Pixel locations are integer values. Ideal line equations are in real numbers. So line drawing algorithms convert the real numbers of a line equation into integer values. The hard and slow way to draw a line would be to evaluate the line equation at each x value on your array of pixels. Digital Differential Analyzers optimize that process in a number of ways.
First, DDAs take advantage of the fact that at least one pixel is known, the start of the line. From that pixel, the DDAs calculate the next pixel in the line, until they reach the end point of the line.
Second, DDAs take advantage of the fact that along either the x or y axis, the next pixel in the line is always the next integer value towards the end of the line. DDAs figure out which axis by evaluating the slope. Positive slopes between 0 and 1 will increment the x value by 1. Positive slopes greater than one will increment the y value by 1. Negative slopes between -1 and 0 will increment the x value by -1, and negative slopes less than -1 will increment the y value by -1.
Third, DDAs take advantage of the fact that if the change in one direction is 1, the change in the other direction is a function of the slope. Now it becomes much more difficult to explain in generalities. Therefore I'll just consider positive slopes between 0 and 1. In this case, to find the next pixel to plot, x is incremented by 1, and the change in y is calculated. One way to calculate the change in y is to just add the slope to the previous y, and round to the integer value. This doesn't work unless you maintain the y value as a real number. Slopes greater than one can just increment y by 1, and calculate the change in x.
Fourth, some DDAs further optimize the algorithm by avoiding floating point calculations. For example, Bresenham's line algorithm is a DDA optimized to use integer arithmetic.
In this example, a line from (2, 2) to (8, 10), the slope is 8/6, which is greater than 1. The first pixel is at (2, 2). The next pixel is calculated by incrementing the y value by 1, and adding the change in x (the inverse slope, dx/dy = 6/8 = .75) to x. The value of x would be 2.75, which is rounded to 3, and (3, 3) is plotted. The third pixel would increment y again, and then add the change in x to x (2.75 + .75 = 3.5). Rounding would plot the third pixel at (4, 4). The fourth pixel would then plot (4, 5), since y would be incremented by 1, but x would be incremented by .75, and equal 4.25, which rounds to 4.
From this example, can you see the problem with your code?
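For comparison, here is a small floating-point DDA sketch in Python that keeps the real-valued x and y between steps and rounds only when plotting. The function name dda_line is hypothetical, and rounding halves upward with floor(v + 0.5) is my choice so that 3.5 becomes 4 as in the walk-through above.

```python
import math

def dda_line(xa, ya, xb, yb):
    """Return the pixels of a line from (xa, ya) to (xb, yb) using a simple DDA."""
    dx, dy = xb - xa, yb - ya
    # Step along the axis with the larger change, one pixel at a time.
    steps = max(abs(dx), abs(dy))
    if steps == 0:
        return [(xa, ya)]
    x_inc, y_inc = dx / steps, dy / steps

    x, y = float(xa), float(ya)
    pixels = []
    for _ in range(steps + 1):
        # Round only for plotting; x and y themselves stay real-valued.
        pixels.append((math.floor(x + 0.5), math.floor(y + 0.5)))
        x += x_inc
        y += y_inc
    return pixels

print(dda_line(2, 2, 8, 10))
# [(2, 2), (3, 3), (4, 4), (4, 5), (5, 6), (6, 7), (7, 8), (7, 9), (8, 10)]
```

The increments 0.75 and 1 are added to the running x and y values, not to each other, which is where the tutorial's trace in the question goes wrong.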

Equation to calculate different speeds for fade animation

I'm trying to add a fade effect to my form by manually changing the opacity of the form but I'm having some trouble calculating the correct value to increment by the Opacity value of the form.
I know I could use the AnimateWindow API but it's showing some unexpected behavior and I'd rather do it manually anyways as to avoid any p/invoke so I could use it in Mono later on.
My application supports speeds ranging from 1 to 10. And I've manually calculated that for a speed of 1 (slowest) I should increment the opacity by 0.005 and for a speed of 10 (fastest) I should increment by 0.1. As for the speeds between 1 and 10, I used the following expression to calculate the correct value:
double opSpeed = (((0.1 - 0.005) * (10 - X)) / (1 - 10)) + 0.1; // X = [1, 10]
I thought this would give me a linear value and that that would be OK. However, for X equal to 4 and above, it's already too fast, more than it should be. I mean, for speeds between 7 and 10 I barely see a difference, and the animation speeds for these values should be a little more spaced out.
Note that I still want the fastest increment to be 0.1 and the slowest 0.005. But I need all the others to be linear between them.
What am I doing wrong?
It actually makes sense why it works like this. For a fixed interval between increments, say a few milliseconds, and with the equation above, if X = 10 then opSpeed = 0.1, and if X = 5 then opSpeed = 0.047. If we think about this, a value of 0.1 will loop 10 times and a value of 0.047 will loop just about double that. For such a small interval of just a few milliseconds, the difference between these values is not enough to differentiate speeds from 5 to 10.

I think what you want is:
0.005 + ((0.1-0.005)/9)*(X-1)
for X ranging from 1-10
This gives a linear scale corresponding to 0.005 when X = 1 and 0.1 when X = 10
After the comments below, I'm also including my answer fit for a geometric series instead of a linear scale.
0.005 * (20^((X-1)/9))
Results in a geometric variation corresponding to 0.005 when X = 1 and 0.1 when X = 10
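To compare the two scales side by side, a short Python sketch follows. The function names linear_step and geometric_step are made up. The frame counts assume the opacity goes from 0 to 1 in equal increments, which is my reading of the question rather than something stated in it.

```python
import math

def linear_step(speed, slow=0.005, fast=0.1):
    """Opacity increment on a linear scale: slow at speed 1, fast at speed 10."""
    return slow + (fast - slow) / 9.0 * (speed - 1)

def geometric_step(speed, slow=0.005, fast=0.1):
    """Opacity increment on a geometric scale: each speed multiplies the
    previous increment by the same factor, (fast / slow) ** (1 / 9)."""
    return slow * (fast / slow) ** ((speed - 1) / 9.0)

for speed in range(1, 11):
    lin = linear_step(speed)
    geo = geometric_step(speed)
    # Number of increments needed to go from opacity 0 to 1.
    print(speed, round(lin, 4), math.ceil(1 / lin), round(geo, 4), math.ceil(1 / geo))
```

On the linear scale the number of increments drops steeply over the first few speeds and then barely changes, which matches the behaviour described in the question. The geometric scale shrinks it by the same ratio at every speed.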
After much more discussion, as seen in the comments below, the updates are as follows.
@Nazgulled found the following relation between my geometric series and the manual values he actually needed to ensure smooth fade animation.
The relationship was as follows:
Which means a geometric/exponential series is the way to go.
After I spent hours trying to fit a curve to the right-hand graph and derive a proper equation, @Nazgulled informed me that Wolfram|Alpha does that. Seriously amazing. :)
Wolfram Alpha link
He should have what he wants now, barring very high error from the equation above.

Your problem stems from the fact that the human eye is not linear in its response; to be precise, the eye does not register the difference between a luminosity of 0.05 to 0.10 to be the same as the luminosity difference between 0.80 and 0.85. The whole topic is complicated; you may want to search for the phrase "gamma correction" for some additional information. In general, you'll probably want to find an equation which effectively "gamma corrects" for human ocular response, and use that as your fading function.

It's not really an answer, but I'll just point out that everyone who's posted so far, including the original question, are all posting the same equation. So with four independent derivations, maybe we should assume that the equation was probably correct.
I did the algebra, but here's the code to verify (in Python, btw, with offsets added to separate the curves):
from pylab import *
X = arange(1, 10, .1)
opSpeed0 = (((0.1 - 0.005) * (10 - X)) / (1 - 10)) + 0.1 # original
opSpeed1 = 0.005 + ((0.1-0.005)/9)*(X-1) # Suvesh
opSpeed2 = 0.005*((10-X)/9.) + 0.1*(X-1)/9. # duffymo
a = (0.1 - 0.005) / 9 #= 0.010555555555... # Roger
b = 0.005 - a #= -0.00555555555...
opSpeed3 = a*X+b
nonlinear01 = 0.005*2**((2*(-1 + X))/9.)*5**((-1 + X)/9.)
plot(X, opSpeed0)
plot(X, opSpeed1+.001)
plot(X, opSpeed2+.002)
plot(X, opSpeed3+.003)
plot(X, nonlinear01)
show()
Also, at Nazgulled's request, I've included the non-linear curve suggested by Suvesh (which also, btw, looks quite a lot like a gamma correction curve, as suggested by McWafflestix). Suvesh's nonlinear equation is in the code as nonlinear01.

Here's how I'd program that linear relationship. But first I'd like to make clear what I think you're doing.
You want the rate of change in opacity to be a linear function of speed:
o(v) = o1*N1(v) + o2*N2(v) so that 0 <= v <=1 and o(v1) = o1 and o(v2) = o2.
If we choose N1(v) to equal 1-v and N2(v) = v we end up with what you want:
o(v) = o1*(1-v) + o2*v
So, plugging in your values:
v = (u-1)/(10-1) = (u-1)/9
o1 = 0.005 and o2 = 0.1
So the function should look like this:
o(u) = 0.005*{1-(u-1)/9} + 0.1*(u-1)/9
o(u) = 0.005*{(9-u+1)/9} + 0.1*(u-1)/9
o(u) = 0.005*{(10-u)/9} + 0.1*(u-1)/9
You can simplify this until you get a simple formula for o(u) where 1 <= u <= 10. Should work fine.

If I understand what you're after, you want the equation of a line which passes through these two points in the plane: (1, 0.005) and (10, 0.1). The general equation for such a line (as long as it is not vertical) is y = ax+b. Plug the two points into this equation and solve the resulting set of two linear equations to get
a = (0.1 - 0.005) / 9 = 0.010555555555...
b = 0.005 - a = -0.00555555555...
Then, for each integer x = 1, 2, 3, ..., 10, plug x into y = ax+b to compute y, the value you want.
