How to generate Bad Random Numbers

I'm sure the opposite has been asked many times but I couldn't find any answers on how to generate bad random numbers.
I want to write a small program for cluster analysis and want to generate some random points for testing. If I just inserted 1000 points with random coordinates, they would be scattered all over the field, which would make a cluster analysis worthless.
Is there a simple way to generate Random Numbers which build clusters?
I already thought about using random()*random() instead of random(), which generates normally distributed numbers (I think I read this somewhere here on Stack Overflow).
Second approach would be picking a few areas at random and run the point generation again in this area which would of course produce a cluster in this area.
Do you have a better idea?

If you are deliberately producing well formed clusters (rather than completely random clusters), you could combine the two to find a cluster center, and then put lots of points around it in a normal distribution.
As well as working in Cartesian coordinates (x, y), you could use a radial method to distribute points for a particular cluster. Choose a random angle (0 to 2π radians), then choose a radius.
Note that because circumference is proportional to radius, points drawn with a uniformly random radius will be denser close to the centre, while the number of points at each radius will be the same on average. Modify the radial distribution to produce a more tightly packed cluster.
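A minimal Python sketch of the radial method described above. The function name `radial_cluster` and the parameters `max_radius` and `power` are made up for illustration; `power` is one possible way to "modify the radial distribution", not something the answer specifies.

```python
import math
import random

def radial_cluster(cx, cy, max_radius, n, power=1.0, rng=random):
    """Return n points around (cx, cy) using a random angle and radius."""
    points = []
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        # power=1.0: radius uniform in [0, max_radius], denser near the centre.
        # power>1.0 packs points tighter; power=0.5 gives uniform area density.
        radius = max_radius * rng.random() ** power
        points.append((cx + radius * math.cos(angle),
                       cy + radius * math.sin(angle)))
    return points
```

Calling `radial_cluster(10.0, 20.0, 5.0, 200)` once per cluster centre gives one clump per call; all points stay within `max_radius` of the centre.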
OR you could use real world derived data for semi-random point distributions with natural clustering. Recently I've been doing quite a bit of geospatial cluster analysis. For this I have used real world data - zipcode centroids (which form natural clusters around cities); and restaurant locations. Another suggestion: you could use a stellar catalogue or galactic catalogue.

Generate a few anchors using true random numbers. Then generate noise around them:
anchor + dist * (random() - 0.5)
This will generate clustered numbers that are evenly distributed within dist/2 on either side of each anchor.
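The anchor-plus-noise idea might look like this in Python for 2D points. This is only a sketch: the function name `clustered_points` and the `size` parameter (side length of the square field) are assumptions, not part of the answer.

```python
import random

def clustered_points(n_anchors, per_anchor, size, dist, rng=random):
    """Scatter per_anchor points in a dist-wide square around each random anchor."""
    points = []
    for _ in range(n_anchors):
        ax, ay = rng.uniform(0, size), rng.uniform(0, size)
        for _ in range(per_anchor):
            # random() - 0.5 lies in [-0.5, 0.5), so offsets span dist in total.
            points.append((ax + dist * (rng.random() - 0.5),
                           ay + dist * (rng.random() - 0.5)))
    return points
```

Because the noise is uniform, each cluster comes out as a filled square with hard edges rather than a soft blob.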

Add an additional dimension to your model.
Draw an irregular (i.e. not flat) surface.
Generate numbers in the extended space.
Discard all numbers which are on one side of the surface.
From every number left, drop the additional dimension.
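The five steps above can be sketched in Python as follows. The `surface` function here is entirely hypothetical (two bumps chosen for illustration); any irregular surface over the unit square would do.

```python
import math
import random

def surface(x, y):
    """Hypothetical irregular surface: two bumps, values between 0 and 1."""
    bump_a = math.exp(-((x - 0.25) ** 2 + (y - 0.25) ** 2) / 0.01)
    bump_b = math.exp(-((x - 0.75) ** 2 + (y - 0.6) ** 2) / 0.02)
    return max(bump_a, bump_b)

def sample_under_surface(n, rng=random):
    """Generate (x, y, z), keep points below the surface, then drop z."""
    kept = []
    while len(kept) < n:
        x, y, z = rng.random(), rng.random(), rng.random()
        if z < surface(x, y):      # discard everything above the surface
            kept.append((x, y))    # drop the additional dimension
    return kept
```

Points survive most often where the surface is high, so the kept (x, y) pairs clump around the two bump centres. Most candidates are rejected, which is the price of this approach.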

Maybe I have misunderstood, but the GNU Scientific Library (written in C) has many distributions built in. Could you not pick coordinates from the Gaussian, Poisson, etc. distributions in that library?
http://www.gnu.org/software/gsl/manual/html_node/Random-Number-Distributions.html
They provide a simple example with the Poisson distribution from the link, too.
If you need your distribution to be bounded (for example y-coordinate not less than -1) then you can achieve that by rejection sampling from the uniform distribution in the gsl.
Blessings, Tom

My first thought was that you could implement your own using a linear congruential generator and experiment with the coefficients until you get a low enough period to suit your needs. A really low m coefficient should do the trick.
I also like your second idea of running a good RNG around a few pre-selected points to create clusters. You could either target specific areas for the clusters with this method, or generate those randomly as well.
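For the linear congruential generator idea, a small Python sketch with a deliberately tiny modulus. The coefficients a=5, c=3, m=16 are just example values; with them the sequence repeats after 16 draws.

```python
def lcg(seed, a=5, c=3, m=16):
    """Yield values from x -> (a*x + c) mod m; a tiny m forces a short period."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x
```

Pairing consecutive outputs as (x, y) coordinates means the points can only ever land on a handful of positions. Whether that pattern is useful as test clusters is something you would have to check by experiment.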

Related

How can I get a random distribution that "clusters" objects?

I'm working on a game, and I want to place some objects randomly throughout the world. However, I want the objects to be "clustered" in clumps. Is there any random distribution that clusters like this? Or is there some other technique I could use?
Consider using a bivariate normal (a.k.a. Gaussian) distribution. Generate separate normal values for the X and Y location. Bivariate normals are denser towards the center and sparser farther out, so your choice of standard deviation determines how tight the clustering is: along each axis, about 68% of the items will be within 1 standard deviation of the distribution's center, 95% within 2 standard deviations, and almost all within 3 standard deviations.
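A minimal sketch of that approach using Python's standard `random.gauss`, with independent X and Y values and the same standard deviation on both axes. The function name `gaussian_cluster` is made up for this example.

```python
import random

def gaussian_cluster(cx, cy, sigma, n, rng=random):
    """Return n points with independent normal X and Y offsets around (cx, cy)."""
    return [(rng.gauss(cx, sigma), rng.gauss(cy, sigma)) for _ in range(n)]
```

A smaller `sigma` gives a tighter clump; calling the function once per clump centre gives several clusters.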

Algorithms for finding a look alike face?

I'm doing a personal project of trying to find a person's look-alike given a database of photographs of other people all taken in a consistent manner - people looking directly into the camera, neutral expression and no tilt to the head (think passport photo).
I have a system for placing markers for 2d coordinates on the faces and I was wondering if there are any known approaches for finding a look alike of that face given this approach?
I found the following facial recognition algorithms:
http://www.face-rec.org/algorithms/
But none deal with the specific task of finding a look-alike.
Thanks for your time.
I believe you can also try searching for "Face Verification" rather than just "Face Recognition". This might give you more relevant results.
Strictly speaking, the 2 are actually different things in scientific literature but are sometimes lumped under face recognition. For details on their differences and some sample code, take a look here: http://www.idiap.ch/~marcel/labs/faceverif.php
However, for your purposes, what others such as Edvard and Ari have kindly suggested would work too. Basically they are suggesting a K-nearest neighbor style face recognition classifier.
As a start, you can probably try that. First, compute a feature vector for each of your face images in your database. One possible feature to use is the Local Binary Pattern (LBP). You can find the code by googling it. Do the same for your query image. Now, loop through all the feature vectors and compare them to that of your query image using euclidean distance and return the K nearest ones.
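The search loop described above could be sketched like this in Python. It assumes the feature vectors (for example LBP histograms) have already been computed elsewhere; the function name `k_nearest` and the dictionary layout are illustrative choices.

```python
import math

def k_nearest(query, database, k):
    """Return the k (distance, name) pairs closest to query by Euclidean distance.

    database maps a name to a feature vector of the same length as query.
    """
    scored = [(math.dist(query, vec), name) for name, vec in database.items()]
    return sorted(scored)[:k]
```

The first entry of the result is the best look-alike candidate; the rest are runners-up in order of increasing distance.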
While the above method is easy to code, it will generally not be as robust as some of the more sophisticated ones, because it tends to fail badly when faces are not aligned (known as unconstrained pose; search for "Labelled Faces in the Wild" to see state-of-the-art results for this problem) or are taken under different environmental conditions. But if the faces in your database are aligned and taken under similar conditions as you mentioned, then it might just work. If they are not aligned, you can use the face key points, which you mentioned you are able to compute, to align the faces. In general, comparing faces which are not aligned is a very difficult problem in computer vision and is still a very active area of research. But if you only consider faces that look alike and are in the same pose to be similar (i.e. similar in pose as well as looks), then this shouldn't be a problem.
The website you gave has links to the code for Eigenfaces and Fisherfaces. These are essentially 2 methods for computing feature vectors for your face images. Faces are identified by doing a K nearest neighbor search for faces in the database with feature vectors (computed using PCA and LDA respectively) closest to that of the query image.
I should probably also mention that in the Fisherfaces method, you will need to have "labels" for the faces in your database to identify the faces. This is because Linear Discriminant Analysis (LDA), the classification method used in Fisherfaces, needs this information to compute a projection matrix that will project feature vectors for similar faces close together and dissimilar ones far apart. Comparison is then performed on these projected vectors. Here lies the difference between Face Recognition and Face Verification: for recognition, you need to have "labels" for the training images in your database, i.e. you need to identify them.
For verification, you are only trying to tell whether any 2 given faces are of the same person. Often, you don't need the "labelled" data in the traditional sense (although some methods might make use of auxiliary training data to help in the face verification).
The code for computing Eigenfaces and Fisherfaces is available in OpenCV in case you use it.
As a side note:
A feature vector is actually just a vector in the linear algebra sense. It is simply n numbers packed together. The word "feature" refers to something like a "statistic", i.e. a feature vector is a vector containing statistics that characterize the object it represents. For example, for the task of face recognition, the simplest feature vector would be the intensity values of the grayscale image of the face. In that case, I just reshape the 2D array of numbers into an n-row by 1-column vector, each entry containing the value of one pixel. The pixel value here is the "feature", and the n x 1 vector of pixel values is the feature vector. In the LBP case, roughly speaking, it computes a histogram over small patches of pixels in the image and joins these histograms together into one histogram, which is then used as the feature vector. So the Local Binary Pattern is the statistic and the joined histograms are the feature vector. Together they describe the "texture" and facial patterns of your face.
Hope this helps.
These two seem like equivalent problems to me, but I do not work in the field. You essentially have the following two problems:
1. Face recognition: Take a face and try to match it to a person.
2. Find similar faces: Take a face and try to find similar faces.
Aren't these equivalent? In (1) you start with a picture that you want to match to the owner and you compare it to a database of reference pictures for each person you know. In (2) you pick a picture in your reference database and run (1) for that picture against the other pictures in the database.
Since the algorithms seem to give you a measure of how likely two pictures belong to the same person, in (2) you just sort the measures in decreasing order and pick the top hits.
I assume you should first analyze all the pictures in your database with whatever approach you are using. You should then have a set of metrics for each picture, which you can compare a specific picture against to statistically find the closest match.
For example, if you can measure the distance between the eyes, you can find faces that have the same distance. You can then find the face that has the overall closest match and return that.

gis polygon map overlay intersection operation

There are many algorithms for binary map overlay operations in vector data format, which take two map layers and produce a resultant (overlaid) layer as output. I am wondering whether there are any algorithms that take more than two layers, say three, simultaneously and produce the overlay result?
There are a variety of geographic computational overlay procedures available for multiple layers. These fall into the group of multiple criteria decision analysis, whereby multiple criteria (map) layers are standardized and combined (overlaid) to produce a resulting (map) layer. However, many of these are for raster data inputs!
If in fact you just want to combine vector data to produce an intersection, a procedural model would work best, as @Thomas has commented. This can be done via Python (standalone) or with Model Builder inside ArcGIS. Also, there are other methods that can be used to script the procedural overlay process.
I would like you to think about what exactly you're aiming to do. Let's think about the following scenarios:
You have a vector polygon of some City, and your goal is to overlay all the industrial, residential and commercial land usage. This would leave you to subtract the different land uses from your City polygon, one by one. Or, you can merge your three land uses into one polygon and subtract from your City polygon.
Given the wide range of multiple criteria decision analysis methodologies (e.g. weighted linear combination), a raster methodology might be suitable if you're looking for the "optimal location". For instance, if you were looking for a location in the City that has an optimal combination of industrial, commercial and retail land use, weighted linear combination could be used.
Let us define our land use weights as 20%, 40%, 40% (industrial, commercial, retail). We must also standardize our land use layer values to between 0 and 1. A location where all three layers have the maximum value of 1 gives the optimal combination of the three criteria: 0.2 + 0.4 + 0.4 = 1.
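As a rough illustration of weighted linear combination on raster layers, here is a small Python sketch. It assumes each layer is a list of rows already standardized to the range 0 to 1; the function name and the tiny 2x2 layers in the usage note are invented for the example.

```python
def weighted_linear_combination(layers, weights):
    """Combine equally sized raster layers (values in 0..1) cell by cell."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[sum(w * layer[r][c] for w, layer in zip(weights, layers))
             for c in range(cols)]
            for r in range(rows)]
```

With weights `[0.2, 0.4, 0.4]` for industrial, commercial and retail layers, a cell where all three layers are 1 scores 1, and the highest-scoring cell in the output is the candidate "optimal location".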

Algorithm for following the path of ridges on a 3D image

I'm trying to find an algorithm (or algorithm ideas) for following a ridge on a 3D image, derived from a digital elevation model (DEM). I've managed to get very basic program working which just iterates across each row of the image marking a ridge line wherever it finds a large change in aspect (ie. from < 180 degrees to > 180 degrees).
However, the lines this produces aren't brilliant, there are often gaps and various strange artefacts. I'm hoping to try and extend this by using some sort of algorithm to follow the ridge lines, thus producing lines that are complete (that is, no gaps) and more accurate.
A number of people have mentioned snake algorithms to me, but they don't seem to be quite what I'm looking for. I've also done a lot of searching about path-finding algorithms, but again, they don't seem to be quite the right thing.
Does anyone have any suggestions for types of algorithms or specific algorithms I should look at?
Update: I've been asked to add some more detail on the exact area I'll be applying this to. It's working with gridded elevation data of sand dunes. I'm trying to extract the crests of these sand dunes, which look similar to the boundaries between drainage basins, but can be far more complex (for example, there can be multiple sand dunes very close to each other with gradually merging crests).
You can get a good estimate of the ridges using sign changes of the curvature. Note that the radius of curvature (1/curvature) approaches infinity in flat regions and is small where the surface bends sharply. Hence possible pseudo-code for a ridge detection algorithm could be:
for each face in the mesh
    compute 1/curvature
    if abs(1/curvature) < zeroTolerance
        flag face as ridge
    else
        continue
(zeroTolerance is a number near but not equal to zero, e.g. 0.003)
Also Meshlab provides a module for normal & curvature estimation on most formats. You can test the idea using it, before you code it up.
I don't know what your data is like or how much automation you need. This won't work if it consists of peaks without clear ridges (but then you probably wouldn't be asking the question).
startPoint = highest point in DEM (or on ridge)
curPoint = startPoint;
line += curPoint;
Loop
    curPoint = highest point adjacent to curPoint not in line; // (Don't backtrack)
    line += curPoint;
Repeat
Curious what the real solution turns out to be.
Edited to add: depending on the coarseness of your data set, 'point' can be a single point or a smoothed average of a local region of points.
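The greedy walk above could be written in Python roughly as follows, for a DEM stored as a list of rows. The `min_height` cut-off and `max_steps` limit are additions of this sketch (the pseudo-code has no stopping rule), and 8-connected neighbours are an assumption.

```python
def follow_ridge(dem, start=None, min_height=float("-inf"), max_steps=10_000):
    """Greedily walk from start to the highest unvisited 8-neighbour each step."""
    rows, cols = len(dem), len(dem[0])
    if start is None:
        # Default start: the highest cell in the grid.
        start = max(((r, c) for r in range(rows) for c in range(cols)),
                    key=lambda rc: dem[rc[0]][rc[1]])
    line, visited = [start], {start}
    cur = start
    for _ in range(max_steps):
        r, c = cur
        neighbours = [(r + dr, c + dc)
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if (dr, dc) != (0, 0)
                      and 0 <= r + dr < rows and 0 <= c + dc < cols
                      and (r + dr, c + dc) not in visited]
        if not neighbours:
            break  # dead end: every adjacent cell is already on the line
        best = max(neighbours, key=lambda rc: dem[rc[0]][rc[1]])
        if dem[best[0]][best[1]] < min_height:
            break  # the walk has dropped off the ridge
        cur = best
        line.append(cur)
        visited.add(cur)
    return line
```

Without a sensible `min_height` the walk keeps going downhill once the ridge ends, so that value needs tuning per data set. Smoothing the grid first corresponds to the "smoothed average" note above.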
http://en.wikipedia.org/wiki/Ridge_detection
You can treat the elevation as you would a grayscale color, then use a 2D edge recognition filter. There are lots of edge recognition methods available. The best would depend on your specific needs.

What's a good way to generate random clusters and paths?

I'm toying around with writing a random map generator, and am not quite sure how to randomly generate realistic landscapes. I'm working with these sorts of local-scale maps, which presents some interesting problems.
One of the simplest cases is the forest:
                   Sparse  Medium  Dense
Typical trees      50%     70%     80%
Massive trees      -       10%     20%
Light undergrowth  50%     70%     50%
Heavy undergrowth  -       20%     50%
Trees and undergrowth can exist in the same space, so an average sparse forest has 25% typical trees and light undergrowth, 25% typical trees, 25% light undergrowth, and 25% open space. Medium and dense forests will take a bit more thinking, but it's not where my problem lies either, as it's all evenly dispersed.
My problem lies in generating clusters and paths, while keeping the percentage constraints. Marshes are a good example of this:
                   Moor  Swamp
Shallow bog        20%   40%
Deep bog           5%    20%
Light undergrowth  30%   20%
Heavy undergrowth  10%   20%
Deep bog squares are usually clustered together and surrounded by an irregular ring of shallow bog squares.
An additional map element, a hedgerow, may also be present, as well as a path of open ground, snaking through the bog. Both of these types of map elements (clusters and paths) present problems, as the total composition of the map should contain X% of the element, but it's not evenly distributed. Other elements, such as streams, ponds, and quicksand need either a cluster or path-type generation as well.
What technique can I use to generate realistic maps given these constraints?
I'm using C#, FYI (but this isn't a C#-specific question.)
Realistic "random" distribution is often done using Perlin Noise, which can give a distribution with "clumps" like you mention. It works by summing multiple layers of linearly interpolated values from random data points. Each layer (or "octave") has twice as many data points as the last and is confined to a narrower range of values. The result is a "realistic" looking random texture.
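A small Python sketch of the layered interpolation described above. Strictly speaking this is value noise with octaves (interpolating random values on a lattice) rather than Ken Perlin's original gradient noise, and the function name, starting lattice size and halving amplitude are assumptions made for illustration.

```python
import random

def value_noise(width, height, octaves=4, seed=0):
    """Sum octaves of bilinearly interpolated random lattices; returns rows of floats."""
    rng = random.Random(seed)
    field = [[0.0] * width for _ in range(height)]
    cells, amplitude = 2, 1.0
    for _ in range(octaves):
        # One random value per lattice corner: (cells + 1) x (cells + 1) of them.
        lattice = [[rng.random() for _ in range(cells + 1)] for _ in range(cells + 1)]
        for y in range(height):
            gy = y / height * cells
            y0, ty = int(gy), gy - int(gy)
            for x in range(width):
                gx = x / width * cells
                x0, tx = int(gx), gx - int(gx)
                top = lattice[y0][x0] * (1 - tx) + lattice[y0][x0 + 1] * tx
                bottom = lattice[y0 + 1][x0] * (1 - tx) + lattice[y0 + 1][x0 + 1] * tx
                field[y][x] += amplitude * (top * (1 - ty) + bottom * ty)
        cells *= 2         # each octave has twice as many data points per axis
        amplitude *= 0.5   # and is confined to a narrower range of values
    return field
```

`value_noise(50, 50)` gives a 50x50 grid of smoothly varying values that can be used as the noise map for a tile map of the same size.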
Here is a beautiful demonstration of the theory behind Perlin Noise by Hugo Elias.
Here is the first thing I found on Perlin Noise in C#.
What you can do is generate a Perlin Noise image and set a "threshold", where anything above a value is "on" and everything below it is "off". What you will end up with is clumps where things are above the threshold, which look irregular and awesome. Simply assign the ones above the threshold to where you want your terrain feature to be.
Here is a demonstration of a program generating a Perlin Noise bitmap and then adjusting the cut-off threshold over time. A clear "clumping" is visible. It could be just what you wanted.
Notice that, with a high threshold, very few points are above it, and it's sparse. But as the threshold lowers, those points "grow" into clumps (by the nature of Perlin noise), and some of these clumps will join each other, and basically create something very natural and terrain-like.
Note that you could also set the "clump factor", or the tendency of features to clump, by setting the "turbulence" of your Perlin Noise function, which basically causes peaks and valleys of your PN function to be accentuated and closer together.
Now, where to set the threshold? The higher the threshold, the lower the percentage of the feature on the final map. The lower the threshold, the higher the percentage. You can mess around with them. You could probably get exact percentages by fiddling around with a little math (it seems that the distribution of values follows a Normal Distribution; I could be wrong). Tweak it until it's just right :)
EDIT As pointed out in the comments, you can find the exact percentage by creating a cumulative histogram (index of what % of the map is under a threshold) and pick the threshold that gives you the percent you need.
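One way to pick that threshold is to sort all the cell values and read off the one at the right position, which is equivalent to looking it up in the cumulative histogram. A Python sketch, where `field` is any 2D list of floats standing in for the noise map and the function name is hypothetical:

```python
def threshold_for_coverage(field, fraction):
    """Return the threshold t such that about `fraction` of cells have value >= t."""
    values = sorted(v for row in field for v in row)
    # Number of cells that should end up "on" (at or above the threshold).
    n_on = round(fraction * len(values))
    if n_on <= 0:
        return float("inf")
    return values[len(values) - n_on]
```

Treating every cell with a value at or above the returned threshold as "on" gives the requested share of the map, up to rounding and ties between equal values.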
The coolest thing here is that you can create features that clump around certain other features (like your marsh features) trivially here -- just use the same Perlin Noise map twice -- the second time, lowering the threshold. The first one will be clumpy, and the second one will be clumpy around the same areas, but with the clumps enlarged (refer to the flash animation posted earlier).
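Using one noise map with two thresholds might be sketched like this in Python. The labels and the function name `classify_bog` are made up; `field` is again any 2D list of floats standing in for the noise map.

```python
def classify_bog(field, deep_threshold, shallow_threshold):
    """Label cells 'deep', 'shallow' or 'open' using two thresholds on one noise map."""
    if shallow_threshold > deep_threshold:
        raise ValueError("shallow_threshold should not exceed deep_threshold")

    def label(v):
        if v >= deep_threshold:
            return "deep"
        if v >= shallow_threshold:
            return "shallow"
        return "open"

    return [[label(v) for v in row] for row in field]
```

Because the noise varies smoothly, cells just below the deep threshold sit next to cells above it, so the shallow bog forms an irregular ring around each deep patch.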
As for other features like hedgerows, you could try modeling simple random walk lines that have a higher tendency to go straight than turn, and place them anywhere randomly on your perlin-based map.
samples
Here is a sample 50x50 tile Sparse Forest Map. The undergrowth is colored brown and the trees are colored blue (sorry) to make it clear which is which.
For this map I didn't compute exact thresholds to match 50%; I only set the threshold at 50% of the maximum. Statistically, this should average out to about 50%, but it might not be exact enough for your purposes; see the earlier note for how to do this.
Here is a demo of your Marsh features (not including undergrowth, for clarity), with shallow marsh in grey and deep marsh in black:
This is just 50x50, so there are some artifacts from that, but you can see how easily you can make the shallow marsh "grow" from the deep marsh -- simply by adjusting the threshold on the same Perlin map. For this one, I eyeballed the threshold level to give the most eye-pleasing results, but for your own purposes, you could do what was mentioned before.
Here is a marsh map generated from the same Perlin Noise map, but stretched out over a 250x250 tile map instead:
I've never done this sort of thing, but here are some thoughts.
You can obtain clusters by biasing random selection to locations on the grid that are close to existing elements of that type. Assign a default value of 1 to all squares. For squares with existing clustered elements, add a clustering value to adjacent squares (the higher the clustering value, the stronger the clustering will be). Then do random selection for the next element of that type using the probability distribution over all the squares.
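A possible Python sketch of this weighted selection, using `random.choices` from the standard library. The function name, the default `clustering` value and the choice of 8 adjacent squares are assumptions for the example.

```python
import random

def place_clustered(width, height, count, clustering=5.0, rng=random):
    """Place `count` elements; each placed element raises its neighbours' weights."""
    if count > width * height:
        raise ValueError("more elements than squares")
    weights = {(x, y): 1.0 for x in range(width) for y in range(height)}
    placed = []
    for _ in range(count):
        cells = list(weights)
        cell = rng.choices(cells, weights=[weights[c] for c in cells])[0]
        placed.append(cell)
        del weights[cell]  # an occupied square cannot be picked again
        x, y = cell
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (x + dx, y + dy) in weights:
                    weights[(x + dx, y + dy)] += clustering
    return placed
```

Since `count` is fixed up front, the percentage constraint is met exactly: for 20% deep bog on a 50x50 map, `count` would be 500.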
For paths, you could have a similar procedure, except that paths would be extended step-wise (probability of path is finite at squares next to the end of the path and zero everywhere else). Directional paths could be done by increasing the probability of selection in the direction of the path. Meandering paths could have a direction that changes over the course of random extension (new_direction = mf * old_direction + (1-mf) * rand_direction, where mf is a momentum factor between 0 and 1).
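The momentum formula can be tried out with a short Python sketch. Here directions are kept as unit vectors rather than angles (an assumption of this example, to avoid wrap-around at 2π), and the path is a list of float coordinates that would still need rounding to tiles.

```python
import math
import random

def meandering_path(start, steps, mf=0.8, rng=random):
    """Walk `steps` unit steps, blending the old heading with a random one."""
    x, y = start
    angle = rng.uniform(0.0, 2.0 * math.pi)
    dx, dy = math.cos(angle), math.sin(angle)
    path = [(x, y)]
    for _ in range(steps):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        # new_direction = mf * old_direction + (1 - mf) * rand_direction
        dx = mf * dx + (1 - mf) * math.cos(angle)
        dy = mf * dy + (1 - mf) * math.sin(angle)
        norm = math.hypot(dx, dy) or 1.0   # avoid dividing by zero
        dx, dy = dx / norm, dy / norm
        x, y = x + dx, y + dy
        path.append((x, y))
    return path
```

With `mf` close to 1 the path is nearly straight, and with `mf` close to 0 it behaves like an ordinary random walk.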
To expand on academicRobot's comments, you could start with a default marsh or forest seed in some of the grid cells and let them grow from the source using a correlated random number. For instance, a bog might have eight adjacent grid cells, each of which has a 90% probability of also being a bog, but a 10% probability of being something else. You can let the ecosystem form from the seed and adjust the correlation until you get something that looks right. Probably pretty easy to implement even in a spreadsheet.
You could start reading links here. I remember looking at much better document. Will post it if I find it (it was also based on L-systems).
But that's on the general side; on the particular problem you face I guess you should model it in terms of
percentages
other rules (clusters and paths)
The point is that even though you don't know how to construct the map with given properties, if you are able to evaluate the properties (clustering ratio; path niceness) and score on them, you can then brute force it or do some other problem space traversal.
If you still want to do generative approach then you will have to examine generative rules a bit closer; here's an idea that I would pursue
create patterns of different terrains and terrain covers that have required properties of 'clusterness', 'pathness' or uniformity
create the patterns in such a way that the values for deep bog are not discrete, but are assigned a probability value; after the pattern has been created you can normalize this probability in such a way that it will produce the required percentage of cover
mix different patterns together
You might have some success for certain types of area with a Voronoi pattern. I've never seen it used to create maps but I have seen it used in a number of similar fields.

Resources