I have an assignment problem at hand and am wondering how suitable it would be to apply local search techniques to reach a desirable solution (the search space is quite large).
I have a directed graph (a flow chart) that I would like to visualize on a 2-D plane in a way that is clear, understandable, and easy for a human to read. To do that, I will assign an (x, y) position to each vertex. I'm thinking of solving this problem using simulated annealing, genetic algorithms, or any similar method you can suggest.
Input: A graph G = (V,E)
Output: A set of assignments, {(xi, yi) for each vi in V}. In other words, each vertex will be assigned a position (x, y) where the coordinates are all integers and >= 0.
These are the criteria that I will use to judge a solution (I welcome any suggestions):
Number of intersecting edges should be minimal,
All edges flow in one direction (i.e. from left to right),
High angular resolution (the smallest angle formed by two edges incident on the same vertex),
Small area - least important.
Furthermore, I have an initial configuration (an assignment of positions to vertices) made by hand. It is very messy, which is why I'm trying to automate the process.
My questions are,
How wise would it be to go with local search techniques? How likely
would it produce a desired outcome?
And what should I start with? Simulated annealing, genetic algorithms
or something else?
Should I seed randomly at the beginning or use the initial
configuration to start with?
Or, if you already know of a similar implementation/pseudo-code/thing, please point me to it.
Any help will be greatly appreciated. Thanks.
EDIT: It doesn't need to be fast or run in real time. Furthermore, |V|=~200 and each vertex has about 1.5 outgoing edges on average. The graph has no disconnected components, and it does contain cycles.
I would suggest looking at http://www.graphviz.org/Theory.php since graphviz is one of the leading open source graph visualizers.
Depending on what the assignment is, maybe it would make sense to use graphviz for the visualization altogether.
This paper is a pretty good overview of the various approaches. Roberto Tamassia's book is also a good bet.
http://oreilly.com/catalog/9780596529321 - In this book you might find implementation of genetic algorithm for fine visualization of 2D graph.
In similar situations I prefer using a genetic algorithm. You can start with a randomly initialized population; in my experience, after a few iterations you'll find a quite good (though not the best) solution.
Also, if you use Java you can parallelize this algorithm (the isolated islands strategy), which is a rather efficient improvement.
I'd also advise you to look at the differential evolution algorithm. In my experience it finds a solution much more quickly than genetic optimization.
function String generateGenetic()
String genetic = "";
for each vertex in your graph
Generate random x and y;
String xy = Transform x and y to a fixed-length bit string;
genetic += xy;
endfor
return genetic;
Write a function double evaluate(String genetic) which gives you a level of satisfaction (probably based on how many edges intersect and on the edge directions).
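your program:
int population = 1000;
int max_iterations = 1000;
int count = 0; // iteration counter (was never declared or incremented)
double satisfaction = 0;
String[] genetics = new String[population]; // this is your population
for (int i = 0; i < population; i++)
    genetics[i] = generateGenetic();
while ((satisfaction < 0.8) && (count < max_iterations)) {
    for (int i = 0; i < population; i++) {
        double score = evaluate(genetics[i]); // evaluate once per individual
        if (score > satisfaction)
            satisfaction = score;
        else
            genetics[i] = manipulate(genetics[i]); // strings are immutable, so keep the result
    }
    count++;
}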
The function manipulate can flip one or more bits of the string, change the portion that encodes the x and y of a vertex, generate a completely new genetic string, or try to fix a specific problem in it (for example, the direction of an edge).
To answer your first question, I must say it depends. It depends on a number of different factors such as:
How fast it needs to be (does it need to be done in real-time?)
How many vertices there are
How many edges there are compared to the number of vertices (i.e. is it a dense or sparse graph?)
If it needs to be done in real time, then local search techniques would not be the best choice, as they can take a while to run before producing a good result. They would only be fast enough if the graph was small. And if it's small, you shouldn't need local search in the first place.
There are already algorithms out there for rendering graphs as you describe. The question is, at which point does the problem grow too big for them to be effective? I don't know the answer to that question, but I'm sure you could do some research to find out.
Now going on to your questions about implementation of a local search.
From my personal experience, simulated annealing is easier to implement than a genetic algorithm. However I think this problem translates nicely into both settings. I would start with SA though.
For simulated annealing, you would start out with a random configuration. Then you can randomly perturb the configuration by moving one or more vertices some random distance. I'm sure you can complete the details of the algorithm.
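The perturb-and-accept loop described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the cost function (crossings plus a penalty for edges that do not point left to right), its weights `w_cross` and `w_back`, the grid size, and the cooling schedule are all hypothetical choices, not something given in the question.

```python
import math
import random

def segments_cross(p, q, r, s):
    """True if segments pq and rs properly cross (shared endpoints do not count)."""
    def orient(a, b, c):
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (orient(p, q, r) * orient(p, q, s) < 0
            and orient(r, s, p) * orient(r, s, q) < 0)

def cost(pos, edges, w_cross=10.0, w_back=3.0):
    """Assumed cost: weighted edge crossings plus edges not going left to right."""
    total = 0.0
    for i, (a, b) in enumerate(edges):
        if pos[b][0] <= pos[a][0]:
            total += w_back
        for u, v in edges[i + 1:]:
            if len({a, b, u, v}) == 4 and segments_cross(pos[a], pos[b], pos[u], pos[v]):
                total += w_cross
    return total

def anneal(pos, edges, grid=20, steps=20000, t0=5.0, alpha=0.9995, seed=0):
    """Move one random vertex per step on an integer grid; keep the best layout seen."""
    rng = random.Random(seed)
    pos = dict(pos)
    cur = cost(pos, edges)
    best, best_cost = dict(pos), cur
    nodes = list(pos)
    t = t0
    for _ in range(steps):
        v = rng.choice(nodes)
        old = pos[v]
        cand = (min(grid, max(0, old[0] + rng.randint(-2, 2))),
                min(grid, max(0, old[1] + rng.randint(-2, 2))))
        if cand in pos.values():          # keep vertices on distinct grid cells
            continue
        pos[v] = cand
        new = cost(pos, edges)
        # Always accept improvements; accept worse layouts with Boltzmann probability.
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            cur = new
            if cur < best_cost:
                best, best_cost = dict(pos), cur
        else:
            pos[v] = old                  # undo the rejected move
        t *= alpha                        # geometric cooling
    return best, best_cost
```

Recomputing the full cost on every step is quadratic in the number of edges; for a graph of a few hundred vertices an incremental update that only re-checks the edges touching the moved vertex would likely be worth adding.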
For a genetic algorithm approach, you can also start out with a random population (each graph has random coordinates for the vertices). A mutation can be like the perturbation in SA algorithm I described. Recombination can simply be taking random vertices from the parents and using them in the child graph. Again, I'm sure you can fill in the blanks.
To sum up: use local search only if your graph is big enough to warrant it and if you don't need it to finish very quickly (say, in less than a few seconds). Otherwise use a different algorithm.
EDIT: In light of your graph parameters, I think you can just use whichever algorithm is easiest to code. With V=200, even an O(V^3) algorithm would be fast enough. Personally I feel that simulated annealing would be the easiest and best route.
Related
I am certain there is already some algorithm that does what I need, but I am not sure what phrase to Google, or what is the algorithm category.
Here is my problem: I have a polyhedron made up by several contacting blocks (hyperslabs), i. e. the edges are axis aligned and the angles between edges are 90°. There may be holes inside the polyhedron.
I want to break up this concave polyhedron into as few convex, rectangular, axis-aligned whole blocks as possible (if the original polyhedron is convex and has no holes, then it is already such a block, and therefore the solution). To illustrate, here are some 2-D images I made (but I need the solution for 3-D, and preferably N-D):
I have this geometry:
One possible breakup into blocks is this:
But the one I want is this (with as few blocks as possible):
I have the impression that an exact algorithm may be too expensive (is this problem NP-hard?), so an approximate algorithm is suitable.
One detail that may make the problem easier, so that a more appropriate or specialized algorithm could exist for it, is that all edge lengths are multiples of some fixed value (you may think of all edge lengths as integers, or of the geometry as made up of uniform tiny squares, or voxels).
Background: this is the structured grid discretization of a PDE domain.
What algorithm can solve this problem? What class of algorithms should I
search for?
Update: Before you upvote this answer, I want to point out that it is slightly off-topic. The original poster has a question about the decomposition of a polyhedron whose faces are axis-aligned. Given such a polyhedron, the question is how to decompose it into convex parts, in 3D and possibly nD. My answer is about the decomposition of a general polyhedron. So when I point to a given implementation, it does apply to the special case of an axis-aligned polyhedron, but there might be a better implementation for that special case. And when my answer says that a problem for generic polyhedra is NP-hard, there might still be a polynomial solution for the special case of axis-aligned polyhedra. I do not know.
Now here is my (slightly off-topic) answer, below the horizontal rule...
The CGAL C++ library has an algorithm that, given a 2D polygon, can compute the optimal convex decomposition of that polygon. The method is mentioned in the part 2D Polygon Partitioning of the manual. The method is named CGAL::optimal_convex_partition_2. I quote the manual:
This function provides an implementation of Greene's dynamic programming algorithm for optimal partitioning [2]. This algorithm requires O(n^4) time and O(n^3) space in the worst case.
In the bibliography of that CGAL chapter, the article [2] is:
[2] Daniel H. Greene. The decomposition of polygons into convex parts. In Franco P. Preparata, editor, Computational Geometry, volume 1 of Adv. Comput. Res., pages 235–259. JAI Press, Greenwich, Conn., 1983.
It seems to be exactly what you are looking for.
Note that the same chapter of the CGAL manual also mentions an approximation (hence not optimal) that runs in O(n): CGAL::approx_convex_partition_2.
Edit, about the 3D case:
In 3D, CGAL has another chapter about Convex Decomposition of Polyhedra. The second paragraph of the chapter says "this problem is known to be NP-hard [1]". The reference [1] is:
[1] Bernard Chazelle. Convex partitions of polyhedra: a lower bound and worst-case optimal algorithm. SIAM J. Comput., 13:488–507, 1984.
CGAL has a method CGAL::convex_decomposition_3 that computes a non-optimal decomposition.
I have the feeling your problem is NP-hard. I suggest a first step might be to break the figure into sub-rectangles along all hyperplanes. So in your example there would be three hyperplanes (lines) and four resulting rectangles. Then the problem becomes one of recombining rectangles into larger rectangles to minimize the final number of rectangles. Maybe 0-1 integer programming?
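A rough 2-D sketch of that two-stage idea follows. It cuts the shape along every distinct coordinate and then recombines the cells, but it swaps in a simple greedy merge in place of the 0-1 integer program, so the result is not guaranteed to be minimal. The input format (a list of `(x0, y0, x1, y1)` rectangles whose union is the shape) and both function names are assumptions made for the example.

```python
def elementary_cells(rects):
    """Cut the union of axis-aligned rectangles along every distinct x and y
    coordinate; return the sorted cuts and the occupied cells as (i, j) indices."""
    xs = sorted({x for x0, _, x1, _ in rects for x in (x0, x1)})
    ys = sorted({y for _, y0, _, y1 in rects for y in (y0, y1)})
    cells = set()
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            # A cell is occupied if its centre lies inside any input rectangle.
            cx = (xs[i] + xs[i + 1]) / 2
            cy = (ys[j] + ys[j + 1]) / 2
            if any(x0 < cx < x1 and y0 < cy < y1 for x0, y0, x1, y1 in rects):
                cells.add((i, j))
    return xs, ys, cells

def greedy_merge(cells):
    """Greedy heuristic: grow a block from the smallest remaining cell, first
    along the first axis, then along the second. Returns blocks as (i, j, w, h)."""
    remaining = set(cells)
    blocks = []
    while remaining:
        i, j = min(remaining)
        w = 1
        while (i + w, j) in remaining:
            w += 1
        h = 1
        while all((i + k, j + h) in remaining for k in range(w)):
            h += 1
        for di in range(w):
            for dj in range(h):
                remaining.remove((i + di, j + dj))
        blocks.append((i, j, w, h))
    return blocks
```

For an L-shape given as `[(0, 0, 2, 1), (0, 1, 1, 2)]` this produces three elementary cells and merges them back into two blocks. The same cell grid could serve as the variable set of an integer program if an exact answer is needed.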
I think dynamic programming might be your friend.
The first step I see is to divide the polyhedron into a trivial collection of blocks such that every possible face is available (i.e. slice and dice it into the smallest pieces possible). This should be trivial because everything is an axis aligned box, so k-tree like solutions should be sufficient.
This seems reasonable because I can look at its cost. The cost of doing this is that I "forget" the original configuration of hyperslabs, choosing to replace it with a new set of hyperslabs. The only way this could lead me astray is if the original configuration had something to offer for the solution. Given that you want an "optimal" solution for all configurations, we have to assume that the original structure isn't very helpful. I don't know if it can be proven that this original information is useless, but I'm going to make that assumption in this answer.
The problem has now been reduced to a graph problem similar to a constrained spanning forest problem. I think the most natural way to view the problem is to think of it as a graph coloring problem (as long as you can avoid confusing it with the more famous graph coloring problem of trying to color a map without two states of the same color sharing a border). I have a graph of nodes (small blocks), each of which I wish to assign a color (which will eventually be the "hyperslab" which covers that block). I have the constraint that I must assign colors in hyperslab shapes.
Now a key observation is that not all possibilities must be considered. Take the final colored graph we want to see. We can partition this graph in any way we please by breaking any hyperslab which crosses the partition into two pieces. However, not every partition is meaningful. The only partitions that make sense are axis aligned cuts, which always break a hyperslab into two hyperslabs (as opposed to any more complicated shape which could occur if the cut was not axis aligned).
Now this cut is the reverse of the problem we're really trying to solve. That cutting is actually what we did in the first step, whereas what we want is the optimal merging algorithm that undoes those cuts. However, this shows a key feature we will use in dynamic programming: the only features that matter for merging are on the exposed surface of a cut. Once we find the optimal way of forming the central region, it generally doesn't play a part in the algorithm.
So let's start by building a collection of hyperslab-spaces, which can define not just a plain hyperslab, but any configuration of hyperslabs such as those with holes. Each hyperslab-space records:
The number of leaf hyperslabs contained within it (this is the number we are eventually going to try to minimize)
The internal configuration of hyperslabs.
A map of the surface of the hyperslab-space, which can be used for merging.
We then define a "merge" rule to turn two or more adjacent hyperslab-spaces into one:
Hyperslab-spaces may only be combined into new hyperslab-spaces (so you need to combine enough pieces to create a new hyperslab, not some more exotic shape)
Merges are done simply by comparing the surfaces. If there are features with matching dimensionalities, they are merged (because it is trivial to show that, if the features match, it is always better to merge hyperslabs than not to)
Now this is enough to solve the problem with brute force. The solution will be NP-complete for certain. However, we can add an additional rule which will drop this cost dramatically: "One hyperslab-space is deemed 'better' than another if they cover the same space, and have exactly the same features on their surface. In this case, the one with fewer hyperslabs inside it is the better choice."
Now the idea here is that, early on in the algorithm, you will have to keep track of all sorts of combinations, just in case they are the most useful. However, as the merging algorithm makes things bigger and bigger, it will become less likely that internal details will be exposed on the surface of the hyperslab-space. Consider
+===+===+===+---+---+---+---+
| : : A | X : : : :
+---+---+---+---+---+---+---+
| : : B | Y : : : :
+---+---+---+---+---+---+---+
| : : | : : : :
+===+===+===+ +---+---+---+
Take a look at the left side box, which I have taken the liberty of marking in stronger lines. When it comes to merging boxes with the rest of the world, the AB:XY surface is all that matters. As such, there are only a handful of merge patterns which can occur at this surface
No merges possible
A:X allows merging, but B:Y does not
B:Y allows merging, but A:X does not
Both A:X and B:Y allow merging (two independent merges)
We can merge a larger square, AB:XY
There are many ways to cover the 3x3 square (at least a few dozen). However, we only need to remember the best way to achieve each of those merge processes. Thus once we reach this point in the dynamic programming, we can forget about all of the other combinations that can occur, and only focus on the best way to achieve each set of surface features.
In fact, this sets up the problem for an easy greedy algorithm which explores whichever merges provide the best promise for decreasing the number of hyperslabs, always remembering the best way to achieve a given set of surface features. When the algorithm is done merging, whatever that final hyperslab-space contains is the optimal layout.
I don't know if it is provable, but my gut instinct thinks that this will be an O(n^d) algorithm where d is the number of dimensions. I think the worst case solution for this would be a collection of hyperslabs which, when put together, forms one big hyperslab. In this case, I believe the algorithm will eventually work its way into the reverse of a k-tree algorithm. Again, no proof is given... it's just my gut instinct.
You can try a constrained delaunay triangulation. It gives very few triangles.
Are you able to determine the equations for each line?
If so, maybe you can get the intersection points between those lines. Then, if you take one axis and sweep along it looking for a value shared by more than two points, that is where you should "draw" a line. (At the beginning of the sweep there will be zero points, then two (your first pair), and when you find more than two points you will be able to determine which points belong to the first polygon and which to the second one.)
Eg, if you have those lines:
verticals (red):
x = 0, x = 2, x = 5
horizontals (yellow):
y = 0, y = 2, y = 3, y = 5
and you start to sweep along the X axis, you will get p1 and p2 (and we know which line equations they belong to), then you will get p3, p4, p5 and p6. Here you can check which of those points share a line with p1 and p2: in this case p4 and p5. So your first new polygon is p1, p2, p4, p5.
Now we save the 'new' pair of points (p3, p6) and continue with the sweep until the next points. Here we have p7,p8,p9 and p10, looking for the points which share the line of the previous points (p3 and p6) and we get p7 and p10. Those are the points of your second polygon.
When we repeat the exercise for the Y axis, we will get two points (p3, p7) and then just three (p1, p2, p8). In this case we should use the farthest point (p8) on the same line as the newly discovered point.
As we are only using line equations and points, the procedure should be very similar in two or more dimensions.
ps, sorry for my english :S
I hope this helps :)
I am trying to solve the following problem:
I have a 2D tiled game which consists of airplanes flying in the airspace, trying to land at the nearest airport (there can be 'n' goals). The idea is to make the planes search for the best path by themselves, avoiding collisions.
So I was going to try the A* algorithm, but then I found this other restriction: The planes can change their altitude if they need to. So I had the idea to implement the same philosophy of A*, but in 3D (of expanding nodes to the possible moves, letting the plane move also up, down, down-north, up-east, etc., making an abstract 3D to handle a relative altitude, and thus letting the algorithm find the best path with 3D moves).
About the heuristic: I discarded the Manhattan distance because I wanted the algorithm to be more efficient (a good heuristic makes the search more efficient, and Manhattan overestimates the cost because I am using diagonal moves). So I decided to implement the diagonal distance (which combines aspects of both Manhattan and Euclidean), recommended for 8-adjacency (expanding nodes in diagonal moves as well). But I have many more adjacencies, so I was trying to adapt the diagonal distance formulas to 16-adjacency (excluding the up and down diagonals such as up-northeast, down-southwest, and so on), so that the Manhattan estimate for every 'diagonal move' (except those I mention) has the same cost (1 diagonal move = 2 orthogonal moves, not 3 as it would be for the "up and down diagonals" I have excluded). With that, the formulas for this heuristic generalize like this:
Let node A be the start, and B the goal, and their respective locations be (xa,ya,za) and (xb,yb,zb)
numberOfDiagonalSteps = min{|xa-xb|,|ya-yb|,|za-zb|}
manhattanDistance = |xa-xb| + |ya-yb| + |za-zb|
numberOfStraightSteps = manhattanDistance - 2*numberOfDiagonalSteps
And assuming diagonal steps cost sqrt(3) (by Pythagoras, with orthogonal steps costing 1):
The heuristic is: h(n) = numberOfStraightSteps + sqrt(3)*numberOfDiagonalSteps
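For reference, the formulas above transcribe directly into a small function. This is only a literal transcription of what is written in the question, not a check that the heuristic is admissible for the move set described; the `diagonal_cost` parameter is a hypothetical addition so that other per-step costs can be tried.

```python
import math

def heuristic(a, b, diagonal_cost=math.sqrt(3)):
    """h(n) as written above, for cells a=(xa, ya, za) and b=(xb, yb, zb)."""
    dx, dy, dz = (abs(p - q) for p, q in zip(a, b))
    diagonal_steps = min(dx, dy, dz)
    manhattan = dx + dy + dz
    straight_steps = manhattan - 2 * diagonal_steps
    return straight_steps + diagonal_cost * diagonal_steps
```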
Well... one of my questions is that, as planes are moving ("obstacle nodes"), the algorithm has to be refreshing, re-executing, so, what do you recommend me to do best?
I mean... is it better to try it like that, or better try to implement the D*-Lite?
My other question is about time complexity. It is clear that the worst case for these algorithms is exponential, and that a good heuristic can improve on it considerably. But I can't see how the algorithm in my problem can be analyzed precisely. What time complexity can I give for that algorithm, or what do you suggest I do in my case?
Thank you for your attention.
I would use simple map filling; see:
http://en.wikipedia.org/wiki/Dijkstra's_algorithm
https://www.allegro.cc/forums/thread/599046
but the map will have more layers (flight altitudes). There can be just few of them (to limit time/memory wasting) for example 8 layers should be enough for up to 128 airplanes.
Of course, it depends on the size of the 2D map area. After filling the map, just take the shortest path from it. While filling the map, treat every other plane as an obstacle (with some border around it for safety). With this algorithm you can very simply add a fuel consumption criterion or any other.
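A minimal sketch of map filling over a layered grid might look like this. It assumes unit cost for every move (so a breadth-first fill is enough; a priority queue as in Dijkstra's algorithm would be needed for fuel or other weighted costs), moves along one axis at a time, and a `blocked` set already containing the other planes and their safety border. The function names and the cell representation `(x, y, layer)` are made up for the example.

```python
from collections import deque

STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def fill_map(size, goals, blocked):
    """Flood outward from every goal cell; returns {cell: steps to nearest goal}."""
    nx, ny, nz = size
    dist = {g: 0 for g in goals}
    queue = deque(goals)
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in STEPS:
            c = (x + dx, y + dy, z + dz)
            inside = 0 <= c[0] < nx and 0 <= c[1] < ny and 0 <= c[2] < nz
            if inside and c not in blocked and c not in dist:
                dist[c] = dist[(x, y, z)] + 1
                queue.append(c)
    return dist

def path_from(start, dist):
    """Walk downhill on the filled map from start to a goal; None if unreachable."""
    if start not in dist:
        return None
    path = [start]
    while dist[path[-1]] > 0:
        x, y, z = path[-1]
        neighbours = [(x + dx, y + dy, z + dz) for dx, dy, dz in STEPS]
        path.append(min((n for n in neighbours if n in dist), key=dist.get))
    return path
```

Because the fill starts from all airports at once, the path a plane follows ends at whichever airport is closest in steps.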
Airfield selection can also be very simple this way (first come, first served: whoever asks first gets the closest one). You need a map for each airplane at decision time (or you can refill the same one for each plane separately). It does not need to be the whole map, just the area between the destination and the plane.
If you have to obey air traffic regulations, then you need to apply flight plans plus ad hoc scheduling instead. That is not an easy task (it took me almost half a year to code it), and air traffic control is also a bit complex, especially the waiting queues in the air and field sharing on the ground. Everything must be dynamically changeable (weather, waits, technical/political or security issues, ...), but I strongly doubt this is the case, so the simple map filling above should do :)
Here is the problem:
I have many sets of points, and want to come up with a function that can take one set and rank matches based on their similarity to the first. Scaling, translation, and rotation do not matter, and some points may be missing from any of the sets of points. The best match is the one that if scaled and translated in the ideal way has the least mean square error between points (maybe with a cap on penalty, or considering only the best fraction of points to handle missing points).
I am trying to come up with a good way to do this, and am wondering if there are any well known algorithms that can handle this type of problem? Just the name of something would be awesome! I lack a formal CSCI or math education, and am doing the best to teach myself.
A few things I have tried
The first thing that comes to mind is to normalize the points somehow, but I don't think that this is helpful because the missing points may throw things off.
The best way I can think of is to estimate a starting point by translating to match their centroids, scaling so that the largest distances from the centroid of the sets match. From there, do an A* search, scaling, rotating, and translating until I reach a maximum, and then compare the two sets. (I hope I am using the term A* correctly, I mean trying small translations and scalings and selecting the move giving the best match) I think this will find the global maximum most of the time, but is not guaranteed to. I am looking for a better way that will always be correct.
Thanks a ton for the help! It has been fun and interesting trying to figure this out so far, so I hope it is for you as well.
There's a very clever algorithm for identifying starfields. You find 4 points in a diamond shape and then using the two stars farthest apart you define a coordinate system locating the other two stars. This is scale and rotation invariant because the locations are relative to the first two stars. This forms a hash. You generate several of these hashes and use those to generate candidates. Once you have the candidates you look for ones where multiple hashes have the correct relationships.
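A rough sketch of such a hash for one group of four 2-D points is shown below. It is an illustration of the idea only, not astrometry.net's actual code: the frame convention (farthest pair mapped to (0,0) and (1,0)) and the tie-breaking rules are assumptions, and reflections are not handled.

```python
import itertools

def quad_hash(points):
    """Translation-, rotation- and scale-invariant code for four 2-D points.
    The two farthest points define the frame; the hash holds the coordinates
    of the other two points in that frame."""
    pts = [complex(x, y) for x, y in points]
    i, j = max(itertools.combinations(range(4), 2),
               key=lambda ij: abs(pts[ij[0]] - pts[ij[1]]))
    a, b = pts[i], pts[j]
    rest = [pts[k] for k in range(4) if k not in (i, j)]
    # Complex division applies the translation, rotation and scaling in one step.
    z = [(p - a) / (b - a) for p in rest]
    # Swapping a and b maps z to 1 - z; pick one of the two conventions.
    if z[0].real + z[1].real > 1:
        z = [1 - w for w in z]
    z.sort(key=lambda w: (w.real, w.imag))
    return (z[0].real, z[0].imag, z[1].real, z[1].imag)
```

In practice the four numbers would be quantized or looked up with a tolerance (for example in a kd-tree), since measured positions are noisy.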
This is described in a paper and a presentation on http://astrometry.net/ .
This paper may be useful: Shape Matching and Object Recognition Using Shape Contexts
Edit:
There are a couple of relatively simple methods to solve the problem:
Combine all possible pairs of points (one from each set) into nodes, connect those nodes whose distances match in both sets, then solve the maximum clique problem on this graph. Since the clique problem is NP-complete, the complexity is probably O(exp(n^2)), so if you have too many points, don't use this algorithm directly; use an approximation.
Use the generalised Hough transform to match the two sets of points. This approach has lower complexity (O(n^4)), but it is more complicated, so I cannot explain it here.
You can find the details in computer vision books, for example "Machine vision: theory, algorithms, practicalities" by E. R. Davies (2005).
I'm asking this question out of curiosity, since my quick and dirty implementation seems to be good enough. However, I'm curious what a better implementation would be.
I have a graph of real-world data. There are no duplicate X values and the X value increments at a consistent rate across the graph, but the Y data is based on real-world output. I want to programmatically find the nearest point on the graph to an arbitrary given point P. I'm trying to find an efficient (i.e. fast) algorithm for doing this. I don't need the exact closest point; I can settle for a point that is 'nearly' the closest point.
The obvious lazy solution is to increment through every single point in the graph, calculate the distance, and then find the minimum of the distance. This however could theoretically be slow for large graphs; too slow for what I want.
Since I only need an approximate closest point I imagine the ideal fastest equation would involve generating a best fit line and using that line to calculate where the point should be in real time; but that sounds like a potential mathematical headache I'm not about to take on.
My solution is a hack that works only because I assume my point P isn't arbitrary: I assume that P will usually be close to my graph line, and when that happens I can rule out the distant X values. I calculate how close the point on the line that shares its X coordinate with P is, and use the distance between that point and P to calculate the largest and smallest X values that could possibly belong to closer points.
I can't help but feel there should be a faster algorithm than my solution (which is only useful because I assume that 99% of the time my point P will already be close to the line). I tried googling for better algorithms but found so many that didn't quite fit that it was hard to find what I was looking for amongst all the clutter. So, does anyone here have a suggested algorithm that would be more efficient? Keep in mind I don't need a full algorithm since what I have works for my needs; I'm just curious what the proper solution would have been.
If you store the [x,y] points in a quadtree you'll be able to find the closest one quickly (something like O(log n)). I think that's the best you can do without making assumptions about where the point is going to be. Rather than repeat the algorithm here have a look at this link.
Your solution is pretty good. By examining how the points vary in y, couldn't you calculate a bound on the number of points along the x axis you need to examine, instead of using an arbitrary one?
Let's say your point P=(x,y) and your real-world data is a function y=f(x)
Step 1: Calculate r=|f(x)-y|.
Step 2: Find points in the interval I=(x-r,x+r)
Step 3: Find the closest point in I to P.
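The three steps can be sketched with a binary search over the sorted x values. One assumption differs slightly from Step 1: when P's x coordinate does not coincide with a sample, the radius r is taken as the full distance from P to the sample nearest in x, which is still a valid bound on how far away in x a closer point can be. The function name and argument layout are made up for the example.

```python
import bisect
import math

def nearest_point(xs, ys, px, py):
    """Index of the sample closest to (px, py); xs must be sorted ascending."""
    # Step 1: sample nearest in x, and its distance r to P.
    k = min(bisect.bisect_left(xs, px), len(xs) - 1)
    if k > 0 and abs(xs[k - 1] - px) <= abs(xs[k] - px):
        k -= 1
    r = math.hypot(xs[k] - px, ys[k] - py)
    # Step 2: only samples with x in [px - r, px + r] can be closer.
    lo = min(bisect.bisect_left(xs, px - r), k)
    hi = max(bisect.bisect_right(xs, px + r), k + 1)
    # Step 3: exact search inside that window.
    return min(range(lo, hi), key=lambda i: math.hypot(xs[i] - px, ys[i] - py))
```

The window is small when P lies near the curve, which matches the asker's assumption; for a P far from the curve it can grow to cover most of the data.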
If you can use a data structure, some common data structures for spatial searching (including nearest neighbour) are...
quad-tree (and octree etc).
kd-tree
bsp tree (only practical for a static set of points).
r-tree
The r-tree comes in a number of variants. It's very closely related to the B+ tree, but with (depending on the variant) different orderings on the items (points) in the leaf nodes.
The Hilbert R tree uses a strict ordering of points based on the Hilbert curve. The Hilbert curve (or rather a generalization of it) is very good at ordering multi-dimensional data so that nearby points in space are usually nearby in the linear ordering.
In principle, the Hilbert ordering could be applied by sorting a simple array of points. The natural clustering in this would mean that a search would usually only need to search a few fairly-short spans in the array - with the complication being that you need to work out which spans they are.
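As an illustration of sorting a plain array into Hilbert order, here is the commonly used iterative conversion from 2-D grid coordinates to a position along the curve. It assumes integer coordinates in `[0, n)` with `n` a power of two; real-valued data would first have to be quantized onto such a grid, and the helper name `hilbert_sort` is made up for the example.

```python
def hilbert_index(n, x, y):
    """Distance along the Hilbert curve of cell (x, y) on an n-by-n grid,
    where n is a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_sort(points, n):
    """Return the integer grid points ordered along the Hilbert curve."""
    return sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))
```

Consecutive positions along the curve are always neighbouring cells, which is where the clustering behaviour comes from. The converse does not hold: some neighbouring cells are far apart on the curve, hence the need to search several spans.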
I used to have a link for a good paper on doing the Hilbert curve ordering calculations, but I've lost it. An ordering based on Gray codes would be simpler, but not quite as efficient at clustering. In fact, there's a deep connection between Gray codes and Hilbert curves - that paper I've lost uses Gray code related functions quite a bit.
EDIT - I found that link - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.133.7490
Given are two sets of three-dimensional points, a source and a destination set. The number of points on each set is arbitrary (may be zero). The task is to assign one or no source point to every destination point, so that the sum of all distances is minimal. If there are more source than destination points, the additional points are to be ignored.
There is a brute-force solution to this problem, but since the number of points may be big, it is not feasible. I heard this problem is easy in 2D with equal set sizes, but sadly these preconditions are not given here.
I'm interested in both approximations and exact solutions.
Edit: Haha, yes, I suppose it does sound like homework. Actually, it's not. I'm writing a program that receives the positions of a large number of cars, and I'm trying to map them to their respective parking cells. :)
One way you could approach this problem is to treat it as the classical assignment problem: http://en.wikipedia.org/wiki/Assignment_problem
You treat the points as the vertices of the graph, and the weights of the edges are the distance between points. Because the fastest algorithms assume that you are looking for maximum matching (and not minimum as in your case), and that the weights are non-negative, you can redefine weights to be e.g.:
weight(A, B) = bigNumber - distance(A, B)
where bigNumber is bigger than your longest distance.
Obviously you end up with a bipartite graph. Then you use one of the standard algorithms for maximum weighted bipartite matching (lots of resources on the web, e.g. http://valis.cs.uiuc.edu/~sariel/teach/courses/473/notes/27_matchings_notes.pdf or Wikipedia for an overview: http://en.wikipedia.org/wiki/Perfect_matching#Maximum_bipartite_matchings). This way you end up with an O(NM max(N,M)) algorithm, where N and M are the sizes of your sets of points.
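As a concrete sketch, here is a compact version of the Hungarian method with potentials, written in plain Python. Two things differ from the description above and are assumptions of this example: it minimizes total distance directly, so the bigNumber transformation is not needed, and it puts the smaller point set on the rows so that surplus points on the larger side simply stay unmatched. The function names are made up.

```python
import math

def min_cost_assignment(cost):
    """Match every row to a distinct column at minimum total cost.
    Requires len(cost) <= len(cost[0]). Returns (total, [(row, col), ...])."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row and column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                              # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    pairs = [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j]]
    return sum(cost[r][c] for r, c in pairs), pairs

def match_points(sources, destinations):
    """Returns (total distance, sorted [(source index, destination index), ...])."""
    rows_are_sources = len(sources) <= len(destinations)
    rows, cols = (sources, destinations) if rows_are_sources else (destinations, sources)
    if not rows:
        return 0.0, []
    cost = [[math.dist(r, c) for c in cols] for r in rows]
    total, pairs = min_cost_assignment(cost)
    return total, sorted((r, c) if rows_are_sources else (c, r) for r, c in pairs)
```

Building the full cost matrix takes O(NM) memory, so for very large inputs it may be worth pruning candidate pairs first (for example by only considering each point's nearest few neighbours).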
Off the top of my head, spatial sort followed by simulated annealing.
Grid the space & sort the sets into spatial cells.
Solve the O(NM) problem within each cell, then within cell neighborhoods, and so on, to get a trial matching.
Finally, run lots of cycles of simulated annealing, in which you randomly alter matches, so as to explore the nearby space.
This is heuristic, getting you a good answer though not necessarily the best, and it should be fairly efficient due to the initial grid sort.
Although I don't really have an answer to your question, I can suggest looking into the following topics. (I know very little about this, but encountered it previously on Stack Overflow.)
Nearest Neighbour Search
kd-tree
Hope this helps a bit.