Combinatorial optimization - algorithm

Suppose we have a connected and undirected graph: G=(V,E).
Definition of connected-set: a group of points belonging to V of G forms a valid connected-set iff every point in the group is within T-1 edges of every other point in the same group, where T is the number of points in the group.
Please note that a connected-set is just the vertex set of a connected subgraph of G (the points without the edges).
We also have an arbitrary function F defined on connected-sets, i.e. given an arbitrary connected-set CS, F(CS) gives us a real value.
Two connected-sets are said to be disjoint if their union is not a connected-set.
For a visual explanation, please see the graph below:
In the graph, the red, black, and green point sets are all valid connected-sets; the green set is disjoint from the red set, but the black set is not disjoint from the red one.
Now the question:
We want to find a bunch of disjoint connected-sets from G so that:
(1) every connected-set has at least K points (K is a global parameter);
(2) the sum of their function values, i.e. Σ F(CS), is maximized.
Is there any efficient algorithm to tackle such a problem other than an exhaustive search?
Thanks!
For example, the graph can be a planar graph in the 2D Euclidean plane, and the function value F of a connected-set CS can be defined as the area of the minimum bounding rectangle of all the points in CS (the minimum bounding rectangle is the smallest rectangle enclosing all the points in the CS).
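For the axis-aligned reading of this example F, a minimal sketch (assuming points are 2-D tuples; note a true minimum-area bounding rectangle may be rotated, which would require a rotating-calipers approach instead):

```python
def bounding_rect_area(points):
    # Area of the axis-aligned bounding rectangle of a set of 2-D points.
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))
```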

If you can define your function and prove that it is a Submodular Function (a property analogous to Convexity in continuous optimization), then there are very efficient (strongly polynomial) algorithms that will solve your problem, e.g. Minimum Norm Point.
To prove that your function is Submodular you only need to prove the following: F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B) for every pair of sets A and B.
There are several available implementations of the Minimum Norm Point algorithm, e.g. the Matlab Toolbox for Submodular Function Optimization.
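Before committing to this route, submodularity of a concrete F can at least be sanity-checked by brute force on a tiny ground set; the sketch below tests the inequality F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B) over all subset pairs (names are illustrative):

```python
from itertools import combinations

def is_submodular(F, ground):
    # Brute-force check of F(A) + F(B) >= F(A | B) + F(A & B)
    # over all pairs of subsets; only feasible for tiny ground sets.
    subsets = [frozenset(c)
               for r in range(len(ground) + 1)
               for c in combinations(sorted(ground), r)]
    return all(F(A) + F(B) + 1e-9 >= F(A | B) + F(A & B)
               for A in subsets for B in subsets)
```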

I doubt there is an efficient algorithm, since for a complete graph, for instance, you cannot solve the problem without knowing the value of F on every subgraph (unless you have assumptions on F: monotonicity, for instance).
Nevertheless, I'd go for a non-deterministic algorithm. Try simulated annealing, with the transitions being:
Remove a point from a set (if it stays connected)
Move a point from a set to another (if they stay connected)
Remove a set
Add a set with one point
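A minimal sketch of such an annealer, assuming the graph is an adjacency dict and using set size as a stand-in for F; the move set (add a one-point set, grow a set, drop a set) is a simplified version of the transitions above, and all names are illustrative:

```python
import math
import random
from collections import deque

def is_connected(nodes, adj):
    # BFS check that `nodes` induces a connected subgraph of `adj`.
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes

def is_disjoint_family(sets, adj):
    # "Disjoint" as defined in the question: no two sets share a node
    # and no edge joins two different sets (so no union is connected).
    owner = {}
    for i, s in enumerate(sets):
        for u in s:
            if u in owner:
                return False
            owner[u] = i
    return all(owner.get(v, i) == i
               for u, i in owner.items() for v in adj[u])

def anneal(adj, F, K, steps=2000, t0=1.0, seed=0):
    # Simulated annealing over families of disjoint connected sets.
    # Sets smaller than K are tolerated as intermediate states but
    # contribute nothing to the objective.
    rng = random.Random(seed)
    nodes = list(adj)
    sets, used, score = [], set(), 0.0
    for step in range(steps):
        t = t0 * (1.0 - step / steps) + 1e-9
        proposal = [set(s) for s in sets]
        move = rng.randrange(3)
        if move == 0 and len(used) < len(nodes):
            # add a new one-point set on an unused node
            proposal.append({rng.choice([n for n in nodes if n not in used])})
        elif move == 1 and proposal:
            # grow a random set by one unused neighbouring point
            s = rng.choice(proposal)
            frontier = [v for u in s for v in adj[u] if v not in used]
            if not frontier:
                continue
            s.add(rng.choice(frontier))
        elif move == 2 and proposal:
            # remove a whole set
            proposal.pop(rng.randrange(len(proposal)))
        else:
            continue
        if not (is_disjoint_family(proposal, adj)
                and all(is_connected(s, adj) for s in proposal)):
            continue
        new_score = sum(F(s) for s in proposal if len(s) >= K)
        # Metropolis acceptance: always take improvements, sometimes worse moves
        if new_score >= score or rng.random() < math.exp((new_score - score) / t):
            sets, score = proposal, new_score
            used = set().union(*sets) if sets else set()
    return [s for s in sets if len(s) >= K], score
```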
Good luck, this seems to be a difficult problem.

For such a general F, it is not an easy task to draft an optimized algorithm that improves much on the brute-force approach.
For instance, since we want to find a bunch of CS where F(CS) is maximized, should we assume we actually want to find max(Σ F(CS)) over all the CS, or the highest F value among all possible CS, max(F(csi))? We don't know for sure.
Also, F being arbitrary, we cannot estimate the probability that F(cs+p1) > F(cs) implies F(cs+p1+p2) > F(cs).
However, we can still discuss it:
It seems we can deduce from the problem that we can treat each CS independently, meaning that if n = F(cs1), adding any cs2 (disjoint from cs1) will have no impact on the value n.
It seems also believable, and this is where we should be able to get some gain, that the calculation of F can be started from any point of a CS, and, in general, if CS = cs1+cs2, then F(CS) = F(cs1+cs2) = F(cs2+cs1).
We then want to inject memoization into the algorithm in order to speed up the process when a CS is grown little by little in search of max(F(cs)). [With F this general, a dynamic-programming approach, for instance starting from a CS made of all the points and reducing it little by little, doesn't seem to offer much.]
Ideally, we could start with a CS made of a single point, extend it one point at a time, checking and storing the F values (for each subset). Each test would first check whether the F value already exists, so as not to recalculate it; then we repeat the process from another point, etc., and find the subsets that maximize F. For a large number of points, this is a very lengthy process.
A more reasonable approach would be to try random points and grow the CS up to a given size, then try another area distinct from the biggest CS obtained at the previous stage. One could try to assess the probability mentioned above, and steer the algorithm accordingly.
But, again due to the lack of properties of F, we can expect an exponential space requirement for memoization (storing F(p1,...,pn) for all subsets), and exponential time complexity.
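A memoization wrapper along these lines might look like the sketch below; it keys the cache on a frozenset, so F(cs1+cs2) and F(cs2+cs1) hit the same entry regardless of the order in which the CS was grown (F is any user-supplied function):

```python
def memoized(F):
    # Cache F values keyed by the order-independent set of points,
    # so growing a CS point by point never recomputes a known subset.
    cache = {}
    def wrapper(cs):
        key = frozenset(cs)
        if key not in cache:
            cache[key] = F(key)
        return cache[key]
    return wrapper
```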

I would use dynamic programming. You can start out rephrasing your problem as a node coloring problem:
Your goal is to assign a color to each node. (In other words you are looking for a coloring of the nodes)
The available colors are black and white.
In order to judge a coloring you have to examine the set of "maximal connected sets of black nodes".
A set of black nodes is called connected if the induced subgraph is connected.
A connected set of black nodes is called maximal if none of the nodes in the set has a black neighbor in the original graph that is not contained in the set.
Your goal is to find the coloring that maximizes ΣF(CS). (Here you sum over the "maximal connected sets of black nodes".)
Some extra constraints are specified in your original post.
Perhaps you could look for an algorithm that does something like the following:
Pick a node
Try to color the chosen node white
Look for a coloring of the remaining nodes that maximizes ΣF(CS)
Try to color the chosen node black
Look for a coloring of the remaining nodes that maximizes ΣF(CS)
Each time you color a node white, you can check whether or not the graph has become "decomposable" (I made up this word; it is not official):
A partially colored graph is called "decomposable" if it contains a pair of non-white nodes that are not connected by any path that avoids white nodes.
If your partially colored graph is decomposable, then you can split your problem into two sub-problems.
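A sketch of such a decomposability test, assuming the graph is an adjacency dict and the white nodes are given as a set (just a BFS restricted to non-white nodes):

```python
from collections import deque

def is_decomposable(adj, white):
    # True iff some pair of non-white nodes is only connected through
    # white nodes, i.e. removing the white nodes disconnects the rest.
    rest = [u for u in adj if u not in white]
    if len(rest) < 2:
        return False
    seen, queue = {rest[0]}, deque([rest[0]])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in white and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) < len(rest)
```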
EDIT: I added an alternative idea and deleted it again. :)

Related

Shortest path in a maze

I'm developing a game similar to Pacman: consider this maze:
Each white square is a node of the maze, where an object located at P, say X, is moving towards node A in the right-to-left direction. X cannot switch to its opposite direction unless it encounters a dead end such as A. Thus the shortest path joining P and B goes through A, because X cannot reverse its direction towards the rightmost-bottom node (call it C). A common A* algorithm would output:
to get to B from P first go rightward, then go upward;
which is wrong. So I thought: well, I can set C's visited attribute to true before running A* and let the algorithm find the path. Obviously this method doesn't work for the linked maze, unless I allow it to rediscover some nodes (the question is: which nodes? How do I discriminate the useless ones?). The first thought that crossed my mind was: use the previous method, always keeping track of the last-visited cell; if the resulting path isn't empty, you are done. Otherwise, when A* fails, go to the last-visited dead end, say Y, and from Y use standard A* to get to the goal (I'm assuming the maze is connected). My questions are: is this guaranteed to always work? Is there a more efficient algorithm, such as an A*-derived algorithm modified for this purpose? How would you tackle this problem? I would greatly appreciate an answer explaining both optimal and non-optimal search techniques (actually I don't need the shortest path, a slightly longer path is fine, but I'm curious whether an optimal algorithm running as efficiently as Dijkstra's algorithm exists; if it does, what is its running time compared to a non-optimal algorithm?).
EDIT For Valdo: I added 3 cells in order to generalize a bit: please tell me if I got the idea:
Good question. I can suggest the following approach.
Use Dijkstra's (or the A*) algorithm on a directed graph. Each cell in your maze should be represented by multiple (up to 4) graph nodes, each node denoting the cell visited in a specific state.
That is, in your example you may be in the cell denoted by P in one of 2 states: while going left, or while going right. Each of them is represented by a separate graph node (though spatially it's the same cell). There's also no direct link between those 2 nodes, since you can't switch your direction in this specific cell.
According to your rules you may only switch direction when you encounter an obstacle; this is where you put links between the nodes denoting the same cell in different states.
You may also think of your graph as your maze copied into 4 layers, each layer representing one state of your pacman. In the layer that represents movement to the right you put only links going right, consistent with the geometry of your maze. In cells with obstacles, where moving right is not possible, you put links to the same cells in different layers.
Update:
Regarding the scenario that you described in your sketch: it's actually correct, you've got the idea right, but it looks complicated because you decided to put links between different cells AND states.
I suggest the following diagram:
The idea is to separate your inter-cell and inter-state links. There are now 2 kinds of edges: inter-cell, marked in blue, and inter-state, marked in red.
Blue edges always connect nodes of the same state (arrow direction) between adjacent cells, whereas red edges connect different states within the same cell.
According to your rules, a state change is possible where an obstacle is encountered; hence every state node is the source of either blue edges (if there is no obstacle) or red ones (if it encounters an obstacle, i.e. cannot emit a blue edge). Hence I also painted the state nodes blue and red.
If, according to your rules, a state transition happens instantly, without delay or penalty, then the red edges have weight 0. Otherwise you may assign them a non-zero weight; the weight ratio between red and blue edges should correspond to the ratio of turning time to travel time.
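A sketch of the state-graph idea on a grid maze, using (cell, heading) states; since all moves here cost 1 (turning is folded into the move, i.e. zero-penalty reversal at dead ends), plain BFS suffices in place of Dijkstra. The maze representation and rule details are illustrative assumptions:

```python
from collections import deque

DIRS = {'N': (-1, 0), 'S': (1, 0), 'E': (0, 1), 'W': (0, -1)}
OPP = {'N': 'S', 'S': 'N', 'E': 'W', 'W': 'E'}

def shortest_path(open_cells, start, start_dir, goal):
    # BFS over (cell, heading) states; reversing direction is
    # only allowed when the current heading hits a dead end.
    q = deque([((start, start_dir), 0)])
    seen = {(start, start_dir)}
    while q:
        (cell, heading), dist = q.popleft()
        if cell == goal:
            return dist
        # moves that do not reverse the current heading
        moves = [(d, (cell[0] + dr, cell[1] + dc))
                 for d, (dr, dc) in DIRS.items()
                 if d != OPP[heading]
                 and (cell[0] + dr, cell[1] + dc) in open_cells]
        if not moves:  # dead end: reversing becomes legal
            moves = [(d, (cell[0] + dr, cell[1] + dc))
                     for d, (dr, dc) in DIRS.items()
                     if (cell[0] + dr, cell[1] + dc) in open_cells]
        for d, nxt in moves:
            if (nxt, d) not in seen:
                seen.add((nxt, d))
                q.append(((nxt, d), dist + 1))
    return None  # goal unreachable under the movement rules
```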

Merge adjacent vertices of a graph until single vertex left in the fewest steps possible

I have a game system that can be represented as an undirected, unweighted graph where each vertex has one (relevant) property: a color. The goal of the game in terms of the graph representation is to reduce it down to one vertex in the fewest "steps" possible. In each step, the player can change the color of any one vertex, and all adjacent vertices of the same color are merged with it. (Note that in the example below I just happened to show the user only changing one specific vertex the whole game, but the user can pick any vertex in each step.)
What I am after is a way to compute the fewest number of steps necessary to "beat" a given graph per the procedure described above, and also to provide the specific moves needed to do so. I'm familiar with the basics of path-finding, BFS, and things of that nature, but I'm having a hard time framing this problem in terms of a "fastest path" solution.
I am unable to find this same problem anywhere on Google, or even a graph-theory term that encapsulates the problem. Does anyone have an idea of at least how to get started approaching this problem? Can anyone point me in the right direction?
EDIT Since this problem seems to be really difficult to solve efficiently, perhaps I could change the aim of my question. Could someone describe how I would even set up a brute force, breadth first search for this? (Brute force could possibly be okay, since in practice these graphs will only be 20 vertices at most.) I know how to write a BFS for a normal linked graph data structure... but in this case it seems quite weird since each vertex would have to contain a whole graph within itself, and the next vertices in the search graph would have to be generated based on possible moves to make in the graph within the vertex. How would one setup the data structure and search algorithm to accomplish this?
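For concreteness, one way such a search might be set up (a sketch, not the real game code): each search state is simply the tuple of vertex colors on the original graph, so regions stay implicit as maximal monochromatic components, and a move recolors one whole region; BFS over these states finds the minimum number of moves:

```python
from collections import deque

def component(adj, colors, v):
    # Maximal monochromatic component containing v under `colors`.
    target, seen, stack = colors[v], {v}, [v]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen and colors[w] == target:
                seen.add(w)
                stack.append(w)
    return seen

def min_moves(adj, colors, palette):
    # BFS over colorings of vertices 0..n-1; each move floods one
    # region with a new color. Feasible only for small graphs.
    n = len(colors)
    start = tuple(colors)
    if len(component(adj, start, 0)) == n:
        return 0
    q, seen = deque([(start, 0)]), {start}
    while q:
        state, d = q.popleft()
        done = set()
        for v in range(n):
            if v in done:
                continue
            comp = component(adj, state, v)
            done |= comp
            for c in palette:
                if c == state[v]:
                    continue
                nxt = list(state)
                for u in comp:
                    nxt[u] = c
                nxt = tuple(nxt)
                if len(component(adj, nxt, v)) == n:
                    return d + 1  # one region left: solved
                if nxt not in seen:
                    seen.add(nxt)
                    q.append((nxt, d + 1))
    return None
```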
EDIT 2 This is an old question, but I figured it might help to just state outright what the game was. The game was essentially to be a rip-off of Kami 2 for iOS, except my custom puzzle editor would automatically figure out the quickest possible way to solve your puzzle, instead of having to find the shortest move number by trial and error yourself. I'm not sure if Kami was a completely original game concept, or if there is a whole class of games like it with the same "flood-fill" mechanic that I'm unaware of. If this is a common type of game, perhaps knowing the name of it could allow finding more literature on the algorithm I'm seeking.
EDIT 3 This Stack Overflow question seems like it may have some relevant insights.
Intuitively, the solution seems global. If you take a larger graph, for example, which dot you select first will have an impact on the direct neighbours which will have an impact on their neighbours and so on.
It sounds as if it were of the same breed of problems as the map colouring problem. Not because of the colours but because of the implications of a local selection to the other end of the graph down the road. In the map colouring, you have to decide what colour to draw a country and its neighbouring countries so two countries that touch don't have the same colour. That first set of selections have an impact on whether there is a solution in the subsequent iterations.
Just to show how complex the problem is:
Let's consider a simpler problem where the graph is replaced by a tree, and only the root vertex can change colour. In that case the path to a leaf can be represented as the sequence of colours of the vertices on that path. A sequence A of colour changes collapses a leaf iff the leaf's sequence is a subsequence of A.
The problem can then be stated as: for a given set of sequences, find a minimal-length sequence S such that each initial sequence is contained in S. That is called the shortest common supersequence problem, and it is NP-complete.
Your problem is surely more complex than this one :-/
Edit *
This is a comment on the question's edit. Check this page for the terms.
The number of minimal possible moves is >= the graph radius. Given that, it seems a good strategy to:
use central vertices for moves,
use moves that reduce the graph radius, or at least reduce the distance from the central vertices to a 'large' set of vertices.
I would go with a strategy that keeps track of the central vertices and of the distances from all graph vertices to these central vertices. A step is to check all meaningful moves and choose the one that reduces the radius, or the distance to the central vertices, the most. I think BFS can be used for the distance calculations and for evaluating how a move influences them. There are tricky parts, like when the central vertices change after a move. Maybe it is a good idea to use not only the central vertices but also vertices close to the centre.
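The radius and central vertices mentioned above can be computed with one BFS per vertex; a sketch, assuming a connected adjacency-dict graph:

```python
from collections import deque

def radius_and_center(adj):
    # Eccentricity of each vertex via BFS; the radius is the minimum
    # eccentricity and the center is the set of vertices attaining it.
    def ecc(s):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    eccs = {v: ecc(v) for v in adj}
    r = min(eccs.values())
    return r, [v for v in adj if eccs[v] == r]
```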
I think the graph term you are looking for is the "valence" of a node (more commonly called its degree), which is the number of edges the node is connected to. It looks like you want to change the color of the node that has the highest valence. Then, in the resulting graph, change the color of the node that has the highest valence, and so on, until you have just one node left.

How to select points at a regular density

How do I select a subset of points at a regular density? More formally:
Given
a set A of irregularly spaced points,
a metric of distance dist (e.g., Euclidean distance),
and a target density d,
how can I select a smallest subset B that satisfies the condition below?
for every point x in A,
there exists a point y in B
which satisfies dist(x,y) <= d
My current best shot is to
start with A itself
pick out the closest (or just a particularly close) pair of points
randomly exclude one of them
repeat as long as the condition holds
then repeat the whole procedure, keeping the best result. But are there better ways?
I'm trying to do this with 280,000 18-D points, but my question is in general strategy. So I also wish to know how to do it with 2-D points. And I don't really need a guarantee of a smallest subset. Any useful method is welcome. Thank you.
bottom-up method
select a random point
select, among the unselected points, the y for which min(d(x,y) for x in selected) is largest
keep going!
I'll call this one bottom-up and the one I originally posted top-down. Bottom-up is much faster in the beginning, so for sparse sampling it should be better?
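The bottom-up selection above might be sketched like this (brute-force distances, so only for small inputs; all names are illustrative):

```python
import random

def bottom_up_select(points, k, seed=0):
    # Greedy farthest-point ("bottom-up") sampling: repeatedly pick the
    # unselected point whose distance to the current selection is largest.
    rng = random.Random(seed)
    def d2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    pts = list(points)
    selected = [pts.pop(rng.randrange(len(pts)))]  # random first point
    while pts and len(selected) < k:
        y = max(pts, key=lambda p: min(d2(p, s) for s in selected))
        pts.remove(y)
        selected.append(y)
    return selected
```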
performance measure
If guarantee of optimality is not required, I think these two indicators could be useful:
radius of coverage: RC = max over y in unselected of min(d(x,y) for x in selected)
radius of economy: RE = min over y in selected of min(d(x,y) for x in selected, x != y)
RC is the smallest d that the current selection satisfies, and there is no absolute inequality between the two, but RC <= RE is more desirable.
my little methods
For a little demonstration of that "performance measure," I generated 256 2-D points distributed uniformly or by standard normal distribution. Then I tried my top-down and bottom-up methods with them. And this is what I got:
RC is red, RE is blue. The X axis is the number of selected points. Did you think bottom-up could be as good? I thought so while watching the animation, but it seems top-down is significantly better (look at the sparse region). Nevertheless, bottom-up is not too bad, given that it is much faster.
Here I packed everything.
http://www.filehosting.org/file/details/352267/density_sampling.tar.gz
You can model your problem with graphs: treat the points as nodes, and connect two nodes with an edge if their distance is smaller than d. Now you should find the minimum number of vertices such that they, together with their connected vertices, cover all nodes of the graph; this is the minimum vertex cover problem (which is NP-hard in general), but you can use a fast 2-approximation: repeatedly take both endpoints of an edge into the vertex cover, then remove them from the graph.
P.S.: of course you should select the nodes which are fully disconnected from the graph. After removing these nodes (i.e. selecting them), your problem is vertex cover.
A genetic algorithm would probably produce good results here.
update:
I have been playing a little with this problem and these are my findings:
A simple method (call it random-selection) to obtain a set of points fulfilling the stated condition is as follows:
1. start with B empty
2. select a random point x from A and place it in B
3. remove from A every point y such that dist(x, y) < d
4. while A is not empty, go to 2
A kd-tree can be used to perform the look ups in step 3 relatively fast.
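The four steps above might be sketched as follows (brute-force distance checks shown for clarity; a k-d tree would replace the filtering in step 3 for large A):

```python
import random

def random_selection(points, d, seed=0):
    # Steps 1-4 above: repeatedly pick a random remaining point and
    # discard every point within distance d of it.
    rng = random.Random(seed)
    remaining = list(points)
    chosen = []
    while remaining:
        x = remaining.pop(rng.randrange(len(remaining)))
        chosen.append(x)
        remaining = [y for y in remaining
                     if sum((a - b) ** 2 for a, b in zip(x, y)) >= d * d]
    return chosen
```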
The experiments I have run in 2D show that the subsets generated are approximately half the size of the ones generated by your top-down approach.
Then I have used this random-selection algorithm to seed a genetic algorithm that resulted in a further 25% reduction on the size of the subsets.
For mutation, given a chromosome representing a subset B, I randomly choose a hyperball inside the minimal axis-aligned hyperbox that covers all the points in A. Then I remove from B all the points that are also in the hyperball and use random-selection to complete it again.
For crossover I employ a similar approach, using a random hyperball to divide the mother and father chromosomes.
I have implemented everything in Perl using my wrapper for the GAUL library (GAUL can be obtained from here).
The script is here: https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_density.pl
It accepts a list of n-dimensional points from stdin and generates a collection of pictures showing the best solution for every iteration of the genetic algorithm. The companion script https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_gen.pl can be used to generate the random points with a uniform distribution.
Here is a proposal which assumes the Manhattan distance metric:
Divide up the entire space into a grid of granularity d. Formally: partition A so that points (x1,...,xn) and (y1,...,yn) are in the same partition exactly when (floor(x1/d),...,floor(xn/d))=(floor(y1/d),...,floor(yn/d)).
Pick one point (arbitrarily) from each grid space -- that is, choose a representative from each set in the partition created in step 1. Don't worry if some grid spaces are empty! Simply don't choose a representative for this space.
Actually, the implementation won't have to do any real work for step one, and step two can be done in one pass through the points, using a hash of the partition identifier (the (floor(x1/d),...,floor(xn/d)) tuple) to check whether we have already chosen a representative for a particular grid space, so this can be very, very fast.
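A sketch of that one-pass implementation, with the cell tuple playing the role of the partition identifier:

```python
import math

def grid_select(points, d):
    # One representative per non-empty grid cell of side d,
    # per the partition described in steps 1 and 2.
    reps = {}
    for p in points:
        key = tuple(math.floor(c / d) for c in p)
        reps.setdefault(key, p)  # keep the first point seen in each cell
    return list(reps.values())
```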
Some other distance metrics may be able to use an adapted approach. For example, the Euclidean metric could use d/sqrt(n)-size grids. In this case, you might want to add a post-processing step that tries to reduce the cover a bit (since the grids described above are no longer exactly radius-d balls -- the balls overlap neighboring grids a bit), but I'm not sure how that part would look.
To be lazy, this can be cast as a set cover problem, which can be handled by a mixed-integer solver/optimizer. Here is a GNU MathProg model for the GLPK LP/MIP solver. Here C denotes which points can "satisfy" each point.
param N, integer, > 0;
set C{1..N};
var x{i in 1..N}, binary;
s.t. cover{i in 1..N}: sum{j in C[i]} x[j] >= 1;
minimize goal: sum{i in 1..N} x[i];
With 1000 normally distributed points, it didn't find the optimal subset within 4 minutes, but it reported that it knew the true minimum, and the subset it selected had only one extra point.

Solving a picture jumble!

What algorithm would you make a computer use, if you want to solve a picture jumble?
You can always argue in terms of efficiency of the algorithm, but here I am really looking at what approaches to follow.
Thanks
What you need to do is define an indexing vocabulary for each face of a jigsaw-puzzle piece, such that the index of a right-facing edge tells you what the index of a corresponding left-facing edge must be (e.g., a simple vocabulary: "convex" and "concave", with "convex" on a face implying "concave" on a matching opposite face), and then classify each piece according to the indexing vocabulary. The finer the vocabulary, the more discriminatory the face-matching and the faster your algorithm will go, however you implement it. (For instance, you may have "flat edge, straight-edge-leans-left, straight-edge-leans-right, concave, convex, knob, knob-hole, ...".) We assume that the indexing scheme abstracts the actual shape of the edge, and that there is a predicate exactly-fits(piece1, edge1, piece2, edge2) that is true only if the edges exactly match. We further assume that there is at most one exact match of a piece with a particular edge.
The goal is to grow a set of regions, i.e. sets of connected pieces, until it is no longer possible to grow them. We first mark all pieces with unique region names, one per piece, and all edges as unmatched. Then we enumerate the piece edges in any order. For each enumerated piece P with edge E, use the indexing scheme to select potentially matching piece/edge pairs. Check the exactly-fits predicate; at most one piece Q, with edge F, exactly matches. Combine the regions for P and Q into a larger region. Repeat. I think this solves the puzzle.
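The region-growing loop above is naturally expressed with a union-find structure; in this sketch, candidates(piece, edge) stands in for the index lookup and exactly_fits for the matching predicate (both are assumed interfaces, not real APIs):

```python
class Regions:
    # Union-find over puzzle pieces: merging two pieces joins
    # their regions; pieces in one region share a root.
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, p):
        while self.parent[p] != p:
            self.parent[p] = self.parent[self.parent[p]]  # path halving
            p = self.parent[p]
        return p
    def union(self, p, q):
        self.parent[self.find(p)] = self.find(q)

def solve_regions(pieces, edges_of, candidates, exactly_fits):
    # Enumerate every piece edge; at most one candidate exactly fits,
    # and a fit merges the two pieces' regions.
    regions = Regions(len(pieces))
    for p in range(len(pieces)):
        for e in edges_of(p):
            for q, f in candidates(p, e):
                if exactly_fits(p, e, q, f):
                    regions.union(p, q)
                    break
    return regions
```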
Solving a jigsaw puzzle can basically be reduced to matching like edges to like edges. In a true jigsaw puzzle only one pair of pieces can properly be interlocked along a particular edge, and each piece will have corners so that you know where a particular side starts and ends.
Thus, just find the endpoints of each edge and select a few control points. Then iterate through all pieces with unbound edges until you find the right one. When there are no more unbound edges, you have solved the puzzle.
To elaborate on Ira Baxter's answer, another way of conceptualizing the problem is to think of the jigsaw puzzle as a graph, where each piece is a node and each interface with another piece is an edge.
For example, if you were designing a puzzle game, storing the "answer" this way would make the "check if this fits" code a lot faster, since it could be reduced to some sort of hash lookup.
1. Find 2x2-grams such that all four edges fit. Then evaluate how well the image content of the pieces matches.
P1 <--> P2
^        ^
|        |
v        v
P3 <--> P4
2. Tag orientations (manually or heuristically), but use them only as heuristic scores (for ranking), not as definitive search criteria.
3. Shape Context

Mapping a set of 3D points to another set with minimal sum of distances

Given are two sets of three-dimensional points, a source set and a destination set. The number of points in each set is arbitrary (and may be zero). The task is to assign one or no source point to every destination point so that the sum of all distances is minimized. If there are more source than destination points, the additional points are to be ignored.
There is a brute-force solution to this problem, but since the number of points may be large, it is not feasible. I heard this problem is easy in 2D with equal set sizes, but sadly those preconditions do not hold here.
I'm interested in both approximations and exact solutions.
Edit: Haha, yes, I suppose it does sound like homework. Actually, it's not. I'm writing a program that receives positions of a large number of cars and i'm trying to map them to their respective parking cells. :)
One way you could approach this problem is to treat it as the classical assignment problem: http://en.wikipedia.org/wiki/Assignment_problem
You treat the points as the vertices of a graph, and the weights of the edges are the distances between points. Because the fastest algorithms assume that you are looking for a maximum matching (and not a minimum, as in your case) and that the weights are non-negative, you can redefine the weights to be, e.g.:
weight(A, B) = bigNumber - distance(A, B)
where bigNumber is bigger than your longest distance.
Obviously you end up with a bipartite graph. Then you use one of the standard algorithms for maximum weighted bipartite matching (lots of resources on the web, e.g. http://valis.cs.uiuc.edu/~sariel/teach/courses/473/notes/27_matchings_notes.pdf or Wikipedia for an overview: http://en.wikipedia.org/wiki/Perfect_matching#Maximum_bipartite_matchings). This way you will end up with an O(NM max(N,M)) algorithm, where N and M are the sizes of your sets of points.
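For small inputs, any matching implementation can be checked against a brute-force reference like the sketch below; it minimizes the summed Euclidean distance directly rather than applying the bigNumber transformation (for real sizes you would use a polynomial matching algorithm such as the Hungarian method instead):

```python
from itertools import permutations

def min_cost_assignment(src, dst):
    # Exhaustive reference for tiny point sets: match min(|src|, |dst|)
    # pairs, ignoring extra points, minimizing total distance.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    big, small, flipped = ((src, dst, False) if len(src) >= len(dst)
                           else (dst, src, True))
    best_total, best_pairs = float('inf'), []
    for perm in permutations(big, len(small)):
        total = sum(dist(a, b) for a, b in zip(perm, small))
        if total < best_total:
            best_total = total
            best_pairs = [(b, a) if flipped else (a, b)
                          for a, b in zip(perm, small)]
    return best_pairs, best_total
```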
Off the top of my head, spatial sort followed by simulated annealing.
Grid the space & sort the sets into spatial cells.
Solve the O(NM) problem within each cell, then within cell neighborhoods, and so on, to get a trial matching.
Finally, run lots of cycles of simulated annealing, in which you randomly alter matches, so as to explore the nearby space.
This is heuristic, getting you a good answer though not necessarily the best, and it should be fairly efficient due to the initial grid sort.
Although I don't really have an answer to your question, I can suggest looking into the following topics. (I know very little about this, but encountered it previously on Stack Overflow.)
Nearest Neighbour Search
kd-tree
Hope this helps a bit.
