Find representative vertices in a graph - algorithm

For a computer vision project, I have N points in a high-dimensional space. I want to select k of them that are "the most distinguishable" from each other. For example, this could mean that the sum of distances between the chosen points is maximal, or that the volume of the polyhedron they span is maximal. Generally, any definition with some intuition behind it would do.
I then want to find these representative points.
There are two questions:
Which definition of "the most distinguishable" points is most commonly used? Does the choice of definition change the algorithm used to find those points?
What is the algorithm to find the points? It strongly reminds me of the maximum weighted clique problem. Is it an NP-hard problem? If so, is there a good approximation to the optimal solution?

The way you define "the most distinguishable" will definitely affect the algorithm you'll want to use. For example, you can define "the most distinguishable" as the set with the maximal sum of distances between any two points in the set, but you could also define it as the set with the maximal minimum distance between any two points. These are two completely different problems.
As for algorithms, as I've said, that depends on your definition. If you're looking to find the K farthest points, you should look into this question. This problem is NP-complete, but you may get some ideas about how to approach it.
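As an illustration of the "maximal minimum distance" definition, here is a minimal sketch of a greedy heuristic (not an exact algorithm): start from the farthest pair, then repeatedly add the point whose nearest already-chosen point is farthest away. The function names `greedy_spread` and `min_pairwise` are made up for this example, and it assumes Euclidean distance and k >= 2.

```python
import math
from itertools import combinations

def min_pairwise(points):
    """Smallest distance between any two of the given points."""
    return min(math.dist(p, q) for p, q in combinations(points, 2))

def greedy_spread(points, k):
    """Heuristically pick k indices that are far apart (assumes 2 <= k <= len(points))."""
    n = len(points)
    # Seed with the farthest pair of points.
    i, j = max(combinations(range(n), 2),
               key=lambda ij: math.dist(points[ij[0]], points[ij[1]]))
    chosen = [i, j]
    # nearest[c] = distance from point c to its closest chosen point.
    nearest = [min(math.dist(points[c], points[i]),
                   math.dist(points[c], points[j])) for c in range(n)]
    while len(chosen) < k:
        # Add the unchosen point that is farthest from everything chosen so far.
        nxt = max((c for c in range(n) if c not in chosen),
                  key=lambda c: nearest[c])
        chosen.append(nxt)
        for c in range(n):
            nearest[c] = min(nearest[c], math.dist(points[c], points[nxt]))
    return chosen

points = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5), (1, 1)]
picked = greedy_spread(points, 4)
print(sorted(picked), min_pairwise([points[i] for i in picked]))
```

This gives no guarantee of optimality; it is only a cheap starting point that can be compared against an exact search on small inputs.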

Related

Algorithm for independent set of a graph?

Is there an algorithm for finding all the independent sets of a directed graph?
From what I've read, an independent set is a set of nodes no two of which are adjacent.
So for this example I would have {1} {2} {1,3}
How is it possible to find all of them? I am thinking about something recursive, but I don't really know the algorithm. If someone could point me in the right direction, it would be much appreciated!
Thank you!
A typical way to find independent sets is to consider the complement of the graph. The complement of a graph is the graph with the same set of vertices and an edge between a pair of vertices if and only if there is no edge between them in the original graph. An independent set in the graph corresponds to a clique in the complement. Finding all the cliques is exponential in complexity, so you cannot improve much on brute force. Still, I believe considering the complement of the graph may make the problem easier to deal with.
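To make the recursive idea concrete, here is a small sketch that enumerates every independent set by deciding, vertex by vertex, whether to skip or take it. It assumes the graph is given as a dict mapping each vertex to the set of its neighbours and that edge direction is ignored; the 3-vertex path graph used below is a made-up example, not the graph from the question. A `complement` helper is included to show the correspondence with cliques.

```python
def complement(adj):
    """Complement graph: same vertices, edges exactly where adj has none."""
    vertices = set(adj)
    return {v: vertices - adj[v] - {v} for v in adj}

def independent_sets(adj):
    """Yield every independent set (including the empty one) as a frozenset."""
    vertices = sorted(adj)

    def extend(idx, current):
        if idx == len(vertices):
            yield frozenset(current)
            return
        v = vertices[idx]
        # Branch 1: leave v out.
        yield from extend(idx + 1, current)
        # Branch 2: take v, allowed only if none of its neighbours is chosen.
        if not (adj[v] & current):
            current.add(v)
            yield from extend(idx + 1, current)
            current.remove(v)

    yield from extend(0, set())

# Hypothetical example: the path 1 - 2 - 3.
graph = {1: {2}, 2: {1, 3}, 3: {2}}
for s in independent_sets(graph):
    print(sorted(s))
```

The number of independent sets can itself be exponential, so this is only practical for small graphs.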
Other than taking the complement and finding cliques, I can also think of graph coloring: you color the vertices so that no two adjacent vertices have the same color (you can do it with a very simple heuristic algorithm like SL = Smallest Last), and then take the vertices of each color as a subset (each color class is an independent set, though not necessarily a maximal one).
The only problem is that there are probably too many ways of coloring a graph. You have to keep all the (maximal) independent sets found so far and move on until you have enough sets!
The Bron–Kerbosch algorithm is commonly used for this problem; see the Wikipedia article for a description and pseudocode that can be turned into a usable program without too much trouble. It lists maximal cliques, so to get maximal independent sets you run it on the complement graph. The size of the output is, in the worst case, exponential in the number of vertices, but brute force will always be exponential, while BK typically finishes in reasonable time when the output is of reasonable size. This is an active area of research, and there are a number of other algorithms that do the same thing with varying efficiency depending on the type and size of the graph. There are applications in several areas, in particular genetics.
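A minimal sketch of Bron–Kerbosch with pivoting, applied to the complement graph so that the maximal cliques it reports are the maximal independent sets of the original graph. The adjacency-dict representation and the function names are assumptions made for this example.

```python
def bron_kerbosch(adj):
    """Yield all maximal cliques of an undirected graph given as {vertex: set_of_neighbours}."""
    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already-processed vertices.
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())

def maximal_independent_sets(adj):
    """Maximal independent sets of adj = maximal cliques of its complement."""
    vertices = set(adj)
    comp = {v: vertices - adj[v] - {v} for v in adj}
    return set(bron_kerbosch(comp))

# Hypothetical example: the 4-cycle 1 - 2 - 3 - 4 - 1.
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(maximal_independent_sets(cycle))
```

Note that this lists only the maximal sets; every other independent set is a subset of one of them.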

Best subsample in the Maxmin distance sense

I have a set of N points in a D-dimensional metric space. I want to select K of them in such a way that the smallest distance between any two points in the subset is the largest.
For instance, with N=4 and K=3 in 3D Euclidean space, the solution is the face of the tetrahedron having the longest short side.
Is there a classical way to achieve that? Can it be solved exactly in polynomial time?
I have googled as much as I could, but I have not yet figured out what this problem is called.
In my case N=50, K=10 and D=300 typically.
Clarification:
A brute force approach would be to try every combination of K points among the N and determine the closest pair in every subset. The solution is the subset whose closest pair is farthest apart.
Done the trivial way, this is an O(K^2) process, to be repeated N! / (K! (N-K)!) times.
Hmm, 10^2 * 50! / (10! 40!) = 1027227817000
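For reference, the count above and the brute-force search itself can be sketched as follows. The function name `brute_force_maxmin` and the four sample points are made up for illustration; the search is only feasible for small N and K.

```python
import math
from itertools import combinations

def brute_force_maxmin(points, k):
    """Try every k-subset and keep the one whose closest pair is farthest apart."""
    best_subset, best_value = None, -1.0
    for subset in combinations(points, k):
        closest = min(math.dist(p, q) for p, q in combinations(subset, 2))
        if closest > best_value:
            best_subset, best_value = subset, closest
    return best_subset, best_value

# Number of subsets for N=50, K=10, times the K^2 factor used in the estimate.
print(math.comb(50, 10) * 10**2)

# Hypothetical N=4, K=3 example in 3D.
corners = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
print(brute_force_maxmin(corners, 3))
```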
I think you might find papers on unit disk graphs informative but discouraging. For instance, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.3113&rep=rep1&type=pdf states that the maximum independent set problem on unit disk graphs is NP-complete, even if the disk representation is known. A unit disk graph is the graph you get by placing points in the plane and forming links between every pair of points at most a unit distance apart.
So I think that if you could solve your problem in polynomial time you could run it on a unit disk graph for different values of K until you find a value at which the smallest distance between two chosen points was just over one, and I think this would be a maximum independent set on the unit disk graph, which would be solving an NP-complete problem in polynomial time.
(Just about to jump on a bicycle so this is a bit rushed, but searching for papers on unit disk graphs might at least turn up some useful search terms)
Here is another attempt to relate the two problems, piece by piece.
For maximum independent set see http://en.wikipedia.org/wiki/Maximum_independent_set#Finding_maximum_independent_sets. A decision problem version of this is "Are there K vertices in this graph such that no two are joined by an edge?" If you can solve this you can certainly find a maximum independent set by finding the largest K by asking this question for different K and then finding the K nodes by asking the question on versions of the graph with one or more nodes deleted.
I state without proof that finding the maximum independent set in a unit disk graph is NP-complete. Another reference for this is http://web.sau.edu/lilliskevinm/wirelessbib/ClarkColbournJohnson.pdf.
A decision version of your problem is "Do there exist K points with distance at least D between any two points?" (here D is a distance threshold, not the dimension). Again, you can solve this in polynomial time iff you can solve your original problem in polynomial time: play around until you find the largest D that gives the answer yes, and then delete points and see what happens.
A unit disk graph has an edge exactly when the distance between two points is 1 or less, so an independent set is a set of points whose pairwise distances are all strictly greater than 1. So if you could solve the decision version of your original problem, you could solve the decision version of the unit disk graph problem just by setting D to the smallest pairwise distance that exceeds 1 and solving your problem.
So I think I have constructed a series of links showing that if you could solve your problem you could solve an NP-complete problem by turning it into your problem, which causes me to think that your problem is hard.
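The link between the decision version and the optimization version can be sketched in code. The sketch below, with made-up function names, answers the decision question by backtracking (exponential in the worst case, consistent with the hardness argument) and then binary-searches over the sorted pairwise distances, since the optimal value must be one of them. It assumes Euclidean distance and 2 <= K <= N.

```python
import math
from itertools import combinations

def has_spread_subset(points, k, d):
    """Return k indices whose pairwise distances are all >= d, or None if none exist."""
    n = len(points)

    def search(start, chosen):
        if len(chosen) == k:
            return list(chosen)
        for i in range(start, n):
            if n - i < k - len(chosen):
                break  # not enough points left to reach k
            if all(math.dist(points[i], points[j]) >= d for j in chosen):
                chosen.append(i)
                found = search(i + 1, chosen)
                if found is not None:
                    return found
                chosen.pop()
        return None

    return search(0, [])

def maxmin_by_threshold(points, k):
    """Largest d for which the decision question is answered yes, with a witness subset."""
    candidates = sorted({math.dist(p, q) for p, q in combinations(points, 2)})
    lo, hi = 0, len(candidates) - 1
    # The smallest pairwise distance is always feasible when k <= len(points).
    best = has_spread_subset(points, k, candidates[0])
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = has_spread_subset(points, k, candidates[mid])
        if found is not None:
            lo, best = mid, found
        else:
            hi = mid - 1
    return best, candidates[lo]

corners = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
print(maxmin_by_threshold(corners, 3))
```

The binary search works because feasibility is monotone: if K points exist at threshold d, the same points work for any smaller threshold.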

How to find the minimum cost of linking two sets of points

I got two sets of points S and V, both have the size n. I want to link the two sets so that every point in S links to one and only one point in V. The cost to link two points is defined as the Euclidean distance between the two points. There should be n! possible ways to link. So how to find the way of minimum cost? (in an efficient way)
This is an assignment problem. You can solve it with the Hungarian method. There are implementations of this in Python. You can also solve the problem with any linear programming solver: the LP formulation always has an integral optimal solution, so no integer constraints are needed.
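Below is a sketch of one well-known O(n^3) formulation of the Hungarian method (row and column potentials with shortest augmenting paths), written in plain Python so that it needs no external packages. The function names `hungarian` and `link` are made up for this example, and the cost matrix is assumed to be square.

```python
import math

def hungarian(cost):
    """Min-cost assignment for a square cost matrix; returns the column assigned to each row."""
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (n + 1)   # column potentials (1-indexed)
    p = [0] * (n + 1)     # p[j] = row currently matched to column j (0 = none)
    way = [0] * (n + 1)   # way[j] = previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so that a new zero-reduced-cost edge appears.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment

def link(S, V):
    """Pair each point of S with a distinct point of V, minimizing total Euclidean distance."""
    cost = [[math.dist(s, t) for t in V] for s in S]
    assignment = hungarian(cost)
    return assignment, sum(cost[i][j] for i, j in enumerate(assignment))

S = [(0, 0), (5, 5), (10, 0)]
V = [(10, 1), (0, 1), (5, 6)]
print(link(S, V))
```

In practice a library routine would normally be preferable to a hand-written one; this version is only meant to show that the n! possible linkings never need to be enumerated.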

Sorting points such that the minimal Euclidean distance between consecutive points would be maximized

Given a set of points in a 3D Cartesian space, I am looking for an algorithm that will sort these points, such that the minimal Euclidean distance between two consecutive points would be maximized.
It would also be beneficial if the algorithm tends to maximize the average Euclidean distance between consecutive points.
Edit:
I've crossposted on https://cstheory.stackexchange.com/ and got a good answer. See https://cstheory.stackexchange.com/questions/8609/sorting-points-such-that-the-minimal-euclidean-distance-between-consecutive-poin.
Here is an upper bound on the value of the solution, which might serve as a building block for branch and bound or for a less reliable, incomplete search algorithm:
Sort the distances between the points and consider them in non-increasing order. Use http://en.wikipedia.org/wiki/Disjoint-set_data_structure to keep track of sets of points, merging two sets when connected by a link between two points. The length of the shortest distance you encounter up to the point when you merge all the points into one set is an upper bound to the minimum distance in a perfect solution, because a perfect solution also merges all the points into one. However your upper bound may be longer than the minimum distance for a perfect solution, because the links you are joining up will probably form a tree, not a path.
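A minimal sketch of that bound, assuming Euclidean distance and a simple union-find with path compression (the function name `bottleneck_upper_bound` is made up for this example):

```python
import math
from itertools import combinations

def bottleneck_upper_bound(points):
    """Upper bound on the best achievable minimum distance between consecutive points."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Find the set representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise distances, longest first.
    edges = sorted(((math.dist(points[i], points[j]), i, j)
                    for i, j in combinations(range(n), 2)), reverse=True)
    components = n
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
            if components == 1:
                # This link finally connects everything: d is the bound.
                return d
    return None  # fewer than two points

# Hypothetical example: four collinear points.
print(bottleneck_upper_bound([(0, 0, 0), (1, 0, 0), (3, 0, 0), (7, 0, 0)]))
```

As noted above, the links accepted here form a tree rather than a path, so the returned value can be strictly larger than what any actual ordering achieves.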
You can model your problem as a graph: draw a line between every pair of points to get a complete graph. Your problem is then to find the longest path in this graph, which is NP-hard; see the Wikipedia article on the longest path problem.
In fact, this answers the second part of the problem (maximizing the average), which means maximizing the length of a path that visits every node of the graph. If you weight the edges as 1/distance, it becomes similar to a travelling salesman problem (minimizing the path length), which is NP-hard. For this case it may be useful to look at Metric TSP approximations.

Optimization from partial solution: minimize sum of distances between pairs

I have a problem which I like and I love to think about solutions, but I'm stuck unfortunately. I hope you like it too. The problem states:
I have two lists of 2D points (say A and B) and need to pair up points from A with points from B, under the condition that the sum of the distances over all pairs is minimal. A pair contains one point from A and one from B, a point can be used only once, and as many pairs as possible should be created (i.e. min(length(A), length(B))).
I've made a simple example, where color denotes which list the point is from, and the black connections are the solution.
Although this is a nice problem and I suspect it is NP-hard, it gets nicer. I can build on existing solutions. Suppose I have two lists and the corresponding solution (i.e. the set of pairs); then the problem I need to solve is to re-optimize that solution when a point is added to or removed from either list.
I've unfortunately not been able to come up with any non-brute force algorithm yielding the optimal solution. I hope you can. Any algorithm is appreciated in any (pseudo) language, preferably C#.
This problem is solvable in polynomial time via the Hungarian algorithm. To get a square matrix, add dummy entries to the shorter list at "distance 0" from everything.
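The padding step can be sketched as follows. To keep the example short and self-contained, the padded matrix is solved by trying all permutations, which is only viable for a handful of points; for real inputs the same padded matrix would be handed to a Hungarian-algorithm routine instead. The function names are made up for this illustration.

```python
import math
from itertools import permutations

def pad_to_square(A, B):
    """Distance matrix between A and B, padded with zero-cost dummy rows or columns."""
    n = max(len(A), len(B))
    cost = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            cost[i][j] = math.dist(a, b)
    return cost

def pair_up_small(A, B):
    """Optimal pairing by exhaustive search over the padded matrix (tiny inputs only)."""
    cost = pad_to_square(A, B)
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    # Drop pairs that involve a dummy entry on either side.
    return [(i, best[i]) for i in range(len(A)) if best[i] < len(B)]

A = [(0, 0), (4, 0), (10, 0)]
B = [(4, 1), (0, 1)]
print(pair_up_small(A, B))
```

Because the dummies cost nothing, the point left unpaired is automatically the one whose exclusion makes the total smallest.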
Your problem is an instance of the weighted minimum maximal matching problem (as described in this Wikipedia article). No polynomial-time algorithm is known even for the unweighted problem (all distances equal), which is NP-hard. There are efficient algorithms that solve it approximately in polynomial time (within a factor of 2).
This is the minimum weight Euclidean bipartite matching problem. There is an O(n^(2+epsilon)) algorithm.
