Using nondeterminism to detect cliques?

I am trying to understand non-determinism with the clique-problem.
In computer science, the clique problem refers to any of the problems related to
finding particular complete subgraphs ("cliques") in a graph, i.e., sets of
elements where each pair of elements is connected.
Say I have a graph with nodes A, B, C, D, E, F and I want to decide if a clique of size 4 exists.
My understanding of non-determinism is to make a guess by taking four nodes (B, C, D, F) and checking whether a connection exists between every pair of those four nodes. If it does, I conclude that a clique exists; if it doesn't, I conclude that a clique does not exist.
What I am not sure of however is how this helps solve the problem as I just might have made the wrong choice.
I guess I am trying to understand the application of non-determinism in general.

Nondeterministic choices are different from random or arbitrary choices. When using nondeterminism, if any possible choice that can be made will lead to the algorithm outputting YES, then one of those choices will be selected. If no choice exists that does this, then an arbitrary choice will be made.
If this seems like cheating, in a sense it is. It's unknown how to implement nondeterminism efficiently using a deterministic computer, a randomized algorithm, or parallel computers that have lots of processors but which can only do a small amount of work on each core. These are the P = NP, BPP = NP, and NC = NP questions, respectively. Accordingly, nondeterminism is primarily a theoretical approach to problem solving.
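For intuition, here is a minimal sketch (in Python, with a made-up graph) of how the guess-and-verify split looks. The verifier runs in polynomial time; the loop over all subsets is just one way a deterministic machine can simulate the nondeterministic guess, and it is exactly that loop that costs exponential time.

from itertools import combinations

def is_clique(graph, nodes):
    # polynomial-time verifier: every pair in `nodes` must be adjacent
    return all(b in graph[a] for a, b in combinations(nodes, 2))

def has_clique(graph, k):
    # deterministic simulation of the nondeterministic guess:
    # try every possible k-subset as the "guessed" certificate
    return any(is_clique(graph, subset) for subset in combinations(graph, k))

graph = {
    'A': {'B', 'C', 'D'},
    'B': {'A', 'C', 'D', 'F'},
    'C': {'A', 'B', 'D', 'F'},
    'D': {'A', 'B', 'C', 'F'},
    'E': {'F'},
    'F': {'B', 'C', 'D', 'E'},
}
print(has_clique(graph, 4))  # True: {B, C, D, F} is a 4-clique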
Hope this helps!

Related

The problem complexity of the Maximum Coverage problem with set size constraint (P or NP)

The classic Maximum Coverage (MC) problem is an NP-hard optimization problem. Consider d elements U = {e1, e2, ... ed} and c sets T1, T2 ... Tc. Each set contains some elements in U. The problem aims to find at most b sets, such that the cardinality of the union of these sets is maximized.
For example, T1={e1, e3}, T2={e1, e2, e3} and T3={e3, e4}. When b=2, the optimal solution picks T2 and T3.
I am considering a variation of the classic MC problem that imposes a set size constraint: for some 1 < k <= d, the size of every set is bounded by k. Call this problem k-MC. Is the problem still NP-hard?
My conjecture is that k-MC is still NP-hard, but I am struggling to come up with a polynomial reduction from a proven NP-hard problem, like MC.
If, for an arbitrary instance of Maximum Coverage, I could find a polynomial reduction to my problem for every k > 1, I could conclude that my problem is also NP-hard.
Here is what I got so far:
When k=d, the problem is trivially equivalent to the classic Maximum Coverage.
When k=d-1, we look at the given MC instance and see if there exists a set of size d. If there is, simply pick it. Otherwise, the instance reduces to the k-MC problem with k=d-1.
When k is less than d-1, I resort to dynamic programming to complete the reduction. However, this yields a non-polynomial-time reduction, which defeats the purpose of reducing from an NP-hard problem.
If anyone could give me some pointers on how I should tackle this problem, or even just make an educated guess on the problem complexity of k-MC (P or NP), I'd really appreciate it.
2-MC is easy: interpret the sets of size 2 as the edges of a graph and run your favorite matching algorithm for non-bipartite graphs. Once you exceed the matching cardinality, you're stuck picking singletons.
3-MC is hard. You can encode an instance of 3-partition as 3-MC by taking the sets to be the triples that sum to the target, then deciding whether it's solvable by checking the coverage achievable with b = n/3.
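If it helps, here is a minimal sketch of that encoding in Python. The instance below is made up; elements are the indices of the numbers, sets are the index triples summing to the target, and the 3-partition instance is a YES instance iff some b = n/3 of these sets cover everything.

from itertools import combinations

def three_partition_to_3mc(numbers):
    # encode a 3-partition instance as a 3-MC instance (sketch)
    b = len(numbers) // 3                 # budget: number of triples needed
    target = sum(numbers) // b            # each picked triple must sum to this
    universe = set(range(len(numbers)))   # elements are indices, so duplicate values stay distinct
    sets = [frozenset(t)
            for t in combinations(range(len(numbers)), 3)
            if sum(numbers[i] for i in t) == target]
    return universe, sets, b

universe, sets, b = three_partition_to_3mc([20, 23, 25, 30, 49, 45, 27, 30, 30, 40, 22, 19])
# the 3-partition instance is solvable iff b of these sets can cover all of `universe`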

Cost of building a "connected matrix"

I'm sure there is an abundance of information on how to do exactly what I'm after, but it's a matter of not knowing the technical term for it. Basically, what I want to create is an adjacency matrix for a directed graph; however, rather than simply storing whether or not each vertex pair has a direct adjacency, for every vertex pair in the matrix I want to store whether there is ANY path connecting the two (and what those paths are).
This would give me constant-time lookups, which is desirable; however, what's not immediately clear to me is the expected optimal time complexity of building this matrix.
Also, is there a formal name for such a matrix?
Playing this out in my head, it seems like a dynamic programming problem. If I want to know whether A is connected to Z, I should be able to ask each of A's neighbors, B, C and D, if they are (in some way) connected to Z, and if so, then I know A is. And if B doesn't have this answer stored, then it would ask the same question of its direct neighbors, and so on. I would memoize the results along the way, so subsequent lookups would be constant.
I haven't spent time implementing this yet, because it feels like ϴ(n^n) to build a complete matrix, so my question is whether I'm going about this the right way, and whether there is a lower-cost way to build such a matrix.
The transitive closure of a graph (https://en.wikipedia.org/wiki/Transitive_closure#In_graph_theory) can indeed be computed by dynamic programming with a variation of the Floyd–Warshall algorithm: https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm.
Using |V| DFS (or BFS) traversals is more efficient, though.
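For example, a minimal sketch of the |V|-BFS approach in Python (the adjacency-list graph below is made up):

from collections import deque

def transitive_closure(adj):
    # for each vertex, BFS to collect every vertex reachable from it;
    # overall cost is O(|V| * (|V| + |E|))
    reach = {}
    for source in adj:
        seen = {source}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        reach[source] = seen
    return reach

adj = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
closure = transitive_closure(adj)
print('D' in closure['A'])  # True: there is a path from A to D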
Using networkx connected components
import networkx as nx

G = nx.path_graph(4)
nx.add_path(G, [10, 11, 12])

# map every node to the index of its connected component
d = {}
for idx, group in enumerate(nx.connected_components(G)):
    for node in group:
        d[node] = idx

def connected(node1, node2):
    return d[node1] == d[node2]
Generation should be O(N + E) and lookup should be O(1).

optimal search algorithm without admissible heuristic

Please forgive me if I'm not using the correct terms or have overlooked an existing solution. I'm not experienced in search algorithms and the theories behind it. I just would like to solve a problem.
I've previously used what I was told to be the A* algorithm to solve a different problem. But reading up on it I've realized that what I learned is not quite what wikipedia tells me.
What I learned was:
Start at your origin node
Open a new solution for each path you can take
Recursively create a new subsolution for each path you can take from there
When you arrive at the same place with multiple solutions, drop those who took longer than the fastest
Now if I understand wikipedia correctly, this is what I was supposed to do:
Start at your origin node
Open a new solution for each path you can take
Order the solutions by "cost of path taken" + "estimated cost to target"
Take cheapest solution and create subsolutions for each possible path
order those solutions in with the others, then rinse and repeat
I can see how this would help with not calculating quite as many solutions, but my problem is that I see no possibility to create an "optimistic" estimate.
I'm not searching for a path on a geographical map. I'm trying to find the best sequence of actions. There's a minimum sequence of - say - ABCDEFGH. You cannot do F before E, but repeating previous actions in a particular ordering might make later actions more efficient.
Do I need a different search algorithm? Do I do what I originally learned and just live with the fact that doing more work is the price for not having a good heuristic function?
I believe my teacher recognized this problem, and what I learned was simply A* with a heuristic function of h(n) = 0.
I'm not searching for a path on a geographical map. I'm trying to find
the best sequence of actions. There's a minimum sequence of - say -
ABCDEFGH. You cannot do F before E but repeating previous actions in
particular ordering might make later actions more efficient.
It is not clear to me whether you can repeat an action, i.e. a solution is ABCDEFGH, but would ABBBBCDEFGH also be possible?
If not, then you might be able to use the A* algorithm, implemented like this:
1. At some stage (say the first, "empty"), you have one of several actions available.
2. The cost of going from Empty City to A City is the cost of action A.
3. The cost of going from Empty City to B city is the cost of action B.
When you've reached B, the cost of doing C is constant (if it is not, then you can't use A* as is) and you insert the cost of going from B City to C City as the cost of C.
So you can handle the case in which an action has different costs, provided that this difference is completely described by the previous state. For example, if you can only do C after having done A or B, and the cost of C is 5 after A and 8 after B, you enter the "distance" between A and C as 5, and between B and C as 8.
If the cost of, say, D depends on the two previous states, you can still use a more complicated A* implementation where you define the virtual "cities" BC, AB and AC, and the distance from BC to D is "the cost of D having done B and C", and so on. The cost of reaching BC from A is "the cost of B given A, and the cost of C given A and B". So if these costs depend on the previous states, things get even more complicated.
In the end, the complexity of this revised A* will grow until it becomes your algorithm, where every state potentially depends on the sequence of all preceding states. The more this is true, the more convenient your algorithm is; the more each action's cost stands on its own, the more convenient A* is.
And of course the possibility of closed loops (visiting the same state/action twice, making this a cyclic graph) blows A* straight out of the water.
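Setting the cycles issue aside, here is a minimal sketch of the h(n) = 0 variant (uniform-cost search) over action sequences, where the cost of an action is allowed to depend on everything done so far. The actions, costs and successor rule below are invented for illustration:

import heapq

def cheapest_plan(goal_len, step_cost, successors):
    # uniform-cost search (A* with h(n) = 0) over sequences of actions;
    # step_cost(state, action) may depend on the whole state so far,
    # successors(state) lists the actions allowed next
    frontier = [(0, ())]            # (cost so far, tuple of actions taken)
    best = {(): 0}
    while frontier:
        cost, state = heapq.heappop(frontier)
        if len(state) == goal_len:
            return cost, state
        for a in successors(state):
            nxt = state + (a,)
            c = cost + step_cost(state, a)
            if c < best.get(nxt, float('inf')):
                best[nxt] = c
                heapq.heappush(frontier, (c, nxt))
    return None

def step_cost(state, action):
    # toy rule: B is cheaper once A has been done
    return 1 if action == 'B' and 'A' in state else 3

def successors(state):
    return [a for a in 'AB' if a not in state]

print(cheapest_plan(2, step_cost, successors))  # (4, ('A', 'B'))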

Assign m students to n groups, but with constraints?

I asked about the minimum cost maximum flow several weeks ago. Kraskevich's answer was brilliant and solved my problem. I've implemented it and it works fine (available only in French, sorry). Additionally, the algorithm can handle the assignment of i (i > 1) projects to each student.
Now I'm trying something more difficult. I'd like to add constraints on the choices. In the case where one wants to assign i (i > 1) projects to each student, I'd like to be able to specify which projects are compatible with each other.
In the case where some projects are not compatible, I'd like the algorithm to return the global optimum, i.e. assign i projects to each student, maximizing global happiness while respecting the compatibility constraints.
Chaining the original method i times (and checking constraints at each step) will not help, as it would only return a local optimum.
Any idea about the correct graph to work with?
Unfortunately, it is not solvable in polynomial time (unless P = NP or there are additional constraints).
Here is a polynomial-time reduction from the maximum independent set problem (which is known to be NP-complete) to this one:
Given a graph G and a number k, do the following:
Create a project for each vertex in the graph G and say that two projects are incompatible iff there is an edge between the corresponding vertices in G.
Create one student who likes each project equally (we can assume that the happiness each project gives him is equal to 1).
Find the maximum happiness using an algorithm that solves the problem stated in your question. Let's call it h.
A set of projects can be picked iff they all are compatible, which means that the picked vertices of G form an independent set(due to the way we constructed the graph).
Thus, h is equal to the size of the maximum independent set.
Return h >= k.
What does it mean in practice? It means that it is not reasonable to look for a polynomial time solution to this problem. There are several things that can be done:
If the input is small, you can use exhaustive search.
If it is not, you can use heuristics and/or approximations to find a relatively good solution (not necessarily the optimal one, though).
If you can stomach the library dependency, integer programming will be quicker and easier than anything you can implement yourself. All you have to do is formulate the original problem as an integer program and then add your ad hoc constraints at the end.
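A hedged sketch of that route using PuLP (any MILP library would do); the students, projects, happiness scores, incompatible pairs and the limit of i projects per student below are all invented for illustration:

from pulp import LpProblem, LpMaximize, LpVariable, LpBinary, lpSum

students = ['s1', 's2']
projects = ['p1', 'p2', 'p3']
happiness = {('s1', 'p1'): 3, ('s1', 'p2'): 1, ('s1', 'p3'): 2,
             ('s2', 'p1'): 1, ('s2', 'p2'): 2, ('s2', 'p3'): 3}
incompatible = [('p1', 'p2')]    # pairs that may not go to the same student
i = 2                            # projects per student

prob = LpProblem('assignment', LpMaximize)
x = {(s, p): LpVariable(f'x_{s}_{p}', cat=LpBinary)
     for s in students for p in projects}

# objective: total happiness of the assignment
prob += lpSum(happiness[s, p] * x[s, p] for s in students for p in projects)
for s in students:
    prob += lpSum(x[s, p] for p in projects) <= i        # at most i projects per student
for p in projects:
    prob += lpSum(x[s, p] for s in students) <= 1        # each project assigned at most once
for s in students:
    for p, q in incompatible:
        prob += x[s, p] + x[s, q] <= 1                   # the ad hoc compatibility constraints

prob.solve()
chosen = [(s, p) for (s, p), var in x.items() if var.value() == 1]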

Comparison-based ranking algorithm

I would like to rank or sort a collection of items (with size potentially greater than 100,000) where the items have no intrinsic (comparable) value; instead, all I have are comparisons between pairs of items, provided by users in a subjective manner.
Example: Consider a collection with elements [a, b, c, d] and comparisons by users b > a, a > d, d > c. The correct order of this collection would be [b, a, d, c].
This example is simple, however there could be more complicated cases:
Since the comparisons are subjective, a user could also say that c > b. In which case that would cause a conflict with the ordering above.
Also, you may not have comparisons that "connect" all the items, e.g. only b > a and d > c, in which case the ordering is ambiguous: it could be [b, a, d, c] or [d, c, b, a], and either ordering is acceptable.
If possible it would be nice to somehow take into account multiple instances of the same comparison and give those with higher occurrences more weight. But a solution without this condition would still be acceptable.
A similar application of this algorithm was used by Zuckerberg's FaceMash application where he ranked people based on comparisons (if I understood it correctly), but I have not been able to find what that algorithm actually was.
Is there an algorithm which already exists that can solve the problem above? I would not like to spend effort trying to come up with one if that is the case. If there is no specific algorithm, is there perhaps certain types of algorithms or techniques which you can point me to?
This is a problem that has already occurred in another arena: competitive games! Here, too, the goal is to assign each player a global "rank" on the basis of a series of 1 vs. 1 comparisons. The difficulty, of course, is that the comparisons are not transitive (I take "subjective" to mean "provided by a human being" in your question). Kasparov beats Fischer beats (don't know another chess player!) Bob beats Kasparov, potentially.
This renders useless algorithms that rely on transitivity (i.e. a > b and b > c => a > c) as you end up with (likely) a highly cyclic graph.
Several rating systems have been devised to tackle this problem.
The most well-known system is probably the Elo algorithm/score for competitive chess players. Its descendants (for instance, the Glicko rating system) are more sophisticated and take into account statistical properties of the win/loss record; in other words, how reliable is a rating? This is similar to your idea of weighting more heavily records with more "games" played. Glicko also forms the basis for the TrueSkill system used on Xbox Live for multiplayer video games.
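For reference, a minimal sketch of the basic Elo update applied to the example comparisons from the question (the starting rating of 1000 and the K-factor of 32 are conventional choices, not anything specific to this problem):

def elo_update(r_winner, r_loser, k=32):
    # expected score of the winner, then shift both ratings by the "surprise"
    expected_win = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
    r_winner += k * (1 - expected_win)
    r_loser -= k * (1 - expected_win)
    return r_winner, r_loser

ratings = {item: 1000 for item in 'abcd'}
for winner, loser in [('b', 'a'), ('a', 'd'), ('d', 'c')]:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser])

ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)  # ['b', 'a', 'd', 'c']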
You may be interested in the minimum feedback arc set problem. Essentially the problem is to find the minimum number of comparisons that "go the wrong way" if the elements are linearly ordered in some ordering. This is the same as finding the minimum number of edges that must be removed to make the graph acyclic. Unfortunately, solving the problem exactly is NP-hard.
A couple of links that discuss the problem:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.8157&rep=rep1&type=pdf
http://en.wikipedia.org/wiki/Feedback_arc_set
I found this via Google; look for chapter 12.3, Topological Sorting and Depth-first Search:
http://www.cs.cmu.edu/~avrim/451f09/lectures/lect1006.pdf
Your set of relations describes a directed (hopefully acyclic) graph, and so graph topological sorting is exactly what you need.
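A minimal sketch of that approach using Kahn's algorithm, with the comparisons from the question as edges; it only works if the comparison graph really is acyclic:

from collections import defaultdict, deque

def topo_order(items, comparisons):
    # 'x > y' becomes an edge x -> y; a topological order of the resulting
    # DAG is a ranking consistent with every comparison
    succ = defaultdict(list)
    indeg = {x: 0 for x in items}
    for winner, loser in comparisons:
        succ[winner].append(loser)
        indeg[loser] += 1
    queue = deque(x for x in items if indeg[x] == 0)
    order = []
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in succ[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                queue.append(y)
    if len(order) != len(items):
        raise ValueError('comparisons contain a cycle (conflicting votes)')
    return order

print(topo_order('abcd', [('b', 'a'), ('a', 'd'), ('d', 'c')]))  # ['b', 'a', 'd', 'c']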
