Proof that bipartiteness testing on an adjacency matrix takes Ω(n^2) time

so in one of my lectures I came across the proof for:
𝐓𝐡𝐞𝐨𝐫𝐞𝐦: Any algorithm that determines if a graph is bipartite,
given as input an undirected graph 𝐺 = (𝑉, 𝐸) represented
as an 𝑛 × 𝑛 adjacency matrix, has running time Ω(𝑛^2).
We assume an algorithm ALG which tests for bipartiteness (returns either true or false). We also assume we have a graph 𝐺0 = (𝑉, 𝐸0) with 𝑉 = {1, 2, …, 𝑛} and 𝐸0 = {{1, 𝑖} : 2 ≤ 𝑖 ≤ 𝑛} (this is a star, so it is a bipartite graph).
Within the proof there's a step saying:
"For a given algorithm ALG, we will construct another graph 𝐺1 s.t.: if ALG performs fewer than (𝑛−1)C2 accesses to the adjacency matrix 𝐴 of 𝐺0,
then ALG will not distinguish between 𝐺0 and 𝐺1, and 𝐺1 is not bipartite."
My question is: what does "(n−1)C2 accesses" mean? Is it saying, for example, that if we have a different V = {A, B, C, D}, then ALG will look at all node pairs except the ones between D and the other nodes?
Sorry if this isn't clear; this proof really confused me.

G0 is an n-vertex star graph. It is bipartite, but if you add any other edge to it, the resulting graph is not (the new edge closes a triangle with the center). There are (n−1) choose 2 = (n−1)(n−2)/2 = Ω(n^2) other edges that we can add, and every correct algorithm must check every single one of them in order to verify that G0 is bipartite.
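To see concretely why each of those entries matters, here is a small sketch (using a standard BFS 2-coloring test, not part of the original proof; the center is vertex 0 in this 0-indexed version of G0): the star is bipartite, but adding any single leaf-to-leaf edge closes a triangle and destroys bipartiteness.

```python
from collections import deque

def is_bipartite(adj):
    """BFS 2-coloring test on a 0/1 adjacency matrix."""
    n = len(adj)
    color = [None] * n
    for start in range(n):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v]:
                    if color[v] is None:
                        color[v] = 1 - color[u]
                        queue.append(v)
                    elif color[v] == color[u]:
                        return False  # odd cycle found
    return True

n = 6
# star G0: center 0 connected to every other vertex
star = [[0] * n for _ in range(n)]
for i in range(1, n):
    star[0][i] = star[i][0] = 1

assert is_bipartite(star)        # G0 is bipartite
star[2][3] = star[3][2] = 1      # one leaf-leaf edge -> triangle 0-2-3
assert not is_bipartite(star)    # the resulting G1 is not
```

An adversary can place the extra edge at whichever leaf-leaf entry ALG never reads, which is the intuition behind the (n−1)C2 lower bound.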


Quadratic-time vertex cover verification

Suppose you are given an undirected graph G with n vertices and m
edges represented by an n x n adjacency matrix A, and you are also
given a subset of vertices S (represented by an array of size m).
How can you check whether S is a vertex cover of G with quadratic
time and space complexity?
By the definition of a vertex cover, I know that every edge must be incident to a vertex contained in S.
I can easily come up with a cubic algorithm: iterate over the adjacency matrix; each 1 represents an edge (u, v). Check whether u or v are in S. If not, the answer is no. If we get to the end of the adjacency matrix, the answer is yes.
But how can I do this in O(n^2) time? I guess the only real "observation" I've made so far is that we can possibly skip intermediate rows while iterating over the adjacency matrix if we've already found the vertex corresponding to that row in S. However, this has not helped me very much.
Can someone please help me (or point me in the correct direction)?
Thanks
Construct an array T of all of the vertices NOT in S.
And then:
for i in T:
    for j in T:
        if A[i][j] == 1:
            return False
return True
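A runnable version of this idea (assuming vertices are numbered 0..n−1 and S is given as a list of vertex indices):

```python
def is_vertex_cover(A, S):
    """Check whether S covers every edge of the graph with
    adjacency matrix A, in O(n^2) time and O(n) extra space."""
    n = len(A)
    in_S = [False] * n
    for v in S:
        in_S[v] = True
    T = [v for v in range(n) if not in_S[v]]  # vertices outside S
    # An edge is uncovered iff BOTH endpoints lie outside S,
    # so we only need to scan the T x T submatrix.
    for i in T:
        for j in T:
            if A[i][j] == 1:
                return False
    return True
```

For a triangle on vertices {0, 1, 2}, S = [0, 1] covers all three edges, while S = [0] leaves edge (1, 2) uncovered.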

Number of complete graph components

Given an undirected graph. How do I check if it can be divided into two sets where every node of one set is connected to every other node of its own set (complete graph). A set can be empty or of only one node. No node should be remaining.
Thanks.
EDIT: Edges between two sets is not forbidden.
Basically we have to check if the graph can be divided into two cliques
As commented by @Damien, checking whether the vertices of a given graph can be partitioned into two cliques is the decision problem of clique cover with k = 2. For general k (even for k = 3), the clique cover problem is known to be NP-complete. For k = 2, there exists an O(n^2) algorithm, based on the observation below.
Given a graph G = (V, E), denote its complement as G'. Then V can be partitioned into two cliques if and only if G' is 2-colorable.
The proof is simple and thus omitted here. The sketch of the algorithm is shown below.
01. construct G' from G;
02. if G' is bipartite
03. return true;
04. else
05. return false;
Note that the first line requires O(n^2) time, while testing whether G' is bipartite requires only O(n + m') time using BFS, where n is the number of vertices and m' is the number of edges of G'. Since m' = O(n^2), the total complexity is O(n^2).
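A sketch of the algorithm above (assuming a 0/1 adjacency-matrix input; the BFS scans matrix rows, so both steps stay within O(n^2)):

```python
from collections import deque

def can_split_into_two_cliques(A):
    """V splits into two cliques iff the complement graph is
    bipartite (2-colorable). A is an n x n 0/1 adjacency matrix."""
    n = len(A)
    # Step 1: build the complement G' in O(n^2).
    comp = [[1 - A[i][j] if i != j else 0 for j in range(n)]
            for i in range(n)]
    # Step 2: BFS 2-coloring of G'.
    color = [None] * n
    for s in range(n):
        if color[s] is not None:
            continue
        color[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if comp[u][v]:
                    if color[v] is None:
                        color[v] = 1 - color[u]
                        q.append(v)
                    elif color[v] == color[u]:
                        return False  # odd cycle in G'
    return True
```

For example, the path 0-1-2 splits into cliques {0, 1} and {2}, while the 5-cycle cannot be split (its complement is again a 5-cycle, which is not bipartite).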

Linear-time algorithm for number of distinct paths from each vertex in a directed acyclic graph

I am working on the following past paper question for an algorithms module:
Let G = (V, E) be a simple directed acyclic graph (DAG).
For a pair of vertices v, u in V, we say v is reachable from u if there is a (directed) path from u to v in G.
(We assume that every vertex is reachable from itself.)
For any vertex v in V, let R(v) be the reachability number of vertex v, which is the number of vertices u in V that are reachable from v.
Design an algorithm which, for a given DAG, G = (V, E), computes the values of R(v) for all vertices v in V.
Provide the analysis of your algorithm (i.e., correctness and running time analysis).
(Optimally, one should try to design an algorithm running in
O(n + m) time.)
So, far I have the following thoughts:
The following algorithm for finding a topological sort of a DAG might be useful:
TopologicalSort(G)
1. Run DFS on G and compute a DFS-numbering, N // A DFS-numbering is a numbering (starting from 1) of the vertices of G, representing the point at which the DFS-call on a given vertex v finishes.
2. Let the topological sort be the function a(v) = n - N[v] + 1 // n is the number of nodes in G and N[v] is the DFS-number of v.
My second thought is that dynamic programming might be a useful approach, too.
However, I am currently not sure how to combine these two ideas into a solution.
I would appreciate any hints!
EDIT: Unfortunately the approach below is not correct in general: it may count nodes that are reachable via multiple paths more than once.
The ideas below are valid if the DAG is a polytree, since this guarantees that there is at most one path between any two nodes.
You can use the following steps:
find all nodes with 0 in-degree (i.e. no incoming edges).
This can be done in O(n + m), e.g. by looping through all edges
and marking those nodes that are the end of any edge. The nodes with 0
in-degree are those which have not been marked.
Start a DFS from each node with 0 in-degree.
After the DFS call for a node ends, we want to have computed for that
node the information of its reachability.
In order to achieve this, we need to add the reachability of the
successors of this node. Some of these values might have already been
computed (if the successor was already visited by DFS), therefore this
is a dynamic programming solution.
The following pseudocode describes the DFS code:
function DFS(node) {
    visited[node] = true;
    reachability[node] = 1;
    for each successor of node {
        if (!visited[successor]) {
            DFS(successor);
        }
        reachability[node] += reachability[successor];
    }
}
After calling this for all nodes with 0 in-degree, the reachability
array will contain the reachability for all nodes in the graph.
The overall complexity is O(n + m).
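As the edit above warns, this only counts correctly when there is at most one path between any two nodes (a polytree). Under that assumption, the steps can be sketched as runnable Python (the dict-of-successors representation and the function name are choices made here for illustration):

```python
def reachability_counts(succ):
    """succ: dict mapping every node -> list of its successors
    (every node must appear as a key).
    Correct only if the DAG is a polytree; otherwise nodes that
    are reachable along several paths get counted multiple times."""
    visited = set()
    reach = {v: 0 for v in succ}

    def dfs(u):
        visited.add(u)
        reach[u] = 1  # every node reaches itself
        for w in succ[u]:
            if w not in visited:
                dfs(w)
            reach[u] += reach[w]  # reuse already-computed subresults

    # nodes with in-degree 0: those that are never the end of an edge
    targets = {w for vs in succ.values() for w in vs}
    for v in succ:
        if v not in targets:
            dfs(v)
    return reach
```

For the polytree a -> b, a -> c, b -> d this gives R(a) = 4, R(b) = 2, R(c) = R(d) = 1.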
I'd suggest using a Breadth First Search approach.
For every node, add all of its adjacent nodes to the queue. In addition, maintain a separate array arr for the reachability counts.
For example, if A -> B, then:
1.) Mark A as traversed
2.) Add B to the queue
3.) arr[B] += 1
This way, we can get R(v) for all vertices in O(|V| + |E|) time through arr[].

All pairs shortest paths with dynamic programming

All,
I am reading about the relationship between all pairs shortest path and matrix multiplication.
Consider the multiplication of the weighted adjacency matrix with
itself - except, in this case, we replace the multiplication operation
in matrix multiplication by addition, and the addition operation by
minimization. Notice that the product of weighted adjacency matrix
with itself returns a matrix that contains shortest paths of length 2
between any pair of nodes.
It follows from this argument that A to power of n contains all shortest paths.
Question number 1:
My question: a (simple) path between two nodes in a graph has at most n−1 edges, so on what basis is the author discussing paths of length "n"?
Following slides
www.infosun.fim.uni-passau.de/br/lehrstuhl/.../Westerheide2.PPT
On slide 10 it is mentioned as below.
d_ij^(1) = c_ij
d_ij^(m) = min( d_ij^(m-1), min_{1≤k≤n} { d_ik^(m-1) + c_kj } )   --> Eq 1
         = min_{1≤k≤n} { d_ik^(m-1) + c_kj }                      --> Eq 2
Question 2: how author concluded Eq 2 from Eq 1.
In Cormen et al book on introduction to algorithms, it is mentioned as below:
What are the actual shortest-path weights delta(i, j)? If the graph contains no negative-weight cycles, then all shortest paths are simple and thus contain at most n - 1 edges. A path from vertex i to vertex j with more than n - 1 edges cannot have less weight than a shortest path from i to j. The actual shortest-path weights are therefore given by
delta(i, j) = d_ij^(n-1) = d_ij^(n) = d_ij^(n+1) = ...
Question 3: in above equation how author came with n, n+1 edges as we have at most n-1, and also how above assignment works?
Thanks!
The n vs. n−1 issue is just an unfortunate choice of variable name; the author should have used a different letter to be clearer.
A^1 has the shortest paths of length up to 1 (trivially)
A^2 has the shortest paths of length up to 2
A^k has the shortest paths of length up to k
Eq 2 does follow from Eq 1: since c_jj = 0, taking k = j inside the minimum gives d_ij^(m-1) + c_jj = d_ij^(m-1), so the first term of Eq 1 is already covered by the inner minimum and can be dropped. (I can't double-check the slides - your ppt link is broken.)
The author is just explicitly pointing out that you have nothing to gain by adding more than n−1 edges to the path:
A^(n-1), //the shortest paths of length up to (n-1)
is equal to A^n //the shortest paths of length up to n
is equal to A^(n+1) //the shortest paths of length up to (n+1)
...
This is just so that you can safely stop your computations at (n−1) and be sure that you have the minimum paths among all paths of all lengths. (This is kind of obvious, but the textbook has a point in being strict here...)
In a graph a path between two nodes has at most n−1 edges, so on what basis is the author discussing paths of length "n"?
You're confusing the multiple measures being discussed:
A^n represents the shortest paths (by weight) using up to n edges between vertices.
"at most n−1 edges between two nodes" -- here you're thinking of n as the number of vertices of the graph.
The graph could have hundreds of vertices, but the matrix A^3 still shows the shortest paths of length up to 3. These are two different uses of n.
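The (min, +) product described in the quoted passage can be sketched in a few lines of Python (the 4-vertex weight matrix W below is made up for illustration):

```python
INF = float('inf')

def min_plus(A, B):
    """Matrix 'product' with * replaced by + and + replaced by min."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# weighted adjacency matrix: 0 on the diagonal, INF where there is no edge
W = [
    [0,   3,   INF, 7],
    [3,   0,   1,   INF],
    [INF, 1,   0,   2],
    [7,   INF, 2,   0],
]

D = W
for _ in range(len(W) - 2):  # n - 2 more products: paths of up to n - 1 edges
    D = min_plus(D, W)
# D now holds the all-pairs shortest-path weights
```

Here the direct edge 0-3 costs 7, but the three-edge path 0-1-2-3 costs 3 + 1 + 2 = 6, which only appears after the final product. Taking further products with W leaves D unchanged, matching the A^(n-1) = A^n = A^(n+1) = ... observation above.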

Clique problem algorithm design

One of the assignments in my algorithms class is to design an exhaustive search algorithm to solve the clique problem. That is, given a graph of size n, the algorithm is supposed to determine if there is a complete sub-graph of size k. I think I've gotten the answer, but I can't help but think it could be improved. Here's what I have:
Version 1
input: A graph represented by an array A[0,...n-1], the size k of the subgraph to find.
output: True if a subgraph exists, False otherwise
Algorithm (in python-like pseudocode):
def clique(A, k):
    P = A x A x A //Cartesian product
    for tuple in P:
        if connected(tuple):
            return true
    return false

def connected(tuple):
    unconnected = tuple
    for vertex in tuple:
        for test_vertex in unconnected:
            if vertex is linked to test_vertex:
                remove test_vertex from unconnected
    if unconnected is empty:
        return true
    else:
        return false
Version 2
input: An adjacency matrix of size n by n, and k the size of the subgraph to find
output: All complete subgraphs in A of size k.
Algorithm (this time in functional/Python pseudocode):
//Base case: return all vertices in a list since each
//one is a 1-clique
def clique(A, 1):
    S = new list
    for i in range(0 to n-1):
        add i to S
    return S

//Get a tuple representing all the cliques where
//k = k - 1, then find any cliques for k
def clique(A, k):
    C = clique(A, k-1)
    S = new list
    for tuple in C:
        for i in range(0 to n-1):
            //make sure the ith vertex is linked to each
            //vertex in tuple
            for j in tuple:
                if A[i,j] != 1:
                    break
            //This means that vertex i makes a clique
            if j is the last element:
                newtuple = (i | tuple) //make a new tuple with i added
                add newtuple to S
    //Return the list of k-cliques
    return S
Does anybody have any thoughts, comments, or suggestions? This includes bugs I might have missed as well as ways to make this more readable (I'm not used to using much pseudocode).
Version 3
Fortunately, I talked to my professor before submitting the assignment. When I showed him the pseudo-code I had written, he smiled and told me that I did way too much work. For one, I didn't have to submit pseudo-code; I just had to demonstrate that I understood the problem. And two, he was wanting the brute force solution. So what I turned in looked something like this:
input: A graph G = (V,E), the size of the clique to find k
output: True if a clique does exist, false otherwise
Algorithm:
Find the Cartesian product V^k.
For each tuple in the result, test whether each vertex is connected to every other. If all are connected, return true and exit.
Return false and exit.
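A runnable sketch of this brute-force check (using itertools.combinations instead of the full Cartesian product, which skips permutations and repeated vertices but tests the same vertex subsets):

```python
from itertools import combinations

def has_clique(A, k):
    """Brute force: try every k-subset of vertices and test whether
    all pairs inside it are adjacent. A is an n x n 0/1 adjacency
    matrix; runtime is O(C(n, k) * k^2)."""
    n = len(A)
    for subset in combinations(range(n), k):
        # a subset is a clique iff every pair inside it is an edge
        if all(A[u][v] == 1 for u, v in combinations(subset, 2)):
            return True
    return False
```

For a graph consisting of a triangle {0, 1, 2} plus an isolated vertex 3, has_clique(A, 3) is true and has_clique(A, 4) is false.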
UPDATE: Added second version. I think this is getting better although I haven't added any fancy dynamic programming (that I know of).
UPDATE 2: Added some more commenting and documentation to make version 2 more readable. This will probably be the version I turn in today. Thanks for everyone's help! I wish I could accept more than one answer, but I accepted the answer by the person that's helped me out the most. I'll let you guys know what my professor thinks.
Some comments:
You only need to consider n-choose-k combinations of vertices, not all k-tuples (there are n^k of those).
connected(tuple) doesn't look right. Don't you need to reset unconnected inside the loop?
As the others have suggested, there are better ways of brute-forcing this. Consider the following recursive relation: A (k+1)-subgraph is a clique if the first k vertices form a clique and vertex (k+1) is adjacent to each of the first k vertices. You can apply this in two directions:
Start with a 1-clique and gradually expand the clique until you get the desired size. For example, if m is the largest vertex in the current clique, try to add vertex {m+1, m+2, ..., n-1} to get a clique that is one vertex larger. (This is similar to a depth-first tree traversal, where the children of a tree node are the vertices larger than the largest vertex in the current clique.)
Start with a subgraph of the desired size and check if it is a clique, using the recursive relation. Set up a memoization table to store results along the way.
(implementation suggestion) Use an adjacency matrix (0-1) to represent edges in the graph.
(initial pruning) Throw away all vertices with degree less than k−1, since they cannot belong to a k-clique.
I once implemented an algorithm to find all maximal cliques in a graph, which is a similar problem to yours. The way I did it was based on this paper: http://portal.acm.org/citation.cfm?doid=362342.362367 - it described a backtracking solution which I found very useful as a guide, although I changed quite a lot from that paper. You'd need a subscription to get at that though, but I presume your University would have one available.
One thing about that paper though is I really think they should have named the "not set" the "already considered set" because it's just too confusing otherwise.
The algorithm "for each k-tuple of vertices, if it is a clique, then return true" works for sure. However, it's brute force, which is probably not what an algorithms course is searching for. Instead, consider the following:
Every vertex is a 1-clique.
For every 1-clique, every vertex that connects to the vertex in the 1-clique contributes to a 2-clique.
For every 2-clique, every vertex that connects to each vertex in the 2-clique contributes to a 3-clique.
...
For every (k-1)-clique, every vertex that connects to each vertex in the (k-1) clique contributes to a k-clique.
This idea might lead to a better approach.
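The level-by-level idea above can be sketched as follows (the function name is chosen here for illustration); extending each (k−1)-clique only with vertices larger than its current maximum ensures every k-clique is generated exactly once:

```python
def cliques_of_size(A, k):
    """Return all k-cliques of the graph with 0/1 adjacency matrix A,
    grown one vertex at a time, each as a sorted tuple. Assumes k >= 1."""
    n = len(A)
    level = [(v,) for v in range(n)]  # every vertex is a 1-clique
    for _ in range(k - 1):
        nxt = []
        for c in level:
            # only try vertices larger than max(c) = c[-1],
            # so each clique is produced in sorted order exactly once
            for v in range(c[-1] + 1, n):
                if all(A[u][v] for u in c):
                    nxt.append(c + (v,))
        level = nxt
    return level
```

On the triangle-plus-isolated-vertex example, cliques_of_size(A, 3) yields only (0, 1, 2), and cliques_of_size(A, 4) is empty.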
It's amazing what typing things down as a question will show you about what you've just written. This line:
P = A x A x A //Cartesian product
should be this:
P = A^k //Cartesian product
What do you mean by A^k? Are you taking a matrix product? If so, is A the adjacency matrix (you said it was an array of n+1 elements)?
In setbuilder notation, it would look something like this:
P = {(x1, x2, ..., xk) | x1 ∈ A and x2 ∈ A ... and xk ∈ A}
It's basically just a Cartesian product of A taken k times. On paper, I wrote it down as k being a superscript of A (I just now figured out how to do that using markdown).
Plus, A is just an array of each individual vertex without regard for adjacency.
