How to generate matrices which satisfy the triangle inequality? - algorithm

Let's consider a square matrix E of dimension n, where n is fixed (for example n = 4 or n = 5). The matrix entries satisfy the following conditions: each entry is a nonnegative integer no greater than n, and the entries satisfy the triangle inequality (E_ij <= E_ik + E_kj for all i, j, k).
The task is to generate all such matrices E. My question is how to do that. Is there any common approach or algorithm? Is that even possible? What should I start with?

Naive solution
A naive solution to consider is to generate every possible n-by-n matrix E where each component is a nonnegative integer no greater than n, then take from those only the matrices that satisfy the additional constraints. What would be the complexity of that?
Each component can take on n + 1 values, and there are n^2 components, so there are O((n+1)^(n^2)) candidate matrices. That has an insanely high growth rate.
Link: WolframAlpha analysis of (n+1)^(n^2)
I think it's safe to say that this is not a feasible approach.
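For very small n you could still write the brute-force filter directly. Below is a minimal Python sketch, assuming the constraints are exactly the ones stated above (entries in 0..n plus the triangle inequality); the function name naive_matrices is illustrative only.

from itertools import product

def naive_matrices(n):
    # Brute-force sketch: enumerate all (n+1)^(n^2) candidate matrices and keep
    # those whose entries satisfy E[i][j] <= E[i][k] + E[k][j] for all i, j, k.
    for flat in product(range(n + 1), repeat=n * n):
        E = [list(flat[i * n:(i + 1) * n]) for i in range(n)]
        if all(E[i][j] <= E[i][k] + E[k][j]
               for i in range(n) for j in range(n) for k in range(n)):
            yield E

Even for n = 3 this already scans 4^9 candidates, which is why the approach below is preferable.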
Better solution
A better solution follows. It involves a lot of math.
Let S be the set of all matrices E that satisfy your requirements. Let N = {1, 2, ..., n}.
Definitions:
Let a metric on N have the usual definition, except with the requirement of symmetry omitted.
Let I and J partition the set N. Let D(I,J) be the n x n matrix that has D_ij = 1 when i is in I and j is in J, and D_ij = 0 otherwise.
Let A and B be in S. Then A is adjacent to B if and only if there exist I and J partitioning N such that A + D(I,J) = B.
We say A and B are adjacent if and only if A is adjacent to B or B is adjacent to A.
Two matrices A and B in S are path-connected if and only if there exists a sequence of adjacent elements of S between them.
Let the function M(E) denote the sum of the elements of matrix E.
Lemma 1:
E = D(I,J) is a metric on N.
Proof:
This is a trivial statement except for the case of an edge going from I to J. Let i be in I and j be in J. Then E_ij = 1 by definition of D(I,J). Let k be in N. If k is in I, then E_ik = 0 and E_kj = 1, so E_ik + E_kj >= E_ij. If k is in J, then E_ik = 1 and E_kj = 0, so E_ik + E_kj >= E_ij.
Lemma 2:
Let E be in S such that E != zeros(n,n). Then there exist I and J partitioning N such that E' = E - D(I,J) is in S with M(E') < M(E).
Proof:
Let (i,j) be such that E_ij > 0. Let I be the subset of N that can be reached from i by a directed path of cost 0. I cannot be empty, because i is in I. I cannot be N, because j is not in I: if j were reachable from i by a path of cost 0, the triangle inequality would force E_ij = 0, contradicting E_ij > 0.
Let J = N - I. Then I and J are both nonempty and partition N. By the definition of I, there does not exist any (x,y) such that E_xy = 0 and x is in I and y is in J. Therefore E_xy >= 1 for all x in I and y in J.
Thus E' = E - D(I,J) >= 0. That M(E') < M(E) is obvious, because all we have done is subtract from elements of E to get E'. Now, since E is a metric on N and D(I,J) is a metric on N (by Lemma 1) and E >= D(I,J), we have E' is a metric on N. Therefore E' is in S.
Theorem:
Let E be in S. Then E and zeros(n,n) are path-connected.
Proof (by induction):
If E = zeros(n,n), then the statement is trivial.
Suppose E != zeros(n,n). Let M(E) be the sum of the values in E. Then, by induction, we can assume that the statement is true for any matrix E' having M(E') < M(E).
Since E != zeros(n,n), by Lemma 2 we have some E' in S such that M(E') < M(E). Then by the inductive hypothesis E' is path-connected to zeros(n,n). Therefore E is path-connected to zeros(n,n).
Corollary:
The set S is path-connected.
Proof:
Let A and B be in S. By the Theorem, A and B are both path-connected to zeros(n,n). Therefore A is path-connected to B.
Algorithm
The Corollary tells us that everything in S is path-connected. So an effective way to discover all of the elements of S is to perform a breadth-first search over the graph defined by the following.
The elements of S are the nodes of the graph
Nodes of the graph are connected by an edge if and only if they are adjacent
Given a node E, you can find all of the (potentially) unvisited neighbors of E by simply enumerating all of the possible matrices D(I,J) (of which there are 2^n - 2) and generating E' = E + D(I,J) for each. Enumerating the D(I,J) should be relatively straightforward (there is one for every possible subset I of N, except for the empty set and N itself).
Note that, in the preceding paragraph, E and D(I,J) are both metrics on N. So when you generate E' = E + D(I,J), you don't have to check that it satisfies the triangle inequality - E' is the sum of two metrics, so it is a metric. To check that E' is in S, all you have to do is verify that the maximum element in E' does not exceed n.
You can start the breadth-first search from any element of S and be guaranteed that you won't miss any of S. So you can start the search with zeros(n,n).
Be aware that the cardinality of the set S grows extremely fast as n increases, so computing the entire set S will only be tractable for small n.
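For concreteness, here is a minimal Python sketch of this breadth-first search, under the assumptions above (entries bounded by n, matrices stored as tuples of tuples so they can be put in a set, one D(I,J) per nonempty proper subset I); all names are illustrative.

from collections import deque
from itertools import combinations

def all_D(n):
    # One D(I,J) for every nonempty proper subset I of {0,...,n-1}, with J its complement
    Ds = []
    for r in range(1, n):
        for I in combinations(range(n), r):
            I = set(I)
            Ds.append(tuple(tuple(1 if (i in I and j not in I) else 0
                                  for j in range(n)) for i in range(n)))
    return Ds

def enumerate_S(n):
    # Breadth-first search over S starting from the zero matrix
    zero = tuple(tuple(0 for _ in range(n)) for _ in range(n))
    Ds = all_D(n)
    seen = {zero}
    queue = deque([zero])
    while queue:
        E = queue.popleft()
        for D in Ds:
            Ep = tuple(tuple(E[i][j] + D[i][j] for j in range(n)) for i in range(n))
            # E' is automatically a metric; only the bound max(E') <= n must be checked
            if all(x <= n for row in Ep for x in row) and Ep not in seen:
                seen.add(Ep)
                queue.append(Ep)
    return seen

print(len(enumerate_S(3)))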

Related

Minimum steps to reach target, given (1, 0) and operations A = 2A - B or B = 2B - A

I had an interview and I was not able to give the best approach for the problem.
A=1, B=0
- Operation L: A=2A-B
- Operation R: B=2B-A
At each step, only one operation (L or R) takes place.
For a given number N, what is the minimum number of operations required to make A or B equal to N?
The most important consideration is efficiency.
Thanks in advance.
In k operations you can get all values of N in [-(2^k)+1, 2^k].
Notice that abs(A) + abs(B) = 2^k for all possible k paths, and that A & B exactly cover the range [-(2^k)+1, 2^k] in the set of paths of length k.
k=0: (1,0)
k=1: (1,-1), (2,0)
k=2: (1,-3), (2,-2), (3,-1), (4,0)
etc...
Given N we can find the minimum k via log. Then we know the final pair is (N, N - 2^k) (or (N + 2^k, N) if N <= 0). It's easy to follow the path back up to k=0 because one of the two elements will be out of range for the next smaller k.
E.g., N = 35.
Log2(35) = 5.13, so we use k=6.
2^6 = 64, so our final pair is (35, -29)
(35,-29) -> (3,-29) -> (3, -13) -> (3, -5) -> (3,-1) -> (1,-1) -> (1,0)
Figuring out k is O(1), and finding the path is O(k), which is O(log(abs(N))).
It's not likely you need to prove anything in an interview, but if you did, you could use this:
By observation: A - B = 2^k after k steps (verified for small k).
Then via induction: we have some valid (A, B) s.t. A-B = 2^k. Then L gets us (2A-B, B), but 2A-B-B = 2A-2B = 2(A-B) = 2^(k+1) as desired. Similarly for R.
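A minimal Python sketch of the whole procedure, under the conventions used above (pairs (A, B) with A >= 1, B <= 0 and A - B = 2^k); the back-tracking step keeps the halved value on whichever side stays in that range, which is one way to implement the "out of range" check. Names are illustrative.

def min_steps(N):
    # Return (k, path) where path goes from the final pair back to (1, 0)
    if N in (0, 1):
        return 0, [(1, 0)]
    # Smallest k with N in [-(2^k) + 1, 2^k]
    k = (N - 1).bit_length() if N > 0 else (-N).bit_length()
    A, B = (N, N - 2 ** k) if N > 0 else (N + 2 ** k, N)
    path = [(A, B)]
    while (A, B) != (1, 0):
        mid = (A + B) // 2          # A + B is always even because A - B is a power of 2
        if mid >= 1:
            A = mid                 # undo operation L: previous A was (A + B) / 2
        else:
            B = mid                 # undo operation R: previous B was (A + B) / 2
        path.append((A, B))
    return k, path

# Example from above: N = 35 gives k = 6 and the path (35,-29) ... (1,0)
print(min_steps(35))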
It would be a challenging task for an interview but I would start with the recursion of trying to find the origin from the result. Given valid (A', B), where A' is the target we are after,
A' = 2A - B
for some A, which means that
A = (A' + B) / 2
The latter tells us that (A' + B) must be divisible by 2 at every step of the backward path. Equivalently, since the path starts at (1, 0) where A - B = 1 and each operation doubles A - B, the difference A - B at every step is a power of 2.
Another property we can observe, although it may not be relevant to the solution, is that once we switch in the first step to (even, even) or (odd, odd), we cannot switch back.

Given a directed weighted graph with self loops, find the list of nodes that are exactly k dist from a given node x?

Each edge in the graph has a weight of 1. The graph may have cycles, and if a node has a self loop it can be at any distance from itself, from 0 to infinity, depending on the number of times we take the self loop.
I have solved the problem using BFS, but the constraint on distance is on the order of 10^9, hence BFS is too slow.
We will be asked multiple queries on a given graph of the form
(distance, source)
and the output is the list of nodes that are exactly at the given distance starting from the source vertex.
Constraints
1<=Nodes<=500
1<queries<=500
1<=distance<=10^9
I have a feeling there would be many repeated computations, as the number of nodes is small, but I am not able to figure out how to reduce the problem into smaller problems.
What is the efficient way to do this?
Edit: I have tried using matrix exponentiation, but it is too slow for the given constraints. The problem has a time limit of 1 sec.
Let G = (V,E) be your graph, and define the adjacency matrix A as follows:
A[i][j] = 1 if (V[i],V[j]) is in E
          0 otherwise
In this matrix, for each k:
(A^k)[i][j] > 0 if and only if there is a path from v[i] to v[j] of length exactly k.
This means by creating this matrix and then calculating the exponent, you can easily get your answer.
For fast exponent calculation you can use exponentiation by squaring, which will yield O(M(n) * log(k)), where M(n) is the cost of matrix multiplication for an n x n matrix.
This will also save you some calculation when looking for different queries on the same graph.
Appendix - claim proof:
Base: A^1 = A, and indeed in A by definition, A[i][j]=1 if and only if (V[i],V[j]) is in E
Hypothesis: assume the claim is correct for all l<k
A^k = A^(k-1)*A. From induction hypothesis, A^(k-1)[i][j] > 0 iff there is a path of length k-1 from V[i] to V[j].
Let's examine two vertices v1,v2 with indices i and j.
If there is a path of length k between them, let it be v1->...->u->v2. Let the index of u be m.
From i.h. A^(k-1)[i][m] > 0 because there is a path. In addition A[m][j] = 1, because (u,v2) = (V[m],V[j]) is an edge.
A^k[i][j] = A^(k-1)*A[i][j] = A^(k-1)[i][1]A[1][j] + ... + A^(k-1)[i][m]A[m][j] + ... + A^(k-1)[i][n]A[n][j]
And since A[m][j] > 0 and A^(k-1)[i][m] > 0, then A^(k-1)*A[i][j] > 0
If there is no such path, then for each vertex u such that (u,v2) is an edge, there is no path of length k-1 from v1 to u (otherwise v1->...->u->v2 would be a path of length k).
Then, using induction hypothesis we know that if A^(k-1)[i][m] > 0 then A[m][j] = 0, for all m.
If we assign that in the sum defining A^k[i][j], we get that A^k[i][j] = 0
QED
Small note: Technically, A^k[i][j] is the number of paths between i and j of length exactly k. This can be proven similar to above but with a bit more attention to details.
To avoid the numbers growing too fast (which will increase M(n) because you might need big integers to store that value), and since you don't care for the value other than 0/1 - you can treat the matrix as booleans - using only 0/1 values and trimming anything else.
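Here is a minimal Python sketch of this approach over the boolean semiring, as suggested in the last paragraph; names are illustrative, and in practice you would want bitset-based multiplication (and caching of repeated powers across queries) to have a chance at the 1-second limit mentioned in the question.

def mat_mult_bool(X, Y):
    # Boolean matrix product: result[i][j] is True iff X[i][m] and Y[m][j] hold for some m
    n = len(X)
    return [[any(X[i][m] and Y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow_bool(A, k):
    # A^k over the boolean semiring via exponentiation by squaring
    n = len(A)
    result = [[i == j for j in range(n)] for i in range(n)]  # identity matrix
    base = A
    while k:
        if k & 1:
            result = mat_mult_bool(result, base)
        base = mat_mult_bool(base, base)
        k >>= 1
    return result

def nodes_at_distance(adj, source, k):
    # Indices j such that there is a walk of length exactly k from source to j
    Ak = mat_pow_bool(adj, k)
    return [j for j, reachable in enumerate(Ak[source]) if reachable]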
If there are cycles in your graph, then a node that is first reached at some offset and lies on a cycle of length C is also reachable at distance offset + C*N for any N >= 0, because you can go around the cycle as many times as you wish.
That brings me to the idea: we can use the cycles to our advantage!
Using BFS while detecting cycles, we calculate offset + cycle*N, get as close to our goal K as possible,
and then search for K pretty easily.
e.g.
A -> B -> C -> D -> B
K = 1000;
S = A;
A - 0
B - 1
C - 2
D - 3
B - 1 (+ 3N)   (B is reached again after going around the cycle B -> C -> D -> B, which has length 3)
Here you solve k - (1 + 3N) = 0, i.e. 1000 - 1 = 3N, so N = 333 and B is at distance exactly 1 + 3*333 = 1000.
A simpler way would be to calculate (k - offset) mod cycle to see how close a multiple of the cycle gets you.
From here you only need to count a few more steps.
In this example (as a REGEX): A(BCD){333}B

Random Polynomial TM

I need to show that there exists a randomized polynomial-time TM M that uses O(log(n)) space s.t.
for input G,s,t, where G is a directed graph and s,t are two vertices in
G: If there is a path from s to t then Pr[M(G,s,t) = 1] ≥ 1/nⁿ,
else Pr[M(G,s,t) = 1] = 0
I tried choosing a random neighbor each time, but I can't figure out why the probability is 1/nⁿ,
and I'm not sure about the number of iterations.
And another question:
I need to use the above result and the fact that I have a "random counter" that uses O(log k) space and can count up to 2^k, to show that:
L is in NL iff there exists a randomized polynomial-time TM M that uses O(log n)
space and, for every input x, M will stop and: If x is in L then
Pr[M(x) = 1] ≥ 1/2, else Pr[M(x) = 1] = 0
I will only answer the first question as this should be one question per post.
Algorithm:
1. Start with vertex s.
2. Set a counter to zero.
3. Choose a random neighbor v (and increment the counter).
4. Check whether v equals t.
5. If not, choose another random neighbor of v (and increment the counter).
6. Repeat steps 4 and 5 until you find t or the counter reaches n (or maybe c⋅n where c is a constant).
To do this, you only have to store 3 vertices (s, v, t) and the counter. If the counter is stored as a binary number it needs log₂(n) bits. So this runs in O(log(n)) space.
If there is no path from s to t, the walk can never reach t, so Pr[M(G,s,t) = 1] = 0 holds.
If there exists a path, the probability of finding it is at least the product of the probabilities of choosing the right neighbor at every step. The worst case is that G is a complete graph, so every vertex has n-1 neighbors. The path cannot be longer than n vertices. So let [s,v₁,v₂,...,vm,t] be a path from s to t, with m < n. Then we get
Pr[M(G,s,t) = 1] ≥ Π_{k=1,...,m} 1/|neighbors(v_k)|
≥ Π_{k=1,...,m} 1/(n-1)
≥ Π_{k=1,...,m} 1/n
≥ Π_{k=1,...,n} 1/n
= 1/nⁿ
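A minimal Python sketch of a single run of this walk, assuming an adjacency-list representation; it only illustrates the behavior of M, while the actual machine stores just the current vertex and the counter in O(log n) bits.

import random

def random_walk_run(adj, s, t, bound=None):
    # One run of the walk: return 1 if t is reached within 'bound' steps, else 0.
    # 'adj' is an adjacency list (list of lists); bound defaults to n = len(adj).
    n = len(adj)
    if bound is None:
        bound = n
    v = s
    counter = 0
    while counter < bound:
        if v == t:
            return 1
        if not adj[v]:          # dead end, cannot continue
            return 0
        v = random.choice(adj[v])
        counter += 1
    return 1 if v == t else 0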

Topological sort to find the number of paths to t

I have to develop an O(|V|+|E|) algorithm related to topological sort which, in a directed acyclic graph (DAG), determines the number of paths from each vertex of the graph to t (t is a node with out-degree 0). I have developed a modification of DFS as follows:
DFS(G,t):
    for each vertex u ∈ V do
        color(u) = WHITE
        paths_to_t(u) = 0
    for each vertex u ∈ V do
        if color(u) == WHITE then
            DFS-Visit(u,t)

DFS-Visit(u,t):
    color(u) = GREY
    for each v ∈ neighbors(u) do
        if v == t then
            paths_to_t(u) = paths_to_t(u) + 1
        else
            if color(v) == WHITE then
                DFS-Visit(v,t)
            paths_to_t(u) = paths_to_t(u) + paths_to_t(v)
    color(u) = BLACK
But I am not sure if this algorithm is related to topological sort, or if I should restructure my work from another point of view.
It can be done using Dynamic Programming and topological sort as follows:
Topological sort the vertices, let the ordered vertices be v1,v2,...,vn
create new array of size t, let it be arr
init: arr[t] = 1
for i from t-1 to 1 (descending, inclusive):
    arr[i] = 0
    for each edge (v_i,v_j) such that i < j <= t:
        arr[i] += arr[j]
When you are done, for each i in [1,t], arr[i] indicates the number of paths from vi to vt
Now, proving the above claim is easy (compared to your algorithm, which I have no idea whether it is correct or how to prove it); it is done by induction:
Base: arr[t] == 1, and indeed there is a single path from t to t, the empty one.
Hypothesis: The claim is true for each k in range m < k <= t
Proof: We need to show the claim is correct for m.
Let's look at each out edge from v_m: (v_m, v_i).
The number of paths to v_t starting from v_m that use this edge (v_m, v_i) is exactly arr[i] (induction hypothesis). Summing over all out edges from v_m gives us the total number of paths from v_m to v_t, and this is exactly what the algorithm does.
Thus, arr[m] = #paths from v_m to v_t
QED
Time complexity:
The first step (topological sort) takes O(V+E).
The loop iterates over all edges once and all vertices once, so it is O(V+E) as well.
This gives us a total complexity of O(V+E).
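A minimal Python sketch of this DP, assuming vertices 0..n-1 given as an edge list and a target t with out-degree 0; Kahn's algorithm supplies the topological order, and all names are illustrative.

from collections import deque

def count_paths_to_t(n, edges, t):
    # Build adjacency lists and in-degrees
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1

    # Kahn's algorithm: topological order in O(V + E)
    order = []
    queue = deque(i for i in range(n) if indeg[i] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    # Process vertices in reverse topological order
    paths = [0] * n
    paths[t] = 1
    for u in reversed(order):
        if u != t:
            paths[u] = sum(paths[v] for v in adj[u])
    return paths

# Example: count_paths_to_t(4, [(0,1), (0,2), (1,3), (2,3)], 3) -> [2, 1, 1, 1]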

Find the set S which maximize the minimum distance between points in S union A

I would like to find a set S of given cardinality k maximizing the minimum distance between each of its points and the other points of S ∪ A, where A is a given set. Is there a simple algorithm to find the solution of this max-min problem?
Given a universe X ⊆ R^d and A ⊆ X,
find the argmax_{S ⊆ X, |S| = k} min_{s ∈ S, s' ∈ S∪A, s' ≠ s} distance(s,s')
Thanks !
For a given k, with X and A as input, brute forcing is obviously in P (as C(|X|,k) is polynomial in |X|).
If k is also an input then it might depend on 'distance':
If 'distance' is arbitrary then your problem is equivalent to finding a fixed-size clique in a graph (which is NP-complete):
NP-Hardness:
Take an instance of the clique problem, that is, a graph (G,E) and an integer k.
Add to this graph a vertex 'a' connected to every other vertex, and let (G',E') be your modified graph.
((G',E'), k+1) is then an instance of the clique problem equivalent to your original instance.
Create a map Phi from G' to R^d (you can map G' onto N anyway ...) and define 'distance' such that distance(Phi(c),Phi(d)) = 1 if (c,d) ∈ E', and 0 otherwise.
X = Phi(G'), A = Phi({a}), k+1 gives you an instance of your problem.
You can notice that by construction s ≠ Phi(a) <=> distance(s,Phi(a)) = 1 = max distance(.,.), i.e. min_{s ∈ S, s' ∈ S∪A, s' ≠ s} distance(s,s') = min_{s ∈ S, s' ∈ S, s' ≠ s} distance(s,s') if |S| = k+1 >= 2.
Solve this instance (X,A,k+1) of your problem: this gives you a set S of cardinality k+1 such that min( distance(s,s') | s,s' ∈ S, s ≠ s' ) is maximal.
Check whether there are s,s' ∈ S, s ≠ s', with distance(s,s') = 0 (this can be done in O(k^2) as |S| = k+1):
if that is the case then there is no set of cardinality k+1 in which distance(s,s') = 1 for all s ≠ s', i.e. there is no subgraph of G' which is a (k+1)-clique
if it's not the case then map S back into G'; by definition of 'distance' there is an edge between any two vertices of Phi^-1(S): it's a (k+1)-clique
This solves, with a polynomial reduction, a problem equivalent to the clique problem (known to be NP-hard).
NP-Easiness :
Let X, A, k be an instance of your problem.
For any subset S of X, min_{s ∈ S, s' ∈ S∪A, s' ≠ s} distance(s,s') can only take values in {distance(x,y) : x,y ∈ X}, which has polynomial cardinality.
To find a set that maximizes this distance we will just test every possible distance, in decreasing order, until one admits a suitable set.
To test a distance d we first reduce X to X', containing only the points at distance >= d from A \ {the point itself}.
Then we create a graph (X',E) ¤ where (s,s') ∈ E iff distance(s,s') >= d.
Test this graph for a k-clique (this is NP-easy); by construction there is one iff its vertices S form a set with min_{s ∈ S, s' ∈ S∪A, s' ≠ s} distance(s,s') >= d.
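As an illustration, here is a small Python sketch of that decreasing sweep, assuming X and A are finite lists and 'distance' is an arbitrary function; the k-clique test is replaced by a brute-force scan over k-subsets purely for readability (the argument above only needs the test to be in NP).

from itertools import combinations

def has_set_at_distance(X, A, k, d, distance):
    # Keep only points of X at distance >= d from every point of A (other than themselves),
    # then look for k points that are pairwise at distance >= d
    Xp = [x for x in X if all(distance(x, a) >= d for a in A if a != x)]
    for S in combinations(Xp, k):   # stand-in for the k-clique test; exponential in k
        if all(distance(s, sp) >= d for s, sp in combinations(S, 2)):
            return True
    return False

def max_min_distance(X, A, k, distance):
    # Sweep the polynomially many candidate distances in decreasing order
    candidates = sorted({distance(x, y) for x in X for y in X if x != y}, reverse=True)
    for d in candidates:
        if has_set_at_distance(X, A, k, d, distance):
            return d
    return None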
If 'distance' is Euclidean or has some other nice property it might be in P; I don't know, but I wouldn't hope for too much if I were you.
¤ I assumed that X (and thus X') was finite here
This is probably NP-hard. Greedily choosing the point furthest away from previous choices is a 2-approximation. There might be a complicated approximation scheme for low d based on the scheme for Euclidean TSP by Arora and Mitchell. For d = 10, forget it.
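A minimal Python sketch of that greedy rule, assuming X and A are finite lists of coordinate tuples and using math.dist (Euclidean distance) by default; the names are illustrative and no guarantee beyond the 2-approximation claim above is implied.

import math

def greedy_far_points(X, A, k, distance=math.dist):
    # Repeatedly pick the point of X whose minimum distance to A plus the
    # points already chosen is largest (farthest-point greedy heuristic)
    chosen = []
    for _ in range(k):
        best, best_score = None, -math.inf
        for x in X:
            if x in chosen:
                continue
            refs = chosen + list(A)
            score = min((distance(x, p) for p in refs if p != x), default=math.inf)
            if score > best_score:
                best, best_score = x, score
        chosen.append(best)
    return chosen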
The top 2 results for the search "sphere packing np complete" turned up references to a 1981 paper proving that optimal packing and covering in 2D is NP-complete. I do not have access to a research library, so I can't read the paper. But I expect that your problem can be rephrased as that one, in which case you have a proof that you have an NP-complete problem.
