Random Polynomial TM - complexity-theory

I need to show that there exists random polynomial TM M, that uses O(log(n)) space s.t.
for input G,s,t where G is directed graph and s,t are two vertices in
G: If there is a path from s to t then Pr[M(G,s,t) = 1] ≥ 1/nⁿ
Else Pr[M(G,s,t)=1] = 0
I tried to choose each time a random neighbor, but I can't figure why the probability is 1/nⁿ,
and I'm not sure about the number of iterations.
And another question:
I need to use the above result and the fact that I have "random counter" that uses O(log k) space, and can count up to 2k, to show that:
L is in LN iff there exists random polynomial TM M that uses O(log n)
space and for every input x, M will spot and: If x is in L then
Pr[M(x) = 1] ≥ 1/2 Else Pr[M(x) = 1] = 0

I will only answer the first question as this should be one question per post.
Algorithm:
start with vertice s
set a counter to zero.
choose a random neighbor v. (and increment the counter)
check if v equals t.
If not choose another random neighbor of v. (increment the counter)
repeat 3. and 4. until you found t or the counter reaches n. (or maybe c⋅n where c is a constant number)
To do this, you onlyhave to save 3 vertices (s,v,t) and the counter. If the counter is stored as binaray number it needs log₂(n) Bits. So this runs in O(log(n)) space.
If there is no path from s to t you will never get a neighbors neighbor of s that is t, so Pr[M(G,s,t)=1] = 0 holds.
If there exists a path, the propability of finding it would be the product of the propabilitys of choosing the right neighbors. So the worst case is that g is a complete graph, so evert vertice has n-1 neighbors. The path cannot be longer than n vertices. So let [s,v₁,v₂,...,vm,t] a path from s to t, with m < n. The we get
Pr[M(G,s,t) = 1] ≥ Πk=1,...,m 1/|neighbors(vk)|
≥ Πk=1,...,m 1/(n-1)
≥ Πk=1,...,m 1/n
≥ Πk=1,...,n 1/n
= 1/nⁿ

Related

How to solve the equation sum{max(a_i, x)}=y with variable x? Is there any algorithm with O(n) time complexity?

I am trying to find an algorithm to solve the following equation:
∑ max(ai, x) = y
in which the ai are constants and x is the variable.
I can find an algorithm with O(n log n) time complexity as follows:
First of all, sort the ai in O(n log n) time, and arrange intervals
(−∞, a0), (a0, a1), …, (ai, ai+1), …, (an−1, an), (an, ∞)
Then, for each interval, assume x belongs to this interval, and solve the equation. We could get a x̂, and then test whether x̂ belongs to this interval or not. If x̂ belongs to the corresponding interval, we will assign x̂ to x, and return x. On the other hand, we will try the next interval until we get the solution.
The above method is an O(n log n) algorithm due to the sort. With the definition of the equation-solving problem, I expect an algorithm with O(n) time complexity. Is there any reference for this problem?
First of all, this only has a solution if the sum of all a_i is smaller than y. You should check this first, because the algorithm below depends on this property.
Assume that we have chosen some pivot p from all a_i and want to calculate the x that corresponds to the interval [p, q), where q is the next larger a_i. This is:
If you move p to the next larger a_i, x changes as follows:
, where p' is the new pivot and n is the old number of a_i that are smaller or equal to p. Under the assumption that the sum of all a_i is smaller than y, this clearly leads to a decrease of x. Similarly, if we choose a smaller p, x is increased.
Coming back to the first equation, we can observe the following: If x is smaller than p, we should choose a smaller p. If x is greater than the smallest of the greater a_is, we should choose a larger p. In every other case, we have found the right x.
This can be utilized in a quick select procedure. #MvG's comment brought me onto this track. All credits for the quick select idea go to him. Here is some pseudo code (modified version from Wikipedia):
findX(list, y)
left := 0
right := length(list) - 1
sumGreater := 0 // the sum of all a_i greater than the current interval
numSmaller := 0 // the number of all a_i smaller than the current interval
minGreater := inf //the minimum of all a_i greater than the current interval
loop
if left = right
return (y - sumGreater) / (numSmaller + 1)
pivotIndex := medianOfMedians(list, left, right)
//the partition function will also sum the elements larger than the pivot,
//count the elements smaller than the pivot, and find the minimum of the
//larger elements
(pivotIndex, partialSumGreater, partialNumSmaller, partialMinGreater)
:= partition(list, left, right, pivotIndex)
x := (y - sumGreater - partialSumGreater) / (numSmaller + partialNumSmaller + 1)
if(x >= list[pivotIndex] && x < min(partialMinGreater, minGreater))
return x
else if x < list[pivotIndex]
right := pivotIndex - 1
minGreater := list[pivotIndex]
sumGreater += partialSumGreater + list[pivotIndex]
else
left := pivotIndex + 1
numSmaller += partialNumSmaller + 1
The key idea is that the partitioning function gathers some additional statistics. This does not change the time complexity of the partitioning function because it requires O(n) additional operations, leaving a total time complexity of O(n) for the partitioning function. The medianOfMedians function is also linear in time. The remaining operations in the loop are constant time. Assuming that the median of medians yields good pivots, the total time of the entire algorithm is approximately O(n + n/2 + n/4 + n/8 ...) = O(n).
Since comments might get deleted, I'm turning my own comments into a coherent answer. Contrary to the original question, I'm using indices 1 through n, avoiding the a0 originally used. So this is consistent one-based indexing using inclusive indices.
Assume for the moment that bi are the coefficients from your input, but in sorted order, so bi ≤ bi+1. As you essentially already wrote, if bi ≤ x ≤ bi+1 then the result is i ⋅ x + bi+1 + ⋯ + bn since the first i terms will use the x and the other terms will use the bj. Solving for x you get x = (y − bi+1 − ⋯ - bn) / i and putting that back into your inequality you have i ⋅ bi ≤ y − bi+1 − ⋯ − bn ≤ i ⋅ bi+1. Concentrating on one of the inequalities, you want the largest i such that
i ⋅ bi ≤ y − bi+1 − ⋯ − bn       (subsequently called “the inequality”)
But in order to make this work on unsorted ai, you'd need something similar to the median of medians. That is an algorithm which achieves O(n) guaranteed worst-case behavior for the problem of selecting a median, where the typical quickselect would take O(n²) in the worst case although it usually does quite well in practice.
Actually your problem is not that different from quickselect. You can pick a pivot coefficient, and split the remainder into larger and smaller values. Then you evaluate the inequality for the pivot element. If it is satisfied, you recurse into the list of larger elements, otherwise you recurse into the list of smaller elements, until at some point you have two adjacent elements, one which satisfies the inequality and one which does not.
This is O(n²) in the worst case, since you might need O(n) recursive calls, each of them taking O(n) time to process its input. Just like the O(n²) quickselect itself is suboptimal. The median-of-medians shows that that problem can indeed be solved in O(n). So we either need to find a similar solution here, or reformulate this problem here in terms of finding the median, or write some algorithm wich makes use of the median in a reasonable way.
Actually Nico Schertler found a way to achieve that last option: Take the algorithm I outlined above, but choose the pivot element to be the median. That way you can guarantee that each recursive call will process at most half as much input as the previous call. Since the median of medians itself is O(n) this can be done without exceeding the O(n) bound for each recursive call.
So in pseudocode it's like this (using inclusive indices throughout):
# f: Process whole problem with coefficients a_1 through a_n
f(y, a, n) := begin
if y < (sum of a_i for i from 1 through n): # O(n)
throw Error "Cannot satisfy equation" # Or omit check and risk division by zero
return g(a, 1, n, y) # O(n)
end
# g: Recursively process part of the problem, namely a_l through a_r
# Precondition: we know inequality holds for i = l - 1 and fails for i = r + 1
# a: the array as provided to f; will get modified in place
# l: left index (inclusive)
# r: right index (inclusive)
# y: (original y) - (sum of a_j for j from r + 1 through n)
g(a, l, r, y) := begin # process a_l through a_r O(r-l)
if r < l: # inequality holds in r but fails in l O(1)
return y / r # compute x for the case of i = r O(1)
m = median(a, l, r) # computed using median of medians O(r-l)
i = floor((l + r) / 2) # index of median, with same tie breaks O(1)
partition(a, l, r, m) # so a_l…a_(i-1) ≤ a_i=m ≤ a_(i+1)…a_r O(r-l)
rhs = y - (sum of a_j for j from i + 1 to r) # O((r-l)/2)
if i * a_i ≤ rhs: # condition holds, check larger i
return g(a, i + 1, r, y) # recurse in right half of list O((r-l)/2)
else: # condition fails, check smaller i
return g(a, l, i - 1, rhs - m) # recurse in left half of list O((r-l)/2)
end

Algorithm to find best combination or path through nodes

As I am not very proficient in various optimization/tree algorithms, I am seeking help.
Problem Description:
Assume, a large sequence of sorted nodes is given with each node representing an integer value L. L is always getting bigger with each node and no nodes have the same L.
The goal now is to find the best combination of nodes, where the difference between the L-values of subsequent nodes is closest to a given integer value M(L) that changes over L.
Example:
So, in the beginning I would have L = 50 and M = 100. The next nodes have L = 70,140,159,240,310.
First, the value of 159 seems to be closest to L+M = 150, so it is chosen as the right value.
However, in the next step, M=100 is still given and we notice that L+M = 259, which is far away from 240.
If we now go back and choose the node with L=140 instead, which then is followed by 240, the overall match between the M values and the L-differences is stronger. The algorithm should be able to find back to the optimal path, even if a mistake was made along the way.
Some additional information:
1) the start node is not necessarily part of the best combination/path, but if required, one could first develop an algorithm, which chooses the best starter candidate.
2) the optimal combination of nodes is following the sorted sequence and not "jumping back" -> so 1,3,5,7 is possible but not 1,3,5,2,7.
3) in the end, the differences between the L values of chosen nodes should in the mean squared sense be closest to the M values
Every help is much appreciated!
If I understand your question correctly, you could use Dijktras algorithm:
https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
http://www.mathworks.com/matlabcentral/fileexchange/20025-dijkstra-s-minimum-cost-path-algorithm
For that you have to know your neighbours of every node and create an Adjacency Matrix. With the implementation of Dijktras algorithm which I posted above you can specify edge weights. You could specify your edge weight in a manner that it is L of the node accessed + M. So for every node combination you have your L of new node + M. In that way the algorithm should find the optimum path between your nodes.
To get all edge combinations you can use Matlabs graph functions:
http://se.mathworks.com/help/matlab/ref/graph.html
If I understand your problem correctly you need an undirected graph.
You can access all edges with the command
G.Edges after you have created the graph.
I know its not the perfect answer but I hope it helps!
P.S. Just watch out, Djikstras algorithm can only handle positive edge weights.
Suppose we are given a number M and a list of n numbers, L[1], ..., L[n], and we want to find a subsequence of at least q of the latter numbers that minimises the sum of squared errors (SSE) with respect to M, where the SSE of a list of k positions x[1], ..., x[k] with respect to M is given by
SSE(M, x[1], ..., x[k]) = sum((L[x[i]]-L[x[i-1]]-M)^2) over all 2 <= i <= k,
with the SSE of a list of 0 or 1 positions defined to be 0.
(I'm introducing the parameter q and associated constraint on the subsequence length here because without it, there always exists a subsequence of length exactly 2 that achieves the minimum possible SSE -- and I'm guessing that such a short sequence isn't helpful to you.)
This problem can be solved in O(qn^2) time and O(qn) space using dynamic programming.
Define f(i, j) to be the minimum sum of squared errors achievable under the following constraints:
The number at position i is selected, and is the rightmost selected position. (Here, i = 0 implies that no positions are selected.)
We require that at least j (instead of q) of these first i numbers are selected.
Also define g(i, j) to be the minimum of f(k, j) over all 0 <= k <= i. Thus g(n, q) will be the minimum sum of squared errors achievable on the entire original problem. For efficient (O(1)) calculation of g(i, j), note that
g(i>0, j>0) = min(g(i-1, j), f(i, j))
g(0, 0) = 0
g(0, j>0) = infinity
To calculate f(i, j), note that if i > 0 then any solution must be formed by appending the ith position to some solution Y that selects at least j-1 positions and whose rightmost selected position is to the left of i -- i.e. whose rightmost selected position is k, for some k < i. The total SSE of this solution to the (i, j) subproblem will be whatever the SSE of Y was, plus a fixed term of (L[x[i]]-L[x[k]]-M)^2 -- so to minimise this total SSE, it suffices to minimise the SSE of Y. But we can compute that minimum: it is g(k, j-1).
Since this holds for any 0 <= k < i, it suffices to try all such values of k, and take the one that gives the lowest total SSE:
f(i>=j, j>=2) = min of (g(k, j-1) + (L[x[i]]-L[x[k]]-M)^2) over all 0 <= k < i
f(i>=j, j<2) = 0 # If we only need 0 or 1 position, SSE is 0
f(i, j>i) = infinity # Can't choose > i positions if the rightmost chosen position is i
With the above recurrences and base cases, we can compute g(n, q), the minimum possible sum of squared errors for the entire problem. By memoising values of f(i, j) and g(i, j), the time to compute all needed values of f(i, j) is O(qn^2), since there are at most (n+1)*(q+1) possible distinct combinations of input parameters (i, j), and computing a particular value of f(i, j) requires at most (n+1) iterations of the loop that chooses values of k, each iteration of which takes O(1) time outside of recursive subcalls. Storing solution values of f(i, j) requires at most (n+1)*(q+1), or O(qn), space, and likewise for g(i, j). As established above, g(i, j) can be computed in O(1) time when all needed values of f(x, y) have been computed, so g(n, q) can be computed in the same time complexity.
To actually reconstruct a solution corresponding to this minimum SSE, you can trace back through the computed values of f(i, j) in reverse order, each time looking for a value of k that achieves a minimum value in the recurrence (there may in general be many such values of k), setting i to this value of k, and continuing on until i=0. This is a standard dynamic programming technique.
I now answer my own post with my current implementation, in order to structure my post and load images. Unfortunately, the code does not do what it should do. Imagine L,M and q given like in the images below. With the calcf and calcg functions I calculated the F and G matrices where F(i+1,j+1) is the calculated and stored f(i,j) and G(i+1,j+1) from g(i,j). The SSE of the optimal combination should be G(N+1,q+1), but the result is wrong. If anyone found the mistake, that would be much appreciated.
G and F Matrix of given problem in the workspace. G and F are created by calculating g(N,q) via calcg(L,N,q,M).
calcf and calcg functions

How to generate matrices which satisfy the triangle inequality?

Let's consider square matrix
(n is a dimension of the matrix E and fixed (for example n = 4 or n=5)). Matrix entries
satisfy following conditions:
The task is to generate all matrices E. My question is how to do that? Is there any common approach or algorithm? Is that even possible? What to start with?
Naive solution
A naive solution to consider is to generate every possible n-by-n matrix E where each component is a nonnegative integer no greater than n, then take from those only the matrices that satisfy the additional constraints. What would be the complexity of that?
Each component can take on n + 1 values, and there are n^2 components, so there are O((n+1)^(n^2)) candidate matrices. That has an insanely high growth rate.
Link: WolframAlpha analysis of (n+1)^(n^2)
I think it's safe to safe that this not a feasible approach.
Better solution
A better solution follows. It involves a lot of math.
Let S be the set of all matrices E that satisfy your requirements. Let N = {1, 2, ..., n}.
Definitions:
Let a metric on N to have the usual definition, except with the requirement of symmetry omitted.
Let I and J partition the set N. Let D(I,J) be the n x n matrix that has D_ij = 1 when i is in I and j is in J, and D_ij = 0 otherwise.
Let A and B be in S. Then A is adjacent to B if and only if there exist I and J partitioning N such that A + D(I,J) = B.
We say A and B are adjacent if and only if A is adjacent to B or B is adjacent to A.
Two matrices A and B in S are path-connected if and only if there exists a sequence of adjacent elements of S between them.
Let the function M(E) denote the sum of the elements of matrix E.
Lemma 1:
E = D(I,J) is a metric on N.
Proof:
This is a trivial statement except for the case of an edge going from I to J. Let i be in I and j be in J. Then E_ij = 1 by definition of D(I,J). Let k be in N. If k is in I, then E_ik = 0 and E_kj = 1, so E_ik + E_kj >= E_ij. If k is in J, then E_ik = 1 and E_kj = 0, so E_ij + E_kj >= E_ij.
Lemma 2:
Let E be in S such that E != zeros(n,n). Then there exist I and J partitioning N such that E' = E - D(I,J) is in S with M(E') < M(E).
Proof:
Let (i,j) be such that E_ij > 0. Let I be the subset of N that can be reached from i by a directed path of cost 0. I cannot be empty, because i is in I. I cannot be N, because j is not in I. This is because E satisfies the triangle inequality and E_ij > 0.
Let J = N - I. Then I and J are both nonempty and partition N. By the definition of I, there does not exist any (x,y) such that E_xy = 0 and x is in I and y is in J. Therefore E_xy >= 1 for all x in I and y in J.
Thus E' = E - D(I,J) >= 0. That M(E') < M(E) is obvious, because all we have done is subtract from elements of E to get E'. Now, since E is a metric on N and D(I,J) is a metric on N (by Lemma 1) and E >= D(I,J), we have E' is a metric on N. Therefore E' is in S.
Theorem:
Let E be in S. Then E and zeros(n,n) are path-connected.
Proof (by induction):
If E = zeros(n,n), then the statement is trivial.
Suppose E != zeros(n,n). Let M(E) be the sum of the values in E. Then, by induction, we can assume that the statement is true for any matrix E' having M(E') < M(E).
Since E != zeros(n,n), by Lemma 2 we have some E' in S such that M(E') < M(E). Then by the inductive hypothesis E' is path-connected to zeros(n,n). Therefore E is path-connected to zeros(n,n).
Corollary:
The set S is path-connected.
Proof:
Let A and B be in S. By the Theorem, A and B are both path-connected to zeros(n,n). Therefore A is path-connected to B.
Algorithm
The Corollary tells us that everything in S is path-connected. So an effective way to discover all of the elements of S is to perform a breadth-first search over the graph defined by the following.
The elements of S are the nodes of the graph
Nodes of the graph are connected by an edge if and only if they are adjacent
Given a node E, you can find all of the (potentially) unvisited neighbors of E by simply enumerating all of the possible matrices D(I,J) (of which there are 2^n) and generating E' = E + D(I,J) for each. Enumerating the D(I,J) should be relatively straightforward (there is one for every possible subset I of D, except for the empty set and D).
Note that, in the preceding paragraph, E and D(I,J) are both metrics on N. So when you generate E' = E + D(I,J), you don't have to check that it satisfies the triangle inequality - E' is the sum of two metrics, so it is a metric. To check that E' is in S, all you have to do is verify that the maximum element in E' does not exceed n.
You can start the breadth-first search from any element of S and be guaranteed that you won't miss any of S. So you can start the search with zeros(n,n).
Be aware that the cardinality of the set S grows extremely fast as n increases, so computing the entire set S will only be tractable for small n.

Optimizing a DP on Intervals/Points

Well the problem is quite easy to solve naively in O(n3) time. The problem is something like:
There are N unique points on a number line. You want to cover every
single point on the number line with some set of intervals. You can
place an interval anywhere, and it costs B + MX to create an
interval, where B is the initial cost of creating an interval, and
X is half the length of the interval, and M is the cost per
length of interval. You want to find the minimum cost to cover every
single interval.
Sample data:
Points = {0, 7, 100}
B = 20
M = 5
So the optimal solution would be 57.50 because you can build an interval [0,7] at cost 20 + 3.5×5 and build an interval at [100,100] at cost 100 + 0×5, which adds up to 57.50.
I have an O(n3) solution, where the DP is minimum cost to cover points from [left, right]. So the answer would be in DP[1][N]. For every pair (i,j) I just iterate over k = {i...j-1} and compute DP[i][k] + DP[k + 1][j].
However, this solution is O(n3) (kind of like matrix multiplication I think) so it's too slow on N > 2000. Any way to optimize this?
Here's a quadratic solution:
Sort all the points by coordinate. Call the points p.
We'll keep an array A such that A[k] is the minimum cost to cover the first k points. Set A[0] to zero and all other elements to infinity.
For each k from 0 to n-1 and for each l from k+1 to n, set A[l] = min(A[l], A[k] + B + M*(p[l-1] - p[k])/2);
You should be able to convince yourself that, at the end, A[n] is the minimum cost to cover all n points. (We considered all possible minimal covering intervals and we did so from "left to right" in a certain sense.)
You can speed this up so that it runs in O(n log n) time; replace step 3 with the following:
Set A[1] = B. For each k from 2 to n, set A[k] = A[k-1] + min(M/2 * (p[k-1] - p[k-2]), B).
The idea here is that we either extend the previous interval to cover the next point or we end the previous interval at p[k-2] and begin a new one at p[k-1]. And the only thing we need to know to make that decision is the distance between the two points.
Notice also that, when computing A[k], I only needed the value of A[k-1]. In particular, you don't need to store the whole array A; only its most recent element.

Find sum in array equal to zero

Given an array of integers, find a set of at least one integer which sums to 0.
For example, given [-1, 8, 6, 7, 2, 1, -2, -5], the algorithm may output [-1, 6, 2, -2, -5] because this is a subset of the input array, which sums to 0.
The solution must run in polynomial time.
You'll have a hard time doing this in polynomial time, as the problem is known as the Subset sum problem, and is known to be NP-complete.
If you do find a polynomial solution, though, you'll have solved the "P = NP?" problem, which will make you quite rich.
The closest you get to a known polynomial solution is an approximation, such as the one listed on Wikipedia, which will try to get you an answer with a sum close to, but not necessarily equal to, 0.
This is a Subset sum problem, It's NP-Compelete but there is pseudo polynomial time algorithm for it. see wiki.
The problem can be solved in polynomial if the sum of items in set is polynomially related to number of items, from wiki:
The problem can be solved as follows
using dynamic programming. Suppose the
sequence is
x1, ..., xn
and we wish to determine if there is a
nonempty subset which sums to 0. Let N
be the sum of the negative values and
P the sum of the positive values.
Define the boolean-valued function
Q(i,s) to be the value (true or false)
of
"there is a nonempty subset of x1, ..., xi which sums to s".
Thus, the solution to the problem is
the value of Q(n,0).
Clearly, Q(i,s) = false if s < N or s
P so these values do not need to be stored or computed. Create an array to
hold the values Q(i,s) for 1 ≤ i ≤ n
and N ≤ s ≤ P.
The array can now be filled in using a
simple recursion. Initially, for N ≤ s
≤ P, set
Q(1,s) := (x1 = s).
Then, for i = 2, …, n, set
Q(i,s) := Q(i − 1,s) or (xi = s) or Q(i − 1,s − xi) for N ≤ s ≤ P.
For each assignment, the values of Q
on the right side are already known,
either because they were stored in the
table for the previous value of i or
because Q(i − 1,s − xi) = false if s −
xi < N or s − xi > P. Therefore, the
total number of arithmetic operations
is O(n(P − N)). For example, if all
the values are O(nk) for some k, then
the time required is O(nk+2).
This algorithm is easily modified to
return the subset with sum 0 if there
is one.
This solution does not count as
polynomial time in complexity theory
because P − N is not polynomial in the
size of the problem, which is the
number of bits used to represent it.
This algorithm is polynomial in the
values of N and P, which are
exponential in their numbers of bits.
A more general problem asks for a
subset summing to a specified value
(not necessarily 0). It can be solved
by a simple modification of the
algorithm above. For the case that
each xi is positive and bounded by the
same constant, Pisinger found a linear
time algorithm.[2]
It is well known Subset sum problem which NP-complete problem.
If you are interested in algorithms then most probably you are math enthusiast that I advise you look at
Subset Sum problem in mathworld
and here you can find the algorithm for it
Polynomial time approximation algorithm
initialize a list S to contain one element 0.
for each i from 1 to N do
let T be a list consisting of xi+y,
for all y in S
let U be the union of T and S
sort U
make S empty
let y be the smallest element of U
add y to S
for each element z of U in
increasing order do //trim the list by
eliminating numbers
close one to another
if y<(1-c/N)z, set y=z and add z to S
if S contains a number between (1-c)s and s, output yes, otherwise no

Resources