String-to-string correction problem NP-completeness proof - algorithm

I have this assignment to prove that this problem:
Finite alphabet Σ, two strings x, y ∈ Σ*, and a positive integer K. Is there a way to derive the string y from the string x by a sequence of K or fewer operations of single-symbol deletion or adjacent-symbol interchange?
is NP-complete. I already figured out that I have to give a transformation from the decision version of the set covering problem, but I have no clue how to do this. Any help would be appreciated.

It looks like a modified Levenshtein distance; the plain Levenshtein distance can be computed with DP in quadratic time.
A transformation from minimum set cover (MSC) to this string correction problem is described in:
Robert A. Wagner, "On the complexity of the Extended String-to-String Correction Problem", Proceedings of the Seventh Annual ACM Symposium on Theory of Computing, 1975.
In short, the MSC problem is:
Given finite sets x_1, ..., x_n and an integer L, does there exist a subset J of {1, ..., n} such that |J| ≤ L and union_{j in J} x_j = union of all x_i?
Let w = the union of all x_i, let t = |w| and r = t^2, and choose symbols Q, R, S not in w.
Take strings:
A = Q^r R x_1 Q^r S^(r+1) ... Q^r R x_n Q^r S^(r+1)
B = R Q^r ... R Q^r w S^(r+1) ... S^(r+1)   (each "..." stands for n repetitions in total)
and
k = (L+1)r - 1 + 2t(r+1)(n-1) + n(n-1)(r+1)^2/2 + (rn + |x_1 ... x_n| - t)·W_d
(W_d is the weight of the delete operation; it can be taken to be 1.)
It is shown that the string-to-string correction instance (A, B, k) is satisfiable iff the source MSC instance is.
From the construction of the strings it is clear that the proof is not trivial :-) But it isn't too complex to manage.
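For concreteness, here is a minimal Python sketch that just transcribes the construction above into an instance builder (wagner_instance is my own hypothetical helper; it assumes unit delete weight W_d = 1, reads each "..." as n repetitions, and takes the sets x_i as strings over an alphabet avoiding Q, R, S):

    def wagner_instance(sets, L):
        """Build the string-correction instance (A, B, k) from an MSC instance.

        'sets' is a list of strings x_1, ..., x_n; L is the MSC bound.
        Transcribed from the construction quoted above, with W_d = 1.
        """
        w = ''.join(sorted(set(''.join(sets))))   # w = union of all x_i
        n, t = len(sets), len(w)
        r = t * t                                 # r = t^2
        A = ''.join('Q' * r + 'R' + x + 'Q' * r + 'S' * (r + 1) for x in sets)
        B = ('R' + 'Q' * r) * n + w + ('S' * (r + 1)) * n
        k = ((L + 1) * r - 1
             + 2 * t * (r + 1) * (n - 1)
             + n * (n - 1) * (r + 1) ** 2 // 2
             + (r * n + sum(len(x) for x in sets) - t))  # last term times W_d = 1
        return A, B, k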

The NP-hardness proof mentioned above only works for arbitrarily large alphabets.
For alphabets of fixed size, the problem can be solved in polynomial time; see
https://dblp.org/rec/bibtex/journals/tcs/Meister15

Related

Represent a prime number as a sum of four squared integers

Given a prime number p, find four integers such that p is equal to the sum of the squares of those integers, with 1 < p < 10^12.
If p is of the form 8n + 1 or 8n + 5, then p can be written as a sum of two squares, and this case can be solved in O(sqrt(p) log(sqrt(p))). But for the other cases, i.e. when p cannot be written as a sum of two squares, that approach is very inefficient. So it would be great if anyone could point me to some resource material which I can read to solve the problem.
Given your constraints, I think that you can do a smart brute force.
First, note that if p = a^2 + b^2 + c^2 + d^2, each of a, b, c, d have to be less than 10^6. So just loop over a from 0 to sqrt(p). Consider q = p - a^2. It is easy to check whether q can be written as the sum of three squares using Legendre's three-square theorem. Once you find a value of q that works, a is fixed and you can just worry about q.
Deal with q the same way. Loop over b from 0 to sqrt(q), and consider r = q - b^2. Fermat's two-square theorem (in its extended form: r is a sum of two squares iff every prime factor of r of the form 4k + 3 occurs to an even power) tells you how to check whether r can be written as the sum of two squares. Though this check requires O(sqrt(r)) time again, in practice you should be able to quickly find a value of b that works.
After this, it should be straightforward to find a (c,d) pair that works for r.
Since the loops for finding a and b and (c,d) are not nested but come one after the other, the complexity should be low enough to work in your problem.
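In case it helps, here is a minimal Python sketch of this staged search (the three-square check uses Legendre's 4^a(8b + 7) criterion; the final two-square step is the direct O(sqrt(r)) scan described above):

    import math

    def is_square(n):
        return n >= 0 and math.isqrt(n) ** 2 == n

    def sum_of_three_squares_possible(n):
        """Legendre's three-square theorem: n = x^2+y^2+z^2 iff n != 4^a(8b+7)."""
        while n > 0 and n % 4 == 0:
            n //= 4
        return n % 8 != 7

    def four_squares(p):
        """Return (a, b, c, d) with a^2 + b^2 + c^2 + d^2 == p."""
        for a in range(math.isqrt(p) + 1):
            q = p - a * a
            if not sum_of_three_squares_possible(q):
                continue
            # q is now guaranteed to be a sum of three squares,
            # so the search below for b and then (c, d) must succeed.
            for b in range(math.isqrt(q) + 1):
                r = q - b * b
                for c in range(math.isqrt(r) + 1):
                    if is_square(r - c * c):
                        return a, b, c, math.isqrt(r - c * c)

    print(four_squares(1000003))  # one decomposition of the prime 1000003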

Roots of a polynomial mod a prime

I'm looking for a speedy algorithm to find the roots of a univariate polynomial in a prime finite field.
That is, if f = a_0 + a_1 x + a_2 x^2 + ... + a_n x^n (n > 0), then an algorithm that finds all r < p satisfying f(r) = 0 mod p, for a given prime p.
I found Chien's search algorithm https://en.wikipedia.org/wiki/Chien_search but I can't imagine this being that fast for primes greater than 20 bits. Does anyone have experience with Chien's search algorithm or know a faster way? Is there a sympy module for this?
This is pretty well studied, as mcdowella's comment indicates. Here is how the Cantor-Zassenhaus random algorithm works for the case where you want to find the roots of a polynomial, instead of the more general factorization.
Note that in the ring of polynomials with coefficients mod p, the product x(x-1)(x-2)...(x-p+1) has all possible roots, and equals x^p-x by Fermat's Little Theorem and unique factorization in this ring.
Set g = GCD(f,x^p-x). Using Euclid's algorithm to compute the GCD of two polynomials is fast in general, taking a number of division steps at most the maximum degree. It does not require you to factor the polynomials. g has the same roots as f in the field, and no repeated factors.
Because of the special form of x^p-x, with only two nonzero terms, the first step of Euclid's algorithm can be done by repeated squaring, in about 2 log_2 (p) steps involving only polynomials of degree no more than twice the degree of f, with coefficients mod p. We may compute x mod f, x^2 mod f, x^4 mod f, etc, then multiply together the terms corresponding to nonzero places in the binary expansion of p to compute x^p mod f, and finally subtract x.
Repeatedly do the following: Choose a random d in Z/p. Compute the GCD of g with r_d = (x+d)^((p-1)/2)-1, which we can again compute rapidly by Euclid's algorithm, using repeated squaring on the first step. If the degree of this GCD is strictly between 0 and the degree of g, we have found a nontrivial factor of g, and we can recurse until we have found the linear factors hence roots of g and thus f.
How often does this work? r_d has as roots the numbers that are d less than a nonzero square mod p. Consider two distinct roots of g, a and b, so (x-a) and (x-b) are factors of g. If a+d is a nonzero square, and b+d is not, then (x-a) is a common factor of g and r_d, while (x-b) is not, which means GCD(g,r_d) is a nontrivial factor of g. Similarly, if b+d is a nonzero square while a+d is not, then (x-b) is a common factor of g and r_d while (x-a) is not. By number theory, one case or the other happens close to half of the possible choices for d, which means that on average it takes a constant number of choices of d before we find a nontrivial factor of g, in fact one separating (x-a) from (x-b).
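Here is a self-contained Python sketch of this procedure (polynomials are plain coefficient lists, lowest degree first; it assumes p is an odd prime and is meant as an illustration, not a production implementation):

    import random

    def trim(a):
        """Drop leading zero coefficients in place."""
        while a and a[-1] == 0:
            a.pop()
        return a

    def poly_divmod(a, b, p):
        """Quotient and remainder of a divided by b, coefficients mod p."""
        a = trim([c % p for c in a])
        q = [0] * max(len(a) - len(b) + 1, 0)
        inv = pow(b[-1], p - 2, p)              # inverse of leading coefficient
        while len(a) >= len(b):
            c = a[-1] * inv % p
            shift = len(a) - len(b)
            q[shift] = c
            for i, bc in enumerate(b):
                a[i + shift] = (a[i + shift] - c * bc) % p
            trim(a)
        return trim(q), a

    def poly_sub(a, b, p):
        n = max(len(a), len(b))
        a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
        return trim([(x - y) % p for x, y in zip(a, b)])

    def poly_gcd(a, b, p):
        while b:
            a, b = b, poly_divmod(a, b, p)[1]
        return a

    def poly_mulmod(a, b, m, p):
        if not a or not b:
            return []
        res = [0] * (len(a) + len(b) - 1)
        for i, x in enumerate(a):
            for j, y in enumerate(b):
                res[i + j] = (res[i + j] + x * y) % p
        return poly_divmod(res, m, p)[1]

    def poly_powmod(base, e, m, p):
        """base^e mod m by repeated squaring, as described above."""
        result, base = [1], poly_divmod(base, m, p)[1]
        while e:
            if e & 1:
                result = poly_mulmod(result, base, m, p)
            base = poly_mulmod(base, base, m, p)
            e >>= 1
        return result

    def roots_mod_p(f, p):
        """All roots of f in Z/p (f a nonconstant coefficient list, p an odd prime)."""
        xp = poly_powmod([0, 1], p, f, p)             # x^p mod f
        g = poly_gcd(f, poly_sub(xp, [0, 1], p), p)   # gcd(f, x^p - x)

        def split(g):
            if len(g) <= 1:                           # constant: no roots
                return []
            if len(g) == 2:                           # linear: extract the root
                return [-g[0] * pow(g[1], p - 2, p) % p]
            while True:                               # random Cantor-Zassenhaus step
                d = random.randrange(p)
                w = poly_powmod([d, 1], (p - 1) // 2, g, p)
                h = poly_gcd(g, poly_sub(w, [1], p), p)
                if 0 < len(h) - 1 < len(g) - 1:       # nontrivial factor found
                    return split(h) + split(poly_divmod(g, h, p)[0])

        return sorted(split(g))

    print(roots_mod_p([1, 0, 1], 13))   # x^2 + 1 mod 13 -> [5, 8]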
Your answers are good, but I think I found a wonderful method to find the roots modulo any number, based on lattices. Let r ≤ R be a root of f mod p. We must find another polynomial h(x) such that h has small coefficients and r is a root of h over the integers. The lattice method finds this polynomial: first we create a basis of polynomials for the lattice, and then with the LLL algorithm we find a short vector corresponding to a polynomial that has the root r without reduction mod p. In effect, we eliminate the modulus p this way.
For more explanation, refer to D. Coppersmith, "Finding small solutions to small degree polynomials", in Cryptography and Lattices.

Finding integral solution of an equation

This is part of a bigger question. It's actually a mathematical problem, so it would be really great if someone could direct me to an algorithm, or pseudocode, for obtaining the solution of this problem.
The question: given an equation, check whether it has an integral solution.
For example:
(26a+5)/32=b
Here a is an integer. Is there an algorithm to predict or find whether b can be an integer? I need a general solution, not one specific to this example; the equation can vary. Thanks.
Your problem is an example of a linear Diophantine equation. About that, Wikipedia says:
This Diophantine equation [i.e., a x + b y = c] has a solution (where x and y are integers) if and only if c is a multiple of the greatest common divisor of a and b. Moreover, if (x, y) is a solution, then the other solutions have the form (x + k v, y - k u), where k is an arbitrary integer, and u and v are the quotients of a and b (respectively) by the greatest common divisor of a and b.
In this case, (26 a + 5)/32 = b is equivalent to 26 a - 32 b = -5. The gcd of the coefficients of the unknowns is gcd(26, -32) = 2. Since -5 is not a multiple of 2, there is no solution.
A general Diophantine equation is a polynomial in the unknowns, and can only be solved (if at all) by more complex methods. A web search might turn up specialized software for that problem.
Linear Diophantine equations take the form ax + by = c. If c is the greatest common divisor of a and b, this is exactly Bézout's identity, and the equation has an infinite number of solutions. More generally, the equation is solvable precisely when c is a multiple of gcd(a, b), so instead of a trial-search method you can simply check whether gcd(a, b) divides c.
If it does, then x and y can be computed using the extended Euclidean algorithm, which finds integers x and y (one of which is typically negative) that satisfy Bézout's identity. (As a side note: this also holds in any other Euclidean domain, e.g. a polynomial ring, and every Euclidean domain is a unique factorization domain.) You can use the iterative method to find these solutions, as in the sketch below:
Integral solution to equation `a + bx = c + dy`
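Here is a small Python sketch of that approach (the helper names are mine), using the iterative extended Euclidean algorithm and checked against the question's example:

    def ext_gcd(a, b):
        """Iteratively compute (g, x, y) with a*x + b*y == g == gcd(a, b)."""
        x0, y0, x1, y1 = 1, 0, 0, 1
        while b:
            q = a // b
            a, b = b, a - q * b
            x0, x1 = x1, x0 - q * x1
            y0, y1 = y1, y0 - q * y1
        if a < 0:                              # normalize the gcd to be nonnegative
            a, x0, y0 = -a, -x0, -y0
        return a, x0, y0

    def solve_diophantine(a, b, c):
        """Return one integer solution (x, y) of a*x + b*y = c, or None."""
        g, x, y = ext_gcd(a, b)
        if g == 0:
            return (0, 0) if c == 0 else None  # degenerate a = b = 0 case
        if c % g:
            return None                        # solvable iff gcd(a, b) divides c
        return x * (c // g), y * (c // g)

    # The question's example: (26a + 5)/32 = b is 26a - 32b = -5, and
    # gcd(26, -32) = 2 does not divide -5, so there is no integral solution.
    print(solve_diophantine(26, -32, -5))      # None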

How do we know that an NFA has a minimum amount of states?

Is there some kind of proof for this? How can we know that the current NFA has the minimum amount?
As opposed to DFA minimization, where efficient methods exist to not only determine the size of, but actually compute, the smallest DFA in terms of number of states that describes a given regular language, no such general method is known for determining the size of a smallest NFA. Moreover, unless P=PSPACE, no polynomial-time algorithm exists to compute a minimal NFA to recognize a language, as the following decision problem is PSPACE-complete:
Given a DFA M that accepts the regular language L, and an integer k, is there an NFA with ≤ k states accepting L?
(Jiang & Ravikumar 1993).
There is, however, a simple theorem from Glaister and Shallit that can be used to determine lower bounds on the number of states of a minimal NFA:
Let L ⊆ Σ* be a regular language and suppose that there exist n pairs P = { (x_i, w_i) | 1 ≤ i ≤ n } such that:
x_i w_i ∈ L for 1 ≤ i ≤ n
x_j w_i ∉ L for 1 ≤ j, i ≤ n and j ≠ i
Then any NFA accepting L has at least n states.
See: Ian Glaister and Jeffrey Shallit (1996). "A lower bound technique for the size of nondeterministic finite automata". Information Processing Letters 59 (2), pp. 75–77. DOI:10.1016/0020-0190(96)00095-6.
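For instance, for the finite language L_n = { a^i b^i : 0 ≤ i ≤ n }, the pairs (x_i, w_i) = (a^i, b^i) for 0 ≤ i ≤ n satisfy both conditions (a^i b^i ∈ L_n, while a^j b^i ∉ L_n for j ≠ i), so any NFA accepting L_n needs at least n + 1 states.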

Find sum in array equal to zero

Given an array of integers, find a set of at least one integer which sums to 0.
For example, given [-1, 8, 6, 7, 2, 1, -2, -5], the algorithm may output [-1, 6, 2, -2, -5] because this is a subset of the input array, which sums to 0.
The solution must run in polynomial time.
You'll have a hard time doing this in polynomial time, as the problem is known as the Subset sum problem, and is known to be NP-complete.
If you do find a polynomial solution, though, you'll have solved the "P = NP?" problem, which will make you quite rich.
The closest you get to a known polynomial solution is an approximation, such as the one listed on Wikipedia, which will try to get you an answer with a sum close to, but not necessarily equal to, 0.
This is the Subset sum problem. It's NP-complete, but there is a pseudo-polynomial time algorithm for it; see the wiki.
The problem can be solved in polynomial time if the sum of the items in the set is polynomially related to the number of items. From the wiki:
The problem can be solved as follows using dynamic programming. Suppose the sequence is x_1, ..., x_n and we wish to determine if there is a nonempty subset which sums to 0. Let N be the sum of the negative values and P the sum of the positive values. Define the boolean-valued function Q(i, s) to be the value (true or false) of
"there is a nonempty subset of x_1, ..., x_i which sums to s".
Thus, the solution to the problem is the value of Q(n, 0).
Clearly, Q(i, s) = false if s < N or s > P, so these values do not need to be stored or computed. Create an array to hold the values Q(i, s) for 1 ≤ i ≤ n and N ≤ s ≤ P.
The array can now be filled in using a simple recursion. Initially, for N ≤ s ≤ P, set
Q(1, s) := (x_1 = s).
Then, for i = 2, ..., n, set
Q(i, s) := Q(i − 1, s) or (x_i = s) or Q(i − 1, s − x_i), for N ≤ s ≤ P.
For each assignment, the values of Q on the right side are already known, either because they were stored in the table for the previous value of i or because Q(i − 1, s − x_i) = false if s − x_i < N or s − x_i > P. Therefore, the total number of arithmetic operations is O(n(P − N)). For example, if all the values are O(n^k) for some k, then the time required is O(n^(k+2)).
This algorithm is easily modified to return the subset with sum 0 if there is one.
This solution does not count as polynomial time in complexity theory because P − N is not polynomial in the size of the problem, which is the number of bits used to represent it. This algorithm is polynomial in the values of N and P, which are exponential in their numbers of bits.
A more general problem asks for a subset summing to a specified value (not necessarily 0). It can be solved by a simple modification of the algorithm above. For the case that each x_i is positive and bounded by the same constant, Pisinger found a linear time algorithm.[2]
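A compact Python version of this idea, modified as the quote suggests so that it returns a zero-sum subset (it stores one representative subset per reachable sum, so the table size stays within the O(n(P − N)) bound):

    def zero_subset(xs):
        """Pseudo-polynomial search for a nonempty subset of xs summing to 0."""
        # 'reach' maps each sum achievable by a nonempty subset of the items
        # seen so far to one concrete subset realizing it; there are at most
        # P - N + 1 distinct sums, matching the Q(i, s) table above.
        reach = {}
        for x in xs:
            new = {x: [x]}
            for s, subset in reach.items():
                new.setdefault(s + x, subset + [x])
            for s, subset in new.items():
                reach.setdefault(s, subset)
            if 0 in reach:
                return reach[0]
        return None

    print(zero_subset([-1, 8, 6, 7, 2, 1, -2, -5]))   # e.g. [-1, 1]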
This is the well-known Subset sum problem, which is NP-complete.
If you are interested in algorithms then you are most probably a math enthusiast, so I advise you to look at
Subset Sum problem in mathworld
and here you can find the algorithm for it
Polynomial time approximation algorithm
initialize a list S to contain one element 0
for each i from 1 to N do
    let T be a list consisting of x_i + y, for all y in S
    let U be the union of T and S
    sort U
    make S empty
    let y be the smallest element of U
    add y to S
    for each element z of U in increasing order do
        // trim the list by eliminating numbers close to one another
        if y < (1 - c/N)z, set y = z and add z to S
if S contains a number between (1 - c)s and s, output yes, otherwise no
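A direct Python transcription of that pseudocode for positive items and a positive target s (I additionally drop partial sums above s, the usual refinement that keeps the list small; c is the approximation tolerance):

    def approx_subset_sum(xs, s, c=0.01):
        """Trimming-based approximation: True iff some subset of xs
        sums to a value in [(1 - c) * s, s]."""
        n = len(xs)
        S = [0]
        for x in xs:
            U = sorted(set(S + [x + y for y in S if x + y <= s]))
            # trim the list by eliminating numbers close to one another
            y = U[0]
            S = [y]
            for z in U[1:]:
                if y < (1 - c / n) * z:
                    y = z
                    S.append(z)
        return any((1 - c) * s <= v <= s for v in S)

    print(approx_subset_sum([1, 4, 5, 9], 13, c=0.1))   # True (4 + 9 = 13)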
