3SUM problem (finding triplets) in better than O(n^2) - algorithm

Consider the following problem:
Given an unsorted array of integers, find all triplets that satisfy x^2 + y^2 = z^2.
For example if given array is 1, 3, 7, 5, 4, 12, 13, the answer should be 5, 12, 13 and 3, 4, 5
I suggest the below algorithm with complexity O(n^2):
Sort the array in descending order, O(nlogn)
square each element, O(n)
Now it reduces to the problem of finding all triplets(a,b,c) in a sorted array such that a = b+c.
The interviewer was insisting on a solution better than O(n^2).
I have read 3SUM problem on Wikipedia, which emphasizes problem can be solved in O(n+ulogu) if numbers are in range [-u,u] assuming the array can be represented as a bit vector. But I am not able to get a clear picture of further explanations.
Can someone please help me in understanding what is going on with a nice example?

First of all. Finding all triplets in worst case is O(n^3). Suppose you have n=3k numbers. K of them are 3, k are 4 and k are 5.
3,....,3,4,....,4,5,.....5
There are k^3 = n^3/27 = O(n^3) such triplets. Just printing them takes O(n^3) time.
Next will be explaining of 3SUM problem in such form:
There are numbers s_1, ..., s_n each of them in range [-u;u]. How many triplets a,b,c there are that a+b=c?
transforming. Get 2*u numbers a_-u, ..., a_0, a_1, ... a_u. a_i is amount of numbers s_j, that s_j = i. This is done in O(n+u)
res = a_0 * sum(i=-u..u, i=/=0, C(a_i, 2)) + C(a_0,3) a_0 = 0
Build a polynom P(x) = sum(i = -u...u, a_i*x^(i+u).
Find Q(x) = P(x)*P(x) using FFT.
Note that Q(x) = sum(i=-2u..2u, b_i*x^(i+2*u)), where b_i is number of pairs s_u,s_k that s_u+s_k = i (This includes using same number twice).
For all even i do b_i = b_i - a_(i/2). This will remove using same number twice.
Sum all b_i*a_i/2 - add this to res.
Example: to be more simple, I will assume that range for numbers is [0..u] and won't use any +u in powers of x.
Suppose that we have numbers 1,2,3
- a_0 = 0, a_1 = 1, a_2 = 1, a_3 = 1
res = 0
P(x) = x + x^2 + x^3
Q(x) = x^2 +2x^3+3x^4+2x^5+x^6
After subtracting b_2 = 0, b_3 = 2, b_4 = 2, b_5 = 2, b_6 = 0
res += 0*1/2 + 2*1/2 + 2*0/2 + 2*0/2 + 6*0/2 = 1

Another possibility (who can fathom the mind of an interviewer?) would be to rewrite the equation as:
x^2 + y^2 = z^2
x^2 = z^2 - y^2 = (z-y)(z+y)
If we knew the prime factorisation of x^2 then we could simply iterate through all possible factorisations into a pair of numbers p,q (with p < q) and compute
x^2 = p.q = (z-y)(z+y)
p+q = (z-y)+(z+y) = 2z
q-p = (z+y)-(z-y) = 2y
z = (p+q)/2
y = (q-p)/2
So given a factorisation x^2=p.q we can work out the z and y values. By putting all the integer values into a set we can then check each possible answer in time O(1) (per check) by looking to see if those z,y values are in the array (taking care that negative values are also detected).
Wikipedia says that a randomly chosen integer has about log(n) divisors so this should take about n.log(n) assuming you can do the factorisation fast enough (e.g. if you knew all the integers were under a million you could precompute an array of the smallest factor for each integer).

Related

Number of pair in array that satisfy Bitwise Equation

I found this problem in a contest. The question is:
You are given an array of N non-negative integers (A1, A2, A3,..., An) and an integer M. Your task is to find the number of unordered pairs of array elements (X,Y) that satisfies the following bitwise equation:
2 * set_bits(X|Y) = M + set_bits(X ⊕ Y)
Note:
set_bits(n) represents the number of ones in the binary represenataion of an integer n.
X|Y represents the bitwise OR of integer X and Y.
X ⊕ Y represents the bitwise XOR of integer X and Y.
The unordered pair of array elements is pair (Ai, Aj) where 1 ≤ i < j ≤ N.
Print the number of unordered pairs of array elements that satisfy the above bitwise equation.
Sample Input 1:
N=4 M=2
arr = [3, 0, 4, 5]
Sample Output: 2
2 pairs are (3,0) and (0,5)
Sample Input 2:
N=8 M=2
arr = [3, 0, 4, 5, 6, 8, 1, 8]
Sample Output: 9
Is there any other way except brute force to solve this equation?
A solution with time complexity O(n) exists if the time complexity of set_bits is O(1).
First, we are going to rephrase the condition (the bitwise equation) a bit. Assume a pair of elements (X, Y) is given. Let c_01 represent the number of digits where X is 0 but Y is 1, c_10 represent the number digits where X is 1 and Y is 0, and c_11 represent the number of digits where X and Y are 1. For example, when X=5 and Y=1 (X=101, Y=001), c_01 = 0, c_10 = 1, c_11 = 1. Now, the condition can be rewritten as
2 * (c_01 + c_10 + c_11) = M + (c_01 + c_10)
because set_bits(X|Y) is equal to c_01 + c_10 + c_11 and set_bits(X^Y) is equal to c_01 + c_10.
We can reorder the equation into
c_01 + c_10 + 2*c_11 = M
by moving the term on the right to the left side. Now, realize that set_bits(X) = c_10 + c_11. Applying this information on the equation we get
c_01 + c_11 = M - set_bits(X)
Now, also realize that set_bits(Y) = c_01 + c_11. The equation becomes
set_bits(Y) = M - set_bits(X)
or
set_bits(X) + set_bits(Y) = M
The problem has turned into counting the number of pairs such that the number of set bits in the first element plus the number of set bits in the second element is equal to M. This can be done in linear time assuming you can compute set_bits in constant time.

Finding distinct pairs {x, y} that satisfies the equation 1/x + 1/y = 1/n with x, y, and n being whole numbers

The task is to find the amount of distinct pairs of {x, y} that fits the equation 1/x + 1/y = 1/n, with n being the input given by the user. Different ordering of x and y does not count as a new pair.
For example, the value n = 2 will mean 1/n = 1/2. 1/2 can be formed with two pairs of {x, y}, whcih are 6 and 3 and 4 and 4.
The value n = 3 will mean 1/n = 1/3. 1/3 can be formed with two pairs of {x, y}, which are 4 and 12 and 6 and 6.
The mathematical equation of 1/x + 1/y = 1/n can be converted to y = nx/(x-n) where if y and x in said converted equation are whole, they count as a pair of {x, y}. Using said converted formula, I will iterate n times starting from x = n + 1 and adding x by 1 per iteration to find whether nx % (x - n) == 0; if it yields true, the x and y are a new distinct pair.
I found the answer to limit my iteration by n times by manually computing the answers and finding the number of repetitions 'pattern'. x also starts with n+1 because otherwise, division by zero will happen or y will result in a negative number. The modulo operator is to indicate that the y attained is whole.
Questions:
Is there a mathematical explanation behind why the iteration is limited to n times? I found out that the limit of iteration is n times by doing manual computation and finding the pattern: that I only need to iterate n times to find the amount of distinct pairs.
Is there another way to find the amount of distinct pairs {x, y} other than my method above, which is by finding the VALUES of distinct pairs itself and then summing the amount of distinct pair? Is there a quick mathematical formula I'm not aware of?
For reference, my code can be seen here: https://gist.github.com/TakeNoteIAmHere/596eaa2ccf5815fe9bbc20172dce7a63
Assuming that x,y,n > 0 we have
Observation 1: both, x and y must be greater than n
Observation 2: since (x,y) and (y,x) do not count as distinct, we can assume that x <= y.
Observation 3: x = y = 2n is always a solution and if x > 2n then y < x (thus no new solution)
This means the possible values for x are from n+1 up to 2n.
A little algebra convers the equation
1/x + 1/y = n
into
(x-n)*(y-n) = n*n
Since we want a solution in integers, we seek integers f, g so that
f*g = n*n
and then the solution for x and y is
x = f+n, y = g+n
I think the easiest way to proceed is to factorise n, ie write
n = (P[1]^k[1]) * .. *(P[m]^k[m])
where the Ps are distinct primes, the ks positive integers and ^ denotes exponentiation.
Then the possibilities for f and g are
f = P[1]^a[1]) * .. *(P[m]^a[m])
g = P[1]^b[1]) * .. *(P[m]^b[m])
where the as and bs satisfy, for each i=1..m
0<=a[i]<=2*k[i]
b[i] = 2*k[i] - a[i]
If we just wanted to count the number of solutions, we would just need to count the number of fs, ie the number of distinct sequences a[]. But this is just
Nall = (2*k[1]+1)*... (2*[k[m]+1)
However we want to count the solution (f,g) and (g,f) as being the same. There is only one case where f = g (because the factorisation into primes is unique, we can only have f=g if the a[] equal the b[]) and so the number we seek is
1 + (Nall-1)/2

Dividing N items in p groups

You are given N total number of item, P group in which you have to divide the N items.
Condition is the product of number of item held by each group should be max.
example N=10 and P=3 you can divide the 10 item in {3,4,3} since 3x3x4=36 max possible product.
You will want to form P groups of roughly N / P elements. However, this will not always be possible, as N might not be divisible by P, as is the case for your example.
So form groups of floor(N / P) elements initially. For your example, you'd form:
floor(10 / 3) = 3
=> groups = {3, 3, 3}
Now, take the remainder of the division of N by P:
10 mod 3 = 1
This means you have to distribute 1 more item to your groups (you can have up to P - 1 items left to distribute in general):
for i = 0 up to (N mod P) - 1:
groups[i]++
=> groups = {4, 3, 3} for your example
Which is also a valid solution.
For fun I worked out a proof of the fact that it in an optimal solution either all numbers = N/P or the numbers are some combination of floor(N/P) and ceiling(N/P). The proof is somewhat long, but proving optimality in a discrete context is seldom trivial. I would be interested if anybody can shorten the proof.
Lemma: For P = 2 the optimal way to divide N is into {N/2, N/2} if N is even and {floor(N/2), ceiling(N/2)} if N is odd.
This follows since the constraint that the two numbers sum to N means that the two numbers are of the form x, N-x.
The resulting product is (N-x)x = Nx - x^2. This is a parabola that opens down. Its max is at its vertex at x = N/2. If N is even this max is an integer. If N is odd, then x = N/2 is a fraction, but such parabolas are strictly unimodal, so the closer x gets to N/2 the larger the product. x = floor(N/2) (or ceiling, it doesn't matter by symmetry) is the closest an integer can get to N/2, hence {floor(N/2),ceiling(N/2)} is optimal for integers.
General case: First of all, a global max exists since there are only finitely many integer partitions and a finite list of numbers always has a max. Suppose that {x_1, x_2, ..., x_P} is globally optimal. Claim: given and i,j we have
|x_i - x_ j| <= 1
In other words: any two numbers in an optimal solution differ by at most 1. This follows immediately from the P = 2 lemma (applied to N = x_i + x_ j).
From this claim it follows that there are at most two distinct numbers among the x_i. If there is only 1 number, that number is clearly N/P. If there are two numbers, they are of the form a and a+1. Let k = the number of x_i which equal a+1, hence P-k of the x_i = a. Hence
(P-k)a + k(a+1) = N, where k is an integer with 1 <= k < P
But simple algebra yields that a = (N-k)/P = N/P - k/P.
Hence -- a is an integer < N/P which differs from N/P by less than 1 (k/P < 1)
Thus a = floor(N/P) and a+1 = ceiling(N/P).
QED

Algorithm to find entires of array summing to 0 in O(nlog(n)) time

I'm working on the following problem:
Let A be an array of length n with each element -10n <= A[i] <= 10n. Create an algorithm running in O(nlog(n)) time that determines whether or not there exist entries A[i], A[j], and A[k] (i, j, and k not necessarily distinct) such that A[i] + A[j] + A[k] = 0.
I'm approaching it in the following way. Define a polynomial p of degree n-1 such that the coefficient on the x^k term is A[k]. Then use the FFT to multiply p with itself, and then multiply the resulting polynomial again by p. If any of the coefficients in the resulting polynomial are 0, then return true. Else, return false. Since the FFT is O(nlog(n)), this algorithm is then O(nlog(n)).
The problem I'm running into is that the FFT combines like terms, so to speak. Thus, the existence of a coefficient 0 does not imply that such entries exist.
Could anyone suggest a modification to this algorithm to improve it?
If I remember it right, the way to solve this problem is:
Define a polynomial of degree 60n + 1, where the coefficient on the term x^k is number of occurrences of element k - 10n in the array. For instance, if n=8, the coefficient on x^5 is number of occurrences of -75 (-75 = 5 - 10x8)
Use FFT to raise that polynomial (of degree 60n + 1) to the third power.
See if the coefficient on x^(30n) is non-zero. If it is, there's an answer.
Here's a sample implementation on python, it seems to work for all the cases I came up with:
import numpy as np
from numpy.fft import fft, ifft
def hasZeroSum(a):
n = len(a)
b = [0 for x in range(n * 60 + 1)]
for el in a: b[el + 10 * n] += 1
f = fft(b, n * 60 + 1)
f = np.power(f, 3)
res = ifft(f, n * 60 + 1)
return np.absolute(res[n * 30]) > 0.5
print hasZeroSum([-11, -5, 2, 3, 7])
print hasZeroSum([-11, -5, 2, 4, 8])
Prints
True
False

Maximum of sums of unsorted array and each of a number of sorted arrays

Given an unsorted array
A = a_1 ... a_n
And a set of sorted Arrays
B_i = b_i_1 ... b_i_n # for i from 1 to $large_number
I would like to find the maximums from the (not yet calculated) sum arrays
C_i = (a_1 + b_i_1) ... (a_n + b_i_n)
for each i.
Is there a trick to do better than just calculating all the C_i and finding their maximums in O($large_number * n)?
Can we do better when we know that the B arrays are just shifts from an endless sequence,
e.g.
S = 0 1 4 9 16 ...
B_i = S[i:i+n]
(The above sequence has the maybe advantageous property that (S_i - S_i-1 > S_i-1 - S_i-2))
There are $large_number * n data in your first problem, so there can't be any such trick.
You can prove this with an adversary argument. Suppose you have an algorithm that solves your problem without looking at all n * $large_number entries of b. I'm going to pick a fixed a, namely (-10, -20, -30, ..., -10n). The first $large_number * n - 1 the algorithm looks at an entry b_(i,j), I'll answer that it's 10j, for a sum of zero. The last time it looks at an entry, I'll answer that it's 10j+1, for a sum of 1.
If $large_number is Omega(n), your second problem requires you to look at n * $large_number entries of S, so it also can't have any such trick.
However, if you specify S, there may be something. And if $large_number <= n/2 (or whatever it is), then, all of the entries of S must be sorted, so you only have to look at the last B.
If we don't know anything I don't it's possible to do better than O($large_number * n)
However - If it's just shifts of an endless sequence we can do it in O($large_number + n):
We calculate B_0 ןמ O($large_number).
Than B_1 = (B_0 - S[0]) + S[n+1]
And in general: B_i = (B_i-1 - S[i-1]) + S[i-1+n].
So we can calculate all the other entries and the max in O(n).
This is for a general sequence - if we have some info about it, it might be possible to do better.
we know that the B arrays are just shifts from an endless sequence,
e.g.
S = 0 1 4 9 16 ...
B_i = S[i:i+n]
You can easily calculate S[i:i+n] as (sum of squares from 1 to i+n) - (sum of squares from 1 to i-1)
See https://math.stackexchange.com/questions/183316/how-to-get-to-the-formula-for-the-sum-of-squares-of-first-n-numbers
With the provided example, S1 = 0, S2 = 1, S3 = 4...
Let f(n) = SUM of Si for i=1 to n = (n-1)(n)(2n-1)/6
B_i = f(i+n) - f(i-1)
You then add SUM(A) to each sum.
Another approach is to calculate the difference between B_i and B_(i-1):
That would be: S[i:i+n] - S[i-1:i+n-1] = S(i+n) - S(i-1)
That way, you can just calculate the difference of the sums of each array with the previous one. In my understanding, since Ci = SUM(Bi)+SUM(A), SUM(A) becomes a constant that is irrelevant in finding the maximum.

Resources