Unable to figure out the logic of this usage - algorithm

I have two arrays, A and B, of the same size (say, k), and I need to find the nth smallest sum of one value from A and one value from B. For example, let A be [1, 2, 3] and B be [4, 5, 6]. There are 9 elements in the set of sums:
1 + 4 = 5;
2 + 4 = 6;
1 + 5 = 6;
1 + 6 = 7;
2 + 5 = 7;
3 + 4 = 7;
2 + 6 = 8;
3 + 5 = 8;
3 + 6 = 9;
If I need to find the 4th smallest sum, my answer is 7.
A naive solution involving a double-loop is easy to find, but I came across this code for the double-loop:
sort(a + 1, a + k + 1);
sort(b + 1, b + k + 1);
for (i = 1; i <= k; i++)
{
    n = 10001 / i;    // WHY THIS LINE
    ind = min(k, n);  // WHY THIS LINE
    v = a[i];
    for (j = 1; j <= ind; j++)
    {
        vc.push_back(v + b[j]);
    }
}
I'm unable to understand the use of that 'n' here, which I guess is some kind of optimization, since without it the rest of the solution is naive. Also, I'm not sure if it's important, but the constraint on k is
1 <= k <= 10000
Hope somebody can help.
Source: A problem from CodeChef - LOWSUM

All this is saying is:
"If the q-th smallest sum (with q <= 10000) is made of a[i] + b[j] for some given i, then j cannot be bigger than (q+1)/i, so it can't be bigger than 10001/i either." So you just don't look for a j bigger than 10001/i to associate with a given i.
This is because you know that the smallest i values in a, each associated with the first (q+1)/i values in b, already give you at least q+1 possible sums, all of them no larger than the ones made with a[i] and b[j > (q+1)/i].
Obviously j should not be bigger than k either, since b has only k elements. So j must be at most min(k, 10001/i).
(I haven't properly checked the equality cases, which +1's are needed and which aren't, etc., but the idea is there.)
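To sanity-check the pruning, here is a runnable sketch of the pruned double loop in Python (the function name is my own, not from the post):

```python
def smallest_sums(a, b):
    # Collect every candidate sum a[i] + b[j] with j <= min(k, 10001 // i),
    # mirroring the pruned double loop; any query for the q-th smallest
    # sum with q <= 10000 can be answered from this list.
    a, b = sorted(a), sorted(b)
    k = len(a)
    vc = []
    for i in range(1, k + 1):        # 1-based, as in the C++ snippet
        ind = min(k, 10001 // i)     # j never needs to exceed 10001/i
        for j in range(1, ind + 1):
            vc.append(a[i - 1] + b[j - 1])
    vc.sort()
    return vc

print(smallest_sums([1, 2, 3], [4, 5, 6])[3])  # 4th smallest sum -> 7
```

The point of the bound shows up in the size of vc: the inner loop runs at most 10001/i times, so the whole thing does roughly 10001 * (1 + 1/2 + ... + 1/k), about 10001 * ln(k), pushes instead of k^2.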

Related

Could anyone suggest a better solution for this problem? I could only think of a brute-force approach, which is O(n^2).

Recently I was attempting the following problem:
Given an array of integers, arr.
Find the sum of floor(arr[i] / arr[j]) over all pairs of indices (i, j).
e.g.
Input: arr[]={1,2,3,4,5}
Output: Sum=27.
Explanation:
(1/1)+(1/2)+(1/3)+(1/4)+(1/5) = 1+0+0+0+0 = 1
(2/1)+(2/2)+(2/3)+(2/4)+(2/5) = 2+1+0+0+0 = 3
(3/1)+(3/2)+(3/3)+(3/4)+(3/5) = 3+1+1+0+0 = 5
(4/1)+(4/2)+(4/3)+(4/4)+(4/5) = 4+2+1+1+0 = 8
(5/1)+(5/2)+(5/3)+(5/4)+(5/5) = 5+2+1+1+1 = 10
I could only think of a naive O(n^2) solution. Is there any better approach?
Thanks in advance.
One possibility is to "quickly" skip over the elements that are the same integer multiple of a given element (after rounding down).
For the given example, the vertical bars below delimit runs of equal ratios (the lower triangle is all zeros and is ignored; the elements are shown on the left and the ratios on the right):
1 -> 2 | 3 | 4 | 5 ≡ 2 | 3 | 4 | 5
2 -> 3 | 4 5 ≡ 1 | 2 2
3 -> 4 5 ≡ 1 1
4 -> 5 ≡ 1
For bigger arrays, the constant runs can be longer.
So the algorithm principle is
sort all elements increasingly;
process the elements from smallest to largest;
for a given element, find the index of the first double and count the number of skipped elements;
from there, find the index of the first triple and count twice the number of skipped elements;
continue with higher multiples until you exhaust the tail of the array.
A critical operation is to "find the next multiple". This should be done by an exponential search followed by a dichotomic search, so that the number of operations remains logarithmic in the number of elements to skip (a pure dichotomic search would be logarithmic in the total number of remaining elements). Hence the cost of a search will be proportional to the sum of the logarithms of the distances between the multiples.
Hopefully, this sum will be smaller than the sum of the distances themselves, though in the worst case the complexity remains O(N). In the best case, O(log(N)).
A global analysis is difficult and in theory the worst-case complexity remains O(N²); but in practice it could go down to O(N log N), because the worst case would require that the elements grow faster than a geometric progression of common ratio 2.
Addendum:
If the array contains numerous repeated values, it can be beneficial to compress it by storing a repetition count and a single instance of every value. This can be done after sorting.
int[] arr = { 1, 2, 3, 4, 5 };
int result = 0;
int BuffSize = arr.Max() * 2;
int[] b = new int[BuffSize + 1];
int[] count = new int[BuffSize];
for (int i = 0; i < arr.Length; ++i)
    count[arr[i]]++;
// b[i] = number of elements with value >= i (suffix counts)
for (int i = BuffSize - 1; i >= 1; i--)
{
    b[i] = b[i + 1] + count[i];
}
for (int i = 1; i < BuffSize; i++)
{
    if (count[i] == 0)
    {
        continue;
    }
    // elements in [j, j + i) all have floor(value / i) == mul
    for (int j = i, mul = 1; j < BuffSize; j += i, mul++)
    {
        result += (b[j] - b[Math.Min(BuffSize - 1, j + i)]) * mul * count[i];
    }
}
This code takes advantage of knowing the suffix counts of the values ahead of time, and only processes the remaining portion of the array rather than redundantly processing the entire thing n^2 times.
I believe it has a worst-case runtime of O(n * sqrt(n) * log(n)).
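Here is a rough Python transcription of the same counting idea (identifier names are mine); it may help to see the run-skipping spelled out:

```python
def floor_pair_sum(arr):
    # suffix[i] counts how many elements have value >= i, so the number
    # of elements x with floor(x / i) == mul (i.e. j <= x < j + i) is
    # suffix[j] - suffix[j + i].
    V = max(arr)
    count = [0] * (V + 1)
    for x in arr:
        count[x] += 1
    suffix = [0] * (V + 2)
    for i in range(V, 0, -1):
        suffix[i] = suffix[i + 1] + count[i]
    result = 0
    for i in range(1, V + 1):
        if count[i] == 0:
            continue
        mul, j = 1, i
        while j <= V:
            in_run = suffix[j] - suffix[min(V + 1, j + i)]
            result += in_run * mul * count[i]
            j += i
            mul += 1
    return result

print(floor_pair_sum([1, 2, 3, 4, 5]))  # -> 27
```

The inner loops run at most V/1 + V/2 + ... + V/V = O(V log V) times in total, where V is the maximum value; the C# version's doubled buffer is replaced here by clamping the suffix index.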

Have O(n^2) algorithm for "two-sum", convert to O(n) linear solution [duplicate]

This question already has answers here:
Find a pair of elements from an array whose sum equals a given number
(33 answers)
Closed 5 years ago.
I have an O(n^2) solution to the classic two-sum problem, where A[1...n] is a sorted array of positive integers and t is some positive integer.
I need to show whether A contains two distinct elements a and b such that a + b = t.
Here is my solution so far:
t = a number;
for (i = 0; i < A.length; i++)
    for each A[j]
        if A[i] + A[j] == t
            return true
return false
How do I make this a linear, O(n) solution? I'm scratching my head trying to figure it out.
Here's the approach I have in mind so far: i will start at the beginning of A and j will start at the end; i will increment and j will decrement, so I'll have two counter variables in the loop, i and j.
There are a couple of ways to improve upon that.
You could extend your algorithm so that, instead of doing a simple linear search for every term, you do a binary search:
t = a number
for (i = 0; i < A.length; i++)
    j = binarySearch(A, t - A[i], i + 1, A.length - 1)
    if (j != null)
        return true
return false
Binary search takes O(log N) steps; since you perform a binary search for every element in the array, the complexity of the whole algorithm is O(N log N).
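For reference, the binary-search variant is easy to make concrete in Python with the standard bisect module (a sketch; searching strictly to the right of i keeps the two chosen elements distinct):

```python
import bisect

def two_sum_binary(A, t):
    # For each A[i], binary-search for t - A[i] among the elements
    # after index i (A is assumed sorted ascending).
    for i, x in enumerate(A):
        j = bisect.bisect_left(A, t - x, i + 1)
        if j < len(A) and A[j] == t - x:
            return True
    return False

print(two_sum_binary([1, 3, 4, 8, 9], 11))  # -> True
```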
This already is a tremendous improvement upon O(N^2), but you can do better.
Let's take the sum 11 and the array 1, 3, 4, 8, 9 for example.
You can already see that (3, 8) satisfies the sum. To find it, imagine having two pointers: one pointing at the beginning of the array (1), which we'll call H and mark with square brackets, and another pointing at the end of the array (9), which we'll call T and mark with parentheses.
[1] 3 4 8 (9)
Right now the sum at the two pointers is 1 + 9 = 10.
10 is less than the desired sum (11), and there is no way to reach the desired sum by moving the T pointer, so we'll move the H pointer right:
1 [3] 4 8 (9)
3 + 9 = 12, which is greater than the desired sum. There is no way to reach the desired sum by moving the H pointer: moving it right will further increase the sum, and moving it left brings us back to the initial state. So we'll move the T pointer left:
1 [3] 4 (8) 9
3 + 8 = 11 <-- this is the desired sum, we're done.
So the rules of the algorithm consist of moving the H pointer right or moving the T pointer left; we're finished when the sum at the two pointers is equal to the desired sum, or when H and T cross (T becomes less than H).
t = a number
H = 0
T = A.length - 1
S = -1
while H < T && S != t
    S = A[H] + A[T]
    if S < t
        H++
    else if S > t
        T--
return S == t
It's easy to see that this algorithm runs at O(N) because we traverse each element at most once.
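The pseudocode translates almost line for line into runnable Python (same H/T names as above):

```python
def two_sum_sorted(A, t):
    # Two-pointer scan over a sorted array: each step discards one
    # index, so the whole scan is O(N).
    H, T = 0, len(A) - 1
    while H < T:
        S = A[H] + A[T]
        if S == t:
            return True
        if S < t:
            H += 1      # sum too small: move the head right
        else:
            T -= 1      # sum too large: move the tail left
    return False

print(two_sum_sorted([1, 3, 4, 8, 9], 11))  # -> True
```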
You make two new variables that contain index 0 and index n-1; let's call them i and j respectively.
Then you check the sum of A[i] and A[j]: if the sum is smaller than t, increment i (the lower index); if it is bigger, decrement j (the higher index). Continue until you either find i and j such that A[i] + A[j] = t, and return true, or until j <= i, and return false.
int i = 0, j = n - 1;
while (i < j) {
    if (A[i] + A[j] == t)
        return true;
    if (A[i] + A[j] < t)
        i++;
    else
        j--;
}
return false;
Given that the values A[i] are relatively small (say, less than 10^6), you can create an array B of size 10^6 with each value equal to 0. Then apply the following algorithm:
for i in 1...N:
    B[A[i]] += 1
for i in 1...N:
    if t - A[i] > 0:
        # need a second copy when the complement is A[i] itself
        needed = 2 if t - A[i] == A[i] else 1
        if B[t - A[i]] >= needed:
            return True
return False
Edit: well, now that we know that the array is sorted, it may be wiser to find another algorithm. I'll leave the answer here since it still applies to a certain class of related problems.

Tips in optimizing an algorithm

I'll try to phrase this question without making it sound like I'm fishing for homework answers (this is just a practice problem for algorithms).
You have an array of numbers where every value can occur at most twice, e.g. [1 3 5 2 5 1 2 3].
Examine the sums from one occurrence of a value to the other occurrence of the same value, inclusive, e.g. (5 + 2 + 5) and (2 + 5 + 1 + 2).
Find an algorithm that finds the maximum such sum.
I came up with a pretty simple algorithm:
iterate through the array (for i = 1 to n)
    iterate through the remaining array (for j = i + 1 to n)
        if A[i] == A[j]
            s = 0
            iterate through the values between those two points (for k = i to j)
                s = s + A[k]
            maxVal = max(maxVal, s)
What are some steps I can take towards optimizing this algorithm (or any algorithm, for that matter)? This solution is simply the first that came to mind, and I'm having trouble envisioning more efficient ones.
Edit: For the sake of the problem I'll just say all elements are positive.
Calculate an array of cumulative sums:
C[0] = 0
for i = 1 to n
    C[i] = C[i - 1] + A[i]
A = [1 3 5 2 5 1 2 3]
C = [0 1 4 9 11 16 17 19 22]
Use these values when a pair is found:
Sum(i to j) = C[j] - C[i - 1]
P.S. Are all elements always positive?
You can get rid of the innermost loop by precomputing, for every i, the sum from index 1 to index i and storing it in an array, call it sum. Then the sum between i and j is sum[j] - sum[i - 1].
for (i = 1 to n)
    sum[i] = A[i];
    if (i > 1)
        sum[i] += sum[i - 1];
Also notice that there are only two occurrences of each value in the array, so we can remember the first position of each value using a map/dictionary or an array pos (if possible), and when we see the value again, use it to calculate the sum between the first position and this one.
pos[];
for (i = 1 to n)
    if (pos[A[i]] == 0)
        pos[A[i]] = i;
    else
        result = max(result, sum[i] - sum[pos[A[i]] - 1]);
So in total, the time complexity of this will be O(n).
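Putting the prefix sums and the position map together, an O(n) sketch might look like this (Python, 0-based; names are mine):

```python
def max_pair_sum(A):
    # prefix[i] = sum of A[0..i-1]; the sum of A[l..r] inclusive is
    # prefix[r + 1] - prefix[l].
    prefix = [0]
    for x in A:
        prefix.append(prefix[-1] + x)
    first = {}            # value -> index of its first occurrence
    best = None
    for i, x in enumerate(A):
        if x in first:
            s = prefix[i + 1] - prefix[first[x]]
            if best is None or s > best:
                best = s
        else:
            first[x] = i
    return best

print(max_pair_sum([1, 3, 5, 2, 5, 1, 2, 3]))  # -> 21 (the run 3 5 2 5 1 2 3)
```

It returns None when no value repeats, which is one reasonable convention for "no pair exists".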

Specific Max Sum of the elements of an Int array - C/C++

Let's say we have an array: 7 3 1 1 6 13 8 3 3
I have to find the maximum sum over this array such that:
if I add 13 to the sum, I cannot add the two neighboring elements on each side: 6, 1 and 8, 3 cannot be added to the sum;
I can skip as many elements as necessary to make the sum max.
My algorithm was this:
I take the max element of the array and add it to the sum
I set that element and its neighboring elements to -1
I keep doing this until it's no longer possible to find a max
The problem is that for some specific test cases this algorithm is wrong.
Let's look at this one: 15 40 45 35
According to my algorithm:
I take 45 and set its neighbors to -1
The program ends
The correct answer is 15 + 35 = 50
This problem can be solved with dynamic programming.
Let A be the array, and let DP[m] be the max sum over {A[1]..A[m]}.
Every element of A has only two possible statuses: added into the sum or not. Suppose we have already determined DP[1]..DP[m-1], and now look at {A[1]..A[m]}. If A[m] is added into the sum, then A[m-1] and A[m-2] cannot be, so in the "added" case the max sum is A[m] + DP[m-3] (recall that DP[m-3] is the max sum over {A[1]..A[m-3]}). If A[m] is not added into the sum, the max sum is DP[m-1]. So we just need to compare A[m] + DP[m-3] with DP[m-1]; the bigger one is DP[m]. The reasoning is the same as mathematical induction.
So the DP equation is DP[m] = max{ DP[m-3] + A[m], DP[m-1] }, and DP[size(A)] is the result.
The complexity is O(n); pseudocode follows:
DP[1] = A[1];
DP[2] = max(A[1], A[2]);
DP[3] = max(DP[2], A[3]);
for (i = 4; i <= size(A); i++) {
    DP[i] = DP[i-3] + A[i];
    if (DP[i] < DP[i-1])
        DP[i] = DP[i-1];
}
It's solvable with a dynamic programming approach, taking O(N) time and O(N) space. An implementation follows:
int max_sum(int pos){
    if (pos >= N){                 // N = array size
        return 0;
    }
    if (visited[pos] == true){     // this state was already computed
        return ret[pos];           // ret[i] caches the result for the i'th cell
    }
    ret[pos] = max_sum(pos + 3) + A[pos];         // taking this item
    ret[pos] = max(ret[pos], max_sum(pos + 1));   // or skipping it, if better
    visited[pos] = true;
    return ret[pos];
}
int main(){
    // clear the visited array
    // and other initializations
    cout << max_sum(0) << endl;
}
The above problem is the maximum independent set problem (with a twist) on a path graph, which has an O(N) dynamic programming solution.
The recurrence relation for solving it:
Max(N) = maximum(Max(N-3) + A[N], Max(N-1))
Explanation: to select the maximum-sum set from N elements, we can either select the Nth element along with the maximum set from the first N-3 elements, or select the maximum set from the first N-1 elements, excluding the Nth.
Pseudocode:
Max(1) = A[1];
Max(2) = maximum(A[1], A[2]);
Max(3) = maximum(A[3], Max(2));
for (i = 4; i <= N; i++) {
    Max(i) = maximum(Max(i-3) + A[i], Max(i-1));
}
As suggested, this is a dynamic programming problem.
First, some notation, Let:
A be the array, of integers, of length N
A[a..b) be the subset of A containing the elements at index a up to
but not including b (the half open interval).
M be an array such that M[k] is the specific max sum of A[0..k)
such that M[N] is the answer to our original problem.
We can describe an element of M (M[n]) by its relation to one or more elements of M (M[k]) where k < n. And this lends itself to a nice linear time algorithm. So what is this relationship?
The base cases are as follows:
M[0] is the max specific sum of the empty list, which must be 0.
M[1] is the max specific sum for a single element, so must be
that element: A[0].
M[2] is the max specific sum of the first two elements. With only
two elements, we can either pick the first or the second, so we better
pick the larger of the two: max(A[0], A[1]).
Now, how do we calculate M[n] if we know M[0..n)? Well, we have a choice to make:
Either we add A[n-1] (the last element in A[0..n)) or we don't. We don't know for
certain whether adding A[n-1] in will make for a larger sum, so we try both and take
the max:
If we don't add A[n-1] what would the sum be? It would be the same as the
max specific sum immediately before it: M[n-1].
If we do add A[n-1] then we can't have the previous two elements in our
solution, but we can have any elements before those. We know that M[n-1] and
M[n-2] might have used those previous two elements, but M[n-3] definitely
didn't, because it is the max in the range A[0..n-3). So we get
M[n-3] + A[n-1].
We don't know which one is bigger though, (M[n-1] or M[n-3] + A[n-1]), so to find
the max specific sum at M[n] we must take the max of those two.
So the relation becomes:
M[0] = 0
M[1] = A[0]
M[2] = max {A[0], A[1]}
M[n] = max {M[n-1], M[n-3] + A[n-1]} where n > 2
Note a lot of answers seem to ignore the case for the empty list, but it is
definitely a valid input, so should be accounted for.
The simple translation of the solution in C++ is as follows:
(Take special note of the fact that the size of m is one bigger than the size of a)
int max_specific_sum(std::vector<int> a)
{
    std::vector<int> m( a.size() + 1 );
    m[0] = 0;
    if (a.size() >= 1) m[1] = a[0];
    if (a.size() >= 2) m[2] = std::max(a[0], a[1]);
    for (unsigned int i = 3; i <= a.size(); ++i)
        m[i] = std::max(m[i-1], m[i-3] + a[i-1]);
    return m.back();
}
BUT this implementation has a linear space requirement in the size of A. If you look at the definition of M[n], you will see that it only relies on M[n-1] and M[n-3] (and not the whole preceding list of elements), which means you only need to store the previous 3 elements of M, giving a constant space requirement. (The details of this implementation are left to the OP.)
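Since the original answer leaves the constant-space version as an exercise, here is one possible sketch (Python; the rolling-variable names are my own, and the max-with-0 base relies on the elements being nonnegative, as in the problem):

```python
def max_specific_sum(a):
    # m3, m2, m1 track M[n-3], M[n-2], M[n-1]; every empty prefix sums to 0.
    m3 = m2 = m1 = 0
    for n in range(1, len(a) + 1):
        cur = max(m1, m3 + a[n - 1])   # skip A[n-1], or take it with M[n-3]
        m3, m2, m1 = m2, m1, cur
    return m1

print(max_specific_sum([7, 3, 1, 1, 6, 13, 8, 3, 3]))  # -> 23 (7 + 13 + 3)
print(max_specific_sum([15, 40, 45, 35]))              # -> 50 (15 + 35)
```

The empty list correctly yields 0, matching the M[0] base case discussed above.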

Report all missing numbers in an array of integers (represented in binary)

I recently had a friend report to me that during a job interview he was asked the following question, which seems to be a pretty popular one:
You are given a list L[1...n] that contains all the elements from 0 to n except one. The elements of this list are represented in binary and are not given in any particular order, and the only operation we can use to access them is to fetch the jth bit of L[i] in constant time.
How can you find the missing number in O(n) ?
He was able to answer this question (which I believe has multiple solutions, none of which being too complicated). For example, the following pseudo-code solves the above problem:
Say all numbers are represented by k bits, and let j index the bit positions, starting from the least significant (initially the rightmost).
1. Using bit j, separate all the numbers in L into two sets (S1 containing all numbers that have 1 as their jth bit, and S2 containing all numbers that have 0 in that position).
2. The set whose size falls short of what it should be contains the missing number; recurse on this subset and advance j to the next more significant bit.
At each iteration we reduce the size of the set by half, so in total we do O(n) + O(n/2) + O(n/4) + ... = O(n) work.
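A possible Python sketch of this partition scheme (the explicit `full` list of values that *should* be present is my own device for deciding which half comes up short):

```python
def find_missing(nums, n):
    # nums holds the values 0..n with exactly one absent.  At each bit
    # we keep only the half whose actual count is below its expected
    # count; both lists roughly halve each round, so the total work is
    # n + n/2 + n/4 + ... = O(n).
    candidates = list(nums)
    full = list(range(n + 1))      # the values that should be there
    bit = 0
    while len(full) > 1:
        zeros = [x for x in candidates if not (x >> bit) & 1]
        ones = [x for x in candidates if (x >> bit) & 1]
        full_zeros = [x for x in full if not (x >> bit) & 1]
        full_ones = [x for x in full if (x >> bit) & 1]
        if len(zeros) < len(full_zeros):
            candidates, full = zeros, full_zeros
        else:
            candidates, full = ones, full_ones
        bit += 1
    return full[0]

print(find_missing([0, 1, 2, 4, 5, 6], 6))  # -> 3
```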
However, the follow-up question was: "What if we now have k numbers missing from our list L and we wish to report all k numbers while still keeping the O(n) complexity and the limitations of the initial problem? How would you do that?"
Any suggestions?
bool J[1..n + 1] = {false, false, ...};
int temp;
for (i = 1; i <= n; i++)
{
    temp = bitwise copy of L[i];
    J[temp + 1] = true;
}
for (i = 1; i <= n + 1; i++)
{
    if (J[i] == false)
        print i - 1;
}
Lol, that's the gist of it (J[i] == true records that the number i - 1 is present).
Am I understanding the problem correctly? It wasn't all that clear to me what exactly was meant by "the only operation is to access the jth bit of L[i]".
You can solve the original problem in O(n) by just doing a linear walk of the array until you find a number that doesn't match the expected value, like so (yes, I know I'm using an array of ints to approximate the array of bits, but the concept is the same):
int[] bits = {1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0};
int bitIndex = 0;
for (int num = 1; num < Integer.MAX_VALUE; num++) {
    int numBits = (int) (Math.log(num) / Math.log(2)) + 1;
    int nextNum = 0;
    for (int index = 0; index < numBits; index++) {
        nextNum = (nextNum << 1) | bits[bitIndex + index];
    }
    if (nextNum != num) {
        System.out.println("Missing number: expected=" + num + ", actual=" + nextNum);
        break;
    }
    bitIndex += numBits;
}
If you want to print all of the numbers that are not present in the array while keeping O(n) runtime, just replace the break; with num = nextNum; to continue checking for the next number.
Though there are some potential problems with this approach. If multiple consecutive numbers are missing then all bets are off. Also if the number of bits in num + 1 is larger than the number of bits in num, and num is missing from the bit array, then the bit index will be out of alignment with the data.
Of course, if multiple numbers are allowed to be missing, then the problem isn't really solvable. Consider for example:
{1,1,1,1,1,1,1}
It's just as valid in this case to say that I have numbers 1, 3, and 15 as it is to say that I only have 127 or that I have 7 and 15. When multiple consecutive values are permitted to go missing, the way to parse the bits essentially becomes arbitrary.
So perhaps one way to answer the second question is to read all the bits into a single large integer, and say "you have [very large number], and all the numbers before it are missing". Then you've produced a valid answer in O(n) time.
My idea is to solve it in the following way:
Let 2^M be the lowest power of 2 that is higher than N:
2^M > N, 2^(M-1) <= N
Now XOR together all the numbers in the list, along with the extra numbers from N+1 up to 2^M - 1 (since you can only fetch the jth bit at a time, do it for each bit position separately; it comes to the same thing). Because the XOR of all numbers from 1 to 2^M - 1 is 0, the result of all the XORs is the number you are looking for.
For example, if N = 6 and the missing number is 3:
M = 3 => 2^M - 1 = 7 =>
1 XOR 2 XOR 4 XOR 5 XOR 6 XOR 7 = 3
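A compact Python sketch of the XOR idea (padding the list with the values N+1 .. 2^M - 1 so that the full XOR telescopes down to the missing value; the names are mine):

```python
def find_missing_xor(lst, n):
    # lst holds 0..n with one value absent.  Since the XOR of all
    # numbers 1..2^M - 1 is 0 (for M >= 2), XOR-ing every present
    # value plus the padding n+1..2^M - 1 leaves the missing value.
    m = 1
    while m <= n:          # m becomes the lowest power of two above n
        m *= 2
    acc = 0
    for x in lst:
        acc ^= x
    for x in range(n + 1, m):
        acc ^= x
    return acc

print(find_missing_xor([0, 1, 2, 4, 5, 6], 6))  # -> 3
```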
