Find ascending triples in a list - algorithm

I encountered this problem in a programming interview and still have no idea how to solve it.
Given a list of length n whose elements are all positive integers in no particular order, find all triples (a, b, c) such that a < b < c and a appears before b, and b appears before c, in the list.
Also analyse the time complexity of your algorithm.

No general algorithm can be faster than O(n^3): given a sorted input of distinct elements, every triple is ascending, so the output alone has size on the order of n^3, and merely producing it takes proportional time. Indeed, even a randomly generated list of integers will already have on the order of n^3 such triples, up to constant factors.
Given that, you could simply iterate over all possible triples in list order and compare each for ascending order. This naive solution is already asymptotically optimal (that is, O(n^3)):
for (int i = 0; i < n; i++)
    for (int j = i+1; j < n; j++)
        for (int k = j+1; k < n; k++)
            if (X[i] < X[j] && X[j] < X[k])
                output(X[i], X[j], X[k]);
I suspect you may have a transcription error in your problem statement - or else the question is supposed to be a very easy short coding exercise.

If it is known that there is only a small number of triples (say k), then you may prefer to find all the triples by storing pointers to the previous smaller element.
ALGORITHM
Prepare an empty data structure (possible choices described later).
Prepare an empty array B of length n.
Then for each element c in the list:
Using the data structure, find the most recent element in the list that is smaller than c (if it exists), and store its index in the array B at c's position.
Store c (and its index in the original list) in the data structure.
Then use array B to find all elements b smaller than c, and again to find all elements a smaller than b, and emit all these combinations as output triples.
DATA STRUCTURE
The data structure needs to store (value, position) pairs so that it is easy to find the largest position (i.e. the most recent occurrence) over all elements with value less than c.
One easy way to do this, if the range of allowed values is fairly small, is to use a series of arrays where A[k][x] stores the maximum position over all elements with value in the range [x*2^k, (x+1)*2^k).
If the values have up to M bits (i.e. the values are in the range 0 to 2^M-1), then updating and querying this data structure are both O(M) operations.
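As an illustration, a minimal sketch of such a layered structure (my own naming and layout, assuming values fit in M bits; update records a (value, position) pair and maxPosBelow answers the query described above):

#include <vector>
#include <algorithm>
using namespace std;

// Layered arrays: level[k][x] = max position seen among values in [x*2^k, (x+1)*2^k).
struct RecentSmaller {
    int M;                     // values fit in M bits
    vector<vector<int>> level; // level[k] has 2^(M-k) buckets

    RecentSmaller(int M) : M(M), level(M + 1) {
        for (int k = 0; k <= M; ++k)
            level[k].assign(1 << (M - k), -1); // -1 = bucket empty
    }

    // Record that value v occurred at position pos: O(M).
    void update(int v, int pos) {
        for (int k = 0; k <= M; ++k)
            level[k][v >> k] = max(level[k][v >> k], pos);
    }

    // Max position over all recorded values < c (or -1 if none): O(M).
    // [0, c) decomposes into one aligned block per set bit of c.
    int maxPosBelow(int c) const {
        int best = -1, base = 0;
        for (int k = M; k >= 0; --k)
            if (c & (1 << k)) {
                best = max(best, level[k][base >> k]);
                base += 1 << k;
            }
        return best;
    }
};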
COMPLEXITY
The given method is O(nM + k), where k is the number of output triples.
If the values have a larger range, you could use a form of binary search tree instead of the series of arrays, or first sort the values and replace each value with its ordinal rank. This would have complexity O(n log n + k).
COUNTING TRIPLES
If you just wish to know the total number of triples of this form then you can do this in O(n).
The idea is similar to before:
Find the most recent smaller element for each index, and the count of smaller elements for each index
Find the next greater element for each index, and the count of greater elements
Compute the sum of the product of the count of smaller elements and the count of larger elements for each index.
To make this O(n) we need to be able to find the next greater element in O(n). This can be done by:
1. Push the current index i onto the stack.
2. While A[top(stack)] < A[i+1], pop an index x off the stack and store NGE[x] = i+1.
3. Increment i and return to step 1.
We also need to be able to find the count of greater elements in O(n). Once the NGE array has been prepared, we can find the counts by iterating backwards over the array and computing
count_greater_elements[i] = count_greater_elements[NGE[i]] + 1   if NGE[i] is defined
count_greater_elements[i] = 0                                    otherwise
The most recent smaller elements and counts can be computed in an analogous way.
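For illustration, here is a sketch of the next-greater-element pass and the chain-count recurrence described above, in the standard stack formulation (my own variable names):

#include <vector>
#include <stack>
using namespace std;

// NGE[i] = index of the next element to the right greater than A[i], or -1.
// Each index is pushed and popped at most once, so the pass is O(n).
vector<int> nextGreater(const vector<int>& A) {
    int n = A.size();
    vector<int> NGE(n, -1);
    stack<int> st; // indices with no greater element found yet
    for (int i = 0; i < n; ++i) {
        while (!st.empty() && A[st.top()] < A[i]) {
            NGE[st.top()] = i;
            st.pop();
        }
        st.push(i);
    }
    return NGE;
}

// Chain counts as in the recurrence above: follow NGE links backwards.
vector<int> chainCounts(const vector<int>& NGE) {
    int n = NGE.size();
    vector<int> cnt(n, 0);
    for (int i = n - 1; i >= 0; --i)
        if (NGE[i] != -1)
            cnt[i] = cnt[NGE[i]] + 1;
    return cnt;
}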

An O(n^2) solution for the general case (to count all such triples, not output them all; outputting takes O(n^3) just because of the output's size):
For each number X in the array, count the numbers less than X with indices before X's, and the numbers greater than X with indices after X's. Then for each X, the number of triples in which X is the middle element is just less[X] * greater[X]. The answer is the sum of these products.
int calc(vector<int> numbers) {
    int n = numbers.size();
    vector<int> less(n), more(n);
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (numbers[i] < numbers[j])
                less[j]++, more[i]++;
    int res = 0;
    for (int i = 0; i < n; i++)
        res += less[i] * more[i];
    return res;
}
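A quick check of the function above (my own example): in the list {3, 1, 2, 4}, the only triple that is ascending in list order is (1, 2, 4).

#include <cassert>
#include <vector>
using namespace std;

// calc() as defined above.

int main() {
    // The middle element 2 has one smaller element before it (the 1) and
    // one greater element after it (the 4): 1 * 1 = 1 triple.
    assert(calc({3, 1, 2, 4}) == 1);
}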


How was the assumption made that half will have i < j?

I have been reading "Cracking the Coding Interview, 6th Edition". In the chapter on Big O, I have a problem understanding an assumption made in Example 3.
void printUnorderedPairs(int[] array) {
    for (int i = 0; i < array.length; i++) {
        for (int j = i + 1; j < array.length; j++) {
            ...
        }
    }
}
Under the "What It Means" section, it states:
There are N^2 total pairs. Roughly half of those will have i < j and the remaining half will have i > j. This code goes through roughly N^2/2 pairs so it does O(N^2) work.
My question is: how was the assumption made that roughly half of those will have i < j and the remaining half will have i > j? Can someone explain it to me please?
Thanks!
There are several ways you can try to think about this assumption; I quite like the "geometric" suggestion from @IanMercer in the comments. Here is another:
What is an unordered pair
An unordered pair is a pair of integers (i, j) where i and j are in the domain [1, N] (they can take any value from 1 to N).
How many pairs are there?
i can take any value from 1 to N, and j can take any value from 1 to N. Any combination of i and j forms a valid pair, so there are N*N pairs.
Among all the pairs, how many have i < j?
Note that for any pair (a, b) where a is smaller than b, there exists a counterpart (b, a) (the same values, flipped). So there are exactly as many pairs with i < j as there are pairs with i > j.
So what is the confusing "roughly" part? It is because, of all those N*N pairs, there are some where neither i < j nor i > j: precisely the N pairs where i == j.
The N*N pairs are thus divided into three parts: those where i < j, those where i > j, and those where i == j. Since the first two parts, with (N*N - N)/2 pairs each, are much larger than the last one, which has only N elements, we can state that roughly half have the property i < j.
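A tiny sanity check of that split (my own snippet, not from the book): for N = 3 it prints 3 pairs in each group, and indeed 9 = 3 + (9 - 3)/2 + (9 - 3)/2.

#include <cstdio>

int main() {
    const int N = 3;
    int less = 0, greater = 0, equal = 0;
    for (int i = 1; i <= N; ++i)
        for (int j = 1; j <= N; ++j) {
            if (i < j)      ++less;
            else if (i > j) ++greater;
            else            ++equal;
        }
    // Prints: less=3 greater=3 equal=3 (N*N = 9 pairs in total).
    printf("less=%d greater=%d equal=%d\n", less, greater, equal);
}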

Comparing between two arrays

How can I compare two arrays with sorted integer contents using a binary algorithm?
As in every case: it depends.
Assuming that the arrays are ordered or hashed, the time complexity is at most O(n+m).
You did not mention any language, so here it is in pseudocode.
function SortedSequenceOverlap(Enumerator1, Enumerator2)
{
    while (Enumerator1 is not at the end and Enumerator2 is not at the end)
    {
        if (Enumerator1.current > Enumerator2.current)
            Enumerator2.fetchNext()
        else if (Enumerator2.current > Enumerator1.current)
            Enumerator1.fetchNext()
        else
            return true
    }
    return false
}
If the sort order is descending you need to use a reverse enumerator for this array.
However, this is not always the fastest way.
If the arrays have significantly different sizes, it could be more efficient to binary-search for the elements of the shorter array in the larger one.
This can be improved further: when you start with the median element of the small array, you need not do a full search for any further element. Any element before the median must lie in the part of the large array before the position where the median was (or would have been) found, and any element after the median must lie in the upper part of the large array. This can be applied recursively until all elements have been located. Once you get a hit, you can abort.
The disadvantage of this method is that it takes more time in the worst case, i.e. O(n log m), and that it requires random access to the arrays, which might hurt cache efficiency.
On the other side, multiplying by a small number (log m) could be better than adding a large number (m). In contrast to the above algorithm, typically only a few elements of the large array have to be accessed.
The break-even point is approximately when log m is less than m/n, where n is the size of the smaller array.
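As a sketch, the plain (non-recursive) version of this idea could look as follows; the median refinement described above would additionally shrink the searched range of large at every step (my own function names, using C++ for concreteness):

#include <vector>
#include <algorithm>
using namespace std;

// O(n log m) variant: binary-search each element of the shorter array
// in the larger one, aborting on the first common element.
bool sortedOverlap(const vector<int>& small, const vector<int>& large) {
    for (int x : small)
        if (binary_search(large.begin(), large.end(), x))
            return true; // first hit: the arrays overlap
    return false;
}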
You think that's it? - no
In case random access into the larger array causes higher latency, e.g. because of reduced cache efficiency, it could be even better to do the reverse: look for the elements of the large array in the small array, starting with the median of the large array.
Why should this be faster? You have to look up many more elements.
Answer:
No, there are no more lookups. As soon as the boundaries between which you expect a given range of elements of the large array collapse, you can stop searching for those elements, since you won't find any hits anymore.
In fact, the number of comparisons is exactly the same.
The difference is that a single element of the large array is compared against different elements of the small array in one step. This needs only one slow access for a whole bunch of comparisons, while the other way around the same element has to be accessed several times, with accesses to other elements in between.
So there are fewer slow accesses, at the expense of more fast ones.
(I implemented search as you type this way about 30 years ago where access to the large index required disk I/O.)
If you know that they are sorted, you can keep a pointer to the beginning of each array and advance one of the two pointers after each comparison. That makes it O(n). I'm not sure you could bisect anything, as you don't know where the common number would be.
Still better than the brute-force O(n^2).
If you know the second array is sorted you can use binary search to look through the second array for elements from the first array.
This can be done in two ways.
a) Binary Search b) Linear Search
For binary search: for each element in array A, look for it in B with binary search; in that case the complexity is O(n log n).
For linear search: it is O(m + n), where m and n are the sizes of the arrays. In your case m = n.
Linear search :
Have two indices i, j that point to the arrays A, B
Compare A[i], B[j]
If A[i] < B[j], then increment i, because a match, if it exists, can only be found at later indices in A.
If A[i] > B[j], then increment j, because a match, if it exists, can only be found at later indices in B.
If A[i] == B[j], you have found the answer.
Code:
private int findCommonElement(int[] A, int[] B) {
    for (int i = 0, j = 0; i < A.length && j < B.length; ) {
        if (A[i] < B[j]) {
            i++;
        } else if (A[i] > B[j]) {
            j++;
        } else {
            return A[i]; // A[i] == B[j]: common element found
        }
    }
    return -1; // Assuming all integers are positive.
}
Now if both arrays are descending, just swap the index updates: if A[i] < B[j], increment j, else increment i.
If you have one descending (B) and one ascending (A), then i for A starts at the beginning of the array, j for B starts at the end of the array, and they move accordingly, as shown below:
for (int i = 0, j = B.length - 1; i < A.length && j >= 0; ) {
    if (A[i] < B[j]) {
        i++;
    } else if (A[i] > B[j]) {
        j--;
    } else {
        return A[i]; // common element found
    }
}
return -1; // no common element

Given N arrays, how many ways are there for each array to contribute one element and add to k?

Say I had N arrays. These N arrays are in an array of arrays A. How many N-tuples are there such that for a tuple t,
sum = 0
for i = 0 ... N-1
    sum += A[i][t[i]]
sum == k
What is an efficient way to solve this? The best I can come up with is just enumerating all possibilities.
P.S. This isn't a homework question. I saw this on LeetCode and was curious about a solution to the general case.
Conceptual solution (can be improved):
sort the elements in each array
shift the elements in each array by the absolute minimum over all arrays (abs_min); to shift, subtract abs_min from each element of every array. You now have arrays with only non-negative elements, and you are searching for a target_sum = initial_sum - num_arrays*abs_min
set your curr_array to the first one
binary search for the position of target_sum in curr_array. You only need to consider the elements of curr_array with indices below this position. Take one such element, subtract it from target_sum, and recursively repeat the search with the next array (a sketch of this recursion appears after the improvements list below).
I believe the (amortised) complexity will be somewhere around O(num_arrays*N*log(N)), where N is the (maximum) number of elements in an array.
Opportunities for improvement:
I have a feeling that shifting all arrays by abs_min is unnecessary (just an artifice that helps the thinking). Maybe, before going one step deeper into the recursion in step 4, the target_sum can instead be shifted by the minimum of the current array?
reordering the arrays so that the shorter ones are considered first will perhaps improve performance (fewer elements to consider in the upper levels of the recursion) [Edit] or maybe reordering the arrays in descending order of their minimum value (taking out of the target_sum in the most aggressive way possible)?
adopting a scheme which eliminates/multiplexes the duplicates inside the initial arrays may help, i.e. a map with key=unique_value and value=the set of indices. If the specific tuples are not required, a map of unique_value -> occurrence_count would be enough. (This may be useful when duplicates are guaranteed to exist, e.g. the values in the arrays lie within tight ranges and the arrays are pretty long; pigeonhole principle.)
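A rough sketch of the recursion in step 4 (my own function names; it counts tuples rather than emitting them, and it relies on the non-negativity arranged by the shift in step 2 to make the pruning valid):

#include <vector>
#include <algorithm>
using namespace std;

// Count tuples summing to target, one element per array; arrays[d] is sorted
// and all elements are non-negative (after the shift described above).
long long countTuples(const vector<vector<int>>& arrays, int d, long long target) {
    const vector<int>& cur = arrays[d];
    // Binary search: only elements <= target can still contribute.
    int limit = upper_bound(cur.begin(), cur.end(), target) - cur.begin();
    if (d + 1 == (int)arrays.size())
        // Last array: count exact matches of the remaining target.
        return count(cur.begin(), cur.begin() + limit, (int)target);
    long long total = 0;
    for (int i = 0; i < limit; ++i)
        total += countTuples(arrays, d + 1, target - cur[i]);
    return total;
}

On the worked example below, countTuples({{1, 2, 3}, {42, 43, 44, 45, 46, 47}}, 0, 49) returns 2.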
[Edited to show how it works on the example of {{1, 2, 3}, {42, 43, 44, 45, 46, 47}}]
Upper limit = index of the first element strictly greater than the provided value. If you want values less than or equal, take the values strictly below that index!
Zero-based index convention.
A target sum of 49 in the first array gets an upper limit of index=3 (so all indices under 3 need to be considered).
First array, start at index=2/value=3: you will be looking for a target_sum of 46 in the second array. The upper limit by binary search in the second is index=5 (we look strictly under it), so start with index=4/value=46 (the algo cuts out the value 47). 46 is good and retained; index=3/value=45 is not enough, and (not having a 3rd array to recurse into) the algo won't even consider anything under index=3/value=45.
First array, index=1/value=2: looking for a target_sum of 47 in the second array. Binary search gives an upper limit of index=6 (searching strictly under it), so start with index=5/value=47. 47 is retained; 46 and below are cut out by the algo.
Down in the first array, index=0/value=1: looking for a target_sum of 48 in the second array. The upper limit is again 6; at index=5/value=47 we get an insufficient value and terminate.
So, grand totals:
Total binary searches: 1 in the first array, 3 in the second.
Total successful equality tests: 2 (two tuples found).
Total unsuccessful equality tests: 3 (until the second array no longer offers a satisfactory answer).
Total additions/subtractions performed: 3 (one for each value in the first array).
By contrast, exhaustive scanning would require:
no binary searches
total additions: 3*6 = 18
total successful equality tests: 2
total unsuccessful equality tests: 16
Language: C++
Constraints: A[i][j] >= 0
Complexity: O(N^2 * k) (there are N * (k+1) memo states, with O(N) work per state)
#include <algorithm>
using namespace std;

int A[MAX_N][MAX_N], memo[MAX_N][MAX_K + 1];

int countWays2(int m, int sum, int N) {
    if (memo[m][sum] != -1)
        return memo[m][sum];
    if (m == 0) // base case: count elements of the first array equal to sum
        return memo[m][sum] = count(A[0], A[0] + N, sum);
    int ways = 0;
    for (int i = 0; i < N; ++i)
        if (sum >= A[m][i])
            ways += countWays2(m - 1, sum - A[m][i], N);
    return memo[m][sum] = ways;
}

int countWays(int N, int k) {
    if (k < 0) return 0;
    for (int i = 0; i < N; ++i)
        fill(memo[i], memo[i] + k + 1, -1); // initialize memoization table
    return countWays2(N - 1, k, N);
}
The answer is countWays(N, k)
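A quick way to exercise it (my own harness; note that the fixed-size A above forces every array to have exactly N elements, and MAX_N/MAX_K must be defined large enough):

#include <cstdio>

int main() {
    // Two arrays of two elements each: {1, 2} and {3, 4}, target k = 5.
    A[0][0] = 1; A[0][1] = 2;
    A[1][0] = 3; A[1][1] = 4;
    // Tuples summing to 5: (1, 4) and (2, 3), so this prints 2.
    printf("%d\n", countWays(2, 5));
    return 0;
}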

Insertion sort comparison?

How to count number of comparisons in insertion sort in less than O(n^2) ?
When we're inserting an element, we alternate comparisons and swaps until either (1) the element compares not-less-than the element to its left, or (2) we hit the beginning of the array. In case (1) there is one comparison not paired with a swap; in case (2) every comparison is paired with a swap. So the number of comparisons is the number of swaps plus the number of case (1) insertions, i.e. the number of elements (after the first) that are not new running minima when scanning from left to right (or however your insertion sort works); that adjustment can be computed in time O(n).
num_comparisons = num_swaps
min_so_far = array[0]
for i in range(1, len(array)):
    if array[i] < min_so_far:
        min_so_far = array[i]
    else:
        num_comparisons += 1
As commented, doing it in less than O(n^2) is hard, maybe impossible if you must pay the price for sorting. If you already knew the number of comparisons done at each outer iteration, it would be possible in O(n), but then the price for sorting was paid sometime before.
Here is a way for counting the comparisons inside the method (in pseudo C++):
void insertion_sort(int p[], const size_t n, size_t & count)
{
    for (long i = 1, j; i < (long) n; ++i)
    {
        auto tmp = p[i];
        for (j = i - 1; j >= 0 and p[j] > tmp; --j) // open a gap where tmp is put
            p[j + 1] = p[j];
        // i - 1 - j swaps were done; one extra comparison was spent on the
        // element that stopped the scan, unless we fell off the front (j == -1).
        count += (j >= 0) ? (i - j) : i;
        p[j + 1] = tmp;
    }
}
n is the number of elements, and count is the comparison counter, which must be passed in as a variable set to zero.
If I remember correctly, this is how insertion sort works:
A = unsorted input array
B := [] // sorted output array
while (A is not empty) {
    remove first element from A and add it to B, preserving B's sorting
}
If the insertion into B is implemented by linear search from the left until you find a greater element, then the number of comparisons is essentially the number of pairs (i, j) such that i < j and A[i] <= A[j] (I'm considering the stable variant).
In other words, for each element x, count the number of elements before x that have a less or equal value. That can be done by scanning A from the left, adding its elements to some balanced binary search tree that also remembers the number of elements under each node. In such a tree, you can find the number of elements less than or equal to a certain value in O(log n). Total time: O(n log n).
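The balanced BST with per-node counts works; an alternative with the same O(n log n) bound (my own sketch, not from the answer above) is a Fenwick (binary indexed) tree over the ranks of the values, counting for each element how many earlier elements are less than or equal to it:

#include <vector>
#include <algorithm>
using namespace std;

// Fenwick tree over value ranks: point insert and prefix count, both O(log n).
struct Fenwick {
    vector<int> t;
    Fenwick(int n) : t(n + 1, 0) {}
    void add(int i) { for (++i; i < (int) t.size(); i += i & -i) ++t[i]; }
    int prefix(int i) const { // how many inserted ranks are <= i
        int s = 0;
        for (++i; i > 0; i -= i & -i) s += t[i];
        return s;
    }
};

// Count pairs (i, j) with i < j and A[i] <= A[j] in O(n log n).
long long countPairs(const vector<int>& A) {
    vector<int> sorted(A); // coordinate compression
    sort(sorted.begin(), sorted.end());
    sorted.erase(unique(sorted.begin(), sorted.end()), sorted.end());
    Fenwick fw(sorted.size());
    long long pairs = 0;
    for (int x : A) {
        int r = lower_bound(sorted.begin(), sorted.end(), x) - sorted.begin();
        pairs += fw.prefix(r); // earlier elements with value <= x
        fw.add(r);
    }
    return pairs;
}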

find maximum sum of n elements in an array such that not more than k elements are adjacent

Almost the same as this:
find maximum sum of elements in an array such that not more than k elements are adjacent
except there is a limit of n elements we can choose. How do I modify the DP algorithm to make it work for this?
Add a new dimension to the DP function:
f[i, j, l] = max sum over the first i elements, using j elements in total, with the last l elements all included in the sum.
Well, let me make it clearer.
Question: find the maximum sum of n elements in an array such that not more than K elements are adjacent.
Let int f[i][j][k] be the maximum sum over the first i elements, using j elements in total, where the last k elements are all used. Let bool g[i][j][k] denote whether a certain combination is possible; e.g. g[1][1][2] is false. This is important because, without this restriction, f may generate impossible answers.
Initially, memset f and g to all zeros and set g[0][0][0] to true. We can use a forward recurrence to solve this DP problem. Obviously, each time you encounter a number you have two choices: take it, or skip it. That gives the recurrence:
f[i][j][k] can extend to f[i+1][j+1][k+1] (take element i), or
f[i][j][k] can extend to f[i+1][j][0] (skip element i)
So, the pseudocode can be as follows:
memset(f, 0, sizeof(f));
memset(g, 0, sizeof(g));
g[0][0][0] = true;
for (int i = 0; i < array.size(); i++)
    for (int j = 0; j <= n; j++)
        for (int k = 0; k <= K; k++) if (g[i][j][k]) {
            f[i+1][j][0] = max(f[i+1][j][0], f[i][j][k]); // skip element i
            g[i+1][j][0] = true;
            if (j < n && k < K) { // take element i: stay within n total, K adjacent
                f[i+1][j+1][k+1] = max(f[i+1][j+1][k+1], f[i][j][k] + array[i]);
                g[i+1][j+1][k+1] = true;
            }
        }
and the final result will be:
ans = 0;
for (i = 0; i <= K; i++)
    ans = max(ans, f[array.size()][n][i]);
return ans;
The above uses exactly n elements. If you want at most n elements, you can change it in this way:
ans = 0;
for (i = 0; i <= n; i++)
    for (j = 0; j <= K; j++)
        ans = max(ans, f[array.size()][i][j]);
return ans;
