Efficient Way to Find Pair Orderings? - algorithm

Let's say I have three arrays a, b, and c of equal length N. The elements of each of these arrays come from a totally ordered set, but are not sorted. I also have two index variables, i and j. For all i != j, I want to count the number of index pairs such that a[i] < a[j], b[i] > b[j] and c[i] < c[j]. Is there any way this can be done in less than O(N ^ 2) time complexity, for example by creative use of sorting algorithms?
Notes: The inspiration for this question is that, if you only have two arrays, a and b, you can find the number of index pairs such that a[i] < a[j] and b[i] > b[j] in O(N log N) with a merge sort. I'm basically looking for a generalization to three arrays.
For simplicity, you may assume that no two elements of any array are equal (no ties).

By sorting the array a and rearranging the arrays b and c at the same time, we can suppose that a[i] < a[j] <=> i < j. So we need to find the number of pairs (i,j) such that i < j, b[i] > b[j] and c[i] < c[j]. Let's view (b[i], c[i]) as a point on a plane. We add the points one by one. Each time we add a point (b[j], c[j]), first we count the number of already added points (i < j) such that b[i] > b[j] and c[i] < c[j]. Then we add the point j and proceed to the next one. The sum of the numbers obtained at each step is our result.
Now it seems that this kind of queries can be fulfilled by two-dimensional segment tree: http://en.wikipedia.org/wiki/Segment_tree The cost of one iteration will be O(log^2 n), and the total complexity is O(n log^2 n).
(Note that I assume here that the elements of arrays are numbers. It's OK, because using a sorting we can always replace the elements of an array with numbers from 1 to n so that the order was preserved.)
Edit: In fact, a simpler structure called Fenwick tree or binary indexed tree is sufficient. See this link: http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees#2d

Related

Adding two arrays into one

I have two arrays (a and b) of size n, (positive whole numbers)
a= [a1…..an] b= [b1….bn]
I want to store them in array c, also an array of size n
c=[c1…..cn]
where I add one element from a plus one element from b (each used once) into c, lets say the first element in c is combining a1+b3
Quick example:
n=4 a=[a1,a2,a3,a4] b=[b1,b2,b3,b4]
one way could be:
c=[a1+b2,b3+a4,a2+b1,a3+b4]
The problem is that I want to add them in a way so that the elements in c become as evenly distributed as possible,
One ideal case would be that c came out as:
c=[5,5,5,5]
but the numbers in a and b might not match up so they become even, so I want it to come as close to even as possible.
I an trying to find a way so that the difference between the biggest number in c minus the smallest number in c (after being combined as evenly as I can) to be as small as possible. In my optimal example above that would be 5-5=0 which is most optimal since 0 is the smallest minimum difference I want to achieve. Some other case with other numbers might come out as 6-5=1, which might be the smallest I could get in that situation
My way of going would be to sort array a in ascending order and my array b in descending order,and then combining them with the same element that they are in. Im not sure if this is the best way or the fastest to do this in, I want my code (doing it with python) to be fast. I cant come up with a better way where I could distribute them more evenly,any clue if there are better ways to solve this problem? I really appreciate all advice I could get! Thank you
When trying to solve it in a way where one of the arrays is ascending, and the other one being descending, there might already exist an algorithm that solves it better that I have not thought of. Thank you for reading!
Your algorithm is both correct and fast. It is just proving it that is optimal which is tricky.
We can do this by proving the following two results.
Any other matching of a and b will lead to a maximum at least as big as yours.
Any other matching of a and b will lead to a minimum at least as small as yours.
And the conclusion is that any other matching must have a maximum-minimum at least as big as yours. From which yours must be optimal.
Now let's look at part 1. Sort a ascending, and b descending. Find the i such that c[i] = a[i] + b[i] is a maximum. Suppose that m is any other matching where we're matching up a[j] + b[m[j]]. Note that m[1], ..., m[n] is a permutation of 1, ..., n.
If a[i] + b[m[i]] >= a[i] + b[i] then part 1 is true..
If a[i] + b[m[i]] < a[i] + b[i] then b[m[i]] < b[i] and so we must have i < m[i]. Now there are n-i numbers in the range i+1, ..., n. m maps something out of that range into that range. Because m is a permutation, by the pigeonhole principle, m must map something in that range, out of that range.
In other words there must be a j > i such that m[j] <= i. But now a[i] <= a[j] and b[i] <= b[m[j]] and therefore a[i] + b[i] <= a[j] + b[m[j]]. And so part 1 is true again.
That concludes the proof of part 1.
The proof of part 2 is similar. Except now a[i] + b[i] is at a minimum, m[i] < i, there is a j < i with i <= m[j], a[j] <= a[i], b[m[j]] <= b[i], and a[j] + b[m[j]] <= a[i] + b[i].
And as noted, part 1 and part 2 together implies that you've minimized the difference between the minimum and maximum.

Data structure to find the nearest untaken element

I have an array A[0..N-1] containing N elements and a list containing M indexes. Each index in the list corresponds to an element in the array. For example, an index of 0 corresponds to A[0], an index of 1 corresponds to A[1]
I want to process the list sequentially from the first index to the last index as follows:
For the current index i,
if A[i] is not taken, then A[i] will be taken.
if A[i] is taken, then the smallest j > i where A[j] is not taken will be chosen and A[j] will be taken. If there is no such element, output -1
I want to output an array B of length M where B[i] denotes the index of element taken. I wonder how do I do it in linear complexity. (i.e. O(N) or O(M)). What data structure could be used?
Here is a suggestion for an O(M*logM) algorithm.
At the start, insert all M i-indexes of the list into a sorted tree allowing doubles. The steps become,
If A[i] is not taken, then take A[i]. Remove the i from the tree.
If A[i] is taken, then the smallest j > i where A[j] is not taken will be chosen and A[j] will be taken. The smallest j>i is readily available after i in the tree. If there is no such j, output -1. Otherwise, remove the j from the tree.

How was the assumption made on half will have i < j?

I have been reading "Cracking the Coding Interview 6th Edition".. On Chapter 0 - Big O, I have problem understanding an assumption made to a problem on Example 3.
void printUnorderedPairs(int[] array){
for(int i = 0; i < array.length; i++){
for(int j = i + 1; j < array.length; j++){
...
}
}
}
Under What It Means section, it assumed that:
There are N^2 total pairs. Roughly half of those will have i < j and the remaining half will have i > j. This code goes through roughly n^2/2 pairs so it does O(N^2) work.
My question is, how was the assumption made on Roughly half of those will have i < j and the remaining half will have i > j done? Can someone explain it to me please?
Thanks!
There are several ways you can try to think about this assumption, I quite like the "geometric" suggestion from #IanMercer in the comments. Here is another:
What is an unordered pair
An unordered pair is a pair of integers (i,j) where i and j is in the domain (1, N). (They can take any value from 1 to N).
How many pairs are there?
i can be of any value from 1 to N, and j can be of any value from 1 to N. Any combination of i forms a valid pair. So there are are N*N pairs.
Among all the pairs, how many pairs are there that i < j
Note that for any pair (a,b) where a is smaller than b, there exists a counterpart (b,a) (same values but flipped). So there is an equal amount of pairs where i<j as there are pairs 'i>j'.
So what is this confusing roughly part? It is because of all those N*N pairs there are some where neither i<j nor j>i, and those are precisely the N pairs where i==j.
The N*N pairs are thus divided into three parts (those where i < j), (those where j> i) and (those where i==j). Since first two are much larger O(N**2)/2 vs. the last group which has only N elements, we can state that roughly half have the property that i<j.

Find kth number in sum array

Given an array A with N elements I need to find pair (i,j) such that i is not equal to j and if we write the sum A[i]+A[j] for all pairs of (i,j) then it comes at the kth position.
Example : Let N=4 and arrays A=[1 2 3 4] and if K=3 then answer is 5 as we can see it clearly that sum array becomes like this : [3,4,5,5,6,7]
I can't go for all pair of i and j as N can go up to 100000. Please help how to solve this problem
I mean something like this :
int len=N*(N+1)/2;
int sum[len];
int count=0;
for(int i=0;i<N;i++){
for(int j=i+1;j<N;j++){
sum[count]=A[i]+A[j];
count++;
}
}
//Then just find kth element.
We can't go with this approach
A solution that is based on a fact that K <= 50: Let's take the first K + 1 elements of the array in a sorted order. Now we can just try all their combinations. Proof of correctness: let's assume that a pair (i, j) is the answer, where j > K + 1. But there are K pairs with the same or smaller sum: (1, 2), (1, 3), ..., (1, K + 1). Thus, it cannot be the K-th pair.
It is possible to achieve an O(N + K ^ 2) time complexity by choosing the K + 1 smallest numbers using a quickselect algorithm(it is possible to do even better, but it is not required). You can also just the array and get an O(N * log N + K ^ 2 * log K) complexity.
I assume that you got this question from http://www.careercup.com/question?id=7457663.
If k is close to 0 then the accepted answer to How to find kth largest number in pairwise sums like setA + setB? can be adapted quite easily to this problem and be quite efficient. You need O(n log(n)) to sort the array, O(n) to set up a priority queue, and then O(k log(k)) to iterate through the elements. The reversed solution is also efficient if k is near n*n - n.
If k is close to n*n/2 then that won't be very good. But you can adapt the pivot approach of http://en.wikipedia.org/wiki/Quickselect to this problem. First in time O(n log(n)) you can sort the array. In time O(n) you can set up a data structure representing the various contiguous ranges of columns. Then you'll need to select pivots O(log(n)) times. (Remember, log(n*n) = O(log(n)).) For each pivot, you can do a binary search of each column to figure out where it split it in time O(log(n)) per column, and total cost of O(n log(n)) for all columns.
The resulting algorithm will be O(n log(n) log(n)).
Update: I do not have time to do the finger exercise of supplying code. But I can outline some of the classes you might have in an implementation.
The implementation will be a bit verbose, but that is sometimes the cost of a good general-purpose algorithm.
ArrayRangeWithAddend. This represents a range of an array, summed with one value.with has an array (reference or pointer so the underlying data can be shared between objects), a start and an end to the range, and a shiftValue for the value to add to every element in the range.
It should have a constructor. A method to give the size. A method to partition(n) it into a range less than n, the count equal to n, and a range greater than n. And value(i) to give the i'th value.
ArrayRangeCollection. This is a collection of ArrayRangeWithAddend objects. It should have methods to give its size, pick a random element, and a method to partition(n) it into an ArrayRangeCollection that is below n, count of those equal to n, and an ArrayRangeCollection that is larger than n. In the partition method it will be good to not include ArrayRangeWithAddend objects that have size 0.
Now your main program can sort the array, and create an ArrayRangeCollection covering all pairs of sums that you are interested in. Then the random and partition method can be used to implement the standard quickselect algorithm that you will find in the link I provided.
Here is how to do it (in pseudo-code). I have now confirmed that it works correctly.
//A is the original array, such as A=[1,2,3,4]
//k (an integer) is the element in the 'sum' array to find
N = A.length
//first we find i
i = -1
nl = N
k2 = k
while (k2 >= 0) {
i++
nl--
k2 -= nl
}
//then we find j
j = k2 + nl + i + 1
//now compute the sum at index position k
kSum = A[i] + A[j]
EDIT:
I have now tested this works. I had to fix some parts... basically the k input argument should use 0-based indexing. (The OP seems to use 1-based indexing.)
EDIT 2:
I'll try to explain my theory then. I began with the concept that the sum array should be visualised as a 2D jagged array (diminishing in width as the height increases), with the coordinates (as mentioned in the OP) being i and j. So for an array such as [1,2,3,4,5] the sum array would be conceived as this:
3,4,5,6,
5,6,7,
7,8,
9.
The top row are all values where i would equal 0. The second row is where i equals 1. To find the value of 'j' we do the same but in the column direction.
... Sorry I cannot explain this any better!

Minimum of sum of absolute values

Problem statement:
There are 3 arrays A,B,C all filled with positive integers, and all the three arrays are of the same size.
Find min(|a-b|+|b-c|+|c-a|) where a is in A, b is in B, c is in C.
I worked on the problem the whole weekend. A friend told me that it can be done in linear time. I don't see how that could be possible.
How would you do it ?
Well, I think I can do it in O(n log n). I can only do O(n) if the arrays are initially sorted.
First, observe that you can permute a,b,c however you like without changing the value of the expression. So let x be the smallest of a,b,c; let y be the middle of the three; and let z be the maximum. Then note that the expression just equals 2*(z-x). (Edit: This is easy to see... Once you have the three numbers in order, x < y < z, the sum is just (y-x) + (z-y) + (z-x) which equals 2*(z-x))
Thus, all we are really trying to do is find three numbers such that the outer two are as close together as possible, with the other number "sandwiched" between them.
So start by sorting all three arrays in O(n log n). Maintain an index into each array; call these i, j, and k. Initialize all three to zero. Whichever index points to the smallest value, increment that index. That is, if A[i] is smaller than B[j] and C[k], increment i; if B[j] is smallest, increment j; if C[k] is smallest, increment k. Repeat, keeping track of |A[i]-B[j]| + |B[j]-C[k]| + |C[k]-A[i]| the whole time. The smallest value you observe during this march is your answer. (When the smallest of the three is at the end of its array, stop because you are done.)
At each step, you add one to exactly one index; but you can only do this n times for each array before hitting the end. So this is at most 3*n steps, which is O(n), which is less than O(n log n), meaning the total time is O(n log n). (Or just O(n) if you can assume the arrays are sorted.)
Sketch of a proof that this works: Suppose A[I], B[J], C[K] are the a, b, c that form the actual answer; i.e., they have the minimum |a-b|+|b-c|+|c-a|. Suppose further that a > b > c; the proof for the other cases is symmetric.
Lemma: During our march, we do not increment j past J until after we increment k past K. Proof: We always increment the index of the smallest element, and when k <= K, B[J] > C[k]. So when j=J and k <= K, B[j] is not the smallest element, so we do not increment j.
Now suppose we increment k past K before i reaches I. What do things look like just before we perform that increment? Well, C[k] is the smallest of the three at that moment, because we are about to increment k. A[i] is less than or equal to A[I], because i < I and A is sorted. Finally, j <= J because k <= K (by our Lemma), so B[j] is also less than A[I]. Taken together, this means our sum-of-abs-diff at this moment is less than 2*(c-a), which is a contradiction.
Thus, we do not increment k past K until i reaches I. Therefore, at some point during our march i=I and k=K. By our Lemma, at this point j is less than or equal to J. So at this point, either B[j] is less than the other two and j will get incremented; or B[j] is between the other two and our sum is just 2*(A[i]-C[k]), which is the right answer.
This proof is sloppy; in particular, it fails to explicitly account for the case where one or more of a,b,c are equal. But I think that detail can be worked out pretty easily.
I would write a really simple program like this:
#!/usr/bin/python
import sys, os, random
A = random.sample(range(100), 10)
B = random.sample(range(100), 10)
C = random.sample(range(100), 10)
minsum = sys.maxint
for a in A:
for b in B:
for c in C:
print 'checking with a=%d b=%d c=%d' % (a, b, c)
abcsum = abs(a - b) + abs(b - c) + abs(c - a)
if abcsum < minsum:
print 'found new low sum %d with a=%d b=%d c=%d' % (abcsum, a, b, c)
minsum = abcsum
And test it over and over until I saw some pattern emerge. The pattern I found here is what would be expected: the numbers that are closest together in each set, regardless of whether the numbers are "high" or "low", are those that produce the smallest minimum sum. So it becomes a nearest-number problem. For whatever that's worth, probably not much.

Resources