Counting number of points in lower left quadrant? - algorithm

I am having trouble understanding a solution to an algorithmic problem
In particular, I don't understand how or why this part of the code
s += a[i];
total += query(s);
update(s);
allows you to compute the total number of points in the lower left quadrant of each point.
Could someone please elaborate?

As an analogue for the plane problem, consider this:
For a point (a, b) to lie in the lower left quadrant of (x, y), a <
x & b < y; thus, points of the form (i, P[i]) lie in the lower left quadrant
of (j, P[j]) iff i < j and P[i] < P[j]
When iterating in ascending order, all points that were considered earlier lie on the left compared to the current (i, P[i])
So one only has to locate all P[j]s less that P[i] that have been considered until now
*current point refers to the point in consideration in the current iteration of the for loop that you quoted ie, (i, P[i])
Let's define another array, C[s]:
C[s] = Number of Prefix Sums of array A[1..(i - 1)] that amount to s
So the solution to #3 becomes the sum ... C[-2] + C[-1] + C[0] + C[1] + C[2] ... C[P[i] - 1], ie prefix sum of C[P[i]]
Use the BIT to store the prefix sum of C, thus defining query(s) as:
query(s) = Number of Prefix Sums of array A[1..(i - 1)] that amount to a value < s
Using these definitions, s in the given code gives you the prefix sum up to the current index i (P[i]). total builds the answer, and update simply adds P[i] to the BIT.
We have to repeat this method for all i, hence the for loop.
PS: It uses a data structure called a Binary Indexed Tree (http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees) for operations. If you aren't acquainted with it, I'd recommend that you check the link.
EDIT:
You are given a array S and a value X. You can split S into two disjoint subarrays such that L has all elements of S less than X, and H that has those that are greater than or equal to X.
A: All elements of L are less than all elements of H.
Any subsequence T of S will have some elements of L and some elements of H. Let's say it has p elements of L and q of H. When T is sorted to give T', all p elements of L appear before the q elements of H because of A.
Median being the central value is the value at location m = (p + q)/2
It is intuitive to think that having q >= p implies that the median lies in X, as a proof:
Values in locations [1..p] in T' belong to L. Therefore for the median to be in H, it's position m should be greater than p:
m > p
(p + q)/2 > p
p + q > 2p
q > p
B: q - p > 0
To computer q - p, I replace all elements in T' with -1 if they belong to L ( < X ) and +1 if they belong to H ( >= X)
T looks something like {-1, -1, -1... 1, 1, 1}
It has p times -1 and q times 1. Sum of T' will now give me:
Sum = p * (-1) + q * (1)
C: Sum = q - p
I can use this information to find the value in B.
All subsequences are of the form {A[i], A[i + 2], A[i + 3] ... A[j + 1]} since they are contiguous, To compute sum of A[i] to A[j + 1], I can compute the prefix sum of A[i] with P[i] = A[1] + A[2] + .. A[i - 1]
Sum of subsequence from A[i] to A[j] then can be computed as P[j] - P[i] (j is greater of j and i)
With C and B in mind, we conclude:
Sum = P[j] - P[i] = q - p (q - p > 0)
P[j] - P[i] > 0
P[j] > P[i]
j > i and P[j] > P[i] for each solution that gives you a median >= X
In summary:
Replace all A[i] with -1 if they are less than X and -1 otherwise
Computer prefix sums of A[i]
For each pair (i, P[i]), count pairs which lie to its lower left quadrant.

Related

Counting inversions in an array of 2D pair

Problem Description:
Let there be an array of 2D pairs ((x1, y1), . . . ,(xn, yn))
. With a fixed constant
y' a pair (i, j) is called half-inverted if i < j, xi > xj , and yi ≥ y' > yj . Devise an algorithm
that counts the number of half-inverted pairs. You will get full marks if your algorithm is
correct of complexity no more than O(n log n).
\My idea is to treat this using similar method as counting inversion in a normal array, but my problem is that how do we maintain the order during the Merge And Count step?
It is a simple modification of the familiar merge-sort inversion counting algorithm which can be used to solve this problem so make you fully understand it as a prerequisite.
If we examine the merge step of this algorithm we have 2 sorted halves and 2 pointers pointing to an element of each. Let our left pointer be i and our right, j. Using the traditional definition of an inversion, if our i pointer points to a value that is larger than the value pointed to by j then due the arrays being sorted and all the elements on the left being before those on the right in the real array, we know all the elements from i to the end of the left half meet our definition of an inversion for our value at j so we increase our count by mid - i where mid is the end of the left half.
Switching back to your problem, we are dealing with pairs (x,y). If we can keep our x values sorted then, using the approach described above, we can simply count the number of inversions only considering x values. Looking at your definition of half inversions we will surely be over counting the number we need if we only count xi > xj. We are missing the additional constraint of yi >= y' > yj which must be filtered out of our counting.
So, if we look back to our traditional algorithm when our i pointer is pointing to a value greater than the value at j we also need to make sure that our y value at j is less than y'. If this not true then none of the x's from i to mid will match our definition of a half inversion and so we cannot count them. Now let's assume our j's y is smaller than y', if we simply counted all the pairs from i to mid then we would still be over counting the pairs which have yi < y'.
One way to fix this is to keep track of the of y values in the left half from i to mid which are >= y' and add that value to our count. We can keep track of how many y >= y' we see in the merge step up to any i, and subtract that from the total number of y's which are >= y' in the left half. To keep track of that total number we can return that value from our recursive function (total = left + right) and only use the number which came from the left half when merging. We also need to modify our base case which is straightforward.
def count_half_inversions(l, y):
return count_rec(l, 0, len(l), l.copy(), y)[0]
def count_rec(l, begin, end, copy, y):
if end-begin <= 1:
# we have only 1 pair
return (0, 1 if l[begin][1] >= y else 0)
mid = begin + ((end-begin) // 2)
left = count_rec(copy, begin, mid, l, y)
right = count_rec(copy, mid, end, l, y)
between = merge_count(l, begin, mid, end, copy, left[1], y)
# return (inversion count, number of pairs, (i,j), with j >= y)
return (left[0] + right[0] + between, left[1] + right[1])
def merge_count(l, begin, mid, end, copy, left_y_count, y):
result = 0
i,j = begin, mid
k = begin
while i < mid and j < end:
if copy[i][0] > copy[j][0]:
if y > copy[j][1]:
result += left_y_count
smaller = copy[j]
j += 1
else:
if copy[i][1] >= y:
left_y_count -= 1
smaller = copy[i]
i += 1
l[k] = smaller
k += 1
while i < mid:
l[k] = copy[i]
i += 1
k += 1
while j < end:
l[k] = copy[j]
j += 1
k += 1
return result
test_case = [(1,1), (6,4), (6,3), (1,2), (1,2), (3,3), (6,2), (0,1)]
fixed_y = 2
print(count_half_inversions(test_case, fixed_y))

Algorithm: minimize the costs of rearranging piles

I have come across this algorithmic problem that I was not able to solve: https://prologin.org/train/2017/semifinal/collection_de_feuilles (in French).
N, K, and M[] will be given as the input. N refers to the number of piles of items, and in M[i] is the number of items in the i-th pile. You can only merge the i-th pile of M[i] items into the j-th pile of M[j] items if j > i, and the cost of this merge is defined to be M[i] * (j - i). The output is the minimum cost of merging the initial N piles into K piles.
My idea was to use a function min_rearrange(x, num_piles) which calculates the minimum cost to rearrange piles from M[x] to M[N - 1] into the specified number of piles. When num_piles is equal to 1, this function returns the sum of the the costs to move M[j] into M[N -
1], x ≤ j < N. Otherwise, since there must exist an i with x ≤ i ≤ N - num_piles that we move all the piles from M[x] to M[i - 1] into M[i], we calculate that sum and then recursively call min_rearrange(i + 1, num_piles - 1) to find the minimum cost.
I have also tried to memoize the solutions:
# https://prologin.org/train/2017/semifinal/collection_de_feuilles
n, k = map(int, input().split())
piles = list(map(int, input().split()))
memory = {}
def min_rearrange(x, num_piles):
"""Min cost to rearrange piles[x:] into num_piles"""
if (x, num_piles) in memory:
return memory[x, num_piles]
if num_piles == 1:
memory[x, num_piles] = sum([(n - 1 - i) * piles[i] for i in range(x, n)])
return memory[x, num_piles]
min_cost = float('inf')
for i in range(x, n - num_piles + 1):
cost = sum([(i - j) * piles[j] for j in range(x, i)])
min_cost = min(min_cost, cost + min_rearrange(i + 1, num_piles - 1))
memory[x, num_piles] = min_cost
return min_cost
print(min_rearrange(0, k))
But it takes too much time for large input sizes. I'd like to know how the problem can be solved more efficiently.

Finding median in merged array of two sorted arrays

Assume we have 2 sorted arrays of integers with sizes of n and m. What is the best way to find median of all m + n numbers?
It's easy to do this with log(n) * log(m) complexity. But i want to solve this problem in log(n) + log(m) time. So is there any suggestion to solve this problem?
Explanation
The key point of this problem is to ignore half part of A and B each step recursively by comparing the median of remaining A and B:
if (aMid < bMid) Keep [aMid +1 ... n] and [bLeft ... m]
else Keep [bMid + 1 ... m] and [aLeft ... n]
// where n and m are the length of array A and B
As the following: time complexity is O(log(m + n))
public double findMedianSortedArrays(int[] A, int[] B) {
int m = A.length, n = B.length;
int l = (m + n + 1) / 2;
int r = (m + n + 2) / 2;
return (getkth(A, 0, B, 0, l) + getkth(A, 0, B, 0, r)) / 2.0;
}
public double getkth(int[] A, int aStart, int[] B, int bStart, int k) {
if (aStart > A.length - 1) return B[bStart + k - 1];
if (bStart > B.length - 1) return A[aStart + k - 1];
if (k == 1) return Math.min(A[aStart], B[bStart]);
int aMid = Integer.MAX_VALUE, bMid = Integer.MAX_VALUE;
if (aStart + k/2 - 1 < A.length) aMid = A[aStart + k/2 - 1];
if (bStart + k/2 - 1 < B.length) bMid = B[bStart + k/2 - 1];
if (aMid < bMid)
return getkth(A, aStart + k / 2, B, bStart, k - k / 2); // Check: aRight + bLeft
else
return getkth(A, aStart, B, bStart + k / 2, k - k / 2); // Check: bRight + aLeft
}
Hope it helps! Let me know if you need more explanation on any part.
Here's a very good solution I found in Java on Stack Overflow. It's a method of finding the K and K+1 smallest items in the two arrays where K is the center of the merged array.
If you have a function for finding the Kth item of two arrays then finding the median of the two is easy;
Calculate the weighted average of the Kth and Kth+1 items of X and Y
But then you'll need a way to find the Kth item of two lists; (remember we're one indexing now)
If X contains zero items then the Kth smallest item of X and Y is the Kth smallest item of Y
Otherwise if K == 2 then the second smallest item of X and Y is the smallest of the smallest items of X and Y (min(X[0], Y[0]))
Otherwise;
i. Let A be min(length(X), K / 2)
ii. Let B be min(length(Y), K / 2)
iii. If the X[A] > Y[B] then recurse from step 1. with X, Y' with all elements of Y from B to the end of Y and K' = K - B, otherwise recurse with X' with all elements of X from A to the end of X, Y and K' = K - A
If I find the time tomorrow I will verify that this algorithm works in Python as stated and provide the example source code, it may have some off-by-one errors as-is.
Take the median element in list A and call it a. Compare a to the center elements in list B. Lets call them b1 and b2 (if B has odd length then exactly where you split b depends on your definition of the median of an even length list, but the procedure is almost identical regardless). if b1&leq;a&leq;b2 then a is the median of the merged array. This can be done in constant time since it requires exactly two comparisons.
If a is greater than b2 then we add the top half of A to the top of B and repeat. B will no longer be sorted, but it doesn't matter. If a is less than b1 then we add the bottom half of A to the bottom of B and repeat. These will iterate log(n) times at most (if the median is found sooner then stop, of course).
It is possible that this will not find the median. If this is the case then the median is in B. If so, perform the same algorithm with A and B reversed. This will require log(m) iterations. In total you will have performed at most 2*(log(n)+log(m)) iterations of a constant time operation, so you have solved the problem in order log(n)+log(m) time.
This is essentially the same answer as was given by iehrlich, but written out more explicitly.
Yes, this can be done. Given two arrays, A and B, in the worst-case scenario you have to first perform a binary search in A, and then, if it fails, binary search in B looking for the median. On each step of a binary search, you check if the current element is actually a median of a merged A+B array. Such check takes constant time.
Let's see why such check is constant. For simplicity, let's assume that |A| + |B| is an odd number, and that all numbers in both arrays are different. You can remove these restrictions later by applying the usual median definition approach (i.e., how to calculate the median of an array containing duplicates, or of an array with even length). Anyway, given that, we know for sure, that in the merged array there will be (|A| + |B| - 1) / 2 elements to the right and to the left of an actual median. In the process of a binary search in A, we know the index of current element x in array A (let it be i). Now, if x satisfies the condition B[j] < x < B[j+1], where i + j == (|A| + |B| - 1) / 2, then x is your median.
The overall complexity is O(log(max(|A|, |B|)) time and O(1) memory.

From a loop index k, obtain pairs i,j with i < j?

I need to traverse all pairs i,j with 0 <= i < n, 0 <= j < n and i < j for some positive integer n.
Problem is that I can only loop through another variable, say k. I can control the bounds of k. So the problem is to determine two arithmetic methods, f(k) and g(k) such that i=f(k) and j=g(k) traverse all admissible pairs as k traverses its consecutive values.
How can I do this in a simple way?
I think I got it (in Python):
def get_ij(n, k):
j = k // (n - 1) # // is integer (truncating) division
i = k - j * (n - 1)
if i >= j:
i = (n - 2) - i
j = (n - 1) - j
return i, j
for n in range(2, 6):
print n, sorted(get_ij(n, k) for k in range(n * (n - 1) / 2))
It basically folds the matrix so that it's (almost) rectangular. By "almost" I mean that there could be some unused entries on the far right of the bottom row.
The following pictures illustrate how the folding works for n=4:
and n=5:
Now, iterating over the rectangle is easy, as is mapping from folded coordinates back to coordinates in the original triangular matrix.
Pros: uses simple integer math.
Cons: returns the tuples in a weird order.
I think I found another way, that gives the pairs in lexicographic order. Note that here i > j instead of i < j.
Basically the algorithm consists of the two expressions:
i = floor((1 + sqrt(1 + 8*k))/2)
j = k - i*(i - 1)/2
that give i,j as functions of k. Here k is a zero-based index.
Pros: Gives the pairs in lexicographic order.
Cons: Relies on floating-point arithmetic.
Rationale:
We want to achieve the mapping in the following table:
k -> (i,j)
0 -> (1,0)
1 -> (2,0)
2 -> (2,1)
3 -> (3,0)
4 -> (3,1)
5 -> (3,2)
....
We start by considering the inverse mapping (i,j) -> k. It isn't hard to realize that:
k = i*(i-1)/2 + j
Since j < i, it follows that the value of k corresponding to all pairs (i,j) with fixed i satisfies:
i*(i-1)/2 <= k < i*(i+1)/2
Therefore, given k, i=f(k) returns the largest integer i such that i*(i-1)/2 <= k. After some algebra:
i = f(k) = floor((1 + sqrt(1 + 8*k))/2)
After we have found the value i, j is trivially given by
j = k - i*(i-1)/2
I'm not sure to understand exactly the question, but to sum up, if 0 <= i < n, 0 <= j < n , then you want to traverse 0 <= k < n*n
for (int k = 0; k < n*n; k++) {
int i = k / n;
int j = k % n;
// ...
}
[edit] I just saw that i < j ; so, this solution is not optimal since there's less that n*n necessary iterations ...
If we think of our solution in terms of a number triangle, where k is the sequence
1
2 3
4 5 6
7 8 9 10
11 12 13 14 15
...
Then j would be our (non zero-based) row number, that is, the greatest integer such that
j * (j - 1) / 2 < k
Solving for j:
j = ceiling ((sqrt (1 + 8 * k) - 1) / 2)
And i would be k's (zero-based) position in the row
i = k - j * (j - 1) / 2 - 1
The bounds for k are:
1 <= k <= n * (n - 1) / 2
Is it important that you actually have two arithmetic functions f(k) and g(k) doing this? Because you could first create a list such as
L = []
for i in range(n-1):
for j in range(n):
if j>i:
L.append((i,j))
This will give you all the pairs you asked for. Your variable k can now just run along the index of the list. For example, if we take n=5,
for x in L:
print(x)
gives us
(0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
Suppose your have 2<=k<5 for example, then
for k in range(2, 5)
print L[k]
yields
(0,3), (0,4), (1,2)

Count number of swaps to sort first k-smallest element using a bubble sort like algorithm

Given an array a and integer k. Someone uses following algorithm to get first k smallest elements:
cnt = 0
for i in [1, k]:
for j in [i + 1, n]:
if a[i] > a[j]:
swap(a[i], a[j])
cnt = cnt + 1
The problem is: How to calculate value of cnt (when we get final k-sorted array), i.e. the number of swaps, in O(n log n) or better ?
Or simply put: calculate the number of swaps needed to get first k-smallest number sorted using the above algorithm, in less than O(n log n).
I am thinking about a binary search tree, but I get confused (How array will change when increase i ? How to calculate number of swap for a fixed i ?...).
This is a very good question: it involves Inverse Pairs, Stack and some proof techniques.
Note 1: All index used below are 1-based, instead of traditional 0-based.
Note 2: If you want to see the algorithm directly, please start reading from the bottom.
First we define Inverse Pairs as:
For a[i] and a[j], in which i < j holds, if we have a[i] > a[j], then a[i] and a[j] are called an Inverse Pair.
For example, In the following array:
3 2 1 5 4
a[1] and a[2] is a pair of Inverse Pair, a[2] and a[3] is another pair.
Before we start the analysis, let's define a common language: in the reset of the post, "inverse pair starting from i" means the total number of inverse pairs involving a[i].
For example, for a = {3, 1, 2}, inverse pair starting from 1 is 2, and inverse pair starting from 2 is 0.
Now let's look at some facts:
If we have i < j < k, and a[i] > a[k], a[j] > a[k], swap a[i] and a[j] (if they are an inverse pair) won't affect the total number of inverse pair starting from j;
Total inverse pairs starting from i may change after a swap (e.g. suppose we have a = {5, 3, 4}, before a[1] is swapped with a[2], total number of inverse pair starting from 1 is 2, but after swap, array becomes a = {3, 5, 4}, and the number of inverse pair starting from 1 becomes 1);
Given an array A and 2 numbers, a and b, as the head element of A, if we can form more inverse pair with a than b, we have a > b;
Let's denote the total number of inverse pair starting from i as ip[i], then we have: if k is the min number satisfies ip[i] > ip[i + k], then a[i] > a[i + k] while a[i] < a[i + 1 .. i + k - 1] must be true. In words, if ip[i + k] is the first number smaller than ip[i], a[i + k] is also the first number smaller than a[i];
Proof of point 1:
By definition of inverse pair, for all a[k], k > j that forms inverse pair with a[j], a[k] < a[j] must hold. Since a[i] and a[j] are a pair of inverse and provided that i < j, we have a[i] > a[j]. Therefore, we have a[i] > a[j] > a[k], which indicates the inverse-pair-relationships are not broken.
Proof of point 3:
Leave as empty since quite obvious.
Proof of point 4:
First, it's easy to see that when i < j, a[i] > a[j], we have ip[i] >= ip[j] + 1 > ip[j]. Then, it's inverse-contradict statement is also true, i.e. when i < j, ip[i] <= ip[j], we have a[i] <= a[j].
Now back to the point. Since k is the min number to satisfy ip[i] > ip[i + k], then we have ip[i] <= ip[i + 1 .. i + k - 1], which indicates a[i] <= a[i + 1.. i + k - 1] by the lemma we just proved, which also indicates there's no inverse pairs in the region [i + 1, i + k - 1]. Therefore, ip[i] is the same as the number of inverse pairs starting from i + k, but involving a[i]. Given ip[i + k] < ip[i], we know a[i + k] has less inverse pair than a[i] in the region of [i + k + 1, n], which indicates a[i + k] < a[i] (by Point 3).
You can write down some sequences and try out the 4 facts mentioned above and convince yourself or disprove them :P
Now it's about the algorithm.
A naive implementation will take O(nk) to compute the result, and the worst case will be O(n^2) when k = n.
But how about we make use of the facts above:
First we compute ip[i] using Fenwick Tree (see Note 1 below), which takes O(n log n) to construct and O(n log n) to get all ip[i] calculated.
Next, we need to make use of facts. Since swap of 2 numbers only affect current position's inverse pair number but not values after (point 1 and 2), we don't need to worry about the value change. Also, since the nearest smaller number to the right shares the same index in ip and a, we only need to find the first ip[j] that is smaller than ip[i] in [i + 1, n]. If we denote the number of swaps to get first i element sorted as f[i], we have f[i] = f[j] + 1.
But how to find this "first smaller number" fast? Use stack! Here is a post which asks a highly similar problem: Given an array A,compute B s.t B[i] stores the nearest element to the left of A[i] which is smaller than A[i]
In short, we are able to do this in O(n).
But wait, the post says "to the left" but in our case it's "to the right". The solution is simple: we do backward in our case, then everything the same :D
Therefore, in summary, the total time complexity of the algorithm is O(n log n) + O(n) = O(n log n).
Finally, let's talk with an example (a simplified example of #make_lover's example in the comment):
a = {2, 5, 3, 4, 1, 6}, k = 2
First, let's get the inverse pairs:
ip = {1, 3, 1, 1, 0, 0}
To calculate f[i], we do backward (since we need to use the stack technique):
f[6] = 0, since it's the last one
f[5] = 0, since we could not find any number that is smaller than 0
f[4] = f[5] + 1 = 1, since ip[5] is the first smaller number to the right
f[3] = f[5] + 1 = 1, since ip[5] is the first smaller number to the right
f[2] = f[3] + 1 = 2, since ip[3] is the first smaller number to the right
f[1] = f[5] + 1 = 1, since ip[5] is the first smaller number to the right
Therefore, ans = f[1] + f[2] = 3
Note 1: Using Fenwick Tree (Binary Index Tree) to get inverse pair can be done in O(N log N), here is a post on this topic, please have a look :)
Update
Aug/20/2014: There was a critical error in my previous post (thanks to #make_lover), here is the latest update.

Resources