Count number of swaps to sort first k-smallest element using a bubble sort like algorithm - algorithm

Given an array a and integer k. Someone uses following algorithm to get first k smallest elements:
cnt = 0
for i in [1, k]:
for j in [i + 1, n]:
if a[i] > a[j]:
swap(a[i], a[j])
cnt = cnt + 1
The problem is: How to calculate value of cnt (when we get final k-sorted array), i.e. the number of swaps, in O(n log n) or better ?
Or simply put: calculate the number of swaps needed to get first k-smallest number sorted using the above algorithm, in less than O(n log n).
I am thinking about a binary search tree, but I get confused (How array will change when increase i ? How to calculate number of swap for a fixed i ?...).

This is a very good question: it involves Inverse Pairs, Stack and some proof techniques.
Note 1: All index used below are 1-based, instead of traditional 0-based.
Note 2: If you want to see the algorithm directly, please start reading from the bottom.
First we define Inverse Pairs as:
For a[i] and a[j], in which i < j holds, if we have a[i] > a[j], then a[i] and a[j] are called an Inverse Pair.
For example, In the following array:
3 2 1 5 4
a[1] and a[2] is a pair of Inverse Pair, a[2] and a[3] is another pair.
Before we start the analysis, let's define a common language: in the reset of the post, "inverse pair starting from i" means the total number of inverse pairs involving a[i].
For example, for a = {3, 1, 2}, inverse pair starting from 1 is 2, and inverse pair starting from 2 is 0.
Now let's look at some facts:
If we have i < j < k, and a[i] > a[k], a[j] > a[k], swap a[i] and a[j] (if they are an inverse pair) won't affect the total number of inverse pair starting from j;
Total inverse pairs starting from i may change after a swap (e.g. suppose we have a = {5, 3, 4}, before a[1] is swapped with a[2], total number of inverse pair starting from 1 is 2, but after swap, array becomes a = {3, 5, 4}, and the number of inverse pair starting from 1 becomes 1);
Given an array A and 2 numbers, a and b, as the head element of A, if we can form more inverse pair with a than b, we have a > b;
Let's denote the total number of inverse pair starting from i as ip[i], then we have: if k is the min number satisfies ip[i] > ip[i + k], then a[i] > a[i + k] while a[i] < a[i + 1 .. i + k - 1] must be true. In words, if ip[i + k] is the first number smaller than ip[i], a[i + k] is also the first number smaller than a[i];
Proof of point 1:
By definition of inverse pair, for all a[k], k > j that forms inverse pair with a[j], a[k] < a[j] must hold. Since a[i] and a[j] are a pair of inverse and provided that i < j, we have a[i] > a[j]. Therefore, we have a[i] > a[j] > a[k], which indicates the inverse-pair-relationships are not broken.
Proof of point 3:
Leave as empty since quite obvious.
Proof of point 4:
First, it's easy to see that when i < j, a[i] > a[j], we have ip[i] >= ip[j] + 1 > ip[j]. Then, it's inverse-contradict statement is also true, i.e. when i < j, ip[i] <= ip[j], we have a[i] <= a[j].
Now back to the point. Since k is the min number to satisfy ip[i] > ip[i + k], then we have ip[i] <= ip[i + 1 .. i + k - 1], which indicates a[i] <= a[i + 1.. i + k - 1] by the lemma we just proved, which also indicates there's no inverse pairs in the region [i + 1, i + k - 1]. Therefore, ip[i] is the same as the number of inverse pairs starting from i + k, but involving a[i]. Given ip[i + k] < ip[i], we know a[i + k] has less inverse pair than a[i] in the region of [i + k + 1, n], which indicates a[i + k] < a[i] (by Point 3).
You can write down some sequences and try out the 4 facts mentioned above and convince yourself or disprove them :P
Now it's about the algorithm.
A naive implementation will take O(nk) to compute the result, and the worst case will be O(n^2) when k = n.
But how about we make use of the facts above:
First we compute ip[i] using Fenwick Tree (see Note 1 below), which takes O(n log n) to construct and O(n log n) to get all ip[i] calculated.
Next, we need to make use of facts. Since swap of 2 numbers only affect current position's inverse pair number but not values after (point 1 and 2), we don't need to worry about the value change. Also, since the nearest smaller number to the right shares the same index in ip and a, we only need to find the first ip[j] that is smaller than ip[i] in [i + 1, n]. If we denote the number of swaps to get first i element sorted as f[i], we have f[i] = f[j] + 1.
But how to find this "first smaller number" fast? Use stack! Here is a post which asks a highly similar problem: Given an array A,compute B s.t B[i] stores the nearest element to the left of A[i] which is smaller than A[i]
In short, we are able to do this in O(n).
But wait, the post says "to the left" but in our case it's "to the right". The solution is simple: we do backward in our case, then everything the same :D
Therefore, in summary, the total time complexity of the algorithm is O(n log n) + O(n) = O(n log n).
Finally, let's talk with an example (a simplified example of #make_lover's example in the comment):
a = {2, 5, 3, 4, 1, 6}, k = 2
First, let's get the inverse pairs:
ip = {1, 3, 1, 1, 0, 0}
To calculate f[i], we do backward (since we need to use the stack technique):
f[6] = 0, since it's the last one
f[5] = 0, since we could not find any number that is smaller than 0
f[4] = f[5] + 1 = 1, since ip[5] is the first smaller number to the right
f[3] = f[5] + 1 = 1, since ip[5] is the first smaller number to the right
f[2] = f[3] + 1 = 2, since ip[3] is the first smaller number to the right
f[1] = f[5] + 1 = 1, since ip[5] is the first smaller number to the right
Therefore, ans = f[1] + f[2] = 3
Note 1: Using Fenwick Tree (Binary Index Tree) to get inverse pair can be done in O(N log N), here is a post on this topic, please have a look :)
Update
Aug/20/2014: There was a critical error in my previous post (thanks to #make_lover), here is the latest update.

Related

Finding best algorithm for sum of a section of an array's values

Given an array of n integers in the locations A[1], A[2], …, A[n], describe an O(n^2) time algorithm to
compute the sum A[i] + A[i+1] + … + A[j] for all i, j, 1 ≤ i < j ≤ n.
I've tried multiple ways of solving this problem but none have in O(n^2) time.
So for an array containing {1,2,3,4}
You would output:
1+2 = 3
1+2+3 = 6
1+2+3+4 = 10
2+3 = 5
2+3+4 = 9
3+4 = 7
The answer does not need to be in a specific language, pseudocode is preferred.
A good preperation is everything.
You could create an array of integrals:
I[0..n] = (0, I[0] + A[1], I[1] + A[2], ..., I[n-1]+A[n]);
This will cost you O(n) * O(1) (looping over all elements and doing one addition);
Now you can calculate each Sum(A, i, j) with just a single subtraction: I[j] - I[i-1];
so this has O(1)
Looping over all combinations of i and j with 1 <= (i,j) <= n has O(n^2).
So you end up with O(n) * O(1) + O(n^2) * O(1) = O(n^2) .
Edit:
Your array A starts at 1 - adapted to this - this also solves the little quirk with i-1
So the integral array I starts with index 0 and is 1 element larger than A
Edit:
First you'll maybe have thought about the most naive idea:
Naive idea
Create a function that for given values of i and of j will return the sum A[i] + ... + A[j].
function sumRange(A, i, j):
sum = 0
for k = i to j
sum = sum + A[k]
return sum
Then generate all pairs of i and j (with i < j) and call the above function for each pair:
for i = 1 to n
for j = i+1 to n
output sumRange(A, i, j)
This is not O(n²), because already the two loops on i and j represent O(n²) iterations, and then the function will perform yet another loop, making it O(n³).
Better idea
The above can be improved. Look at the repetition it performs. The sum that was calculated for given values of i and j could be reused to calculate the sum for when j has increased with 1, without starting from scratch and summing the values between i and (now) j-1 again, only to add that one more value to it.
We should just remember what the previous sum was, and add A[j] to it.
So without a separate function:
for i = 1 to n
sum = A[i]
for j = i+1 to n
sum = sum + A[j]
output sum
Note how the sum is not reset to 0 once it is output. It is preserved, so that when j is incremented, only one value needs to be added to it.
Now it is O(n²). Note also how it does not require an extra array for storage. It only needs the memory for a few variables (i, j, sum), so its space complexity is O(1).
As the number of sums you need to output is O(n²), there is no way to improve this time complexity any further.
NB: I assume here that single array values do not constitute a "sum". As you stated in your question, i < j, and also in your example you only showed sums of at least two array values. The above can be easily adapted to also include single value "sums" if ever that were needed.

Counting Inversions In An Array - Special Case

Inversion Count for an array indicates – how far (or close) the array is from being sorted. If array is already sorted then inversion count is 0. If array is sorted in reverse order that inversion count is the maximum.
Formally speaking, two elements a[i] and a[j] form an inversion if a[i] > a[j] and i < j Example:
The sequence 2, 4, 1, 3, 5 has three inversions (2, 1), (4, 1), (4, 3).
Now, there are various algorithms to solve this in O(n log n).
There is a special case where the array only has 3 types of elements - 1, 2 and 3. Now, is it possible to count the inversions in O(n) ?
Eg 1,1,3,2,3,1,3
Yes it is. Just take 3 integers a,b,c where a is number of 1's encountered till now, b is number of 2's encountered till now and c is number of 3's encountered till now. Given this follow the algorithm below ( I assume numbers are given in array arr and the size is n, with 1 based indexing, also following is just a pseudocode )
no_of_inv = 0
a = 0
b = 0
c = 0
for i from 1 to n:
if arr[i] == 1:
no_of_inv = no_of_inv + b + c
a++
else if arr[i] == 2:
no_of_inv = no_of_inv + c
b++
else:
c++
(This algorithm is extremely similar to Sasha's. I just wanted to provide an explanation as well.)
Every inversion (i, j) satisfies 0 ≤ i < j < n. Let's define S[j] to be the number of inversions of the form (i, j); that is, S[j] is the number of times A[i] > A[j] for 0 ≤ i < j. Then the total number of inversions is T = S[0] + S[1] + … + S[n - 1].
Let C[x][j] be the number of times A[i] > x for 0 ≤ i < j. Then S[j] = C[A[j]][j] for all j. If we can compute the 3n values C[x][j] in linear time, then we can compute S in linear time.
Here is some Python code:
>>> import numpy as np
>>> A = np.array([1, 1, 3, 2, 3, 1, 3])
>>> C = {x: np.cumsum(A > x) for x in np.unique(A)}
>>> T = sum(C[A[j]][j] for j in range(len(A)))
>>> print T
4
This could be made more efficient—although not in asmpytotic terms—by not storing all C values at once. The algorithm really only needs a single pass through the array. I have chosen to present it this way because it is most concise.

Given 2 arrays of non-negative numbers, find the minimum sum of products

Given two arrays A and B, each containing n non-negative numbers, remove a>0 elements from the end of A and b>0 elements from the end of B. Evaluate the cost of such an operation as X*Y where X is the sum of the a elements removed from A and Y the sum of the b elements removed from B. Keep doing this until both arrays are empty. The goal is to minimize the total cost.
Using dynamic programming and the fact that an optimal strategy will always take exactly one element from either A or B I can find an O(n^3) solution. Now I'm curious to know if there is an even faster solution to this problem?
EDIT: Stealing an example from #recursive in the comments:
A = [1,9,1] and B = [1, 9, 1]. Possible to do with a cost of 20. (1) *
(1 + 9) + (9 + 1) * (1)
Here's O(n^2). Let CostA(i, j) be the min cost of eliminating A[1..i], B[1..j] in such a way that the first removal takes only one element from B. Let CostB(i, j) be the min cost of eliminating A[1..i], B[1..j] in such a way that the first removal takes only one element from A. We have mutually recursive recurrences
CostA(i, j) = A[i] * B[j] + min(CostA(i - 1, j),
CostA(i - 1, j - 1),
CostB(i - 1, j - 1))
CostB(i, j) = A[i] * B[j] + min(CostB(i, j - 1),
CostA(i - 1, j - 1),
CostB(i - 1, j - 1))
with base cases
CostA(0, 0) = 0
CostA(>0, 0) = infinity
CostA(0, >0) = infinity
CostB(0, 0) = 0
CostB(>0, 0) = infinity
CostB(0, >0) = infinity.
The answer is min(CostA(n, n), CostB(n, n)).

count no. of pairs such that absolute difference is less than K

Given an array A of size N, how do I count the number of pairs(A[i], A[j]) such that the absolute difference between them is less than or equal to K where K is any positive natural number? (i, j<=N and i!=j)
My approach:
Sort the array.
Create another array that stores the absolute difference between two consecutive numbers.
Am I heading in the right direction? If yes, then how do I proceed further?
Here is a O(nlogn) algorithm :-
1. sort input
2. traverse the sorted array in ascending order.
3. for A[i] find largest index A[j]<=A[i]+k using binary search.
4. count = count+j-i
5. do 3 to 4 all i's
Time complexity :-
Sorting : O(n)
Binary Search : O(logn)
Overall : O(nlogn)
This is O(n^2):
Sort the array
For each item_i in array,
For each item_j in array such that j > i
If item_j - item_i <= k, print (item_j, item_i)
Else proceed with the next item_i
Your approach is partially correct. You first sort the array. Then keep two pointers i and j.
1. Initialize i = 0, j = 1.
2. Check if A[j] - A[i] <= K.
- If yes, then increment j,
- else
- **increase the count of pairs by C(j-i,2)**.
- increment i.
- if i == j, then increment j.
3. Do this till pointer j goes past the end of the array. Then just add C(j-1,2) to the count and stop.
By i and j, you are basically maintaining a window within which the difference between elements is <= K.
EDIT: This is the basic idea, you will have to check for boundary conditions. Also you will have to keep track of the past interval that was added to the count. You will need to subtract the overlap with the current interval to avoid double counting.
Complexity: O(NlogN), for the sort operation, linear for the array traversal
Once your array is sorted, you can compute the sum in O(N) time.
Here's some code. The O(N) algorithm is pair_sums, and pair_sums_slow is the obviously correct, but O(N^2) algorithm. I run through some test cases at the end to make sure that the two algorithms returns the same results.
def pair_sums(A, k):
A.sort()
counts = 0
j = 0
for i in xrange(len(A)):
while j < len(A) and A[j] - A[i] <= k:
j+=1
counts += j - i - 1
return counts
def pair_sums_slow(A, k):
counts = 0
for i in xrange(len(A)):
for j in xrange(i+1, len(A)):
if A[j] - A[i] <= k:
counts+=1
return counts
cases = [
([0, 1, 2, 3, 4, 5], 10),
([0, 0, 0, 0, 0], 1),
([0, 1, 2, 4, 8, 16], 9),
([0, -1, -2, 1, 2], 2)
]
for A, k in cases:
want = pair_sums_slow(A, k)
got = pair_sums(A, k)
if want != got:
print A, k, want, got
The idea behind pair_sums is that for each i, we find the smallest j such that A[j] - A[i] > K (or j=N). Then j-i-1 is the number of pairs with i as the first value.
Because the array is sorted, j only ever increases as i increases, so the overall complexity is linear since although there's nested loops the inner operation j+=1 can occur at most N times.

Find largest continuous sum such that the minimum of it and it's complement is largest

I'm given a sequence of numbers a_1,a_2,...,a_n. It's sum is S=a_1+a_2+...+a_n and I need to find a subsequence a_i,...,a_j such that min(S-(a_i+...+a_j),a_i+...+a_j) is the largest possible (both sums must be non-empty).
Example:
1,2,3,4,5 the sequence is 3,4, because then min(S-(a_i+...+a_j),a_i+...+a_j)=min(8,7)=7 (and it's the largest possible which can be checked for other subsequences).
I tried to do this the hard way.
I load all values into the array tab[n].
I do this n-1 times tab[i]+=tab[i-j]. So that tab[j] is the sum from the beginning till j.
I check all possible sums a_i+...+a_j=tab[j]-tab[i-1] and substract it from the sum, take the minimum and see if it's larger than before.
It takes O(n^2). This makes me very sad and miserable. Is there a better way?
Seems like this can be done in O(n) time.
Compute the sum S. The ideal subsequence sum is the longest one which gets closest to S/2.
Start with i=j=0 and increase j until sum(a_i..a_j) and sum(a_i..a_{j+1}) are as close as possible to S/2. Note which ever is closer and save the values of i_best,j_best,sum_best.
Increment i and then increase j again until sum(a_i..a_j) and sum(a_i..a_{j+1}) are as close as possible to S/2. Note which ever is closer and replace the values of i_best,j_best,sum_best if they are better. Repeat this step until done.
Note that both i and j are never decremented, so they are changed a total of at most O(n) times. Since all other operations take only constant time, this results in an O(n) runtime for the entire algorithm.
Let's first do some clarifications.
A subsequence of a sequence is actually a subset of the indices of the sequence. Haivng said that, and specifically int he case where you sequence has distinct elements, your problem will reduce to the famous Partition problem, which is known to be NP-complete. If that is the case, you can manage to solve the problem in O(Sn) where "n" is the number of elements and "S" is the total sum. This is not polynomial time as "S" can be arbitrarily large.
So lets consider the case with a contiguous subsequence. You need to observe array elements twice. First run sums them up into some "S". In the second run you carefully adjust array length. Lets assume you know that a[i] + a[i + 1] + ... + a[j] > S / 2. Then you let i = i + 1 to reduce the sum. Conversely, if it was smaller, you would increase j.
This code runs in O(n).
Python code:
from math import fabs
a = [1, 2, 3, 4, 5]
i = 0
j = 0
S = sum(a)
s = 0
while s + a[j] <= S / 2:
s = s + a[j]
j = j + 1
s = s + a[j]
best_case = (i, j)
best_difference = fabs(S / 2 - s)
while True:
if fabs(S / 2 - s) < best_difference:
best_case = (i, j)
best_difference = fabs(S / 2 - s)
if s > S / 2:
s -= a[i]
i += 1
else:
j += 1
if j == len(a):
break
s += a[j]
print best_case
i = best_case[0]
j = best_case[1]
print "Best subarray = ", a[i:j + 1]
print "Best sum = " , sum(a[i:j + 1])

Resources