Quicksort: Performance issues with Python implementation

I am trying to implement a quicksort algorithm in Python 3 that uses a partition function that checks for equal elements.
My solution seems to produce correct results, but it needs up to 40 seconds on arrays of length 10^5, which is rather extreme.
I am fairly new to Python and cannot detect what causes the code to run this slowly.
I am using the following code:
import sys
import random

def partition3(a, l, r):
    # bringing all elements less than or equal to x to the front
    x = a[l]
    j = l
    for i in range(l + 1, r + 1):
        if a[i] <= x:
            j += 1
            a[i], a[j] = a[j], a[i]
            #print('exchange happened:', a)
    a[l], a[j] = a[j], a[l]
    # shifting all elements with value x to the middle
    cnt = j - 1
    xcnt = 0
    i = l
    while i < cnt:
        if a[i] == x:
            xcnt += 1
            while a[cnt] == x and cnt > i:
                cnt -= 1
            a[cnt], a[i] = a[i], a[cnt]
            cnt -= 1
        i += 1
    return j - xcnt, j

def randomized_quick_sort(a, l, r):
    if l >= r:
        return
    k = random.randint(l, r)
    a[l], a[k] = a[k], a[l]
    m1, m2 = partition3(a, l, r)
    randomized_quick_sort(a, l, m1 - 1)
    randomized_quick_sort(a, m2 + 1, r)
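A driver along these lines reproduces the setup (a minimal, illustrative harness, not part of the original code; the recursion-limit bump guards against deep recursion on unlucky pivot sequences):

import sys
import time
import random

sys.setrecursionlimit(10 ** 6)  # precaution for deep recursion

a = [random.randint(0, 10 ** 9) for _ in range(10 ** 5)]
start = time.time()
randomized_quick_sort(a, 0, len(a) - 1)
print('sorted correctly:', a == sorted(a))
print('elapsed: %.2fs' % (time.time() - start))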
I would be very grateful to receive some advice on this issue.

I found the problem:
In the definition of partition3(a, l, r), instead of:
# shifting all elements with value x to the middle
cnt = j - 1
xcnt = 0
i = l
while i < cnt:
    if a[i] == x:
        xcnt += 1
        while a[cnt] == x and cnt > i:
            cnt -= 1
        a[cnt], a[i] = a[i], a[cnt]
        cnt -= 1
    i += 1
I needed to add a further xcnt += 1 to get a correct count of the elements equal to the pivot, so that the code becomes:
# shifting all elements with value x to the middle
cnt = j - 1
xcnt = 0
i = l
while i < cnt:
    if a[i] == x:
        xcnt += 1
        while a[cnt] == x and cnt > i:
            cnt -= 1
            xcnt += 1
        a[cnt], a[i] = a[i], a[cnt]
        cnt -= 1
    i += 1
This mistake led to underperformance on data sets containing many equal values, because the length of the block of values equal to the pivot was underestimated. As a result, far more recursive calls than necessary were executed, essentially reverting the solution to a standard quicksort that does not account for equal elements at all.
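To see the effect, one can time the sort on duplicate-heavy input (an illustrative sketch using the corrected partition3 above; with the buggy version, the same input degenerates toward the quadratic behaviour described):

import random
import time

a = [random.randint(0, 9) for _ in range(10 ** 5)]  # only ten distinct values
start = time.time()
randomized_quick_sort(a, 0, len(a) - 1)
print('elapsed: %.2fs' % (time.time() - start))  # fast with the 3-way partition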

Related

Number of ways to take k steps on a path of length N

We have a path of length N. At a time we can only take a unit step. How many ways can we take K steps while remaining inside the path? Initially we are at position 0.
Example: N = 5
|---|---|---|---|---|
0   1   2   3   4   5
If K = 3 then we can move like:
0->1->2->1
0->1->0->1
0->1->2->3
Can you please give some directions/links on how to approach this problem?
It's likely to be solvable using combinatorial methods rather than computational methods. But since you're asking on stackoverflow, I assume you want a computational solution.
There's a recurrence relation defining the number of paths ending at i:
P[N, 0, i] = 1 if i==0 otherwise 0
P[N, K, i] = 0 if i<0 or i>N
P[N, K, i] = P[N, K-1, i-1] + P[N, K-1, i+1]
We can iteratively compute the array of P[N, K, i] for i=0..N for a given K from the array P[N, K-1, i] for i=0..N.
Here's some Python code that does this. It uses a small trick of having an extra 0 at the end of the array so that r[-1] and r[N+1] are both zero.
def paths(N, K):
    r = [1] + [0] * (N + 1)
    for _ in range(K):
        r = [r[i - 1] + r[i + 1] for i in range(N + 1)] + [0]
    return sum(r)

print(paths(5, 3))
This runs in O(NK) time.
A different (but related) solution is to let M be the (N+1) by (N+1) matrix consisting of 1's at positions (i+1, i) and (i, i+1) for i = 0..N-1, and 0's elsewhere -- that is, 1's on the subdiagonal and superdiagonal. Then M^K (that is, M raised to the Kth power) contains at position (i, j) the number of paths from i to j in K steps. So sum(M^K[0, i] for i = 0..N) is the total number of paths starting at 0 of length K. This runs in O(N^3 log K) time, so it is better than the iterative method only if K is much larger than N.
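As an illustration, here is a plain-Python sketch of that matrix-power idea (mat_mult, mat_pow and paths_matrix are names invented for this example; the repeated squaring supplies the log K factor):

def mat_mult(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, K):
    # exponentiation by squaring: O(log K) matrix multiplications
    n = len(M)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while K:
        if K & 1:
            R = mat_mult(R, M)
        M = mat_mult(M, M)
        K >>= 1
    return R

def paths_matrix(N, K):
    # adjacency matrix of the path graph on positions 0..N
    M = [[int(abs(i - j) == 1) for j in range(N + 1)] for i in range(N + 1)]
    return sum(mat_pow(M, K)[0])

print(paths_matrix(5, 3))  # 3, matching paths(5, 3)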
Java implementation of the first approach in the accepted answer (assuming dp1 is declared as int[][] dp1 = new int[K + 1][N + 2]):
for (int i = 0; i <= K; i++) {
    for (int j = 1; j <= N; j++) {
        if (i > 0)
            dp1[i][j] = (dp1[i - 1][j - 1] + dp1[i - 1][j + 1]) % 1000000007;
        else
            dp1[i][j] = 1;
    }
}
System.out.println(dp1[K][N - 1]);
Complexity O(KN)
Java DP implementation; it computes answers for all starting positions and all values 1..N and 1..K (assuming dp is declared as int[][][] dp = new int[N + 2][N + 1][K + 1]):
for (int i = 0; i <= K; i++) {
    for (int j = 1; j <= N; j++) {
        for (int k = 1; k <= j; k++) {
            if (i > 0)
                dp[k][j][i] =
                    (dp[k - 1][j][i - 1] + dp[k + 1][j][i - 1]) % 1000000007;
            else
                dp[k][j][i] = 1;
        }
    }
}
System.out.println(dp[1][5][3]);
O(KN^2)

LSD Radix Sort for Integers

I'm having trouble wrapping my head around using radix sort for a group of fixed-length integers. In my attempt below to implement least significant digit radix sort, I have a function called num_at which returns the digit d in the number num. The code I've written has w = 2, where w represents the length of each number (essentially, then, this code is written for two-digit numbers, as my input below shows).
I modeled this off of key-indexed counting for each digit, but I'm getting an output of [0, 12, 32, 32, 44, 0] and frankly I am having a tough time following why. This is my input: [32, 12, 99, 44, 77, 12]. I initialized the count array with 99 slots to cover all the possible values between 0 and 99, in accordance with key-indexed counting. Does anything here immediately jump out as incorrect? Also, is my num_at method the right way to do this, or is there a better way?
def num_at(num, d)
  return num if num / 10 == 0
  a = num
  # return the dth digit of num
  counter = 0
  until counter == d
    a /= 10
    counter += 1
  end
  a % 10
end
def radix_sort(arr)
  w = 2
  # count_arr can have any possible number from 0-99
  aux = Array.new(arr.length) { 0 }
  d = w - 1
  while d >= 0
    count = Array.new(99) { 0 }
    i = 0
    # create freq arr
    while i < arr.length
      count[num_at(arr[i], d) + 1] += 1 # offset by 1
      i += 1
    end
    # compute cumulates
    i = 0
    while i < count.length - 1
      count[i + 1] += count[i]
      i += 1
    end
    z = 0
    # populate aux arr
    while z < arr.length
      aux[num_at(arr[z], d)] = arr[z]
      count[num_at(arr[z], d)] += 1
      z += 1
    end
    # override original arr
    z = 0
    while z < arr.length
      arr[z] = aux[z]
      z += 1
    end
    d -= 1
  end
  arr
end
This is functioning Java code for LSD radix sort, taken from a textbook, that I'm trying to implement in Ruby (R is the radix, 256 here, and temp is a String[] scratch array declared elsewhere):
public static void lsd(String[] a)
{
    int N = a.length;
    int W = a[0].length();
    for (int d = W - 1; d >= 0; d--)
    {
        int[] count = new int[R + 1];
        for (int i = 0; i < N; i++)
            count[a[i].charAt(d) + 1]++;
        for (int k = 1; k < 256; k++)
            count[k] += count[k - 1];
        for (int i = 0; i < N; i++)
            temp[count[a[i].charAt(d)]++] = a[i];
        for (int i = 0; i < N; i++)
            a[i] = temp[i];
    }
}
and the pseudocode for key-indexed counting (which is being repeated for every char):
Task: sort an array a[] of N integers between 0 and R-1
Plan: produce sorted result in array temp[]
1. Count frequencies of each letter using key as index
2. Compute frequency cumulates
3. Access cumulates using key as index to find record positions.
4. Copy back into original array
LSD radix sort: consider characters d from right to left; stably sort using the dth character as the key via key-indexed counting.
The above java code / pseudocode information is pulled from this link: http://www.cs.princeton.edu/courses/archive/spring07/cos226/lectures/11RadixSort.pdf
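For comparison, a working LSD pass can be sketched in Python, mirroring the Java reference. It differs from the Ruby attempt above in two places, which look like the bugs: the destination index in aux must come from the cumulative count (as in temp[count[a[i].charAt(d)]++]), and the digits must be processed least significant first:

def digit_at(num, d):
    # d = 0 is the least significant digit
    return (num // 10 ** d) % 10

def radix_sort(arr, w=2):
    for d in range(w):  # least significant digit first
        count = [0] * 11  # digits 0..9, offset by 1
        for num in arr:
            count[digit_at(num, d) + 1] += 1
        for i in range(10):  # compute cumulates
            count[i + 1] += count[i]
        aux = [0] * len(arr)
        for num in arr:
            aux[count[digit_at(num, d)]] = num  # cumulative count gives position
            count[digit_at(num, d)] += 1
        arr[:] = aux
    return arr

print(radix_sort([32, 12, 99, 44, 77, 12]))  # [12, 12, 32, 44, 77, 99]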

Optimize: Divide an array into continuous subsequences of length no greater than k such that sum of maximum value of each subsequence is minimum

Optimize O(n^2) algorithm to O(n log n).
Problem Statement
Given an array A of n positive integers, divide the array into continuous subsequences, each of length no greater than k, such that the sum of the maximum value of each subsequence is minimized. Here's an example.
If n = 8 and k = 5 and elements of the array are 1 4 1 3 4 7 2 2, best solution is 1 | 4 1 3 4 7 | 2 2. The sum would be max{1} + max{4, 1, 3, 4, 7} + max{2, 2} = 1 + 7 + 2 = 10.
O(n^2) solution
Let dp[i] be the minimum sum, as in the problem statement, for the subarray A[0] ... A[i]. Then dp[0] = A[0] and, for 0 < i < n (with dp[-1] = 0):
// A, n, k - defined
// dp - all initialized to INF
dp[0] = A[0];
for (auto i = 1; i < n; i++) {
    auto max = -INF;
    for (auto j = i; j >= 0 && j >= i - k + 1; j--) {
        if (A[j] > max)
            max = A[j];
        auto sum = max + (j > 0 ? dp[j - 1] : 0);
        if (sum < dp[i])
            dp[i] = sum;
    }
}
// answer: dp[n-1]
O(n log n) ?
The problem author claimed that it was possible to solve this in O(n log n) time, and there are some people who were able to pass the test cases. How can this be optimized?
NOTE: I'm going to slightly change your dynamic programming relation, so that there is no special case for j = 0. Now dp[j] is the answer for the first j terms A[0], ..., A[j-1], and:
dp[i] = min(dp[j] + max(A[j], ..., A[i-1]), i-k <= j < i)
The answer of the problem is now dp[n].
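For concreteness, the reformulated recurrence transcribes directly into a quadratic reference implementation (a Python sketch with invented names, useful for checking a faster version against the example):

def min_sum_slow(A, k):
    # dp[j] = answer for the first j elements A[0..j-1]; dp[0] = 0
    n = len(A)
    INF = float('inf')
    dp = [0] + [INF] * n
    for i in range(1, n + 1):
        mx = -INF
        for j in range(i - 1, max(i - k, 0) - 1, -1):
            mx = max(mx, A[j])  # mx = max(A[j], ..., A[i-1])
            dp[i] = min(dp[i], dp[j] + mx)
    return dp[n]

print(min_sum_slow([1, 4, 1, 3, 4, 7, 2, 2], 5))  # 10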
Notice that if j < i and dp[j] >= dp[i], you won't need dp[j] in the following transitions, because max(A[j], ..., A[l]) >= max(A[i], ..., A[l]) (so it will always be better to cut at i instead of j).
Furthermore, let C[j] = max(A[j+1], ..., A[l]) (where l is our current index in the dynamic programming step, i.e. i in your C++ program).
Then you can keep in memory some set of indices x1 < ... < xm (the "interesting" indices for the transitions of your dynamic programming relation) such that: dp[x1] < ... < dp[xm] (1). Then automatically C[x1] >= ... >= C[xm] (2).
To store {x1, ..., xm}, we need some data structure that supports the following operations:
Pop back (when we move from i to i+1, we must say that i-k is now unreachable) or front (cf. insertion).
Push front x (when we have computed dp[i], we insert it while preserving (1), by deleting the corresponding elements).
Compute min(dp[xj] + C[xj], 1 <= j <= m).
Thus some queue to store x1, ..., xm together with a set to store all dp[xi] + C[xi] will be enough.
How do we both preserve (1) and update C when we insert an element i?
Before computing dp[i], we update C with A[i-1]. For that we find the smallest index xj in the set such that C[xj] <= A[i-1]. Then (1) and (2) imply dp[xj'] + C[xj'] >= dp[xj] + C[xj] for all j' >= j, so we update C[xj] to A[i-1] and delete x(j+1), ..., xm from the set (*).
When we insert dp[i], we just delete all elements such that dp[xj] >= dp[i] by popping from the front.
When we remove i-k, it may happen that some element destroyed in (*) now becomes the best. So if necessary we update C and reinsert the last element.
Complexity : O(n log n) (there could be at most 2n insertions in the set).
This code sums up the main ideas:
// n, k, A, INF are assumed defined as in the O(n^2) version above
template<class T> void relaxmax(T& r, T v) { r = max(r, v); }

vector<int> dp(n + 1);
vector<int> C(n + 1, -INF);
vector<int> q(n + 1);
vector<int> ne(n + 1, -INF);
int qback = 0, qfront = 0;
auto cmp = [&](const int& x, const int& y) {
    int vx = dp[x] + C[x], vy = dp[y] + C[y];
    return vx != vy ? vx < vy : x < y;
};
set<int, decltype(cmp)> s(cmp);
dp[0] = 0;
s.insert(0);
q[qfront++] = 0;
for (int i = 1; i <= n; ++i) {
    C[i] = A[i - 1];
    auto it_last = lower_bound(q.begin() + qback, q.begin() + qfront, i,
                               [=](const int& x, const int& y) { return C[x] > C[y]; });
    for (auto it = it_last; it != q.begin() + qfront; ++it) {
        s.erase(*it);
        C[*it] = A[i - 1];
        ne[*it] = i;
        if (it == it_last) s.insert(*it);
    }
    dp[i] = dp[*s.begin()] + C[*s.begin()];
    while (qback < qfront && dp[q[qfront]] >= dp[i]) {
        s.erase(q[qfront]);
        qfront--;
    }
    q[qfront++] = i;
    C[i] = -INF;
    s.insert(i);
    if (q[qback] == i - k) {
        s.erase(i - k);
        if (qback + 1 != qfront && ne[q[qback]] > q[qback + 1]) {
            s.erase(q[qback + 1]);
            relaxmax(C[q[qback + 1]], C[i - k]);
            s.insert(q[qback + 1]);
        }
        qback++;
    }
}
// answer: dp[n]
This time I stress-tested it against your algorithm: see here.
Please let me know if it's still unclear.

Replace operators of equation, so that the sum is equal to zero

I'm given the equation like this one:
n = 7
1 + 1 - 4 - 4 - 4 - 2 - 2
How can I optimally replace operators so that the sum of the equation is equal to zero, or print -1 if that is impossible? I can think of one algorithm, but it is not optimal: brute-forcing all cases with complexity O(n * 2^n), but n < 300.
Here is the link of the problem: http://codeforces.com/gym/100989/problem/M.
You can solve this with dynamic programming. Keep a map of all possible partial sums (mapping each to the minimum number of changes needed to reach it), and then update it one number at a time.
Here's a concise Python solution:
def signs(nums):
    xs = {nums[0]: 0}
    for num in nums[1:]:
        ys = dict()
        for d, k in xs.items():
            for cost, n in enumerate([num, -num]):
                ys[d + n] = min(ys.get(d + n, 1e100), k + cost)
        xs = ys
    return xs.get(0, -1)

print(signs([1, 1, -4, -4, -4, -2, -2]))
In theory this has exponential complexity in the worst case (since the number of partial sums can double at each step). However, if (as here) the given numbers are always (bounded) small ints, then the number of partial sums grows linearly, and the program works in O(n^2) time.
A somewhat more optimised version uses a sorted array of (subtotal, cost) instead of a dict. One can discard partial sums that are too large or too small (making it impossible to end up at 0 assuming all of the remaining elements are between -300 and +300). This runs approximately twice as fast, and is a more natural implementation to port to a lower-level language than Python for maximum speed.
def merge(xs, num):
    i = j = 0
    ci = 0 if num >= 0 else 1
    cj = 0 if num < 0 else 1
    num = abs(num)
    while j < len(xs):
        if xs[i][0] + num < xs[j][0] - num:
            yield (xs[i][0] + num, xs[i][1] + ci)
            i += 1
        elif xs[i][0] + num > xs[j][0] - num:
            yield (xs[j][0] - num, xs[j][1] + cj)
            j += 1
        else:
            yield (xs[i][0] + num, min(xs[i][1] + ci, xs[j][1] + cj))
            i += 1
            j += 1
    while i < len(xs):
        yield (xs[i][0] + num, xs[i][1] + ci)
        i += 1

def signs2(nums):
    xs = [(nums[0], 0)]
    for i in range(1, len(nums)):
        limit = (len(nums) - 1 - i) * 300
        xs = [x for x in merge(xs, nums[i]) if -limit <= x[0] <= limit]
    for x, c in xs:
        if x == 0:
            return c
    return -1

print(signs2([1, 1, -4, -4, -4, -2, -2]))
Here is the implementation in C++:
// n (the number of terms) and MAXN (a large sentinel) are assumed defined;
// costs are stored shifted by one, so that M[0] == 0 means "unreachable"
unordered_map<int, int> M;
unordered_map<int, int>::iterator it;
int a[] = {1, -1, 4, -4};

int solve() {
    for (int i = 0; i < n; ++i) {
        if (i == 0) M[a[i]] = 1;
        else {
            vector<pair<int, int>> vi;
            for (it = M.begin(); it != M.end(); ++it) {
                int k = it->first, d = it->second;
                vi.push_back({k + a[i], d});
                vi.push_back({k - a[i], d + 1});
            }
            M.clear(); // drop sums of the shorter prefix before reinserting
            for (int j = 0; j < vi.size(); ++j) M[vi[j].first] = MAXN;
            for (int j = 0; j < vi.size(); ++j) {
                M[vi[j].first] = min(M[vi[j].first], vi[j].second);
            }
        }
    }
    return (M[0] == 0 ? -1 : M[0] - 1);
}
What I can think of:
You calculate the original equation. This results in -14.
Now you sort the numbers (taking into account their + or -).
When the equation results in a negative number, you look for the largest numbers to fix the equation. When a number is too large, you skip it.
orig_eq = -14
After sorting:
-4, -4, -4, -2, -2, 1, 1
You loop over this and select each number if the equation orig_eq - current number is closer to zero. This way you can select each number whose sign needs to change.

Correctness of Hoare Partition

Hoare partition as given in Cormen:
Hoare-Partition(A, p, r)
    x = A[p]
    i = p - 1
    j = r + 1
    while true
        repeat
            j = j - 1
        until A[j] <= x
        repeat
            i = i + 1
        until A[i] >= x
        if i < j
            swap( A[i], A[j] )
        else
            return j
When using this in quicksort with {1,3,9,8,2,7,5} as input, after the first partition I get {1,3,5,2,8,7,9}, which is not correct, since all elements smaller than the pivot (here 5) should be on the left side. Can someone point out what I am missing?
The algorithm is correct. You're partitioning the subarray A[p..r] using A[p] as the pivot. So the pivot is 1 and not 5.
Hoare-Partition(A=[1,3,9,8,2,7,5], p=0, r=6)
results in:
x = A[p] = 1
i = -1
j = 7
repeat:
    j = j - 1 = 6; A[j] = 5
    j = j - 1 = 5; A[j] = 7
    j = j - 1 = 4; A[j] = 2
    ...
    j = j - 1 = 0; A[j] = 1
    A[j] == x
repeat:
    i = i + 1 = 0; A[i] = 1
    A[i] == x
if i < j
    i == j, therefore return j
In this case, no elements are swapped.
I don't have Cormen in front of me, but I'm pretty sure that the comparison in the first loop should be until A[j] < x - if it's <=, you'll move the pivot element to the left side and leave it there (followed by smaller elements), which is what happened in your example. Alternatively, the first comparison could be <= and the second could be > - just make sure that the pivot element won't get "caught" by both comparisons.
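To make the contract concrete, here is a small Python sketch of the CLRS-style Hoare partition driving quicksort. Note that the recursion uses (p, j) and (j + 1, r) rather than excluding a pivot position, so the pivot value need not land at its final place after a single partition step:

def hoare_partition(a, p, r):
    # after the call, every element of a[p..j] is <= every element of a[j+1..r]
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:  # repeat j -= 1 until a[j] <= x
            j -= 1
        i += 1
        while a[i] < x:  # repeat i += 1 until a[i] >= x
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j

def quicksort(a, p, r):
    if p < r:
        q = hoare_partition(a, p, r)
        quicksort(a, p, q)  # q is included, not q - 1
        quicksort(a, q + 1, r)

a = [1, 3, 9, 8, 2, 7, 5]
quicksort(a, 0, len(a) - 1)
print(a)  # [1, 2, 3, 5, 7, 8, 9]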
