Merge sorted sequences with split and concat - algorithm

I am struggling with following assignment:
Given sorted sequences of numbers C1, C2, ..., Cn and the operations split and concat, find an optimal (that is, shortest) sequence of those operations that creates one sorted sequence.
I've devised following algorithm:
1. Sort the sequences C1, C2, ..., Cn with respect to their first elements.
2. While the number of sequences is greater than one:
   3. Find the last position in C1 holding a number that is less than the first number in C2.
   4. If found_position == |C1|:
      5. C1 = concat(C1, C2)
   6. Else:
      7. C1a, C1b = split(C1, found_position + 1).
      8. C1 = concat(C1a, C2).
      9. Insert C1b into the set of sequences, maintaining the order (with respect to their first elements).
   10. Remove C2 from the set of sequences.
   11. Go to step 2; in step 3, start searching from found_position.
An example (a vertical bar separates the sequences in the working set):
1 4 5 9 | 2 6 10 11 | 7 12 20
1 | 4 5 9 | 2 6 10 11 | 7 12 20 // split C1 after 1
1 2 6 10 11 | 4 5 9 | 7 12 20 // concat, 4 5 9 reinserted
1 2 4 5 9 | 6 10 11 | 7 12 20 // split after 2, concat, 6 10 11 reinserted
1 2 4 5 6 10 11 | 7 12 20 | 9 // split after 5, concat, 9 reinserted
. . .
1 2 4 5 6 7 9 10 11 12 20
To maintain the ordered working set of sequences, I could use a balanced binary search tree (each insertion in step 9 then takes O(log n)).
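A minimal C++ sketch of the steps above, assuming every input sequence is sorted ascending (empty inputs are skipped). A std::multimap keyed by first element stands in for the balanced tree. Two details are assumptions of this sketch rather than part of the original steps: step 3 uses a binary search (std::upper_bound) instead of a scan resumed from found_position, and elements equal to the first element of C2 stay in C1, so a split never leaves C1a empty. MergeResult and mergeWithSplitConcat are illustrative names. The sketch only counts the operations this procedure uses; it says nothing about whether that count is minimal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Final sequence plus the number of split and concat operations used.
struct MergeResult {
    std::vector<int> merged;
    std::size_t splits = 0;
    std::size_t concats = 0;
};

MergeResult mergeWithSplitConcat(const std::vector<std::vector<int>>& seqs) {
    // Working set ordered by first element (a balanced tree underneath).
    std::multimap<int, std::vector<int>> rest;
    for (const auto& s : seqs)
        if (!s.empty()) rest.emplace(s.front(), s);

    MergeResult r;
    if (rest.empty()) return r;

    // Step 1: C1 is the sequence with the smallest first element.
    r.merged = std::move(rest.begin()->second);
    rest.erase(rest.begin());

    while (!rest.empty()) {                                    // step 2
        std::vector<int> c2 = std::move(rest.begin()->second);
        rest.erase(rest.begin());                              // step 10

        // Step 3: cut = number of leading C1 elements <= first element of C2.
        const auto cut = static_cast<std::size_t>(
            std::upper_bound(r.merged.begin(), r.merged.end(), c2.front()) - r.merged.begin());

        if (cut < r.merged.size()) {                           // steps 6-7: split C1
            std::vector<int> c1b(r.merged.begin() + static_cast<std::ptrdiff_t>(cut),
                                 r.merged.end());
            r.merged.resize(cut);
            ++r.splits;
            const int key = c1b.front();
            rest.emplace(key, std::move(c1b));                 // step 9: reinsert C1b
        }
        r.merged.insert(r.merged.end(), c2.begin(), c2.end()); // steps 5 / 8: concat
        ++r.concats;
    }
    return r;
}
```

On the three sequences of the example above, this sketch ends with 1 2 4 5 6 7 9 10 11 12 20 after 5 splits and 7 concats.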
Is it correct? How to prove its correctness?


Algorithm for generating a cross-sum matrix game

I'm trying to generate a matrix for a cross-sum game: a matrix of random numbers plus a given sum (or product, depending on the chosen operation) for each row and column, such that there is exactly one way to "deactivate" numbers (that is, exclude them from the final sum or product) so that the active numbers in each row and column add up to the correct sum.
To illustrate this, let's say I have a 3x3 matrix, and chosen sums (numbers next to * represent the sum):
*12* *5* *3*
4* 1 2 3 *4
9* 4 5 6 *9
7* 7 8 9 *7
In order to solve this, I would need to deactivate numbers 2, 6, 9 and 8.
One way to generate a matrix with the needed sums is to just generate the numbers and then choose at random which ones to exclude. However, the drawback is that for bigger matrices, like 7x7 or 8x8, there's a good chance that there will be more than one solution.
Another solution I'm thinking of is to rule out numbers that let a row or column reach its sum in more than one way. For example, if a required sum is 5, then 4 2 1 3 would be invalid because both 4 + 1 and 3 + 2 make 5, but this seems rather complicated and inefficient.
If anyone has any pointers, I'd greatly appreciate it. This seems like it's a solved problem, but I have no idea what to look for.
Checking random grids with a solver
For a matrix of limited size, up to 10×10 or so, a simple solver can quickly find the solutions. If there is only one, even the quick'n'dirty solver I wrote in JavaScript usually finds it in less than a second.
I used a simple recursive row-by-row solver which works like this:
For each column, iterate over every possible selection of numbers and check whether excluding them gives the column the correct sum. Then check whether any numbers are part of all or none of the valid selections; these will have to be included in, or avoided by, every selection, respectively.
For each row, iterate over every possible selection of numbers and check whether excluding them gives the row the correct sum, and whether they contain all of the to-be-included and none of the to-be-avoided numbers identified in the previous step. Store all valid selections per row.
After these preparations, the recursive part of the algorithm is then called:
Receive a matrix with numbers, a list of sums per row, and a list of sums per column.
For the top row, check whether any of the numbers cannot be excluded (because the numbers below it add up to less than the sum for that column).
Iterate over all valid selections of numbers in the top row (as identified in the preparation phase). For each selection, check whether removing it gives the row its correct sum. If it does, recurse with a copy of the matrix with the top row removed, a list of sums per row with the first item removed, and a list of sums per column with the non-excluded numbers in the top row subtracted.
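A simplified C++ sketch in the spirit of that row-by-row solver (not a port of the JavaScript one): it precomputes the valid keep-masks for each row, leaves out the column preparation step, and prunes a branch as soon as some column has gone negative or needs more than the cells below can still provide. Bitmasks limit it to fewer than 32 columns; countSolutions, search and the limit parameter are hypothetical names. Stopping at limit = 2 is enough to tell whether a grid has exactly one solution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<int>>;

namespace {

// colLeft[c]: sum column c still needs from rows r..end.
// below[r][c]: total of column c over rows r..end (upper bound for colLeft[c]).
void search(const Grid& g, const std::vector<std::vector<unsigned>>& keepMasks,
            const Grid& below, std::vector<int>& colLeft,
            std::size_t r, int limit, int& found) {
    if (found >= limit) return;
    for (std::size_t c = 0; c < colLeft.size(); ++c)
        if (colLeft[c] < 0 || colLeft[c] > below[r][c]) return;  // column can't be met
    if (r == g.size()) { ++found; return; }                       // every sum met exactly
    for (unsigned keep : keepMasks[r]) {
        for (std::size_t c = 0; c < colLeft.size(); ++c)
            if ((keep >> c) & 1u) colLeft[c] -= g[r][c];
        search(g, keepMasks, below, colLeft, r + 1, limit, found);
        for (std::size_t c = 0; c < colLeft.size(); ++c)
            if ((keep >> c) & 1u) colLeft[c] += g[r][c];
    }
}

}  // namespace

// Counts solutions, stopping once `limit` is reached.
// Bit c of a mask set means "cell c of this row is kept". Assumes < 32 columns.
int countSolutions(const Grid& g, const std::vector<int>& rowSums,
                   std::vector<int> colSums, int limit = 2) {
    const std::size_t rows = g.size();
    const std::size_t cols = rows ? g[0].size() : 0;
    // Preparation: every keep-mask that gives each row its required sum.
    std::vector<std::vector<unsigned>> keepMasks(rows);
    for (std::size_t r = 0; r < rows; ++r)
        for (unsigned m = 0; m < (1u << cols); ++m) {
            int s = 0;
            for (std::size_t c = 0; c < cols; ++c)
                if ((m >> c) & 1u) s += g[r][c];
            if (s == rowSums[r]) keepMasks[r].push_back(m);
        }
    // Suffix column totals used for pruning.
    Grid below(rows + 1, std::vector<int>(cols, 0));
    for (std::size_t r = rows; r-- > 0;)
        for (std::size_t c = 0; c < cols; ++c)
            below[r][c] = below[r + 1][c] + g[r][c];
    int found = 0;
    search(g, keepMasks, below, colSums, 0, limit, found);
    return found;
}
```

For the 3x3 example from the question, countSolutions({{1,2,3},{4,5,6},{7,8,9}}, {4,9,7}, {12,5,3}) returns 1.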
Starting from a pattern like this, where the X's indicate which cell will be excluded:
- - - X - - - X - -
- - - - X - X - - -
X - - - - X - - - -
- X - - - - - - - X
- - X - - - - - X -
- X - - - - - X - -
X - - - - - - - X -
- - - - X - - - - X
- - - X - X - - - -
- - X - - - X - - -
I let the matrix be filled with random numbers from 1 to 9 and then ran the solver on it; about one in ten attempts resulted in a grid like this, which has exactly one solution:
4 1 3 8 1 3 4 1 1 8 25
9 9 7 8 1 1 3 2 1 7 44
9 8 8 1 5 5 9 2 2 6 41
4 6 8 1 9 2 1 7 1 5 33
9 4 2 4 4 5 8 6 3 8 48
8 5 6 9 6 6 6 4 1 8 50
4 3 2 4 8 7 6 7 9 1 38
6 7 8 1 9 9 9 4 6 7 50
7 7 1 7 9 6 2 7 1 2 36
3 3 8 8 9 2 4 9 6 8 48
50 42 43 36 51 35 45 44 19 48
When using only numbers from 1 to 9, grids with only one solution are easy to find for smaller grids (more than half of 8×8 grids have only one solution), but become hard to find for grid sizes over 10×10. Most larger grids have many solutions, like this one which has 16:
4 1 5 7 2 2 5 6 5 8 32
5 1 1 6 4 6 5 2 2 9 32
9 2 3 8 7 7 4 8 3 6 41
4 8 1 8 4 3 1 9 7 2 37
4 6 9 8 8 5 8 6 6 5 50
1 5 5 5 1 3 5 7 7 1 28
5 5 1 7 2 9 2 6 3 8 40
9 8 9 2 8 3 1 9 6 8 47
5 1 3 7 1 2 6 1 8 9 34
1 5 1 2 1 1 1 6 4 3 23
33 29 28 46 26 32 32 47 42 49
The number of solutions also depends on the number of excluded numbers per row and column. The results shown above are specifically for the pattern with two excluded numbers per row and column. The more excluded numbers, the greater the average number of solutions (I assume with a peak at 50% excluded numbers).
You can of course use a random pattern of cells to be excluded, or choose the numbers by hand, or have random numbers chosen with a certain distribution, or give the matrix any other property that you think will enhance its usefulness as a puzzle. Multiple solutions don't seem to be a big problem for smaller grids, but it is of course best to check for them; I first ran the solver on a grid I had made by hand, and it turned out to have three solutions.
Choosing the excluded values
Because the value of the excluded numbers can be chosen freely, this is the obvious way to improve the chance of a matrix having only one solution. If you choose numbers that don't occur anywhere else in the row and column, or only once, then the percentage of 10×10 grids that have only one solution rises from 10% to 50%.
(This simple method obviously gives a clue about which numbers should be excluded – it's not the numbers that occur several times in a row or column – so it's probably better to use how many times each number occurs in the whole grid, not just in its own row and column.)
You could of course choose excluded values that add up to a number that can't be made with any other combination of values in the row or column, and that would guarantee only one solution. The problem with this is of course that such a grid doesn't really work as a puzzle; there is only ever one way to exclude values and get the correct sum for every row and column. A variant would be to choose excluded values so that the sum of the row or column can be made in exactly two, or three, or ... ways. This would also give you a way to choose the difficulty level of the puzzle.
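Counting in how many ways a single row or column can reach its sum is a 0/1 knapsack count, which could serve as a building block for choosing excluded values so that each line has exactly the desired number of ways. A small C++ sketch, assuming non-negative cell values; countKeepSets is a hypothetical name, and cells are distinguished by position, so equal values count separately.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of ways to keep cells of one line so that the kept values sum to `target`.
long long countKeepSets(const std::vector<int>& line, int target) {
    if (target < 0) return 0;
    // ways[s] = number of subsets of the cells processed so far that sum to s.
    std::vector<long long> ways(static_cast<std::size_t>(target) + 1, 0);
    ways[0] = 1;
    for (int v : line)
        for (int s = target; s >= v; --s)   // downwards, so each cell is used at most once
            ways[s] += ways[s - v];
    return ways[target];
}
```

For the row 4 2 1 3 with sum 5 from the question it returns 2 (4 + 1 and 2 + 3). It only looks at one line at a time, so a whole grid still needs a solver to confirm uniqueness.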
Sudoku – avoiding duplicate values
The fact that larger grids have a higher chance of having more than one solution is of course linked to using only values from 1 to 9. Grids of 10×10 and greater are guaranteed to have duplicate values in every row and column.
To check whether grids with no duplicate values per row or column are more likely to lead to only one solution, Sudoku grids are the obvious test data.
When using random patterns of 1 to 3 cells per row and column to be excluded, around 90% of cross-sum matrix games based on Sudokus have only one solution, compared to around 60% when using random values.
(It could of course be interesting to create puzzles which work both as a Sudoku and as a cross-sum matrix puzzle. For every Sudoku it should be easy to find a visually pleasing pattern of excluded cells that has only one solution.)
Examples
For those who like a challenge (or want to test a solver), here's a cross-sum Sudoku and an 11×11, 12×12 and 13×13 cross-sum matrix puzzle that have just one solution:
. 3 . 4 . . . . . 36
. 6 . . 9 . . 4 5 35
4 . . . . . 9 . . 33
. . 3 . . 1 . . . 39
. . . . . 8 2 . 3 29
. 7 . . . 2 6 . 9 40
. 2 . . . . . . . 33
3 . 8 . . . . . . 31
. . 7 . 5 . . 6 4 36
33 34 35 37 27 42 34 32 38
6 6 5 2 9 4 4 6 7 1 8 44
1 8 1 1 4 7 3 3 3 1 2 25
5 8 7 7 5 5 6 1 7 6 5 43
8 9 6 2 9 1 6 2 9 8 3 59
8 8 2 3 6 3 7 7 5 9 8 53
8 2 7 2 6 2 9 4 7 1 2 47
3 9 2 8 8 4 2 9 3 6 6 50
3 1 8 2 6 4 1 7 9 4 6 42
8 3 6 7 8 5 4 4 2 8 4 46
8 3 8 6 5 7 9 8 6 9 2 59
9 6 8 4 6 2 4 8 5 6 2 49
52 50 47 40 58 34 46 50 54 48 38
1 5 8 6 6 5 4 9 9 7 7 8 66
5 6 2 5 5 4 8 5 7 7 3 6 54
8 2 8 2 8 6 9 4 9 5 9 9 67
1 2 8 2 3 4 5 8 8 7 6 2 48
8 9 4 8 7 2 8 2 2 3 7 7 57
2 2 1 9 4 1 1 1 5 6 1 5 36
2 1 4 2 9 1 2 8 1 6 9 7 49
3 6 5 7 5 5 7 9 4 7 7 5 59
8 2 3 4 8 2 2 3 3 1 6 1 35
4 2 1 7 7 1 7 9 6 7 9 7 51
7 4 3 2 8 3 6 7 8 3 1 8 54
3 8 9 8 7 6 5 7 1 1 7 3 59
48 45 51 47 62 38 61 59 57 50 60 57
4 3 9 3 7 6 6 9 7 7 5 9 1 71
2 7 4 7 1 1 9 8 8 3 3 5 4 52
6 9 6 5 6 4 6 7 3 6 6 8 8 68
5 7 8 8 1 5 3 4 5 7 2 9 6 60
5 3 1 3 3 5 4 5 9 1 8 2 7 50
3 8 3 1 8 4 8 2 2 9 7 3 6 58
6 6 9 8 3 5 9 1 4 6 9 8 2 69
8 1 8 2 9 7 1 3 8 5 2 1 5 50
9 9 4 5 4 9 7 1 8 8 1 2 6 60
9 2 4 8 4 5 3 3 7 9 6 1 6 58
5 2 7 6 8 5 6 6 1 3 4 7 2 47
8 3 5 2 7 2 4 5 8 1 2 6 2 49
7 1 7 4 9 2 9 8 9 3 5 2 3 59
66 50 69 50 58 49 64 57 65 66 56 47 54

Quicksort on a set of length 2

When quicksorting a dataset, the list gets split and the algorithm is recursive, in that it calls itself on the smaller lists.
I was practising the quicksort algorithm, but a sublist of length 2 is a stone in my shoe; I can't solve it. The original list was:
2 0 1 7 4 3 5 6
Pivot being at 2, left at 0, right at 6, I start. Left moves along to 7, 7>=2. Right moves down to 1, 1<=2. Left and right have crossed. As I understand, now right becomes the split point and two new lists are formed.
2 0 | 1 7 4 3 5 6
As you can see, the first list, 2 and 0, is 2 items long. So 2 is the pivot, and 0 is both left and right. Left doesn't move along, right moves along to 2, 2<=2. Left and right have crossed so p replaces R and L onwards is a new list. But this leaves 2 and 0 unsorted.
Where am I going wrong?
The problem in your case comes from the fact that you don't move the pivot to its sorted place. After partitioning with pivot 2, your array should look like this:
0 1 2 7 4 3 5 6
    ^
Let's go through the partition procedure with the input array 13 19 9 5 12 8 7 4 21 2 6 11, choosing 11 as the pivot.
During the procedure, you need to maintain two pointers: i, for the element just before the first element bigger than the pivot (^^), and j, for the element you are currently looking at (||).
The code looks like this:
A is array left..right
pivot = A[right]
i = left - 1 // the one before the first bigger than the pivot
for j = left to right - 1
    if A[j] <= pivot
        i = i + 1
        swap A[i] with A[j]
swap A[i+1] with A[right] // put pivot at its place, i + 1 is the index to split on
And the example (0-based indices; i and j are shown after each step):
13 19 9 5 12 8 7 4 21 2 6 11
13 19 9 5 12 8 7 4 21 2 6 11    j = 0: 13 > 11, skip (i = -1)
13 19 9 5 12 8 7 4 21 2 6 11    j = 1: 19 > 11, skip (i = -1)
9 19 13 5 12 8 7 4 21 2 6 11    j = 2: 9 < 11, swap (i = 0)
9 5 13 19 12 8 7 4 21 2 6 11    j = 3: 5 < 11, swap (i = 1)
9 5 13 19 12 8 7 4 21 2 6 11    j = 4: 12 > 11, skip (i = 1)
9 5 8 19 12 13 7 4 21 2 6 11    j = 5: 8 < 11, swap (i = 2)
9 5 8 7 12 13 19 4 21 2 6 11    j = 6: 7 < 11, swap (i = 3)
9 5 8 7 4 13 19 12 21 2 6 11    j = 7: 4 < 11, swap (i = 4)
9 5 8 7 4 13 19 12 21 2 6 11    j = 8: 21 > 11, skip (i = 4)
Can you continue from here yourself?
The quicksort algorithm only has the base cases of an empty array or an array of size 1. In your case of [2 0], the algorithm chooses 2 as the pivot, partitions the rest into [0] (smaller than the pivot) and an empty array (larger than the pivot), and joins them with the pivot [2] in between, giving the sorted array [0 2].
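For reference, a C++ sketch of the Lomuto partition pseudocode from the first answer, wrapped in the recursive calls described in the second. Two details are choices of this sketch rather than of the answers: the index i starts at left instead of left - 1 (so it can be an unsigned std::size_t), and quicksort works on a half-open range [lo, hi). lomutoPartition and quicksort are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Lomuto partition of a[left..right] (inclusive) with pivot a[right].
// Here i is one past the pseudocode's i: the first slot that may hold an element > pivot.
std::size_t lomutoPartition(std::vector<int>& a, std::size_t left, std::size_t right) {
    const int pivot = a[right];
    std::size_t i = left;
    for (std::size_t j = left; j < right; ++j)
        if (a[j] <= pivot) {
            std::swap(a[i], a[j]);
            ++i;
        }
    std::swap(a[i], a[right]);   // put the pivot at its sorted place
    return i;                    // index to split on
}

// Sorts the half-open range a[lo, hi). Ranges of size 0 or 1 are the base case.
void quicksort(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return;
    const std::size_t p = lomutoPartition(a, lo, hi - 1);
    quicksort(a, lo, p);         // elements <= pivot
    quicksort(a, p + 1, hi);     // elements > pivot
}
```

Sorting the list 2 0 1 7 4 3 5 6 from the question, or just the sublist 2 0, with it gives 0 1 2 3 4 5 6 7 and 0 2 respectively.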

Determine all consecutive subsets of the set {1,2,3,…,n}. The subsets should have at least 2 elements

I need to partition a set S = {1, 2, 3, … , n} of consecutive numbers such that each subset has at least 2 elements (rule 1) and consists of consecutive numbers (rule 2).
The rules are:
Each subset has at least two elements.
All elements of all subsets are consecutive.
All elements of S are included in the partition.
Examples:
There is 1 subset for n = 2:
1 2
There is 1 subset for n = 3:
1 2 3
There are 2 subset combinations for n = 4:
1 2 3 4
1 2 - 3 4
There are 3 subset combinations for n = 5:
1 2 3 4 5
1 2 - 3 4 5
1 2 3 - 4 5
There are 5 subset combinations for n = 6:
1 2 3 4 5 6
1 2 - 3 4 5 6
1 2 3 - 4 5 6
1 2 3 4 - 5 6
1 2 - 3 4 - 5 6
There are 8 subset combinations for n = 7:
1 2 3 4 5 6 7
1 2 - 3 4 5 6 7
1 2 3 - 4 5 6 7
1 2 3 4 - 5 6 7
1 2 3 4 5 - 6 7
1 2 - 3 4 - 5 6 7
1 2 - 3 4 5 - 6 7
1 2 3 - 4 5 - 6 7
There are 13 subset combinations for n = 8:
1 2 3 4 5 6 7 8
1 2 - 3 4 5 6 7 8
1 2 3 - 4 5 6 7 8
1 2 3 4 - 5 6 7 8
1 2 3 4 5 - 6 7 8
1 2 3 4 5 6 - 7 8
1 2 - 3 4 - 5 6 7 8
1 2 - 3 4 5 - 6 7 8
1 2 - 3 4 5 6 - 7 8
1 2 3 - 4 5 - 6 7 8
1 2 3 - 4 5 6 - 7 8
1 2 3 4 - 5 6 - 7 8
1 2 - 3 4 - 5 6 - 7 8
There are 21 subset combinations for n = 9:
1 2 3 4 5 6 7 8 9
1 2 - 3 4 5 6 7 8 9
1 2 3 - 4 5 6 7 8 9
1 2 3 4 - 5 6 7 8 9
1 2 3 4 5 - 6 7 8 9
1 2 3 4 5 6 - 7 8 9
1 2 3 4 5 6 7 - 8 9
1 2 - 3 4 - 5 6 7 8 9
1 2 - 3 4 5 - 6 7 8 9
1 2 - 3 4 5 6 - 7 8 9
1 2 - 3 4 5 6 7 - 8 9
1 2 3 - 4 5 - 6 7 8 9
1 2 3 - 4 5 6 - 7 8 9
1 2 3 - 4 5 6 7 - 8 9
1 2 3 4 - 5 6 - 7 8 9
1 2 3 4 - 5 6 7 - 8 9
1 2 3 4 5 - 6 7 - 8 9
1 2 - 3 4 - 5 6 - 7 8 9
1 2 - 3 4 - 5 6 7 - 8 9
1 2 - 3 4 5 - 6 7 - 8 9
1 2 3 - 4 5 - 6 7 - 8 9
There are 34 subset combinations for n = 10:
1 2 3 4 5 6 7 8 9 10
1 2 - 3 4 5 6 7 8 9 10
1 2 3 - 4 5 6 7 8 9 10
1 2 3 4 - 5 6 7 8 9 10
1 2 3 4 5 - 6 7 8 9 10
1 2 3 4 5 6 - 7 8 9 10
1 2 3 4 5 6 7 - 8 9 10
1 2 3 4 5 6 7 8 - 9 10
1 2 - 3 4 - 5 6 7 8 9 10
1 2 - 3 4 5 - 6 7 8 9 10
1 2 - 3 4 5 6 - 7 8 9 10
1 2 - 3 4 5 6 7 - 8 9 10
1 2 - 3 4 5 6 7 8 - 9 10
1 2 3 - 4 5 - 6 7 8 9 10
1 2 3 - 4 5 6 - 7 8 9 10
1 2 3 - 4 5 6 7 - 8 9 10
1 2 3 - 4 5 6 7 8 - 9 10
1 2 3 4 - 5 6 - 7 8 9 10
1 2 3 4 - 5 6 7 - 8 9 10
1 2 3 4 - 5 6 7 8 - 9 10
1 2 3 4 5 - 6 7 - 8 9 10
1 2 3 4 5 - 6 7 8 - 9 10
1 2 3 4 5 6 - 7 8 - 9 10
1 2 - 3 4 - 5 6 - 7 8 9 10
1 2 - 3 4 - 5 6 7 - 8 9 10
1 2 - 3 4 - 5 6 7 8 - 9 10
1 2 - 3 4 5 - 6 7 - 8 9 10
1 2 - 3 4 5 - 6 7 8 - 9 10
1 2 - 3 4 5 6 - 7 8 - 9 10
1 2 3 - 4 5 - 6 7 - 8 9 10
1 2 3 - 4 5 - 6 7 8 - 9 10
1 2 3 - 4 5 6 - 7 8 - 9 10
1 2 3 4 - 5 6 - 7 8 - 9 10
1 2 - 3 4 - 5 6 - 7 8 - 9 10
I didn't write them down here but there are 55 subset combinations for n = 11 and 89 subset combinations for n = 12.
I need to write Visual Basic code that lists all possible subset groups for n. I have been thinking about the solution for days, but it seems to be beyond my capacity. The number of required nested loops increases with n, and I could not figure out how to program a variable number of nested loops. Any help will be greatly appreciated.
After some research, I found out this is the problem of "compositions of n with all parts > 1", and the total number of possible compositions is the Fibonacci number F(n-1).
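The Fibonacci connection follows from a short recurrence. Write c(n) for the number of such partitions of {1, ..., n}. Either the first block is exactly {1, 2} and the remaining n - 2 numbers are partitioned freely, or the first block has at least 3 elements, and dropping its first element leaves a valid partition of n - 1 numbers (a one-to-one correspondence):

```latex
c(0) = 1, \qquad c(1) = 0, \qquad
c(n) = \underbrace{c(n-2)}_{\text{first block is } \{1,2\}}
     + \underbrace{c(n-1)}_{\text{first block has at least 3 elements}}
     \qquad (n \ge 2)
```

This gives 1, 1, 2, 3, 5, 8, 13, ... for n = 2, 3, 4, ..., that is c(n) = F(n-1) with F(1) = F(2) = 1, matching the counts listed in the question.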
We already know the answer for these cases (as you wrote in your examples):
n=2
n=3
n=4
For n=5:
You can partition from 2: 1 2 - 3 4 5. This is like dividing the 5 member set into two sets, first one n=2, and second one n=3. We can now continue dividing each half, but we already know the solutions when n=2 and n=3!
You can partition from 3: 1 2 3 - 4 5. This is like dividing the 5 member set into two sets, first one n=3, and second one n=2. We can now continue dividing each half, but we already know the solution when n=2 and n=3!
For n=6:
You can partition into two sets from 2: 1 2 - 3 4 5 6. This is like dividing the 6 member set into two sets, first one n=2, and second one n=4. We can now continue dividing each half, but we already know the solution when n=2. Solve the second half by assuming n=4!
You can partition into two sets from 3: 1 2 3 - 4 5 6. This is like dividing the 6 member set into two sets, first one n=3, and second one n=3. We can now continue dividing each half, but we already know the solution when n=3!
You can partition into two sets from 4: 1 2 3 4 - 5 6. This is like dividing the 6 member set into two sets, first one n=4, and second one n=2. We can now continue dividing each half. Solve the first half by setting n=4. For the second half, we already know the solution when n=2!
This is a simple recursion relationship. The general case:
Partition (S): (where |S|>4)
- For i from 2 to |S|-2, partition the given set into two halves: s1 and s2 from i (s1={1,...,i}, s2={i+1,...,n}), and print the two subsets as a solution.
- Recursively continue for each half by calling Partition(s1) and Partition(s2)
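A C++ sketch of a recursive generator. It makes one deliberate change to the outline above, which is an assumption of this sketch: it fixes only the first block and recurses on the remainder, instead of recursing into both parts. Recursing into both parts can reach the same partition through different first splits (1 2 - 3 4 - 5 6 arises from splitting at 2 and at 4), while fixing the first block lists each partition exactly once. listPartitions is a hypothetical name; the same structure carries over to a recursive Sub in Visual Basic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Appends to `out` every partition of first..last into blocks of at least two
// consecutive numbers, formatted like "1 2 - 3 4". `prefix` holds the blocks
// chosen so far (empty at the top-level call).
void listPartitions(int first, int last, const std::string& prefix,
                    std::vector<std::string>& out) {
    const int remaining = last - first + 1;
    for (int len = 2; len <= remaining; ++len) {
        if (remaining - len == 1) continue;          // a single leftover number is not allowed
        std::string line = prefix.empty() ? std::string() : prefix + " - ";
        for (int k = first; k < first + len; ++k) {
            if (k != first) line += ' ';
            line += std::to_string(k);
        }
        if (len == remaining) out.push_back(line);           // this block reaches `last`
        else listPartitions(first + len, last, line, out);   // recurse on the remainder only
    }
}
```

Calling listPartitions(1, 4, "", out) fills out with "1 2 - 3 4" and "1 2 3 4"; for n = 2 to 12 the number of results is 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89.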
---
Another, perhaps harder, solution is to assume we are dividing the numbers 1 to n into n sections, where the length of each section is either 0 or at least 2. In other words, let xi be the length of section i:
x1 + x2 + ... + xn = n, where each xi is either 0 or an integer in [2, n]
This is a system of linear inequalities that can be solved by the methods described here.
My answer to you is to try to come up with a recurrence relation for the given pattern. Think recursively: how can I break this problem down into smaller subproblems until I reach the smallest problem? Solve that smallest problem. After solving it, think induction: hypothesize what the nth step will be and how you will reach the (n+1)th step, then try to solve that (n+1)th step. Once you have come up with a recurrence relation for the pattern, it should not be too difficult to solve it recursively. Instead of trying to use nested loops, this approach may be more intuitive.

What is the worst case scenario for quicksort?

When does the quicksort algorithm take O(n^2) time?
Quicksort works by taking a pivot, then putting all the elements lower than that pivot on one side and all the higher elements on the other; it then recursively sorts the two subgroups in the same way (all the way down until everything is sorted). Now if you pick the worst pivot each time (the highest or lowest element in the list), you'll only have one group to sort, containing everything except the pivot you picked. This in essence gives you n partition steps over n, n-1, ..., 1 elements, which add up to about n^2/2 element visits, hence the O(n^2) complexity.
The most common reason for this occurring is if the pivot is chosen to be the first or last element in the list in the quicksort implementation. For unsorted lists this is just as valid as any other, however for sorted or nearly sorted lists (which occur quite commonly in practice) this is very likely to give you the worst case scenario. This is why all half-decent implementations tend to take a pivot from the centre of the list.
There are modifications to the standard quicksort algorithm to avoid this edge case - one example is the dual-pivot quicksort that was integrated into Java 7.
In short, Quicksort for sorting an array in ascending order works like this:
Choose a pivot element
Partition ("presort") the array, such that all elements smaller than the pivot are on its left side and the larger ones on its right side
Recursively do steps 1 and 2 for the left side and the right side
Ideally, you would want a pivot element that partitions the sequence in two equally long subsequences but this is not so easy.
There are different schemes for choosing the pivot element. Early versions just took the leftmost element. In the worst case, the pivot element will always be the lowest element of the current range.
Leftmost element is pivot
In this case it is easy to see that the worst case is a monotonically increasing array:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Rightmost element is pivot
Similarly, when choosing the rightmost element the worst case will be a decreasing sequence.
20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Center element is pivot
One possible remedy for the worst case on presorted arrays is to use the center element (or the one slightly left of center if the sequence has even length). Then the worst case becomes much more exotic. It can be constructed by modifying the Quicksort algorithm to assign monotonically increasing values to the array elements at the positions of the currently selected pivot. I.e., we know the first pivot is the center, so the center must be the lowest value, e.g. 0. Next it gets swapped to the leftmost position, i.e. the leftmost value is now in the center and would be the next pivot element, so it must be 1. Now we can already guess that the array would look like this:
1 ? ? 0 ? ? ?
Here is the C++ code for the modified Quicksort to generate a worst sequence:
// g++ -std=c++11 worstCaseQuicksort.cpp && ./a.out
#include <algorithm> // swap
#include <iostream>
#include <vector>
#include <numeric> // iota
int main( void )
{
    std::vector<int> v(20); /**< will hold the worst case later */
    /* p basically saves the indices of what was the initial position of the
     * elements of v. As they get swapped around by Quicksort p becomes a
     * permutation */
    auto p = v;
    std::iota( p.begin(), p.end(), 0 );
    /* in the worst case we need to work on v.size() sequences, because
     * the initial sequence is always split after the first element */
    for ( auto i = 0u; i < v.size(); ++i )
    {
        /* i can be interpreted as:
         * - subsequence starting index
         * - current minimum value, if we start at 0 */
        /* note that in the last step iPivot == v.size()-1 */
        auto const iPivot = ( v.size()-1 + i )/2;
        v[ p[ iPivot ] ] = i;
        std::swap( p[ iPivot ], p[i] );
    }
    for ( auto x : v ) std::cout << " " << x;
    std::cout << "\n";
}
The result:
0
0 1
1 0 2
2 0 1 3
1 3 0 2 4
4 2 0 1 3 5
1 5 3 0 2 4 6
4 2 6 0 1 3 5 7
1 5 3 7 0 2 4 6 8
8 2 6 4 0 1 3 5 7 9
1 9 3 7 5 0 2 4 6 8 10
6 2 10 4 8 0 1 3 5 7 9 11
1 7 3 11 5 9 0 2 4 6 8 10 12
10 2 8 4 12 6 0 1 3 5 7 9 11 13
1 11 3 9 5 13 7 0 2 4 6 8 10 12 14
8 2 12 4 10 6 14 0 1 3 5 7 9 11 13 15
1 9 3 13 5 11 7 15 0 2 4 6 8 10 12 14 16
16 2 10 4 14 6 12 8 0 1 3 5 7 9 11 13 15 17
1 17 3 11 5 15 7 13 9 0 2 4 6 8 10 12 14 16 18
10 2 18 4 12 6 16 8 14 0 1 3 5 7 9 11 13 15 17 19
1 11 3 19 5 13 7 17 9 15 0 2 4 6 8 10 12 14 16 18 20
16 2 12 4 20 6 14 8 18 10 0 1 3 5 7 9 11 13 15 17 19 21
1 17 3 13 5 21 7 15 9 19 11 0 2 4 6 8 10 12 14 16 18 20 22
12 2 18 4 14 6 22 8 16 10 20 0 1 3 5 7 9 11 13 15 17 19 21 23
1 13 3 19 5 15 7 23 9 17 11 21 0 2 4 6 8 10 12 14 16 18 20 22 24
There is order in this. The right side is just increments of two starting with zero. The left side also has an order. Let's format the left side of the 73-element worst-case sequence nicely using ASCII art:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
------------------------------------------------------------------------------------------------------------
row 1 (every 2nd element): 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
row 2 (every 4th element): 37 39 41 43 45 47 49 51 53
row 3 (every 8th element): 55 57 59 61 63
row 4 (every 16th element): 65 67
row 5: 69
row 6: 71
The header is the element index. In the first row numbers starting from 1 and increasing by 2 are given to every 2nd element. In the second row the same is done to every 4th element, in the 3rd row numbers are assigned to every 8th element and so on. In this case the first value to be written in the i-th row is at index 2^i-1, but for certain lengths this looks a tad different.
The resulting structure is reminiscent of an inverted binary tree whose nodes are labeled bottom-up starting from the leaves.
Median of leftmost, center and rightmost elements is pivot
Another way is to use the median of the leftmost, the center and the rightmost element. In this case the worst case can only be that (w.l.o.g.) the left subsequence has length 2 (not just length 1 as in the examples above). We also assume that the rightmost value will always be the highest of the three candidates, which means it is the highest of all values. Making adjustments to the program above, we now have this:
#include <algorithm> // swap
#include <iostream>
#include <numeric> // iota
#include <vector>
int main( void )
{
    std::vector<int> v(20);
    auto p = v;
    std::iota( p.begin(), p.end(), 0 );
    auto i = 0u;
    for ( ; i < v.size(); i+=2 )
    {
        auto const iPivot0 = i;
        auto const iPivot1 = ( i + v.size()-1 )/2;
        v[ p[ iPivot1 ] ] = i+1;
        v[ p[ iPivot0 ] ] = i;
        /* for odd lengths the last step has no element i+1; guard the swap */
        if ( i+1 < v.size() )
            std::swap( p[ iPivot1 ], p[i+1] );
    }
    if ( v.size() > 0 && i == v.size() )
        v[ v.size()-1 ] = i-1;
    for ( auto x : v ) std::cout << " " << x;
    std::cout << "\n";
}
The generated sequences are:
0
0 1
0 1 2
0 1 2 3
0 2 1 3 4
0 2 1 3 4 5
0 4 2 1 3 5 6
0 4 2 1 3 5 6 7
0 4 2 6 1 3 5 7 8
0 4 2 6 1 3 5 7 8 9
0 8 2 6 4 1 3 5 7 9 10
0 8 2 6 4 1 3 5 7 9 10 11
0 6 2 10 4 8 1 3 5 7 9 11 12
0 6 2 10 4 8 1 3 5 7 9 11 12 13
0 10 2 8 4 12 6 1 3 5 7 9 11 13 14
0 10 2 8 4 12 6 1 3 5 7 9 11 13 14 15
0 8 2 12 4 10 6 14 1 3 5 7 9 11 13 15 16
0 8 2 12 4 10 6 14 1 3 5 7 9 11 13 15 16 17
0 16 2 10 4 14 6 12 8 1 3 5 7 9 11 13 15 17 18
0 16 2 10 4 14 6 12 8 1 3 5 7 9 11 13 15 17 18 19
0 10 2 18 4 12 6 16 8 14 1 3 5 7 9 11 13 15 17 19 20
0 10 2 18 4 12 6 16 8 14 1 3 5 7 9 11 13 15 17 19 20 21
0 16 2 12 4 20 6 14 8 18 10 1 3 5 7 9 11 13 15 17 19 21 22
0 16 2 12 4 20 6 14 8 18 10 1 3 5 7 9 11 13 15 17 19 21 22 23
0 12 2 18 4 14 6 22 8 16 10 20 1 3 5 7 9 11 13 15 17 19 21 23 24
Pseudorandom element with random seed 0 is pivot
The worst case sequences for center element and median-of-three look already pretty random, but in order to make Quicksort even more robust the pivot element can be chosen randomly. If the random sequence used is at least reproducible on every Quicksort run, then we can also construct a worst case sequence for that. We only have to adjust the iPivot = line in the first program, e.g. to:
srand(0); // you shouldn't use 0 as a seed
for ( auto i = 0u; i < v.size(); ++i )
{
auto const iPivot = i + rand() % ( v.size() - i );
[...]
The generated sequences are:
0
1 0
1 0 2
2 3 1 0
1 4 2 0 3
5 0 1 2 3 4
6 0 5 4 2 1 3
7 2 4 3 6 1 5 0
4 0 3 6 2 8 7 1 5
2 3 6 0 8 5 9 7 1 4
3 6 2 5 7 4 0 1 8 10 9
8 11 7 6 10 4 9 0 5 2 3 1
0 12 3 10 6 8 11 7 2 4 9 1 5
9 0 8 10 11 3 12 4 6 7 1 2 5 13
2 4 14 5 9 1 12 6 13 8 3 7 10 0 11
3 15 1 13 5 8 9 0 10 4 7 2 6 11 12 14
11 16 8 9 10 4 6 1 3 7 0 12 5 14 2 15 13
6 0 15 7 11 4 5 14 13 17 9 2 10 3 12 16 1 8
8 14 0 12 18 13 3 7 5 17 9 2 4 15 11 10 16 1 6
3 6 16 0 11 4 15 9 13 19 7 2 10 17 12 5 1 8 18 14
6 0 14 9 15 2 8 1 11 7 3 19 18 16 20 17 13 12 10 4 5
14 16 7 9 8 1 3 21 5 4 12 17 10 19 18 15 6 0 11 2 13 20
1 2 22 11 16 9 10 14 12 6 17 0 5 20 4 21 19 8 3 7 18 15 13
22 1 15 18 8 19 13 0 14 23 9 12 10 5 11 21 6 4 17 2 16 7 3 20
2 19 17 6 10 13 11 8 0 16 12 22 4 18 15 20 3 24 21 7 5 14 9 1 23
So how to check whether those sequences are correct?
Measure the time it takes to sort the sequences. Plot time over the sequence length N. If the curve scales with O(N^2) instead of O(N log(N)), then these are indeed worst-case sequences.
Adjust a correct Quicksort to give debug output about the subsequence lengths and/or the chosen pivot elements. One of the subsequences should always be of length 1 (or 2 for median-of-three). The chosen pivot elements printed should be increasing.
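One way to carry out both checks at once is an instrumented quicksort. The C++ sketch below regenerates the center-pivot worst case with the same construction as the first program and sorts it with a scheme assumed to match that construction: the center element of the current range is swapped to the front and used as the pivot of a Lomuto-style partition. On a true worst case every partition of m elements costs m - 1 comparisons, so the total should be exactly n(n-1)/2, and the recorded pivots should be 0, 1, 2, ... in order. The function and struct names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Same construction as the first program: the element at the center of the
// current range receives the next smallest value i.
std::vector<int> worstCaseCenterPivot(std::size_t n) {
    std::vector<int> v(n);
    std::vector<std::size_t> p(n);                  // p[k]: original index now at position k
    std::iota(p.begin(), p.end(), std::size_t{0});
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t iPivot = (n - 1 + i) / 2;
        v[p[iPivot]] = static_cast<int>(i);
        std::swap(p[iPivot], p[i]);
    }
    return v;
}

struct SortStats {
    std::size_t comparisons = 0;
    std::vector<int> pivots;                        // pivot values in the order chosen
};

// Assumed scheme: move the center element of a[lo..hi] to the front and partition
// around it; afterwards a[lo+1..i] holds the elements smaller than the pivot.
void quicksortCenter(std::vector<int>& a, std::ptrdiff_t lo, std::ptrdiff_t hi, SortStats& st) {
    if (lo >= hi) return;                           // 0 or 1 element
    std::swap(a[lo], a[lo + (hi - lo) / 2]);
    const int pivot = a[lo];
    st.pivots.push_back(pivot);
    std::ptrdiff_t i = lo;
    for (std::ptrdiff_t j = lo + 1; j <= hi; ++j) {
        ++st.comparisons;
        if (a[j] < pivot) std::swap(a[++i], a[j]);
    }
    std::swap(a[lo], a[i]);                         // pivot to its final position
    quicksortCenter(a, lo, i - 1, st);
    quicksortCenter(a, i + 1, hi, st);
}
```

Running quicksortCenter on worstCaseCenterPivot(n) and comparing SortStats::comparisons with n(n-1)/2 covers the second check directly; timing the same calls for growing n covers the first.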
Getting a pivot equal to the lowest or highest number should also trigger the worst-case scenario of O(n^2).
Different implementations of quicksort need different datasets to hit their worst-case runtime. It depends on where the algorithm selects its pivot element.
And also, as Ghpst said, selecting the biggest or smallest number would give you a worst case.
If I remember correctly, quicksort normally uses a random element as the pivot to minimize the chance of hitting the worst case.
I think if the array is in reverse order, then choosing the last element of the array as the pivot gives the worst case.
The factors that contribute to the worst-case scenario of quicksort are as follows:
Worst case occurs when the subarrays are completely unbalanced
The worst case occurs when there are 0 elements in one subarray and n-1 elements in the other.
For example, with the first or last element as the pivot, this happens when Quicksort is given an already sorted array (for instance in decreasing order), and the running time becomes O(n^2).
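The O(n^2) bound for the completely unbalanced split follows from the recurrence below, where c n stands for the cost of partitioning n elements (c is an assumed constant):

```latex
T(n) = T(0) + T(n-1) + c\,n
\;\Longrightarrow\;
T(n) = T(1) + (n-1)\,T(0) + c\sum_{k=2}^{n} k
     = \Theta(n) + c\left(\frac{n(n+1)}{2} - 1\right)
     = \Theta(n^2)
```

With a balanced split the recurrence is T(n) = 2T(n/2) + c n instead, which gives Θ(n log n).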

disappearing row names when using apply

Consider the following example (values in vectors are target practice results and I'm trying to automagically sort by shooting score). We generate three vectors. We sort values in columns 1:20 in ascending order and rows in descending order based on out.tot column.
# Generate data
shooter1 <- round(runif(n = 20, min = 1, max = 10))
shooter2 <- round(runif(n = 20, min = 1, max = 10))
shooter3 <- round(runif(n = 20, min = 1, max = 10))
out <- data.frame(t(data.frame(shooter1, shooter2, shooter3)))
colnames(out) <- 1:ncol(out)
out.sort <- t(apply(out, 1, sort, na.last = FALSE))
out.tot <- apply(out , 1, sum)
colnames(out.sort) <- 1:ncol(out.sort)
out2 <- cbind(out.sort, out.tot)
out3 <- apply(out2, 2, sort, decreasing = TRUE, na.last = FALSE)
out2 has row names attached while out3 lost them. The only difference is that I used MARGIN = 2, which is probably the culprit (because it takes in column by column). I can match rows by hand, but is there a way I can keep row names in out3 from disappearing?
> out2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 out.tot
shooter1 1 2 2 3 3 3 4 5 5 5 6 6 6 6 6 7 8 9 9 10 106
shooter2 1 3 3 3 3 4 4 4 5 5 5 5 5 6 7 8 8 9 9 10 107
shooter3 1 1 2 2 2 3 3 4 5 5 5 6 6 6 6 7 8 8 8 9 97
> out3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 out.tot
[1,] 1 3 3 3 3 4 4 5 5 5 6 6 6 6 7 8 8 9 9 10 107
[2,] 1 2 2 3 3 3 4 4 5 5 5 6 6 6 6 7 8 9 9 10 106
[3,] 1 1 2 2 2 3 3 4 5 5 5 5 5 6 6 7 8 8 8 9 97
If I understand your example, going from out2 to out3 you are sorting each column independently, meaning that the values in row 1 may not all come from the data generated for shooter1. It makes sense, then, that the row names are dropped: row names are names of observations, and you are no longer keeping the data from one observation in one row.
