Let's suppose we have two sorted arrays, A and B, each consisting of n elements. I don't understand why the time needed to merge these two is "n+n". In order to merge them we need to make 2n-1 comparisons. For example, in the two following arrays
A = [3, 5, 7, 9] and B = [2, 4, 6, 8]
We will start merging them into a single one by comparing the elements in the usual way. Eventually we compare 8 with 9. This will be our 2n-1 = 8-1 = 7th comparison, and 8 will be inserted into the new array.
After this, 9 will be inserted without another comparison. So I guess my question is: since there are 2n-1 comparisons, why do we say that this merging takes 2n time? I'm not saying O(n), I'm saying T(n)=2n, an exact time function.
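To make the counting concrete, here is a small Python sketch that merges the two arrays above while tallying comparisons and element writes separately. The function name `merge_counting` and its counters are made up for illustration. One possible reading of the "2n" figure is that it counts the 2n writes into the output array rather than the comparisons, but that is an assumption about the source of the formula.

```python
def merge_counting(a, b):
    """Merge two sorted lists; return (merged, comparisons, writes)."""
    merged = []
    comparisons = 0
    i = j = 0
    # Compare the heads of both lists while neither is exhausted.
    while i < len(a) and j < len(b):
        comparisons += 1
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # One list is exhausted: copy the remainder without comparing.
    merged.extend(a[i:])
    merged.extend(b[j:])
    # Every element is written to the output exactly once.
    return merged, comparisons, len(merged)


merged, comparisons, writes = merge_counting([3, 5, 7, 9], [2, 4, 6, 8])
print(merged, comparisons, writes)
```

For n = 4 this reports 7 comparisons and 8 writes, so both 2n-1 and 2n show up depending on which operation is counted.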
It's probably a detail that I'm missing here, so I would be very grateful if someone could provide some insight. Thanks in advance.
I'm trying to understand why heapsort isn't stable.
I've googled this, but haven't found a good, intuitive explanation.
I understand the importance of stable sorting - it allows us to sort based on more than one key, which can be very beneficial (i.e., do multiple sortings, each based on a different key. Since every sort will preserve the relative order of elements, previous sortings can add up to give a final list of elements sorted by multiple criteria).
However, why wouldn't heapsort preserve this as well?
Thanks for your help!
Heap sort unstable example
Consider array 21 20a 20b 12 11 8 7 (already in max-heap format)
Here 20a = 20b; we write them as 20a and 20b only to track their original order.
During heapsort, 21 is removed first and placed at the last index, then 20a is removed and placed at the second-to-last index, and 20b at the third-to-last index, so after heapsort the array looks like
7 8 11 12 20b 20a 21.
It does not preserve the order of the equal elements and hence can't be stable.
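The example can be checked with a short Python sketch. This is a plain in-place max-heap heapsort written for illustration (the `heapsort` function and the `(key, label)` tuples are assumptions, not taken from any particular library); it compares keys only, so the labels just ride along.

```python
def heapsort(items, key=lambda x: x):
    """In-place max-heap heapsort on a copy of items; compares key(item) only."""
    a = list(items)
    n = len(a)

    def sift_down(root, end):
        # Push a[root] down until both children (within a[:end]) are no larger.
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            # Prefer the right child only when it is strictly larger.
            if child + 1 < end and key(a[child + 1]) > key(a[child]):
                child += 1
            if key(a[child]) > key(a[root]):
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return

    # Build the max-heap bottom-up.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n)
    # Repeatedly move the current maximum to the end of the unsorted part.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a


data = [(21, ""), (20, "a"), (20, "b"), (12, ""), (11, ""), (8, ""), (7, "")]
result = heapsort(data, key=lambda item: item[0])
print(" ".join(f"{k}{label}" for k, label in result))
```

With this tie-breaking the output is 7 8 11 12 20b 20a 21, matching the array above. A different tie-breaking rule might happen to keep this particular pair in order, but no rule based on keys alone keeps every input stable.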
The final sequence of the results from heapsort comes from removing items from the created heap in purely size order (based on the key field).
Any information about the ordering of the items in the original sequence was lost during the heap creation stage, which came first.
Stable means that if two elements have the same key, they remain in the same relative order. That is not the case for heapsort.
Heapsort is not stable because operations on the heap can change the relative order of equal items.
From here:
When sorting (in ascending order), heapsort first picks the largest
element and puts it last in the list. So the element that has
been picked first stays last, and the element that has been picked
second stays second to last in the sorted list.
Again, the Build-Max-Heap procedure works such that it preserves the order
of equal values (e.g. 3a, 3b) when building the heap tree. When extracting
the maximum element it also works from the root and tries to preserve
the structure of the tree (except for the change made by Heapify).
So what happens is that, for elements with the same value [3a, 3b], heapsort picks
3a before 3b but puts 3a to the right of 3b. As the list is
sorted in ascending order, we get 3b before 3a in the list.
If you try heapsort with (3a,3b,3b) then you can visualize the
situation.
Stable sort algorithms sort elements such that order of repeating elements in the input is maintained in the output as well.
Heap-Sort involves two steps:
Heap creation
Removing and adding the root element from heap tree into a new array which will be sorted in order
1. Order breaks during Heap Creation
Let's say the input array is {1, 5, 2, 3, 2, 6, 2} and for the purpose of seeing the order of 2's, say they are 2a, 2b and 2c so the array would be {1, 5, 2a, 3, 2b, 6, 2c}
Now if you create a heap (a min-heap here) out of it, its array representation will be {1, 2b, 2a, 3, 5, 6, 2c}, where the order of 2a and 2b has already changed.
2. Order breaks during removal of root element
Now when we have to remove the root element (1 in our case) from the heap to put it into a new array, we swap it with the last position and remove it from there, changing the heap into {2c, 2b, 2a, 3, 5, 6}. We repeat the same step, and this time we remove '2c' from the heap and put it at the end of the new array, after '1'.
Once we have repeated this step until the heap is empty and every element has been transferred, the new (sorted) array will look like {1, 2c, 2b, 2a, 3, 5, 6}.
Input to Heap-Sort: {1, 5, 2a, 3, 2b, 6, 2c} --> Output: {1, 2c, 2b, 2a, 3, 5, 6}
Hence we see that the repeated elements (the 2's) are not in the same order in the heap-sorted array as they appear in the input, and therefore Heap-Sort is not stable!
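Both steps can be reproduced with a small Python sketch. It assumes a bottom-up min-heap build and a sift-down that moves an element only when a child is strictly smaller; the name `min_heap_sort` and the `(key, label)` pairs are hypothetical, used only to track the three 2's.

```python
def min_heap_sort(items, key):
    """Return (heap_after_build, sorted_output) using a min-heap and a new array."""
    heap = list(items)

    def sift_down(root, end):
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            # Prefer the right child only when it is strictly smaller.
            if child + 1 < end and key(heap[child + 1]) < key(heap[child]):
                child += 1
            if key(heap[child]) < key(heap[root]):
                heap[root], heap[child] = heap[child], heap[root]
                root = child
            else:
                return

    # Step 1: heap creation (bottom-up).
    for start in range(len(heap) // 2 - 1, -1, -1):
        sift_down(start, len(heap))
    heap_after_build = list(heap)

    # Step 2: swap the root with the last slot, remove it, restore the heap.
    output = []
    while heap:
        heap[0], heap[-1] = heap[-1], heap[0]
        output.append(heap.pop())
        sift_down(0, len(heap))
    return heap_after_build, output


data = [(1, ""), (5, ""), (2, "a"), (3, ""), (2, "b"), (6, ""), (2, "c")]
built, out = min_heap_sort(data, key=lambda item: item[0])
print([f"{k}{label}" for k, label in built])
print([f"{k}{label}" for k, label in out])
```

Under those assumptions the heap after creation is {1, 2b, 2a, 3, 5, 6, 2c} and the final output is {1, 2c, 2b, 2a, 3, 5, 6}, as described above.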
I know this is a late answer but I will add my 2 cents here.
Consider a simple array of 3 integers: 2, 2, 2. If you build a max heap using the build-max-heap function, you will find that the array storing the input has not changed, as it is already in max-heap form. When we then put the root of the tree at the end of the array in the first iteration of heapsort, the stability of the array is already gone. So there you have a simple example of the instability of heapsort.
Suppose we take an array of arbitrary size n in which two equal elements (say 15) sit next to each other in the heap, and their parents hold values such as 4 and 20, so the actual order is (..., 4, 20, ..., 15, 15, ...). The relative order of 4 and the first 15 remains the same, but since 20 > 15, the second 15 is swapped to the front, as the heapsort algorithm defines, and the relative order of the two 15s is gone.
I tried looking through my lovely textbook (to no avail) and online. According to the book I'm working from, by Cormen, we are to use the first element in an array as the pivot. I'm just stuck on what to do since the first element happens to be 1.
The array looks as follows:
[1, 16, 2, 3, 14, 5, 12, 7, 10, 8, 9, 17, 19, 21, 23, 26, 27]
Again, the problem with the algorithm in the book is that it chooses the first element as the pivot. Once we have compared 1 to all the other elements and found that no other element is smaller than or equal to it, we are to swap the pivot and the middle element of the subarrays, where the subarray on the left is smaller than the pivot and the subarray on the right is greater than the pivot. But if our pivot IS 1 then there is no way we can swap. I'm really confused; any help would be great. The title of the book is Introduction to Algorithms, 3rd Edition, in case someone out there is familiar with it.
No difference from the normal case: just treat the left part as empty and do quicksort on the right part which is the subarray from 1.
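A minimal Python sketch of that, assuming a Lomuto-style partition that takes the first element as the pivot (the function names are made up, and this is not necessarily the exact procedure from the book):

```python
def partition_first(a, lo, hi):
    """Partition a[lo..hi] around a[lo]; return the pivot's final index."""
    pivot = a[lo]
    i = lo  # last index of the "less than or equal to pivot" region
    for j in range(lo + 1, hi + 1):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    # If nothing was <= pivot, i == lo and this swaps the pivot with itself.
    a[lo], a[i] = a[i], a[lo]
    return i


def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return  # empty or one-element part: nothing to do
    p = partition_first(a, lo, hi)
    quicksort(a, lo, p - 1)   # left part (empty when the pivot is the minimum)
    quicksort(a, p + 1, hi)   # right part


data = [1, 16, 2, 3, 14, 5, 12, 7, 10, 8, 9, 17, 19, 21, 23, 26, 27]
print(partition_first(list(data), 0, len(data) - 1))
quicksort(data)
print(data)
```

With the pivot 1, the partition returns index 0, the "swap" exchanges the pivot with itself, and the recursion simply continues on the right part.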
This is not a special case. In fact, when the input is sorted, the naive quicksort degenerates into an O(N^2) sorting algorithm. Quoting Wikipedia:
In very early versions of quicksort, the leftmost element of the partition would often be chosen as the pivot element. Unfortunately, this causes worst-case behavior on already sorted arrays, which is a rather common use-case. The problem was easily solved by choosing either a random index for the pivot, choosing the middle index of the partition or (especially for longer partitions) choosing the median of the first, middle and last element of the partition for the pivot (as recommended by R. Sedgewick).
You can use something known as median-of-three. Pick the first value, middle value and last value in the array and use the median of those three as the pivot. This doesn't guarantee that you will get the best pivot, but it lowers the chances of getting a really bad one.
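As a rough illustration in Python, a pivot picker along those lines could look like this. The helper name `median_of_three_index` is hypothetical; it only chooses the pivot index and leaves the partitioning itself to whatever scheme you already use.

```python
def median_of_three_index(a, lo, hi):
    """Return the index (lo, mid, or hi) whose value is the median of the three."""
    mid = (lo + hi) // 2
    # Order the three candidate indices by the values they point at.
    candidates = sorted([lo, mid, hi], key=lambda idx: a[idx])
    return candidates[1]


data = [1, 16, 2, 3, 14, 5, 12, 7, 10, 8, 9, 17, 19, 21, 23, 26, 27]
p = median_of_three_index(data, 0, len(data) - 1)
print(p, data[p])
```

For the array in the question the candidates are 1, 10 and 27, so the pivot becomes 10 instead of the minimum element.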
In one of my projects I encountered a need to generate a set of numbers in a given range that will be:
Exhaustive, which means that it will cover most of the given range without any repetition.
Deterministic (every time the sequence will be the same). This can probably be achieved with a fixed seed.
Random (I am not well versed in random number theory, but I guess there is a set of rules that describes randomness. From that perspective, something like 0,1,2..N is not random).
Ranges I am talking about can be ranges of integers, or of real numbers.
For example, if I used standard C# random generator to generate 10 numbers in range [0, 9] I will get this:
0 0 1 2 0 1 5 6 2 6
As you can see, a big part of given range still remains 'unexplored' and there are many repetitions.
Of course, input space can be very large, so remembering previously chosen values is not an option.
What would be the right way to tackle this problem?
Thanks.
After the comments:
OK, I agree that random is not the right word, but I hope that you understood what I am trying to achieve. I want to explore a given range that can be big, so an in-memory list is not an option. If the range is (0, 10) and I want three numbers, I want to guarantee that those numbers will be different and that they will 'describe the range' (i.e. they won't all be in the lower half, etc.).
The determinism part means that I would like to use something like a standard RNG with a fixed seed, so I can fully control the sequence.
I hope I made things a bit clearer.
Thanks.
Here are three options with different tradeoffs:
Generate a list of numbers ahead of time, and shuffle them using the Fisher-Yates shuffle. Select from the list as needed. O(n) total memory, and O(1) time per element. Randomness is as good as the PRNG you used to do the shuffle. The simplest of the three alternatives, too.
Use a Linear Feedback Shift Register, which will generate every value in its sequence exactly once before repeating. O(log n) total memory, and O(1) time per element. It's easy to determine future values based on the present value, however, and LFSRs are most easily constructed for power of 2 periods (but you can pick the next biggest power of 2, and skip any out of range values).
Use a secure permutation based on a block cipher. Usable for any power of 2 period, and with a little extra trickery, any arbitrary period. O(log n) total space and O(1) time per element, randomness is as good as the block cipher. The most complex of the three to implement.
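To give a feel for the second option, here is a toy Python sketch of a 4-bit Galois LFSR. It assumes the feedback mask 0b1100, which cycles through all 15 non-zero 4-bit states, and it maps state s to the value s - 1 while skipping out-of-range values as described above. The function name and the 4-bit size are illustrative assumptions; a real range would need a wider register with a suitable maximal-length mask.

```python
def lfsr_permutation(n, seed=1):
    """Yield each integer in range(n) exactly once, for 1 <= n <= 15."""
    if not 1 <= n <= 15:
        raise ValueError("this toy 4-bit register only covers 1 <= n <= 15")
    if not 1 <= seed <= 15:
        raise ValueError("seed must be a non-zero 4-bit state")
    state = seed
    for _ in range(15):  # the register has exactly 15 non-zero states
        value = state - 1
        if value < n:    # skip values that fall outside the requested range
            yield value
        # One Galois LFSR step: shift right, apply the mask if a 1 fell off.
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0b1100


print(list(lfsr_permutation(10)))
print(list(lfsr_permutation(10, seed=7)))
```

The only state kept between outputs is the register itself, and the seed just picks where in the fixed cycle you start, so the same seed always gives the same order.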
If you just need something, what about something like this?
maxint = 16
step = 7
sequence = 7, 14, 5, 12, 3, 10, 1, 8, 15, 6, 13, 4, 11, 2, 9, 0
If you pick step right (it must share no common factor with maxint), it will generate the entire interval before repeating. You can play around with different values of step to get something that "looks" good. The "seed" here is where you start in the sequence.
Is this random? Of course not. Will it look random according to a statistical test of randomness? It might depend on the step, but likely this will not look very statistically random at all. However, it certainly picks the numbers in the range, not in their original order, and without any memory of the numbers picked so far.
In fact, you could make this look even better by making a list of factors - like [1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16] - and using shuffled versions of those to compute step * factor (mod maxint). Let's say we shuffled the example factor lists into [3, 2, 4, 5, 1], [6, 8, 9, 10, 7], [13, 16, 12, 11, 14, 15]. Then we'd get the sequence
5, 14, 12, 3, 7, 10, 8, 15, 6, 1, 11, 0, 4, 13, 2, 9
The size of the factors list is completely tunable, so you can store as much memory as you like. Bigger factor lists, more randomness. No repeats regardless of factor list size. When you exhaust a factor list, generating a new one is as easy as counting and shuffling.
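Here is a rough Python sketch of both variants, assuming step and maxint have no common factor. The function names and the `block_size` and `seed` parameters are made up for illustration, and the blocks here are all the same size except possibly the last, which differs slightly from the 5/5/6 split above.

```python
import random
from math import gcd


def stepped_sequence(maxint, step):
    """Yield k * step (mod maxint) for k = 1..maxint."""
    if gcd(step, maxint) != 1:
        raise ValueError("step must be coprime with maxint to cover the range")
    for k in range(1, maxint + 1):
        yield (k * step) % maxint


def block_shuffled_sequence(maxint, step, block_size, seed):
    """Same idea, but the factors are shuffled one block at a time."""
    if gcd(step, maxint) != 1:
        raise ValueError("step must be coprime with maxint to cover the range")
    rng = random.Random(seed)  # fixed seed gives a reproducible order
    for start in range(1, maxint + 1, block_size):
        # Only one block of factors is held in memory at any time.
        factors = list(range(start, min(start + block_size, maxint + 1)))
        rng.shuffle(factors)
        for factor in factors:
            yield (factor * step) % maxint


print(list(stepped_sequence(16, 7)))
print(list(block_shuffled_sequence(16, 7, block_size=5, seed=42)))
```

The first call reproduces the sequence 7, 14, 5, 12, ... from the example; the second visits the same 16 values in an order that depends on the seed.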
It is my impression that what you are looking for is a randomly-ordered list of numbers, not a random list of numbers. You should be able to get this with the following pseudocode. Better math-ies may be able to tell me if this is in fact not random:
list = [ 1 .. 100 ]
for item,index in list:
    location = random_integer_below(list.length - index)
    list.switch(index,location+index)
Basically, go through the list and pick a random item from the rest of the list to use in the position you are at. This should randomly arrange the items in your list. If you need to reproduce the same random order each time, consider saving the array, or ensuring somehow that random_integer_below always returns numbers in the same order given some seed.
Generate an array that contains the range, in order. So the array contains [0, 1, 2, 3, 4, 5, ... N]. Then use a Fisher-Yates Shuffle to scramble the array. You can then iterate over the array to get your random numbers.
If you need repeatability, seed your random number generator with the same value at the start of the shuffle.
Do not use a random number generator to select numbers in a range. What will eventually happen is that you have one number left to fill, and your random number generator will cycle repeatedly until it selects that number. Depending on the random number generator, there is no guarantee that will ever happen.
What you should do is generate a list of numbers on the desired range, then use a random number generator to shuffle the list. The shuffle is known as the Fisher-Yates shuffle, or sometimes called the Knuth shuffle. Here's pseudocode to shuffle an array x of n elements with indices from 0 to n-1:
for i from n-1 to 1
    j = random integer such that 0 ≤ j ≤ i
    swap x[i] and x[j]
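The same shuffle as a short Python sketch, using a seeded `random.Random` instance so that the order is repeatable. The function name `seeded_shuffle` is just an illustrative choice.

```python
import random


def seeded_shuffle(n, seed):
    """Return the numbers 0..n-1 in a shuffled order that depends only on seed."""
    rng = random.Random(seed)  # private generator, so the seed fully controls it
    x = list(range(n))
    # Fisher-Yates: walk from the end, swapping each slot with a random earlier one.
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends, so 0 <= j <= i
        x[i], x[j] = x[j], x[i]
    return x


print(seeded_shuffle(10, seed=123))
print(seeded_shuffle(10, seed=123))  # same seed, same order
```

Every number in the range appears exactly once, and reusing the seed reproduces the sequence. The cost is the O(n) memory for the list, which may rule this out for very large ranges.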
I'm coding a question on an online judge for practice. The question is about optimizing Bogosort and involves not shuffling the entire number range every time. If after the last shuffle several first elements end up in the right places, we will fix them and not shuffle those elements any further. We will do the same for the last elements if they are in the right places. For example, if the initial sequence is (3, 5, 1, 6, 4, 2) and after one shuffle Johnny gets (1, 2, 5, 4, 3, 6), he will fix 1, 2 and 6 and proceed with sorting (5, 4, 3) using the same algorithm.
For each test case output the expected amount of shuffles needed for the improved algorithm to sort the sequence of first n natural numbers in the form of irreducible fractions.
A sample input/output says that for n=6, the answer is 1826/189.
I don't quite understand how the answer was arrived at.
This looks similar to 2011 Google Code Jam, Preliminary Round, Problem 4; however, the answer there is n, so I don't know how you get 1826/189.