I am trying to understand how merge sort recursion stack actually manages to merge two arrays into a sorted array.
The code and the output are at - https://gist.github.com/antani/144a2dfc85d89ae86297 (to prevent clutter in the question)
I am not able to visualize the stack trace of this algorithm
Well, both the left and the right array are already sorted by the time they are merged. The algorithm then compares the first, and therefore smallest, value of the left array with the smallest value of the right array. The smaller of the two becomes the next value of the resulting array.
After this step the resulting array is also sorted and is returned back to recursion depth/step/iteration n - 1...
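If it helps, here is a minimal Python sketch of that merge step (the names are illustrative and not taken from the linked gist):

    def merge(left, right):
        # left and right are each already sorted
        result = []
        i = j = 0
        while i < len(left) and j < len(right):
            # take the smaller of the two front values
            if left[i] <= right[j]:
                result.append(left[i])
                i += 1
            else:
                result.append(right[j])
                j += 1
        # whichever half has leftover values, they are already sorted
        result.extend(left[i:])
        result.extend(right[j:])
        return result

    def merge_sort(a):
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        # each recursive call returns a sorted half, which is then merged
        return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

Each frame on the recursion stack only has to merge the two already-sorted halves it gets back from its own recursive calls.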
Maybe these animated sorting algorithms will help you understand: http://www.sorting-algorithms.com
Would it be possible to improve the insertion sort algorithm's running time if a doubly-linked list was used instead of an array?
Thank you.
No, not in a big-O sense anyway. There are a few variants of insertion sort, so let's talk about the one where the left side of the list is sorted in ascending order and the right side is unsorted. On each iteration we take the first (left-most) element of the unsorted part, call it x, and iterate backwards through the sorted part to see where it belongs. If the list is an array, we pull out the new element, shift items in the sorted part one position to the right until we find where the item belongs, and put it there. So we iterate through the n items in the list, and for each of those we iterate through the sorted part, which has 0 items on the first iteration, 1 on the next, 2 after that, ... up to n. On average that's up to n/2 comparisons/shifts for each of our n iterations, for a total runtime of O(n^2).
If you make it a doubly-linked list instead, the only thing that changes is that instead of shifting items one to the right as you iterate through the sorted part, you leave them where they are and splice your new node into place by re-linking the pointers. But it's still n/2 comparisons and pointer dereferences (to find the next node in the list) on average, so the big-O runtime is the same. It's probably even slower in practice, as the pointer dereferences are not cache-friendly and take more time than shifting items one to the right in an array.
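To make that concrete, here is a rough sketch of the array variant described above (Python, names are mine); a doubly-linked-list version would replace the shifting with pointer re-linking but would do the same number of comparisons:

    def insertion_sort(a):
        # a[:i] is the sorted part, a[i:] is the unsorted part
        for i in range(1, len(a)):
            x = a[i]                 # left-most element of the unsorted part
            j = i - 1
            # walk backwards through the sorted part, shifting items right
            while j >= 0 and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x             # drop x into its place
        return a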
Suppose I have an unsorted array P and its sorted equivalent P_Sorted. Suppose L and R refer to the left and right halves of P. Is there a way to recover L_Sorted and R_Sorted from P and P_Sorted in linear time without using extra memory?
For further clarification, during a recursive merge sort implementation L_Sorted and R_Sorted would be merged together to form P_Sorted, so I'm kinda looking to reverse the merge step.
In a merge sort, you divide the array into two halves recursively and merge them. So at the last merge, the left and right halves have already been sorted - they are sorted independently - which is why it is called divide and conquer.
Therefore, when doing a merge you can look at the sizes of the arrays being merged: when their combined size equals the size of the original input, you are at the last merge. At that point you could store those two sorted arrays in some variable before merging them.
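For example, if you are allowed to modify the sort, a rough Python sketch (names are mine) that records the two sorted halves of the top-level split just before the final merge might look like this:

    def merge_sort_with_halves(a):
        # returns (sorted_a, L_sorted, R_sorted) for the top-level split
        mid = len(a) // 2
        L_sorted = sorted(a[:mid])   # stand-in for recursively merge-sorting the left half
        R_sorted = sorted(a[mid:])   # stand-in for recursively merge-sorting the right half
        # final merge of the two sorted halves
        merged = []
        i = j = 0
        while i < len(L_sorted) and j < len(R_sorted):
            if L_sorted[i] <= R_sorted[j]:
                merged.append(L_sorted[i])
                i += 1
            else:
                merged.append(R_sorted[j])
                j += 1
        merged += L_sorted[i:] + R_sorted[j:]
        return merged, L_sorted, R_sorted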
BUT if you are not allowed to mess with the function, and you need to work only with the sorted array and the original array, I think the solution is not straightforward. I found a URL that poses this problem and a possible solution.
It seems feasible in linear time for very specific datasets:
If there is a way to tell the original position of each data element in the sorted list (for example, if these are records with a creation date and a name field, and the original array is in chronological order), then selecting from the sorted array the elements that fall in the first or the second half can be done in a single scan, in linear time, with no space overhead.
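As a rough illustration of that special case (Python; it assumes each sorted entry carries its original index, and it builds two output lists for clarity rather than working strictly in place):

    def unmerge(p_sorted_with_pos, n):
        # p_sorted_with_pos: list of (original_index, value) pairs, sorted by value
        # n: length of the original array P
        half = n // 2
        l_sorted, r_sorted = [], []
        for orig_index, value in p_sorted_with_pos:
            # the original position tells us which half the element came from
            if orig_index < half:
                l_sorted.append(value)
            else:
                r_sorted.append(value)
        return l_sorted, r_sorted

Because appending preserves the value order, both output lists come out sorted after a single linear scan.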
In the general case, sorting the left and right halves directly seems the most efficient way to get L_sorted and R_sorted, with or without P_sorted. The time complexity is O(n log n).
I hate to just post a question about homework, but I am having a lot of trouble understanding what it is asking of me. I am not asking you to solve my homework problem, only to guide me on the first steps I should take, because I don't know where to begin. I read the part of the chapter about quick sort and sort of grasp it, and I watched a video on it too.
Here's the homework problem:
Sort an array of 10,000 elements using the quick sort algorithm as follows:
a. Sort the array using pivot as the middle element of the array.
b. Sort the array using pivot as the median of the first, last, and middle elements of the array.
c. Sort the array using pivot as the middle element of the array. However, when the size of any sublist reduces to less than 20, sort the sublist using an insertion sort.
d. Sort the array using pivot as the median of the first, last, and middle elements of the array. When the size of any sublist reduces to less than 20, sort the sublist using an insertion sort.
e. Calculate and print the CPU time for each of the preceding four steps.
I'm having trouble understanding - is this going to be one whole function that contains these four steps, or four different functions, one for each step? I get what it means in step a by using the pivot as the middle element of the array (say you have 10 array elements; the middle would be element #4), but I don't get what it means by "sort the array using pivot as the median of the first, last, and middle elements of the array".
I need insight into the inner workings of quick sort and what this book is asking from me.
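From what I can gather, the median-of-three rule from parts b and d would be something like this rough sketch (the naming is my own guess, so it may not match the book):

    def median_of_three_pivot(a, first, last):
        # candidate pivot positions: first, middle, and last element of the sublist
        mid = (first + last) // 2
        candidates = [(a[first], first), (a[mid], mid), (a[last], last)]
        candidates.sort()            # order the three candidates by value
        return candidates[1][1]      # the index of the median value becomes the pivot

Is that roughly the idea?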
There is probably an efficient solution for this, but I'm not seeing it.
I'm not sure how to explain my problem but here goes...
Let's say we have one array with n integers, for example {3,2,0,5,0,4,1,9,7,3}.
What we want to do is to find the range of 5 consecutive elements with the "maximal minimum"...
The solution in this example would be the part {4,1,9,7,3} (the last five elements of the array), with 1 as the maximal minimum.
It's easy to do in O(n^2), but there must be a better way of doing this. What is it?
If you mean literally five consecutive elements, then you just need to keep a sorted window of the source array.
Say you have:
{3,2,0,5,0,1,0,4,1,9,7,3}
First, you take the first five elements and sort them:
{3,2,0,5,0, 1,0,4,1,9,7,3}
{0,0,2,3,5} - sorted.
Here the minimum is the first element of the sorted sequence.
Then you advance the window one step to the right: the new element is 1 and the element that leaves is 3, so you find 3 in the sorted window, replace it with 1, and return the window to a sorted state. You don't actually need to run a full sorting algorithm, since only one element (the 1 in this example) is out of place; even bubble sort will fix it in linear time here.
{3,2,0,5,0,1, 0,4,1,9,7,3}
{0,0,1,2,5}
Then the new minimum is again the first element.
You keep advancing the window like this, each time comparing the first element of the sorted sequence with the best minimum so far, and remembering that minimum and its subsequence.
For a fixed window size of 5, the time complexity is O(n), since each step does a constant amount of work on the window.
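A rough Python version of this idea, with the window size fixed at 5 (names are mine):

    import bisect

    def max_min_window5(a):
        k = 5
        window = sorted(a[:k])            # sorted copy of the first k elements
        best_min, best_start = window[0], 0
        for i in range(k, len(a)):
            window.remove(a[i - k])       # drop the element that leaves the window
            bisect.insort(window, a[i])   # insert the new element in sorted position
            if window[0] > best_min:      # window[0] is the current window's minimum
                best_min, best_start = window[0], i - k + 1
        return best_min, best_start

For the example array {3,2,0,5,0,4,1,9,7,3} this returns (1, 5), i.e. the window {4,1,9,7,3}.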
Can't you use some circular buffer of 5 elements, run over the array and add the newest element to the buffer (thereby replacing the oldest element) and searching for the lowest number in the buffer? Keep a variable with the offset into the array that gave the highest minimum.
That would seem to be O(n * 5*log(5)) = O(n), I believe.
Edit: I see unkulunkulu proposed exactly the same as me in more detail :).
Using a balanced binary search tree instead of a linear buffer, it is trivial to get complexity O(n log m), where m is the window size.
You can do it in O(n) for arbitrary k-consecutive elements as well. Use a deque.
For each element x:
pop elements from the back of the deque that are larger than x
if the front of the deque is more than k positions old, discard it
push x at the end of the deque
At each step, the front of the deque gives you the minimum of your current k-element window. Compare it with your global maximum and update it if needed.
Since each element gets pushed and popped from the deque at most once, this is O(n).
The deque can either be implemented with an array the size of your initial sequence, giving O(n) memory usage, or with a linked list that actually removes the discarded elements from memory, giving O(k) memory usage.
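Here is one way that idea could look in Python, using collections.deque and storing indices so that entries that fall out of the window can be discarded (names are mine):

    from collections import deque

    def max_of_window_minimums(a, k):
        dq = deque()                       # indices; their values increase from front to back
        best_min, best_start = None, None
        for i, x in enumerate(a):
            while dq and a[dq[-1]] >= x:   # pop back entries that are not smaller than x
                dq.pop()
            dq.append(i)
            if dq[0] <= i - k:             # front is more than k positions old
                dq.popleft()
            if i >= k - 1:                 # a full k-element window ends at i
                window_min = a[dq[0]]
                if best_min is None or window_min > best_min:
                    best_min, best_start = window_min, i - k + 1
        return best_min, best_start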
If I have N arrays, what is the best (in terms of time complexity; space is not important) way to find the common elements? You could just find one element and stop.
Edit: The elements are all Numbers.
Edit: These are unsorted. Please do not sort and scan.
This is not a homework problem. Somebody asked me this question a long time ago. He was using a hash to solve the problem and asked me if I had a better way.
Create a hash index, with elements as keys and counts as values. Loop through all the values and update the counts in the index. Afterwards, run through the index and check which elements have count = N. Looking up an element in the index should be O(1), so combined with looping through all M elements this should be O(M).
If you want to keep order specific to a certain input array, loop over that array and test the element counts in the index in that order.
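A minimal Python sketch of this counting index (it assumes, per the note below, that each element appears at most once per array):

    def common_elements(arrays):
        counts = {}
        for arr in arrays:
            for x in arr:                  # assumes x appears at most once in arr
                counts[x] = counts.get(x, 0) + 1
        n = len(arrays)
        return [x for x, c in counts.items() if c == n]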
Some special cases:
if you know that the elements are (positive) integers with a maximum value that is not too high, you could just use a normal array as the "hash" index to keep the counts, where the numbers themselves are the array indices.
I've assumed that each number occurs at most once in each array. Adapting it for more occurrences should be easy (set the i-th bit of the count for the i-th array, or only update if the current element count == i-1).
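A rough sketch of the bitmask variant (setting the i-th bit for the i-th array, so duplicates within one array are harmless):

    def common_elements_bitmask(arrays):
        seen = {}                                     # element -> bitmask of arrays containing it
        for i, arr in enumerate(arrays):
            for x in arr:
                seen[x] = seen.get(x, 0) | (1 << i)   # duplicates just set the same bit again
        full = (1 << len(arrays)) - 1                 # all bits set = present in every array
        return [x for x, mask in seen.items() if mask == full]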
EDIT: when I answered the question, it did not yet include the part about "a better way" than hashing.
The most direct method is to intersect the first 2 arrays and then intersect that result with each of the remaining N-2 arrays.
If 'intersection' is not defined in the language in which you're working, or you require a more specific answer (i.e. you need the answer to 'how do you do the intersection'), then modify your question accordingly.
Without sorting, there isn't a more optimized way to do this based on the information given (i.e. sorting and positioning all elements relative to each other, then iterating over the length of the arrays checking for elements present in all the arrays at once).
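In Python, for example, that direct approach could look like this (set semantics, so duplicates within an array collapse):

    def intersect_all(arrays):
        common = set(arrays[0])
        for arr in arrays[1:]:
            common &= set(arr)         # intersect with each remaining array
            if not common:             # nothing left in common, stop early
                break
        return common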
The question asks whether there is a better way than hashing. There is no better way (i.e. no better time complexity) than hashing, since the time to hash each element is typically constant. Empirical performance is also favorable, particularly if the range of values can be mapped one-to-one to an array maintaining counts. The time is then proportional to the total number of elements across all the arrays. Sorting will not give better complexity, since it still needs to visit each element at least once, plus the log n factor for sorting each array.
Back to hashing: from a performance standpoint, you will get the best empirical performance by not processing each array in full, but processing only a block of elements from each array before proceeding on to the next array. This takes advantage of the CPU cache. It also results in fewer elements being hashed in favorable cases, when common elements appear in the same regions of the arrays (e.g. common elements at the start of all arrays). Worst-case behaviour is no worse than hashing each array in full - merely that all elements are hashed.
I don't think the approach suggested by catchmeifyoutry will work.
Let us say you have two arrays
1: {1,1,2,3,4,5}
2: {1,3,6,7}
then the answer should be 1 and 3. But if we use the hashtable approach, 1 will have count 3 and we will never find it in this situation.
Also problems becomes more complex if we have input something like this:
1: {1,1,1,2,3,4}
2: {1,1,5,6}
Here I think we should give the output as 1, 1. The suggested approach fails in both cases.
Solution:
Read the first array and put its elements into a hashtable; if we see the same key again, don't increment the counter. Read the second array in the same manner. Now the hashtable holds the common elements, which have a count of 2.
But again, this approach will fail on the second input set I gave earlier.
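One way to handle duplicate multiplicities like this is a multiset intersection, for example with Python's collections.Counter (a sketch of an alternative, not of the approach criticized above):

    from collections import Counter
    from functools import reduce

    def common_with_duplicates(arrays):
        # Counter & Counter keeps the minimum count of each element,
        # i.e. a multiset intersection across all the arrays
        common = reduce(lambda a, b: a & b, (Counter(arr) for arr in arrays))
        return list(common.elements())

    # e.g. common_with_duplicates([[1,1,1,2,3,4], [1,1,5,6]]) -> [1, 1]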
I'd first start with the degenerate case, finding common elements between 2 arrays (more on this later). From there I'll have a collection of common values which I will use as an array itself and compare it against the next array. This check would be performed N-1 times or until the "carry" array of common elements drops to size 0.
One could speed this up, I'd imagine, by divide-and-conquer, splitting the N arrays into the end nodes of a tree. The next level up the tree is N/2 common element arrays, and so forth and so on until you have an array at the top that is either filled or not. In either case, you'd have your answer.
Without sorting and scanning, the best operational speed you'll get for comparing 2 arrays for common elements is O(N^2).