Runtime complexity: sorting a 2D array where each row, column, and diagonal is sorted? - complexity-theory

Given the following 2d array:
6 8 11 17
9 11 14 20
18 20 23 29
24 26 29 35
Each row and column is sorted, and the diagonals (top left to bottom right) are sorted too. Assuming we have n² elements in the array (n = 4 in this case), it is trivial to use quicksort, which takes O(n² log(n²)) = O(n² log(n)) to sort the 2D array. My question is: can we sort this in O(n²)?
The goal is to use the given semi-sorted 2D array and come up with a clever solution.
The target output is:
6 8 9 11
11 14 17 18
20 20 23 24
26 29 29 35

Yes, we can sort this in O(n^2) time.
Reduction to sorting a 1D array
Let us first show that this new problem of sorting a 2D array (such that each row, column, and top-left-to-bottom-right diagonal is sorted) can be reduced to the problem of sorting a 1D array of n^2 elements.
Suppose we have a sorted 1D array of n^2 elements. We can trivially rearrange this into a sorted n x n array by setting the first n numbers as the first row, followed by the next n numbers as the second row, and repeating until we exhaust the array. (Moving one step down a column or diagonal moves at least n positions forward in the 1D order, so columns and diagonals come out sorted as well.)
Hence, given a 2D array of n^2 numbers, we can transform it into a 1D array in O(n^2) time, sort this array, then transform it back to the desired 2D array in O(n^2) time. Thus, if we can find a sorting algorithm for a 1D array in O(n^2), we can equivalently solve this new problem in O(n^2) time.
Sorting a 1D array in linear time
Given this, we simply need to provide a linear-time sort: given n^2 elements, sort them in O(n^2) time. Conveniently, there are multiple algorithms that accomplish this, such as counting sort or radix sort, although they come with various caveats. However, assuming a reasonable range of numerical values relative to the number of items to be sorted, these sorts run in linear time.
Thus given n^2 elements in an n x n array, this 2D sorting problem can be reduced in O(n^2) time to a 1D sorting problem, which can then be solved with various linear time sorting algorithms in O(n^2) time. Hence, overall, this problem can be solved in O(n^2) time.
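For concreteness, here is a minimal sketch of the whole pipeline in Python, assuming the caveat above holds (non-negative integers whose range is comparable to n^2, so counting sort runs in linear time); the function name and layout are mine, not from the original discussion.

def sort_grid(grid):
    n = len(grid)
    flat = [x for row in grid for x in row]   # 2D -> 1D in O(n^2)
    hi = max(flat)
    counts = [0] * (hi + 1)                   # counting sort: O(n^2 + hi)
    for x in flat:
        counts[x] += 1
    ordered = [v for v in range(hi + 1) for _ in range(counts[v])]
    # 1D -> 2D: a row-major refill keeps every row, column, and
    # top-left-to-bottom-right diagonal sorted.
    return [ordered[i * n:(i + 1) * n] for i in range(n)]

On the example grid at the top, sort_grid returns exactly the target output shown there.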
Sorting with a comparison sort
Following the discussion in the comments, the next step is to ask: what about comparison sorts? Comparison sorts are attractive because they would allow us to avoid the previously mentioned caveats of counting and radix sort.
However, even with this additional information, a linear time comparison sort is unlikely in practice, because this would require us to compute the final position of each number in O(1) time. We know this isn't possible using a comparison sort.
Let's consider a small example: what should be the final sorted position of the number originally in row 1, column 2? We know that it has to be the first of the numbers in columns 2...n. However, we don't know where it belongs relative to the numbers in column 1 (other than the number in row 1, column 1).
In general, for any number in the original square, we are uncertain of its final sorted position relative to all numbers to its lower left and the numbers to its upper right. It would take O(log_2(n)) comparisons to find the relative position of each number, and there are O(n^2) numbers to position. This uncertainty prevents us from achieving a linear time sort in practice.
But the additional information that we have should allow us to achieve some speedups. For example, we could adapt merge sort to this problem. In a standard merge sort we start by splitting our original array in half, repeating until we have arrays of size 1 that are guaranteed to be sorted, and then repeatedly merge these subarrays until we have one single array. For n^2 elements, this builds a binary tree with log_2(n^2) layers, and each layer takes O(n^2) time to merge.
Using the additional information in your problem setup, we don't have to split the arrays until they are of size 1. Instead, we can start off with n sorted arrays of length n and start merging from there. This halves the number of layers we have to merge, and gives us a final runtime of O(n^2 log_2(n)).
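As a sketch of this idea, one way to start from the n presorted rows is a k-way merge rather than pairwise merging; here via Python's heapq.merge (the helper name is mine). Merging n runs of n^2 total elements through a heap of size n costs O(n^2 log n), matching the runtime above.

import heapq

def sort_grid_by_merging(grid):
    n = len(grid)
    # The n rows are already sorted runs; k-way merge them in one pass.
    merged = list(heapq.merge(*grid))
    return [merged[i * n:(i + 1) * n] for i in range(n)]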
Conclusion
In practice, this additional information allows some speedups for comparison sorts, achieving O(n^2 log_2(n)) run times (a constant-factor improvement over a from-scratch merge sort).
But in order to achieve an O(n^2) running time, we have to rely on non-comparison algorithms such as counting or radix sort.

Related

How can I find the upper and lower boundary for quick sort?

I got the average case complexity for quicksort. Now how can I find the upper and lower bounds for quicksort?
The time complexity of quicksort is O(N log(N)), with a worst case of O(N^2). This is because it must go through all the numbers in the array and divide them, ideally equally, into two subarrays: those lower and those higher than the selected pivot. Each of these subarrays then goes through the same process. This divide and conquer continues until the subarrays are trivially small and sorted. Computing this takes N log(N); it is easily seen with a binary tree, where the leaves (the bottom rows) are sorted. Then you just concatenate them.
8
4 4
2 2 2 2
Quicksort runs into problems when you have an already sorted array (with a naive pivot choice, such as always taking the first element). Something like insertion sort has O(N) running time in this situation. If you are dealing with partially sorted arrays and need a time crunch (say, millions of items), you might want to design an algorithm of your own that suits the data.
Reference: https://en.wikipedia.org/wiki/Quicksort
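For illustration, a minimal quicksort sketch of the partition-and-recurse structure described above (my own rendering, not an in-place production version); taking the middle element as pivot is one simple way to avoid the sorted-input worst case mentioned above.

def quicksort(a):
    # Pick a pivot, split into lower/equal/higher, recurse on the parts.
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    return (quicksort([x for x in a if x < pivot])
            + [x for x in a if x == pivot]
            + quicksort([x for x in a if x > pivot]))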

Bucket sort:Why don't we set range to 1? vs counting sort

Bucket sort creates k buckets and distributes the n numbers among them,
e.g. 1-10,
11-20,
21-30...
O(n+k)
The numbers within each bucket are sorted using insertion sort: O(n²) in the worst case.
It works fine when few numbers end up in the same bucket: O(n+k).
But if all numbers end up in the same bucket: O(n²).
My question is: if we make the range of each bucket 1, i.e. 0-1,
1-2,
2-3...
different numbers won't end up in the same bucket (no sorting within a bucket required):
O(n+k)
Setting aside space complexity, why don't we use this instead of counting sort?
Please correct me if I'm wrong.
What you propose is a distribution sort called counting sort, only a simpler version where you know that elements are not duplicated, so each count stops at 1. It is very efficient, O(N+n) in time (N being the size of the key range), but it does require O(N) space.
Many people will naturally use this method when asked to sort a deck of cards: they will dispatch each card to its position on the table in order to form 4 lines of 13 cards. The final step is to gather the cards line by line. Here we have N == n and since both steps take O(n) time, the sort is very efficient.
When N becomes substantially larger than n, say you want to sort a pile of 20 dollar bills by the order of their serial numbers, this method becomes totally impractical.
If you are sorting integers, you might consider another method with O(n) time complexity: Radix sort.
The value of k is not the same in the first approach and the one you propose. Assume you have n numbers between 0 and N. In the first case (buckets of size ten) you need N/10 buckets; in the second case (buckets of size one), N buckets. Depending on the relative values of N and n, there will be an optimal value of k, which may not be k=1.
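To make the trade-off concrete, here is a hedged sketch of bucket sort with the bucket width w as a parameter (assuming non-negative integers and a non-empty input; the names are mine). With w = 1, every bucket holds copies of a single value, which is essentially the counting sort the question describes.

def bucket_sort(a, w):
    if not a:
        return []
    # Distribute the n numbers into buckets of width w: O(n).
    buckets = [[] for _ in range(max(a) // w + 1)]
    for x in a:
        buckets[x // w].append(x)
    # Gather, sorting each bucket (insertion sort in textbook versions;
    # sorted() here for brevity): O(n + k) when buckets stay small,
    # degrading to O(n^2) if everything lands in one bucket.
    out = []
    for b in buckets:
        out.extend(sorted(b))
    return out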

Scenarios for selection sort, insertion sort, and quick sort

If anyone can give some input on my logic, I would very much appreciate it.
Which method runs faster for an array with all keys identical, selection sort or insertion sort?
I think that this would be similar to when the array is already sorted, so that insertion sort will be linear, and the selection sort quadratic.
Which method runs faster for an array in reverse order, selection sort or insertion sort?
I think that they would run similarly, since the values at every position will have to be changed. The worst case scenario for insertion sort is reverse order, so that would mean it is quadratic, and then the selection sort would already be quadratic as well.
Suppose that we use insertion sort on a randomly ordered array where elements have only one of three values. Is the running time linear, quadratic, or something in between?
Since it is randomly ordered, I think the insertion sort would have to perform many more operations than the number of values. If that's the case, then it's not linear. So it would likely be quadratic, or perhaps a little below quadratic.
What is the maximum number of times during the execution of Quick.sort() that the largest item can be exchanged, for an array of length N?
The maximum number cannot be passed over more times than there are spaces available, since it should always be approaching its right position. So, going from being the first to the last value spot, it would be exchanged N times.
About how many compares will quick.sort() make when sorting an array of N items that are all equal?
When drawing out the quicksort, a triangle can be drawn around the compared objects at every phase that is N tall and N wide; the area of this triangle equals the number of compares performed, which would be (N^2)/2.
Here are my comments on your comments:
Which method runs faster for an array with all keys identical, selection sort or insertion sort?
I think that this would be similar to when the array is already sorted, so that insertion sort will be linear, and the selection sort quadratic.
Yes, that's correct. Insertion sort will do O(1) work per element and visit O(n) elements for a total runtime of O(n). Selection sort always runs in time Θ(n^2) regardless of the input structure, so its runtime will be quadratic.
Which method runs faster for an array in reverse order, selection sort or insertion sort?
I think that they would run similarly, since the values at every position will have to be changed. The worst case scenario for insertion sort is reverse order, so that would mean it is quadratic, and then the selection sort would already be quadratic as well.
You're right that both algorithms have quadratic runtime. The algorithms should actually have relatively comparable performance, since they'll make the same total number of comparisons.
Suppose that we use insertion sort on a randomly ordered array where elements have only one of three values. Is the running time linear, quadratic, or something in between?
Since it is randomly ordered, I think the insertion sort would have to perform many more operations than the number of values. If that's the case, then it's not linear. So it would likely be quadratic, or perhaps a little below quadratic.
This should take quadratic time (time Θ(n^2)). Take just the elements in the back third of the array. About a third of these elements will be 1's, and in order to insert them into the sorted sequence they'd need to be moved past about two thirds of the array. Therefore, the work done is at least (n / 9)(2n / 3) = 2n^2 / 27, which is still quadratic.
What is the maximum number of times during the execution of Quick.sort() that the largest item can be exchanged, for an array of length N?
The maximum number cannot be passed over more times than there are spaces available, since it should always be approaching its right position. So, going from being the first to the last value spot, it would be exchanged N times.
There's an off-by-one error here. When the array has size 1, the largest element can't be moved any more, so the maximum number of moves would be N - 1.
About how many compares will quick.sort() make when sorting an array of N items that are all equal?
When drawing out the quicksort, a triangle can be drawn around the compared objects at every phase that is N tall and N wide; the area of this triangle equals the number of compares performed, which would be (N^2)/2.
This really depends on the implementation of Quick.sort(). Quicksort with ternary partitioning would only do O(n) total work because all values equal to the pivot are excluded in the recursive calls. If this isn't done, then your analysis would be correct.
Hope this helps!
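For reference, a sketch of the ternary (three-way) partitioning mentioned in the last point; this is my own Python rendering, not the Quick.sort() implementation from the question. On an all-equal array both recursive calls receive empty ranges, so a single O(n) partitioning pass finishes the job.

def quicksort3(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[lo]
    lt, i, gt = lo, lo + 1, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1              # equal to pivot: leave in the middle band
    quicksort3(a, lo, lt - 1)   # recurse only on elements < pivot
    quicksort3(a, gt + 1, hi)   # recurse only on elements > pivot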

Insertion sort vs Bubble Sort Algorithms

I'm trying to understand a few sorting algorithms, but I'm struggling to see the difference in the bubble sort and insertion sort algorithm.
I know both are O(n^2), but it seems to me that bubble sort just bubbles the maximum value of the array to the top for each pass, while insertion sort just sinks the lowest value to the bottom each pass. Aren't they doing the exact same thing but in different directions?
For insertion sort, the number of comparisons/potential swaps starts at zero and increases each pass (i.e. 0, 1, 2, 3, 4, ..., n), while for bubble sort the same counts occur in reverse (i.e. n, n-1, n-2, ..., 0), because bubble sort no longer needs to compare with the last elements once they are sorted.
For all this though, it seems a consensus that insertion sort is better in general. Can anyone tell me why?
Edit: I'm primarily interested in the differences in how the algorithms work, not so much their efficiency or asymptotic complexity.
Insertion Sort
After i iterations the first i elements are ordered.
In each iteration the next element is bubbled through the sorted section until it reaches the right spot:
sorted | unsorted
1 3 5 8 | 4 6 7 9 2
1 3 4 5 8 | 6 7 9 2
The 4 is bubbled into the sorted section
Pseudocode:
for i in 1 to n
    for j in i downto 2
        if array[j - 1] > array[j]
            swap(array[j - 1], array[j])
        else
            break
Bubble Sort
After i iterations the last i elements are the biggest, and ordered.
In each iteration, sift through the unsorted section to find the maximum.
unsorted | biggest
3 1 5 4 2 | 6 7 8 9
1 3 4 2 | 5 6 7 8 9
The 5 is bubbled out of the unsorted section
Pseudocode:
for i in 1 to n
    for j in 1 to n - i
        if array[j] > array[j + 1]
            swap(array[j], array[j + 1])
Note that typical implementations terminate early if no swaps are made during one of the iterations of the outer loop (since that means the array is sorted).
Difference
In insertion sort elements are bubbled into the sorted section, while in bubble sort the maximums are bubbled out of the unsorted section.
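For readers who prefer runnable code, the two pseudocode routines above translate roughly to the following Python (in-place; function names are mine). Note the early-exit flag in bubble sort, per the note above.

def insertion_sort(a):
    for i in range(1, len(a)):
        j = i
        # Bubble a[i] backwards into the sorted prefix a[0..i-1].
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1

def bubble_sort(a):
    n = len(a)
    for i in range(n - 1):
        swapped = False
        # Bubble the maximum of the unsorted section to position n-1-i.
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:        # early termination: already sorted
            break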
In bubble sort, the ith iteration performs n-i-1 inner iterations, about (n^2)/2 in total; in insertion sort, the ith step performs at most i inner iterations, but only about i/2 on average, since the inner loop can stop early once the correct position for the current element is found. That gives (the sum from 0 to n) / 2, which is about (n^2)/4 in total.
That's why insertion sort is faster than bubble sort.
Another difference I didn't see mentioned here:
Bubble sort has 3 value assignments per swap:
you first have to save the value you want to push forward in a temporary variable (no. 1), then you have to write the other swap variable into the spot you just saved the value of (no. 2), and then you have to write your temporary variable into the other spot (no. 3).
You have to do that for each spot the value moves forward on its way to its correct position.
With insertion sort you put the value to sort in a temporary variable and then shift all elements in front of that spot one spot backwards until you reach the correct spot for your value. That makes 1 value assignment per spot. At the end you write your temp variable into that spot.
That makes far fewer value assignments, too.
This isn't the strongest speed benefit, but I think it can be mentioned.
I hope I have expressed myself understandably; if not, sorry, I'm not a native English speaker.
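A sketch of the shifting variant described in this answer, assuming a mutable array (the function name is mine): the held value is written once at the end, so each shifted element costs a single assignment instead of a three-assignment swap.

def insertion_sort_shift(a):
    for i in range(1, len(a)):
        tmp = a[i]             # hold the element being placed
        j = i
        while j > 0 and a[j - 1] > tmp:
            a[j] = a[j - 1]    # one assignment per shifted spot
            j -= 1
        a[j] = tmp             # write the held value once at the end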
The main advantage of insertion sort is that it's an online algorithm. You don't have to have all the values at the start. This can be useful when dealing with data coming from a network, or from some sensor.
I have a feeling that this would be faster than conventional n log(n) algorithms in that setting, because re-sorting after each arrival would cost n*(n log(n)): reading/storing each value from the stream is O(n), and sorting all the values each time is O(n log(n)), resulting in O(n^2 log(n)).
On the contrary, insertion sort needs O(n) for reading values from the stream and O(n) to put each value in its correct place, so it's O(n^2) only. Another advantage is that you don't need a buffer for storing unsorted values; you sort them in the final destination.
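A sketch of this online property (helper name mine): each arriving value is placed into an already-sorted buffer as it comes in, so the length of the stream never needs to be known in advance.

import bisect

def sort_stream(stream):
    # stream can be any iterable of comparable values.
    out = []
    for x in stream:
        bisect.insort(out, x)   # O(log n) search + O(n) shift per item
    return out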
Bubble sort is not online (it cannot sort a stream of inputs without knowing how many items there will be) because it does not really keep track of a global maximum among the sorted elements. When an item arrives, you would need to start the bubbling from the very beginning.
Well, bubble sort is better than insertion sort only when someone is looking for the top k elements from a large list of numbers,
i.e. in bubble sort, after k iterations you'll get the top k elements. However, after k iterations in insertion sort, it only assures that those k elements are sorted.
Though both sorts are O(N^2), the hidden constants are much smaller in insertion sort. Hidden constants refer to the actual number of primitive operations carried out.
When does insertion sort have a better running time?
The array is nearly sorted: notice that insertion sort does fewer operations in this case than bubble sort.
The array is relatively small: in insertion sort you move elements around to place the current element, which is cheap when the number of elements is small.
Notice that insertion sort is not always the best choice. To get the best of both worlds, you can use insertion sort if the array is small, and merge sort (or quicksort) for larger arrays.
Number of swap in each iteration
Insertion-sort does at most 1 swap in each iteration.
Bubble-sort does 0 to n swaps in each iteration.
Accessing and changing sorted part
Insertion-sort accesses (and changes when needed) the sorted part to find the correct position of the number in consideration.
When optimized, Bubble-sort does not access what is already sorted.
Online or not
Insertion-sort is online. That means insertion-sort takes one input at a time and puts it in its appropriate position. It does not have to compare only adjacent inputs.
Bubble-sort is not online. It does not operate on one input at a time; it handles a group of inputs (if not all) in each iteration. Bubble-sort only compares and swaps adjacent inputs in each iteration.
Insertion sort:
1. Swapping is not required (elements are shifted instead).
2. The time complexity is Ω(n) in the best case and O(n^2) in the worst case.
3. Less complex as compared to bubble sort.
4. Example: inserting books into a library shelf, arranging cards.
Bubble sort:
1. Swapping is required.
2. The time complexity is Ω(n) in the best case (with the early-exit optimization) and O(n^2) in the worst case.
3. More complex as compared to insertion sort.
I will try to give a more concise and informative answer than others.
Yes, after each pass, insertion sort and bubble sort intuitively seem the same - they both build a sorted sublist at the edge.
However, insertion sort will perform fewer comparisons in general. With insertion sort, we are only performing a linear search in the sorted sublist with each pass. With random data, you can expect to make m/2 comparisons and swaps, where m is the size of the sorted sublist.
With bubble sort, we are always comparing EVERY pair in the unsorted sublist with each pass, so that's n-m comparisons (twice as many as insertion sort on random data). This means bubble sort is bad if comparisons are expensive/slow.
Also, the branching associated with swaps and compares for insertion sort is more predictable. We do a linear search at the same time as a linear insert, and we can generally predict/assume that the linear search/insert will continue until the correct space is found. With bubble sort, branching is essentially random, and we can expect a branch miss half the time! With every single compare! This means bubble sort is bad for pipelined processors if comparisons and swaps are relatively cheap/fast.
These factors make bubble sort much slower in general than insertion sort.
Insertion Sort: We insert the elements into their proper positions in the array, one at a time. When we reach the nth element in the array, the n-1 elements are sorted.
Bubble Sort: We start with a bubble of one element and keep extending the bubble by a quantity of 1, until all elements are added. At any iteration, we simply swap the adjacent elements in the proper order so as to get the largest element at the end of the bubble. In this way, we keep on putting the largest element at the end of the array, and finally after all iterations our sorting is done.
Bubble Sort and Insertion sort complexity: O(n^2)
Insertion is faster as compared to Bubble sort, for the following reason:
Insertion sort just compares an element to a sorted array, that is, the ith element to the array containing elements 1...i-1, which are sorted already. Therefore, there are fewer comparisons and swaps.
In bubble sort, however, as the bubble grows, the same iteration of comparing each pair of neighbors runs again. This leads to many more comparisons and swaps compared to insertion sort.
Therefore, even though the time complexity of both algorithms is O(n^2), insertion sort results in a faster approach than bubble sort.
Insertion sort can be summed up as: "Look for the element which should be at the first position (the minimum), make some space by shifting the next elements, and put it at the first position. Good. Now look at the element which should be at the 2nd..." and so on.
Bubble sort operates differently, which can be summed up as: "As long as I find two adjacent elements which are in the wrong order, I swap them."
Bubble sort is almost useless under all circumstances. In use cases where insertion sort may have too many swaps, selection sort can be used instead, because it guarantees fewer than N swaps. Because selection sort is better than bubble sort, bubble sort has no use cases.

Is it possible to find two numbers whose difference is minimum in O(n) time

Given an unsorted integer array, and without making any assumptions on the numbers in the array: is it possible to find two numbers whose difference is minimum in O(n) time?
Edit: Difference between two numbers a, b is defined as abs(a-b)
Finding the smallest and largest element in the list gives the difference largest-smallest, which is the maximum difference, not the minimum.
If you're looking for a nonnegative difference, then this is of course at least as hard as checking if the array has two equal elements. This is called the element uniqueness problem and, without any additional assumptions (like limiting the size of the integers, or allowing operations other than comparison), requires Ω(n log n) time. It is the 1-dimensional case of finding the closest pair of points.
I don't think you can do it in O(n). The best I can come up with off the top of my head is to sort them (which is O(n * log n)) and then find the minimum difference of adjacent pairs in the sorted list (which adds another O(n)).
I think it is possible. The secret is that you don't actually have to sort the list, you just need to create a tally of which numbers exist. This may count as "making an assumption" from an algorithmic perspective, but not from a practical perspective. We know the ints are bounded by a min and a max.
So, create an array of 2 bit elements, 1 pair for each int from INT_MIN to INT_MAX inclusive, set all of them to 00.
Iterate through the entire list of numbers. For each number in the list, if the corresponding 2 bits are 00 set them to 01. If they're 01 set them to 10. Otherwise ignore. This is obviously O(n).
Next, if any of the 2-bit pairs is set to 10, that is your answer: the minimum distance is 0 because the list contains a repeated number. If not, scan through the tally array and find the minimum distance between set entries. Many people have already pointed out there are simple linear-scan algorithms for this.
So O(n) + O(n) = O(n), treating the fixed integer range as a constant.
Edit: responding to comments.
Interesting points. I think you could achieve the same results without making any assumptions by finding the min/max of the list first and using a sparse array ranging from min to max to hold the data. Takes care of the INT_MIN/MAX assumption, the space complexity and the O(m) time complexity of scanning the array.
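A sketch of the tally idea with that min/max refinement, assuming at least two input values; it runs in O(n + M) time and O(M) space, where M = max - min (the function name is mine).

def min_difference(a):
    lo, hi = min(a), max(a)
    seen = [False] * (hi - lo + 1)
    for x in a:
        if seen[x - lo]:         # second occurrence: duplicate found
            return 0
        seen[x - lo] = True
    best, prev = hi - lo, None
    for v in range(lo, hi + 1):  # scan the tally range: O(M)
        if seen[v - lo]:
            if prev is not None:
                best = min(best, v - prev)
            prev = v
    return best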
The best I can think of is to counting-sort the array (possibly combining equal values) and then do the sorted comparisons -- bin sort is O(n + M) (M being the size of the value range). This has a heavy memory requirement, however. Some form of bucket or radix sort would be intermediate in time and more efficient in space.
Sort the list with radixsort (which is O(n) for integers), then iterate and keep track of the smallest distance so far.
(I assume your integer is a fixed-bit type. If they can hold arbitrarily large mathematical integers, radixsort will be O(n log n) as well.)
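A sketch of this approach for non-negative 32-bit integers (assumptions: fixed-width keys and at least two values; the names are mine): an LSD radix sort in base 256, followed by one linear scan over adjacent pairs.

def min_diff_radix(a):
    for shift in range(0, 32, 8):             # 4 passes for 32-bit keys
        buckets = [[] for _ in range(256)]
        for x in a:
            buckets[(x >> shift) & 0xFF].append(x)
        a = [x for b in buckets for x in b]   # stable gather
    return min(y - x for x, y in zip(a, a[1:]))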
It seems to be possible to sort an unbounded set of integers in O(n*sqrt(log(log(n)))) time. After sorting, it is of course trivial to find the minimal difference in linear time.
But I can't think of any algorithm to make it faster than this.
No, not without making assumptions about the numbers/ordering.
It would be possible given a sorted list though.
I think the answer is no, and the proof is similar to the proof that you cannot sort faster than n lg n: you have to compare all of the elements, i.e. create a comparison tree, which implies an Ω(n lg n) lower bound.
EDIT. OK, if you really want to argue, then the question does not say whether it should be a Turing machine or not. With quantum computers, you can do it in linear time :)
