Simple Algorithm Question - algorithm

Watching the free MIT Algorithms course on iTunesU and Im stuck on the first lecture.
Take an insertion sort, its time is really T(n/2) in worst-case (reversed order array/list), but they say that this is theta n squared. I would thought this would be theta n. Im lost how they say this is n squared. Im stuck how they jump to concluding this is n squared, Wikipedia is not helping either. Can someone dumb it down further?

Insertion-sorting an array of 4 elements that starts in reverse order:
4 3 2 1
first, insert the "4" into its proper position in an array of length 1 (i.e. do nothing).
next, insert the "3" into its proper position in an array of length 2:
3 4 2 1
(we had to move the 3 and the 4)
next, insert the "2" into its proper position in an array of length 3:
2 3 4 1
(we had to move the 2, the 3 and the 4)
next, insert the "1"
1 2 3 4
(we had to move the 1, the 2, the 3 and the 4)
We performed n steps, and each step k required moving k elements (or k-1 swaps, depending how you want to look at it). The sum of k from 1 to n is Theta(n^2).
In the case of a simple linked list structure[*], we can move an object into its proper place in O(1), but in general finding the proper place still requires a linear search through the part of the data that's already sorted, so it still ends up only O(n^2) for general input. A basic insertion sort for a linked list happens to deal very well with reverse-ordered data, though, because it happens to always find the correct insertion position immediately. So we get n steps of O(1) each for a total O(n) running time for this particular data.
Assuming we still choose the first unsorted element to insert, and that we search forward through the sorted part of the list at each step, then the worst case of insertion sort for a list is already-sorted data, and is Theta(n^2) again.
[*] meaning, nothing fancy like a skip list.

From wikipedia:
The worst case input is an array
sorted in reverse order. In this case
every iteration of the inner loop will
scan and shift the entire sorted
subsection of the array before
inserting the next element. For this
case insertion sort has a quadratic
running time (i.e., O(n2)).
First loop is iterating on the array/list to sort, inner loop iterates on the partially sorted array/list. If it's already sorted you can see that you iterate all the way to the end of the sorted container every time.
Here's more explanation in pseudo:
for element in unsorted_container
for current_element in sorted_container
if element < current_element -> Will never happen since sorted in reverse order.
InsertBefore(element, current_element)
if element not inserted
InsertAtEnd(element) <- Will always execute this part since it will always insert at end.

Related

Bubble sort variant - three adjacent number swapping

This problem appeared in code jam 2018 qualification round which has ended.
https://codejam.withgoogle.com/2018/challenges/ (Problem 2)
Problem description:
The basic operation of the standard bubble sort algorithm is to examine a pair of adjacent numbers and reverse that pair if the left number is larger than the right number. But our algorithm examines a group of three adjacent numbers, and if the leftmost number is larger than the rightmost number, it reverses that entire group. Because our algorithm is a "triplet bubble sort", we have named it Trouble Sort for short.
We were looking forward to presenting Trouble Sort at the Special
Interest Group in Sorting conference in Hawaii, but one of our interns
has just pointed out a problem: it is possible that Trouble Sort does
not correctly sort the list! Consider the list 8 9 7, for example.
We need your help with some further research. Given a list of N
integers, determine whether Trouble Sort will successfully sort the
list into non-decreasing order. If it will not, find the index
(counting starting from 0) of the first sorting error after the
algorithm has finished: that is, the first value that is larger than
the value that comes directly after it when the algorithm is done.
So a naive approach will be to apply trouble sort on the given list, apply normal sort on the list, and find the index of the first non-matching element. However, this would time out for very large N.
Here is what I figured:
The algorithm will compare 0th index with 2nd, 2nd with 4th and so on.
Similarly 1st with 3rd, 3rd with 5th and so on.
All the elements at odd index will be sorted with respect to odd index. Same for even indexed element.
So the issue would lie between two consecutive odd/even indexed element.
I can't think of a way to figure it out without doing an O(n^2) approach.
Is my approach any viable, or there is something easier?
Your observation is spot on. The algorithm presented in the problem statement will only compare( and swap ) the consecutive odd and even elements among themselves.
If you take that observation one step further, you can state that Trouble Sort is an algorithm that correctly sorts odd- and even-indexed elements of an array within themselves. (i.e. as if odd-indexed elements and even-indexed elements of an array A are two separate arrays B and C)
In other words, Trouble Sort does sort B and C correctly. The issue here is whether those arrays B and C of odd and even-indexed elements can be merged properly. You should check if sorting odd- and even-indexed elements among themselves is enough to make the entire array sorted.
This step is really similar to the merging step of MergeSort. The only difference is that, due to the indexing being a limiting factor on your operation, you know at all times from which array you will pick the top element. For a 1-indexed array A, during the merging step of B and C, at each step, you should pick the smallest previously unpicked element from B, and then C.
So, basically, if you sort B and C, which takes, O(NlogN) using an algorithm such as mergesort or heapsort, and then merge them in the manner described in the previous paragraph, which takes O(N), you end up with the same version of the array A after it has been processed by the Trouble Sort algorithm.
The difference is the time complexity. While Trouble Sort takes O(N^2) time, the operations described above takes O(NlogN) time. Once you end up with this array, then you can check in O(N) time if, for each consecutive indices i, j, A[i] < A[j] holds. The overall complexity of the algorithm would still be O(NlogN).
Below is a code sample in Python to demonstrate sort of a pseudocode of the algorithm I described above. There are a couple of minor differences in implementation due to Python arrays being 0-indexed. You may observe the execution of this code here.
def does_trouble_sort_work(A):
B, C = A[0::2], A[1::2]
B_sorted = sorted(B)
C_sorted = sorted(C)
j = k = 0
for i in xrange(len(A)):
if i % 2 == 0:
A[i] = B_sorted[j]
j += 1
else:
A[i] = C_sorted[k]
k += 1
trouble_sort_works = True
for i in xrange(1, len(A)):
if A[i-1] > A[i]:
trouble_sort_works = False
break
return trouble_sort_works

Find element of an array that appears only once in O(logn) time

Given an array A with all elements appearing twice except one element which appears only once. How do we find the element which appears only once in O(logn) time? Let's discuss two cases.
Array is always sorted and elements are in sequential order. Let's assume A = [1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6], we want to find 3 in log n time because it appears only once.
When the array is not sorted and the elements are not in sequential order.
I can only come up with a solution of using the XOR operator on the binary representation of the integers as explained Here, and at the end, the binary string will represent the element which appears only once because duplicates will cancel out. But it takes O(n) time. How can we do better than that?
using Haroon S' comment this is the solution which I think is correct, given the constraints for time.
class Solution:
def singleNonDuplicate(self, nums: List[int]) -> int:
low = 0
high = len(nums)-1
while(low<high):
mid = (low+high)//2
if(mid%2==0):
mid+=1
if(nums[mid]==nums[mid+1]):
# answer in second half
high = mid-1
elif(nums[mid]==nums[mid-1]):
# answer in first half
low = mid+1
return nums[low]
If the elements are sorted (i.e., the first case you mentioned) then I believe a strategy not unlike binary search could work in O(logN) time.
Starting from the left endpoint in a sorted array, until we encounter the unique element, all the index pairs (2i, 2i + 1) we encounter along the way will have the same value. (i.e., due to the array being sorted) However, as we go towards the right endpoint of the array, as soon as we consider an array that includes the unique element, that structure of "same values within (2i, 2i+1) index pairs" will be invalid.
Using that information, a search algorithm similar to binary search can find out in which half of the array the unique element is. Basically, you can deduce that, "in the left half of the array, if the values in the rightmost index pair (2i, 2i+1) are the same, then the unique value is in the right half". (i.e., with the exception of the last index on the left half-array being even; but you can overcome that case with various O(1) time operations)
The overall complexity then becomes O(logN), due to the halving of the array size at each step.
For the demonstration of the index notion I mentioned above, see your own example. In the left of the unique element(i.e. 3) all index pairs (2i, 2i+1) have the same values. And all subarrays starting from index 0 and ending with an index that is to the right of the unique element, all index pairs (2i, 2i+1) have a correspond to cells that contain different values.
Unless the array is sorted, though, since you'd have to investigate each and every element, I believe any algorithm you may come up with would take at least O(n) time. This is what I think will happen in the second case you mention in your question.
In the general case this is impossible, as to make sure an element doesn't repeat you need to check every other element.
From your example, it seems the array might be a sorted sequence of integers with no "gaps" (or some other clearly defined sequence, like all even numbers, etc). In this case it is possible with a modified binary search.
You have the array [1,1,2,2,3,4,4,5,5,6,6].
You check the middle element and the element following it and see 3 and 4. Now you know there are only 5 elements from the set {1, 2, 3}, while there are 6 elements from the set {4, 5, 6}. Which means, the missing elements is in {1, 2, 3}.
Then you recurse on [1,1,2,2,3]. You see 2,2. Now you know there are 2 "1" elements and 1 "3" element, so 3 is the answer.
The reason you check 2 elements in each step is that if you see just "3", you don't know whether you hit the first 3 in "3,3" or the second one. But if you read 2 elements you always find a "boundary" between 2 different elements.
The condition for this to be viable is that, given the value of an element, you need to be able to calculate in O(1) how many different elements come before this element. In your case this is trivial, but it is also possible for any arithmetic series, geometric series (with fixed size numbers)...
This is not a O(log n) solution. I have no idea how to solve it in logarithmic time without the constraints that the array is sorted and we have a known difference between consecutive numbers so we can recognise when we are to the left or right of the singleton. The other solutions already deal with that special case and I couldn’t do better there either.
I have a suggestion that might solve the general case in O(n), rather than O(n log n) when you first sort the array. It’s not as fast as the xor solution, but it will also work for non-integers. The elements must have an order, so it is not completely general, but it will work anywhere you can sort the elements.
The idea is the same as the k’th order element algorithm based on Quicksort. You partition and recurse on one half of the array. The time recurrence is T(n) = T(n/2) + O(n) = O(n).
Given array x and indices i,j, representing sub-array x[i:j], partition with quicksort’s partitioning method. You want a variant that partitions x[i:j] into three segments, x[i:k] x[k:l], x[l:j] where all elements in the first part are smaller than the pivot (whatever it is) all elements in x[k:l] are equal to the pivot, and all elements in the last segment are greater than the pivot.
(you might be able to use a version that only partitions in two, or explicitly count the number of pivots, but with this version is easier to work with here)
Now, if the middle segment has length one, you have your singleton. It is the pivot.
If not, the length of the segment that has the singleton is odd while the other is even. So recurse on the segment with the odd length.
It doesn’t give you worst case linear time, for the same reason that Quicksort isn’t worst case log-linear, but you get an expected linear time algorithm and likely a fast one at that.
Not, of course, as fast as those solutions based on binary search, but here the elements do not need to be sorted and we can handle elements with arbitrary gaps between them. We are also not restricted to data where we can easily manipulate their bit-patterns. So it is more general. If you can compare the elements, this approach will find the singleton in O(n).
This solution will find the element in the array that appeared only once but there should not be more than one element of that type and the array should be sorted. This is Binary Search and will return the element in O(log n) time.
var singleNonDuplicate = function(nums) {
let s=0,e= nums.length-1
while(s < e){
let mid = Math.trunc(s+(e-s)/2)
if((mid%2 == 0&& nums[mid] ==nums[mid+1])||(mid%2==1 && nums[mid] == nums[mid-1]) ){
s= mid+1
}
else{
e = mid
}
}
return nums[s] // can return nums[e] also
};
I don't believe there is a O(log n) solution for that. The reason is that in order to find which element is appearing only once, you at least need to iterate over the elements of that array once.

Binary search in 2 sorted integer arrays

There is a big array which consists of 2 small integer arrays written one at the end of another. Both small arrays are sorted by ascending. We have to find an element in big array as fast, as possible. My idea was to find the end of the left array by binsearch in big array and then implement 2 binsearches on small arrays. The problem is that I don't know how to find that end. If you have an idea, how to find element without finding borders of smaller arrays, you're welcome!
Information about arrays: both small arrays have integer elements, both are sorted by ascending, they both can have length from 0 to any positive integer number, but there can be only one copy of an element.
Here are some examples of big arrays:
1 2 3 4 5 6 7 (all the elements of the second array are bigger, than the maximum of the first array)
100 1 (both arrays have only one element)
1 3 5 2 4 6 or 2 4 6 1 3 5 (most common situations)
This problem is impossible to solve in guaranteed time complexity faster than O(n) and not possible to solve at all for certain arrays. Binary search runs in O(log n) for a sorted array, but the big array is not guaranteed to be sorted and will in the worst-case require one or more comparisions per element, which is O(n). The best guaranteed time complexity is O(n) with the trivial algorithm: compare every item with its neighbour until you find the "turning point" with A[i] > A[i+1]. However, if you use a breadth-first search, you may get lucky and find the "turning point" early.
Proof that the problem is unsolvable for some arrays: let the array M = [A B] be our big array. To find the point where the arrays meet we're looking for an index i where M[i] > M[i+1]. Now let A=[1 2 3] and B=[4 5]. There is no index in the array M for which the condition holds true, thus the problem is unsolvable for some arrays.
Informal proof for the former: let M=[A B] and A=[1..x] and B=[(x+1)..y] be two sorted arrays. Then swap the positions of element x and y in M. We have no way of finding the index of x without (in the worst case) checking every index, thus the problem is O(n).
Binary search relies on being able to eliminate half the solution space with each comparision, but in this case we cannot eliminate anything from the array and so we cannot do better than a linear search.
(From a practical standpoint, you should never do this in a program. The two arrays should be separate. If this isn't possible, append the length of either array to the bigger array.)
Edit: changed my answer after question was updated. It's possible to do it faster than linear time for some arrays, but not all possible arrays. Here's my idea for an algorithm using breadth-first search:
Start with the interval [0..n-1] where n is the length of the big array.
Make a list of intervals and put the starting interval in it.
For each interval in the list:
if the interval is only two elements and the first element is greater than the last
we found the turning point, return it
else if the interval is two elements or less
remove it from the list
else if the first element of the interval is greater than the last
turning point is in this interval
clear the list
split this interval in two equal parts and add them to the list
else
split this interval in two equal parts and replace this interval in the list with the two parts
I think a breadth-first approach will increase the odds of finding an interval where A[first] > A[last] early. Note that this approach will not work if the turning point is between two intervals, but it's something to get you started. I would test this myself, but unfortunately I don't have the time now.

Sequence of Some Algorithms on Sorting

i see in some midterm or final exam on MIT that the following question repeat and repeat in same manner.
we show an array in the some step of one sorting algorithm.
5,3,1,9,8,2,4,7
2,3,1,4,5,8,9,7
1,2,3,4,5,8,9,7
1,2,3,4,5,8,7,9
1,2,3,4,5,7,8,9
which of Insertion Sort / Quick Sort / Merge Sort / Exchange Sort is used?
how i find solution of this Questions? ?
Edit: i think this is quick sort because each level some elements is lower than pivot and some elements is greater that pivot ....
In such cases you can either a) find some pattern if you think there is one or b) go with simple elimination. Let's try elimination:
1) it cannot be insertion sort as insertion sort starts from the beginning and treats the range [0,k] as a sorted subarray of already checked values. Then it continues one by one so we first would insert 3 before 5 etc as we would at first treat [5] as a sorted subarray of size 1 and insert 3 into it as it's the next value in the whole array.
2) Merge sort would sort neighbor first as it would first recursively treat the whole array as single element arrays and then go back up the recursion tree and merge neigbors so more like this:
[3,5],[1,9],[2,8],[4,7]
[1,3,5,9],[2,4,7,8]
[1,2,3,4,5,6,7,8]
[] shows which parts were sorted at each step.
This means that after one pass neighbors will be sorted.
3) exchange sort would also have a different ordering - the second line should start with 3 as you would swap 5 and 3, then 5 and 1 etc in the first pass. So after one pass we would go from 5,3,1,9,8,2,4,7 into 3,1,5,8,2,4,7,9 if my bubble sort serves me right. We compare each pair and swap if element at i+1 is greater that at i. This way the last element will be the largest.
4) as you fairly pointed out this is quick sort as in each step we can clearly see that the array is getting pivoted around a certain value 4, then you pivot the left half around 2 and the right half around 5 etc.
The parts in bold are the patterns I was talking about, now since you know them you can easily check which one it is :-)
It should be quick sort, not only because the evidence of partition, but also this interesting fact: for some level, only one part of the array changed.
Now let's discuss each algorithm:
Insertion sort will give you a pattern that the first few elements must be sorted, but obviously we don't have this pattern;
Bubble sort (exchange sort) will keep exchanging neighbors if the former element is bigger than the later element, and thus the last k elements will be sorted after k iterations. Based on these two facts, we won't have a pair of neighbor (a, b) that b < a exists after each iteration. However, the sequence doesn't follow this, say the term (3, 1) in the first sequence still exists in the second sequence.
Merge sort first splits the array into 2 + 2 + 2 subarrays and then merge it into 4 + 4 and finally a sorted array of 8 elements, so totally should take 3 steps, but we have 4 steps here, so won't be merge sort.

From the given array of numbers find all the of numbers in group of 3 with sum value N

Given is a array of numbers:
1, 2, 8, 6, 9, 0, 4
We need to find all the numbers in group of three which sums to a value N ( say 11 in this example). Here, the possible numbers in group of three are:
{1,2,8}, {1,4,6}, {0,2,9}
The first solution I could think was of O(n^3). Later I could improve a little(n^2 log n) with the approach:
1. Sort the array.
2. Select any two number and perform binary search for the third element.
Can it be improved further with some other approaches?
You can certainly do it in O(n^2): for each i in the array, test whether two other values sum to N-i.
You can test in O(n) whether two values in a sorted array sum to k by sweeping from both ends at once. If the sum of the two elements you're on is too big, decrement the "right-to-left" index to make it smaller. If the sum is too small, increment the "left-to-right" index to make it bigger. If there's a pair that works, you'll find them, and you perform at most 2*n iterations before you run out of road at one end or the other. You might need code to ignore the value you're using as i, depends what the rules are.
You could instead use some kind of dynamic programming, working down from N, and you probably end up with time something like O(n*N) or so. Realistically I don't think that's any better: it looks like all your numbers are non-negative, so if n is much bigger than N then before you start you can quickly throw out any large values from the array, and also any duplicates beyond 3 copies of each value (or 2 copies, as long as you check whether 3*i == N before discarding the 3rd copy of i). After that step, n is O(N).

Resources