No. of comparisons in merging lists - sorting

I am confused about the best case, the average case and the worst case for the number of comparisons done by the merge function in merge sort. I have the following questions:
1. If I pass in a list of unsorted elements that gives the worst-case time in merge sort, will it also give the maximum number of comparisons?
2. Suppose I pass in a list of size N that takes the maximum time to be sorted. The number of comparisons in this case would be N-1. Am I right?
3. What will be the best case for the number of comparisons?
Someone please help me. I studied too much and am now confused.

There are three steps to merge sort:
Split the list (approximately in half). (If there's only one element, you're done).
Sort each list
Merge the two sorted lists.
Only the merge operation requires comparisons, and in every case you need to do O(n) comparisons.
The best case for comparisons is when the list is already ordered; in that case, you'll be comparing each element of the first list to the first element of the second list:
[1, 2, 3, 4] ++ [5, 6, 7, 8]:
1 > 5
2 > 5
3 > 5
4 > 5
After those 4 comparisons, the sorting is done.
The worst case is alternating elements:
[1, 3, 5, 7] ++ [2, 4, 6, 8]
1 > 2 -> Result = [1]
3 > 2 -> Result = [1, 2]
4 > 3 -> Result = [1, 2, 3]
5 > 4 -> Result = [1, 2, 3, 4]
6 > 5 -> Result = [1, 2, 3, 4, 5]
7 > 6 -> Result = [1, 2, 3, 4, 5, 6]
8 > 7 -> Result = [1, 2, 3, 4, 5, 6, 7, 8] (Once the first list is finished, no more comparisons are necessary.)
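However, while each merge operation is O(n) (n/2 comparisons in the best case and n-1 in the worst case when merging into a list of n elements), the list is halved about log n times, so the total number of comparisons over the whole sort is O(n log n).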
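The counts above can be checked with a small sketch. The function names `merge_count` and `merge_sort_count` are made up for this illustration, and the code assumes a plain top-down merge sort that stops comparing as soon as one list is exhausted.

```python
def merge_count(left, right):
    """Merge two sorted lists; return (merged list, number of comparisons)."""
    merged = []
    comparisons = 0
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted: the rest is copied without comparisons.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


def merge_sort_count(values):
    """Sort a list; return (sorted list, total comparisons over all merges)."""
    if len(values) <= 1:
        return list(values), 0
    mid = len(values) // 2
    left, left_count = merge_sort_count(values[:mid])
    right, right_count = merge_sort_count(values[mid:])
    merged, merge_comparisons = merge_count(left, right)
    return merged, left_count + right_count + merge_comparisons
```

With the two examples above, `merge_count([1, 2, 3, 4], [5, 6, 7, 8])` uses 4 comparisons and `merge_count([1, 3, 5, 7], [2, 4, 6, 8])` uses 7, while sorting the already ordered list `[1, ..., 8]` takes 12 comparisons in total across all merges.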

Related

How to optimize this for loop faster than O(N^3)?

My for loop prints all the contiguous subsequences of a list. For example, suppose a list contains [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. It prints:
0
0,1
0,1,2
0,1,2,3
........
0,1,2,3,4,5,6,7,8,9
1
1,2
1,2,3
1,2,3,4,5,6,7,8,9
........
8
8,9
9
for i in range(10):
    for j in range(i, 10):
        subseq = []
        for k in range(i, j+1):
            subseq.append(k)
        print(subseq)
The current algorithmic complexity of this for loop is O(N^3). Is there any way to make this algorithm any faster?
I don't know Python (this is Python, right?), but something like this will be a little faster version of O(N^3) (see comments below):
for i in range(10):
    subseq = []
    for j in range(i, 10):
        subseq.append(j)
        print(subseq)
Yes, that works:
[0]
[0, 1]
[0, 1, 2]
[0, 1, 2, 3]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4, 5]
[0, 1, 2, 3, 4, 5, 6]
[0, 1, 2, 3, 4, 5, 6, 7]
[0, 1, 2, 3, 4, 5, 6, 7, 8]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[1]
[1, 2]
...
[7, 8]
[7, 8, 9]
[8]
[8, 9]
[9]
It's not possible to do this in less than O(n^3) time because you're printing a total of Θ(n^3) items. Specifically, split the array into quarters and look at the middle two quarters of the array. Pick any element there, say the one at position k. It will be printed in at least n^2 / 16 different subarrays: pick any start in the first quarter (n / 4 choices) and any end in the last quarter (n / 4 choices), and the subarray between those elements will contain the element at position k.
This means that each of the n / 2 items in the middle two quarters gets printed at least n^2 / 16 times, so you print at least n^3 / 32 total values. There's no way to do that in better than O(n^3) time.
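As a quick sanity check of the cubic growth, the sketch below counts how many list items are printed in total without building the subsequences. The function name `count_printed_items` is hypothetical; the closed form n(n+1)(n+2)/6 comes from summing len * (n - len + 1) over all lengths.

```python
def count_printed_items(n):
    """Total number of items printed over all contiguous subsequences of range(n)."""
    total = 0
    for i in range(n):
        for j in range(i, n):
            # The subsequence from i to j inclusive has j - i + 1 items.
            total += j - i + 1
    return total


def closed_form(n):
    """Closed form of the same count: n(n+1)(n+2)/6."""
    return n * (n + 1) * (n + 2) // 6
```

For the 10-element list in the question this gives 220 printed items, and the count grows roughly as n^3 / 6.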

No. of elements sorted after m passes of insertionsort?

I am learning algorithms and Insertion sort. While solving the quiz, I came across the questions:
In insertion sort, after M passes through the array, How many elements are in sorted order?
With my understanding, I wrote M+1 as the answer. But that turned out to be wrong.
Actual answer is : First M elements are in sorted order after M passes of insertion sort on the array.
Why is this so? This is what I thought:
Say the input array 5, 4, 3, 2, 1 is given to insertion sort. After each iteration, the result will look like:
Initial Input ==> after 1st iteration ==> after 2nd ==> after 3rd ==> after 4th
[5, 4, 3, 2, 1] ==> [4, 5, 3, 2, 1] ==> [3, 4, 5, 2, 1] ==> [2, 3, 4, 5, 1] ==> [1, 2, 3, 4, 5]
Here, after the 2nd iteration, we get [3, 4, 5, 2, 1], in which the elements 3, 4, 5 are sorted. So after 2 passes, 3 elements are sorted. Why, then, was the answer M?
I tried finding answer from the internet, but there are no reliable resources / no explanations given. What am I missing here?
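The trace in the question can be reproduced with a short sketch. It assumes the convention that pass p inserts the element at index p (0-based), so the first pass already compares two elements; the helper name `insertion_sort_passes` is made up for this illustration. A source that numbers passes differently, for example by counting a first pass over the first element alone, would shift every count by one.

```python
def insertion_sort_passes(values):
    """Return the array state after each pass of insertion sort."""
    arr = list(values)
    states = []
    for p in range(1, len(arr)):
        key = arr[p]
        k = p - 1
        # Shift larger elements of the sorted prefix one slot to the right.
        while k >= 0 and arr[k] > key:
            arr[k + 1] = arr[k]
            k -= 1
        arr[k + 1] = key
        # After this pass, the first p + 1 elements are in sorted order.
        states.append(list(arr))
    return states
```

Under this numbering, `insertion_sort_passes([5, 4, 3, 2, 1])[1]` is `[3, 4, 5, 2, 1]`, matching the state shown after the 2nd iteration.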

Is there a data structure for effective implementation of this encryption algorithm?

input -> alphabet -> output (index of a number in alphabet) -> new alphabet (the number moved to the begin of the alphabet):
3 -> [1, 2, 3, 4, 5] -> 3 -> [3, 1, 2, 4, 5]
2 -> [3, 1, 2, 4, 5] -> 3 -> [2, 3, 1, 4, 5]
1 -> [2, 3, 1, 4, 5] -> 3 -> [1, 2, 3, 4, 5]
1 -> [1, 2, 3, 4, 5] -> 1 -> [1, 2, 3, 4, 5]
4 -> [1, 2, 3, 4, 5] -> 4 -> [4, 1, 2, 3, 5]
5 -> [4, 1, 2, 3, 5] -> 5 -> [5, 4, 1, 2, 3]
input: (n - number of numbers in alphabet, m - length of text to be encrypted, the text)
5, 6
3 2 1 1 4 5
Answer: 3 2 1 1 4 5 -> 3 3 3 1 4 5
Is there any data structure or algorithm to make this efficiently, faster than O(n*m)?
I'd appreciate any ideas. Thanks.
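For reference, the scheme in the question is the move-to-front transform. A straightforward O(n*m) baseline might look like the sketch below; the function name `mtf_encode_naive` is hypothetical, and positions are reported 1-based as in the example.

```python
def mtf_encode_naive(n, text):
    """Encode text over the alphabet 1..n with move-to-front, in O(n * m) time."""
    alphabet = list(range(1, n + 1))
    output = []
    for symbol in text:
        index = alphabet.index(symbol)   # linear scan, O(n)
        output.append(index + 1)         # report a 1-based position
        # Move the symbol to the front of the alphabet, O(n).
        alphabet.insert(0, alphabet.pop(index))
    return output
```

Calling `mtf_encode_naive(5, [3, 2, 1, 1, 4, 5])` reproduces the expected output `[3, 3, 3, 1, 4, 5]`.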
Use an order statistics tree to store the pairs (1,1)...(n,n), ordered by their first elements.
Look up the translation for a character c by selecting the c-th smallest element of the tree and taking its second element.
Then update the tree by removing the node that you looked up and inserting it back into the tree with the first element of the pair set to -t, where t is the position in the message (or some other steadily decreasing counter).
Lookup, removal and insertion can be done in O(ln n) worst-case time if a self-balancing search tree (e.g. a red-black tree) is used as the underlying structure for the order statistics tree.
Given that the elements for the initial tree are inserted in order, the tree structure can be built in O(n).
So the whole algorithm will be O(n + m ln n) time, worst-case.
You can further improve this for the case that n is larger than m, by storing only one node for any continuous range of nodes in the tree, but counting it for the purpose of rank in the order statistics tree according to the number of nodes there would normally be.
Starting then from only one actually stored node, when the tree is rearranged, you split the range-representing node into three: one node representing the range before the found value, one representing the range after the found value, and one representing the actual value. These three nodes are then inserted back. The range nodes are inserted only if they are non-empty, with the first pair element equal to the second; the non-range node is inserted with the negative value as described before. If the node that is found has a negative first entry, it is not split.
The result of this is that the tree will contain at most O(m) nodes, so the algorithm has a worst-time complexity of O(m ln min(n,m)).
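Python has no built-in order statistics tree, so the sketch below swaps in a Fenwick (binary indexed) tree, which is a different structure from the one described above but relies on the same idea of giving each accessed symbol a new, steadily decreasing position. It computes the index of each input value as in the question's worked example, and runs in O((n + m) log(n + m)) time. All names here (`mtf_encode_fenwick`, `slot_of`, `next_front`) are made up for this illustration.

```python
def mtf_encode_fenwick(n, text):
    """Move-to-front encoding over alphabet 1..n using a Fenwick tree."""
    m = len(text)
    size = n + m                  # m free slots in front, then the n initial slots
    tree = [0] * (size + 1)       # 1-based Fenwick tree of occupied slots

    def add(slot, delta):
        while slot <= size:
            tree[slot] += delta
            slot += slot & -slot

    def occupied_up_to(slot):
        total = 0
        while slot > 0:
            total += tree[slot]
            slot -= slot & -slot
        return total

    # Symbol v initially lives in slot m + v, so slots 1..m are free.
    slot_of = [0] * (n + 1)
    for value in range(1, n + 1):
        slot_of[value] = m + value
        add(slot_of[value], 1)

    output = []
    next_front = m                # next free slot in front of everything
    for symbol in text:
        # Rank of the symbol = number of occupied slots up to its own slot.
        output.append(occupied_up_to(slot_of[symbol]))
        add(slot_of[symbol], -1)
        slot_of[symbol] = next_front
        add(next_front, 1)
        next_front -= 1
    return output
```

On the example input, `mtf_encode_fenwick(5, [3, 2, 1, 1, 4, 5])` returns `[3, 3, 3, 1, 4, 5]`. The trade-off is O(n + m) memory for the slot array, which the range-compression trick described above avoids.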
Maybe a hashmap with letter/index pairs? I believe that element lookup in a hashmap is usually O(1), unless you have a lot of collisions (which is unlikely).

Find the maximum number of points per game

The input is an array of cards. In one move, you can remove any group of consecutive identical cards. For removing k cards, you get k * k points. Find the maximum number of points you can get per game.
Time limit: O(n^4)
Example:
Input: [1, 8, 7, 7, 7, 8, 4, 8, 1]
Output: 23
Does anyone have an idea how to solve this?
To clarify, in the given example, one path to the best solution is
Remove   Points   Total   New hand
3 7s     9        9       [1, 8, 8, 4, 8, 1]
1 4      1        10      [1, 8, 8, 8, 1]
3 8s     9        19      [1, 1]
2 1s     4        23      []
Approach
Recursion would fit well here.
First, identify the contiguous sequences in the array -- one lemma of this problem is that if you decide to remove at least one 7, you want to remove the entire sequence of three. From here on, you'll work with both cards and quantities. For instance,
card = [1, 8, 7, 8, 4, 8, 1]
quant = [1, 1, 3, 1, 1, 1, 1]
Now you're ready for the actual solving. Iterate through the array. For each element, remove that element, and add the score for that move.
Check to see whether the elements on either side match; if so, merge those entries. Recur on the remaining array.
For instance, here's the first turn of what will prove to be the optimal solution for the given input:
Choose and remove the three 7's
card = [1, 8, 8, 4, 8, 1]
quant = [1, 1, 1, 1, 1, 1]
score = score + 3*3
Merge the adjacent 8 entries:
card = [1, 8, 4, 8, 1]
quant = [1, 2, 1, 1, 1]
Recur on this game.
Improvement
Use dynamic programming: memoize the solution for every sub game.
Any card that appears only once in the card array can be removed first, without loss of generality. In the given example, you can remove the 7's and the single 4 to improve the remaining search tree.
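The recursion with memoization described above can be sketched as follows. This is an illustration of the approach rather than a tuned solution: the function name `max_points` and the state encoding (a tuple of `(card, quantity)` pairs) are assumptions, and the number of distinct sub-games it memoizes can grow exponentially, so it does not by itself guarantee the O(n^4) bound.

```python
from functools import lru_cache
from itertools import groupby


def max_points(cards):
    """Best total score for removing runs of identical cards (k cards score k * k)."""
    # Compress the hand into runs, e.g. [7, 7, 7] becomes (7, 3).
    runs = tuple((value, len(list(group))) for value, group in groupby(cards))

    @lru_cache(maxsize=None)
    def best(state):
        if not state:
            return 0
        result = 0
        for i, (_, count) in enumerate(state):
            left, right = state[:i], state[i + 1:]
            # Removing run i may bring two runs of the same card together.
            if left and right and left[-1][0] == right[0][0]:
                merged = (left[-1][0], left[-1][1] + right[0][1])
                rest = left[:-1] + (merged,) + right[1:]
            else:
                rest = left + right
            result = max(result, count * count + best(rest))
        return result

    return best(runs)
```

For the example hand, `max_points([1, 8, 7, 7, 7, 8, 4, 8, 1])` returns 23, which is also the upper bound 2*2 + 3*3 + 3*3 + 1*1 obtained when every card value is removed as a single group.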

Find the middle element in merged arrays in O(logn)

We have two sorted arrays of the same size n. Let's call the array a and b.
How do we find the middle element of the sorted array obtained by merging a and b?
Example:
n = 4
a = [1, 2, 3, 4]
b = [3, 4, 5, 6]
merged = [1, 2, 3, 3, 4, 4, 5, 6]
mid_element = merged[(0 + merged.length - 1) / 2] = merged[3] = 3
More complicated cases:
Case 1:
a = [1, 2, 3, 4]
b = [3, 4, 5, 6]
Case 2:
a = [1, 2, 3, 4, 8]
b = [3, 4, 5, 6, 7]
Case 3:
a = [1, 2, 3, 4, 8]
b = [0, 4, 5, 6, 7]
Case 4:
a = [1, 3, 5, 7]
b = [2, 4, 6, 8]
Time required: O(log n). Any ideas?
Look at the middle of both the arrays. Let's say one value is smaller and the other is bigger.
Discard the lower half of the array with the smaller value. Discard the upper half of the array with the higher value. Now we are left with half of what we started with.
Rinse and repeat until only one element is left in each array. Return the smaller of those two.
If the two middle values are the same, then pick arbitrarily.
Credits: Bill Li's blog
Quite an interesting task. I'm not sure about O(log n), but an O((log n)^2) solution is obvious to me.
If you know the position of some element in the first array, then you can find how many elements in both arrays are smaller than this value: you already know how many smaller elements are in the first array, and you can find the count of smaller elements in the second array using binary search, so just sum these two numbers. If the number of smaller elements in both arrays is less than n, you should look in the upper half of the first array; otherwise you should move to the lower half. So you get a general binary search with an internal binary search. The overall complexity will be O((log n)^2).
Note: if you do not find the median in the first array, then start the search again in the second array. This has no impact on the complexity.
So, having
n = 4 and a = [1, 2, 3, 4] and b = [3, 4, 5, 6]
You know the position k in the result array in advance based on n: it is equal to n.
The resulting n-th element could be in the first array or in the second.
Let's first assume that the element is in the first array. Then
do a binary search, taking the middle element of [l, r]; at the beginning l = 0, r = 3.
Taking the middle element, you know how many elements in the same array are smaller, which is middle - 1.
Knowing that middle - 1 elements are smaller and that you need the n-th element, compare the middle element with the [n - (middle - 1)]-th element of the second array. If that element is greater and the one before it is smaller, the middle element is what you need. If it is greater and the previous one is also greater, set l = middle; if it is smaller, set r = middle.
Then do the same for the second array in case you did not find the solution in the first.
In total this takes log(n) + log(n) steps, which is O(log n).
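One way to reach O(log n) is a single binary search over how many elements of a fall into the lower half of the merged array. The sketch below is a standard partition-style search and not a literal transcription of any answer above; the function name `lower_median` is hypothetical, and it assumes both arrays are sorted, non-empty, and of the same length n.

```python
def lower_median(a, b):
    """Return merged[(2n - 1) // 2] for sorted arrays a and b of equal length n."""
    n = len(a)
    # Find the smallest i such that taking i elements from a and n - i from b
    # gives a valid lower half, i.e. a[i] >= b[n - i - 1] (or i == n).
    lo, hi = 0, n
    while lo < hi:
        i = (lo + hi) // 2
        j = n - i                  # j >= 1 here because i < n
        if a[i] < b[j - 1]:
            lo = i + 1             # need more elements from a
        else:
            hi = i
    i, j = lo, n - lo
    # The n-th smallest element is the largest element of the lower half.
    candidates = []
    if i > 0:
        candidates.append(a[i - 1])
    if j > 0:
        candidates.append(b[j - 1])
    return max(candidates)
```

On the four cases listed in the question this returns 3, 4, 4 and 4 respectively, matching `merged[(0 + merged.length - 1) / 2]` with integer division.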

Resources