How do i derive an expression for the worst case number of comparisons needed to merge two sorted arrays of length n/2 - algorithm

merge sort uses divide and conquer approach

The worst case number of comparisons needed to merge two sorted arrays of length 𝑛/2 is 𝑛−1.
This is because when two values have been compared with each other, one of those two values will never be used in a comparison again: it is the value that follows the sorted order... and it is no longer part of the rest of the merge process.
In the worst case, the last comparison will be between the last value of the first array and the last value of the second array. All the other (𝑛−2) values were already excluded from further comparisons, so that means we already did 𝑛−2 comparisons. Now the last one executes, which completes the comparison count to be 𝑛−1.
Example of a worst case input
𝑛 = 10
A = [0, 2, 4, 6, 8]
B = [1, 3, 5, 7, 9]
Comparisons during the merge:
0 1
2 1
2 3
4 3
4 5
6 5
6 7
8 7
8 9
Example of a best case input
Just to put this in perspective, the best case occurs when the last value of the first array is less than the first value of the second array (or vice versa). This only needs 𝑛 / 2 comparisons.
𝑛 = 10
A = [0, 1, 2, 3, 4]
B = [5, 6, 7, 8, 9]
Comparisons during the merge:
0 5
1 5
2 5
3 5
4 5
No more comparisons are now needed, because the first list has no more values; only the values of the second list remain: but they don't need to be compared with anything, and can be appended to the result in the order they currently occur.
Remark
The fact that this process is used as part of merge sort -- implementing a divide and conquer algorithm -- is just background info. Merging two sorted lists can be needed in other contexts that do not relate to merge sort.

Related

Convert the permutation sequence A to B by selecting a set in A then reversing that set and inserting that set at the beginning of A

Given the sequence A and B consisting of N numbers that are permutations of 1,2,3,...,N. At each step, you choose a set S in sequence A in order from left to right (the numbers selected will be removed from A), then reverse S and add all elements in S to the beginning of the sequence A. Find a way to transform A into B in log2(n) steps.
Input: N <= 10^4 (number of elements of sequence A, B) and 2 permutations sequence A, B.
Output: K (Number of steps to convert A to B). The next K lines are the set of numbers S selected at each step.
Example:
Input:
5 // N
5 4 3 2 1 // A sequence
2 5 1 3 4 // B sequence
Output:
2
4 3 1
5 2
Step 0: S = {}, A = {5, 4, 3, 2, 1}
Step 1: S = {4, 3, 1}, A = {5, 2}. Then reverse S => S = {1, 3, 4}. Insert S to beginning of A => A = {1, 3, 4, 5, 2}
Step 2: S = {5, 2}, A = {1, 3, 4}. Then reverse S => S = {2, 5}. Insert S to beginning of A => A = {2, 5, 1, 3, 4}
My solution is to use backtracking to consider all possible choices of S in log2(n) steps. However, N is too large so is there a better approach? Thank you.
For each operation of combined selecting/removing/prepending, you're effectively sorting the elements relative to a "pivot", and preserving order. With this in mind, you can repeatedly "sort" the items in backwards order (by that I mean, you sort on the most significant bit last), to achieve a true sort.
For an explicit example, lets take an example sequence 7 3 1 8. Rewrite the terms with their respective positions in the final sorted list (which would be 1 3 7 8), to get 2 1 0 3.
7 -> 2 // 7 is at index 2 in the sorted array
3 -> 1 // 3 is at index 0 in the sorted array
1 -> 0 // so on
8 -> 3
This new array is equivalent to the original- we are just using indices to refer to the values indirectly (if you squint hard enough, we're kinda rewriting the unsorted list as pointers to the sorted list, rather than values).
Now, lets write these new values in binary:
2 10
1 01
0 00
3 11
If we were to sort this list, we'd first sort by the MSB (most significant bit) and then tiebreak only where necessary on the subsequent bit(s) until we're at the LSB (least significant bit). Equivalently, we can sort by the LSB first, and then sort all values on the next most significant bit, and continuing in this fashion until we're at the MSB. This will work, and correctly sort the list, as long as the sort is stable, that is- it doesn't change the order of elements that are considered equal.
Let's work this out by example: if we sorted these by the LSB, we'd get
2 10
0 00
1 01
3 11
-and then following that up with a sort on the MSB (but no tie-breaking logic this time), we'd get:
0 00
1 01
2 10
3 11
-which is the correct, sorted result.
Remember the "pivot" sorting note at the beginning? This is where we use that insight. We're going to take this transformed list 2 1 0 3, and sort it bit by bit, from the LSB to the MSB, with no tie-breaking. And to do so, we're going to pivot on the criteria <= 0.
This is effectively what we just did in our last example, so in the name of space I won't write it out again, but have a look again at what we did in each step. We took the elements with the bits we were checking that were equal to 0, and moved them to the beginning. First, we moved 2 (10) and 0 (00) to the beginning, and then the next iteration we moved 0 (00) and 1 (01) to the beginning. This is exactly what operation your challenge permits you to do.
Additionally, because our numbers are reduced to their indices, the max value is len(array)-1, and the number of bits is log2() of that, so overall we'll only need to do log2(n) steps, just as your problem statement asks.
Now, what does this look like in actual code?
from itertools import product
from math import log2, ceil
nums = [5, 9, 1, 3, 2, 7]
size = ceil(log2(len(nums)-1))
bit_table = list(product([0, 1], repeat=size))
idx_table = {x: i for i, x in enumerate(sorted(nums))}
for bit_idx in range(size)[::-1]:
subset_vals = [x for x in nums if bit_table[idx_table[x]][bit_idx] == 0]
nums.sort(key=lambda x: bit_table[idx_table[x]][bit_idx])
print(" ".join(map(str, subset_vals)))
You can of course use bitwise operators to accomplish the bit magic ((thing << bit_idx) & 1) if you want, and you could del slices of the list + prepend instead of .sort()ing, this is just a proof-of-concept to show that it actually works. The actual output being:
1 3 7
1 7 9 2
1 2 3 5

Practical algorithms for permuting external memory

On a spinning disk, I have N records that I want to permute. In RAM, I have an array of N indices that contain the desired permutation. I also have enough RAM to hold n records at a time. What algorithm can I use to execute the permutation on disk as quickly as possible, taking into account the fact that sequential disk access is a lot faster?
I have plenty of excess disk to use for intermediate files, if desired.
This is a known problem. Find the cycles in your permutation order. For instance, given five records to permute [1, 0, 3, 4, 2], you have cycles (0, 1) and (2, 3, 4). You do this by picking an unused starting position; follow the index pointers until you return to your starting point. The sequence of pointers describes a cycle.
You then permute the records with an internal temporary variable, one record long.
temp = disk[0]
disk[0] = disk[1]
disk[1] = temp
temp = disk[2]
disk[2] = disk[3]
disk[3] = disk[4]
disk[4] = temp
Note that you can also perform the permutation as you traverse the pointers. You will also need some method to recall which positions have already been permuted, such as clearing the permutation index (set it to -1).
Can you see how to generalize that?
This is an problem with interval coordination. I'll simplify the notation slightly by changing the memory available to M records -- having upper- and lower-case N is a little confusing.
First, we re-cast the permutations as a series of intervals, the rotational span during which a record needs to reside in RAM. If a record needs to be written to a lower-numbered position, we increase the endpoint by the list size, to indicate the wraparound -- have to wait for the next disk rotation. For instance, using my earlier example, we expand the list:
[1, 0, 3, 4, 2]
0 -> 1
1 -> 0+5
2 -> 3
3 -> 4
4 -> 2+5
Now, we apply standard greedy scheduling resolution. First, sort by endpoint:
[0, 1]
[2, 3]
[3, 4]
[1, 5]
[4, 7]
Now, apply the algorithm for M-1 "lanes"; the extra one is needed for swap space. We fill each lane, appending the interval with the earliest endpoint, whose start-point doesn't overlap:
[0, 1] [2, 3] [3, 4] [4, 7]
[1, 5]
We can do this in a total of 7 "ticks" if M >= 3. If M=2, we defer the second lane by 2 rotations to [11, 15].
Sneftal's nice example gives us more troubles, with deeper overlap:
[0, 4]
[1, 5]
[2, 6]
[3, 7]
[4, 0+8]
[5, 1+8]
[6, 2+8]
[7, 3+8]
This requires 4 "lanes" if available, deferring lanes as needed if M < 5.
The pathological case is where every record in the permutation needs to be copied back one position, such as [3, 0, 1, 2], with M=2.
[0, 3]
[1, 4]
[2, 5]
[3, 6]
In this case, we walk through the deferral cycle multiple times. At the end of every rotation, we have to defer all remaining intervals by one rotation, resulting in
[0, 3] [3, 6] [2+4, 5+4] [1+4+4, 4+4+4]
Does that get you moving, or do you need more detail?
I have an idea, which might need further improvement. But here it goes:
suppose the hdd has the following structure:
5 4 1 2 3
And we want to write out this permutation:
2 3 5 1 4
Since hdd is a circular buffer, and assuming it can only rotate in one direction, we can write the above permutation using shifts as such:
5 >> 2
4 >> 3
1 >> 1
2 >> 2
3 >> 2
So let's put that in an array, and since we know it is a circular array, lets put its mirrors side by side:
| 2 3 1 2 2 | 2 3 1 2 2| 2 3 1 2 2 | 2 3 1 2 2 |... Inf
Since we want to favor sequential reads, (or writes) we can put a cost function to the above series. Let the cost function be linear, i. e:
0 1 2 3 4 5 6 7 8 9 10 ... Inf
Now, let us add the cost function to the above series, but how to select the starting point?
The idea is to select the starting point such that you get the maximum congruent monotonically increasing sequence.
For example, if you select the 0 point to be on "3", you'll get
(1) | - 3 2 4 5 | 6 8 7 9 10 | ...
If you select the 0 point to be on "2", the one just right of "1", you'll get:
(2) | - - - 2 3 | 4 6 5 7 8 | ...
Since we are trying to favor consecutive reads, lets define our read-write function to work as such:
f():
At any currently pointed hdd location, function will read the currently pointed hdd file, into available RAM. (namely, total space - 1, because we want to save 1 for swap)
If no available space is left on RAM for read, the function will assert and program will halt.
At any current hdd location, if ram holds the value that we want to be written in that hdd location, function reads the current file into swap space, writes the wanted value from the ram to hdd, and destroys the value in ram.
If a value is placed into hdd, function will check if the sequence is completed. If it is, program will return with success.
Now, we should note that if the following holds:
shift amount <= n - 1 (n : available memory we can hold)
We can traverse the hard disk in once pass using the above function. For example:
current: 4 5 6 7 0 1 2 3
we want: 0 1 2 3 4 5 6 7
n : 5
We can start anywhere we want, say from the initial "4". We read 4 items sequentially, (n has 4 items now) and we start placing from 0 1 2 3, (we can because n = 5 total, and 4 is used. 1 is used for swap). So the total operations is 4 consecutive reads, and then r-w operations for 8 times.
Using that analogy, it becomes clear that if we subtract "n-1" from equations (1) and (2), the positions which have value "<= 0" will be a better suit for initial position because the ones higher than zero will definitely require another pass.
So we select eq. (2) and subtract, for let's say "n = 3", we subtract 2 from eq. (2):
(2) | - - - 0 1 | 2 4 3 5 6 | ...
Now it is clear that, using f(), and starting from 0, assuming n = 3, we will have a starting operation as such: r, r, r-w, r-w, ...
So, how do we do the rest and find minimum cost? We will place an array with initial minimum cost, just below equation (2). The positions in that array will signify where we want f() to be executed.
| - - - 0 1 | 2 4 3 5 6 | ...
| - - - 1 1 | 1 1 1 1 1 | ...
The second array, the ones with 1's and 0's tell the program where to execute f(). Note that, if we assumed those locations wrong, f() will assert.
Before we start actually placing files into hdd, we of course want to see if the f() positions are correct. We check if there are assertions, we we will try to minimize cost whilst removing all assertions. So, e.g:
(1) 1111000000000000001111
(2) 1111111000000000000000
(1) obviously has higher cost that (2). So the question simplifies on finding the 1-0 array.
Some ideas on finding the best array:
Simplest solution is to write out all 1's and turn assertions into 0's. (essentially it's a skip). This method is guaranteed to work.
Brute force: write an array of as shown in (2) and start shifting 1's to right, in such an order that tries out every permutation available:
1111111100000000
1111111010000000
1111110110000000
...
Full random approach: Plug in mt1997 and start permuting. Whenever you see a sharp drop in cost, stop executing and implement hdd copy-paste. You won't find the global minimum, but you'll get a nice trade-off.
Genetic algorithms: For permutations where "shift count is much lower than n - 1", the methodology provided in this answer should (?) provide a global minimum and smooth gradients. This allows one to use genetic algorithms without relying on mutations too much.
One advantage I find in this approach is that, since OP mentioned that this is a real life problem, the method provides an easy(ier?) way to change cost functions. It is easier to detect the effect of say, having lots of contigous small files to be copied vs. having a single huge file. Or perhaps rrwwrrww is better than rrrrwwww?
Does any of this even make sense? We will have to try out ...

Which sorting algorithm produces these steps?

This was a multiple-choice question in an exam today, and (at least) one of the answers should be true, but to me they all look wrong.
The sorting steps are:
5 2 6 1 3 4
4 2 6 1 3 5
4 2 5 1 3 6
4 2 3 1 5 6
1 2 3 4 5 6
The available answers were: Bubble Sort, Insertion Sort, Selection Sort, Merge Sort and Quick Sort.
I think that is a Quick sort. Here we can see the following steps:
A random selection of the reference element in the array (pivotValue), with respect to which reorders the elements of the array.
Move all of the values that are larger than the reference to the right, and all the values that the lower support left
Repeat algorithm for unsorted the left and right side of the array, while each element will not appear on its position
Why I think so:
It definitely isn't a Bubble Sort because it compares the first two elements of the array beginning so, the first step should be 2 5 6 1 3 4
It isn't a Insertion Sort because it's a sequential algorithm. In the first step we see that compared the first and the last element
It isn't a Selection Sort because it find the lowest value and move it to the top so, the first step should be 1 5 2 6 3 4
It isn't a Merge Sort because the array is divided into two subarrays. In this case we see interaction "first" and "second" parts
None of them.
bubble sort: no. After k steps, the last k elements should be the k largest, sorted.
insertion sort: no. After k steps, the k first elements should be sorted.
selection sort: no. After k steps, the k first elements should be the s smallest, sorted.
merge sort: no. After k steps, a value can only have moved 2^k - 1 places. (5 moves 5 places at k=1)
quick sort: no. Whatever the pivot is, 1 and 6 being the extreme values, they can stay in this initial position.
On the quick sort: To make it clear that it is not possible, lets enumerate the results of each pivot for the first step:
5 : [2134] - 5 - [6]. (2134 may be in any order)
2 : [1] - 2 - [5634]
6 : [52134] - 6
1 : 1 - [52634]
3 : [21] - 3 - [564]
4 : [213] - 4 - [56]
One obvious way of seeing that all those are incompatible with the OP's output is that in each case, the 1 is before the 6, no matter how you implement the pivot or the partition.
To solve this all you have to do is make a function for each sort algorithm but include a statement to print the array out after each swap. Then apply your print friendly sort algorithms to the initial array [5 2 6 1 3 4] and see which sort method produces the same output. Additionally, this will help you compare all the different methods.

Find the number of non-decreasing and non-increasing subsequences in an array

I am attempting to complete a programming challenge from Quora on HackerRank: https://www.hackerrank.com/contests/quora-haqathon/challenges/upvotes
I have designed a solution that works with some test cases, however, for many the algorithm that I am using is incorrect.
Rather than seeking a solution, I am simply asking for an explanation to how the subsequence is created and then I will implement a solution myself.
For example, with the input:
6 6
5 5 4 1 8 7
the correct output is -5, but I fail to see how -5 is the answer. The subsequence would be [5 5 4 1 8 7] and I cannot for the life of me find a means to get -5 as the output.
Problem Statement
At Quora, we have aggregate graphs that track the number of upvotes we get each day.
As we looked at patterns across windows of certain sizes, we thought about ways to track trends such as non-decreasing and non-increasing subranges as efficiently as possible.
For this problem, you are given N days of upvote count data, and a fixed window size K. For each window of K days, from left to right, find the number of non-decreasing subranges within the window minus the number of non-increasing subranges within the window.
A window of days is defined as contiguous range of days. Thus, there are exactly N−K+1 windows where this metric needs to be computed. A non-decreasing subrange is defined as a contiguous range of indices [a,b], a<b, where each element is at least as large as the previous element. A non-increasing subrange is similarly defined, except each element is at least as large as the next. There are up to K(K−1)/2 of these respective subranges within a window, so the metric is bounded by [−K(K−1)/2,K(K−1)/2].
Constraints
1≤N≤100,000 days
1≤K≤N days
Input Format
Line 1: Two integers, N and K
Line 2: N positive integers of upvote counts, each integer less than or equal to 10^9
Output Format
Line 1..: N−K+1 integers, one integer for each window's result on each line
Sample Input
5 3
1 2 3 1 1
Sample Output
3
0
-2
Explanation
For the first window of [1, 2, 3], there are 3 non-decreasing subranges and 0 non-increasing, so the answer is 3. For the second window of [2, 3, 1], there is 1 non-decreasing subrange and 1 non-increasing, so the answer is 0. For the third window of [3, 1, 1], there is 1 non-decreasing subrange and 3 non-increasing, so the answer is -2.
Given a window size of 6, and the sequence
5 5 4 1 8 7
the non-decreasing subsequences are
5 5
1 8
and the non-increasing subsequences are
5 5
5 4
4 1
8 7
5 5 4
5 4 1
5 5 4 1
So that's +2 for the non-decreasing subsequences and -7 for the non-increasing subsequences, giving -5 as the final answer.

Count the coins including permutations of the same sequence

I've found a code to find number of possibilities to make change using given coins: How to count possible combination for coin problem. But how to count it, if we think about different permutations of the same sequence? I mean that, e.g. amount is 12, and "4 4 2 2" and "4 2 4 2" should be counted as 2, not 1.
As you've mentioned inside your question you can count the possible combinations as stated in How to count possible combination for coin problem. But in order to include the permutations into your answer:
If you distinguish the permutation of the same numbers [1 7 7] and [1 7 7] e.g. just count each sequence([1 7 7] here) as n! (n = # of elements in the sequence) [instead of 1]
Otherwise : multiply each sequence by n!/(m!l!...) where m = number of equal elements of type 1, l is number of equal elements of type 2 and so on... . For example for sequence like [a b b c c c] you should count this 6!/(2!*3!) [instead of 1]
So use the algorithm inside that link, that I don't repeat again, but just instead of counting each combination as 1 use the formula that I said (depending on the case you desire).
(! is factorial.)

Resources