Find the swapped nodes in binary search tree - algorithm

Two of the nodes of a Binary Search Tree are swapped.
Input Tree:
      10
     /  \
    5    8
   / \
  2   20
In the above tree, nodes 20 and 8 must be swapped to fix the tree.
Output tree:
      10
     /  \
    5    20
   / \
  2   8
I followed the solution given in here. But I feel the solution is incorrect because:
As per the site:
The swapped nodes are not adjacent in the inorder traversal of the BST.
For example, Nodes 5 and 25 are swapped in {3 5 7 8 10 15 20 25}.
The inorder traversal of the given tree is 3 25 7 8 10 15 20 5. If we
observe carefully, during inorder traversal, we find node 7 is smaller
than the previous visited node 25. Here save the context of node 25
(previous node). Again, we find that node 5 is smaller than the
previous node 20. This time, we save the context of node 5 (current
node). Finally swap the two nodes' values.
So my point is: if it considers 25 because it is greater than 7, then it should consider 20 as well, because it is also greater than 5. So is this solution correct, or am I missing something?

Yes, it considers 25 because it is greater than 7. But in the second out-of-order pair it should not pick 20 just because it is greater than 5; it should pick 5, because it is less than 20. In the first descending pair the misplaced element is the larger one (the previous node), and in the second descending pair it is the smaller one (the current node).
This example is not very good, because 5 ends up in the last position of the array. Let's consider a sorted array {1, 2, 3, 4, 5}. Swap 2 and 4, and we get {1, 4, 3, 2, 5}. If two non-adjacent elements of a sorted array are swapped, then among all pairs (A[i], A[i+1]) exactly two are in the wrong, i.e. descending, order. In the case of {1, 4, 3, 2, 5}, these are the pairs (4, 3) and (3, 2). Suppose the pairs are (A[p], A[p+1]) and (A[q], A[q+1]) with p < q, such that A[p] > A[p+1] and A[q] > A[q+1]; then we can claim that A[p] and A[q+1] are the elements that were swapped. In the case of {1, 4, 3, 2, 5}, 4 and 2 were swapped.
Now come back to the example 3 25 7 8 10 15 20 5, in which (25, 7) and (20, 5) are the only two pairs in the wrong order. So 25 and 5 are the two elements that were swapped.
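The pair rule above can be sketched in a few lines of Python. This is a minimal illustration, not the linked site's code: the function name `find_swapped` is made up here, and it assumes distinct values and at most one swap.

```python
def find_swapped(seq):
    """Return the two values swapped in an otherwise sorted sequence.

    Assumes distinct values and at most one swap; returns (None, None)
    if the sequence is already sorted.
    """
    first = second = None
    for prev, cur in zip(seq, seq[1:]):
        if prev > cur:            # a descending pair (A[i], A[i+1])
            if first is None:
                first = prev      # larger element of the first descending pair
            second = cur          # smaller element of the last descending pair
    return first, second


print(find_swapped([3, 25, 7, 8, 10, 15, 20, 5]))  # (25, 5)
print(find_swapped([1, 4, 3, 2, 5]))               # (4, 2)
```

When the swapped elements are adjacent there is only one descending pair, and the same loop returns both of its members.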

Following @jeffreys' notation,
if we have pair (A[p], A[p+1]) and pair (A[q], A[q+1]), such that A[p] > A[p+1] and A[q] > A[q+1], we can claim that it is A[p] and A[q+1] being swapped
You know that there is only a single swap, which creates either two discrepancies in the sorted order, or only one if the swapped elements are adjacent. Let's say p < q, so (A[p], A[p+1]) is the first descending pair and (A[q], A[q+1]) is the second.
If there is no second pair, then swapping the elements of the first pair fixes the tree; that's the easy part. Otherwise we know that two non-adjacent nodes were swapped.
Out of A[p] and A[p+1], suppose A[p+1] were the one out of place. Since this is the first pair, we would have to move A[p+1] forward towards the second pair, but then it would still come after, and be smaller than, A[p], which stayed in place, so we would not get a sorted array. We must therefore choose A[p].
The same goes for A[q] and A[q+1]: suppose A[q] were out of place. Then we would have to move it backwards, and it would still be larger than A[q+1], which appears later, again breaking the sort.
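Applied to the tree itself, the same argument might look like the sketch below. The `Node` class and the `inorder` and `recover` helpers are hypothetical names introduced for illustration; the sketch assumes distinct values and exactly one swap, and uses a recursive traversal for brevity.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right


def inorder(node):
    """Yield the nodes of the tree in inorder."""
    if node is not None:
        yield from inorder(node.left)
        yield node
        yield from inorder(node.right)


def recover(root):
    """Swap back the values of the two misplaced nodes, in place."""
    first = second = prev = None
    for node in inorder(root):
        if prev is not None and prev.val > node.val:
            if first is None:
                first = prev      # A[p]: previous node of the first bad pair
            second = node         # A[q+1]: current node of the last bad pair
        prev = node
    if first is not None:
        first.val, second.val = second.val, first.val
    return root


# The input tree from the question: 20 and 8 are swapped.
root = Node(10, Node(5, Node(2), Node(20)), Node(8))
recover(root)
print([n.val for n in inorder(root)])  # [2, 5, 8, 10, 20]
```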

Related

How do I derive an expression for the worst case number of comparisons needed to merge two sorted arrays of length n/2

merge sort uses divide and conquer approach
The worst case number of comparisons needed to merge two sorted arrays of length 𝑛/2 is 𝑛−1.
This is because when two values have been compared with each other, one of those two values will never be used in a comparison again: it is the value that follows the sorted order... and it is no longer part of the rest of the merge process.
In the worst case, the last comparison will be between the last value of the first array and the last value of the second array. All the other (𝑛−2) values were already excluded from further comparisons, so that means we already did 𝑛−2 comparisons. Now the last one executes, which completes the comparison count to be 𝑛−1.
Example of a worst case input
𝑛 = 10
A = [0, 2, 4, 6, 8]
B = [1, 3, 5, 7, 9]
Comparisons during the merge:
0 1
2 1
2 3
4 3
4 5
6 5
6 7
8 7
8 9
Example of a best case input
Just to put this in perspective, the best case occurs when the last value of the first array is less than the first value of the second array (or vice versa). This only needs 𝑛 / 2 comparisons.
𝑛 = 10
A = [0, 1, 2, 3, 4]
B = [5, 6, 7, 8, 9]
Comparisons during the merge:
0 5
1 5
2 5
3 5
4 5
No more comparisons are now needed, because the first list has no more values; only the values of the second list remain: but they don't need to be compared with anything, and can be appended to the result in the order they currently occur.
Remark
The fact that this process is used as part of merge sort -- implementing a divide and conquer algorithm -- is just background info. Merging two sorted lists can be needed in other contexts that do not relate to merge sort.
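To check the two counts above, here is a small sketch of a standard two-pointer merge that also counts comparisons. The function name `merge_count` is made up for this illustration.

```python
def merge_count(a, b):
    """Merge two sorted lists; return (merged list, number of comparisons)."""
    merged, i, j, comparisons = [], 0, 0, 0
    while i < len(a) and j < len(b):
        comparisons += 1              # one comparison retires exactly one value
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # One list is exhausted; the rest is appended without comparisons.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged, comparisons


print(merge_count([0, 2, 4, 6, 8], [1, 3, 5, 7, 9])[1])  # 9, i.e. n - 1
print(merge_count([0, 1, 2, 3, 4], [5, 6, 7, 8, 9])[1])  # 5, i.e. n / 2
```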

Convert the permutation sequence A to B by selecting a set in A then reversing that set and inserting that set at the beginning of A

Given two sequences A and B, each a permutation of 1,2,3,...,N. At each step, you choose a set S of elements of A, taken in order from left to right (the selected numbers are removed from A), then reverse S and insert all elements of S at the beginning of A. Find a way to transform A into B in log2(n) steps.
Input: N <= 10^4 (number of elements of sequence A, B) and 2 permutations sequence A, B.
Output: K (Number of steps to convert A to B). The next K lines are the set of numbers S selected at each step.
Example:
Input:
5 // N
5 4 3 2 1 // A sequence
2 5 1 3 4 // B sequence
Output:
2
4 3 1
5 2
Step 0: S = {}, A = {5, 4, 3, 2, 1}
Step 1: S = {4, 3, 1}, A = {5, 2}. Then reverse S => S = {1, 3, 4}. Insert S to beginning of A => A = {1, 3, 4, 5, 2}
Step 2: S = {5, 2}, A = {1, 3, 4}. Then reverse S => S = {2, 5}. Insert S to beginning of A => A = {2, 5, 1, 3, 4}
My solution is to use backtracking to consider all possible choices of S in log2(n) steps. However, N is too large so is there a better approach? Thank you.
For each operation of combined selecting/removing/prepending, you're effectively sorting the elements relative to a "pivot", and preserving order. With this in mind, you can repeatedly "sort" the items in backwards order (by that I mean, you sort on the most significant bit last), to achieve a true sort.
For an explicit example, let's take the sequence 7 3 1 8. Rewrite the terms as their respective positions in the final sorted list (which would be 1 3 7 8), to get 2 1 0 3.
7 -> 2 // 7 is at index 2 in the sorted array
3 -> 1 // 3 is at index 1 in the sorted array
1 -> 0 // and so on
8 -> 3
This new array is equivalent to the original: we are just using indices to refer to the values indirectly (if you squint hard enough, we're kinda rewriting the unsorted list as pointers into the sorted list, rather than values).
Now, let's write these new values in binary:
2 10
1 01
0 00
3 11
If we were to sort this list, we'd first sort by the MSB (most significant bit) and then tiebreak only where necessary on the subsequent bit(s) until we're at the LSB (least significant bit). Equivalently, we can sort by the LSB first, and then sort all values on the next most significant bit, and continuing in this fashion until we're at the MSB. This will work, and correctly sort the list, as long as the sort is stable, that is- it doesn't change the order of elements that are considered equal.
Let's work this out by example: if we sorted these by the LSB, we'd get
2 10
0 00
1 01
3 11
-and then following that up with a sort on the MSB (but no tie-breaking logic this time), we'd get:
0 00
1 01
2 10
3 11
-which is the correct, sorted result.
Remember the "pivot" sorting note at the beginning? This is where we use that insight. We're going to take this transformed list 2 1 0 3, and sort it bit by bit, from the LSB to the MSB, with no tie-breaking. To do so, at each step we pivot on the criterion that the current bit is 0 (i.e. bit <= 0).
This is effectively what we just did in our last example, so in the name of space I won't write it out again, but have a look again at what we did in each step. We took the elements with the bits we were checking that were equal to 0, and moved them to the beginning. First, we moved 2 (10) and 0 (00) to the beginning, and then the next iteration we moved 0 (00) and 1 (01) to the beginning. This is exactly what operation your challenge permits you to do.
Additionally, because our numbers are reduced to their indices, the max value is len(array)-1, and the number of bits is log2() of that, so overall we'll only need to do log2(n) steps, just as your problem statement asks.
Now, what does this look like in actual code?
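from itertools import product
from math import log2, ceil
nums = [5, 9, 1, 3, 2, 7]
# Number of bits needed to represent the indices 0 .. len(nums)-1.
size = ceil(log2(len(nums)))
# bit_table[i] holds the bits of i, most significant bit first.
bit_table = list(product([0, 1], repeat=size))
# Map each value to its index in the sorted list.
idx_table = {x: i for i, x in enumerate(sorted(nums))}
# Walk from the least significant bit (last position) to the most significant.
for bit_idx in range(size)[::-1]:
    # The values whose current bit is 0 form the subset moved to the front.
    subset_vals = [x for x in nums if bit_table[idx_table[x]][bit_idx] == 0]
    # A stable sort on that bit has the same effect as the move.
    nums.sort(key=lambda x: bit_table[idx_table[x]][bit_idx])
    print(" ".join(map(str, subset_vals)))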
You can of course use bitwise operators to accomplish the bit magic ((thing >> bit_idx) & 1) if you want, and you could del slices of the list and prepend instead of .sort()ing; this is just a proof of concept to show that it actually works. The actual output is:
1 3 7
1 7 9 2
1 2 3 5
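As a variation on the proof of concept, the sketch below uses bitwise operators and an explicit "remove and prepend" instead of `.sort()`. The function name `sort_by_prefix_moves` is made up here. Like the code above, it moves the selected elements to the front in their existing order, so it does not model the reversal of S from the problem statement, and it assumes distinct values.

```python
def sort_by_prefix_moves(nums):
    """Sort nums using one order-preserving 'move subset to front' per bit.

    Returns (final list, list of subsets moved at each step).
    """
    # Rank of each value in the sorted order (assumes distinct values).
    rank = {x: i for i, x in enumerate(sorted(nums))}
    # Bits needed to write the largest rank, len(nums) - 1.
    bits = max(1, (len(nums) - 1).bit_length())
    current, steps = list(nums), []
    for bit in range(bits):           # least significant bit first
        chosen = [x for x in current if (rank[x] >> bit) & 1 == 0]
        rest = [x for x in current if (rank[x] >> bit) & 1 == 1]
        steps.append(chosen)
        current = chosen + rest       # remove the subset and prepend it
    return current, steps


final, steps = sort_by_prefix_moves([5, 9, 1, 3, 2, 7])
print(final)  # [1, 2, 3, 5, 7, 9]
print(steps)  # [[1, 3, 7], [1, 7, 9, 2], [1, 2, 3, 5]]
```

The subsets match the output printed above, one per bit.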

Having trouble understanding the K-way merge algorithm (Counter example given)

In K way merge sort, the solution that uses a heap: essentially maintains a heap and constantly extracts max from that heap. I have a counterexample for why this won't work well.
5 -> 1 -> 0
4 -> 2 -> 1
3 -> 2 -> 0
Suppose we initialize our heap. It contains {5, 4, 3}.
We run extract max, we obtain 5 and add that into our new list (that represents the final solution). Our heap now looks like {4,3}. We then refill our heap with the head of list that we extracted the max element from.
This implies that we get something like this: {4, 3, 1}.
This doesn't make sense to me. This heap doesn't represent the top K elements anymore. 1 shouldn't be used to refill the heap, it should have been 2. So, this O(nlgk) method doesn't make much sense to me.
I hope someone can shed light on how this algorithm works because I'm stuck here.
The max heap always contains the current maximum (the head) of each of the k lists (or arrays). For your 'counter' example:
5 -> 1 -> 0
4 -> 2 -> 1
3 -> 2 -> 0
The heap {5, 4, 3} contains the max elements of these three lists.
Now you extract 5 from the heap, which means you also remove 5 from the first list:
5-->1-->0: after extracting 5, the list is 1-->0, so 1 is now the head of the list.
Then the new heap is {4, 3, 1}, which still contains the max elements of the lists.
Let's continue your example. The current heap after extracting 5 and heapifying is:
{4, 3, 1}
Extracting 4 from the heap means you also remove 4 from its list:
4-->2-->1: after removing 4 you have 2-->1, so 2 is now the head of the list.
The new heap is then
{3, 2, 1}
Keep doing this and you get what you want (a descending list).
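The walkthrough above could be written as the following sketch, which uses Python's `heapq`. Since `heapq` is a min-heap, the values are negated to get max-heap behaviour; that trick assumes numeric values. The function name `k_way_merge_desc` is made up for this illustration.

```python
import heapq


def k_way_merge_desc(lists):
    """Merge k descending lists into one descending list in O(n log k)."""
    # One heap entry per list: (negated head value, list index, position).
    heap = [(-lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    result = []
    while heap:
        neg, i, pos = heapq.heappop(heap)      # largest remaining head
        result.append(-neg)
        if pos + 1 < len(lists[i]):
            # Refill from the SAME list the extracted value came from.
            heapq.heappush(heap, (-lists[i][pos + 1], i, pos + 1))
    return result


print(k_way_merge_desc([[5, 1, 0], [4, 2, 1], [3, 2, 0]]))
# [5, 4, 3, 2, 2, 1, 1, 0, 0]
```

The heap never needs to hold the top k elements overall, only the head of each list, because the next output value is always one of those heads.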

Find the number of non-decreasing and non-increasing subsequences in an array

I am attempting to complete a programming challenge from Quora on HackerRank: https://www.hackerrank.com/contests/quora-haqathon/challenges/upvotes
I have designed a solution that works with some test cases, however, for many the algorithm that I am using is incorrect.
Rather than seeking a solution, I am simply asking for an explanation to how the subsequence is created and then I will implement a solution myself.
For example, with the input:
6 6
5 5 4 1 8 7
the correct output is -5, but I fail to see how -5 is the answer. The subsequence would be [5 5 4 1 8 7] and I cannot for the life of me find a means to get -5 as the output.
Problem Statement
At Quora, we have aggregate graphs that track the number of upvotes we get each day.
As we looked at patterns across windows of certain sizes, we thought about ways to track trends such as non-decreasing and non-increasing subranges as efficiently as possible.
For this problem, you are given N days of upvote count data, and a fixed window size K. For each window of K days, from left to right, find the number of non-decreasing subranges within the window minus the number of non-increasing subranges within the window.
A window of days is defined as a contiguous range of days. Thus, there are exactly N−K+1 windows where this metric needs to be computed. A non-decreasing subrange is defined as a contiguous range of indices [a,b], a<b, where each element is at least as large as the previous element. A non-increasing subrange is similarly defined, except each element is at least as large as the next. There are up to K(K−1)/2 of these respective subranges within a window, so the metric is bounded by [−K(K−1)/2, K(K−1)/2].
Constraints
1≤N≤100,000 days
1≤K≤N days
Input Format
Line 1: Two integers, N and K
Line 2: N positive integers of upvote counts, each integer less than or equal to 10^9
Output Format
Line 1..: N−K+1 integers, one integer for each window's result on each line
Sample Input
5 3
1 2 3 1 1
Sample Output
3
0
-2
Explanation
For the first window of [1, 2, 3], there are 3 non-decreasing subranges and 0 non-increasing, so the answer is 3. For the second window of [2, 3, 1], there is 1 non-decreasing subrange and 1 non-increasing, so the answer is 0. For the third window of [3, 1, 1], there is 1 non-decreasing subrange and 3 non-increasing, so the answer is -2.
Given a window size of 6, and the sequence
5 5 4 1 8 7
the non-decreasing subranges are
5 5
1 8
and the non-increasing subranges are
5 5
5 4
4 1
8 7
5 5 4
5 4 1
5 5 4 1
So that's +2 for the non-decreasing subranges and -7 for the non-increasing subranges, giving -5 as the final answer.
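The counting above can be reproduced with a brute-force sketch that enumerates every subrange [a,b] with a<b. The names `window_metric` and `upvotes` are made up here. It costs O(K^2) per window, so it only illustrates the definition and would be too slow for the stated constraints.

```python
def window_metric(window):
    """Non-decreasing subranges minus non-increasing subranges in one window."""
    n = len(window)
    score = 0
    for a in range(n):
        non_dec = non_inc = True
        for b in range(a + 1, n):
            # Extend the subrange [a, b-1] to [a, b] and update both flags.
            non_dec = non_dec and window[b - 1] <= window[b]
            non_inc = non_inc and window[b - 1] >= window[b]
            if not (non_dec or non_inc):
                break                 # no longer subrange starting at a qualifies
            score += non_dec - non_inc
    return score


def upvotes(values, k):
    """Metric for every window of k consecutive days, left to right."""
    return [window_metric(values[i:i + k]) for i in range(len(values) - k + 1)]


print(upvotes([1, 2, 3, 1, 1], 3))     # [3, 0, -2]
print(upvotes([5, 5, 4, 1, 8, 7], 6))  # [-5]
```

A subrange such as 5 5 counts as both non-decreasing and non-increasing, so it contributes +1 and -1 and cancels out.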

Sort array by pairwise difference

For example we have the array X[n] = {X0, X1, X2, ... Xn}
The goal is to order this array so that the differences between consecutive elements are in ascending order.
For example X[] = {10, 2, 7, 4}
Answers are:
2 7 10 4
4 10 7 2
I have some code but it's brute force :)
#include <stdio.h>
int main(int argc, char **argv)
{
    int array[] = { 10, 2, 7, 4 };
    int a[4];
    for(int i = 0; i < 4; i++){
        a[0] = array[i];
        for(int j = 0; j < 4; j++){
            a[1] = array[j];
            if(a[0] == a[1])
                continue;
            for(int k = 0; k < 4; k++){
                a[2] = array[k];
                if(a[0] == a[2] || a[1] == a[2])
                    continue;
                for(int l = 0; l < 4; l++){
                    a[3] = array[l];
                    if(a[0] == a[3] || a[1] == a[3] || a[2] == a[3])
                        continue;
                    if(a[0] - a[1] < a[1] - a[2] && a[1] - a[2] < a[2] - a[3])
                        printf("%d %d %d %d\n", a[0], a[1], a[2], a[3]);
                }
            }
        }
    }
    return 0;
}
Any idea for "pretty" algorithm ? :)
DISCLAIMER: This solution arranges the items so that the differences grow in absolute value. Thanks to @Will Ness.
One solution, according to the "difference between every pair is in ascending order" requirement:
Sort the array in ascending order, which takes O(n log n), and then start in the middle. Then arrange the elements like this:
[n/2, n/2+1, n/2-1, n/2+2, n/2-2, n/2+3 ...] you go to +1 first if more elements are on the right side of the (n/2)th element
[n/2, n/2-1, n/2+1, n/2-2, n/2+2, n/2-3 ...] you go to -1 first otherwise.
This gives you ascending pairwise differences.
NOTE: It is not guaranteed that this algorithm finds the smallest difference and starts with it, but I do not see that in the requirements.
Example
Sorted array: {1, 2, 10, 15, 40, 50, 60, 61, 100, 101}
Then you pick 40 (the 5th element, as 10/2 = 5), 50 (10/2+1 = 6th), 15, and so on...
You'll get: {40, 50, 15, 60, 10, 61, 2, 100, 1, 101}
Which gives you the diffs: 10, 35, 45, 50, 51, 59, 98, 99, 100
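The middle-out arrangement might be sketched as follows. The function name `middle_out` is made up here, and starting at index (n-1)//2 and stepping right first is an assumption chosen so that the sketch reproduces the arrangement in the example. Each new element extends the covered interval of the sorted array, which is why the absolute differences never shrink.

```python
def middle_out(values):
    """Arrange values so that absolute consecutive differences never decrease."""
    s = sorted(values)
    if not s:
        return []
    lo = (len(s) - 1) // 2        # start at the (lower) middle element
    hi = lo + 1
    out = [s[lo]]
    lo -= 1
    while lo >= 0 or hi < len(s):
        if hi < len(s):           # step to the right of the covered interval
            out.append(s[hi])
            hi += 1
        if lo >= 0:               # then to the left of it
            out.append(s[lo])
            lo -= 1
    return out


arranged = middle_out([1, 2, 10, 15, 40, 50, 60, 61, 100, 101])
print(arranged)  # [40, 50, 15, 60, 10, 61, 2, 100, 1, 101]
print([abs(b - a) for a, b in zip(arranged, arranged[1:])])
```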
Let's see. Your example array is {10,2,7,4} and the answers you show are:
2 7 10 4
5 3 -6 differences, a[i+1] - a[i]
4 10 7 2
6 -3 -5
I show the flipped differences here, it's easier to analyze that way.
So, the goal is to have the differences a[i+1] - a[i] in descending order. Obviously some positive difference values will go first, then some negative. This means the maximal element of the array will appear somewhere in the middle. The positive differences to the left of it must be in descending order of absolute value, and the negatives to the right - in ascending order of absolute value.
Let's take another array as an example: {4,8,20,15,16,1,3}. We start by sorting it:
1 3 4 8 15 16 20
2 1 4 7 1 4 differences, a[i+1] - a[i]
Now, 20 goes in the middle, and after it to the right must go values progressively further apart. Since the differences to the left of 20 in the solution are positive, the values themselves are ascending, i.e. sorted. So whatever's left after we pick some of them to move to the right of the maximal element, stays as is, and the (positive) differences must be in descending order. If they are, the solution is found.
Here there are no solutions. The possibilities are:
... 20 16 8 (no more) left: 1 3 4 15 (diffs: 2 1 11 5)
... 20 16 4 (no more) left: 1 3 8 15 (diffs: 2 5 7 5)
... 20 16 3 (no more) left: 1 4 8 15 (diffs: 3 4 7 5)
... 20 16 1 (no more) left: 3 4 8 15 ....................
... 20 15 8 (no more) left: 1 3 4 16
... 20 15 4 (no more) left: 1 3 8 16
... 20 15 3 (no more) left: 1 4 8 16
... 20 15 1 (no more) left: 3 4 8 16
... 20 8 (no more) left: 1 3 4 15 16
... 20 4 (no more) left: 1 3 8 15 16
... 20 3 (no more) left: 1 4 8 15 16
... 20 1 (no more) left: 3 4 8 15 16
... 20 (no more) left: 1 3 4 8 15 16
Without 1 and 3, several solutions are possible.
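The case analysis above can be automated with a small exhaustive sketch: keep the maximum in the middle, choose which of the other values go to its right (in descending order), leave the rest on the left (in ascending order), and test whether the differences a[i+1] - a[i] are strictly descending. The function name `concave_orders` is made up here. The search is exponential in the array length and assumes distinct values, so it only checks the reasoning and is not an efficient algorithm.

```python
from itertools import combinations


def concave_orders(values):
    """Yield every ordering whose differences a[i+1] - a[i] strictly descend."""
    s = sorted(values)
    top, rest = s[-1], s[:-1]
    indices = range(len(rest))
    for r in range(len(rest) + 1):
        for picked in combinations(indices, r):
            chosen = set(picked)
            # Unpicked values stay ascending on the left of the maximum,
            # picked values go descending on its right.
            left = [rest[i] for i in indices if i not in chosen]
            right = [rest[i] for i in reversed(indices) if i in chosen]
            order = left + [top] + right
            diffs = [b - a for a, b in zip(order, order[1:])]
            if all(x > y for x, y in zip(diffs, diffs[1:])):
                yield order


print(sorted(concave_orders([10, 2, 7, 4])))  # [[2, 7, 10, 4], [4, 10, 7, 2]]
print(list(concave_orders([4, 8, 20, 15, 16, 1, 3])))  # []
```

For the question's array this finds exactly the two answers listed there, and for {4,8,20,15,16,1,3} it finds none, in line with the table above.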
Solution for this problem is not always possible. For example, array X[] = {0, 0, 0} cannot be "sorted" as required because both differences are always equal.
In case this problem has a solution, array values should be "sorted" as shown on the left diagram: some subset of the values in ascending order should form prefix of the resulting array, then all the remaining values in descending order should form its suffix. And "sorted" array should be convex.
This gives a hint for an algorithm: sort the array, then split its values into two convex subsets, then extract one of these subsets and append it (in reverse order) at the end.
A simple (partial) implementation would be: sort the array, find a subset of values that belong to convex hull, then check all the remaining values, and if they are convex, append them at the end. This algorithm works only if one of the subsets lies completely below the other one.
If the resulting subsets intersect (as shown on the right diagram), an improved version of this algorithm may be used: split sorted array into segments where one of the subsets lies completely below other one (A-B, B-C), then for each of these segments find convex hull and check convexity of the remaining subset. Note that X axis on the right diagram corresponds to the array indexes in a special way: for subset intersections (A, B, C) X corresponds to an index in ascending-sorted array; X coordinates for values between intersections are scaled according to their positions in the resulting array.
Sketch of an algorithm
1. Sort the array in ascending order.
2. Starting from the largest value, try adding convex hull values to the "top" subset (in a way similar to the Graham scan algorithm). Also put all the values not belonging to the convex hull into the "bottom" subset and check its convexity. Continue while all the values properly fit into either the "top" or the "bottom" subset. When the smallest value is processed, remove one of these subsets from the array, reverse the subset, and append it at the end of the array.
3. If, after adding some value to the "top" subset, the "bottom" subset is not convex anymore, roll back the last addition and check if this value can be properly added to the "bottom" subset. If not, stop, because the input array cannot be "sorted" as required. Otherwise, exchange the "top" and "bottom" subsets and continue with step 2 (already processed values should not be moved between subsets; any attempt to move them should result in going to step 3).
In other words, we could process each value of sorted array, from largest to smallest, trying to append this value to one of two subsets in such a way that both subsets stay convex. At first, we try to place a new value to the subset where previous value was added. This may make several values, added earlier, unfit to this subset - then we check if they all fit to other subset. If they do - move them to other subset, if not - leave them in "top" subset but move current value to other subset.
Time complexity
Each value is added to or removed from the "top" subset at most once, and it may be added to the "bottom" subset at most once. For each operation on an element we need to inspect only its two nearest predecessors. This means the worst-case time complexity of steps 2 and 3 is O(N). So the overall time complexity is determined by the sorting algorithm in step 1.

Resources