Minimum number of swaps needed so that no number has two neighbours that are both greater - algorithm

The problem statement goes like this: Given a list of N < 500 000 distinct numbers, find the minimum number of swaps of adjacent elements required such that no number has two neighbours that are both greater. A number can only be swapped with a neighbour.
Hint given: Use a segment tree or a fenwick tree.
I don't really get the idea of how I should use a sum-tree to solve this problem.
Example inputs:
Input 1:
5 (number of elements in the list)
3 1 4 2 0
Output 1: 1
Input 2:
6
4 5 2 0 1 3
Output 2: 4

I can do it in O(n log n) time and O(n) extra space. But first let's look at the quadratic solution I've hinted at earlier:
initialize the result accumulator to 0
while the input list has more than two elements
find the lowest element in the list
add its distance from the closer end of the list to the accumulator
remove the element from the list.
output the accumulator.
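A direct transcription of this greedy procedure in Python (quadratic, as discussed; the function name is mine):

```python
def min_swaps_quadratic(a):
    """O(n^2) reference implementation of the greedy algorithm above."""
    a = list(a)
    total = 0
    while len(a) > 2:
        i = a.index(min(a))              # position of the lowest element
        total += min(i, len(a) - i - 1)  # distance to the closer end
        del a[i]                         # remove it and repeat on the rest
    return total
```

On the examples, `min_swaps_quadratic([3, 1, 4, 2, 0])` gives 1 and `min_swaps_quadratic([4, 5, 2, 0, 1, 3])` gives 4, matching the expected outputs.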
Why does this work? First, let's look at what a sequence that requires zero swaps looks like. Since there are no duplicates, if the lowest element is anywhere but at either end, it is surrounded by two elements that are both greater, violating the requirement; thus the lowest element must be at one of the ends. Then recurse into the subsequence that excludes this element. To bring a sequence into this state, at least as many swaps involving the lowest element as in the greedy algorithm are required to move it to one end, and since swaps involving the lowest element do not change the relative order of the remaining elements, there is no penalty in performing them first.
Unfortunately, just implementing this with a list is quadratic. How do you make it faster? Have a finger tree that tracks the subtree weight and minimum value of each subtree and update these as you remove individual minima:
To initialize the tree: First, think of each element in the list as a one-element sublist with its minimum equal to its value. Then, while you have more than one sublist, group the subsequences in pairs, building a tree of subsequences. The length of a sequence is the sum of lengths of both its halves, and its minimum is equal to whichever minimum from both halves is lower.
To remove the minimum from a subsequence while tracking its index in the sequence:
Decrease the length of the subsequence
Remove the minimum from whichever half's minimum is equal to this subsequence minimum
The new minimum is the lower of the two halves' new minima
The index of the minimum is equal to its index in its respective half, plus the length of the left half if the minimum was in the right half.
The distance from one end is then equal to either the index or (length before removal - index - 1), whichever is lower.
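One way to sketch the O(n log n) version in Python: instead of a finger tree, this uses a plain array-backed segment tree (matching the original hint), where each node stores the number of remaining elements and the minimum remaining value in its subtree; `min_swaps` is a name I made up, and the input is assumed distinct as the problem guarantees:

```python
def min_swaps(a):
    n = len(a)
    size = 1
    while size < n:
        size *= 2
    INF = float('inf')
    length = [0] * (2 * size)    # remaining elements in each subtree
    mn = [INF] * (2 * size)      # minimum remaining value in each subtree
    for i, v in enumerate(a):
        length[size + i] = 1
        mn[size + i] = v
    for i in range(size - 1, 0, -1):
        length[i] = length[2 * i] + length[2 * i + 1]
        mn[i] = min(mn[2 * i], mn[2 * i + 1])

    total = 0
    remaining = n
    while remaining > 2:
        # descend to the minimum, accumulating its index among remaining elements
        node, idx = 1, 0
        while node < size:
            left = 2 * node
            if mn[left] == mn[node]:
                node = left
            else:
                idx += length[left]
                node = 2 * node + 1
        total += min(idx, remaining - idx - 1)  # distance to the closer end
        # remove the leaf and update its ancestors
        length[node] = 0
        mn[node] = INF
        node //= 2
        while node:
            length[node] = length[2 * node] + length[2 * node + 1]
            mn[node] = min(mn[2 * node], mn[2 * node + 1])
            node //= 2
        remaining -= 1
    return total
```

Each removal touches O(log n) nodes, and there are at most n removals, giving O(n log n) time and O(n) extra space.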

Related

Finding the medians of multiple subarrays in an unsorted array

Suppose you are given an unsorted array of integers S and a list of ranges in T, return a list of medians from each of the ranges.
For example, S = [3,6,1,5,0,0,1,-2], T = [[1,3],[0,5],[4,4]]. Return [5, 2, 0].
Is there a better approach than running Median of Medians on each range? Can we somehow precompute/cache the results?
Let me introduce you to an interesting data structure called Wavelet Tree:
You build it by looking at the bit-string representation of your integers and recursively bisecting them:
You first separate your integers into those with most significant bit (MSB) 0 and those with MSB 1; however, you store the MSBs in their original order in a bitvector. Then, for each of these subsets, you ignore the MSB and recursively repeat this construction for the next-most-significant bit.
If you repeat this down to the least significant bit, you get a tree structure like this (note that the indices are just there for illustration, you should store only the bitvectors):
You can easily see that the construction of this data structure takes O(n log N) time where n is the number of integers and N is their maximum value.
Wavelet trees have the nice property that they represent the original sequence as well as their sorted counterpart at the same time:
If you read the topmost bitvector, you get the MSBs of the input sequence. To reconstruct the next bit of the entries, you can alternate between looking in the bitvector in the root's left child (if the MSB is 0) or in the right child (if the MSB is 1). For the following bits, you can continue recursively.
If you read the leaf nodes from left to right, you get the sorted sequence.
To use a Wavelet tree efficiently, you need two fundamental operations on the bitvectors:
rank1(k) tells you how many 1s come before the kth position in the bitvector, rank0 does the same for 0s
select1(k) tells you the index of the kth 1 in the bitvector, select0 does the same for 0s
Note that there are bitvector representations that require only o(n) (little-o) bits of additional storage to implement these operations in O(1).
You can utilize them as follows:
If you are looking at the first 7 in the sequence above, it has index 3. If you now want to know which index it has in the right child node, you simply call rank1(3) on the root bitvector and get 2, which is exactly the index of the first 7 in the right child
If you are at the child containing 4544 and want to know the position of the second 4 (with index 2) in the parent node containing 46754476, you call select0(2) on the parent's bitvector and get the index 5.
Now how can you implement a range median query with this? The most important realization is that finding the median of a range of size k is equivalent to selecting the ⌈k/2⌉-th smallest element of that range.
The basic idea of the algorithm is similar to Quickselect: Bisect the element range and recurse only into the range containing the element you are looking for.
Let's say we want to find the median of the range starting at the second 2 (inclusive) and ending at the 1 (exclusive).
These are 7 elements, thus the median has rank 4 (fourth-smallest element) in that range.
Now using a rank0/1 call in the root bitvector at the beginning and end of this range, we find the corresponding ranges in the children of the root:
As you can see, the left range (which contains only smaller elements) has only 3 elements, so the element with rank 4 must be contained in the right child of the root. We can now recursively search for the element with rank 4 - 3 = 1 in that right child. By descending the Wavelet tree until you reach a leaf, you can thus identify the median with only two rank operations (each O(1)) per level, so the whole range median query takes O(log N) time, where N is the maximum number in your input sequence.
If you want to see a practical implementation of these Wavelet trees, have a look at the Succinct Data Structures Library (SDSL) which implements the aforementioned bitvectors and different WT variants.
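To make the construction and the quantile descent concrete, here is a minimal Python sketch. It is pointer-based and does not use succinct bitvectors, so rank is implemented with prefix-count arrays; the class and method names are my own, and ranges are half-open `[l, r)`:

```python
class WaveletTree:
    def __init__(self, seq, lo=None, hi=None):
        # this node represents the subsequence of values in [lo, hi]
        if lo is None:
            lo, hi = min(seq), max(seq)
        self.lo, self.hi = lo, hi
        if lo == hi or not seq:
            self.left = self.right = None
            return
        mid = (lo + hi) // 2
        # zeros[i] = how many of seq[:i] go to the left child (value <= mid);
        # this plays the role of rank0 on the node's bitvector
        self.zeros = [0]
        for v in seq:
            self.zeros.append(self.zeros[-1] + (v <= mid))
        self.left = WaveletTree([v for v in seq if v <= mid], lo, mid)
        self.right = WaveletTree([v for v in seq if v > mid], mid + 1, hi)

    def quantile(self, l, r, k):
        """k-th smallest value (1-based) among seq[l:r]."""
        if self.lo == self.hi:
            return self.lo
        in_left = self.zeros[r] - self.zeros[l]
        if k <= in_left:
            return self.left.quantile(self.zeros[l], self.zeros[r], k)
        return self.right.quantile(l - self.zeros[l], r - self.zeros[r],
                                   k - in_left)
```

For the example above, `WaveletTree([3, 6, 1, 5, 0, 0, 1, -2])` answers `quantile(1, 4, 2) == 5` and `quantile(4, 5, 1) == 0`; for even-sized ranges you can average the two middle quantiles. A production implementation would store one bitvector per level with O(1) rank, as in SDSL.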

Find the two minimum non-edge, non-adjacent entries in an array

I had the following question
Find the smallest two non-adjacent values in an array, such that none of these elements is at an edge of the array (i.e. not A[0] and not A[n-1])
The runtime of the algorithm should be O(n)
I first thought about sorting the array, but sorting costs O(n log n).
Ignoring that for a second: if we sort the array, we cannot just take the first two values, since they might violate the conditions mentioned above. And then what? Take the next element and try, and if that fails, the next? I can't see an easy solution there.
Another approach is to generate all allowed pairs and find the pair with the minimum sum, but generating all pairs costs O(n^2).
Any ideas?
In linear time, find the ith smallest entry (excluding the first and last) for i from 1 to 4. The best possibility is a pair of these. If 1 and 2 are nonadjacent, then that's the best. Otherwise, if 1 and 3 are nonadjacent, then that's the best. Otherwise, 2 and 3 are bordering 1 (hence not each other), and the possibilities are 1 and 4, or 2 and 3.
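A sketch of this case analysis in Python. For brevity it sorts the interior elements, which is O(n log n); to meet the O(n) bound, replace the sort with a linear-time selection of the four smallest. The function name and the returned sum are my own choices:

```python
def min_nonadjacent_pair(a):
    """Minimum sum of two non-adjacent, non-edge elements of a."""
    # interior elements paired with their original indices
    interior = sorted((v, i) for i, v in enumerate(a[1:-1], start=1))
    if len(interior) < 3:
        raise ValueError("no two non-adjacent interior positions exist")
    s = interior[:4]  # at most the four smallest are ever needed
    (v1, i1), (v2, i2), (v3, i3) = s[0], s[1], s[2]
    if abs(i1 - i2) > 1:        # 1 and 2 non-adjacent: best possible
        return v1 + v2
    if abs(i1 - i3) > 1:        # otherwise try 1 and 3
        return v1 + v3
    # otherwise 2 and 3 flank 1 (so they are non-adjacent to each other);
    # the remaining candidate is 1 paired with the fourth smallest
    best = v2 + v3
    if len(s) >= 4:
        best = min(best, v1 + s[3][0])
    return best
```

For example, `min_nonadjacent_pair([5, 1, 4, 2, 6])` returns 3 (the interior values 1 and 2 at indices 1 and 3).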
You could go with your sort first, then, unless I am missing something, take elements 0 and 2. Those would be the smallest non-adjacent values.
As long as the array is 3 elements or greater you should be assured that the element values in position 0 and 2 are the smallest (and if you need them to be, non-consecutive) as well as non-adjacent in the array.
If your array is sorted, you would only have to keep comparing alternate elements (indices (0,2), (1,3), (2,4) and so on) and then find the pair with the smallest sum. But without sorting, you are right that the run-time complexity would become O(n^2), since you would have to compare every element with every other element in the array.

The minimum number of "insertions" to sort an array

Suppose there is an unordered list. The only operation we can do is to move an element and insert it back to any place. How many moves does it take to sort the whole list?
I guess the answer is size of the list - size of longest ordered sequence, but I have no idea how to prove it.
First note that moving an element doesn't change relative order of elements other than the one being moved.
Consider the longest non-decreasing subsequence (closely related to the longest increasing subsequence - the way to find them are similar).
By only moving the element not in this sequence, it's easy to see that we'd end up with a sorted list, since all the elements in this sequence are already sorted relative to each other.
If we don't move any elements in this sequence, any other element between two elements in this subsequence is guaranteed to be either greater than the larger element, or smaller than the smaller one (if this is not true, it itself would be in the longest sequence), so it needs to be moved.
(see below for example)
Does it need to be non-decreasing? Yes. Consider if two consecutive elements in this sequence are decreasing. In this case it would be impossible to sort the list without moving those two elements.
To minimize the number of moves required, it's sufficient to pick the longest sequence possible, as done above.
So the total number of moves required is the size of the list minus the size of the longest non-decreasing subsequence.
An example explaining the value of an element not in the non-decreasing subsequence mentioned above:
Consider the longest non-decreasing subsequence 1 x x 2 x x 2 x 4 (the x's are some elements not part of the sequence).
Now consider the possible values for an x between 2 and 4.
If it's 2, 3 or 4, the longest subsequence would include that element. If it's greater than 4 or smaller than 2, it needs to be moved.
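The move count from this argument, n minus the length of the longest non-decreasing subsequence, can be computed in O(n log n) with the standard patience-sorting technique (using `bisect_right` rather than `bisect_left` so that equal elements may repeat); the function name is mine:

```python
from bisect import bisect_right

def min_moves_to_sort(a):
    """len(a) minus the length of the longest non-decreasing subsequence."""
    tails = []  # tails[i] = smallest possible tail of a subsequence of length i+1
    for v in a:
        pos = bisect_right(tails, v)
        if pos == len(tails):
            tails.append(v)   # extend the longest subsequence found so far
        else:
            tails[pos] = v    # found a smaller tail for this length
    return len(a) - len(tails)
```

For example, `min_moves_to_sort([3, 1, 2])` returns 1, and `min_moves_to_sort([1, 2, 3, 1000, 4, 5, 6])` also returns 1 (only the 1000 needs to move).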
It is easy to prove that size of the list - size of the longest ordered sequence moves are always enough to sort any sequence, e.g. by mathematical induction.
You can also easily show that for some unordered sequences it is the best you can do, simply by exhibiting such a sequence. E.g. to sort the sequence 3, 1, 2 you need one move of one item (the 3), it is trivial to see that it cannot be done faster, and size of the list - size of the longest ordered sequence is indeed 1.
However, be careful what "ordered sequence" means here. If you read it as the longest contiguous sorted run, the formula fails for sequences built from multiple sorted runs S[i] with max(S[i]) < min(S[i+1]) for every i: in 1, 2, 3, 1000, 4, 5, 6 the longest run has length 4, suggesting 3 moves, yet moving the 1000 alone sorts the list. With the longest non-decreasing subsequence (length 6 here), the formula correctly gives 1.

Algorithm to generate a 'nearly sorted' or 'k sorted' list?

I want to generate some test data to test a function that merges 'k-sorted' lists (lists where each element is at most k positions away from its correct sorted position) into a single fully sorted list. I have an approach that works, but I'm not sure how well randomized it is, and I feel there should be a simpler / more elegant way to do this. My current approach:
Generate n random elements paired with an integer index.
Sort random elements.
Set paired index for each element to its sorted position.
Work backwards through the elements, swapping each element with an element a random distance between 1 and k positions behind it in the list. Only swap with the target element if its paired index is its current index (this avoids swapping an element that is already out of place and moving it further than k positions away from where it should be).
Copy the perturbed elements out into another list.
Like I say, this works but I'm interested in alternative / better approaches.
I think you could just fill an array with random integers and then run quicksort on it with a custom stopping condition.
If, in a particular quicksort recursion, your start and end indexes are less than k apart, just return instead of continuing to recurse.
Because of how quicksort works, every number in the start..end interval belongs somewhere in that region; the worst case is that array[start] really belongs at array[end] (or vice versa) in truly sorted order. So ensuring that start and end are no more than k apart is sufficient.
You can generate an array of random numbers and then h-sort it as in Shellsort, but omitting the last few sorting passes, where h is less than k.
Step 1: Randomly permute disjoint segments of length k (e.g. 1 to k, k+1 to 2k, ...).
Step 2: Permute again on shifted segments (1+t to k+t, k+1+t to 2k+t, ...), where t is a number between 1 and k (preferably around k/2), swapping only where this does not break the k-sorted property.
Probably repeat step 2 multiple times with different t.
If I understand the problem, you want an algorithm to randomly pick a single k-sorted list of length n, uniformly selected from the universe U of all k-sorted lists of length n. (You will then run this algorithm m times to produce m lists as input test data.)
The first step is to count them: what is the size |U| of U?
The next step is to enumerate them. Create any one-to-one mapping F between the integers (1,2,...,|U|) and k-sorted lists of length n.
Then randomly select an integer x between 1 and |U| inclusive, and then apply F(x) to get the list.

Sort sub-array minimally

This is an interview question.
Given an array of integers, write a method to find indices m and n such that if you sorted elements m through n, the entire array would be sorted. Minimize n - m, i.e. find the smallest such window.
Observation
The integers before m should be ascending and smaller than (or equal to) any integers after.
Algorithm
Start from the first element and stop at the first decrease (call this prefix sub-array SA).
Find the minimum of everything after it (MIN).
The start point m is just after the largest integer in SA that is smaller than (or equal to) MIN.
Complexity
O(N)
Do the same symmetrically from the other end to find n.
You need to keep track of four things:
End of sorted region at the beginning
Start of sorted region at the end
Minimum number after the beginning region
Maximum number before the end region
Start by figuring out a preliminary value for 1 and 2, by scanning the array from the start and from the end until you find a misplaced value.
Then you scan everything after your preliminary 1, to find the minimum number. This is your 3. Find 4 in the same way.
Now you backtrack through the start region of the array until you find the place where the minimum value should go. This is the exact answer to 1, and also your m.
Find n in the same way by backtracking through the end region to find where the maximum number should be.
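Putting the four tracked quantities together, a linear-time sketch in Python (the function name and the convention of returning the pair (m, n), or None for an already sorted array, are mine):

```python
def min_window_to_sort(a):
    n = len(a)
    # 1. end of the sorted region at the beginning
    left = 0
    while left + 1 < n and a[left] <= a[left + 1]:
        left += 1
    if left == n - 1:
        return None  # already sorted
    # 2. start of the sorted region at the end
    right = n - 1
    while a[right - 1] <= a[right]:
        right -= 1
    # 3. minimum after the beginning region, 4. maximum before the end region
    window_min = min(a[left + 1:])
    window_max = max(a[:right])
    # backtrack: where do the min and max belong in the sorted regions?
    m = 0
    while a[m] <= window_min:
        m += 1
    end = n - 1
    while a[end] >= window_max:
        end -= 1
    return (m, end)
```

For example, `min_window_to_sort([1, 2, 6, 4, 3, 5, 7, 8])` returns (2, 5): sorting the slice [6, 4, 3, 5] sorts the whole array, and no shorter window does.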
