Sort a given array based on parent array using only swap function - algorithm

It is a coding interview question. We are given an array say random_arr and we need to sort it using only the swap function.
Also the number of swaps for each element in random_arr are limited. For this you are given an array parent_arr, containing number of swaps for each element of random_arr.
Constraints:
You should use swap function.
Every element may repeat minimum 5 times and maximum 26 times.
You cannot make elements of given array to 0.
You should not write helper functions.
Now I will explain how parent_arr is declared. If parent_arr is like:
parent_arr[] = {a,b,c,d,...,z} then
a can be swapped at most one time.
b can be swapped at most two times.
if parent_arr[] = {c,b,a,....,z} then
c can be swapped at most one time.
b can be swapped at most two times.
a can be swapped at most three times
My solution:
For each element in random_arr[] store that how many elements are below it, if it is sorted. Now select element having minimum swap count from parent_arr[] and check whether it exist in random_arr[]. If yes and it if has occurred more than one time then it will have more than one location where it can be placed. Now choose the position(rather element at that position, preciously) with maximum swap count and swap it. Now decrease the swap count for that element and sort the parent_arr[] and repeat the process.
But it is quite inefficient and its correctness can't be proved. Any ideas?

First, let's simplify your algorithm; then let's informally prove its correctness.
Modified algorithm
Observe that once you computed the number of elements below each number in the sorted sequence, you have enough information to determine for each group of equal elements x their places in the sorted array. For example, if c is repeated 7 times and has 21 elements ahead of it, then cs will occupy the range [21..27] (all indexes are zero-based; the range is inclusive of its ends).
Go through the parent_arr in the order of increasing number of swaps. For each element x, find the beginning of its target range rb; also note the end of its target range re. Now go through the elements of random_arr outside of the [rb..re] range. If you see x, swap it into the range. After swapping, increment rb. If you see that random_arr[rb] is equal to x, continue incrementing: these xs are already in the right spot, you wouldn't need to swap them.
Informal proof of correctness
Now lets prove the correctness of the above. Observe that once an element is swapped into its place, it is never moved again. When you reach an element x in the parent_arr, all elements with lower number of swaps are already processed. By construction of the algorithm this means that these elements are already in place. Suppose that x has k number of allowed swaps. When you swap it into its place, you move another element out.
This replaced element cannot be x, because the algorithm skips xs when looking for a destination in the target range [rb..re]. Moreover, the replaced element cannot be one of elements below x in the parent_arr, because all elements below x are in their places already, and therefore cannot move. This means that the swap count of the replaced element is necessarily k+1 or more. Since by the time that we finish processing x we have exhausted at most k swaps on any element (which is easy to prove by induction), any element that we swap out to make room for x will have at least one remaining swap that would allow us to swap it in place when we get to it in the order dictated by the parent_arr.

Related

Looking for a limited shuffle algorithm

I have a shuffling problem. There is lots of pages and discussions about shuffling a array of values completely, like a stack of cards.
What I need is a shuffle that will uniformly displace the array elements at most N places away from its starting position.
That is If N is 2 then element I will be shuffled at most to a position from I-2 to I+2 (within the bounds of the array).
This has proven to be tricky with some simple solutions resulting in a directional bias to the element movement, or by a non-uniform amount.
You're right, this is tricky! First, we need to establish some more rules, to ensure we don't create artificially non-random results:
Elements can be left in the position they started in. This is a necessary part of any fair shuffle, and also ensures our shuffle will work for N=0.
When N is larger than an element's distance from the start or end of the array, it's allowed to be moved to the other side. We could tweak the algorithm to forbid this, but it would violate the "uniformly" requirement - elements near either end would be more likely to stay put than elements near the middle.
Now we can actually solve the problem.
Generate an array of random value in the range i + [-N, N] where i is the current index in the array. Normalize values outside the array bounds (e.g. -1 should become length-1 and length should become 0).
Look for pairs of duplicate values (collisions) in the array, and recompute them. You have a few options:
Recompute both values until they don't collide with each other, they could both still collide with other values.
Recompute just one until it doesn't collide with the other, the first value could still collide, but the second should now be unique, which might mean fewer calls to the RNG.
Identify the set of available indices for each collision (e.g. in [3, 1, 1, 0] index 2 is available), pick a random value from that set, and set one of the array values to selected result. This avoids needing to loop until the collision is resolved, but is more complex to code and risks running into a case where the set is empty.
However you address individual collisions, repeat the process until every value in the array is unique.
Now move each element in the original array to the index specified in the array we generated.
I'm not sure how to best implement #2, I'd suggest you benchmark it. If you don't want to take the time to benchmark, I'd go with the first option. The others are optimizations that might be faster, but might actually end up being slower.
This solution has an unbounded runtime in theory, but should terminate reasonably quickly in practice. Again, benchmark and test it before using it anywhere critical.
One possible solution I have come up with though how 'naive' it is I am not certain. Especially at edges, the far edge especially.
create a array of flags (boolean) N long (representing elements that have been swapped)
For At each index check if it has already been swapped (according first element in flags array) if so, move on to next (see below)
rotate the flags array, deleting the first element (representing this
element), and add a new 'not swapped' element to end. ASIDE: This
maybe done using a modulus array lookup, to avoid having to actually
move array contents, especially for large N
Loop...
pick a number from 0 to N (or less than N, if N plus current
index is larger that array being shuffled.
If 0, element swaps with itself, move to next.
Otherwise if that element marked as swapped, Loop and try again.
Note there is always 2 elements in flags array that can be picks, itself
and the last element (unless close to end of array being shuffled)
Swap current element with selected unswapped element, mark the selected element as swapped in the flags array. Loop to next element

Efficient algorithm for finding k largest elements with range of values

Assume there is a list of elements each with a range, such that the value of the element would lie in the range. The ranges between elements may overlap. The exact value is unknown, but it can be calculated. What would be an optimal algorithm to select the elements with highest k values, such that the number of exact computations is minimum?
I have a very naive and straight-forward algorithm, but definitely this is not optimal.
Sort the ranges according to maximum range values.
Compute first k values.
Remove the elements for which the maximum range value is less than the value of the k^{th} highest value till now.
From the remaining elements, calculate the value of the element with the maximum range value and update highest k list. If there are no remaining elements, then stop.
Go to 3
This can be improved without leaving the realm of naivité:
It is assured, that an element A, where the range-max is lower than the range-min of an element B also has a lower real value. So you drop all elements, that have a range-max lower than the 5th highest range-min. This leaves you with a much smaller list: If your original list is long (i.e.: Disk-based) you most likely can reduce it to a mem-based version. In addition to that, the selection run will most likely leave you with this sub-list already sorted.
If still necessary, sort the smaller list
(*) Now cycle similar to your original algorithm:
remove the highest-max element from the list and calculate the real value for it, ordering it into a sorted working list
move all values, that have range-max below this value from the current list to a secondary list, keeping the sortedness
This gives you an even shorter working list, that is assured to contain the highest values
if this has enough entries, chose the k highest and be done
if this not, make the secondary list your new primary list and goto (*)

Algorithm to generate a 'nearly sorted' or 'k sorted' list?

I want to generate some test data to test a function that merges 'k sorted' lists (lists where each element is at most k positions away from it's correct sorted position) into a single fully sorted list. I have an approach that works but I'm not sure how well randomized it is and I feel there should be a simpler / more elegant way to do this. My current approach:
Generate n random elements paired with an integer index.
Sort random elements.
Set paired index for each element to its sorted position.
Work backwards through the elements, swapping each element with an element a random distance between 1 and k positions behind it in the list. Only swap with the target element if its paired index is its current index (this avoids swapping an element that is already out of place and moving it further than k positions away from where it should be).
Copy the perturbed elements out into another list.
Like I say, this works but I'm interested in alternative / better approaches.
I think you could just fill an array with random integers and then run quicksort on it with a custom stopping condition.
If in a particular quicksort recursion your start and end indexes are less than k apart, then just return instead of continuing to recur.
Because of how quicksort works, every number in the start..end interval belongs somewhere in that region; worst case is that array[start] might really belong at array[end] (or vice versa) in truly sorted order. So, assuring that start and end are no more than k apart should be sufficient.
You can generate array of random numbers and then h-sort it like in shellsort, but without fiew last sorting steps when h is less then k.
Step 1: Randomly permute disjoint segments of length k. (Eg. 1 to K, k+1 to 2k ...)
Step 2: Permute conditionally again by swapping (that they don't break k-sorted assumption (1+t yo k+t, k+1+t to 1+2k+t ...) where t is a number between 1 and k (most preferably k/2)
Probably repeat step 2 multiple times with different t.
If I understand the problem, you want an algorithm to randomly pick a single k-sorted list of length n, uniformly selected from the universe U of all k-sorted lists of length n. (You will then run this algorithm m times to produce m lists as input test data.)
The first step is to count them. What is the size of U? |U|
The next step is to enumerate them. Create any one-to-one mapping F between the integers (1,2,...,|U|) and k-sorted lists of length n.
Then randomly select an integer x between 1 and |U| inclusive, and then apply F(x) to get the list.

algorithm to find the three majority elements in an array

Let's say there are three elements in a non-sorted array all of which appear more than one-fourth times of the total number of elements.
What is the most efficient way to find these elements? Both for non-online and online versions of this question.
Thank you!
Edit
The non-online version I was referring to is: this array is specified in full. The online version means the array elements are coming one at a time.
I require the space in addition to time complexity to be tight.
disclaimer: THIS IS NOT HOMEWORK! I consider this as research-level question.
Remember up to three elements, together with counters.
remember first element, set count1 = 1
scan until you find first different element, increasing count1 for each occurrence of element 1
remember second elemt, set count2 = 1
scan until you find first element different from elem1 and elem2, incrementing count1 or count2, depending which element you see
remember third element, set count3 = 1
continue scanning, if the element is one of the remembered, increment its count, if it's none of the remembered, decrement all three counts; if a count drops to 0, forget the element, go to step 1, 3, or 5, depending on how many elements you forget
If you have three elements occurring strictly more than one-fourth times the number of elements in the array, you will end up with three remembered elements each with positive count, these are the three majority elements.
Small constant additional space, O(n), no sorting.
create a histogram of the entries, sort it, and take the three largest entries.

string transposition algorithm

Suppose there is given two String:
String s1= "MARTHA"
String s2= "MARHTA"
here we exchange positions of T and H. I am interested to write code which counts how many changes are necessary to transform from one String to another String.
There are several edit distance algorithms, the given Wikipeida link has links to a few.
Assuming that the distance counts only swaps, here is an idea based on permutations, that runs in linear time.
The first step of the algorithm is ensuring that the two strings are really equivalent in their character contents. This can be done in linear time using a hash table (or a fixed array that covers all the alphabet). If they are not, then s2 can't be considered a permutation of s1, and the "swap count" is irrelevant.
The second step counts the minimum number of swaps required to transform s2 to s1. This can be done by inspecting the permutation p that corresponds to the transformation from s1 to s2. For example, if s1="abcde" and s2="badce", then p=(2,1,4,3,5), meaning that position 1 contains element #2, position 2 contains element #1, etc. This permutation can be broke up into permutation cycles in linear time. The cycles in the example are (2,1) (4,3) and (5). The minimum swap count is the total count of the swaps required per cycle. A cycle of length k requires k-1 swaps in order to "fix it". Therefore, The number of swaps is N-C, where N is the string length and C is the number of cycles. In our example, the result is 2 (swap 1,2 and then 3,4).
Now, there are two problems here, and I think I'm too tired to solve them right now :)
1) My solution assumes that no character is repeated, which is not always the case. Some adjustment is needed to calculate the swap count correctly.
2) My formula #MinSwaps=N-C needs a proof... I didn't find it in the web.
Your problem is not so easy, since before counting the swaps you need to ensure that every swap reduces the "distance" (in equality) between these two strings. Then actually you look for the count but you should look for the smallest count (or at least I suppose), otherwise there exists infinite ways to swap a string to obtain another one.
You should first check which charaters are already in place, then for every character that is not look if there is a couple that can be swapped so that the next distance between strings is reduced. Then iterate over until you finish the process.
If you don't want to effectively do it but just count the number of swaps use a bit array in which you have 1 for every well-placed character and 0 otherwise. You will finish when every bit is 1.

Resources