Quick sort - Not a stable sort

A stable sort guarantees that equal keys do not get past each other during sorting.
Consider the duplicate key 4 at array indices 8 and 9 in the sequence below:
a = [5 20 19 18 17 8 4 5 4 4] where pivot = 0, i = 1, j = 9
The partition logic says:
The i pointer moves left to right: advance i as long as a[i] ≤ a[pivot].
The j pointer moves right to left: advance j as long as a[j] ≥ a[pivot].
When both pointers stop, swap(a[i], a[j]).
After following this procedure twice:
a = [5 4 19 18 17 8 4 5 4 20] Swap done at i = 1 & j = 9.
a = [5 4 19 18 17 8 4 5 4 20] Stops at i = 2 & j = 8
a = [5 4 4 18 17 8 4 5 19 20] Swap done at i = 2 & j = 8
My understanding is that because the duplicate 4s lost their relative order after two swaps, Quicksort is not a stable sort.
Question:
Is this the reason Quicksort is not stable? If yes, do we have any alternative partition approach that maintains the order of the key 4 in the above example?

There's nothing in the definition of Quicksort per se that makes it either stable or unstable. It can be either.
The most common implementation of Quicksort on arrays involves partitioning via swaps between a pair of pointers, one progressing from end to beginning and the other from beginning to end. This does produce an unstable Quicksort.
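For concreteness, here is a minimal sketch of such a two-pointer partition (my own illustration; exact index conventions vary between presentations). The commented swap is the step that can carry one equal key past another:

def partition(a, lo, hi):
    # Partition a[lo..hi] around a[lo] with two converging pointers.
    pivot = a[lo]
    i, j = lo + 1, hi
    while True:
        while i <= j and a[i] <= pivot:
            i += 1
        while j >= i and a[j] >= pivot:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]  # may swap one equal key past another
    a[lo], a[j] = a[j], a[lo]    # place the pivot between the two halves
    return j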
While this method of partitioning is certainly common, it's not a requirement for the algorithm to be a Quicksort. It's just a method that's simple and common when applying Quicksort to an array.
On the other hand, consider doing a quicksort on a singly linked list. In this case, you typically do the partitioning by creating two separate linked lists, one containing those smaller than the pivot value, the other containing those larger than the pivot value. You always traverse the list from beginning to end (there aren't many other reasonable choices with a singly linked list), and as long as you add each element to the end of its sub-list, the sub-lists you create contain equal keys in their original order. Thus, the result is a stable sort. On the other hand, if you don't care about stability, you can splice elements onto the beginnings of the sub-lists (slightly easier to do in constant time); in that case, the sort will (again) be unstable.
The actual mechanics of partitioning a linked list are pretty trivial, as long as you don't get too fancy in choosing the partition.
node dummy_node1, dummy_node2;  // dummy heads for the two sub-lists
node *list1 = &dummy_node1;
node *add1 = list1;             // tail of the "less than pivot" list
node *list2 = &dummy_node2;
node *add2 = list2;             // tail of the "not less than pivot" list

T pivot = input->data;          // easiest pivot value to choose

for (node *current = input; current != nullptr; current = current->next) {
    if (current->data < pivot) {
        add1->next = current;
        add1 = add1->next;
    }
    else {
        add2->next = current;
        add2 = add2->next;
    }
}
add1->next = nullptr;           // terminate both sub-lists
add2->next = nullptr;
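The same idea carries over to Python lists. A sketch (my own illustration, not the code above) showing that appending to the partitions in traversal order keeps equal keys in their original order:

def stable_quicksort(items, key=lambda x: x):
    # Three-way, list-based quicksort in the spirit of the linked-list
    # partition above; appends preserve the order of equal keys.
    if len(items) <= 1:
        return items
    k = key(items[0])
    less = [x for x in items if key(x) < k]
    equal = [x for x in items if key(x) == k]
    greater = [x for x in items if key(x) > k]
    return stable_quicksort(less, key) + equal + stable_quicksort(greater, key)

pairs = [(5, 'a'), (4, 'b'), (4, 'c'), (5, 'd'), (4, 'e')]
print(stable_quicksort(pairs, key=lambda p: p[0]))
# [(4, 'b'), (4, 'c'), (4, 'e'), (5, 'a'), (5, 'd')] - equal keys keep their order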


Algorithm to sort an Array, in which every element is 10 positions away from where it should be

What is the most efficient sorting algorithm to sort an array that has n elements, where EVERY element is originally 10 positions away from its position after sorting?
I am thinking about insertion sort, but I have no clue how to prove that:
(1) It is the most efficient.
(2) The algorithm needs O(n) steps in the worst case to sort the array.
A self-conceived example: [10,11,12,13,14,15,16,17,18,19,0,1,2,3,4,5,6,7,8,9]
With these constraints there are not that many possibilities:
The value at index 0 must go to index 10 as that is the only index that is 10 positions away from index 0. And which value can move to index 0? It can only be the value that is currently at index 10. So it's a swap between indexes 0 and 10.
With the same reasoning the value at index 1 will swap with the value at index 11, and 2 with 12, 3 with 13, ... 9 with 19.
So now we have covered all indices in the range 0..19. No values outside this range will get into this range, nor will any value in this range move out of it. All movements involving these indices are already defined above.
We can repeat the same reasoning for indices in the range 20..39, and again from positions 40..59, ...etc
So we can conclude:
The array's size is necessarily a multiple of 20
Only one permutation is possible that abides by the given rules
The solution is therefore simple.
Solution in pseudo code:
sort(A):
    for i = 0 to size(A) - 1 step 20:
        swap A[i+0..i+9] with A[i+10..i+19]
In some languages the swap of such array slices can be done very efficiently.
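In Python, for instance, slice assignment expresses the block swap directly. A sketch, assuming (as derived above) that the length is a multiple of 20:

def sort_ten_away(a):
    # Swap each block of 10 with the following block of 10.
    for i in range(0, len(a), 20):
        a[i:i+10], a[i+10:i+20] = a[i+10:i+20], a[i:i+10]

a = [10,11,12,13,14,15,16,17,18,19,0,1,2,3,4,5,6,7,8,9]
sort_ten_away(a)
print(a)  # [0, 1, 2, ..., 19]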
When we say 10 positions away, the actual position could be i - 10 or i + 10.
So just make a temporary copy of the array and, for each position, take the minimum of the candidates 10 index positions away on either side.
This works because the only possible clash is one element moving +10 and another moving -10 into the same index j; taking the minimum installs the correct value at index j.
import java.util.Arrays;

private static void solve(int[] arr) {
    int[] temp = new int[arr.length];
    Arrays.fill(temp, Integer.MAX_VALUE);
    for (int i = 0; i < arr.length; ++i) {
        // arr[i] must end up either 10 positions to the left or to the right
        if (i - 10 >= 0) temp[i - 10] = Math.min(temp[i - 10], arr[i]);
        if (i + 10 < temp.length) temp[i + 10] = Math.min(temp[i + 10], arr[i]);
    }
    for (int i = 0; i < arr.length; ++i) arr[i] = temp[i];
}

In what order should you insert a set of known keys into a B-Tree to get minimal height?

Given a fixed number of keys or values (stored either in an array or in some data structure) and the order of the b-tree, can we determine the sequence of inserting keys that would generate a space-efficient b-tree?
To illustrate, consider a b-tree of order 3. Let the keys be {1,2,3,4,5,6,7}. Inserting the elements into the tree in the following order
for (int i = 1; i < 8; ++i)
{
    tree.push(i);
}
would create a tree like this
   4
 2   6
1 3 5 7
see http://en.wikipedia.org/wiki/B-tree
But inserting elements in this way
flag = true;
for (int i = 1, j = 7; i < 8; ++i, --j)
{
    if (flag)
    {
        tree.push(i);
        flag = false;
    }
    else
    {
        tree.push(j);
        flag = true;
    }
}
creates a tree like this
   3 5
1 2  4  6 7
where we can see there is a decrease in height.
So is there a particular way to determine sequence of insertion which would reduce space consumption?
The following trick should work for most ordered search trees, assuming the data to insert are the integers 1..n.
Consider the binary representation of your integer keys - for 1..7 (with dots for zeros) that's...
Bit : 210
1 : ..1
2 : .1.
3 : .11
4 : 1..
5 : 1.1
6 : 11.
7 : 111
Bit 2 changes least often, Bit 0 changes most often. That's the opposite of what we want, so what if we reverse the order of those bits, then sort our keys in order of this bit-reversed value...
Bit : 210 Rev
4 : 1.. -> ..1 : 1
------------------
2 : .1. -> .1. : 2
6 : 11. -> .11 : 3
------------------
1 : ..1 -> 1.. : 4
5 : 1.1 -> 1.1 : 5
3 : .11 -> 11. : 6
7 : 111 -> 111 : 7
It's easiest to explain this in terms of an unbalanced binary search tree, growing by adding leaves. The first item is dead centre - it's exactly the item we want for the root. Then we add the keys for the next layer down. Finally, we add the leaf layer. At every step, the tree is as balanced as it can be, so even if you happen to be building an AVL or red-black balanced tree, the rebalancing logic should never be invoked.
[EDIT I just realised you don't need to sort the data based on those bit-reversed values in order to access the keys in that order. The trick to that is to notice that bit-reversing is its own inverse. As well as mapping keys to positions, it maps positions to keys. So if you loop through from 1..n, you can use this bit-reversed value to decide which item to insert next - for the first insert use the 4th item, for the second insert use the second item and so on. One complication - you have to round n upwards to one less than a power of two (7 is OK, but use 15 instead of 8) and you have to bounds-check the bit-reversed values. The reason is that bit-reversing can move some in-bounds positions out-of-bounds and vice versa.]
Actually, for a red-black tree some rebalancing logic will be invoked, but it should just be re-colouring nodes - not rearranging them. However, I haven't double checked, so don't rely on this claim.
For a B tree, the height of the tree grows by adding a new root. Proving this works is, therefore, a little awkward (and it may require a more careful node-splitting than a B tree normally requires) but the basic idea is the same. Although rebalancing occurs, it occurs in a balanced way because of the order of inserts.
This can be generalised for any set of known-in-advance keys because, once the keys are sorted, you can assign suitable indexes based on that sorted order.
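As an illustration, a small Python sketch (my own; it assumes the keys are 1..n with n one less than a power of two) that produces this bit-reversed insertion order:

def insertion_order(n):
    # n must be one less than a power of two, e.g. 7 or 15
    bits = n.bit_length()
    def bit_reverse(x):
        r = 0
        for _ in range(bits):
            r = (r << 1) | (x & 1)
            x >>= 1
        return r
    return sorted(range(1, n + 1), key=bit_reverse)

print(insertion_order(7))  # [4, 2, 6, 1, 5, 3, 7]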
WARNING - This isn't an efficient way to construct a perfectly balanced tree from known already-sorted data.
If you have your data already sorted, and know its size, you can build a perfectly balanced tree in O(n) time. Here's some pseudocode...
if size is zero, return null
from the size, decide which index should be the (subtree) root
recurse for the left subtree, giving that index as the size (assuming 0 is a valid index)
take the next item to build the (subtree) root
recurse for the right subtree, giving (size - (index + 1)) as the size
add the left and right subtree results as the child pointers
return the new (subtree) root
Basically, this decides the structure of the tree based on the size and traverses that structure, building the actual nodes along the way. It shouldn't be too hard to adapt it for B Trees.
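Here is a Python sketch of that pseudocode for a plain binary search tree (my own rendering; items is assumed sorted, and the B-tree adaptation is left as described):

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def build_balanced(items):
    it = iter(items)  # items are consumed strictly in sorted order
    def build(size):
        if size == 0:
            return None
        left_size = size // 2            # decide which index is the subtree root
        left = build(left_size)          # build the left subtree first
        root = Node(next(it))            # take the next item as the subtree root
        root.left = left
        root.right = build(size - left_size - 1)
        return root
    return build(len(items))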
This is how I would add elements to a b-tree.
Thanks to Steve314 for giving me the start with the binary representation.
Given n elements to add, in order, to an order-m b-tree: take their indexes (1...n) and convert them to radix m. The main idea of this insertion is to insert the number with the highest m-radix bit first, and to keep it above the lesser m-radix numbers added to the tree despite the splitting of nodes.
1,2,3... are indexes, so you actually insert the numbers they point to.
For example, an order-4 tree:
  4      8       12          <- highest radix bit numbers
1,2,3  5,6,7  9,10,11  13,14,15
Now, depending on the order, the median can be:
order is even -> number of keys is odd -> median is the middle (mid median)
order is odd -> number of keys is even -> left median or right median
The choice of median (left/right) to be promoted decides the order in which I should insert elements. This has to be fixed for the b-tree.
I add elements to the tree in buckets: first I add one bucket's elements, then on completion the next bucket in order. Buckets can be easily created if the median is known; the bucket size is the order m.
I take the left median for promotion. Choosing the bucket for insertion:
| 4 | 8 | 12 |
1,2,|3 5,6,|7 9,10,|11 13,14,|15
3 2 1 Order to insert buckets.
For the left-median choice I insert buckets into the tree starting from the right side; for the right-median choice I insert buckets from the left side. Choosing the left median, we insert the median first, then the elements to its left, then the rest of the numbers in the bucket.
Example
Bucket median first
12,
Add elements to left
11,12,
Then, after all elements are inserted, it looks like:
| 12 |
|11 13,14,|
Then I choose the bucket to the left of it and repeat the same process.
Median
12
8,11 13,14,
Add elements to left first
12
7,8,11 13,14,
Adding rest
8 | 12
7 9,10,|11 13,14,
Similarly keep adding all the numbers,
4 | 8 | 12
3 5,6,|7 9,10,|11 13,14,
At the end add numbers left out from buckets.
| 4 | 8 | 12 |
1,2,|3 5,6,|7 9,10,|11 13,14,|15
For the mid median (even-order b-trees) you simply insert the median and then all the numbers in the bucket.
For the right median I add buckets from the left. For elements within a bucket I first insert the median, then the right elements, and then the left elements.
Here we are adding the highest m-radix numbers, and in the process I added the numbers with the immediately lesser m-radix bit, making sure the highest m-radix numbers stay at the top. Here I have only two levels; for more levels I repeat the same process in descending order of radix bits.
The last case is when the remaining elements have the same radix bit and there are no numbers with a lesser radix bit; then simply insert them and finish the procedure.
I would give an example for 3 levels, but it is too long to show. So please try with other parameters and tell me if it works.
Unfortunately, all trees exhibit their worst case scenario running times, and require rigid balancing techniques when data is entered in increasing order like that. Binary trees quickly turn into linked lists, etc.
For typical B-Tree use cases (databases, filesystems, etc), you can typically count on your data naturally being more distributed, producing a tree more like your second example.
Though if it is really a concern, you could hash each key, guaranteeing a wider distribution of values.
for (i = 1; i < 8; ++i)
    tree.push(hash(i));
To build a particular B-tree using Insert() as a black box, work backward. Given a nonempty B-tree, find a node with more than the minimum number of children that's as close to the leaves as possible. The root is considered to have minimum 0, so a node with the minimum number of children always exists. Delete a value from this node to be prepended to the list of Insert() calls. Work toward the leaves, merging subtrees.
For example, given the 2-3 tree
8
4 c
2 6 a e
1 3 5 7 9 b d f,
we choose 8 and do merges to obtain the predecessor
4 c
2 6 a e
1 3 5 79 b d f.
Then we choose 9.
4 c
2 6 a e
1 3 5 7 b d f
Then a.
4 c
2 6 e
1 3 5 7b d f
Then b.
4 c
2 6 e
1 3 5 7 d f
Then c.
4
2 6 e
1 3 5 7d f
Et cetera.
So is there a particular way to determine sequence of insertion which would reduce space consumption?
Edit note: since the question was quite interesting, I'll try to improve my answer with a bit of Haskell.
Let k be the Knuth order of the B-Tree and list a list of keys
The minimization of space consumption has a trivial solution:
-- won't use point free notation to ease haskell newbies
trivial k list = concat $ reverse $ chunksOf (k-1) $ sort list
Such an algorithm will efficiently produce a time-inefficient B-Tree, unbalanced on the left but with minimal space consumption.
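For readers less fluent in Haskell, a rough Python rendering of the same arrangement (a sketch; chunksOf splits a list into consecutive chunks of the given size):

def trivial(k, keys):
    # Chunks of k-1 sorted keys fill one leaf each; emitting the chunks
    # in reverse order yields the left-unbalanced, space-minimal tree.
    s = sorted(keys)
    chunks = [s[i:i + k - 1] for i in range(0, len(s), k - 1)]
    return [x for chunk in reversed(chunks) for x in chunk]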
A lot of non trivial solutions exist that are less efficient to produce but show better lookup performance (lower height/depth). As you know, it's all about trade-offs!
A simple algorithm that minimizes both the B-Tree depth and the space consumption (but it doesn't optimize lookup performance!) is the following
-- Sort the list in increasing order and call sortByBTreeSpaceConsumption
-- with the result
smart k list = sortByBTreeSpaceConsumption k $ sort list
-- Sort list so that inserting in a B-Tree with Knuth order = k
-- will produce a B-Tree with minimal space consumption and minimal depth
-- (but not best performance)
sortByBTreeSpaceConsumption :: Ord a => Int -> [a] -> [a]
sortByBTreeSpaceConsumption _ [] = []
sortByBTreeSpaceConsumption k list
    | k - 1 >= numOfItems = list -- this will be a leaf
    | otherwise = heads ++ tails ++ sortByBTreeSpaceConsumption k remainder
    where requiredLayers = minNumberOfLayersToArrange k list
          numOfItems = length list
          capacityOfInnerLayers = capacityOfBTree k $ requiredLayers - 1
          blockSize = capacityOfInnerLayers + 1
          blocks = chunksOf blockSize balanced
          heads = map last blocks
          tails = concat $ map (sortByBTreeSpaceConsumption k . init) blocks
          balanced = take (numOfItems - (mod numOfItems blockSize)) list
          remainder = drop (numOfItems - (mod numOfItems blockSize)) list
-- Capacity of a layer n in a B-Tree with Knuth order = k
layerCapacity k 0 = k - 1
layerCapacity k n = k * layerCapacity k (n - 1)

-- Infinite list of capacities of layers in a B-Tree with Knuth order = k
capacitiesOfLayers k = map (layerCapacity k) [0..]

-- Capacity of a B-Tree with Knuth order = k and l layers
capacityOfBTree k l = sum $ take l $ capacitiesOfLayers k

-- Infinite list of capacities of B-Trees with Knuth order = k
-- as the number of layers increases
capacitiesOfBTree k = map (capacityOfBTree k) [1..]

-- Compute the minimum number of layers in a B-Tree of Knuth order k
-- required to store the items in list
minNumberOfLayersToArrange k list = 1 + f k
    where numOfItems = length list
          f = length . takeWhile (< numOfItems) . capacitiesOfBTree
With this smart function, given list = [21, 18, 16, 9, 12, 7, 6, 5, 1, 2] and a B-Tree with Knuth order = 3, we should obtain [18, 5, 9, 1, 2, 6, 7, 12, 16, 21], with a resulting B-Tree like
        [18, 21]
        /
   [5 , 9]
  /   |   \
[1,2] [6,7] [12, 16]
Obviously this is suboptimal from a performance point of view, but should be acceptable, since obtaining a better one (like the following) would be far more expensive (computationally and economically):
     [7 , 16]
    /    |    \
[5,6] [9,12] [18, 21]
  /
[1,2]
If you want to run it, put the previous code in a Main.hs file and compile it with ghc after prepending
import Data.List (sort)
import Data.List.Split
import System.Environment (getArgs)
main = do
    args <- getArgs
    let knuthOrder = read $ head args
    let keys = (map read $ tail args) :: [Int]
    putStr "smart: "
    putStrLn $ show $ smart knuthOrder keys
    putStr "trivial: "
    putStrLn $ show $ trivial knuthOrder keys

Getting the lowest possible sum from numbers' difference

I have to find the lowest possible sum from numbers' difference.
Let's say I have 4 numbers: 1515, 1520, 1500 and 1535. The lowest sum of differences is 30, because 1535 - 1520 = 15 and 1515 - 1500 = 15, and 15 + 15 = 30. If I instead did 1520 - 1515 = 5 and 1535 - 1500 = 35, the sum would be 40.
Hope you got it, if not, ask me.
Any ideas how to program this? I just found this online and tried to translate it from my language to English. It sounds interesting. I can't use brute force, because it would take ages to run. I don't need code, just ideas for how to program it, or a little fragment of code.
Thanks.
Edit:
I didn't post everything... One more addition:
I have, let's say, 8 possible numbers, but I have to take only 6 of them to make the smallest sum. For instance, with the numbers 1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731, the smallest sum will be 48, but here I have to take only 6 numbers from the 8.
Taking the edit into account:
Start by sorting the list. Then use a dynamic programming solution with state (i, n) representing the minimum sum of n differences when considering only the first i numbers in the sequence. Initial states: dp[*][0] = 0, everything else = infinity. Use two loops: an outer loop over i from 1 to N, and an inner loop over n from 0 to R (R = 3 in the example from your edit - 3 pairs of numbers means 6 individual numbers). The recurrence relation is dp[i][n] = min(dp[i-1][n], dp[i-2][n-1] + seq[i] - seq[i-1]).
You have to handle boundary cases, which I've ignored here, but the general idea should work: it runs in O(N log N + NR) time and uses O(NR) space.
The solution by marcog is a correct, non-recursive, polynomial-time solution to the problem - it's a pretty standard DP problem - but, just for completeness, here's a proof that it works, and actual code for the problem. [@marcog: Feel free to copy any part of this answer into your own if you wish; I'll then delete this.]
Proof
Let the list be x_1, …, x_N. Assume wlog that the list is sorted. We're trying to find K (disjoint) pairs of elements from the list, such that the sum of their differences is minimised.
Claim: An optimal solution always consists of the differences of consecutive elements.
Proof: Suppose you fix the subset of elements whose differences are taken. Then by the proof given by Jonas Kölker, the optimal solution for just this subset consists of differences of consecutive elements from the list. Now suppose there is a solution corresponding to a subset that does not comprise pairs of consecutive elements, i.e. the solution involves a difference x_j - x_i where j > i+1. Then, we can replace x_j with x_(i+1) to get a smaller difference, since
x_i ≤ x_(i+1) ≤ x_j ⇒ x_(i+1) - x_i ≤ x_j - x_i.
(Needless to say, if x_(i+1) = x_j, then taking x_(i+1) is indistinguishable from taking x_j.) This proves the claim.
The rest is just routine dynamic programming stuff: the optimal solution using k pairs from the first n elements either doesn't use the nth element at all (in which case it's just the optimal solution using k pairs from the first n-1), or it uses the nth element, in which case it's the difference x_n - x_(n-1) plus the optimal solution using k-1 pairs from the first n-2.
The whole program runs in time O(N log N + NK), as marcog says. (Sorting + DP.)
Code
Here's a complete program. I was lazy with initializing arrays and wrote Python code using dicts; this is a small log(N) factor over using actual arrays.
'''
The minimum possible sum of |x_i - x_j| using K pairs (2K numbers) from N numbers
'''
import sys

def ints():
    return [int(s) for s in sys.stdin.readline().split()]

N, K = ints()
num = sorted(ints())

best = {}  # best[(k,n)] = minimum sum using k pairs out of 0 to n

def b(k, n):
    if (k, n) in best:
        return best[(k, n)]
    if k == 0:
        return 0
    return float('inf')

for n in range(1, N):
    for k in range(1, K + 1):
        best[(k, n)] = min(b(k, n - 1),                            # Not using num[n]
                           b(k - 1, n - 2) + num[n] - num[n - 1])  # Using num[n]

print(best[(K, N - 1)])
Test it:
Input
4 2
1515 1520 1500 1535
Output
30
Input
8 3
1731 1572 2041 1561 1682 1572 1609 1731
Output
48
I assume the general problem is this: given a list of 2n integers, output a list of n pairs, such that the sum of |x - y| over all pairs (x, y) is as small as possible.
In that case, the idea would be:
sort the numbers
emit (numbers[2k], numbers[2k+1]) for k = 0, ..., n - 1.
This works. Proof:
Suppose you have x_1 < x_2 < x_3 < x_4 (possibly with other values between them) and output (x_1, x_3) and (x_2, x_4). Then
|x_4 - x_2| + |x_3 - x_1| = |x_4 - x_3| + |x_3 - x_2| + |x_3 - x_2| + |x_2 - x_1| >= |x_4 - x_3| + |x_2 - x_1|.
In other words, it's always better to output (x_1, x_2) and (x_3, x_4) because you don't redundantly cover the space between x_2 and x_3 twice. By induction, the smallest number of the 2n must be paired with the second smallest number; by induction on the rest of the list, pairing up smallest neighbours is always optimal, so the algorithm sketch I proposed is correct.
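A compact sketch of that algorithm in Python (assuming an even count, with all numbers paired):

def min_pair_sum(nums):
    # Sort, then pair each element with its immediate neighbour.
    nums = sorted(nums)
    return sum(nums[i + 1] - nums[i] for i in range(0, len(nums) - 1, 2))

print(min_pair_sum([1515, 1520, 1500, 1535]))  # 30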
Order the list, then do the difference calculation.
EDIT: hi @hey
You can solve the problem using dynamic programming.
Say you have a list L of N integers, and you must form k pairs (with 2*k <= N).
Build a function that finds the smallest difference within a list (if the list is sorted, it will be faster ;) and call it smallest(list l).
Build another one that finds the same for two pairs (can be tricky, but doable) and call it smallest2(list l).
Let's define best(int i, list l) as the function that gives you the best result for i pairs within the list l.
The algorithm goes as follows:
best(1, L) = smallest(L)
best(2, L) = smallest2(L)
for i from 3 to k:
    best(i) = min(
        stored_best(i-2) + smallest2( stored_remainder(i-2) ),
        stored_best(i-1) + smallest( stored_remainder(i-1) )
    )
    store the remainder as well for the chosen solution
Now, the problem is that once you have chosen a pair, the two ints that form its boundaries are reserved and can't be used to form a better solution. But by looking two levels back you can guarantee you have allowed switching candidates.
(The switching work is done by smallest2.)
Step 1: Calculate pair differences
I think it is fairly obvious that the right approach is to sort the numbers and then take differences between each
adjacent pair of numbers. These differences are the "candidate" differences contributing to the
minimal difference sum. Using the numbers from your example would lead to:
Number  Diff
======  ====
 1561
         11
 1572
          0
 1572
         37
 1609
         73
 1682
         49
 1731
          0
 1731
        310
 2041
Save the differences into an array or table or some other data structure where you can maintain the
differences and the two numbers that contributed to each difference. Call this the DiffTable. It
should look something like:
Index  Diff  Number1  Number2
=====  ====  =======  =======
  1      11     1561     1572
  2       0     1572     1572
  3      37     1572     1609
  4      73     1609     1682
  5      49     1682     1731
  6       0     1731     1731
  7     310     1731     2041
Step 2: Choose minimal Differences
If all numbers had to be chosen, we could have stopped at step 1 by choosing the number pair for odd numbered
indices: 1, 3, 5, 7. This is the correct answer. However,
the problem states that a subset of pairs are chosen and this complicates the problem quite a bit.
In your example 3 differences (6 numbers = 3 pairs = 3 differences) need to be chosen such that:
The sum of the differences is minimal
The numbers participating in any chosen difference are removed from the list.
The second point means that if we chose Diff 11 (Index = 1 above), the numbers 1561 and 1572 are
removed from the list, and consequently, the next Diff of 0 at index 2 cannot be used because only 1 instance
of 1572 is left. Whenever a
Diff is chosen the adjacent Diff values are removed. This is why there is only one way to choose 4 pairs of
numbers from a list containing eight numbers.
About the only method I can think of to minimize the sum of the Diff above is to generate and test.
The following pseudo code outlines a process to generate
all 'legal' sets of index values for a DiffTable of arbitrary size
where an arbitrary number of number pairs are chosen. One (or more) of the
generated index sets will contain the indices into the DiffTable yielding a minimum Diff sum.
/* Global Variables */
M = 7        /* Number of candidate pair differences in DiffTable */
N = 3        /* Number of indices in each candidate pair set (3 pairs of numbers) */
AllSets = [] /* Set of candidate index sets (set of sets) */

call GenIdxSet(1, []) /* Call generator with seed values */
/* AllSets now contains candidate index sets to perform min sum tests on */
end

procedure: GenIdxSet(i, IdxSet)
    /* Generate all the valid index values for current level */
    /* and subsequent levels until a complete index set is generated */
    do while i <= M
        if CountMembers(IdxSet) = N - 1 then /* Set is complete */
            AllSets = AppendToSet(AllSets, AppendToSet(IdxSet, i))
        else /* Add another index */
            call GenIdxSet(i + 2, AppendToSet(IdxSet, i))
        i = i + 1
    end
return
Function CountMembers returns the number of members in the given set, function AppendToSet returns a new set
where the arguments are appended into a single ordered set. For example
AppendToSet([a, b, c], d) returns the set: [a, b, c, d].
For the given parameters, M = 7 and N = 3, AllSets becomes:
[[1 3 5]
[1 3 6] <= Diffs = (11 + 37 + 0) = 48
[1 3 7]
[1 4 6]
[1 4 7]
[1 5 7]
[2 4 6]
[2 4 7]
[2 5 7]
[3 5 7]]
Calculate the sums using each set of indices, the one that is minimum identifies the
required number pairs in DiffTable. Above I show that the second set of indices gives
the minimum you are looking for.
This is a simple brute force technique and it does not scale very well. If you had a list of
50 number pairs and wanted to choose the 5 pairs, AllSets would contain 1,221,759 sets of
number pairs to test.
I know you said you did not need code but it is the best way for me to describe a set based solution. The solution runs under SQL Server 2008. Included in the code is the data for the two examples you give. The sql solution could be done with a single self joining table but I find it easier to explain when there are multiple tables.
--table 1 holds the values
declare @Table1 table (T1_Val int)
Insert @Table1
--this data is test 1
--Select (1515) Union ALL
--Select (1520) Union ALL
--Select (1500) Union ALL
--Select (1535)
--this data is test 2
Select (1731) Union ALL
Select (1572) Union ALL
Select (2041) Union ALL
Select (1561) Union ALL
Select (1682) Union ALL
Select (1572) Union ALL
Select (1609) Union ALL
Select (1731)
--Select * from @Table1
--table 2 holds the sorted numbered list
Declare @Table2 table (T2_id int identity(1,1), T1_Val int)
Insert @Table2 Select T1_Val from @Table1 order by T1_Val
--table 3 will hold the sorted pairs
Declare @Table3 table (T3_id int identity(1,1), T21_id int, T21_Val int, T22_id int, T22_val int)
Insert @Table3
Select T2_1.T2_id, T2_1.T1_Val, T2_2.T2_id, T2_2.T1_Val from @Table2 AS T2_1
LEFT Outer join @Table2 AS T2_2 on T2_1.T2_id = T2_2.T2_id + 1
--select * from @Table3
--remove odd numbered rows
delete from @Table3 where T3_id % 2 > 0
--select * from @Table3
--show the diff values
--select *, ABS(T21_Val - T22_val) from @Table3
--show the diff values in order
--select *, ABS(T21_Val - T22_val) from @Table3 order by ABS(T21_Val - T22_val)
--display the two lowest
select TOP 2 CAST(T22_val as varchar(24)) + ' and ' + CAST(T21_val as varchar(24)) as 'The minimum difference pairs are'
    , ABS(T21_Val - T22_val) as 'Difference'
from @Table3
ORDER by ABS(T21_Val - T22_val)
I think @marcog's approach can be simplified further.
Take the basic approach that @Jonas Kölker proved for finding the smallest differences. Take the resulting list and sort it. Take the R smallest entries from this list and use them as your differences. Proving that this is the smallest sum is trivial.
@marcog's approach is effectively O(N^2) because R == N is a legit option. This approach should be (2*(N log N))+N, aka O(N log N).
This requires a small data structure to hold a difference and the values it was derived from. But that is constant per entry. Thus, space is O(N).
I would go with marcog's answer; you can sort using any of the sorting algorithms. But there is a little thing to analyze now.
If you have to choose R numbers out of N numbers so that the sum of their differences is minimum, then the numbers chosen must form a sequence without missing any numbers in between.
Hence, after sorting the array, you should run an outer loop from 0 to N-R and an inner loop from 0 to R-1 to calculate the sum of differences.
If needed, you should try with some examples.
I've taken an approach which uses a recursive algorithm, but it does take some of what other people have contributed.
First of all we sort the numbers:
[1561,1572,1572,1609,1682,1731,1731,2041]
Then we compute the differences, keeping track of the indices of the numbers that contributed to each difference:
[(11,(0,1)),(0,(1,2)),(37,(2,3)),(73,(3,4)),(49,(4,5)),(0,(5,6)),(310,(6,7))]
So we got 11 by taking the difference between the number at index 0 and the number at index 1, and 37 from the numbers at indices 2 & 3.
I then sorted this list, so it tells me which pairs give me the smallest difference:
[(0,(1,2)),(0,(5,6)),(11,(0,1)),(37,(2,3)),(49,(4,5)),(73,(3,4)),(310,(6,7))]
What we can see here is that, given that we want to select n numbers, a naive solution might be to select the first n / 2 items of this list. The trouble is, in this list the third item shares an index with the first, so we'd only actually get 5 numbers, not 6. In this case you need to select the fourth pair as well to get a set of 6 numbers.
From here, I came up with this algorithm. Throughout, there is a set of accepted indices, which starts empty, and a count n of numbers left to select:
If n is 0, we're done.
If n is 1, and the first item will provide just 1 index which isn't in our set, we take the first item, and we're done.
If n is 2 or more, and the first item will provide 2 indices which aren't in our set, we take the first item, and we recurse (i.e. go to 1), this time looking for n - 2 numbers that make the smallest difference in the remainder of the list.
This is the basic routine, but life isn't that simple. There are cases we haven't covered yet, but make sure you get the idea before you move on.
Actually step 3 is wrong (found that just before I posted this :-/), as it may be unnecessary to include an early difference to cover indices which are covered by later, essential differences. The first example ([1515, 1520, 1500, 1535]) falls foul of this. Because of this I've thrown it away in the section below, and expanded step 4 to deal with it.
So, now we get to look at the special cases:
** as above **
** as above **
If n is 1, but the first item would provide two indices, we can't select it. We have to throw that item away and recurse. This time we're still looking for n indices, and there have been no changes to our accepted set.
If n is 2 or more, we have a choice. Either we can a) choose this item and recurse looking for n - (1 or 2) indices, or b) skip this item and recurse looking for n indices.
4 is where it gets tricky, and where this routine turns into a search rather than just a sorting exercise. How can we decide which branch (a or b) to take? Well, we're recursive, so let's call both, and see which one is better. How will we judge them?
We'll want to take whichever branch produces the lowest sum.
...but only if it will use up the right number of indices.
So step 4 becomes something like this (pseudocode):
x = numberOfIndicesProvidedBy(currentDifference)
branchA = findSmallestDifference (n-x, remainingDifferences) // recurse looking for **n-(1 or 2)**
branchB = findSmallestDifference (n , remainingDifferences) // recurse looking for **n**
sumA = currentDifference + sumOf(branchA)
sumB = sumOf(branchB)
validA = indicesAddedBy(branchA) == n
validB = indicesAddedBy(branchB) == n
if not validA && not validB then return an empty branch
if validA && not validB then return branchA
if validB && not validA then return branchB
// Here, both must be valid.
if sumA <= sumB then return branchA else return branchB
I coded this up in Haskell (because I'm trying to get good at it). I'm not sure about posting the whole thing, because it might be more confusing than useful, but here's the main part:
findSmallestDifference = findSmallestDifference' Set.empty

findSmallestDifference' _ _ [] = []
findSmallestDifference' taken n (d:ds)
    | n == 0                           = []                                 -- Case 1
    | n == 1 && provides1 d            = [d]                                -- Case 2
    | n == 1 && provides2 d            = findSmallestDifference' taken n ds -- Case 3
    | provides0 d                      = findSmallestDifference' taken n ds -- Case 3a (See Edit)
    | validA && not validB             = branchA                            -- Case 4
    | validB && not validA             = branchB                            -- Case 4
    | validA && validB && sumA <= sumB = branchA                            -- Case 4
    | validA && validB && sumB <= sumA = branchB                            -- Case 4
    | otherwise                        = []                                 -- Case 4
    where branchA = d : findSmallestDifference' (newTaken d) (n - (provides taken d)) ds
          branchB = findSmallestDifference' taken n ds
          sumA = sumDifferences branchA
          sumB = sumDifferences branchB
          validA = n == (indicesTaken branchA)
          validB = n == (indicesTaken branchB)
          newTaken x = insertIndices x taken
Hopefully you can see all the cases there. That code(-ish), plus some wrapper produces this:
*Main> findLeastDiff 6 [1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731]
Smallest Difference found is 48
1572 - 1572 = 0
1731 - 1731 = 0
1572 - 1561 = 11
1609 - 1572 = 37
*Main> findLeastDiff 4 [1515, 1520, 1500,1535]
Smallest Difference found is 30
1515 - 1500 = 15
1535 - 1520 = 15
This has become long, but I've tried to be explicit. Hopefully it was worth while.
Edit : There is a case 3a that can be added to avoid some unnecessary work. If the current difference provides no additional indices, it can be skipped. This is taken care of in step 4 above, but there's no point in evaluating both halves of the tree for no gain. I've added this to the Haskell.
Something like:
Sort the list
Find duplicates
Make the duplicates a pair
Remove duplicates from the list
Break the rest of the list into pairs
Calculate the differences of each pair
Take the lowest amounts
In your example you have 8 numbers and need the best 3 pairs. First sort the list, which gives you
1561, 1572, 1572, 1609, 1682, 1731, 1731, 2041
If you have duplicates make them a pair and remove them from the list so you have
[1572, 1572] = 0
[1731, 1731] = 0
L = { 1561, 1609, 1682, 2041 }
Break the remaining list into pairs, giving you the 4 following pairs
[1572, 1572] = 0
[1731, 1731] = 0
[1561, 1609] = 48
[1682, 2041] = 359
Then drop pairs until you are down to the number of pairs you need.
This gives you the following 3 pairs with the lowest differences
[1572, 1572] = 0
[1731, 1731] = 0
[1561, 1609] = 48
So
0 + 0 + 48 = 48

Algorithm to count the number of valid blocks in a permutation [duplicate]

Possible Duplicate:
Finding sorted sub-sequences in a permutation
Given an array A which holds a permutation of 1, 2, ..., n. A sub-block A[i..j]
of the array A is called a valid block if all the numbers appearing in A[i..j]
are consecutive numbers (though not necessarily in order).
Given the array A = [7 3 4 1 2 6 5 8], the valid blocks are [3 4], [1 2], [6 5],
[3 4 1 2], [3 4 1 2 6 5], [7 3 4 1 2 6 5], and [7 3 4 1 2 6 5 8].
So the count for the above permutation is 7.
Give an O(n log n) algorithm to count the number of valid blocks.
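For reference, here is a brute-force counter (O(n^2), not the requested O(n log n)) that makes the definition concrete: since A is a permutation, a window is valid exactly when its max minus its min equals its length minus one.

def count_valid_blocks(a):
    # Count sub-blocks of length >= 2 whose values are consecutive.
    count = 0
    for i in range(len(a)):
        lo = hi = a[i]
        for j in range(i + 1, len(a)):
            lo, hi = min(lo, a[j]), max(hi, a[j])
            if hi - lo == j - i:  # no duplicates, so the window is consecutive
                count += 1
    return count

print(count_valid_blocks([7, 3, 4, 1, 2, 6, 5, 8]))  # 7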
Ok, I am down to 1 rep because I put 200 bounty on a related question: Finding sorted sub-sequences in a permutation
so I cannot leave comments for a while.
I have an idea:
1) Locate all permutation groups. They are: (78), (34), (12), (65). Unlike in group theory, their order and position, and whether they are adjacent, matters. So, a group (78) can be represented as a structure (7, 8, false), while (34) would be (3, 4, true). I am using Python's notation for tuples, but it might actually be better to use a whole class for the group. Here true or false means contiguous or not. Two groups are "adjacent" if (max(gp1) == min(gp2) + 1 or max(gp2) == min(gp1) + 1) and contiguous(gp1) and contiguous(gp2). This is not the only condition for union(gp1, gp2) to be contiguous, because (14) and (23) combine into (14) nicely. This is a great question for algo class homework, but a terrible one for an interview. I suspect this is homework.
Just some thoughts:
At first sight, this sounds impossible: a fully sorted array would have O(n^2) valid sub-blocks.
So, you would need to count more than one valid sub-block at a time. Checking the validity of a sub-block is O(n). Checking whether a sub-block is fully sorted is O(n) as well. A fully sorted sub-block contains n·(n - 1)/2 valid sub-blocks, which you can count without further breaking this sub-block up.
Now, the entire array is obviously always valid. For a divide-and-conquer approach, you would need to break this up. There are two conceivable breaking points: the location of the highest element, and that of the lowest element. If you break the array into two at one of these points, including the extremum in the part that contains the second-to-extreme element, there cannot be a valid sub-block crossing this break-point.
By always choosing the extremum that produces a more even split, this should work quite well (average O(n log n)) for "random" arrays. However, I can see problems when your input is something like (1 5 2 6 3 7 4 8), which seems to produce O(n^2) behaviour. (1 4 7 2 5 8 3 6 9) would be similar (I hope you see the pattern). I currently see no trick to catch this kind of worst case, but it seems that it requires other splitting techniques.
This question does involve a bit of a "math trick", but it's fairly straightforward once you get it. However, the rest of my solution won't fit the O(n log n) criteria.
The math portion:
For any two consecutive numbers, their sum is 2k+1 where k is the smallest element. For three it is 3k+3, for four it is 4k+6, and for N such numbers it is Nk + N(N-1)/2 (for example, 5+6+7 = 3·5 + 3 = 18). Hence, you need two steps, which can be done simultaneously:
Create the sum of all the sub-arrays.
Determine the smallest element of a sub-array.
The dynamic programming portion
Build two tables using the results of the previous row's entries to build each successive row's entries. Unfortunately, I'm totally wrong as this would still necessitate n^2 sub-array checks. Ugh!
My proposition
STEP = 2 // number of elements examined
B [0,0,0,0,0,0,0,0]
B [1,1,0,0,0,0,0,0]
VALID(A,B) - if not valid move one
B [0,1,1,0,0,0,0,0]
VALID(A,B) - if valid move one and step
B [0,0,0,1,1,0,0,0]
VALID (A,B)
B [0,0,0,0,0,1,1,0]
STEP = 3
B [1,1,1,0,0,0,0,0] not ok
B [0,1,1,1,0,0,0,0] ok
B [0,0,0,0,1,1,1,0] not ok
STEP = 4
B [1,1,1,1,0,0,0,0] not ok
B [0,1,1,1,1,0,0,0] ok
.....
CON <- 0
STEP <- 2
i <- 0
j <- 0
WHILE (STEP <= LEN(A)) DO
    j <- STEP
    WHILE (STEP <= LEN(A) - j) DO
        IF (VALID(A,i,j)) DO
            CON <- CON + 1
            i <- j + 1
            j <- j + STEP
        ELSE
            i <- i + 1
            j <- j + 1
        END
    END
    STEP <- STEP + 1
END
The VALID method checks that all the elements are consecutive.
Never tested, but it might be OK.
The original array doesn't contain duplicates, so it must itself be a consecutive block. Let's call this block (1 ~ n). We can test whether block (2 ~ n) is consecutive by checking if the first element is 1 or n, which is O(1). Likewise we can test block (1 ~ n-1) by checking whether the last element is 1 or n.
I can't quite mould this into a solution that works but maybe it will help someone along...
Like everybody else, I'm just throwing this out ... it works for the single example below, but YMMV!
The idea is to count the number of illegal sub-blocks, and subtract this from the total possible number. We count the illegal ones by examining each array element in turn and ruling out sub-blocks that include the element but not its predecessor or successor.
Foreach i in [1,N], compute B[A[i]] = i.
Let Count = the total number of sub-blocks with length>1, which is N-choose-2 (one for each possible combination of starting and ending index).
Foreach i, consider A[i]. Ignoring edge cases, let x=A[i]-1, and let y=A[i]+1. A[i] cannot participate in any sub-block that does not include x or y. Let iX=B[x] and iY=B[y]. There are several cases to be treated independently here. The general case is that iX < i < iY. In this case, we can eliminate the sub-block A[iX+1 .. iY-1] and all intervening blocks containing i. There are (i - iX + 1) * (iY - i + 1) such sub-blocks, so call this number Eliminated. (Other cases are left as an exercise for the reader, as are the edge cases.) Set Count = Count - Eliminated.
Return Count.
The total cost appears to be N * (cost of step 2) = O(N).
WRINKLE: In step 2, we must be careful not to eliminate each sub-interval more than once. We can accomplish this by only eliminating sub-intervals that lie fully or partly to the right of position i.
Example:
A = [1, 3, 2, 4]
B = [1, 3, 2, 4]
Initial count = (4*3)/2 = 6
i=1: A[i]=1, so need sub-blocks with 2 in them. We can eliminate [1,3] from consideration. Eliminated = 1, Count -> 5.
i=2: A[i]=3, so need sub-blocks with 2 or 4 in them. This rules out [1,3] but we already accounted for it when looking right from i=1. Eliminated = 0.
i=3: A[i] = 2, so need sub-blocks with [1] or [3] in them. We can eliminate [2,4] from consideration. Eliminated = 1, Count -> 4.
i=4: A[i] = 4, so we need sub-blocks with [3] in them. This rules out [2,4] but we already accounted for it when looking right from i=3. Eliminated = 0.
Final Count = 4, corresponding to the sub-blocks [1,3,2,4], [1,3,2], [3,2,4] and [3,2].
(This is an attempt to do this N.log(N) worst case. Unfortunately it's wrong -- it sometimes undercounts. It incorrectly assumes you can find all the blocks by looking at only adjacent pairs of smaller valid blocks. In fact you have to look at triplets, quadruples, etc, to get all the larger blocks.)
You do it with a struct that represents a subblock and a queue for subblocks.
struct c_subblock
{
    int index;    /* index into original array, head of subblock */
    int width;    /* width of subblock > 0 */
    int lo_value;
    c_subblock * p_above; /* null or subblock above with same index */
};
Alloc an array of subblocks the same size as the original array, and init each subblock to have exactly one item in it. Add them to the queue as you go. If you start with array [ 7 3 4 1 2 6 5 8 ] you will end up with a queue like this:
queue: ( [7,7] [3,3] [4,4] [1,1] [2,2] [6,6] [5,5] [8,8] )
The { index, width, lo_value, p_above } values for subblock [7,7] will be { 0, 1, 7, null }.
Now it's easy. Forgive the c-ish pseudo-code.
loop {
    c_subblock * const p_left = Pop subblock from queue.
    int const right_index = p_left.index + p_left.width;
    if ( right_index < length original array ) {
        // Find adjacent subblock on the right.
        // To do this you'll need the original array of length-1 subblocks.
        c_subblock const * p_right = array_basic_subblocks[ right_index ];
        do {
            Check the left/right subblocks to see if the two merged are also a subblock.
            If they are add a new merged subblock to the end of the queue.
            p_right = p_right.p_above;
        } while ( p_right );
    }
}
This will find them all I think. It's usually O(N log(N)), but it'll be O(N^2) for a fully sorted or anti-sorted list. I think there's an answer to this though -- when you build the original array of subblocks you look for sorted and anti-sorted sequences and add them as the base-level subblocks. If you are keeping a count increment it by (width * (width + 1))/2 for the base-level. That'll give you the count INCLUDING all the 1-length subblocks.
After that just use the loop above, popping and pushing the queue. If you're counting you'll have to have a multiplier on both the left and right subblocks and multiply these together to calculate the increment. The multiplier is the width of the leftmost (for p_left) or rightmost (for p_right) base-level subblock.
Hope this is clear and not too buggy. I'm just banging it out, so it may even be wrong.
[Later note. This doesn't work after all. See the note at the top.]

How to master in-place array modification algorithms?

I am preparing for a software job interview, and I am having trouble with in-place array modifications.
For example, in the out-shuffle problem you interleave two halves of an array so that 1 2 3 4 5 6 7 8 would become 1 5 2 6 3 7 4 8. This question asks for a constant-memory solution (and linear-time, although I'm not sure that's even possible).
At first I thought a linear algorithm was trivial, but then I couldn't work it out. Then I did find a simple O(n^2) algorithm, but it took me a long time. And I still can't find a faster solution.
I remember also having trouble solving a similar problem from Bentley's Programming Pearls, column 2:
Rotate an array left by i positions (e.g. abcde rotated by 2 becomes cdeab), in time O(n) and with just a couple of bytes extra space.
Does anyone have tips to help wrap my head around such problems?
About an O(n) time, O(1) space algorithm for out-shuffle
Doing an out-shuffle in O(n) time and O(1) space is possible, but it is tough. Not sure why people think it is easy and are suggesting you try something else.
The following paper has an O(n) time and O(1) space solution (though it is for in-shuffle, doing in-shuffle makes out-shuffle trivial):
http://arxiv.org/PS_cache/arxiv/pdf/0805/0805.1598v1.pdf
About a method to tackle in-place array modification algorithms
In-place modification algorithms can become very hard to handle.
Consider a couple:
Inplace out-shuffle in linear time. Uses number theory.
In-place merge sort was open for a few years. An algorithm eventually came, but it was too complicated to be practical. It uses very complicated bookkeeping.
Sorry, if this sounds discouraging, but there is no magic elixir that will solve all in-place algorithm problems for you. You need to work with the problem, figure out its properties, and try to exploit them (as is the case with most algorithms).
That said, for array modifications where the result is a permutation of the original array, you can try the method of following the cycles of the permutation. Basically, any permutation can be written as a disjoint set of cycles (see John's answer too). For instance the permutation:
1 4 2 5 3 6
of 1 2 3 4 5 6 can be written as
1 -> 1
2 -> 3 -> 5 -> 4 -> 2
6 -> 6.
You can read the arrow as 'goes to'.
So to permute the array 1 2 3 4 5 6 you follow the three cycles:
1 goes to 1.
6 goes to 6.
2 goes to 3, 3 goes to 5, 5 goes to 4, and 4 goes to 2.
To follow this long cycle, you can use just one temp variable. Store 3 in it. Put 2 where 3 was. Now put 3 in 5 and store 5 in the temp and so on. Since you only use constant extra temp space to follow a particular cycle, you are doing an in-place modification of the array for that cycle.
Now if I gave you a formula for computing where an element goes to, all you now need is the set of starting elements of each cycle.
A judicious choice of the starting points of the cycles can make the algorithm easy. If you come up with the starting points in O(1) space, you now have a complete in-place algorithm. This is where you might actually have to get familiar with the problem and exploit its properties.
Even if you didn't know how to compute the starting points of the cycles, but had a formula to compute the next element, you could use this method to get an O(n) time in-place algorithm in some special cases.
For instance: suppose you knew the array of integers held only positive values.
You can now follow the cycles, negating the numbers in them as an indicator of 'visited' elements. You walk the array, pick the first positive number you come across, follow the cycle for it (making the elements of the cycle negative), and continue to find untouched elements. In the end, you just make all the elements positive again to get the resulting permutation.
You get an O(n) time and O(1) space algorithm! Of course, we kind of 'cheated' by using the sign bits of the array integers as our personal 'visited' bitmap.
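A sketch of that trick in Python (my own illustration; f maps each index to its destination, and all values are assumed positive):

def permute_in_place(a, f):
    # Move the element at index i to index f(i), following cycles and
    # using negation as the 'visited' mark.
    n = len(a)
    for start in range(n):
        if a[start] < 0:
            continue  # already moved as part of an earlier cycle
        i, carried = start, a[start]
        while True:
            j = f(i)
            carried, a[j] = a[j], -carried  # drop the carried value at j, pick up the old a[j]
            i = j
            if i == start:
                break
    for i in range(n):
        a[i] = -a[i]  # clear the 'visited' marks

a = [1, 2, 3, 4, 5, 6, 7, 8]
n = len(a)
permute_in_place(a, lambda i: (2 * i) % (n - 1) if i < n - 1 else i)
print(a)  # [1, 5, 2, 6, 3, 7, 4, 8] - the out-shuffle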
Even if the array was not necessarily integers, this method (of following the cycles, not the hack of sign bits :-)) can actually be used to tackle the two problems you state:
The in-shuffle (or out-shuffle) problem: When 2n+1 is a power of 3, it can be shown (using number theory) that 1,3,3^2, etc are in different cycles and all cycles are covered using those. Combine this with the fact that the in-shuffle is susceptible to divide and conquer, you get an O(n) time, O(1) space algorithm (the formula is i -> 2*i modulo 2n+1). Refer to the above paper for more details.
The cyclic-shift-an-array problem: cyclically shifting an array of size n by k also gives a permutation of the resulting array (given by the formula i goes to i+k modulo n), and can also be solved in linear time and in-place using the follow-the-cycles method. In fact, in terms of the number of element exchanges, this follow-the-cycles method is better than the 3-reverses algorithm. Of course, following the cycles can kill the cache because of the access patterns, and in practice the 3-reverses algorithm might actually fare better.
As for interviews, if the interviewer is a reasonable person, they will be looking at how you think and approach the problem and not whether you actually solve it. So even if you don't solve a problem, I think you should not be discouraged.
The basic strategy with in-place algorithms is to figure out the rule for moving an entry from slot N to slot M.
So, take your shuffle, for instance. If A and B are card positions and N is the number of cards, the rules for the first half of the deck are different from the rules for the second half of the deck:
// A is the current location, B is the new location.
// this math assumes that the first card is card 0
if (A < N/2)
    B = A * 2;
else
    B = (A - N/2) * 2 + 1;
Now that we know the rule, we just have to move each card. Each time we move a card, we calculate the new location, then remove the card that is currently in B, place A in slot B, then let B be A, and loop back to the top of the algorithm. Each card moved displaces a card, which becomes the next card to be moved.
I think the analysis is easier if we are 0 based rather than 1 based, so
0 1 2 3 4 5 6 7 // before
0 4 1 5 2 6 3 7 // after
So we want to move 1->2 2->4 4->1 and that completes a cycle
then move 3->6 6->5 5->3 and that completes a cycle
and we are done.
Now we know that card 0 and card N-1 don't move, so we can ignore those,
so we know that we only need to swap N-2 cards in total. The only sticky bit
is that there are 2 cycles, 1,2,4,1 and 3,6,5,3. when we get to card 1 the
second time, we need to move on to card 3.
int A = 1;
int B;
int N = 8;
card ary[N];  // Our array of cards
card a = ary[A];
for (int i = 0; i < N - 2; ++i)  // N - 2 cards need to move
{
    if (A < N/2)
        B = A * 2;
    else
        B = (A - N/2) * 2 + 1;
    card b = ary[B];
    ary[B] = a;
    a = b;
    A = B;
    if (A == 1)   // back at the start of the first cycle...
    {
        A = 3;    // ...so jump to the start of the second cycle
        a = ary[A];
    }
}
Now this code only works for the 8 card example, because of that if test that moves us from 1 to 3 when we finish the first cycle. What we really need is a general rule to recognize the end of the cycle, and where to go to start the next one.
That rule could be mathematical if you can think of a way, or you could keep track of which places you had visited in a separate array, and when A is back to a visited place, you could then scan forward in your array looking for the first non-visited place.
For your in-place algorithm to be O(n), the solution will need to be mathematical.
I hope this breakdown of the thinking process is helpful to you. If I was interviewing you, I would expect to see something like this on the whiteboard.
Note: As Moron points out, this doesn't work for all values of N, it's just an example of the sort of analysis that an interviewer is looking for.
Frank,
For programming with loops and arrays, nothing beats David Gries's textbook The Science of Programming. I studied it over 20 years ago, and there are ideas that I still use every day. It is very mathematical and will require real effort to master, but that effort will repay you many times over for your whole career.
Complementing Aryabhatta's answer:
There is a general method to "follow the cycles" even without knowing the starting positions of each cycle or using memory to record visited cycles. This is especially useful if you need O(1) memory.
For each position i in the array, follow the cycle without moving any data yet, until you reach...
the starting position i: end of the cycle. This is a new cycle: follow it again, moving the data this time.
a position lower than i: this cycle was already visited; nothing to do with it.
Of course this has a time overhead (O(n^2), I believe) and has the cache problems of the general "following cycles" method.
For the first one, let's assume n is even. You have:
first half: 1 2 3 4
second half: 5 6 7 8
Let x1 = first[1], x2 = second[1].
Now, you have to print one from the first half, one from the second, one from the first, one from the second...
Meaning first[1], second[1], first[2], second[2], ...
Obviously, you don't keep two halves in memory, as that will be O(n) memory. You keep pointers to the two halves. Do you see how you'd do that?
The second is a bit harder. Consider:
12345
abcde
..cde
.....ab
..cdeab
cdeab
Do you notice anything? You should notice that the question basically asks you to move the first i characters to the end of your string, without affording the luxury of copying the last n - i into a buffer, then appending the first i, and then returning the buffer. You need to do it with O(1) memory.
To figure how to do this you basically need a lot of practice with these kinds of problems, as with anything else. Practice makes perfect basically. If you've never done these kinds of problems before, it's unlikely you'll figure it out. If you have, then you have to think about how you can manipulate the substrings and or indices such that you solve your problem under the given constraints. The general rule is to work and learn as much as possible so you'll figure out the solutions to these problems very fast when you see them. But the solution differs quite a bit from problem to problem. There's no clear recipe for success I'm afraid. Just read a lot and understand the stuff you read before you move on.
The logic for the second problem is this: what happens if we reverse the substring [1, 2], the substring [3, 5] and then concatenate them and reverse that? We have, in general:
1, 2, 3, 4, ..., i, i + 1, i + 2, ..., N
reverse [1, i] =>
i, i - 1, ..., 4, 3, 2, 1, i + 1, i + 2, ..., N
reverse [i + 1, N] =>
i, i - 1, ..., 4, 3, 2, 1, N, ..., i + 1
reverse [1, N] =>
i + 1, ..., N, 1, 2, 3, 4, ..., i - 1, i
which is what you wanted. Writing the reverse function using O(1) memory should be trivial.
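For instance, a sketch of the left rotation in Python, per the derivation above:

def rotate_left(a, i):
    # Three in-place reversals: O(n) time, O(1) extra space.
    def reverse(lo, hi):
        while lo < hi:
            a[lo], a[hi] = a[hi], a[lo]
            lo, hi = lo + 1, hi - 1
    n = len(a)
    i %= n
    reverse(0, i - 1)  # reverse the first i elements
    reverse(i, n - 1)  # reverse the rest
    reverse(0, n - 1)  # reverse the whole array

a = list("abcde")
rotate_left(a, 2)
print("".join(a))  # cdeab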
Generally speaking, the idea is to loop through the array once, while
storing the value at the position you are at in a temporary variable
finding the correct value for that position and writing it
either move on to the next value, or figure out what to do with your temporary value before continuing.
A general approach could be as follows:
Construct a positions array int[] pos, such that pos[i] refers to the position (index) of a[i] in the shuffled array.
Rearrange the original array int[] a, according to this positions array pos.
/** Shuffle the array a. */
void shuffle(int[] a) {
    // Step 1
    int[] pos = constructRearrangementArray(a);
    // Step 2
    rearrange(a, pos);
}
/**
 * Rearrange the given array a according to the positions array pos.
 */
private static void rearrange(int[] a, int[] pos)
{
    // By definition 'pos' should not contain any duplicates,
    // otherwise rearrange() can run forever.
    // Do the above sanity check.
    for (int i = 0; i < pos.length; i++) {
        while (i != pos[i]) {
            // This while loop completes one cycle in the array
            swap(a, i, pos[i]);
            swap(pos, i, pos[i]);
        }
    }
}
/** Swap ith element in a with jth element. */
public static void swap(int[] a, int i, int j)
{
    int temp = a[i];
    a[i] = a[j];
    a[j] = temp;
}
As an example, for the case of outShuffle the following would be an implementation of constructRearrangementArray().
/**
 * array      : 1 2 3 4 5 6 7 8
 * pos        : 0 2 4 6 1 3 5 7
 * outshuffle : 1 5 2 6 3 7 4 8 (outer boundaries remain same)
 */
public int[] constructRearrangementArray(int[] a)
{
    if (a.length % 2 != 0) {
        throw new IllegalArgumentException("Cannot outshuffle odd sized array");
    }
    int[] pos = new int[a.length];
    for (int i = 0; i < pos.length; i++) {
        pos[i] = i * 2 % (pos.length - 1);
    }
    pos[a.length - 1] = a.length - 1;
    return pos;
}
