Finding the minimum and maximum element from one of many arrays - algorithm

I received a question during an Amazon interview and would like assistance with solving it.
Given N arrays of K elements each, where every array is sorted and all N*K elements are unique: choose a single element from each of the N arrays and, from the chosen subset of N elements, subtract the minimum element from the maximum. This difference should be the least possible.
Sample:
N=3, K=3
N=1 : 6, 16, 67
N=2 : 11,17,68
N=3 : 10, 15, 100
here if 16, 17, 15 are chosen, we get the minimum difference as
17-15=2.

I can think of O(N*K*N)(edited after correctly pointed out by zivo, not a good solution now :( ) solution.
1. Take N pointers, each initially pointing to the first element of one of the N arrays.
6, 16, 67
^
11,17,68
^
10, 15, 100
^
2. Find the highest and lowest elements among the current pointers, which takes O(N) (here 11 and 6), and compute the difference between them (5).
3. Advance the pointer that is pointing to the lowest element by 1 in its array.
6, 16, 67
   ^
11,17,68
^
10, 15, 100 (difference:6)
^
4. Keep repeating steps 2 and 3 and store the minimum difference.
6, 16, 67
   ^
11,17,68
^
10,15,100 (difference:5)
   ^
6, 16, 67
   ^
11,17,68
   ^
10,15,100 (difference:2)
   ^
Above will be the required solution.
6, 16, 67
   ^
11,17,68
   ^
10,15,100 (difference:84)
      ^
6, 16, 67
       ^
11,17,68
   ^
10,15,100 (difference:83)
      ^
And so on......
EDIT:
Its complexity can be reduced by using a heap (as suggested by Uri). I thought of it but faced a problem: each time an element is extracted from the heap, its array number has to be found in order to advance the corresponding pointer for that array. An efficient way to find the array number reduces the complexity to O(K*N*log(N)), since the heap never holds more than N elements. One naive way is to use a data structure like this
struct Entry
{
    int element;
    int arraynumber;
};
and reconstruct the initial data like
6|0,16|0,67|0
11|1,17|1,68|1
10|2,15|2,100|2
Initially, record the current max over the first column and insert the pointed-to elements into the heap. Each time an element is extracted, its array number is known, so the pointer in that array is advanced, the newly pointed-to element is compared to the current max, and the max is adjusted accordingly.
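The heap idea described in the edit above could be sketched roughly as follows. This is only an illustration, not the asker's code: the function name `min_range_heap` is made up, each array is assumed to be non-empty and sorted, and instead of a struct the heap stores `(value, array_number, position)` tuples so that the array number comes back with every extracted element.

```python
import heapq

def min_range_heap(arrays):
    """Return the smallest (max - min) over one pick from each sorted array."""
    # Each heap entry is (value, array_number, position), so the array number
    # is available as soon as the minimum is looked up.
    heap = [(arr[0], a, 0) for a, arr in enumerate(arrays)]
    heapq.heapify(heap)
    current_max = max(arr[0] for arr in arrays)
    best = current_max - heap[0][0]
    while True:
        value, a, pos = heap[0]              # current minimum among the pointers
        best = min(best, current_max - value)
        if pos + 1 == len(arrays[a]):        # the minimum cannot be raised any more
            return best
        nxt = arrays[a][pos + 1]
        current_max = max(current_max, nxt)  # only the new element can raise the max
        heapq.heapreplace(heap, (nxt, a, pos + 1))

print(min_range_heap([[6, 16, 67], [11, 17, 68], [10, 15, 100]]))  # prints 2
```

The loop stops as soon as the array holding the current minimum is exhausted, because from that point on the minimum can no longer increase and the difference can only grow.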

So here is an algorithm to solve this problem in two steps:
First step is to merge all your arrays into one sorted array, which would look like this:
combined_val[] - holds all the numbers
combined_ind[] - holds the index of the array each number originally belonged to
This step can be done easily in O(K*N*log(N)), but I think you can do better than that too (maybe not; you can look up variants of merge sort, because they do a similar step).
Now the second step:
It is easier to just show code instead of explaining, so here is the pseudocode:
int count[N] = { 0 };
int head = 0;
int diffcnt = 0;
// mindiff is initialized to overall maximum value - overall minimum value
int mindiff = combined_val[N * K - 1] - combined_val[0];
for (int i = 0; i < N * K; i++)
{
    count[combined_ind[i]]++;
    if (count[combined_ind[i]] == 1) {
        // diffcnt counts how many arrays have at least one element between
        // indexes of "head" and "i". Once diffcnt reaches N it will stay N and
        // not increase anymore
        diffcnt++;
    } else {
        while (count[combined_ind[head]] > 1) {
            // We try to move head index as forward as possible while keeping diffcnt constant.
            // i.e. if count[combined_ind[head]] is 1, then if we would move head forward
            // diffcnt would decrease, that is something we dont want to do.
            count[combined_ind[head]]--;
            head++;
        }
    }
    if (diffcnt == N) {
        // i.e. we got at least one element from all arrays
        if (combined_val[i] - combined_val[head] < mindiff) {
            mindiff = combined_val[i] - combined_val[head];
            // if you want to save actual numbers too, you can save this (i.e. i and head
            // and then extract data from that)
        }
    }
}
The result is in mindiff.
The running time of the second step is O(N * K). This is because the "head" index moves at most N*K times in total, so the inner loop does not make this quadratic; it is still linear.
So the total running time is O(N * K * log(N)). That comes from the merging step; if you can come up with a better merging step, you can probably bring it down to O(N * K).
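For anyone who wants to run the two steps end to end, here is a rough Python sketch. It is not the answerer's code: the name `min_range_merged` is invented, and the merge step uses `heapq.merge` from the standard library as one possible O(N*K*log(N)) way to build the combined array. The window logic follows the pseudocode above.

```python
import heapq

def min_range_merged(arrays):
    """Merge all arrays, then slide a window that covers every array once."""
    n = len(arrays)
    # Step 1: k-way merge of (value, array_index) pairs.
    tagged = ([(v, a) for v in arr] for a, arr in enumerate(arrays))
    merged = list(heapq.merge(*tagged))

    # Step 2: window [head, i] over the merged list.
    count = [0] * n          # how many elements of each array are in the window
    covered = 0              # number of arrays with at least one element in it
    head = 0
    best = merged[-1][0] - merged[0][0]
    for value, a in merged:
        count[a] += 1
        if count[a] == 1:
            covered += 1
        # Advance head while its array is represented more than once.
        while count[merged[head][1]] > 1:
            count[merged[head][1]] -= 1
            head += 1
        if covered == n:
            best = min(best, value - merged[head][0])
    return best

print(min_range_merged([[6, 16, 67], [11, 17, 68], [10, 15, 100]]))  # prints 2
```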

This problem is for managers
You have 3 developers (N1), 3 testers (N2) and 3 DBAs (N3)
Choose the least divergent team that can run a project successfully.
int[n] result;// where result[i] keeps the element from bucket N_i
int[n] latest;//where latest[i] keeps the latest element visited from bucket N_i
Iterate elements in (N_1 + N_2 + N_3) in sorted order
{
Keep track of latest element visited from each bucket N_i by updating 'latest' array;
if boundary(latest) < boundary(result)
{
result = latest;
}
}
int boundary(int[] array)
{
return Max(array) - Min(array);
}

I have an O(K*N*log(K)) solution, with typical execution much less. I currently cannot think of anything better. I'll first explain the version that is easier to describe (with a somewhat longer execution time):
For each element f in the first array (loop through K elements)
For each array, starting from the second array (loop through N-1 arrays)
Do a binary search on the array, and find element closest to f. This is your element (Log(K))
This algorithm can be optimized if, for each array, you add a new Floor Index. When performing the binary search, search between 'Floor' and 'K-1'.
Initially the Floor Index is 0, and for the first element you search through the entire arrays. Once you find the element closest to 'f', update the Floor Index with the index of that element. The worst case is the same (Floor may not update if the maximum element of the first array is smaller than every other array's minimum), but the average case will improve.

Correctness proof for the accepted answer (Terminal's solution)
Assume that the algorithm finds a series A=<A[1],A[2],...,A[N]> which isn't the optimal solution (R).
Consider the index j in R, such that item R[j] is the first item among R that the algorithm examines and replaces it with the next item in its row.
Let A' denote the candidate solution at that phase (prior to the replacement). Since R[j]=A'[j] is the minimum value of A', it's also the minimum of R.
Now, consider the maximum value of R, R[m]. If A'[m]<R[m], then R can be improved by replacing R[m] with A'[m], which contradicts the fact that R is optimal. Therefore, A'[m]=R[m].
In other words, R and A' share the same maximum and minimum, therefore they are equivalent. This completes the proof: if R is an optimal solution, then the algorithm is guaranteed to find a solution as good as R.

for every element in 1st array
choose the element in 2nd array that is closest to the element in 1st array
current_array = 2;
do
{
choose the element in current_array+1 that is closest to the element in current_array
current_array++;
} while(current_array < n);
complexity: O(k^2*n)

Here is my logic on how to resolve this issue, keeping in mind that we need to pick one element from each of the N arrays (to compute the least minimum)
// if we take the above values as an example!
// then the idea would be to sort all three arrays while keeping another
// array to keep the reference to their sets (1 or 2 or 3, could be
// extended to n sets)
1 3 2 3 1 2 1 2 3 // this is the array that holds the set index
6 10 11 15 16 17 67 68 100 // this is the sorted combined array.
| |
5 2 33 // this is the computed least minimum,
// the rule is to make sure the indexes of the values
// we are comparing are different (to make sure we are
// comparing elements from different sets), then for example
// the first element of that example is index:1|value:6 we hold
// that value 6 (that is the value we will be using to compute the least minimum,
// then we go to the edge of the comparison which would be the second different index,
// we skip index:3|value:10 (we remove it from the array) we compare index:2|value:11
// to index:1|value:6 we obtain 5 which would go to a variable named leastMinimum = 5,
// now we remove the indexes and values we already used,
// and redo the same steps.
Step 1:
1 3 2 3 1 2 1 2 3
6 10 11 15 16 17 67 68 100
|
5
leastMinimum = 5
Step 2:
3 1 2 1 2 3
15 16 17 67 68 100
|
2
leastMinimum = min(2, leastMinimum) // which is equal to 2
Step 3:
1 2 3
67 68 100
33
leastMinimum = min(33, leastMinimum) // which is equal to the old leastMinimum, which is 2
Now: We suppose we have elements from the same array that are very close to each other (k=2 this time which means we only have 3 sets with two values) :
// After sorting the n arrays we will have the below indexes array and values array
1 1 2 3 2 3
6 7 8 12 15 16
* * *
* we skip second index of 1|7 and we take the least minimum of 1|6 and 3|12 (index:2|value:8 will be removed as it is not at the edges, we pick the minimum and maximum of the unique index subset of n elements)
1 3
6 12
=6
* second step we remove the values we already used, so the array become like below:
1 2 3
7 15 16
* * *
7 - 16
= 9
Note:
Another approach that consumes more memory would consist of creating N sub-arrays, for each of which we would compute maximum - minimum
So from the below sorted values array and its corresponding indexes array we extract three other sub arrays:
1 3 2 3 1 2 1 2 3
6 10 11 15 16 17 67 68 100
First Array:
1 3 2
6 10 11
11-6 = 5
Second Array:
3 1 2
15 16 17
17-15 = 2
Third Array:
1 2 3
67 68 100
100 - 67 = 33

Related

How to perform range updates in sqrt{n} time?

I have an array and I have to perform query and updates on it.
For queries, I have to find the frequency of a particular number in a range from l to r, and for updates, I have to add x to every element in some range l to r.
How to perform this?
I thought of sqrt{n} optimization but I don't know how to perform range updates with this time complexity.
Edit - Since some people are asking for an example, here is one
Suppose the array is of size n = 8
and it is
1 3 3 4 5 1 2 3
And there are 3 queries to help everybody explain about what I am trying to say
Here they are
q 1 5 3 - This means that you have to find the frequency of 3 in range 1 to 5 which is 2 as 3 appears on 2nd and 3rd position.
second is update query and it goes like this - u 2 4 6 -> This means that you have to add 6 in the array from range 2 to 4. So the new array will become
1 9 9 10 5 1 2 3
And the last query is again the same as first one which will now return 0 as there is no 3 in the array from position 1 to 5 now.
I believe things must be more clear now. :)
I developed this algorithm a long time (20+ years) ago for an arithmetic coder.
Both Update and Retrieve are performed in O(log(N)).
I named this algorithm "Method of Intervals". Let me show you an example.
Imagine, we have 8 intervals, with numbers 0-7:
+--0--+--1--+--2-+--3--+--4--+--5--+--6--+--7--+
Let's create an additional set of intervals, each spanning a pair of the original ones:
+----01-----+----23----+----45-----+----67-----+
Then we create one more layer of intervals, each spanning a pair from the 2nd layer:
+---------0123---------+---------4567----------+
And at last, we create a single interval that covers all 8:
+------------------01234567--------------------+
As you see, in this structure, to retrieve the right border of interval [5], you just need to add together the lengths of intervals [0123] + [45]. To retrieve the left border of interval [5], you need the sum of the lengths of intervals [0123] + [4] (the left border of 5 is the right border of 4).
Of course, the left border of interval [0] is always 0.
If you look at this structure carefully, you will see that the second element of each pair in every layer isn't needed. That is, you do not need elements 1, 3, 5, 7, 23, 67 and 4567, since these elements aren't used during Retrieval or Update.
Let's remove those elements and renumber as follows:
+--1--+--x--+--3-+--x--+--5--+--x--+--7--+--x--+
+-----2-----+-----x----+-----6-----+-----x-----+
+-----------4----------+-----------x-----------+
+----------------------8-----------------------+
As you see, this renumbering uses the numbers [1-8]. Let them be array indexes, so the memory used is O(N).
To retrieve the right border of interval [7], you add the lengths stored at indexes 4, 6 and 7. To update the length of interval [7], you add the difference to every stored value whose range contains it, here indexes 7 and 8. As a result, both Retrieval and Update run in O(log(N)) time.
What is needed now is an algorithm that computes, from the original interval number, the set of indexes in this data structure. For instance, how to convert:
1 -> 1
2 -> 2
3 -> 3,2
...
7 -> 7,6,4
This is easy, if we will see binary representation for these numbers:
1 -> 1
10 -> 10
11 -> 11,10
111 -> 111,110,100
As you see, in each chain the next value is the previous value with its rightmost "1" changed to "0". Using the simple bit operation "x & (x - 1)", we can write a simple loop to iterate over the array indexes related to the interval number:
int interval = 7;
do {
    int index = interval;
    do_something(index);
} while(interval &= interval - 1);
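The structure described above is commonly known as a Fenwick tree (binary indexed tree). Here is a small Python sketch of it, with 1-based indexes as in the renumbered picture. The class and method names are my own, not the answerer's. Retrieval walks downward with `i &= i - 1` exactly as in the loop above, while the update uses the standard upward walk `i += i & -i`, which visits the stored ranges that contain the interval.

```python
class IntervalSums:
    """Fenwick tree over interval lengths, indexes 1..n."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def retrieve_indexes(self, i):
        """Indexes read when computing the right border of interval i."""
        out = []
        while i > 0:
            out.append(i)
            i &= i - 1           # clear the rightmost 1 bit
        return out

    def update_indexes(self, i):
        """Indexes written when the length of interval i changes."""
        out = []
        while i <= self.n:
            out.append(i)
            i += i & -i          # add the rightmost 1 bit
        return out

    def update(self, i, delta):
        for idx in self.update_indexes(i):
            self.tree[idx] += delta

    def right_border(self, i):
        return sum(self.tree[idx] for idx in self.retrieve_indexes(i))


sums = IntervalSums(8)
for position, length in enumerate([3, 1, 4, 1, 5, 9, 2, 6], start=1):
    sums.update(position, length)
print(sums.retrieve_indexes(7))   # [7, 6, 4]
print(sums.right_border(7))       # 25
```

The left border of interval i is simply `right_border(i - 1)`, which is 0 for the first interval.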

Array size in Cycle leader iteration Algorithm [duplicate]

The cycle leader iteration algorithm is an algorithm for shuffling an array by moving all even-numbered entries to the front and all odd-numbered entries to the back while preserving their relative order. For example, given this input:
a 1 b 2 c 3 d 4 e 5
the output would be
a b c d e 1 2 3 4 5
This algorithm runs in O(n) time and uses only O(1) space.
One unusual detail of the algorithm is that it works by splitting the array up into blocks of size 3^k + 1. Apparently this is critical for the algorithm to work correctly, but I have no idea why this is.
Why is the choice of 3^k + 1 necessary in the algorithm?
Thanks!
This is going to be a long answer. The answer to your question isn't simple and requires some number theory to fully answer. I've spent about half a day working through the algorithm and I now have a good answer, but I'm not sure I can describe it succinctly.
The short version:
Breaking the input into blocks of size 3^k + 1 essentially breaks the input apart into blocks of size 3^k - 1 surrounded by two elements that do not end up moving.
The remaining 3^k - 1 elements in the block move according to an interesting pattern: each element moves to the position given by dividing the index by two modulo 3^k.
This particular motion pattern is connected to a concept from number theory and group theory called primitive roots.
Because the number two is a primitive root modulo 3^k, beginning with the numbers 1, 3, 9, 27, etc. and running the pattern is guaranteed to cycle through all the elements of the array exactly once and put them into the proper place.
This pattern is highly dependent on the fact that 2 is a primitive root of 3^k for any k ≥ 1. Changing the size of the array to another value will almost certainly break this because the wrong property is preserved.
The Long Version
To present this answer, I'm going to proceed in steps. First, I'm going to introduce cycle decompositions as a motivation for an algorithm that will efficiently shuffle the elements around in the right order, subject to an important caveat. Next, I'm going to point out an interesting property of how the elements happen to move around in the array when you apply this permutation. Then, I'll connect this to a number-theoretic concept called primitive roots to explain the challenges involved in implementing this algorithm correctly. Finally, I'll explain why this leads to the choice of 3^k + 1 as the block size.
Cycle Decompositions
Let's suppose that you have an array A and a permutation of the elements of that array. Following the standard mathematical notation, we'll denote the permutation of that array as σ(A). We can line the initial array A up on top of the permuted array σ(A) to get a sense for where every element ended up. For example, here's an array and one of its permutations:
A 0 1 2 3 4
σ(A) 2 3 0 4 1
One way that we can describe a permutation is just to list off the new elements inside that permutation. However, from an algorithmic perspective, it's often more helpful to represent the permutation as a cycle decomposition, a way of writing out a permutation by showing how to form that permutation by beginning with the initial array and then cyclically permuting some of its elements.
Take a look at the above permutation. First, look at where the 0 ended up. In σ(A), the element 0 ended up taking the place of where the element 2 used to be. In turn, the element 2 ended up taking the place of where the element 0 used to be. We denote this by writing (0 2), indicating that 0 should go where 2 used to be, and 2 should go where 0 used to be.
Now, look at the element 1. The element 1 ended up where 4 used to be. The number 4 then ended up where 3 used to be, and the element 3 ended up where 1 used to be. We denote this by writing (1 4 3), that 1 should go where 4 used to be, that 4 should go where 3 used to be, and that 3 should go where 1 used to be.
Combining these together, we can represent the overall permutation of the above elements as (0 2)(1 4 3) - we should swap 0 and 2, then cyclically permute 1, 4, and 3. If we do that starting with the initial array, we'll end up at the permuted array that we want.
Cycle decompositions are extremely useful for permuting arrays in place because it's possible to permute any individual cycle in O(C) time and O(1) auxiliary space, where C is the number of elements in the cycle. For example, suppose that you have a cycle (1 6 8 4 2). You can permute the elements in the cycle with code like this:
int[] cycle = {1, 6, 8, 4, 2};
int temp = array[cycle[0]];
for (int i = 1; i < cycle.length; i++) {
swap(temp, array[cycle[i]]);
}
array[cycle[0]] = temp;
This works by just swapping everything around until everything comes to rest. Aside from the space usage required to store the cycle itself, it only needs O(1) auxiliary storage space.
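As a quick sanity check, here is the same idea as a small runnable Python sketch (the helper name `apply_cycle` is made up for illustration). Applying the cycles (0 2) and (1 4 3) from the earlier example to A = 0 1 2 3 4 should give σ(A) = 2 3 0 4 1.

```python
def apply_cycle(array, cycle):
    """Rotate the elements at the cycle's positions: the element at cycle[0]
    moves to cycle[1], the one at cycle[1] to cycle[2], ..., the last to cycle[0]."""
    temp = array[cycle[0]]
    for idx in cycle[1:]:
        # Drop the carried element here and pick up the one it displaces.
        temp, array[idx] = array[idx], temp
    array[cycle[0]] = temp


a = [0, 1, 2, 3, 4]
apply_cycle(a, [0, 2])
apply_cycle(a, [1, 4, 3])
print(a)  # [2, 3, 0, 4, 1]
```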
In general, if you want to design an algorithm that applies a particular permutation to an array of elements, you can usually do so by using cycle decompositions. The general algorithm is the following:
for (each cycle in the cycle decomposition) {
apply the above algorithm to cycle those elements;
}
The overall time and space complexity for this algorithm depends on the following:
How quickly can we determine the cycle decomposition we want?
How efficiently can we store that cycle decomposition in memory?
To get an O(n)-time, O(1)-space algorithm for the problem at hand, we're going to show that there's a way to determine the cycle decomposition in O(1) time and space. Since everything will get moved exactly once, the overall runtime will be O(n) and the overall space complexity will be O(1). It's not easy to get there, as you'll see, but then again, it's not awful either.
The Permutation Structure
The overarching goal of this problem is to take an array of 2n elements and shuffle it so that even-positioned elements end up at the front of the array and odd-positioned elements end up at the end of the array. Let's suppose for now that we have 14 elements, like this:
0 1 2 3 4 5 6 7 8 9 10 11 12 13
We want to shuffle the elements so that they come out like this:
0 2 4 6 8 10 12 1 3 5 7 9 11 13
There are a couple of useful observations we can have about the way that this permutation arises. First, notice that the first element does not move in this permutation, because even-indexed elements are supposed to show up in the front of the array and it's the first even-indexed element. Next, notice that the last element does not move in this permutation, because odd-indexed elements are supposed to end up at the back of the array and it's the last odd-indexed element.
These two observations, put together, mean that if we want to permute the elements of the array in the desired fashion, we actually only need to permute the subarray consisting of the overall array with the first and last elements dropped off. Therefore, going forward, we are purely going to focus on the problem of permuting the middle elements. If we can solve that problem, then we've solved the overall problem.
Now, let's look at just the middle elements of the array. From our above example, that means that we're going to start with an array like this one:
Element 1 2 3 4 5 6 7 8 9 10 11 12
Index 1 2 3 4 5 6 7 8 9 10 11 12
We want to get the array to look like this:
Element 2 4 6 8 10 12 1 3 5 7 9 11
Index 1 2 3 4 5 6 7 8 9 10 11 12
Because this array was formed by taking a 0-indexed array and chopping off the very first and very last element, we can treat this as a one-indexed array. That's going to be critically important going forward, so be sure to keep that in mind.
So how exactly can we go about generating this permutation? Well, for starters, it doesn't hurt to take a look at each element and to try to figure out where it began and where it ended up. If we do so, we can write things out like this:
The element at position 1 ended up at position 7.
The element at position 2 ended up at position 1.
The element at position 3 ended up at position 8.
The element at position 4 ended up at position 2.
The element at position 5 ended up at position 9.
The element at position 6 ended up at position 3.
The element at position 7 ended up at position 10.
The element at position 8 ended up at position 4.
The element at position 9 ended up at position 11.
The element at position 10 ended up at position 5.
The element at position 11 ended up at position 12.
The element at position 12 ended up at position 6.
If you look at this list, you can spot a few patterns. First, notice that the final index of all the even-numbered elements is always half the position of that element. For example, the element at position 4 ended up at position 2, the element at position 12 ended up at position 6, etc. This makes sense - we pushed all the even elements to the front of the array, so half of the elements that came before them will have been displaced and moved out of the way.
Now, what about the odd-numbered elements? Well, there are 12 total elements. Each odd-numbered element gets pushed to the second half, so an odd-numbered element at position 2k+1 will get pushed to at least position 7. Its position within the second half is given by the value of k. Therefore, an element at an odd position 2k+1 gets mapped to position 7 + k.
We can take a minute to generalize this idea. Suppose that the array we're permuting has length 2n. An element at position 2x will be mapped to position x (again, even numbers get halved), and an element at position 2x+1 will be mapped to position n + 1 + x. Restating this:
The final position of an element at position p is determined as follows:
If p = 2x for some integer x, then 2x ↦ x
If p = 2x+1 for some integer x, then 2x+1 ↦ n + 1 + x
And now we're going to do something that's entirely crazy and unexpected. Right now, we have a piecewise rule for determining where each element ends up: we either divide by two, or we do something weird involving n + 1. However, from a number-theoretic perspective, there is a single, unified rule explaining where all elements are supposed to end up.
The insight we need is that in both cases, it seems like, in some way, we're dividing the index by two. For the even case, the new index really is formed by just dividing by two. For the odd case, the new index kinda looks like it's formed by dividing by two (notice that 2x+1 went to x + (n + 1)), but there's an extra term in there. In a number-theoretic sense, though, both of these really correspond to division by two. Here's why.
Rather than taking the source index and dividing by two to get the destination index, what if we take the destination index and multiply by two? If we do that, an interesting pattern emerges.
Suppose our original number was 2x. The destination is then x, and if we double the destination index to get back 2x, we end up with the source index.
Now suppose that our original number was 2x+1. The destination is then n + 1 + x. Now, what happens if we double the destination index? If we do that, we get back 2n + 2 + 2x. If we rearrange this, we can alternatively rewrite this as (2x+1) + (2n+1). In other words, we've gotten back the original index, plus an extra (2n+1) term.
Now for the kicker: what if all of our arithmetic is done modulo 2n + 1? In that case, if our original number was 2x + 1, then twice the destination index is (2x+1) + (2n+1) = 2x + 1 (modulo 2n+1). In other words, the destination index really is half of the source index, just done modulo 2n+1!
This leads us to a very, very interesting insight: the ultimate destination of each of the elements in a 2n-element array is given by dividing that number by two, modulo 2n+1. This means that there really is a nice, unified rule for determining where everything goes. We just need to be able to divide by two modulo 2n+1. It just happens to work out that in the even case, this is normal integer division, and in the odd case, it works out to taking the form n + 1 + x.
Consequently, we can reframe our problem in the following way: given a 1-indexed array of 2n elements, how do we permute the elements so that each element that was originally at index x ends up at position x/2 mod (2n+1)?
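To convince yourself of the unified rule, here is a small Python check. The function names are invented for this sketch. It relies on the fact that n + 1 is the inverse of 2 modulo 2n + 1 (because 2(n + 1) = 2n + 2 ≡ 1), so "dividing by two" is the same as multiplying by n + 1.

```python
def destination(p, n):
    """Destination of 1-based position p in a 2n-element array: p/2 mod (2n+1)."""
    modulus = 2 * n + 1
    inverse_of_two = n + 1            # 2 * (n + 1) = 2n + 2 ≡ 1 (mod 2n + 1)
    return (p * inverse_of_two) % modulus


def reference_shuffle(items):
    """Even 1-based positions first, then odd 1-based positions."""
    return items[1::2] + items[0::2]


def shuffle_by_rule(items):
    n = len(items) // 2
    out = [None] * len(items)
    for p, item in enumerate(items, start=1):
        out[destination(p, n) - 1] = item
    return out


print(shuffle_by_rule(list(range(1, 13))))
# [2, 4, 6, 8, 10, 12, 1, 3, 5, 7, 9, 11]
```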
Cycle Decompositions Revisited
At this point, we've made quite a lot of progress. Given any element, we know where that element should end up. If we can figure out a nice way to get a cycle decomposition of the overall permutation, we're done.
This is, unfortunately, where things get complicated. Suppose, for example, that our array has 10 elements. In that case, we want to transform the array like this:
Initial: 1 2 3 4 5 6 7 8 9 10
Final: 2 4 6 8 10 1 3 5 7 9
The cycle decomposition of this permutation is (1 6 3 7 9 10 5 8 4 2). If our array has 12 elements, we want to transform it like this:
Initial: 1 2 3 4 5 6 7 8 9 10 11 12
Final: 2 4 6 8 10 12 1 3 5 7 9 11
This has cycle decomposition (1 7 10 5 9 11 12 6 3 8 4 2). If our array has 14 elements, we want to transform it like this:
Initial: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Final: 2 4 6 8 10 12 14 1 3 5 7 9 11 13
This has cycle decomposition (1 8 4 2)(3 9 12 6)(5 10)(7 11 13 14). If our array has 16 elements, we want to transform it like this:
Initial: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Final: 2 4 6 8 10 12 14 16 1 3 5 7 9 11 13 15
This has cycle decomposition (1 9 13 15 16 8 4 2)(3 10 5 11 14 7 12 6).
The problem here is that these cycles don't seem to follow any predictable patterns. This is a real problem if we're going to try to solve this problem in O(1) space and O(n) time. Even though given any individual element we can figure out what cycle contains it and we can efficiently shuffle that cycle, it's not clear how we figure out what elements belong to what cycles, how many different cycles there are, etc.
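If you want to experiment with other sizes, the decompositions listed above can be reproduced with a short Python sketch (the function name is hypothetical). It follows each element to its destination p/2 mod (2n+1) until the cycle closes, using a `seen` set, which is exactly the extra memory the O(1)-space algorithm cannot afford.

```python
def cycle_decomposition(n):
    """Cycles of the map p -> p/2 mod (2n+1) on positions 1..2n."""
    modulus = 2 * n + 1
    inverse_of_two = n + 1
    seen = set()
    cycles = []
    for start in range(1, 2 * n + 1):
        if start in seen:
            continue
        cycle = []
        p = start
        while p not in seen:
            seen.add(p)
            cycle.append(p)
            p = (p * inverse_of_two) % modulus   # where the element at p goes
        cycles.append(cycle)
    return cycles


for size in (10, 12, 14, 16):
    print(size, cycle_decomposition(size // 2))
```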
Primitive Roots
This is where number theory comes in. Remember that each element's new position is formed by dividing that number by two, modulo 2n+1. Thinking about this backwards, we can figure out which number will take the place of each number by multiplying by two modulo 2n+1. Therefore, we can think of this problem by finding the cycle decomposition in reverse: we pick a number, keep multiplying it by two and modding by 2n+1, and repeat until we're done with the cycle.
This gives rise to a well-studied problem. Suppose that we start with the number k and think about the sequence k, 2k, 2^2·k, 2^3·k, 2^4·k, etc., all done modulo 2n+1. Doing this gives different patterns depending on what odd number 2n+1 you're modding by. This explains why the above cycle patterns seem somewhat arbitrary.
I have no idea how anyone figured this out, but it turns out that there's a beautiful result from number theory that talks about what happens if you take this pattern mod 3^k for some number k:
Theorem: Consider the sequence 3^s, 3^s·2, 3^s·2^2, 3^s·2^3, 3^s·2^4, etc. all modulo 3^k for some k ≥ s. This sequence cycles through every number between 1 and 3^k, inclusive, that is divisible by 3^s but not divisible by 3^(s+1).
We can try this out on a few examples. Let's work modulo 27 = 3^3. The theorem says that if we look at 3, 3 · 2, 3 · 4, etc. all modulo 27, then we should see all the numbers less than 27 that are divisible by 3 and not divisible by 9. Well, let's see what we get:
3 · 2^0 = 3 · 1 = 3 = 3 mod 27
3 · 2^1 = 3 · 2 = 6 = 6 mod 27
3 · 2^2 = 3 · 4 = 12 = 12 mod 27
3 · 2^3 = 3 · 8 = 24 = 24 mod 27
3 · 2^4 = 3 · 16 = 48 = 21 mod 27
3 · 2^5 = 3 · 32 = 96 = 15 mod 27
3 · 2^6 = 3 · 64 = 192 = 3 mod 27
We ended up seeing 3, 6, 12, 15, 21, and 24 (though not in that order), which are indeed all the numbers less than 27 that are divisible by 3 but not divisible by 9.
We can also try this working mod 27 and considering 1, 2, 2^2, 2^3, 2^4, etc. mod 27, and we should see all the numbers less than 27 that are divisible by 1 and not divisible by 3. In other words, this should give back all the numbers less than 27 that aren't divisible by 3. Let's see if that's true:
2^0 = 1 = 1 mod 27
2^1 = 2 = 2 mod 27
2^2 = 4 = 4 mod 27
2^3 = 8 = 8 mod 27
2^4 = 16 = 16 mod 27
2^5 = 32 = 5 mod 27
2^6 = 64 = 10 mod 27
2^7 = 128 = 20 mod 27
2^8 = 256 = 13 mod 27
2^9 = 512 = 26 mod 27
2^10 = 1024 = 25 mod 27
2^11 = 2048 = 23 mod 27
2^12 = 4096 = 19 mod 27
2^13 = 8192 = 11 mod 27
2^14 = 16384 = 22 mod 27
2^15 = 32768 = 17 mod 27
2^16 = 65536 = 7 mod 27
2^17 = 131072 = 14 mod 27
2^18 = 262144 = 1 mod 27
Sorting these, we got back the numbers 1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 20, 22, 23, 25, 26 (though not in that order). These are exactly the numbers between 1 and 26 that aren't multiples of three!
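The two tables above can be regenerated mechanically. Here is a small Python sketch (the helper name `doubling_orbit` is invented) that repeatedly doubles a starting value modulo 27 until it returns to the start:

```python
def doubling_orbit(start, modulus):
    """Values start, 2*start, 4*start, ... (mod modulus) until the start repeats."""
    orbit = [start % modulus]
    x = (2 * start) % modulus
    while x != orbit[0]:
        orbit.append(x)
        x = (2 * x) % modulus
    return orbit


print(doubling_orbit(3, 27))           # [3, 6, 12, 24, 21, 15]
print(sorted(doubling_orbit(1, 27)))   # every number below 27 not divisible by 3
print(doubling_orbit(9, 27))           # [9, 18]
```

Together the orbits of 1, 3 and 9 cover each of 1..26 exactly once, which is the property the algorithm relies on.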
This theorem is crucial to the algorithm for the following reason: if 2n+1 = 3^k for some number k, then if we process the cycle containing 1, it will properly shuffle all numbers that aren't multiples of three. If we then start the cycle at 3, it will properly shuffle all numbers that are divisible by 3 but not by 9. If we then start the cycle at 9, it will properly shuffle all numbers that are divisible by 9 but not by 27. More generally, if we use the cycle shuffle algorithm on the numbers 1, 3, 9, 27, 81, etc., then we will properly reposition all the elements in the array exactly once and will not have to worry that we missed anything.
So how does this connect to 3^k + 1? Well, we need to have that 2n + 1 = 3^k, so we need to have that 2n = 3^k - 1. But remember - we dropped the very first and very last element of the array when we did this! Adding those back in tells us that we need blocks of size 3^k + 1 for this procedure to work correctly. If the blocks are this size, then we know for certain that the cycle decomposition will consist of a cycle containing 1, a nonoverlapping cycle containing 3, a nonoverlapping cycle containing 9, etc. and that these cycles will contain all the elements of the array. Consequently, we can just start cycling 1, 3, 9, 27, etc. and be absolutely guaranteed that everything gets shuffled around correctly. That's amazing!
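Putting the pieces together for a single block, here is a hedged Python sketch. It is not the full algorithm from the question: it only handles one array whose length is exactly 3^k + 1, and the function name is my own. It starts one cycle at each of the leaders 1, 3, 9, ... and fills every position p with the element from position 2p mod 3^k.

```python
def unshuffle_block(a):
    """In place: move even-indexed (0-based) items to the front of a block
    whose length is 3**k + 1, keeping relative order. O(1) extra space."""
    m = len(a) - 1                    # m must equal 3**k = 2n + 1
    power = 3
    while power < m:
        power *= 3
    if power != m:
        raise ValueError("block length must be 3**k + 1 with k >= 1")

    leader = 1
    while leader < m:                 # leaders 1, 3, 9, ..., 3**(k-1)
        p = leader
        saved = a[leader]
        while True:
            src = (2 * p) % m         # position p receives the element from 2p mod m
            if src == leader:
                break
            a[p] = a[src]
            p = src
        a[p] = saved                  # the leader's element lands at leader/2 mod m
        leader *= 3


block = list("a1b2c3d4e5")            # length 10 = 3**2 + 1
unshuffle_block(block)
print("".join(block))                 # abcde12345
```

Positions 0 and m are never touched, matching the observation that the first and last elements stay put. Arrays of other lengths need the block-splitting step, which this sketch leaves out.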
And why is this theorem true? It turns out that a number k for which 1, k, k^2, k^3, etc. mod p^n cycles through all the numbers that aren't multiples of p (assuming p is prime) is called a primitive root of the number p^n. There's a theorem that says that 2 is a primitive root of 3^k for all numbers k, which is why this trick works. If I have time, I'd like to come back and edit this answer to include a proof of this result, though unfortunately my number theory isn't at a level where I know how to do this.
Summary
This problem was tons of fun to work on. It involves cute tricks with dividing by two modulo an odd number, cycle decompositions, primitive roots, and powers of three. I'm indebted to this arXiv paper which described a similar (though quite different) algorithm and gave me a sense for the key trick behind the technique, which then let me work out the details for the algorithm you described.
Hope this helps!
Here is most of the mathematical argument missing from templatetypedef’s
answer. (The rest is comparatively boring.)
Lemma: for all integers k >= 1, we have
2^(2*3^(k-1)) = 1 + 3^k mod 3^(k+1).
Proof: by induction on k.
Base case (k = 1): we have 2^(2*3^(1-1)) = 4 = 1 + 3^1 mod 3^(1+1).
Inductive case (k >= 2): if 2^(2*3^(k-2)) = 1 + 3^(k-1) mod 3^k,
then let q = (2^(2*3^(k-2)) - (1 + 3^(k-1)))/3^k, which is an integer. Then
2^(2*3^(k-1)) = (2^(2*3^(k-2)))^3
= (1 + 3^(k-1) + 3^k*q)^3
= 1 + 3*(3^(k-1)) + 3*(3^(k-1))^2 + (3^(k-1))^3
+ 3*(1+3^(k-1))^2*(3^k*q) + 3*(1+3^(k-1))*(3^k*q)^2 + (3^k*q)^3
= 1 + 3^k mod 3^(k+1).
Theorem: for all integers i >= 0 and k >= 1, we have
2^i = 1 mod 3^k if and only if i = 0 mod 2*3^(k-1).
Proof: the “if” direction follows from the Lemma. If
i = 0 mod 2*3^(k-1), then
2^i = (2^(2*3^(k-1)))^(i/(2*3^(k-1)))
= (1+3^k)^(i/(2*3^(k-1))) mod 3^(k+1)
= 1 mod 3^k.
The “only if” direction is by induction on k.
Base case (k = 1): if i != 0 mod 2, then i = 1 mod 2, and
2^i = (2^2)^((i-1)/2)*2
= 4^((i-1)/2)*2
= 2 mod 3
!= 1 mod 3.
Inductive case (k >= 2): if 2^i = 1 mod 3^k, then
2^i = 1 mod 3^(k-1), and the inductive hypothesis implies that
i = 0 mod 2*3^(k-2). Let j = i/(2*3^(k-2)). By the Lemma,
1 = 2^i mod 3^k
= (1+3^(k-1))^j mod 3^k
= 1 + j*3^(k-1) mod 3^k,
where the dropped terms are divisible by (3^(k-1))^2, so
j = 0 mod 3, and i = 0 mod 2*3^(k-1).
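The Lemma and the Theorem can be sanity-checked for small k. This is a sketch, not a proof; the function names `lemma_holds` and `order_of_two` are invented for the illustration. The Theorem says that the multiplicative order of 2 modulo 3^k is exactly 2*3^(k-1).

```python
def lemma_holds(k):
    """Check 2^(2*3^(k-1)) = 1 + 3^k (mod 3^(k+1)) for one value of k."""
    return pow(2, 2 * 3 ** (k - 1), 3 ** (k + 1)) == 1 + 3 ** k


def order_of_two(modulus):
    """Smallest i >= 1 with 2^i = 1 (mod modulus); modulus must be odd and > 1."""
    i, x = 1, 2 % modulus
    while x != 1:
        x = (2 * x) % modulus
        i += 1
    return i


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, lemma_holds(k), order_of_two(3 ** k) == 2 * 3 ** (k - 1))
```

Since 2*3^(k-1) is the number of residues modulo 3^k that are not multiples of 3, an order of that size is the same as saying that 2 is a primitive root of 3^k.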

Sort array by pairwise difference

For example we have the array X[n] = {X0, X1, X2, ... Xn}
The goal is to sort this array so that the differences between consecutive elements are in ascending order.
For example X[] = {10, 2, 7, 4}
Answers are:
2 7 10 4
4 10 7 2
I have some code but it's brute force :)
#include <stdio.h>
int main(int argc, char **argv)
{
    int array[] = { 10, 2, 7, 4 };
    int a[4];
    for(int i = 0; i < 4; i++){
        a[0] = array[i];
        for(int j = 0; j < 4; j++){
            a[1] = array[j];
            if(a[0] == a[1])
                continue;
            for(int k = 0; k < 4; k++){
                a[2] = array[k];
                if(a[0] == a[2] || a[1] == a[2])
                    continue;
                for(int l = 0; l < 4; l++){
                    a[3] = array[l];
                    if(a[0] == a[3] || a[1] == a[3] || a[2] == a[3])
                        continue;
                    if(a[0] - a[1] < a[1] - a[2] && a[1] - a[2] < a[2] - a[3])
                        printf("%d %d %d %d\n", a[0], a[1], a[2], a[3]);
                }
            }
        }
    }
    return 0;
}
Any idea for "pretty" algorithm ? :)
DISCLAIMER: This solution arranges the items so that the differences grow in absolute value. Thanks to @Will Ness.
One solution, according to the "difference between every pair is in ascending order" requirement:
Just sort the array in ascending order, which is O(n*log(n)), and then start in the middle. Then arrange the elements like this:
[n/2, n/2+1, n/2-1, n/2+2, n/2-2, n/2+3 ...] you go to +1 first if more elements are on the right side of the (n/2)th element
[n/2, n/2-1, n/2+1, n/2-2, n/2+2, n/2-3 ...] you go to -1 first otherwise.
This gives you ascending pairwise differences.
NOTE!!! It is not guaranteed that this algorithm will find the smallest difference and start with it, but I do not see that in the requirements.
Example
Sorted array: {1, 2, 10, 15, 40, 50, 60, 61, 100, 101}
Then you pick 40 (the 10/2 = 5th element), 50 (the 10/2+1 = 6th), 15 (the 10/2-1 = 4th) and so on...
You'll get: {40, 50, 15, 60, 10, 61, 2, 100, 1, 101}
Which got you diffs: 10, 35, 45, 50, 51, 59, 98, 99, 100
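The zigzag arrangement can be sketched as follows. This is an illustration under stated assumptions: the function name `arrange_by_growing_abs_diff` is made up, indexing is 0-based, and the walk always starts at the lower middle element so that the right side is never shorter than the left and stepping right first is always the correct choice.

```python
def arrange_by_growing_abs_diff(values):
    """Reorder values so that |a[i+1] - a[i]| never decreases."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        return []
    mid = (n - 1) // 2          # lower middle element (0-based)
    result = [s[mid]]
    left, right = mid - 1, mid + 1
    # Alternate right, left, right, ... moving outwards from the middle.
    while left >= 0 or right < n:
        if right < n:
            result.append(s[right])
            right += 1
        if left >= 0:
            result.append(s[left])
            left -= 1
    return result


if __name__ == "__main__":
    out = arrange_by_growing_abs_diff([1, 2, 10, 15, 40, 50, 60, 61, 100, 101])
    print(out)
    print([abs(b - a) for a, b in zip(out, out[1:])])
```

Each new element extends the interval spanned so far, which is why the absolute differences cannot shrink. With duplicate values, equal differences are possible, so the result is only non-decreasing, not strictly increasing.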
Let's see. Your example array is {10,2,7,4} and the answers you show are:
2 7 10 4
5 3 -6 differences, a[i+1] - a[i]
4 10 7 2
6 -3 -5
I show the flipped differences here, it's easier to analyze that way.
So, the goal is to have the differences a[i+1] - a[i] in descending order. Obviously some positive difference values will go first, then some negative. This means the maximal element of the array will appear somewhere in the middle. The positive differences to the left of it must be in descending order of absolute value, and the negatives to the right - in ascending order of absolute value.
Let's take another array as an example: {4,8,20,15,16,1,3}. We start by sorting it:
1 3 4 8 15 16 20
2 1 4 7 1 4 differences, a[i+1] - a[i]
Now, 20 goes in the middle, and after it to the right must go values progressively further apart. Since the differences to the left of 20 in the solution are positive, the values themselves are ascending, i.e. sorted. So whatever's left after we pick some of them to move to the right of the maximal element, stays as is, and the (positive) differences must be in descending order. If they are, the solution is found.
Here there are no solutions. The possibilities are:
... 20 16 8 (no more) left: 1 3 4 15 (diffs: 2 1 11 5)
... 20 16 4 (no more) left: 1 3 8 15 (diffs: 2 5 7 5)
... 20 16 3 (no more) left: 1 4 8 15 (diffs: 3 4 7 5)
... 20 16 1 (no more) left: 3 4 8 15 ....................
... 20 15 8 (no more) left: 1 3 4 16
... 20 15 4 (no more) left: 1 3 8 16
... 20 15 3 (no more) left: 1 4 8 16
... 20 15 1 (no more) left: 3 4 8 16
... 20 8 (no more) left: 1 3 4 15 16
... 20 4 (no more) left: 1 3 8 15 16
... 20 3 (no more) left: 1 4 8 15 16
... 20 1 (no more) left: 3 4 8 15 16
... 20 (no more) left: 1 3 4 8 15 16
Without 1 and 3, several solutions are possible.
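The case analysis above can be reproduced with a small exhaustive search. This is only a sketch of the observation (ascending prefix, maximum, descending suffix), not an efficient algorithm: it tries every subset of the non-maximal values as the suffix, so it is exponential in the array size. The function name `arrangements` is made up for the illustration.

```python
from itertools import combinations


def arrangements(values):
    """All orderings whose differences a[i+1] - a[i] are strictly descending."""
    s = sorted(values)
    peak, rest = s[-1], s[:-1]
    found = []
    for size in range(len(rest) + 1):
        for idx in combinations(range(len(rest)), size):
            chosen = set(idx)
            # Values kept on the left stay in ascending order.
            left = [x for i, x in enumerate(rest) if i not in chosen]
            # Values moved to the right of the maximum go in descending order.
            right = [x for i, x in enumerate(rest) if i in chosen][::-1]
            cand = left + [peak] + right
            diffs = [b - a for a, b in zip(cand, cand[1:])]
            if all(d1 > d2 for d1, d2 in zip(diffs, diffs[1:])):
                found.append(cand)
    return found


if __name__ == "__main__":
    print(arrangements([10, 2, 7, 4]))
    print(arrangements([4, 8, 20, 15, 16, 1, 3]))
    print(arrangements([4, 8, 20, 15, 16]))
```

For {10, 2, 7, 4} this finds the two answers from the question, for {4, 8, 20, 15, 16, 1, 3} it finds nothing, and after dropping 1 and 3 it finds several arrangements.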
Solution for this problem is not always possible. For example, array X[] = {0, 0, 0} cannot be "sorted" as required because both differences are always equal.
In case this problem has a solution, array values should be "sorted" as shown on the left diagram: some subset of the values in ascending order should form prefix of the resulting array, then all the remaining values in descending order should form its suffix. And "sorted" array should be convex.
This gives a hint for an algorithm: sort the array, then split its values into two convex subsets, then extract one of these subsets and append it (in reverse order) at the end.
A simple (partial) implementation would be: sort the array, find a subset of values that belong to convex hull, then check all the remaining values, and if they are convex, append them at the end. This algorithm works only if one of the subsets lies completely below the other one.
If the resulting subsets intersect (as shown on the right diagram), an improved version of this algorithm may be used: split sorted array into segments where one of the subsets lies completely below other one (A-B, B-C), then for each of these segments find convex hull and check convexity of the remaining subset. Note that X axis on the right diagram corresponds to the array indexes in a special way: for subset intersections (A, B, C) X corresponds to an index in ascending-sorted array; X coordinates for values between intersections are scaled according to their positions in the resulting array.
Sketch of an algorithm
1. Sort the array in ascending order.
2. Starting from the largest value, try adding convex hull values to the "top" subset (in a way similar to Graham scan algorithm). Also put all the values not belonging to convex hull to the "bottom" subset and check its convexity. Continue while all the values properly fit to either "top" or "bottom" subset. When the smallest value is processed, remove one of these subsets from the array, reverse the subset, and append it at the end of the array.
3. If after adding some value to the "top" subset, the "bottom" subset is not convex anymore, roll back the last addition and check if this value can be properly added to the "bottom" subset. If not, stop, because the input array cannot be "sorted" as required. Otherwise, exchange "top" and "bottom" subsets and continue with step 2 (already processed values should not be moved between subsets, any attempt to move them should result in going to step 3).
In other words, we could process each value of sorted array, from largest to smallest, trying to append this value to one of two subsets in such a way that both subsets stay convex. At first, we try to place a new value to the subset where previous value was added. This may make several values, added earlier, unfit to this subset - then we check if they all fit to other subset. If they do - move them to other subset, if not - leave them in "top" subset but move current value to other subset.
Time complexity
Each value is added to or removed from the "top" subset at most once; it may also be added to the "bottom" subset at most once. And for each operation on an element we need to inspect only its two nearest predecessors. This means the worst-case time complexity of steps 2 and 3 is O(N). So the overall time complexity is determined by the sorting algorithm in step 1.

Finding least difference between max and min value of all possible sets of numbers

There are n arrays of k size each.
a0[0],a0[1],......a0[k-1]
a1[0],a1[1],......a1[k-1]
.
.
.
an-1[0],an-1[1], .... an-1[k-1]
There are no duplicates at all and all the arrays are sorted.
Now a Set of size n is constructed by taking any value randomly from each of the arrays.
e.g one such set can be {a0[0],a1[3],a2[2],.... an-1[k-1]}.
My goal is to find out the min and max elements in all possible Sets such that the difference between the min and max is the lowest.
Example (k=3,n=3)
[3 7 11]
[1 12 15]
[4 19 21]
So mathematically there will be 27 such sets
(3 1 4) (3 12 4) (3 15 4)
(3 1 19) (3 12 19) (3 15 19)
(3 1 21) (3 12 21) (3 15 21)
(7 1 4) (7 12 4) (7 15 4)
(7 1 19) (7 12 19) (7 15 19)
(7 1 21) (7 12 21) (7 15 21)
(11 1 4) (11 12 4) (11 15 4)
(11 1 19) (11 12 19) (11 15 19)
(11 1 21) (11 12 21) (11 15 21)
After computing min and max values of all these sets we can conclude that (3 1 4) is the set for which difference between min (1) and max (4) is the global minimum or lowest.
So we will output 3 as the global minimum difference and the corresponding (min, max) pair, which is (1 4). If there are multiple global minima then print them all. Please suggest an algorithm with better time and space complexity. We can't go for a brute force approach.
If I understand correctly, you want to find the set in which the largest difference within its elements is globally minimal. (I will call that the range of the set)
Start with k sets, each of them initially containing one element from the first array. For each set, the minimum and maximum are equal to the element itself.
For your example that would be {3}, {7} and {11}.
Then you move on to the second array. For each set, you have to pick an element from that array that minimizes the new range. Ideally you would select an element that does not increase the range (but that's not possible now). If that's not possible, select the elements that expand your sets in both directions (plus and minus). For the example that would give you {1-3}, {3-12}, {1-7}, {7-12}, {1-11} and {11-12}. From these 2k sets, you can remove sets that are overlapping. For example, the set {1-7} will always have a larger or equal range compared to the set {1-3}. You don't need to investigate the {1-7} set. You end up with sets {1-3} and {11-12}.
Moving on to the third array, again select the elements that expand the range of each set as little as possible. You end up with {1-4}, {11-19} and {4-12}. Then just select the one with the lowest range.
Complexity of this algorithm is O(n*k*logn)
Select only the first element from each row and create a min heap and a max heap.
Calculate the current difference (=head(maxHeap)-head(minheap))
Remove the head of minHeap (and the corresponding element from maxHeap) and add the next element from the corresponding array (corresponding to removed element) to both minHeap and maxHeap.
Repeat until all the elements in at least one of the arrays are exhausted.
Complexity: You add/remove nk elements and each update takes O(log n) time at most. So the complexity is O(nk log n).
Note: It is not exactly your stock heap. The minHeap contains pointer to the same element in maxHeap and vice versa. When you delete an element from minHeap, you can find the link to the maxHeap and delete it too. Also whenever position of a particular element changes, appropriate changes are made to the link in the other heap.
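A minimal sketch of this approach, with one simplification that is an assumption of the sketch and not part of the answer above: instead of a linked max-heap it keeps the current maximum in a plain variable. That is enough here because only the minimum is ever removed, and each replacement comes from a sorted array, so the maximum can only grow. The function name `smallest_range` is made up.

```python
import heapq


def smallest_range(arrays):
    """Return (low, high) with the smallest high - low such that every
    array has at least one element in [low, high].
    Assumes every array is sorted and non-empty."""
    # Heap entries are (value, array index, position inside that array).
    heap = [(arr[0], i, 0) for i, arr in enumerate(arrays)]
    heapq.heapify(heap)
    current_max = max(arr[0] for arr in arrays)
    best = (heap[0][0], current_max)
    while True:
        value, i, j = heap[0]
        if current_max - value < best[1] - best[0]:
            best = (value, current_max)
        if j + 1 == len(arrays[i]):
            # The array holding the current minimum is exhausted.
            return best
        nxt = arrays[i][j + 1]
        current_max = max(current_max, nxt)
        heapq.heapreplace(heap, (nxt, i, j + 1))


if __name__ == "__main__":
    print(smallest_range([[3, 7, 11], [1, 12, 15], [4, 19, 21]]))
```

Storing the array index inside each heap entry is what tells us which pointer to advance after the minimum is removed. This version reports only the first range that achieves the minimum; collecting all ties would need a list instead of a single `best`.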
Iterate over all n * k elements.
Say our current element has value v. Let's calculate the minimum difference assuming that v is the minimum value in the resulting n-tuple.
Binary search the position of v in the other n - 1 arrays (since they're sorted, we can do it), and notice that to reduce the difference it's optimal to pick the smallest elements that are greater than or equal to v for all the other arrays. This is precisely what the binary search gives us.
An example:
[3 7 11]
[1 12 15]
[4 19 21]
If we take v = 1 on the second array, then we'll pick 3 on the first array, and 4 on the third.
The complexity is O(N * K * N * log(K)) = O(N^2 * K * log(K)), since each of the N * K candidates needs N - 1 binary searches over arrays of size K.
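The idea can be sketched with the standard `bisect` module. The function name `smallest_range_bisect` is made up for the illustration, and the sketch assumes all arrays are sorted and non-empty. A candidate v is skipped when some other array has no element that is greater than or equal to v.

```python
from bisect import bisect_left


def smallest_range_bisect(arrays):
    """Try every element v as the minimum of the chosen tuple."""
    best = None
    for i, arr in enumerate(arrays):
        for v in arr:
            high = v
            feasible = True
            for j, other in enumerate(arrays):
                if j == i:
                    continue
                # Smallest element of `other` that is >= v.
                pos = bisect_left(other, v)
                if pos == len(other):
                    feasible = False
                    break
                high = max(high, other[pos])
            if feasible and (best is None or high - v < best[1] - best[0]):
                best = (v, high)
    return best


if __name__ == "__main__":
    print(smallest_range_bisect([[3, 7, 11], [1, 12, 15], [4, 19, 21]]))
```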
Consider the result Set as a simple 1xn array
[3]
[1]
[4]
where the global difference is 3 (maximum of 4 less minimum of 1).
Now consider an expanded definition of Set, call it MultiSet. A MultiSet is an array where each element contains an ordered set of items.
[3 7 11]
[1 12 15]
[4 19 21]
We can calculate the global difference (call it 'cost') by the difference between the maximum 'last' value of each row and the minimum 'first' value of each row. In this case the cost will be the difference between 21 (max(11,15,21)) and 1 (min(3,1,4)), which is 20.
The process will now be to iterate the MultiSet until we reach minimum cost using the following algorithm:
identify the row with the lowest value. If this row has only one element then continue, otherwise consider the potential cost reduction of removing the values from this row which are lower than the lowest value from any other row.
identify the row with the highest value. If this row has only one element then continue, otherwise consider the potential cost reduction of removing the values from this row which are higher than the highest value from any other row.
remove those values identified above and continue to the next highest/lowest items.
If lowest and highest values are both in single-item rows then the cost has been minimised.
To demonstrate with the given example, the original cost of 20 can be reduced to 18 by removing the lowest value of 1 (the lowest value in the MultiSet, which is lower than any other row's minimum), or reduced to 14 by removing the highest values of 19 and 21 (the highest values in the MultiSet, which are higher than any other row's maximum, i.e. 15) from the final row. The resulting MultiSet would be
[3 7 11]
[1 12 15]
[4]
The second iteration has us remove the 12 and 15 to reduce the cost to 10.
[3 7 11]
[1]
[4]
And the third and final iteration has us remove the 7 and 11 to reduce the cost to 3. After the third iteration the global difference can no longer be minimised, thus the solution is reached.
The complexity? Upper bounded by O(n * m * log(n) * k)
Code:
private static ElementData calcMin(int[] n1Arr, int[] n2Arr, int[] n3Arr) {
    ElementData data = new ElementData(); // added just to know which two elements the algorithm has picked
    int[] mixArr = { n1Arr[0], n2Arr[0], n3Arr[0] };
    Arrays.sort(mixArr);
    int minValue = mixArr[2] - mixArr[0];
    data.setMinValue(minValue);
    data.setHighValue(mixArr[2]);
    data.setLowValue(mixArr[0]);
    int tempValue = 0;
    for (int n1 : n1Arr) {
        for (int n2 : n2Arr) {
            for (int n3 : n3Arr) {
                int[] mixArr1 = { n1, n2, n3 };
                Arrays.sort(mixArr1);
                tempValue = mixArr1[2] - mixArr1[0];
                if (minValue > tempValue) {
                    minValue = tempValue;
                    data = new ElementData();
                    data.setMinValue(minValue);
                    data.setHighValue(mixArr1[2]);
                    data.setLowValue(mixArr1[0]);
                }
            }
        }
    }
    return data;
}

minimum steps required to make array of integers contiguous

Given a sorted array of distinct integers, what is the minimum number of steps required to make the integers contiguous? Here the condition is that in a step, only one element can be changed, and it can be either increased or decreased by 1. For example, if we have 2,4,5,6 then '2' can be made '3', thus making the elements contiguous (3,4,5,6). Hence the minimum number of steps here is 1. Similarly for the array 2,4,5,8:
Step 1: '2' can be made '3'
Step 2: '8' can be made '7'
Step 3: '7' can be made '6'
Thus the sequence now is 3,4,5,6 and the number of steps is 3.
I tried the following, but am not sure if it's correct:
//n is the number of elements in array a
int count=a[n-1]-a[0]-1;
for(i=1;i<=n-2;i++)
{
count--;
}
printf("%d\n",count);
Thanks.
The intuitive guess is that the "center" of the optimal sequence will be the arithmetic average, but this is not the case. Let's find the correct solution with some vector math:
Part 1: Assuming the first number is to be left alone (we'll deal with this assumption later), calculate the differences between the target and the input, so (1 2 3 4 5 6) - (1 12 3 14 5 16) would yield 0 -10 0 -10 0 -10.
sidenote: Notice that a "contiguous" array by your implied definition would be an increasing arithmetic sequence with difference 1. (Note that there are other reasonable interpretations of your question: some people may consider 5 4 3 2 1 to be contiguous, or 5 3 1 to be contiguous, or 1 2 3 2 3 to be contiguous. You also did not specify if negative numbers should be treated any differently.)
theorem: The contiguous numbers must lie between the minimum and maximum number. [proof left to reader]
Part 2: Now returning to our example, assume we took the 30 steps (sum(abs(0 -10 0 -10 0 -10))=30) required to turn 1 12 3 14 5 16 into 1 2 3 4 5 6. This is one correct answer. But (0 -10 0 -10 0 -10)+c is also an answer which yields an arithmetic sequence of difference 1, for any constant c. In order to minimize the number of "steps", we must pick an appropriate c. Each time we increase or decrease c by 1, every one of the N=6 entries of the difference vector (N is the length of the vector) changes by 1. So for example if we wanted to turn our original sequence 1 12 3 14 5 16 into 3 4 5 6 7 8 (c=2), then the differences would have been 2 -8 2 -8 2 -8, and sum(abs(2 -8 2 -8 2 -8))=30.
Now this is very clear if you could picture it visually, but it's sort of hard to type out in text. First we took our difference vector. Imagine you drew it like so:
 4|
 3|     *
 2|  *  |
 1|  |  |  *
 0+--+--+--+--+--*
-1|           |
-2|           *
We are free to "shift" this vector up and down by adding or subtracting 1 from everything. (This is equivalent to finding c.) We wish to find the shift which minimizes the number of | you see (the area between the curve and the x-axis). This is NOT the average (that would be minimizing the standard deviation or RMS error, not the absolute error). To find the minimizing c, let's think of this as a function and consider its derivative. If the differences are all far away from the x-axis (we're trying to make 101 112 103 114 105 116), it makes sense to just not add this extra stuff, so we shift the function down towards the x-axis. Each time we decrease c, we improve the solution by 6. Now suppose that one of the *s passes the x axis. Each time we decrease c, we improve the solution by 5-1=4 (we save 5 steps of work, but have to do 1 extra step of work for the * below the x-axis). Eventually when HALF the *s are past the x-axis, we can NO LONGER IMPROVE THE SOLUTION (derivative: 3-3=0). (In fact soon we begin to make the solution worse, and can never make it better again. Not only have we found the minimum of this function, but we can see it is a global minimum.)
Thus the solution is as follows: Pretend the first number is in place. Calculate the vector of differences. Minimize the sum of the absolute values of this vector; do this by finding the median OF THE DIFFERENCES and subtracting that off from the differences to obtain an improved differences-vector. The sum of the absolute values of the "improved" vector is your answer. This is O(N). The solutions of equal optimality will (as per the above) always be "adjacent". A unique solution exists only if there is an odd number of numbers; otherwise, if there is an even number of numbers AND the two medians of the differences are not equal, the equally-optimal solutions will have difference-vectors with corrective factors of any number between the two medians.
So I guess this wouldn't be complete without a final example.
input: 2 3 4 10 14 14 15 100
difference vector: 2 3 4 5 6 7 8 9-2 3 4 10 14 14 15 100 = 0 0 0 -5 -8 -7 -7 -91
note that the medians of the difference-vector are not in the middle anymore, we need to perform an O(N) median-finding algorithm to extract them...
medians of difference-vector are -5 and -7
let us take -5 to be our correction factor (any number between the medians, such as -6 or -7, would also be a valid choice)
thus our new goal is 2 3 4 5 6 7 8 9+5=7 8 9 10 11 12 13 14, and the new differences are 5 5 5 0 -3 -2 -2 -86*
this means we will need to do 5+5+5+0+3+2+2+86=108 steps
*(we obtain this by repeating step 2 with our new target, or equivalently by adding 5 to each number of the previous difference vector; the sum of absolute values then has to be recomputed, because the entries have mixed signs)
Alternatively, we could have also taken -6 or -7 to be our correction factor. Let's say we took -7...
then the new goal would have been 2 3 4 5 6 7 8 9+7=9 10 11 12 13 14 15 16, and the new differences would have been 7 7 7 2 -1 0 0 -84
this would have meant we'd need to do 7+7+7+2+1+0+0+84=108 steps, the same as above
If you simulate this yourself, you can see the number of steps becomes >108 as we take offsets further away from the range [-7,-5].
Pseudocode:
def minSteps(array A of size N):
    A' = [0,1,...,N-1]
    diffs = A'-A
    medianOfDiffs = leftMedian(diffs)
    return sum(abs(diffs-medianOfDiffs))
Python:
# lower of the (up to two) medians; either median gives the same total
leftMedian = lambda x: sorted(x)[(len(x) - 1) // 2]
def minSteps(array):
    target = range(len(array))
    diffs = [t - a for t, a in zip(target, array)]
    medianOfDiffs = leftMedian(diffs)
    return sum(abs(d - medianOfDiffs) for d in diffs)
edit:
It turns out that for arrays of distinct integers, this is equivalent to a simpler solution: picking one of the (up to 2) medians, assuming it doesn't move, and moving other numbers accordingly. This simpler method often gives incorrect answers if you have any duplicates, but the OP didn't ask that, so that would be a simpler and more elegant solution. Additionally we can use the proof I've given in this solution to justify the "assume the median doesn't move" solution as follows: the corrective factor will always be in the center of the array (i.e. the median of the differences will be from the median of the numbers). Thus any restriction which also guarantees this can be used to create variations of this brainteaser.
Get one of the medians of all the numbers. As the numbers are already sorted, this shouldn't be a big deal. Assume that median does not move. Then compute the total cost of moving all the numbers accordingly. This should give the answer.
community edit:
def minSteps(a):
    """INPUT: list of sorted unique integers"""
    n = len(a)
    oneMedian = a[n // 2]
    aTarget = [oneMedian + (i - n // 2) for i in range(n)]
    # aTarget looks roughly like [m-n/2?, ..., m-1, m, m+1, ..., m+n/2]
    return sum(abs(aTarget[i] - a[i]) for i in range(n))
This is probably not an ideal solution, but a first idea.
Given a sorted sequence [x1, x2, …, xn]:
Write a function that returns the differences of an element x_i to the previous and to the next element, i.e. (x_i - x_(i-1), x_(i+1) - x_i).
If the difference to the previous element is > 1, you would have to increase all previous elements by x_i - x_(i-1) - 1. That is, the number of necessary steps would increase by the number of previous elements × (x_i - x_(i-1) - 1). Let's call this number a.
If the difference to the next element is > 1, you would have to decrease all subsequent elements by x_(i+1) - x_i - 1. That is, the number of necessary steps would increase by the number of subsequent elements × (x_(i+1) - x_i - 1). Let's call this number b.
If a < b, then increase all previous elements until they are contiguous to the current element. If a > b, then decrease all subsequent elements until they are contiguous to the current element. If a = b, it doesn't matter which of these two actions is chosen.
Add up the number of steps taken in the previous step (by increasing the total number of necessary steps by either a or b), and repeat until all elements are contiguous.
First of all, imagine that we pick an arbitrary target of contiguous increasing values and then calculate the cost (number of steps required) for modifying the array to match.
Original: 3 5 7 8 10 16
Target: 4 5 6 7 8 9
Difference: +1 0 -1 -1 -2 -7 -> Cost = 12
Sign: + 0 - - - -
Because the input array is already ordered and distinct, it is strictly increasing. Because of this, it can be shown that the differences will always be non-increasing.
If we change the target by increasing it by 1, the cost will change. Each position in which the difference is currently positive or zero will incur an increase in cost by 1. Each position in which the difference is currently negative will yield a decrease in cost by 1:
Original: 3 5 7 8 10 16
New target: 5 6 7 8 9 10
New Difference: +2 +1 0 0 -1 -6 -> Cost = 10 (decrease by 2)
Conversely, if we decrease the target by 1, each position in which the difference is currently positive will yield a decrease in cost by 1, while each position in which the difference is zero or negative will incur an increase in cost by 1:
Original: 3 5 7 8 10 16
New target: 3 4 5 6 7 8
New Difference: 0 -1 -2 -2 -3 -8 -> Cost = 16 (increase by 4)
In order to find the optimal values for the target array, we must find a target such that any change (increment or decrement) will not decrease the cost. Note that an increment of the target can only decrease the cost when there are more positions with negative difference than there are with zero or positive difference. A decrement can only decrease the cost when there are more positions with a positive difference than with a zero or negative difference.
Here are some example distributions of difference signs. Remember that the differences array is non-increasing, so positives always have to be first and negatives last:
    C C
+ + + - - -   optimal
+ + 0 - - -   optimal
0 0 0 - - -   optimal
+ 0 - - - -   can increment (negatives exceed positives & zeroes)
+ + + 0 0 0   optimal
+ + + + - -   can decrement (positives exceed negatives & zeroes)
+ + 0 0 - -   optimal
+ 0 0 0 0 0   optimal
    C C
Observe that if one of the central elements (marked C) is zero, the target must be optimal. In such a circumstance, at best any increment or decrement will not change the cost, but it may increase it. This result is important, because it gives us a trivial solution. We pick a target such that a[n/2] remains unchanged. There may be other possible targets that yield the same cost, but there are definitely none that are better. Here's the original code modified to calculate this cost:
//n is the number of elements in array a
int targetValue;
int cost = 0;
int middle = n / 2;
int startValue = a[middle] - middle;
for (i = 0; i < n; i++)
{
    targetValue = startValue + i;
    cost += abs(targetValue - a[i]);
}
printf("%d\n",cost);
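The claim that keeping a[n/2] fixed is optimal can be cross-checked against an exhaustive search on small inputs. This is a sketch with made-up function names (`cost`, `min_steps_median`, `min_steps_brute`); it assumes a non-empty, strictly increasing list of integers.

```python
def cost(a, start):
    """Steps needed to turn a into start, start+1, ..., start+len(a)-1."""
    return sum(abs((start + i) - x) for i, x in enumerate(a))


def min_steps_median(a):
    """Cost of the target that leaves a[n // 2] unchanged."""
    middle = len(a) // 2
    return cost(a, a[middle] - middle)


def min_steps_brute(a):
    """Try every start value between the extreme candidates a[i] - i."""
    starts = [x - i for i, x in enumerate(a)]
    return min(cost(a, s) for s in range(min(starts), max(starts) + 1))


if __name__ == "__main__":
    for a in ([2, 4, 5, 6], [2, 4, 5, 8], [3, 5, 7, 8, 10, 16]):
        print(a, min_steps_median(a), min_steps_brute(a))
```

For the worked example 3 5 7 8 10 16, `cost` gives 12, 10 and 16 for the targets starting at 4, 5 and 3, matching the tables above.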
You can not do it by iterating once on the array, that's for sure.
You need first to check the difference between each two numbers, for example:
2,7,8,9 can be 2,3,4,5 with 12 steps or 6,7,8,9 with 4 steps.
Create a new array with the differences like so: for 2,7,8,9 it will be 5,1,1. Now you can decide whether to increase or decrease the first number.
Lets assume that the contiguous array looks something like this -
c c+1 c+2 c+3 .. and so on
Now lets take an example -
5 7 8 10
The contiguous array in this case will be -
c c+1 c+2 c+3
In order to get the minimum steps, the sum of the modulus of the difference of the integers(before and after) w.r.t the ith index should be the minimum. In which case,
(c-5)^2 + (c-6)^2 + (c-6)^2 + (c-7)^2 should be minimum
Let f(c) = (c-5)^2 + (c-6)^2 + (c-6)^2 + (c-7)^2
= 4c^2 - 48c + 146
Applying differential calculus to get the minima,
f'(c) = 8c - 48 = 0
=> c = 6
So our contiguous array is 6 7 8 9 and the minimum cost here is 2.
To sum it up, just generate f(c), get the first differential and find out c.
This should take O(n).
Brute force approach O(N*M)
If one draws a line through each point in the array a, then y0 is the value where each line starts at index 0. The answer is then the minimum over the number of steps required to get from a to each line that starts at some y0, in Python:
y0s = set((y - i) for i, y in enumerate(a))
nsteps = min(sum(abs(y - (y0 + i)) for i, y in enumerate(a))
             for y0 in range(min(y0s), max(y0s) + 1))
Input
2,4,5,6
2,4,5,8
Output
1
3

Resources