Using dynamic programming to find minimum cost of a pipeline - algorithm

I am learning dynamic programming and I have this is problem that i can't understand.
We have a a pipeline that is connected by depos (pumps). The pipeline is linear and we have assigned the depos a value. For example 4----5----1----2 (shows a pipeline, with numbers representing pumps/depos), the total value of pipeline is computed as 4*5 +4*1 + 4*2 + 5*1 + 5*2 + 1*2 = 49.
We are now given with a problem to cut the water supply at such a point that we are left with the minimum value of pipeline for example cutting between 5 and 1 gives us 22 (4---5--/--1---2 gives as 4*5 + 1*2 = 22.)
while cutting it at 4 and 5 gives 17 ( 4--/--5--1---2 gives 5*1 + 5*2 + 1*2 = 17). We are allowed a limited number of cuts. and we are to device a dynamic algorithm that gives us the least pipeline cost.

Related

Distance algorithm - minimum coins required to clear all the level

Thor is playing a game where there are N levels and M types of available weapons. The levels are numbered from 0 to N-1 and the weapons are numbered from 0 to M-1. He can clear these levels in any order. In each level, some subset of these M weapons is required to clear this level. If in a particular level, he needs to buy x new weapons, he will pay x^2 coins for it. Also note that he can carry all the weapons he has currently to the next level. Initially, he has no weapons. Can you find out the minimum coins required such that he can clear all the levels?
Input Format
The first line of input contains 2 space separated integers:
N = the number of levels in the game
M = the number of types of weapons
N lines follow. The ith of these lines contains a binary string of length M. If the jth character of this string is 1, it means we need a weapon of type j to clear the ith level.
Constraints
1 <= N <= 20
1 <= M <= 20
Output Format
Print a single integer which is the answer to the problem.
Sample TestCase 1
Input
1 4
0101
Output
4
Explanation
There is only one level in this game. We need 2 types of weapons - 1 and 3. Since, initially, Thor has no weapons he will have to buy these, which will cost him 2^2 = 4 coins.
Sample TestCase 2
Input
3 3
111
001
010
Output
3
Explanation
There are 3 levels in this game. The 0th level (111) requires all 3 types of weapons. The 1st level (001) requires only weapon of type 2. The 2nd level requires only weapon of type 1. If we clear the levels in the given order (0-1-2), total cost = 3^2 + 0^2 + 0^2 = 9 coins. If we clear the levels in the order 1-2-0, it will cost = 1^2 + 1^2 + 1^2 = 3 coins, which is the optimal way.
The beauty of Gassa's answer is partly in the fact that if a different state can be reached by oring one of the levels' bitstring masks with the current state, we are guaranteed that achieving the current state did not include visiting this level (since otherwise those bits would already be set). This means checking a transition from one state to another by adding a different bitmask, guarantees we are looking at an ordering that did not yet include that mask. So a formulation like Gassa's could work: let f(st) represent the cost of acheiving state st, then:
f(st) = min(
some known cost of f(st),
f(prev_st) + (popcount(prev_st | level) - popcount(prev_st))^2
)
for all level and prev_st that or to st

Multiplication by power series summation with negative terms

How can I calculate a floating point multiplicand in Verilog? So far, I usually use shift << 1024 , then floating point number become to integer. Then I do some operations, then >> 1024 to obtain a fraction again.
For example 0.3545 = 2^-2 + 2^-4 + ...
I have question about another way, like this. I don't know where does the minus (-) comes from:
0.46194 = 2^-1 - 2^-5 - 2^-7 + 2^-10.
I have just look this from someone. but as you way, that is represented like this
0.46194 = 2^-2 + 2^-3 + 2^-4 + 2^-6 + 2^-7 + 2^-10 + .... .
I don't understand how does it know the minus is used it?
How do we know when the minus needed to it? Also how can I apply to verilog RTL?
UPDATE : I understand the concept the using minus in operation. But Is there any other way to equation or methodologies what to make reduce expression what multiplying with power of 2?
UPDATE : how can we use this method in verilog? for example, I have leaned 0.46194 = 2^-1 - 2^-5 - 2^-7 + 2^-10. then this code was written like this in verilog. 0.011101101 ='hED = 'd237. So the point of the question is how can we apply it to application in verilog?
UPDATE : Sir Would you please check this one? there are a little difference result.
0.46194 = 0.011101101. I just tried like this
0.011101101
0.100T10T01
= 2^-1 - 2^-4 + 2^-5 - 2^-7 + 2^-9. = 0.462890625
Something different. What do I wrong?
Multiplication of a variable by a constant is often implemented by adding the variable to shifted versions of itself. This is much cheaper to put on an FPGA than a multiplier circuit accepting two variables.
You can get further savings when there's a sequence of 1-bits in the constant, by using subtraction as well. (A subtraction circuit is only equally expensive as addition.)
Consider the number 30 = 11110. It's equal to 16 + 8 + 4 + 2, but it's also equal to 32 - 2.
In general, a sequence of multiplicand 1-bits, or the sum of several successive powers of two, can be formed by adding the first power of two after the most significant bit, and subtracting the least significant bit. Hence, instead of 16x + ... + 2x, use 32x - 2x.
It doesn't matter if the sequence of 1-bits is part of a fraction or an integer. You're just applying the identity 2^a = 1 + ∑2^0 ... 2^(a-1), in other worsd ∑2^0 ... 2^a = 2^(a+1) - 1.
In a 4 bit base 2 number can have these values:
Base 2: Unsigned 4 bit integer,
2^3 2^2 2^1 2^0
8 4 2 1
If we have a 0111 it represents 7. If we were to multiply by this number using a shift add architecture it would take 3 clockcycles (3 shift and adds).
An optimisation to this is called CSD (Canonical Signed Digit. It allows minus one to be present in the 'binary numbers'. We shall represent -1 as one bar, or T as that looks like a one with a bar over the top.
100T represents 8 - 1 which is the same as 0111. It can be observed that long runs of 1's can be replaced with a the 0 that ends the run becoming 1 and the first 1 of the run becoming a -1, (T).
An example of conversion:
00111101111
01000T1000T
But if passed in two section we would get :
00111101111
0011111000T
010000T000T
We have taken a number that would take 8 clock cycles or 8 blocks of logic to compute and turned it into 3.
Related questions to fixed point values in Verilog x precision binary fixed point representation? and verilog-floating-points-multiplication.
To cover the follow up question:
To answer the follow up section about your question on CSD conversion. I will look at them as pure integers to simplify the numbers, this is the same as multiplying the values by 2^9 (9 fractional bits).
256 128 64 32 16 8 4 2 1
0 1 1 1 0 1 1 0 1
128 + 64 +32 + 8 +4 +1 => 237
Now with your CSD conversion:
256 128 64 32 16 8 4 2 1
1 0 0 T 1 0 T 0 1
256 -32 + 16 - 4 + 1 => 237
You can see your conversion was correct. I get 237* 2^-9 as 0.462890625, which matches your answer when converted back to fractional. The 0.46194 that you started with must have been a rounded version, or when quantised to 9 fractional bits gets truncated. This error is known as quantisation error. The most important thing here though is that you got the CSD conversion correct.

Networking Algorithm to Maximum a Subset of Integers

I'm currently working on a pet project that simulates a couple different types of networks. One of them requires some specific conditions that until now I've just been brute forcing. It's not scaling well, however, so I'm trying to do this efficiently but this algorithm is really stumping me! I'll try to describe the problem as general as possible.
Given a set of integers X and an integer k, find a subset Y of X that maximizes the sum of M over each value in X:
M(s) = the largest value in Y such that it is less than or equal to s.
For example, for {2, 4, 5} and k = 2 the solution is {2, 4} with value 2+4+4=10 since M(2) = 2, M(4) = 4, and M(5) = 5.
My intuition is that the solution is a dynamic programming algorithm, but I could be way off. Any help would be greatly appreciated!
Here is a dynamic program problem with a solution - I'm not sure if it's yours because I'm not sure of the details of what you have written, but it might be.
Sort the set of numbers and draw a curve with the x axis giving the offset of the number in sorted order and the y axis giving the number. There will be some area under the curve.
You have a finite number of points, usually a smaller number than there are members of the set. You can use each of these points to mark a point of the set, and so a point of the curve.
Draw a histogram under the curve. At each marked point there is a line from that point going right, so the lines are entirely under the curve. Each such line extends till it reaches the x value for the next marked point, at which point there is a line going up to the new marked point.
The challenge is then to select which points to mark to maximize the area under the horizontal lines going right from marked points. This is straightforward dynamic programming. If you can choose up to k marked points then at each point of the histogram work out the most area you can cover to the left of that point using 0, 1, 2, ..k marked points, possibly including that point. You can work out the answer for each point by referring to the answers you have already worked out for the points to its left. The answer for the rightmost point is the answer for the entire problem.
To expand this: Suppose you are working out the best solutions for maximum area ending at offset 10. For each value j of 0..k consider taking the previous best solution ending at 0, 1, 2, 3... 9 and maintaining the height at that point, without introducing a new line. The total area for this is the area up to that point plus the new area gained by whatever height they were in at that point times the distance back to that point. Also consider doing this, but using an extra marked point at that point, so the total area is the area of the best solution with j-1 points up to e.g. point 7 plus the distance back from point 10 to point 7 times the height reached at point 7. By considering these two possibilities you can work out the best solution at point 10 using 0,1,2,...k marked points.
I think these problems are related because for each point, marked or not, the area it contributes to the histogram depends on the height of the line above it, which is the height of the largest marked point no greater that the point we are considering at the moment.
To do this you need an array of kn elements giving the area covered by the best solution at each point using at most k marked points up to there. It will also be convenient to use an extra array of this size to record the decision that led to this best solution, so you can trace the answer back. This has a cost of about kn^2, because at each of n points you need to calculate k values, and look back at all the previous points as you do so. I suspect that you could reduce this to something like O(kn) by changing the definition of what you store at each point so you never have to look back further than one previous point. If you could do that, you could economize on store at the cost of time by only storing a few intermediate points and solving the problem over again on smaller sections to trace back, but you'ld need to be desperately short of store to make that worth while.
My answer is very similar than the other one:
The algorithm I suggest is to start having K=N, all numbers ordered, and keep removing numbers until you reach the desired K. The number you select to remove in each step, is the one who represent the lowest loss.
Example: Let's say you have the numbers:
3, 7, 9, 13 and 19
The problem is K=3
You start in K=5 (all numbers are selected).
3 + 7 + 9 + 13 + 19 = 51
First number to remove:
if 3 is selected:
0 + 7 + 9 + 13 + 19 = 48 (we lose 3)
if 4 is selected: (7 becomes 3)
3 + 3 + 9 + 13 + 19 = 47 (we lose 4)
if 9 is selected: (9 becomes 7)
3 + 7 + 7 + 13 + 19 = 49 (we lose 2)
if 13 is selected: we lose 13 - 9 = 4
if 19 is selected: we lose 19 - 13 = 6
Lowest loss in this case is: number 9 (loss=2).
We remove 9, and then we have K=4.
For the second number to remove, we have 4 options:
if we remove 3:
0 + 7 + 7 + 13 + 19 = # (we lose 3)
if we remove 7 all 7s will become 3s:
3 + 3 + 3 + 13 + 19 = # (we lose two 7s becoming 3 = (7-3) x 2 = 8)
if we remove 13:
3 + 7 + 7 + 7 + 19 (loss = 13 - 7 = 6)
if 19 is removed:
3 + 7 + 7 + 13 + 13 (loss = 6)
Best selection here is to remove #3
and then K=3 achieving the sum: 46
I don't know if this is optimal, you could verify by comparing vs. brute force multiple cases. But, even if this is not optimal, it can give good results.

Modified Tower of Hanoi

We all know that the minimum number of moves required to solve the classical towers of hanoi problem is 2n-1. Now, let us assume that some of the discs have same size. What would be the minimum number of moves to solve the problem in that case.
Example, let us assume that there are three discs. In the classical problem, the minimum number of moves required would be 7. Now, let us assume that the size of disc 2 and disc 3 is same. In that case, the minimum number of moves required would be:
Move disc 1 from a to b.
Move disc 2 from a to c.
Move disc 3 from a to c.
Move disc 1 from b to c.
which is 4 moves. Now, given the total number of discs n and the sets of discs which have same size, find the minimum number of moves to solve the problem. This is a challenge by a friend, so pointers towards solution are welcome. Thanks.
Let's consider a tower of size n. The top disk has to be moved 2n-1 times, the second disk 2n-2 times, and so on, until the bottom disk has to be moved just once, for a total of 2n-1 moves. Moving each disk takes exactly one turn.
1 moved 8 times
111 moved 4 times
11111 moved 2 times
1111111 moved 1 time => 8 + 4 + 2 + 1 == 15
Now if x disks have the same size, those have to be in consecutive layers, and you would always move them towards the same target stack, so you could just as well collapse those to just one disk, requiring x turns to be moved. You could consider those multi-disks to be x times as 'heavy', or 'thick', if you like.
1
111 1 moved 8 times
111 collapse 222 moved 4 times, taking 2 turns each
11111 -----------> 11111 moved 2 times
1111111 3333333 moved 1 time, taking 3 turns
1111111 => 8 + 4*2 + 2 + 1*3 == 21
1111111
Now just sum those up and you have your answer.
Here's some Python code, using the above example: Assuming you already have a list of the 'collapsed' disks, with disks[i] being the weight of the collapsed disk in the ith layer, you can just do this:
disks = [1, 2, 1, 3] # weight of collapsed disks, top to bottom
print sum(d * 2**i for i, d in enumerate(reversed(disks)))
If instead you have a list of the sizes of the disks, like on the left side, you could use this algorithm:
disks = [1, 3, 3, 5, 7, 7, 7] # size of disks, top to bottom
last, t, s = disks[-1], 1, 0
for d in reversed(disks):
if d < last: t, last = t*2, d
s = s + t
print s
Output, in both cases, is 21, the required number of turns.
It completely depends on the distribution of the discs that are the same size. If you have n=7 discs and they are all the same size then the answer is 7 (or n). And, of course the standard problem is answered by 2n-1.
As tobias_k suggested, you can group same size discs. So now look at the problem as moving groups of discs. To move a certain number of groups, you have to know the size of each group
examples
1
n=7 //disc sizes (1,2,3,3,4,5,5)
g=5 //group sizes (1,1,2,1,2)
//group index (1,2,3,4,5)
number of moves = sum( g-size * 2^( g-count - g-index ) )
in this case
moves = 1*2^4 + 1*2^3 + 2*2^2 + 1*2^1 + 2*2^0
= 16 + 8 + 8 + 2 + 2
= 36
2
n=7 //disc sizes (1,1,1,1,1,1,1)
g=1 //group sizes (7)
//group index (1)
number of moves = sum( g-size * 2^( g-count - g-index ) )
in this case
moves = 7*2^0
= 7
3
n=7 //disc sizes (1,2,3,4,5,6,7)
g=7 //group sizes (1,1,1,1,1,1,1)
//group index (1,2,3,4,5,6,7)
number of moves = sum( g-size * 2^( g-count - g-index ) )
in this case
moves = 1*2^6 + 1*2^5 + 1*2^4 + 1*2^3 + 1*2^2 + 1*2^1 + 1*2^0
= 64 + 32 + 16 + 8 + 4 + 2 + 1
= 127
Interesting note about the last example, and the standard hanoi problem: sum(2n-1) = 2n - 1
I wrote a Github gist in C for this problem. I am attaching a link to it, may be useful to somebody, I hope.
Modified tower of Hanoi problem with one or more disks of the same size
There are n types of disks. For each type, all disks are identical. In array arr, I am taking the number of disks of each type. A, B and C are pegs or towers.
Method swap(int, int), partition(int, int) and qSort(int, int) are part of my implementation of the quicksort algorithm.
Method toh(char, char, char, int, int) is the Tower of Hanoi solution.
How it is working: Imagine we compress all the disks of the same size into one disk. Now we have a problem which has a general solution to the Tower of Hanoi. Now each time a disk moves, we add the total movement which is equal to the total number of that type of disk.

minimum steps required to make array of integers contiguous

given a sorted array of distinct integers, what is the minimum number of steps required to make the integers contiguous? Here the condition is that: in a step , only one element can be changed and can be either increased or decreased by 1 . For example, if we have 2,4,5,6 then '2' can be made '3' thus making the elements contiguous(3,4,5,6) .Hence the minimum steps here is 1 . Similarly for the array: 2,4,5,8:
Step 1: '2' can be made '3'
Step 2: '8' can be made '7'
Step 3: '7' can be made '6'
Thus the sequence now is 3,4,5,6 and the number of steps is 3.
I tried as follows but am not sure if its correct?
//n is the number of elements in array a
int count=a[n-1]-a[0]-1;
for(i=1;i<=n-2;i++)
{
count--;
}
printf("%d\n",count);
Thanks.
The intuitive guess is that the "center" of the optimal sequence will be the arithmetic average, but this is not the case. Let's find the correct solution with some vector math:
Part 1: Assuming the first number is to be left alone (we'll deal with this assumption later), calculate the differences, so 1 12 3 14 5 16-1 2 3 4 5 6 would yield 0 -10 0 -10 0 -10.
sidenote: Notice that a "contiguous" array by your implied definition would be an increasing arithmetic sequence with difference 1. (Note that there are other reasonable interpretations of your question: some people may consider 5 4 3 2 1 to be contiguous, or 5 3 1 to be contiguous, or 1 2 3 2 3 to be contiguous. You also did not specify if negative numbers should be treated any differently.)
theorem: The contiguous numbers must lie between the minimum and maximum number. [proof left to reader]
Part 2: Now returning to our example, assuming we took the 30 steps (sum(abs(0 -10 0 -10 0 -10))=30) required to turn 1 12 3 14 5 16 into 1 2 3 4 5 6. This is one correct answer. But 0 -10 0 -10 0 -10+c is also an answer which yields an arithmetic sequence of difference 1, for any constant c. In order to minimize the number of "steps", we must pick an appropriate c. In this case, each time we increase or decrease c, we increase the number of steps by N=6 (the length of the vector). So for example if we wanted to turn our original sequence 1 12 3 14 5 16 into 3 4 5 6 7 8 (c=2), then the differences would have been 2 -8 2 -8 2 -8, and sum(abs(2 -8 2 -8 2 -8))=30.
Now this is very clear if you could picture it visually, but it's sort of hard to type out in text. First we took our difference vector. Imagine you drew it like so:
4|
3| *
2| * |
1| | | *
0+--+--+--+--+--*
-1| |
-2| *
We are free to "shift" this vector up and down by adding or subtracting 1 from everything. (This is equivalent to finding c.) We wish to find the shift which minimizes the number of | you see (the area between the curve and the x-axis). This is NOT the average (that would be minimizing the standard deviation or RMS error, not the absolute error). To find the minimizing c, let's think of this as a function and consider its derivative. If the differences are all far away from the x-axis (we're trying to make 101 112 103 114 105 116), it makes sense to just not add this extra stuff, so we shift the function down towards the x-axis. Each time we decrease c, we improve the solution by 6. Now suppose that one of the *s passes the x axis. Each time we decrease c, we improve the solution by 5-1=4 (we save 5 steps of work, but have to do 1 extra step of work for the * below the x-axis). Eventually when HALF the *s are past the x-axis, we can NO LONGER IMPROVE THE SOLUTION (derivative: 3-3=0). (In fact soon we begin to make the solution worse, and can never make it better again. Not only have we found the minimum of this function, but we can see it is a global minimum.)
Thus the solution is as follows: Pretend the first number is in place. Calculate the vector of differences. Minimize the sum of the absolute value of this vector; do this by finding the median OF THE DIFFERENCES and subtracting that off from the differences to obtain an improved differences-vector. The sum of the absolute value of the "improved" vector is your answer. This is O(N) The solutions of equal optimality will (as per the above) always be "adjacent". A unique solution exists only if there are an odd number of numbers; otherwise if there are an even number of numbers, AND the median-of-differences is not an integer, the equally-optimal solutions will have difference-vectors with corrective factors of any number between the two medians.
So I guess this wouldn't be complete without a final example.
input: 2 3 4 10 14 14 15 100
difference vector: 2 3 4 5 6 7 8 9-2 3 4 10 14 14 15 100 = 0 0 0 -5 -8 -7 -7 -91
note that the medians of the difference-vector are not in the middle anymore, we need to perform an O(N) median-finding algorithm to extract them...
medians of difference-vector are -5 and -7
let us take -5 to be our correction factor (any number between the medians, such as -6 or -7, would also be a valid choice)
thus our new goal is 2 3 4 5 6 7 8 9+5=7 8 9 10 11 12 13 14, and the new differences are 5 5 5 0 -3 -2 -2 -86*
this means we will need to do 5+5+5+0+3+2+2+86=108 steps
*(we obtain this by repeating step 2 with our new target, or by adding 5 to each number of the previous difference... but since you only care about the sum, we'd just add 8*5 (vector length times correct factor) to the previously calculated sum)
Alternatively, we could have also taken -6 or -7 to be our correction factor. Let's say we took -7...
then the new goal would have been 2 3 4 5 6 7 8 9+7=9 10 11 12 13 14 15 16, and the new differences would have been 7 7 7 2 1 0 0 -84
this would have meant we'd need to do 7+7+7+2+1+0+0+84=108 steps, the same as above
If you simulate this yourself, can see the number of steps becomes >108 as we take offsets further away from the range [-5,-7].
Pseudocode:
def minSteps(array A of size N):
A' = [0,1,...,N-1]
diffs = A'-A
medianOfDiffs = leftMedian(diffs)
return sum(abs(diffs-medianOfDiffs))
Python:
leftMedian = lambda x:sorted(x)[len(x)//2]
def minSteps(array):
target = range(len(array))
diffs = [t-a for t,a in zip(target,array)]
medianOfDiffs = leftMedian(diffs)
return sum(abs(d-medianOfDiffs) for d in diffs)
edit:
It turns out that for arrays of distinct integers, this is equivalent to a simpler solution: picking one of the (up to 2) medians, assuming it doesn't move, and moving other numbers accordingly. This simpler method often gives incorrect answers if you have any duplicates, but the OP didn't ask that, so that would be a simpler and more elegant solution. Additionally we can use the proof I've given in this solution to justify the "assume the median doesn't move" solution as follows: the corrective factor will always be in the center of the array (i.e. the median of the differences will be from the median of the numbers). Thus any restriction which also guarantees this can be used to create variations of this brainteaser.
Get one of the medians of all the numbers. As the numbers are already sorted, this shouldn't be a big deal. Assume that median does not move. Then compute the total cost of moving all the numbers accordingly. This should give the answer.
community edit:
def minSteps(a):
"""INPUT: list of sorted unique integers"""
oneMedian = a[floor(n/2)]
aTarget = [oneMedian + (i-floor(n/2)) for i in range(len(a))]
# aTargets looks roughly like [m-n/2?, ..., m-1, m, m+1, ..., m+n/2]
return sum(abs(aTarget[i]-a[i]) for i in range(len(a)))
This is probably not an ideal solution, but a first idea.
Given a sorted sequence [x1, x2, …, xn]:
Write a function that returns the differences of an element to the previous and to the next element, i.e. (xn – xn–1, xn+1 – xn).
If the difference to the previous element is > 1, you would have to increase all previous elements by xn – xn–1 – 1. That is, the number of necessary steps would increase by the number of previous elements × (xn – xn–1 – 1). Let's call this number a.
If the difference to the next element is >1, you would have to decrease all subsequent elements by xn+1 – xn – 1. That is, the number of necessary steps would increase by the number of subsequent elements × (xn+1 – xn – 1). Let's call this number b.
If a < b, then increase all previous elements until they are contiguous to the current element. If a > b, then decrease all subsequent elements until they are contiguous to the current element. If a = b, it doesn't matter which of these two actions is chosen.
Add up the number of steps taken in the previous step (by increasing the total number of necessary steps by either a or b), and repeat until all elements are contiguous.
First of all, imagine that we pick an arbitrary target of contiguous increasing values and then calculate the cost (number of steps required) for modifying the array the array to match.
Original: 3 5 7 8 10 16
Target: 4 5 6 7 8 9
Difference: +1 0 -1 -1 -2 -7 -> Cost = 12
Sign: + 0 - - - -
Because the input array is already ordered and distinct, it is strictly increasing. Because of this, it can be shown that the differences will always be non-increasing.
If we change the target by increasing it by 1, the cost will change. Each position in which the difference is currently positive or zero will incur an increase in cost by 1. Each position in which the difference is currently negative will yield a decrease in cost by 1:
Original: 3 5 7 8 10 16
New target: 5 6 7 8 9 10
New Difference: +2 +1 0 0 -1 -6 -> Cost = 10 (decrease by 2)
Conversely, if we decrease the target by 1, each position in which the difference is currently positive will yield a decrease in cost by 1, while each position in which the difference is zero or negative will incur an increase in cost by 1:
Original: 3 5 7 8 10 16
New target: 3 4 5 6 7 8
New Difference: 0 -1 -2 -2 -3 -8 -> Cost = 16 (increase by 4)
In order to find the optimal values for the target array, we must find a target such that any change (increment or decrement) will not decrease the cost. Note that an increment of the target can only decrease the cost when there are more positions with negative difference than there are with zero or positive difference. A decrement can only decrease the cost when there are more positions with a positive difference than with a zero or negative difference.
Here are some example distributions of difference signs. Remember that the differences array is non-increasing, so positives always have to be first and negatives last:
C C
+ + + - - - optimal
+ + 0 - - - optimal
0 0 0 - - - optimal
+ 0 - - - - can increment (negatives exceed positives & zeroes)
+ + + 0 0 0 optimal
+ + + + - - can decrement (positives exceed negatives & zeroes)
+ + 0 0 - - optimal
+ 0 0 0 0 0 optimal
C C
Observe that if one of the central elements (marked C) is zero, the target must be optimal. In such a circumstance, at best any increment or decrement will not change the cost, but it may increase it. This result is important, because it gives us a trivial solution. We pick a target such that a[n/2] remains unchanged. There may be other possible targets that yield the same cost, but there are definitely none that are better. Here's the original code modified to calculate this cost:
//n is the number of elements in array a
int targetValue;
int cost = 0;
int middle = n / 2;
int startValue = a[middle] - middle;
for (i = 0; i < n; i++)
{
targetValue = startValue + i;
cost += abs(targetValue - a[i]);
}
printf("%d\n",cost);
You can not do it by iterating once on the array, that's for sure.
You need first to check the difference between each two numbers, for example:
2,7,8,9 can be 2,3,4,5 with 18 steps or 6,7,8,9 with 4 steps.
Create a new array with the difference like so: for 2,7,8,9 it wiil be 4,1,1. Now you can decide whether to increase or decrease the first number.
Lets assume that the contiguous array looks something like this -
c c+1 c+2 c+3 .. and so on
Now lets take an example -
5 7 8 10
The contiguous array in this case will be -
c c+1 c+2 c+3
In order to get the minimum steps, the sum of the modulus of the difference of the integers(before and after) w.r.t the ith index should be the minimum. In which case,
(c-5)^2 + (c-6)^2 + (c-6)^2 + (c-7)^2 should be minimum
Let f(c) = (c-5)^2 + (c-6)^2 + (c-6)^2 + (c-7)^2
= 4c^2 - 48c + 146
Applying differential calculus to get the minima,
f'(c) = 8c - 48 = 0
=> c = 6
So our contiguous array is 6 7 8 9 and the minimum cost here is 2.
To sum it up, just generate f(c), get the first differential and find out c.
This should take O(n).
Brute force approach O(N*M)
If one draws a line through each point in the array a then y0 is a value where each line starts at index 0. Then the answer is the minimum among number of steps reqired to get from a to every line that starts at y0, in Python:
y0s = set((y - i) for i, y in enumerate(a))
nsteps = min(sum(abs(y-(y0+i)) for i, y in enumerate(a))
for y0 in xrange(min(y0s), max(y0s)+1)))
Input
2,4,5,6
2,4,5,8
Output
1
3

Resources