Calculating mid in binary search - algorithm

I was reading an algorithms book which had the following algorithm for binary search:
public class BinSearch {
    static int search(int[] A, int K) {
        int l = 0;
        int u = A.length - 1;
        int m;
        while (l <= u) {
            m = (l + u) / 2;
            if (A[m] < K) {
                l = m + 1;
            } else if (A[m] == K) {
                return m;
            } else {
                u = m - 1;
            }
        }
        return -1;
    }
}
The author says "The error is in the assignment m = (l+u)/2; it can lead to overflow and should be replaced by m = l + (u-l)/2."
I can't see how that would cause an overflow. When I run the algorithm in my mind for a few different inputs, I don't see mid's value going outside the array bounds.
So, in which cases would the overflow occur?

This post covers this famous bug in a lot of detail. As others have said, it's an overflow issue. The fix recommended in the link is as follows:
int mid = low + ((high - low) / 2);
// Alternatively
int mid = (low + high) >>> 1;
It is also probably worth mentioning that in case negative indices are allowed, or perhaps it's not even an array that's being searched (for example, searching for a value in some integer range satisfying some condition), the code above may not be correct either. In that case, something as ugly as
(low < 0 && high > 0) ? (low + high) / 2 : low + (high - low) / 2
may be necessary. One good example is searching for the median in an unsorted array without modifying it or using additional space by simply performing a binary search on the whole Integer.MIN_VALUE–Integer.MAX_VALUE range.
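For instance, a minimal sketch of that median idea (my code, not from the linked post): binary search on the value range, counting how many elements are at most mid. Holding the bounds in long sidesteps the midpoint overflow entirely; with int bounds you would need the guarded midpoint above:

// Hypothetical sketch: median of an odd-length unsorted array, O(1) extra space,
// by binary searching the whole Integer.MIN_VALUE..Integer.MAX_VALUE range.
static int median(int[] a) {
    long low = Integer.MIN_VALUE, high = Integer.MAX_VALUE;
    while (low < high) {
        long mid = low + (high - low) / 2;   // bounds are long, so no overflow here
        long atMost = 0;
        for (int v : a) if (v <= mid) atMost++;      // count elements <= mid
        if (atMost >= a.length / 2 + 1) high = mid;  // median is <= mid
        else low = mid + 1;
    }
    return (int) low;  // smallest value v with count(<= v) >= (n+1)/2
}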

The following C++ program can show you how an overflow can happen with a 32-bit unsigned integer:
#include <iostream>
using namespace std;

int main()
{
    unsigned int low = 33,
                 high = 4294967290,
                 mid;

    cout << "The value of low is " << low << endl;
    cout << "The value of high is " << high << endl;
    mid = (low + high) / 2;
    cout << "The value of mid is " << mid << endl;
    return 0;
}
If you run it on a Mac:
$ g++ try.cpp && ./a.out
The value of low is 33
The value of high is 4294967290
The value of mid is 13
The value of mid might be expected to be 2147483661, but low + high overflowed: a 32-bit unsigned integer cannot hold the proper value of 4294967323, so it wraps around to 27, and mid becomes 13.
When the calculation of mid is changed to
mid = low + (high - low) / 2;
Then it will show
The value of mid is 2147483661
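The same demonstration in Java, to match the question's code (the values are arbitrary, chosen so the sum exceeds Integer.MAX_VALUE):

public class MidOverflow {
    public static void main(String[] args) {
        int l = 2_000_000_000, u = 2_100_000_000;
        System.out.println((l + u) / 2);      // -97483648: the sum wrapped negative
        System.out.println(l + (u - l) / 2);  // 2050000000: correct
        System.out.println((l + u) >>> 1);    // 2050000000: unsigned shift also fixes it
    }
}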
The simple answer is: the addition l + u can overflow, and this has undefined behavior in some languages, as described in a blog post by Joshua Bloch about a bug in the Java library's implementation of binary search.
Some readers may not understand what it is about:
l + (u - l) / 2
Note that in some code, the variable names are different, and it is
low + (high - low) / 2
The answer is: let's say you have two numbers, 200 and 210, and you want the "middle number". And let's say that if you add any two numbers and the result is greater than 255, it can overflow and the behavior is undefined. What can you do? A simple way is to add just half of the difference between them to the smaller value. The difference between 200 and 210 is 10 (you can consider it the "distance" or "length" between them), so you just need to add 10 / 2 = 5 to 200 and get 205. You don't need to add 200 and 210 together first -- and that's how we reach the calculation: (u - l) is the difference, (u - l) / 2 is half of it, and adding that to l gives l + (u - l) / 2.
It is like, if we are looking at two trees, one is 200 feet tall and one is 210 feet tall, what is the "midpoint" or the "mean"? We don't have to add them together first. We can just tell the difference is 10 feet, and we can add half of that, which is 5, to 200, and we know it is 205 feet.
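To make that concrete in code, here is a small sketch; the & 0xFF masking is just my way of emulating an 8-bit unsigned machine in Java:

public class EightBitMid {
    public static void main(String[] args) {
        int low = 200, high = 210;
        int bad  = ((low + high) & 0xFF) / 2;  // 410 wraps to 154 in 8 bits, so "mid" = 77
        int good = low + (high - low) / 2;     // 200 + 10/2 = 205; no big sum is ever formed
        System.out.println(bad + " vs " + good);
    }
}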
To put this into historical perspective: Robert Sedgewick mentioned that the first binary search was stated in 1946, and it wasn't correct until 1964. Jon Bentley described in his 1988 book Programming Pearls that more than 90% of professional programmers could not write it correctly given a couple of hours. Even Jon Bentley himself had that overflow bug for 20 years. A study published in 1988 showed that accurate code for binary search was found in only 5 out of 20 textbooks. In 2006, Joshua Bloch wrote the blog post about the bug in calculating the mid value. So it took 60 years for this code to be correct. But next time, in the job interview, remember to write it correctly within those 5 minutes.

The problem is that (l+u) is evaluated first, and could overflow int, so (l+u)/2 would return the wrong value.

Jeff suggested a really good post to read about this bug; here is a summary if you want a quick overview.
In Programming Pearls Bentley says that the analogous line "sets m to the average of l and u, truncated down to the nearest integer." On the face of it, this assertion might appear correct, but it fails for large values of the int variables low and high. Specifically, it fails if the sum of low and high is greater than the maximum positive int value (2^31 - 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results. In Java, it throws ArrayIndexOutOfBoundsException.

Here is an example. Suppose you had a very big array of size 2,000,000,010 (2*10^9 + 10), the left index was at 2,000,000,000, and the right index was at 2,000,000,001.
Using lo + hi sums to 2,000,000,000 + 2,000,000,001 = 4,000,000,001. Since the max value of an integer is 2,147,483,647, you won't get 4,000,000,001; you will get an integer overflow.
But low + ((high - low) / 2) will work: 2,000,000,000 + ((2,000,000,001 - 2,000,000,000) / 2) = 2,000,000,000.

The potential overflow is in the l+u addition itself.
This was actually a bug in early versions of binary search in the JDK.

Actually, the following statement for calculating mid can overflow the int range:
mid = (start + end) / 2
Suppose the given sorted input list is very large and start + end surpasses the int range (-2^31 to 2^31 - 1). The sum can wrap around to a wrong (negative) value. To counter this, the following statement is written:
mid = start + (end - start) / 2
Mathematically it is the same expression, but the overflow is averted by this trick.

Other answers give practical examples of why the l + (r-l)/2 calculation is necessary; in case you are curious how the two are equivalent mathematically, here is the proof. The key is adding 0 and then splitting that into l/2 - l/2. (The steps below use exact arithmetic; with Java's truncating integer division the two expressions still agree for nonnegative l <= r, since r - l is then nonnegative.)
(l+r)/2 =
l/2 + r/2 =
l/2 + r/2 + 0 =
l/2 + r/2 + (l/2 - l/2) =
(l/2 + l/2) + (r/2 - l/2) =
l + (r-l)/2

int mid = (l + h) / 2; can lead to an integer overflow problem.
(l + h) can evaluate to a large negative integer value, and half of it
is returned. If we then use mid to index an array, it leads to an
"index out of range" error.
However, the issue is resolved as:
int mid = l + (h - l) / 2;
Bit manipulation, for faster computation: int mid = ((unsigned int)l + (unsigned int)h) >> 1;
where >> is the right shift operator.
Hope this helps :)

To avoid overflow, you can also do this:
int midIndex = (int) (startIndex / 2.0 + endIndex / 2.0);
You divide both indices by 2.0
-> You get two doubles that are each at most Integer.MAX_VALUE / 2, so their sum is at most Integer.MAX_VALUE and still a double. The same holds for Integer.MIN_VALUE. Finally, you convert the sum to an int and have prevented the overflow ;)
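A quick check of this trick at the extreme end of the int range (my example values; doubles represent integers of this size exactly, so both halves are exact):

public class DoubleMid {
    public static void main(String[] args) {
        int startIndex = Integer.MAX_VALUE - 1, endIndex = Integer.MAX_VALUE;
        int viaDoubles = (int) (startIndex / 2.0 + endIndex / 2.0);
        int viaSafe = startIndex + (endIndex - startIndex) / 2;
        System.out.println(viaDoubles + " " + viaSafe);  // both print 2147483646
    }
}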

It is because if we compute mid = (low + high) / 2 and both low and high are large, their addition can go out of the range of int.
And why not mid = low/2 + high/2? Because it is integer division: if low = 5 and high = 11, then low/2 + high/2 => 5/2 + 11/2 => 2 + 5 => 7, while the correct mid is (5 + 11)/2 = 8.
So it can lead to a wrong answer.
That is why it is taken as mid = low + (high - low)/2;

I have created a video with an example where number overflow happens:
https://youtu.be/fMgenZq7qls
Usually, for a simple binary search where you need to find an element in an array, this won't happen, due to array size limitations in languages like Java; but where the problem space is not limited to an array, this problem can occur. Please see the video for a practical example.

It is a very subtle error and easy to miss the first time. Most articles on the internet don't seem to clearly explain how this error occurs and how the optimized formula prevents overflow.
After a lot of digging I found this article, which has an excellent and detailed explanation of how the error occurs when the mid = (left + right)/2 formula is used, and how it is overcome using mid = low + ((high - low) / 2). Most importantly, they explain it with an example, which makes it so much easier to understand.
It also explains why mid = low + ((high - low) / 2) doesn't cause an overflow.

I saw a method using bit operations:
int mid = (l & r) + ((l ^ r) >> 1);
Just for fun..
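Why it works: l & r keeps the bits the two numbers share, and (l ^ r) >> 1 adds half of the bits where they differ, so the result is floor((l + r) / 2) without ever forming l + r. A quick self-check against a long-based midpoint (my test, nonnegative values only):

public class BitMidCheck {
    public static void main(String[] args) {
        java.util.Random rnd = new java.util.Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            int l = rnd.nextInt(Integer.MAX_VALUE);
            int r = rnd.nextInt(Integer.MAX_VALUE);
            int a = (l & r) + ((l ^ r) >> 1);
            int b = (int) (((long) l + r) / 2);  // exact midpoint via long
            if (a != b) throw new AssertionError(l + ", " + r);
        }
        System.out.println("ok");
    }
}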


Algorithm for separating integer into a sum of products of single digit numbers?

A couple of days ago I played around with Befunge, which is an esoteric programming language. Befunge uses a LIFO stack to store data. When you write programs, the digits from 0 to 9 are actually Befunge instructions which push the corresponding values onto the stack. So for example this would push a 7 to the stack:
34+
In order to push a number greater than 9, calculations must be done with numbers less than or equal to 9. This would yield 123.
99*76*+
While solving Euler Problem 1 with Befunge I had to push the fairly large number 999 to the stack. Here I began to wonder how I could accomplish this task with as few instructions as possible. By writing a term down in infix notation and taking out common factors I came up with
9993+*3+*
One could also simply multiply two two-digit numbers which produce 999, e.g.
39*66*1+*
I thought about this for while and then decided to write a program which puts out the smallest expression according to these rules in reverse polish notation for any given integer. This is what I have so far (written in NodeJS with underscorejs):
var makeExpr = function (value) {
    if (value < 10) return value + "";
    var output = "", counter = 0;
    (function fn (val) {
        counter++;
        if (val < 9) { output += val; return; }
        var exp = Math.floor(Math.log(val) / Math.log(9));
        var div = Math.floor(val / Math.pow(9, exp));
        _(exp).times(function () { output += "9"; });
        _(exp - 1).times(function () { output += "*"; });
        if (div > 1) output += div + "*";
        fn(val - Math.pow(9, exp) * div);
    })(value);
    _(counter - 1).times(function () { output += "+"; });
    return output.replace(/0\+/, "");
};

makeExpr(999);
// yields 999**99*3*93*++
This piece of code constructs the expression naively and is obviously way too long. Now my questions:
Is there an algorithm to simplify expressions in reverse polish notation?
Would simplification be easier in infix notation?
Can an expression like 9993+*3+* be proved to be the smallest one possible?
I hope you can give some insights. Thanks in advance.
When only considering multiplication and addition, it's pretty easy to construct optimal formulas, because that problem has the optimal substructure property. That is, the optimal way to build [num1][num2]op is from num1 and num2 that are both also optimal. If duplication is also considered, that's no longer true.
The num1 and num2 give rise to overlapping subproblems, so Dynamic Programming is applicable.
We can simply, for a number i:
For every 1 < j <= sqrt(i) that evenly divides i, try [j][i / j]*
For every 0 < j < i/2, try [j][i - j]+
Take the best found formula
That is of course very easy to do bottom-up, just start at i = 0 and work your way up to whatever number you want. Step 2 is a little slow, unfortunately, so after say 100000 it starts to get annoying to wait for it. There might be some trick that I'm not seeing.
Code in C# (not tested super well, but it seems to work):
string[] n = new string[10000];
for (int i = 0; i < 10; i++)
    n[i] = "" + i;
for (int i = 10; i < n.Length; i++)
{
    int bestlen = int.MaxValue;
    string best = null;
    // try factors
    int sqrt = (int)Math.Sqrt(i);
    for (int j = 2; j <= sqrt; j++)
    {
        if (i % j == 0)
        {
            int len = n[j].Length + n[i / j].Length + 1;
            if (len < bestlen)
            {
                bestlen = len;
                best = n[j] + n[i / j] + "*";
            }
        }
    }
    // try sums
    for (int j = 1; j < i / 2; j++)
    {
        int len = n[j].Length + n[i - j].Length + 1;
        if (len < bestlen)
        {
            bestlen = len;
            best = n[j] + n[i - j] + "+";
        }
    }
    n[i] = best;
}
Here's a trick to optimize searching for the sums. Suppose there is an array that contains, for every length, the highest number that can be made with that length. Another thing, perhaps less obvious, that this array gives us is a quick way to determine the shortest number that is bigger than some threshold (by simply scanning through the array and noting the first position that crosses the threshold). Together, that gives a quick way to discard huge portions of the search space.
For example, the biggest number of length 3 is 81 and the biggest number of length 5 is 728. Now if we want to know how to get 1009 (prime, so no factors found), first we try the sums where the first part has length 1 (so 1+1008 through 9+1000), finding 9+1000 which is 9 characters long (95558***+).
The next step, checking the sums where the first part has length 3 or less, can be skipped completely. 1009 - 81 = 928, and 928 (the lowest that the second part of the sum can be if the first part is to be 3 characters or less) is bigger than 728, so numbers of 928 and over must be at least 7 characters long. So if the first part of the sum is 3 characters, the second part must be at least 7 characters, and then there's also a + sign on the end, so the total is at least 11 characters. The best so far was 9, so this step can be skipped.
The next step, with 5 characters in the first part, can also be skipped, because 1009 - 728 = 281, and to make 281 or higher we need at least 5 characters. 5 + 5 + 1 = 11, bigger than 9, so don't check.
Instead of checking about 500 sums, we only had to check 9 this way, and the check to make the skipping possible is very quick. This trick is good enough that generating all numbers up to a million only takes 3 seconds on my PC (before, it would take 3 seconds to get to 100000).
Here's the code:
string[] n = new string[100000];
int[] biggest_number_of_length = new int[n.Length];
for (int i = 0; i < 10; i++)
    n[i] = "" + i;
biggest_number_of_length[1] = 9;
for (int i = 10; i < n.Length; i++)
{
    int bestlen = int.MaxValue;
    string best = null;
    // try factors
    int sqrt = (int)Math.Sqrt(i);
    for (int j = 2; j <= sqrt; j++)
    {
        if (i % j == 0)
        {
            int len = n[j].Length + n[i / j].Length + 1;
            if (len < bestlen)
            {
                bestlen = len;
                best = n[j] + n[i / j] + "*";
            }
        }
    }
    // try sums
    for (int x = 1; x < bestlen; x += 2)
    {
        int find = i - biggest_number_of_length[x];
        int min = int.MaxValue;
        // find the shortest number that is >= (i - biggest_number_of_length[x])
        for (int k = 1; k < biggest_number_of_length.Length; k += 2)
        {
            if (biggest_number_of_length[k] >= find)
            {
                min = k;
                break;
            }
        }
        // if that number wasn't small enough, it's not worth looking in that range
        if (min + x + 1 < bestlen)
        {
            // range [find .. i] isn't optimal
            for (int j = find; j < i; j++)
            {
                int len = n[i - j].Length + n[j].Length + 1;
                if (len < bestlen)
                {
                    bestlen = len;
                    best = n[i - j] + n[j] + "+";
                }
            }
        }
    }
    // found
    n[i] = best;
    biggest_number_of_length[bestlen] = i;
}
There's still room for improvement. This code will re-check sums that it has already checked. There are simple ways to make it at least not check the same sum twice (by remembering the last find), but that made no significant difference in my tests. It should be possible to find a better upper bound.
There's also 93*94*1+*, which is basically 27*37.
Were I to attack this problem, I'd start by first trying to evenly divide the number. So given 999 I would divide by 9 and get 111. Then I'd try to divide by 9, 8, 7, etc. until I discovered that 111 is 3*37.
37 is prime, so I go greedy and divide by 9, giving me 4 with a remainder of 1.
That seems to give me optimum results for the half dozen I've tried. It's a little expensive, of course, testing for even divisibility. But perhaps not more expensive than generating a too-long expression.
Using this, 100 becomes 55*4*. 102 works out to 29*5*6+.
101 brings up an interesting case. 101 = (9*11) + 2. Or, alternately, 101 = (9*9) + 20. Let's see:
983+*2+ (9*11) + 2
99*45*+ (9*9) + 20
Whether it's easier to generate the postfix directly or generate infix and convert, I really don't know. I can see benefits and drawbacks to each.
Anyway, that's the approach I'd take: try to divide evenly at first, and then be greedy dividing by 9. Not sure exactly how I'd structure it.
I'd sure like to see your solution once you figure it out.
Edit
This is an interesting problem. I came up with a recursive function that does a credible job of generating postfix expressions, but it's not optimum. Here it is in C#.
string GetExpression(int val)
{
    if (val < 10)
    {
        return val.ToString();
    }
    int quo, rem;
    // first see if it's evenly divisible
    for (int i = 9; i > 1; --i)
    {
        quo = Math.DivRem(val, i, out rem);
        if (rem == 0)
        {
            // If val < 90, then only generate here if the quotient
            // is a one-digit number. Otherwise it can be expressed
            // as (9 * x) + y, where x and y are one-digit numbers.
            if (val >= 90 || (val < 90 && quo <= 9))
            {
                // value is (i * quo)
                return i + GetExpression(quo) + "*";
            }
        }
    }
    quo = Math.DivRem(val, 9, out rem);
    // value is (9 * quo) + rem
    // optimization reduces (9 * 1) to 9
    var s1 = "9" + ((quo == 1) ? string.Empty : GetExpression(quo) + "*");
    var s2 = GetExpression(rem) + "+";
    return s1 + s2;
}
For 999 it generates 9394*1+**, which I believe is optimum.
This generates optimum expressions for values <= 90. Every number from 0 to 90 can be expressed as the product of two one-digit numbers, or by an expression of the form (9x + y), where x and y are one-digit numbers. However, I don't know that this guarantees an optimum expression for values greater than 90.
There are 44 solutions for 999 with length 9:
39149*+**
39166*+**
39257*+**
39548*+**
39756*+**
39947*+**
39499**+*
39669**+*
39949**+*
39966**+*
93149*+**
93166*+**
93257*+**
93548*+**
93756*+**
93947*+**
93269**+*
93349**+*
93366**+*
93439**+*
93629**+*
93636**+*
93926**+*
93934**+*
93939+*+*
93948+*+*
93957+*+*
96357**+*
96537**+*
96735**+*
96769+*+*
96778+*+*
97849+*+*
97858+*+*
97867+*+*
99689+*+*
956*99*+*
968*79*+*
39*149*+*
39*166*+*
39*257*+*
39*548*+*
39*756*+*
39*947*+*
Edit:
I have been working on some search-space pruning improvements, which is why I did not post immediately. Here is a script in Erlang. The original version takes 14 s for 999, but this one does it in around 190 ms.
Edit2:
There are 1074 solutions of length 13 for 9999. It takes 7 minutes, and some of them are shown below:
329+9677**+**
329+9767**+**
338+9677**+**
338+9767**+**
347+9677**+**
347+9767**+**
356+9677**+**
356+9767**+**
3147789+***+*
31489+77***+*
3174789+***+*
3177489+***+*
3177488*+**+*
There is a version in C with more aggressive pruning of the state space that returns only one solution. It is way faster.
$ time ./polish_numbers 999
Result for 999: 39149*+**, length 9
real 0m0.008s
user 0m0.004s
sys 0m0.000s
$ time ./polish_numbers 99999
Result for 99999: 9158*+1569**+**, length 15
real 0m34.289s
user 0m34.296s
sys 0m0.000s
harold reported that his C# brute-force version makes the same number in 20 s, so I was curious whether I could improve mine. I tried better memory utilization by refactoring the data structure. The search algorithm mostly works with the length of a solution and its existence, so I separated this information into one structure (best_rec_header). I also keep the solution's tree branches separated in another (best_rec_args). That data is used only when a new, better solution is found for a given number. Here is the code.
Result for 99999: 9158*+1569**+**, length 15
real 0m31.824s
user 0m31.812s
sys 0m0.012s
It was still too slow, so I tried some other versions. First I added some statistics, to demonstrate that my code does not compute all of the smaller numbers.
Result for 99999: 9158*+1569**+**, length 15, (skipped 36777, computed 26350)
Then I tried changing the code to compute the + solutions for bigger numbers first.
Result for 99999: 1956**+9158*+**, length 15, (skipped 0, computed 34577)
real 0m17.055s
user 0m17.052s
sys 0m0.008s
It was almost twice as fast. But there was another idea: maybe I sometimes give up finding a solution for some number because of the current best_len limit. So I tried making small numbers (up to half of n) unlimited (note the 255 used as the best_len limit when finding the first operand).
Result for 99999: 9158*+1569**+**, length 15, (skipped 36777, computed 50000)
real 0m12.058s
user 0m12.048s
sys 0m0.008s
A nice improvement, but what if I limit the solutions for those numbers by the best solution found so far? It needs some sort of global computation state. The code becomes more complicated, but the result is even faster.
Result for 99999: 97484777**+**+*, length 15, (skipped 36997, computed 33911)
real 0m10.401s
user 0m10.400s
sys 0m0.000s
It was even able to compute a ten times bigger number.
Result for 999999: 37967+2599**+****, length 17, (skipped 440855)
real 12m55.085s
user 12m55.168s
sys 0m0.028s
Then I decided to also try the brute force method, and it was even faster.
Result for 99999: 9158*+1569**+**, length 15
real 0m3.543s
user 0m3.540s
sys 0m0.000s
Result for 999999: 37949+2599**+****, length 17
real 5m51.624s
user 5m51.556s
sys 0m0.068s
Which shows that constants matter. It is especially true for modern CPUs, where the brute force approach gains an advantage from better vectorization, better CPU cache utilization, and less branching.
Anyway, I think there is a better approach using a deeper understanding of number theory, or searching the space with algorithms such as A*. And for really big numbers it may be a good idea to use genetic algorithms.
Edit3:
harold came up with a new idea to eliminate trying too many sums. I have implemented it in this new version. It is an order of magnitude faster.
$ time ./polish_numbers 99999
Result for 99999: 9158*+1569**+**, length 15
real 0m0.153s
user 0m0.152s
sys 0m0.000s
$ time ./polish_numbers 999999
Result for 999999: 37949+2599**+****, length 17
real 0m3.516s
user 0m3.512s
sys 0m0.004s
$ time ./polish_numbers 9999999
Result for 9999999: 9788995688***+***+*, length 19
real 1m39.903s
user 1m39.904s
sys 0m0.032s
Don't forget, you can also push ASCII values!
Usually this is longer, but for higher numbers it can get much shorter:
If you needed the number 123, it would be much better to do
"{" than 99*76*+

Unable to understand algorithm

Here is the link to the problem:
https://www.hackerrank.com/challenges/equal
I read its editorial and was unable to understand it. If you don't have a HackerRank account you won't be able to see the editorial, so here are some lines from it:
This is equivalent to saying, christy can take away the chocolates of
one coworker by 1, 2 or 5 while keeping others' chocolate untouched.
Let's consider decreasing a coworker's chocolate as an operation. To minimize the number of operations, we should try to make the number of chocolates of every coworker equal to the minimum one in the group(min). We have to decrease the number of chocolates the ith person A[i] by (A[i] - min). Let this value be x.
This can be done in k operations.
k = x/5 +(x%5)/2 + (x%5)%2
and from here on I am unable to understand:
Let f(min) be sum of operations performed over all coworkers to reduce
each of their chocolates to min. However, sometimes f(min) might not
always give the correct answer. It can also be a case when
f(min) > f(min-1)
f(min) < f(min-5)
as f(min-5) takes N operations more than f(min) where N is the number
of coworkers. Therefore, if
A = {min,min-1,min-2,min-3,min-4}
then f(A) <= f(min) < f(min-5)
Can someone help me understand why it is necessary to check f(min), f(min-1), ..., f(min-4)?
Consider the case A = [1,5,5]
As the editorial said, it is intuitive to think it is optimal to change A to [1,1,1] with 4 operations (two -2 operations on each 5), but it is better to change it to [0,0,0] with 3 operations (one -1 operation and two -5 operations).
Hence if min = the minimum element in the array, changing all elements to min may not be optimal.
The part you do not understand caters for this situation: we know min may not be optimal, as min-x may be better, but how large is x? Well, it is 4. The editorial is saying that since x is at most 4, we can simply brute force min, min-1, ..., min-4 to see which one gives the minimum, without thinking too much.
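A hedged sketch of that brute force in Java (the helper names are mine, not from the editorial); note the candidate target is allowed to drop below zero, which matters, as the EDITED section at the end of this answer shows:

import java.util.Arrays;

class Equal {
    // k = x/5 + (x%5)/2 + (x%5)%2, as stated in the editorial
    static int ops(int x) { return x / 5 + x % 5 / 2 + x % 5 % 2; }

    // Try every target min, min-1, ..., min-4 and keep the cheapest total.
    static int minOperations(int[] a) {
        int min = Arrays.stream(a).min().getAsInt();
        int best = Integer.MAX_VALUE;
        for (int target = min - 4; target <= min; target++) {
            int total = 0;
            for (int v : a) total += ops(v - target);
            best = Math.min(best, total);
        }
        return best;  // e.g. [1,5,5] -> 3, reached at target 0
    }
}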
Reasoning (Not proof!) for x <= 4
If x >= 5, then you have to use at least N extra type-3 (minus 5) operations on all elements, which is definitely not worth it.
Basically it is not a matter of the type of operation: because you apply the same operation to ALL elements, the problem is not reduced afterwards. The relative difference between the elements is still the same, while your aim is to make the relative difference 0, so you spend N operations for nothing.
In other words, if x >= 5, then x-5 must be a more optimal choice of goal, indeed x%5 must be the best goal.
(Below is the TL;DR part, version 2. Jump to the last section if you are not interested in the proof.)
In the process of writing the original solution, I suspected that indeed x <= 2, and I tried submitting code on HackerRank which only checks the minimum of f(min-x) for x <= 2, and it got accepted.
More formally, I claim
If 5 > (z-min)%5 >= 3 and (z-min')%5 == 0, then F(min') < F(min)
where min' = min - x for x <= 2, and F(k) = the minimum number of operations for element z to become k
(Beware the notation: I use F(), which has a different meaning from the f() in the question)
Here is the proof:
If (z-min)%5 == 1 or 2, then it needs at least (z-min)/5 + 1 operations, while (z-min')%5 == 0 needs (z-min')/5 = (z-min)/5 + 1 operations, which means F(min') = F(min)
If (z-min)%5 == 3 or 4, then it needs at least (z-min)/5 + 2 operations, while (z-min')%5 == 0 needs (z-min')/5 = (z-min)/5 + 1 operations, which means F(min') < F(min) (indeed F(min) = F(min') + 1)
So we have proved:
If 5 > (z-min)%5 >= 3 and (z-min')%5 == 0, then F(min') < F(min)
where min' = min - x
Now let's prove the range of x.
As we assume (z-min)%5 >= 3 and (z-min')%5 == 0,
so (z-min')%5 = (z-min+x)%5 = ((z-min)%5 + x%5)%5 == 0
Now, if x >= 3, then (z-min)%5 can never be >= 3 in order to make ((z-min)%5 + x%5)%5 == 0
If x = 2, then (z-min)%5 can be 3; if x = 1 then (z-min)%5 can be 4, to meet both conditions: 5> (z-min)%5 >= 3 and (z-min')%5==0
Thus together we show
If 5> (z-min)%5 >= 3 and (z-min')%5==0, then F(min')< F(min)
where min'=min-x for x<=2
Note that one can always generate an array P such that f(min') < f(min), as you can always repeat the integers that can be improved by this method until they outnumber those that cannot. This is because the elements that cannot be improved will each always need exactly 1 more operation.
eg: Let P = [2,2,2,10] f(min) = 0+3 = 3, f(min-2) = 3+2 = 5
Here 10 is the element that can be improved, while 2 cannot, so we can just add more 10s to the array. Each 2 will use 1 more operation to get to min' = min-2, while each 10 will save 1 operation. So we only have to add more 10s until they outnumber (compensate for) the "waste" from the 2s:
P = [2,2,2,10,10,10,10,10], then f(min) = 0+15 = 15, f(min-2) = 3+10=13
or simply just
P = [2,10,10], f(min) = 6, f(min-2) = 5
(End of TL;DR part!)
EDITED
OMG, THE TEST CASES ON HACKERRANK ARE WEAK!
The story: when I arrived at my office this morning, I kept thinking about this problem a bit, and realized there might be a problem in my code (which got accepted!)
#include <cmath>
#include <cstdio>
#include <vector>
#include <iostream>
#include <algorithm>
using namespace std;

int T, n, a[10005], m = 1<<28;

int f(int m){
    m = max(0, m);
    int cnt = 0;
    for(int i = 0; i < n; i++){
        cnt += (a[i]-m)/5 + (a[i]-m)%5/2 + (a[i]-m)%5%2;
    }
    return cnt;
}

int main() {
    cin >> T;
    while(T--){
        m = 1<<28;
        cin >> n;
        for(int i = 0; i < n; i++) cin >> a[i], m = min(m, a[i]);
        cout << min(min(f(m), f(m-1)), f(m-2)) << endl;
    }
    return 0;
}
Can you see the problem?
The problem is the line m = max(0, m); !
It ensures that min-x is at least 0. But wait, my proof above did not say anything about the range of min-x! It can indeed be negative!
Remember the original question is about "adding", so there is no maximum value for the goal; when we model the question as "subtracting", there is no minimum value for the goal either (but I capped it at 0!)
Try this test case with the code above:
1
3
0 3 3
It forces min-x to be 0, so it gives 4 as output, but the answer should be 3.
(If we use the "adding" model, the goal should be 10: +5 on a[0],a[2]; +5 on a[0],a[1]; +2 on a[1],a[2].)
So everything finally came out right (I think...) when I removed the line m = max(0, m); it allows min-x to go negative, gives the correct output of 3, and of course the new code got accepted as well...

Matthews Correlation Coefficient yielding values outside of [-1,1]

I'm using the formula found on Wikipedia for calculating Matthew's Correlation Coefficient. It works fairly well, most of the time, but I'm running into problems in my tool's implementation, and I'm not seeing the problem.
MCC = (TP*TN - FP*FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
Where TP, TN, FP, and FN are the non-negative, integer counts of the appropriate fields.
This should only return values in [-1, 1].
My implementation is as follows:
double ret;
if ((TruePositives + FalsePositives) == 0 || (TruePositives + FalseNegatives) == 0 ||
    (TrueNegatives + FalsePositives) == 0 || (TrueNegatives + FalseNegatives) == 0)
    // To avoid dividing by zero
    ret = (double)(TruePositives * TrueNegatives -
                   FalsePositives * FalseNegatives);
else {
    double num = (double)(TruePositives * TrueNegatives -
                          FalsePositives * FalseNegatives);
    double denom = (TruePositives + FalsePositives) *
                   (TruePositives + FalseNegatives) *
                   (TrueNegatives + FalsePositives) *
                   (TrueNegatives + FalseNegatives);
    denom = Math.Sqrt(denom);
    ret = num / denom;
}
return ret;
When I use this, as I said it works properly most of the time, but for instance if TP=280, TN = 273, FP = 67, and FN = 20, then we get:
MCC = ((280*273) - (67*20)) / sqrt(347*300*340*293) = 75100 / 42196.06 = (approx) 1.78
Is this normal behavior of Matthews Correlation Coefficient? I'm a programmer by trade, so statistics aren't a part of my formal training. Also, I've looked at questions with answers, and none of them discuss this behavior. Is it a bug in my code or in the formula itself?
The code is clear and looks correct. (But one's eyes can always deceive.)
One issue is whether the output is guaranteed to lie between -1 and 1. Assuming all inputs are nonnegative, though, we can round the numerator up and the denominator down, thereby overestimating the result, by zeroing out all the "False*" terms, producing
TP*TN / Sqrt(TP*TN*TP*TN) = 1.
The lower limit is obtained similarly by zeroing out all the "True*" terms. Therefore, working code cannot produce a value larger than 1 in magnitude unless it is presented with invalid input.
I therefore recommend placing a guard (such as an Assert statement) to assure the inputs are nonnegative. (Clearly it matters not in the preceding argument whether they are integral.) Place another assertion to check that the output is in the interval [-1,1]. Together, these will detect either or both of (a) invalid inputs or (b) an error in the calculation.
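For what it's worth, here is a sketch of such a guarded version (Java-style; the names are mine). As an extra assumption beyond the original snippet, the marginal sums and products are formed in double, so large counts cannot overflow an int on the way into the square root:

static double mcc(int tp, int tn, int fp, int fn) {
    assert tp >= 0 && tn >= 0 && fp >= 0 && fn >= 0 : "counts must be nonnegative";
    double num = (double) tp * tn - (double) fp * fn;
    double s1 = (double) tp + fp, s2 = (double) tp + fn;
    double s3 = (double) tn + fp, s4 = (double) tn + fn;
    double denom = Math.sqrt(s1 * s2 * s3 * s4);
    double ret = (denom == 0) ? num : num / denom;  // num is 0 whenever denom is 0
    assert -1.0 - 1e-9 <= ret && ret <= 1.0 + 1e-9 : "MCC out of range: " + ret;
    return ret;
}

For the counts in the question (280, 273, 67, 20) this returns about 0.74, comfortably inside the expected interval.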

Rolling variance algorithm

I'm trying to find an efficient, numerically stable algorithm to calculate a rolling variance (for instance, a variance over a 20-period rolling window). I'm aware of the Welford algorithm that efficiently computes the running variance for a stream of numbers (it requires only one pass), but am not sure if this can be adapted for a rolling window. I would also like the solution to avoid the accuracy problems discussed at the top of this article by John D. Cook. A solution in any language is fine.
I've run across this problem as well. There are some great posts out there on computing the running cumulative variance, such as John D. Cook's Accurately computing running variance post and the post from Digital Explorations, Python code for computing sample and population variances, covariance and correlation coefficient. I just could not find any that were adapted to a rolling window.
The Running Standard Deviations post by Subluminal Messages was critical in getting the rolling window formula to work. Jim takes the power sum of the squared values versus Welford's approach of using the sum of the squared differences from the mean. The running Power Sum Average formula is as follows:
PSA(today) = PSA(yesterday) + (x(today)^2 - PSA(yesterday)) / n
x = value in your time series
n = number of values you've analyzed so far.
But to convert the Power Sum Average formula to a windowed variety, you need to tweak it to the following:
PSA(today) = PSA(yesterday) + (x(today)^2 - x(today - n)^2) / n
x = value in your time series
x(today - n) = the value dropping out of the window
n = period used for your rolling window.
You'll also need the Rolling Simple Moving Average formula:
SMA(today) = SMA(yesterday) + (x(today) - x(today - n)) / n
From there you can compute the Rolling Population Variance:
Population Var(today) = (PSA(today) * n - n * SMA(today)^2) / n
Or the Rolling Sample Variance:
Sample Var(today) = (PSA(today) * n - n * SMA(today)^2) / (n - 1)
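If it helps, here are the windowed recurrences above condensed into a small Java step function (a sketch; all names mine; it assumes the window is already full, with xOld the value from n periods ago):

// One sliding step of the PSA/SMA recurrences described above.
// Returns { newPsa, newSma, populationVar, sampleVar }.
static double[] rollStep(double psa, double sma, double xNew, double xOld, int n) {
    double newPsa = psa + (xNew * xNew - xOld * xOld) / n;
    double newSma = sma + (xNew - xOld) / n;
    double popVar = (newPsa * n - n * newSma * newSma) / n;        // = newPsa - newSma^2
    double sampleVar = (newPsa * n - n * newSma * newSma) / (n - 1);
    return new double[] { newPsa, newSma, popVar, sampleVar };
}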
I've covered this topic along with sample Python code in a blog post a few years back, Running Variance.
Hope this helps.
Please note: I provided links to all the blog posts and math formulas
in Latex (images) for this answer. But, due to my low reputation (<
10); I'm limited to only 2 hyperlinks and absolutely no images. Sorry
about this. Hope this doesn't take away from the content.
I have been dealing with the same issue.
The mean is simple to compute iteratively, but you need to keep the complete history of values in a circular buffer.
next_index = (index + 1) % window_size; // oldest x value is at next_index, wrapping if necessary
new_mean = mean + (x_new - xs[next_index]) / window_size;
I have adapted Welford's algorithm, and it works for all the values that I have tested with.
var_sum = var_sum + (x_new - mean) * (x_new - new_mean) - (xs[next_index] - mean) * (xs[next_index] - new_mean);
xs[next_index] = x_new;
index = next_index;
To get the current variance, just divide var_sum by the window size: variance = var_sum / window_size;
If you prefer code over words (heavily based on DanS' post):
http://calcandstuff.blogspot.se/2014/02/rolling-variance-calculation.html
public IEnumerable<double> RollingSampleVariance(IEnumerable<double> data, int sampleSize)
{
    double mean = 0;
    double accVar = 0;
    int n = 0;
    var queue = new Queue<double>(sampleSize);

    foreach (var observation in data)
    {
        queue.Enqueue(observation);
        if (n < sampleSize)
        {
            // Calculating first variance
            n++;
            double delta = observation - mean;
            mean += delta / n;
            accVar += delta * (observation - mean);
        }
        else
        {
            // Adjusting variance
            double then = queue.Dequeue();
            double prevMean = mean;
            mean += (observation - then) / sampleSize;
            accVar += (observation - prevMean) * (observation - mean) - (then - prevMean) * (then - mean);
        }
        if (n == sampleSize)
            yield return accVar / (sampleSize - 1);
    }
}
Actually, Welford's algorithm can AFAICT easily be adapted to compute weighted variance.
And by setting weights to -1, you should be able to effectively cancel out elements. I haven't checked the math on whether it allows negative weights, but at a first look it should!
I did perform a small experiment using ELKI:
void testSlidingWindowVariance() {
    MeanVariance mv = new MeanVariance(); // ELKI implementation of weighted Welford!
    MeanVariance mc = new MeanVariance(); // Control.
    Random r = new Random();
    double[] data = new double[1000];
    for (int i = 0; i < data.length; i++) {
        data[i] = r.nextDouble();
    }
    // Pre-roll:
    for (int i = 0; i < 10; i++) {
        mv.put(data[i]);
    }
    // Compare to window approach
    for (int i = 10; i < data.length; i++) {
        mv.put(data[i - 10], -1.); // Remove
        mv.put(data[i]);
        mc.reset(); // Reset statistics
        for (int j = i - 9; j <= i; j++) {
            mc.put(data[j]);
        }
        assertEquals("Variance does not agree.", mv.getSampleVariance(),
                mc.getSampleVariance(), 1e-14);
    }
}
I get around ~14 digits of precision compared to the exact two-pass algorithm; this is about as much as can be expected from doubles. Note that Welford does come at some computational cost because of the extra divisions: it takes about twice as long as the exact two-pass algorithm. If your window size is small, it may be much more sensible to actually recompute the mean, and then the variance in a second pass, every time.
I have added this experiment as a unit test to ELKI; you can see the full source here: http://elki.dbs.ifi.lmu.de/browser/elki/trunk/test/de/lmu/ifi/dbs/elki/math/TestSlidingVariance.java
It also compares to the exact two-pass variance.
However, on skewed data sets the behaviour might be different. This data set is obviously uniformly distributed, but I've also tried a sorted array and it worked.
Update: we published a paper with details on different weighting schemes for (co-)variance:
Schubert, Erich, and Michael Gertz. "Numerically stable parallel computation of (co-)variance." Proceedings of the 30th International Conference on Scientific and Statistical Database Management (SSDBM). ACM, 2018. (Won the SSDBM best-paper award.)
This also discusses how weighting can be used to parallelize the computation, e.g., with AVX, GPUs, or on clusters.
Here's a divide and conquer approach that has O(log k)-time updates, where k is the number of samples. It should be relatively stable for the same reasons that pairwise summation and FFTs are stable, but it's a bit complicated and the constant isn't great.
Suppose we have a sequence A of length m with mean E(A) and variance V(A), and a sequence B of length n with mean E(B) and variance V(B). Let C be the concatenation of A and B. We have
p = m / (m + n)
q = n / (m + n)
E(C) = p * E(A) + q * E(B)
V(C) = p * (V(A) + (E(A) + E(C)) * (E(A) - E(C))) + q * (V(B) + (E(B) + E(C)) * (E(B) - E(C)))
Now, stuff the elements in a red-black tree, where each node is decorated with mean and variance of the subtree rooted at that node. Insert on the right; delete on the left. (Since we're only accessing the ends, a splay tree might be O(1) amortized, but I'm guessing amortized is a problem for your application.) If k is known at compile-time, you could probably unroll the inner loop FFTW-style.
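A literal transcription of those merge formulas (a sketch; names mine), which is what each tree node would compute from its children's decorations:

// Merge (mean, variance) of segment A (length m) and segment B (length n).
static double[] merge(long m, double meanA, double varA,
                      long n, double meanB, double varB) {
    double p = (double) m / (m + n);
    double q = (double) n / (m + n);
    double meanC = p * meanA + q * meanB;
    double varC = p * (varA + (meanA + meanC) * (meanA - meanC))
                + q * (varB + (meanB + meanC) * (meanB - meanC));
    return new double[] { meanC, varC };
}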
I know this question is old, but in case someone else is interested, here follows the Python code. It is inspired by John D. Cook's blog post, and by Joachim's and DanS's code and Jaime's comments. The code below still gives small imprecisions for small window sizes. Enjoy.
from __future__ import division
import collections
import math


class RunningStats:
    def __init__(self, WIN_SIZE=20):
        self.n = 0
        self.mean = 0
        self.run_var = 0
        self.WIN_SIZE = WIN_SIZE
        self.windows = collections.deque(maxlen=WIN_SIZE)

    def clear(self):
        self.n = 0
        self.mean = 0
        self.run_var = 0
        self.windows.clear()

    def push(self, x):
        if self.n < self.WIN_SIZE:
            # Calculating first variance (plain Welford while filling the window)
            self.windows.append(x)
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.run_var += delta * (x - self.mean)
        else:
            # Adjusting variance: read the sample that append() is about to evict
            x_removed = self.windows[0]
            self.windows.append(x)
            old_m = self.mean
            self.mean += (x - x_removed) / self.WIN_SIZE
            self.run_var += (x + x_removed - old_m - self.mean) * (x - x_removed)

    def get_mean(self):
        return self.mean if self.n else 0.0

    def get_var(self):
        return self.run_var / (self.n - 1) if self.n > 1 else 0.0

    def get_std(self):
        return math.sqrt(self.get_var())

    def get_all(self):
        return list(self.windows)

    def __str__(self):
        return "Current window values: {}".format(list(self.windows))
I look forward to being proven wrong on this, but I don't think this can be done "quickly." That said, a large part of the calculation is keeping track of the EV over the window, which can be done easily.
I'll leave with the question: are you sure you need a windowed function? Unless you are working with very large windows, it is probably better to just use a well-known predefined algorithm.
I guess keeping track of your 20 samples, Sum(X^2 from 1..20), and Sum(X from 1..20) and then successively recomputing the two sums at each iteration isn't efficient enough? It's possible to recompute the new variance without adding up, squaring, etc., all of the samples each time.
As in:
Sum(X^2 from 2..21) = Sum(X^2 from 1..20) - X_1^2 + X_21^2
Sum(X from 2..21) = Sum(X from 1..20) - X_1 + X_21
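As a sketch of that bookkeeping (names mine), the whole scheme fits in a few lines with a circular buffer; the final step is the E(x^2) - E(x)^2 form, which other answers in this thread warn can lose precision:

// Rolling variance from Sum(X) and Sum(X^2) over a fixed window.
final class WindowVar {
    private final double[] buf;  // circular window, assumed pre-filled with zeros
    private double sum, sumSq;
    private int i;
    WindowVar(int k) { buf = new double[k]; }
    double push(double x) {
        double old = buf[i];
        buf[i] = x;
        i = (i + 1) % buf.length;
        sum += x - old;              // Sum(X from 2..21) = Sum(X from 1..20) - X_1 + X_21
        sumSq += x * x - old * old;  // same update for the squares
        double mean = sum / buf.length;
        return sumSq / buf.length - mean * mean;  // population variance of the window
    }
}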
Here's another O(log k) solution: first square the original sequence, then sum pairs, then quadruples, etc. (You'll need a bit of a buffer to be able to find all of these efficiently.) Then add up the values that you need to get your answer. For example:
||||||||||||||||||||||||| // Squares
| | | | | | | | | | | | | // Sum of squares for pairs
| | | | | | | // Pairs of pairs
| | | | // (etc.)
| |
^------------------^ // Want these 20, which you can get with
| | // one...
| | | | // two, three...
| | // four...
|| // five stored values.
Now you use your standard E(x^2)-E(x)^2 formula and you're done. (Not if you need good stability for small sets of numbers; this was assuming that it was only accumulation of rolling error that was causing issues.)
That said, summing 20 squared numbers is very fast these days on most architectures. If you were doing more--say, a couple hundred--a more efficient method would clearly be better. But I'm not sure that brute force isn't the way to go here.
For only 20 values, it's trivial to adapt the method exposed here (I didn't say fast, though).
You can simply keep an array of 20 of these RunningStat classes.
The first 20 elements of the stream are somewhat special; however, once that is done, it's much simpler:
when a new element arrives, clear the current RunningStat instance, add the element to all 20 instances, and increment the "counter" (modulo 20) which identifies the new "full" RunningStat instance;
at any given moment, you can consult the current "full" instance to get your running variance.
You will obviously note that this approach isn't really scalable...
You can also note that there is some redundancy in the numbers we keep (if you go with the full RunningStat class). An obvious improvement would be to keep the last 20 Mk and Sk directly.
I cannot think of a better formula using this particular algorithm; I am afraid that its recursive formulation somewhat ties our hands.
This is just a minor addition to the excellent answer provided by DanS. The following equations are for removing the oldest sample from the window and updating the mean and variance. This is useful, for example, if you want to take smaller windows near the right edge of your input data stream (i.e. just remove the oldest window sample without adding a new sample).
window_size -= 1; % decrease window size by 1 sample
new_mean = prev_mean + (prev_mean - x_old) / window_size
varSum = varSum - (prev_mean - x_old) * (new_mean - x_old)
Here, x_old is the oldest sample in the window you wish to remove.
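In code form, the same shrink-by-one update reads (a sketch; names mine):

// Remove the oldest sample x_old from a window of size windowSize,
// updating the mean and the variance sum per the equations above.
static double[] removeOldest(double prevMean, double varSum, double xOld, int windowSize) {
    int newSize = windowSize - 1;  // window shrinks by one sample
    double newMean = prevMean + (prevMean - xOld) / newSize;
    double newVarSum = varSum - (prevMean - xOld) * (newMean - xOld);
    return new double[] { newMean, newVarSum };
}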
For those coming here now, here's a reference containing the full derivation, with proofs, of DanS's answer and Jaime's related comment.
DanS's and Jaime's response in concise C.

#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t n, i;
    float *samples, mean, var;
} rolling_var_t;

void rolling_var_init(rolling_var_t *c, size_t window_size) {
    size_t ss;

    memset(c, 0, sizeof(*c));
    c->n = window_size;
    c->samples = (float *) malloc(ss = sizeof(float) * window_size);
    memset(c->samples, 0, ss);
}

void rolling_var_add(rolling_var_t *c, float x) {
    float nmean; // new mean
    float xold;  // oldest x
    float dx;

    c->i = (c->i + 1) % c->n;
    xold = c->samples[c->i];
    dx = x - xold;
    nmean = c->mean + dx / (float) c->n; // walk mean
    //c->var += ((x - c->mean) * (x - nmean) - (xold - c->mean) * (xold - nmean)) / (float) c->n;
    c->var += ((x + xold - c->mean - nmean) * dx) / (float) c->n;
    c->mean = nmean;
    c->samples[c->i] = x;
}

Fastest way to modify one digit of an integer

Suppose I have an int x = 54897, old digit index (0 based), and the new value for that digit. What's the fastest way to get the new value?
Example
x = 54897
index = 3
value = 2
y = f(x, index, value) // => 54827
Edit: by fastest, I definitely mean faster performance. No string processing.
In the simplest case (considering the digits are numbered from LSB to MSB, the first one being 0) AND knowing the old digit, we could do something as simple as:
num += (new_digit - old_digit) * 10**pos;
For the real problem we would need:
1) the MSB-first version of pos, which could cost you a log(), or at most log10(MAX_INT) divisions by ten (this could be improved using binary search);
2) the digit at that pos, which would need at most 2 divisions (or zero, using the results from step 1).
You could also use the special x86 FPU instruction that is able to store a float in BCD (I have no idea how slow it is).
UPDATE: the first step could be done even faster, without any divisions, with a binary search like this:
int my_log10(unsigned short n){
    // short: 0..64k -> 1..5 digits
    if (n < 1000){ // 1..3
        if (n < 10) return 1;
        if (n < 100) return 2;
        return 3;
    } else { // 4..5
        if (n < 10000) return 4;
        return 5;
    }
}
If your index started at the least significant digit, you could do something like:
p = pow(10, index);
x = x / (p * 10) * (p * 10) + value * p + x % p;
But since your index is backwards, a string is probably the way to go. It would also be more readable and maintainable.
Calculate the "mask" M: 10 raised to the power of index, where index is a zero-based index from the right. If you need to index from the left, recalculate index accordingly.
Calculate the "prefix" PRE = x / (M * 10) * (M * 10)
Calculate the "suffix" SUF = x % M
Calculate the new "middle part" MID = value * M
Generate the new number new_x = PRE + MID + POST.
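For what it's worth, here is that recipe in Java, adapted to the question's left-based index (a sketch; it assumes x is positive and index is in range):

static int setDigit(int x, int index, int value) {
    int digits = (int) Math.log10(x) + 1;            // how many decimal digits x has
    int m = (int) Math.pow(10, digits - 1 - index);  // the mask M, as a right-based power of 10
    int pre = x / (m * 10) * (m * 10);               // PRE: digits to the left
    int suf = x % m;                                 // SUF: digits to the right
    return pre + value * m + suf;                    // PRE + MID + SUF
}
// setDigit(54897, 3, 2) == 54827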
P.S. ruslik's answer does it more elegantly :)
You need to start by figuring out how many digits are in your input. I can think of two ways of doing that: one with a loop and one with logarithms. Here's the loop version. This will fail for negative and zero inputs, when the index is out of bounds, and probably in other conditions too, but it's a starting point.
def f(x, index, value):
    place = 1
    residual = x
    while residual > 0:
        if index < 0:
            place *= 10
        index -= 1
        residual //= 10  # integer division (the original relied on Python 2's /)
    digit = (x // place) % 10
    return x - (place * digit) + (place * value)
P.S. This is working Python code. The principle of something simple like this is easy to work out, but the details are so tricky that you really need to iterate it a bit. In this case I started with the principle that I wanted to subtract out the old digit and add the new one; from there it was a matter of getting the correct multiplier.
You've got to get specific about your compute platform if you're talking about performance.
I would approach this by converting the number into pairs of decimal digits, 4 bits each (i.e., BCD).
Then I would find and process the pair that needs modification, as a byte.
Then I would put the number back together.
There are assemblers that do this very well.
