Generate any number using incrementation and multiplication by 2 - algorithm

I'm looking for an algorithm, working in a loop, that will generate any natural number n using only incrementation and multiplication by 2. The trivial way is known (increment the number n times), but I'm looking for something a little bit faster. Honestly, I don't even know how I should start this.

Basically, what you want to do is shift in the bits of the number from the right, starting with the MSB.
For example, if your number is 70, then the binary of it is 0b1000110. So, you want to "shift in" the bits 1, 0, 0, 0, 1, 1, 0.
To shift in a zero, you simply double the number. To shift in a one, you double the number, then increment it.
if (bit_to_be_shifted_in != 0)
    x = (x * 2) + 1;
else
    x = x * 2;
So, if you're given an array of bits from MSB to LSB (i.e. from left to right), then the C code looks like this:
x = 0;
for (i = 0; i < number_of_bits; i++)
{
    if (bits[i] != 0)
        x = x * 2 + 1;
    else
        x = x * 2;
}

One way of doing this is to go backwards: if the number is odd, subtract one; if it's even, divide by 2. Record each step, then replay the steps in reverse (an increment for each subtraction, a doubling for each halving).
while (n > 0) {
    if (n & 1)
        n &= ~1;   /* odd: subtract one */
    else
        n >>= 1;   /* even: divide by 2 */
}
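To make that concrete, here's a minimal Python sketch (the helper name steps_to is mine) that derives the operations backwards and then replays them forward in O(log n) steps:
def steps_to(n):
    """Work backwards from n: odd means the last step was an increment,
    even means it was a doubling. Reverse to get the forward sequence."""
    ops = []
    while n > 0:
        if n & 1:
            ops.append('inc')
            n -= 1
        else:
            ops.append('dbl')
            n //= 2
    ops.reverse()
    return ops

x = 0
for op in steps_to(70):
    x = x + 1 if op == 'inc' else x * 2
print(x)  # 70, reached in 9 steps instead of 70 increments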

How to find the number of values in a given range divisible by a given value?

I have three numbers x, y, z.
For a range between numbers x and y, how can I find the total count of numbers whose remainder mod z is 0, i.e. how many numbers between x and y are divisible by z?
It can be done in O(1): find the first multiple, find the last multiple, then count everything in between.
I'm assuming the range is inclusive. If your ranges are exclusive, adjust the bounds by one:
find the first value at or after x that is divisible by z. You can discard x:
x_mod = x % z;
if (x_mod != 0)
    x += (z - x_mod);
find the last value at or before y that is divisible by z. You can discard y:
y -= y % z;
find the size of this range:
if (x > y)
    return 0;
else
    return (y - x) / z + 1;
If mathematical floor and ceil functions are available, the first two parts can be written more readably; here ceil(x, z) means rounding x up to the nearest multiple of z, and floor(y, z) means rounding y down. The last part can also be compressed using a max function:
x = ceil (x, z);
y = floor (y, z);
return max((y - x) / z + 1, 0);
if the input is guaranteed to be a valid range (x <= y), the last test or max is unnecessary:
x = ceil (x, z);
y = floor (y, z);
return (y - x) / z + 1;
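For reference, here is a minimal Python sketch of the same approach (count_divisibles is a hypothetical name; Python's floor division does the rounding, so the two-argument ceil/floor helpers aren't needed):
def count_divisibles(x, y, z):
    """Count multiples of z in the inclusive range [x, y], z > 0."""
    first = -(-x // z) * z   # x rounded up to a multiple of z (integer ceil)
    last = (y // z) * z      # y rounded down to a multiple of z
    return max((last - first) // z + 1, 0)

print(count_divisibles(30, 90, 13))  # 4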
(2017, answer rewritten thanks to comments)
The number of multiples of z between 1 and n is simply n / z,
/ being the integer division, meaning decimals that could result from the division are simply ignored (for instance 17/5 => 3 and not 3.4).
Now, in a range from x to y, how many multiples of z are there?
Let's see how many multiples m we have up to y
0----------------------------------x------------------------y
-m---m---m---m---m---m---m---m---m---m---m---m---m---m---m---
You see where I'm going... to get the number of multiples in the range [x, y], get the number of multiples up to y, then subtract the number of multiples before x, which is (x-1) / z
Solution: ( y / z ) - (( x - 1 ) / z )
Programmatically, you could make a function numberOfMultiples
function numberOfMultiples(n, z) {
    return Math.floor(n / z); // JavaScript division is floating-point, so floor explicitly
}
to get the number of multiples in a range [x, y]:
numberOfMultiples(y, z) - numberOfMultiples(x - 1, z)
The function is O(1), there is no need of a loop to get the number of multiples.
Examples of results you should find
[30, 90] ÷ 13 => 4
[1, 1000] ÷ 6 => 166
[100, 1000000] ÷ 7 => 142843
[777, 777777777] ÷ 7 => 111111001
For the first example, 90 / 13 = 6, (30-1) / 13 = 2, and 6-2 = 4
---26---39---52---65---78---91--
     ^                     ^
     30<--(4 multiples)-->90
I also encountered this on Codility. It took me much longer than I'd like to admit to come up with a good solution, so I figured I would share what I think is an elegant solution!
Straightforward Approaches 1 & 2:
An O(N) time solution with a loop and counter, unrealistic when N = 2 billion.
Awesome Approach 3:
We want the number of integers in some range that are divisible by K.
Simple case: assume the range is [0 .. n*K], with N = n*K
N/K represents the number of integers in [0,N) that are divisible by K, given N%K = 0 (i.e. N is divisible by K)
ex. N = 9, K = 3, count = |{0 3 6}| = 3 = 9/3
Similarly,
N/K + 1 represents the number of integers in [0,N] divisible by K
ex. N = 9, K = 3, count = |{0 3 6 9}| = 4 = 9/3 + 1
I think really understanding the above fact is the trickiest part of this question; I cannot explain exactly why it works.
The rest boils down to prefix sums and handling special cases.
Now we don't always have a range that begins with 0, and we cannot assume the two bounds will be divisible by K.
But wait! We can fix this by calculating our own nice upper and lower bounds and using some subtraction magic :)
First find the closest upper and lower bounds in the range [A,B] that are divisible by K.
Upper bound (easier): ex. B = 10, K = 3, new_B = 9... the pattern is B - B%K
Lower bound: ex. A = 10, K = 3, new_A = 12... try a few more and you will see the pattern is A - A%K + K (when A is already divisible by K, just keep A, as the code below does)
Then calculate the following using the above technique:
Determine the total count X of integers in [0,B] that are divisible by K
Determine the total count Y of integers in [0,A) that are divisible by K
The number of integers in [A,B] that are divisible by K is then X - Y, computed in constant time
Website: https://codility.com/demo/take-sample-test/count_div/
class CountDiv {
    public int solution(int A, int B, int K) {
        int firstDivisible = A % K == 0 ? A : A + (K - A % K);
        int lastDivisible = B % K == 0 ? B : B - B % K; // B/K behaves this way by default.
        return (lastDivisible - firstDivisible) / K + 1;
    }
}
This is my first time explaining an approach like this. Feedback is very much appreciated :)
This is one of the Codility Lesson 3 questions. For this question, the input is guaranteed to be in a valid range. I answered it using Javascript:
function solution(x, y, z) {
    var totalDivisibles = Math.floor(y / z),
        excludeDivisibles = Math.floor((x - 1) / z),
        divisiblesInArray = totalDivisibles - excludeDivisibles;
    return divisiblesInArray;
}
https://codility.com/demo/results/demoQX3MJC-8AP/
(I actually wanted to ask about some of the other comments on this page but I don't have enough rep points yet).
Divide y-x by z, rounding down. Add one if y%z < x%z or if x%z == 0.
No mathematical proof, unless someone cares to provide one, but test cases, in Perl:
#!perl
use strict;
use warnings;
use Test::More;
sub multiples_in_range {
    my ($x, $y, $z) = @_;
    return 0 if $x > $y;
    my $ret = int( ($y - $x) / $z );
    $ret++ if $y % $z < $x % $z or $x % $z == 0;
    return $ret;
}

for my $z (2 .. 10) {
    for my $x (0 .. 2*$z) {
        for my $y (0 .. 4*$z) {
            is multiples_in_range($x, $y, $z),
               scalar(grep { $_ % $z == 0 } $x..$y),
               "[$x..$y] mod $z";
        }
    }
}
done_testing;
Output:
$ prove divrange.pl
divrange.pl .. ok
All tests successful.
Files=1, Tests=3405, 0 wallclock secs ( 0.20 usr 0.02 sys + 0.26 cusr 0.01 csys = 0.49 CPU)
Result: PASS
Let [A;B] be an interval of positive integers including A and B such that 0 <= A <= B, K be the divisor.
It is easy to see that there are N(A) = ⌊A / K⌋ = floor(A / K) multiples of K in the interval [0;A]:
         1K       2K       3K       4K       5K
●········x········x··●·····x········x········x···>
0                    A
Similarly, there are N(B) = ⌊B / K⌋ = floor(B / K) multiples of K in the interval [0;B]:
         1K       2K       3K       4K       5K
●········x········x········x········x···●····x···>
0                                        B
Then N = N(B) - N(A) equals to the number of K's (the number of integers divisible by K) in range (A;B]. The point A is not included, because the subtracted N(A) includes this point. Therefore, the result should be incremented by one, if A mod K is zero:
N := N(B) - N(A)
if (A mod K = 0)
    N := N + 1
Implementation in PHP
function solution($A, $B, $K) {
    if ($K < 1)
        return 0;
    $c = floor($B / $K) - floor($A / $K);
    if ($A % $K == 0)
        $c++;
    return (int)$c;
}
In PHP, the effect of the floor function can be achieved by casting to the integer type:
$c = (int)($B / $K) - (int)($A / $K);
which, I think, is faster. (Note that the cast truncates toward zero, so it matches floor only for non-negative operands.)
Here is my short and simple solution in C++ which got 100/100 on Codility. :)
It runs in O(1) time. I hope it's not difficult to understand.
int solution(int A, int B, int K) {
    // write your code in C++11
    int cnt = 0;
    if (A % K == 0 or B % K == 0)
        cnt++;
    if (A >= K)
        cnt += (B - A) / K;
    else
        cnt += B / K;
    return cnt;
}
(floor)(high/d) - (floor)(low/d) - (high%d==0)
Explanation:
There are floor(a/d) numbers divisible by d in the range (0, a] (d != 0)
Therefore (floor)(high/d) - (floor)(low/d) will give numbers divisible in the range (low,high] (Note that low is excluded and high is included in this range)
Now to remove high from the range just subtract (high%d==0)
Works for integers, floats or whatever (Use fmodf function for floats)
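If it helps, the claim is easy to brute-force in a few lines (a Python sketch; note it assumes high > low, since the degenerate case where low == high and low is divisible by d yields -1):
def count_between(low, high, d):
    """Multiples of d strictly between low and high, per the formula above."""
    return high // d - low // d - (high % d == 0)

# exhaustive cross-check over small ranges
for d in range(1, 8):
    for low in range(0, 20):
        for high in range(low + 1, 20):
            expected = sum(1 for v in range(low + 1, high) if v % d == 0)
            assert count_between(low, high, d) == expected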
I won't strive for an O(1) solution; I'll leave that for a more clever person :) This just feels like a perfect usage scenario for functional programming. Simple and straightforward.
> x,y,z=1,1000,6
=> [1, 1000, 6]
> (x..y).select {|n| n%z==0}.size
=> 166
EDIT: after reading others' O(1) solutions, I feel ashamed. Programming has made people lazy to think...
Division (a/b = c) is by definition taking a set of size a and forming groups of size b; the number of groups of this size that can be formed, c, is the quotient of a and b. This is nothing more than the number of integers within the range/interval ]0..a] (not including zero, but including a) that are divisible by b.
so by definition:
Y/Z - number of integers within ]0..Y] that are divisible by Z
and
X/Z - number of integers within ]0..X] that are divisible by Z
thus:
result = [Y/Z] - [X/Z] + x (where x = 1 if and only if X is divisible by Z, otherwise 0 - assuming the given range [X..Y] includes X)
example :
for (6, 12, 2) we have 12/2 - 6/2 + 1 (as 6%2 == 0) = 6 - 3 + 1 = 4 // {6, 8, 10, 12}
for (5, 12, 2) we have 12/2 - 5/2 + 0 (as 5%2 != 0) = 6 - 2 + 0 = 4 // {6, 8, 10, 12}
The time complexity of the solution is constant.
Code Snippet :
int countDiv(int a, int b, int m)
{
    int mod = (min(a, b) % m == 0);
    int cnt = abs(floor(b / m) - floor(a / m)) + mod;
    return cnt;
}
Here n will give you the count of numbers, and the code will print the sum of all numbers between a and b that are divisible by k.
int a = sc.nextInt();
int b = sc.nextInt();
int k = sc.nextInt();

// first multiple of k at or after a
int first;
if (a % k == 0) {
    first = a;
} else {
    first = a + (k - a % k);
}
// last multiple of k at or before b
int last = b - b % k;

if (first > last) {
    System.out.println(0);
} else {
    int n = (last - first) / k + 1;             // count of multiples
    System.out.println(n * (first + last) / 2); // arithmetic series sum
}
Here is the solution to the problem written in Swift Programming Language.
Step 1: Find the first number in the range divisible by z.
Step 2: Find the last number in the range divisible by z.
Step 3: Use a mathematical formula to find the number of divisible numbers by z in the range.
func solution(_ x: Int, _ y: Int, _ z: Int) -> Int {
    var firstNumber: Int
    var lastNumber: Int
    if y == x {
        return x % z == 0 ? 1 : 0
    }
    // Find the first number divisible by z
    let moduloX = x % z
    if moduloX == 0 {
        firstNumber = x
    } else {
        firstNumber = x + (z - moduloX)
    }
    // Find the last number divisible by z
    let moduloY = y % z
    if moduloY == 0 {
        lastNumber = y
    } else {
        lastNumber = y - moduloY
    }
    // Math formula (integer division already floors)
    return (lastNumber - firstNumber) / z + 1
}
public static int Solution(int A, int B, int K)
{
    int count = 0;
    // If A is divisible by K
    if (A % K == 0)
    {
        count = (B / K) - (A / K) + 1;
    }
    // If A is not divisible by K
    else
    {
        count = (B / K) - (A / K);
    }
    return count;
}
This can be done in O(1).
Here is a solution in C++.
auto first{ x % z == 0 ? x : x + z - x % z };
auto last{ y % z == 0 ? y : y - y % z };
auto ans{ (last - first) / z + 1 };
Where first is the first number that ∈ [x; y] and is divisible by z, last is the last number that ∈ [x; y] and is divisible by z and ans is the answer that you are looking for.

Number distribution

Problem: We have x checkboxes and we want to check y of them evenly.
Example 1: select 50 checkboxes of 100 total.
[-]
[x]
[-]
[x]
...
Example 2: select 33 checkboxes of 100 total.
[-]
[-]
[x]
[-]
[-]
[x]
...
Example 3: select 66 checkboxes of 100 total:
[-]
[x]
[x]
[-]
[x]
[x]
...
But we're having trouble coming up with a formula to check them in code, especially once you get to 11/111 or something similar. Does anyone have an idea?
Let's first assume y is divisible by x (note: this answer uses y for the total number of boxes and x for the number to check, the reverse of the question's naming). Then we denote p = y/x and the solution is simple: go through the list and, every p elements, mark 1 of them.
Now, let's say r = y%x is nonzero. Still p = y/x, where / is integer division. So, you need to:
In the first p-r elements, mark 1 element
In the last r elements, mark 2 elements
Note: This depends on how you define evenly distributed. You might want to spread the r sections with x+1 elements in between p-r sections with x elements, which indeed is again the same problem and could be solved recursively.
Alright so it wasn't actually correct. I think this would do though:
Regardless of divisibility:
if y > 2*x, then mark 1 element every p = y/x elements, x times.
if y < 2*x, then mark all, and do the previous step unmarking y-x out of y checkboxes (so like in the previous case, but x is replaced by y-x)
Note: This depends on how you define evenly distributed. You might want to change between p and p+1 elements for example to distribute them better.
Here's a straightforward solution using integer arithmetic:
void check(char boxes[], int total_count, int check_count)
{
    int i;
    for (i = 0; i < total_count; i++)
        boxes[i] = '-';
    for (i = 0; i < check_count; i++)
        boxes[i * total_count / check_count] = 'x';
}
total_count is the total number of boxes, and check_count is the number of boxes to check.
First, it sets every box to unchecked. Then, it checks check_count boxes, scaling the counter to the number of boxes.
Caveat: this is left-biased rather than right-biased like in your examples. That is, it prints x--x-- rather than --x--x. You can turn it around by replacing
boxes[i * total_count / check_count] = 'x';
with:
boxes[total_count - (i * total_count / check_count) - 1] = 'x';
Correctness
Assuming 0 <= check_count <= total_count, and that boxes has space for at least total_count items, we can prove that:
No check marks will overlap. i * total_count / check_count increments by at least one on every iteration, because total_count >= check_count.
This will not overflow the buffer. The subscript i * total_count / check_count
Will be >= 0. i, total_count, and check_count will all be >= 0.
Will be < total_count. When n > 0 and d > 0:
(n * d - 1) / d < n
In other words, if we take n * d / d, and nudge the numerator down, the quotient will go down, too.
Therefore, (check_count - 1) * total_count / check_count will be less than total_count, with the assumptions made above. A division by zero won't happen because if check_count is 0, the loop in question will have zero iterations.
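For a quick sanity check of the spacing, here is the same formula transcribed into a few lines of Python (check_pattern is a hypothetical name):
def check_pattern(total_count, check_count):
    """Transcription of check(): mark check_count of total_count boxes."""
    boxes = ['-'] * total_count
    for i in range(check_count):
        boxes[i * total_count // check_count] = 'x'
    return ''.join(boxes)

print(check_pattern(111, 11))  # marks land at indices 0, 10, 20, ..., 100
print(check_pattern(100, 33))  # one 'x' roughly every three boxes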
Say the number of checkboxes is C and the number of Xes is N.
Your example states that having C=111 and N=11 is your most troublesome case.
Try this: divide C by N and call it D. Keep the index into the array as a double I, and keep a counter M.
double D = (double)C / (double)N;
double I = 0.0;
int M = N;
while (M > 0) {
    if (checkboxes[Round(I)].Checked) { // if we selected it, skip to next
        I += 1.0;
        continue;
    }
    checkboxes[Round(I)].Checked = true;
    M--;
    I += D;
    if (Round(I) >= C) { // wrap around the end
        I -= C;
    }
}
Please note that Round(x) should return nearest integer value for x.
This one could work for you.
I think the key is to keep count of how many boxes you expect to have per check.
Say you want 33 checks in 100 boxes. 100 / 33 = 3.030303..., so you expect to have one check every 3.030303... boxes. That means every 3.030303... boxes, you need to add a check. 66 checks in 100 boxes would mean one check every 1.51515... boxes, 11 checks in 111 boxes would mean one check every 10.090909... boxes, and so on.
double ratio = (double) boxes / checks; // boxes per check, e.g. 100/33 = 3.0303...
double count = 0;
for (int i = 0; i < boxes; i++) {
    count += 1;
    if (count >= ratio) {
        checkboxes[i] = true;
        count -= ratio; // resetting the count but keeping the fractional part, to track "partial boxes" so far
    }
}
You might rather use decimal as opposed to double for count, or there's a slight chance the last box will get skipped due to rounding errors.
A Bresenham-like algorithm is suitable for distributing checkboxes evenly. The output of 'x' corresponds to a Y-coordinate change. It is possible to choose the initial err as a random value in the range [0..places) to avoid biasing.
def Distribute(places, stars):
    err = places // 2
    res = ''
    for i in range(0, places):
        err = err - stars
        if err < 0:
            res = res + 'x'
            err = err + places
        else:
            res = res + '-'
    print(res)

Distribute(24, 17)
Distribute(24, 12)
Distribute(24, 5)
output:
x-xxx-xx-xx-xxx-xx-xxx-x
-x-x-x-x-x-x-x-x-x-x-x-x
--x----x----x---x----x--
Quick html/javascript solution:
<html>
<body>
<div id='container'></div>
<script>
var cbCount = 111;
var cbCheckCount = 11;
var cbRatio = cbCount / cbCheckCount;
var buildCheckCount = 0;
var c = document.getElementById('container');
for (var i = 1; i <= cbCount; i++) {
    // make a checkbox
    var cb = document.createElement('input');
    cb.type = 'checkbox';
    var test = i / cbRatio - buildCheckCount;
    if (test >= 1) {
        // check the checkbox we just made
        cb.checked = 'checked';
        buildCheckCount++;
    }
    c.appendChild(cb);
    c.appendChild(document.createElement('br'));
}
</script>
</body></html>
Adapt code from one question's answer or another answer from earlier this month. Set N = x = number of checkboxes and M = y = number to be checked and apply formula (N*i+N)/M - (N*i)/M for section sizes. (Also see Joey Adams' answer.)
In python, the adapted code is:
N = 100; M = 33; p = 0
for i in range(M):
    k = (N + N*i) / M
    for j in range(p, k-1):
        print "-",
    print "x",
    p = k
which produces
- - x - - x - - x - - x - - [...] x - - x - - - x where [...] represents 25 --x repetitions.
With M=66 the code gives
x - x x - x x - x x - x x - [...] x x - x x - x - x where [...] represents mostly xx- repetitions, with one x- in the middle.
Note, in C or java: Substitute for (i=0; i<M; ++i) in place of for i in range(M):. Substitute for (j=p; j<k-1; ++j) in place of for j in range(p,k-1):.
Correctness: Note that M = x boxes get checked because print "x", is executed M times.
What about using a Fisher–Yates shuffle?
Make an array, shuffle it, and pick the first n elements. You do not need to shuffle all of them, just the first n of the array. Shuffling can be found in most language libraries.
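A minimal Python sketch of that partial shuffle (the helper name is mine; note this gives a uniformly random selection, not an evenly spaced one):
import random

def pick_random_subset(total, n):
    """Partial Fisher-Yates: shuffle only the first n slots, O(n) swaps."""
    idx = list(range(total))
    for i in range(n):
        j = random.randrange(i, total)  # pick from the not-yet-shuffled tail
        idx[i], idx[j] = idx[j], idx[i]
    return idx[:n]

print(sorted(pick_random_subset(100, 33)))  # 33 distinct indices out of 100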

Number of 1s in the two's complement binary representations of integers in a range

This problem is from the 2011 Codesprint (http://csfall11.interviewstreet.com/):
One of the basics of Computer Science is knowing how numbers are represented in 2's complement. Imagine that you write down all numbers between A and B inclusive in 2's complement representation using 32 bits. How many 1's will you write down in all ?
Input:
The first line contains the number of test cases T (<1000). Each of the next T lines contains two integers A and B.
Output:
Output T lines, one corresponding to each test case.
Constraints:
-2^31 <= A <= B <= 2^31 - 1
Sample Input:
3
-2 0
-3 4
-1 4
Sample Output:
63
99
37
Explanation:
For the first case, -2 contains 31 1's followed by a 0, -1 contains 32 1's and 0 contains 0 1's. Thus the total is 63.
For the second case, the answer is 31 + 31 + 32 + 0 + 1 + 1 + 2 + 1 = 99
I realize that you can use the fact that the number of 1s in -X is equal to the number of 0s in the complement of (-X) = X-1 to speed up the search. The solution claims that there is a O(log X) recurrence relation for generating the answer but I do not understand it. The solution code can be viewed here: https://gist.github.com/1285119
I would appreciate it if someone could explain how this relation is derived!
Well, it's not that complicated...
The single-argument solve(int a) function is the key. It is short, so I will cut&paste it here:
long long solve(int a)
{
    if (a == 0) return 0;
    if (a % 2 == 0) return solve(a - 1) + __builtin_popcount(a);
    return ((long long)a + 1) / 2 + 2 * solve(a / 2);
}
It only works for non-negative a, and it counts the number of 1 bits in all integers from 0 to a inclusive.
The function has three cases:
a == 0 -> returns 0. Obviously.
a even -> returns the number of 1 bits in a plus solve(a-1). Also pretty obvious.
The final case is the interesting one. So, how do we count the number of 1 bits from 0 to an odd number a?
Consider all of the integers between 0 and a, and split them into two groups: The evens, and the odds. For example, if a is 5, you have two groups (in binary):
000 (aka. 0)
010 (aka. 2)
100 (aka. 4)
and
001 (aka 1)
011 (aka 3)
101 (aka 5)
Observe that these two groups must have the same size (because a is odd and the range is inclusive). To count how many 1 bits there are in each group, first count all but the last bits, then count the last bits.
All but the last bits looks like this:
00
01
10
...and it looks like this for both groups. The number of 1 bits here is just solve(a/2). (In this example, it is the number of 1 bits from 0 to 2. Also, recall that integer division in C/C++ rounds down.)
The last bit is zero for every number in the first group and one for every number in the second group, so those last bits contribute (a+1)/2 one bits to the total.
So the third case of the recursion is (a+1)/2 + 2*solve(a/2), with appropriate casts to long long to handle the case where a is INT_MAX (and thus a+1 overflows).
This is an O(log N) solution. To generalize it to solve(a,b), you just compute solve(b) - solve(a), plus the appropriate logic for worrying about negative numbers. That is what the two-argument solve(int a, int b) is doing.
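The two-argument solve isn't reproduced here, but based on the description above, a minimal Python sketch of the generalization (assuming 32-bit two's complement; the function names are mine) could look like this:
def bits_upto(a):
    """1 bits written for all integers in [0, a], a >= 0 (mirrors solve)."""
    if a == 0:
        return 0
    if a % 2 == 0:
        return bits_upto(a - 1) + bin(a).count('1')
    return (a + 1) // 2 + 2 * bits_upto(a // 2)

def count_range(a, b):
    """1 bits written for all integers in [a, b] in 32-bit two's complement."""
    def neg_part(n):                 # ones contributed by n..-1, for n <= -1
        k = -n                       # how many negative values
        return 32 * k - bits_upto(k - 1)
    if a >= 0:
        return bits_upto(b) - (bits_upto(a - 1) if a > 0 else 0)
    if b < 0:
        return neg_part(a) - (neg_part(b + 1) if b < -1 else 0)
    return neg_part(a) + bits_upto(b)

print(count_range(-2, 0), count_range(-3, 4), count_range(-1, 4))  # 63 99 37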
Cast the array into a series of integers. Then for each integer do:
int NumberOfSetBits(int i)
{
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}
Also this is portable, unlike __builtin_popcount
See here: How to count the number of set bits in a 32-bit integer?
When a is positive, a good explanation has already been posted.
If a is negative, then on a 32-bit system each negative number x between a and zero contributes 32 ones minus the number of ones in -x - 1; summed over the range, that is 32 bits per number, less the ones counted from 0 up to -a - 1.
So, in a better way,
long long solve(int a) {
    if (a >= 0) {
        if (a == 0) return 0;
        else if ((a % 2) == 0) return solve(a - 1) + noOfSetBits(a);
        else return (2 * solve(a / 2)) + ((long long)a + 1) / 2;
    } else {
        a++;
        return ((long long)(-a) + 1) * 32 - solve(-a);
    }
}
In the following code, the bitsum of x is defined as the count of 1 bits in the two's complement representation of the numbers between 0 and x (inclusive), where Integer.MIN_VALUE <= x <= Integer.MAX_VALUE.
For example:
bitsum(0) is 0
bitsum(1) is 1
bitsum(2) is 2
bitsum(3) is 4
..etc
10987654321098765432109876543210 i % 10 for 0 <= i <= 31
00000000000000000000000000000000 0
00000000000000000000000000000001 1
00000000000000000000000000000010 2
00000000000000000000000000000011 3
00000000000000000000000000000100 4
00000000000000000000000000000101 ...
00000000000000000000000000000110
00000000000000000000000000000111 (2^i)-1
00000000000000000000000000001000 2^i
00000000000000000000000000001001 (2^i)+1
00000000000000000000000000001010 ...
00000000000000000000000000001011 x, 011 = x & (2^i)-1 = 3
00000000000000000000000000001100
00000000000000000000000000001101
00000000000000000000000000001110
00000000000000000000000000001111
00000000000000000000000000010000
00000000000000000000000000010001
00000000000000000000000000010010 18
...
01111111111111111111111111111111 Integer.MAX_VALUE
The formula of the bitsum is:
bitsum(x) = bitsum((2^i)-1) + 1 + x - 2^i + bitsum(x & (2^i)-1)
where 2^i is the highest power of two not exceeding x. Note that x - 2^i = x & (2^i)-1
Negative numbers are handled slightly differently than positive numbers. In this case the number of zeros is subtracted from the total number of bits:
Integer.MIN_VALUE <= x < -1
Total number of bits: 32 * -x.
The number of zeros in a negative number x is equal to the number of ones in -x - 1.
public class TwosComplement {
    //t[i] is the bitsum of (2^i)-1 for i in 0 to 31.
    private static long[] t = new long[32];
    static {
        t[0] = 0;
        t[1] = 1;
        int p = 2;
        for (int i = 2; i < 32; i++) {
            t[i] = 2*t[i-1] + p;
            p = p << 1;
        }
    }

    //count the bits between x and y inclusive
    public static long bitsum(int x, int y) {
        if (y > x && x > 0) {
            return bitsum(y) - bitsum(x-1);
        } else if (y >= 0 && x == 0) {
            return bitsum(y);
        } else if (y == x) {
            return Integer.bitCount(y);
        } else if (x < 0 && y == 0) {
            return bitsum(x);
        } else if (x < 0 && x < y && y < 0) {
            return bitsum(x) - bitsum(y+1);
        } else if (x < 0 && x < y && 0 < y) {
            return bitsum(x) + bitsum(y);
        }
        throw new RuntimeException(x + " " + y);
    }

    //count the bits between 0 and x
    public static long bitsum(int x) {
        if (x == 0) return 0;
        if (x < 0) {
            if (x == -1) {
                return 32;
            } else {
                long y = -(long)x;
                return 32 * y - bitsum((int)(y - 1));
            }
        } else {
            int n = x;
            int sum = 0;     //x & (2^i)-1
            int j = 0;
            int i = 1;       //i = 2^j
            int lsb = n & 1; //least significant bit
            n = n >>> 1;
            while (n != 0) {
                sum += lsb * i;
                lsb = n & 1;
                n = n >>> 1;
                i = i << 1;
                j++;
            }
            long tot = t[j] + 1 + sum + bitsum(sum);
            return tot;
        }
    }
}

An interview question: About Probability

An interview question:
Given a function f(x) that 1/4 times returns 0, 3/4 times returns 1.
Write a function g(x) using f(x) that 1/2 times returns 0, 1/2 times returns 1.
My implementation is:
function g(x) = {
    if (f(x) == 0) {        // 1/4
        var s = f(x)
        if (s == 1) {       // 3/4 * 1/4
            return s        // returns 1 with probability 3/16
        } else {
            return g(x)     // try again
        }
    } else {                // 3/4
        var k = f(x)
        if (k == 0) {       // 1/4 * 3/4
            return k        // returns 0 with probability 3/16
        } else {
            return g(x)     // try again
        }
    }
}
Am I right? What's your solution?(you can use any language)
If you call f(x) twice in a row, the following outcomes are possible (assuming that
successive calls to f(x) are independent, identically distributed trials):
00 (probability 1/4 * 1/4)
01 (probability 1/4 * 3/4)
10 (probability 3/4 * 1/4)
11 (probability 3/4 * 3/4)
01 and 10 occur with equal probability. So iterate until you get one of those
cases, then return 0 or 1 appropriately:
do {
    a = f(x); b = f(x);
} while (a == b);
return a;
It might be tempting to call f(x) only once per iteration and keep track of the two
most recent values, but that won't work. Suppose the very first roll is 1,
with probability 3/4. You'd loop until the first 0, then return 1 (with probability 3/4).
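A quick simulation (a Python sketch, with f simulated by a standard RNG) confirms both the 50/50 split and the expected 16/3 ≈ 5.33 calls to f() per result:
import random

def f():
    """Stand-in for f(x): returns 1 with probability 3/4."""
    return 1 if random.random() < 0.75 else 0

def g():
    while True:
        a, b = f(), f()
        if a != b:       # P(01) == P(10) == 3/16, so a is a fair bit
            return a

samples = [g() for _ in range(100_000)]
print(sum(samples) / len(samples))  # ~0.5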
The problem with your algorithm is that it repeats itself with high probability. My code:
function g(x) = {
    var s = f(x) + f(x) + f(x);
    // s = 0, probability  1/64
    // s = 1, probability  9/64
    // s = 2, probability 27/64
    // s = 3, probability 27/64
    if (s == 2) return 0;
    if (s == 3) return 1;
    return g(x); // probability to go into recursion = 10/64, with only 1 additional f(x) calculation
}
I've measured average number of times f(x) was calculated for your algorithm and for mine. For yours f(x) was calculated around 5.3 times per one g(x) calculation. With my algorithm this number reduced to around 3.5. The same is true for other answers so far since they are actually the same algorithm as you said.
P.S.: your definition doesn't mention 'random' at the moment, but probably it is assumed. See my other answer.
Your solution is correct, if somewhat inefficient and with more duplicated logic. Here is a Python implementation of the same algorithm in a cleaner form.
def g():
    while True:
        a = f()
        if a != f():
            return a
If f() is expensive you'd want to get more sophisticated with using the match/mismatch information to try to return with fewer calls to it. Here is the most efficient possible solution.
def g():
    lower = 0.0
    upper = 1.0
    while True:
        if 0.5 < lower:
            return 1
        elif upper < 0.5:
            return 0
        else:
            middle = 0.25 * lower + 0.75 * upper
            if 0 == f():
                lower = middle
            else:
                upper = middle
This takes about 2.6 calls to f() on average.
The way that it works is this. We're trying to pick a random number from 0 to 1, but we happen to stop as soon as we know whether the number is 0 or 1. We start knowing that the number is in the interval (0, 1). 3/4 of the numbers are in the bottom 3/4 of the interval, and 1/4 are in the top 1/4 of the interval. We decide which based on a call to f(x). This means that we are now in a smaller interval.
If we wash, rinse, and repeat enough times we can determine the number as precisely as we like, and we will have an absolutely equal probability of winding up in any region of the original interval. In particular we have an even probability of winding up bigger than or less than 0.5.
If you wanted you could repeat the idea to generate an endless stream of bits one by one. This is, in fact, provably the most efficient way of generating such a stream, and is the source of the idea of entropy in information theory.
Given a function f(x) that 1/4 times returns 0, 3/4 times returns 1
Taking this statement literally, f(x), if called four times, will always return zero once and 1 three times. This is different from saying f(x) is a probabilistic function and the 0 to 1 ratio will approach 1 to 3 (1/4 vs 3/4) over many iterations. If the first interpretation is valid, then the only valid function for f(x) that will meet the criteria regardless of where in the sequence you start from is the sequence 0111 repeating (or 1011 or 1101 or 1110, which are the same sequence from a different starting point). Given that constraint,
g()= (f() == f())
should suffice.
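That claim is easy to verify exhaustively, since under the literal interpretation f is just the repeating sequence 0111 read from some starting phase (a Python sketch):
from itertools import cycle

base = [0, 1, 1, 1]
for phase in range(4):
    stream = cycle(base[phase:] + base[:phase])
    f = stream.__next__
    results = [f() == f() for _ in range(20)]  # g() = (f() == f())
    print(phase, sum(results) / len(results))  # prints 0.5 for every phase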
As already mentioned, your definition is not that good regarding randomness: usually it means not only that the probabilities are right but also that the outputs are random. Otherwise you can simply write g(x) which will return 1,0,1,0,1,0,1,0 - it will return them 50/50, but the numbers won't be random.
Another cheating approach might be:
var invert = false;
function g(x) {
    invert = !invert;
    if (invert) return 1 - f(x);
    return f(x);
}
This solution will be better than all others since it calls f(x) only one time. But the results will not be very random.
A refinement of the same approach used in btilly's answer, achieving an average ~1.85 calls to f() per g() result (further refinement documented below achieves ~1.75; btilly's takes ~2.6, Jim Lewis's accepted answer ~5.33). Code appears lower in the answer.
Basically, I generate random integers in the range 0 to 3 with even probability: the caller can then test bit 0 for the first 50/50 value, and bit 1 for a second. Reason: the f() probabilities of 1/4 and 3/4 map onto quarters much more cleanly than halves.
Description of algorithm
btilly explained the algorithm, but I'll do so in my own way too...
The algorithm basically generates a random real number x between 0 and 1, then returns a result depending on which "result bucket" that number falls in:
result bucket result
x < 0.25 0
0.25 <= x < 0.5 1
0.5 <= x < 0.75 2
0.75 <= x 3
But, generating a random real number given only f() is difficult. We have to start with the knowledge that our x value should be in the range 0..1 - which we'll call our initial "possible x" space. We then hone in on an actual value for x:
each time we call f():
if f() returns 0 (probability 1 in 4), we consider x to be in the lower quarter of the "possible x" space, and eliminate the upper three quarters from that space
if f() returns 1 (probability 3 in 4), we consider x to be in the upper three-quarters of the "possible x" space, and eliminate the lower quarter from that space
when the "possible x" space is completely contained by a single result bucket, that means we've narrowed x down to the point where we know which result value it should map to and have no need to get a more specific value for x.
It may or may not help to consider this diagram :-):
"result bucket" cut-offs 0,.25,.5,.75,1
0=========0.25=========0.5==========0.75=========1 "possible x" 0..1
| | . . | f() chooses x < vs >= 0.25
| result 0 |------0.4375-------------+----------| "possible x" .25..1
| | result 1| . . | f() chooses x < vs >= 0.4375
| | | . ~0.58 . | "possible x" .4375..1
| | | . | . | f() chooses < vs >= ~.58
| | ||. | | . | 4 distinct "possible x" ranges
Code
int g() // return 0, 1, 2, or 3
{
    if (f() == 0) return 0;
    if (f() == 0) return 1;
    double low = 0.25 + 0.25 * (1.0 - 0.25);
    double high = 1.0;
    while (true)
    {
        double cutoff = low + 0.25 * (high - low);
        if (f() == 0)
            high = cutoff;
        else
            low = cutoff;
        if (high < 0.50) return 1;
        if (low >= 0.75) return 3;
        if (low >= 0.50 && high < 0.75) return 2;
    }
}
If helpful, an intermediary to feed out 50/50 results one at a time:
int h()
{
    static int i;
    if (!i)
    {
        int x = g();
        i = x | 4;
        return x & 1;
    }
    else
    {
        int x = i & 2;
        i = 0;
        return x ? 1 : 0;
    }
}
NOTE: This can be further tweaked by having the algorithm switch from considering an f()==0 result to hone in on the lower quarter, to having it hone in on the upper quarter instead, based on which on average resolves to a result bucket more quickly. Superficially, this seemed useful on the third call to f() when an upper-quarter result would indicate an immediate result of 3, while a lower-quarter result still spans probability point 0.5 and hence results 1 and 2. When I tried it, the results were actually worse. A more complex tuning was needed to see actual benefits, and I ended up writing a brute-force comparison of lower vs upper cutoff for second through eleventh calls to g(). The best result I found was an average of ~1.75, resulting from the 1st, 2nd, 5th and 8th calls to g() seeking low (i.e. setting low = cutoff).
Here is a solution based on the central limit theorem, originally due to a friend of mine:
/*
Given a function f(x) that 1/4 times returns 0, 3/4 times returns 1. Write a function g(x) using f(x) that 1/2 times returns 0, 1/2 times returns 1.
*/
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <cstdio>
using namespace std;
int f() {
    if (rand() % 4 == 0) return 0;
    return 1;
}

int main() {
    srand(time(0));
    int cc = 0;
    for (int k = 0; k < 1000; k++) { // number of different runs
        int c = 0;
        int limit = 10000; // the bigger the limit, the closer we approach 50 percent
        for (int i = 0; i < limit; ++i) c += f();
        cc += c < limit * 0.75 ? 0 : 1; // this indicator is 0 with probability ~50%
    }
    printf("%d\n", cc); // cc is gonna be around 500
    return 0;
}
Since each return of f() represents a 3/4 chance of TRUE, with some algebra we can just properly balance the odds. What we want is another function x() which returns a balancing probability of TRUE, so that
function g() {
    return f() && x();
}
returns true 50% of the time.
So let's find the probability of x (p(x)), given p(f) and our desired total probability (1/2):
p(f) * p(x) = 1/2
3/4 * p(x) = 1/2
p(x) = (1/2) / 3/4
p(x) = 2/3
So x() should return TRUE with a probability of 2/3, since 2/3 * 3/4 = 6/12 = 1/2;
Thus the following should work for g():
function g() {
    return f() && (rand() < 2/3);
}
Assuming
P(f[x] == 0) = 1/4
P(f[x] == 1) = 3/4
and requiring a function g[x] with the following assumptions
P(g[x] == 0) = 1/2
P(g[x] == 1) = 1/2
I believe the following definition of g[x] is sufficient (Mathematica)
g[x_] := If[f[x] + f[x + 1] == 1, 1, 0]
or, alternatively in C
int g(int x)
{
    return f(x) + f(x+1) == 1
        ? 1
        : 0;
}
This is based on the idea that invocations of {f[x], f[x+1]} would produce the following outcomes
{
{0, 0},
{0, 1},
{1, 0},
{1, 1}
}
Summing each of the outcomes we have
{
0,
1,
1,
2
}
where a sum of 1 represents 1/2 of the possible sum outcomes, with any other sum making up the other 1/2.
Edit.
As bdk says - {0,0} is less likely than {1,1} because
1/4 * 1/4 < 3/4 * 3/4
However, I am confused myself because given the following definition for f[x] (Mathematica)
f[x_] := Mod[x, 4] > 0 /. {False -> 0, True -> 1}
or alternatively in C
int f(int x)
{
return (x % 4) > 0
? 1
: 0;
}
then the results obtained from executing f[x] and g[x] seem to have the expected distribution.
Table[f[x], {x, 0, 20}]
{0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0}
Table[g[x], {x, 0, 20}]
{1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1}
This is much like the Monty Hall paradox.
In general.
Public Class Form1
    'the general case
    '
    'twiceThis = 2 is 1 in four chance of 0
    'twiceThis = 3 is 1 in six chance of 0
    '
    'twiceThis = x is 1 in 2x chance of 0
    Const twiceThis As Integer = 7
    Const numOf As Integer = twiceThis * 2

    Private Sub Button1_Click(ByVal sender As System.Object, _
                              ByVal e As System.EventArgs) Handles Button1.Click
        Const tries As Integer = 1000
        y = New List(Of Integer)
        Dim ct0 As Integer = 0
        Dim ct1 As Integer = 0
        Debug.WriteLine("")
        ''show all possible values of fx
        'For x As Integer = 1 To numOf
        '    Debug.WriteLine(fx)
        'Next
        'test that gx returns 50% 0's and 50% 1's
        Dim stpw As New Stopwatch
        stpw.Start()
        For x As Integer = 1 To tries
            Dim g_x As Integer = gx()
            'Debug.WriteLine(g_x.ToString) 'used to verify that gx returns 0 or 1 randomly
            If g_x = 0 Then ct0 += 1 Else ct1 += 1
        Next
        stpw.Stop()
        'the results
        Debug.WriteLine((ct0 / tries).ToString("p1"))
        Debug.WriteLine((ct1 / tries).ToString("p1"))
        Debug.WriteLine((stpw.ElapsedTicks / tries).ToString("n0"))
    End Sub

    Dim prng As New Random
    Dim y As New List(Of Integer)

    Private Function fx() As Integer
        '1 in numOf chance of zero being returned
        If y.Count = 0 Then
            'reload y
            y.Add(0) 'fx has only one zero value
            Do
                y.Add(1) 'the rest are ones
            Loop While y.Count < numOf
        End If
        'return a random value
        Dim idx As Integer = prng.Next(y.Count)
        Dim rv As Integer = y(idx)
        y.RemoveAt(idx) 'remove the value selected
        Return rv
    End Function

    Private Function gx() As Integer
        'a function g(x) using f(x) that 50% of the time returns 0
        ' and 50% of the time returns 1
        Dim rv As Integer = 0
        For x As Integer = 1 To twiceThis
            fx()
        Next
        For x As Integer = 1 To twiceThis
            rv += fx()
        Next
        If rv = twiceThis Then Return 1 Else Return 0
    End Function
End Class

Tickmark algorithm for a graph axis

I'm looking for an algorithm that places tick marks on an axis, given a range to display, a width to display it in, and a function to measure a string width for a tick mark.
For example, given that I need to display between 1e-6 and 5e-6 and a width to display in pixels, the algorithm would determine that I should put tickmarks (for example) at 1e-6, 2e-6, 3e-6, 4e-6, and 5e-6. Given a smaller width, it might decide that the optimal placement is only at the even positions, i.e. 2e-6 and 4e-6 (since putting more tickmarks would cause them to overlap).
A smart algorithm would give preference to tickmarks at multiples of 10, 5, and 2. Also, a smart algorithm would be symmetric around zero.
As I didn't like any of the solutions I've found so far, I implemented my own. It's in C# but it can be easily translated into any other language.
It basically chooses, from a list of possible steps, the smallest one that displays all values without leaving any value exactly on the edge; it lets you easily select which possible steps you want to use (without having to edit ugly if-else blocks), and it supports any range of values. I used a C# Tuple to return three values just for a quick and simple demonstration.
private static Tuple<decimal, decimal, decimal> GetScaleDetails(decimal min, decimal max)
{
    // Minimal increment to avoid rounding extreme values onto the edge of the chart
    decimal epsilon = (max - min) / 1e6m;
    max += epsilon;
    min -= epsilon;
    decimal range = max - min;

    // Target number of values to be displayed on the Y axis (it may be less)
    int stepCount = 20;
    // First approximation
    decimal roughStep = range / (stepCount - 1);

    // Set best step for the range
    decimal[] goodNormalizedSteps = { 1, 1.5m, 2, 2.5m, 5, 7.5m, 10 }; // keep the 10 at the end
    // Or use these if you prefer: { 1, 2, 5, 10 };

    // Normalize rough step to find the normalized one that fits best
    decimal stepPower = (decimal)Math.Pow(10, -Math.Floor(Math.Log10((double)Math.Abs(roughStep))));
    var normalizedStep = roughStep * stepPower;
    var goodNormalizedStep = goodNormalizedSteps.First(n => n >= normalizedStep);
    decimal step = goodNormalizedStep / stepPower;

    // Determine the scale limits based on the chosen step.
    decimal scaleMax = Math.Ceiling(max / step) * step;
    decimal scaleMin = Math.Floor(min / step) * step;

    return new Tuple<decimal, decimal, decimal>(scaleMin, scaleMax, step);
}

static void Main()
{
    // Dummy code to show a usage example.
    var minimumValue = data.Min();
    var maximumValue = data.Max();
    var results = GetScaleDetails(minimumValue, maximumValue);
    chart.YAxis.MinValue = results.Item1;
    chart.YAxis.MaxValue = results.Item2;
    chart.YAxis.Step = results.Item3;
}
Take the longest of the segments about zero (or the whole graph, if zero is not in the range) - for example, if you have something on the range [-5, 1], take [-5,0].
Figure out approximately how long this segment will be, in ticks. This is just dividing the length by the width of a tick. So suppose the method says that we can put 11 ticks in from -5 to 0. This is our upper bound. For the shorter side, we'll just mirror the result on the longer side.
Now try to put in as many (up to 11) ticks such that each tick falls at a mark of the form i*10*10^n, i*5*10^n, or i*2*10^n, where n is an integer and i is the index of the tick. Now it's an optimization problem - we want to maximize the number of ticks we can put in, while at the same time minimizing the distance between the last tick and the end of the result. So assign a score for getting as many ticks as we can, less than our upper bound, and assign a score to getting the last tick close to n - you'll have to experiment here.
In the above example, try n = 1. We get 1 tick (at i=0). n = 2 gives us 1 tick, and we're further from the lower bound, so we know that we have to go the other way. n = 0 gives us 6 ticks, at each integer point. n = -1 gives us 12 ticks (0, -0.5, ..., -5.0). n = -2 gives us 24 ticks, and so on. The scoring algorithm will give them each a score - higher means a better method.
Do this again for the i * 5 * 10^n, and i*2*10^n, and take the one with the best score.
(as an example scoring algorithm, say that the score is the distance to the last tick times the maximum number of ticks minus the number needed. This will likely be bad, but it'll serve as a decent starting point).
Funnily enough, just over a week ago I came here looking for an answer to the same question, but went away again and decided to come up with my own algorithm. I am here to share, in case it is of any use.
I wrote the code in Python to try and bust out a solution as quickly as possible, but it can easily be ported to any other language.
The function below calculates the appropriate interval (which I have allowed to be either 10**n, 2*10**n, 4*10**n or 5*10**n) for a given range of data, and then calculates the locations at which to place the ticks (based on which numbers within the range are divisible by the interval). I have not used the modulo % operator, since it does not work properly with floating-point numbers due to floating-point arithmetic rounding errors.
Code:
import math
def get_tick_positions(data: list):
    if len(data) == 0:
        return []
    retpoints = []
    data_range = max(data) - min(data)
    lower_bound = min(data) - data_range/10
    upper_bound = max(data) + data_range/10
    view_range = upper_bound - lower_bound
    num = lower_bound
    n = math.floor(math.log10(view_range) - 1)
    interval = 10**n
    num_ticks = 1
    while num <= upper_bound:
        num += interval
        num_ticks += 1
        if num_ticks > 10:
            if interval == 10 ** n:
                interval = 2 * 10 ** n
            elif interval == 2 * 10 ** n:
                interval = 4 * 10 ** n
            elif interval == 4 * 10 ** n:
                interval = 5 * 10 ** n
            else:
                n += 1
                interval = 10 ** n
            num = lower_bound
            num_ticks = 1
    if view_range >= 10:
        copy_interval = interval
    else:
        if interval == 10 ** n:
            copy_interval = 1
        elif interval == 2 * 10 ** n:
            copy_interval = 2
        elif interval == 4 * 10 ** n:
            copy_interval = 4
        else:
            copy_interval = 5
    first_val = 0
    prev_val = 0
    times = 0
    temp_log = math.log10(interval)
    if math.isclose(lower_bound, 0):
        first_val = 0
    elif lower_bound < 0:
        if upper_bound < -2*interval:
            if n < 0:
                copy_ub = round(upper_bound*10**(abs(temp_log) + 1))
                times = copy_ub // round(interval*10**(abs(temp_log) + 1)) + 2
            else:
                times = upper_bound // round(interval) + 2
        while first_val >= lower_bound:
            prev_val = first_val
            first_val = times * copy_interval
            if n < 0:
                first_val *= (10**n)
            times -= 1
        first_val = prev_val
        times += 3
    else:
        if lower_bound > 2*interval:
            if n < 0:
                copy_ub = round(lower_bound*10**(abs(temp_log) + 1))
                times = copy_ub // round(interval*10**(abs(temp_log) + 1)) - 2
            else:
                times = lower_bound // round(interval) - 2
        while first_val < lower_bound:
            first_val = times*copy_interval
            if n < 0:
                first_val *= (10**n)
            times += 1
    if n < 0:
        retpoints.append(first_val)
    else:
        retpoints.append(round(first_val))
    val = first_val
    times = 1
    while val <= upper_bound:
        val = first_val + times * interval
        if n < 0:
            retpoints.append(val)
        else:
            retpoints.append(round(val))
        times += 1
    retpoints.pop()
    return retpoints
When passing in the following three data-points to the function
points = [-0.00493, -0.0003892, -0.00003292]
... the output I get (as a list) is as follows:
[-0.005, -0.004, -0.003, -0.002, -0.001, 0.0]
When passing this:
points = [1.399, 38.23823, 8309.33, 112990.12]
... I get:
[0, 20000, 40000, 60000, 80000, 100000, 120000]
When passing this:
points = [-54, -32, -19, -17, -13, -11, -8, -4, 12, 15, 68]
... I get:
[-60, -40, -20, 0, 20, 40, 60, 80]
... which all seem to be a decent choice of positions for placing ticks.
The function is written to allow 5-10 ticks, but that could easily be changed if you so please.
It does not matter whether the list of data supplied is ordered or unordered, since only the minimum and maximum data points within the list matter.
This simple algorithm yields an interval that is 1, 2, or 5 times a power of 10, and the axis range gets divided into at least 5 intervals. The code sample is in Java:
protected double calculateInterval(double range) {
    double x = Math.pow(10.0, Math.floor(Math.log10(range)));
    if (range / x >= 5)
        return x;
    else if (range / (x / 2.0) >= 5)
        return x / 2.0;
    else
        return x / 5.0;
}
This is an alternative, for minimum 10 intervals:
protected double calculateInterval(double range) {
    double x = Math.pow(10.0, Math.floor(Math.log10(range)));
    if (range / (x / 2.0) >= 10)
        return x / 2.0;
    else if (range / (x / 5.0) >= 10)
        return x / 5.0;
    else
        return x / 10.0;
}
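To go from an interval to actual tick positions, the interval can be combined with the usual round-up-to-a-multiple trick; here is a Python sketch of the same 1-2-5 rule (the helper names are mine):
import math

def calculate_interval(rng):
    """1-2-5 rule, mirroring the first Java method (at least 5 intervals)."""
    x = 10.0 ** math.floor(math.log10(rng))
    if rng / x >= 5:
        return x
    elif rng / (x / 2.0) >= 5:
        return x / 2.0
    else:
        return x / 5.0

def ticks(lo, hi):
    step = calculate_interval(hi - lo)
    first = math.ceil(lo / step) * step        # round up to a multiple of step
    count = int((hi - first) / step + 1e-9) + 1
    return [first + i * step for i in range(count)]

print(ticks(0, 100))  # [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]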
I've been using the jQuery flot graph library. It's open source and does axis/tick generation quite well. I'd suggest looking at its code and pinching some ideas from there.
