Shifting and Masking Binary Bits

I came across this snippet of code in a book:
public static short countBits(int x) {
    short numBit = 0;
    while (x != 0) {
        numBit += (x & 1);
        x >>>= 1;
    }
    return numBit;
}
However, I'm not really sure how numBit += (x&1); and x >>>= 1; work.
I think that numBit += (x&1) ANDs a single digit with 1. Does it mean that if my binary number is 10001, the function is ANDing the 1000"1" bit with 1 on the first iteration of the while loop?
Also, what's the point of >>>= 1? I think that ">>>" is shifting the bits to the right by three, but I can't figure out the purpose of doing so in this function.
Any help would be much appreciated. Thank you!

This function counts the number of bits that are set to 1. x & 1 is a bitwise AND with the least significant bit of x's current value: it yields 1 if x is odd and 0 if it is even, so it makes sense to add it to the result. x >>>= 1 is shorthand for x = x >>> 1, which shifts the bits in x one position to the right, filling the vacated sign bit with zero (an unsigned shift). In other words, it divides x by 2 as if x were unsigned, so the loop examines each bit in turn and terminates even for negative inputs.
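As an illustration, here is a minimal sketch of the same loop in Python (Python's integers are unbounded, so this version assumes a non-negative x; Java needs >>> precisely so that negative inputs also terminate):
def count_bits(x):
    # mirrors the Java loop above, assuming x >= 0
    num_bits = 0
    while x != 0:
        num_bits += x & 1  # add the least significant bit (0 or 1)
        x >>= 1            # drop that bit; the next one moves into place
    return num_bits

print(count_bits(0b10001))  # 2: two bits are set in 10001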

Related

Verify that a number can be decomposed into powers of 2

Is it possible to verify that a number can be decomposed into a sum of powers of 2 where the exponents are sequential?
Is there an algorithm to check this?
Example: x = 2^n + 2^(n+1) + ... + 2^m, where n >= 0 and m >= n.
The binary representation would have a single, consecutive group of 1 bits.
To check this, you could first isolate the least significant set bit (that is x & -x) and add it to the original value; the addition carries through a block of consecutive 1s, so if x was a single block the result is a power of 2 that shares no bits with x.
This leads to the following formula for a given x:
(x & (x + (x & -x))) == 0
This expression is also true when x is zero. If that case needs to be rejected as a solution, you need an extra condition for that.
In Python:
def f(x):
    return x > 0 and (x & (x + (x & -x))) == 0
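A few worked checks, hand-computed from the formula:
f(12)  # True:  lowest set bit is 100; 1100 + 100 = 10000 and 1100 & 10000 == 0
f(10)  # False: lowest set bit is 10; 1010 + 10 = 1100 and 1010 & 1100 != 0
f(0)   # False, because of the x > 0 guard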
This can be done in an elegant way using bitwise operations to check whether the binary representation of the number is a single block of consecutive 1 bits, followed by perhaps some 0s.
The expression x & (x - 1) replaces the lowest 1 in the binary representation of x with a 0. If we call that number y, then y | (y >> 1) sets each bit to be a 1 if it had a 1 to its immediate left. If the original number x was a single block of consecutive 1 bits, then the result is the same as the number x that we started with, because the 1 which was removed will be replaced by the shift. On the other hand, if x is not a single block of consecutive 1 bits, then the shift will add at least one other 1 bit that wasn't there in the original x, and they won't be equal.
That works if x has more than one 1 bit, so the shift can put back the one that was removed. If x has only a single 1 bit, then removing it will result in y being zero. So we can check for that, too.
In Python:
def is_sum_of_consecutive_powers_of_two(x):
    y = x & (x - 1)
    z = y | (y >> 1)
    return x == z or y == 0
Note that this returns True when x is zero, and that's the correct result if "a sum of consecutive powers of two" is allowed to be the empty sum. Otherwise, you will have to write a special case to reject zero.
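A short hand-computed trace of the function:
is_sum_of_consecutive_powers_of_two(12)  # True:  y = 1000, z = 1100 == x
is_sum_of_consecutive_powers_of_two(10)  # False: y = 1000, z = 1100 != x and y != 0
is_sum_of_consecutive_powers_of_two(8)   # True:  y = 0, the single-bit case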
A number can be represented as the sum of powers of 2 with sequential exponents iff its binary representation has all 1s adjacent.
E.g. the set of numbers that can be represented as 2^n + 2^(n-1), n >= 1, is exactly those with two adjacent ones in the binary representation.
just like this:
bool check(int x) { // x: the number you want to check
    int flag = 0;   // 0: no 1s seen yet, 1: inside the run of 1s, 2: the run has ended
    while (x) {
        if (x & 1) {
            if (flag == 2) return false; // a second run of 1s: not contiguous
            flag = 1;
        } else if (flag == 1) {
            flag = 2; // first 0 above the run
        }
        x >>= 1;
    }
    return true;
}
O(log n). Note this also returns true for zero; add a guard if the empty sum should be rejected.

Getting the maximum number in a set with special conditions

I recently encountered a problem and I'm having a hard time finding the answer.
This is the question:
Consider a set of numbers. There are three kinds of input:
1 x
2 x
3
The first command adds the integer x to the set.
The second one means: for every element y in the set, put
y = y xor x
and the last command prints the biggest number in the set. For instance:
10
3
1 7
3
2 4
2 8
2 3
1 10
1 3
3
2 1
results:
0
7
15
If n is the number of commands in the input, both n and x are bounded (the answers below assume x can be up to 10^9); also, there is a 1 second execution time limit!
My solution so far:
Let's call the set S and keep an integer m which is initially 0. As you know:
number = number xor x xor x
meaning that if we apply xor twice with the same value, its effect is reversed and the original number doesn't change. With that in mind, every time we insert a number (command 1) we do the following:
y = y xor m
add y to S
and every time we want to get a number from the set:
find y
y = y xor m
return y
and when command two arrives, we do the following:
m = m xor x
then the problem is almost solved, since we save the XORed version of the numbers on insertion and reverse the XOR when needed!
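A tiny Python sketch of that bookkeeping (the helper names are made up for illustration):
S, m = set(), 0

def insert(y):        # command "1 y"
    S.add(y ^ m)      # store the normalized value

def apply_xor(x):     # command "2 x"
    global m
    m ^= x            # O(1): nothing stored in S has to change

def current():        # recover the real values
    return {s ^ m for s in S}

insert(5)             # set is {5}, m is 0
apply_xor(3)          # m = 3; the set is conceptually {5 ^ 3} = {6}
insert(4)             # stored as 4 ^ 3 = 7
print(current())      # {4, 6}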
But the problem is to find the largest number in the set (note that the stored numbers differ from the original numbers) so that command 3 works correctly. I don't know how to do this efficiently, but I have an idea: if we store the binary representations of the numbers in a trie data structure, maybe we can quickly find the biggest number. I don't really know how, but this idea occurred to me.
So, to sum up, these are my issues:
problem 1:
how to find the biggest number in the revised list
problem 2:
is this trie idea good?
problem 3:
how can I implement it in code (the language is not very important here) so that it runs fast enough?
Also, what is the time complexity needed to solve this problem in the first place?
Thanks for reading my question.
Yes, your idea is correct; this can be solved in O(N log 10^9) using a binary trie data structure.
The idea is to store the numbers in binary notation, biggest bits first, so that while traversing the trie we can choose the branch that leads to the greatest answer.
We determine which branch to take bit by bit: if some trie node has two branches, 0 and 1, we choose the one that gives the better result after xoring with m.
Sample code (C++):
#include <bits/stdc++.h>
using namespace std;

int Trie[4000005][2]; // child indices; node 1 is the root, 0 means "no child"
int nxt = 2;          // next free node index

// Insert x into the trie, most significant bit first.
void Add(int x)
{
    bitset<32> b(x);
    int c = 1;
    for (int j = 31; j >= 0; j--)
        if (Trie[c][b[j]]) c = Trie[c][b[j]];
        else c = Trie[c][b[j]] = nxt++;
}

// Return the maximum of (stored value xor x) over all stored values.
int Get(int x)
{
    bitset<32> b(x), res(0);
    int c = 1;
    for (int j = 31; j >= 0; j--)
        if (Trie[c][!b[j]]) c = Trie[c][!b[j]], res[j] = !b[j]; // prefer the opposite bit
        else c = Trie[c][b[j]], res[j] = b[j];
    return res.to_ullong() ^ x;
}

int main()
{
    ios::sync_with_stdio(0); cin.tie(0); cout.tie(0);
    int q, m = 0;
    cin >> q;
    Add(0); // keep the trie non-empty so queries are always well-defined
    while (q--)
    {
        int type;
        cin >> type;
        if (type == 1)
        {
            int x;
            cin >> x;
            Add(x ^ m); // store the normalized value
        }
        else if (type == 2)
        {
            int x;
            cin >> x;
            m ^= x; // lazily xor the whole set
        }
        else cout << Get(m) << "\n";
    }
}
This is very similar to this problem and should be solvable in O(n), because the number of bits for x is constant (for 10^9 you will have to look at the 30 lowest bits).
At the start m = 0; each time you encounter the 2nd command you do m ^= x (m = m xor x).
Use a binary tree. Unlike in the linked question, the number of entries in a bucket doesn't matter; you just need to be able to tell whether there is a number with a given bit equal to one or zero. E.g. for 3-bit numbers 1, 4 and 5 the tree could look like this (left means the bit is 0, right means the bit is 1):
*
/ \
1 1 there are numbers with highest bit 0 and 1
/ /
1 1 of the numbers with 1st bit 0, there is a number with 2nd bit 0 and ...
\ / \
1 1 1 of the numbers with 1st and 2nd bit 0, there is a number with 3rd bit 1,...
1 4 5 (the numbers just to clarify)
So adding a number just means adding some edges and nodes.
To get the highest number in the set you go down the tree and through the bits of m and calculate the max x as follows:
1. Initialize node n as the root of the tree, i = 29 (the bit of m we are looking at) and the solution x = 0.
2. mi = (m & (1 << i)) >> i (1 if bit i of m is 1, 0 otherwise).
3. If n only has a 0-edge, or if mi == 1 and n has a 0-edge: n becomes the node connected by that edge and x = 2 * x + mi (or, more fancy: x = (x << 1) | mi).
4. Otherwise n becomes the node connected by the 1-edge and x = 2 * x + 1 - mi.
5. If i > 0: decrease i by 1 and continue with step 2.
An example for 3-bit numbers, m = 6 (110) and the numbers 1 (001), 4 (100) and 5 (101) in the set; the answer should be 7 (111), i.e. 1 xor 6. First we go left and x = 1, then we can only go left and x = 3, then we can only go right and x = 7.
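Here is a minimal Python sketch of that tree walk; it assumes 30-bit values and uses nested dicts as nodes (a real solution would use a flat array, as in the C++ answer above):
class XorTrie:
    BITS = 30  # enough for values up to 10^9

    def __init__(self):
        self.root = {}

    def add(self, x):
        node = self.root
        for i in range(self.BITS - 1, -1, -1):
            node = node.setdefault((x >> i) & 1, {})  # walk/create the path

    def max_xor(self, m):
        # at each level prefer the child whose bit differs from m's bit,
        # which sets that bit of (stored ^ m) to 1
        node, best = self.root, 0
        for i in range(self.BITS - 1, -1, -1):
            want = 1 - ((m >> i) & 1)
            if want in node:
                node, best = node[want], best | (1 << i)
            else:
                node = node[1 - want]  # every stored path has full depth
        return best

trie, m = XorTrie(), 0
for x in (1, 4, 5):      # commands "1 x" while m == 0
    trie.add(x ^ m)
m ^= 6                   # command "2 6"
print(trie.max_xor(m))   # command "3": prints 7, matching the example above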

Is there a name for this algorithm? (I've been calling it changeBinary)

DESCRIPTION:
You take a binary string as input.
The first bit of the output is the same as the first bit of the input.
Every bit after that is 0 if the bit at that index of the input string is the same as the bit at the previous index in the input string. Otherwise, it's 1.
For example,
Input: 00011000001010100001001000010011
Output: 00010100001111110001101100011010
Here is a simple JavaScript implementation:
var changeBinary = function(binaryString){
    var output = binaryString[0] === '0' ? '0' : '1';
    for (var i = 1; i < binaryString.length; i++){
        var nextBit = binaryString[i] === binaryString[i - 1] ? '0' : '1';
        output += nextBit;
    }
    return output;
};
OBSERVATIONS:
First, it seems that if you keep applying the algorithm to a string, it eventually returns to its original value. Second, the number of iterations it takes to do so seems to always be a power of 2 (including 2^0 = 1). For example, if you apply the changeBinary function above to the string above 32 times, it returns to the original value.
Has anyone ever encountered this before, and if so, do you know of any other information about it?
It just seems to me like this is something so simple and basic that someone must have studied it more in depth.
Any feedback would be greatly appreciated.
It may be interesting to know that this is x ^ (x << 1) on a BigInteger (or, if you limit the length of the strings, the same thing but on a fixed-size integer), also describable as clmul(x, 3).
Carryless multiplication, which is essentially just like normal multiplication, but instead of adding the partial products you XOR them, has some fairly nice properties, such as being commutative and associative. The associative property is especially of interest since it allows you to reason easily about what composing your algorithm with itself a couple of times does: for example
changeBinary o changeBinary is clmul(clmul(x, 3), 3) = clmul(x, clmul(3, 3)) = clmul(x, 5)
That it's a carryless multiplication by 3 also explains why it "undoes" itself when applied often enough: the carryless multiplicative inverse of 3 is the number with all bits set, which with 32 bits is 0xffffffff, and which can be formed as 3^31 (with carryless exponentiation). This also follows from the fact that a carryless square is a "bit-spread" that takes a bit string abcd to a0b0c0d: clpow(3, 32) = 1 because five squarings spread the two bits of 3 so far apart that only the original lsb still fits in a 32-bit number.
And that also gives a faster inversion, because the number with all bits set can be decomposed into a small number of (carryless) factors:
3 x 5 x 17 x 257 x 65537 ...
With a number of factors that is the base two logarithm of the number of bits (rounded up).
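A minimal Python sketch of carryless multiplication, checking both the clmul(x, 3) identity and the factorization above:
def clmul(a, b):
    # carryless multiply: XOR the shifted partial products instead of adding them
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

x = 0b00011000001010100001001000010011
assert clmul(x, 3) == x ^ (x << 1)   # changeBinary on an unbounded integer

p = 1
for f in (3, 5, 17, 257, 65537):     # the factors of the all-ones word
    p = clmul(p, f)
assert p == 0xffffffff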
Since x ^ (x >> 1) converts a number to Gray Code, I suppose you might call this a "mirrored" Gray Code. The same trick with the factors is used "in the mirror image" to convert a Gray Code back to binary:
x ^= x >> 1 // this is like a "mirror" of x = clmul(x, 3)
x ^= x >> 2 // 5
x ^= x >> 4 // 17
x ^= x >> 8
x ^= x >> 16
Here we just flip the direction of the shift to get:
x ^= x << 1
x ^= x << 2
x ^= x << 4
x ^= x << 8
x ^= x << 16
Which is clmul(x, 0xffffffff), and has also been called PS-XOR(x).
The algorithm you described is an example of Delta Encoding.

Compare two numbers for "likeness"

This is part of a search function on a website, so I'm trying to get to the end result as fast as possible.
I have a binary number where digit order matters:
Input Number = 01001
and a database of other binary numbers, all the same length:
01000, 10110, 00000, 11111
I don't know how to describe what I'm doing, so I'll show it visually below.
// Zeros mean nothing & the location of a 1 matters, not the total number of 1's.
input num > 0 1 0 0 1 = 2 possible matches
number[1] > 0 1 0 0 0 = 1 match = 50% match
number[2] > 1 0 1 1 0 = 0 match = 0% match
number[3] > 0 0 0 0 0 = 0 match = 0% match
number[4] > 1 1 1 1 1 = 2 match = 100% match
Now obviously, you could go digit by digit, number by number, and compare them that way (using a loop and so on). But I was hoping there might be an algorithm that could help. In the example above I only used 5-digit numbers, but I'm going to be routinely comparing around 100,000 numbers with 200 digits each, which is a lot of calculating.
I usually deal with PHP and MySQL, but if something spectacular comes up I could always learn.
If it's possible to somehow chop up your bitstrings into integer-size chunks, some elementary boolean arithmetic would do, and those kinds of instructions are generally pretty fast:
$matchmask = ~ ($inputval ^ $tomatch) & $inputval
What this does:
the xor determines the bits that differ between inputval and tomatch;
the negation gives a value where all bits that are equal in inputval and tomatch are set;
ANDing that with inputval leaves set only the bits that are 1 in both inputval and tomatch.
Then count the number of bits set in the result; see How to count the number of set bits in a 32-bit integer? for an optimal solution, easily translated into PHP.
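For illustration, here is a minimal Python sketch of that approach, with "likeness" measured against the 1 bits of the input as in the question's example:
def likeness(input_val, candidate):
    shared = ~(input_val ^ candidate) & input_val  # bits set in both
    total = bin(input_val).count("1")              # a simple popcount
    return 100.0 * bin(shared).count("1") / total if total else 0.0

for c in (0b01000, 0b10110, 0b00000, 0b11111):
    print(likeness(0b01001, c))  # 50.0, 0.0, 0.0, 100.0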
Instead of checking each bit, you could pre-process the input and determine which bits need checking. In the worst case, this devolves into processing each bit, but for a normal distribution, you'll save some processing.
That is, for input 01001, iterate over the database and determine whether number1[0] & input is non-zero, and whether (number1[3] >> 8) & input is non-zero, assuming 0 as the index of the LSB. How you get fast bit-shifting and ANDing with such large numbers is up to you, however. If the input has more 1s than 0s, you could always invert it and test for zero to detect coverage.
This will give you a modest improvement, but at best it's a constant-factor reduction of the problem. If most of your inputs are balanced between 0s and 1s, you'll halve the number of required operations; if they're more biased, you'll get better results.
Well, the first thing I can think of is a simple bitwise AND between the two numbers; you can then analyze the result to get the match percentage:
int result = x & input;   // keep only the 1 bits that both numbers share
if (result == input) {
    // 100% match: every 1 bit of input also appears in x
} else {
    result ^= input;
    /* The number of 1s now in result is the number of 1 bits of
     * "input" that are missing from x.
     */
}
Of course, for numbers wider than the machine word you'll need to implement your own AND and XOR functions (as written, this works only for 32-bit integers). Note that it works only with unsigned numbers.
Suppose the input number is called A (so in your example A = 01001) and the other number is x. You'll have a 100% match when x & A == A. Otherwise, for partial matches, the number of 1 bits can be counted like this (taken from Hacker's Delight):
x = (x & 0x55555555) + ((x >> 1) & 0x55555555);
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F);
x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF);
x = (x & 0x0000FFFF) + ((x >>16) & 0x0000FFFF);
Note this will work for 32-bit integers.
Let's assume you have a function bit1count; then, from what you describe, the "likeness" formula should be:
100.0 / min(bit1count(n1), bit1count(n2)) * bit1count(n1 & n2)
With n1 and n2 being the two numbers and & being the bitwise AND operator.
bit1count can easily be implemented using a loop or, more elegantly, using the algorithm provided in BigBear's answer.
There is actually a BIT_COUNT function in MySQL, so something like this should work:
SELECT 100.0 / IF(BIT_COUNT(n1) < BIT_COUNT(n2), BIT_COUNT(n1), BIT_COUNT(n2)) * BIT_COUNT(n1 & n2) FROM table

Fastest way to modify one digit of an integer

Suppose I have an int x = 54897, an old digit index (0-based, counted from the most significant digit), and the new value for that digit. What's the fastest way to get the new value?
Example
x = 54897
index = 3
value = 2
y = f(x, index, value) // => 54827
Edit: by fastest, I definitely mean better performance. No string processing.
In the simplest case, considering the digits numbered from LSB to MSB (the first one being 0) AND knowing the old digit, we could do something as simple as:
num += (new_digit - old_digit) * 10**pos;
For the real problem we would need:
1) to turn the given MSB-first index into an LSB-first pos, which could cost a log() or at most log10(MAX_INT) divisions by ten (and can be improved with a binary search);
2) the digit at that pos, which needs at most 2 divisions (or zero, using results from step 1).
You could also use the special FPU instruction from x86 that is able to store a float in BCD (I have no idea how slow it is).
UPDATE: the first step could be done even faster, without any divisions, with a binary search like this:
int my_log10(unsigned short n){
    // short: 0..64k -> 1..5 digits
    if (n < 1000){ // 1..3
        if (n < 10) return 1;
        if (n < 100) return 2;
        return 3;
    } else { // 4..5
        if (n < 10000) return 4;
        return 5;
    }
}
If your index started at the least significant digit, you could do something like
p = pow(10, index);
x = x / (p * 10) * (p * 10) + value * p + x % p;
But since your index is backwards, a string is probably the way to go. It would also be more readable and maintainable.
1. Calculate the "mask" M: 10 raised to the power of index, where index is a zero-based index from the right. If you need to index from the left, recalculate index accordingly.
2. Calculate the "prefix" PRE = x / (M * 10) * (M * 10).
3. Calculate the "suffix" SUF = x % M.
4. Calculate the new "middle part" MID = value * M.
5. Generate the new number new_x = PRE + MID + SUF.
P.S. ruslik's answer does it more elegantly :)
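A small Python sketch of those steps, using the question's example; the digit count comes from len(str(x)) purely for brevity here, since a loop or log10 (as in the answer below) avoids string processing:
def set_digit(x, index, value):
    n_digits = len(str(x))          # for brevity; use a loop/log10 to avoid strings
    pos = n_digits - 1 - index      # convert the MSB-first index to LSB-first
    m = 10 ** pos                   # the "mask" M
    pre = x // (m * 10) * (m * 10)  # PRE: digits above the target
    suf = x % m                     # SUF: digits below the target
    return pre + value * m + suf    # PRE + MID + SUF

print(set_digit(54897, 3, 2))  # 54827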
You need to start by figuring out how many digits are in your input. I can think of two ways of doing that: one with a loop and one with logarithms. Here's the loop version. It will fail for negative and zero inputs and when the index is out of bounds, and probably under other conditions too, but it's a starting point.
def f(x, index, value):
    place = 1
    residual = x
    while residual > 0:
        if index < 0:
            place *= 10
        index -= 1
        residual //= 10   # integer division (the original was Python 2)
    digit = (x // place) % 10
    return x - (place * digit) + (place * value)
P.S. This is working Python code. The principle of something simple like this is easy to work out, but the details are so tricky that you really need to iterate it a bit. In this case I started with the principle that I wanted to subtract out the old digit and add the new one; from there it was a matter of getting the correct multiplier.
You've got to get specific about your compute platform if you're talking about performance.
I would approach this by converting the number to BCD, i.e. pairs of decimal digits, 4 bits each.
Then I would find and modify the pair that needs changing, as a byte.
Then I would put the number back together.
There are assemblers that do this very well.
