How can I check if only one bit changed in a (bitwise?) value?

How can I check whether only 1 bit changed between a value and the next value?
For example, the output is:
001
101
110
In the second output, a 0 changed into a 1.
In the third output, a 0 changed into a 1 AND the last 1 also changed into a 0.
The program may only continue if there is exactly 1 change.

First, XOR the two numbers. XOR will return a 1 for every bit that changed.
Example:
0101110110100100
XOR
0100110110100100
would give you
0001000000000000
Now what you need is a quick way to check if there is only a single bit in your resulting number, or in other words, if the resulting number is a power of two.
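A quick test for that is: x != 0 && (x & (x - 1)) == 0. The x != 0 part is needed because x is 0 when nothing changed, and 0 also passes (x & (x - 1)) == 0.
No loops needed.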
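A minimal C sketch of the XOR-then-power-of-two test, assuming the values fit in an unsigned int. The name `exactly_one_bit_changed` is illustrative; the `x != 0` guard rejects the case where both values are equal.

```c
#include <stdbool.h>

/* Returns true when a and b differ in exactly one bit position. */
static bool exactly_one_bit_changed(unsigned int a, unsigned int b)
{
    unsigned int x = a ^ b;               /* 1 wherever a bit changed */
    return x != 0 && (x & (x - 1)) == 0;  /* non-zero power of two */
}
```

With the question's sequence, `exactly_one_bit_changed(1, 5)` (001 to 101) is true and `exactly_one_bit_changed(5, 6)` (101 to 110) is false.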

You can compute the bitwise XOR and then count the bits that are 1s. That count is known as the Hamming distance. For example:
unsigned int a = 0b001;
unsigned int b = 0b100;
unsigned int res;
/* Stores the number of different bits */
unsigned int acc;
res = a ^ b;
/* from https://graphics.stanford.edu/~seander/bithacks.html */
for (acc = 0; res; res >>= 1)
{
acc += res & 1;
}
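The same loop wrapped in a helper, so the program can continue only when it returns 1. This is a sketch; the name `hamming_distance` is illustrative:

```c
/* Number of bit positions in which a and b differ (Hamming distance).
   Counts the set bits of a ^ b one at a time, like the loop above. */
static unsigned int hamming_distance(unsigned int a, unsigned int b)
{
    unsigned int res = a ^ b;
    unsigned int acc = 0;
    for (; res; res >>= 1)
        acc += res & 1;   /* add the lowest bit, then shift it out */
    return acc;
}
```

On GCC and Clang, `__builtin_popcount(a ^ b)` returns the same count without an explicit loop.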

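In Java:
public class BitDiff {
    public static void main(String[] args) {
        boolean value = moreThanOneChanged("101", "001");
        System.out.println(value); // false: "101" and "001" differ in one position
    }
    // Returns true if the two bit strings differ in more than one position.
    // Strings of different lengths return false, as in the original.
    static boolean moreThanOneChanged(String input, String current) {
        if (input.length() != current.length()) return false;
        char[] first = input.toCharArray();
        char[] second = current.toCharArray();
        for (int i = 0, j = 0; i < input.length(); i++) {
            if (first[i] != second[i]) // count positions that CHANGED
                j++;
            if (j > 1)
                return true;
        }
        return false;
    }
}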

You can convince yourself fairly easily by ANDing the exclusive or (XOR) of the two values with that XOR minus 1. It is easier to visualize what takes place by looking at the binary representation of the values and results. Below, the function onebitoff performs the test; the other functions just provide a way to output the results:
#include <stdio.h>
#include <limits.h> /* for CHAR_BIT */
#define WDSZ 64
/** returns pointer to binary representation of 'n' zero padded to 'sz'.
* returns pointer to string containing binary representation of
* unsigned 64-bit (or less) value zero padded to 'sz' digits.
*/
char *cpbin (unsigned long n, int sz)
{
static char s[WDSZ + 1] = {0};
char *p = s + WDSZ;
int i;
for (i=0; i<sz; i++) {
p--;
*p = (n>>i & 1) ? '1' : '0';
}
return p;
}
/* return true if a and b differ in exactly one bit */
int onebitoff (unsigned int a, unsigned int b)
{
unsigned int x = a ^ b;
return x != 0 && (x & (x - 1)) == 0; /* x == 0 means no bit changed */
}
/* quick output of binary difference for 2 values */
void showdiff (unsigned int a, unsigned int b)
{
if (onebitoff (a, b))
printf ( " values %u, %u - vary by one-bit (bitwise)\n\n", a, b);
else
printf ( " values %u, %u - vary by other than one-bit (bitwise)\n\n", a, b);
printf (" %3u : %s\n", a, cpbin (a, sizeof (char) * CHAR_BIT));
printf (" %3u : %s\n", b, cpbin (b, sizeof (char) * CHAR_BIT));
printf (" xor : %s\n\n", cpbin ((a ^ b), sizeof (char) * CHAR_BIT));
}
int main () {
printf ("\nTest whether the following numbers vary by a single bit (bitwise)\n\n");
showdiff (1, 5);
showdiff (5, 6);
showdiff (6, 1);
showdiff (97, 105); /* just as a further test */
return 0;
}
output:
$ ./bin/bitsvary
Test whether the following numbers vary by a single bit (bitwise)
values 1, 5 - vary by one-bit (bitwise)
1 : 00000001
5 : 00000101
xor : 00000100
values 5, 6 - vary by other than one-bit (bitwise)
5 : 00000101
6 : 00000110
xor : 00000011
values 6, 1 - vary by other than one-bit (bitwise)
6 : 00000110
1 : 00000001
xor : 00000111
values 97, 105 - vary by one-bit (bitwise)
97 : 01100001
105 : 01101001
xor : 00001000

Related

Filter only digit sequences containing a given set of digits

I have a large list of digit strings like this one. The individual strings are relatively short (say less than 50 digits).
data = [
'300303334',
'53210234',
'123456789',
'5374576807063874'
]
I need to find an efficient data structure (speed first, memory second) and algorithm which returns only those strings that are composed of a given set of digits.
Example results:
filter(data, [0,3,4]) = ['300303334']
filter(data, [0,1,2,3,4,5]) = ['300303334', '53210234']
The data list will usually fit into memory.

For each digit, precompute a postings list of the indices of the strings that don't contain that digit.
postings = [[] for _ in range(10)]
for i, d in enumerate(data):
    for j in range(10):
        digit = str(j)
        if digit not in d:
            postings[j].append(i)
Now, to find all strings that contain, for example, only the digits [1, 3, 5], you can intersect the postings lists for the other digits (i.e. 0, 2, 4, 6, 7, 8, 9).
def intersect_postings(p0, p1):
    # Yield the indices present in both sorted iterators.
    # StopIteration must be caught explicitly inside a generator (PEP 479).
    try:
        i0, i1 = next(p0), next(p1)
        while True:
            if i0 == i1:
                yield i0
                i0, i1 = next(p0), next(p1)
            elif i0 < i1:
                i0 = next(p0)
            else:
                i1 = next(p1)
    except StopIteration:
        return

def find_all(digits):
    p = None
    for d in range(10):
        if d not in digits:
            if p is None:
                p = iter(postings[d])
            else:
                p = intersect_postings(p, iter(postings[d]))
    return (data[i] for i in p) if p is not None else iter(data)

print(list(find_all([0, 3, 4])))
print(list(find_all([0, 1, 2, 3, 4, 5])))

The set of digits in a string can be encoded as a 10-bit number (one bit per digit). There are 2^10, or 1,024, possible values.
So create a dictionary that uses an integer for a key and a list of strings for the value.
Calculate the value for each string and add that string to the list of strings for that value.
General idea:
Dictionary Lookup;
for each (string in list)
value = 0;
for each character in string
set bit N in value, where N is the character (0-9)
Lookup[value] += string // adds string to list for this value in dictionary
Then, to get a list of the strings that match your criteria, just compute the value and do a direct dictionary lookup.
So if the user asks for strings that contain only 3, 5, and 7:
value = (1 << 3) | (1 << 5) | (1 << 7);
list = Lookup[value];
Note that, as Matt pointed out in a comment below, this will only return strings that contain all three digits (and no others). So, for example, it wouldn't return 37. That seems like a fatal flaw to me.
Edit
If the number of symbols you have to deal with is very large, then the number of possible combinations becomes too large for this solution to be practical.
With a large number of symbols, I'd recommend an inverted index as suggested in the comments, combined with a secondary filter that removes the strings that contain extraneous digits.
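The encoding step can be sketched in C as follows. `digit_mask` and `query_mask` are illustrative names, and non-digit characters are simply skipped (an assumption, since the question only has digit strings):

```c
#include <stddef.h>
#include <stdint.h>

/* Encode the set of digits occurring in s as a 10-bit mask (bit d <-> digit d). */
static uint16_t digit_mask(const char *s)
{
    uint16_t m = 0;
    for (; *s; s++)
        if (*s >= '0' && *s <= '9')
            m |= (uint16_t)(1u << (*s - '0'));
    return m;
}

/* Mask for a set of requested digits, e.g. {3, 5, 7} -> (1 << 3) | (1 << 5) | (1 << 7). */
static uint16_t query_mask(const int *digits, size_t n)
{
    uint16_t m = 0;
    for (size_t i = 0; i < n; i++)
        m |= (uint16_t)(1u << digits[i]);
    return m;
}
```

With these, an exact-key lookup for {3, 5, 7} finds "357357" but not "37", which is the flaw noted above; testing `(digit_mask(s) & ~q) == 0` instead accepts every string whose digits are a subset of the query.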

Consider a function f which constructs a bitmask for each string with bit i set if digit i is in the string.
For example,
f('0') = 0b0000000001
f('00') = 0b0000000001
f('1') = 0b0000000010
f('1100') = 0b0000000011
Then I suggest storing a list of strings for each bitmask.
For example,
Bitmask 0b0000000001 -> ['0','00']
Once you have prepared this data structure (which is the same size as your original list), you can then easily access all the strings for a particular filter by accessing all lists where the bitmask is a subset of the digits in your filter.
So for your example of filter [0,3,4] you would return the lists from:
Strings containing just 0
Strings containing just 3
Strings containing just 4
Strings containing 0 and 3
Strings containing 0 and 4
Strings containing 3 and 4
Strings containing 0 and 3 and 4
Example Python Code
from collections import defaultdict
import itertools
raw_data = [
    '300303334',
    '53210234',
    '123456789',
    '5374576807063874'
]
def preprocess(raw_data):
    # map each digit-set bitmask to the list of strings with exactly that digit set
    data = defaultdict(list)
    for s in raw_data:
        bitmask = 0
        for digit in s:
            bitmask |= 1 << int(digit)
        data[bitmask].append(s)
    return data
def filter(data, mask):
    # yield the strings whose digit set is any non-empty subset of mask
    for r in range(len(mask)):
        for m in itertools.combinations(mask, r + 1):
            bitmask = sum(1 << digit for digit in m)
            for s in data.get(bitmask, []):
                yield s
data = preprocess(raw_data)
for a in filter(data, [0, 1, 2, 3, 4, 5]):
    print(a)
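As an alternative to generating the subsets with `itertools.combinations`, the non-empty submasks of the filter mask can be enumerated directly with the standard `s = (s - 1) & q` step. A C sketch; the function name and output-array interface are illustrative:

```c
#include <stddef.h>
#include <stdint.h>

/* Write every non-empty submask of q into out (at most 1023 entries for
   10 digits) and return how many were written. */
static size_t nonempty_submasks(uint16_t q, uint16_t *out)
{
    size_t n = 0;
    /* (s - 1) & q steps to the next smaller submask of q */
    for (uint16_t s = q; s != 0; s = (uint16_t)((s - 1) & q))
        out[n++] = s;
    return n;
}
```

For the filter [0, 3, 4] (mask 25) it produces the seven masks listed earlier; each one would then be used as a key into the precomputed mask-to-strings table.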

Just for kicks, I have coded up Jim's lovely algorithm and the Perl is here if anyone wants to play with it. Please do not accept this as an answer or anything, pass all credit to Jim:
#!/usr/bin/perl
use strict;
use warnings;
my $Debug=1;
my $Nwords=1000;
my ($word,$N,$value,$i,$j,$k);
my (@dictionary,%Lookup);
################################################################################
# Generate "words" with random number of characters 5-30
################################################################################
print "DEBUG: Generating $Nwords word dictionary\n" if $Debug;
for($i=0;$i<$Nwords;$i++){
    $j = rand(25) + 5; # length of this word
    $word="";
    for($k=0;$k<$j;$k++){
        $word = $word . int(rand(10));
    }
    $dictionary[$i]=$word;
    print "$word\n" if $Debug;
}
# Add some obvious test cases (post-increment, so no empty slot is left at index $Nwords)
$dictionary[$i++]="0" x 50;
$dictionary[$i++]="1" x 50;
$dictionary[$i++]="2" x 50;
$dictionary[$i++]="3" x 50;
$dictionary[$i++]="4" x 50;
$dictionary[$i++]="5" x 50;
$dictionary[$i++]="6" x 50;
$dictionary[$i++]="7" x 50;
$dictionary[$i++]="8" x 50;
$dictionary[$i++]="9" x 50;
$dictionary[$i++]="0123456789";
################################################################################
# Encode words
################################################################################
for $word (@dictionary){
    $value=0;
    for($i=0;$i<length($word);$i++){
        $N=substr($word,$i,1);
        $value |= 1 << $N;
    }
    push(@{$Lookup{$value}},$word);
    print "DEBUG: $word encoded as $value\n" if $Debug;
}
################################################################################
# Do lookups
################################################################################
while(1){
    print "Enter permitted digits, separated with commas: ";
    my $line=<STDIN>;
    last unless defined $line; # stop at end of input
    chomp $line;
    my @digits=split(",",$line);
    $value=0;
    for my $d (@digits){
        $value |= 1<<$d;
    }
    print "Value: $value\n";
    print join(", ",@{$Lookup{$value}}),"\n\n" if defined $Lookup{$value};
}

I like Jim Mischel's approach. It has pretty efficient lookup and bounded memory usage. Code in C follows:
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <readline/readline.h>
#include <readline/history.h>
enum {
zero = '0',
nine = '9',
numbers = nine - zero + 1,
masks = 1 << numbers,
};
typedef uint16_t mask;
struct list {
char *s;
struct list *next;
};
typedef struct list list_cell;
typedef struct list *list;
static inline int is_digit(char c) { return c >= zero && c <= nine; }
static inline mask char2mask(char c) { return 1 << (c - zero); }
static inline mask add_char2mask(mask m, char c) {
return m | (is_digit(c) ? char2mask(c) : 0);
}
static inline int is_set(mask m, mask n) { return (m & n) != 0; }
static inline int is_set_char(mask m, char c) { return is_set(m, char2mask(c)); }
static inline int is_submask(mask sub, mask m) { return (sub & m) == sub; }
static inline char *sprint_mask(char buf[11], mask m) {
char *s = buf;
char i;
for(i = zero; i <= nine; i++)
if(is_set_char(m, i)) *s++ = i;
*s = 0;
return buf;
}
static inline mask get_mask(char *s) {
mask m=0;
for(; *s; s++)
m = add_char2mask(m, *s);
return m;
}
static inline int is_empty(list l) { return !l; }
static inline list insert(list *l, char *s) {
list cell = (list)malloc(sizeof(list_cell));
cell->s = s;
cell->next = *l;
return *l = cell;
}
static void *foreach(void *f(char *, void *), list l, void *init) {
for(; !is_empty(l); l = l->next)
init = f(l->s, init);
return init;
}
struct printer_state {
int first;
FILE *f;
};
static void *prin_list_member(char *s, void *data) {
struct printer_state *st = (struct printer_state *)data;
if(st->first) {
fputs(", ", st->f);
} else
st->first = 1;
fputs(s, st->f);
return data;
}
static void print_list(list l) {
struct printer_state st = {.first = 0, .f = stdout};
foreach(prin_list_member, l, (void *)&st);
putchar('\n');
}
static list *init_lu(void) { return (list *)calloc(sizeof(list), masks); }
static list *insert2lu(list lu[masks], char *s) {
mask i, m = get_mask(s);
if(m) // skip string without any number
for(i = m; i < masks; i++)
if(is_submask(m, i))
insert(lu+i, s);
return lu;
}
int usage(const char *name) {
fprintf(stderr, "Usage: %s filename\n", name);
return EXIT_FAILURE;
}
#define handle_error(msg) \
do { perror(msg); exit(EXIT_FAILURE); } while (0)
static inline void chomp(char *s) { if( (s = strchr(s, '\n')) ) *s = '\0'; }
list *load_file(FILE *f) {
char *line = NULL;
size_t len = 0;
ssize_t read;
list *lu = init_lu();
for(; (read = getline(&line, &len, f)) != -1; line = NULL) {
chomp(line);
insert2lu(lu, line);
}
return lu;
}
void read_reqs(list *lu) {
char *line;
char buf[11];
for(; (line = readline("> ")); free(line))
if(*line) {
add_history(line);
mask m = get_mask(line);
printf("mask: %s\nstrings: ", sprint_mask(buf, m));
print_list(lu[m]);
};
putchar('\n');
}
int main(int argc, const char* argv[] ) {
const char *name = argv[0];
FILE *f;
list *lu;
if(argc != 2) return usage(name);
f = fopen(argv[1], "r");
if(!f) handle_error("open");
lu = load_file(f);
fclose(f);
read_reqs(lu);
return EXIT_SUCCESS;
}
To compile use
gcc -o digitfilter digitfilter.c -lreadline
And test run:
$ cat data.txt
300303334
53210234
123456789
5374576807063874
$ ./digitfilter data.txt
> 034
mask: 034
strings: 300303334
> 0,1,2,3,4,5
mask: 012345
strings: 53210234, 300303334
> 0345678
mask: 0345678
strings: 5374576807063874, 300303334

Put each value into a set, e.g. '300303334' = {3, 0, 4}.
Since the length of your data items is bounded by a constant (50),
you can do this in O(1) time for each item using a Java HashSet, so this phase adds up to O(n) overall.
For each filter set, use HashSet's containsAll() (filterSet.containsAll(itemSet)) to check whether
each data item is a subset of your filter. That takes O(n) per filter.
Overall it takes O(m*n), where n is the number of data items and m the number of filters.
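The same subset check can be done without hash sets by comparing 10-bit digit masks, which swaps `containsAll()` for a single AND. This is a C sketch of that substitution, not the Java approach above; `digit_mask` and `filter_subset` are illustrative names, and a real implementation would probably compute each string's mask once up front:

```c
#include <stddef.h>
#include <stdint.h>

/* Bit d is set when digit d occurs in s. */
static uint16_t digit_mask(const char *s)
{
    uint16_t m = 0;
    for (; *s; s++)
        if (*s >= '0' && *s <= '9')
            m |= (uint16_t)(1u << (*s - '0'));
    return m;
}

/* Copy into out every string whose digit set is a subset of the filter mask
   (the bitmask form of filterSet.containsAll(itemSet)); returns the count. */
static size_t filter_subset(const char *const *data, size_t n, uint16_t filter,
                            const char **out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if ((digit_mask(data[i]) & ~filter) == 0)  /* no digit outside filter */
            out[k++] = data[i];
    return k;
}
```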

Manually Converting rgba8 to rgba5551

I need to convert rgba8 to rgba5551 manually. I found some helpful code from another post and want to modify it to convert from rgba8 to rgba5551. I don't really have experience with bitwise stuff and haven't had any luck messing with the code myself.
#include <stdint.h>
void* rgba8888_to_rgba4444( void* src, int src_bytes)
{
// compute the actual number of pixel elements in the buffer.
int num_pixels = src_bytes / 4;
// exact-width types: unsigned long is 64 bits on many platforms, but a pixel is 32 bits
uint32_t* psrc = (uint32_t*)src;
uint16_t* pdst = (uint16_t*)src; // converts in place
// convert every pixel
for(int i = 0; i < num_pixels; i++){
// read a source pixel (R in bits 0-7, A in bits 24-31)
uint32_t px = psrc[i];
// keep the top 4 bits of each 8-bit channel and move them into place
unsigned r = (px << 8) & 0xf000;
unsigned g = (px >> 4) & 0x0f00;
unsigned b = (px >> 16) & 0x00f0;
unsigned a = (px >> 28) & 0x000f;
// and store
pdst[i] = r | g | b | a;
}
return pdst;
}

The value of RGBA5551 is that it has color info condensed into 16 bits (two bytes), with only one bit for the alpha channel (on or off). RGBA8888, on the other hand, uses a byte for each channel. (If you don't need an alpha channel, I hear RGB565 is better, as humans are more sensitive to green.) Now, with 5 bits, you get the numbers 0 through 31, so r, g, and b each need to be converted to some number between 0 and 31, and since they are originally a byte each (0-255), we multiply each by 31/255. Here is a function that takes RGBA bytes as input and outputs RGBA5551 as an unsigned 16-bit value:
#include <stdint.h>
uint16_t RGBA8888_to_RGBA5551(unsigned char r, unsigned char g, unsigned char b, unsigned char a){
unsigned char r5 = r*31/255; // All arithmetic is integer arithmetic, so results are truncated. If you want to round to the nearest integer, adjust this code accordingly.
unsigned char g5 = g*31/255;
unsigned char b5 = b*31/255;
unsigned char a1 = (a > 0) ? 1 : 0; // 1 if a is positive, 0 else. You must decide what is sensible.
// Now that we have our 5 bit r, g, and b and our 1 bit a, we need to shift them into place before combining.
uint16_t rShift = (uint16_t)(r5 << 11); // r5 looks like 00000000000vwxyz - 11 zeroes. Use an unsigned type: 31 << 11 = 63488 doesn't fit in a signed short. I've wasted time tracking down bugs where I didn't typecast properly before shifting.
uint16_t gShift = (uint16_t)(g5 << 6);
uint16_t bShift = (uint16_t)(b5 << 1);
// Combine and return
return (uint16_t)(rShift | gShift | bShift | a1);
}
You can, of course, condense this code.
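Combining the buffer loop from the question with the scaling above gives a whole-buffer converter. This sketch assumes the source pixels are stored as bytes in R, G, B, A order and writes to a separate output buffer rather than converting in place; the function name is illustrative:

```c
#include <stddef.h>
#include <stdint.h>

/* Convert n pixels from RGBA8888 (bytes R, G, B, A) to RGBA5551
   (R in bits 15-11, G in 10-6, B in 5-1, A in bit 0), using the same
   truncating c * 31 / 255 scaling and alpha rule as the function above. */
static void rgba8888_to_rgba5551(const uint8_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const uint8_t *p = src + 4 * i;
        uint16_t r5 = (uint16_t)(p[0] * 31 / 255);
        uint16_t g5 = (uint16_t)(p[1] * 31 / 255);
        uint16_t b5 = (uint16_t)(p[2] * 31 / 255);
        uint16_t a1 = p[3] > 0 ? 1 : 0;
        dst[i] = (uint16_t)((r5 << 11) | (g5 << 6) | (b5 << 1) | a1);
    }
}
```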

find the index of the highest bit set of a 32-bit number without loops obviously

Here's a tough one (at least I had a hard time :P):
find the index of the highest bit set of a 32-bit number without using any loops.

With recursion:
int firstset(uint32_t bits) { // uint32_t (from <stdint.h>) avoids undefined behavior from shifting into an int's sign bit
return (bits & 0x80000000u) ? 31 : firstset((bits << 1) | 1) - 1;
}
Assumes [31,..,0] indexing
Returns -1 if no bits set
| 1 prevents stack overflow by capping the number of shifts until a 1 is reached (32)
Not tail recursive :)

Very interesting question; I will provide an answer with a benchmark.
Solution using a loop
uint8_t highestBitIndex( uint32_t n )
{
uint8_t r = 0;
while ( n >>= 1 )
r++;
return r;
}
This helps to better understand the question, but it is inefficient.
Solution using log
This approach can also be summarized by the log method:
#include <math.h>
uint8_t highestSetBitIndex2(uint32_t n) { // n must not be 0: log(0) is -infinity
return (uint8_t)(log(n) / log(2));
}
However, it is even slower than the loop above (see benchmark).
Solution using built-in instruction
uint8_t highestBitIndex3( uint32_t n )
{
return 31 - __builtin_clz(n); // undefined behavior if n == 0
}
This solution, while very efficient, suffers from the fact that it only works with specific compilers (gcc and clang will do) and on specific platforms.
NB: It is 31 and not 32 if we want the index
Solution with intrinsic
#include <x86intrin.h>
uint8_t highestSetBitIndex5(uint32_t n)
{
return _bit_scan_reverse(n); // undefined behavior if n == 0
}
This compiles to the bsr instruction at the assembly level.
Solution using inline assembly
LZCNT and BSR can be used directly through inline assembly (GCC/Clang syntax, x86 only):
uint8_t highestSetBitIndex4(uint32_t n) // result undefined if n == 0
{
uint32_t r;
// BSR writes the index of the highest set bit of n into r
__asm__ ("bsr %1, %0" : "=r"(r) : "rm"(n) : "cc");
return (uint8_t)r;
}
uint8_t highestSetBitIndex7(uint32_t n) // returns 255 (31 - 32) if n == 0
{
uint32_t c;
// LZCNT counts leading zeros; CPUs without LZCNT silently execute it as BSR
__asm__ ("lzcnt %1, %0" : "=r"(c) : "rm"(n) : "cc");
return (uint8_t)(31 - c);
}
NB: Do Not Use unless you know what you are doing
Solution using lookup table and magic number multiplication (probably the best AFAIK)
First you use the following function to clear all the bits except the highest one:
uint32_t keepHighestBit( uint32_t n )
{
n |= (n >> 1);
n |= (n >> 2);
n |= (n >> 4);
n |= (n >> 8);
n |= (n >> 16);
return n - (n >> 1);
}
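Credit: the idea comes from Henry S. Warren, Jr.'s book Hacker's Delight.
Then we use an algorithm based on a de Bruijn-style multiplier: multiplying the isolated power of two by a magic constant leaves a distinct pattern in the top bits, which indexes a lookup table: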
uint8_t highestBitIndex8( uint32_t b )
{
static const uint32_t deBruijnMagic = 0x06EB14F9; // top 6 bits of (2^k * magic) index the table below
static const uint8_t deBruijnTable[64] = {
0, 0, 0, 1, 0, 16, 2, 0, 29, 0, 17, 0, 0, 3, 0, 22,
30, 0, 0, 20, 18, 0, 11, 0, 13, 0, 0, 4, 0, 7, 0, 23,
31, 0, 15, 0, 28, 0, 0, 21, 0, 19, 0, 10, 12, 0, 6, 0,
0, 14, 27, 0, 0, 9, 0, 5, 0, 26, 8, 0, 25, 0, 24, 0,
};
return deBruijnTable[(keepHighestBit(b) * deBruijnMagic) >> 26];
}
Another version:
void propagateBits(uint32_t *n) {
*n |= *n >> 1;
*n |= *n >> 2;
*n |= *n >> 4;
*n |= *n >> 8;
*n |= *n >> 16;
}
uint8_t highestSetBitIndex8(uint32_t b)
{
static const uint32_t Magic = (uint32_t) 0x07C4ACDD;
static const int BitTable[32] = {
0, 9, 1, 10, 13, 21, 2, 29,
11, 14, 16, 18, 22, 25, 3, 30,
8, 12, 20, 28, 15, 17, 24, 7,
19, 27, 23, 6, 26, 5, 4, 31,
};
propagateBits(&b);
return BitTable[(b * Magic) >> 27];
}
Benchmark with 100 million calls
compiling with g++ -std=c++17 highestSetBit.cpp -O3 && ./a.out
highestBitIndex1 136.8 ms (loop)
highestBitIndex2 183.8 ms (log(n) / log(2))
highestBitIndex3 10.6 ms (de Bruijn lookup Table with power of two, 64 entries)
highestBitIndex4 4.5 ms (inline assembly bsr)
highestBitIndex5 6.7 ms (intrinsic bsr)
highestBitIndex6 4.7 ms (gcc lzcnt)
highestBitIndex7 7.1 ms (inline assembly lzcnt)
highestBitIndex8 10.2 ms (de Bruijn lookup Table, 32 entries)
I would personally go for highestBitIndex8 if portability is your focus, else gcc built-in is nice.
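Because the lookup-table versions are easy to get wrong by a single table entry, it can be worth checking one against the plain loop. A C sketch that restates the 32-entry version and compares it with the loop on powers of two, all-ones values, 2^k + 1, and pseudo-random inputs from a simple linear congruential generator (the helper names are illustrative):

```c
#include <stdint.h>

/* Reference: the loop version from above (returns 0 for n == 0). */
static uint8_t ref_highest(uint32_t n)
{
    uint8_t r = 0;
    while (n >>= 1)
        r++;
    return r;
}

/* The 32-entry multiply-and-lookup version from above, restated. */
static uint8_t debruijn_highest(uint32_t b)
{
    static const uint8_t BitTable[32] = {
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31,
    };
    b |= b >> 1;   /* propagate the highest set bit downwards */
    b |= b >> 2;
    b |= b >> 4;
    b |= b >> 8;
    b |= b >> 16;
    return BitTable[(uint32_t)(b * 0x07C4ACDDu) >> 27];
}

/* Returns 1 if both versions agree on every test input, else 0. */
static int check_highest_bit(void)
{
    for (int k = 0; k < 32; k++) {
        uint32_t p = (uint32_t)1 << k;
        uint32_t tests[3] = { p, p | (p - 1), p + 1 };
        for (int t = 0; t < 3; t++)
            if (debruijn_highest(tests[t]) != ref_highest(tests[t]))
                return 0;
    }
    uint32_t x = 12345u;
    for (int i = 0; i < 100000; i++) {
        x = x * 1664525u + 1013904223u;   /* simple LCG */
        if (debruijn_highest(x) != ref_highest(x))
            return 0;
    }
    return 1;
}
```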

Floor of logarithm-base-two should do the trick (though you have to special-case 0).
Floor of log base 2 of 0001 is 0 (bit with index 0 is set).
Floor of log base 2 of 0010 is 1 (bit with index 1 is set).
Floor of log base 2 of 0011 is 1 (bit with index 1 is set).
Floor of log base 2 of 0100 is 2 (bit with index 2 is set).
And so on.
On an unrelated note, this is actually a pretty terrible interview question (I say this as someone who does technical interviews for potential candidates), because it really doesn't correspond to anything you do in practical programming.
Your boss isn't going to come up to you one day and say "hey, so we have a rush job for this latest feature, and it needs to be implemented without loops!"

You could do it like this (not optimised):
int index = 0;
uint32_t temp = number;
if ((temp >> 16) != 0) {
temp >>= 16;
index += 16;
}
if ((temp >> 8) != 0) {
temp >>= 8;
index += 8;
}
...
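Following the same pattern through all five halvings, a sketch of the complete version for a 32-bit value (the function name is illustrative; it returns 0 for both n == 0 and n == 1, so zero still has to be special-cased by the caller):

```c
#include <stdint.h>

/* Index of the highest set bit of n, found by halving the search range:
   16, 8, 4, 2, then 1 bit. Returns 0 for n == 0. */
static int highest_bit_index_branchy(uint32_t n)
{
    int index = 0;
    if (n >> 16) { n >>= 16; index += 16; }
    if (n >> 8)  { n >>= 8;  index += 8; }
    if (n >> 4)  { n >>= 4;  index += 4; }
    if (n >> 2)  { n >>= 2;  index += 2; }
    if (n >> 1)  {           index += 1; }
    return index;
}
```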

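Sorry for bumping an old thread, but how about this:
static inline int ilog2(unsigned long long i) {
union { float f; int i; } u = { i }; // converts i to float; assumes 32-bit IEEE-754 float and int
return (u.i>>23)-127; // exponent field minus the bias of 127
}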
...
int highest=ilog2(x); highest+=(x>>highest)-1;
// and in case you need it
int lowest = ilog2((x^x-1)+1)-1;
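The same exponent-field idea can be written without union type punning by copying the float's bits with `memcpy`, which is well-defined in both C and C++. This sketch assumes IEEE-754 single-precision floats and a 32-bit input, and folds in the rounding correction from the `highest+=...` line; the name `ilog2_via_float` is illustrative:

```c
#include <stdint.h>
#include <string.h>

/* floor(log2(x)) read from the exponent field of a float; -1 for x == 0.
   Assumes IEEE-754 binary32: 8-bit exponent biased by 127 above a 23-bit mantissa. */
static int ilog2_via_float(uint32_t x)
{
    if (x == 0)
        return -1;
    float f = (float)x;                /* may round up to the next power of two */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);    /* copy the float's bit pattern */
    int e = (int)(bits >> 23) - 127;   /* remove the exponent bias */
    if (((uint64_t)x >> e) == 0)       /* rounding overshot by one: undo it */
        e -= 1;
    return e;
}
```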

This can be done as a binary search, reducing the complexity from O(N) (for an N-bit word) to O(log(N)). A possible implementation is:
int highest_bit_index(uint32_t value)
{
if(value == 0) return 0;
int depth = 0;
int exponent = 16;
while(exponent > 0)
{
int shifted = value >> (exponent);
if(shifted > 0)
{
depth += exponent;
if(shifted == 1) return depth + 1;
value >>= exponent;
}
exponent /= 2;
}
return depth + 1;
}
The input is a 32-bit unsigned integer; the function returns the 1-based position of the highest set bit (1 for bit 0, 32 for bit 31), or 0 if no bit is set.
It has a loop that can be converted into 5 levels of if-statements, resulting in 32 or so if-statements. You could also use recursion to get rid of the loop, or the absolutely evil "goto" ;)

Let
n - the number whose highest set bit is to be found
start - the mask tested first, 1L << 31 (2147483648)
bitLocation - the 1-based position of the bit in start (32 on the first call), so the result is 1-based and 0 means no bit is set
public int highestBitSet(int n, long start, int bitLocation)
{
if (start == 0)
{
return 0;
}
if ((start & n) > 0)
{
return bitLocation;
}
else
{
return highestBitSet(n, (start >> 1), --bitLocation);
}
}
long i = 1;
long startIndex = (i << 31);
int bitLocation = 32;
int value = highestBitSet(64, startIndex, bitLocation);
System.out.println(value);

#include <stdio.h>
/* Assumes a 32-bit unsigned int; unsigned avoids undefined behavior when shifting into the sign bit. */
int high_bit_set(unsigned int n, int pos)
{
if(pos<0)
return -1;
else
return (0x80000000u & n)?pos:high_bit_set((n<<1),--pos);
}
int main(void)
{
unsigned int n=0x23;
int high_pos = high_bit_set(n,31);
printf("highest index = %d\n",high_pos);
return 0;
}
From main, call high_bit_set(n, pos) with the input value n and 31 (the highest position) as the starting pos; the function is shown above.

Paislee's solution is actually pretty easy to make tail-recursive, though it's a much slower solution than the suggested floor(log2(n)):
int firstset_tr(uint32_t bits, int final_dec) {
// pass in 0 for final_dec on first call, or use a helper function
if (bits & 0x80000000) {
return 31-final_dec;
} else {
return firstset_tr( ((bits << 1) | 1), final_dec+1 );
}
}
This function also works for other bit sizes; just change the check,
e.g.
if (bits & 0x80) { // for 8-bit
return 7-final_dec;
}

Note that what you are trying to do is calculate the integer log2 of an integer,
#include <stdio.h>
#include <stdlib.h>
unsigned int
Log2(unsigned long x)
{
unsigned long n = x;
int bits = sizeof(x)*8;
int step = 1; int k=0;
for( step = 1; step < bits; ) {
n |= (n >> step);
step *= 2; ++k;
}
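// n now has every bit below the highest set bit filled in, so n - (n >> 1)
// keeps only that bit (x - (n >> 1) would not). Note: this returns the value
// of the highest set bit, 2^floor(log2(x)), not its index, and it only fits
// the unsigned int return type for x < 2^32.
return(n - (n >> 1));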
}
Observe that you can attempt to search more than 1 bit at a time.
unsigned int
Log2_a(unsigned long x)
{
unsigned long n = x;
int bits = sizeof(x)*8;
int step = 1;
int step2 = 0;
//observe that you can move 8 bits at a time, and there is a pattern...
//if( x>1<<step2+8 ) { step2+=8;
//if( x>1<<step2+8 ) { step2+=8;
//if( x>1<<step2+8 ) { step2+=8;
//}
//}
//}
for( step2=0; x>1L<<step2+8; ) {
step2+=8;
}
//printf("step2 %d\n",step2);
for( step = 0; x>1L<<(step+step2); ) {
step+=1;
//printf("step %d\n",step+step2);
}
printf("log2(%lu) %d\n",x,step+step2);
return(step+step2);
}
This approach uses a binary search
unsigned int
Log2_b(unsigned long x)
{
unsigned long n = x;
unsigned int bits = sizeof(x)*8;
unsigned int hbit = bits-1;
unsigned int lbit = 0;
unsigned long guess = bits/2;
int found = 0;
while ( hbit-lbit>1 ) {
//printf("log2(%ld) %d<%d<%d\n",x,lbit,guess,hbit);
//when value between guess..lbit
if( (x<=(1L<<guess)) ) {
//printf("%ld < 1<<%d %ld\n",x,guess,1L<<guess);
hbit=guess;
guess=(hbit+lbit)/2;
//printf("log2(%ld) %d<%d<%d\n",x,lbit,guess,hbit);
}
//when value between hbit..guess
//else
if( (x>(1L<<guess)) ) {
//printf("%ld > 1<<%d %ld\n",x,guess,1L<<guess);
lbit=guess;
guess=(hbit+lbit)/2;
//printf("log2(%ld) %d<%d<%d\n",x,lbit,guess,hbit);
}
}
if( (x>(1L<<guess)) ) ++guess;
printf("log2(x%lu)=r%lu\n",x,guess);
return(guess);
}
Another binary search method, perhaps more readable,
unsigned int
Log2_c(unsigned long x)
{
unsigned long v = x;
unsigned int bits = sizeof(x)*8;
unsigned int step = bits;
unsigned int res = 0;
for( step = bits/2; step>0; )
{
//printf("log2(%ld) v %d >> step %d = %ld\n",x,v,step,v>>step);
while ( v>>step ) {
v>>=step;
res+=step;
//printf("log2(%ld) step %d res %d v>>step %ld\n",x,step,res,v);
}
step /= 2;
}
if( (x>(1L<<res)) ) ++res;
printf("log2(x%lu)=r%u\n",x,res);
return(res);
}
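And because you will want to test these (note that Log2_a, Log2_b and Log2_c as written return ceil(log2(x)), i.e. one more than the highest set bit index unless x is a power of two):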
int main()
{
unsigned long int x = 3;
for( x=2; x<1000000000; x*=2 ) {
//printf("x %ld, x+1 %ld, log2(x+1) %d\n",x,x+1,Log2(x+1));
printf("x %lu, x+1 %lu, log2_a(x+1) %u\n",x,x+1,Log2_a(x+1));
printf("x %lu, x+1 %lu, log2_b(x+1) %u\n",x,x+1,Log2_b(x+1));
printf("x %lu, x+1 %lu, log2_c(x+1) %u\n",x,x+1,Log2_c(x+1));
}
return(0);
}

Well, from what I know, the log function is implemented very efficiently in most programming languages, and even if it does contain loops internally, there are probably very few of them.
So I would say that in most cases using the log would be faster and more direct.
You do have to check for 0, though: log(0) returns negative infinity (a pole error), and converting that result to an integer is undefined behavior.

efficiently find the first element matching a bit mask

I have a list of N 64-bit integers whose bits represent small sets. Each integer has at most k bits set to 1. Given a bit mask, I would like to find the first element in the list that matches the mask, i.e. element & mask == element.
Example:
If my list is:
index abcdef
0 001100
1 001010
2 001000
3 000100
4 000010
5 000001
6 010000
7 100000
8 000000
and my mask is 111000, the first element matching the mask is at index 2.
Method 1:
Linear search through the entire list. This takes O(N) time and O(1) space.
Method 2:
Precompute a tree of all possible masks, and at each node keep the answer for that mask. This takes O(1) time for the query, but takes O(2^64) space.
Question:
How can I find the first element matching the mask faster than O(N), while still using a reasonable amount of space? I can afford to spend polynomial time in precomputation, because there will be a lot of queries. The key is that k is small. In my application, k <= 5 and N is in the thousands. The mask has many 1s; you can assume that it is drawn uniformly from the space of 64-bit integers.
Update:
Here is an example data set and a simple benchmark program that runs on Linux: http://up.thirld.com/binmask.tar.gz. For large.in, N=3779 and k=3. The first line is N, followed by N unsigned 64-bit ints representing the elements. Compile with make. Run with ./benchmark.e >large.out to create the true output, which you can then diff against. (Masks are generated randomly, but the random seed is fixed.) Then replace the find_first() function with your implementation.
The simple linear search is much faster than I expected. This is because k is small, and so for a random mask, a match is found very quickly on average.
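For reference, Method 1 in C, using the example list above with column a as the most significant bit (so the mask 111000 is 0x38); `find_first_linear` is an illustrative name:

```c
#include <stddef.h>
#include <stdint.h>

/* Method 1: index of the first element e with (e & mask) == e, i.e. every
   bit of e is also set in mask; returns n when nothing matches. */
static size_t find_first_linear(const uint64_t *list, size_t n, uint64_t mask)
{
    for (size_t i = 0; i < n; i++)
        if ((list[i] & mask) == list[i])
            return i;
    return n;
}
```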

A suffix tree (on bits) will do the trick, with the original priority at the leaf nodes:
000000 -> 8
1 -> 5
10 -> 4
100 -> 3
1000 -> 2
10 -> 1
100 -> 0
10000 -> 6
100000 -> 7
where if the bit is set in the mask, you search both arms, and if not, you search only the 0 arm; your answer is the minimum number you encounter at a leaf node.
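A sketch of this search using a plain (uncompressed) binary trie rather than a true suffix tree: each level branches on one bit, starting from the least significant, and each node stores the smallest list index below it so that subtrees which cannot improve the answer are skipped (that pruning is an addition, not part of the answer). `NBITS`, the struct, and the function names are assumptions for illustration; with real 64-bit data `NBITS` would be 64.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define NBITS 6  /* assumption: only the low 6 bits matter, as in the example */

/* Level b of the trie branches on bit b of the element. */
struct trie {
    struct trie *child[2];
    int min_index;  /* smallest list index stored in this subtree */
};

static struct trie *trie_node(void)
{
    struct trie *t = calloc(1, sizeof *t);
    if (!t) abort();
    t->min_index = INT_MAX;
    return t;
}

static void trie_insert(struct trie *t, uint64_t value, int index)
{
    for (int b = 0; b < NBITS; b++) {
        if (index < t->min_index) t->min_index = index;
        int bit = (int)((value >> b) & 1);
        if (!t->child[bit]) t->child[bit] = trie_node();
        t = t->child[bit];
    }
    if (index < t->min_index) t->min_index = index;  /* leaf */
}

/* Smallest index i with (list[i] & mask) == list[i], or best if none is
   smaller. If bit b of mask is 0, only the 0 arm can match; if it is 1,
   both arms are searched. */
static int trie_find(const struct trie *t, uint64_t mask, int b, int best)
{
    if (!t || t->min_index >= best) return best;  /* cannot improve */
    if (b == NBITS) return t->min_index;
    best = trie_find(t->child[0], mask, b + 1, best);
    if ((mask >> b) & 1)
        best = trie_find(t->child[1], mask, b + 1, best);
    return best;
}
```

Build it with one `trie_insert` per list element, then call `trie_find(root, mask, 0, INT_MAX)`; `INT_MAX` as the result means no element matched.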
You can improve this (marginally) by traversing the bits not in order but by maximum discriminability; in your example, note that 3 elements have bit 2 set, so you would create
2:0 0:0 1:0 3:0 4:0 5:0 -> 8
5:1 -> 5
4:1 5:0 -> 4
3:1 4:0 5:0 -> 3
1:1 3:0 4:0 5:0 -> 6
0:1 1:0 3:0 4:0 5:0 -> 7
2:1 0:0 1:0 3:0 4:0 5:0 -> 2
4:1 5:0 -> 1
3:1 4:0 5:0 -> 0
In your example mask this doesn't help (since you have to traverse both the bit2==0 and bit2==1 sides since your mask is set in bit 2), but on average it will improve the results (but at a cost of setup and more complex data structure). If some bits are much more likely to be set than others, this could be a huge win. If they're pretty close to random within the element list, then this doesn't help at all.
If you're stuck with essentially random bits set, you should get about (1-5/64)^32 benefit from the suffix tree approach on average (13x speedup), which might be better than the difference in efficiency due to using more complex operations (but don't count on it--bit masks are fast). If you have a nonrandom distribution of bits in your list, then you could do almost arbitrarily well.

This is the bitwise Kd-tree. It typically needs fewer than 64 visits per lookup operation. Currently, the selection of the bit (dimension) to pivot on is random.
#include <limits.h>
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
typedef unsigned long long Thing;
typedef unsigned long Number;
unsigned thing_ffs(Thing mask);
Thing rand_mask(unsigned bitcnt);
#define WANT_RANDOM 31
#define WANT_BITS 3
#define BITSPERTHING (CHAR_BIT*sizeof(Thing))
#define NONUMBER ((Number)-1)
struct node {
Thing value;
Number num;
Number nul;
Number one;
char pivot;
} *nodes = NULL;
unsigned nodecount=0;
unsigned itercount=0;
struct node * nodes_read( unsigned *sizp, char *filename);
Number *find_ptr_to_insert(Number *ptr, Thing value, Thing mask);
unsigned grab_matches(Number *result, Number num, Thing mask);
void initialise_stuff(void);
int main (int argc, char **argv)
{
Thing mask;
Number num;
unsigned idx;
srand (time(NULL));
nodes = nodes_read( &nodecount, argv[1]);
fprintf( stdout, "Nodecount=%u\n", nodecount );
initialise_stuff();
#if WANT_RANDOM
mask = nodes[nodecount/2].value | nodes[nodecount/3].value ;
#else
mask = 0x38;
#endif
fprintf( stdout, "\n#### Search mask=%llx\n", (unsigned long long) mask );
itercount = 0;
num = NONUMBER;
idx = grab_matches(&num,0, mask);
fprintf( stdout, "Itercount=%u\n", itercount );
fprintf(stdout, "KdTree search %16llx\n", (unsigned long long) mask );
fprintf(stdout, "Count=%u Result:\n", idx);
idx = num;
if (idx >= nodecount) idx = nodecount-1;
fprintf( stdout, "num=%4u Value=%16llx\n"
,(unsigned) nodes[idx].num
,(unsigned long long) nodes[idx].value
);
fprintf( stdout, "\nLinear search %16llx\n", (unsigned long long) mask );
for (idx = 0; idx < nodecount; idx++) {
if ((nodes[idx].value & mask) == nodes[idx].value) break;
}
fprintf(stdout, "Cnt=%u\n", idx);
if (idx >= nodecount) idx = nodecount-1;
fprintf(stdout, "Num=%4u Value=%16llx\n"
, (unsigned) nodes[idx].num
, (unsigned long long) nodes[idx].value );
return 0;
}
void initialise_stuff(void)
{
unsigned num;
Number root, *ptr;
root = 0;
for (num=0; num < nodecount; num++) {
nodes[num].num = num;
nodes[num].one = NONUMBER;
nodes[num].nul = NONUMBER;
nodes[num].pivot = -1;
}
nodes[num-1].value = 0; /* last node is guaranteed to match anything */
root = 0;
for (num=1; num < nodecount; num++) {
ptr = find_ptr_to_insert (&root, nodes[num].value, 0ull );
if (*ptr == NONUMBER) *ptr = num;
else fprintf(stderr, "Found %u for %u\n"
, (unsigned)*ptr, (unsigned) num );
}
}
Thing rand_mask(unsigned bitcnt)
{
Thing value = 0;
unsigned bit, cnt;
for (cnt=0; cnt < bitcnt; cnt++) {
bit = 54321u * (unsigned) rand(); /* unsigned arithmetic wraps instead of overflowing a signed int */
bit %= BITSPERTHING;
value |= 1ull << bit;
}
return value;
}
Number *find_ptr_to_insert(Number *ptr, Thing value, Thing done)
{
Number num=NONUMBER;
while ( *ptr != NONUMBER) {
Thing wrong;
num = *ptr;
wrong = (nodes[num].value ^ value) & ~done;
if (nodes[num].pivot < 0) { /* This node is terminal */
/* choose one of the wrong bits for a pivot .
** For this bit (nodevalue==1 && searchmask==0 )
*/
if (!wrong) wrong = ~done ;
nodes[num].pivot = thing_ffs( wrong );
}
ptr = (wrong & 1ull << nodes[num].pivot) ? &nodes[num].nul : &nodes[num].one;
/* Once this bit has been tested, it can be masked off. */
done |= 1ull << nodes[num].pivot ;
}
return ptr;
}
unsigned grab_matches(Number *result, Number num, Thing mask)
{
Thing wrong;
unsigned count;
for (count=0; num < *result; ) {
itercount++;
wrong = nodes[num].value & ~mask;
if (!wrong) { /* we have a match */
if (num < *result) { *result = num; count++; }
/* This is cheap pruning: the break will omit both subtrees from the results.
** But because we already have a result, and the subtrees have higher numbers
** than our current num, we can ignore them. */
break;
}
if (nodes[num].pivot < 0) { /* This node is terminal */
break;
}
if (mask & 1ull << nodes[num].pivot) {
/* avoid recursion if there is only one non-empty subtree */
if (nodes[num].nul >= *result) { num = nodes[num].one; continue; }
if (nodes[num].one >= *result) { num = nodes[num].nul; continue; }
count += grab_matches(result, nodes[num].nul, mask);
count += grab_matches(result, nodes[num].one, mask);
break;
}
mask |= 1ull << nodes[num].pivot;
num = (wrong & 1ull << nodes[num].pivot) ? nodes[num].nul : nodes[num].one;
}
return count;
}
unsigned thing_ffs(Thing mask)
{
unsigned bit;
#if 1
if (!mask) return (unsigned)-1;
for ( bit=random() % BITSPERTHING; 1 ; bit += 5, bit %= BITSPERTHING) {
if (mask & 1ull << bit ) return bit;
}
#elif 0
for (bit =0; bit < BITSPERTHING; bit++ ) {
if (mask & 1ull <<bit) return bit;
}
#else
mask &= (mask-1); // Kernighan-trick
for (bit =0; bit < BITSPERTHING; bit++ ) {
mask >>=1;
if (!mask) return bit;
}
#endif
return 0xffffffff;
}
struct node * nodes_read( unsigned *sizp, char *filename)
{
struct node *ptr;
unsigned size,used;
FILE *fp;
if (!filename) {
size = (WANT_RANDOM+0) ? WANT_RANDOM : 9;
ptr = malloc (size * sizeof *ptr);
#if (!WANT_RANDOM)
ptr[0].value = 0x0c;
ptr[1].value = 0x0a;
ptr[2].value = 0x08;
ptr[3].value = 0x04;
ptr[4].value = 0x02;
ptr[5].value = 0x01;
ptr[6].value = 0x10;
ptr[7].value = 0x20;
ptr[8].value = 0x00;
#else
for (used=0; used < size; used++) {
ptr[used].value = rand_mask(WANT_BITS);
}
#endif /* WANT_RANDOM */
*sizp = size;
return ptr;
}
fp = fopen( filename, "r" );
if (!fp) return NULL;
fscanf(fp,"%u\n", &size );
fprintf(stderr, "Size=%u\n", size);
ptr = malloc (size * sizeof *ptr);
for (used = 0; used < size; used++) {
fscanf(fp,"%llu\n", &ptr[used].value );
}
fclose( fp );
*sizp = used;
return ptr;
}
UPDATE:
I experimented a bit with pivot selection, favouring bits with the highest discriminatory value ("information content"). This involves:
making a histogram of bit usage (this can be done while initialising);
while building the tree, choosing the bit whose frequency in the remaining subtree is closest to 1/2.
The result: the random pivot selection performed better.

Construct a binary tree as follows:
Every level corresponds to a bit.
If the corresponding bit is on, go right; otherwise, go left.
Insert every number in the database this way.
Now, for searching: if the corresponding bit in the mask is 1, traverse both children; if it is 0, traverse only the left child. Essentially, keep traversing the tree until you hit the leaf nodes; every leaf you reach is a match (BTW, 0 is a hit for every mask!).
This tree has O(N) space requirements for a fixed word width (at most N*B nodes for B-bit values).
E.g. the tree for 1 (001), 2 (010) and 5 (101):
          root
         /    \
        0      1
       / \     |
      0   1    0
      |   |    |
      1   0    1
     (1) (2)  (5)

With precomputed bitmasks. Formally it is still O(N), since the AND-mask operations are O(N). The final pass is also O(N), because it needs to find the lowest bit set, but that could be sped up, too.
#include <limits.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* For demonstration purposes.
** In reality, this should be an unsigned long long */
typedef unsigned char Thing;
#define BITSPERTHING (CHAR_BIT*sizeof (Thing))
#define COUNTOF(a) (sizeof a / sizeof a[0])
Thing data[] =
/****** index abcdef */
{ 0x0c /* 0 001100 */
, 0x0a /* 1 001010 */
, 0x08 /* 2 001000 */
, 0x04 /* 3 000100 */
, 0x02 /* 4 000010 */
, 0x01 /* 5 000001 */
, 0x10 /* 6 010000 */
, 0x20 /* 7 100000 */
, 0x00 /* 8 000000 */
};
/* Note: this is for demonstration purposes.
** Normally, one should choose a machine wide unsigned int
** for bitmask arrays.
*/
struct bitmap {
unsigned char data[ 1+COUNTOF (data)/ CHAR_BIT ]; /* unsigned: setting bit 7 stays well defined */
} nulmaps [ BITSPERTHING ];
#define BITSET(a,i) (a)[(i) / CHAR_BIT ] |= (1u << ((i)%CHAR_BIT) )
#define BITTEST(a,i) ((a)[(i) / CHAR_BIT ] & (1u << ((i)%CHAR_BIT) ))
void init_tabs(void);
void map_empty(struct bitmap *dst);
void map_full(struct bitmap *dst);
void map_and2(struct bitmap *dst, struct bitmap *src);
int main (void)
{
Thing mask;
struct bitmap result;
unsigned ibit;
mask = 0x38;
init_tabs();
map_full(&result);
for (ibit = 0; ibit < BITSPERTHING; ibit++) {
/* bit in mask is 1, so bit at this position is in fact a don't care */
if (mask & (1u <<ibit)) continue;
/* bit in mask is 0, so we can only select items with a 0 at this bitpos */
map_and2(&result, &nulmaps[ibit] );
}
/* This is not the fastest way to find the lowest 1 bit */
for (ibit = 0; ibit < COUNTOF (data); ibit++) {
if (!BITTEST(result.data, ibit) ) continue;
fprintf(stdout, " %u", ibit);
}
fprintf( stdout, "\n" );
return 0;
}
void init_tabs(void)
{
unsigned ibit, ithing;
/* 1 bits in data that don't overlap with 1 bits in the searchmask are showstoppers.
** So, for each bitpos, we precompute a bitmask of all *entry numbers* from data[] that contain 0 in bitpos.
*/
memset(nulmaps, 0 , sizeof nulmaps);
for (ithing=0; ithing < COUNTOF(data); ithing++) {
for (ibit=0; ibit < BITSPERTHING; ibit++) {
if ( data[ithing] & (1u << ibit) ) continue;
BITSET(nulmaps[ibit].data, ithing);
}
}
}
/* Logical AND of two bitmask arrays; similar to dst &= src */
void map_and2(struct bitmap *dst, struct bitmap *src)
{
unsigned idx;
for (idx = 0; idx < COUNTOF(dst->data); idx++) {
dst->data[idx] &= src->data[idx] ;
}
}
void map_empty(struct bitmap *dst)
{
memset(dst->data, 0 , sizeof dst->data);
}
void map_full(struct bitmap *dst)
{
unsigned idx;
/* NOTE: this loop also sets the unused bits beyond COUNTOF(data) */
for (idx = 0; idx < COUNTOF(dst->data); idx++) {
dst->data[idx] = ~0;
}
}

clear all but the two most significant set bits in a word

Given a 32-bit int which is known to have at least 2 bits set, is there a way to efficiently clear all except the 2 most significant set bits? I.e. I want to ensure the output has exactly 2 bits set.
What if the input is guaranteed to have only 2 or 3 bits set?
Examples:
0x2040 -> 0x2040
0x0300 -> 0x0300
0x0109 -> 0x0108
0x5040 -> 0x5000
Benchmarking Results:
Code:
QueryPerformanceFrequency(&freq);
/***********/
value = (base =2)|1;
QueryPerformanceCounter(&start);
for (l=0;l<A_LOT; l++)
{
//!!value calculation goes here
junk+=value; //use result to prevent optimizer removing it.
//advance to the next 2|3 bit word
if (value&0x80000000)
{ if (base&0x80000000)
{ base=6;
}
base*=2;
value=base|1;
}
else
{ value<<=1;
}
}
QueryPerformanceCounter(&end);
time = (end.QuadPart - start.QuadPart);
time /= freq.QuadPart;
printf("--------- name\n");
printf("%ld loops took %f sec (%f additional)\n",A_LOT, time, time-baseline);
printf("words /sec = %f Million\n",A_LOT/(time-baseline)/1.0e6);
Results using VS2005 default release settings on a Core2Duo E7500 @ 2.93 GHz:
--------- BASELINE
1000000 loops took 0.001630 sec
--------- sirgedas
1000000 loops took 0.002479 sec (0.000849 additional)
words /sec = 1178.074206 Million
--------- ashelly
1000000 loops took 0.004640 sec (0.003010 additional)
words /sec = 332.230369 Million
--------- mvds
1000000 loops took 0.005250 sec (0.003620 additional)
words /sec = 276.242030 Million
--------- spender
1000000 loops took 0.009594 sec (0.007964 additional)
words /sec = 125.566361 Million
--------- schnaader
1000000 loops took 0.025680 sec (0.024050 additional)
words /sec = 41.580158 Million

If the input is guaranteed to have exactly 2 or 3 bits set, the answer can be computed very quickly. We exploit the fact that the expression x&(x-1) equals x with its lowest set bit cleared. Applying that expression twice to the input produces 0 if 2 or fewer bits are set. If exactly 2 bits are set, we return the original input. Otherwise, we return the original input with its lowest set bit cleared.
Here is the code in C++:
// assumes a has exactly 2 or 3 bits set
int topTwoBitsOf( int a )
{
int b = a&(a-1); // b = a with LSB cleared
return b&(b-1) ? b : a; // check if clearing the LSB of b produces 0
}
This can be written as a confusing single expression, if you like:
int topTwoBitsOf( int a )
{
return a&(a-1)&((a&(a-1))-1) ? a&(a-1) : a;
}

I'd create a mask in a loop. At the beginning, the mask is 0. Then go from the MSB to the LSB and set each corresponding bit in the mask to 1 until you have found 2 set bits. Finally, AND the value with this mask.
#include <stdio.h>
#include <stdlib.h>
int clear_bits(int value) {
unsigned int mask = 0;
unsigned int act_bit = 0x80000000;
unsigned int bit_set_count = 0;
do {
if ((value & act_bit) == act_bit) bit_set_count++;
mask = mask | act_bit;
act_bit >>= 1;
} while ((act_bit != 0) && (bit_set_count < 2));
return (value & mask);
}
int main() {
printf("0x2040 => %X\n", clear_bits(0x2040));
printf("0x0300 => %X\n", clear_bits(0x0300));
printf("0x0109 => %X\n", clear_bits(0x0109));
printf("0x5040 => %X\n", clear_bits(0x5040));
return 0;
}
This is quite complicated, but it should be more efficient than using a for loop over all 32 bits every time (and clearing all bits except the 2 most significant set ones). Anyway, be sure to benchmark different approaches before choosing one.
Of course, if memory is not a problem, use a lookup table approach as some have recommended; this will likely be much faster.

How much memory is available at what latency? I would propose a lookup table ;-)
But seriously: if you perform this on hundreds of numbers, an 8-bit lookup table giving the 2 MSBs and another 8-bit lookup table giving the 1 MSB may be all you need. Depending on the processor, this might beat actually counting bits.
For speed, I would create a lookup table mapping an input byte I to
M(I) = 0 if 0 or 1 bits are set,
M(I) = I' otherwise, where I' is I with all but its 2 most significant set bits cleared,
plus a second table giving only the most significant set bit of I.
Your 32-bit int consists of 4 input bytes I1 I2 I3 I4, with I1 the most significant.
Look up M(I1); if it is nonzero, you're done.
If M(I1) is zero and I1 is zero as well, repeat the previous step for I2.
Else (I1 has exactly one bit set), look up I2 in the second (1 MSB) table; if that is nonzero, OR it with I1 and you're done.
Else, repeat the previous step for I3.
Etc. Don't actually loop over I1-I4; unroll it fully.
Summing up: two lookup tables with 256 entries each; for uniformly distributed inputs, 247/256 of cases are resolved with one lookup, approx. 8/256 with two lookups, and so on.
edit: the tables, for clarity (I = input byte, table2 = its 2 most significant set bits, table1 = its most significant set bit):
I table2 table1
0 00000000 00000000
1 00000000 00000001
2 00000000 00000010
3 00000011 00000010
4 00000000 00000100
5 00000101 00000100
6 00000110 00000100
7 00000110 00000100
8 00000000 00001000
9 00001001 00001000
10 00001010 00001000
11 00001010 00001000
12 00001100 00001000
13 00001100 00001000
14 00001100 00001000
15 00001100 00001000
16 00000000 00010000
17 00010001 00010000
18 00010010 00010000
19 00010010 00010000
20 00010100 00010000
..
250 11000000 10000000
251 11000000 10000000
252 11000000 10000000
253 11000000 10000000
254 11000000 10000000
255 11000000 10000000

Here's another attempt (no loops, no lookup, no conditionals). This time it works:
var orig=0x109;
var x=orig;
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
x = orig & ~(x & ~(x >> 1));
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
var solution=orig & ~(x >> 1);
Console.WriteLine(solution.ToString("X")); //0x108
Could probably be shortened by someone cleverer than me.

Following up on my previous answer, here's the complete implementation. I think it is as fast as it can get. (sorry for unrolling the whole thing ;-)
#include <stdio.h>
#include <stdlib.h>
unsigned char bittable1[256];
unsigned char bittable2[256];
unsigned int lookup(unsigned int);
void gentable(void);
int main(int argc,char**argv)
{
unsigned int challenge = 0x42341223, result;
gentable();
if ( argc > 1 ) challenge = (unsigned int) strtoul(argv[1], NULL, 0); /* accepts decimal or 0x-prefixed hex */
result = lookup(challenge);
printf("%08x --> %08x\n",challenge,result);
return 0;
}
unsigned int lookup(unsigned int i)
{
unsigned int ret;
ret = bittable2[i>>24]<<24; if ( ret ) return ret;
ret = bittable1[i>>24]<<24;
if ( !ret )
{
ret = bittable2[i>>16]<<16; if ( ret ) return ret;
ret = bittable1[i>>16]<<16;
if ( !ret )
{
ret = bittable2[i>>8]<<8; if ( ret ) return ret;
ret = bittable1[i>>8]<<8;
if ( !ret )
{
return bittable2[i] | bittable1[i];
} else {
return (ret | bittable1[i&0xff]);
}
} else {
if ( bittable1[(i>>8)&0xff] )
{
return (ret | (bittable1[(i>>8)&0xff]<<8));
} else {
return (ret | bittable1[i&0xff]);
}
}
} else {
if ( bittable1[(i>>16)&0xff] )
{
return (ret | (bittable1[(i>>16)&0xff]<<16));
} else if ( bittable1[(i>>8)&0xff] ) {
return (ret | (bittable1[(i>>8)&0xff]<<8));
} else {
return (ret | (bittable1[i&0xff]));
}
}
}
void gentable()
{
int i;
for ( i=0; i<256; i++ )
{
int bitset = 0;
int j;
for ( j=128; j; j>>=1 )
{
if ( i&j )
{
bitset++;
if ( bitset == 1 ) bittable1[i] = i&(~(j-1));
else if ( bitset == 2 ) bittable2[i] = i&(~(j-1));
}
}
//printf("%3d %02x %02x\n",i,bittable1[i],bittable2[i]);
}
}

Using a variation of this, I came up with the following:
var orig=56;
var x=orig;
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
Console.WriteLine(orig&~(x>>2));
In C#, but it should translate easily.
EDIT
I'm not so sure I've answered your question. This keeps the highest set bit and the bit position right next to it, e.g. 101 => 100.

Here's some python that should work:
def bit_play(num):
    bits_set = 0
    upper_mask = 0
    bit_index = 31
    while bit_index >= 0:
        upper_mask |= (1 << bit_index)
        if num & (1 << bit_index) != 0:
            bits_set += 1
            if bits_set == 2:
                num &= upper_mask
                break
        bit_index -= 1
    return num
It makes one pass over the number, from the most significant bit down. It builds a mask of the bits it has crossed, so as soon as it hits the second set bit it can AND the number with that mask, clearing all lower bits, and stop.

I'd also use a table-based approach, but I believe one table alone should be sufficient. Take the 4-bit case as an example. If your input is guaranteed to have 2 or 3 bits set, then your output can only be one of 6 values:
0011
0101
0110
1001
1010
1100
Put these possible values in an array sorted by size. Starting with the largest, find the first value that is less than or equal to your input value. This is your answer. For the 8-bit version you'll have more possible return values, but still only C(8,2) = 28, half of 8*7.
public static final int [] MASKS = {
0x03, //0011
0x05, //0101
0x06, //0110
0x09, //1001
0x0A, //1010
0x0C, //1100
};
for (int i = 0; i < 16; ++i) {
if (Integer.bitCount(i) < 2) {
continue;
}
for (int j = MASKS.length - 1; j >= 0; --j) {
if (MASKS[j] <= i) {
System.out.println(Integer.toBinaryString(i) + " " + Integer.toBinaryString(MASKS[j]));
break;
}
}
}

Here's my implementation in C#
uint OnlyMostSignificant(uint value, int count) {
uint newValue = 0;
int c = 0;
for(uint high = 0x80000000; high != 0 && c < count; high >>= 1) {
if ((value & high) != 0) {
newValue = newValue | high;
c++;
}
}
return newValue;
}
The count parameter generalizes this: it keeps the count most significant set bits.

My solution:
Use "The best method for counting bits in a 32-bit integer", then clear the lower bit if the answer is 3. Only works when input is limited to 2 or 3 bits set.
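unsigned int c; // c is the total number of bits set in value
unsigned int v = value;
v = v - ((v >> 1) & 0x55555555);                      // 2-bit partial counts
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);       // 4-bit partial counts
c = (((v + (v >> 4)) & 0xF0F0F0F) * 0x1010101) >> 24; // total count
crc += value & (value - (c - 2)); // c == 2: keeps value; c == 3: clears its lowest set bit (crc is the benchmark's accumulator)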
