Assembly language using signed int multiplication math to perform shifts

Assembly language using signed int multiplication math to perform shifts - algorithm

This is a bit of a turn around.
Usually one is attempting to use shifts to perform multiplication and not the other way around.
On the Hitachi/Motorola 6309 there is no shift by n bits. There is only shift by 1 bit.
However there is a 16 bit x 16 bit signed multiply (provides a 32 bit signed result).
(EDIT) Using this is no problem for a 16 bit shift (left) however I'm trying to use 2 x 16x16 signed mults to do a 32 bit shift. The high order word of the result for the low order word shift is the problem. (Does that make sence?)
Some pseudo code might help:
result.highword = low word of (val.highword * shiftmulttable[shift])
temp = val.lowword * shiftmulttable[shift]
result.lowword = temp.lowword
result.highword = or (result.highword, temp.highword)
(with some magic on temp.highword to consider signed values)
I have been exercising my logic in an attempt to use this instruction to perform the shifts but so far I have failed.
I can easily achieve any positive value shifts by 0 to 14 but when it comes to shifting by 15 bits (mult by 0x8000) or shifting any negative values certain combinations of values require either:
complementing the result by 1
complementing the result by 2
adding 1 to the result
doing nothing to the result
And I just can't see any pattern to these values.
Any ideas appreciated!

Best I can tell from the problem description, implementing the 32-bit shift would work as desired by using an unsigned 16x16->32 bit multiply. This can easily be synthesized from a signed 16x16->32 multiply instruction by exploiting the two's complement integer representation. If the two factors are a and b, adding b to the high-order 16 bits of the signed product when a is negative, and adding a to the high-order 16 bits of the signed product when b is negative will give us the unsigned multiplication result.
The following C code implements this approach and tests it exhaustively:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
/* signed 16x16->32 bit multiply. Hardware instruction */
int32_t mul16_wide (int16_t a, int16_t b)
{
return (int32_t)a * (int32_t)b;
}
/* unsigned 16x16->32 bit multiply (synthetic) */
int32_t umul16_wide (int16_t a, int16_t b)
{
int32_t p = mul16_wide (a, b); // signed 16x16->32 bit multiply
if (a < 0) p = p + (b << 16); // add 'b' to upper 16 bits of product
if (b < 0) p = p + (a << 16); // add 'a' to upper 16 bits of product
return p;
}
/* unsigned 16x16->32 bit multiply (reference) */
uint32_t umul16_wide_ref (uint16_t a, uint16_t b)
{
return (uint32_t)a * (uint32_t)b;
}
/* test synthetic unsigned multiply exhaustively */
int main (void)
{
int16_t a, b;
int32_t res, ref;
uint64_t count = 0;
a = -32768;
do {
b = -32768;
do {
res = umul16_wide (a, b);
ref = umul16_wide_ref (a, b);
count++;
if (res != ref) {
printf ("!!!! a=%d b=%d res=%d ref=%d\n", a, b, res, ref);
return EXIT_FAILURE;
}
if (b == 32767) break;
b = b + 1;
} while (1);
if (a == 32767) break;
a = a + 1;
} while (1);
printf ("test cases passed: %llx\n", count);
return EXIT_SUCCESS;
}
I am not familiar with the Hitachi/Motorola 6309 architecture. I assume it uses a special 32-bit register to hold the result of a wide multiply, from which high and low half can be extracted into 16-bit general-purpose registers, and the conditional corrections can then be applied to the register holding the upper 16 bits.

Are you using fixed-point multiplicative inverses to use the high half result for a right shift?
If you're just left-shifting, multiply by 0x8000 should work. The low half of an NxN => 2N-bit multiply is the same whether inputs are treated as signed or unsigned. Or do you need a 32-bit shift result from your 16-bit input?
Is the multiply instruction actually faster than a few 1-bit shifts for small shift counts? (I wouldn't be surprised if compile-time-constant counts of 2 or 3 would be faster with just a chain of 2 or 3 add same,same or left-shift instructions.)
Anyway, for a compile-time-constant shift count of 15, maybe just multiply by 1<<14 and then do the last count with a 1-bit shift (add same,same).
Or if your ISA has rotates, rotate right by 1 and mask away the low bits, skipping the multiply. Or zero a register, right-shift the low bit into the carry flag, then rotate-through-carry into the top of the zeroed register.
(The latter might be useful on an ISA that doesn't have large immediates and couldn't "mask away all the low bits" in one instruction. Or an ISA that only has RCR not ROR. I don't know 6309 at all)
If you're using a runtime count to look up a multiplier from a table, maybe branch for that case, or adjust your LUT so every entry needs an extra 1-bit shift, so you can do mul(lut[count]) and an unconditional extra shift.
(Only works if you don't need to support a shift-count of zero.)

Not that there would be many interested people who would want to see the 6309 code, but here it is:
Compliant with OS9 C ABI.
Pointer to result and arguments pushed on stack right to left.
U,PC,val(4bytes),shift(2bytes),*result(2bytes)
0 2 4 8 10
:
* 10,s pointer to long result
* 4,s 4 byte value
* 8,s 2 byte shift
* x = pointer to result
pshs u
ldx 10,s * load pointer to result
ldd 8,s * load shift
* if shift amount is greater than 31 then
* just return zero. OS9 C standard.
cmpd #32
blt _10x
ldq #0
stq 4,s
bra _13x
* if shift amount is greater than 16 than
* move bottom word of value into top word
* and clear bottom word
_10x
cmpb #16
blt _1x
ldu 6,s
stu 4,s
clr 6,s
clr 7,s
_1x
* setup pointer u and offset e into mult table _2x
leau _2x,pc
andb #15
* if there is no shift value just return value
beq _13x
aslb * need to double shift to use as word table offset
stb 8,s * save double shft
tfr b,e
* shift top word q = val.word.high * multtab[shft]
ldd 4,s
muld e,u
stw ,x * result.word.high = low word of mult
* shift bottom word q = val.word.low * multtab[shft]
lde 8,s * reload double shft
ldd 6,s
muld e,u
stw 2,x * result.word.low = low word of mult
* The high word or mult needs to be corrected for sign
* if val is negative then muld will return negated results
* and need to un negate it
lde 8,s * reload double shift
tst 4,s * test top byte of val for negative
bge _11x
addd e,u * add the multtab[shft] again to top word
_11x
* if multtab[shft] is negative (shft is 15 or shft<<1 is 30)
* also need to un negate result
cmpe #30
bne _12x
addd 6,s * add val.word.low to top word
_12x
* combine top and bottom and save bottom half of result
ord ,x
std ,x
bra _14x
* this is only reached if the result is in value (let result = value)
_13x
ldq 4,s * load value
stq ,x * result = value
_14x
puls u,pc
_2x fdb $01,$02,$04,$08,$10,$20,$40,$80,$0100,$0200,$0400,$0800
fdb $1000,$2000,$4000,$8000

Related

How to divide by 9 using just shifts/add/sub?

Last week I was in an interview and there was a test like this:
Calculate N/9 (given that N is a positive integer), using only
SHIFT LEFT, SHIFT RIGHT, ADD, SUBSTRACT instructions.

first, find the representation of 1/9 in binary
0,0001110001110001
means it's (1/16) + (1/32) + (1/64) + (1/1024) + (1/2048) + (1/4096) + (1/65536)
so (x/9) equals (x>>4) + (x>>5) + (x>>6) + (x>>10) + (x>>11)+ (x>>12)+ (x>>16)
Possible optimization (if loops are allowed):
if you loop over 0001110001110001b right shifting it each loop,
add "x" to your result register whenever the carry was set on this shift
and shift your result right each time afterwards,
your result is x/9
mov cx, 16 ; assuming 16 bit registers
mov bx, 7281 ; bit mask of 2^16 * (1/9)
mov ax, 8166 ; sample value, (1/9 of it is 907)
mov dx, 0 ; dx holds the result
div9:
inc ax ; or "add ax,1" if inc's not allowed :)
; workaround for the fact that 7/64
; are a bit less than 1/9
shr bx,1
jnc no_add
add dx,ax
no_add:
shr dx,1
dec cx
jnz div9
( currently cannot test this, may be wrong)

you can use fixed point math trick.
so you just scale up so the significant fraction part goes to integer range, do the fractional math operation you need and scale back.
a/9 = ((a*10000)/9)/10000
as you can see I scaled by 10000. Now the integer part of 10000/9=1111 is big enough so I can write:
a/9 = ~a*1111/10000
power of 2 scale
If you use power of 2 scale then you need just to use bit-shift instead of division. You need to compromise between precision and input value range. I empirically found that on 32 bit arithmetics the best scale for this is 1<<18 so:
(((a+1)<<18)/9)>>18 = ~a/9;
The (a+1) corrects the rounding errors back to the right range.
Hardcoded multiplication
Rewrite the multiplication constant to binary
q = (1<<18)/9 = 29127 = 0111 0001 1100 0111 bin
Now if you need to compute c=(a*q) use hard-coded binary multiplication: for each 1 of the q you can add a<<(position_of_1) to the c. If you see something like 111 you can rewrite it to 1000-1 minimizing the number of operations.
If you put all of this together you should got something like this C++ code of mine:
DWORD div9(DWORD a)
{
// ((a+1)*q)>>18 = (((a+1)<<18)/9)>>18 = ~a/9;
// q = (1<<18)/9 = 29127 = 0111 0001 1100 0111 bin
// valid for a = < 0 , 147455 >
DWORD c;
c =(a<< 3)-(a ); // c= a*29127
c+=(a<< 9)-(a<< 6);
c+=(a<<15)-(a<<12);
c+=29127; // c= (a+1)*29127
c>>=18; // c= ((a+1)*29127)>>18
return c;
}
Now if you see the binary form the pattern 111000 is repeating so yu can further improve the code a bit:
DWORD div9(DWORD a)
{
DWORD c;
c =(a<<3)-a; // first pattern
c+=(c<<6)+(c<<12); // and the other 2...
c+=29127;
c>>=18;
return c;
}

How to calculate g values from LIS3DH sensor?

I am using LIS3DH sensor with ATmega128 to get the acceleration values to get motion. I went through the datasheet but it seemed inadequate so I decided to post it here. From other posts I am convinced that the sensor resolution is 12 bit instead of 16 bit. I need to know that when finding g value from the x-axis output register, do we calculate the two'2 complement of the register values only when the sign bit MSB of OUT_X_H (High bit register) is 1 or every time even when this bit is 0.
From my calculations I think that we calculate two's complement only when MSB of OUT_X_H register is 1.
But the datasheet says that we need to calculate two's complement of both OUT_X_L and OUT_X_H every time.
Could anyone enlighten me on this ?
Sample code
int main(void)
{
stdout = &uart_str;
UCSRB=0x18; // RXEN=1, TXEN=1
UCSRC=0x06; // no parit, 1-bit stop, 8-bit data
UBRRH=0;
UBRRL=71; // baud 9600
timer_init();
TWBR=216; // 400HZ
TWSR=0x03;
TWCR |= (1<<TWINT)|(1<<TWSTA)|(0<<TWSTO)|(1<<TWEN);//TWCR=0x04;
printf("\r\nLIS3D address: %x\r\n",twi_master_getchar(0x0F));
twi_master_putchar(0x23, 0b000100000);
printf("\r\nControl 4 register 0x23: %x", twi_master_getchar(0x23));
printf("\r\nStatus register %x", twi_master_getchar(0x27));
twi_master_putchar(0x20, 0x77);
DDRB=0xFF;
PORTB=0xFD;
SREG=0x80; //sei();
while(1)
{
process();
}
}
void process(void){
x_l = twi_master_getchar(0x28);
x_h = twi_master_getchar(0x29);
y_l = twi_master_getchar(0x2a);
y_h = twi_master_getchar(0x2b);
z_l = twi_master_getchar(0x2c);
z_h = twi_master_getchar(0x2d);
xvalue = (short int)(x_l+(x_h<<8));
yvalue = (short int)(y_l+(y_h<<8));
zvalue = (short int)(z_l+(z_h<<8));
printf("\r\nx_val: %ldg", x_val);
printf("\r\ny_val: %ldg", y_val);
printf("\r\nz_val: %ldg", z_val);
}
I wrote the CTRL_REG4 as 0x10(4g) but when I read them I got 0x20(8g). This seems bit bizarre.

Do not compute the 2s complement. That has the effect of making the result the negative of what it was.
Instead, the datasheet tells us the result is already a signed value. That is, 0 is not the lowest value; it is in the middle of the scale. (0xffff is just a little less than zero, not the highest value.)
Also, the result is always 16-bit, but the result is not meant to be taken to be that accurate. You can set a control register value to to generate more accurate values at the expense of current consumption, but it is still not guaranteed to be accurate to the last bit.

the datasheet does not say (at least the register description in chapter 8.2) you have to calculate the 2' complement but stated that the contents of the 2 registers is in 2's complement.
so all you have to do is receive the two bytes and cast it to an int16_t to get the signed raw value.
uint8_t xl = 0x00;
uint8_t xh = 0xFC;
int16_t x = (int16_t)((((uint16)xh) << 8) | xl);
or
uint8_t xa[2] {0x00, 0xFC}; // little endian: lower byte to lower address
int16_t x = *((int16*)xa);
(hope i did not mixed something up with this)

I have another approach, which may be easier to implement as the compiler will do all of the work for you. The compiler will probably do it most efficiently and with no bugs too.
Read the raw data into the raw field in:
typedef union
{
struct
{
// in low power - 8 significant bits, left justified
int16 reserved : 8;
int16 value : 8;
} lowPower;
struct
{
// in normal power - 10 significant bits, left justified
int16 reserved : 6;
int16 value : 10;
} normalPower;
struct
{
// in high resolution - 12 significant bits, left justified
int16 reserved : 4;
int16 value : 12;
} highPower;
// the raw data as read from registers H and L
uint16 raw;
} LIS3DH_RAW_CONVERTER_T;
than use the value needed according to the power mode you are using.
Note: In this example, bit fields structs are BIG ENDIANS.
Check if you need to reverse the order of 'value' and 'reserved'.

The LISxDH sensors are 2's complement, left-justified. They can be set to 12-bit, 10-bit, or 8-bit resolution. This is read from the sensor as two 8-bit values (LSB, MSB) that need to be assembled together.
If you set the resolution to 8-bit, just can just cast LSB to int8, which is the likely your processor's representation of 2's complement (8bit). Likewise, if it were possible to set the sensor to 16-bit resolution, you could just cast that to an int16.
However, if the value is 10-bit left justified, the sign bit is in the wrong place for an int16. Here is how you convert it to int16 (16-bit 2's complement).
1.Read LSB, MSB from the sensor:
[MMMM MMMM] [LL00 0000]
[1001 0101] [1100 0000] //example = [0x95] [0xC0] (note that the LSB comes before MSB on the sensor)
2.Assemble the bytes, keeping in mind the LSB is left-justified.
//---As an example....
uint8_t byteMSB = 0x95; //[1001 0101]
uint8_t byteLSB = 0xC0; //[1100 0000]
//---Cast to U16 to make room, then combine the bytes---
assembledValue = ( (uint16_t)(byteMSB) << UINT8_LEN ) | (uint16_t)byteLSB;
/*[MMMM MMMM LL00 0000]
[1001 0101 1100 0000] = 0x95C0 */
//---Shift to right justify---
assembledValue >>= (INT16_LEN-numBits);
/*[0000 00MM MMMM MMLL]
[0000 0010 0101 0111] = 0x0257 */
3.Convert from 10-bit 2's complement (now right-justified) to an int16 (which is just 16-bit 2's complement on most platforms).
Approach #1: If the sign bit (in our example, the tenth bit) = 0, then just cast it to int16 (since positive numbers are represented the same in 10-bit 2's complement and 16-bit 2's complement).
If the sign bit = 1, then invert the bits (keeping just the 10bits), add 1 to the result, then multiply by -1 (as per the definition of 2's complement).
convertedValueI16 = ~assembledValue; //invert bits
convertedValueI16 &= ( 0xFFFF>>(16-numBits) ); //but keep just the 10-bits
convertedValueI16 += 1; //add 1
convertedValueI16 *=-1; //multiply by -1
/*Note that the last two lines could be replaced by convertedValueI16 = ~convertedValueI16;*/
//result = -425 = 0xFE57 = [1111 1110 0101 0111]
Approach#2: Zero the sign bit (10th bit) and subtract out half the range 1<<9
//----Zero the sign bit (tenth bit)----
convertedValueI16 = (int16_t)( assembledValue^( 0x0001<<(numBits-1) ) );
/*Result = 87 = 0x57 [0000 0000 0101 0111]*/
//----Subtract out half the range----
convertedValueI16 -= ( (int16_t)(1)<<(numBits-1) );
[0000 0000 0101 0111]
-[0000 0010 0000 0000]
= [1111 1110 0101 0111];
/*Result = 87 - 512 = -425 = 0xFE57
Link to script to try out (not optimized): http://tpcg.io/NHmBRR

Explanation of the calc_delta_mine function

I am currently reading "Linux Kernel Development" by Robert Love, and I got a few questions about the CFS.
My question is how calc_delta_mine calculates :
delta_exec_weighted= (delta_exec * weight)/lw->weight
I guess it is done by two steps :
calculation the (delta_exec * 1024) :
if (likely(weight > (1UL << SCHED_LOAD_RESOLUTION)))
tmp = (u64)delta_exec * scale_load_down(weight);
else
tmp = (u64)delta_exec;
calculate the /lw->weight ( or * lw->inv_weight ) :
if (!lw->inv_weight) {
unsigned long w = scale_load_down(lw->weight);
if (BITS_PER_LONG > 32 && unlikely(w >= WMULT_CONST))
lw->inv_weight = 1;
else if (unlikely(!w))
lw->inv_weight = WMULT_CONST;
else
lw->inv_weight = WMULT_CONST / w;
}
/*
* Check whether we'd overflow the 64-bit multiplication:
*/
if (unlikely(tmp > WMULT_CONST))
tmp = SRR(SRR(tmp, WMULT_SHIFT/2) * lw->inv_weight,
WMULT_SHIFT/2);
else
tmp = SRR(tmp * lw->inv_weight, WMULT_SHIFT);
return (unsigned long)min(tmp, (u64)(unsigned long)LONG_MAX);
The SRR (Shift right and round) macro is defined via :
#define SRR(x, y) (((x) + (1UL << ((y) - 1))) >> (y))
And the other MACROS are defined :
#if BITS_PER_LONG == 32
# define WMULT_CONST (~0UL)
#else
# define WMULT_CONST (1UL << 32)
#endif
#define WMULT_SHIFT 32
Can someone please explain how exactly the SRR works and how does this check the 64-bit multiplication overflow?
And please explain the definition of the MACROS in this function((~0UL) ,(1UL << 32))?

The code you posted is basically doing calculations using 32.32 fixed-point arithmetic, where a single 64-bit quantity holds the integer part of the number in the high 32 bits, and the decimal part of the number in the low 32 bits (so, for example, 1.5 is 0x0000000180000000 in this system). WMULT_CONST is thus an approximation of 1.0 (using a value that can fit in a long for platform efficiency considerations), and so dividing WMULT_CONST by w computes 1/w as a 32.32 value.
Note that multiplying two 32.32 values together as integers produces a result that is 232 times too large; thus, WMULT_SHIFT (=32) is the right shift value needed to normalize the result of multiplying two 32.32 values together back down to 32.32.
The necessity of using this improved precision for scheduling purposes is explained in a comment in sched/sched.h:
/*
* Increase resolution of nice-level calculations for 64-bit architectures.
* The extra resolution improves shares distribution and load balancing of
* low-weight task groups (eg. nice +19 on an autogroup), deeper taskgroup
* hierarchies, especially on larger systems. This is not a user-visible change
* and does not change the user-interface for setting shares/weights.
*
* We increase resolution only if we have enough bits to allow this increased
* resolution (i.e. BITS_PER_LONG > 32). The costs for increasing resolution
* when BITS_PER_LONG <= 32 are pretty high and the returns do not justify the
* increased costs.
*/
As for SRR, mathematically, it computes the rounded result of x / 2y.
To round the result of a division x/q you can calculate x + q/2 floor-divided by q; this is what SRR does by calculating x + 2y-1 floor-divided by 2y.

Converting data from 8 bits to 12 bits

I am getting signal that is stored as a buffer of char data (8 bits).
I am also getting the same signal plus 24 dB and my boss told me that it should be possible to reconstruct from those two buffers, one (which will be used as output) that will be stored as 12 bits.
I would like to know the mathematical operation that can do that and why choosing +24dB.
Thanks (I am dumb ><).

From the problem statement, I guess you have an analog signal which are sampled at two amlitudes. Both signals has a resolution of 8 bits, but one is shifted and truncated.
You could get a 12 bit signal by combining the upper 4 bits of the first signal, and concatenating them with the second signal.
sOut = ((sIn1 & 0xF0) << 4) | sIn2
If you want to get a little better accuracy, you could try to calculate an average over the common bits of the two signals. Normally, the lower 4 bits of the first signal should be approximately equal to the upper 4 bits of the second signal. Due to rounding-errors or noise, the values could be slightly different. One of the values could even have overflowed, and moved to the other end of the range.
int Combine(byte sIn1, byte sIn2)
{
int a = sIn1 >> 4; // Upper 4 bits
int b1 = sIn1 & 0x0F; // Common middle 4 bits
int b2 = sIn2 >> 4; // Common middle 4 bits
int c = sIn2 & 0x0F; // Lower 4 bits
int b;
if (b1 >= 12 && b2 < 4)
{
// Assume b2 has overflowed, and wrapped around to a smaller value.
// We need to add 16 to it to compensate the average.
b = (b1 + b2 + 16)/2;
}
else if (b1 < 4 && b2 >= 12)
{
// Assume b2 has underflowed, and wrapped around to a larger value.
// We need to subtract 16 from it to compensate the average.
b = (b1 + b2 - 16)/2;
}
else
{
// Neither or both has overflowed. Just take the average.
b = (b1 + b2)/2;
}
// Construct the combined signal.
return a * 256 + b * 16 + c;
}
When I tested this, it reproduced the signal accurately more often than the first formula.

Counting, reversed bit pattern

I am trying to find an algorithm to count from 0 to 2n-1 but their bit pattern reversed. I care about only n LSB of a word. As you may have guessed I failed.
For n=3:
000 -> 0
100 -> 4
010 -> 2
110 -> 6
001 -> 1
101 -> 5
011 -> 3
111 -> 7
You get the idea.
Answers in pseudo-code is great. Code fragments in any language are welcome, answers without bit operations are preferred.
Please don't just post a fragment without even a short explanation or a pointer to a source.
Edit: I forgot to add, I already have a naive implementation which just bit-reverses a count variable. In a sense, this method is not really counting.

This is, I think easiest with bit operations, even though you said this wasn't preferred
Assuming 32 bit ints, here's a nifty chunk of code that can reverse all of the bits without doing it in 32 steps:
unsigned int i;
i = (i & 0x55555555) << 1 | (i & 0xaaaaaaaa) >> 1;
i = (i & 0x33333333) << 2 | (i & 0xcccccccc) >> 2;
i = (i & 0x0f0f0f0f) << 4 | (i & 0xf0f0f0f0) >> 4;
i = (i & 0x00ff00ff) << 8 | (i & 0xff00ff00) >> 8;
i = (i & 0x0000ffff) << 16 | (i & 0xffff0000) >> 16;
i >>= (32 - n);
Essentially this does an interleaved shuffle of all of the bits. Each time around half of the bits in the value are swapped with the other half.
The last line is necessary to realign the bits so that bin "n" is the most significant bit.
Shorter versions of this are possible if "n" is <= 16, or <= 8

At each step, find the leftmost 0 digit of your value. Set it, and clear all digits to the left of it. If you don't find a 0 digit, then you've overflowed: return 0, or stop, or crash, or whatever you want.
This is what happens on a normal binary increment (by which I mean it's the effect, not how it's implemented in hardware), but we're doing it on the left instead of the right.
Whether you do this in bit ops, strings, or whatever, is up to you. If you do it in bitops, then a clz (or call to an equivalent hibit-style function) on ~value might be the most efficient way: __builtin_clz where available. But that's an implementation detail.

This solution was originally in binary and converted to conventional math as the requester specified.
It would make more sense as binary, at least the multiply by 2 and divide by 2 should be << 1 and >> 1 for speed, the additions and subtractions probably don't matter one way or the other.
If you pass in mask instead of nBits, and use bitshifting instead of multiplying or dividing, and change the tail recursion to a loop, this will probably be the most performant solution you'll find since every other call it will be nothing but a single add, it would only be as slow as Alnitak's solution once every 4, maybe even 8 calls.
int incrementBizarre(int initial, int nBits)
// in the 3 bit example, this should create 100
mask=2^(nBits-1)
// This should only return true if the first (least significant) bit is not set
// if initial is 011 and mask is 100
// 3 4, bit is not set
if(initial < mask)
// If it was not, just set it and bail.
return initial+ mask // 011 (3) + 100 (4) = 111 (7)
else
// it was set, are we at the most significant bit yet?
// mask 100 (4) / 2 = 010 (2), 001/2 = 0 indicating overflow
if(mask / 2) > 0
// No, we were't, so unset it (initial-mask) and increment the next bit
return incrementBizarre(initial - mask, mask/2)
else
// Whoops we were at the most significant bit. Error condition
throw new OverflowedMyBitsException()
Wow, that turned out kinda cool. I didn't figure in the recursion until the last second there.
It feels wrong--like there are some operations that should not work, but they do because of the nature of what you are doing (like it feels like you should get into trouble when you are operating on a bit and some bits to the left are non-zero, but it turns out you can't ever be operating on a bit unless all the bits to the left are zero--which is a very strange condition, but true.
Example of flow to get from 110 to 001 (backwards 3 to backwards 4):
mask 100 (4), initial 110 (6); initial < mask=false; initial-mask = 010 (2), now try on the next bit
mask 010 (2), initial 010 (2); initial < mask=false; initial-mask = 000 (0), now inc the next bit
mask 001 (1), initial 000 (0); initial < mask=true; initial + mask = 001--correct answer

Here's a solution from my answer to a different question that computes the next bit-reversed index without looping. It relies heavily on bit operations, though.
The key idea is that incrementing a number simply flips a sequence of least-significant bits, for example from nnnn0111 to nnnn1000. So in order to compute the next bit-reversed index, you have to flip a sequence of most-significant bits. If your target platform has a CTZ ("count trailing zeros") instruction, this can be done efficiently.
Example in C using GCC's __builtin_ctz:
void iter_reversed(unsigned bits) {
unsigned n = 1 << bits;
for (unsigned i = 0, j = 0; i < n; i++) {
printf("%x\n", j);
// Compute a mask of LSBs.
unsigned mask = i ^ (i + 1);
// Length of the mask.
unsigned len = __builtin_ctz(~mask);
// Align the mask to MSB of n.
mask <<= bits - len;
// XOR with mask.
j ^= mask;
}
}
Without a CTZ instruction, you can also use integer division:
void iter_reversed(unsigned bits) {
unsigned n = 1 << bits;
for (unsigned i = 0, j = 0; i < n; i++) {
printf("%x\n", j);
// Find least significant zero bit.
unsigned bit = ~i & (i + 1);
// Using division to bit-reverse a single bit.
unsigned rev = (n / 2) / bit;
// XOR with mask.
j ^= (n - 1) & ~(rev - 1);
}
}

void reverse(int nMaxVal, int nBits)
{
int thisVal, bit, out;
// Calculate for each value from 0 to nMaxVal.
for (thisVal=0; thisVal<=nMaxVal; ++thisVal)
{
out = 0;
// Shift each bit from thisVal into out, in reverse order.
for (bit=0; bit<nBits; ++bit)
out = (out<<1) + ((thisVal>>bit) & 1)
}
printf("%d -> %d\n", thisVal, out);
}

Maybe increment from 0 to N (the "usual" way") and do ReverseBitOrder() for each iteration. You can find several implementations here (I like the LUT one the best).
Should be really quick.

Here's an answer in Perl. You don't say what comes after the all ones pattern, so I just return zero. I took out the bitwise operations so that it should be easy to translate into another language.
sub reverse_increment {
my($n, $bits) = #_;
my $carry = 2**$bits;
while($carry > 1) {
$carry /= 2;
if($carry > $n) {
return $carry + $n;
} else {
$n -= $carry;
}
}
return 0;
}

Here's a solution which doesn't actually try to do any addition, but exploits the on/off pattern of the seqence (most sig bit alternates every time, next most sig bit alternates every other time, etc), adjust n as desired:
#define FLIP(x, i) do { (x) ^= (1 << (i)); } while(0)
int main() {
int n = 3;
int max = (1 << n);
int x = 0;
for(int i = 1; i <= max; ++i) {
std::cout << x << std::endl;
/* if n == 3, this next part is functionally equivalent to this:
*
* if((i % 1) == 0) FLIP(x, n - 1);
* if((i % 2) == 0) FLIP(x, n - 2);
* if((i % 4) == 0) FLIP(x, n - 3);
*/
for(int j = 0; j < n; ++j) {
if((i % (1 << j)) == 0) FLIP(x, n - (j + 1));
}
}
}

How about adding 1 to the most significant bit, then carrying to the next (less significant) bit, if necessary. You could speed this up by operating on bytes:
Precompute a lookup table for counting in bit-reverse from 0 to 256 (00000000 -> 10000000, 10000000 -> 01000000, ..., 11111111 -> 00000000).
Set all bytes in your multi-byte number to zero.
Increment the most significant byte using the lookup table. If the byte is 0, increment the next byte using the lookup table. If the byte is 0, increment the next byte...
Go to step 3.

With n as your power of 2 and x the variable you want to step:
(defun inv-step (x n) ; the following is a function declaration
"returns a bit-inverse step of x, bounded by 2^n" ; documentation
(do ((i (expt 2 (- n 1)) ; loop, init of i
(/ i 2)) ; stepping of i
(s x)) ; init of s as x
((not (integerp i)) ; breaking condition
s) ; returned value if all bits are 1 (is 0 then)
(if (< s i) ; the loop's body: if s < i
(return-from inv-step (+ s i)) ; -> add i to s and return the result
(decf s i)))) ; else: reduce s by i
I commented it thoroughly as you may not be familiar with this syntax.
edit: here is the tail recursive version. It seems to be a little faster, provided that you have a compiler with tail call optimization.
(defun inv-step (x n)
(let ((i (expt 2 (- n 1))))
(cond ((= n 1)
(if (zerop x) 1 0)) ; this is really (logxor x 1)
((< x i)
(+ x i))
(t
(inv-step (- x i) (- n 1))))))

When you reverse 0 to 2^n-1 but their bit pattern reversed, you pretty much cover the entire 0-2^n-1 sequence
Sum = 2^n * (2^n+1)/2
O(1) operation. No need to do bit reversals

Edit: Of course original poster's question was about to do increment by (reversed) one, which makes things more simple than adding two random values. So nwellnhof's answer contains the algorithm already.
Summing two bit-reversal values
Here is one solution in php:
function RevSum ($a,$b) {
// loop until our adder, $b, is zero
while ($b) {
// get carry (aka overflow) bit for every bit-location by AND-operation
// 0 + 0 --> 00 no overflow, carry is "0"
// 0 + 1 --> 01 no overflow, carry is "0"
// 1 + 0 --> 01 no overflow, carry is "0"
// 1 + 1 --> 10 overflow! carry is "1"
$c = $a & $b;
// do 1-bit addition for every bit location at once by XOR-operation
// 0 + 0 --> 00 result = 0
// 0 + 1 --> 01 result = 1
// 1 + 0 --> 01 result = 1
// 1 + 1 --> 10 result = 0 (ignored that "1", already taken care above)
$a ^= $b;
// now: shift carry bits to the next bit-locations to be added to $a in
// next iteration.
// PHP_INT_MAX here is used to ensure that the most-significant bit of the
// $b will be cleared after shifting. see link in the side note below.
$b = ($c >> 1) & PHP_INT_MAX;
}
return $a;
}
Side note: See this question about shifting negative values.
And as for test; start from zero and increment value by 8-bit reversed one (10000000):
$value = 0;
$add = 0x80; // 10000000 <-- "one" as bit reversed
for ($count = 20; $count--;) { // loop 20 times
printf("%08b\n", $value); // show value as 8-bit binary
$value = RevSum($value, $add); // do addition
}
... will output:
00000000
10000000
01000000
11000000
00100000
10100000
01100000
11100000
00010000
10010000
01010000
11010000
00110000
10110000
01110000
11110000
00001000
10001000
01001000
11001000

Let assume number 1110101 and our task is to find next one.
1) Find zero on highest position and mark position as index.
11101010 (4th position, so index = 4)
2) Set to zero all bits on position higher than index.
00001010
3) Change founded zero from step 1) to '1'
00011010
That's it. This is by far the fastest algorithm since most of cpu's has instructions to achieve this very efficiently. Here is a C++ implementation which increment 64bit number in reversed patern.
#include <intrin.h>
unsigned __int64 reversed_increment(unsigned __int64 number)
{
unsigned long index, result;
_BitScanReverse64(&index, ~number); // returns index of the highest '1' on bit-reverse number (trick to find the highest '0')
result = _bzhi_u64(number, index); // set to '0' all bits at number higher than index position
result |= (unsigned __int64) 1 << index; // changes to '1' bit on index position
return result;
}
Its not hit your requirements to have "no bits" operations, however i fear there is now way how to achieve something similar without them.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio