Reverse engineering: left bit shift by seven bits

I've been trying to reverse engineer a function of a game, but I'm somewhat confused. I'm pretty new to reverse engineering (I'm using OllyDbg, by the way), so I don't know all the tricks and details yet.
Anyway, here's my problem. This function is called when you pick up any item in the game. It calculates the value of the item and adds that value to your score. Before the function is called, a value is pushed which I'm quite confident is the ID of the item.
This is the code that confuses me:
SHL ESI,7
MOV CX,WORD PTR DS:[EDX+ESI+42]
ESI = the ID of the item
EDX = constant value FE56A0
I was guessing that EDX (FE56A0) was the start of an array of items, that ESI was somehow the index of the item, and that 42 would be the offset of the value the item holds. This seemed weird to me, though, since ESI is being shifted left by 7. As ESI increases, its bit-shifted value doesn't grow linearly.
So if EDX were the start of an array and ESI an index, the items in the array wouldn't be of equal size.
The meaning of this code is puzzling me.
Anyone got an idea what this code could represent?

The array might hold 128-byte structures. Shifting by 7 multiplies the ID by 128, giving the offset required to reach the structure for that ID. 42 would then be the offset of the field within the structure.
This works because the shift is just a multiplication, and the result does grow linearly with the index:
0 << 7 == 0
1 << 7 == 128
2 << 7 == 256
3 << 7 == 384
etc.
This code snippet simply accesses a member of a structure stored in an array.
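In C++ terms, that access pattern would look something like the sketch below; the struct and field names (and which field sits at offset 42) are made up for illustration, not taken from the game:

    #include <cstdint>

    // A guess at the layout implied by  SHL ESI,7 / MOV CX, WORD PTR [EDX+ESI+42]:
    // 128-byte records, with a 16-bit value 42 bytes into each record.
    struct GameItem {
        uint8_t  unknown[42];          // other fields we haven't identified
        uint16_t value;                // the WORD loaded into CX
        uint8_t  rest[128 - 42 - 2];   // pad the record out to 128 bytes
    };
    static_assert(sizeof(GameItem) == 128, "SHL ESI,7 multiplies the ID by 128");

    // EDX would then point at the first element of an array of these:
    uint16_t item_value(const GameItem* table, uint32_t id)
    {
        return table[id].value;        // same address math: table + id*128 + 42
    }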

It could be that EDX points to the start of some structure which the array is part of. The data that comes before the array requires 42 bytes, and each element in the array requires 128 bytes. (1<<7 is 128 - shifting is often used as a quick way to multiply by a power of two.) For example, something like this:
// 128-byte structure (1 << 7 bytes per element)
struct GameItem
{
    // ... 128 bytes of per-item data ...
};

// EDX points here
struct GameItems
{
    int numItems;
    int stuff;
    int moreStuff;
    char data[30];
    GameItem items[MAX_ITEMS]; // offset 42 bytes from start
};
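One way to sanity-check a layout guess like this is to write the sizes down concretely and let the compiler verify the two magic numbers; the field names and sizes below are only illustrative:

    #include <cstddef>   // offsetof
    #include <cstdint>

    struct GameItem {
        uint8_t raw[128];            // 1 << 7 bytes per element
    };

    struct GameItems {
        int32_t  numItems;           // offset 0
        int32_t  stuff;              // offset 4
        int32_t  moreStuff;          // offset 8
        char     data[30];           // offset 12
        GameItem items[1];           // should start 42 bytes from the top
    };

    static_assert(sizeof(GameItem) == 128,
                  "SHL ESI,7 scales the item ID by 128");
    static_assert(offsetof(GameItems, items) == 42,
                  "matches the +42 displacement in the disassembly");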

Related

algorithm of addressing a triangle matrices memory using assembly

I was doing a project in ASM about Pascal's triangle using NASM.
In the project you need to calculate Pascal's triangle from line 0 to line 63.
My first problem is where to store the results of the calculation: in memory.
My second problem is what kind of storage layout to use in memory. To explain what I mean, I have three options. The first is to declare a full matrix, like this:
memoryLabl: resd 63*64 ; 63 rows of 64 columns each
The problem with this approach is that half of the matrix is unused, which makes my program inefficient. So on to the second method,
which is to declare a separate label in memory for every line,
for example:
line0: dd 1
line1: dd 1,1
line2: dd 1,2,1 ; with pre-filled data for example purposes
...
line63: resd 64 ; reserve space for 64 dword entries
This way of doing it is like doing it by hand;
some others in the class tried to use a macro, as you can see here,
but I don't get it.
So far so good.
Let's move on to the last approach, which is the one I used.
It's like the first one, except that I use a triangular matrix. How is that?
By declaring only the amount of memory that I actually need.
So to store lines 0 to 63 of Pascal's triangle, I get a triangular matrix, because every new line adds one more cell.
I have allocated 2080 dwords for the triangular matrix. How is that?
Here is where the 2080 dwords come from:
line0 has 1 dword /* 1 number in the first line */
line1 has 2 dwords /* 2 numbers in the second line */
line2 has 3 dwords /* 3 numbers in the third line */
...
line63 has 64 dwords /* 64 numbers in the final line */
so in the end the sum 1 + 2 + ... + 64 is 2080.
I have given every number 1 dword.
OK, now that we have created the memory to store the results, let's start the calculation.
First: in Pascal's triangle, every cell with row index 0 (the first cell of each line) has the value 1.
I will write it in pseudocode so you can see how I put a 1 in all cells with row index 0:
s = 0
for (i = 0; i < 64; i++):
    s = s + i
    mov dword [x + s*4], 1   /* x is the address of the triangular matrix */
Second: in Pascal's triangle, the last cell of each line is also 1.
I will use pseudocode to keep it simple:
s = 0
for (i = 2; i < 64; i++):
    s = s + i
    mov dword [x + s*4], 1
I start from i equal to 2 because i = 0 and i = 1 would correspond to line0 and line1, and those lines are already full: they hold only one and two values respectively, as I said in the explanation above.
So the two pieces of pseudocode make my triangle look like this in memory:
1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
...
1 1
Now comes the hard part: the calculation that uses these values to fill in all of the triangle's cells.
Let's start with the idea.
Take cell[line][row].
We have cell[2][1] = cell[1][0] + cell[1][1]
and cell[3][1] = cell[2][0] + cell[2][1]
and cell[3][2] = cell[2][1] + cell[2][2].
In general we have
cell[line][row] = cell[line-1][row-1] + cell[line-1][row]
My problem is that I could not express this relation using ASM instructions, because I have a triangular matrix, which is awkward to work with. Can anyone help me express it with a formula, very basic pseudocode, or ASM code?
TL;DR: you just need to traverse the array sequentially, so you don't have to work out the indexing. See the second section.
To random-access index into a (lower) triangular matrix: row r starts after a triangle of size r-1. A triangle of size n has n*(n+1)/2 total elements, using Gauss's formula for the sum of the numbers from 1 to n. So a triangle of size r-1 has (r-1)*r/2 elements. Indexing a column within a row is of course trivial, once we know the address of the start of the row.
Each DWORD element is 4 bytes wide, and we can take care of that scaling as part of the multiply, because lea lets us shift and add as well as put the result in a different register. We simplify n*(n-1)/2 elements * 4 bytes / elem to n*(n-1) * 2 bytes.
The above reasoning works for 1-based indexing, where row 1 has 1 element. We have to adjust for that if we want zero-based indexing, by adding 1 to the row index before the calculation: we want the size of a triangle with (r+1) - 1 rows, thus r*(r+1)/2 elements, or r*(r+1)/2 * 4 bytes. It helps to write the linear byte offsets out as a triangle to quickly double-check the formula:
0
4 8
12 16 20
24 28 32 36
40 44 48 52 56
60 64 68 72 76 80
84 88 92 96 100 104 108
The 4th row, which we're calling "row 3", starts 24 bytes from the start of the whole array. That's (3+1)*(3+1-1) * 2 = (3+1)*3 * 2; yes the r*(r+1)/2 formula works.
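The same byte-offset calculation, written as a small C++ helper for reference (illustrative, separate from the NASM that follows):

    #include <cstddef>
    #include <cstdint>

    // Byte offset of triangle[row][col] in a lower-triangular array of dwords,
    // with zero-based row/col, where "row r" holds r+1 elements.
    std::size_t tri_offset(std::size_t row, std::size_t col)
    {
        return row * (row + 1) / 2 * sizeof(uint32_t)   // elements in the rows above
             + col * sizeof(uint32_t);                  // position within this row
    }
    // e.g. tri_offset(3, 0) == 24, matching the byte diagram above.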
;; given a row number in EDI, and column in ESI (zero-extended into RSI)
;; load triangle[row][col] into eax
    lea     ecx, [2*rdi + 2]
    imul    ecx, edi                        ; ecx = r*(r+1) * 2 bytes
    mov     eax, [triangle + rcx + rsi*4]
This assumes that 32-bit absolute addressing is OK (but see: 32-bit absolute addresses no longer allowed in x86-64 Linux?). If not, use a RIP-relative LEA to get the triangle base address into a register, and add that to rsi*4. x86 addressing modes can only have 3 components when one of them is a constant. That is the case here for your static triangle, so we can take full advantage: a scaled index for the column, the calculated row offset as the base, and the actual array address as the displacement.
Calculating the triangle
The trick here is that you only need to loop over it sequentially; you don't need random access to a given row/column.
You read one row while writing the one below. When you get to the end of a row, the next element is the start of the next row. The source and destination pointers will get farther and farther from each other as you go down the rows, because the destination is always 1 whole row ahead. And you know the length of a row = row number, so you can actually use the row counter as the offset.
global _start
_start:
    mov     esi, triangle       ; src = address of triangle[0,0]
    lea     rdi, [rsi+4]        ; dst = address of triangle[1,0]
    mov     dword [rsi], 1      ; triangle[0,0] = 1   special case: there is no source

.pascal_row:                    ; do {
    mov     rcx, rdi            ; RCX = one-past-end of src row = start of dst row
    xor     eax, eax            ; EAX = triangle[row-1][col-1] = 0 for first iteration
    ;; RSI points to start of src row: triangle[row-1][0]
    ;; RDI points to start of dst row: triangle[row  ][0]
.column:
    mov     edx, [rsi]          ; tri[r-1, c]  ; will load 1 on the first iteration
    add     eax, edx            ; eax = tri[r-1, c-1] + tri[r-1, c]
    mov     [rdi], eax          ; store to triangle[row, col]
    add     rdi, 4              ; ++dst
    add     rsi, 4              ; ++src
    mov     eax, edx            ; becomes col-1 src value for next iteration
    cmp     rsi, rcx
    jb      .column             ; }while(src < end_src)

    ;; RSI points to one-past-end of src row, i.e. start of next row = src for next iteration
    ;; RDI points to last element of dst row (because dst row is 1 element longer than src row)
    mov     dword [rdi], 1      ; [r,r] = 1   end of a row
    add     rdi, 4              ; this is where dst-src distance grows each iteration
    cmp     rdi, end_triangle
    jb      .pascal_row

;;; triangle is constructed. Set a breakpoint here to look at it with a debugger
    xor     edi, edi
    mov     eax, 231
    syscall                     ; Linux sys_exit_group(0), 64-bit ABI

section .bss
; you could just as well use  resd 64*65/2
; but put a label on each row for debugging convenience.
ALIGN 16
triangle:
%assign i 0
%rep 64
row %+ i: resd i + 1
%assign i i+1
%endrep
end_triangle:
I tested this and it works: correct values in memory, and it stops at the right place. But note that integer overflow happens before you get down to the last row. This would be avoided if you used 64-bit integers (a simple change to register names and offsets, and don't forget to change resd to resq). 64 choose 32 is 1832624140942590534, which is about 2^60.66.
The %rep block to reserve space and label each row as row0, row1, etc. is from my answer to the question you linked about macros, much more sane than the other answer IMO.
You tagged this NASM, so that's what I used because I'm familiar with it. The syntax you used in your question was MASM (until the last edit). The main logic is the same in MASM, but remember that you need OFFSET triangle to get the address as an immediate, instead of loading from it.
I used x86-64 because 32-bit is obsolete, but I avoided too many registers, so you can easily port this to 32-bit if needed. Don't forget to save/restore call-preserved registers if you put this in a function instead of a stand-alone program.
Unrolling the inner loop could save some instructions copying registers around, as well as the loop overhead. This is a somewhat optimized implementation, but I mostly limited it to optimizations that make the code simpler as well as smaller / faster. (Except maybe for using pointer increments instead of indexing.) It took a while to make it this clean and simple. :P
Different ways of doing the array indexing would be faster on different CPUs. e.g. perhaps use an indexed addressing mode (relative to dst) for the loads in the inner loop, so only one pointer increment is needed. But if you want it to run fast, SSE2 or AVX2 vpaddd could be good. Shuffling with palignr might be useful, but probably also unaligned loads instead of some of the shuffling, especially with AVX2 or AVX512.
But anyway, this is my version; I'm not trying to write it the way you would, you need to write your own for your assignment. I'm writing for future readers who might learn something about what's efficient on x86. (See also the performance section in the x86 tag wiki.)
How I wrote that:
I started writing the code from the top, but quickly realized that off-by-one errors were going to be tricky, and I didn't want to just write it the stupid way with branches inside the loops for special cases.
What ended up helping was writing the comments for the pre and post conditions on the pointers for the inner loop. That made it clear I needed to enter the loop with eax=0, instead of with eax=1 and storing eax as the first operation inside the loop, or something like that.
Obviously each source value only needs to be read once, so I didn't want to write an inner loop that reads [rsi] and [rsi+4] or something. Besides, that would have made it harder to get the boundary condition right (where a non-existent value has to read as 0).
It took some time to decide whether I was going to have an actual counter in a register for row length or row number, before I ended up just using an end-pointer for the whole triangle. It wasn't obvious before I finished that using pure pointer increments / compares was going to save so many instructions (and registers when the upper bound is a build-time constant like end_triangle), but it worked out nicely.

Self-organising sequence of numbers with big amount of operations on it - best data structure

I need help choosing the right data structure for my exercise.
As input I am given the number of operations to execute (t), followed by an indexed sequence of natural numbers separated by spaces.
So for example:
3 1 2 3
This means that there will be 3 operations executed on sequence {1,2,3}.
There is also a pointer (PTR) that marks the current position. The operations on the sequence that I should implement are:
R -> remove the element c at index PTR+1 and move PTR forward c times
X -> insert, right after the element c at index PTR (so at index PTR+1), an element with value c-1, and then move PTR forward c times
My job is to find the final sequence after performing t operations, where at each step, if the current element is even then do R, otherwise do X. At the start PTR points to the first element (if it exists), and the whole sequence is treated as a cycle.
For given example at the start of post the output should be:
0 0 3 1
I know that it might sound confusing, so let me show you how this should work step by step.
t = 1
Starting sequence:
1 2 3
Current position: PTR -> 1
Operation: X, c=1
Ending sequence:
1 0 2 3
Ending position: PTR -> 0
t = 2
Starting sequence:
1 0 2 3
Current position: PTR -> 0
Operation: R, c=2
Ending sequence:
1 0 3
Ending position: PTR -> 1
t = 3
Starting sequence:
1 0 3
Current position: PTR -> 1
Operation: X, c=1
Ending sequence:
1 0 0 3
Ending position: PTR -> 0
The solution is the sequence read from PTR going to the right, so the output should be: 0 0 3 1
As for the constraints:
starting length of C sequence up to 10^7
number of t operations up to 10^7
moving PTR to right up to 10^9 times
I have created my algorithm, which is based on a circular linked list. It works, but it's too slow for some tests. I'd be more than grateful if someone could help me find the best data structure.
I've also got a hint from my teacher that I should use a "binary list", but to be honest I couldn't find anything about this structure on the internet! Maybe someone knows it and can show me where to search for information about it? I'd appreciate any help.
Use a circular doubly linked list. It has O(1) insert and remove. (Perhaps this is what your instructor meant by "binary list.")
Fun fact: you can reduce the memory utilization of a doubly linked list with an XOR trick with code here. Lower memory utilization will mean better speed for large lists due to better cache behavior. There's also a SO Q&A on XOR linked lists, which itemizes some of its drawbacks.
The data structure depends very much on those available in your chosen implementation language. In most, simply use an array, and adjust the index with the modulus operator. That will give you the needed circularity.
This assumes that your chosen language has basic insert & delete methods, or slicing to facilitate doing it on your own.
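For example, the circular step with an array is just modular index arithmetic (an illustrative one-liner, not a full solution to the exercise):

    #include <cstddef>

    // Advance a zero-based index `ptr` by c positions around an array of length `len`.
    std::size_t advance_index(std::size_t ptr, std::size_t c, std::size_t len)
    {
        return (ptr + c) % len;
    }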
R -> remove the element c at index PTR+1 and move PTR forward c times
Looks like a singly linked list: PTR points to a list element, we need to remove the next element and advance the PTR pointer c times.
X -> insert, right after the element c at index PTR (so at index PTR+1), an element with value c-1, and then move PTR forward c times
Also looks like a singly linked list: PTR points to a list element, we insert a new next element with value c-1 and advance the PTR pointer c times.
@dominosam The answer you marked as correct is in fact wrong. There is no need to traverse the list backwards, so all you need is a circular singly linked list, as described, for example, on Wikipedia.
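A minimal sketch of that circular singly-linked-list idea in C++ (node layout and helper names are illustrative, and it ignores the edge case of a one-element list):

    struct Node {
        int   value;
        Node* next;   // in a circular list, the last node points back to the first
    };

    // Operation X's insert: put a new node with value v right after `at`.
    Node* insert_after(Node* at, int v)
    {
        Node* n  = new Node{v, at->next};
        at->next = n;
        return n;
    }

    // Operation R's remove: unlink the node after `at` and return its value.
    int remove_after(Node* at)
    {
        Node* victim = at->next;
        int   v      = victim->value;
        at->next     = victim->next;
        delete victim;
        return v;
    }

    // Move PTR forward c positions around the circle.
    Node* advance(Node* ptr, long long c)
    {
        while (c-- > 0) ptr = ptr->next;
        return ptr;
    }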

Shift elements to the left of a SIMD register based on boolean mask [duplicate]

This question already has answers here:
AVX2 what is the most efficient way to pack left based on a mask?
(6 answers)
This question is related to this: Optimal uint8_t bitmap into a 8 x 32bit SIMD "bool" vector
I would like to create an optimal function with this signature:
__m256i PackLeft(__m256i inputVector, __m256i boolVector);
The desired behaviour is that, for input vectors of 64-bit ints like this:
inputVector = {42, 17, 13, 3}
boolVector = {true, false, true, false}
It masks all values that have false in the boolVector and then repacks the values that remain to the left. On the output above, the return value should be:
{42, 13, X, X}
... Where X is "I don't care".
An obvious way to do this is to use _mm_movemask_epi8 to get an integer mask out of the bool vector, look up a shuffle mask in a table, and then do a shuffle with that mask.
However, I would like to avoid a lookup table if possible. Is there a faster solution?
This is covered quite well by Andreas Fredriksson in his 2015 GDC talk: https://deplinenoise.files.wordpress.com/2015/03/gdc2015_afredriksson_simd.pdf
Starting on slide 104, he covers how to do this using only SSSE3 and then using just SSE2.
Just saw this problem - perhaps you have already solved it, but I'm still writing up the logic for other programmers who may need to handle this situation.
The solution (in Intel ASM format) is given below. It consists of three steps:
Step 0: convert the 8-bit mask into a 64-bit mask, with each set bit in the original mask expanded to 8 set bits in the expanded mask.
Step 1: use this expanded mask to extract the relevant bits from the source data.
Step 2: since you require the data to be left packed, shift the output left by the appropriate number of bits.
The code is as below:
; Step 0 : convert the 8-bit mask into a 64-bit byte mask
    xor     r8, r8
    movzx   rax, byte ptr mask_pattern
    mov     r9, rax              ; save a copy of the mask - avoids a memory read in Step 2
    mov     rcx, 8               ; size of mask in bit count
outer_loop:
    shr     al, 1                ; get the least significant bit of the mask into CY
    setnc   dl                   ; set DL to 0 if CY=1, else 1
    dec     dl                   ; if the mask lsb was 1, DL becomes 0FFh, else 00h
    shrd    r8, rdx, 8
    loop    outer_loop
; R8 now holds the mask expanded to a byte-wise mask
; Step 1 : extract the selected bits, compressed down to the lowest-order bits
    mov     rax, qword ptr data_pattern
    pext    rax, rax, r8
; Step 2 : shift left, so the packed result is left-aligned as required
    popcnt  r9, r9               ; get the count of bits set in the mask
    mov     rcx, 8
    sub     cl, r9b              ; compute 8 - (count of bits set to 1 in the mask)
    shl     cl, 3                ; convert that byte count into a bit count
    shl     rax, cl
; The required data is in RAX
Trust this helps
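For readers who prefer intrinsics, the same three steps (expand the bit mask to a byte mask, pext, then shift for alignment) might look roughly like this in C++; _pext_u64 requires BMI2, and the function and variable names are just illustrative:

    #include <cstdint>
    #include <immintrin.h>   // _pext_u64 (BMI2)

    // Keep the bytes of `data` whose bit is set in `mask8`, packed toward the
    // most significant end of the result, mirroring the assembly above.
    uint64_t pack_left_bytes(uint64_t data, uint8_t mask8)
    {
        uint64_t byte_mask = 0;          // Step 0: expand each mask bit to 0x00 or 0xFF
        int kept = 0;
        for (int i = 0; i < 8; ++i) {
            if (mask8 & (1u << i)) {
                byte_mask |= 0xFFull << (8 * i);
                ++kept;
            }
        }
        if (kept == 0)
            return 0;                    // avoid an undefined 64-bit shift below

        uint64_t packed = _pext_u64(data, byte_mask);   // Step 1: extract, right-packed
        return packed << (8 * (8 - kept));              // Step 2: left-align the kept bytes
    }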

circular byte shifting in an array

I'm coding an LED display (7x48). The language I'm working in is BASIC (no previous experience in that language, but I know C/C++), and I have a small issue.
I have an array (red[20] of byte), and an example of a current state is:
(to make it easier, let's say here it's red[3])
10011010 01011100 01011101
and now I need to shift the array by 1, so in the next cycle it's supposed to be
00110100 10111000 10111011
So what happened is that the whole array shifted left by 1 bit.
The BASIC I'm working with doesn't have any .NET APIs, so I need completely low-level code (it doesn't have to be BASIC, I can translate it; I just need an idea of how to do it, and since I'm limited to 8 KB of code memory I have to fully optimize it).
If most significant bit is 1:
subtract value of most significant bit
multiply by 2
add 1
otherwise:
multiply by 2
You should be able to use bit shift operations:
http://msdn.microsoft.com/en-us/library/2d9yb87a.aspx
Let x be the element you want to shift:
x = (x<<1) | (x>>23)
or in general, if you want to shift left by y bits and there are a total of n bits:
x = (x<<y) | (x>>(n-y))
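Applied to the 24-bit example from the question (y = 1, n = 24), that formula gives exactly the expected result. Here is a small C++ sketch of it (the BASIC operators will differ):

    #include <cstdint>
    #include <cstdio>

    // Rotate a 24-bit value left by y bits (0 < y < 24).
    uint32_t rotl24(uint32_t x, unsigned y)
    {
        return ((x << y) | (x >> (24 - y))) & 0xFFFFFF;
    }

    int main()
    {
        uint32_t leds = 0x9A5C5D;                 // 10011010 01011100 01011101
        std::printf("%06X\n", rotl24(leds, 1));   // prints 34B8BB = 00110100 10111000 10111011
    }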
I don't know BASIC well, but here's what I would do in a C++/Java/C#-like language:
Assuming you have red[] of length n:
int b = 32;     // number of bits per element (your example showed 24, but usually there are 32)
int y = 1;      // number of bits to shift to the left
int carry = 0;  // the bits carried over (I'm assuming they move up the array from red[0] to red[1], etc.)
for (int i = 0; i < n; i++)
{
    int newCarry = (red[i] >> (b - y));
    red[i] = (red[i] << y) | carry;
    carry = newCarry;
}
// Complete the rotation by wrapping the final carry back into the first element
red[0] |= carry;

bit vector implementation of set in Programming Pearls, 2nd Edition

On page 140 of Programming Pearls, 2nd Edition, Jon Bentley proposes an implementation of sets with bit vectors.
We'll turn now to two final structures that exploit the fact that our sets represent integers. Bit vectors are an old friend from Column 1. Here are their private data and functions:
enum { BITSPERWORD = 32, SHIFT = 5, MASK = 0x1F };
int n, hi, *x;
void set(int i) { x[i>>SHIFT] |= (1<<(i & MASK)); }
void clr(int i) { x[i>>SHIFT] &= ~(1<<(i & MASK)); }
int test(int i) { return x[i>>SHIFT] &= (1<<(i & MASK)); }
As I gather, the central idea of using a bit vector to represent an integer set, as described in Column 1, is that the i-th bit is turned on if and only if the integer i is in the set.
But I am really at a loss about the algorithms involved in the above three functions. And the book doesn't give an explanation.
All I can work out is that i & MASK gets the lower 5 bits of i, while i >> SHIFT moves i 5 bits to the right.
Would anybody elaborate on these algorithms? Bit operations have always seemed a mystery to me. :(
Bit Fields and You
I'll use a simple example to explain the basics. Say you have an unsigned integer with four bits:
[0][0][0][0] = 0
You can represent any number here from 0 to 15 by converting it to base 2. Say we have the right end be the smallest:
[0][1][0][1] = 5
So the first bit adds 1 to the total, the second adds 2, the third adds 4, and the fourth adds 8. For example, here's 8:
[1][0][0][0] = 8
So What?
Say you want to represent a binary state in an application-- if some option is enabled, if you should draw some element, and so on. You probably don't want to use an entire integer for each one of these- it'd be using a 32 bit integer to store one bit of information. Or, to continue our example in four bits:
[0][0][0][1] = 1 = ON
[0][0][0][0] = 0 = OFF //what a huge waste of space!
Of course, the problem is more pronounced in real life, since 32-bit integers look like this:
[0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0] = 0
The answer to this is to use a bit field. We have a collection of properties (usually related ones) which we will flip on and off using bit operations. So, say, you might have 4 different lights on a piece of hardware that you want to be on or off.
3 2 1 0
[0][0][0][0] = 0
(Why do we start with light 0? I'll explain this in a second.)
Note that this is an integer, and is stored as an integer, but is used to represent multiple states for multiple objects. Crazy! Say we turn lights 2 and 1 on:
3 2 1 0
[0][1][1][0] = 6
The important thing you should note here: There's probably no obvious reason why lights 2 and 1 being on should equal six, and it may not be obvious how we would do anything with this scheme of information storage. It doesn't look more obvious if you add more bits:
3 2 1 0
[1][1][1][0] = 0xE \\what?
Why do we care about this? Do we have exactly one state for each number between 0 and 15? How are we going to manage this without some insane series of switch statements? Ugh...
The Light at the End
So if you've worked with binary arithmetic a bit before, you might realize that the relationship between the numbers on the left and the numbers on the right is, of course, base 2. That is:
1*(2^3) + 1*(2^2) + 1*(2^1) + 0*(2^0) = 0xE
So each light is present in the exponent of each term of the equation. If the light is on, there is a 1 next to its term- if the light is off, there is a zero. Take the time to convince yourself that there is exactly one integer between 0 and 15 that corresponds to each state in this numbering scheme.
Bit operators
Now that we have this done, let's take a second to see what bitshifting does to integers in this setup.
[0][0][0][1] = 1
When you shift bits to the left or the right in an integer, it literally moves the bits left and right. (Note: I 100% disavow this explanation for negative numbers! There be dragons!)
1<<2 = 4
[0][1][0][0] = 4
4>>1 = 2
[0][0][1][0] = 2
You will encounter similar behavior when shifting numbers represented with more than one bit. Also, it shouldn't be hard to convince yourself that x>>0 or x<<0 is just x. Doesn't shift anywhere.
This probably explains the naming scheme of the Shift operators to anyone who wasn't familiar with them.
Bitwise operations
This representation of numbers in binary can also be used to shed some light on how bitwise operators work on integers. Each bit in the first number is XORed, ANDed, or ORed with the corresponding bit in the second number. Take a second to venture to Wikipedia and familiarize yourself with these Boolean operators - I'll explain how they work on numbers, but I don't want to rehash the general idea in great detail.
...
Welcome back! Let's start by examining the effect of the OR (|) operator on two integers, stored in four bit.
OR OPERATOR ON:
[1][0][0][1] = 0x9
[1][1][0][0] = 0xC
________________
[1][1][0][1] = 0xD
Tough! This is a close analogue to the truth table for the boolean OR operator. Notice that each column ignores the adjacent columns and simply fills in the result column with the result of the first bit and the second bit OR'd together. Note also that the value of anything or'd with 1 is 1 in that particular column. Anything or'd with zero remains the same.
The table for AND (&) is interesting, though somewhat inverted:
AND OPERATOR ON:
[1][0][0][1] = 0x9
[1][1][0][0] = 0xC
________________
[1][0][0][0] = 0x8
In this case we do the same thing- we perform the AND operation with each bit in a column and put the result in that bit. No column cares about any other column.
Important lesson about this, which I invite you to verify by using the diagram above: anything AND-ed with zero is zero. Also, equally important- nothing happens to numbers that are AND-ed with one. They stay the same.
The final table, XOR, has behavior which I hope you all find predictable by now.
XOR OPERATOR ON:
[1][0][0][1] = 0x9
[1][1][0][0] = 0xC
________________
[0][1][0][1] = 0x5
Each bit is being XOR'd with its column, yadda yadda, and so on. But look closely at the first row and the second row. Which bits changed? (Half of them.) Which bits stayed the same? (No points for answering this one.)
The bit in the first row is being changed in the result if (and only if) the bit in the second row is 1!
The one lightbulb example!
So now we have an interesting set of tools we can use to flip individual bits. Let's go back to the lightbulb example and focus only on the first lightbulb.
0
[?] \\We don't know if it's one or zero while coding
We know that we have an operation that can always make this bit equal to one- the OR 1 operator.
0|1 = 1
1|1 = 1
So, ignoring the rest of the bulbs, we could do this
4_bit_lightbulb_integer |= 1;
and know for sure that we did nothing but set the first lightbulb to ON.
3 2 1 0
[0][0][0][?] = 0 or 1? \\4_bit_lightbulb_integer
[0][0][0][1] = 1
________________
[0][0][0][1] = 0x1
Similarly, we can AND the number with zero. Well- not quite zero- we don't want to affect the state of the other bits, so we will fill them in with ones.
I'll use the unary (one-argument) operator for bit negation. The ~ (NOT) bitwise operator flips all of the bits in its argument. ~(0X1):
[0][0][0][1] = 0x1
________________
[1][1][1][0] = 0xE
We will use this in conjunction with the AND bit below.
Let's do 4_bit_lightbulb_integer & 0xE
3 2 1 0
[0][1][0][?] = 4 or 5? \\4_bit_lightbulb_integer
[1][1][1][0] = 0xE
________________
[0][1][0][0] = 0x4
We're seeing a lot of integers on the right-hand-side which don't have any immediate relevance. You should get used to this if you deal with bit fields a lot. Look at the left-hand side. The bit on the right is always zero and the other bits are unchanged. We can turn off light 0 and ignore everything else!
Finally, you can use the XOR bit to flip the first bit selectively!
3 2 1 0
[0][1][0][?] = 4 or 5? \\4_bit_lightbulb_integer
[0][0][0][1] = 0x1
________________
[0][1][0][*] = 4 or 5?
We don't actually know what the value of * is now - just that it flipped from whatever ? was.
Combining Bit Shifting and Bitwise operations
The interesting fact about these two kinds of operations is that, taken together, they allow you to manipulate individual bits.
[0][0][0][1] = 1 = 1<<0
[0][0][1][0] = 2 = 1<<1
[0][1][0][0] = 4 = 1<<2
[1][0][0][0] = 8 = 1<<3
Hmm. Interesting. I'll mention the negation operator here (~) as it's used in a similar way to produce the needed bit values for ANDing stuff in bit fields.
[1][1][1][0] = 0xE = ~(1<<0)
[1][1][0][1] = 0xD = ~(1<<1)
[1][0][1][1] = 0xB = ~(1<<2)
[0][1][1][1] = 0X7 = ~(1<<3)
Are you seeing an interesting relationship between the shift value and the corresponding lightbulb position of the shifted bit?
The canonical bitshift operators
As alluded to above, we have an interesting, generic method for turning on and off specific lights with the bit-shifters above.
To turn on a bulb, we generate the 1 in the right position using bit shifting, and then OR it with the current lightbulb positions. Say we want to turn on light 3, and ignore everything else. We need to get a bit shifting operation that ORs
3 2 1 0
[?][?][?][?] \\all we know about these values at compile time is where they are!
and 0x8
[1][0][0][0] = 0x8
Which is easy, thanks to bitshifting! We'll pick the number of the light and switch the value over:
1<<3 = 0x8
and then:
4_bit_lightbulb_integer |= 0x8;
3 2 1 0
[1][?][?][?] \\the ? marks have not changed!
And we can guarantee that the bit for the 3rd lightbulb is set to 1 and that nothing else has changed.
Clearing a bit works similarly- we'll use the negated bits table above to, say, clear light 2.
~(1<<2) = 0xB = [1][0][1][1]
4_bit_lightbulb_integer & 0xB:
3 2 1 0
[?][?][?][?]
[1][0][1][1]
____________
[?][0][?][?]
The XOR method of flipping bits is the same idea as the OR one.
So the canonical methods of bit switching are this:
Turn on the light i:
4_bit_lightbulb_integer|=(1<<i)
Turn off light i:
4_bit_lightbulb_integer&=~(1<<i)
Flip light i:
4_bit_lightbulb_integer^=(1<<i)
Wait, how do I read these?
In order to check a bit, we can simply zero out all of the bits except for the one we care about. We'll then check whether the resulting value is greater than zero. Since the bit we kept is the only one that could possibly be nonzero, the entire integer will be nonzero if and only if that bit is set. For example, to check bit 2:
1<<2:
[0][1][0][0]
4_bit_lightbulb_integer:
[?][?][?][?]
1<<2 & 4_bit_lightbulb_integer:
[0][?][0][0]
Remember from the previous examples that the value of ? didn't change. Remember also that anything AND 0 is 0. So, we can say for sure that if this value is greater than zero, the switch at position 2 is true and that lightbulb is on. Similarly, if the value is zero, the switch at position 2 is off.
(You can alternately shift the entire value of 4_bit_lightbulb_integer over by i bits and AND it with 1. I don't remember off the top of my head if one is faster than the other but I doubt it.)
So the canonical checking function:
Check if bit i is on:
if (4_bit_lightbulb_integer & 1<<i) {
\\do whatever
}
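Putting the four canonical operations and the check together, here's a tiny self-contained sketch (the variable name just continues the lightbulb example):

    #include <cstdint>
    #include <cstdio>

    int main()
    {
        uint8_t lightbulbs = 0;          // all four lights start off

        lightbulbs |=  (1u << 3);        // turn light 3 on   -> [1][0][0][0]
        lightbulbs |=  (1u << 1);        // turn light 1 on   -> [1][0][1][0]
        lightbulbs &= ~(1u << 3);        // turn light 3 off  -> [0][0][1][0]
        lightbulbs ^=  (1u << 0);        // flip light 0      -> [0][0][1][1]

        if (lightbulbs & (1u << 1))      // check light 1
            std::printf("light 1 is on, state = 0x%X\n", lightbulbs);
    }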
The specifics
Now that we have a complete set of tools for bitwise operations, we can look at the specific example here. This is basically the same idea- except a much more concise and powerful way of executing it. Let's look at this function:
void set(int i) { x[i>>SHIFT] |= (1<<(i & MASK)); }
From the canonical implementation, I'm going to guess that this is trying to set some bits to 1! Let's take an integer and look at what's going on here if I feed the value 0x32 (50 in decimal) into i:
x[0x32>>5] |= (1<<(0x32 & 0x1f))
Well, that's a mess.. let's dissect this operation on the right. For convenience, pretend there are 24 more irrelevant zeros, since these are both 32 bit integers.
...[0][0][0][1][1][1][1][1] = 0x1F
...[0][0][1][1][0][0][1][0] = 0x32
________________________
...[0][0][0][1][0][0][1][0] = 0x12
It looks like everything is being cut off at the boundary on top where 1s turn into zeros. This technique is called Bit Masking. Interestingly, the boundary here restricts the resulting values to be between 0 and 31... Which is exactly the number of bit positions we have for a 32 bit integer!
x[0x32>>5] |= (1<<(0x12))
Let's look at the other half.
...[0][0][1][1][0][0][1][0] = 0x32
Shift five bits to the right:
...[0][0][0][0][0][0][0][1] = 0x01
Note that this transformation exactly destroyed all information from the first part of the function - we have 32 - 5 = 27 remaining bits which could be nonzero. This indicates which of 2^27 possible integers in the array of integers is selected. So the simplified equation is now:
x[1] |= (1<<0x12)
This just looks like the canonical bit-setting operation! We've just chosen which integer in the array to modify and which bit within it to set.
So the idea is to use the upper 27 bits to pick an integer in the array, and the lower five bits to indicate which of the 32 bits in that integer to set.
The key to understanding what's going on is to recognize that BITSPERWORD = 2^SHIFT. Thus, x[i>>SHIFT] finds which 32-bit element of the array x has the bit corresponding to i. (By shifting i 5 bits to the right, you're simply dividing by 32.) Once you have located the correct element of x, the lower 5 bits of i can then be used to find which particular bit of x[i>>SHIFT] corresponds to i. That's what i & MASK does; by shifting 1 by that number of bits, you move the bit corresponding to 1 to the exact position within x[i>>SHIFT] that corresponds to the ith bit in x.
Here's a bit more of an explanation:
Imagine that we want capacity for N bits in our bit vector. Since each int holds 32 bits, we will need (N + 31) / 32 int values for our storage (that is, N/32 rounded up). Within each int value, we will adopt the convention that bits are ordered from least significant to most significant. We will also adopt the convention that the first 32 bits of our vector are in x[0], the next 32 bits are in x[1], and so forth. Here's the memory layout we are using (showing the bit index in our bit vector corresponding to each bit of memory):
+----+----+-------+----+----+----+
x[0]: | 31 | 30 | . . . | 02 | 01 | 00 |
+----+----+-------+----+----+----+
x[1]: | 63 | 62 | . . . | 34 | 33 | 32 |
+----+----+-------+----+----+----+
etc.
Our first step is to allocate the necessary storage capacity:
x = new int[(N + BITSPERWORD - 1) >> SHIFT]
(We could make provision for dynamically expanding this storage, but that would just add complexity to the explanation.)
Now suppose we want to access bit i (either to set it, clear it, or just to know its current value). We need to first figure out which element of x to use. Since there are 32 bits per int value, this is easy:
subscript for x = i / 32
Making use of the enum constants, the x element we want is:
x[i >> SHIFT]
(Think of this as a 32-bit-wide window into our N-bit vector.) Now we have to find the specific bit corresponding to i. Looking at the memory layout, it's not hard to figure out that the first (rightmost) bit in the window corresponds to bit index 32 * (i >> SHIFT). (The window starts after i >> SHIFT slots in x, and each slot has 32 bits.) Since that's the first bit in the window (position 0), the bit we're interested in is at position
i - (32 * (i >> SHIFT))
in the window. With a little experimenting, you can convince yourself that this expression is always equal to i % 32 (actually, that's one definition of the mod operator), which, in turn, is always equal to i & MASK. Since this last expression is the fastest way to calculate what we want, that's what we'll use.
From here, the rest is pretty simple. We start with a single bit in the least-significant position of the window (that is, the constant 1), and move it to the left by i & MASK bits to get it to the position in the window corresponding to bit i in the bit vector. This is where the expression
1 << (i & MASK)
comes from. With the bit now moved to where we want it, we can use this as a mask to set, clear, or query the value of the bit at that position in x[i>>SHIFT] and we know that we're actually setting, clearing, or querying the value of bit i in our bit vector.
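As a concrete worked example of those three functions in runnable form (the storage size N is arbitrary here, and test uses & rather than &= so that querying a bit doesn't disturb the word): setting bit i = 70 touches word 70 >> 5 = 2 and bit 70 & 31 = 6 within it.

    #include <cstdio>

    enum { BITSPERWORD = 32, SHIFT = 5, MASK = 0x1F, N = 1000 };
    int x[(N + BITSPERWORD - 1) >> SHIFT];            // backing storage for N bits

    void set(int i)  { x[i >> SHIFT] |=  (1 << (i & MASK)); }
    void clr(int i)  { x[i >> SHIFT] &= ~(1 << (i & MASK)); }
    int  test(int i) { return x[i >> SHIFT] &   (1 << (i & MASK)); }

    int main()
    {
        set(70);                                                 // sets bit 6 of x[2]
        std::printf("%d %d\n", test(70) != 0, test(71) != 0);    // prints "1 0"
    }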
If you store your bits in an array of n words, you can imagine them to be laid out as a matrix with n rows and 32 columns (BITSPERWORD):
         bit 31 . . . . . . . . . . . . . . . . . . . bit 0
row 0:   xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx
row 1:   xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx
row 2:   xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx
....
row n:   xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx
To get the k-th bit, you divide k by 32. The (integer) result gives you the row (word) the bit is in; the remainder gives you which bit it is within that word.
Dividing by 2^p can be done simply by shifting p positions to the right. The remainder can be obtained by taking the p rightmost bits (i.e. a bitwise AND with (2^p - 1)).
In C terms:
#define div32(k) ((k) >> 5)
#define mod32(k) ((k) & 31)
#define word_the_bit_is_in(k) div32(k)
#define bit_within_word(k) mod32(k)
Hope it helps.

Resources