I have a crash dump of a user-mode Windows program and I want to emulate RtlDecodePointer(), that is, decode some pointer encoded with RtlEncodePointer(). How do I do that?

I studied disasm of ntdll!RtlDecodePointer and was able to compose the following WinDBG expression:
r $t0 = 86aaaa40`0007ff77 // put the value to be decoded here
r $t1 = dwo(ntdll!`RtlpGetCookieValue'::`2'::CookieValue)
r $t2 = #$t1 & 3f
r $t3 = (#$t0 >> (0x40 - #$t2)) | (#$t0 << #$t2)
.printf "Decoded pointer: %p\n", #$t3 ^ #$t1
Or, as a one-liner:
r $t0 = 86aaaa40`0007ff77 // put the value to be decoded here
r $t1 = dwo(ntdll!`RtlpGetCookieValue'::`2'::CookieValue); r $t2 = #$t1 & 3f; r $t3 = (#$t0 >> (0x40 - #$t2)) | (#$t0 << #$t2); .printf "Decoded pointer: %p\n", #$t3 ^ #$t1
This works well even on mini-dumps without full memory.
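The arithmetic in that expression can be checked outside the debugger. Below is a minimal Python sketch of the same steps, assuming a 64-bit target and a cookie that, as with dwo(), is only 32 bits wide. decode_pointer rotates left by (cookie & 0x3F) and then XORs with the cookie; encode_pointer is simply its inverse (XOR, then rotate right), written here only for a round-trip check and inferred from the decode steps rather than taken from ntdll. The cookie value below is hypothetical.

```python
MASK64 = (1 << 64) - 1


def rotl64(value, count):
    """Rotate a 64-bit value left by count bits (count is taken modulo 64)."""
    count &= 63
    value &= MASK64
    return ((value << count) | (value >> (64 - count))) & MASK64


def rotr64(value, count):
    """Rotate a 64-bit value right by count bits."""
    return rotl64(value, 64 - (count & 63))


def decode_pointer(encoded, cookie):
    """Same steps as the WinDbg expression: rotate left by (cookie & 0x3F), then XOR."""
    return rotl64(encoded, cookie & 0x3F) ^ cookie


def encode_pointer(pointer, cookie):
    """Inverse of decode_pointer (XOR, then rotate right); only used to check the round trip."""
    return rotr64(pointer ^ cookie, cookie & 0x3F)


cookie = 0x12345678  # hypothetical cookie value, for illustration only
encoded = encode_pointer(0x00007FF612345678, cookie)
print(hex(decode_pointer(encoded, cookie)))  # 0x7ff612345678
```

With the real cookie read from the dump, decode_pointer(0x86aaaa400007ff77, cookie) should agree with the .printf result.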


Short way of ORing all flags

With the following:
const (
	Flag1 = 0
	Flag2 = uint64(1) << iota
	Flag3
	Flag4
	Flag5
)
is there a shorter/better way of doing this? I want to OR them all:
const FlagAll = Flag1 | Flag2 | Flag3 | Flag4 | Flag5
Since each flag after Flag1 is a single 1 bit, shifted one position further left than the previous one, FlagAll can be defined as the next value in the sequence minus 1:
const (
	Flag1 = 0
	Flag2 = uint64(1) << iota
	Flag3
	Flag4
	Flag5
	FlagAll = uint64(1)<<iota - 1
)
Testing it:
fmt.Printf("%08b\n", Flag1)
fmt.Printf("%08b\n", Flag2)
fmt.Printf("%08b\n", Flag3)
fmt.Printf("%08b\n", Flag4)
fmt.Printf("%08b\n", Flag5)
fmt.Printf("%08b\n", FlagAll)
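This will output (try it on the Go Playground):
00000000
00000010
00000100
00001000
00010000
00011111
Because Flag1 = 0 has no bit set, FlagAll also includes bit 0: it is 0b11111, whereas Flag1 | Flag2 | Flag3 | Flag4 | Flag5 is 0b11110. If Flag1 is meant to be a real flag, declare it as Flag1 = uint64(1) << iota instead; then every flag is a single bit and the two definitions are equal.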
Note that you get the same value if you left shift the last constant and subtract 1:
const FlagAll2 = Flag5<<1 - 1
But this requires you to name the last constant (Flag5) explicitly, while the first solution does not: you may add further flags like Flag6, Flag7, ... before FlagAll, and FlagAll will still have the right value without being changed.
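The arithmetic behind both definitions is that 2^n - 1 is exactly n one-bits, i.e. the OR of the single-bit values 1, 2, ..., 2^(n-1). A quick Python check of that identity (Python is used only to keep the examples in one language; the helper names are hypothetical, and n = 5 mirrors five single-bit flags):

```python
from functools import reduce
from operator import or_


def or_of_flags(n):
    """OR together the single-bit flags 1 << 0, 1 << 1, ..., 1 << (n - 1)."""
    return reduce(or_, (1 << i for i in range(n)), 0)


def next_minus_one(n):
    """The 'next flag in the sequence, minus 1' trick: (1 << n) - 1."""
    # Parentheses needed: in Python, - binds tighter than <<.
    return (1 << n) - 1


for n in range(1, 9):
    assert or_of_flags(n) == next_minus_one(n)

print(format(next_minus_one(5), "08b"))  # 00011111
```

Note the parentheses in next_minus_one: Go gives << the same precedence as *, so uint64(1)<<iota - 1 means (1 << iota) - 1, whereas in Python 1 << n - 1 would parse as 1 << (n - 1).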

Fast CRC32 algorithm for reversed bit order

I am working with a microcontroller which calculates the CRC32 checksum of data I upload to its flash memory on the fly. This can in turn be used to verify that the upload was correct, by checking the resulting checksum after all data is uploaded.
The only problem is that the microcontroller reverses the bit order of the input bytes when they are run through the otherwise standard CRC32 calculation. This in turn means I need to reverse every byte in the data on the programming host in order to calculate the CRC32 sum to verify. As the programming host is somewhat constrained, this is quite slow.
I figure that if it's possible to modify the CRC32 lookup table so I can do the lookup without having to reverse the bit order, the verification algorithm would run many times faster. But I can't figure out a way to do this.
To clarify the byte reversal, I need to change the input bytes in the following way:
01 02 03 04 -> 80 40 C0 20
It's a lot easier to see the reversal in binary representation of course:
00000001 00000010 00000011 00000100 ->
10000000 01000000 11000000 00100000
Here is the proof-of-concept Python code I use to verify the correctness of the CRC32 calculation; however, it reverses each byte (i.e. the slow way).
I've also included my failed attempt to generate a permuted lookup table and use it with a standard LUT CRC32 algorithm.
The code spits out the correct reference CRC value first, and then the wrong LUT calculated CRC afterwards.
import binascii
CRC32_POLY = 0xEDB88320
def reverse_byte_bits(x):
    '''
    Reverses the bit order of the given byte 'x' and returns the result
    '''
    x = ((x<<4) & 0xF0)|((x>>4) & 0x0F)
    x = ((x<<2) & 0xCC)|((x>>2) & 0x33)
    x = ((x<<1) & 0xAA)|((x>>1) & 0x55)
    return x
def reverse_bits(ba, blen):
    '''
    Reverses all bytes in the given array of bytes
    '''
    bar = bytearray()
    for i in range(0, blen):
        bar.append(reverse_byte_bits(ba[i]))
    return bar
def crc32_reverse(ba):
    # Reverse the bits of every byte in the input
    bar = reverse_bits(ba, len(ba))
    # Calculate the CRC value
    return binascii.crc32(bar)
def gen_crc_table_msb():
    crctable = [0] * 256
    for i in range(0, 256):
        remainder = i
        for bit in range(0, 8):
            if remainder & 0x1:
                remainder = (remainder >> 1) ^ CRC32_POLY
            else:
                remainder = (remainder >> 1)
        # The correct index for the calculated value is the reverse of the index
        ix = reverse_byte_bits(i)
        crctable[ix] = remainder
    return crctable
def crc32_revlut(ba, lut):
    crc = 0xFFFFFFFF
    for x in ba:
        crc = lut[x ^ (crc & 0xFF)] ^ (crc >> 8)
    return ~crc
# Reference test which gives the correct CRC
test = bytearray([1, 2, 3, 4, 5, 6, 7, 8])
crcrev = crc32_reverse(test)
print("0x%08X" % (crcrev & 0xFFFFFFFF))
# Test using permuted lookup table, but standard CRC32 LUT algorithm
lut = gen_crc_table_msb()
crctst = crc32_revlut(test, lut)
print("0x%08X" % (crctst & 0xFFFFFFFF))
Does anyone have any hints to how this could be done?
By reversing the direction in which the CRC "streams", the per-byte bit reversal in the main calculation can be avoided. Instead of crc >> 8 there is crc << 8, and instead of XORing the input byte with the bottom byte of the CRC to get the LUT index, we take the top byte. The table entries are bit-reversed to match, and the final result is reversed once at the end. Like this:
# Uses CRC32_POLY and reverse_byte_bits() from the question's code
def reverse_dword_bits(x):
    '''
    Reverses the bit order of the given dword 'x' and returns the result
    '''
    x = ((x<<16) & 0xFFFF0000)|((x>>16) & 0x0000FFFF)
    x = ((x<<8) & 0xFF00FF00)|((x>>8) & 0x00FF00FF)
    x = ((x<<4) & 0xF0F0F0F0)|((x>>4) & 0x0F0F0F0F)
    x = ((x<<2) & 0xCCCCCCCC)|((x>>2) & 0x33333333)
    x = ((x<<1) & 0xAAAAAAAA)|((x>>1) & 0x55555555)
    return x
def gen_crc_table_msb():
    crctable = [0] * 256
    for i in range(0, 256):
        remainder = i
        for bit in range(0, 8):
            if remainder & 0x1:
                remainder = (remainder >> 1) ^ CRC32_POLY
            else:
                remainder = (remainder >> 1)
        # The correct index for the calculated value is the reverse of the index
        ix = reverse_byte_bits(i)
        crctable[ix] = reverse_dword_bits(remainder)
    return crctable
def crc32_revlut(ba, lut):
    crc = 0xFFFFFFFF
    for x in ba:
        # Index with the top byte of the CRC and shift left, keeping 32 bits
        crc = lut[x ^ (crc >> 24)] ^ ((crc << 8) & 0xFFFFFFFF)
    return reverse_dword_bits(~crc)
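For reference, here is a self-contained sketch of the same idea that builds the MSB-first table directly from the non-reflected form of the CRC-32 polynomial (0x04C11DB7, the bit-reverse of 0xEDB88320) instead of permuting the reflected table. The function names are illustrative; it assumes, as in the question, that the microcontroller runs the standard binascii.crc32 algorithm over bit-reversed bytes, and it compares itself against that slow reference.

```python
import binascii

CRC32_POLY_NORMAL = 0x04C11DB7  # CRC-32 polynomial, MSB-first form (bit-reverse of 0xEDB88320)
MASK32 = 0xFFFFFFFF


def reverse_byte_bits(x):
    """Reverse the bit order of one byte (same swaps as in the question)."""
    x = ((x << 4) & 0xF0) | ((x >> 4) & 0x0F)
    x = ((x << 2) & 0xCC) | ((x >> 2) & 0x33)
    x = ((x << 1) & 0xAA) | ((x >> 1) & 0x55)
    return x


def reverse_dword_bits(x):
    """Reverse the bit order of a 32-bit value, one bit at a time."""
    result = 0
    for _ in range(32):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result


def make_msb_table():
    """256-entry table for an MSB-first (non-reflected) CRC-32."""
    table = []
    for i in range(256):
        crc = i << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ CRC32_POLY_NORMAL) & MASK32
            else:
                crc = (crc << 1) & MASK32
        table.append(crc)
    return table


MSB_TABLE = make_msb_table()


def crc32_of_bit_reversed_bytes(data):
    """Same result as crc32_reference(data), with no per-byte bit reversal."""
    crc = MASK32
    for b in data:
        # Index with the TOP byte of the register and shift left
        crc = MSB_TABLE[(crc >> 24) ^ b] ^ ((crc << 8) & MASK32)
    # Final XOR, then one 32-bit reversal to match the reflected register
    return reverse_dword_bits(crc ^ MASK32)


def crc32_reference(data):
    """Slow reference: reverse every byte, then run the standard CRC-32."""
    return binascii.crc32(bytes(reverse_byte_bits(b) for b in data)) & MASK32


test = bytes([1, 2, 3, 4, 5, 6, 7, 8])
print("0x%08X" % crc32_of_bit_reversed_bytes(test))
print("0x%08X" % crc32_reference(test))  # should print the same value
```

The bit-by-bit reverse_dword_bits is only called once per checksum, so its speed hardly matters; the per-byte work is one table lookup, one shift and two XORs.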

How to understand the .bitmap format encoding: converting 1-bit bitmaps to images with pen and paper

I'm struggling to interpret and make sense of the many bitmap format options, and my use case is so simple that I was hoping someone could point me in the right direction: how do I (mentally, with pen and paper) convert a raw .bitmap file to an array of pixels? All the other sites I could find either give a function or library to solve the problem computationally, or I can't understand the fine distinctions between the various formatting options (and I'm also confused about the difference between .bmp and .bitmap).
My image was drawn on a 13 x 11 pixel grid in GIMP, with the image mode set to Indexed (1-bit) before being exported as a .bitmap file. The file is copied below, along with two ASCII representations of it: you should be able to see "73" in the middle, along with some pixels on the top and bottom rows in the pattern 1101001000010.
#define seventythree_with_fibbonacci_spaces_pixels_width 13
#define seventythree_with_fibbonacci_spaces_pixels_height 11
static unsigned char seventythree_with_fibbonacci_spaces_pixels_bits[] = {
0x4b, 0x08, 0x3e, 0x07, 0xa0, 0x04, 0x30, 0x04, 0x10, 0x07, 0x18, 0x04,
0x0c, 0x04, 0x84, 0x04, 0x84, 0x03, 0x00, 0x00, 0x4b, 0x08 };
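1101001000010   ## #  #    #
0111110011100    #####  ###
0000010100100        # #  #
0000110000100       ##    #
0000100011100       #   ###
0001100000100      ##     #
0011000000100     ##      #
0010000100100     #    #  #
0010000111000     #    ###
0000000000000
1101001000010   ## #  #    #
// where a 1 or # represents a black pixel square,
// and a 0 or space is a white/blank square.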
Now my question is: what is the exact mathematical relationship between those 22 8-bit words [75, 8, 62, ...., 75, 8] and the original picture that I drew in GIMP?
I want to be able to draw simple images in GIMP, convert it to a plain grid or array of bits/bools, and then I can use that array to redraw the picture in a totally different context (video game maps to be precise, with the black pixels mapping to, e.g., walls).
I believe I've solved it now: see this Python script (I commented heavily; I know that's kind of a marmite decision for some people!). I was using Python 3 (3.6.5).
"""
In GIMP, draw a WIDTH x HEIGHT Black & White picture:: Image/Mode >> Indexed...<use black and white (1 Bit) palette>:: export as .bitmap and choose the X10 format option.
This script can draw a simple ASCII image of it, or be used to get a binary (0/1) version of it, written to <saveFileName>.binaryimage
For testing, I used:
python3 <this script> Lev1_TD.bitmap
"""
from sys import argv  # argument values
from math import ceil
# Character used to draw a black pixel in the terminal (BLACK_BOX is not
# defined anywhere else in this script; any character will do)
BLACK_BOX = "\u2588"
if len(argv) > 2:
    saveFileName = argv[2]
else:
    saveFileName = "image_dump"
with open(argv[1], "r") as f:
    l = f.readlines()
WIDTH = int(l[0].split()[-1])
HEIGHT = int(l[1].split()[-1])
# lines 0 and 1 were only useful for getting the image width (x) and height (y)
# line 2 is just "static unsigned short Lev1_TD_bits[] = {"
l = l[3:]
# the last line looks like ' 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0x000f };\n'
# so strip it and cut off the trailing " };"
l[-1] = l[-1].strip()[:-3]
l = [line.strip() for line in l]
# every line but the last ends with ',', so add one there too;
# now every line has the same format (although l[-1] may have fewer elements)
l[-1] = l[-1] + ","
# every line is now one ' ' short of a multiple of 8 characters
# (8 == len("0xffff, "), if you wondered where '8' came from)
l = [line + " " for line in l]
# the '[:-1]' cuts out the last element, which by construction is the empty string ''
l = [line.split(", ")[:-1] for line in l]
# l is now a list of lists of strings like "0xffff" or "0x0000";
# flatten it and turn each word into a 16-character bit string such as "0000000000001111"
l = [format(int(s, 16), "016b") for sublist in l for s in sublist]
n = ceil(WIDTH / 16)  # number of 16-bit words per row
assert len(l) == n * HEIGHT, "I believe this should always be true, else I've messed up somewhere; or maybe this isn't an X10 Format?"
def row_bits(l, n, i):
    # Row i is made of words n*i ... n*i + n - 1. Within each word the least
    # significant bit is the leftmost pixel, so every bit string is reversed.
    return "".join(word[::-1] for word in l[n * i:n * i + n])
# Keep only the first WIDTH pixels of each row; the rest is padding
l = [row_bits(l, n, i)[:WIDTH] for i in range(HEIGHT)]
with open(saveFileName + ".binaryimage", "a") as f:
    for line in l:
        f.write(line + "\n")
        print("".join(BLACK_BOX if bit == "1" else " " for bit in line))
# Beautiful <3
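The script above handles GIMP's X10 (16-bit) output. For the X11 array in the question, the relationship asked about is the same idea with bytes: each row occupies ceil(13 / 8) = 2 bytes, and within each byte the least significant bit is the leftmost pixel, with the unused bits at the end of a row acting as padding. For the first row, 0x4b = 01001011 read from the least significant bit gives 11010010, and 0x08 = 00001000 read the same way gives 00010000, of which only the first 5 bits (00010) are pixels: 1101001000010, the pattern from the question. A minimal Python sketch assuming that layout (xbm_rows is a hypothetical helper name):

```python
def xbm_rows(data, width, height):
    """Decode an X11-style XBM bit array into rows of '0'/'1' characters.

    Assumes each row is padded to whole bytes and that, within a byte,
    the least significant bit is the leftmost pixel.
    """
    bytes_per_row = (width + 7) // 8
    if len(data) != bytes_per_row * height:
        raise ValueError("array length does not match width x height")
    rows = []
    for r in range(height):
        row_bytes = data[r * bytes_per_row:(r + 1) * bytes_per_row]
        # format(b, "08b") lists bits MSB first, so reverse it to get LSB first
        bits = "".join(format(b, "08b")[::-1] for b in row_bytes)
        rows.append(bits[:width])  # drop the padding bits at the end of the row
    return rows


# The array from the question (13 x 11 pixels)
SEVENTYTHREE = [
    0x4b, 0x08, 0x3e, 0x07, 0xa0, 0x04, 0x30, 0x04, 0x10, 0x07, 0x18, 0x04,
    0x0c, 0x04, 0x84, 0x04, 0x84, 0x03, 0x00, 0x00, 0x4b, 0x08,
]

for row in xbm_rows(SEVENTYTHREE, 13, 11):
    print(row.replace("1", "#").replace("0", " "))
```

The resulting strings are the plain grid of bits the question asks for: rows[y][x] == "1" marks a black pixel (e.g. a wall).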

Go << and >> operators

Could someone please explain to me the usage of << and >> in Go? I guess it is similar to some other languages.
The super (possibly over) simplified definition is just that << is used for "times 2" and >> is for "divided by 2" - and the number after it is how many times.
So n << x is "n times 2, x times". And y >> z is "y divided by 2, z times".
For example, 1 << 5 is "1 times 2, 5 times" or 32. And 32 >> 5 is "32 divided by 2, 5 times" or 1.
From the Go spec, it seems that, at least with integers, it's a binary shift. For example, binary 0b00001000 >> 1 would be 0b00000100, and 0b00001000 << 1 would be 0b00010000.
Go did not accept the 0b notation for binary integer literals before Go 1.13; I was just using it for the example. In decimal, 8 >> 1 is 4, and 8 << 1 is 16. Shifting left by one is the same as multiplying by 2, and shifting right by one is the same as dividing by 2, discarding any remainder (for negative signed values, the result rounds towards negative infinity).
The << and >> operators are Go Arithmetic Operators.
<< left shift integer << unsigned integer
>> right shift integer >> unsigned integer
The shift operators shift the left operand by the shift count specified by the right operand. They implement arithmetic shifts if the left operand is a signed integer and logical shifts if it is an unsigned integer. The shift count must be an unsigned integer. There is no upper limit on the shift count. Shifts behave as if the left operand is shifted n times by 1 for a shift count of n. As a result, x << 1 is the same as x*2 and x >> 1 is the same as x/2 but truncated towards negative infinity.
(This is the wording of older versions of the spec; since Go 1.13 the shift count may also be a signed integer, and a negative count causes a run-time panic.)
They are basically arithmetic operators, and they work the same way in other languages. Here is a basic PHP, C and Go example:
package main
import (
	"fmt"
)
func main() {
	var t, i uint
	t, i = 1, 1
	for i = 1; i < 10; i++ {
		fmt.Printf("%d << %d = %d \n", t, i, t<<i)
	}
	t = 512
	for i = 1; i < 10; i++ {
		fmt.Printf("%d >> %d = %d \n", t, i, t>>i)
	}
}
Go Demo
#include <stdio.h>
int main()
{
    int t = 1;
    int i = 1;
    for (i = 1; i < 10; i++) {
        printf("%d << %d = %d \n", t, i, t << i);
    }
    t = 512;
    for (i = 1; i < 10; i++) {
        printf("%d >> %d = %d \n", t, i, t >> i);
    }
    return 0;
}
C Demo
<?php
$t = $i = 1;
for ($i = 1; $i < 10; $i++) {
    printf("%d << %d = %d \n", $t, $i, $t << $i);
}
print PHP_EOL;
$t = 512;
for ($i = 1; $i < 10; $i++) {
    printf("%d >> %d = %d \n", $t, $i, $t >> $i);
}
PHP Demo
They would all output
1 << 1 = 2
1 << 2 = 4
1 << 3 = 8
1 << 4 = 16
1 << 5 = 32
1 << 6 = 64
1 << 7 = 128
1 << 8 = 256
1 << 9 = 512
512 >> 1 = 256
512 >> 2 = 128
512 >> 3 = 64
512 >> 4 = 32
512 >> 5 = 16
512 >> 6 = 8
512 >> 7 = 4
512 >> 8 = 2
512 >> 9 = 1
n << x = n * 2^x   Example: 3 << 5 = 3 * 2^5 = 96
y >> z = y / 2^z   Example: 512 >> 4 = 512 / 2^4 = 32
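The two formulas above can be spot-checked in Python (used here only to keep the examples in one language; shl and shr are hypothetical names). Python's >> on negative integers rounds towards negative infinity, like the rule quoted from the Go spec, so y / 2^z should be read as floor division:

```python
def shl(n, x):
    """n << x, equal to n * 2**x for any integer n and x >= 0."""
    return n << x


def shr(y, z):
    """y >> z, equal to y // 2**z (floor division) for any integer y and z >= 0."""
    return y >> z


print(shl(3, 5))    # 96, i.e. 3 * 2**5
print(shr(512, 4))  # 32, i.e. 512 / 2**4
print(shr(-7, 1))   # -4: -7 / 2 = -3.5 rounds towards negative infinity
```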
<< is left shift. >> is sign-extending right shift when the left operand is a signed integer, and is zero-extending right shift when the left operand is an unsigned integer.
To better understand >>, consider:
package main
import "fmt"
func main() {
	var u uint32 = 0x80000000
	var i int32 = -2
	fmt.Printf("%#x\n", u>>1) // 0x40000000, similar to >>> in Java
	fmt.Println(i >> 1)       // -1, similar to >> in Java
}
So when applied to an unsigned integer, the bits at the left are filled with zero, whereas when applied to a signed integer, the bits at the left are filled with the leftmost bit (which is 1 when the signed integer is negative as per 2's complement).
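Python integers have no fixed width, so the uint32 / int32 contrast can be simulated by working on 32-bit bit patterns. A small sketch with hypothetical helper names: logical_shr32 fills with zeros like >> on Go's uint32, and arithmetic_shr32 copies the sign bit like >> on Go's int32 (both return the resulting 32-bit pattern):

```python
MASK32 = 0xFFFFFFFF


def as_int32(bits):
    """Interpret the low 32 bits as a two's-complement signed value."""
    bits &= MASK32
    return bits - (1 << 32) if bits & 0x80000000 else bits


def logical_shr32(bits, n):
    """Zero-filling right shift, like >> on Go's uint32."""
    return (bits & MASK32) >> n


def arithmetic_shr32(bits, n):
    """Sign-extending right shift, like >> on Go's int32, as a 32-bit pattern."""
    return (as_int32(bits) >> n) & MASK32


print(hex(logical_shr32(0x80000000, 1)))     # 0x40000000
print(hex(arithmetic_shr32(0x80000000, 1)))  # 0xc0000000
print(as_int32(arithmetic_shr32(-2, 1)))     # -1
```

The same 32-bit pattern 0xFFFFFFFE (which is -2 as an int32) gives 0x7FFFFFFF with the logical shift but stays negative with the arithmetic one.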
Go's << and >> are similar to shifts (that is, multiplication or division by a power of 2) in other languages, but because Go is a safer language than C/C++, it does some extra work when the shift count is a variable.
Shift instructions in x86 CPUs consider only 5 bits (6 bits on 64-bit x86 CPUs) of the shift count. In languages like C/C++, the shift operator translates into a single CPU instruction.
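The following Go code
package main
func main() {
	x := 10
	y := uint(1025) // A big shift count
	println(x >> y)
	println(x << y)
}
prints 0 and 0, while a C/C++ program compiled for x86 would typically print 5 and 20: the CPU uses only the low bits of the shift count, and 1025 & 63 == 1 (in C/C++ a shift count this large is undefined behaviour, so even that result is not guaranteed).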
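To make the difference concrete, the sketch below simulates both behaviours for a 64-bit int in Python. The go_* functions follow the Go rule that a count at least as large as the width shifts every bit out (0, or -1 for a negative value shifted right); the x86_* functions model a raw SHL/SAR instruction that only looks at the low 6 bits of the count. The names are hypothetical, and a real C/C++ compiler is not obliged to emit a plain shift instruction, since such shifts are undefined behaviour there.

```python
MASK64 = (1 << 64) - 1


def to_int64(bits):
    """Wrap an integer to a signed 64-bit two's-complement value."""
    bits &= MASK64
    return bits - (1 << 64) if bits >> 63 else bits


def go_shl(x, count):
    """Go semantics for int64 << count: counts >= 64 shift every bit out."""
    return 0 if count >= 64 else to_int64(x << count)


def go_shr(x, count):
    """Go semantics for int64 >> count: large counts leave only the sign (0 or -1)."""
    if count >= 64:
        return 0 if x >= 0 else -1
    return x >> count


def x86_shl(x, count):
    """Raw 64-bit x86 SHL: only the low 6 bits of count are used."""
    return to_int64(x << (count & 63))


def x86_sar(x, count):
    """Raw 64-bit x86 SAR: again only count & 63 is used."""
    return x >> (count & 63)


print(go_shr(10, 1025), go_shl(10, 1025))    # 0 0
print(x86_sar(10, 1025), x86_shl(10, 1025))  # 5 20
```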
In decimal, when we multiply or divide by 10, we affect the zeros at the end of the number.
In binary, 2 has the same effect: shifting left adds a zero to the end, and shifting right removes the last digit.
<< is the bitwise left shift operator, which shifts the bits of the integer to the left; the rightmost bit is 0 after the shift.
For example:
With gcc, an int is 4 bytes, i.e. 32 bits.
The binary representation of 3 is
00000000 00000000 00000000 00000011
3<<1 would give
00000000 00000000 00000000 00000110 which is 6.
In general 1<<x would give you 2^x
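With gcc,
1<<20 gives 2^20, that is 1048576,
but with a compiler whose int is only 2 bytes (16 bits), such as 16-bit Turbo C, the shift count is larger than the width of int; that is undefined behaviour, and in practice you may get 0.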
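Fixed-width results can be modelled in Python by masking off everything above the type's width. A sketch assuming unsigned wrap-around semantics (shl_fixed is a hypothetical helper); it shows the set bit of 1 << 20 falling outside a 16-bit value, but in C a shift count at or above the width of int is undefined, so a real 16-bit compiler is not guaranteed to produce 0:

```python
def shl_fixed(value, count, width):
    """Left shift keeping only the low `width` bits (models a fixed-size unsigned int)."""
    return (value << count) & ((1 << width) - 1)


print(format(shl_fixed(3, 1, 32), "032b"))  # 00000000000000000000000000000110
print(shl_fixed(1, 20, 32))                 # 1048576
print(shl_fixed(1, 20, 16))                 # 0
```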
These are the bitwise right-shift (>>) and left-shift (<<) operators.

How to get lg2 of a number that is 2^k

What is the best solution for getting the base 2 logarithm of a number that I know is a power of two (2^k). (Of course I know only the value 2^k not k itself.)
One way I thought of doing is by subtracting 1 and then doing a bitcount:
lg2(n) = bitcount( n - 1 ) = k, iff k is an integer
0b10000 - 1 = 0b01111, bitcount(0b01111) = 4
But is there a faster way of doing it (without caching)? Also, something that doesn't involve bitcount but is about as fast would be nice to know.
One application of this:
suppose you have bitmask
and value
and you are interested in
(value & bitmask) >> number of trailing zeros of bitmask
(0b0101010101 & 0b0110111000) >> 3 = 0b100010
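This can be done
using bitcount:
(value & bitmask) >> (bitcount((bitmask - 1) xor bitmask) - 1)
or using lg2:
(value & bitmask) >> (lg2(((bitmask - 1) xor bitmask) + 1) - 1)
With t trailing zeros, (bitmask - 1) xor bitmask has t + 1 one-bits, so adding 1 gives 2^(t+1), and subtracting 1 from its lg2 gives t.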
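A quick Python check of the bitcount formula on the example above, alongside the lg2 variant with shift amount lg2(((bitmask - 1) ^ bitmask) + 1) - 1 (lg2 computed via int.bit_length(), which is exact for powers of two) and a swapped-in alternative that is not from the question: bitmask & -bitmask isolates the lowest set bit directly. The helper names are hypothetical, and all three assume bitmask != 0:

```python
def popcount(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")


def lg2(x):
    """Base-2 logarithm of a power of two, via the position of its only set bit."""
    return x.bit_length() - 1


def extract_bitcount(value, bitmask):
    # (bitmask - 1) ^ bitmask has t + 1 ones, where t = trailing zeros of bitmask
    return (value & bitmask) >> (popcount((bitmask - 1) ^ bitmask) - 1)


def extract_lg2(value, bitmask):
    # ((bitmask - 1) ^ bitmask) + 1 == 2**(t + 1)
    return (value & bitmask) >> (lg2(((bitmask - 1) ^ bitmask) + 1) - 1)


def extract_lowest_bit(value, bitmask):
    # bitmask & -bitmask == 2**t (two's-complement trick; Python ints behave the same way)
    return (value & bitmask) >> lg2(bitmask & -bitmask)


print(bin(extract_bitcount(0b0101010101, 0b0110111000)))  # 0b100010
```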
For it to be faster than bitcount without caching it should be faster than O(lg(k)) where k is the count of storage bits.
Yes. Here's a way to do it without the bitcount in lg(n), if you know the integer in question is a power of 2.
unsigned int x = ...; /* must be a power of two; 32-bit unsigned int assumed */
static const unsigned int arr[] = {
    // Each element in this array alternates a number of 1s equal to
    // consecutive powers of two with an equal number of 0s.
    0xAAAAAAAA, // 0b10101010.. // one 1, then one 0, ...
    0xCCCCCCCC, // 0b11001100.. // two 1s, then two 0s, ...
    0xF0F0F0F0, // 0b11110000.. // four 1s, then four 0s, ...
    0xFF00FF00, // 0b1111111100000000.. // eight 1s, then eight 0s, ...
    0xFFFF0000  // 0b11111111111111110000000000000000 // sixteen 1s, then sixteen 0s
};
register unsigned int reg = (x & arr[0]) != 0;
reg |= ((x & arr[4]) != 0) << 4;
reg |= ((x & arr[3]) != 0) << 3;
reg |= ((x & arr[2]) != 0) << 2;
reg |= ((x & arr[1]) != 0) << 1;
// reg now has the value of lg(x).
In each of the reg |= steps, we successively test to see if any of the bits of x are shared with alternating bitmasks in arr. If they are, that means that lg(x) has bits which are in that bitmask, and we effectively add 2^k to reg, where k is the log of the length of the alternating bitmask. For example, 0xFF00FF00 is an alternating sequence of 8 ones and zeroes, so k is 3 (or lg(8)) for this bitmask.
Essentially, each reg |= ((x & arr[k]) ... step (and the initial assignment) tests whether lg(x) has bit k set. If so, we add it to reg; the sum of all those bits will be lg(x).
That looks like a lot of magic, so let's try an example. Suppose we want to know what power of 2 the value 2,048 is:
// x = 2048
//   = 1000 0000 0000
register unsigned int reg = (x & arr[0]) != 0;
//   x      = 1000 0000 0000
// & arr[0] = ... 1010 1010 1010
//          = 1000 0000 0000 != 0
// reg = 0x1 (1)          // <-- Matched! Add 2^0 to reg.
reg |= ((x & arr[4]) != 0) << 4;
//   x      = 0x .. 0800
// & arr[4] = 0x .. 0000
//          = 0, so the test is false
// reg = reg | (0 << 4)   // <--- No match.
// reg = 0x1 | 0
// reg remains 0x1.
reg |= ((x & arr[3]) != 0) << 3;
//   x      = 0x .. 0800
// & arr[3] = 0x .. FF00
//          = 0x800 != 0
// reg = reg | (1 << 3)   // <--- Matched! Add 2^3 to reg.
// reg = 0x1 | 0x8
// reg is now 0x9.
reg |= ((x & arr[2]) != 0) << 2;
//   x      = 0x .. 0800
// & arr[2] = 0x .. F0F0
//          = 0, so the test is false
// reg = reg | (0 << 2)   // <--- No match.
// reg = 0x9 | 0
// reg remains 0x9.
reg |= ((x & arr[1]) != 0) << 1;
//   x      = 0x .. 0800
// & arr[1] = 0x .. CCCC
//          = 0x800 != 0
// reg = reg | (1 << 1)   // <--- Matched! Add 2^1 to reg.
// reg = 0x9 | 0x2
// reg is now 0xb (11).
We see that the final value of reg is 2^0 + 2^1 + 2^3, which is indeed 11.
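The same mask-testing idea as a Python sketch for 32-bit values (lg2_pow2 and ARR are hypothetical names; ARR lists the same five masks as arr, in order). Testing the masks in a loop is equivalent to the unrolled C code, because each mask contributes a different bit of the result:

```python
# The five masks from arr, for 32-bit values: ARR[j] has bit p set exactly
# when bit j of p is 1, so testing x against it reads off bit j of lg(x).
ARR = [0xAAAAAAAA, 0xCCCCCCCC, 0xF0F0F0F0, 0xFF00FF00, 0xFFFF0000]


def lg2_pow2(x):
    """Return k for x == 2**k with 0 <= k < 32."""
    reg = 0
    for j, mask in enumerate(ARR):
        if x & mask:
            reg |= 1 << j
    return reg


print(lg2_pow2(2048))  # 11
```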
If you know the number is a power of 2, you could just shift it right (>>) until it equals 0. The amount of times you shifted right (minus 1) is your k.
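A sketch of the shift-until-zero approach in Python (lg2_by_shifting is a hypothetical name). It takes k + 1 iterations for x == 2**k, so it is linear in k rather than logarithmic like the mask method:

```python
def lg2_by_shifting(x):
    """Return k for x == 2**k by counting right shifts until x reaches 0."""
    shifts = 0
    while x:
        x >>= 1
        shifts += 1
    return shifts - 1  # 2**k needs k + 1 shifts to reach 0


print(lg2_by_shifting(2048))  # 11
```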
Edit: the lookup-table method is faster than this (though you sacrifice some space, but not a ton).
Many architectures have a "find first one" instruction (bsr, clz, bfffo, cntlzw, etc.) which will be much faster than bit-counting approaches.
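If you don't mind dealing with floats, you can use log(x) / log(2), rounded to the nearest integer, because the floating-point quotient is not guaranteed to be exact.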
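Python does not expose those instructions directly, but int.bit_length() answers the same question (for a 32-bit x, k = 31 - clz(x) = x.bit_length() - 1). A sketch comparing it with the floating-point route, where the quotient is rounded because it may be slightly off (function names are hypothetical):

```python
import math


def lg2_bit_length(x):
    """k for x == 2**k: bit_length() is the position of the highest set bit plus 1."""
    return x.bit_length() - 1


def lg2_float(x):
    """k for x == 2**k via logarithms; rounded because the quotient may be slightly off."""
    return round(math.log(x) / math.log(2))


print(lg2_bit_length(2048), lg2_float(2048))  # 11 11
```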
