Pointer increment difference between 32-bit and 64-bit - winapi

I was trying to run some drivers written for 32-bit Vista (x86) on 64-bit Win7 (amd64), and they were not running. After a lot of debugging and trial and error, I got them working on the latter, but I don't know why it works. This is what I did:
In many places, buffer pointers pointed to an array of structures (different in different places), and to increment them, some places used this type of statement:
ptr = (PVOID)((PCHAR)ptr + offset);
And at some places:
ptr = (PVOID)((ULONG)ptr + offset);
The 2nd one was returning garbage, so I changed them all to the 1st one. But I found many sample drivers on the net following the second one. My questions:
1. Where are these macros defined? (Google didn't help much.)
2. I understand all the P_ macros are pointers; why was a pointer cast to ULONG? How does this work on 32-bit?
3. PCHAR obviously changes the width according to the environment. Do you know any place to find documentation for this?

1. They should be defined in WinNT.h (they are in the SDK; I don't have the DDK at hand).
2. ULONG is unsigned long; on a 32-bit system, this is the size of a pointer. So a pointer can be converted back and forth to ULONG without loss, but not so on a 64-bit system (where casting the value will truncate it). People cast to ULONG to get byte-based pointer arithmetic (even though this has undefined behavior, as you found out).
3. Pointer arithmetic always works in units of the underlying type, i.e. in CHARs for PCHAR; this equates to byte arithmetic. Any C book should elaborate on the precise semantics of pointer arithmetic.
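A minimal portable sketch of the byte-offset idiom, in standard C rather than with DDK headers: `char *` plays the role of PCHAR, and `uintptr_t` stands in for a pointer-sized integer (the Windows headers' own pointer-sized type is ULONG_PTR). The `record` type and the helper names are hypothetical, for illustration only; the truncating `(ULONG)ptr` form is deliberately not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type standing in for the driver's structures. */
typedef struct {
    uint32_t id;
    uint32_t len;
} record;

/* PCHAR-style: do the arithmetic on a char pointer, so offset counts bytes. */
static void *advance_bytes(void *ptr, size_t offset)
{
    return (void *)((char *)ptr + offset);
}

/* Integer-style: only safe with a pointer-sized integer such as uintptr_t
   (ULONG_PTR on Windows), never with a 32-bit ULONG on 64-bit Windows. */
static void *advance_bytes_uintptr(void *ptr, size_t offset)
{
    return (void *)((uintptr_t)ptr + offset);
}
```

Both helpers step through the array by whole records when the offset is a multiple of `sizeof(record)`; only the integer version depends on the integer type being as wide as a pointer.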

The reason this code fails on 64-bit is that it is casting pointers to ULONG. ULONG is a 32-bit value while pointers on 64-bit are 64-bit values. So you will be truncating the pointer whenever you use the ULONG cast.
The PCHAR cast, assuming PCHAR is defined as char * is fine, provided the intention is to increment the pointer by an explicit number of bytes.
Both macros have the same intention, but only one of them is valid where pointers are larger than 32 bits.
Pointer arithmetic works like this. If you have:
T *p;
and you do:
p + n;
(where n is a number), then the resulting address will differ from the address in p by n * sizeof(T) bytes.
To give a concrete example, if you have a pointer to a DWORD:
DWORD *pdw = &some_dword_in_memory;
and you add one to it:
pdw = pdw + 1;
then you will be pointing to the next DWORD. The address pdw points to will have increased by sizeof(DWORD), i.e. 4 bytes.
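To see the scaling concretely, the sketch below uses `uint32_t` as a stand-in for DWORD (both are 32-bit unsigned types) and measures in bytes how far each kind of increment moves. The function names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes moved by "p + 1" when p points to a 4-byte DWORD-like type. */
static ptrdiff_t typed_step(uint32_t *p)
{
    uint32_t *next = p + 1;            /* advances by sizeof(uint32_t) */
    return (char *)next - (char *)p;   /* measure the distance in bytes */
}

/* Bytes moved by "(char *)p + 1": the char cast makes the unit one byte. */
static ptrdiff_t byte_step(uint32_t *p)
{
    char *next = (char *)p + 1;
    return next - (char *)p;
}
```

On any platform with an 8-bit char, `typed_step` reports 4 and `byte_step` reports 1.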
The macros you mention are using casts to cause the address offsets they apply to be multiplied by different amounts. This is normally only done in low-level code which has been passed a BYTE (or char or void) buffer but knows the data inside it is really some other type.

ULONG is defined in WinDef.h in Windows SDK and is always 32-bit, so when you cast a 64-bit pointer into ULONG you truncate the pointer to 32 bits.
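The truncation can be reproduced without a real 64-bit pointer: converting a 64-bit value to a 32-bit unsigned type keeps only the low 32 bits. In this sketch `uint32_t` stands in for ULONG, and the address is a made-up example of a user-mode address above 4 GB.

```c
#include <stdint.h>

/* What (ULONG)ptr does on 64-bit Windows: keep the low 32 bits only. */
static uint32_t truncate_like_ulong(uint64_t address)
{
    return (uint32_t)address;   /* unsigned conversion is modulo 2^32 */
}

/* Widening the truncated value again fills the high half with zeros,
   so the original high 32 bits of the address are lost for good. */
static uint64_t round_trip_through_ulong(uint64_t address)
{
    return (uint64_t)truncate_like_ulong(address);
}
```

An address below 4 GB survives the round trip, which is why the ULONG cast appears to work on 32-bit systems.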

Related

Why does unsafe.Sizeof return a uintptr?

As per the documentation (https://golang.org/pkg/unsafe/#Sizeof), unsafe.Sizeof returns the size of the given expression in bytes. The size of any given expression could ideally be denoted by a uint32 or uint64. Then why does Golang return a uintptr instead? Isn't that confusing? A uintptr is supposed to hold a pointer to some data value, but in this case it is not actually a pointer; it is just a number, right?
There are a lot of good answers in the comments, which boil down to "because that's big enough, yet not too big". I think, though, it might be helpful to view this from a historical perspective, with particular attention to how this all came about in the C programming language.
In very old (pre-standard) C, if you go back far enough in time, there was not even an explicit unsigned integer type. The PDP-11 had:
char, which was 8 bits and signed;
int, which was 16 bits and signed; and
pointers, which were 16 bits and unsigned.
That is:
int i;
int *u;
was how you made two integers, i being signed, and u being unsigned. Setting i to 32767 (0x7fff) and then incrementing it gave you -32768 (0x8000), which gradually increased to -1 (0xffff) and then zero. Setting u to 32767 and then incrementing it gave you 32768, which gradually increased to 65535, and then rolled over to zero.
The lack of distinction between integers and pointers meant that device drivers could read:
struct {
int csr;
int blk;
int bar;
int bcr;
};
0177440->bcr = count;
0177440->blk = block;
0177440->bar = addr;
0177440->csr = READ | GO;
which might be how one told a device to read some bytes or blocks.
(This is also why struct member names, like st_ino in struct stat, were all prefixed like this: st_ino just meant "some integer offset" and you could use the st_ino member with any pointer, or even with an ordinary variable. The prefix meant you could #include multiple headers without having their struct member names collide.)
All of this became untenable when C was made to work on 32-bit and other machines. C grew an unsigned integer type, rather than pressing pointers into service as unsigned integers, and Steve Johnson's PCC compiler turned unsigned into a modifier that could be applied to char and short as well as int. A lot of experimentation occurred. Eventually, in 1989, C was first standardized with most of the syntax and semantics that we have now (though new standards have added new types, many functions, and so on).
Some of the early C pioneers were involved with creating Go, with particular influence from Ken Thompson. There is a quote on the Wikipedia page that is appropriate here:
When the three of us [Thompson, Rob Pike, and Robert Griesemer] got started, it was pure research. The three of us got together and decided that we hated C++. [laughter] ... [Returning to Go,] we started off with the idea that all three of us had to be talked into every feature in the language, so there was no extraneous garbage put into the language for any reason.
As we see from the early days of C, a pointer-as-integer is a suitable unsigned type that can not only hold any pointer, but, if treated as unsigned, can also hold any object size. A pointer-as-integer is not directly usable as a pointer, of course, and with a GC system and concurrency, we need the language itself to have pointers. But we also need to be able to write the runtime support for the language[1], for which we need integer-ized pointers, which also covers all of our needs for object sizes. So one type, built into the compiler, covers all the requirements. That is as simple as possible, but no simpler.
[1] I say "we" as if I had anything to do with it. It's just obvious, once you have implemented a few runtime systems.

What is the size of a DIBSECTION?

I am looking to find the size of a device independent bitmap structure for use with GetObject in the Windows API. I have an hBitmap. GetObject says that to get information about the hBitmap, I can send a buffer with either the size of a BITMAP structure or the size of a DIBSECTION. I don't know the exact sizes of the BITMAP and DIBSECTION structs; can anyone let me know what they are on both 32-bit and 64-bit systems?
You don't need to do any math manually. Simply declare a DIBSECTION variable, and then pass a pointer to it to GetObject() along with sizeof() as the size of that variable, eg:
DIBSECTION dib;
GetObject(hBitmap, sizeof(dib), &dib);
I took out a piece of paper and added up everything myself.
A DIBSECTION contains 5 parts.
typedef struct tagDIBSECTION {
BITMAP dsBm;
BITMAPINFOHEADER dsBmih;
DWORD dsBitfields[3];
HANDLE dshSection;
DWORD dsOffset;
} DIBSECTION, *LPDIBSECTION, *PDIBSECTION;
So let's start with BITMAP.
typedef struct tagBITMAP {
LONG bmType;
LONG bmWidth;
LONG bmHeight;
LONG bmWidthBytes;
WORD bmPlanes;
WORD bmBitsPixel;
LPVOID bmBits;
} BITMAP, *PBITMAP, *NPBITMAP, *LPBITMAP;
A LONG is a 32-bit signed integer (4 bytes), even on 64-bit Windows. A WORD is an unsigned short, which is 2 bytes. And LPVOID is a pointer: 4 bytes on 32-bit, 8 bytes on 64-bit.
4+4+4+4+2+2 = 20. But wait, struct members have to be aligned properly: on 64-bit systems the pointer must start at an offset divisible by 8. 20 is not divisible by 8, so we add 4 bytes of padding to get 24. Adding the 8-byte pointer gives us 32.
The size of the BITMAPINFOHEADER is 40 bytes. It only needs 4-byte alignment, and offset 32 already satisfies that, so nothing fancy is needed. We're at 72 now.
Back to the DIBSECTION. There's an array of three DWORDs, and each DWORD is a 4-byte unsigned integer. Adding 12 to 72 gives us 84.
Now there's a handle. A handle is basically a pointer, whose size is 4 or 8 bytes depending on 32 or 64 bit. Time to check if 84 is divisible by 8. It's not, so we add 4 bytes of padding to get 88. Then add the 8-byte handle to get 96.
Finally there's the last DWORD, and the members end at byte 100 on a 64-bit system.
But what about sizeof()? Can't you just do sizeof(DIBSECTION)? After all, magic numbers = bad. Ken White said in the comments that I didn't need to do any math. I disagree with this. First, as a programmer, it's essential to understand what is happening and why. Nothing could be more elementary than memory on a computer. Second, I only tagged the post as winapi. For the people reading this, if you scroll down on the GetObject page, the function is exported from Gdi32.dll. Any Windows program has access to Gdi32.dll. Not every Windows program has access to sizeof(). Third, it may be important for people who need to know the math to have the steps shown. Not everyone programs in a high-level language. It might even be a question on an exam.
Perhaps the real question is whether a struct whose members end at 100 bytes gets padded to 104 on a 64-bit system. It does: a struct's size is rounded up to a multiple of its strictest member alignment, which is 8 here because of the pointer and the handle, so sizeof(DIBSECTION) is 104 on 64-bit. (On 32-bit, with 4-byte pointers and no padding needed, the same walk gives 24 + 40 + 12 + 4 + 4 = 84.)
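The walk-through can be checked without a Windows compiler by mirroring the field types with fixed-width C types and letting the compiler compute the offsets. The `mirror_*` names are hypothetical, and this assumes the target ABI aligns 32-bit integers to 4 bytes and pointers to their own size, as both Windows ABIs and mainstream Linux ABIs do. On Windows itself, `sizeof(DIBSECTION)` is still the value to pass to GetObject.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of the Windows structs using fixed-width types:
   LONG/DWORD -> 32-bit, WORD -> 16-bit, LPVOID/HANDLE -> void *. */
typedef struct {
    int32_t  bmType, bmWidth, bmHeight, bmWidthBytes;
    uint16_t bmPlanes, bmBitsPixel;
    void    *bmBits;                  /* padded up to pointer alignment */
} mirror_bitmap;

typedef struct {
    uint32_t biSize;
    int32_t  biWidth, biHeight;
    uint16_t biPlanes, biBitCount;
    uint32_t biCompression, biSizeImage;
    int32_t  biXPelsPerMeter, biYPelsPerMeter;
    uint32_t biClrUsed, biClrImportant;
} mirror_bitmapinfoheader;            /* 40 bytes, 4-byte aligned */

typedef struct {
    mirror_bitmap           dsBm;
    mirror_bitmapinfoheader dsBmih;
    uint32_t                dsBitfields[3];
    void                   *dshSection;  /* HANDLE is pointer-sized */
    uint32_t                dsOffset;     /* followed by trailing padding */
} mirror_dibsection;
```

On a 64-bit target, `offsetof` reports 72, 88 and 96 for `dsBitfields`, `dshSection` and `dsOffset`, and `sizeof(mirror_dibsection)` is 104; on a 32-bit target the struct is 84 bytes.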

Pointers to static variables must respect canonical form?

Assuming I have the following example:
#include <cstdint>
struct Dummy {
uint64_t m{0llu};
template < class T > static uint64_t UniqueID() noexcept {
static const uint64_t uid = 0xBEA57;
return reinterpret_cast< uint64_t >(&uid);
}
template < class T > static uint64_t BuildID() noexcept {
static const uint64_t id = UniqueID< T >()
// dummy bits for the sake of example (whole last byte is used)
| (1llu << 60llu) | (1llu << 61llu) | (1llu << 63llu);
return id;
}
// Copy bits 48 through 55 over to bits 56 through 63 to keep canonical form.
uint64_t GetUID() const noexcept {
return ((m & ~(0xFFllu << 56llu)) | ((m & (0xFFllu << 48llu)) << 8llu));
}
uint64_t GetPayload() const noexcept {
return *reinterpret_cast< uint64_t * >(GetUID());
}
};
template < class T > inline Dummy DummyID() noexcept {
return Dummy{Dummy::BuildID< T >()};
}
I know very well that the resulting pointer is the address of a static variable in the program.
When I call GetUID(), do I need to make sure that bit 47 is repeated up to bit 63?
Or can I just AND with a mask of the lower 48 bits and ignore this rule?
I was unable to find any information about this, and I assume that those 16 bits are likely to always be 0.
This example is strictly limited to the x86_64 architecture (x32).
In user-space code for mainstream x86-64 OSes, you can normally assume that the upper bits of any valid address are zero.
AFAIK, all the mainstream x86-64 OSes use a high-half kernel design where user-space addresses are always in the lower canonical range.
If you wanted this code to work in kernel code, too, you would want to sign-extend with x <<= 16; x >>= 16; using signed int64_t x.
If the compiler can't keep 0x0000FFFFFFFFFFFF = (1ULL<<48)-1 around in a register across multiple uses, 2 shifts might be more efficient anyway. (mov r64, imm64 to create that wide constant is a 10-byte instruction that can sometimes be slow to decode or fetch from the uop cache.) But if you're compiling with -march=haswell or newer, then BMI1 is available so the compiler can do mov eax, 48 / bzhi rsi, rdi, rax. Either way, though, one AND or BZHI is only 1 cycle of critical path latency for the pointer vs. 2 for 2 shifts. Unfortunately BZHI isn't available with an immediate operand. (x86 bitfield instructions mostly suck compared to ARM or PowerPC.)
Your current method of extracting bits [55:48] and using them to replace the current bits [63:56] is probably slower because the compiler has to mask out the old high byte and then OR in the new high byte. That's already at least 2 cycle latency so you might as well just shift, or mask which can be faster.
x86 has crap bitfield instructions, so that was never a good plan. Unfortunately, before C++20, ISO C++ doesn't guarantee an arithmetic right shift, but on all actual x86-64 compilers, >> on a signed integer is a 2's complement arithmetic shift. If you want to be really careful about avoiding UB, do the left shift on an unsigned type to avoid signed integer overflow.
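A C sketch of both re-canonicalization options discussed above. It does the left shift on `uint64_t` to avoid signed overflow; the conversion back to `int64_t` and the right shift of a negative value are implementation-defined in C, so this assumes (as on GCC, Clang and MSVC) two's complement wrap-around and an arithmetic right shift. The function names are illustrative.

```c
#include <stdint.h>

/* Sign-extend bit 47 into bits 48..63: valid for user and kernel addresses. */
static inline uint64_t canonical48_sign_extend(uint64_t x)
{
    /* Shift left on unsigned (no UB), then arithmetic shift right on signed. */
    return (uint64_t)((int64_t)(x << 16) >> 16);
}

/* User-space shortcut: assume bit 47 is 0 and just clear the top 16 bits. */
static inline uint64_t canonical48_user_mask(uint64_t x)
{
    return x & ((UINT64_C(1) << 48) - 1);   /* 0x0000FFFFFFFFFFFF */
}
```

For a user-space address with tag bits in the top byte, both functions give the same result; only the sign-extending version also restores a kernel (high-half) address.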
int64_t is guaranteed to be a 2's complement type with no padding if it exists.
I think int64_t is actually a better choice than intptr_t, because if you have 32-bit pointers, e.g. the Linux x32 ABI (32-bit pointers in x86-64 long mode), your code might still Just Work, and casting a uint64_t to a pointer type will simply discard the upper bits. So it doesn't matter what you did to them, and zero-extension first will hopefully optimize away.
So your uint64_t member would just end up storing a pointer in the low 32 and your tag bits in the high 32, somewhat inefficiently but still working. Maybe check sizeof(void*) in a template to select an implementation?
Future proofing
x86-64 CPUs with 5-level page tables for 57-bit canonical addresses are probably coming at some point soonish, to allow use of large memory mapped non-volatile storage like Optane / 3DXPoint NVDIMMs.
Intel has already published a proposal for a PML5 extension https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf (see https://en.wikipedia.org/wiki/Intel_5-level_paging for a summary). There's already support for it in the Linux kernel so it's ready for the appearance of actual HW.
(I can't find out if it's expected in Ice Lake or not.)
See also Why in 64bit the virtual address are 4 bits short (48bit long) compared with the physical address (52 bit long)? for more about where the 48-bit virtual address limit comes from.
So you can still use the high 7 bits for tagged pointers and maintain compat with PML5.
If you assume user-space, then you can use the top 8 bits and zero-extend, because you're assuming the 57th bit (bit 56) = 0.
Redoing sign- (or zero-) extension of the low bits was already optimal, we're just changing it to a different width that only re-extends the bits we disturb. And we're disturbing few enough high bits that it should be future proof even on systems that enable PML5 mode and use wide virtual addresses.
On a system with 48-bit virtual addresses, broadcasting bit 56 (the 57th bit) to the upper 7 bits still works, because in a canonical 48-bit address bit 56 equals bit 47. And if you don't disturb those lower bits, they don't need to be rewritten.
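Under the same implementation-defined-shift assumption, a sketch of the PML5-friendly layout: a 7-bit tag in bits 57..63, with bit 56 re-broadcast when the tag is removed. Because only the bits above bit 56 are touched, the same code works for 48-bit and 57-bit address spaces. The helper names and `TAG_SHIFT` are hypothetical, not an existing API.

```c
#include <stdint.h>

#define TAG_SHIFT 57u   /* tag lives in bits 57..63 (7 bits) */

/* Replace bits 57..63 of a canonical address with a 7-bit tag. */
static inline uint64_t tag_pointer(uint64_t addr, unsigned tag)
{
    uint64_t low = addr & ((UINT64_C(1) << TAG_SHIFT) - 1);  /* keep 0..56 */
    return low | ((uint64_t)(tag & 0x7Fu) << TAG_SHIFT);
}

/* Read back the 7-bit tag from the top of the word. */
static inline unsigned get_tag(uint64_t tagged)
{
    return (unsigned)(tagged >> TAG_SHIFT);
}

/* Broadcast bit 56 into bits 57..63 to restore canonical form. */
static inline uint64_t untag_pointer(uint64_t tagged)
{
    return (uint64_t)((int64_t)(tagged << 7) >> 7);
}
```

A high-half address such as 0xFFFF800000001000 comes back unchanged after tagging and untagging, since its bit 56 is 1 and gets re-broadcast.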
And BTW, your GetUID() returns an integer. It's not clear why you need that to return the static address.
And BTW, it may be cheaper for it to return &uid (just a RIP-relative LEA) than to load + re-canonicalize your m member value. Move static const uint64_t uid = 0xBEA57; to a static member variable instead of being within one member function.

What is dwLowDateTime and dwHighDateTime

I know they are variables in the FileTime struct, but what is the low-order and high-order part of the file time?
Older compilers did not have support for 64 bit types. So the structure splits the 64 bit value into two 32 bit parts. The low part contains the least significant 32 bits. The high part contains the most significant 32 bits.
So if you have the two 32 bit parts, the corresponding 64 bit value is
low + 2^32 * high
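A portable sketch of that formula and its inverse. The `file_time_parts` struct is a hypothetical stand-in for FILETIME (two 32-bit halves, low part first) so that it compiles without <windows.h>. The `add_ticks` helper shows why relative arithmetic should be done on the combined 64-bit value: a carry out of the low part has to reach the high part.

```c
#include <stdint.h>

/* Hypothetical stand-in for FILETIME: low-order part first, like the real one. */
typedef struct {
    uint32_t dwLowDateTime;
    uint32_t dwHighDateTime;
} file_time_parts;

/* low + 2^32 * high, computed with a shift. */
static uint64_t join_file_time(file_time_parts ft)
{
    return ((uint64_t)ft.dwHighDateTime << 32) | ft.dwLowDateTime;
}

static file_time_parts split_file_time(uint64_t value)
{
    file_time_parts ft;
    ft.dwLowDateTime  = (uint32_t)(value & 0xFFFFFFFFu);
    ft.dwHighDateTime = (uint32_t)(value >> 32);
    return ft;
}

/* Do relative arithmetic on the 64-bit value, then split it again. */
static file_time_parts add_ticks(file_time_parts ft, uint64_t ticks)
{
    return split_file_time(join_file_time(ft) + ticks);
}
```

For example, adding 1 to low = 0xFFFFFFFF, high = 0 yields low = 0, high = 1; adding to dwLowDateTime alone would silently wrap to 0 and lose the carry.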
The officially sanctioned way to get a 64-bit value from the two 32-bit parts is via the ULARGE_INTEGER union.
From the FILETIME documentation:
It is not recommended that you add and subtract values from the FILETIME structure to obtain relative times. Instead, you should copy the low- and high-order parts of the file time to a ULARGE_INTEGER structure, perform 64-bit arithmetic on the QuadPart member, and copy the LowPart and HighPart members into the FILETIME structure.
Do not cast a pointer to a FILETIME structure to either a ULARGE_INTEGER* or __int64* value because it can cause alignment faults on 64-bit Windows.
That is legacy stuff. The point was to represent a 64-bit value as a pair of 32-bit values. So afterwards you'll end up doing:
FILETIME ft;
// get time here
__int64 fileTime64;
memcpy( &fileTime64, &ft, sizeof( __int64 ) );
Or, as Microsoft wants you to do it:
FILETIME ft;
// get time here
ULARGE_INTEGER ul;
ul.LowPart = ft.dwLowDateTime;
ul.HighPart = ft.dwHighDateTime;
__int64 fileTime64 = ul.QuadPart;

Difference in integer size on a 64-bit system (confused with my old 32-bit PC system)

A few months ago I got myself a laptop with an Intel i7-2630QM CPU and 64-bit Windows. While practising my programming skills on this system, I encountered some differences in integer size, which makes me think it's probably due to my new 64-bit system.
Let's take a look at some code.
The C Code :
#include <stdio.h>
int main(void)
{
int num = 20;
printf("%d %lld\n" , num , num);
return 0;
}
The Question :
1.) I remember that before getting this new laptop, when I was still using my old 32-bit system, running this code printed the integer 20 followed by some random number, due to the %lld specifier.
2.) But this no longer happens on my new laptop; it prints both integers correctly, even if I change the variable num to type short.
3.) Is there a new integer promotion on a 64-bit system that promotes int to long long when it's used as an argument? Or can a short integer be promoted to long long, which is also 64-bit, when passed as an argument?
4.) Besides that, I'm quite confused about one thing: on a 16-bit system int is 16 bits, and on a 32-bit system it is 32 bits. But why doesn't it become 64 bits on a 64-bit system?
==================================================================================
Addendum:
1.) I chose "console program (64-bit)" as my project type in the IDE on my new laptop, but "console program" on my old 32-bit PC.
2.) I've checked the size of int in the "console program (64-bit)" project using the sizeof operator, and it's 32 bits, while short remains 16 bits. The only change is the long type: it's 64 bits, and long long remains its usual 64 bits.
You are seeing this side effect because the calling convention is different for x64 code. In 32-bit x86 code, function arguments are passed on the stack, so printf() reads an extra 32-bit word from the stack that isn't part of the arguments you passed. The odds that it contains a value of 0 are extremely low.
In x64 code, the first 4 arguments of a function are passed in CPU registers, not on the stack. The odds that the high half of a 64-bit register is zero by chance are quite good: it may have been left that way by a previous 64-bit operation that worked with small numbers. But that is certainly not guaranteed.
Beyond that, trying to reason out the behavior of undefined behavior is not useful, other than as a way to guess how the language is implemented for the core in your machine. There are better resources for that. Learning the machine code that your compiler generates is an excellent shortcut, together with a decent debugger that shows you how your C code got translated into machine code. Machine code has no undefined behavior.
I do not have access to a Windows 64-bit compiler right now, but my guess is the following.
Your question is not about integer promotion, but regarding how parameters are passed from the function caller to the called function. This is beyond the C specification, but it is interesting to know.
In 32-bit, all parameters are divided into 32-bit blocks as all registers can hold 32 bits. So in this case we have the following stack layout:
[ 32-bit format string pointer ][ num as 32-bit ][ num as 32-bit ] junk...
In 64-bit, all parameters are divided into 64-bit blocks as all registers can hold 64 bits. So the stack will contain the following:
[ 64-bit format string pointer ][ num as 64-bit ][ num as 64-bit ] junk...
The upper 32 bits of the 64-bit registers holding 32-bit values are conveniently set to zero.
So when printf is reading a 64-bit number, it will load the equivalent of two 32-bit registers on a 32-bit platform but only one 64-bit register, with high bits cleared, on a 64-bit platform.
(1 and 2) As already stated, the behaviour in this situation is undefined, so the compiler is allowed to behave differently for any reason or indeed no reason at all.
(3) The compiler is allowed to define int as 64-bit, in which case no promotion would be necessary because all the variables in question would be the same size. But it almost certainly doesn't.
(4) On most or all 64-bit compilers, int is 32 bits. This is because int has been 32 bits for so long that programmers have come to expect it, and changing it would break existing code. As far as I know this isn't officially part of the standard, but it's one of those de facto standards that are even harder to change. :-)
Everything you are describing is specific to whatever spec your compiler is using and the platform you are on (with the exception that long is guaranteed to be at least the same size as int):
Wikipedia entries:
long long
int
The C99 standard seeks to end this ambiguity by adding fixed-width types: int32_t, uint64_t, etc. Many Unix systems also provide BSD-style names such as u_int32_t.
Edit: I missed the question about printf(), sorry. As @nos points out in the comments on your question, passing something other than a long long to %lld results in undefined behavior. This means there is no rhyme or reason as to what it will do; unicorns spontaneously appearing would not be out of the question.
Oh - and on every compiler and OS I know, int is 32 bit. Changing that has the potential to break things that depend on it being 32 bit.
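For completeness, a sketch of a well-defined version of the question's printf call: cast to long long to match %lld, or use a fixed-width type with its <inttypes.h> macro. A short passed to a variadic function is promoted to int (not to long long), so it needs %d or %hd. `format_num` is a hypothetical helper that writes into a buffer so the output can be checked.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Print the same value three ways, each with a matching conversion. */
static int format_num(char *buf, size_t size, int num)
{
    short s = (short)num;   /* promoted to int when passed to snprintf */
    int64_t wide = num;     /* fixed-width type: use the PRId64 macro */
    return snprintf(buf, size, "%hd %lld %" PRId64,
                    s, (long long)num, wide);
}
```

With num = 20, the buffer holds "20 20 20" on both 32-bit and 64-bit builds, because every argument now matches its conversion specifier.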

Resources