How is this code assuming little endian is being used?

In an article explaining little- versus big-endian byte order, it said the following code makes the assumption that it is running on a little-endian machine.
The reason it gives for the assumption is: "The switching of the bytes is being assumed in the 'C' structure." I don't understand where the assumption is.
struct
{
    WORD y;
    WORD x;
} POS;
lparam = (DWORD) POS;

Think about it like this: x is 0x1234, y is 0x5678, and the intention is for lparam to end up as 0x12345678.
The code from the example stores the bytes 0x78 0x56 0x34 0x12 in memory; on a little-endian machine that reads back as the DWORD 0x12345678, as intended.
On a big-endian machine, however, the bytes in memory are 0x56 0x78 0x12 0x34, which reads back as 0x56781234. Therefore this code was written with the assumption that it runs on a little-endian machine.
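If you want the same result regardless of the machine, build the DWORD with shifts instead of relying on the struct layout. Here is a minimal sketch (the helper name and use of <cstdint> are mine, not from the article):

#include <cstdint>

// Puts x in the high 16 bits and y in the low 16 bits regardless of host byte order.
std::uint32_t make_lparam(std::uint16_t x, std::uint16_t y)
{
    return (static_cast<std::uint32_t>(x) << 16) | y;
}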


What is the size of a DIBSECTION?

I am looking to find the size of a device independent bitmap structure for use with GetObject in the Windows API. I have an hBitmap. GetObject says that to get information about the hBitmap, I can send a buffer with either the size of a BITMAP structure or the size of a DIBSECTION. I don't know what the exact sizes of the BITMAP and DIBSECTION structs are; can anyone tell me what they are on both 32-bit and 64-bit systems?
You don't need to do any math manually. Simply declare a DIBSECTION variable, then pass a pointer to it to GetObject() along with the sizeof() of that variable as the buffer size, e.g.:
DIBSECTION dib;
GetObject(hBitmap, sizeof(dib), &dib);
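If I remember the GetObject() contract correctly, it returns the number of bytes it actually wrote, so you can also use it to tell whether the HBITMAP refers to a DIB section at all. A hedged sketch (the helper name is mine):

#include <windows.h>

// Returns TRUE if hbm was created by CreateDIBSection; either way, 'out'
// receives as much information as GetObject() could provide.
BOOL GetDibInfo(HBITMAP hbm, DIBSECTION *out)
{
    return GetObject(hbm, sizeof(*out), out) == sizeof(DIBSECTION);
}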
I took out a piece of paper and added up everything myself.
A DIBSECTION contains 5 parts.
typedef struct tagDIBSECTION {
    BITMAP dsBm;
    BITMAPINFOHEADER dsBmih;
    DWORD dsBitfields[3];
    HANDLE dshSection;
    DWORD dsOffset;
} DIBSECTION, *LPDIBSECTION, *PDIBSECTION;
So let's start with BITMAP.
typedef struct tagBITMAP {
    LONG bmType;
    LONG bmWidth;
    LONG bmHeight;
    LONG bmWidthBytes;
    WORD bmPlanes;
    WORD bmBitsPixel;
    LPVOID bmBits;
} BITMAP, *PBITMAP, *NPBITMAP, *LPBITMAP;
A LONG is a 32-bit signed integer, so 4 bytes. A WORD is an unsigned short, so 2 bytes. And LPVOID is a pointer: 4 bytes on 32-bit, 8 bytes on 64-bit.
4+4+4+4+2+2 = 20. But wait, struct members have to be aligned properly. On a 64-bit system the pointer member needs 8-byte alignment, and 20 is not divisible by 8, so 4 bytes of padding are inserted to bring us to 24. Adding the 8-byte pointer gives us 32.
The size of the BITMAPINFOHEADER is 40 bytes. It's divisible by 8, so nothing fancy needed. We're at 72 now.
Back to the DIBSECTION. Next is an array of three DWORDs, and each DWORD is a 32-bit unsigned integer, so adding 12 to 72 gives us 84.
Then there's a HANDLE. A handle is essentially a pointer, so its size is 4 or 8 bytes depending on 32-bit or 64-bit. 84 is not divisible by 8, so we add 4 bytes of padding to get 88, and adding the 8-byte pointer gives 96.
Finally there's the last DWORD, and the total reaches 100 on a 64-bit system.
But what about sizeof()? Can't you just do sizeof(DIBSECTION)? After all, magic numbers are bad. Ken White said in the comments that I didn't need to do any math. I disagree. First, as a programmer, it's essential to understand what is happening and why, and nothing is more elementary than how memory is laid out. Second, I only tagged the post as winapi. If you scroll down the GetObject page, the function is exported from Gdi32.dll, and any Windows program has access to Gdi32.dll; not every Windows program is written in a language that has sizeof(). Third, it may be important for people who need the math to see the steps shown. Not everyone programs in a high-level language, and it might even be a question on an exam.
Perhaps the real question is whether a struct of size 100 gets padded to 104 when it is allocated on a 64-bit system.
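For anyone who wants the compiler to check the arithmetic above, here is a minimal sketch assuming the Windows SDK headers and a 64-bit build; the expected values are in the comments. The answer to the padding question is yes: the struct contains 8-byte-aligned members, so its total size is rounded up from 100 to 104.

#include <windows.h>
#include <cstddef>
#include <cstdio>

int main()
{
    std::printf("sizeof(BITMAP)           = %zu\n", sizeof(BITMAP));                    // 32 on x64
    std::printf("sizeof(BITMAPINFOHEADER) = %zu\n", sizeof(BITMAPINFOHEADER));          // 40
    std::printf("offsetof(dshSection)     = %zu\n", offsetof(DIBSECTION, dshSection));  // 88 on x64
    std::printf("sizeof(DIBSECTION)       = %zu\n", sizeof(DIBSECTION));                // 104 on x64: 100 rounded up to the 8-byte alignment
    return 0;
}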

Pointers to static variables must respect canonical form?

Assuming I have the following example:
struct Dummy {
    uint64_t m{0llu};

    template < class T > static uint64_t UniqueID() noexcept {
        static const uint64_t uid = 0xBEA57;
        return reinterpret_cast< uint64_t >(&uid);
    }

    template < class T > static uint64_t BuildID() noexcept {
        static const uint64_t id = UniqueID< T >()
            // dummy bits for the sake of example (whole last byte is used)
            | (1llu << 60llu) | (1llu << 61llu) | (1llu << 63llu);
        return id;
    }

    // Copy bits 48 through 55 over to bits 56 through 63 to keep canonical form.
    uint64_t GetUID() const noexcept {
        return ((m & ~(0xFFllu << 56llu)) | ((m & (0xFFllu << 48llu)) << 8llu));
    }

    uint64_t GetPayload() const noexcept {
        return *reinterpret_cast< uint64_t * >(GetUID());
    }
};

template < class T > inline Dummy DummyID() noexcept {
    return Dummy{Dummy::BuildID< T >()};
}
I know very well that the resulting pointer is the address of a static variable in the program.
When I call GetUID(), do I need to make sure that bit 47 is repeated through bit 63?
Or can I just AND with a mask of the lower 48 bits and ignore this rule?
I was unable to find any information about this, and I assume that those upper 16 bits are likely to always be 0.
This example is strictly limited to the x86_64 architecture (x32).
In user-space code for mainstream x86-64 OSes, you can normally assume that the upper bits of any valid address are zero.
AFAIK, all the mainstream x86-64 OSes use a high-half kernel design where user-space addresses are always in the lower canonical range.
If you wanted this code to work in kernel code, too, you would want to sign-extend with x <<= 16; x >>= 16; using signed int64_t x.
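As a concrete illustration of the two options, here is a sketch assuming 48-bit virtual addresses and a tag stored in the top 16 bits (the function names are mine):

#include <cstdint>

// Kernel-safe: re-extend from bit 47. The left shift is done on the unsigned
// type to avoid signed-overflow UB; the right shift on int64_t is an
// arithmetic shift on all mainstream x86-64 compilers.
void *untag_sign_extend(std::uint64_t tagged)
{
    std::uint64_t shifted = tagged << 16;
    std::int64_t canon = static_cast<std::int64_t>(shifted) >> 16;
    return reinterpret_cast<void *>(canon);
}

// User-space only: valid addresses already have zero upper bits, so masking
// (zero-extension) is enough.
void *untag_zero_extend(std::uint64_t tagged)
{
    return reinterpret_cast<void *>(tagged & ((1ull << 48) - 1));
}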
If the compiler can't keep 0x0000FFFFFFFFFFFF = (1ULL<<48)-1 around in a register across multiple uses, 2 shifts might be more efficient anyway. (mov r64, imm64 to create that wide constant is a 10-byte instruction that can sometimes be slow to decode or fetch from the uop cache.) But if you're compiling with -march=haswell or newer, then BMI1 is available so the compiler can do mov eax, 48 / bzhi rsi, rdi, rax. Either way, though, one AND or BZHI is only 1 cycle of critical path latency for the pointer vs. 2 for 2 shifts. Unfortunately BZHI isn't available with an immediate operand. (x86 bitfield instructions mostly suck compared to ARM or PowerPC.)
Your current method of extracting bits [55:48] and using them to replace the current bits [63:56] is probably slower, because the compiler has to mask out the old high byte and then OR in the new high byte. That's already at least 2 cycles of latency, so you might as well just shift, or mask, which can be faster.
x86 has crap bitfield instructions so that was never a good plan. Unfortunately ISO C++ doesn't provide any guaranteed arithmetic right shift, but on all actual x86-64 compilers, >> on a signed integer is a 2's complement arithmetic shift. If you want to be really careful about avoiding UB, do the left shift on an unsigned type to avoid signed integer overflow.
int64_t is guaranteed to be a 2's complement type with no padding if it exists.
I think int64_t is actually a better choice than intptr_t, because if you have 32-bit pointers, e.g. the Linux x32 ABI (32-bit pointers in x86-64 long mode), your code might still Just Work, and casting a uint64_t to a pointer type will simply discard the upper bits. So it doesn't matter what you did to them, and zero-extension first will hopefully optimize away.
So your uint64_t member would just end up storing a pointer in the low 32 and your tag bits in the high 32, somewhat inefficiently but still working. Maybe check sizeof(void*) in a template to select an implementation?
Future proofing
x86-64 CPUs with 5-level page tables for 57-bit canonical addresses are probably coming at some point soonish, to allow use of large memory mapped non-volatile storage like Optane / 3DXPoint NVDIMMs.
Intel has already published a proposal for a PML5 extension https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf (see https://en.wikipedia.org/wiki/Intel_5-level_paging for a summary). There's already support for it in the Linux kernel so it's ready for the appearance of actual HW.
(I can't find out if it's expected in Ice Lake or not.)
See also Why in 64bit the virtual address are 4 bits short (48bit long) compared with the physical address (52 bit long)? for more about where the 48-bit virtual address limit comes from.
So you can still use the high 7 bits for tagged pointers and maintain compat with PML5.
If you assume user-space, then you can use the top 8 bits and zero-extend, because you're assuming the 57th bit (bit 56) = 0.
Redoing sign- (or zero-) extension of the low bits was already optimal, we're just changing it to a different width that only re-extends the bits we disturb. And we're disturbing few enough high bits that it should be future proof even on systems that enable PML5 mode and use wide virtual addresses.
On a system with 48-bit virtual addresses, broadcasting bit 56 (the 57th bit) to the upper 7 still works, because on such a system bit 56 = bit 47. And if you don't disturb those lower bits, they don't need to be re-written.
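A sketch of that future-proof variant (names and tag layout are mine): keep the tag in the top 7 bits only and re-extend from bit 56 when recovering the pointer; the same code is correct on today's 48-bit systems.

#include <cstdint>

// Store a 7-bit tag in bits 57..63; the address keeps its low 57 bits.
std::uint64_t tag7(void *p, unsigned tag)
{
    return (reinterpret_cast<std::uint64_t>(p) & ((1ull << 57) - 1))
           | (static_cast<std::uint64_t>(tag & 0x7F) << 57);
}

// Recover the pointer by broadcasting bit 56 into bits 57..63.
void *untag7(std::uint64_t tagged)
{
    std::int64_t canon = static_cast<std::int64_t>(tagged << 7) >> 7;
    return reinterpret_cast<void *>(canon);
}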
And BTW, your GetUID() returns an integer. It's not clear why you need that to return the static address.
And BTW, it may be cheaper for it to return &uid (just a RIP-relative LEA) than to load + re-canonicalize your m member value. Move static const uint64_t uid = 0xBEA57; to a static member variable instead of being within one member function.
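A sketch of that last suggestion, with hypothetical names: hoist the constant to class scope so its address can be returned directly.

#include <cstdint>

struct DummyStatics {
    // In-class initializer is allowed for a static const integral member.
    static const std::uint64_t uid = 0xBEA57;

    // Returning the address is a single RIP-relative LEA, no re-canonicalizing of m.
    static const std::uint64_t *GetUID() noexcept { return &uid; }
};

// Out-of-class definition, required pre-C++17 because &uid odr-uses it.
const std::uint64_t DummyStatics::uid;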

C++ 11: Adding an int to a wchar_t

I naively added an int to a wchar_t resulting in a Visual Studio 2013 warning.
L'A' + 1 // next letter
warning C4244: 'argument' : conversion from 'int' to 'wchar_t', possible loss of data
So the warning is that a 4-byte int is being implicitly converted to a 2-byte wchar_t. Fair enough.
What is the standards-safe C++11 way of doing this? I'm wondering about the cross-platform implications, code-point correctness, and readability of doing things like L'A' + (wchar_t)1 or L'A' + \U1 or whatever. What are my coding options?
Edit T+2: I presented this question to a hacker's group. Unsurprisingly, no one got it correct. Everyone agreed this is a great interview question when hiring C/C++ Unicode programmers because it's very terse and deserves a meaty conversation.
When you add two integral values together, such that both values can fit within an int, they are added as ints.
If you require an unsigned int to fit one of them, they are instead added as unsigned ints.
If those are not big enough, bigger types can be used. It gets complicated, and it changes by standard revision if I remember rightly (there were some rough spots).
Now, addition of ints has undefined behavior if it overflows. Addition of unsigned ints is guaranteed to wrap mod some power of two.
When you convert an int or an unsigned int to a signed type, if the value doesn't fit, the result is implementation-defined. If it does fit, it fits.
If you convert an int or unsigned int to an unsigned type, the result is the value representable in that unsigned type that is equal to the source mod some power of two (fixed for the given unsigned type).
Many popular C++ compilers and hardware return the same bit pattern for int as they would for unsigned int interpreted by 2s complement logic, but that is not required by the standard.
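To make the promotion rules above concrete, here is a small hedged illustration (the helper name is mine):

wchar_t next_char(wchar_t c, int offset)
{
    // c is promoted to int (or to unsigned int on an implementation where
    // wchar_t is as wide as int and unsigned), the addition happens in that
    // type, and the static_cast performs the narrowing conversion that
    // warning C4244 complains about when it is left implicit.
    return static_cast<wchar_t>(c + offset);
}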
So L'A' + 1 involves converting L'A' to an int, adding 1 as an int.
If we add the missing bit:
wchar_t bob = L'A' + 1;
we can see where the warning occurs. The compiler sees someone converting an int to a wchar_t and warns them. (This makes more sense when the values in question are not compile-time constants.)
If we make it explicit:
wchar_t bob = static_cast<wchar_t>(L'A' + 1);
the warning (probably? hopefully?) goes away. So long as the right-hand side is within the range of valid wchar_t values, you are golden.
If instead you are doing:
wchar_t bob = static_cast<wchar_t>(L'A' + x);
where x is an int, if wchar_t is signed you could be in trouble (implementation-defined result if x is large enough!), and if it is unsigned you could still be somewhat surprised.
A nice thing about this static_cast method is that unlike (wchar_t)x or wchar_t(x) casts, it won't work if you accidentally feed pointers into the cast.
Note that casting x or 1 is relatively pointless, unless it quiets the compiler, as the values are always (logically) promoted to int before + is applied (or to unsigned int if wchar_t is unsigned and the same size as an int). With int significantly larger than wchar_t this is relatively harmless: if wchar_t is unsigned, the conversion back is guaranteed to do the same thing as adding in wchar_t mod its power of two, and if wchar_t is signed, leaving its range gives an implementation-defined result anyhow.
So, cast the result using static_cast. If that doesn't work, use a bitmask to explicitly clear bits you won't care about.
Finally, VS2013 uses 2s complement math for int. So static_cast<wchar_t>(L'A' + x) and static_cast<wchar_t>( L'A' + static_cast<wchar_t>(x)) always produce the same values, and would do so if wchar_t was replaced with unsigned short or signed short.
This is a poor answer: it needs curation and culling. But I'm tired, and it might be illuminating.
Until I see a more elegant answer, which I hope there is, I'll go with this pattern:
(wchar_t)(L'A' + i)
I like this pattern because i can be negative or positive and it will evaluate as expected. My original notion of using L'A' + (wchar_t)i is flawed if i is negative and wchar_t is unsigned. I'm assuming here that the signedness of wchar_t is implementation-dependent and that it could be signed.

Pointer increment difference b/w 32-bit and 64-bit

I was trying to run some drivers written for 32-bit Vista (x86) on 64-bit Windows 7 (amd64), and they would not run. After a lot of debugging and trial and error, I got them working on the latter, but I don't know why it works. This is what I did:
In many places, buffer pointers pointed to an array of structures (different structures in different places), and to increment them, this type of statement was used in some places:
ptr = (PVOID)((PCHAR)ptr + offset);
And at some places:
ptr = (PVOID)((ULONG)ptr + offset);
The second one was returning garbage, so I changed them all to the first one. But I found many sample drivers on the net using the second one. My questions:
1. Where are these macros defined (Google didn't help much)?
2. I understand all the P_ macros are pointers, so why was a pointer cast to ULONG? How does this work on 32-bit?
3. PCHAR obviously changes its width according to the environment. Do you know any place to find documentation for this?
1. They should be defined in WinNT.h (they are in the SDK; I don't have the DDK at hand).
2. ULONG is unsigned long; on a 32-bit system, this is the size of a pointer, so a pointer can be converted back and forth to ULONG without loss - but not on a 64-bit system (where casting the value will truncate it). People cast to ULONG to get byte-based pointer arithmetic (even though this has undefined behavior, as you found out).
3. Pointer arithmetic always works in units of the underlying type, i.e. in CHARs for PCHAR, which amounts to byte arithmetic.
Any C book should elaborate on the precise semantics of pointer arithmetic.
The reason this code fails on 64-bit is that it is casting pointers to ULONG. ULONG is a 32-bit value while pointers on 64-bit are 64-bit values. So you will be truncating the pointer whenever you use the ULONG cast.
The PCHAR cast, assuming PCHAR is defined as char * is fine, provided the intention is to increment the pointer by an explicit number of bytes.
Both have the same intention, but only the PCHAR version is valid where pointers are larger than 32 bits.
Pointer arithmetic works like this. If you have:
T *p;
and you do:
p + n;
(where n is a number), then the resulting pointer is p advanced by n * sizeof(T) bytes.
To give a concrete example, if you have a pointer to a DWORD:
DWORD *pdw = &some_dword_in_memory;
and you add one to it:
pdw = pdw + 1;
then you will be pointing to the next DWORD. The address pdw points to will have increased by sizeof(DWORD), i.e. 4 bytes.
The macros you mention are using casts to cause the address offsets they apply to be multiplied by different amounts. This is normally only done in low-level code which has been passed a BYTE (or char or void) buffer but knows the data inside it is really some other type.
ULONG is defined in WinDef.h in the Windows SDK and is always 32 bits, so when you cast a 64-bit pointer to ULONG you truncate the pointer to 32 bits.
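If you prefer the integer-cast style, the usual portable fix is a pointer-sized integer type such as ULONG_PTR. A minimal sketch (function names are mine) showing both forms that work on 32-bit and 64-bit:

#include <windows.h>

// Two equivalent ways to advance a pointer by a byte count that are safe on
// both 32-bit and 64-bit Windows.
PVOID AdvanceBytesChar(PVOID ptr, SIZE_T offset)
{
    return (PVOID)((PCHAR)ptr + offset);        // char-based arithmetic
}

PVOID AdvanceBytesInt(PVOID ptr, SIZE_T offset)
{
    return (PVOID)((ULONG_PTR)ptr + offset);    // ULONG_PTR is pointer-sized, unlike ULONG
}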

Endian conversion of signed ints

I am receiving big endian data over UDP and converting it to little endian. The source says the integers are signed, but when I swap the bytes of the signed ints (specifically 16-bit) I get unrealistic values. When I swap them as unsigned ints I get what I expect. I suppose the source documentation could be incorrect and it is actually sending unsigned 16-bit ints. But why would that matter? The values are all supposed to be positive and well under the 16-bit INT_MAX, so overflow should not be an issue. The only things I can think of are that (1) the documentation is wrong AND (2) I am not handling the sign bit properly when I perform a signed endian swap.
I really have two questions:
1) When overflow is not an issue, does it matter whether I read into signed or unsigned ints?
2) Is endian swapping different between signed and unsigned values (i.e. does the sign bit need to be handled differently)?
I thought endian conversion looked the same for both signed and unsigned values, e.g. for 16 bits: value = value & 0xff00 >> 8 | value & 0x00ff << 8.
Thanks
You are running into problems with sign extension in your swap function. Instead of doing this:
value & 0xff00 >> 8 | value & 0x00ff << 8
do this:
((value >> 8) & 0x00ff) | ((value & 0x00ff) << 8)
The issue is that if value is a 16-bit signed value, then 0xabcd >> 8 is 0xffab: in a signed right shift, the most significant bit stays 1 if it starts out as 1. (Note also that >> and << bind more tightly than &, so the original expression does not even group the way it reads.)
Finally, instead of writing this function yourself you should use ntohs().
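To answer both questions in code: do the swap on the unsigned representation, then reinterpret the bits as signed; the sign only matters when you interpret the final value. A minimal sketch (the helper name and memcpy approach are mine):

#include <cstdint>
#include <cstring>

// Reads a big-endian signed 16-bit field from a receive buffer.
std::int16_t read_be16(const unsigned char *buf)
{
    std::uint16_t u = static_cast<std::uint16_t>((buf[0] << 8) | buf[1]);  // swap as unsigned
    std::int16_t s;
    std::memcpy(&s, &u, sizeof s);  // reinterpret the same bits as signed
    return s;
}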
