I'm working on very old legacy code, porting it from 32-bit to 64-bit.
One of the things I'm struggling with is MFC serialization. One of the differences between 32-bit and 64-bit is the size of pointer-sized data. This means, for example, that if for some reason I have serialized the size of a CArray like
ar << m_array.GetSize();
the data differs between the 32-bit and 64-bit platforms, because GetSize returns an INT_PTR. To get serialized data that is fully compatible between the same application compiled for 32-bit and 64-bit, I forced the data type in the storing phase, and did the same on reading (I'm pretty sure 32 bits are enough for this data).
Store:
ar << (int)m_array.GetSize();
Read:
int iNumSize = 0;
ar >> iNumSize;
In other words, the application serializes this data as an int, no matter whether it is compiled for 32-bit or 64-bit.
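For reference, here is a sketch of how the two halves fit together in a Serialize override (the class and member names are just illustrative):
void CMyDoc::Serialize(CArchive& ar)
{
    if (ar.IsStoring())
    {
        ar << (int)m_array.GetSize();   // force a 32-bit size on disk
    }
    else
    {
        int iNumSize = 0;
        ar >> iNumSize;                 // read it back with the same 32-bit type
        m_array.SetSize(iNumSize);
    }
}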
Now I have a doubt about the serialization of the CArray type itself; to serialize a CArray, the code uses the built-in CArchive serialization
// defined as CArray m_arrayVertex; in the .h
m_arrayVertex.Serialize(ar);
and this Serialize is defined in the MFC file afxtemp.h with this template
template<class TYPE, class ARG_TYPE>
void CArray<TYPE, ARG_TYPE>::Serialize(CArchive& ar)
{
    ASSERT_VALID(this);
    CObject::Serialize(ar);
    if (ar.IsStoring())
    {
        ar.WriteCount(m_nSize);
    }
    else
    {
        DWORD_PTR nOldSize = ar.ReadCount();
        SetSize(nOldSize, -1);
    }
    SerializeElements<TYPE>(ar, m_pData, m_nSize);
}
where (afx.h)
// special functions for reading and writing (16-bit compatible) counts
DWORD_PTR ReadCount();
void WriteCount(DWORD_PTR dwCount);
Here is my question: ReadCount and WriteCount use DWORD_PTR, which has a different size on each platform... is this kind of serialization compatible between 32-bit and 64-bit, or does the serialized data, due to the size change, work only on the platform that wrote it?
I mean, can the data be read by both the 32-bit and the 64-bit application without errors? The comment says it also works for "16 bit", and I have not found any details about this serialization.
If this doesn't work, is there a workaround to serialize the CArray in such a way that the data is fully compatible with both the 32-bit and the 64-bit app?
Edit: Both answers are good. I simply accepted the one that came first. Many thanks to both; I hope this can help someone else!
As you have written, ReadCount returns a DWORD_PTR, which is either 32 or 64 bits wide, depending on whether the code has been compiled as 32-bit or 64-bit code.
Now as long as the actual object count fits into 32 bits, there is no problem with interoperability between files that have been written by a 32 bit or a 64 bit program.
On the other hand, if your 64-bit code serializes a CArray that has more than 4294967295 elements (which is unlikely to happen anyway), then you will run into trouble when you try to deserialize that file in a 32-bit program. But a 32-bit program cannot hold a CArray with more than 4294967295 elements anyway.
Long story short: you don't need to do anything special, just serialize/deserialize your data.
Storage and retrieval of the item count for CArray instantiations are implemented in CArchive::WriteCount and CArchive::ReadCount, respectively.
They write and read a 16-bit (WORD), 32-bit (DWORD), or 64-bit (on 64-bit platforms, DWORD_PTR) value to or from the stream. Writing uses the following algorithm:
If the item count is less than 0xFFFF, write the item count as a 16-bit WORD value.
Otherwise, dump an "invalid value" marker ((WORD)0xFFFF) into the stream, followed by:
  32-bit: the item count as a 32-bit value (DWORD).
  64-bit: if the item count is less than 0xFFFF'FFFF, write the item count as a 32-bit DWORD value;
          otherwise, dump an "invalid value" marker ((DWORD)0xFFFF'FFFF) into the stream, followed by the item count as a 64-bit value (DWORD_PTR).
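A minimal sketch of that encoding, written against CArchive (illustrative only, not the actual MFC source):
void WriteCountSketch(CArchive& ar, ULONGLONG nCount)
{
    if (nCount < 0xFFFF)
    {
        ar << (WORD)nCount;          // small count: a single WORD
        return;
    }
    ar << (WORD)0xFFFF;              // marker: count did not fit in a WORD
    if (nCount < 0xFFFFFFFF)
    {
        ar << (DWORD)nCount;         // medium count: a DWORD follows
        return;
    }
    ar << (DWORD)0xFFFFFFFF;         // marker: count did not fit in a DWORD
    ar << nCount;                    // huge count: the full 64 bits follow
}

ULONGLONG ReadCountSketch(CArchive& ar)
{
    WORD wCount; ar >> wCount;
    if (wCount != 0xFFFF) return wCount;
    DWORD dwCount; ar >> dwCount;
    if (dwCount != 0xFFFFFFFF) return dwCount;
    ULONGLONG qwCount; ar >> qwCount;
    return qwCount;
}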
The stream layout is summarized in the following table depending on the item count in the CArray (where ❌ denotes a value that's not present in the stream):
Item count n                   | WORD   | DWORD       | DWORD_PTR
-------------------------------+--------+-------------+----------
n < 0xFFFF                     | n      | ❌           | ❌
0xFFFF <= n < 0xFFFF'FFFF      | 0xFFFF | n           | ❌
n == 0xFFFF'FFFF (32-bit only) | 0xFFFF | 0xFFFF'FFFF | ❌
0xFFFF'FFFF <= n (64-bit only) | 0xFFFF | 0xFFFF'FFFF | n
When deserializing the stream the code reads the item count value, checks to see if it matches the "invalid value" marker, and continues with larger values if a marker was found.
This works across bitnesses as long as the CArray holds no more than 0xFFFF'FFFE values. For 32-bit platforms this is always true; you cannot have a CArray that uses up the entire address space.
When serializing from a 64-bit process you just need to make sure that there aren't any more than 0xFFFF'FFFE items in the array.
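If you want to be defensive about it, a hypothetical guard at store time (reusing the m_arrayVertex member from the question) could look like this:
if (ar.IsStoring() && (ULONGLONG)m_arrayVertex.GetSize() >= 0xFFFFFFFF)
    AfxThrowArchiveException(CArchiveException::badIndex, NULL); // refuse to write an archive a 32-bit reader would misparse
m_arrayVertex.Serialize(ar);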
Summary:
For CArrays with less than 0xFFFF'FFFF (4294967295) items, the serialized stream is byte-for-byte identical regardless of whether it was created on a 32-bit platform or a 64-bit platform.
There's the odd corner case of a CArray with exactly 0xFFFF'FFFF items on a 32-bit platform¹. If that were to be streamed out and read back in on a 64-bit platform, the size field in the stream would be mistaken for the "invalid value" marker, with catastrophic consequences. Luckily, that is not something we need to worry about: a 32-bit process cannot allocate a container that takes up its entire address space.
That covers the scenario where a stream serialized on a 32-bit platform is consumed on a 64-bit platform. Everything works as designed, in practice.
On to the other direction then: a stream created on a 64-bit platform to be deserialized on a 32-bit platform. The only relevant disagreement here is containers larger than what a 32-bit program could even represent. The 64-bit serializer will drop an "invalid value" marker (DWORD) followed by the actual item count (DWORD_PTR)². The 32-bit deserializer will assume that the marker (0xFFFF'FFFF) is the true item count, and fail the subsequent memory allocation without ever looking at the actual item count. Things are torn down from there using whatever exception handling is in place, before any data corruption can happen³.
This is not a novel error mode, unique to cross-bitness interoperability, though. A CArray serialized on a 32-bit platform can fail to be deserialized on a 32-bit platform just as well, if the process runs out of resources. This can happen far earlier than running out of memory, since CArrays need contiguous memory.
¹ Row 3 in the table above.
² Row 4 in the table above.
³ This is assuming there's no catch(...) up the call stack that just swallows the exception and carries on.
I'm currently working on a little game that can run from the boot sector of a hard drive, just for something fun to do. This means my program runs in 16-bit real mode, and I have my compiler flags set up to emit pure i386 code. I'm writing the game in C++, but I do need a lot of inline assembly to talk to the BIOS via interrupt calls. Some of these calls return a 32-bit integer, but stored in two 16-bit registers. Currently I'm doing the following to get my number out of the assembly:
auto getTicks = [](){
uint16_t ticksL{ 0 }, ticksH{ 0 };
asm volatile("int $0x1a" : "=c"(ticksH), "=d"(ticksL) : "a"(0x0));
return static_cast<uint32_t>( (ticksH << 16) | ticksL );
};
This is a lambda function I use to call this interrupt function which returns a tick count. I'm aware that there are better methods to get time data, and that I haven't implemented a check for AL to see if midnight has passed, but that's another topic.
As you can see, I have to use two 16-bit values, retrieve the register values separately, and then combine them into a 32-bit number, as shown at the return statement.
Is there any way I could retrieve that data into a single 32-bit number in my code right away, avoiding the shift and bitwise-OR? I know that those 16-bit registers I'm accessing are really just the higher and lower 16 bits of a 32-bit register in reality, but I have no idea how to access the original 32-bit register as a whole.
” I know that those 16-bit registers I'm accessing are really just the higher and lower 16 bits of a 32-bit register in reality, but I have no idea how to access the original 32-bit register as a whole.
As Jester has already pointed out, these are in fact 2 separate registers, so there is no way to retrieve "the original 32-bit register."
One other point: That interrupt modifies the ax register (returning the 'past midnight' flag), however your asm doesn't inform gcc that you are changing ax. Might I suggest something like this:
asm volatile("int $0x1a" : "=c"(ticksH), "=d"(ticksL), "=a"(midnight) : "a"(0x0));
Note that midnight is also a uint16_t.
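Putting it together, the lambda might look like this (a sketch; the extra output just tells gcc that AX is written, and AL will hold the rollover flag):
auto getTicks = [](){
    uint16_t ticksL{ 0 }, ticksH{ 0 }, midnight{ 0 };
    asm volatile("int $0x1a"
                 : "=c"(ticksH), "=d"(ticksL), "=a"(midnight)
                 : "a"(0x0));
    // AL carries the past-midnight flag; CX:DX carries the tick count.
    return static_cast<uint32_t>( (ticksH << 16) | ticksL );
};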
As other answers suggest you can't load DX and CX directly into a 32-bit register. You'd have to combine them as you suggest.
In this case there is an alternative. Rather than using INT 1Ah/AH=0h you can read the BIOS Data Area (BDA) in low memory for the 32-bit DWORD value and load it into a 32-bit register. This is allowed in real mode on i386 processors. Two memory addresses of interest:
40:6C dword Daily timer counter, equal to zero at midnight;
incremented by INT 8; read/set by INT 1A
40:70 byte Clock rollover flag, set when 40:6C exceeds 24hrs
These two memory addresses are in segment:offset format, but they are equivalent to the physical addresses 0x0046C and 0x00470.
All you'd have to do is temporarily set the DS register to 0 (saving the previous value), turn off interrupts with CLI, retrieve the values from low memory using C/C++ pointers, re-enable interrupts with STI, and restore DS to the previously saved value. This is of course added overhead in the boot sector compared to using INT 1Ah/AH=0h, but it gives you direct access to the memory addresses the BIOS is reading/writing on your behalf.
Note: if DS is already zero, there is no need to save/set/restore it. Since we don't see the code that sets up the environment before calling into the C++ code, I don't know what your default segment values are. If you don't need to retrieve both the rollover and timer values together and only wish to get them individually, you can eliminate the CLI/STI.
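A sketch of that sequence in the question's gcc-style inline assembly (this assumes DS is not already zero and that 32-bit moves are available, as they are on i386; 0040:006C is physical 0x0046C):
auto getTicksBDA = [](){
    uint32_t ticks{ 0 };
    asm volatile(
        "cli\n\t"               // block the timer interrupt during the read
        "pushw %%ds\n\t"        // save the current data segment
        "xorw %%ax, %%ax\n\t"
        "movw %%ax, %%ds\n\t"   // DS = 0, so offsets become physical addresses
        "movl 0x46C, %0\n\t"    // read the 32-bit daily timer counter
        "popw %%ds\n\t"         // restore DS
        "sti"                   // re-enable interrupts
        : "=r"(ticks)
        :
        : "ax", "cc", "memory");
    return ticks;
};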
You're looking for the 'A' constraint, which refers to the dx:ax register pair as a double-wide value. You can see the full set of defined constraints for x86 in the gcc documentation. Unfortunately there are no constraints for any other register pairs, so you would have to get them as two values and reassemble them with shift and OR, as you describe.
We have a big, old software project. This software ran on an old OS in the old days, so it has an OS wrapper. Today it runs on Windows.
In the OS wrapper we have structs to manage threads. One member of this struct is the thread ID, but it is defined as a uint16_t. The thread IDs are generated with the Win API createThreadEx.
For some months now, at one of our customers, thread IDs greater than
numeric_limits<uint16_t>::max()
have been appearing. We would run into big trouble if we tried to change this member to a uint32_t. And even if we fix it, we would have to test the fix.
So my question is: how is it possible on Windows to get thread IDs greater than 0xffff? What circumstances lead to this?
Windows thread IDs are 32 bit unsigned integers, of type DWORD. There's no requirement for them to be less than 0xffff. Whatever thought process led you to that belief was flawed.
If you want to stress test your system to create a scenario where you have thread IDs that go above 0xffff then you simply need to create a large number of threads. To make this tenable, without running out of virtual address space, create threads with very small stacks. You can create the threads suspended too because you don't need the threads to do anything.
Of course, it might still be a little tricky to force the system to allocate that many threads. I found that my simple test application would not readily generate thread IDs above 0xffff when run as a 32 bit process, but would do so as a 64 bit process. You could certainly create a 64 bit process that would consume the low-numbered thread IDs and then allow your 32 bit process to go to work and so deal with lower numbered thread IDs.
Here's the program that I experimented with:
#include <Windows.h>
#include <iostream>

DWORD WINAPI ThreadProc(LPVOID lpParameter)
{
    return 0; // the threads never need to do anything
}

int main()
{
    for (int i = 0; i < 10000; i++)
    {
        DWORD threadID;
        // Tiny committed stack and CREATE_SUSPENDED: we only want to burn
        // through thread IDs, not to run anything.
        if (CreateThread(NULL, 64, ThreadProc, NULL, CREATE_SUSPENDED, &threadID) == NULL)
            return 1;
        std::cout << std::hex << threadID << std::endl;
    }
    return 0;
}
Re
” We would run into big trouble if we tried to change this member to a uint32_t. And even if we fix it, we would have to test the fix.
Your current software's use of a 16-bit object to store a value that requires 32 bits is a bug. So you have to fix it, and test the fix. There are at least two practical fixes:
Changing the declaration of the id, and all uses of it.
It can really help with finding every copy of the id to introduce a dedicated type that is not implicitly convertible to integer, e.g. a C++11 based enumeration type (see the sketch after this list).
Adding a layer of indirection.
This might be possible without changing the data, only changing the threading library implementation.
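A minimal sketch of the dedicated-type idea (all names hypothetical): an enum class with a 32-bit underlying type is not implicitly convertible to integer, so the compiler flags every place the old 16-bit id was copied, compared or truncated.
#include <cstdint>

enum class ThreadId : std::uint32_t {};

inline ThreadId thread_id_from_win32(std::uint32_t raw)
    { return static_cast<ThreadId>(raw); }

inline std::uint32_t thread_id_to_win32(ThreadId id)
    { return static_cast<std::uint32_t>(id); }

struct OsWrapperThread
{
    ThreadId id;   // was uint16_t; every old use now fails to compile until fixed
};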
A deeper fix might be to replace the current threading with C++11 standard library threading.
Anyway, you're in for a bit of work and/or some cost.
I am debugging some SNMP code for an integer overflow problem. Basically we use an integer to store disk/RAID capacity in KB. However, when a disk/RAID of more than 2TB is used, it overflows.
I read on some internet forums that SNMP v2c supports integer64 or unsigned64. In my test it still just sends the lower 32 bits, even though I have set the type to integer64 or unsigned64.
Here is how I did it:
A standalone program obtains the capacity and writes the data to a file. Example lines for RAID capacity:
my-sub-oid
Counter64
7813857280
/etc/snmp/snmpd.conf has a clause to pass through the OIDs:
pass_persist mymiboid /path/to/snmpagent
In the mysnmpagent source, I read the OID map into an oid/type/value structure from the file, and print it to stdout:
printf("%s\n", it->first.c_str());
printf("%s\n", it->second.type.c_str());
printf("%s\n", it->second.value.c_str());
fflush(stdout);
Use snmpget to get the sub-OID, and it returns:
mysuboid = Counter32: 3518889984
I use tcpdump and the last segment of the value portion is:
41 0500 d1be 0000
0x41 should be the tag, 0x05 the length, and the value carries only the lower 32 bits of the capacity. (Note that 7813857280 is 0x1.d1.be.00.00.)
I do find that using the string type sends the correct value (in octet-string format). But I want to know if there is a way to use a 64-bit integer in SNMP v2c.
I am running NET-SNMP 5.4.2.1, though.
Thanks a lot.
Update:
I found the following in the net-snmp documentation for snmpd.conf, regarding pass (and probably also pass_persist). I guess it forces the Counter64 down to Counter32.
Note:
The SMIv2 type counter64 and SNMPv2 noSuchObject exception are not supported.
You are supposed to use two Unsigned32 objects for the lower and upper 32 bits of your large number.
Counter64 is not meant to be used for large numbers this way.
For reference: 17 Common MIB Design Errors (the last one).
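A sketch of that split (the names here are hypothetical): publish the 64-bit KB count as two Unsigned32 values.
#include <cstdint>

void split_capacity(std::uint64_t capacity_kb,
                    std::uint32_t& upper, std::uint32_t& lower)
{
    upper = static_cast<std::uint32_t>(capacity_kb >> 32);         // high 32 bits
    lower = static_cast<std::uint32_t>(capacity_kb & 0xFFFFFFFFu); // low 32 bits
}
For the value above, 7813857280 (0x1D1BE0000) splits into upper = 1 and lower = 0xD1BE0000 = 3518889984, which is exactly the truncated value snmpget returned.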
SNMP SMIv2 defines a new type, Counter64,
https://www.rfc-editor.org/rfc/rfc2578#page-24
which is in fact an unsigned 64-bit integer. So if your data falls into that range, using Counter64 is proper.
"In my test it'll still just send the lower 32 bits even though I have set the type to integer64 or unsigned64" sounds like a problem, but unless you show more details (like some code) on how you tested it and received the result, nobody can help further.
Operating System: Windows XP 64 bit, SP2.
I have an unusual problem. I am porting some code from 32-bit to 64-bit. The 32-bit code works just fine. But when I call CreateThread() in the 64-bit version, the call fails. This fails in three places: two call CreateThread() directly, and one calls _beginthreadex(), which calls CreateThread().
All three calls fail with error code 0x3E6, "Invalid access to memory location".
The problem is that all the input parameters are correct.
HANDLE h;
DWORD threadID;
h = CreateThread(0, // default security
0, // default stack size
myThreadFunc, // valid function to call
myParam, // my param
0, // no flags, start thread immediately
&threadID);
All three calls to CreateThread() are made from a DLL I've injected into the target program at the start of program execution (before the program has reached the start of main()/WinMain()). If I call CreateThread() from the target program (same params) via, say, a menu, it works. Same parameters, etc. Bizarre.
If I pass NULL instead of &threadID, it still fails.
If I pass NULL as myParam, it still fails.
I'm not calling CreateThread from inside DllMain(), so that isn't the problem. I'm confused, and searching on Google etc. hasn't turned up any relevant answers.
If anyone has seen this before or has any ideas, please let me know.
Thanks for reading.
ANSWER
Short answer: Stack Frames on x64 need to be 16 byte aligned.
Longer answer:
After much banging my head against the debugger wall, and posting responses to the various suggestions (all of which helped in some way, prodding me to try new directions), I started exploring what-ifs about what was on the stack prior to calling CreateThread(). This proved to be a red herring, but it did lead to the solution.
Adding extra data to the stack changes the stack frame alignment. Sooner or later one of the tests gets you to 16-byte stack frame alignment, and at that point the code worked. So I retraced my steps and started putting NULL data onto the stack rather than what I thought were the correct values (I had been pushing return addresses to fake up a call frame). It still worked, so the data isn't important; it must be the actual stack addresses.
I quickly realised it was 16-byte alignment for the stack. Previously I was only aware of 8-byte alignment for data. This Microsoft document explains all the alignment requirements.
If the stack frame is not 16-byte aligned on x64, the compiler may put large (8 bytes or more) data on the wrong alignment boundaries when it pushes data onto the stack.
Hence the problem I faced: the hooking code was called with a stack that was not aligned on a 16-byte boundary.
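For what it's worth, a hypothetical helper that would have caught this early (MSVC intrinsics; the name is mine): on x64 the caller's RSP must be 16-byte aligned at the call instruction, so on entry to a function RSP % 16 == 8, the call having pushed an 8-byte return address.
#include <cstdint>
#include <intrin.h>

__declspec(noinline) bool CallerStackWasAligned()
{
    // _AddressOfReturnAddress() yields the stack slot holding the return
    // address, i.e. the value of RSP at function entry.
    std::uintptr_t rsp = reinterpret_cast<std::uintptr_t>(_AddressOfReturnAddress());
    return rsp % 16 == 8;
}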
Quick summary of alignment requirements, expressed as size : alignment
1 : 1
2 : 2
4 : 4
8 : 8
10 : 16
16 : 16
Anything larger than 8 bytes is aligned on the next power of 2 boundary.
I think Microsoft's error code is a bit misleading. The initial STATUS_DATATYPE_MISALIGNMENT could have been expressed as a STATUS_STACK_MISALIGNMENT, which would be more helpful. But turning STATUS_DATATYPE_MISALIGNMENT into ERROR_NOACCESS actually disguises and misleads as to what the problem is. Very unhelpful.
Thank you to everyone that posted suggestions. Even if I disagreed with the suggestions, they prompted me to test in a wide variety of directions (including the ones I disagreed with).
I've written a more detailed description of the datatype misalignment problem here: 64 bit porting gotcha #1! x64 Datatype misalignment.
The only reason that 64bit would make a difference is that threading on 64bit requires 64bit aligned values. If threadID isn't 64bit aligned, you could cause this problem.
OK, that idea's not it. Are you sure it's valid to call CreateThread before main/WinMain? It would explain why it works from a menu, because that's after main/WinMain.
In addition, I'd triple-check the lifetime of myParam. CreateThread returns (this I know from experience) long before the function you pass in is called.
Post the thread routine's code (or just a few lines).
It suddenly occurs to me: Are you sure that you're injecting your 64bit code into a 64bit process? Because if you had a 64bit CreateThread call and tried to inject that into a 32bit process running under WOW64, bad things could happen.
Starting to seriously run out of ideas. Does the compiler report any warnings?
Could the bug be due to a bug in the host program rather than the DLL? There's some other code, such as loading a DLL if you used __declspec(import/export), that runs before main/WinMain. If that DllMain, for example, had a bug in it...
I ran into this issue today. I checked every argument fed into _beginthread/CreateThread/NtCreateThread via rohitab's Windows API Monitor v2. Every argument is aligned properly (AFAIK).
So, where does STATUS_DATATYPE_MISALIGNMENT come from?
The first few lines of NtCreateThread validate parameters passed from user mode.
ProbeForReadSmallStructure (ThreadContext, sizeof (CONTEXT), CONTEXT_ALIGN);
for i386
#define CONTEXT_ALIGN (sizeof(ULONG))
for amd64
#define STACK_ALIGN (16UI64)
...
#define CONTEXT_ALIGN STACK_ALIGN
On amd64, if the ThreadContext pointer is not aligned to 16 bytes, NtCreateThread will return STATUS_DATATYPE_MISALIGNMENT.
CreateThread (actually CreateRemoteThread) allocates the ThreadContext from the stack, and does nothing special to guarantee that the alignment requirement is satisfied. Things will work smoothly if every piece of your code follows the Microsoft x64 calling convention, which unfortunately is not true for me.
PS: The same code may work on newer Windows (say, Vista and newer); I didn't check. I'm facing this issue on Windows Server 2003 R2 x64.
I'm in the business of using parallel threads under Windows for calculations. No funny business, no DLL calls, and certainly no callbacks. The following works in 32-bit Windows: I set up the stack for my calculation, well within the area reserved for my program. All relevant data about areas and start addresses is contained in a data structure that is passed to CreateThread as parameter 3. The address that is called contains a small assembler routine that uses this data structure. Indeed, this routine finds the address to return to on the stack, then the address of the data structure. There is no reason to go far into this; it just works, and it calculates the number of primes below 2,000,000,000 just fine, in one thread, in two threads, or in 20 threads.
Now CreateThread in 64-bit doesn't push the address of the data structure. That seems implausible, so I show you the smoking gun: a dump of a debug session. In the subwindow at the bottom right you see the stack, and there is merely the return address, amidst a sea of zeroes.
The mechanism I use to fill in parameters is portable between 32 and 64 bits. No other call exhibits a difference between word sizes. Moreover, why would the code address work but not the data address?
The bottom line: one would expect that CreateThread passes the data parameter on the stack in the same way in 64-bit as in 32-bit, then does a subroutine call. At the assembler level it doesn't work that way. If there are any hidden requirements on e.g. RSP that are automatically fulfilled in C++, that would be very nasty.
P.S. No, there are no 16-byte alignment problems. That lies ages behind me.
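For context, the Microsoft x64 calling convention does not push the parameter at all: the first integer argument travels in RCX, which is consistent with a stack dump showing only the return address. A sketch of a thread entry routine that receives it the x64 way (WorkItem is a hypothetical stand-in for the data structure described above):
struct WorkItem { /* areas, start addresses, ... */ };

extern "C" unsigned long __stdcall threadEntry(void* param) // param arrives in RCX on x64
{
    WorkItem* item = static_cast<WorkItem*>(param);
    (void)item; // an assembler routine must read RCX, not a stack slot
    return 0;
}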
Try using _beginthread() or _beginthreadex() instead; you shouldn't be using CreateThread directly.
See this previous question.
I found that Windows has some new Windows data types:
DWORD_PTR, INT_PTR, LONG_PTR, UINT_PTR, ULONG_PTR
Can you tell me when, how, and why to use them?
The *_PTR types were added to the Windows API in order to support Win64's 64-bit addressing.
Because the 32-bit APIs typically passed pointers around using data types like DWORD, it was necessary to create new types for 64-bit compatibility that could substitute for DWORD in 32-bit applications, but were extended to 64 bits when used in 64-bit applications.
So, for example, for application developers who want to write code that works as 32-bit OR 64-bit, the 32-bit Windows API SetWindowLong(HWND,int,LONG) was changed to SetWindowLongPtr(HWND,int,LONG_PTR).
In a 32bit build, SetWindowLongPtr is simply a macro that resolves to SetWindowLong, and LONG_PTR is likewise a macro that resolves to LONG.
In a 64-bit build, on the other hand, SetWindowLongPtr is an API that accepts a 64-bit integer as its 3rd parameter, and LONG_PTR is a typedef for __int64.
By using these _PTR types, one codebase can compile for both Win32 and Win64 targets.
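For illustration, a sketch of that pattern (MyState and the helper names are hypothetical): stashing an object pointer in a window's user data so the same source compiles for both targets.
#include <windows.h>

struct MyState { int value; };

void attachState(HWND hwnd, MyState* state)
{
    SetWindowLongPtr(hwnd, GWLP_USERDATA, reinterpret_cast<LONG_PTR>(state));
}

MyState* getState(HWND hwnd)
{
    return reinterpret_cast<MyState*>(GetWindowLongPtr(hwnd, GWLP_USERDATA));
}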
When performing pointer arithmetic, these types should also be used in 32bit code that needs to be compatible with 64bit.
So, if you need to access an array with more than 2 billion elements, you would need to use an INT_PTR rather than an INT:
CHAR* pHuge = new CHAR[0x200000000]; // allocate 8 GiB
INT idx = 0;
INT_PTR idx2 = 0;
pHuge[idx];  // can only reach the first 2 billion elements (INT stays a signed 32-bit type)
pHuge[idx2]; // can index the full array; INT_PTR widens to 64 bits in a 64-bit build
Chris Becke is pretty much correct. It's just worth noting that these _PTR types are simply types that are 32 bits wide in a 32-bit app and 64 bits wide in a 64-bit app. It's as simple as that.
You could easily use __int3264 instead of INT_PTR, for example.