I was trying to figure out a port bug from win32 to win64 where a LB_GETSELITEMS message was returning a -1 in the 64 bit port, but not in the original 32 bit environment.
My head about exploded when I finally realized that LB_GETSELITEMS requires the buffer passed in lParam to be 8-byte (maybe 4-byte?) aligned; with a buffer that is only 2-byte aligned the message fails.
I've not seen this documented anywhere. Does anyone know of any documentation related to this? Are there any other places where this is a problem?
Test platform is 32-bit Windows.
I use IDA Pro to disassemble a PE file, do some very tedious transformation work, and reassemble it into a new PE file.
But when I debug the new PE file in OllyDbg, the reassembled file differs from the original one (although this part of the assembly file I transformed is unchanged).
Here is part of the original one:
Note that the
PUSH 8
PUSH 0
instructions are correct.
Here is part of my new PE file:
Now the
PUSH 8
PUSH 0
instructions have been changed to
66:6A 08
66:6A 00
and this causes the new PE file to fail when it runs.
Basically, from what I have seen, it leaves the stack misaligned.
So does anyone know what is wrong with this part? I don't see any difference in the assembly code I transformed.
Could anyone give me some help? Thank you!
66h is the operand-size override prefix. In 32-bit code, it switches the operand size to 16-bit from the default 32-bit. So what happens here is that the PUSH instruction pushes a 16-bit value on the stack instead of the 32-bit one, and the ESP is decremented by 2 instead of 4. That's why you get unbalanced stack after the call.
You should check your assembler's documentation to see how you can force 32-bit operand size for the PUSH imm instructions. Different assemblers use different conventions for that. For example, in NASM you'd probably use something like push dword 8.
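To make the effect of the prefix concrete, here is a minimal sketch in C. It only recognizes the PUSH imm8 form (opcode 6A ib) seen in the question and assumes 32-bit mode; the function names are hypothetical and this is not a general decoder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: for a byte sequence that encodes PUSH imm8
   (opcode 6A ib), optionally preceded by the 66h operand-size prefix,
   return how many bytes the instruction subtracts from ESP in 32-bit
   mode. Returns 0 for anything else. */
unsigned push_imm8_stack_bytes(const unsigned char *code, size_t len)
{
    size_t i = 0;
    unsigned operand_bytes = 4;          /* default operand size in 32-bit code */

    if (len > 0 && code[0] == 0x66) {    /* operand-size override prefix */
        operand_bytes = 2;               /* switches the push to 16 bits */
        i = 1;
    }
    if (len - i < 2 || code[i] != 0x6A)  /* need opcode 6A plus one immediate byte */
        return 0;
    return operand_bytes;
}

/* Usage: the two encodings from the question. */
void push_examples(void)
{
    const unsigned char plain[]    = { 0x6A, 0x08 };        /* PUSH 8   */
    const unsigned char prefixed[] = { 0x66, 0x6A, 0x08 };  /* 66:6A 08 */

    assert(push_imm8_stack_bytes(plain, sizeof plain) == 4);
    assert(push_imm8_stack_bytes(prefixed, sizeof prefixed) == 2);
}
```

Two prefixed pushes therefore move ESP by 4 bytes in total instead of 8, which is why the callee then reads its arguments from the wrong stack slots.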
It is a "prefix" opcode byte: See http://wiki.osdev.org/X86-64_Instruction_Encoding#Legacy_Prefixes
0x66 means "operand size override". Your code is apparently operating in 32-bit mode; PUSH without the prefix will push a 32-bit value. With the prefix, the PUSH pushes a 16-bit value instead, so ESP drops by 2 rather than 4. (I write a lot of assembly code, and have never had need to do that.)
After much trial and error, I still have some trouble understanding why the assembly syntax used in my textbook caused so many issues when using Windows 8.
.MODEL SMALL
.586
.STACK 100h
.DATA
Message DB 'Hello, my name blank', 13, 10, '$'
.CODE
Hello PROC
mov ax, #data
mov ds, ax
mov dx, OFFSET Message
mov ah, 9h
int 21h
mov al, 0
mov ah, 4ch
int 21h
Hello ENDP
END Hello
At first I tried running the code with masm32, using the command prompt and the correct linker. Then I tried using Visual Studio 2013 Ultimate; even using masm32 within Visual Studio, I got similar issues each time. The assembler had issues with the #data line, and with there being no leading underscore for Hello. Fixing the latter only resulted in an issue with unmatched blocks.
I did find a workaround by using an MS-DOS virtual environment, and the code worked fine after removing the .586 instruction.
I suspect the main issue was trying to run this code in an x64 OS environment, but I'm still learning the language so I'd like to hear other opinions on why I couldn't get it to run initially.
The book we're using is Jones, Assembly Language for the IBM PC Family 3rd edition.
You are using a 32 bit linker. You need to use the 16 bit linker called link16 in masm32/bin to link the code.
e.g.
ml /c /Fl filename.asm
-then-
link16 filename.obj
The difference between the 16-bit and the 32-bit address mode is the default size of the operands/registers and addresses inside our code segment, and how the assembler uses the operand-size and address-size prefixes.
In the 16-bit address mode the default size is 16 bits. If we want to use 32-bit registers/operands and/or 32-bit addresses there, the assembler has to place an operand-size and/or address-size prefix on each of those 32-bit instructions. If we use only 16-bit instructions in the 16-bit address mode, we do not need those prefixes.
In the 32-bit address mode the default size is 32 bits, so the assembler does not have to place an operand-size and/or address-size prefix on our 32-bit instructions. (This helps to minimize the code size if we use mostly 32-bit instructions.) But if we use 16-bit instructions in the 32-bit address mode, the assembler has to add the operand-size and/or address-size prefixes.
Additionally, there are two assembler directives (use16 and use32) that determine which address mode the code is written for, if we want to have different parts of the code for both address modes.
..
Besides the two address modes, there is also a large difference between real mode and protected mode.
For real mode in combination with the 16-bit address mode (the default on startup) we get a default segment size of 64 KB, and every address is calculated from the segment part held in a segment register together with the offset part. For protected mode we have to use global and/or local descriptor tables to specify the size of a segment that we want to use.
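The real-mode address calculation mentioned above can be sketched in C as follows. The function names are made up for illustration, and the sketch ignores A20 gating, so results above 1 MB are returned as-is rather than wrapped.

```c
#include <assert.h>
#include <stdint.h>

/* Real mode: linear address = segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Usage: different segment:offset pairs can name the same byte. */
void real_mode_examples(void)
{
    assert(real_mode_linear(0x1234, 0x0010) == 0x12350);
    assert(real_mode_linear(0x1235, 0x0000) == 0x12350);
    /* One segment spans offsets 0x0000..0xFFFF, i.e. 64 KB. */
    assert(real_mode_linear(0x0000, 0xFFFF) == 0xFFFF);
}
```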
...
Finally, the architecture of the underlying operating system determines the target for which we have to assemble our code and which software interrupts are available to use.
Dirk
On windows 64 bit, I've got a 32 bit process that reads the memory of other 32 bit processes, and I'd like it to be able to read 64 bit processes too.
ReadProcessMemory is being used to read the memory, but it has a 32 bit limitation. Is there any way of doing the equivalent of a ReadProcessMemory on a 64 bit process?
I know I could write a 64 bit process and launch that from my 32 bit process to do the work, but I'm wondering if there's some other option so that I don't need to write a 64 bit process.
Thanks.
It's possible.
For an example you may refer to the excellent sample in the answer of tofucoder.
For one more sample you may refer to this link.
For explanation why it actually works please check this thread.
Another sample may be found here.
The whole trick is to call the 64-bit version of the ReadProcessMemory function. Intuitively that is not an option from a 32-bit process; however, the link above explains that the x64 version of ntdll.dll is also loaded as a part of a 32-bit process in the Windows WOW64 emulator. It has a function called NtReadVirtualMemory with the same prototype as ReadProcessMemory64:
__declspec(SPEC)BOOL __cdecl ReadProcessMemory64(HANDLE hProcess, DWORD64 lpBaseAddress, LPVOID lpBuffer, SIZE_T nSize, SIZE_T *lpNumberOfBytesRead);
The address is 64-bit long and thus the whole virtual address space of 64-bit process may be referred.
You may wonder how to get the address of this function. This is where another function in ntdll.dll comes in handy: LdrGetProcedureAddress. Its prototype is the same as that of GetProcAddress:
__declspec(SPEC)DWORD64 __cdecl GetProcAddress64(DWORD64 hModule, char* funcName);
We have to examine the export directory of the x64 ntdll.dll and manually find this function's entry. Then we can obtain the address of any other function.
Another question is left uncovered so far: how to obtain the start address of the x64 ntdll.dll? One option is to manually walk through the x64 PEB structure of our process and traverse the list of loaded modules. And how to get the PEB address? Please refer to the links above, so as not to overflow this post with too many details.
All this is covered in sample from the first link.
Alternative variants with usage of NtReadVirtualMemory & NtWow64ReadVirtualMemory64 functions are provided in second & third links (as well as alternative ways to get PEB address).
Summary: it is possible to interact with x64 process from x86 one. It can be done either with direct call to x64 version of function (from x64 ntdll.dll which is loaded as a part of WOW64 process) or with the call of specific x86 function which is intended to work with x64 process (namely NtWow64ReadVirtualMemory64).
P.S. One may say it's undocumented and is more like a hack, but it's just not officially documented. Software like Unlocker, ProcessHacker or ProcessExplorer, for example, makes use of these undocumented features (and many more), and it's up to you to decide, of course.
The library wow64ext seems to have solved this problem and offers a function ReadProcessMemory64. The Visual Studio Extension VSDebugTool seems to use this library and works for me with 64-bit processes.
Anyway, it shouldn't be impossible because the (32-bit) Visual Studio Debugger handles 64-bit debuggees very well.
No: http://blogs.msdn.com/b/oldnewthing/archive/2008/10/20/9006720.aspx
There's no way to get around this. One solution is to stop using the WOW64 emulator and write a 64 bit process. Another solution is to use IPC rather than direct memory reading.
ReadProcessMemory can read a buffer of any size, including when an x86 process reads from an x64 process, as long as the address fits in the 32-bit lpBaseAddress parameter.
You can without a problem, in an x86 program, do the following:
DWORD64 test = 0;
ReadProcessMemory(hProcess, (LPCVOID)lpBaseAddress, &test, sizeof(DWORD64), NULL);
Which would allow you to dereference an x64 pointer from an x86 process.
I am using Windows 7 64-bit with MSVC2005 and Qt (but I doubt Qt is causing the problem since this is an issue with the fundamental data type char).
So when I try to compare two chars like so
char A=0xAA;
if(A==0xAA)
printf("Success");
else
printf("Fail");
Lo and behold, it fails! But when I do this
char A=0xAA;
char B=0xAA;
if(A==B)
printf("Success");
else
printf("Fail");
I get success! Actually when I thought about it... hey, I'm working on a 64-bit processor. Even though chars are supposed to be treated as 1 byte, it's probably stored as 4 bytes.
So
char A=0xAA;
if(A==0xFFFFFFAA)
printf("Success");
else
printf("Fail");
Now I get success!!!
But WTF! Is this standard behavior?! If the damn thing is defined as a char, shouldn't the compiler know what to do with it? Further tests show that the extra bytes are only stored as ones if the most significant bit of the char is a 1. So 0x7F and lower is stored with zero upper bytes, e.g. 0x07 as 0x00000007. WTF.
Actually I seem to have answered all my questions... except who to call to get this bug fixed. Is this even a bug? You can use MSVC2005 on 64-bit operating systems, right, or am I being an idiot? I guess I should get Qt Creator to use MSVC2010.. damn it. There goes my 2 hours.
You are comparing a (signed) char with the value -86 (0xAA - 256) to an integer with the value 170 (0xAA).
The same will happen on a 32-bit system, and an 8-bit system, for that matter.
Not related to 64 bit: you need to define A as unsigned char to get correct behavior. Compiler warning shows that this code may be incorrect:
warning C4309: 'initializing' : truncation of constant value
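The promotion described in these answers can be reproduced with a small C sketch. It uses signed char explicitly, because whether plain char is signed is implementation-defined (it is signed by default in MSVC), and it assumes the usual two's-complement conversion of 0xAA to -86; the function names are illustrative only.

```c
#include <assert.h>

/* Value seen by == after a signed char holding 0xAA is promoted to int. */
int promoted_signed(void)
{
    signed char a = (signed char)0xAA;   /* stored as -86 */
    return a;                            /* sign-extended to int */
}

/* Same bit pattern, but unsigned: promotion zero-extends. */
int promoted_unsigned(void)
{
    unsigned char a = 0xAA;
    return a;
}

/* The comparison from the question; 0xAA is an int constant (170). */
int signed_matches_constant(void)
{
    signed char a = (signed char)0xAA;
    return a == 0xAA;                    /* -86 == 170 */
}

int unsigned_matches_constant(void)
{
    unsigned char a = 0xAA;
    return a == 0xAA;                    /* 170 == 170 */
}

void char_examples(void)
{
    assert(promoted_signed() == -86);
    assert(promoted_unsigned() == 170);
    assert(!signed_matches_constant());
    assert(unsigned_matches_constant());
}
```

Comparing two signed chars that both hold 0xAA succeeds because both sides promote to the same int value, -86.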
Operating System: Windows XP 64 bit, SP2.
I have an unusual problem. I am porting some code from 32 bit to 64 bit. The 32-bit code works just fine. But when I call CreateThread() in the 64-bit version the call fails. I have three places where this fails: two call CreateThread() directly, and one calls beginthreadex(), which calls CreateThread().
All three calls fail with error code 0x3E6, "Invalid access to memory location".
The problem is all the input parameters are correct.
HANDLE h;
DWORD threadID;
h = CreateThread(0, // default security
0, // default stack size
myThreadFunc, // valid function to call
myParam, // my param
0, // no flags, start thread immediately
&threadID);
All three calls to CreateThread() are made from a DLL I've injected into the target program at the start of the program execution (this is before the program has got to the start of main()/WinMain()). If I call CreateThread() from the target program (same params) via say a menu, it works. Same parameters etc. Bizarre.
If I pass NULL instead of &threadID, it still fails.
If I pass NULL as myParam, it still fails.
I'm not calling CreateThread from inside DllMain(), so that isn't the problem. I'm confused and searching on Google etc hasn't shown any relevant answers.
If anyone has seen this before or has any ideas, please let me know.
Thanks for reading.
ANSWER
Short answer: Stack Frames on x64 need to be 16 byte aligned.
Longer answer:
After much banging my head against the debugger wall and posting responses to the various suggestions (all of which helped in some way, prodding me to try new directions), I started exploring what-ifs about what was on the stack prior to calling CreateThread(). This proved to be a red herring, but it did lead to the solution.
Adding extra data to the stack changes the stack frame alignment. Sooner or later one of the tests gets you to 16-byte stack frame alignment. At that point the code worked. So I retraced my steps and started putting NULL data onto the stack rather than what I thought were the correct values (I had been pushing return addresses to fake up a call frame). It still worked, so the data isn't important; it must be the actual stack addresses.
I quickly realised it was 16-byte alignment for the stack. Previously I was only aware of 8-byte alignment for data. This Microsoft document explains all the alignment requirements.
If the stack frame is not 16-byte aligned on x64, the compiler may put large (8 bytes or more) data on the wrong alignment boundaries when it pushes data onto the stack.
Hence the problem I faced - the hooking code was called with a stack that was not aligned on a 16 byte boundary.
Quick summary of alignment requirements, expressed as size : alignment
1 : 1
2 : 2
4 : 4
8 : 8
10 : 16
16 : 16
Anything larger than 8 bytes is aligned on the next power of 2 boundary.
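As a rough illustration, the size-to-alignment table above amounts to rounding the size up to the next power of two. The sketch below mirrors only that table for sizes 1 to 16; the function name is hypothetical, and it is not a statement of the full x64 ABI rules (struct alignment, for instance, depends on the members).

```c
#include <assert.h>
#include <stddef.h>

/* Alignment for a scalar of the given size, following the table above:
   the smallest power of two that is >= size. Limited to 1..16 bytes. */
size_t alignment_for_size(size_t size)
{
    size_t align = 1;

    assert(size >= 1 && size <= 16);
    while (align < size)
        align <<= 1;
    return align;
}

/* Usage: the rows of the table. */
void alignment_examples(void)
{
    assert(alignment_for_size(1) == 1);
    assert(alignment_for_size(2) == 2);
    assert(alignment_for_size(4) == 4);
    assert(alignment_for_size(8) == 8);
    assert(alignment_for_size(10) == 16);
    assert(alignment_for_size(16) == 16);
}
```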
I think Microsoft's error code is a bit misleading. The initial STATUS_DATATYPE_MISALIGNMENT could be expressed as a STATUS_STACK_MISALIGNMENT which would be more helpful. But then turning STATUS_DATATYPE_MISALIGNMENT into ERROR_NOACCESS - that actually disguises and misleads as to what the problem is. Very unhelpful.
Thank you to everyone that posted suggestions. Even if I disagreed with the suggestions, they prompted me to test in a wide variety of directions (including the ones I disagreed with).
I have written a more detailed description of the problem of datatype misalignment here: 64 bit porting gotcha #1! x64 Datatype misalignment.
The only reason that 64bit would make a difference is that threading on 64bit requires 64bit aligned values. If threadID isn't 64bit aligned, you could cause this problem.
Ok, that idea's not it. Are you sure it's valid to call CreateThread before main/WinMain? It would explain why it works in a menu- because that's after main/WinMain.
In addition, I'd triple-check the lifetime of myParam. CreateThread returns (this I know from experience) long before the function you pass in is called.
Post the thread routine's code (or just a few lines).
It suddenly occurs to me: Are you sure that you're injecting your 64bit code into a 64bit process? Because if you had a 64bit CreateThread call and tried to inject that into a 32bit process running under WOW64, bad things could happen.
Starting to seriously run out of ideas. Does the compiler report any warnings?
Could the bug be due to a bug in the host program, rather than the DLL? There's some other code, such as loading a DLL if you used __declspec(import/export), that runs before main/WinMain. That DllMain, for example, could have a bug in it.
I ran into this issue today. I checked every argument fed into _beginthread/CreateThread/NtCreateThread via rohitab's Windows API Monitor v2. Every argument is aligned properly (AFAIK).
So, where does STATUS_DATATYPE_MISALIGNMENT come from?
The first few lines of NtCreateThread validate parameters passed from user mode.
ProbeForReadSmallStructure (ThreadContext, sizeof (CONTEXT), CONTEXT_ALIGN);
for i386
#define CONTEXT_ALIGN (sizeof(ULONG))
for amd64
#define STACK_ALIGN (16UI64)
...
#define CONTEXT_ALIGN STACK_ALIGN
On amd64, if the ThreadContext pointer is not aligned to 16 bytes, NtCreateThread will return STATUS_DATATYPE_MISALIGNMENT.
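The kind of test this implies can be sketched in portable C. This is not the kernel's code: the helper names are made up, and the constants 4 and 16 simply stand in for the i386 and amd64 values of CONTEXT_ALIGN quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Non-zero when address is a multiple of alignment (a power of two). */
int is_aligned(uintptr_t address, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (address & (alignment - 1)) == 0;
}

/* Round address down to the previous multiple of alignment. */
uintptr_t align_down(uintptr_t address, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return address & ~(alignment - 1);
}

/* Usage: an address that passes the 4-byte check but fails the 16-byte one. */
void alignment_check_examples(void)
{
    uintptr_t context_address = 0x12FF48;   /* made-up stack address */

    assert(is_aligned(context_address, 4));     /* i386 requirement  */
    assert(!is_aligned(context_address, 16));   /* amd64 requirement */
    assert(align_down(context_address, 16) == 0x12FF40);
}
```

A structure carved out of a stack that is only 8-byte aligned lands on addresses like the one in the example, which would explain why the call fails only on amd64.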
CreateThread (actually CreateRemoteThread) allocates ThreadContext on the stack and does nothing special to guarantee that the alignment requirement is satisfied. Things will work smoothly if every piece of your code follows the Microsoft x64 calling convention, which unfortunately was not true for me.
PS: The same code may work on newer Windows (say Vista and newer). I didn't check though. I'm facing this issue on Windows Server 2003 R2 x64.
I'm in the business of using parallel threads under Windows
for calculations. No funny business, no DLL calls, and certainly
no callbacks. The following works in 32-bit Windows. I set up the stack for my calculation, well within the area reserved for my program.
All relevant data about areas and start addresses is contained in
a data structure that is passed to CreateThread as parameter 3.
The address that is called contains a small assembler routine
that uses this data structure.
Indeed this routine finds the address to return to on the stack,
then the address of the data structure.
There is no reason to go far into this. It just works and it calculates
the number of primes below 2,000,000,000 just fine, in one thread,
in two threads or in 20 threads.
Now CreateThread in 64 bits doesn't push the address of the data
structure. That seems implausible so I show you the smoking gun,
a dump of a debug session.
In the subwindow at the bottom right you see the stack, and
there is merely the return address, amidst a sea of zeroes.
The mechanism I use to fill in parameters is portable between 32 and 64 bits.
No other call exhibits a difference between word-sizes.
Moreover why would the code address work but not the data address?
The bottom line: one would expect that CreateThread passes the data parameter on the stack in the same way in 64 bits as in 32 bits, then does a subroutine call. At the assembler level it doesn't work that way. If there are any hidden requirements on e.g. RSP that are automatically fulfilled in C++, that would be very nasty.
P.S. No there are no 16 byte alignment problems. That lies ages behind me.
Try using _beginthread() or _beginthreadex() instead; you shouldn't be using CreateThread directly.
See this previous question.