NASM FindFirstFileA LPWIN32_FIND_DATAA - winapi

I've written a basic program in NASM trying to use FindFirstFileA with a view to eventually listing all files in a directory.
extern FindFirstFileA ; kernel32.dll
extern ExitProcess ; kernel32.dll
section .code
Start:
push dataStructPtr
push searchParameters
call [FindFirstFileA]
mov [fileHandle], eax
push 0
call [ExitProcess]
section .data
searchParameters: db "*.*",0
section .bss
dataStructPtr: resb 4
fileHandle: resb 4
As far as I can tell, a 32-bit pointer to the WIN32_FIND_DATAA structure should be going into address 402008.
However, it looks like more than 4 bytes are being written, and the value there gives me an address of 007334C8, which is beyond the end of the program's memory.
Would you be able to shed some light on why this is happening and where the structure resides so I could look at it using OllyDbg?
Many Thanks
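A likely explanation, based on the documented signature of FindFirstFileA: lpFindFileData is not a place where the function stores a pointer, but the address of a caller-allocated WIN32_FIND_DATAA structure that the function fills in place. Reserving only 4 bytes at dataStructPtr means the rest of the structure spills over fileHandle and whatever follows it. The portable C sketch below mirrors the documented field order with fixed-width types; `find_data_a_layout`, `filetime_layout` and `MAX_PATH_A` are hypothetical names, not SDK definitions, and the layout assumes the usual 4-byte alignment of the Windows headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of FILETIME: two DWORDs, so 4-byte aligned. */
typedef struct {
    uint32_t dwLowDateTime;
    uint32_t dwHighDateTime;
} filetime_layout;

#define MAX_PATH_A 260 /* value of MAX_PATH in the Windows headers */

/* Hypothetical mirror of WIN32_FIND_DATAA (ANSI variant). */
typedef struct {
    uint32_t        dwFileAttributes;        /* offset 0   */
    filetime_layout ftCreationTime;          /* offset 4   */
    filetime_layout ftLastAccessTime;        /* offset 12  */
    filetime_layout ftLastWriteTime;         /* offset 20  */
    uint32_t        nFileSizeHigh;           /* offset 28  */
    uint32_t        nFileSizeLow;            /* offset 32  */
    uint32_t        dwReserved0;             /* offset 36  */
    uint32_t        dwReserved1;             /* offset 40  */
    char            cFileName[MAX_PATH_A];   /* offset 44  */
    char            cAlternateFileName[14];  /* offset 304 */
} find_data_a_layout; /* 318 bytes of fields, padded to 320 */

/* Bytes the assembly program must reserve for lpFindFileData. */
size_t find_data_a_size(void) { return sizeof(find_data_a_layout); }

/* Offset of the file name inside the buffer (where to look in a memory dump). */
size_t find_data_a_name_offset(void) { return offsetof(find_data_a_layout, cFileName); }
```

Under that assumption the NASM program would reserve something like `findData: resb 320` in .bss (the label is hypothetical) and push its address; in OllyDbg the found file name would then start 44 bytes into that buffer.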

Related

Explain to me how Windows allocates process virtual memory

I have a fairly complex question made up of several related questions. Let me give you the preamble.
I wrote a simple Win64 program in assembly language which prints "2 + 3 = 5" using printf and then "Hello World!" using puts:
format PE64
entry start
section '.text' code readable executable
start:
sub rsp,8*5 ; reserve stack for API use and make stack dqword aligned
mov edx, 3
mov ecx, 2
call print_sum
lea rcx,[_hw_message]
call [puts]
mov ecx,0
call [ExitProcess]
print_sum:
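sub rsp, 28h ; 32 bytes of shadow space + 8 so rsp is 16-byte aligned at the call
mov r9d, ecx
add r9d, edx
mov r8d, edx
mov edx, ecx
lea rcx, [_format_message] ; full 64-bit address of the format string
call [printf]
add rsp, 28h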
ret
section '.data' data readable writeable
_hw_message db 'Hello World!',0
_format_message db '%d + %d = %d',13,10,0
section '.idata' import data readable writeable
dd 0,0,0,RVA kernel_name,RVA kernel_table
dd 0,0,0,RVA msvcrt_name,RVA msvcrt_table
dd 0,0,0,0,0 ; null descriptor terminates the import directory
kernel_table:
ExitProcess dq RVA _ExitProcess
dq 0
msvcrt_table:
printf dq RVA _printf
puts dq RVA _puts
dq 0
kernel_name db 'KERNEL32.DLL',0
msvcrt_name db 'msvcrt.dll',0
_ExitProcess dw 0
db 'ExitProcess',0
_printf dw 0
db 'printf',0
_puts dw 0
db 'puts',0
and built it with fasm. Resulting binary size is 2048 bytes.
I've opened it with CFF Explorer to see PE header values.
Image base is 0x400000, entry point is 0x1000, and the .text section's virtual address is 0x1000 too, so, as far as I understand, the section should start at virtual address 0x401000, which is also the entry point.
Then I opened it in a debugger (I use x64dbg) to confirm my guess:
Looks believable. Also note that the stack is located at 0x8A000.
Fine. Then I tried the same with another program, notepad.exe from C:\Windows:
Wait, what? 0x140000000 + 0x24050 = 0x140024050, not 0x7FF75FD04050. And I can't find any such large values starting with 7FF in the PE headers.
In addition, the stack is again located somewhere at the beginning of the process's memory map, but now its address is already much larger:
I thought that perhaps this is because notepad.exe is a system program and is tightly tied to the Windows system APIs, and some parts of it (and maybe all the code) are always loaded into RAM while Windows is running. Therefore, I tried to do the same with x64dbg itself, and saw about the same picture:
image base: 0x140000000
entry point (in headers): 0x2440
entry point in VM: 0x7FF6B0E82440
location of stack in VM: 0xFDA07F8000
So the questions are:
Why are sections of some programs mapped to addresses greater than 0x7ff000000000, which doesn't match PE headers?
How are these processes different from others?
How does the OS decide where to place the stack in virtual memory?
Each thread has its own stack. As you can see from the screenshots, thread stacks are usually placed before code sections. If the program starts a dynamic number of threads, this memory may not be enough. Where, in this case, will stacks be allocated for new threads?
How can I programmatically, having an executable file, but not running it, statically determine at what addresses in the virtual memory of its process the sections, the stack will be located, and what address spaces will be available for allocation on the heap?
I understand that this can be difficult to explain in a nutshell, so I appreciate if, in addition to answering my questions, you can recommend me some reading material that will help me improve my understanding of the Windows virtual memory mapping.
What you're seeing is address space layout randomization (ASLR), which MSVC enables by default with the linker flag /DYNAMICBASE.
To enable it, the flag IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (0x40) must be set in the PE header, in IMAGE_NT_HEADERS -> OptionalHeader -> DllCharacteristics.
When it is enabled, the OS selects a random address for the image base, the stack, and the heap, and the ImageBase specified in the PE header is not used as the load address.
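To connect this to the question about determining addresses statically: from the file alone you can only read the preferred values. The C sketch below is a minimal illustration, assuming the whole file is already in memory; `pe_info`, `pe_read_info`, `pe_has_dynamic_base` and `pe_rva_to_va` are hypothetical helpers. It reads ImageBase, AddressOfEntryPoint and DllCharacteristics using the PE/COFF offsets (e_lfanew at 0x3C, a 4-byte signature, a 20-byte file header, then the optional header, where DllCharacteristics sits at offset 70 in both PE32 and PE32+). When the DYNAMIC_BASE bit is set, the runtime address of anything is the actual base plus its RVA; for example, the notepad numbers above imply a base of 0x7FF75FD04050 - 0x24050 = 0x7FF75FCE0000.

```c
#include <stddef.h>
#include <stdint.h>

#define PE_DYNAMIC_BASE 0x0040u /* IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE */

typedef struct {
    uint64_t image_base; /* OptionalHeader.ImageBase (preferred base) */
    uint32_t entry_rva;  /* OptionalHeader.AddressOfEntryPoint */
    uint16_t dll_chars;  /* OptionalHeader.DllCharacteristics */
} pe_info;

/* PE header fields are stored little-endian. */
static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint64_t rd64(const unsigned char *p) {
    return (uint64_t)rd32(p) | ((uint64_t)rd32(p + 4) << 32);
}

/* Returns 0 on success, -1 if buf is too short or does not look like a PE image. */
int pe_read_info(const unsigned char *buf, size_t len, pe_info *out) {
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z') return -1;
    uint32_t nt = rd32(buf + 0x3C); /* e_lfanew */
    /* signature (4) + COFF file header (20) + optional header through DllCharacteristics (72) */
    if (nt > len || len - nt < 4 + 20 + 72) return -1;
    const unsigned char *sig = buf + nt;
    if (sig[0] != 'P' || sig[1] != 'E' || sig[2] != 0 || sig[3] != 0) return -1;
    const unsigned char *opt = sig + 24; /* start of the optional header */
    uint16_t magic = rd16(opt);
    if (magic == 0x10B)      out->image_base = rd32(opt + 28); /* PE32: 4-byte ImageBase */
    else if (magic == 0x20B) out->image_base = rd64(opt + 24); /* PE32+: 8-byte ImageBase */
    else return -1;
    out->entry_rva = rd32(opt + 16);
    out->dll_chars = rd16(opt + 70);
    return 0;
}

/* If this returns 0, the loader normally uses ImageBase as the load address. */
int pe_has_dynamic_base(const pe_info *pi) { return (pi->dll_chars & PE_DYNAMIC_BASE) != 0; }

/* Runtime VA of an RVA once the actual load base is known (e.g. from a debugger). */
uint64_t pe_rva_to_va(uint64_t base, uint32_t rva) { return base + rva; }
```

For the fasm program, ImageBase + AddressOfEntryPoint (0x400000 + 0x1000) matched what the debugger showed; for notepad.exe and x64dbg, the same sum only gives the preferred address, and the real one is known only after the loader has chosen the base.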

Assembler calling conventions for Windows 10 API routines

Back in the 1970's I cut my teeth on the IBM 370 mainframe assembler, and in the early 1980's I had the original IBM PC, with the Microsoft Macro Assembler. At that time it was sold as a separate product, and came with a very useful manual. Now I'm retired, in quarantine, and looking to get back into assembler language.
I downloaded Visual Studio 2019 Community, which has MASM included in it, and for interactive debugging I'm using x64dbg. My PC is 64 bit, so I'm using the ML64 assembler as provided with VS.
My question is regarding the calling convention for the Windows API functions.
These days the Windows functions all seem to be geared toward C++ and, in my understanding, the calling convention reflects the machine code that is generated by C++ for calling those functions. I want to develop a template that I can use for all future calls, so it's coded for a nonexistent function called apifunc. This fictional function has five parameters.
; command to assemble is:
; ml64 samplecall.asm /link /subsystem:windows /defaultlib:kernel32.lib /entry:Start
extrn ExitProcess: PROC
extrn apifunc: PROC ; any hypothetical api function with five parameters
.data
;
parm1 dword ? ; these could be any required data type
parm2 dword ?
parm3 dword ?
parm4 dword ?
parm5 dword ?
;
.code
Start PROC
;
sub rsp, 32 ; room on the stack for first four parameters, 8 bytes each
;
lea rcx, parm1 ; pass the first four parameters in registers
lea rdx, parm2
lea r8, parm3
lea r9, parm4
lea rax, parm5 ; address of the fifth and last parameter
push rax ; put it on the stack
call apifunc ; call the hypothetical function
;
call ExitProcess
;
Start ENDP
End:
Does this code look even remotely correct? When control returns from apifunc, do I have any indication at all of whether it was successful and, if it was not, why not? Do I need to add 40 back to the stack pointer in order to leave it in the same condition in which it was passed to me?
Please be patient with me, because I now stand at the bottom of a very steep learning curve. I hope my questions make sense, and that I provided enough information.
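As a cross-check of the stack arithmetic involved, here is a small C sketch of the Microsoft x64 calling convention's stack rules as they are commonly documented: the caller reserves 32 bytes of shadow space for RCX, RDX, R8 and R9; arguments 5 and up go on the stack directly above that space, so argument 5 sits at [rsp+32] at the moment of the call; and RSP must be 16-byte aligned at the `call`. Because the caller's own `call` leaves RSP 8 bytes off alignment on entry, the amount subtracted must be an odd multiple of 8. The function names are hypothetical, and the sketch assumes no other pushes between the `sub` and the `call`.

```c
#include <stdint.h>

#define SHADOW_SPACE 32u /* home space for RCX, RDX, R8, R9 */

/* Bytes to subtract from RSP in a function entered via CALL (RSP % 16 == 8
   on entry) that will call a function taking nargs integer/pointer arguments. */
uint32_t win64_frame_size(uint32_t nargs) {
    uint32_t stack_args = nargs > 4 ? nargs - 4 : 0;
    uint32_t size = SHADOW_SPACE + 8u * stack_args;
    /* entry RSP is 8 mod 16, so size + 8 must be a multiple of 16 */
    if ((size + 8u) % 16u != 0) size += 8u;
    return size;
}

/* Offset from RSP, at the call instruction, of argument k (1-based).
   Returns -1 for k <= 4, which are passed in registers. */
int32_t win64_stack_arg_offset(uint32_t k) {
    if (k <= 4) return -1;
    return (int32_t)(SHADOW_SPACE + 8u * (k - 5));
}
```

Under these rules a five-argument call would use `sub rsp, 40` once and store the fifth argument with `mov [rsp+32], rax`; a `push rax` after `sub rsp, 32` leaves it at [rsp] instead, inside what the callee treats as shadow space. The result comes back in RAX (EAX for 32-bit results), and many Win32 functions signal failure through that value, with details available from GetLastError. A function that returns normally would undo its `sub` with a matching `add` before `ret`.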

GCC compiled assembly

I am trying to learn assembly language by example, by compiling simple C files with GCC using the -S option, Intel syntax, and CFI directives disabled (every other free way is extremely confusing).
My C file is literally just int main() {return 0;}, but GCC spits out this:
.file "simpleCTest.c"
.intel_syntax noprefix
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
push ebp
mov ebp, esp
and esp, -16
call ___main
mov eax, 0
leave
ret
.ident "GCC: (GNU) 5.3.0"
My real question is why does the main function have any processor instructions (push ebp, mov ebp, esp, etc.)? Are these even necessary (I guess they could be a way of managing data to prepare/shut down programs, but I'm not sure)? Why doesn't it just issue a ret instruction in the main function? Also, why are there TWO main functions (_main and ___main)?
To sum it up, why is it not just like this?
.def _main
_main:
mov eax, 0 ;(for return integer)
ret
Regarding "GCC spits out this": this would probably be a bit clearer if your main function actually did some things, including, oddly enough, calling another function.
Your compiled code sets up a frame for referencing its stack variables with the first two instructions, push ebp and mov ebp,esp. This would be used if you had variables that could be referred to as ebp plus a constant, for instance. Then it aligns the stack to a multiple of 16 bytes with the AND instruction; that is, it gives up between 0 and 15 bytes of the provided stack so that [esp] is aligned to a multiple of 16 bytes. This matters because the calling convention GCC assumes expects the stack to be 16-byte aligned at each call.
The final instruction before ret, leave, copies the saved frame pointer in EBP back into ESP (undoing whatever the AND did) and then restores the caller's EBP with a pop; it is equivalent to mov esp,ebp followed by pop ebp.
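To make the prologue and `leave` concrete, here is a toy C simulation (not real machine state; `toy_cpu`, `prologue` and `leave_insn` are hypothetical names) of those instructions acting on ESP, EBP and a small stack: `push ebp; mov ebp, esp; and esp, -16`, then `leave` as `mov esp, ebp; pop ebp`. Whatever the AND discarded, `leave` restores ESP from EBP, so `ret` finds the return address again.

```c
#include <stdint.h>

/* Toy 32-bit machine: only ESP, EBP and a small stack of 32-bit words. */
typedef struct {
    uint32_t esp, ebp;
    uint32_t mem[64]; /* models addresses [STACK_BASE, STACK_BASE + 256) */
} toy_cpu;

#define STACK_BASE 0x1000u

static uint32_t *slot(toy_cpu *c, uint32_t addr) { return &c->mem[(addr - STACK_BASE) / 4]; }
static void push(toy_cpu *c, uint32_t v) { c->esp -= 4; *slot(c, c->esp) = v; }
static uint32_t pop(toy_cpu *c) { uint32_t v = *slot(c, c->esp); c->esp += 4; return v; }

/* push ebp; mov ebp, esp; and esp, -16 */
void prologue(toy_cpu *c) {
    push(c, c->ebp);
    c->ebp = c->esp;
    c->esp &= (uint32_t)-16; /* drop 0..15 bytes so ESP is 16-byte aligned */
}

/* leave == mov esp, ebp; pop ebp */
void leave_insn(toy_cpu *c) {
    c->esp = c->ebp;
    c->ebp = pop(c);
}
```

For example, starting from ESP = 0x10F8 (return address just pushed by the caller), the prologue leaves EBP = 0x10F4 and ESP = 0x10F0, and `leave_insn` brings ESP back to 0x10F8 with the caller's EBP restored.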
As for "why does the main function have any processor instructions": it's setting things up for work you aren't doing (but that nontrivial programs would do), and it is not producing the most optimized "return 0" program it could. With a base pointer that holds a copy of the original stack pointer, the program can refer to local variables and arguments as offsets from the base pointer (including implied things you aren't using, like the argument count, the argv pointer, and the environment pointer), and with a stack pointer that is a multiple of 16, the program can call other functions according to its calling convention.

How does one display "Hello, world!" without using the benefits of a high-level assembler?

I'm attempting to display "Hello, world!" with FASM on a 64-bit Windows 7 machine without using the crutches that modern assemblers seem to provide in abundance.
This rather simple task proved to be surprisingly frustrating since every example and tutorial I could find insists on resorting to macros, including prewritten code, or importing libraries from high-level languages. I thought that the kind of people who want to learn assembly typically do so to develop a direct and intimate understanding of how computers work. All these abstractions and obfuscations seem to detract from that purpose.
Rant aside, I'm looking for code that can display "Hello, world!" on a console without reusing, including, and importing anything except to directly access the Windows API. Although I'm aware that many assemblers come packaged with files that provide access to the Windows API, I'd rather not rely on them.
Also, if you have any suggestions as to what assemblers or tutorials I can use to better facilitate my approach to learning, I'd greatly appreciate it.
The big problem with "pure" Windows programming is that Windows requires the program to contain an import section describing which functions from the system DLLs have to be provided to the program: the so-called import table.
This table is not part of the program logic and has nothing to do with assembly programming itself. Besides, the import table has a complex structure that is not very convenient to build manually. That is why FASM provides a standard way for the user to build these import tables.
The proper approach for you, if your goal is to learn assembly, is to read the FASM manuals, where these macros are described, then read the example code provided in any FASM distribution, and then start using them and concentrate on the assembly programming.
The moderate use of macros does not make your program any less written in assembly!
The FASM message board is a good place to ask questions and get help, but you have to do your homework after all.
Every running process under Windows gets kernel32 (or KernelBase) loaded into its address space; using this fact and the PEB internals, you can easily access any Windows function (provided you have the right access privileges).
This blog entry details how to go about doing this to display a message with MessageBoxA.
In all honesty, unless you have some extreme reason for doing this, you are just going to end up wasting time; rather, use the tools provided (in this case, a linker, so you can access any Windows API without jumping through 10000 hoops).
I managed to link to one library only (kernel32.dll) and make reference to 3 functions:
GetStdHandle
WriteConsole
ExitProcess
The code below is the result of my exhaustive Google search, and my own reference to MS documentation.
format PE console
entry start
include 'include\win32a.inc'
section '.data' data readable writable
msg db 'Hello World!',13,10
len = $-msg ; character count for WriteConsole (no terminating 0 needed)
dummy dd ?
section '.code' readable writable executable
start:
push STD_OUTPUT_HANDLE
call [GetStdHandle] ;STD_OUTPUT_HANDLE (DWORD)-11
push 0 ;LPVOID lpReserved
push dummy ;LPDWORD lpNumberOfCharsWritten
push len ;DWORD nNumberOfCharsToWrite
push msg ;VOID *lpBuffer;
push eax ;HANDLE hConsoleOutput
call [WriteConsole]
push 0
call [ExitProcess]
section '.idata' data import readable writable
library kernel32,'KERNEL32.DLL'
include 'include\api\kernel32.inc'
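The pushes above run in the reverse of the parameter order in the WriteConsole prototype: stdcall arguments are pushed right to left, so the first parameter (hConsoleOutput) ends up at the lowest address, [esp], when the call executes, and the call then places the return address below it. A toy C model of a downward-growing stack shows the resulting layout; `toy_stack`, `push_args_stdcall` and the argument values used with it are made up for illustration, and the sketch does no overflow checking.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy downward-growing stack of 32-bit slots. */
typedef struct {
    uint32_t slots[16];
    size_t top; /* index of [esp]; 16 means empty */
} toy_stack;

void toy_init(toy_stack *s) { s->top = 16; }

/* push: decrement esp, then store */
void toy_push(toy_stack *s, uint32_t v) { s->slots[--s->top] = v; }

/* Value at [esp + 4*i], i.e. argument i (0-based) once all pushes are done. */
uint32_t toy_arg(const toy_stack *s, size_t i) { return s->slots[s->top + i]; }

/* Push n arguments right to left, as the stdcall sequence above does. */
void push_args_stdcall(toy_stack *s, const uint32_t *args, size_t n) {
    for (size_t i = n; i-- > 0;) toy_push(s, args[i]);
}
```

With the five WriteConsole arguments pushed this way, `toy_arg(&s, 0)` is the console handle and `toy_arg(&s, 4)` is lpReserved, matching the prototype's left-to-right order when read upward from ESP.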
Asking google for help: http://board.flatassembler.net/topic.php?t=14034
Trying it out yourself
; Example of 64-bit PE program
format PE64 GUI
entry start
section '.text' code readable executable
start:
sub rsp,8*5 ; reserve stack for API use and make stack dqword aligned
mov r9d,0
lea r8,[_caption]
lea rdx,[_message]
mov rcx,0
call [MessageBoxA]
mov ecx,eax
call [ExitProcess]
section '.data' data readable writeable
_caption db 'Win64 assembly program',0
_message db 'Hello World!',0
section '.idata' import data readable writeable
dd 0,0,0,RVA kernel_name,RVA kernel_table
dd 0,0,0,RVA user_name,RVA user_table
dd 0,0,0,0,0
kernel_table:
ExitProcess dq RVA _ExitProcess
dq 0
user_table:
MessageBoxA dq RVA _MessageBoxA
dq 0
kernel_name db 'KERNEL32.DLL',0
user_name db 'USER32.DLL',0
_ExitProcess dw 0
db 'ExitProcess',0
_MessageBoxA dw 0
db 'MessageBoxA',0
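The hand-built `.idata` section above follows the PE import directory layout: each `dd 0,0,0,RVA name,RVA table` line is one 20-byte import descriptor (OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name, FirstThunk), the all-zero `dd 0,0,0,0,0` ends the list, and each `dw 0` / `db 'Name',0` pair is a hint/name entry. Because OriginalFirstThunk is 0 here, the example appears to rely on the loader reading the name RVAs from the FirstThunk table itself before overwriting those entries with function addresses. A C sketch mirroring the descriptor with fixed-width types (`import_descriptor` and `count_import_descriptors` are hypothetical names modeled on the SDK's IMAGE_IMPORT_DESCRIPTOR):

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors IMAGE_IMPORT_DESCRIPTOR: five little-endian DWORDs. */
typedef struct {
    uint32_t OriginalFirstThunk; /* RVA of the lookup table; 0 in the fasm example */
    uint32_t TimeDateStamp;      /* 0 = not bound */
    uint32_t ForwarderChain;
    uint32_t Name;               /* RVA of the DLL name, e.g. kernel_name */
    uint32_t FirstThunk;         /* RVA of the table the loader patches, e.g. kernel_table */
} import_descriptor;

/* An all-zero descriptor terminates the import directory. */
static int is_null_descriptor(const import_descriptor *d) {
    return d->OriginalFirstThunk == 0 && d->TimeDateStamp == 0 &&
           d->ForwarderChain == 0 && d->Name == 0 && d->FirstThunk == 0;
}

/* Counts descriptors before the terminator, looking at no more than max entries. */
size_t count_import_descriptors(const import_descriptor *d, size_t max) {
    size_t n = 0;
    while (n < max && !is_null_descriptor(&d[n])) n++;
    return n;
}
```

Applied to the example's table, the two DLL entries (KERNEL32.DLL and USER32.DLL) would be counted and the count would stop at the terminating zero descriptor.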
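Using nasm to compile this hello world (16-bit DOS .COM) code taken from here (note: the listing uses MASM/TASM-style directives such as .model, proc and offset, which NASM does not accept as written):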
.model tiny
.code
org 100h
main proc
mov ah,9 ; Display String Service
mov dx,offset hello_message ; Offset of message (Segment DS is the right segment in .COM files)
int 21h ; call DOS int 21h service to display message at ptr ds:dx
retn ; returns to address 0000 off the stack
; which points to bytes which make int 20h (exit program)
hello_message db 'Hello, world!$'
main endp
end main

Confusion with how Win32 API calls work in assembly

I don't know how to ask this better, but why does this:
call ExitProcess
do the same as this?:
mov eax, ExitProcess
mov eax, [eax]
call eax
I would think that these would be equivalent:
call ExitProcess
mov eax, ExitProcess
call eax
When importing the code from a DLL, the symbol ExitProcess isn't actually the address of the code that exits your process (it's the address of the address). So, in that case, you have to dereference it to get the actual code address.
That means that you must use:
call [ExitProcess]
to call it.
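The same distinction can be modelled in C with a function pointer that plays the role of an import-table slot. All names here are hypothetical stand-ins, not Windows APIs: the imported symbol corresponds to `&iat_exit_process`, the address of the slot, so the caller has to load the pointer stored there before calling, which is what `call [ExitProcess]` does in one instruction.

```c
/* Stand-in for the real ExitProcess code inside kernel32.dll. */
static unsigned last_exit_code;
static void fake_exit_process(unsigned code) { last_exit_code = code; }

typedef void (*exit_fn)(unsigned);

/* Stand-in for the import-table slot the loader fills in.
   The imported symbol "ExitProcess" is the address of this variable. */
exit_fn iat_exit_process = fake_exit_process;

/* Equivalent of: mov eax, ExitProcess / mov eax, [eax] / call eax */
void call_via_slot(exit_fn *slot_addr, unsigned code) {
    exit_fn target = *slot_addr; /* dereference: read the code address */
    target(code);                /* call the code itself */
}

unsigned last_exit(void) { return last_exit_code; }
```

Calling `slot_addr` itself as if it were code would correspond to the `mov eax, ExitProcess / call eax` version from the question: it jumps into the data slot rather than into kernel32.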
For example, there's some code at this location containing the following:
;; Note how we use 'AllocConsole' as if it was a variable. 'AllocConsole', to
;; NASM, means the address of the AllocConsole "variable" ; but since the
;; pointer to the AllocConsole() Win32 API function is stored in that
;; variable, we need to call the address from that variable.
;; So it's "call the code at the address: whatever's at the address
;; AllocConsole" .
call [AllocConsole]
However, importing the DLL directly in user code is not the only way to get at the function. I'll explain why you're seeing both ways below.
The "normal" means of calling a DLL function is to mark it extern then import it from the DLL:
extern ExitProcess
import ExitProcess kernel32.dll
:
call [ExitProcess]
Because that sets up the symbol to be an indirect reference to the code, you need to call it indirectly.
After some searching, it appears there is code in the wild that uses the naked form:
call ExitProcess
From what I can tell, this all seems to use the alink linker, which links with the win32.lib library file. It's possible that this library provides the stub for calling the actual DLL code, something like:
import ExitProcessActual kernel32.dll ExitProcess
global ExitProcess
ExitProcess:
jmp [ExitProcessActual]
In nasm, this would import the address of ExitProcess from the DLL and call it ExitProcessActual, keeping in mind that this address is an indirect reference to the code, not the address of the code itself.
It would then export the ExitProcess entry point (the one in this LIB file, not the one in the DLL) so that others could use it.
Then someone could simply write:
extern ExitProcess
:
call ExitProcess
to exit the process; the library would jump to the actual DLL code.
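A C analogue of that import-library stub, with hypothetical names: `exit_process_actual` plays the part of the ExitProcessActual import slot, and `exit_process_stub` plays the `ExitProcess:` label whose body is `jmp [ExitProcessActual]`. Callers can then call the stub directly, at the cost of one extra indirect jump.

```c
/* Stand-in for the real code inside the DLL. */
static unsigned last_code;
static void dll_exit_process(unsigned code) { last_code = code; }

typedef void (*exit_fn)(unsigned);

/* ExitProcessActual: the import slot the loader fills in. */
static exit_fn exit_process_actual = dll_exit_process;

/* ExitProcess: the stub exported by the import library, i.e. jmp [ExitProcessActual]. */
void exit_process_stub(unsigned code) { exit_process_actual(code); }

unsigned stub_last_code(void) { return last_code; }
```

This is also why the alink documentation quoted below says that calling the `__imp_` symbol indirectly is faster: it skips the stub's extra jump.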
In fact, with a little more research, this is exactly what's happening. From the alink.txt file which comes with the alink download:
A sample import library for Win32 is included as win32.lib. All named exports in Kernel32, User32, GDI32, Shell32, ADVAPI32, version, winmm, lz32, commdlg and commctl are included.
Use:
alink -oPE file[.obj] win32.lib
to include it or specify
INCLUDELIB "win32"
in your source file.
This consists of a series of entries for import redirection - call MessageBoxA, and it jumps to [__imp_MessageBoxA], which is in the import table.
Thus calls to imports will run faster if call [__imp_importName] is used instead of call importName.
See test.asm, my sample program, which calls up a message box both ways:
includelib "win32.lib"
extrn MessageBoxA:near
extrn __imp_MessageBoxA:dword
codeseg
start:
push 0 ; OK button
push offset title1
push offset string1
push 0
call MessageBoxA
push 0 ; OK button
push offset title1
push offset string2
push 0
call large [large __imp_MessageBoxA]
(__imp_MessageBoxA is the symbol imported from the DLL, equivalent to my ExitProcessActual above).
