winddk: __iob_func redefinition - windows

I am trying to link a user space library into a windows kernel driver. It has a reference to __iob_func which is part of "libcmt.lib" (user space library).
I don't have access to this function in winddk. So, I am planning to define a stub for __iob_func which will try to emulate the same functionality as done in user space library.
Does anyone know what __iob_func does? I found the declaration of the function in the header files, but I am not sure exactly what it does.

__iob_func() returns a pointer to the array of FILE structures that holds stdin, stdout, stderr and any FILE objects opened through the C runtime library. See the MSVC runtime library source file _file.c.
If your user-space library code actually tries to do much with the C runtime, you'll probably run into a lot of headaches linking it into your kernel driver. Good luck.

Disassemble the following C code by compiling it with cl /Fa mycode.c:
fflush (stdin) ;
fflush (stdout) ;
fflush (stderr) ;
This is basically what the assembly file output with the /Fa switch on the c file will look like:
call ___iob_func ; invoke the c function __iob_func
push eax ; invoke fflush with 1 parameter
call _fflush
add esp, 04h ; realign the stack adding 4 bytes to
; the stack pointer (esp).
So, apparently, __iob_func returns a pointer to an array or structure of input/output buffer information, hence the name: i stands for input, o for output, b for buffer, followed by func.
That's just the fflush(stdin) call. fflush(stdout) repeats the same 4 lines; the only difference for stdout is in the second line:
push eax + 020h
So, apparently, each array member is 32 bytes, or 8 double words.
For stderr the assembler emitted push eax + 040h, i.e. eax + 64 bytes.
Microsoft Developer Network (MSDN) doesn't document the __iob_func function, but its declaration would probably be something like the following: lpReturn __iob_func ( void )
32-bit code usually returns the value of a function in the eax register. When the argument passed to a function is expressed as an offset from a register (e.g. eax + 020h), it usually means that it's referring to a structure or array of some type. So eax would be the starting address of the structure or array, eax + 020h would be the location in that structure where the information for stdout begins, and eax + 040h would be the location where stderr begins.
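The layout inferred from the disassembly can be sketched as a small stub. This is only a model under stated assumptions: `stub_file`, `stub_iob`, `stub_iob_func` and `stub_stream_offset` are hypothetical names, the field names are illustrative, and pointers are stored as `uint32_t` so that the record is 32 bytes on any host, matching the 32-bit build discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 32-bit FILE record: eight 4-byte fields.
   Field names are illustrative; only the 32-byte size comes from the
   disassembly above. */
typedef struct {
    uint32_t ptr, cnt, base, flag, file, charbuf, bufsiz, tmpfname;
} stub_file;

/* Backing storage for stdin, stdout and stderr, in that order. */
static stub_file stub_iob[3];

/* Stand-in for __iob_func(): returns the start of the array. */
static stub_file *stub_iob_func(void)
{
    return stub_iob;
}

/* Byte offset of stream `index` from the pointer the stub returns. */
static size_t stub_stream_offset(size_t index)
{
    return (size_t)((char *)&stub_iob_func()[index] - (char *)stub_iob_func());
}
```

With this model, `stub_stream_offset(1)` is 0x20 and `stub_stream_offset(2)` is 0x40, the same offsets the compiler added to eax for stdout and stderr. A stub like this only satisfies the linker; any library code that really reads or writes through these records would still need working stream I/O behind them.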
So basically, if you want to use __iob_func in your C program, you would have to prototype the function, and then perhaps create your own personal lib:
mylib.def
LIBRARY msvcrt.dll
EXPORTS
__iob_func
And then run lib on that file.
LIB /def:mylib.def /machine:x86
That should create a 32 bit library called mylib.lib which you can use to link into your program.

Related

How to see result of MASM directives such as PROC, .SETFRAME, .PUSHREG

Writing x64 Assembly code using MASM, we can use these directives to provide frame unwinding information. For example, from .SETFRAME definition:
These directives do not generate code; they only generate .xdata and .pdata.
Since these directives don't produce any code, I cannot see their effects in the Disassembly window, so I don't see any difference when I write an assembly function with or without them. How can I see the result of these directives - using dumpbin or something else?
How can I write code that tests this unwinding capability? For example, I could intentionally write assembly code that causes an exception. I want to see the difference in exception handling behavior when the function is written with or without these directives.
In my case caller is written in C++, and can use try-catch, SSE etc. - whatever is relevant for this situation.
Answering your question:
How can I see the result of these directives - using dumpbin or something else?
You can use dumpbin /UNWINDINFO out.exe to see the additions to the .pdata resulting from your use of .SETFRAME.
The output will look something like the following:
00000054 00001530 00001541 000C2070
Unwind version: 1
Unwind flags: None
Size of prologue: 0x04
Count of codes: 2
Frame register: rbp
Frame offset: 0x0
Unwind codes:
04: SET_FPREG, register=rbp, offset=0x00
01: PUSH_NONVOL, register=rbp
A bit of explanation of the output:
The second hex number in the output is the function address: 00001530
Unwind codes express what happens in the function prolog. In the example what happens is:
RBP is pushed to the stack
RBP is used as the frame pointer
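To make the mapping from raw .xdata bytes to those lines concrete, here is a minimal decoder sketch in C. It follows the UNWIND_INFO layout from Microsoft's x64 exception handling documentation, but it only handles the single-slot codes that appear in the two records shown here. The names `unwind_header`, `parse_unwind_header`, `print_unwind_codes` and `example_record` are hypothetical, and the example bytes are reconstructed by hand from the first record, not read from a real binary.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static const char *const reg_names[16] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8",  "r9",  "r10", "r11", "r12", "r13", "r14", "r15"
};

/* Fixed four-byte header at the start of an UNWIND_INFO record. */
typedef struct {
    unsigned version;       /* low 3 bits of byte 0 */
    unsigned flags;         /* high 5 bits of byte 0 (EHANDLER = 1, UHANDLER = 2) */
    unsigned prolog_size;   /* byte 1 */
    unsigned code_count;    /* byte 2, counted in 2-byte slots */
    unsigned frame_reg;     /* low 4 bits of byte 3, 0 = no frame register */
    unsigned frame_offset;  /* high 4 bits of byte 3, scaled by 16 */
} unwind_header;

/* Assumed bytes for the first record above (rebuilt by hand). */
static const uint8_t example_record[] = {
    0x01, 0x04, 0x02, 0x05,   /* version 1, prologue 4, 2 codes, frame reg rbp */
    0x04, 0x03,               /* offset 04: SET_FPREG */
    0x01, 0x50                /* offset 01: PUSH_NONVOL rbp */
};

static unwind_header parse_unwind_header(const uint8_t *p)
{
    unwind_header h;
    h.version      = p[0] & 0x07u;
    h.flags        = (unsigned)p[0] >> 3;
    h.prolog_size  = p[1];
    h.code_count   = p[2];
    h.frame_reg    = p[3] & 0x0Fu;
    h.frame_offset = ((unsigned)p[3] >> 4) * 16;
    return h;
}

/* Prints the unwind codes; stops at any code that may span extra slots. */
static void print_unwind_codes(const uint8_t *p)
{
    unwind_header h = parse_unwind_header(p);
    const uint8_t *code = p + 4;

    for (unsigned i = 0; i < h.code_count; i++, code += 2) {
        unsigned op = code[1] & 0x0Fu, info = (unsigned)code[1] >> 4;
        printf("%02X: ", (unsigned)code[0]);
        switch (op) {
        case 0:  printf("PUSH_NONVOL, register=%s\n", reg_names[info]); break;
        case 2:  printf("ALLOC_SMALL, size=0x%X\n", info * 8 + 8); break;
        case 3:  printf("SET_FPREG, register=%s, offset=0x%02X\n",
                        reg_names[h.frame_reg], h.frame_offset); break;
        default: printf("op %u (not decoded in this sketch)\n", op); return;
        }
    }
}
```

Calling `print_unwind_codes(example_record)` prints the same two unwind-code lines that dumpbin shows for the first record. The codes are stored in reverse order of the prologue, which is why SET_FPREG is listed before PUSH_NONVOL.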
Other functions may look like the following:
000000D8 000016D0 0000178A 000C20E4
Unwind version: 1
Unwind flags: EHANDLER UHANDLER
Size of prologue: 0x05
Count of codes: 2
Unwind codes:
05: ALLOC_SMALL, size=0x20
01: PUSH_NONVOL, register=rbx
Handler: 000A2A50
One of the main differences here is that this function has an exception handler. This is indicated by the Unwind flags: EHANDLER UHANDLER as well as the Handler: 000A2A50.
Probably your best bet is to have your asm function call another C++ function, and have your C++ function throw a C++ exception. Ideally have the code there depend on multiple values in call-preserved registers, so you can make sure they get restored. But just having unwinding find the right return addresses to get back into your caller requires correct metadata to indicate where that is relative to RSP, for any given RIP.
So create a situation where a C++ exception needs to unwind the stack through your asm function; if it works then you got the stack-unwind metadata directives correct. Specifically, try{}catch in the C++ caller, and throw in a C++ function you call from asm.
I think that thrower can be extern "C", so you can call it from asm without name mangling. Or call it via a function pointer, or just look at MSVC compiler output and copy the mangled name into asm.
Apparently Windows SEH uses the same mechanism as plain C++ exceptions, so you could potentially set up a catch for the exception delivered by the kernel in response to a memory fault from something like mov ds:[0], eax (null deref). You could put this at any point in your function to make sure the exception unwind info was correct about the stack state at every point, not just getting back into sync before a function-call.
https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170&viewFallbackFrom=vs-2019 has details about the metadata.
BTW, the non-Windows (e.g. GNU/Linux) equivalent of this metadata is DWARF .cfi directives which create a .eh_frame section.
I don't know equivalent details for Windows, but I do know they use similar metadata that makes it possible to unwind the stack without relying on RBP frame pointers. This lets compilers make optimized code that doesn't waste instructions on push rbp / mov rbp,rsp and leave in function prologues/epilogues, and frees up RBP for use as a general-purpose register. (Even more useful in 32-bit code where 7 instead of 6 registers besides the stack pointer is a much bigger deal than 15 vs. 14.)
The idea is that given a RIP, you can look up the offset from RSP to the return address on the stack, and the locations of any call-preserved registers. So you can restore them and continue unwinding into the parent using that return address.
The metadata indicates where each register was saved, relative to RSP or RBP, given the current RIP as a search key. In functions that use an RBP frame pointer, one piece of metadata can indicate that. (Other metadata for each push rbx / push r12 says which call-preserved regs were saved in which order).
In functions that don't use RBP as a frame pointer, every push / pop or sub/add RSP needs metadata for which RIP it happened at, so given a RIP, stack unwinding can see where the return address is, and where those saved call-preserved registers are. (Functions that use alloca or VLAs thus must use RBP as a frame pointer.)
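As a rough illustration of that lookup, the sketch below models the prologue of the second dumpbin record shown earlier (push rbx ending at offset 1, then a 0x20-byte stack allocation ending at offset 5) and computes how far the return address is from RSP for a given code offset. The names `prolog_step`, `steps` and `return_address_offset` are hypothetical, and the model ignores epilogues and frame-pointer functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One prologue step: once execution has reached code offset `end_offset`,
   RSP has dropped by a further `stack_bytes`. */
typedef struct {
    uint8_t  end_offset;
    uint32_t stack_bytes;
} prolog_step;

/* Steps assumed for the second record: push rbx, then sub rsp, 0x20. */
static const prolog_step steps[] = {
    { 0x01, 8 },
    { 0x05, 0x20 },
};

/* Distance from the current RSP to the return address when the function
   is interrupted at code offset `rip_offset` (relative to its start). */
static uint32_t return_address_offset(uint32_t rip_offset)
{
    uint32_t offset = 0;
    for (size_t i = 0; i < sizeof steps / sizeof steps[0]; i++) {
        if (rip_offset >= steps[i].end_offset)
            offset += steps[i].stack_bytes;
    }
    return offset;
}
```

At offset 0 nothing has been pushed yet, so the return address is at RSP itself; after the push it is 8 bytes up, and anywhere in the function body it is 0x28 bytes up. The saved rbx sits directly below the return address, which is how the unwinder knows where to restore it from.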
This is the big-picture problem that the metadata has to solve. There are a lot of details, and it's much easier to leave things up to a compiler!

How to call library functions in shellcode

I want to generate shellcode using the following NASM code:
global _start
extern exit
section .text
_start:
xor rcx, rcx
or rcx, 10
call exit
The problem here is that I cannot use this because the address of exit function cannot be hard coded. So, how do I go about using library functions without having to re-implement them using system calls?
One way that I can think of, is to retrieve the address of exit function in a pre-processing program using GetProcAddress and substitute it in the shellcode at the appropriate place.
However, this method does not generate shellcode that can be run as it is. I'm sure there must be a better way to do it.
I am not an expert on writing shellcode, but you could try to find the import address table (IAT) of your target program and use the stored function pointers to call windows functions.
Note that you would be limited to the functions the target program uses.
Also you would have to let your shellcode calculate IAT's position relative to the process's base address due to relocations. Of course you could rely on Windows not relocating, but this might result in errors in a few cases.
Another issue is that you would have to find the target process's base address from outside.
A totally different approach would be to use syscalls directly, but they are really hard to use, not to mention the danger of relying on them.
Information on PE file structure:
https://msdn.microsoft.com/en-us/library/ms809762.aspx

Printing a string in x86 Assembly on Mac OS X (NASM)

I'm doing x86 on Mac OS X with NASM. Copying an example and experimenting, I noticed that my print command needed four extra bytes pushed onto the stack after the other parameters, but I can't figure out why line five is necessary:
1 push dword len ;Length of message
2 push dword msg ;Message to write
3 push dword 1 ;STDOUT
4 mov eax,4 ;Command code for 'writing'
5 sub esp,4 ;<<< Effectively 'push' Without this the print breaks
6 int 0x80 ;SYSCALL
7 add esp,16 ;Functionally 'pop' everything off the stack
I am having trouble finding any documentation on this 'push the parameters to the stack' syntax that NASM/OS X seems to require. If anyone can point me to a resource for that in general that would most likely answer this question as well.
(Most of the credit goes to @Michael Petch's comment; I'm repeating it here so that it is an answer, and also in order to further clarify the reason for the additional four bytes on the stack.)
macOS is based on BSD, and, as per FreeBSD's documentation re system calls, by default the kernel uses the C calling conventions (which means arguments are pushed to the stack, from last to first), but assuming four extra bytes pushed to the stack, as "it is assumed the program will call a function that issues int 80h, rather than issuing int 80h directly".
That is, the kernel is not built for direct int 80h calls, but rather for code that looks like this:
kernel: ; subroutine to make system calls
int 80h
ret
.
.
.
; code that makes a system call
call kernel ; instead of invoking int 80h directly
Notice that call kernel would push the return address (used by the kernel subroutine's ret to return to calling code after the system call) onto the stack, accounting for four additional bytes – that's why it's necessary to manually push four bytes to the stack (any four bytes – their actual value doesn't matter, as it is ignored by the kernel – so one way to achieve this is sub esp, 4) when invoking int 80h directly.
The reason the kernel expects this behaviour – of calling a method which invokes the interrupt instead of invoking it directly – is that when writing code that can be run on multiple platforms it's then only needed to provide a different version of the kernel subroutine, rather than of every place where a system call is invoked (more details and examples in the link above).
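The same idea is what a C program gets for free: the libc `write()` wrapper is a function that issues the system call on the caller's behalf. A minimal sketch, assuming a POSIX system; `write_message` is a hypothetical helper name that plays the role of the `kernel` subroutine above.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Single choke point for the system call: every caller goes through this
   function, so only its body would need to change on another platform. */
static long write_message(const char *msg)
{
    return (long)write(STDOUT_FILENO, msg, strlen(msg));
}
```

Every call site just does `write_message("hello\n")`, and the return value is the number of bytes written (or -1 on error).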
Note: all the above is for 32-bit; for 64-bit the calling conventions are different – registers are used to pass the arguments rather than the stack (there's also a call convention for 32-bit which uses registers, but even then it's not the same registers), the syscall instruction is used instead of int 80h, and no extra four bytes (which, on 64-bit systems, would actually be eight bytes) need to be pushed.

Working with _RTL_USER_PROCESS_PARAMETERS

I am working with the PEB. I have managed to get inside _RTL_USER_PROCESS_PARAMETERS.
My aim: to find the memory addresses of argc and argv (and, if possible, their values too) using only a binary file (.exe file).
My current approach: access the command line string, which resides inside the struct _RTL_USER_PROCESS_PARAMETERS.
I managed to get inside it by embedding asm inside a C program:
mov eax, fs:[0x30]
mov [PEBaddress] , eax
mov ebx, [eax+0x10]
mov [ProcessParameters] , ebx
I got the offsets 0x30 and 0x10 by studying the binary under the Windows debugger.
Now, at offset 0x40 from the ProcessParameters address lies the CommandLine string, which I believe is a buffer that holds the values of argc and argv.
Problem: I want to read that buffer and get the addresses of argc and argv (the command line arguments passed to a process).
Can anyone provide code for reading the buffer (it is a Unicode string) and getting the required addresses?
Is there any other way of doing this job? (You can suggest that too, but don't give me the option of printing the addresses of argc and argv inside main.) I want static answers.
Windows does not pass argc and argv into a program. It passes the full literal command line, as a string. If the program in question even is a C program, then this parsing is done by the C runtime libraries embedded in that program.
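To give a feel for what that parsing step involves, here is a simplified sketch of splitting one command-line string into an argv-style array. It is not the CRT's actual algorithm: it only treats spaces and tabs as separators and double quotes as grouping, and it leaves out the backslash-escaping rules the real runtime applies. It also works on a plain `char` buffer rather than the Unicode string stored in the process parameters. `split_command_line` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits `cmdline` in place into at most `max_args` tokens and returns the
   number of tokens found (the would-be argc). Quotes are removed. */
static int split_command_line(char *cmdline, char **argv, int max_args)
{
    int argc = 0;
    char *src = cmdline;

    while (*src != '\0' && argc < max_args) {
        while (*src == ' ' || *src == '\t')     /* skip separators */
            src++;
        if (*src == '\0')
            break;

        char *dst = src;                        /* token is compacted in place */
        int in_quotes = 0;
        argv[argc++] = dst;

        while (*src != '\0' && (in_quotes || (*src != ' ' && *src != '\t'))) {
            if (*src == '"')
                in_quotes = !in_quotes;         /* quote toggles grouping */
            else
                *dst++ = *src;
            src++;
        }
        if (*src != '\0')
            src++;                              /* step past the separator */
        *dst = '\0';                            /* terminate the token */
    }
    return argc;
}
```

The point for the question is that argc and argv only come into existence when code like this runs inside the process, so their addresses cannot be read out of the .exe file or the PEB ahead of time.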

Simple "Hello-World", null-free shellcode for Windows needed

I would like to test a buffer-overflow by writing "Hello World" to console (using Windows XP 32-Bit). The shellcode needs to be null-free in order to be passed by "scanf" into the program I want to overflow. I've found plenty of assembly-tutorials for Linux, however none for Windows. Could someone please step me through this using NASM? Thxxx!
Assembly opcodes are the same, so the regular tricks to produce null-free shellcodes still apply, but the way to make system calls is different.
In Linux you make system calls with the "int 0x80" instruction, while on Windows you must use DLL libraries and do normal usermode calls to their exported functions.
For that reason, on Windows your shellcode must either:
Hardcode the Win32 API function addresses (most likely will only work on your machine)
Use a Win32 API resolver shellcode (works on every Windows version)
If you're just learning, for now it's probably easier to just hardcode the addresses you see in the debugger. To make the calls position independent you can load the addresses in registers. For example, a call to a function with 4 arguments:
PUSH 4 ; argument #4 to the function
PUSH 3 ; argument #3 to the function
PUSH 2 ; argument #2 to the function
PUSH 1 ; argument #1 to the function
MOV EAX, 0xDEADBEEF ; put the address of the function to call
CALL EAX
Note that the arguments are pushed in reverse order. After the CALL instruction EAX contains the return value, and the stack will be just like it was before (i.e. the function pops its own arguments). The ECX and EDX registers may contain garbage, so don't rely on them keeping their values after the call.
A direct CALL instruction won't work, because it encodes the target as a displacement relative to the instruction's own address, and the shellcode doesn't know in advance where it will be loaded.
To avoid zeros in the address itself try any of the null-free tricks for x86 shellcode, there are many out there but my favorite (albeit lengthy) is encoding the values using XOR instructions:
MOV EAX, 0xDEADBEEF ^ 0xFFFFFFFF ; your value xor'ed against an arbitrary mask
XOR EAX, 0xFFFFFFFF ; the arbitrary mask
You can also try NEG EAX or NOT EAX (sign inversion and bit flipping) to see if they work, it's much cheaper (two bytes each).
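The arithmetic behind the XOR trick can be checked ahead of time with a few lines of C. This is only a sketch of the bookkeeping, with hypothetical helper names (`has_null_byte`, `xor_encode`, `encoding_is_null_free`); the address 0x00401000 used below is a made-up example value.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if any of the four bytes of `value` is 0x00. */
static int has_null_byte(uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8) {
        if (((value >> shift) & 0xFFu) == 0)
            return 1;
    }
    return 0;
}

/* Immediate to load first, so that a following XOR with `mask`
   restores `target`. */
static uint32_t xor_encode(uint32_t target, uint32_t mask)
{
    return target ^ mask;
}

/* The pair is usable only if neither immediate contains a null byte. */
static int encoding_is_null_free(uint32_t target, uint32_t mask)
{
    return !has_null_byte(mask) && !has_null_byte(xor_encode(target, mask));
}
```

For example, 0x00401000 contains null bytes, but XORed with the mask 0xFFFFFFFF it becomes 0xFFBFEFFF, which does not. The all-ones mask stops working when the target itself contains an 0xFF byte, since that byte encodes to 0x00; in that case a different mask is needed.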
You can get help on the different API functions you can call here: http://msdn.microsoft.com
The most important ones you'll need are probably the following:
WinExec(): http://msdn.microsoft.com/en-us/library/ms687393(VS.85).aspx
LoadLibrary(): http://msdn.microsoft.com/en-us/library/windows/desktop/ms684175(v=vs.85).aspx
GetProcAddress(): http://msdn.microsoft.com/en-us/library/ms683212%28v=VS.85%29.aspx
The first launches a command, the next two are for loading DLL files and getting the addresses of its functions.
Here's a complete tutorial on writing Windows shellcodes: http://www.codeproject.com/Articles/325776/The-Art-of-Win32-Shellcoding
Assembly language is defined by your processor, and assembly syntax is defined by the assembler (hence AT&T and Intel syntax). The main difference used to be that DOS-era Windows programs ran in real mode (you call the actual interrupts to do stuff, and you can use all the memory accessible to your computer, instead of just your program's), while Linux runs in protected mode (you only have access to the memory in your program's little cubby, and you have to call int 0x80 and make calls to the kernel, instead of making calls to the hardware and BIOS). NT-based Windows such as XP also runs programs in protected mode, so there you go through the operating system as well. Anyway, the plain instruction-level parts of a hello-world program would be more or less the same between Linux and Windows, as long as the processors are compatible.
To get the shellcode from the program you've made, just load it into your target system's debugger (gdb for Linux, and debug for Windows). In debug, type d (or was it u? Anyway, it should say if you ask for help), and between the memory addresses and the instructions will be the opcodes.
Just copy them all over to your text editor into one string, and maybe make a program that translates them all into their ASCII values. Not sure how to do this in gdb though...
Anyway, to make it into a buffer overflow exploit, enter aaaaa... and keep adding a's until the program crashes from a buffer overflow error. Find exactly how many a's it takes to crash it. The error message should usually tell you which memory address was involved. If it says '9797[rest of original return address]' then you've got it (97 is the decimal ASCII code for 'a'; a debugger that prints hex shows it as 61). Now use your debugger to find out where this was: disassemble the program and look for where scanf was called. Set a breakpoint there, run, and examine the stack. Look for all those 97's and see where they end. Then remove the breakpoint and type exactly the number of a's you found it took. (If the error message was "buffer overflow at '97[rest of original return address]'" then remove that last a.) After the a's, put the address you found while examining the stack, and insert your shellcode. If all goes well, you should see your shellcode execute.
Happy hacking...
