Why won't CreateProcess work in assembly in Windows?

I have a C program here that invokes CreateProcess...
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
int main(int argc, char *argv[])
{
STARTUPINFO st;
ZeroMemory(&st, sizeof(STARTUPINFO));
st.cb = sizeof(STARTUPINFO);
PROCESS_INFORMATION pi;
CreateProcessA("C:\\WINDOWS\\system32\\cmd.exe",0,0,0,0,0,0,0,&st,&pi);
return 0;
}
Which runs fine, creating a shell within a shell.
I also have this code, written in GAS assembly via the MinGW compiler suite for Windows...
.extern _CreateProcessA@40
.def _CreateProcessA@40; .scl 2; .type 32; .endef
.extern _ExitProcess@4
.def _ExitProcess@4; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
push %ebp
movl %esp, %ebp
#PROCESS_INFORMATION...
subl $16, %esp
movl %esp, %eax
#STARTUPINFO...
subl $68, %esp
movl $68, (%esp)
movl %esp, %ebx
#Application name with path : C:\WINDOWS\system32\cmd.exe...
subl $29, %esp
xor %edx, %edx
movb %dx, 27(%esp)
movb $0x65, 26(%esp)
movw $0x7856, 24(%esp)
movl $0x2e646d63, 20(%esp)
movl $0x5c32336d, 16(%esp)
movl $0x65747379, 12(%esp)
movl $0x735c5357, 8(%esp)
movl $0x4f444e49, 4(%esp)
movl $0x575c3a43, (%esp)
movl %esp, %ecx
push %eax
push %ebx
push %edx
push %edx
push %edx
push %edx
push %edx
push %edx
push %edx
push %ecx
call _CreateProcessA@40
movl %ebp, %esp
pop %ebp
push %edx
call _ExitProcess@4
It compiles and links fine with:
as createProc.s -o createProc.o
ld createProc.o -o createProc.exe -lkernel32
When it runs though, and it does run, it executes without starting a second shell within a shell on the command line. What could be wrong?
Note: I'm building the string with the movl instructions for a reason, so please no suggestions saying that I should be using .data, .bss, or labels. Also note that I have already tried using escaped slashes (\\) within the string in the assembly program, to no avail; it actually crashes if escaped slashes are used.

About programming style
You should forget about mucking around with ESP.
The way to do it is to set up a stack frame at the start of your routine and use EBP to address the space thus created.
You have a typo in your path
You pass "c1\win...." as the path. That's not going to work.
You should double-check the code against an ASCII table, or inspect the parameters in a debugger at the point of the API call.
Also I have no idea why you need 29 bytes to store the string. It fits in 28 chars as far as I can tell.
Working code using a stack frame
Here's code that works using a stack frame the way it is supposed to be done.
//Set up stack frame.
00418200 55 push ebp
00418201 8BEC mov ebp,esp
00418203 83C490 add esp,-$70
//Zero StartupInfoA
00418206 57 push edi
00418207 8D45A0 lea eax,[ebp-$60]
0041820A 8BF8 mov edi,eax
0041820C 33C0 xor eax,eax
0041820E B911000000 mov ecx,$00000011
00418213 F3AB rep stosd
//st.cb = SizeOf(st)
00418215 C745A044000000 mov [ebp-$60],$00000044
//Set the string: path = 'c:\windows\system32\cmd.exe'; 28 chars including trailing 0.
0041821C C745E4433A5C77 mov [ebp-$1c],$775c3a43 //c:\w
00418223 C745E8696E646F mov [ebp-$18],$6f646e69 //indo
0041822A C745EC77735C73 mov [ebp-$14],$735c7377 //ws\s
00418231 C745F079737465 mov [ebp-$10],$65747379 //yste
00418238 C745F46D33325C mov [ebp-$0c],$5c32336d //m32\
0041823F C745F8636D642E mov [ebp-$08],$2e646d63 //cmd.
00418246 C745FC65786500 mov [ebp-$04],$00657865 //exe-
//Set up parameters for call
0041824D 8D4590 lea eax,[ebp-$70] //ProcessInfo
00418250 50 push eax
00418251 8D45A0 lea eax,[ebp-$60] //StartupInfoA
00418254 50 push eax
00418255 6A00 push $00
00418257 6A00 push $00
00418259 6A00 push $00
0041825B 6A00 push $00
0041825D 6A00 push $00
0041825F 6A00 push $00
00418261 6A00 push $00
00418263 8D45E4 lea eax,[ebp-$1c] //Path
00418266 50 push eax
//Call
00418267 E80823FFFF call CreateProcessA
//Clean up the stackframe
0041826C 5F pop edi
0041826D 8BE5 mov esp,ebp
0041826F 5D pop ebp
About messing with ESP
If you set ESP to an unaligned address it will seriously degrade performance.

@HarryJohnson figured it out: all that was needed was to zero out the STARTUPINFO structure.
.extern _CreateProcessA@40
.def _CreateProcessA@40; .scl 2; .type 32; .endef
.extern _ExitProcess@4
.def _ExitProcess@4; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
push %ebp
movl %esp, %ebp
xor %edx, %edx
#PROCESS_INFORMATION...
subl $16, %esp
movl %esp, %eax
#STARTUPINFO...
subl $68, %esp
movl %edx, 64(%esp)
movl %edx, 60(%esp)
movl %edx, 56(%esp)
movl %edx, 52(%esp)
movl %edx, 48(%esp)
movl %edx, 44(%esp)
movl %edx, 40(%esp)
movl %edx, 36(%esp)
movl %edx, 32(%esp)
movl %edx, 28(%esp)
movl %edx, 24(%esp)
movl %edx, 20(%esp)
movl %edx, 16(%esp)
movl %edx, 12(%esp)
movl %edx, 8(%esp)
movl %edx, 4(%esp)
movl %edx, (%esp)
movb $68, (%esp)
movl %esp, %ebx
#Application name (C:\WINDOWS\system32\cmd.exe)...
subl $28, %esp
movb %dl, 27(%esp)
movb $0x65, 26(%esp)
movw $0x7865, 24(%esp)
movl $0x2e646d63, 20(%esp)
movl $0x5c32336d, 16(%esp)
movl $0x65747379, 12(%esp)
movl $0x735c5357, 8(%esp)
movl $0x4f444e49, 4(%esp)
movl $0x575c3a43, (%esp)
movl %esp, %ecx
push %eax
push %ebx
push %edx
push %edx
push %edx
push %edx
push %edx
push %edx
push %edx
push %ecx
call _CreateProcessA@40
mov %ebp, %esp
pop %ebp
push %edx
call _ExitProcess@4

Related

Correct usage of the RIP related addressing

I found an example in assembly that finds the maximum number in an array named data_items, but the example was for x86, and I tried to adapt it to x64 because 32-bit absolute addressing is not supported on a 64-bit system.
To be short there are three actions:
lea data_items(%rip), %rdi #(1) Obtaining data_items address
add $4, %rdi #(2) Incrementing the pointer to 4 to read a next item
movl (%rdi), %eax #(3) Reading data at %rdi to %eax
The main questions:
Is this a correct way to address the data? Can it produce errors after the code is relocated?
If the %rip register constantly grows, why does lea data_items(%rip), %rdi load the correct memory address? Maybe an offset from %rip has a special meaning rather than "data_items + %rip"?
Full adapted code here:
.section __DATA,__data
data_items:
.long 3,67,34,222,45,75,54,34,44,33,22,11,66,0
.section __TEXT,__text
.globl _main
_main:
lea data_items(%rip), %rdi #(1)
movl (%rdi), %eax
movl %eax, %ebx
start_loop:
cmpl $0, %eax
je loop_exit
add $4, %rdi #(2)
movl (%rdi), %eax #(3)
cmpl %ebx, %eax
jle start_loop
movl %eax, %ebx
jmp start_loop
loop_exit:
mov $0x2000001, %rax
mov $0, %rdi
syscall

C++11 memory ordering support in GCC

GCC does not seem to support the different memory ordering settings as it generates the same code for relaxed, acquire and sequentially consistent.
I tried the following code with GCC 7.4 and 9.1:
#include <thread>
#include <atomic>
using namespace std;
atomic<int> z(0);
void Thr1()
{
z.store(1,memory_order_relaxed);
}
void Thr2()
{
z.store(2,memory_order_release);
}
void Thr3()
{
z.store(3);
}
//------------------------------------------
int main (int argc, char **argv)
{
thread t1(Thr1);
thread t2(Thr2);
thread t3(Thr3);
t1.join();
t2.join();
t3.join();
return 0;
}
When I generate assembly for the above, I get the following for each of the three functions:
_Z4Thr1v:
.LFB2992:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl $1, -12(%rbp)
movl $0, -8(%rbp)
movl -8(%rbp), %eax
movl $65535, %esi
movl %eax, %edi
call _ZStanSt12memory_orderSt23__memory_order_modifier
movl %eax, -4(%rbp)
movl -12(%rbp), %edx
leaq z(%rip), %rax
movl %edx, (%rax)
mfence
nop
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE2992:
.size _Z4Thr1v, .-_Z4Thr1v
.globl _Z4Thr2v
.type _Z4Thr2v, @function
_Z4Thr2v:
.LFB2993:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl $2, -12(%rbp)
movl $3, -8(%rbp)
movl -8(%rbp), %eax
movl $65535, %esi
movl %eax, %edi
call _ZStanSt12memory_orderSt23__memory_order_modifier
movl %eax, -4(%rbp)
movl -12(%rbp), %edx
leaq z(%rip), %rax
movl %edx, (%rax)
mfence
nop
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE2993:
.size _Z4Thr2v, .-_Z4Thr2v
.globl _Z4Thr3v
.type _Z4Thr3v, @function
_Z4Thr3v:
.LFB2994:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl $3, -12(%rbp)
movl $5, -8(%rbp)
movl -8(%rbp), %eax
movl $65535, %esi
movl %eax, %edi
call _ZStanSt12memory_orderSt23__memory_order_modifier
movl %eax, -4(%rbp)
movl -12(%rbp), %edx
leaq z(%rip), %rax
movl %edx, (%rax)
mfence
nop
leave
.cfi_def_cfa 7, 8
ret
where all of the code ends in a memory fence instruction.
If you are interested in performance, then looking at non-optimised machine code is not going to get you anywhere. Here's what gcc -O2 generates:
Thr1():
mov DWORD PTR z[rip], 1
ret
Thr2():
mov DWORD PTR z[rip], 2
ret
Thr3():
mov DWORD PTR z[rip], 3
mfence
ret
As you can see, only sequentially consistent memory order requires mfence.

Storing keyboard Input in x64 assembly (Mac OS/X)

I have been trying for some time now to read a number from the keyboard and compare it with a value on the stack. If it matches, the program should print "Hello World!"; if it does not, it should print "Nope!". However, what happens now is that no matter the input, the jne branch is taken, "Nope!" is printed, and the program segfaults. Perhaps one of you could lend a hand.
.section __DATA,__data
str:
.asciz "Hello world!\n"
sto:
.asciz "Nope!\n"
.section __TEXT,__text
.globl _main
_main:
push %rbp
mov %rsp,%rbp
sub $0x20, %rsp
movl $0x0, -0x4(%rbp)
movl $0x2, -0x8(%rbp)
movl $0x2000003, %eax
mov $0, %edi
subq $0x4, %rsi
movq %rsi, %rcx
syscall
cmp -0x8(%rbp), %edx
je L1
jne L2
xor %rbx, %rbx
xor %rax, %rax
movl $0x2000001, %eax
syscall
L1:
xor %rax, %rax
movl $0x2000004, %eax
movl $1, %edi
movq str@GOTPCREL(%rip), %rsi
movq $14, %rdx
syscall
ret
L2:
xor %eax, %eax
movl $0x2000004, %eax
movl $1, %edi
movq sto@GOTPCREL(%rip), %rsi
movq $6, %rdx
syscall
ret
I would start with this OS/X Syscall tutorial (The 64-bit part in your case). It is written for NASM syntax but the important information is the text and links for the SYSCALL calling convention. The SYSCALL table is found on this Apple webpage. Additional information on the standard calling convention for 64-bit OS/X can be found in the System V 64-bit ABI.
Of importance for SYSCALL convention:
arguments are passed in order via these registers rdi, rsi, rdx, r10, r8 and r9
syscall number in the rax register
the call is done via the syscall instruction
what OS X contributes to the mix is that you have to add 0x2000000 to the syscall number (still have to figure out why)
You have many issues with your sys_read system call. The SYSCALL table says this:
3 AUE_NULL ALL { user_ssize_t read(int fd, user_addr_t cbuf, user_size_t nbyte); }
So given the calling convention, int fd is in RDI, user_addr_t cbuf (pointer to character buffer to hold return data) is in RSI, and user_size_t nbyte (maximum bytes buffer can contain) is in RDX.
Your program seg faulted on the ret because you didn't have proper function epilogue to match the function prologue at the top:
push %rbp #
mov %rsp,%rbp # Function prologue
You need to do the reverse at the bottom, set the result code in RAX and then do the ret. Something like:
mov %rbp,%rsp # \ Function epilogue
pop %rbp # /
xor %eax, %eax # Return value = 0
ret # Return to C runtime which will exit
# gracefully and return to OS
I did other minor cleanup, but tried to keep the structure of the code similar. You will have to learn more assembly to better understand the code that sets up RSI with the address for the sys_read SYSCALL. You should try to find a good tutorial/book on x86-64 assembly language programming in general. Writing a primer on that subject is beyond the scope of this answer.
Code that might be closer to what you were looking for that takes the above into account:
.section __DATA,__data
str:
.asciz "Hello world!\n"
sto:
.asciz "Nope!\n"
.section __TEXT,__text
.globl _main
_main:
push %rbp #
mov %rsp,%rbp # Function prologue
sub $0x20, %rsp # Allocate 32 bytes of space on stack
# for temp local variables
movl $0x2, -4(%rbp) # Number for comparison
# 16-bytes from -20(%rbp) to -5(%rbp)
# for char input buffer
movl $0x2000003, %eax
mov $0, %edi # 0 for STDIN
lea -20(%rbp), %rsi # Address of temporary buffer on stack
mov $16, %edx # Read 16 character maximum
syscall
movb (%rsi), %r10b # RSI = pointer to buffer on stack
# get first byte
subb $48, %r10b # Convert first character to number 0-9
cmpb -4(%rbp), %r10b # Did we find magic number (2)?
jne L2 # If No exit with error message
L1: # If the magic number matched print
# Hello World
xor %rax, %rax
movl $0x2000004, %eax
movl $1, %edi
movq str@GOTPCREL(%rip), %rsi
movq $14, %rdx
syscall
jmp L0 # Jump to exit code
L2: # Print "Nope"
xor %eax, %eax
movl $0x2000004, %eax
movl $1, %edi
movq sto@GOTPCREL(%rip), %rsi
movq $6, %rdx
syscall
L0: # Code to exit main
mov %rbp,%rsp # \ Function epilogue
pop %rbp # /
xor %eax, %eax # Return value = 0
ret # Return to C runtime which will exit
# gracefully and return to OS

Load floating-point number from pointer to float and push on stack

This is a homework task. I've got a C program that calls a function calc(int, float*, float*, float*, float*) implemented with NASM. I want to do floating-point division with the data passed from C, but first I wanted to check if I access the data correctly.
This is an excerpt from the C program:
printf("read.c: F data1[0]=%f\n", data1[0]);
printf("read.c: X data1[0]=%X\n", *(int*)(&data1[0]));
calc(nlines, data1, data2, result1, result2);
For testing, I wanted to print out exactly the same from the assembler code, but whatever I tried, it wouldn't give me the right results. To be precise, outputting the %X format gives the same result, but the %f format gives some incredibly huge number.
global calc
extern printf
; -----------------------------------------------------------------------
; extern void calc(int nlines, float* data1, float* data2,
; float* result1, float* result2)
; -----------------------------------------------------------------------
calc:
section .data
.strf db "calc.asm: F data1[0]=%f", 10, 0
.strx db "calc.asm: X data1[0]=%X", 10, 0
section .text
enter 0, 0
; Move the value of float* data1 into ecx.
mov ecx, [esp + 12]
; Move the contents of data1[0] into esi.
mov esi, [ecx]
push esi
push .strf
call printf
add esp, 8
push esi
push .strx
call printf
add esp, 8
leave
ret
Outputs
read.c: F data1[0]=20.961977
read.c: X data1[0]=41A7B221
calc.asm: F data1[0]=-8796958457989122902187458235483374032941932827208012972482327255932202912296419757153331437662235555722313731094096197990916443553479942683040096290755684437514827018615169352974748429901549205109479495668937369584705401541113350145698235773041651907978442730240007381959397006695721667307435228446926569472.000000
calc.asm: X data1[0]=41A7B221
I've also looked into fld, but I couldn't find out how I can push the loaded value on the stack. This didn't work:
; Move float* data1 into ecx
mov ecx, [esp + 12]
; Load the floating point number into esi.
fld dword [ecx]
fst esi
How to do it right?
I've stripped down read.c to this code
#include <stdio.h>
#include <stdlib.h>
#define MAXLINES 1024
extern void calc(int, float*, float*, float*, float*);
int main(int argc, char** argv)
{
int nlines;
float* data1 = malloc(sizeof(float)*MAXLINES);
float*data2, *results1, *results2;
printf("read.c: F data1[0]=%f\n", data1[0]);
printf("read.c: X data1[0]=%X\n", *(int*)(&data1[0]));
calc(nlines, data1, data2, results1, results2);
return 0;
}
and this is the assembler output:
.file "test.c"
.section .rodata
.LC0:
.string "read.c: F data1[0]=%f\n"
.LC1:
.string "read.c: X data1[0]=%X\n"
.text
.globl main
.type main, @function
main:
.LFB2:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $64, %esp
movl $4096, (%esp)
call malloc
movl %eax, 44(%esp)
movl 44(%esp), %eax
flds (%eax)
fstpl 4(%esp)
movl $.LC0, (%esp)
call printf
movl 44(%esp), %eax
movl (%eax), %eax
movl %eax, 4(%esp)
movl $.LC1, (%esp)
call printf
movl 60(%esp), %eax
movl %eax, 16(%esp)
movl 56(%esp), %eax
movl %eax, 12(%esp)
movl 52(%esp), %eax
movl %eax, 8(%esp)
movl 44(%esp), %eax
movl %eax, 4(%esp)
movl 48(%esp), %eax
movl %eax, (%esp)
call calc
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE2:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4"
.section .note.GNU-stack,"",@progbits
.LC1:
.string "read.c: F data1[0]=%f\n"
.LC2:
.string "read.c: X data1[0]=%X\n"
.text
.globl main
.type main, @function
main:
.LFB4:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $64, %esp
movl 44(%esp), %eax
flds (%eax)
fstpl 4(%esp)
movl $.LC1, (%esp)
call printf
movl 44(%esp), %eax
movl (%eax), %eax
movl %eax, 4(%esp)
movl $.LC2, (%esp)
call printf
movl 60(%esp), %eax
movl %eax, 16(%esp)
movl 56(%esp), %eax
movl %eax, 12(%esp)
movl 52(%esp), %eax
movl %eax, 8(%esp)
movl 44(%esp), %eax
movl %eax, 4(%esp)
movl 48(%esp), %eax
movl %eax, (%esp)
call calc
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE4:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4"
.section .note.GNU-stack,"",@progbits
Ok, I've now had a chance to test this and verify that what I suggested in my comment works. Here's my modified version of the assembly code, with some comments to explain the things I've added/changed:
global _calc
extern _printf
; -----------------------------------------------------------------------
; extern void calc(int nlines, float* data1, float* data2,
; float* result1, float* result2)
; -----------------------------------------------------------------------
_calc:
section .data
.strf db "calc.asm: F data1[0]=%f", 10, 0
.strx db "calc.asm: X data1[0]=%X", 10, 0
section .text
enter 0, 0
; Move the value of float* data1 into ecx.
mov ecx, [esp + 12]
; Move the contents of data1[0] into esi.
mov esi, [ecx]
fld dword [ecx] ; Load a single-precision float onto the FP stack.
sub esp,8 ; Make room for a double on the stack.
fstp qword [esp] ; Store the top of the FP stack on the regular stack as
; a double, and pop it off the FP stack.
push .strf
call _printf
add esp, 12 ; 12 == sizeof(char*) + sizeof(double)
push esi
push .strx
call _printf
add esp, 8
leave
ret

Can you convert this inline asm into non-inline one?

I came across this inline asm. I am not sure how it should look without this syntax... Could someone show it to me?
__asm__ volatile ("lock\n\tincl %0"
:"=m"(llvm_cbe_tmp__29)
:"m"(*(llvm_cbe_tmp__29)):"cc");
lock
incl llvm_cbe_tmp__29
However, because the operand is specified abstractly, the compiler will generate the code needed to reference it, even if that means a load and store. As a result it is possible that more than two instructions or an addressing mode will be added.
Using gcc -S on this:
int main()
{
int *p;
asm volatile ("lock\n\tincl %0":"=m"(p):"m"(*(p)):"cc");
}
gives
.type main, @function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
movl -8(%ebp), %eax
#APP
# 4 "asm.c" 1
lock
incl -8(%ebp)
# 0 "" 2
#NO_APP
addl $20, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
