In 32-bit assembly, I can access the ProcessEnvironmentBlock of the TEB structure. From there I access Ldr of the PEB structure.
This technique is described here: http://en.wikipedia.org/wiki/Win32_Thread_Information_Block
The code to do this in 32-bit assembly is:
void* ptr = NULL;
__asm
{
mov eax, FS:[0x18]
mov eax, [eax + 0x30] //Offset of PEB
mov eax, [eax + 0x0C] //Offset of LDR in PEB structure
mov ptr, eax //Store the Ldr pointer in the C++ variable
};
std::cout<<ptr<<"\n";
The TEB structure can be seen here: http://msdn.moonsols.com/win7rtm_x64/TEB.html
and the PEB structure can be seen here: http://msdn.moonsols.com/win7rtm_x64/PEB.html
The above works for 32-bit code.
However, I want to also write code to work on x64 machines. I viewed the x64 version of the structures and wrote:
__asm
{
mov rax, GS:[0x30]
mov rax, [rax + 0x60]
mov rax, [rax + 0x18]
mov ptr, rax
};
This can be done using Winnt.h NtCurrentTeb() but I want to use assembly.
However, it fails to work at all. Any ideas why?
If you are using Visual Studio, you can use intrinsics!
[x86]
__readfsbyte
__readfsdword
__readfsqword
__readfsword
[x64]
__readgsbyte
__readgsdword
__readgsqword
__readgsword
Good luck~
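As a rough sketch of how these intrinsics could replace the inline assembly from the question: the offsets below are the ones quoted in the question, while the constant names and the ReadLdrPointer helper are made up for illustration. On anything other than MSVC targeting x86/x64 the helper just returns nullptr, so treat this as a starting point rather than a finished implementation.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#include <intrin.h>   // __readfsdword / __readgsqword
#endif

// Offsets as quoted in the question (the names are illustrative only).
constexpr unsigned long kPebOffsetInTeb32 = 0x30;  // TEB -> PEB pointer, x86
constexpr unsigned long kLdrOffsetInPeb32 = 0x0C;  // PEB -> Ldr pointer, x86
constexpr unsigned long kPebOffsetInTeb64 = 0x60;  // TEB -> PEB pointer, x64
constexpr unsigned long kLdrOffsetInPeb64 = 0x18;  // PEB -> Ldr pointer, x64

// Hypothetical helper: returns PEB.Ldr, or nullptr when not built with MSVC
// for x86 or x64.
inline void* ReadLdrPointer() {
#if defined(_MSC_VER) && defined(_M_X64)
    // On x64 the TEB is reached through GS; read the PEB pointer out of it.
    auto peb = reinterpret_cast<const unsigned char*>(__readgsqword(kPebOffsetInTeb64));
    assert(peb != nullptr);
    return *reinterpret_cast<void* const*>(peb + kLdrOffsetInPeb64);
#elif defined(_MSC_VER) && defined(_M_IX86)
    // On x86 the TEB is reached through FS.
    auto peb = reinterpret_cast<const unsigned char*>(__readfsdword(kPebOffsetInTeb32));
    assert(peb != nullptr);
    return *reinterpret_cast<void* const*>(peb + kLdrOffsetInPeb32);
#else
    return nullptr;
#endif
}
```

Usage would then be something like void* ptr = ReadLdrPointer(); followed by the same std::cout line as in the question.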
Visual Studio doesn't allow inline assembler in x64 C++; the __asm keyword isn't supported there. You can write your assembler in a separate file and link it in, or you can use intrinsics to do what you need.
In order to achieve this, you must create an .asm file in Visual Studio as described here.
To access the TEB/PEB in x64 compiled with Visual Studio you can use the following code:
GetTEBAsm64 proc
mov rax, qword ptr gs:[00000030h] ; TEB self pointer (linear address of the TEB)
ret
GetTEBAsm64 endp
GetPEBAsm64 proc
mov rax, qword ptr gs:[00000060h] ; PEB pointer stored in the TEB
ret
GetPEBAsm64 endp
And then declare them in your C++ code and simply use them:
extern "C" PTEB GetTEBAsm64();
extern "C" PPEB GetPEBAsm64();
PTEB pTeb = GetTEBAsm64();
PPEB pPeb = GetPEBAsm64();
PRO TIP:
You can define your own "versions" of these structs (PTEB/PPEB), to include more or less information in them.
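A minimal sketch of what such trimmed-down "versions" might look like for x64. The struct names (MyPeb64, MyTeb64), the Reserved padding field and the LdrFromPeb helper are hypothetical; only the fields up to the ones discussed here are declared, and pointer-sized fields are written as 64-bit integers so the layout can be checked on any compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical cut-down x64 PEB: just enough fields to reach Ldr.
struct MyPeb64 {
    std::uint8_t  InheritedAddressSpace;      // 0x00
    std::uint8_t  ReadImageFileExecOptions;   // 0x01
    std::uint8_t  BeingDebugged;              // 0x02
    std::uint8_t  BitField;                   // 0x03
    std::uint8_t  Reserved[4];                // 0x04 alignment padding
    std::uint64_t Mutant;                     // 0x08
    std::uint64_t ImageBaseAddress;           // 0x10
    std::uint64_t Ldr;                        // 0x18
};

// Hypothetical cut-down x64 TEB: just enough fields to reach the PEB pointer.
struct MyTeb64 {
    std::uint64_t NtTib[7];                   // 0x00 NT_TIB; the last slot (0x30) is Self
    std::uint64_t EnvironmentPointer;         // 0x38
    std::uint64_t ClientId[2];                // 0x40
    std::uint64_t ActiveRpcHandle;            // 0x50
    std::uint64_t ThreadLocalStoragePointer;  // 0x58
    std::uint64_t ProcessEnvironmentBlock;    // 0x60
};

// Compile-time checks that the offsets match the ones used in the assembly.
static_assert(offsetof(MyTeb64, ProcessEnvironmentBlock) == 0x60, "PEB pointer must sit at TEB+0x60");
static_assert(offsetof(MyPeb64, Ldr) == 0x18, "Ldr must sit at PEB+0x18");

// Reads the Ldr field through the custom definition.
inline std::uint64_t LdrFromPeb(const MyPeb64* peb) {
    assert(peb != nullptr);
    return peb->Ldr;
}
```

The static_assert lines are the useful part: if a field is added or removed by mistake, the build fails instead of silently reading the wrong offset.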
Before calling a member function of an object, the address of the object will be moved to ECX.
Inside the function, ECX will be moved to dword ptr [this], what does this mean?
C++ Source
#include <iostream>
class CAdd
{
public:
CAdd(int x, int y) : _x(x), _y(y) {}
int Do() { return _x + _y; }
private:
int _x;
int _y;
};
int main()
{
CAdd ca(1, 2);
int n = ca.Do();
std::cout << n << std::endl;
}
Disassembly
...
CAdd ca(1, 2);
00A87B4F push 2
00A87B51 push 1
00A87B53 lea ecx,[ca] ; the instance address
00A87B56 call CAdd::CAdd (0A6BA32h)
int Do() { return _x + _y; }
00A7FFB0 push ebp
00A7FFB1 mov ebp,esp
00A7FFB3 sub esp,0CCh
00A7FFB9 push ebx
00A7FFBA push esi
00A7FFBB push edi
00A7FFBC push ecx
00A7FFBD lea edi,[ebp-0Ch]
00A7FFC0 mov ecx,3
00A7FFC5 mov eax,0CCCCCCCCh
00A7FFCA rep stos dword ptr es:[edi]
00A7FFCC pop ecx
00A7FFCD mov dword ptr [this],ecx ; ========= QUESTION HERE!!! =========
00A7FFD0 mov ecx,offset _CC7F790E_main@cpp (0BC51F2h)
00A7FFD5 call @__CheckForDebuggerJustMyCode@4 (0A6AC36h)
00A7FFDA mov eax,dword ptr [this] ; ========= AND HERE!!! =========
00A7FFDD mov eax,dword ptr [eax]
00A7FFDF mov ecx,dword ptr [this]
00A7FFE2 add eax,dword ptr [ecx+4]
00A7FFE5 pop edi
00A7FFE6 pop esi
00A7FFE7 pop ebx
00A7FFE8 add esp,0CCh
00A7FFEE cmp ebp,esp
00A7FFF0 call __RTC_CheckEsp (0A69561h)
00A7FFF5 mov esp,ebp
00A7FFF7 pop ebp
00A7FFF8 ret
MSVC's asm output itself (https://godbolt.org/z/h44rW3Mxh) uses _this$[ebp] with _this$ = -4, in a debug build like this which wastes instructions storing/reloading incoming register args.
_this$ = -4
int CAdd::Do(void) PROC ; CAdd::Do, COMDAT
push ebp
mov ebp, esp
push ecx ; dummy push instead of sub to reserve 4 bytes
mov DWORD PTR _this$[ebp], ecx
mov eax, DWORD PTR _this$[ebp]
...
This is just spilling the register arg to a local on the stack with that name. (The default options for the MSVC version I used on Godbolt, x86 MSVC 19.29.30136, don't include the @__CheckForDebuggerJustMyCode@4 call or the runtime-check stack poisoning (rep stos) in Do(), but the usage of this is still there.)
Amusingly, the push ecx it uses (as a micro-optimization) instead of sub esp, 4 to reserve stack space already stored ECX, making the mov store redundant.
(AFAIK, no compilers actually use push to both initialize and make space for locals, but it would be an optimization for cases like this; see "What C/C++ compiler can use push pop instructions for creating local variables, instead of just increasing esp once?". MSVC is just using the push for its effect on ESP, not caring what it stores, even if you enabled optimization, in a function where it did still need to spill the value instead of keeping it in a register.)
Your disassembler apparently folds the frame pointer (EBP +) into what it's defining as a this symbol / macro, which makes it more confusing if you don't look around at other lines to find out how it defines that text macro or whatever it is.
What disassembler are you using? The one built-in to Visual Studio's debugger?
I guess that would make sense that it's using C local var names this way, even though it looks super weird to people familiar with asm. (Because only static storage is addressable with a mode like [symbol] not involving any registers.)
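To make the role of this more concrete, here is a small sketch of roughly what CAdd::Do() amounts to once the hidden argument is written out by hand. CAddLayout and CAdd_Do are hypothetical names that only mirror the class from the question; the real compiler passes the pointer in ECX instead of as a visible parameter.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the two int members of CAdd, in declaration order.
struct CAddLayout {
    int x;  // read as dword ptr [eax]   in the disassembly (offset 0)
    int y;  // read as dword ptr [ecx+4] in the disassembly (offset 4)
};

// Free-function view of the member function: `self` plays the part of `this`.
inline int CAdd_Do(const CAddLayout* self) {
    assert(self != nullptr);
    return self->x + self->y;
}
```

Calling CAdd_Do(&obj) on an object holding 1 and 2 gives the same sum as ca.Do() in the question; the spill to [this] is just the debug build parking that pointer in a stack slot before using it.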
I'm trying to learn more about reading PEB / TEB (and segment registers in general).
My main source is: https://en.wikipedia.org/wiki/Win32_Thread_Information_Block
I'm trying to figure out what's the actual address of any address in gs.
For instance, I see that the PEB in x64 is in gs:[60h]
so
mov rax, gs:[60h]
Should give me the address of the PEB. I'm trying to figure out where it is stored / what's the value of gs.
push gs
pop rax ; rax = 43 (Memory is not defined @ 43)
; mov rax, gs:0h ; Error : Operand must be memory expression
mov rax, gs ; rax = 43
lea rax, gs:[0] ; rax = 0 ?? should have been 43, or real address?
lea rax, gs:[30h] ; rax = 48 -> why is that?
mov rax, gs:[30h] ; getting TEB linear address
mov rbx, [rax+60h] ; getting PEB address using TEB linear address
mov rcx, gs:[60h] ; rbx == rcx
My main questions are:
1) What do the values 43 and 48 represent?
2) Is there a way to find and access the memory location of gs without gs:[30h]? Neither lea nor gs:0 worked.
3) I tried assume nothing, but that didn't compile in my environment.
Environment: Windows 7, Visual Studio 2015
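One way to read the two numbers, as far as I can tell: 43 is 0x2B, the selector held in the GS register (a descriptor-table index plus a few flag bits), not a linear address, which is why there is no memory at that location. 48 is simply 30h: lea only computes the offset part of the address and does not add the segment base, so lea rax, gs:[30h] yields 30h. The sketch below decodes a selector into its fields; the Selector struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Fields packed into a 16-bit x86 segment selector.
struct Selector {
    unsigned index;  // bits 15..3: entry number in the descriptor table
    unsigned table;  // bit 2: 0 = GDT, 1 = LDT
    unsigned rpl;    // bits 1..0: requested privilege level
};

inline Selector DecodeSelector(std::uint16_t raw) {
    Selector s;
    s.index = static_cast<unsigned>(raw >> 3);
    s.table = static_cast<unsigned>((raw >> 2) & 1);
    s.rpl   = static_cast<unsigned>(raw & 3);
    return s;
}

// Self-check using the value seen in the question (43 decimal == 0x2B).
inline void CheckValueFromQuestion() {
    const Selector gs = DecodeSelector(43);
    assert(gs.index == 5);  // descriptor number 5
    assert(gs.table == 0);  // taken from the GDT
    assert(gs.rpl == 3);    // ring 3, i.e. user mode
}
```

In 64-bit mode the GS base address is not derived from this selector value in any way user code can compute, so reading the self pointer with mov rax, gs:[30h] remains the usual way to obtain the TEB's linear address.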
I've seen many references stating
I am creating a file in NASM 64 using CreateFileA in the Windows API. Yesterday I posted a question on this, which brought out some useful comments. Today I wrote this section in C, compiled it with the Pelles C compiler, and got the disassembly from x64dbg so I can use the output in NASM. I put the disassembly code into my NASM project, and here's the program in NASM as it is now:
CreateAuditFile:
sub rsp,38
xor eax,eax
mov qword [rsp+30],rax
mov eax,80
mov dword [rsp+28],eax
mov eax,2
mov dword [rsp+20],eax
xor r9,r9
xor r8d,r8d
mov edx,40000000
mov rcx,FileName_1
;lea rcx,[rel FileName_1]
call CreateFileA
call GetLastError
mov rdi,FileAuditString ; not from the disassembly
mov [rdi],rax ; not from the disassembly
xor eax,eax
add rsp,38
ret
That's the disassembly of this C code (done as a console app):
#include <windows.h>
int main(void)
{
char* File_Name ="C:\\Audit_File_P1";
HANDLE hFile;
hFile = CreateFileA(
File_Name,
GENERIC_WRITE,
0,
NULL,
CREATE_ALWAYS,
FILE_ATTRIBUTE_NORMAL,
NULL);
}
This works in C, but in NASM it returns an error code 87, "invalid parameter." I notice two things: (1) the first line is sub rsp,38, which is an odd number and not 16-byte aligned; (2) for the file name (FileName_1), the disassembly shows
"lea rcx,qword ptr ds:[140006270]" but in NASM we use "mov rcx,FileName_1" instead of lea where the file name string is defined in the .data section, as it is here (FileName_1: db "c:\AuditFile_1.txt",0). Also, it inserts a qword parameter at [rsp+30] and a dword at [rsp+28] but a dword needs four bytes, not two, which seems wrong but it works from the C version.
So my question is: which parameter(s) is/are incorrect?
Thanks very much.
This problem was solved with the help of others above, and I am posting the full correct code below for others who need this information in the future:
CreateAuditFile:
mov rcx,FileName_1
sub rsp,56 ; 38h
xor eax,eax
mov qword [rsp+48],rax ; 30h
mov eax,80h ; FILE_ATTRIBUTE_NORMAL (the disassembly value is hex)
mov dword [rsp+40],eax ; 28h
mov eax,2 ; CREATE_ALWAYS
mov dword [rsp+32],eax ; 20h
xor r9,r9
xor r8d,r8d
mov edx,40000000h ; GENERIC_WRITE (the disassembly value is hex)
call CreateFileA
mov rdi,OutputFileHandle
mov [rdi+r15],rax
xor eax,eax
add rsp,56 ;38h
ret
Here is what changed (changes are followed by a comment and ***). The x64dbg listing prints every number in hexadecimal, while NASM reads an unsuffixed number as decimal:
CreateAuditFile:
sub rsp,38 - should be sub rsp,56 ***
xor eax,eax
mov qword [rsp+30],rax - should be mov qword [rsp+48],rax ***
mov eax,80 - should be mov eax,80h (FILE_ATTRIBUTE_NORMAL) ***
mov dword [rsp+28],eax - should be mov dword [rsp+40],eax ***
mov eax,2
mov dword [rsp+20],eax - should be mov dword [rsp+32],eax ***
xor r9,r9
xor r8d,r8d
mov edx,40000000 - should be mov edx,40000000h (GENERIC_WRITE) ***
mov rcx,FileName_1
call CreateFileA
call GetLastError - do not call this without testing the result of CreateFileA ***
mov rdi,FileAuditString - not needed ***
mov [rdi],rax - not needed ***
mov rdi,OutputFileHandle - use this to capture the file handle ***
mov [rdi+r15],rax - use this to capture the file handle ***
xor eax,eax
add rsp,38 - should be add rsp,56 ***
ret
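The root of the problem is that the debugger listing shows numbers in hexadecimal while NASM treats an unsuffixed number as decimal. A small sketch of the conversion and of the stack-alignment rule behind sub rsp,38h follows; both helper names are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Interprets a number copied from the debugger listing (e.g. "38") as hex.
inline unsigned long FromDebuggerHex(const std::string& text) {
    assert(!text.empty());
    return std::stoul(text, nullptr, 16);
}

// On entry to a Win64 function RSP is 8 bytes off a 16-byte boundary, because
// the call just pushed the return address. The frame size subtracted from RSP
// must restore 16-byte alignment before the next call is made.
inline bool KeepsStackAligned(unsigned long frame_bytes) {
    return (frame_bytes + 8) % 16 == 0;
}
```

With these, FromDebuggerHex("38") is 56 and KeepsStackAligned(56) is true, whereas a frame of 38 decimal bytes leaves the stack misaligned. The same reading applies to the offsets 30, 28 and 20 (48, 40 and 32 decimal) and to the constants 80 and 40000000.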
Thanks very much to everyone who responded.
I am trying to find a specific pattern in the Windows system calls, for research purposes.
So far I've been looking into the Windows DLLs such as ntdll.dll, user32.dll, etc., but those seem to contain only wrapper code that prepares the jump to the system call. For example:
mov eax, 101Eh
lea edx, [esp+arg_0]
mov ecx, 0
call large dword ptr fs:0C0h
retn 10h
I'm guessing the call large dword ptr fs:0C0h instruction is another gateway in the chain that finally leads to the actual assembly, but I was wondering if I can get to that assembly directly.
You're looking in the wrong dlls. The system calls are in ntoskrnl.exe.
If you look at NtOpenFile() in ntoskrnl.exe you'll see:
mov r11, rsp
sub rsp, 88h
mov eax, [rsp+88h+arg_28]
xor r10d, r10d
mov [r11-10h], r10
mov [rsp+88h+var_18], 20h ; int
mov [r11-20h], r10d
mov [r11-28h], r10
mov [r11-30h], r10d
mov [r11-38h], r10d
mov [r11-40h], r10
mov [rsp+88h+var_48], eax ; int
mov eax, [rsp+88h+arg_20]
mov [rsp+88h+var_50], 1 ; int
mov [rsp+88h+var_58], eax ; int
mov [r11-60h], r10d
mov [r11-68h], r10
call IopCreateFile
add rsp, 88h
retn
Which is the true body of the function. Most of the work is done in IopCreateFile(), but you can follow it statically and do whatever analysis you need.
I'm using C++builder for GUI application on Win32. Borland compiler optimization is very bad and does not know how to use SSE.
I have a function that is 5 times faster when compiled with mingw gcc 4.7.
I am thinking about having gcc generate the assembler code and then using this code inside my C function, because the Borland compiler allows inline assembler.
The function in C looks like this :
void Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
{
double s = 77.777;
size_t m = mA[NT-3];
AV[2]=x[n-4]+m*s;
}
I made the function code very simple in order to simplify my question. My real function contains many loops.
The Borland C++ compiler generated this assembler code :
;
; void Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
;
@1:
push ebp
mov ebp,esp
add esp,-16
push ebx
;
; {
; double s = 77.777;
;
mov dword ptr [ebp-8],1580547965
mov dword ptr [ebp-4],1079210426
;
; size_t m = mA[NT-3];
;
mov edx,dword ptr [ebp+20]
mov ecx,dword ptr [ebp+24]
mov eax,dword ptr [edx+4*ecx-12]
;
; AV[2]=x[n-4]+m*s;
;
?live16385@48: ; EAX = m
xor edx,edx
mov dword ptr [ebp-16],eax
mov dword ptr [ebp-12],edx
fild qword ptr [ebp-16]
mov ecx,dword ptr [ebp+8]
mov ebx,dword ptr [ebp+12]
mov eax,dword ptr [ebp+16]
fmul qword ptr [ebp-8]
fadd qword ptr [ecx+8*ebx-32]
fstp qword ptr [eax+16]
;
; }
;
?live16385@64: ;
@2:
pop ebx
mov esp,ebp
pop ebp
ret
While the gcc generated assembler code is :
_Test_Fn:
mov edx, DWORD PTR [esp+20]
mov eax, DWORD PTR [esp+16]
mov eax, DWORD PTR [eax-12+edx*4]
mov edx, DWORD PTR [esp+8]
add eax, -2147483648
cvtsi2sd xmm0, eax
mov eax, DWORD PTR [esp+4]
addsd xmm0, QWORD PTR LC0
mulsd xmm0, QWORD PTR LC1
addsd xmm0, QWORD PTR [eax-32+edx*8]
mov eax, DWORD PTR [esp+12]
movsd QWORD PTR [eax+16], xmm0
ret
LC0:
.long 0
.long 1105199104
.align 8
LC1:
.long 1580547965
.long 1079210426
.align 8
I would like help understanding how function arguments are accessed in gcc and Borland C++.
My function in C++ for Borland would be something like :
void Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
{
__asm
{
put gcc generated assembler here
}
}
Borland addresses the arguments through the ebp register while gcc uses the esp register.
Can I force one of the compilers to generate compatible code for accessing the arguments, using a calling convention like cdecl or stdcall?
The arguments are passed similarly in both cases. The difference is that the code generated by Borland expresses the argument locations relative to EBP register and GCC relative to ESP, but both of them refer to the same addresses.
Borland sets EBP to point to the start of the function's stack frame and expresses locations relative to that, while GCC doesn't set up a new stack frame but expresses locations relative to ESP, which the caller has left pointing to the end of the caller's stack frame.
The code generated by Borland sets up a stack frame at the beginning of the function, causing EBP in the Borland code to be equal to ESP in the GCC code decreased by 4. This can be seen by looking at the first two Borland lines:
push ebp ; decrease esp by 4
mov ebp,esp ; ebp = the original esp decreased by 4
The GCC code doesn't alter ESP and the Borland code doesn't alter EBP until the end of the procedure, so the relationship holds when the arguments are accessed.
The calling convention seems to be cdecl in both of the cases, and there's no difference in how the functions are called. You can add keyword __cdecl to both in order to make that clear.
void __cdecl Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
However adding inline assembly compiled with GCC to the function compiled with Borland is not straightforward, because Borland might set up a stack frame even if the function body contains only inline assembly, causing the value of ESP register to differ from the one used in the GCC code. I see three possible workarounds:
1) Compile with Borland without the option "Standard stack frames". If the compiler figures out that a stack frame is not needed, this might work.
2) Compile with GCC with the frame pointer kept, i.e. without -fomit-frame-pointer (pass -fno-omit-frame-pointer to turn it off explicitly). This should make sure that at least the value of EBP is the same in both. The option is enabled at levels -O, -O2, -O3 and -Os.
3) Manually edit the assembly produced by GCC, changing references to ESP to EBP and adding 4 to the offset.
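For the manual-edit workaround, the offset arithmetic can be sketched as follows. The function names are hypothetical, and the sketch assumes 32-bit cdecl with every argument occupying 4 bytes, which holds for the pointers and size_t values of Test_Fn.

```cpp
#include <cassert>

// Offset of argument `index` (0-based) from ESP at function entry, when no
// stack frame has been set up: the return address occupies [esp+0].
constexpr int EspOffsetOfArg(int index) { return 4 + 4 * index; }

// Offset of the same argument from EBP after `push ebp / mov ebp,esp`:
// the saved EBP adds another 4 bytes below the return address.
constexpr int EbpOffsetOfArg(int index) { return 8 + 4 * index; }

// Rewrites an ESP-relative argument offset from the GCC listing into the
// EBP-relative offset used by the Borland listing.
inline int EspToEbpOffset(int esp_offset) {
    assert(esp_offset >= 4);  // arguments start above the return address
    return esp_offset + 4;
}

static_assert(EspOffsetOfArg(4) == 20, "NT is at [esp+20] in the GCC code");
static_assert(EbpOffsetOfArg(4) == 24, "NT is at [ebp+24] in the Borland code");
```

So [esp+16] (mA) in the GCC output becomes [ebp+20], and [esp+20] (NT) becomes [ebp+24], matching the two listings in the question. This only holds as long as the GCC code never changes ESP inside the function.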
I would recommend you do some reading up on Application Binary Interfaces.
Here is a relevant link to help you figure out what compiler generates what sort of code:
https://en.wikipedia.org/wiki/X86_calling_conventions
I'd try either compiling everything with GCC, or see if compiling just the critical file with GCC and the rest with Borland and linking together works. What you explain can be made to work, but it will be a hard job that probably isn't worth your invested time (unless it will run very frequently on many, many machines).