I am debugging an application with GDB.
There is a comparaison of strings and then a jump (JNS):
0x8048410 <main+32>: call 0x8058a70
0x8048415 <main+37>: add esp,0x10
0x8048418 <main+40>: test eax,eax
0x804841a <main+42>: jns 0x8048436 <main+70>
The state of the stack after the test instruction is the following:
EFLAGS: 0x286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow)
Capital letters mean the flags are activated.
GDB Peda says the jump is NOT taken since the SIGN flag is activated, at least that's how I understand the JNS instruction, jump on NO SIGN.
I would like to follow the jump instruction so I toggled the flag off:
set $SF = 7
set $eflags &= ~(1 << $SF)
you can see the SIGN flag is disabled:
EFLAGS: 0x206 (carry PARITY adjust zero sign trap INTERRUPT direction overflow)
but gdp peda still says that the jump is NOT taken, and when I actually try to step, it doesn't take the jump.
Is the problem in my understanding of the JNS instruction or am I missing something else?
EDIT: Not sure why but the jump does work after changing the flag, but peda does not understand the change and still says that the jump won't be taken. I will check this a bit more before writing an answer.
Related
Debuggers generate int3 instruction at the breakpoint so that the control goes to debug exception handler. For int3 debuggers will insert 0x cc. Why don't they insert 0x cd 03 which also means int3? What will happen if they insert 0x cd 03 instead of 0x cc?
Intel's documentation for that instruction answers your question for you: the Description section in the vol.2 manual entry for int / int3 says:
The INT3 instruction uses a one-byte opcode (CC) and is intended for calling the debug exception handler with a breakpoint exception (#BP). (This one-byte form is useful because it can replace the first byte of any instruction at which a breakpoint is desired, including other one-byte instructions, without overwriting other instructions.)
There are some 1-byte x86 instructions like push reg. Or in 16/32-bit mode inc/dec reg. If a 1-byte instruction was the last instruction before a branch target, overwriting it with a longer instruction would corrupt the first byte of an instruction that can run without reaching the breakpoint.
There are other differences for vm86 mode between the two byte cd 03 encoding of int 3 vs the 1-byte int3 encoding, again documented right in the manual. (Presumably to make it easier to write debuggers that debug the vm86 guest from outside the vm86 environment.)
An interrupt generated by the INTO, INT3, or INT1 instruction differs from one generated by INT n in the following ways:
The normal IOPL checks do not occur in virtual-8086 mode. The interrupt is taken (without fault) with any IOPL value.
The interrupt redirection enabled by the virtual-8086 mode extensions (VME) does not occur. The interrupt is always handled by a protected-mode handler.
(These features do not pertain to CD03, the “normal” 2-byte opcode for INT 3. Intel and Microsoft assemblers will not generate the CD03 opcode from any mnemonic, but this opcode can be created by direct numeric code definition or by self-modifying code.)
BTW, in NASM/YASM, you do need to use the int3 mnemonic for the 1 byte encoding; int 3 does assemble to CD 03
GAS assembles int $3 the same as int3, to 0xcc
But of course debuggers are working with binary machine code, not asm source. The assembly-mnemonic stuff only applies to manually including a breakpoint in your asm source. (Yes, that's a thing you can do. Most debuggers will let you resume by skipping over that instruction.)
I think my real problem is I don't completely understand the stack frame mechanism so I am looking to understand why the following code causes the program execution to resume at the end of the application.
This code is called from a C function which is several call levels deep and the pushf causes program execution to revert back several levels through the stack and completely exit the program.
Since my work around works as expected I would like to know why using the pushf instruction appears to be (I assume) corrupting the stack.
In the routines I usually setup and clean up the stack with :
sub rsp, 28h
...
add rsp, 28h
However I noticed that this is only necessary when the assembly code calls a C function.
So I tried removing this from both routines but it made no difference. SaveFlagsCmb is an assembly function but could easily be a macro.
The code represents an emulated 6809 CPU Rora (Rotate Right Register A).
PUBLIC Rora_I_A ; Op 46 - Rotate Right through Carry A reg
Rora_I_A PROC
sub rsp, 28h
; Restore Flags
mov cx, word ptr [x86flags]
push cx
popf
; Rotate Right the byte and save the FLAGS
rcr byte ptr [q_s+AREG], 1
; rcr only affects Carry. Save the Carry first in dx then
; add 0 to result to trigger Zero and Sign/Neg flags
pushf ; this causes jump to end of program ????
pop dx ; this line never reached
and dx, CF ; Save only Carry Flag
add [q_s+AREG], 0 ; trigger NZ flags
mov rcx, NF+ZF+CF ; Flag Mask NZ
Call SaveFlagsCmb ; NZ from the add and CF saved in dx
add rsp, 28h
ret
Rora_I_A ENDP
However if I use this code it works as expected:
PUBLIC Rora_I_A ; Op 46 - Rotate Right through Carry A reg
Rora_I_A PROC
; sub rsp, 28h ; works with or without this!!!
; Restore Flags
mov ah, byte ptr [x86flags+LSB]
sahf
; Rotate Right the byte and save the FLAGS
rcr byte ptr [q_s+AREG], 1
; rcr only affects Carry. Save the Carry first in dx then
; add 0 to result to trigger Zero and Sign/Neg flags
lahf
mov dl, ah
and dx, CF ; Save only Carry Flag
add [q_s+AREG], 0 ; trigger NZ flags
mov rcx, NF+ZF+CF ; Flag Mask NZ
Call SaveFlagsCmb ; NZ from the add and CF saved in dx
; add rsp, 28h ; works with or without this!!!
ret
Rora_I_A ENDP
Your reported behaviour doesn't really make sense. Mostly this answer is just providing some background not a real answer, and a suggestion not to use pushf/popf in the first place for performance reasons.
Make sure your debugging tools work properly and aren't being fooled by something into falsely showing a "jump" to somewhere. (And jump where exactly?)
There's little reason to mess around with 16-bit operand size, but that's probably not your problem.
In Visual Studio / MASM, apparently (according to OP's comment) pushf assembles as pushfw, 66 9C which pushes 2 bytes. Presumably popf also assembles as popfw, only popping 2 bytes into FLAGS instead of the normal 8 bytes into RFLAGS. Other assemblers are different.1
So your code should work. Unless you're accidentally setting some other bit in FLAGS that breaks execution? There are bits in EFLAGS/RFLAGS other than condition codes, including the single-step TF Trap Flag: debug exception after every instruction.
We know you're in 64-bit mode, not 32-bit compat mode, otherwise rsp wouldn't be a valid register. And running 64-bit machine code in 32-bit mode wouldn't explain your observations either.
I'm not sure how that would explain pushf being a jump to anywhere. pushf itself can't fault or jump, and if popf set TF then the instruction after popf would have caused a debug exception.
Are you sure you're assembling 64-bit machine code and running it in 64-bit mode? The only thing that would be different if a CPU decoded your code in 32-bit mode should be the REX prefix on sub rsp, 28h, and the RIP-relative addressing mode on [x86flags] decoding as absolute (which would presumably fault). So I don't think that could explain what you're seeing.
Are you sure you're single-stepping by instructions (not source lines or C statements) with a debugger to test this?
Use a debugger to look at the machine code as you single-step. This seem really weird.
Anyway, it seems like a very low-performance idea to use pushf / popf at all, and also to be using 16-bit operand-size creating false dependencies.
e.g. you can set x86 CF with movzx ecx, word ptr [x86flags] / bt ecx, CF.
You can capture the output CF with setc cl
Also, if you're going to do multiple things to the byte from the guest memory, load it into an x86 register. A memory-destination RCR and a memory-destination ADD are unnecessarily slow vs. load / rcr / ... / test reg,reg / store.
LAHF/SAHF may be useful, but you can also do without them too for many cases. popf is quite slow (https://agner.org/optimize/) and it forces a round trip through memory. However, there is one condition-code outside the low 8 in x86 FLAGS: OF (signed overflow). asm-source compatibility with 8080 is still hurting x86 in 2019 :(
You can restore OF from a 0/1 integer with add al, 127: if AL was originally 1, it will overflow to 0x80, otherwise it won't. You can then restore the rest of the condition codes with SAHF. You can extract OF with seto al. Or you can just use pushf/popf.
; sub rsp, 28h ; works with or without this!!!
Yes of course. You have a leaf function that doesn't use any stack space.
You only need to reserve another 40 bytes (align the stack + 32 bytes of shadow space) if you were going to make another function call from this function.
Footnote 1: pushf/popf in other assemblers:
In NASM, pushf/popf default to the same width as other push/pop instructions: 8 bytes in 64-bit mode. You get the normal encoding without an operand-size prefix. (https://www.felixcloutier.com/x86/pushf:pushfd:pushfq)
Like for integer registers, both 16 and 64-bit operand-size for pushf/popf are encodeable in 64-bit mode, but 32-bit operand size isn't.
In NASM, your code would be broken because push cx / popf would push 2 bytes and pop 8, popping 6 bytes of your return address into RFLAGS.
But apparently MASM isn't like that. Probably a good idea to use explicit operand-size specifiers anyway, like pushfw and popfw if you use it at all, to make sure you get the 66 9C encoding, not just 9C pushfq.
Or better, use pushfq and pop rcx like a normal person: only write to 8 or 16-bit partial registers when you need to, and keep the stack qword-aligned. (16-byte alignment before call, 8-byte alignment always.)
I believe this is a bug in Visual Studio. I'm using 2022, so it's an issue that's been around for a while.
I don't know exactly what is triggering it, however stepping over one specific pushf in my code had the same symptoms, albeit with the code actually working.
Putting a breakpoint on the line after the pushf did break, and allowed further debugging of my app. Adding a push ax, pop ax before the pushf also seemed to fix the issue. So it must be a Visual Studio issue.
At this point I think MASM and debugging in Visual Studio has pretty much been abandoned. Any suggestions for alternatives for developing dlls on Windows would be appreciated!
I am learning some mechanism of breakpoint and I learned that 'In x86, there exist a instruction called int3 for debugger to interrupt the CPU. And then CPU will interrupt the running program by signal'.
For example:
8048e20: 55 push %ebp
8048e21: 89 e5 mov %esp,%ebp
When the user input
b *0x8048e21
The instruction will be replaced by int3(opcode 0xcc) and become this:
8048e20: 55 push %ebp
8048e21: cc e5 mov %esp,%ebp
And it will stop at the right place.
Then comes the question:
What would happen if I set the breakpoint not at the beginning of a instruction? ie, if I input:
b *0x8048e22
will debugger still replace the e5 with cc? So I write a simple example and run it with gdb.
As you can see above, I set two break points and the second is at the middle of a break points. I Input r and stop at the first breakpoint and input c and run to the end.
So it seems that the gdb ignore the second breakpoint. (For if it really repalce it with a int3 the program would be totally wrong).
Question: What happen to the second breakpoint, more specifically, what does gdb deal with it( or what I learn is wrong?)
Edit: #dbrank already give a great example about altering the data field of a instruction, I will try to make it more comprehensive with a similar example (it seems the register).
(Any reference about mechanism of breakpoint is appreciated!)
Inserting breakpoint in the middle of instruction will alter the instruction.
See this example of a program, where inserting a breakpoint overwrites original value assigned to variable (42 (0x2a)) with breakpoint instruction (0xcc (204)).
You can find more about how breakpoints work here.
You can also look into GDB sources (breakpoint.c & infrun.c mostly).
The short description:
Setting a breakpoint on the first line of my .CODE segment in an assembly program will not halt execution of the program.
The question:
What about Visual Studio's debugger would allow it to fail to create a breakpoint at the first line of a program written in assembly? Is this some oddity of the debugger, a case of breaking on a multi-byte instruction, or am I just doing something silly?
The details:
I have the following assembly program compiling and running in Visual Studio:
; Tell MASM to use the Intel 80386 instruction set.
.386
; Flat memory model, and Win 32 calling convention
.MODEL FLAT, STDCALL
; Treat labels as case-sensitive (required for windows.inc)
OPTION CaseMap:None
include windows.inc
include masm32.inc
include user32.inc
include kernel32.inc
include macros.asm
includelib masm32.lib
includelib user32.lib
includelib kernel32.lib
.DATA
BadText db "Error...", 0
GoodText db "Excellent!", 0
.CODE
main PROC
;int 3 ; <-- If uncommented, this will not break.
mov ecx, 6 ; <-- Breakpoint here will not hit.
xor eax, eax ; <-- Breakpoint here will.
_label: add eax, ecx
dec ecx
jnz _label
cmp eax, 21
jz _good
_bad: invoke StdOut, addr BadText
jmp _quit
_good: invoke StdOut, addr GoodText
_quit: invoke ExitProcess, 0
main ENDP
END main
If I try to set a breakpoint on the first line of the main function, mov ecx, 6, it is ignored, and the program executes without stopping. Only will a breakpoint be hit if I set it on the line after that, xor eax, eax, or any subsequent line.
I have even tried inserting a software breakpoint, int 3, as the first line of the function, and it is also ignored.
The first thing I notice that is odd: viewing the disassembly after hitting one of my breakpoints gives me the following:
01370FFF add byte ptr [ecx+6],bh
--- [Path]\main.asm
xor eax, eax
00841005 xor eax,eax --- <-- Breakpoint is hit here
_label: add eax, ecx
00841007 add eax,ecx
dec ecx
00841009 dec ecx
jnz _label
0084100A jne _label (841007h)
cmp eax, 21
0084100C cmp eax,15h
What's interesting here is that the xor is, in Visual Studio's eyes, the first operation in my program. Absent is the line move ecx, 6. Directly above where it thinks my source begins is the line that actually sets ecx to 6. So the actual start of my program has been mangled according to the disassembly.
If I make the first line of my program int 3, the line that appears above where my code is in the disassembly is:
00F80FFF add ah,cl
As suggested in one of the answers, I turned off ASLR, and it looks like the disassembly is a little more stable:
.CODE
main PROC
;mov ecx, 6
xor eax, eax
00401000 xor eax,eax --- <-- Breakpoint is present here, but not hit.
_label: add eax, ecx
00401002 add eax,ecx --- <-- Breakpoint here is hit.
dec ecx
00401004 dec ecx
The complete program is visible in the disassembly, but the problem still perists. Despite my program starting on an expected address, and the first breakpoint being shown in the disassembly, it is still skipped. Placing an int 3 as the first line still results in the following line:
00400FFF add ah,cl
and does not stop execution, and re-mangles the view of my program in the disassembly again. The next line of my program is then at location 00401001, which I suppose makes sense because int 3 is a one-byte instruction, but why would it have disappeared in the disassembly?
Even starting the program using the 'Step Into (F11)' command does not allow me to break on the first line. In fact, with no breakpoint, starting the program with F11 does not halt execution at all.
I'm not really sure what else I can try to solve the problem, beyond what I have detailed here. This is stretching beyond my current understanding of assembly and debuggers.
01370FFF add byte ptr [ecx+6],bh
At least I can explain away one mystery. Note the address, 0x1370fff. The CODE segment never starts at an address like that, segments begin at an address that's a multiple of 0x1000. Which makes the last 3 hex digits of the start address always 0. The debugger got confuzzled and started disassembling the code at the wrong address, off by one. The actual start address is 0x1371000. The disassembly starts off poorly because there's a 0 at 0x1370fff. That's a multi-byte ADD instruction. So it displays garbage for a while until it catches up with real machine code instructions by accident.
You need to help it along and give it a command to start disassembling at the proper address. In VS that's the Address box, type "0x1371000".
Another notable quirk is the strange value of the start address. A process normally starts at address 0x400000. You have a feature called ASLR turned on, Address Space Layout Randomization. It is an anti-virus feature that makes programs start at an unpredictable start address. Nice feature but it doesn't exactly help debugging programs. It isn't clear how you built this code but you need the /DYNAMICBASE:NO linker option to turn it off.
Another important quirk of debuggers you need to keep in mind here is the way they set breakpoints. They do so by patching the code, replacing the start byte of an instruction with an int 3 instruction. When the breakpoint hits, it quickly replaces the byte with the original machine code instruction byte. So you never see this. This goes wrong if you pick the wrong address to set the breakpoint, like in the middle of a multi-byte instruction. It now no longer breaks the code, the altered byte messes up the original instruction. You can easily fall into this trap when you started with a bad disassembly.
Well, do this the Right Way. Start debugging with the debugger's STEP command instead.
I have discovered what the root of the problem is, but I haven't a clue why it is so.
After creating another MASM project, I noticed that the new one would break on the first line of the program, and the disassembly did not appear to be mangled or altered. So, I compared its properties to my original project (for the Debug configuration). The only difference I found was that my original project had Incremental Linking disabled. Specifically, it added /INCREMENTAL:NO to the linker command line.
Removing this option from the command line (thereby enabling Incremental Linking) resulted in the program behaving as expected during debugging; my code shown in the disassembly window remained unaltered, I could hit a breakpoint on the first line of the main procedure, and an int 3 instruction would also execute properly as the first line.
If you press F+11 (step into) instead of Start Debugging the debugger will stop on the first line.
It is possible there is some messed up breakpoint setting. Delete any *.suo files in your project directory to reset all breakpoints.
Note that your project will have a secret headers and stuff in it if it has a main function. To set a breakpoint at the real entry point use: Debug + New Breakpoint + Break at Function -> wWinMainCRTStartup for a windows program or mainCRTStartup or wmainCRTStartup for a console program.
Currently i'm playing with the windows/WOW64 trick known as "the heaven's gate", which, as some of you will probably know, allows us to enter x64 mode even though in a x86 program (i was so amazed when i tested it and it worked!) But i know it is not supported on all Windows versions, so my code (because there is a code) uses seh, it looks like this:
start:
use32
;; setup seh...
call $33:.64bits_code ; specify 0x33 segment, it's that easy
;; success in x64 mode, quit seh...
jmp .exit
.64bits_code:
use64
;; ...
use32
retf
.seh_handler:
use32
;; ...
xor eax,eax ; EXCEPTION_CONTINUE_EXECUTION
ret
.32bits_code:
; we have been called by a far call (well, indirectly, routed by a seh handler)
; HERE IS THE PROBLEM => Should i use a retf since cs and eip are on the stack,
; or the exception has been triggered before pushing them???
; "retf" or "jmp .exit"?
.exit:
xor eax,eax
push eax
call [ExitProcess]
I know a simple "jmp .exit" would do the trick, but i'm terribly curious about it
When the OS gets an interrupt or a fault happens, it expects that no matter what the user code was up to, the CPU has saved the required state on the kernel stack so that an IRET will invisibly resume whatever it was doing.
Note that there is no "magic" involved in that state on the kernel stack. "Continuing execution" only means restoring saved values of rflags, cs:rip and ss:rsp and running the code that cs:rip ends up pointing to.
That means that without involving SEH specifically, just thinking about any kind of exception happening during a far call, there are really only two cases to consider:
The exception happens "before" the jump: nothing has been pushed, the state on the kernel stack says we should resume by restarting the call instruction.
The exception happens "after" the jump: cs:eip have been pushed, rip points somewhere at or after the .64bits_code label, and that saved state says that to resume we should jump to the 64-bit code.
If the CPU allowed a far call to be interrupted "in the middle", there would be no possible value of cs:rip that produces a consistent result when the OS continues execution. For example, if the far call's return address was pushed before the exception happened but the saved cs:rip points to the far call instruction, you'll end up with two copies of the return address on the stack and all hell breaks loose.
Now, to actually answer your question: it depends on the value of rIP that the OS tells you the exception happened on. If it points to the 64-bit code you must have cs:eip on the stack, if it points to the 32-bit code you can safely assume it has not been pushed yet.