Windows pintool mismatch between call/ret instructions - windows

So I've been trying to write a pintool that monitors call/ret instructions, but I've noticed a significant inconsistency between the two: for example, ret instructions with no preceding call.
I ran the tool on a console application and got the following logs showing this inconsistency (this is one example; other call/ret instructions show similar mismatches):
1. Call from ntdll!LdrpCallInitRoutine+0x69, expected to return to 7ff88f00502a
2. RETURN to 7ff88f00502a
//call from ntdll!LdrpInitializeNode+0x1ac which is supposed to return at 7ff88f049385 is missing (the previous instruction)
3. RETURN to 7ff88f049385 (ntdll!LdrpInitializeNode+0x1b1)
The above are the first 3 log entries for the call/ret instructions. As one can see, the monitoring started a bit late, at the call found at ntdll!LdrpCallInitRoutine+0x69, it returned to the expected address but then returned to 7ff88f049385 without first tracking the call found in the previous instruction.
Any ideas of what could be the fault?
The program is traced with INS_AddInstrumentFunction with a callback that more or less does:
if INS_IsCall(ins) INS_InsertCall(ins,...
if INS_IsRet(ins) INS_InsertCall(ins,...
I've tried the same program on Linux which worked as expected, without any mismatch.
Any ideas of the reason behind this behavior?
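To make the mismatch concrete, here is a minimal offline sketch in Python (not Pin code) that replays a log of call/ret events against a shadow stack and reports returns with no pending call. The event tuple format and the function name are assumptions made for illustration; the addresses are the ones from the log above. It only detects unmatched returns, it does not explain why they occur.

```python
def find_unmatched_returns(events):
    """Replay ("call", expected_return) / ("ret", target) events.

    Returns the list of return targets that no pending call expected.
    """
    shadow = []      # expected return addresses of calls still in flight
    unmatched = []
    for kind, addr in events:
        if kind == "call":
            shadow.append(addr)
        elif kind == "ret":
            if addr in shadow:
                # Pop up to and including the matching entry, so frames
                # that were skipped (e.g. by unwinding) are discarded too.
                while shadow.pop() != addr:
                    pass
            else:
                unmatched.append(addr)
    return unmatched


# The three log entries from the question (hypothetical tuple encoding).
events = [
    ("call", 0x7FF88F00502A),
    ("ret", 0x7FF88F00502A),
    ("ret", 0x7FF88F049385),
]

for addr in find_unmatched_returns(events):
    print(f"RETURN to {addr:x} has no recorded call")
```

Any return whose call was never logged, for instance one executed before logging began, shows up in the result.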

Related

Windows ASLR and address rewriting

I am just starting out learning about how code injection works and various defences.
I understand how and what ASLR does, but am struggling a little while looking at it in action.
By hijacking execution, I have added functionality (messagebox on startup) to mspaint.exe, calc.exe, and winword.exe. On WinXP these execute as desired. On Windows 7, the first two execute as desired but winword.exe runs into what I assume is ASLR:
[1] A call or jmp to an address in another executable section gets adjusted and executed properly.
[2] Similar instructions in that new section (call or jmp) are also adjusted properly
[3] Instructions to DLL APIs such as 'call LoadLibraryW', or simple memory access such as 'mov dword [esp], 40402d' do not get adjusted and so point to invalid address space.
Number [3] seems to happen even when I add these instructions directly to the .text section, right alongside an existing identical instruction that gets adjusted properly.
For example:
Using winword.exe, the instruction at 300010f0 is 'call [30001004]' (call GetProcAddress).
Running under ollydbg this shows it has been adjusted to 'call [04106b2f]'.
However, if I change the entrypoint instruction (300010cc) to be 'call [30001004]', this does not get adjusted at all.
Can someone explain or point me to some enlightenment as to the rules governing when and where instructions are adjusted like this? (or if I am being a complete numpty, a pointer towards some useful further reading would be appreciated)
Many thanks.

Debugger implementation - Step over issue

I am currently writing a debugger for a script virtual machine.
The compiler for the scripts generates debug information, such as function entry points, variable scopes, names, instruction to line mappings, etc.
However, I have run into an issue with step-over.
Right now, I have the following:
1. Look up the current IP
2. Get the source line from that
3. Get the next (valid) source line
4. Get the IP where the next valid source line starts
5. Set a temporary breakpoint at that instruction
or: if the next source line no longer belongs to the same function, set the temp breakpoint at the next valid source line after return address.
So far this works well. However, I seem to be having problems with jumps.
For example, take the following code:
n = 5; // Line A
if(n == 5) // Line B
{
    foo(); // Line C
}
else
{
    bar(); // Line D
    --n;
}
Given this code, if I'm on line B and choose to step-over, the IP determined for the breakpoint will be on line C. If, however, the conditional jump evaluates to false, it should be placed on line D. Because of this, the step-over wouldn't halt at the expected location (or rather, it wouldn't halt at all).
There seems to be little information on debugger implementation of this specific issue out there. However, I found this. While this is for a native debugger on Windows, the theory still holds true.
It seems though that the author has not considered this issue, either, in section "Implementing Step-Over" as he says:
1. The UI-threads calls CDebuggerCore::ResumeDebugging with EResumeFlag set to StepOver.
This tells the debugger thread (having the debugger-loop) to put IBP on next line.
2. The debugger-thread locates next executable line and address (0x41141e), it places an IBP on that location.
3. It calls then ContinueDebugEvent, which tells the OS to continue running debuggee.
4. The BP is now hit, it passes through EXCEPTION_BREAKPOINT and reaches at EXCEPTION_SINGLE_STEP. Both these steps are same, including instruction reversal, EIP reduction etc.
5. It again calls HaltDebugging, which in turn, awaits user input.
Again:
The debugger-thread locates next executable line and address (0x41141e), it places an IBP on that location.
This statement does not seem to hold true in cases where jumps are involved, though.
Has anyone encountered this problem before? If so, do you have any tips on how to tackle this?
Since this thread comes up first on Google when searching for "debugger implement step over", I'll share my experience with the x86 architecture.
Start by implementing step-into: this is basically single-stepping through the instructions and checking whether the line corresponding to the current EIP changes. (You use either the DIA SDK or the DWARF debug data to find the current line for an EIP.)
For step-over: before single-stepping to the next instruction, check whether the current instruction is a CALL instruction. If it is, put a temporary breakpoint on the instruction following it and continue execution until that breakpoint is hit (then remove it). In this way you step over function calls at the assembly level, and therefore in the source too.
No need to manage stack frames (unless you'll need to deal with single line recursive functions). This analogy can be applied to other architectures as well.
Ok, so since this seems to be a bit of black magic, in this particular case the most intelligent thing was to enumerate the instruction where the next line starts (or the instruction stream ends + 1), and then run that many instructions before halting again.
The only gotcha was that I have to keep track of the stack frame in case CALL is executed; those instructions should run without counting in case of step-over.
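The approaches above (single-step until the source line changes, and do not count instructions that run inside a CALL) can be sketched on a toy VM. Everything here is hypothetical and made up for illustration: the opcodes, the `ToyVM` class, and the line labels, which mirror lines A to D of the example in the question. It is a sketch of the technique under those assumptions, not the script VM from the question.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVM:
    code: list                     # (opcode, argument) tuples
    line_of: list                  # source line label per instruction
    ip: int = 0
    flag: bool = False             # result of the last comparison
    stack: list = field(default_factory=list)  # return addresses

    def step(self):
        """Execute one instruction; return False once the VM halts."""
        op, arg = self.code[self.ip]
        if op == "halt":
            return False
        if op == "call":
            self.stack.append(self.ip + 1)
            self.ip = arg
        elif op == "ret":
            self.ip = self.stack.pop()
        elif op == "jmp":
            self.ip = arg
        elif op == "jmp_if_false":
            self.ip = self.ip + 1 if self.flag else arg
        else:                      # "nop": fall through
            self.ip += 1
        return True


def step_over(vm):
    """Single-step until the line changes in the same frame or the frame returns."""
    start_line = vm.line_of[vm.ip]
    start_depth = len(vm.stack)
    while vm.step():
        depth = len(vm.stack)
        if depth < start_depth:    # returned to the caller
            break
        if depth == start_depth and vm.line_of[vm.ip] != start_line:
            break                  # instructions inside callees are ignored
    return vm.line_of[vm.ip]


CODE = [
    ("nop", None),          # 0: n = 5;      line A
    ("jmp_if_false", 4),    # 1: if(n == 5)  line B
    ("call", 7),            # 2: foo();      line C
    ("jmp", 6),             # 3: skip else   line C
    ("call", 8),            # 4: bar();      line D
    ("nop", None),          # 5: --n;
    ("halt", None),         # 6: end of script
    ("ret", None),          # 7: body of foo
    ("ret", None),          # 8: body of bar
]
LINES = ["A", "B", "C", "C", "D", "D+1", "end", "foo", "bar"]


def run_step_over(start_ip, flag):
    return step_over(ToyVM(CODE, LINES, ip=start_ip, flag=flag))
```

Stepping over line B lands on line C or line D depending on `flag`, because the stop location is decided by what actually executes rather than by a breakpoint placed in advance on the textually next line.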

How to debug Erlang code?

I have some Ruby and Java background and I'm accustomed to having exact numbers of lines in the error logs.
So, if there is an error in the compiled code, I will see the number of line which caused the exception in the console output.
Like in this Ruby example:
my_ruby_code.rb:13:in `/': divided by 0 (ZeroDivisionError)
from my_ruby_code.rb:13
It's simple and fast - I just go to the line number 13 and fix the error.
On the contrary, Erlang just says something like:
** exception error: no match of right hand side value [xxxx]
in function my_module:my_fun/1
in call from my_module:other_fun/2
There are no line numbers to look at.
And if I have two lines like
X = Param1,
Y = Param2,
in 'my_fun', how can I tell which line the problem is in?
Additionally, I have tried switching from Vim to Emacs+Erlang-mode, but the only bonus I've got so far is the ability to cycle through compilation errors inside Emacs (C-x `).
So, the process of writing code and seeking for simple logical errors like 'no match of right hand side' seems to be a bit cumbersome.
I have tried to add a lot of "io:format" lines in the code, but it is additional work which takes time.
I have also tried to use distel, but it requires 10 steps to just open a debugger once.
Questions:
What is the most straight and simple way to debug Erlang code?
Does Emacs' erlang-mode offer anything superior to Vim for Erlang development?
What development 'write-compile-debug' cycle do you prefer? Do you leave Emacs to compile and run the code in the terminal? How do you search for errors in your Erlang code?
Debugging Erlang code can be tricky at times, especially dealing with badmatch errors. In general, two good guidelines to keep are:
Keep functions short
Use return values directly if you can, instead of binding temporary variables (this will give you the benefit of getting function_clause errors etc which are way more informative)
That being said, using a debugger is usually required to quickly get to the bottom of errors. I recommend using the command line debugger, dbg, instead of the graphical one, debugger (it's way faster when you know how to use it, and you don't have to context switch from the Erlang shell to a GUI).
Given the sample expression you provided, the case is often that you have more than just variables being assigned to other variables (which is absolutely unnecessary in Erlang):
run(X, Y) ->
    X = something(whatever),
    Y = other:thing(more_data).
Debugging a badmatch error here is aided by using the command line debugger:
1> dbg:tracer(). % Start the CLI debugger
{ok,<0.55.0>}
2> dbg:p(all, c). % Trace all processes, only calls
{ok,[{matched,nonode@nohost,29}]}
3> dbg:tpl(my_module, something, x). % tpl = trace local functions as well
{ok,[{matched,nonode@nohost,1},{saved,x}]}
4> dbg:tp(other, thing, x). % tp = trace exported functions
{ok,[{matched,nonode@nohost,1},{saved,x}]}
5> dbg:tp(my_module, run, x). % x means print exceptions
{ok,[{matched,nonode@nohost,1},{saved,x}]} % (and normal return values)
Look for {matched,_,1} in the return value... if this would have been 0 instead of 1 (or more) that would have meant that no functions matched the pattern. Full documentation for the dbg module can be found here.
Given that both something/1 and other:thing/1 always return ok, the following could happen:
6> my_module:run(ok, ok).
(<0.72.0>) call my_module:run(ok,ok)
(<0.72.0>) call my_module:something(whatever)
(<0.72.0>) returned from my_module:something/1 -> ok
(<0.72.0>) call other:thing(more_data)
(<0.72.0>) returned from other:thing/1 -> ok
(<0.72.0>) returned from my_module:run/2 -> ok
ok
Here we can see the whole call procedure, and what return values were given. If we call it with something we know will fail:
7> my_module:run(error, error).
** exception error: no match of right hand side value ok
(<0.72.0>) call my_module:run(error,error)
(<0.72.0>) call my_module:something(whatever)
(<0.72.0>) returned from my_module:something/1 -> ok
(<0.72.0>) exception_from {my_module,run,2} {error,{badmatch,ok}}
Here we can see that we got a badmatch exception: something/1 was called, but other:thing/1 never was, so we can deduce that the badmatch happened before that call.
Getting proficient with the command line debugger will save you a lot of time, whether you debug simple (but tricky!) badmatch errors or something much more complex.
You can use the Erlang debugger to step through your code and see which line is failing.
From erl, start the debugger with:
debugger:start().
Then you can choose which modules you want in interpreted mode (required for debugging), using the UI or using the console with ii:
ii(my_module).
Adding breakpoints is done in the UI or console again:
ib(my_module, my_func, func_arity).
Also, in Erlang R15 we'll finally have line numbers in stack traces!
If you replace your erlang installation with a recent one, you will have line numbers, they were added starting with version 15.
If the new versions are not yet available on your operating system, you could build from source or try to get a packaged version here: http://www.erlang-solutions.com/section/132/download-erlang-otp
You can compile the file with the "debug_info" option and then use "debugger":
1> c(test_module, [debug_info]).
{ok, test_module}
2> debugger:start().
For more details on debugging in Erlang, see this video: https://vimeo.com/32724400

How to debug stack-overwriting errors with Valgrind?

I just spent some time chasing down a bug that boiled down to the following. Code was erroneously overwriting the stack, and I think it wrote over the return address of the function call. Following the return, the program would crash and stack would be corrupted. Running the program in valgrind would return an error such as:
vex x86->IR: unhandled instruction bytes: 0xEA 0x3 0x0 0x0
==9222== valgrind: Unrecognised instruction at address 0x4e925a8.
I figure this is because the return jumped to a random location, containing stuff that were not valid x86 opcodes. (Though I am somehow suspicious that this address 0x4e925a8 happened to be in an executable page. I imagine valgrind would throw a different error if this wasn't the case.)
I am certain that the problem was of the stack-overwriting type, and I've since fixed it. Now I am trying to think how I could catch errors like this more effectively. Obviously, valgrind can't warn me if I rewrite data on the stack, but maybe it can catch when someone writes over a return address on the stack. In principle, it can detect when something like 'push EIP' happens (so it can flag where the return addresses are on the stack).
I was wondering if anyone knows if Valgrind, or anything else can do that? If not, can you comment on other suggestions regarding debugging errors of this type efficiently.
If the problem happens deterministically enough that you can point to a particular function that has its stack smashed (in one repeatable test case), you could, in gdb:
Break at entry to that function
Find where the return address is stored. On x86 with the standard frame-pointer prologue (push %ebp; mov %esp,%ebp) it sits at 4(%ebp); at the very first instruction of the function, before the prologue runs, it is at (%esp).
Add a watchpoint on that address. You have to issue the watch command with the calculated address, not an expression, because with an expression gdb would try to re-evaluate it after each instruction instead of setting up a trap, and that would be extremely slow.
Let the function run to completion.
I have not yet worked with the python support available in gdb7, but it should allow automating this.
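As a small aid for the watchpoint step, the following Python sketch computes the return-address slot from a frame-pointer value and formats the gdb command with a literal address. It is plain Python, not a gdb script, and the helper names are hypothetical. It assumes code built with frame pointers and the standard prologue, so that the return address sits one machine word above the saved frame pointer (4 bytes on 32-bit x86, 8 on x86-64).

```python
def return_address_slot(frame_pointer, word_size=4):
    """Address holding the return address, assuming a standard frame-pointer prologue."""
    return frame_pointer + word_size


def watch_command(frame_pointer, word_size=4):
    """Build a gdb watch command that uses a literal address, not an expression."""
    slot = return_address_slot(frame_pointer, word_size)
    return f"watch *(void **) {slot:#x}"


# Example: value of %ebp read with "info registers ebp" after the prologue.
print(watch_command(0xBFFFF0F8))
```

The frame-pointer value in the example is made up; read the real one from the stopped process after the function's prologue has run.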
In general, Valgrind's detection of overflows in stack and global variables is weak to non-existent. Arguably, Valgrind is the wrong tool for that job.
If you are on one of supported platforms, building with -fmudflap and linking with -lmudflap will give you much better results for these kinds of errors. Additional docs here.
Update:
Much has changed in the 6 years since this answer. On Linux, the tool to find stack (and heap) overflows is AddressSanitizer, supported by recent versions of GCC and Clang.

If FindNextUrlCacheEntry() fails, how can I retrieve info of the failed entry again?

I got an ERROR_INSUFFICIENT_BUFFER error when invoking FindNextUrlCacheEntry(). I then wanted to retrieve the failed entry again, using an enlarged buffer. But I found that when I invoke FindNextUrlCacheEntry() again, I seem to get the entry after the failed one. Is there any way to go back and retrieve the information of the entry that just failed?
I also observed the same behavior on XP. I am trying to clear IE cache programmatically using WinInet APIs. The code at the following MSDN link works perfectly fine on Win7/Vista but deletes cache files in batches(multiple runs) on XP. On debugging I found that API FindNextUrlCacheEntry gives different sizes for the same entry when executed multiple times.
MSDN Link: http://support.microsoft.com/kb/815718
Here is what I am doing:
First of all I make a call to determine the size of the next URL entry
fSuccess = FindNextUrlCacheEntry(hCacheHandle, 0, &cacheEntryInfoBufferSizeInitial) // cacheEntryInfoBufferSizeInitial = 0 at this point
The above call returns FALSE with the error ERROR_INSUFFICIENT_BUFFER and sets the cacheEntryInfoBufferSizeInitial parameter to the size, in bytes, of the buffer required to retrieve the cache entry. After allocating a buffer of that size, I call the same WinInet API again, expecting it to retrieve the entry successfully this time. But sometimes it fails again, even with the buffer size the API itself reported, because it now expects more bytes than it asked for earlier. Most of the time the difference is a few bytes, but I have also seen cases where the required size is almost 4 to 5 times larger.
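Given that the reported size can change between calls, one defensive pattern is to keep retrying with the most recently reported size instead of assuming the second call succeeds. The Python sketch below illustrates only that loop. It does not call WinInet: `FlakyEntry` is a made-up stand-in that mimics the behaviour described above, and the `(ok, data, required_size)` return shape is an assumption for illustration.

```python
class FlakyEntry:
    """Stand-in for an entry whose reported size may grow between calls."""

    def __init__(self, payload, reported_sizes):
        self.payload = payload
        self.reported = list(reported_sizes)

    def query(self, buffer_size):
        # Use the next reported size; keep repeating the last one.
        if len(self.reported) > 1:
            required = self.reported.pop(0)
        else:
            required = self.reported[0]
        if buffer_size < required:
            return False, None, required   # "insufficient buffer"
        return True, self.payload, required


def fetch_with_growing_buffer(query, max_attempts=8):
    """Retry until the call succeeds, always using the latest required size."""
    size = 0
    for attempt in range(1, max_attempts + 1):
        ok, data, required = query(size)
        if ok:
            return data, attempt
        size = max(size, required)
    raise RuntimeError("buffer size never stabilised")


entry = FlakyEntry(b"cache entry", [10, 14, 14])
data, attempts = fetch_with_growing_buffer(entry.query)
print(data, attempts)
```

Capping the number of attempts keeps the loop from spinning forever if the reported size never settles.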
For what it's worth this seems to be solved in Vista.

Resources