ACCESS_VIOLATION_BAD_IP - debugging

I am trying to figure out a crash in my application.
WinDbg tells me the following:
LAST_CONTROL_TRANSFER: from 005f5c7e to 6e697474
DEFAULT_BUCKET_ID: BAD_IP
BUGCHECK_STR: ACCESS_VIOLATION
It is obvious to me that 6e697474 is NOT a valid address.
I have three questions:
1) Does the "BAD_IP" bucket ID mean "Bad Instruction Pointer?"
2) This is a multi-threaded application so one consideration was that the object whose function I was attempting to call went out of scope. Does anyone know if that would lead to the same error message?
3) What else might cause an error like this? One of my co-workers suggested that it might be a stack overflow issue, but WinDbg has in the past proven rather reliable at detecting and pointing these out (not that I'm sure about the voodoo it does in the background to diagnose that).

BAD_IP does stand for Bad Instruction Pointer. From the description of your problem, I would assume it is stack corruption rather than a stack overflow.

I can think of the following things that could cause a jump to an invalid address, in decreasing order of likelihood:
calling a member function on a deallocated object (as you suspect).
calling a member function of a corrupted object.
calling a member function of an object with a corrupted vtable.
a rogue pointer overwriting code space.
I'd start debugging by finding the code at 005f5c7e and looking at what objects are being accessed around there.
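A minimal sketch of the first and most likely cause (hypothetical class and names, deliberately buggy): an object with virtual functions is deleted, perhaps on another thread, while a stale pointer to it is still used. If the freed block gets reused, the vtable slot can end up holding arbitrary bytes and the virtual call jumps there.

#include <cstdio>

struct Worker {
    virtual void Run() { std::printf("running\n"); }
    virtual ~Worker() {}
};

int main()
{
    Worker* w = new Worker();
    delete w;   // another thread could just as easily have freed it
    // The vtable pointer inside *w is now stale. If the freed block has been
    // reused (for example by string data), the "vtable" contains garbage and
    // the indirect call below can land on an address like 6e697474.
    w->Run();   // undefined behavior: virtual call through a dangling pointer
    return 0;
}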

It may be helpful to ask: what could have written the string 'ttin' (the bytes 74 74 69 6e as stored little-endian) to this location? Often when you have bytes in the 0x41-0x5A, 0x61-0x7A ([a-zA-Z]) range, it indicates a string buffer overflow.
As to what was actually overwritten, it could be the return address, some other function pointer you're using, or occasionally that a virtual function table pointer (vfptr) in an object got overwritten to point to the middle of a string.
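As a hedged illustration of that mechanism (the names, struct layout and input string are invented, not taken from the question), here is a deliberately broken snippet where a string copy overruns a small buffer and the overflow lands in an adjacent function pointer, so the next call goes to an address made of printable characters:

#include <cstdio>
#include <cstring>

static void Greet() { std::puts("hello"); }

struct Handler {
    char name[8];         // too small for the input below
    void (*callback)();   // sits right after the buffer in memory
};

int main()
{
    Handler h;
    h.callback = &Greet;
    // "formatting output" is 18 bytes including the terminator, so the tail
    // of the string spills past name[] and overwrites callback with ASCII bytes.
    std::strcpy(h.name, "formatting output");   // buffer overflow (undefined behavior)
    h.callback();   // indirect call to an address built from letters and spaces
    return 0;
}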

Related

How did debuggers for 16-bit real mode programs produce stack traces?

I'm messing around with running old DOS programs in an emulator, and I've gotten to the point where I'd like to trace the program's stack. However, I'm running into a problem, specifically how to detect near calls and far calls. Some context:
A near call pushes only the IP onto the stack, and is expected to be paired with a ret which pops only the IP to return to.
A far call pushes both the CS and IP onto the stack, and is expected to be paired with a retf which pops both the CS and IP to return to.
There is no way to know whether a call is a near call or a far call, except by knowing which kind of instruction called it, or which return it uses.
Luckily, for the period this program was developed in, BP-based stack frames were very common, so walking the stack doesn't seem to be a problem: I just follow the BP-chain. Unfortunately, getting the CS and/or IP is difficult, because there doesn't seem to be any way for me to determine whether a call is a near call or a far call by looking at the stack alone.
I have metadata about functions available, so I can tell whether a function is a near or far call if I already know the actual CS and IP, but I can't figure out the IP and CS unless I already know if it's a far call or near call.
I'm having a little success by just guessing and seeing if my guess results in a valid function lookup, but I think this method will produce a lot of false positives.
So my question is this: How did debuggers of the DOS era deal with this problem and produce stack traces? Is there some algorithm for this I'm missing, or did they just encode debug information in the stack? (If this is the case, then I'll have to come up with something else.)
Just a guess, I've never actually used 16-bit x86 development tools (modern or back in the day):
You know the CS:IP value of the current function (or one that triggered a fault or whatever from an exception frame).
You might have metadata that tells you whether this is a "far" function that's called with a far call or not. Or you could attempt decoding until you get to a retn or retf, and use that to decide whether the return address is a near IP or a far CS:IP.
(Assuming this is a normal function that returns with some kind of ret. Or if it ends with a jmp tailcall to another function, then the return address probably matches that, but that's another level of assumptions. And figuring out that a near jmp is the end of a function instead of just a jump within a large function is an ambiguous problem without any symbol metadata.)
But anyway, apply the same thing to the parent function: after one level of successful backtracing, you now have the CS:IP of the instruction after the call in your parent function, and the SS:BP value of the BP linked list.
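A sketch of that walk, under the assumptions above. The memory-read and metadata-lookup hooks are hypothetical placeholders for whatever the emulator provides, and every frame is assumed to be a conventional push bp / mov bp,sp frame:

#include <cstdint>
#include <vector>

using std::uint16_t;

// Hypothetical emulator hooks; these names are assumptions, not a real API.
uint16_t ReadWord(uint16_t seg, uint16_t off);          // read a word of guest memory
bool     LooksLikeFunction(uint16_t cs, uint16_t ip);   // does CS:IP land in a known function?
bool     IsFarFunction(uint16_t cs, uint16_t ip);       // metadata: was this function far-called?

struct Frame { uint16_t cs, ip, bp; };

// Walk the BP chain. The near/far status of the *current* function decides how
// to read its own return address: [BP+2] is always the return IP, and a far
// function additionally has the return CS at [BP+4].
std::vector<Frame> Backtrace(uint16_t ss, uint16_t bp, uint16_t cs, uint16_t ip)
{
    std::vector<Frame> frames;
    frames.push_back({cs, ip, bp});

    while (bp != 0) {
        uint16_t savedBp = ReadWord(ss, bp);        // [BP+0] = caller's saved BP
        uint16_t retIp   = ReadWord(ss, bp + 2);    // [BP+2] = return IP

        if (IsFarFunction(cs, ip))                  // far callee: caller pushed CS too
            cs = ReadWord(ss, bp + 4);

        ip = retIp;
        bp = savedBp;

        if (!LooksLikeFunction(cs, ip))             // the guess failed; stop walking
            break;
        frames.push_back({cs, ip, bp});
    }
    return frames;
}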
And BTW, yes there's a very good reason for legacy BP stack frames being widely used: [SP] isn't a valid 16-bit addressing mode, and only [BP] as a base implies SS as a segment, so yes, using BP for access to the stack was the only good option for random access (not just push/pop for temporaries). No reason not to save/restore it first (before any other registers or reserving stack space) to make a conventional stack-frame.

C++ function for "kernel memory check" before accessing

I am developing a simple device driver for study. With a lot of testing, I am creating so many errors that my computer finally ends up at a blue screen. I am sure the reason for this is an invalid memory access. So now I want to check whether my code can access kernel memory before going further.
My question is: which function can check whether a kernel-memory address is accessible? For instance, there is a pointer to a structure with two fields, and I want to access the first field, which is also a pointer, but I do not know whether it really holds an accessible value or just a garbage value.
In this context, I need to check it first to make sure that I do not get a blue screen.
Thanks in advance!
This is impossible for kernel memory. You must know for certain that a kernel address is valid and that it will remain valid for the duration of the access. If you get an address from user mode, you can and must use ProbeForRead or ProbeForWrite, but that only works for user-mode buffers; for any kernel-mode address (even a valid, resident one) these functions simply raise an exception. There is no protection at all against an invalid access to a kernel-mode address: a try/except handler does not help here, you just get a PAGE_FAULT_IN_NONPAGED_AREA bug check.
For instance, there is a pointer to a structure with two fields and I want
to access the first field, which is also a pointer, but I do not know
whether it really holds an accessible value or just a garbage value.
Where did you get this pointer from? Who fills in this structure? You must not check; you must know from the start that the pointer is valid and that the contents of the structure will remain valid for as long as you use them. Otherwise your code is already wrong and buggy.
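For the user-mode-buffer case mentioned above, a minimal kernel-mode sketch (the function name and calling context are assumptions; only ProbeForRead, RtlCopyMemory and the SEH pattern are real) looks like this:

#include <ntddk.h>

// Sketch: copy a buffer that arrived from user mode (e.g. a METHOD_NEITHER
// IOCTL pointer). ProbeForRead validates that the range lies in user space;
// the copy stays inside __try because the pages can still go away afterwards.
NTSTATUS CopyFromUserBuffer(void* UserBuffer, SIZE_T Length, void* KernelBuffer)
{
    NTSTATUS status = STATUS_SUCCESS;

    __try {
        ProbeForRead(UserBuffer, Length, 1 /* alignment */);
        RtlCopyMemory(KernelBuffer, UserBuffer, Length);
    }
    __except (EXCEPTION_EXECUTE_HANDLER) {
        status = GetExceptionCode();   // e.g. STATUS_ACCESS_VIOLATION
    }

    return status;
}

Note that this only guards against a bad user-mode pointer; as the answer says, a bad kernel-mode pointer still bugchecks.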

Stack Overflow in C function call - MS Visual C++ 2010 Express

I have written a function in C, which, when called, immediately results in a stack overflow.
Prototype:
void dumpOutput( Settings *, char **, FILE * );
Calling line:
dumpOutput( stSettings, sInput, fpOut );
At the time of calling it, stSettings is already a pointer to a Settings structure, sInput is a dynamically allocated 2D array, and fpOut is a FILE *. It reaches all the way to the calling line without any errors, no memory leaks, etc.
The actual function is rather lengthy and I think it is not worth sharing here, as the overflow occurs just as the code enters the function (the prologue, I think).
I have tried calling the same function directly from main() with dummy variables to check whether there is any problem with the passed arguments, but it still triggers the stack overflow.
The error arises from chkstk.asm when the function is called. This asm file (according to the comments in it) probes the stack to check / allocate the memory for the called function. It just keeps jumping to the "Find next lower page and probe" part until the stack overflow occurs.
The local variables in dumpOutput are not memory beasts either, just 6 integers and 2 pointers.
The memory used by the program at the point of entering this function is 60,936K, which increases to 61,940K at the point when the stack overflow occurs. Most of this memory goes into sInput. Is this the cause of the error? I don't think so, because only its pointer is being passed. Secondly, I fail to understand why dumpOutput is trying to allocate 1,004K of memory on the stack.
I am totally at a loss here. Any help will be highly appreciated.
Thanks in advance.
By design, it is _chkstk()'s job to generate a stack overflow exception. You can diagnose it by looking at the generated machine code. After you step into the function, right-click the edit window and click Go To Disassembly. You ought to see something similar to this:
003013B0 push ebp
003013B1 mov ebp,esp
003013B3 mov eax,1000D4h ; <== here
003013B8 call #ILT+70(__chkstk) (30104Bh)
The value passed through the EAX register is the important one; that's the amount of stack space your function needs. Chkstk then verifies it is actually available by probing the pages of stack. If you see it repeatedly looping, then the value for EAX in your code is high. Like mine, it is guaranteed to consume all bytes of the stack, and more. Which is what it protects against: normally you'd get an access violation exception, but there's no guarantee; your code may accidentally write to a mapped page that belongs to, say, the heap, which would produce an incredibly difficult bug to diagnose. Chkstk() helps you find these bugs before you blow your brains out in frustration.
I simply did it with this little test function:
void test()
{
    char kaboom[1024 * 1024];   /* 1 MB of locals forces _chkstk to probe (and blow) the stack */
}
We can't see yours, but the exception says that you either have a large array as a local variable or you are passing a large value to _alloca(). Fix by allocating that array from the heap instead.
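One way to apply that fix to the test function above (a sketch; in plain C, malloc and free do the same job):

#include <vector>

void test()
{
    // The 1 MB buffer now lives on the heap, so the stack frame stays tiny
    // and _chkstk has nothing large to probe.
    std::vector<char> kaboom(1024 * 1024);
    // ... use kaboom.data() wherever the local array was used ...
}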
Most likely a stack corruption or recursion error, but it's hard to answer without seeing any code.

Some Windows API calls fail unless the string arguments are in the system memory rather than local stack

We have an older massive C++ application and we have been converting it to support Unicode as well as 64-bits. The following strange thing has been happening:
Calls to registry functions and windows creation functions, like the following, have been failing:
hWnd = CreateSysWindowExW( ExStyle, ClassNameW.StringW(), Label2.StringW(), Style,
Posn.X(), Posn.Y(),
Size.X(), Size.Y(),
hParentWnd, (HMENU)Id,
AppInstance(), NULL);
ClassNameW and Label2 are instances of our own Text class which essentially uses malloc to allocate the memory used to store the string.
Anyway, when the functions fail, and I call GetLastError it returns the error code for "invalid memory access" (though I can inspect and see the string arguments fine in the debugger). Yet if I change the code as follows then it works perfectly fine:
BSTR Label2S = SysAllocString(Label2.StringW());
BSTR ClassNameWS = SysAllocString(ClassNameW.StringW());
hWnd = CreateSysWindowExW( ExStyle, ClassNameWS, Label2S, Style,
Posn.X(), Posn.Y(),
Size.X(), Size.Y(),
hParentWnd, (HMENU)Id,
AppInstance(), NULL);
SysFreeString(ClassNameWS); ClassNameWS = 0;
SysFreeString(Label2S); Label2S = 0;
So what gives? Why would the original functions work fine with the arguments in local memory, but when used with Unicode the registry functions require SysAllocString, and when used in 64-bit the window creation functions also require SysAllocString'd string arguments? Our window procedure functions have all been converted to be Unicode, always, and yes, we use SetWindowLongW, call the correct default Unicode DefWindowProcW, etc. That all seems to work fine and handles and draws Unicode properly.
The documentation at http://msdn.microsoft.com/en-us/library/ms632679%28v=vs.85%29.aspx does not say anything about this. While our application is massive we do use debug heaps and tools like Purify to check for and clean up any memory corruption. Also at the time of this failure, there is still only one main system thread. So it is not a thread issue.
So what is going on? I have read that if string arguments are marshalled anywhere or passed across process boundaries, then you have to use SysAllocString/BSTR, yet we call lots of API functions, and there is lots of code out there that calls these functions with plain local strings.
What am I missing? I have tried Googling this, as someone else must have run into this, but with little luck.
Edit 1: Our StringW function does not create any temporary objects which might go out of scope before the actual API call. The function is as follows:
class Text {
public:
    const wchar_t* StringW() const
    {
        return TextStartW;
    }

    wchar_t* TextStartW;   // pointer to current start of text in DataArea
};
I have been running our application with the debug heap and memory checking and other diagnostic tools, and found no source of memory corruption, and looking at the assembly, there is no sign of temporary objects or invalid memory access.
BUT I finally figured it out:
We compile our code with /Zp1, which packs structure members on byte boundaries, so our string pointers are not guaranteed any particular alignment. SysAllocString (in 64-bit) always returns a pointer that is aligned on an 8-byte boundary. Presumably a 32-bit ANSI C++ application goes through an API layer to the underlying Unicode Windows DLLs, which would also align the pointer for you.
But if you use Unicode, you do not get that incidental pointer alignment that the conversion mapping layer gives you, and if you use 64-bits, of course the situation will get even worse.
I added a method to our Text class which shifts the string pointer so that it is aligned on an eight-byte boundary, and voilà, everything runs fine!
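A sketch of what such a helper could look like (this is not the poster's actual method; it assumes the buffer has at least 7 spare bytes after the terminator so the string can be slid forward):

#include <cstdint>
#include <cstring>
#include <cwchar>

// Slide a wide string forward inside its own buffer so the first character
// sits on an 8-byte boundary, and return the aligned pointer.
wchar_t* AlignStringTo8(wchar_t* text)
{
    std::uintptr_t addr    = reinterpret_cast<std::uintptr_t>(text);
    std::uintptr_t aligned = (addr + 7) & ~static_cast<std::uintptr_t>(7);
    std::size_t    shift   = static_cast<std::size_t>(aligned - addr);   // 0..7 bytes

    if (shift != 0) {
        std::size_t bytes = (std::wcslen(text) + 1) * sizeof(wchar_t);
        std::memmove(reinterpret_cast<char*>(text) + shift, text, bytes);
    }
    return reinterpret_cast<wchar_t*>(aligned);
}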
Of course the Microsoft people say it must be memory corruption and I am jumping to the wrong conclusion, but there is evidence that this is not the case.
Also, if you use /Zp1 and include windows.h in a 64-bit application, the debugger will tell you sizeof(BITMAP)==28, but calling GetObject on a bitmap will fail and tell you it needs a 32-byte structure. So I suspect that some of Microsoft's API is inherently dependent on aligned pointers, and I also know that some optimized assembly (I have seen some from Fortran compilers) takes advantage of that and crashes badly if you ever give it unaligned pointers.
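The sizeof(BITMAP) discrepancy is easy to reproduce with a struct that mirrors BITMAP's field layout (a sketch using plain integer types; on x64 the four LONGs are 4 bytes each, the two WORDs 2 bytes each, and the LPVOID 8 bytes):

#include <cstdio>

#pragma pack(push, 1)               // what /Zp1 does for the whole build
struct PackedBitmap {
    int   bmType, bmWidth, bmHeight, bmWidthBytes;   // 4 x 4 bytes
    short bmPlanes, bmBitsPixel;                     // 2 x 2 bytes
    void* bmBits;                                    // 8 bytes on x64
};
#pragma pack(pop)

struct NaturalBitmap {              // default member alignment
    int   bmType, bmWidth, bmHeight, bmWidthBytes;
    short bmPlanes, bmBitsPixel;
    void* bmBits;                   // padded up to the next 8-byte offset
};

int main()
{
    std::printf("packed:  %zu\n", sizeof(PackedBitmap));    // 28 on x64, what /Zp1 code sees
    std::printf("natural: %zu\n", sizeof(NaturalBitmap));   // 32 on x64, what GetObject expects
}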
So the moral of all of this is: don't use "funky" compiler arguments like /Zp1. In our case we have to for historical reasons, but the number of times this has bitten us...
Someone please give me a "this is useful" tick on my answer?
Using a bit of psychic debugging, I'm going to guess that the strings in your application are pooled in a read-only section.
It's possible that CreateSysWindowsEx is attempting to write to the memory passed in for the window class or title. That would explain why the calls work when the strings are allocated on the heap (SysAllocString) but not when they are constants.
The easiest way to investigate this is to use a low-level debugger like WinDbg; it should break in at the point where the access violation occurs, which should help figure out the problem. Don't use Visual Studio; it has a nasty habit of being helpful and hiding first-chance exceptions.
Another thing to try is to enable appverifier on your application - it's possible that it may show something.
Calling a Windows API function does not cross the process boundary, since the various Windows DLLs are loaded into your process.
It sounds like whatever pointer StringW() is returning isn't valid when Windows tries to access it. I would look there: is it possible that the pointer returned went out of scope and was deleted shortly after the call?
If you share some more details about your string class, that could help diagnose the problem here.

How to know that my callstack is wrong?

How can I recognize that the callstack shown by the debugger when my program crashes may be wrong and misleading? For example, when the debugger says the following frames may be missing or incorrect, what does that actually mean? Also, what does the + number after the function name in the callstack mean:
kernel32!LoadLibrary + 0x100 bytes
Should this number be important to me, and is it true that if this number is big the callstack may be incorrect?
Sorry if I am asking something trivial and obvious.
Thank you all.
Generally, you can trust your callstack to be correct.
However, if you re-throw exceptions explicitly instead of allowing them to bubble up the callstack naturally, the actual error can be hidden from the stack trace.
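In C++ terms (a generic illustration, not the asker's code), once an exception has been caught and re-thrown, the frames between the original throw and the handler are gone, so whatever records the failure afterwards sees the re-throw site rather than the real origin:

#include <stdexcept>

void parse()                         // the real source of the error
{
    throw std::runtime_error("bad input");
}

void load()
{
    try {
        parse();
    }
    catch (const std::exception&) {
        // The stack has already been unwound to this handler, so a
        // re-thrown exception appears to originate in load(), not parse().
        throw;
    }
}

int main()
{
    load();   // letting it go unhandled: the reported stack shows load()
}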
To start with the second question: kernel32!LoadLibrary + 0x100 bytes means that the address lies inside the function LoadLibrary, 0x100 bytes past its start; apparently there was no symbolic information identifying the location more exactly. This in itself is no reason for the callstack to be corrupted.
A call stack may be corrupted if functions overwrite values on the stack (e.g. by a buffer overflow). This would likely show up as something like '0x41445249' (if it were my name doing the overwriting) as a call address, i.e. something outside your program's memory ranges.
A way to diagnose the cause of your crash would be to set breakpoints on functions identified by the call stack. Or use your debugger to backtrace (depending on debugger & system). It is interesting to find out what arguments were included in the calls. Pointers are generally a good start (NULL pointers, uninitialized pointers). Good luck.
