How to handle seg faults under Windows? - windows

How can a Windows application handle segmentation faults? By 'handle' I mean intercept them and perhaps output a descriptive message. Also, the ability to recover from them would be nice too, but I assume that is too complicated.

Let it crash and let Windows Error Reporting handle it. On Vista and later, you should also consider registering with Restart Manager (http://msdn.microsoft.com/en-us/library/aa373347(VS.85).aspx), so that you have a chance to save the user's work and restart the application (as Word, Excel, etc. do).

Use SEH for early exception handling,
and use SetUnhandledExceptionFilter to show a descriptive message.

If you add the /EHa compiler argument then try {} catch(...) will catch all exceptions for you, including SEH exceptions.
You can also use __try {} __except {}, which gives you more flexibility over what to do when an exception is caught. Putting a __try {} __except {} around your entire main() function is somewhat equivalent to using SetUnhandledExceptionFilter().
That being said, you should also use the proper terminology: "seg-fault" is a UNIX term. There are no segmentation faults on Windows. On Windows they are called "Access Violation Exceptions".
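A minimal sketch of the __try {} __except {} approach, assuming MSVC (or another compiler that defines _MSC_VER and supports these keywords). The helper name try_write is made up for illustration, and the #else branch is only a stand-in so the file still compiles elsewhere; it is not real fault handling.

```cpp
#if defined(_MSC_VER)
#include <windows.h>
#endif

// Hypothetical helper: attempts "*p = value" and reports whether it succeeded.
bool try_write(int* p, int value)
{
#if defined(_MSC_VER)
    __try {
        *p = value;  // raises an access violation if p is not writable
        return true;
    }
    // The filter expression decides which exceptions this handler takes:
    // only access violations are handled, everything else keeps searching.
    __except (GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION
                  ? EXCEPTION_EXECUTE_HANDLER
                  : EXCEPTION_CONTINUE_SEARCH) {
        return false;
    }
#else
    // Assumption: no SEH available, so this stand-in only rejects null.
    if (p == nullptr) {
        return false;
    }
    *p = value;
    return true;
#endif
}
```

Recovering like this is only reasonable when the guarded region is small and you know exactly what state the program is in once the handler runs.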

C++ self-contained example on how to use SetUnhandledExceptionFilter, triggering a write fault and displaying a nice error message:
#include <windows.h>
#include <cstdlib>
#include <sstream>
// Top-level filter: called when no other handler has dealt with the exception.
LONG WINAPI TopLevelExceptionHandler(PEXCEPTION_POINTERS pExceptionInfo)
{
    std::stringstream s;
    s << "Fatal: Unhandled exception 0x" << std::hex
      << pExceptionInfo->ExceptionRecord->ExceptionCode << std::endl;
    MessageBoxA(NULL, s.str().c_str(), "my application", MB_OK | MB_ICONSTOP);
    exit(1);
    return EXCEPTION_CONTINUE_SEARCH; // not reached; satisfies the return type
}
int main()
{
    SetUnhandledExceptionFilter(TopLevelExceptionHandler);
    int *v = 0;
    v[12] = 0; // should trigger the fault
    return 0;
}
Tested successfully with g++ (and should work OK with MSVC++ as well)

What you want to do here depends on what sort of faults you are concerned with. If you have sloppy code that is prone to more or less random General Protection Violations, then @Paul Betts's answer is what you need.
If you have code that has a good reason to dereference bad pointers, and you want to recover, start from @whunmr's suggestion about SEH. You can handle the fault and indeed recover, provided you have clear enough control of your code to know exactly what state it is in at the point of the fault and how to go about recovering.

Similar to Jean-François Fabre's solution, but with POSIX code in MinGW-w64. Note that the program must exit: it can't recover from the SIGSEGV and continue.
#include <stdio.h>
#include <signal.h>
#include <stdlib.h>
void sigHandler(int s)
{
    printf("signal %d\n", s);
    exit(1);
}
int main()
{
    signal(SIGSEGV, sigHandler);
    int *v = 0;
    *v = 0; // trigger the fault
    return 0;
}

Related

May the translation-function set with _set_se_translator just return without throwing?

May the translation-function set with _set_se_translator just return without throwing?
If so, would this mean that the further processing goes the way of normal SEH-processing?
[EDIT]: I tried it out myself:
#include <Windows.h>
#include <eh.h> // declares _set_se_translator
#include <iostream>
#include <stdexcept>
using namespace std;
int main()
{
    // Translator that returns without throwing anything
    _set_se_translator( []( unsigned int, EXCEPTION_POINTERS * ) { } );
    __try
    {
        RaiseException( EXCEPTION_IN_PAGE_ERROR, 0, 0, nullptr );
    }
    __except( EXCEPTION_EXECUTE_HANDLER )
    {
        cout << "caught" << endl;
    }
}
Is this specified to work?
From the documentation (added emphasis mine):
Your translator function should do no more than throw a C++ typed
exception. If it does anything in addition to throwing (such as
writing to a log file, for example) your program might not behave as
expected because the number of times the translator function is
invoked is platform-dependent.
If we take this completely literally, then a translator function should not return, as this is doing something 'more' than throwing a typed exception. However, I can find no specific mention in that document (or any related ones) that the function should never return, and neither does the function's prototype specify the [[noreturn]] attribute (though that, in itself, may not mean very much).

Calling pthread_cond_destroy results in "Function not implemented" ENOSYS on macOS

I am trying to make some Linux-based code run on macOS. It is the POSIX OSAL layer for NASA Core Flight System as found here: https://github.com/nasa/osal.
I am observing that the code uses POSIX conditions and in particular, there is a call like the following:
if (pthread_cond_destroy(&(sem->cv)) != 0) {
    printf("pthread_cond_destroy %d %s\n", errno, strerror(errno)); // my addition
    ...
}
On macOS, the tests related to this code provided in the OSAL repository always fail because the call to pthread_cond_destroy always results in:
pthread_cond_destroy 78 Function not implemented
I found an example in the Apple documentation (Threading Programming Guide / Synchronization / Using Conditions) that shows how to use conditions. That example never calls pthread_cond_destroy, but because it is simplified I cannot conclude whether the call should be there or not.
This is what the header looks like on my machine:
__API_AVAILABLE(macos(10.4), ios(2.0))
int pthread_cond_destroy(pthread_cond_t *);
I am wondering whether the pthread_cond_* functionality is simply missing on macOS, so that I have to implement a replacement for it, or whether there is some way to make it work.
EDIT: The minimal example below works fine for me, so the problem must be somewhere around the problematic code. What I still don't understand is why I am getting the ENOSYS (78) error code; for one thing, it is not mentioned on the man page man/3/pthread_cond_destroy:
#include <cassert>  // assert
#include <cerrno>   // errno
#include <iostream>
#include <pthread.h>
int main() {
    pthread_cond_t condition;
    pthread_cond_init(&condition, NULL);
    int result = pthread_cond_destroy(&condition);
    assert(result == 0);
    assert(errno == 0);
    std::cout << "Hello, World!" << std::endl;
    return 0;
}
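One thing worth checking in the failing code: POSIX specifies that pthread_cond_init and pthread_cond_destroy report failure through their return value (an error number) and does not say that they set errno. The errno printed in the question could therefore be a stale value left by some earlier call. This is a guess about the failing code, not a confirmed diagnosis. A minimal sketch of checking the return value instead, where the helper name init_and_destroy_cond is made up for illustration:

```cpp
#include <cstdio>
#include <cstring>
#include <pthread.h>

// Hypothetical helper: initializes and destroys a condition variable.
// Returns 0 on success, or the error number returned by the failing call.
int init_and_destroy_cond()
{
    pthread_cond_t cv;
    int rc = pthread_cond_init(&cv, NULL);
    if (rc != 0) {
        // Use the returned code, not errno, to describe the failure.
        std::fprintf(stderr, "pthread_cond_init: %d %s\n", rc, std::strerror(rc));
        return rc;
    }
    rc = pthread_cond_destroy(&cv);
    if (rc != 0) {
        std::fprintf(stderr, "pthread_cond_destroy: %d %s\n", rc, std::strerror(rc));
    }
    return rc;
}
```

If the code printed by this pattern differs from 78, the ENOSYS most likely came from an unrelated earlier call.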

std::lock how do I know it failed

It is not clear from the documentation. This template function returns void. The document mentions -
If the function cannot lock all objects, the function first unlocks
all objects it successfully locked (if any) before failing.
But how should the caller know that it has failed?
Does it block until it succeeds, with an exception being the only failure scenario?
It throws an exception on any failure.
As a couple of other SO members have mentioned to me in the past on my own questions, steer away from CPlusPlus.com, the canonical reference for misinformation.
Please take this as an opportunity to learn the differences between C and C++. C requires return codes or side effects on function arguments, while C++ offers exceptions in addition to those.
Parameters
(none)
Return value
(none)
Exceptions
Throws std::system_error when errors occur, including errors from the
underlying operating system that would prevent lock from meeting its
specifications. The mutex is not locked in the case of any exception
being thrown.
Notes
lock() is usually not called directly: std::unique_lock and
std::lock_guard are used to manage exclusive locking.
Example
This example shows how lock and unlock can be used to protect shared
data.
#include <iostream>
#include <chrono>
#include <thread>
#include <mutex>
int g_num = 0; // protected by g_num_mutex
std::mutex g_num_mutex;
void slow_increment(int id)
{
    for (int i = 0; i < 3; ++i) {
        g_num_mutex.lock();
        ++g_num;
        std::cout << id << " => " << g_num << '\n';
        g_num_mutex.unlock();
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}
int main()
{
    std::thread t1(slow_increment, 0);
    std::thread t2(slow_increment, 1);
    t1.join();
    t2.join();
}
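The documentation quoted above describes std::mutex::lock, while the question is about the free function std::lock. As a hedged sketch of the latter: std::lock blocks until it holds every lockable passed to it, and if one of the underlying lock() calls throws, it releases whatever it had already acquired and lets the exception propagate. The Account type and the transfer function below are made up for illustration.

```cpp
#include <mutex>
#include <system_error>

struct Account {  // hypothetical type for illustration
    std::mutex m;
    int balance = 0;
};

// Moves amount between two distinct accounts.
// Returns false if locking failed; no mutex is held in that case.
bool transfer(Account& from, Account& to, int amount)
{
    try {
        std::lock(from.m, to.m);  // blocks until both mutexes are held
    } catch (const std::system_error&) {
        return false;  // failure is reported only through the exception
    }
    // Adopt the already-held mutexes so they are released on scope exit.
    std::lock_guard<std::mutex> hold_from(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> hold_to(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
    return true;
}
```

So the caller learns about failure only by catching the exception; a normal return means every mutex is locked.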

main() program won't exit normally

My C++11 main() program for DiGSE is:
int main(int argc, char* argv[]) {
. . .
return EXIT_SUCCESS;
} // this } DOES match the opening { above
It compiles and executes correctly. A print statement immediately before the return outputs normally. However, a Windows 7.1 notification pops up saying "DiGSE.exe has stopped working." It then graciously offers to search the web for a solution.
I tried replacing the return with return 0;, with exit(0);, and with nothing at all, so that execution falls off the end of main() (which, as I understand it, is acceptable). However, in all cases I still get the pop-up.
What do I do to get the main() to exit gracefully?
DiGSE is just the name of the Windows 7 executable compiled on MinGW 4.9.2. The "full" program is already stripped down:
int main(int argc, char* argv[]) {
    try {
        DiGSE::log_init(DiGSE::log_dest_T::console_dest, "dig.log", true,
                        DiGSE::log_lvl_T::trace_lvl);
    }//try
    catch (const std::exception& ex) {
        std::cerr << FMSG("\n"
                          "Executing '%1%' raised this exception:\n"
                          " %2%", % DiGSE::Partition::productName()
                                  % ex.what())
                  << std::endl;
        return EXIT_FAILURE;
    }//exception
    catch (...) {
        std::cerr << FMSG("\n"
                          "Executing '%1%' instance raised an unknown exception.",
                          % DiGSE::Partition::productName())
                  << std::endl;
        return EXIT_FAILURE;
    }//exception
    L_INFO(FMSG("'%1% v%2%' terminated normally.",
                % DiGSE::Partition::productName()
                % DiGSE::Partition::productVersion()))
    return EXIT_SUCCESS;
}//main()
The L_INFO() is a logging call, which outputs as it should. The log_init() at the top initializes the log. Commenting out log_init() and L_INFO() has the same result as originally reported.
Program received signal SIGSEGV, Segmentation fault.
0x000000006fc8da9d in libstdc++-6!_ZNSo6sentryC1ERSo ()
from D:\Program Files\mingw-w64\x86_64-4.9.2-posix-seh-rt_v3-rev0\mingw64\bin\libstdc++-6.dll
This is what gdb returns while main() is exiting. It does this even with the log_init() and L_INFO() commented out. So the problem is probably in one of the globals of something it's linked to.
It is completely possible for a program to crash after the end of main -- the program isn't over yet. The following items execute after main() returns:
Registered atexit handlers
Destructors for main()'s own automatic variables, and all variables with static storage duration (globals and function-static) (C++ only)
DllMain(PROCESS_DETACH) code in all dynamic libraries you are using (Windows only)
In addition to that, various events can occur outside your program and cause failures which you might mistake for a failure of your program (especially if your program forks or spawns copies of itself):
SIGCHLD is raised (on *nix). Process handles become signaled and cause wait functions to return (on Windows)
All open handles (file descriptors) get abandoned, and the close handler in the driver is invoked
The other end of connections (pipes, sockets) shift into a disconnected state (reads return 0, writes fail, on *nix SIGHUP may be raised)
I suggest attaching a debugger, setting a breakpoint at the end of main, and then single-stepping through the cleanup code to find out where the failure occurs. Divide and conquer may also help (cut out some global variables, or all usage of a particular DLL).
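To make the first two items of the list concrete, here is a small sketch. All names (Tracer, g_tracer, on_exit_handler, register_and_run) are made up for illustration. It registers an atexit handler and defines a global object whose destructor prints a message, so both produce output after the function that stands in for the body of main() has returned.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper type that announces its construction and destruction.
struct Tracer {
    const char* name;
    explicit Tracer(const char* n) : name(n) { std::printf("construct %s\n", name); }
    ~Tracer() { std::printf("destroy %s\n", name); }
};

Tracer g_tracer("global");  // static storage: destroyed during program exit

void on_exit_handler() { std::puts("atexit handler"); }

// Stand-in for the body of main(): 'local' is destroyed when this returns,
// while the handler and g_tracer's destructor run only at program exit.
int register_and_run()
{
    Tracer local("local");
    return std::atexit(on_exit_handler);  // 0 means registration succeeded
}
```

If main() simply returns register_and_run(), the "atexit handler" and "destroy global" lines are printed only after main() has returned. A crash inside either would show up exactly like the one in the question.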

why this signal handler is called infinitely

I am using Mac OS 10.6.5 with g++ 4.2.1, and I have a problem with the following code:
#include <iostream>
#include <sys/signal.h>
using namespace std;
void segfault_handler(int signum)
{
    cout << "segfault caught!!!\n";
}
int main()
{
    signal(SIGSEGV, segfault_handler);
    int* p = 0;
    *p = 100;
    return 1;
}
It seems that segfault_handler is called endlessly and keeps printing:
segfault caught!!!
segfault caught!!!
segfault caught!!!
...
I am new to Mac development. Do you have any idea what is happening?
This is because after your signal handler returns, the instruction pointer (EIP) points back at the instruction that caused the SIGSEGV, so it executes again and SIGSEGV is raised again.
Usually ignoring SIGSEGV like you do is meaningless anyway - suppose the instruction actually read some value from a pointer to a register, what would you do? You don't have any 'correct' value to put in the register, so the following code will likely SIGSEGV again or, worse, trigger some logic error.
You should either exit the process when SIGSEGV happens, or return to a known safe point - longjmp should work, if you know that this is indeed the safe point (the only possible example that comes to mind is VM interpreters/JITs).
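A minimal POSIX sketch of the "return to a known safe point" option, using sigsetjmp/siglongjmp so that the faulting instruction is never re-executed. The names g_recover_point, on_segv and raised_segv are made up for illustration, and the sketch assumes a single-threaded program.

```cpp
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf g_recover_point;  // the "known safe point"

extern "C" void on_segv(int)
{
    // Jump back instead of returning, so the faulting code is not resumed.
    siglongjmp(g_recover_point, 1);
}

// Hypothetical helper: runs fn and reports whether SIGSEGV was raised in it.
bool raised_segv(void (*fn)())
{
    struct sigaction sa = {};
    struct sigaction old = {};
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    bool faulted;
    // Second argument 1: save the signal mask so siglongjmp restores it.
    if (sigsetjmp(g_recover_point, 1) == 0) {
        fn();
        faulted = false;
    } else {
        faulted = true;  // reached via siglongjmp from the handler
    }
    sigaction(SIGSEGV, &old, nullptr);  // restore the previous handler
    return faulted;
}
```

For example, raised_segv([] { raise(SIGSEGV); }) triggers the handler deterministically without dereferencing a bad pointer. Whether the program is really in a sane state after a genuine fault is a separate question, as the answer above points out. Note also that the handler in the question prints through cout, which is not async-signal-safe.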
Have you tried returning 0 instead of 1 in your program? Traditionally, values other than 0 indicate error. Also, does removing the two lines dealing with *p resolve it?

Resources