G-WAN terminating under load

I have a web app that needs to expose a RESTful interface, so I have a connection handler that rewrites the inbound request into something G-WAN can use.
Every connection to this service is the same, so I do a replace on every connection:
#include "gwan.h" // G-WAN exported functions
#include <stdio.h>
int init(int argc, char *argv[]){
u32 *states = (u32*)get_env(argv, US_HANDLER_STATES);
*states = 1 << HDL_AFTER_READ; // we assume "GET /hello" sent in one shot
return 0;
}
void clean(int argc, char *argv[]){}
int main(int argc, char *argv[])
{
const long state = (long)argv[0];
if(state == HDL_AFTER_READ) {
xbuf_t *read_xbuf = (xbuf_t*)get_env(argv, READ_XBUF);
xbuf_replfrto(read_xbuf, read_xbuf->ptr, read_xbuf->ptr + 16, "/classify.htm?", "/?boost.cpp&");
}
return 255;
}
The problem is that under load, after a little while, G-WAN crashes and gives me an error:
G-WAN 4.3.14 (pid:20477)
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct NULL not valid
Signal : 6:Abort
Signal src : -6:tkill
errno : 0
Thread : 2
Code Pointer: 7fc5dcd748a5 (module:libc.so.6, function:raise, line:0)
Access Address: 000000004ffd
Registers : EAX=000000000000 CS=00000033 EIP=7fc5dcd748a5 EFLGS=000000000202
EBX=0000006693e8 SS=0000000a ESP=7fc5d5c7bf38 EBP=7fc5880008d8
ECX=ffffffffffffffff DS=0000000a ESI=00000000504e FS=00000033
EDX=000000000006 ES=0000000a EDI=000000004ffd CS=00000033
Module :Function :Line # PgrmCntr(EIP) RetAddress FramePtr(EBP)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Aborted (core dumped)
The problem is only when I put the server under load. With one machine I'm hitting around 8000 requests/sec, and it lasts about 5 seconds before crashing.
If I DON'T do the rewrite (move main.c to main.c_) and call the cpp script directly, no crash...
Help! Any ideas?
Thanks

It was in a different part of the code. The biggest clue was that it looked like I was getting an unhandled C++ exception.
According to the docs, G-WAN is written in C, so the exception had to come from my own C++ code.
I dug into my code and added some exception handling around where I suspected it was crashing. I saw the exception, but the server kept going, which is what I wanted!
Snippet where I fixed it:
try {
get_arg("key=",&name,argc,argv);
string url = name;
boost::thread cls(classify_url,url,boost::ref(rp));
if(!cls.timed_join(boost::posix_time::milliseconds(8))) {
cls.interrupt();
rp="{\"c\": [998]}";
}
}
catch(...) {
rp="{\"c\": [997]}";
cout << "EXCEPTION" << endl;
}
Added the try/catch, and all is good!

Add proper return value checks; do not take anything for granted.
This looks like get_env fails and returns NULL, but there is a lot left for you to do (gdb, strace, ...) before somebody can help you.
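The what() text in the crash report, "basic_string::_S_construct NULL not valid", is what libstdc++ reports when a std::string is constructed from a null char pointer. In the snippet above, string url = name; would do exactly that if name is still NULL after get_arg, for instance when a request arrives without a key= parameter (that get_arg leaves name NULL in this case is an assumption here, not something the thread confirms). A minimal sketch of a guard, where string_or_default is a hypothetical helper name:

```cpp
#include <string>

// Hypothetical helper: build a std::string from a C string that may be null.
// Constructing std::string directly from a null pointer is undefined
// behaviour; libstdc++ reports it by throwing std::logic_error.
inline std::string string_or_default(const char* s,
                                     const std::string& fallback = std::string())
{
    return s ? std::string(s) : fallback;
}
```

With such a helper the handler could write string url = string_or_default(name); and answer with an error response when url is empty, instead of relying on catch(...) to absorb the exception.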

Related

boost asio: Is it thread safe to call tcp::socket::async_read_some() when handler is protected by a strand

I'm struggling to fully understand Boost ASIO and strands. I was under the impression that the call to socket::async_read_some() was safe as long as the handler was wrapped in a strand. This appears not to be the case, since the code eventually throws an exception.
In my situation a third party library is making the Session::readSome() calls. I'm using a reactor pattern with the ASIO layer under the third party library. When data arrives on the socket, the 3rd party is called to do the read. The pattern is used because it must be possible to abort the read operation at any time and have the 3rd party library error out and return its thread. The third party expects a blocking read, so the code mimics it with a condition variable.
Given the example below what is the proper way to do this? Do I need to wrap the async_read_some() call in a dispatch() or post() so it runs through a strand too?
Note: Compiler is c++14 ;-(
Example representative code:
Session::Session (ba::io_context& ioContext):
m_sessionStrand ( ioContext.get_executor() ),
m_socket ( m_sessionStrand )
{}
int32_t Session::readSome (unsigned char* pBuffer, uint32_t bufferSizeToRead, boost::system::error_code& errorCode)
{
// The 3rd party expects a synchronous read, so we mimic that behavior
// with an async_read and then wait for the results. With this pattern
// we can unblock the read elsewhere (for example by calling close on the socket)
// and still give the 3rd party the illusion of a synchronous read.
// In such cases the 3rd party will receive an error code
// on the read and return its thread.
// Nothing to do
if ( bufferSizeToRead == 0) return 0;
// Create a mutable buffer
ba::mutable_buffer buffer (pBuffer, bufferSizeToRead);
std::size_t result = 0;
errorCode.clear();
// Setup conditional
m_readerPause.exchange(true);
auto readHandler = [&result, &errorCode, self=shared_from_this()](boost::system::error_code ec, std::size_t bytesRead)
{
result = bytesRead;
errorCode = ec;
// Signal that we got results
std::unique_lock<std::mutex> lock{m_readerMutex};
m_readerPause.exchange(false);
m_readerPauseCV.notify_all();
};
m_socket.async_read_some(buffer, ba::bind_executor (m_sessionStrand, readHandler));
// We pause the 3rd party read thread until we get the read results back - or an error occurs
{
std::unique_lock<std::mutex> lock{m_readerMutex};
m_readerPauseCV.wait (lock, [this]{ return !m_readerPause.load(std::memory_order_acquire); } );
}
return result;
}
The exception occurs in epoll_reactor.ipp. There is a race condition between the read and closing the socket.
void epoll_reactor::start_op(int op_type, socket_type descriptor,
epoll_reactor::per_descriptor_data& descriptor_data, reactor_op* op,
bool is_continuation, bool allow_speculative)
{
if (!descriptor_data)
{
op->ec_ = boost::asio::error::bad_descriptor;
post_immediate_completion(op, is_continuation);
return;
}
mutex::scoped_lock descriptor_lock(descriptor_data->mutex_);
if (descriptor_data->shutdown_) //!! SegFault here: descriptor_data == NULL*
{
post_immediate_completion(op, is_continuation);
return;
}
...
}
Thanks in advance for any insights in the proper way to handle this situation using ASIO.
The strand doesn't "protect" the handler. Instead, it protects some shared state (which you control) by synchronizing handler execution. It's exactly like a mutex for async execution.
According to this logic all code running on the strand can touch the shared resources, and conversely, code not guaranteed to be on the strand can not be allowed to touch them.
In your code, the shared resources consist of at least buffer, result, m_socket. It would be more complete to include the m_sessionStrand, m_readerPauseCV, m_readerMutex, m_readerPause but all of these are implicitly threadsafe the way they are used¹.
Your code looks to do things safely in these regards. However it makes a few unfortunate detours that make it harder than necessary to check/reason about the code:
it uses more (local) shared state to communicate results from the handler
it doesn't make explicit what the mutex and/or the strand protect
it employs both a mutex and a strand which conceptually compete for the same responsibility
it employs both a condition and an atomic bool, which again compete for the same responsibility
it does manual strand binding, which muddies the expectations about what the native executor for the m_socket object is expected to be
the initial read is not protected. This means that if Session::readSome is invoked from a "wild" thread, it will use member functions without synchronizing with any other operations that may be pending on the m_socket.
the atomic_bool mutations are spelled in Very Convoluted Ways(TM), which serve to show you (presumably) understand the memory model, but make the code harder to review without tangible merit. Clearly, the blocking synchronization will (far) outweigh any benefit of explicit memory acquisition order. I suggest to at least "normalize" the spelling as atomic_bool was explicitly designed to afford:
//m_readerPause.exchange(true);
m_readerPause = true;
and
m_readerPauseCV.wait(lock, [this] { return !m_readerPause; });
since you are emulating blocking IO, there is no merit in capturing shared_from_this() in the lambda. Lifetime should be guaranteed by the calling party anyway.
Interestingly, you didn't show the this capture, which is required for the lambda to compile, assuming you didn't use global variables.
Kudos for explicitly clearing the error_code output variable. This is often forgotten. Technically, you did forget about it with the (questionable?) early exit when (bufferSizeToRead == 0)... You might have a slightly unorthodox caller contract where this makes sense.
To be generic I'd suggest to perform the zero-length read as it might behave differently depending on the transport connected.
Last, but not least, m_socket.[async_]read_some is rarely what you require on application protocol level. I'll leave this one to you, as you might have this exceptional edge-case scenario.
Simplifying
Conceptually, I'd like to write:
int32_t Session::readSome(unsigned char* buf, uint32_t size, error_code& ec) {
ec.clear();
size_t result = 0;
std::tie(ec, result) = m_socket
.async_read_some(ba::buffer(buf, size),
ba::as_tuple(ba::use_future))
.get();
return result;
}
This uses futures to get the blocking behaviour while being cancelable. Sadly, contrary to expectation there is currently a limitation that prevents combining as_tuple and use_future.
So, we either have to ignore partial success scenarios (a significant result even though ec is set):
int32_t Session::readSome(unsigned char* buf, uint32_t size, error_code& ec) try {
ec.clear();
return m_socket
.async_read_some(ba::buffer(buf, size), ba::use_future)
.get();
} catch (boost::system::system_error const& se) {
ec = se.code();
return 0;
}
I suspect that member-async_read_some doesn't have a partial success mode. However, let's still give it thought, seeing that I warned before that async_read_some is rarely what you need anyways:
int32_t Session::readSome(unsigned char* buf, uint32_t size, error_code& ec) {
std::promise<std::tuple<size_t, error_code> > p;
m_socket.async_read_some(ba::buffer(buf, size), [&p](error_code ec_, size_t n_) { p.set_value({n_, ec_}); });
size_t result;
std::tie(result, ec) = p.get_future().get();
return result;
}
Still considerably easier.
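To see the promise-based pattern in isolation, here is a minimal sketch that needs only the standard library. The names fake_async_read and blocking_read are hypothetical, and a plain std::thread stands in for the asynchronous socket operation; it is not Asio code, only an illustration of how a single promise can carry both the byte count and the error code back to the blocked caller.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <system_error>
#include <thread>
#include <utility>

using ReadHandler = std::function<void(std::error_code, std::size_t)>;

// Stand-in for an asynchronous read: invokes the handler from another thread.
// (Hypothetical; in the answer this role is played by m_socket.async_read_some.)
inline std::thread fake_async_read(std::size_t bytes, std::error_code ec,
                                   ReadHandler handler)
{
    return std::thread([=] { handler(ec, bytes); });
}

// Blocks the caller until the handler has delivered both values, so a partial
// success (bytes > 0 together with an error) is not lost.
inline std::pair<std::size_t, std::error_code>
blocking_read(std::size_t bytes, std::error_code ec)
{
    std::promise<std::pair<std::size_t, std::error_code>> p;
    auto fut = p.get_future();
    std::thread worker = fake_async_read(bytes, ec,
        [&p](std::error_code e, std::size_t n) { p.set_value(std::make_pair(n, e)); });
    auto result = fut.get(); // waits until set_value has been called
    worker.join();           // the promise must outlive the handler
    return result;
}
```

The promise is captured by reference, which is only safe because the caller stays blocked in get() until the handler has run; that is the same lifetime argument the answer makes about shared_from_this().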
Interim Result
Self contained example with the current approach:
Live On Coliru
#include <boost/asio.hpp>
namespace ba = boost::asio;
using ba::ip::tcp;
using boost::system::error_code;
using CharT = /*unsigned*/ char; // for ease of output...
struct Session : std::enable_shared_from_this<Session> {
tcp::socket m_socket;
Session(ba::any_io_executor ex) : m_socket(make_strand(ex)) {
m_socket.connect({{}, 7878});
}
int32_t readSome(CharT* buf, uint32_t size, error_code& ec) {
std::promise<std::tuple<size_t, error_code>> p;
m_socket.async_read_some(ba::buffer(buf, size), [&p](error_code ec_, size_t n_) {
p.set_value({n_, ec_});
});
size_t result;
std::tie(result, ec) = p.get_future().get();
return result;
}
};
#include <iomanip>
#include <iostream>
int main() {
ba::thread_pool ioc;
auto s = std::make_shared<Session>(ioc.get_executor());
error_code ec;
CharT data[10];
while (auto n = s->readSome(data, 10, ec))
std::cout << "Received " << quoted(std::string(data, n)) << " (" << ec.message() << ")\n";
ioc.join();
}
Testing with
g++ -std=c++14 -O2 -Wall -pedantic -pthread main.cpp
for resp in FOO LONG_BAR_QUX_RESPONSE; do nc -tln 7878 -w 0 <<< $resp; done&
set -x
sleep .2; ./a.out
sleep .2; ./a.out
Prints
+ sleep .2
+ ./a.out
Received "FOO
" (Success)
+ sleep .2
+ ./a.out
Received "LONG_BAR_Q" (Success)
Received "UX_RESPONS" (Success)
Received "E
" (Success)
External Synchronization (Cancellation?)
Now, the code not shown implies that other operations may act on m_socket, if only to cancel operations in flight³. If this situation arises, you have to add the missing synchronization, either using the mutex or the strand.
I suggest not introducing the competing synchronization mechanism, even though not "incorrect". It will
lead to simpler code
allow you to solidify your understanding of the use of the strand.
So, let's make sure that the operation runs on the strand:
int32_t readSome(CharT* buf, uint32_t size, error_code& ec) {
std::promise<size_t> p;
post(m_socket.get_executor(), [&] {
m_socket.async_read_some(ba::buffer(buf, size),
[&](error_code ec_, size_t n_) { ec = ec_; p.set_value(n_); });
});
return p.get_future().get();
}
void cancel() {
post(m_socket.get_executor(),
[self = shared_from_this()] { self->m_socket.cancel(); });
}
See it Live On Coliru
Exercising Cancellation
int main() {
ba::thread_pool ioc(1);
auto s = std::make_shared<Session>(ioc.get_executor());
std::thread th([&] {
std::this_thread::sleep_for(5s);
s->cancel();
});
error_code ec;
CharT data[10];
do {
auto n = s->readSome(data, 10, ec);
std::cout << "Received " << quoted(std::string(data, n)) << " (" << ec.message() << ")\n";
} while (!ec);
ioc.join();
th.join();
}
Again, Live On Coliru
¹ Technically in a multi-thread situation you need to notify the CV under the lock to allow for fair scheduling, i.e. to prevent waiter starvation. However your scenario is so isolated that you can get away with being somewhat sloppy.
² by default tcp::socket type-erases the executor with any_io_executor, but you could use basic_stream_socket<tcp, strand<io_context::executor_type> > to remove that cost if your executor type is statically known
³ Of course, POSIX sockets include full duplex scenarios, where read and write operations can be in flight simultaneously.
UPDATE: redirect_error
Just re-discovered redirect_error which allows something close to as_tuple:
auto readSome(CharT* buf, uint32_t size, error_code& ec) {
return m_socket
.async_read_some(ba::buffer(buf, size),
ba::redirect_error(ba::use_future, ec))
.get();
}
void cancel() { m_socket.cancel(); }
This only suffices when readSome and cancel are guaranteed to be invoked on the strand.

Standalone ASIO Asynchronous Not Connecting

ASIO seems like the best async cross-platform networking library for my project. However, I'm having trouble getting it to actually connect.
First off, I'm not using Boost. I'm compiling this on Windows for the time being, so I had to manually add definitions to inform ASIO that I'm using a C++11-compliant compiler.
Source.cpp
#define TCPCLIENT_DEBUG
#include "TCPClient.hpp"
#include <iostream>
#define PORT "1234"
#define HOST "127.0.0.1"
int main() {
DEBUG("Starting program...\n");
namespace ip = asio::ip;
asio::io_service io;
ip::tcp::resolver::query query(HOST, PORT);
ip::tcp::resolver resolver(io);
decltype(resolver)::iterator ep_iter = resolver.resolve(query);
TCPClient client(io, ep_iter);
try {
std::cin.get();
}
catch (const std::exception &e) { // mainly to catch Ctrl+C
std::cout << e.what() << std::endl;
}
return 0;
}
TCPClient.hpp
#ifndef TCPCLIENT_HPP
#define TCPCLIENT_HPP
#include <functional>
#if defined(_DEBUG) || defined(TCPCLIENT_DEBUG)
#include <iostream>
#define DEBUG(dbg_msg) std::cerr << dbg_msg
#else
#define DEBUG(dbg_msg)
#endif
#define ASIO_STANDALONE
#define ASIO_HAS_CSTDINT
#define ASIO_HAS_STD_ARRAY
#define ASIO_HAS_STD_ADDRESSOF
#define ASIO_HAS_STD_SHARED_PTR
#define ASIO_HAS_STD_TYPE_TRAITS
#include <asio.hpp>
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 1024
#endif
class TCPClient {
public:
TCPClient(asio::io_service& io, asio::ip::tcp::resolver::iterator endpoint_iter);
void on_connect(const asio::error_code& err);
private:
asio::io_service& m_io; // store the io service reference
asio::ip::tcp::socket m_sock; // object's socket
static const size_t bufSize{ BUFFER_SIZE }; // default buffer size
char m_buffer[bufSize]; // store the received data in a buffer
};
#endif//TCPCLIENT_HPP
TCPClient.cpp
#include "TCPClient.hpp"
TCPClient::TCPClient(asio::io_service& io, asio::ip::tcp::resolver::iterator endpoint_iter) : m_io{ io }, m_sock(io) {
asio::ip::tcp::endpoint endpoint = *endpoint_iter;
asio::error_code ec;
m_sock.async_connect(
endpoint,
std::bind(
&TCPClient::on_connect,
this,
std::placeholders::_1
)
);
}
void TCPClient::on_connect(const asio::error_code& err) {
DEBUG("Connected successfully!\n");
}
It seems to me that the on_connect is never being called. It only prints "Starting program...".
Using netcat, I can spawn a listener that sees the connection successfully go through.
What is obviously wrong with my code? I'm only working on the connection function for right now.
Handlers are only executed within threads that are currently running the io_service. As the io_service is never run, the connect handler is never executed. To resolve this, run the io_service by calling io_service::run():
TCPClient client(io, ep_iter);
try {
io.run();
}
catch (const std::exception &e) {
std::cout << e.what() << std::endl;
}
The "Using a timer asynchronously" tutorial notes the importance of running the io_service:
Finally, we must call the io_service::run() member function on the io_service object.
The asio library provides a guarantee that callback handlers will only be called from threads that are currently calling io_service::run(). Therefore unless the io_service::run() function is called the callback for the asynchronous wait completion will never be invoked.
The io_service::run() function will also continue to run while there is still "work" to do. In this example, the work is the asynchronous wait on the timer, so the call will not return until the timer has expired and the callback has completed.
It is important to remember to give the io_service some work to do before calling io_service::run(). For example, if we had omitted the above call to deadline_timer::async_wait(), the io_service would not have had any work to do, and consequently io_service::run() would have returned immediately.
By calling async_connect, you only register an asynchronous operation. You have to call io_service.run() explicitly somewhere (probably in main, instead of std::cin.get()) for the asynchronous operations to actually execute and their callbacks to be called.
Under the hood, asio uses epoll or something similar: it registers events it is interested in (a socket connection in your case) and then waits for the events to happen. io_service.run() is precisely the place where waiting is done.
I'd advise you to look at some boost::asio asynchronous tutorials, like this one.
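The point both answers make (starting an asynchronous operation only registers it, and handlers fire only inside run()) can be illustrated with a toy queue. This is a deliberately simplified stand-in written for this explanation, not Asio and not how io_service is implemented; ToyLoop and its members are hypothetical names.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy stand-in for an io_service (not Asio): post() only queues a handler,
// run() is the only place where queued handlers are actually invoked.
class ToyLoop {
public:
    void post(std::function<void()> handler) { queue_.push_back(std::move(handler)); }

    // Runs handlers until the queue is empty; returns how many were executed.
    // With nothing queued it returns immediately, like run() without work.
    std::size_t run() {
        std::size_t executed = 0;
        while (!queue_.empty()) {
            auto handler = std::move(queue_.front());
            queue_.pop_front();
            handler(); // a handler may post further handlers
            ++executed;
        }
        return executed;
    }

private:
    std::deque<std::function<void()>> queue_;
};
```

Posting a handler that sets a flag leaves the flag untouched until run() is called, which mirrors why on_connect never fires in the question: the program waits in std::cin.get() and never enters the loop.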

main() program won't exit normally

My C++ 2011 main() program for DiGSE is:
int main(int argc, char* argv[]) {
. . .
return EXIT_SUCCESS;
} // this } DOES match the opening { above
It compiles and executes correctly. A print statement immediately before the return outputs normally. However, a Windows 7.1 notification pops up saying "DiGSE.exe has stopped working." It then graciously offers to search the web for a solution.
I tried replacing the return with return 0; exit(0); and nothing so execution falls out the bottom (which, as I understand, is acceptable). However, in all cases I still get the pop-up.
What do I do to get the main() to exit gracefully?
DiGSE is just the name of the Windows 7 executable compiled on MinGW 4.9.2. The "full" program is already stripped down:
int main(int argc, char* argv[]) {
try {
DiGSE::log_init(DiGSE::log_dest_T::console_dest, "dig.log", true,
DiGSE::log_lvl_T::trace_lvl);
}//try
catch (const std::exception& ex) {
std::cerr << FMSG("\n"
"Executing '%1%' raised this exception:\n"
" %2%", % DiGSE::Partition::productName()
% ex.what())
<< std::endl;
return EXIT_FAILURE;
}//exception
catch (...) {
std::cerr << FMSG("\n"
"Executing '%1%' instance raised an unknown exception.",
% DiGSE::Partition::productName())
<< std::endl;
return EXIT_FAILURE;
}//exception
L_INFO(FMSG("'%1% v%2%' terminated normally.",
% DiGSE::Partition::productName()
% DiGSE::Partition::productVersion()))
return EXIT_SUCCESS;
}//main()
The L_INFO() is a logging call, which outputs as it should. The log_init() at the top initializes the log. Commenting out log_init() and L_INFO() has the same result as originally reported.
Program received signal SIGSEGV, Segmentation fault.
0x000000006fc8da9d in libstdc++-6!_ZNSo6sentryC1ERSo ()
from D:\Program Files\mingw-w64\x86_64-4.9.2-posix-seh-rt_v3-rev0\mingw64\bin
\libstdc++-6.dll
This is what gdb reports while main() is exiting. It does this even with log_init() and L_INFO() commented out, so the problem is probably in one of the globals of something it's linked to.
It is completely possible for a program to crash after the end of main -- the program isn't over yet. The following items execute after main() returns:
Registered atexit handlers
Destructors for main()'s own automatic variables, and all variables with static storage duration (globals and function-static) (C++ only)
DllMain(PROCESS_DETACH) code in all dynamic libraries you are using (Windows only)
In addition to that, various events can occur outside your program and cause failures which you might mistake for a failure of your program (especially if your program forks or spawns copies of itself):
SIGCHLD is raised (on *nix). Process handles become signaled and cause wait functions to return (on Windows)
All open handles (file descriptors) get abandoned, and the close handler in the driver is invoked
The other end of connections (pipes, sockets) shift into a disconnected state (reads return 0, writes fail, on *nix SIGHUP may be raised)
I suggest attaching a debugger, setting a breakpoint at the end of main, and then single-stepping through the cleanup code to find out where the failure occurs. Divide and conquer may also be helpful (cut out some global variables, or all usage of a particular DLL).
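A small probe can make this exit phase visible. The sketch below uses hypothetical names (ExitProbe, g_probe, install_exit_handler) and C stdio, so that it does not depend on the lifetime of any C++ stream object. The gdb frame in the question lies inside the std::ostream::sentry constructor, which would be consistent with something writing to a stream during this phase, but that is only a guess from the single frame shown.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical probe type: prints when a static object is constructed/destroyed.
struct ExitProbe {
    const char* name;
    explicit ExitProbe(const char* n) : name(n) { std::printf("construct %s\n", name); }
    ~ExitProbe() { std::printf("destroy %s (after main returned)\n", name); }
};

// Static storage duration: destroyed during exit processing, not inside main().
static ExitProbe g_probe("g_probe");

inline void on_exit_handler() { std::puts("atexit handler (after main returned)"); }

// Call once from main(); returns true if the handler was registered.
inline bool install_exit_handler() { return std::atexit(on_exit_handler) == 0; }
```

Because the handler is registered after g_probe has been constructed, it runs before g_probe's destructor; both run after main has returned. Adding such probes next to suspect globals is one way to apply the divide-and-conquer advice without a debugger.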

Windows boost asio: 10061 in async_receive_from on async_send_to

I have a fairly large application that works as desired on Linux. I've recently compiled it on Windows 7 using VC2012 and boost asio 1.52 and run into a strange issue:
An async_receive_from followed by an async_send_to on the same UDP socket results in the read completion handler being called with boost::system::error_code 10061:
No connection could be made because the target machine actively refused it
if the send destination is another port on the local host. If the packet is sent to another machine, the read completion handler is not called. After the read completion handler, the write completion handler is called with no error.
The following code replicates the issue:
#include <iostream>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
using namespace std;
using namespace boost::asio;
void read_completion_handler(const boost::system::error_code& ec, std::size_t bytes_received)
{
if (!ec)
cout << "Received " << bytes_received << " successfully" << endl;
else
cout << "Error: " << ec.message() << endl;
}
void write_completion_handler(const boost::system::error_code& ec, std::size_t bytes_transferred)
{
if (!ec)
cout << "Wrote " << bytes_transferred << " successfully" << endl;
else
cout << "Error: " << ec.message() << endl;
}
int main(int argc, char** argv)
{
enum
{
max_length = 1500,
out_length = 100
};
// buffer for incoming data
char data[max_length];
// outgoing data
char out_data[out_length];
// sender endpoint
ip::udp::endpoint sender_endpoint;
// for sending packets: if this is localhost, the error occurs
ip::udp::endpoint destination(ip::address::from_string("127.0.0.1"), 5004);
io_service ioService;
ip::udp::socket socket(ioService, ip::udp::endpoint(ip::udp::v4(), 49170));
socket.async_receive_from(
buffer(data, max_length), sender_endpoint,
boost::bind(&read_completion_handler,
boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred));
socket.async_send_to( boost::asio::buffer(out_data, out_length),
destination,
boost::bind(&write_completion_handler,
boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred));
ioService.run();
cout << "Done" << endl;
return 0;
}
On linux this is never an issue. Does anyone have an explanation? As far as I know, simultaneous reads and writes on the same socket should be ok or is this not the case on Windows? Why the change in behaviour if localhost is the destination?
Yes, it's about 6 months after you asked this question. I'm not even sure how I ended up here. I ran into this problem myself -- but the good news is that it's not a problem.
Some machines return a Destination Unreachable message through ICMP when they aren't listening on the port you sent your message to. Asio translates this to boost::system::errc::connection_refused and/or boost::system::errc::connection_reset. This is a meaningless error since UDP is connectionless. You can safely ignore these two error codes in your async_receive_from handler (ie, if you get one of these errors returned, just call async_receive_from again).
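A receive handler following this advice needs a way to tell the two ignorable errors apart from real ones. A minimal sketch, using std::error_code and std::errc as stand-ins for their boost::system counterparts (the function name is hypothetical, and how a given platform's native error values map onto these conditions is not verified here):

```cpp
#include <system_error>

// Hypothetical predicate: true for the two errors the answer says can be
// ignored on a UDP receive. std::errc stands in for boost::system::errc.
inline bool is_ignorable_udp_receive_error(const std::error_code& ec)
{
    return ec == std::errc::connection_refused ||
           ec == std::errc::connection_reset;
}
```

In the read completion handler one would check this predicate first and, when it returns true, simply issue the next async_receive_from instead of reporting an error.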
For anyone stumbling on this, read the comment I made above to the first response.
However, if you are by any chance encountering the same issue in C#, use this code to get rid of the behavior:
byte[] byteTrue = new byte[4];
byteTrue[byteTrue.Length - 1] = 1;
m_udpClient.Client.IOControl(-1744830452, byteTrue, null);
To disable ICMP PORT_UNREACHABLE on UDP receive, set SIO_UDP_CONNRESET to 0 (not 1, like the other answer suggests):
#ifdef _WIN32
struct winsock_udp_connreset {
unsigned long value = 0;
int name() { return -1744830452; /* SIO_UDP_CONNRESET */ }
unsigned long* data() { return &value; }
};
winsock_udp_connreset connreset{0};
socket.io_control(connreset);
#endif

How to handle seg faults under Windows?

How can a Windows application handle segmentation faults? By 'handle' I mean intercept them and perhaps output a descriptive message. Also, the ability to recover from them would be nice too, but I assume that is too complicated.
Let them crash and let the Windows Error Reporting handle it - under Vista+, you should also consider registering with Restart Manager (http://msdn.microsoft.com/en-us/library/aa373347(VS.85).aspx), so that you have a chance to save out the user's work and restart the application (like what Word/Excel/etc.. does)
Use SEH for early exception handling,
and use SetUnhandledExceptionFilter to show a descriptive message.
If you add the /EHa compiler argument then try {} catch(...) will catch all exceptions for you, including SEH exceptions.
You can also use __try {} __except {}, which gives you more flexibility on what to do when an exception is caught. Putting a __try {} __except {} around your entire main() function is somewhat equivalent to using SetUnhandledExceptionFilter().
That being said, you should also use the proper terminology: "seg-fault" is a UNIX term. There are no segmentation faults on Windows. On Windows they are called "Access Violation Exceptions"
C++ self-contained example on how to use SetUnhandledExceptionFilter, triggering a write fault and displaying a nice error message:
#include <windows.h>
#include <cstdlib> // exit
#include <sstream>
LONG WINAPI TopLevelExceptionHandler(PEXCEPTION_POINTERS pExceptionInfo)
{
std::stringstream s;
s << "Fatal: Unhandled exception 0x" << std::hex << pExceptionInfo->ExceptionRecord->ExceptionCode
<< std::endl;
MessageBoxA(NULL, s.str().c_str(), "my application", MB_OK | MB_ICONSTOP);
exit(1);
return EXCEPTION_CONTINUE_SEARCH;
}
int main()
{
SetUnhandledExceptionFilter(TopLevelExceptionHandler);
int *v=0;
v[12] = 0; // should trigger the fault
return 0;
}
Tested successfully with g++ (and should work OK with MSVC++ as well)
What you want to do here depends on what sort of faults you are concerned with. If you have sloppy code that is prone to more or less random General Protection Violations, then @Paul Betts's answer is what you need.
If you have code that has a good reason to dereference bad pointers, and you want to recover, start from @whunmr's suggestion about SEH. You can handle the fault and indeed recover, if you have clear enough control of your code to know exactly what state it is in at the point of the fault and how to go about recovering.
Similar to Jean-François Fabre's solution, but with POSIX code in MinGW-w64. Note that the program must exit; it can't recover from the SIGSEGV and continue.
#include <stdio.h>
#include <signal.h>
#include <stdlib.h>
void sigHandler(int s)
{
printf("signal %d\n", s);
exit(1);
}
int main()
{
signal(SIGSEGV, sigHandler);
int *v=0;
*v = 0; // trigger the fault
return 0;
}
