How to diagnose "random" winsock failures

I have a Winsock2 application running on Windows at both ends; MSVC++ 2017 is my development environment. Without getting too far into the details, the two ends (both written by me) send small messages back and forth. Both ends run as Windows services, and the connection is meant to stay up for hours or days. The problem is that on one end (call it machine A), after running normally for an hour or more, the connection seems to fail. Machine A sends a message to machine B, which receives it, and machine B then sends a response back. Machine A has the socket in non-blocking mode, so it loops calling recv() until data shows up. If recv() fails with WSAEWOULDBLOCK, it checks whether the timeout interval has elapsed and, if not, loops back to recv(). The failure shows up as that timeout (3 minutes) expiring. There is no way the data could have been delayed that long; something has been hosed. A subsequent send() from machine A fails with error 10054 (WSAECONNRESET, connection reset).
As I said, this can run for hours with no problems. Other times I've seen it fail after 45 minutes or so. Both machines are configured not to go into power save mode or anything like that. Can someone suggest how I would go about diagnosing this problem?
Update: sample code. The goal is to wait until a buffer of a certain length (tlen) has been received, and to allow the routine to time out if something has gone wrong:
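while (TRUE) {
    Sleep(1);
    iret = recv(isocket, s, rlen, 0);
    if (iret == SOCKET_ERROR) {
        err = WSAGetLastError();
        if (err == WSAEWOULDBLOCK) {
            // No data yet: give up only once the idle deadline has passed
            time(&tcur);
            if (tcur > tend) {
                com_Log("Timeout on recv 1");
                return(FALSE);
            }
            continue;
        }
        com_Log("Error on recv, error=%lu", err);
        return(FALSE);
    }
    if (iret == 0) {
        // recv() returns 0 when the peer has closed the connection gracefully;
        // without this check the loop would spin forever
        com_Log("Connection closed by peer");
        return(FALSE);
    }
    // Data arrived: restart the idle deadline and advance the buffer
    time(&tend);
    tend = tend + gTimeOut;
    clen = clen + iret;
    if (clen >= tlen) break;
    s = s + iret;
    rlen = rlen - iret;
}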
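A small aid for diagnosing failures like this is to log the symbolic name of each Winsock error instead of a bare number. The helper below is a minimal, hypothetical sketch (describe_wsa_error is not part of any API); the numbers are the standard values of the WSAE* constants in <winsock2.h>, written out so the sketch compiles without Windows headers.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turn a Winsock error code into a readable log string.
// The numeric cases are the values of the WSAE* macros in <winsock2.h>.
std::string describe_wsa_error(int err) {
    switch (err) {
    case 10035:
        return "WSAEWOULDBLOCK (10035): no data available yet on a non-blocking socket";
    case 10053:
        return "WSAECONNABORTED (10053): connection aborted by software on this host";
    case 10054:
        return "WSAECONNRESET (10054): connection reset by the remote host";
    case 10057:
        return "WSAENOTCONN (10057): socket is not connected";
    case 10060:
        return "WSAETIMEDOUT (10060): connection timed out";
    default:
        return "Winsock error " + std::to_string(err);
    }
}
```

Assuming com_Log is printf-style, as its use in the loop above suggests, the error branch could then call com_Log("Error on recv: %s", describe_wsa_error(err).c_str()).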

Related

MPI Non-blocking Irecv didn't receive data?

I use MPI non-blocking communication (MPI_Irecv, MPI_Isend) to monitor the slaves' idle states; the code is below.
rank 0:
int dest = -1;
while (dest <= 0) {
    int i;
    for (i = 1; i <= slaves_num; i++) {
        printf("slave %d, now is %d \n", i, idle_node[i]);
        if (idle_node[i] == 1) {
            idle_node[i] = 0;
            dest = i;
            break;
        }
    }
    if (dest <= 0) {
        MPI_Irecv(&idle_node[1], 1, MPI_INT, 1, MSG_IDLE, MPI_COMM_WORLD, &request);
        MPI_Irecv(&idle_node[2], 1, MPI_INT, 2, MSG_IDLE, MPI_COMM_WORLD, &request);
        MPI_Irecv(&idle_node[3], 1, MPI_INT, 3, MSG_IDLE, MPI_COMM_WORLD, &request);
        // MPI_Wait(&request, &status);
    }
    usleep(100000);
}
idle_node[dest] = 0; // indicates this slave is busy now
ranks 1, 2, 3:
while (1)
{
    ... // do something
    MPI_Isend(&idle, 1, MPI_INT, 0, MSG_IDLE, MPI_COMM_WORLD, &request);
    MPI_Wait(&request, &status);
}
It works, but I want it to be faster, so I deleted the line:
usleep(100000);
Then rank 0 gets stuck in an endless loop, printing:
slave 1, now is 0
slave 2, now is 0
slave 3, now is 0
slave 1, now is 0
slave 2, now is 0
slave 3, now is 0
...
So does this mean that MPI_Irecv only tells MPI that I want to receive a message here (without actually receiving it yet), and that MPI needs additional time to receive the real data? Or is there some other reason?

The use of non-blocking operations has been discussed over and over again here. From the MPI specification (section Nonblocking Communication):
Similarly, a nonblocking receive start call initiates the receive operation, but does not complete it. The call can return before a message is stored into the receive buffer. A separate receive complete call is needed to complete the receive operation and verify that the data has been received into the receive buffer. With suitable hardware, the transfer of data into the receiver memory may proceed concurrently with computations done after the receive was initiated and before it completed.
(the passage is copied verbatim from the standard)
The key sentence is the last one. The standard does not give any guarantee that a non-blocking receive operation will ever complete (or even start) unless MPI_WAIT[ALL|SOME|ANY] or MPI_TEST[ALL|SOME|ANY] was called (with MPI_TEST* setting a value of true for the completion flag).
By default Open MPI is built as a single-threaded library, and without special hardware acceleration the only way to progress non-blocking operations is either to periodically call into the library through some non-blocking call (the primary example being MPI_TEST*) or to call into a blocking one (the primary example being MPI_WAIT*).
Also your code leads to a nasty leak that will sooner or later result in resource exhaustion: you are calling MPI_Irecv multiple times with the same request variable, effectively overwriting its value and losing the reference to the previously started requests. Requests that are not waited upon are never freed and therefore remain in memory.
There is absolutely no need to use non-blocking operations in your case. If I understand the logic correctly, you can achieve what you want with code as simple as:
MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, MSG_IDLE, MPI_COMM_WORLD, &status);
idle_node[status.MPI_SOURCE] = 0;
If you'd like to handle more than one worker process at the same time, it is a bit more involved:
MPI_Request reqs[slaves_num];
int indices[slaves_num], num_completed;
int i;
for (i = 0; i < slaves_num; i++)
    reqs[i] = MPI_REQUEST_NULL;
while (1)
{
    // Repost all completed (or never started) receives
    for (i = 1; i <= slaves_num; i++)
        if (reqs[i-1] == MPI_REQUEST_NULL)
            MPI_Irecv(&idle_node[i], 1, MPI_INT, i, MSG_IDLE,
                      MPI_COMM_WORLD, &reqs[i-1]);
    MPI_Waitsome(slaves_num, reqs, &num_completed, indices, MPI_STATUSES_IGNORE);
    // Examine num_completed and indices and feed the workers with data
    ...
}
After the call to MPI_Waitsome there will be one or more completed requests. The exact number will be in num_completed and the indices of the completed requests will be filled in the first num_completed elements of indices[]. The completed requests will be freed and the corresponding elements of reqs[] will be set to MPI_REQUEST_NULL.
Also, there appears to be a common misconception about using non-blocking operations. A non-blocking send can be matched by a blocking receive, and a blocking send can equally be matched by a non-blocking receive. That makes constructs like this one nonsensical:
// Receiver
MPI_Irecv(..., &request);
... do something ...
MPI_Wait(&request, &status);
// Sender
MPI_Isend(..., &request);
MPI_Wait(&request, MPI_STATUS_IGNORE);
MPI_Isend immediately followed by MPI_Wait is equivalent to MPI_Send, and the following code is perfectly valid (and easier to understand):
// Receiver
MPI_Irecv(..., &request);
... do something ...
MPI_Wait(&request, &status);
// Sender
MPI_Send(...);

Socket programming Update: recv returning -1, error = 10053

I'm implementing a TCP/IP application on Windows 7 that loops around a socket recv() call. For small amounts of data (< 5 MB) it works fine, but for large amounts (> 20 MB), recv() fails partway through.
Details: my app communicates with an HTTP server; both run on the same machine. In this scenario, the TCP app is sending a large amount of data to the HTTP server.
recv() returns 0, and errno is 2.
Error 2 is ENOENT, but what does it mean here? Does anyone know what this is (in regard to a socket) and how I can get around it?
msgLen = recv(s, msg, BUFFER_SIZE, 0);
if (msgLen > 0)
{
    // do processing
}
else
{
    printf("\n no data received .... msgLen=%d", msgLen);
    printf("\n no data received .... errno=%d", errno);
}
Updated code, as per the comments:
msgLen = recv(s, msg, BUFFER_SIZE, 0);
if (msgLen > 0)
{
    // do processing
}
else if (msgLen == 0)
{
    printf("\n sender disconnected");
}
else
{
    printf("\n no data received .... msgLen=%d", msgLen);
    printf("\n no data received .... errno=%d", WSAGetLastError());
}
The error I get now is:
First, recv() returns 0 many times, i.e. the sender disconnected;
Finally, recv() returns -1, and the error is 10053.
My TCP/IP application is sending data to the HTTP server. The same code works fine with small amounts of data, but the issue appears with large amounts. Is the HTTP server timing out?

When recv() returns 0, it means the other party disconnected gracefully (assuming that your requested buffer size is not 0). recv() only provides an error code when it returns SOCKET_ERROR (-1). On Windows, you have to use WSAGetLastError() to get the error code, not errno, eg:
msgLen = recv(s, msg, BUFFER_SIZE, 0);
if (msgLen > 0)
{
    // do processing
}
else if (msgLen == 0)
{
    printf("\n sender disconnected");
}
else
{
    printf("\n no data received .... error=%d", WSAGetLastError());
}
Also keep in mind that if you are using a non-blocking socket, the error code may be WSAEWOULDBLOCK, which is not a fatal error. You can use select() to detect when the socket has data and then attempt the recv() again.
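The same select() idea can replace the Sleep(1) polling loop in the first question. Below is a minimal sketch, assuming a connected stream socket: recv_exact, RecvExactResult, and is_transient are hypothetical names, and the #ifdef lets it build against Winsock (after WSAStartup, linking ws2_32.lib) or POSIX sockets. It also treats recv() returning 0 as a graceful close, as described above.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <winsock2.h>  // link with ws2_32.lib and call WSAStartup() first
typedef SOCKET sock_t;
static int last_socket_error() { return WSAGetLastError(); }
static bool is_transient(int err) { return err == WSAEWOULDBLOCK; }
#else
#include <cerrno>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
typedef int sock_t;
static int last_socket_error() { return errno; }
static bool is_transient(int err) { return err == EWOULDBLOCK || err == EAGAIN || err == EINTR; }
#endif

enum RecvExactResult { RECV_OK, RECV_CLOSED, RECV_TIMEOUT, RECV_ERROR };

// Receive exactly `len` bytes into `buf`. Instead of spinning on recv() with
// Sleep(1), block in select() until the socket is readable. Every wait gets
// the full `idle_timeout_sec`, so the timeout restarts whenever data arrives.
RecvExactResult recv_exact(sock_t s, char* buf, size_t len, long idle_timeout_sec) {
    assert(buf != NULL || len == 0);
    size_t got = 0;
    while (got < len) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(s, &readfds);
        timeval tv;
        tv.tv_sec = idle_timeout_sec;
        tv.tv_usec = 0;
        // Winsock ignores the first argument; POSIX needs the highest fd + 1.
        int ready = select((int)s + 1, &readfds, NULL, NULL, &tv);
        if (ready == 0)
            return RECV_TIMEOUT;  // nothing arrived within the timeout
        if (ready < 0) {
            if (is_transient(last_socket_error()))
                continue;         // e.g. EINTR on POSIX
            return RECV_ERROR;
        }
        int n = (int)recv(s, buf + got, (int)(len - got), 0);
        if (n > 0)
            got += (size_t)n;
        else if (n == 0)
            return RECV_CLOSED;   // peer closed the connection gracefully
        else if (!is_transient(last_socket_error()))
            return RECV_ERROR;    // e.g. WSAECONNRESET (10054)
    }
    return RECV_OK;
}
```

Blocking in select() means the thread sleeps until data actually arrives, and each outcome (timeout, graceful close, or a real error code) can be logged separately, which helps narrow down how a connection died.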

Boost asio read an unknown number of bytes

I have 2 cases:
The client connects, sends no bytes, and waits for the server's response.
The client connects, sends one or more bytes, and waits for the server's response.
The problem is:
In the 1st case, I should read no bytes and the client should get a server response.
In the 2nd case, I should read at least 1 byte and only then send a server response.
If I try to read at least 0 bytes, like this:
async_read(sock, boost::asio::buffer(data),
           boost::asio::transfer_at_least(0),
           boost::bind(&server::read, this,
                       boost::asio::placeholders::error,
                       boost::asio::placeholders::bytes_transferred));
I never get the proper server response in the 2nd case.
But if I read at least 1 byte, this async_read operation never completes in the 1st case.
So, how can I handle both cases?
Update 1:
I'm still searching for a solution that doesn't use a time limit.

How do you expect this to work? Does the response vary between the first and second case? If it does vary, you cannot do this reliably because there is a race condition and you should fix the protocol. If it does not vary, the server should just send the response.
This problem is not an asio issue.

Here is what I did:
boost::system::error_code ec;
vector<char> socketBuffer(messageSize);
for (;;) {
    size_t buffersize = socket->read_some(
        boost::asio::buffer(socketBuffer), ec);
    if (buffersize > 0) {
        cout << "received" << endl;
        for (size_t i = 0; i < buffersize; ++i) {
            cout << socketBuffer[i];
        }
        cout << endl;
    }
    // eof means the peer closed the connection; any other code is a real error
    if (ec) break;
}
if (ec != boost::asio::error::eof) {
    cout << "error " << ec.message() << endl;
}
socket->close();
That is a small snippet of my code; I hope it helps. You can read a set amount of data, append it to a vector of bytes if you like, and process it later.

I guess you need to use the data the client sends to build the proper server response in case 2, and perhaps send a default response in case 1.
So is there no time limit on how long the client may take to send its data after connecting to the server? Maybe you should start a timer for a fixed time limit once the server has accepted the connection. If the server receives data in time, it's case 2; if the time runs out, it's case 1.
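A minimal sketch of that timer idea, swapping the Boost.Asio timer for a plain POSIX poll() so it stays self-contained; wait_for_first_data and FirstDataResult are hypothetical names. As the first answer warns, any time limit is a guess: a client that is slow to send in case 2 would be treated as case 1.

```cpp
#include <cassert>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

enum FirstDataResult { NO_DATA_IN_TIME, DATA_READY, PEER_CLOSED, WAIT_FAILED };

// After accept(), wait up to `timeout_ms` for the client to send something.
// NO_DATA_IN_TIME corresponds to case 1 and DATA_READY to case 2.
FirstDataResult wait_for_first_data(int fd, int timeout_ms) {
    assert(timeout_ms >= 0);
    pollfd p;
    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    int ready = poll(&p, 1, timeout_ms);
    if (ready == 0)
        return NO_DATA_IN_TIME;
    if (ready < 0)
        return WAIT_FAILED;
    // poll() also reports POLLIN when the peer has closed, so peek at one byte
    // (MSG_PEEK leaves it in the socket for the real read) to tell them apart.
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK);
    if (n > 0)
        return DATA_READY;
    if (n == 0)
        return PEER_CLOSED;
    return WAIT_FAILED;
}
```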

Does CancelSynchronousIo work with WNetAddConnection2?

I'm trying and failing to cancel a call to WNetAddConnection2 with CancelSynchronousIo.
The call to CancelSynchronousIo succeeds but nothing is actually cancelled.
I'm using a 32-bit console app running on Windows 7 x64.
Has anyone done this successfully? Am I doing something dumb? Here's a sample console app (which needs to be linked with mpr.lib):
DWORD WINAPI ConnectThread(LPVOID param)
{
    NETRESOURCE nr;
    memset(&nr, 0, sizeof(nr));
    nr.dwType = RESOURCETYPE_ANY;
    nr.lpRemoteName = L"\\\\8.8.8.8\\bog";
    // result is ERROR_BAD_NETPATH (i.e. the call isn't cancelled)
    DWORD result = WNetAddConnection2(&nr, L"pass", L"user", CONNECT_TEMPORARY);
    return 0;
}
int _tmain(int argc, _TCHAR* argv[])
{
    // Create a new thread to run WNetAddConnection2
    HANDLE hThread = CreateThread(0, 0, ConnectThread, 0, 0, 0);
    if (!hThread)
        return 1;
    // Retry the cancel until it fails; keep track of how often
    int count = 0;
    BOOL ok;
    do
    {
        // Sleep to give the thread a chance to start
        Sleep(1000);
        ok = CancelSynchronousIo(hThread);
        ++count;
    }
    while (ok);
    // count will equal two here (i.e. one successful cancellation and
    // one failed cancellation)
    // err is ERROR_NOT_FOUND (i.e. nothing to cancel) which makes
    // sense for the second call
    DWORD err = GetLastError();
    // Wait for the thread to finish; this takes ages (i.e. the
    // WNetAddConnection2 call is not cancelled)
    WaitForSingleObject(hThread, INFINITE);
    return 0;
}

According to Larry Osterman (I hope he doesn't mind me quoting him): "The question was answered in the comments: wnetaddconnection2 isn’t a simple IOCTL call." So the answer (unfortunately) is no.

First, WNetAddConnection2 is system-wide, not per-process. This is important, as calling WNetAddConnection2 many times can wreck system stability, particularly with Explorer.
I use WNetGetResourceInformation first to check whether the connection already exists before even thinking of calling it: my process may have run previously and then shut down, and the connection may still exist. When my Windows services need to add such a connection, I use a nasty little trick to prevent these totally non-abortable APIs from stalling my own service shutdown.
The trick is to run these calls in a separate process: they are system-wide, after all. You can normally wait for the process to complete as if you had called the functions yourself, but you can terminate the process and stop waiting if you need to abort in order to shut down.
Sadly, however, certain Windows resources, such as named pipe handles and handles to files open on remote computers, can take about 16 seconds to close after a remote machine fails or shuts down. CancelSynchronousIo does not even seem to help with those, and will likely add a further long delay.

Duplex named pipe hangs on a certain write

I have a C++ pipe server app and a C# pipe client app communicating via Windows named pipe (duplex, message mode, wait/blocking in separate read thread).
It all works fine (both sending and receiving data via the pipe) until I try to write to the pipe from the client in response to a form's TextChanged event. When I do this, the client hangs on the pipe write call (or on the flush call if AutoFlush is off). Breaking into the server app reveals it's also waiting on the pipe ReadFile call and not returning.
I tried running the client write on another thread, with the same result.
I suspect some sort of deadlock or race condition but can't see where; I don't think I'm writing to the pipe simultaneously.
Update1: tried pipes in byte mode instead of message mode - same lockup.
Update2: Strangely, if (and only if) I pump lots of data from the server to the client, it cures the lockup!?
Server code:
DWORD ReadMsg(char* aBuff, int aBuffLen, int& aBytesRead)
{
    DWORD byteCount;
    if (ReadFile(mPipe, aBuff, aBuffLen, &byteCount, NULL))
    {
        aBytesRead = (int)byteCount;
        aBuff[byteCount] = 0;
        return ERROR_SUCCESS;
    }
    return GetLastError();
}
DWORD SendMsg(const char* aBuff, unsigned int aBuffLen)
{
    DWORD byteCount;
    if (WriteFile(mPipe, aBuff, aBuffLen, &byteCount, NULL))
    {
        return ERROR_SUCCESS;
    }
    mClientConnected = false;
    return GetLastError();
}
DWORD CommsThread()
{
    while (1)
    {
        std::string fullPipeName = std::string("\\\\.\\pipe\\") + mPipeName;
        mPipe = CreateNamedPipeA(fullPipeName.c_str(),
                                 PIPE_ACCESS_DUPLEX,
                                 PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
                                 PIPE_UNLIMITED_INSTANCES,
                                 KTxBuffSize, // output buffer size
                                 KRxBuffSize, // input buffer size
                                 5000,        // client time-out ms
                                 NULL);       // no security attribute
        if (mPipe == INVALID_HANDLE_VALUE)
            return 1;
        mClientConnected = ConnectNamedPipe(mPipe, NULL) ? TRUE : (GetLastError() == ERROR_PIPE_CONNECTED);
        if (!mClientConnected)
            return 1;
        char rxBuff[KRxBuffSize+1];
        DWORD error = 0;
        while (mClientConnected)
        {
            Sleep(1);
            int bytesRead = 0;
            error = ReadMsg(rxBuff, KRxBuffSize, bytesRead);
            if (error == ERROR_SUCCESS)
            {
                rxBuff[bytesRead] = 0; // terminate string.
                if (mMsgCallback && bytesRead > 0)
                    mMsgCallback(rxBuff, bytesRead, mCallbackContext);
            }
            else
            {
                mClientConnected = false;
            }
        }
        Close();
        Sleep(1000);
    }
    return 0;
}
Client code:
public void Start(string aPipeName)
{
    mPipeName = aPipeName;
    mPipeStream = new NamedPipeClientStream(".", mPipeName, PipeDirection.InOut, PipeOptions.None);
    Console.Write("Attempting to connect to pipe...");
    mPipeStream.Connect();
    Console.WriteLine("Connected to pipe '{0}' ({1} server instances open)", mPipeName, mPipeStream.NumberOfServerInstances);
    mPipeStream.ReadMode = PipeTransmissionMode.Message;
    mPipeWriter = new StreamWriter(mPipeStream);
    mPipeWriter.AutoFlush = true;
    mReadThread = new Thread(new ThreadStart(ReadThread));
    mReadThread.IsBackground = true;
    mReadThread.Start();
    if (mConnectionEventCallback != null)
    {
        mConnectionEventCallback(true);
    }
}
private void ReadThread()
{
    byte[] buffer = new byte[1024 * 400];
    while (true)
    {
        int len = 0;
        do
        {
            // Read at most the space left after the bytes already received
            len += mPipeStream.Read(buffer, len, buffer.Length - len);
        } while (len > 0 && !mPipeStream.IsMessageComplete);
        if (len == 0)
        {
            OnPipeBroken();
            return;
        }
        if (mMessageCallback != null)
        {
            mMessageCallback(buffer, len);
        }
        Thread.Sleep(1);
    }
}
public void Write(string aMsg)
{
    try
    {
        mPipeWriter.Write(aMsg);
        mPipeWriter.Flush();
    }
    catch (Exception)
    {
        OnPipeBroken();
    }
}

If the pipe handle is opened for synchronous (non-overlapped) I/O, you will be unable to read from the pipe at the same time as you write to it, even from separate threads. For example, if one thread is doing a blocking read from the pipe and another thread then issues a blocking write, the write call will block until the read call has completed; if your program does not expect this, it will in many cases deadlock.
I have not tested overlapped I/O, but it MAY be able to resolve this issue. However, if you are determined to use synchronous calls, the models below may help you solve the problem.
Master/Slave
You could implement a master/slave model in which either the client or the server is the master and the other end only responds, which is generally how the MSDN examples work.
This can be problematic if the slave periodically needs to send data to the master. You must then either use an external signaling mechanism (outside the pipe), have the master periodically query/poll the slave, or swap the roles so that the client is the master and the server is the slave.
Writer/Reader
You could use a writer/reader model with two different pipes. However, if you have multiple clients you must associate the two pipes somehow, since each pipe will have a different handle. You could do this by having the client send a unique identifier value when it connects to each pipe, which then lets the server associate the two pipes. This number could be the current system time or a globally or locally unique identifier.
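A minimal sketch of the server-side bookkeeping for that pairing, assuming the client sends the same identifier as its first message on both pipes and that the server can tell which pipe is which (for example, from the pipe name). PipePairRegistry, PipePair, and PipeRole are hypothetical names, and an int stands in for a Windows HANDLE so the sketch compiles anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Which pipe of the pair a connection is, from the server's point of view.
enum PipeRole { CLIENT_TO_SERVER, SERVER_TO_CLIENT };

// Hypothetical handle type: an int here, a HANDLE on Windows.
// -1 means "not connected yet".
struct PipePair {
    int readPipe = -1;   // server reads requests from this pipe
    int writePipe = -1;  // server writes responses to this pipe
};

// Pairs the two pipes of each client using the identifier the client sends
// as its first message on both pipes.
class PipePairRegistry {
public:
    // Record one pipe. Returns true and fills `out` once both pipes for
    // `clientId` have arrived; the pending entry is then removed.
    bool add(std::uint64_t clientId, PipeRole role, int pipe, PipePair& out) {
        assert(pipe != -1);
        PipePair& p = pending_[clientId];
        if (role == CLIENT_TO_SERVER)
            p.readPipe = pipe;
        else
            p.writePipe = pipe;
        if (p.readPipe == -1 || p.writePipe == -1)
            return false;
        out = p;  // copy before erase invalidates the reference
        pending_.erase(clientId);
        return true;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::map<std::uint64_t, PipePair> pending_;
};
```

If several server threads accept connections, calls to add() would need to be protected by a mutex.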
Threads
If you are determined to use the synchronous API, you can use threads with the master/slave model if you do not want to be blocked while waiting for a message on the slave side. You will, however, want to lock the reader after it reads a message (or reaches the end of a series of messages), then write the response (as the slave should), and finally unlock the reader. You can lock and unlock the reader using locking mechanisms that put the thread to sleep, as these are the most efficient.
Security Problem With TCP
What you lose by going with TCP instead of named pipes is also the biggest potential problem: a TCP stream has no built-in security. If security is a concern, you will have to implement it yourself, and since you would have to handle authentication yourself, you risk creating a security hole. A named pipe can provide security if you set its parameters properly. To state it again more clearly: security is no simple matter, and you will generally want to use existing facilities that have been designed to provide it.

I think you may be running into problems with named pipe message mode. In this mode, each write to the kernel pipe handle constitutes a message. This doesn't necessarily correspond to what your application regards as a message, and a message may be bigger than your read buffer.
This means that your pipe reading code needs two loops, the inner reading until the current [named pipe] message has been completely received, and the outer looping until your [application level] message has been received.
Your C# client code does have a correct inner loop, reading again if IsMessageComplete is false:
do
{
    len += mPipeStream.Read(buffer, len, buffer.Length - len);
} while (len > 0 && !mPipeStream.IsMessageComplete);
Your C++ server code doesn't have such a loop - the equivalent at the Win32 API level is testing for the return code ERROR_MORE_DATA.
My guess is that somehow this is leading to the client waiting for the server to read on one pipe instance, whilst the server is waiting for the client to write on another pipe instance.
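The ERROR_MORE_DATA loop described above can be sketched as follows. To keep it self-contained, the actual ReadFile call is hidden behind a hypothetical ReadChunk callback (ReadStatus, ReadChunk, and readPipeMessage are illustrative names); the commented lines show how it might wrap ReadFile on a message-mode pipe. An application-level outer loop would call readPipeMessage repeatedly until its own message is complete.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Result of one read from a message-mode pipe.
enum class ReadStatus { Complete, MoreData, Failed };

// Hypothetical reader: copy up to `cap` bytes of the current pipe message
// into `dst` and store the count in `got`. On Win32 it could wrap ReadFile:
//   DWORD n = 0;
//   BOOL ok = ReadFile(pipe, dst, (DWORD)cap, &n, NULL);
//   got = n;
//   if (ok) return ReadStatus::Complete;
//   return GetLastError() == ERROR_MORE_DATA ? ReadStatus::MoreData
//                                            : ReadStatus::Failed;
using ReadChunk = std::function<ReadStatus(char* dst, std::size_t cap, std::size_t& got)>;

// Inner loop: keep reading until the current pipe message is complete,
// appending each chunk to `msg`.
bool readPipeMessage(const ReadChunk& readChunk, std::vector<char>& msg, std::size_t chunkSize) {
    assert(chunkSize > 0);
    msg.clear();
    for (;;) {
        std::size_t old = msg.size();
        msg.resize(old + chunkSize);
        std::size_t got = 0;
        ReadStatus status = readChunk(msg.data() + old, chunkSize, got);
        msg.resize(old + got);  // drop the unused tail of this chunk
        if (status == ReadStatus::Complete)
            return true;
        if (status == ReadStatus::Failed)
            return false;
        // MoreData: the rest of this message is still in the pipe, so read again.
    }
}
```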

It seems to me that what you are trying to do is unlikely to work as expected.
Some time ago I tried something that looked like your code and got similar results: the pipe just hung, and it was difficult to establish what had gone wrong.
I would rather suggest using the client in a very simple way:
CreateFile
Write request
Read answer
Close pipe.
If you want two-way communication with clients that can also receive unrequested data from the server, you should rather implement two servers. This was the workaround I used.
