Monitoring UDP socket in glib(mm) eats up CPU time - windows

I have a GTKmm Windows application (built with MinGW) that receives UDP packets (no sending). The socket is native winsock and I use glibmm IOChannel to connect it to the application main loop. The socket is read with recvfrom.
My problem is: this setup eats 25% percent CPU time on a 3GHz workstation. Can somebody tell me why?
The application is idle in this case, and if I remove the UDP code, CPU usage drops down to almost zero. As the application has to perform some CPU intensive tasks, I could image better ways to spend that 25%
Here are some code excerpts: (sorry for the printf's ;) )
/* bind */
void UDPInterface::bindToPort(unsigned short port)
{
struct sockaddr_in target;
WSADATA wsaData;
target.sin_family = AF_INET;
target.sin_port = htons(port);
target.sin_addr.s_addr = 0;
if ( WSAStartup ( 0x0202, &wsaData ) )
{
printf("WSAStartup failed!\n");
exit(0); // :)
WSACleanup();
}
sock = socket( AF_INET, SOCK_DGRAM, 0 );
if (sock == INVALID_SOCKET)
{
printf("invalid socket!\n");
exit(0);
}
if (bind(sock,(struct sockaddr*) &target, sizeof(struct sockaddr_in) ) == SOCKET_ERROR)
{
printf("failed to bind to port!\n");
exit(0);
}
printf("[UDPInterface::bindToPort] listening on port %i\n", port);
}
/* read */
bool UDPInterface::UDPEvent(Glib::IOCondition io_condition)
{
recvfrom(sock, (char*)buf, BUF_SIZE*4, 0, NULL, NULL);
/* process packet... */
}
/* glibmm connect */
Glib::RefPtr channel = Glib::IOChannel::create_from_win32_socket(udp.sock);
Glib::signal_io().connect( sigc::mem_fun(udp, &UDPInterface::UDPEvent), channel, Glib::IO_IN );
I've read here in some other question, and also in glib docs (g_io_channel_win32_new_socket()) that the socket is put into nonblocking mode, and it's "a side-effect of the implementation and unavoidable". Does this explain the CPU effect, it's not clear to me?
Whether or not I use glib to access the socket or call recvfrom() directly doesn't seem to make much difference, since CPU is used up before any packet arrives and the read handler gets invoked. Also glibmm docs state that it's ok to call recvfrom() even if the socket is polled (Glib::IOChannel::create_from_win32_socket())
I've tried compiling the program with -pg and created a per function cpu usage report with gprof. This wasn't usefull because the time is not spent in my program, but in some external glib/glibmm dll.

Related

c: socketCAN connection: read() not fast enough

socketCAN connection: read() not fast enough
Hello,
I use the socket() connection for my CAN communication.
fd = socket(PF_CAN, SOCK_RAW, CAN_RAW);
I'm using 2 threads: one periodic 1ms RT thread to send data and one
thread to read the incoming messages. The read function looks like:
void readCan0Socket(void){
int receivedBytes = 0;
do
{
// set GPIO pin low
receivedBytes = read(fd ,
&receiveCanFrame[recvBufferWritePosition],
sizeof(struct can_frame));
// reset GPIO pin high
if (receivedBytes != 0)
{
if (receivedBytes == sizeof(struct can_frame))
{
recvBufferWritePosition++;
if (recvBufferWritePosition == CAN_MAX_RECEIVE_BUFFER_LENGTH)
{
recvBufferWritePosition = 0;
}
}
receivedBytes = 0;
}
} while (1);
}
The socket is configured in blocking mode, so the read function stays open
until a message arrived. The current implementation is working, but when
I measure the time between reading a message and the next waiting state of
the read function (see set/reset GPIO comment) the time varies between 30 us
(the mean value) and > 200 us. A value greather than 200us means
(CAN has a baud rate of 1000 kBit/s) that packages are not recognized while
the read() handles the previous message. The read() function must be ready within
134 us.
How can I accelerate my implementation? I tried to use two threads which are
separated with Mutexes (lock before the read() function and unlock after a
message reception), but this didn't solve my problem.

Inconsistent behavior transmitting bursts of UDP packets on Windows 7

I've got two systems, both running Windows 7. The source is 192.168.0.87, the target is 192.168.0.22, they are both connected to a small switch on my desk.
The source is transmitting a burst of 100 UDP packets to the target with this program -
#include <iostream>
#include <vector>
using namespace std;
#include <winsock2.h>
int main()
{
// It's windows, we need this.
WSAData wsaData;
int wres = WSAStartup(MAKEWORD(2,2), &wsaData);
if (wres != 0) { exit(1); }
SOCKET s = socket(AF_INET, SOCK_DGRAM, 0);
if (s < 0) { exit(1); }
struct sockaddr_in addr;
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(0);
if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) { exit(3); }
int max = 100;
// build all the packets to send
typedef vector<unsigned char> ByteArray;
vector<ByteArray> v;
v.reserve(max);
for(int i=0;i<max;i++) {
ByteArray bytes(150+(i%25), 'a'+(i%26));
v.push_back(bytes);
}
// send all the packets out, one right after the other.
addr.sin_addr.s_addr = htonl(0xC0A80016);// 192.168.0.22
addr.sin_port = htons(24105);
for(int i=0;i<max;++i) {
if (sendto(s, (const char *)v[i].data(), v[i].size(), 0,
(struct sockaddr *)&addr, sizeof(addr)) < 0) {
cout << "i: " << i << " error: " << errno;
}
}
closesocket(s);
cout << "Complete!" << endl;
}
Now, on first run I get massive losses of UDP packets (often only 1 will get through!).
On subsequent runs, all 100 make it through.
If I wait for 2 minutes or so, and run again, I'm back to losing most of the packets.
Reception on the target system is done using Wireshark.
I also ran Wireshark at the same time on the source system, and found exactly the same trace as on the target in all cases.
That means that the packets are getting lost on the source machine, rather than being lost in the switch or on the wire.
I also tried running sysinternals process monitor, and found that indeed, all 100 sendto calls do result in appropriate winsock calls, but not necessarily in packets on the wire.
As near as I can tell (using arp -a), in all cases the target's IP is in the source's arp cache.
Can anyone tell me why Windows is so inconsistent in how it treats these packets? I get that in my actual application I've just got to rate limit my sends a bit, but I'd like to understand why it works sometimes and not others.
Oh yes, and I also tried swapping the systems for send and receive, with no change in behavior.
Most probably the client is overruning udp send buffer. Maybe while ARP protocol is running to get the target MAC address. You say that you lose datagrams the first run and if you wait 2 minutes or more. Why don't you check with Wireshark what happens in that first run? (If ARP frames are sent/received)
If that is the problem, you could apply one of these 2 alternatives:
1-Before running make sure the ARP entry is there.
2-Send the first datagram, wait 1 sec or less, send the burst

Upper limit to UDP performance on windows server 2008

It looks like from my testing I am hitting a performance wall on my 10gb network. I seem to be unable to read more than 180-200k packets per second. Looking at perfmon, or task manager I can receive up to a million packets / second if not more. Testing 1 socket or 10 or 100, doesn't seem to change this limit of 200-300k packets a second. I've fiddled with RSS and the like without success. Unicast vs multicast doesn't seem to matter, overlapped i/o vs synchronous doesn't make a difference either. Size of packet doesn't matter either. There just seems to be a hard limit to the number of packets windows can copy from the nic to the buffer. This is a dell r410. Any ideas?
#include "stdafx.h"
#include <WinSock2.h>
#include <ws2ipdef.h>
static inline void fillAddr(const char* const address, unsigned short port, sockaddr_in &addr)
{
memset( &addr, 0, sizeof( addr ) );
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = inet_addr( address );
addr.sin_port = htons(port);
}
int _tmain(int argc, _TCHAR* argv[])
{
#ifdef _WIN32
WORD wVersionRequested;
WSADATA wsaData;
int err;
wVersionRequested = MAKEWORD( 1, 1 );
err = WSAStartup( wVersionRequested, &wsaData );
#endif
int error = 0;
const char* sInterfaceIP = "10.20.16.90";
int nInterfacePort = 0;
//Create socket
SOCKET m_socketID = socket( AF_INET, SOCK_DGRAM, IPPROTO_UDP );
//Re use address
struct sockaddr_in addr;
fillAddr( "10.20.16.90", 12400, addr ); //"233.43.202.1"
char one = 1;
//error = setsockopt(m_socketID, SOL_SOCKET, SO_REUSEADDR , &one, sizeof(one));
if( error != 0 )
{
fprintf( stderr, "%s: ERROR setsockopt returned %d.\n", __FUNCTION__, WSAGetLastError() );
}
//Bind
error = bind( m_socketID, reinterpret_cast<SOCKADDR*>( &addr ), sizeof( addr ) );
if( error == -1 )
{
fprintf(stderr, "%s: ERROR %d binding to %s:%d\n",
__FUNCTION__, WSAGetLastError(), sInterfaceIP, nInterfacePort);
}
//Join multicast group
struct ip_mreq mreq;
mreq.imr_multiaddr.s_addr = inet_addr("225.2.3.13");//( "233.43.202.1" );
mreq.imr_interface.s_addr = inet_addr("10.20.16.90");
//error = setsockopt( m_socketID, IPPROTO_IP, IP_ADD_MEMBERSHIP, reinterpret_cast<char*>( &mreq ), sizeof( mreq ) );
if (error == -1)
{
fprintf(stderr, "%s: ERROR %d trying to join group %s.\n", __FUNCTION__, WSAGetLastError(), "233.43.202.1" );
}
int bufSize = 0, len = sizeof(bufSize), nBufferSize = 10*1024*1024;//8192*1024;
//Resize the buffer
getsockopt(m_socketID, SOL_SOCKET, SO_RCVBUF, (char*)&bufSize, &len );
fprintf(stderr, "getsockopt size before %d\n", bufSize );
fprintf(stderr, "setting buffer size %d\n", nBufferSize );
error = setsockopt(m_socketID, SOL_SOCKET, SO_RCVBUF,
reinterpret_cast<const char*>( &nBufferSize ), sizeof( nBufferSize ) );
if( error != 0 )
{
fprintf(stderr, "%s: ERROR %d setting the receive buffer size to %d.\n",
__FUNCTION__, WSAGetLastError(), nBufferSize );
}
bufSize = 1234, len = sizeof(bufSize);
getsockopt(m_socketID, SOL_SOCKET, SO_RCVBUF, (char*)&bufSize, &len );
fprintf(stderr, "getsockopt size after %d\n", bufSize );
//Non-blocking
u_long op = 1;
ioctlsocket( m_socketID, FIONBIO, &op );
//Create IOCP
HANDLE iocp = CreateIoCompletionPort( INVALID_HANDLE_VALUE, NULL, NULL, 1 );
HANDLE iocp2 = CreateIoCompletionPort( (HANDLE)m_socketID, iocp, 5, 1 );
char buffer[2*1024]={0};
int r = 0;
OVERLAPPED overlapped;
memset(&overlapped, 0, sizeof(overlapped));
DWORD bytes = 0, flags = 0;
// WSABUF buffers[1];
//
// buffers[0].buf = buffer;
// buffers[0].len = sizeof(buffer);
//
// while( (r = WSARecv( m_socketID, buffers, 1, &bytes, &flags, &overlapped, NULL )) != -121 )
//sleep(100000);
while( (r = ReadFile( (HANDLE)m_socketID, buffer, sizeof(buffer), NULL, &overlapped )) != -121 )
{
bytes = 0;
ULONG_PTR key = 0;
LPOVERLAPPED pOverlapped;
if( GetQueuedCompletionStatus( iocp, &bytes, &key, &pOverlapped, INFINITE ) )
{
static unsigned __int64 total = 0, printed = 0;
total += bytes;
if( total - printed > (1024*1024) )
{
printf( "%I64dmb\r", printed/ (1024*1024) );
printed = total;
}
}
}
while( r = recv(m_socketID,buffer,sizeof(buffer),0) )
{
static unsigned int total = 0, printed = 0;
if( r > 0 )
{
total += r;
if( total - printed > (1024*1024) )
{
printf( "%dmb\r", printed/ (1024*1024) );
printed = total;
}
}
}
return 0;
}
I am using Iperf as the sender and comparing the amount of data received to the amount of data sent: iperf.exe -c 10.20.16.90 -u -P 10 -B 10.20.16.51 -b 1000000000 -p 12400 -l 1000
edit: doing iperf to iperf the performance is closer to 180k or so without dropping (8mb client side buffer). If I am doing tcp I can do about 200k packets/second. Here's what interesting though - I can do far more than 200k with multiple tcp connections, but multiple udp connections do not increase the total (I test udp performance with multiple iperfs, since a single iperf with multiple threads doesn't seem to work). All hardware acceleration is tuned on in the drivers.. It seems like udp performance is simply subpar?
I've been doing some UDP testing with similar hardware as I investigate the performance gains that can be had from using the Winsock Registered I/O network extensions, RIO, in Windows 8 Server. For this I've been running tests on Windows Server 2008 R2 and on Windows Server 8.
I've yet to get to the point where I've begun testing with our 10Gb cards (they've only just arrived) but the results of my earlier tests and the example programs used to run them can be found here on my blog.
One thing that I might suggest is that with a simple test like the one you show where there's very little work being done to each datagram you may find that old fashioned, synchronous I/O, is faster than the IOCP design. Whilst the IOCP design steps ahead as the
workload per datagram rises and you can fully utilise the multiple threads.
Also, are your test machines wired back to back (i.e. without a switch) or do they run through a switch; if so, could the issue be down to the performance of your switch rather than your test machines? If you're using a switch, or have multiple nics in the server, can you run multiple clients against the server, could the issue be on the client rather than the server?
What CPU usage are you seeing on the sending and receiving machines? Have you looked at the machine's cpu usage with Process Explorer? This is more accurate than Task Manager. Which CPU is handling the nic interrupts, can you improve things by binding these to another cpu? or changing the affinity of your test program to run on another cpu? Is your IOCP example spreading its threads across multiple NUMA nodes or are you locking all of them to one node?
I'm hoping to get to run some more tests next week and will update my answer when I have done so.
Edit: For me the problem was due to the fact that the NIC drivers had "flow control" enabled and this caused the sender to run at the speed of the receiver. This had some undesirable "non-paged pool" usage characteristics and turning off flow control allows you to see how fast the sender can go (and the difference in network utilisation between the sender and receiver clearly shows how much data is being lost). See my blog posting here for more details.

In Win32, is there a way to test if a socket is non-blocking?

In Win32, is there a way to test if a socket is non-blocking?
Under POSIX systems, I'd do something like the following:
int is_non_blocking(int sock_fd) {
flags = fcntl(sock_fd, F_GETFL, 0);
return flags & O_NONBLOCK;
}
However, Windows sockets don't support fcntl(). The non-blocking mode is set using ioctl with FIONBIO, but there doesn't appear to be a way to get the current non-blocking mode using ioctl.
Is there some other call on Windows that I can use to determine if the socket is currently in non-blocking mode?
A slightly longer answer would be: No, but you will usually know whether or not it is, because it is relatively well-defined.
All sockets are blocking unless you explicitly ioctlsocket() them with FIONBIO or hand them to either WSAAsyncSelect or WSAEventSelect. The latter two functions "secretly" change the socket to non-blocking.
Since you know whether you have called one of those 3 functions, even though you cannot query the status, it is still known. The obvious exception is if that socket comes from some 3rd party library of which you don't know what exactly it has been doing to the socket.
Sidenote: Funnily, a socket can be blocking and overlapped at the same time, which does not immediately seem intuitive, but it kind of makes sense because they come from opposite paradigms (readiness vs completion).
Previously, you could call WSAIsBlocking to determine this. If you are managing legacy code, this may still be an option.
Otherwise, you could write a simple abstraction layer over the socket API. Since all sockets are blocking by default, you could maintain an internal flag and force all socket ops through your API so you always know the state.
Here is a cross-platform snippet to set/get the blocking mode, although it doesn't do exactly what you want:
/// #author Stephen Dunn
/// #date 10/12/15
bool set_blocking_mode(const int &socket, bool is_blocking)
{
bool ret = true;
#ifdef WIN32
/// #note windows sockets are created in blocking mode by default
// currently on windows, there is no easy way to obtain the socket's current blocking mode since WSAIsBlocking was deprecated
u_long flags = is_blocking ? 0 : 1;
ret = NO_ERROR == ioctlsocket(socket, FIONBIO, &flags);
#else
const int flags = fcntl(socket, F_GETFL, 0);
if ((flags & O_NONBLOCK) && !is_blocking) { info("set_blocking_mode(): socket was already in non-blocking mode"); return ret; }
if (!(flags & O_NONBLOCK) && is_blocking) { info("set_blocking_mode(): socket was already in blocking mode"); return ret; }
ret = 0 == fcntl(socket, F_SETFL, is_blocking ? flags ^ O_NONBLOCK : flags | O_NONBLOCK);
#endif
return ret;
}
I agree with the accepted answer, there is no official way to determine the blocking state of a socket on Windows. In case you get a socket from a third party (let's say, you are a TLS library and you get the socket from upper layer) you cannot decide if it is in blocking state or not.
Despite this I have a working, unofficial and limited solution for the problem which works for me for a long time.
I attempt to read 0 bytes from the socket. In case it is a blocking socket it will return 0, in case it is a non-blocking it will return -1 and GetLastError equals WSAEWOULDBLOCK.
int IsBlocking(SOCKET s)
{
int r = 0;
unsigned char b[1];
r = recv(s, b, 0, 0);
if (r == 0)
return 1;
else if (r == -1 && GetLastError() == WSAEWOULDBLOCK)
return 0;
return -1; /* In case it is a connection socket (TCP) and it is not in connected state you will get here 10060 */
}
Caveats:
Works with UDP sockets
Works with connected TCP sockets
Doesn't work with unconnected TCP sockets

Events/Interrupts in Serial Communication

I want to read and write from serial using events/interrupts.
Currently, I have it in a while loop and it continuously reads and writes through the serial. I want it to only read when something comes from the serial port. How do I implement this in C++?
This is my current code:
while(true)
{
//read
if(!ReadFile(hSerial, szBuff, n, &dwBytesRead, NULL)){
//error occurred. Report to user.
}
//write
if(!WriteFile(hSerial, szBuff, n, &dwBytesRead, NULL)){
//error occurred. Report to user.
}
//print what you are reading
printf("%s\n", szBuff);
}
Use a select statement, which will check the read and write buffers without blocking and return their status, so you only need to read when you know the port has data, or write when you know there's room in the output buffer.
The third example at http://www.developerweb.net/forum/showthread.php?t=2933 and the associated comments may be helpful.
Edit: The man page for select has a simpler and more complete example near the end. You can find it at http://linux.die.net/man/2/select if man 2 select doesn't work on your system.
Note: Mastering select() will allow you to work with both serial ports and sockets; it's at the heart of many network clients and servers.
For a Windows environment the more native approach would be to use asynchronous I/O. In this mode you still use calls to ReadFile and WriteFile, but instead of blocking you pass in a callback function that will be invoked when the operation completes.
It is fairly tricky to get all the details right though.
Here is a copy of an article that was published in the c/C++ users journal a few years ago. It goes into detail on the Win32 API.
here a code that read serial incomming data using interruption on windows
you can see the time elapsed during the waiting interruption time
int pollComport(int comport_number, LPBYTE buffer, int size)
{
BYTE Byte;
DWORD dwBytesTransferred;
DWORD dwCommModemStatus;
int n;
double TimeA,TimeB;
// Specify a set of events to be monitored for the port.
SetCommMask (m_comPortHandle[comport_number], EV_RXCHAR );
while (m_comPortHandle[comport_number] != INVALID_HANDLE_VALUE)
{
// Wait for an event to occur for the port.
TimeA = clock();
WaitCommEvent (m_comPortHandle[comport_number], &dwCommModemStatus, 0);
TimeB = clock();
if(TimeB-TimeA>0)
cout <<" ok "<<TimeB-TimeA<<endl;
// Re-specify the set of events to be monitored for the port.
SetCommMask (m_comPortHandle[comport_number], EV_RXCHAR);
if (dwCommModemStatus & EV_RXCHAR)
{
// Loop for waiting for the data.
do
{
ReadFile(m_comPortHandle[comport_number], buffer, size, (LPDWORD)((void *)&n), NULL);
// Display the data read.
if (n>0)
cout << buffer <<endl;
} while (n > 0);
}
return(0);
}
}

Resources