Non-blockings reads/writes to stdin/stdout in C on Linux or Mac - nonblocking

I have two programs communicating via named pipes (on a Mac), but the buffer size of named pipes is too small. Program 1 writes 50K bytes to pipe 1 before reading pipe 2. Named pipes are 8K (on my system) so program 1 blocks until the data is consumed. Program 2 reads 20K bytes from pipe 1 and then writes 20K bytes to pipe2. Pipe2 can't hold 20K so program 2 now blocks. It will only be released when program 1 does its reads. But program 1 is blocked waiting for program 2. deadlock
I thought I could fix the problem by creating a gasket program that reads stdin non-blocking and writes stdout non-blocking, temporarily storing the data in a large buffer. I tested the program using cat data | ./gasket 0 | ./gasket 1 > out, expecting out to be a copy of data. However, while the first invocation of gasket works as expected, the read in the second program returns 0 before all the data is consumed and never returns anything other than 0 in follow on calls.
I tried the code below both on a MAC and Linux. Both behave the same. I've added logging so that I can see that the fread from the second invocation of gasket starts getting no data even though it has not read all the data written by the first invocation.
#include <stdio.h>
#include <fcntl.h>
#include <time.h>
#include <stdlib.h>
#include <unistd.h>
#define BUFFER_SIZE 100000
char buffer[BUFFER_SIZE];
int elements=0;
int main(int argc, char **argv)
{
int total_read=0, total_write=0;
FILE *logfile=fopen(argv[1],"w");
int flags = fcntl(fileno(stdin), F_GETFL, 0);
fcntl(fileno(stdin), F_SETFL, flags | O_NONBLOCK);
flags = fcntl(fileno(stdout), F_GETFL, 0);
fcntl(fileno(stdout), F_SETFL, flags | O_NONBLOCK);
while (1) {
int num_read=0;
if (elements < (BUFFER_SIZE-1024)) { // space in buffer
num_read = fread(&buffer[elements], sizeof(char), 1024, stdin);
elements += num_read;
total_read += num_read;
fprintf(logfile,"read %d (%d) elements \n",num_read, total_read); fflush(logfile);
}
if (elements > 0) { // something in buffer that we can write
int num_written = fwrite(&buffer[0],sizeof(char),elements, stdout); fflush(stdout);
total_write += num_written;
fprintf(logfile,"wrote %d (%d) elements \n",num_written, total_write); fflush(logfile);
if (num_written > 0) { // copy data to top of buffer
for (int i=0; i<(elements-num_written); i++) {
buffer[i] = buffer[i+num_written];
}
elements -= num_written;
}
}
}
}
I guess I could make the gasket multi-threaded and use blocking reads in one thread and blocking writes in the other, but I would like to understand why non-blocking IO seems to break for me.
Thanks!

My general solution to any IPC project is to make the client and server non-blocking I/O. To do so requires queuing data both on writing and reading, to handle cases where the OS can't read/write, or can only read/write a portion of your message.
The code below will probably seem like EXTREME overkill, but if you get it working, you can use it the rest of your career, whether for named pipes, sockets, network, you name it.
In pseudo-code:
typedef struct {
const char* pcData, * pcToFree; // pcData may no longer point to malloc'd region
int iToSend;
} DataToSend_T;
queue of DataToSend_T qdts;
// Caller will use malloc() to allocate storage, and create the message in
// that buffer. MyWrite() will free it now, or WritableCB() will free it
// later. Either way, the app must NOT free it, and must not even refer to
// it again.
MyWrite( const char* pcData, int iToSend ) {
iSent = 0;
// Normally the OS will tell select() if the socket is writable, but if were hugely
// compute-bound, then it won't have a chance to. So let's call WritableCB() to
// send anything in our queue that is now sendable. We have to send the data in
// order, of course, so can't send the new data until the entire queue is done.
WritableCB();
if ( qdts has no entries ) {
iSent = write( pcData, iToSend );
// TODO: check error
// Did we send it all? We're done.
if ( iSent == iToSend ) {
free( pcData );
return;
}
}
// OK, either 1) we had stuff queued already meaning we can't send, or 2)
// we tried to send but couldn't send it all.
add to queue qdts the DataToSend ( pcData + iSent, pcData, iToSend - iSent );
}
WritableCB() {
while ( qdts has entries ) {
DataToSend_T* pdts = qdts head;
int iSent = write( pdts->cData, pdts->iToSend );
// TODO: check error
if ( iSent == pdts->iToSend ) {
free( pdts->pcToFree );
pop the front node off qdts
else {
pdts->pcData += iSent;
pdts->iToSend -= iSent;
return;
}
}
}
// Off-subject but I like a TINY buffer as an original value, that will always
// exercise the "buffer growth" code for almost all usage, so we're sure it works.
// If the initial buffer size is like 1M, and almost never grows, then the grow code
// may be buggy and we won't know until there's a crash years later.
int iBufSize = 1, iEnd = 0; iEnd is the first byte NOT in a message
char* pcBuf = malloc( iBufSize );
ReadableCB() {
// Keep reading the socket until there's no more data. Grow buffer if necessary.
while (1) {
int iRead = read( pcBuf + iEnd, iBufSize - iEnd);
// TODO: check error
iEnd += iRead;
// If we read less than we had space for, then read returned because this is
// all the available data, not because the buffer was too small.
if ( iRead < iBufSize - iEnd )
break;
// Otherwise, double the buffer and try reading some more.
iBufSize *= 2;
pcBuf = realloc( pcBuf, iBufSize );
}
iStart = 0;
while (1) {
if ( pcBuf[ iStart ] until iEnd-1 is less than a message ) {
// If our partial message isn't at the front of the buffer move it there.
if ( iStart ) {
memmove( pcBuf, pcBuf + iStart, iEnd - iStart );
iEnd -= iStart;
}
return;
}
// process a message, and advance iStart by the size of that message.
}
}
main() {
// Do your initial processing, and call MyWrite() to send and/or queue data.
while (1) {
select() // see man page
if ( the file handle is readable )
ReadableCB();
if ( the file handle is writable )
WritableCB();
if ( the file handle is in error )
// handle it;
if ( application is finished )
exit( EXIT_SUCCESS );
}
}

Related

Send an struct array across a FREERTOS queue

I am starting with ESP32 and FREERTOS, and I am having problems sending an Struct array across a queue. I have already sent another kind of variables but never an array of Structs and I am getting an exception.
The sender and the receiver are in different source files and I am starting to thing that maybe is the problem (or at least part of the problem).
My simplified code looks like this:
common.h
struct dailyWeather {
// Day of the week starting in Monday (1)
int dayOfWeek;
// Min and Max daily temperature
float minTemperature;
float maxTemperature;
int weather;
};
file1.h
#pragma once
#ifndef _FILE1_
#define _FILE1_
// Queue
extern QueueHandle_t weatherQueue;
#endif
file1.cpp
#include "common.h"
#include "file1.h"
// Queue
QueueHandle_t weatherQueue = xQueueCreate( 2, sizeof(dailyWeather *) ); // also tried "dailyWeather" without pointer and "Struct dailyWeather"
void task1(void *pvParameters) {
for (;;) {
dailyWeather weatherDATA[8] = {};
// Code to fill the array of structs with data
if (xQueueSend( weatherQueue, &weatherDATA, ( TickType_t ) 0 ) == pdTRUE) {
// The message was sent sucessfully
}
}
}
file2.cpp
#include "common.h"
#include "file1.h"
void task2(void *pvParameters) {
for (;;) {
dailyWeather *weatherDATA_P; // Also tried without pointer and as an array of Structs
if( xQueueReceive(weatherQueue, &( weatherDATA_P ), ( TickType_t ) 0 ) ) {
Serial.println("Received");
dailyWeather weatherDATA = *weatherDATA_P;
Serial.println(weatherDATA.dayOfWeek);
}
}
}
When I run this code on my ESP32 it works until I try to print the data with Serial.println. The "Received" message is printed, but it crash in the next Serial.println with this error.
Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.
I am locked with this problem and I am not able to find a way to fix it, so any help will be very apreciated.
EDIT:
I am thinking that maybe a solution will be just to add an order item to the struct, make the queue bigger (in number) and send all the Structs separately to the queue. Then use that order in reader to order it again.
Anyway, will be nice to learn what I am doing wrong with the above code.
freeRTOS queues operate by using the buffer and data size you specify during initialization, when you call xQueueCreate(), to make copies of the data you want to send-receive.
When you call xQueueSend(), which is equivalent to xQueueSendToBack(), it makes a copy into that buffer.
If another task is awaiting for the queue in a call to xQueueReceive(), at the moment it becomes ready to run, xQueueReceive() will make a copy of the item in front of the queue's buffer into the destination buffer you specify in your call to xQueueReceive().
If the data you want to send is of pointer/array type, dailyWeather * in your case, then you need to make sure the memory pointed to by the pointer does not get out of scope before being read by the task that receives the pointer by calling xQueueReceive(). Local variables are created in the calling task's stack and will certainly get out of scope, and very likely overwritten, after the function returns.
IMO, best solution if you really need to pass pointers is to allocate the structures array in the function that generates the data and deallocate it in the task that consumes the data.
For many scenarios it is highly desirable not to abuse of dynamic memory handling, so in several communications stacks you will find the use of buffer pools, which at the end are also queues that are initialized during application startup. Operation is approximately as follows:
Initialization:
Initialize the buffer pool queues (simple queues of pointers).
Fill the buffer pools with dynamically allocated buffers of appropriated sizes.
Initialize the queues for inter- task communications.
Task that provides the data:
Get (Receive) a buffer pointer from one buffer pool.
Fill the buffer with data.
Send the buffer pointer to the communications queue.
Task that receives the data:
Get (Receive) the data buffer pointer from the communications queue.
Use the data.
Return (Send) the buffer pointer to the buffer pool.
In case your structures are small, so you have a more or less constrained copy-then-copy overhead, it makes more sense to create the queue so you work directly with structure instances and structure copies instead of structure buffer pointers.
Firstly it's not a good idea to create the queue in the global scope like you do. A global queue handle is OK. But run xQueueCreate() in the same function that creates task1 and task2 (queue must be created before the tasks), something like this:
QueueHandle_t weatherQueue = NULL;
void main() {
weatherQueue = xQueueCreate(32, sizeof(struct dailyWeather));
if (!weatherQueue) {
// Handle error
}
if (xTaskCreate(task1, ...) != pdPASS) {
// Handle error
}
if (xTaskCreate(task2, ...) != pdPASS) {
// Handle error
}
}
Secondly, the code in task1() does the following in a loop:
Create a new array of 8 dailyWeather structs in stack (in the scope of a single loop iteration)
Copy a pointer to first item in weatherDATA[] to the queue (task2 will receive it a bit later, when it's time to switch tasks)
Release the array of 8 dailyWeather (because we're exiting loop scope)
A bit later task2() executes and tries to read the pointer to first item in weatherDATA[]. However this memory has probably been released already. You can't dereference it.
So you're passing pointers to invalid memory over the queue.
It's much, much easier to work with a queue if you just pass the data you want to send instead of a pointer. Your structure is small and consists of elementary data types, so it's a good idea to pass it over the queue in its entirety, one at a time (you can pass an entire array if you want, but this way is simpler).
Something like this:
void task1(void *pvParameters) {
for (;;) {
dailyWeather weatherDATA[8] = {};
// Code to fill the array of structs with data
for (int i = 0; i < 8; i++) {
// Copy the structs to queue, one at a time
if (xQueueSend( weatherQueue, &(weatherDATA[i]), ( TickType_t ) 0 ) == pdTRUE) {
// The message was sent successfully
}
}
}
}
On the receiver side:
void task2(void *pvParameters) {
for (;;) {
dailyWeather weatherDATA;
if( xQueueReceive(weatherQueue, &( weatherDATA ), ( TickType_t ) 0 ) ) {
Serial.println("Received");
Serial.println(weatherDATA.dayOfWeek);
}
}
}
I cannot recommend the official FreeRTOS book enough, it's a great resource for beginners.
Thanks for all for the answers.
Finally I have added a variable to track the item position and I have passed all the data through the queue to the destination task. Then I put all those structs back into another array.
common.h
#include <stdint.h>
struct dailyWeather {
// Day of the week starting in Monday (1)
uint8_t dayOfWeek;
// Min and Max daily temperature
float minTemperature;
float maxTemperature;
uint8_t weather;
uint8_t itemOrder;
};
file1.h
// Queue
extern QueueHandle_t weatherQueue;
file1.cpp
#include "common.h"
#include "file1.h"
// Queue
QueueHandle_t weatherQueue = xQueueCreate( 2 * 8, sizeof(struct dailyWeather) );
void task1(void *pvParameters) {
for (;;) {
dailyWeather weatherDATA[8];
// Code to fill the array of structs with data
for (uint8_t i = 0; i < 8; i++) {
weatherDATA[i].itemOrder = i;
if (xQueueSend( weatherQueue, &weatherDATA[i], ( TickType_t ) 0 ) == pdTRUE) {
// The message was sent sucessfully
}
}
}
}
file2.cpp
#include "common.h"
#include "file1.h"
void task2(void *pvParameters) {
dailyWeather weatherDATA_D[8];
for (;;) {
dailyWeather weatherDATA;
if( xQueueReceive(weatherQueue, &( weatherDATA ), ( TickType_t ) 0 ) ) {
Serial.println("Received");
weatherDATA_D[weatherDATA.itemOrder] = weatherDATA;
}
}
}
Best regards.

Why is my file with redirected stderr empty? [duplicate]

In the following program I print to the console using two different functions
#include <windows.h>
int main() {
HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
DWORD byteswritten;
WriteConsole(h, "WriteConsole", 12, &byteswritten, NULL);
WriteFile(h, "WriteFile", 9, &byteswritten, NULL);
}
If when I execute this program and redirect it's output using a > out.txt or a 1> out.txt nothing gets printed to the console (as expected) but the contents of out.txt are only
WriteFile
What is different between the two that allows calls to WriteFile to be redirected to the file and calls to WriteConsole to go to ... nowhere
Tested with gcc and msvc on windows 10
WriteConsole only works with console screen handles, not files nor pipes.
If you are only writing ASCII content you can use WriteFile for everything.
If you need to write Unicode characters you can use GetConsoleMode to detect the handle type, it fails for everything that is not a console handle.
When doing raw output like this you also have to deal with the BOM if the handle is redirected to a file.
This blog post is a good starting point for dealing with Unicode in the Windows console...
Edit 2021:
Windows 10 now has the ConPTY API (aka pseudo-console), which basically allows any program to act like the console for another program, thus enables capturing output that is directly written to the console.
This renders my original answer obsolete for Windows versions that support ConPTY.
Original answer:
From the reference:
WriteConsole fails if it is used with a standard handle that is
redirected to a file. If an application processes multilingual output
that can be redirected, determine whether the output handle is a
console handle (one method is to call the GetConsoleMode function and
check whether it succeeds). If the handle is a console handle, call
WriteConsole. If the handle is not a console handle, the output is
redirected and you should call WriteFile to perform the I/O.
This is only applicable if you control the source code of the application that you want to redirect. I recently had to redirect output from a closed-source application that unconditionally called WriteConsole() so it could not be redirected normally.
Reading the console screen buffer (as suggested by this answer) prooved to be unreliable, so I used Microsoft Detours library to hook the WriteConsole() API in the target process and call WriteFile() if necessary. Otherwise call the original WriteConsole() function.
I created a hook DLL based on the example of Using Detours:
#include <windows.h>
#include <detours.h>
// Target pointer for the uninstrumented WriteConsoleW API.
//
auto WriteConsoleW_orig = &WriteConsoleW;
// Detour function that replaces the WriteConsoleW API.
//
BOOL WINAPI WriteConsoleW_hooked(
_In_ HANDLE hConsoleOutput,
_In_ const VOID *lpBuffer,
_In_ DWORD nNumberOfCharsToWrite,
_Out_ LPDWORD lpNumberOfCharsWritten,
_Reserved_ LPVOID lpReserved
)
{
// Check if this actually is a console screen buffer handle.
DWORD mode;
if( GetConsoleMode( hConsoleOutput, &mode ) )
{
// Forward to the original WriteConsoleW() function.
return WriteConsoleW_orig( hConsoleOutput, lpBuffer, nNumberOfCharsToWrite, lpNumberOfCharsWritten, lpReserved );
}
else
{
// This is a redirected handle (e. g. a file or a pipe). We multiply with sizeof(WCHAR), because WriteFile()
// expects the number of bytes, but WriteConsoleW() gets passed the number of characters.
BOOL result = WriteFile( hConsoleOutput, lpBuffer, nNumberOfCharsToWrite * sizeof(WCHAR), lpNumberOfCharsWritten, nullptr );
// WriteFile() returns number of bytes written, but WriteConsoleW() has to return the number of characters written.
if( lpNumberOfCharsWritten )
*lpNumberOfCharsWritten /= sizeof(WCHAR);
return result;
}
}
// DllMain function attaches and detaches the WriteConsoleW_hooked detour to the
// WriteConsoleW target function. The WriteConsoleW target function is referred to
// through the WriteConsoleW_orig target pointer.
//
BOOL WINAPI DllMain(HINSTANCE hinst, DWORD dwReason, LPVOID reserved)
{
if (DetourIsHelperProcess()) {
return TRUE;
}
if (dwReason == DLL_PROCESS_ATTACH) {
DetourRestoreAfterWith();
DetourTransactionBegin();
DetourUpdateThread(GetCurrentThread());
DetourAttach(&(PVOID&)WriteConsoleW_orig, WriteConsoleW_hooked);
DetourTransactionCommit();
}
else if (dwReason == DLL_PROCESS_DETACH) {
DetourTransactionBegin();
DetourUpdateThread(GetCurrentThread());
DetourDetach(&(PVOID&)WriteConsoleW_orig, WriteConsoleW_hooked);
DetourTransactionCommit();
}
return TRUE;
}
Note: In the WriteFile() branch I don't write a BOM (byte order mark), because it is not always wanted (e. g. when redirecting to a pipe instead of a file or when appending to an existing file). An application that is using the DLL to redirect process output to a file can simply write the UTF-16 LE BOM on its own before launching the redirected process.
The target process is created using DetourCreateProcessWithDllExW(), specifying the name of our hook DLL as argument for the lpDllName parameter. The other arguments are identical to how you create a redirected process via the CreateProcessW() API. I won't go into detail, because these are all well documented.
The code below can be used to redirect console output if the other party uses WriteConsole. The code reads the output via a hidden console screen buffer. I've written this code to intercept debug output some directshow drivers write to the console. Directshow drivers have the habit of doing things drivers should not do, like writing unwanted logfiles, writing to console and crashing.
// info to redirected console output
typedef struct tagRedConInfo
{
// hidden console
HANDLE hCon;
// old console handles
HANDLE hOldConOut;
HANDLE hOldConErr;
// buffer to read screen content
CHAR_INFO *BufData;
INT BufSize;
//
} TRedConInfo;
//------------------------------------------------------------------------------
// GLOBALS
//------------------------------------------------------------------------------
// initial handles
HANDLE gv_hOldConOut;
HANDLE gv_hOldConErr;
//------------------------------------------------------------------------------
// PROTOTYPES
//------------------------------------------------------------------------------
/* init redirecting the console output */
BOOL Shell_InitRedirectConsole(BOOL,TRedConInfo*);
/* done redirecting the console output */
BOOL Shell_DoneRedirectConsole(TRedConInfo*);
/* read string from hidden console, then clear */
BOOL Shell_ReadRedirectConsole(TRedConInfo*,TCHAR*,INT);
/* clear buffer of hidden console */
BOOL Shell_ClearRedirectConsole(TRedConInfo*);
//------------------------------------------------------------------------------
// IMPLEMENTATIONS
//------------------------------------------------------------------------------
/***************************************/
/* init redirecting the console output */
/***************************************/
BOOL Shell_InitRedirectConsole(BOOL in_SetStdHandles, TRedConInfo *out_RcInfo)
{
/* locals */
HANDLE lv_hCon;
SECURITY_ATTRIBUTES lv_SecAttr;
// preclear structure
memset(out_RcInfo, 0, sizeof(TRedConInfo));
// prepare inheritable handle just in case an api spans an external process
memset(&lv_SecAttr, 0, sizeof(SECURITY_ATTRIBUTES));
lv_SecAttr.nLength = sizeof(SECURITY_ATTRIBUTES);
lv_SecAttr.bInheritHandle = TRUE;
// create hidden console buffer
lv_hCon = CreateConsoleScreenBuffer(
GENERIC_READ|GENERIC_WRITE, FILE_SHARE_READ|FILE_SHARE_WRITE,
&lv_SecAttr, CONSOLE_TEXTMODE_BUFFER, 0);
// failed to create console buffer?
if (lv_hCon == INVALID_HANDLE_VALUE)
return FALSE;
// store
out_RcInfo->hCon = lv_hCon;
// set as standard handles for own process?
if (in_SetStdHandles)
{
// mutex the globals
WaitForGlobalVarMutex();
// remember the old handles
out_RcInfo->hOldConOut = GetStdHandle(STD_OUTPUT_HANDLE);
out_RcInfo->hOldConErr = GetStdHandle(STD_ERROR_HANDLE);
// set hidden console as std output
SetStdHandle(STD_OUTPUT_HANDLE, lv_hCon);
SetStdHandle(STD_ERROR_HANDLE, lv_hCon);
// is this the first instance?
if (!gv_hOldConOut)
{
// inform our own console output code about the old handles so our own
// console will be writing to the real console, only console output from
// other parties will write to the hidden console
gv_hOldConOut = out_RcInfo->hOldConOut;
gv_hOldConErr = out_RcInfo->hOldConErr;
}
// release mutex
ReleaseGlobalVarMutex();
}
// done
return TRUE;
}
/***************************************/
/* done redirecting the console output */
/***************************************/
BOOL Shell_DoneRedirectConsole(TRedConInfo *in_RcInfo)
{
// validate
if (!in_RcInfo->hCon)
return FALSE;
// restore original handles?
if (in_RcInfo->hOldConOut)
{
// mutex the globals
WaitForGlobalVarMutex();
// restore original handles
SetStdHandle(STD_OUTPUT_HANDLE, in_RcInfo->hOldConOut);
SetStdHandle(STD_ERROR_HANDLE, in_RcInfo->hOldConErr);
// was this the first instance?
if (in_RcInfo->hOldConOut == gv_hOldConOut)
{
// clear
gv_hOldConOut = NULL;
gv_hOldConErr = NULL;
}
// release mutex
ReleaseGlobalVarMutex();
}
// close the console handle
CloseHandle(in_RcInfo->hCon);
// free read buffer
if (in_RcInfo->BufData)
MemFree(in_RcInfo->BufData);
// clear structure
memset(in_RcInfo, 0, sizeof(TRedConInfo));
// done
return TRUE;
}
/***********************************************/
/* read string from hidden console, then clear */
/***********************************************/
BOOL Shell_ReadRedirectConsole(TRedConInfo *in_RcInfo, TCHAR *out_Str, INT in_MaxLen)
{
/* locals */
TCHAR lv_C;
INT lv_X;
INT lv_Y;
INT lv_W;
INT lv_H;
INT lv_N;
INT lv_Len;
INT lv_Size;
INT lv_PrvLen;
COORD lv_DstSze;
COORD lv_DstOfs;
DWORD lv_Written;
SMALL_RECT lv_SrcRect;
CHAR_INFO *lv_BufData;
CONSOLE_SCREEN_BUFFER_INFO lv_Info;
// preclear output
out_Str[0] = 0;
// validate
if (!in_RcInfo->hCon)
return FALSE;
// reserve character for eos
--in_MaxLen;
// get current buffer info
if (!GetConsoleScreenBufferInfo(in_RcInfo->hCon, &lv_Info))
return FALSE;
// check whether there is something at all
if (!lv_Info.dwSize.X || !lv_Info.dwSize.Y)
return FALSE;
// limit the buffer passed onto read call otherwise it
// will fail with out-of-resources error
lv_DstSze.X = (INT16)(lv_Info.dwSize.X);
lv_DstSze.Y = (INT16)(lv_Info.dwSize.Y < 8 ? lv_Info.dwSize.Y : 8);
// size of buffer needed
lv_Size = lv_DstSze.X * lv_DstSze.Y * sizeof(CHAR_INFO);
// is previous buffer too small?
if (!in_RcInfo->BufData || in_RcInfo->BufSize < lv_Size)
{
// free old buffer
if (in_RcInfo->BufData)
MemFree(in_RcInfo->BufData);
// allocate read buffer
if ((in_RcInfo->BufData = (CHAR_INFO*)MemAlloc(lv_Size)) == NULL)
return FALSE;
// store new size
in_RcInfo->BufSize = lv_Size;
}
// always write to (0,0) in buffer
lv_DstOfs.X = 0;
lv_DstOfs.Y = 0;
// init src rectangle
lv_SrcRect.Left = 0;
lv_SrcRect.Top = 0;
lv_SrcRect.Right = lv_DstSze.X;
lv_SrcRect.Bottom = lv_DstSze.Y;
// buffer to local
lv_BufData = in_RcInfo->BufData;
// start at first string position in output
lv_Len = 0;
// loop until no more rows to read
do
{
// read buffer load
if (!ReadConsoleOutput(in_RcInfo->hCon, lv_BufData, lv_DstSze, lv_DstOfs, &lv_SrcRect))
return FALSE;
// w/h of actually read content
lv_W = lv_SrcRect.Right - lv_SrcRect.Left + 1;
lv_H = lv_SrcRect.Bottom - lv_SrcRect.Top + 1;
// remember previous position
lv_PrvLen = lv_Len;
// loop through rows of buffer
for (lv_Y = 0; lv_Y < lv_H; ++lv_Y)
{
// reset output position of current row
lv_N = 0;
// loop through columns
for (lv_X = 0; lv_X < lv_W; ++lv_X)
{
// is output full?
if (lv_Len + lv_N > in_MaxLen)
break;
// get character from screen buffer, ignore attributes
lv_C = lv_BufData[lv_Y * lv_DstSze.X + lv_X].Char.UnicodeChar;
// append character
out_Str[lv_Len + lv_N++] = lv_C;
}
// remove spaces at the end of the line
while (lv_N > 0 && out_Str[lv_Len+lv_N-1] == ' ')
--lv_N;
// if row was not blank
if (lv_N > 0)
{
// update output position
lv_Len += lv_N;
// is output not full?
if (lv_Len + 2 < in_MaxLen)
{
// append cr/lf
out_Str[lv_Len++] = '\r';
out_Str[lv_Len++] = '\n';
}
}
}
// update screen position
lv_SrcRect.Top = (INT16)(lv_SrcRect.Top + lv_H);
lv_SrcRect.Bottom = (INT16)(lv_SrcRect.Bottom + lv_H);
// until nothing is added or no more screen rows
} while (lv_PrvLen != lv_Len && lv_SrcRect.Bottom < lv_Info.dwSize.Y);
// remove last cr/lf
if (lv_Len > 2)
lv_Len -= 2;
// append eos
out_Str[lv_Len] = 0;
// total screen buffer size in characters
lv_Size = lv_Info.dwSize.X * lv_Info.dwSize.Y;
// clear the buffer with spaces
FillConsoleOutputCharacter(in_RcInfo->hCon, ' ', lv_Size, lv_DstOfs, &lv_Written);
// reset cursor position to (0,0)
SetConsoleCursorPosition(in_RcInfo->hCon, lv_DstOfs);
// done
return TRUE;
}
/**********************************/
/* clear buffer of hidden console */
/**********************************/
BOOL Shell_ClearRedirectConsole(TRedConInfo *in_RcInfo)
{
/* locals */
INT lv_Size;
COORD lv_ClrOfs;
DWORD lv_Written;
CONSOLE_SCREEN_BUFFER_INFO lv_Info;
// validate
if (!in_RcInfo->hCon)
return FALSE;
// get current buffer info
if (!GetConsoleScreenBufferInfo(in_RcInfo->hCon, &lv_Info))
return FALSE;
// clear from (0,0) onward
lv_ClrOfs.X = 0;
lv_ClrOfs.Y = 0;
// total screen buffer size in characters
lv_Size = lv_Info.dwSize.X * lv_Info.dwSize.Y;
// clear the buffer with spaces
FillConsoleOutputCharacter(in_RcInfo->hCon, ' ', lv_Size, lv_ClrOfs, &lv_Written);
// reset cursor position to (0,0)
SetConsoleCursorPosition(in_RcInfo->hCon, lv_ClrOfs);
// done
return TRUE;
}

show location of biggest memory leak on Windows

My program has big leaks. I am using the debug heap by putting this in my stdafx.h:
#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>
Then I'm capturing all the leaks in a text file by putting this code just before exit:
HANDLE hLogFile;
hLogFile = CreateFile( "T:\\MyProject\\heap.txt", GENERIC_WRITE,
FILE_SHARE_WRITE, NULL, CREATE_ALWAYS,
FILE_ATTRIBUTE_NORMAL, NULL);
_CrtSetReportMode(_CRT_WARN, _CRTDBG_MODE_FILE);
_CrtSetReportFile(_CRT_WARN, hLogFile);
_CrtDumpMemoryLeaks();
exit( EXIT_SUCCESS );
However even then the data is leak by leak, which is far too low-level information.
Stepping into _CrtDumpMemoryLeaks(), the code is actually easy to follow. I wrote my own function that summarizes the data, reporting bytes leaked for each line of code and sorting by leak size.
However it requires a static variable inside dbgheap.c in order to work. I've tried to make a version of dbgheap.c that doesn't have these as static symbols and tried to make a mini-DLL out of it (but it complains about a missing symbol I can't find anywhere in the MSFT code, _heap_regions). Instead what I've settled on is putting this code right before the code above calling _CrtDumpMemoryLeaks():
// Put a breakpoint here; step INTO the malloc, then in variable watch
// window evaluate: _CrtDumpMemoryLeakSummary( _pFirstBlock );
void* pvAccess = malloc(1);
And in turn this is the code for the _CrtDumpMemoryLeakSummary function:
#define _CRTBLD
#include "C:\Program Files\Microsoft Visual Studio 9.0\VC\crt\src\dbgint.h"
typedef struct {
const char* pszFileName;
int iLine;
int iTotal;
} Location_T;
#define MAX_SUMMARY 5000
static Location_T aloc[ MAX_SUMMARY ];
static int CompareFn( const void* pv1, const void* pv2 ) {
Location_T* ploc1 = (Location_T*) pv1;
Location_T* ploc2 = (Location_T*) pv2;
if ( ploc1->iTotal > ploc2->iTotal )
return -1;
if ( ploc1->iTotal < ploc2->iTotal )
return 1;
return 0;
}
void _CrtDumpMemoryLeakSummary( _CrtMemBlockHeader* pHead )
{
int iLocUsed = 0, iUnbucketed = 0, i;
for ( /*pHead = _pFirstBlock */;
pHead != NULL && /* pHead != _pLastBlock && */ iLocUsed < MAX_SUMMARY;
pHead = pHead->pBlockHeaderNext ) {
const char* pszFileName = pHead->szFileName ? pHead->szFileName : "<UNKNOWN>";
// Linear search is theoretically horribly slow but saves trouble of
// avoiding heap use while measuring heap use.
int i;
for ( i = 0; i < iLocUsed; i++ ) {
// To speed search, compare line number (fast) before strcmp() (slow).
// If szFileName were guaranteed to be __LINE__ then we could take advantage
// of __LINE__ always having the same address for any given file, and just
// compare pointers rather than using strcmp(). However, szFileName could
// be something else.
if ( pHead->nLine == aloc[i].iLine &&
strcmp( pszFileName, aloc[i].pszFileName ) == 0 ) {
aloc[i].iTotal += pHead->nDataSize;
break;
}
}
if ( i == iLocUsed ) {
aloc[i].pszFileName = pszFileName;
aloc[i].iLine = pHead->nLine;
aloc[i].iTotal = pHead->nDataSize;
iLocUsed++;
}
}
if ( iLocUsed == MAX_SUMMARY )
_RPT0( _CRT_WARN, "\n\n\nARNING: RAN OUT OF BUCKETS! DATA INCOMPLETE!!!\n\n\n" );
qsort( aloc, iLocUsed, sizeof( Location_T ), CompareFn );
_RPT0(_CRT_WARN, "SUMMARY OF LEAKS\n" );
_RPT0(_CRT_WARN, "\n" );
_RPT0(_CRT_WARN, "bytes leaked code location\n" );
_RPT0(_CRT_WARN, "------------ -------------\n" );
for ( i = 0; i < iLocUsed; i++ )
_RPT3(_CRT_WARN, "%12d %s:%d\n", aloc[i].iTotal, aloc[i].pszFileName, aloc[i].iLine );
}
It produces output like this:
SUMMARY OF LEAKS
bytes leaked code location
------------ -------------
912997 <UNKNOWN>:0
377800 ..\MyProject\foo.h:205
358400 ..\MyProject\A.cpp:959
333672 ..\MyProject\B.cpp:359
8192 f:\dd\vctools\crt_bld\self_x86\crt\src\_getbuf.c:58
6144 ..\MyProject\Interpreter.cpp:196
4608 ..\MyProject\Interpreter.cpp:254
3634 f:\dd\vctools\crt_bld\self_x86\crt\src\stdenvp.c:126
2960 ..\MyProject\C.cpp:947
2089 ..\MyProject\D.cpp:1031
2048 f:\dd\vctools\crt_bld\self_x86\crt\src\ioinit.c:136
2048 f:\dd\vctools\crt_bld\self_x86\crt\src\_file.c:133

Reliable way to determine file size on POSIX/OS X given a file descriptor

I wrote a function to watch a file (given an fd) growing to a certain size including a timeout. I'm using kqueue()/kevent() to wait for the file to be "extended" but after I get the notification that the file grew I have to check the file size (and compare it against the desired size). That seems to be easy but I cannot figure out a way to do that reliably in POSIX.
NB: The timeout will hit if the file doesn't grow at all for the time specified. So, this is not an absolute timeout, just a timeout that some growing happens to the file. I'm on OS X but this question is meant for "every POSIX that has kevent()/kqueue()", that should be OS X and the BSDs I think.
Here's my current version of my function:
/**
* Blocks until `fd` reaches `size`. Times out if `fd` isn't extended for `timeout`
* amount of time. Returns `-1` and sets `errno` to `EFBIG` should the file be bigger
* than wanted.
*/
int fwait_file_size(int fd,
off_t size,
const struct timespec *restrict timeout)
{
int ret = -1;
int kq = kqueue();
struct kevent changelist[1];
if (kq < 0) {
/* errno set by kqueue */
ret = -1;
goto out;
}
memset(changelist, 0, sizeof(changelist));
EV_SET(&changelist[0], fd, EVFILT_VNODE, EV_ADD | EV_ENABLE | EV_CLEAR, NOTE_DELETE | NOTE_RENAME | NOTE_EXTEND, 0, 0);
if (kevent(kq, changelist, 1, NULL, 0, NULL) < 0) {
/* errno set by kevent */
ret = -1;
goto out;
}
while (true) {
{
/* Step 1: Check the size */
int suc_sz = evaluate_fd_size(fd, size); /* IMPLEMENTATION OF THIS IS THE QUESTION */
if (suc_sz > 0) {
/* wanted size */
ret = 0;
goto out;
} else if (suc_sz < 0) {
/* errno and return code already set */
ret = -1;
goto out;
}
}
{
/* Step 2: Wait for growth */
int suc_kev = kevent(kq, NULL, 0, changelist, 1, timeout);
if (0 == suc_kev) {
/* That's a timeout */
errno = ETIMEDOUT;
ret = -1;
goto out;
} else if (suc_kev > 0) {
if (changelist[0].filter == EVFILT_VNODE) {
if (changelist[0].fflags & NOTE_RENAME || changelist[0].fflags & NOTE_DELETE) {
/* file was deleted, renamed, ... */
errno = ENOENT;
ret = -1;
goto out;
}
}
} else {
/* errno set by kevent */
ret = -1;
goto out;
}
}
}
out: {
int errno_save = errno;
if (kq >= 0) {
close(kq);
}
errno = errno_save;
return ret;
}
}
So the basic algorithm works the following way:
Set up the kevent
Check size
Wait for file growth
Steps 2 and 3 are repeated until the file reached the wanted size.
The code uses a function int evaluate_fd_size(int fd, off_t wanted_size) which will return < 0 for "some error happened or file larger than wanted", == 0 for "file not big enough yet", or > 0 for file has reached the wanted size.
Obviously this only works if evaluate_fd_size is reliable in determining file size. My first go was to implement it with off_t eof_pos = lseek(fd, 0, SEEK_END) and compare eof_pos against wanted_size. Unfortunately, lseek seems to cache the results. So even when kevent returned with NOTE_EXTEND, so the file grew, the result may be the same! Then I thought to switch to fstat but found articles that fstat caches as well.
The last thing I tried was using fsync(fd); before off_t eof_pos = lseek(fd, 0, SEEK_END); and suddenly things started working. But:
Nothing states that fsync() really solves my problem
I don't want to fsync() because of performance
EDIT: It's really hard to reproduce but I saw one case in which fsync() didn't help. It seems to take (very little) time until the file size is larger after a NOTE_EXTEND event hit user space. fsync() probably just works as a good enough sleep() and therefore it works most of the time :-.
So, in other words: How to reliably check file size in POSIX without opening/closing the file which I cannot do because I don't know the file name. Additionally, I can't find a guarantee that this would help
By the way: int new_fd = dup(fd); off_t eof_pos = lseek(new_fd, 0, SEEK_END); close(new_fd); did not overcome the caching issue.
EDIT 2: I also created an all in one demo program. If it prints Ok, success before exiting, everything went fine. But usually it prints Timeout (10000000) which manifests the race condition: The file size check for the last kevent triggered is smaller than the actual file size at this very moment. Weirdly when using ftruncate() to grow the file instead of write() it seems to work (you can compile the test program with -DUSE_FTRUNCATE to test that).
Nothing states that fsync() really solves my problem
I don't want to fsync() because of performance
Your problem isn't "fstat caching results", it's the I/O system buffering writes. Fstat doesn't get updated until the kernel flushes the I/O buffers to the underlying file system.
This is why fsync fixes your problem and any solution to your problem more or less has to do the equivalent of fsync. ( This is what the open/close solution does as a side effect. )
Can't help you with 2 because I don't see any way to avoid doing fsync.

Upper limit to UDP performance on windows server 2008

It looks like from my testing I am hitting a performance wall on my 10gb network. I seem to be unable to read more than 180-200k packets per second. Looking at perfmon, or task manager I can receive up to a million packets / second if not more. Testing 1 socket or 10 or 100, doesn't seem to change this limit of 200-300k packets a second. I've fiddled with RSS and the like without success. Unicast vs multicast doesn't seem to matter, overlapped i/o vs synchronous doesn't make a difference either. Size of packet doesn't matter either. There just seems to be a hard limit to the number of packets windows can copy from the nic to the buffer. This is a dell r410. Any ideas?
#include "stdafx.h"
#include <WinSock2.h>
#include <ws2ipdef.h>
static inline void fillAddr(const char* const address, unsigned short port, sockaddr_in &addr)
{
memset( &addr, 0, sizeof( addr ) );
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = inet_addr( address );
addr.sin_port = htons(port);
}
int _tmain(int argc, _TCHAR* argv[])
{
#ifdef _WIN32
WORD wVersionRequested;
WSADATA wsaData;
int err;
wVersionRequested = MAKEWORD( 1, 1 );
err = WSAStartup( wVersionRequested, &wsaData );
#endif
int error = 0;
const char* sInterfaceIP = "10.20.16.90";
int nInterfacePort = 0;
//Create socket
SOCKET m_socketID = socket( AF_INET, SOCK_DGRAM, IPPROTO_UDP );
//Re use address
struct sockaddr_in addr;
fillAddr( "10.20.16.90", 12400, addr ); //"233.43.202.1"
char one = 1;
//error = setsockopt(m_socketID, SOL_SOCKET, SO_REUSEADDR , &one, sizeof(one));
if( error != 0 )
{
fprintf( stderr, "%s: ERROR setsockopt returned %d.\n", __FUNCTION__, WSAGetLastError() );
}
//Bind
error = bind( m_socketID, reinterpret_cast<SOCKADDR*>( &addr ), sizeof( addr ) );
if( error == -1 )
{
fprintf(stderr, "%s: ERROR %d binding to %s:%d\n",
__FUNCTION__, WSAGetLastError(), sInterfaceIP, nInterfacePort);
}
//Join multicast group
struct ip_mreq mreq;
mreq.imr_multiaddr.s_addr = inet_addr("225.2.3.13");//( "233.43.202.1" );
mreq.imr_interface.s_addr = inet_addr("10.20.16.90");
//error = setsockopt( m_socketID, IPPROTO_IP, IP_ADD_MEMBERSHIP, reinterpret_cast<char*>( &mreq ), sizeof( mreq ) );
if (error == -1)
{
fprintf(stderr, "%s: ERROR %d trying to join group %s.\n", __FUNCTION__, WSAGetLastError(), "233.43.202.1" );
}
int bufSize = 0, len = sizeof(bufSize), nBufferSize = 10*1024*1024;//8192*1024;
//Resize the buffer
getsockopt(m_socketID, SOL_SOCKET, SO_RCVBUF, (char*)&bufSize, &len );
fprintf(stderr, "getsockopt size before %d\n", bufSize );
fprintf(stderr, "setting buffer size %d\n", nBufferSize );
error = setsockopt(m_socketID, SOL_SOCKET, SO_RCVBUF,
reinterpret_cast<const char*>( &nBufferSize ), sizeof( nBufferSize ) );
if( error != 0 )
{
fprintf(stderr, "%s: ERROR %d setting the receive buffer size to %d.\n",
__FUNCTION__, WSAGetLastError(), nBufferSize );
}
bufSize = 1234, len = sizeof(bufSize);
getsockopt(m_socketID, SOL_SOCKET, SO_RCVBUF, (char*)&bufSize, &len );
fprintf(stderr, "getsockopt size after %d\n", bufSize );
//Non-blocking
u_long op = 1;
ioctlsocket( m_socketID, FIONBIO, &op );
//Create IOCP
HANDLE iocp = CreateIoCompletionPort( INVALID_HANDLE_VALUE, NULL, NULL, 1 );
HANDLE iocp2 = CreateIoCompletionPort( (HANDLE)m_socketID, iocp, 5, 1 );
char buffer[2*1024]={0};
int r = 0;
OVERLAPPED overlapped;
memset(&overlapped, 0, sizeof(overlapped));
DWORD bytes = 0, flags = 0;
// WSABUF buffers[1];
//
// buffers[0].buf = buffer;
// buffers[0].len = sizeof(buffer);
//
// while( (r = WSARecv( m_socketID, buffers, 1, &bytes, &flags, &overlapped, NULL )) != -121 )
//sleep(100000);
while( (r = ReadFile( (HANDLE)m_socketID, buffer, sizeof(buffer), NULL, &overlapped )) != -121 )
{
bytes = 0;
ULONG_PTR key = 0;
LPOVERLAPPED pOverlapped;
if( GetQueuedCompletionStatus( iocp, &bytes, &key, &pOverlapped, INFINITE ) )
{
static unsigned __int64 total = 0, printed = 0;
total += bytes;
if( total - printed > (1024*1024) )
{
printf( "%I64dmb\r", printed/ (1024*1024) );
printed = total;
}
}
}
while( r = recv(m_socketID,buffer,sizeof(buffer),0) )
{
static unsigned int total = 0, printed = 0;
if( r > 0 )
{
total += r;
if( total - printed > (1024*1024) )
{
printf( "%dmb\r", printed/ (1024*1024) );
printed = total;
}
}
}
return 0;
}
I am using Iperf as the sender and comparing the amount of data received to the amount of data sent: iperf.exe -c 10.20.16.90 -u -P 10 -B 10.20.16.51 -b 1000000000 -p 12400 -l 1000
edit: doing iperf to iperf the performance is closer to 180k or so without dropping (8mb client side buffer). If I am doing tcp I can do about 200k packets/second. Here's what interesting though - I can do far more than 200k with multiple tcp connections, but multiple udp connections do not increase the total (I test udp performance with multiple iperfs, since a single iperf with multiple threads doesn't seem to work). All hardware acceleration is tuned on in the drivers.. It seems like udp performance is simply subpar?
I've been doing some UDP testing with similar hardware as I investigate the performance gains that can be had from using the Winsock Registered I/O network extensions, RIO, in Windows 8 Server. For this I've been running tests on Windows Server 2008 R2 and on Windows Server 8.
I've yet to get to the point where I've begun testing with our 10Gb cards (they've only just arrived) but the results of my earlier tests and the example programs used to run them can be found here on my blog.
One thing that I might suggest is that with a simple test like the one you show where there's very little work being done to each datagram you may find that old fashioned, synchronous I/O, is faster than the IOCP design. Whilst the IOCP design steps ahead as the
workload per datagram rises and you can fully utilise the multiple threads.
Also, are your test machines wired back to back (i.e. without a switch) or do they run through a switch; if so, could the issue be down to the performance of your switch rather than your test machines? If you're using a switch, or have multiple nics in the server, can you run multiple clients against the server, could the issue be on the client rather than the server?
What CPU usage are you seeing on the sending and receiving machines? Have you looked at the machine's cpu usage with Process Explorer? This is more accurate than Task Manager. Which CPU is handling the nic interrupts, can you improve things by binding these to another cpu? or changing the affinity of your test program to run on another cpu? Is your IOCP example spreading its threads across multiple NUMA nodes or are you locking all of them to one node?
I'm hoping to get to run some more tests next week and will update my answer when I have done so.
Edit: For me the problem was due to the fact that the NIC drivers had "flow control" enabled and this caused the sender to run at the speed of the receiver. This had some undesirable "non-paged pool" usage characteristics and turning off flow control allows you to see how fast the sender can go (and the difference in network utilisation between the sender and receiver clearly shows how much data is being lost). See my blog posting here for more details.

Resources