I have a simple tunnel program that needs to simultaneously block on standard input and a socket. I currently have a program that looks like this (error handling and boilerplate omitted):
HANDLE host = GetStdHandle(STD_INPUT_HANDLE);
SOCKET peer = ...; // socket(), connect()...
WSAEVENT gate = WSACreateEvent();
OVERLAPPED xfer;
ZeroMemory(&xfer, sizeof(xfer));
xfer.hEvent = gate;
WSABUF pbuf = ...; // allocate memory, set size.
// start an asynchronous transfer.
DWORD flags = 0;
WSARecv(peer, &pbuf, 1, NULL, &flags, &xfer, NULL);
while ( running )
{
// wait until standard input has available data or the event
// is signaled to inform that socket read operation completed.
HANDLE handles[2] = { host, gate };
const DWORD which = WaitForMultipleObjects
(2, handles, FALSE, INFINITE) - WAIT_OBJECT_0;
if (which == 0)
{
// read stuff from standard input.
ReadFile(host, ...);
// process stuff received from host.
// ...
}
if (which == 1)
{
// process stuff received from peer.
// ...
// start another asynchronous transfer.
DWORD flags = 0;
WSARecv(peer, &pbuf, 1, NULL, &flags, &xfer, NULL);
}
}
The program works like a charm, I can transfer stuff through this tunnel program without a hitch. The thing is that it has a subtle bug.
If I start this program in interactive mode from cmd.exe and standard input is attached to the keyboard, pressing a key that does not produce input (e.g. the Ctrl key) makes this program block and ignore data received on the socket. I managed to realize that this is because pressing any key signals the standard input handle and WaitForMultipleObjects() returns. As expected, control enters the if (which == 0) block and the call to ReadFile() blocks because there is no input available.
Is there a means to detect how much input is available on a Win32 stream? If so, I could use this to check if any input is available before calling ReadFile() to avoid blocking.
I know of a few solutions for specific types of streams (notably ClearCommError() for serial ports and ioctlsocket(socket,FIONREAD,&count) for sockets), but none that I know of works with the CONIN$ stream.
Use overlapped I/O. Then test the event attached to the I/O operation, instead of the handle.
For CONIN$ specifically, you might also look at the Console Input APIs, such as PeekConsoleInput and GetNumberOfConsoleInputEvents.
But I really recommend using OVERLAPPED (background) reads wherever possible and not trying to treat WaitForMultipleObjects like select.
Since the console can't be opened in overlapped mode, your simplest options are to wait on the console handle and use ReadConsoleInput (then you have to process control sequences manually), or to spawn a dedicated worker thread for synchronous ReadFile. If you choose a worker thread, you may then want to connect a pipe between that worker and the main I/O loop, using overlapped pipe reads.
Another possibility, which I've never tried, would be to wait on the console handle and use PeekConsoleInput to find out whether to call ReadFile or ReadConsoleInput. That way you should be able to get non-blocking along with the cooked terminal processing. OTOH, passing control sequences to ReadConsoleInput might inhibit the buffer-manipulation actions they were supposed to take.
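A minimal sketch of that peek-before-read idea, untested against a real tunnel. The helper name console_has_char_input is made up for illustration. It assumes that dropping events which carry no character (modifier keys, mouse, focus, resize) is acceptable, which is exactly the trade-off mentioned above: in cooked mode, draining such records may interfere with line editing. Also note that in line-input mode ReadFile still blocks until Enter is pressed, so this only filters out events that can never produce input.

```c
#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper. Returns 1 if a key-down event carrying a character
   is pending on the console input handle, 0 if the queue holds nothing of
   that kind, and -1 if the console API call fails. Records without a
   character are removed so the handle stops being signaled for them. */
static int console_has_char_input(HANDLE con)
{
    INPUT_RECORD rec;
    DWORD n = 0;

    for (;;) {
        if (!PeekConsoleInputW(con, &rec, 1, &n))
            return -1;
        if (n == 0)
            return 0;                   /* input queue is empty */
        if (rec.EventType == KEY_EVENT &&
            rec.Event.KeyEvent.bKeyDown &&
            rec.Event.KeyEvent.uChar.UnicodeChar != 0)
            return 1;                   /* character input is waiting */
        /* Not character input: discard it and inspect the next record. */
        if (!ReadConsoleInputW(con, &rec, 1, &n))
            return -1;
    }
}
#endif
```

In the loop from the question, the call to ReadFile() would then be guarded by something like if (which == 0 && console_has_char_input(host) == 1).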
If the two streams are processed independently, or nearly so, it may make more sense to start a thread for each one. Then you can use a blocking read from standard input.
I got stuck in the interrupt part while learning AVR.
Datasheet says about RXCn flag:
"This flag bit is set when there are unread data in the receive buffer and cleared when the receive buffer is empty
(i.e., does not contain any unread data)."
and there is an example of receiving a character over the UART:
while ( !(UCSRnA & (1<<RXCn)) );
/* Get and return received data from buffer */
return UDRn;
Will it wait here forever until the data comes from the Uart? And will mcu not be able to do any other work because of "while(1);"?
I know this method is polling and I also know that there is an interrupt method but will the mcu be locked because of this?
As @AterLux already said, the program will halt until data is received. There are other ways to fetch the data without blocking, e.g.:
char uart_get(char *data)
{
if (UCSRnA & (1<<RXCn))
{
*data = UDRn;
return 1;
}
return 0;
}
If no data has been received you will get 0 and can continue with the program. Whether you should use interrupt handling or polling depends on your problem. With interrupt handling you can, for example, use a circular buffer to store received data and read it when you need it. If you are only waiting for a single value, polling is also an option.
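A minimal sketch of the circular-buffer idea, in plain C so it can be tried on a PC as well. The names rx_put, rx_get and RX_BUF_SIZE are made up for illustration, and it assumes a single producer (the receive interrupt) and a single consumer (the main loop). On the AVR, the receive ISR for your device would simply call rx_put(UDRn); the exact vector name depends on the chip, so it is left out here.

```c
#include <stdint.h>

#define RX_BUF_SIZE 16u   /* assumed size; must be a power of two */

static volatile uint8_t rx_buf[RX_BUF_SIZE];
static volatile uint8_t rx_head;  /* written only by the producer (ISR) */
static volatile uint8_t rx_tail;  /* written only by the consumer (main) */

/* Called by the producer with a freshly received byte.
   Returns 1 if stored, 0 if the buffer was full and the byte was dropped. */
static int rx_put(uint8_t byte)
{
    uint8_t next = (uint8_t)((rx_head + 1u) & (RX_BUF_SIZE - 1u));
    if (next == rx_tail)
        return 0;               /* full: one slot is always kept empty */
    rx_buf[rx_head] = byte;
    rx_head = next;
    return 1;
}

/* Called by the consumer; never blocks.
   Returns the oldest byte (0..255), or -1 if nothing is buffered. */
static int rx_get(void)
{
    uint8_t byte;
    if (rx_tail == rx_head)
        return -1;
    byte = rx_buf[rx_tail];
    rx_tail = (uint8_t)((rx_tail + 1u) & (RX_BUF_SIZE - 1u));
    return byte;
}
```

The main loop can then call rx_get() whenever it likes and carry on with other work when it gets -1.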
Yes. It will wait for as long as the condition (!(UCSRnA & (1<<RXCn))) is fulfilled, i.e. it will wait until UCSRnA has the bit RXCn set.
If the Global Interrupt Flag (the I flag in the SREG register) is not cleared (by calling cli(), or by entering an interrupt handler), then interrupts are still able to run and all the peripherals (counters, SPI, TWI, etc.) continue to work during this loop. Of course, the code after the loop will not execute until the loop exits.
I have searched widely, I am writing a network filter and I am putting my registry filter in the same driver. Can I call multiple IOCTL's of the same driver at the same time? Would it be better if I separated my network filter and registry filter?
Open the device using FILE_FLAG_OVERLAPPED.
Then, when sending the IOCTL, use the Overlapped argument. Then the call will return immediately (async) and you can either wait (using WaitForSingleObject), call more things, or do whatever. Beware that the way the data is returned may have some gotchas in this case, depending on the use case.
OVERLAPPED async_data = { 0 };
async_data.hEvent = event_handle;
// An overlapped call normally returns FALSE with ERROR_IO_PENDING; that is not a failure.
if (DeviceIoControl(hDevice, dwIoControlCode, lpInBuffer, nInBufferSize, lpOutBuffer, nOutBufferSize, lpBytesReturned, &async_data)
|| GetLastError() == ERROR_IO_PENDING)
{
// do stuff, more DeviceIoControl if you want
// We wait until it finishes
WaitForSingleObject(async_data.hEvent, INFINITE);
}
else
{
// Handle error
}
Is there a special "wait for event" function that can wait for 3 queues at the same time at device side so it doesn't wait for all queues serially from host side?
Is there a checkpoint command to send into a command queue such that it must wait for other command queues to hit same(vertically) barrier/checkpoint to wait and continue from device side so no host-side round-trip is needed?
For now, I tried two different versions:
clWaitForEvents(3, evt_);
and
int evtStatus0 = 0;
clGetEventInfo(evt_[0], CL_EVENT_COMMAND_EXECUTION_STATUS,
sizeof(cl_int), &evtStatus0, NULL);
while (evtStatus0 > 0)
{
clGetEventInfo(evt_[0], CL_EVENT_COMMAND_EXECUTION_STATUS,
sizeof(cl_int), &evtStatus0, NULL);
Sleep(0);
}
int evtStatus1 = 0;
clGetEventInfo(evt_[1], CL_EVENT_COMMAND_EXECUTION_STATUS,
sizeof(cl_int), &evtStatus1, NULL);
while (evtStatus1 > 0)
{
clGetEventInfo(evt_[1], CL_EVENT_COMMAND_EXECUTION_STATUS,
sizeof(cl_int), &evtStatus1, NULL);
Sleep(0);
}
int evtStatus2 = 0;
clGetEventInfo(evt_[2], CL_EVENT_COMMAND_EXECUTION_STATUS,
sizeof(cl_int), &evtStatus2, NULL);
while (evtStatus2 > 0)
{
clGetEventInfo(evt_[2], CL_EVENT_COMMAND_EXECUTION_STATUS,
sizeof(cl_int), &evtStatus2, NULL);
Sleep(0);
}
The second one is a bit faster (I saw it from someone else), and both are executed after 3 flush commands.
Looking at CodeXL profiler results, first one waits longer between finish points and some operations don't even seem to be overlapping. Second one shows 3 finish points are all within 3 milliseconds so it is faster and longer parts are overlapped(read+write+compute at the same time).
If there is a way to achieve this with only 1 wait command from the host side, there must be a "flush" version of it too, but I couldn't find one.
Is there any way to achieve below picture instead of adding flushes between each pipeline step?
queue1 write checkpoint write checkpoint write
queue2 - compute checkpoint compute checkpoint compute
queue3 - checkpoint read checkpoint read
all checkpoints have to be vertically synchronized and all these actions must not start until a signal is given. Such as:
queue1.ndwrite(...);
queue1.ndcheckpoint(...);
queue1.ndwrite(...);
queue1.ndcheckpoint(...);
queue1.ndwrite(...);
queue2.ndrangekernel(...);
queue2.ndcheckpoint(...);
queue2.ndrangekernel(...);
queue2.ndcheckpoint(...);
queue2.ndrangekernel(...);
queue3.ndread(...);
queue3.ndcheckpoint(...);
queue3.ndread(...);
queue3.ndcheckpoint(...);
queue3.ndread(...);
queue1.flush()
queue2.flush()
queue3.flush()
queue1.finish()
queue2.finish()
queue3.finish()
checkpoints are all handled in device side and only 3 finish commands are needed from host side(even better,only 1 finish for all queues?)
How I bind 3 queues to 3 events with "clWaitForEvents(3, evt_);" for now is:
hCommandQueue->commandQueue.enqueueBarrierWithWaitList(NULL, &evt[0]);
hCommandQueue2->commandQueue.enqueueBarrierWithWaitList(NULL, &evt[1]);
hCommandQueue3->commandQueue.enqueueBarrierWithWaitList(NULL, &evt[2]);
if this "enqueue barrier" can talk with other queues, how could I achieve that? Do I need to keep host-side events alive until all queues are finished or can I delete them or re-use them later? From the documentation, it seems like first barrier's event can be put to second queue and second one's barrier event can be put to third one along with first one's event so maybe it is like:
hCommandQueue->commandQueue.enqueueBarrierWithWaitList(NULL, &evt[0]);
hCommandQueue2->commandQueue.enqueueBarrierWithWaitList(evt_0, &evt[1]);
hCommandQueue3->commandQueue.enqueueBarrierWithWaitList(evt_0_and_1, &evt[2]);
in the end wait for only evt[2] maybe or using only 1 same event for all:
hCommandQueue->commandQueue.enqueueBarrierWithWaitList(sameEvt, &evt[0]);
hCommandQueue2->commandQueue.enqueueBarrierWithWaitList(sameEvt, &evt[1]);
hCommandQueue3->commandQueue.enqueueBarrierWithWaitList(sameEvt, &evt[2]);
where to get sameEvt object?
Has anyone tried this? Should I start all queues with a barrier so they don't start until I raise some event from the host side, or can the lazy execution of "enqueue" be trusted 100% not to start until I flush/finish them? How do I raise an event from host to device (sameEvt doesn't have a "raise" function; is it clCreateUserEvent)?
All 3 queues are in-order type and are in same context. Out-of-order type is not supported by all graphics cards. C++ bindings are being used.
Also there are enqueueWaitList(is this deprecated?) and clEnqueueMarker but I don't know how to use them and documentation doesn't have any example in Khronos' website.
You asked too many questions and described too many variants for me to give a single solution, so I will try to answer in general terms so that you can figure out the most suitable one.
If the queues are bound to the same context (possibly to different devices within the same context), then it is possible to synchronize them through events. I.e. you can obtain an event from a command submitted to one queue and use this event to synchronize a command submitted to another queue, e.g.
queue1.enqueue(comm1, /*dependency*/ NULL, /*result event*/ &e1);
queue2.enqueue(comm2, /*dependency*/ &e1, /*result event*/ NULL);
In this example, comm2 will wait for comm1 completion.
If you need to enqueue commands first but not allow them to execute, you can create a user event (clCreateUserEvent) and signal it manually (clSetUserEventStatus). The implementation is allowed to process commands as soon as they are enqueued (the driver is not required to wait for the flush).
The barrier seems overkill for your purpose because it waits for all commands previously submitted to the queue. You can instead use clEnqueueMarker (clEnqueueMarkerWithWaitList in OpenCL 1.2 and later), which can wait for a list of events and provides one event to be used by other commands.
As far as I know, you can release the event at any moment once you no longer need it. The implementation should prolong the event's lifetime if that is required for internal purposes.
I do not know what is enqueueWaitList.
Off-topic: if you need non-trivial dependencies between calculations, you may want to consider TBB flow graph and opencl_node. The opencl_node uses events for synchronization and avoids "host-device" synchronizations where possible. However, it can be tricky to use multiple queues for the same device.
As far as I know, Intel HD Graphics 530 supports out-of-order queues (at least host-side).
You are making it much harder than it needs to be. On the write queue take an event. Use that as a condition for the compute on the compute queue, and take another event. Use that as a condition on the read on the read queue. There is no reason to force any other synchronization. Note: My interpretation of the spec is that you must clFlush on a queue that you took an event from before using that event as a condition on another queue.
The problem:
To design an efficient and very fast named-pipes client server framework.
Current state:
I already have a battle-proven, production-tested framework. It is fast; however, it uses one thread per pipe connection, and if there are many clients the number of threads can quickly become too high. I already use a smart thread pool (a task pool, in fact) that can scale with need.
I already use OVERLAPPED mode for pipes, but then I block with WaitForSingleObject or WaitForMultipleObjects, which is why I need one thread per connection on the server side.
Desired solution:
Client is fine as it is, but on the server side I would like to use one thread only per client request and not per connection. So instead of using one thread for the whole lifecycle of client (connect / disconnect) I would use one thread per task. So only when client requests data and no more.
I saw an example on MSDN that uses an array of OVERLAPPED structures and then uses WaitForMultipleObjects to wait on them all. I find this a bad design, and I see two problems. First, you have to maintain an array that can grow quite large, and deletions will be costly. Second, you have a lot of events, one for each array member.
I also saw completion ports, like CreateIoCompletionPort and GetQueuedCompletionStatus, but I don't see how they are any better.
What I would like is something like what ReadFileEx and WriteFileEx do: they call a callback routine when the operation is completed. This is a true async style of programming. But the problem is that ConnectNamedPipe does not support that, and furthermore I saw that the thread needs to be in an alertable state, and you need to call some of the *Ex functions to get that.
So how is such a problem best solved?
Here is how MSDN does it: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365603(v=vs.85).aspx
The problem I see with this approach is that I can't see how you could have 100 clients connected at once if the limit to WaitForMultipleObjects is 64 handles. Sure I can disconnect the pipe after each request, but the idea is to have a permanent client connection just like in TCP server and to track the client through whole life-cycle with each client having unique ID and client specific data.
The ideal pseudo code should be like this:
repeat
// wait for the connection or for one client to send data
Result = ConnectNamedPipe or ReadFile or Disconnect;
case Result of
CONNECTED: CreateNewClient; // we create a new client
DATA: AssignWorkerThread; // here we process client request in a thread
DISCONNECT: CleanupAndDeleteClient // release the client object and data
end;
until Aborted;
This way we have only one listener thread that accepts connect / disconnect / onData events. The thread pool (worker threads) only processes the actual requests. This way 5 worker threads can serve a lot of connected clients.
P.S.
My current code should not be important. I code this in Delphi, but it's pure WinAPI, so the language does not matter.
EDIT:
For now IOCP looks like the solution:
I/O completion ports provide an efficient threading model for
processing multiple asynchronous I/O requests on a multiprocessor
system. When a process creates an I/O completion port, the system
creates an associated queue object for requests whose sole purpose is
to service these requests. Processes that handle many concurrent
asynchronous I/O requests can do so more quickly and efficiently by
using I/O completion ports in conjunction with a pre-allocated thread
pool than by creating threads at the time they receive an I/O request.
If a server must handle more than 64 events (reads/writes), then any solution using WaitForMultipleObjects becomes unfeasible. This is the reason Microsoft introduced I/O completion ports to Windows. They can handle a very high number of I/O operations using the most appropriate number of threads (usually the number of processors/cores).
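To give a feel for the shape of the API, here is a rough sketch of the core of a completion-port loop. It is not a complete pipe server: the names iocp_attach, iocp_worker and iocp_demo are made up for illustration, the completion key is assumed to identify a client (with 0 reserved as a shutdown signal), and the demo posts packets by hand with PostQueuedCompletionStatus instead of doing real pipe I/O.

```c
#ifdef _WIN32
#include <windows.h>

/* Bind an overlapped handle (e.g. a pipe instance) to the port.
   Completions for that handle will carry `key`. */
static BOOL iocp_attach(HANDLE port, HANDLE h, ULONG_PTR key)
{
    return CreateIoCompletionPort(h, port, key, 0) == port;
}

/* Body of one pool thread: dequeue completions until a packet with
   key 0 arrives. Returns the number of completions handled. */
static unsigned iocp_worker(HANDLE port)
{
    unsigned handled = 0;
    for (;;) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        OVERLAPPED *ov = NULL;
        BOOL ok = GetQueuedCompletionStatus(port, &bytes, &key, &ov, INFINITE);
        if (!ok && ov == NULL)
            break;      /* nothing was dequeued: the port failed or was closed */
        if (key == 0)
            break;      /* shutdown packet (a convention of this sketch) */
        /* ok == FALSE with ov != NULL means that one operation failed,
           e.g. the client went away; a real server would clean up here. */
        (void)bytes;
        handled++;
    }
    return handled;
}

/* Self-contained demonstration: no pipes, just hand-posted packets. */
static unsigned iocp_demo(void)
{
    HANDLE port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
    unsigned handled;
    if (port == NULL)
        return 0;
    PostQueuedCompletionStatus(port, 0, 1, NULL);   /* pretend client 1 */
    PostQueuedCompletionStatus(port, 0, 2, NULL);   /* pretend client 2 */
    PostQueuedCompletionStatus(port, 0, 0, NULL);   /* key 0: shut down */
    handled = iocp_worker(port);
    CloseHandle(port);
    return handled;
}
#endif
```

In a real server each pipe instance would be passed to iocp_attach once, every ConnectNamedPipe/ReadFile/WriteFile would be issued with its own OVERLAPPED, and several threads would run iocp_worker on the same port.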
The problem with IOCP is that it is very difficult to implement correctly. Hidden issues are spread like mines in a field: [1], [2] (section 3.6). I would recommend using some framework. A little googling suggests something called Indy for Delphi developers. There may be others.
At this point I would disregard the requirement for named pipes if that means coding my own IOCP implementation. It's not worth the grief.
I think what you're overlooking is that you only need a few listening named pipe instances at any given time. Once a pipe instance has connected, you can spin that instance off and create a new listening instance to replace it.
With MAXIMUM_WAIT_OBJECTS (or fewer) listening named pipe instances, you can have a single thread dedicated to listening using WaitForMultipleObjectsEx. The same thread can also handle the rest of the I/O using ReadFileEx and WriteFileEx and APCs. The worker threads would queue APCs to the I/O thread in order to initiate I/O, and the I/O thread can use the task pool to return the results (as well as letting the worker threads know about new connections).
The I/O thread main function would look something like this:
create_events();
for (index = 0; index < MAXIMUM_WAIT_OBJECTS; index++) new_pipe(index);
for (;;)
{
if (service_stopping && active_instances == 0) break;
result = WaitForMultipleObjectsEx(MAXIMUM_WAIT_OBJECTS, connect_events,
FALSE, INFINITE, TRUE);
if (result == WAIT_IO_COMPLETION)
{
continue;
}
else if (result >= WAIT_OBJECT_0 &&
result < WAIT_OBJECT_0 + MAXIMUM_WAIT_OBJECTS)
{
index = result - WAIT_OBJECT_0;
ResetEvent(connect_events[index]);
if (GetOverlappedResult(
connect_handles[index], &connect_overlapped[index],
&byte_count, FALSE))
{
err = ERROR_SUCCESS;
}
else
{
err = GetLastError();
}
connect_pipe_completion(index, err);
continue;
}
else
{
fail();
}
}
The only real complication is that when you call ConnectNamedPipe it may return ERROR_PIPE_CONNECTED to indicate that the call succeeded immediately or an error other than ERROR_IO_PENDING if the call failed immediately. In that case you need to reset the event and then handle the connection:
void new_pipe(ULONG_PTR dwParam)
{
DWORD index = (DWORD)dwParam;
DWORD err;
connect_handles[index] = CreateNamedPipe(
pipe_name,
PIPE_ACCESS_DUPLEX | FILE_FLAG_OVERLAPPED,
PIPE_TYPE_MESSAGE | PIPE_WAIT | PIPE_ACCEPT_REMOTE_CLIENTS,
MAX_INSTANCES,
512,
512,
0,
NULL);
if (connect_handles[index] == INVALID_HANDLE_VALUE) fail();
ZeroMemory(&connect_overlapped[index], sizeof(OVERLAPPED));
connect_overlapped[index].hEvent = connect_events[index];
if (ConnectNamedPipe(connect_handles[index], &connect_overlapped[index]))
{
err = ERROR_SUCCESS;
}
else
{
err = GetLastError();
if (err == ERROR_SUCCESS) err = ERROR_INVALID_FUNCTION;
if (err == ERROR_PIPE_CONNECTED) err = ERROR_SUCCESS;
}
if (err != ERROR_IO_PENDING)
{
ResetEvent(connect_events[index]);
connect_pipe_completion(index, err);
}
}
The connect_pipe_completion function would create a new task in the task pool to handle the newly connected pipe instance, and then queue an APC to call new_pipe to create a new listening pipe at the same index.
It is possible to reuse existing pipe instances once they are closed but in this situation I don't think it's worth the hassle.
I'm interested in the behavior of send function when using a blocking socket.
The manual specifies nothing about this case explicitly.
From my tests (and documentation) it results that when using send on a blocking socket I have 2 cases:
all the data is sent
an error is returned and nothing is sent
In lines of code (in C, for example) this translates to:
// everything is allocated and initialized
int socket_fd;
char *buffer;
size_t buffer_len;
ssize_t nret;
nret = send(socket_fd, buffer, buffer_len, 0);
if(nret < 0)
{
// error - nothing was sent (at least we cannot assume anything)
}
else
{
// in case of blocking socket everything is sent (buffer_len == nret)
}
Am I right?
I'm interested in this behavior on all platforms (Windows, Linux, *nix).
From the man page. (http://linux.die.net/man/2/send)
"On success, these calls return the number of characters sent. On error, -1 is returned, and errno is set appropriately. "
You have three conditions.
-1 is a local error in the socket or its binding.
Some number < the length: not all the bytes were sent. This is usually the case when the socket is marked non-blocking and the requested operation would block; the errno value is EAGAIN.
You probably won't see this because you're doing blocking I/O.
However, the other end of the socket could close the connection prematurely, which may lead to this. The errno value would probably be EPIPE.
Some number == the length: all the bytes were sent.
My understanding is that a blocking send need not be atomic, see for example the Solaris send man page:
For socket types such as SOCK_DGRAM and SOCK_RAW that require atomic messages,
the error EMSGSIZE is returned and the message is not transmitted when it is
too long to pass atomically through the underlying protocol. The same
restrictions do not apply to SOCK_STREAM sockets.
And also look at the EINTR error code there:
The operation was interrupted by delivery of a signal before any data could
be buffered to be sent.
Which indicates that send can be interrupted after some data has been buffered to be sent - but in that case send would return the number of bytes that have already been buffered to be sent (instead of an EINTR error code).
In practice I would only expect to see this behaviour for large messages (that can not be handled atomically by the operating system) on SOCK_STREAM sockets.
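Given that, the usual defensive pattern is to loop until everything has been handed to the kernel. Here is a minimal sketch for POSIX systems; the name send_all is made up for illustration. On Winsock the idea is the same, but send() takes an int length and returns an int (SOCKET_ERROR on failure), so the types would need adjusting.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: keeps calling send() until all len bytes have been
   accepted by the kernel. Returns 0 on success, or -1 on error with errno
   set by send(). On error, some of the data may already have been sent. */
static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data was buffered */
            return -1;
        }
        p   += n;               /* short count: advance and try again */
        len -= (size_t)n;
    }
    return 0;
}
```

With this in place the caller only has to distinguish success from failure, whether or not the platform ever returns a short count from a blocking send.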