Serial I/O Overlapped/Non-Overlapped with Windows/Windows CE

I'm sorry this isn't much of a question; it's more of a write-up to help people having problems with these particular things. The project I'm working on requires serial I/O and primarily runs under Windows CE 6.0, but I was recently asked whether the application could be made to work under Windows too, so I set about solving that problem. I spent quite a lot of time looking around to see if anyone had the answers I was looking for, and much of what I found was misinformation or, in some instances, just plain wrong. Having solved the problem, I thought I'd share my findings so that anyone encountering these difficulties has answers.
Under Windows CE, OVERLAPPED I/O is NOT supported. This means that bi-directional communication through the serial port can be quite troublesome. The main problem is that while you are waiting on data from the serial port, you cannot send data, because doing so will block your main thread until the read operation completes or times out (depending on whether you've set timeouts up).
Like most people doing serial I/O, I had a serial reader thread set up for reading the serial port, which used WaitCommEvent() with an EV_RXCHAR mask to wait for serial data. Now this is where the difficulty arises between Windows and Windows CE.
If I have a simple reader thread like this, as an example:
UINT SimpleReaderThread(LPVOID thParam)
{
    DWORD eMask;
    // thParam is assumed to hold the opened handle to the comm port
    WaitCommEvent((HANDLE)thParam, &eMask, NULL);
    MessageBox(NULL, TEXT("Thread Exited"), TEXT("Hello"), MB_OK);
    return 0; // the thread function must return a value
}
Obviously in the above example I'm not reading the data from the serial port or anything, and I'm assuming that thParam contains the opened handle to the comm port. Now, the problem under Windows is that when your thread executes and hits the WaitCommEvent(), your reader thread will go to sleep waiting for serial port data. Okay, that's fine and as it should be, but... how do you end this thread and get the MessageBox() to appear? Well, as it turns out, it's not actually that easy, and this is a fundamental difference between Windows CE and Windows in the way they do their serial I/O.
Under Windows CE, you can do a couple of things to make WaitCommEvent() fall through, such as SetCommMask(COMMPORT_HANDLE, 0) or even CloseHandle(COMMPORT_HANDLE). Either lets you properly terminate your thread and therefore release the serial port so you can start sending data again. However, neither of these works under Windows: both will cause the thread you call them from to sleep waiting on the completion of WaitCommEvent(). So how do you end WaitCommEvent() under Windows? Ordinarily you'd use OVERLAPPED I/O and the thread blocking wouldn't be an issue, but since the solution has to be compatible with Windows CE as well, OVERLAPPED I/O isn't an option. There is one thing you can do under Windows to end WaitCommEvent(): call CancelSynchronousIo(). This will end your WaitCommEvent(), though be aware it can be device dependent. The bigger problem with CancelSynchronousIo() is that it isn't supported by Windows CE either, so you're out of luck using it for this problem!
So how do you do it? The fact is, to solve this problem you simply can't use WaitCommEvent(), as there is no way to terminate that function on Windows that is also supported by Windows CE. That leaves you with ReadFile(), which again will block while it is reading with non-overlapped I/O, but which DOES work with comm timeouts.
Using ReadFile() and a COMMTIMEOUTS structure does mean that you have to have a tight loop waiting for your serial data, but if you're not receiving large amounts of serial data it shouldn't be a problem. An event for ending your loop, combined with a small timeout, will also ensure that resources are passed back to the system and that you're not hammering the processor at 100% load. Below is the solution I came up with; I would appreciate some feedback if you think it could be improved.
typedef struct
{
    UINT8 sync;
    UINT8 op;
    UINT8 dev;
    UINT8 node;
    UINT8 data;
    UINT8 csum;
} COMMDAT;
COMSTAT cs = {0};
DWORD byte_count;
COMMDAT cd;
ZeroMemory(&cd, sizeof(COMMDAT));
bool recv = false;
do
{
    // check how many bytes are queued without blocking
    ClearCommError(comm_handle, NULL, &cs);
    if (cs.cbInQue == sizeof(COMMDAT))
    {
        // a complete packet is available, so this read returns at once
        ReadFile(comm_handle, &cd, sizeof(COMMDAT), &byte_count, NULL);
        recv = true;
    }
    // the 2 ms wait yields the CPU; signalling event_handle ends the loop
} while ((WaitForSingleObject(event_handle, 2) != WAIT_OBJECT_0) && !recv);
ThreadExit(recv ? cd.data : 0xFF); // ThreadExit: the author's thread-exit helper
So to end the thread you just signal the event in event_handle. That allows you to exit the thread properly and clean up resources, and it works correctly on both Windows and Windows CE.
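For reference, here is a minimal sketch, under my own assumptions rather than from the original post, of how comm_handle and event_handle might be set up before the loop above runs; the timeout values are illustrative, not tuned:

// Sketch only: open the port non-overlapped (valid on both OSes) and make
// ReadFile effectively non-blocking. comm_handle/event_handle match the loop above.
HANDLE comm_handle = CreateFile(TEXT("COM1:"), GENERIC_READ | GENERIC_WRITE,
                                0, NULL, OPEN_EXISTING, 0, NULL);

COMMTIMEOUTS to = {0};
to.ReadIntervalTimeout = MAXDWORD;  // with the two totals at zero, ReadFile
to.ReadTotalTimeoutMultiplier = 0;  // returns immediately with whatever is
to.ReadTotalTimeoutConstant = 0;    // already in the driver's receive buffer
SetCommTimeouts(comm_handle, &to);

// manual-reset event; signal it from another thread to stop the reader
HANDLE event_handle = CreateEvent(NULL, TRUE, FALSE, NULL);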
Hope that helps everyone who has had difficulty with this problem.

Since I think there was a misunderstanding in my comment above, here's more detail on two possible solutions that don't use a tight loop. Note that these use runtime determination and are therefore fine under both OSes (though you have to compile for each target separately anyway), and since neither uses an #ifdef it's less likely to end up breaking the build on one side or the other without you noticing immediately.
First, you could dynamically load CancelSynchronousIo and use it when it's present in the OS, even optionally doing something instead of the cancel on CE (like maybe closing the handle?):
// Note: per MSDN, CancelSynchronousIo takes the handle of the *thread* that
// is blocked in the synchronous call, not the port handle.
typedef BOOL (WINAPI *CancelSynchronousIoFn)(HANDLE hThread);

HANDLE hPort;

// stub for WinCE, where closing the port handle unblocks the waiting thread
BOOL WINAPI CancelStub(HANDLE hThread)
{
    UNREFERENCED_PARAMETER(hThread);
    return CloseHandle(hPort);
}

void IoWithCancel(HANDLE hBlockedThread)
{
    // GetProcAddress always takes an ANSI string, even in a Unicode build
    CancelSynchronousIoFn cancelFcn = (CancelSynchronousIoFn)GetProcAddress(
        GetModuleHandle(_T("kernel32.dll")),
        "CancelSynchronousIo");

    // if for some reason you want something to happen in CE
    if (cancelFcn == NULL)
    {
        cancelFcn = CancelStub;
    }

    hPort = CreateFile( /* blah, blah */ );

    // do my I/O

    if (cancelFcn != NULL)
    {
        cancelFcn(hBlockedThread);
    }
}
The other option, which takes a bit more work since you're likely going to have different threading models (though if you're using C++, it would be an excellent case for separate classes based on platform anyway), is to determine the platform at runtime and use overlapped I/O on the desktop:
HANDLE hPort;

void IoWithOverlapped()
{
    DWORD overlapped = 0;
    OSVERSIONINFO version = {0};
    // the size member must be set *before* calling GetVersionEx
    version.dwOSVersionInfoSize = sizeof(OSVERSIONINFO);
    GetVersionEx(&version);
    if ((version.dwPlatformId == VER_PLATFORM_WIN32_WINDOWS)
        || (version.dwPlatformId == VER_PLATFORM_WIN32_NT))
    {
        overlapped = FILE_FLAG_OVERLAPPED;
    }
    else
    {
        // create a receive thread
    }
    hPort = CreateFile(
        _T("COM1:"),
        GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        overlapped,
        NULL);
}
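For the desktop branch, here's a minimal sketch, on my own assumptions rather than from the original answer, of what waiting on an overlapped read might look like once the port is open:

// Sketch only: issue an overlapped read and wait on its (manual-reset) event.
void OverlappedReadSketch(HANDLE hPort)
{
    OVERLAPPED ov = {0};
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL); // manual-reset, per MSDN
    BYTE buf[64];
    DWORD bytesRead = 0;

    if (!ReadFile(hPort, buf, sizeof(buf), NULL, &ov)
        && GetLastError() == ERROR_IO_PENDING)
    {
        // the read is in flight; another thread is free to WriteFile, and a
        // second event could be added to the wait to abort cleanly
        WaitForSingleObject(ov.hEvent, INFINITE);
        GetOverlappedResult(hPort, &ov, &bytesRead, FALSE);
    }
    CloseHandle(ov.hEvent);
}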

Related

Linux Device Drivers 3Ed File IO & How to Influence Scheduling with Explanatory UML Diagrams

I've used UMLet to draw some UML diagrams describing various entity relationships for each of the chapters of Linux Device Drivers 3Ed (LDD3), by Corbet, Rubini, Kroah-Hartman. The latest version of the diagrams can be found here:
Linux Device Drivers 3Ed UML Diagrams
I would like to ask for help understanding a scheduling problem, which is documented in the Non-Blocking File IO sequence diagram(s) at the above link and in LDD3 on pp. 156-158, and in particular in this code snippet from scull_getwritespace() (also see p. 156, but this code has been updated to use a mutex rather than a semaphore):
/* Wait for space for writing; caller must hold device semaphore. On
 * error the semaphore will be released before returning. */
static int scull_getwritespace(struct scull_pipe *dev, struct file *filp)
{
        while (spacefree(dev) == 0) { /* full */
                DEFINE_WAIT(wait);

                mutex_unlock(&dev->mutex);
                if (filp->f_flags & O_NONBLOCK)
                        return -EAGAIN;
                PDEBUG("\"%s\" writing: going to sleep\n", current->comm);
                prepare_to_wait(&dev->outq, &wait, TASK_INTERRUPTIBLE);
                if (spacefree(dev) == 0)
                        schedule();
                finish_wait(&dev->outq, &wait);
                if (signal_pending(current))
                        return -ERESTARTSYS; /* signal: tell the fs layer to handle it */
                if (mutex_lock_interruptible(&dev->mutex))
                        return -ERESTARTSYS;
        }
        return 0;
}
and in particular:
if (spacefree(dev) == 0)
        schedule();
The case of interest is this:
spacefree(dev) == 0 is true and the write process is about to call schedule().
before schedule() is called, the read process issues wake_up_interruptible(&dev->outq), having consumed all the buffer data, so the write process should be woken up so it can produce more data. This sets the write process state back to TASK_RUNNING.
the write process calls schedule() and potentially goes to sleep.
The above is done to avoid race conditions.
Here are my questions:
Is it possible to modify the code/operation of the kernel so that I can guarantee that schedule() doesn't go to sleep but returns immediately from the call? I don't understand schedule() in sufficient detail to answer this question and any help would be appreciated. I think the answer is no because the scheduler gets to choose what happens next and there may be software interrupts, tasklets or signals to process.
Is it possible to modify the code so that I can guarantee the write process runs before the read process is re-entered? Again, I think the answer might be no but perhaps there are some possibilities with thread prioritisation.
I find pictorial representations of the Linux kernel entities very helpful in understanding the patterns in the kernel, which greatly improves my coding productivity, but they are very tedious to generate by hand. In the interests of saving time, has anybody else done something similar with specific reference to LDD3?
Thanks.
Not sure if you have had a chance to look into the kernel map: http://www.makelinux.net/kernel_map/

Serial communication with minimal delay

I have a computer which is connected to external devices via serial communication (RS-232/RS-422, over physical or emulated serial ports). They communicate with each other by frequent data exchange (30 Hz), but with only small data packets (less than 16 bytes per packet).
The most critical requirement of the communication is low latency or delay between transmitting and receiving.
The data exchange pattern is handshake-like. One host device initiates communication and keeps sending notifications to a client device. The client device needs to reply to every notification from the host device as quickly as possible (this is exactly where the low latency needs to be achieved). The data packets of notifications and replies are well defined; namely, the data length is known.
And basically data loss is not allowed.
I have used the following common Win API functions to do the I/O reads and writes in a synchronous manner:
CreateFile, ReadFile, WriteFile
A client device uses ReadFile to read data from a host device. Once the client reads a complete data packet, whose length is known, it uses WriteFile to reply to the host device with the corresponding data packet. The reads and writes are always sequential, without concurrency.
Somehow the communication is not fast enough; the time between sending and receiving is too long. I guess it could be a problem with serial port buffering or interrupts.
Here I summarize some possible actions to improve the delay.
Please give me some suggestions and corrections :)
call CreateFile with the FILE_FLAG_NO_BUFFERING flag? I am not sure this flag is relevant in this context.
call FlushFileBuffers after each WriteFile? Or take any other action that can prompt the serial port to transmit data immediately?
set a higher priority for the thread and process handling the serial communication (see the sketch after this list)
set the latency timer or transfer size for emulated devices (via their driver)? But what about physical serial ports?
any equivalent on Windows to setserial/low_latency under Linux?
disable the FIFO?
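Regarding the thread/process priority point above, here is a minimal sketch of one way to try it; the specific priority values are my illustration, and aggressive settings like these can starve other work:

#include <windows.h>

// Raise the scheduling priority of the process and of the calling thread,
// which should be the one doing the serial I/O.
void BoostSerialIoPriority(void)
{
    SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
}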
thanks in advance!
I solved this in my case by setting the comm timeouts to {MAXDWORD,0,0,0,0}.
After years of struggling with this, on this very day I was finally able to make my serial comms terminal thingy fast enough with Microsoft's CDC-class USB UART driver (USBSER.SYS, which is now built into Windows 10, making it actually usable).
Apparently the aforementioned set of values is a special combination that sets minimal timeouts as well as minimal latency (at least with the Microsoft driver, or so it seems to me anyway) and also causes ReadFile to return immediately if no new characters are in the receive buffer.
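Spelled out with the COMMTIMEOUTS field names (my annotation of the initializer used below, based on the documented field order):

COMMTIMEOUTS t = {0};
t.ReadIntervalTimeout         = MAXDWORD; // with the next two fields zero, ReadFile
t.ReadTotalTimeoutMultiplier  = 0;        // returns immediately with whatever bytes
t.ReadTotalTimeoutConstant    = 0;        // are already in the receive buffer
t.WriteTotalTimeoutMultiplier = 0;        // no write timeouts
t.WriteTotalTimeoutConstant   = 0;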
Here's my code (Visual C++ 2008; the project character set was changed from "Unicode" to "Not set" to avoid an LPCWSTR type-cast problem with portname) to open the port:
static HANDLE port = 0;
static COMMTIMEOUTS originalTimeouts;

static bool ComSetParams(HANDLE port, int baud); // defined below

static bool OpenComPort(char* p, int targetSpeed) { // e.g. OpenComPort("COM7",115200);
    char portname[16];
    sprintf(portname, "\\\\.\\%s", p);
    port = CreateFile(portname, GENERIC_READ | GENERIC_WRITE, 0, 0, OPEN_EXISTING, 0, 0);
    if (port == INVALID_HANDLE_VALUE) { // CreateFile returns INVALID_HANDLE_VALUE, not NULL, on failure
        printf("COM port is not valid: %s\n", portname);
        return false;
    }
    if (!GetCommTimeouts(port, &originalTimeouts)) {
        printf("Cannot get comm timeouts\n");
        return false;
    }
    COMMTIMEOUTS newTimeouts = {MAXDWORD, 0, 0, 0, 0};
    SetCommTimeouts(port, &newTimeouts);
    if (!ComSetParams(port, targetSpeed)) {
        SetCommTimeouts(port, &originalTimeouts);
        CloseHandle(port);
        printf("Failed to set COM parameters\n");
        return false;
    }
    printf("Successfully set COM parameters\n");
    return true;
}
static bool ComSetParams(HANDLE port, int baud) {
    DCB dcb;
    memset(&dcb, 0, sizeof(dcb));
    dcb.DCBlength = sizeof(dcb);
    dcb.BaudRate = baud;
    dcb.fBinary = 1;
    dcb.Parity = NOPARITY;
    dcb.StopBits = ONESTOPBIT;
    dcb.ByteSize = 8;
    return SetCommState(port, &dcb) != 0;
}
And here's a USB trace of it working. Note the OUT transactions (output bytes) followed by IN transactions (input bytes) and then more OUT transactions (output bytes), all within 3 milliseconds.
And finally, since you've read this far, you might be interested in seeing my function that sends and receives characters over the UART:
unsigned char outbuf[16384];
unsigned char inbuf[16384];
unsigned char *inLast = inbuf;
unsigned char *inP = inbuf;
unsigned long bytesWritten;
unsigned long bytesReceived;

// Read character from UART and while doing that, send keypresses to UART.
unsigned char vgetc() {
    while (inP >= inLast) { // My input buffer is empty, try to read from UART
        while (_kbhit()) { // If keyboard input available, send it to UART
            outbuf[0] = _getch(); // Get keyboard character
            WriteFile(port, outbuf, 1, &bytesWritten, NULL); // send keychar to UART
        }
        ReadFile(port, inbuf, 1024, &bytesReceived, NULL);
        inP = inbuf;
        inLast = &inbuf[bytesReceived];
    }
    return *inP++;
}
Large transfers are handled elsewhere in code.
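As a usage sketch of my own (not the author's code), a trivial terminal loop built on vgetc() could be:

// Echo everything received from the UART to the console forever;
// keystrokes are forwarded to the UART inside vgetc() itself.
for (;;) {
    putchar(vgetc());
}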
On a final note, apparently this is the first fast UART code I've managed to write since abandoning DOS in 1998. O, doest the time fly when thou art having fun.
This is where I found the relevant information: http://www.egmont.com.pl/addi-data/instrukcje/standard_driver.pdf
I experienced a similar problem with a serial port.
In my case I resolved the problem by decreasing the latency of the serial port.
You can change the latency of every port (which by default is set to 16 ms) using the Control Panel.
You can find the method here:
http://www.chipkin.com/reducing-latency-on-com-ports/
Good Luck!!!

ReadFile doesn't work asynchronously on Win7 and Win2k8

According to MSDN, ReadFile can read data in two different ways: synchronously and asynchronously.
I need the second one. The following code demonstrates usage with an OVERLAPPED struct:
#include <windows.h>
#include <stdio.h>
#include <time.h>

void Read()
{
    HANDLE hFile = CreateFileA("c:\\1.avi", GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        printf("Failed to open the file\n");
        return;
    }
    int dataSize = 256 * 1024 * 1024;
    char* data = (char*)malloc(dataSize);
    memset(data, 0xFF, dataSize);
    OVERLAPPED overlapped;
    memset(&overlapped, 0, sizeof(overlapped));
    printf("reading: %d\n", time(NULL));
    BOOL result = ReadFile(hFile, data, dataSize, NULL, &overlapped);
    printf("sent: %d\n", time(NULL));
    DWORD bytesRead;
    result = GetOverlappedResult(hFile, &overlapped, &bytesRead, TRUE); // wait until completion - returns immediately
    printf("done: %d\n", time(NULL));
    CloseHandle(hFile);
}

int main()
{
    Read();
}
On Windows XP output is:
reading: 1296651896
sent: 1296651896
done: 1296651899
This means that ReadFile didn't block and returned immediately, within the same second, whereas the reading process continued for 3 seconds. That is normal async reading.
But on Windows 7 and Windows 2008 I get the following results:
reading: 1296661205
sent: 1296661209
done: 1296661209
This is the behavior of a synchronous read.
MSDN says that an async ReadFile can sometimes behave as sync (when the file is compressed or encrypted, for example). But the return value in that situation should be TRUE and GetLastError() == NO_ERROR.
On Windows 7 I get FALSE and GetLastError() == ERROR_IO_PENDING. So the WinAPI tells me that it is an async call, but when I look at the test I see that it is not!
I'm not the only one who has found this "bug": read the comments on the ReadFile MSDN page.
So what's the solution? Does anybody know? It has been 14 months since Denis found this strange behavior.
I don't know the size of the "c:\1.avi" file, but the buffer you give to Windows (256 MB!) is probably big enough to hold the whole file. So Windows decides to read the whole file and put it in the buffer the way it likes. You don't say to Windows "I want async"; you say "I know how to handle async".
Just change the buffer size to, say, 1024, and your program will behave exactly the same but read only 1024 bytes (and return ERROR_IO_PENDING as well).
In general, you do asynchronous I/O because you want to do something else during the operation. Look at the sample here: Testing for the End of a File, as it demonstrates an async ReadFile. If you change the sample's buffer and set it to a big value, it should behave exactly like yours.
PS: I suggest you don't rely on time samples to check things; use return codes and events.
According to this, I would suspect that it should return TRUE in your case. But it may also be that the completion modes' default settings are different on Win7/Win2k8.
Try setting a different mode with SetFileCompletionNotificationModes().
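For what it's worth, a minimal sketch of that call; the flag chosen here is illustrative, and which mode (if any) helps depends on your completion mechanism:

// Ask the system to skip the completion-port notification when an I/O
// request on hFile completes synchronously.
UCHAR flags = FILE_SKIP_COMPLETION_PORT_ON_SUCCESS;
if (!SetFileCompletionNotificationModes(hFile, flags))
{
    printf("SetFileCompletionNotificationModes failed: %lu\n", GetLastError());
}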
Have you tried using an event, as @Simon Mourier suggested? I know the documentation says the event is not required, but if you look at the example in the links provided by @Simon Mourier, it uses an event for asynchronous reads.
Windows 7/Server 2008 have different behavior to resolve a race condition that can occur in GetOverlappedResultEx. When you compile for these OSes, Windows detects this and uses the different behavior. I find this wickedly confusing.
Here is a link:
http://msdn.microsoft.com/en-us/library/dd371711(VS.85).aspx
I'm sure you've read this many times in the past, but some of the text has changed since Win7, especially regarding the hEvent field in the OVERLAPPED struct:
http://msdn.microsoft.com/en-us/library/ms684342(v=VS.85).aspx
Functions such as GetOverlappedResult and the synchronization wait functions reset auto-reset events to the nonsignaled state. Therefore, you should use a manual-reset event; if you use an auto-reset event, your application can stop responding if you wait for the operation to complete and then call GetOverlappedResult with the bWait parameter set to TRUE.
Could you do an experiment: allocate a manual-reset event in your OVERLAPPED struct instead of an auto-reset event? (I don't see the allocation in your snippet; don't forget to create the event and to set 'hEvent' after zeroing the struct.)
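Concretely, the experiment would change the snippet in the question along these lines (my sketch):

OVERLAPPED overlapped;
memset(&overlapped, 0, sizeof(overlapped));
// manual-reset (TRUE), initially nonsignaled (FALSE), as the quote advises
overlapped.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

BOOL result = ReadFile(hFile, data, dataSize, NULL, &overlapped);

DWORD bytesRead;
result = GetOverlappedResult(hFile, &overlapped, &bytesRead, TRUE);
CloseHandle(overlapped.hEvent);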
This probably has something to do with caching. Try to open the file non-cached (FILE_FLAG_NO_BUFFERING)
EDIT
This is actually documented in the MSDN documentation for ReadFile:
Note: If a file or device is opened for asynchronous I/O, subsequent calls to functions such as ReadFile using that handle generally return immediately, but can also behave synchronously with respect to blocked execution. For more information, see http://support.microsoft.com/kb/156932.

GetIpAddrTable() leaks memory. How to resolve that?

On my Windows 7 box, this simple program causes the memory use of the application to creep up continuously, with no upper bound. I've stripped out everything non-essential, and it seems clear that the culprit is the Microsoft Iphlpapi function "GetIpAddrTable()". On each call, it leaks some memory. In a loop (e.g. checking for changes to the network interface list), it is unsustainable. There seems to be no async notification API which could do this job, so now I'm faced with possibly having to isolate this logic into a separate process and recycle the process periodically -- an ugly solution.
Any ideas?
// IphlpLeak.cpp - demonstrates that GetIpAddrTable leaks memory internally: run this and watch
// the memory use of the app climb up continuously with no upper bound.
#include <stdio.h>
#include <windows.h>
#include <assert.h>
#include <Iphlpapi.h>
#pragma comment(lib, "Iphlpapi.lib")

void testLeak() {
    static unsigned char buf[16384];
    DWORD dwSize(sizeof(buf));
    if (GetIpAddrTable((PMIB_IPADDRTABLE)buf, &dwSize, false) == ERROR_INSUFFICIENT_BUFFER)
    {
        assert(0); // we never hit this branch.
        return;
    }
}

int main(int argc, char* argv[]) {
    for (int i = 0; true; i++) {
        testLeak();
        printf("i=%d\n", i);
        Sleep(1000);
    }
    return 0;
}
@Stabledog:
I ran your example, unmodified, for 24 hours, but did not observe the program's commit size increasing indefinitely; it always stayed below 1024 KB. This was on Windows 7 (32-bit, and without Service Pack 1).
Just for the sake of completeness, what happens to memory usage if you comment out the entire if block and the sleep? If there's no leak there, then I would suggest you're correct as to what's causing it.
Worst case, report it to MS and see if they can fix it - you have a nice simple test case to work from which is more than what I see in most bug reports.
Another thing you may want to try is to check the error code against NO_ERROR rather than a specific error condition. If you get back an error other than ERROR_INSUFFICIENT_BUFFER, that path may be where the leak comes from:
DWORD dwRetVal = GetIpAddrTable((PMIB_IPADDRTABLE)buf, &dwSize, false);
if (dwRetVal != NO_ERROR) {
    printf("ERROR: %d\n", dwRetVal);
}
I've been all over this issue now: it appears that there is no acknowledgment from Microsoft on the matter, but even a trivial application grows without bounds on Windows 7 (not XP, though) when calling any of the APIs that retrieve the local IP addresses.
So the way I solved it -- for now -- was to launch a separate instance of my app with a special command-line switch that tells it "retrieve the IP addresses and print them to stdout". I scrape stdout in the parent app, the child exits, and the leak problem is resolved.
But it wins "dang ugly solution to an annoying problem", at best.
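For illustration, the parent side of that workaround can be quite small; the executable name and the "--print-ips" switch below are made up for the sketch:

#include <stdio.h>

// Hypothetical parent-side scraper: run a child instance that prints one
// IP address per line to stdout, read the lines, and let the child (and
// its leak) exit.
void ReadAddressesFromChild(void)
{
    char line[256];
    FILE* child = _popen("IphlpLeak.exe --print-ips", "r");
    if (child == NULL)
        return;
    while (fgets(line, sizeof(line), child) != NULL)
        printf("child reported: %s", line);
    _pclose(child);
}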

How to know what amount of memory I'm using in a process? win32 C++

I'm using Win32 C++ in CodeGear Builder 2009; the target is Windows XP Embedded.
I found the PROCESS_MEMORY_COUNTERS_EX struct and have created a simple function to return the memory consumption of my process:
SIZE_T TForm1::ProcessPrivatBytes(DWORD processID)
{
    SIZE_T lRetval = 0;
    HANDLE hProcess;
    PROCESS_MEMORY_COUNTERS_EX pmc;

    hProcess = OpenProcess(PROCESS_QUERY_INFORMATION |
                           PROCESS_VM_READ,
                           FALSE, processID);
    if (NULL == hProcess)
    {
        lRetval = 1;
    }
    else
    {
        if (GetProcessMemoryInfo(hProcess, (PROCESS_MEMORY_COUNTERS*)&pmc, sizeof(pmc)))
        {
            lRetval = pmc.WorkingSetSize; // which of these two is right?
            lRetval = pmc.PrivateUsage;   // (see the question below)
        }
        CloseHandle(hProcess);
    }
    return lRetval;
}
//---------------------------------------------------------------------------
Do I have to use lRetval = pmc.WorkingSetSize or lRetval = pmc.PrivateUsage?
PrivateUsage is what I see in perfmon, but what exactly is WorkingSetSize?
I want to see every byte I allocate reflected in the counter when I allocate it. Is this possible?
Regards,
jvdn
This is a much tougher question than you probably realized. The reason is that Windows shares most executable code (especially the code that makes up most of Windows itself) between processes. For example, there's normally ONE copy of kernel32.dll loaded into memory, but it'll normally be mapped into every process. Do you consider that part of the memory your process is "using" or not?
Private memory is what's unique to that particular process. This can be somewhat misleading too. Since the executable for your process could potentially be shared with another process (i.e. two instances of your program could be run), that's not counted as part of the private memory, even if (as is often the case) there's only one instance of it running.
The working set size is about 99.999% meaningless. What it returns is whatever has been set as the preferred working set size for the process. You can adjust that with SetProcessWorkingSetSize(). Windows has a working set trimmer that attempts to trim down working sets. If memory serves, it uses the working set size to guess at whether it's worth trying to trim the working set of this process -- i.e. if its current working set is larger than the working set size was set to, it tries to trim it down. Otherwise, it (mostly) leaves it alone.
Chances are that nothing you do will show you every byte you allocate as you allocate it, though. Calling Windows to allocate memory is fairly slow, so what's normally done is that the run-time library allocates a fairly big chunk of memory from Windows. When you allocate memory, the run-time library gives you a piece of that big chunk. Only when that chunk is gone does it go back to Windows and ask for more.
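A quick way to see this for yourself is to sample PrivateUsage around a small allocation; more often than not the number doesn't move, because the CRT serves the request from a heap segment it already committed. This is a sketch of mine, not code from the question:

#include <windows.h>
#include <psapi.h>
#include <stdio.h>
#include <stdlib.h>
#pragma comment(lib, "psapi.lib")

static SIZE_T PrivateBytesNow(void)
{
    PROCESS_MEMORY_COUNTERS_EX pmc;
    GetProcessMemoryInfo(GetCurrentProcess(),
                         (PROCESS_MEMORY_COUNTERS*)&pmc, sizeof(pmc));
    return pmc.PrivateUsage;
}

int main(void)
{
    SIZE_T before = PrivateBytesNow();
    void* p = malloc(100);  // a tiny allocation
    SIZE_T after = PrivateBytesNow();
    // usually prints 0: the 100 bytes came out of a chunk the CRT had
    // already committed from the OS
    printf("delta: %lu bytes\n", (unsigned long)(after - before));
    free(p);
    return 0;
}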
