Two-way communication between kernel-mode driver and user-mode application?

I need two-way communication between a kernel-mode WFP driver and a user-mode application. The driver initiates the communication by passing a URL to the application, which categorizes it (Entertainment, News, Adult, etc.) and passes the category back to the driver. The driver needs to know the category in its filter function because it may block certain web pages based on that information.

So far I had a thread in the application making an I/O request that the driver would complete with the URL and a GUID; the application would then write the category into the registry under that GUID, where the driver would pick it up. Unfortunately, as Driver Verifier pointed out, this is unstable because the Zw registry functions have to run at PASSIVE_LEVEL. I was thinking about trying the same thing with mapped memory buffers, but I'm not sure what the IRQL requirements are for that. I also thought about lowering the IRQL before the registry calls, but I don't know what the side effects of that would be.

You just need to have two different kinds of I/O request.
If you're using DeviceIoControl to retrieve the URLs (I think this would be the most suitable method) this is as simple as adding a second I/O control code.
If you're using ReadFile or equivalent, things would normally get a bit messier, but as it happens in this specific case you only have two kinds of operations, one of which is a read (driver->application) and the other of which is a write (application->driver). So you could just use WriteFile to send the reply, including of course the GUID so that the driver can match up your reply to the right query.
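To make that concrete, here is a minimal user-mode sketch of the request/reply pattern. The control codes, the device name \\.\MyWfpFilter, and the buffer layouts are all made up for illustration; the driver is assumed to pend IOCTL_GET_URL until it has a URL to classify and to match the reply to the query via the GUID.

#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

// Hypothetical control codes and buffer layouts; they must match the driver.
#define IOCTL_GET_URL      CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_SET_CATEGORY CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS)

typedef struct _URL_REQUEST {       // filled in by the driver
    GUID  RequestId;
    WCHAR Url[1024];
} URL_REQUEST;

typedef struct _CATEGORY_REPLY {    // filled in by the application
    GUID  RequestId;                // must match the request
    ULONG Category;                 // e.g. 0 = unknown, 1 = news, 2 = adult, ...
} CATEGORY_REPLY;

int main(void)
{
    HANDLE h = CreateFileW(L"\\\\.\\MyWfpFilter", GENERIC_READ | GENERIC_WRITE,
                           0, NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    for (;;) {
        URL_REQUEST req;
        DWORD bytes;

        // Blocks until the driver completes the request with a URL and a GUID.
        if (!DeviceIoControl(h, IOCTL_GET_URL, NULL, 0, &req, sizeof(req), &bytes, NULL))
            break;

        CATEGORY_REPLY reply = { req.RequestId, 0 };
        // reply.Category = CategorizeUrl(req.Url);   // application-specific lookup
        DeviceIoControl(h, IOCTL_SET_CATEGORY, &reply, sizeof(reply), NULL, 0, &bytes, NULL);
    }

    CloseHandle(h);
    return 0;
}

With OVERLAPPED I/O the application could keep several IOCTL_GET_URL requests outstanding so the driver always has a request ready to complete.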
Another approach (more similar to your original one) would be to use a shared memory buffer. See this answer for more details. The problem with that idea is that you would either need to use a spinlock (at the cost of system performance and power consumption, and of course not being able to work on a single-core system) or to poll (which is both inefficient and not really suitable for time-sensitive operations).

There is nothing unstable about PASSIVE_LEVEL itself. Registry access must happen at PASSIVE_LEVEL, so it is not possible directly if the driver is running at a higher IRQL. You can do it by offloading the access to a work item, though. Lowering the IRQL yourself is usually not recommended, as it contradicts the OS's intentions.
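If you do keep the registry lookup in the driver, a bare-bones WDM sketch of the work-item offload looks roughly like this (the context structure and routine names are hypothetical; the worker runs at PASSIVE_LEVEL, where the ZwXxx registry routines are legal):

// Sketch only (WDM, #include <ntddk.h>). Offloads registry access from
// DISPATCH_LEVEL code to a work item that runs at PASSIVE_LEVEL.
typedef struct _CATEGORY_WORK_CONTEXT {
    PIO_WORKITEM   WorkItem;
    UNICODE_STRING KeyPath;     // registry key to read; buffer owned by the caller
} CATEGORY_WORK_CONTEXT, *PCATEGORY_WORK_CONTEXT;

VOID CategoryWorkRoutine(PDEVICE_OBJECT DeviceObject, PVOID Context)
{
    PCATEGORY_WORK_CONTEXT ctx = (PCATEGORY_WORK_CONTEXT)Context;
    UNREFERENCED_PARAMETER(DeviceObject);

    // Runs at PASSIVE_LEVEL: ZwOpenKey / ZwQueryValueKey may be called here.
    // ... read the category and resume whatever processing needs it ...

    IoFreeWorkItem(ctx->WorkItem);
    ExFreePoolWithTag(ctx, 'krwC');
}

// Callable at IRQL <= DISPATCH_LEVEL (e.g. from a WFP callout).
NTSTATUS QueueCategoryLookup(PDEVICE_OBJECT DeviceObject, PCUNICODE_STRING KeyPath)
{
    PCATEGORY_WORK_CONTEXT ctx =
        ExAllocatePoolWithTag(NonPagedPool, sizeof(*ctx), 'krwC');
    if (ctx == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    ctx->WorkItem = IoAllocateWorkItem(DeviceObject);
    if (ctx->WorkItem == NULL) {
        ExFreePoolWithTag(ctx, 'krwC');
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    ctx->KeyPath = *KeyPath;    // shallow copy; the caller must keep the buffer valid
    IoQueueWorkItem(ctx->WorkItem, CategoryWorkRoutine, DelayedWorkQueue, ctx);
    return STATUS_PENDING;
}

Note that the callout itself still cannot block on the work item at DISPATCH_LEVEL; the classification has to be pended and acted on once the category is known.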
Your protocol indeed sounds somewhat cumbersome, and direct app-to-driver communication is probably preferable. You can find useful information about this here: http://msdn.microsoft.com/en-us/library/windows/hardware/ff554436(v=vs.85).aspx

Since the callouts run at DISPATCH_LEVEL, your processing has to be deferred, for example to a worker thread (which runs at PASSIVE_LEVEL and can therefore use the ZwXxx routines) or a DPC. You should look into inverted callbacks for communication purposes; there's a good document on OSR.
I've just started poking around WFP, but it looks like even in the samples they provide, Microsoft re-injects the packets. I haven't looked into it that closely, but it seems they drop the packet and re-inject it once it has been processed. That would be enough for your user-mode engine to make the decision. You should also limit the packet capture to a specific port (80 in your case) so that you don't do extra processing that you don't need.

Related

Resolve Windows socket error WSAENOBUFS (10055)

Our application has a feature to actively connect to the customers' internal factory network and send a message when inspection events occur. The customer enters the IP address and port number of their machine and application into our software.
I'm using a TClientSocket in blocking mode and have provided callback functions for the OnConnect and OnError events. Assuming the abovementioned feature has been activated, when the application starts I call the following code in a separate thread:
// Attempt active connection
try
m_socketClient.Active := True;
except
end;
// Later...
// If `OnConnect` and socket is connected...send some data!
// If `OnError`...call `m_socketClient.Active := True;` again
When IP + port are valid, the feature works well. But if not, after several thousand errors (and many hours or even days) eventually Windows socket error 10055 (WSAENOBUFS) occurs and the application crashes.
Various articles such as this one from ServerFramework and this one from Microsoft talk about exhausting the Windows non-paged pool and mention (1) actively managing the number of outstanding asynchronous send operations and (2) releasing the data buffers that were used for the I/O operations.
My question is how to achieve this and is three-fold:
A) Am I doing something wrong that memory is being leaked? For example, is there some missing cleanup code in the OnError handler?
B) How do you monitor I/O buffers to see if they are being exhausted? I've used Process Explorer to confirm my application is the cause of the leak, but ideally I'd need some programmatic way of measuring this.
C) Apart from restarting the application, is there a way to ask Windows to clear out or release I/O operation data buffers?
Code samples in Delphi, C/C++, or C# are all fine.
A) The cause of the resource leak was a programming error. When the OnError event occurs, Socket.Close() should be called to release low-level resources associated with the socket.
B) The memory leak does not show up in the standard working-set memory usage of the process. Open handles belonging to your process need to be monitored instead, which is possible with GetProcessHandleCount. See this answer in Delphi, which was tested and works well. This answer in C++ was not tested, but it was accepted, so it should work. Of course, you should be able to use GetProcessHandleCount directly in C++.
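As an illustration for (B), here is a minimal C sketch that polls the handle count of the current process with GetProcessHandleCount; a count that keeps growing while the connection errors occur points at a handle leak:

#define _WIN32_WINNT 0x0501   // GetProcessHandleCount requires Windows XP SP1 or later
#include <windows.h>
#include <stdio.h>

// Call this periodically (e.g. from a timer) and log the result; a steadily
// growing number of open handles indicates a leak such as unclosed sockets.
void ReportHandleCount(void)
{
    DWORD handleCount = 0;
    if (GetProcessHandleCount(GetCurrentProcess(), &handleCount))
        printf("open handles: %lu\n", (unsigned long)handleCount);
    else
        printf("GetProcessHandleCount failed: %lu\n", GetLastError());
}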
C) After much research, I must conclude that just like a normal memory leak, you cannot just ask Windows to "clean up" after you! The handle resource has been leaked by your application and you must find and fix the cause (see A and B above).

Who enforces dwShareMode on CreateFile? The OS or the Driver?

I have a Windows application that interacts with some hardware. A handle to the hardware is opened with CreateFile, and we control the hardware using DeviceIoControl.
I'm attempting to update an application that uses this hardware so that it opens the hardware in exclusive mode, preventing other programs from accessing the hardware at the same time (the hardware has mutable state that I can't have changed out from under me). I do this by passing 0 as the dwShareMode parameter to CreateFile. After making this change, I am still able to run two separate instances of my application. Both calls to CreateFile in both processes succeed; neither returns INVALID_HANDLE_VALUE.
I believe one of several things is happening, and I'm asking for help narrowing the problem down.
1. I badly misunderstand the dwShareMode parameter.
2. dwShareMode doesn't have any effect on DeviceIoControl, only on ReadFile or WriteFile.
3. The driver itself is somehow responsible for respecting the dwShareMode parameter, and our driver is written badly. This, sadly, isn't totally unheard of.
Edit: Option 2 is nonsense. dwShareMode should prevent the second CreateFile from succeeding; DeviceIoControl has nothing to do with it. It must be option #1 or option #3.
The Question:
Is the device driver responsible for looking at the dwShareMode parameter, and rejecting requests if someone has already opened a handle without sharing, or is the OS responsible?
If the device driver is responsible, then I'm going to assume #3 is happening. If the OS is responsible, then it must be #1.
Some additional Clues:
IRP_MJ_CREATE documentation suggests that the sharing mode does indeed get passed down to the device driver
I believe that sharing rules are only enforced on some devices. In many (most?) cases enforcing sharing rules on the device object itself (as opposed to on objects within the device namespace) would make no sense.
Therefore, it must be the responsibility of the device driver to enforce these rules in those rare cases where they are required. (Either that or the device driver sets a flag to instruct the operating system to do so, but there doesn't seem to be a flag of this sort.)
In the case of a volume device, for example, you can open the device with a sharing mode of 0 even though the volume is mounted. [The documentation for CreateFile says you must use FILE_SHARE_WRITE but this does not appear to be true.]
In order to gain exclusive access to the volume, you use the FSCTL_LOCK_VOLUME control code.
[That's a file system driver, so it might not be a typical case. But I don't think it makes a difference in this context.]
Serial port and LPT drivers would be examples of devices that should probably enforce sharing rules. I think there may be some applicable sample code; perhaps that would shed light on things?
Edited to add:
I've had a look through the Windows Research Kernel source (this is essentially the same as the Windows Server 2003 kernel) and:
1) The code that opens a device object (by sending IRP_MJ_CREATE to the driver) does not appear to make any attempt to enforce the sharing mode parameter, though it does check access permissions and enforces the Exclusive flag for the driver;
2) I've also searched the code for references to the structure member that holds the requested dwShareMode. As far as I can see it is written into the relevant structure by the internal function that implements CreateFile, and later passed to the device driver, but otherwise ignored.
So, my conclusion remains the same: enforcing the sharing mode rules, or providing an alternative mechanism if appropriate, is the responsibility of the device driver.
(The kernel does, however, provide functions such as IoCheckShareAccess to assist file system drivers in enforcing sharing rules.)
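For illustration, here is a rough sketch of a driver enforcing the share mode itself in its IRP_MJ_CREATE handler with IoCheckShareAccess. The device-extension layout and names are made up, error handling is trimmed, and a real driver must serialize these calls and release the claim again (IoRemoveShareAccess) in its cleanup path.

// Sketch only (WDM). The driver keeps a SHARE_ACCESS structure per device and
// lets IoCheckShareAccess apply the caller's dwShareMode against it.
typedef struct _MY_DEVICE_EXTENSION {
    SHARE_ACCESS ShareAccess;       // zero-initialized when the device is created
} MY_DEVICE_EXTENSION, *PMY_DEVICE_EXTENSION;

NTSTATUS MyCreate(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    PIO_STACK_LOCATION   sp  = IoGetCurrentIrpStackLocation(Irp);
    PMY_DEVICE_EXTENSION ext = (PMY_DEVICE_EXTENSION)DeviceObject->DeviceExtension;
    NTSTATUS             status;

    // The dwShareMode passed to CreateFile arrives here as
    // Parameters.Create.ShareAccess; the desired access comes with it.
    status = IoCheckShareAccess(sp->Parameters.Create.SecurityContext->DesiredAccess,
                                sp->Parameters.Create.ShareAccess,
                                sp->FileObject,
                                &ext->ShareAccess,
                                TRUE);          // update the counts if access is allowed

    Irp->IoStatus.Status      = status;
    Irp->IoStatus.Information = 0;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return status;
}

// IRP_MJ_CLEANUP (or IRP_MJ_CLOSE) should call
//     IoRemoveShareAccess(FileObject, &ext->ShareAccess);
// so the claim disappears when the handle is closed.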
In cases where we open a COM port with:
CreateFile(devname,
           GENERIC_READ | GENERIC_WRITE,
           0,              // dwShareMode = 0: no sharing
           0,              // default security attributes
           OPEN_EXISTING,
           0,              // no flags/attributes
           0);             // no template file
it doesn't allow another application to open the same COM port until the previous handle is closed. I would suggest walking through serenum.sys to check whether it has a role here.

Best way to communicate from KEXT to Daemon and block until result is returned from Daemon

In a KEXT, I am listening for file close via a vnode or file scope listener. For certain (very few) files, I need to send the file path to my system daemon, which does some processing (this has to happen in the daemon) and returns the result back to the KEXT. The file close call needs to block until I get the response from the daemon; based on the result I need to do some operation in the close call and then return from it successfully. There is a lot of discussion on KEXT communication on this forum, but it is not conclusive and appears to be very old (from around 2002). On Windows this requirement could be handled by the FltSendMessage(...) API; I am looking for its equivalent on the Mac.
Here is what I have looked at and want to summarize my understanding:
Mach messages: These provide a very good way of doing bidirectional communication, using sender and reply ports with a queueing mechanism. However, the Mach message APIs (e.g. mach_msg, mach_port_allocate, bootstrap_look_up) don't appear to be KPIs. The mach_msg_send_from_kernel API can be used, but that alone will not give bidirectional communication. Is my understanding right?
IOUserClient: This appears to be more about communicating from user space to the KEXT and then having some callbacks from the KEXT. I did not find a way to initiate communication from the KEXT to the daemon and then wait for the result. Am I missing something?
Sockets: This could be the last resort, since I would have to implement the entire bidirectional communication channel between the KEXT and the daemon myself.
ioctl/sysctl: I don't know much about these. From what I have read, they are not a recommended option, especially for bidirectional communication.
RPC-Mig: Again, I don't know much about it. It looks complicated from what I have seen, and I'm not sure it is a recommended way.
KUNCUserNotification: This appears to just provide notifications to the user from the KEXT. It does not meet my requirement.
The supported platforms are Mac OS X 10.5 onwards. Given this requirement, can someone suggest an approach and provide some pointers on this topic?
Thanks in advance.
The pattern I've used for that process is to have the user-space process initiate a socket connection to the KEXT; the KEXT creates a new thread to handle messages over that socket and sleeps the thread. When the KEXT detects an event it needs to respond to, it wakes the messaging thread and uses the existing socket to send data to the daemon. On receiving a response, control is passed back to the requesting thread to decide whether to veto the operation.
I don't know of any single resource that will describe that whole pattern completely, but the relevant KPIs are discussed in Mac OS X Internals (which seems old, but the KPIs haven't changed much since it was written) and OS X and iOS Kernel Programming (which I was a tech reviewer on).
For what it's worth, autofs uses what I assume you mean by "RPC-Mig", so it's not too complicated (MIG is used to describe the RPC calls, and the stub code it generates handles calling the appropriate Mach-message sending and receiving code; there are special options to generate kernel-mode stubs).
However, it doesn't need to do any lookups, as automountd (the user-mode daemon to which the autofs kext sends messages) has a "host special port" assigned to it. Doing the lookups to find an arbitrary service would be harder.
If you want to use the socket established with ctl_register() on the kext side, then beware: communication from the kext to user space (via ctl_enqueuedata()) works fine, but the opposite direction is buggy on 10.5.x and 10.6.x.
After about 70,000 or 80,000 send() calls with SOCK_DGRAM in the PF_SYSTEM domain, the entire network stack breaks, with disastrous consequences for the whole system (a hard power-off is the only way out). This was fixed in 10.7.0. I worked around it by using setsockopt() in our project for the user-space-to-kext direction, as we only send very small amounts of data (just to allow or disallow some operation).
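For completeness, the user-space side of a kernel control socket (the counterpart of ctl_register() in the kext) is set up roughly as follows; "com.example.mykext" is a placeholder for whatever control name your kext registers:

#include <sys/socket.h>
#include <sys/kern_control.h>
#include <sys/ioctl.h>
#include <sys/sys_domain.h>
#include <string.h>
#include <unistd.h>

// Connect to a kernel control registered by the kext with ctl_register().
int connect_to_kext(void)
{
    int fd = socket(PF_SYSTEM, SOCK_DGRAM, SYSPROTO_CONTROL);
    if (fd < 0) return -1;

    struct ctl_info info;
    memset(&info, 0, sizeof(info));
    strlcpy(info.ctl_name, "com.example.mykext", sizeof(info.ctl_name));
    if (ioctl(fd, CTLIOCGINFO, &info) == -1) { close(fd); return -1; }   // resolve name -> id

    struct sockaddr_ctl addr;
    memset(&addr, 0, sizeof(addr));
    addr.sc_len     = sizeof(addr);
    addr.sc_family  = AF_SYSTEM;
    addr.ss_sysaddr = AF_SYS_CONTROL;
    addr.sc_id      = info.ctl_id;   // resolved from the control name above
    addr.sc_unit    = 0;             // let the kernel pick a free unit
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) { close(fd); return -1; }

    return fd;
}

The daemon can then recv() whatever the kext pushes with ctl_enqueuedata() and, per the caveat above, use setsockopt()/send() for the opposite direction depending on the OS version.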

Using gevent and multiprocessing together to communicate with a subprocess

Question:
Can I use the multiprocessing module together with gevent on Windows in an efficient way?
Scenario:
I have a gevent-based Python application doing asynchronous I/O on Windows. The application is mostly I/O bound, but there are spikes of higher CPU load as well. This application needs to control a console application via its stdin and stdout. I cannot modify this console application, and the user will be able to supply his own custom one; only the text (line) based communication protocol is fixed.
I have a working implementation using subprocess and threads, but I would rather move the whole subprocess based communication code together with those threads into a separate process to turn the main application back to single-threaded. I plan to use the multiprocessing module for this.
Prior reading:
I have been searching the Web a lot and have read some source code, so I know that the multiprocessing module uses a Pipe implementation based on named pipes on Windows. A pair of multiprocessing.queue.Queue objects would be used to communicate with the second Python process. These queues are based on that Pipe implementation, i.e. the IPC would be done via named pipes.
The key question is, whether calling the incoming Queue's get method would block gevent's main loop or not. There's a timeout for that method, so I could make it into a loop with a small timeout, but that's not a good solution, since it would still block gevent for small time periods hurting its low I/O latency.
I'm also open to suggestions on how to circumvent the whole problem of using pipes on Windows, which is known to be hard and sometimes fragile. I'm not sure whether shared memory based IPC is possible on Windows or not. Maybe I could wrap the console application in a way which would allow communicating with the child process using network sockets, which is known to work well with gevent.
Please don't question my primary use case, if possible. Thanks.
The Queue's get method really is blocking. Using it with a timeout could potentially solve your problem, but it definitely won't be the cleanest solution and, most importantly, it will introduce extra latency for no good reason. Even if it weren't blocking, that would not be a good solution either: being non-blocking by itself is not enough; a good asynchronous call/API should integrate smoothly into the I/O framework in use, be that gevent for Python, libevent for C or Boost.Asio for C++.
The easiest solution would be to use simple I/O: spawn your console application and attach to its stdin and stdout descriptors. There are two major factors to consider:
It will be extremely easy for your clients to write client applications. They will not have to work with any kind of IPC, socket or other code, which could be a very hard thing for many. With this approach, the application just reads from stdin and writes to stdout.
It will be extremely easy to test console applications using this approach, since you can start them manually, enter text into the console and see the results.
gevent is a perfect fit for asynchronous read/write here.
However, the downsides are that you will have to start this application yourself, that there is no support for concurrent communication with it, and that there is no support for communication over a network. There is even a good example for starters.
To keep it simple but more flexible, you can use TCP/IP sockets, with both client and server running on the same machine. A good operating system will use fast local IPC as the underlying implementation, so it will be quick. And if you are worried about the performance of this case, you probably should not be using Python at all and should look at other technologies.
An even fancier solution is to use ZeroC Ice. It is a very modern technology allowing almost seamless inter-process communication; it is a CORBA killer and very easy to use. It is heavily used by many, proven to be the fastest in its class, and rock stable. The beauty of this solution is that you can seamlessly integrate programs in many different languages, like Python, Java, C++, etc. But it will require some of your time to get familiar with the concepts; if you decide to go this way, just spend a day reading through the documentation.
Hope it helps. Good luck!
Your question is already quite old. Nevertheless, I would like to recommend http://gehrcke.de/gipc which -- I believe -- would tackle the outlined challenge in a very straight-forward fashion. Basically, it allows you to integrate multiprocessing-based child processes anywhere in your application (also on Windows). Interaction with Process objects (such as calling join()) is gevent-cooperative. Via its pipe management, it allows for cooperatively blocking inter-process communication. However, on Windows, IPC currently is much less efficient than on POSIX-compliant systems (since non-blocking I/O is imitated through a thread pool). Depending on the IPC messaging volume of your application, this might or might not be of significance.

How do I set a custom route for a process?

In my computer there are two network adapters, connected to different subnets, as below:
adapter A: 10.20.30.201
adapter B: 10.20.31.201
I want to make all outgoing data of a particular process (for example, Process A) go through adapter A. That is, I want to make adapter A that process's default route.
I know I can modify the route table for specific destinations, but what I want to do here is quite different: Process A may communicate with many different IPs that I don't know in advance.
Winsock2 provides LSPs (Layered Service Providers) as a way to insert a DLL into the TCP/IP stack. I'm not familiar with LSPs and don't know whether an LSP can do what I want.
Can anybody give me some suggestions? Thanks.
A quick background on LSPs:
An application which uses the Winsock2 API calls a combination of WSA-prefixed functions, e.g. WSAConnect, WSASocket, WSASend, WSARecv, etc.
If an application still uses the old Winsock functions, they are mapped to Winsock2 behind the scenes anyway. For instance, send() is mapped to WSASend(), recv() to WSARecv(), etc.
The WSA-prefixed functions internally call their corresponding WSP-prefixed functions provided by the LSP. For instance, WSASend() calls WSPSend(), WSASocket() calls WSPSocket(), etc. In short, WSAWhateverFunction() calls WSPWhateverFunction(), and their parameters and return values are essentially the same (not exactly, but close).
An LSP is a DLL that implements these WSP-prefixed functions, so it can, for example, modify outbound and inbound traffic, do filtering, and so on. However, an LSP is still a user-space DLL. It is as limited as any other user-space code and has no higher privilege than its host application (an internet browser, say). It has access to the same set of system functions that is available to other programs, e.g. Winsock itself.
The conclusion is: if your program can direct outgoing traffic to a specific NIC by itself, an LSP can do it too; if your program can't, neither can an LSP. An LSP is therefore irrelevant to your problem.
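For what it's worth, the usual thing an ordinary program can do itself is bind() its sockets to adapter A's address before connecting; on a strong-host-model stack that effectively pins outgoing traffic to that adapter, while on older weak-host stacks the routing table can still choose the egress interface. A sketch (the function name is illustrative and it assumes WSAStartup() has already been called):

#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

// Bind the socket to adapter A's address (10.20.30.201 from the question)
// before connecting, so outgoing traffic uses that adapter's address.
SOCKET connect_via_adapter_a(const char *dest_ip, unsigned short dest_port)
{
    SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (s == INVALID_SOCKET)
        return INVALID_SOCKET;

    struct sockaddr_in local = {0};
    local.sin_family      = AF_INET;
    local.sin_addr.s_addr = inet_addr("10.20.30.201");   // adapter A
    local.sin_port        = 0;                           // any local port

    struct sockaddr_in remote = {0};
    remote.sin_family      = AF_INET;
    remote.sin_addr.s_addr = inet_addr(dest_ip);
    remote.sin_port        = htons(dest_port);

    if (bind(s, (struct sockaddr *)&local, sizeof(local)) == SOCKET_ERROR ||
        connect(s, (struct sockaddr *)&remote, sizeof(remote)) == SOCKET_ERROR) {
        closesocket(s);
        return INVALID_SOCKET;
    }
    return s;
}

This only helps for code you control, though; forcing another process's traffic onto one adapter without touching its code really is a routing-level problem, as stated above.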
