BSOD due to DRIVER_POWER_STATE_FAILURE (9f) with virtual NIC driver - ndis

I am working on an NDIS 6 miniport virtual NIC driver that does not handle IRP_MJ_POWER or IRP_MN_SET_POWER. The driver does not register a DispatchPower routine for IRP_MJ_POWER; it leaves that dispatch entry NULL. It does power management through OID_PNP_CAPABILITIES and OID_PNP_SET_POWER.
Recently I observed a crash on a Windows 7 machine with bugcheck code DRIVER_POWER_STATE_FAILURE (9F). After some analysis, I found that the crash happened because the driver did not process the power IRP (IRP_MJ_POWER) for more than 10 minutes. This is the first time I have observed this problem.
I am also curious to know more about the delivery of IRP_MJ_POWER/IRP_MN_SET_POWER to drivers. Is it mandatory for NDIS drivers to handle these IRPs? I have seen multiple miniport drivers that do not register a dispatch function for IRP_MJ_POWER. If it is not mandatory and a driver does not need to register a dispatch function for IRP_MJ_POWER, under what conditions can such a problem happen?

NDIS miniports do not (and cannot) handle IRPs directly; NDIS handles them on behalf of the miniport. In general, when you see the system exhibiting some WDM behavior (like a bugcheck that talks about IRPs), you need to mentally translate that into the equivalent NDIS concepts and callbacks. Unfortunately this is not always obvious at first; it takes a few years of working with NDIS to get the hang of it.
In the particular case of 9F: that's a very common bugcheck for a network adapter. About 70% of the time, it's caused by the miniport driver leaking a NET_BUFFER_LIST, i.e. never completing the NBL back to NDIS. This is a common code bug, because there are otherwise no symptoms of leaking an NBL, and it's all too easy to overlook a small leak.
About 20% of the time, a 9F in networking is caused by some other thing getting stuck, like a stuck OID or a deadlock in MiniportPause. The remaining few percent of networking 9Fs are caused by bugs in filter drivers (which also tend to leak NBLs) and the occasional OS bug.
When debugging a 9F in a network driver, you should focus on (1) identifying what specifically isn't getting completed (an NBL, an OID, a function call, etc.), and then (2) figuring out why it isn't getting completed. Sometimes you get lucky, and the kernel debugger command !stacks 2 ndis! has all the stacks in a neat little deadlock for you to unravel. Other times you need to add some diagnostics/counters to narrow down what happened to the NBLs.
Good luck.


Which networking syscalls cause VM Exits to Hypervisor in Intel VMX?

I am having trouble understanding which (if any) system calls cause a VM exit to VMX root mode under Intel VMX. I am specifically interested in network-related system calls (i.e. socket, accept, send, recv), as they require a "virtual" device. I understand the hypervisor will have to be invoked to actually open a socket, but could this be done in parallel (assuming a multi-core processor)?
Any clarification would be greatly appreciated.
According to the Intel 64 and IA-32 Architectures Software Developer's Manual (Volume 3, Chapter 22), none of int 0x80, sysenter and syscall (the three main instructions used under Linux to execute a system call) can cause VM exits per se. So in general there isn't a clear-cut way to tell which syscalls cause a VM exit and which ones don't.
VM exits can occur in a lot of scenarios, for example the host can configure an exception bitmap to decide which exceptions cause a VM exit, including page faults, so in theory almost any piece of code doing memory operations (kernel or user) could cause a VM exit.
Excluding such an extreme case and talking specifically about networking, as Peter Cordes suggests in the above comment, what you should be concerned about are operations that [may] send and receive data, since those will eventually require communication with the hardware (NIC):
Syscalls like socket, socketpair, {get,set}sockopt, bind, shutdown (etc.) should not cause VM exits since they do not require communication with the underlying hardware and they merely manipulate logical kernel data structures.
read, recv and write can cause VM exits unless the kernel already has data available to read or is waiting to accumulate enough data to write (e.g. as per Nagle's algorithm) before sending. Whether or not the kernel actually stops to read from HW or directly sends to HW depends on socket options, syscall flags and current state of the underlying socket/connection.
sendto, recvfrom, sendmsg, recvmsg (etc.), select, poll, epoll (etc.) on network sockets can all cause VM exits, again depending on the specific situation, pretty much the same reasoning as the previous point.
connect should not need to VM exit for datagram sockets (SOCK_DGRAM) as it merely sets a default address, but definitely can for connection-based protocols (e.g. SOCK_STREAM) as the kernel needs to send and receive packets to establish a connection.
accept also needs to send/receive data and therefore can cause VM exits.
I understand the hypervisor will have to be invoked to actually open a socket, but could this be done in parallel (assuming on a multi-core processor)?
"In parallel" is not the term that I would use, but network I/O operations can be handled by the OS asynchronously, e.g. packets are not necessarily received or sent exactly when requested through a syscall, but when needed. For example, one or more VM exits needed to receive data could have already occurred before the guest userspace program issues the corresponding syscall.
Is it always necessary for a VM Exit to occur (if necessary) to send a packet on the NIC if on a multi-core system and there are available cores that could allow the VMM and a guest to run concurrently? I guess what I'm asking if increased parallelism could prevent VM Exits simply by allowing the hypervisor to run in parallel with a guest.
When a VM exit occurs, the guest CPU is stopped and cannot resume execution until the VMM issues a VMRESUME for it (see Intel SDM Vol. 3, Chapter 23.1, "Virtual Machine Control Structures Overview"). It is not possible to "prevent" a VM exit from occurring; however, on a multi-processor system the VMM could theoretically run on multiple cores and delegate the handling of a VM exit to another VMM thread while resuming the stopped VM early.
So while increased parallelism cannot prevent VM exits, it could theoretically reduce their overhead. However do note that this can only happen for VM exits that can be handled "lazily" while resuming the guest. As an example, if the guest page-faults and VM-exits, the VMM cannot really "delegate" the handling of the VM exit and resume the guest earlier, since the guest will need the page fault to be resolved before resuming execution.
All in all, whenever the guest kernel needs to communicate with hardware, this can cause a VM exit. Access to emulated hardware for I/O operations requires the hypervisor to step in and therefore causes VM exits. There are however possible optimizations to consider:
Hardware passthrough can be used on systems which support IOMMU to make devices directly available to the guest OS and achieve very low overhead in HW communication with no need for VM exits. See Intel VT-d, Intel VT-c, SR-IOV, and also "PCI passthrough via OVMF" on ArchWiki.
Virtio is a standard for paravirtualization of network (NICs) and block devices (disks) which aims at reducing I/O overhead (i.e. overall number of needed VM exits), but needs support from both guest and host. The guest is "aware" of being a guest in this case. See also: Virtio for Linux/KVM.
Further reading:
x86 virtualization - Wikipedia
Virtual device passthrough for high speed VM networking - S. Garzarella, G. Lettieri, L. Rizzo
virtio: Towards a De-Facto Standard For Virtual I/O Devices - Rusty Russell

Virtual Windows Driver proper replacement for interrupts

I'm working on a virtual audio/MIDI driver, and although it's already working, I'm wondering whether my implementation is... proper.
Usually, the MIDI hardware triggers interrupts in the driver to send/receive/process data; however, as my driver is virtual, there is no hardware that could trigger the interrupts.
The way I handled this is that I set up a DPC timer with a 100 ms period that calls the processing/sending routines; received data is still handled via an interrupt from the OS.
Now, that obviously isn't quite what DPCs are for, is it? However, I cannot think of another implementation that works as well.
So.. any suggestions would be greatly appreciated :)
Regards,
Xaser

Two-way communication between kernel-mode driver and user-mode application?

I need two-way communication between a kernel-mode WFP driver and a user-mode application. The driver initiates the communication by passing a URL to the application, which then categorizes that URL (Entertainment, News, Adult, etc.) and passes the category back to the driver. The driver needs to know the category in the filter function because it may block certain web pages based on that information.
I had a thread in the application that made an I/O request which the driver would complete with the URL and a GUID; the application would then write the category into the registry under that GUID, where the driver would pick it up. Unfortunately, as Driver Verifier pointed out, this is unstable, because the Zw registry functions have to run at PASSIVE_LEVEL. I was thinking about trying the same thing with mapped memory buffers, but I'm not sure what the IRQL requirements are for that. Also, I thought about lowering the interrupt level before the registry function calls, but I don't know what the side effects of that would be.
You just need to have two different kinds of I/O request.
If you're using DeviceIoControl to retrieve the URLs (I think this would be the most suitable method) this is as simple as adding a second I/O control code.
If you're using ReadFile or equivalent, things would normally get a bit messier, but as it happens in this specific case you only have two kinds of operations, one of which is a read (driver->application) and the other of which is a write (application->driver). So you could just use WriteFile to send the reply, including of course the GUID so that the driver can match up your reply to the right query.
Another approach (more similar to your original one) would be to use a shared memory buffer. See this answer for more details. The problem with that idea is that you would either need to use a spinlock (at the cost of system performance and power consumption, and of course not being able to work on a single-core system) or to poll (which is both inefficient and not really suitable for time-sensitive operations).
There is nothing unstable about PASSIVE_LEVEL. Access to registry must be at PASSIVE_LEVEL so it's not possible directly if driver is running at higher IRQL. You can do it by offloading to work item, though. Lowering the IRQL is usually not recommended as it contradicts the OS intentions.
Your protocol indeed sounds somewhat cumbersome, and direct app-driver communication is probably preferable. You can find useful information about this here: http://msdn.microsoft.com/en-us/library/windows/hardware/ff554436(v=vs.85).aspx
Since the callouts run at DISPATCH_LEVEL, your processing has to be deferred, either to a worker thread or a DPC; a worker thread (which runs at PASSIVE_LEVEL) is what will allow you to use the ZwXxx routines. You should look into inverted callbacks for communication purposes; there's a good document on OSR.
I've just started poking around WFP, but it looks like even in the samples they provide, Microsoft reinjects the packets. I haven't looked into it that closely, but it seems that they drop the packet and re-inject it once processed. That would be enough for your user-mode engine to make the decision. You should also limit the packet capture to a specific port (80 in your case) so that you don't do extra processing that you don't need.

Hardware IO Access from Interrupt Handler with Windows XP 32 bit

I have a Windows XP application that uses a driver called TVicHW32, which allows me to create an interrupt handler for OS interrupts. Currently I am using a custom ISA card in an industrial chassis on IRQ 5.
The interrupt handler code is working and I can see a variable being incremented so the code that sets up and handles the interrupt is working.
The issue I have is that an I/O access call fails to generate any I/O activity on the ISA bus. I have an address at 0x308 that is used to trigger a start pulse on the ISA bus interface board.
If I trigger this pulse from the main code, for example, from a timer, the pulse is detected on the ISA bus and the card responds.
If I call the exact same function call to access that IO address from within the interrupt handler, nothing appears on the ISA bus. A logic analyser confirms this.
I have emailed the supplier of the driver, but they can't help, so I was wondering if anyone here has come across this situation and can offer a solution. This is critical to getting this project working, and the only alternative I can think of is to develop a custom driver with the DDK; but as that requires a steep learning curve, I would hope to find another solution.
Thanks
Dave...

How do I improve Tx performance in a USB device driver?

I developed a device driver for a USB 1.1 device on Windows 2000 and later with the Windows Driver Model (WDM).
My problem is the pretty bad Tx performance when using 64-byte bulk transfers. Depending on the USB host controller used, the maximum packet throughput is either 1000 packets (UHCI) or 2000 packets (OHCI) per second. I've developed a similar driver on Linux kernel 2.6 that achieves roughly 5000 packets per second.
The Linux driver uses up to 10 asynchronous bulk transfers, while the Windows driver uses 1 synchronous bulk transfer. Comparing the two makes it clear why the performance is so bad, but I have already tried asynchronous bulk transfers as well without success (no performance gain).
Does anybody have some tips and tricks on how to boost the performance on Windows?
I've now managed to speed up sending to about 6.6k messages/s. The solution was pretty simple: I just implemented the same mechanism as in the Linux driver.
So now I'm scheduling up to 20 URBs at once, and, what should I say, it worked.
What kind of throughput are you getting? USB 1.1 is limited to 12 Mbit/s (about 1.5 MB/s) at full speed.
It might be a limitation that you'll have to live with. The one thing you must never do is starve the system of resources; I've seen so many poor driver implementations where the driver hogs system resources in an utter failure to increase its own performance.
My guess is that you're using the wrong API calls, have you looked at the USB samples in the Win32 DDK?
