interrupt paradigms (MSI/MSI-X and legacy) in drivers - linux-kernel

Suppose a PCI hardware supports three available interrupt paradigms:
Legacy pin based INTx
MSI
MSI-X
I'd like to support all three modes in my driver and pass an intr_type argument in module_param() macro. I'm wondering what is the general rule of thumb: if by default the command line parameter is empty, support MSI or MSI-X?
Since MSI and MSI-X are logically close, which one should be preferable to use?
For instance: if the driver detects that the device supports MSI-X, is that sufficient reason to try to enable MSI-X and use it in the driver, and in case of failure revert to legacy INTx?
Thanks.

INTx interrupts are likely to be shared, so the driver always has to check if its own device is the interrupt source. Typically, you want to avoid that.
MSI-X was designed to overcome some shortcomings of MSI. If you don't know what those are, then it's likely that they do not matter for your driver, and you can choose either one.

Basically, you can just query the endpoint capabilities for MSI/MSI-X support.
grep for: PCI_CAP_ID_MSI and PCI_CAP_ID_MSIX
You should fall back from MSI-X to MSI, and then to INTx, if the preferred mode is not supported.

Related

Using netcfg to remove an NDIS LWF doesn't remove it from the driver store?

When I try to remove my NDIS LWF using netcfg -u, I notice that it doesn't remove it from the driver store (as can be seen with pnputil /enum-drivers).
This is causing a problem because on some Windows 10 machines, if we uninstall the previous version of our NDIS LWF and install the new one using netcfg, for some unknown reason the old INF is still used to install it! I assume it's because the INF still has the same componentID? We are updating the INF file in order to attach to some virtual adapters that we previously couldn't attach to. Note that this doesn't happen in Windows 7, where we can install the new one without any problem.
So my questions are:
Why is Windows still using the previous INF from driver store when we try to install the new updated driver that has a different INF?
What is the proper way to fully remove the previous NDIS LWF, including from driver store? If we need to use pnputil to fully remove it from driver store, then what is the proper way of finding the OEM number, considering that pnputil -d requires an OEM number?
Right, as you've noticed, netcfg.exe -i is not the exact opposite of netcfg.exe -u.
Installation does these steps:
1. Install the INF you provided with -l to the driver store (SetupCopyOEMInf)
2. Call INetCfgClassSetup::Install to:
   1. Query PNP for the "best match" for the componentId you provided with -i (SetupDiBuildDriverInfoList, SetupDiSelectBestCompatDrv)
   2. Run all the sections in the INF (AddReg, AddService, etc)
   3. Register a LWF/Protocol/TDI driver with the system using info in the Ndi registry key
Uninstall does these steps:
1. Call INetCfgComponent::DeInstall to:
   1. Deregister your LWF/Protocol/TDI driver with the system
   2. Run the special .Remove section of the INF (which, hopefully, contains a DelReg, DelService to undo everything done during install step #2.2)
(The descriptions above ignore the driver refcount system (aka OBO_TOKEN), since it isn't often used — most drivers just use a single refcount. If you exclusively use netcfg.exe to manage your driver, then you too can ignore refcounts.)
You might be wondering: why is this so very less-than-awesome? The backstory here is that netcfg.exe was never really meant to be a general-purpose tool for 3rd party software to manage their drivers. It was only meant to be used internally, for the drivers that are built into the OS (ms_tcpip etc). The assumption was that 3rd party driver installers would want to call proper APIs like INetCfg, not CreateProcess some executable and screen-scrape the output. So netcfg.exe was only built up to be the minimum needed for our internal needs. In particular, very little attention was paid to uninstall, since built-in drivers are rarely uninstalled. (Likewise, argument parsing is inflexible, the help text is not helpful, and the error handling is not robust.)
Starting in Windows 10, built-in drivers are no longer installed using netcfg.exe, so the OS itself doesn't need netcfg.exe at all anymore. But by then, 3rd party products had discovered it and taken a dependency on it, so we couldn't just remove netcfg.exe anymore. Ah well.
Why is Windows still using the previous INF from driver store when we try to install the new updated driver that has a different INF?
This is a common gotcha. Note that, during install, steps #1 and #2 have no association between them. You could install a printer INF and a LWF at the same time — netcfg.exe -l foo.inf -i bar makes no effort whatsoever to ensure that the "best" component selected in step #2.2 actually came from the INF installed in step #1.
In order to ensure that the driver you want is the "best" driver, you have to ensure that your favored driver wins the PNP driver selection algorithm. I've personally been bitten by this because I didn't bump the DriverVer line during development iterations. Make sure you increment DriverVer every time you change the driver.
What is the proper way to fully remove the previous NDIS LWF, including from driver store? If we need to use pnputil to fully remove it from driver store, then what is the proper way of finding the OEM number, considering that pnputil -d requires an OEM number?
Honestly, if you want to do everything really correctly, I suggest avoiding netcfg.exe entirely. Use the underlying INetCfg APIs instead. Then your installer will have to manage the driver (SetupCopyOEMInf / SetupUninstallOEMInf).
You aren't losing much by ditching netcfg.exe and calling INetCfg yourself. netcfg.exe doesn't do anything particularly fancy with INetCfg: its own implementation is nearly exactly taken from this sample code. If you start with that and slap a call to SetupCopyOEMInf on top, you'll pretty much be at parity with netcfg.exe already. From there, you can improve it with more robust INF uninstall. You can even write some code to inventory all the INFs with your componentId, to make sure there aren't stale copies of your INF hiding around.
You still have to make that leap of trust from installing an INF to hoping that INetCfgClassSetup::Install thinks your recently-installed INF is the "best" INF. But if you've removed every other INF with that componentId, then you can be certain that the sole remaining INF must be the best match.

Loading a Windows Driver Class other than NetService to act as an NDIS Filter

Is it possible to take a Windows driver such as a Ports class driver, then have it also set itself up as an NDIS filter (NetService class) driver by calling NdisFRegisterFilterDriver() in its DriverEntry()? This would essentially have the driver work double duty as a Ports and NetService class driver, but within a single code base and binary.
I'm attempting to do this and I'm seeing the call to register the NDIS driver fail, specifically with the following trace message:
[0][mp]<==ndisCreateFilterDriverRegistry, FilterServiceName 807EFA18 Status c0000001
[0][mp]==>NdisFRegisterFilterDriver: DriverObject 84C6C428
[0][mp]==>ndisCreateFilterDriverRegistry, FilterServiceName 807EFA18
[0][mp]<==ndisCreateFilterDriverRegistry, FilterServiceName 807EFA18 Status c0000001
I've looked around and it seems that the NDIS driver is heavily dependent on the values placed in the registry from the INF and the INF itself. I've tried to spoof the registry keys by adding the NetCfgInstanceId by hand and calling that value out in my code before trying to register the NDIS filter, but have hit a point where it just seems like the wrong way to go about it.
What is the recommended way to go about this? At this point I'd imagine that this would require a Ports class driver and NetService class driver separately, with some kind of composite driver to tie them together to be able to communicate, or have a way for one or the other to communicate through interprocess communication.
A stern warning
Do not attempt to "install" a filter by manually writing registry keys. As you've noticed, it's not easy, and even if you seem to get it working, it will all collapse when the OS tries to install the next LWF. Furthermore, I added some additional hardening features designed exactly to prevent people from doing this to Windows 10; you'll have to do some significant damage to the OS before you can hijack network bindings in Windows 10.
How to structure your driver package
Anyway, what you're describing is indeed possible. The way to do it is to provide the following in your driver package:
1. A PNP-style INF. This INF has:
   - The PORTS class
   - An AddService directive that installs your driver service
   - A CopyFiles directive to bring in any files you need
   - Any other bits you need for the PNP device
2. A NetCfg-style INF. This INF has:
   - The NETSERVICE class
   - The usual LWF stuff: Characteristics=0x40000, FilterMediaTypes=xxx, FilterType=xxx, etc.
   - A reference to the service you installed in the other INF (HKR,Ndi,Service,,xxx)
   - Do not include an AddService or CopyFiles; that's already taken care of by the first INF
3. One .sys file. This driver does:
   - In DriverEntry, call NdisFRegisterFilterDriver, and pass the name of your service "xxx"
   - In DriverEntry, call WdfDriverCreate or fill out the DRIVER_OBJECT dispatch table as you normally would for any other PNP driver
   - Implement FilterAttach etc. normally; implement your WDF EvtXxx or WDM IRP handlers normally
   - Don't forget to call NdisFDeregisterFilterDriver in EvtDriverUnload or DriverUnload, and also in the failure path for DriverEntry
How to install this fine mess
The good news is that, with these 2 INFs, you can meet your requirement of having 1 .sys file do two things. The bad news is that you've now got 2 INFs. Worse, one of the INFs is a NetCfg-style INF, so you can't just Include+Need it. The only way to install a NetCfg-style INF is to call INetCfgClassSetup::Install (or NetCfg.exe, its command-line wrapper). Windows Update only knows how to install PNP-style INFs, and PNP only knows how to Include other PNP-style INFs.
So the simplest solution is to ship an installer exe/msi that invokes the INetCfg API. If you can do that, it's simply a matter of a couple of calls to SetupCopyOEMInf and the INetCfg boilerplate that you can find in the bindview sample.
But, if you have to support a hardware-first installation, you need to bring out the big guns. You'll need to write a Co-Installer and include it with your driver package. The Co-Installer's job is to call the INetCfg APIs when your driver package is installed, and deregister when the package is uninstalled.
Co-Installers are generally discouraged, and are not supported for Universal drivers. So you should avoid a Co-Installer unless you've got no choice. Unfortunately I cannot think of any other way to register an NDIS LWF when a PNP device driver is installed through Windows Update. (This doesn't mean there isn't a crafty way to do it; I don't know everything.)
Note that you'd need a Co-Installer anyway even if you were shipping 2 .sys files. The need to call INetCfg doesn't change just because you merged the driver binaries.
Limitations
You'll have a full-fledged NDIS LWF driver, as well as a full-fledged PNP device driver. The only (minor) thing that doesn't work is that you cannot call NdisRegisterDeviceEx in this driver. The reason is that when you call NdisRegisterDeviceEx from a LWF, NDIS will attempt to co-opt your driver's dispatch table. But in this PNP+LWF dual driver, the dispatch table is owned by WDF or by you. This limitation is no problem, since you can call WdfDeviceCreate, and this routine is easier to use and has more features than the NDIS one anyway.
With the above configuration, the driver service is owned by PNP. That means the lifetime of your .sys file is owned by PNP. You cannot manually "net start" a PNP driver service; the only way to get your .sys file loaded is to actually enumerate your hardware. That means you can't have your NDIS LWF running when the hardware is not present. Typically this is what you'd want anyway. If it's not, you can try messing with the ServiceName directive, but there are some weird caveats with that, and I don't fully understand it myself.

how to debug a pci device and linux driver

I am programming a PCI device in Verilog and also writing its driver.
I have probably introduced a bug in the hardware design, and when I load the driver with insmod the kernel just gets stuck and doesn't respond. Now I'm trying to figure out which line of driver code is the last one to run before my computer gets stuck. I have inserted printk calls in all relevant functions like probe and init, but none of them get printed.
What other code runs when I use insmod before it gets to my init function? (I guess the kernel gets stuck somewhere in there.)
printks are often not useful for debugging such a problem. They are buffered sufficiently that you won't see them in time if the system hangs shortly after printk is called.
It is far more productive to selectively comment out sections of your driver and by process of elimination determine which line is the (first) problem.
Begin by commenting out the entire module's init section leaving only return 0;. Build it and load it. Does it hang? Reboot system, reenable the next few lines (class_create()?) and repeat.
From what you describe, it looks like your driver is deadlocking the Linux scheduler. That means interrupts from the system timer either don't arrive or never get a chance to be handled by the kernel. There are three possible reasons:
1. You hang somewhere in your driver's interrupt handler (the handler starts its work but never finishes).
2. Your device creates an interrupt storm (the device generates interrupts so frequently that the system does nothing but handle them).
3. You explicitly disable all interrupts in your driver but never re-enable them.
In all other cases the system will either crash (oops or panic, with the corresponding output) or tolerate the potential misbehavior of your device.
I guess that printk won't work for a scenario as extreme as a hang in kernel mode. It is quite heavyweight, which makes it an unreliable diagnostic tool for scenarios like yours.
Tracing through debug output works only in simpler environments like bootloaders or simpler kernels, where the system runs in a default low-end video mode and there is no need to synchronize access to video memory. In such systems, tracing by writing debug output directly to video memory can be great, and is often the only tool available for debugging. Linux is not such a case.
Techniques I can recommend from the software debugging point of view:
1. Review your driver code, paying special attention to the interrupt handler and to the places where you disable/enable interrupts for synchronization.
2. Comment out all of the driver logic and gradually uncomment it; this can help a lot with localizing the issue.
3. Try remote kernel debugging of your driver. I'd advise using a virtual machine for that purpose, but I don't know whether virtual machines allow passing the PCI device through to the guest.
4. Try in-memory tracing. The idea is to preallocate a memory chunk with well-known virtual and physical addresses and zero it. Then modify your driver to write trace data into this chunk using its virtual address. (For example, assign a unique integer value to each event that you want to trace and write '1' at the corresponding index of a byte array in the preallocated chunk.) When your system hangs, you can force a full memory dump and then analyze the memory layout in the dump using the physical address of the trace chunk. I have used this technique with a VMware Workstation VM on Windows: when the system hung, I just paused the VM instance and looked at the corresponding .vmem file, which contains the raw layout of the VM's physical memory. I'm not sure whether this trick will work easily, or at all, on Linux, but I would try it.
5. Finally, you can try to trace the messages on the PCI bus, but I'm not an expert in this field and am not sure whether it would help in your case.
In general, kernel debugging is quite a tricky task where a lot of tricks are in use, and each of them works only for a specific set of cases. :(
I would put a logic analyzer on the bus lines (on FPGA you could use chipscope or similar). You'll then be able to tell which access is in cause (and fix the hardware). It will be useful anyway in order to debug or analyze future issues.
Another way would be to use the kernel crash dump utility, which has saved me some headaches in the past. Depending on your Linux distribution, you may need to install it (it is available by default in RH). See http://people.redhat.com/anderson/crash_whitepaper/
There isn't really anything that is run before your init. Bus enumeration is done at boot, if that goes by without a hitch the earliest cause for freezing should be something in your driver init AFAIK.
You should be able to see printks as they are printed, they aren't buffered and should not get lost. That's applicable only in situations where you can directly see kernel output, such as on the text console or over a serial line. If there is some other application in the way, like displaying the kernel logs in a terminal in X11 or over ssh, it may not have a chance to read and display the logs before the computer freezes.
If for some other reasons the printks still do not work for you, you can instead have your init function return early. Just test and move the return to later in the init until you find the point where it crashes.
It's hard to say what is causing your freezes, but interrupts are one of the first things I would look at. Make sure the device really doesn't signal interrupts until the driver enables them (that includes clearing interrupt enables on system reset), and enable them in the driver only after all handlers are registered (also, clear interrupt status before enabling interrupts).
Second thing to look at would be bus master transfers, same thing applies: Make sure the device doesn't do anything until it's asked to and let the driver make sure that no busmaster transfers are active before enabling busmastering at the device level.
The fact that the kernel gets stuck as soon as you install your driver module makes me wonder if any other driver (built into the kernel?) is already driving the device. I made this mistake once, which is why I am asking. I'd look for the string "Kernel driver in use" in the output of 'lspci -k' before installing the module. In any case, your printks should be visible in dmesg output.
In addition to Claudio's suggestion, a couple more debug ideas:
1. Try kgdb (https://www.kernel.org/doc/htmldocs/kgdb/EnableKGDB.html)
2. Use JTAG interfaces to connect to debug tools (I think these vary between devices and vendors, so you'll have to figure out which debug tools you need for the particular hardware)

How to distinguish Windows driver from dll

I need to distinguish between two binary files: a driver and an ordinary DLL. As far as I understand, I need to view the sections of these files (e.g. via DumpBin) and see if there is an INIT section. Is this criterion sufficient?
You need to parse the binary and look at the Subsystem field of IMAGE_OPTIONAL_HEADER; if it's NATIVE, then it's a driver. See the following link for details:
http://msdn.microsoft.com/en-us/library/ms809762.aspx
You would have to use heuristics to establish this fact and be certain to the extent possible. The problem is that there literally exist native user-mode programs (e.g. autochk.exe) and DLLs (frankly nothing comes to mind off hand, but I've seen them as part of native programs that do stuff before winlogon.exe gets to run) as well as kernel-mode counterparts (bootvid.dll, hal.dll and the kernel in one of its various forms ntoskrnl.exe).
So to establish it is a driver you could try the following:
1. IMAGE_OPTIONAL_HEADER::Subsystem, as pointed out, should indicate "native" (i.e. no subsystem: IMAGE_SUBSYSTEM_NATIVE).
2. Verify that IMAGE_FILE_HEADER::Characteristics does not have the DLL flag set (check against IMAGE_FILE_DLL); if it is set, the file is a kernel-mode or user-mode DLL.
3. Check the imports to establish whether it would run in kernel or user mode: importing ntdll.dll or another user-mode DLL indicates user mode, while importing one of the kernel-mode modules (ntoskrnl.exe, hal.dll, bootvid.dll) indicates kernel mode.
The structs and defines are all included in winnt.h.
The gist:
establish the subsystem (only IMAGE_SUBSYSTEM_NATIVE is interesting for your case)
establish it is a DLL or not
establish whether it links against user or kernel mode components

How to write to I/O ports in Windows XP? (Delphi7)

I am trying to write to ports 0x60 and 0x64, with no luck.
Delphi code:
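procedure PortOut(IOport: WORD; Value: BYTE); assembler; register;
asm
  XCHG DX,AX   // register convention: IOport arrives in AX, Value in DL; after the swap DX = port, AL = value
  OUT DX,AL    // write the byte in AL to the I/O port in DX
end;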
Upon calling PortOut, I get an EPrivilege "Privileged instruction" exception, because IN and OUT may only execute in Ring0.
I would like to know how I can get Ring0 privileges for my application, or how I could write to ports 0x60 and 0x64 using some existing external library.
Have a look at the IO.DLL from Geek Hideout.
IO.DLL allows seamless port I/O operations for Windows 95/98/NT/2000/XP using the same library.
Here is an example: Parallel Port I/O Using Delphi V 6.0
The correct way to handle this situation is to write a Windows driver, but that can't be done in Delphi for lack of support. It requires the DDK and a C compiler. The other solutions presented here work, but be aware that they usually give access to any I/O port, not only those your application requires. That could be a security issue: if the x86 architecture lets the system programmer define the IOPL (I/O privilege level) and most systems set it to ring 0, there's a reason.
General-access I/O port drivers are useful for tests, and maybe for prototyping or as stopgap measures, but I would be very careful about deploying them, especially if the system is not under strict control. If you need that kind of access, you definitely need to understand how the Windows kernel and its drivers work, and why, and implement your own driver.
Most of the time Windows is not insecure per se; running too much insecure software with the wrong privileges is what makes it so.
Inpout32.dll for Windows 98/2000/NT/XP (binaries and source code)
Inpoutx64.dll for WIN XP 64 bit (binaries and source code)
Delphi: Accessing Port Hardware and how to use InpOut32.dll
Of course, that might cause trouble for devices that are controlled by a driver. Stuff like IO.DLL is mostly meant for interfacing with cards for which no Windows drivers exist, or where the Windows driver is dormant until activated.
And since port 60h is the keyboard controller, and the keyboard is usually in use, it might cause problems.
If you are not interfacing with ancient hardware, but just trying to port DOS (TP) code, I strongly advise you to rewrite that code on top of normal Windows APIs.
Jeez,
It has been a long time for me. I just launched my DPro 2006 to look at the VCL on this and it bombed. (Guess that is what I get for not doing any Delphi code in the last couple of years on this machine... and keeping patching up to date, plus installing/uninstalling a jillion other paid and FOSS packages on the box....)
But it would seem to me that if you grabbed the header files for the Windows Driver Framework, or check out Project JEDI's site, you might find something to put together a Miniport driver or such.
Just my $0.02 worth
/s/ BezantSoft
