Do we need surprise hotplug support to remove device and rescan bus through sysfs? - pci-e

We have a NVMe SSD that requires a power cycle (because of firmware updates that does not work with controller reset). We are exploring using NVMe subsystem reset. But our platform does not support surprise hotplug, so we are trying to use sysfs to remove the drive and rescan the bus.
To remove: echo 1 > /sys/pci/bus/devices/<pci_address_of_drive>/remove
To rescan: echo 1 > /sys/pci/bus/rescan
This works majority of the time and the drive is able to enumerate after this. But sometimes, the drive fails to enumerate. A system reboot puts the drive back in good state. We are not seeing any issues with the drive firmware (verified using telemetry).
I am trying to understand if we need surprise hotplug support to remove device and rescan bus through sysfs? I am not able to find any resources related to this. Any leads related to this will be very helpful.
Thanks in advance.

Related

How to tell linux retrain and scan PCIe bus?

We have an embedded board that has an iMX8M-Plus Processor and Linux v5.4.161. This board has one PCIe bus and that one is connected to an FPGA. When we power up the board, the FPGA is not yet configured, so it acts as if it was not on the PCIe bus.
Once the Linux is fully booted, we configure the FPGA and only after that it starts acting as a PCIe endpoint (device).
At this point, when I run lspci -> it returns nothing.
When I first execute echo "1" > /sys/bus/pci/rescan as suggested here and here and then lspci, I still get nothing.
But if I reboot the linux without reseting the FPGA, it starts being visible in the lspci list. Rebooting the linux is not an option for us. Somehow I need to tell the linux that whatever it's doing at the boot time, please do it again at runtime. But I couldn't find a solution for this so far.
According to the Texas Instrument support forum, they said if the PCIe link is not trained at the boot time, rescan command never works.
At the boot time, while linux loads a pci driver, it tries to establish a PCIe link, I can see that with an oscilloscope, PERST pin is asserted and PCIE_CLK generated for a while and then stops if it can not detect any device. But the rescan command never does that.
Also in the system there is no pcie device to executeecho 1 > $pcidevice/remove in order to make rescan functional. Or there is no device or bus to set power off and on back like echo 0 > /sys/bus/pci/slots/.../power
I also learned that there was a method in old linux times (v2.6) called adding a Fake PCIe Device which physically doesn't exist to solve this problem. For that I took the fakephp.c driver from an old linux repo and ported it to ours. After solving a couple of deprecated function problems, it is compiled for Linux Kernel v5.4. modprobe fakephp worked and driver loaded but somehow I didn't get this fake device in my device list. Here it is mentioned that the fakephp driver was removed from mainstream linux since PCI core has similar functionality, but he never mentioned how.
Short of the story is that, I am stuck here, I need my FPGA to be visible in the lspci list without restarting the linux.
I recommend configuring the FPGA in u-boot to get away from these kinds of problems. Connect up SPI pins to FPGA's config pins & run it in Slave configuration mode.

Disable Linux hard disk support

I'm building a custom linux kernel that should be able to access cdrom and usb mass storage devices, but not hard disks.
I tried disabling CONFIG_BLK_DEV_SD, but I lose usb mass storage support.
How can I achieve that? If not possible, is there a way to remove hard disk nodes in /dev at startup?
First, you need to define, what exactly "hard disk" means.
Second, you need to express the above definition as a set of udev rules. This way, device nodes for devices you don't want would not even get created in /dev/ in the first place.
One nice tutorial for udev rules is here:
http://www.reactivated.net/writing_udev_rules.html
Relevant Q/A:
https://unix.stackexchange.com/questions/66897/what-is-the-udev-rule-to-allow-specific-thumb-drive-vendors
Frankly, I'm amazed you even managed a bootable system with CONFIG_BLK_DEV_SD disabled: modern Linux kernels funnel virtually all storage I/O through the SCSI layer, then treat the specific types (SATA, PATA, USB mass storage, etc.) as flavors of SCSI.
I'd try disabling things at the next layer down in the system: enable SCSI disk and CD-ROM support, then disable all methods of actually talking to those disks: low-level SCSI drivers, ATA SFF support, ACHI support, etc.

Unload kernel module for only a specific device (preferrably from code in another kernel module)

I'm working on a project where I have a management system that exports PCIe hardware devices to other systems via PCI Express. I have a working management kernel module but need to find a way to ensure that a device I export doesn't have a driver already loaded for it on the management system. Without that, the device will end up with conflicts since the same driver will be accessing it from 2 different systems & obviously cause problems.
For example, assume I have a dual-port Intel 100MBps NIC device installed on the Manager which will show up 2 PCIe Endpoints in the system (eg Fn 0 & 1). The Intel module e1000 will be loaded for both devices. If I want to export port 2 of that device to another system, I would like to "detach" it from the e1000 module.
Does anyone know a clean way of doing this without hacking the kernel or tweaking the e1000's driver's probe function? I can't simply do an rmmod because that will remove the module all-together for both NIC devices. I would like the NIC I'm not exporting to remain functional in the Management system with the e1000 driver still loaded for it.
Essentially, rmmod does this but will remove the driver for all devices probed for & owned by the driver. Any way to tell Linux just "unload module for only this specific device"? On Windows, I guess this would be the equivalent of right-clicking on a device in Device Manager & select "Disable".
You can disable driver for your device by writting following method:
Use sudo -i or before any command write sudo to operate as root user.And follow below procedure:
Goto /sys/bus/pci/<driver_name>/ folder.
Give command echo -n 0000:03:00.1 > unbind
Where 0000:03:00.1 is device you want to detech your driver.
Read this link for getting idea about sysfs for pci bus.
The mechanism responsible for device/driver pairing in Linux driver model is called "bus" (usually controlled through entries in /sys/bus). The problem is, the particular bus driver your device is attached to must support this sort of action (and it is far from trivial supporting this functionality in general case).
Specifically for PCI, if you've got "pci hotplug" enabled, you will be able to kick devices off pci bus by writing numbers to corresponding "hotplug" entries in /sys/bus/pci (you can also reconnect them back by triggering the bus rescan). The problems will start later, as you will have to somehow convince the Linux device subsystem to prefer your driver over the one already registered for the device id in question.
Normally registered drivers are added to some sort of list and then tried out one by one to see if either of them is listing the new or re-enabled device in their "*_device_id" tables. If PCI subsystem prefers to try drivers in "first registered, first tried" order you will have to hack it to achieve your goal.
To unbind a PCI driver from a device, use the unbind file of the driver in sysfs.
From Documentation/ABI/testing/sysfs-bus-pci:
/sys/bus/pci/drivers/.../unbind
Description:
Writing a device location to this file will cause the
driver to attempt to unbind from the device found at
this location. This may be useful when overriding default
bindings. The format for the location is: DDDD:BB:DD.F.
That is Domain:Bus:Device.Function and is the same as
found in /sys/bus/pci/devices/. For example:
# echo 0000:00:19.0 > /sys/bus/pci/drivers/foo/unbind
you can disable a particular pci device by resetting enable value for the corresponding device
Eg:
echo 0 > /sys/bus/pci/devices/0000:00:1a.2/enable

How do I reset USB devices using the Windows API?

Do you know a way to use the Windows XP API to reset the USB bus? In other words, I'd like the OS to kick out any USB devices that are currently connected, and then auto-detect everything anew.
I'm aware of devcon, and I suppose I could do system calls out to it, but I'm hoping for a direct call into the API.
From kernel mode: You can force a specific USB device to be re-connected, as if it was unplugged and replugged again, by sending an IOCTL_INTERNAL_USB_CYCLE_PORT to its PDO. (This can only be done from a kernel mode, e.g. through a helper driver.) This 'cycle' operation will cause a USB reset to occur, after which the device would be re-enumerated. For example, if the device comes back with a different USB device descriptor, a different driver may be matched for it.
From user mode: You can do this by ejecting the device through the CfgMgr API. For example, to go over all USB hubs and eject all devices:
Find all devices having device interface GUID_DEVINTERFACE_USB_HUB with SetupDiGetClassDevs(... DIGCF_DEVICEINTERFACE).
Enumerate over the returned device information set (SetupDiEnumDeviceInfo).
For each device, get the DevInst member:
Invoke CM_Get_Child(DevInst) and then CM_Get_Sibling repeatedly to go over all child nodes of the hub (i.e. the USB devices).
For each child node, call CM_Request_Device_Eject.
Well, use can use the Setup API (SetupDiXXX functions) to enumerate the USB devices in the system, and then call WinUsb_ResetPipe on each one, but I'm not sure if that's what you're looking for. It's been a while since I worked with USB devices, but as I recall, there is no standard way to reset a device (i.e. simulate a power off/power on cycle). If it's possible for a particular device, you'd have to send an appropriate IOCTL (using DeviceIOControl) to the driver. The IOCTL would vary from manufacturer to manufacturer.
It's possible to cycle the parent port on the USB hub the device is attached to, as well. This will result in, among other things, apparrent unplug/replug actions, as you will see a balloon popup when this occurs.
Much of this is poorly documented, and honestly, I've gotten the impression there are only a handful of people at Microsoft who really understand it well. The design decision I've made for future devices I design is that I intend to include watchdog functionality on both sides, as well as a device-side full reset function. That way, if the device figures out it is confused, it can just cut its own power for a second and fully reset, if the host can't communicate with it, it could do the same thing, and if the device thinks everything is fine but the host knows better, the host could order it to reset.
There are at least three APIs worth looking into for this problem: the Setup API, the Config Manager API, and various WMI extensions. However, be cautious about diving into WMI if you intend to use an Embedded XP target, as you will have to include a lot of other things in your OS image you might otherwise not need.
As far as I know, there is no way to do this - you can issue a command to have PnP rescan the bus for new devices, but that isn't the same as issuing a bus reset.
Furthermore, just because from a hardware perspective you issued a bus reset doesn't mean that Windows will remove the PDOs that represent the children of the hub and redetect them; the USB bus driver can (and does) do just what I describe (i.e. issue hardware bus resets without disturbing the device tree), and only after the device doesn't respond does it issue the surprise removal and yank it from the tree.

Killing power to a USB port

Is there a way to programatically turning off the power or killing a USB port on the Mac?
I believe that the USB power typically comes directly from the power supply. It might go through the motherboard or some other hardware to combine it with the data lines, but I don't think the voltage ever goes through any programmable circuitry. If you wanted to deactivate the data transfer, that would probably be possible since that is handled by an extension file (IOUSBMassStorageClass.kext), but short of modifying the port physically, I don't think you will be able to deactivate the power.
If you're refering to USB storage devices, according to the Tiger Security Configuartion Manual (pdf):
6 To remove support for mass storage devices (e.g. USB flash drives, external
USB hard drives, external FireWire Hard Drives), drag the following files to the Trash:
IOUSBMassStorageClass.kext
IOFireWireSerialBusProtocolTransport.kext
7 Open the /System/Library folder.
8 Drag the following files to the Trash:
Extensions.kextcache
Extensions.mkext
9 Choose Finder > Secure Empty Trash to delete the file.
10 Restart the system.
I've seen some other stuff around but it's all much harrier. Leading gluing the ports up to get suggested quite a bit.

Resources