In Linux, how to measure/sample GPIO pulse width - linux-kernel

My board runs ARM Linux. It has a GPIO connected to an input signal that encodes information with different pulse widths at a constant period. Say the period is 10 ms: a 2 ms (high) pulse represents 0 and a 5 ms pulse represents 1.
In Linux, from user space or kernel space, how do I sample this kind of continuous pulse-width-coded signal? On a bare MCU I know I could use a hardware timer in pulse-width-measurement (input capture) mode, but I did not find anything equivalent for Linux.
Please give me some advice.

The reason no one can give you a concrete answer is that what you're asking for is specific to the CPU in question. It can be done if the hardware supports it... For example, the Texas Instruments Jacinto6 processor, one I use, has M4 cores that can run code like a microcontroller, and one of the M4 cores could be configured to handle an IRQ on a mapped GPIO. Then, when different pulse widths are detected, you could send the information to Linux running on the A15 cores of the same chip. The NXP i.MX family of processors has hardware support for this too, but it's not on a dedicated core like the TI chip. For the i.MX6, you could set up a GPIO to generate the IRQ with the delta-time (from the clock tree) pushed into a memory-mapped register. In this case, a different Linux driver would be needed to read the memory-mapped register than for the TI-based solution because of the hardware differences. Hope this helps shed light on why you won't simply find the capability already there in a Linux kernel/driver.
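If millisecond-scale resolution like the 2 ms/5 ms pulses above is enough, a software-only fallback that works on most SoCs is to timestamp both edges of the signal in a GPIO interrupt handler and take the difference. A minimal kernel-side sketch follows; the names (pwm_in_*) are illustrative, the gpio_desc is assumed to come from elsewhere (e.g. devm_gpiod_get() against a device-tree node), and a memory-mapped GPIO controller is assumed so that gpiod_get_value() is legal in hard-IRQ context:

/* Measure the high-pulse width by timestamping both edges in the ISR. */
#include <linux/atomic.h>
#include <linux/device.h>
#include <linux/gpio/consumer.h>
#include <linux/interrupt.h>
#include <linux/ktime.h>

static struct gpio_desc *pwm_in_gpio;        /* from devm_gpiod_get(), not shown */
static ktime_t rising_edge_ts;
static atomic64_t last_pulse_ns = ATOMIC64_INIT(0);

static irqreturn_t pwm_in_isr(int irq, void *dev_id)
{
    if (gpiod_get_value(pwm_in_gpio)) {
        /* rising edge: remember when the pulse started */
        rising_edge_ts = ktime_get();
    } else {
        /* falling edge: pulse width = now - rising edge;
         * ~2 ms means logical 0, ~5 ms means logical 1 (per the question) */
        atomic64_set(&last_pulse_ns,
                     ktime_to_ns(ktime_sub(ktime_get(), rising_edge_ts)));
    }
    return IRQ_HANDLED;
}

static int pwm_in_setup(struct device *dev)
{
    int irq = gpiod_to_irq(pwm_in_gpio);

    if (irq < 0)
        return irq;
    /* fire on both edges so we see the start and the end of each pulse */
    return devm_request_irq(dev, irq, pwm_in_isr,
                            IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING,
                            "pwm-in", NULL);
}

IRQ scheduling latency adds jitter on the order of tens of microseconds (more under load), which is negligible against a 2 ms vs 5 ms decision but rules this approach out for microsecond-scale pulses - that's where the dedicated-core or capture-register hardware described above comes in.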

Related

Trigger packet transmit for DPDK/DPAA2 from FPGA

I want to transmit a small static UDP packet upon receiving a trigger signal from an FPGA via GPIO. This has to be done within about 1 microsecond, with low latency and no jitter. My setup consists of an FPGA card connected to an NXP processor via a PCIe lane.
My current experiments showed that even starting the transmit from the GPIO interrupt handler in the kernel typically exhibits too much jitter to be useful for the application (about one microsecond should be doable). As I am not familiar with DPDK, I wanted to ask whether it can be of any help in this situation.
Can I use DPDK to do the following (roughly sketched below)?
Prepare the UDP payload in a buffer.
Push the buffer to DPAA2.
Poll periodically for the GPIO from the FPGA over an mmapped area on PCIe in the DPDK application.
Trigger the transmit of the buffer from DPAA2 (and not from CPU DDR memory).
Question: instead of issuing the transmit with DPDK's rte_eth_tx_burst, the FPGA should directly interact with the networking hardware to queue the packet. Can DPDK on NXP do this for my use case?
Note: if DPDK is not going to help, I think I would need to map an I/O portal of the DPAA2 management complex directly into the FPGA. But according to NXP's documentation, they do not consider DPAA2 a public API (unlike USDPAA) and only support it through e.g. DPDK.
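For reference, here is a rough sketch of what steps 1-4 could look like inside a DPDK poll-mode application; fpga_trigger_reg, the port/queue numbers and the payload handling are placeholders, and EAL/port/FPGA-BAR initialization is omitted. Note that it keeps the frame in hugepage memory that the NIC DMAs from, so it covers the polling part of the question but not the "buffer resident in DPAA2 rather than CPU DDR" part:

/* Pre-build one UDP frame as an mbuf, spin-poll an FPGA "fire" flag
 * exposed through a mapped BAR, then hand the mbuf to the NIC. */
#include <string.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_pause.h>

#define PORT_ID  0
#define QUEUE_ID 0

static volatile uint32_t *fpga_trigger_reg;   /* mmapped FPGA register (placeholder) */

static void fire_on_trigger(struct rte_mempool *pool,
                            const void *frame, uint16_t len)
{
    struct rte_mbuf *m = rte_pktmbuf_alloc(pool);
    char *data;

    if (m == NULL)
        return;

    /* steps 1/2: build the static Ethernet+IP+UDP frame once, up front */
    data = rte_pktmbuf_append(m, len);
    if (data == NULL) {
        rte_pktmbuf_free(m);
        return;
    }
    memcpy(data, frame, len);

    /* step 3: busy-poll the FPGA flag - no interrupt, no scheduler jitter */
    while (*fpga_trigger_reg == 0)
        rte_pause();

    /* step 4: enqueue to the hardware TX queue; retry until accepted */
    while (rte_eth_tx_burst(PORT_ID, QUEUE_ID, &m, 1) == 0)
        ;
}

Whether the FPGA itself can enqueue the descriptor (bypassing the CPU entirely) is a DPAA2 portal question that DPDK's public API doesn't expose, as the note above already suggests.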

Does modern PC video hardware support VGA text mode in HW, or does the BIOS emulate it (with System Management Mode)?

What really happens on modern PC hardware booted in 16-bit legacy BIOS MBR mode when you store a byte such as '1' (0x31) into the VGA text (mode 03) framebuffer at physical linear address B8000? How slow is a mov [es:di], eax store with the MTRR for that region set to UC? (Experimental testing on one Kaby Lake iGPU laptop indicates that clflushopt on WC was roughly the same speed as UC for VGA memory. But without clflushopt, mov stores to WC memory never leave the CPU and don't update the screen at all, running super fast.)
If it's not an SMI for every store, is there any way to approximate this cost on a chunk of WB memory in user-space, for performance experiments without actually rebooting into real mode? (e.g. using a BSS page as a pretend framebuffer that doesn't actually display anywhere).
The corresponding font glyph appears on screen in the next refresh, but is hardware scan-out really reading that ASCII char from VRAM (or DRAM for an iGPU) and mapping to bitmap font glyphs on the fly? Or is there some software interception on each store or once per vblank so the real hardware only has to handle a bitmapped framebuffer?
Legacy BIOS booting is well known to use System Management Mode (SMM) to emulate USB kbd/mouse as PS/2 devices. I'm wondering if it's also used for the VGA text-mode framebuffer. I assume it is used for VGA I/O ports for mode-setting, but it's plausible that a text framebuffer could be supported by hardware. However, most computers spend all their time in graphics mode, so leaving out HW support for text mode seems like something vendors might want to do. (OTOH this blog suggests that a homebrew Verilog VGA controller can implement text mode fairly simply.)
I'm specifically interested in systems using the iGPU in Intel Skylake, but would be interested in earlier / later iGPUs from Intel and AMD, and new or old discrete GPUs.
(Including vendors other than AMD and NVidia; there are some Skylake motherboards with PCI slots, not PCIe. If modern GPU firmware drivers do emulate text mode, presumably there are some old PCI video cards with hardware VGA text mode. And maybe such a card could make stores just be a PCI transaction instead of an SMI.)
My own desktop is an i7-6700k in an Asus Z170 Pro Gaming mobo, no add-on cards just iGPU with a 1920x1200 monitor on the DVI-D output. I don't know the details of the Kaby Lake i5-7300HQ system #Eldan is testing on, only the CPU model.
I found Phoenix BIOS's patent US20120159520 from 2011, Emulating legacy video using uefi. Instead of requiring video hardware vendors to supply both UEFI and native 16-bit real-mode option-ROM drivers, they propose a real-mode VGA driver (int 10h functions and so on) that calls a vendor-supplied UEFI video driver via SMM hooks.
Abstract
[...] The generic video option ROM notifies a generic video SMM driver of the request for video services. Such notification may be performed using a software system management interrupt (SMI). Upon notification, the generic video SMM driver notifies a third party UEFI video driver of the request for video services. The third party video driver provides the requested video services to the operating system. In this way, a third party UEFI graphics driver may support a wide variety of operating systems, even those that do not natively support the UEFI display protocols.
Much of the description covers handling int 10h calls and stuff like that which already obviously trap through the IVT, thus can easily run custom code that triggers an SMI on purpose. The relevant part is what they describe for direct stores into the text-mode framebuffer which need to work even for code that doesn't trigger any software or hardware interrupts. (Other than HW triggering SMI on such stores, which they say they can use if supported.)
Text Buffer Support
[0066] In certain embodiments, applications may manipulate the VGA's text buffer directly. In such an embodiment, generic video SMM driver 130 support this in one of two ways, depending on whether the hardware provides SMI trapping on read/write access to the 740 KB-768 KB memory region (where the text buffers are located).
[0067] When SMI trapping is available, the hardware generates an SMI on each read or write access. Using the trap address of the SMI trap, the exact text column and row may be calculated and the corresponding row and column in the virtual text screen accessed. Alternately, normal memory is enabled for this region and, using a periodic SMI, generic video SMM driver 130 scans for changes in the emulated hardware text buffer and updates the corresponding virtual text screen maintained by the video driver. In both cases, when a change is detected, the character is redrawn on the virtual text screen.
This is just one BIOS vendor's patent, and doesn't tell us which way most hardware actually works, or if other vendors do different things. It does essentially confirm that some hardware exists which can trap on stores in that range, though. (Unless that's just a hypothetical possibility that they decided to cover in their patent.)
For the use-case I have in mind, trapping only on screen refresh would be vastly faster than trapping on every store so I'm curious which hardware / firmware works which way.
Motivation for this question
Optimizing an incrementing ASCII decimal counter in video RAM on 7th gen Intel Core - repeatedly storing new digits for an ASCII text counter into the same few bytes of video RAM.
I tested a version of the code in 32-bit user-space under Linux, on WB memory, hoping to approximate the situation with movnti and different ways of getting the CPU to sync its WC buffer to video RAM after each store (or perhaps occasionally in a timer interrupt). But this is not realistic if the real-mode bootloader situation isn't just storing to DRAM, but instead triggering an SMI.
On WB memory, flushing movnti stores with a lock xor byte [esp], 0 is somewhat faster than flushing with clflushopt. But #Eldan reports no speed improvement for those on VGA memory after programming an MTRR to make it WC. (And the same speed as for the original doing normal stores, indicating that by default the VGA framebuffer was UC. Some older BIOSes had an option to make VGA memory WC, which they called USWC = Uncached Speculative Write Combining.)
It's not a real-world problem so I'm not looking for actual workarounds; although it would be interesting to know if manually storing pixel bytes into a VGA graphics mode could be much faster.
Summary
Do any / all real modern systems trigger an SMI on every store to the text-mode framebuffer?
If no, can we approximate a WC store+clflush to the framebuffer, using a movnti + something in user-space on WB memory? So we can easily profile with perf for performance counters.
If different BIOSes and/or hardware use different strategies, what are those strategies? (I don't want details, just a high level like "SMI every vblank to sync the VGA framebuffer to the actual hardware framebuffer")
Would a PCIe or PCI video card with hardware VGA textmode be faster than whatever integrated GPUs actually do? I'm guessing an actual PCIe write transaction would be slower than waiting for a store to hit DRAM, but that a PCIe write would be cheaper than an SMI on every store. A ballpark / order of magnitude comparison would be interesting.
These questions are all highly related, but I can split this up if there isn't as much overlap as I expect.
Do any / all real modern systems trigger an SMI on every store to the text-mode framebuffer?
For video cards, I very much doubt it. Video card manufacturers have had the "get pixel data from char+attribute" logic built into hardware since the 1980s (it predates VGA and hasn't changed much since CGA), and just cut&paste that logic into each newer design without caring much about it.
For things that are not video cards at all (e.g. remote system management tools using LAN) I don't know but suspect not (often they use a special management CPU rather than the main CPU/s so that it works even if the computer is turned "off").
If no, can we approximate a WC store+clflush to the framebuffer, using a movnti + something in user-space on WB memory?
If you're not in user-space, you can change MTRRs (on all CPUs - MTRRs must match and there's a special sequence involved) to make an area of RAM "uncached"; or use PAT in the page tables (much easier than messing with MTRRs, especially if you're using paging anyway, but slightly different behavior due to still needing cache coherency). If you are in user-space then you will have to rely on whatever the OS/kernel provides, and (depending on which OS it is) the OS/kernel may not provide any way to do this at all.
However; even if you find a way to make (an area of) RAM uncached it still won't be very similar, because you'll be writing directly to something attached to a memory controller built into the CPU (which the CPU can write to extremely quickly) instead of talking to something at the other end of a PCI link (which will have higher latency and lower bandwidth from the CPU's side). Even for integrated video (where it's technically the same RAM chips in the end) writes to VRAM go through a very different path (subject to remapping/GART/paging in the video card, affected by a "write mode" VGA register, affected by bit/plane mask VGA registers, etc).
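For what it's worth, the user-space approximation the question asks about can be sketched with intrinsics on an ordinary BSS page standing in for the framebuffer. The iteration count and the char+attribute value are illustrative; build with something like -O2 -msse2 -mclflushopt and profile with perf stat. Swapping the clflushopt line for a dummy lock-prefixed store reproduces the other flushing strategy mentioned in the question:

/* NT store (movnti) into a fake framebuffer page, then force it out of
 * the write-combining buffer with clflushopt + sfence. */
#include <immintrin.h>
#include <stdint.h>

static char fake_vram[4096] __attribute__((aligned(64)));   /* pretend 0xB8000 page */

int main(void)
{
    int *cell = (int *)&fake_vram[0];
    uint32_t i;

    for (i = 0; i < 100000000u; i++) {
        /* movnti: store the ASCII digit with attribute 0x07 */
        _mm_stream_si32(cell, 0x0700 | ('0' + (i % 10)));
        /* flush the line, as a stand-in for the cost of a real UC/WC store */
        _mm_clflushopt(cell);
        _mm_sfence();
    }
    return 0;
}

As the answer above explains, this still only models the CPU side; it can't reproduce the latency of the PCI/DMI hop or any VGA write logic, let alone an SMI.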
Would a PCIe or PCI video card with hardware VGA textmode be faster than whatever integrated GPUs actually do?
For writes from CPU to VRAM; typically integrated video is significantly faster than discrete cards (at least for plain writes from CPU to linear frame buffers where none of the VGA's "write logic" is involved).
For extremely rough ballpark estimates; I'd expect a single write to RAM to be around 150 cycles and a single write to PCI to be close to 1000 cycles. For SMI I'd expect a few hundred cycles of latency before the SMI arrives at the CPU, then the cost of a CPU pipeline flush, then about 500 cycles to save the CPU's state (and the same loading state on the return path); then the firmware's code would have to find the cause of the SMI (another few hundred cycles?) before it could know it was a write to VRAM and not something else; then it'd have to examine the saved CPU state and find and decode the instruction that made the write (because it can't know what data was being written, whether it was a byte/word/dword write, etc.) while taking into account previous CPU state (which mode the CPU was in, code size, etc.) and keeping track of how emulating the instruction affects the future CPU state (advancing RIP, etc. - don't forget that they'll be emulating every instruction that can cause a write, including things like XADD, etc.). Next it would have to analyze the state of the (emulated) VGA registers (write mode, write mask, plane enable, whatever controls which 64 KiB bank is mapped into the legacy area, font height, ...). Basically; for SMI emulation of a write to the text-mode frame buffer, I'd expect it to take tens of thousands of cycles before the firmware's code overlooks a minor but important detail buried among a huge amount of complexity, causing it to do the wrong thing and be unusably broken.
Other Notes
I found Phoenix BIOS's patent US20120159520 from 2011, Emulating legacy video using uefi.
I doubt this was ever implemented, because I doubt it can ever work. There are far too many (common and obscure) things you can do with the legacy interfaces (e.g. detect vertical refresh, set up non-standard video modes like "mode X", fiddle with "display start" to implement smooth scrolling and/or page flipping, use "CRTC info" in VBE to alter video timings, etc.) that aren't supported by UEFI and can't be done via a third-party video driver for UEFI.
Instead, video card manufacturers didn't bother providing UEFI drivers for about 10 years and UEFI firmware used the legacy interface to emulate UEFI services (often breaking secure boot while they were at it); until almost everything was UEFI anyway.
I assume it (SMM) is used for VGA I/O ports for mode-setting.
I assume not. The only thing vaguely related to video that I'd suspect SMM may be used for is controlling the brightness of the screen's backlight in laptops (especially for older laptops, and especially for "lid open/close events") during early boot (before OS takes over).
.. leaving out HW support for text mode seems like something vendors might want to do
I still believe that the (eventual, after the already too long "hybrid BIOS+UEFI" transition phase) removal of 30+ years of accumulated legacy mess (A20, VGA, PS/2, PIT, PIC, ...) from hardware is one of the main reasons hardware manufacturers (Intel) are/have been pushing for UEFI adoption.
Reading through various modern Intel CPU and Platform Controller Hub (PCH) datasheets, it doesn't appear that the necessary hardware is implemented. There doesn't seem to be any way to generate an SMI (System Management Interrupt) in response to processor accesses of the VGA frame buffer (physical addresses 0xA0000 - 0xBFFFF).
The memory controller in the CPU will route accesses to the VGA frame buffer to the integrated graphics controller, the PCI Express port connected directly to the CPU, or the DMI interface connecting the CPU to the PCH. While it's possible to route parts of the VGA frame buffer separately, this appears only meant to support a separate MDA (Monochrome Display Adapter) device. The integrated graphics controller is not well documented, so it's possible that it can be configured to generate an SMI on VGA frame buffer accesses, but this seems unlikely. In any case, it wouldn't work with discrete graphics.
Intel PCHs also don't seem to have any support for generating SMIs in response to VGA frame buffer accesses. This would be the most natural place for it, as the PCH already has support for generating SMIs in response to I/O accesses to the keyboard controller, IDE controller and other legacy devices. It's possible that there's some undocumented feature that does this, but it's not included in the lists of possible SMI sources given in the PCH datasheets.
Theoretically, it would be possible for a motherboard manufacturer to connect a fake VGA device to the PCH through a PCI Express port and then generate SMIs using a PCH GPIO pin. However, I'm not sure this would work in practice. By the time the CPU gets the SMI it could have moved on to executing other instructions, and it wouldn't be possible to examine the CPU state at the time of the frame buffer access.
(A similar problem happened with SoundBlaster 16 emulation on the SoundBlaster Live. It would generate a PCI SERR# when the legacy SoundBlaster ports were accessed, which would generate an NMI on the CPU. Unfortunately, the emulation would break on many Pentium 4 motherboards because the NMI would arrive on the next or a subsequent instruction.)

How does x86 assign interrupt numbers for PCI devices in Linux?

My understanding is that the BIOS or EFI detects the hardware during boot-up and determines the interrupt number, then passes it to Linux once the kernel is up and running. And based on my research, the lower the interrupt number, the higher its priority.
My question is: how does the BIOS/EFI decide which hardware should have higher priority than another? Is it something that is configurable, or is it hardcoded by the BIOS/EFI?
Kind of.
When using the legacy 8259A PIC chip, one of the priority modes is based on the IRQ number, with lower IRQs having higher priority.
However, with the IO APIC and MSI(-X), the IRQ priority is handled in the LAPIC and is configurable by the OS.
For the legacy scenario, these devices have fixed IRQs (not configurable).
The priority was assigned so that important/frequent tasks could interrupt less important/frequent ones.
Today those devices are emulated and their IRQs can be reassigned if needed (in some cases; it depends on the chipset/Super I/O/embedded controller), but that could cause compatibility issues.
So every device that impersonates a legacy one (e.g. an HDD) is usually assigned its legacy IRQ number.
A different topic is the PCI interrupts (PCIe deprecated the INTx# lines in favour of MSI) for non-legacy devices (e.g. a NIC).
Those were (are) the real programmable IRQs: each PCI-to-PCI bridge remaps its four PIRQA-PIRQD input pins to its four INTA#-INTD# output pins (which are connected to the bridge's parent PIRQA-PIRQD pins in a tangled fashion).
The Host-to-PCI-bridge INTA#-INTD# connects (conceptually) to the 8259A and the IO-APIC.
The mapping is configurable with some chipset registers (e.g. see Chapter 29 of the Intel Series 200 PCH datasheet Volume 2).
So the firmware is free to remap at least the PCI interrupts for non-legacy devices. I think the algorithm used is simply to assign the lowest free IRQ to the most "important" device.
However, as said above, as soon as the OS switches away from 8259A mode these priorities stop mattering.
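As an aside, you can see which IRQ the firmware/kernel ended up assigning to a given PCI device from user space by reading its irq attribute in sysfs; the device address below is just an example:

/* Print the IRQ assigned to one PCI device (example bus/device/function). */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/bus/pci/devices/0000:00:1f.3/irq", "r");
    int irq;

    if (f && fscanf(f, "%d", &irq) == 1)
        printf("assigned IRQ: %d\n", irq);
    if (f)
        fclose(f);
    return 0;
}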

How to do an edge counter/clock detector for a Raspberry Pi (as a Linux kernel driver)?

Hi,
I connected one of my Raspberry Pi pins to a voltage-controlled oscillator. A change in one of the oscillator's capacitances changes the output frequency. My question is how to realize the frequency detection.
There are some constraints:
1st.) I'd like to keep the CPU consumption low.
2nd.) I'd like to have a file-based interface which my user-space app can query for the current frequency.
I was checking the Linux GPIO driver, but I see no way to configure a pin as an edge counter.
The Raspberry Pi SoC datasheet describes, on page 86 (chapter 6, "General Purpose I/O (GPIO)"), registers to configure a GPIO to detect edges.
Any ideas how to configure these registers in a kernel character device driver?
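One way to do it without poking those registers directly is to let the kernel's GPIO/IRQ infrastructure deliver an interrupt per edge and derive the frequency with a periodic timer, exposing the result through a sysfs file (standing in for the requested file-based interface). A minimal module sketch follows; the GPIO number, sysfs name and 1-second sampling window are illustrative, and one interrupt per edge only keeps CPU load low for modest frequencies - above a few tens of kHz you'd want the SoC's hardware edge-detect/counter support instead:

/* Count rising edges in an IRQ handler; a 1 s timer converts the count
 * to Hz; a sysfs file exposes the latest value. */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/gpio.h>
#include <linux/interrupt.h>
#include <linux/timer.h>
#include <linux/atomic.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>

#define VCO_GPIO 17                       /* example BCM GPIO number */

static atomic_t edge_count = ATOMIC_INIT(0);
static unsigned int freq_hz;              /* last computed frequency */
static struct timer_list sample_timer;
static struct kobject *vco_kobj;
static int vco_irq;

static irqreturn_t vco_edge_isr(int irq, void *dev_id)
{
    atomic_inc(&edge_count);              /* one rising edge per input cycle */
    return IRQ_HANDLED;
}

static void sample_timer_fn(struct timer_list *t)
{
    /* edges counted over the last second == frequency in Hz */
    freq_hz = atomic_xchg(&edge_count, 0);
    mod_timer(&sample_timer, jiffies + HZ);
}

static ssize_t frequency_show(struct kobject *kobj,
                              struct kobj_attribute *attr, char *buf)
{
    return sprintf(buf, "%u\n", freq_hz);
}
static struct kobj_attribute freq_attr = __ATTR_RO(frequency);

static int __init vco_init(void)
{
    int ret = gpio_request(VCO_GPIO, "vco-in");

    if (ret)
        return ret;
    gpio_direction_input(VCO_GPIO);

    vco_irq = gpio_to_irq(VCO_GPIO);
    ret = request_irq(vco_irq, vco_edge_isr, IRQF_TRIGGER_RISING,
                      "vco-edge", NULL);
    if (ret)
        goto err_gpio;

    vco_kobj = kobject_create_and_add("vco_freq", kernel_kobj);
    if (!vco_kobj) {
        ret = -ENOMEM;
        goto err_irq;
    }
    ret = sysfs_create_file(vco_kobj, &freq_attr.attr);
    if (ret)
        goto err_kobj;

    timer_setup(&sample_timer, sample_timer_fn, 0);
    mod_timer(&sample_timer, jiffies + HZ);
    return 0;

err_kobj:
    kobject_put(vco_kobj);
err_irq:
    free_irq(vco_irq, NULL);
err_gpio:
    gpio_free(VCO_GPIO);
    return ret;
}

static void __exit vco_exit(void)
{
    del_timer_sync(&sample_timer);
    free_irq(vco_irq, NULL);
    kobject_put(vco_kobj);
    gpio_free(VCO_GPIO);
}

module_init(vco_init);
module_exit(vco_exit);
MODULE_LICENSE("GPL");

After loading, cat /sys/kernel/vco_freq/frequency returns the edge rate in Hz measured over the last second.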

What is the advantage of using GPIO as IRQ?

I know that we can convert a GPIO into an IRQ, but I want to understand what the advantage of doing so is.
If we need an interrupt, why can't we have an interrupt line only in the first place and use it directly as an interrupt?
What is the advantage of using GPIO as IRQ?
If I get your question, you are asking why even bother having a GPIO? The other answers show that someone may not even want the IRQ feature of an interrupt. Typical GPIO controllers can configure an I/O as either an input or an output.
Many GPIO pads have the flexibility to be open drain. With an open-drain configuration, you may have a bi-directional 'BUS' and data can be both sent and received. Here you need to change from an input to an output. You can imagine this if you bit-bash I2C communications. This type of use may be fine if the I2C is only used to initialize some other interface at boot.
Even if the interface is not bi-directional, you might wish to capture on each edge. Various peripherals use zero crossings and a timer to decode a signal. For example, a laser bar-code reader, a magnetic stripe reader, or a bit-bashed UART might look at the time between zero crossings. Is the time double a bit width? Is the line high or low? Then shift the previous value and add two bits. In these cases you have to look at the signal to see whether the line is high or low. This can happen even when polarity shouldn't matter, as short noise pulses can cause confusion.
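A tiny sketch of that "time between transitions" decode; the bit time and names are illustrative:

/* Classify each gap between edges as one or two bit-times and shift in
 * the level the line held during the gap. */
#include <stdbool.h>
#include <stdint.h>

#define BIT_TIME_US 104u          /* e.g. ~9600 baud, illustrative */

static uint32_t shift_reg;

void on_edge(uint32_t gap_us, bool level_during_gap)
{
    /* gap longer than 1.5 bit-times? treat it as two bits */
    int nbits = (gap_us > (3u * BIT_TIME_US) / 2u) ? 2 : 1;

    while (nbits--)
        shift_reg = (shift_reg << 1) | (level_during_gap ? 1u : 0u);
}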
So even in the case where you have only the input as an interrupt, the current level of the signal is often very useful. If this GPIO interrupt happens to be connected to an Ethernet controller and active high means data is ready, then you don't need the 'I/O' feature. However, this case is using the GPIO interrupt feature as glue logic. Often this signalling will be integrated into a dedicated module. The case where you only need the interrupt is typically some custom hardware that detects a signal (case open, power disconnect, etc.) which is not industry standard.
The ARM SoC vendor has no idea which of the cases above the OEM might use. The SoC vendor provides lots of flexibility, as the transistors on the die are cheap compared to the wire bonds/pins on the package. It means that you, who only use the interrupt feature, get economies of scale (and a cheaper part) because others might be using these features, and the ARM SoC vendor gets to distribute the NRE cost among more people.
In a perfect world, there would maybe be no need for this. Not so long ago, when transistors were more expensive, some lines behaved only as interrupts (some M68k CPUs have this). Historically the ARM has only a single interrupt line with one common routine (the Cortex-M is different), so the interrupt source has to be determined by reading another register. As the hardware needs to capture the state of the line on the ARM anyway, it is almost free to add the 'input controller' portion.
Also, for this reason, all of the ARM Linux GPIO drivers have a macro to convert from a GPIO pin to an interrupt number, as they are usually mapped one-to-one. There is usually a single 'GIC' interrupt for the GPIO controller, and a 'GPIO' interrupt controller which forms a tree of interrupt controllers with the GIC as the root. Typically, the GPIO IRQ numbers are max GIC IRQ + port * 32 + pin, so the GPIO IRQ numbers are just appended to the 'GIC' IRQ numbers.
If you were designing a bespoke ASIC for one specific system you could indeed do precisely that - only implement exactly what you need.
However, most processors/SoCs are produced as commodity products, so more flexibility allows them to be integrated in a wider variety of systems (and thus sell more). Given modern silicon processes, chip size tends to be constrained by the physical packaging, so pin count is at an absolute premium. Therefore, allowing pins to double up as either I/O or interrupt sources depending on the needs of the user offers more functionality in a given space, or the same functionality in less space, depending on which way you look at it.
It is not about "converting" anything - on a typical processor or microcontroller, a number of peripherals are connected to an interrupt controller; GPIO is just one of those peripherals. It is also by no means universally true; different devices have different capabilities, but in any case you are simply configuring a GPIO pin to generate an interrupt - that's a normal function of the GPIO not a "conversion".
Prior to ARM Cortex, ARM did not define an interrupt controller, and the core itself had only two interrupt sources (IRQ and FIQ). A vendor-defined interrupt controller was required to multiplex the single IRQ over multiple peripherals. ARM Cortex defines an interrupt controller and a more flexible interrupt architecture; it is possible to achieve zero-latency interrupt handling from a GPIO, so there is no real advantage to a dedicated interrupt line. Using one might even require adding external signal-conditioning circuitry that is often already incorporated in the GPIO on the die.
