FPGA to DMA to RDMA - fpga

I am trying to send data generated from my FPGA card out to an IB device. I want the latency to be as low as possible, so I am thinking this may be the data path.
FPGA --> DMA via scatter/gather DMA into Memory Buffer --> RDMA into a ConnectX-6 card --> IB cable --> my other device.
With this potential solution, I have a bunch of unknowns that I can't seem to find answers to on the internet, and I was hoping someone could assist:
Is this possible/viable? I have never worked with DMA and RDMA and want to make sure it can work before purchasing. I fear it may be a one-or-the-other situation where you can't do both, or where doing both will somehow add latency or lose data.
Ideally, I want it to reach the other device's CPU (I just want it to avoid the host device's CPU), but it seems like RDMA makes it avoid both CPUs? Would it then just be DMA to my ConnectX card? I've been searching the datasheets/manuals/firmware/support to see if the ConnectX cards can support DMA, but it doesn't seem to be possible? They just support RDMA (which is a subset of DMA).
Any information/guidance would be appreciated. If I am in the wrong group, let me know. I wasn't sure if it belonged here or in the electrical engineering one (there seemed to be more DMA/RDMA questions in here)

Related

esp32 EEPROM read/write cycle

I am using an ESP32 module for BLE & WiFi functionality, and I am writing data to the EEPROM of the ESP32 module every 2 seconds.
How many read/write cycles does the ESP32 module allow according to its specifications? I need this to calculate the EEPROM lifetime and the number of readings (at a given frequency) I can store.
The ESP32 doesn’t have an actual EEPROM; instead it uses some of its flash storage to mimic an EEPROM. The specs will depend on the specific SPI flash chip, but they’re likely to be closer to 10,000 cycles than 100,000. Writing to it every couple of seconds will likely wear it out pretty quickly - it’s not a good design choice, especially if you keep rewriting the same location.
I'm very late here, but an SD card seems like the ideal option for you. If you want to save just a few bytes, you can use FeRAM (also called FRAM). It's a cross between RAM and ROM: it's fast, and the data stays on it after power off. It is pretty expensive, so you might want to go with the SD card or web server option. I just wanted to tell you that this exists; I have only known about it for a few months myself.
At that write rate, even an automotive-grade EEPROM like the 24LC001, which supports at least 1,000,000 writes, will only last about 23 days (1,000,000 writes x 2 s = 2,000,000 s) if you keep rewriting the same location!
I think Microchip has EERAM, which supports infinite writes and will not lose its contents on power loss.
Check Microchip's 47L series.
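To put numbers on the endurance figures mentioned in the answers above, here is a minimal sketch of the lifetime arithmetic. It assumes every write hits the same location unless `slots` is raised to model rotating writes evenly over several locations; the function name and the `slots` parameter are illustrative, not part of any ESP32 API.

```python
SECONDS_PER_DAY = 86_400


def lifetime_days(endurance_cycles: int, write_interval_s: float, slots: int = 1) -> float:
    """Days until wear-out when writes rotate evenly over `slots` locations.

    Assumption: each location tolerates `endurance_cycles` writes and
    one write is issued every `write_interval_s` seconds.
    """
    total_writes = endurance_cycles * slots
    return total_writes * write_interval_s / SECONDS_PER_DAY


if __name__ == "__main__":
    # Endurance figures quoted in the thread; interval is one write per 2 s.
    for label, cycles in [
        ("flash, 10,000 cycles", 10_000),
        ("flash, 100,000 cycles", 100_000),
        ("EEPROM, 1,000,000 cycles", 1_000_000),
    ]:
        print(f"{label}: {lifetime_days(cycles, 2.0):.2f} days")
```

Spreading the writes over more locations, or writing less often, scales the lifetime linearly, which is why batching readings in RAM before committing them is usually the first thing to try.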

Connect stack of Parallella boards and a rPI via FPGA and I/O pins

I want to connect my Pi and Parallella stack such that the Pi does the GPU side; this is to be controlled by a third Parallella.
I think the best way to do this is through an FPGA. Is this possible and a good way to do it?
Also what structure should I use and how should I start to implement it?
I know little VHDL and Verilog and do not want to use paid software.
I am eager to learn and have a lot of time to do it though so no "simple but bad solutions".
I will upload the project to Git when done.
The solution depends on the bandwidth and latency requirements. You are right that FPGA provides the largest bandwidth and lowest latency. However, do you really need such good performance? Maybe USB or Ethernet connections are good enough.
For the FPGA solution, consider the secondary pi and parallella as two peripherals for the primary pi, and assign different address spaces for them. The communications among three devices are based on polling initiated by the primary pi. FPGA should pass the signaling on data/address bus to the two peripherals with compatible I/O timing. Peripherals consider the FPGA as a RAM, and should listen to any data/controls with their best effort. FPGA should buffer the data/control signals if peripherals cannot respond in real-time.
Overall, it's very tough work. I'd like to see the source code if the FPGA solution works.

(libusb) Confusion about continuous isochronous USB streams

I am using a 32-bit AVR microcontroller (AT32UC3A3256) with High speed USB support. I want to stream data regularly from my PC to the device (without acknowledge of data), so exactly like a USB audio interface, except the data I want to send isn't audio. Such an interface is described here: http://www.edn.com/design/consumer/4376143/Fundamentals-of-USB-Audio.
I am a bit confused about USB isochronous transfers. I understand how a single transfer works, but how and when is the next subsequent transfer planned? I want a continuous stream of data that is calculated a little ahead of time, but streamed with minimum latency and without interruptions (except some occasional data loss). From my understanding, Windows is not a realtime OS so I think the transfers should not be planned with a timer every x milliseconds, but rather using interrupts/events? Or maybe a buffer needs to be filled continuously with as much data as there is available?
I think my question is still about the concepts of USB and not code-related, but if anyone wants to see my code, I am testing and modifying the "USB Vendor Class" example in the ASF framework of Atmel Studio, which contains the firmware source for the AVR and the source for the Windows EXE as well. The Windows example program uses libusb with a supplied driver.
Stephen -
You say "exactly like USB Audio"; but beware! The USB Audio class is very, very complicated because it implements a closed-loop servo system to establish long-term synchronisation between the PC and the audio device. You probably don't need all of that in your application.
To explain a bit more about long-term synchronisation: The audio codec at one end (e.g. the USB headphones) may run at a nominal 48 kHz sampling rate, and the audio file at the other end (e.g. the PC) may be designed to offer 48 thousand samples per second, but the PC and the headphones are never going to run at exactly the same speed. Sooner or later there is going to be a buffer overrun or under-run. So the USB audio class implements a control pipe as well as the audio pipe(s). The control pipe is used to negotiate a slight speed-up or slow-down at one end, usually the Device end (e.g. headphones), to avoid data loss. That's why the USB descriptors for audio device class products are so incredibly complex.
If your application can tolerate a slight error in the speed at which data is delivered to the AVR from the PC, you can dispense with the closed-loop servo. That makes things much, much simpler.
You are absolutely right in assuming the need for long-term buffering when streaming data using isochronous pipes. A single isochronous transfer is pointless - you may as well use a bulk pipe for that. The whole reason for isochronous pipes is to handle data streaming. So a lot of look-ahead buffering has to be set up, just as you say.
I use LibUsbK for my iso transfers in product-specific applications which do not fit any preconceived USB classes. There is reasonably good documentation at libusbk for iso transfers. In short - you decide how many bytes per packet and how many packets per transfer. You decide how many buffers to pre-fill (I use five), and offer the libusbk driver the whole lot to start things going. Then you get callbacks as each of those buffers gets emptied by the driver, so you can fill them with new data. It works well for me, even though I have awkward sampling rates to deal with. In my case I set up a bunch of twenty-one packets where twenty of them carry 40 bytes and the twenty-first carries 44 bytes!
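The packet layout and the pre-fill/refill pattern described above can be sketched in plain Python. This is a simulation only and does not call libusbK; the 4-byte frame size is an assumption chosen because it reproduces the 20 x 40 + 1 x 44 byte layout, and all function names are hypothetical.

```python
from collections import deque


def packet_layout(frames_per_transfer: int, packets_per_transfer: int, frame_bytes: int) -> list[int]:
    """Split a transfer into packets, giving leftover frames to the last packets."""
    base, extra = divmod(frames_per_transfer, packets_per_transfer)
    sizes = [base * frame_bytes] * packets_per_transfer
    for i in range(packets_per_transfer - extra, packets_per_transfer):
        sizes[i] += frame_bytes  # one extra frame in each of the last `extra` packets
    return sizes


def stream(total_transfers: int, prefill: int = 5) -> list[int]:
    """Simulate queuing `prefill` buffers up front, then refilling on each completion."""
    in_flight: deque[int] = deque()
    next_id = 0
    while next_id < min(prefill, total_transfers):
        in_flight.append(next_id)  # buffers handed to the driver before streaming starts
        next_id += 1
    completed = []
    while in_flight:
        done = in_flight.popleft()  # stands in for the "buffer emptied" callback
        completed.append(done)
        if next_id < total_transfers:
            in_flight.append(next_id)  # refill and resubmit so the queue stays full
            next_id += 1
    return completed


if __name__ == "__main__":
    layout = packet_layout(frames_per_transfer=211, packets_per_transfer=21, frame_bytes=4)
    print(layout, sum(layout))
    print(stream(total_transfers=8))
```

The point of keeping several buffers queued is that the host scheduler always has data ready for the next frame, even if the application's callback runs late.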
Hope that helps
- Tony

Questions on how network cards in Windows work

I am trying to figure out how network cards work in Windows, and how the data is being relayed.
I have two hypotheses.
1.
Data is received by the network card.
The card then puts the data in an internal buffer, possibly a double buffer or a ring buffer.
The card accumulates data until some amount has been reached, upon which it sends an interrupt.
Windows copies the data from the card to the RAM and notifies appropriate handlers.
2.
Data is received.
The card puts the data in the RAM using DMA. (Does DMA guarantee that data will not be lost, or does the card still need its own buffer?)
The card fires an interrupt upon putting enough data in the RAM.
Windows receives the interrupt and copies or exposes the data to appropriate handlers.
Is either of my hypotheses correct?
Is there any message from the card or Windows if buffers are full?
In the Windows system properties for my Ethernet controller I can see properties called "Receive buffers" and "Transmit buffers"; both are set to 256.
What does this mean?
Is there any good literature on this subject? (I have Tanenbaum's Modern Operating Systems, but it is not specifically related to Windows.)
Your question subsumes (at least!) three very, very broad topics:
1) How does a Layer 2 (Data Link) hardware device work?
2) How does it relate to the operating system's network stack?
... and ...
3) How does it relate to the operating system's kernel-level device driver?
The next link is actually 180 degrees opposite your original question (the API is relatively high level, your question pertains to the lowest software levels), but it wouldn't hurt to look at the .Net API for perspective "how things work":
http://msdn.microsoft.com/en-us/library/4as0wz7t.aspx
'Hope that helps ... at least a little bit...
PS:
Linux is a wealth of information about implementing a network stack: all of the kernel source and all of the device drivers are completely available, and very well documented.

What happens when we plug a piece of hardware into a computer system?

When we plug a piece of hardware into a computer system, say a NIC (Network Interface Card) or a sound card, what happens under the hood so that we can use that piece of hardware?
I can think of the following 2 scenarios, correct me if I am wrong.
If the hardware has its own memory chips, someone will arrange for a range of address space to map to those memory chips.
If the hardware doesn't have its own memory chips, someone will allocate a range of addresses in the main memory of the computer system to accommodate that hardware.
I am not sure whether the aforementioned someone is the operating system or the CPU.
And another question: Does hardware always need some memory to work?
Am I right on this?
Many thanks.
The world is not that easily defined.
First off, look at the hardware and what it does. Take a mouse, for example: it is trying to deliver x and y coordinate changes and button status. That can be as little as a few bytes, or even a single byte where two bits define what the other six mean (update x, update y, update buttons, that kind of thing), and the memory requirement is just enough to hold those bytes. With a serial mouse there is already at least one byte of storage in the serial port, so do you need any more? USB is another story: just speaking USB back and forth takes memory for the messages, but that memory can be in the USB logic, so do you need any more for such a small amount of information?
NICs and sound cards are another category and more interesting. For NICs you have packets of data coming and going, and you need some buffer space (ring, FIFO, etc.) to allow multiple packets to be in flight in both directions, for efficiency, to cover interrupt latency, and the like. You also need registers; these have their storage in the hardware/logic itself and won't need main memory. In both the sound card case and the NIC case you can either have memory on the board with the hardware or have it use system memory that it can access semi-directly (DMA, etc.). Sound cards are similar but different in that you can think of the packets as being fixed-size and continuous. Basically you need to ping-pong buffers to or from the card at some rate. 44.1 kHz, 16 bits per sample, stereo is 44100 * 2 * 2 = 176400 bytes per second. Say, for example, the driver/software prepares the next 8192 bytes at a time: while the hardware is playing the pong buffer, software is filling the ping buffer; when the hardware drains the pong buffer it indicates this to the software and starts draining the ping buffer, and the software refills the pong buffer.
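As a rough check on the audio numbers above, this sketch computes the byte rate and how long one 8192-byte buffer lasts, then lists which buffer the hardware drains and which one software fills in each period. The function names are illustrative, and the schedule is a simplified model rather than any particular driver's behaviour.

```python
def byte_rate(sample_rate_hz: int, bytes_per_sample: int, channels: int) -> int:
    """Bytes per second of uncompressed PCM audio."""
    return sample_rate_hz * bytes_per_sample * channels


def buffer_duration_ms(buffer_bytes: int, rate_bytes_per_s: int) -> float:
    """How long one buffer keeps the hardware busy, in milliseconds."""
    return 1000.0 * buffer_bytes / rate_bytes_per_s


def ping_pong_schedule(periods: int) -> list[tuple[str, str]]:
    """(buffer the hardware drains, buffer software fills) for each period."""
    names = ("pong", "ping")
    return [(names[i % 2], names[(i + 1) % 2]) for i in range(periods)]


if __name__ == "__main__":
    rate = byte_rate(44_100, 2, 2)
    print(rate)                                  # 176400
    print(f"{buffer_duration_ms(8192, rate):.1f} ms per buffer")
    for hw, sw in ping_pong_schedule(4):
        print(f"hardware drains {hw}, software fills {sw}")
```

Each buffer lasts roughly 46 ms, so that is the deadline the software has to refill the idle buffer before the hardware needs it.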
All interesting stuff, but to get to the point: with the NIC or sound card you could have as little as two registers, an address/command register and a data register. Quite painful, but it was often used in the old days in restricted systems, and it is still used. Or you could go to the other extreme and have all of the memory on the device mapped into the system's address space, with each register having its own unique address. With audio you don't really need random access to the memory, so you don't really need this; with graphics you do. With NICs you could argue either way: do you leave the packet on the NIC, or do you make a copy in system memory, where you can have a much larger software buffer/ring, freeing the hardware's limited buffer/ring? If the packet stays on the NIC then you would want random access; if not, you don't.
For ISA/PCI/PCIe, etc. on x86 systems the hardware is usually mapped directly into the processor's memory space. On a 32-bit system you can address up to 4GB, but even if you have 4GB worth of memory, some of it you cannot get to because video cards, hardware registers, PCI, etc. consume some of that address space (registers or memory or both, whatever the hardware was designed to use). As distasteful as it may appear today, this is why there was a distinction between I/O-mapped I/O and memory-mapped I/O on x86 systems; it's another address bit, if you will. You could have all of your registers in I/O space and not lose memory space, and map memory into nice, neat, aligned chunks, requiring less of your RAM to be replaced with hardware. Either way, ISA had basically vendor-specific ways of mapping into the memory space available to the ISA bus: jumpers, interesting detection schemes with programmable address decoders, etc. PCI and its successors came up with something more standard. When the computer boots (talking x86 machines in general now) the BIOS goes out on the PCIe bus and looks to see who is out there by talking to config space, which is mapped per card in a known place. Using a known protocol, the cards indicate the amount of memory they require; the BIOS then allocates chunks out of the processor's flat memory space for each device and tells the device what address and how much it has been allocated. It is certainly possible for the operating system to redo or override this, but typically the BIOS does this discovery for the system, and the operating system simply reads the config space on each device, which includes the vendor ID and device ID, and then knows how and where to talk to the device. For this memory space I believe the hardware contains the memory/registers.
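The allocation step described above can be modelled with a toy allocator. This is a simplified sketch, not the algorithm any real BIOS uses: it assumes each requested region is a power of two in size and must be aligned to its own size, and the device names, sizes, and base address are made up for illustration.

```python
def assign_regions(requests: dict[str, int], base: int) -> dict[str, int]:
    """Assign each device an address aligned to its (power-of-two) region size.

    Largest regions are placed first so that alignment wastes less address space.
    """
    assigned = {}
    addr = base
    for name, size in sorted(requests.items(), key=lambda kv: -kv[1]):
        if size <= 0 or size & (size - 1):
            raise ValueError(f"{name}: size {size:#x} is not a power of two")
        addr = (addr + size - 1) & ~(size - 1)  # round up to the region's alignment
        assigned[name] = addr
        addr += size
    return assigned


if __name__ == "__main__":
    # Hypothetical devices and the amount of address space each one asks for.
    wanted = {"nic": 0x20000, "sound": 0x4000, "video": 0x1000000}
    for name, address in assign_regions(wanted, base=0xE0000000).items():
        print(f"{name}: {address:#010x}")
```

After this step the firmware would write each assigned address back to the device, and the operating system later reads those values to find out where each device lives.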
For general system memory to DMA to/from, I believe the operating system and device drivers have to provide the mechanism for allocating that system memory and then telling the hardware what address to DMA to/from.
The x86 way of doing it, with the BIOS handling the ugly details and system memory address space and PCI address space being the same address space, has its pros and cons. A pro is that the hardware can easily DMA to/from system memory because it does not have to know how to get from PCIe address space to system address space. The con is the case of a 32-bit system, where PCIe normally consumes up to 1GB of address space and the DRAM you bought for that hole is not available. The transition from 32-bit to 64-bit is slow and painful: the BIOSes and PCIe chips are still limited to the lower 4GB, and to 1GB for all the PCIe devices, even if the chipset has a 64-bit mode, and this is with 64-bit processors and more than 4GB of RAM. The MMU allows for fragmented memory, so that is not an issue. Slowly the chipsets and BIOSes are catching up, but it is taking time.
USB: these are serial, mostly master/slave protocols. Like a serial port but bigger, faster, and more complicated, and like a serial port both the master and slave hardware need to have RAM to store the messages, very much like a NIC. Like a NIC, in theory, you can be register-based and pull the memory sequentially, or have it mapped into system memory and have random access to it, etc. Think of it this way: the USB interface can/does sit on a PCIe interface even if it is on the motherboard. A number of devices on your motherboard are PCIe devices even if they are not on an actual PCIe connector with a card. And they fall into the PCIe category of how you might design your interface or who has what memory where.
Some devices, like video cards, have lots of memory on board, more than is practical (or at least painless) to map into PCIe memory space all at once. These would want to use a sliding-window type of arrangement. Tell the video card you want to look at address 0x0000 in the video card's address space, but your window may only be 0x1000 bytes (for example) in system/PCIe space. When you want to look at addresses 0x1000 to 0x1FFF in video memory space, you write some register to move the window, and then the same PCIe memory space accesses different memory on the video card.
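A small simulation may make the sliding window concrete. It is a sketch under stated assumptions: the card's memory is a plain byte array, the window is 0x1000 bytes as in the example above, and `window_base` plays the role of the register that moves the window. The class and helper names are hypothetical.

```python
WINDOW_SIZE = 0x1000  # bytes of card memory visible to the host at any one time


class WindowedCard:
    """Toy model of a card whose memory is reached through a movable window."""

    def __init__(self, mem_size: int) -> None:
        self.mem = bytearray(mem_size)
        self.window_base = 0  # the "window register"

    def set_window(self, base: int) -> None:
        if base % WINDOW_SIZE or not 0 <= base <= len(self.mem) - WINDOW_SIZE:
            raise ValueError(f"bad window base {base:#x}")
        self.window_base = base

    def read(self, offset: int) -> int:
        if not 0 <= offset < WINDOW_SIZE:
            raise IndexError("offset outside the window")
        return self.mem[self.window_base + offset]

    def write(self, offset: int, value: int) -> None:
        if not 0 <= offset < WINDOW_SIZE:
            raise IndexError("offset outside the window")
        self.mem[self.window_base + offset] = value


def host_write(card: WindowedCard, card_addr: int, value: int) -> None:
    """Write to any card address by moving the window first when needed."""
    base = card_addr & ~(WINDOW_SIZE - 1)
    if card.window_base != base:
        card.set_window(base)
    card.write(card_addr - base, value)


if __name__ == "__main__":
    card = WindowedCard(0x4000)
    host_write(card, 0x1234, 0xAB)  # moves the window to 0x1000, writes offset 0x234
    print(hex(card.window_base), hex(card.read(0x234)))
```

The same host-side offset (0x234 here) reaches different card memory depending on where the window currently points, which is the whole trick.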
x86, being the dominant architecture, has this overlapped PCIe and system memory addressing, but that is not how the whole world works. Other solutions include having independent system and PCIe address spaces, with sliding windows like the video card problem above, allowing you to have, say, a 2GB video card mapped flat in PCIe space while limiting the window into PCIe space to something not painful for the host system.
Hardware designs are as varied as software designs. Take 100 software engineers, give them a specification, and you may get as many as 100 different solutions. Same with hardware: give them a specification and you may get 100 different PCIe designs. Some standards are in place to limit that, as does cloning: where you want to make a Sound Blaster compatible card, you don't change the interface. But given the same freedom software has, the hardware can and will vary, and with the number of types of PCIe devices (sound, hard disk controllers, video, USB, networking, etc.) you will get that many different mixes of registers and addressable memory.
Sorry for the long answer; I hope this helps. I would dig through Linux and/or BSD sources for device drivers, along with programmer's reference manuals if you can get access to them, and see how different hardware designs use register and memory space, which designs are painful for the software folks, and which designs are elegant and well done.
The answer depends on the interface of the hardware: is it over USB or PCI-Express? (There could be other connectivity methods too; USB and PCI-Express are the most common.)
With USB
The host learns about the newly arrived device by reading its descriptors and loads the appropriate device driver. The device will have presented its ID, which is used for Plug and Play. The device is also assigned an address by the host. Once the device driver kicks in, it configures the device and makes it ready for data transfer. The data transfer is done using IRPs; the transfer technique and how the IRPs are loaded depend on whether the transfer is isochronous, bulk, or another mode.
So to answer your second question: yes, the hardware needs some memory to work. The device driver and the USB host controller driver together set up the memory on the host for the USB device; the USB device driver then communicates with and drives the device accordingly.
With PCI-Express
It is similar - sorry I do not have hands on experience with PCI-Express.

Resources