Is there a way for two custom PCIe cards to talk directly to each other when plugged into a server (no switch)? - pci-e

We saw this answer: Direct communication between two PCI devices
It goes a long way towards answering the question, but we wanted to poke at it a bit to see if there's any wiggle room.
We are making custom PCIe cards with custom drivers. The only thing outside our control is the chipset in the server that the cards are plugged into. If we control which two boards we plug in, both are programmed to talk to each other so that "raw" data is fine, and the device drivers are aware of the desire for direct communication, can you see a way to do direct data transfer, even if it takes some creativity?

If it is one root complex with multiple ports, then it's possible to have direct endpoint to endpoint communication.
If it's a multi-rooted root complex, then endpoint-to-endpoint communication is not covered by the PCIe specification, and whether that feature is implemented is vendor-dependent. In many cases you can recognize a multi-rooted PCIe subsystem by bus IDs that aren't consecutively numbered. Whether additional root complexes start their first bus ID at a multiple of e.g. 64 also depends on how the enumeration procedure is implemented.
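The bus-ID hint above can be checked with a few lines of Python. This is only a rough sketch: the function names and the sample addresses are made up for illustration, and a gap in bus numbering is a hint rather than proof, since bridges can also reserve bus ranges for other reasons.

```python
import re

# Matches PCI addresses such as "0000:3a:00.0" (domain:bus:device.function).
# The domain part is optional, as in plain lspci output.
BDF_RE = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)

def bus_numbers(addresses):
    """Return the sorted, de-duplicated bus numbers found in BDF strings."""
    buses = set()
    for addr in addresses:
        m = BDF_RE.match(addr.strip())
        if m:
            buses.add(int(m.group(2), 16))
    return sorted(buses)

def bus_gaps(addresses):
    """Return (previous_bus, next_bus) pairs where numbering is not consecutive."""
    buses = bus_numbers(addresses)
    return [(a, b) for a, b in zip(buses, buses[1:]) if b - a > 1]

# Hypothetical sample: buses 0x00 to 0x02, then a jump to 0x40 and 0x41.
sample = [
    "0000:00:00.0",
    "0000:01:00.0",
    "0000:02:00.0",
    "0000:40:00.0",
    "0000:41:00.0",
]
print(bus_gaps(sample))
```

If both cards sit on buses on the same side of every gap, that is at least consistent with them sharing one root complex; the chipset documentation is still the authority.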

Related

FPGA to DMA to RDMA

I am trying to send data generated from my FPGA card out to an IB device. I want the latency to be as low as possible, so I am thinking this may be the data path.
FPGA --> DMA via scatter/gather DMA into Memory Buffer --> RDMA into a ConnectX-6 card --> IB cable --> my other device.
With this potential solution, I have a bunch of unknowns that I can't seem to find answers to on the internet, and I was hoping someone could assist:
Is this possible/viable? I have never worked with DMA and RDMA and want to make sure it can work before purchasing. I fear it may be a one or the other situation and you can't do both or doing both will cause latency somehow or lost data.
Ideally, I want it to reach the other device's CPU (I just want it to avoid the host device's CPU), but it seems like RDMA makes it avoid both CPUs? Would it then just be DMA to my ConnectX card? I've been searching the datasheets/manuals/firmware/support to see if the ConnectX cards can support DMA, but it doesn't seem to be possible? They just support RDMA (which is a subset of DMA).
Any information/guidance would be appreciated. If I am in the wrong group, let me know. I wasn't sure if it belonged here or in the electrical engineering one (there seemed to be more DMA/RDMA questions in here)

How does a traditional device driver differ from one that supports Device Tree?

How does writing a traditional device driver differ from writing a device driver that supports Device Tree?
In the Linux kernel, before device trees were introduced, the data required by drivers was provided through board files, and there would be a board file for each possible board. This data was called platform data, and the drivers were platform drivers, which are basically drivers for devices that cannot be discovered or enumerated automatically the way USB or PCI devices can.
This approach resulted in a lot of mess and duplication, since much of the data was, or could have been, common between boards. For example, a GPIO controller for a particular SoC is not going to differ between boards or require different information, at least not for primary properties like the interrupt ID.
The device tree approach has a SoC-level dtsi file that is common. All board or module variants inherit it and specify only the differences, such as which additional peripherals to turn on or off.
The functions used to parse or read the two are different. Device trees describe hardware, and the primary difference between the two approaches is how the data is provided and read.
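The inheritance idea can be pictured with a toy model in Python. This is not kernel code and not real device tree syntax: the node names, the "vendor,..." compatible strings, the interrupt numbers and the `apply_overrides` helper are all invented for illustration. It only shows a common SoC-level description being reused, with a board file stating nothing but the differences.

```python
import copy

def apply_overrides(base, overrides):
    """Recursively merge board-level overrides into a copy of the SoC-level tree."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

# Hypothetical SoC-level description (plays the role of the common dtsi file).
soc = {
    "gpio0": {"compatible": "vendor,soc-gpio", "interrupt": 32, "status": "okay"},
    "uart1": {"compatible": "vendor,soc-uart", "interrupt": 45, "status": "disabled"},
}

# Hypothetical board-level description: only the difference is stated.
board = {"uart1": {"status": "okay"}}

tree = apply_overrides(soc, board)
print(tree["uart1"])
```

The GPIO controller entry is written once and reaches the board unchanged, which is the duplication that per-board platform data could not avoid.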

Is there a well known algorithm to discover the ID of each device in a daisy chain network?

Imagine I have say 6 intelligent devices all connected together end to end with a data link (could be two serial ports per device). Each device has unique ID programmed into it and we want each device to work out where it is in the chain of devices. So with 6 devices my daisy might look like:
-[901]---[905]---[902]---[903]---[906]---[904]-
At the end of the 'discovery' algorithm each device would have the above map and know which device it is connected to on its left and right, if any for the end devices. Each device would operate the same software and be identical to each other, apart from the unique ID.
Is there an easy way to do this without it getting too complicated? The number of devices in the chain could vary, but the maximum is 6.
This sounds very similar to ARP resolution. Since there is a maximum of only 6 devices, a basic algorithm of broadcasting the request to the network (i.e. each device) would probably be the simplest way. Likewise, linearly passing the request wouldn't take much longer either.
If they are network devices with MAC addresses, you can even take advantage of these unique IDs instead of creating your own, if that's useful.
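The "linearly passing the request" option can be sketched as a small Python simulation. It assumes something the question does not state: that a device can tell when nothing is attached to one of its ports (for example by link detection or a timeout). The class and method names are hypothetical, and direct method calls stand in for serial messages.

```python
class Device:
    """One node in the chain; every node runs the same code."""

    def __init__(self, dev_id):
        self.dev_id = dev_id
        self.left = None    # neighbour on the left port, if any
        self.right = None   # neighbour on the right port, if any
        self.chain = None   # full ordered map, filled in by discovery

    def on_discover(self, seen):
        """Discovery message travelling left to right: append own ID and forward."""
        seen = seen + [self.dev_id]
        if self.right is not None:
            self.right.on_discover(seen)
        else:
            # Right end of the chain: the map is complete, so send it back.
            self.on_report(seen)

    def on_report(self, chain):
        """Completed map travelling right to left: store it and forward."""
        self.chain = list(chain)
        if self.left is not None:
            self.left.on_report(chain)

    def neighbours(self):
        """Return (left_id, right_id) looked up from the stored map."""
        i = self.chain.index(self.dev_id)
        left = self.chain[i - 1] if i > 0 else None
        right = self.chain[i + 1] if i < len(self.chain) - 1 else None
        return left, right


def wire(ids):
    """Connect devices end to end in the given order."""
    devices = [Device(i) for i in ids]
    for a, b in zip(devices, devices[1:]):
        a.right, b.left = b, a
    return devices


devices = wire([901, 905, 902, 903, 906, 904])

# The device that sees nothing on its left port starts discovery.
for d in devices:
    if d.left is None:
        d.on_discover([])

print(devices[2].chain, devices[2].neighbours())
```

Each message crosses every link once in each direction, so with at most 6 devices the whole exchange is about ten hops.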

Questions on how network cards in Windows work

I am trying to figure out how network cards work in Windows, and how the data is being relayed.
I have two hypotheses.
1.
Data is received by the network card.
The card then puts the data in an internal buffer, possibly a double buffer or a ring buffer.
The card accumulates data until some amount has been reached, upon which it sends an interrupt.
Windows copies the data from the card to the RAM and notifies appropriate handlers.
2.
Data is received.
The card puts the data in the RAM using DMA. (Does DMA guarantee that data will not be lost, or does the card still need its own buffer?)
The card fires an interrupt upon putting enough data in the RAM.
Windows receives the interrupt and copies or exposes the data to appropriate handlers.
Are either of my hypotheses correct?
Is there any message from the card or Windows if buffers are full?
In my Windows systems properties for my ethernet controller I can see properties called "Receive buffers" and "Transmit buffers", both are set to 256.
What does this mean?
Is there any good literature on this subject? (I have Tanenbaum's Modern Operating Systems, but it is not specifically related to Windows.)
Your question subsumes (at least!) three very, very broad topics:
1) How does a Layer 2 (Data Link) hardware device work?
2) How does it relate to the operating system's network stack?
... and ...
3) How does it relate to the operating system's kernel-level device driver?
The next link is actually 180 degrees opposite your original question (the API is relatively high level, your question pertains to the lowest software levels), but it wouldn't hurt to look at the .Net API for perspective "how things work":
http://msdn.microsoft.com/en-us/library/4as0wz7t.aspx
Hope that helps ... at least a little bit...
PS:
Linux is a wealth of information about implementing a network stack: all of the kernel source and all of the device drivers are completely available, and very well documented.
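As a toy illustration of the second hypothesis in the question, here is a small Python model of a receive ring shared between a "card" and a "driver". It does not describe how Windows or any particular NIC actually works. The class, its fields and the drop-when-full policy are assumptions made for the sketch, and the idea that a "Receive buffers" setting counts the slots in such a ring is likewise an assumption that nothing in this thread confirms.

```python
class RxRing:
    """Toy receive ring: the 'card' fills slots, the 'driver' drains them."""

    def __init__(self, size=256):
        self.slots = [None] * size
        self.head = 0      # next slot the card writes into
        self.tail = 0      # next slot the driver reads from
        self.count = 0     # slots currently holding a frame
        self.dropped = 0   # frames lost because the ring was full

    def card_receive(self, frame):
        """Model the card placing a frame in host memory; drop it if no slot is free."""
        if self.count == len(self.slots):
            self.dropped += 1
            return False
        self.slots[self.head] = frame
        self.head = (self.head + 1) % len(self.slots)
        self.count += 1
        return True

    def driver_poll(self, budget):
        """Model the driver handling an interrupt: take up to 'budget' frames."""
        frames = []
        while self.count and len(frames) < budget:
            frames.append(self.slots[self.tail])
            self.slots[self.tail] = None
            self.tail = (self.tail + 1) % len(self.slots)
            self.count -= 1
        return frames


ring = RxRing(size=4)
accepted = [ring.card_receive(f"frame{i}") for i in range(6)]
drained = ring.driver_poll(budget=2)
print(accepted, ring.dropped, drained)
```

In this model, data is lost when frames arrive faster than the driver drains the ring, which is one way to think about the question of whether DMA by itself guarantees that nothing is dropped.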

Do programmable Ethernet devices (think onboard CPU) really exist?

I've heard from various people that programmable Ethernet cards exist and are easily available. However I have yet to be able to track down one of these mythical devices so I'm wondering if they're just that - a myth.
Such a programmable card has a gigabit Ethernet interface, has a programmable CPU and connects to the host system via PCI Express. The problem area these cards address are low latency network applications where the card itself does the work and "reports back" to the operating system. Basically the card acts as a co-processor and handles all the low latency requirements on the card, thus avoiding the issues of writing low latency code in user-land - think 0.4ms - 0.5ms response times.
So my question is, do these cards really exist and if so, where can I get my hands on one?
AdvancedIO has dual and quad programmable 10 GbE PCI Express cards. These cards are geared toward ultra-low latency and line-rate applications (high-frequency trading, military and telecom). They use FPGAs instead of CPUs because FPGAs have lower latencies and can handle large amounts of data in real time.
If you want more information on these cards, you can go to:
http://www.advancedio.com/products/form-factor/pci-express/
If you want more information about applications, you can go to: http://www.advancedio.com/markets/financial/ or browse the different Markets on the web site.
These cards come with a development framework to facilitate the development of applications.
Good luck
RNet technologies has a user-programmable NIC that is software programmable, unlike the AdvancedIO cards, which are FPGA-based (hardware programmable).
Bigfoot Networks makes a series of products (their Killer line) that are "smart" NICs: e.g. the Killer 2100.
It's not clear at a glance whether their current products are user-programmable. However, a review of a legacy product of theirs suggests that you were able to load specialized "apps" onto the cards, at least.

Resources