AXI4 AxVALID high in same clock

I have been looking for some documentation on the case when ARVALID and AWVALID both go high in the same clock and contain the same address. Should the write be handled first, or should the read? Any help is much appreciated.

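AXI4 does not specify this. The read and write channels are independent, and the protocol defines no ordering between a read and a write, even to the same address. It is up to you to decide on a policy and implement it in your interconnect or in your slave. A master that needs a particular order has to wait for the first transaction to complete before issuing the second.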
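Since the choice is left to the implementer, here is a minimal behavioural sketch in Python (not RTL) of the two obvious policies. The names TinySlave and Policy are made up for illustration, and the model assumes that both address handshakes complete in the same cycle.

```python
from enum import Enum


class Policy(Enum):
    WRITE_FIRST = "write_first"
    READ_FIRST = "read_first"


class TinySlave:
    """Hypothetical behavioural model of a memory-mapped slave (not RTL)."""

    def __init__(self, policy):
        self.policy = policy
        self.mem = {}

    def cycle(self, araddr=None, awaddr=None, wdata=None):
        """Handle at most one read and one write accepted in the same cycle.

        Returns the read data, or None if no read was requested.
        """
        rdata = None
        if self.policy is Policy.WRITE_FIRST:
            # The write lands first, so a same-address read sees the new data.
            if awaddr is not None:
                self.mem[awaddr] = wdata
            if araddr is not None:
                rdata = self.mem.get(araddr, 0)
        else:
            # The read is served first, so it sees the old data.
            if araddr is not None:
                rdata = self.mem.get(araddr, 0)
            if awaddr is not None:
                self.mem[awaddr] = wdata
        return rdata
```

Either policy is legal as far as the protocol is concerned; what matters is that you pick one and document it.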

Related

How can a Windows program send input to an FPGA and get the output back?

I am new to programming and FPGAs. I would like to run a program on my Windows 10 PC that sends input to the FPGA and, when processing is done, receives the output in the same program. Is this possible, and how can it be achieved? I need some direction to get started.
Thank you.
I recommend buying a Digilent Arty A7 board. It is low cost and very nice to work with.
To communicate with a PC/Windows you can use the USB to UART that you have on that board. However I think the best and easiest way to do it is to use an IP core that has support for Ethernet and TCP/IP. Using TCP/IP is very simple on the PC side using Python, Matlab, Telnet or any programming tool.
The best IP cores for Xilinx FPGAs that I have found so far are the ones from fpga-cores.com. There you only have to implement an AXI4-Stream interface to communicate with the client. I don't think it gets easier than that.
That core also includes remote programming of the FPGA over Ethernet and a logic analyzer. All of that is free.
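On the PC side, the TCP/IP approach can be sketched in a few lines of Python. This is only a sketch under assumptions: the host, port, payload format and reply length are hypothetical and depend entirely on how the FPGA core is configured; only the standard-library socket calls are real.

```python
import socket


def recv_exact(sock, n):
    """Read up to n bytes from sock, looping because recv may return less."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break  # peer closed the connection early
        buf.extend(chunk)
    return bytes(buf)


def exchange(host, port, payload, reply_len, timeout=5.0):
    """Send payload to the FPGA's TCP endpoint and read reply_len bytes back.

    host and port are placeholders for whatever the FPGA core listens on.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        return recv_exact(sock, reply_len)
```

A call would look something like exchange("192.168.1.10", 5000, b"\x01\x02\x03\x04", 4), with the address and port replaced by your board's actual settings.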
Good question. A lot of people ask about data processing on FPGA but never think about how to get the data to and from it. (Until it is too late)
The best way is to find an FPGA that also has an SoC on board, that is: a processor, a DDR interface and one or more high-speed interfaces (Ethernet, USB, PCIe). Make sure it comes with complete working example code, often including some RTOS.
Which FPGA to choose depends greatly on what you want it to do. You also need enough programmable logic to implement the function you want.
Nowadays all vendors have free HDL compilers up to a certain size FPGA.
Every FPGA manufacturer also has one or more prototyping boards, but the price of those varies a lot.
If you have some FPGA code which is capable of very high data throughput your interface is likely to become the bottleneck.
A PCIe board offers the highest data throughput, but for that you need to have matching drivers on both the FPGA board and the PC. In that case check that it has example drivers for the PC side too.
Yes, I fell into that trap a few years back

AXI4 delay transactions

I am just looking for advice. I currently have a custom IP written in VHDL which has an AXI4 slave input and an AXI4 master output, and currently the signals are directly tied together.
I would like to add a customizable latency to the AXI signals, so that they are delayed for a particular amount of time through the IP rather than being connected straight through.
My question is: can I delay read and write transactions through the IP merely through the use of the AxVALID and AxREADY (and maybe the RVALID/RREADY and WVALID/WREADY) signals?
If, for instance, I wanted a 20 clock cycle delay, could I wait for an external master to assert VALID, and then wait 20 clocks before having the IP slave assert READY? Is this correct logic?
Thanks in advance for any advice.
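Yes, that can be done: AXI allows a slave to wait for VALID before asserting READY. Depending on your infrastructure, holding off READY can cause bus congestion, because the channel is stalled while you wait. Alternatively, you could insert a FIFO to buffer the delayed transactions.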
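To check the cycle counting before writing the VHDL, a tiny cycle-based model in Python can help. This is a sketch, not a reference implementation: simulate_delayed_ready is a made-up helper, and it assumes the master holds VALID high until the handshake, as the protocol requires.

```python
def simulate_delayed_ready(valid_from_cycle, delay, max_cycles=1000):
    """Return the cycle on which the handshake (VALID and READY) happens.

    The master raises VALID at valid_from_cycle and holds it until accepted.
    The slave counts `delay` cycles with VALID high and only then raises READY.
    """
    counter = 0
    for cycle in range(max_cycles):
        valid = cycle >= valid_from_cycle
        ready = valid and counter >= delay
        if valid and ready:
            return cycle  # transfer completes on this clock edge
        if valid:
            counter += 1  # still stalling the master
    raise RuntimeError("no handshake within max_cycles")
```

With VALID first seen on cycle 3 and a delay of 20, the model completes the transfer on cycle 23. Note that the whole channel is blocked during those 20 cycles, which is where the congestion comes from.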

How do I read large amounts of data from an AXI4 bus

I'm building something on a Zybo board, so I'm using a Zynq device.
I'd like to write into main memory from the CPU, and read from it with the FPGA in order to write the CPU results out to another device.
I'm pretty sure that I need to use the AXI bus to do this, but I can't work out the best approach to the problem. Do I:
Make a full AXI peripheral myself? Presumably a master which issues read requests to main memory, and then has them fulfilled. I'm finding it quite hard to find resources on how to actually make an AXI peripheral. Where would I start looking for straightforward explanations?
Use one of the Xilinx IP cores to handle the AXI bus for me, but there are quite a few of them, and I'm not sure of the best one to use.
Whatever it is, it needs to be fast, and it needs to be able to do large reads from the DDR memory on my board. That memory needs to also be writable by the CPU.
Thanks!
An easy option is to use the AXI-Stream FIFO component in your block diagram. Then you can code up an AXI-Stream slave to receive the data. So the ARM would write via AXI to the FIFO, and your component would stream data out of the FIFO. No need to do any AXI work.
Take a look at Xilinx's PG080 for details.
If you have access to the Vivado HLS tool, transferring data from main memory to FPGA memory (e.g., BRAM) in bursts would be one solution.
You just need to use memcpy in your code, and the synthesis tool automatically generates the master IP, which is very fast and reliable.
Option 1: Create your own AXI master. You would probably need to create an AXI slave for configuration purposes as well.
I found this article quite helpful to get started with AXI:
http://silica.com/wps/wcm/connect/88aa13e1-4ba4-4ed9-8247-65ad45c59129/SILICA_Xilinx_Designing_a_custom_axi_slave_rev1.pdf?MOD=AJPERES&CVID=kW6xDPd
And of course, the full AXI reference specification is here:
http://www.gstitt.ece.ufl.edu/courses/fall15/eel4720_5721/labs/refs/AXI4_specification.pdf
Option 2: Use the Xilinx AXI DMA component to set up DMA transfers between DDR memory and AXI streams. You would need to interface your logic to the "AXI streams" of the Xilinx DMA component. AXI streams are typically easier to implement than a new high-performance AXI master.
This approach supports very high bandwidths, and can do both continuous streams and packet-based transfers. It also supports metadata for each packet.
The Xilinx AXI DMA component is here:
http://www.xilinx.com/products/intellectual-property/axi_dma.html
Xilinx also provides software drivers for this.

Connect stack of Parallella boards and an RPi via FPGA and I/O pins

I want to connect my Pi and my Parallella stack so that the Pi handles the GPU side, with everything controlled by a third Parallella.
I think the best way to do this is through an FPGA. Is this possible and a good way to do it?
Also what structure should I use and how should I start to implement it?
I know little VHDL and Verilog and do not want to use paid software.
I am eager to learn and have a lot of time to do it though so no "simple but bad solutions".
I will upload the project to Git when done.
The solution depends on the bandwidth and latency requirements. You are right that an FPGA provides the largest bandwidth and lowest latency. However, do you really need such good performance? Maybe USB or Ethernet connections are good enough.
For the FPGA solution, consider the secondary Pi and the Parallella as two peripherals of the primary Pi, and assign different address spaces to them. Communication among the three devices is based on polling initiated by the primary Pi. The FPGA should pass the signaling on the data/address bus to the two peripherals with compatible I/O timing. The peripherals treat the FPGA as a RAM, and should listen for any data/controls on a best-effort basis. The FPGA should buffer the data/control signals if the peripherals cannot respond in real time.
Overall, it's very tough work. I'd like to see the source code if the FPGA solution works.

How to convert 24MHz and 12MHz clock to 8MHz clock using VHDL?

I am writing VHDL code to convert 24 MHz and 12 MHz clocks to an 8 MHz clock. Can anyone please help me with this? Thanks in advance.
Is this for an FPGA? Or something else? Are you really dividing a clock, or just a signal? For a divide by three counter, try this link:
http://www.asic-world.com/examples/vhdl/divide_by_3.html
And for a 2/3:
http://www.edaboard.com/thread42620.html
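The arithmetic behind "divide by three" and "2/3" can be checked quickly in Python. This is just a sketch of the ratio calculation; clock_ratio is a made-up helper and says nothing about how the hardware should generate the clock.

```python
from fractions import Fraction


def clock_ratio(f_in_hz, f_out_hz):
    """Return (multiply, divide) such that f_out = f_in * multiply / divide."""
    ratio = Fraction(f_out_hz, f_in_hz)  # reduced automatically
    return ratio.numerator, ratio.denominator


# 24 MHz -> 8 MHz is a plain divide by 3.
print(clock_ratio(24_000_000, 8_000_000))  # (1, 3)
# 12 MHz -> 8 MHz needs a multiply by 2 as well as a divide by 3.
print(clock_ratio(12_000_000, 8_000_000))  # (2, 3)
```

The 12 MHz case is not an integer division, so an ordinary counter running on the input clock alone will not produce it.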
As Martin has already said, use a clock management block, following Xilinx's recommendations, to divide your clock down to a lower rate.
While you might be tempted to implement a clock divider using logic and a counter, you will not obtain good synthesis results.
Here are some tips:
Be sure to closely read and follow recommendations for the clock management hardware for your device. There can be quite a few "gotchas" related to power-up, reset, loss of clock lock, etc.
Make sure that you are operating the clock management device within its specifications. See your device's datasheet for more information (in this case for the S3-A).
Use FPGA Editor to verify correct placement and configuration of your clock management units (i.e. did it end up in the right spot on the chip)
Adhere to recommended practices for feedback clocks, and clock buffering.
Use a DCM or PLL (depending on the family of FPGA). There are examples in the documentation. If you tell us which family, I might be able to point you more directly.
EDIT:
As you say Spartan 3ADSP - you need to either:
Use the Core Generator Clocking Wizard to create a VHDL or Verilog file with the components you need, and hope you never need to understand what's going on.
Read the libraries guide and the DCM section of the user guide for that chip, instantiate a DCM on your own and apply the correct generics/parameters to it.
Don't forget to apply a reset pulse to the DCM after configuration has finished, and make sure that pulse lasts long enough. The minimum pulse length is different for each family; I don't recall off the top of my head what it is for that chip, so check the datasheet.
