I'm not good at English, sorry.
I don't know if the content of this question is too abstract.
I'm going to build a neural network hardware accelerator on an Artix-7 FPGA.
However, I've run out of block memory capacity.
So I'm going to use the DDR3 memory included on the Arty A7 board.
I want to write values from block memory to DDR memory and read values back from DDR memory.
Is there a good way to read and write DDR memory on the FPGA?
I had a quick look at the Artix-7 product summary. It mentions DDR3 memory support, and the datasheet mentions DDR memory controllers.
You have to find Xilinx' information about the Artix DDR controller and read through it. Probably it has an AXI interface as Xilinx is very much into AXI these days. If so you have to write an AXI master interface to read from or write to the DDR. Or maybe Xilinx have some IP which does most of the work.
None of the above is easy! Start by installing the latest Vivado Design Suite (it is free), which also gives you Xilinx's DocNav. You will need it, as Xilinx's documentation is reasonably good but there is a lot and a lot and a lot of it.
I'll be honest: this is not something I would recommend to a beginner with HDL unless you are prepared to put a lot of time into it (and also learn a lot).
You need to instantiate a memory controller IP from Xilinx. See https://www.xilinx.com/support/documentation/ip_documentation/ug586_7Series_MIS.pdf (to begin with).
I'm building something on a Zybo board, so I'm using a Zynq device.
I'd like to write into main memory from the CPU, and read from it with the FPGA in order to write the CPU results out to another device.
I'm pretty sure that I need to use the AXI bus to do this, but I can't work out the best approach to the problem. Do I:
Make a full AXI peripheral myself? Presumably a master which issues read requests to main memory and then has them fulfilled. I'm finding it quite hard to find resources on how to actually make an AXI peripheral; where should I start looking for straightforward explanations?
Use one of the Xilinx IP cores to handle the AXI bus for me? There are quite a few of them, and I'm not sure which one is best to use.
Whatever it is, it needs to be fast, and it needs to be able to do large reads from the DDR memory on my board. That memory needs to also be writable by the CPU.
Thanks!
An easy option is to use the AXI-Stream FIFO component in your block diagram. Then you can code up an AXI-Stream slave to receive the data. So the ARM would write via AXI to the FIFO, and your component would stream data out of the FIFO. No need to do any AXI work.
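On the receiving side, the AXI-Stream slave only has to honour the TVALID/TREADY handshake: a data beat is transferred on a rising clock edge when both TVALID and TREADY are high, and TLAST marks the last beat of a packet. The C code below is a cycle-level behavioural model for illustration only, not synthesizable HDL; the type and function names, the 32-bit TDATA width and the 64-word buffer are all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Signals driven by the master (e.g. the AXI-Stream FIFO) for one clock cycle. */
typedef struct {
    bool     tvalid;   /* tdata/tlast are valid this cycle   */
    uint32_t tdata;    /* payload (32-bit width assumed)     */
    bool     tlast;    /* last beat of the current packet    */
} axis_master_t;

/* Hypothetical slave state: collects a single packet into a buffer. */
typedef struct {
    uint32_t buf[64];
    size_t   count;        /* beats accepted so far                    */
    bool     packet_done;  /* set once a beat with TLAST is accepted   */
} axis_slave_t;

/* Slave ready policy: accept while there is room and the packet is unfinished. */
static bool axis_slave_tready(const axis_slave_t *s)
{
    return !s->packet_done && s->count < sizeof s->buf / sizeof s->buf[0];
}

/* Evaluate one rising clock edge. Returns true if a beat was transferred. */
static bool axis_clock_edge(axis_slave_t *s, const axis_master_t *m)
{
    bool tready = axis_slave_tready(s);
    if (!(m->tvalid && tready))
        return false;              /* no handshake: nothing moves this cycle */
    s->buf[s->count++] = m->tdata;
    if (m->tlast)
        s->packet_done = true;
    return true;
}
```

In real hardware the slave may drop TREADY whenever it cannot accept data; once the master has asserted TVALID, it must hold TVALID and the data stable until the handshake completes.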
Take a look at Xilinx's PG080 for details.
If you have access to the Vivado HLS tool, then transferring data from main memory to FPGA memory (e.g., BRAM) using burst transfers would be one solution.
You just need to use memcpy in your code, and the synthesis tool automatically generates the AXI master IP, which is very fast and reliable.
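A minimal sketch of the memcpy approach, assuming Vivado HLS. The pragmas are Vivado HLS INTERFACE directives; the function name `scale_block`, the bundle name `gmem`, the block size and the "multiply by a gain" processing step are hypothetical placeholders. Each pointer argument becomes an AXI master port into main memory, and the memcpy into a local array lets the tool infer burst reads and writes. An ordinary C compiler ignores the unknown pragmas, so the same function can be tested in software first.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 256  /* words per transfer (hypothetical) */

/* Top-level HLS function: burst-reads a block from DDR into on-chip
 * memory, processes it there, and burst-writes the result back to DDR. */
void scale_block(const int32_t *src, int32_t *dst, int32_t gain)
{
#pragma HLS INTERFACE m_axi port=src offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=dst offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=gain
#pragma HLS INTERFACE s_axilite port=return

    int32_t local[BLOCK_WORDS];  /* typically mapped to block RAM */

    /* Burst read from main memory into the local buffer. */
    memcpy(local, src, BLOCK_WORDS * sizeof(int32_t));

    for (int i = 0; i < BLOCK_WORDS; i++)
        local[i] *= gain;

    /* Burst write from the local buffer back to main memory. */
    memcpy(dst, local, BLOCK_WORDS * sizeof(int32_t));
}
```

With `offset=slave`, the DDR addresses of the source and destination buffers are passed in by the CPU through the AXI-Lite control interface, together with `gain` and the start/done handshake.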
Option 1: Create your own AXI master. You would probably need to create an AXI slave for configuration purposes as well.
I found this article quite helpful to get started with AXI:
http://silica.com/wps/wcm/connect/88aa13e1-4ba4-4ed9-8247-65ad45c59129/SILICA_Xilinx_Designing_a_custom_axi_slave_rev1.pdf?MOD=AJPERES&CVID=kW6xDPd
And of course, the full AXI reference specification is here:
http://www.gstitt.ece.ufl.edu/courses/fall15/eel4720_5721/labs/refs/AXI4_specification.pdf
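If you do write your own AXI master, much of the bookkeeping is splitting a long transfer into bursts. In AXI4 an INCR burst can be at most 256 beats long, AxLEN holds the number of beats minus one, and a burst must not cross a 4 KB address boundary. The C sketch below computes the address/length pairs a master would issue on AW (or AR); the function and type names are hypothetical, and it assumes the beat size is a power of two and that the start address and byte count are multiples of it.

```c
#include <stddef.h>
#include <stdint.h>

#define AXI_4K_BOUNDARY     4096u
#define AXI4_MAX_INCR_BEATS 256u

typedef struct {
    uint32_t addr;  /* value to drive on AxADDR          */
    uint8_t  len;   /* value to drive on AxLEN (beats-1) */
} axi_burst_t;

/* Split a transfer of `bytes` bytes starting at `addr` into AXI4 INCR
 * bursts of `beat_bytes` bytes per beat. Stores at most `max_bursts`
 * entries in `out` and returns the total number of bursts needed. */
size_t axi_split_incr(uint32_t addr, uint32_t bytes, uint32_t beat_bytes,
                      axi_burst_t *out, size_t max_bursts)
{
    size_t n = 0;
    while (bytes > 0) {
        /* Bytes left before the next 4 KB boundary. */
        uint32_t to_boundary = AXI_4K_BOUNDARY - (addr % AXI_4K_BOUNDARY);
        uint32_t chunk = bytes < to_boundary ? bytes : to_boundary;
        uint32_t beats = chunk / beat_bytes;
        if (beats > AXI4_MAX_INCR_BEATS)
            beats = AXI4_MAX_INCR_BEATS;
        if (n < max_bursts) {
            out[n].addr = addr;
            out[n].len  = (uint8_t)(beats - 1u);
        }
        n++;
        addr  += beats * beat_bytes;
        bytes -= beats * beat_bytes;
    }
    return n;
}
```

For example, 64 bytes of 8-byte beats starting at 0x0FF0 become two bursts: 2 beats up to the 4 KB boundary, then 6 beats starting at 0x1000.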
Option 2: Use the Xilinx AXI DMA component to set up DMA transfers between DDR memory and AXI streams. You would need to interface your logic to the "AXI streams" of the Xilinx DMA component. AXI streams are typically easier to implement than creating a new high performance AXI master.
This approach supports very high bandwidths, and can do both continuous streams and packet based transfers. It also supports metadata for each packet.
The Xilinx AXI DMA component is here:
http://www.xilinx.com/products/intellectual-property/axi_dma.html
Xilinx also provides software drivers for this.
I'm relatively new to FPGA (VHDL) programming, and I have no idea about the resource cost of different solutions to a problem...
So I was wondering which approach makes the most sense if I want to implement some memory-mapped registers inside an FPGA design. Should I design one address decoder that strobes all the registers on an address match, or is it better to give each register its own decoder (or at least each subcomponent, such as a PWM generator, which uses a couple of registers in my implementation)?
Thanks in advance for the insights
Regards
Jan
The critical resource is usually not gates (LUTs), but engineering time, and so the primary concern is to make the design easy to manage and modules easy to reuse.
For that reason alone, you should make a hierarchical address decode, where each module is responsible for partitioning and decode of the address space it has been allocated.
So in your case, the PWM generator should have its own address decoder for the registers in the address space allocated to the PWM module at the next higher level of the hierarchy.
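To illustrate the hierarchy, here is a behavioural model written in C rather than VHDL; all slot sizes, offsets and register names are hypothetical. The top level only looks at the upper address bits to pick a module and passes the remaining offset down, and the PWM module decodes its own few registers from that local offset. Adding another PWM instance then only means allocating it another slot at the top level.

```c
#include <stdbool.h>
#include <stdint.h>

/* --- PWM module: decodes only its own local offsets --- */
typedef struct {
    uint32_t period;   /* local offset 0x0 */
    uint32_t duty;     /* local offset 0x4 */
    uint32_t enable;   /* local offset 0x8 */
} pwm_regs_t;

static bool pwm_write(pwm_regs_t *p, uint32_t offset, uint32_t value)
{
    switch (offset) {
    case 0x0: p->period = value;      return true;
    case 0x4: p->duty   = value;      return true;
    case 0x8: p->enable = value & 1u; return true;
    default:  return false;  /* unmapped offset inside this module */
    }
}

/* --- Top level: one 4 KB slot per module, chosen by address bits [15:12] --- */
#define SLOT_SHIFT  12u
#define OFFSET_MASK 0xFFFu
#define NUM_PWM     2u

typedef struct {
    pwm_regs_t pwm[NUM_PWM];  /* PWM instances in slots 0 and 1 */
} top_t;

static bool top_write(top_t *t, uint32_t addr, uint32_t value)
{
    uint32_t slot   = (addr >> SLOT_SHIFT) & 0xFu;
    uint32_t offset = addr & OFFSET_MASK;
    if (slot < NUM_PWM)
        return pwm_write(&t->pwm[slot], offset, value);
    return false;  /* no module allocated in this slot */
}
```

The top-level decoder never needs to know how many registers a PWM block has, which is what makes the module reusable.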
To learn about resource usage, you can install an FPGA synthesis tool and experiment with different approaches; that is a good exercise which will help you choose between different implementations.
Example:
Let's assume there is a Nios processor running on an FPGA that sends a string to an attached display over the SPI interface at random times (or every second). In addition, there is FPGA logic that monitors a pushbutton. Every press of this button should send a string to the same attached display.
Question:
How does the interaction (or communication) between the FPGA logic and the Nios work, in general or in the case described above? How is it possible to 'inform' the Nios that the pushbutton has been pressed when the button is monitored by the FPGA logic? Maybe there is some documentation on this topic that explains how it works...
Thanks in advance
Low speed or high speed?
For low speed, plug a GPIO core with enough I/O "pins" into the NIOS system and rebuild it. Wire your hardware to those pins, and use the GPIO driver code to access them. Done. Buttons count as low speed. SPI can too, though you'll probably find a much better SPI peripheral for NIOS, so I'd use that.
For high speed, you need to design a peripheral (IP core) that interfaces to whatever bus the NIOS system uses, and provides all the registers, memory, interrupt sources etc you need to interface to your VHDL hardware. There are plenty of example peripherals you can use as a starting point. Then you get to write the driver software to access that peripheral, again, starting from sample code.
This is a much more complex project, and while it's much faster than GPIO, you'll find that "high speed" is relative: any embedded CPU is appallingly slow compared to custom hardware. We're not talking about factors of 2 here, but orders of magnitude.
EDIT : Whichever approach you use, as described above, interacting with the hardware from the software side is best done through the driver software.
If you're in the situation where you have to write your own driver, then you declare variables to match each accessible register or memory block (represented by an array variable). Often the vendor tools can create a skeleton driver for you, from either the VHDL code or some other description. I don't know how Altera/Nios tools are set up but they surely have tutorials to teach you their approach.
If you have an Ada compiler you can declare these variables at package scope, to maintain proper abstraction and information hiding. But if you have to use C, with no packages, you are probably stuck with global variables.
You fix each variable at whatever physical address your hardware maps them to, and you must declare them "volatile" so that accesses to them are never optimised into registers.
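A minimal C sketch of this for the pushbutton case, with a hypothetical register layout. On the real system the register block would live at the physical address assigned to the peripheral (normally taken from the system header the tools generate); here it is backed by an ordinary variable so the code can be run on a PC. The `volatile` qualifier forces every read of `data` to go to the hardware, which is what makes polling work.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register layout of a simple GPIO/button peripheral. */
typedef struct {
    volatile uint32_t data;       /* offset 0x0: current input levels */
    volatile uint32_t direction;  /* offset 0x4: 1 = output           */
} gpio_regs_t;

/* On hardware this would be something like
 *   #define BUTTON_GPIO ((gpio_regs_t *)0x00011000u)  (address hypothetical)
 * For a host-side test we point it at an ordinary variable instead. */
static gpio_regs_t simulated_gpio;
static gpio_regs_t *const BUTTON_GPIO = &simulated_gpio;

#define BUTTON_MASK 0x1u  /* button assumed to be wired to bit 0 */

/* Returns true once per press by detecting a 0 -> 1 transition. */
static bool button_pressed(void)
{
    static uint32_t previous = 0;
    uint32_t now = BUTTON_GPIO->data & BUTTON_MASK;  /* volatile read */
    bool rising = (now != 0u) && (previous == 0u);
    previous = now;
    return rising;
}
```

This sketch does no debouncing; a mechanical button normally needs that, either in the FPGA logic or in software.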
If your hardware can interrupt the CPU, you have to write an interrupt handler function, with pragmas to tell the compiler which interrupt vector it should be connected to. You'll need to get the exact details from your own compiler documentation and examples of driver code for other peripherals.
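A common pattern is to keep the handler tiny: it acknowledges the interrupt and sets a `volatile` flag, and the main loop does the slow work (such as sending a string over SPI). The sketch below is plain C with hypothetical names; how `button_isr` is attached to an interrupt vector (a pragma, an attribute, or a BSP registration call) depends on your compiler and tools and is deliberately not shown, and `send_string_to_display` stands in for your real SPI display driver.

```c
#include <stdbool.h>

/* Shared between the interrupt handler and the main loop. */
static volatile bool button_event = false;

static unsigned strings_sent = 0;  /* counts "SPI traffic" in this sketch */

/* Interrupt handler: keep it short. On real hardware you would also
 * acknowledge/clear the interrupt in the peripheral here. */
void button_isr(void *context)
{
    (void)context;
    button_event = true;
}

/* Hypothetical stand-in for the SPI display driver call. */
static void send_string_to_display(const char *s)
{
    (void)s;
    strings_sent++;
}

/* One iteration of the main loop: the slow work happens outside the ISR. */
static void main_loop_step(void)
{
    if (button_event) {
        button_event = false;
        send_string_to_display("Button pressed");
    }
}
```

A single flag merges presses that arrive faster than the main loop runs; if every press matters, a counter incremented in the handler is a common alternative.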
I would start here:
https://www.altera.com/support/support-resources/design-examples/intellectual-property/embedded/nios-ii/exm-developing-hal-drivers.html
with sample code and a short "Guidelines" document
and use the NIOS software handbook for more depth.
To help find what you're looking for, apparently Altera use the terms "HAL" (Hardware Abstraction Layer) to describe the part of the driver that directly accesses the hardware, and "BSP" (Board Support Package) for the facilities that allow you to describe your hardware to the tools - and to your software team. Any tools to build a skeleton driver will be associated with the BSP : I see a section called "Creating a new BSP" in the software handbook.
I want to program an FPGA on a board that has a socket (ZIF or whatever is applicable) for that FPGA, from which it can be removed and reattached without soldering. Where can I get a board suitable for programming an FPGA in this way?
Once the FPGAs have been programmed they will be attached to another different PCB via solder.
I wish to essentially program the FPGA in a similar way that it is possible to program an EPROM.
I wish to use VHDL if at all possible.
FPGAs are not programmed like an EPROM - their internals are completely volatile. In system use, they are 'configured' from some other non-volatile memory. For example, many can interface directly to a standard serial flash device to load that configuration.
This non-volatile memory is the device which you need to "program" in some fashion. For example:
before soldering, using some external agency
using JTAG (if it has such an interface).
Or, you can load a configuration into the FPGA over JTAG which then allows you to program the flash using the FPGA!
It sounds as if you've misunderstood a thing or two. An STM32F103 is a microcontroller, that is, a processor with built-in memory, I/O and similar, and is typically programmed in C or C++.
VHDL (a Hardware Description Language) is used to program FPGAs (amongst others). There is a fundamental difference in the two types of chips. A processor is a "static" chip, which executes a program instruction by instruction, whereas in an FPGA the chip hardware itself is programmable - you (by using for instance VHDL) describe the actual connectivity and functionality of the chip, and essentially create numerous small, customized and application-specific processors.
You should probably first learn a bit more about the differences between the two types of chips, and then have a look at, for instance, some of Digilent's FPGA boards.
Also, programming a chip in one board, unsoldering it, and soldering it to another is not a good idea. Both microcontrollers and FPGAs today should be soldered to their final board, and then programmed (for instance over JTAG) - I'm sorry to say that what you are proposing doesn't really make much sense - and if you look at the pin count and packages of today's chips you might see why.
I seem to be under the impression that FPGAs can be updated while the chip is running; and I need to know if that is correct or not.
From what I've read, it seems that you can change the FPGA netlist on demand, the same way you can change the program running on a processor. Yes, I know that an FPGA is not a processor.
Is my assumption correct, and if not then how come?
Most of the time, you load the configuration for the entire FPGA in one go, and all logic stops running during the reconfiguration process.
It sounds like you want to reload a subset of the FPGA, while the remainder continues running. You would need a device with special support for partial reconfiguration. There's more information on Wikipedia.
EDIT: I stand corrected; see the EETimes article on partial reconfiguration.
You will generally need to reset the FPGA so that it can be reprogrammed.
At a system level, reconfiguration is possible. You can have a software application running on a PC or embedded system that reprograms the FPGA as needed. Depending on the application or software license, you can program different FPGA designs easily. You cannot, however, alter the underlying hardware structure, such as I/Os, logic cells, DSP blocks, memory blocks, etc.
FPGAs have a bunch of logic cells that need to be initialized by a stream of configuration bits. This stream of bits usually comes from a flash chip located off the device, although some devices have the flash memory on-board.
Partial Reconfiguration means the ability to configure just some of the logic cells while the rest are in use. This is specific to particular models.
Total reconfiguration is possible even if your device doesn't support it - you would need to reprogram the flash chip and then issue a Reset or reload command when done.
Some devices have more than one configuration image in the configuration flash. The device will load the first image, and if it doesn't like it, it will load the second (or subsequent) images. This can be for redundancy, or for different feature sets.
Some of the SOC FPGAs (like Xilinx Zynq) use the microprocessor core to load the FPGA. In this case, the microprocessor core can change the FPGA as much as it wants while running.
Yes I know that an FPGA is not a processor.
An FPGA is a type of processor, but it is not a type of CPU.
Most FPGAs only have volatile storage so you have to update them whilst they're on. This doesn't mean that you can change their operation any time you want. That's dynamic reconfiguration and only supported by a subset of FPGAs.