I'm working on a VHDL project for a Xilinx FPGA and have found myself at a crossroads.
I need, for example, to add two signals (C = A + B), and found that Xilinx has a tool that can generate a component which will do the job.
But this could also be implemented in standard VHDL: C <= A + B
If I use standard VHDL the code should be portable, but does this have lower throughput?
I mean, do the special components use DSP resources inside the FPGA etc. that make them faster, or can the synthesizer normally handle this?
Any time you can infer something, do so.
Performance is very rarely impacted, especially in the case of simple things like adders and multipliers. Even RAM blocks are easy to infer for many purposes - I only tend to instantiate vendor components if I need a very specific vendor-block behaviour.
The DSP blocks will be used well if you write VHDL code for adders and multipliers of the appropriate widths (or smaller) with pipelining that matches what is available inside the block. If you want to be more advanced and use (for example) Xilinx's DSP block's opcode input then you will have to instantiate a block.
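For illustration, here is a minimal sketch (the entity and signal names are mine, not from the question) of an inferred 18x18 multiply with input, middle and output registers; on Xilinx parts the synthesizer will typically absorb all three register stages into the DSP48's own pipeline registers, so no instantiation is needed.

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity mult18x18 is
      port (
        clk : in  std_logic;
        a   : in  signed(17 downto 0);
        b   : in  signed(17 downto 0);
        p   : out signed(35 downto 0)
      );
    end entity mult18x18;

    architecture rtl of mult18x18 is
      signal a_r, b_r : signed(17 downto 0) := (others => '0');
      signal m_r      : signed(35 downto 0) := (others => '0');
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          a_r <= a;            -- input registers
          b_r <= b;
          m_r <= a_r * b_r;    -- multiply register
          p   <= m_r;          -- output register
        end if;
      end process;
    end architecture rtl;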
In VHDL, if you want to increment a std_logic_vector that represents a number by one, there are a few options I have come across.
1) Use type-conversion functions to change the std_logic_vector to a signed or unsigned value, then convert it to an integer, add one to that integer, and convert it back to a std_logic_vector by the reverse path. A conversion chart between these types is handy when trying to do this.
2) Check the value of the LSB. If it is a '0', make it a '1'. If it is a '1', do a "shift left" and concatenate a '0' onto the LSB. Ex: (for a 16-bit vector) vector(15 downto 1) & '0';
In an FPGA, as compared to a microprocessor, physical hardware resources seem to be the limiting factor instead of actual processing time. There is always the risk that you could run out of physical gates.
So my real question is this: which one of these implementations is "more expensive" in an FPGA and why? Are the compilers robust enough to implement the same physical representation?
None of the type conversions cost anything in hardware.
The different types are purely about expressing the design as clearly as possible - not only to other readers (or yourself, next year :-) but also to the compiler, letting it catch as many errors as possible (such as an integer value going out of range).
Type conversions are your way of telling the compiler "yes, I meant to do that".
Use the type that best expresses the design intent.
If you're using too many type conversions, that usually means something has been declared as the wrong type; stop and think about the design for a bit and it will often simplify nicely. If you want to increment a std_logic_vector, it should probably be an unsigned, or even a natural.
Then convert when you have to: often at top-level ports or other people's IP.
Conversions may infinitesimally slow down simulations, but that's another matter.
As for your option 2: low-level detailed descriptions are not only harder to understand than a <= a + 1; they are no easier for synth tools to translate, and they are more likely to contain bugs.
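As a hedged sketch of the style being recommended (the entity, port and signal names here are made up): keep the arithmetic on an unsigned signal and only convert to std_logic_vector at the boundary.

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity counter16 is
      port (
        clk       : in  std_logic;
        rst       : in  std_logic;
        count_slv : out std_logic_vector(15 downto 0)
      );
    end entity counter16;

    architecture rtl of counter16 is
      signal count : unsigned(15 downto 0) := (others => '0');
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          if rst = '1' then
            count <= (others => '0');
          else
            count <= count + 1;            -- the whole "increment" problem
          end if;
        end if;
      end process;

      count_slv <= std_logic_vector(count); -- convert only at the port
    end architecture rtl;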
I am giving another answer to better explain why, in terms of gates and FPGA resources, it really doesn't matter which method you use. In the end, the logic will be implemented in look-up tables and flip-flops. Usually (or always?) there are no native counters in the FPGA fabric; synthesis will turn your code into LUTs, period. I always recommend trying to express the code as simply as possible. The more you try to write your code at a low, structural level (rather than behaviorally), the more error-prone it will be. KISS is the appropriate course of action every time. The synthesis tool, if it is any good, will simplify your intent as much as possible.
The only reason to implement arithmetic by hand is if you:
Think you can do a better job than the synthesis tool (where "better" could mean smaller, faster, lower power, etc.)
and you think the reduced portability and maintainability of your code does not matter too much in the long run
and it actually matters if you do a better job than the synthesis tool (e.g. you can reach your desired operating frequency only by doing this by hand rather than letting the synthesis tool do it for you).
In many cases you can also rewrite your RTL code slightly, or use synthesis attributes such as KEEP, to persuade the synthesis tool to make better implementation choices rather than hand-implementing arithmetic components.
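To illustrate the "persuade the tool" approach, here is a small hedged sketch (the entity is made up; the keep attribute spelling is the user-defined form that Xilinx synthesis tools generally accept) that marks an intermediate sum so it is not merged away, instead of hand-building the adders.

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity sum3 is
      port (
        clk     : in  std_logic;
        a, b, c : in  unsigned(15 downto 0);
        s       : out unsigned(15 downto 0)
      );
    end entity sum3;

    architecture rtl of sum3 is
      signal partial : unsigned(15 downto 0) := (others => '0');
      -- user-defined synthesis attribute: keep this intermediate net so it
      -- is not optimized away and can be constrained or probed
      attribute keep : string;
      attribute keep of partial : signal is "true";
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          partial <= a + b;
          s       <= partial + c;
        end if;
      end process;
    end architecture rtl;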
By the way, a fairly standard trick to reduce the cost of hardware counters is to avoid normal binary arithmetic and instead use, for example, LFSR counters. See Xilinx XAPP 052 for some inspiration in this area if you are interested in FPGAs (it is quite old, but the general principles are the same in current FPGAs).
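For a flavour of the LFSR idea (this sketch is mine, not taken from XAPP 052): a 4-bit maximal-length LFSR with taps 4 and 3 cycles through 15 states using only a shift register and one XOR gate. The states do not appear in binary counting order, so it only suits "count N events" uses, not cases where you need the binary value.

    library ieee;
    use ieee.std_logic_1164.all;

    entity lfsr4 is
      port (
        clk : in  std_logic;
        q   : out std_logic_vector(3 downto 0)
      );
    end entity lfsr4;

    architecture rtl of lfsr4 is
      signal state : std_logic_vector(3 downto 0) := "0001"; -- any non-zero seed
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          -- shift left and feed back the XOR of taps 4 and 3
          state <= state(2 downto 0) & (state(3) xor state(2));
        end if;
      end process;
      q <= state;
    end architecture rtl;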
I would like to implement an 8-to-1 multiplexer in an FPGA. The inputs of the multiplexer are constants, so I use a look-up table instead.
I know that FPGAs are made of LUTs. Is there any hardware block that I can use in order to optimize the multiplexer?
Thank you
Not really, unless each of the 8 "words" you're using is EXTREMELY large and could justify a block RAM (discussion on when to use a block RAM here: http://forums.xilinx.com/t5/Virtex-Family-FPGAs/Lut-vs-Block-Ram/td-p/251888 ). If your bus is only 1 bit or 8 bits wide... just use a case statement. The synthesis & routing stages will take care of converting that "code" into the individual LUTs on the FPGA.
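As a hedged sketch of the case-statement approach (the 8-bit constant values here are made up), the whole thing is really just a tiny ROM that synthesis will map onto a handful of LUTs:

    library ieee;
    use ieee.std_logic_1164.all;

    entity const_mux8 is
      port (
        sel : in  std_logic_vector(2 downto 0);
        y   : out std_logic_vector(7 downto 0)
      );
    end entity const_mux8;

    architecture rtl of const_mux8 is
    begin
      process (sel)
      begin
        case sel is
          when "000"  => y <= x"01";
          when "001"  => y <= x"23";
          when "010"  => y <= x"45";
          when "011"  => y <= x"67";
          when "100"  => y <= x"89";
          when "101"  => y <= x"AB";
          when "110"  => y <= x"CD";
          when others => y <= x"EF";
        end case;
      end process;
    end architecture rtl;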
As others have said, ISE is smart enough to infer a mux from your code. You can verify this after running synthesis. Check the Summary in the synthesis report.
If you really want to use LUTs directly as a good learning exercise, you certainly can do it using a few chained together. Read Understanding the INIT attribute for LUTs to learn how to use the INIT property.
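If you do go down that route, a hedged sketch of a single hand-placed LUT might look like this (assuming the usual UNISIM LUT3 primitive with an INIT generic and ports O, I0..I2; here INIT x"E8" encodes a 3-input majority function):

    library ieee;
    use ieee.std_logic_1164.all;
    library unisim;
    use unisim.vcomponents.all;

    entity majority3 is
      port (
        a, b, c : in  std_logic;
        o       : out std_logic
      );
    end entity majority3;

    architecture structural of majority3 is
    begin
      -- INIT x"E8" is the 3-input majority truth table: the output is '1'
      -- for input indices 3, 5, 6 and 7 (two or more inputs high)
      lut3_i : LUT3
        generic map (INIT => X"E8")
        port map (O => o, I0 => a, I1 => b, I2 => c);
    end architecture structural;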
We are using a tool to convert our code into RTL.
Using those VHDL files, we would like to synthesize the design for an FPGA.
In the synthesis results, we see the following table:
Slice Logic Utilization    Used    Available    Utilization
Number of DSP48E1s           15          864             1%
I would like to search the VHDL files to see which operations use these units.
Is there any way to find them, or any documentation which shows which operations cause DSPs to be used?
There are a few ways that a DSP48 may be used in your VHDL.
It may be inferred. This is when the synthesis tool is being smart: it looks at an operation you are doing (such as a multiply) and realizes that it would be most efficient to implement it with a dedicated resource (a DSP48) instead of fabric/logic.
It may be instantiated. This means that the primitive was directly called out in your source file; the designer decided, "I know I want to use this piece of hardware, so I am going to call it out explicitly." In this case you can do a text search for "DSP48" in your VHDL source files.
It may be part of a core. If it is part of a core, you may or may not have visibility into that core. For example, how the core is actually implemented may be different than the behavioral model which is used for simulation.
In any case, as recommended by Russell, using the Xilinx toolset to determine the utilization of primitives in the design hierarchy is a good first pass at figuring out where the units are coming from. Additionally, you can always open FPGA Editor, see what the DSP48 units are called and what signals go into and out of each DSP48, for additional hints on where it is in your design.
It sounds like you're trying to find your Module Level Utilization. I know that Xilinx ISE supports this. Under Design Overview there's an option called Module Level Utilization that breaks down every module in your VHDL design and tells you where the Regs, LUTs, BRAMs, and DSPs are used.
If you're unable to find it, look for any large multiplications in your design. Large multiply/accumulate operations will synthesize into DSP48s.
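As a rough illustration (the names and widths here are mine), this is the kind of multiply-accumulate that typically maps onto a single DSP48E1:

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity mac is
      port (
        clk : in  std_logic;
        clr : in  std_logic;
        a   : in  signed(17 downto 0);
        b   : in  signed(17 downto 0);
        acc : out signed(47 downto 0)
      );
    end entity mac;

    architecture rtl of mac is
      signal acc_r : signed(47 downto 0) := (others => '0');
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          if clr = '1' then
            acc_r <= (others => '0');
          else
            acc_r <= acc_r + (a * b);  -- multiply-accumulate, one DSP48E1
          end if;
        end if;
      end process;
      acc <= acc_r;
    end architecture rtl;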
Although I'm somewhat proficient in writing VHDL, there's a relatively basic question I need answering: when should I break down VHDL?
A basic example: say I was designing an 8-bit ALU in VHDL. I have several options for its VHDL implementation.
Simply design the whole ALU as one entity, with all the I/O required in the entity (this can be done because of the IEEE_STD_ARITHMETIC library).
--OR--
Break that ALU down into its subsequent blocks, say a carry-lookahead adder and some multiplexors.
--OR--
Break that down further into the blocks which make up a carry-lookahead adder: a bunch of partial/full adders, a carry path and multiplexors, and then connect them all together using structural elements.
We could then (if we wanted) break all of that right down to gate level, creating entities, behaviours and structures for each.
Of course the further down we break up the ALU the more VHDL files we need.
Does this affect the physical implementation after synthesis and when should we stop breaking things up?
You should keep your VHDL at the highest level of abstraction, so don't ever "break it down" as you described. What you are proposing is that you do the synthesis yourself (like creating a carry-lookahead adder) which is a bad idea. You don't know the target device (FPGA or ASIC library) as well as the synthesizer does and you shouldn't try to tell it what to do. If you want to do an addition, use the + operator and the tools will figure out the best structure that fits your design constraints.
Dividing the design into many modules will often make it more difficult to optimize your design, since optimizations between modules are generally harder to do than optimizations within modules.
Of course, major functional blocks that have well-defined interfaces between them should be in separate modules for the sake of maintaining the design and readability. The ALU can be one module, the instruction ROM another, and so forth. These modules have distinct, well-defined functions and there is not much opportunity for cross-module optimization between them. If you want to get the last possible bit of optimization available, just flatten the design before optimization and let the tools do the work.
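To make the "highest level of abstraction" point concrete, here is a hedged sketch of a one-entity 8-bit ALU (the opcode encoding is invented) that just uses operators and lets the tools choose the adder structure for the target device:

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity alu8 is
      port (
        op : in  std_logic_vector(1 downto 0);
        a  : in  unsigned(7 downto 0);
        b  : in  unsigned(7 downto 0);
        y  : out unsigned(7 downto 0)
      );
    end entity alu8;

    architecture rtl of alu8 is
    begin
      process (op, a, b)
      begin
        case op is
          when "00"   => y <= a + b;    -- the tools pick the adder structure
          when "01"   => y <= a - b;
          when "10"   => y <= a and b;
          when others => y <= a or b;
        end case;
      end process;
    end architecture rtl;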
I believe at university I wrote a program for an FPGA in a language derived from C. I am aware of languages such as VHDL and Verilog. However, what I don't understand is the amount of choice a programmer has regarding which to use. Is it dependent on the FPGA? I am going to be using a Xilinx FPGA.
I am confused because the C-variant language was, unsurprisingly, similar to C, whereas I know things like VHDL are nothing like C. Therefore, if I have a choice, I would prefer to program an FPGA using a C-variant language. The Xilinx website has a million documents and it wasn't overly clear.
It was probably Verilog that you used. It's rather C-like in a lot of its constructs. I wouldn't say it's "like C", but some syntax is similar.
VHDL is based on Ada, so yes, it's rather different.
There are some small FPGA-specific languages around, but VHDL and Verilog are the big two. I think most others have died now.
Remember that writing hardware and writing software are two rather different things. You can't really describe hardware constructs in a language like C (*). The language needs to have special features to allow you to describe exactly what you want. The code needs to be structured in a way that will make the hardware efficient. Don't fool yourself into thinking that you can take a piece of software and magically run it on an FPGA just by changing the language/compiler. (This is targeted more at your follow-up question to Marty.)
Trying to use C to write a circuit description, is like trying to program a computer in English. You could do it, but it's really the wrong language for the job.
(*) Yes, I know there's SystemC (a C++ class library that is meant to make code synthesisable), but I've yet to see anyone get good results from it, and certainly not on FPGAs. Even then the code has to be structured in a similar way as for an HDL.
Clearly, HDLs are still preferable when programming FPGAs (Xilinx, Altera, etc. all accept VHDL or Verilog).
However, things are changing (slowly): there are now excellent so-called behavioral synthesizers that allow you to code in C and generate hardware, expressed for you in VHDL or Verilog at the register-transfer level. They are sometimes referred to as HLS: high-level synthesis.
The problem is that they are quite expensive.
Synfony from Synopsys
Cynthesizer from Forte Design Systems
CatapultC from Calypto (was from Mentor)
ImpulseC
At the academic level :
Gaut from Labsticc lab (France)
spark
legup
Hercules
...
Basically, these tools work by extracting a dependency graph from the C program: nodes represent computations and edges represent variables, which is all you are really describing when you program, whether in C or any other programming language. Using this internal representation, the compiler can do hardware-relevant transformations such as register allocation (mapping variables to registers, or keeping them combinational, i.e. on wires), operation scheduling (deciding whether operations execute in the same clock cycle), etc., and finally generate the HDL automatically.
Hope this helps
JCLL
Usually FPGA vendors will have toolchains that support both Verilog and VHDL - it's up to you to choose which language you'd like. It's generally just these two languages that are supported.
For more C-like languages, a long-shot option is to use the synthesisable subset of SystemC. This is C++ with circuit-friendly stuff added. I'm not sure if the FPGA tools support this, though.