FPGA: implementing a look-up table with LUTs - VHDL

I would like to implement an 8-to-1 multiplexer in an FPGA. The inputs of the multiplexer are constants, so I use a look-up table instead.
I know that FPGAs are made of LUTs. Is there any hardware block that I can use to optimize the multiplexer?
Thank you

Not really, unless each of the 8 "words" you're using is extremely large and could justify a block RAM (there is a discussion of when to use a block RAM here: http://forums.xilinx.com/t5/Virtex-Family-FPGAs/Lut-vs-Block-Ram/td-p/251888 ). If your bus is only 1 or 8 bits wide, just use a case statement. The synthesis and routing stages will take care of converting that code into the individual LUTs on the FPGA.
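A minimal sketch of the case-statement approach (the entity name and the constant values here are arbitrary placeholders):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity const_mux8 is
  port (
    sel : in  std_logic_vector(2 downto 0);
    q   : out std_logic_vector(7 downto 0)
  );
end entity const_mux8;

architecture rtl of const_mux8 is
begin
  process (sel)
  begin
    case sel is
      when "000"  => q <= x"1A";  -- example constants
      when "001"  => q <= x"2B";
      when "010"  => q <= x"3C";
      when "011"  => q <= x"4D";
      when "100"  => q <= x"5E";
      when "101"  => q <= x"6F";
      when "110"  => q <= x"70";
      when others => q <= x"81";  -- covers "111" and metavalues
    end case;
  end process;
end architecture rtl;
```

The synthesizer will collapse this into a handful of LUTs since each output bit is just a 3-input function of `sel`.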

As others have said, ISE is smart enough to infer a mux from your code. You can verify this after running synthesis. Check the Summary in the synthesis report.
If you really want to use LUTs directly as a learning exercise, you certainly can do it with a few of them staged together. Read Understanding the INIT attribute for LUTs to learn how to use the INIT property.
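For instance, a single Xilinx LUT6 primitive can be instantiated directly with its INIT generic set (this sketch assumes the UNISIM library is available; the INIT value shown happens to implement a 6-input AND):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;

entity lut_and6 is
  port (
    a : in  std_logic_vector(5 downto 0);
    y : out std_logic
  );
end entity lut_and6;

architecture rtl of lut_and6 is
begin
  -- INIT bit i gives the output for input combination i;
  -- only bit 63 is set, so y = '1' only when all six inputs are '1'.
  u_lut : LUT6
    generic map (INIT => x"8000000000000000")
    port map (
      O  => y,
      I0 => a(0), I1 => a(1), I2 => a(2),
      I3 => a(3), I4 => a(4), I5 => a(5)
    );
end architecture rtl;
```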

Related

Look-Up Table division synthesizable in an ASIC/FPGA design? Makes any sense?

I was studying ways to make an efficient FPGA design (intended to eventually become an ASIC design) which includes division operations on simple 32-bit binary numbers.
I have found that the most expedient way to do it is to use a LUT (look-up table) rather than generating complex division logic. That's fine; however, when I think about an ASIC I imagine a physical microchip with digital logic inside, and I can't imagine putting a whole table inside just to produce the division. I can understand that it makes sense in an FPGA, because it has a lot of resources including on-chip memory, but not in a definitive ASIC.
My question is: is a LUT actually synthesizable in an ASIC design? Is this how chips that need division operations are in fact made?
Also, does a LUT consume less area than a dedicated division module?
I am quite new to this; thank you for your input.
General integer division is implemented using an iterative process, where each iteration generates a number of result bits based on either subtraction or table lookup, similar to how you did division on paper back in school. For specific integer division, for example when the numbers have few digits, a lookup table may be used instead; or if the divisor is a power of two (2^n), simple shifting may be used, possibly combined with addition for rounding. So the actual implementation of division depends on the arguments and on the speed/size requirements.
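As an illustration of the 2^n case, a divide-by-8 with rounding can be written using numeric_std shifts (entity and signal names are placeholders):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity div_by_8 is
  port (
    x : in  unsigned(31 downto 0);
    y : out unsigned(31 downto 0)
  );
end entity div_by_8;

architecture rtl of div_by_8 is
begin
  -- y = (x + 4) / 8: adding half the divisor before the shift rounds to nearest
  y <= shift_right(x + to_unsigned(4, x'length), 3);
end architecture rtl;
```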
Regarding your FPGA-to-ASIC conversion: the LUTs in the FPGA are just a flexible way to implement general-purpose combinational circuits, since e.g. a 4-input LUT can implement all outputs of any 4-input function. When you synthesize logical expressions for an FPGA, the result will be a LUT representation, since those are the building blocks available in an FPGA; but when you synthesize logical expressions for an ASIC, the result will usually be a discrete-gate representation, since those are the building blocks available in an ASIC. The ASIC implementation is smaller and faster (for the same technology), since the general-purpose LUT overhead is avoided, however at the loss of the FPGA's flexibility.
Synthesis has become popular among FPGA designers. Understanding LUT-based architectures down at the transistor level is a design skill in its own right.
I personally work from the Verilog netlist produced with the netgen command. You can also look into FPGA LUT architecture optimization.

How expensive is data type conversion vs. bit array manipulation in VHDL?

In VHDL, if you want to increment by one a std_logic_vector that represents a number, there are a few options I have come across.
1) Use type conversion functions to change the std_logic_vector to a signed or unsigned value, then convert it to an integer, add one to that integer, and convert it back to a std_logic_vector in the opposite order.
2) Check the value of the LSB. If it is a '0', make it a '1'. If it is a '1', do a "shift left" and concatenate a '0' to the LSB. Ex. (for a 16-bit vector): vector(15 downto 1) & '0'
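With ieee.numeric_std, option 1 does not even need the detour through integer; a single cast chain suffices (entity and signal names here are illustrative):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity counter16 is
  port (
    clk : in     std_logic;
    vec : buffer std_logic_vector(15 downto 0)
  );
end entity counter16;

architecture rtl of counter16 is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- cast to unsigned, add one, cast back; no integer conversion required
      vec <= std_logic_vector(unsigned(vec) + 1);
    end if;
  end process;
end architecture rtl;
```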
In an FPGA, as compared to a microprocessor, physical hardware resources seem to be the limiting factor instead of actual processing time. There is always the risk that you could run out of physical gates.
So my real question is this: which one of these implementations is "more expensive" in an FPGA and why? Are the compilers robust enough to implement the same physical representation?
None of the type conversions cost anything in hardware.
The different types are purely about expressing the design as clearly as possible - not only to other readers (or yourself, next year :-) but also to the compiler, letting it catch as many errors as possible (such as an integer value going out of range).
Type conversions are your way of telling the compiler "yes, I meant to do that".
Use the type that best expresses the design intent.
If you're using too many type conversions, that usually means something has been declared as the wrong type; stop and think about the design for a bit and it will often simplify nicely. If you want to increment a std_logic_vector, it should probably be an unsigned, or even a natural.
Then convert when you have to: often at top-level ports or other people's IP.
Conversions may infinitesimally slow down simulations, but that's another matter.
As for your option 2: low-level detailed descriptions are not only harder to understand than a <= a + 1; they are no easier for synthesis tools to translate, and more likely to contain bugs.
I am giving another answer to better explain why, in terms of gates and FPGA resources, it really doesn't matter which method you use. In the end, the logic will be implemented in look-up tables and flip-flops. Usually (or always?) there are no native counters in the FPGA fabric. The synthesis will turn your code into LUTs, period. I always recommend expressing the code as simply as possible. The more you try to write your code structurally (vs. behaviorally), the more error-prone it will be. KISS is the appropriate course of action every time. The synthesis tool, if any good, will simplify your intent as much as possible.
The only reason to implement arithmetic by hand is if you:
Think you can do a better job than the synthesis tool (where better could be smaller, faster, less power consuming, etc)
and you think the reduced portability and maintainability of your code does not matter too much in the long run
and it actually matters if you do a better job than the synthesis tool (e.g. you can reach your desired operating frequency only by doing this by hand rather than letting the synthesis tool do it for you).
In many cases you can also rewrite your RTL code slightly or use synthesis attributes such as KEEP to persuade the synthesis tool to make more optimal implementation choices rather than hand-implementing arithmetic components.
By the way, a fairly standard trick to reduce the cost of hardware counters is to avoid normal binary arithmetic and instead use, for example, LFSR counters. See Xilinx XAPP 052 for some inspiration in this area if you are interested in FPGAs (it is quite old, but the general principles are the same in current FPGAs).
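For illustration, here is a 4-bit XNOR-feedback LFSR (tap positions taken from the XAPP 052 tables; it cycles through 15 states rather than counting in binary order, which is fine whenever only the period matters):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity lfsr4 is
  port (
    clk  : in     std_logic;
    lfsr : buffer std_logic_vector(3 downto 0)
  );
end entity lfsr4;

architecture rtl of lfsr4 is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- shift left, feeding back taps 4 and 3 through an XNOR;
      -- with XNOR feedback the lock-up state is all ones, so the
      -- all-zeros reset state is safe
      lfsr <= lfsr(2 downto 0) & (lfsr(3) xnor lfsr(2));
    end if;
  end process;
end architecture rtl;
```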

DSP unit usage in VHDL

We are using a tool to convert the code into RTL.
Using those VHDL files, we would like to synthesize the code for an FPGA.
In the synthesis results, we see the following table:
Slice Logic Utilization | Used | Available | Utilization
Number of DSP48E1s      |   15 |       864 |          1%
I would like to search in VHDL files to see which operations use these units.
Is there any way to find them? Or is there any documentation that shows which operations cause DSPs to be used?
There are a few ways that a DSP48 may be used in your VHDL.
It may be inferred. This is when the synthesis tool is smart enough to look at an operation you are doing (such as a multiply) and realize that it would be most efficient to implement it with a dedicated resource (a DSP48) instead of fabric/logic.
It may be instantiated. This means that the primitive was directly called out in your source file: the designer said "I know I want to use this piece of hardware, so I am going to call it out explicitly." This is the case where you can do a text search for "DSP48" in your VHDL source files.
It may be part of a core. If it is part of a core, you may or may not have visibility into that core. For example, how the core is actually implemented may be different than the behavioral model which is used for simulation.
In any case, as recommended by Russell, using the Xilinx toolset to determine the utilization of primitives in the design hierarchy can be a good first pass at figuring out where the units come from. Additionally, you can always open up FPGA Editor, see what the DSP48 units are called and what signals go into and out of them, for additional hints on where they are in your design.
It sounds like you're trying to find your Module Level Utilization. I know that Xilinx ISE supports this. Under Design Overview there's an option called Module Level Utilization that breaks down every module in your VHDL design and tells you where the Regs, LUTs, BRAMs, and DSPs are used.
If you're unable to find it, look for any large multiplications in your design. Large multiply/accumulate operations will synthesize into DSP48s.
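A registered multiply like the following will typically be inferred into a DSP48 (widths chosen here to fit an 18x18 multiplier; entity and signal names are illustrative):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult18 is
  port (
    clk : in  std_logic;
    a   : in  signed(17 downto 0);
    b   : in  signed(17 downto 0);
    p   : out signed(35 downto 0)
  );
end entity mult18;

architecture rtl of mult18 is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      p <= a * b;  -- synthesis usually maps this to a DSP48 slice
    end if;
  end process;
end architecture rtl;
```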

When to break down VHDL?

Although I'm somewhat proficient in writing VHDL there's a relatively basic question I need answering: When to break down VHDL?
A basic example: say I was designing an 8-bit ALU in VHDL; I have several options for its VHDL implementation.
Simply design the whole ALU as one entity. With all the I/O required in the entity (can be done because of the IEEE_STD_ARITHMETIC library).
--OR--
Break that ALU down into its constituent blocks, say a carry-lookahead adder and some multiplexors.
--OR--
Break that down further into the blocks which make up a carry-lookahead adder: a bunch of partial/full adders, a carry path and multiplexors, and then connect them all together using structural elements.
We could then (if we wanted) break all of that right down to gate level, creating entities, behaviours and structures for each.
Of course the further down we break up the ALU the more VHDL files we need.
Does this affect the physical implementation after synthesis and when should we stop breaking things up?
You should keep your VHDL at the highest level of abstraction, so don't ever "break it down" as you described. What you are proposing is that you do the synthesis yourself (like creating a carry-lookahead adder) which is a bad idea. You don't know the target device (FPGA or ASIC library) as well as the synthesizer does and you shouldn't try to tell it what to do. If you want to do an addition, use the + operator and the tools will figure out the best structure that fits your design constraints.
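For example, an addition written at this level of abstraction (numeric_std assumed) leaves the adder structure entirely to the tool:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add8 is
  port (
    a, b : in  std_logic_vector(7 downto 0);
    sum  : out std_logic_vector(7 downto 0)
  );
end entity add8;

architecture rtl of add8 is
begin
  -- the synthesizer chooses the adder structure (ripple, lookahead,
  -- or the device's dedicated carry chain) to meet your constraints
  sum <= std_logic_vector(unsigned(a) + unsigned(b));
end architecture rtl;
```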
Dividing the design into many modules will often make it more difficult to optimize your design, since optimizations between modules are generally harder to do than optimizations within modules.
Of course, major functional blocks that have well-defined interfaces between them should be in separate modules for the sake of maintaining the design and readability. The ALU can be one module, the instruction ROM another, and so forth. These modules have distinct, well-defined functions and there is not much opportunity for intermodule optimization. If you want to get the last possible bit of optimization available, just flatten the design before optimization and let the tools do the work.

VHDL IEEE standard lib vs. component

I'm working on a VHDL project for a Xilinx FPGA, and found myself at a cross road.
I need, for example, to add two signals (C = A + B), and found that Xilinx has a tool that can generate a component which will do the job.
But this could also be implemented in standard VHDL: C <= A + B
If I use standard VHDL the code should be portable, but does this have lower throughput?
I mean, do the special components use DSP functions inside the FPGA etc., which make them faster, or can the synthesizer normally handle this?
Any time you can infer something do so.
Performance is very rarely impacted, especially in the case of simple things like adders and multipliers. Even RAM blocks are easy to infer for many purposes - I only tend to instantiate vendor components if I need a very specific vendor-block behaviour.
The DSP blocks will be used well if you write VHDL code for adders and multipliers of the appropriate widths (or smaller) with pipelining that matches what is available inside the block. If you want to be more advanced and use (for example) Xilinx's DSP block's opcode input then you will have to instantiate a block.
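For instance, a multiply written with a register at each stage gives the tool flip-flops to pack into the DSP block's internal pipeline (this is a sketch; the comments naming the DSP48 register stages reflect the usual Xilinx mapping, and the names are illustrative):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity pipelined_mult is
  port (
    clk : in  std_logic;
    a   : in  signed(17 downto 0);
    b   : in  signed(17 downto 0);
    p   : out signed(35 downto 0)
  );
end entity pipelined_mult;

architecture rtl of pipelined_mult is
  signal a_r, b_r : signed(17 downto 0);
  signal m_r      : signed(35 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      a_r <= a;          -- input registers (AREG/BREG)
      b_r <= b;
      m_r <= a_r * b_r;  -- multiplier register (MREG)
      p   <= m_r;        -- output register (PREG)
    end if;
  end process;
end architecture rtl;
```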
