DSP unit usage in VHDL

We are using a tool to convert the code into RTL.
We would like to synthesize the resulting VHDL files for an FPGA.
In the synthesis results, we see the following table:
Slice Logic Utilization    Used    Available    Utilization
Number of DSP48E1s           15          864             1%
I would like to search in VHDL files to see which operations use these units.
Is there any way to find them, or any documentation that shows which operations cause DSPs to be used?

There are a few ways that a DSP48 may be used in your VHDL.
It may be inferred. This is when the synthesis tool is smart by looking at an operation that you are doing (such as a multiply) and realizing that it would be most efficient to do the multiply with a dedicated resource (DSP48) instead of fabric/logic.
It may be instantiated. This means that the primitive was directly called out in your source file. The designer decided, "I know I want to use this piece of hardware, so I am going to call it out explicitly." In this case you can do a text search for "DSP48" in your VHDL source files.
It may be part of a core. If it is part of a core, you may or may not have visibility into that core. For example, how the core is actually implemented may be different than the behavioral model which is used for simulation.
In any case, as recommended by Russell, using the Xilinx toolset to determine the utilization of primitives in the design hierarchy can be a good first pass at figuring out where the units are coming from. Additionally, you can always open up FPGA Editor and see what the DSP48 units are called and which signals go into and out of each DSP48, for additional hints on where it sits in your design.
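As an illustration of the inferred case described above, here is a minimal sketch of a registered multiply that a synthesis tool will typically map onto a DSP48 slice. The entity name mult_infer, the port names and the 18x18 operand widths are assumptions made for this example and do not come from the question. Whether a DSP48E1 is actually used depends on the tool, its settings and the operand widths.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: names and widths are illustrative only.
entity mult_infer is
  port (
    clk : in  std_logic;
    a   : in  signed(17 downto 0);
    b   : in  signed(17 downto 0);
    p   : out signed(35 downto 0)
  );
end entity mult_infer;

architecture rtl of mult_infer is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- No DSP48 is named anywhere; the "*" operator is what the
      -- synthesis tool may choose to implement in a DSP slice.
      p <= a * b;
    end if;
  end process;
end architecture rtl;
```

In generated RTL of this kind, searching the VHDL files for the "*" operator is therefore a reasonable complement to searching for "DSP48".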

It sounds like you're trying to find your Module Level Utilization. I know that Xilinx ISE supports this. Under Design Overview there's an option called Module Level Utilization that breaks down every module in your VHDL design and tells you where the Regs, LUTs, BRAMs, and DSPs are used.
If you're unable to find it, look for any large multiplications in your design. Large multiply/accumulate operations will synthesize into DSP48s.
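A multiply/accumulate of the kind mentioned above might look like the following sketch. The entity name mac_infer, the port names, the 18-bit operands and the 48-bit accumulator are assumptions chosen for illustration. The tool decides whether the whole operation ends up in a single DSP48.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: names and widths are illustrative only.
entity mac_infer is
  port (
    clk   : in  std_logic;
    clear : in  std_logic;              -- synchronous clear of the accumulator
    a     : in  signed(17 downto 0);
    b     : in  signed(17 downto 0);
    acc   : out signed(47 downto 0)
  );
end entity mac_infer;

architecture rtl of mac_infer is
  signal acc_r : signed(47 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if clear = '1' then
        acc_r <= (others => '0');
      else
        -- The 36-bit product is sign-extended to the accumulator width
        -- before being added.
        acc_r <= acc_r + resize(a * b, acc_r'length);
      end if;
    end if;
  end process;

  acc <= acc_r;
end architecture rtl;
```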

Related

How to write to a file using an FPGA

I feel I have put in a decent effort searching for a solution to my problem online, but I can't find what I need in order to accomplish my goal.
Essentially, what I need to do is parse data from a file being received by my FPGA through serial. The data is fairly extensive and I think it would be easier if I were able to use some of the functions inside the textIO library.
All of the data-parsing techniques I have found online are for simulation only. I need this to actually happen on the FPGA.
So my question is: is there a way to create a file internally on the FPGA, have the serial input written to it, and then use the textIO functions on that txt file?
Some pseudocode might look something like:
File_Open("newFile.txt", write) --If it doesn't exist, then create it
write(SerialByteStream, newFile.txt) --Collect serial data onto txt file
Then run textIO function on newFile.txt in order to use the data in newFile.txt
Also, it's worth mentioning that I am new to FPGAs and VHDL, so it could be that there is a trivial solution that I am not aware of. I'm using VHDL with the Altera DE2-115.
I appreciate any help.

No, what you are proposing is not possible. As you have found, VHDL's file I/O operations are really just instructions to the simulator to do something. Note the distinction between synthesisable and non-synthesisable VHDL. You can only program the synthesisable part of your VHDL into an FPGA, and this usually won't include file-related libraries.
Complex file operations are a general purpose processing task - what PCs do. Your best avenue of investigation is probably to reconsider what you want the FPGA for in the first place and focus on that.
Some possibilities:
If the FPGA is only to provide an interface to read and write some byte stream onto a PC, perhaps you should do precisely that. Do the data processing on the PC. Transferring the data is still not trivial, but in this case you needed to solve that problem anyway.
If you need the FPGA for some high performance computations, see if you can pre-process and provide that data in a format that is easier for your design to digest.
If really necessary, processing your serial byte stream with VHDL may not be as hard as it sounds, especially if you only need to operate on it linearly. Probably you need a design involving at least one state machine that parses your serial byte stream, but the rest all depends on the details of your problem.
If you really need complex processing ON the FPGA, consider using a soft core CPU. There may be open source ones that fit on your FPGA. Whatever you want to do may be easier to write in C, which you can then compile and run on the FPGA. That option gives you a very flexible standalone hardware component, but if you need very high performance or if you do not have much time to set it up, this is probably not right for you.
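For the state machine option mentioned above, the following is a minimal sketch of what parsing a byte stream in synthesisable VHDL can look like. It assumes a separate UART receiver (not shown) that presents each received byte on rx_data for one clock cycle with rx_valid high. The entity name dec_parser, all port names and the choice of parsing ASCII decimal numbers are assumptions made for illustration and are not part of the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical parser: collects ASCII decimal digits into a 16-bit number
-- and reports it when the first non-digit character arrives.
entity dec_parser is
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;                     -- synchronous reset
    rx_data   : in  std_logic_vector(7 downto 0);  -- byte from the UART
    rx_valid  : in  std_logic;                     -- high for one cycle per byte
    value     : out unsigned(15 downto 0);
    value_rdy : out std_logic                      -- one-cycle strobe
  );
end entity dec_parser;

architecture rtl of dec_parser is
  type state_t is (IDLE, DIGITS);
  signal state : state_t := IDLE;
  signal acc   : unsigned(15 downto 0) := (others => '0');
begin
  process (clk)
    variable ch    : unsigned(7 downto 0);
    variable digit : unsigned(3 downto 0);
  begin
    if rising_edge(clk) then
      value_rdy <= '0';                            -- default: no result this cycle
      if rst = '1' then
        state <= IDLE;
        acc   <= (others => '0');
      elsif rx_valid = '1' then
        ch := unsigned(rx_data);
        if ch >= 48 and ch <= 57 then              -- ASCII '0' to '9'
          digit := ch(3 downto 0);
          if state = IDLE then
            acc <= resize(digit, acc'length);      -- first digit of a number
          else
            -- acc = acc * 10 + digit; overflow wraps silently in this sketch
            acc <= resize(acc * 10, acc'length) + digit;
          end if;
          state <= DIGITS;
        elsif state = DIGITS then
          value     <= acc;                        -- a non-digit ends the number
          value_rdy <= '1';
          state     <= IDLE;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

The stream is consumed linearly, one byte at a time, so no file or buffer is needed. A real design would add whatever framing and error handling the data format requires.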

Are muxes more "expensive" than other logic?

This is mostly out of curiosity.
One fragment from some VHDL code that I've been working on recently resembles the following:
led_q <= (pwm_d and ch_ena) when pwm_ena = '1' else ch_ena;
This is a mux-style expression, of course. But it's also equivalent to the following basic logic expression (at least when ignoring non-binary states):
led_q <= ch_ena and (pwm_d or not pwm_ena);
Is one "better" than the other in terms of logic utilisation or efficiency when actually implemented in an FPGA? Is it preferable to use one over the other, or is the compiler smart enough to pick the "best" on its own?
(For the curious, the purpose of the expression is to define the state of an LED -- if ch_ena is false it should always be off as the channel is disabled, otherwise it should either be on solidly or flashing according to pwm_d, according to pwm_ena (PWM enable). I think the first form describes this more obviously than the second, although it's not too hard to realise how the second behaves.)

For a simple logical expression, like the one shown, where the synthesis tool can easily create a complete truth table, the expression is likely to be converted to an internal truth table, which is then directly mapped to the available FPGA LUT resources. Since the truth table is identical for the two equivalent expressions, the hardware will also be the same.
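The claim that both expressions share one truth table can be checked in simulation. The following testbench is a sketch for that purpose: it walks through all eight combinations of '0' and '1' on the three inputs and compares the two forms from the question. The testbench name led_equiv_tb and the variable names mux_form and bool_form are made up for this example. It is simulation-only code and is not meant to be synthesised.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity led_equiv_tb is
end entity led_equiv_tb;

architecture sim of led_equiv_tb is
begin
  check : process
    constant vals : std_logic_vector(0 to 1) := "01";
    variable pwm_d, ch_ena, pwm_ena : std_logic;
    variable mux_form, bool_form    : std_logic;
  begin
    for i in vals'range loop
      for j in vals'range loop
        for k in vals'range loop
          pwm_d   := vals(i);
          ch_ena  := vals(j);
          pwm_ena := vals(k);

          -- Mux-style form from the question
          if pwm_ena = '1' then
            mux_form := pwm_d and ch_ena;
          else
            mux_form := ch_ena;
          end if;

          -- Plain Boolean form from the question
          bool_form := ch_ena and (pwm_d or not pwm_ena);

          assert mux_form = bool_form
            report "expressions differ" severity failure;
        end loop;
      end loop;
    end loop;

    report "all 8 binary input combinations checked";
    wait;  -- stop the process so the simulation can end
  end process check;
end architecture sim;
```

Only the binary values are covered, which matches the question's caveat about ignoring non-binary states.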
However, for complex expressions where a complete truth table can't be generated, e.g. when using arithmetic operations, and/or where dedicated resources are available, the synthesis tool may choose to hold an internal representation that is more closely related to the original VHDL code, and in this case the VHDL coding style can have a great impact on the resulting logic, even for equivalent expressions.
In the end, the implementation is tool specific, so the best way to find out what logic is generated is to try it with the specific tool, especially for large or timing-critical parts of the design, where the implementation matters most.

In general it depends on the target architecture. For Xilinx FPGAs the logic is mostly mapped into LUTs with sporadic use of the hard logic resources where the mapper can make use of them. Every possible LUT configuration has essentially equal performance so there's little benefit to scrutinizing the mapper's work unless you're really pushing the speed limits of the device where you'd be forced into manually instantiating hand-mapped LUTs.
Non-LUT based architectures like the Actel/Microsemi device families use 2-input muxes as the main logic primitive and everything is mapped down to them. You can't generalize what is best across all types of FPGAs and CPLDs but nowadays you can mostly trust that the mapper will do a decent enough job using timing constraints to push it toward the results you need.
With regards to the question I think it is best to avoid obscure Boolean expressions where possible. They tend to be hard to decipher months later when you forgot what you meant them to do. I would lean toward the when-else simply from a code maintenance point of view. Even for this trivial example you have to think closely about what behavior it describes whereas the when-else describes the intended behavior directly in human level syntax.
HDLs work best when you use the highest abstraction possible and avoid wallowing around with low-level bit twiddling. This is a place where VHDL truly shines if you leverage the more advanced features of the language and move away from describing raw logic everywhere. Let the synthesizer do the work. Introductory learning materials focus on the low level structural gate descriptions and logic expressions because that is easiest for beginners to get a start on but it is not the best way to use VHDL for complex designs in the long run.
Of course there are situations where Booleans are better, particularly when doing bitwise operations across vectors in parallel which requires messy loops to do the same imperatively. It all depends on the context.
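To make the last point concrete, here is a small sketch that masks a byte in both styles. The entity name mask_bits and its ports are hypothetical and chosen only for this comparison. Both outputs describe the same logic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: the same masking operation written twice.
entity mask_bits is
  port (
    data     : in  std_logic_vector(7 downto 0);
    mask     : in  std_logic_vector(7 downto 0);
    q_vector : out std_logic_vector(7 downto 0);
    q_loop   : out std_logic_vector(7 downto 0)
  );
end entity mask_bits;

architecture rtl of mask_bits is
begin
  -- Boolean form: one bitwise operation across the whole vector.
  q_vector <= data and mask;

  -- Imperative form: a loop that makes the same decision bit by bit.
  process (data, mask)
  begin
    for i in data'range loop
      if mask(i) = '1' then
        q_loop(i) <= data(i);
      else
        q_loop(i) <= '0';
      end if;
    end loop;
  end process;
end architecture rtl;
```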

VHDL IEEE standard lib vs. component

I'm working on a VHDL project for a Xilinx FPGA, and found myself at a cross road.
I need, for example, to add two signals (C=A+B), and found that Xilinx has a tool that can generate a component which will do the job.
But this could also be implemented in standard VHDL: C <= A + B
If I use standard VHDL the code should be portable, but does this have lower throughput?
I mean, do the special components use DSP functions inside the FPGA etc. that make them faster, or can the synthesizer normally handle this?

Any time you can infer something, do so.
Performance is very rarely impacted, especially in the case of simple things like adders and multipliers. Even RAM blocks are easy to infer for many purposes - I only tend to instantiate vendor components if I need a very specific vendor-block behaviour.
The DSP blocks will be used well if you write VHDL code for adders and multipliers of the appropriate widths (or smaller) with pipelining that matches what is available inside the block. If you want to be more advanced and use (for example) Xilinx's DSP block's opcode input then you will have to instantiate a block.
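As a sketch of what "pipelining that matches what is available inside the block" can mean, the following multiplier registers its inputs, the product and the output, which gives the tool the option of absorbing those registers into a DSP block. The entity name mult_pipe, the port names and the 18x18 widths are assumptions for this example. How many of the registers are actually absorbed depends on the device and the tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: names and widths are illustrative only.
entity mult_pipe is
  port (
    clk : in  std_logic;
    a   : in  signed(17 downto 0);
    b   : in  signed(17 downto 0);
    p   : out signed(35 downto 0)   -- valid three clock cycles after a and b
  );
end entity mult_pipe;

architecture rtl of mult_pipe is
  signal a_r, b_r : signed(17 downto 0) := (others => '0');
  signal m_r      : signed(35 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      a_r <= a;              -- input registers
      b_r <= b;
      m_r <= a_r * b_r;      -- multiplier register
      p   <= m_r;            -- output register
    end if;
  end process;
end architecture rtl;
```

The code stays plain, portable VHDL. Only the register placement is chosen with the DSP block in mind.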

How to determine how many slices a design uses

I've implemented a 16-bit ALU and a register file in VHDL using the Xilinx ISE. I've been asked how many slices my design uses, and I have no idea how to go about answering that question. I'm not working with a particular chip or simulating one; I just wrote the VHDL and debugged it using a test bench.
Is there a way to get the ISE to report how many slices my design uses? Or do I need to go through all my code and count up my operations? Or is it as simple as defining what type of components I used?

To get a true view of what resources your design will consume, use the map report. Implement the design and look at the hierarchical report of the usage (slices, slice registers (or flip-flops), LUTs, LUTRAM, BRAM, DSPs, etc.) of each module in your design in the map report file. In ISE 13.2 that is Section 13 of your _map.mrp file. You may have to enable the -detail switch in map.
Slices can be a deceptive metric (especially after a map report), since using even a single element of a slice counts the entire slice as used. You have to understand what is in a slice to really understand what the usage number means. A Virtex-6 slice, for example, has eight flip-flops and four 6-input LUTs.
If you only look at the synthesis numbers (slice flip-flops and slice LUTs) you may miss any netlist black boxes that your design uses (i.e. coregen elements, MicroBlaze, System Generator, or third-party IP delivered in netlist form).

Ugh, I figured it out.
The trick is to click whatever module you want to get the slice count for and set it as the top level module by going to Source->Set as top level module. Once you do that, under the Processes pane (making sure the module is still highlighted in the Sources pane) go to the Synthesize - XST and double click 'View Synthesis Report'. The number of slices for that module is then listed in that report.

How to reduce number of logic elements

I am trying to reduce the number of logic elements in my VHDL code. I am using Quartus II to program an Altera DE2 FPGA. Can someone please give some advice on how I can do that?
Thanks

Without additional detail about your design, only generic advice can be given.
There are many ways to reduce device utilization in an FPGA, which break down into two major categories:
Better use of your build toolset (synthesis, map, p&r tools)
Better HDL design
Build Toolset Areas to Look For
Set tool to optimize for area instead of speed
Enable tool to allow resource sharing, retiming, and pipelining (as available and appropriate)
Are your constraints being properly applied to your design? If not, the tools could be "working harder" in order to meet your constraints creating more logic/area utilization.
HDL Design Areas to Look For
Consider your target device's architecture. Can you make use of device specific features to save on general logic? (examples: internal block memory for large LUTs, FIFOs, RAMs/ROMs, dedicated multipliers, etc)
Use the tool output to determine areas to optimize in your HDL design. Look at your RTL and technology views. Analyze your critical paths. Are there places where trades could be made?
Look at HDL coding guidelines published by Altera for their synthesis tools. Does your code implementation match recommendations made in documentation in order to gain best synthesis results?
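As one example of the device-specific features mentioned above, a memory described in the following style is commonly recognised by synthesis tools and placed in internal block memory instead of logic elements. This is a generic sketch, not a template copied from the Altera guidelines. The entity name ram_infer, the port names and the 1024 x 8 size are assumptions, and the vendor's coding guidelines remain the authority on which templates a given tool recognises.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: single-port RAM with synchronous read.
entity ram_infer is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(9 downto 0);
    din  : in  std_logic_vector(7 downto 0);
    dout : out std_logic_vector(7 downto 0)
  );
end entity ram_infer;

architecture rtl of ram_infer is
  type ram_t is array (0 to 1023) of std_logic_vector(7 downto 0);
  signal ram : ram_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
      -- Registered read: a write to the same address in the same cycle
      -- still returns the old contents here.
      dout <= ram(to_integer(unsigned(addr)));
    end if;
  end process;
end architecture rtl;
```

The synchronous read matters: block memories generally need a clocked read port, so an asynchronous read may push the array into logic or distributed memory.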
If you have more specific concerns, please add an update.

Check out the relevant chapter of the Quartus II Handbook: Area and Timing Optimization (Vol 2, Ch 13)

Resources