When do you use a block statement in a VHDL design and when do you not? - vhdl

I come from the software world and recently started creating FPGA designs in VHDL. I've read about the block concurrent statement and its principal uses, such as organizing an architecture by grouping concurrent code, and guarded signals, which are not recommended.
But this is only one of several ways to implement the same functionality. For instance, I've implemented a CRC frame checker with a VHDL function: it takes a one-bit input and returns a register holding the cumulative CRC value of all the bits received so far.
I think the same functionality can be implemented with a block. What is the best option for resource utilization? When would you use a block and when would you not? What is the best use case for a block?
Thanks,

What is the best option for resource utilization?
There should be no difference in resource utilization with or without a block, assuming you are describing the same logic.
When would you use a block and when would not?
As in software, the main reason to use a block statement is to limit the scope of the signals used within a portion of the code. This can significantly improve readability in a large design, where signals can be declared and used in the same region.
I would not recommend using a block statement in a small design, or where component instantiation is more appropriate.
Which is the best case to implement a block?
When it improves code readability.
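As a minimal sketch of the scoping idea above, here is a block that keeps a CRC register local to one region of an architecture. The entity, port, and signal names are made up for illustration, and the shift/XOR update is simplified rather than a real CRC polynomial.

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper entity used only to host the block example.
entity block_demo is
    port (
        clk     : in  std_logic;
        data_in : in  std_logic;
        crc_out : out std_logic_vector(15 downto 0)
    );
end entity block_demo;

architecture rtl of block_demo is
begin

    -- The block groups the CRC logic and limits the scope of its
    -- internal signal to this region of the architecture.
    crc_region : block
        signal crc_reg : std_logic_vector(15 downto 0) := (others => '0');
    begin
        process (clk)
        begin
            if rising_edge(clk) then
                -- Simplified shift/XOR update; a real CRC would use the
                -- polynomial taps required by the protocol.
                crc_reg <= crc_reg(14 downto 0) & (crc_reg(15) xor data_in);
            end if;
        end process;

        crc_out <= crc_reg;
    end block crc_region;

end architecture rtl;

The same logic without the block would synthesize identically; the block only keeps crc_reg out of the architecture's top-level namespace.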

Related

Case statement vs If else in VHDL

What are the main differences between if-else and case statements in VHDL? Both look similar and can sometimes replace each other, but what logic circuit appears after synthesis? When should we use if-else and when a case statement?
Assuming an if-statement and a case-statement describe the same behavior, the resulting circuit is likely to be identical after the synthesis tool has done its translation and optimization.
As Paebbels writes in the comment, the details are described for each tool in the relevant synthesis guide, and there are probably tool-dependent cases where the results differ, but as a general working assumption the synthesis tool will arrive at the same circuit for equivalent if-statements and case-statements.
The critical point is usually to write correct and maintainable VHDL code, and here readability counts, so choose an if-statement or a case-statement depending on what makes the code most straightforward, and don't try to control the resulting circuit through VHDL constructs unless there is a specific reason this is required.
Note that in an if-statement earlier conditions take priority over later ones, whereas in a case-statement all when choices have equal priority.
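As a small sketch of that difference, consider a two-input arbiter written both ways; the entity and signal names (req, grant_if, grant_case) are made up for illustration.

library ieee;
use ieee.std_logic_1164.all;

entity if_vs_case is
    port (
        req        : in  std_logic_vector(1 downto 0);
        grant_if   : out std_logic_vector(1 downto 0);
        grant_case : out std_logic_vector(1 downto 0)
    );
end entity;

architecture rtl of if_vs_case is
begin
    -- If-statement: earlier conditions win, so req(1) takes priority
    -- when both requests are high.
    prio : process (req)
    begin
        if req(1) = '1' then
            grant_if <= "10";
        elsif req(0) = '1' then
            grant_if <= "01";
        else
            grant_if <= "00";
        end if;
    end process;

    -- Case-statement: all choices have equal priority, so the priority
    -- for the "11" input has to be spelled out explicitly.
    sel : process (req)
    begin
        case req is
            when "10"   => grant_case <= "10";
            when "01"   => grant_case <= "01";
            when "11"   => grant_case <= "10";
            when others => grant_case <= "00";
        end case;
    end process;
end architecture;

Both processes describe the same truth table, which is the point above: equivalent descriptions will normally synthesize to the same circuit.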
Remember that VHDL is a parallel language and a form of declarative programming (see here), as opposed to procedural programming like C/C++ and other sequential languages.
In essence, your code describes to the compiler what the behavior should be, rather than telling it specifically what to do, as with procedural programming. This might be what prompted you to ask the question.
Remember, however, that the structure of the if or case can affect synthesis. In today's FPGAs, the combinatorial part of the logic is implemented in look-up tables (LUTs), which are internally built as cascaded arrays of multiplexers grouped to form LUTs with N inputs (commonly 4); see here for more details. The compiler decides how to configure these arrays of LUTs.
The ordering can affect the number of cascaded multiplexers the compiler has to go through before the output is resolved.
Note that although in theory it is possible to get the same behaviour from both if and case, a case statement examines a single expression and handles each possible outcome, while an if-statement can test multiple signals at the same time.
So flexibility? I would say that goes to if. However, with great power comes great responsibility: it is easy for an if to pull in signals from everywhere, and done carelessly this leads to bad design, i.e. coupling too many signals so that any change risks failure because of the many dependencies. Case is well suited to state machines, but that is also true in procedural languages, I suppose.
In addition, using too many different signals as conditions of your if can affect timing, which may limit your clock frequency if you are working at high speed, and the list goes on: clock skew, the need to constrain signals, etc.

Are muxes more "expensive" than other logic?

This is mostly out of curiosity.
One fragment from some VHDL code that I've been working on recently resembles the following:
led_q <= (pwm_d and ch_ena) when pwm_ena = '1' else ch_ena;
This is a mux-style expression, of course. But it's also equivalent to the following basic logic expression (at least when ignoring non-binary states):
led_q <= ch_ena and (pwm_d or not pwm_ena);
Is one "better" than the other in terms of logic utilisation or efficiency when actually implemented in an FPGA? Is it preferable to use one over the other, or is the compiler smart enough to pick the "best" on its own?
(For the curious, the purpose of the expression is to define the state of an LED: if ch_ena is false it should always be off because the channel is disabled; otherwise it should either be on solidly or flash according to pwm_d, depending on pwm_ena (PWM enable). I think the first form describes this more obviously than the second, although it's not too hard to work out how the second behaves.)
For a simple logical expression, like the one shown, where the synthesis tool can easily create a complete truth table, the expression is likely to be converted to an internal truth table, which is then directly mapped to the available FPGA LUT resources. Since the truth table is identical for the two equivalent expressions, the hardware will also be the same.
However, for complex expressions where a complete truth table can't be generated, e.g. when using arithmetic operations, and/or where dedicated resources are available, the synthesis tool may choose to hold an internal representation that is more closely related to the original VHDL code, and in this case the VHDL coding style can have a great impact on the resulting logic, even for equivalent expressions.
In the end, the implementation is tool specific, so the best way to find out what logic is generated is to try it with the specific tool, especially for large or timing-critical parts of the design, where the implementation matters most.
In general it depends on the target architecture. For Xilinx FPGAs the logic is mostly mapped into LUTs with sporadic use of the hard logic resources where the mapper can make use of them. Every possible LUT configuration has essentially equal performance so there's little benefit to scrutinizing the mapper's work unless you're really pushing the speed limits of the device where you'd be forced into manually instantiating hand-mapped LUTs.
Non-LUT based architectures like the Actel/Microsemi device families use 2-input muxes as the main logic primitive and everything is mapped down to them. You can't generalize what is best across all types of FPGAs and CPLDs but nowadays you can mostly trust that the mapper will do a decent enough job using timing constraints to push it toward the results you need.
With regard to the question, I think it is best to avoid obscure Boolean expressions where possible. They tend to be hard to decipher months later when you have forgotten what you meant them to do. I would lean toward the when-else simply from a code-maintenance point of view. Even for this trivial example you have to think carefully about what behavior it describes, whereas the when-else describes the intended behavior directly in human-level syntax.
HDLs work best when you use the highest abstraction possible and avoid wallowing around with low-level bit twiddling. This is a place where VHDL truly shines if you leverage the more advanced features of the language and move away from describing raw logic everywhere. Let the synthesizer do the work. Introductory learning materials focus on the low level structural gate descriptions and logic expressions because that is easiest for beginners to get a start on but it is not the best way to use VHDL for complex designs in the long run.
Of course there are situations where Booleans are better, particularly when doing bitwise operations across vectors in parallel which requires messy loops to do the same imperatively. It all depends on the context.
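As an illustration of that last point, here is a whole-vector Boolean expression next to its element-wise equivalent; the entity and signal names are invented for the example, and both forms describe the same hardware.

library ieee;
use ieee.std_logic_1164.all;

entity mask_demo is
    port (
        a, b        : in  std_logic_vector(7 downto 0);
        masked_expr : out std_logic_vector(7 downto 0);
        masked_loop : out std_logic_vector(7 downto 0)
    );
end entity;

architecture rtl of mask_demo is
begin
    -- The whole vector handled in a single Boolean expression.
    masked_expr <= a and not b;

    -- The element-wise equivalent written imperatively: same logic,
    -- more verbose.
    process (a, b)
    begin
        for i in a'range loop
            masked_loop(i) <= a(i) and not b(i);
        end loop;
    end process;
end architecture;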

DSP unit usage in VHDL

We are using a tool to convert the code into RTL.
Using those VHDL files, we would like to synthesize the code for an FPGA.
In the synthesis results, we see the following table:
Slice Logic Utilization    Used    Available    Utilization
Number of DSP48E1s           15          864             1%
I would like to search the VHDL files to see which operations use these units.
Is there any way to find them, or any documentation showing which operations cause DSPs to be used?
There are a few ways that a DSP48 may be used in your VHDL.
It may be inferred. This is when the synthesis tool looks at an operation you are doing (such as a multiply) and realizes it would be most efficient to implement it with a dedicated resource (a DSP48) instead of fabric logic.
It may be instantiated. This means the primitive was directly called out in your source file: the designer decided "I know I want to use this piece of hardware, so I am going to call it out explicitly." In this case you can do a text search for "DSP48" in your VHDL source files.
It may be part of a core. If it is part of a core, you may or may not have visibility into that core. For example, how the core is actually implemented may be different than the behavioral model which is used for simulation.
In any case, as recommended by Russell, using the Xilinx toolset to determine the utilization of primitives in the design hierarchy can be a good first pass at figuring out where the units are coming from. Additionally, you can always open up FPGA Editor, see what the DSP48 units are called and what signals go into and out of them, for additional hints on where they sit in your design.
It sounds like you're trying to find your Module Level Utilization. I know that Xilinx ISE supports this. Under Design Overview there's an option called Module Level Utilization that breaks down every module in your VHDL design and tells you where the Regs, LUTs, BRAMs, and DSPs are used.
If you're unable to find it, look for any large multiplications in your design. Large multiply/accumulate operations will synthesize into DSP48s.
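For reference, here is a sketch of the kind of code that typically causes a DSP48 to be inferred: a registered multiply-accumulate on signed operands. The entity name and operand widths are chosen only for illustration; the actual mapping depends on the tool and its settings.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mac_demo is
    port (
        clk : in  std_logic;
        a   : in  signed(17 downto 0);
        b   : in  signed(17 downto 0);
        acc : out signed(47 downto 0)
    );
end entity;

architecture rtl of mac_demo is
    signal acc_reg : signed(47 downto 0) := (others => '0');
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- Multiply-accumulate: fits the DSP48's 18x18 multiplier
            -- and 48-bit adder/accumulator, so tools tend to infer it.
            acc_reg <= acc_reg + (a * b);
        end if;
    end process;

    acc <= acc_reg;
end architecture;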

Driving module output from combinatorial block

Is it a good design practice to use combinatorial logic to drive the output of a module in VHDL/Verilog?
Is it okay to use the module input directly inside a combinatorial block, and use the output of that combinatorial block to drive another sequential block in the same module?
An answer to the two questions really depends on the overall design methodology and conditions, and will be opinion based, as Morgan points out in his comment.
The questions are especially relevant for a large design with timing pushed to the limit, where multiple designers contribute different modules. In this case it is important to settle on a design methodology up front that answers the two questions, in order to ensure that modules provided by different designers can be integrated smoothly without timing issues.
Designing with flip-flops on all outputs of each module gives the advantage that when an output is used as an input to another module, the input timing is reasonably well defined and depends only on the routing delay. This makes it a Yes to question 1.
Having reasonably well-defined input timing makes it possible to place complex combinatorial logic directly on the inputs, since most of the clock cycle is available for it. So this also makes it a Yes to question 2.
With the above Yes/Yes design methodology, the available cycle time is only spent once, at the input side of the module, before the flip-flops on the outputs. The result is that multiple modules click together nicely like LEGO bricks, as shown in the figure below.
If a strict design methodology is not adhered to across modules, then some modules may place flip-flops on the inputs and some on the outputs. A longer cycle time, and thus a slower frequency, is then required, since the worst-case path goes through twice the depth of combinatorial logic. Such a design is shown in the figure below, and should be avoided.
A third option exists, where flip-flops are placed on all inputs; the design will look like the figure below if two different modules use the same output.
One disadvantage of this approach is that the number of flip-flops may be higher, since the same output feeds multiple flip-flops, and the synthesis tool may not merge these equivalent flip-flops. Even more flip-flops may be required if the module that generates the output also has to keep a registered version for internal use, which is often the case.
So the short answer to the questions is: Yes and Yes.
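To make the Yes/Yes style concrete, here is a minimal sketch of a module with combinatorial logic on its inputs and a flip-flop driving its output; the names and the trivial logic are illustrative only.

library ieee;
use ieee.std_logic_1164.all;

entity lego_module is
    port (
        clk   : in  std_logic;
        a, b  : in  std_logic;
        q_out : out std_logic
    );
end entity;

architecture rtl of lego_module is
    signal comb : std_logic;
begin
    -- Combinatorial logic directly on the module inputs (question 2).
    comb <= a xor b;

    -- Flip-flop on the module output (question 1), so the next module
    -- sees a cleanly registered signal.
    process (clk)
    begin
        if rising_edge(clk) then
            q_out <= comb;
        end if;
    end process;
end architecture;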
The answer to both questions as expressed is basically yes, provided the final design meets speed targets, and the input signals are clean.
The problem with blocks designed this way is that the signal timings through them are not accurately defined, so combining several such blocks may result in an absurdly slow design, or one in which fast input signals don't propagate cleanly through the design.
If you design such a circuit, and it meets ALL your input and output timing constraints as well as any clock speed constraints you set, it will work.
However, if it fails to meet the clock constraints, you will have to insert registers to "pipeline" the design, breaking up long, slow chains of combinational logic. And you will have to watch the input and output timings reported by synthesis and PAR, which can get complicated.
In practice (in an FPGA : ASICs can be different) registers are free with each logic block (Xilinx/Altera, not true for Actel/Microsemi) and placing registers on each block's inputs and/or outputs makes the timings much simpler to understand and analyse.
And because such a design is pipelined, it is normally also much faster.
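As a sketch of the pipelining mentioned above, the following splits a four-input sum into two registered stages, so each stage only carries part of the combinational delay; the entity name and operand widths are made up for the example.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity pipelined_sum is
    port (
        clk        : in  std_logic;
        a, b, c, d : in  unsigned(15 downto 0);
        total      : out unsigned(17 downto 0)
    );
end entity;

architecture rtl of pipelined_sum is
    signal sum_ab, sum_cd : unsigned(16 downto 0);
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- Stage 1: two partial sums, registered.
            sum_ab <= resize(a, 17) + resize(b, 17);
            sum_cd <= resize(c, 17) + resize(d, 17);
            -- Stage 2: combine the registered partial results.
            total  <= resize(sum_ab, 18) + resize(sum_cd, 18);
        end if;
    end process;
end architecture;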

How do for loops in Verilog execute?

Do for loops in Verilog execute in parallel? I need to call a module several times, but they have to execute at the same time. Instead of writing them out one by one, I was thinking of using a for loop. Will it work the same?
Verilog describes hardware, so it doesn't make sense to think in terms of executing loops or calling modules in this context. If I understand the intent of your question correctly, you'd like to have multiple instantiations of the same module with distinct inputs and outputs.
To accomplish this you can use Verilog's generate statements to generate the instantiations automatically.
You can also use the auto_template functionality in Emacs' excellent verilog-mode. I prefer this approach as each instantiation appears explicitly in my source code and I find it easier to detect errors.
As jlf answered, you're looking for a generate statement. You would use a for-loop to model combinational logic, such as going through all of the bits in a register and computing an output. This would be in an always block or even an initial block in your testbench.
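As a sketch of what the answers describe, here is a generate-for that elaborates N parallel instances of a leaf module; the module, instance, and signal names are made up for illustration.

// Hypothetical leaf module used only for the example.
module one_bit_adder (
    input  wire a,
    input  wire b,
    output wire sum
);
    assign sum = a ^ b;  // half-adder sum bit; carry omitted for brevity
endmodule

module adder_array #(parameter N = 4) (
    input  wire [N-1:0] a,
    input  wire [N-1:0] b,
    output wire [N-1:0] sum
);
    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : gen_adders
            // Each loop iteration elaborates into its own hardware
            // instance; all N instances operate in parallel.
            one_bit_adder u_add (
                .a   (a[i]),
                .b   (b[i]),
                .sum (sum[i])
            );
        end
    endgenerate
endmodule

The loop runs at elaboration time, not at run time: it stamps out N copies of the hardware, which is why they all operate simultaneously.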
