When to break down VHDL?

Although I'm somewhat proficient in writing VHDL, there's a relatively basic question that needs answering: when should a VHDL design be broken down?
A basic example: say I was designing an 8-bit ALU in VHDL. I have several options for its VHDL implementation.
Simply design the whole ALU as one entity, with all the required I/O in that entity (which can be done thanks to the IEEE_STD_ARITHMETIC library).
--OR--
Break that ALU down into its constituent blocks, say a carry-lookahead adder and some multiplexors.
--OR--
Break that down further into the blocks which make up a carry-lookahead adder: a bunch of partial full adders, a carry path and multiplexors, and then connect them all together using structural elements.
We could then (if we wanted) break all of that right down to gate level, creating entities, behaviours and structures for each.
Of course the further down we break up the ALU the more VHDL files we need.
Does this affect the physical implementation after synthesis and when should we stop breaking things up?

You should keep your VHDL at the highest level of abstraction, so don't ever "break it down" as you described. What you are proposing is that you do the synthesis yourself (like creating a carry-lookahead adder) which is a bad idea. You don't know the target device (FPGA or ASIC library) as well as the synthesizer does and you shouldn't try to tell it what to do. If you want to do an addition, use the + operator and the tools will figure out the best structure that fits your design constraints.
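For example, a behavioural description along these lines (a minimal sketch; the entity name, widths and operation select are purely illustrative) leaves the choice of adder structure entirely to the tools:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Minimal sketch: an 8-bit add/subtract slice described behaviourally.
-- The synthesizer chooses the adder structure (ripple, carry-lookahead,
-- dedicated carry chains, ...) that best meets the design constraints.
entity alu8 is
    port (
        a, b   : in  std_logic_vector(7 downto 0);
        op     : in  std_logic;  -- '0' = add, '1' = subtract (illustrative)
        result : out std_logic_vector(7 downto 0)
    );
end entity alu8;

architecture rtl of alu8 is
begin
    result <= std_logic_vector(unsigned(a) + unsigned(b)) when op = '0'
              else std_logic_vector(unsigned(a) - unsigned(b));
end architecture rtl;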
Dividing the design into many modules will often make it more difficult to optimize your design, since optimizations between modules are generally harder to do than optimizations within modules.
Of course, major functional blocks that have well-defined interfaces between them should be in separate modules for the sake of maintainability and readability. The ALU can be one module, the instruction ROM another, and so forth. These modules have distinct, well-defined functions, and there is not much opportunity for optimization between them. If you want to squeeze out the last bit of optimization available, just flatten the design before optimization and let the tools do the work.

Related

Are muxes more "expensive" than other logic?

This is mostly out of curiosity.
One fragment from some VHDL code that I've been working on recently resembles the following:
led_q <= (pwm_d and ch_ena) when pwm_ena = '1' else ch_ena;
This is a mux-style expression, of course. But it's also equivalent to the following basic logic expression (at least when ignoring non-binary states):
led_q <= ch_ena and (pwm_d or not pwm_ena);
Is one "better" than the other in terms of logic utilisation or efficiency when actually implemented in an FPGA? Is it preferable to use one over the other, or is the compiler smart enough to pick the "best" on its own?
(For the curious, the purpose of the expression is to define the state of an LED: if ch_ena is false it should always be off because the channel is disabled; otherwise it should either be on solidly or flash according to pwm_d, depending on pwm_ena (PWM enable). I think the first form describes this more obviously than the second, although it's not too hard to work out how the second behaves.)
For a simple logical expression, like the one shown, where the synthesis tool can easily create a complete truth table, the expression is likely to be converted to an internal truth table, which is then directly mapped to the available FPGA LUT resources. Since the truth table is identical for the two equivalent expressions, the hardware will also be the same.
However, for complex expressions where a complete truth table can't be generated, e.g. when using arithmetic operations, and/or where dedicated resources are available, the synthesis tool may choose to hold an internal representation that is more closely related to the original VHDL code, and in this case the VHDL coding style can have a great impact on the resulting logic, even for equivalent expressions.
In the end, the implementation is tool specific, so the best way to find out what logic is generated is to try it with the specific tool, especially for large or timing-critical parts of the design where the implementation matters most.
In general it depends on the target architecture. For Xilinx FPGAs the logic is mostly mapped into LUTs, with sporadic use of the hard logic resources where the mapper can make use of them. Every possible LUT configuration has essentially equal performance, so there's little benefit to scrutinizing the mapper's work unless you're really pushing the speed limits of the device, in which case you'd be forced into manually instantiating hand-mapped LUTs.
Non-LUT based architectures like the Actel/Microsemi device families use 2-input muxes as the main logic primitive and everything is mapped down to them. You can't generalize what is best across all types of FPGAs and CPLDs but nowadays you can mostly trust that the mapper will do a decent enough job using timing constraints to push it toward the results you need.
With regard to the question, I think it is best to avoid obscure Boolean expressions where possible. They tend to be hard to decipher months later when you've forgotten what you meant them to do. I would lean toward the when-else form simply from a code-maintenance point of view. Even for this trivial example you have to think carefully about what behavior the Boolean form describes, whereas the when-else describes the intended behavior directly in human-readable terms.
HDLs work best when you use the highest abstraction possible and avoid wallowing around with low-level bit twiddling. This is a place where VHDL truly shines if you leverage the more advanced features of the language and move away from describing raw logic everywhere. Let the synthesizer do the work. Introductory learning materials focus on the low level structural gate descriptions and logic expressions because that is easiest for beginners to get a start on but it is not the best way to use VHDL for complex designs in the long run.
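As a hypothetical illustration (not taken from the original posts), a 4-to-1 multiplexer written as a selected assignment states the intent directly, while its gate-level equivalent synthesizes to the same LUT but is much harder to read:

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: a 4-to-1 multiplexer described by intent.
entity mux4 is
    port (
        d0, d1, d2, d3 : in  std_logic;
        sel            : in  std_logic_vector(1 downto 0);
        y              : out std_logic
    );
end entity mux4;

architecture rtl of mux4 is
begin
    -- High-abstraction form: states what is selected, not how.
    with sel select
        y <= d0 when "00",
             d1 when "01",
             d2 when "10",
             d3 when others;
    -- The gate-level equivalent (a sum of AND terms qualified by sel)
    -- synthesizes to the same LUT but is far harder to read and maintain.
end architecture rtl;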
Of course there are situations where Boolean expressions are better, particularly bitwise operations applied across whole vectors in parallel, which would need messy loops to express any other way. It all depends on the context.

VHDL behavioural vs structural performance

I was wondering whether, in terms of "performance", there's some kind of difference between VHDL structural and behavioural descriptions. I know that nowadays it is more common to write behavioural rather than structural code, but since I'd like to understand the performance implications, I have been thinking that maybe there's some difference...
There is no hardware-related reason to prefer one form or the other.
It may be that one form leads to faster simulation than the other; I haven't seen any evidence for this in general, but then I haven't looked. It is true that after synthesis, the design is translated into a structural form, and post-synthesis simulation is slow, but this is due to the sheer size of the resulting structural version expressed as thousands of individual gates and their interconnections.
What matters more is the quality of synthesis results: It should be possible to write a design in both forms and have it synthesise to essentially the same hardware. And this seems to be generally true, in my experience.
Sometimes you will find synthesis tools have difficulty efficiently translating a construct (usually behavioural) but not as frequently as in the past.
What matters most (unless you are pushing the boundaries of speed or FPGA size) is clarity, leading to readability, reliability, efficiency, testability, maintainability and so on. If you can't understand it you can't see the inefficiencies or even test it properly.
Here, structural VHDL has a role to play at the top level: dividing a system into blocks like CPU, memory interface, FFT processor, UART, SPI and so on. Sometimes hierarchically, so you might want to divide the memory interface into refresh logic, error correction, address multiplexing, and so on.
But most blocks (for example, tasks that a single state machine can handle) are clearest and simplest when expressed behaviourally. So in a UART you might have two separate processes for TX and RX, while an SPI interface (which sends and receives on the same SPI clock) is probably best as a single behavioural process.
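As a rough sketch of that UART structure (the entity, ports and state names are invented for illustration, and the internals are elided):

-- Rough sketch only: assumes an entity uart with at least clk, tx_start and
-- rxd ports; state machine internals are elided. The point is simply that
-- the independent TX and RX halves each get their own clocked process.
architecture behavioural of uart is
    type tx_state_t is (TX_IDLE, TX_START, TX_DATA, TX_STOP);
    type rx_state_t is (RX_IDLE, RX_START, RX_DATA, RX_STOP);
    signal tx_state : tx_state_t := TX_IDLE;
    signal rx_state : rx_state_t := RX_IDLE;
begin
    tx_proc : process(clk)
    begin
        if rising_edge(clk) then
            case tx_state is
                when TX_IDLE => if tx_start = '1' then tx_state <= TX_START; end if;
                when others  => null;  -- shift out start, data and stop bits here
            end case;
        end if;
    end process tx_proc;

    rx_proc : process(clk)
    begin
        if rising_edge(clk) then
            case rx_state is
                when RX_IDLE => if rxd = '0' then rx_state <= RX_START; end if;
                when others  => null;  -- sample and shift in received bits here
            end case;
        end if;
    end process rx_proc;
end architecture behavioural;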

Driving module output from combinatorial block

Is it a good design practice to use combinatorial logic to drive the output of a module in VHDL/Verilog?
Is it okay to use the module input directly inside a combinatorial block, and use the output of that combinatorial block to drive another sequential block in the same module?
An answer to the two questions really depends on the overall design methodology and conditions, and will be opinion based, as Morgan points out in his comment.
The questions are especially relevant for a large design with timing pushed to the limit, and where multiple designers contribute different modules. In this case it is important to settle on a design methodology up front which answers the two questions, in order to ensure that modules provided by different designers can be integrated smoothly without timing issues.
Designing with flip-flops on all outputs of each module gives the advantage that when an output is used as an input to another module, the input timing is reasonably well defined and depends only on the routing delay. This makes it a Yes to question 1.
Having reasonably well-defined input timing makes it possible to place complex combinatorial logic directly on the inputs, since most of the clock cycle will be available for it. So this also makes it a Yes to question 2.
With the above Yes/Yes design methodology, the available cycle time is only used once, at the input side of the module, before the flip-flops on the output. The result is that multiple modules will click together nicely like LEGO bricks, as shown in the figure below.
If a strict design methodology is not adhered to across modules, then some modules may place flip-flops on the input and some on the output. A longer cycle time, and thus a slower clock frequency, is then required, since the worst-case path goes through twice the depth of combinatorial logic. Such a design is shown in the figure below, and should be avoided.
A third option exists, where flip-flops are placed on all inputs, and the
design will look like the figure below if two different modules use the same
output.
One disadvantage with this approach is that the number of flip-flops may be
higher, since the same output is used as input to multiple flip-flops, and the
synthesis tool may not combine these equivalent flip-flops. And even more
flip-flops than this may be required, if the module that generates the output
will also have to make a flip-flopped version for internal use, which is often
the case.
So the short answer to the questions is: Yes and Yes.
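As a minimal sketch of the Yes/Yes style described above (names purely illustrative): the inputs, already registered by the modules driving them, feed combinatorial logic directly, and the result is captured in an output register:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Minimal sketch of the Yes/Yes methodology (names are illustrative only):
-- inputs feed combinatorial logic directly, outputs come straight from flip-flops.
entity lego_block is
    port (
        clk  : in  std_logic;
        a, b : in  std_logic_vector(7 downto 0);  -- assumed registered in the driving modules
        q    : out std_logic_vector(7 downto 0)   -- registered output
    );
end entity lego_block;

architecture rtl of lego_block is
begin
    process(clk)
    begin
        if rising_edge(clk) then
            -- Combinatorial logic sits on the inputs; the result is captured in
            -- the output register, so the next module again sees only a flip-flop
            -- output plus routing delay.
            q <= std_logic_vector(unsigned(a) + unsigned(b));
        end if;
    end process;
end architecture rtl;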
The answer to both questions as expressed is basically yes, provided the final design meets speed targets, and the input signals are clean.
The problem with blocks designed this way is that the signal timings through them are not accurately defined, so combining several such blocks may result in an absurdly slow design, or one in which fast input signals don't propagate cleanly through the design.
If you design such a circuit, and it meets ALL your input and output timing constraints as well as any clock speed constraints you set, it will work.
However if it fails to meet the clock constraints you will have to insert registers to "pipeline" the design, breaking up long slow chains of combinational logic. And you will have to observe the input and output timings reported by synthesis and PAR, and they can get complicated.
In practice (in an FPGA : ASICs can be different) registers are free with each logic block (Xilinx/Altera, not true for Actel/Microsemi) and placing registers on each block's inputs and/or outputs makes the timings much simpler to understand and analyse.
And because such a design is pipelined, it is normally also much faster.
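For example, pipelining a long combinational chain can be as simple as registering an intermediate result (a hypothetical fragment; the signals are assumed to be declared elsewhere):

-- Hypothetical fragment: signals a, b, c, stage1 and y are assumed to be
-- declared elsewhere as unsigned. Registering the intermediate result means
-- each stage only has to fit within one clock period, at the cost of latency.
process(clk)
begin
    if rising_edge(clk) then
        stage1 <= a + b;       -- first pipeline stage
        y      <= stage1 + c;  -- second stage; the result lags the inputs by two clocks
    end if;
end process;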

VHDL IEEE standard lib vs. component

I'm working on a VHDL project for a Xilinx FPGA, and found myself at a crossroads.
I need, for example, to add two signals (C=A+B), and found that Xilinx has a tool that can generate a component which will do the job.
But this could also be implemented in standard VHDL: C <= A + B
If I use standard VHDL the code should be portable, but does this have lower throughput?
I mean, do the special components use DSP functions inside the FPGA etc., which makes them faster, or can the synthesizer normally handle this?
Any time you can infer something, do so.
Performance is very rarely impacted, especially in the case of simple things like adders and multipliers. Even RAM blocks are easy to infer for many purposes - I only tend to instantiate vendor components if I need a very specific vendor-block behaviour.
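For instance, a synchronous RAM written in this generic style (names and sizes are illustrative only) is normally recognized and mapped to block RAM:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Generic sketch of an inferable RAM (names and sizes are illustrative only).
-- Most synthesizers recognize this pattern and map it to block RAM.
entity simple_ram is
    port (
        clk  : in  std_logic;
        we   : in  std_logic;
        addr : in  std_logic_vector(9 downto 0);
        din  : in  std_logic_vector(15 downto 0);
        dout : out std_logic_vector(15 downto 0)
    );
end entity simple_ram;

architecture rtl of simple_ram is
    type ram_t is array (0 to 1023) of std_logic_vector(15 downto 0);
    signal ram : ram_t;
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if we = '1' then
                ram(to_integer(unsigned(addr))) <= din;
            end if;
            dout <= ram(to_integer(unsigned(addr)));  -- synchronous read helps block-RAM mapping
        end if;
    end process;
end architecture rtl;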
The DSP blocks will be used well if you write VHDL code for adders and multipliers of the appropriate widths (or smaller) with pipelining that matches what is available inside the block. If you want to be more advanced and use (for example) Xilinx's DSP block's opcode input then you will have to instantiate a block.
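As an illustrative sketch, a signed multiply with registered inputs, a pipeline stage and a registered output, at widths that fit a typical DSP slice, will usually be packed into the DSP block together with its registers:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative sketch: an 18x18 signed multiplier with registered inputs,
-- a pipeline register and a registered output. The widths roughly match a
-- typical DSP slice, so the tools can absorb the registers into the block.
entity mult18 is
    port (
        clk : in  std_logic;
        a   : in  signed(17 downto 0);
        b   : in  signed(17 downto 0);
        p   : out signed(35 downto 0)
    );
end entity mult18;

architecture rtl of mult18 is
    signal a_r, b_r : signed(17 downto 0);
    signal p_r      : signed(35 downto 0);
begin
    process(clk)
    begin
        if rising_edge(clk) then
            a_r <= a;          -- input registers
            b_r <= b;
            p_r <= a_r * b_r;  -- pipeline register after the multiply
            p   <= p_r;        -- output register
        end if;
    end process;
end architecture rtl;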

Large Scale VHDL modularization techniques

I'm thinking about implementing a 16-bit CPU in VHDL.
A simplish CPU.
ADD, MULS, NEG, bit shift, JUMP, relative jump, BREQ, relative BREQ... I don't know, something along these lines.
Probably all working only with 16-bit operands.
I might even cut it down and use only a single operand and an accumulator.
With some status registers: Carry, Zero, Neg (unless I use an accumulator).
I know how to design all the parts from logic gates, and plan to build them up from first principles.
So for my ALU I'll need to 'build' an adder, probably a carry-lookahead group adder.
This adder itself is made up of a couple of parts, which are themselves made up of a couple of parts.
Anyway, my problem is not the CPU design, or the VHDL (I know the language, more or less).
It's how I should keep things organised.
How should I use packages?
How should I name my processes and port maps? (I've never seen the benefit of naming the port maps, or processes.)
Whatever you do, be sure to read Jiri Gaisler's masterwork on the structured VHDL design method.
http://www.gaisler.com/doc/vhdl2proc.pdf
http://www.gaisler.com/doc/structdes.pdf
You'll be very glad you did.
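A minimal sketch of the two-process style those papers advocate, as I understand it (the entity 'counter', its ports and the record contents are invented for illustration):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Minimal sketch of the two-process style: all state lives in one record,
-- a combinational process computes the next value, and one clocked process
-- does nothing but register it. Assumes an entity counter with clk, enable
-- and q ports.
architecture two_proc of counter is
    type reg_t is record
        count : unsigned(15 downto 0);
    end record;
    signal r, rin : reg_t;
begin
    comb : process(r, enable)
        variable v : reg_t;
    begin
        v := r;
        if enable = '1' then
            v.count := r.count + 1;
        end if;
        rin <= v;
        q   <= std_logic_vector(r.count);
    end process comb;

    regs : process(clk)
    begin
        if rising_edge(clk) then
            r <= rin;
        end if;
    end process regs;
end architecture two_proc;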
Looking at some existing examples wouldn't hurt. At the level you're talking about (naming conventions and such) I've never really done much different in hardware design than in software.
As an aside, I'd generally advise against doing things like your own adders and such, unless it's required because it's homework, or something like that. With FPGAs and (to a slightly lesser extent) ASICs, you have an existing "library" of hardware in the device, so something like A <= B + c will typically use an adder circuit that's already built into the device in the case of an FPGA, or a hand-optimized hard macro in the case of an ASIC.
Writing your own will take a fair amount of extra work, and it'll almost always produce a worse result. In the case of an ASIC, it'll be a little worse; in the case of an FPGA, it'll usually be quite a bit worse.
Edit: I should also note that a simple CPU doesn't really qualify as a large-scale design, at least IMO. Maybe it's due to my background in software, but I've always found CPU design fairly straightforward. Just for one example, the one time I did a DRAM controller, it seemed like a lot more work to me. I don't recall anything like source code line counts, but based on memory, I'd say it was larger (probably by something like 2x). Of course, it'll depend on exactly how simple of a CPU you decide on too...
