Estimating area required by a VHDL implementation

I've got a few VHDL files, which I can compile with ghdl on Debian. The same files have been adapted by others for an ASIC implementation. There's one "large area" implementation and one "compact" implementation for an algorithm. I'd like to write some more implementations, but to evaluate them I'd need to be able to compare how much area the different implementations would take.
I'd like to do the evaluation without installing any proprietary compilers or obtaining any hardware. A sufficient evaluation criterion would be an estimate of GE (gate equivalent) area, or the number of logic slices needed by some FPGA implementation.

Start by counting the flip-flops (FFs). Their number is (almost) uniquely defined by the RTL code that you have written. With some experience, you can get this number by inspecting the code.
Typically, there is a good correlation between the #FFs and the overall area. An old rule of thumb is that for many designs, the combinatorial area will be about the same as the sequential area. For example, suppose the area count of a flip-flop is 10 gates in a gate-array technology; then #FFs * 20 would give you an initial estimate.
Of course, the design characteristics have a significant influence. For datapath-oriented designs, the combinatorial area will be relatively larger. For control-oriented designs, the opposite is true. For standard-cell designs, the sequential area may be smaller because FFs are more efficient. For timing-critical designs, the combinatorial area may be much larger as a result of timing optimization by the synthesis tool.
Therefore, the remaining issue is to find out a good multiplication factor for your type of designs and target technology. The strategy could be to carry out some experiments, look at prior design results, or ask others. From then on, estimating is a matter of multiplying the #FFs, known from your code, by that factor.
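As a hypothetical worked example (the entity, its 8-bit width, and the 20-gates-per-FF factor are all just illustrative), the accumulator below infers one flip-flop per bit of the registered signal, so the FF count can be read straight off the declaration:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity acc8 is
  port (clk, rst : in  std_logic;
        d        : in  unsigned(7 downto 0);
        q        : out unsigned(7 downto 0));
end entity;

architecture rtl of acc8 is
  signal acc : unsigned(7 downto 0);  -- 8 bits registered below -> 8 FFs
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        acc <= (others => '0');
      else
        acc <= acc + d;  -- the adder contributes the combinatorial area
      end if;
    end if;
  end process;
  q <= acc;
end architecture;

With the rule of thumb above, 8 FFs * 20 gates/FF gives roughly 160 gate equivalents for this block.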

I'd like to do the evaluation without installing any proprietary compilers or obtaining any hardware.
Inspection will give you a rough idea, but with all the optimisations that occur during synthesis you may find this level of accuracy too far removed from the end result.
I would suggest that you re-examine your reasons for avoiding "proprietary compilers" to perform the evaluation. I'm unaware of any non-proprietary synthesis tools for VHDL (though it has been discussed). The popular FPGA vendors provide free versions of their software for Windows and Linux which you could use to obtain accurate counts of resource usage. It should be feasible to translate the FPGA resource usage into something more meaningful for your target technology.
I'm not very familiar with the ASIC world but again there may be free (but proprietary) tools available for you to use.

Related

Look-Up Table division synthesizable in an ASIC/FPGA design? Does it make any sense?

I have been studying ways to make an efficient FPGA project (intended eventually to become an ASIC design) which includes division operations on simple 32-bit binary numbers.
I have found that the most expedient way to do it is to use a LUT (look-up table) rather than generating complex division logic. That's fine; however, when I think about an ASIC I imagine a physical microchip with digital logic inside, and I can't imagine putting a whole table in there to produce the division. I can understand that it makes sense in an FPGA, which has a lot of resources including on-chip memory etc., but not on a final ASIC.
My question is: is a LUT actually synthesizable in an ASIC design? Is this how chips that need a division operation are in fact made?
Also, does a LUT consume less area than creating a division module?
I am quite a noob at this, so thank you for your input.
General integer division is implemented as an iterative process, where each iteration generates a number of result bits based on either subtraction or table lookup, similar to how you did long division on paper back in school. Special cases of integer division can be simpler: if the numbers have few digits, a lookup table may be used instead, and if the divisor is a power of two (2^n), simple shifting may be used, possibly combined with an addition for rounding. So the actual implementation of division depends on the arguments and on the speed/size requirements.
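As a minimal sketch of the 2^n case (the 32-bit width and divisor of 16 are arbitrary choices), rounding division by a power of two reduces to an add and a shift:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity div_pow2 is
  port (a : in  unsigned(31 downto 0);
        q : out unsigned(31 downto 0));
end entity;

architecture rtl of div_pow2 is
begin
  -- q = round(a / 16): add half the divisor, then shift right by 4
  -- (the sum wraps if a is within 8 of the maximum value)
  q <= shift_right(a + 8, 4);
end architecture;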
Regarding your FPGA-to-ASIC conversion: the LUTs in the FPGA are just a flexible way to implement general-purpose combinational circuits, since a 4-input LUT, for example, can implement any 4-input function. When you synthesize logical expressions for an FPGA, the result is a LUT representation, since those are the building blocks available in an FPGA; when you synthesize the same expressions for an ASIC, the result is usually a discrete-gate representation, since those are the building blocks available in an ASIC. The ASIC implementation is smaller and faster (in the same technology), since the general-purpose LUT overhead is avoided, but at the loss of the FPGA's flexibility.
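To make the LUT point concrete, here is a minimal sketch of a table-based divider (the 6-bit dividend and fixed divisor of 10 are arbitrary choices for illustration). On an FPGA the constant array maps onto LUTs or block RAM; an ASIC synthesizer flattens the same table into ordinary gates:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity div10_lut is
  port (a : in  unsigned(5 downto 0);   -- dividend, 0..63
        q : out unsigned(2 downto 0));  -- quotient a/10, 0..6
end entity;

architecture rtl of div10_lut is
  type rom_t is array (0 to 63) of unsigned(2 downto 0);
  -- Precompute the whole table at elaboration time.
  function init_rom return rom_t is
    variable r : rom_t;
  begin
    for i in rom_t'range loop
      r(i) := to_unsigned(i / 10, 3);
    end loop;
    return r;
  end function;
  constant ROM : rom_t := init_rom;
begin
  q <= ROM(to_integer(a));
end architecture;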
Synthesis has become popular among FPGA designers. Understanding a LUT-based architecture all the way down comes to transistor-level design techniques, which require a specific set of skills.
I personally work from the Verilog netlist file produced with the netgen command. You can also look into FPGA LUT architecture optimization.

Are muxes more "expensive" than other logic?

This is mostly out of curiosity.
One fragment from some VHDL code that I've been working on recently resembles the following:
led_q <= (pwm_d and ch_ena) when pwm_ena = '1' else ch_ena;
This is a mux-style expression, of course. But it's also equivalent to the following basic logic expression (at least when ignoring non-binary states):
led_q <= ch_ena and (pwm_d or not pwm_ena);
Is one "better" than the other in terms of logic utilisation or efficiency when actually implemented in an FPGA? Is it preferable to use one over the other, or is the compiler smart enough to pick the "best" on its own?
(For the curious, the purpose of the expression is to define the state of an LED -- if ch_ena is false it should always be off, as the channel is disabled; otherwise it should either be on solidly or flash according to pwm_d, depending on pwm_ena (PWM enable). I think the first form describes this more obviously than the second, although it's not too hard to realise how the second behaves.)
For a simple logical expression, like the one shown, where the synthesis tool can easily create a complete truth table, the expression is likely to be converted to an internal truth table, which is then directly mapped to the available FPGA LUT resources. Since the truth table is identical for the two equivalent expressions, the hardware will also be the same.
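For instance, writing out the shared truth table of the two expressions from the question makes it clear why they must synthesize identically ('-' marks a don't-care):

pwm_ena pwm_d ch_ena | led_q
   0      -     0    |   0
   0      -     1    |   1
   1      0     -    |   0
   1      1     0    |   0
   1      1     1    |   1

Both the when-else form and the Boolean form produce exactly this table, so on a LUT-based FPGA both fit in a single 3-input LUT.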
However, for complex expressions where a complete truth table can't be generated, e.g. when using arithmetic operations, and/or where dedicated resources are available, the synthesis tool may choose to hold an internal representation that is more closely related to the original VHDL code, and in this case the VHDL coding style can have a great impact on the resulting logic, even for equivalent expressions.
In the end, the implementation is tool specific, so the best way to find out what logic is generated is to try it with the specific tool, especially for large or timing-critical parts of the design, where the implementation is critical.
In general it depends on the target architecture. For Xilinx FPGAs the logic is mostly mapped into LUTs with sporadic use of the hard logic resources where the mapper can make use of them. Every possible LUT configuration has essentially equal performance so there's little benefit to scrutinizing the mapper's work unless you're really pushing the speed limits of the device where you'd be forced into manually instantiating hand-mapped LUTs.
Non-LUT based architectures like the Actel/Microsemi device families use 2-input muxes as the main logic primitive and everything is mapped down to them. You can't generalize what is best across all types of FPGAs and CPLDs but nowadays you can mostly trust that the mapper will do a decent enough job using timing constraints to push it toward the results you need.
With regard to the question, I think it is best to avoid obscure Boolean expressions where possible. They tend to be hard to decipher months later when you've forgotten what you meant them to do. I would lean toward the when-else simply from a code-maintenance point of view. Even for this trivial example you have to think closely about what behavior the Boolean form describes, whereas the when-else describes the intended behavior directly in human-level syntax.
HDLs work best when you use the highest abstraction possible and avoid wallowing around with low-level bit twiddling. This is a place where VHDL truly shines if you leverage the more advanced features of the language and move away from describing raw logic everywhere. Let the synthesizer do the work. Introductory learning materials focus on the low level structural gate descriptions and logic expressions because that is easiest for beginners to get a start on but it is not the best way to use VHDL for complex designs in the long run.
Of course there are situations where Booleans are better, particularly when doing bitwise operations across vectors in parallel which requires messy loops to do the same imperatively. It all depends on the context.
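A small illustration of that last point (names invented): the whole-vector Boolean form and the imperative loop describe the same hardware, but one line replaces five:

library ieee;
use ieee.std_logic_1164.all;

entity vec_and is
  port (a, b   : in  std_logic_vector(31 downto 0);
        y_vec  : out std_logic_vector(31 downto 0);
        y_loop : out std_logic_vector(31 downto 0));
end entity;

architecture rtl of vec_and is
begin
  -- Whole-vector form: one concurrent assignment covers all 32 bits.
  y_vec <= a and b;

  -- The same function written imperatively, bit by bit.
  process (a, b)
  begin
    for i in a'range loop
      y_loop(i) <= a(i) and b(i);
    end loop;
  end process;
end architecture;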

VHDL behavioural vs structural performance

I was wondering whether, in terms of "performance", there's some kind of difference between VHDL structural and behavioural styles. I know that nowadays it is more common to write behavioural rather than structural code, but since I'd like an understanding in terms of performance, I have been thinking that maybe there's some difference...
There is no hardware-related reason to prefer one form or the other.
It may be that one form leads to faster simulation than the other; I haven't seen any evidence for this in general, but then I haven't looked. It is true that after synthesis, the design is translated into a structural form, and post-synthesis simulation is slow, but this is due to the sheer size of the resulting structural version expressed as thousands of individual gates and their interconnections.
What matters more is the quality of synthesis results: It should be possible to write a design in both forms and have it synthesise to essentially the same hardware. And this seems to be generally true, in my experience.
Sometimes you will find synthesis tools have difficulty efficiently translating a construct (usually behavioural) but not as frequently as in the past.
What matters most (unless you are pushing the boundaries of speed or FPGA size) is clarity, leading to readability, reliability, efficiency, testability, maintainability and so on. If you can't understand it you can't see the inefficiencies or even test it properly.
Here, structural VHDL has a role to play at the top level: dividing a system into blocks like CPU, memory interface, FFT processor, UART, SPI and so on. Sometimes this is hierarchical, so you might want to divide the memory interface into refresh logic, error correction, address multiplexing, and so on.
But most blocks - for example, tasks that a single state machine can handle - are clearest and simplest when expressed behaviourally. So in a UART you might have two separate processes for TX and RX, while an SPI interface (which sends and receives on the same SPI clock) is probably best written as a single behavioural process.
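A minimal sketch of that split, with invented entity and signal names (work.controller and work.uart_tx are assumed to be written elsewhere): the top level stays purely structural wiring, and the behaviour lives inside the blocks:

library ieee;
use ieee.std_logic_1164.all;

entity system is
  port (clk : in std_logic; tx_pin : out std_logic);
end entity;

architecture top of system is
  signal tx_data : std_logic_vector(7 downto 0);
  signal tx_go   : std_logic;
begin
  -- Structural top level: nothing here but instances and wiring.
  u_ctrl : entity work.controller
    port map (clk => clk, data => tx_data, go => tx_go);

  u_uart : entity work.uart_tx
    port map (clk => clk, data => tx_data, go => tx_go, tx => tx_pin);
end architecture;

Each instantiated block would then contain its own behavioural process or processes, as described above.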

Implementations of algorithms for evaluating circuits

Consider the problem of circuit evaluation, where the input is a boolean circuit C and an input string x and you want to compute C(x). (Assume fan-in 2 if you like.)
This is a 'trivial' problem algorithmically; however, it appears non-trivial to implement when C can be huge (think several million gates) and memory management becomes an issue.
There are several ways this problem can be approached, trading off memory, time, and disc access. But before going through all this work myself, does anyone know of any existing implementations of algorithms for this problem? It would be surprising to me if none exist...
For C/C++, the standard digital circuit design & simulation system for more than 10 years now has been SystemC.
It is a library that allows you to design digital logic in C++. There is supporting software that allows you to do timing analysis and even generate a schematic netlist from the C++ code.
I've only played with it a little before deciding that I was more comfortable with Verilog. But it is a mature piece of software with lots of industry support. Googling around will yield a lot of information, including several tutorial pages.
It sounds like Binary Decision Diagrams could be used for your task. There are well-known algorithms for (and implementations of) these, and they are very compact in terms of memory usage, since they are designed to be used on huge state spaces.

Comparing FPGA with ASIC design

I have a fundamental question. I produced an FPGA image for a media application, and now I would like to compare my results against an ASIC implementation of the same algorithm in terms of performance & area. I have heard such a comparison does not make sense, since it is somewhat like comparing apples and oranges. But I have heard about the gate-equivalence metric; can't I use this for comparison purposes?
Thanks
As has been pointed out, gate equivalents are only a rough guesstimate and not all that accurate for determining area in an ASIC. There are a number of ways you can go about finding out how your design would perform (and what it would cost) in an ASIC. You likely used an HDL (VHDL or Verilog) to implement your design. If you have access to a synthesis tool like Synopsys' Design Compiler (DC), you can use it with one of the supplied ASIC vendor libraries to determine area. You can also use it to generate a post-synthesis, gate-level netlist that you can use in simulation to determine performance. DC will also give you information about critical-path timing, etc., which can be used to calculate performance as well.
However, DC is a very expensive product and you likely used FPGA vendor supplied tools to synthesize your HDL design. You could approach an ASIC vendor and ask them to analyze your design to determine size & performance (they would likely use DC - you'd have to be willing to hand your HDL over to them). They may be inclined to do this in order to win your business. But as has been pointed out ASIC NREs are very expensive, so unless you have a high-volume product it probably doesn't make sense to move your design to an ASIC.
The gate equivalence metric might get you to within an order of magnitude - if that's good enough for you? The problem is that a 4-input LUT can implement a single AND gate, or a complex 4-input function representing several gates. Or (in a Xilinx chip) it can be a shift register with 16 bits of memory in it. And it has a flip-flop attached to its output (with attendant control signals and the like... another few gates). And if you've used block memory or the DSP blocks, they are even harder to quantify.
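For instance, this ordinary-looking piece of VHDL (names invented) describes a 16-bit shift register. A Xilinx tool may pack it into a single LUT acting as an SRL16, while a gate-equivalent count would see roughly 16 flip-flops' worth of area, which is exactly why the conversion is so loose:

library ieee;
use ieee.std_logic_1164.all;

entity srl16_demo is
  port (clk, din : in std_logic; dout : out std_logic);
end entity;

architecture rtl of srl16_demo is
  signal sr : std_logic_vector(15 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      sr <= sr(14 downto 0) & din;  -- 16-bit shift register
    end if;
  end process;
  dout <= sr(15);
end architecture;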
When you say you want to compare performance and area, do you really mean "cost"? Is this a potential product with millions of units sold, or "just" a few 10s of thousands? ASIC NRE is big!
You can optimise your FPGA design for cost as well, which might be good enough, depending on your volumes. For example, an image-processing design done in a traditional fashion can be 10x bigger than one designed for seriously small FPGA usage, with similar application performance... if you know what you're doing :)
