I am trying to reduce the number of logic elements in my vhdl code. I am using quartus II to program a Altera DE2 FPGA. Can someone please give some advice on how I can do that ?
Thanks
Without additional detail of your design, only generic advice can be given.
There are many ways to reduce device utilization in an FPGA, which break down into two major categories:
Better use of your build toolset (synthesis, map, p&r tools)
Better HDL design
Build Toolset Areas to Look For
Set tool to optimize for area instead of speed
Enable tool to allow resource sharing, retiming, and pipelining (as available and appropriate)
Are your constraints being properly applied to your design? If not, the tools could be "working harder" in order to meet your constraints creating more logic/area utilization.
HDL Design Areas to Look For
Consider your target device's architecture. Can you make use of device specific features to save on general logic? (examples: internal block memory for large LUTs, FIFOs, RAMs/ROMs, dedicated multipliers, etc)
Use the tool output to determine areas to optimize in your HDL design. Look at your RTL and technology views. Analyze your critical paths. Are there places where trades could be made?
Look at HDL coding guidelines published by Altera for their synthesis tools. Does your code implementation match recommendations made in documentation in order to gain best synthesis results?
If you have more specific concerns, please add an update.
Check out the relevant chapter of the Quartus II Handbook: Area and Timing Optimization (Vol 2, Ch 13)
Related
I have completed on a project making physical logic gates and am now looking for a way to turn an arbitrary program into some series of logic gates so I can use them.
I need a program that can take some arbitrary function (say f= x^2 -1) directly into some series of logic gates. Does this already exist?
I have found Verilog and several other open source options but they do not appear to output circuit diagrams. There is also Quartus II and other programs which will convert VHDL code into a schematic.
Preferably I would like something that compiles Python/C++ to a logic gate schematic directly - but really any language is fine.
Thanks.
EDIT: I do mean physical gates - they use ball bearings!
I have all of the 1-bit-input/1-bit-output gates and all of the 2-bit-input/1-bit output-gates. From these I can also construct a MAJ gate to do error correction.
Write your arbitrary code in VHDL, turning VHDL into gates is what a synthesis tool does.
Not everything you write will be synthesisable; file handling can't be translated into gates, neither can anything that conventionally uses the heap (such as malloc, or pointers, in C, or "new" and access types in VHDL). Floating point can be synthesised (with VHDL-2008) but it's not quite as simple as signal A : Real; A <= 2.0 * X * X - 1.0; you have to use types from the synthesisable floating point library. So there may be some negotiation with the tool about the programming language subset you can use.
But I'm sensing a slightly different question here : how do I translate arbitrary code into MY logic gates, implemented in (technology not described in the question). And that's harder to answer.
Synthesis tools usually come from a vendor such as an FPGA vendor, and they translate arbitrary code into that vendor's logic gates, not yours.
The ideal solution is to create a library describing your logic technology, which plugs into a vendor-neutral synthesis tool, such as Synplicity from Synopsys. Then Synplicity can synthesise to your technology instead of an FPGA vendor's.
OK, but creating that library is likely to be a task roughly on a par with writing a backend for a custom CPU's instruction set, and integrating that backend into gcc. Except that Synplicity, unlike gcc, isn't open source, so without considerable financial and technical resources, and help and internal documentation from a major EDA tool company, it's approximately impossible. (At this point I'd be delighted to be corrected by someone who's actually done it)
EDIT (nearly 7 years later!)
GHDL (open source VHDL simulator) available from Github has grown a synthesis branch, so this route is now likely to be easier (I'll stop short of saying easy!) or may be a better choice than a proprietary toolset for the route below. It has a tie-in to the YOSYS open source synthesis suite. CAVEAT : I haven't tried it, but it may be worth a look for future viewers of this question.
(end edit)
So we need a different approach.
Back to the FPGA synth tools : I'll use Xilinx XST as an example. It'll synth to Xilinx primitives, in some Xilinx closed internal format.
However there's also an option labelled "Write post-synthesis netlist".
Using that, you can get a structural VHDL file representing the translation of your arbitrary code into Xilinx gates, flip-flops, and other elements, which are drawn from the Simprims library.
Attempt to compile that (e.g. in a simulator) without making the Simprims library visible (e.g. with the library Simprims; use Simprims; statements commented out and you'll get something valuable:
a list of compilation errors
Not so valuable, you may think : except that it's actually a list of gates, flipflops and other elements that are needed to implement your design.
If you can find (or create) a one-to-one correspondence between these, and elements in your chosen technology, then you can map this netlist to your technology.
If there are Simprims elements for which you don't have equivalents, you need to implement them - e.g. by creating 3-input AND gates from networks of the 2-input NAND gates you have.
Use the corresponding elements from your own (VHDL) library, instead of Simprims, and you should, in theory, have a usable compiled design.
(You're on your own with problems like layout, routing, timing analysis. This is not trivial, but for the sake of this question I'm assuming your "project making physical gates" has these covered...)
Firstly, logic synthesis is basic logic optimization of logic circuits, or the transformation of structural logic circuits into a data structure representing logic circuits. See https://stackoverflow.com/a/60535990/1531728 for resources about logic synthesis.
Secondly, to transform computer programs (e.g., in Python, C, C++, or otherwise) into logic circuits, you would need to perform high-level synthesis (or behavioral synthesis) to transform the computer program by parsing it into a control and data flow graph (CDFG, or a pair of control flow graph and a dataflow graph), optimize the CDFG, and subsequently transform this CDFG into a logic circuit for logic synthesis. Once you have the logic circuit, you still need to perform physical design (e.g., floorplanning, placement, and routing) to map that logic circuit into a layout for tapeout (standard-cell -based digital integrated circuit design) or onto a field-programmable gate array (FPGA).
The references below provide literature reviews into the topics I mentioned.
#book{Lavagno2016a,
Address = {Boca Raton, {FL}},
Author = {Luciano Lavagno and Igor L. Markov and Grant Martin and Louis K. Scheffer},
Doi = {https://dx.doi.org/10.1201/b19569},
Edition = {Second},
Publisher = {{CRC} Press},
Series = {Electronic Design Automation for Integrated Circuits Handbook},
Title = {Electronic Design Automation for {IC} System Design, Verification, and Testing},
Volume = {1},
Year = {2016}}
#book{Lavagno2016,
Address = {Boca Raton, {FL}},
Author = {Luciano Lavagno and Igor L. Markov and Grant Martin and Louis K. Scheffer},
Doi = {https://dx.doi.org/10.1201/b19714},
Edition = {Second},
Publisher = {{CRC} Press},
Series = {Electronic Design Automation for Integrated Circuits Handbook},
Title = {Electronic Design Automation for {IC} Implementation, Circuit Design, and Process Technology},
Volume = {2},
Year = {2016}}
See https://github.com/lsils/lstools-showcase for the state-of-the-art logic synthesis libraries from EPFL (Ecole Polytechnique Fédérale de Lausanne) in Lausanne, Switzerland.
You can use these tools to perform logic synthesis. They are programs, commercial high-level synthesis software, that transform C or C++ code into Verilog (or some other hardware description language, HDL), and subsequently into logic circuits.
You can also consider Python-based HDL (such as PyMTL, PyRTL, or MyHDL), Scala-based HDL (e.g., Chisel HDL), or Haskell-based HDL (e.g., Clash). These HDLs provide a workaround to your goal of transforming computer programs into logic circuits.
We are using a tool to convert the code into RTL.
Using those VHDL files, we would like to synthesis the code using FPGA.
In the synthesis results, we see the following table:
Slice Logic Utilization Used Available Utilization
Number of DSP48E1s 15 864 1%
I would like to search in VHDL files to see which operations use these units.
Is there any way to find them? or any documentation which shows the operations causing DSPs to be used?
There are a few ways that a DSP48 may be used in your VHDL.
It may be inferred. This is when the synthesis tool is smart by looking at an operation that you are doing (such as a multiply) and realizing that it would be most efficient to do the multiply with a dedicated resource (DSP48) instead of fabric/logic.
It may be instantiated. This means that the primitive was directly called out in your source file. The designer said that I know I want to use this piece of hardware, so I am going to call it out explicitly. This is when you could do a text search for "DSP48" in your VHDL source files.
It may be part of a core. If it is part of a core, you may or may not have visibility into that core. For example, how the core is actually implemented may be different than the behavioral model which is used for simulation.
In any case, as recommended by Russell, using Xilinx toolset to determine utilization of primitives in the design hierarchy can be a good first pass to figuring out where the units are coming from. Additionally, you can always open up FPGA Editor, see what the DSP48 units are called and what signals are going to/out of the DSP48 for additional hints on where it is in your design.
It sounds like you're trying to find your Module Level Utilization. I know that Xilinx ISE supports this. Under Design Overview there's an option called Module Level Utilization that breaks down every module in your VHDL design and tells you where the Regs, LUTs, BRAMs, and DSPs are used.
If you're unable to find it, look for any large multiplications in your design. Large multiply/accumulate operations will synthesize into DSP48s.
I have a computational intensive task which I used CUDA to implement it and now I want to make it even faster with FPGAs (if possible)
The system I want to implement is a series of computations each similar to matrix multiplication in sense of being parallel. It also has some non-parallel parts in between. It works with big amounts of data.
Although I want it as fast as possible, I have enough time to learn and explore with FPGAs.
here I'm asking for suggestions on how I start my path? Which FPGA to choose and where to learn about it. any website or online class or books? I've decided to do this anyway but your idea of whether this will be faster on FPGA or not would be helpful too.
The big wins from an FPGA over using a GPU come from:
Using non-standard word widths optimised to your application. This allows denser logic, which allows more parallel processing blocks
using your knowledge of the required accesses to external RAM to schedule them in hardware more efficiently than a general purpose memory controller can.
The downside is getting data to and from the FPGA. Draw a data-transfer diagram before you start. Even if the FPGA provides infinite speedup, you might still find it's not worth the effort if there's loads of data to be shuffled to and fro!
It's likely you'll be wanting a PCI express based board. Which is (I imagine) a whole new learning-curve before you get to do anything with the FPGA - but if you're up for it, it'll be a very interesting task!
In terms of choosing FPGAs, have a play with the software tools from the various vendors - at the learning stage that's much more important than the chips themselves. You won't find (at this early learning-stage) a show-stopper feature in any of the various chips. Also take into account the availability of boards with your required interfaces on, and any IP-core you might need to do the high-speed interfacing (eg PCIe)
You can get a substantial speedup on most parallel problems with an FPGA.
However, in addition to implementing your computation on the FPGA, there's a lot of work involved in getting the data back and forth from the CPU/main memory. This will require implementation of (for example) a PCI Express endpoint in the FPGA logic (bus mastering for maximum speed) and custom drivers on the software side. Most operating systems will require those drivers to be developed in kernel mode.
And you can't just use the most straightforward approach for FPGA programming either. You're going to need to worry about pipelining and clock synchronization in order to maximize throughput.
In other words, it's a substantially difficult task even for engineers with years of FPGA experience. I strongly suggest you find someone to work with on this. Depending on how proprietary your project is, you might find skilled academics willing to work with you as long as you provide them with all materials and publication rights.
If you're determined to go it alone, you'll need some hardware. Many different companies offer FPGA wired up as accelerators, for example http://www.nallatech.com/pci-express-cards.html
Depending on whether you choose a Xilinx or Altera FPGA, you'll find considerable sample code and tutorials for getting PCI express working.
I've got a few VHDL files, which I can compile with ghdl on Debian. The same files have been adapted by some for an ASIC implementation. There's one "large area" implementation and one "compact" implementation for an algorithm. I'd like to write some more implementations, but to evaluate them I'd need to be able to compare how much area the different implementations would take.
I'd like to do the evaluation without installing any proprietary compilers or obtaining any hardware. A sufficient evaluation criteria would be an estimation of GE (gate equivalent) area, or the number of logic slices needed by some FPGA implementation.
Start by counting the flip-flops (FFs). Their number is (almost) uniquely defined by the RTL code that you have written. With some experience, you can get this number by inspecting the code.
Typically, there is a good correlation between the #FFs and the overall area. An old rule of thumb is that for many designs, the combinatorial area will be about the same as the sequential area. For example, suppose the area count of a flip-flop is 10 gates in a gate array technology, then #FFs * 20 would give you an initial estimation.
Of course, the design characteristics have a significant influence. For datapath-oriented designs, the combinatorial area will be relatively larger. For control-oriented designs, the opposite is true. For standard-cell designs, the sequential area may be smaller because FFs are more efficient. For timing-critical designs, the combinatorial area may be much larger as a result of timing optimization by the synthesis tool.
Therefore, the remaining issue is to find out what a good multiplication factor is for your type of designs and target technology. The strategy could be to carry out some experiments, or to look at prior design results, or to ask others. From then on, estimating is a matter of multiplying the #FFs, known from your code, with that factor.
I'd like to do the evaluation without installing any proprietary compilers or obtaining any hardware.
Inspection will give you a rough idea but with all the optimisations that occur during synthesis you may find this level of accuracy too far removed from the end result.
I would suggest that you re-examine your reasons for avoiding "proprietary compilers" to perform the evaluation. I'm unaware of any non-proprietary synthesis tools for VHDL (though it has been discussed). The popular FPGA vendors provide free versions of their software for Windows and Linux which you could use to obtain accurate counts of resource usage. It should be feasible to translate the FPGA resource usage into something more meaningful for your target technology.
I'm not very familiar with the ASIC world but again there may be free (but proprietary) tools available for you to use.
I have a fundamental question. I produced some FPGA image for some media application and
now I would like to compare my results to the ones of ASIC implementation of the same algorithm in terms of performance & area. I have heard such a comparasion does not make sense since it is somewhat comparing apples and oranges. But I have heard about the Gate-equivalence metric, cant I use this for comparison reasons?
Thanks
As has been pointed out, gate equivalents are only a rough guesstimate and not all that accurate for determining area in an ASIC. There are different of ways you can go about finding out how your design would perform (and cost) in an ASIC. You likely used an HDL (VHDL or Verilog) to implement your design. If you have access to a synthesis tool like Synopsys' Design Compiler (DC) you can use that with one of the supplied ASIC vendor libraries to determine area. You can also use it to generate a post-synthesis, gate-level netlist that you can use in simulation to determine performance. DC will also give you information about critical path timing, etc. that can be used to calculate performance as well.
However, DC is a very expensive product and you likely used FPGA vendor supplied tools to synthesize your HDL design. You could approach an ASIC vendor and ask them to analyze your design to determine size & performance (they would likely use DC - you'd have to be willing to hand your HDL over to them). They may be inclined to do this in order to win your business. But as has been pointed out ASIC NREs are very expensive, so unless you have a high-volume product it probably doesn't make sense to move your design to an ASIC.
The gate equivalence metric might get you to within an order of magnitude - if that's good enough for you? The problem is that a 4-input LUT can implement a single AND gate, or a complex 4-input function representing several gates. Or (in a Xilinx chip) it can be a shift register with 16 bits of memory in it. And it has a flipflop attached to its output (with attendant control signals and the like.... another few gates). And if you've used Block memory or the DSP blocks, they are even harder to quantify.
When you say you want to compare performance and area, do you really mean "cost"? Is this a potential product with millions of units sold, or "just" a few 10s of thousands? ASIC NRE is big!
You can optimise your FPGA design for cost as well, which might be good enough, depending on your volumes. For example, an image-processing design done in a traditional fashion can be 10x bigger than one designed for seriously small FPGA usage, with similar application performance... if you know what you're doing :)