A latch-based FIFO (i.e. one built from level-sensitive latches) might be cheaper in terms of area than a flip-flop-based FIFO. I'm looking for latch-based FIFO design code or an architecture. So far I haven't come across any. Is it possible to design one? I'm looking for some papers or ideas to get started...
You can use pulse latches, which retain the advantages of both latches and flip-flops, offering higher performance and lower power consumption, but they are not often "fully" supported by common CAD tools.
Alternatively, you can convert your flops into two level-sensitive master/slave latches. A flip-flop can be implemented by two opposite-phase latches. This is usually done to enable time borrowing and does not necessarily result in a smaller or faster circuit. This way your FIFO structure is very similar to the flop-based design, except that each flop is replaced by two latches.
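To make the master/slave idea concrete, here is a minimal behavioural sketch in Python (not HDL, and not taken from any particular library). The class names `Latch` and `MasterSlaveFlop` are made up for illustration, and the model ignores real timing such as setup/hold and time borrowing.

```python
class Latch:
    """Level-sensitive D latch: transparent while enable is high, else holds."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveFlop:
    """Rising-edge flip-flop modelled as two opposite-phase latches."""

    def __init__(self):
        self.master = Latch()  # transparent while clk is low
        self.slave = Latch()   # transparent while clk is high

    def update(self, d, clk):
        m = self.master.update(d, enable=not clk)
        return self.slave.update(m, enable=bool(clk))


def run(samples):
    """Apply a list of (d, clk) pairs and return the output after each one."""
    flop = MasterSlaveFlop()
    return [flop.update(d, clk) for d, clk in samples]
```

Feeding it `[(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]` shows the output only changing when `clk` goes high, which is the flip-flop behaviour the two latches are meant to reproduce.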
It is possible to use latches for FIFOs, though I don't have any code handy to show how. Typically, I have seen FIFOs implemented as an 'SRAM' for the storage with a wrapper for the FIFO logic around it. This structure can also handle different read/write clocks relatively naturally.
I don't know the exact heuristics, but I think:
Small SRAM cells are implemented using flops.
Medium SRAM cells are implemented using latches.
Large SRAM cells are implemented using actual RAM cells.
There is some crossover point between using flops and latches, where the extra overhead of control logic and routing for the latches becomes worth the area saving in the actual storage.
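The "storage plus wrapper" structure can be sketched as a small behavioural model. This is only an illustration in Python under my own assumptions: the `Fifo` class and its method names are hypothetical, it uses a single clock domain, and the `mem` list stands in for whatever storage (flops, latches or RAM cells) is actually used.

```python
class Fifo:
    """FIFO control logic wrapped around a plain storage array."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth  # stand-in for flop, latch or SRAM storage
        # Pointers count modulo 2*depth; the extra wrap information
        # distinguishes "full" from "empty" when the indices coincide.
        self.wr_ptr = 0
        self.rd_ptr = 0

    def empty(self):
        return self.wr_ptr == self.rd_ptr

    def full(self):
        return (self.wr_ptr - self.rd_ptr) % (2 * self.depth) == self.depth

    def push(self, value):
        """Write one entry; returns False (and writes nothing) when full."""
        if self.full():
            return False
        self.mem[self.wr_ptr % self.depth] = value
        self.wr_ptr = (self.wr_ptr + 1) % (2 * self.depth)
        return True

    def pop(self):
        """Read one entry; returns None when empty."""
        if self.empty():
            return None
        value = self.mem[self.rd_ptr % self.depth]
        self.rd_ptr = (self.rd_ptr + 1) % (2 * self.depth)
        return value
```

Only the `mem` array would change between a flop-based, latch-based or SRAM-based implementation; the pointer and flag logic in the wrapper stays the same.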
I am doing a small task where I have to count the pulses coming from two inputs. The requirement doesn't specify a clock. Currently I have a process that is triggered when either of the inputs changes, and the corresponding count is then increased.
My question is: should I use a clock for this design, make the process clock-sensitive and then check whether the inputs have changed? Is it good practice to use clocks in a VHDL design?
Sub-question: I have to double buffer the input data. Does this mean I have to use a clock and pass the inputs through two flip-flops? Or is there a way to double buffer data without using a clock?
It is possible to design asynchronous circuits with VHDL. The design rules are a bit different from synchronous design (using request and acknowledge signals).
Your need is not very complex and could be designed without a clock, but you have to be careful with your memory elements. This is especially true if you work with an FPGA, as these devices are not meant to work asynchronously. So look carefully at the synthesizer results.
(If it's school homework, use a clock ;) In digital design, the clock usage is the default case. Asynchronous logic is an advanced concept)
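For the clocked approach, double buffering usually means passing each input through two flip-flops and then detecting edges on the synchronised signal. The following is a cycle-based Python model of that idea, not VHDL; the names `PulseCounter` and `count_pulses` are hypothetical, and it assumes every pulse is wider than one clock period so that it is always sampled.

```python
class PulseCounter:
    """Two-flop synchroniser followed by a rising-edge detector and counter."""

    def __init__(self):
        self.sync1 = 0  # first synchroniser flop
        self.sync2 = 0  # second synchroniser flop (the "double buffer")
        self.prev = 0   # previous value of sync2, used for edge detection
        self.count = 0

    def clock(self, raw_input):
        """Model one rising clock edge; raw_input is the asynchronous input."""
        # Decide everything from the current register values first, then
        # shift, so all registers behave as if updated on the same edge.
        if self.sync2 == 1 and self.prev == 0:
            self.count += 1
        self.prev = self.sync2
        self.sync2 = self.sync1
        self.sync1 = raw_input
        return self.count


def count_pulses(samples):
    """Count rising edges in a list of per-clock input samples."""
    counter = PulseCounter()
    for sample in samples:
        counter.clock(sample)
    return counter.count
```

With two inputs you would instantiate one such counter per input. Note that the count lags the input by a few clock cycles, so the sample list needs some trailing idle cycles before the last pulse is counted.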
I was wondering whether, in terms of "performance", there's any difference between structural and behavioural VHDL. I know that nowadays it is more common to write behavioural rather than structural code, but since I'd like to understand the performance implications, I have been thinking that maybe there's some difference...
There is no hardware-related reason to prefer one form or the other.
It may be that one form leads to faster simulation than the other; I haven't seen any evidence for this in general, but then I haven't looked. It is true that after synthesis, the design is translated into a structural form, and post-synthesis simulation is slow, but this is due to the sheer size of the resulting structural version expressed as thousands of individual gates and their interconnections.
What matters more is the quality of synthesis results: It should be possible to write a design in both forms and have it synthesise to essentially the same hardware. And this seems to be generally true, in my experience.
Sometimes you will find synthesis tools have difficulty efficiently translating a construct (usually behavioural) but not as frequently as in the past.
What matters most (unless you are pushing the boundaries of speed or FPGA size) is clarity, leading to readability, reliability, efficiency, testability, maintainability and so on. If you can't understand it you can't see the inefficiencies or even test it properly.
Here, structural VHDL has a role to play at the top level: dividing a system into blocks like CPU, memory interface, FFT processor, UART, SPI and so on. Sometimes this is done hierarchically, so you might want to divide the memory interface into refresh logic, error correction, address multiplexing, and so on.
But most blocks (for example, tasks that a single state machine can handle) are clearest and simplest when expressed behaviourally. So in a UART you might have two separate processes for TX and RX, while an SPI interface (which sends and receives on the same SPI clock) is probably best as a single behavioural process.
I am trying to generate a synthesizable buffer in VHDL for a time-to-digital project in an FPGA.
I have been looking around but cannot find any set-up out there.
I have been recommended that stackoverflow has very good answers.
Could you please give me some tips for this coursework? I would be very grateful for any approach you might come up with.
Thank you a lot in advance!
Regards
Doing time-to-digital converters (TDCs) is somewhat hard right now.
Basically, it boils down to having HDL that describes multiple registers all reading the same signal. You then need to apply a directive that stops the tools from merging them, e.g. turning off equivalent_register_removal for Xilinx. You will possibly also need a timing ignore constraint on the signal you are sampling.
You then need to carefully examine the fabric of your FPGA and make sure your flip-flops are placed in the same slice across multiple sites that can all be connected through the same kind of wire (check FPGA Editor), i.e. will have the same time delay.
You can build a minimal test design for Xilinx in FPGA Editor. Once you have the routing down, you can then formulate appropriate constraints for your UCF file and build much bigger, more complex TDCs.
I'm only familiar with Altera from a few years ago. But Altera doesn't give you an interface like Xilinx's FPGA editor, so you're on your own determining the placement of your flops. I saw a presentation once about a university work group doing TDCs with Altera and ultimately it boiled down to measuring the resolution by using input stimuli to check whether the design was routed according to their wishes. If it was not, they would adjust some timing parameters out of sensible bounds, rinse and repeat.
The last step of course is to sample your signal in the synchronous part of your design (where the counter is) and read the counter plus flip flop contents when the event you wanted occurs (i.e. rising edge, falling edge). Then you have major time units in your counter and minor time units as a bitfield in the flip flop state.
If you want even spread among your flip flop delays, you will need to carefully examine the delay length of paths between the flip flops and adjust for your overall clock period.
So basically, counter * clock_period + index_of_highest_set_bit_in_flip_flop_state * path_delay is then your delay time.
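As a small worked example of that formula, here is a Python sketch. The function names and the choice of picoseconds as the unit are my own assumptions, as is treating an all-zero flip-flop state as an error; the numbers in the usage note are made up.

```python
def highest_set_bit(bits):
    """Index of the highest set bit of a non-negative int, or -1 if none."""
    return bits.bit_length() - 1


def tdc_time_ps(counter, clock_period_ps, flop_state, path_delay_ps):
    """Coarse time from the counter plus fine time from the flip-flop chain.

    Implements counter * clock_period
               + index_of_highest_set_bit(flop_state) * path_delay.
    """
    index = highest_set_bit(flop_state)
    if index < 0:
        # Assumption: at least one flop has captured the event.
        raise ValueError("no flip-flop captured the event")
    return counter * clock_period_ps + index * path_delay_ps
```

For instance, with a 5000 ps clock, a counter value of 3, a flip-flop state of `0b0111` and 100 ps per tap, `tdc_time_ps(3, 5000, 0b0111, 100)` gives 3 * 5000 + 2 * 100 = 15200 ps.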
You will also need to check the FPGA datasheet to know your minimal timings, i.e. the fastest toggle time the input buffer can achieve, the minimal setup and hold time of your flops etc.
I have a Verilog design that compiles to ~15K LEs on a Cyclone IV (EP4CE22F17C6N). When I compile the same code on a Cyclone V (5CEFA2F23C8N), it takes ~8500 ALMs. Based on Altera's own LE equivalency for the particular Cyclone V, this would be ~20K LEs. Now, I realize that the estimates are going to be highly dependent on the particular design, but a 33% increase in "effective" resource utilization seems like a lot.
So it makes me wonder if there are design tips/tricks/etc. for making more efficient use of ALMs. In particular, I'm looking for Verilog constructs that would improve the register density, fabric density, dense packing, etc.
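The arithmetic behind those figures can be checked quickly. This Python sketch uses only the rounded numbers quoted in the question; the LE-per-ALM ratio it prints is simply what those numbers imply, not an official Altera figure.

```python
def relative_increase(old, new):
    """Fractional increase going from old to new."""
    return (new - old) / old


cyclone_iv_les = 15_000          # ~15K LEs reported on Cyclone IV
cyclone_v_alms = 8_500           # ~8500 ALMs reported on Cyclone V
cyclone_v_equivalent_les = 20_000  # ~20K LEs, the equivalent quoted above

# Ratio implied by the question's own numbers (an inference, not a spec).
implied_le_per_alm = cyclone_v_equivalent_les / cyclone_v_alms
increase = relative_increase(cyclone_iv_les, cyclone_v_equivalent_les)

print(f"implied LEs per ALM: {implied_le_per_alm:.2f}")
print(f"effective increase: {increase:.0%}")
```

So the quoted equivalence works out to roughly 2.35 LEs per ALM, and 20K versus 15K is the one-third increase mentioned.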
I would agree with the comments above that generally you shouldn't need to optimise. However, it's always important to check that your code does map to the chosen architecture. Specifically:
Reset
Using the wrong kind of reset for your architecture can cause problems. It's also very easy to accidentally cause the synthesis tool to insert logic to emulate a clock-enable. For full details see this answer. For Altera you should be using an asynchronous reset which is synchronously de-asserted.
Priority of control signals
In Altera:
Asynchronous Clear, aclr—highest priority
Asynchronous Load, aload
Enable, ena
Synchronous Clear, sclr
Synchronous Load, sload
Data In, data—lowest priority
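One way to read this priority list is as the order in which a register's next value is decided. The Python function below is a behavioural sketch of that ordering only, not Altera's implementation; the parameter names `clock_edge`, `adata` and `sdata` are hypothetical, and a single-bit register is assumed.

```python
def next_q(q, *, clock_edge, aclr=0, aload=0, adata=0,
           ena=1, sclr=0, sload=0, sdata=0, data=0):
    """Next register value following the control-signal priority above."""
    if aclr:            # asynchronous clear: highest priority, ignores clock
        return 0
    if aload:           # asynchronous load: also independent of the clock
        return adata
    if not clock_edge:  # everything below only acts on a clock edge
        return q
    if not ena:         # clock enable gates all synchronous controls
        return q
    if sclr:            # synchronous clear
        return 0
    if sload:           # synchronous load
        return sdata
    return data         # plain data input: lowest priority
```

Writing HDL whose if/elsif order matches this chain lets the synthesis tool use the dedicated register controls. If, say, a synchronous clear is coded to take priority over the enable, extra logic is needed in front of the register to emulate that.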
Latches
Easy to grep from the reports, but unless you're absolutely sure it's intentional, latches are generally bad mmmmkay.
Synthesis
There are many options available for tweaking the behaviour of the synthesis process. Here are a few that will affect your results:
ALM_REGISTER_PACKING_EFFORT
This option guides the Fitter when packing registers into ALMs.
MUX_RESTRUCTURE
Allows the Compiler to reduce the number of logic elements required to implement multiplexers in a design.
OPTIMIZATION_TECHNIQUE
Specifies the overall optimization goal for Analysis & Synthesis: attempt to maximize performance, minimize logic usage, or balance high performance with minimal logic usage.
Bear in mind that if your device isn't getting too full, the tool won't have much "incentive" to minimise logic utilisation unless you explicitly tell it to.
I have a clock in my VHDL code but I don't use it. My process simply depends on handshakes: when one component finishes and produces an output, that output is in the sensitivity list of my FSM and then becomes an input to the next component, whose output is also in the sensitivity list of my FSM (so that I know when the component has finished its computation), and so on.
Is this method wrong? It works in simulation and also in post-route simulation, but gives me warnings like these: warning :HOLD High VIOLATION ON I WITH RESPECT TO CLK; and
warning :HOLD Low VIOLATION ON I WITH RESPECT TO CLK;
Are these warnings unimportant, or will my code damage my FPGA because it doesn't depend on a clock?
The warnings you are getting are timing violations. You get these because the tools detect that your design does not obey the necessary timing restrictions for the internal primitives.
For instance, inputs to lookup tables (which are one of the main building blocks inside an FPGA) need to be held for a specific time for the output to stabilize. This is very hard to guarantee when your entire timing relies only on the latencies and delays of the components themselves, and everything switches on a completely asynchronous basis.
Depending on your actual design (mostly its size and complexity), I'd wager that you'll end up with a lot of very-hard-to-debug errors once you get it inside an FPGA. You'll have a much, much, much easier time using a clock. This will allow you to have a clear idea of when signals arrive where, and it will allow you to use the internal tools to check your timing. You'll also find it much easier to interface to other devices, and your system will be less susceptible to noisy inputs.
So all in all, use a clock. You (probably) won't damage your FPGA by not doing it, but a clock will save you from tons of trouble.
Your code will most probably not damage your FPGA just because it doesn't depend on a clock. However, for synthesis you should always use registered (clocked) logic. Without a clock your design will not be controllable, because its behaviour then depends on timing, delays, routing, fan-out and so on. This will make your FSM behave "mysteriously" when synthesized (even if it worked in simulation).
You'll find plenty of examples of good FSM implementation style with Google's help (search for Moore or Mealy FSM).
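To illustrate the clocked style, here is a behavioural Python sketch of a Moore-type FSM that sequences two components by sampling their "done" flags once per clock cycle, instead of reacting to them asynchronously. The state names and the `start`, `done_a` and `done_b` signals are hypothetical, chosen only to mirror the handshake described in the question.

```python
IDLE, RUN_A, RUN_B, FINISHED = "IDLE", "RUN_A", "RUN_B", "FINISHED"


def next_state(state, start, done_a, done_b):
    """Combinational next-state logic, evaluated once per clock edge."""
    if state == IDLE:
        return RUN_A if start else IDLE
    if state == RUN_A:
        return RUN_B if done_a else RUN_A
    if state == RUN_B:
        return FINISHED if done_b else RUN_B
    return IDLE  # FINISHED lasts one cycle, then back to IDLE


def simulate(inputs):
    """Clock the FSM once per (start, done_a, done_b) tuple; return states."""
    state = IDLE
    trace = []
    for start, done_a, done_b in inputs:
        state = next_state(state, start, done_a, done_b)  # state register
        trace.append(state)
    return trace
```

In VHDL the `next_state` function would correspond to the case statement inside a clocked process, with only the clock (and perhaps a reset) in the sensitivity list.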
Definitely use a clock. And only one clock throughout the design. This is the easiest way - the tools support this design style very well. You can often get away with a single timing constraint, especially if your inputs are slow and synchronous to the same clock.
When you have gained experience designing this way, you can move outside of this, but be ready for more analysis, timing constraints and potentially build iterations while you learn the pitfalls of crossing clock-domains and asynchronous signals.