I am doing a small task where I have to count the pulses coming from two inputs. The requirement doesn't specify clock. Currently I have a process that is triggered when any of the input changes and then corresponding count is increased.
My question is Should I use clock for this design and make the process Clock sensitive and then check if inputs have Changed? Is it a good practice to use Clocks in VHDL design?
Sub-question- I have to double buffer the input data. Does this mean I have to use clock and pass inputs through two flipflops? or is there a way to double buffer data without using clock?
It is possible to design asynchronous circuit with VHDL. The design rules are a bit different than synchronous design (using request and acknoledge signals).
Your need is not very complex and could be designed without clock but you have to be careful with your memorisation element. Specialy if you work with an FPGA, these devices are not supposed to work asynchonously. So look carefully the synthetizer results.
(If it's school homework, use a clock ;) In digital design, the clock usage is the default case. Asynchronous logic is an advanced concept)
Related
I am trying to generate a synthesizable buffer in VHDL for a time-to digital project in FPGA.
I have been looking around but cannot find any set-up out there.
I have been recommended that stackoverflow has very good answers.
Could you please give me some tips for this course work, and I would be very greatful to any approach you might come up with.
Thank you a lot in advance!
Regards
Doing time-delay-circuits (TDC) is somewhat hard right now.
Basically, it boils down to having HDL that describes multiple registers all reading the same signal. You then need to apply a keep directive, e.g. equivalent_register_removal for Xilinx. You will possibly also need a timing ignore constraint on the signal you are sampling.
You then need to carefully examine the fabric of your FPGA and make sure your flop flops are placed in the same slice across multiple sites that can all be connected through the same kind of wire (check FPGA Editor), i.e. will have the same time delay.
You can build a minimal test design for Xilinx in FPGA editor. Once you have the routing down, you can then formulate appropriate constraints for your UCF file and build much bigger, more complex TDCs.
I'm only familiar with Altera from a few years ago. But Altera doesn't give you an interface like Xilinx's FPGA editor, so you're on your own determining the placement of your flops. I saw a presentation once about a university work group doing TDCs with Altera and ultimately it boiled down to measuring the resolution by using input stimuli to check whether the design was routed according to their wishes. If it was not, they would adjust some timing parameters out of sensible bounds, rinse and repeat.
The last step of course is to sample your signal in the synchronous part of your design (where the counter is) and read the counter plus flip flop contents when the event you wanted occurs (i.e. rising edge, falling edge). Then you have major time units in your counter and minor time units as a bitfield in the flip flop state.
If you want even spread among your flip flop delays, you will need to carefully examine the delay length of paths between the flip flops and adjust for your overall clock period.
So basically, counter * clock_period + index_of_highest_set_bit_in_flip_flop_state * path_delay is then your delay time.
You will also need to check the FPGA datasheet to know your minimal timings, i.e. the fastest toggle time the input buffer can achieve, the minimal setup and hold time of your flops etc.
I have a Verilog design that compiles to ~15K LEs on a Cyclone IV (EP4CE22F17C6N). When I compile the same same code on a Cyclone V (5CEFA2F23C8N), it takes ~8500 ALMs. Based on Altera's own LE equivalency for the particular Cyclone V, this would be ~20K LEs. Now, I realize that the estimates are going to be highly dependent on particular design, but a %33 increase in "effective" resource utilization seems like a lot.
So it makes me wonder if there are design tips/tricks/etc. for making more efficient use of ALMs. In particular, I'm looking for Verilog constructs that would improve the register density, fabric density, dense packing, etc.
I would agree with the comments above that generally you shouldn't need to optimise, however it's always important to check that your code does map to the chosen architecture. Specifically:
Reset
Using the wrong kind of reset for your architecture can cause problems. It's also very easy to accidentally cause the synthesis tool to insert logic to emulate a clock-enable. For full details see this answer. For Altera you should be using an asynchronous reset which is synchronously de-asserted.
Priority of control signals
In Altera:
Asynchronous Clear, aclr—highest priority
Asynchronous Load, aload
Enable, ena
Synchronous Clear, sclr
Synchronous Load, sload
Data In, data—lowest priority
Latches
Easy to grep from the reports, but unless you're absolutely sure it's intentional, latches are generally bad mmmmkay.
Synthesis
There are many options available to tweaking the behaviour of the synthesis process. Here are a few that will affect your results:
ALM_REGISTER_PACKING_EFFORT
This option guides the Fitter when packing registers into ALMs.
MUX_RESTRUCTURE
Allows the Compiler to reduce the number of logic elements required to implement multiplexersin a design.
OPTIMIZATION_TECHNIQUE
Specifies the overall optimization goal for Analysis & Synthesis: attempt to maximize performance, minimize logic usage, or balance high performance with minimal logic usage.
Bear in mind that if your device isn't getting too full, the tool won't have much "incentive" to minimise logic utilisation unless you explicitly tell it to.
A latch based fifo (i.e. level sensitive latch) might be cheaper in terms of area than FF based FIFO. I'm looking for a latch based FIFO design code or architecture. So far I didn't come across any. Is it possible to design one? I'm looking for some papers or idea to get started...
You can use pulse latches, which retain the advantages of both latches and flip-flops, offering higher performance and lower power consumption, but they are not often "fully" supported by common CAD tools.
Alternatively, you can convert your flops into two level-sensitive master/slave latches. A flip flop can be implemented by two opposite phase latches. This is usually done to enable time borrowing and does not necessarily result a smaller/faster circuit. This way your FIFO structure is very similar to the flop-based design, except that each flop is replaced by two latches.
It is possible to use latches for fifos, though I don't have any code handy to show how. Typically, I have seen fifos implemented as a 'sram' for the storage with a wrapper for the fifo logic around it. This structure can also handle different read/write clocks relatively naturally.
I don't know the exact heuristics, but I think
small sram cells are implemented using flops.
medium sram cells are implemented using latches.
large sram cells are implemented using actual ram cells.
There is some crossover point between using flops and latches, where the extra overhead of control logic and routing for the latches becomes worth the area saving in the actual storage.
I have a clock in my vhdl code but i don't use it , simply my process just depends on handshake when one component finishes and gets an output out , this output is in the sensitivity list of my FSM and is then becomes an input to the next component and of course its output is also in the sensitivity list of my FSM(so to know when will component finishes its computation)... and so on.
Is this method wrong ? it works in simulation and also in post-route simulation but gets me warnings like this : warning :HOLD High VIOLATION ON I WITH RESPECT TO CLK; and
warning :HOLD Low VIOLATION ON I WITH RESPECT TO CLK;
is this warnings not important or will my code damage my fpga because it doesn't depend on a clock ?
The warning you are getting are timing violations. You get these because the tools detect that your design does not obey the necessary timing restrictions for the internal primitives.
For instance, inputs to lookup-tables (which is one of the main building-blocks inside an FPGA) need to be held for a specific time for the output to stabilize. This is very hard to guarantee when your entire timing relies only on the latencies and delays of the components themselves, and switch on a completely asynchronous basis.
Depending on your actual design (mostly the size and complexity of it), I'll wager the guess that you'll end up with a lot of very-hard-to-debug errors once you get it inside an FPGA. You'll have a much, much, much easier time using a clock. This will allow you to have a clear idea of when signals arrive where, and it will allow you to use the internal tools to check your timing. You'll also find it much easier to interface to other devices, and your system will be less susceptible to noisy inputs.
So all in all, use a clock. You (probably) wont damage your FPGA by not doing it, but a clock will save you from tons of trouble.
your code does most probably not damage your FPGA because it doesn't depend on a clock. however, for synthesis you should always use registered (clocked) logic. without using a clock your design will not be controllable because of timing/delay/routing/fan out/... this will let your FSM behave "mysteriously" when synthesized (even if it worked in simulation).
you'll find plenty of examples for good FSM implementation style with google's help (search for Moore or Mealy FSM)
Definitely use a clock. And only one clock throughout the design. This is the easiest way - the tools support this design style very well. You can often get away with a single timing constraint, especially if your inputs are slow and synchronous to the same clock.
When you have gained experience designing this way, you can move outside of this, but be ready for more analysis, timing constraints and potentially build iterations while you learn the pitfalls of crossing clock-domains and asynchronous signals.
I am aware of the need to synchronize all inputs to an FPGA before using those inputs in order to avoid metastability. I'm also aware of the need to synchronize signals that cross clock domains within a single FPGA. This question isn't about crossing clock domains.
My question is whether it is a good idea to routinely register all of the inputs and outputs of every internal hardware module in an FPGA design. The rationale is that we want to break up long chains of combinational logic in order to improve the clock rate so that we can meet the timing constraints for a chosen clock rate. This will add additional cycles of latency proportional to the number of modules that a signal must cross. Is this a good idea or a bad idea? Should one register only inputs and not outputs?
Answer Summary
Rule of thumb: register all outputs of internal FPGA cores; no need to register inputs. If an output already comes from a register, such as the state register of a state machine, then there is no need to register again.
It is difficult to give a hard and fast rule. It really depends on many factors.
It could:
Increase Fmax by breaking up combinatorial paths
Make place and route easier by allowing the tools to spread logic out in the part
Make partitioning your design easier, allowing for partial rebuilds.
It will not magically solve critical path timing issues. If there is a critical path inside one of your major "blocks", then it will still remain your critical path.
Additionally, you may encounter more problems, depending on how full your design is on the target part.
These things said, I lean to the side of registering outputs only.
Registering all of the inputs and outputs of every internal hardware module in an FPGA design is a bit of overkill. If an output register feeds an input register with no logic between them, then 2x the required registers are consumed. Unless, of course, you're doing logic path balancing.
Registering only inputs and not outputs of every internal hardware module in an FPGA design is a conservative design approach. If the design meets its performance and resource utilization requirements, then this is a valid approach.
If the design is not meeting its performance/utilization requirements, then you've got to do the extra timing analysis in order to reduce the registers in a given logic path within the FPGA.
My question is whether it is a good idea to routinely register all of the inputs and outputs of every internal hardware module in an FPGA design.
No, it's not a good idea to routinely introduce registers like this.
Doing both inputs and outputs is redundant. They'll be no logic between the output register and the next input register.
If my block contains a single AND gate, it's overkill. It depends on the timing and design complexity.
Register stages need to be properly thought about and designed. What happens when a output FIFO fills or other stall conditions? Do all signals have the right register delay so that they appear at the right stage in the right cycle? Adding registers isn't necessarily as simple as it seems.
The rationale is that we want to break up long chains of combinational logic in order to improve the clock rate so that we can meet the timing constraints for a chosen clock rate. This will add additional cycles of latency proportional to the number of modules that a signal must cross. Is this a good idea or a bad idea?
In this case it sounds like you must introduce registers, and you shouldn't read the previous points as "don't do it". Just don't do it blindly. Think about the control logic around the registers and the (now) multi-cycle nature of the logic. You are now building a "Pipeline". Being able to stall a pipeline properly when the output can't write is a huge source of bugs.
Think of cars moving on a road. If one car applies it's brakes and stops, all cars behind need to as well. If the first cars brake lights aren't working, the next car won't get the signal to brake, and it'll crash. Similarly each stage in a pipeline needs to tell the previous stage it's stopping for a moment.
What you can find is that instead of having long timing paths along your computation paths going from input to output, you end up with long timing paths on your enable controlling all these register stages from output to input.
Another option you have is, to let the tools work for you. Add add the end of your complete system a bunch of registers (if you want to pipeline more) and activate in your synthesis tool retiming. This will move the registers (hopefully) between the logic where it is most useful.