Do all Flip Flops in a design need to be resettable (ASIC)? - vhdl

I'm trying to understand clock and reset in a chip. In a design, what criteria are used to decide whether a flop should be assigned a value (typically zero) during reset?
always_ff @(posedge clk or negedge reset) begin : process_w_reset
    if (~reset) begin
        flop1 <= '0;
        ....
    end else begin
        if (condition) begin
            flop1 <= something;
            ....
        end
    end
end

always_ff @(posedge clk) begin : process_wo_reset
    if (condition) begin
        flop1 <= something;
        ....
    end
end
Is it bad practice not to reset a flop that is later used as a control signal in comb logic? What if the design makes sure that the flop has a valid value (0 or 1) assigned to it before it is used in a comb logic block (i.e. in an if statement or in FSM comb logic)?
I feel like it's better to always reset all the flops in the design. That way there won't be any Xs after reset in the chip. However, for datapath logic, resetting flops might not be a big deal, as they are just pipe stages. However, if a flop is in the control path (i.e. FSM next-state comb logic), then it should be reset to a default value. Is my understanding correct? I don't know much about DFT and am not sure whether it has any other implications.

Assuming that reset means asynchronous reset, as in the code examples.
The answer is partly opinion based, since a design can be made to work both when only a minimum number of the flip-flops (FFs) are reset and when all of the FFs are reset.
I suggest resetting a minimum number of FFs, which typically leads to resetting most FFs in the control path and no FFs in the data path. The advantages of this approach are outlined below.
Simulation is often conservative with respect to propagation of uninitialized values, both for Verilog and VHDL, so it is as if simulation can check both 0 and 1 values at once when a value is uninitialized.
Bugs due to FFs that are not reset are therefore likely to show up earlier in verification with simulation, and the designer thereby gets valuable feedback about wrong design assumptions, which may lead to corrections in the design that fix other bugs. Just resetting all the FFs is likely to hide such bugs.
It may seem like design and verification are easier if all FFs are reset, in both the control and data paths, since it removes all that "annoying" X propagation in the design. But it requires a larger number of tests in order to verify all value combinations when X propagation is suppressed through reset.
Implementation puts a smaller load on the reset signal, so it is easier to meet timing on the reset net throughout the chip.
For DFT (Design For Test) in general, adding reset to the FFs will not help in finding nets stuck at the reset value. With a DFT scan-chain approach where all the FFs are loaded through the scan chain, the lack of reset on some FFs will not require more vectors.
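As a small illustration of the simulation point above, here is a minimal sketch (module and signal names are made up): in simulation the un-reset flop starts at 'x, so any logic that consumes it before the first valid load shows X propagation instead of silently reading a reset-forced 0.

module x_demo (
    input  logic clk,
    input  logic rst_n,
    input  logic load,
    input  logic d,
    output logic q_rst,    // known 0 after reset
    output logic q_norst   // 'x in simulation until the first load
);
    always_ff @(posedge clk or negedge rst_n)
        if (!rst_n)    q_rst <= 1'b0;
        else if (load) q_rst <= d;

    always_ff @(posedge clk)
        if (load) q_norst <= d;  // X propagates if consumed too early
endmodule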

Generally you need to think about where the 'X's will propagate in your simulation, which ones matter, and which ones are don't-care conditions. For example, if you have a block of logic which doesn't start operating until an enable bit is set, then as long as the enable bit itself is reset and enough upstream logic is reset so that reset values propagate through to the enabled logic in time, you are most likely OK with not resetting the logic in between. However, you do want to reset any logic that feeds back on itself (for example, state machines); otherwise the upstream resets will never be able to establish a known state in the feedback block.
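To make that concrete, here is a minimal sketch (all names invented): the FSM state feeds back on itself and is reset, while the feed-forward pipeline registers are not, and valid upstream data flushes the Xs through within two cycles.

module pipe_sketch (
    input  logic       clk,
    input  logic       rst_n,
    input  logic       start,
    input  logic [7:0] din,
    output logic [7:0] dout
);
    typedef enum logic {IDLE, RUN} state_t;
    state_t state;
    logic [7:0] stage1;

    // Feedback logic (state machine): must be reset to a known state.
    always_ff @(posedge clk or negedge rst_n)
        if (!rst_n)     state <= IDLE;
        else if (start) state <= RUN;

    // Feed-forward pipeline: no reset; known upstream values
    // propagate through in two cycles once reset is released.
    always_ff @(posedge clk) begin
        stage1 <= din;
        dout   <= stage1;
    end
endmodule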

I agree with Morten Zilmer that you should only reset flops that require resetting, although my background is more FPGA than ASIC.
It's worth pointing out that there is a gotcha in Verilog / SystemVerilog: if you have a clocked process that drives both registers that are reset and registers that aren't, you will end up inferring a clock enable or an additional mux on the input of your flip-flop.
This is usually not what was intended.
There is a more detailed explanation in this answer. I also wrote a blog post outlining a mechanism for abstracting away synchronous/asynchronous and active high/low reset.
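A minimal sketch of that gotcha (hypothetical names): q_data has no reset term, so while rst_n is low it must hold its value, which forces the tool to route the reset net into the data flop's enable/mux path.

module mixed_reset_gotcha (
    input  logic clk,
    input  logic rst_n,
    input  logic ctrl_next,
    input  logic data_next,
    output logic q_ctrl,
    output logic q_data
);
    always_ff @(posedge clk or negedge rst_n)
        if (!rst_n) begin
            q_ctrl <= 1'b0;        // reset flop
        end else begin
            q_ctrl <= ctrl_next;
            q_data <= data_next;   // un-reset flop in the same process:
                                   // infers a recirculation mux/enable
        end
endmodule

Keeping reset and non-reset registers in separate clocked processes avoids the extra logic.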

As a general rule of thumb, you should probably always reset control signals.
For data flops, resetting can cost you area, so it really depends on whether you care about area.
In recent years simulators have started to support X-propagation modes that allow you to catch some of the X issues in RTL simulation (instead of gate-level simulation). It is good practice to run these to make sure you don't have a reset problem with uninitialized SRAMs or flops.

Related

Does a time delay in a sequential logic block have an influence on synthesis or place-and-route results?

I use Xilinx ISE as an IDE.
Suppose I add a 100 ps delay to every assignment in an always block (Verilog) or process (VHDL) whose sensitivity list contains only the clock and reset, like this:
always @(posedge clk) begin
    if (rst)
        a <= #100 'd0;
    else
        a <= #100 b;
end
I think the delay only affects simulation, because every book and user guide tells us delays are not synthesizable.
But I'm still wondering: can the delay really affect the place-and-route results, such as static timing or the clock report?
For example, can it make a circuit's maximum frequency higher or lower?
No, the #delay in your code is not going to affect the timing of the design when it is loaded onto the FPGA.
It also does not affect the place-and-route results or the static timing analysis. Both of these steps use timing information that is provided by the manufacturer in the form of device models.
You are correct that there's nothing intrinsic about delay statements that makes them unsynthesizable; however, it's wildly impractical to attempt to synthesize them. The reason is that once on the FPGA you are dealing with a physical circuit whose performance varies with PVT (process, voltage, temperature), and can do so by a lot! The only hedge against this would be an analog circuit that attempts to sense all of the above and adjust itself accordingly. Such a beast would still be limited in what it can do, and would be physically large and power hungry depending on the range of delay and the variance in all of the above you want to support.
So with that in mind, and considering that there is very little (read: no) demand for this outside of special-purpose IO, FPGA vendors don't provide any such components, making the construct unsynthesizable.
Delay statements (#100) are usually ignored during synthesis in Verilog. So in synthesis it is the same as:
always @(posedge clk) begin
    if (rst)
        a <= 0;
    else
        a <= b;
end
The Xilinx Synthesis and Simulation Design Guide states:
Delays in Synthesis Code
Do not use Wait for XX ns (VHDL) or the #XX (Verilog) statements in your code. (...) This statement does not synthesize to a component. In designs that include this construct, the functionality of the simulated design does not always match the functionality of the synthesized design.
(...)
Wait for XX ns Statement Verilog Coding Example
#XX;
Do not use the After XX ns statement in your VHDL code or the Delay assignment in your Verilog code.
(...)
Delay Assignment Verilog Coding Example
assign #XX Q = 0;
XX specifies the number of nanoseconds that must pass before a condition is executed. This statement is usually ignored by the synthesis tool. In this case, the functionality of the simulated design does not match the functionality of the synthesized design.
"Usually" there is no impact on synthesis and P&R results.
Xilinx: This statement is usually ignored by the synthesis tool.
When does it have an impact, then?
Although the delay statement is ignored by the synthesis tool, the HDL code is a little bit different with and without it. That may change the seed of randomization in any stage (parsing, elaboration, synthesis, etc.), so there is a possibility of different results. These results may be better or worse.
If a delay statement exists in the code, the following warning is expected from Xilinx ISE:
WARNING:Xst:916 - design.v line x: Delay is ignored for synthesis.

Ensuring propagation is complete in VHDL without an explicit clock

I am looking to build a VHDL circuit which responds to an input as fast as possible, meaning I don't want an explicit clock to clock signals in and out if I don't absolutely need one. However, I am also looking to avoid "bouncing", where one leg of a combinatorial block of logic finishes before another.
As an example, the expression B <= A xor (not (not A)) should clearly never assign true to B. However, in a real implementation the two not gates introduce delays, which permit the output of that expression to flicker after A changes but before the not gates have propagated that change. I'd like to "debounce" that circuit.
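That flicker is easy to reproduce in simulation. A throwaway, simulation-only sketch (written in Verilog, the single language used for sketches in this compilation, with illustrative unit delays on the inverters):

module glitch_demo;
    logic A = 1'b0, x, y, z;

    assign #1 x = ~A;      // first inverter, 1 time-unit delay
    assign #1 y = ~x;      // second inverter
    assign    z = A ^ y;   // pulses high for ~2 units after A toggles

    initial begin
        $monitor("%0t A=%b z=%b", $time, A, z);
        #5 A = 1'b1;       // y lags A by 2 units, so z glitches to 1
        #5 $finish;
    end
endmodule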
The usual, and obvious, solution is to clock the circuit, so that one never observes a transient value. However, for this circuit, I am looking to avoid a dependence on a clock signal, and only have a network of gates.
I'd like to have something like:
x <= not A;    -- line 1
y <= not x;    -- line 2
z <= A xor y;  -- line 3
B <= z;        -- line 4
such that I guarantee that line 4 occurs after line 3.
The tricky part is that I am not doing this in one block, as the exposition above might suggest. The true behavior of the circuit is defined by two or more separate components which are using signals to communicate. Thus once the signal chain propagates into my sub-circuit, I see nothing until the output changes, or doesn't change!
In effect, the final product I'm looking for is a procedure which can be "armed" by the inputs changing, and "triggered" by the sub-circuit announcing that its outputs have fully changed. I'd like the result to be synthesizable so that it responds to the implementation technology. If it's on an FPGA, it always has access to a clock, so it can use that to manage the debouncing logic. If it's being implemented as an ASIC, the signals would have to be delayed such that any procedure which sees the "triggered" signal is 100% confident that it is seeing updated outputs from that circuit.
I see very few synthesizable approaches to such a procedural "A happens-before B" behavior. wait seems to be the "right" tool for the job, but it is typically only synthesizable when used with explicit clock signals.

Is the use of records the solution to all latch problems in VHDL?

I was recently told that the solution to all (most) problems with unintended latches during VHDL synthesis is to put whatever the problematic signal is in a record.
This seems like it's a little bit too good to be true, but I'm not that experienced with VHDL so there could be something else that I'm not considering.
Should I put all my signals in records?
No, you should not put all your signals in records. This will quickly become very confusing and you will not gain anything by using the record.
One way a record may help you avoid latches is that when you register an entire record in a clocked process, you are really registering all of the components of the record. This takes one line of code instead of possibly tens of lines. In the case where you have many elements which all need to be treated the same, a record can save you from "silly mistakes", and possibly save you from creating a latch.
As stated by others, a record doesn't have any specific synthesis interpretation. It is simply a group of signals that you are grouping together for coding convenience.
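The "register the whole thing in one line" idea looks like this as a SystemVerilog sketch (a packed struct standing in for the VHDL record; all names are invented):

typedef struct packed {
    logic       valid;
    logic [7:0] data;
    logic [3:0] tag;
} bus_t;

module reg_record (
    input  logic clk,
    input  bus_t bus_next,
    output bus_t bus_q
);
    // One assignment registers every field of the record/struct,
    // so no member can be accidentally left unassigned.
    always_ff @(posedge clk)
        bus_q <= bus_next;
endmodule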
I don't see how this would help - a record (or even just parts of a record) can become a latch just as easily as a signal. A latch is generated if a signal keeps its state through some combinatorial process (i.e., is not assigned a value on ALL paths through the process). The same holds for constituents of a record.
Records can be useful to group related signals for readability, but synthesis-wise a record is pretty much equivalent to a bunch of individual signals.
My personal suggestion to avoid latches: avoid combinatorial processes. Make all processes clocked, and do combinatorial logic at the architecture level.
A record is just another way of grouping other types, similar to using an array to group std_logic elements into a std_logic_vector, so there is nothing magical about records that makes them better for avoiding latches in a design.
If you get unintended latches in your design, which is what I guess you think of as "latch problems", it is because your coding style specifies latches, and you should change the coding style, as @zennehoy also suggests.
One approach can be to define code templates for the different constructions that you use, and then stick to these known and working templates.
The template for a flip-flop (FF) with asynchronous reset can be:
process (clk_i, rst_i) is
begin
  -- Clock
  if rising_edge(clk_i) then
    -- ... control structures assigning the Qs as a function of the Ds
    -- ... a synchronous reset is just another branch
  end if;
  -- Reset (asynchronous) if required
  if rst_i = '1' then
    -- ... assign constant reset values to some or all of the Qs
  end if;
end process;
Use concurrent signal assignments when possible; more complex expressions can be handled through a concurrent function call, where a function is used outside a process, like:
z_o <= fun(a_i, b_i);
If a process is used to create combinatorial logic, then a common pitfall and cause of latches in VHDL is forgetting a signal in the sensitivity list. However, VHDL-2008 has a solution for this, since you can use (all) as the sensitivity list, whereby all signals used in the process are implicitly included in the sensitivity list. So if you use VHDL-2008, your template for combinatorial processes can be:
process (all) is
begin
  z_o <= a_i and b_i;
end process;
These templates should be all you need for a typical synthesizable design, and they will keep your design latch-free.

ODDR2 usage found in auto-generated Xilinx wrapper VHDL file

I'm using the TEMAC IP core to generate a 1 Gb Ethernet MAC, and came across an interesting piece of code:
-- DDR logic is used for this purpose to ensure that clock routing/timing to the pin is
-- balanced as part of the clock tree
not_rx_clk_int <= not (rx_clk_int);

rx_clk_ddr : ODDR2
port map (
    Q  => rx_clk,
    C0 => rx_clk_int,
    C1 => not_rx_clk_int,
    CE => '1',
    D0 => '1',
    D1 => '0',
    R  => reset,
    S  => '0'
);
So according to my understanding, what's happening here is that a "new" clock is being generated from two clocks that are 180 degrees out of phase, by using each clock as a select-line input to the mux. (See the very useful diagram taken from page 64 of this document!)
When C0 is '1', Q <= D0, which gives rx_clk <= '1'; and when C1 is '1', Q <= D1, which gives rx_clk <= '0'. During reset, both flip-flops are reset, giving rx_clk <= '0' while reset = '1'.
So I have a few questions:
Are the two clocks (not_rx_clk_int and rx_clk_int) going to be precisely 180 degrees out of phase when generated in this way? (by this way, I mean not_rx_clk_int <= not (rx_clk_int)). I assume not due to delta time? What are the implications of this?
What is the benefit of using the ODDR2 in the first place (why isn't rx_clk <= rx_clk_int adequate)? (Which leads to...)
What does it mean for a clock to be "balanced" as part of the clock tree? (clock tree mentioned briefly on page 59 here.)
Isn't rx_clk being gated during reset? Isn't this bad?
Is this the "standard" way of using a ODDR2 and/or performing this operation? Are there better options? (and hence, should I add this to my arsenal of useful VHDL bits and pieces? )
Feel free to suggest recommended reading and/or other resources. I don't want to blindly copy/paste this code into my project without knowing exactly what's going on here.
1) Are the two clocks (not_rx_clk_int and rx_clk_int) going to be precisely 180 degrees out of phase when generated in this way? (by this way, I mean not_rx_clk_int <= not (rx_clk_int)). I assume not due to delta time? What are the implications of this?
Yes, they will be pretty well exactly phased.
Delta delays are not at issue here. They only apply to HDL simulations, standing in place of unknown "real" delays. I would hope that Xilinx got their model correct so that both edges change in the same delta cycle! I.e. they do something like:
not_rx_clk <= not (rx_clk_int);
rx_clk <= rx_clk_int;
to match the deltas.
2) What is the benefit of using the ODDR2 in the first place (why isn't rx_clk <= rx_clk_int adequate)? (Which leads to...)
It ensures that the delay is predictable relative to the other IOs that you no doubt have synchronised with this clock. If you just drive the clock signal out of a pin, it has to come off the clock distribution network, through some routing, and then to the pin (as there's no direct route for a clock net to get to the IO pin). That's a delay which is unpredictable and likely to vary from one compile to another.
3) What does it mean for a clock to be "balanced" as part of the clock tree? (clock tree mentioned briefly on page 59 here.)
As I understand it, it means that the clock tree makes sure that the clock goes the same distance (approximately) to every destination.
4) Isn't rx_clk being gated during reset? Isn't this bad?
Yes, it is being turned on and off (I'd hesitate to use the word 'gated', as that means a specific thing - being fed through an AND gate - which this isn't). Only you can say if that matters - it depends on where it goes.
5) Is this the "standard" way of using a ODDR2 and/or performing this operation? Are there better options? (and hence, should I add this to my arsenal of useful VHDL bits and pieces? )
Three questions in one, sneaky :)
Yes, it's (a) standard way of using ODDR2 (the other standard use is for actual DDR data of course).
No, I don't know of a better way to simply get a clock out.
Yes, add it to your arsenal.
Partial set of answers:
1) I'm amused by the unnecessary brackets in not (rx_clk_int); like a lot of Xilinx cores, it makes me wonder if it's auto-translated from Verilog or something; there's a lot of really bad VHDL in some of them. (So I'm easily amused.) Anyway...
Synthesis tools probably optimise out the separate "not" and use the falling edge of rx_clk_int, so you certainly can get a 180-degree phase shift this way. (Whether it's guaranteed, or whether a more complex expression might fool synthesis, I can't say.)
2) Straight assignment would take rx_clk_int off the clock tree, onto ordinary routing, through an output buffer and the total delay would be anybody's guess. This way you have precisely timed clocks directly in the IOB for more predictable timing.
3) FFs and IOBs right next to the clock generator would otherwise see the clock before the ones in the far corner; balancing the clock tree means slowing down all the short paths to match the longest one. (You can see this on DIMM memory PCBs: a lot of zigzag lines on traces to lengthen them!)
4) I would expect it to be gated. Whether that's bad depends on what it's clocking. Perhaps an Ethernet expert can chip in here. Or chase the logic driving "reset" to this block; it may not be the main system reset, precisely to avoid this issue.
5) It's certainly a fairly well-known trick on newer FPGAs (ones with DDR registers), and very useful for clocks in addition to their main purpose (DDR interfaces to memory etc). Keep it handy!
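For reference, the same clock-forwarding instantiation in Verilog might look like this (a sketch assuming the Spartan-6/unisim ODDR2 primitive, with the nets declared elsewhere; the parameter values shown are the documented defaults):

ODDR2 #(
    .DDR_ALIGNMENT ("NONE"),   // no alignment of the two data inputs
    .INIT          (1'b0),     // initial Q value
    .SRTYPE        ("SYNC")    // synchronous set/reset
) rx_clk_ddr (
    .Q  (rx_clk),              // forwarded clock, straight into the IOB
    .C0 (rx_clk_int),
    .C1 (not_rx_clk_int),
    .CE (1'b1),
    .D0 (1'b1),                // Q = 1 during the C0 half-period
    .D1 (1'b0),                // Q = 0 during the C1 half-period
    .R  (reset),
    .S  (1'b0)
);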

Make a Verilog module sensitive to a switch turning off

This would make a Verilog module sensitive to a clock and a reset switch being turned on:
always @(posedge clk, posedge rst)
How would this be changed to being sensitive to a switch being turned off?
If you want your block to be sensitive to a switch being turned off, you'll want a negedge in front of the name of the switch input, for example "switch_line":
always @(posedge clock, posedge reset, negedge switch_line)
If you just want a flip-flop to check the status of the switch on every positive edge of the clock:
always @(posedge clock, posedge reset)
    if (!switch_line)
        // ...
    else
        // ...
Are you trying to model a flip-flop, latch, or perhaps some new type of hardware? Usually, only flipflops and latches are interested in the clock signal. A flip-flop with an asynchronous reset is modeled as
always @(posedge clock, posedge reset)
For a synchronous reset, drop the reset signal from the sensitivity list.
As per the user's comment, another option is to just plug in the go signal in place of the reset signal. When you are hooking up this module, you can do the following:
mymodule UUT(
.clock(clock),
.reset(~go),
//...
);
If you negate go, you'll get the same behavior as reset, just inverted (e.g. a signal going from 1->0).
Unless you are running at a very low clock frequency, the change in button state will be much slower than a clock period. Treating the button input as an async signal is therefore not a good idea. Instead you probably want to sample the input, and probably also remove glitches.
As a minimum, sample the input with a register and then have your control FSM look at that register; when the expected change is detected, move to the appropriate state. This means it will take the design 1-2 cycles to "react" to the button change. But, again, unless the clock frequency is very low, the period will be short enough that no human will notice a few cycles of latency. A sketch of this follows below.
If you connect the signal to the reset, you (1) lose all state info by pressing the button (bad) and (2) suddenly have two reset signals. Hot-rodding the design like that might work, but it is bad design methodology and will make your design sensitive to noise etc.
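A minimal sketch of the sampling approach (invented names; a debounce counter could be inserted between the synchronizer and the edge detect):

module btn_sync (
    input  logic clk,
    input  logic btn_raw,   // asynchronous switch input
    output logic btn_fell   // 1-cycle pulse when the switch turns off
);
    logic [2:0] sync;       // two FFs to synchronize + one for edge detect

    always_ff @(posedge clk)
        sync <= {sync[1:0], btn_raw};

    // Previous sample high, current sample low -> falling edge.
    assign btn_fell = sync[2] & ~sync[1];
endmodule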
