Passing clock between entities

Passing clock between entities - vhdl

my doubt is how to pass a clock between two entities that are at the same hierarchical level in VHDL.
What I have is an entity "wrapper" in which there are instantiated two components "comp_1" and "comp_2". comp_1 has an output port (let's say "clk_out") that is its clock and that must be also the clock for comp_2. Now, if I use a signal in "wrapper" to pass the clock from comp_1 to comp_2, this cause a functional error in simulation (at least with Modelsim), because the two designs are considered not synchronous (right?). Can this cause an error also in synthesis (with Xilinx)? How can I avoid the problem without changing all the structure?
architecture bhv of my_wrap is
signal tmp_clk : std_logic;
begin
comp_1_i : comp_1
port map(out_clk => tmp_clk,
...
);
compo_2_i : comp_2
port map(in_clk => tmp_clk,
...
);
In this case, in simulation there is the delta cycle problem in the signals between the two components. Can this problem also affect the implemented design on FPGA?

It sounds like you may have a delta cycle delay on the clock, which is a feature in VHDL, but it may appear as if clock and data is out of sync.
This only shows in simulation, but is general VHDL thus not ModelSim specific. After synthesis (in hardware) the internal delay gives similar behavior. Note that ModelSim has a feature ("Expanded Time Delta Mode") to show delta delays.
Without code, I guess that the generated clock in comp_1 is also used for output generation, besides being output on clk_out. Depending on the implementation, it may result in a delta cycle delay difference between clock and data, which is may appear as not synchronous, but it is actually a delta cycle issue.
A possible fix is to output the generated clock from comp_1 without using it, and then making an additional clk_in input on comp_1, similar to the clk_in on comp_2, and then use that clock internally in comp_1. The clock use will then be similar on comp_1 and comp_2, removing the issue with delta delays on clock.

As Morten also pointed out some source code could help making your question more precise.
There is nothing wrong in connecting the clock out signal from one component to the clock in signal of another component. What might be a problem in your case is the way you generate the clock signal.
Depending on your use case you have different options.
If your target is an FPGA you should use a clock generator IP form the given vendor.

Related

Integer output turns to binary in synthesize ISE

I have a VHDL BCD counter whose output is an integer value (digit).
But when I simulate the code in Xilinx ISE it shows the code's waveform in binary value. The code works but the output should be integer but it's not. I have tested this code in Modelsim and the output is correct and it's in integer value. This problem is in code synthesize too and the value is binary.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity bcdcnt is
Port ( clk : in STD_LOGIC;
digit : out INTEGER RANGE 0 TO 9);
end bcdcnt;
architecture Behavioral of bcdcnt is
begin
count: PROCESS(clk)
VARIABLE temp : INTEGER RANGE 0 TO 10;
BEGIN
IF (clk'EVENT AND clk = '1') THEN
temp := temp + 1;
IF (temp = 10) THEN temp := 0;
END IF;
END IF;
digit <= temp;
END PROCESS count;
end Behavioral;

That's what synthesis does.
That's what synthesis MUST do : it translates your high level design to the resources in your FPGA or ASIC, which are binary. So, what's the problem here?
If you need to simulate the post-synth result, the usual approach is to create a wrapper entity that takes the correct port types, and translates between those and the post-synthesis netlist component.
Then, simulation should work with either the original entity, or this wrapper entity, both of which have integer ports.
Better still, you can re-use the same entity, and add the wrapper as a second architecture, thus guaranteeing that it uses the same interface (ports).
(You can even instantiate both the original and post-synth in its wrapper in the testbench, in parallel with a comparator on their outputs, to see they both do the same thing. But note there will be gate-level delays between them; usually you check outputs only on clock edges so these do not matter.)
Another approach is to restrict port types on the top level of the design to binary types like std_logic_vector. This plays nicer with badly designed tools, like ISE where the automatically generated testbench will have binary port types, (I generally edit them back to the correct ones; it's almost easier to write TBs from scratch).
But it restricts you to using an obscure and complex design style instead of higher level abstractions like Integers.
It's bad - really bad - that this approach is taught and encouraged so widely. But it is, and sometimes you'll just have to live with it. (Even in this approach, there's no reason to avoid decent abstractions internal to the FPGA, as long as the synthesis tool understands them).
A third approach - roughly, "trust, but verify" - is to trust that synthesis tools are competently written - which is usually true - and forget about post-synthesis simulation.
Just verify the design thoroughly at the behavioural level in simulation, then synthesise it, and test in live FPGA.
99% of the time (unless you're writing really weird VHDL), synth and P&R have done the right thing, and any differences you see are due to the aforementioned I/O timings (gate delays at the I/O pins). Then model these in the testbench and/or wrapper until you see the same behaviour in both (fix anything that needs fixing, and re-synthesise).
In this approach you only need to bother with the (MUCH slower) post-synth and post-PAR simulations if you need to track down a suspected synthesis tool bug.
This does happen : I've seen two in a quarter century.
Most of the time I just use the third approach.

Detecting rising edge synchronization of 2 different clocks

How do you detect rising edge synchronization of 2 different clocks(different frequencies) in VHDL programming using Xilinx software?
There is a main clock of frequency 31.845 Mhz , and another clock of frequency 29.972 Mhz. So the basic aim is to trigger an action when there is synchronization between the rising edges of 2 clocks. We tried implementing it using flipflops but we could achieve only Level synchronization, not Edge sync.
And we cannot compare the rising edges of 2 different clocks in statements like IF and WAIT in vhdl, So that is out of question.
We are trying to count pulses using a counter. For that, we need to stop the count whenever edge matching takes place. We are trying to implement a method called 'Vernier Interpolation'.
Initially, we used the following statement code, but since rising edges of 2 different clocks (clk0, clk1) cannot be compared in an IF statement, we had to drop it.
if(rising_edge(clk0)=rising_edge(clk1)) then wait;
We then tried using WAIT statements (wait until) but it failed.
Then we tried using flipflops and delay circuits (D flipflop), but it resulted in level sync, and not Edge sync.

Firstly I'm not sure why you would want to do this. What you will get out is a new clock at the beat frequency between the two clocks.
The correct way to do this is to sample both clocks using another clock which is at least twice the frequency of the highest expected input. You could generate this higher clock using one of the PLLs in the device. x2 is a minimum. Ideally use a clock which is much higher than both sampled clocks.
Remember VHDL is not a language, it a description of synthesis of real hardware. So just saying Rising_Edge(clk1) = Rising_Edge(clk2) does not make the 'software' detect edges. All the function Rising_Edge really does is to tell the hardware to connect the clk signal to the clock input of a flipflop.
The proper solution is sample both 'clocks' in a process which is clocked by the a sample clock, look for edges (an edge being two subsequent samples that are different) then AND the result and latch if required.
sample code (untested, sorry no time right now).
entity twoclocks is
port (
op : out std_logic;
clk1 : in std_logic;
clk2 : in std_logic;
sample_clk : in std_logic);
end entity;
architecture RTL of twoclocks is
begin
process sample(sample_clock, clk1, clk2):
begin
if rising_edge(sample_clock):
clk1_d <= clk1;
clk2_d <= clk1;
if clk1_d != clk1 and clk2_d != clk2 then
op <= '1';
else
op <= '0';
end if;
end if;
end process;
end architecture;

The kind of vernier interpolator you want needs to be build using very tight timing constraints, thus you can probably not make it using VHDL alone. You need (a lot of) device specific constraints on resource locations and timing.
Please check out the work by A.Aloisio et al.. Aloisio and colleagues have build a vernier interpolator using specific Xilinx delay elements.
Standard VHDL synthesis is mostly suited for register transfer level descriptions. I.e. clocked/synchronous logic. But to compare these two inputs, you would need to sample them at a frequency of the least common multiple of both frequencies. For 31.845 MHz and 29.972 MHz that is a whopping 954.458340 MHz, which is a lot. I have seen these kind of speeds in FPGA logic though.
... But I'm thinking you might even need to double that, due to Nyquist. Maybe FPGA logic can nowadays handle 2 GHz swichting rate. But I'm not sure.
It might be possible to utilize a GT transceiver for this, but since that would be non-standard use of a such a transceiver, it might be hard to realize.

VHDL concurrent selective assignment synthesis

a real junior question with hopefully a junior answer, regarding one of the main assignments of VHDL (concurrent selective assignment) can anyone explain what a VHDL compiler would synthesise the following description into?
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;
ENTITY Q2 IS
PORT (a,b,c,d : IN std_logic;
EW_NS : OUT std_logic
);
END ENTITY Q2;
ARCHITECTURE hybrid OF Q2 IS
SIGNAL INPUT : std_logic_vector(3 DOWNTO 0);
SIGNAL EW_NS : std_logic;
BEGIN
INPUT <= (a & b & c & d); -- concatination
WITH (INPUT) SELECT
EW_NS <= '1' WHEN "0001"|"0010"|"0011"|"0110"|"1011",
'0' WHEN OTHERS;
END ARCHITECTURE hybrid;
Why do I ask? well I have previously gone about things the wrong way i.e. describing things on VHDL before making a block diagram of the components needed. I would envisage this been synthed as a group of and gate logic ?
Any help would be really helpful.
Thanks D

You need to look at the user guide for your target FPGA, and understand what is contained within one 'logic element' ('slice' in Xilinx terminology). In general an FPGA does not implement combinatorial logic by connecting up discrete gates like AND, OR, etc. Instead, a logic element will contain one or more 'look-up tables', with typically four (but now 6 in some newer devices) inputs. The inputs to this look up table (LUT) are the inputs to your logic function, and the output is one of the outputs of the function. The LUT is then programmed as a ROM, allowing your input signals to function as an address. There is one ROM entry for every possible combination of inputs, with the result being the intended logic function.
A function with several outputs would simply use several of these LUTs in parallel, with the same inputs, one LUT for each of the function's outputs. A function requiring more inputs than the LUT has (say, 7 inputs, where a LUT has only 4), simply combines two LUTs in parallel, using a multiplexer to choose between the output of the two LUTs. This final multiplexer uses one of the input signals as it's control, and again every possible combination of inputs is accounted for.
This may sound inefficient for creating something simple like an AND gate, but the benefit is that this simple building block (a LUT) can implement absolutely any combinatorial function. It's also worth noting that an FPGA tool chain is extremely good at optimising logic functions in order to simplify them, and to better map them into the FPGA. The LUT provides a highly generic element for these tools to target.
A logic element will also contain some dedicated resources for functions that aren't well suited to the LUT approach. These might include dedicated carry chains for adders, multiplexers for combining the output of several LUTS, registers (most designs are synchronous). LUTs can also sometimes be configured as small shift registers or RAM elements. External to the logic elements, there will be more specific blocks like large multipliers, larger memories, PLLs, etc, none of which can be as efficiently implemented using LUT resource. Again, this will all be explained in the user guide for your target FPGA.

Back in the day, your code would have been implemented as a single 74150 TTL circuit, which is a 16-to-1 mux. you have a 4-bit select (INPUT), and this selects one of 16 inputs to the chip, which is routed to a single output ('EW_NS`). The 74150 is obsolete and I can't find any datasheets, but it's easy to find diagrams of what an 8-to-1 mux looks like (here, for example). The 16->1 is identical, but everything is wider. My old TI databook shows basically exactly the diagram at this link doubled up.
But - wait. Your problem is easier, because you're not routing real inputs to the output - you're just setting fixed data values. On the '150, you do this by wiring 5 of the 16 inputs to 1, and the remaining 11 to 0. This makes the logic much easier.
The 74150 has basiscally exactly the same functionality as a 4-input look-up table (where the fixed look-up data is the same as fixed levels at the '150 inputs), so it's trivial to implement your entire circuit in a single LUT in an FPGA, as per scary_jeff's answer, rather than using a NAND-level implementation. In a proper chip, though, it would be implemented as a sum-of-products, or something similar (exactly what's in the linked diagram). In this case, draw a K-map and find a minimum solution. My 2 minutes on the back of an envelope comes up with three 3-input AND gates, driving a 3-input OR gate. I'll leave it as an exercise to you to check this :)

Do all Flip Flops in a design need to be resettable (ASIC)?

I'm trying to understand clock-reset in a chip. In a design what criteria are used to decide whether a flop should be assigned to a value (typically to zero) during reset?
always_ff #(posedge clk or negedge reset) begin : process_w_reset
if(~reset) begin
flop1 <= '0;
....
end else begin
if (condition) begin
flop1 <= something ;
....
end
end
end
always_ff #(posedge clk) begin : process_wo_reset
if (condition) begin
flop1 <= something ;
....
end
end
Is it a bad practice to not to reset a flop which is used later as a control signal in a comb logic? What if the design makes sure that the flop will have a valid value (0 or 1) assigned to it before its used in a comb logic block (i.e. in a if statement or in FSM comb logic) ?
I feel like it's better to always reset all the flops in the design. In that way there won't be any Xs after reset in the chip. However, it seems like for datapath logic, resetting flop might need not be a big deal as it'll be just pipe stages. However if a flop is in control path (i.e. FSM next state comb logic) then it should be reset to a default value. Is my understanding correct? I don't know much about DFT and not sure if it has any other implication.

Assuming that reset means asynchronous reset, as in the code examples.
The answer is partly opinion based, since a design can be made to work with reset of a minimum number of the Flip-Flops (FFs) and all of the FFs.
I suggest that a minimum number of FFs are reset, and typically that leads to reset of most FFs in the control path, and no reset of FFs in the data path. The advantages of this approach are outlined below.
Simulation is often conservative with respect to propagation of uninitialized values, both for Verilog and VHDL, so it is like simulation can check both 0 and 1 values at once when the value is uninitialized.
Bugs due to FFs that are not reset, are therefore likely to show earlier in verification with simulation, and the designer thereby gets valuable feedback about wrong design assumptions, which may lead to corrections in the design that fixes other bugs. Just resetting all the FFs is likely to hide such bugs.
It may seem like design and verification is just easier if all FFs are reset, both in control and data path, since it fixes all those "annoying" X propagation in the design. But it requires an increased number of tests in order to verify all value combinations when X propagation is suppressed through reset.
Implementation gives a smaller load on the reset signal, so it is easier to meet timing of the reset net throughout the chip.
DFT (Design For Test) in general, then adding reset to the FFs will not aid DFT in finding nets stuck at reset value. With a DFT scan chain approach where all the FFs are loaded through the scan chain, then the lack of reset on some FFs will not require more vectors.

Generally you need to think about where the 'X's will propagate in your simulation and which ones matter and which ones are don't care conditions. For example, if you have a block of logic which doesn't start operating until an enable bit is set, so long as the enable bit itself is set and enough upstream logic is reset so reset values will propagate through to the enabled logic in time, you are most likely OK with not reseting the logic in between. However, you do want to reset any logic that feeds back into itself, (for example state machines) otherwise the upstream resets will never be able to establish a known state in the feedback block.

I agree with Morten Zilmer that you should only reset flops that require resetting, although my background is more FPGA than ASIC.
It's worth pointing out there is a gotcha in Verilog / SystemVerilog - if you have a clocked process that drives registers that are reset and registers that aren't you will end up inferring a clock enable or an additional mux on the input of your flip-flop.
This is usually not what was intended.
There is a more detailed explanation in this answer. I also wrote a blog post outlining a mechanism for abstracting away synchronous/asynchronous and active high/low reset.

As a general rule of thumb, you should probably always reset control signals.
For data flops, resetting can cost you area, so it really depends on whether you care about area.
In recent years simulators started to support X propagation modes that allow you to catch some of the X issues in RTL (instead of gate level simulation). It is a good practice to run these to make sure you don't have a reset problem with uninitialized sram or flops.

variable assignment and synthesizable code

Simply having a code like this :
if(rising_edge(clk)) then
temp(0):="001";
temp(1):="011";
temp(2):="101";
temp(3):="000";
temp(0):=temp(3)xor temp(5);
end if
For the example above all this variable assignment would be done in 1 clock cycle which is pretty unpractical. In the behavioral simulation it works fine but in post synthesis it's messed up. Can I add like a delay or a sth like a wait(wait statement is un-synthesizable) to make it wait util the variable gets its value before jumping to the next line?

Doing all of those things in one clock cycle is simple. Hardware is extremely fast, and FPGA clock rates aren't that high relative to processors.
Since you are using variables, the intermediate results are used immediately. If you want a more explicit delay, you could use a signal. The above code with signals would use temp(3) from the previous rising edge.

for synthesis you can not make delays like wait. well defined, controllable delays in synthesis can only be made with pipelining (clock cycles as delay units).

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio