I made a VHDL Design which consists of 2 Modules. One handles the communication and runs at 100 Mhz (I can't change this). The other module does the calculation and has to run at 45 Mhz because of timing.
The second module has an input pin called "newPair" which I want to set to high when new data is ready and can be processed by the second module. Now my second module only works on the rising edge of the clock so I need to set this pin one 45 Mhz clock cycle high and then pull it down low. This has to be done from the module which runs at 45 Mhz. How can I accomplish this? I though about creating a DCM for this purpose but this seems a little bit overpowered for this.
Thanks!
Update:
I'm working on a Spartan-6 (xc6slx150)
Your design has 2 clock domains, so you'll need synchronizer circuits to transfer information from one clock domain to the other.
Part 1 - Synchronizers
The basic synchronizer is build of 2 chained D-FF. This synchronizer can be used for flag signals (these are signals not changing very often). It can not be used for strobe signals (these are signals, which are high for 1 cycle), because strobes can be missed or double seen on the destination clock domain.
2 D-FF synchronizer:
genLoop : for i in Input'range generate
signal Data_async : STD_LOGIC;
signal Data_meta : STD_LOGIC := '0');
signal Data_sync : STD_LOGIC := '0' ;
begin
Data_async <= Input(i);
process(Clock)
begin
if rising_edge(Clock) then
Data_meta <= Data_async;
Data_sync <= Data_meta;
end if;
end process;
Output(i) <= Data_sync;
end generate;
This code can be improved by vendor specific VHDL attributes. See my linked source for a generic, an Altera and a Xilinx variant.
Source: PoC.misc.sync.Bits
See also these timing constraints stored in UCF files for Xilinx ISE designs:
- https://github.com/VLSI-EDA/PoC/blob/master/ucf/MetaStability.ucf
- https://github.com/VLSI-EDA/PoC/blob/master/ucf/misc/sync/sync_Bits_Xilinx.ucf
Part 2 - Building a synthesizer circuit for strobe signals
A strobe capable cross clock synchronizer is build of:
1 T-FF to encode signal changes in the source clock domain
2 D-FF as a flag synchronizer, as described in part 1
a change detector to restore the original value in the destination clock domain (1 D-FF and an XOR)
This synchronizer can be used to transfer your newPair signal to the 45 MHz clock domain. You will need the same circuit for the way back :)
The following example implements a busy signal to indicate the transfer process. Asserting Input while Busy is high leads to ignored strobes.
entity sync_Strobe IS
generic (
BITS : POSITIVE := 1; -- number of bit to be synchronized
GATED_INPUT_BY_BUSY : BOOLEAN := TRUE -- use gated input (by busy signal)
);
port (
Clock1 : in STD_LOGIC; -- <Clock> input clock domain
Clock2 : in STD_LOGIC; -- <Clock> output clock domain
Input : in STD_LOGIC_VECTOR(BITS - 1 downto 0); -- #Clock1: input bits
Output : out STD_LOGIC_VECTOR(BITS - 1 downto 0); -- #Clock2: output bits
Busy : out STD_LOGIC_VECTOR(BITS - 1 downto 0) -- #Clock1: busy bits
);
end entity;
architecture rtl of sync_Strobe is
attribute SHREG_EXTRACT : STRING;
signal syncClk1_In : STD_LOGIC_VECTOR(BITS - 1 downto 0);
signal syncClk1_Out : STD_LOGIC_VECTOR(BITS - 1 downto 0);
signal syncClk2_In : STD_LOGIC_VECTOR(BITS - 1 downto 0);
signal syncClk2_Out : STD_LOGIC_VECTOR(BITS - 1 downto 0);
begin
gen : for i in 0 to BITS - 1 generate
signal D0 : STD_LOGIC := '0';
signal T1 : STD_LOGIC := '0';
signal D2 : STD_LOGIC := '0';
signal Changed_Clk1 : STD_LOGIC;
signal Changed_Clk2 : STD_LOGIC;
signal Busy_i : STD_LOGIC;
-- Prevent XST from translating two FFs into SRL plus FF
attribute SHREG_EXTRACT OF D0 : signal is "NO";
attribute SHREG_EXTRACT OF T1 : signal is "NO";
attribute SHREG_EXTRACT OF D2 : signal is "NO";
begin
process(Clock1)
begin
if rising_edge(Clock1) then
-- input delay for rising edge detection
D0 <= Input(I);
-- T-FF to converts a strobe to a flag signal
if (GATED_INPUT_BY_BUSY = TRUE) then
T1 <= (Changed_Clk1 and not Busy_i) xor T1;
else
T1 <= Changed_Clk1 xor T1;
end if;
end if;
end process;
-- D-FF for level change detection (both edges)
D2 <= syncClk2_Out(I) when rising_edge(Clock2);
-- assign syncClk*_In signals
syncClk2_In(I) <= T1;
syncClk1_In(I) <= syncClk2_Out(I); -- D2
Changed_Clk1 <= not D0 and Input(I); -- rising edge detection
Changed_Clk2 <= syncClk2_Out(I) xor D2; -- level change detection; restore strobe signal from flag
Busy_i <= T1 xor syncClk1_Out(I); -- calculate busy signal
-- output signals
Output(I) <= Changed_Clk2;
Busy(I) <= Busy_i;
end generate;
syncClk2 : entity PoC.sync_Bits
generic map (
BITS => BITS -- number of bit to be synchronized
)
port map (
Clock => Clock2, -- <Clock> output clock domain
Input => syncClk2_In, -- #async: input bits
Output => syncClk2_Out -- #Clock: output bits
);
syncClk1 : entity PoC.sync_Bits
generic map (
BITS => BITS -- number of bit to be synchronized
)
port map (
Clock => Clock1, -- <Clock> output clock domain
Input => syncClk1_In, -- #async: input bits
Output => syncClk1_Out -- #Clock: output bits
);
end architecture;
Source: PoC.misc.sync.Strobe
Part 2 - Special synthesizer circuits
I assume you'll also transfer data from one clock domain to the other one. So you'll need either a multi bit synchronizer (build upon the strobe synchronizer) or a cross clock capable FIFO.
The PoC-Library, I'm contributing to, has also multi bit/vector synchronizers. See the other modules in the linked source folder. And there is a cross clock / independent clock (ic) FIFO, too.
You don't need to assert some signal for a specific amount of time.
If the data exchange between first and second modules happens sparsely, you can use level triggering to let the second block know the data is ready. (You can read this answer to understand the difference between level and edge triggering).
If you need to handle streaming data, you'll need to use an asynchronous FIFO. (This answer may give you more info on this).
As an advice, I wouldn't use the signal assertion for a specific number of cycles approach because it's not a good practice for reusable code (it works for this specific design but you may need to recalibrate the number of cycles if the period of your clocks change for some reason).
Related
I am taking a vhdl online course.
One of the laboratory work is: "Based on frequency divider and 8-bit cyclic shift register implement a ring counter with a shift period of 1 s."
The task says that the most significant bit of the counter cannot be used as the clock signal of the shift register (i.e. in the if rising_edge (shifter (MSB)) construction.
It is necessary to form the enable signal as a strobe.
I did the job. The result is accepted.
I have a question related to shift register by enable.
shift_reg_proc : process(clk)
begin
if (rising_edge(clk)) then
if (srst = '1') then
shift_reg <= "10000000";
elsif (en = '1') then
shift_reg <= shift_reg(0) & shift_reg(7 downto 1);
end if;
end if;
end process shift_reg_proc
If the duration of the enable signal is 1 period clk, then there is a probability that at the moment of rising_edge (clk) the en signal level will not have time to become = 1.
If this is the case, then it is not guaranteed that the register shift will occur in the next second.
Is there any "correct" way to do this task?
Is it so? Is my decision correct? Is the lab clue misleading?
I am attaching the implementation code, test bench and wave image.
ring_counter.vhd
--------------------------------------------------------------------------------
-- Based on frequency divider and 8-bit cyclic shift register implement a ring
-- counter with a shift period of 1 s.
--------------------------------------------------------------------------------
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
entity ring_counter is
port(clk : in std_logic;
srst : in std_logic;
dout : out std_logic_vector(7 downto 0);
en_o : out std_logic
);
end entity ring_counter;
architecture behave of ring_counter is
signal cntr : std_logic_vector(26 downto 0) := (others => '0');
signal cntr_msb_delayed : std_logic;
signal shift_reg : std_logic_vector(7 downto 0);
signal en : std_logic;
constant cntr_msb_num : integer := 4; -- 26 for DE board, 4 for test bench
begin
-- signal for test bench
en_o <= en;
--------------------------------------------------------------------------------
-- Counter implementation
--------------------------------------------------------------------------------
cntr_proc : process(clk)
begin
if (rising_edge(clk)) then
if (srst = '1') then
cntr <= (others => '0');
else
cntr <= unsigned(cntr) + 1;
end if;
end if;
end process cntr_proc;
----------------------------------------------------------------------------
-- Shift register implementation
----------------------------------------------------------------------------
shift_reg_proc : process(clk)
begin
if (rising_edge(clk)) then
if (srst = '1') then
shift_reg <= "10000000";
elsif (en = '1') then
shift_reg <= shift_reg(0) & shift_reg(7 downto 1);
end if;
end if;
end process shift_reg_proc;
dout <= shift_reg;
----------------------------------------------------------------------------
-- Enable signal generation
----------------------------------------------------------------------------
-- Counter MSB delay for 1 period of clk
delay_proc : process(clk)
begin
if (rising_edge(clk)) then
cntr_msb_delayed <= cntr(cntr_msb_num);
end if;
end process delay_proc;
en <= cntr(cntr_msb_num) and not cntr_msb_delayed;
end architecture behave;
ring_counter_tb.vhd
library ieee;
use ieee.std_logic_1164.all;
entity ring_counter_tb is
end entity ring_counter_tb;
architecture behave of ring_counter_tb is
component ring_counter is
port(clk : in std_logic;
srst : in std_logic;
dout : out std_logic_vector(7 downto 0);
en_o : out std_logic
);
end component ring_counter;
signal clk : std_logic;
signal srst : std_logic;
signal dout : std_logic_vector(7 downto 0);
signal en_o : std_logic;
constant clk_period : time := 4 ns;
begin
dut : ring_counter
port map (
clk => clk,
srst => srst,
dout => dout,
en_o => en_o
);
clk_gen : process
begin
clk <= '0';
wait for clk_period;
loop
clk <= '0';
wait for clk_period/2;
clk <= '1';
wait for clk_period/2;
end loop;
end process clk_gen;
srst <= '0',
'1' after 100 ns,
'0' after 150 ns;
end architecture behave;
wave for test bench
TL;DR
The rising edge of clk after which en is raised is not the same as the rising edge of clk at which your shift register shifts. en is asserted high after rising edge N and de-asserted after rising edge N+1. Your shift register is thus shifted at rising edge N+1.
So you have about one clock period delay between assertions of en and the register shifts. You don't care because your specification says that you want a shift period of 1 second. As long as en is periodic with a period of one second, even if there is a small constant delay between en and your shift register, you fulfill the specifications.
But what is of uttermost importance is that, as it is seen by your shift register, en is asserted high sufficiently after rising edge N to avoid a too early shift and de-asserted sufficiently after rising edge N+1 to allow a good nice shift. If you are interested in this too, please continue reading.
Detailed explanation
Your en signal is computed from the outputs of registers synchronized on the same clock clk as your shift register. You cannot have any hold time problem there: the propagation delay from the rising edge of the clock to the outputs of your cntr and cntr_msb_delayed registers guarantee that en will arrive at your shift register sufficiently after the rising edge of the clock that caused it (assuming you don't have large clock skews). It cannot arrive too early.
Can it arrive too late (setup time problem)? Yes, if your clock frequency is too high. The clock period would then be too short, en would not have enough time to be computed, stabilize and propagate to your shift register before the next rising edge of the clock and anything could happen (no shift at all, partial shift, metastabilities...)
This is a very common concern in digital design: you cannot operate at an arbitrarily high clock frequency. If you could you would clock your own computer at yotta-Hz or even more, instead of giga-Hz, and everything would become instantaneous. It would be nice but it is not how the real world works.
In a digital design you always have what is called a critical path. It is a particular chain of logic gates between a set of source registers and a destination register, along which the propagation delay of electrical signals is the largest of the whole design.
Which path it is among all possible and the total delay along this path depend on your design's complexity (e.g. the number of bits of your counter), your target hardware technology (e.g. the FPGA of your prototyping board) and the operating conditions (temperature, voltage of power supply, speed-grade of your FPGA).
(Yes, it depends also on the temperature, reason why hard-core gamers cool down their computers with high performance cooling systems. This avoids the destruction of the silicon and allows to operate the computer at a higher clock frequency with more frames per second and a better user experience.)
The largest time it takes for the signals to travel from the source clock-edge to the arrival at destination, augmented by a small security margin called the setup time of the destination register, is the smallest clock period (highest clock frequency) at which you can run your design. As long as you don't exceed this limit your system works as expected.
Hardware design tool chains usually comprise a Static Timing Analyzer (STA) that tells you what this maximum clock frequency is for your design, your target, and your operating conditions. If it tells you 500 MHz and you need only 350 MHz, everything is fine (you could however investigate and see if you could modify your design, save some hardware, and still run at 350 MHz).
But if you need 650 MHz it is time to roll up your sleeves, look at the critical path (the STA will also show the path), understand it and rework your design to speed it up (e.g. pipeline long computations, use carry look ahead adders instead of carry ripple...) Note that, usually, when you encounter timing closure problems you do not consider only one critical path but the set of all paths that exceed your time budget because you want to eliminate them all, not just the worst. This is why the STA gives you not only the worst critical path but a list of critical paths, in decreasing order of severity.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
I want to transfer a pulse from a clock domain clk1 to another clock domain clk2, but we don't know which one is faster than the other!
What is the best way to do that?
Thanks,
You need a strobe synchronizer.
A strobe synchronizer translates a rising edge on input to a level change (T-FF). This level change is transferred to the second clock domain via 2-FF synchronizer. The information is then restored by an XOR gate. (Note: A T-FF is a D-FF with a XOR to invert the state on every high input.)
In addition, a busy state can be calculated to notify the sending clock domain about the state of the overall circuit. This busy signal can be used to lock the input's rising edge detection.
The circuit looks like this:
Source Code for multiple bits at once:
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;
library PoC;
use PoC.sync.all;
entity sync_Strobe is
generic (
BITS : positive := 1; -- number of bit to be synchronized
GATED_INPUT_BY_BUSY : boolean := TRUE; -- use gated input (by busy signal)
SYNC_DEPTH : T_MISC_SYNC_DEPTH := T_MISC_SYNC_DEPTH'low -- generate SYNC_DEPTH many stages, at least 2
);
port (
Clock1 : in std_logic; -- <Clock> input clock domain
Clock2 : in std_logic; -- <Clock> output clock domain
Input : in std_logic_vector(BITS - 1 downto 0); -- #Clock1: input bits
Output : out std_logic_vector(BITS - 1 downto 0); -- #Clock2: output bits
Busy : out std_logic_vector(BITS - 1 downto 0) -- #Clock1: busy bits
);
end entity;
architecture rtl of sync_Strobe is
attribute SHREG_EXTRACT : string;
signal syncClk1_In : std_logic_vector(BITS - 1 downto 0);
signal syncClk1_Out : std_logic_vector(BITS - 1 downto 0);
signal syncClk2_In : std_logic_vector(BITS - 1 downto 0);
signal syncClk2_Out : std_logic_vector(BITS - 1 downto 0);
begin
gen : for i in 0 to BITS - 1 generate
signal D0 : std_logic := '0';
signal T1 : std_logic := '0';
signal D2 : std_logic := '0';
signal Changed_Clk1 : std_logic;
signal Changed_Clk2 : std_logic;
signal Busy_i : std_logic;
-- Prevent XST from translating two FFs into SRL plus FF
attribute SHREG_EXTRACT of D0 : signal is "NO";
attribute SHREG_EXTRACT of T1 : signal is "NO";
attribute SHREG_EXTRACT of D2 : signal is "NO";
begin
process(Clock1)
begin
if rising_edge(Clock1) then
-- input delay for rising edge detection
D0 <= Input(i);
-- T-FF to converts a strobe to a flag signal
if GATED_INPUT_BY_BUSY then
T1 <= (Changed_Clk1 and not Busy_i) xor T1;
else
T1 <= Changed_Clk1 xor T1;
end if;
end if;
end process;
-- D-FF for level change detection (both edges)
D2 <= syncClk2_Out(i) when rising_edge(Clock2);
-- assign syncClk*_In signals
syncClk2_In(i) <= T1;
syncClk1_In(i) <= syncClk2_Out(i); -- D2
Changed_Clk1 <= not D0 and Input(i); -- rising edge detection
Changed_Clk2 <= syncClk2_Out(i) xor D2; -- level change detection; restore strobe signal from flag
Busy_i <= T1 xor syncClk1_Out(i); -- calculate busy signal
-- output signals
Output(i) <= Changed_Clk2;
Busy(i) <= Busy_i;
end generate;
syncClk2 : entity PoC.sync_Bits
generic map (
BITS => BITS, -- number of bit to be synchronized
SYNC_DEPTH => SYNC_DEPTH
)
port map (
Clock => Clock2, -- <Clock> output clock domain
Input => syncClk2_In, -- #async: input bits
Output => syncClk2_Out -- #Clock: output bits
);
syncClk1 : entity PoC.sync_Bits
generic map (
BITS => BITS, -- number of bit to be synchronized
SYNC_DEPTH => SYNC_DEPTH
)
port map (
Clock => Clock1, -- <Clock> output clock domain
Input => syncClk1_In, -- #async: input bits
Output => syncClk1_Out -- #Clock: output bits
);
end architecture;
The source code can be found here: PoC.misc.sync.Strobe
Another solution to your issue is the Flancter, best explained by Doulos:
https://www.doulos.com/knowhow/fpga/fastcounter/
Hey I have a verilog version of this code compatible for xilinx tools,
src_pulse -> src_clk level_convertor-> 2-Stage dest Synchronizer -> dest_pulse detector
this is the sequence steps to follow when transmitting a pulse between 2 relation unknown clock domains.
module pulse_cdc(
input src_clk,
input src_pulse,
input dest_clk,
output logic dest_pulse
);
(* ASYNC_REG = "TRUE" *)logic [2:0] dest_sync;
logic src_clk_level = '0;
//------------------------------//
//-- Pulse to level convertor --//
//------------------------------//
always_ff #(posedge src_clk)
if(src_pulse) src_clk_level <= #10 ~src_clk_level;
else src_clk_level <= #10 src_clk_level;
//----------------------------//
//------- Synchronizer -------//
//----------------------------//
always_ff #(dest_clk)begin
dest_sync[0] <= #10 src_clk_level;
dest_sync[1] <= #10 dest_sync[0];
dest_sync[2] <= #10 dest_sync[1];
end
//-------------------------//
//--- Pulse Generator -----//
//-------------------------//
always_comb dest_pulse = dest_sync[1] ^ dest_sync[2];
endmodule
In the past I asked a question about resets, and how to divide a high clock frequency down to a series of lower clock square wave frequencies, where each output is a harmonic of one another e.g. the first output is 10 Hz, second is 20 Hz etc.
I received several really helpful answers recommending what appears to be the convention of using a clock enable pin to create lower frequencies.
An alternative since occurred to me; using a n bit number that is constantly incremented, and taking the last x bits of the number as the clock ouputs, where x is the number of outputs.
It works in synthesis for me - but I'm curious to know - as I've never seen it mentioned anywhere online or on SO, am I missing something that means its actually a terrible idea and I'm simply creating problems for later?
I'm aware that the limitations on this are that I can only produce frequencies that are the input frequency divided by a power of 2, and so most of the time it will only approximate the desired output frequency (but will still be of the right order). Is this limitation the only reason it isn't recommended?
Thanks very much!
David
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
library UNISIM;
use UNISIM.VComponents.all;
use IEEE.math_real.all;
ENTITY CLK_DIVIDER IS
GENERIC(INPUT_FREQ : INTEGER; --Can only divide the input frequency by a power a of 2
OUT1_FREQ : INTEGER
);
PORT(SYSCLK : IN STD_LOGIC;
RESET_N : IN STD_LOGIC;
OUT1 : OUT STD_LOGIC; --Actual divider is 2^(ceiling[log2(input/freq)])
OUT2 : OUT STD_LOGIC); --Actual output is input over value above
END CLK_DIVIDER;
architecture Behavioral of Clk_Divider is
constant divider : integer := INPUT_FREQ / OUT1_FREQ;
constant counter_bits : integer := integer(ceil(log2(real(divider))));
signal counter : unsigned(counter_bits - 1 downto 0) := (others => '0');
begin
proc : process(SYSCLK)
begin
if rising_edge(SYSCLK) then
counter <= counter + 1;
if RESET_N = '0' then
counter <= (others => '0');
end if;
end if;
end process;
OUT1 <= counter(counter'length - 1);
OUT2 <= not counter(counter'length - 2);
end Behavioral;
Functionally the two outputs OUT1 and OUT2 can be used as clocks, but that method of making clocks does not scale and is likely to cause problems in the implementation, so it is a bad habit. However, it is of course important to understand why this is so.
The reason it does not scale, is that every signal used as clock in a FPGA is to be distributed through a special clock net, where the latency and skew is well-defined, so all flip-flops and memories on each clock are updated synchronously. The number of such clock nets is very limited, usually in the range of 10 to 40 in a FPGA device, and some restrictions on use and location makes it typically even more critical to plan the use of clock nets. So it is typically required to reserve clock nets for only real asynchronous clocks, where there is no alternative than to use a clock net.
The reason it is likely to cause problems, is that clocks created based on bits in a counter have no guaranteed timing relation. So if it is required to moved data between these clock domains, it requires additional constrains for synchronization, in order to be sure that the Clock Domain Crossing (CDC) is handled correctly. This is done through constrains for synthesis and/or Static Timing Analysis (STA), and is usually a little tricky to get right, so using a design methodology that simplifies STA is habit that saves design time.
So in designs where it is possible to use a common clock, and then generate synchronous clock enable signals, this should be the preferred approach. For the specific design above, a clock enable can be generated simply by detecting the '0' to '1' transition of the relevant counter bit, and then assert the clock enable in the single cycle where the transition is detected. Then a single clock net can be used, together with 2 clock enables like CE1 and CE2, and no special STA constrains are required.
Morten already pointed out the theory in his answer.
With the aid of two examples, I will demonstrate the problems you encounter when using a generated clock instead of clock enables.
Clock Distribution
At first, one must take care that a clock arrives at (almost) the same time at all destination flip-flops. Otherwise, even a simple shift register with 2 stages like this one would fail:
process(clk_gen)
begin
if rising_edge(clk_gen) then
tmp <= d;
q <= tmp;
end if;
end if;
The intended behavior of this example is that q gets the value of d after two rising edges of the generated clock clock_gen.
If the generated clock is not buffered by a global clock buffer, then the delay will be different for each destination flip-flop because it will be routed via the general-purpose routing.
Thus, the behavior of the shift register can be described as follows with some explicit delays:
library ieee;
use ieee.std_logic_1164.all;
entity shift_reg is
port (
clk_gen : in std_logic;
d : in std_logic;
q : out std_logic);
end shift_reg;
architecture rtl of shift_reg is
signal ff_0_q : std_logic := '0'; -- output of flip-flop 0
signal ff_1_q : std_logic := '0'; -- output of flip-flop 1
signal ff_0_c : std_logic; -- clock input of flip-flop 0
signal ff_1_c : std_logic; -- clock input of flip-flop 1
begin -- rtl
-- different clock delay per flip-flop if general-purpose routing is used
ff_0_c <= transport clk_gen after 500 ps;
ff_1_c <= transport clk_gen after 1000 ps;
-- two closely packed registers with clock-to-output delay of 100 ps
ff_0_q <= d after 100 ps when rising_edge(ff_0_c);
ff_1_q <= ff_0_q after 100 ps when rising_edge(ff_1_c);
q <= ff_1_q;
end rtl;
The following test bench just feeds in a '1' at input d, so that, q should be '0' after 1 clock edge an '1' after two clock edges.
library ieee;
use ieee.std_logic_1164.all;
entity shift_reg_tb is
end shift_reg_tb;
architecture sim of shift_reg_tb is
signal clk_gen : std_logic;
signal d : std_logic;
signal q : std_logic;
begin -- sim
DUT: entity work.shift_reg port map (clk_gen => clk_gen, d => d, q => q);
WaveGen_Proc: process
begin
-- Note: registers inside DUT are initialized to zero
d <= '1'; -- shift in '1'
clk_gen <= '0';
wait for 2 ns;
clk_gen <= '1'; -- just one rising edge
wait for 2 ns;
assert q = '0' report "Wrong output" severity error;
wait;
end process WaveGen_Proc;
end sim;
But, the simulation waveform shows that q already gets '1' after the first clock edge (at 3.1 ns) which is not the intended behavior.
That's because FF 1 already sees the new value from FF 0 when the clock arrives there.
This problem can be solved by distributing the generated clock via a clock tree which has a low skew.
To access one of the clock trees of the FPGA, one must use a global clock buffer, e.g., BUFG on Xilinx FPGAs.
Data Handover
The second problem is the handover of multi-bit signals between two clock domains.
Let's assume we have 2 registers with 2 bits each. Register 0 is clocked by the original clock and register 1 is clocked by the generated clock.
The generated clock is already distributed by clock tree.
Register 1 just samples the output from register 0.
But now, the different wire delays for both register bits in between play an important role. These have been modeled explicitly in the following design:
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;
entity handover is
port (
clk_orig : in std_logic; -- original clock
d : in std_logic_vector(1 downto 0); -- data input
q : out std_logic_vector(1 downto 0)); -- data output
end handover;
architecture rtl of handover is
signal div_q : std_logic := '0'; -- output of clock divider
signal bufg_o : std_logic := '0'; -- output of clock buffer
signal clk_gen : std_logic; -- generated clock
signal reg_0_q : std_logic_vector(1 downto 0) := "00"; -- output of register 0
signal reg_1_d : std_logic_vector(1 downto 0); -- data input of register 1
signal reg_1_q : std_logic_vector(1 downto 0) := "00"; -- output of register 1
begin -- rtl
-- Generate a clock by dividing the original clock by 2.
-- The 100 ps delay is the clock-to-output time of the flip-flop.
div_q <= not div_q after 100 ps when rising_edge(clk_orig);
-- Add global clock-buffer as well as mimic some delay.
-- Clock arrives at (almost) same time on all destination flip-flops.
clk_gen_bufg : BUFG port map (I => div_q, O => bufg_o);
clk_gen <= transport bufg_o after 1000 ps;
-- Sample data input with original clock
reg_0_q <= d after 100 ps when rising_edge(clk_orig);
-- Different wire delays between register 0 and register 1 for each bit
reg_1_d(0) <= transport reg_0_q(0) after 500 ps;
reg_1_d(1) <= transport reg_0_q(1) after 1500 ps;
-- All flip-flops of register 1 are clocked at the same time due to clock buffer.
reg_1_q <= reg_1_d after 100 ps when rising_edge(clk_gen);
q <= reg_1_q;
end rtl;
Now, just feed in the new data value "11" via register 0 with this testbench:
library ieee;
use ieee.std_logic_1164.all;
entity handover_tb is
end handover_tb;
architecture sim of handover_tb is
signal clk_orig : std_logic := '0';
signal d : std_logic_vector(1 downto 0);
signal q : std_logic_vector(1 downto 0);
begin -- sim
DUT: entity work.handover port map (clk_orig => clk_orig, d => d, q => q);
WaveGen_Proc: process
begin
-- Note: registers inside DUT are initialized to zero
d <= "11";
clk_orig <= '0';
for i in 0 to 7 loop -- 4 clock periods
wait for 2 ns;
clk_orig <= not clk_orig;
end loop; -- i
wait;
end process WaveGen_Proc;
end sim;
As can be seen in the following simulation output, the output of register 1 toggles to an intermediate value of "01" at 3.1 ns first because the input of register 1 (reg_1_d) is still changing when the rising edge of the generated clock occurs.
The intermediate value was not intended and can lead to undesired behavior. The correct value is seen not until another rising edge of the generated clock.
To solve this issue, one can use:
special codes, where only one bit flips at a time, e.g., gray code, or
cross-clock FIFOs, or
handshaking with the help of single control bits.
For speed measurement of an electric motor I would like to count the amount of rising and falling edges of an encoder-input in a time-interval of 10ms.
To do this I have implementet a clock divider for my 40 MHz Clock as follows:
entity SpeedCLK is
Port ( CLK : in STD_LOGIC;
CLKspeed : out STD_LOGIC);
end SpeedCLK;
architecture Behavioral of SpeedCLK is
signal CounterCLK : NATURAL range 0 to 400001;
begin
SpeedCounter : process
begin
wait until rising_edge(CLK);
CounterCLK <= CounterCLK + 1;
if CounterCLK < 400000 then
CLKspeed <= '0';
else
CLKspeed <= '1';
CounterCLK <= 0;
end if;
end process SpeedCounter;
end Behavioral;
This should make CLKSpeed '1' every 10 ms. I use this block as a component in my Toplevel VHDL-Module. In the next Block I count the Edges from my encoderinput(QEPA) with a shift register to debounce the Input Signal.
entity SpeedMeasure is
Port (
QEPA : in STD_LOGIC;
CLK : in STD_LOGIC;
CLKSpeed : in STD_LOGIC;
Speed : OUT INTEGER -- in rpm
);
end SpeedMeasure;
architecture Behavioral of SpeedMeasure is
begin
Edges : process(CLKSpeed, CLK)
variable detect : STD_LOGIC_VECTOR(5 downto 0) := "000000";
variable EdgesPerTime : INTEGER := 0;
variable CountQEPA : INTEGER := 0;
begin
if CLKSpeed = '1' then
EdgesPerTime := countQEPA;
countQEPA := 0;
ELSIF (CLK'EVENT AND CLK = '1') THEN
detect(5 downto 1) := detect(4 downto 0);
detect(0) := QEPA;
if (detect = "011111") OR (detect = "100000") then
countQEPA := countQEPA + 1;
else
countQEPA := countQEPA;
end if;
end if;
Speed <= EdgesPerTime;
end process Edges;
end Behavioral;
This should write the current value of CountQEPA in my variable edgesPerTime every 10 ms and reset the Counter afterwards.
The signal Speed gets transmitted via uart. Unfortunatly with the reset of CountQEPA every 10ms I receive a constant value of 0 for EdgesPerTime. If I remove the reset line in the code, I can receive an increasing value for EdgesPerTime until the largest number for Integer (2^16) is reached, at which Point the Counter resets to 0.
What is the correct implementation in VHDL to count rising and falling edges in a set period of time?
Any help is greatly apreciated as I am still very new to vhdl.
You're building a latch using a variable. Your synthesis tool inferred an asynchronous reset CLKSpeed for countQEPA and since your latch EdgesPerTime latches countQEPA when its write enable CLKSpeed is asserted, you only see the reset countQEPA value of 0.
You should build proper synchronous logic:
signal CountQEPA : integer := 0;
signal detect : std_logic_vector(5 downto 0) := (others => '0');
begin
edge_cnt_p : process (
CLK
)
begin
if (rising_edge(CLK)) then
-- synchronous reset
if (CLKSpeed = '1') then
countQEPA <= 0;
elsif ((detect = "011111") or (detect = "100000")) then
countQEPA <= countQEPA + 1;
end if;
end if;
end process edge_cnt_p;
edge_debounce_p : process (
CLK
)
begin
if (rising_edge(CLK)) then
detect <= detect(detect'left downto 1) & QEPA;
end if;
end process edge_debounce_p;
count_register_p : process (
CLK
)
begin
if (rising_edge(CLK)) then
-- synchronous write enable
if (CLKSpeed = '1') then
Speed <= countQEPA;
end if;
end if;
end process count_register_p;
I'm not a big fan of variables, but it seems synthesis tools got a lot smarter these days. As you can see, I made three separate processes to show what we actually want going on. CountQEPA and debounce are signals now, because they need to be readable from other processes.
I assume CLKSpeed is synchronous to CLK (40 MHz). If it isn't, then you should think about handshaking between the clock domains.
You can of course make a single process out of these individual processes. In general, I try to avoid variables, because they can easily be used to describe behavior that is not synthesizable, like you found out the hard way.
I'm not sure how long your encoder bounces, but if you do sample at 40 MHz, a 6 bit shift register will likely not work as that's only a 125 ns debounce (you use the lower 5 bits only for debouncing, the top bit for noticing state changes).
Speaking of your debouncer. Consider the following case:
debounce QEPA
111111 1
111111 0 <-- glitch/bounce on encoder line; 25 ns wide
111110 1
111101 1
111011 1
110111 1
101111 1
011111 1 <-- fake edge detected
That's probably not what you want. You should compare your debounced signal to the last stable state instead of assuming state'delayed(5 * clk_period) being a stable state.
As I said above, I'm assuming CLK is your 40 MHz clock. You needn't sample your encoder signal that fast, because a proper debouncer would take a large number of shift register bits at no added value to you, because you have to divide that clock down to 1 kHz anyway for your time-slot counting.
Instead, take into account the maximum frequency of your encoder signal. You would want to oversample that at least by double that frequency, quadruple that frequency to be safe. So you have four samples per "bit time".
Since you're expecting many edges at 1 kHz, I assume your encoder is at least two orders of magnitude fast, i.e. 100 kHz. So scale 40 Mhz down to 400 kHz and shift in QEPA for debouncing. Sample these 400 kHz down to 1 kHz for your time slot measurements.
On a side note: What kind of motor encoding only uses a single bit? Are you sure you're not dealing with a quadrature encoder?
I want to take samples of digital data coming externaly to FPGA spartan 3.
I want to take 1000 samples/sec initially. How to select a clock frequency in vhdl coding?
Thanks.
Do not use a counter to generate a lower frequency clock signal.
Multiple clock frequencies in an FPGA cause a variety of design problems, some of which come under the heading of "advanced topics" and, while they can (if necessary) all be dealt with and solved, learning how to use a single fast clock is both simpler and generally better practice (synchronous design).
Instead, use whatever fast clock your FPGA board provides, and generate lower frequency timing signals from it, and - crucially - use them as clock enables, not clock signals.
DLLs, DCMs, PLLs and other clock managers do have their uses, but generating 1 kHz clock signals is generally not a good use, even if their limitations permit it. This application is just crying out for a clock enable...
Also, don't mess around with magic numbers, let the VHDL compiler do the work! I have put the timing requirements in a package, so you can share them with the testbench and anything else that needs to use them.
package timing is
-- Change the first two constants to match your system requirements...
constant Clock_Freq : real := 40.0E6;
constant Sample_Rate : real := 1000.0;
-- These are calculated from the above, so stay correct when you make changes
constant Divide : natural := natural(Clock_Freq / Sample_Rate);
-- sometimes you also need a period, e.g. in a testbench.
constant clock_period : time := 1 sec / Clock_Freq;
end package timing;
And we can write the sampler as follows:
(I have split the clock enable out into a separate process to clarify the use of clock enables, but the two processes could be easily rolled into one for some further simplification; the "sample" signal would then be unnecessary)
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.numeric_std.all;
use work.timing.all;
entity sampler is
Port (
Clock : in std_logic;
Reset : in std_logic;
ADC_In : in signed(7 downto 0);
-- signed for audio, or unsigned, depending on your app
Sampled : out signed(7 downto 0);
);
end sampler;
architecture Behavioral of Sampler is
signal Sample : std_logic;
begin
Gen_Sample : process (Clock,Reset)
variable Count : natural;
begin
if reset = '1' then
Sample <= '0';
Count := 0;
elsif rising_edge(Clock) then
Sample <= '0';
Count := Count + 1;
if Count = Divide then
Sample <= '1';
Count := 0;
end if;
end if;
end process;
Sample_Data : process (Clock)
begin
if rising_edge(Clock) then
if Sample = '1' then
Sampled <= ADC_In;
end if;
end if;
end process;
end Behavioral;
The base clock must be based on an external clock, and can't be generated just through internal resources in a Spartan-3 FPGA. If required, you can use the Spartan-3 FPGA Digital Clock Manager (DCM) resources to scale the external clock. Synthesized VHDL code in itself can't generate a clock.
Once you have some base clock at a higher frequency, for example 100 MHz, you can easily divide this down to generate an indication at 1 kHz for sampling of the external input.
It depends on what clock frequency you have available. If you have a 20MHz clock source, you need to divided it by 20000 in order to get 1KHz, you can do it in VHDL or use a DCM to do this.
This is from an example on how to create a 1kHz clock from a 20MHz input:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity clk20Hz is
Port (
clk_in : in STD_LOGIC;
reset : in STD_LOGIC;
clk_out: out STD_LOGIC
);
end clk200Hz;
architecture Behavioral of clk20Hz is
signal temporal: STD_LOGIC;
signal counter : integer range 0 to 10000 := 0;
begin
frequency_divider: process (reset, clk_in) begin
if (reset = '1') then
temporal <= '0';
counter <= 0;
elsif rising_edge(clk_in) then
if (counter = 10000) then
temporal <= NOT(temporal);
counter <= 0;
else
counter <= counter + 1;
end if;
end if;
end process;
clk_out <= temporal;
end Behavioral;