Strange behaviour in VHDL - vhdl

I'm trying to integrate (sum) a 14-bit signal of ADC at 50 Mhz. The integration starts with rising edge of signal "trigger". If the integral reaches a defined threshold (6000000), a digital signal ("dout") should be set to 0 (which became 1 with "trigger" becoming 1). So far a quite easy task.
Though on the hardware itself (Cyclone V) I realized a strange behaviour. Although I kept the voltage level at the ADC constant, the pulse width of the output signal "dout" is sometimes fluctuating (although it should stay nearly constant for a constant 14-bit value at the ADC, which has a low noise). The pulse width is decreasing with rising voltage level, so the integration itself works fine. But it keeps fluctuating.
Here is my code:
library IEEE;
use IEEE.std_logic_1164.all;
use ieee.numeric_std.all;
entity integrator is
port(
trigger: in std_logic;
adc: in std_logic_vector(13 downto 0);
clk: in std_logic;
dout: out std_logic);
end integrator ;
architecture rtl of integrator is
signal sum : integer;
begin
process(clk) is
begin
if rising_edge(clk) then
if (trigger='1') and (sum<6000000) then
sum<=sum+to_integer(unsigned(adc));
dout<='1';
else
dout<='0';
if (trigger='0') then
sum<=0;
end if;
end if;
end if;
end process;
end rtl;
I checked the signals using SignalTab II of Quartus Prime. I realized that the value of "sum" was rising, but not perfectly correct (compared the sum I calculated manually of the values of "adc".
I used a PLL to phase shift the 50 Mhz clock ("clk") about 90 degrees. The resulting clock served as input for the ADC clock. I left out the PLL and the value of "sum" matched. Nonetheless I see fluctuations in the "dout" signal (oscilloscope).
Even more strange: I changed the type of "sum" to unsigned and finally the fluctuations disappeared. But only without using the PLL! But while making adaptations to the code below the fluctuations came back. Maybe the sum of integer and unsigned leaded to another timing?!?
The questions are now:
- Why the value of "sum" is incorrect when using PLL (I though the value of "adc" should stay constant for half a clock cycle when phase shifting of 90 degrees)?
- Why I see the fluctuations in "dout"? Is there something wrong with the code?
EDIT1: Add testbench
Here is my testbench:
library IEEE;
use IEEE.std_logic_1164.all;
use ieee.numeric_std.all;
entity testbench is
end testbench;
architecture tb of testbench is
component integrator is
port(
trigger: in std_logic;
adc: in std_logic_vector(13 downto 0);
clk: in std_logic;
dout: out std_logic);
end component;
signal trigger_in, clk_in, dout_out: std_logic;
signal adc_in: std_logic_vector(13 downto 0);
begin
DUT: integrator port map(trigger_in, adc_in, clk_in, dout_out);
process
begin
for I in 1 to 4500 loop
clk_in <= '0';
wait for 10 ns;
clk_in <= '1';
wait for 10 ns;
end loop;
wait;
end process;
process
begin
trigger_in <= '0';
wait for 10 us;
trigger_in <= '1';
wait for 30 us;
trigger_in <= '0';
wait for 10 us;
trigger_in <= '1';
wait for 30 us;
trigger_in <= '0';
wait for 10 us;
wait;
end process;
process
begin
adc_in <= (others => '0');
wait for 10 us;
adc_in <= std_logic_vector(to_unsigned(6000, 14));
wait for 30 us;
adc_in <= (others => '0');
wait for 10 us;
adc_in <= std_logic_vector(to_unsigned(6000, 14));
wait for 30 us;
adc_in <= (others => '0');
wait for 10 us;
wait;
end process;
end tb;
And the resulting output:

I asked for a test-bench because your code looks a bit strange. Just as user1155120 I noticed that the summation takes place outside any condition which can cause overflow. You do not see that overflow because you do not test for it in your test bench.
I can make a suggestion to change your code but the problem, lies in the specification:
If the integral reaches a defined threshold (6000000),...
You do not specify what the sum should do in that case. Continue? Hold?
If you let it continue it will at some point warp around and become negative.
A possible solution would be:
if sum<some_maximum_value_you_define then
sum<=sum+to_integer(unsigned(adc));
end if;
A possible maximum value would be 231-214-1.
Alternative you must make sure the the trigger_in comes fast enough that a the sum never overflows. With 50MHz sampling and a 14 bit ADC that means at least 382Hz.
I would add some VHDL code to check the ADC signal. e.g. maximum and minimum values seen. Compare those against the actual (more or less constant) input value. That might give you an idea about the stability/reliability of the sampling.

thanks for all the responses. It helped my to get a bit further. I checked the ADC signal, but there is only noise of around 10 (of 14-Bit) and no unexpected values. Furthermore all the other signals (tried without logic) are fine.
I also found a solution for the inconsistent sum behaviour. I just saved it in temp_adc before the calculations. I tried variable and signal but I go with signal, because i can visualize it in SignalTap (of course there is a delay of one clock cycle now):
library IEEE;
use IEEE.std_logic_1164.all;
use ieee.numeric_std.all;
entity integrator is
port(
trigger: in std_logic;
adc: in std_logic_vector(13 downto 0);
clk: in std_logic;
dout: out std_logic);
end integrator ;
architecture rtl of integrator is
signal sum : integer;
signal temp_adc : unsigned(13 downto 0);
begin
process(clk) is
begin
temp_adc<=unsigned(adc);
if rising_edge(clk) then
if (trigger='1') and (sum<6000000) then
sum<=sum+to_integer(tem_adc);
dout<='1';
else
dout<='0';
if (trigger='0') then
sum<=0;
end if;
end if;
end if;
end process;
end rtl;
In SignalTap now it's fitting well (sum=sum+temp_adc) most of the time. Coming back to the problem: I found a way in SignalTap to trigger unexpected events. I found one very strange behaviour:
Lets t=0 be the cycle in which trigger goes '1'. The output looks like this:
This means dout just goes '1' for a single clock cycle due to the high value in sum. This happens random but with around every 300th pulse.
Looks like there is something like an overflow with a single adc added to sum. Do you have any ideas where this comes from?
Additionally I played around with the PLL for the ADC clock. I tried different phase shifts (0°, 90°, 180°) but the result is more or less the same.

Sorry guys. The problem was a misconfigured Quartus project with wrong

Related

Weird simulation output

I have the following code (which is an extract on something I'm currently working on):
library ieee;
use ieee.std_logic_1164.all;
entity tb_min_example is
end entity tb_min_example;
architecture arch of tb_min_example is
signal clk1, clk2, slow_clk : std_logic;
signal y : std_logic;
procedure drive_clk(signal clk : out std_logic; constant period : time) is
begin
loop
clk <= '1';
wait for period / 2;
clk <= '0';
wait for period / 2;
end loop;
end procedure drive_clk;
begin
drive_clk(clk1, period => 4 ns);
drive_clk(slow_clk, period => (119.0 / 8.0) * 4 ns);
clk2 <= clk1;
process(slow_clk) is
begin
if rising_edge(slow_clk) then
y <= clk1 xor clk2;
end if;
end process;
end architecture arch;
I would expect the signal y to be low all the time since clk2 is assigned to clk1 and x xor x == 1. However, I'm seeing this strange waveform:
It seems that clk2 (or clk1) is sampled just a bit earlier than the other and therefore the result of clk2 xor clk1 is not zero. Why is that, is there some weird delta-cycle stuff going on?
I'm using XSim with Vivado version 2020.2
clk2 is a delayed version of clk1 - delayed by 1 simulation delta cycle.
Because slow_clk and clk1 rise at the same time initially, they will always be in sync on every 2 slow_clk rising edges (119/8 = 59.5, and hence they are both related clocks and align every 2nd rising edge of slow_clk). At this point, clk1 will be '1' while clk2 will be '0', giving you the XOR result of '1'.
This is generally why re-assigning clocks in a design can cause you simulation problems.
If you simply delay the start of slow_clock you should see y being always '0'.

1-cycle enable signal in a clocked process

I am taking a vhdl online course.
One of the laboratory work is: "Based on frequency divider and 8-bit cyclic shift register implement a ring counter with a shift period of 1 s."
The task says that the most significant bit of the counter cannot be used as the clock signal of the shift register (i.e. in the if rising_edge (shifter (MSB)) construction.
It is necessary to form the enable signal as a strobe.
I did the job. The result is accepted.
I have a question related to shift register by enable.
shift_reg_proc : process(clk)
begin
if (rising_edge(clk)) then
if (srst = '1') then
shift_reg <= "10000000";
elsif (en = '1') then
shift_reg <= shift_reg(0) & shift_reg(7 downto 1);
end if;
end if;
end process shift_reg_proc
If the duration of the enable signal is 1 period clk, then there is a probability that at the moment of rising_edge (clk) the en signal level will not have time to become = 1.
If this is the case, then it is not guaranteed that the register shift will occur in the next second.
Is there any "correct" way to do this task?
Is it so? Is my decision correct? Is the lab clue misleading?
I am attaching the implementation code, test bench and wave image.
ring_counter.vhd
--------------------------------------------------------------------------------
-- Based on frequency divider and 8-bit cyclic shift register implement a ring
-- counter with a shift period of 1 s.
--------------------------------------------------------------------------------
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
entity ring_counter is
port(clk : in std_logic;
srst : in std_logic;
dout : out std_logic_vector(7 downto 0);
en_o : out std_logic
);
end entity ring_counter;
architecture behave of ring_counter is
signal cntr : std_logic_vector(26 downto 0) := (others => '0');
signal cntr_msb_delayed : std_logic;
signal shift_reg : std_logic_vector(7 downto 0);
signal en : std_logic;
constant cntr_msb_num : integer := 4; -- 26 for DE board, 4 for test bench
begin
-- signal for test bench
en_o <= en;
--------------------------------------------------------------------------------
-- Counter implementation
--------------------------------------------------------------------------------
cntr_proc : process(clk)
begin
if (rising_edge(clk)) then
if (srst = '1') then
cntr <= (others => '0');
else
cntr <= unsigned(cntr) + 1;
end if;
end if;
end process cntr_proc;
----------------------------------------------------------------------------
-- Shift register implementation
----------------------------------------------------------------------------
shift_reg_proc : process(clk)
begin
if (rising_edge(clk)) then
if (srst = '1') then
shift_reg <= "10000000";
elsif (en = '1') then
shift_reg <= shift_reg(0) & shift_reg(7 downto 1);
end if;
end if;
end process shift_reg_proc;
dout <= shift_reg;
----------------------------------------------------------------------------
-- Enable signal generation
----------------------------------------------------------------------------
-- Counter MSB delay for 1 period of clk
delay_proc : process(clk)
begin
if (rising_edge(clk)) then
cntr_msb_delayed <= cntr(cntr_msb_num);
end if;
end process delay_proc;
en <= cntr(cntr_msb_num) and not cntr_msb_delayed;
end architecture behave;
ring_counter_tb.vhd
library ieee;
use ieee.std_logic_1164.all;
entity ring_counter_tb is
end entity ring_counter_tb;
architecture behave of ring_counter_tb is
component ring_counter is
port(clk : in std_logic;
srst : in std_logic;
dout : out std_logic_vector(7 downto 0);
en_o : out std_logic
);
end component ring_counter;
signal clk : std_logic;
signal srst : std_logic;
signal dout : std_logic_vector(7 downto 0);
signal en_o : std_logic;
constant clk_period : time := 4 ns;
begin
dut : ring_counter
port map (
clk => clk,
srst => srst,
dout => dout,
en_o => en_o
);
clk_gen : process
begin
clk <= '0';
wait for clk_period;
loop
clk <= '0';
wait for clk_period/2;
clk <= '1';
wait for clk_period/2;
end loop;
end process clk_gen;
srst <= '0',
'1' after 100 ns,
'0' after 150 ns;
end architecture behave;
wave for test bench
TL;DR
The rising edge of clk after which en is raised is not the same as the rising edge of clk at which your shift register shifts. en is asserted high after rising edge N and de-asserted after rising edge N+1. Your shift register is thus shifted at rising edge N+1.
So you have about one clock period delay between assertions of en and the register shifts. You don't care because your specification says that you want a shift period of 1 second. As long as en is periodic with a period of one second, even if there is a small constant delay between en and your shift register, you fulfill the specifications.
But what is of uttermost importance is that, as it is seen by your shift register, en is asserted high sufficiently after rising edge N to avoid a too early shift and de-asserted sufficiently after rising edge N+1 to allow a good nice shift. If you are interested in this too, please continue reading.
Detailed explanation
Your en signal is computed from the outputs of registers synchronized on the same clock clk as your shift register. You cannot have any hold time problem there: the propagation delay from the rising edge of the clock to the outputs of your cntr and cntr_msb_delayed registers guarantee that en will arrive at your shift register sufficiently after the rising edge of the clock that caused it (assuming you don't have large clock skews). It cannot arrive too early.
Can it arrive too late (setup time problem)? Yes, if your clock frequency is too high. The clock period would then be too short, en would not have enough time to be computed, stabilize and propagate to your shift register before the next rising edge of the clock and anything could happen (no shift at all, partial shift, metastabilities...)
This is a very common concern in digital design: you cannot operate at an arbitrarily high clock frequency. If you could you would clock your own computer at yotta-Hz or even more, instead of giga-Hz, and everything would become instantaneous. It would be nice but it is not how the real world works.
In a digital design you always have what is called a critical path. It is a particular chain of logic gates between a set of source registers and a destination register, along which the propagation delay of electrical signals is the largest of the whole design.
Which path it is among all possible and the total delay along this path depend on your design's complexity (e.g. the number of bits of your counter), your target hardware technology (e.g. the FPGA of your prototyping board) and the operating conditions (temperature, voltage of power supply, speed-grade of your FPGA).
(Yes, it depends also on the temperature, reason why hard-core gamers cool down their computers with high performance cooling systems. This avoids the destruction of the silicon and allows to operate the computer at a higher clock frequency with more frames per second and a better user experience.)
The largest time it takes for the signals to travel from the source clock-edge to the arrival at destination, augmented by a small security margin called the setup time of the destination register, is the smallest clock period (highest clock frequency) at which you can run your design. As long as you don't exceed this limit your system works as expected.
Hardware design tool chains usually comprise a Static Timing Analyzer (STA) that tells you what this maximum clock frequency is for your design, your target, and your operating conditions. If it tells you 500 MHz and you need only 350 MHz, everything is fine (you could however investigate and see if you could modify your design, save some hardware, and still run at 350 MHz).
But if you need 650 MHz it is time to roll up your sleeves, look at the critical path (the STA will also show the path), understand it and rework your design to speed it up (e.g. pipeline long computations, use carry look ahead adders instead of carry ripple...) Note that, usually, when you encounter timing closure problems you do not consider only one critical path but the set of all paths that exceed your time budget because you want to eliminate them all, not just the worst. This is why the STA gives you not only the worst critical path but a list of critical paths, in decreasing order of severity.

Alternative method for creating low clock frequencies in VHDL

In the past I asked a question about resets, and how to divide a high clock frequency down to a series of lower clock square wave frequencies, where each output is a harmonic of one another e.g. the first output is 10 Hz, second is 20 Hz etc.
I received several really helpful answers recommending what appears to be the convention of using a clock enable pin to create lower frequencies.
An alternative since occurred to me; using a n bit number that is constantly incremented, and taking the last x bits of the number as the clock ouputs, where x is the number of outputs.
It works in synthesis for me - but I'm curious to know - as I've never seen it mentioned anywhere online or on SO, am I missing something that means its actually a terrible idea and I'm simply creating problems for later?
I'm aware that the limitations on this are that I can only produce frequencies that are the input frequency divided by a power of 2, and so most of the time it will only approximate the desired output frequency (but will still be of the right order). Is this limitation the only reason it isn't recommended?
Thanks very much!
David
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
library UNISIM;
use UNISIM.VComponents.all;
use IEEE.math_real.all;
ENTITY CLK_DIVIDER IS
GENERIC(INPUT_FREQ : INTEGER; --Can only divide the input frequency by a power a of 2
OUT1_FREQ : INTEGER
);
PORT(SYSCLK : IN STD_LOGIC;
RESET_N : IN STD_LOGIC;
OUT1 : OUT STD_LOGIC; --Actual divider is 2^(ceiling[log2(input/freq)])
OUT2 : OUT STD_LOGIC); --Actual output is input over value above
END CLK_DIVIDER;
architecture Behavioral of Clk_Divider is
constant divider : integer := INPUT_FREQ / OUT1_FREQ;
constant counter_bits : integer := integer(ceil(log2(real(divider))));
signal counter : unsigned(counter_bits - 1 downto 0) := (others => '0');
begin
proc : process(SYSCLK)
begin
if rising_edge(SYSCLK) then
counter <= counter + 1;
if RESET_N = '0' then
counter <= (others => '0');
end if;
end if;
end process;
OUT1 <= counter(counter'length - 1);
OUT2 <= not counter(counter'length - 2);
end Behavioral;
Functionally the two outputs OUT1 and OUT2 can be used as clocks, but that method of making clocks does not scale and is likely to cause problems in the implementation, so it is a bad habit. However, it is of course important to understand why this is so.
The reason it does not scale, is that every signal used as clock in a FPGA is to be distributed through a special clock net, where the latency and skew is well-defined, so all flip-flops and memories on each clock are updated synchronously. The number of such clock nets is very limited, usually in the range of 10 to 40 in a FPGA device, and some restrictions on use and location makes it typically even more critical to plan the use of clock nets. So it is typically required to reserve clock nets for only real asynchronous clocks, where there is no alternative than to use a clock net.
The reason it is likely to cause problems, is that clocks created based on bits in a counter have no guaranteed timing relation. So if it is required to moved data between these clock domains, it requires additional constrains for synchronization, in order to be sure that the Clock Domain Crossing (CDC) is handled correctly. This is done through constrains for synthesis and/or Static Timing Analysis (STA), and is usually a little tricky to get right, so using a design methodology that simplifies STA is habit that saves design time.
So in designs where it is possible to use a common clock, and then generate synchronous clock enable signals, this should be the preferred approach. For the specific design above, a clock enable can be generated simply by detecting the '0' to '1' transition of the relevant counter bit, and then assert the clock enable in the single cycle where the transition is detected. Then a single clock net can be used, together with 2 clock enables like CE1 and CE2, and no special STA constrains are required.
Morten already pointed out the theory in his answer.
With the aid of two examples, I will demonstrate the problems you encounter when using a generated clock instead of clock enables.
Clock Distribution
At first, one must take care that a clock arrives at (almost) the same time at all destination flip-flops. Otherwise, even a simple shift register with 2 stages like this one would fail:
process(clk_gen)
begin
if rising_edge(clk_gen) then
tmp <= d;
q <= tmp;
end if;
end if;
The intended behavior of this example is that q gets the value of d after two rising edges of the generated clock clock_gen.
If the generated clock is not buffered by a global clock buffer, then the delay will be different for each destination flip-flop because it will be routed via the general-purpose routing.
Thus, the behavior of the shift register can be described as follows with some explicit delays:
library ieee;
use ieee.std_logic_1164.all;
entity shift_reg is
port (
clk_gen : in std_logic;
d : in std_logic;
q : out std_logic);
end shift_reg;
architecture rtl of shift_reg is
signal ff_0_q : std_logic := '0'; -- output of flip-flop 0
signal ff_1_q : std_logic := '0'; -- output of flip-flop 1
signal ff_0_c : std_logic; -- clock input of flip-flop 0
signal ff_1_c : std_logic; -- clock input of flip-flop 1
begin -- rtl
-- different clock delay per flip-flop if general-purpose routing is used
ff_0_c <= transport clk_gen after 500 ps;
ff_1_c <= transport clk_gen after 1000 ps;
-- two closely packed registers with clock-to-output delay of 100 ps
ff_0_q <= d after 100 ps when rising_edge(ff_0_c);
ff_1_q <= ff_0_q after 100 ps when rising_edge(ff_1_c);
q <= ff_1_q;
end rtl;
The following test bench just feeds in a '1' at input d, so that, q should be '0' after 1 clock edge an '1' after two clock edges.
library ieee;
use ieee.std_logic_1164.all;
entity shift_reg_tb is
end shift_reg_tb;
architecture sim of shift_reg_tb is
signal clk_gen : std_logic;
signal d : std_logic;
signal q : std_logic;
begin -- sim
DUT: entity work.shift_reg port map (clk_gen => clk_gen, d => d, q => q);
WaveGen_Proc: process
begin
-- Note: registers inside DUT are initialized to zero
d <= '1'; -- shift in '1'
clk_gen <= '0';
wait for 2 ns;
clk_gen <= '1'; -- just one rising edge
wait for 2 ns;
assert q = '0' report "Wrong output" severity error;
wait;
end process WaveGen_Proc;
end sim;
But, the simulation waveform shows that q already gets '1' after the first clock edge (at 3.1 ns) which is not the intended behavior.
That's because FF 1 already sees the new value from FF 0 when the clock arrives there.
This problem can be solved by distributing the generated clock via a clock tree which has a low skew.
To access one of the clock trees of the FPGA, one must use a global clock buffer, e.g., BUFG on Xilinx FPGAs.
Data Handover
The second problem is the handover of multi-bit signals between two clock domains.
Let's assume we have 2 registers with 2 bits each. Register 0 is clocked by the original clock and register 1 is clocked by the generated clock.
The generated clock is already distributed by clock tree.
Register 1 just samples the output from register 0.
But now, the different wire delays for both register bits in between play an important role. These have been modeled explicitly in the following design:
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;
entity handover is
port (
clk_orig : in std_logic; -- original clock
d : in std_logic_vector(1 downto 0); -- data input
q : out std_logic_vector(1 downto 0)); -- data output
end handover;
architecture rtl of handover is
signal div_q : std_logic := '0'; -- output of clock divider
signal bufg_o : std_logic := '0'; -- output of clock buffer
signal clk_gen : std_logic; -- generated clock
signal reg_0_q : std_logic_vector(1 downto 0) := "00"; -- output of register 0
signal reg_1_d : std_logic_vector(1 downto 0); -- data input of register 1
signal reg_1_q : std_logic_vector(1 downto 0) := "00"; -- output of register 1
begin -- rtl
-- Generate a clock by dividing the original clock by 2.
-- The 100 ps delay is the clock-to-output time of the flip-flop.
div_q <= not div_q after 100 ps when rising_edge(clk_orig);
-- Add global clock-buffer as well as mimic some delay.
-- Clock arrives at (almost) same time on all destination flip-flops.
clk_gen_bufg : BUFG port map (I => div_q, O => bufg_o);
clk_gen <= transport bufg_o after 1000 ps;
-- Sample data input with original clock
reg_0_q <= d after 100 ps when rising_edge(clk_orig);
-- Different wire delays between register 0 and register 1 for each bit
reg_1_d(0) <= transport reg_0_q(0) after 500 ps;
reg_1_d(1) <= transport reg_0_q(1) after 1500 ps;
-- All flip-flops of register 1 are clocked at the same time due to clock buffer.
reg_1_q <= reg_1_d after 100 ps when rising_edge(clk_gen);
q <= reg_1_q;
end rtl;
Now, just feed in the new data value "11" via register 0 with this testbench:
library ieee;
use ieee.std_logic_1164.all;
entity handover_tb is
end handover_tb;
architecture sim of handover_tb is
signal clk_orig : std_logic := '0';
signal d : std_logic_vector(1 downto 0);
signal q : std_logic_vector(1 downto 0);
begin -- sim
DUT: entity work.handover port map (clk_orig => clk_orig, d => d, q => q);
WaveGen_Proc: process
begin
-- Note: registers inside DUT are initialized to zero
d <= "11";
clk_orig <= '0';
for i in 0 to 7 loop -- 4 clock periods
wait for 2 ns;
clk_orig <= not clk_orig;
end loop; -- i
wait;
end process WaveGen_Proc;
end sim;
As can be seen in the following simulation output, the output of register 1 toggles to an intermediate value of "01" at 3.1 ns first because the input of register 1 (reg_1_d) is still changing when the rising edge of the generated clock occurs.
The intermediate value was not intended and can lead to undesired behavior. The correct value is seen not until another rising edge of the generated clock.
To solve this issue, one can use:
special codes, where only one bit flips at a time, e.g., gray code, or
cross-clock FIFOs, or
handshaking with the help of single control bits.

VHDL - FSM not starting (JUST in timing simulation)

I'm working for my master thesis and I'm pretty new to VHDL, but still I have to implement some complex things. This is one of the easiest structures I had to write, and still I'm encountering some problems.
It's a FSM implementing a 24bit shift register with an active-low sync signal (to program a DAC). It's just the end of a complex elaboration chain I created for my project. I followed the example model of a FSM as much as I could.
The behavioral simulation works fine, actually the whole elaboration chain I created works perfectly fine as far as the behavioral simulation concerns. However, once I try the Post-translate simulation things start to go wrong: lots of 'X' output signals.
With this simple shift register I DON'T get any 'X', however I can't get to the load_and_prepare_data phase. It seems that the current_state changes (by inspecting some signals), but the elaboration doesn't go on.
Please keep in mind that since I'm new to the language, I have no idea of what timing constraints I should set on this FSM (and I wouldn't know how to write them on the top.ucf anyway)
Can you see what's wrong?
Thanks in advance
EDIT
I followed your advices and cleaned up the FSM by using a single state process. I still have some doubts about "where to put what" but I really like the new implementation. Anyway I now get a clean behavioral simulation but 'X' on all outputs in post translate simulation.
What is causing this?
I'll post the both the new code and the testbench:
----------------------------------------------------------------------------------
-- Company:
-- Engineer:
--
-- Create Date: 14:44:03 11/28/2014
-- Design Name:
-- Module Name: dac_ad5764r_24bit_sr_programmer_v2 - Behavioral
-- Project Name:
-- Target Devices:
-- Tool versions:
-- Description: This is a PISO shift register that gets a 24bit parallel input word.
-- It outputs the 24bit input word starting from the MSB and enables
-- an active low ChipSelect line for 24 clock periods.
-- Dependencies:
--
-- Revision:
-- Revision 0.01 - File Created
-- Additional Comments:
--
----------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
use IEEE.NUMERIC_STD.ALL;
-- Uncomment the following library declaration if instantiating
-- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity dac_ad5764r_24bit_sr_programmer_v2 is
Port ( clk : in STD_LOGIC;
start : in STD_LOGIC;
reset : in STD_LOGIC; -- Note that this reset is for the FSM not for the DAC
reset_all_dac : in STD_LOGIC;
data_in : in STD_LOGIC_VECTOR (23 downto 0);
serial_data_out : out STD_LOGIC;
sync_out : out STD_LOGIC; -- This is a chip select
reset_out : out STD_LOGIC;
busy : out STD_LOGIC
);
end dac_ad5764r_24bit_sr_programmer_v2;
architecture Behavioral of dac_ad5764r_24bit_sr_programmer_v2 is
-- Stati
type state_type is (idle, load_and_prepare_data, transmission);
--ATTRIBUTE ENUM_ENCODING : STRING;
--ATTRIBUTE ENUM_ENCODING OF state_type: TYPE IS "001 010 100";
signal state: state_type := idle;
--signal next_state: state_type := idle;
-- Clock counter
--signal clk_counter_enable : STD_LOGIC := '0';
signal clk_counter : unsigned(4 downto 0) := (others => '0');
-- Shift register
signal stored_data: STD_LOGIC_VECTOR (23 downto 0) := (others => '0');
begin
FSM_single_process: process(clk)
begin
if rising_edge(clk) then
if reset = '1' then
serial_data_out <= '0';
sync_out <= '1';
reset_out <= '1';
busy <= '0';
state <= idle;
else
-- Default
serial_data_out <= '0';
sync_out <= '1';
reset_out <= '1';
busy <= '0';
case (state) is
when transmission =>
serial_data_out <= stored_data(23);
sync_out <= '0';
busy <= '1';
clk_counter <= clk_counter + 1;
stored_data <= stored_data(22 downto 0) & "0";
state <= transmission;
if (clk_counter = 23) then
state <= idle;
end if;
when others => -- Idle
if start = '1' then
serial_data_out <= data_in(23);
sync_out <= '0';
reset_out <= '1';
busy <= '1';
stored_data <= data_in;
clk_counter <= "00001";
state <= transmission;
end if;
end case;
-- if (reset_all_dac = '1') then
-- reset_out <= '0';
-- end if;
end if;
end if;
end process;
end;
And the testbench:
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
--USE ieee.numeric_std.ALL;
ENTITY dac_ad5764r_24bit_sr_programmer_tb IS
END dac_ad5764r_24bit_sr_programmer_tb;
ARCHITECTURE behavior OF dac_ad5764r_24bit_sr_programmer_tb IS
-- Component Declaration for the Unit Under Test (UUT)
COMPONENT dac_ad5764r_24bit_sr_programmer_v2
PORT(
clk : IN std_logic;
start : IN std_logic;
reset : IN std_logic;
data_in : IN std_logic_vector(23 downto 0);
serial_data_out : OUT std_logic;
reset_all_dac : IN std_logic;
sync_out : OUT std_logic;
reset_out : OUT std_logic;
--finish : OUT std_logic;
busy : OUT std_logic
);
END COMPONENT;
--Inputs
signal clk : std_logic := '0';
signal start : std_logic := '0';
signal reset : std_logic := '0';
signal data_in : std_logic_vector(23 downto 0) := (others => '0');
signal reset_all_dac : std_logic := '0';
--Outputs
signal serial_data_out : std_logic;
signal sync_out : std_logic;
signal reset_out : std_logic;
--signal finish : std_logic;
signal busy : std_logic;
-- Clock period definitions
constant clk_period : time := 100 ns;
BEGIN
-- Instantiate the Unit Under Test (UUT)
uut: dac_ad5764r_24bit_sr_programmer_v2 PORT MAP (
clk => clk,
start => start,
reset => reset,
data_in => data_in,
reset_all_dac => reset_all_dac,
serial_data_out => serial_data_out,
sync_out => sync_out,
reset_out => reset_out,
--finish => finish,
busy => busy
);
-- Clock process definitions
clk_process :process
begin
clk <= '0';
wait for clk_period/2;
clk <= '1';
wait for clk_period/2;
end process;
-- Stimulus process
stim_proc: process
begin
-- hold reset state for 100 ns.
wait for clk_period*10;
reset <= '1' after 25 ns;
wait for clk_period*1;
reset <= '0' after 25 ns;
wait for clk_period*3;
reset_all_dac <= '1' after 25 ns;
wait for clk_period*1;
reset_all_dac <= '0' after 25 ns;
wait for clk_period*5;
data_in <= "111111111111111111111111" after 25 ns;
wait for clk_period*3;
start <= '1' after 25 ns;
wait for clk_period*1;
start <= '0' after 25 ns;
wait;
end process;
END;
UPDATE 1
Updated with the last design: this code is not causing any 'X' (can't figure out why, this doesn't but the previous did). However it's not starting (in POST-TRANSLATE simulation) just like the first 3 process machine, and the signal sync_out is stuck at 0 while it should be '1' by default.
UPDATE 2
I've been looking into the tecnology schematic, starting from the problem of the sync_out=0: it's implemented with a FDS, S is the FSM reset signal, D is coming from a LUT3 with I = state&reset&start and INIT = 45 = "00101101". I've looked for this LUT3 in the simulation and I've noticed that it has INIT = "00000000"!
Is there something I'm missing about how to run this simulation? It seems that every LUT in the design have not been set!
UPDATE 3
It seems that the Post-Translate simulation is buggy in some way, or I'm not configuring it correctly for some reason: the Post-Map and the Post-PAR simulations work and display some outputs.
However there is an odd bug: the stored_data register is not updated with the complete data_in vector, after that, the FSM operates correctly and outputs the data stored.
I've looked in the tecnology schematic just after synthesis and for some reason the bits 23,22,21,19,18 are not connected to the corresponding data_in bit. You can see the effect in this screenshot from Post-Map simulation. Same happens in Post-PAR, but it seems that this problems comes directly from the synthesis!
Solved: the strange output comes from the Synthesis optimization. The tool realized that the previous block in the elaboration chain will never output a bit different from 0 for those specific bit. My mistake was assuming that I could test the single block alone: what I was really testing was the block synthetized for the FPGA taking into account everything else in the design!
Thanks to everybody helped me, I'm going to follow your advices!
I prefer the single-process form of state machine, which is cleaner, simpler and much less prone to bugs like sensitivity-list errors. I would also endorse the points in Paebbels' excellent answer. However I don't think any of these are the problem here.
One thing to be aware of in post-synth and post-PAR simulations is that their model of time is different from the behavioural model. The behavioural model follows simple rules as I described in this answer and ensures that in a typical design flow you can go straight to hardware - without post-synth simulation, without worry.
Indeed I only use post-synth or post-PAR simulations if I'm chasing a suspected tool bug. (For FPGA designs, not ASIC, that is!)
However, that simple timing model has its limitations. You may be familiar with problems like a clock signal assigned via signal assignment (usually buried in a 3rd party model where you don't expect it) which consumes a delta cycle, and ensures that your clocked data arrives before your clock instead of after, and everything subsequently occurs one cycle earlier than intended...
In behavioural modelling, a little discipline will keep clear of such troubles. But the same is not true of post-PAR modelling.
Your testbench is probably set up the same way as the behavioural model. And if so, that is likely to be the problem.
Here's what I do in this situation : I claim no formal authority for it, just experience. It also works well when interfacing the FPGA to external memory models with realistic timings.
1) I assume the simple (behavioural) timing model works correctly for all signals INTERNAL to the design.
2) I assume nothing of the sort for inputs and outputs from the design.
3) I take note of the estimated setup and hold timings on the inputs, (a) from the FPGA datasheet or better, (b) from the worst case values shown in the post-synth or post-PAR report, and structure the testbench around them.
Worked example : setup time 1 ns, hold time 2 ns, clock period 10 ns. This means that any input between 2 ns and 9 ns after a clock edge is guaranteed to be corrrectly read. I choose (arbitrarily) 5 ns.
signal_to_fpga <= driving_value after 5 ns;
(Note that Xilinx makes this absurdly counter-intuitive by expressing them as "offset in/out before/after" which refers timings to a previous or future clock edge instead of the one you're looking at)
Alternatively, if the input is fed from a CPU or memory in the real world, I use datasheet timing specifications for that device.
4) I take note of the worst case clk-out timing reported in the datasheet or report, and structure the design around them. (say, 7 ns)
fpga_output_pin <= driving_value after 7 ns;
Note that this "after" clause is obviously ignored by synthesis; however the post-synth back-annotation will introduce something very like it.
5) If this turns out to be not good enough, then (possibly in a wrapper component to avoid polluting the synthesisable code) improve accuracy like
fpga_output_pin <= 'X' after 1 ps, driving_value after 7 ns;
6) I re-run the behavioural simulation. Typically, it now fails, because it was written without realistic timings in mind.
7) I fix those failures. This may include adding realistic delays before testing values output from the design. It can be an iterative process.
Now, I have a reasonable expectation that the post-PAR simulation model will drop straight in to the testbench and work.
Here are some hints to improve your code:
You can remove the Xilinx dependencies to UNISIM, because you are not using any Xilinx Primitves.
Applying attribute ENUM_ENCODING has no effect on state encoding unless you also define the attribute FSM_ENCODING and set it's value to user. One-Hot encoding can be forced by setting FSM_ENCODING to one-hot. Normally synthesis is smart enough to find the best encoding.
read more ...
None of your registers has a default value:
signal current_state : state_type := idle;
Your FSM is no FSM in the eyes of Xilinx synthesis tool (XST). I'm sure if you look into your synthesis report, you won't find that XST reports a FSM for current_state.
So what's wrong with your FSM?
Your FSM has no initial state.
Your FSM has multiple reset states (idle, load_and_prepare_data)
Your FSM has no transition from idle to load_and_prepare_data (reset is no transition)
Writing next_state transitions for the current state can cause XST to think it's no FSM
the default assignment next_state <= current_state; is sufficient.
If you change the type of signal clk_counter to unsigned you can do arithmetic much easier.
increment: clk_counter <= clk_counter + 1;
clear: clk_counter <= (others => '0');
compare: if (clk_counter = 23) then
It's no good style to use the FSM's state signal outside of the FSM processes.
FSM_next_state_process: process(current_state, start, clk_counter, reset_all_dac)
begin
next_state <= current_state;
OutReg_busy <= '1';
OutReg_reset_out <= '1';
OutReg_sync_out <= '1';
clk_counter_enable <= '0';
case (current_state) is
when idle =>
OutReg_busy <= '0';
if (reset_all_dac = '1') then
OutReg_reset_out <= '0';
end if;
when load_and_prepare_data =>
next_state <= transmission;
when transmission =>
clk_counter_enable <= '1';
OutReg_sync_out <= '0';
if (clk_counter = 23) then
next_state <= idle;
end if;
when others =>
next_state <= idle;
end case ;
end process;

Synthesis: Implementing a delay signal using a counter on power-up of FPGA

I am trying to have a delay of 20 seconds on power-up of the FPGA.
There is a clock input of 100Hz, so if a counter gets to 20,000, that should be 20 seconds worth of delay. After the delay, it should set an output pin high. However, for some reason, this out pin is going high immediately and never goes low at all on powerup. It's almost as if it is skipping the s_count <= 20000 completely.
Here is the code I have:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity delay is
port
(
pi_Clock : in std_logic;
po_Delay_Done : out std_logic
);
end entity;
architecture behavioral of delay is
begin
process(pi_Clock)
variable s_count : integer := 0;
begin
if rising_edge(pi_Clock) then
if s_count <= 20000 then
s_count := s_count + 1;
po_Delay_Done <= '0';
else
po_Delay_Done <= '1';
end if;
end if;
end process;
end architecture;
I have increased the 20000 to the max integer value, just to see if my clock was incorrect but the same result.
There is no other driver of this signal on the top level file.
Anyone see what I'm doing wrong?
For some reason, setting the initial values by the declarations seems to be the problem with this FPGA/toolset (synopsis Synplify and the FPGA is Actel A3PN250) even if it works in modelsim simulation.
The following code does what I want -- Set an output high after the FPGA is turned on for 20 seconds:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity delay is
port
(
pi_Clock : in std_logic;
pi_Reset : in std_logic;
po_Delay_Done : out std_logic
);
end entity;
architecture behavioral of delay is
begin
process(pi_Clock)
variable s_count: integer;
begin
if rising_edge(pi_Clock) then
if pi_Reset = '1' then
s_count := 0;
po_Delay_Done <= '0';
else
if s_count < 2000 then
s_count := s_count + 1;
else
po_Delay_Done <= '1';
end if;
end if;
end if;
end process;
end architecture;
The catch is that the microcontroller is now sending a reset signal (pi_Reset = '1') to the FPGA after it has been started.
Hope this helps anyone in the future, thanks to Quantum Ripple and Brian especially for suggesting the hard reset. If you had an answer I would accept it.
The code you typed has no simulation or synthesis errors and has a correct result in modelsim simulation software. (according to the above first comment "fru1tbat")
With respect to the above comments, I think if you synthesized the FPGA board correctly and the design doesn't worked (with applying a reset port to reset the output and variable parameters), the problem is related to the clock generator. Be sure about the clock generator and find the correct frequency.

Resources