Trying to find Fmax in VHDL but getting extra cycle of delay - vhdl

I want to see the speed of my VHDL design. As far as I know, it is indicated by Fmax in the Quartus II software. After compiling my design, it shows an Fmax of 653.59 MHz. I wrote a testbench and did some tests to make sure that the design is working as expected. The problem I have with the design is that at the rising edge of the clock, the inputs are set correctly, but the output only comes after one more cycle.
My question is: How can I check the speed of my design (longest delay between the input ports and the output port) and also get the output of the addition at the same time that the inputs are loaded/at the same cycle?
My testbench results are as follows:
a: 0001 and b: 0101 gives XXXX
a: 1001 and b: 0001 gives 0110 (the expected result from the previous
calculation)
a: 1001 and b: 1001 gives 1010 (the expected result from the previous
calculation)
etc
Code:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity adder is
port(
clk : in STD_LOGIC;
a : in unsigned(3 downto 0);
b : in unsigned(3 downto 0);
sum : out unsigned(3 downto 0)
);
end adder;
architecture rtl of adder is
signal a_r, b_r, sum_r : unsigned(3 downto 0);
begin
sum_r <= a_r + b_r;
process(clk)
begin
if (rising_edge(clk)) then
a_r <= a;
b_r <= b;
sum <= sum_r;
end if;
end process;
end rtl;
Testbench:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity testbench is
end entity;
architecture behavioral of testbench is
component adder is
port(
clk : in STD_LOGIC;
a : in unsigned(3 downto 0);
b : in unsigned(3 downto 0);
sum : out unsigned(3 downto 0)
);
end component;
signal a, b, sum : unsigned(3 downto 0);
signal clk : STD_LOGIC;
begin
uut: adder
port map(
clk => clk,
a => a,
b => b,
sum => sum
);
stim_process : process
begin
wait for 1 ns;
clk <= '0';
wait for 1 ns;
clk <= '1';
a <= "0001";
b <= "0101";
wait for 1 ns;
clk <= '0';
wait for 1 ns;
clk <= '1';
a <= "1001";
b <= "0001";
wait for 1 ns;
clk <= '0';
wait for 1 ns;
clk <= '1';
a <= "1001";
b <= "1001";
end process;
end behavioral;

Is there any issue with using sum_r as your output?
You don't need the input and output registers if you treat this ALU as purely combinatorial logic. Once you delete them, the Fmax figure for this block disappears, because Fmax is only reported for register-to-register paths. The timing will then depend on what the block is connected from and what it is connected to, and only if the incoming signals come from registers and the outgoing signals go to registers. If it is only logic going from input pin to output pin, I think it is extremely difficult to say what the propagation delay is, and as far as I know vendor software such as Altera's does not have tools that are adequate for this kind of analysis.
That's why you will hear people talking about the difficulties of designing asynchronous logic.
I think such fine analysis is difficult to perform with certainty and accuracy, since in your case the propagation delay would be in the picosecond range. Even in the literature it is difficult to find quantitative answers on propagation delay.
Why is it difficult? Remember that propagation delay is determined by the total path capacitance. There is a way to estimate propagation delay for transistors, but I don't know the deep details of how the LUTs are constructed internally, so I cannot give you a good estimate. It depends heavily on the family, the manufacturing process, the construction of the FPGA, and whether the load is connected to IO.
You may however make your own estimate by going to the logic planner, looking at the path, and assuming about 20-100 ps of propagation delay per LUT that it passes through.
See the image below.
What you are trying to design is an ALU. By definition, an ALU should in theory be purely combinatorial logic.
Therefore, strictly speaking, your adder code should only be this:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity adder is
port(
a : in unsigned(3 downto 0);
b : in unsigned(3 downto 0);
sum : out unsigned(3 downto 0)
);
end adder;
architecture rtl of adder is
begin
sum <= a + b;
end rtl;
No clock is required here, since this function is really a combinatorial process.
However, if you want to put your ALU into a registered stage as I have described, what you should actually be doing is this:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity adder is
port(
clk : in STD_LOGIC;
a : in unsigned(3 downto 0);
b : in unsigned(3 downto 0);
sum : out unsigned(3 downto 0)
);
end adder;
architecture rtl of adder is
signal a_r, b_r, sum_r : unsigned(3 downto 0);
signal internal_sum : unsigned(3 downto 0);
begin
sum <= sum_r;
internal_sum <= a_r + b_r;
process(clk)
begin
if (rising_edge(clk)) then
a_r <= a;
b_r <= b;
sum_r <= internal_sum;
end if;
end process;
end rtl;
You have not mentioned carry out, so I will not discuss that here.
Finally, if you are using Altera, they have a very nice RTL viewer that you can use to look at your synthesized design, under Tools -> Netlist Viewers -> RTL Viewer.
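To see the latency of the registered version in simulation, a self-checking testbench along these lines might help. It is only a sketch: it assumes the registered adder above has been compiled into the work library, and the entity name adder_latency_tb, the 10 ns clock period and the done flag are my own choices, not from the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder_latency_tb is  -- hypothetical testbench name
end entity adder_latency_tb;

architecture sim of adder_latency_tb is
  constant CLK_PERIOD : time := 10 ns;  -- assumed period, not from the question
  signal clk  : std_logic := '0';
  signal a, b : unsigned(3 downto 0) := (others => '0');
  signal sum  : unsigned(3 downto 0);
  signal done : boolean := false;
begin
  -- Free-running clock that stops toggling once the stimulus is finished
  clk <= not clk after CLK_PERIOD / 2 when not done else '0';

  -- Assumes the registered adder from the answer is compiled into work
  uut : entity work.adder
    port map (clk => clk, a => a, b => b, sum => sum);

  stim : process
  begin
    -- Change the inputs away from the rising edge
    wait until falling_edge(clk);
    a <= "0001";
    b <= "0101";
    wait until rising_edge(clk);  -- edge 1: a_r and b_r capture the inputs
    wait until rising_edge(clk);  -- edge 2: sum_r captures a_r + b_r
    wait for 1 ns;                -- let the registered output update
    assert sum = "0110"
      report "expected 0110 two clock edges after the inputs were applied"
      severity error;
    done <= true;
    wait;
  end process stim;
end architecture sim;
```

Because the inputs and the output are both registered, the result appears two rising edges after the inputs are applied. That extra latency is the price of having register-to-register paths that the timing analyzer can report an Fmax for.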

Related

10028 Can't solve multiple constant drivers for net "D[x]"

Hi people, this is my code, and the only error is Error (10028): Can't resolve multiple constant drivers for net "D[22]" at bonus2.vhd(26). I'm new at this and I don't understand this error.
Here is my code
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity bonus2 is
port (
B,C : in unsigned(15 downto 0);
clear, clk : in std_logic;
Q : out unsigned(31 downto 0)
);
end entity bonus2;
architecture arch_bonus2 of bonus2 is
signal mult : unsigned(31 downto 0); --signal d'addition
signal D: unsigned(31 downto 0); --signal d'addition
begin
mult <= B * C;
process(clear)
begin
if clear = '1' then
D <= x"00000000";
end if;
end process;
process(clk)
begin
if rising_edge(clk) then
D <= mult + D;
end if;
end process;
Q <= D;
end architecture arch_bonus2;
Your code shows that you might be unfamiliar with some concepts of hardware design, so I suggest you catch up on combinational/sequential logic and how to describe them in VHDL. Also, a good rule of thumb is that you should be able to draw the parts of your design. If you do, you'll see that the D signal is driven from two places: the clear process and the clocked process.
What I believe you want to do, based on your code, is to be able to reset/initialize your output register. The way you have coded it suggests you want an asynchronous reset.
Please check here to know more about synch vs asynch logic and then I suggest you explore further to understand the implications.
Here is a modified version of your code with an asynchronous reset, because that seems to be what you want.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity bonus2 is
port (
B,C : in unsigned(15 downto 0);
clear, clk : in std_logic;
Q : out unsigned(31 downto 0)
);
end entity bonus2;
architecture arch_bonus2 of bonus2 is
signal mult : unsigned(31 downto 0); --signal d'addition
signal D: unsigned(31 downto 0); --signal d'addition
begin
mult <= B * C;
process(clk,clear)
begin
-- asynchronous clear takes priority over the clock edge
if clear = '1' then
D <= x"00000000";
elsif rising_edge(clk) then
D <= mult + D;
end if;
end process;
Q <= D;
end architecture arch_bonus2;
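For comparison, here is a sketch of the same accumulator with a synchronous clear, in case that turns out to be the better fit. The entity name bonus2_sync is hypothetical and the ports are copied from the question. The only real difference is that clear is sampled on the rising edge, so it no longer appears in the sensitivity list.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity bonus2_sync is  -- hypothetical name for the synchronous-clear variant
  port (
    B, C       : in  unsigned(15 downto 0);
    clear, clk : in  std_logic;
    Q          : out unsigned(31 downto 0)
  );
end entity bonus2_sync;

architecture rtl of bonus2_sync is
  signal mult : unsigned(31 downto 0);  -- product of B and C
  signal D    : unsigned(31 downto 0);  -- accumulator register
begin
  mult <= B * C;

  -- Single process, single driver for D: clear is only looked at on the clock edge
  process (clk)
  begin
    if rising_edge(clk) then
      if clear = '1' then
        D <= (others => '0');
      else
        D <= mult + D;
      end if;
    end if;
  end process;

  Q <= D;
end architecture rtl;
```

Either way, the key point is that D is assigned in exactly one process, which is what removes the multiple-drivers error.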

Weird behaviour in vhdl average using Microsemi FPGA

Good afternoon, I am working on some sliding-window averaging code in VHDL.
The problem is that the accumulator sometimes takes wrong values (generally after a restart).
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.STD_LOGIC_ARITH.all;
use IEEE.std_logic_unsigned.all;
entity cc_rssi_avr is
port (
nrst : in std_logic;
clk : in std_logic; --
ena : in std_logic;
data_in : in std_logic_vector(9 downto 0);
data_out : out std_logic_vector(9 downto 0)
);
end cc_rssi_avr;
architecture rtl of cc_rssi_avr is
constant buffer_size : natural :=8;
type MEM is array(0 to buffer_size-1) of std_logic_vector(9 downto 0);
signal shift_LT : MEM:=(others =>(others=>'0'));
signal sum_val:std_logic_vector(12 downto 0);
begin
--shift input data at every clock edge
process(clk,nrst)
begin
if nrst='0' then
shift_LT <= (others => (others => '0'));
sum_val <= (others=>'0');
elsif clk'event and clk='1' then
if ena = '0' then
shift_LT<=(others=>(others=>'0'));
sum_val<=(others=>'0');
else
shift_LT(0) <= data_in;
shift_LT(1 to buffer_size-1) <= shift_LT(0 to buffer_size-2);
sum_val <= sum_val + ("000"&data_in) - ("000"&shift_LT(buffer_size-1));
end if;
end if;
end process;
data_out<=sum_val(sum_val'high downto 3);
end rtl;
The problem is that sum_val somehow adds a value without the matching subtraction, or subtracts without the matching addition, so that when the input returns to 0 the output returns to 7850 or some other random value instead of zero.
The design is running at 20 MHz (FPGA: Microsemi SmartFusion M2S050) and consists of an ADC driven by the FPGA clock. The ADC output is routed to the FPGA pins, and the samples are processed by this module in order to compute the average over 8 samples.
One last piece of information that might be useful: the FPGA is 92.6% occupied (4LUT).
Can anyone provide some help?
Thanks

VHDL Filter not getting output for first values

I tried implementing a FIR filter in VHDL, but during the first three clocks I get no output, and I get the following warning at 0 ps: Instance /filter_tb/uut/ : Warning: There is an 'U'|'X'|'W'|'Z'|'-' in an arithmetic operand, the result will be 'X'(es).
Source file (I also have 2 other files for D Flip-Flops):
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.std_logic_unsigned.all;
entity filter is
port ( x: in STD_LOGIC_VECTOR(3 downto 0);
clk: in STD_LOGIC;
y: out STD_LOGIC_VECTOR(9 downto 0));
end filter;
architecture struct of filter is
type array1 is array (0 to 3) of STD_LOGIC_VECTOR(3 downto 0);
signal coef : array1 :=( "0001", "0011", "0010", "0001");
signal c0, c1, c2, c3: STD_LOGIC_VECTOR(7 downto 0):="00000000";
signal s0, s1, s2, s3: STD_LOGIC_VECTOR(3 downto 0) :="0000";
signal sum: STD_LOGIC_VECTOR(9 downto 0):="0000000000";
component DFF is
Port ( d : in STD_LOGIC_VECTOR(3 downto 0);
clk : in STD_LOGIC;
q : out STD_LOGIC_VECTOR(3 downto 0));
end component;
component lDFF is
Port ( d : in STD_LOGIC_VECTOR(9 downto 0);
clk : in STD_LOGIC;
q : out STD_LOGIC_VECTOR(9 downto 0));
end component;
begin
s0<=x;
c0<=x*coef(0);
DFF1: DFF port map(s0,clk,s1);
c1<=s1*coef(1);
DFF2: DFF port map(s1,clk,s2);
c2<=s2*coef(2);
DFF3: DFF port map(s2,clk,s3);
c3<=s3*coef(3);
sum<=("00" & c0+c1+c2+c3);
lDFF1: lDFF port map(sum,clk,y);
end struct;
Testbench:
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
use ieee.std_logic_unsigned.all;
ENTITY filter_tb IS
END filter_tb;
ARCHITECTURE behavior OF filter_tb IS
-- Component Declaration for the Unit Under Test (UUT)
COMPONENT filter
PORT(
x : IN STD_LOGIC_VECTOR(3 downto 0);
clk : IN std_logic;
y : OUT STD_LOGIC_VECTOR(9 downto 0)
);
END COMPONENT;
--Inputs
signal x : STD_LOGIC_VECTOR(3 downto 0) := (others => '0');
signal clk : std_logic := '0';
--Outputs
signal y : STD_LOGIC_VECTOR(9 downto 0);
-- Clock period definitions
constant clk_period : time := 10 ns;
BEGIN
-- Instantiate the Unit Under Test (UUT)
uut: filter PORT MAP (
x => x,
clk => clk,
y => y
);
-- Clock process definitions
clk_process :process
begin
clk <= '0';
wait for clk_period/2;
clk <= '1';
wait for clk_period/2;
end process;
-- Stimulus process
stim_proc1: process
begin
x<="0001";
wait for 10ns;
x<="0011";
wait for 10ns;
x<="0010";
wait for 10ns;
--x<="0011";
end process;
END;
Output:
If anyone could help, I'd appreciate it. I think it has something to do with the initial values of the signals c_i and s_i, but I'm not too sure.
Your FIR filter contains flip-flops. These flip-flops have no reset input and so power up in an unknown state. Your simulator models this by initialising the flip-flops' outputs to "UUUU" (as they are four bits wide). A 'U' std_logic value represents an uninitialised value.
So, your code behaves as you ought to expect. If you're not happy with that behaviour, you need to add a reset input and connect it to your flip-flops.
You have built a cascade of three registers.
You have not provided a reset, so the register contents will be unknown. You use the registers for calculations without any condition. Thus your arithmetic calculations will see the unknown values and fail, as you have seen.
The first (simplest) solution would be to add a reset. But that is not the best solution. You will no longer get warnings, but the first three cycles of your output will be based on the register reset value, not on your input signal.
If you have a big stream and don't care about some incorrect values in the first clock cycles, you can live with that.
The really correct way would be to have a 'valid' signal transported alongside your data. You only present the output data when there is a 'valid'. This is the standard method to process data through any pipeline hardware structure.
By the way: you normally do not build D-ffs yourself. The synthesizer will do that for you. You just use a clocked process and process the data vectors in it.
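As a rough illustration of the 'valid' idea, here is a minimal sketch of a three-stage delay line where a valid flag travels with the data. All names (valid_pipe, in_valid, out_valid, data_r, valid_r) are made up for this example and are not from the question. Only the valid flags are reset; the data registers are left without a reset on purpose, because their contents are ignored until the matching flag is set.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity valid_pipe is  -- hypothetical example entity
  port (
    clk       : in  std_logic;
    reset_n   : in  std_logic;                     -- asynchronous, active low
    in_valid  : in  std_logic;                     -- high when x carries a real sample
    x         : in  std_logic_vector(3 downto 0);
    out_valid : out std_logic;                     -- high when y carries a real sample
    y         : out std_logic_vector(3 downto 0)
  );
end entity valid_pipe;

architecture rtl of valid_pipe is
  type data_array is array (0 to 2) of std_logic_vector(3 downto 0);
  signal data_r  : data_array;
  signal valid_r : std_logic_vector(0 to 2);
begin
  -- Data path: plain registers without reset
  data_regs : process (clk)
  begin
    if rising_edge(clk) then
      data_r(0)      <= x;
      data_r(1 to 2) <= data_r(0 to 1);
    end if;
  end process data_regs;

  -- Valid path: same depth as the data path, cleared by reset
  valid_regs : process (clk, reset_n)
  begin
    if reset_n = '0' then
      valid_r <= (others => '0');
    elsif rising_edge(clk) then
      valid_r <= in_valid & valid_r(0 to 1);
    end if;
  end process valid_regs;

  y         <= data_r(2);
  out_valid <= valid_r(2);
end architecture rtl;
```

Downstream logic then only uses y while out_valid is '1', so whatever the data registers hold before the first real sample arrives never matters.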
I have some questions. If I add a reset pin, when will I toggle it from 1 to 0? How can I create this circuit without explicitly using D-ffs?
You make a reset signal in the same way as you make your clock.
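For instance, a testbench could generate both along these lines. This is just a sketch: the name reset_gen_tb and the three-cycle reset length are assumptions, and reset_n is active low to match the register code in this answer. The clock here runs forever, so the simulation has to be run for a fixed time.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reset_gen_tb is  -- hypothetical testbench skeleton
end entity reset_gen_tb;

architecture sim of reset_gen_tb is
  constant clk_period : time := 10 ns;
  signal clk     : std_logic := '0';
  signal reset_n : std_logic := '0';  -- start in reset
begin
  -- Clock process, same style as in the question's testbench
  clk_process : process
  begin
    clk <= '0';
    wait for clk_period / 2;
    clk <= '1';
    wait for clk_period / 2;
  end process clk_process;

  -- Hold reset for a few cycles (assumed: three), then release it
  -- on a falling edge so it is stable well before the next rising edge
  reset_process : process
  begin
    reset_n <= '0';
    wait for 3 * clk_period;
    wait until falling_edge(clk);
    reset_n <= '1';
    wait;  -- reset is released once and never asserted again
  end process reset_process;
end architecture sim;
```

The stimulus process would then wait until reset_n is '1' before it starts driving x.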
As to D-registers: they come out if you use the standard register VHDL code:
reg : process (clk,reset_n)
begin
-- asynchronous active-low reset
if (reset_n='0') then
s0 <= "0000";
s1 <= "0000";
s2 <= "0000";
elsif (rising_edge(clk)) then
s0 <= x;
s1 <= s0;
s2 <= s1;
....
(Code entered as-is, not checked for syntax or typing errors)

VHDL No Delta Delay Input to Output Assignment

I've got a situation like the following:
library ieee;
use ieee.std_logic_1164.all;
entity clkin_to_clkout is
port (
clk_in : in std_logic;
clk_out : out std_logic);
end entity clkin_to_clkout;
architecture arch of clkin_to_clkout is
begin
clk_out <= clk_in;
end architecture arch;
The assignment of clk_in to clk_out isn't a problem for synthesis, but in a simulator it will induce a delta delay from clk_in to clk_out, thereby creating a clock crossing boundary. Is there any way to assign an entity output to an entity input without introducing a delta delay? Thanks.
Edit: Responses to some comments. First, I want this exact question answered, please. For clarification, I want the output port to behave exactly as if it were an alias of the input port. If the answer is, "In VHDL there is no possible way to make an output port an exact behavioral match of an input port", then that is the correct answer and I'll accept it as a limitation of the language. Second, if you don't see what the problem is, please instantiate the clkin_to_clkout entity in the following testbench and observe the difference between mr_sig_del_dly vs mr_sig_clk_dly when you simulate for a few clk1 cycles:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity delta_delay is
end entity delta_delay;
architecture arch of delta_delay is
signal clk1: std_logic := '0';
signal clk2 : std_logic;
signal mr_sig : unsigned(7 downto 0) := (others => '0');
signal mr_sig_del_dly : unsigned(7 downto 0);
signal mr_sig_clk_dly : unsigned(7 downto 0);
component clkin_to_clkout is
port (
clk_in : in std_logic;
clk_out : out std_logic);
end component clkin_to_clkout;
begin
clk1 <= not clk1 after 10 ns;
clk_inst : clkin_to_clkout
port map (
clk_in => clk1,
clk_out => clk2);
mr_sig <= mr_sig + 1 when rising_edge(clk1);
mr_sig_del_dly <= mr_sig when rising_edge(clk2);
mr_sig_clk_dly <= mr_sig when rising_edge(clk1);
end architecture arch;
When you simulate, you will observe that mr_sig_clk_dly is delayed 1 clock cycle as expected because it is assigned on the same clock that mr_sig is on (clk1). mr_sig_del_dly is not delayed 1 clk1 cycle even though clk2 is just a passthrough of clk1 in the clkin_to_clkout module. This is because clk2 is a delta delayed version of clk1 because I used a signal assignment.
Again, thanks for all your responses.
In VHDL-2008 or before there is no possible way to make an output port an exact behavioral match of an input port.
Reference Jim Lewis's comment to the original question.
Thanks, Jim and to all who opined.
It seems you do not know what a delta delay is.
A delta delay is an infinitely small delay. Every signal assignment has (at least) a delta delay in simulation. That's just how VHDL works.
edit:
After your comments, I see where you are coming from. The issue you are encountering is probably simulation only, as synthesis will simplify it. However, there is an electronic equivalent: multi-phase clocks. Say you want a 2-phase clock, i.e. a differential signal, where the second signal is the inverse of the first. If you realized these clocks using just one inverter, the second signal would have a phase offset, due to the latency of the inverter. Thus, in clock-generating logic (like a PLL or DCM) the non-inverted signal is also delayed (using a variable-latency buffer). In other words, all clock signals need to be processed, giving them the same (delta) delay.
The same solution can be applied in VHDL. Example:
library ieee;
use ieee.std_logic_1164.all;
entity clk_buffers is
port(
clk : in std_logic;
clk1 : out std_logic;
clk2_n : out std_logic
);
end entity;
architecture rtl of clk_buffers is begin
clk1 <= clk;
clk2_n <= not clk;
end architecture;
library ieee;
entity test_bench is end entity;
architecture behavioural of test_bench is
use ieee.std_logic_1164.all;
signal clk, clk1, clk2_n : std_logic := '1';
signal base, child1, child2 : integer := 0;
begin
clk <= not clk after 1 ns;
clk_buffers_inst : entity work.clk_buffers
port map(clk => clk, clk1 => clk1, clk2_n => clk2_n);
base <= base+1 when rising_edge(clk1);
child1 <= base when rising_edge(clk1);
child2 <= base when falling_edge(clk2_n);
end architecture;

Signal value won't be initialized during simulation

We've a project for college where we have to simulate a MAC unit for DSP.
For the simulation, I'm using Aldec Riviera Pro 2014.06 through EDA playground.
The problem is that even though I initialized a 32-bit signed signal named add_res, at the simulation its value will be shown as XXXX_XXXX the whole time.
Here's the simulation's result.
Here's the code of the design.vhd
LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
USE IEEE.numeric_std.all;
-----------------------------
ENTITY mac IS
PORT (B, C : IN SIGNED (15 DOWNTO 0);
clk : IN STD_LOGIC;
A : OUT SIGNED (31 DOWNTO 0));
END mac;
-----------------------------
ARCHITECTURE mac_rtl OF mac IS
SIGNAL mul_res: SIGNED (31 DOWNTO 0);
SIGNAL add_res: SIGNED (31 DOWNTO 0) := (others => '0');
BEGIN
mul_res <= B * C;
PROCESS (clk)
BEGIN
A <= mul_res + add_res;
add_res <= A;
END PROCESS;
END mac_rtl;
And here's the code of the testbench.vhd
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
entity testbench is
end entity testbench;
architecture BENCH of testbench is
component mac is
port (B, C : in SIGNED (15 DOWNTO 0);
clk : in STD_LOGIC;
A : out SIGNED (31 DOWNTO 0));
end component;
signal StopClock : BOOLEAN;
signal clk : STD_LOGIC;
signal B, C : SIGNED (15 DOWNTO 0);
signal A : SIGNED (31 DOWNTO 0);
begin
ClockGenerator: process
begin
clk <= '0';
wait for 2 ns;
while not StopClock loop
clk <= '0';
wait for 1 ns;
clk <= '1';
wait for 1 ns;
end loop;
wait;
end process ClockGenerator;
Stimulus: process
begin
B <= "0000000000000010";
C <= "0000000000001000";
wait;
end process Stimulus;
DUT : entity work.mac
port map (B, C, clk, A);
end architecture BENCH;
I've searched here and in Google in general for others having the same problem, but the solutions given didn't help.
I've also tried with a Reset variable from the testbench, but nothing changed. It's as if it won't be initialized at all, while everything else works normally.
The issue is that the values of add_res and mul_res have to be known at the time add_res is loaded into a register.
Note that the process is sensitive to clk but doesn't use an edge or qualify with a value of clk.
I modified your architecture to qualify the add_res update with the rising edge of clk. There's a built-in assumption that mul_res holds non-metavalue values at that time. That can be dealt with in part by defining a default initial value.
Also, the new value of A is not available until signals are updated, which doesn't occur while there are any processes pending to be resumed in the current simulation cycle. This means you need to assign to add_res (which holds the accumulated value anyway) and assign to A outside the process:
ARCHITECTURE mac_rtl OF mac IS
SIGNAL mul_res: SIGNED (31 DOWNTO 0) := (others => '0'); -- added init val
SIGNAL add_res: SIGNED (31 DOWNTO 0) := (others => '0');
BEGIN
mul_res <= B * C;
PROCESS (clk)
BEGIN
if rising_edge(clk) then -- ADDED
-- A <= mul_res + add_res; CHANGED
add_res <= mul_res + add_res;
-- add_res <= A; CHANGED
end if; -- ADDED
END PROCESS;
A <= add_res;
END mac_rtl;
And this gives:
Note that there is no need to try to collapse A and add_res. Delta cycles caused by signal assignments with zero delay do not consume any simulation time.
Scheduled signal updates and delta cycles are used to emulate concurrency in signals that are inherently assigned sequentially. (And yes in the modified architecture the assignment to A will occur one delta cycle later than add_res).
(And yes I put a StopClock transaction at the tail end of the stimuli in process Stimulus in the testbench).
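For reference, a testbench with such a StopClock assignment could look roughly like this. It is a sketch rather than the exact code used above: the name testbench_stop and the 20 ns run length are assumptions, and it expects the modified mac architecture to be compiled into work.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity testbench_stop is  -- hypothetical name
end entity testbench_stop;

architecture bench of testbench_stop is
  signal StopClock : boolean   := false;
  signal clk       : std_logic := '0';
  signal B, C      : signed(15 downto 0) := (others => '0');
  signal A         : signed(31 downto 0);
begin
  -- Same clock generator as in the question: runs until StopClock is true
  ClockGenerator : process
  begin
    wait for 2 ns;
    while not StopClock loop
      clk <= '0';
      wait for 1 ns;
      clk <= '1';
      wait for 1 ns;
    end loop;
    wait;
  end process ClockGenerator;

  Stimulus : process
  begin
    B <= to_signed(2, 16);
    C <= to_signed(8, 16);
    wait for 20 ns;      -- assumed run length
    StopClock <= true;   -- lets the clock loop exit so the simulation can end
    wait;
  end process Stimulus;

  DUT : entity work.mac
    port map (B => B, C => C, clk => clk, A => A);
end architecture bench;
```

Once the clock stops there are no more pending events, so the simulator can finish on its own instead of running until a time limit.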
