Weird behaviour in VHDL average using Microsemi FPGA

Good afternoon. I am working on a sliding-window average written in VHDL.
The problem is that the accumulator sometimes takes wrong values, generally after a restart.
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.STD_LOGIC_ARITH.all;
use IEEE.std_logic_unsigned.all;
entity cc_rssi_avr is
port (
nrst : in std_logic;
clk : in std_logic; --
ena : in std_logic;
data_in : in std_logic_vector(9 downto 0);
data_out : out std_logic_vector(9 downto 0)
);
end cc_rssi_avr;
architecture rtl of cc_rssi_avr is
constant buffer_size : natural :=8;
type MEM is array(0 to buffer_size-1) of std_logic_vector(9 downto 0);
signal shift_LT : MEM:=(others =>(others=>'0'));
signal sum_val:std_logic_vector(12 downto 0);
begin
--shift input data at every clock edge
process(clk,nrst)
begin
if nrst='0' then
shift_LT <= (others => (others => '0'));
sum_val <= (others=>'0');
elsif clk'event and clk='1' then
if ena = '0' then
shift_LT<=(others=>(others=>'0'));
sum_val<=(others=>'0');
else
shift_LT(0) <= data_in;
shift_LT(1 to buffer_size-1) <= shift_LT(0 to buffer_size-2);
sum_val <= sum_val + ("000"&data_in) - ("000"&shift_LT(buffer_size-1));
end if;
end if;
end process;
data_out<=sum_val(sum_val'high downto 3);
end rtl;
Somehow sum_val either adds a value without the matching subtraction or subtracts without the matching addition, so that when the input returns to 0 the output settles at 7850 or some other random value instead of zero.
The design runs at 20 MHz (FPGA: Microsemi SmartFusion2 M2S050). It consists of an ADC driven by the FPGA clock, with the ADC output routed to the FPGA pins; this module processes the samples to compute the average over 8 samples.
One last piece of information that might be useful: the FPGA is 92.6% occupied (4LUT).
Can anyone provide some help?
Thanks
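Since the wrong values appear mostly after a restart, one thing worth ruling out is that the shift register and the accumulator leave reset (or see ena change) on different clock edges, which would leave a permanent offset in sum_val. The following is only a sketch of that idea, not a confirmed diagnosis: it rewrites the module with numeric_std, releases the reset synchronously, and registers the external inputs. The entity name cc_rssi_avr_sketch and the signals rst_sync, ena_r and data_r are hypothetical additions that are not in the original code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity cc_rssi_avr_sketch is
  port (
    nrst     : in  std_logic;
    clk      : in  std_logic;
    ena      : in  std_logic;
    data_in  : in  std_logic_vector(9 downto 0);
    data_out : out std_logic_vector(9 downto 0)
  );
end entity cc_rssi_avr_sketch;

architecture rtl of cc_rssi_avr_sketch is
  constant BUFFER_SIZE : natural := 8;
  type mem_t is array (0 to BUFFER_SIZE - 1) of unsigned(9 downto 0);
  signal shift_lt : mem_t := (others => (others => '0'));
  signal sum_val  : unsigned(12 downto 0) := (others => '0');
  -- Hypothetical additions, not present in the original design:
  signal rst_sync : std_logic_vector(1 downto 0) := "00";
  signal ena_r    : std_logic := '0';
  signal data_r   : unsigned(9 downto 0) := (others => '0');
begin
  -- Reset synchronizer: asserts asynchronously, releases on a clock edge,
  -- so every register below leaves reset in the same cycle.
  process (clk, nrst)
  begin
    if nrst = '0' then
      rst_sync <= "00";
    elsif rising_edge(clk) then
      rst_sync <= rst_sync(0) & '1';
    end if;
  end process;

  process (clk)
  begin
    if rising_edge(clk) then
      -- Register the external inputs before they reach the arithmetic.
      ena_r  <= ena;
      data_r <= unsigned(data_in);
      if rst_sync(1) = '0' or ena_r = '0' then
        shift_lt <= (others => (others => '0'));
        sum_val  <= (others => '0');
      else
        shift_lt(0) <= data_r;
        shift_lt(1 to BUFFER_SIZE - 1) <= shift_lt(0 to BUFFER_SIZE - 2);
        -- Add the newest sample, drop the oldest one.
        sum_val <= sum_val + resize(data_r, sum_val'length)
                           - resize(shift_lt(BUFFER_SIZE - 1), sum_val'length);
      end if;
    end if;
  end process;

  -- Divide by 8 by dropping the three least significant bits.
  data_out <= std_logic_vector(sum_val(sum_val'high downto 3));
end architecture rtl;
```

The input registers add one clock of latency compared with the original. If the ADC data does not meet setup and hold at the pins, input timing constraints would still be needed; this sketch does not address that.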

Related

FSM for 4-bit UP-counter on VHDL

Hi, I've already coded my 4-bit up counter, but I need some help with one part.
The up counter works fine, but I need to change its inputs so that it meets my lab requirement:
Design a 4-bit UP-counter which counts from 0 through n and follows the sequence
[0, n^0+a, n^1+a, n^2+a, n^3+a, ....].
I'm supposed to use two input vectors, n and a, each 2 bits wide. My clock is supposed to be connected to SW0, n to SW1-SW2, and a to SW3-SW4.
I've already connected everything; I just need help understanding how to implement a and n in the following code.
As the instructions say, I cannot use multipliers or adders.
Any help would be appreciated.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_unsigned.ALL;
entity upCounter is
Port ( clk : in STD_LOGIC;
reset : in STD_LOGIC;
--n : in STD_LOGIC_VECTOR (1 downto 0);
--a : in STD_LOGIC_VECTOR (1 downto 0);
output : out STD_LOGIC_VECTOR (3 downto 0)
);
end upCounter;
architecture Behavioral of upCounter is
signal count: STD_LOGIC_VECTOR (3 downto 0);
begin
process (clk, reset)
begin
if reset = '1' then
count <= "0000";
elsif clk'event and clk = '1' then
count <= count + 1;
end if;
end process;
output <= count;
end Behavioral;

VHDL Filter not getting output for first values

I tried implementing an FIR filter in VHDL, but during the first three clocks I get no output, and at 0 ps I get this warning: Instance /filter_tb/uut/ : Warning: There is an 'U'|'X'|'W'|'Z'|'-' in an arithmetic operand, the result will be 'X'(es).
Source file (I also have 2 other files for D Flip-Flops):
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.std_logic_unsigned.all;
entity filter is
port ( x: in STD_LOGIC_VECTOR(3 downto 0);
clk: in STD_LOGIC;
y: out STD_LOGIC_VECTOR(9 downto 0));
end filter;
architecture struct of filter is
type array1 is array (0 to 3) of STD_LOGIC_VECTOR(3 downto 0);
signal coef : array1 :=( "0001", "0011", "0010", "0001");
signal c0, c1, c2, c3: STD_LOGIC_VECTOR(7 downto 0):="00000000";
signal s0, s1, s2, s3: STD_LOGIC_VECTOR(3 downto 0) :="0000";
signal sum: STD_LOGIC_VECTOR(9 downto 0):="0000000000";
component DFF is
Port ( d : in STD_LOGIC_VECTOR(3 downto 0);
clk : in STD_LOGIC;
q : out STD_LOGIC_VECTOR(3 downto 0));
end component;
component lDFF is
Port ( d : in STD_LOGIC_VECTOR(9 downto 0);
clk : in STD_LOGIC;
q : out STD_LOGIC_VECTOR(9 downto 0));
end component;
begin
s0<=x;
c0<=x*coef(0);
DFF1: DFF port map(s0,clk,s1);
c1<=s1*coef(1);
DFF2: DFF port map(s1,clk,s2);
c2<=s2*coef(2);
DFF3: DFF port map(s2,clk,s3);
c3<=s3*coef(3);
sum<=("00" & c0+c1+c2+c3);
lDFF1: lDFF port map(sum,clk,y);
end struct;
Testbench:
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
use ieee.std_logic_unsigned.all;
ENTITY filter_tb IS
END filter_tb;
ARCHITECTURE behavior OF filter_tb IS
-- Component Declaration for the Unit Under Test (UUT)
COMPONENT filter
PORT(
x : IN STD_LOGIC_VECTOR(3 downto 0);
clk : IN std_logic;
y : OUT STD_LOGIC_VECTOR(9 downto 0)
);
END COMPONENT;
--Inputs
signal x : STD_LOGIC_VECTOR(3 downto 0) := (others => '0');
signal clk : std_logic := '0';
--Outputs
signal y : STD_LOGIC_VECTOR(9 downto 0);
-- Clock period definitions
constant clk_period : time := 10 ns;
BEGIN
-- Instantiate the Unit Under Test (UUT)
uut: filter PORT MAP (
x => x,
clk => clk,
y => y
);
-- Clock process definitions
clk_process :process
begin
clk <= '0';
wait for clk_period/2;
clk <= '1';
wait for clk_period/2;
end process;
-- Stimulus process
stim_proc1: process
begin
x<="0001";
wait for 10 ns;
x<="0011";
wait for 10 ns;
x<="0010";
wait for 10 ns;
--x<="0011";
end process;
END;
Output:
If anyone could help, I'd appreciate it. I think it has something to do with the initial values of the signals c_i and s_i, but I'm not too sure.
Your FIR filter contains flip-flops. These flip-flops have no reset input, so they power up in an unknown state. Your simulator models this by initialising the flip-flops' outputs to "UUUU" (as they are four bits wide). A 'U' std_logic value represents an uninitialised value.
So your code behaves as you ought to expect. If you're not happy with that behaviour, you need to add a reset input and connect it to your flip-flops.
You have built a cascade of three registers.
You have not provided a reset, so the register contents are unknown at start-up. You use the registers in calculations unconditionally, so your arithmetic sees the unknown values and fails, as you have seen.
The first (simplest) solution would be to add a reset, but that is not the best solution: you will no longer get warnings, but the first three cycles of your output will be based on the register reset values rather than on your input signal.
If you have a long stream and don't care about some incorrect values in the first few clock cycles, you can live with that.
The really correct way is to transport a 'valid' signal alongside your data and only present the output data when it is 'valid'. This is the standard method for processing data through any pipelined hardware structure.
By the way, you normally do not build D flip-flops yourself; the synthesizer does that for you. You just use a clocked process and handle the data vectors in it.
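As a rough illustration of both suggestions (a single clocked process instead of hand-built flip-flops, and a 'valid' flag travelling with the data), here is a minimal sketch. The entity name filter_sketch and the ports rst, x_valid and y_valid are assumptions made for this example and do not appear in the question; the coefficients are copied from it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity filter_sketch is
  port (
    clk     : in  std_logic;
    rst     : in  std_logic;  -- hypothetical synchronous reset
    x_valid : in  std_logic;  -- hypothetical: high while x carries a sample
    x       : in  std_logic_vector(3 downto 0);
    y_valid : out std_logic;  -- hypothetical: high when y is meaningful
    y       : out std_logic_vector(9 downto 0)
  );
end entity filter_sketch;

architecture rtl of filter_sketch is
  type coef_t  is array (0 to 3) of unsigned(3 downto 0);
  type delay_t is array (1 to 3) of unsigned(3 downto 0);
  constant COEF : coef_t := ("0001", "0011", "0010", "0001");
  signal d : delay_t := (others => (others => '0'));  -- delayed samples
  signal v : std_logic_vector(1 to 3) := "000";       -- their valid flags
begin
  process (clk)
    variable acc : unsigned(9 downto 0);
  begin
    if rising_edge(clk) then
      if rst = '1' then
        v       <= "000";
        y_valid <= '0';
      else
        -- The synthesizer infers the registers from these assignments.
        d(1) <= unsigned(x);
        d(2) <= d(1);
        d(3) <= d(2);
        v(1) <= x_valid;
        v(2) <= v(1);
        v(3) <= v(2);
        -- Each 4x4-bit product is 8 bits wide; widen before summing.
        acc := resize(unsigned(x) * COEF(0), 10)
             + resize(d(1) * COEF(1), 10)
             + resize(d(2) * COEF(2), 10)
             + resize(d(3) * COEF(3), 10);
        y <= std_logic_vector(acc);
        -- The output only counts once all four taps hold real samples.
        y_valid <= x_valid and v(1) and v(2) and v(3);
      end if;
    end if;
  end process;
end architecture rtl;
```

Downstream logic would then ignore y whenever y_valid is '0', which covers the first three clocks after reset.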
I have some questions. If I add a reset pin, when do I toggle it from 1 to 0? And how can I create this circuit without explicitly using D flip-flops?
You generate a reset signal in the same way as you generate your clock.
As for the D registers: they are inferred when you use the standard register VHDL code:
reg : process (clk,reset_n)
begin
-- asynchronous active-low reset
if (reset_n='0') then
s0 <= "0000";
s1 <= "0000";
s2 <= "0000";
elsif (rising_edge(clk)) then
s0 <= x;
s1 <= s0;
s2 <= s1;
....
(Code entered as-is, not checked for syntax or typing errors)
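To make "in the same way as you make your clock" concrete, a testbench could drive the reset roughly as follows. This is a sketch only: the entity name reset_tb_sketch, the three-cycle reset length and the choice to release the reset on a falling clock edge are assumptions, and the unit under test is left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reset_tb_sketch is
end entity reset_tb_sketch;

architecture sim of reset_tb_sketch is
  constant CLK_PERIOD : time := 10 ns;
  signal clk     : std_logic := '0';
  signal reset_n : std_logic := '0';
begin
  -- Free-running clock; stop the simulation with a run-time limit.
  clk <= not clk after CLK_PERIOD / 2;

  reset_proc : process
  begin
    reset_n <= '0';                 -- hold the design in reset
    wait for 3 * CLK_PERIOD;
    wait until falling_edge(clk);   -- release away from the active edge
    reset_n <= '1';
    wait;                           -- stay out of reset from here on
  end process;

  -- The unit under test would be instantiated here, with clk and
  -- reset_n connected to its clock and reset ports.
end architecture sim;
```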

VHDL Moving average: simulation & synthesis result differ (Vivado)

For my project I need to reduce the noise of an ADC output, so I implemented a simple moving average filter in VHDL.
Although it works in simulation (see the picture):
it shows some strange behavior when I view it in ChipScope while the system is running on the FPGA (see the picture):
The VHDL code I use for the moving average is as follows:
library ieee;
use ieee.std_logic_1164.all;
use ieee.math_real.all;
use ieee.numeric_std.all;
entity moving_avg is
generic(
SAMPLES_COUNT : integer := 32
);
port (
clk_i : in std_logic;
rst_n_i : in std_logic;
sample_i : in std_logic_vector(11 downto 0);
avg_o : out std_logic_vector(11 downto 0)
);
end;
architecture rtl of moving_avg is
type sample_buff_t is array (1 to SAMPLES_COUNT) of std_logic_vector(11 downto 0);
signal sample_buffer : sample_buff_t;
signal sum : std_logic_vector(31 downto 0);
constant wid_shift : integer := integer(ceil(log2(real(SAMPLES_COUNT))));
signal avg_interm_s : std_logic_vector(31 downto 0);
begin
process (clk_i, rst_n_i) begin
if rst_n_i='1' then
sample_buffer <= (others => sample_i);
sum <= std_logic_vector(unsigned(resize(unsigned(sample_i), sum'length)) sll wid_shift) ;
elsif rising_edge(clk_i) then
sample_buffer <= sample_i & sample_buffer(1 to SAMPLES_COUNT-1);
sum <= std_logic_vector(unsigned(sum) + unsigned(sample_i) - unsigned(sample_buffer(SAMPLES_COUNT)));
end if;
end process;
avg_interm_s <= std_logic_vector((unsigned(sum) srl wid_shift));
avg_o <= avg_interm_s(11 downto 0);
end;
I use Xilinx Vivado tool 2015.2 running on Ubuntu 14.04 x64.
Could you please help me identify the problem, so that the simulation results match the results after synthesis?
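One difference between simulation and hardware that is worth checking here: the reset branch loads the registers asynchronously with sample_i, which is not a constant. FPGA flip-flops generally support asynchronous set or clear to a fixed value only, so a synthesis tool may implement such a load differently from what the simulator shows. The sketch below does the preload synchronously instead. It assumes that rst_n_i really is active-low (the original code tests it against '1') and that SAMPLES_COUNT is a power of two; the entity name moving_avg_sketch and the generic LOG2_SAMPLES are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity moving_avg_sketch is
  generic (
    SAMPLES_COUNT : positive := 32;  -- assumed to be a power of two
    LOG2_SAMPLES  : natural  := 5    -- hypothetical: log2(SAMPLES_COUNT)
  );
  port (
    clk_i    : in  std_logic;
    rst_n_i  : in  std_logic;        -- assumed active-low
    sample_i : in  std_logic_vector(11 downto 0);
    avg_o    : out std_logic_vector(11 downto 0)
  );
end entity moving_avg_sketch;

architecture rtl of moving_avg_sketch is
  type sample_buff_t is array (1 to SAMPLES_COUNT) of unsigned(11 downto 0);
  signal sample_buffer : sample_buff_t;
  -- Wide enough for SAMPLES_COUNT full-scale samples.
  signal sum : unsigned(11 + LOG2_SAMPLES downto 0);
begin
  process (clk_i)
  begin
    if rising_edge(clk_i) then
      if rst_n_i = '0' then
        -- Synchronous preload: a clocked load of a non-constant value.
        sample_buffer <= (others => unsigned(sample_i));
        sum <= shift_left(resize(unsigned(sample_i), sum'length), LOG2_SAMPLES);
      else
        sample_buffer(1) <= unsigned(sample_i);
        sample_buffer(2 to SAMPLES_COUNT) <= sample_buffer(1 to SAMPLES_COUNT - 1);
        -- Add the newest sample, drop the oldest one.
        sum <= sum + resize(unsigned(sample_i), sum'length)
                   - resize(sample_buffer(SAMPLES_COUNT), sum'length);
      end if;
    end if;
  end process;

  -- Divide by SAMPLES_COUNT by dropping the low LOG2_SAMPLES bits.
  avg_o <= std_logic_vector(sum(sum'high downto LOG2_SAMPLES));
end architecture rtl;
```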

Trying to find Fmax in VHDL but getting extra cycle of delay

I want to see the speed of my VHDL design. As far as I know, it is indicated by Fmax in the Quartus II software. After compiling my design, it shows an Fmax of 653.59 MHz. I wrote a testbench and did some tests to make sure that the design is working as expected. The problem I have with the design is that at the rising edge of the clock, the inputs are set correctly, but the output only comes after one more cycle.
My question is: How can I check the speed of my design (longest delay between the input ports and the output port) and also get the output of the addition at the same time that the inputs are loaded/at the same cycle?
My testbench results are as follows:
a: 0001 and b: 0101 gives XXXX
a: 1001 and b: 0001 gives 0110 (the expected result from the previous calculation)
a: 1001 and b: 1001 gives 1010 (the expected result from the previous calculation)
etc.
Code:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity adder is
port(
clk : in STD_LOGIC;
a : in unsigned(3 downto 0);
b : in unsigned(3 downto 0);
sum : out unsigned(3 downto 0)
);
end adder;
architecture rtl of adder is
signal a_r, b_r, sum_r : unsigned(3 downto 0);
begin
sum_r <= a_r + b_r;
process(clk)
begin
if (rising_edge(clk)) then
a_r <= a;
b_r <= b;
sum <= sum_r;
end if;
end process;
end rtl;
Testbench:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity testbench is
end entity;
architecture behavioral of testbench is
component adder is
port(
clk : in STD_LOGIC;
a : in unsigned(3 downto 0);
b : in unsigned(3 downto 0);
sum : out unsigned(3 downto 0)
);
end component;
signal a, b, sum : unsigned(3 downto 0);
signal clk : STD_LOGIC;
begin
uut: adder
port map(
clk => clk,
a => a,
b => b,
sum => sum
);
stim_process : process
begin
wait for 1 ns;
clk <= '0';
wait for 1 ns;
clk <= '1';
a <= "0001";
b <= "0101";
wait for 1 ns;
clk <= '0';
wait for 1 ns;
clk <= '1';
a <= "1001";
b <= "0001";
wait for 1 ns;
clk <= '0';
wait for 1 ns;
clk <= '1';
a <= "1001";
b <= "1001";
end process;
end behavioral;
Is there any issue with using sum_r as your output?
You don't need the input and output registers if you treat this ALU as pure combinatorial logic. Once you delete them, the Fmax figure disappears; the achievable clock rate then depends on what the logic is connected from and what it is connected to, and it is only defined if the inputs come from registers and the outputs go to registers. If it is only logic going from input pin to output pin, I think it is extremely difficult to say what the propagation delay is, and I think vendor software such as Altera's is not well suited to this kind of analysis.
That's why you will hear people talk about the difficulty of designing asynchronous logic.
I think such fine-grained analysis is difficult to perform with certainty and accuracy, since in your case the propagation delay would be in picoseconds. Even in the literature it is difficult to find quantitative answers on propagation delay.
Why is it difficult? Remember that propagation delay is determined by the total path capacitance. There is a way to estimate propagation delay for transistors, but I don't know the details of how the LUTs are constructed internally, so I cannot give you a good estimate. It depends heavily on the device family, the manufacturing process, the construction of the FPGA, and whether the load is connected to I/O.
You may, however, make your own estimate by going to the logic planner, looking at the path, and assuming about 20-100 ps of propagation delay per LUT that it travels through.
See the image below.
What you are trying to design is an ALU. By definition, an ALU should in theory be purely combinatorial logic.
Therefore, strictly speaking, your adder code should be just this:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity adder is
port(
a : in unsigned(3 downto 0);
b : in unsigned(3 downto 0);
sum : out unsigned(3 downto 0)
);
end adder;
architecture rtl of adder is
begin
sum <= a + b;
end rtl;
No clock is required here, since this function is purely combinatorial.
However, if you want to put your ALU into a registered stage as I have described, what you should actually do is this:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity adder is
port(
clk : in STD_LOGIC;
a : in unsigned(3 downto 0);
b : in unsigned(3 downto 0);
sum : out unsigned(3 downto 0)
);
end adder;
architecture rtl of adder is
signal a_r, b_r, sum_r : unsigned(3 downto 0);
signal internal_sum : unsigned(3 downto 0);
begin
sum <= sum_r;
internal_sum <= a_r + b_r;
process(clk)
begin
if (rising_edge(clk)) then
a_r <= a;
b_r <= b;
sum_r <= internal_sum;
end if;
end process;
end rtl;
You have not mentioned a carry-out, so I will not discuss it here.
Finally, if you are using Altera, they have a very nice RTL viewer in which you can inspect your synthesized design, under Tools -> Netlist Viewer -> RTL Viewer.
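For the registered version, the output necessarily lags the inputs: operands captured on one rising edge produce a registered sum on the next one. A testbench can account for this by applying stimulus away from the active clock edge and checking the result after the known latency. The sketch below is one way to do that; it repeats the registered adder under the hypothetical name adder_reg_sketch so that it is self-contained, and the testbench name and the done flag are also assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder_reg_sketch is
  port (
    clk : in  std_logic;
    a   : in  unsigned(3 downto 0);
    b   : in  unsigned(3 downto 0);
    sum : out unsigned(3 downto 0)
  );
end entity adder_reg_sketch;

architecture rtl of adder_reg_sketch is
  signal a_r, b_r, sum_r : unsigned(3 downto 0);
begin
  sum <= sum_r;
  process (clk)
  begin
    if rising_edge(clk) then
      a_r   <= a;           -- input registers
      b_r   <= b;
      sum_r <= a_r + b_r;   -- output register
    end if;
  end process;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder_tb_sketch is
end entity adder_tb_sketch;

architecture sim of adder_tb_sketch is
  signal clk  : std_logic := '0';
  signal a, b : unsigned(3 downto 0) := (others => '0');
  signal sum  : unsigned(3 downto 0);
  signal done : boolean := false;
begin
  -- 2 ns clock that stops toggling once the test has finished.
  clk <= not clk after 1 ns when not done else '0';

  uut : entity work.adder_reg_sketch
    port map (clk => clk, a => a, b => b, sum => sum);

  stim : process
  begin
    wait until falling_edge(clk);  -- change inputs away from the active edge
    a <= "0001";
    b <= "0101";
    wait until rising_edge(clk);   -- edge 1: operands captured
    wait until rising_edge(clk);   -- edge 2: sum registered
    wait until falling_edge(clk);  -- sample once the output has settled
    assert sum = to_unsigned(6, 4)
      report "unexpected sum" severity error;
    done <= true;
    wait;
  end process;
end architecture sim;
```

Changing a and b at the same instant as the rising edge, as the original testbench does, makes it harder to see which edge captures which operands.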

Why Does This VHDL Work in Simulation and Does not Work on the Virtex 5 Device

I have spent the whole day trying to solve the following problem. I am building a small averaging multichannel oscilloscope and I have the following module for storing the signal:
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
use IEEE.numeric_std.all;
entity storage is
port
(
clk_in : in std_logic;
reset : in std_logic;
element_in : in std_logic;
data_in : in std_logic_vector(11 downto 0);
addr : in std_logic_vector(9 downto 0);
add : in std_logic; -- add = '1' means add to RAM
-- add = '0' means write to RAM
dump : in std_logic;
element_out : out std_logic;
data_out : out std_logic_vector(31 downto 0)
);
end storage;
architecture rtl of storage is
component bram is
port
(
clk : in std_logic;
we : in std_logic;
en : in std_logic;
addr : in std_logic_vector(9 downto 0);
di : in std_logic_vector(31 downto 0);
do : out std_logic_vector(31 downto 0)
);
end component bram;
type state is (st_startwait, st_add, st_write);
signal current_state : state := st_startwait;
signal next_state : state := st_startwait;
signal start : std_logic;
signal we : std_logic;
signal en : std_logic;
signal di : std_logic_vector(31 downto 0);
signal do : std_logic_vector(31 downto 0);
signal data : std_logic_vector(11 downto 0);
begin
ram : bram port map
(
clk => clk_in,
we => we,
en => en,
addr => addr,
di => di,
do => do
);
process(clk_in, reset, start)
begin
if rising_edge(clk_in) then
if (reset = '1') then
current_state <= st_startwait;
else
start <= '0';
current_state <= next_state;
if (element_in = '1') then
start <= '1';
end if;
end if;
end if;
end process;
process(current_state, start, dump)
variable acc : std_logic_vector(31 downto 0);
begin
element_out <= '0';
en <= '1';
we <= '0';
case current_state is
when st_startwait =>
if (start = '1') then
acc(11 downto 0) := data_in;
acc(31 downto 12) := (others => '0');
next_state <= st_add;
else
next_state <= st_startwait;
end if;
when st_add =>
if (add = '1') then
acc := acc + do;
end if;
we <= '1';
di <= acc;
next_state <= st_write;
when st_write =>
if (dump = '1') then
data_out <= acc;
element_out <= '1';
end if;
next_state <= st_startwait;
end case;
end process;
end rtl;
Below is the BRAM module as copied from the XST manual. This is a no-change type of BRAM, and I believe that is where the problem lies. The symptom is that, while this simulates fine, I read only zeroes from the memory when I use the design on the device.
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
entity bram is
port
(
clk : in std_logic;
we : in std_logic;
en : in std_logic;
addr : in std_logic_vector(9 downto 0);
di : in std_logic_vector(31 downto 0);
do : out std_logic_vector(31 downto 0)
);
end bram;
architecture rtl of bram is
type ram_type is array (0 to 999) of std_logic_vector (31 downto 0);
signal buf : ram_type;
begin
process(clk, en, we)
begin
if rising_edge(clk) then
if en = '1' then
if we = '1' then
buf(conv_integer(addr)) <= di;
else
do <= buf(conv_integer(addr));
end if;
end if;
end if;
end process;
end rtl;
What follows is a description of the chip use and the expected output. "clk_in" is a 50 MHz clock. "element_in" is '1' for 20 ns and '0' for 60 ns. "addr_in" iterates from 0 to 999 and changes every 80 ns. "element_in", "data_in", and "addr" are all aligned and synchronous. Now "add" is '1' for 1000 elements, then both "add" and "dump" are zero for 8000 elements and, finally "dump" is '1' for 1000 elements. Now, if I have a test bench that supplies "data_in" from 0 to 999, I expect data_out to be 0, 10, 20, 30, ..., 9990 when "dump" is '1'. That is according to the simulation. In reality I get 0, 1, 2, 3, ..., 999....
Some initial issues to address are listed below.
The process(current_state, start, dump) in the storage entity looks like it is intended to implement a combinatorial element (gates), but the signal (port) data_in is not in the sensitivity list, and neither are add and do, which the process also reads.
This is very likely to cause a difference between simulation and synthesis behavior: simulation typically reacts only to the signals in the sensitivity list, whereas synthesis implements the combinatorial design and reacts to all signals used, although it may give a warning about an incomplete sensitivity list or inferred latches. If you are using VHDL-2008, you can use a sensitivity list of (all) to make the process sensitive to every signal it reads; otherwise you need to add the missing signals manually.
The case current_state is in process(current_state, start, dump) lacks a when others => ..., so the synthesis tool has probably given you a warning about inferred latches. This should be fixed by adding the when others => branch and by assigning every signal the process drives (di and data_out included) a relevant value in every branch.
The use clause lists:
use IEEE.std_logic_unsigned.all;
use IEEE.numeric_std.all;
Mixing these is best avoided: std_logic_unsigned belongs to the non-standard Synopsys package family, whose std_logic_arith declares some of the same identifiers as numeric_std (the type unsigned, for example). Since the RAM uses std_logic_unsigned, I suggest that you stick with that only and delete the use of numeric_std. For new code, though, I would recommend numeric_std.
Also, the process(clk_in, reset, start) in the storage entity implements a sequential element (flip-flop) that is sensitive only to the rising edge of clk_in, so the last two signals in the sensitivity list (reset, start) are unnecessary, although they do not cause a problem.
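The sensitivity-list and latch points above can be illustrated with a stripped-down state machine. This is a sketch under assumptions, not a fix for the full storage module: it requires VHDL-2008 for process (all), the entity name fsm_sketch is hypothetical, and the accumulator and RAM are deliberately left out so that only the control structure is shown.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fsm_sketch is
  port (
    clk         : in  std_logic;
    reset       : in  std_logic;
    start       : in  std_logic;
    dump        : in  std_logic;
    we          : out std_logic;
    element_out : out std_logic
  );
end entity fsm_sketch;

architecture rtl of fsm_sketch is
  type state is (st_startwait, st_add, st_write);
  signal current_state : state := st_startwait;
  signal next_state    : state := st_startwait;
begin
  -- State register: only the clock belongs in the sensitivity list.
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        current_state <= st_startwait;
      else
        current_state <= next_state;
      end if;
    end if;
  end process;

  -- Combinatorial next-state and output logic.
  -- "all" (VHDL-2008) makes the process sensitive to every signal it reads.
  process (all)
  begin
    -- Defaults first: every driven signal gets a value on every path,
    -- so no latches are inferred.
    next_state  <= current_state;
    we          <= '0';
    element_out <= '0';
    case current_state is
      when st_startwait =>
        if start = '1' then
          next_state <= st_add;
        end if;
      when st_add =>
        we         <= '1';
        next_state <= st_write;
      when st_write =>
        element_out <= dump;
        next_state  <= st_startwait;
    end case;
  end process;
end architecture rtl;
```

With default assignments at the top of the process, a when others branch is not needed for latch avoidance, because the enumerated type is fully covered by the three choices.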

Resources