I would have commented on the linked post, but I don't have that privilege yet, so I am asking a new question to get some clarification.
Basically, I need to add a 2-clock-cycle delay to this process, located in the behavioral architecture of my VHDL project (code shown below):
process(font_address(0), font_address(1), font_address(2), font_address(3),font_address(4), font_address(5), font_address(6), font_address(7),vga_hcount(0), vga_hcount(1),vga_hcount(2),CLK)
begin
--if (CLK'event and CLK = '1') then
-- a_store <= a_store(1 downto 0) & a;
-- a_out <= a_store(1 downto 0);
--end if;
if (CLK'event and CLK = '1') then
case vga_hcount(2 downto 0) is
when "000" => font_bit <= font_data(7);
when "001" => font_bit <= font_data(6);
when "010" => font_bit <= font_data(5);
when "011" => font_bit <= font_data(4);
when "100" => font_bit <= font_data(3);
when "101" => font_bit <= font_data(2);
when "110" => font_bit <= font_data(1);
when "111" => font_bit <= font_data(0);
when others => font_bit <= font_data(0);
end case;
end if;
end process;
As you can see, the if statement wrapped around the case gives a single clock cycle of delay before the signal assignments in the process are made, but I cannot seem to create a synthesizable 2-clock-cycle delay despite reading the answered question linked above.
When I comment out the if statement wrapped around the case and uncomment the following block of code
if (CLK'event and CLK = '1') then
a_store <= a_store(1 downto 0) & a;
a_out <= a_store(1 downto 0);
end if;
Which was taken from the link given at the beginning of this question I get the following error:
[Synth 8-690] width mismatch in assignment; target has 2 bits, source has 3 bits ["U:/Computer organisation lab/vga/vga_prac.vhd":304]
The target referred to in this error message is the a_store vector, and the source is the concatenation of a_store and a.
This is after I assigned logic 1 to a and created a_store and a_out as std_logic_vectors with 2 elements (as I want a delay of two clock cycles). I think I am getting this error because, even after reading over that question for hours, I still can't understand how it is actually supposed to generate a 2-clock-cycle delay.
I thought at first that a 1 bit gets shifted through the a_store vector until the MSB is one, and that this vector is then applied to a_out, but since it is all in an if statement I cannot see how these two lines of code would even execute more than once. Even if this were true, I would need some test to make sure that a_out has a 1 in its MSB.
Usually I would have moved on but after extensive searching I couldn't find a simpler solution than this despite the fact I don't fully understand how it is supposed to work.
If somebody could clarify this or suggest a modification to my program which will generate the required delay that would be great.
First off, the first code is not that efficient and could be reduced to
use ieee.numeric_std.all;
process(CLK)
begin
    if rising_edge(CLK) then
        -- vga_hcount(2 downto 0) counts 0..7, so the index runs from 7 down to 0
        font_bit <= font_data(7 - to_integer(unsigned(vga_hcount(2 downto 0))));
    end if;
end process;
For the second part, the error says everything. You say a_store has 2 bits (or "elements" as you call it), then you can imagine that a_store(1 downto 0) & a is two bits of a_store + 1 bit of a = 3 bits. You cannot assign 3 bits to 2 bits. How would that fit? Same problem for assigning a_out: How can 2 bits fit into 1 bit?
if (CLK'event and CLK = '1') then
    a_store <= a_store(0) & a;
    a_out <= a_store(1);
end if;
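To make the shift-register idea concrete, here is a minimal self-contained sketch of a two-stage delay. The entity name delay2 and the port names d and q are made up for illustration and do not come from the question. The clocked process runs again on every rising edge, which is why the bit moves one position per clock. In this sketch the output is read concurrently from the second flip-flop, so q lags d by two rising edges; assigning the output inside the clocked process as well would add one more register stage.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: delays d by two rising edges of clk.
entity delay2 is
  port (
    clk : in  std_logic;
    d   : in  std_logic;
    q   : out std_logic
  );
end entity delay2;

architecture rtl of delay2 is
  -- Two flip-flops in series; stage(0) holds the newest sample.
  signal stage : std_logic_vector(1 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Old stage(0) moves to stage(1), d enters at stage(0).
      -- Both sides are 2 bits wide, so there is no width mismatch.
      stage <= stage(0) & d;
    end if;
  end process;

  -- stage(1) holds the value d had two clock edges ago.
  q <= stage(1);
end architecture rtl;
```

For a wider signal such as font_data, the same pattern would presumably use two registers of the full width rather than a 2-bit vector.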
Multiplier via Repeated Addition

I need to create a 4-bit multiplier as part of a 4-bit ALU in VHDL. The requirement is that we have to use repeated addition: if A is one of the 4-bit numbers and B is the other, we have to add A + A + A..., B times. I understand this requires either a for loop or a while loop, plus a temporary variable to store the values, but my code doesn't work and I don't really understand how it is supposed to function.
PR and T are temporary std_logic_vector buffers, A and B are the two 4-bit inputs, and C and D are the output values, but the loop just doesn't seem to work. I don't understand how to loop so that it keeps adding A, B times, and thus computes A * B.
WHEN "010" =>
PR <= "00000000";
T <= "0000";
PR <= PR + A;
T <= T + 1;
C <= PR(3 downto 0);
D <= PR(7 downto 4);
This will never work, because when a line with a signal assignment (<=) like this one:
PR <= PR + A;
is executed, the target of the signal assignment (PR in this case) is not updated immediately; instead an event (a future change) is scheduled. When is this event (change) actioned? When all processes have suspended (reached wait statements or end process statements).
So, your loop:
PR <= PR + A;
T <= T + 1;
just schedules more and more events on PR and T, but these events never get actioned because the process is still executing. There is more information here.
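To illustrate the difference, here is a minimal sketch that uses a variable instead of a signal inside the loop. A variable assigned with := is updated immediately, so each iteration sees the previous sum. The package and function names (repeat_add_pkg, mult_repeat) are hypothetical, and the 4-bit width is taken from the question. Note that a synthesizer would unroll this loop into a chain of adders, so this is purely combinational logic, not something that iterates over time.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package repeat_add_pkg is
  -- Hypothetical helper: multiplies a by b through repeated addition.
  function mult_repeat (a, b : unsigned(3 downto 0)) return unsigned;
end package repeat_add_pkg;

package body repeat_add_pkg is
  function mult_repeat (a, b : unsigned(3 downto 0)) return unsigned is
    -- A variable takes its new value at once, unlike a signal.
    variable acc : unsigned(7 downto 0) := (others => '0');
  begin
    -- Static loop bounds, so the loop unrolls into 15 conditional adders.
    for i in 1 to 15 loop
      if i <= to_integer(b) then
        acc := acc + a;  -- numeric_std "+" extends a to 8 bits
      end if;
    end loop;
    return acc;
  end function mult_repeat;
end package body repeat_add_pkg;
```

Inside a combinational process this could be used as PR <= std_logic_vector(mult_repeat(unsigned(A), unsigned(B))), assuming A and B are 4-bit std_logic_vectors as described in the question.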
So, what's the solution to your problem? Well, it depends what hardware you are trying to achieve. Are you trying to achieve a block of combinational logic? Or sequential? (where the multiply takes multiple clock cycles)
I advise you to try not to think in terms of "temporary variables", "for loops" and "while loops". These are software constructions that can be useful, but ultimately you are designing a piece of hardware. You need to try to think about what physical pieces of hardware can be connected together to achieve your design, then how you might describe them using VHDL. This is difficult at first.
You should provide more information about what exactly you want to achieve (and on what kind of hardware) to increase the probability of getting a good answer.
You don't mention whether your multiplier needs to operate on signed or unsigned inputs. Let's assume signed, because that's a bit harder.
As has been noted, this whole exercise makes little sense if implemented combinationally, so let's assume you want a clocked (sequential) implementation.
You also don't mention how often you expect new inputs to arrive. This makes a big difference in the implementation. I don't think either one is necessarily more difficult to write than the other, but if you expect frequent inputs (e.g. every clock cycle), then you need a pipelined implementation (which uses more hardware). If you expect infrequent inputs (e.g. every 16 or more clock cycles) then a cheaper serial implementation should be used.
Let's assume you want a serial implementation, then I would start somewhere along these lines:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity loopy_mult is
    generic (
        g_a_bits : positive := 4;
        g_b_bits : positive := 4
    );
    port (
        clk : in std_logic;
        srst : in std_logic;
        -- Input
        in_valid : in std_logic;
        in_a : in signed(g_a_bits-1 downto 0);
        in_b : in signed(g_b_bits-1 downto 0);
        -- Output
        out_valid : out std_logic;
        out_ab : out signed(g_a_bits+g_b_bits-1 downto 0)
    );
end loopy_mult;
architecture rtl of loopy_mult is
    signal a : signed(g_a_bits-1 downto 0);
    signal b_sign : std_logic;
    signal countdown : unsigned(g_b_bits-1 downto 0);
    signal sum : signed(g_a_bits+g_b_bits-1 downto 0);
begin
    mult_proc : process(clk)
    begin
        if rising_edge(clk) then
            if srst = '1' then
                out_valid <= '0';
                countdown <= (others => '0');
            else
                if in_valid = '1' then -- (Initialize)
                    -- Record the value of A and sign of B for later
                    a <= in_a;
                    b_sign <= in_b(g_b_bits-1);
                    -- Initialize countdown
                    if in_b(g_b_bits-1) = '0' then
                        -- Input B is positive
                        countdown <= unsigned(in_b);
                    else
                        -- Input B is negative
                        countdown <= unsigned(-in_b);
                    end if;
                    -- Initialize sum
                    sum <= (others => '0');
                    -- Set the output valid flag if we're already finished (B=0)
                    if in_b = 0 then
                        out_valid <= '1';
                    else
                        out_valid <= '0';
                    end if;
                elsif countdown > 0 then -- (Loop)
                    -- Let's assume the target is an FPGA with efficient add/sub
                    if b_sign = '0' then
                        sum <= sum + a;
                    else
                        sum <= sum - a;
                    end if;
                    -- Set the output valid flag when we get to the last loop
                    if countdown = 1 then
                        out_valid <= '1';
                    else
                        out_valid <= '0';
                    end if;
                    -- Decrement countdown
                    countdown <= countdown - 1;
                else
                    -- (Idle)
                    out_valid <= '0';
                end if;
            end if;
        end if;
    end process mult_proc;
    -- Output
    out_ab <= sum;
end rtl;
how to initialize array of STD_LOGIC_VECTOR(15 downto 0) with data stored in BRAM

I have some filter coefficients in BRAM that need to be written into an array to perform convolution. I created an array type and declared a signal of that type. When I port-map that signal to DATA_OUT of the BRAM, I get the error "expecting STD_LOGIC_VECTOR".
I also tried writing the data into the array with a for loop. That results in the error "indices is not a STD_LOGIC_VECTOR".
my type declaration
TYPE coeff_pipe IS ARRAY(0 TO 15) OF std_logic_vector(7 downto 0);
Signal coeff:coeff_pipe;
my for loop is like this
for i in 0 to loop
coeff(i) <= data_out_BRAM(i); end loop;
Help me with suitable changes in my code to make it work
You are mixing up the behavior of a for loop in a programming language (C) with its behavior in a hardware description language (VHDL). In a programming language, the processor executes the body of a for loop several times in a row, sequentially (one iteration after another). In an HDL, a for loop is used to instantiate the same circuit several times with different inputs/outputs. There is no notion of time in a for loop.
In your case, you have to use a sequential process and increment your BRAM address:
process(clk, rst)
begin
    if rst = '1' then
        addr_BRAM <= (others => '0');
        addr_BRAM_d <= (others => '0');
        ram_init_en <= '1';
        ram_init_en_d <= '0';
        coeff <= (others => (others => '0'));
    elsif rising_edge(clk) then
        addr_BRAM_d <= addr_BRAM; -- Delay of 1 clk cycle
        ram_init_en_d <= ram_init_en; -- Delay of 1 clk cycle
        -- Init done: stop at the last coefficient index (15, i.e. binary "1111")
        if unsigned(addr_BRAM) = 15 then
            ram_init_en <= '0';
        end if;
        -- Increment BRAM address
        if ram_init_en = '1' then
            addr_BRAM <= std_logic_vector(unsigned(addr_BRAM) + 1);
        end if;
        -- Capture data one cycle after setting the address, because a BRAM
        -- does not answer instantly; it answers after one clk cycle.
        if ram_init_en_d = '1' then
            coeff(to_integer(unsigned(addr_BRAM_d))) <= data_out_BRAM;
        end if;
    end if;
end process;
Led matrix row bits don't shift

I am new to VHDL and I am trying to do a simple application with a led matrix (8x8). My goal is to turn on the leds of the matrix so I can see a smiley face. For some reason none of the leds turn on.
In order to see what's wrong I tried to turn on all leds on each line at a time by commenting the case statement and giving cols<="00000000" before the statement, the result is that the only line that turns on is the first, it keeps turning on and off each second.
I made the frequency divider for 1 second just to see if the code works correctly.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.std_logic_unsigned.all;
entity main is
    Port ( clk : in STD_LOGIC;
           rows : out STD_LOGIC_VECTOR (7 downto 0);
           cols : out STD_LOGIC_VECTOR (7 downto 0));
end main;
architecture Behavioral of main is
    signal count: std_logic_vector(7 downto 0):= "00000001";
    signal clk1Hz: std_logic_vector(26 downto 0);
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if clk1Hz = X"5F5E0FF" then
                clk1Hz <= "000" & X"000000";
            else
                clk1Hz <= clk1Hz + 1;
            end if;
            if clk1Hz(26) = '1' then
                if count = "10000000" then
                    count <= "00000001";
                else
                    count(7 downto 1) <= count(6 downto 0);
                    count(0) <= '0';
                end if;
                rows <= count;
                case count is
                    when "00000001" => cols <= "11111111";
                    when "00000010" => cols <= "11011011";
                    when "00000100" => cols <= "11011011";
                    when "00001000" => cols <= "11111111";
                    when "00010000" => cols <= "00111100";
                    when "00100000" => cols <= "10000001";
                    when "01000000" => cols <= "11000011";
                    when "10000000" => cols <= "11111111";
                    when others => cols <= "11111111";
                end case;
            end if;
        end if;
    end process;
end Behavioral;
Do you realize that if clk1Hz(26) = '1' then stays true from X"4000000" to X"5F5E0FF"?
You most likely want to change count only on the exact X"4000000" value, no? And not continuously for 1/3rd of the time...
I can tell you didn't simulate this. The clk1Hz signal wasn't initialized and wasn't reset. It won't spin in sim without this, since it initializes to X. However, on hardware it will work just fine.
So, when your clk counter hits x400_0000, bit (26) is set and your row/col shifters start going like mad for 1/3 of a second. Then when the clk counter resets, all activity stops.
Is this really what you want? I can see from simulating this that rows and cols are both shifting correctly, albeit for only a third of a second.
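One way to get a single row step per period, sketched here under some assumptions, is to generate an enable pulse that is high for exactly one clock cycle and shift only when it is high. The entity name row_scan, the generic DIVIDER and the signal tick are hypothetical names, and the default of 100_000_000 assumes a 100 MHz clock as implied by the X"5F5E0FF" terminal count in the question. The counter is also given an initial value so that it runs in simulation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; DIVIDER is the assumed number of clock cycles per row step.
entity row_scan is
  generic (DIVIDER : positive := 100_000_000);
  port (
    clk  : in  std_logic;
    rows : out std_logic_vector(7 downto 0)
  );
end entity row_scan;

architecture rtl of row_scan is
  signal counter : natural range 0 to DIVIDER - 1 := 0;
  signal tick    : std_logic := '0';
  signal count   : std_logic_vector(7 downto 0) := "00000001";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- tick is high for exactly one clock cycle out of every DIVIDER cycles.
      if counter = DIVIDER - 1 then
        counter <= 0;
        tick    <= '1';
      else
        counter <= counter + 1;
        tick    <= '0';
      end if;

      -- Rotate the one-hot row select once per tick,
      -- not on every clock while a counter bit happens to be high.
      if tick = '1' then
        count <= count(6 downto 0) & count(7);
      end if;
    end if;
  end process;

  rows <= count;
end architecture rtl;
```

For simulation, a small DIVIDER (for example 4) keeps the run short. The cols decoding from the question could be driven from count in the same way as before.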
Edit - Can't Infer Register Because Its Behavior Does Not Match Any Supported Register Model VHDL

This is a branch off of a separate question I asked. I am going to explain more in depth on what I am trying to do and what it is not liking. This is a school project and doesn't need to follow standards.
I am attempting to make the SIMON game. Right now, I am trying to use a case statement for the levels, where each level is supposed to be faster (hence the different frequency dividers). The first level is supposed to use the first frequency, and a pattern of LEDs is supposed to light up and disappear. Before I put in the case statement, the first level was by itself (no second-level logic), and it lit up and disappeared like it should. I also used compare = 0 in order to compare an output to an input. (The user is supposed to flip up the switches in the light pattern they saw.) This worked when the first level was by itself, but now that it is in a case statement, the tool doesn't like compare. I'm not sure how to get around that in order to compare an output to an input.
The errors I am getting are similar to before:
Error (10821): HDL error at FP.vhd(75): can't infer register for "compare" because its behavior does not match any supported register model
Error (10821): HDL error at FP.vhd(75): can't infer register for "count[0]" because its behavior does not match any supported register model
Error (10821): HDL error at FP.vhd(75): can't infer register for "count[1]" because its behavior does not match any supported register model
Error (10821): HDL error at FP.vhd(75): can't infer register for "count[2]" because its behavior does not match any supported register model
Error (10822): HDL error at FP.vhd(80): couldn't implement registers for assignments on this clock edge
Error (10822): HDL error at FP.vhd(102): couldn't implement registers for assignments on this clock edge
Error (12153): Can't elaborate top-level user hierarchy
I also understand that it doesn't like the rising_edge(toggle) but I need that in order to make the LED pattern light up and disappear.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.std_logic_unsigned.all;
entity FP is
    port (
        clk, reset : in std_logic;
        QF : out std_logic_vector (3 downto 0);
        checkbtn : in std_logic;
        Switch : in std_logic_vector(3 downto 0);
        sel : in std_logic_vector (1 downto 0);
        score : out std_logic_vector (6 downto 0)
    );
end FP;
architecture behavior of FP is
signal time_count: integer:=0;
signal toggle : std_logic;
signal toggle1 : std_logic;
signal count : std_logic_vector (2 downto 0);
signal seg : std_logic_vector (3 downto 0);
signal compare : integer range 0 to 1:=0;
type STATE_TYPE is (level1, level2);
signal level : STATE_TYPE;
--signal input : std_logic_vector (3 downto 0);
--signal sev : std_logic_vector (6 downto 0);
begin
process (clk, reset, sel)
begin
    if (reset = '0') then
        time_count <= 0;
        toggle <= '0';
    elsif rising_edge (clk) then
        case sel is
            when "00" =>
                if (time_count = 1249999) then
                    toggle <= not toggle;
                    time_count <= 0;
                else
                    time_count <= time_count+1;
                end if;
            when "01" =>
                if (time_count = 2499999) then
                    toggle1 <= not toggle1;
                    time_count <= 0;
                else
                    time_count <= time_count+1;
                end if;
            when "10" =>
                if (time_count = 4999999) then
                    toggle <= not toggle;
                    time_count <= 0;
                else
                    time_count <= time_count+1;
                end if;
            when "11" =>
                if (time_count = 12499999) then
                    toggle <= not toggle;
                    time_count <= 0;
                else
                    time_count <= time_count+1;
                end if;
        end case;
    end if;
end process;
Process (toggle, compare, switch)
begin
    case level is
        when level1 =>
            if sel = "00" then
                count <= "001";
                seg <= "1000";
            elsif (rising_edge (toggle)) then
                count <= "001";
                compare <= 0;
                if (count = "001") then
                    count <= "000";
                else
                    count <= "000";
                end if;
            end if;
            if (switch = "1000") and (compare = 0) and (checkbtn <= '0') then
                score <= "1111001";
                level <= level2;
            else
                score <= "1000000";
                level <= level1;
            end if;
        when level2 =>
            if sel = "01" then
                count <= "010";
                seg <= "0100";
            elsif (rising_edge (toggle1)) then
                count <= "010";
                compare <= 1;
                if (count = "010") then
                    count <= "000";
                else
                    count <= "000";
                end if;
            end if;
            if (switch = "0100") and (compare = 1) and (checkbtn <= '0') then
                score <= "0100100";
            else
                score <= "1000000";
                level <= level1;
            end if;
    end case;
    case count is
        when "000"=>seg<="0000";
        when "001"=>seg<="1000";
        when "010"=>seg<="0100";
        when "011"=>seg<="0110";
        when "100"=>seg<="0011";
        when others=>seg<="0000";
    end case;
end process;
QF <= seg;
end behavior;
Thanks again in advance!
Well... it is hard to tell what is wrong, because this state machine is written the wrong way. You should look for references on proper modeling of FSMs in VHDL. One good example is here.
If you use Quartus, you could also look for Altera's description of how to model an FSM specifically for their compiler.
For now I will give you just two pieces of advice. First, you shouldn't (or maybe even can't) use two
if rising_edge (clk)
checks in one process. If your process is supposed to be sensitive to the clock edge, write the check once at the beginning.
Second, if you want to model an FSM as one process with synchronous reset, put just clk in the sensitivity list.
EDIT after question and code edit:
Ok, much better now. But another few things:
Your FSM is still not as it should be. Look again at the example in the source I gave you above and edit yours to match it, or make it a one-process FSM like the example in this link.
Indents! Very important. I couldn't spot some of the obvious errors before I indented your code properly. This leads me to...
Look at the places where you assign values to count, in particular the if statements. No matter what, you assign the same value of "000".
Similar story with another signal, seg. You assign some value to it in the process, and then at the end of the process there is a case statement that assigns some other value to it, making the previous assignments irrelevant.
Use rising_edge only once in a process, only on the clock, and only at the very beginning of the process, or in the way you did in the first process, which has an asynchronous reset. In the second process you broke all three of these rules.
In a sequential process with rising_edge, like the first one, you don't have to put anything in the sensitivity list other than the clock, plus the reset if it is asynchronous, as in your case.
Sensitivity list in the second process: it is a parallel process, so you should list the signals that you read in the process and that can change outside of it. That is not the case for compare. But the list should include level, sel and toggle1.
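As a rough illustration of the points above, here is a minimal one-process FSM sketch. It is not a rewrite of the SIMON design: the entity name level_fsm and the inputs step and ok are hypothetical stand-ins for a timer pulse and a "switches match" flag, both assumed to be synchronous to clk. The only edge test is on clk, the slower timing is handled with an enable rather than rising_edge(toggle), and seg is assigned in one place per state.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; step and ok are assumed inputs synchronous to clk.
entity level_fsm is
  port (
    clk   : in  std_logic;
    reset : in  std_logic;  -- asynchronous, active low as in the question
    step  : in  std_logic;  -- enable pulse, e.g. from a frequency divider
    ok    : in  std_logic;  -- '1' when the player input matches
    seg   : out std_logic_vector(3 downto 0)
  );
end entity level_fsm;

architecture rtl of level_fsm is
  type state_type is (level1, level2);
  signal level : state_type := level1;
begin
  -- Only clk and the asynchronous reset are in the sensitivity list.
  process (clk, reset)
  begin
    if reset = '0' then
      level <= level1;
      seg   <= "0000";
    elsif rising_edge(clk) then
      -- step is used as a clock enable, never as a clock.
      if step = '1' then
        case level is
          when level1 =>
            seg <= "1000";
            if ok = '1' then
              level <= level2;
            end if;
          when level2 =>
            seg <= "0100";
            if ok = '0' then
              level <= level1;
            end if;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

Every signal assigned in the process becomes a register clocked by clk, which is the kind of structure a synthesis tool can map onto its supported register models.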
VHDL - synthesis results is not the same as behavioral

I have to write a program in VHDL that calculates sqrt using Newton's method. I wrote code that looks OK to me, but it does not work.
Behavioral simulation gives the proper output value, but post-synthesis simulation (and the design running on hardware) does not.
The program is implemented as a state machine. The input value is an integer (in std_logic_vector format), and the output is fixed point (for calculation purposes the input value is multiplied by 64^2, so the 6 LSBs of the output are the fractional part).
I used a VHDL division function from the vhdlguru blog.
In behavioral simulation calculating sqrt takes about 350 ns (Tclk = 10 ns), but in post-synthesis simulation it takes only 50 ns.
Used code:
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;
entity moore_sqrt is
    port (clk : in std_logic;
          enable : in std_logic;
          input : in std_logic_vector (15 downto 0);
          data_ready : out std_logic;
          output : out std_logic_vector (31 downto 0)
    );
end moore_sqrt;
architecture behavioral of moore_sqrt is
function division (x : std_logic_vector; y : std_logic_vector) return std_logic_vector is
    variable a1 : std_logic_vector(x'length-1 downto 0):=x;
    variable b1 : std_logic_vector(y'length-1 downto 0):=y;
    variable p1 : std_logic_vector(y'length downto 0):= (others => '0');
    variable i : integer:=0;
begin
    for i in 0 to y'length-1 loop
        p1(y'length-1 downto 1) := p1(y'length-2 downto 0);
        p1(0) := a1(x'length-1);
        a1(x'length-1 downto 1) := a1(x'length-2 downto 0);
        p1 := p1-b1;
        if(p1(y'length-1) ='1') then
            a1(0) :='0';
            p1 := p1+b1;
        else
            a1(0) :='1';
        end if;
    end loop;
    return a1;
end division;
type state_type is (s0, s1, s2, s3, s4, s5, s6); --type of state machine
signal current_state,next_state: state_type; --current and next state declaration
signal xk : std_logic_vector (31 downto 0);
signal temp : std_logic_vector (31 downto 0);
signal latched_input : std_logic_vector (15 downto 0);
signal iterations : integer := 0;
signal max_iterations : integer := 10; --corresponds with accuracy
begin
process (clk,enable)
begin
    if enable = '0' then
        current_state <= s0;
    elsif clk'event and clk = '1' then
        current_state <= next_state; --state change
    end if;
end process;
--state machine
process (current_state)
begin
    case current_state is
        when s0 => -- reset
            output <= "00000000000000000000000000000000";
            data_ready <= '0';
            next_state <= s1;
        when s1 => -- latching input data
            latched_input <= input;
            next_state <= s2;
        when s2 => -- start calculating
            -- initial value is set as a half of input data
            output <= "00000000000000000000000000000000";
            data_ready <= '0';
            xk <= "0000000000000000" & division(latched_input, "0000000000000010");
            next_state <= s3;
            iterations <= 0;
        when s3 => -- division
            temp <= division ("0000" & latched_input & "000000000000", xk);
            next_state <= s4;
        when s4 => -- calculating
            if(iterations < max_iterations) then
                xk <= xk + temp;
                next_state <= s5;
                iterations <= iterations + 1;
            else
                next_state <= s6;
            end if;
        when s5 => -- shift logic right by 1
            xk <= division(xk, "00000000000000000000000000000010");
            next_state <= s3;
        when s6 => -- stop - proper data
            -- output <= division(xk, "00000000000000000000000001000000"); --the nearest integer value
            output <= xk; -- fixed point 24.6, sqrt = output/64;
            data_ready <= '1';
    end case;
end process;
end behavioral;
Below are screenshots of the behavioral and post-synthesis simulation results:
Behavioral simulation
Post-synthesis simulation
I have only a little experience with VHDL and I have no idea what I can do to fix the problem. I tried to move the calculation into a separate process, but that did not work either.
I hope you can help me.
Platform: Zynq ZedBoard
IDE: Vivado 2014.4
A lot of the problems can be eliminated if you rewrite the state machine in single process form, in a pattern similar to this. That will eliminate both the unwanted latches, and the simulation /synthesis mismatches arising from sensitivity list errors.
I believe you are also going to have to rewrite the division function with its loop in the form of a state machine - either a separate state machine, handshaking with the main one to start a divide and signal its completion, or as part of a single hierarchical state machine as described in this Q&A.
This code is neither correct for simulation nor for synthesis.
Simulation issues:
Your sensitivity list is not complete, so the simulation does not show the correct behavior of the synthesized hardware. All right-hand-side signals should be included if the process is not clocked.
Synthesis issues:
Your code produces masses of latches. There is only one register called current_state. Latches should be avoided unless you know exactly what you are doing.
You can't divide numbers in the way you are using the function, if you want to keep a proper frequency of your circuit.
=> So check your Fmax report and
=> the RTL schematic or synthesis report for resource utilization.
Don't use the division to shift bits. Even in software, the compiler does not implement a division when a value is divided by a power of two. Use a shift operation to shift a value.
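As a small sketch of the last point, halving a value can be written as a one-bit right shift. The package and function names (shift_pkg, half) are hypothetical, and this sketch uses ieee.numeric_std rather than the std_logic_arith and std_logic_unsigned packages used in the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package shift_pkg is
  -- Hypothetical helper: halves an unsigned value held in a std_logic_vector.
  function half (x : std_logic_vector) return std_logic_vector;
end package shift_pkg;

package body shift_pkg is
  function half (x : std_logic_vector) return std_logic_vector is
  begin
    -- shift_right on unsigned fills with zeros from the left;
    -- in hardware this is just wiring, with no divider logic.
    return std_logic_vector(shift_right(unsigned(x), 1));
  end function half;
end package body shift_pkg;
```

In state s5 of the question this would replace the call to division(xk, ...) with something like xk <= half(xk), and the same idea applies to the initial guess in s2.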
Other things to rethink:
enable is a low active asynchronous reset. Synchronous resets are better for FPGA implementations.
VHDL code may or may not be synthesizable, and the synthesis result may or may not behave like the simulation. This depends on the code, the synthesizer, and the target platform, and is quite normal.
Behavioral code is good for test-benches, but - in general - cannot be synthesized.
Here I see the most obvious issue with your code:
process (current_state)
iterations <= iterations + 1;
end process;
You are iterating over a signal which does not appear in the sensitivity list of the process. This might be OK for the simulator, which executes the process blocks just like software. The synthesis result, on the other hand, is totally unpredictable. But adding iterations to the sensitivity list is not enough. You would just end up with an asynchronous design. Your target platform is a clocked device. State changes may only occur at the trigger edge of the clock.
You need to tell the synthesizer how to map the iterations required to perform this calculation over the clock cycles. The safest way to do that is to break down the behavioural code into RTL code (
