how to initialize array of STD_LOGIC_VECTOR(15 downto 0) with data stored in BRAM - vhdl

I have some Filter coefficients in BRAM those coefficients need to be written into an array to perform convolution. I have created an array using type and assigned it to a signal. That signal I have port mapped to DATA_OUT of BRAM. it's giving an error "expecting STD_LOGIC_VECTOR"
I've tried writing the data in an array with for loop. It results in an error "indices is not a STD_LOGIC_VECTOR"
my type declaration
TYPE coeff_pipe IS ARRAY(0 TO 15) OF std_logic_vector(7 downto 0);
Signal coeff:coeff_pipe;
my for loop is like this
for i in 0 to loop
coeff(i) <= data_out_BRAM(i); end loop;
Help me with suitable changes in my code to make it work

You mix the for loop behavior in a coding language (C) and in a hardware descritpion language (VHDL). In a coding language, if you write a for loop, processor will execute the content of the loop several times in row sequentially (one after one). In an HDL, for loop is used to instanciate several times the same circuit with different inputs/outputs. There is no time notion in a for loop.
In your case, you have to use a sequential process and increment your BRAM address :
process(clk, rst)
begin
if rst = '1' then
addr_BRAM <= (others => '0');
addr_BRAM_d <= (others => '0');
ram_init_en <= '1';
ram_init_en_d <= '0';
coeff <= (others => (others => '0'));
elsif rising_edge(clk) then
addr_BRAM_d <= addr_BRAM ; -- Delay of 1 clk cycle
ram_init_en_d <= ram_init_en; -- Delay of 1 clk cycle
-- Init done
if addr_BRAM = x"1111" then
ram_init_en <= '0';
end if;
-- Increment BRAM address
if ram_init_en = '1' then
addr_BRAM <= std_logic_vector(unsigned(addr_BRAM) + 1);
end if;
-- Get data one cycle after set address because a BRAM doesn't answer instant, it answers in one clk cycle.
if ram_init_en_d = '1' then
coeff(to_integer(unsigned(addr_BRAM_d))) <= data_out_BRAM;
end if;
end if;
end process;

Related

How to remove redundant processes in VHDL

I am unfortunately new to VHDL but not new to software development. What is the equivalency to functions in VHDL? Specifically, in the code below I need to debounce four push buttons instead of one. Obviously repeating my process code four times and suffixing each of my signals with a number to make them unique for the four instances is not the professional nor correct way of doing this. How do I collapse all this down into one process "function" to which I can "pass" the signals so I can excise all this duplicate code?
----------------------------------------------------------------------------------
-- Debounced pushbutton examples
----------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pushbutton is
generic(
counter_size : integer := 19 -- counter size (19 bits gives 10.5ms with 50MHz clock)
);
port(
CLK : in std_logic; -- input clock
BTN : in std_logic_vector(0 to 3); -- input buttons
AN : out std_logic_vector(0 to 3); -- 7-segment digit anodes ports
LED : out std_logic_vector(0 to 3) -- LEDs
);
end pushbutton;
architecture pb of pushbutton is
signal flipflops0 : std_logic_vector(1 downto 0); -- input flip flops
signal flipflops1 : std_logic_vector(1 downto 0);
signal flipflops2 : std_logic_vector(1 downto 0);
signal flipflops3 : std_logic_vector(1 downto 0);
signal counter_set0 : std_logic; -- sync reset to zero
signal counter_set1 : std_logic;
signal counter_set2 : std_logic;
signal counter_set3 : std_logic;
signal counter_out0 : std_logic_vector(counter_size downto 0) := (others => '0'); -- counter output
signal counter_out1 : std_logic_vector(counter_size downto 0) := (others => '0');
signal counter_out2 : std_logic_vector(counter_size downto 0) := (others => '0');
signal counter_out3 : std_logic_vector(counter_size downto 0) := (others => '0');
signal button0 : std_logic; -- debounce input
signal button1 : std_logic;
signal button2 : std_logic;
signal button3 : std_logic;
signal result0 : std_logic; -- debounced signal
signal result1 : std_logic;
signal result2 : std_logic;
signal result3 : std_logic;
begin
-- Make sure Mercury BaseBoard 7-Seg Display is disabled (anodes are pulled high)
AN <= (others => '1');
-- Feed buttons into debouncers
button0 <= BTN(0);
button1 <= BTN(1);
button2 <= BTN(2);
button3 <= BTN(3);
-- Start or reset the counter at the right time
counter_set0 <= flipflops0(0) xor flipflops0(1);
counter_set1 <= flipflops1(0) xor flipflops1(1);
counter_set2 <= flipflops2(0) xor flipflops2(1);
counter_set3 <= flipflops3(0) xor flipflops3(1);
-- Feed LEDs from the debounce circuitry
LED(0) <= result0;
LED(1) <= result1;
LED(2) <= result2;
LED(3) <= result3;
-- Debounce circuit 0
process (CLK)
begin
if (CLK'EVENT and CLK = '1') then
flipflops0(0) <= button0;
flipflops0(1) <= flipflops0(0);
if (counter_set0 = '1') then -- reset counter because input is changing
counter_out0 <= (others => '0');
elsif (counter_out0(counter_size) = '0') then -- stable input time is not yet met
counter_out0 <= counter_out0 + 1;
else -- stable input time is met
result0 <= flipflops0(1);
end if;
end if;
end process;
-- Debounce circuit 1
process (CLK)
begin
if (CLK'EVENT and CLK = '1') then
flipflops1(0) <= button1;
flipflops1(1) <= flipflops1(0);
if (counter_set1 = '1') then -- reset counter because input is changing
counter_out1 <= (others => '0');
elsif (counter_out1(counter_size) = '0') then -- stable input time is not yet met
counter_out1 <= counter_out1 + 1;
else -- stable input time is met
result1 <= flipflops1(1);
end if;
end if;
end process;
-- Debounce circuit 2
process (CLK)
begin
if (CLK'EVENT and CLK = '1') then
flipflops2(0) <= button2;
flipflops2(1) <= flipflops2(0);
if (counter_set2 = '1') then -- reset counter because input is changing
counter_out2 <= (others => '0');
elsif (counter_out2(counter_size) = '0') then -- stable input time is not yet met
counter_out2 <= counter_out2 + 1;
else -- stable input time is met
result2 <= flipflops2(1);
end if;
end if;
end process;
-- Debounce circuit 3
process (CLK)
begin
if (CLK'EVENT and CLK = '1') then
flipflops3(0) <= button3;
flipflops3(1) <= flipflops3(0);
if (counter_set3 = '1') then -- reset counter because input is changing
counter_out3 <= (others => '0');
elsif (counter_out3(counter_size) = '0') then -- stable input time is not yet met
counter_out3 <= counter_out3 + 1;
else -- stable input time is met
result3 <= flipflops3(1);
end if;
end if;
end process;
end pb;
VHDL has functions but function calls are expressions and not statements or expression statements as in some programming languages. A function call always returns a value of a type and an expression can't represent a portion of a design hierarchy.
Consider the other subprogram class procedures which are statements instead.
The debouncer processes and associated declarations can also be simplified without using a procedure:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity pushbutton is
generic (
counter_size: integer := 19 -- The left bound of debounce counters
);
port (
clk: in std_logic;
btn: in std_logic_vector(0 to 3);
an: out std_logic_vector(0 to 3);
led: out std_logic_vector(0 to 3)
);
end entity pushbutton;
architecture pb1 of pushbutton is
-- There are two flip flops for each of four buttons:
subtype buttons is std_logic_vector(0 to 3);
type flip_flops is array (0 to 1) of buttons;
signal flipflops: flip_flops;
signal counter_set: std_logic_vector(0 to 3);
use ieee.numeric_std.all;
type counter is array (0 to 3) of
unsigned(counter_size downto 0);
signal counter_out: counter := (others => (others => '0'));
begin
an <= (others => '1');
counter_set <= flipflops(0) xor flipflops(1);
DEBOUNCE:
process (clk)
begin
if rising_edge (clk) then
flipflops(0) <= btn;
flipflops(1) <= flipflops(0);
for i in 0 to 3 loop
if counter_set(i) = '1' then
counter_out(i) <= (others => '0');
elsif counter_out(i)(counter_size) = '0' then
counter_out(i) <= counter_out(i) + 1;
else
led(i) <= flipflops(1)(i);
end if;
end loop;
end if;
end process;
end architecture pb1;
Moving part of the design specification into a procedure:
architecture pb2 of pushbutton is
-- There are two flip flops for each of four buttons:
subtype buttons is std_logic_vector(0 to 3);
type flip_flops is array (0 to 1) of buttons;
signal flipflops: flip_flops;
signal counter_set: std_logic_vector(0 to 3);
use ieee.numeric_std.all;
type counter is array (0 to 3) of
unsigned(counter_size downto 0);
signal counter_out: counter := (others => (others => '0'));
procedure debounce (
-- Can eliminate formals of mode IN within the scope of their declaration:
-- signal counter_set: in std_logic_vector (0 to 3);
-- signal flipflops: in flip_flops;
signal counter_out: inout counter;
signal led: out std_logic_vector(0 to 3)
) is
begin
for i in 0 to 3 loop
if counter_set(i) = '1' then
counter_out(i) <= (others => '0');
elsif counter_out(i)(counter_size) = '0' then
counter_out(i) <= counter_out(i) + 1;
else
led(i) <= flipflops(1)(i);
end if;
end loop;
end procedure;
begin
an <= (others => '1');
counter_set <= flipflops(0) xor flipflops(1);
DEBOUNCER:
process (clk)
begin
if rising_edge (clk) then
flipflops(0) <= btn;
flipflops(1) <= flipflops(0);
-- debounce(counter_set, flipflops, counter_out, led);
debounce (counter_out, led);
end if;
end process;
end architecture pb2;
Here the procedure serves as a collection of sequential statements and doesn't save any lines of code.
Sequential procedure calls can be useful to hide repetitious clutter. The clutter has been removed already by consolidating declarations and using the loop statement. There's a balancing act between the design entry effort, code maintenance effort and user readability, which can also be affected by coding style. Coding style is also affected by RTL constructs implying hardware.
Moving the clock evaluation into a procedure would require the procedure call be be a concurrent statement, similar to an instantiation, which you already have. It doesn't seem worthwhile here should you consolidate signals declared as block declarative items in the architecture body or when using a loop statement.
Note that result and button declarations have been eliminated. Also the use of package numeric_std and type unsigned for the counters prevents inadvertent assignment to other objects with the same subtype. The counter values are treated as unsigned numbers while counter_set for instance is not.
Also there's an independent counter for each input being debounced just as in the original. Without independent counters some events might be lost for independent inputs when a single counter is repetitively cleared.
This code hasn't been validated by simulation, lacking a testbench. With the entity both architectures analyze and elaborate.
There doesn't appear to be anything here other than sequential statements now found in a for loop that would benefit from a function call. Because a function call returns a value the type of that value would either need to be a composite (here a record type) or be split into separate function calls for each assignment target.
There's also the generate statement which can elaborate zero or more copies of declarations and concurrent statements (here a process) as block statements with block declarative items. Any signal used only in an elaborated block can be a block declarative item.
architecture pb3 of pushbutton is
begin
DEBOUNCERS:
for i in btn'range generate
signal flipflops: std_logic_vector (0 to 1);
signal counter_set: std_logic;
signal counter_out: unsigned (counter_size downto 0) :=
(others => '0');
begin
counter_set <= flipflops(0) xor flipflops(1);
DEBOUNCE:
process (clk)
begin
if rising_edge (clk) then
flipflops(0) <= btn(i);
flipflops(1) <= flipflops(0);
if counter_set = '1' then
counter_out <= (others => '0');
elsif counter_out(counter_size) = '0' then
counter_out <= counter_out + 1;
else
led(i) <= flipflops(1);
end if;
end if;
end process;
end generate;
end architecture pb3;
Addendum
The OP pointed out an error made in the above code due to a lack of simulation and complexity hidden by abstraction when synthesizing architecture pb2. While the time for the debounce counter was given at 10.5 ms (50 MHz clock) the name of the generic (counter_size) is also actually the left bound of the counter (given as an unsigned binary counter using type unsigned).
The mistake (two flip flops in the synchronizer for each of four buttons) and simply acceding to the OP's naming convention with respect to the counter has been corrected in the above code.
The OP's synthesis error in the comment relates to the requirement there be a matching element for each element on the left hand or right hand of an aassignment statement.
Without synthesizing the code (which the OP did) the error can't be found without simulation. Because the only thing necessary to find the particular error assigning flipflops(0) is the clock a simple testbench can be written:
use ieee.std_logic_1164.all;
entity pushbutton_tb is
end entity;
architecture fum of pushbutton_tb is
signal clk: std_logic := '0';
signal btn: std_logic_vector (0 to 3);
signal an: std_logic_vector(0 to 3);
signal led: std_logic_vector(0 to 3);
begin
CLOCK:
process
begin
wait for 0.5 ms;
clk <= not clk;
if now > 50 ms then
wait;
end if;
end process;
DUT:
entity work.pushbutton (pb2)
generic map (
counter_size => 4 -- FOR SIMULATION
)
port map (
clk => clk,
btn => btn,
an => an,
led => led
);
STIMULUS:
process
begin
btn <= (others => '0');
wait for 20 ms;
btn(0) <= '1';
wait for 2 ms;
btn(1) <= '1';
wait for 3 ms;
btn(2) <= '1';
wait for 6 ms;
btn(3) <= '1';
wait;
end process;
end architecture;
The corrected code and a testbench to demonstrate there are no matching element errors in assignment during simulation.
Simulation was provided for both architectures with identical results.
The generic was used to reduce the size of the debounce counters using a 1 millisecond clock in the testbench (to avoid simulation time with 50 MHz clock events that don't add to the narrative).
Here's the output of the first architecture's simulation:
The caution here is that designs should be simulated. There's a class of VHDL semantic error conditions that are checked only at runtime (or in synthesis).
Added abstraction for reducing 'uniquified' code otherwise identically performing can introduce such errors.
The generate statement wouldn't have that issue using names in a design hierarchy:
The concurrent statements and declarations found in a generate statement are replicated in any generated block statements implied by the generate statement. Each block statement represents a portion of a design hierarchy.
There's been a trade off between design complexity and waveform display organization for debugging.
A design description depending on hiding repetitious detail should be simulated anyway. Here there are two references to the generate parameter i used in selected names, susceptible to the same errors as ranges should parameter substitution be overlooked.
A multiple bit debouncing circuit might look like this:
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use work.Utilities.all;
entity Debouncer is
generic (
CLOCK_PERIOD_NS : positive := 10;
DEBOUNCE_TIME_MS : positive := 3;
BITS : positive
);
port (
Clock : in std_logic;
Input : in std_logic_vector(BITS - 1 downto 0);
Output : out std_logic_vector(BITS - 1 downto 0) := (others => '0')
);
end entity;
architecture rtl of Debouncer is
begin
genBits: for i in Input'range generate
constant DEBOUNCE_COUNTER_MAX : positive := (DEBOUNCE_TIME_MS * 1000000) / CLOCK_PERIOD_NS;
constant DEBOUNCE_COUNTER_BITS : positive := log2(DEBOUNCE_COUNTER_MAX);
signal DebounceCounter : signed(DEBOUNCE_COUNTER_BITS downto 0) := to_signed(DEBOUNCE_COUNTER_MAX - 3, DEBOUNCE_COUNTER_BITS + 1);
begin
process (Clock)
begin
if rising_edge(Clock) then
-- restart counter, whenever Input(i) was unstable within DEBOUNCE_TIME_MS
if (Input(i) /= Output(i)) then
DebounceCounter <= DebounceCounter - 1;
else
DebounceCounter <= to_signed(DEBOUNCE_COUNTER_MAX - 3, DebounceCounter'length);
end if;
-- latch input bit, if input was stable for DEBOUNCE_TIME_MS
if (DebounceCounter(DebounceCounter'high) = '1') then
Output(i) <= Input(i);
end if;
end if;
end process;
end generate;
end architecture;
In stead of a counter size, it expects the user to provide a frequency (as period in nanoseconds) and a debounce time (in milliseconds).
The referenced package implements a log2 function.

Error(10820) and (10822) VHDL

i am trying to write a code but i get error, i dont understand that, i am new to vhdl, any help would be appreciated.
code:
entity counter is
port
(
upp_down : in std_logic;
rst : in std_logic;
pressed : in std_logic;
count : out std_logic_vector(3 downto 0)
);
end entity;
architecture rtl of counter is
signal count_value: std_logic_vector(3 downto 0);
begin
process (rst,pressed,upp_down)
begin
if(rst'event and rst = '0') then
count <= "0000";
else
if(pressed'event and pressed = '0' ) then
if(upp_down = '1') then
count_value <= count_value + 1;
elsif(upp_down = '0') then
count_value <= count_value - 1;
end if;
end if;
end if;
end process;
count <= count_value;
end rtl;
Errors:
Error (10820): Netlist error at counter.vhd(28): can't infer register for count_value[1] because its behavior depends on the edges of multiple distinct clocks
Error (10822): HDL error at counter.vhd(28): couldn't implement registers for assignments on this clock edge
The first problem is that you're trying to use the edge of two different 'clocks' in one process. A particular process can only respond to one clock.
The second problem is that your code does not translate into any real-world hardware. There's nothing in the FPGA that can respond to there not being an edge of a clock, which is what you have described with your if(rst'event and rst = '0') then else structure.
Nicolas pointed out another problem (which your compiler didn't get as far as), which is that you're assigning count both inside and outside a process; this is not allowed, as signals can only be assigned in one process.
Generally the type of reset it looks like you're trying to implement would be written as in the example below:
process (rst,pressed,upp_down)
begin
if(rst = '0') then
count_value <= "0000";
elsif(pressed'event and pressed = '0' ) then
if(upp_down = '1') then
count_value <= count_value + 1;
elsif(upp_down = '0') then
count_value <= count_value - 1;
end if;
end if;
end process;
count <= count_value;
The reason for changing the reset to affect count_value, is that without this, the effect of your reset would only last one clock cycle, after which the count would resume from where it left off (Thanks #Jim Lewis for this suggestion).
In addition to your compile errors, you should try to use the rising_edge() or falling_edge() functions for edge detection, as they behave better than the 'event style.
The reset can be more easily implemented using count_value <= (others => '0'); this makes all elements '0', no matter how long count is.
Lastly, it looks like you're using the std_logic_arith package. There are many other answers discouraging the use of this package. Instead, you should use the numeric_std package, and have your counter of type unsigned. If your output must be of type std_logic_vector, you can convert to this using a cast: count <= std_logic_vector(count_value);.
One more thing, I just noticed that your counter is not initialised; this can be done in the same way as I suggested for the reset function, using the others syntax.
"count" can't be assigned inside and outside a process.
count <= "0000"; <-- inside process
count <= count_value; <-- outside process.
You should do "count <= count_value;" inside your process :
entity counter is
port
(
upp_down : in std_logic;
rst : in std_logic;
pressed : in std_logic;
count : out std_logic_vector(3 downto 0)
);
end entity;
architecture rtl of counter is
signal count_value: std_logic_vector(3 downto 0);
begin
process (rst,pressed,upp_down)
begin
if(rst'event and rst = '0') then
count <= "0000";
else
if(pressed'event and pressed = '0' ) then
if(upp_down = '1') then
count_value <= count_value + 1;
elsif(upp_down = '0') then
count_value <= count_value - 1;
end if;
count <= count_value;
end if;
end if;
end process;
end rtl;

Unexpected delays with register VHDL

Found a strange occurance with this register I coded. I'm very new to VHDL, but I was taught when writing a value to output ports like data_out you should always use a "middleman" signal to transfer your value. Here I tried to use the signal "data" to transfer the signal, but this implementation resulted in a delay before data_out changes (when ld goes high). Taking out data completely and coding how I would in a C program removes this delay and the register works perfectly. Any idea on why this is and why I shouldn't do this?
Broken code:
entity register is
generic (
DATA_WIDTH : natural := 12);
port (
data_in : in std_logic_vector(DATA_WIDTH-1 downto 0);
ld : in std_logic;
clk : in std_logic;
rst_L : in std_logic;
data_out : out std_logic_vector(DATA_WIDTH-1 downto 0));
end entity register;
architecture bhv of register is
signal data : std_logic_vector(DATA_WIDTH-1 downto 0);
begin -- bhv
REG : process (clk, rst_L, ld, data_in)
begin -- process REG
if rst_L = '0' then
data <= (others => '0');
else
if clk'event and clk = '1' then
if ld = '1' then
data <= data_in;
end if;
end if;
end if;
data_out <= data;
end process REG;
end architecture bhv;
Changes for process that made it work:
REG : process (clk, rst_L, ld, data_in)
begin -- process REG
if rst_L = '0' then
data <= (others => '0');
else
if clk'event and clk = '1' then
if ld = '1' then
data_out <= data_in;
end if;
end if;
end if;
end process REG;
Just wanted to know what I did wrong and why I even have to use a signal to transfer the value if the code works fine. Thanks!
The problem in the broken process code is that data signal is not updated for
read until after a delta delay, thus the data_out update by
data_out <= data assign is postponed until next execution of the process code, thereby giving a delay in simulation.
Note that the ld and data_in in the sensitivity list of the initial process are not required, since use of these are guarded by rising clk.
Update of the code can be:
reg : process (clk, rst_L)
begin -- process REG
if rst_L = '0' then
data <= (others => '0');
else
if clk'event and clk = '1' then -- May be written as rising_edge(clk) instead
if ld = '1' then
data <= data_in;
end if;
end if;
end if;
end process REG;
data_out <= data;
It may be useful to take a look at
VHDL's crown jewel for some information about processes and delta cycles in VHDL.
Note that register is a VHDL reserved word, so it can't be used as identifier
for the entity.
Unexpected delays with register VHDL
The problem with the original process is that data is not in the sensitivity list, delaying the assignment to data_out until the next time the process executes instead of the next simulation data cycle. The process will execute on any transaction of a signal in the sensitivity list - originally as you indicated ld.
Adding data to the sensitivity list of your original process:
REG : process (clk, rst_L, data)
begin -- process REG
if rst_L = '0' then
data <= (others => '0');
else
if clk'event and clk = '1' then
if ld = '1' then
data <= data_in;
end if;
end if;
end if;
data_out <= data;
end process REG;
Will allow the assignment to data_out to occur in a delta simulation cycle instead of waiting until a transition on a signal currently in the sensitivity list.
I removed the extraneous signals from the sensitivity list for the same reason Morten did in his answer. These were causing execution of the process at other times, but not when a transaction occurred on data.
And yes, as long as you don't have the data_out signal in an expression (e.g. on the right hand side of an assignment statement) in your architecture you don't need the intermediary variable data and your simulation will go a bit faster.
Your changed process is the most efficient implementation once the sensitivity list is pared down.
Simply adding data to the sensitivity list would have caused the original process to simulate correctly, feel free to try it.

VHDL program to count upto 10 in 4 bit up counter....?

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_signed.all;
entity counter is
port(CLK, CLR : in std_logic;
output : inout std_logic_vector(3 downto 0));
end counter;
architecture archi of counter is
signal tmp: std_logic_vector(3 downto 0);
begin
process (CLK, CLR)
variable i: integer:=0;
begin
if (CLR='1') then
tmp <= "0000";
elsif (clk = '1') then
for i in 0 to 6 loop
tmp <= tmp + 1;
end loop;
end if;
to count upto 7 i have done for i in 0 to 10. it is not showing any error but it counts from 0000 to 1111
end process;
output <= tmp;
end architecture;
could you please suggest how to do it....sorry for wrong grammar in english
Needs to operate off one clock edge
Because your counter port has clk in it, we can assume you want the counter to count synchronous to the clock.
You're operating off of both clock edges
elsif (clk = '1') then
should be something like
elsif clk'event and clk = '1' then
or
elsif rising_edge(clk) then
These examples use the rising edge of clk. You can't synthesize something that uses both clock edges under the IEEE-1076.6 IEEE Standard for VHDL Register
Transfer Level (RTL) Synthesis. It's not a recognized clocking method.
Making a modulo 10 counter
Under the assumption you want the counter to go from 0 to 9 and rollover this
for i in 0 to 6 loop
tmp <= tmp + 1;
end loop;
Should be something like
if tmp = "1001" then # binary 9
tmp <= (others => '0'); # equivalent to "0000"
else
tmp <= tmp + 1;
end if;
And this emulates a synchronous load that takes priority over increment driven by an external 'state' recognizer. With an asynchronous clear it would emulate an 74163 4 bit counter with an external 4 input gate recognizing "1001" and producing a synchronous parallel load signal loading "0000".
What's wrong with the loop statement
The loop process as shown would result in a single increment and resulting counter rollover at "1111" like you describe. You could remove the for ... loop and end loop; statements and it would behave identically. There's only one schedule future update for a signal for each driver, and a process only has one driver for each signal it assigns. All the loop iterations occur at the same clk event. tmp won't get updated until the next simulation cycle (after the loop is completed) and it's assignment is identical in all loop iterations, the expression tmp + 1. The last loop iterated assignment would be the one that actually occurs and the value it assigns would be identical.
Using a loop statement isn't necessary when counter is state driven (state ≃ tmp). The additional state represented by i isn't needed.
entity mod10 is
Port ( d : out std_logic_vector(3 downto 0);
clr: in std_logic;
clk : in std_logic);
end mod10;
architecture Behavioral of mod10 is
begin
process(clk)
variable temp:std_logic_vector(3 downto 0);
begin
if(clr='1') then temp:="0000";
elsif(rising_edge(clk)) then
temp:=temp+1;
if(temp="1010") then temp:="0000";
end if;
end if;
d<=temp;
end process;
end Behavioral;

resetting values in a VHDL register and stop writing further

I have simple register and getting single bit values from 5 state machines (all at one time). These values are stored in a register as std_logic_vector and has to be given as an input to another module. Once the output of this register is being processed in another module, the index in the register where there was a change (e,g 0 to 1), the value at that index should reset (e,g 1 to 0) and it should take no further input for that particular index (but there is constant input coming from state machines). Any suggestion, how it should be done?
The register code is:
entity fault_reg is
port (
clk : in std_logic;
rst : in std_logic;
reg_in : in std_logic_vector(NUM_PORTS - 1 downto 0);
reg_out : out std_logic_vector(NUM_PORTS - 1 downto 0));
end fault_reg;
architecture Behavioral of fault_reg is
begin
reg_impl : process(clk, rst)
begin
if rst = '1' then
reg_out <= (others => '0');
elsif clk'event and clk='1' then
reg_out <= reg_in;
end if;
end process reg_impl;
end Behavioral;
I'm not entirely sure what you are asking, but it seems to me you want something like:
initialise your reg_out to all ones
then in the clocked process do a for loop to iterate over all the input bits and clear the bits which are set in the input
Like this:
reg_out <= reg_in;
for i in reg_in'range loop
if reg_in(i) = '1' then
masked_bits(i) := '1';
end if;
if masked_bits(i) = '1' then
reg_out(i) <= '0';
end if;
end loop;

Resources