Making a 4-bit ALU from several 1-bit ALUs - vhdl

I'm trying to combine several 1-bit ALUs into a 4-bit ALU. I am confused about how to actually do this in VHDL. Here is the code for the 1-bit ALU that I am using:
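library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all; -- arithmetic on unsigned instead of raw std_logic_vector
entity alu1 is -- define the 1 bit alu entity
port(a, b: in std_logic_vector(1 downto 0);
m: in std_logic_vector(1 downto 0);
result: out std_logic_vector(1 downto 0));
end alu1;
architecture behv1 of alu1 is
begin
process(a, b, m)
begin
case m is
when "00" =>
result <= std_logic_vector(unsigned(a) + unsigned(b));
when "01" =>
result <= std_logic_vector(unsigned(a) + unsigned(not b) + 1);
when "10" =>
result <= a and b;
when "11" =>
result <= a or b;
when others => -- required: m can also hold metavalues such as "XX"
result <= (others => 'X');
end case;
end process;
end behv1;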
I am assuming I define alu1 as a component of the larger entity alu4, but how can I tie them together?

It's interesting that you would even ask that question. VHDL synthesizers are quite capable of inferring any adder you like; you can just write what you need:
use ieee.numeric_std.all;
...
signal r : unsigned(3 downto 0);
signal a : unsigned(2 downto 0);
signal b : unsigned(2 downto 0);
signal c : unsigned(2 downto 0);
...
r <= resize(a, 4) + b + c; -- widen first: a + b + c on 3-bit operands is only 3 bits wide
Then you can slice r to fit your needs:
result <= std_logic_vector(r(2 downto 0));
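Applied to the question, the same approach might look like the following minimal sketch. It assumes the mode encoding of the question's alu1 ("00" add, "01" subtract, "10" AND, "11" OR); the entity name alu4_inferred is hypothetical. Widening to 5 bits before the arithmetic keeps the carry out, so no per-bit carry wiring is needed:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 4-bit ALU: the adder/subtractor is inferred by synthesis.
entity alu4_inferred is
  port (
    a, b   : in  std_logic_vector(3 downto 0);
    m      : in  std_logic_vector(1 downto 0);  -- "00" add, "01" sub, "10" and, "11" or
    result : out std_logic_vector(3 downto 0);
    carry  : out std_logic                      -- only meaningful for add/sub
  );
end entity alu4_inferred;

architecture rtl of alu4_inferred is
begin
  process (a, b, m)
    variable r : unsigned(4 downto 0);  -- one extra bit holds the carry out
  begin
    case m is
      when "00"   => r := resize(unsigned(a), 5) + unsigned(b);
      when "01"   => r := resize(unsigned(a), 5) + unsigned(not b) + 1;
      when "10"   => r := '0' & unsigned(a and b);
      when others => r := '0' & unsigned(a or b);
    end case;
    result <= std_logic_vector(r(3 downto 0));
    carry  <= r(4);
  end process;
end architecture rtl;
```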

You can't (easily) string together these 1-bit ALUs into a functional multi-bit version. There is no way to handle the carry in/out needed for your add and subtract modes to work properly (the bitwise AND and OR modes should work OK, however).
Ignoring the carry issue for the moment, you would typically just set up a for ... generate loop and instantiate multiple copies of your bitwise logic, possibly special-casing the first and/or last elements, i.e.:
MyLabel : for bitindex in 0 to 3 generate
begin
alu_x4 : entity work.alu1
port map (
a => input_a(bitindex),
b => input_b(bitindex),
m => mode,
result => result_x4(bitindex) );
end generate;
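If you do want a structural version, the carry has to become an explicit port of each slice and be chained between the instances. A minimal sketch, assuming a redefined 1-bit slice with single-bit a/b inputs plus cin/cout ports (the names alu_slice and alu4 are hypothetical; this is not the question's alu1). The first carry-in is '1' in subtract mode, which supplies the "+ 1" of a + (not b) + 1:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 1-bit slice with explicit carry in/out.
entity alu_slice is
  port (
    a, b : in  std_logic;
    m    : in  std_logic_vector(1 downto 0);  -- "00" add, "01" sub, "10" and, "11" or
    cin  : in  std_logic;
    r    : out std_logic;
    cout : out std_logic                      -- don't care in the logic modes
  );
end entity alu_slice;

architecture rtl of alu_slice is
  signal bb : std_logic;  -- b, inverted when subtracting
begin
  bb <= not b when m = "01" else b;

  with m select r <=
    a xor bb xor cin when "00" | "01",  -- full-adder sum
    a and b          when "10",
    a or b           when others;

  cout <= (a and bb) or (a and cin) or (bb and cin);  -- full-adder carry
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 4-bit ALU: four slices, the carry ripples through c.
entity alu4 is
  port (
    a, b   : in  std_logic_vector(3 downto 0);
    m      : in  std_logic_vector(1 downto 0);
    result : out std_logic_vector(3 downto 0);
    carry  : out std_logic
  );
end entity alu4;

architecture structural of alu4 is
  signal c : std_logic_vector(4 downto 0);  -- c(i) is the carry into bit i
begin
  c(0) <= '1' when m = "01" else '0';

  slices : for i in 0 to 3 generate
    slice_i : entity work.alu_slice
      port map (a => a(i), b => b(i), m => m, cin => c(i),
                r => result(i), cout => c(i + 1));
  end generate slices;

  carry <= c(4);
end architecture structural;
```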

Related

VHDL: big slv array slicing indexed by integer (big mux)

I want to slice a std_logic_vector in VHDL into parts of a fixed size.
The general problem is:
din: N*M bits
dout: M bits
sel: clog2(N) bits
Expected behaviour in an example (pseudocode): a 16-bit input that I want to slice into 4 subvectors of 4 bits each (N = 4, M = 4).
signal din: std_logic_vector(N*M-1 downto 0);
signal sel: integer;
-- with sel = 0
dout <= din(M-1:0);
-- with sel = 1
dout <= din(2M-1:M);
-- with sel = 2
dout <= din(3M-1:2M);
.....
-- with sel = N-1
dout <= din(N*M-1:(N-1)M);
I know a couple of ways to do this, but I don't know which one is best practice and gives the best results in synthesis.
The entity:
din: in std_logic_vector(15 downto 0);
dout: out std_logic_vector(3 downto 0);
sel: in std_logic_vector(1 downto 0)
CASE STATEMENT
case sel is
when "00" => dout <= din(3 downto 0);
when "01" => dout <= din(7 downto 4);
when "10" => dout <= din(11 downto 8);
when "11" => dout <= din(15 downto 12);
when others => ....
end case;
It clearly implements a mux, but it's not generic at all, and if the input gets big it's really hard to write and to get full code coverage on.
INTEGER INDEXING
sel_int <= to_integer(unsigned(sel));
dout <= din(4*(sel_int+1) - 1 downto 4*sel_int);
Extremely easy to write and maintain, BUT it can have problems when the number of slices is not a power of 2. For example, if I want to slice a 24-bit vector into chunks of 4 bits (6 chunks, so sel needs 3 bits), what happens when the integer conversion of sel gives index 7?
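A minimal sketch of a middle ground, assuming N slices of M bits and a SEL_W-bit select (the entity slice_mux and its generics are hypothetical): the loop compares the select against every slice number and only ever uses static slices, so a select value with no matching slice (6 or 7 for a 24-bit input) falls through to the all-zeros default:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity slice_mux is
  generic (
    N     : positive := 6;  -- number of slices
    M     : positive := 4;  -- slice width
    SEL_W : positive := 3   -- select width, ceil(log2(N))
  );
  port (
    din  : in  std_logic_vector(N * M - 1 downto 0);
    sel  : in  std_logic_vector(SEL_W - 1 downto 0);
    dout : out std_logic_vector(M - 1 downto 0)
  );
end entity slice_mux;

architecture rtl of slice_mux is
begin
  process (din, sel)
    variable idx : natural;
  begin
    idx  := to_integer(unsigned(sel));
    dout <= (others => '0');  -- default for selects with no matching slice
    for i in 0 to N - 1 loop
      if idx = i then
        dout <= din((i + 1) * M - 1 downto i * M);  -- static slice per iteration
      end if;
    end loop;
  end process;
end architecture rtl;
```

Like the case statement, this has an explicit default, but it stays generic in N and M.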
A STRANGE TRADEOFF
sel_int <= to_integer(unsigned(sel));
gen_slices: for i in 0 to 3 generate
din_slice(i) <= din(4*(i+1)-1 downto 4*i);
end generate;
dout <= din_slice(sel_int);
I'm looking for a solution that is general enough to be used with various input/output relationships and safe enough to be synthesized consistently every time.
The case statement is the only one with an others branch (which feels really safe); the other solutions rely on slv-to-integer conversion and indexing, which feels really convenient but not so reliable.
Which solution would you use?
Practical use case
I have a 250-bit std_logic_vector and I need to select 10 contiguous bits inside it, starting from a position between 0 and 239. How can I do that in a way that is good for synthesis?
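One option for this case is to shift the vector right by the start position and keep the low bits; synthesis tools typically turn this into a barrel shifter, i.e. a tree of muxes. A minimal sketch, assuming the start position is available as a natural (window_select, DATA_W, SLICE_W and start are hypothetical names); the range constraint on start keeps every window inside the 250 bits:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity window_select is
  generic (
    DATA_W  : positive := 250;
    SLICE_W : positive := 10
  );
  port (
    din   : in  std_logic_vector(DATA_W - 1 downto 0);
    start : in  natural range 0 to DATA_W - SLICE_W;  -- lowest bit of the window
    dout  : out std_logic_vector(SLICE_W - 1 downto 0)
  );
end entity window_select;

architecture rtl of window_select is
begin
  process (din, start)
    variable shifted : unsigned(DATA_W - 1 downto 0);
  begin
    -- Move bit 'start' down to position 0, then keep the low SLICE_W bits.
    shifted := shift_right(unsigned(din), start);
    dout    <= std_logic_vector(shifted(SLICE_W - 1 downto 0));
  end process;
end architecture rtl;
```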
There is another option that is accepted by tools that support VHDL-2008 (which includes Vivado and Quartus Prime Pro): you can use an unconstrained 2D type from a package:
type slv_array_t is array(natural range <>) of std_logic_vector; --vhdl 2008 unconstrained array type
Then you can simply select the input you want, and it is as generic as you like.
library ieee;
use ieee.std_logic_1164.all;
use work.my_pkg.all;
entity mux is
generic (
N : natural;
M : natural
);
port (
sel : in natural;
ip : in slv_array_t (N-1 downto 0)(M-1 downto 0);
op : out std_logic_vector (M-1 downto 0) -- no semicolon after the last port
);
end entity;
architecture rtl of mux is
begin
op <= ip(sel);
end architecture;
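The answer does not show my_pkg itself. A minimal sketch of what it might contain, assuming VHDL-2008: besides slv_array_t, it includes a hypothetical helper to_slv_array that cuts a flat std_logic_vector (such as the question's din) into n words of m bits, word 0 being the least significant, so the result can drive the mux's ip port:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package my_pkg is
  -- VHDL-2008: both the index range and the element width are left open.
  type slv_array_t is array (natural range <>) of std_logic_vector;

  -- Hypothetical helper: split a flat vector into n words of m bits (word 0 = LSBs).
  -- Assumes flat'length >= n * m.
  function to_slv_array(flat : std_logic_vector; n, m : positive) return slv_array_t;
end package my_pkg;

package body my_pkg is
  function to_slv_array(flat : std_logic_vector; n, m : positive) return slv_array_t is
    -- Re-index the argument as (length-1 downto 0), whatever its original range.
    alias f : std_logic_vector(flat'length - 1 downto 0) is flat;
    variable result : slv_array_t(n - 1 downto 0)(m - 1 downto 0);
  begin
    for i in 0 to n - 1 loop
      result(i) := f((i + 1) * m - 1 downto i * m);
    end loop;
    return result;
  end function to_slv_array;
end package body my_pkg;
```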
First, extend the incoming data so that there are always enough bits to connect all multiplexer inputs (see process p_extend in the code below).
This does not create any logic in synthesis.
Second, convert the resulting vector into an array, which you can later access by an index (see process p_create_array in the code below).
Again, this does not create any logic in synthesis.
Finally, access this array with the select input signal (see p_mux in the code below).
library ieee;
use ieee.std_logic_1164.all;
entity mux is
generic (
g_data_width : natural := 250;
g_slice_width : natural := 10;
g_sel_width : natural := 5;
g_start_point : natural := 27
);
port (
d_i : in std_logic_vector(g_data_width-1 downto 0);
sel_i : in std_logic_vector(g_sel_width-1 downto 0);
d_o : out std_logic_vector(g_slice_width-1 downto 0)
);
end entity mux;
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
architecture struct of mux is
signal data : std_logic_vector(g_slice_width * 2**g_sel_width-1 downto 0);
type t_std_logic_slice_array is array (natural range <>) of std_logic_vector(g_slice_width-1 downto 0);
signal mux_in : t_std_logic_slice_array (2**g_sel_width-1 downto 0);
begin
p_extend: process(d_i)
begin
for i in 0 to g_slice_width * 2**g_sel_width-1 loop
if i+g_start_point<g_data_width then
data(i) <= d_i(i+g_start_point);
else
data(i) <= '0';
end if;
end loop;
end process;
p_create_array: process (data)
begin
for i in 0 to 2**g_sel_width-1 loop
mux_in(i) <= data((i+1)*g_slice_width-1 downto i*g_slice_width);
end loop;
end process;
p_mux: d_o <= mux_in(to_integer(unsigned(sel_i)));
end architecture;

VHDL : Internal signals are undefined even when defined in the architecture declaration section

So I've been working on some homework for my VHDL course and I can't seem to understand this problem.
The point here is to create the adder/subtractor of an ALU that works on both 2's complement and unsigned 32-bit buses, which is why I have an input called sub_mode (A - B = A + not B + 1), which also serves as the carry-in when active.
The rest of the inputs and outputs are pretty self-explanatory.
My problem is with testbenching this component: even though carry_temp and r_temp are initialized in the declarative section of the architecture, they end up showing as undefined. I guess this is due to the for loop within the process messing everything up. Is that an accurate guess? And if so, is it possible to add two buses together without creating a full n-bit adder out of n 1-bit adder components?
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity add_sub is
port(
a : in std_logic_vector(31 downto 0);
b : in std_logic_vector(31 downto 0);
sub_mode : in std_logic;
carry : out std_logic;
zero : out std_logic;
r : out std_logic_vector(31 downto 0)
);
end add_sub;
architecture synth of add_sub is
signal cond_inv : std_logic_vector(31 downto 0);
signal carry_temp : std_logic_vector(32 downto 0) := (others => '0');
signal r_temp : std_logic_vector(31 downto 0) := (others => '0');
begin
behave : process(a,b,sub_mode)
begin
if sub_mode = '1' then
cond_inv <= b xor x"ffffffff";
else
cond_inv <= b;
end if;
carry_temp(0) <= sub_mode;
for i in 0 to 31 loop
r_temp(i) <= a(i) xor cond_inv(i) xor carry_temp(i);
carry_temp(i+1) <=
(a(i) and cond_inv(i)) or
(a(i) and carry_temp(i)) or
(cond_inv(i)and carry_temp(i));
end loop;
if r_temp = x"00000000" then
zero <= '1';
else
zero <= '0';
end if;
r <= r_temp;
carry <= carry_temp(32);
end process behave;
end synth;
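Regarding the guess about the loop: within one run of the process, every read of cond_inv and carry_temp(i) still sees the value from before the process started, because signal assignments only take effect once the process suspends; and since cond_inv, carry_temp and r_temp are not in the sensitivity list, the process is not rerun to pick up the new values. Variables avoid this, and numeric_std can do the whole addition at once. A minimal sketch of the latter, assuming the same ports (the entity name add_sub_ns is hypothetical):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add_sub_ns is
  port (
    a, b     : in  std_logic_vector(31 downto 0);
    sub_mode : in  std_logic;
    carry    : out std_logic;
    zero     : out std_logic;
    r        : out std_logic_vector(31 downto 0)
  );
end entity add_sub_ns;

architecture rtl of add_sub_ns is
begin
  process (a, b, sub_mode)
    -- Variables update immediately, unlike signals assigned in a process.
    variable b_eff : unsigned(31 downto 0);
    variable cin   : unsigned(0 downto 0);
    variable sum   : unsigned(32 downto 0);  -- bit 32 is the carry out
  begin
    if sub_mode = '1' then
      b_eff := not unsigned(b);              -- A - B = A + not B + 1
    else
      b_eff := unsigned(b);
    end if;
    cin(0) := sub_mode;                      -- the "+ 1" when subtracting
    sum := resize(unsigned(a), 33) + resize(b_eff, 33) + resize(cin, 33);
    r     <= std_logic_vector(sum(31 downto 0));
    carry <= sum(32);
    if sum(31 downto 0) = 0 then
      zero <= '1';
    else
      zero <= '0';
    end if;
  end process;
end architecture rtl;
```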

TestBench for Bitwise Operators

Can someone help me create a testbench for the program below, please?
library ieee;
use ieee.std_logic_1164.all;
entity bitwise is
port( a,b : in std_logic_vector(4 downto 0);
result1, result2, result3, result4, result5, result6 : out std_logic_vector(4 downto 0));
end bitwise;
architecture arch of bitwise is
begin
result1 <= a and b;
result2 <= a or b;
result3 <= a xor b;
result4 <= not a;
result5 <= to_stdlogicvector(to_bitvector(a) sll 1);
result6 <= to_stdlogicvector(to_bitvector(a) srl 1);
end arch;
My testbench is below. I am stuck in the stimulus process, where we have to test each and every possibility. It could be either a loop version or just testing possible numbers for each operator.
LIBRARY ieee;
USE ieee.std_logic_1164.all;
entity test_bitwise is
end test_bitwise;
architecture behavior of test_bitwise is
component bitwise
port( a,b : in std_logic_vector(4 downto 0);
result1, result2, result3, result4, result5, result6 : out std_logic_vector(4 downto 0));
end component;
--INPUTS
signal tb_a : std_logic_vector(4 downto 0) := (others => '0');
signal tb_b : std_logic_vector(4 downto 0) := (others => '0');
--OUTPUTS (same width as the ports of bitwise)
signal tb_result1 : std_logic_vector(4 downto 0);
signal tb_result2 : std_logic_vector(4 downto 0);
signal tb_result3 : std_logic_vector(4 downto 0);
signal tb_result4 : std_logic_vector(4 downto 0);
signal tb_result5 : std_logic_vector(4 downto 0);
signal tb_result6 : std_logic_vector(4 downto 0);
begin
-- INSTANTIATE THE UNIT UNDER TEST (UUT)
U1_Test : bitwise
port map (a => tb_a,
b => tb_b,
result1 => tb_result1,
result2 => tb_result2,
result3 => tb_result3,
result4 => tb_result4,
result5 => tb_result5,
result6 => tb_result6);
--STIMULUS PROCESS
stim_proc : process
begin
-- CODE HERE
end process;
end behavior;
As others have stated in the comments, you should provide some input yourself. What have you tried, and why didn't it succeed? If you have a hard time figuring out what to try and how to start, you could begin with the following. And if you don't succeed, you can then edit your question or post a new one so the other members can help you.
Use a for loop to iterate over each and every possibility. Writing all the possible values to test by hand would be exhausting.
Because you have two inputs, use two nested for loops inside your process: one iterates over the values for input a and the other over those for b. Check here how a for loop is written.
Inside the loops, assign values to your signals tb_a and tb_b. The loop indices are integers, so you have to convert them to std_logic_vector before assigning. Check here for a short tutorial about VHDL conversions.
Add some delay after each iteration with wait.
Print the output values, for example to the simulator console with report, or use an assert statement to check them.
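The steps above might come together as in the following minimal sketch. It assumes the bitwise entity from the question is compiled into the work library; the nested loops, the to_unsigned conversion and the assert checks follow the list, and the 10 ns delay is an arbitrary choice:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- for to_unsigned

entity test_bitwise is
end entity test_bitwise;

architecture behavior of test_bitwise is
  component bitwise is
    port (a, b : in std_logic_vector(4 downto 0);
          result1, result2, result3, result4, result5, result6 : out std_logic_vector(4 downto 0));
  end component;

  signal tb_a, tb_b : std_logic_vector(4 downto 0) := (others => '0');
  signal tb_result1, tb_result2, tb_result3 : std_logic_vector(4 downto 0);
  signal tb_result4, tb_result5, tb_result6 : std_logic_vector(4 downto 0);
begin
  uut : bitwise
    port map (a => tb_a, b => tb_b,
              result1 => tb_result1, result2 => tb_result2, result3 => tb_result3,
              result4 => tb_result4, result5 => tb_result5, result6 => tb_result6);

  stim_proc : process
  begin
    -- Two nested loops cover all 32 x 32 input combinations.
    for i in 0 to 31 loop
      for j in 0 to 31 loop
        tb_a <= std_logic_vector(to_unsigned(i, 5));
        tb_b <= std_logic_vector(to_unsigned(j, 5));
        wait for 10 ns;  -- let the outputs settle
        assert tb_result1 = (tb_a and tb_b) report "AND mismatch" severity failure;
        assert tb_result2 = (tb_a or tb_b) report "OR mismatch" severity failure;
        assert tb_result3 = (tb_a xor tb_b) report "XOR mismatch" severity failure;
        assert tb_result4 = (not tb_a) report "NOT mismatch" severity failure;
        assert tb_result5 = (tb_a(3 downto 0) & '0') report "SLL mismatch" severity failure;
        assert tb_result6 = ('0' & tb_a(4 downto 1)) report "SRL mismatch" severity failure;
      end loop;
    end loop;
    report "all combinations tested";
    wait;  -- suspend forever so the simulation can end
  end process;
end architecture behavior;
```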

Unsigned multiplication in VHDL 4bit vector?

I'm making an ALU with an option to do A + 2B,
but I'm having trouble getting my head around multiplying by 2 (the 2B) and getting the proper answer in my testbench.
E.g. A = 0110, B = 0011.
The equation is A + 2B,
and I'm getting 0110.
A snippet of my code is:
entity ALU is
port( A :IN STD_LOGIC_VECTOR(3 DOWNTO 0) ;
B :IN STD_LOGIC_VECTOR(3 DOWNTO 0) ;
S0 :IN STD_LOGIC ;
S1 :IN STD_LOGIC ;
M :IN STD_LOGIC ;
C0 :IN STD_LOGIC ;
Cout :OUT STD_LOGIC ;
Z :OUT STD_LOGIC ;
F :OUT STD_LOGIC_VECTOR(3 DOWNTO 0));
SIGNAL VariableAlu : STD_LOGIC_VECTOR(3 DOWNTO 0);
SIGNAL FTEMP : STD_LOGIC_VECTOR(3 DOWNTO 0);
SIGNAL FTEMP2 : STD_LOGIC_VECTOR(4 DOWNTO 0);
SIGNAL ZTEMP : STD_LOGIC;
SIGNAL BTEMP1 : STD_LOGIC_VECTOR(4 DOWNTO 0);
END ALU ;
PROCESS(A,B,S0,S1,M,C0)
BEGIN
VariableAlu <= (S0 & S1 & C0 & M);
--M = 1 ARITHMETIC
-- part that shifts it (lab teacher told us to do this)
BTEMP1(4 DOWNTO 1)<= B;
BTEMP1(0)<= '0';
when "1111" => FTEMP2 <= ((A) + BTEMP1);
any help would be greatly appreciated.
In addition to what GSM said, you can also just write what you want, i.e. a multiplication by 2. Synthesis software is smart enough to recognize what you are doing.
What you have to remember is that the product is wider than the operand (numeric_std's unsigned * natural returns twice the operand's width), so it has to be resized.
library IEEE;
use IEEE.std_logic_1164.all;
entity input_output_adder is
port (
input_a : in std_logic_vector(4 downto 0);
input_b : in std_logic_vector(4 downto 0);
output : out std_logic_vector(4 downto 0)
);
end entity;
architecture rtl of input_output_adder is
use IEEE.numeric_std.all;
begin
output <= std_logic_vector(unsigned(input_a) + resize((unsigned(input_b) * 2), 5));
end architecture;
This results in LUTs only... no multipliers.
[Image: synthesis result from Vivado]
[Image: synthesis result from Quartus]
There are a few things to note about your code. Firstly, for any arithmetic, avoid using SLV and stick with unsigned or signed types from the numeric_std library.
Your explicit shift (multiplication by 2) for the operand B:
BTEMP1(4 DOWNTO 1)<= B;
BTEMP1(0)<= '0';
Is a) not required, and b) verbose. You can achieve this by simply writing BTEMP1 <= B & '0';, or better yet, don't use an intermediate signal at all and assign directly to FTEMP2 in the case statement, e.g.:
when "1111" => FTEMP2 <= std_logic_vector(unsigned(A) + unsigned(B&'0'));
Note the conversions in the above line. They are required because, by default, std_logic_vectors do not support the + operator (unless you use the non-standard std_logic_unsigned or std_logic_signed packages). You will need to include the numeric_std package for this.
EDIT:
I also forgot to mention that FTEMP will potentially overflow for the given function F <= A + 2B: with A and B both 4 bits, the result can be as large as 15 + 2*15 = 45, which needs 6 bits, so even the 5-bit FTEMP2 can overflow.
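With the question's numbers, A = 0110 (6) and B = 0011 (3), the expected result is 6 + 2*3 = 12, i.e. 1100. A minimal sketch that sizes the output for the worst case, assuming a standalone entity rather than the full ALU (a_plus_2b is a hypothetical name): both operands are widened to 6 bits first, and shift_left by one position doubles B:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity a_plus_2b is
  port (
    a, b : in  std_logic_vector(3 downto 0);
    f    : out std_logic_vector(5 downto 0)  -- 15 + 2*15 = 45 needs 6 bits
  );
end entity a_plus_2b;

architecture rtl of a_plus_2b is
begin
  -- Widen before adding so neither the doubling nor the sum loses bits.
  f <= std_logic_vector(resize(unsigned(a), 6) + shift_left(resize(unsigned(b), 6), 1));
end architecture rtl;
```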

Query on VHDL generics in packages

I have written simple VHDL code to add two matrices containing 32-bit floating-point numbers. The matrix dimensions have been defined in a package. Currently, I specify the matrix dimensions in the VHDL code and use the corresponding type from the package. However, I would like to use a generic in the design to deal with matrices of different dimensions. For this I would have to somehow use the right type defined in the package. How do I go about doing this?
My current VHDL code is as below.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
use work.mat_pak.all;
entity newproj is
Port ( clk : in STD_LOGIC;
clr : in STD_LOGIC;
start : in STD_LOGIC;
A_in : in t2;
B_in : in t2;
AplusB : out t2;
parallel_add_done : out STD_LOGIC);
end newproj;
architecture Behavioral of newproj is
COMPONENT add
PORT (
a : IN STD_LOGIC_VECTOR(31 DOWNTO 0);
b : IN STD_LOGIC_VECTOR(31 DOWNTO 0);
clk : IN STD_LOGIC;
sclr : IN STD_LOGIC;
ce : IN STD_LOGIC;
result : OUT STD_LOGIC_VECTOR(31 DOWNTO 0);
rdy: OUT STD_LOGIC
);
END COMPONENT;
signal temp_out: t2 := (others=>(others=>(others=>'0')));
signal add_over: t2bit:=(others=>(others=>'0'));
signal check_all_done,init_val: std_logic:='0';
begin
init_val <= '1';
g0: for k in 0 to 1 generate
g1: for m in 0 to 1 generate
add_instx: add port map(A_in(k)(m), B_in(k)(m), clk, clr, start, temp_out(k)(m), add_over(k)(m));
end generate;
end generate;
g2: for k in 0 to 1 generate
g3: for m in 0 to 1 generate
check_all_done <= add_over(k)(m) and init_val;
end generate;
end generate;
p1_add:process(check_all_done,temp_out)
begin
AplusB <= temp_out;
parallel_add_done <= check_all_done;
end process;
end Behavioral;
My package is as below
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.ALL;
package mat_pak is
subtype small_int is integer range 0 to 2;
type t22 is array (0 to 1) of std_logic_vector(31 downto 0);
type t2 is array (0 to 1) of t22; --2*2 matrix
type t22bit is array (0 to 1) of std_logic;
type t2bit is array (0 to 1) of t22bit; --2*2 matrix bit
type t33 is array (0 to 2) of std_logic_vector(31 downto 0);
type t3 is array (0 to 2) of t33; --3*3 matrix
end mat_pak;
Any suggestions would be welcome. Thank you.
There are some logical issues with your design.
First, there's some maximum number of ports a design can tolerate for a sub-hierarchy, and you already have 384 bits of matrix inputs and outputs (three 2x2 matrices of 32-bit elements). Do you really believe this number should be configurable?
At some point it will only fit in the very large FPGA devices, and shortly thereafter not fit there either.
Assuming add takes a variable number of clocks, and parallel_add_done signals when an aplusb value is available (made up of the elements contributed by all instantiated add components), the individual rdy signals have to be ANDed together. If the adds all take the same amount of time, you could take the rdy from any one of them (if your silicon were not that deterministic it would not be usable; there are registers in add).
The nested generate statements all drive check_all_done with the AND of add_over(k)(m) and init_val (which is a synthesis constant of '1'). The intended effect is a wired AND of the add_over(k)(m) bits, which doesn't work in VHDL (multiple drivers of a std_logic signal are resolved, not ANDed) and is likely not achievable in synthesis either.
Note I also showed the proper indexing method for two-dimensional arrays.
Using Jonathan's method of sizing matrices:
library ieee;
use ieee.std_logic_1164.all;
package mat_pak is
type matrix is array (natural range <>, natural range <>)
of std_logic_vector(31 downto 0);
type bmatrix is array (natural range <>, natural range <>)
of std_logic;
end package mat_pak;
library ieee;
use ieee.std_logic_1164.all;
use work.mat_pak.all;
entity newproj is
generic ( size: natural := 2 );
port (
clk: in std_logic;
clr: in std_logic;
start: in std_logic;
a_in: in matrix (0 to size - 1, 0 to size - 1);
b_in: in matrix (0 to size - 1, 0 to size - 1);
aplusb: out matrix (0 to size - 1, 0 to size - 1);
parallel_add_done: out std_logic
);
end entity newproj;
architecture behavioral of newproj is
component add
port (
a: in std_logic_vector(31 downto 0);
b: in std_logic_vector(31 downto 0);
clk: in std_logic;
sclr: in std_logic;
ce: in std_logic;
result: out std_logic_vector(31 downto 0);
rdy: out std_logic
);
end component;
signal temp_out: matrix (0 to size - 1, 0 to size - 1)
:= (others => (others => (others => '0')));
signal add_over: bmatrix (0 to size - 1, 0 to size - 1)
:= (others => (others => '0'));
begin
g0:
for k in 0 to size - 1 generate
g0x:
for m in 0 to size - 1 generate
add_instx: add
port map (
a => a_in(k,m),
b => b_in(k,m),
clk => clk,
sclr => clr,
ce => start,
result => temp_out(k,m),
rdy => add_over(k,m)
);
end generate;
end generate;
aplusb <= temp_out;
p1_add:
process (add_over)
variable check_all_done: std_logic;
begin
check_all_done := '1';
for k in 0 to size - 1 loop
for m in 0 to size -1 loop
check_all_done := check_all_done and add_over(k,m);
end loop;
end loop;
parallel_add_done <= check_all_done;
end process;
end architecture behavioral;
We find that we really want to AND the various rdy outputs (the add_over array) together. In VHDL-2008 this can be done with the unary AND operator (which applies to one-dimensional arrays, so add_over would first have to be flattened); otherwise you're counting on the synthesis tool to flatten the AND chain (and they generally do).
I made the assignment to aplusb a concurrent assignment.
I dummied up an add entity with an empty architecture; with it, the above analyzes, elaborates and simulates, which shows there are no length mismatches anywhere in the connectivity.
I'm not quite sure I understand the question perfectly, but I'll try to answer anyway ;)
You can use an unconstrained array type like this:
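library ieee;
use ieee.std_logic_1164.all;
package mat_pak is
type matrix is array(natural range <>, natural range <>) of std_logic_vector(31 downto 0);
end package mat_pak;
library ieee;
use ieee.std_logic_1164.all;
use work.mat_pak.all;
entity newproj is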
Generic ( size : natural );
Port ( clk : in STD_LOGIC;
clr : in STD_LOGIC;
start : in STD_LOGIC;
A_in : in matrix(0 to size-1, 0 to size-1);
B_in : in matrix(0 to size-1, 0 to size-1);
AplusB : out matrix(0 to size-1, 0 to size-1);
parallel_add_done : out STD_LOGIC);
end newproj;
