Design of MAC unit (dsp processors) using VHDL


My project is the design of a 32-bit MAC (multiply and accumulate) unit using reversible logic. For the project, I have designed a 32-bit multiplier and a 64-bit adder using reversible logic. As the next step, I want to design a 64-bit accumulator that takes the value from the adder, stores it, and adds it to the value it already holds. I have no idea how to design the accumulator.
Please help me complete my project.

A basic VHDL accumulator can be implemented in only a few lines of code. How you decide to implement it, and any additional features necessary are going to depend on your specific requirements.
For example:
Are the inputs signed or unsigned?
What is the type of the inputs?
Does the accumulator saturate, or will it roll over?
Here is a sample unsigned accumulator to give you an idea of what you need to implement (based on this source):
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity accumulator is
  port (
    DIN  : in  std_logic_vector(3 downto 0);
    CLK  : in  std_logic;
    RST  : in  std_logic;
    DOUT : out std_logic_vector(3 downto 0)
  );
end entity accumulator;

architecture behave of accumulator is
  signal acc_value : std_logic_vector(3 downto 0);
begin
  process(CLK)
  begin
    if rising_edge(CLK) then
      if RST='1' then
        acc_value <= (others => '0'); -- reset accumulated value to 0
      else
        acc_value <= std_logic_vector( unsigned(acc_value) + unsigned(DIN) );
      end if;
    end if;
  end process;

  -- Assign output
  DOUT <= acc_value;
end behave;
To describe what this design does in words: on every rising clock edge, the data input DIN is interpreted as an unsigned value and added to the currently accumulated value acc_value. If the RST input is asserted, the accumulated value is cleared back to 0 instead of accumulating DIN. The value of the accumulator is always presented on the output of the block, DOUT.
Based on what you are interfacing with, you might want to consider the following changes/modifications:
Perhaps DIN should be a signed or unsigned type instead of std_logic_vector. I actually recommend this, but it depends on how you are representing your values in other places of your design.
DOUT could also be a signed or unsigned value instead of std_logic_vector; it depends on your requirements.
In this case, acc_value, the accumulated value register, will roll over if the accumulated values get too high. Maybe you want to generate an error condition when this happens, or perform a check to ensure that you saturate at the maximum value of acc_value instead.
acc_value need not be the same width as DIN; it could be twice as wide (or whatever your requirements are). The wider it is, the more you can accumulate before the rollover condition occurs.
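As a rough illustration of the last two points, here is a minimal sketch of an accumulator that is wider than its input and saturates instead of rolling over. The entity name sat_accumulator and the generics DIN_WIDTH and ACC_WIDTH are made up for this example, and it assumes unsigned data; a signed MAC would need a different overflow check.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: unsigned, saturating accumulator.
entity sat_accumulator is
  generic (
    DIN_WIDTH : positive := 4;  -- assumed input width
    ACC_WIDTH : positive := 8   -- assumed accumulator width, must be >= DIN_WIDTH
  );
  port (
    CLK  : in  std_logic;
    RST  : in  std_logic;
    DIN  : in  unsigned(DIN_WIDTH-1 downto 0);
    DOUT : out unsigned(ACC_WIDTH-1 downto 0)
  );
end entity sat_accumulator;

architecture rtl of sat_accumulator is
  signal acc_value : unsigned(ACC_WIDTH-1 downto 0) := (others => '0');
begin
  process(CLK)
    -- One extra bit so the carry out of the addition is visible.
    variable sum : unsigned(ACC_WIDTH downto 0);
  begin
    if rising_edge(CLK) then
      if RST = '1' then
        acc_value <= (others => '0');
      else
        sum := resize(acc_value, ACC_WIDTH+1) + resize(DIN, ACC_WIDTH+1);
        if sum(ACC_WIDTH) = '1' then
          acc_value <= (others => '1');            -- overflow: clamp at maximum
        else
          acc_value <= sum(ACC_WIDTH-1 downto 0);  -- normal accumulation
        end if;
      end if;
    end if;
  end process;

  DOUT <= acc_value;
end architecture rtl;
```

With the default generics, the accumulator holds at 255 once the running total would exceed it, until RST clears it.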

Related

Process sensitivity list vhdl

I have the following problem:
I have a simple entity driven by a single process:
LIBRARY IEEE;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned.ALL;
USE ieee.std_logic_arith.ALL;

entity somma_CDC is
    Port
    (
        A : in std_logic;
        B : in std_logic;
        Reset : in std_logic;
        Internal_Carry_enable : in std_logic;
        S : out std_logic
    );
end somma_CDC;

architecture Behavioral_somma_CDC of somma_CDC is
    signal Internal_Carry: std_logic;
begin
    somma_CDC:process (Reset,A,B)
    begin
        if Reset = '1'
        then
            Internal_Carry <= '0';
        else
            S <= A XOR B XOR Internal_Carry ;
            if (Internal_Carry_enable = '1')
            then
                Internal_Carry <= (A AND B) OR (Internal_Carry AND A) OR (Internal_Carry AND B) ;
            end if;
        end if;
    end process;
end architecture;
In practice, it is very similar to a full adder.
Ideally, the block diagram should look like this:
My problem arises when, in a cycle after the first, the operands have the same values as before. In this case the process is not activated, so it fails to compute the case
A = 1, B = 1, Carry_In = 1.
There is a clock signal in my system, but the clock runs faster than the input data changes. If I put the clock in the sensitivity list I get wrong results, because the carry "propagates" in the wrong way.
I tried removing the sensitivity list and using a wait for "X" time, where "X" is the minimum time between changes of operands A and B. It works, but it depends on something that can always change in a project.
Is there another way to activate this process?

TL;DR:
Add Internal_Carry to your sensitivity list.
Edit: As @Tricky pointed out, Internal_Carry_enable should be in the sensitivity list as well.
Full Answer:
I think the problem here is that you may have misunderstood how the sensitivity list works. You are treating it like a C-style parameter list, as if the process simply read Reset, A and B as inputs.
But in VHDL every signal in the sensitivity list is a trigger: the process is re-run only when one of those signals changes value.
So the main problem here is the signal Internal_Carry. Since it is not in the sensitivity list, S won't respond when Internal_Carry changes to a new value after the first run. You would need to change Reset, A or B to see the effect of Internal_Carry from the last run.
There are other problems in your code, but they are not related to this.
Internal_Carry is a latch, since you didn't assign it a default value (which value should it hold if Reset is not '1' and Internal_Carry_enable is not '1'?).
You may want to review the differences between combinational and sequential logic, since you mention a clock in an adder circuit. If you add a clock, assigning a value to a signal on the clock edge generates a register, whereas a process with no clock edge describes combinational logic (or latches, where a signal is not assigned on every path).
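To make the combinational versus sequential distinction concrete, here is a minimal sketch of a clocked variant in which the carry is an explicit register. It is not the asker's design: the entity name serial_adder, the Clk port and the Data_valid strobe (assumed to be high for exactly one clock cycle per new A/B pair) are all assumptions made for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical bit-serial adder with a registered carry.
entity serial_adder is
  port (
    Clk        : in  std_logic;  -- assumed system clock, faster than the data
    Reset      : in  std_logic;  -- synchronous reset of the carry
    Data_valid : in  std_logic;  -- assumed strobe: one clock wide per new A/B pair
    A          : in  std_logic;
    B          : in  std_logic;
    S          : out std_logic
  );
end entity serial_adder;

architecture rtl of serial_adder is
  signal carry : std_logic := '0';
begin
  process(Clk)
  begin
    if rising_edge(Clk) then
      if Reset = '1' then
        carry <= '0';
        S     <= '0';
      elsif Data_valid = '1' then
        -- Both registers are updated from the OLD carry on the same edge,
        -- so identical operands in consecutive data periods are still
        -- processed once per strobe.
        S     <= A xor B xor carry;
        carry <= (A and B) or (carry and A) or (carry and B);
      end if;
    end if;
  end process;
end architecture rtl;
```

Because the process is triggered by the clock edge rather than by changes on A and B, the case A = 1, B = 1 with a stored carry of 1 is evaluated even when the operands did not change.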

VHDL: Correct way to infer a single-port RAM with synchronous read

I've been having this debate for years... What's the correct way to infer a single-port RAM with synchronous read?
Let's suppose the interface for my inferred memory in VHDL is:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sram1 is
  generic(
    aw :integer := 8; --address width of memory
    dw :integer := 8  --data width of memory
  );
  port(
    --arm clock
    aclk   :in  std_logic;
    aclear :in  std_logic;
    waddr  :in  std_logic_vector(aw-1 downto 0);
    wdata  :in  std_logic_vector(dw-1 downto 0);
    wen    :in  std_logic;
    raddr  :in  std_logic_vector(aw-1 downto 0);
    rdata  :out std_logic_vector(dw-1 downto 0)
  );
end entity;
Is it this way: Door #1
-- I LIKE THIS ONE
architecture rtl of sram1 is
  constant mem_len :integer := 2**aw;
  type mem_type is array (0 to mem_len-1) of std_logic_vector(dw-1 downto 0);
  signal block_ram : mem_type := (others => (others => '0'));
begin
  process(aclk)
  begin
    if (rising_edge(aclk)) then
      if (wen = '1') then
        block_ram(to_integer(unsigned(waddr))) <= wdata(dw-1 downto 0);
      end if;
      -- QUESTION: REGISTERING THE READ DATA (ALL OUTPUT REGISTERED)?
      rdata <= block_ram(to_integer(unsigned(raddr)));
    end if;
  end process;
end architecture;
Or this way: Door #2
-- TEXTBOOKS LIKE THIS ONE
architecture rtl of sram1 is
  constant mem_len :integer := 2**aw;
  type mem_type is array (0 to mem_len-1) of std_logic_vector(dw-1 downto 0);
  signal block_ram : mem_type := (others => (others => '0'));
  signal raddr_dff : std_logic_vector(aw-1 downto 0);
begin
  process(aclk)
  begin
    if (rising_edge(aclk)) then
      if (wen = '1') then
        block_ram(to_integer(unsigned(waddr))) <= wdata(dw-1 downto 0);
      end if;
      -- QUESTION: REGISTERING THE READ ADDRESS?
      raddr_dff <= raddr;
    end if;
  end process;
  -- QUESTION: HOT ADDRESS SELECTION OF DATA
  rdata <= block_ram(to_integer(unsigned(raddr_dff)));
end architecture;
I'm a fan of the first version because I think it's good practice to register all of the outputs of a VHDL module. However, many textbooks list the latter version as the correct way to infer a single-port RAM with synchronous read.
Does it really matter from a Xilinx or Altera synthesis point of view, as long as you have already taken into account the difference between delaying the data versus the address (and determined it doesn't matter for your application)?
I mean... they both still give you block RAMs in the FPGA, right?
Or does one give you LUTs and the other block RAMs?
Which would give better timing and better capacity in an FPGA, Door #1 or Door #2?

Unfortunately, the synthesis tool vendors have written their RAM inference so that it typically recognizes both styles, regardless of how the RAM is physically implemented in the FPGA in question.
So even if you specify a registered output, the synthesis tool may silently ignore that and infer a RAM with registered inputs instead. This is not functionally equivalent, so it may actually lead to undesired behaviour, particularly in the case of dual-port RAMs.
To avoid this pitfall, you can add vendor-specific attributes telling the synthesis tool exactly which kind of RAM you need.
In general, most FPGAs have mandatory registered inputs on the physical RAM, and can add an optional register on the output.
So using the code style with registered inputs will probably make simulation match reality, which is typically a good thing.
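As a hedged illustration of such an attribute, the sketch below attaches the Xilinx ram_style attribute to the memory signal of the registered-address style. The entity name sram1_attr is made up for this example. Note that ram_style only selects the RAM resource (block versus distributed), not which registers are used, and attribute names and values differ between vendors, so check your tool's synthesis guide before relying on it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity, same interface idea as sram1 in the question.
entity sram1_attr is
  generic (
    aw : integer := 8;  -- address width
    dw : integer := 8   -- data width
  );
  port (
    aclk  : in  std_logic;
    waddr : in  std_logic_vector(aw-1 downto 0);
    wdata : in  std_logic_vector(dw-1 downto 0);
    wen   : in  std_logic;
    raddr : in  std_logic_vector(aw-1 downto 0);
    rdata : out std_logic_vector(dw-1 downto 0)
  );
end entity sram1_attr;

architecture rtl of sram1_attr is
  type mem_type is array (0 to 2**aw-1) of std_logic_vector(dw-1 downto 0);
  signal block_ram : mem_type := (others => (others => '0'));
  signal raddr_dff : std_logic_vector(aw-1 downto 0);

  -- Xilinx synthesis attribute requesting block RAM.
  -- Intel/Altera Quartus uses a differently named attribute (ramstyle).
  attribute ram_style : string;
  attribute ram_style of block_ram : signal is "block";
begin
  process(aclk)
  begin
    if rising_edge(aclk) then
      if wen = '1' then
        block_ram(to_integer(unsigned(waddr))) <= wdata;
      end if;
      raddr_dff <= raddr;  -- registered read address
    end if;
  end process;

  rdata <= block_ram(to_integer(unsigned(raddr_dff)));
end architecture rtl;
```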

The differences can matter, and it really depends on the specific family you are targeting. Most modern FPGAs have options for the block ram that allow them to function either way, but I wouldn't count on that in practice.
If I am inferring RAM, I typically start with the example design provided with the tools (there's almost always a "how to infer ram" section of the user guide). If targeting cross-platform (eg: Altera + Xilinx) I'd stick with a "minimal common supported" set of features, merging the two example designs.
All that said, I typically register BOTH the address and the data. It's one more clock, but it helps close timings and I'm usually more concerned with throughput vs. overall latency. I also typically use wrapper functions (eg: My_Simple_Dual_Port_RAM) and directly instantiate the low-level block rams using primitives which makes it easy to switch between FPGA vendors (or swap out the inferred logic if/when needed). I just drop the modules in a directory (eg: Altera, Lattice, Xilinx) and include the appropriate directory in the project file. I also do the same thing with dual clock FIFOs, where you're typically a LOT better off using the library parts vs. trying to build your own.
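A minimal sketch of what registering both the address and the data could look like, using the same interface idea as sram1 from the question. The entity name sram1_2stage is made up for this example; the read latency becomes two clocks, and whether the tool maps both registers into the block RAM depends on the device and tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: read address AND read data are registered.
entity sram1_2stage is
  generic (
    aw : integer := 8;  -- address width
    dw : integer := 8   -- data width
  );
  port (
    aclk  : in  std_logic;
    waddr : in  std_logic_vector(aw-1 downto 0);
    wdata : in  std_logic_vector(dw-1 downto 0);
    wen   : in  std_logic;
    raddr : in  std_logic_vector(aw-1 downto 0);
    rdata : out std_logic_vector(dw-1 downto 0)
  );
end entity sram1_2stage;

architecture rtl of sram1_2stage is
  type mem_type is array (0 to 2**aw-1) of std_logic_vector(dw-1 downto 0);
  signal block_ram : mem_type := (others => (others => '0'));
  signal raddr_dff : std_logic_vector(aw-1 downto 0) := (others => '0');
begin
  process(aclk)
  begin
    if rising_edge(aclk) then
      if wen = '1' then
        block_ram(to_integer(unsigned(waddr))) <= wdata;
      end if;
      raddr_dff <= raddr;                                    -- stage 1: address register
      rdata     <= block_ram(to_integer(unsigned(raddr_dff))); -- stage 2: data register
    end if;
  end process;
end architecture rtl;
```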

You can take a look at the results of the synthesis. My Vivado gives me the following reports after synthesizing your solutions (default settings).
First solution:
BRAM: 0.5 (from 60 Blocks)
IO: 34
BUFG: 1
And the schematic looks like this
Second solution:
BRAM: 0.5 (from 60 Blocks)
IO: 34
BUFG: 1
With the following result:
So you see that the synthesis will generate the same output for both variants. It is up to you which one you want to use. I prefer the first variant because the second is slightly more code.

Timing between 7-segment display and enable

I am working through Altera University LABS but I am using a board of a slightly different design so I am having to mimic the way the boards used in the labs display to 7 Segment LED.
I have sorted it out with the code below:
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY DE1_disp IS
  PORT ( HEX0, HEX1, HEX2, HEX3: IN STD_LOGIC_VECTOR(6 DOWNTO 0);
         clk : IN STD_LOGIC;
         HEX : OUT STD_LOGIC_VECTOR(6 DOWNTO 0);
         DISPn: OUT STD_LOGIC_VECTOR(3 DOWNTO 0));
END DE1_disp;

ARCHITECTURE Behavior OF DE1_disp IS
  COMPONENT sweep
    Port ( mclk : in STD_LOGIC;
           sweep_out : out std_logic_vector(1 downto 0));
  END COMPONENT;
  SIGNAL M : STD_LOGIC_VECTOR(1 DOWNTO 0);
BEGIN -- Behavior
  S0: sweep PORT MAP (clk,M);

  DISPProcess: process (clk,M) is
  begin
    CASE M IS
      WHEN "00" => HEX <= HEX0; DISPn <= "1110";
      WHEN "01" => HEX <= HEX1; DISPn <= "1101";
      WHEN "10" => HEX <= HEX2; DISPn <= "1011";
      WHEN "11" => HEX <= HEX3; DISPn <= "0111";
    END CASE;
  end process DISPProcess;
END Behavior;
The gist is that my board has one set of segment drivers and you have to scan the LED enables, whereas the lab boards simply have n sets of segment drivers.
The code above works except for a pesky "ghost" character. What appears to be happening is that the enable is likely held low whilst a character change is occurring, so the following display is lit for a poofteenth of the enable time.
As you can see from the code, I am taking four 7-segment display inputs and generating a scanned output. The ghost is always on the digit following the last enable, so it also wraps from the 4th to the 1st display. Obviously, this is most apparent when a display is blanked.
For the purposes of the labs this code is fine. However, I would love to better understand what I have done to incur the ghost as understanding that would help me understand VHDL design a tad more.
Can anyone please suggest what principle I need to grasp here, or at least how to code up the enable so it falls after the digit change?
Note I have tried a default case (both using NULL and setting DISPn to "1111"). I suspect a way to do it is to expand the case statement and alternately set HEX and then set DISPn on successive case statements. But are there any other VHDL tricks that might work?
Cheers,
A

It is possible that your diagnosis is slightly wrong.
Check the schematic for your board: it is likely that the enables (DISPn) drive the bases of bipolar transistors into saturation. Then, even though HEX and DISPn change in the same delta cycle, charge storage in the external transistors maintains the enable for long enough to see the ghost.
The fix is to provide a dead time, turning the enables off for a short while until the enable transistors are fully off (probably tens of microseconds); then you can change the digit and re-enable at the same time.
Your solution accomplishes this simply and elegantly, but at the cost of half the potential brightness.
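A minimal sketch of the dead-time idea, under stated assumptions: the entity name disp_scan, the 18-bit counter and the blanking window are made up for this example, and the timing figures assume a 50 MHz clock (each digit slot is 2^16 clocks, with all enables off for the last 2^10 clocks, roughly 20 microseconds). The enables are active low, as in the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical scanner with a short dead time at the end of each digit slot.
entity disp_scan is
  port (
    clk                    : in  std_logic;
    HEX0, HEX1, HEX2, HEX3 : in  std_logic_vector(6 downto 0);
    HEX                    : out std_logic_vector(6 downto 0);
    DISPn                  : out std_logic_vector(3 downto 0)
  );
end entity disp_scan;

architecture rtl of disp_scan is
  -- Bits 17..16 select the digit; bits 15..0 count within the digit slot.
  signal cnt : unsigned(17 downto 0) := (others => '0');
begin
  process(clk)
    variable digit : unsigned(1 downto 0);
    variable sel   : std_logic_vector(3 downto 0);
  begin
    if rising_edge(clk) then
      cnt   <= cnt + 1;
      digit := cnt(17 downto 16);

      -- Segment data follows the digit slot and stays stable during blanking.
      case digit is
        when "00"   => HEX <= HEX0; sel := "1110";
        when "01"   => HEX <= HEX1; sel := "1101";
        when "10"   => HEX <= HEX2; sel := "1011";
        when others => HEX <= HEX3; sel := "0111";
      end case;

      -- Dead time: all enables off during the last 1024 clocks of the slot,
      -- so the old digit's transistor can turn off before HEX changes.
      if cnt(15 downto 10) = 63 then
        DISPn <= "1111";
      else
        DISPn <= sel;
      end if;
    end if;
  end process;
end architecture rtl;
```

Because HEX and DISPn are both registered, the new segment data and the new enable appear on the same clock edge, after the previous enable has already been off for the dead time. The brightness cost here is about 1/64 rather than one half.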

VHDL modify one signal with multiple clocks

I ran into a problem using 3 clocks in one process.
HC1 and HC2 may be active at the same time, and they are much slower than H. H is the base clock, which runs at 16 MHz.
If I write the process like this:
entity fifo is
  Port ( H : in STD_LOGIC;
         HC1 : in STD_LOGIC;
         HC2 : in STD_LOGIC;
         C1data : in STD_LOGIC_VECTOR (2 downto 0);
         C2data : in STD_LOGIC_VECTOR (2 downto 0);
         Buffer1 : out STD_LOGIC_VECTOR (3 downto 0);
         Buffer2 : out STD_LOGIC_VECTOR (3 downto 0));
end fifo;

architecture Behavioral of fifo is
  signal Full1,Full2 : STD_LOGIC;
begin
  process(H,HC1,HC2)
  begin
    if(rising_edge(H)) then
      Full1 <= '0';
      Full2 <= '0';
    else
      if(rising_edge(HC1)) then
        Buffer1(3 downto 1) <= C1data;
        Buffer1(0) <= C1data(2) xor C1data(1) xor C1data(0);
        Full1 <= '1';
      end if;
      if(rising_edge(HC2)) then
        Buffer2(3 downto 1) <= C2data;
        Buffer2(0) <= C2data(2) xor C2data(1) xor C2data(0);
        Full2 <= '1';
      end if;
    end if;
  end process;
and it says:
ERROR:Xst:827 - "C:/Users/Administrator/Desktop/test/justatest/fifo.vhd" line 45: Signal Buffer1<0> cannot be synthesized, bad synchronous description. The description style you are using to describe a synchronous element (register, memory, etc.) is not supported in the current software release.
Why? Many thanks!

Not all valid VHDL is synthesizable. What is considered synthesizable varies between tools and the target architecture. The Xilinx hardware architectures have no way to represent the logic described by your code (without resorting to gated clocks). Synthesizers only support a subset of the language and expect hardware primitives to be described using certain "set" templates. Modern tools are more forgiving in what they will accept for a high level description but there is a limit to what they can accomplish.
Digital logic synthesis tools make certain assumptions about the types of circuits they will support. Your circuit description applies the rising_edge() function to three different signals in the same process. Complex clocking arrangements like this are generally not supported. The usual expectation is that a circuit consists of isolated clock domains activated by a single clock edge. They will not automatically create gated clocks to suit atypical code like your example because this introduces potential hazards into the circuit that may not be detected with timing constraints and static timing analysis.
In the case of FPGAs, the clocking architecture is baked in and no amount of fiddling with the input description can change that. Feeding clocks into the logic fabric to be gated upsets the default expectations of the synthesizer and is best avoided if at all possible.
If HC1 and HC2 are actually control signals and not clocks then you shouldn't be using the rising_edge() function to detect changes in their state. Instead you should create delayed versions registered by the common clock H. A change from '0' to '1' is then detected by the expression HC1 = '1' and HC1_prev = '0'.
The else branch of the top-level if statement is not supported by XST, as it doesn't conform to XST's expectations for describing synchronous logic. You should instead eliminate the else and move the initialization of Full1 and Full2 to a separate reset/clear section. This can be done synchronously or asynchronously. Refer to the XST synthesis guide for examples of how to accomplish that.
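A minimal sketch of the approach described above, assuming HC1 and HC2 are really control signals that are much slower than H. The entity name fifo_sync, the Clear input, the internal *_sync and *_prev signals, and exposing Full1 and Full2 as outputs are all assumptions made for this example. It also assumes C1data and C2data stay stable for a couple of H cycles after the corresponding edge.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical single-clock version: everything is clocked by H only.
entity fifo_sync is
  port (
    H       : in  std_logic;                     -- common clock
    Clear   : in  std_logic;                     -- assumed synchronous clear of the Full flags
    HC1     : in  std_logic;
    HC2     : in  std_logic;
    C1data  : in  std_logic_vector(2 downto 0);
    C2data  : in  std_logic_vector(2 downto 0);
    Buffer1 : out std_logic_vector(3 downto 0);
    Buffer2 : out std_logic_vector(3 downto 0);
    Full1   : out std_logic;
    Full2   : out std_logic
  );
end entity fifo_sync;

architecture rtl of fifo_sync is
  signal HC1_sync, HC1_prev : std_logic := '0';
  signal HC2_sync, HC2_prev : std_logic := '0';
begin
  process(H)
  begin
    if rising_edge(H) then
      -- Sample the slow signals with H and keep a delayed copy of each.
      HC1_sync <= HC1;
      HC1_prev <= HC1_sync;
      HC2_sync <= HC2;
      HC2_prev <= HC2_sync;

      if Clear = '1' then
        Full1 <= '0';
        Full2 <= '0';
      end if;

      -- A '0' to '1' change on HC1, seen in the H clock domain.
      -- If Clear is also active, this later assignment takes priority.
      if HC1_sync = '1' and HC1_prev = '0' then
        Buffer1(3 downto 1) <= C1data;
        Buffer1(0)          <= C1data(2) xor C1data(1) xor C1data(0);
        Full1               <= '1';
      end if;

      if HC2_sync = '1' and HC2_prev = '0' then
        Buffer2(3 downto 1) <= C2data;
        Buffer2(0)          <= C2data(2) xor C2data(1) xor C2data(0);
        Full2               <= '1';
      end if;
    end if;
  end process;
end architecture rtl;
```

Only H appears in rising_edge(), so the process matches the single-clock template that synthesizers expect.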

adding '1' to LOGIC_VECTOR in VHDL

I'm trying to add '1' to an N-length STD_LOGIC_VECTOR in VHDL.
This is the very first time I'm using VHDL, so I'm not at all sure how to add this 1 without building a full adder, which seems kind of redundant.
We are not allowed to use any more libraries than the ones in the code.
LIBRARY IEEE ;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.STD_LOGIC_UNSIGNED.ALL;

ENTITY cnt IS
  GENERIC (n: natural :=3);
  PORT( clk: IN std_logic; -- clock
        rst: IN std_logic; -- reset
        cntNum: IN std_logic_vector(n-1 DOWNTO 0); -- # of counting cycles
        cntOut: OUT std_logic_vector(n-1 DOWNTO 0) -- count result
  );
END cnt;

architecture CntBhvArc OF cnt IS
  signal counta : std_logic_vector(n-1 DOWNTO 0);
begin
  process (clk, rst)
  begin
    if rst='1' then
      counta<="0";
    elsif (clk'event) and (clk='0') then
      counta<= counta+'1';
    end if;
    cntOut<=counta;
  end process;
END CntBhvArc
Also... can anyone point me to a VHDL tutorial for someone who has very little experience in programming?
Thanks

You should not use the package IEEE.STD_LOGIC_UNSIGNED.
This package is non-standard and deprecated (see VHDL FAQ); use ieee.numeric_std.all instead.

To answer your last point - don't think of it as programming. HDL stands for "hardware description language". You're describing hardware, always keep it in mind when writing your code :)
I've also written at length about not using STD_LOGIC_UNSIGNED, but using NUMERIC_STD instead. If this is homework and you're being taught to use STD_LOGIC_UNSIGNED, then I despair of the educational establishments. It's been years since that made sense.
VHDL is strongly-typed, so if count is representing a number (and with a name like that, it better had be :), use either a signed or unsigned vector, or an integer. Integers don't wrap around in simulation unless you make them (if you add 1 to them when they are at their max value, the simulator will terminate). The vector types do. Sometimes you want one behaviour, sometimes the other.
Finally, I just noted this:
elsif (clk'event) and (clk='0') then
which is better written as:
elsif falling_edge(clk) then
Again, this has been so for about a decade or two. Were you intending to use the falling edge? Rising edge is more usual.

You need to cast the std_logic_vector to an unsigned value so you can add one, then cast it back so you can assign it to your output.
This looks like a homework assignment, so I'll leave you to figure out exactly how to do the implementation.
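For readers who are not doing this as homework, here is one possible sketch of the cast-and-add idea using numeric_std. The entity name cnt_ns is made up for this example, the unused cntNum input from the question is left out, and the rising edge is used where the question used the falling edge. Keeping the internal count as unsigned means only one cast is needed, at the output.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical counter: internal count is unsigned, port stays std_logic_vector.
entity cnt_ns is
  generic (n : natural := 3);
  port (
    clk    : in  std_logic;                       -- clock
    rst    : in  std_logic;                       -- asynchronous reset
    cntOut : out std_logic_vector(n-1 downto 0)   -- count result
  );
end entity cnt_ns;

architecture rtl of cnt_ns is
  signal counta : unsigned(n-1 downto 0);
begin
  process(clk, rst)
  begin
    if rst = '1' then
      counta <= (others => '0');   -- works for any width n
    elsif rising_edge(clk) then    -- swap for falling_edge(clk) if that is really intended
      counta <= counta + 1;        -- unsigned + integer, wraps around at 2**n - 1
    end if;
  end process;

  -- Cast back to std_logic_vector only at the port.
  cntOut <= std_logic_vector(counta);
end architecture rtl;
```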
