PRBS Generator module in VHDL

PRBS Generator module in VHDL - vhdl

Here i am posting a snapshot of prbs
My code for prbs module is
-- Module Name: prbs - Behavioral
-- Project Name: modulator
-- Description:
--To make it of N bit replace existing value of N with desired value of N
----------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity prbs is
Port ( pclock : in STD_LOGIC;
preset : IN std_logic := '0';
prbsout : out STD_LOGIC);
end prbs;
architecture Behavioral of prbs is
COMPONENT dff is
PORT(
dclock : IN std_logic;
dreset : IN std_logic;
din : IN std_logic ;
dout : OUT std_logic
);
END COMPONENT;
signal dintern : std_logic_vector (4 downto 1); --Change value of N to change size of shift register
signal feedback : std_logic := '0';
begin
instdff : dff port map (pclock , preset , feedback , dintern(1));
genreg : for i in 2 to 4 generate --Change Value of N Here to generate that many instance of d flip flop
begin
instdff : dff port map ( pclock , preset , dintern(i-1) , dintern(i));
end generate genreg;
main : process(pclock)
begin
if pclock'event and pclock = '1' then
if preset = '0' then
if dintern /= "0" then
feedback <= dintern(1) xor dintern(3); -- For N equals four;
--feedback <= dintern(4) xor dintern(5) xor dintern(6) xor dintern(8); -- For N equals eight;
--feedback <= dintern(11) xor dintern(13) xor dintern(14) xor dintern(16); -- For N equals sixteen;
--feedback <= dintern(1) xor dintern(2) xor dintern(22) xor dintern(32); -- For N equals thirty two
else
feedback <= '1';
end if;
end if;
end if;
end process main;
prbsout <= dintern(4) ; --Change Value of N Here to take output to top entity
end Behavioral;
In it i am instantiating a d flip flop module
d ff module code
----------------------------------------------------------------------------------
-- Module Name: dff - Behavioral
-- Project Name:
-- Description:
----------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity dff is
Port ( dclock : in STD_LOGIC ;
dreset : in STD_LOGIC ;
din : in STD_LOGIC;
dout : out STD_LOGIC);
end dff;
architecture Behavioral of dff is
begin
process(dclock)
begin
if dclock'event and dclock = '1' then
if dreset = '0' then
dout <= din;
else
dout <= '1';
end if;
end if;
end process;
end Behavioral;
But i am not getting desired output.
In top level entity i am getting always 1 at prbsout signal.
When i try to simulate then prbsout signal becomes undefined.
What i am missing?

The prbs module reset at preset does not apply to the feedback signal,
probably because the intention was to use the initial value of 0 assigned in
the declaration of the feedback signal. However, since the dff modules
uses synchronous reset, and the dintern signal will be undriven U at start,
and since then next value for feedback is calculated using dintern(1) in an
xor, the feedback will get undefined right after start, and can't recover,
even if a lengthy reset is applied. See waveform from ModelSim below.
An immediate fix for the reset issue is to apply reset for feedback also in
the main process:
...
else -- preset /= '0'
feedback <= '0';
...
Now at least reset works, and can make the prbs generate a sequence. See
waveform below.
Just a few additional comments to the code, while at it:
Instead of dclock'event and dclock = '1' you can use rising_edge(dclock),
which I think must reader will find easier to understand, and it is less
error prone
For most tools, it is unnecessary to make a separte module just for a
flip-flop, like the dff module, since the tools can infer flip-flop
directly from the process even when advanced expressions are used for signal
assignments are used.
But, I don't think the output is what you actually want. Based on your
design, and the selected taps for the LFSR, it looks like you want to generate
maximum length LFSR sequences, that is sequences with a length of 2 ** N - 1
for a LFSR register being N bits long.
The principles of LFSR and the taps to for feedback generation is described
on Wikipedia: Linear feedback shift
register.
However, since the feedback signal is generated as a flip-flop, it becomes
part of the LSFR shift register, thus adds a bit to the length, but the tap
values are based on the dintern part of the LFSR only, the taps will be
wrong. Selecting the wrong bits will result in a LFSR sequence that is less
than then maximum sequence, and you can also see that in the simulation output,
where the sequence is only 6 cycles long, even through the dintern(4 downto
1) + feedback together makes a 5 bit register.
So a more thorough rewrite of the prbs module is required, if what you want
is to generate maximum length PRBS sequences, and below is an example of how
the prbs module can be written:
library ieee;
use ieee.std_logic_1164.all;
entity prbs_new is
generic(
BITS : natural);
port(
clk_i : in std_logic;
rst_i : in std_logic;
prbs_o : out std_logic);
end entity;
library ieee;
use ieee.numeric_std.all;
architecture syn of prbs_new is
signal lfsr : std_logic_vector(BITS downto 1); -- Flip-flops with LFSR state
function feedback(slv : std_logic_vector) return std_logic is -- For maximum length LFSR generation
begin
case slv'length is
when 3 => return slv( 3) xor slv( 2);
when 4 => return slv( 4) xor slv( 3);
when 8 => return slv( 8) xor slv( 6) xor slv( 5) xor slv(4);
when 16 => return slv(16) xor slv(15) xor slv(13) xor slv(4);
when 32 => return slv(32) xor slv(22) xor slv( 2) xor slv(1);
when others => report "feedback function not defined for slv'lenght as " & integer'image(slv'length)
severity FAILURE;
return 'X';
end case;
end function;
begin
process (clk_i, rst_i) is
begin
if rising_edge(clk_i) then
if unsigned(lfsr) /= 0 then
lfsr <= lfsr(lfsr'left - 1 downto lfsr'right) & feedback(lfsr); -- Left shift with feedback in
end if;
end if;
if rst_i = '1' then -- Asynchronous reset
lfsr <= std_logic_vector(to_unsigned(1, BITS)); -- Reset assigns 1 to lfsr signal
end if;
end process;
prbs_o <= lfsr(BITS); -- Drive output
end architecture;
Comments to ´prbs_new´ module
Generic BITS is added so diffrent LFSR length can be made from the same code.
Ports are named with "_i" for inputs and "_o" for outputs, since this naming
convension is very useful when tracing signals at a toplevel with multiple
modules.
The VHDL standard package ieee.numeric_std is used instead of
the non-standard package ieee.std_logic_unsigned.
Asynchronous reset is used instead of synchronous reset and initial value in
the signal declaration.
The advantage over synchronous reset is that asynchronous reset typical
applies to a dedicated input on the flip-flops in FPGA and ASIC
technology, and not in the potentially timing critical data path, whereby
the design can be faster.
The advantage over initial value in the signal declation is that FPGA and
ASIC technologies are more likely to be able to implement this; there are
cases where initial values are not supported. Also functional reset
makes restart possible in a test bench without having to reload the
simulator.
There is no check for an all-0 value of the lfsr signal in the process,
since the lfsr will never get an all-0 value if proper maximum length taps
are used, and the lfsr signal is reset to a non-0 value.

It looks like you are never setting your internal state (dintern) to a known value. Since all subsequent states are calculated from your initial dintern value, they are unknown as well. Try assigning an initial state to dintern, or fixing your preset code to actually do something when preset is high (and then assert it at the start of your testbench).

Related

Why does my implementation of an XOR-reduction has multiple drivers?

I have a binary string like "11000110".
I'm trying to XOR all bits together.
I have this code:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.NUMERIC_STD.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity e_circuit is
generic(n : INTEGER:=8);
port(
d1:in std_logic_vector(n-1 downto 0);
result:out std_logic
);
end e_circuit;
architecture structure of e_circuit is
signal temp:std_logic;
begin
temp<=d1(0);
loop1:for i in 1 to n-1 generate
temp<= d1(i) or d1(i-1);
end generate;
result<=temp;
end structure;
But, when I try to compile it, I get the error below:
ERROR:Xst:528 - Multi-source in Unit <e_circuit> on signal <result>;
this signal is connected to multiple drivers.
What does it mean? How can I fix it?

This means that you are assigning to a signal (temp) multiple times (tying their outputs together directly) without any combinational circuit or if clause. When the synthesizer synthesizes the resulting circuit, all of the statements in the for...generate statement get executed simultaneously. Here are some solutions you can try:
First, you can use the VHDL-2008 reduction operator:
result <= xor d1;
Or, if your synthesizer doesn't support that, create a function to do it for you:
function xor_reduct(slv : in std_logic_vector) return std_logic is
variable res_v : std_logic := '1'; -- Null slv vector will also return '1'
begin
for i in slv'range loop
res_v := res_v xor slv(i);
end loop;
return res_v;
end function;
And call it in your corresponding architecture:
result <= xor_reduct(d1);
Or do the circuit manually, (with temp being a std_logic_vector of the same size as d1:
temp(0) <= d1(0);
gen: for i in 1 to n-1 generate
temp(i) <= temp(i-1) xor d1(i);
end generate;
result <= temp(n-1);

What does it mean?
You have concurrently assigned the signal temp at several locations in your code (and none of them is masked by an if ... generate). Each concurrent assignment makes up a driver.
The for generate repeats the concurrent statements inside the block for each value of i within the given range. Thus, for n = 8, your code is similar to:
temp <= d1(0);
temp <= d1(1) or d1(1-1);
temp <= d1(2) or d1(2-1);
temp <= d1(3) or d1(3-1);
temp <= d1(4) or d1(4-1);
temp <= d1(5) or d1(5-1);
temp <= d1(6) or d1(6-1);
temp <= d1(7) or d1(7-1);
Thus, you have connected 8 drivers to temp. The first is the input d1(0) and the others or the outputs of 7 OR gates.
The posted error message is reported only with the old parser of ISE for Spartan-3 like devices. You should switch to the new one. Right click on Synthesize -> Properties -> Synthesis Options. The set option "Other XST Command Line Options" to
-use_new_parser YES
Then, you get a more meaningful error message:
ERROR:HDLCompiler:636 - "/home/zabel/tmp/xor_reduction/e_circuit.vhdl" Line 23: Net is already driven by input port .
Line 23 is the one within the for ... generate statement.
How can I fix it?
At first, your code has to use an XOR instead of OR. ISE does not support the XOR reduction operator from VHDL'08, thus, you have to describe it manually. One solution is to use sequential statements within a process, which has also been suggested by #user1155120 in the comments:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity e_circuit is
generic(n : INTEGER := 8);
port(d1 : in std_logic_vector(n-1 downto 0);
result : out std_logic);
end e_circuit;
architecture structure of e_circuit is
begin
process(d1)
variable temp : std_logic;
begin
temp := d1(0);
for i in 1 to n-1 loop
temp := temp xor d1(i);
end loop;
result <= temp;
end process;
end structure;
Here, the variable temp is updated sequentially during the execution of the process (like in an imperative software programming language). The final value is then assigned to the signal result.
I have also omitted all VHDL packages which are not required.

Alternative method for creating low clock frequencies in VHDL

In the past I asked a question about resets, and how to divide a high clock frequency down to a series of lower clock square wave frequencies, where each output is a harmonic of one another e.g. the first output is 10 Hz, second is 20 Hz etc.
I received several really helpful answers recommending what appears to be the convention of using a clock enable pin to create lower frequencies.
An alternative since occurred to me; using a n bit number that is constantly incremented, and taking the last x bits of the number as the clock ouputs, where x is the number of outputs.
It works in synthesis for me - but I'm curious to know - as I've never seen it mentioned anywhere online or on SO, am I missing something that means its actually a terrible idea and I'm simply creating problems for later?
I'm aware that the limitations on this are that I can only produce frequencies that are the input frequency divided by a power of 2, and so most of the time it will only approximate the desired output frequency (but will still be of the right order). Is this limitation the only reason it isn't recommended?
Thanks very much!
David
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
library UNISIM;
use UNISIM.VComponents.all;
use IEEE.math_real.all;
ENTITY CLK_DIVIDER IS
GENERIC(INPUT_FREQ : INTEGER; --Can only divide the input frequency by a power a of 2
OUT1_FREQ : INTEGER
);
PORT(SYSCLK : IN STD_LOGIC;
RESET_N : IN STD_LOGIC;
OUT1 : OUT STD_LOGIC; --Actual divider is 2^(ceiling[log2(input/freq)])
OUT2 : OUT STD_LOGIC); --Actual output is input over value above
END CLK_DIVIDER;
architecture Behavioral of Clk_Divider is
constant divider : integer := INPUT_FREQ / OUT1_FREQ;
constant counter_bits : integer := integer(ceil(log2(real(divider))));
signal counter : unsigned(counter_bits - 1 downto 0) := (others => '0');
begin
proc : process(SYSCLK)
begin
if rising_edge(SYSCLK) then
counter <= counter + 1;
if RESET_N = '0' then
counter <= (others => '0');
end if;
end if;
end process;
OUT1 <= counter(counter'length - 1);
OUT2 <= not counter(counter'length - 2);
end Behavioral;

Functionally the two outputs OUT1 and OUT2 can be used as clocks, but that method of making clocks does not scale and is likely to cause problems in the implementation, so it is a bad habit. However, it is of course important to understand why this is so.
The reason it does not scale, is that every signal used as clock in a FPGA is to be distributed through a special clock net, where the latency and skew is well-defined, so all flip-flops and memories on each clock are updated synchronously. The number of such clock nets is very limited, usually in the range of 10 to 40 in a FPGA device, and some restrictions on use and location makes it typically even more critical to plan the use of clock nets. So it is typically required to reserve clock nets for only real asynchronous clocks, where there is no alternative than to use a clock net.
The reason it is likely to cause problems, is that clocks created based on bits in a counter have no guaranteed timing relation. So if it is required to moved data between these clock domains, it requires additional constrains for synchronization, in order to be sure that the Clock Domain Crossing (CDC) is handled correctly. This is done through constrains for synthesis and/or Static Timing Analysis (STA), and is usually a little tricky to get right, so using a design methodology that simplifies STA is habit that saves design time.
So in designs where it is possible to use a common clock, and then generate synchronous clock enable signals, this should be the preferred approach. For the specific design above, a clock enable can be generated simply by detecting the '0' to '1' transition of the relevant counter bit, and then assert the clock enable in the single cycle where the transition is detected. Then a single clock net can be used, together with 2 clock enables like CE1 and CE2, and no special STA constrains are required.

Morten already pointed out the theory in his answer.
With the aid of two examples, I will demonstrate the problems you encounter when using a generated clock instead of clock enables.
Clock Distribution
At first, one must take care that a clock arrives at (almost) the same time at all destination flip-flops. Otherwise, even a simple shift register with 2 stages like this one would fail:
process(clk_gen)
begin
if rising_edge(clk_gen) then
tmp <= d;
q <= tmp;
end if;
end if;
The intended behavior of this example is that q gets the value of d after two rising edges of the generated clock clock_gen.
If the generated clock is not buffered by a global clock buffer, then the delay will be different for each destination flip-flop because it will be routed via the general-purpose routing.
Thus, the behavior of the shift register can be described as follows with some explicit delays:
library ieee;
use ieee.std_logic_1164.all;
entity shift_reg is
port (
clk_gen : in std_logic;
d : in std_logic;
q : out std_logic);
end shift_reg;
architecture rtl of shift_reg is
signal ff_0_q : std_logic := '0'; -- output of flip-flop 0
signal ff_1_q : std_logic := '0'; -- output of flip-flop 1
signal ff_0_c : std_logic; -- clock input of flip-flop 0
signal ff_1_c : std_logic; -- clock input of flip-flop 1
begin -- rtl
-- different clock delay per flip-flop if general-purpose routing is used
ff_0_c <= transport clk_gen after 500 ps;
ff_1_c <= transport clk_gen after 1000 ps;
-- two closely packed registers with clock-to-output delay of 100 ps
ff_0_q <= d after 100 ps when rising_edge(ff_0_c);
ff_1_q <= ff_0_q after 100 ps when rising_edge(ff_1_c);
q <= ff_1_q;
end rtl;
The following test bench just feeds in a '1' at input d, so that, q should be '0' after 1 clock edge an '1' after two clock edges.
library ieee;
use ieee.std_logic_1164.all;
entity shift_reg_tb is
end shift_reg_tb;
architecture sim of shift_reg_tb is
signal clk_gen : std_logic;
signal d : std_logic;
signal q : std_logic;
begin -- sim
DUT: entity work.shift_reg port map (clk_gen => clk_gen, d => d, q => q);
WaveGen_Proc: process
begin
-- Note: registers inside DUT are initialized to zero
d <= '1'; -- shift in '1'
clk_gen <= '0';
wait for 2 ns;
clk_gen <= '1'; -- just one rising edge
wait for 2 ns;
assert q = '0' report "Wrong output" severity error;
wait;
end process WaveGen_Proc;
end sim;
But, the simulation waveform shows that q already gets '1' after the first clock edge (at 3.1 ns) which is not the intended behavior.
That's because FF 1 already sees the new value from FF 0 when the clock arrives there.
This problem can be solved by distributing the generated clock via a clock tree which has a low skew.
To access one of the clock trees of the FPGA, one must use a global clock buffer, e.g., BUFG on Xilinx FPGAs.
Data Handover
The second problem is the handover of multi-bit signals between two clock domains.
Let's assume we have 2 registers with 2 bits each. Register 0 is clocked by the original clock and register 1 is clocked by the generated clock.
The generated clock is already distributed by clock tree.
Register 1 just samples the output from register 0.
But now, the different wire delays for both register bits in between play an important role. These have been modeled explicitly in the following design:
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;
entity handover is
port (
clk_orig : in std_logic; -- original clock
d : in std_logic_vector(1 downto 0); -- data input
q : out std_logic_vector(1 downto 0)); -- data output
end handover;
architecture rtl of handover is
signal div_q : std_logic := '0'; -- output of clock divider
signal bufg_o : std_logic := '0'; -- output of clock buffer
signal clk_gen : std_logic; -- generated clock
signal reg_0_q : std_logic_vector(1 downto 0) := "00"; -- output of register 0
signal reg_1_d : std_logic_vector(1 downto 0); -- data input of register 1
signal reg_1_q : std_logic_vector(1 downto 0) := "00"; -- output of register 1
begin -- rtl
-- Generate a clock by dividing the original clock by 2.
-- The 100 ps delay is the clock-to-output time of the flip-flop.
div_q <= not div_q after 100 ps when rising_edge(clk_orig);
-- Add global clock-buffer as well as mimic some delay.
-- Clock arrives at (almost) same time on all destination flip-flops.
clk_gen_bufg : BUFG port map (I => div_q, O => bufg_o);
clk_gen <= transport bufg_o after 1000 ps;
-- Sample data input with original clock
reg_0_q <= d after 100 ps when rising_edge(clk_orig);
-- Different wire delays between register 0 and register 1 for each bit
reg_1_d(0) <= transport reg_0_q(0) after 500 ps;
reg_1_d(1) <= transport reg_0_q(1) after 1500 ps;
-- All flip-flops of register 1 are clocked at the same time due to clock buffer.
reg_1_q <= reg_1_d after 100 ps when rising_edge(clk_gen);
q <= reg_1_q;
end rtl;
Now, just feed in the new data value "11" via register 0 with this testbench:
library ieee;
use ieee.std_logic_1164.all;
entity handover_tb is
end handover_tb;
architecture sim of handover_tb is
signal clk_orig : std_logic := '0';
signal d : std_logic_vector(1 downto 0);
signal q : std_logic_vector(1 downto 0);
begin -- sim
DUT: entity work.handover port map (clk_orig => clk_orig, d => d, q => q);
WaveGen_Proc: process
begin
-- Note: registers inside DUT are initialized to zero
d <= "11";
clk_orig <= '0';
for i in 0 to 7 loop -- 4 clock periods
wait for 2 ns;
clk_orig <= not clk_orig;
end loop; -- i
wait;
end process WaveGen_Proc;
end sim;
As can be seen in the following simulation output, the output of register 1 toggles to an intermediate value of "01" at 3.1 ns first because the input of register 1 (reg_1_d) is still changing when the rising edge of the generated clock occurs.
The intermediate value was not intended and can lead to undesired behavior. The correct value is seen not until another rising edge of the generated clock.
To solve this issue, one can use:
special codes, where only one bit flips at a time, e.g., gray code, or
cross-clock FIFOs, or
handshaking with the help of single control bits.

Why it is necessary to use internal signal for process?

I'm learning VHDL from the root, and everything is OK except this. I found this from Internet. This is the code for a left shift register.
library ieee;
use ieee.std_logic_1164.all;
entity lsr_4 is
port(CLK, RESET, SI : in std_logic;
Q : out std_logic_vector(3 downto 0);
SO : out std_logic);
end lsr_4;
architecture sequential of lsr_4 is
signal shift : std_logic_vector(3 downto 0);
begin
process (RESET, CLK)
begin
if (RESET = '1') then
shift <= "0000";
elsif (CLK'event and (CLK = '1')) then
shift <= shift(2 downto 0) & SI;
end if;
end process;
Q <= shift;
SO <= shift(3);
end sequential;
My problem is the third line from bottom. My question is, why we need to pass the internal signal value to the output? Or in other words, what would be the problem if I write Q <= shift (2 downto 0) & SI?

In the case of the shown code, the Q output of the lsr_4 entity comes from a register (shift representing a register stage and being connected to Q). If you write the code as you proposed, the SI input is connected directly (i.e. combinationally) to the Q output. This can also work (assuming you leave the rest of the code in place), it will perform the same operation logically expect eliminate one clock cycle latency. However, it's (generally) considered good design practice to have an entity's output being registered in order to not introduce long "hidden" combinational paths which are not visible when not looking inside an entity. It usually makes designing easier and avoids running into timing problems.

First, this is just a shift register, so no combinational blocks should be inferred (except for input and output buffers, which are I/O related, not related to the circuit proper).
Second, the signal called "shift" can be eliminated altogether by specifying Q as "buffer" instead of "out" (this is needed because Q would appear on both sides of the expression; "buffer" has no side effects on the inferred circuit). A suggestion for your code follows.
Note: After compiling your code, check in the Netlist Viewers / Technology Map Viewer tool what was actually implemented.
library ieee;
use ieee.std_logic_1164.all;
entity generic_shift_register is
generic (
N: integer := 4);
port(
CLK, RESET, SI: in std_logic;
Q: buffer std_logic_vector(N-1 downto 0);
SO: out std_logic);
end entity;
architecture sequential of generic_shift_register is
begin
process (RESET, CLK)
begin
if (RESET = '1') then
Q <= (others => '0');
elsif rising_edge(CLK) then
Q <= Q(N-2 downto 0) & SI;
end if;
end process;
SO <= Q(N-1);
end architecture;

Binary serial adder - VHDL

I'm trying to design a 32bit binary serial adder in VHDL, using a structural description. The adder should make use of a full adder and a d-latch. The way I see it is:
Full adder:
architecture Behavioral of FullAdder is
begin
s <= (x xor y) xor cin;
cout <= (x and y) or (y and cin) or (x and cin);
end Behavioral;
D-Latch:
architecture Behavioral of dLatch is
begin
state: process(clk)
begin
if(clk'event and clk = '1') then
q <= d;
end if;
end process;
end Behavioral;
Serial adder:
add: process ( clk )
variable count : integer range 0 to 31;
variable aux : STD_LOGIC;
variable aux2 : STD_LOGIC;
begin
if(clk'event and clk = '1') then
fa: FullAdder port map(x(count), y(count), aux, s(count), aux2);
dl: dLatch port map(clock, aux2, aux);
count := count + 1;
end if;
end process;
However, it doesn't seem to work.
Also, what would be the simplest way to pipeline the serial adder?

"It doesn't seem to work" is pretty general, but one problem I see is that you are trying to instantiate the component fa: FullAdder within a process. Think about what component instantiation means in hardware, and you will realize that it makes no sense to instantiate the module on the rising_edge of clk...
Move the instantiation out of the process, and it should at least remove the syntax error you should be seeing ("Illegal sequential statement." in ModelSim).

For pipelining the serial adder, the best way is to connect the adders and d flip-flops one after the other. So, you would have the cout of the first adder be the input of a flip-flop. The output of that flip-flop will be the cin of the next adder and so on. Be careful though, because you will also have to pipeline the s of each adder, as well as each bit of the input, by essentially putting several d flip-flops in a row to copy them through the various pipeline stages.

Implementing an Accumulator in VHDL

I am trying to implement a signed accumulator using Core Gen in Xilinx. According to my understanding an accumulator performs the function of a normal register which is just routing the input to the output, but I wanted clarification on that.
I added the Accumulator IPcore (.xco) module to the project and I have a main file which basically contains the component declaration and the port map. I have a single step process too. Everything compiles and I can see the result on the board but don't quite understand what's going on...
When I input 1000 the 8 bit output on the LEDs is 11111000. Another input of 1111 gives me 11110111. I am attaching the code here for the main vhd file called Accm and the .vho file.
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
--use IEEE.NUMERIC_STD.ALL;
-- Uncomment the following library declaration if instantiating
-- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity Accm is
port( b: in std_logic_vector(3 downto 0);
sclr, clk, b1, b2 : in std_logic;
q : out std_logic_vector(7 downto 0)
);
end Accm;
architecture Behavioral of Accm is
-- signal declaration
type tell is (rdy,pulse,not_rdy);
signal d_n_s: tell;
signal en: std_logic;
-- component declaration
COMPONENT my_accm
PORT (
b : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
clk : IN STD_LOGIC;
sclr : IN STD_LOGIC;
q : OUT STD_LOGIC_VECTOR(7 DOWNTO 0)
);
END COMPONENT;
-- port map
begin
A1 : my_accm
PORT MAP (
b => b,
clk => en,
sclr => sclr,
q => q
);
process(clk)
begin
if clk'event and clk='1' then
case d_n_s is
when rdy => en <= '0';
if b1='1' then d_n_s <= pulse; end if;
when pulse => en <= '1';
d_n_s <= not_rdy;
when not_rdy => en <='0';
if b2='1' then d_n_s <= rdy; end if;
end case;
end if;
end process;
-- .VHO CODE
------------- Begin Cut here for COMPONENT Declaration ------ COMP_TAG
COMPONENT my_accm
PORT (
b : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
clk : IN STD_LOGIC;
sclr : IN STD_LOGIC;
q : OUT STD_LOGIC_VECTOR(7 DOWNTO 0)
);
END COMPONENT;
-- COMP_TAG_END ------ End COMPONENT Declaration ------------
-- The following code must appear in the VHDL architecture
-- body. Substitute your own instance name and net names.
------------- Begin Cut here for INSTANTIATION Template ----- INST_TAG
your_instance_name : my_accm
PORT MAP (
b => b,
clk => clk,
sclr => sclr,
q => q
);
end Behavioral;
I am also pasting an image of the accumualtor I generated in CoreGen.
I'D appreciate it if someone could explain me what is going on in this program. Thanks!

"Accumulator" can mean many things. In the hardware Xilinx library, the component you instantiated is an adder in front of a register. The adder is adding the current value of the accumulator register with the input term. The accumulator register is wider than the input so you can accumulate (add together) many input terms without overflowing the output.
When your circuit starts, the accumulator contains zero. You input 1000 (-8) which when added to zero becomes 11111000 (-8 sign extended) on the output. You then add 1111 (-1), and the output becomes 11110111 (-9 sign extended).
Once you are done "accumulating", assert SCLR to clear the accumulator register back to zero (or use SSET or SINIT, as appropriate for your logic).
This should all be covered by the documentation for the Xilinx library (try clicking the "datasheet" button in the corgen dialog).

Actually, I think I get it now. It's just acting like an Adder with signed inputs. I think I am correct on this but would appreciate any clarification!

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio