Parallel pipeline stage and 2:1 multiplexer - VHDL

I have a Spartan-3E FPGA and I'm implementing a (parallel) pipeline with 4 stages like this: http://i.imgur.com/6CQNk.png
The two "T3" stages are identical. T1, T2 and T4 run at 50 MHz, while the T3 stages run at 25 MHz (and are shifted by 180° relative to each other, as in the figure).
In behavioral simulation it works fine and the results are correct. The problem occurs when I try to implement this project on my FPGA. In particular, I receive these warnings (and of course the results produced are wrong):
WARNING:LIT:175 - Clock buffer is designated to drive clock loads. BUFGMUX
symbol "physical_group_clk_2/Clock_DCM/CLKFX_BUFG_INST" (output signal=clk_2)
has a mix of clock and non-clock loads. Some of the non-clock loads are
(maximum of 5 listed):
Pin I0 of pipeline/mux3/o<65>1
Pin I0 of pipeline/mux3/o<64>1
Pin I0 of pipeline/mux3/o<17>
Pin I0 of pipeline/mux3/o<18>
Pin I0 of pipeline/mux3/o<20>
WARNING:Route:455 - CLK Net:clk_2 may have excessive skew because
0 CLK pins and 66 NON_CLK pins failed to route using a CLK template.
Here "clk_2" is the 25 MHz clock.
This is "my multiplexer": stage_4_in <= stage_3_1_out when clk_2='1' else stage_3_2_out;
Basically, I can't drive the multiplexer select input with a clock signal. So how can I do it?
What I need is this: when the 25 MHz clock is high, the mux output has to be the top input; otherwise it has to be the bottom one. I couldn't figure out how to do this.
By the way, this is the DCM configuration:
CLK_FEEDBACK => "1X",
CLKDV_DIVIDE => 2.0,
CLKFX_DIVIDE => 4,
CLKFX_MULTIPLY => 2,
CLKIN_DIVIDE_BY_2 => FALSE,
CLKIN_PERIOD => 20.000,
CLKOUT_PHASE_SHIFT => "NONE",
DESKEW_ADJUST => "SYSTEM_SYNCHRONOUS",
DFS_FREQUENCY_MODE => "LOW",
DLL_FREQUENCY_MODE => "LOW",
DUTY_CYCLE_CORRECTION => TRUE,
FACTORY_JF => x"C080",
PHASE_SHIFT => 0,
STARTUP_WAIT => TRUE
Thanks in advance.

Eliminate the 25 MHz clock: run the whole design off the 50 MHz clock, with clock enables during the even and odd 50 MHz cycles for the stages you want to run at 25 MHz. (The current 25 MHz clock, regenerated as an ordinary registered signal, would probably serve as the clock enable signal.)
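A minimal sketch of the clock-enable idea, not the asker's actual design. The names pipeline_ce, clk_50, phase, WIDTH and stage_2_out are assumptions made for illustration (only stage_3_1_out, stage_3_2_out and stage_4_in appear in the question), and the real T3 logic is replaced by a plain register. The point is that the mux select comes from a flip-flop output rather than from a clock net, which is what the LIT:175 and Route:455 warnings complain about.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity pipeline_ce is
  generic (
    WIDTH : positive := 66  -- assumption: the warnings mention mux bits up to o<65>
  );
  port (
    clk_50      : in  std_logic;                           -- the only clock in the design
    stage_2_out : in  std_logic_vector(WIDTH-1 downto 0);  -- hypothetical output of T2
    stage_4_in  : out std_logic_vector(WIDTH-1 downto 0)
  );
end entity;

architecture rtl of pipeline_ce is
  -- Toggles every 50 MHz cycle, so it looks like a 25 MHz square wave
  -- but is an ordinary registered signal, not a clock.
  signal phase         : std_logic := '0';
  signal stage_3_1_out : std_logic_vector(WIDTH-1 downto 0) := (others => '0');
  signal stage_3_2_out : std_logic_vector(WIDTH-1 downto 0) := (others => '0');
begin
  process (clk_50)
  begin
    if rising_edge(clk_50) then
      phase <= not phase;
      if phase = '0' then
        stage_3_1_out <= stage_2_out;  -- placeholder for the first T3, enabled on even cycles
      else
        stage_3_2_out <= stage_2_out;  -- placeholder for the second T3, enabled on odd cycles
      end if;
    end if;
  end process;

  -- Same mux as in the question, with the select driven by a flip-flop.
  stage_4_in <= stage_3_1_out when phase = '1' else stage_3_2_out;
end architecture;
```

The polarity of phase in the enables and in the mux select has to be matched to the latency of the real T3 logic; the choice above is only one plausible arrangement.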

VHDL - access to 2D array of std_logic_vectors gives unexpected bus conflict

Background:
I'm working on a generic systolic array. I created an entity called "systolic_row" to create one row of the systolic array. The data is handed from one row to the next using an array of std_logic_vectors (type "row_exchange_array"). This 1D array is packed into another array whose size is the number of generated rows + 2 (type "exchange_array").
The reason for the +2 is as follows. The first entry of the 2D array holds the top inputs to the systolic array, which are always set to 0. The other additional entry exists because the bottom row of the systolic array consists of accumulators (entity "ACC"), in addition to the generated rows. The accumulators work on the same 2D array structure as the rows. The overall output of the systolic array is the last entry of this 2D array (a 1D array of std_logic_vectors, type "row_exchange_array").
An overview of the types used is in the figure below. Blue are the left inputs of the systolic array. Red is the 1D array that passes information from one row to the next. Black combines all red 1D arrays into a 2D array that handles all exchanges from top to bottom of the systolic array. (So all black lines are red lines at the same time.)
Problem:
The results in the 2D array are calculated correctly, but as soon as I access the last entry to extract the overall result and hand it to the output, I get bus conflicts that I cannot explain.
The Simulation was conducted with a ROW_WIDTH of 3 and only one row generated (NUM_ROWS = 1).
"row_exchange(0)" are the top inputs of the systolic array, which are always set to 0.
"row_exchange(1)" are the outputs of the only generated row and the inputs to the ACCs.
"row_exchange(2)" are the outputs of the ACCs and the overall output of the systolic array ("ys")
Here is the simulation result of the 2D array called "row_exchange" ("ys" is the output). The results are correct and as expected.
Here are the simulation results of the output ("ys") after accessing the 2D array "row_exchange". Suddenly there are conflicts on some bits.
VHDL code (only an excerpt, because the whole reproducible code is way too long):
All Generics are declared as Constants in a configuration file "config_pkg.vhd"
library ieee;
use ieee.std_logic_1164.all;
-- load configuration file (generics and data type declarations)
use work.config_pkg.all;
entity systolic_array is
    port (
        clk    : in std_logic;
        rstn   : in std_logic;
        -- "left side" inputs of the systolic array (arrays with size of number of rows, declared in the configuration VHDL file)
        xs     : in input_vector_array;
        coeffs : in input_vector_array;
        ies    : in input_vector_array;
        -- enable to indicate when valid input data is provided
        en     : in std_logic;
        -- output of systolic array (1D array of std_logic_vectors -> also declared in configuration file)
        ys     : out row_exchange_array
    );
end entity;
architecture behav of systolic_array is
    -- array to hand data from one row to the next
    -- JUST HERE FOR BETTER UNDERSTANDING, ACTUALLY DECLARED IN CONFIGURATION FILE BECAUSE IT IS THE SAME TYPE AS THE OUTPUT "ys"
    type row_exchange_array is array (0 to ROW_WIDTH-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
    -- 2D array to handle all exchanges in the systolic array (from row to row, ACCs and outputs)
    type exchange_array is array (0 to NUM_ROWS+1) of row_exchange_array;
    signal row_exchange : exchange_array;
begin
    -- first row gets all ys set to 0 as input
    row_exchange(0) <= (others => (others => '0'));
    -- last row is connected to output
    -- THIS IS WHERE THE PROBLEM SEEMS TO OCCUR
    ys <= row_exchange(NUM_ROWS+1);
    -- generation of the rows
    gen_rows : for j in 0 to NUM_ROWS-1 generate
        inst_row: systolic_row
            port map (
                clk   => clk,               -- clock
                i     => ies(j),            -- "left side" input to systolic array row
                coef  => coeffs(j),         -- "left side" input to systolic array row
                x     => xs(j),             -- "left side" input to systolic array row
                y_in  => row_exchange(j),   -- "top" input to systolic array row
                y_out => row_exchange(j+1)  -- "bottom" output of systolic array row
            );
    end generate gen_rows;
    -- generation of the accumulators as the last row of the systolic array
    gen_accs : for j in 0 to ROW_WIDTH-1 generate
        inst_acc: ACC
            port map (
                clk  => clk,                         -- clock
                rstn => rstn,                        -- reset (active low)
                en   => en_shift_regs(NUM_ROWS+j),   -- enable for the accumulator (signal declaration not shown here, because not part of the problem, I guess)
                T    => row_exchange(NUM_ROWS)(j),   -- input (std_logic_vector)
                y    => row_exchange(NUM_ROWS+1)(j)  -- output (std_logic_vector)
            );
    end generate gen_accs;
end architecture;
I already tried to access/write all std_logic_vectors of the outputs ("ys") individually using a for loop to see if the problem is the access of the 2D-array itself, but it does not seem to be.
I hope my explanation is clear enough. Thank you very much for your help and for trying to understand the complex code. If further Information is needed I am happy to provide it.
I was about to post a little toy example in which the same mistake occurs when I realized what the problem was.
The cause of the bus conflict was not in the design files but in the testbench, which I had hastily created for quick compilation checking.
The problem was that I was driving values onto the signals that were connected to the outputs of the UUT, so each of those signals had two drivers (see the commented-out part below, also only an excerpt).
-- Testbench signals to connect to UUT ports
signal clk : std_logic := '1';
signal rstn : std_logic := '0';
signal en : std_logic := '0';
signal xs : input_vector_array;
signal coeffs : input_vector_array;
signal ies : input_vector_array;
signal ys : row_exchange_array;
begin
-- ROW WIDTH AND NUMBER OF ROWS IS SET IN CONFIGURATION FILE
inst_UUT: systolic_array
generic map (
DATA_WIDTH => C_DATA_WIDTH
)
port map (
clk => clk,
rstn => rstn,
en => en,
xs => xs,
coeffs => coeffs,
ies => ies,
ys => ys
);
clk <= not clk after C_CLOCK_PERIOD/2;
-- THOSE ASSIGNMENTS WERE RESPONSIBLE FOR THE BUS CONFLICT
--ys(0) <= x"01";
--ys(1) <= x"00";
--ys(2) <= x"01";
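The effect can be reproduced in isolation. The following is a small sketch with made-up names (multi_driver_demo, y) and made-up values, not the original testbench: two concurrent assignments drive the same std_logic_vector, and the resolution function turns every bit on which the drivers disagree into 'X'.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity multi_driver_demo is
end entity;

architecture sim of multi_driver_demo is
  signal y : std_logic_vector(7 downto 0);
begin
  -- Driver 1: stands in for the UUT output.
  y <= x"03";
  -- Driver 2: stands in for the stray testbench assignment.
  y <= x"01";

  check : process
  begin
    wait for 1 ns;
    -- Bit 1 is driven to '1' and '0' at the same time and resolves to 'X';
    -- all other bits agree and keep their value.
    assert y = "000000X1"
      report "unexpected resolved value" severity error;
    wait;
  end process;
end architecture;
```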
Thank you very much for your help.

[Theory]: VHDL - For Loop possible or just a simple counter?

Let's say I have an input.
inputData : in std_logic_vector(63 downto 0)
At the moment I handle all 8 data bytes from inputData.
I have a fsm which processes the data.
For Example:
State A, State B, State C for processing inputData(7 downto 0)
again
State A, State B, State C for processing inputData(15 downto 8)
.
.
.
State A, State B, State C for processing inputData(63 downto 56)
At the end I go through State D, E and F...
Now I want to make this more flexible.
To make this possible I have now two inputs:
inputData : in std_logic_vector(63 downto 0)
dataLength: in std_logic_vector(3 downto 0)
Now it could be that I have only 3 Bytes of inputData that I want to handle.
For example:
State A, State B, State C for processing inputData(7 downto 0)
wait a little bit and then again
State A, State B, State C for processing inputData(15 downto 8)
wait a little bit and then again
State A, State B, State C for processing inputData(23 downto 16)
wait a little bit and then again
At the end I go through State D, E and F...
Should or can I use the For-Statement for this or should I use a simple counter?
Is it correct when I do the following when I want to use the For-Statement?
Pseudocode:
process (clk)
begin
    for I in 0 ... dataLength loop
        if (I not dataLength) then
            Go through State A, B, C with dataByte I
        else
            Go through State A, B, C with dataByte I
            and then through State D, E, F and then break ...
        end if
    end loop;
end process;
Yes, you can use a for loop in two ways:
If the data length is known at compile time (e.g. a generic or a constant), you can use the construct: for I in 0 to datalength-1 GENERATE
If the data length can change at runtime, then you need to use: for I in 0 to datalength-1 LOOP (keep in mind that synthesis tools generally expect static loop bounds, and that a loop inside a process is evaluated within a single clock cycle rather than stepping through FSM states)
Note that because I used "datalength-1", you don't need the check if(I not datalength), since the loop never reaches datalength itself.
I hope that answers your question.
The answer depends on what you want to achieve. I don't mean the general computation, but rather what cycle timing you need.
Solution 1:
Each computation requires one clock cycle and corresponds to one FSM state. You would then need as many groups of FSM states as there are bytes to process. That is not a very generic approach, because changing the input width requires more hard-coded states.
The better solution is a counter that provides:
a. the index used to select the input byte to process
b. the condition for leaving the current FSM state after all input bytes have been processed
Solution 2:
If all data can be processed in just one clock cycle (be careful about the resulting maximum delay), then you can use a for ... loop statement and unroll all computations in one clock cycle / FSM state.
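The counter approach from Solution 1 could be sketched roughly as follows. Only inputData and dataLength come from the question; the entity name byte_sequencer, the ports rst, start, dataByte and done, the state names and the assumption that dataLength holds a value from 1 to 8 are all made up for illustration, and the "wait a little bit" gaps are left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity byte_sequencer is
  port (
    clk        : in  std_logic;
    rst        : in  std_logic;                     -- synchronous, active high (assumption)
    start      : in  std_logic;                     -- hypothetical start strobe
    inputData  : in  std_logic_vector(63 downto 0);
    dataLength : in  std_logic_vector(3 downto 0);  -- number of valid bytes, assumed 1 to 8
    dataByte   : out std_logic_vector(7 downto 0);  -- byte currently being processed
    done       : out std_logic                      -- one-cycle pulse after State F
  );
end entity;

architecture rtl of byte_sequencer is
  type state_t is (IDLE, ST_A, ST_B, ST_C, ST_D, ST_E, ST_F);
  signal state : state_t := IDLE;
  signal idx   : natural range 0 to 7 := 0;  -- byte counter
begin
  -- (a) The counter is the index into the input data:
  --     shift the selected byte down to bits 7..0 and keep only those bits.
  dataByte <= std_logic_vector(resize(shift_right(unsigned(inputData), 8 * idx), 8));

  process (clk)
  begin
    if rising_edge(clk) then
      done <= '0';
      if rst = '1' then
        state <= IDLE;
        idx   <= 0;
      else
        case state is
          when IDLE =>
            idx <= 0;
            if start = '1' then
              state <= ST_A;
            end if;
          when ST_A => state <= ST_B;
          when ST_B => state <= ST_C;
          when ST_C =>
            -- (b) The counter is also the exit condition for the A-B-C loop.
            if idx = 7 or idx + 1 >= to_integer(unsigned(dataLength)) then
              state <= ST_D;
            else
              idx   <= idx + 1;
              state <= ST_A;
            end if;
          when ST_D => state <= ST_E;
          when ST_E => state <= ST_F;
          when ST_F =>
            done  <= '1';
            state <= IDLE;
        end case;
      end if;
    end if;
  end process;
end architecture;
```

With dataLength set to 3, the machine runs A, B, C three times with idx equal to 0, 1 and 2, and then continues through D, E and F.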

How to design a decoder that will have extra outputs?

For an application I am creating I would like to use a decoder that helps write to one of 42 registers. In order to account for all possible registers, I need a 6 bit input since the ceiling of lg(42) is 6.
However, this will create a 6 to 64 decoder, leaving me with an extra 12 outputs that I do not know how to handle. I know that in VHDL I can write a case statement for it:
case input is
when "000000" => output <= reg0;
when "000001" => output <= reg1;
.
.
.
when others => output <= ???;
end case;
Hopefully everything else will be designed so that an input > 41 does not occur, but how should the code be written to handle that case? Is there a way to handle it without somehow stopping the application? Or, as an alternative, is there a way to write a decoder that has only 42 outputs?
An easier way to write this is:
type regs_type is array (integer range <>) of std_logic_vector(7 downto 0);
signal regs : regs_type (0 to 41) := (others => (others => '0'));
...
output <= regs(to_integer(unsigned(input)));
Assuming 'input' is an std_logic_vector, and that your registers are 8-bits wide.
Then use the regs array for your registers 0-41. I suppose if you wanted to be explicit about registers 42+, you could create an array of size 64, and leave the upper elements unconnected, but I believe the above code would achieve the same thing.
If your registers actually have meaningful names, not just reg0 etc, you can have a separate block of code connecting these to the regs array, example:
regs(0) <= setup_reg;
regs(1) <= data_out;
and so on. If I was doing it this way, I would have defined constants for the regs index values, example:
constant SETUP_REG_ADDRESS : integer := 0;
constant DATA_OUT_ADDRESS : integer := 1;
...
regs(SETUP_REG_ADDRESS) <= setup_reg;
regs(DATA_OUT_ADDRESS) <= data_out;
Alternatively, if you wanted to keep the case statement, you could write your others clause as
when others => output <= (others => '-');
This 'don't care' value allows the tools to do whatever is the most efficient in these cases that you believe to be unreachable anyway. If you were concerned about something undefined being assigned to output if input somehow did exceed 41, you could always replace the '-' with a '0'.
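For the write side that the question asks about, the same array idea can be combined with an explicit range check, so that addresses 42 to 63 simply do nothing. This is a sketch under assumptions: the entity name reg_file and the ports clk, we, addr, wdata and rdata are invented here, and 8-bit registers are assumed as in the answer above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity reg_file is
  port (
    clk   : in  std_logic;
    we    : in  std_logic;                     -- write enable
    addr  : in  std_logic_vector(5 downto 0);  -- 6 bits, only 0 to 41 are used
    wdata : in  std_logic_vector(7 downto 0);
    rdata : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of reg_file is
  constant NUM_REGS : natural := 42;
  type regs_type is array (natural range <>) of std_logic_vector(7 downto 0);
  signal regs : regs_type(0 to NUM_REGS - 1) := (others => (others => '0'));
begin
  process (clk)
    variable idx : natural range 0 to 63;
  begin
    if rising_edge(clk) then
      idx := to_integer(unsigned(addr));
      -- Writes to addresses 42 to 63 are ignored, so no register is disturbed.
      if we = '1' and idx < NUM_REGS then
        regs(idx) <= wdata;
      end if;
    end if;
  end process;

  -- Reads from unused addresses return zero instead of indexing out of range.
  rdata <= regs(to_integer(unsigned(addr))) when to_integer(unsigned(addr)) < NUM_REGS
           else (others => '0');
end architecture;
```

The guard also keeps a simulator from stopping with an index-out-of-range error if an unused address ever shows up.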

VHDL Design - Clock

Can someone please help me with the following:
Design a digital circuit, using VHDL, to keep track of time in the form of HH:MM:SS. The circuit should produce 6 separate four-bit digital outputs (2 four-bit outputs for the HH, 2 for the MM, 2 for the SS). The HH can just be a 2-digit number in the range 00 to 99, i.e. it's not a clock, it's just a counter for hours, even though 99-hour tapes don't exist. The time is to be displayed on the 6 rightmost 7-segment displays of the DE2. You have already designed a 7-segment decoder and driver as part of a previous lab, so that can be used to convert each 4-bit output into a 7-bit signal for each 7-segment display. Don't forget to set up the pin planner for these displays (and all other signals).
The circuit should have the following single-bit inputs: a clock, an increment, a decrement and a reset. The increment/decrement inputs should cause the tape counter to add or subtract 1 second from the tape time on the next rising edge of the clock signal. If neither the increment nor the decrement input is present, the tape counter does not change. The reset is synchronous to the clock (to avoid glitches accidentally resetting it). The increment and decrement signals are active-high signals (i.e. a logic 1); the reset is active low (logic 0).
Your tape counter should handle full hour, minute and second rollover, e.g. if the counter is showing 9:59:59, then the next increment should make it display 10:00:00, and vice versa when decrement is present.
Rather than solving your homework, I'd like to give you an idea. Most designers will tend to implement this clock using digit-by-digit rollover (some digits will roll over from 9 to 0, others from 5 to 0). I'd like to propose something different.
The overall idea is: keep your time value in seconds as an integer. This will greatly facilitate the tasks of incrementing and decrementing. Then, you simply implement a conversion function that returns the number of hours, minutes, and seconds, given an integer number of seconds.
Your clock entity would look like this:
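library ieee;
use ieee.std_logic_1164.all;
use work.clock_pkg.all;

entity clock is
    port (
        clock: in std_logic;
        n_reset: in std_logic;
        increment: in std_logic;
        decrement: in std_logic;
        hours: out natural range 0 to 99;
        minutes: out natural range 0 to 59;
        seconds: out natural range 0 to 59
    );
end;

architecture rtl of clock is
    constant MAX_SECONDS: natural := 359999; -- 99:59:59
    signal time_in_seconds: natural range 0 to MAX_SECONDS;
begin
    -- The reset is synchronous, so only the clock is in the sensitivity list.
    process (clock) begin
        if rising_edge(clock) then
            if n_reset = '0' then
                time_in_seconds <= 0;
            elsif increment = '1' then
                -- Wrap from 99:59:59 to 00:00:00 instead of leaving the range.
                if time_in_seconds = MAX_SECONDS then
                    time_in_seconds <= 0;
                else
                    time_in_seconds <= time_in_seconds + 1;
                end if;
            elsif decrement = '1' then
                -- Wrap from 00:00:00 to 99:59:59.
                if time_in_seconds = 0 then
                    time_in_seconds <= MAX_SECONDS;
                else
                    time_in_seconds <= time_in_seconds - 1;
                end if;
            end if;
        end if;
    end process;

    process (time_in_seconds) begin
        (hours, minutes, seconds) <= seconds_to_time_type(time_in_seconds);
    end process;
end;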
As you can imagine, the workhorse of this solution is the seconds_to_time_type() function. You could implement it like this:
package clock_pkg is
    type time_type is record
        hours: natural range 0 to 99;
        minutes, seconds: natural range 0 to 59;
    end record;
    function seconds_to_time_type(seconds: in natural) return time_type;
end;

package body clock_pkg is
    function seconds_to_time_type(seconds: in natural) return time_type is
        variable hh: natural range 0 to 99;
        variable mm: natural range 0 to 119;
        variable ss: natural range 0 to 119;
    begin
        hh := seconds / 3600;
        mm := (seconds mod 3600) / 60;
        ss := (seconds mod 3600) mod 60;
        return (hh, mm, ss);
    end;
end;
Now you have an entity that outputs separate integer values for hours, minutes, and seconds. Converting those values from integers to BCD, and showing those values on the displays is left as an exercise to the reader.
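As a hint for that exercise, one possible way to split a two-digit value into its BCD digits is sketched below. The package name bcd_pkg, the subtype bcd_digit and the functions tens_digit and ones_digit are made up for this example and are not part of the code above; division and modulo by a constant are usually accepted by synthesis tools, but that should be checked for the tool in use.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package bcd_pkg is
  subtype bcd_digit is std_logic_vector(3 downto 0);
  -- Tens digit of a value between 0 and 99, as a 4-bit vector.
  function tens_digit(value : natural range 0 to 99) return bcd_digit;
  -- Ones digit of a value between 0 and 99, as a 4-bit vector.
  function ones_digit(value : natural range 0 to 99) return bcd_digit;
end package;

package body bcd_pkg is
  function tens_digit(value : natural range 0 to 99) return bcd_digit is
  begin
    return std_logic_vector(to_unsigned(value / 10, 4));
  end function;

  function ones_digit(value : natural range 0 to 99) return bcd_digit is
  begin
    return std_logic_vector(to_unsigned(value mod 10, 4));
  end function;
end package body;
```

For example, tens_digit(59) gives "0101" and ones_digit(59) gives "1001", which can then be fed to the existing 7-segment decoder.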
The typical way of implementing a counting clock is using binary coded decimal (BCD), where each digit consists of a separate n-bit counter, with a range as needed.
For example, in order to count seconds (from 0-59), you could use something like the following code:
process(clk, reset) begin
    if(reset='1') then
        second_tens <= (others=>'0');
        second_ones <= (others=>'0');
    elsif(rising_edge(clk)) then
        if(count_en='1') then
            if(second_ones = 9) then
                second_ones <= (others=>'0');
                if(second_tens = 5) then
                    second_tens <= (others=>'0');
                    -- Count up minutes.
                else
                    second_tens <= second_tens + 1;
                end if;
            else
                second_ones <= second_ones + 1;
            end if;
        end if;
    end if;
end process;
Minutes and hours can be counted analogously.
You have skipped a step. You are trying to think about code with just a worded problem statement. The first step is to design the hardware by drawing a block diagram. Break the problem down into pieces.
The initial partitioning might be seconds, minutes, and hours. If you are counting in BCD, you may wish to partition it further, digit by digit. Work out what your hardware is supposed to do. Draw a picture. Write code that describes what is in the picture.
At the end of the day, your RTL block diagram is your HDL flow chart.

I want to connect two signals to one input in VHDL

I want to have two signals (overflow1 and set1) drive one input (tick).
counter2 : counter
    generic map (border => 5, width => 4)
    port map (RST => RST,
              tick => overflow1 [...] set1, -- overflow1 and set1 are these signals
              enable => SW0,
              x => count2,
              overflow => overflow2);
So I want to fill the gap there. I hope you can understand my problem.
Thanks
Assuming that tick is an input port, and that overflow1 and set1 are std_logic, then in VHDL-2008 you can write the expression overflow1 or set1 directly in the port map.
In previous VHDL versions, like VHDL-2002 and before, you must create an internal temporary signal, e.g. temp <= overflow1 or set1, and use that to drive the port.
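A sketch of the temporary-signal variant, which works in every VHDL version. The wrapper entity tick_or_top, the signal name tick2 and the port types in the component declaration are assumptions; only the port names, the generics and the instance come from the question. The component is declared but no matching entity is provided here, so the asker's own counter has to be compiled alongside for it to be bound.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity tick_or_top is
  port (
    RST, SW0        : in  std_logic;
    overflow1, set1 : in  std_logic;
    count2          : out std_logic_vector(3 downto 0);
    overflow2       : out std_logic
  );
end entity;

architecture rtl of tick_or_top is
  -- Assumed interface of the asker's counter; the port types are guesses.
  component counter is
    generic (border : natural; width : positive);
    port (
      RST      : in  std_logic;
      tick     : in  std_logic;
      enable   : in  std_logic;
      x        : out std_logic_vector(width - 1 downto 0);
      overflow : out std_logic
    );
  end component;

  signal tick2 : std_logic;  -- temporary signal that combines both sources
begin
  tick2 <= overflow1 or set1;

  counter2 : counter
    generic map (border => 5, width => 4)
    port map (RST      => RST,
              tick     => tick2,  -- in VHDL-2008: tick => overflow1 or set1
              enable   => SW0,
              x        => count2,
              overflow => overflow2);
end architecture;
```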
