I want to have two signals (overflow1 and set1) for one input (tick).
counter2 : counter
generic map (border => 5, width => 4)
port map (RST => RST,
tick => overflow1 [...] set1, -- overflow1 and set1 are these signals
enable => SW0,
x => count2,
overflow => overflow2);
So I want to fill the gap there. I hope you can understand my problem.
Thanks
Assuming that tick is an input port, and overflow1 and set1 are std_logic, then in VHDL-2008 you can do overflow1 or set1.
In previous VHDL versions, like VHDL-2002 and before, you must make an internal temporary signal like temp <= overflow1 or set1, and use that to drive the port.
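A minimal sketch of the pre-2008 workaround, reusing the names from the question. The wrapper entity `top`, the temporary signal name `tick2`, and the port types in the component declaration of `counter` are assumptions made only so the example is complete; the original `counter` entity is not shown in the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical top level; only the handling of "tick" matters here.
entity top is
  port (
    RST, SW0, overflow1, set1 : in  std_logic;
    count2                    : out std_logic_vector(3 downto 0);
    overflow2                 : out std_logic
  );
end entity top;

architecture rtl of top is
  -- Assumed declaration of the question's counter (port types are guesses).
  component counter is
    generic (border : natural; width : positive);
    port (
      RST      : in  std_logic;
      tick     : in  std_logic;
      enable   : in  std_logic;
      x        : out std_logic_vector(width - 1 downto 0);
      overflow : out std_logic
    );
  end component;

  signal tick2 : std_logic;  -- temporary signal (name is an assumption)
begin
  -- Combine the two sources outside the port map.
  tick2 <= overflow1 or set1;

  counter2 : counter
    generic map (border => 5, width => 4)
    port map (RST      => RST,
              tick     => tick2,  -- VHDL-2008 also accepts: overflow1 or set1
              enable   => SW0,
              x        => count2,
              overflow => overflow2);
end architecture rtl;
```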
I want to approximate the tanh function by storing its values in a LUT (which amounts to a quantization). I want to be able to choose the number of entries in the LUT.
As a not-yet-correct example, I imagine code like
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
use ieee.fixed_pkg.all;
entity tanh_lut is
generic (
MIN_RANGE: real := 0.0; -- Minimum value of x
MAX_RANGE: real := 5.0; -- Maximum value of x
DATA_RANGE_int: positive:= 8;
DATA_RANGE_frac: positive:= 8;
);
Port ( DIN : in sfixed(DATA_RANGE_int-1 downto -(DATA_RANGE_frac-1));
DOUT : out sfixed(DATA_RANGE_int-1 downto -(DATA_RANGE_frac-1))
end tanh_lut;
architecture Behavioral of tanh_lut is
begin
lut_gen: for i in 0 to LUT_SIZE-1 generate
constant x_val : real := MIN_RANGE + (MAX_RANGE - MIN_RANGE) * i / (LUT_SIZE-1);
constant x_val_next : real := MIN_RANGE + (MAX_RANGE - MIN_RANGE) * (i+1) / (LUT_SIZE-1);
constant y_val : real := tanh(x_val);
if DIN>=x_val_previous AND DIN<x_val then
DOUT <= to_sfixed(tanh(y_val),DOUT ) ;
END IF
end generate;
end Behavioral;
For example, if I want 4 entries in the range 0 to 3, I want it to synthesize code like:
if DIN>0 AND DIN<=1 then
DOUT <= to_sfixed(0, DOUT);
elsif DIN>1 AND DIN<=2 then
DOUT <= to_sfixed(0.76159415595, DOUT);
elsif DIN>2 AND DIN<=3 then
DOUT <= to_sfixed(0.96402758007, DOUT);
elsif DIN>3 AND DIN<=4 then
DOUT <= to_sfixed(0.99505475368, DOUT);
end if;
Is there any way that a code like this or a code which implements the idea behind this is possible?
A simple LUT with addresses is not possible because the addresses are always integers and DIN is fixed point, e.g., 1.5.
The other possibility would be two LUTs, one for mapping the input to an address and another for mapping the address to the LUT entry, e.g., LUT1: 1.5 => address 5, LUT2: address 5 => 0.90. But this would double the amount of resources, which I don't want.
My requirements: functions like tanh(x) should not be synthesized, only the final value of tanh(x). It should also be hardware efficient.
It does not matter whether you use a nested "if-elsif" construct or a separate "if" construct for each check.
So you can create a loop like this:
for i in 0 to c_number_of_checks-1 loop
if c_boundaries(i)<DIN and DIN<=c_boundaries(i+1) then
DOUT <= c_output_values(i);
end if;
end loop;
Of course you must provide the constants c_number_of_checks and c_boundaries, c_output_values. This can be done by:
constant c_number_of_checks : natural := 4;
type array_of_your_data_type is array (natural range <>) of your_data_type;
constant c_boundaries : array_of_your_data_type(c_number_of_checks downto 0) := init_c_boundaries(c_number_of_checks);
constant c_output_values : array_of_your_data_type(c_number_of_checks-1 downto 0) := init_c_output_values(c_number_of_checks);
This means you will need the functions init_c_boundaries, init_c_output_values, which create arrays of values, which can initialize the constant c_boundaries and c_output_values.
But this is not complicated (you can use the TANH function from ieee.math_real), because the functions do not need to be synthesizable: they are evaluated only at compile time.
As you can see, this takes some effort. So perhaps it is easier to follow the other suggestions. If you do so (value as address of a LUT) you should think about automatic ROM inference, which is provided by several tool chains and will give you a very efficient (small) hardware.
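To make the pieces above concrete, here is one way the constants and init functions could be put together. This is only a sketch under assumptions: the package name `tanh_lut_pkg`, the type names `data_t` and `data_array_t`, the fixed-point format `sfixed(7 downto -8)` and the range 0.0 to 4.0 are all made up for illustration, and it needs VHDL-2008 for `ieee.fixed_pkg`. Each interval outputs tanh of its lower boundary, as in the question's 4-entry example.

```vhdl
library ieee;
use ieee.math_real.all;
use ieee.fixed_pkg.all;  -- VHDL-2008

package tanh_lut_pkg is
  constant c_number_of_checks : natural := 4;
  constant c_min : real := 0.0;           -- assumed input range
  constant c_max : real := 4.0;
  subtype data_t is sfixed(7 downto -8);  -- assumed fixed-point format
  type data_array_t is array (natural range <>) of data_t;

  function init_c_boundaries (n : natural) return data_array_t;
  function init_c_output_values (n : natural) return data_array_t;
end package tanh_lut_pkg;

package body tanh_lut_pkg is
  -- n+1 equally spaced interval boundaries from c_min to c_max
  function init_c_boundaries (n : natural) return data_array_t is
    variable r : data_array_t(n downto 0);
  begin
    for i in 0 to n loop
      r(i) := to_sfixed(c_min + (c_max - c_min) * real(i) / real(n),
                        data_t'high, data_t'low);
    end loop;
    return r;
  end function;

  -- tanh evaluated at the lower boundary of each interval; TANH runs only
  -- while the constant is initialised, so it is never synthesized
  function init_c_output_values (n : natural) return data_array_t is
    variable r : data_array_t(n - 1 downto 0);
  begin
    for i in 0 to n - 1 loop
      r(i) := to_sfixed(tanh(c_min + (c_max - c_min) * real(i) / real(n)),
                        data_t'high, data_t'low);
    end loop;
    return r;
  end function;
end package body tanh_lut_pkg;

library ieee;
use ieee.fixed_pkg.all;
use work.tanh_lut_pkg.all;

entity tanh_lut is
  port (DIN : in data_t; DOUT : out data_t);
end entity tanh_lut;

architecture rtl of tanh_lut is
  constant c_boundaries : data_array_t(c_number_of_checks downto 0)
    := init_c_boundaries(c_number_of_checks);
  constant c_output_values : data_array_t(c_number_of_checks - 1 downto 0)
    := init_c_output_values(c_number_of_checks);
begin
  process (DIN)
  begin
    -- default for inputs outside all intervals (avoids an inferred latch)
    DOUT <= to_sfixed(0.0, data_t'high, data_t'low);
    for i in 0 to c_number_of_checks - 1 loop
      if c_boundaries(i) < DIN and DIN <= c_boundaries(i + 1) then
        DOUT <= c_output_values(i);
      end if;
    end loop;
  end process;
end architecture rtl;
```

The default assignment at the top of the process is my addition; what should happen for inputs outside the range (for example saturating to the last entry) is a design decision the question leaves open.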
Background:
I'm working on a generic systolic array. I created an entity called "systolic_row" to create one row of the systolic array. The data is handed from one row to the next using an array of std_logic_vectors (type "row_exchange_array"). This 1D array is packed into another array with the size of all generated rows + 2 (type "exchange_array").
The reason for the + 2 is as follows. The first entry of the 2D array holds the top inputs to the systolic array, which are always set to 0. The other additional entry exists because the bottom row of the systolic array consists of accumulators (entity "ACC"), in addition to the generated rows. The accumulators work on the same 2D-array structure as the rows. The overall output of the systolic array is the last entry of this 2D array (a 1D array of std_logic_vectors, type "row_exchange_array").
Overview of the used types in figure below. Blue are the left inputs of the systolic array. Red is the 1D-array, that passes information from one row to the next. Black unifies all red 1D-arrays to a 2D-array that handles all exchanges from top to bottom of systolic array. (So all black lines are red lines at the same time.)
Problem:
The results in the 2D array are calculated correctly, but as soon as I access the last entry to extract the overall result and hand it to the output, I get bus conflicts that I cannot explain.
The Simulation was conducted with a ROW_WIDTH of 3 and only one row generated (NUM_ROWS = 1).
"row_exchange(0)" are the top inputs of the systolic array, which are always set to 0.
"row_exchange(1)" are the outputs of the only generated row and the inputs to the ACCs.
"row_exchange(2)" are the outputs of the ACCs and the overall output of the systolic array ("ys")
Here the simulation result of the 2D array called "row_exchange" ("ys" is the output). Here the results are correct and as expected.
The simulation results of the output ("ys") after accessing the 2D-array "row_exchange". Suddenly there are conflicts on some bits
VHDL code (only an excerpt, because the whole reproducible code is way too long):
All Generics are declared as Constants in a configuration file "config_pkg.vhd"
-- load configuration file (Generics and data type declaration )
use work.config_pkg.all;
entity systolic_array is
port (
clk : in std_logic;
rstn : in std_logic;
-- "left side" inputs of the systolic array (arrays with size of number of rows, declared in configuration VHDL file)
xs : in input_vector_array;
coeffs : in input_vector_array;
ies : in input_vector_array;
-- enable to declare when valid input data is provided
en : in std_logic;
-- output of systolic array (1D array of std_logic_vectors -> also declared in configuration file)
ys : out row_exchange_array
);
end entity;
architecture behav of systolic_array is
-- array to handle data from one row to the next
-- JUST HERE FOR BETTER UNDERSTANDING, ACTUALLY DECLARED IN CONFIGURATION FILE BECAUSE IT IS THE SAME TYPE AS THE OUTPUT "ys"
type row_exchange_array is array (0 to ROW_WIDTH-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
-- 2D array to handle all exchanges in the systolic array (from row to row, ACCs and outputs)
type exchange_array is array (0 to NUM_ROWS+1) of row_exchange_array;
signal row_exchange : exchange_array;
begin
-- first row gets all ys set to 0 as input
row_exchange(0) <= (others => (others => '0'));
-- last row is connected to output
-- THIS IS WHERE THE PROBLEM SEEMS TO OCCUR
ys <= row_exchange(NUM_ROWS+1);
-- generation of the rows
gen_rows : for j in 0 to NUM_ROWS-1 generate
inst_row: systolic_row
port map (
clk => clk, -- clock
i => ies(j), -- "left side" input to systolic array row
coef => coeffs(j), -- "left side" input to systolic array row
x => xs(j), -- "left side" input to systolic array row
y_in => row_exchange(j), -- "top" input to systolic array row
y_out => row_exchange(j+1) -- "bottom" output of systolic array row
);
end generate gen_rows;
-- generation of the accumulators as the last row of the systolic array
gen_accs : for j in 0 to ROW_WIDTH-1 generate
inst_acc: ACC
port map (
clk => clk, -- clock
rstn => rstn, -- reset (low active)
en => en_shift_regs(NUM_ROWS+j), -- enable for the accumulator (signal declaration not shown here, because not part of the problem, I guess)
T => row_exchange(NUM_ROWS)(j), -- input (std_logic_vector)
y => row_exchange(NUM_ROWS+1)(j) -- output (std_logic_vector)
);
end generate gen_accs;
end architecture;
I already tried to access/write all std_logic_vectors of the outputs ("ys") individually using a for loop to see if the problem is the access of the 2D-array itself, but it does not seem to be.
I hope my explanation is clear enough. Thank you very much for your help and for trying to understand the complex code. If further information is needed I am happy to provide it.
I was about to post a little toy example where the same mistake occurs, when I realized the problem.
The cause of the bus conflict was not in the design files, but in the testbench, which I hastily created for quick compilation checking.
The problem was that I was pushing values into the signals that were connected to the outputs of the UUT (see the commented part below, also only an excerpt).
-- Testbench signals to connect to UUT ports
signal clk : std_logic := '1';
signal rstn : std_logic := '0';
signal en : std_logic := '0';
signal xs : input_vector_array;
signal coeffs : input_vector_array;
signal ies : input_vector_array;
signal ys : row_exchange_array;
begin
-- ROW WIDTH AND NUMBER OF ROWS IS SET IN CONFIGURATION FILE
inst_UUT: systolic_array
generic map (
DATA_WIDTH => C_DATA_WIDTH
)
port map (
clk => clk,
rstn => rstn,
en => en,
xs => xs,
coeffs => coeffs,
ies => ies,
ys => ys
);
clk <= not clk after C_CLOCK_PERIOD/2;
-- THOSE ASSIGNMENTS WERE RESPONSIBLE FOR THE BUS CONFLICT
--ys(0) <= x"01";
--ys(1) <= x"00";
--ys(2) <= x"01";
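For anyone who wants to reproduce the effect in isolation, here is a small sketch (entity and signal names are made up, and the values are arbitrary). Two concurrent assignments to the same std_logic_vector create two drivers, and the std_logic resolution function yields 'X' only on the bits where the drivers disagree, which would explain why just some bits showed a conflict.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity two_drivers_tb is
end entity two_drivers_tb;

architecture sim of two_drivers_tb is
  signal y : std_logic_vector(7 downto 0);
begin
  -- Driver 1: stands in for the output port of the UUT
  y <= x"03";
  -- Driver 2: stands in for the stray assignment in the testbench
  y <= x"01";

  check : process
  begin
    wait for 1 ns;
    -- bit 0: '1' and '1' resolve to '1'
    -- bit 1: '1' and '0' resolve to 'X' (conflict)
    -- bits 7..2: '0' and '0' resolve to '0'
    assert y = "000000X1"
      report "unexpected resolved value" severity error;
    wait;
  end process check;
end architecture sim;
```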
Thank you very much for your help.
I'm trying to create a shift register by using multiplication (*2) to shift the bits by one position.
However, when I do it, ISE (the Xilinx IDE) tells me that this expression has twice the number of elements the original signal has.
To be specific, I have:
if rising_edge(clk) then
registro <= unsigned(sequence);
registro <= registro * 2;
-- Just adds into the last position the new bit, Sin (signal input)
registro <= registro or (Sin, others => '0');
sequence <= std_logic_vector(registro);
end if;
And earlier I declared:
signal Sin : std_logic;
signal sequence : std_logic_vector(0 to 14) := "100101010000000";
signal registro : unsigned (0 to 14);
So I'm getting the error (at multiplication line):
Expression has 30 elements ; expected 15
So why does it create a vector twice the size if I've only multiplied by 2?
What am I missing? How can I accomplish it?
Thank you in advance
Word width grows because you have used multiplication.
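Multiplying two 16-bit unsigned numbers gives you a 32-bit unsigned, in general. Here, numeric_std converts the integer 2 to an unsigned of the same length as registro (15 elements), so the product has 15 + 15 = 30 elements.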
Now it would be possible to optimise your specific case of multiplication by a constant, 2, and have synthesis do the correct thing. In which case the error message would change to
Expression has 16 elements ; expected 15
but why should the synthesis tool bother?
Use a left shift instead, either using a left (right?) shift operator, or explicit slicing and concatenation, for example:
registro <= registro(1 to registro'length-1) & '0';
Incidentally:
Using an ascending bit order range is quite unconventional for arithmetic: all I can say is good luck with that...
You have three assignments to the same signal within the same process; only the last one will take effect. (See "Is process in VHDL reentrant?" for some information on the semantics of signal assignment.)
If you declared "sequence" as unsigned in the first place you'd save a lot of unnecessary conversions and the code inside the process would reduce to a single statement, something like
sequence <= ('0' & sequence(0 to sequence'length-2)) or
(0 => Sin, others => '0') when rising_edge(clk);
I am utterly unfamiliar with "wrong way round" arithmetic so I cannot vouch that the shifts actually do what you want.
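For comparison, a sketch of the same idea with the conventional descending range, using slicing and concatenation only. The entity name, the output port `q` and the signal name `shift_q` are assumptions; I avoided the name `sequence` because it is a reserved word in VHDL-2008. The register shifts left by one (the "times 2") and takes Sin in at the least significant bit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity shift_reg is
  port (
    clk : in  std_logic;
    Sin : in  std_logic;
    q   : out std_logic_vector(14 downto 0)
  );
end entity shift_reg;

architecture rtl of shift_reg is
  -- same initial value as in the question, but with a "downto" range
  signal shift_q : std_logic_vector(14 downto 0) := "100101010000000";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- drop the MSB, move everything up by one, insert Sin at the LSB
      shift_q <= shift_q(13 downto 0) & Sin;
    end if;
  end process;

  q <= shift_q;
end architecture rtl;
```

Because there is a single assignment per clock edge, the question of which of several assignments "wins" does not arise.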
For an application I am creating I would like to use a decoder that helps write to one of 42 registers. In order to account for all possible registers, I need a 6 bit input since the ceiling of lg(42) is 6.
However, this will create a 6 to 64 decoder, leaving me with an extra 12 outputs that I do not know how to handle. I know that in VHDL I can write a case statement for it:
case input is
when "000000" => output <= reg0;
when "000001" => output <= reg1;
.
.
.
when others => output <= ???;
end case;
Hopefully everything else will be designed so that an input > 41 does not occur, but how should the code be written to handle that case? Is there a way to handle it without stopping the application somehow? Or, as an alternative, is there a way to write a decoder that has only 42 outputs?
An easier way to write this is:
type regs_type is array (integer range <>) of std_logic_vector(7 downto 0);
signal regs : regs_type (0 to 41) := (others => (others => '0'));
...
output <= regs(to_integer(unsigned(input)));
Assuming 'input' is a std_logic_vector, and that your registers are 8 bits wide.
Then use the regs array for your registers 0-41. I suppose if you wanted to be explicit about registers 42+, you could create an array of size 64, and leave the upper elements unconnected, but I believe the above code would achieve the same thing.
If your registers actually have meaningful names, not just reg0 etc, you can have a separate block of code connecting these to the regs array, example:
regs(0) <= setup_reg;
regs(1) <= data_out;
and so on. If I was doing it this way, I would have defined constants for the regs index values, example:
constant SETUP_REG_ADDRESS : integer := 0;
constant DATA_OUT_ADDRESS : integer := 1;
...
regs(SETUP_REG_ADDRESS) <= setup_reg;
regs(DATA_OUT_ADDRESS) <= data_out;
Alternatively, if you wanted to keep the case statement, you could write your others clause as
when others => output <= (others => '-');
This 'don't care' value allows the tools to do whatever is the most efficient in these cases that you believe to be unreachable anyway. If you were concerned about something undefined being assigned to output if input somehow did exceed 41, you could always replace the '-' with a '0'.
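If you go with the array version and want the out-of-range case to be explicit, one option is to guard the index. In simulation, indexing a 0 to 41 array with a value of 42 to 63 is a range error, so the guard also keeps the simulator from stopping. A sketch, where the package `regs_pkg`, the entity name and passing the registers in as a port are assumptions for the sake of a complete example:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package regs_pkg is
  type regs_type is array (integer range <>) of std_logic_vector(7 downto 0);
end package regs_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.regs_pkg.all;

entity reg_read_mux is
  port (
    regs   : in  regs_type(0 to 41);
    input  : in  std_logic_vector(5 downto 0);
    output : out std_logic_vector(7 downto 0)
  );
end entity reg_read_mux;

architecture rtl of reg_read_mux is
begin
  process (regs, input)
    variable idx : natural range 0 to 63;
  begin
    idx := to_integer(unsigned(input));
    if idx <= regs'high then
      output <= regs(idx);        -- addresses 0 to 41
    else
      output <= (others => '0');  -- addresses 42 to 63; '-' would also work
    end if;
  end process;
end architecture rtl;
```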
I have a Spartan-3E FPGA and I'm implementing a (parallel) pipeline with 4 stages like this: http://i.imgur.com/6CQNk.png
The two stages "T3" are the same. T1, T2 and T4 "run" at 50MHz, while T3 runs at 25MHz (and 180° shifted like in the figure).
In behavioral simulation it works fine and the results are correct. The problem occurs when I try to synthesize this project for my FPGA. In particular I receive these warnings (and of course the results produced are wrong):
WARNING:LIT:175 - Clock buffer is designated to drive clock loads. BUFGMUX
symbol "physical_group_clk_2/Clock_DCM/CLKFX_BUFG_INST" (output signal=clk_2)
has a mix of clock and non-clock loads. Some of the non-clock loads are
(maximum of 5 listed):
Pin I0 of pipeline/mux3/o<65>1
Pin I0 of pipeline/mux3/o<64>1
Pin I0 of pipeline/mux3/o<17>
Pin I0 of pipeline/mux3/o<18>
Pin I0 of pipeline/mux3/o<20>
WARNING:Route:455 - CLK Net:clk_2 may have excessive skew because
0 CLK pins and 66 NON_CLK pins failed to route using a CLK template.
Where "clk_2" is the CLOCK 25MHz.
This is "my multiplexer": stage_4_in <= stage_3_1_out when clk_2='1' else stage_3_2_out;
Basically I can't drive the Multiplexer Selection with a clock signal. So, how can I do it?
I have to do this: if CLOCK 25MHz is high the mux output has to be the top one; otherwise it has to be the second one (bottom). I couldn't figure out how to do this.
By the way, this is the DCM configuration:
CLK_FEEDBACK => "1X",
CLKDV_DIVIDE => 2.0,
CLKFX_DIVIDE => 4,
CLKFX_MULTIPLY => 2,
CLKIN_DIVIDE_BY_2 => FALSE,
CLKIN_PERIOD => 20.000,
CLKOUT_PHASE_SHIFT => "NONE",
DESKEW_ADJUST => "SYSTEM_SYNCHRONOUS",
DFS_FREQUENCY_MODE => "LOW",
DLL_FREQUENCY_MODE => "LOW",
DUTY_CYCLE_CORRECTION => TRUE,
FACTORY_JF => x"C080",
PHASE_SHIFT => 0,
STARTUP_WAIT => TRUE
Thanks in advance.
Eliminate the 25MHz clock - run the whole lot off the 50MHz clock, with clock enables during the even and odd 50MHz cycles for the stages you want to run at 25 MHz. (The current 25MHz clock would probably serve as the clock enable signal.)
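A rough sketch of what that could look like. The entity name, the `rst` input, the `WIDTH` generic and the names `phase`, `ce_a` and `ce_b` are assumptions, and the exact alignment of the enables depends on the rest of the pipeline. The point is that `phase` is an ordinary flip-flop output clocked at 50 MHz, so it can drive both the clock enables and the mux select without putting logic on a clock net.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity stage3_select is
  generic (WIDTH : positive := 66);  -- assumed bus width
  port (
    clk           : in  std_logic;   -- the single 50 MHz clock
    rst           : in  std_logic;   -- synchronous reset (assumed)
    stage_3_1_out : in  std_logic_vector(WIDTH - 1 downto 0);
    stage_3_2_out : in  std_logic_vector(WIDTH - 1 downto 0);
    ce_a          : out std_logic;   -- clock enable for the first T3 copy
    ce_b          : out std_logic;   -- clock enable for the second T3 copy
    stage_4_in    : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity stage3_select;

architecture rtl of stage3_select is
  signal phase : std_logic := '0';
begin
  -- Toggles every 50 MHz cycle, i.e. a 25 MHz square wave held in a register
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        phase <= '0';
      else
        phase <= not phase;
      end if;
    end if;
  end process;

  -- Each T3 copy updates only on every other 50 MHz cycle
  ce_a <= phase;
  ce_b <= not phase;

  -- Mux select is a normal data signal now, not a clock
  stage_4_in <= stage_3_1_out when phase = '1' else stage_3_2_out;
end architecture rtl;
```

Inside each T3 stage the registers would then be written as "if rising_edge(clk) then if ce = '1' then ...", with ce connected to ce_a or ce_b.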