How to correctly simplify indices in a generate for loop?

So I wrote this piece of VHDL code that takes a 128 bit signal and swaps the byte positions of each 32/64-bit word (i.e. Big to Little Endian conversion).
As the indices are quite complex, I wanted to simplify it:
library IEEE;
use IEEE.std_logic_1164.all;

entity Test is
  generic (DATA_WIDTH : integer := 32);
end Test;

architecture arch of Test is
  signal test_in, test_simple, test_comp : std_logic_vector(127 downto 0);
begin
  test_in <= x"01020304020202040303030404040405";
  -- 128-bit signals: ReadyValid_RW_Port
  gen_word : for j in 128/DATA_WIDTH-1 downto 0 generate
    -- for each of the 32/64 bit words, reverse the byte positions
    gen_byte : for i in DATA_WIDTH/8-1 downto 0 generate
      signal index : integer := (j * DATA_WIDTH/8 + i)*8;
      signal invindex : integer := ((j+1) * DATA_WIDTH/8 - 1 - i)*8;
    begin
      -- this way works, but looks messy
      test_comp((j * DATA_WIDTH/8 + i)*8+7 downto (j * DATA_WIDTH/8 + i)*8) <=
        test_in(((j+1) * DATA_WIDTH/8 - 1 - i)*8+7 downto ((j+1) * DATA_WIDTH/8 - 1 - i)*8);
      -- simplify by writing the indices separately
      index <= (j * DATA_WIDTH/8 + i)*8;
      invindex <= ((j+1) * DATA_WIDTH/8 - 1 - i)*8;
      -- this doesn't work, test_simple stays 'U'
      test_simple(index+7 downto index) <= test_in(invindex+7 downto invindex);
    end generate;
  end generate;
end architecture;
However, in the Vivado simulator, test_simple stays at 'U', whereas test_comp works just fine. What is my mistake, and what is the correct way to do what I want?

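So as Tricky pointed out, the mistake was declaring index and invindex as signals instead of constants. With signal indices the slice is not static, so each generated assignment drives the whole of test_simple, and the 'U' values on the elements it does not assign win when the drivers are resolved. With constants, it works perfectly fine: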
-- 128-bit signals: ReadyValid_RW_Port
gen_word : for j in 128/DATA_WIDTH-1 downto 0 generate
  -- for each of the 32/64 bit words, reverse the byte positions
  gen_byte : for i in DATA_WIDTH/8-1 downto 0 generate
    constant index : integer := (j * DATA_WIDTH/8 + i)*8;
    constant invindex : integer := ((j+1) * DATA_WIDTH/8 - 1 - i)*8;
  begin
    test_simple(index+7 downto index) <= test_in(invindex+7 downto invindex);
  end generate;
end generate;
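For anyone who wants to check the constant-index version in a simulator, here is a minimal self-checking sketch. The entity name byteswap_check, the labels, and the assert are my own additions for illustration; the expected value was worked out by hand from test_in for DATA_WIDTH = 32 only, so the check is skipped for other widths.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity byteswap_check is  -- hypothetical name, not from the original post
  generic (DATA_WIDTH : positive := 32);  -- assumed: multiple of 8 that divides 128
end entity;

architecture sim of byteswap_check is
  constant test_in  : std_logic_vector(127 downto 0) := x"01020304020202040303030404040405";
  signal   test_out : std_logic_vector(127 downto 0);
begin
  gen_word : for j in 128/DATA_WIDTH - 1 downto 0 generate
    gen_byte : for i in DATA_WIDTH/8 - 1 downto 0 generate
      -- constants are evaluated at elaboration, so both slices are static
      constant index    : natural := (j * DATA_WIDTH/8 + i) * 8;
      constant invindex : natural := ((j + 1) * DATA_WIDTH/8 - 1 - i) * 8;
    begin
      test_out(index + 7 downto index) <= test_in(invindex + 7 downto invindex);
    end generate;
  end generate;

  check : process
  begin
    wait for 1 ns;
    -- each 32-bit word has its four bytes reversed
    assert DATA_WIDTH /= 32 or test_out = x"04030201040202020403030305040404"
      report "byte swap mismatch" severity failure;
    wait;
  end process;
end architecture;
```

Because every generated assignment now has a static target slice, each one only drives its own eight bits, so no 'U' from an unassigned element can leak into the result.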


Vivado VHDL width mismatch - how can I fix it?

Please consider this very simple minimal reproducible code:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity test is
  generic ( LENGTH : integer range 1 to 16 := 5 );
  Port ( x : in STD_LOGIC;
         y : out STD_LOGIC_VECTOR(15 downto 0)
  );
end test;

architecture Behavioral of test is
  signal a : std_logic_vector (15 downto 0);
  signal b : std_logic_vector (LENGTH - 1 downto 0);
  signal i : integer range 0 to LENGTH-1 := 1;
begin
  y <= a;
  process (x)  -- sensitivity list assumed
  begin
    if i = LENGTH then
      i <= 1;
      a <= a(15 downto i + 1) & b(i downto 0);
    end if;
    i <= i + 1;
  end process;
end Behavioral;
I need to merge some elements of b into a, depending on i. When I run this in Vivado, it reports:
[Synth 8-690] width mismatch in assignment; target has 16 bits, source has 20 bits
I don't really get why. Anyhow, the overall range will be 15 - (i + 1) + (i - 0) = 15 ... 0 and fits in the 16 bits of output -- what's the deal for 20 bits?
I should say the problem vanishes (obviously) if I use plain constants instead of i, but I still don't get what's going on.
For a runtime-variable I (as per the question): instead of a big CASE, you can use the value of I to generate masks, and evaluate (A and MASKA) or (B and MASKB). This is equivalent to the multiplexer the synthesis tool would generate if it weren't broken.
For a generic I (it's not fair to move the goalposts in the comments!), this approach generates unnecessary hardware, which will be optimised out by any competent synthesis tool.
(There are of course other problems with this code; I assume you deleted the clock, taking the MCVE notion a bit too far. You should leave it as valid, synthesisable code.)
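A rough sketch of the mask idea, under some assumptions of mine: the entity name mask_merge, the clk port, and the choice to make b and i ports are all hypothetical and only there to make the example self-contained. The point it illustrates is that every slice and loop bound is static, and only the mask contents depend on the runtime value of i.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mask_merge is  -- hypothetical name
  generic (LENGTH : integer range 1 to 16 := 5);
  port (
    clk : in  std_logic;                               -- assumed clock
    b   : in  std_logic_vector(LENGTH - 1 downto 0);
    i   : in  integer range 0 to LENGTH - 1;
    y   : out std_logic_vector(15 downto 0)
  );
end entity;

architecture rtl of mask_merge is
  signal a : std_logic_vector(15 downto 0) := (others => '0');
begin
  y <= a;

  process (clk)
    variable maskb : std_logic_vector(15 downto 0);
    variable bext  : std_logic_vector(15 downto 0);
  begin
    if rising_edge(clk) then
      -- maskb has a '1' in bits i downto 0: the bits that come from b
      maskb := (others => '0');
      for k in 0 to LENGTH - 1 loop      -- static bounds, unrolled in synthesis
        if k <= i then
          maskb(k) := '1';
        end if;
      end loop;

      -- b zero-extended to 16 bits, using a static slice
      bext := (others => '0');
      bext(LENGTH - 1 downto 0) := b;

      -- same result as a(15 downto i + 1) & b(i downto 0)
      a <= (a and not maskb) or (bext and maskb);
    end if;
  end process;
end architecture;
```

Here "not maskb" plays the role of MASKA from the answer above.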

VHDL Data Flow description of Gray Code Incrementer

I am trying to write the VHDL code for a Gray Code incrementer using the Data Flow description style. I do not understand how to translate the for loop I used in the behavioral description into the Data Flow description. Any suggestion?
This is my working code in behavioral description
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity graycode is
  Generic (N: integer := 4);
  Port ( gcode : in STD_LOGIC_VECTOR (N-1 downto 0);
         nextgcode : out STD_LOGIC_VECTOR (N-1 downto 0));
end graycode;

architecture Behavioral of graycode is
begin
  process (gcode)
    variable bcode : STD_LOGIC_VECTOR(N-1 downto 0);
    variable int_bcode : integer;
  begin
    -- Gray to binary
    for i in gcode'range loop
      if(i < gcode'length - 1) then
        bcode(i) := gcode(i) XOR bcode(i+1);
      else
        bcode(i) := gcode(i);
      end if;
    end loop;
    -- increment
    int_bcode := to_integer(unsigned(bcode));
    int_bcode := int_bcode + 1;
    bcode := std_logic_vector(to_unsigned(int_bcode, N));
    -- binary to Gray
    for i in gcode'range loop
      if(i < gcode'length - 1) then
        nextgcode(i) <= bcode(i) XOR bcode(i+1);
      else
        nextgcode(i) <= bcode(i);
      end if;
    end loop;
  end process;
end Behavioral;
'Dataflow' means 'like it would look in a circuit diagram'. In other words, the flow of data through a real circuit, rather than a high-level algorithmic description. So, unroll your loops and see what you've actually described. Start with N=2, and draw out your unrolled circuit. You should get a 2-bit input bus with an xor gate in it, followed by a 2-bit (combinatorial) incrementor, followed by a 2-bit output bus with another xor gate in it. Done, for N=2.
Your problem now is to generalise N. One obvious way to do this is to put your basic N=2 circuit in a generate loop (yes, this is dataflow, since it just duplicates hardware), and extend it. Ask in another question if you can't do this.
BTW, your integer incrementor is clunky - you should be incrementing an unsigned bcode directly.
Dataflow means constructed of concurrent statements using signals.
That means using generate statements instead of loops. The if statement can become an if generate statement with an else (in VHDL-2008), or, for earlier revisions of the VHDL standard, two if generate statements whose conditions give opposite boolean results for the same value.
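As a sketch of the two if generate form, here is just the Gray-to-binary half written that way. The entity name gray2bin, the internal signal b, and the labels are hypothetical names chosen for the example; the two conditions are complementary, so exactly one assignment is elaborated per bit, and this form should be accepted by pre-2008 tools as well.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity gray2bin is  -- hypothetical name
  generic (N : positive := 4);
  port (
    gcode : in  std_logic_vector(N - 1 downto 0);
    bcode : out std_logic_vector(N - 1 downto 0)
  );
end entity;

architecture dataflow of gray2bin is
  signal b : std_logic_vector(N - 1 downto 0);  -- readable copy of the output
begin
  gen_bits : for i in N - 1 downto 0 generate
    -- the MSB is copied straight through
    gen_msb : if i = N - 1 generate
      b(i) <= gcode(i);
    end generate;
    -- every other bit is the Gray bit xor the next higher binary bit
    gen_rest : if i /= N - 1 generate
      b(i) <= gcode(i) xor b(i + 1);
    end generate;
  end generate;

  bcode <= b;
end architecture;
```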
It's easier to just promote the exception assignments to their own concurrent signal assignments:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity graycode is
  generic (N: natural := 4); -- CHANGED negative numbers won't be interesting
  port (
    gcode: in std_logic_vector (N - 1 downto 0);
    nextgcode: out std_logic_vector (N - 1 downto 0)
  );
end entity graycode;

architecture dataflow of graycode is
  signal int_bcode: std_logic_vector (N - 1 downto 0); -- ADDED
  signal bcode: std_logic_vector (N - 1 downto 0); -- ADDED
begin
  -- Gray to binary
  int_bcode(N - 1) <= gcode (N - 1);
TO_BIN:
  for i in N - 2 downto 0 generate
    int_bcode(i) <= gcode(i) xor int_bcode(i + 1);
  end generate;
  -- increment
  bcode <= std_logic_vector(unsigned(int_bcode) + 1);
  -- binary to Gray
  nextgcode(N - 1) <= bcode(N - 1);
TO_GRAY:
  for i in N - 2 downto 0 generate
    nextgcode(i) <= bcode(i) xor bcode(i + 1);
  end generate;
end architecture dataflow;
Each iteration of a for generate scheme elaborates a block statement whose implicit label is the generate statement label with the string image of i appended.
Each of these blocks contains a declaration for the iterated value of i, and any concurrent statements are elaborated into those blocks.
The visibility rules tell us that any names not declared in the block statement that are visible in the enclosing declarative region are visible within the block.
This means the concurrent statements in each block are equivalent to concurrent statements in the architecture body with i replaced by the equivalent literal.
The concurrent statements in the generate statements and architecture body give us a dataflow representation.
And with a testbench:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity graycode_tb is
end entity;

architecture foo of graycode_tb is
  constant N: natural := 4;
  signal gcode: std_logic_vector (N - 1 downto 0);
  signal nextgcode: std_logic_vector (N - 1 downto 0);
  signal bcode: std_logic_vector (N - 1 downto 0);
begin
DUT:
  entity work.graycode
    generic map ( N => N)
    port map (
      gcode => gcode,
      nextgcode => nextgcode
    );
STIMULI:
  process
    variable gv: std_logic_vector (N - 1 downto 0);
    variable bv: std_logic_vector (N - 1 downto 0);
  begin
    wait for 10 ns;
    for i in 0 to 2 ** N - 1 loop
      bv := std_logic_vector(to_unsigned( i, bv'length));
      -- binary to Gray for the stimulus
      gv(N - 1) := bv (N - 1);
      for i in N - 2 downto 0 loop
        gv(i) := bv(i) xor bv(i + 1);
      end loop;
      gcode <= gv;
      bcode <= bv;
      wait for 10 ns;
    end loop;
    wait;
  end process;
end architecture;
We can see the effects of incrementing int_bcode:

Vivado synthesis: complex assignment not supported

I implemented a modified Booth multiplier in VHDL. I need to synthesize it with Vivado, but synthesis fails with this error:
"complex assignment not supported".
This is the shifter code that causes the error:
library ieee;
use ieee.std_logic_1164.all;

entity shift_register is
  generic (
    N : integer := 6;
    M : integer := 6
  );
  port (
    en_s : in std_logic;
    cod_result : in std_logic_vector (N+M-1 downto 0);
    position : in integer;
    shift_result : out std_logic_vector(N+M-1 downto 0)
  );
end shift_register;

architecture shift_arch of shift_register is
begin
  process (en_s)
    variable shift_aux : std_logic_vector(N+M-1 downto 0);
    variable i : integer := 0; -- just for convenience
  begin
    if(en_s'event and en_s ='1') then
      i := position;
      shift_aux := (others => '0');
      shift_aux(N+M-1 downto i) := cod_result(N+M-1-i downto 0); --ERROR!!
      shift_result <= shift_aux ;
    end if;
  end process;
end shift_arch;
The Booth multiplier works with any operand width, so I cannot replace this generic code with a fixed-size version.
Please help me! Thanks a lot.
There's a way to make your index addressing static for synthesis.
First, based on the loop we can tell position must have a value within the range of shift_aux, otherwise you'd end up with null slices (IEEE Std 1076-2008 8.5 Slice names).
That can be shown in the entity declaration:
library ieee;
use ieee.std_logic_1164.all;

entity shift_register is
  generic (
    N: integer := 6;
    M: integer := 6
  );
  port (
    en_s: in std_logic;
    cod_result: in std_logic_vector (N + M - 1 downto 0);
    position: in integer range 0 to N + M - 1 ; -- range ADDED
    shift_result: out std_logic_vector(N + M - 1 downto 0)
  );
end entity shift_register;
What's changed is the addition of a range constraint to the port declaration of position. The idea is to support simulation, where the default value of an integer is integer'left. Simulating your shift_register would fail on the rising edge of en_s if position (the actual driver) did not provide an initial value in the index range of shift_aux.
From a synthesis perspective, an unbounded integer requires you to take both positive and negative integer values into account. Your for loop is only using positive integer values.
The same can be done in the declaration of the variable i in the process:
variable i: integer range 0 to N + M - 1 := 0; -- range ADDED
To address the immediate synthesis problem we look at the for loop.
Xilinx support issue AR# 52302 tells us the issue is using dynamic values for indexes.
The solution is to modify what the for loop does:
architecture shift_loop of shift_register is
begin
  process (en_s)
    variable shift_aux: std_logic_vector(N + M - 1 downto 0);
    -- variable i: integer range 0 to N + M - 1 := 0; -- range ADDED
  begin
    if en_s'event and en_s = '1' then
      -- i := position;
      shift_aux := (others => '0');
      for i in 0 to N + M - 1 loop
        -- shift_aux(N + M - 1 downto i) := cod_result(N + M - 1 - i downto 0);
        if i = position then
          shift_aux(N + M - 1 downto i)
            := cod_result(N + M - 1 - i downto 0);
        end if;
      end loop;
      shift_result <= shift_aux;
    end if;
  end process;
end architecture shift_loop;
Because i becomes a static value when the loop is unrolled in synthesis, it can be used in the calculation of indexes.
Note this gives us an N + M input multiplexer where each input is selected when i = position.
This construct can actually be collapsed into a barrel shifter by optimization, although for large values of N and M the number of variables involved might take a prohibitive synthesis effort or simply fail.
When synthesis is successful, each output element in the assignment collapses into a separate multiplexer that will match Patrick's barrel shifter.
For sufficiently large values of N and M we can define the depth of the barrel shifter, in number of multiplexer layers, based on the number of bits in a binary representation of the integer range of distance.
That either requires a declared integer type or subtype for position, or finding the log2 value of N + M. We can use the log2 value because it would only be used statically. (XST supports log2(x) where x is a real for determining static values; the function is found in the IEEE package math_real.) This gives us the binary length of position (how many bits are required to describe the shift distance, i.e. the number of levels of multiplexers).
architecture barrel_shifter of shift_register is
begin
  process (en_s)
    use ieee.math_real.all; -- log2, ceil [real return real]
    use ieee.numeric_std.all; -- to_unsigned, unsigned
    -- binary length; ceil so that widths that are not a power of two get enough bits
    constant DISTLEN: natural := integer(ceil(log2(real(N + M))));
    type muxv is array (0 to DISTLEN - 1) of
      unsigned (N + M - 1 downto 0);
    variable shft_aux: muxv;
    variable distance: unsigned (DISTLEN - 1 downto 0);
  begin
    if en_s'event and en_s = '1' then
      distance := to_unsigned(position, DISTLEN); -- position in binary
      shft_aux := (others => (others =>'0'));
      for i in 0 to DISTLEN - 1 loop
        if i = 0 then
          if distance(i) = '1' then
            shft_aux(i) := SHIFT_LEFT(unsigned(cod_result), 2 ** i);
          else
            shft_aux(i) := unsigned(cod_result);
          end if;
        else
          if distance(i) = '1' then
            shft_aux(i) := SHIFT_LEFT(shft_aux(i - 1), 2 ** i);
          else
            shft_aux(i) := shft_aux(i - 1);
          end if;
        end if;
      end loop;
      shift_result <= std_logic_vector(shft_aux(DISTLEN - 1));
    end if;
  end process;
end architecture barrel_shifter;
XST also supports ** if the left operand is 2 and the value of i is treated as a constant in the sequence of statements found in a loop statement.
This could be implemented with signals instead of variables or structurally in a generate statement instead of a loop statement inside a process, or even as a subprogram.
The basic idea here with these two architectures derived from yours is to produce something synthesis eligible.
The advantage of the second architecture over the first is in reduction in the amount of synthesis effort during optimization for larger values of N + M.
Neither of these architectures has been verified, since the original lacked a testbench. They both analyze and elaborate.
Writing a simple case testbench:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity shift_register_tb is
end entity;

architecture foo of shift_register_tb is
  constant N: integer := 6;
  constant M: integer := 6;
  signal clk: std_logic := '0';
  signal din: std_logic_vector (N + M - 1 downto 0)
                := (0 => '1', others => '0');
  signal dout: std_logic_vector (N + M - 1 downto 0);
  signal dist: integer := 0;
begin
DUT:
  entity work.shift_register
    generic map (
      N => N,
      M => M
    )
    port map (
      en_s => clk,
      cod_result => din,
      position => dist,
      shift_result => dout
    );
CLOCK:
  process
  begin
    wait for 10 ns;
    clk <= not clk;
    if now > (N + M + 2) * 20 ns then
      wait;
    end if;
  end process;
STIMULI:
  process
  begin
    for i in 1 to N + M loop
      wait for 20 ns;
      dist <= i;
      din <= std_logic_vector(SHIFT_LEFT(unsigned(din),1));
    end loop;
    wait;
  end process;
end architecture;
And simulating reveals that the range of position and the number of loop iterations only need to cover the number of bits in the multiplier and not the multiplicand. We don't need a full barrel shifter.
That can be easily fixed in both shift_register architectures, and it has the side effect of making the shift_loop architecture much more attractive: it would be easier to synthesize based on the multiplier bit length (presumably M) and not the product bit length (N + M).
And that would give you:
library ieee;
use ieee.std_logic_1164.all;

entity shift_register is
  generic (
    N: integer := 6;
    M: integer := 6
  );
  port (
    en_s: in std_logic;
    cod_result: in std_logic_vector (N + M - 1 downto 0);
    position: in integer range 0 to M - 1 ; -- range ADDED
    shift_result: out std_logic_vector(N + M - 1 downto 0)
  );
end entity shift_register;

architecture shift_loop of shift_register is
begin
  process (en_s)
    variable shift_aux: std_logic_vector(N + M - 1 downto 0);
    -- variable i: integer range 0 to M - 1 := 0; -- range ADDED
  begin
    if en_s'event and en_s = '1' then
      -- i := position;
      shift_aux := (others => '0');
      for i in 0 to M - 1 loop
        -- shift_aux(N + M - 1 downto i) := cod_result(N + M - 1 - i downto 0);
        if i = position then -- This creates an M input MUX
          shift_aux(N + M - 1 downto i)
            := cod_result(N + M - 1 - i downto 0);
        end if;
      end loop; -- The loop is unrolled in synthesis, i is CONSTANT
      shift_result <= shift_aux;
    end if;
  end process;
end architecture shift_loop;
Modifying the testbench:
for i in 1 to M loop -- WAS N + M loop
wait for 20 ns;
dist <= i;
din <= std_logic_vector(SHIFT_LEFT(unsigned(din),1));
end loop;
end process;
gives a result showing the shifts are over the range of the multiplier value (specified by M):
So the moral here is you don't need a full barrel shifter, only one that works over the multiplier range and not the product range.
The last bit of code should be synthesis eligible.
You are trying to create a range using a run-time varying value, and this is not supported by the synthesis tool. cod_result(N+M-1 downto 0); would be supported, because N, M, and 1 are all known at synthesis time.
If you're trying to implement a multiplier, you will get the best result using x <= a * b, and letting the synthesis tool choose the best way to implement it. If you have operands wider than the multiplier widths in your device, then you need to look at the documentation to determine the best route, which will normally involve pipelining of some sort.
If you need a run-time variable shift, look for a 'Barrel Shifter'. There are existing answers on these, for example this one.
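One compact way to describe such a run-time shift is to let numeric_std do it. This is only a sketch: the entity name shift_by_position is hypothetical, the ports mirror the question, and whether a given tool version infers a good barrel shifter from it is an assumption you would need to check in your own synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity shift_by_position is  -- hypothetical name
  generic (
    N : positive := 6;
    M : positive := 6
  );
  port (
    en_s         : in  std_logic;
    cod_result   : in  std_logic_vector(N + M - 1 downto 0);
    position     : in  integer range 0 to N + M - 1;  -- bounded, as suggested above
    shift_result : out std_logic_vector(N + M - 1 downto 0)
  );
end entity;

architecture rtl of shift_by_position is
begin
  process (en_s)
  begin
    if rising_edge(en_s) then
      -- shift_left fills the vacated low bits with '0'; all slice bounds
      -- stay static because the result keeps the full vector width
      shift_result <= std_logic_vector(shift_left(unsigned(cod_result), position));
    end if;
  end process;
end architecture;
```

The range constraint on position matters here too: it tells the tool how many shift distances it has to support.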

VHDL How to convert 32 bit variable to 4 x 8bit std_logic_vector?

I have a question which is probably in 2 parts:
I am using a (nominally 32 bit) integer variable which I would like to write to an 8 bit UART as 4 bytes (i.e., as binary data)
i.e. variable Count : integer range 0 to 2147483647;
How should I chop the 32 bit integer variable into 4 separate 8 bit std_logic_vectors as expected by my UART code, and how should I pass these to the UART one byte at a time ?
I am aware std_logic_vector(to_unsigned(Count, 32)) will convert the integer variable into a 32 bit std_logic_vector, but then what ? Should I create a 32 bit std_logic_vector, assign the converted Count value to it, then subdivide it using something like the following code ? I realise the following assumes the count variable does not change during the 4 clock cycles, and assumes the UART can accept a new byte every clock cycle, and lacks any means of re-triggering the 4 byte transmit cycle, but am I on the right track here, or is there a better way ?
variable CountOut : std_logic_vector(31 downto 0);

process (clock)
  variable Index : integer range 0 to 4 := 0;
begin
  if rising_edge(clock) then
    CountOut <= std_logic_vector(to_unsigned(Count, 32));
    if (Index = 0) then
      UartData(7 downto 0) <= CountOut(31 downto 24);
      Index := 1;
    elsif (Index = 1) then
      UartData(7 downto 0) <= CountOut(23 downto 16);
      Index := 2;
    elsif (Index = 2) then
      UartData(7 downto 0) <= CountOut(15 downto 8);
      Index := 3;
    elsif (Index = 3) then
      UartData(7 downto 0) <= CountOut(7 downto 0);
      Index := 4;
    else
      Index := Index;
    end if;
  end if;
end process;
Any comments or recommendations would be appreciated.
You seem to be on the right track. I believe there are two basic solutions to this problem:
1. Register the output value as a 32-bit vector, and use different ranges for each output operation (as you did in your code example).
2. Register the output value as a 32-bit vector, and shift this value 8 bits at a time after each output operation. This way you can use the same range in all operations. The code below should give you an idea:
process (clock)
  variable Index: integer range 0 to 5 := 0;  -- 0 = load, 1 to 4 = send a byte, 5 = done
begin
  if rising_edge(clock) then
    if (Index = 0) then
      CountOut <= std_logic_vector(to_unsigned(Count, 32));
      Index := Index + 1;
    elsif (Index <= 4) then  -- four bytes, most significant first
      UartData <= CountOut(31 downto 24);
      CountOut <= CountOut sll 8;  -- sll on std_logic_vector needs VHDL-2008
      Index := Index + 1;
    end if;
  end if;
end process;
Also, please check your assignments: in your example CountOut is declared as a variable but is assigned to as a signal.
There's nothing wrong with the code you've shown. You can restructure it so that the assignment to UartData is indexed, which allows a loop.
library ieee;
use ieee.std_logic_1164.all;

entity union is
end entity;

architecture foo of union is
  type union32 is array (integer range 1 to 4) of std_logic_vector(7 downto 0);
  signal UartData: std_logic_vector(7 downto 0);
begin
TEST:
  process
    variable quad: union32;
    constant fourbytes: std_logic_vector(31 downto 0) := X"deadbeef";
  begin
    quad := union32'(fourbytes(31 downto 24), fourbytes(23 downto 16),
                     fourbytes(15 downto 8),fourbytes(7 downto 0));
    for i in union32'RANGE loop
      wait for 9.6 us;
      UartData <= Quad(i);
    end loop;
    wait for 9.6 us; -- to display the last byte
    wait; -- one ping only
  end process;
end architecture;
Or use a type conversion function to hide complexity:
library ieee;
use ieee.std_logic_1164.all;

entity union is
  type union32 is array (integer range 1 to 4) of std_logic_vector(7 downto 0);
end entity;

architecture fee of union is
  signal UartData: std_logic_vector(7 downto 0);
  -- split a 32-bit vector into four bytes, most significant byte first
  function toquad (inp: std_logic_vector(31 downto 0)) return union32 is
  begin
    return union32'(inp(31 downto 24), inp(23 downto 16),
                    inp(15 downto 8), inp( 7 downto 0));
  end function;
begin
TEST:
  process
    variable quad: union32;
    constant fourbytes: std_logic_vector(31 downto 0) := X"deadbeef";
  begin
    quad := toquad (fourbytes);
    for i in union32'RANGE loop
      wait for 9.6 us;
      UartData <= Quad(i);
    end loop;
    wait for 9.6 us; -- to display the last byte
    wait; -- one ping only
  end process;
end architecture;
And gives the same answer.
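The question also mentions that Count may change mid-transfer, that the UART may not accept a byte every cycle, and that there is no way to re-trigger. Here is a hedged sketch of how those three points could be handled. The start, tx_ready and UartWrite ports and the entity name are hypothetical; a real UART core will have its own handshake, and this sketch assumes it can take a byte in any cycle where tx_ready is high (for example because it has a FIFO).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity count_to_uart is  -- hypothetical name
  port (
    clock     : in  std_logic;
    start     : in  std_logic;                    -- hypothetical: request to send Count
    tx_ready  : in  std_logic;                    -- hypothetical: UART can take a byte
    Count     : in  integer range 0 to 2147483647;
    UartData  : out std_logic_vector(7 downto 0);
    UartWrite : out std_logic                     -- hypothetical: one-cycle byte strobe
  );
end entity;

architecture rtl of count_to_uart is
  signal CountOut  : std_logic_vector(31 downto 0) := (others => '0');
  signal BytesLeft : integer range 0 to 4 := 0;
begin
  process (clock)
  begin
    if rising_edge(clock) then
      UartWrite <= '0';
      if BytesLeft = 0 then
        if start = '1' then
          -- snapshot Count so later changes do not affect this transfer
          CountOut  <= std_logic_vector(to_unsigned(Count, 32));
          BytesLeft <= 4;
        end if;
      elsif tx_ready = '1' then
        UartData  <= CountOut(31 downto 24);       -- most significant byte first
        UartWrite <= '1';
        CountOut  <= CountOut(23 downto 0) & x"00"; -- move the next byte to the top
        BytesLeft <= BytesLeft - 1;
      end if;
    end if;
  end process;
end architecture;
```

The shift is written as a concatenation rather than sll so that it does not depend on VHDL-2008 support.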

what's wrong with my VHDL sine function gen?

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.MATH_REAL.ALL;

entity SineGen is
  Port (clock : in std_logic;
        dac_ab_vpp : in integer range 0 to 4095;
        dac_cd_vpp : in integer range 0 to 4095;
        sine_dac_ab : out std_logic_vector(11 downto 0);
        sine_dac_cd : out std_logic_vector(11 downto 0));
end SineGen;

architecture Behavioral of SineGen is
  subtype slv is std_logic_vector(11 downto 0);
begin
  process (clock)
    variable count : integer range 0 to 255 := 0;
    variable temp_dac_ab : integer range 0 to 4095 := 0;
    variable temp_dac_cd : integer range 0 to 4095 := 0;
  begin
    if rising_edge(clock) then
I tried everything, and it comes down to this: the next two lines make the output always zero, and I don't understand why. The output should have been a sine function. (count indexes the 256 samples per period; n is the number of bits.) Are the following lines in a valid format?
      -- A*sin (2PI/2^n * count)
      temp_dac_ab := dac_ab_vpp * integer(round(sin(real(count * integer(math_2_pi/real(256))))));
      temp_dac_cd := dac_cd_vpp * integer(round(sin(real(count * integer(math_2_pi/real(256))))));
      if count < 256 then
        count := count + 1;
      else
        count := 0;
      end if;
      sine_dac_ab <= conv_std_logic_vector(temp_dac_ab, slv'length);
      sine_dac_cd <= conv_std_logic_vector(temp_dac_cd, slv'length);
    end if;
  end process;
end Behavioral;
In addition to what has been pointed out by @brianreavis, you don't want to convert the fraction math_2_pi/real(256) to an integer, since that will always be 0. The integer amplitude also has to be converted to real before it is multiplied by the sine. So:
temp_dac_ab := integer(round(real(dac_ab_vpp) * sin(real(count) * math_2_pi/real(256))));
temp_dac_cd := integer(round(real(dac_cd_vpp) * sin(real(count) * math_2_pi/real(256))));
I'm realllyyy rusty with my VHDL, but I think you're wanting this:
temp_dac_ab := integer(round(dac_ab_vpp * sin(real(count * integer(math_2_pi/real(256))))));
temp_dac_cd := integer(round(dac_cd_vpp * sin(real(count * integer(math_2_pi/real(256))))));
(You don't want to round / cast the float coming from sin until after you multiply it with dac_ab_vpp / dac_cd_vpp)
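If the goal is hardware rather than simulation only, a common alternative is to evaluate sin() once, at elaboration, into a constant table and just index it at run time, since math_real functions are generally only usable for static values in synthesis. The sketch below assumes a fixed full-scale amplitude with a mid-scale offset (so the values stay within 0 to 4095) and leaves out the dac_ab_vpp / dac_cd_vpp scaling; the names sine_rom, table_t, init_table and SINE_TABLE are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;

entity sine_rom is  -- hypothetical name
  port (
    clock : in  std_logic;
    sine  : out std_logic_vector(11 downto 0)
  );
end entity;

architecture rtl of sine_rom is
  type table_t is array (0 to 255) of integer range 0 to 4095;

  -- Build one period of an offset sine: mid-scale 2048, amplitude 2047.
  -- This runs at elaboration only; no sin() hardware is implied.
  function init_table return table_t is
    variable t : table_t;
  begin
    for k in table_t'range loop
      t(k) := 2048 + integer(round(2047.0 * sin(real(k) * MATH_2_PI / 256.0)));
    end loop;
    return t;
  end function;

  constant SINE_TABLE : table_t := init_table;
  signal   count      : integer range 0 to 255 := 0;
begin
  process (clock)
  begin
    if rising_edge(clock) then
      sine <= std_logic_vector(to_unsigned(SINE_TABLE(count), 12));
      -- wrap explicitly so count never leaves its 0 to 255 range
      if count = 255 then
        count <= 0;
      else
        count <= count + 1;
      end if;
    end if;
  end process;
end architecture;
```

Note the explicit wrap at 255: with a variable declared as integer range 0 to 255, a test like count < 256 is always true, so count + 1 would eventually go out of range in simulation.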
