VHDL multiplication for std_logic_vector

When simulating, I get a run-time error, so I'm trying to run an RTL analysis in Vivado to see whether the schematic of the component can at least be created. My code is the following:
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
entity multiplicator_test is
generic(
WORD_SIZE: natural := 8;
EXP_SIZE: natural := 3
);
port(
input_1: in std_logic_vector(WORD_SIZE-1 downto 0);
input_2: in std_logic_vector(WORD_SIZE-1 downto 0);
result: out std_logic_vector(WORD_SIZE-1 downto 0)
);
end entity multiplicator_test;
architecture multiplicator_test_arch of multiplicator_test is
constant SIGNIFICAND_SIZE: natural := WORD_SIZE - EXP_SIZE - 1;
signal significand: std_logic_vector(SIGNIFICAND_SIZE-1 downto 0) := (others => '0');
signal exponent: std_logic_vector(EXP_SIZE-1 downto 0) := (others => '0');
signal sign: std_logic := '0';
signal aux: std_logic_vector((2*SIGNIFICAND_SIZE)-1 downto 0) := (others => '0');
begin
aux <= std_logic_vector(signed(input_1(SIGNIFICAND_SIZE-1 downto 0))*signed(input_2(SIGNIFICAND_SIZE - 1 downto 0)));
significand <= aux(SIGNIFICAND_SIZE - 1 downto 0);
exponent <= std_logic_vector(unsigned(input_1(WORD_SIZE-2 downto WORD_SIZE-EXP_SIZE-2))+unsigned(input_2(WORD_SIZE-2 downto WORD_SIZE-EXP_SIZE-2)));
sign <= input_1(WORD_SIZE-1) or input_2(WORD_SIZE-1);
result <= sign & exponent & significand;
end architecture multiplicator_test_arch;
When running the analysis, I get:
ERROR: [Synth 8-690] width mismatch in assignment; target has 3 bits, source has 4 bits [(...)/multiplicador.vhd:27]
The line with the error is 27:
aux <= std_logic_vector(signed(input_1(SIGNIFICAND_SIZE-1 downto 0))*signed(input_2(SIGNIFICAND_SIZE - 1 downto 0)));
Apparently the target (aux) is 3 bits, but really it should be 8.

The line you've posted is not line 27. Line 27 is the following:
exponent <= std_logic_vector(unsigned(input_1(WORD_SIZE-2 downto WORD_SIZE-EXP_SIZE-2))+unsigned(input_2(WORD_SIZE-2 downto WORD_SIZE-EXP_SIZE-2)));
exponent has only EXP_SIZE = 3 bits, but each slice (WORD_SIZE-2 downto WORD_SIZE-EXP_SIZE-2) selects bits 6 downto 3, which is 4 bits.
The numeric_std "+" operator returns a result as wide as its wider operand, so the sum is 4 bits and does not fit the 3-bit target.
Note also that the sum gets no additional bit for the carry-out, so the exponent addition itself can overflow.
One way to make the widths match is to make your result and exponent one bit wider:
result: out std_logic_vector(WORD_SIZE downto 0)
signal exponent: std_logic_vector(EXP_SIZE downto 0) := (others => '0');
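An alternative is to keep the exponent field at EXP_SIZE bits and widen the operands explicitly before adding. The following is only a sketch: the entity name exp_add_sketch, the port exp_sum, and the signals exp_1 and exp_2 are made up for illustration, and it assumes the exponent field is the EXP_SIZE bits directly below the sign bit, as the sign & exponent & significand concatenation in the question suggests.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper entity, not part of the original design
entity exp_add_sketch is
  generic (
    WORD_SIZE : natural := 8;
    EXP_SIZE  : natural := 3
  );
  port (
    input_1, input_2 : in  std_logic_vector(WORD_SIZE-1 downto 0);
    -- One bit wider than the exponent field so the carry is not lost
    exp_sum          : out std_logic_vector(EXP_SIZE downto 0)
  );
end entity exp_add_sketch;

architecture sketch of exp_add_sketch is
  -- Assumed layout: sign bit at WORD_SIZE-1, exponent directly below it
  signal exp_1, exp_2 : unsigned(EXP_SIZE-1 downto 0);
begin
  -- Each slice is exactly EXP_SIZE bits wide (6 downto 4 with the defaults)
  exp_1 <= unsigned(input_1(WORD_SIZE-2 downto WORD_SIZE-EXP_SIZE-1));
  exp_2 <= unsigned(input_2(WORD_SIZE-2 downto WORD_SIZE-EXP_SIZE-1));

  -- numeric_std "+" returns the width of the wider operand,
  -- so both operands are resized to EXP_SIZE+1 bits first
  exp_sum <= std_logic_vector(resize(exp_1, EXP_SIZE+1) + resize(exp_2, EXP_SIZE+1));
end architecture sketch;
```

With the default generics, both slices are 3 bits, the resized operands are 4 bits, and the 4-bit sum matches the 4-bit target.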

Width mismatch in assignment: VHDL

My code:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
library work;
use work.costanti.all;
entity Multiplier is
generic(nbA:integer:=nbA;
nbB:integer:=nbB);
port (
A: in STD_LOGIC_VECTOR(nbA-1 downto 0);
B: in STD_LOGIC_VECTOR(nbB-1 downto 0);
clk: in STD_LOGIC;
R: out STD_LOGIC_VECTOR(nbA+nbB-1 downto 0));
end Multiplier;
architecture Behavioral of Multiplier is
component AdderTree is
generic(nbit: integer:=nbA+nbB);
port (
IN1: in STD_LOGIC_VECTOR(nbit-1 downto 0);
IN2: in STD_LOGIC_VECTOR(nbit-1 downto 0);
IN3: in STD_LOGIC_VECTOR(nbit-1 downto 0);
IN4: in STD_LOGIC_VECTOR(nbit-1 downto 0);
IN5: in STD_LOGIC_VECTOR(nbit-1 downto 0);
IN6: in STD_LOGIC_VECTOR(nbit-1 downto 0);
IN7: in STD_LOGIC_VECTOR(nbit-1 downto 0);
IN8: in STD_LOGIC_VECTOR(nbit-1 downto 0);
IN9: in STD_LOGIC_VECTOR(nbit-1 downto 0);
S: out STD_LOGIC_VECTOR(nbit-1 downto 0)
);
end component;
signal V : STD_LOGIC_VECTOR(nbA-1 downto 0);
signal P : STD_LOGIC_VECTOR((nbA*nbB)-1 downto 0);
signal PP_0to6 : STD_LOGIC_VECTOR( (nbA)+(nbA+1)+(nbA+2)+(nbA+3)+(nbA+4)+(nbA+5)+(nbA+6)-1 downto 0); --(dim(pp0+PP1+PP2+PP3+PP4+PP5+PP6) downto 0 )
signal PP7 : STD_LOGIC_VECTOR(nbA+nbB-1 downto 0);
signal P7 : STD_LOGIC_VECTOR(nbA downto 0);
signal PPP : STD_LOGIC_VECTOR((nbA+nbB)*(nbB+1)-1 downto 0);
begin
for_g: for i in 0 to nbB-1 generate
V <= (others => B(i));
P((nbB)*(i)+(nbB-1) downto (nbB)*(i)) <= V and A;
end generate for_g;
P7 <= '0' & P((nbA*nbB)-1 downto (nbA*nbB)-1-(nbB-1));
PP_0to6(nbB-1 downto 0) <= P(nbB-1 downto 0); --PP0
for_g2: for i in 0 to nbB-3 generate
PP_0to6((nbB+1)*(i+1)+(i*(i+1)/2)+7 downto (nbB+1)*(i+1)+(i*(i+1)/2)) <= P(nbB*(i+1)+(nbB-1) downto nbB*(i+1)); --PP1 to PP6
PP_0to6((nbB+1)*(i+1)+(i*(i+1)/2)-1 downto (nbB+1)*(i)+((i-1)*(i)/2)+7+1) <= (others => '0');
end generate for_g2;
PP7(nbA+nbB-1 downto nbA-1) <= P7;
PP7(nbA-2 downto 0) <= (others => '0');
PPP_0to6: for i in 3 to nbB-2 generate
PPP(((i+1)*(nbA+nbB-1)+i)-(8-i) downto i*(nbA+nbB)) <= PP_0to6( (i+1)*(nbB-1)+((1/2)*((i*i)+(3*i))) downto i*(nbB)+(i-1)*i/2); --PP0 to PP6
PPP(((i+1)*(nbA+nbB-1)+i) downto ((i+1)*(nbA+nbB-1)+i)-(8-i)+1)<= (others => '0');
end generate PPP_0to6;
-- Fill last 32 bits of PPP
--Insert ADDER TREE
end Behavioral;
The portion of the code that produces the error:
PPP_0to6: for i in 0 to nbB-2 generate
PPP(((i+1)*(nbA+nbB-1)+i)-(8-i) downto i*(nbA+nbB)) <= PP_0to6( (i+1)*(nbB-1)+((1/2)*((i*i)+(3*i))) downto i*(nbB)+(i-1)*i/2); --PP0 to PP6
PPP(((i+1)*(nbA+nbB-1)+i) downto ((i+1)*(nbA+nbB-1)+i)-(8-i)+1)<= (others => '0');
end generate PPP_0to6;
Hi, I'm writing a multiplier in VHDL, but line 66 reports the following errors:
if i=1: [Synth 8-690] width mismatch in assignment; target has 9 bits, source has 7 bits ["...Multiplier.vhd":66]
if i=2: [Synth 8-690] width mismatch in assignment; target has 10 bits, source has 5 bits ["...Multiplier.vhd":66]
if i=3: [Synth 8-690] width mismatch in assignment; target has 11 bits, source has 2 bits ["...Multiplier.vhd":66]
and so on.
I can't understand why; the two sides seem to be the same size.
My constants are:
nbA=8
nbB=8
and the signal P, PP_0to6 and PPP:
signal P : STD_LOGIC_VECTOR((nbA*nbB)-1 downto 0);
signal PP_0to6 : STD_LOGIC_VECTOR( (nbA)+(nbA+1)+(nbA+2)+(nbA+3)+(nbA+4)+(nbA+5)+(nbA+6)-1 downto 0);
signal PPP : STD_LOGIC_VECTOR((nbA+nbB)*(nbB+1)-1 downto 0);
N.B. I make sure to shift to the right by adding zeros, as in the figure:
[figure: schema]
The error is here:
PPP(((i+1)*(nbA+nbB-1)+i)-(8-i) downto i*(nbA+nbB)) <= PP_0to6( (i+1)*(nbB-1)+((1/2)*((i*i)+(3*i))) downto i*(nbB)+(i-1)*i/2);
but if I substitute the values of i by hand:
i=0: PPP(7 downto 0) <= PP_0to6(7 downto 0);
i=1: PPP(24 downto 16)<=PP_0to6(16 downto 8)
i=2: PPP(41 downto 32)<=PP_0to6(26 downto 17)
i=3: PPP(58 downto 48)<=PP_0to6(37 downto 27)
...
...
the dimensions look the same.
Strictly speaking, this answer doesn't really answer your question, since I'm not trying to figure out where your error is. But I'm convinced that if you change your coding style you won't encounter such hard-to-debug errors any more.
As mentioned in my comments, your code will become much clearer and easier to debug if you split the signal up properly, i.e. don't create one giant signal for everything.
VHDL has arrays and records. Use them: they won't make your circuit any larger, but the code will be much easier to reason about.
It's been a while since I actually wrote VHDL, so the syntax below might contain typos, but hopefully the idea behind the code is clear:
constant c_AllZeros : std_logic_vector(c_MaxZeros - 1 downto 0) := (others => '0');
...
-- One partial product: a subtype constrains the existing std_logic_vector type
subtype t_P is std_logic_vector(c_SomeLength - 1 downto 0);
-- An array of partial products is a new type
type t_P_Array is array (natural range <>) of t_P;
...
signal P : t_P_Array(0 to c_NumInputs - 1);
...
PPP_0to6: for i in PPP'range generate
PP(i) <= P(i) & c_AllZeros(index downto 0);
PPP(i) <= c_AllZeros(c_MaxZeros - index downto 0) & PP(i);
end generate PPP_0to6;
As you might notice, I also got rid of the explicit indices for the for-loop in the generate. There's still a magic number when indexing the c_AllZeros constant to generate PPP. If I were writing this code, I'd replace that with some (calculated) constant with a meaningful name. This will make the code both more readable and trivial to change later on.
Note that there are other ways to do this. E.g. you could first set all bits of all PP signals to 0 and then assign the P value to a slice of them.
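For what it's worth, the widths in the reported errors (7, 5 and 2 bits for i = 1, 2, 3) are consistent with VHDL integer division: 1/2 on integers truncates to 0, so the (1/2)*((i*i)+(3*i)) term in the upper bound of the PP_0to6 slice vanishes. The simulation-only sketch below, with a made-up entity name int_div_check, recomputes the slice width both as written in the question and with the division applied after the multiplication.

```vhdl
-- Hypothetical simulation-only check; uses only predefined integer arithmetic
entity int_div_check is
end entity int_div_check;

architecture sim of int_div_check is
  constant nbB : integer := 8;  -- value given in the question
begin
  process
    variable upper, lower : integer;
  begin
    -- Integer division truncates toward zero
    assert 1/2 = 0 report "1/2 is expected to be 0 for integers" severity failure;

    for i in 1 to 3 loop
      lower := i*nbB + (i-1)*i/2;

      -- Upper bound as written in the question: (1/2)*... is always 0
      upper := (i+1)*(nbB-1) + ((1/2)*((i*i)+(3*i)));
      report "i=" & integer'image(i) & " as written: "
             & integer'image(upper - lower + 1) & " bits";

      -- Dividing last keeps the term (i*i + 3*i is always even)
      upper := (i+1)*(nbB-1) + ((i*i)+(3*i))/2;
      report "i=" & integer'image(i) & " dividing last: "
             & integer'image(upper - lower + 1) & " bits";
    end loop;
    wait;
  end process;
end architecture sim;
```

With the division applied last, the computed widths are 9, 10 and 11 bits, which matches the target widths in the error messages and the hand-substituted ranges in the question.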

Issue with using component and forloop in VHDL

I am trying to create a component for division in VHDL; my code is below. I don't know where I am going wrong. My logic is:
At every step,
• shift the divisor right and compare it with the current dividend
• if the divisor is larger, shift 0 in as the next bit of the quotient
• if the divisor is smaller, subtract to get the new dividend and shift 1 in as the next bit of the quotient.
I have used the '-' operator here, but in the actual design I have to use gates, so I will either use my subtraction component or build a subtractor here.
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
ENTITY divprog IS
PORT(
a: IN std_logic_vector(3 downto 0);
b: IN std_logic_vector(3 downto 0);
err: OUT std_logic;
reslow: OUT std_logic_vector(3 downto 0);
reshigh: OUT std_logic_vector(3 downto 0));
END divprog;
architecture behaviour of divprog is
signal ax,bx,bsub,res :std_logic_vector(7 downto 0) := (others => '0');
signal quo: std_logic_vector(3 downto 0) := (others => '0');
signal intcarry: std_logic_vector(8 downto 0):= (others => '0');
BEGIN
--sub1: subtractor PORT MAP(aa,bb,x,ss);
Process is
variable i : POSITIVE := 1;
BEGIN
ax <= "0000" & a;
bx <= b & "0000";
if(b > "0000") then
while (i <=3) loop
bx <= '0'&bx(7 downto 1);
IF (ax < bx) then
quo <= quo(2 downto 0)& '0';
--bx <= '0'&bx(7 downto 1);
res <=ax;
elsif(ax >= bx) then
res <= ax - bx;
quo <=quo(2 downto 0)& '1';
end if;
i := i + 1;
ax <= res;
end loop;
reshigh <= quo;
reslow <= res(3 downto 0);
end IF;
wait for 100 ns;
END PROCESS;
end behaviour;
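Can someone please help me with this?
Thanks
The functional problem is related to the variable i. A process variable is initialized only once, so i stays at 4 after the first pass and the while loop never runs again after the first 100 ns. It should be set to 1 in the process body (between BEGIN and END PROCESS), before the while loop.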
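A sketch of one way to restructure the process follows. It is not the asker's design: the entity name div_sketch and the ports quo_out and rem_out are made up, and it switches to numeric_std. It uses a for loop, whose loop parameter starts afresh on every activation, and it keeps the intermediate values in variables, because a signal assigned inside a process does not take its new value until the process suspends. It also runs four iterations, one per quotient bit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 4-bit shift-and-subtract divider, for illustration only
entity div_sketch is
  port (
    a, b    : in  std_logic_vector(3 downto 0);
    err     : out std_logic;                      -- '1' on division by zero
    quo_out : out std_logic_vector(3 downto 0);
    rem_out : out std_logic_vector(3 downto 0)
  );
end entity div_sketch;

architecture behaviour of div_sketch is
begin
  process (a, b)
    -- Variables update immediately, unlike signals
    variable rem_v : unsigned(7 downto 0);
    variable div_v : unsigned(7 downto 0);
    variable quo_v : unsigned(3 downto 0);
  begin
    rem_v := resize(unsigned(a), 8);   -- dividend in the low nibble
    div_v := unsigned(b) & "0000";     -- divisor in the high nibble
    quo_v := (others => '0');
    err   <= '0';

    if unsigned(b) /= 0 then
      for i in 1 to 4 loop             -- one iteration per quotient bit
        div_v := shift_right(div_v, 1);
        if rem_v >= div_v then
          rem_v := rem_v - div_v;
          quo_v := quo_v(2 downto 0) & '1';
        else
          quo_v := quo_v(2 downto 0) & '0';
        end if;
      end loop;
    else
      err <= '1';
    end if;

    quo_out <= std_logic_vector(quo_v);
    rem_out <= std_logic_vector(rem_v(3 downto 0));
  end process;
end architecture behaviour;
```

For example, with a = 13 and b = 3 the loop compares against 24, 12, 6 and 3 in turn, leaving a quotient of 4 and a remainder of 1.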

Specialized calculator using VHDL

I have to design a specialized calculator on a Basys3 board using VHDL. The calculator should be able to group numbers using brackets and perform additions, subtractions, and AND and OR operations. For example, an expression could be: 4 + 5 AND 6 +(7 OR 1) - (4 AND 10)
The input numbers are 4-bit numbers (in my code I used 5-bit numbers, the most significant bit being the sign bit) and the output can be at most 16 bits long (I used 17 bits in my code, the most significant being the sign bit).
I wrote the code for the ALU (the adder/subtractor, AND/OR) and I managed to make the calculator work for 2 numbers as inputs (using 2 in ports). This is the "main" code for the calculator that I have written:
library IEEE;
use ieee.STD_LOGIC_1164.all;
use ieee.STD_LOGIC_UNSIGNED.all;
entity calculator is
port(X: in STD_LOGIC_VECTOR(4 downto 0); -- X(4) sign
Y: in STD_LOGIC_VECTOR(4 downto 0);
OPERATIE: in STD_LOGIC_VECTOR(4 downto 0);
CLK, CLR: in STD_LOGIC;
a_to_g: out STD_LOGIC_VECTOR(6 downto 0);
an: out STD_LOGIC_VECTOR(3 downto 0);
negativ: out std_logic);
end calculator;
architecture calculator of calculator is
component ALU is
port(A,B: in STD_LOGIC_VECTOR(16 downto 0);
COMANDA: in STD_LOGIC_VECTOR(4 downto 0);
RESULT: out STD_LOGIC_VECTOR(16 downto 0));
end component;
component BCD_7seg is
port(X: in STD_LOGIC_VECTOR(15 downto 0);
CLK, CLR: in STD_LOGIC;
a_to_g: out STD_LOGIC_VECTOR(6 downto 0);
an: out STD_LOGIC_VECTOR(3 downto 0));
end component;
signal OPERAND_1: STD_LOGIC_VECTOR(16 downto 0) := (others => '0');
signal OPERAND_2: STD_LOGIC_VECTOR(16 downto 0) := (others => '0');
signal TEMP_RESULT: STD_LOGIC_VECTOR(16 downto 0) := (others => '0');
begin
operand_1(3 downto 0) <= x(3 downto 0);
operand_1(16) <= x(4);
operand_2(3 downto 0) <= y(3 downto 0);
operand_2(16) <= y(4);
calculate: ALU port map(operand_1, operand_2, operatie, temp_result);
afis: BCD_7seg port map(temp_result(15 downto 0), clk, clr, a_to_g, an);
negativ <= temp_result(16);
end calculator;
However, the calculator should work for N numbers as inputs (using only one in port) and I don't know how to do it. I thought about storing the whole expression (operators and operands) in a FIFO or LIFO memory (but I'm not sure whether that could work) and then performing the calculations, but I don't know how to calculate everything in the correct order given by the priorities, or where (and how) to store the temporary results.
I thought that maybe you could give me some ideas. I'm a student, I'm new to VHDL, and I got stuck at this part of the project.
Thanks!

Width mismatch: Variable in vector range for signal assignment. why and how to fix?

ISE 14.7 at synthesis returns the following warning on the subsequent line which eventually leads to an error:
"Width mismatch. <temp> has a width of 8 bits but assigned expression is 128-bit wide."
temp <= padding_start_s((((i_pad+1)*8)-1) downto (i_pad*8));
The problem seems to be with the for loop. What I am trying to do is pad an incoming signal that arrives as N 128-bit words. Eventually an incomplete 128-bit word is received, and I want to detect where it ends and then add padding. Some of the code is missing, but this should be the relevant part.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.NUMERIC_STD.all;
library work;
use work.keccak_globals.all;
entity Keccak_padder is
port (
clk_i : in std_logic;
data_i : in std_logic_vector(127 downto 0);
rst_n : in std_logic;
start_i : in std_logic;
end_i : in std_logic;
state_vector_o : out std_logic_vector(r-1 downto 0);
state_vector_valid_o : out std_logic;
long_message_o : out std_logic
);
end Keccak_padder;
architecture Behavioral of Keccak_padder is
signal word_count : integer range 1 to 16:=1;
signal pad_count : integer range 0 to 3:=0;
signal i_pad : integer range 0 to 15;
signal word_count : integer range 1 to 16:=1;
signal padding_start_s : std_logic_vector(127 downto 0):=(others=>'0');
signal temp : std_logic_vector(7 downto 0);
constant zero_vector : std_logic_vector(7 downto 0):=(others=>'0');
signal start_pad : std_logic;
process(clk_i, rst_n, fsm_state, pad_count, start_pad, padding_start_s)
begin
if rising_edge(clk_i) then
case fsm_state is
when IDLE =>
...
when TRANSMIT =>
...
when RECEIVE =>
if (pad_count = 1) then
state_vector_o((r-1-(data_i'length * (word_count - 1))) downto (r-(data_i'length * (word_count)))) <= temp;
pad_count <= 0;
fsm_state <= IDLE;
start_pad <= '0';
elsif (start_pad = '1') then
temp <= padding_start_s((((i_pad+1)*8)-1) downto (i_pad*8));
pad_count <= pad_count + 1;
end if;
for i in 15 downto 0 loop
if (padding_start_s((((i+1)*8)-1) downto ((i)*8)) = zero_vector) then
i_pad <= i;
start_pad <= '1';
exit;
end if;
end loop;
end case;
end if;
end process;
So eventually what I'm asking is: how do I find a way around this and why is this a problem? Is it wrong to be cutting the range in a signal assignment?
Thanks!
Without a Minimal, Complete, and Verifiable example an answer is hit or miss, and this is a synthesis issue rather than a VHDL syntax or semantics issue.
As Brian commented, the temp assignment is a 16:1 mux for an 8-bit-wide value, and it's possible to simplify the indexing, even more than Brian suggests:
type byte_array_16 is array (15 downto 0) of std_logic_vector (7 downto 0);
signal padding_bytes: byte_array_16;
begin
padding_bytes <= byte_array_16'(
padding_start_s(127 downto 120), padding_start_s(119 downto 112),
padding_start_s(111 downto 104), padding_start_s(103 downto 96),
padding_start_s( 95 downto 88), padding_start_s( 87 downto 80),
padding_start_s( 79 downto 72), padding_start_s( 71 downto 64),
padding_start_s( 63 downto 56), padding_start_s( 55 downto 48),
padding_start_s( 47 downto 40), padding_start_s( 39 downto 32),
padding_start_s( 31 downto 24), padding_start_s( 23 downto 16),
padding_start_s( 15 downto 8), padding_start_s( 7 downto 0)
);
TEST1: -- temp assignment expression
process
variable i_pad: integer range 0 to 15; -- hides signal i_pad inside this process
begin
for i in 0 to 15 loop
i_pad := i;
-- temp <= padding_start_s((((i_pad + 1) * 8) - 1) downto (i_pad * 8));
temp <= padding_bytes(i_pad);
wait for 0 ns; -- temp assignment takes effect next delta cycle
end loop;
report "Test 1, temp assignment, no bounds errors";
wait;
end process;
The assignment to padding_bytes works like a union in C, except that it only goes one way. It also adds no hardware burden.
So the i_pad value determination is a priority encoder working from one end, fed by a bank of byte recognizers that compare each byte to the constant zero_vector. Those 16 recognizers (the for loop gets unrolled in synthesis) are optimized to simply look for all '0's.
Besides the recognizers, you have a 16-to-4 priority encoder producing i_pad and start_pad, which indicate which recognizer, if any, found all '0's.
What's hairy is all the arithmetic in selecting the inputs to the recognizers. You can fix that with the same one-way union:
FIND_FIRST_ZERO_BYTE:
process
begin
start_pad <= '0';
for i in 15 downto 0 loop
if padding_bytes(i) = zero_vector then
i_pad <= i;
start_pad <= '1';
exit;
end if;
end loop;
wait;
end process;
And that eliminates a whole heck of a lot of arithmetic required because i_pad is a signal.
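If typing out the 16-element aggregate feels error-prone, the same byte view can probably be built with a generate statement, where every slice bound is static. This is a sketch only: the entity name byte_view and its port list are made up for illustration, and it assumes byte i should map to bits 8*i+7 downto 8*i, which is the mapping the positional aggregate above produces.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper entity, only here to make the sketch self-contained
entity byte_view is
  port (
    padding_start_s : in  std_logic_vector(127 downto 0);
    i_pad           : in  integer range 0 to 15;
    temp            : out std_logic_vector(7 downto 0)
  );
end entity byte_view;

architecture sketch of byte_view is
  type byte_array_16 is array (15 downto 0) of std_logic_vector(7 downto 0);
  signal padding_bytes : byte_array_16;
begin
  -- The generate parameter is static, so each slice has constant bounds
  gen_bytes : for i in 15 downto 0 generate
    padding_bytes(i) <= padding_start_s(8*i + 7 downto 8*i);
  end generate gen_bytes;

  -- The only dynamic selection left is a plain array index (a 16:1 byte mux)
  temp <= padding_bytes(i_pad);
end architecture sketch;
```

The design choice is the same as in the answer: the variable index picks an array element instead of computing slice bounds at run time.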

Adding Even Parity bit and 2 stop bits to a 8 bits std_logic_vector

Here is the code. The parity-bit calculation is not done yet. The parity bit can be calculated using a for loop, but is there a shorter or better way to calculate the even parity bit in this context?
Is it possible to use arrays instead of eight TxDataReg std_logic_vectors, given that I then want to access the array of 8 signals of 8 bits bit by bit to send the data on the uart_tx port?
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.STD_LOGIC_SIGNED.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.NUMERIC_STD.ALL;
entity Uart_tx is
Port (
tx_clk_in : in STD_LOGIC;
reset : in STD_LOGIC;
tx : out STD_LOGIC;
Rx_Data_in : in STD_LOGIC_VECTOR(63 downto 0)
);
end Uart_tx;
architecture Behavioral of Uart_tx is
signal Tx_Data : STD_LOGIC_VECTOR(63 downto 0) := "00000000";
signal DataByteArray1 : std_logic_vector(7 downto 0) := (others => "00000000");
signal DataByteArray2 : std_logic_vector(7 downto 0) := (others => "00000000");
signal DataByteArray3 : std_logic_vector(7 downto 0) := (others => "00000000");
signal DataByteArray4 : std_logic_vector(7 downto 0) := (others => "00000000");
signal DataByteArray5 : std_logic_vector(7 downto 0) := (others => "00000000");
signal DataByteArray6 : std_logic_vector(7 downto 0) := (others => "00000000");
signal DataByteArray7 : std_logic_vector(7 downto 0) := (others => "00000000");
signal DataByteArray8 : std_logic_vector(7 downto 0) := (others => "00000000");
signal TxDataReg1 : std_logic_vector(10 downto 0) := (others => "00000000");
signal TxDataReg2 : std_logic_vector(10 downto 0) := (others => "00000000");
signal TxDataReg3 : std_logic_vector(10 downto 0) := (others => "00000000");
signal TxDataReg4 : std_logic_vector(10 downto 0) := (others => "00000000");
signal TxDataReg5 : std_logic_vector(10 downto 0) := (others => "00000000");
signal TxDataReg6 : std_logic_vector(10 downto 0) := (others => "00000000");
signal TxDataReg7 : std_logic_vector(10 downto 0) := (others => "00000000");
signal TxDataReg8 : std_logic_vector(10 downto 0) := (others => "00000000");
signal count : unsigned(2 downto 0) := (others => '0');
signal one_bit : std_logic := '0';
begin
Tx_Data <= Rx_Data_in;
DataByteArray1 <= Rx_Data_in(7 downto 0);
DataByteArray2 <= Rx_Data_in(15 downto 8);
DataByteArray3 <= Rx_Data_in(23 downto 16);
DataByteArray4 <= Rx_Data_in(31 downto 24);
DataByteArray5 <= Rx_Data_in(39 downto 32);
DataByteArray6 <= Rx_Data_in(47 downto 40);
DataByteArray7 <= Rx_Data_in(55 downto 48);
DataByteArray8 <= Rx_Data_in(63 downto 56);
Process (tx_clk_in)
begin
-- Calculate the parity bit
for i in 0 to 7 loop
one_bit = DataByteArray1(i);
if one_bit = '1' then
count = count + 1;
end if;
end loop;
-- For all the registers,one even parity & two stop bits I am trying to add in the end
if count mod 2 = 0 then
TxDataReg1 <= DataByteArray1&'0'&'11'; -- I am not so sure that this works or not
count <= "000";
else
TxDataReg1 <= DataByteArray1&'1'&'11';
count <= "000";
end if;
-- Send the uart data from TxDataReg1,TxDataReg2 ...
-- etc.
end process;
end behavioral;
This UART would be much easier to understand if you created a State Machine. State Machines give your code an organized flow. The flow just makes more sense. In VHDL you can create enumerated states which means that you can give them names. I recommend this approach.
It's much harder to keep counters throughout your design to know exactly when to insert the parity bit or when to insert the 2 stop bits in your UART design. If you have a nice state machine it will make much more sense to you I believe. This is especially recommended for anyone new at FPGAs.
When you calculate your parity, just keep a running parity bit that gets an XOR with the outgoing serial data. Create a state to insert your parity bit at the correct time, then insert your two stop bits.
For an example of this, look at this UART VHDL Code
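As a rough sketch of the state-machine approach described above (not the linked code), a transmitter for one byte with even parity and two stop bits might look like the following. The entity name uart_tx_sketch, the start and busy ports, and the state names are all made up for illustration. It assumes the clock already runs at one cycle per bit and that data is sent LSB first.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical one-byte transmitter: start, 8 data bits, even parity, 2 stop bits
entity uart_tx_sketch is
  port (
    clk   : in  std_logic;                     -- assumed: one clock cycle per bit
    reset : in  std_logic;                     -- synchronous, active high
    start : in  std_logic;                     -- hypothetical strobe: send data when idle
    data  : in  std_logic_vector(7 downto 0);
    tx    : out std_logic := '1';              -- line idles high
    busy  : out std_logic
  );
end entity uart_tx_sketch;

architecture rtl of uart_tx_sketch is
  type state_t is (IDLE, START_BIT, DATA_BITS, PARITY_BIT, STOP_1, STOP_2);
  signal state   : state_t := IDLE;
  signal shreg   : std_logic_vector(7 downto 0) := (others => '0');
  signal bit_cnt : integer range 0 to 7 := 0;
  signal parity  : std_logic := '0';
begin
  busy <= '0' when state = IDLE else '1';

  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        state <= IDLE;
        tx    <= '1';
      else
        case state is
          when IDLE =>
            tx <= '1';
            if start = '1' then
              shreg   <= data;
              parity  <= '0';                  -- '0' start value gives even parity
              bit_cnt <= 0;
              state   <= START_BIT;
            end if;
          when START_BIT =>
            tx    <= '0';
            state <= DATA_BITS;
          when DATA_BITS =>
            tx     <= shreg(0);                -- LSB first
            parity <= parity xor shreg(0);     -- running parity of the bits sent
            shreg  <= '0' & shreg(7 downto 1);
            if bit_cnt = 7 then
              state <= PARITY_BIT;
            else
              bit_cnt <= bit_cnt + 1;
            end if;
          when PARITY_BIT =>
            tx    <= parity;
            state <= STOP_1;
          when STOP_1 =>
            tx    <= '1';
            state <= STOP_2;
          when STOP_2 =>
            tx    <= '1';
            state <= IDLE;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

A real design would normally derive the bit timing from a faster system clock with a baud-rate counter; that part is left out here to keep the state flow visible.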
I would second the suggestion to reorganize this to use an FSM that works on just a byte at a time. Then you will have a general purpose async. TX entity that another controller can send bytes to as needed.
As to managing your data. It would be simpler if you created an array of byte arrays:
subtype byte is std_logic_vector(7 downto 0);
type byte_array is array(natural range <>) of byte;
signal data_byte_array : byte_array(0 to 7); -- 0-based so the 3-bit index reaches every element
signal byte_index : unsigned(2 downto 0);
signal cur_byte : byte;
...
-- Select the current byte
cur_byte <= data_byte_array(to_integer(byte_index));
The subtype isn't strictly necessary but it is a good habit to use for common data types to save you from littering your code with so many hard-coded array bounds.
For calculating parity you need to adopt the hardware mindset of implementing logic gates rather than the software approach of counting set bits. Parity calculation boils down to an XOR-reduce operation applied to all the bits in your vector. For even parity, you XOR all bits. For odd parity, you XOR all bits and invert the result. Because XOR is equivalent to a controlled inversion you can select the parity type by setting an initial state and performing one extra XOR to get the optional inversion based on your desire for odd or even.
-- Any VHDL:
variable parity : std_logic;
parity := '0'; -- Set to '1' to get odd parity
for i in cur_byte'range loop
parity := parity xor cur_byte(i);
end loop;
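-- Pre-2008, with a tool-supplied reduction package (not part of the IEEE standard; the package name varies by tool)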
use ieee.reduce_pack.xor_reduce;
parity := xor_reduce(cur_byte);
-- VHDL-2008
parity := xor cur_byte;
In synthesis these approaches all boil down to the same logic so any of them is fine for all practical purposes. This is an explicitly parallel operation and you don't have to step through the byte bitwise with the unneeded overhead of a counter.
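To tie this back to the frame in the question, the loop version can be wrapped in a function and the result concatenated with the stop bits. This is a sketch under assumptions: the entity name parity_frame and the function name even_parity are made up, and the bit order within the 11-bit frame simply mirrors the question's DataByteArray & parity & stop-bits layout. Note that two stop bits are written as the string literal "11", not '11'.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: builds data & even parity & two stop bits
entity parity_frame is
  port (
    cur_byte : in  std_logic_vector(7 downto 0);
    frame    : out std_logic_vector(10 downto 0)
  );
end entity parity_frame;

architecture sketch of parity_frame is
  -- XOR-reduce over all bits; works in any VHDL revision
  function even_parity(v : std_logic_vector) return std_logic is
    variable p : std_logic := '0';   -- start from '1' instead for odd parity
  begin
    for i in v'range loop
      p := p xor v(i);
    end loop;
    return p;
  end function even_parity;
begin
  -- 8 data bits + 1 parity bit + 2 stop bits = 11 bits
  frame <= cur_byte & even_parity(cur_byte) & "11";
end architecture sketch;
```

Because the function is purely combinational, no counter or clocked process is needed for the parity itself.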
You have committed the cardinal sin of mixing the non-standard Synopsys packages std_logic_unsigned, std_logic_signed, and std_logic_arith with the true standard numeric package numeric_std. Never mix them in the same file and, better yet, never use the Synopsys packages at all. They are a historical aberration best forgotten.
