Warning "has no load", but I can't see why - vhdl

I got these warnings from Lattice Diamond for each instance of any uart (currently 11)
WARNING - ngdbuild: logical net 'UartGenerator_0_Uart_i/Uart/rxCounter_cry_14' has no load
WARNING - ngdbuild: logical net 'UartGenerator_0_Uart_i/Uart/rxCounter_cry_0_COUT1_9_14' has no load
WARNING - ngdbuild: logical net 'UartGenerator_0_Uart_i/Uart/rxCounter_cry_12' has no load
WARNING - ngdbuild: logical net 'UartGenerator_0_Uart_i/Uart/rxCounter_cry_10' has no load
WARNING - ngdbuild: logical net 'UartGenerator_0_Uart_i/Uart/rxCounter_cry_8' has no load
WARNING - ngdbuild: logical net 'UartGenerator_0_Uart_i/Uart/rxCounter_cry_6' has no load
WARNING - ngdbuild: logical net 'UartGenerator_0_Uart_i/Uart/rxCounter_cry_4' has no load
WARNING - ngdbuild: logical net 'UartGenerator_0_Uart_i/Uart/rxCounter_cry_2' has no load
WARNING - ngdbuild: logical net 'UartGenerator_0_Uart_i/Uart/rxCounter_cry_0' has no load
The VHDL-code is
entity UART is
generic (
dividerCounterBits: integer := 16
);
port (
Clk : in std_logic; -- Clock signal
Reset : in std_logic; -- Reset input
ClockDivider: in std_logic_vector(15 downto 0);
ParityMode : in std_logic_vector(1 downto 0); -- b00=No, b01=Even, b10=Odd, b11=UserBit
[...]
architecture Behaviour of UART is
constant oversampleExponent : integer := 4;
subtype TxCounterType is integer range 0 to (2**(dividerCounterBits+oversampleExponent))-1;
subtype RxCounterType is integer range 0 to (2**dividerCounterBits)-1;
signal rxCounter: RxCounterType;
signal txCounter: TxCounterType;
signal rxClockEn: std_logic; -- clock enable signal for receiver
signal txClockEn: std_logic; -- clock enable signal for transmitter
begin
rxClockdivider:process (Clk, Reset)
begin
if Reset='1' then
rxCounter <= 0;
rxClockEn <= '0';
elsif Rising_Edge(Clk) then
-- RX counter (oversampled)
if rxCounter = 0 then
rxClockEn <= '1';
rxCounter <= to_integer(unsigned(ClockDivider));
else
rxClockEn <= '0';
rxCounter <= rxCounter - 1;
end if;
end if;
end process;
txClockDivider: process (Clk, Reset)
[...]
rx: entity work.RxUnit
generic map (oversampleFactor=>2**oversampleExponent)
port map (Clk=>Clk, Reset=>Reset, ClockEnable=>rxClockEn, ParityMode=>ParityMode,
ReadA=>ReadA, DataO=>DataO, RxD=>RxD, RxAv=>RxAv, ParityBit=>ParityBit,
debugout=>debugout
);
end Behaviour;
This is a single Uart, to create them all (currently 11 uarts) I use this
-- UARTs
UartGenerator: For i IN 0 to uarts-1 generate
begin
Uart_i : entity work.UartBusInterface
port map (Clk=>r_qclk, Reset=>r_reset,
cs=>uartChipSelect(i), nWriteStrobe=>wr_strobe, nReadStrobe=>rd_strobe,
address=>AdrBus(1 downto 0), Databus=>DataBus,
TxD=>TxD_PAD_O(i), RxD=>RxD_PAD_I(i),
txInterrupt=>TxIRQ(i), rxInterrupt=>RxIRQ(i), debugout=>rxdebug(i));
uartChipSelect(i) <= '1' when to_integer(unsigned(adrbus(5 downto 2)))=i+4 and r_cs0='0' else '0';
end generate;
I can syntesis it and the uarts work, but why I got the warning?
IMHO the rxCounter should use each single possible value, but why each second bit creates the warning "has no load"?
I read somewhere that this mean that these net's aren't used and will be removed.
But to count from 0 to 2^n-1, I need no less than n-bits.

This warning means that nobody is "listening" to those nets.
It is OK to have signals that will be removed in synthesis. Warnings are not Errors! You just need to be aware of them.
We cannot assess what is happening from your partial code.
Is there a signal named rxCounter_cry?
What is the datatype of ClockDivider?
What is the value of dividerCounterBits?
What happens in the other process? If it is irrelevant, please try to run your synthesis without that process. If it is relevant, we need to see it.

Lattice ngdbuild is particularly spammy for the job it is doing, I pipe ngdbuild output through grep in my makefile to remove exactly these messages:
ngdbuild ... | grep -v "ngdbuild: logical net '.*' has no load"
There's more than 2500 of these otherwise, eliminating them helps concentrate on real issues.
Second worst toolchain spammer is edif2ngd complaining about Verilog parameters it does not have explicit handling for. This one is a two line message (over 300 of these) so I remove it with:
edif2ngd ... | sed '/Unsupported property/{N;d;}'

Just be aware that sometimes it implements things with adders. The highest order bit will not use the carry output of that adder, and the lowest order bit will not use the sign input. So you get a warning like:
WARNING - synthesis: logical net 'clock_chain/dcmachine/count_171_add_4_1/S0' has no load
WARNING - synthesis: logical net 'clock_chain/dcmachine/count_171_add_4_19/CO' has no load
No problem, bit 19 is the highest, so it will not carry anywhere, and bit 1 is the lowest, so it does not get a sign bit from anywhere. If, however, you get this warning on any of the bits in between highest and lowest, it normally means something is wrong, but not an error, so it will build something that "works" when you test it, but not in an error case. If you simulate it with error cases it will normally show undesirable results.

Related

How generate sine wave with vhdl?

I am a beginner in vhdl, I am trying to generate a sinus and square singal with a frequency of 50 Mhz, but first i'm trying to generate the sinus wave. I saw a lot of tutorials but it was quite complicated to understand. Here is the code I made. Thank you in advance for your help :)
Indications
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
entity sinus is
port(clk : in std_logic;
clear : in std_logic;
sel : in std_logic_vector(1 downto 0);
Dataout : out std_logic_vector(7 downto 0));
end sinus;
architecture Behavioral of sinus is
signal in_data : std_logic_vector(Dataout'range);
signal i : integer range 0 to 77:=0;
TYPE mem_data IS ARRAY (0 TO 255) OF integer range -128 to 127;
constant sin : mem_data := (
( 0),( 3),( 6),( 9),( 12),( 15),( 18),( 21),( 24),( 28),( 31),( 34),( 37),( 40),( 43),( 46), ( 48),( 51),( 54),( 57),( 60),( 63),( 65),( 68),( 71),( 73),( 76),( 78),( 81),( 83),( 85),( 88), ( 90),( 92),( 94),( 96),( 98),( 100),( 102),( 104),( 106),( 108),( 109),( 111),( 112),( 114),( 115),( 117), ( 118),( 119),( 120),( 121),( 122),( 123),( 124),( 124),( 125),( 126),( 126),( 127),( 127),( 127),( 127),( 127), ( 127),( 127),( 127),( 127),( 127),( 127),( 126),( 126),( 125),( 124),( 124),( 123),( 122),( 121),( 120),( 119), ( 118),( 117),( 115),( 114),( 112),( 111),( 109),( 108),( 106),( 104),( 102),( 100),( 98),( 96),( 94),( 92), ( 90),( 88),( 85),( 83),( 81),( 78),( 76),( 73),( 71),( 68),( 65),( 63),( 60),( 57),( 54),( 51), ( 48),( 46),( 43),( 40),( 37),( 34),( 31),( 28),( 24),( 21),( 18),( 15),( 12),( 9),( 6),( 3), ( 0),( -3),( -6),( -9),( -12),( -15),( -18),( -21),( -24),( -28),( -31),( -34),( -37),( -40),( -43),( -46), ( -48),( -51),( -54),( -57),( -60),( -63),( -65),( -68),( -71),( -73),( -76),( -78),( -81),( -83),( -85),( -88), ( -90),( -92),( -94),( -96),( -98),(-100),(-102),(-104),(-106),(-108),(-109),(-111),(-112),(-114),(-115),(-117), (-118),(-119),(-120),(-121),(-122),(-123),(-124),(-124),(-125),(-126),(-126),(-127),(-127),(-127),(-127),(-127), (-127),(-127),(-127),(-127),(-127),(-127),(-126),(-126),(-125),(-124),(-124),(-123),(-122),(-121),(-120),(-119), (-118),(-117),(-115),(-114),(-112),(-111),(-109),(-108),(-106),(-104),(-102),(-100),( -98),( -96),( -94),( -92), ( -90),( -88),( -85),( -83),( -81),( -78),( -76),( -73),( -71),( -68),( -65),( -63),( -60),( -57),( -54),( -51), ( -48),( -46),( -43),( -40),( -37),( -34),( -31),( -28),( -24),( -21),( -18),( -15),( -12),( -9),( -6),( -3));
begin
process(clk, clear) begin
if (clear='1') then
in_data <= (others => '0');
elsif (clk'event and clk='1') then
in_data <= in_data +1;
end if;
end process;
process (in_data(3))
begin
if (in_data(3)'event and in_data(3)='1') then
in_data <=conv_std_logic_vector(sin(i).8);
i<=i+1;
if (i=77) then
i<=0;
end if;
end if;
end process;
process(in_data, sel) begin
case sel is
when "00" =>Dataout<=in_data;
when others =>Dataout<= "00000000";
end case;
end process;
end Behavioral;
Thanks for the diagram. Because VHDL is hardware description language, the question you should ask yourself is : what do I want to implement ? where are the entity ports, the internal signals on this block diagram ?
The main issue you face is that you've no real correspondence between your design and your illustration. Moreover your VHDL coding style needs improvement both regarding the VHDL language itself and how you implement things.
Start by replacing
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
by
use IEEE.NUMERIC_STD.ALL;
This is the official vendor-agnostic VHDL package for signed and unsigned types. IEEE.STD_LOGIC_UNSIGNED and STD_LOGIC_ARITH.ALL are outdated vendor-dependent VHDL packages provided at a time where vendors couldn't agree on a common definition. This led to a lot of portability issues when having a design flow with tools from different vendors (Cadence, Mentor, Synopsys). The IEEE specification body solved once and for all this issue.
If you're designing real hardware, the following code can lead to implementation problems
process(clk, clear) begin
if (clear='1') then
in_data <= (others => '0');
elsif (clk'event and clk='1') then
in_data <= in_data +1;
end if;
end process;
Your in_data signal is asynchronously cleared but synchronously set. As you don't control when the clear signal will be asserted relative to the clk rising edge, there is a risk of in_data metastability or even that some bits of the in_data remains in reset while the others capture a new value.
For a description of the various ways to implement the reset Xilinx has a well documented white paper (WP272) which provides some guidelines useful for any synchronous design be it ASIC or FPGA, Xilinx or Altera. By the way if you have a look, for example, to the widely popular AMBA specifications (AHB, AXI, AXI-Stream, ...), their reset is asserted asynchronously but deasserted synchronously.
Personnally, following Xilinx guidelines, I distribute an asynchronous reset through the entire design and generate locally a synchronous reset with the following piece of code (being locally avoid the fanout issue of a system-wide synchronous reset)
if (p_reset_n = '0') then
s_reset_on_clock <= (others => '1');
else if rising_edge(p_clock) then
s_reset_on_clock <= '0' & s_reset_on_clock(3 downto 1);
end if;
end if;
s_reset_on_clock(0) is now your local synchronous reset signal that you can use like any other signal within the synchronous code block.
Please replace the old fashioned
elsif (clk'event and clk='1') then
with
elsif rising_edge(clk) then
it will be more obvious what's going on here.
You say that you want to generate a 50MHz signal but you don't say what the system frequency (or maybe I should understand it the other way around). The ratio system clock frequency / 50MHz will give you the sequence length.
Anyway, you will need to declare a free running counter as signal, let's call it counter (in your original code you have two signals, i and in_data, for this same purpose), and reset/increment it using your locally synchronous reset and your clock signal.
In case of a square signal (let's say sel = '0'), your output will be solely determined by the value of your counter. Below a given counter value, you'll have a predetermined dataout value (your 'low' state), while above this value, you'll have another predetermined dataout value (your 'high' state).
In case of a sine signal (let's say sel = '1'), the counter will represent the phase and will be used as input to your sine lookup table (that you, by the way, could initialize with a VHDL function calculating the lookup table content instead of providing precalculated literals). The output of the sine lookup table is your dataout output.
Let me know if you need more help.

How to read text file throuGH UART?

I have a VHDL module for a UART component which sends and receives serial data between an FPGA and PC. It currently works just fine. But how would I use this serial communication to interpret a 2-d matrix of integers in a text file sent from the PC to the FPGA?
More specifically once the text file is sent from the PC to the fpga, how would the 2-d array be stored in memory as such? I am not sure how to do this in vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity uart is
generic(
-- Default setting:
-- 19,200 baud, 8 data bis, 1 stop its, 2^2 FIFO
DBIT: integer:=8; -- # data bits
SB_TICK: integer:=16; -- # ticks for stop bits, 16/24/32
-- for 1/1.5/2 stop bits
DVSR: integer:= 326; -- baud rate divisor
-- DVSR = 100M/(16*baud rate)
DVSR_BIT: integer:=9; -- # bits of DVSR
FIFO_W: integer:=2 -- # addr bits of FIFO
-- # words in FIFO=2^FIFO_W
);
port(
clk, reset: in std_logic;
rd_uart, wr_uart: in std_logic;
rx: in std_logic;
w_data: in std_logic_vector(7 downto 0);
tx_full, rx_empty: out std_logic;
r_data: out std_logic_vector(7 downto 0);
tx: out std_logic
);
end uart;
architecture str_arch of uart is
signal tick: std_logic;
signal rx_done_tick: std_logic;
signal tx_fifo_out: std_logic_vector(7 downto 0);
signal rx_data_out: std_logic_vector(7 downto 0);
signal tx_empty, tx_fifo_not_empty: std_logic;
signal tx_done_tick: std_logic;
begin
baud_gen_unit: entity work.mod_m_counter(arch)
generic map(M=>DVSR, N=>DVSR_BIT)
port map(clk=>clk, reset=>reset,
q=>open, max_tick=>tick);
uart_rx_unit: entity work.uart_rx(arch)
generic map(DBIT=>DBIT, SB_TICK=>SB_TICK)
port map(clk=>clk, reset=>reset, rx=>rx,
s_tick=>tick, rx_done_tick=>rx_done_tick,
dout=>rx_data_out);
fifo_rx_unit: entity work.fifo(arch)
generic map(B=>DBIT, W=>FIFO_W)
port map(clk=>clk, reset=>reset, rd=>rd_uart,
wr=>rx_done_tick, w_data=>rx_data_out,
empty=>rx_empty, full=>open, r_data=>r_data);
fifo_tx_unit: entity work.fifo(arch)
generic map(B=>DBIT, W=>FIFO_W)
port map(clk=>clk, reset=>reset, rd=>tx_done_tick,
wr=>wr_uart, w_data=>w_data, empty=>tx_empty,
full=>tx_full, r_data=>tx_fifo_out);
uart_tx_unit: entity work.uart_tx(arch)
generic map(DBIT=>DBIT, SB_TICK=>SB_TICK)
port map(clk=>clk, reset=>reset,
tx_start=>tx_fifo_not_empty,
s_tick=>tick, din=>tx_fifo_out,
tx_done_tick=> tx_done_tick, tx=>tx);
tx_fifo_not_empty <= not tx_empty;
end str_arch;
UART is just a communication protocol and as such it is totally clueless about the meaning of the received data. What you can do is interpret the data on the fly instead of doing that later, but I would still advise you do that in a separate module.
The easiest way, if applicable, is knowing a priori the matrix size and/or moving the issue software-side in various flavors.
You can hardwire the known dimensions in your design (e.g. N-by-4 matrix format, hardwire 4 columns) to simplify stuff, but the most generic way of doing things is having the PC doing the work for you (if none of the dimensions is known a priori you cannot work out the size from the number of entries).
You could, e.g., instruct the PC to send something like this
NumberRows
NumberColumns
Value[0][0]
Value[0][1]
.
.
Value[NumberRows-1][NumberColumns-1]
Now you can just save everything you receive in memory and you know where to look at the number of rows and columns and proceed from there.
If you cannot make the PC send anything else than the pure text file, you will have a stream of ASCII characters you have to parse locally. My advice would be to design a module that stores anything up to the separator and upon detecting a separator starts the conversion from ASCII decimal to binary of whatever it has in the buffer, then saves it in memory. Upon separator it should also increment a counter so that when a newline arrives you know the number of columns, while on newline it should increment another counter so that on EOF you know the number of rows.

wait statement must contain condition clause with UNTIL keyword

The following VHDL is to be used to test bench. I keep getting an error on the first wait statement during analysis : "wait statement must contain condition clause with UNTIL keyword" I have several working test benches written this way. I can't seem to find what the error might be.
`library IEEE;
USE IEEE.std_logic_1164.all;
entity case_ex_TB is end;
architecture simple_test of case_ex_TB is
--- DUT Component Declaration ---
component case_ex
port(
clk, rstN: IN std_logic;
color: OUT std_logic_vector(2 downto 0));
end component;
--- Signals Declaration ---
signal rst, clock: std_logic:='0';
signal color: std_logic_vector(2 downto 0);
begin
DUT: case_ex --- DUT instantiation ---
port map (clk => clock,
rstN => rst,
color => color);
--- Signal's Waves Creation ---
rst <= '1','0' after 50 ns, '1' after 2 us;
clock_crtate: process
begin
while rst = '0' loop
clock <= '1','0' after 50 ns;
wait for 100 ns;
end loop;
clock <= '1';
wait;
end process;
end simple_test;`
You get this error because you have set your testbench as the top-level entity in Quartus-II. The top-level entity must remain the component case_ex, and this component must contain synthesizable code.
To simulate your testbench, you must configure a testbench. Just klick on the plus-sign before "RTL Simulation" and then "Edit Settings". (Names may differ with Quartus version).
Another thing you could note is that, it's necessary to put this file in as a simulation, following the path:
Assignments -> settings -> EDA tool settings -> simulation
and naturally, changing adding a testbench.
Another thing to note is that, if you want to change the top level entity, you just need to follow in Quartus:
Project -> Set as top level entity (in the file you are in)
Also, in the Project navigator, be sure that don't miss the fact that you need every single file you need in order for your program to work.

Array of 1-bit-wide memory

I'm using ISE 14.7 and i'm trying to create design with some 1-bit-wide distributed RAM blocks.
My memory declaration:
type tyMemory is array (0 to MEMORY_NUM - 1) of std_logic_vector(MEMORY_SIZE - 1 downto 0) ;
signal Memory: tyMemory := ( others => (others => '0')) ;
attribute ram_style : string ;
attribute ram_style of Memory : signal is "distributed" ;
My code:
MemoryGen : for i in 0 to MEMORY_NUM - 1 generate
process( CLK )
begin
if rising_edge(CLK) then
if CE = '1' then
DataOut(i) <= Memory(i)(Addr(i)) ;
Memory(i)(Addr(i)) <= DataIn(i) ;
end if ;
end if ;
end process ;
end generate ;
After synthesis i get this warning:
WARNING:Xst:3012 - Available block RAM resources offer a maximum of two write ports.
You are apparently describing a RAM with 16 separate write ports for signal <Memory>.
The RAM will be expanded on registers.
How can i forse xst to use distributed memory blocks with size=ARRAY_LENGTH and width=1?
I can create and use separate memory component(and is works), but i need more elegant solution.
You need to create an entity that describes a variable length 1-bit wide memory, then use a generate statement to create an array of these. While what you have done would provide the functionality you are asking for in a simulator, most FPGA tools will only extract memory elements if your code is written in certain ways.
You can find documentation on what code Xilinx ISE tools will understand as a memory element by selecting the appropriate document for your device here http://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/ise_n_xst_user_guide_v6s6.htm . Look under 'HDL Coding techniques'.
Note that if your memory length is large, you will not be able to get maximum performance from it without adding manual pipelining. I think you will get a synthesis message if your memory exceeds the intended useful maximum length for distributed memories. Assuming you are using a Spartan 6 device, you can find information on what the useful supported distributed memory sizes are here: http://www.xilinx.com/support/documentation/user_guides/ug384.pdf page 52.
This should inferre 16 one bit wide BlockRAMs:
architecture ...
attribute ram_style : string;
subtype tyMemory is std_logic_vector(MEMORY_SIZE - 1 downto 0) ;
begin
genMem : for i in 0 to MEMORY_NUM - 1 generate
signal Memory : tyMemory := (others => '0');
attribute ram_style of Memory : signal is "block";
begin
process(clk)
begin
if rising_edge(clk) then
if CE = '1' then
Memory(Addr(i)) <= DataIn(i) ;
DataOut(i) <= Memory(Addr(i)) ;
end if ;
end if ;
end process ;
end generate ;
end architecture;

my program is to generate a code for calculating 4 point fft

I have written the code in three parts. The first part contains functions such as add, subtract and negate. The second part is the butterfly structure. The third part is fft4.
I find some errors in butterfly part, please help to solve these.
The program is given below. In fft_pkg instead of multiplication by -j, I used negation function.
I avoided multiplication by 1 and -j and used addition and negate function.
-- Design Name:
-- Module Name: butterfly - Behavioral
-- Project Name:
-- Target Devices:
-- Tool versions:
-- Description:
--
-- Dependencies:
--
-- Revision:
-- Revision 0.01 - File Created
-- Additional Comments:
-------------------------------------------------------------
library ieee;
use IEEE.STD_LOGIC_1164.ALL;
library work;
use work.fft_pkg.all;
entity butterfly is
port (
s1,s2 : in complex;
stage : in std_logic; -- inputs
-- w : in complex -- phase factor
g1,g2 : out complex -- outputs
);
end butterfly;
architecture Behavioral of butterfly is
begin
--butterfly equations.
if ( stage ='0') then
g1 <= add(s1,s2);
g2 <= sub(s1,s2);
elsif (stage ='1') then
g1 <= add(s1,negate(s2));
g2 <= sub(s1,negate(s2));
end if;
end Behavioral;
--butterfly structure(in it instead of multiplication negate func is used)
library ieee;
use IEEE.STD_LOGIC_1164.ALL;
library work;
use work.fft_pkg.all;
entity butterfly is
port (
s1,s2 : in complex;
stage : in std_logic; -- inputs
-- w : in complex; -- phase factor
g1,g2 :out complex -- outputs
);
end butterfly;
architecture Behavioral of butterfly is
begin
--butterfly equations.
if ( stage ='0') then
g1 <= add(s1,s2);
g2 <= sub(s1,s2);
elsif (stage ='1') then
g1 <= add(s1,negate(s2));
g2 <= sub(s1,negate(s2));
end if;
end Behavioral;
library ieee;
use ieee.std_logic_1164.all;
library work;
use work.fft_pkg.all;
entity fft4 is
port (
s: in comp_array;
y : out comp_array
);
end fft4;
architecture rtl of fft4 is
component butterfly is
port (
s1,s2 : in complex; -- inputs
-- w : in complex; -- phase factor
g1,g2 : out complex -- outputs
);
end component;
signal g1,g2: comp_array; -- :=(others=>(0000,0000));
-- signal w:comp_array2 :=(("0000","0001"),("0000","1111"));
begin
--first stage of butterfly
bf11 : butterfly port map(s(0),s(2),'0',g1(0),g1(1));
bf12 : butterfly port map(s(1),s(3),'0',g1(2),g1(3));
--second stage of butterfly's.
bf21 : butterfly port map(g1(0),g1(2),'0',g2(0),g2(2));
bf22 : butterfly port map(g1(1),g1(3),'1',g2(1),g2(3));
end rtl;
Errors are
ERROR:HDLCompiler:806 - "C:\Users\RObin\fftfinal\butterfly.vhd" Line 36: Syntax error near "if".
ERROR:HDLCompiler:806 - "C:\Users\RObin\fftfinal\butterfly.vhd" Line 39: Syntax error near "elsif".
ERROR:HDLCompiler:806 - "C:\Users\RObin\fftfinal\butterfly.vhd" Line 42: Syntax error near "if".
ERROR:HDLCompiler:854 - "C:\Users\RObin\fftfinal\butterfly.vhd" Line 31: Unit ignored due to previous errors.
Try placing the if statements into a process:
architecture behavioral of butterfly is
begin
--butterfly equations.
process (s1, s2)
begin
if ( stage ='0') then
g1 <= add(s1,s2);
g2 <= sub(s1,s2);
elsif (stage ='1') then
g1 <= add(s1,negate(s2));
g2 <= sub(s1,negate(s2));
end if;
end process;
end behavioral;
It's not possible to verify your code without either package fft_pkg, or writing one, which would seem to require deciding what the type complex is.
Note that while Nicolas also commented that you could use concurrent conditional assignment statements, all concurrent statements have equivalent process statements or processes and block statements, which is how VHDL is both simulated and synthesized after elaboration.
You can also comment out or remove one of the two copies of entity butterfly and it's architecture Behavioral. If you need two different representations with the same interface you can change the architecture name of successive different architectures.
By default the last one analyzed will be used. Analyzing the same named entity and architecture over again has the effect of replacing the first occurrence while still requiring both are valid VHDL.

Resources