VHDL - synthesis results is not the same as behavioral

VHDL - synthesis results is not the same as behavioral - vhdl

I have to write program in VHDL which calculate sqrt using Newton method. I wrote the code which seems to me to be ok but it does not work.
Behavioral simulation gives proper output value but post synthesis (and launched on hardware) not.
Program was implemented as state machine. Input value is an integer (used format is std_logic_vector), and output is fixed point (for calculation
purposes input value was multiplied by 64^2 so output value has 6 LSB bits are fractional part).
I used function to divide in vhdl from vhdlguru blogspot.
In behavioral simulation calculating sqrt takes about 350 ns (Tclk=10 ns) but in post synthesis only 50 ns.
Used code:
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;
entity moore_sqrt is
port (clk : in std_logic;
enable : in std_logic;
input : in std_logic_vector (15 downto 0);
data_ready : out std_logic;
output : out std_logic_vector (31 downto 0)
);
end moore_sqrt;
architecture behavioral of moore_sqrt is
------------------------------------------------------------
function division (x : std_logic_vector; y : std_logic_vector) return std_logic_vector is
variable a1 : std_logic_vector(x'length-1 downto 0):=x;
variable b1 : std_logic_vector(y'length-1 downto 0):=y;
variable p1 : std_logic_vector(y'length downto 0):= (others => '0');
variable i : integer:=0;
begin
for i in 0 to y'length-1 loop
p1(y'length-1 downto 1) := p1(y'length-2 downto 0);
p1(0) := a1(x'length-1);
a1(x'length-1 downto 1) := a1(x'length-2 downto 0);
p1 := p1-b1;
if(p1(y'length-1) ='1') then
a1(0) :='0';
p1 := p1+b1;
else
a1(0) :='1';
end if;
end loop;
return a1;
end division;
--------------------------------------------------------------
type state_type is (s0, s1, s2, s3, s4, s5, s6); --type of state machine
signal current_state,next_state: state_type; --current and next state declaration
signal xk : std_logic_vector (31 downto 0);
signal temp : std_logic_vector (31 downto 0);
signal latched_input : std_logic_vector (15 downto 0);
signal iterations : integer := 0;
signal max_iterations : integer := 10; --corresponds with accuracy
begin
process (clk,enable)
begin
if enable = '0' then
current_state <= s0;
elsif clk'event and clk = '1' then
current_state <= next_state; --state change
end if;
end process;
--state machine
process (current_state)
begin
case current_state is
when s0 => -- reset
output <= "00000000000000000000000000000000";
data_ready <= '0';
next_state <= s1;
when s1 => -- latching input data
latched_input <= input;
next_state <= s2;
when s2 => -- start calculating
-- initial value is set as a half of input data
output <= "00000000000000000000000000000000";
data_ready <= '0';
xk <= "0000000000000000" & division(latched_input, "0000000000000010");
next_state <= s3;
iterations <= 0;
when s3 => -- division
temp <= division ("0000" & latched_input & "000000000000", xk);
next_state <= s4;
when s4 => -- calculating
if(iterations < max_iterations) then
xk <= xk + temp;
next_state <= s5;
iterations <= iterations + 1;
else
next_state <= s6;
end if;
when s5 => -- shift logic right by 1
xk <= division(xk, "00000000000000000000000000000010");
next_state <= s3;
when s6 => -- stop - proper data
-- output <= division(xk, "00000000000000000000000001000000"); --the nearest integer value
output <= xk; -- fixed point 24.6, sqrt = output/64;
data_ready <= '1';
end case;
end process;
end behavioral;
Below screenshoots of behavioral and post-sythesis simulation results:
Behavioral simulation
Post-synthesis simulation
I have only little experience with VHDL and I have no idea what can I do to fix problem. I tried to exclude other process which was for calculation but it also did not work.
I hope you can help me.
Platform: Zynq ZedBoard
IDE: Vivado 2014.4
Regards,
Michal

A lot of the problems can be eliminated if you rewrite the state machine in single process form, in a pattern similar to this. That will eliminate both the unwanted latches, and the simulation /synthesis mismatches arising from sensitivity list errors.
I believe you are also going to have to rewrite the division function with its loop in the form of a state machine - either a separate state machine, handshaking with the main one to start a divide and signal its completion, or as part of a single hierarchical state machine as described in this Q&A.

This code is neither correct for simulation nor for synthesis.
Simulation issues:
Your sensitivity list is not complete, so the simulation does not show the correct behavior of the synthesized hardware. All right-hand-side signals should be include if the process is not clocked.
Synthesis issues:
Your code produces masses of latches. There is only one register called current_state. Latches should be avoided unless you know exactly what you are doing.
You can't divide numbers in the way you are using the function, if you want to keep a proper frequency of your circuit.
=> So check your Fmax report and
=> the RTL schematic or synthesis report for resource utilization.
Don't use the devision to shift bits. Neither in software the compiler implements a division if a value is shifted by a power of two. Us a shift operation to shift a value.
Other things to rethink:
enable is a low active asynchronous reset. Synchronous resets are better for FPGA implementations.

VHDL code may by synthesizable or not, and the synthesis result may behave as the simulation, or not. This depends on the code, the synthesizer, and the target platform, and is very normal.
Behavioral code is good for test-benches, but - in general - cannot be synthesized.
Here I see the most obvious issue with your code:
process (current_state)
begin
[...]
iterations <= iterations + 1;
[...]
end process;
You are iterating over a signal which does not appear in the sensitivity list of the process. This might be ok for the simulator which executes the process blocks just like software. On the other hand side, the synthesis result is totally unpredictable. But adding iterations to the sensitivity list is not enough. You would just end up with an asynchronous design. Your target platform is a clocked device. State changes may only occur at the trigger edge of the clock.
You need to tell the synthesizer how to map the iterations required to perform this calculation over the clock cycles. The safest way to do that is to break down the behavioural code into RTL code (https://en.wikipedia.org/wiki/Register-transfer_level#RTL_in_the_circuit_design_cycle).

Related

Edit - Can't Infer Register Because It's Behavior Does Not Match Any Supported Register Model VHDL

This is a branch off of a separate question I asked. I am going to explain more in depth on what I am trying to do and what it is not liking. This is a school project and doesn't need to follow standards.
I am attempting to make the SIMON game. Right now, what I am trying to do is use a switch case for levels and each level is supposed to be faster (hence different frequency dividers). The first level is supposed to be the first frequency and a pattern of LEDs is supposed to light up and disappear. Before I put in a switch case, the first level was by itself (no second level stuff) and it lit up and disappeared like it should. I also used compare = 0 in order to compare in output to an input. (The user is supposed to flip up the switches in the light pattern they saw). This worked when the first level was by itself but now that it is in a switch case, it doesn't like compare. I'm not sure how to get around that in order to compare an output to an input.
The errors I am getting are similar to before:
Error (10821): HDL error at FP.vhd(75): can't infer register for "compare" because its behavior does not match any supported register model
Error (10821): HDL error at FP.vhd(75): can't infer register for "count[0]" because its behavior does not match any supported register model
Error (10821): HDL error at FP.vhd(75): can't infer register for "count[1]" because its behavior does not match any supported register model
Error (10821): HDL error at FP.vhd(75): can't infer register for "count[2]" because its behavior does not match any supported register model
Error (10822): HDL error at FP.vhd(80): couldn't implement registers for assignments on this clock edge
Error (10822): HDL error at FP.vhd(102): couldn't implement registers for assignments on this clock edge
Error (12153): Can't elaborate top-level user hierarchy
I also understand that it doesn't like the rising_edge(toggle) but I need that in order to make the LED pattern light up and disappear.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.std_logic_unsigned.all;
entity FP is
port(
clk, reset : in std_logic;
QF : out std_logic_vector (3 downto 0);
checkbtn : in std_logic;
Switch : in std_logic_vector(3 downto 0);
sel : in std_logic_vector (1 downto 0);
score : out std_logic_vector (6 downto 0)
);
end FP;
architecture behavior of FP is
signal time_count: integer:=0;
signal toggle : std_logic;
signal toggle1 : std_logic;
signal count : std_logic_vector (2 downto 0);
signal seg : std_logic_vector (3 downto 0);
signal compare : integer range 0 to 1:=0;
type STATE_TYPE is (level1, level2);
signal level : STATE_TYPE;
--signal input : std_logic_vector (3 downto 0);
--signal sev : std_logic_vector (6 downto 0);
begin
process (clk, reset, sel)
begin
if (reset = '0') then
time_count <= 0;
toggle <= '0';
elsif rising_edge (clk) then
case sel is
when "00" =>
if (time_count = 1249999) then
toggle <= not toggle;
time_count <= 0;
else
time_count <= time_count+1;
end if;
when "01" =>
if (time_count = 2499999) then
toggle1 <= not toggle1;
time_count <= 0;
else
time_count <= time_count+1;
end if;
when "10" =>
if (time_count = 4999999) then
toggle <= not toggle;
time_count <= 0;
else
time_count <= time_count+1;
end if;
when "11" =>
if (time_count = 12499999) then
toggle <= not toggle;
time_count <= 0;
else
time_count <= time_count+1;
end if;
end case;
end if;
end process;
Process (toggle, compare, switch)
begin
case level is
when level1 =>
if sel = "00" then
count <= "001";
seg <= "1000";
elsif (rising_edge (toggle)) then
count <= "001";
compare <= 0;
if (count = "001") then
count <= "000";
else
count <= "000";
end if;
end if;
if (switch = "1000") and (compare = 0) and (checkbtn <= '0') then
score <= "1111001";
level <= level2;
else
score <= "1000000";
level <= level1;
end if;
when level2 =>
if sel = "01" then
count <= "010";
seg <= "0100";
elsif (rising_edge (toggle1)) then
count <= "010";
compare <= 1;
if (count = "010") then
count <= "000";
else
count <= "000";
end if;
end if;
if (switch = "0100") and (compare = 1) and (checkbtn <= '0') then
score <= "0100100";
else
score <= "1000000";
level <= level1;
end if;
end case;
case count is
when "000"=>seg<="0000";
when "001"=>seg<="1000";
when "010"=>seg<="0100";
when "011"=>seg<="0110";
when "100"=>seg<="0011";
when others=>seg<="0000";
end case;
end process;
QF <= seg;
end behavior;
Thanks again in advance!

Well... it is hard to tell what is wrong, because this state machine is written in wrong way. You should look for references about proper modeling of FSM in VHDL. One good example is here.
If you use Quartus, you could also look for Altera's description on how to model FSM specifically for their compiler.
I will now give you just two advices. First is that you shouldn't (or mabye even you can't) use is two
if rising_edge (clk)
checks in one process. If your process is supposed to be sensitive on clock edge, write it once at the beginning.
Second thing is that if you want to model FSM with one process with synchronous reset, then put just clk on sensitivity list.
EDIT after question and code edit:
Ok, much better now. But another few things:
Your FSM is still not like it should. Look again at the example in the source I gave you above and edit it to be like there, or make it one process FSM like in example in this link.
Intends! Very important. I couldn't spot some of obvious errors, before I made proper intendation in your code. This leads me to...
Look at the places, there you assign values to count, in particular the if statements. No mater what, you assign the same value of "000".
Similar story with another signal - seg. You assign to it some value in the process, and then at the end of this process there is case statement in which you assign to it some other value, making this previous assignments irrelevant.
Use rising_edge only once in the process, only to clock, and only at the very beginning of the process, or in the way you did in the first process, that has asynchronous reset. In second process you did all this three things.
In sequential process with rising_edge, like the first one, you don't have to put to sensitivity list anything more than clock, and reset if it is asynchronous, like in your case.
Sensitivity list in second process. It is parallel process, so you should put there signals, that you check in a process, and can change outside of it. It is not the case for compare. But there should be signals: level, sel and toggle1.
As I'm still not sure what are you trying to achieve, I will not tell you what exactly to do. Fix your code according to points above, then maybe it will just work.

VHDL - Comparing present and past inputs

I have a system that has a 3 input D_in which is read at every positive clk edge.
If say I want to see if the current input, D_in is greater then the previous D_in by at least 2, then a count will increment. How do I write this in VHDL?
if clk'event and clk = '1' then --read at positive edge
if D_in > (D_in + 010) then <---I am sure this is wrong. How to write the proper code?
Entity ABC is
Port(D_in: in std_logic_vector(2 downto 0);
Count: out std_logic_vector(2 downto 0));
Architecture ABC_1 of ABC is
signal D_last: std_logic_vector(2 downto 0);
Begin
Process(D_in)
D_last <= D_in;
if clk'event and clk = '1' then
if D_last > (D_in + 2) then
count <= count + 1;
end if;
end process;
end ABC_1;

The "good" way to write this process is as follow :
process (clk)
begin
if (rising_edge(clk)) then
-- store the value for the next time the process will be activated
-- Note that D_last value will be changed after the whole process is completed
D_last <= D_in;
-- compare the actual D_in value with the previous one stored in D_last.
-- D_last value is its value at the very beginning of the process activation
if (D_in > D_last + 2) then
-- increment the counter
count <= count + 1;
end if;
end if;
end process;
Note that D_in, D_last and count has to be declared as unsigned and not as std_logic_vector.
I suggest you to read this post which explains how a process actually works : when are signals updated and which signal value is used into the process.
Cheers
[edit] This answer should be fine for your question. But the code you show has other errors :
The signal clk has to be an input for your entity.
The signal count can't be read in your architecture because it's defined as output in the entity. Then the line "count <= count + 1" can't be resolved. You have to use an internal signal and then assign its value to "count" outside of a process :
count <= count_in;

There are several other errors in your design specification as well. This answer attempts to answer all concerns in one place.
VHDL is simulated by executing processes in simulation cycles. Every
concurrent statement can be expresses as either an equivalent process
statement or combination of process statements and block statements.
Signal assignment is to a projected output waveform queue for a specified
time. When no time is specified it's the current time, and the value will be updated
prior to executing processes in the next simulation cycle, a delta cycle, simulation
time is advanced when there are no remaining events scheduled for the
current simulation time.
To avoid confusion over when signal assignments occur, view them as
separate processes (whether you express them that way or not).
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity abc is
port (
clk: in std_logic; -- Note 1
d_in: in std_logic_vector(2 downto 0);
count: out std_logic_vector(2 downto 0)
);
end entity; -- Note 2
architecture foo of abc is
signal d_last: std_logic_vector(2 downto 0);
begin
DLAST: -- Note 3
process (clk)
begin
if rising_edge(clk) then -- Note 4
d_last <= d_in;
end if;
end process;
INC_COUNT:
process (clk)
variable cnt: unsigned(2 downto 0) := "000"; -- Note 5
begin
if rising_edge(clk) and
unsigned(d_last) > unsigned(d_in) + 2 then -- Mote 6,7
cnt := cnt + 1;
end if;
count <= std_logic_vector(cnt);
end process;
end architecture;
Notes
Missing clk from port interface
Missing end statement for entity ABC.
Conceptually view D_last
register separately from Count counter sensitive to clk. (Can be
merged as one process)
rising_edge function expresses clk'event and clk = '1' ('event
and "=" are both functions)
The counter must represent a binary value for "+" to produce a
binary result
"+" is higher priority than ">", which is higher priority than "and"
(you don't need parentheses)
Package numeric_std provide relational and adding operators for
type sign and type unsigned, requiring type conversion for D_last
and D_in.
Alternatively use Synopsys package std_logic_unsigned which
depends on Synopsys package std_logic_arith and treats
std_logic_vector as unsigned. This avoids type conversion, and
allows array types to be declared as type std_logic_vector.
The variable cnt can be done away with if port count were to be declared mode buffer and provided a default value:
count: buffer std_logic_vector(2 downto 0) :="000" -- Note 5
and
INC_COUNT:
process (clk)
begin
if rising_edge(clk) and
unsigned(d_last) > unsigned(d_in) + 2 then -- Note 6,7
count <= std_logic_vector(unsigned(count) + 1);
end if;
end process;
You can't use Count as mode out to algorithmically modify it's own value. The ability to access the value of a mode out port is intended for verification and is a IEEE Std 1076-2008 feature.
And about now you can see the value of Synopsys's std_logic_unsigned package, at least as far avoiding type conversions.
Also, i got another question. If d_in is 0 for 3 consecutive clk cycles, i want to reset count to 0. How do i write the code to represent for 3 clk cycles?
Add another pipeline signal for D_in:
signal d_last: std_logic_vector(2 downto 0) := "000";
signal d_last1: std_logic_vector(2 downto 0) := "000";
Note these also have default values, which FPGA synthesis will generally honor, it's represented by the state of the flip flop in the bistream image used for programming the FPGA.
And modify how the counter is operated:
INC_COUNT:
process (clk)
begin
if rising_edge(clk) then
if d_in = "000" and d_last = "000" and d_last1 = "000" then
count <= "000";
elsif unsigned(d_last) > unsigned(d_in) + 2 then -- Note 6,7
count <= std_logic_vector(unsigned(count) + 1);
end if;
end if;
end process;
The three incarnations of the example all analyze, they haven't been simulation and should be synthesis eligible.

LFSR in VHDL always generating zero

I have written a LFSR in VHDL. I have tested it in simulation and it works as expected (generates random integers between 1 and 512). However when I put it onto hardware it always generates "000000000"
The code is as follows:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity LFSR is
port(clk, reset : in bit;
random : out std_logic_vector (8 downto 0));
end entity LFSR;
architecture behaviour of LFSR is
signal temp : std_logic_vector (8 downto 0) := (8 => '1', others => '0');
begin
process(clk)
begin
if(clk'event and clk='1') then
if(reset='0') then --reset on signal high, carry out normal function
temp(0) <= temp(8);
temp(1) <= temp(0);
temp(2) <= temp(1) XOR temp(8);
temp(3) <= temp(2) XOR temp(8);
temp(4) <= temp(3) XOR temp(8);
temp(8 downto 5) <= temp(7 downto 4);
else
--RESET
temp <= "100000000";
end if;
end if;
random <= temp;
end process;
end architecture behaviour;
It was tested in Modelsim and compiled in Quartus II for a Cyclone III DE0 board.
Can anyone see why it is not working (in practice, the simulation is fine) and explain what I need to change to get it to work?

If reset is directly from a FPGA pin, then it is probably not synchronized
with clk, so proper synchronous reset operation is not ensured.
Add two flip-flops for synchronization of reset to clk before it is used in
the process. This can be done with:
...
signal reset_meta : bit; -- Meta-stable flip-flop
signal reset_sync : bit; -- Synchronized reset
begin
process(clk)
begin
if(clk'event and clk='1') then
reset_meta <= reset;
reset_sync <= reset_meta;
if (reset_sync = '0') then -- Normal operation
...
Altera have some comment about this in External Reset Should be Correctly
Synchronized.
The description covers flip-flops with asynchronous reset, but the use of two
flip-flops for synchronisation of external reset applies equally in your case.
Still remember to move the random <= temp inside the if as David pointed
out.

If you haven't any luck you might take a look at the synthesized schematic or perform post-synthesis simulation. I occasionally swap RTL models out for post - synth models for verification if there's something that isn't obvious in behavioral simulation.
-Jerry

Why it is necessary to use internal signal for process?

I'm learning VHDL from the root, and everything is OK except this. I found this from Internet. This is the code for a left shift register.
library ieee;
use ieee.std_logic_1164.all;
entity lsr_4 is
port(CLK, RESET, SI : in std_logic;
Q : out std_logic_vector(3 downto 0);
SO : out std_logic);
end lsr_4;
architecture sequential of lsr_4 is
signal shift : std_logic_vector(3 downto 0);
begin
process (RESET, CLK)
begin
if (RESET = '1') then
shift <= "0000";
elsif (CLK'event and (CLK = '1')) then
shift <= shift(2 downto 0) & SI;
end if;
end process;
Q <= shift;
SO <= shift(3);
end sequential;
My problem is the third line from bottom. My question is, why we need to pass the internal signal value to the output? Or in other words, what would be the problem if I write Q <= shift (2 downto 0) & SI?

In the case of the shown code, the Q output of the lsr_4 entity comes from a register (shift representing a register stage and being connected to Q). If you write the code as you proposed, the SI input is connected directly (i.e. combinationally) to the Q output. This can also work (assuming you leave the rest of the code in place), it will perform the same operation logically expect eliminate one clock cycle latency. However, it's (generally) considered good design practice to have an entity's output being registered in order to not introduce long "hidden" combinational paths which are not visible when not looking inside an entity. It usually makes designing easier and avoids running into timing problems.

First, this is just a shift register, so no combinational blocks should be inferred (except for input and output buffers, which are I/O related, not related to the circuit proper).
Second, the signal called "shift" can be eliminated altogether by specifying Q as "buffer" instead of "out" (this is needed because Q would appear on both sides of the expression; "buffer" has no side effects on the inferred circuit). A suggestion for your code follows.
Note: After compiling your code, check in the Netlist Viewers / Technology Map Viewer tool what was actually implemented.
library ieee;
use ieee.std_logic_1164.all;
entity generic_shift_register is
generic (
N: integer := 4);
port(
CLK, RESET, SI: in std_logic;
Q: buffer std_logic_vector(N-1 downto 0);
SO: out std_logic);
end entity;
architecture sequential of generic_shift_register is
begin
process (RESET, CLK)
begin
if (RESET = '1') then
Q <= (others => '0');
elsif rising_edge(CLK) then
Q <= Q(N-2 downto 0) & SI;
end if;
end process;
SO <= Q(N-1);
end architecture;

VHDL: creating a very slow clock pulse based on a very fast clock

(I'd post this in EE but it seems there are far more VHDL questions here...)
Background: I'm using the Xilinx Spartan-6LX9 FPGA with the Xilinx ISE 14.4 (webpack).
I stumbled upon the dreaded "PhysDesignRules:372 - Gated clock" warning today, and I see there's a LOT of discussion out there concerning that in general. The consensus seems to be to use one of the DCMs on the FPGA to do clock division but... my DCM doesn't appear to be capable of going from 32 MHz to 4.096 KHz (per the wizard it bottoms out at 5MHz based on 32MHz... and it seems absurd to try to chain multiple DCMs for this low-frequency purpose).
My current design uses clk_in to count up to a specified value (15265), resets that value to zero and toggles the clk_out bit (so I end up with a duty cycle of 50%, FWIW). It does the job, and I can easily use the rising edge of clk_out to drive the next stage of my design. It seems to work just fine, but... gated clock (even though it isn't in the range where clock skew would IMHO be very relevant). (Note: All clock tests are done using the rising_edge() function in processes sensitive to the given clock.)
So, my questions:
If we're talking about deriving a relatively slow clk_out from a much faster clk_in, is gating still considered bad? Or is this sort of "count to x and send a pulse" thing pretty typical for FPGAs to generate a "clock" in the KHz range and instead some other unnecessary side-effect may be triggering this warning instead?
Is there a better way to create a low KHz-range clock from a MHz-range master clock, keeping in mind that using multiple DCMs appears to be overkill here (if it's possible at all given the very low output frequency)? I realize the 50% duty cycle may be superfluous but assuming one clock in and not using the on-board DCMs how else would one perform major clock division with an FPGA?
Edit: Given the following (where CLK_MASTER is the 32 MHz input clock and CLK_SLOW is the desired slow-rate clock, and LOCAL_CLK_SLOW was a way to store the state of the clock for the whole duty-cycle thing), I learned that this configuration causes the warning:
architecture arch of clock is
constant CLK_MASTER_FREQ: natural := 32000000; -- time := 31.25 ns
constant CLK_SLOW_FREQ: natural := 2048;
constant MAX_COUNT: natural := CLK_MASTER_FREQ/CLK_SLOW_FREQ;
shared variable counter: natural := 0;
signal LOCAL_CLK_SLOW: STD_LOGIC := '0';
begin
clock_proc: process(CLK_MASTER)
begin
if rising_edge(CLK_MASTER) then
counter := counter + 1;
if (counter >= MAX_COUNT) then
counter := 0;
LOCAL_CLK_SLOW <= not LOCAL_CLK_SLOW;
CLK_SLOW <= LOCAL_CLK_SLOW;
end if;
end if;
end process;
end arch;
Whereas this configuration does NOT cause the warning:
architecture arch of clock is
constant CLK_MASTER_FREQ: natural := 32000000; -- time := 31.25 ns
constant CLK_SLOW_FREQ: natural := 2048;
constant MAX_COUNT: natural := CLK_MASTER_FREQ/CLK_SLOW_FREQ;
shared variable counter: natural := 0;
begin
clock_proc: process(CLK_MASTER)
begin
if rising_edge(CLK_MASTER) then
counter := counter + 1;
if (counter >= MAX_COUNT) then
counter := 0;
CLK_SLOW <= '1';
else
CLK_SLOW <= '0';
end if;
end if;
end process;
end arch;
So, in this case it was all for lack of an else (like I said, the 50% duty cycle was originally interesting but wasn't a requirement in the end, and the toggle of the "local" clock bit seemed quite clever at the time...) I was mostly on the right track it appears.
What's not clear to me at this point is why using a counter (which stores lots of bits) isn't causing warnings, but a stored-and-toggled output bit does cause warnings. Thoughts?

If you just need a clock to drive another part of your logic in the FPGA, the easy answer is to use a clock enable.
That is, run your slow logic on the same (fast) clock as everything else, but use a slow enable for it. Example:
signal clk_enable_200kHz : std_logic;
signal clk_enable_counter : std_logic_vector(9 downto 0);
--Create the clock enable:
process(clk_200MHz)
begin
if(rising_edge(clk_200MHz)) then
clk_enable_counter <= clk_enable_counter + 1;
if(clk_enable_counter = 0) then
clk_enable_200kHz <= '1';
else
clk_enable_200kHz <= '0';
end if;
end if;
end process;
--Slow process:
process(clk_200MHz)
begin
if(rising_edge(clk_200MHz)) then
if(reset = '1') then
--Do reset
elsif(clk_enable_200kHz = '1') then
--Do stuff
end if;
end if;
end process;
The 200kHz is approximate though, but the above can be extended to basically any clock enable frequency you need. Also, it should be supported directly by the FPGA hardware in most FPGAs (it is in Xilinx parts at least).
Gated clocks are almost always a bad idea, as people often forget that they are creating new clock-domains, and thus do not take the necessary precautions when interfacing signals between these. It also uses more clock-lines inside the FPGA, so you might quickly use up all your available lines if you have a lot of gated clocks.
Clock enables have none of these drawbacks. Everything runs in the same clock domain (although at different speeds), so you can easily use the same signals without any synchronizers or similar.

Note for this example to work this line,
signal clk_enable_counter : std_logic_vector(9 downto 0);
must be changed to
signal clk_enable_counter : unsigned(9 downto 0);
and you'll need to include this library,
library ieee;
use ieee.numeric_std.all;

Both your samples create a signal, one of which toggles at a slow rate, and one of which pulses a narrow pulse at a "slow-rate". If both those signals go to the clock-inputs of other flipflops, I would expect warnings about clock routing being non-optimal.
I'm not sure why you get a gated clock warning, that usually comes about when you do:
gated_clock <= clock when en = '1' else '0';

Here's a Complete Sample Code :
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;
ENTITY Test123 IS
GENERIC (
clk_1_freq_generic : unsigned(31 DOWNTO 0) := to_unsigned(0, 32); -- Presented in Hz
clk_in1_freq_generic : unsigned(31 DOWNTO 0) := to_unsigned(0, 32) -- Presented in Hz, Also
);
PORT (
clk_in1 : IN std_logic := '0';
rst1 : IN std_logic := '0';
en1 : IN std_logic := '0';
clk_1 : OUT std_logic := '0'
);
END ENTITY Test123;
ARCHITECTURE Test123_Arch OF Test123 IS
--
SIGNAL clk_en_en : std_logic := '0';
SIGNAL clk_en_cntr1 : unsigned(31 DOWNTO 0) := (OTHERS => '0');
--
SIGNAL clk_1_buffer : std_logic := '0';
SIGNAL clk_1_freq : unsigned(31 DOWNTO 0) := (OTHERS => '0'); -- Presented in Hz, Also
SIGNAL clk_in1_freq : unsigned(31 DOWNTO 0) := (OTHERS => '0'); -- Presented in Hz
--
SIGNAL clk_prescaler1 : unsigned(31 DOWNTO 0) := (OTHERS => '0'); -- Presented in Cycles (Relative To The Input Clk.)
SIGNAL clk_prescaler1_halved : unsigned(31 DOWNTO 0) := (OTHERS => '0');
--
BEGIN
clk_en_gen : PROCESS (clk_in1)
BEGIN
IF (clk_en_en = '1') THEN
IF (rising_edge(clk_in1)) THEN
clk_en_cntr1 <= clk_en_cntr1 + 1;
IF ((clk_en_cntr1 + 1) = clk_prescaler1_halved) THEN -- a Register's (F/F) Output Only Updates Upon a Clock-Edge : That's Why This Comparison Is Done This Way !
clk_1_buffer <= NOT clk_1_buffer;
clk_1 <= clk_1_buffer;
clk_en_cntr1 <= (OTHERS => '0');
END IF;
END IF;
ELSIF (clk_en_en = '0') THEN
clk_1_buffer <= '0';
clk_1 <= clk_1_buffer;
clk_en_cntr1 <= (OTHERS => '0'); -- Clear Counter 'clk_en_cntr1'
END IF;
END PROCESS;
update_clk_prescalers : PROCESS (clk_in1_freq, clk_1_freq)
BEGIN
clk_prescaler1 <= (OTHERS => '0');
clk_prescaler1_halved <= (OTHERS => '0');
clk_en_en <= '0';
IF ((clk_in1_freq > 0) AND (clk_1_freq > 0)) THEN
clk_prescaler1 <= (clk_in1_freq / clk_1_freq); -- a Register's (F/F) Output Only Updates Upon a Clock-Edge : That's Why This Assignment Is Done This Way !
clk_prescaler1_halved <= ((clk_in1_freq / clk_1_freq) / 2); -- (Same Thing Here)
IF (((clk_in1_freq / clk_1_freq) / 2) > 0) THEN -- (Same Thing Here, Too)
clk_en_en <= '1';
END IF;
ELSE
NULL;
END IF;
END PROCESS;
clk_1_freq <= clk_1_freq_generic;
clk_in1_freq <= clk_in1_freq_generic;
END ARCHITECTURE Test123_Arch;

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio