Pulse generator in VHDL with any frequency - vhdl

I am doing this project that will output a desired frequency. For most frequencies i can make valid code, but when it comes to frequency like 300 Hz I'm having trouble.
So here is my code for most of them:
library ieee;
use ieee.std_logic_1164.all;
entity test is
port(
clk:in std_logic:='0';
clk_o:buffer std_logic:='0'
);
end test;
architecture Behavioral of test is
begin
process(clk)
variable temp:integer range 0 to 1000000:=0;
begin
if(clk'event)then
temp:=temp+1;
if(temp>=1000000)then
clk_o<=not clk_o;
temp:=0;
end if;
end if;
end process;
end Behavioral;
This will generate frequency of 50 Hz because the clock speed of my FPGA is 50 MHz. So first I tried to divide it, but problem is that you can't generate 300 Hz because 50*10^6/300 is 166666.667 and so on.
Then I saw that you can make time type of variable and make period last 1/300 but then i realized it is not synthesis eligible so it's no good. Also goes with REAL type of variable that could make it more accurate then integer variable but it's also not synthesis eligible.
So I'm out of ideas, if anyone can give me some hint I would much appreciate it.

If you just use 166666 you will only be too fast by 0.0004% or 4ppm. This is such a small error that it typically won't matter in a real implementation - your 50MHz oscillator probably has more error than that.
[EDIT: as noted in the comments, crystal oscillators are likely to have 10-20ppm error, but there are other types of oscillators that have much higher or lower error (although the 50MHz FPGA clock is probably a crystal)]
If you really need to get rid of that 0.0004% error, you can use a DCM/MMCM/PLL (depending on the capabilities of your FPGA) to multiply your 50MHz clock to 150MHz first which will divide evenly to 300 Hz.
As Brian Drummond mentioned, you could also use a 0-2 counter to count to 166667 2 out of every 3 cycles and to 166666 on the other, which will avoid cumulative phase error if you were trying to remain in-phase with a perfectly ideal 300MHz source.
The fact that no real oscillators are perfectly matched to the advertized frequency is why RS232 re-syncs to the clock (to avoid cumulative phase error) frequently and why higher speed transfers use source synchronous clocking or Clock-and-data-recovery PLLs.

Related

Integer output turns to binary in synthesize ISE

I have a VHDL BCD counter whose output is an integer value (digit).
But when I simulate the code in Xilinx ISE it shows the code's waveform in binary value. The code works but the output should be integer but it's not. I have tested this code in Modelsim and the output is correct and it's in integer value. This problem is in code synthesize too and the value is binary.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity bcdcnt is
Port ( clk : in STD_LOGIC;
digit : out INTEGER RANGE 0 TO 9);
end bcdcnt;
architecture Behavioral of bcdcnt is
begin
count: PROCESS(clk)
VARIABLE temp : INTEGER RANGE 0 TO 10;
BEGIN
IF (clk'EVENT AND clk = '1') THEN
temp := temp + 1;
IF (temp = 10) THEN temp := 0;
END IF;
END IF;
digit <= temp;
END PROCESS count;
end Behavioral;
That's what synthesis does.
That's what synthesis MUST do : it translates your high level design to the resources in your FPGA or ASIC, which are binary. So, what's the problem here?
If you need to simulate the post-synth result, the usual approach is to create a wrapper entity that takes the correct port types, and translates between those and the post-synthesis netlist component.
Then, simulation should work with either the original entity, or this wrapper entity, both of which have integer ports.
Better still, you can re-use the same entity, and add the wrapper as a second architecture, thus guaranteeing that it uses the same interface (ports).
(You can even instantiate both the original and post-synth in its wrapper in the testbench, in parallel with a comparator on their outputs, to see they both do the same thing. But note there will be gate-level delays between them; usually you check outputs only on clock edges so these do not matter.)
Another approach is to restrict port types on the top level of the design to binary types like std_logic_vector. This plays nicer with badly designed tools, like ISE where the automatically generated testbench will have binary port types, (I generally edit them back to the correct ones; it's almost easier to write TBs from scratch).
But it restricts you to using an obscure and complex design style instead of higher level abstractions like Integers.
It's bad - really bad - that this approach is taught and encouraged so widely. But it is, and sometimes you'll just have to live with it. (Even in this approach, there's no reason to avoid decent abstractions internal to the FPGA, as long as the synthesis tool understands them).
A third approach - roughly, "trust, but verify" - is to trust that synthesis tools are competently written - which is usually true - and forget about post-synthesis simulation.
Just verify the design thoroughly at the behavioural level in simulation, then synthesise it, and test in live FPGA.
99% of the time (unless you're writing really weird VHDL), synth and P&R have done the right thing, and any differences you see are due to the aforementioned I/O timings (gate delays at the I/O pins). Then model these in the testbench and/or wrapper until you see the same behaviour in both (fix anything that needs fixing, and re-synthesise).
In this approach you only need to bother with the (MUCH slower) post-synth and post-PAR simulations if you need to track down a suspected synthesis tool bug.
This does happen : I've seen two in a quarter century.
Most of the time I just use the third approach.

does after [some delay in second] statement provide delay only in simulation or in actual synthesized model to be loaded in to fpga in VHDL?

We use after [some delay] statement for providing delay and that we can analysis in simulation. But when we will load this model in to FPGA so in actual hardware being made by VHDL code will have affect of delay or this delay is limited to simulation only?
a <= not b after 1s;
So suppose I connected one switch to b and LED to a so will I get one second delay in between pressing the switch and updating LED status?
as said before, the wait statement cannot be synthesized and will only affect the simulation. However, I should add that even in simulation you might not get what you expect. Allow me to explain.
VHDL offers 2 delay models: transport delay and inertial delay, the latter being the default, which you selected by not specifying which model to use.
If b would happen not to be stable in the course of the delay, say it toggles every 500ms, a would not toggle as you may desire. To really introduce pure delay, select the transport delay model as follows:
a <= transport not b after 1s;
Of course, again, this cannot be synthesized and is for simulation purposes only.
When you are simulating you need to provide when things happen and what happens to the inputs. After implementing it on the FPGA, exterior events create the inputs and the simulation has nothing to do about it.
So if I understood your question right, yes, the delay you are showing will only affect the simulation.
EDIT :
Regarding the timer, you know the FPGA's clock frequency. So you can create a variable and increment it on each clk_up (I use CLK = '1' and CLK'Event but there are better ways to do it), and when it reaches the same value as the clock frequency, 1 sec has passed.
Not-so-Pseudo code:
signal clock: unsigned (9 downto 0);
if CLK = '1' and CLK'Event then
clock<= clock + 1;
if clock = "1100100000" then --clock frequency (this is an example)
clock <= "0000000000"
-- 1 secound passed!
end if;
end if;

Detecting rising edge synchronization of 2 different clocks

How do you detect rising edge synchronization of 2 different clocks(different frequencies) in VHDL programming using Xilinx software?
There is a main clock of frequency 31.845 Mhz , and another clock of frequency 29.972 Mhz. So the basic aim is to trigger an action when there is synchronization between the rising edges of 2 clocks. We tried implementing it using flipflops but we could achieve only Level synchronization, not Edge sync.
And we cannot compare the rising edges of 2 different clocks in statements like IF and WAIT in vhdl, So that is out of question.
We are trying to count pulses using a counter. For that, we need to stop the count whenever edge matching takes place. We are trying to implement a method called 'Vernier Interpolation'.
Initially, we used the following statement code, but since rising edges of 2 different clocks (clk0, clk1) cannot be compared in an IF statement, we had to drop it.
if(rising_edge(clk0)=rising_edge(clk1)) then wait;
We then tried using WAIT statements (wait until) but it failed.
Then we tried using flipflops and delay circuits (D flipflop), but it resulted in level sync, and not Edge sync.
Firstly I'm not sure why you would want to do this. What you will get out is a new clock at the beat frequency between the two clocks.
The correct way to do this is to sample both clocks using another clock which is at least twice the frequency of the highest expected input. You could generate this higher clock using one of the PLLs in the device. x2 is a minimum. Ideally use a clock which is much higher than both sampled clocks.
Remember VHDL is not a language, it a description of synthesis of real hardware. So just saying Rising_Edge(clk1) = Rising_Edge(clk2) does not make the 'software' detect edges. All the function Rising_Edge really does is to tell the hardware to connect the clk signal to the clock input of a flipflop.
The proper solution is sample both 'clocks' in a process which is clocked by the a sample clock, look for edges (an edge being two subsequent samples that are different) then AND the result and latch if required.
sample code (untested, sorry no time right now).
entity twoclocks is
port (
op : out std_logic;
clk1 : in std_logic;
clk2 : in std_logic;
sample_clk : in std_logic);
end entity;
architecture RTL of twoclocks is
begin
process sample(sample_clock, clk1, clk2):
begin
if rising_edge(sample_clock):
clk1_d <= clk1;
clk2_d <= clk1;
if clk1_d != clk1 and clk2_d != clk2 then
op <= '1';
else
op <= '0';
end if;
end if;
end process;
end architecture;
The kind of vernier interpolator you want needs to be build using very tight timing constraints, thus you can probably not make it using VHDL alone. You need (a lot of) device specific constraints on resource locations and timing.
Please check out the work by A.Aloisio et al.. Aloisio and colleagues have build a vernier interpolator using specific Xilinx delay elements.
Standard VHDL synthesis is mostly suited for register transfer level descriptions. I.e. clocked/synchronous logic. But to compare these two inputs, you would need to sample them at a frequency of the least common multiple of both frequencies. For 31.845 MHz and 29.972 MHz that is a whopping 954.458340 MHz, which is a lot. I have seen these kind of speeds in FPGA logic though.
... But I'm thinking you might even need to double that, due to Nyquist. Maybe FPGA logic can nowadays handle 2 GHz swichting rate. But I'm not sure.
It might be possible to utilize a GT transceiver for this, but since that would be non-standard use of a such a transceiver, it might be hard to realize.

Passing clock between entities

my doubt is how to pass a clock between two entities that are at the same hierarchical level in VHDL.
What I have is an entity "wrapper" in which there are instantiated two components "comp_1" and "comp_2". comp_1 has an output port (let's say "clk_out") that is its clock and that must be also the clock for comp_2. Now, if I use a signal in "wrapper" to pass the clock from comp_1 to comp_2, this cause a functional error in simulation (at least with Modelsim), because the two designs are considered not synchronous (right?). Can this cause an error also in synthesis (with Xilinx)? How can I avoid the problem without changing all the structure?
architecture bhv of my_wrap is
signal tmp_clk : std_logic;
begin
comp_1_i : comp_1
port map(out_clk => tmp_clk,
...
);
compo_2_i : comp_2
port map(in_clk => tmp_clk,
...
);
In this case, in simulation there is the delta cycle problem in the signals between the two components. Can this problem also affect the implemented design on FPGA?
It sounds like you may have a delta cycle delay on the clock, which is a feature in VHDL, but it may appear as if clock and data is out of sync.
This only shows in simulation, but is general VHDL thus not ModelSim specific. After synthesis (in hardware) the internal delay gives similar behavior. Note that ModelSim has a feature ("Expanded Time Delta Mode") to show delta delays.
Without code, I guess that the generated clock in comp_1 is also used for output generation, besides being output on clk_out. Depending on the implementation, it may result in a delta cycle delay difference between clock and data, which is may appear as not synchronous, but it is actually a delta cycle issue.
A possible fix is to output the generated clock from comp_1 without using it, and then making an additional clk_in input on comp_1, similar to the clk_in on comp_2, and then use that clock internally in comp_1. The clock use will then be similar on comp_1 and comp_2, removing the issue with delta delays on clock.
As Morten also pointed out some source code could help making your question more precise.
There is nothing wrong in connecting the clock out signal from one component to the clock in signal of another component. What might be a problem in your case is the way you generate the clock signal.
Depending on your use case you have different options.
If your target is an FPGA you should use a clock generator IP form the given vendor.

How can I speed up my math operations in VHDL?

I have some calculations going on currently at rising edge of a 75MHz pixel clock to output 720p video on screen. Some of the math (like a few modulo) take too long (20+ns whereas 75MHz is 13.3ns) so my timing constraints are not met. I'm new to FPGAs but I'm wondering if for example there is a way to run the calculations at a faster speed than the current pixel clock in order to have them completed by the next tick of the 75MHz clock. I'm using VHDL by the way.
75 MHz is already quite slow by today's FPGA standards.
The problem is the modulo operation, which effectively involves division; and division is slow.
Think carefully about the operations you need, and if there is any way to reorganise the computation. If you are clocking pixels it's not as if you have 32-bit integers to deal with; restricted values are easier to deal with.
Martin hinted at one option: strength reduction. If you have 1280 pixels/line and need to operate on every third one, you don't need to compute 1280 mod 3! Count 0,1,2,0,... instead.
Another, if you need modulo-3 of an 8-bit (or 12-bit) number is to store all possible values in a lookup table, which will be fast enough.
Or sometimes you can multiply by 1/3 (X"5555") instead of dividing by 3, then multiply by 3 (which is a single addition) and subtract to get the modulo. This pipelines really well, but since X"5555" is only an approximation to 1/3 you need to verify in simulation that it delivers the correct output for every input. (for 16-bit inputs, this isn't a big simulation!) The extension to modulo 9 is easy.
EDIT:
Two points from your comments : Another option you have is to create a X2 clock (150MHz) using the Spartan's clock generators, which gives you 2 cycles per pixel. Well pipelined code should meet 150 MHz without much trouble.
How not to pipeline!
PROCESS(Clk)
BEGIN
if(rising_edge(Clk)) then
for i in 0 to 2 loop
case i is
when 0 => temp1 <= a*data;
when 1 => temp2 <= temp1*b;
when 2 => result <= temp2*c;
when others => null;
end case;
end loop;
end if;
END PROCESS;
The first thing to realise is that the loop and case statement cancel each other out, so this simplifies to
PROCESS(Clk)
BEGIN
if rising_edge(Clk) then
temp1 <= a*data;
temp2 <= temp1*b;
result <= temp2*c;
end if;
END PROCESS;
which is buggy! The testbench also being buggy, hides the problem.
In cycle 1, Data,a,b,c are presented, and temp1 = Data*a is computed.
In cycle 2, temp1 is multiplied by a NEW value of b instead of the correct one!
Same again in cycle 3!
Since the testbench sets the inputs and leaves them constant, it won't catch the problem!
PROCESS(Clk)
BEGIN
if rising_edge(Clk) then
-- cycle 1
temp1 <= a*data;
b_copy <= b;
c_copy1 <= c;
-- cycle 2
temp2 <= temp1*b_copy;
c_copy2 <= c_copy1;
-- cycle 3
result <= temp2*c_copy2;
end if;
END PROCESS;
I like to comment each cycle; every term I use in a cycle must come from the immediately preceding cycle, either by calculation or from a copy.
At least this works, but it could be reduced to 2 cycles depth and fewer copy registers because in this example, the four inputs are independent (and I am assuming there are no measures required to avoid overflow). So:
PROCESS(Clk)
BEGIN
if rising_edge(Clk) then
-- cycle 1
temp1 <= a * data;
temp2 <= b * c;
-- cycle 2
result <= temp1 * temp2;
end if;
END PROCESS;
Here's some techniques:
Pipelining - split the logic up to operate over multiple clock cycles
multi-cycle path - if you don't need the answer every cycle, you can tell the tools that it's OK for it to take longer. Care is required not to tell the tools the wrong thing though!
Think again - for example, do you really need to do x mod 3 on very wide x, or could you use a continuously updated modulo 3 counter?
Use better tools - I've had instances where I could meet timing on a deep-logic-path using an expensive synthesizer compared to not meeting timing on the same code using the vendor's synthesizer.
More extreme solutions involve changing the silicon, for a faster device, or a newer device, or a newer, faster device.
Usually complex math operations in FPGAs are pipelined. Pipelining means you divide your operations to stages. Let's say you have a multiplier which takes too long for your clock speed. You divide your multiplier to 3 stages. Basically your multiplier consists of three different parts (which has their own clock input) chained one after. These three parts will be smaller then one part, so they will have a smaller delay thus you can use a faster clock for them.
A drawback of this will be the 'delay'. Your pipelined system will give output with a latency. In the multiplier example above to have the correct output, you have to wait until your input passes all 3 stages. But this is usually very small (depending on your design of course) and can be ignored.
Here is a good (!) post about this: http://vhdlguru.blogspot.com/2011/01/what-is-pipelining-explanation-with.html EDIT: See Brian's post instead.
Also vendors usually ship optimized and pipelined versions of math operations as IP cores in their design software. Look for them.

Resources