VHDL code
First of all, sorry for the redirect, but it's easier that way.
I'm building a digital clock, but as you can see, clock_AN and clock_seg_out do not change. Is this caused by a wrong port mapping?
Thanks!
Your input master clock is too slow. Looking at the frequency divider cct, it looks like you've it programmed to divide a 100MHz clock. So either:
speed up your testbench master clock
or set the divider target to a lower number for debug purposes
Go with #2 if you want reasonable sim times!
Related
a real junior question with hopefully a junior answer, regarding one of the main assignments of VHDL (concurrent selective assignment) can anyone explain what a VHDL compiler would synthesise the following description into?
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;
ENTITY Q2 IS
PORT (a,b,c,d : IN std_logic;
EW_NS : OUT std_logic
);
END ENTITY Q2;
ARCHITECTURE hybrid OF Q2 IS
SIGNAL INPUT : std_logic_vector(3 DOWNTO 0);
SIGNAL EW_NS : std_logic;
BEGIN
INPUT <= (a & b & c & d); -- concatination
WITH (INPUT) SELECT
EW_NS <= '1' WHEN "0001"|"0010"|"0011"|"0110"|"1011",
'0' WHEN OTHERS;
END ARCHITECTURE hybrid;
Why do I ask? well I have previously gone about things the wrong way i.e. describing things on VHDL before making a block diagram of the components needed. I would envisage this been synthed as a group of and gate logic ?
Any help would be really helpful.
Thanks D
You need to look at the user guide for your target FPGA, and understand what is contained within one 'logic element' ('slice' in Xilinx terminology). In general an FPGA does not implement combinatorial logic by connecting up discrete gates like AND, OR, etc. Instead, a logic element will contain one or more 'look-up tables', with typically four (but now 6 in some newer devices) inputs. The inputs to this look up table (LUT) are the inputs to your logic function, and the output is one of the outputs of the function. The LUT is then programmed as a ROM, allowing your input signals to function as an address. There is one ROM entry for every possible combination of inputs, with the result being the intended logic function.
A function with several outputs would simply use several of these LUTs in parallel, with the same inputs, one LUT for each of the function's outputs. A function requiring more inputs than the LUT has (say, 7 inputs, where a LUT has only 4), simply combines two LUTs in parallel, using a multiplexer to choose between the output of the two LUTs. This final multiplexer uses one of the input signals as it's control, and again every possible combination of inputs is accounted for.
This may sound inefficient for creating something simple like an AND gate, but the benefit is that this simple building block (a LUT) can implement absolutely any combinatorial function. It's also worth noting that an FPGA tool chain is extremely good at optimising logic functions in order to simplify them, and to better map them into the FPGA. The LUT provides a highly generic element for these tools to target.
A logic element will also contain some dedicated resources for functions that aren't well suited to the LUT approach. These might include dedicated carry chains for adders, multiplexers for combining the output of several LUTS, registers (most designs are synchronous). LUTs can also sometimes be configured as small shift registers or RAM elements. External to the logic elements, there will be more specific blocks like large multipliers, larger memories, PLLs, etc, none of which can be as efficiently implemented using LUT resource. Again, this will all be explained in the user guide for your target FPGA.
Back in the day, your code would have been implemented as a single 74150 TTL circuit, which is a 16-to-1 mux. you have a 4-bit select (INPUT), and this selects one of 16 inputs to the chip, which is routed to a single output ('EW_NS`). The 74150 is obsolete and I can't find any datasheets, but it's easy to find diagrams of what an 8-to-1 mux looks like (here, for example). The 16->1 is identical, but everything is wider. My old TI databook shows basically exactly the diagram at this link doubled up.
But - wait. Your problem is easier, because you're not routing real inputs to the output - you're just setting fixed data values. On the '150, you do this by wiring 5 of the 16 inputs to 1, and the remaining 11 to 0. This makes the logic much easier.
The 74150 has basiscally exactly the same functionality as a 4-input look-up table (where the fixed look-up data is the same as fixed levels at the '150 inputs), so it's trivial to implement your entire circuit in a single LUT in an FPGA, as per scary_jeff's answer, rather than using a NAND-level implementation. In a proper chip, though, it would be implemented as a sum-of-products, or something similar (exactly what's in the linked diagram). In this case, draw a K-map and find a minimum solution. My 2 minutes on the back of an envelope comes up with three 3-input AND gates, driving a 3-input OR gate. I'll leave it as an exercise to you to check this :)
my doubt is how to pass a clock between two entities that are at the same hierarchical level in VHDL.
What I have is an entity "wrapper" in which there are instantiated two components "comp_1" and "comp_2". comp_1 has an output port (let's say "clk_out") that is its clock and that must be also the clock for comp_2. Now, if I use a signal in "wrapper" to pass the clock from comp_1 to comp_2, this cause a functional error in simulation (at least with Modelsim), because the two designs are considered not synchronous (right?). Can this cause an error also in synthesis (with Xilinx)? How can I avoid the problem without changing all the structure?
architecture bhv of my_wrap is
signal tmp_clk : std_logic;
begin
comp_1_i : comp_1
port map(out_clk => tmp_clk,
...
);
compo_2_i : comp_2
port map(in_clk => tmp_clk,
...
);
In this case, in simulation there is the delta cycle problem in the signals between the two components. Can this problem also affect the implemented design on FPGA?
It sounds like you may have a delta cycle delay on the clock, which is a feature in VHDL, but it may appear as if clock and data is out of sync.
This only shows in simulation, but is general VHDL thus not ModelSim specific. After synthesis (in hardware) the internal delay gives similar behavior. Note that ModelSim has a feature ("Expanded Time Delta Mode") to show delta delays.
Without code, I guess that the generated clock in comp_1 is also used for output generation, besides being output on clk_out. Depending on the implementation, it may result in a delta cycle delay difference between clock and data, which is may appear as not synchronous, but it is actually a delta cycle issue.
A possible fix is to output the generated clock from comp_1 without using it, and then making an additional clk_in input on comp_1, similar to the clk_in on comp_2, and then use that clock internally in comp_1. The clock use will then be similar on comp_1 and comp_2, removing the issue with delta delays on clock.
As Morten also pointed out some source code could help making your question more precise.
There is nothing wrong in connecting the clock out signal from one component to the clock in signal of another component. What might be a problem in your case is the way you generate the clock signal.
Depending on your use case you have different options.
If your target is an FPGA you should use a clock generator IP form the given vendor.
I am doing this project that will output a desired frequency. For most frequencies i can make valid code, but when it comes to frequency like 300 Hz I'm having trouble.
So here is my code for most of them:
library ieee;
use ieee.std_logic_1164.all;
entity test is
port(
clk:in std_logic:='0';
clk_o:buffer std_logic:='0'
);
end test;
architecture Behavioral of test is
begin
process(clk)
variable temp:integer range 0 to 1000000:=0;
begin
if(clk'event)then
temp:=temp+1;
if(temp>=1000000)then
clk_o<=not clk_o;
temp:=0;
end if;
end if;
end process;
end Behavioral;
This will generate frequency of 50 Hz because the clock speed of my FPGA is 50 MHz. So first I tried to divide it, but problem is that you can't generate 300 Hz because 50*10^6/300 is 166666.667 and so on.
Then I saw that you can make time type of variable and make period last 1/300 but then i realized it is not synthesis eligible so it's no good. Also goes with REAL type of variable that could make it more accurate then integer variable but it's also not synthesis eligible.
So I'm out of ideas, if anyone can give me some hint I would much appreciate it.
If you just use 166666 you will only be too fast by 0.0004% or 4ppm. This is such a small error that it typically won't matter in a real implementation - your 50MHz oscillator probably has more error than that.
[EDIT: as noted in the comments, crystal oscillators are likely to have 10-20ppm error, but there are other types of oscillators that have much higher or lower error (although the 50MHz FPGA clock is probably a crystal)]
If you really need to get rid of that 0.0004% error, you can use a DCM/MMCM/PLL (depending on the capabilities of your FPGA) to multiply your 50MHz clock to 150MHz first which will divide evenly to 300 Hz.
As Brian Drummond mentioned, you could also use a 0-2 counter to count to 166667 2 out of every 3 cycles and to 166666 on the other, which will avoid cumulative phase error if you were trying to remain in-phase with a perfectly ideal 300MHz source.
The fact that no real oscillators are perfectly matched to the advertized frequency is why RS232 re-syncs to the clock (to avoid cumulative phase error) frequently and why higher speed transfers use source synchronous clocking or Clock-and-data-recovery PLLs.
Simply having a code like this :
if(rising_edge(clk)) then
temp(0):="001";
temp(1):="011";
temp(2):="101";
temp(3):="000";
temp(0):=temp(3)xor temp(5);
end if
For the example above all this variable assignment would be done in 1 clock cycle which is pretty unpractical. In the behavioral simulation it works fine but in post synthesis it's messed up. Can I add like a delay or a sth like a wait(wait statement is un-synthesizable) to make it wait util the variable gets its value before jumping to the next line?
Doing all of those things in one clock cycle is simple. Hardware is extremely fast, and FPGA clock rates aren't that high relative to processors.
Since you are using variables, the intermediate results are used immediately. If you want a more explicit delay, you could use a signal. The above code with signals would use temp(3) from the previous rising edge.
for synthesis you can not make delays like wait. well defined, controllable delays in synthesis can only be made with pipelining (clock cycles as delay units).
I'm using the TEMAC IP core to generate a 1gb ethernet MAC, and came across an interesting piece of code:
-- DDr logic is used for this purpose to ensure that clock routing/timing to the pin is
-- balanced as part of the clock tree
not_rx_clk_int <= not (rx_clk_int);
rx_clk_ddr : ODDR2
port map (
Q => rx_clk,
C0 => rx_clk_int
C1 => not_rx_clk_int,
CE => '1',
D0 => '1',
D1 => '0',
R => reset,
S => '0'
);
So according to my understanding, what's happening here is that a "new" clock is being generated by two clocks that are 180 degrees out of phase by using each clock as a select line input to the mux. (See very useful diagram below taken from page 64 in this document!)
When C0 is '1' then Q <= D0 which gives rx_clk <= '1', and if C1 is '1' then Q <= D1 which gives rx_clk <= '0'. During reset both flipflops are reset giving rx_clk <= '0' while reset = '1'
So I have a few questions:
Are the two clocks (not_rx_clk_int and rx_clk_int) going to be precisely 180 degrees out of phase when generated in this way? (by this way, I mean not_rx_clk_int <= not (rx_clk_int)). I assume not due to delta time? What are the implications of this?
What is the benefit of using the ODDR2 in the first place (why isn't rx_clk <= rx_clk_int adequate)? (Which leads to...)
What does it mean for a clock to be "balanced" as part of the clock tree? (clock tree mentioned briefly on page 59 here.)
Isn't rx_clk being gated during reset? Isn't this bad?
Is this the "standard" way of using a ODDR2 and/or performing this operation? Are there better options? (and hence, should I add this to my arsenal of useful VHDL bits and pieces? )
Feel free to suggest recommended reading and/or other resources. I don't want to blindly copy/paste this code into my project without knowing exactly what's going on here.
1) Are the two clocks (not_rx_clk_int and rx_clk_int) going to be precisely 180 degrees out of phase when generated in this way? (by this way, I mean not_rx_clk_int <= not (rx_clk_int)). I assume not due to delta time? What are the implications of this?
Yes, they will be pretty well exactly phased.
Delta-delays are not at issue here. They only apply to HDL simulations, standing in place of unknown "real" delays. I would hope that Xilinx got their model correct so that both edges change in the same delta cycle! ie. they do something like:
not_rx_clk <= not (rx_clk_int);
rx_clk <= rx_clk_int;
to match the deltas.
2) What is the benefit of using the ODDR2 in the first place (why isn't rx_clk <= rx_clk_int adequate)? (Which leads to...)
It ensures that the delay is predictable relative to the other IOs that you no doubt have synchronised with this clock. If you just drive the clock signal out of a pin, it has to come off the clock distribution network, through some routing, and then to the pin (as there's no direct route for a clock net to get to the IO pin. That's a delay which is unpredictable and likely to vary from one compile to another.
3) What does it mean for a clock to be "balanced" as part of the clock tree? (clock tree mentioned briefly on page 59 [here.][3])
As I understand it, it means that the clock tree makes sure that the clock goes the same distance (approximately) to every destination.
4) Isn't rx_clk being gated during reset? Isn't this bad?
Yes it is being turned on and off (I'd hesitate to use the word 'gated' as that means a specific thing - being fed through an AND gate - which this isn't). Only you can say if that matters - it depends on where it goes to.
5) Is this the "standard" way of using a ODDR2 and/or performing this operation? Are there better options? (and hence, should I add this to my arsenal of useful VHDL bits and pieces? )
Three questions in one, sneaky :)
Yes, it's (a) standard way of using ODDR2 (the other standard use is for actual DDR data of course).
No, I don't know of a better way to simply get a clock out.
Yes, add it to your arsenal.
Partial set of answers:
1) I'm amused by the unnecessary brackets in not (rx_clk_int); like a lot of Xilinx cores, it makes me wonder if it's auto-translated from Verilog or something; there's a lot of really bad VHDL in some of them. (So I'm easily amused.) Anyway...
Synthesis tools probably optimise out the separate "not" and use the falling edge of rx_clk_int, so you certainly can get 180 degree phase shift this way. (Whether it's guaranteed, or a more complex expression might fool synthesis, I can't say).
2) Straight assignment would take rx_clk_int off the clock tree, onto ordinary routing, through an output buffer and the total delay would be anybody's guess. This way you have precisely timed clocks directly in the IOB for more predictable timing.
3) FFs and IOBs right next to the clock gen don't see the clock before the ones in the far corner; balancing the clock tree is slowing up all the short paths to match the longest one. (You can see this on DIMM memory PCBs, a lot of zigzag lines on traces to lengthen them!)
4) I would expect it to be gated. Whether that's bad depends on what it's clocking. Perhaps an Ethernet expert can chip in here. Or chase the logic driving "reset" to this block; it may not be the main system reset, to fix this issue.
5) It's certainly a fairly well known trick on newer FPGAs (ones with DDR regs), and very useful for clocks in addition to their main purpose (DDR inderfaces to memory etc). Keep it handy!