How to work with DDR in synthesizable Verilog/VHDL? - vhdl

I am working on implementing a DDR SDRAM controller for a class and am not allowed to use the Xilinx MIG core.
After thrashing with the design, I am currently working synchronously to my 100 MHz system clock and creating a divided signal "clock" (generated using a counter) that is sent out on the IO pins to the DDR SDRAM. I have some logic that gives me strobes on the "rising" edges of this signal, as I am aware that I should not use an ordinary signal to clock a process. However, this divided-clock method runs very slowly, and I am concerned that I am not meeting the minimum required frequency of the external DDR SDRAM. I am hoping to speed up my read/write bursts, but my Spartan-3E will struggle with anything faster than 100 MHz. After looking around online, I found this code from EDA Board:
    process(Input_Clk,Reset_Control)
    begin
        if (Reset_Control = '1') then
            Output_Data <= (others => '0');
        elsif rising_edge(Input_Clk) then
            Output_Data <= Input_Data1;
        elsif falling_edge(Input_Clk) then
            Output_Data <= Input_Data2;
        end if;
    end process ;
I have written a lot of VHDL, but have never seen something like this before. I'm sure this works fine in simulation, but it doesn't look synthesizable to me. The poster said this should be supported by 1076.6-2004. Does this infer two flip-flops, one clocked on the rising edge and one on the falling edge whose outputs both feed into a 2:1 mux? Does Xilinx support this? I want to avoid having to instantiate a DCM as crossing these clock domains will definitely slow me down and will add undesired complexity. Is there a way to safely generate my DDR data that is being sent to and received from DDR SDRAM without the Xilinx primitive for the MIG? How would I perform the receiving of DDR data in Verilog?
For reference, we have to code in Verilog, so I'm not too sure on how to translate that VHDL process to a Verilog always block if it is synthesizable. We are using the Micron MT46V32M16 if that is relevant.
Here are the timing diagrams for what I am trying to replicate:

I would say that implementing a DDR controller 'for class' is rather challenging. In the companies I worked for, these were left for senior engineers to build.
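First about the VHDL code shown:
Yes, you are right, that cannot be synthesized. It describes a single register that is updated on both clock edges, and ordinary FPGA flip-flops are clocked on one edge only.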
The approach to double-clocking inputs is to have two data paths. One on the rising edge and one on the falling edge. In a second stage the two are put in parallel but with double the data width. Thus a 32-bit wide DDR produces 64 data bits per 'system' clock.
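The two-path idea above can be sketched in VHDL roughly as follows. This is a minimal illustration, not a tested memory interface: the entity name ddr_capture, the port names and the WIDTH generic are made up for the example, and it assumes clk is already phase-aligned so that both edges fall inside the valid data window.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: captures a DDR bus on both clock edges
-- and presents it as one double-width word per rising edge.
entity ddr_capture is
    generic (WIDTH : positive := 16);
    port (
        clk     : in  std_logic;
        ddr_in  : in  std_logic_vector(WIDTH-1 downto 0);
        sdr_out : out std_logic_vector(2*WIDTH-1 downto 0)
    );
end entity ddr_capture;

architecture rtl of ddr_capture is
    signal rise_q : std_logic_vector(WIDTH-1 downto 0);
    signal fall_q : std_logic_vector(WIDTH-1 downto 0);
begin
    -- Path 1: one ordinary register on the rising edge.
    rise_path : process (clk)
    begin
        if rising_edge(clk) then
            rise_q <= ddr_in;
        end if;
    end process rise_path;

    -- Path 2: a separate register on the falling edge.
    fall_path : process (clk)
    begin
        if falling_edge(clk) then
            fall_q <= ddr_in;
        end if;
    end process fall_path;

    -- Second stage: both halves are re-registered on the rising edge,
    -- giving one word of twice the width per system clock.
    -- Note the fall_q path only has half a clock period to settle.
    combine : process (clk)
    begin
        if rising_edge(clk) then
            sdr_out <= fall_q & rise_q;
        end if;
    end process combine;
end architecture rtl;
```

Each process has exactly one clock edge, which is what makes this form synthesizable where the single two-edge process is not. On devices with dedicated DDR I/O registers, the vendor primitive is usually the more predictable route for timing.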
More difficult is clocking the arriving data at the right time. As your read diagram shows, the data arrives in the middle of the clock edge. For that you need a delayed clock. In an ASIC that is done using a tunable delay line, which is calibrated at start-up and regularly checked for phase. In an FPGA that would require some esoteric logic.
I have not worked closely with DDR chips for a while, but I think all the modern ones (DDR2 and up?) output a clock themselves to help with capturing the read data.
Also after you have clocked the read data in, using that shifted clock, you have to get the data back to the system clock which requires an asynchronous FIFO.
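A full asynchronous FIFO is too long to show here, but its central trick is to pass the pointers between the two clock domains in Gray code, so that only one bit changes per increment, through a two-flip-flop synchronizer. Below is a minimal sketch of the write-pointer half only. The entity name gray_ptr_sync and all port names are hypothetical, and the memory, the read pointer and the full/empty logic are left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: write pointer of an async FIFO, crossed into
-- the read clock domain as a Gray-coded value.
entity gray_ptr_sync is
    generic (PTR_BITS : positive := 4);
    port (
        wr_clk     : in  std_logic;  -- capture (shifted) clock domain
        wr_inc     : in  std_logic;  -- '1' for one wr_clk cycle per word written
        rd_clk     : in  std_logic;  -- system clock domain
        wr_gray_rd : out std_logic_vector(PTR_BITS-1 downto 0)
    );
end entity gray_ptr_sync;

architecture rtl of gray_ptr_sync is
    signal wr_bin  : unsigned(PTR_BITS-1 downto 0) := (others => '0');
    signal wr_gray : std_logic_vector(PTR_BITS-1 downto 0) := (others => '0');
    signal meta    : std_logic_vector(PTR_BITS-1 downto 0) := (others => '0');
    signal synced  : std_logic_vector(PTR_BITS-1 downto 0) := (others => '0');
begin
    -- Write domain: binary counter plus a registered Gray-coded copy.
    -- The Gray value comes straight from a register, with no
    -- combinational logic between it and the synchronizer.
    wr_side : process (wr_clk)
        variable nxt : unsigned(PTR_BITS-1 downto 0);
    begin
        if rising_edge(wr_clk) then
            if wr_inc = '1' then
                nxt     := wr_bin + 1;
                wr_bin  <= nxt;
                wr_gray <= std_logic_vector(nxt xor shift_right(nxt, 1));
            end if;
        end if;
    end process wr_side;

    -- Read domain: two flip-flops in series to reduce metastability risk.
    rd_side : process (rd_clk)
    begin
        if rising_edge(rd_clk) then
            meta   <= wr_gray;
            synced <= meta;
        end if;
    end process rd_side;

    wr_gray_rd <= synced;
end architecture rtl;
```

In a complete FIFO the read side would compare wr_gray_rd with its own Gray-coded read pointer to derive "empty", and a mirrored synchronizer would carry the read pointer back for "full".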
I hope that gets you started.

Related

How does Vivado ensure that signals do not transition unpredictably when they are sampled by a clock edge and how does this relate to simulation?

I wrote some simple sequential VHDL code that defines a clock-sensitive process. Whenever the clock rises from low to high, I check another signal in the architecture and act depending on its value. However, this signal transitions at the same instant as the rising edge of the clock.
In simulation, when the rising edge occurs, the system always samples the signal value from before its transition. My question is: how does this work out once the code is implemented on the corresponding FPGA? Does it produce unpredictable sampling of the signal value? Do you advise always avoiding this type of scenario within a VHDL architecture?

FPGA Will pausing entities (by pausing their clock input) reduce the overall power consumption?

I'm currently creating a multiple entity project where all of the entities have clock synchronous architecture (no behaviourals) and most of the entities work on derived clocks.
I'm using DE0 Nano, so my source clock is 50MHz and I have 4 derived clocks: 1 MHz, 500 kHz, 10 Hz and 1 Hz.
Disclaimer: While I am aware that doing things this way is much less power-efficient, I've been wondering if there is something I could do to remedy this at least a little (I'm open to ideas).
Now, in the top-level entity I have an "event handler", which can decide which entities should work at any given moment.
Therefore I came up with an idea to wire an on/off clock switch for the derived clock input signals to the lower level entities and disable some of them (the clock inputs) when I don't need a given entity to work for a while (as I understand, this should stop their processes from firing for that time).
Since I don't have an easy way to test that idea (I estimate it will take a moderate amount of work and time, especially setting up the power consumption measurement) I wanted to ask whether anyone tried something similar and/or knows if it's worth a shot?
For your information, currently when the entities are in "sleep" mode, their processes fire on each rising clock edge, check an internal state or flag variable/signal and stop e.g.:
    process (clk) is  -- clk: the 1 MHz derived clock
        variable ...
    begin
        if rising_edge(clk) then
            if state = ready and enable = '1' then
                ...
            end if;
        end if;
    end process;
Or maybe there is another, better way to do it?

How to use an oscilloscope with an FPGA using VHDL

Does anyone have any material about this?
I want to show a std_logic_vector(0 to 29) on the oscilloscope.
That's 30 bits ... you don't want to probe 30 pins.
I'd use 2 spare pins and roll a simple serial interface off a suitable (e.g. 1 MHz) clock and a /32 counter.
One pin shifts out each bit according to the count, the other is set when you send the first bit, as a convenient triggering signal.
Either let it free run, or tell it to start (inside the FPGA) every time you update that signal.
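The two-pin serial idea above might look something like this in its free-running form. It is only a sketch: the entity name scope_serializer and the port names are invented for the example, and it assumes data_in is stable or already synchronous to clk, which is taken to be the slow (e.g. 1 MHz) shift clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: shifts a 30-bit vector out on one pin and
-- marks the first bit on a second pin for the scope trigger.
entity scope_serializer is
    port (
        clk      : in  std_logic;                  -- slow shift clock
        data_in  : in  std_logic_vector(0 to 29);  -- vector to display
        ser_out  : out std_logic;                  -- scope channel 1: data
        trig_out : out std_logic                   -- scope channel 2: trigger
    );
end entity scope_serializer;

architecture rtl of scope_serializer is
    signal count : unsigned(4 downto 0) := (others => '0');  -- /32 counter
    signal snap  : std_logic_vector(0 to 29) := (others => '0');
begin
    process (clk)
    begin
        if rising_edge(clk) then
            count <= count + 1;  -- wraps from 31 back to 0

            if count = 31 then
                -- Slot 0 comes next: take a snapshot so the frame is
                -- consistent, send bit 0 and raise the trigger.
                snap     <= data_in;
                ser_out  <= data_in(0);
                trig_out <= '1';
            elsif count < 29 then
                -- Slots 1 to 29: send the remaining bits of the snapshot.
                ser_out  <= snap(to_integer(count) + 1);
                trig_out <= '0';
            else
                -- Slots 30 and 31: idle gap between frames.
                ser_out  <= '0';
                trig_out <= '0';
            end if;
        end if;
    end process;
end architecture rtl;
```

With the scope triggered on the rising edge of trig_out, bit 0 appears during the trigger pulse and the remaining 29 bits follow at one bit per clock period.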
Most FPGA vendors provide some kind of in-system debugger (like ChipScope for Xilinx ISE designs). These provide a very powerful debugging perspective for your FPGA design and allow you to record waveforms on hundreds of signals.

VHDL simulation what is the correct delta?

I am currently implementing a MUX, and to test it I've created a generator and a monitor to generate data as input and monitor its output.
The MUX takes an Avalon Streaming interface as input and output and therefore also supports back pressure.
My question is this: my testbench runs on the falling edge, while my DUT and input data are generated on the rising edge. Both my input clock and my input data are generated at delta cycle 0. However, the back-pressure ready signal returning from the DUT, which controls the generator, is set at delta 3. This causes sampling problems, because the DUT must only load data when the data from the generator (at delta 0) is valid and the DUT ready (the back-pressure signal at delta 3) is asserted.
If I skew my DUT input clock by 1 ps, it fixes the problem, but that feels like the wrong approach. What is the correct design principle here?
Skew the clock by 1 ps, or at least move it 4 deltas, so I make sure all my signals have been set before rising_edge?
or
Move the data I generate so it aligns with the DUT output ready signal ?
or
Is it just a decision made from test bench to test bench ?
I've also thought that a clock in a test bench should be generated at delta 0 and everything else must come after.
I am simulating in Riviera-PRO.
You have various choices:
i) Make everything synchronous. In other words, drive the inputs and sample the outputs on the same edge of the clock as the DUT uses. After all, the DUT doesn't suffer any race problems, so if you just extend the clocking strategy to the testbench, everything will work fine. At RTL, that is, but not at gate level. So if you're doing gate-level sims (which you should be), then this strategy is no good for that.
ii) Clock everything in your testbench off the opposite edge of the clock to the edge the DUT uses. Again, fine for RTL, but whether fine for gate-level depends on the delays through your design.
iii) Drive the inputs to the DUT just after the clock edge and sample the DUT outputs just before it. The clock edge being the edge that the DUT uses. Again, this is fine for RTL, and is the most robust for gate-level, too.
iv) Implement realistic timing for each DUT interface. That ought to work for RTL and gate-level and if it doesn't work for gate-level then the fault is with the DUT not the testbench.
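As an illustration of option iii), here is a small self-contained testbench sketch. Everything in it is made up for the example: the "DUT" is just a single register standing in for the real design, and the constants T_DRIVE and T_SAMPLE are arbitrary values that would normally come from the interface timing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity tb_timing_sketch is
end entity tb_timing_sketch;

architecture sim of tb_timing_sketch is
    constant CLK_PERIOD : time := 10 ns;
    constant T_DRIVE    : time := 1 ns;  -- inputs change this long after the edge
    constant T_SAMPLE   : time := 1 ns;  -- outputs are checked this long before the next edge

    signal clk  : std_logic := '0';
    signal d    : std_logic := '0';
    signal q    : std_logic := '0';
    signal done : boolean   := false;
begin
    -- Free-running clock that stops once the stimulus has finished.
    clk <= not clk after CLK_PERIOD / 2 when not done else '0';

    -- Stand-in DUT: one register on the rising edge.
    dut : process (clk)
    begin
        if rising_edge(clk) then
            q <= d;
        end if;
    end process dut;

    stim : process
        variable prev : std_logic := '0';  -- value of d at the latest DUT edge
    begin
        for i in 0 to 7 loop
            wait until rising_edge(clk);
            -- Drive the input just after the edge the DUT uses.
            wait for T_DRIVE;
            d <= not prev;
            -- Sample the output just before the next edge.
            wait for CLK_PERIOD - T_DRIVE - T_SAMPLE;
            assert q = prev
                report "q does not match the value driven one cycle earlier"
                severity error;
            prev := not prev;
        end loop;
        done <= true;
        wait;
    end process stim;
end architecture sim;
```

Because nothing in the testbench changes or is read at the same simulation time as the DUT's clock edge, the result does not depend on delta-cycle ordering.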

Simultaneous reading and writing to registers

I'm planning to design a MIPS-like CPU in VHDL on a FPGA. The CPU will have a classic five stage pipeline without forwarding and hazard prevention. In the computer architecture course I learned that the first MIPS-CPUs used to read from the register file on rising clock edge and write on falling clock edge. The FPGA I'm using doesn't support using rising and falling clock edge at the same time (regarding reading and writing to registers), so I can't exactly do like the original MIPS and have to do it all on rising clock edge.
So, here comes the part where I'm having a problem. The instruction writes back to the register in the write back stage. The write back stage sends the data directly to the decode stage. Another instruction in the decode stage wants to read the same register that also the write back stage wants to write.
What happens in this case? Does the decode stage take the new value for the instruction or the old value that is still in the register file?
A register file that fits in the decode stage of the classic five-stage design consists of a triple-port RAM (or two dual-port RAMs) plus two multiplexers and two comparators. The comparators and multiplexers are required to bypass the data coming from the write-back stage. This is needed because the write data is only written into the triple-port RAM in the next cycle. Because the signals coming from the write-back stage are synchronous, this is not a problem.
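A minimal sketch of that bypass is shown below, assuming 32 registers of 32 bits and an asynchronous read, which maps to flip-flops or distributed RAM rather than block RAM. The entity and port names are hypothetical, and the MIPS convention that register 0 always reads as zero is not modelled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: register file with one write port, two read
-- ports, and a bypass from the write port to each read port.
entity regfile_bypass is
    port (
        clk      : in  std_logic;
        we       : in  std_logic;                      -- from write-back stage
        wr_addr  : in  std_logic_vector(4 downto 0);
        wr_data  : in  std_logic_vector(31 downto 0);
        rd_addr1 : in  std_logic_vector(4 downto 0);   -- from decode stage
        rd_addr2 : in  std_logic_vector(4 downto 0);
        rd_data1 : out std_logic_vector(31 downto 0);
        rd_data2 : out std_logic_vector(31 downto 0)
    );
end entity regfile_bypass;

architecture rtl of regfile_bypass is
    type ram_t is array (0 to 31) of std_logic_vector(31 downto 0);
    signal ram : ram_t := (others => (others => '0'));
begin
    -- Synchronous write: the new value is stored at the next rising edge.
    write_port : process (clk)
    begin
        if rising_edge(clk) then
            if we = '1' then
                ram(to_integer(unsigned(wr_addr))) <= wr_data;
            end if;
        end if;
    end process write_port;

    -- Comparator + multiplexer per read port: if the register being
    -- read is the one being written this cycle, forward the write data
    -- instead of the stale RAM content.
    rd_data1 <= wr_data when (we = '1' and wr_addr = rd_addr1)
                else ram(to_integer(unsigned(rd_addr1)));

    rd_data2 <= wr_data when (we = '1' and wr_addr = rd_addr2)
                else ram(to_integer(unsigned(rd_addr2)));
end architecture rtl;
```

With this arrangement the instruction in the decode stage sees the new value, even though the RAM itself is only updated at the following clock edge.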
The question is what you understand by the term "register". Or more specifically, how you would like to map the register bank to the FPGA.
The easiest but not so efficient way is to map each MIPS register to several flip-flops according to the register size. You can update these flip-flops at only one clock edge (e.g. the falling edge). After that you can read the new content at any time, also known as asynchronous read. This solution is not so efficient because the multiplexer to select one MIPS register from the register bank requires a lot of logic resources.
If you have an FPGA where the LUTs can be used as distributed memory, then almost all of the logic resources for the multiplexers can be saved. Distributed memory typically provides an asynchronous read too (and a synchronous write, of course). Please read the vendor documentation of the synthesis tool on how to describe this type of memory for synthesis.
Last but not least, you can map the full register bank to on-chip block memory. These typically provide only a synchronous read, i.e., reading starts at a clock edge. (Of course, they also provide only a synchronous write.) However, these are typically dual-ported RAMs. Thus, you can write on the falling edge at one port and read on the rising edge at the other port. Please read the documentation of your FPGA on the timing of the write. For example, on some Altera FPGAs the writing is delayed to the next opposite edge (here the rising edge) of the clock.
