Why use multiple clocks of the same speed in an FPGA design? - vhdl

I very recently began experimenting with FPGAs. In researching things around the net I've noticed in several places that designs might use multiple separate PLL clocks of the exact same speed. Why is that?
One example I will give is this site: Parallella Linux Quick Start
They have FCLK_CLK1 and FCLK_CLK2 both at 200 MHz. Why is this recommended rather than a single 200 MHz clock for both? Is it just customary to give each major component its own clock even if the frequency is the same? Or am I missing something?

Besides the reasons already mentioned, there are several other reasons why two PLL clocks of the same speed might exist.
Even if the frequency is exactly the same, differences might exist in clock phase or jitter. Using one PLL with fixed clock phase and another one with adjustable clock phase can be useful for proper sampling of external input signals or maintaining the correct phase difference between clock and output data. Techniques like that were especially popular before components such as IDELAY and ODELAY were widely available.
Crystal oscillators also have small deviations from their marked value. If you have a communication link between two boards and both boards have their own oscillator, then one board's main clock might run at 200.01 MHz while the other board's runs at 199.99 MHz. In many cases both FPGAs will then use their locally generated low-jitter clock as their main clock, but will also use the remote clock to sample incoming data. You can see this in Ethernet PHYs: a 100 Mbit PHY usually has a 25 MHz receive clock recovered from the input signal and a locally generated 25 MHz transmit clock.

There are many reasons to use multiple clocks of the exact same speed, so I will just state a few. However, I don't have any deep knowledge of your particular example.
Magic on FPGA.
As stated in the comments, an FPGA is a highly complex device. Only the vendor knows exactly what is happening inside, so they may give you advice that seems odd.
Clock distribution.
If you have a design with just one clock source, it is critical to route the clock correctly: the clock has to arrive everywhere at the same time, which is hard for the PnR tool to manage. Today's FPGAs don't usually have this issue.
Different IPs on one FPGA.
If you have different IPs/designs which you combine on one FPGA, the IPs can use different clocks. If you want to split them up again later, you will need separate clock sources anyway. Besides, you are forced to add registers when you cross a clock domain, so when merging your IPs you don't mix everything up, which is good design style. This may also be the case in your example:
HDMI support is provided by an IP core from Analog Devices ...
Output.
Maybe the additional clock is only used as an output at some I/O Port.
Low-Power.
In today's CMOS technology, most power is wasted on transitions (transistor switching) and static leakage (the devices are so small that they simply leak current). With multiple clock domains, you have the opportunity to have fewer transitions per second, or you can switch off parts of your device completely.

Related

Best Route For Input Clocks on Kintex7 FPGA

I'm looking for advice on a less-than-ideal situation.
I've inherited a project where we have a hardware design issue. We generate a clock to a chip which feeds the clock back in over a non-clock-capable input. This works at up to 160 MHz, but we are looking to increase the clock rate, so I'm researching I/O options. This clock is used to capture 8 parallel data inputs.
Right now the data inputs go through a delay and an IDDR block. The output is fed to a FIFO. Our clock is still routed to a BUFG, so we have:
Data - IDELAY - IDDR - FIFO
Clock - BUFG ----^------^
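Roughly, each bit's capture path looks something like this sketch (simplified, with illustrative entity and signal names rather than our actual code):

    library ieee;
    use ieee.std_logic_1164.all;
    library unisim;
    use unisim.vcomponents.all;

    entity capture_bit is
      port (
        clk_buf      : in  std_logic;  -- fed-back clock, after the BUFG
        data_delayed : in  std_logic;  -- one data bit, after its IDELAY
        q_rise       : out std_logic;  -- sample taken on the rising edge
        q_fall       : out std_logic   -- sample taken on the falling edge
      );
    end entity;

    architecture rtl of capture_bit is
    begin
      -- replicated once per data bit; Q1/Q2 then feed the FIFO
      iddr_inst : IDDR
        generic map (
          DDR_CLK_EDGE => "SAME_EDGE_PIPELINED",  -- align both outputs to one edge
          SRTYPE       => "SYNC")
        port map (
          C  => clk_buf,
          CE => '1',
          D  => data_delayed,
          R  => '0',
          S  => '0',
          Q1 => q_rise,
          Q2 => q_fall);
    end architecture;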
I read somewhere that routing to a BUFG has a large delay so a BUFR-BUFIO is better. Is this the case? Have I missed a better option?
When you say generating a clock to "a chip", I will assume that you mean the Kintex7 chip.
The delay is not a problem. The issue is setting up your timing constraints properly so that static timing analysis can validate whether you violate any setup or hold time across all corners.
If you look at the DS182 data sheet, the AC switching characteristics section will give you a rough idea of how well the chip can perform.
However, the best approach is to let the timing analyzer inside Vivado calculate whether your desired clock frequency can close timing.
You just need to make sure that:
The data input is synchronous to your final clock.
If it isn't, then clock that data input across two stages of registers with respect to the final clock (see the sketch after these steps).
Specify your timing constraints.
Run through synthesis and implementation.
Check the timing report to see that there are no violations.
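For the two-register step, a minimal sketch (assuming a single-bit or slowly-changing input; the entity and signal names are just illustrative, and a multi-bit bus would need a FIFO or handshake instead):

    library ieee;
    use ieee.std_logic_1164.all;

    entity sync_2ff is
      port (
        clk_final : in  std_logic;  -- the clock your capture logic finally runs on
        din_async : in  std_logic;  -- input that is not synchronous to clk_final
        din_sync  : out std_logic   -- safe to use in the clk_final domain
      );
    end entity;

    architecture rtl of sync_2ff is
      signal meta, stable : std_logic := '0';
      -- hint to the tools to keep the two flip-flops together
      attribute ASYNC_REG : string;
      attribute ASYNC_REG of meta, stable : signal is "TRUE";
    begin
      process(clk_final)
      begin
        if rising_edge(clk_final) then
          meta   <= din_async;  -- first stage may go metastable
          stable <= meta;       -- second stage resolves to a stable value
        end if;
      end process;
      din_sync <= stable;
    end architecture;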
Or maybe I did not understand something about what you are trying to do.

Connection of external crystal oscillators for FPGA

I am designing a triple modular redundancy (TMR) processor system to synthesize on an Altera DE10-Lite FPGA board. Its purpose is to demonstrate reliability of computation in the presence of various faults. I need advice on how to connect three external crystal oscillators (instead of the on-board crystal), with the same ratings, to drive the three processors inside the FPGA. I will be using a synchronization voting scheme to sync all three signals. Can this task be done?
Clock distribution triplication
I have read the following relevant link that describes using PLLs; is this the correct way?
https://www.altera.com/documentation/mcn1395213337540.html#mcn1395213788377
No, that's unlikely to work.
If you run each soft CPU with a separate crystal, they will drift out of synchronization due to slight variations in frequency between the crystals.
If you try to use a majority voting scheme to create a single clock signal from three input clocks, you'll end up with a very weird, irregular clock signal which will probably cause faults in the logic driven by it.
Use one clock source at a time. If you're convinced you need to resist failures of an external clock, consider implementing some way to detect a failure in the current clock and switch to another one. (Keep in mind that this logic will need to still work without a functional clock… which may be difficult.)

[Common Clock Framework]: How to set rate of a muxed clock if its parent clock unable to set?

I am studying the Common Clock Framework and have a question about muxed clocks.
Suppose we want to set a particular rate for a muxed clock and the current parent of the clock cannot provide the desired rate (the parent runs at a lower rate).
Is there any function or mechanism that automatically switches the clock's parent (from its parent list) and sets the desired rate?
One possible solution: we can call set_parent() manually and then call set_rate(), which can set the desired rate. But what if we could just call set_rate() and it switched the parent of the clock automatically and set the desired rate?
Some clocks may up-scale a timer using a PLL, so having a parent with a lower rate doesn't mean that automatically trying to increase the parent clock is the best solution. The Common Clock Framework (CCF) is meant to allow multiple drivers/sub-systems access to a shared resource. The CCF doesn't try to be intelligent, as the way different clock trees behave is difficult to know generically.
One possible solution: we can call set_parent() manually and then call set_rate(), which can set the desired rate.
I think you mean to call get_parent() and then use set_rate()? Some of the time it is not easy to call set_parent(), as the parent may be fixed. You need to read your SOC documentation. In some cases there are multiple input clocks available; i.e., the real clock hierarchy is not a tree but a DAG, although the active hierarchy is tree-like.
But what if we could just call set_rate() and it switched the parent of the clock automatically and set the desired rate?
This might make sense for the particular SOC clock you are looking at, but not generically. There may be dozens of clocks dependent on a parent, and it may be possible to re-rate grand-parents, etc. It is probably not the best choice to re-rate the system clock just because an audio driver wants a clock that is a few Hz off.
It is possible to write the clock driver so that it will re-rate the parent if a request is made on a child that can't otherwise be satisfied. However, this is part of the individual clock driver and not the CCF in general.
Example
For instance, an SOC might have an audio clock with three input sources:
A dedicated 48000 kHz clock
Some low-speed bus clock (platform general)
A USB clock
Option 1 gives the best sound quality with the highest power consumption. Option 2 is meant to be generic but may not match sound rates well, resulting in sub-optimal DAC/wave/sound generation. Option 3 might be good for some sort of USB sound slave, but if you are not using USB it may be expensive in power consumption.
In the case above, set_parent() may be a way to get the desired rate, if the SOC clock driver supports it.
There is no intelligence in the CCF; if there is some flexibility, it is in the clock driver, but this depends on the clock hardware. It is up to the programmer to read the SOC documentation and determine the best way to configure the clock tree. You should probably also examine the clock driver for your SOC and Linux version to see what it supports. You cannot generically change the clock rate of parents in a driver, as other devices may depend on them. If you need this for a particular SOC in an SOC family, you need to special-case it by examining the device tree to see which SOC the driver is running on. That is the case where you can use get_parent() and set_rate() for the particular SOC.
Reference: A question on older Linux clock structure.

Moving data between processes in Spartan 3

I have two processes A and B, each with its own clock input.
The clock frequencies are a little different, and therefore not synchronized.
Process A samples data from an IC, this data needs to be passed to process B, which then needs to write this data to another IC.
My current solution is using some simple handshake signals between process A and B.
The memory has been declared as distributed RAM (128 bytes as an array of std_logic_vector(7 downto 0)) inside process A (not block memory).
I'm using a Spartan 3AN from Xilinx and the ISE Webpack.
But is this the right way to do it?
I read somewhere that the Spartan 3 has dual-port block memory supporting two clocks, so would this be more correct?
The reason I'm asking is that my design behaves unpredictably, and in cases like this I just hate magic. :-)
Except for very specific exceptional cases, the only correct way to move data between two independent clock domains is to use an asynchronous FIFO (also more correctly called a multi-rate FIFO).
In almost all FPGAs (including the Xilinx parts you are using), you can use FIFOs created by the vendor -- in Xilinx's case, you do this by generating a FIFO with the CoreGen tool.
You can also construct such a FIFO yourself using a dual-port RAM and appropriate handshaking logic, but like most things, this is not something you ought to go reinvent on your own unless you have a very good reason to do so.
You also might consider whether your design really needs to have multiple clock domains. Sometimes it's absolutely necessary, but that's much, MUCH less often than most people just starting out believe. For instance, even if you need logic that runs at multiple rates, you can often handle this by using a single clock and appropriately generated synchronous clock enables (see the sketch below).
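As a rough sketch of that idea (assuming a 100 MHz system clock and logic that only needs to update once every 100 cycles; the names are illustrative):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity slow_enable is
      port (
        clk     : in  std_logic;  -- the single system clock, e.g. 100 MHz
        en_slow : out std_logic   -- one-cycle pulse every 100 clock cycles
      );
    end entity;

    architecture rtl of slow_enable is
      signal cnt : unsigned(6 downto 0) := (others => '0');
    begin
      -- registers in the "slow" logic wrap their updates in "if en_slow = '1'"
      -- instead of running from a second, slower clock
      process(clk)
      begin
        if rising_edge(clk) then
          if cnt = 99 then
            cnt     <= (others => '0');
            en_slow <= '1';
          else
            cnt     <= cnt + 1;
            en_slow <= '0';
          end if;
        end if;
      end process;
    end architecture;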
The magic you are experiencing is most likely because either you haven't constrained your design correctly in synthesis, or you haven't done your handshaking properly. You have two options:
FIFO
Use a multi-rate FIFO as stated by wjl. This is a very common solution, always works (when done properly), but is big in terms of resources. The big plus of this solution is that you don't have to take care of the actual clock-domain-crossing issues yourself, and you get maximum bandwidth between the two domains. Never try to build an asynchronous FIFO by hand in VHDL, because that is very hard to get right; use the appropriate generator from Xilinx, which I think is CoreGen.
Handshaking
Have at least two registers for your data in the two domains and build complete request/acknowledge handshaking logic; it won't work properly if you don't include those. Make sure the handshaking logic is properly synchronized by adding at least two registers for the handshaking signals in the receiving domain, because otherwise you will most likely see unpredictable behaviour due to metastability (a sketch follows below).
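A minimal sketch of the receiving side of such a handshake (a simple four-phase request/acknowledge; it assumes the sending domain holds the data stable while the request is asserted, and all names are illustrative):

    library ieee;
    use ieee.std_logic_1164.all;

    entity handshake_rx is
      port (
        clk_b   : in  std_logic;                     -- receiving-domain clock
        req_a   : in  std_logic;                     -- request from domain A
        data_a  : in  std_logic_vector(7 downto 0);  -- stable while req_a = '1'
        ack_b   : out std_logic;                     -- acknowledge back to domain A
        data_b  : out std_logic_vector(7 downto 0);  -- registered copy in domain B
        valid_b : out std_logic                      -- one-cycle strobe in domain B
      );
    end entity;

    architecture rtl of handshake_rx is
      signal req_meta, req_sync, req_prev : std_logic := '0';
    begin
      process(clk_b)
      begin
        if rising_edge(clk_b) then
          -- two-stage synchronizer: only the single-bit request crosses domains
          req_meta <= req_a;
          req_sync <= req_meta;
          req_prev <= req_sync;

          valid_b <= '0';
          if req_sync = '1' and req_prev = '0' then
            -- data has been held stable for several clk_b cycles, safe to capture
            data_b  <= data_a;
            valid_b <= '1';
          end if;

          -- acknowledge follows the synchronized request (four-phase handshake);
          -- domain A deasserts req_a only after it sees the acknowledge
          ack_b <= req_sync;
        end if;
      end process;
    end architecture;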
For getting a "valid/ack" set of flags across clock domains, you may wish to look at the Flancter and here's an application of it
But in the general case, using a dual-clock FIFO is the order-of-the-day. Writing your own will be an interesting exercise, but validating across all the potential clock timing cases is a nightmare. This is one of the few places I'll instantiate a Coregen block.

What are some practical applications of an FPGA?

I'm super excited about my program powering a little seven-segment display, but when I show it off to people not in the field, they always say "well what can you do with it?" I'm never able to give them a concise answer. Can anyone help me out?
First: They don't need to have volatile memory.
Indeed the big players (Xilinx, Altera) usually have their configuration on-chip in SRAM, so you need additional EEPROM/Flash/WhatEver(TM) to store it outside.
But there are others; e.g. Actel is one big player that comes to mind that has non-volatile configuration storage on their FPGAs. (By the way, this also has other advantages, as SRAM is usually not very radiation tolerant, and you have to take special measures when you go into orbit.)
There are two big things that justify FPGAs:
Price - They are not cheap. But sometimes you can't do something in software and you need hardware for it. And when you are below a certain volume (e.g. because it's just a small series, or a prototype), an FPGA is MUCH cheaper than an ASIC. Also, while developing ASICs, this allows much faster turn-around before the final design is reached.
Reconfiguration - You can reconfigure your FPGA; that is something a processor or an ASIC can't do. There are several applications for this. For example, when you need the ability to fix something in the design but you can't physically get to the device: the Mars orbiters/rovers used Xilinx FPGAs. When someone finds a mistake (or wants to switch to a different coding for transmitting data, or whatever), you can't replace the chip, as it is simply not reachable, but with an FPGA you can just reconfigure it and apply your changes. Another scenario is a single chip which can perform different accelerations depending on the situation. Imagine a smartphone: when telephoning, the FPGA can be configured for audio en-/decoding; when surfing, it can work as a compression engine; when playing videos, it can be configured as an H.264 decoder/accelerator. Another thing you can do is match your hardware to your problem instance. E.g. Cisco uses many FPGAs in their hardware: you need hardware to perform switching/routing/packet inspection at the required speed, and you can generate matching engines directly in hardware from the actual configuration.
Another thing which might come up soon (I know some car manufacturers have thought about it) is for devices which include a lot of different electronics and have a big supply chain. It's more or less a combination of price and reconfiguration: having 10 different ASICs can be more expensive than 10 FPGAs performing the same tasks, because it's cheaper to have 10 FPGAs from just one supplier, with only one type of chip to hold for service and supply, than to have 10 suppliers and the need to hold and manage 10 different chips.
True story.
They allow you to fix design flaws in the custom data-acquisition boards for a multi-million dollar particle physics experiment that become obvious only after you have everything installed and are doing integration work and detector characterization.
You can evolve circuits. This is a bit old-school evolutionary algorithms, but starting from a set of random individuals you can select the circuits that score higher in a fitness function than the rest and breed them to create a new population, ad infinitum. Read up on evolvable hardware; I think this book covers FPGAs: http://www.amazon.co.uk/Introduction-Evolvable-Hardware-Self-Adaptive-Computational/dp/0471719773/ref=sr_1_1?ie=UTF8&qid=1316308403&sr=8-1
Say, for example, you wanted a DSP circuit: you have an input signal and a desired output signal. Starting with a random population, you select perhaps only the fittest (bad) or perhaps a mixture of fit and odd individuals to create the next generation. After a number of generations you can open the lid and, lo and behold, discover that evolution has taken place and you have a circuit that may even outperform your initial expectations!
Also read the Field Guide to Genetic Programming; it's free on the web somewhere.
There are limitations to software. In software, you're running at the CPU's clock rate, executing only a handful of instructions per clock cycle. In software, everything is high level; you do not control the details that happen at the low level. You'll always be limited by the operating system or development board you are programming. This is true for popular development boards such as Arduinos and the Raspberry Pi.
In FPGA hardware, you can precisely program and control what happens on every clock cycle. Your computation runs directly in logic, limited only by signal propagation and logic delays rather than by a fetch-execute cycle.
So an FPGA gives you hardware and raw signal speed, which is much better than
a CPU, which gives you software and a few instructions per clock cycle.
So why use an FPGA when we can design our own boards on a printed circuit board, at the transistor level?
Because FPGAs are programmable hardware! They are built so that you can program the connections instead of wiring a board up for one specific application. This also explains why FPGAs are expensive: they are a sort of 'general' or programmable hardware.
To argue why you should pick FPGAs despite their cost, the programmable hardware allows:
A longer product cycle (you can update the programmable hardware in the customer's products that contain your FPGA by simply letting them program your updated HDL code into the FPGA).
Recovery from hardware bugs: you simply let customers download the corrected configuration onto their FPGA. (Note: you cannot do this with fixed hardware designs, where you would have to spend millions to recall your products, create new ones, and ship them back to customers.)
For examples of the cool things an FPGA can do, refer to Cornell's well-known ECE 5760 course:
http://people.ece.cornell.edu/land/courses/ece5760/FinalProjects/
Hope this helps!
Soon Chee Loong,
University of Toronto
FPGAs are also used to test/research circuit designs before mass production. This happens in several sectors: image processing, signal processing, etc.
Edit - after a few years we can now see more practical applications, including finance and machine learning:
aerospace
emulation
automotive
broadcast
high-performance computing
medical
machine learning
finance (including cryptocoins)
I like this article: http://www.hpcwire.com/hpcwire/2011-07-13/jp_morgan_buys_into_fpga_supercomputing.html
My feeling is that FPGAs can sit directly in your streaming data at the point where it enters the systems under your control. You can then crunch that data without going through the steps a GPGPU would require (bringing the data in off the network, passing it across the PCI Express bus, and crunching it a Gb at a time).
There are good reasons for both, but I think the notion of whether you mind buffering the data is a good bellwether.
Here's another cool FPGA application:
https://ehsm.eu/m-labs.hk/m1.html
Automotive image processing is one interesting domain:
Providing lane-keeping support to the driver (disclosure: I wrote this page!):
http://www.conekt.co.uk/capabilities/50-fpga-for-ldw
Providing an aerial view of a car from 4 fisheye-lens cameras (with video):
http://www.logicbricks.com/Solutions/Surround-View-DA-System/Xylon-Test-Vehicle.aspx
