I'm looking for advice on a less-than-ideal situation.
I've inherited a project with a hardware design issue. We generate a clock and send it to a chip, which feeds the clock back in over a non-clock-capable input. This works at up to 160 MHz, but we are looking to increase the clock, so I'm researching I/O options. This clock is used to capture 8 parallel data inputs.
Right now the data inputs go through a delay and an IDDR block, and the output is fed to a FIFO. Our clock is still routed to a BUFG, so we have:
Data - IDELAY - IDDR - FIFO
Clock - BUFG ----^------^
I read somewhere that routing to a BUFG has a large delay, so a BUFR/BUFIO combination is better. Is this the case? Have I missed a better option?
When you say generating a clock to "a chip", I will assume that you mean the Kintex-7 chip.
The delay is not a problem. The issue is setting up your timing constraints properly so that static timing analysis can verify that you do not violate any setup or hold times across all device corners.
If you look at the DS182 data sheet, the AC switching characteristics section will give you a rough idea of how well the chip can perform.
However, the best approach is to let the timing analyzer inside Vivado calculate whether your desired clock frequency can close timing.
You just need to make sure that:
The data input is synchronous to your final clock.
If it isn't, clock that data input across two stages of registers with respect to the final clock (see the sketch after this list).
Specify your timing constraints.
Run through synthesis and implementation.
Check the timing report to confirm there are no violations.
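For the asynchronous case mentioned in the list, a minimal two-flop synchronizer sketch (module and signal names are mine, and this handles a single-bit signal only; for a bus you would typically synchronize a control strobe and keep the bus stable around it):

```verilog
// Minimal two-flop synchronizer sketch for one asynchronous input bit.
module sync_2ff (
    input  wire clk,      // destination (final) clock
    input  wire async_in, // asynchronous input signal
    output reg  sync_out  // synchronized output
);
    reg stage1;

    always @(posedge clk) begin
        stage1   <= async_in; // first stage may go metastable
        sync_out <= stage1;   // second stage resolves it
    end
endmodule
```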
Or maybe I did not understand something about what you are trying to do.
Related
I have an FPGA that is taking in serial data at a bit rate of, say, 4.8 kbps.
Now I am not sure what clock frequency my FPGA should run at to properly handle the data.
Will the clock speed simply need to be at minimum 4800 Hz?
It goes the other way round: you first have to determine how many clock cycles you need to process a single input "tick". If one cycle is enough to complete your processing, then 4800 Hz might be fine.
But if you need two cycles, then you would probably go with double the speed.
This is a pretty generic answer, but your question is also pretty generic, so this is probably the best you can hope for without enhancing your input.
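In practice you would usually run the FPGA at a convenient board frequency and derive a processing-rate enable pulse instead of clocking it at exactly the bit rate. A minimal sketch, assuming a 50 MHz system clock (the frequency is just an example):

```verilog
// Generate a one-cycle enable pulse at ~4800 Hz from a 50 MHz clock.
// 50_000_000 / 4800 ~= 10417 cycles per tick (not exact; a real design
// would track the residual error or use a fractional accumulator).
module tick_4800 (
    input  wire clk_50m,
    output reg  tick
);
    reg [13:0] count = 0;  // 14 bits covers 0..10416

    always @(posedge clk_50m) begin
        if (count == 14'd10416) begin
            count <= 0;
            tick  <= 1'b1;  // assert enable for one clock cycle
        end else begin
            count <= count + 1'b1;
            tick  <= 1'b0;
        end
    end
endmodule
```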
Will the clock speed simply need to be at minimum 4800 Hz?
Theoretical: yes, practical: no.
Theoretical.
You can receive a 4800 Hz signal with a 4800 Hz clock, but only if the clock is exactly the right frequency (the 4800 Hz will deviate; no clock is perfect). For that you would need something like a PLL in a measurement feedback loop, looking at the signal and keeping the clock in step.
Practical.
Much easier is to use e.g. a 1 MHz FPGA clock and use oversampling. Even then you have the same problems as with a dedicated clock: you need to know where the bit boundaries are. Again, some sort of clock-locking or edge-recognition mechanism is required. In fact, you have to build the equivalent of a PLL, but you can do it all using registers and counters.
When running at 1 MHz (which is very slow for an FPGA) you have plenty of clock cycles to process your data.
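As a rough illustration of that idea, here is a sketch of an oversampled receiver for 4800 bps from a 1 MHz clock (~208 clocks per bit). It assumes UART-like framing (idle high, one start bit, 8 data bits, LSB first), which may not match your protocol:

```verilog
// Oversampling sketch: detect the start edge, then sample mid-bit.
// Framing, constants, and names are illustrative assumptions.
module rx_oversample (
    input  wire       clk_1m,
    input  wire       rx,          // serial input, already synchronized to clk_1m
    output reg        byte_valid,  // one-cycle strobe when data is complete
    output reg  [7:0] data
);
    localparam integer CLKS_PER_BIT = 208;

    reg       rx_d    = 1'b1;
    reg       busy    = 1'b0;
    reg [8:0] clk_cnt = 0;
    reg [3:0] bit_cnt = 0;

    always @(posedge clk_1m) begin
        rx_d       <= rx;
        byte_valid <= 1'b0;

        if (!busy) begin
            if (rx_d && !rx) begin                       // falling edge: start bit
                busy    <= 1'b1;
                // wait ~1.5 bit times to land mid-way through data bit 0
                clk_cnt <= CLKS_PER_BIT + CLKS_PER_BIT/2 - 1;
                bit_cnt <= 0;
            end
        end else if (clk_cnt != 0) begin
            clk_cnt <= clk_cnt - 1'b1;
        end else begin
            data    <= {rx, data[7:1]};                  // sample mid-bit, LSB first
            clk_cnt <= CLKS_PER_BIT - 1;
            bit_cnt <= bit_cnt + 1'b1;
            if (bit_cnt == 4'd7) begin                   // last data bit captured
                busy       <= 1'b0;
                byte_valid <= 1'b1;
            end
        end
    end
endmodule
```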
Both methods depend on the protocol you are using, which you did not mention. They are only possible for some types of signals/serial protocols. For example, if the signal stays low or high for many clock cycles, that would cause problems for either method.
I'm currently working on an FPGA design using SDAccel (and Vivado HLS). My design has several sub-components, and the latency (in clock cycles) of each sub-component depends on the input data at runtime (therefore the Vivado HLS analysis window cannot give me exact latency values). How do I measure the timing of each component in my design, so I can figure out where my bottlenecks are?
I found a pragma directive (pragma SDS trace), but I'm not sure how to use it to get a detailed view of what is happening in the system during execution with different inputs.
Are there pragmas in Vivado HLS that allow this? If so, how do I use them?
Thanks
W
The SDS pragma seems to apply only if you're using SDSoC, which supports Zynq and Zynq MPSoC.
If you're just using Vivado HLS, it looks like you need to incorporate tracing and measurement code in a more manual fashion.
In simulation, you could use the waveform view to see when each subcomponent receives data and produces output.
I often add trace or counter logic in my RTL for this purpose, so that I can measure latency and throughput on the FPGA as well.
A common pattern I use is an event FIFO to which I enqueue a timestamp, an event type, and an event value. To make it non-blocking, only enqueue when there is room in the FIFO.
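A minimal sketch of that pattern (`trace_fifo` stands in for a vendor-generated FIFO, and the field widths are arbitrary choices of mine):

```verilog
// Sketch of non-blocking event tracing: a free-running timestamp counter
// plus a FIFO write that is dropped (rather than stalling) when full.
module event_trace (
    input  wire        clk,
    input  wire        event_fire,   // pulse when something interesting happens
    input  wire [7:0]  event_type,
    input  wire [15:0] event_value,
    // read side exposed to whatever drains the trace (e.g. a CPU or UART)
    input  wire        rd_en,
    output wire [55:0] rd_data,
    output wire        empty
);
    reg [31:0] timestamp = 0;
    wire       full;

    always @(posedge clk)
        timestamp <= timestamp + 1'b1;

    // Non-blocking: if the FIFO is full, the event is silently dropped,
    // so tracing never back-pressures the design under measurement.
    trace_fifo fifo_i (
        .clk   (clk),
        .wr_en (event_fire && !full),
        .din   ({timestamp, event_type, event_value}),
        .full  (full),
        .rd_en (rd_en),
        .dout  (rd_data),
        .empty (empty)
    );
endmodule
```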
In addition to the methods explained in Jamey's answer, the Vivado HLS user guide describes a TRIPCOUNT pragma that tells the tool the minimum and maximum number of iterations a loop is expected to take, which is used for latency analysis.
Also, when using C/RTL cosimulation, the report should contain measured latency and throughput numbers, based on the input samples that were used during the simulation.
I am designing a triple modular redundancy (TMR) processor system to synthesize on an Altera DE10-Lite FPGA board. Its purpose is to demonstrate reliability of computation in the presence of various faults. I need advice on how to connect three external crystal oscillators (instead of the on-board crystal) with the same ratings to drive the three processors inside the FPGA. I will be using a synchronization voting scheme to sync all three signals. Can this be done?
Clock distribution triplication
I have read the following relevant link that describes using PLLs. Is this the correct way?
https://www.altera.com/documentation/mcn1395213337540.html#mcn1395213788377
No, that's unlikely to work.
If you run each soft CPU with a separate crystal, they will drift out of synchronization due to slight variations in frequency between the crystals.
If you try to use a majority voting scheme to create a single clock signal from three input clocks, you'll end up with a very weird, irregular clock signal which will probably cause faults in the logic driven by it.
Use one clock source at a time. If you're convinced you need to resist failures of an external clock, consider implementing some way to detect a failure of the current clock and switch to another one. (Keep in mind that this logic will still need to work without a functional clock… which may be difficult.)
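For reference, majority voting does work well when applied to the data/register outputs of the three CPUs running from a single clock, rather than to the clocks themselves. A minimal bitwise 2-of-3 voter sketch (names and width are mine):

```verilog
// Bitwise 2-of-3 majority voter for TMR data paths. With all three CPUs
// on one clock, their outputs are compared cycle by cycle and a single
// faulty copy is outvoted. WIDTH is an arbitrary example parameter.
module tmr_voter #(
    parameter WIDTH = 32
) (
    input  wire [WIDTH-1:0] a,
    input  wire [WIDTH-1:0] b,
    input  wire [WIDTH-1:0] c,
    output wire [WIDTH-1:0] y,
    output wire             mismatch  // flags that some copy disagreed
);
    assign y        = (a & b) | (a & c) | (b & c);
    assign mismatch = (a != b) || (a != c);
endmodule
```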
I have implemented a hardware architecture on an FPGA, and I use some multiplier functions in this architecture.
I'd like to know whether there is any way or method, in the ISE software or in hardware (using ChipScope), to calculate the maximum delay time of each section/step.
For example, I want to know which sections won't work correctly if I increase the input clock frequency.
Look at the timing report for the design, which can give you delay information about various elements in a requested path.
Based on this you can also get minimum slack information, which tells you how much you may increase the clock; you can then change the clock frequency and rerun synthesis to check that the design still meets timing at the new frequency.
A specific measurement, from for example ChipScope, only gives information about that specific chip, on that specific power supply, with that specific data, etc., whereas the timing engine (static timing analysis, STA) gives you a worst-case analysis over design and vendor parameters.
I very recently began experimenting with FPGAs. In researching things around the net I've noticed in several places that designs might use multiple separate PLL clocks of the exact same speed. Why is that?
One example I will give is this site: Parallella Linux Quick Start
They have their FCLK_CLK1 and FCLK_CLK2 both at 200 MHz. Why is this recommended rather than a single 200 MHz clock for both? Is it just customary to give each major component its own clock even if it is the same? Or am I missing something?
Besides the reasons already mentioned, there are several other reasons why two PLL clocks of the same speed might exist.
Even if the frequency is exactly the same, differences might exist in clock phase or jitter. Using one PLL with fixed clock phase and another one with adjustable clock phase can be useful for proper sampling of external input signals or maintaining the correct phase difference between clock and output data. Techniques like that were especially popular before components such as IDELAY and ODELAY were widely available.
Crystal oscillators also have small deviations from their marked value. If you have a communication link between two boards and each board has its own oscillator, then one board's main clock might run at 200.01 MHz while the other board's runs at 199.99 MHz. In many cases both FPGAs will then use their locally generated low-jitter clock as their main clock, but will also use the remote clock to sample incoming data. You can see this in Ethernet PHYs: a 100 Mbit PHY usually has a 25 MHz receive clock recovered from the input signal and a locally generated 25 MHz transmit clock.
There are many reasons to use multiple clocks of the exact same speed, so I will just state a few. However, I don't have any deep knowledge of your example.
Magic on FPGA.
As stated in the comments, an FPGA is a highly complex device. Only the vendor knows exactly what is happening inside, so they might give you some advice that can seem weird.
Clock distribution.
If you have a design with just one clock source, it is critical to route the clock correctly. The clock has to arrive everywhere at the same time, which is hard for the PnR tool to manage. Today's FPGAs, with their dedicated clock networks, don't usually have this issue.
Different IPs on one FPGA.
If you have different IPs/designs that you fuse onto one FPGA, the IPs can use different clocks. If you want to split them apart again later, you will need multiple clock sources anyway. Besides, crossing a clock domain forces you to implement some synchronization registers, so during the merging of your IPs you don't mix everything up, which is good design style. This may also be the case in your example.
HDMI support is provided by an IP core from Analog Devices ...
Output.
Maybe the additional clock is only used as an output on some I/O port.
Low-Power.
In today's CMOS technology, most power is spent on transitions (transistor switching) and static leakage (the transistors are so small, they simply leak current). With multiple clock domains, you have the opportunity to have fewer transitions per second, or you can switch off parts of your device completely.
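As a small illustration of that last point, here is a sketch of idling a block with a clock enable so its registers stop toggling when there is no work (on Xilinx parts an entire clock branch can instead be stopped with a BUFGCE; this is the portable register-level equivalent, and the arithmetic is just a stand-in):

```verilog
// Sketch: a clock enable holds the block idle, so its registers stop
// toggling (and stop burning dynamic power) when there is no work.
module gated_worker (
    input  wire        clk,
    input  wire        enable,   // deasserted when the block may sleep
    input  wire [15:0] din,
    output reg  [15:0] dout
);
    always @(posedge clk) begin
        if (enable)
            dout <= din + 16'd1;  // stand-in for the real processing
        // when enable is low, dout holds its value and stops toggling
    end
endmodule
```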