Arrays as buffer VHDL - vhdl

I need to create a FIFO buffer in VHDL. I need to use a 2-dimensional array to store data, organised as (number of entries)(n-bit data).
Suppose I create a single "big" array that stores, for example, 1000 entries. On every input data clock I store one entry, and on every output data clock I output one entry. What happens if these two clocks occur at nearly the same time?
For example:
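-- Write side: store one word on every rising edge of INPUT_DATA
if rising_edge(INPUT_DATA) then
    Register_Array(Counter_IN) <= DataIN;
    Counter_IN <= Counter_IN + 1;
end if;
-- Read side: output one word on every rising edge of OUTPUT_DATA
if rising_edge(OUTPUT_DATA) then
    DataOUT <= Register_Array(Counter_OUT);
    Counter_OUT <= Counter_OUT + 1;
end if;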
If it's possible to create a process like this, what happens if the two clock edges arrive at nearly the same time?
Assume I can't lose any data.

What you are asking about here is a clock domain crossing FIFO, or CDC FIFO.
Clock domain crossing FIFOs are surprisingly difficult to design. There are many pitfalls, and most of them cannot be checked by simulation.
As for your arrays, you should use arrays of std_logic_vector, like in the answer linked to by @Nicolas Roudel.
But this is still far from a functioning CDC FIFO. You also need read and write pointers in Gray code, Gray-to-binary pointer conversion, clock domain crossings for the two Gray pointers, empty and full indications, read and write signals, proper attributes to prevent the synthesizer from breaking the clock domain crossings, and timing constraints.
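A minimal sketch of such an array type is shown here. The package name, FIFO_DEPTH and DATA_WIDTH are illustrative assumptions, not taken from the linked answer.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package: names and sizes are assumptions for illustration.
package fifo_types_pkg is
    constant FIFO_DEPTH : positive := 1024;  -- number of entries (assumed)
    constant DATA_WIDTH : positive := 8;     -- bits per entry (assumed)

    -- One FIFO word
    subtype word_t is std_logic_vector(DATA_WIDTH - 1 downto 0);

    -- The storage itself: FIFO_DEPTH words of DATA_WIDTH bits each
    type mem_t is array (0 to FIFO_DEPTH - 1) of word_t;
end package fifo_types_pkg;
```

A signal declared as "signal Register_Array : mem_t;" can then be indexed with an integer counter, as in the question.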
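As one small piece of that list, the pointer conversions could be sketched as below. The package and function names are made up for illustration; this covers only the conversions, not the FIFO.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package for Gray-coded FIFO pointers.
package gray_pkg is
    function bin_to_gray (b : std_logic_vector) return std_logic_vector;
    function gray_to_bin (g : std_logic_vector) return std_logic_vector;
end package gray_pkg;

package body gray_pkg is

    -- Gray bit i is the XOR of binary bits i+1 and i; the MSB is copied.
    function bin_to_gray (b : std_logic_vector) return std_logic_vector is
        variable bin  : std_logic_vector(b'length - 1 downto 0) := b;
        variable gray : std_logic_vector(b'length - 1 downto 0);
    begin
        gray(gray'high) := bin(bin'high);
        for i in gray'high - 1 downto 0 loop
            gray(i) := bin(i + 1) xor bin(i);
        end loop;
        return gray;
    end function bin_to_gray;

    -- Binary bit i is the XOR of all Gray bits from the MSB down to i,
    -- computed here by accumulating from the MSB downwards.
    function gray_to_bin (g : std_logic_vector) return std_logic_vector is
        variable gray : std_logic_vector(g'length - 1 downto 0) := g;
        variable bin  : std_logic_vector(g'length - 1 downto 0);
    begin
        bin(bin'high) := gray(gray'high);
        for i in bin'high - 1 downto 0 loop
            bin(i) := bin(i + 1) xor gray(i);
        end loop;
        return bin;
    end function gray_to_bin;

end package body gray_pkg;
```

Gray code is used because only one bit changes per increment, so a pointer sampled in the other clock domain is either the old value or the new one, never a mixture of the two.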
All this is needed to properly protect against exactly the thing you ask about: "What happens when two clocks occur at almost the same time?"
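What can happen when two clocks occur at almost the same time is called "metastability": a flip-flop samples a signal just as it is changing and may take an unpredictable time to settle to a valid level. It will cause all kinds of bad and unpredictable things in your design.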
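The usual building block against metastability is a two-flip-flop synchronizer in the destination clock domain. The sketch below is an illustration, not a complete CDC solution: the entity and signal names are invented, the ASYNC_REG attribute is specific to Xilinx tools (other vendors use different attributes), and the timing constraints mentioned above are still required.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical synchronizer for a Gray-coded pointer crossing into dst_clk.
entity gray_sync is
    generic (WIDTH : positive := 4);
    port (
        dst_clk  : in  std_logic;                             -- destination clock
        gray_in  : in  std_logic_vector(WIDTH - 1 downto 0);  -- from the other domain
        gray_out : out std_logic_vector(WIDTH - 1 downto 0)   -- safe to use in dst_clk
    );
end entity gray_sync;

architecture rtl of gray_sync is
    signal meta   : std_logic_vector(WIDTH - 1 downto 0) := (others => '0');
    signal stable : std_logic_vector(WIDTH - 1 downto 0) := (others => '0');

    -- Vendor-specific (Xilinx) hint: keep these registers together and
    -- do not optimise them. This is an assumption about the tool in use.
    attribute ASYNC_REG : string;
    attribute ASYNC_REG of meta   : signal is "TRUE";
    attribute ASYNC_REG of stable : signal is "TRUE";
begin
    process (dst_clk)
    begin
        if rising_edge(dst_clk) then
            meta   <= gray_in;  -- first stage may go metastable
            stable <= meta;     -- second stage gives it a full cycle to settle
        end if;
    end process;

    gray_out <= stable;
end architecture rtl;
```

This only works for multi-bit values because the pointer is Gray coded; synchronizing a plain binary counter this way can produce values that were never written.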
If you get only one thing in the design of the CDC FIFO wrong, your design will likely work fine in simulation, and even in hardware. Most of the time........ :-)
All FPGA vendors have ready-made CDC FIFOs which you can use. I would highly recommend that beginners consider using the ready-made FIFOs for production designs.
But at the same time, designing a CDC FIFO is a nice challenge to learn about clock domain crossings and metastability.
This is one of many pages where you can find information about how to handle clock domain crossings: https://filebox.ece.vt.edu/~athanas/4514/ledadoc/html/pol_cdc.html
There is also a related stackexchange answer here: https://electronics.stackexchange.com/questions/97280/trying-to-understand-fifo-in-hardware-context

Related

Round Robin gate-level diagram

May I know why the 'priority' signal is fed back to the AND gate input?
What is the purpose of the AND gate in the picture below?
From googling, I also found this article, but I am not sure if the AND gate in that article serves a similar purpose.
The implementation in the article uses a mask vector, which seems a bit strange and also complicated in terms of hardware resources.
The 'priority' signal is fed back so that the given priority stays on for multiple cycles, since the registers are not conditionally clocked.
So, if priority 1 is high and all the grant inputs are low, it will stay high forever.
Better wording would be: it is looped back into the AND gate so that it stays on indefinitely, and the AND gate is there to cut it off when a grant input goes high.

What are the differences in delay times of the basic AND, OR, NOT, NAND, NOR, XOR, XNOR gates?

1-1 What are the differences in delay times of the basic logic gates?
I found that NAND and NOR gates are preferred in digital circuit design for their shorter delay time, and that AND and OR gates might even be implemented with NOT and NAND/NOR gates.
1-2 Are there set or known differences in delay time between AND, OR and NOT gates?
For a typical FPGA (LUT-based logic elements) there's no difference at all.
A single cell can implement a complex function based on its resulting truth table, and multiple expressions might be folded into a single cell, so you wouldn't even find individual and/or/not "gates".
It might be different for an ASIC, I don't know. But in a typical FPGA you don't have gates; there are RAM-based lookup tables implementing complex functions of their inputs: 4 to 6 inputs, not just 2.
You'll find that in a big enough design the routing costs are much higher than delays in a single logical cell.
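To illustrate the folding described above, here is a small sketch with an invented entity name. It mixes AND, OR, NOT and XOR, but because it has only four inputs and one output, a synthesizer targeting 4-input or larger LUTs would normally implement it as one lookup table rather than as separate gates.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: four "gates" in the source, one truth table in hardware.
entity lut_example is
    port (
        a, b, c, d : in  std_logic;
        y          : out std_logic
    );
end entity lut_example;

architecture rtl of lut_example is
begin
    -- Any function of these four inputs is just a 16-entry truth table,
    -- so the delay does not depend on which operators are written here.
    y <= (a and b) or (not (c xor d));
end architecture rtl;
```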
If you look at how these different gates are constructed you can see some of the reasons for differences. An inverter consists of one pull-up transistor and one pull down transistor. This is the simplest gate and is therefore potentially the fastest. A NAND has two pull-down devices in series and two pull-up transistors in parallel. The NOR is basically the opposite of the NAND. And yes: AND is usually just NAND + inverter.
The on resistance of a path will be higher with two transistors in series (making it slower), and the number of transistors connected to a single node will increase the capacitive load (making it slower). You can make things faster by using larger transistors (with lower on resistance) but that increases the load of whatever cell is driving it, which slows that cell down.
It is a big optimization problem which you probably shouldn't try to solve yourself. That is what the EDA tools are for.
Like most answers in life, it depends. There are many ways to build each type of logic gate, and different types of transistors can be used to make each type of gate. You can build any gate from several universal gates such as NAND or NOR, in which case the gates built that way have a larger delay than the universal gate itself. BJTs will have a larger delay than MOSFETs. You can also use Schottky transistors to reduce delays compared to plain BJTs. If you use an IC there are lots of components within the chip, some of which may reduce delays and some of which may increase them. So you really have to compare what you are working with. Here is a video that shows the design of logic gates at the transistor level. https://youtu.be/nB6724G3b3E

FPGA logic cells

I have a small presentation about FPGA technology. My question is: if your FPGA has 85k logic cells, does this mean it can run 85k operations simultaneously?
What I am trying to achieve is to shock the audience with some crazy illustrated facts about FPGA technology. The people who will be listening know very little about FPGAs, so I want to impress them.
What's inside a 'cell' can vary per manufacturer, but the Xilinx definition (using this manufacturer as an example, as these are the devices that I'm familiar with) is one four-input look-up table, and one register. Xilinx devices are made up of a number of 'slices', and these contain a number of functional elements. These might include:
Look-up tables
Registers
Multiplexers
Logic for use in carry chains
etc
As an example, a Spartan6 LX4 has 600 slices, and the marketing material claims that this is equivalent to 3840 'logic cells'. You can look in the user guide for a device to determine exactly what is contained inside a slice.
In addition to this, there are other resources such as multipliers, memories, PLLs, etc.
I suppose you could say that one logic cell can perform one operation, but a single cell is only capable of very simple operations, for example an AND gate, 2:1 multiplexer, etc.
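As a rough illustration of the scale of "one operation", the sketch below (entity name invented) describes a registered 2:1 multiplexer. With the look-up table plus register definition given above, this would be expected to occupy about one logic cell.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example of logic that fits roughly one LUT plus one register.
entity cell_example is
    port (
        clk  : in  std_logic;
        sel  : in  std_logic;
        a, b : in  std_logic;
        q    : out std_logic
    );
end entity cell_example;

architecture rtl of cell_example is
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- The 2:1 mux is a 3-input function (sel, a, b): one LUT.
            -- The clocked assignment is the cell's register.
            if sel = '1' then
                q <= a;
            else
                q <= b;
            end if;
        end if;
    end process;
end architecture rtl;
```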
I would say no, but it depends on what you mean by an operation. A logic cell has the capability to implement a number of logical functions (and/or/xor), and it has the ability to hold a state with storage elements. These two functions are how every digital system under the sun operates. Even addition and subtraction are higher level constructs built on top of logical functions. As in other answers, FPGA manufacturers publish guides on what is inside of their logic cell. It is this fundamental cell that is stamped repeatedly in the die to create this "array" as in Field Programmable Gate "Array".
This yields a distinctly "more or less" answer. The logic blocks can be used in multiple modes, and you might even be able to pack more than one function in one (including with two independent outputs), but you must also be able to transport meaningful data to work on. It sounds like you have a 7z020 as an example. You may want to note that besides those logic cells, it also has 220 hardware multiply+add blocks. That amount is not random; the surrounding logic is enough to keep them fed in particular cases, every cycle. Looking in 7 Series FPGAs Configurable Logic Block User Guide (UG474), we find that the Logic Cells number given is an estimate of equivalent 4-input LUT + flip-flop configurations. The reason this number is lower than the number of flip-flops (106k) is that the input arguments for the two 5-input LUTs you can split a 6-input LUT into must overlap.

VHDL frequency shifting, two exact and close frequencies

I'm trying to create two precise frequencies in the 100 MHz range which are just a few kHz apart. A PLL isn't a solution since it can't multiply by such big values.
The only solution I came up with is to XOR two frequencies to add them. However, this creates other unwanted frequencies which can only be filtered with external components.
How can I do it?
The only method I can think of is to apply the techniques that are used to build "Time To Digital Converters", e.g. FPGA Based High Resolution Time to Digital Converter. This would allow you to create FPGA based oscillators at nearly any speed, at the cost of hardware resources.
If you plan to use this in a production environment, however, you have to deal with the influence of temperature and VDD on the resulting frequencies. I know that there are FPGA based, temperature compensated circuits for just this purpose, but I guess you'll have to dig rather deep into the matter.

Design tips for synchronising signals through a VHDL pipeline

I am designing a video pixel data processing pipeline in VHDL which involves several steps including multiply and divide.
I want to keep signals synchronised so that I can e.g. maintain a sync signal and output it correctly at the end of the pipeline along with manipulated pixel data which has been through several processing stages.
I assume I want to use shift registers or something to delay signals by the right number of cycles so that the output is correct, but I'm looking for advice about good ways to design this, particularly as the number of pipeline stages for different signals may vary as I evolve the design.
Good question.
I'm not aware of a complete solution but here are two partial strategies...
Interconnecting components... It would be really nice if a component could export a generic whose value was its pipeline depth. Unfortunately you can't, and dedicating a port to this seems silly (though it's probably workable; as it would be an integer constant, it would disappear in synthesis)
Failing that, pass IN a generic indicating the budget for this module. Inside the module, assert (severity FAILURE) if the budget can't be met... (this assert is checkable at synth time and at least Xilinx XST handles similar asserts)
Make the budget a hard number, and either assert if not equal to actual pipeline depth, or add pipe stages inside the module if the budget is too large, and only assert if the budget is too small.
That way you are connecting predictable modules, and the top level can perform pipeline arithmetic to balance things (e.g. passing a computed constant value to a programmable delay line)
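A minimal sketch of the "pass in a budget" idea follows. All names (budgeted_stage, PIPE_BUDGET, CORE_DEPTH) and the placeholder operation are assumptions for illustration; this variant pads with extra registers when the budget is too large and asserts only when it is too small.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical module whose latency is dictated by the parent via a generic.
entity budgeted_stage is
    generic (
        WIDTH       : positive := 8;
        PIPE_BUDGET : positive := 3   -- cycles the parent expects (assumed name)
    );
    port (
        clk  : in  std_logic;
        din  : in  std_logic_vector(WIDTH - 1 downto 0);
        dout : out std_logic_vector(WIDTH - 1 downto 0)
    );
end entity budgeted_stage;

architecture rtl of budgeted_stage is
    constant CORE_DEPTH : positive := 2;  -- stages the real logic needs
    subtype word_t is std_logic_vector(WIDTH - 1 downto 0);
    type pipe_t is array (1 to PIPE_BUDGET) of word_t;
begin
    -- Fail loudly if the parent did not allow enough cycles.
    assert PIPE_BUDGET >= CORE_DEPTH
        report "PIPE_BUDGET is smaller than this module's pipeline depth"
        severity failure;

    gen_pipe : if PIPE_BUDGET >= CORE_DEPTH generate
        signal pipe : pipe_t := (others => (others => '0'));
    begin
        process (clk)
        begin
            if rising_edge(clk) then
                pipe(1) <= din;          -- core stage 1: input register
                pipe(2) <= not pipe(1);  -- core stage 2: placeholder operation
                -- Padding stages so total latency equals PIPE_BUDGET
                for i in CORE_DEPTH + 1 to PIPE_BUDGET loop
                    pipe(i) <= pipe(i - 1);
                end loop;
            end if;
        end process;

        dout <= pipe(PIPE_BUDGET);
    end generate gen_pipe;
end architecture rtl;
```

With this shape the top level only needs to know the PIPE_BUDGET values it handed out in order to size the delay lines for the bypass signals.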
Within a component... I use a single process, with registers represented as internal signals whose names reflect their pipe stage, exponent_1, exponent_2, exponent_3 and so on. Within the process, the first section describes all the actions for the first cycle, the second section describes the second cycle, and so on. Typically the "easier" paths may be copied verbatim to the next pipe stage, just to sync them with the critical path. The process is fairly organised and easy to maintain.
I might break a 32-bit multiply down into 16*16 chunks and pipeline the partial product additions. The control this gives, USED to give better results than XST gave alone...
I know some people prefer variables within a process, and I use them for intermediate results in a pipe stage, but using signals I can describe the pipeline in its natural order (thanks to postponed assignment) whereas using variables, I would have to describe it backwards!
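The single-process style described above might look like the sketch below. The entity, the port names and the three-stage split are invented for illustration; the point is the _1, _2, _3 naming and that the sync flag is simply copied from stage to stage alongside the data.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 3-stage pixel pipeline: register, multiply, scale.
entity pixel_pipe is
    port (
        clk       : in  std_logic;
        pixel_in  : in  unsigned(7 downto 0);
        gain      : in  unsigned(7 downto 0);
        sync_in   : in  std_logic;
        pixel_out : out unsigned(7 downto 0);
        sync_out  : out std_logic
    );
end entity pixel_pipe;

architecture rtl of pixel_pipe is
    -- The suffix is the pipe stage whose register holds the value.
    signal pixel_1, gain_1 : unsigned(7 downto 0)  := (others => '0');
    signal product_2       : unsigned(15 downto 0) := (others => '0');
    signal pixel_3         : unsigned(7 downto 0)  := (others => '0');
    signal sync_1, sync_2, sync_3 : std_logic := '0';
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- Cycle 1: register the inputs
            pixel_1 <= pixel_in;
            gain_1  <= gain;
            sync_1  <= sync_in;

            -- Cycle 2: multiply; the sync flag is copied verbatim
            product_2 <= pixel_1 * gain_1;
            sync_2    <= sync_1;

            -- Cycle 3: keep the upper byte of the product
            pixel_3 <= product_2(15 downto 8);
            sync_3  <= sync_2;
        end if;
    end process;

    pixel_out <= pixel_3;
    sync_out  <= sync_3;
end architecture rtl;
```

Because signal assignments only take effect after the process suspends, each stage reads the previous stage's old value, so the stages can be written in their natural order.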
I create a package for each of my major processing blocks, one of the constants in there is the processing delay of that block. I can then connect that up to my general-purpose "delay-line" block which has a generic for the number of cycles.
Keeping that constant in "sync" with the actual implementation is best done by a self-checking testbench.
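A sketch of this arrangement, assuming a hypothetical block called "scaler" with a made-up delay of 5 cycles and a generic delay line written for illustration:

```vhdl
-- Hypothetical package for one processing block; the value 5 is an assumption.
package scaler_pkg is
    constant SCALER_DELAY : natural := 5;  -- latency of the scaler in clock cycles
end package scaler_pkg;

library ieee;
use ieee.std_logic_1164.all;

-- General-purpose delay line: dout is din delayed by CYCLES clock cycles.
entity delay_line is
    generic (
        WIDTH  : positive := 1;
        CYCLES : natural  := 1
    );
    port (
        clk  : in  std_logic;
        din  : in  std_logic_vector(WIDTH - 1 downto 0);
        dout : out std_logic_vector(WIDTH - 1 downto 0)
    );
end entity delay_line;

architecture rtl of delay_line is
begin
    -- Zero delay: plain wire
    gen_wire : if CYCLES = 0 generate
        dout <= din;
    end generate gen_wire;

    -- One or more cycles: chain of registers
    gen_regs : if CYCLES > 0 generate
        type sr_t is array (0 to CYCLES - 1) of std_logic_vector(WIDTH - 1 downto 0);
        signal sr : sr_t := (others => (others => '0'));
    begin
        process (clk)
        begin
            if rising_edge(clk) then
                sr(0) <= din;
                for i in 1 to CYCLES - 1 loop
                    sr(i) <= sr(i - 1);
                end loop;
            end if;
        end process;

        dout <= sr(CYCLES - 1);
    end generate gen_regs;
end architecture rtl;
```

The parent would instantiate delay_line with "generic map (WIDTH => 1, CYCLES => SCALER_DELAY)" on the sync path, so changing the constant in the package re-balances every user of that block.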
Something to consider is delay lines (i.e. back to back registers) vs FIFOs.
Consider a module X with a pipeline delay N. FIFOs work well when N is variable. The trick is remembering that you can only request new work when both the module and the FIFO can accept it. Ideally you size the FIFO so that it can contain the maximum number of items that X can work on concurrently, but sometimes that's not practical, for example if your calculation includes accesses to a distant memory.
Another option is integrating the side channel (i.e. the path that your sync flag is taking) into the module X rather than it going outside. If you do this then if any part of the calculation has to stall, you can also stall the side channel and the two stay in sync. You can do this because you're in a scope that has all the necessary signals in it. Then all signals, whether used in the calculation or not, appear at the output at the same time.
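A sketch of the integrated side channel, with invented names and placeholder arithmetic: the same stall signal gates both the data registers and the sync registers, so they cannot drift apart.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical two-stage module that carries its own sync flag.
entity stallable_pipe is
    port (
        clk      : in  std_logic;
        stall    : in  std_logic;              -- '1' freezes the whole pipeline
        data_in  : in  unsigned(7 downto 0);
        sync_in  : in  std_logic;
        data_out : out unsigned(7 downto 0);
        sync_out : out std_logic
    );
end entity stallable_pipe;

architecture rtl of stallable_pipe is
    signal data_1, data_2 : unsigned(7 downto 0) := (others => '0');
    signal sync_1, sync_2 : std_logic := '0';
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- One enable for everything: data and side channel advance together.
            if stall = '0' then
                data_1 <= data_in + 1;            -- placeholder stage 1 operation
                sync_1 <= sync_in;
                data_2 <= shift_left(data_1, 1);  -- placeholder stage 2 operation
                sync_2 <= sync_1;
            end if;
        end if;
    end process;

    data_out <= data_2;
    sync_out <= sync_2;
end architecture rtl;
```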
