Generating multiple delayed channels in FPGA - fpga

Trying to find a way of implementing on an FPGA a multi channel delayed signal in real time. My intention is to A/D a continuous audio signal and split the signal into 10 output channels with each channel time delayed by differing delay amounts. The delays are to vary between 10us to 50us between each channel. I'm trying to attempt a beamforming of an audio signal.

Could be done on a ram block large enough to hold the data for the longest required delay.
So there would be a ring buffer, samples would be written to a common head and read out at different offsets from the head, with offsets matching the desired delay. Even at few megasamples per second (unlikely for the audible sound?) you should be able to do that with a simple dual-port ram block (one writing port, one reading port), or even with a single port ram.

Related

Dual clock FIFO in vivado (verilog)

I want to use a FIFO in a project where a state machine buffers as much data as possible to a FIFO, which will then be processed by a DSP block. To maximize data throughput I want to use multiple QSPI nor flash modules as a ROM with a wide data bus. DSP blocks can perform 1 operation per cycle which means the data input width has to be larger than the output width (because it takes 8 clock cycles to transfer data). For example, I want a FIFO with 32-bit input and 8-bit output, but this is not supported in vivado as far as I can tell. I tried the various FIFO block generators and even tried using the macros but I'm not having much luck.
Please let me know how I can generate a dual clock FIFO with a larger input width than the output width.
Using FIFO Generator IP core in Vivado, choose Independent Clocks Block RAM for FIFO Implementation and then you will be able to set larger data width for write port than the read port.

What is the overall impact of ignoring data at FIFO input in an FPGA?

I understand the operation of a FIFO, but I think I am missing something about it's utility.
When implementing a FIFO in an FPGA, let's say to cross clock domains, it seems that you would frequently run into the situation where the FIFO is full, but there is still data that should be clocking in every cycle. This might happen if the writing mechanism is clocking data in faster than the reading mechanism is reading data out. Obviously, once the FIFO is full it will start ignoring data until it has room to continue storing data.
My question is, isn't this a big deal? We are basically just losing data? Sure the FIFO is doing it's job, but the overall system is just throwing away data
I have drawn two possible conclusions
1) In this scenario (where the input data rate is greater than the output data rate), if we really care about not losing any data, maybe a FIFO isn't the best way to cross these domains (especially if the writing mechanism is much faster clock than the reading domain). If this is true, is there conventionally a better way to cross clock domains than with a FIFO? Maybe the answer is that you need to use another element, such as a decimator, before the FIFO?
2) We put a constraint on the system that says "you can only write for X amount of data (or cycles, or time etc.)" before the FIFO needs time to clear it's data. This seems unsatisfactory to me that we must turn off the data stream for a little while and wait for the FIFO to clear some room until we continue writing. But then again, I'm new to digital systems and maybe this is just the harsh reality that I am not used to :)
It seems then that the best use for a FIFO when crossing clock domains is simply one where the data rate into the FIFO and the data rate out of the FIFO are the same, because then it can keep up with itself.
It seems you're mixing two problems into one.
There's clock domain crossing, and input data buffering. It just happens that FIFO combines implementations for these two tasks in one entity.
If the receiver can't keep up with transmitter, and there's no flow control, then the data will be lost, and it doesn't matter if data was crossing the clock domains or not. You can't solve the data loss problem without adding some kind of handshake or flow control lines.
Without flow control you must ensure that the input buffer size is enough for handling load peaks in your specific case.
As for impacts - it's either nonexistant if your design is ok with data loss, or you'll have a nonfunctional device if the data loss is not tolerated by the design.
FIFOs have also the functionality of different input and output widths. That means for example you have an 100 Mhz 32 Bit Input and an 50Mhz 64bit output. The data rate into and out of the fifo ist half but the data widht is double.

what is the difference between Test pin and Ready pin in 8086 microprocessor?

can you tell me difference between Test pin and Ready pin in 8086 microprocessor because both of them deal with wait instructions?
TEST: input is examined by the ‘‘Wait’’ instruction. If the TEST input is
LOW execution continues, otherwise the processor waits in an ‘‘Idle’’
state. This input is synchronized internally during each clock cycle on
the leading edge of CLK.
READY: is the acknowledgement from the addressed memory or I/O
device that it will complete the data transfer. The READY signal from
memory/IO is synchronized by the 8284A Clock Generator to form
READY. This signal is active HIGH. The 8086 READY input is not
synchronized. Correct operation is not guaranteed if the setup and hold
times are not met.
If you read the description of the READY signal, the wait instruction is not mentioned.
The READY signal is sampled on each and every memory or I/O cycle. If a device is not capable of responding to the CPU's request in the standard bus cycle, the READY signal can be used to stretch out the cycle, giving it more time.
This is done by signalling to the CPU that the device is not READY. The CPU adds a clock cycles to the bus transaction until it is READY. These extra cycles are given the confusing name of "WAIT STATES", and have nothing to do with the WAIT instruction or the TEST line. Many years ago, makers of fast memory would brag "No wait states!"
The part about the 8284a refers to the details of ensuring that the READY input meets the timing requirements of the processor. Namely the so called setup and hold times, normally only of concern to the engineer designing the computer system.
In your question, you can see that the TEST input is explicitly sampled by the WAIT instruction. The TEST input is simply an input signal with a dedicated pin on the processor (TEST) sampled by a dedicated instruction (WAIT).
Most processors have signals similar to the READY line. The TEST line is rather more rare.

Simultaneous reading and writing to registers

I'm planning to design a MIPS-like CPU in VHDL on a FPGA. The CPU will have a classic five stage pipeline without forwarding and hazard prevention. In the computer architecture course I learned that the first MIPS-CPUs used to read from the register file on rising clock edge and write on falling clock edge. The FPGA I'm using doesn't support using rising and falling clock edge at the same time (regarding reading and writing to registers), so I can't exactly do like the original MIPS and have to do it all on rising clock edge.
So, here comes the part where I'm having a problem. The instruction writes back to the register in the write back stage. The write back stage sends the data directly to the decode stage. Another instruction in the decode stage wants to read the same register that also the write back stage wants to write.
What happens in this case? Does the decode stage take the new value for the instruction or the old value that is still in the register file?
A register file that fits in the decode stage of the classic five stage design consists of a triple port RAM (or two dual port RAM) and two muxers and comparators. The comparators and muxers are required to bypass the data coming from the write-back stage. This is needed as the write data is written into the triple port RAM in the next cycle. Because the signals coming from the write-back stage are synchronous, this is not a problem.
The question is what do you understand by the term "register". Or more specifically, how do you would like to map the register bank to the FPGA.
The easiest but not so efficient way is to map each MIPS register to several flip-flops according to the register size. You can update these flip-flops at only clock-edge (e.g. falling edge). After that you can read the new content at any time also known as asynchronous read. This solution is not so efficient because the multiplexer to select one MIPS register from the register bank requires a lot of logic resources.
If you have an FPGA where the LUTs can be used as distributed memory, then almost all of the logic resources for the multiplexers can be saved. Distributed memory typically provides an asynchronous read too (and a synchronous write of course). Please read the vender documentation of the synthesis tool on how to describe this type of memory for synthesis.
Last but not least, you can map the full register bank to on-chip block memory. These typically provide only a synchronous read, i.e., reading starts at a clock-edge. (Of course, they also provide only a synchronous write). However, these are typically dual-ported RAMs. Thus, you can write at the falling edge at one port and read with the rising at on the other port. Please read, the documentation of your FPGA on the timing of the write. For example, on some Altera FPGAs the writing is delayed to the next opposite edge (here rising-edge) of the clock.

Asynchronous asymmmetric FIFO in VHDL synthesis issue

I have designed an Asynchrounous asymmetric fifo using VHDL constructs.It is generic fifo with depth and prog_full as parameters. It has 32-bit in 16-bit output data width.
You can find the fifo design link here.
The top level asymmetric fifo (fifo_wrapper.vhd),is built upon an 32-bit asynchronous fifo(async_fifo.vhd). This internal fifo (async_fifo) is build using the logic from generic FIFO on open cores (http://opencores.org/project,generic_fifos). I have added a simple testbench to try out this fifo design.
BUT there is some issue with this design that I am not able to figure out. The fifo design works perfectly fine when I simulate it, but when I synthesize it and run It along with my other design on hardware I get some erroneous data sometimes. May be there is some corner case that I am not able to simulate or Is it some thing else?
That's why I would like anyone who needs this design to try it and let me know if he/she encounters any Issues during simulation or after synthesis.
thanks
PS: kindly let me know if there is some other forum where I can put my design for public use. thanks
There are a number of issues to point out in relation to this asynchronous FIFO
design, based on the assumption that the write and read clocks are fully
asynchronous.
A (and probably THE) major problem is that the write side pointer (wp in
async_fifo), which is a normal binary counter, is transfered and synchronized
to the read side clock without any Gray encoding. So the different bits in
the vector may arrive at different time in the read clock domain, thus the
write pointer value can (and most likely will from time to time) be different
from the write side value. The comparison with the read pointer (rp) will
therefore make no sense. Binary values that are transfered over clock
domains should be Gray encoded before transfer and decoded at arrival. Also
use synchronization with two flip-flop levels.
The two clocks (rd_clk and wr_clk) are assumed to be asynchronous, but there
is only a single reset (rst), so timing may be violated when reset is
deasserted, unless there are some additional requirements for clocking at the
time of reset deassert.
An similar with clear, where there is only one signal for use in two
different clock domains.
Suggestion would be to use a port naming convention where the clock domain
relationship for the port is cleared indicated in the name, like naming all
the ports in the write clock domain wr_* (e.g. wr_clk_i, wr_clk_we_i, etc.,
and all the ports in the read clock domain as rd_*.
Reset is asserted low, so a naming of rst_n would be nice.
n'I can't access your code (firewall) so I'll just mention the general points in designing them, which might be of help to you and others.
to be completely clock safe, the write side should exchange its pointer to the read side using a fully safe asynchronous handshaking method using 2 meta-stability signalling chains.
This contruct for this is a double buffered register.
The write side registers its write pointer into a buffer, and asserts a valid signal high.
A meta-stability chain reclocks the buffer valid signal to the read clock domain
On the read clock side, once a transition to high of valid is seen at the output of the meta chain, the data in the write side buffer is reregistered onto another register on the read domain. This is ok, because it is known that the data in the buffer is stable. (because of the meta chain).
The read domain asserts an ack signal high.
Another meta-stability chain reclocks the ack signal to the write clock domain.
The write side awaits a transition of the ack signal at the output of the meta chain, once seen it deasserts its valid signal.
The read side awaits a transition of the valid signal at the output of the meta chain to low, once seen it deasserts its ack signal.
The write side awaits a transition of the ack signal at the output of the meta chain to low. The cycle is now complete.
The current write pointer, which may have moved on quite a bit now, may now be transferred again.
A similar approach is taken for transferring the read pointer to the write domain.
It can be seen that although this approach leads to a latency between the write pointer on the write/ read side, and the read pointer on the read / write side, that this latency can never lead to overflow. Instead it leads to a premature full on the write side, adn a premature empty on the read side, which will eventually resolve once the pointers are next exchanged.
This approach is the only completely clock safe design for a fifo that doesn't depend on a-priori knowledge of the clock speeds. Gray coding is not required at all.
The other thing to note is that the logic for addressing / empty / full etc. needs to be duplicated on each clock domain.

Resources