Interrupt handling with fpga in VHDL - vhdl

I am writing interputs for a fpga and dsp need to interact with a dual port memory shared dpram control in vhdl.
I have External IOs coming from the SPI bus on oneside to the fpag to be communicated with dsp and on the otherhand have a camera to the to the dsp.
So my intrups are like Havinf a FIFO being reset after everytime a FSM reads and writes the interrpts with dsp.
Now my problem is
I want to enable some particular interupts at a time and disable the others.
When make a masking with logical XOR function the other interupts coming from UART goes for a timeout.
When this is done the camera gets the signal but cant be controlled.
I use the following algorithm to deal with all asynchron inputs:
In event2reg_array_proc: save all inputs to parallel buffers “fifo_data_input_array”, each input(flag) should be put into separate buffer.
In reg_array2fifo_proc2: read each buffer serially and save them in a fifo “fifo320x32”.
In main FSM read the output from fifo and do the suitable processing, each cycle read out only one value, it should be one flag.
If you get some flags which remains in register even after processing, the reason can be:
In event2reg_array_proc: and reg_array2fifo_proc2:, if one flag (in buffer) has been written in the fifo, it should be cleared from the buffer. I use the “fifo_cnt” to control this. You can use simulation to check.
Line Camera sends the FRAME_VALID signal as same as the LINE_VALID signal, so you can get a lot of CAM2DSP_FRAME_SYNC_FLAG with Line Camera.
So can any one suggest any algorithm to enable particular interupts while the the camera is still communicating with DSP.

Your question is not clearly worded enough to enable a proper answer.
But one point is clear : XOR is not a good choice for an interrupt mask!
Either AND or OR would be a better choice depending on the logic of the interrupt handler.

Related

Best Route For Input Clocks on Kintex7 FPGA

I'm looking for advice on a less than ideal situation.
I've inherited a project where we have a hardware design issue. We generate a clock to a chip which feeds the clock back in over a none clock-capable input. This works at up to 160MHz but we are looking to increase the clock so I'm researching IO options. This is used to clock 8 parallel data inputs.
Right now the data inputs go through a delay and a IDDR block. The output is fed to a FIFO. Our clock is still routed to a BUFG - so we have:
Data - IDELAY - IDDR - FIFO
Clock - BUFG ----^------^
I read somewhere that routing to a BUFG has a large delay so a BUFR-BUFIO is better. Is this the case? Have I missed a better option?
When you say generating a clock to "a chip", I will assume that you mean the Kintex7 chip.
The delay is not a problem. The issue is for your timing closure to be set up properly so that the static timing analysis can validate whether you violate any setup or hold time in all boundary corners of the board.
If you look at DS182 document, you will find under AC Switching characteristics which will give you a rough idea on how well the chip can perform.
However, the best is to let the timing analyzer inside Vivado calculate for you whether your desired clock frequency will be able to close timing.
You just need to make sure
The data input is synchronous to your final clock.
If it isn't, then clock that data input across two stages of registers with respect to the final clock.
Specify your timing constraints
Run through synthesis and implementation
Check the timing to see that there are no violations.
Or maybe I did not understand something about what you are trying to do.

Simultaneous reading and writing to registers

I'm planning to design a MIPS-like CPU in VHDL on a FPGA. The CPU will have a classic five stage pipeline without forwarding and hazard prevention. In the computer architecture course I learned that the first MIPS-CPUs used to read from the register file on rising clock edge and write on falling clock edge. The FPGA I'm using doesn't support using rising and falling clock edge at the same time (regarding reading and writing to registers), so I can't exactly do like the original MIPS and have to do it all on rising clock edge.
So, here comes the part where I'm having a problem. The instruction writes back to the register in the write back stage. The write back stage sends the data directly to the decode stage. Another instruction in the decode stage wants to read the same register that also the write back stage wants to write.
What happens in this case? Does the decode stage take the new value for the instruction or the old value that is still in the register file?
A register file that fits in the decode stage of the classic five stage design consists of a triple port RAM (or two dual port RAM) and two muxers and comparators. The comparators and muxers are required to bypass the data coming from the write-back stage. This is needed as the write data is written into the triple port RAM in the next cycle. Because the signals coming from the write-back stage are synchronous, this is not a problem.
The question is what do you understand by the term "register". Or more specifically, how do you would like to map the register bank to the FPGA.
The easiest but not so efficient way is to map each MIPS register to several flip-flops according to the register size. You can update these flip-flops at only clock-edge (e.g. falling edge). After that you can read the new content at any time also known as asynchronous read. This solution is not so efficient because the multiplexer to select one MIPS register from the register bank requires a lot of logic resources.
If you have an FPGA where the LUTs can be used as distributed memory, then almost all of the logic resources for the multiplexers can be saved. Distributed memory typically provides an asynchronous read too (and a synchronous write of course). Please read the vender documentation of the synthesis tool on how to describe this type of memory for synthesis.
Last but not least, you can map the full register bank to on-chip block memory. These typically provide only a synchronous read, i.e., reading starts at a clock-edge. (Of course, they also provide only a synchronous write). However, these are typically dual-ported RAMs. Thus, you can write at the falling edge at one port and read with the rising at on the other port. Please read, the documentation of your FPGA on the timing of the write. For example, on some Altera FPGAs the writing is delayed to the next opposite edge (here rising-edge) of the clock.

VHDL Finite State Machine - Is the reset really necessary?

I'm still learning VHDL for synthesis purposes on a custom Xilinx Spartan-6 based board. My design includes a lot of FSM and I've just learned in a previous question that the single process implementation is a lot better and much easier to use.
I also learned that initialization values for signals are actually synthetizable.
So here is the question: do I really need a reset signal to put the FSM in idle with default outputs, IF I don't need to interrupt the FSM mid flow OR I already have another signal that stops it?
let's see what is the Xilinx appraoch on reset :
Xilinx FPGA includes "Global Set/Reset" module which automatically set all signals at their initialisation values at start-up. The initialisation value is declared as follow:
signal foo : std_logic := '0';
-- ^ initialisation value
When designing a new part of code, you have to think twice for each bit if it needs to be reset by something else than the GSR, because using your own global reset is actually using a second global reset.
For your FSM, it has a startup state (IDLE) and will never be reset in the whole bitstream life. We can say at first that the FSM do not need a reset. But if you just do it like it, you'll be exposed to metastability issues. The GSR is quite slow to deassert its reset and it does it asynchronously. All flip-flop won't be released at the same time and your FSM can go in an illegal state.
So, use a local reset for your FSM (and counters as well).
To complete the reset question:
avoiding the use of global reset has better place and route result, which leads to less timing errors. A global reset uses the same network as others signals in the design, it prevents some routing resource to be available for other signal distribution.
if you really need the use of a reset, prefer an active high synchronous reset or at least an active high reset, activated asynchronously and deactivated synchronously. Active High because Xilinx Flip-Flop uses active high SET and RESET, synchronous to avoid metastability problem.
Workaround:
A solution to avoid the local reset on the FSM could be the use of a bufgce module at clock entry. At startup, this module do not feed the design with the clock and wait for some clock cycles before enabling the clock. Only a local reset is used here to manage the enable input of the BUFGCE and the reset of the FGPA is reset free.
I don't know how many clock cycles have to be waited, but it can do it. The first approach is still the best for now.
If your state variable is initialised to 'idle', then having a reset which forces it to 'idle' is only useful if you need it for some other reason. One major example would be if the state machine has states, where, on noticing an erroneous input, it deliberately stops to wait for something to reset it, before resuming normal operation.
The machine might also be running from a clock that is not guaranteed to be glitch free, or is for some reason not 100% reliable. In this case it can be sensible to include a reset, so that something like a host processor or other FPGA logic can somehow detect that the state machine is no longer working, and reset it.
Lots of people seem to have a reset signal in most processes they write, but it's perfectly valid to rely on signal and output initialisation values, if the machine then meets your design requirements. If all the reset does is assert itself briefly during startup, and never again, I would say there's not much point in it.
[EDIT] Per other answers, relying on initialisation values is normally only valid in SRAM-based FPGA designs.
There are a couple of white papers from Xilinx that address the issue and they actually show up as the first two items googling for Xilinx reset.
These are WP272 Get Smart About Reset: Think Local, Not Global and WP275 Get your Priorities Right – Make your Design Up to 50% Smaller.
The first paper does a fair job of pointing out where you should use a reset as opposed to where you can depend on configuration and default values.
The second paper also points out that the reasons why are vendor and technology dependent. You could also note the reason for eliminating 'unnecessary' resets is to preserve place and route resources.
Because you don't elaborate the details of a Finite State Machine implementation while asking if the reset is really necessary, note the claim in WP272 where an asynchronous reset can be deleterious for a One Hot State Machine which would benefit from configuration load (a default value), synchronous reset or a clock synchronized asynchronous reset.
Your VHDL code with (proper) resets is ultimately more portable, should your design ever be intended for an ASIC or some other non- bit image loaded solutions. For those soft loaded designs the ultimate reset is embodied in a configuration load.
Otherwise the purpose is to conserve place and route resources.

Bus protocol for a microcontroller in VHDL

I am designing a microcontroller in VHDL. I am at the point where I understand the role of each component (ALU/Memory...), and some ideas on how to realise them. I basically want to implement a Von Neumann architecture.
But here is what I don't get : how do the components communicate ? I don't know how to design my bus (buses?). I am therefore looking for a simple bus implementation and protocol.
My unresolved questions :
Is it simpler to have one bus for everything or to separate the different kind of data ?
How does each component knows when to "listen" and when to "write" ?
The emphasis is on the simplicity of the design (and thus of the implementation). I do not care about speed. I want to do everything from scratch (ie. no pre-made softcore).
I don't know if this is of importance at this stage, but it will not need to run "real" compiled code, is have any kind of compatibility with anything existing. Also, at which point do I begin to think about my 'assembly' instructions ? I thinks that I will load them directly in the memory.
Thank you for your help.
EDIT :
I ended up drawing (a lot of) inspiration from the Picoblaze, because it is :
simple to understand
under a BSD Licence
Specifically, I started by adding a few instructions to it.
Since your main concern seems to be learning about microcontroller design, a good approach could be taking a look into some of the earlier microprocessor models. Take for instance the Z80:
Source: http://landley.net/history/mirror/cpm/z80.html
Another good Z80 HW description: http://www.msxarchive.nl/pub/msx/mirrors/msx2.com/zaks/z80prg02.htm
To answer your first question (single vs. multiple buses), this chip uses a single bus for everything, and it has a very simple design. You could probably use something similar. To make the terminology clear, a single system bus may be composed of sub-buses (and they are also called buses). The figure shows a system bus composed of a bidirection data bus (8-bit wide) and an address bus (16-bit wide).
To answer your second question (how do components know when they are active),
in the image above you see two distinct signals, memory request and I/O request. Only one will be active at a time, and when I/O request is active, that's when a peripheral could potentially be accessed.
If you don't have many peripherals, you don't need to use all 16 address lines (some Z80's have an 8-bit I/O space). Each peripheral would be accessed through some addresses in this space. For instance, in a very simple system:
a timer peripheral could use addresses from 00h to 03h
a uart could addresses from 08h to 0Fh
In this simple example, you need to provide two circuits: one would detect when the address is within the range 00-03h, and another would do the same for 08-0Fh. If you do a logic "and" between the output of each detector and the I/O request signal, then you would have two signals indicating when each of the peripherals is being accessed. Your peripheral hardware should primarily listen to this signal.
Finally, regarding your question about instructions, the dataflow inside your microprocessor would have several stages. This is usually called a processor's datapath. It is common to divide the stages into:
FETCH: read an instruction from program memory
DECODE: check specific bits within the instructions, and decide what type of instruction it is
EXECUTE: take the actions required by the instruction (e.g., ALU operations)
MEMORY: for some instructions, you need to do a data read or write
WRITE BACK: update your CPU registers with new values affected by the instruction
Source: https://www.cs.umd.edu/class/fall2001/cmsc411/projects/DLX/proj.html
Most of your job of dealing with individual instructions would be done in the DECODE and EXECUTE stages. As for the datapath control, you will need a state machine that controls the sequence of operations through the 5 stages. This functional block is usually called a Control Unit. Here you have a few choices:
Your state machine could go throgh all stages sequentially, one at a time. An instruction would take several clock cycles to execute.
Similar as the choice above, but combining two or more stages in a single cycle if you want to make things simpler and faster.
Pipeline the execution of instructions. This can give a great speed boost, but maybe it's better left for later because things can get quite complex.
As for the implementation, I recommend keeping the functional blocks as separate entities, and make sure you write a testbench for each block. Your job will go faster if you write those testbenches.
As for the blocks, the Register File is pretty easy to code. The Instruction Decoder is also easy if you have a clear idea of your instruction layout and opcodes. And the ALU is also easy if you know the operations it needs to perform.
I would start by writing testbenches for the Instruction Decoder and the Register File. Then I would write a script that runs all the testbenches and checks their results automatically. Only then I would focus on the implementation of the functional blocks themselves.
Basically on-chip busses will use parallel busses for address and data input and output. Usually there will be some kind of arbiter which decides which component is allowed to write to the bus. So a common approach is:
The component that wants to write will set a data line connected to the arbiter to high or low to signal that it wants to access the bus.
The arbiter decides who gets access to the bus
The arbiter sets the chip select of the component that should be allowed next to access the bus.
Usually your on chip bus will use a master/slave concept, so only masters have acting access to the bus. The slaves only wait for requests from the master.
I for one like the AMBA AHB/APB design but this might be a little over the top for your application. You can have a look at this book looking for ideas on how to implement your bus

Asynchronous asymmmetric FIFO in VHDL synthesis issue

I have designed an Asynchrounous asymmetric fifo using VHDL constructs.It is generic fifo with depth and prog_full as parameters. It has 32-bit in 16-bit output data width.
You can find the fifo design link here.
The top level asymmetric fifo (fifo_wrapper.vhd),is built upon an 32-bit asynchronous fifo(async_fifo.vhd). This internal fifo (async_fifo) is build using the logic from generic FIFO on open cores (http://opencores.org/project,generic_fifos). I have added a simple testbench to try out this fifo design.
BUT there is some issue with this design that I am not able to figure out. The fifo design works perfectly fine when I simulate it, but when I synthesize it and run It along with my other design on hardware I get some erroneous data sometimes. May be there is some corner case that I am not able to simulate or Is it some thing else?
That's why I would like anyone who needs this design to try it and let me know if he/she encounters any Issues during simulation or after synthesis.
thanks
PS: kindly let me know if there is some other forum where I can put my design for public use. thanks
There are a number of issues to point out in relation to this asynchronous FIFO
design, based on the assumption that the write and read clocks are fully
asynchronous.
A (and probably THE) major problem is that the write side pointer (wp in
async_fifo), which is a normal binary counter, is transfered and synchronized
to the read side clock without any Gray encoding. So the different bits in
the vector may arrive at different time in the read clock domain, thus the
write pointer value can (and most likely will from time to time) be different
from the write side value. The comparison with the read pointer (rp) will
therefore make no sense. Binary values that are transfered over clock
domains should be Gray encoded before transfer and decoded at arrival. Also
use synchronization with two flip-flop levels.
The two clocks (rd_clk and wr_clk) are assumed to be asynchronous, but there
is only a single reset (rst), so timing may be violated when reset is
deasserted, unless there are some additional requirements for clocking at the
time of reset deassert.
An similar with clear, where there is only one signal for use in two
different clock domains.
Suggestion would be to use a port naming convention where the clock domain
relationship for the port is cleared indicated in the name, like naming all
the ports in the write clock domain wr_* (e.g. wr_clk_i, wr_clk_we_i, etc.,
and all the ports in the read clock domain as rd_*.
Reset is asserted low, so a naming of rst_n would be nice.
n'I can't access your code (firewall) so I'll just mention the general points in designing them, which might be of help to you and others.
to be completely clock safe, the write side should exchange its pointer to the read side using a fully safe asynchronous handshaking method using 2 meta-stability signalling chains.
This contruct for this is a double buffered register.
The write side registers its write pointer into a buffer, and asserts a valid signal high.
A meta-stability chain reclocks the buffer valid signal to the read clock domain
On the read clock side, once a transition to high of valid is seen at the output of the meta chain, the data in the write side buffer is reregistered onto another register on the read domain. This is ok, because it is known that the data in the buffer is stable. (because of the meta chain).
The read domain asserts an ack signal high.
Another meta-stability chain reclocks the ack signal to the write clock domain.
The write side awaits a transition of the ack signal at the output of the meta chain, once seen it deasserts its valid signal.
The read side awaits a transition of the valid signal at the output of the meta chain to low, once seen it deasserts its ack signal.
The write side awaits a transition of the ack signal at the output of the meta chain to low. The cycle is now complete.
The current write pointer, which may have moved on quite a bit now, may now be transferred again.
A similar approach is taken for transferring the read pointer to the write domain.
It can be seen that although this approach leads to a latency between the write pointer on the write/ read side, and the read pointer on the read / write side, that this latency can never lead to overflow. Instead it leads to a premature full on the write side, adn a premature empty on the read side, which will eventually resolve once the pointers are next exchanged.
This approach is the only completely clock safe design for a fifo that doesn't depend on a-priori knowledge of the clock speeds. Gray coding is not required at all.
The other thing to note is that the logic for addressing / empty / full etc. needs to be duplicated on each clock domain.

Resources