FPGA image processing [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I am doing a project (with a Zynq 7000 kit from Xilinx) in which I need to receive an image from an ARM microcontroller and deliver it to an FPGA. I do not know how an FPGA receives an image.
Should I convert the image to an array or a text file? (Even without the ARM microcontroller, I do not know how I can load an image, such as a bitmap or DICOM image, onto the FPGA.)
The important point is that I need code that can be synthesized.
I cannot use something like "fileopen" or anything similar.
Is it possible to mail me code for doing this part? Have you explained anything about this question on your site?

You define the protocol yourself.
The ARM and the FPGA will have some connection, e.g. some GPIO pins from the ARM connected to some GPIO pins on the FPGA. On top of that, you will have to define a protocol, typically like this:
Symbol transport
Your protocol needs to transport data symbols from one side to the other. A symbol can be a single bit, a nibble, a byte or something else (it is up to you to optimize here). The recipient must be able to find out whether the current state of the signals is a transition state, or a symbol that should be read.
A simple implementation is the SPI protocol, which has separate data and clock connections. When a rising edge on the clock pin is detected, a single data bit is read. The bus can be stopped by stopping the clock, and the speed can be adjusted dynamically.
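As a rough VHDL sketch of the receiving side of such a clock/data scheme (all names here are invented for the example, and a real design would synchronise spi_data the same way as the clock): the external clock is synchronised into the FPGA clock domain, and one data bit is shifted in per detected rising edge.

library ieee;
use ieee.std_logic_1164.all;

entity spi_bit_rx is
  port (
    clk      : in  std_logic;                     -- FPGA system clock
    spi_clk  : in  std_logic;                     -- clock line driven by the ARM
    spi_data : in  std_logic;                     -- data line driven by the ARM
    rx_byte  : out std_logic_vector(7 downto 0);  -- last complete byte
    rx_valid : out std_logic                      -- pulses for one cycle per byte
  );
end entity;

architecture rtl of spi_bit_rx is
  signal clk_sync : std_logic_vector(2 downto 0) := (others => '0');
  signal shreg    : std_logic_vector(7 downto 0) := (others => '0');
  signal count    : integer range 0 to 7 := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      rx_valid <= '0';
      clk_sync <= clk_sync(1 downto 0) & spi_clk;  -- synchronise the external clock
      if clk_sync(2 downto 1) = "01" then          -- rising edge on spi_clk
        shreg <= shreg(6 downto 0) & spi_data;     -- shift in one bit, MSB first
        if count = 7 then
          rx_byte  <= shreg(6 downto 0) & spi_data;
          rx_valid <= '1';
          count    <= 0;
        else
          count <= count + 1;
        end if;
      end if;
    end if;
  end process;
end architecture;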
Lower Layer Framing
On top of the symbol transport, you usually want some kind of grouping, for example, a convention that you always transport whole bytes together, and have a pause after each frame, or that you always send a length indicator first.
This is important for when the sender and the receiver lose synchronisation, e.g. when a short glitch on the clock line, caused by interference, makes the receiver count one bit more than the sender from that point on.
When the pause starts, the receiver will have one bit too many, which is a clear sign that something went wrong, so the receiver can then reset and restart at zero.
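A sketch of that pause-based resynchronisation, as it might be added inside the clocked process of the receiver above (idle_count and IDLE_LIMIT are assumed declarations; IDLE_LIMIT should be much longer than one bit period):

if clk_sync(2 downto 1) = "01" then    -- activity on the clock line
  idle_count <= 0;
elsif idle_count = IDLE_LIMIT then     -- line has been quiet: pause detected
  count <= 0;                          -- reset the bit counter to frame start
else
  idle_count <= idle_count + 1;
end if;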
On this layer, you can also include an indication which stream this frame belongs to, giving you the opportunity to split data and command traffic.
Error correction
Usually, these frames are extended with an error checking code (e.g. a CRC), which allows discarding data that has been transmitted with an error. A feedback mechanism can then be used to retransmit data that has been garbled. A possible implementation is that a single line is asserted when a frame has been received correctly; the sender can continue with the next frame when it sees the acknowledgement, or repeats the last frame after a timeout.
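For illustration, here is a bit-serial CRC-8 step in VHDL that the receiver could apply to every incoming bit; the polynomial x^8 + x^2 + x + 1 (0x07) is just an example choice:

-- One CRC-8 update per data bit, polynomial x^8 + x^2 + x + 1 (0x07).
-- Compare the register against the frame's CRC field at the end.
function crc8_step(crc : std_logic_vector(7 downto 0);
                   d   : std_logic) return std_logic_vector is
  variable fb : std_logic;
  variable c  : std_logic_vector(7 downto 0);
begin
  fb := crc(7) xor d;                -- feedback bit
  c(7 downto 3) := crc(6 downto 2);
  c(2) := crc(1) xor fb;             -- tap at x^2
  c(1) := crc(0) xor fb;             -- tap at x^1
  c(0) := fb;                        -- tap at x^0
  return c;
end function;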
Upper Layer Framing
On top of that, you'd then have the actual data.
If you have a separate command stream, you can use that for framing (i.e. have a command "start of image", followed by data on the data stream, followed by the "end of image" command).
When everything is inside a single stream, you should follow a pattern of sync-tag-length-data. Every upper layer frame starts with a known sequence; if that is missing, data is discarded until that sequence is found (again, resynchronisation). The tag then performs the split into data and command streams, the length shows how much data is to follow, and the scan for the sync pattern is restarted after the data has been processed.
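A sketch of that sync-tag-length-data receiver as a VHDL state machine (byte_valid and rx_byte come from the lower layer; SYNC_BYTE, tag and the length handling are assumptions for the example, with frames assumed to carry at least one data byte; uses ieee.numeric_std):

type rx_state_t is (FIND_SYNC, READ_TAG, READ_LEN, READ_DATA);
signal state     : rx_state_t := FIND_SYNC;
signal remaining : unsigned(7 downto 0);

process (clk)
begin
  if rising_edge(clk) then
    if byte_valid = '1' then            -- one byte from the lower layer
      case state is
        when FIND_SYNC =>
          if rx_byte = SYNC_BYTE then   -- discard bytes until sync is seen
            state <= READ_TAG;
          end if;
        when READ_TAG =>
          tag   <= rx_byte;             -- selects data vs. command stream
          state <= READ_LEN;
        when READ_LEN =>
          remaining <= unsigned(rx_byte);
          state     <= READ_DATA;
        when READ_DATA =>
          -- hand rx_byte to the consumer selected by tag here
          if remaining = 1 then
            state <= FIND_SYNC;         -- rescan for the next sync pattern
          else
            remaining <= remaining - 1;
          end if;
      end case;
    end if;
  end if;
end process;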
It is possible to combine the layers to optimize, or to skimp on error checking if errors in the output are acceptable and you want to push for performance. Also, I'd check whether the pins on the ARM side have a "special function" attached to them: most embedded CPUs have ready-made controllers for several communication protocols, which will let you implement the protocol more quickly and use hardware like the DMA controller for better performance.


How do I debug Verilog code where simulation functions as intended, but implementation doesn't?

I'm a bit stumped.
I have a fairly large Verilog module that I've tested in simulation (iSim), and it functions as I want. Now I've hooked it up in real life to another device using SPI, and some stuff works, and some stuff doesn't.
For example,
I can send a value using command A, and verify that the right value was received using command B. Works no problem.
But if I send a value using command C, I cannot verify that it was received using command D. In simulation it works fine, so I feel I can't really gain anything from simulating any more.
I have looked at the signals on a logic analyzer, and the controller device (not my design) sends the right messages. When I issue command B, I can see the return values correct from my device (I know SPI works anyways). I don't know whether C or D work correctly. D just returns 0s, so maybe C didn't work in the first place. There is no way to step through Verilog, and this module is packaged as IP for Vivado.
Here are two screenshots. The first is the simulation (I send 5, then 2, then I expect it to return 4 on the next send, which it does, followed by zeros).
Here is what I get in reality (the first two bytes don't matter; the 5 is left over from a previously sent value):
Here is a command (B) that works, returning a correct value (it responds to the 0x01 being sent):
Does anyone have any advice for debugging this? I have literally no idea how to proceed.
I can't really reproduce this behaviour in simulation.
Since you are synthesizing to an FPGA, you have a few more options for debugging your synthesized, on-chip design. As you are using Vivado, you can use ChipScope to look at any signal in your system, allowing you to view a waveform of that signal over time just as you would in simulation (though more restricted). By including the ChipScope IPs in your synthesis, you can send waveform data back to the Vivado software, which will display a waveform of your selected signals to help you see what's going on inside the FPGA as the system runs. (Note: if you were using Altera's tools, their equivalent is called SignalTap; it's pretty much the same thing.)
There are numerous tutorials online on how to incorporate and run ChipScope; here's one from the Xilinx website:
http://www.xilinx.com/support/documentation/sw_manuals/xilinx2012_4/ug936-vivado-tutorial-programming-debugging.pdf
Many others use ISE, but the steps are very similar, as both typically involve using the coregen tool (though I think you can also add ChipScope via the synthesis flow, so there are multiple options for incorporating it into your design).
Once on the FPGA, you have access to what is effectively an internal logic analyzer. Note that it does take up some LEs on the FPGA and can consume a fair amount of block RAM, depending on how many samples of your signals you want to capture.
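For reference, instantiating a generated debug core looks roughly like this in VHDL; the core name ila_0 and the single probe port are just one example configuration of the IP, and the matching component declaration comes from the sources the tool generates:

ila_inst : ila_0
  port map (
    clk    => clk,            -- sample clock for the capture
    probe0 => debug_signal    -- any internal signal you want to observe
  );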
Tim's answer provides a good description of how to deal with on-chip debugging if you are designing purely for an ASIC; see his answer if you want more information about standard, non-FPGA debugging solutions.
In cases like this you might want to think about adding additional logic that is used just for debugging; "design for debug" is a common term for thinking about this kind of logic.
So you have one chip interface (SPI) that you don't know works correctly. Since it seems not to be working, you can't trust debugging over this interface: if you get an odd result, you can't determine what it means.
Since you're working on an FPGA, are there any other interfaces other than SPI which you can get working correctly? Maybe 7-segment display, LEDs, JTAG, VGA, etc?
Try to think of other creative ways to get data out of your chip that don't require the SPI interface.
If you have 4 LEDs, A through D, can you light up each LED for 1 second each time a command of that type is received?
Can you have a 7-seg display the current state of your SPI receiver's state machine, or have it indicate certain error codes if some unknown command is received?
Can you draw over VGA to a monitor a binary sequence of the incoming SPI bitstream?
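As a minimal VHDL sketch of the LED idea (the names, the one-cycle cmd_a_seen pulse and the 50 MHz clock are assumptions for the example), stretch the pulse so it becomes visible on an LED:

signal stretch_cnt : integer range 0 to 50_000_000 := 0;

process (clk)
begin
  if rising_edge(clk) then
    if cmd_a_seen = '1' then        -- one-cycle pulse from the command decoder
      stretch_cnt <= 50_000_000;    -- hold the LED for ~1 s at 50 MHz
    elsif stretch_cnt /= 0 then
      stretch_cnt <= stretch_cnt - 1;
    end if;
  end if;
end process;

led_a <= '1' when stretch_cnt /= 0 else '0';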
Once you can start narrowing down with data what is actually happening inside your hardware, you can narrow the problem space and go inspect for possible problems.
There are multiple reasons why code that runs fine in RTL simulation behaves differently in the FPGA. It is important to consider all possibilities. ChipScope, suggested above, is definitely a step in the right direction, and it could give you a hint about where to look further. The reasons could include:
The FPGA implementation flow was not executed properly. Did you have the right timing constraints, and were they met during implementation, especially the P&R phase? Check pin placements, I/O properties, and clock properties. You can usually find hints by inspecting the FPGA implementation reports. This is tedious, but sometimes needed. An incorrect implementation flow can also produce FPGA implementations that work or don't depending on the run or on small unrelated changes (I have seen this problem many times!).
RTL/netlist discrepancies, e.g. due to incorrect usage of `ifdef within the design or during the synthesis phase, selecting an incorrect file for synthesis, or the same Verilog module being defined in multiple places. Often a hint can be found by inspecting the removed-flop list or the synthesis warnings.
Discrepancies between the RTL simulation and the board environment. These can be external, like the clock/data alignment on the interface, but also internal: improper CDC, or not handling clock or reset tree delays properly. Note that X-propagation and CDC are not modelled properly in RTL simulation unless you code in a certain way; problems with those can often only be seen in a netlist simulation environment.
Lastly, FPGA board problems, like a faulty clock source, power supply issues, or heat, can also be at fault. They are worth checking, but I'd leave them as a last resort. Some folks have a dedicated board/FPGA test design, proven to work on a good board, that can catch some of these problems.
As a final note, the biggest return comes from investing in the simulation environment. Some folks think that since an FPGA can be debugged with ChipScope and reprogrammed quickly, there is no need for a good simulation environment. It probably depends on the size of the project, but my experience is that for most modern FPGA projects a good simulation environment saves a lot of the time otherwise spent in the lab looking through ChipScope and logic analyzer captures.

Are Design Compiler & Encounter for ASIC design, and Quartus & ModelSim for FPGA design?

Right now I am trying out place and route in Encounter, but when I search the web, I always see tutorials about Quartus routing. Out of curiosity, I tried to find out the difference between the two, but there is no exact answer so far. When I look at the layouts these two tools produce, Quartus' layout looks like it is mapped onto a fixed, pre-made chip, while Encounter gives me a more custom feeling. Thus, I suppose Quartus is for FPGAs and Encounter is for ASICs. Am I right? If not, please tell me the exact story.
Encounter is a place and route tool for custom silicon, so it can pick any cell from a library, put it anywhere within a placement block, and route metal to it on any available layer as needed. The output of Encounter is a GDSII file showing what polygons need to be created on each layer as part of the silicon manufacturing process.
An FPGA has already placed all of the available transistors and wires within the device. Quartus (or ISE, for Xilinx) maps logic into LUTs (the logic unit within an FPGA) and figures out how to connect the LUTs using available tracks between the logic blocks. The output of Quartus is a bit stream which tells what values to put in to each LUT on the device and which routing tracks to select/connect between the LUTs.

Too many comps of type "BUFGMUX" found to fit this device. (Ethernet Design)

I'm designing an Ethernet MAC Controller for Spartan 3E FPGA. IOBs have reached 109%. I still proceeded with the generation of bitstream. I then encountered this error:
Too many comps of type "BUFGMUX" found to fit this device.
What does this mean?
(I'm pretty sure the Spartan 3E can run Ethernet, since there is already an Ethernet Lite MAC IP for the Spartan 3E. Also, it has more pins than my module uses. Why does it then report 109% of IOBs?)
I also tried commenting out the instantiated mac_transmit_module and mac_receive_module; the bitstream then generated successfully. Where did I go wrong?
Your design is simply too large to fit on the target FPGA. The fact that there is similar IP suggests that your implementation is somehow less efficient or has features that the other IP does not. There is no simple, one-size-fits-all solution to this problem.
Can I suggest that in the future you don't just include screen captures as documentation? They are very hard to read and most of the image is irrelevant. If there is a particular error message you want us to see, do a copy-paste into your question instead.
Firstly, you use 255 out of 232 IOBs. You have selected the xc3s500e-4fg320, which indeed only has 232 user I/Os, 56 of which are input-only. Maybe you selected the wrong part for synthesis?
If you are relatively sure you selected the right part, check the "IOB Properties" report in ISE. There you will get a list of all used IOBs. If that does not work (because the error may occur before this report is generated), you can always check the floorplanning tools with your UCF file to determine whether some LOCs are simply wrong. Do this on a dummy design with just your UCF and the floorplanner.
Secondly, the BUFGMUX message is telling you that you are using too many global clock buffers in general (or, less likely, too many muxed clocks). When a design features many clocks, ISE has to use BUFGMUX primitives in addition to the BUFG primitives in order to route all the clocks. If you exceed the number of BUFGMUXs/BUFGs available in the device, you get that error.
So both errors point to either your design being too large, or a wrong part selection.
BUFGMUXs are used to buffer signals which are used as clocks.
In most designs, especially as a beginner, you should only have one clock, and all your processes should be of the same form, with your clock signal in the sensitivity list and an if rising_edge(clock) then line in there. This is called synchronous design, and if you don't do it, all sorts of buggy results are likely when you try your code out in a real chip. Don't use more than one clock until you have enough experience. You can tell when you have enough experience because when you think of using another clock signal you'll think "surely I can find a way of sticking to one clock signal" :)
It sounds to me like you have if rising_edge(all sorts of different signals) in different processes. This makes the tools treat lots of signals as clocks and hang a BUFGMUX off each one of them; not only do you run out of clock routing resources very quickly, but you will also get unpredictable behaviour.
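For illustration, the synchronous pattern looks like this in VHDL (a sketch; ext_in is an external signal that might be tempting to clock on, but here it is synchronised and treated as data):

signal ext_sync : std_logic_vector(1 downto 0) := "00";

process (clk)                             -- the one and only clock
begin
  if rising_edge(clk) then
    ext_sync <= ext_sync(0) & ext_in;     -- synchronise the external signal
    if ext_sync = "01" then               -- detect its rising edge as data
      do_something <= '1';
    else
      do_something <= '0';
    end if;
  end if;
end process;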

Is it necessary to register both inputs and outputs of every hardware core?

I am aware of the need to synchronize all inputs to an FPGA before using those inputs in order to avoid metastability. I'm also aware of the need to synchronize signals that cross clock domains within a single FPGA. This question isn't about crossing clock domains.
My question is whether it is a good idea to routinely register all of the inputs and outputs of every internal hardware module in an FPGA design. The rationale is that we want to break up long chains of combinational logic in order to improve the clock rate so that we can meet the timing constraints for a chosen clock rate. This will add additional cycles of latency proportional to the number of modules that a signal must cross. Is this a good idea or a bad idea? Should one register only inputs and not outputs?
Answer Summary
Rule of thumb: register all outputs of internal FPGA cores; no need to register inputs. If an output already comes from a register, such as the state register of a state machine, then there is no need to register again.
It is difficult to give a hard and fast rule. It really depends on many factors.
It could:
Increase Fmax by breaking up combinatorial paths
Make place and route easier by allowing the tools to spread logic out in the part
Make partitioning your design easier, allowing for partial rebuilds.
It will not magically solve critical path timing issues. If there is a critical path inside one of your major "blocks", then it will still remain your critical path.
Additionally, you may encounter more problems, depending on how full your design is on the target part.
These things said, I lean to the side of registering outputs only.
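For example, a block following the register-your-outputs rule might end like this in VHDL (a sketch; result_comb stands in for the block's real logic):

result_comb <= a and b;           -- stand-in for the block's combinational logic

process (clk)
begin
  if rising_edge(clk) then
    result_out <= result_comb;    -- the path into the next block starts at a register
  end if;
end process;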
Registering all of the inputs and outputs of every internal hardware module in an FPGA design is a bit of overkill. If an output register feeds an input register with no logic between them, then 2x the required registers are consumed. Unless, of course, you're doing logic path balancing.
Registering only inputs and not outputs of every internal hardware module in an FPGA design is a conservative design approach. If the design meets its performance and resource utilization requirements, then this is a valid approach.
If the design is not meeting its performance/utilization requirements, then you've got to do the extra timing analysis in order to reduce the registers in a given logic path within the FPGA.
My question is whether it is a good idea to routinely register all of the inputs and outputs of every internal hardware module in an FPGA design.
No, it's not a good idea to routinely introduce registers like this.
Doing both inputs and outputs is redundant. There'll be no logic between the output register and the next input register.
If my block contains a single AND gate, it's overkill. It depends on the timing and design complexity.
Register stages need to be properly thought about and designed. What happens when an output FIFO fills, or under other stall conditions? Do all signals have the right register delay so that they appear at the right stage in the right cycle? Adding registers isn't necessarily as simple as it seems.
The rationale is that we want to break up long chains of combinational logic in order to improve the clock rate so that we can meet the timing constraints for a chosen clock rate. This will add additional cycles of latency proportional to the number of modules that a signal must cross. Is this a good idea or a bad idea?
In this case it sounds like you must introduce registers, and you shouldn't read the previous points as "don't do it". Just don't do it blindly. Think about the control logic around the registers and the (now) multi-cycle nature of the logic. You are now building a "Pipeline". Being able to stall a pipeline properly when the output can't write is a huge source of bugs.
Think of cars moving on a road. If one car applies its brakes and stops, all the cars behind need to do so as well. If the first car's brake lights aren't working, the next car won't get the signal to brake, and it'll crash. Similarly, each stage in a pipeline needs to tell the previous stage when it is stopping for a moment.
What you can find is that instead of long timing paths along your computation, going from input to output, you end up with long timing paths on the enables controlling all these register stages, going from output to input.
Another option is to let the tools work for you. Add a bunch of registers at the end of your complete system (if you want to pipeline more) and activate retiming in your synthesis tool. This will (hopefully) move the registers into the logic where they are most useful.
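A sketch of that approach in VHDL: append spare registers after the slow result and enable retiming in the synthesis tool, which may then redistribute them into the logic:

process (clk)
begin
  if rising_edge(clk) then
    stage1 <= slow_result;   -- slow_result: output of the deep combinational logic
    stage2 <= stage1;        -- spare registers for the retimer to move
    final  <= stage2;
  end if;
end process;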

What kind of problems are state machines good for? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion.
Closed 10 years ago.
What kind of programming problems are state machines most suited for?
I have read about parsers being implemented using state machines, but would like to find out about problems that scream out to be implemented as a state machine.
The easiest answer is probably that they are suited for practically any problem. Don't forget that a computer itself is also a state machine.
Regardless of that, state machines are typically used for problems where there is some stream of input, and the activity that needs to be done at a given moment depends on the last elements seen in that stream.
Examples of this stream of input: some text file in the case of parsing, a string for regular expressions, events such as player entered room for game AI, etc.
Examples of activities: be ready to read a number (after another number followed by a + has appeared in the input, in a parser for a calculator), turn around (after the player approached and then sneezed), perform a jumping kick (after the player pressed left, left, right, up, up).
A good resource is this free State Machine EBook. My own quick answer is below.
When your logic must contain information about what happened the last time it was run, it must contain state.
So a state machine is simply any code that remembers (or acts on) information that can only be gained by understanding what happened before.
For instance, I have a cellular modem that my program must use. It has to perform the following steps in order:
reset the modem
initiate communications with the modem
wait for the signal strength to indicate a good connection with a tower
...
Now I could block the main program and simply go through all these steps in order, waiting for each to run, but I want to give my user feedback and perform other operations at the same time. So I implement this as a state machine inside a function, and run this function 100 times a second.
enum state { RESET, INIT_SEND, INIT_RESPONSE, WAIT_ON_SIGNAL, DIAL, PPP /* ... */ };

void modem_function(void)
{
    static enum state current_state = RESET;
    enum state next_state = current_state;

    switch (current_state) {
    case RESET:
        /* do the hardware reset; reset_ok() is a placeholder for
           checking that it succeeded */
        next_state = reset_ok() ? INIT_SEND : RESET;
        break;
    case INIT_SEND:
        send("ATD");                  /* placeholder for the modem write */
        next_state = INIT_RESPONSE;
        break;
    /* ... the remaining states follow the same pattern ... */
    default:
        break;
    }
    current_state = next_state;
}
More complex state machines implement protocols. For instance, an ECU diagnostics protocol I used can only send 8-byte packets, but sometimes I need to send bigger messages. The ECU is slow, so I need to wait for a response. Ideally, when I send a message I use one function and then don't care what happens, but somewhere my program must monitor the line, send and respond to these messages, breaking them up into smaller pieces and reassembling the pieces of received messages into the final message.
Stateful protocols such as TCP are often represented as state machines. However it's rare that you should want to implement anything as a state machine proper. Usually you will use a corruption of one, i.e. have it carrying out a repeated action while sitting in one state, logging data while it transitions, or exchanging data while remaining in one state.
Objects in games are often represented as state machines. An AI character might be:
Guarding
Aggressive
Patrolling
Asleep
So you can see these might model some simple but effective states. Of course you could probably make a more complex continuous system.
Another example would be a process such as making a purchase on Google Checkout. Google gives a number of states for Financial and Order, then informs you of transitions, such as the credit card clearing or being rejected, and allows you to inform it that the order has been shipped.
Regular expression matching, Parsing, Flow control in a complex system.
Regular expressions are a simple form of state machine, specifically finite automata. They have a natural representation as such, although it is possible to implement them using mutually recursive functions.
State machines, when implemented well, are very efficient.
There is an excellent state machine compiler for a number of target languages, if you want to make a readable state machine.
http://research.cs.queensu.ca/~thurston/ragel/
It also allows you to avoid the dreaded 'goto'.
AI in games is very often implemented using State Machines.
Helps create discrete logic that is much easier to build and test.
Workflow (see WF in .net 3.0)
They have many uses, parsers being a notable one. I have personally used simplified state machines to implement complex multi-step task dialogs in applications.
A parser example. I recently wrote a parser that takes a binary stream from another program. The meaning of the current element parsed indicates the size/meaning of the next elements. There are a (small) finite number of elements possible. Hence a state machine.
They're great for modelling things that change status, and have logic that triggers on each transition.
I'd use finite state machines for tracking packages by mail, or to keep track of the different states of a user during the registration process, for example.
As the number of possible status values goes up, the number of transitions explodes. State machines help a lot in that case.
Just as a side note, you can implement state machines with proper tail calls like I explained in the tail recursion question.
In that example, each room in the game is considered one state.
Also, hardware design with VHDL (and other logic synthesis languages) uses state machines everywhere to describe hardware.
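For example, a tiny Moore machine in VHDL, a traffic light that advances on a slow tick (tick is an assumed, previously divided-down enable):

type light_t is (RED, GREEN, YELLOW);
signal light : light_t := RED;

process (clk)
begin
  if rising_edge(clk) then
    if tick = '1' then                -- e.g. once per second
      case light is
        when RED    => light <= GREEN;
        when GREEN  => light <= YELLOW;
        when YELLOW => light <= RED;
      end case;
    end if;
  end if;
end process;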
Any workflow application, especially with asynchronous activities. You have an item in the workflow in a certain state, and the state machine knows how to react to external events by placing the item in a different state, at which point some other activity occurs.
The concept of state is very useful for applications that need to "remember" the current context of the system and react properly when a new piece of information arrives. Any non-trivial application has that notion embedded in the code through variables and conditionals.
So if your application has to react differently every time it receives a new piece of information, because of the context it is in, you can model your system with a state machine. An example would be how to interpret the keys on a calculator, which depends on what you are processing at that point in time.
Conversely, if your computation does not depend on the context but solely on the input (like a function adding two numbers), you will not need a state machine (or, better said, you will have a state machine with zero states).
Some people design the whole application in terms of state machines, since they capture the essential things to keep in mind in the project, and then use some procedure or autocoders to make them executable. It takes a paradigm change to program this way, but I have found it very effective.
Things that comes to mind are:
Robot/Machine manipulation... those robot arms in factories
Simulation Games, (SimCity, Racing Game etc..)
Generalizing: state machines fit when you have a stream of inputs where processing any single input requires knowledge of the previous inputs (that is, the system needs to have "states").
Not much that I know of that isn't reducible to a parsing problem though.
If you need a simple stochastic process, you might use a Markov chain, which can be represented as a state machine (given the current state, at the next step the chain will be in state X with a certain probability).
