How do I debug Verilog code where simulation functions as intended, but implementation doesn't? - debugging

I'm a bit stumped.
I have a fairly large Verilog module that I've tested in simulation (iSim), and it functions as I want. Now I've hooked it up in real life to another device using SPI, and some stuff works and some stuff doesn't.
For example,
I can send a value using command A, and verify that the right value was received using command B. Works no problem.
But if I send a value using command C, I cannot verify that it was received using command D. In simulation it works fine, so I feel I can't really gain anything from simulating any more.
I have looked at the signals on a logic analyzer, and the controller device (not my design) sends the right messages. When I issue command B, I can see the correct return values from my device (so I know SPI works). I don't know whether C or D works correctly; D just returns 0s, so maybe C didn't work in the first place. There is no way to step through Verilog, and this module is packaged as IP for Vivado.
Here are two screenshots. First is simulation (I send 5, then 2, then I expect it to return 4 on the next send, which it does; followed by zeros).
Here is what I get in reality (the first two bytes don't matter; the 5 is left over from a previously sent value):
Here is a command (B) that works in returning a correct value (it responds to the 0x01 being sent):
Does anyone have any advice for debugging this? I have literally no idea how to proceed.
I can't really reproduce this behaviour in simulation.

Since you are synthesizing to an FPGA, you have a few more options for debugging your synthesized, on-chip design. As you are using Vivado, you can use ChipScope to look at almost any signal in your system, letting you view a waveform of that signal over time just as you would in simulation (though in a more restricted way). By including the ChipScope IP in your synthesis, you can send waveform data back to the Vivado software, which will display a waveform of your selected signals to help you see what's going on inside the FPGA as the system runs. (Note: if you were using Altera's tools, the equivalent is called SignalTap; it's pretty much the same thing.)
There are numerous tutorials online on how to incorporate and run ChipScope; here's one from the Xilinx website:
http://www.xilinx.com/support/documentation/sw_manuals/xilinx2012_4/ug936-vivado-tutorial-programming-debugging.pdf
Many others use ISE, but the steps are very similar, as both typically involve the core generator tool (though I think you can also add ChipScope via the synthesis flow, so there are multiple options for incorporating it into your design).
Once on the FPGA, you have access to what is effectively an internal logic analyzer. Note that it does take up some LEs on the FPGA and can consume a fair amount of block RAM, depending on how many samples of your signals you want to capture.
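As a minimal sketch of one way into that flow (assuming Vivado's mark_debug attribute; the signal names here are made up for illustration, and the exact steps vary by tool version):

    // Hypothetical snippet: mark the nets you want to observe so the
    // debug cores can be attached to them after synthesis.
    (* mark_debug = "true" *) reg  [7:0] spi_rx_byte;
    (* mark_debug = "true" *) reg        spi_byte_valid;
    (* mark_debug = "true" *) reg  [3:0] cmd_state;

After synthesis you would run the "Set Up Debug" step (or insert the debug/ILA IP manually), pick these nets as probes, re-implement, and view the captured waveform in the Vivado Hardware Manager.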
Tim's answer provides a good description of how to deal with on-chip debugging if you are designing purely for ASIC; so see his answer if you want more information about standard, non-FPGA debugging solutions.

In cases like this you might want to think about adding extra logic that exists only for debugging; 'design for debug' is a common term for this kind of thinking.
So you have one chip interface (SPI) which you don't know works correctly. Since it seems not to be working, you can't trust debugging over this interface: if you get an odd result, you can't determine what it means.
Since you're working on an FPGA, are there any other interfaces other than SPI which you can get working correctly? Maybe 7-segment display, LEDs, JTAG, VGA, etc?
Try to think of other creative ways to get data out of your chip that don't require the SPI interface.
If you have 4 LEDs, A through D, can you light up each LED for 1 second each time a command of that type is received?
Can you have a 7-seg display the current state of your SPI receiver's state machine, or have it indicate certain error codes if some unknown command is received?
Can you draw over VGA to a monitor a binary sequence of the incoming SPI bitstream?
Once you can start narrowing down with data what is actually happening inside your hardware, you can narrow the problem space and go inspect for possible problems.
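As an illustration of the LED idea above, here is a minimal sketch (the module and signal names are made up, and it assumes a 100 MHz clock):

    // Stretch a one-cycle "command received" strobe so an LED is visibly
    // lit for about one second.
    module debug_led_pulse #(
        parameter integer CLK_HZ = 100_000_000
    ) (
        input  wire clk,
        input  wire cmd_seen,   // one-cycle strobe when a command arrives
        output wire led
    );
        reg [$clog2(CLK_HZ)-1:0] count = 0;

        always @(posedge clk) begin
            if (cmd_seen)
                count <= CLK_HZ - 1;   // reload on every new command
            else if (count != 0)
                count <= count - 1;
        end

        assign led = (count != 0);
    endmodule

One instance per command type gives you a crude but trustworthy view of which commands your design actually decodes.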

There are multiple reasons why code that runs OK in RTL simulation behaves differently in the FPGA, and it is important to consider all of them. ChipScope, suggested above, is definitely a step in the right direction and could give you a hint about where to look further. The reasons include:
The FPGA implementation flow was not executed properly. Did you have the right timing constraints, and were they met during implementation, especially the place-and-route phase? Are the pin placements, I/O properties and clock properties correct? You can usually find hints by inspecting the FPGA implementation reports. This is a tedious part, but sometimes necessary. An incorrect implementation flow can also produce FPGA builds that work or don't depending on the run or on small unrelated changes (I have seen this problem many times!).
RTL/netlist discrepancies, e.g. due to incorrect usage of `ifdef within the design or during the synthesis phase, selecting the wrong file for synthesis, or the same Verilog module being defined in multiple places. Often the hint can be found by inspecting the list of removed flops or the synthesis warnings.
Discrepancies between the RTL simulation and the board environment. These can be external, like clock/data alignment on the interface, but also internal: improper CDC, or not handling clock or reset tree delays properly. Note that X-propagation and CDC are not modelled properly in RTL simulation unless you code in a certain way; problems with those can often only be seen in a netlist simulation environment (see the synchronizer sketch after this list).
Lastly, FPGA board problems such as a faulty clock source, power supply issues or heat can also be at fault. They are worth checking, but I'd leave them as a last resort. Some folks keep a dedicated board/FPGA test design, proven to work on a known-good board, that catches some of these problems.
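As a minimal sketch of the CDC point, here is the standard two-flop synchronizer for a single-bit level signal (names are made up for illustration; multi-bit buses need a different scheme such as a handshake or an asynchronous FIFO, and the ASYNC_REG attribute is Xilinx-specific):

    // Bring an asynchronous single-bit level signal safely into clk_dst.
    module sync_2ff (
        input  wire clk_dst,
        input  wire async_in,
        output wire sync_out
    );
        (* ASYNC_REG = "TRUE" *) reg meta, stable;

        always @(posedge clk_dst) begin
            meta   <= async_in;   // may go metastable
            stable <= meta;       // second flop gives it time to settle
        end

        assign sync_out = stable;
    endmodule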
As a final note, the biggest return comes from investing in the simulation environment. Some folks think that, since an FPGA can be debugged with ChipScope and reprogrammed quickly, there is no need for a good simulation environment. It probably depends on the size of the project, but my experience is that for most modern FPGA projects a good simulation environment saves a lot of time otherwise spent in the lab poring over ChipScope and logic analyzer captures.

Related

Is Design Compiler & Encounter for ASIC design and Quartus & ModelSim for FPGA design?

Right now I am trying place-and-route in Encounter, but when I search the web I mostly find tutorials about routing in Quartus. Out of curiosity, I tried to find out the difference between the two, but there is no exact answer anywhere. Looking at the layouts the two tools produce, Quartus' layout looks like it is mapped onto a fixed, pre-built chip, while Encounter gives me a more custom feeling. So I suppose Quartus is for FPGAs and Encounter is for ASICs. Am I right? If not, please tell me the exact story.
Encounter is a place and route tool for custom silicon, so it can pick any cell from a library, put it anywhere within a placement block, and route metal to it on any available layer as needed. The output of Encounter is a GDSII file showing what polygons need to be created on each layer as part of the silicon manufacturing process.
In an FPGA, all of the available transistors and wires have already been placed within the device. Quartus (or ISE, for Xilinx) maps logic into LUTs (the logic unit within an FPGA) and figures out how to connect the LUTs using the available routing tracks between the logic blocks. The output of Quartus is a bitstream that tells what value to put into each LUT on the device and which routing tracks to select/connect between the LUTs.

Too many comps of type "BUFGMUX" found to fit this device. (Ethernet Design)

I'm designing an Ethernet MAC controller for a Spartan-3E FPGA. IOB usage has reached 109%. I still proceeded with bitstream generation and then encountered this error:
Too many comps of type "BUFGMUX" found to fit this device.
What does this mean?
(I'm pretty sure the Spartan-3E can run Ethernet, since there is already an Ethernet Lite MAC IP for the Spartan-3E, and that IP has more pins than my module does. Why, then, is it at 109% of IOBs?)
I also tried commenting out the instantiated mac_transmit_module and mac_receive_module. It then generated the bitstream successfully. Where did I go wrong?
Your design is simply too large to fit on the target FPGA. The fact that there is similar IP suggests that your implementation is somehow less efficient or has features that the other IP does not. There is no simple, one-size-fits-all solution to this problem.
Can I suggest that in the future you don't just include screen captures as documentation? They are very hard to read and most of the image is irrelevant. If there is a particular error message you want us to see, do a copy-paste into your question instead.
Firstly, you use 255 out of 232 IOBs. You have selected xs3s500e-4fg323, which indeed only has 232 user IOs, 56 of which are input-only. Maybe you selected the wrong part for synthesis?
If you are relatively sure you selected the right part, check the "IOB Properties" report in ISE. There you will get a list of all used IOBs. If that does not work (perhaps because the error occurs before the report is generated), you can always load your UCF file into the floorplanning tools to determine whether some LOCs are simply wrong. Do this on a dummy design with just your UCF and the floorplanner.
Secondly, the BUFGMUX message is telling you that you use too many global clock buffers in general (or, less likely, too many muxed clocks). When a design features many clocks, ISE has to use BUFGMUX primitives in addition to the BUFG primitives in order to route all of them. If you exceed the number of BUFGMUXs/BUFGs available on the device, you will get that error.
So both errors point to either your design being too large, or a wrong part selection.
BUFGMUXs are used to buffer signals which are used as clocks.
In most designs, especially as a beginner, you should only have one clock, and all your processes should have the same form: the clock signal in the sensitivity list and an if rising_edge(clock) then line inside. This is called synchronous design, and if you don't do it, all sorts of buggy results are likely when you try your code out in a real chip. Don't deviate from it until you have enough experience. You can tell when you have enough experience, because when you think of using another clock signal you'll think "surely I can find a way of sticking to one clock signal" :)
It sounds to me like you have if rising_edge(all sorts of different signals) in different processes. This makes the tools treat a lot of signals as clocks and hang a BUFGMUX off each one of them; not only do you run out of clock routing resources very quickly, you will also get unpredictable behaviour.
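The usual fix is to keep one clock and turn the other "clock-like" signals into enables. A minimal sketch (shown here in Verilog; the signal names are made up):

    module enable_not_clock (
        input  wire       clk,          // single system clock
        input  wire       byte_ready,   // strobe from the receiver
        output reg  [7:0] shift_count = 0
    );
        // Bad (don't do this): treating a data strobe as a clock forces the
        // tools to allocate a BUFG/BUFGMUX for it and ruins timing analysis:
        //   always @(posedge byte_ready) shift_count <= shift_count + 1;

        // Better: one clock, with byte_ready used as a clock enable.
        always @(posedge clk)
            if (byte_ready)
                shift_count <= shift_count + 1;
    endmodule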

VHDL with no clk

I have a clock in my VHDL code but I don't use it. My process simply depends on handshakes: when one component finishes and produces an output, that output is in the sensitivity list of my FSM and then becomes an input to the next component, and of course that component's output is also in the sensitivity list of my FSM (so I know when the component finishes its computation)... and so on.
Is this method wrong? It works in simulation and also in post-route simulation, but it gives me warnings like these:
warning: HOLD High VIOLATION ON I WITH RESPECT TO CLK;
warning: HOLD Low VIOLATION ON I WITH RESPECT TO CLK;
Are these warnings unimportant, or will my code damage my FPGA because it doesn't depend on a clock?
The warnings you are getting are timing violations. You get these because the tools detect that your design does not obey the necessary timing restrictions of the internal primitives.
For instance, inputs to lookup tables (one of the main building blocks inside an FPGA) need to be held for a specific time for the output to stabilize. This is very hard to guarantee when your entire timing relies only on the latencies and delays of the components themselves, and everything switches on a completely asynchronous basis.
Depending on your actual design (mostly its size and complexity), I'll wager that you'll end up with a lot of very-hard-to-debug errors once you get it inside an FPGA. You'll have a much, much, much easier time using a clock. It gives you a clear idea of when signals arrive where, and it allows you to use the internal tools to check your timing. You'll also find it much easier to interface to other devices, and your system will be less susceptible to noisy inputs.
So, all in all, use a clock. You (probably) won't damage your FPGA by not using one, but a clock will save you tons of trouble.
Your code will most probably not damage your FPGA just because it doesn't depend on a clock. However, for synthesis you should always use registered (clocked) logic. Without a clock your design will not be controllable, because of timing, delays, routing, fan-out and so on; this will make your FSM behave "mysteriously" when synthesized (even if it worked in simulation).
You'll find plenty of examples of good FSM implementation style with Google's help (search for Moore or Mealy FSM).
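As a minimal sketch of the registered style (shown in Verilog rather than VHDL; the states and handshake signals are made up): the FSM advances only on clock edges and samples the component's "done" handshake as an ordinary input.

    module handshake_fsm (
        input  wire clk,
        input  wire rst,
        input  wire go,          // request to run the component
        input  wire comp_done,   // handshake back from the component
        output reg  comp_start   // one-cycle strobe to start it
    );
        localparam IDLE = 2'd0, RUN = 2'd1, DONE = 2'd2;
        reg [1:0] state;

        always @(posedge clk) begin
            if (rst) begin
                state      <= IDLE;
                comp_start <= 1'b0;
            end else begin
                comp_start <= 1'b0;            // default: strobe low
                case (state)
                    IDLE: if (go) begin
                              comp_start <= 1'b1;
                              state      <= RUN;
                          end
                    RUN:  if (comp_done) state <= DONE;
                    DONE: state <= IDLE;       // hand over to the next step here
                    default: state <= IDLE;
                endcase
            end
        end
    endmodule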
Definitely use a clock. And only one clock throughout the design. This is the easiest way - the tools support this design style very well. You can often get away with a single timing constraint, especially if your inputs are slow and synchronous to the same clock.
When you have gained experience designing this way, you can move outside of this, but be ready for more analysis, timing constraints and potentially build iterations while you learn the pitfalls of crossing clock-domains and asynchronous signals.

What are tsetup and thold in VHDL?

I am learning VHDL. When I tried to make a testbench, I ran into these terms. What do they mean? I couldn't find any simple explanation on Google.
Thanks in advance.
tSetup and tHold aren't VHDL keywords to my knowledge; they are the minimum setup and hold times required for the device being simulated to operate correctly.
tSetup - The amount of time the data/control needs to be valid before the clock edge.
tHold - The amount of time the data/control needs to be valid after the clock edge.
A simple graphic explaining this:
http://en.wikipedia.org/wiki/Flip-flop_%28electronics%29#Setup.2C_hold.2C_recovery.2C_removal_times
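Where you do meet these in code is in gate-level or vendor device models, which use timing checks to flag violations during simulation. A minimal sketch (the question is about VHDL, but the idea is the same; this example is Verilog, and the limit values are made up):

    // Behavioural D flip-flop with setup/hold timing checks.
    module dff_with_checks (
        input  wire clk,
        input  wire d,
        output reg  q
    );
        always @(posedge clk)
            q <= d;

        specify
            // d must be stable 2 time units before and 1 after the clock edge;
            // otherwise the simulator prints a violation message.
            $setup(d, posedge clk, 2);
            $hold(posedge clk, d, 1);
        endspecify
    endmodule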
As TOTA says, setup and hold times are digital logic design terms, not VHDL terms.
The vast majority of the time, you do not need to concern yourself with them in testbenches as you are almost always testing internal blocks within your chip and the tools will manage all the timing for you.
When you are working at the device pin level, you can set your models up to check the setup and hold times for violations. When simulating RTL, delays are (usually) not modelled, so your timing will look fine. You can later simulate a back-annotated netlist, which includes all the real chip delays, and check that you will still meet all the timing requirements of your external devices.

Is it necessary to register both inputs and outputs of every hardware core?

I am aware of the need to synchronize all inputs to an FPGA before using those inputs in order to avoid metastability. I'm also aware of the need to synchronize signals that cross clock domains within a single FPGA. This question isn't about crossing clock domains.
My question is whether it is a good idea to routinely register all of the inputs and outputs of every internal hardware module in an FPGA design. The rationale is that we want to break up long chains of combinational logic in order to improve the clock rate so that we can meet the timing constraints for a chosen clock rate. This will add additional cycles of latency proportional to the number of modules that a signal must cross. Is this a good idea or a bad idea? Should one register only inputs and not outputs?
Answer Summary
Rule of thumb: register all outputs of internal FPGA cores; there is no need to register inputs. If an output already comes from a register, such as the state register of a state machine, then there is no need to register it again.
It is difficult to give a hard and fast rule. It really depends on many factors.
It could:
Increase Fmax by breaking up combinatorial paths
Make place and route easier by allowing the tools to spread logic out in the part
Make partitioning your design easier, allowing for partial rebuilds.
It will not magically solve critical path timing issues. If there is a critical path inside one of your major "blocks", then it will still remain your critical path.
Additionally, you may encounter more problems, depending on how full your design is on the target part.
With those things said, I lean toward registering outputs only.
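A minimal sketch of the registered-output style (Verilog; the core's function here is just a placeholder):

    module some_core (
        input  wire       clk,
        input  wire [7:0] a,
        input  wire [7:0] b,
        output reg  [8:0] sum_q   // registered output
    );
        // Combinational logic can be as deep as timing allows...
        wire [8:0] sum = a + b;

        // ...but the value leaving the core always comes from a flop, so the
        // path into the next core starts at a known register.
        always @(posedge clk)
            sum_q <= sum;
    endmodule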
Registering all of the inputs and outputs of every internal hardware module in an FPGA design is a bit of overkill. If an output register feeds an input register with no logic between them, then twice the required number of registers is consumed (unless, of course, you're deliberately balancing logic paths).
Registering only inputs and not outputs of every internal hardware module in an FPGA design is a conservative design approach. If the design meets its performance and resource utilization requirements, then this is a valid approach.
If the design is not meeting its performance/utilization requirements, then you've got to do the extra timing analysis in order to reduce the registers in a given logic path within the FPGA.
My question is whether it is a good idea to routinely register all of the inputs and outputs of every internal hardware module in an FPGA design.
No, it's not a good idea to routinely introduce registers like this.
Doing both inputs and outputs is redundant: there'll be no logic between the output register of one block and the input register of the next.
If my block contains a single AND gate, it's overkill. It depends on the timing and design complexity.
Register stages need to be properly thought about and designed. What happens when an output FIFO fills or some other stall condition occurs? Do all signals have the right register delay so that they appear at the right stage in the right cycle? Adding registers isn't necessarily as simple as it seems.
The rationale is that we want to break up long chains of combinational logic in order to improve the clock rate so that we can meet the timing constraints for a chosen clock rate. This will add additional cycles of latency proportional to the number of modules that a signal must cross. Is this a good idea or a bad idea?
In this case it sounds like you must introduce registers, and you shouldn't read the previous points as "don't do it". Just don't do it blindly. Think about the control logic around the registers and the (now) multi-cycle nature of the logic. You are now building a "Pipeline". Being able to stall a pipeline properly when the output can't write is a huge source of bugs.
Think of cars moving on a road. If one car applies its brakes and stops, all cars behind need to as well. If the first car's brake lights aren't working, the next car won't get the signal to brake, and it'll crash. Similarly, each stage in a pipeline needs to tell the previous stage when it is stopping for a moment.
What you can find is that instead of having long timing paths along your computation, running from input to output, you end up with long timing paths on the enables controlling all these register stages, running from output back to input.
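A minimal sketch of one such pipeline register stage with stall handling (Verilog; the valid/ready naming is just one common convention):

    module pipe_stage #(
        parameter integer W = 8
    ) (
        input  wire         clk,
        input  wire         rst,
        // upstream side
        input  wire [W-1:0] in_data,
        input  wire         in_valid,
        output wire         in_ready,   // the "brake light" towards upstream
        // downstream side
        output reg  [W-1:0] out_data,
        output reg          out_valid,
        input  wire         out_ready
    );
        // The stage can accept new data when it is empty or when its current
        // contents are being taken downstream this cycle.
        assign in_ready = !out_valid || out_ready;

        always @(posedge clk) begin
            if (rst) begin
                out_valid <= 1'b0;
            end else if (in_ready) begin
                out_data  <= in_data;
                out_valid <= in_valid;
            end
        end
    endmodule

Note that in_ready here depends combinationally on out_ready, which is exactly the output-to-input enable path described above; skid buffers are the usual cure when that path gets too long.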
Another option is to let the tools work for you: add a bunch of registers at the end of your complete system (if you want to pipeline more) and enable retiming in your synthesis tool. The tool will then (hopefully) move the registers into the logic where they are most useful.
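A minimal sketch of that approach (Verilog; the logic is a made-up placeholder, and the retiming switch itself is tool-specific, e.g. a synthesis option or attribute, so check your tool's documentation):

    // Deep combinational function followed by two "slack" registers that a
    // retiming-capable synthesis tool is free to push back into the logic.
    module retime_me (
        input  wire        clk,
        input  wire [15:0] x,
        output reg  [15:0] y
    );
        wire [15:0] f = (x * 16'd3) ^ (x >> 2);  // placeholder deep logic

        reg [15:0] stage1;
        always @(posedge clk) begin
            stage1 <= f;
            y      <= stage1;
        end
    endmodule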
