I've implemented a 16-bit ALU and a register file in VHDL using Xilinx ISE. I've been asked how many slices my design uses, and I have no idea how to go about answering that question. I'm not targeting a particular chip or simulating one; I just wrote the VHDL and debugged it with a test bench.
Is there a way to get ISE to report how many slices my design uses? Or do I need to go through all my code and count up my operations? Or is it as simple as noting what types of components I used?
To get a true view of what resources your design will consume, use the map report. Implement the design and look at the hierarchical usage report (slices, slice registers (flip-flops), LUTs, LUTRAM, BRAM, DSPs, etc.) for each module of your design in the map report file. In ISE 13.2 that is Section 13 of your _map.mrp file. You may have to enable the -detail switch in map.
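If you run map from the command line, the detailed report can be requested with something like the following (a rough sketch; the exact options and file names depend on your ISE version and project, so treat them as placeholders):

    map -detail -w -o my_design_map.ncd my_design.ngd my_design.pcf

In the GUI the equivalent should be available as the detailed-report option in the Map process properties.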
Slices can be a deceptive metric (especially in a map report), since using even a single element of a slice counts the entire slice as used. You have to understand what is in a slice to really understand what the usage number means. A Virtex-6 slice, for example, contains 8 flip-flops and 4 6-input LUTs.
If you only look at the synthesis numbers (slice flip-flops and slice LUTs), you may miss any netlist black boxes that your design uses (i.e. CORE Generator elements, MicroBlaze, System Generator, or third-party IP delivered in netlist form).
Ugh, I figured it out.
The trick is to select the module you want the slice count for and set it as the top-level module by going to Source -> Set as top level module. Once you do that, with the module still highlighted in the Sources pane, expand Synthesize - XST in the Processes pane and double-click 'View Synthesis Report'. The number of slices for that module is then listed in that report.
Related
I'm a bit stumped.
I have a fairly large Verilog module that I've tested in simulation (iSim), and it functions as I want. Now I've hooked it up in real life to another device using SPI, and some things work and some don't.
For example,
I can send a value using command A, and verify that the right value was received using command B. Works no problem.
But if I send a value using command C, I cannot verify that it was received using command D. In simulation it works fine, so I feel I can't really gain anything more from simulating.
I have looked at the signals on a logic analyzer, and the controller device (not my design) sends the right messages. When I issue command B, I can see the correct return values from my device (so I know SPI works). I don't know whether C or D works correctly; D just returns zeros, so maybe C didn't work in the first place. There is no way to step through Verilog, and this module is packaged as IP for Vivado.
Here are two screenshots. The first is the simulation (I send 5, then 2, then I expect it to return 4 on the next send, which it does, followed by zeros).
Here is what I get in reality (the first two bytes don't matter; the 5 is left over from a previously sent value):
Here is a command (B) that does return a correct value (it responds to the 0x01 being sent):
Does anyone have any advice for debugging this? I have literally no idea how to proceed.
I can't really reproduce this behaviour in simulation.
Since you are synthesizing to an FPGA, you have a few more options for debugging your synthesized, on-chip design. As you are using Vivado, you can use ChipScope (Vivado's integrated logic analyzer) to look at any signal in your system, allowing you to view a waveform of that signal over time just as you would in simulation (though more restricted). By including the ChipScope/ILA cores in your synthesis, you can send waveform data back to the Vivado software, which will display a waveform of your selected signals to help you see what's going on inside the FPGA as the system runs. (Note: if you were using Altera's tools, their equivalent is called SignalTap; it's pretty much the same thing.)
There are numerous tutorials online on how to incorporate and run ChipScope; here's one from the Xilinx website:
http://www.xilinx.com/support/documentation/sw_manuals/xilinx2012_4/ug936-vivado-tutorial-programming-debugging.pdf
Many others use ISE, but the steps are very similar, as both typically involve the coregen tool (though I think you can also add ChipScope via the synthesis flow, so there are multiple options for incorporating it into your design).
Once on the FPGA, you have access to what is effectively an internal logic analyzer. Note that it does take up some logic resources on the FPGA and can consume a fair amount of block RAM, depending on how many samples of your signals you want to capture.
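For instance, one way to get an internal signal into the on-chip analyzer in Vivado is to tag it in the HDL and then let the debug wizard attach an ILA core to it. A minimal VHDL sketch (the entity and signal names are just placeholders; in Verilog the equivalent is the (* mark_debug = "true" *) attribute):

    library ieee;
    use ieee.std_logic_1164.all;

    entity spi_debug_example is
      port (
        clk      : in std_logic;
        miso_bit : in std_logic
      );
    end entity;

    architecture rtl of spi_debug_example is
      signal spi_state : std_logic_vector(3 downto 0) := (others => '0');

      -- Ask Vivado to keep this signal visible and offer it to the debug (ILA) flow
      attribute mark_debug : string;
      attribute mark_debug of spi_state : signal is "true";
    begin
      process(clk)
      begin
        if rising_edge(clk) then
          spi_state <= spi_state(2 downto 0) & miso_bit;  -- placeholder logic
        end if;
      end process;
    end architecture;

After synthesis, the Set Up Debug wizard should list the marked signals and build the ILA core for you.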
Tim's answer provides a good description of how to deal with on-chip debugging if you are designing purely for ASIC, so see his answer if you want more information about standard, non-FPGA debugging solutions.
In cases like this you might want to think about adding extra logic that is used just for debugging; 'design for debug' is a common term for thinking about this kind of logic.
So you have one chip interface (SPI) which you don't know works correctly. Since it seems not to be working, you can't trust debugging over this interface, because if you get an odd result you can't determine what it means.
Since you're working on an FPGA, are there any other interfaces other than SPI which you can get working correctly? Maybe 7-segment display, LEDs, JTAG, VGA, etc?
Try to think of other creative ways to get data out of your chip that don't require the SPI interface.
If you have 4 LEDs, A through D, can you light up each LED for 1 second each time a command of that type is received? (See the sketch after these suggestions.)
Can you have a 7-seg display the current state of your SPI receiver's state machine, or have it indicate certain error codes if some unknown command is received?
Can you draw over VGA to a monitor a binary sequence of the incoming SPI bitstream?
Once you can start narrowing down, with real data, what is actually happening inside your hardware, you can narrow the problem space and go inspect the likely causes.
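As a rough sketch of the LED idea above (assuming a 50 MHz clock and a one-cycle strobe that your SPI decoder asserts when the command arrives; all names here are made up):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity cmd_led_stretch is
      port (
        clk        : in  std_logic;   -- assumed 50 MHz
        cmd_strobe : in  std_logic;   -- one-cycle pulse when the command is seen
        led        : out std_logic
      );
    end entity;

    architecture rtl of cmd_led_stretch is
      signal count : unsigned(25 downto 0) := (others => '0');
      signal lit   : std_logic := '0';
    begin
      process(clk)
      begin
        if rising_edge(clk) then
          if cmd_strobe = '1' then        -- command seen: light the LED, restart the timer
            lit   <= '1';
            count <= (others => '0');
          elsif lit = '1' then
            count <= count + 1;
            if count = to_unsigned(50_000_000, count'length) then  -- roughly 1 s at 50 MHz
              lit <= '0';
            end if;
          end if;
        end if;
      end process;
      led <= lit;
    end architecture;

One instance per command type gives a crude but trustworthy view of which commands actually reach your decoder.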
There are multiple reasons why code that runs fine in RTL simulation behaves differently in the FPGA, and it is important to consider all of them. ChipScope, suggested above, is definitely a step in the right direction and could give you a hint about where to look further. The usual suspects are:
The FPGA implementation flow was not executed properly. Did you have the right timing constraints, were they met during implementation (especially the P&R phase), and were the pin placements, I/O properties, and clock properties correct? You can usually find hints by inspecting the FPGA implementation reports. This is tedious, but sometimes necessary. An incorrect implementation flow can also result in FPGA builds that work or don't work depending on the run or on small unrelated changes (I have seen this problem many times!).
RTL/netlist discrepancies, e.g. due to incorrect use of `ifdef within the design or during the synthesis phase, selecting the wrong file for synthesis, or the same Verilog module being defined in multiple places. Often the hint can be found by inspecting the removed-flop list or the synthesis warnings.
Discrepancies between the RTL simulation and the board environment. These can be external, like clock/data alignment on the interface, but also internal: improper clock-domain crossing (CDC), or not handling clock or reset tree delays properly. Note that X-propagation and CDC are not modelled properly in RTL simulation unless you code in a certain way; problems with these can often only be seen in a netlist simulation environment. (A minimal CDC synchronizer sketch follows this list.)
Lastly, FPGA board problems, like a faulty clock source, a bad power supply, or heat, can also be at fault. They are worth checking, but I'd leave them as a last resort. Some folks keep a dedicated board/FPGA test design, proven to work on a known-good board, that catches some of these problems.
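For the CDC point above, the classic building block is a two-flop synchronizer on every single-bit signal that crosses into a new clock domain. A minimal VHDL sketch (names are arbitrary; the ASYNC_REG attribute is a Xilinx hint to keep the two flops together):

    library ieee;
    use ieee.std_logic_1164.all;

    entity sync_2ff is
      port (
        clk_dst  : in  std_logic;  -- destination clock domain
        async_in : in  std_logic;  -- signal generated in another clock domain
        sync_out : out std_logic   -- safe to use in the clk_dst domain
      );
    end entity;

    architecture rtl of sync_2ff is
      signal meta, stable : std_logic := '0';
      attribute ASYNC_REG : string;
      attribute ASYNC_REG of meta, stable : signal is "TRUE";
    begin
      process(clk_dst)
      begin
        if rising_edge(clk_dst) then
          meta   <= async_in;  -- may go metastable
          stable <= meta;      -- second stage gives it time to settle
        end if;
      end process;
      sync_out <= stable;
    end architecture;

Multi-bit buses need more care (gray coding, handshakes, or an async FIFO); a plain per-bit synchronizer is not enough for them.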
As a final note, the biggest return comes from investing in the simulation environment. Some folks think that since an FPGA can be debugged with ChipScope and reprogrammed quickly, there is no need for a good simulation environment. It probably depends on the size of the project, but my experience is that for most modern FPGA projects a good simulation environment saves a lot of the time otherwise spent in the lab digging through ChipScope and logic analyzer captures.
We are using a tool to convert the code into RTL.
Using those VHDL files, we would like to synthesize the design for an FPGA.
In the synthesis results, we see the following table:
Slice Logic Utilization    Used    Available    Utilization
Number of DSP48E1s           15          864             1%
I would like to search in VHDL files to see which operations use these units.
Is there any way to find them, or any documentation that shows which operations cause DSPs to be used?
There are a few ways that a DSP48 may be used in your VHDL.
It may be inferred. This is when the synthesis tool is smart enough to look at an operation you are doing (such as a multiply) and realize that it would be most efficient to implement it with a dedicated resource (a DSP48) instead of fabric/logic. (See the sketch after this list.)
It may be instantiated. This means the primitive was directly called out in your source file: the designer said "I know I want to use this piece of hardware, so I am going to call it out explicitly." This is the case where a text search for "DSP48" in your VHDL source files will find it.
It may be part of a core. If it is part of a core, you may or may not have visibility into it; for example, how the core is actually implemented may differ from the behavioral model used for simulation.
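As an illustration of the inference case, a registered multiply like the sketch below is usually enough for the synthesizer to pull in a DSP48 (the widths match the 25x18 DSP48E1 multiplier; the entity and signal names are just for illustration):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity mult_infer is
      port (
        clk : in  std_logic;
        a   : in  signed(24 downto 0);   -- fits the 25-bit DSP48E1 input
        b   : in  signed(17 downto 0);   -- fits the 18-bit DSP48E1 input
        p   : out signed(42 downto 0)
      );
    end entity;

    architecture rtl of mult_infer is
    begin
      process(clk)
      begin
        if rising_edge(clk) then
          p <= a * b;  -- typically mapped to a DSP48 rather than LUTs
        end if;
      end process;
    end architecture;

Searching your generated RTL for '*' (and for wide adder chains feeding registers) is therefore a quick way to spot where inference may be happening.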
In any case, as recommended by Russell, using the Xilinx tools to report primitive utilization per level of the design hierarchy is a good first pass at figuring out where the units come from. Additionally, you can always open FPGA Editor, see what the DSP48 units are called, and see which signals go into and out of each DSP48 for additional hints on where it sits in your design.
It sounds like you're trying to find your module-level utilization. I know that Xilinx ISE supports this: under Design Overview there's a report called Module Level Utilization that breaks down every module in your VHDL design and tells you where the registers, LUTs, BRAMs, and DSPs are used.
If you're unable to find it, look for any large multiplications in your design; large multiply/accumulate operations will synthesize into DSP48s.
Right now I am doing place and route in Encounter, but when I search the web I only find tutorials about routing in Quartus. Out of curiosity, I tried to find out the difference between the two tools, but there is no clear answer. When I look at the layouts the two tools produce, the Quartus layout looks like it is placed onto a fixed, predefined chip, whereas Encounter feels much more custom. So I suppose Quartus is for FPGAs and Encounter is for ASICs. Am I right? If not, please tell me the full story.
Encounter is a place-and-route tool for custom silicon, so it can pick any cell from a library, put it anywhere within a placement block, and route metal to it on any available layer as needed. The output of Encounter is a GDSII file describing the polygons that need to be created on each layer as part of the silicon manufacturing process.
An FPGA already has all of its transistors and wires placed within the device. Quartus (or ISE, for Xilinx) maps logic into LUTs (the basic logic unit within an FPGA) and figures out how to connect the LUTs using the available routing tracks between the logic blocks. The output of Quartus is a bitstream that specifies the value to load into each LUT on the device and which routing tracks to select/connect between the LUTs.
I'm designing an Ethernet MAC controller for a Spartan-3E FPGA. IOB usage has reached 109%. I proceeded with bitstream generation anyway and then encountered this error:
Too many comps of type "BUFGMUX" found to fit this device.
What does this mean?
(I'm pretty sure the Spartan-3E can handle Ethernet, since there is already an Ethernet Lite MAC IP for the Spartan-3E, and it has more pins than my module does. Why, then, is my design at 109% of the IOBs?)
I also tried commenting out the instantiated mac_transmit_module and mac_receive_module; the bitstream then generated successfully. Where did I go wrong?
Your design is simply too large to fit on the target FPGA. The fact that there is similar IP suggests that your implementation is somehow less efficient or has features that the other IP does not. There is no simple, one-size-fits-all solution to this problem.
Can I suggest that in the future you don't just include screen captures as documentation? They are very hard to read and most of the image is irrelevant. If there is a particular error message you want us to see, do a copy-paste into your question instead.
Firstly, you use 255 out of 232 IOBs. You have selected the xc3s500e-4fg320, which indeed has only 232 user I/Os, 56 of which are input-only. Maybe you selected the wrong part for synthesis?
If you are relatively sure you selected the right part, check the "IOB Properties" report in ISE, which lists all used IOBs. If that does not work (because the error may occur before the report is generated), you can always open the floorplanning tools with your UCF file to determine whether some LOCs are simply wrong. Do this on a dummy design with just your UCF and the floorplanner.
Secondly, the BUFGMUX message is telling you that you use too many global clock buffers in general (or, less likely, genuinely too many multiplexed clocks). When a design features many clocks, ISE has to use BUFGMUX primitives in addition to the BUFG primitives in order to route all of them. If you exceed the number of BUFGMUXs/BUFGs available on the device, you get that error.
So both errors point to either your design being too large, or a wrong part selection.
BUFGMUXs are used to buffer signals which are used as clocks.
In most designs, especially as a beginner, you should have only one clock, and all your processes should be of the same form: the clock signal in the sensitivity list and an if rising_edge(clock) then line inside. This is called synchronous design, and if you don't do it, all sorts of buggy results are likely when you try your code in a real chip. Don't use additional clocks until you have enough experience. You can tell when you have enough experience, because when you think of using another clock signal you'll think "surely I can find a way of sticking to one clock signal" :)
It sounds to me like you have if rising_edge(...) on all sorts of different signals in different processes. This makes the tools treat a lot of signals as clocks and hang a BUFGMUX off each one of them; not only do you run out of clock routing resources very quickly, you also get unpredictable behaviour.
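A minimal sketch of the single-clock style described above (the entity and signal names are made up; the point is that 'trigger' is sampled as data on the one real clock, never used inside rising_edge()):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity sync_template is
      port (
        clk     : in  std_logic;            -- the ONE clock in the design
        trigger : in  std_logic;            -- ordinary data input, NOT a clock
        count   : out unsigned(7 downto 0)
      );
    end entity;

    architecture rtl of sync_template is
      signal cnt    : unsigned(7 downto 0) := (others => '0');
      signal trig_d : std_logic := '0';
    begin
      process(clk)
      begin
        if rising_edge(clk) then
          trig_d <= trigger;
          if trigger = '1' and trig_d = '0' then  -- edge of 'trigger' detected as data...
            cnt <= cnt + 1;                       -- ...instead of writing rising_edge(trigger)
          end if;
        end if;
      end process;
      count <= cnt;
    end architecture;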
I'm working on a VHDL project for a Xilinx FPGA, and found myself at a cross road.
I need, for example, to add two signals (C = A + B), and found that Xilinx has a tool that can generate a component to do the job.
But this could also be implemented in standard VHDL: C <= A + B
If I use standard VHDL the code should be portable, but does this have lower throughput?
I mean, do the special components use DSP functions inside the FPGA etc. that make them faster, or can the synthesizer normally handle this?
Any time you can infer something, do so.
Performance is very rarely impacted, especially for simple things like adders and multipliers. Even RAM blocks are easy to infer for many purposes; I only tend to instantiate vendor components if I need a very specific vendor-block behaviour.
The DSP blocks will be used well if you write VHDL code for adders and multipliers of the appropriate widths (or smaller) with pipelining that matches what is available inside the block. If you want to be more advanced and use (for example) the opcode input of Xilinx's DSP block, then you will have to instantiate the block directly.
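As a rough example of pipelining that matches the block: registering the inputs, the raw product, and the output usually lets the tools pack everything into the DSP48's internal register stages (the widths and names here are just for illustration, and results may vary between devices and tool versions):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity dsp_pipelined_mult is
      port (
        clk : in  std_logic;
        a   : in  signed(17 downto 0);
        b   : in  signed(17 downto 0);
        p   : out signed(35 downto 0)
      );
    end entity;

    architecture rtl of dsp_pipelined_mult is
      signal a_r, b_r : signed(17 downto 0) := (others => '0');
      signal m_r      : signed(35 downto 0) := (others => '0');
    begin
      process(clk)
      begin
        if rising_edge(clk) then
          a_r <= a;          -- input register stage
          b_r <= b;
          m_r <= a_r * b_r;  -- multiply register stage
          p   <= m_r;        -- output register stage
        end if;
      end process;
    end architecture;

The inferred version stays portable: another vendor's tool will still build it, just out of whatever multiplier resources that device has.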