VHDL with no clk

I have a clock in my VHDL code but I don't use it. My process simply relies on a handshake: when one component finishes and produces an output, that output is in the sensitivity list of my FSM and then becomes the input to the next component, whose output is also in the sensitivity list of my FSM (so I know when that component finishes its computation), and so on.
Is this method wrong? It works in simulation and also in post-route simulation, but I get warnings like these:

    WARNING: HOLD High VIOLATION ON I WITH RESPECT TO CLK;
    WARNING: HOLD Low VIOLATION ON I WITH RESPECT TO CLK;

Are these warnings unimportant, or will my code damage my FPGA because it doesn't depend on a clock?

The warnings you are getting are timing violations. You get these because the tools detect that your design does not obey the necessary timing restrictions of the internal primitives.
For instance, inputs to lookup tables (one of the main building blocks inside an FPGA) need to be held stable for a specific time for the output to stabilize. This is very hard to guarantee when your entire timing relies only on the latencies and delays of the components themselves, and everything switches on a completely asynchronous basis.
Depending on your actual design (mostly its size and complexity), I'd wager that you'll end up with a lot of very-hard-to-debug errors once you get it inside an FPGA. You'll have a much, much easier time using a clock. It gives you a clear idea of when signals arrive where, and it allows you to use the internal tools to check your timing. You'll also find it much easier to interface to other devices, and your system will be less susceptible to noisy inputs.
So all in all, use a clock. You (probably) won't damage your FPGA by not doing so, but a clock will save you from tons of trouble.
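To illustrate, here is a minimal sketch of the clocked alternative (all entity, signal and state names here are invented): the FSM only advances on the clock edge and samples each component's done flag, instead of being level-sensitive to the components' data outputs.

    -- Hypothetical sketch: a clocked FSM sequencing two components via
    -- start/done handshakes instead of asynchronous sensitivity lists.
    library ieee;
    use ieee.std_logic_1164.all;

    entity ctrl_fsm is
      port (
        clk, rst : in  std_logic;
        done_a   : in  std_logic;   -- component A signals completion
        done_b   : in  std_logic;   -- component B signals completion
        start_a  : out std_logic;
        start_b  : out std_logic
      );
    end entity;

    architecture rtl of ctrl_fsm is
      type state_t is (IDLE, RUN_A, RUN_B, FINISHED);
      signal state : state_t;
    begin
      process (clk, rst)
      begin
        if rst = '1' then
          state <= IDLE;
        elsif rising_edge(clk) then
          case state is
            when IDLE     => state <= RUN_A;
            when RUN_A    => if done_a = '1' then state <= RUN_B;    end if;
            when RUN_B    => if done_b = '1' then state <= FINISHED; end if;
            when FINISHED => null;
          end case;
        end if;
      end process;

      start_a <= '1' when state = RUN_A else '0';
      start_b <= '1' when state = RUN_B else '0';
    end architecture;

All signal arrivals are now judged against one clock edge, which is exactly what the timing tools are built to analyse.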

Your code will most probably not damage your FPGA just because it doesn't depend on a clock. However, for synthesis you should always use registered (clocked) logic. Without a clock your design will not be controllable, because of timing, delay, routing, fan-out, and so on; this will make your FSM behave "mysteriously" once synthesized (even if it worked in simulation).
You'll find plenty of examples of good FSM implementation style with Google's help (search for Moore or Mealy FSM).

Definitely use a clock. And use only one clock throughout the design. This is the easiest way: the tools support this design style very well. You can often get away with a single timing constraint, especially if your inputs are slow and synchronous to the same clock.
When you have gained experience designing this way, you can move outside of it, but be ready for more analysis, more timing constraints, and potentially more build iterations while you learn the pitfalls of crossing clock domains and of asynchronous signals.

Related

Is clock usage recommended in VHDL design?

I am doing a small task where I have to count the pulses coming from two inputs. The requirement doesn't specify a clock. Currently I have a process that is triggered when either input changes, and the corresponding count is then increased.
My question is: should I use a clock for this design, make the process clock-sensitive, and then check whether the inputs have changed? Is it good practice to use clocks in VHDL design?
Sub-question: I have to double-buffer the input data. Does this mean I have to use a clock and pass the inputs through two flip-flops, or is there a way to double-buffer data without using a clock?
It is possible to design asynchronous circuits in VHDL. The design rules are a bit different from synchronous design (using request and acknowledge signals).
Your need is not very complex and could be designed without a clock, but you have to be careful with your storage elements. Especially if you work with an FPGA: these devices are not meant to work asynchronously, so inspect the synthesizer results carefully.
(If it's school homework, use a clock ;) In digital design, clock usage is the default case; asynchronous logic is an advanced concept.)
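If you do end up using a clock, the standard pattern for the double-buffering in your sub-question is two flip-flops per input followed by edge detection. Here is a minimal sketch for one of your two inputs (names invented; duplicate it for the second input, and note that pulses shorter than one clock period can be missed):

    -- Hypothetical sketch: synchronize an asynchronous input through two
    -- flip-flops (the "double buffer"), then count its rising edges.
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity pulse_counter is
      port (
        clk     : in  std_logic;
        async_i : in  std_logic;               -- asynchronous pulse input
        count_o : out unsigned(15 downto 0)
      );
    end entity;

    architecture rtl of pulse_counter is
      signal sync  : std_logic_vector(1 downto 0) := "00";
      signal prev  : std_logic := '0';
      signal count : unsigned(15 downto 0) := (others => '0');
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          sync <= sync(0) & async_i;            -- two-stage synchronizer
          prev <= sync(1);
          if sync(1) = '1' and prev = '0' then  -- rising-edge detect
            count <= count + 1;
          end if;
        end if;
      end process;
      count_o <= count;
    end architecture;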

VHDL behavioural vs structural performance

I was wondering whether, in terms of "performance", there is some kind of difference between structural and behavioural VHDL. I know that nowadays it is more common to write behavioural rather than structural code, but since I'd like an understanding in terms of performance, I have been thinking that maybe there is some difference...
There is no hardware-related reason to prefer one form or the other.
It may be that one form leads to faster simulation than the other; I haven't seen any evidence for this in general, but then I haven't looked. It is true that after synthesis, the design is translated into a structural form, and post-synthesis simulation is slow, but this is due to the sheer size of the resulting structural version expressed as thousands of individual gates and their interconnections.
What matters more is the quality of synthesis results: It should be possible to write a design in both forms and have it synthesise to essentially the same hardware. And this seems to be generally true, in my experience.
Sometimes you will find synthesis tools have difficulty efficiently translating a construct (usually behavioural) but not as frequently as in the past.
What matters most (unless you are pushing the boundaries of speed or FPGA size) is clarity, leading to readability, reliability, efficiency, testability, maintainability and so on. If you can't understand it you can't see the inefficiencies or even test it properly.
Here, structural VHDL has a role to play at the top level: dividing a system into blocks like CPU, memory interface, FFT processor, UART, SPI and so on. Sometimes hierarchically, so you might want to divide the memory interface into refresh logic, error correction, address multiplexing, and so on.
But most blocks (for example, tasks that a single state machine can handle) are clearest and simplest when expressed behaviourally. So in a UART you might have two separate processes for TX and RX, while an SPI interface (which sends and receives on the same SPI clock) is probably best as a single behavioural process.
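To make the distinction concrete, here is a toy example: a behavioural 2:1 mux, then a structural 4:1 mux built from three instances of it. Either style should synthesise to essentially the same hardware.

    library ieee;
    use ieee.std_logic_1164.all;

    -- Behavioural: describe what the circuit does.
    entity mux2 is
      port (a, b, sel : in std_logic; y : out std_logic);
    end entity;

    architecture behavioural of mux2 is
    begin
      y <= b when sel = '1' else a;
    end architecture;

    library ieee;
    use ieee.std_logic_1164.all;

    -- Structural: build a bigger block by wiring up instances.
    entity mux4 is
      port (d   : in  std_logic_vector(3 downto 0);
            sel : in  std_logic_vector(1 downto 0);
            y   : out std_logic);
    end entity;

    architecture structural of mux4 is
      signal y0, y1 : std_logic;
    begin
      u0 : entity work.mux2 port map (a => d(0), b => d(1), sel => sel(0), y => y0);
      u1 : entity work.mux2 port map (a => d(2), b => d(3), sel => sel(0), y => y1);
      u2 : entity work.mux2 port map (a => y0,  b => y1,   sel => sel(1), y => y);
    end architecture;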

How do I debug Verilog code where simulation functions as intended, but implementation doesn't?

I'm a bit stumped.
I have a fairly large Verilog module that I've tested in simulation (iSim), and it functions as I want. Now I've hooked it up in real life to another device using SPI, and some stuff works and some stuff doesn't.
For example,
I can send a value using command A, and verify that the right value was received using command B. Works no problem.
But if I send a value using command C, I cannot verify that it was received using command D. In simulation it works fine, so I feel I can't really gain anything from simulating any more.
I have looked at the signals on a logic analyzer, and the controller device (not my design) sends the right messages. When I issue command B, I can see the correct return values from my device (so I know SPI works, anyway). I don't know whether C or D work correctly. D just returns 0s, so maybe C didn't work in the first place. There is no way to step through Verilog, and this module is packaged as IP for Vivado.
Here are two screenshots. First is simulation (I send 5, then 2, then I expect it to return 4 on the next send, which it does; followed by zeros).
Here is what I get in reality (the first two bytes don't matter; the 5 is left over from a previously sent value):
Here is a command (B) that works in returning a correct value (it responds to the 0x01 being sent):
Does anyone have any advice for debugging this? I have literally no idea how to proceed.
I can't really reproduce this behaviour in simulation.
Since you are synthesizing to an FPGA, you have a few more options for debugging your synthesized, on-chip design. As you are using Vivado, you can use ChipScope to look at any signal in your system, allowing you to view a waveform of that signal over time just as you would in simulation (though in a more restricted way). By including the ChipScope IPs in your synthesis, you can send waveform data back to the Vivado software, which will display a waveform of your selected signals to help you see what's going on inside the FPGA as the system runs. (Note: if you were using Altera's tools, their equivalent is called SignalTap; it's pretty much the same thing.)
There are numerous tutorials online on how to incorporate and run ChipScope; here's one from the Xilinx website:
http://www.xilinx.com/support/documentation/sw_manuals/xilinx2012_4/ug936-vivado-tutorial-programming-debugging.pdf
Many others use ISE, but the steps are very similar, as both flows typically involve the coregen tool (though I think you can also add ChipScope via the synthesis flow, so there are multiple options for incorporating it into your design).
Once it is on the FPGA, you have access to what is effectively an internal logic analyzer. Note that it does take up some LEs on the FPGA, and it can take up a fair amount of block RAM depending on how many samples of your signals you want to capture.
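For reference, once you have generated the debug core, instantiating it is a few lines next to the signals you want to capture. This is only a sketch; the component name and probe widths below are placeholders that depend entirely on how you configure the IP:

    -- Hypothetical instantiation of a generated ILA/ChipScope debug core,
    -- placed in your top-level architecture. "ila_0" and the probe names
    -- are whatever your IP configuration produced.
    u_ila : entity work.ila_0
      port map (
        clk    => clk,          -- sample on the same clock as your logic
        probe0 => spi_state,    -- e.g. your SPI FSM state vector
        probe1 => spi_rx_byte   -- e.g. the most recently received byte
      );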
Tim's answer provides a good description of how to deal with on-chip debugging if you are designing purely for an ASIC, so see his answer if you want more information about standard, non-FPGA debugging solutions.
In cases like this you might want to think about adding extra logic that is used just for debugging; 'design for debug' is the common term for thinking about this kind of logic.
So you have one chip interface (SPI) that you don't know works correctly. Since it seems not to be working, you can't trust debugging over this interface: if you get an odd result, you can't determine what it means.
Since you're working on an FPGA, are there any interfaces other than SPI which you can get working correctly? Maybe a 7-segment display, LEDs, JTAG, VGA, etc.?
Try to think of other creative ways to get data out of your chip that don't require the SPI interface.
If you have 4 LEDs, A through D, can you light up the corresponding LED for 1 second each time a command of that type is received?
Can you have a 7-seg display show the current state of your SPI receiver's state machine, or have it indicate an error code if an unknown command is received?
Can you output a binary sequence of the incoming SPI bitstream over VGA to a monitor?
Once you can start narrowing down with real data what is actually happening inside your hardware, you can narrow the problem space and go inspect for possible causes.
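As a sketch of the LED idea (the clock frequency and all signal names here are assumptions), stretch each received-command strobe so it stays visible to the eye; duplicate one of these per command/LED:

    -- Hypothetical sketch: light an LED for ~1 s whenever cmd_seen_i
    -- pulses for one cycle, assuming a 50 MHz clock.
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity led_stretch is
      port (
        clk        : in  std_logic;   -- 50 MHz assumed
        cmd_seen_i : in  std_logic;   -- one-cycle strobe per command
        led_o      : out std_logic
      );
    end entity;

    architecture rtl of led_stretch is
      signal timer : unsigned(25 downto 0) := (others => '0');
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          if cmd_seen_i = '1' then
            timer <= to_unsigned(50_000_000, timer'length); -- reload ~1 s
          elsif timer /= 0 then
            timer <= timer - 1;
          end if;
        end if;
      end process;
      led_o <= '1' when timer /= 0 else '0';
    end architecture;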
There are multiple reasons why code that runs fine in RTL simulation behaves differently in the FPGA, and it is important to consider all of them. ChipScope, suggested above, is definitely a step in the right direction and can give you a hint about where to look further. The usual suspects are:
The FPGA implementation flow was not executed properly. Did you have the right timing constraints, and were they met during implementation, especially in the P&R phase? What about pin placements, I/O properties, and clock properties? Usually you can find hints by inspecting the FPGA implementation reports. This is a tedious part, but sometimes needed. An incorrect implementation flow can also produce FPGA builds that work or don't depending on the run or on small unrelated changes (I have seen this problem many times!).
RTL/netlist discrepancies, e.g. due to incorrect `ifdef usage within the design or during the synthesis phase, selecting the wrong file for synthesis, or the same Verilog module being defined in multiple places. Often a hint can be found by inspecting the list of removed flops or the synthesis warnings.
Discrepancies between the RTL simulation and the board environment. These can be external, like clock/data alignment on an interface, but also internal: improper CDC, or not handling clock or reset tree delays properly. Note that X-propagation and CDC are not modelled properly in RTL simulation unless you code in a certain way; problems with those can often only be seen in a netlist simulation environment.
Lastly, FPGA board problems, such as a faulty clock source, a bad power supply, or heat, can also be at fault. They are worth checking, but I'd leave them as a last resort. Some folks have a dedicated board/FPGA test design, proven to work on a known-good board, that will catch some of these problems.
As a final note, the biggest return comes from investing in the simulation environment. Some folks think that since an FPGA can be debugged with ChipScope and reprogrammed quickly, there is no need for a good simulation environment. It probably depends on the size of the project, but my experience is that for most modern FPGA projects a good simulation environment saves a lot of the time otherwise spent in the lab digging through ChipScope and logic analyzer captures.

Efficient use of ALMs (Adaptive Logic Modules)?

I have a Verilog design that compiles to ~15K LEs on a Cyclone IV (EP4CE22F17C6N). When I compile the same code on a Cyclone V (5CEFA2F23C8N), it takes ~8500 ALMs. Based on Altera's own LE equivalency for this particular Cyclone V, that would be ~20K LEs. Now, I realize that such estimates are highly dependent on the particular design, but a 33% increase in "effective" resource utilization seems like a lot.
So it makes me wonder if there are design tips/tricks/etc. for making more efficient use of ALMs. In particular, I'm looking for Verilog constructs that would improve the register density, fabric density, dense packing, etc.
I would agree with the comments above that you generally shouldn't need to optimise; however, it's always important to check that your code actually maps well to the chosen architecture. Specifically:
Reset
Using the wrong kind of reset for your architecture can cause problems. It's also very easy to accidentally cause the synthesis tool to insert logic to emulate a clock-enable. For full details see this answer. For Altera you should be using an asynchronous reset which is synchronously de-asserted.
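A typical pattern for this is a small two-flop reset synchronizer; here is a minimal sketch with invented names (the reset asserts asynchronously and de-asserts synchronously):

    -- Reset synchronizer: asserts rst_n_o asynchronously when
    -- ext_rst_n_i goes low, and releases it synchronously to clk
    -- two cycles after ext_rst_n_i goes high again.
    library ieee;
    use ieee.std_logic_1164.all;

    entity reset_sync is
      port (
        clk         : in  std_logic;
        ext_rst_n_i : in  std_logic;  -- raw asynchronous reset, active low
        rst_n_o     : out std_logic   -- safe to use as async reset in fabric
      );
    end entity;

    architecture rtl of reset_sync is
      signal ff : std_logic_vector(1 downto 0);
    begin
      process (clk, ext_rst_n_i)
      begin
        if ext_rst_n_i = '0' then
          ff <= (others => '0');      -- asynchronous assertion
        elsif rising_edge(clk) then
          ff <= ff(0) & '1';          -- synchronous de-assertion
        end if;
      end process;
      rst_n_o <= ff(1);
    end architecture;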
Priority of control signals
In Altera:
Asynchronous Clear, aclr—highest priority
Asynchronous Load, aload
Enable, ena
Synchronous Clear, sclr
Synchronous Load, sload
Data In, data—lowest priority
Latches
Easy to grep from the reports, but unless you're absolutely sure it's intentional, latches are generally bad mmmmkay.
Synthesis
There are many options available for tweaking the behaviour of the synthesis process. Here are a few that will affect your results:
ALM_REGISTER_PACKING_EFFORT
This option guides the Fitter when packing registers into ALMs.
MUX_RESTRUCTURE
Allows the Compiler to reduce the number of logic elements required to implement multiplexers in a design.
OPTIMIZATION_TECHNIQUE
Specifies the overall optimization goal for Analysis & Synthesis: attempt to maximize performance, minimize logic usage, or balance high performance with minimal logic usage.
Bear in mind that if your device isn't getting too full, the tool won't have much "incentive" to minimise logic utilisation unless you explicitly tell it to.
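For what it's worth, all of these can be set in the project's .qsf file (or through Assignments > Settings). The assignment names above are Quartus's own; the example values below are typical, but check the exact value spellings against the Quartus Settings File Reference:

    # Example .qsf entries (values are illustrative)
    set_global_assignment -name OPTIMIZATION_TECHNIQUE AREA
    set_global_assignment -name MUX_RESTRUCTURE ON
    set_global_assignment -name ALM_REGISTER_PACKING_EFFORT HIGH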

Is it necessary to register both inputs and outputs of every hardware core?

I am aware of the need to synchronize all inputs to an FPGA before using those inputs in order to avoid metastability. I'm also aware of the need to synchronize signals that cross clock domains within a single FPGA. This question isn't about crossing clock domains.
My question is whether it is a good idea to routinely register all of the inputs and outputs of every internal hardware module in an FPGA design. The rationale is that we want to break up long chains of combinational logic in order to improve the clock rate so that we can meet the timing constraints for a chosen clock rate. This will add additional cycles of latency proportional to the number of modules that a signal must cross. Is this a good idea or a bad idea? Should one register only inputs and not outputs?
Answer Summary
Rule of thumb: register all outputs of internal FPGA cores; there is no need to register inputs. If an output already comes from a register, such as the state register of a state machine, then there is no need to register it again.
It is difficult to give a hard and fast rule. It really depends on many factors.
It could:
Increase Fmax by breaking up combinatorial paths
Make place and route easier by allowing the tools to spread logic out in the part
Make partitioning your design easier, allowing for partial rebuilds.
It will not magically solve critical path timing issues. If there is a critical path inside one of your major "blocks", then it will still remain your critical path.
Additionally, you may encounter more problems, depending on how full your design is on the target part.
These things said, I lean to the side of registering outputs only.
Registering all of the inputs and outputs of every internal hardware module in an FPGA design is a bit of overkill. If an output register feeds an input register with no logic between them, then 2x the required registers are consumed. Unless, of course, you're doing logic path balancing.
Registering only inputs and not outputs of every internal hardware module in an FPGA design is a conservative design approach. If the design meets its performance and resource utilization requirements, then this is a valid approach.
If the design is not meeting its performance/utilization requirements, then you've got to do the extra timing analysis in order to reduce the registers in a given logic path within the FPGA.
My question is whether it is a good idea to routinely register all of the inputs and outputs of every internal hardware module in an FPGA design.
No, it's not a good idea to routinely introduce registers like this.
Doing both inputs and outputs is redundant: there'll be no logic between one block's output register and the next block's input register.
If a block contains just a single AND gate, registering its ports is overkill. It depends on the timing and complexity of the design.
Register stages need to be properly thought about and designed. What happens when an output FIFO fills, or some other stall condition occurs? Do all signals have the right register delay so that they appear at the right stage in the right cycle? Adding registers isn't necessarily as simple as it seems.
The rationale is that we want to break up long chains of combinational logic in order to improve the clock rate so that we can meet the timing constraints for a chosen clock rate. This will add additional cycles of latency proportional to the number of modules that a signal must cross. Is this a good idea or a bad idea?
In this case it sounds like you must introduce registers, and you shouldn't read the previous points as "don't do it". Just don't do it blindly. Think about the control logic around the registers and the (now) multi-cycle nature of the logic. You are now building a "Pipeline". Being able to stall a pipeline properly when the output can't write is a huge source of bugs.
Think of cars moving on a road. If one car applies its brakes and stops, all the cars behind need to stop as well. If the first car's brake lights aren't working, the next car won't get the signal to brake, and it'll crash. Similarly, each stage in a pipeline needs to tell the previous stage when it is stopping for a moment.
What you can find is that instead of having long timing paths along your computation paths, going from input to output, you end up with long timing paths on the enables controlling all these register stages, going from output to input.
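To make that concrete, here is a minimal sketch of one pipeline register stage with a stall (valid/ready) handshake; all names and the protocol details are assumptions. Notice that up_ready depends combinationally on dn_ready: chain many of these stages and you get exactly the long backwards enable path described above.

    -- Hypothetical sketch of one pipeline register stage with stall
    -- handling: data/valid flow forward, ready (the "brake light")
    -- flows backward.
    library ieee;
    use ieee.std_logic_1164.all;

    entity pipe_stage is
      generic (W : natural := 8);
      port (
        clk, rst : in  std_logic;
        -- upstream side
        up_data  : in  std_logic_vector(W-1 downto 0);
        up_valid : in  std_logic;
        up_ready : out std_logic;
        -- downstream side
        dn_data  : out std_logic_vector(W-1 downto 0);
        dn_valid : out std_logic;
        dn_ready : in  std_logic
      );
    end entity;

    architecture rtl of pipe_stage is
      signal data  : std_logic_vector(W-1 downto 0);
      signal valid : std_logic;
    begin
      -- This stage can accept a word when it is empty or being drained.
      up_ready <= dn_ready or not valid;

      process (clk)
      begin
        if rising_edge(clk) then
          if rst = '1' then
            valid <= '0';
          elsif dn_ready = '1' or valid = '0' then
            data  <= up_data;
            valid <= up_valid;
          end if;
        end if;
      end process;

      dn_data  <= data;
      dn_valid <= valid;
    end architecture;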
Another option you have is to let the tools work for you. Add a bunch of registers at the end of your complete system (if you want to pipeline more) and activate retiming in your synthesis tool. This will (hopefully) move the registers in between the logic where they are most useful.
