I have a problem that I think I can solve with regular C-like operations, but I was wondering if there is a better way, something like a regexp for VHDL.
The problem is that I have a string/collection of bits, "101010101010101", and I want to look for the pattern "1010" inside it (with no overlapping).
What are my best options for attacking this problem?
Edit: I'd like to mention that the input is parallel, all the bits at once rather than in serial.
It is still possible to implement this as an FSM - but is there a more efficient way?
If all you want to do is find a pattern within a vector, then you can just iterate over it. Assuming "downto" vectors:
process (vec, what_to_find)
begin
    found <= 0;
    for start in vec'high downto vec'low + what_to_find'length - 1 loop
        if vec(start downto start - what_to_find'length + 1) = what_to_find then
            found <= start;
        end if;
    end loop;
end process;
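For reference, a minimal sketch of the declarations this process assumes (the widths are only illustrative):

signal vec          : std_logic_vector(15 downto 0);
signal what_to_find : std_logic_vector(3 downto 0);
signal found        : integer range 0 to 15;  -- MSB index of the last match found; stays 0 if nothing matched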
Depending on the sizes of your input and search vectors compared to the target device, this will be a reasonable or unreasonable amount of logic!
VHDL does not have built-in regex support; however, what you are planning to solve is a pattern matching problem. Basically, you build a state machine (which is what happens when a regular expression is evaluated) and use it to match the input. The simplest approach is to check whether the first n bits match your pattern, then shift and continue. Longer or more interesting patterns, e.g. those incorporating quantifiers, matching groups etc., require a bit more.
There are numerous approaches to doing that (try googling "vhdl pattern matching"; it is used, e.g., for network traffic analysis), and I even found one that would automatically generate the VHDL. I would guess, however, that a specialized hand-made version for your problem would be rather more efficient.
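The question says the input arrives in parallel, but to illustrate the shift-and-compare idea for a serial stream, a minimal sketch (one input bit per clock; all names are illustrative) could look like this:

-- declarations (architecture declarative part)
signal shift_reg : std_logic_vector(3 downto 0) := (others => '0');
signal match     : std_logic;

-- in the architecture body
process (clk)
begin
    if rising_edge(clk) then
        -- shift the new bit in and compare the resulting 4 bits against the pattern
        shift_reg <= shift_reg(2 downto 0) & bit_in;
        if (shift_reg(2 downto 0) & bit_in) = "1010" then
            match     <= '1';
            shift_reg <= (others => '0');  -- clear so matches cannot overlap
        else
            match <= '0';
        end if;
    end if;
end process;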
There is no generally applicable VHDL solution for that kind of pattern
matching; the solution should be driven by the requirements, since size and
speed can vary greatly for that kind of design.
So, if timing allows ample time to do an all-parallel compare and filtering
of overlapping patterns, and there is plenty of hardware to implement that,
then you can do a parallel compare.
For an all-parallel implementation without an FSM and a clock, you can make a
function that takes the pattern and the collection, and returns a match
indication as a std_logic_vector with a '1' at the start of each match.
The function can then be used in a concurrent assignment:
match <= pattern_search_collection(pattern, collection);
The function can be implemented with something along the lines of:
function pattern_search_collection(pat : std_logic_vector;
col : std_logic_vector) return std_logic_vector is
...
begin
-- Code for matching with overlap using loop over all possible positions
-- Code for filtering out overlaps using loop over all result bits
return result;
end function;
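One possible way of filling in that skeleton, as a sketch only (it assumes both arguments use descending ranges and that the pattern is not longer than the collection; a '1' in the result marks the MSB of each non-overlapping match):

function pattern_search_collection(pat : std_logic_vector;
                                   col : std_logic_vector) return std_logic_vector is
    -- Normalise both arguments to (length - 1 downto 0)
    constant p      : std_logic_vector(pat'length - 1 downto 0) := pat;
    constant c      : std_logic_vector(col'length - 1 downto 0) := col;
    variable result : std_logic_vector(col'length - 1 downto 0) := (others => '0');
    variable skip   : natural := 0;
begin
    -- Scan from the MSB end; "skip" suppresses start positions that would
    -- overlap a match already found
    for start in c'high downto p'high loop
        if skip > 0 then
            skip := skip - 1;
        elsif c(start downto start - p'high) = p then
            result(start) := '1';  -- mark the start (MSB) of this match
            skip := p'high;        -- block the next pat'length - 1 start positions
        end if;
    end loop;
    return result;
end function;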
I have a VHDL BCD counter whose output is an integer value (digit).
But when I simulate the code in Xilinx ISE, it shows the output in the waveform as a binary value. The code works, but the output should be an integer and it is not. I have tested this code in ModelSim and the output is correct, shown as an integer value. The problem appears after synthesis too, where the value is binary.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity bcdcnt is
Port ( clk : in STD_LOGIC;
digit : out INTEGER RANGE 0 TO 9);
end bcdcnt;
architecture Behavioral of bcdcnt is
begin
count: PROCESS(clk)
VARIABLE temp : INTEGER RANGE 0 TO 10;
BEGIN
IF (clk'EVENT AND clk = '1') THEN
temp := temp + 1;
IF (temp = 10) THEN temp := 0;
END IF;
END IF;
digit <= temp;
END PROCESS count;
end Behavioral;
That's what synthesis does.
That's what synthesis MUST do: it translates your high-level design into the resources in your FPGA or ASIC, which are binary. So, what's the problem here?
If you need to simulate the post-synth result, the usual approach is to create a wrapper entity that takes the correct port types, and translates between those and the post-synthesis netlist component.
Then, simulation should work with either the original entity, or this wrapper entity, both of which have integer ports.
Better still, you can re-use the same entity, and add the wrapper as a second architecture, thus guaranteeing that it uses the same interface (ports).
(You can even instantiate both the original and the post-synth netlist in its wrapper in the testbench, in parallel, with a comparator on their outputs, to check that they both do the same thing. But note there will be gate-level delays between them; usually you check outputs only on clock edges, so these do not matter.)
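A minimal sketch of such a wrapper for the counter above (the netlist component name and its binary port are assumptions; adjust them to whatever your tools actually generate):

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

architecture PostSynthWrapper of bcdcnt is
    -- assumed name and ports of the post-synthesis netlist component
    component bcdcnt_netlist
        port (clk   : in  STD_LOGIC;
              digit : out STD_LOGIC_VECTOR(3 downto 0));
    end component;
    signal digit_slv : STD_LOGIC_VECTOR(3 downto 0);
begin
    dut : bcdcnt_netlist
        port map (clk => clk, digit => digit_slv);

    -- translate the binary netlist output back to the entity's integer port
    digit <= to_integer(unsigned(digit_slv));
end architecture PostSynthWrapper;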
Another approach is to restrict port types on the top level of the design to binary types like std_logic_vector. This plays nicer with badly designed tools, like ISE, where the automatically generated testbench will have binary port types (I generally edit them back to the correct ones; it's almost easier to write TBs from scratch).
But it restricts you to using an obscure and complex design style instead of higher level abstractions like Integers.
It's bad - really bad - that this approach is taught and encouraged so widely. But it is, and sometimes you'll just have to live with it. (Even in this approach, there's no reason to avoid decent abstractions internal to the FPGA, as long as the synthesis tool understands them).
A third approach - roughly, "trust, but verify" - is to trust that synthesis tools are competently written - which is usually true - and forget about post-synthesis simulation.
Just verify the design thoroughly at the behavioural level in simulation, then synthesise it, and test in live FPGA.
99% of the time (unless you're writing really weird VHDL), synth and P&R have done the right thing, and any differences you see are due to the aforementioned I/O timings (gate delays at the I/O pins). Then model these in the testbench and/or wrapper until you see the same behaviour in both (fix anything that needs fixing, and re-synthesise).
In this approach you only need to bother with the (MUCH slower) post-synth and post-PAR simulations if you need to track down a suspected synthesis tool bug.
This does happen: I've seen two in a quarter century.
Most of the time I just use the third approach.
I have the following from a beginner's VHDL tutorial:
rising_edge: block(clk'event and clk = '1')
begin
result <= guarded input or force after 10ns;
end block rising_edge
The explanatory text is
"Essentially I have a block called rising_edge, and it's a block with a guard condition which does the following, it checks that we have an event on the clock, and that the clock is equal to one, so we're effectively looking for the so called rising_edge. We're looking for the event where the clock goes from 0 to 1, and if it does, then we can conditionally assign the results, so you'll see that the result variable here says that it is a guarded input or force after 10 ns might seem a bit confusing, but consider it without the guarded keyword. All we're doing is we're assigning the result of the evaluation of input or force, and we're doing it in a guarded setup. So, in this case, the assignment of the signal result is only executed if the guard signal is actually true, and in our example it means that the assignment of the expression, which is input or force, will only happen on the rising_edge of the clock because that's on guard condition."
Now I've read this over and over and searched on the net, but have come up blank as to what this is actually doing. Can someone please gently explain its purpose?
A block is essentially a grouping of concurrent statements. In terms of practical usage, it is very similar to a process, only it has a limited scope which allows component-style signal mapping (with port and port map). It can be used to improve readability (see this question) and really not much else. Blocks are fairly rarely used and frequently not supported for synthesis (see here). To my (limited) knowledge, the use of blocks has no advantage other than readability.
Because your block statement contains a guard condition (clk'event and clk = '1' is the guard condition here), it is a guarded block. Inside a guarded block, signal assignments that use the guarded keyword (as in your example) will only be executed if the guard condition evaluates to true.
The entire assignment that has been guarded (i.e. in your case input or force after 10ns) will only be executed when the guard condition evaluates to true, i.e. on the rising edge of clk. Thus, for all intents and purposes this block has the same behaviour as:
process(clk)
begin
if clk'event and clk = '1' then
result <= input or force after 10 ns;
end if;
end process;
I will say, though, this is a terrible example. For one thing, as others have stated, the usage of blocks is very rare and they are generally only used in quite advanced designs. The usage of clk'event and clk = '1' has been discouraged since 1993 (see here). It should also be mentioned again that the usage of rising_edge as a label is a terrible idea, as is the use of force for a signal name (in VHDL-2008, force is a reserved word that can be used to force a signal to a value).
Working from the idea that this is supposed to be a beginners tutorial, and with the lack of any explanation as to why such an unusual style has been used, a much more conventional implementation would be:
process (clk)
begin
if (rising_edge(clk)) then
result <= input or force after 10 ns;
end if;
end process;
A couple of points to note:
This assumes that input and force are either signals, or inputs to the entity.
It is unusual to model signal assignment delays if your code is going to be implemented in a real hardware device.
The code in your question uses after 10ns;, which is not valid; you need a space between the value and the units (as in my code).
The code in your question uses rising_edge as an identifier, when this is already defined as a function in the standard IEEE libraries (since VHDL-93, I believe).
The code in your question uses force as a signal name, when this is also a reserved word in the language since VHDL-2008.
My advice to you is to find a different tutorial. The quote you posted is not clearly written, and the code you posted appears to be sending you down a strange path. All I can think is that the tutorial is in fact very, very old.
A real junior question with, hopefully, a junior answer, regarding one of the main assignments in VHDL (concurrent selected signal assignment): can anyone explain what a VHDL compiler would synthesise the following description into?
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;

ENTITY Q2 IS
    PORT (a, b, c, d : IN  std_logic;
          EW_NS      : OUT std_logic);
END ENTITY Q2;

ARCHITECTURE hybrid OF Q2 IS
    SIGNAL INPUT : std_logic_vector(3 DOWNTO 0);
BEGIN
    INPUT <= (a & b & c & d);  -- concatenation

    WITH INPUT SELECT
        EW_NS <= '1' WHEN "0001" | "0010" | "0011" | "0110" | "1011",
                 '0' WHEN OTHERS;
END ARCHITECTURE hybrid;
Why do I ask? Well, I have previously gone about things the wrong way, i.e. describing things in VHDL before making a block diagram of the components needed. I would envisage this being synthesised as a group of AND-gate logic?
Any help would be really helpful.
Thanks D
You need to look at the user guide for your target FPGA, and understand what is contained within one 'logic element' ('slice' in Xilinx terminology). In general an FPGA does not implement combinatorial logic by connecting up discrete gates like AND, OR, etc. Instead, a logic element will contain one or more 'look-up tables', with typically four (but now 6 in some newer devices) inputs. The inputs to this look up table (LUT) are the inputs to your logic function, and the output is one of the outputs of the function. The LUT is then programmed as a ROM, allowing your input signals to function as an address. There is one ROM entry for every possible combination of inputs, with the result being the intended logic function.
A function with several outputs would simply use several of these LUTs in parallel, with the same inputs, one LUT for each of the function's outputs. A function requiring more inputs than the LUT has (say, 7 inputs, where a LUT has only 4) simply combines two LUTs in parallel, using a multiplexer to choose between the outputs of the two LUTs. This final multiplexer uses one of the input signals as its control, and again every possible combination of inputs is accounted for.
This may sound inefficient for creating something simple like an AND gate, but the benefit is that this simple building block (a LUT) can implement absolutely any combinatorial function. It's also worth noting that an FPGA tool chain is extremely good at optimising logic functions in order to simplify them, and to better map them into the FPGA. The LUT provides a highly generic element for these tools to target.
A logic element will also contain some dedicated resources for functions that aren't well suited to the LUT approach. These might include dedicated carry chains for adders, multiplexers for combining the outputs of several LUTs, and registers (most designs are synchronous). LUTs can also sometimes be configured as small shift registers or RAM elements. External to the logic elements, there will be more specific blocks like large multipliers, larger memories, PLLs, etc., none of which can be as efficiently implemented using LUT resources. Again, this will all be explained in the user guide for your target FPGA.
Back in the day, your code would have been implemented as a single 74150 TTL circuit, which is a 16-to-1 mux. You have a 4-bit select (INPUT), and this selects one of 16 inputs to the chip, which is routed to a single output (EW_NS). The 74150 is obsolete and I can't find any datasheets, but it's easy to find diagrams of what an 8-to-1 mux looks like (here, for example). The 16-to-1 is identical, but everything is wider. My old TI databook shows basically exactly the diagram at this link, doubled up.
But - wait. Your problem is easier, because you're not routing real inputs to the output - you're just setting fixed data values. On the '150, you do this by wiring 5 of the 16 inputs to 1, and the remaining 11 to 0. This makes the logic much easier.
The 74150 has basically exactly the same functionality as a 4-input look-up table (where the fixed look-up data is the same as the fixed levels at the '150 inputs), so it's trivial to implement your entire circuit in a single LUT in an FPGA, as per scary_jeff's answer, rather than using a NAND-level implementation. In a proper chip, though, it would be implemented as a sum-of-products, or something similar (exactly what's in the linked diagram). In this case, draw a K-map and find a minimum solution. My 2 minutes on the back of an envelope comes up with three 3-input AND gates driving a 3-input OR gate. I'll leave it as an exercise to you to check this :)
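For reference, one possible minimised sum-of-products that fits that description (check it against your own K-map, as suggested above; this is only an illustration of the kind of logic the tools could reduce it to):

EW_NS <= (not a and not b and d)
      or (not a and c and not d)
      or (not b and c and d);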
Let's suppose I have to test different bits of a std_logic_vector. Would it be better to implement one single process that loops over each bit, or to instantiate n processes using for-generate, where each process tests one bit?
FOR-LOOP
my_process: process(clk, reset) begin
if rising_edge (clk) then
if reset = '1' then
--init stuff
else
for_loop: for i in 0 to n loop
test_array_bit(i);
end loop;
end if;
end if;
end process;
FOR-GENERATE
for_generate: for i in 0 to n generate begin
my_process: process(clk, reset) begin
if rising_edge (clk) then
if reset = '1' then
--init stuff
else
test_array_bit(i);
end if;
end if;
end process;
end generate;
What would be the impact on FPGA and ASIC implementations in these cases? What is easier for the CAD tools to deal with?
EDIT:
Just adding a response I gave to someone who was helping, to make my question clearer:
For instance, when I ran a piece of code using for-loops in ISE, the synthesis summary gave me a fair result, but it took a long while to compute everything. When I re-coded my design, this time using for-generate and several processes, I used a bit more area, but the tool was able to compute everything much faster and my timing result was better as well. So, does this imply a rule that it is always better to use for-generate at the cost of extra area and lower complexity, or is it one of those cases where I have to verify every single implementation possibility?
Assuming relatively simple logic in the reset and test functions (for example, no interactions between adjacent bits), I would have expected both to generate the same logic.
Understand that since the entire for loop is executed in a single clock cycle, synthesis will unroll it and generate a separate instance of test_array_bit for each input bit. Therefore it is quite possible for synthesis tools to generate identical logic for both versions - at least in this simple example.
And on that basis, I would (marginally) prefer the for ... loop version because it localises the program logic, whereas the "generate" version globalises it, placing it outside the process boilerplate. If you find the loop version slightly easier to read, then you will agree at some level.
However, it doesn't pay to be dogmatic about style, and your experiment illustrates this: the loop synthesises to inferior hardware. Synthesis tools are complex and imperfect pieces of software, like highly optimising compilers, and share many of the same issues. Sometimes they miss an "obvious" optimisation, and sometimes they make a complex optimisation that (e.g. in software) runs slower because its increased size trashes the cache.
So it's preferable to write in the cleanest style where you can, but with some flexibility for working around tool limitations and occasionally real tool defects.
Different versions of the tools remove (and occasionally introduce) such defects. You may find that ISE's "use new parser" option (for pre-Spartan-6 parts), or Vivado, or Synplicity get this right where ISE's older parser doesn't. (For example, older ISE versions had serious bugs when passing signals out of procedures.)
It might be instructive to modify the example and see if synthesis can "get it right" (produce the same hardware) for the simplest case, and re-introduce complexity until you find which construct fails.
If you discover something concrete this way, it's worth reporting here (by answering your own question). Xilinx used to encourage reporting such defects via its Webcase system; eventually they were even fixed! They seem to have stopped that, however, in the last year or two.
The first snippet would be equivalent to the following:
my_process: process(clk, reset) begin
if rising_edge (clk) then
if reset = '1' then
--init stuff
else
test_array_bit(0);
test_array_bit(1);
............
test_array_bit(n);
end if;
end if;
end process;
The second one will generate n+1 processes, one for each i, each together with the reset logic and everything else (which might be a problem, as that reset logic will attempt to drive the same signals from different processes).
In general, for loops are sequential statements, containing sequential statements (i.e. each iteration is sequenced to execute after the previous one). For-generate loops are concurrent statements, containing concurrent statements, and this is how you can use them to make several instances of a component, for example.
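A minimal sketch of that last point, with illustrative entity and signal names (each iteration of the generate elaborates one component instance):

gen_bits : for i in data'range generate
    bit_cell : entity work.bit_checker
        port map (clk => clk,
                  d   => data(i),
                  q   => result(i));
end generate gen_bits;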
I was recently told that the solution to all (or most) problems with unintended latches during VHDL synthesis is to put whatever signal is problematic into a record.
This seems like it's a little bit too good to be true, but I'm not that experienced with VHDL so there could be something else that I'm not considering.
Should I put all my signals in records?
No, you should not put all your signals in records. This will quickly become very confusing and you will not gain anything by using the record.
One way that a record may help you avoid latches is that when you register an entire record in a clocked process, you are really registering all of the elements of the record. This takes one line of code, instead of possibly tens of lines. In cases where you have many elements which all need to be treated the same way, a record can save you from "silly mistakes", and possibly save you from creating a latch.
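A minimal sketch of that idea, with an illustrative record (the type and signal declarations go in a package or the architecture declarative part):

type ctrl_t is record
    start : std_logic;
    busy  : std_logic;
    count : std_logic_vector(7 downto 0);
end record;

signal ctrl_d, ctrl_q : ctrl_t;

process (clk)
begin
    if rising_edge(clk) then
        ctrl_q <= ctrl_d;  -- one assignment registers every element of the record
    end if;
end process;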
As stated by others, a record doesn't have any specific synthesis interpretation. It is simply a group of signals that you are grouping together for coding-convenience.
I don't see how this would help - a record (or even just parts of a record) can become a latch just as easily as a signal. A latch is generated if a signal keeps its state through some combinatorial process (i.e., is not assigned a value on ALL paths through the process). The same holds for constituents of a record.
Records can be useful to group related signals for readability, but synthesis-wise a record is pretty much equivalent to a bunch of individual signals.
My personal suggestion to avoid latches: avoid combinatorial processes. Make all processes clocked, and do combinatorial logic at the architecture level.
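As a minimal sketch of that style (names are illustrative): keep the combinatorial part in concurrent assignments at the architecture level, and let the only process be clocked, so no latch can be inferred.

-- combinatorial logic at the architecture level
next_q <= d when load = '1' else q;

-- the only process is clocked
process (clk)
begin
    if rising_edge(clk) then
        q <= next_q;
    end if;
end process;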
A record is just another way of grouping other types, similar to the way an
array groups std_logic elements into a std_logic_vector, so there is nothing
magical about records that makes them better for avoiding latches in a design.
If you get unintended latches in your design, which I guess is what you think
of as "latch problems", it is because your coding style specifies latches, and
you should change the coding style, as @zennehoy also suggests.
One approach can be to define some code templates for different constructions
that you use, and then stick to these known and working templates.
The template for a flip-flop (FF) with asynchronous reset can be:
process (clk_i, rst_i) is
begin
    -- Clock
    if rising_edge(clk_i) then
        ... Control structures with Qs assigned as a function of Ds
        ... Synchronous reset is just another branch
    end if;
    -- Reset (asynchronous), if required
    if rst_i = '1' then
        ... Qs assigned constant reset values, for some or all Qs
    end if;
end process;
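For example, a concrete (purely illustrative) instance of that template for a single flip-flop with clock enable and asynchronous reset:

process (clk_i, rst_i) is
begin
    -- Clock
    if rising_edge(clk_i) then
        if ena_i = '1' then
            q_o <= d_i;   -- Q assigned as a function of D
        end if;
    end if;
    -- Reset (asynchronous)
    if rst_i = '1' then
        q_o <= '0';       -- constant reset value
    end if;
end process;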
Use concurrent signal assignments when possible; more complex expressions can
be handled with a concurrent function call, where a function is used outside a
process like:
z_o <= fun(a_i, b_i);
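For completeness, a sketch of what such a function could look like (the body is only an example):

function fun(a_i, b_i : std_logic) return std_logic is
begin
    return a_i xor b_i;  -- any purely combinatorial expression of the inputs
end function;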
If a process is used to create combinatorial logic, then a common pitfall and
cause of latches in VHDL is forgetting a signal in the sensitivity list.
However, VHDL-2008 has a solution for this, since you can use (all) as the
sensitivity list, whereby all signals used in the process are implicitly
included in the sensitivity list. So if you use VHDL-2008, your template
for combinatorial processes can be:
process (all) is
begin
z_o <= a_i and b_i;
end process;
These templates should be all you need for a typical synthesizable design, and
they will keep your design latch-free.