Real-life use of SLA operator in VHDL [closed]


Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
Most, if not all, VHDL textbooks casually list the arithmetic operators and, even more casually, define Shift Left Arithmetic (SLA) among them as a left shift that fills with the rightmost element. Although even Wikipedia disagrees with this definition, I have never seen anyone so much as blink an eye at it; it's just taken at face value: "yup, makes sense".
However, the apparent complete uselessness of filling with the rightmost value in binary arithmetic made me wonder: has anyone found a really good use for SLA in real life?
(I'm not against the existence of SLA, nor want to eradicate it from VHDL. This is just a bit of fun for the Easter holidays...)

This site mentions:
At one point, there were actual shift operators built into VHDL. These were: srl, sll, sra, sla. However these shift operators never worked correctly and were removed from the language. This website goes into great detail about the history of these VHDL shift operators and discusses why they have been replaced by shift_right() and shift_left() functions.
However, the link is broken.
The Wikipedia page you linked shows something very important:
The VHDL arithmetic left shift operator [sla] is unusual. Instead of filling the LSB of the result with zero, it copies the original LSB into the new LSB. While this is an exact mirror image of the arithmetic right shift, it is not the conventional definition of the operator, and is not equivalent to multiplication by a power of 2. In the VHDL 2008 standard this strange behavior was left unchanged (for backward compatibility) for argument types that do not have forced numeric interpretation (e.g., BIT_VECTOR) but 'SLA' for unsigned and signed argument types behaves in the expected way (i.e., rightmost positions are filled with zeros). VHDL's shift left logical (SLL) function does implement the aforementioned 'standard' arithmetic shift.
I.e. we now have the peculiar situation that sla behaves differently for different data types. That sounds like a very stupid thing to me.
However, this is expected... in my experience, IEEE standardization committees mostly consist of a lot of older men, some of whom were there when the committee started (before '87). The rest are representatives of companies that also don't like change. If sla etc. were removed, a lot of old code would have to be rewritten.
So, after months of discussion, they decided to keep the old folks happy by keeping the old behavior for old data types and changing it for new data types....
Edit:
To make it more awkward, it seems sla is not defined for std_logic_vector, unsigned and similar types... but it comes back for ufixed and similar types. So I've managed to write some (VHDL-2008) code demonstrating the difference in behavior:
entity test_e is
end entity;
library ieee;
architecture test_a of test_e is
  constant value1 : bit_vector(3 downto 0) := "0001";
  constant value2 : bit_vector(3 downto 0) := value1 sla 2;  -- bit_vector: refilled from the LSB
  use ieee.fixed_pkg.all;
  constant value3 : ufixed(3 downto 0) := to_ufixed(1, 3, 0);
  constant value4 : ufixed(3 downto 0) := value3 sla 2;      -- ufixed: filled with zeros
begin
end architecture;
When simulating, value2 will hold "0111" and value4 will hold "0100".
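The two behaviours can also be checked without a simulator. Below is a minimal Python sketch that is not part of the original answer. The helper names `sla_bit_vector` and `sla_numeric` are hypothetical, and vectors are written as strings, MSB first, like VHDL literals. Only non-negative shift counts are modelled, because in VHDL `L sla -R` is defined as `L sra R`.

```python
def sla_bit_vector(bits: str, n: int) -> str:
    """Model of the predefined sla on BIT_VECTOR: shift left, refill with the original LSB."""
    if n < 0:
        raise ValueError("only non-negative shift counts are modelled")
    if n >= len(bits):
        return bits[-1] * len(bits)
    return bits[n:] + bits[-1] * n


def sla_numeric(bits: str, n: int) -> str:
    """Model of sla on numeric types such as ufixed or signed: shift left, fill with '0'."""
    if n < 0:
        raise ValueError("only non-negative shift counts are modelled")
    if n >= len(bits):
        return "0" * len(bits)
    return bits[n:] + "0" * n


print(sla_bit_vector("0001", 2))  # 0111, the value2 result
print(sla_numeric("0001", 2))     # 0100, the value4 result
```

Only the zero-filling version matches multiplication by 2**n (as long as no set bits are shifted out), which is the point of the Wikipedia text quoted above.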

Of course, when you need to multiply by 2^n, you just SLA by n bits (on numeric types such as signed, where SLA fills with zeros).
I use it in my current project to reduce FPGA resource usage: before a division I use SRA, which gives a smaller number, then I divide, and finally I use SLA to restore the scale of the value.
In other words, instead of C = A/B, it computes C ≈ ((A/N)/B)·N, where N = 2^n.
You get some error in C, but you reduce the resources used.
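Here is one reading of this divide trick, sketched in Python. It assumes A and B are non-negative integers and N = 2^n, and the function name is hypothetical. The dividend is shifted right before the divider and the quotient is shifted left afterwards. Because floor(floor(A/N)/B) = floor(A/(N·B)), the result never exceeds A // B and falls short of it by less than N.

```python
def approx_div(a: int, b: int, n: int) -> int:
    """Approximate a // b with a divider that only sees a >> n."""
    if a < 0 or b <= 0 or n < 0:
        raise ValueError("expects a >= 0, b > 0, n >= 0")
    reduced = a >> n         # shift right by n: a narrower dividend, so a smaller divider
    quotient = reduced // b  # the (cheaper) division
    return quotient << n     # shift left by n to restore the scale


print(approx_div(1000, 7, 3), 1000 // 7)  # 136 142
```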

Related

Most efficient VHDL for large vector?

I want to build a shift register whose contents are XORed against another register loaded with some value. The issue is that I wish to do this with a large-scale vector, on the order of thousands of bits wide.
The obvious way to do this in VHDL would be something like:
generic (length : integer := 15);                    -- in the entity
signal shiftreg : std_logic_vector(length downto 0);  -- in the architecture
process (clk)
begin
  if rising_edge(clk) then
    shiftreg <= shiftreg(length-1 downto 0) & input;   -- shift in one bit per clock
  end if;
end process;
However, if length is set to some very high number, synthesizing this becomes a massive undertaking. Since this is a relatively simple structure, I imagine it takes so long because the length is far beyond the number of registers in a single block.
My question is whether there is some way to implement a large vector like this that would be quicker to synthesize. For example, is it quicker to use something like:
type shiftreg_t is array (length downto 0) of std_logic;
or does a synthesis tool recognize those are equivalent?

Synthesis time is not typically a concern in FPGA design, although area utilization and timing usually are. If your shift register takes up most of the resources your target FPGA has, synthesis will take a long time trying to figure out a way to make it work, and builds likewise take longer as you fill up larger parts. For a ballpark figure, an 80% full design with tight timing in a modern midrange FPGA usually takes about 30 minutes to synthesize and 3 hours to place and route. This will not be significantly affected by coding style if you're still describing the same functionality.
If you describe a shift register (with the same functional features) in VHDL using std_logic_vector, a type you defined as an array of std_logic, or anything else, it will synthesize into the same thing.
In recent-ish Xilinx parts at least, a single LUT can implement a 32-deep shift register (an SRL) as long as you haven't described a reset (synchronous OR asynchronous). You can likewise produce a 1000-deep shift register with just a few dozen LUTs.
Now, if you're looking to use all thousand-plus bits of this shift register to XOR against some other register, you can't use SRLs (LUTs used as shift registers), because only the final bit is accessible as an output. This forces the whole thing into flip-flops, which may be rather large and could require more registers than your part has. The key thing here is that you have to think about the scale of the hardware you describe and whether it's feasible in your target part.
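To see why the XOR case defeats SRLs, here is a behavioural Python model of the question's circuit. The class and method names are hypothetical, and Python ints stand in for the std_logic_vector. On every clock the register shifts in one bit, and then all of its bits take part in the XOR at once. Every stage must therefore be individually readable, which flip-flops provide and SRLs do not.

```python
class XorShiftRegister:
    """Behavioural model: a width-bit shift register XORed against a fixed pattern."""

    def __init__(self, width: int, pattern: int) -> None:
        self.mask = (1 << width) - 1       # width corresponds to length + 1 in the VHDL
        self.pattern = pattern & self.mask
        self.reg = 0

    def clock(self, bit_in: int) -> int:
        # shiftreg <= shiftreg(length-1 downto 0) & input;
        self.reg = ((self.reg << 1) | (bit_in & 1)) & self.mask
        # Every bit of the register feeds the XOR in parallel.
        return self.reg ^ self.pattern


r = XorShiftRegister(4, 0b1010)
print([r.clock(bit) for bit in (1, 1, 0, 1, 0)])  # [11, 9, 12, 7, 0]
```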
If you want a really deep shift register, block RAMs can act as shift registers at depths exceeding 100,000, but they have the same issue: you can only access the final output.
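The block RAM approach can be modelled as a circular buffer. A single address counter is used for both reading and writing, and each location is read before it is overwritten, so every word reappears exactly `depth` clocks later. This Python sketch uses hypothetical names and ignores the extra output-register latency a real block RAM adds. Like an SRL, it only exposes the delayed output.

```python
class RamDelayLine:
    """Model of a RAM used as a depth-cycle delay line (circular buffer)."""

    def __init__(self, depth: int) -> None:
        self.mem = [0] * depth
        self.addr = 0  # shared read/write address counter

    def clock(self, data_in: int) -> int:
        data_out = self.mem[self.addr]   # read the oldest word first...
        self.mem[self.addr] = data_in    # ...then overwrite it with the newest
        self.addr = (self.addr + 1) % len(self.mem)
        return data_out                  # only this final output is visible


d = RamDelayLine(4)
print([d.clock(x) for x in range(1, 8)])  # [0, 0, 0, 0, 1, 2, 3]
```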

VHDL concurrent selective assignment synthesis

A real junior question, with hopefully a junior answer, about one of the main assignments in VHDL (the concurrent selected signal assignment): can anyone explain what a VHDL compiler would synthesise the following description into?
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;
ENTITY Q2 IS
  PORT (a, b, c, d : IN  std_logic;
        EW_NS      : OUT std_logic
  );
END ENTITY Q2;
ARCHITECTURE hybrid OF Q2 IS
  SIGNAL INPUT : std_logic_vector(3 DOWNTO 0);
BEGIN
  INPUT <= (a & b & c & d);  -- concatenation: a is the MSB
  WITH (INPUT) SELECT
    EW_NS <= '1' WHEN "0001" | "0010" | "0011" | "0110" | "1011",
             '0' WHEN OTHERS;
END ARCHITECTURE hybrid;
Why do I ask? Well, I have previously gone about things the wrong way, i.e. describing things in VHDL before making a block diagram of the components needed. I would envisage this being synthesised as a group of AND gates?
Any help would be much appreciated.
Thanks D

You need to look at the user guide for your target FPGA and understand what is contained within one 'logic element' ('slice' in Xilinx terminology). In general, an FPGA does not implement combinatorial logic by connecting up discrete gates like AND, OR, etc. Instead, a logic element will contain one or more 'look-up tables', typically with four inputs (six in some newer devices). The inputs to this look-up table (LUT) are the inputs to your logic function, and the output is one of the outputs of the function. The LUT is then programmed as a ROM, allowing your input signals to function as an address. There is one ROM entry for every possible combination of inputs, with the result being the intended logic function.
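To illustrate the ROM view, here is a Python sketch that "programs" a 4-input LUT with the question's EW_NS function. It takes a as the MSB of the address, matching INPUT <= a & b & c & d, and the names are hypothetical. The 16 table bits can also be packed into one number, similar to the INIT value that vendor LUT primitives use, although the exact bit ordering is vendor-specific.

```python
# Input combinations for which the selected assignment drives '1' (a is the MSB).
EW_NS_ONES = {0b0001, 0b0010, 0b0011, 0b0110, 0b1011}

# One ROM bit per possible input combination: 2**4 = 16 entries.
LUT_TABLE = [1 if addr in EW_NS_ONES else 0 for addr in range(16)]

# The same table packed into a single 16-bit word, bit i = output for address i.
INIT_WORD = sum(bit << addr for addr, bit in enumerate(LUT_TABLE))


def ew_ns(a: int, b: int, c: int, d: int) -> int:
    """Evaluate the function by using the four inputs as a ROM address."""
    addr = (a << 3) | (b << 2) | (c << 1) | d
    return LUT_TABLE[addr]


print(f"0x{INIT_WORD:04X}")  # 0x084E
```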
A function with several outputs would simply use several of these LUTs in parallel, with the same inputs, one LUT for each of the function's outputs. A function requiring one more input than the LUT has (say, 5 inputs, where a LUT has only 4) simply combines two LUTs in parallel, using a multiplexer to choose between the outputs of the two LUTs; functions with even more inputs need more LUTs and further levels of multiplexing. The final multiplexer uses one of the input signals as its control, and again every possible combination of inputs is accounted for.
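The "two LUTs plus a multiplexer" trick is Shannon expansion on one input. One LUT holds the function with that input at 0, the other holds it with the input at 1, and the input selects between them. Here is a Python sketch for a 5-input function built from 4-input LUTs. All names are hypothetical, and any 5-input function can be passed in.

```python
from itertools import product
from typing import Callable

Func5 = Callable[[int, int, int, int, int], int]


def fill_lut4(f: Func5, e: int) -> list[int]:
    """Program a 16-entry LUT with f, with input e held at a constant value."""
    return [f((addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1, e)
            for addr in range(16)]


def map_to_two_luts(f: Func5) -> Func5:
    """Implement a 5-input f as two 4-input LUTs plus a 2:1 mux controlled by e."""
    lut_e0 = fill_lut4(f, 0)  # f with e = '0'
    lut_e1 = fill_lut4(f, 1)  # f with e = '1'

    def mapped(a: int, b: int, c: int, d: int, e: int) -> int:
        addr = (a << 3) | (b << 2) | (c << 1) | d  # a, b, c, d drive both LUTs
        return lut_e1[addr] if e else lut_e0[addr]  # the final multiplexer

    return mapped


def parity5(a: int, b: int, c: int, d: int, e: int) -> int:
    return a ^ b ^ c ^ d ^ e


mapped = map_to_two_luts(parity5)
assert all(mapped(*bits) == parity5(*bits) for bits in product((0, 1), repeat=5))
```

Each additional input repeats the same step, doubling the number of LUTs.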
This may sound inefficient for creating something simple like an AND gate, but the benefit is that this simple building block (a LUT) can implement absolutely any combinatorial function. It's also worth noting that an FPGA tool chain is extremely good at optimising logic functions in order to simplify them, and to better map them into the FPGA. The LUT provides a highly generic element for these tools to target.
A logic element will also contain some dedicated resources for functions that aren't well suited to the LUT approach. These might include dedicated carry chains for adders, multiplexers for combining the outputs of several LUTs, and registers (most designs are synchronous). LUTs can also sometimes be configured as small shift registers or RAM elements. Outside the logic elements, there will be more specialised blocks such as large multipliers, larger memories and PLLs, none of which could be implemented as efficiently using LUT resources. Again, this will all be explained in the user guide for your target FPGA.

Back in the day, your code would have been implemented as a single 74150 TTL circuit, which is a 16-to-1 mux. You have a 4-bit select (INPUT), and this selects one of 16 inputs to the chip, which is routed to a single output ('EW_NS'). The 74150 is obsolete and I can't find any datasheets, but it's easy to find diagrams of what an 8-to-1 mux looks like (here, for example). The 16-to-1 is identical, but everything is wider. My old TI databook shows basically exactly the diagram at this link, doubled up.
But - wait. Your problem is easier, because you're not routing real inputs to the output - you're just setting fixed data values. On the '150, you do this by wiring 5 of the 16 inputs to 1, and the remaining 11 to 0. This makes the logic much easier.
The 74150 has basically exactly the same functionality as a 4-input look-up table (where the fixed look-up data is the same as the fixed levels at the '150 inputs), so it's trivial to implement your entire circuit in a single LUT in an FPGA, as per scary_jeff's answer, rather than using a NAND-level implementation. In a proper chip, though, it would be implemented as a sum of products, or something similar (exactly what's in the linked diagram). In this case, draw a K-map and find a minimum solution. My 2 minutes on the back of an envelope come up with three 3-input AND gates driving a 3-input OR gate. I'll leave it as an exercise for you to check this :)
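For the sum-of-products route, one minimum cover (my own minimisation, not necessarily the one the answerer had in mind) is EW_NS = a'b'd + a'cd' + b'cd. That is indeed three 3-input ANDs feeding a 3-input OR. With only 16 input combinations it is easy to check exhaustively in Python (the function names are hypothetical):

```python
# Minterms of the selected assignment, with a as the MSB of INPUT.
EW_NS_ONES = {0b0001, 0b0010, 0b0011, 0b0110, 0b1011}


def ew_ns_sop(a: int, b: int, c: int, d: int) -> int:
    """a'b'd + a'cd' + b'cd: three 3-input AND terms into one OR."""
    na, nb, nd = 1 - a, 1 - b, 1 - d
    return (na & nb & d) | (na & c & nd) | (nb & c & d)


def check_against_truth_table() -> bool:
    """Compare the SOP form with the selected assignment for all 16 inputs."""
    for addr in range(16):
        a, b, c, d = (addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
        if ew_ns_sop(a, b, c, d) != (1 if addr in EW_NS_ONES else 0):
            return False
    return True


print(check_against_truth_table())  # True
```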

Multiplication of two different bit numbers in VHDL

I have two numbers, A and B, of different sizes, and I need to multiply them in VHDL. I don't know the exact logic to multiply them.

If you are trying to multiply two std_logic_vector values, then * will fail, since std_logic_vector is just an array of std_logic elements with no inherent numerical representation.
So take a look at the ieee.numeric_std VHDL package. It defines the unsigned and signed types, which give an array the usual numerical interpretation, along with operators on these types, including *. Using this package you can do:
use ieee.numeric_std.all;
...
c <= std_logic_vector(unsigned(a) * unsigned(b));
Note that for *, the result length is a'length + b'length, so c must be declared that wide.
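In practice you would just use numeric_std's *, which synthesis tools typically map onto the FPGA's dedicated multipliers. The question also asks about the underlying logic, so here is a Python sketch of the classic unsigned shift-and-add scheme (the function name is hypothetical). It also shows why the product needs a'length + b'length bits: (2^n - 1)(2^m - 1) < 2^(n+m), so n + m bits always suffice.

```python
def shift_add_multiply(a: int, a_bits: int, b: int, b_bits: int) -> int:
    """Unsigned shift-and-add multiply of an a_bits-wide value by a b_bits-wide value."""
    if not (0 <= a < (1 << a_bits) and 0 <= b < (1 << b_bits)):
        raise ValueError("operand does not fit in its width")
    product = 0
    for i in range(b_bits):
        if (b >> i) & 1:           # for each set bit of b ...
            product += a << i      # ... add a copy of a shifted to that bit position
    return product


# Worst case for an 8-bit by 5-bit multiply: all ones in both operands.
worst = shift_add_multiply(255, 8, 31, 5)
print(worst, worst.bit_length())  # 7905 13
```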
By the way, welcome to Stack Overflow. Please spend some time in the Stack Overflow Help Center, so you can get better answers in the future and avoid being voted down or having your question closed.

Trouble implementing the shift operator (sll) in VHDL

I am trying to make a BCD converter to show numbers from 0 to 9999, and I need to implement the Double Dabble algorithm using the shift operators. But I cannot start coding without running into warnings I don't really understand. I am still a beginner, so please forgive any silly mistakes. I started off by implementing the algorithm. I have never used shift operators, so I am probably not doing it right. Please help. Here is my code:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
entity algorithm is
Port (x: in unsigned (15 downto 0);
y: out unsigned (15 downto 0));
end algorithm;
architecture Behavioral of algorithm is
begin
y <= x sll 16;
end Behavioral;
And the error:
Xst:647 - Input <x> is never used. This port will be preserved and left unconnected
if it belongs to a top-level block or it belongs to a sub-block and the hierarchy of
this sub-block is preserved.
Even if I implement this:
y <= x sll 1;
I get this error:
Xst:647 - Input <x<15>> is never used. This port will be preserved and left
unconnected if it belongs to a top-level block or it belongs to a sub-block
and the hierarchy of this sub-block is preserved.
What am I doing wrong here?

What you are doing wrong is, firstly, attempting to debug a design via synthesis.
Write a simple testbench which, first, exercises your design (i.e. given the code above, feeds some data into the X input port).
Later you can extend the testbench to read the Y output port and compare the output with what you would expect for each input, but you're not ready for that yet.
Simulate the testbench and add the entity's internal signals to the Wave window: does the entity do what you expect? If so, proceed to synthesis. Otherwise, find and fix the problem.
The specific lines of code above, y <= x sll 16; and y <= x sll 1; work correctly and the synthesis warnings (NOT errors) are as expected. Shifting a 16 bit number by 16 bits and fitting the result into a 16 bit value, there is nothing left, so (as the warning tells you) port X is entirely unused. Shifting by 1 bit, the MSB falls off the top of the result, again exactly as the warning says.
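The warnings follow directly from what sll does to a fixed-width unsigned. A small Python model makes this visible. It uses hypothetical helper names and assumes the 16-bit width from the question. With a shift of 16, no input bit can reach the output; with a shift of 1, only bit 15 is lost.

```python
WIDTH = 16


def sll(x: int, n: int, width: int = WIDTH) -> int:
    """Model of numeric_std sll on an unsigned: bits shifted past the MSB are dropped."""
    return (x << n) & ((1 << width) - 1)


def used_input_bits(n: int, width: int = WIDTH) -> list[int]:
    """Input bit positions that can still affect y = x sll n."""
    return [i for i in range(width) if i + n < width]


print(used_input_bits(16))       # [] -> "Input <x> is never used"
print(15 in used_input_bits(1))  # False -> "Input <x<15>> is never used"
print(hex(sll(0x8001, 1)))       # 0x2
```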
It is the nature of synthesis to warn you of hundreds of such things (often most of them come from the vendor's own IP, strangely enough!): if you have verified the design in simulation, you can glance at the warnings and ignore most of them. Sometimes things really do go wrong, and then one or two of the warnings MAY be useful. But they are not a primary debugging technique; most of them are natural and expected, as above.
As David says, you probably do want a loop inside a clocked process: FOR loops are synthesisable. I have recently read a statement that WHILE loops are often also synthesisable, but I have found this to be less reliable.
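Once the shift basics work, the Double Dabble conversion itself can be prototyped in software first, which also gives the testbench something to compare against. Below is a minimal Python reference model. The name is hypothetical, the input is assumed to be unsigned, and its MSB is shifted in first. A full 16-bit input can reach 65535, so it needs five BCD digits, not four.

```python
def double_dabble(value: int, width: int = 16, digits: int = 5) -> list[int]:
    """Convert an unsigned binary value to BCD digits, most significant digit first."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in width bits")
    bcd = 0  # packed BCD scratch register, 4 bits per digit
    for i in range(width - 1, -1, -1):
        # "dabble": add 3 to every BCD digit that is 5 or more (cannot carry out)
        for d in range(digits):
            if (bcd >> (4 * d)) & 0xF >= 5:
                bcd += 3 << (4 * d)
        # "double": shift left by one, bringing in the next input bit
        bcd = (bcd << 1) | ((value >> i) & 1)
    return [(bcd >> (4 * d)) & 0xF for d in reversed(range(digits))]


print(double_dabble(9999))  # [0, 9, 9, 9, 9]
```

In VHDL, the outer loop would become either a FOR loop in a process (unrolled into add-3 stages) or one iteration per clock, as the answer suggests.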

how to approach string/pattern checking in VHDL?

I have a problem that I think I can solve with regular C-like operations, but I was wondering if there is a better way, something like regexp for VHDL.
The problem is that I have a string/collection of bits, "101010101010101", and I want to look for the pattern "1010" inside it (with no overlapping).
What are my best options for attacking this problem?
Edit: I should mention that the input is parallel (all the bits at once), not serial.
It is still possible to implement this as an FSM, but is there a more efficient way?

If all you want to do is find a pattern within a vector, then you can just iterate over it. Assuming "downto" vectors:
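process (vec, what_to_find)
begin
  found <= 0;  -- 0 means "not found" (assumes vec'low = 0 and a pattern of 2+ bits)
  for start in vec'high downto vec'low + what_to_find'length - 1 loop
    -- compare the what_to_find'length bits whose leftmost index is start
    if vec(start downto start - what_to_find'length + 1) = what_to_find then
      found <= start;  -- last assignment wins: reports the match at the lowest index
    end if;
  end loop;
end process;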
Depending on the sizes of your input and search vectors compared to the target device, this will be a reasonable or unreasonable amount of logic!

VHDL does not have built-in regex support; however, what you are trying to solve is a pattern-matching problem. Basically, you build a state machine (which is what happens when a regular expression is evaluated) and use it to match the input. The simplest approach is to check whether the first n bits match your pattern, then shift and continue. Longer or more interesting patterns, e.g., with quantifiers or matching groups, require a bit more.
There are numerous approaches to this (try searching for "vhdl pattern matching"; it is used, e.g., for network traffic analysis), and I even found one that automatically generates the VHDL. I would guess, however, that a specialized hand-made version for your problem would be rather more efficient.
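For a serial input, the "check the first n bits, then shift and continue" idea can be sketched in Python as follows. The name is hypothetical, and a window of the most recent bits stands in for the FSM state. Clearing the window after a hit is what enforces the question's "no overlapping" rule. The question's input is parallel, so this mainly illustrates the state-machine idea.

```python
def serial_matches(bits: str, pattern: str) -> list[int]:
    """Feed bits in one per 'clock'; return the index of the last bit of each match."""
    n = len(pattern)
    window = ""                       # most recent bits seen, at most n of them
    hits = []
    for i, bit in enumerate(bits):
        window = (window + bit)[-n:]  # shift the new bit in
        if window == pattern:
            hits.append(i)
            window = ""               # restart, so matches cannot overlap
    return hits


print(serial_matches("101010101010101", "1010"))  # [3, 7, 11]
```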

There is no generally applicable VHDL solution for that kind of pattern matching; the solution should be driven by the requirements, since size and speed can vary greatly for that kind of design.
So, if timing allows ample time for a fully parallel compare and filtering of overlapping patterns, and there is plenty of hardware to implement that, then you can do a parallel compare.
For a fully parallel implementation without an FSM or clock, you can make a function that takes the pattern and collection and returns a match-indication std_logic_vector with '1' at the start of each match.
The function can then be used in concurrent assign with:
match <= pattern_search_collection(pattern, collection);
The function can be implemented with something along the lines of:
function pattern_search_collection(pat : std_logic_vector;
                                   col : std_logic_vector) return std_logic_vector is
  ...
begin
  -- Code for matching with overlap using loop over all possible positions
  -- Code for filtering out overlaps using loop over all result bits
  return result;
end function;
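The two loops described in the comments can be prototyped in Python before writing the VHDL. This sketch is my own: the Python function is only named after the VHDL one, and position 0 here is the leftmost bit. It first marks every start position with overlaps allowed, then scans from the left and keeps a match only if it begins after the previously kept match has ended. Other tie-breaking rules are possible.

```python
def match_with_overlap(pat: str, col: str) -> list[int]:
    """Loop 1: mark every position where pat starts, overlaps allowed."""
    n = len(pat)
    return [1 if col[i:i + n] == pat else 0 for i in range(len(col))]


def filter_overlaps(raw: list[int], n: int) -> list[int]:
    """Loop 2: keep a start only if the previously kept match has already ended."""
    result = [0] * len(raw)
    next_free = 0  # first position not covered by the last kept match
    for i, hit in enumerate(raw):
        if hit and i >= next_free:
            result[i] = 1
            next_free = i + n
    return result


def pattern_search_collection(pat: str, col: str) -> str:
    """'1' at the start of each non-overlapping match (position 0 = leftmost bit)."""
    kept = filter_overlaps(match_with_overlap(pat, col), len(pat))
    return "".join(str(bit) for bit in kept)


print(pattern_search_collection("1010", "101010101010101"))  # 100010001000000
```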
