Unconstrained std_logic_vector - vhdl

I have an assignment to create a test bench for an N-bit multiplier. The code is odd to me. It is a black-box N-bit design, but for his std_logic_vectors he does not specify any size. I'm guessing this is done in the test bench. I have not seen this before and was hoping someone could explain how this works.

VHDL supports unconstrained std_logic_vectors. This means that you can design a block that doesn't specify the length of the inputs and outputs. There can be a number of pitfalls to doing this (see this article), but in the case of something like a multiplier it can help with code reuse. You get to define what the input and output widths of the block are by connecting them to a std_logic_vector of the desired width.
Since you indicated that the block is a multiplier, I would check the documentation or interface to see if there are generics associated with the port widths. That is a common way of creating a generic block interface with I/O specific logic inside. That way you can specify how "wide" the logic needs to be, without having to have a separate block for every possible width.

Related

Self checking testbench in VHDL: what values should be tested?

I want to write a self checking testbench in VHDL for my design. I'm confused on which values to test. Should I check all possible values and see if they are correct? My design also contains a generic parameter, so I should also test different bit sizes. Testing all possible values becomes impossible for larger bit sizes. For example: how would people test a 64 bit floating point adder? I'm sure that it is physically impossible to test all values then?
Maybe you can start with checking the outcome for some specific inputs.
If you can find out the expected values for random numbers in your testbench, you can use seeds depending on the timestamp.
You should have some check-function which compares the output with the expected value.
I hope my suggestions are helpful.
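As a rough illustration of the suggestions above, here is a minimal Python sketch of how the stimulus for a self-checking test might be chosen: a handful of corner cases plus reproducible pseudo-random operands, compared against a reference model. The names `stimulus` and `check`, the multiplier-style two-operand interface, and the fixed default seed are all assumptions made for this example. If the seed is derived from a timestamp instead, it should be logged so that a failing run can be replayed.

```python
import random

def stimulus(width, n_random=100, seed=1):
    """Return operand pairs: all corner-case combinations, then random pairs."""
    top = (1 << width) - 1
    # Corner cases: zero, one, all ones, and only the MSB set.
    corners = sorted({0, 1, top, 1 << (width - 1)})
    pairs = [(a, b) for a in corners for b in corners]
    rng = random.Random(seed)  # same seed -> same vectors on every run
    pairs += [(rng.randint(0, top), rng.randint(0, top)) for _ in range(n_random)]
    return pairs

def check(dut, reference, width, seed=1):
    """Compare a design model against a reference; return the mismatches."""
    failures = []
    for a, b in stimulus(width, seed=seed):
        got, expected = dut(a, b), reference(a, b)
        if got != expected:
            failures.append((a, b, got, expected))
    return failures

# Hypothetical usage: an 8-bit multiplier whose result is wrongly truncated.
bad_multiplier = lambda a, b: (a * b) & 0xFF
failures = check(bad_multiplier, lambda a, b: a * b, width=8)
```

The corner cases catch overflow-style bugs that random operands may hit only rarely, while the seeded random part gives broad coverage without enumerating every value.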

Difference between direct and indirect CRC

I have seen two different kinds of CRC algorithms. The one kind is called "direct" the other kind is called "non-direct" or "indirect". The code for both is a bit different. Both are able to calculate the same checksum if direct type is supplied with a converted initial value.
I can successfully run both algorithms and I know how to convert the initial value. So this is no problem.
What I couldn't find out: Why do these two algorithms exist? Is there something that one can do what the other can't? Are they redundant from the user's point of view?
UPDATE: You can find a testable online implementation (and C implementations of both algorithms) here. However, these terms (or one of them) are mentioned in several other places, such as here ("direct table algorithm"), in a microcontroller reference document, in forums, etc.
The "direct" is referring to how to avoid processing n zero bits at the end for an n-bit CRC.
The mathematical definition of the CRC is a division of the message with n zero bits appended to it. You can avoid the extra operations by exclusive-oring the message with the CRC before operating on it instead of after. This requires processing the initial value of the register in the normal version through the CRC, and having that be the new initial value.
Since it is not necessary, you will never see a real-world CRC algorithm doing the extra operations.
See the section "10. A Slightly Mangled Table-Driven Implementation" in the document you link for a more detailed explanation.
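To make the relationship concrete, here is a small bitwise Python sketch of both variants. It assumes a 16-bit, non-reflected CRC with polynomial 0x1021 and no final XOR; the function names are made up for this example. `to_direct_init` performs the conversion described above by running the non-direct initial value through n zero bits.

```python
def message_bits(data):
    """Yield the bits of a bytes object, most significant bit first."""
    for byte in data:
        for i in range(7, -1, -1):
            yield (byte >> i) & 1

def crc_nondirect(data, init, poly=0x1021, width=16):
    """Textbook division: message bits enter at the bottom, n zeros appended."""
    mask, top = (1 << width) - 1, 1 << (width - 1)
    reg = init
    for bit in list(message_bits(data)) + [0] * width:
        carry = reg & top
        reg = ((reg << 1) & mask) | bit
        if carry:
            reg ^= poly
    return reg

def crc_direct(data, init, poly=0x1021, width=16):
    """Message bits are XORed into the top of the register; no zeros appended."""
    mask, top = (1 << width) - 1, 1 << (width - 1)
    reg = init
    for bit in message_bits(data):
        carry = bool(reg & top) != bool(bit)
        reg = (reg << 1) & mask
        if carry:
            reg ^= poly
    return reg

def to_direct_init(init_nondirect, poly=0x1021, width=16):
    """Convert a non-direct initial value by clocking in `width` zero bits."""
    mask, top = (1 << width) - 1, 1 << (width - 1)
    reg = init_nondirect
    for _ in range(width):
        carry = reg & top
        reg = (reg << 1) & mask
        if carry:
            reg ^= poly
    return reg

msg = b"123456789"
same = crc_nondirect(msg, 0xFFFF) == crc_direct(msg, to_direct_init(0xFFFF))
```

With a converted initial value the two functions agree on every message; the direct form simply skips the `width` extra shift steps at the end.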

VHDL concurrent selective assignment synthesis

A real junior question with hopefully a junior answer, regarding one of the main assignments of VHDL (concurrent selective assignment): can anyone explain what a VHDL compiler would synthesise the following description into?
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;

ENTITY Q2 IS
    PORT (a, b, c, d : IN  std_logic;
          EW_NS      : OUT std_logic
    );
END ENTITY Q2;

ARCHITECTURE hybrid OF Q2 IS
    SIGNAL INPUT : std_logic_vector(3 DOWNTO 0);
    -- EW_NS is already declared as a port, so it is not redeclared as a signal here
BEGIN
    INPUT <= (a & b & c & d); -- concatenation
    WITH (INPUT) SELECT
        EW_NS <= '1' WHEN "0001"|"0010"|"0011"|"0110"|"1011",
                 '0' WHEN OTHERS;
END ARCHITECTURE hybrid;
Why do I ask? Well, I have previously gone about things the wrong way, i.e. describing things in VHDL before making a block diagram of the components needed. I would envisage this being synthesised as a group of AND gates?
Any help would be really helpful.
Thanks D
You need to look at the user guide for your target FPGA, and understand what is contained within one 'logic element' ('slice' in Xilinx terminology). In general an FPGA does not implement combinatorial logic by connecting up discrete gates like AND, OR, etc. Instead, a logic element will contain one or more 'look-up tables', with typically four (but now 6 in some newer devices) inputs. The inputs to this look up table (LUT) are the inputs to your logic function, and the output is one of the outputs of the function. The LUT is then programmed as a ROM, allowing your input signals to function as an address. There is one ROM entry for every possible combination of inputs, with the result being the intended logic function.
A function with several outputs would simply use several of these LUTs in parallel, with the same inputs, one LUT for each of the function's outputs. A function requiring more inputs than the LUT has (say, 5 inputs, where a LUT has only 4) simply combines two LUTs in parallel, using a multiplexer to choose between the outputs of the two LUTs. This final multiplexer uses one of the input signals as its control, and again every possible combination of inputs is accounted for. Wider functions extend the same scheme with more LUTs and further multiplexer levels.
This may sound inefficient for creating something simple like an AND gate, but the benefit is that this simple building block (a LUT) can implement absolutely any combinatorial function. It's also worth noting that an FPGA tool chain is extremely good at optimising logic functions in order to simplify them, and to better map them into the FPGA. The LUT provides a highly generic element for these tools to target.
A logic element will also contain some dedicated resources for functions that aren't well suited to the LUT approach. These might include dedicated carry chains for adders, multiplexers for combining the outputs of several LUTs, and registers (most designs are synchronous). LUTs can also sometimes be configured as small shift registers or RAM elements. External to the logic elements, there will be more specific blocks like large multipliers, larger memories, PLLs, etc., none of which can be implemented as efficiently using LUT resources. Again, this will all be explained in the user guide for your target FPGA.
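The "LUT as a small ROM" idea can be modelled in a few lines of Python. This is only a sketch: the helper names are invented for the example, and the 16-bit contents are derived from the five patterns in the question's WITH ... SELECT statement, taking a as the most significant address bit.

```python
def make_lut(minterms):
    """Build the ROM contents: bit i is the output for input pattern i."""
    init = 0
    for m in minterms:
        init |= 1 << m
    return init

def lut_read(init, address):
    """The inputs act as an address into the ROM."""
    return (init >> address) & 1

# Contents for the question's function (inputs a, b, c, d with a as MSB).
EW_NS_INIT = make_lut([0b0001, 0b0010, 0b0011, 0b0110, 0b1011])

def lut5(init_low, init_high, address):
    """A 5-input function: two 4-input LUTs plus a 2-to-1 mux on the extra input."""
    select = (address >> 4) & 1
    low_bits = address & 0xF
    return lut_read(init_high if select else init_low, low_bits)

# Example: 5-input odd parity split across two 4-input LUTs.
parity = lambda x: bin(x).count("1") & 1
PARITY_LOW = make_lut([i for i in range(16) if parity(i)])
PARITY_HIGH = make_lut([i for i in range(16) if parity(i | 0x10)])
```

Whatever the logic function is, only the stored bits change; the hardware structure stays the same, which is why the whole of the question's design fits in one 4-input LUT.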
Back in the day, your code would have been implemented as a single 74150 TTL circuit, which is a 16-to-1 mux. You have a 4-bit select (INPUT), and this selects one of 16 inputs to the chip, which is routed to a single output (EW_NS). The 74150 is obsolete and I can't find any datasheets, but it's easy to find diagrams of what an 8-to-1 mux looks like (here, for example). The 16-to-1 is identical, but everything is wider. My old TI databook shows basically exactly the diagram at this link doubled up.
But - wait. Your problem is easier, because you're not routing real inputs to the output - you're just setting fixed data values. On the '150, you do this by wiring 5 of the 16 inputs to 1, and the remaining 11 to 0. This makes the logic much easier.
The 74150 has basically exactly the same functionality as a 4-input look-up table (where the fixed look-up data is the same as fixed levels at the '150 inputs), so it's trivial to implement your entire circuit in a single LUT in an FPGA, as per scary_jeff's answer, rather than using a NAND-level implementation. In a proper chip, though, it would be implemented as a sum-of-products, or something similar (exactly what's in the linked diagram). In this case, draw a K-map and find a minimum solution. My 2 minutes on the back of an envelope comes up with three 3-input AND gates, driving a 3-input OR gate. I'll leave it as an exercise to you to check this :)
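For anyone who wants to check the exercise by brute force, the Python sketch below compares the original selected assignment against one candidate sum-of-products over all 16 input combinations. The three product terms are my own K-map working, not something given in the answer, so treat them as an assumption to be verified rather than the intended solution.

```python
from itertools import product

# Input patterns (a b c d, a as MSB) for which EW_NS is '1' in the question.
MINTERMS = {0b0001, 0b0010, 0b0011, 0b0110, 0b1011}

def selected_assignment(a, b, c, d):
    """Reference behaviour: the WITH ... SELECT statement."""
    return int(((a << 3) | (b << 2) | (c << 1) | d) in MINTERMS)

def sum_of_products(a, b, c, d):
    """Candidate: three 3-input AND gates feeding one 3-input OR gate."""
    na, nb, nd = 1 - a, 1 - b, 1 - d
    term1 = na & nb & d   # covers 0001 and 0011
    term2 = na & c & nd   # covers 0010 and 0110
    term3 = nb & c & d    # covers 0011 and 1011
    return term1 | term2 | term3

mismatches = [bits for bits in product((0, 1), repeat=4)
              if selected_assignment(*bits) != sum_of_products(*bits)]
```

An empty `mismatches` list means the candidate expression matches the original description for every input pattern.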

Integer to Binary Conversion in Simulink

This might look like a repetition of my earlier question, but I think it's not.
I am looking for a technique to convert a signal in decimal format to binary format.
I intend to use the Simulink blocks in the Xilinx library to convert decimal to binary format.
So if the input is 3, the expected output should be 11 (2 clock cycles). I am looking for the output to be obtained serially.
Please suggest how to do it; any pointers on the internet would also be helpful.
Thanks
You are correct, what you need is the parallel to serial block from system generator.
It is described in this document:
http://www.xilinx.com/support/documentation/sw_manuals/xilinx13_1/sysgen_ref.pdf
This block is a rate changing block. Check the mentions of the parallel to serial block in these documents for further descriptions:
http://www.xilinx.com/support/documentation/sw_manuals/xilinx13_1/sysgen_gs.pdf
http://www.xilinx.com/support/documentation/sw_manuals/xilinx13_1/sysgen_user.pdf
Use a normal constant block with a Matlab variable in it; this already gives the output in "normal" binary (assuming you set the properties on it to be unsigned and the binary point at 0).
Then you need to write a small serialiser block, which takes that input, latches it into a shift register and then shifts the register once per clock cycle, with the bit that "falls off the end" becoming your output bit. Depending on which way you shift, you can make it come out MSB first or LSB first.
You'll have to build the shift register out of ordinary registers and a mux before each one to select whether you are doing a parallel load or shifting. (This is the sort of thing which is a couple of lines of code in VHDL, but a right faff in graphics).
If you have to increase the serial rate, you need to clock it from a faster clock - you could use a DCM to generate this.
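The behaviour of such a serialiser (parallel load, then one shift per clock cycle) can be sketched in Python as below. This is a behavioural model only, not System Generator code; the function name and the `msb_first` switch are assumptions made for the illustration.

```python
def serialise(value, width, msb_first=True):
    """Yield one bit per 'clock cycle' from a parallel-loaded shift register."""
    mask = (1 << width) - 1
    reg = value & mask  # parallel load
    for _ in range(width):
        if msb_first:
            yield (reg >> (width - 1)) & 1   # bit falling off the top
            reg = (reg << 1) & mask
        else:
            yield reg & 1                    # bit falling off the bottom
            reg >>= 1

# The question's example: input 3 sent over 2 clock cycles.
bits_for_3 = list(serialise(3, width=2))
```

The shift direction is the only difference between the MSB-first and LSB-first variants, which mirrors the choice described in the answer.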
Matlab has a dec2bin function that will convert from a decimal number to a binary string. So, for example dec2bin(3) would return 11.
There's also a corresponding bin2dec which takes a binary string and converts to a decimal number, so that bin2dec('11') would return 3.
If you're wanting to convert a non-integer decimal number to a binary string, you'll first want to determine what's the smallest binary place you want to represent, and then do a little bit of pre- and post-processing, combined with dec2bin to get the results you're looking for. So, if the smallest binary place you want is the 1/512th place (or 2^-9), then you could do the following (where binPrecision equals 1/512):
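function result = myDec2Bin(decNum, binPrecision)
    % Convert a decimal number to a binary string with a fixed number
    % of fractional bits, e.g. binPrecision = 1/512 gives 9 of them.
    isNegative = (decNum < 0);
    fracBits = round(log2(1 / binPrecision));
    % Scale so the smallest binary place becomes the units bit, then round.
    % Rounding the whole value lets a carry propagate into the integer part.
    scaled = round(abs(decNum) / binPrecision);
    intPart = floor(scaled / 2^fracBits);
    scaledFracPart = mod(scaled, 2^fracBits);
    % dec2bin(d, n) pads the result with leading zeros to at least n digits.
    result = [dec2bin(intPart), '.', dec2bin(scaledFracPart, fracBits)];
    if isNegative
        result = ['-', result];
    end
end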
The result of myDec2Bin(0.256, 1/512) would then be 0.010000011, and the result of myDec2Bin(-0.984, 1/512) would be -0.111111000. (Note that the output is a string.)

Efficient synthesis of a 4-to-1 function in Verilog

I need to implement a 4-to-1 function in Verilog. The input is 4 bits, a number from 0-15. The output is a single bit, 0 or 1. Each input gives a different output and the mapping from inputs to outputs is known, but the inputs and outputs themselves are not. I want vcs to successfully optimize the code and also have it be as short/neat as possible. My solution so far:
wire [3:0] a;
wire b;
wire [15:0] c;
assign c = 16'b0100110010111010; //for example but could be any constant
assign b = c[a];
Having to declare c is ugly and I don't know if vcs will recognize the K-map there. Will this work as well as a case statement or an assignment in conjunctive normal form?
What you have is fine. A case statement would also work equally well. It's just a matter of how expressive you wish to be.
Your solution, indexing, works fine if the select encodings don't have any special meaning (a memory address selector for example). If the select encodings do have some special semantic meaning to you the designer (and there aren't too many of them), then go with a case statement and enums.
Synthesis wise, it doesn't matter which one you use. Any decent synthesis tool will produce the same result.
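That the two styles describe the same truth table can be seen with a short Python model. This is only an illustration of the equivalence, using the example constant from the question; `by_index` and `by_case` are names invented here, and the dictionary stands in for a fully enumerated case statement.

```python
C = 0b0100110010111010  # the example constant 16'b0100110010111010

def by_index(a):
    """Model of 'assign b = c[a]': pick bit a of the constant."""
    return (C >> a) & 1

# Model of a fully enumerated case statement: one entry per input value.
CASE_TABLE = {a: int(bit) for a, bit in enumerate(reversed(format(C, "016b")))}

def by_case(a):
    return CASE_TABLE[a]

equivalent = all(by_index(a) == by_case(a) for a in range(16))
```

Both forms are just a 16-entry table, so the choice between them is about readability rather than function.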
I totally agree with Dallas. Use a case statement - it makes your intent clearer. The synthesis tool will build it as a look-up table (if it's parallel) and will optimise whatever it can.
Also, I wouldn't worry so much about keeping your RTL code short. I'd shoot for clarity first. Synthesis tools are cleverer than you think...
My preference - if it makes sense for your problem - is for a case statement that makes use of enums or `defines. Anything to make code review, maintenance and verification easier.
For things like this, RTL clarity trumps all by a wide margin. SystemVerilog has special always block directives to make it clear when the block should synthesize to combinational logic, latches, or flops (and your synthesis tool should throw an error if you've written RTL that conflicts with that, e.g. by not including all signals in the sensitivity list of an always block). Also be aware that the tool will probably replace whatever encoding you have with the most hardware-efficient encoding (the one that minimizes the area of your total design), unless the encoding itself propagates out to the pins of your top-level module.
This advice goes in general, as well. Make your code easy to understand by humans, and it will probably be more understandable to the synthesis tool as well, which allows it to more effectively bring literally thousands of man-years of algorithms research to bear on your RTL.
You can also code it using ternary operators if you like, but I'd prefer something like:
always_comb // or "always @*" if you don't have an SV-enabled tool flow
begin
  case (a)
    4'b0000: b = 1'b0;
    4'b0001: b = 1'b1;
    ...
    4'b1111: b = 1'b0;
    // If you don't specify a "default" clause, your synthesis tool
    // should scream at you if you didn't specify all cases,
    // which is a good thing (tm)
  endcase // a
end // always
Apparently I am using a lousy synthesis tool. :-) I just synthesized both versions (just the module using a model based on fan-outs for wire delays) and the indexing version from the question gave better timing and area results than the case statements. Using Synopsys DC Z-2007.03-SP.

Resources