How to implement a (pseudo) hardware random number generator

How do you implement a hardware random number generator in an HDL (verilog)?
What options need to be considered?
This question follows the self-answer format. Additional answers and updates are encouraged.

As noted in Morgan's answer, an LFSR produces only a single random bit per clock. The number of bits in the LFSR only sets how many values you get before the sequence repeats. If you want an N-bit random number you have to run the LFSR for N cycles. However, if you want a new number every clock cycle, the other option is to unroll the loop and predict what the state will be in N cycles. Repeating Morgan's example below, but producing a 5-bit number each cycle:
module fibonacci_lfsr_5bit(
input clk,
input rst_n,
output reg [4:0] data
);
reg [4:0] data_next;
always @* begin
data_next[4] = data[4]^data[1];
data_next[3] = data[3]^data[0];
data_next[2] = data[2]^data_next[4];
data_next[1] = data[1]^data_next[3];
data_next[0] = data[0]^data_next[2];
end
always @(posedge clk or negedge rst_n)
if(!rst_n)
data <= 5'h1f;
else
data <= data_next;
endmodule
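As a sanity check on the unrolled equations, here is a small Python model (a sketch, not part of the original answer). It assumes the single-step rule is the shift-left form from Morgan's answer, `data <= {data[3:0], data[4] ^ data[1]}`, and compares five single steps against the five hand-derived equations for every possible state. The function names are made up for this illustration.

```python
def step(state: int) -> int:
    """One clock of the assumed 5-bit LFSR: shift left, feed data[4] ^ data[1] into bit 0."""
    feedback = ((state >> 4) ^ (state >> 1)) & 1
    return ((state << 1) & 0x1F) | feedback


def five_steps(state: int) -> int:
    """Advance the single-step model five times."""
    for _ in range(5):
        state = step(state)
    return state


def unrolled(state: int) -> int:
    """The five hand-derived equations from fibonacci_lfsr_5bit."""
    d = [(state >> i) & 1 for i in range(5)]  # d[i] models data[i]
    n = [0] * 5                               # n[i] models data_next[i]
    n[4] = d[4] ^ d[1]
    n[3] = d[3] ^ d[0]
    n[2] = d[2] ^ n[4]
    n[1] = d[1] ^ n[3]
    n[0] = d[0] ^ n[2]
    return sum(bit << i for i, bit in enumerate(n))


# States for which the two models disagree (expected to be empty).
mismatches = [s for s in range(32) if unrolled(s) != five_steps(s)]
print(mismatches)
```

If the list is empty, the unrolled logic advances the register by exactly five states per clock.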
Edit: Added a new version below which doesn't require you to do the math. Just put it in a loop and let the synthesis tool figure out the logic:
module fibonacci_lfsr_nbit
#(parameter BITS = 5)
(
input clk,
input rst_n,
output reg [4:0] data
);
reg [4:0] data_next;
always_comb begin
data_next = data;
repeat(BITS) begin
data_next = {(data_next[4]^data_next[1]), data_next[4:1]};
end
end
always_ff @(posedge clk or negedge rst_n) begin
if(!rst_n)
data <= 5'h1f;
else
data <= data_next;
end
endmodule
I would like to make the LFSR length parameterizable as well, but that is much more difficult since the feedback taps don't follow a simple pattern.

This is a TRNG (true random number generator) that works on an FPGA. It is basically an LFSR-type structure without the flip-flops, so it is a combinatorial loop that runs continuously. The signal oscillates chaotically; when you combine several of these modules and XOR their bits you get a truly random bit, since the jitter from each one combines. The maximum clock rate you can run this at depends on your FPGA, and you should test the randomness with a test suite such as diehard, dieharder, STS or TestU01.
These are called Galois Ring Oscillators (GARO). There are other TRNGs which use less power and area, but they are trickier to operate and write, usually relying on tuned delays to make a flip-flop go metastable.
module GARO (input stop,clk, reset, output random);
(* OPTIMIZE="OFF" *) //stop *xilinx* tools optimizing this away
wire [31:1] stage /* synthesis keep */; //stop *altera* tools optimizing this away
reg meta1, meta2;
assign random = meta2;
always @(posedge clk or negedge reset)
if(!reset)
begin
meta1 <= 1'b0;
meta2 <= 1'b0;
end
else if(clk)
begin
meta1 <= stage[1];
meta2 <= meta1;
end
assign stage[1] = ~&{stage[2] ^ stage[1],stop};
assign stage[2] = !stage[3];
assign stage[3] = !stage[4] ^ stage[1];
assign stage[4] = !stage[5] ^ stage[1];
assign stage[5] = !stage[6] ^ stage[1];
assign stage[6] = !stage[7] ^ stage[1];
assign stage[7] = !stage[8];
assign stage[8] = !stage[9] ^ stage[1];
assign stage[9] = !stage[10] ^ stage[1];
assign stage[10] = !stage[11];
assign stage[11] = !stage[12];
assign stage[12] = !stage[13] ^ stage[1];
assign stage[13] = !stage[14];
assign stage[14] = !stage[15] ^ stage[1];
assign stage[15] = !stage[16] ^ stage[1];
assign stage[16] = !stage[17] ^ stage[1];
assign stage[17] = !stage[18];
assign stage[18] = !stage[19];
assign stage[19] = !stage[20] ^ stage[1];
assign stage[20] = !stage[21] ^ stage[1];
assign stage[21] = !stage[22];
assign stage[22] = !stage[23];
assign stage[23] = !stage[24];
assign stage[24] = !stage[25];
assign stage[25] = !stage[26];
assign stage[26] = !stage[27] ^ stage[1];
assign stage[27] = !stage[28];
assign stage[28] = !stage[29];
assign stage[29] = !stage[30];
assign stage[30] = !stage[31];
assign stage[31] = !stage[1];
endmodule
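To see why XORing the outputs of several oscillators helps, the arithmetic can be sketched in Python. This is only an illustration under a strong assumption: the bits being combined are statistically independent, which real oscillators sharing a die may not be. The probabilities used are invented numbers, not measurements of a GARO.

```python
from functools import reduce


def xor_one_probability(probabilities):
    """P(XOR of independent bits == 1), given each bit's P(bit == 1)."""
    # For independent bits, E[(-1)^xor] is the product of (1 - 2p) over all bits.
    product = reduce(lambda acc, p: acc * (1.0 - 2.0 * p), probabilities, 1.0)
    return (1.0 - product) / 2.0


single = xor_one_probability([0.6])                  # one biased source
combined = xor_one_probability([0.6, 0.6, 0.6, 0.6])  # four such sources XORed
print(single, combined)
```

Under that independence assumption, each extra source shrinks the bias of the combined bit. A statistical test suite is still needed to judge the real hardware.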

An LFSR is often the first port of call. Implementation is relatively simple: a shift register with a number of terms XORed together to create the feedback term.
When considering the implementation of the LFSR, the bit width of the random number and the repeatability of the number need to be considered. With N bits a maximal LFSR will have (2**N) - 1 states. The all-zero state cannot be used without additional hardware.
An example 5-bit LFSR with taps at bit 4 and bit 1:
module fibonacci_lfsr(
input clk,
input rst_n,
output reg [4:0] data
);
wire feedback = data[4] ^ data[1] ;
always @(posedge clk or negedge rst_n)
if (~rst_n)
data <= 5'h1f;
else
data <= {data[3:0], feedback} ;
endmodule
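The claim about (2**N) - 1 states can be checked with a short Python model of the module above. This is a behavioural sketch rather than a simulation of the HDL. It counts how many clocks pass before a non-zero seed comes round again, and shows that the all-zero state never leaves zero.

```python
def lfsr_step(state: int) -> int:
    """Model of data <= {data[3:0], data[4] ^ data[1]} on a 5-bit register."""
    feedback = ((state >> 4) ^ (state >> 1)) & 1
    return ((state << 1) & 0x1F) | feedback


def period(seed: int) -> int:
    """Number of clocks until the register returns to its seed value."""
    state = lfsr_step(seed)
    count = 1
    while state != seed:
        state = lfsr_step(state)
        count += 1
    return count


print(period(0x1F))   # sequence length from a non-zero seed
print(lfsr_step(0))   # the all-zero state maps to itself
```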
Tap points and the resulting sequence length (the number of values before it repeats) can be found from this table.
For example a sequence of 17,820,000, 30 bits wide could use taps of :
0x20000029 => bits "100000000000000000000000101001"
0x2000005E => bits "100000000000000000000001011110"
0x20000089 => bits "100000000000000000000010001001"
The first would have a feedback term of:
feedback = data[29] ^ data[5] ^ data[3] ^ data[0];
If you are unsure of the order of the taps, remember that the MSB will always be a feedback point. The last tap (feedback point) defines the effective length of the LFSR; beyond it the register would just be a shift register with no bearing on the feedback sequence.
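Turning a tap mask from the table into a feedback expression is mechanical, so it can be scripted. The Python sketch below does that for the first mask quoted above. The helper names are made up for this example, and the register name `data` is only assumed to match the earlier module.

```python
def tap_bits(mask: int) -> list[int]:
    """Bit positions set in the tap mask, MSB first."""
    return [bit for bit in range(mask.bit_length() - 1, -1, -1) if (mask >> bit) & 1]


def feedback_expression(mask: int, name: str = "data") -> str:
    """Verilog-style XOR of the tapped bits of the register called name."""
    return " ^ ".join(f"{name}[{bit}]" for bit in tap_bits(mask))


print(tap_bits(0x20000029))
print("feedback = " + feedback_expression(0x20000029) + ";")
```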
If you needed a sequence of 69,273,666 you would have to implement a 31 bit LFSR and choose 30 bits for your random number.
LFSRs are a great way to create a 1-bit random number stream, but if you take multiple consecutive bits there is a correlation between values: each is the same number shifted plus one dither bit. If the number is being used as a dither stream you may want to introduce a mapping layer, for example swapping every other bit. Alternatively, use an LFSR of different length or tap points for each bit.
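The correlation is easy to demonstrate with a Python model of the 5-bit example (shift left, feedback `data[4] ^ data[1]`). This is a sketch under that assumption. For every clock it checks that the upper four bits of the new value are just the lower four bits of the old one.

```python
def lfsr_step(state: int) -> int:
    """Shift left by one and insert data[4] ^ data[1] at bit 0 (5-bit register)."""
    feedback = ((state >> 4) ^ (state >> 1)) & 1
    return ((state << 1) & 0x1F) | feedback


state = 0x1F
overlaps = []
for _ in range(31):
    following = lfsr_step(state)
    # Only bit 0 of the new value is new information.
    overlaps.append((following >> 1) == (state & 0x0F))
    state = following

print(all(overlaps))
```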
Further Reading
Efficient Shift Registers, LFSR Counters, and Long Pseudo-Random Sequence Generators,
A Xilinx app note by Peter Alfke.
Linear Feedback Shift Registers in Virtex Devices,
A Xilinx app note by Maria George and Peter Alfke.

Related

Why does this error in indexing BCD adder appear?

I am not sure, what exactly the error is. I think, my indexing in the for-loop is not Verilog-compatible, but I might be wrong.
Is it allowed to index like this (a[(4*i)+3:4*i]) in a for-loop just like in C/C++?
Here is a piece of my code, so the for-loop would make more sense
module testing(
input [399:0] a, b,
input cin,
output reg cout,
output reg [399:0] sum );
// bcd needs 4 bits + 1-bit carry --> 5 bits [4:0]
reg [4:0] temp_1;
always @(*) begin
for (int i = 0; i < 100; i++) begin
if (i == 0) begin // taking care of cin so the rest of the loop works smoothly
temp_1[4:0] = a[3:0] + b[3:0] + cin;
sum[3:0] = temp_1[3:0];
cout = temp_1[4];
end
else begin
temp_1[4:0] = a[(4*i)+3:4*i] + b[(4*i)+3:4*i] + cout;
sum[(4*i)+3:4*i] = temp_1[3:0];
cout = temp_1[4];
end
end
end
endmodule
This might seem obvious. I'm doing the exercises from HDLBits and got stuck on this one in particular for a long time (this solution isn't the one intended for the exercise).
Error messages Quartus:
Error (10734): Verilog HDL error at testing.v(46): i is not a constant File: ../testing.v Line: 46
Error (10734): Verilog HDL error at testing.v(47): i is not a constant File: ../testing.v Line: 47
But I tried the same way in indexing and got the same error
The error appears because Verilog does not allow variables at both indices of a part select (bus slice indexes).
The most dynamic thing that can be done involves the indexed part select.
Here is a related but not duplicate What is `+:` and `-:`? SO question.
Variations of this question are common on SO and other programmable logic design forums.
I took your example and used the -: operator rather than the : and changed the RHS of this to a constant. This version compiles.
module testing(
input [399:0] a, b,
input cin,
output reg cout,
output reg [399:0] sum );
// bcd needs 4 bits + 1-bit carry --> 5 bits [4:0]
reg [4:0] temp_1;
always @(*) begin
for (int i = 0; i < 100; i++) begin
if (i == 0) begin // taking care of cin so the rest of the loop works smoothly
temp_1[4:0] = a[3:0] + b[3:0] + cin;
sum[3:0] = temp_1[3:0];
cout = temp_1[4];
end
else begin
temp_1[4:0] = a[(4*i)+3-:4] + b[(4*i)+3-:4] + cout;
sum[(4*i)+3-:4] = temp_1[3:0];
cout = temp_1[4];
end
end
end
endmodule
The code will not behave as you wanted it to using the indexed part select.
You can use other operators that are more dynamic to create the behavior you need.
For example shifting, and masking.
Recommend you research what others have done, then ask again if it still is not clear.
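As a rough illustration of the shifting-and-masking idea, here is a Python model of the loop. It mirrors the question's code as written: each 4-bit group is added in plain binary with a carry, with no decimal correction. The helper names are invented for this sketch.

```python
def get_digit(value: int, i: int) -> int:
    """Shift-and-mask equivalent of value[4*i +: 4]."""
    return (value >> (4 * i)) & 0xF


def set_digit(value: int, i: int, digit: int) -> int:
    """Write a 4-bit group back into position i."""
    shift = 4 * i
    return (value & ~(0xF << shift)) | ((digit & 0xF) << shift)


def add_groups(a: int, b: int, cin: int, digits: int = 100):
    """Ripple through the 4-bit groups, carrying between them like temp_1/cout."""
    total = 0
    carry = cin
    for i in range(digits):
        temp = get_digit(a, i) + get_digit(b, i) + carry  # fits in 5 bits
        total = set_digit(total, i, temp & 0xF)
        carry = temp >> 4
    return total, carry


print(add_groups(0x19, 0x28, 1, digits=2))
```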

VHDL Integer Range Output Bus Width

I'm currently working on writing a simple counter in VHDL, trying to genericize it as much as possible. Ideally I end up with a counter that can pause, count up/down, and take just two integer (min, max) values to determine the appropriate bus widths.
As far as I can tell, in order to get an integer of a given range, I just need to declare
VARIABLE cnt: INTEGER RANGE min TO max := 0
Where min and max are defined as generics (both integers) in the entity. My understanding of this is that if min is 0, max is 5, for example, it will create an integer variable of 3 bits.
My problem is that I actually want to output this integer. So, naturally, I write
counterOut : OUT INTEGER RANGE min TO max
But this does not appear to be doing what I need. I'm generating a schematic block in Quartus Prime from this, and it creates a bus output from [min...max]. For example, if min = 0, max = 65, it outputs a 66 bit bus. Instead of the seven bit bus it should.
If I restricted the counter to unsigned values I might be able to just math out the output bus size, but I'd like to keep this as flexible as possible, and of course I'd like to know what I'm actually doing wrong and how to do it properly.
TL;DR: I want a VHDL entity to take generic min,max values, and generate an integer output bus of the required width to hold the range of values. How do?
If it matters, I'm using Quartus Prime Lite Edition V20.1.0 at the moment.
Note: I know I can use STD_LOGIC_VECTOR instead, but it is going to simulate significantly slower and is less easy to use than the integer type as far as I have read. I can provide more of my code if necessary, but it's really this one line that's the problem as far as I can tell.
I originally posted this on Stackexchange, but I think Stackoverflow might be a better place since it's more of a programming than a hardware problem.
EDIT: Complete code shown below
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.numeric_std.all;
USE ieee.std_logic_signed.all;
ENTITY Counter IS
GENERIC (modulo : INTEGER := 32;
min : INTEGER := 0;
max : INTEGER := 64);
PORT( pause : IN STD_LOGIC;
direction : IN STD_LOGIC; -- 1 is up, 0 is down
clk : IN STD_LOGIC;
counterOut : OUT INTEGER RANGE min TO max --RANGE 0 TO 32 -- THIS line is the one generating an incorrect output bus width
);
END ENTITY Counter;
-- or entity
ARCHITECTURE CounterArch OF Counter IS
BEGIN
PROCESS(direction, pause, clk)
VARIABLE cnt : INTEGER RANGE min TO max := 0;
VARIABLE dir : INTEGER;
BEGIN
IF direction = '1' THEN
dir := 1;
ELSE
dir := -1;
END IF;
IF clk'EVENT AND clk = '1' THEN
IF pause = '0' THEN
IF (cnt = modulo AND direction = '1') THEN
cnt := min; -- If we're counting up and hit modulo, reset to min value.
ELSIF (cnt = min AND direction = '0') THEN
cnt := modulo; --Counting down hit 0, go back to modulo.
ELSE
cnt := cnt + dir;
END IF;
END IF;
END IF;
counterOut <= cnt;
END PROCESS;
END ARCHITECTURE CounterArch;
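The bus-width arithmetic mentioned in the question can be written down in a few lines. The Python sketch below computes how many bits a given integer range needs, assuming an unsigned encoding when min is non-negative and two's complement otherwise. It says nothing about how Quartus draws integer ports, and the function name is made up.

```python
def bus_width(lo: int, hi: int) -> int:
    """Bits needed to hold every integer in lo..hi (inclusive)."""
    if lo > hi:
        raise ValueError("empty range")
    if lo >= 0:
        # Unsigned: enough bits for the largest value, at least one.
        return max(1, hi.bit_length())
    # Two's complement: one sign bit plus the larger magnitude requirement.
    positive_bits = hi.bit_length() if hi > 0 else 0
    negative_bits = (-lo - 1).bit_length()
    return 1 + max(positive_bits, negative_bits)


print(bus_width(0, 5), bus_width(0, 65), bus_width(-8, 7))
```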

Unsigned multiplication creates a x2 sized array

I'm trying to create a Shift Register, by using multiplication (*2) to shift bits one position.
However, when I do it, ISE (Xilinx IDE) tells me that this expression has twice the number of elements of the original signal.
To be specific, I've:
if rising_edge(clk) then
registro <= unsigned(sequence);
registro <= registro * 2;
-- Just adds into the last position the new bit, Sin (signal input)
registro <= registro or (Sin, others => '0');
sequence <= std_logic_vector(registro);
end if;
And before, I've declared:
signal Sin : std_logic;
signal sequence : std_logic_vector(0 to 14) := "100101010000000";
signal registro : unsigned (0 to 14);
So I'm getting the error (at multiplication line):
Expression has 30 elements ; expected 15
So, why does it create a double-sized vector if I've only multiplied by 2?
What am I missing? How can I accomplish it?
Thank you in advance
Word width grows because you have used multiplication.
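Multiplying two 16-bit unsigned numbers gives you a 32-bit unsigned, in general. With ieee.numeric_std, the integer 2 is first converted to an unsigned of the same length as registro (15 bits), so the product has 30 elements.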
Now it would be possible to optimise your specific case of multiplication by a constant, 2, and have synthesis do the correct thing. In which case the error message would change to
Expression has 16 elements ; expected 15
but why should the synthesis tool bother?
Use a left shift instead, either using a left (right?) shift operator, or explicit slicing and concatenation, for example:
registro <= registro(1 to registro'length-1) & '0';
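Both points can be illustrated with a little Python. This is a sketch, not VHDL semantics in full. It shows how wide the product of two 15-bit values can get, and models the slice-and-concatenate shift on a string whose index 0 is the leftmost element, as with a `0 to 14` range.

```python
WIDTH = 15

# Largest product of two 15-bit unsigned values and the bits it needs.
max_value = (1 << WIDTH) - 1
product_bits = (max_value * max_value).bit_length()


def shift_toward_index_0(bits: str) -> str:
    """Model of registro(1 to registro'length-1) & '0'."""
    return bits[1:] + "0"


print(product_bits)
print(shift_toward_index_0("100101010000000"))
```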
Incidentally:
Using ascending bit order range is quite unconventional for arithmetic : all I can say is good luck with that...
you have three assignments to the same signal within the same process; only the last one will take effect. (See Is process in VHDL reentrant? for some information on the semantics of signal assignment)
If you declared "sequence" as unsigned in the first place you'd save a lot of unnecessary conversions and the code inside the process would reduce to a single statement, something like
sequence <= ('0' & sequence(0 to sequence'length-2)) or
(0 => Sin, others => '0') when rising_edge(clk);
I am utterly unfamiliar with "wrong way round" arithmetic so I cannot vouch that the shifts actually do what you want.

LSFR counter for random number

module LSFR_counter
#(parameter BITS = 5)
(
input clk,
input rst_n,
output reg [4:0] data
);
reg [4:0] data_next;
always @* begin
data_next[4] = data[4]^data[1];
data_next[3] = data[3]^data[0];
data_next[2] = data[2]^data_next[4];
data_next[1] = data[1]^data_next[3];
data_next[0] = data[0]^data_next[2];
end
always_ff @(posedge clk or negedge rst_n) begin
if(!rst_n)
data <= 5'h1f;
else
data <= data_next;
end
endmodule
This is the code for a 5-bit LFSR. I want to implement an N-bit random number generator for an FPGA board.
N is normally reserved for the state of the LFSR, M would be good to use for the number of random bits we wish to generate.
A standard LFSR generates 1 bit of random data per clock. If consecutive bits of the LFSR are used they can be highly correlated, especially if taking a multi-bit value every clock cycle. To remove this correlation we can overclock the LFSR, say 4 times, to generate 4 bits. The alternative is to calculate the equations (feedback polynomials) that you would get for each bit, so that on every clock the internal state (as represented by the N bits of the LFSR) moves forward 4 steps. Both techniques, overclocking or creating the feedback taps that move the state forward more than 1 step, are known as leap-forward.
The code example in the question has been taken from a previous question and answer, this is an example of manually creating the extra feedback for a leap-forward lfsr.
The maths to do this can be done by generating the transition matrix and raising to the power of the number of steps we wish to move forward.
Quick 4-bit LFSR example, with transition matrix a:
a =
0 1 0 0
0 0 1 0
0 0 0 1
1 0 0 1
Feedback is XOR of the first and last bit, seen on last row of the matrix. All other rows are just a single shift. The output of this LFSR is good for one bit. Two bits would suffer from a high correlation, unless it was overclocked.
>> a^2
ans =
0 0 1 0
0 0 0 1
1 0 0 1
1 1 0 1
If we want two bits we need to square the transition matrix. It can be seen that the first two rows are a shift of two places and we require feedback for two places, ie we are moving the LFSR forward two states for every clock.
Just for confirmation if we wanted three bits:
a^3
ans =
0 0 0 1
1 0 0 1
1 1 0 1
1 1 1 1
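The matrix powers above can be reproduced with a few lines of Python, doing the arithmetic modulo 2 since the entries stand for XOR terms. This is only a sketch; the helper names are invented for the illustration.

```python
def mat_mul_gf2(x, y):
    """Matrix product with every entry reduced modulo 2 (XOR arithmetic)."""
    size = len(x)
    return [
        [sum(x[r][k] & y[k][c] for k in range(size)) % 2 for c in range(size)]
        for r in range(size)
    ]


def mat_pow_gf2(m, power):
    """Raise m to a non-negative integer power over GF(2)."""
    size = len(m)
    result = [[int(r == c) for c in range(size)] for r in range(size)]
    for _ in range(power):
        result = mat_mul_gf2(result, m)
    return result


A = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]

for row in mat_pow_gf2(A, 2):
    print(row)
```

Each row of the result lists which current state bits are XORed to form one bit of the state after that many steps.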
The second code example in the previous question went on to parameterise the code so the leap forward calculations did not have to be manually created, skipping all of that lovely maths! However the approach used meant it could not be fully parameterised. Therefore I would like to revisit the example I gave for that question:
module fibonacci_lfsr(
input clk,
input rst_n,
output reg [4:0] data
);
wire feedback = data[4] ^ data[1] ;
always @(posedge clk or negedge rst_n)
if (~rst_n)
data <= 5'h1f;
else
data <= {data[3:0], feedback} ;
endmodule
Now we want to parameterise it:
module fibonacci_lfsr#(
parameter POLYNOMIAL = 4'h9
)(
input clk,
input rst_n,
output reg [4:0] data
);
reg feedback;
//AND data with POLYNOMIAL; this
// selects only the taps in the polynomial to be used.
// ^( ) performs a XOR reduction to 1 bit
always @* begin
feedback = ^( POLYNOMIAL & data);
end
//Resetting to 0 is easier
// Invert feedback, all 1's state is banned instead of all 0's
always @(posedge clk or negedge rst_n)
if (~rst_n)
data <= 'b0;
else
data <= {data[3:0], ~feedback};
endmodule
A small step now: just bring the shift outside of the synchronous block to help with the step after.
always @* begin
data_next = data;
feedback = ^( POLYNOMIAL & data);
data_next = {data_next[3:0], ~feedback} ; //<- Shift and feedback
end
always @(posedge clk or negedge rst_n)
if (~rst_n)
data <= 'b0;
else
data <= data_next;
TL;DR
Now to control the leap-forward iterations, let the tools do the heavy lifting of multiplying the transition matrix.
module fibonacci_lfsr#(
parameter POLYNOMIAL = 4'h9,
parameter N = 4,
parameter BITS = 2
)(
input clk,
input rst_n,
output [BITS-1:0] random
);
reg [N-1:0] data;
reg [N-1:0] data_next;
reg feedback;
assign random = data[N-1:N-BITS];
always @* begin
data_next = data;
// Compiler unrolls the loop, calculating the transition matrix
for (int i=0; i<BITS; i++) begin
feedback = ^( POLYNOMIAL & data_next);
data_next = {data_next[N-2:0], ~feedback} ;
end
end
always @(posedge clk or negedge rst_n)
if (~rst_n)
data <= 'b0;
else
data <= data_next;
endmodule
Example on EDA Playground.
i++ is part of SystemVerilog. If you can only synthesize plain (pre-2009) Verilog then you will need to declare i as an integer and use i = i+1 in the for loop.
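A quick behavioural model in Python can confirm what the leap-forward module does with its default parameters (POLYNOMIAL = 4'h9, N = 4, BITS = 2). This is a sketch of the combinational loop and reset value only, not a replacement for simulating the HDL.

```python
POLYNOMIAL = 0x9
N = 4
BITS = 2
MASK = (1 << N) - 1


def leap(data: int) -> int:
    """Model of the combinational block: BITS shift-and-feedback steps per clock."""
    data_next = data
    for _ in range(BITS):
        feedback = bin(POLYNOMIAL & data_next).count("1") & 1  # XOR reduction
        data_next = ((data_next << 1) & MASK) | (feedback ^ 1)  # inverted feedback
    return data_next


def random_bits(data: int) -> int:
    """Model of assign random = data[N-1:N-BITS]."""
    return data >> (N - BITS)


# Walk the sequence from the reset value until a state repeats.
state = 0
seen = []
while state not in seen:
    seen.append(state)
    state = leap(state)

print(len(seen))            # number of distinct states visited from reset
print(leap(MASK) == MASK)   # the all-ones state is the lock-up state
```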
If you want to implement an N-bit LFSR, then because each length of LFSR has a different polynomial, and hence a different set of taps to XOR to produce the next LFSR value, you will need constants or a lookup table describing the different tap points, which the design could then use, based on 'BITS'.
A simpler way to do it might be to implement say a 32-bit LFSR, then use the least significant N bits of this as your output. This has the added benefit of increasing the repetition period for anything but the maximum length LFSR, giving better randomness in these cases.
If you're going for the first option, look at whether using the Fibonacci form instead of the Galois form will make the design more conducive to parametrization in this way. I can't quite work out which form you are using in your 5-bit example.
I'm a VHDL guy*, so I can't give Verilog code, but VHDL-like-pseudocode (untested) might look like this:
constant TAPS_TABLE : TAPS_TABLE_type := (
"00000011",
"00000110",
...
);
for i in 0 to BITS-2 loop
if (TAPS_TABLE(BITS-2)(i) = '1') then
data_next(i) <= data(0) xor data(i+1);
else
data_next(i) <= data(i+1);
end if;
end loop;
This would support BITS being between 2 and 8 inclusive, assuming the table was completed. The constant TAPS_TABLE would be optimised away during synthesis, leaving you with something no less resource-hungry than a manually coded LFSR.
* This question originally had a 'VHDL' tag.
In Addition to the previous answers:
Years ago, Xilinx wrote a good AppNote on how to implement 'pseudo random number generators' (PRNGs). The AppNote has a TAP table for n = 3..168. The TAP table is optimized to allow the usage of shift registers. So a PRNG with n=32 does not use 32 single FFs.
Efficient Shift Registers, LFSR Counters, and Long Pseudo-Random Sequence Generators, Xilinx [XAPP 052] [1996.07.07]

vhdl code (for loop)

Description:
I want to write vhdl code that finds the largest integer in the array A which is an array of 20 integers.
Question:
what should my algorithm look like, to input where the sequential statements are?
my vhdl code:
highnum: for i in 0 to 19 loop
i = 0;
i < 20;
i<= i + 1;
end loop highnum;
This does not need to be synthesizable, but I don't know how to form this for loop. A detailed example explaining how to would be appreciated.
Simply translating the C loop to VHDL, inside a VHDL clocked process, will work AND be synthesisable. It will generate a LOT of hardware because it has to generate the output in a single clock cycle, but that doesn't matter if you are just simulating it.
If that is too much hardware, then you have to implement it as a state machine with at least two states, Idle and Calculating, so that it performs only one loop iteration per clock cycle while Calculating, and returns to the Idle state when done.
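The two-state approach can be pictured with a small Python model that does one comparison per call to `clock()`. It is a behavioural sketch only. The class, state and signal names are invented for the illustration, and a VHDL version would use a clocked process with an enumerated state type.

```python
class MaxFinder:
    """Two-state machine: IDLE waits for start, CALCULATING does one compare per clock."""

    def __init__(self, values):
        self.values = list(values)
        self.state = "IDLE"
        self.index = 0
        self.maximum = None
        self.done = False

    def clock(self, start: bool = False) -> None:
        if self.state == "IDLE":
            if start:
                self.maximum = self.values[0]
                self.index = 0
                self.done = False
                self.state = "CALCULATING"
        else:
            # One loop iteration per clock cycle.
            if self.values[self.index] > self.maximum:
                self.maximum = self.values[self.index]
            self.index += 1
            if self.index == len(self.values):
                self.done = True
                self.state = "IDLE"


values = [3, 17, -4, 9] + list(range(16))  # 20 example integers
finder = MaxFinder(values)
finder.clock(start=True)
cycles = 0
while not finder.done:
    finder.clock()
    cycles += 1

print(finder.maximum, cycles)
```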
First of all, you need to know how the array is defined in VHDL.
Let me define an array for you.
type array_of_integer is array (19 downto 0) of integer;
signal A : array_of_integer := (others => 0);
signal max : integer;
-- The array of integers above is initialized to all zeros.
A(0) <= 1;
A(1) <= 2;
--
--
A(19) <= 19;
-- The for loop that calculates the maximum goes inside a process.
-- A variable holds the running maximum, because a signal assigned
-- inside the loop would not update until the process suspends.
process (A)
variable v_max : integer;
begin
v_max := A(0);
for i in 0 to 19 loop
if (A(i) > v_max) then
v_max := A(i);
end if;
end loop;
max <= v_max;
end process;
-- If you have trouble working out where each part of the code goes in a VHDL entity (process, ports, etc.), you can reply!
