I'm trying to store a 2D array of 8-bit integers (values 0 to 4) by feeding in its elements row by row as one flat 1D vector and then reading the values back out of that vector.
My procedure is as follows:
1. Initialize a 2048-bit (8*16*16) 1D vector (Row1 in the code) in the test bench as the input.
2. Slice the 1D vector every 8 bits and assign each 8-bit number to an element of the 2D array.
3. Use another 1D vector (Row2 in the code) to observe the final result, because an array cannot be used as a module output port.
So I'm effectively turning a 1D vector of 256 8-bit elements into a 2D array of 16*16 8-bit elements.
The problem is that after running the simulation,
most of the elements in the 2D array appear to be in a high-Z state,
while only the last of them have been assigned their new values correctly.
Can anyone explain what's going on and how I can fix it?
To be clear, here is my Verilog code:
`timescale 1ns / 1ps
module convPE(
input clk,
input reset,
input [2048:1] Row1,
output [2048:1] Row2
);
wire [7:0] arr[17:0][17:0];
generate
genvar i,j;
for(i=16;i>=1;i=i-1)
begin:gen1
for(j=16;j>=1;j=j-1)
begin:gen2
assign arr[i][j]=Row1[(8*i*j) -: 8];
assign Row2[(8*i*j) -: 8]=arr[i][j];
end
end
endgenerate
endmodule
And here is the test bench :
`timescale 1ns / 1ps
module testbench;
// Inputs
reg [2048:1] Row1;
reg Clk;
reg Reset;
wire [2048:1] Row2;
convPE uut (
.clk(Clk),
.reset(Reset),
.Row1(Row1),
.Row2(Row2)
);
initial begin
// Initialize Inputs
Row1=2048'd0;
Row1[1784:1777]=8'd1;//1
Row1[1584:1577]=8'd1;
Row1[944:937]=8'd1;
Row1[376:369]=8'd1;
//2
Row1[1720:1713]=8'd2;
Row1[1600:1593]=8'd2;
Row1[1488:1481]=8'd2;
Row1[1480:1473]=8'd2;
Row1[1368:1361]=8'd2;
Row1[1344:1337]=8'd2;
Row1[1336:1329]=8'd2;
Row1[1120:1113]=8'd2;
Row1[1112:1105]=8'd2;
Row1[1080:1073]=8'd2;
Row1[1072:1065]=8'd2;
Row1[1056:1049]=8'd2;
Row1[984:977]=8'd2;
Row1[936:929]=8'd2;
Row1[856:849]=8'd2;
Row1[808:801]=8'd2;
Row1[728:721]=8'd2;
Row1[680:673]=8'd2;
Row1[608:601]=8'd2;
Row1[592:585]=8'd2;
Row1[584:577]=8'd2;
Row1[576:569]=8'd2;
Row1[568:561]=8'd2;
Row1[560:553]=8'd2;
Row1[544:537]=8'd2;
Row1[472:465]=8'd2;
Row1[424:417]=8'd2;
Row1[416:409]=8'd2;
//3
Row1[1712:1705]=8'd3;
Row1[1592:1585]=8'd3;
Row1[1472:1465]=8'd3;
Row1[1360:1353]=8'd3;
Row1[1352:1345]=8'd3;
Row1[1240:1233]=8'd3;
Row1[1208:1201]=8'd3;
Row1[1200:1193]=8'd3;
Row1[1064:1057]=8'd3;
Row1[992:985]=8'd3;
Row1[928:921]=8'd3;
Row1[864:857]=8'd3;
Row1[736:729]=8'd3;
Row1[600:593]=8'd3;
Row1[464:457]=8'd3;
Row1[456:449]=8'd3;
Row1[448:441]=8'd3;
Row1[440:433]=8'd3;
Row1[432:425]=8'd3;
//4
Row1[800:793]=8'd4;
Row1[672:665]=8'd4;
Row1[552:545]=8'd4;
#100
Reset=1'b1;
#100
Reset=1'b0;
Clk=1'b1;
// Add stimulus here
end
always
#50 Clk=~Clk;
endmodule
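The index (8*i*j) does not work. It is a product, not a linear address, so several (i, j) pairs select the same slice and many slices of Row2 are never driven, which is why they read as Z. With two nested loops, i must advance in steps of 16 (the size of the inner loop). With zero-based indices, use a slice base of the form 8*(i*16+j).
Your code is also inconsistent in that you sometimes use 0 and sometimes 1 as the lowest index. I suggest you make all your arrays and vectors start from 0, e.g. [2047:0]. That is the usual Verilog convention.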
I have converted your code using the Verilog conventions I use. I also removed all superfluous signals like clock and reset. With the following code there are no X-es or Z-es in either Row2 or in arr.
`timescale 1ns / 1ps
module convPE(
input [2047:0] Row1,
output [2047:0] Row2
);
wire [7:0] arr[15:0][15:0];
generate
genvar i,j;
for(i=0; i<16; i=i+1)
begin:gen1
for(j=0; j<16; j=j+1)
begin:gen2
assign arr[i][j]=Row1[(8*(i*16+j)) +: 8];
assign Row2[(8*(i*16+j)) +: 8] =arr[i][j];
end
end
endgenerate
endmodule
`timescale 1ns / 1ps
module testbench;
// Inputs
reg [2047:0] Row1;
wire [2047:0] Row2;
convPE uut (
.Row1(Row1),
.Row2(Row2)
);
initial begin
#100; // I want to see X-es first
// Initialize Inputs
Row1=2048'd0;
#100;
$stop;
end
endmodule
The reason I use my method is because it is the standard way of mapping N-dimensional arrays of a certain type onto memory (which is linear) like e.g. C compilers do.
You can use 2048:1 but then you have to think much harder how to convert the indexes to a one-dimensional array. Probably replace the i and j in my formula with something like i-1,j-1.
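To see why the product index fails, here is a quick Python model of the slice bases (not HDL, just the arithmetic). The one-based formula is my assumption of what "replace i and j with i-1, j-1" would look like for a [2048:1] vector with "-: 8"; the function names are made up for this sketch.

```python
# Model of the slice bases used by the generate loops (one base per 8-bit element).

def bases_product():
    # Original indexing: Row1[(8*i*j) -: 8] with i, j in 1..16
    return [8 * i * j for i in range(1, 17) for j in range(1, 17)]

def bases_linear_zero_based():
    # Row-major mapping on [2047:0]: Row1[(8*(i*16+j)) +: 8] with i, j in 0..15
    return [8 * (i * 16 + j) for i in range(16) for j in range(16)]

def bases_linear_one_based():
    # Assumed equivalent for [2048:1] with "-: 8": the MSB of element (i, j), i, j in 1..16
    return [8 * (16 * (i - 1) + (j - 1)) + 8
            for i in range(1, 17) for j in range(1, 17)]

if __name__ == "__main__":
    # 256 elements need 256 distinct slices; the product form yields fewer.
    print("distinct slices, 8*i*j:     ", len(set(bases_product())))
    print("distinct slices, zero-based:", len(set(bases_linear_zero_based())))
    print("distinct slices, one-based: ", len(set(bases_linear_one_based())))
```

Because i*j is symmetric in i and j, the product form can reach at most 136 of the 256 slices, so the remaining ones are left undriven.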
Related
In VHDL I have created the following package:
--! Custom, 8 bit register map package
package regmap_package is
--! Amount of registers in the array
constant reg_nr : natural := 8;
--! The 8bit register map array type
type regmap_t is array(0 to (reg_nr - 1)) of std_logic_vector(7 downto 0);
end package regmap_package;
Using this package one has access to a new type regmap_t, which is a 2d array of size reg_nr x 8 bits.
In VHDL, I cannot figure out how to access a single bit in this array. I was only able to access whole "registers", that is, all 8 bits at once. I am trying to get something like this:
some_signal <= regmap_var(0,1);
This is meant to access the 2nd bit (bit number 1, counting from 0) of the 1st register (nr 0).
This is not a 2D array; it is a 1D array whose elements are themselves 1D arrays. Therefore you need to put each index in its own ():
some_sl_signal <= regmap_var(0)(1);
module LSFR_counter
#(parameter BITS = 5)
(
input clk,
input rst_n,
output reg [4:0] data
);
reg [4:0] data_next;
always @* begin
data_next[4] = data[4]^data[1];
data_next[3] = data[3]^data[0];
data_next[2] = data[2]^data_next[4];
data_next[1] = data[1]^data_next[3];
data_next[0] = data[0]^data_next[2];
end
always_ff @(posedge clk or negedge rst_n) begin
if(!rst_n)
data <= 5'h1f;
else
data <= data_next;
end
endmodule
This is the code for an LFSR producing a 5-bit number. I want to implement an N-bit random number generator for an FPGA board.
N is normally reserved for the number of state bits in the LFSR; M would be a good name for the number of random bits we wish to generate.
A standard LFSR generates 1 bit of random data per clock. If consecutive bits of the LFSR state are used together they can be highly correlated, especially when a multi-bit value is taken every clock cycle. To remove this correlation we can overclock the LFSR, say 4 times, to generate 4 bits. The alternative is to calculate the equations (feedback polynomials) you would get for each bit, so that on every clock the internal state (the N bits of the LFSR) moves forward 4 steps. Both techniques, overclocking and creating feedback taps that move the state forward more than 1 step, are known as leap-forward.
The code example in the question was taken from a previous question and answer; it is an example of manually creating the extra feedback for a leap-forward LFSR.
The maths to do this can be done by generating the transition matrix and raising it to the power of the number of steps we wish to move forward.
Quick 4-bit LFSR example, with transition matrix a:
a =
0 1 0 0
0 0 1 0
0 0 0 1
1 0 0 1
Feedback is XOR of the first and last bit, seen on last row of the matrix. All other rows are just a single shift. The output of this LFSR is good for one bit. Two bits would suffer from a high correlation, unless it was overclocked.
>> a^2
ans =
0 0 1 0
0 0 0 1
1 0 0 1
1 1 0 1
If we want two bits we need to square the transition matrix. It can be seen that the first two rows are a shift of two places and we require feedback for two places, ie we are moving the LFSR forward two states for every clock.
Just for confirmation if we wanted three bits:
a^3
ans =
0 0 0 1
1 0 0 1
1 1 0 1
1 1 1 1
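If you want to reproduce these matrices without a maths package, a minimal Python sketch is below. It assumes the arithmetic is over GF(2) (add = XOR, multiply = AND), which is what an XOR-feedback LFSR implies; the helper names are my own.

```python
def mat_mul_gf2(x, y):
    """Multiply two square 0/1 matrices with addition taken modulo 2."""
    n = len(x)
    return [[sum(x[r][k] & y[k][c] for k in range(n)) % 2 for c in range(n)]
            for r in range(n)]

def mat_pow_gf2(m, power):
    """Raise a square 0/1 matrix to a non-negative integer power over GF(2)."""
    n = len(m)
    result = [[1 if r == c else 0 for c in range(n)] for r in range(n)]  # identity
    for _ in range(power):
        result = mat_mul_gf2(result, m)
    return result

# Transition matrix of the 4-bit example: three shift rows plus one feedback row.
A = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]

if __name__ == "__main__":
    for steps in (2, 3):
        print("a^%d =" % steps)
        for row in mat_pow_gf2(A, steps):
            print(" ".join(str(v) for v in row))
```

Each row of a^k that is no longer a pure shift is one feedback equation you would have to write by hand for a k-step leap-forward.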
The second code example in the previous question went on to parameterise the code so the leap forward calculations did not have to be manually created, skipping all of that lovely maths! However the approach used meant it could not be fully parameterised. Therefore I would like to revisit the example I gave for that question:
module fibonacci_lfsr(
input clk,
input rst_n,
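output reg [4:0] data // assigned in an always block, so it must be a reg
);
wire feedback = data[4] ^ data[1] ;
always @(posedge clk or negedge rst_n)
if (~rst_n)
data <= 5'h1f; // all ones; the all-zero state would lock up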
else
data <= {data[3:0], feedback} ;
endmodule
Now we want to parameterise it:
module fibonacci_lfsr#(
parameter POLYNOMIAL = 4'h9
)(
input clk,
input rst_n,
output reg [4:0] data
);
reg feedback;
//AND data with POLYNOMIAL; this
// selects only the taps in the polynomial to be used.
// ^( ) performs a XOR reduction to 1 bit
always @* begin
feedback = ^( POLYNOMIAL & data);
end
//Resetting to 0 is easier
// Invert feedback, all 1's state is banned instead of all 0's
always @(posedge clk or negedge rst_n)
if (~rst_n)
data <= 'b0;
else
data <= {data[3:0], ~feedback};
endmodule
A small step now: just bring the shift outside of the synchronous block to help with the step after.
always @* begin
data_next = data;
feedback = ^( POLYNOMIAL & data);
data_next = {data_next[3:0], ~feedback} ; //<- Shift and feedback
end
always @(posedge clk or negedge rst_n)
if (~rst_n)
data <= 'b0;
else
data <= data_next;
TL;DR
Now to control the leap-forward iterations, let the tools do the heavy lifting of multiplying the transition matrix.
module fibonacci_lfsr#(
parameter POLYNOMIAL = 4'h9,
parameter N = 4,
parameter BITS = 2
)(
input clk,
input rst_n,
output [BITS-1:0] random
);
reg [N-1:0] data;
reg [N-1:0] data_next;
reg feedback;
assign random = data[N-1:N-BITS];
always @* begin
data_next = data;
// Compiler unrolls the loop, calculating the transition matrix
for (int i=0; i<BITS; i++) begin
feedback = ^( POLYNOMIAL & data_next);
data_next = {data_next[N-2:0], ~feedback} ;
end
end
always @(posedge clk or negedge rst_n)
if (~rst_n)
data <= 'b0;
else
data <= data_next;
endmodule
Example on EDA Playground.
i++ is part of SystemVerilog. If you can only synthesise plain (pre-2009) Verilog then you will need to declare i as an integer and use i = i+1 in the for loop.
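As a sanity check of the loop-based leap-forward idea, here is a small Python model of the parameterised module with its defaults (POLYNOMIAL = 4'h9, N = 4, BITS = 2). It is only a software sketch of the same shift-and-invert step, not a replacement for simulation, and the function names are my own.

```python
def lfsr_step(data, polynomial=0x9, n=4):
    """One shift: XOR-reduce the tapped bits, invert, and shift the result in at the LSB."""
    feedback = bin(polynomial & data).count("1") & 1
    mask = (1 << n) - 1
    return ((data << 1) | (feedback ^ 1)) & mask

def lfsr_leap(data, bits=2, polynomial=0x9, n=4):
    """Apply the single step 'bits' times, as the unrolled for loop does per clock."""
    for _ in range(bits):
        data = lfsr_step(data, polynomial, n)
    return data

def period(start, step):
    """Number of applications of 'step' needed to return to 'start'."""
    state, count = step(start), 1
    while state != start:
        state = step(state)
        count += 1
    return count

if __name__ == "__main__":
    print("single-step period from reset:", period(0, lfsr_step))
    print("leap-forward period from reset:", period(0, lfsr_leap))
    print("all-ones is a lock-up state:", lfsr_step(0xF) == 0xF)
```

With the inverted feedback, the reset value 0 is a legal state and all-ones is the banned one, which matches the comment in the module.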
If you want to implement an N-bit LFSR, then because each length of LFSR has a different polynomial, and hence a different set of taps to XOR to produce the next LFSR value, you will need constants or a lookup table describing the different tap points, which the design could then use, based on 'BITS'.
A simpler way to do it might be to implement say a 32-bit LFSR, then use the least significant N bits of this as your output. This has the added benefit of increasing the repetition period for anything but the maximum length LFSR, giving better randomness in these cases.
If you're going for the first option, look at whether using the Fibonacci form instead of the Galois form will make the design more conducive to parametrization in this way. I can't quite work out which form you are using in your 5-bit example.
I'm a VHDL guy*, so I can't give Verilog code, but VHDL-like-pseudocode (untested) might look like this:
constant TAPS_TABLE : TAPS_TABLE_type := (
"00000011",
"00000110",
...
);
for i in 0 to BITS-2 loop
if (TAPS_TABLE(BITS-2)(i) = '1') then
data_next(i) <= data(0) xor data(i+1);
else
data_next(i) <= data(i+1);
end if;
end loop;
This would support BITS being between 2 and 8 inclusive, assuming the table was completed. The constant TAPS_TABLE would be optimised away during synthesis, leaving you with something no less resource-hungry than a manually coded LFSR.
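The per-bit rule in that pseudocode (shift right, and XOR the outgoing bit into the tapped positions) can be modelled in a few lines of Python. This is a sketch under my own assumptions: the top bit is assumed to take data(0), and the 4-bit toggle mask 0xC is an illustrative value, not an entry from the table above.

```python
def galois_step(state, toggle_mask):
    """Shift right by one; if the bit shifted out was 1, flip the tapped bits."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= toggle_mask
    return state

def galois_period(start, toggle_mask):
    """Count steps until the register returns to its starting value."""
    state, count = galois_step(start, toggle_mask), 1
    while state != start:
        state = galois_step(state, toggle_mask)
        count += 1
    return count

if __name__ == "__main__":
    # Assumed example: 4-bit register, mask with bit 3 (top bit) and bit 2 set.
    print("period:", galois_period(1, 0xC))
```

A lookup table indexed by register width would then only need to hold one such mask per supported width.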
* This question originally had a 'VHDL' tag.
In addition to the previous answers:
Years ago, Xilinx wrote a good AppNote on how to implement 'pseudo random number generators' (PRNGs). The AppNote has a tap table for n = 3..168. The tap table is optimized to allow the use of shift registers, so a PRNG with n=32 does not need 32 individual FFs.
Efficient Shift Registers, LFSR Counters, and Long Pseudo-Random Sequence Generators, Xilinx [XAPP 052] [1996.07.07]
I would like to instantiate an array of registers, and declare them all according to a certain function. This is for a multiplier block that I'm hoping to construct.
The code I'm working with is below, but this is the line that the compiler does not appreciate:
q[i][7:0] = {8{a[i]}} & b[7:0];
As the code is written out, I hope to make the registers q[0], q[1], ..., q[7] each store the 8-bit value defined by the RHS above. Can anyone tell me the proper way to do this?
Entire code:
`timescale 1ns / 1ps
module multiplier_2(
input [7:0] A,
input [7:0] B,
output reg [15:0] P,
input start,
output stop
);
reg [7:0] q[7:0];
reg P = 0;
//create 8 bit vectors q[i]
genvar i;
generate
for (i = 0; i < 8;i = i+1)
begin: loop
q[i][7:0] = {8{a[i]}} & b[7:0];
end
endgenerate
always @ (*)
begin
if (start == 1'b1)
begin
for (i = 0; i < 8; i = i+1)
begin
P = P + (q[i] << i);
end
end
end
endmodule
EDIT: this code also doesn't work:
`timescale 1ns / 1ps
module multiplier_2(
input [7:0] a,
input [7:0] b,
output reg [15:0] P = 16'd0,
input start,
output stop
);
reg [7:0] q[7:0];
//create 8 bit vectors q[i]
genvar i;
generate
always begin
for (i = 0; i < 8;i = i+1)
begin: loop
q[i] = {8{a[i]}} & b[7:0];
end
end
endgenerate
always @ (*)
begin
stop = 1'b0;
if (start == 1'b1)
begin
for (i = 0; i < 8; i = i+1)
begin
P = P + (q[i] << i);
end
end
stop = 1'b1;
end
endmodule
Error message:
"Line 16: Procedural assignment to a non-register i is not permitted, left-hand side should be reg/integer/time/genvar"
I do not think this requires a generate statement. A standard for loop will work:
reg [7:0] q [0:7];
integer i;
always @* begin
for (i = 0; i < 8; i=i+1) begin: loop
q[i] = {8{a[i]}} & b[7:0];
end
end
Beware of what hardware you are implying though. For loops, like generate statements, imply parallel hardware.
NB: it is more common to declare memories with the depth running from 0 to x, i.e. reg [7:0] q [0:7];
You've got all sorts of issues here. First off, you're getting confused about what a generate statement is, and what you're trying to generate. Are you (1) trying to generate a single always block, which must contain sequential/procedural code, or are you (2) trying to generate/replicate 8 continuous assignments?
You're presumably not doing (1), since there's no point in generating a single always block; the generate is redundant. That leaves (2). So, get rid of the always begin after the generate. The i in your loop is now the 'genvar', or generation variable, and you're replicating 8 assignments; so far, so good. Get rid of the begin:loop and end; you're replicating a single statement, so they're pointless verbiage.
Next problem: the generate loop is now creating concurrent, or parallel, statements; in Verilog-speak, they're module-level statements. That means that they must be continuous assignments, i.e. they must have an assign in front of them, and not just ordinary procedural assignments, as you've written them. That also means that q must be declared as a wire, and not a reg. There's no good reason for this; it's just how Verilog is.
You now have a second always block, which is a concurrent (module-level) statement, which must contain sequential/procedural code. The i you're referring to in this block is the original genvar, which doesn't work. A genvar can only be used in specific generation-related circumstances; this isn't inside a generate, and you need an ordinary variable here as your index. You can do this by naming your outer begin/end and declaring a variable inside it, or any other way. You'll now find out that you're creating a procedural assignment to net stop; this is illegal, so change stop's declaration to a reg. This should be enough to get your code to compile.
BTW, @(*) is verbose and unnecessary, and has historically confused at least one tool. @* is more concise.
You've got other issues. Your second always contains a loop. It looks like it might be logically correct, but your synthesiser has to unroll this, and carry out 8 additions, and set stop. This isn't going to work in real life. Think about making these additions concurrent and putting them in a generate, or creating a clocked pipeline, and some more robust (clocked) way of creating stop.
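For reference, the arithmetic the multiplier is trying to implement can be checked with a short Python model: each q[i] is b masked by bit i of a, and the product is the sum of the shifted partial products. This is only a behavioural sketch; note that it starts the sum from 0 on every evaluation, which is an assumption on my part about the intended behaviour rather than what P = P + ... does in the posted code.

```python
def partial_products(a, b):
    """q[i] = {8{a[i]}} & b : equal to b when bit i of a is set, otherwise 0."""
    return [b if (a >> i) & 1 else 0 for i in range(8)]

def multiply(a, b):
    """Sum the partial products, each shifted left by its bit position."""
    p = 0  # fresh accumulator for every evaluation, no feedback from the old result
    for i, q in enumerate(partial_products(a, b)):
        p += q << i
    return p & 0xFFFF  # P is 16 bits wide

if __name__ == "__main__":
    print(partial_products(0b00000101, 0x0F))
    print(multiply(200, 150), 200 * 150)
```

The eight shifted additions in the loop are exactly what the synthesiser has to unroll into an adder chain, which is why a clocked pipeline may be the better structure.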
How do you implement a hardware random number generator in an HDL (verilog)?
What options need to be considered?
This question follows the self-answer format. Additional answers and updates are encouraged.
As noted in Morgan's answer this will only produce a single random bit. The number of bits in the LFSR only set how many values you get before the sequence repeats. If you want an N bit random number you have to run the LFSR for N cycles. However, if you want a new number every clock cycle the other option is to unroll the loop and predict what the number will be in N cycles. Repeating Morgan's example below, but to get a 5 bit number each cycle:
module fibonacci_lfsr_5bit(
input clk,
input rst_n,
output reg [4:0] data
);
reg [4:0] data_next;
always @* begin
data_next[4] = data[4]^data[1];
data_next[3] = data[3]^data[0];
data_next[2] = data[2]^data_next[4];
data_next[1] = data[1]^data_next[3];
data_next[0] = data[0]^data_next[2];
end
always @(posedge clk or negedge rst_n)
if(!rst_n)
data <= 5'h1f;
else
data <= data_next;
endmodule
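If you want to convince yourself that the hand-unrolled equations really are five single steps, here is a quick Python check. It assumes the single step is the shift-left form from Morgan's example, data <= {data[3:0], data[4] ^ data[1]}; the function names are made up for this sketch.

```python
def step(data):
    """One clock of the 5-bit example: data <= {data[3:0], data[4] ^ data[1]}."""
    feedback = ((data >> 4) ^ (data >> 1)) & 1
    return ((data << 1) | feedback) & 0x1F

def five_steps(data):
    """Run the single-step LFSR for five clocks."""
    for _ in range(5):
        data = step(data)
    return data

def unrolled(data):
    """The five hand-written equations from the module, evaluated in order."""
    d = [(data >> i) & 1 for i in range(5)]
    n = [0] * 5
    n[4] = d[4] ^ d[1]
    n[3] = d[3] ^ d[0]
    n[2] = d[2] ^ n[4]
    n[1] = d[1] ^ n[3]
    n[0] = d[0] ^ n[2]
    return sum(bit << i for i, bit in enumerate(n))

if __name__ == "__main__":
    print(all(unrolled(s) == five_steps(s) for s in range(32)))
```

Checking all 32 states is cheap here; for wider registers you would probably compare a random sample instead.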
Edit: Added a new version below which doesn't require you to do the math. Just put it in a loop and let the synthesis tool figure out the logic:
module fibonacci_lfsr_nbit
#(parameter BITS = 5)
(
input clk,
input rst_n,
output reg [4:0] data
);
reg [4:0] data_next;
always_comb begin
data_next = data;
repeat(BITS) begin
data_next = {(data_next[4]^data_next[1]), data_next[4:1]};
end
end
always_ff @(posedge clk or negedge rst_n) begin
if(!rst_n)
data <= 5'h1f;
else
data <= data_next;
end
endmodule
I would like to make the LFSR length parameterizable as well, but that is much more difficult since the feedback taps don't follow a simple pattern.
This is a TRNG (true random number generator) that works on an FPGA. It is basically an LFSR-type structure without the flip-flops, so it is a combinatorial loop that runs continuously. The signal oscillates chaotically; when you combine several of these modules and XOR their bits you get a truly random bit, since the jitter from each one combines. The maximum clock rate you can run this at depends on your FPGA. You should test the randomness with a test suite such as diehard, dieharder, STS or TestU01.
These are called Galois Ring Oscillators (GARO). There are other TRNGs which use less power and area, but they are trickier to operate and write, usually relying on tuning delays to make a flip-flop go metastable.
module GARO (input stop,clk, reset, output random);
(* OPTIMIZE="OFF" *) //stop *xilinx* tools optimizing this away
wire [31:1] stage /* synthesis keep */; //stop *altera* tools optimizing this away
reg meta1, meta2;
assign random = meta2;
always @(posedge clk or negedge reset)
if(!reset)
begin
meta1 <= 1'b0;
meta2 <= 1'b0;
end
else if(clk)
begin
meta1 <= stage[1];
meta2 <= meta1;
end
assign stage[1] = ~&{stage[2] ^ stage[1],stop};
assign stage[2] = !stage[3];
assign stage[3] = !stage[4] ^ stage[1];
assign stage[4] = !stage[5] ^ stage[1];
assign stage[5] = !stage[6] ^ stage[1];
assign stage[6] = !stage[7] ^ stage[1];
assign stage[7] = !stage[8];
assign stage[8] = !stage[9] ^ stage[1];
assign stage[9] = !stage[10] ^ stage[1];
assign stage[10] = !stage[11];
assign stage[11] = !stage[12];
assign stage[12] = !stage[13] ^ stage[1];
assign stage[13] = !stage[14];
assign stage[14] = !stage[15] ^ stage[1];
assign stage[15] = !stage[16] ^ stage[1];
assign stage[16] = !stage[17] ^ stage[1];
assign stage[17] = !stage[18];
assign stage[18] = !stage[19];
assign stage[19] = !stage[20] ^ stage[1];
assign stage[20] = !stage[21] ^ stage[1];
assign stage[21] = !stage[22];
assign stage[22] = !stage[23];
assign stage[23] = !stage[24];
assign stage[24] = !stage[25];
assign stage[25] = !stage[26];
assign stage[26] = !stage[27] ^ stage[1];
assign stage[27] = !stage[28];
assign stage[28] = !stage[29];
assign stage[29] = !stage[30];
assign stage[30] = !stage[31];
assign stage[31] = !stage[1];
endmodule
An LFSR is often the first port of call. Implementation is relatively simple: a shift register with a number of taps XORed together to create the feedback term.
When considering the implementation of the LFSR, the bit width of the random number and the repeatability of the number need to be considered. With N bits, a maximal-length LFSR will have (2**N) - 1 states. The all-zero state cannot be used without additional hardware.
An example 5-bit LFSR with taps at bit 4 and bit 1:
module fibonacci_lfsr(
input clk,
input rst_n,
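output reg [4:0] data // assigned in an always block, so it must be a reg
);
wire feedback = data[4] ^ data[1] ;
always @(posedge clk or negedge rst_n)
if (~rst_n)
data <= 5'h1f; // all ones; the all-zero state would lock up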
else
data <= {data[3:0], feedback} ;
endmodule
Choosing tap points and finding out the sequence length (number of numbers before it repeats) can be found from this table.
For example a sequence of 17,820,000, 30 bits wide could use taps of :
0x20000029 => bits "100000000000000000000000101001"
0x2000005E => bits "100000000000000000000001011110"
0x20000089 => bits "100000000000000000000010001001"
The first would have a feedback term of:
feedback = data[29] ^ data[5] ^ data[3] ^ data[0];
If you are unsure of the order of the taps, remember that the MSB will always be a feedback point. The Last (tap) feedback point defines the effective length of the LFSR, after that it would just be a shift register and have no bearing on the feedback sequence.
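To go from a hex tap mask to the list of data[...] bits in the feedback term, a small Python helper like the one below can save some counting. It is just a convenience sketch; the function names are my own.

```python
def tap_positions(mask):
    """Bit positions set in the tap mask, highest first."""
    return [bit for bit in range(mask.bit_length() - 1, -1, -1) if (mask >> bit) & 1]

def feedback(data, mask):
    """XOR of the tapped bits of data (parity of data AND mask)."""
    return bin(data & mask).count("1") & 1

if __name__ == "__main__":
    for mask in (0x20000029, 0x2000005E, 0x20000089):
        terms = " ^ ".join("data[%d]" % bit for bit in tap_positions(mask))
        print("0x%08X => feedback = %s;" % (mask, terms))
```

For 0x20000029 this prints the same four terms as the feedback expression given above.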
If you needed a sequence of 69,273,666 you would have to implement a 31 bit LFSR and choose 30 bits for your random number.
LFSRs are a great way to create a 1-bit random number stream, but if you take multiple consecutive bits there is a correlation between successive values: each one is the previous number shifted, plus one new bit. If the number is being used as a dither stream you may want to introduce a mapping layer, for example swapping every other bit. Alternatively, use an LFSR of a different length or with different tap points for each bit.
Further Reading
Efficient Shift Registers, LFSR Counters, and Long Pseudo-Random Sequence Generators,
A Xilinx app note by Peter Alfke.
Linear Feedback Shift Registers in Virtex Devices,
A Xilinx app note by Maria George and Peter Alfke.
I'm creating a program counter that is supposed to use only unsigned numbers.
I have 2 STD_LOGIC_VECTOR and a couple of STD_LOGIC. Is there anything I need to do so that they only use unsigned? At the moment I only have library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
I also need to increase one of the binary vectors by 1 under certain conditions (as you probably have guessed by now). Would you be so kind to explain how to perform such actions (using unsigned and adding up one) considering one of the vectors is output with 32 bits.
I'm guessing (I tried) Output <= Output + 1; won't do. Oh and I'm using a process.
In brief, you can add the ieee.numeric_std package to your architecture (library ieee; use ieee.numeric_std.all;) and then do the addition using:
Output <= std_logic_vector(unsigned(Output) + 1);
to convert your std_logic_vector to an unsigned vector, increment it, and finally convert the result back to an std_logic_vector.
Note that if Output is an output port, this won't work because you can't access the value of an output port within the same block. If that is the case, you need to add a new signal and then assign Output from that signal, outside your process.
If you do need to add a signal, it might be simpler to make that signal a different type than std_logic_vector. For example, you could use an integer or the unsigned type above. For example:
architecture foo of bar is
signal Output_int : integer range 0 to (2**Output'length)-1;
begin
PR: process(clk, resetn)
begin
if resetn='0' then
Output_int <= 0;
elsif clk'event and clk='1' then
Output_int <= Output_int + 1;
end if;
end process;
Output <= std_logic_vector(to_unsigned(Output_int, Output'length));
end foo;
Output_int is declared with a range of valid values so that tools will be able to determine both the size of the integer as well as the range of valid values for simulation.
In the declaration of Output_int, Output'length is the width of the Output vector (as an integer), and the "**" operator is used for exponentiation, so the expression means "all unsigned integers that can be expressed with as many bits as Output has".
For example, for an Output defined as std_logic_vector(31 downto 0), Output'length is 32. 2**32-1 is the highest value that can be expressed with an unsigned 32-bit integer. Thus, in the example case, the range 0 to (2**Output'length)-1 resolves to the range 0...4294967295 (2**32 = 4294967296), i.e. the full unsigned range that can be expressed with 32 bits.
Note that you'll need to add any wrapping logic manually: VHDL simulators will produce an error when you've reached the maximum value and try to increment by one, even if the synthesized logic will cleanly wrap around to 0.
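The difference between a range-checked increment and an explicitly wrapping one can be shown with a small Python model. This is only an illustration of the behaviour described above, assuming a 32-bit counter; the names are made up and it is not VHDL.

```python
WIDTH = 32                 # assumed counter width, matching the 32-bit example
MAX_VALUE = 2**WIDTH - 1   # 4294967295

def increment_checked(value):
    """Models a range-constrained integer: going past the maximum is an error."""
    if value == MAX_VALUE:
        raise OverflowError("counter out of range")
    return value + 1

def increment_wrapping(value):
    """Models the manual wrapping logic: the maximum rolls over to 0."""
    return 0 if value == MAX_VALUE else value + 1

if __name__ == "__main__":
    print(increment_wrapping(MAX_VALUE))
    try:
        increment_checked(MAX_VALUE)
    except OverflowError as err:
        print("error:", err)
```

In the process, the equivalent would be an if that resets the counter to 0 when it holds the maximum value, and increments it otherwise.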