I am trying to create a Verilog module that takes two 2048-bit inputs and adds them as 256 independent 8-bit chunks, so that a carry out of one chunk is never added into the next 8 bits, which would corrupt the final value.
I have tried the code below. It gives the correct output in simulation, but when moving to synthesis the design is too big.
module addition(
in_1, in_2,
out
);
input [2047:0] in_1, in_2;
output [2047:0] out;
genvar i;
generate
for(i = 0; i < 256; i = i + 1)begin
assign out[i*8+7:i*8] = (in_1[i*8+7:i*8] + in_2[i*8+7:i*8]);
end
endgenerate
endmodule
I was wondering whether there is another way, perhaps at the logic level, to do this so that I don't have to generate 256 8-bit adders, but I am struggling to find one. Does anyone have any suggestions on how to cut down the size here? Thank you for your help.
I am not sure what exactly the error is. I think my indexing in the for-loop is not Verilog-compatible, but I might be wrong.
Is it allowed to index like this (a[(4*i)+3:4*i]) in a for-loop just like in C/C++?
Here is a piece of my code, so that the for-loop makes more sense:
module testing(
input [399:0] a, b,
input cin,
output reg cout,
output reg [399:0] sum );
// bcd needs 4 bits + 1-bit carry --> 5 bits [4:0]
reg [4:0] temp_1;
always @(*) begin
for (int i = 0; i < 100; i++) begin
if (i == 0) begin // taking care of cin so the rest of the loop works smoothly
temp_1[4:0] = a[3:0] + b[3:0] + cin;
sum[3:0] = temp_1[3:0];
cout = temp_1[4];
end
else begin
temp_1[4:0] = a[(4*i)+3:4*i] + b[(4*i)+3:4*i] + cout;
sum[(4*i)+3:4*i] = temp_1[3:0];
cout = temp_1[4];
end
end
end
endmodule
This might seem obvious. I'm doing the exercises from HDLBits and got stuck on this one in particular for a long time. (This solution isn't the one intended for the exercise.)
Error messages from Quartus:
Error (10734): Verilog HDL error at testing.v(46): i is not a constant File: ../testing.v Line: 46
Error (10734): Verilog HDL error at testing.v(47): i is not a constant File: ../testing.v Line: 47
But I tried the same way in indexing and got the same error
The error appears because Verilog does not allow variables at both indices of a part select (bus slice indexes).
The most dynamic thing that can be done involves the indexed part select.
A related, but not duplicate, SO question: What is `+:` and `-:`?
Variations of this question are common on SO and other programmable logic design forums.
I took your example, used the -: operator rather than :, and changed the right-hand side of each part select to a constant width. This version compiles.
module testing(
input [399:0] a, b,
input cin,
output reg cout,
output reg [399:0] sum );
// bcd needs 4 bits + 1-bit carry --> 5 bits [4:0]
reg [4:0] temp_1;
always @(*) begin
for (int i = 0; i < 100; i++) begin
if (i == 0) begin // taking care of cin so the rest of the loop works smoothly
temp_1[4:0] = a[3:0] + b[3:0] + cin;
sum[3:0] = temp_1[3:0];
cout = temp_1[4];
end
else begin
temp_1[4:0] = a[(4*i)+3-:4] + b[(4*i)+3-:4] + cout;
sum[(4*i)+3-:4] = temp_1[3:0];
cout = temp_1[4];
end
end
end
endmodule
The code will not behave as you wanted it to using the indexed part select.
You can use other, more dynamic operators to create the behavior you need, for example shifting and masking.
I recommend researching what others have done, then asking again if it is still not clear.
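To make the indexed part select and the shift-and-mask alternative concrete, here is a small Python model (not Verilog, and only a sketch). The helper names part_select_down and part_select_up are made up for this illustration; they mimic value[msb -: width] and value[lsb +: width] using nothing but a shift and a mask.

```python
def part_select_down(value, msb, width):
    """Model of value[msb -: width]: `width` bits ending at bit `msb`."""
    lsb = msb - width + 1
    return (value >> lsb) & ((1 << width) - 1)


def part_select_up(value, lsb, width):
    """Model of value[lsb +: width]: `width` bits starting at bit `lsb`."""
    return (value >> lsb) & ((1 << width) - 1)


# Walk over the 4-bit digits of a 16-bit value, as the for loop does
# with a[(4*i)+3 -: 4]. The value 0x4321 is just an example.
a = 0x4321
digits = [part_select_down(a, 4 * i + 3, 4) for i in range(4)]
```

The base index (4*i+3) may vary at run time, while the width (4) stays constant. That is the restriction the indexed part select places on the slice.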
I'm trying to store a 2D array whose elements are 8-bit integers (0~4) by first inputting its elements row by row (treating it as a 1D array) and then accessing the values in the 1D array.
My procedure is as follows:
1. Initialize a 2048-bit (8*16*16) 1D array (Row1 in the code) in the test bench as the input.
2. Cut the 1D array every 8 bits and assign each 8-bit number to an element of the 2D array.
3. Use another 1D array (Row2 in the code) to observe the final result, because an array cannot be used as an instance output.
So I'm actually turning a 1D array with 256 8-bit elements into a 2D array with 16*16 8-bit elements.
The problem is that after running the simulation,
most of the elements in the 2D array are in a high-Z state,
while only the last of them have been assigned their new values correctly.
Can anyone explain what's going on and how I can fix it?
To be clear, I put my Verilog code below:
`timescale 1ns / 1ps
module convPE(
input clk,
input reset,
input [2048:1] Row1,
output [2048:1] Row2
);
wire [7:0] arr[17:0][17:0];
generate
genvar i,j;
for(i=16;i>=1;i=i-1)
begin:gen1
for(j=16;j>=1;j=j-1)
begin:gen2
assign arr[i][j]=Row1[(8*i*j) -: 8];
assign Row2[(8*i*j) -: 8]=arr[i][j];
end
end
endgenerate
endmodule
And here is the test bench :
`timescale 1ns / 1ps
module testbench;
// Inputs
reg [2048:1] Row1;
reg Clk;
reg Reset;
wire [2048:1] Row2;
convPE uut (
.clk(Clk),
.reset(Reset),
.Row1(Row1),
.Row2(Row2)
);
initial begin
// Initialize Inputs
Row1=2048'd0;
Row1[1784:1777]=8'd1;//1
Row1[1584:1577]=8'd1;
Row1[944:937]=8'd1;
Row1[376:369]=8'd1;
//2
Row1[1720:1713]=8'd2;
Row1[1600:1593]=8'd2;
Row1[1488:1481]=8'd2;
Row1[1480:1473]=8'd2;
Row1[1368:1361]=8'd2;
Row1[1344:1337]=8'd2;
Row1[1336:1329]=8'd2;
Row1[1120:1113]=8'd2;
Row1[1112:1105]=8'd2;
Row1[1080:1073]=8'd2;
Row1[1072:1065]=8'd2;
Row1[1056:1049]=8'd2;
Row1[984:977]=8'd2;
Row1[936:929]=8'd2;
Row1[856:849]=8'd2;
Row1[808:801]=8'd2;
Row1[728:721]=8'd2;
Row1[680:673]=8'd2;
Row1[608:601]=8'd2;
Row1[592:585]=8'd2;
Row1[584:577]=8'd2;
Row1[576:569]=8'd2;
Row1[568:561]=8'd2;
Row1[560:553]=8'd2;
Row1[544:537]=8'd2;
Row1[472:465]=8'd2;
Row1[424:417]=8'd2;
Row1[416:409]=8'd2;
//3
Row1[1712:1705]=8'd3;
Row1[1592:1585]=8'd3;
Row1[1472:1465]=8'd3;
Row1[1360:1353]=8'd3;
Row1[1352:1345]=8'd3;
Row1[1240:1233]=8'd3;
Row1[1208:1201]=8'd3;
Row1[1200:1193]=8'd3;
Row1[1064:1057]=8'd3;
Row1[992:985]=8'd3;
Row1[928:921]=8'd3;
Row1[864:857]=8'd3;
Row1[736:729]=8'd3;
Row1[600:593]=8'd3;
Row1[464:457]=8'd3;
Row1[456:449]=8'd3;
Row1[448:441]=8'd3;
Row1[440:433]=8'd3;
Row1[432:425]=8'd3;
//4
Row1[800:793]=8'd4;
Row1[672:665]=8'd4;
Row1[552:545]=8'd4;
#100
Reset=1'b1;
#100
Reset=1'b0;
Clk=1'b1;
// Add stimulus here
end
always
#50 Clk=~Clk;
endmodule
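The index (8*i*j) does not work. The product i*j is not unique: for example (i=1, j=2) and (i=2, j=1) select the same slice, so some slices are driven more than once and most are never driven at all. You have two nested loops, so i must advance in steps of 16 (the size of the inner loop). Try 8*(i*16+j)-1.
Your code is also somewhat inconsistent in that you sometimes use 0 and sometimes 1 as the lowest index. I suggest you make all your arrays and vectors start from 0, e.g. [2047:0]. That is the usual Verilog convention.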
I have converted your code using the Verilog conventions I use. I also removed all superfluous signals like clock and reset. With the following code there are no X-es or Z-es in either Row2 or in arr.
`timescale 1ns / 1ps
module convPE(
input [2047:0] Row1,
output [2047:0] Row2
);
wire [7:0] arr[15:0][15:0];
generate
genvar i,j;
for(i=0; i<16; i=i+1)
begin:gen1
for(j=0; j<16; j=j+1)
begin:gen2
assign arr[i][j]=Row1[(8*(i*16+j)) +: 8];
assign Row2[(8*(i*16+j)) +: 8] =arr[i][j];
end
end
endgenerate
endmodule
`timescale 1ns / 1ps
module testbench;
// Inputs
reg [2047:0] Row1;
wire [2047:0] Row2;
convPE uut (
.Row1(Row1),
.Row2(Row2)
);
initial begin
#100; // I want to see X-es first
// Initialize Inputs
Row1=2048'd0;
#100;
$stop;
end
endmodule
I use this method because it is the standard way of mapping N-dimensional arrays of a certain type onto memory (which is linear), as e.g. C compilers do.
You can use 2048:1, but then you have to think much harder about how to convert the indexes to a one-dimensional array. You would probably replace the i and j in my formula with something like i-1 and j-1.
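The difference between the two index formulas can be checked with a short Python sketch (a model of the index arithmetic only, not of the hardware). The names product_slices, row_major_slices and flat_index are invented for this illustration.

```python
# Slice positions produced by the question's 1-based index 8*i*j
product_slices = {8 * i * j for i in range(1, 17) for j in range(1, 17)}

# Slice positions produced by the 0-based row-major index 8*(i*16 + j)
row_major_slices = {8 * (i * 16 + j) for i in range(16) for j in range(16)}


def flat_index(i, j, cols=16, width=8):
    """LSB position of element (i, j) in the flattened vector."""
    return width * (i * cols + j)


# The product form reuses positions (i*j == j*i), so it cannot reach
# all 256 byte slots, while the row-major form hits each one exactly once.
print(len(product_slices), len(row_major_slices))
```

Because the row-major form gives each (i, j) pair its own 8-bit slot, every element of arr and every byte of Row2 gets exactly one driver.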
module LSFR_counter
#(parameter BITS = 5)
(
input clk,
input rst_n,
output reg [4:0] data
);
reg [4:0] data_next;
always @* begin
data_next[4] = data[4]^data[1];
data_next[3] = data[3]^data[0];
data_next[2] = data[2]^data_next[4];
data_next[1] = data[1]^data_next[3];
data_next[0] = data[0]^data_next[2];
end
always_ff @(posedge clk or negedge rst_n) begin
if(!rst_n)
data <= 5'h1f;
else
data <= data_next;
end
endmodule
This is the code for a 5-bit LFSR. I want to implement an N-bit random number generator for an FPGA board.
N is normally reserved for the state of the LFSR; M would be a good name for the number of random bits we wish to generate.
A standard LFSR generates 1 bit of random data per step. If consecutive bits of the LFSR are used they can be highly correlated, especially if a multi-bit value is taken every clock cycle. To remove this correlation we can overclock the LFSR, say 4 times, to generate 4 bits. The alternative to this is to calculate the equations (feedback polynomials) that you would get for each bit. On every clock its internal state (as represented by the N bits of the LFSR) would then move forward 4 steps. Both techniques, overclocking or creating the feedback taps to move the state forward more than 1 step, are known as leap-forward.
The code example in the question has been taken from a previous question and answer; it is an example of manually creating the extra feedback for a leap-forward LFSR.
The maths to do this can be done by generating the transition matrix and raising it to the power of the number of steps we wish to move forward.
Quick 4-bit LFSR example, with transition matrix a:
a =
0 1 0 0
0 0 1 0
0 0 0 1
1 0 0 1
Feedback is the XOR of the first and last bit, as seen in the last row of the matrix. All other rows are just a single shift. The output of this LFSR is good for one bit. Two bits would suffer from a high correlation, unless it was overclocked.
>> a^2
ans =
0 0 1 0
0 0 0 1
1 0 0 1
1 1 0 1
If we want two bits we need to square the transition matrix. It can be seen that the first two rows are a shift of two places and that we require feedback for two places, i.e. we are moving the LFSR forward two states on every clock.
Just for confirmation, if we wanted three bits:
>> a^3
ans =
0 0 0 1
1 0 0 1
1 1 0 1
1 1 1 1
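As a cross-check of the matrices above, here is a minimal Python sketch that raises the transition matrix to a power with every entry reduced modulo 2 (XOR is addition modulo 2). The helper names mat_mul_gf2 and mat_pow_gf2 are invented for this illustration.

```python
def mat_mul_gf2(x, y):
    """Multiply two square 0/1 matrices, reducing every entry modulo 2."""
    n = len(x)
    return [[sum(x[r][k] * y[k][c] for k in range(n)) % 2 for c in range(n)]
            for r in range(n)]


def mat_pow_gf2(m, steps):
    """Raise m to a positive integer power over GF(2)."""
    result = m
    for _ in range(steps - 1):
        result = mat_mul_gf2(result, m)
    return result


# Transition matrix of the 4-bit example
A = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [1, 0, 0, 1]]

A2 = mat_pow_gf2(A, 2)  # two steps per clock
A3 = mat_pow_gf2(A, 3)  # three steps per clock
```

For these two powers no entry of the plain integer product exceeds 1, which is why the results shown above already agree. For higher powers the reduction modulo 2 matters.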
The second code example in the previous question went on to parameterise the code so the leap forward calculations did not have to be manually created, skipping all of that lovely maths! However the approach used meant it could not be fully parameterised. Therefore I would like to revisit the example I gave for that question:
module fibonacci_lfsr(
input clk,
input rst_n,
output reg [4:0] data
);
wire feedback = data[4] ^ data[1] ;
always @(posedge clk or negedge rst_n)
if (~rst_n)
data <= 4'hf;
else
data <= {data[3:0], feedback} ;
endmodule
Now we want to parameterise it:
module fibonacci_lfsr#(
parameter POLYNOMIAL = 4'h9
)(
input clk,
input rst_n,
output reg [4:0] data
);
reg feedback;
// AND data with POLYNOMIAL: this selects only the taps
// in the polynomial to be used.
// ^( ) performs an XOR reduction to 1 bit
always @* begin
feedback = ^( POLYNOMIAL & data);
end
// Resetting to 0 is easier
// Invert feedback, so the all 1's state is banned instead of all 0's
always @(posedge clk or negedge rst_n)
if (~rst_n)
data <= 'b0;
else
data <= {data[3:0], ~feedback};
endmodule
A small step now: just bring the shift outside of the synchronous always block to help with the next step.
always @* begin
data_next = data;
feedback = ^( POLYNOMIAL & data);
data_next = {data_next[3:0], ~feedback} ; //<- Shift and feedback
end
always @(posedge clk or negedge rst_n)
if (~rst_n)
data <= 'b0;
else
data <= data_next;
TL;DR
Now to control the leap-forward iterations, let the tools do the heavy lifting of multiplying the transition matrix.
module fibonacci_lfsr#(
parameter POLYNOMIAL = 4'h9,
parameter N = 4,
parameter BITS = 2
)(
input clk,
input rst_n,
output [BITS-1:0] random
);
reg [N-1:0] data;
reg [N-1:0] data_next;
reg feedback;
assign random = data[N-1:N-BITS];
always @* begin
data_next = data;
// Compiler unrolls the loop, calculating the transition matrix
for (int i=0; i<BITS; i++) begin
feedback = ^( POLYNOMIAL & data_next);
data_next = {data_next[N-2:0], ~feedback} ;
end
end
always @(posedge clk or negedge rst_n)
if (~rst_n)
data <= 'b0;
else
data <= data_next;
endmodule
Example on EDA Playground.
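For anyone who wants to sanity-check the leap-forward behaviour without a simulator, here is a rough Python model of the module above. It is a sketch, not the RTL. It assumes the same defaults (POLYNOMIAL = 4'h9, N = 4, BITS = 2) and reset to 0; the function names lfsr_step, lfsr_leap and random_bits are invented for this illustration.

```python
def lfsr_step(data, polynomial=0x9, n=4):
    """One shift of the inverted-feedback Fibonacci LFSR."""
    feedback = bin(polynomial & data).count("1") & 1  # XOR reduction of the taps
    mask = (1 << n) - 1
    return ((data << 1) | (feedback ^ 1)) & mask      # shift left, insert ~feedback


def lfsr_leap(data, bits=2, polynomial=0x9, n=4):
    """Advance the state by `bits` single steps, like the unrolled for loop."""
    for _ in range(bits):
        data = lfsr_step(data, polynomial, n)
    return data


def random_bits(data, bits=2, n=4):
    """Top `bits` bits of the state, mirroring data[N-1:N-BITS]."""
    return data >> (n - bits)


state = 0  # value after reset
visited = []
for _ in range(15):
    visited.append(state)
    state = lfsr_leap(state)
```

With these parameters the model visits 15 distinct states before returning to the reset value. It never enters the all-ones state, which is the banned lock-up state when the feedback is inverted.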
i++ is part of SystemVerilog. If you can only synthesise plain (pre-2009) Verilog then you will need to declare i as an integer and use i = i+1 in the for loop.
If you want to implement an N-bit LFSR, then because each length of LFSR has a different polynomial, and hence a different set of taps to XOR to produce the next LFSR value, you will need constants or a lookup table describing the different tap points, which the design could then use, based on 'BITS'.
A simpler way to do it might be to implement say a 32-bit LFSR, then use the least significant N bits of this as your output. This has the added benefit of increasing the repetition period for anything but the maximum length LFSR, giving better randomness in these cases.
If you're going for the first option, look at whether using the Fibonacci form instead of the Galois form will make the design more conducive to parametrization in this way. I can't quite work out which form you are using in your 5-bit example.
I'm a VHDL guy*, so I can't give Verilog code, but VHDL-like-pseudocode (untested) might look like this:
constant TAPS_TABLE : TAPS_TABLE_type := (
"00000011",
"00000110",
...
);
for i in 0 to BITS-2 loop
if (TAPS_TABLE(BITS-2)(i) = '1') then
data_next(i) <= data(0) xor data(i+1);
else
data_next(i) <= data(i+1);
end if;
end loop;
This would support BITS being between 2 and 8 inclusive, assuming the table was completed. The constant TAPS_TABLE would be optimised away during synthesis, leaving you with something no less resource-hungry than a manually coded LFSR.
* This question originally had a 'VHDL' tag.
In addition to the previous answers:
Years ago, Xilinx wrote a good AppNote on how to implement 'pseudo random number generators' (PRNGs). The AppNote has a TAP table for n = 3..168. The TAP table is optimized to allow the usage of shift registers. So a PRNG with n=32 does not use 32 single FFs.
Efficient Shift Registers, LFSR Counters, and Long Pseudo-Random Sequence Generators, Xilinx [XAPP 052] [1996.07.07]
I would like to instantiate an array of registers, and declare them all according to a certain function. This is for a multiplier block that I'm hoping to construct.
The code I'm working with is below, but this is the line that the compiler does not appreciate:
q[i][7:0] = {8{a[i]}} & b[7:0];
As the code is written, I hope to make the registers q[0], q[1], ..., q[7] each store the 8-bit value defined by the RHS above. Can anyone tell me the proper way to do this?
Entire code:
`timescale 1ns / 1ps
module multiplier_2(
input [7:0] A,
input [7:0] B,
output reg [15:0] P,
input start,
output stop
);
reg [7:0] q[7:0];
reg P = 0;
//create 8 bit vectors q[i]
genvar i;
generate
for (i = 0; i < 8;i = i+1)
begin: loop
q[i][7:0] = {8{a[i]}} & b[7:0];
end
endgenerate
always @ (*)
begin
if (start == 1'b1)
begin
for (i = 0; i < 8; i = i+1)
begin
P = P + (q[i] << i);
end
end
end
endmodule
EDIT: this code also doesn't work:
`timescale 1ns / 1ps
module multiplier_2(
input [7:0] a,
input [7:0] b,
output reg [15:0] P = 16'd0,
input start,
output stop
);
reg [7:0] q[7:0];
//create 8 bit vectors q[i]
genvar i;
generate
always begin
for (i = 0; i < 8;i = i+1)
begin: loop
q[i] = {8{a[i]}} & b[7:0];
end
end
endgenerate
always @ (*)
begin
stop = 1'b0;
if (start == 1'b1)
begin
for (i = 0; i < 8; i = i+1)
begin
P = P + (q[i] << i);
end
end
stop = 1'b1;
end
endmodule
Error message:
"Line 16: Procedural assignment to a non-register i is not permitted, left-hand side should be reg/integer/time/genvar"
I do not think this requires a generate statement. A standard for loop will work:
reg [7:0] q [0:7];
integer i;
always @* begin
for (i = 0; i < 8; i=i+1) begin: loop
q[i] = {8{a[i]}} & b[7:0];
end
end
Beware of what hardware you are implying though. For loops, like generate statements, imply parallel hardware.
NB: it is more common to list memories with the depth from 0 to x ie: reg [7:0] q [0:7];
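To check the arithmetic behind the partial products, here is a Python sketch (a software model only, with invented helper names partial_product and shift_add_multiply). It mirrors q[i] = {8{a[i]}} & b[7:0] and the sum of q[i] << i from the question.

```python
def partial_product(a, b, i):
    """Model of {8{a[i]}} & b[7:0]: b if bit i of a is set, else 0."""
    a_bit = (a >> i) & 1
    replicated = 0xFF if a_bit else 0x00  # {8{a[i]}}
    return replicated & (b & 0xFF)


def shift_add_multiply(a, b):
    """Sum the eight shifted partial products, truncated to 16 bits like P."""
    p = 0
    for i in range(8):
        p += partial_product(a, b, i) << i
    return p & 0xFFFF


# Example operands chosen arbitrarily for the illustration
product = shift_add_multiply(13, 11)
```

In software the loop runs one iteration at a time, whereas the unrolled hardware computes all eight partial products in parallel and then needs an adder chain or tree to sum them.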
You've got all sorts of issues here. First off, you're getting confused about what a generate statement is, and what you're trying to generate. Are you (1) trying to generate a single always block, which must contain sequential/procedural code, or are you (2) trying to generate/replicate 8 continuous assignments?
You're presumably not doing (1), since there's no point in generating a single always block; the generate is redundant. That leaves (2). So, get rid of the always begin after the generate. The i in your loop is now the 'genvar', or generation variable, and you're replicating 8 assignments; so far, so good. Get rid of the begin:loop and end; you're replicating a single statement, so they're pointless verbiage.
Next problem: the generate loop is now creating concurrent, or parallel, statements; in Verilog-speak, they're module-level statements. This means that they must be continuous assignments, i.e. they must have an assign in front of them, and not just ordinary procedural assignments, as you've written them. That also means that q must be declared as a wire, and not a reg. There's no good reason for this; it's just how Verilog is.
You now have a second always block, which is a concurrent (module-level) statement, which must contain sequential/procedural code. The i you're referring to in this block is the original genvar, which doesn't work. A genvar can only be used in specific generation-related circumstances; this isn't inside a generate, and you need an ordinary variable here as your index. You can do this by naming your outer begin/end, and declaring a variable inside it, or any other way. You'll now find out that you're creating a procedural assignment to net stop; this is illegal, so change stop's declaration to a reg. This should be enough to get your code to compile.
BTW, @(*) is verbose and unnecessary, and has historically confused at least one tool. @* is more concise.
You've got other issues. Your second always contains a loop. It looks like it might be logically correct, but your synthesiser has to unroll this, and carry out 8 additions, and set stop. This isn't going to work in real life. Think about making these additions concurrent and putting them in a generate, or creating a clocked pipeline, and some more robust (clocked) way of creating stop.
I'm creating a program counter that is supposed to use only unsigned numbers.
I have 2 STD_LOGIC_VECTOR and a couple of STD_LOGIC. Is there anything I need to do so that they only use unsigned? At the moment I only have library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
I also need to increase one of the binary vectors by 1 under certain conditions (as you probably have guessed by now). Would you be so kind to explain how to perform such actions (using unsigned and adding up one) considering one of the vectors is output with 32 bits.
I'm guessing (I tried) Output <= Output + 1; won't do. Oh and I'm using a process.
In brief, you can add the ieee.numeric_std package to your architecture (library ieee; use ieee.numeric_std.all;) and then do the addition using:
Output <= std_logic_vector(unsigned(Output) + 1);
to convert your std_logic_vector to an unsigned vector, increment it, and finally convert the result back to an std_logic_vector.
Note that if Output is an output port, this won't work because you can't access the value of an output port within the same block. If that is the case, you need to add a new signal and then assign Output from that signal, outside your process.
If you do need to add a signal, it might be simpler to make that signal a different type than std_logic_vector. For example, you could use an integer or the unsigned type above. For example:
architecture foo of bar is
signal Output_int : integer range 0 to (2**Output'length)-1;
begin
PR: process(clk, resetn)
begin
if resetn='0' then
Output_int <= 0;
elsif clk'event and clk='1' then
Output_int <= Output_int + 1;
end if;
end process;
Output <= std_logic_vector(to_unsigned(Output_int, Output'length));
end foo;
Output_int is declared with a range of valid values so that tools will be able to determine both the size of the integer as well as the range of valid values for simulation.
In the declaration of Output_int, Output'length is the width of the Output vector (as an integer), and the "**" operator is used for exponentiation, so the expression means "all unsigned integers that can be expressed with as many bits as Output has".
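For example, for an Output defined as std_logic_vector(31 downto 0), Output'length is 32. 2^32-1 is the highest value that can be expressed with an unsigned 32-bit integer. Thus, in the example case, the range 0 to (2**Output'length)-1 resolves to the range 0...4294967295 (2^32 = 4294967296), i.e. the full unsigned range that can be expressed with 32 bits. Be aware that the VHDL standard only guarantees the integer type a range of -(2^31-1) to 2^31-1, so many tools will reject this range for a 32-bit Output; in that case declare the signal as unsigned instead.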
Note that you'll need to add any wrapping logic manually: VHDL simulators will produce an error when you've reached the maximum value and try to increment by one, even if the synthesized logic will cleanly wrap around to 0.
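The manual wrapping mentioned above amounts to incrementing modulo 2**width. A tiny Python sketch of that arithmetic (the name increment_wrapping is invented for this illustration; it models the behaviour, it is not VHDL):

```python
def increment_wrapping(value, width=32):
    """Add one and wrap to 0 after the largest unsigned `width`-bit value."""
    return (value + 1) % (2 ** width)


# Largest 32-bit unsigned value, i.e. 2**32 - 1
max_value = 2 ** 32 - 1
after_max = increment_wrapping(max_value)
```

In the integer-based process this corresponds to testing for the maximum value and assigning 0 instead of adding 1.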