How to give input matrix in Verilog code?

I am writing Verilog code to insert values into a 4x4 matrix.
I need to collect 16 inputs, each one going into one element of the 4x4 matrix. How can I do that?
reg [15:0] fun_out;
integer x, y;
always @(posedge clk or negedge rst_n) begin
  if (~rst_n) begin
    for (x = 0; x < 4; x = x + 1) begin
      for (y = 0; y < 4; y = y + 1) begin
        data[0][0] <= fun_out[0];
        data[0][1] <= fun_out[1];
        data[0][2] <= fun_out[2];
        data[0][3] <= fun_out[3];
        data[1][0] <= fun_out[4];
        data[2][0] <= fun_out[5];
        ........
        ........
        data[4][3] <= fun_out[14];
        data[4][4] <= fun_out[15];
      end
    end
  end
  else begin
    data[x][y] <= 4'b0;
  end
end

Taking NUM_WIDTH as the number of bits per value inside the matrix:
localparam NUM_WIDTH = 4;
localparam NUM_X = 4;
localparam NUM_Y = 4;
We can use "+:" to index a chunk of an array. With that, we can pack and assign the data in every row of the matrix as:
genvar I, J;
wire [(NUM_WIDTH*NUM_X)-1:0] matrix [NUM_Y-1:0];
generate
  for (I = 0; I < NUM_Y; I = I + 1) begin: matrix_gen_y
    for (J = 0; J < NUM_X; J = J + 1) begin: matrix_gen_x
      // Values 1, 2, ..., 15; the last value, 16, truncates to 0 in 4 bits
      assign matrix[I][(J*NUM_WIDTH)+:NUM_WIDTH] = (I*NUM_X)+J+1;
    end
  end
endgenerate
So, in order to index a value inside the matrix with "x" and "y" indexes:
assign value = matrix[y_idx][(x_idx*NUM_WIDTH)+:NUM_WIDTH];
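For the flat 16-bit input from the question, a minimal sketch of a clocked loader could look like the following. The module name matrix_load, the row/col read-back ports, and the row-major mapping (bit 4*x + y goes to element [x][y]) are assumptions made for illustration; the question does not fix any of them.

```verilog
// Hypothetical module: loads a 16-bit word into a 4x4 matrix of 1-bit elements.
module matrix_load (
    input  wire        clk,
    input  wire        rst_n,
    input  wire [15:0] fun_out,   // 16 input bits, assumed row-major
    input  wire [1:0]  row,       // read-back row index (assumed port)
    input  wire [1:0]  col,       // read-back column index (assumed port)
    output wire        elem_out   // selected matrix element
);
    reg data [0:3][0:3];          // 4x4 matrix, one bit per element
    integer x, y;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            // Asynchronous active-low reset: clear every element
            for (x = 0; x < 4; x = x + 1)
                for (y = 0; y < 4; y = y + 1)
                    data[x][y] <= 1'b0;
        end else begin
            // Element [x][y] takes bit 4*x + y of the input word
            for (x = 0; x < 4; x = x + 1)
                for (y = 0; y < 4; y = y + 1)
                    data[x][y] <= fun_out[4*x + y];
        end
    end

    assign elem_out = data[row][col];
endmodule
```

Both loops have constant bounds, so a synthesis tool can unroll them into 16 independent flip-flop assignments. Wider elements would only change the declaration of data and the slice taken from the input bus.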


Difference between these algorithm representations?

Recently I've been studying algorithm analysis and in my class I've seen code like this to represent a sample algorithm:
z = 0
for x = 1; x <= n; x++ do
for y = 1; y <= n; y++ do
z = z + 1
end for
end for
I understand that these for loops can be read as "as long as x/y is less than or equal to n, do the following, and at the end of each cycle add 1 to x/y, then test the condition again". However, in some books, such as The Algorithm Design Manual, I see notation like this:
r:= 0
for i:= 1 to n do
for j:= 1 to i do
for k:= j to i + j do
r:= r + 1
return (r)
Is it essentially the same as the first example I gave, or does it mean something different? It also confuses me that the second example has no increment like x++ to make the loop stop after a certain number of cycles. Also, why state k:= j instead of simply k:= 1, since j = 1 in this algorithm?
Clarification: I am not asking whether both produce the same output, but whether the for loops in both work the same way, that is, whether they stop once the value of n is reached after adding 1 in each cycle to, for example, the variable i (as is done for x and y in the first example).
It looks like your first algorithm is using a C-like notation where it specifies the starting state, the condition to proceed, and how the state gets updated after each iteration.
The second one appears to be using loops based on iterating over a specified range: "for each of the values in this range, do the following...". This is fairly common in modern scripting languages such as Python or Ruby, and is arguably closer to how people think about iteration.
Either algorithm can be written using the other one's notation.
Algorithm 1 / style 2
z := 0
for x := 1 to n do
for y:= 1 to n do
z := z + 1
return(z) # I’m assuming you actually wanted a return value
Algorithm 2 / style 1
r = 0
for i = 1; i <= n; i++ do
for j = 1; j <= i; j++ do
for k = j; k <= i + j; k++ do
r = r + 1
end for
end for
end for
return(r)
Stick with one style or the other, and you'll see that the differences in the counts aren't because of the different pseudocode styles. These are different algorithms.
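Counting the iterations makes the difference concrete. As a worked sketch using only standard summation identities, with n as in both algorithms:

```latex
\begin{align*}
z &= \sum_{x=1}^{n} \sum_{y=1}^{n} 1 = n^2 \\
r &= \sum_{i=1}^{n} \sum_{j=1}^{i} \sum_{k=j}^{i+j} 1
   = \sum_{i=1}^{n} \sum_{j=1}^{i} (i + 1)
   = \sum_{i=1}^{n} i\,(i + 1)
   = \frac{n(n+1)(n+2)}{3}
\end{align*}
```

For n = 4 this gives z = 16 and r = 40. The innermost loop of the second algorithm always runs i + 1 times, because k goes from j to i + j inclusive, regardless of where j starts.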
The second one is not really the same as the first one.
Because of the "1 to n", I read
for i:= 1 to n do
as
for i = 1; i <= n; i += 1
In each j-loop, k starts from j, so k starts from 1 only in the first
j-loop of each i-loop.
While the first algorithm can easily be pictured as rows and columns, the second one cannot.
function a1(n) {
console.clear();
console.log('algorithm 1');
console.log('n = ' + n);
console.log('x | y | z');
var x, y, z = 0;
var tr, td;
for (x = 1; x <= n; x += 1) {
tr = document.createElement('tr');
for (y = 1; y <= n; y++) {
td = document.createElement('td');
z = z + 1;
console.log(x + ' | ' + y + ' | ' + z);
tr.appendChild(td);
td.innerHTML = z;
}
tbl1.appendChild(tr);
}
}
function a2(n) {
var i, j, k, r = 0;
console.clear();
console.log('algorithm 2');
console.log('n = ' + n);
console.log('i | j | k |i+j| r')
for (i = 1; i <= n; i += 1) {
for (j = 1; j <= i; j += 1) {
for (k = j; k <= (i + j); k += 1) {
r = r + 1;
console.log(i+ ' | ' + j + ' | ' + k + ' | ' + (i+j) + ' | ' + r);
}
}
}
return (r);
}
// output table 1 handle
var tbl1 = document.getElementById('output1');
// Event handlers for buttons a1 and a2
document.getElementById("a1_start").addEventListener("click", function(){
a1(5);
});
document.getElementById("a2_start").addEventListener("click", function(){
console.log('returns ' + a2(4));
});
td {
text-align: center;
border: solid 1px #000;
}
<button id="a1_start">
start algorithm 1
</button>
<button id="a2_start">
start algorithm 2
</button>
<table id="output1">
</table>

For logic implementation in System Verilog

I'm just learning HDL and I'm interested in how a for loop is implemented in System Verilog.
With the following code...
always_ff @(posedge clk)
begin
for(int i = 0; i < 32; i++) s[i] = a[i] + b[i];
end
Will I end up with 32 adders in the logic and they are all executed simultaneously? Or are the additions performed sequentially somehow?
Thanks
Boscoe
Loops which can be statically unrolled (as per your example) can be synthesised.
The example you gave has to execute in a single clock cycle, so there is nothing sequential about the generated hardware.
Your example:
always_ff @(posedge clk) begin
  for (int i = 0; i < 32; i++) begin
    s[i] <= a[i] + b[i];
  end
end
is just 32 parallel adders:
always_ff @(posedge clk) begin
  s[0] <= a[0] + b[0];
  s[1] <= a[1] + b[1];
  s[2] <= a[2] + b[2];
  //...
end
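The same parallel structure can be written with a generate loop, which makes the "one adder per index" reading explicit. This is only a sketch in plain Verilog: the module name vec_add, the flattened buses a_flat/b_flat/s_flat, and the element width WIDTH are assumptions, since the question does not give the declarations of s, a and b.

```verilog
// Hypothetical module: N independent registered adders, one per lane.
module vec_add #(
    parameter N     = 32,  // number of lanes (matches the loop bound above)
    parameter WIDTH = 8    // bits per element (assumed; not given in the question)
) (
    input  wire               clk,
    input  wire [N*WIDTH-1:0] a_flat,
    input  wire [N*WIDTH-1:0] b_flat,
    output reg  [N*WIDTH-1:0] s_flat
);
    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : lane
            // Each iteration elaborates its own adder and WIDTH-bit register.
            // The sum is truncated to WIDTH bits (carry out is dropped).
            always @(posedge clk)
                s_flat[i*WIDTH +: WIDTH] <= a_flat[i*WIDTH +: WIDTH]
                                          + b_flat[i*WIDTH +: WIDTH];
        end
    endgenerate
endmodule
```

Either form should elaborate to the same hardware; the procedural for loop is usually shorter, while the generate form is needed when each iteration has to instantiate a module.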

How do I apply the dct values to the formula?

I am implementing a face detection algorithm from a paper I have found. At the end, the paper uses the DCT values to eliminate false alarms by means of some formulas. One of them is the following:
My question is: I have calculated the DCT values for MxN; how do I apply them to the formula?
EDIT: Is this what you mean? (The inner loops from 0 to 7 cover a random part of the 100x100 dct1 array, which holds only the Y DCT; Cb and Cr are not needed for the algorithm.)
for (i = 0; i <= M * N * (4 - 1); i++) {
    for (m = 0; m <= 7; m++) {
        for (n = 0; n <= 7; n++) {
            value += std::pow(dct1.at<float>(m, n), 2);
        }
    }
}

Using multiple genvar in Verilog loop

Is it possible to use several "genvar" variables in one loop? Is there an alternative way to achieve this?
I tried this example:
genvar i;
genvar j;
genvar k;
generate
k=0;
for (i = 0; i < N; i = i + 1)
begin: firstfor
for (j = 0; j < N; j = j + 1)
begin: secondfor
if(j>i)
begin
assign a[i+j*N] = in[i] && p[k];
k=k+1;
end
end
end
endgenerate
When I run "Check Syntax", it displays this error:
Syntax error near "=". (k=k+1)
I like this question because, unless you are very familiar with generates, it looks like it should work. However, there is a similar question which tries to use an extra genvar.
The syntax is not allowed because of how generates are unrolled: a genvar can only be assigned in the header of a generate for loop, and integers can only be assigned inside always/initial processes.
If it is just combinatorial wiring rather than parametrised instantiation, you might be able to do what you need just using integers (I would not normally recommend this):
integer i;
integer j;
integer k;
localparam N = 2;
reg [N*N:0] a ;
reg [N*N:0] in ;
reg [N*N:0] p ;
always @* begin
k=0;
for (i = 0; i < N; i = i + 1) begin: firstfor
for (j = 0; j < N; j = j + 1) begin: secondfor
if(j>i) begin
a[i+j*N] = in[i] && p[k];
k=k+1;
end
end
end
end
I am not sure how synthesis will like this, but since the assignments are static, it might work.
It is possible to avoid always @* when you want to do more advanced math with genvar loops. Use localparam and a function.
Make k a localparam derived from the genvars with a function, and use k as originally intended.
The getk function seems to violate the principles of code reuse by basically recreating the loops from the generate block, but getk allows each unrolled loop iteration to derive the immutable localparam k from genvars i and j. There is no separate accumulating variable k that is tracked across all the unrolled loops. Both iverilog and ncvlog are happy with this.
(Note that the original example could optimize with j=i+1 as well, but there is still an issue with deriving k.)
module top();
localparam N=4;
function automatic integer getk;
input integer istop;
input integer jstop;
integer i,j,k;
begin
k=0;
for (i=0; i<=istop; i=i+1) begin: firstfor
for (j=i+1; j<((i==istop)? jstop : N); j=j+1) begin: secondfor
k=k+1;
end
end
getk=k;
end
endfunction
genvar i,j;
generate
for (i = 0; i < N; i = i + 1) begin: firstfor
for (j = i+1; j < N; j = j + 1) begin: secondfor
localparam k = getk(i,j);
initial $display("Created i=%0d j=%0d k=%0d",i,j,k);
end
end
endgenerate
endmodule
Output:
$ iverilog tmptest.v
$ ./a.out
Created i=0 j=1 k=0
Created i=0 j=2 k=1
Created i=0 j=3 k=2
Created i=1 j=2 k=3
Created i=1 j=3 k=4
Created i=2 j=3 k=5
I discovered the 'trick' of using functions to derive values from the genvars here:
https://electronics.stackexchange.com/questions/53327/generate-gate-with-a-parametrized-number-of-inputs
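For this particular index pattern, k can also be derived in closed form instead of re-running the loops in a function. The sketch below is an alternative, not the method from the answer above: rows 0 to i-1 contribute i*N - i*(i+1)/2 pairs, and within row i the pair (i, j) sits at offset j-i-1. The module name top_closed_form is hypothetical.

```verilog
// Hypothetical variant of the module above with k computed arithmetically.
module top_closed_form();
    localparam N = 4;
    genvar i, j;
    generate
        for (i = 0; i < N; i = i + 1) begin : firstfor
            for (j = i + 1; j < N; j = j + 1) begin : secondfor
                // Pairs before row i: i*N - i*(i+1)/2; offset inside row i: j-i-1
                localparam k = i*N - (i*(i+1))/2 + (j - i - 1);
                initial $display("Created i=%0d j=%0d k=%0d", i, j, k);
            end
        end
    endgenerate
endmodule
```

With N = 4 this yields the same k for each (i, j) pair as the getk version, for example k = 3 for i=1, j=2 and k = 5 for i=2, j=3. The function-based approach remains the more general tool when no simple formula exists.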

ASCII to Integer conversion in Verilog

I have a sequence of ASCII characters arriving sequentially from a UART. I want to convert from ASCII to the represented integers. For example, I receive 123 which is { 8'h31, 8'h32, 8'h33 } and I want to convert it to 8'h7B. Can anyone provide assistance?
I assume you are asking about synthesizable RTL to convert a sequence of ASCII-coded characters into the corresponding number. If so, there are generally two ways of doing this. The first is to process the stream sequentially: convert each input character into a binary digit, multiply the accumulated value by ten, and add the current digit. However, this method is very slow. The second applies if you have all of the input available at once: you can simply use a lookup table to convert the input "string" into a binary number. Below is an example that I sketched up some time back:
/*
* An example of converting an ASCII string into binary using lookup tables.
*
* Copyright (C) 2012 Vlad Lazarenko <vlad@lazarenko.me>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
// synopsys translate_off
`timescale 1 ns / 1 ps
// synopsys translate_on
/*
* Using carefully chosen minimal size for registers effectively increases fMAX.
* However, synthesis tool complain about result being truncated. It is possible
* to use full 32-bit registers and provide false paths, but easier to just
* disable the warning.
*/
// altera message_off 10230
/*
* Convert unsigned 32-bit ASCII number representation into binary.
*
* Data bus width is 80 bit for value (up to 10 characters).
* Empty flag is 4 bit wide.
* Output latency is 4 cycles.
* fMAX on Arria II device is a little above 300 MHz.
* @ 300 MHz, the throughput is ~22.32 Gbps (or ~2.79 GBps).
*/
module ascii_to_binary(clk, reset_n, data, mod, result);
input clk;
input reset_n;
input [79:0] data;
input [3:0] mod;
output reg [31:0] result;
// Convert a single ASCII digit into a corresponding
// 4-bit binary number by subtracting 48.
function [3:0] c2i;
input [7:0] c;
reg [7:0] i;
begin
i = (c - 6'd48);
c2i = i[3:0];
end
endfunction
// Convert a single ASCII digit into a corresponding
// 4-bit binary number by subtracting 48 and multiply
// the result. Multiplication is used to normalize
// a single digit number depending on its position
// in the input data sequence. For example, this function
// can be used to transform a sequence of ASCII digits
// like this: '1', '2', '3' into a digits like 100, 20, 3.
// This function can potentially use multipliers instead of
// lookup table. However, using multipliers can reduce fMAX.
function [31:0] c2d;
input [7:0] c;
input integer m;
reg [31:0] d;
begin
case (c2i(c))
4'd0: d = 0; // "0" is always 0
4'd1: d = m; // Multiplying 1 by "m" always yields "m"
4'd2: d = m * 2;
4'd3: d = m * 3;
4'd4: d = m * 4;
4'd5: d = m * 5;
4'd6: d = m * 6;
4'd7: d = m * 7;
4'd8: d = m * 8;
4'd9: d = m * 9;
4'd10: d = 0; // Don't care (false path)
4'd11: d = 0; // Don't care (false path)
4'd12: d = 0; // Don't care (false path)
4'd13: d = 0; // Don't care (false path)
4'd14: d = 0; // Don't care (false path)
4'd15: d = 0; // Don't care (false path)
endcase
c2d = d[31:0];
end
endfunction
// Stage 1 registers. Each word holds a single converted
// and adjusted/normalized digit.
reg [31:0] m9;
reg [31:0] m8;
reg [26:0] m7;
reg [23:0] m6;
reg [19:0] m5;
reg [16:0] m4;
reg [13:0] m3;
reg [9:0] m2;
reg [6:0] m1;
reg [3:0] m0;
// Stage 2 sum registers.
reg [31:0] s0;
reg [31:0] s1;
reg [31:0] s2;
reg [31:0] s3;
// Stage 3 sum registers.
reg [31:0] s4;
reg [31:0] s5;
always @ (posedge clk or negedge reset_n) begin
if (!reset_n) begin
m0 <= 0;
m1 <= 0;
m2 <= 0;
m3 <= 0;
m4 <= 0;
m5 <= 0;
m6 <= 0;
m7 <= 0;
m8 <= 0;
m9 <= 0;
s0 <= 0;
s1 <= 0;
s2 <= 0;
s3 <= 0;
s4 <= 0;
s5 <= 0;
result <= 0;
end else begin
/*
* Pipeline stage #1: Convert every ASCII character into a binary
* number, normalize every number depending on the word position
* and valid input data length. For example:
* - '1', '2' turns into 10 and 2.
* - '1', '2', '3' turns into 100, 20 and 3.
* - '1', '2', '3', '4' turns into 1000, 200, 30 and 4
*/
/*
* Empty signal is 4 bit wide, but its valid range is from 0 to 9.
* When MSB in empty signal is low, only 3 bits are compared using
* a full case. Otherwise, LSB is checked to differentiate between
* 8 and 9 (4'b1000 and 4'b1001).
*/
if (mod[3:3] == 1'b0) begin
case (mod[2:0])
3'd0: begin
m9 <= c2d(data[79:72], 1000000000);
m8 <= c2d(data[71:64], 100000000);
m7 <= c2d(data[63:56], 10000000);
m6 <= c2d(data[55:48], 1000000);
m5 <= c2d(data[47:40], 100000);
m4 <= c2d(data[39:32], 10000);
m3 <= c2d(data[31:24], 1000);
m2 <= c2d(data[23:16], 100);
m1 <= c2d(data[15:8], 10);
m0 <= c2i(data[7:0]);
end
3'd1: begin
m9 <= c2d(data[79:72], 100000000);
m8 <= c2d(data[71:64], 10000000);
m7 <= c2d(data[63:56], 1000000);
m6 <= c2d(data[55:48], 100000);
m5 <= c2d(data[47:40], 10000);
m4 <= c2d(data[39:32], 1000);
m3 <= c2d(data[31:24], 100);
m2 <= c2d(data[23:16], 10);
m1 <= c2i(data[15:8]);
m0 <= 0;
end
3'd2: begin
m9 <= c2d(data[79:72], 10000000);
m8 <= c2d(data[71:64], 1000000);
m7 <= c2d(data[63:56], 100000);
m6 <= c2d(data[55:48], 10000);
m5 <= c2d(data[47:40], 1000);
m4 <= c2d(data[39:32], 100);
m3 <= c2d(data[31:24], 10);
m2 <= c2i(data[23:16]);
m1 <= 0;
m0 <= 0;
end
3'd3: begin
m9 <= c2d(data[79:72], 1000000);
m8 <= c2d(data[71:64], 100000);
m7 <= c2d(data[63:56], 10000);
m6 <= c2d(data[55:48], 1000);
m5 <= c2d(data[47:40], 100);
m4 <= c2d(data[39:32], 10);
m3 <= c2i(data[31:24]);
m2 <= 0;
m1 <= 0;
m0 <= 0;
end
3'd4: begin
m9 <= c2d(data[79:72], 100000);
m8 <= c2d(data[71:64], 10000);
m7 <= c2d(data[63:56], 1000);
m6 <= c2d(data[55:48], 100);
m5 <= c2d(data[47:40], 10);
m4 <= c2i(data[39:32]);
m3 <= 0;
m2 <= 0;
m1 <= 0;
m0 <= 0;
end
3'd5: begin
m9 <= c2d(data[79:72], 10000);
m8 <= c2d(data[71:64], 1000);
m7 <= c2d(data[63:56], 100);
m6 <= c2d(data[55:48], 10);
m5 <= c2i(data[47:40]);
m4 <= 0;
m3 <= 0;
m2 <= 0;
m1 <= 0;
m0 <= 0;
end
3'd6: begin
m9 <= c2d(data[79:72], 1000);
m8 <= c2d(data[71:64], 100);
m7 <= c2d(data[63:56], 10);
m6 <= c2i(data[55:48]);
m5 <= 0;
m4 <= 0;
m3 <= 0;
m2 <= 0;
m1 <= 0;
m0 <= 0;
end
3'd7: begin
m9 <= c2d(data[79:72], 100);
m8 <= c2d(data[71:64], 10);
m7 <= c2i(data[63:56]);
m6 <= 0;
m5 <= 0;
m4 <= 0;
m3 <= 0;
m2 <= 0;
m1 <= 0;
m0 <= 0;
end
endcase
end else begin
case (mod[0:0])
1'b0: begin
m9 <= c2d(data[79:72], 10);
m8 <= c2i(data[71:64]);
end
1'b1: begin
m9 <= c2i(data[79:72]);
m8 <= 0;
end
endcase
m7 <= 0;
m6 <= 0;
m5 <= 0;
m4 <= 0;
m3 <= 0;
m2 <= 0;
m1 <= 0;
m0 <= 0;
end
// Pipeline stage #2: Sum up numbers from the previous step.
s0 <= m9 + m0;
s1 <= m8 + m1;
s2 <= m7 + (m2 + m3);
s3 <= m6 + (m4 + m5);
// Pipeline stage #3: Sum previous partial sums.
s4 <= (s0 + s1);
s5 <= (s2 + s3);
// Last pipeline stage #4: Sum previous partial sums.
// This yields a 32-bit result.
result <= (s4 + s5);
end
end
endmodule
You can find more details with (synthesizable) implementations for both methods along with a test bench, waveforms and even some software examples in my ASCII Horror — String to Binary Conversion article.
Hope it helps. Good Luck!
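For characters arriving one at a time from a UART, the sequential method described at the start of this answer (multiply the running value by ten, then add the new digit) can be sketched as below. The module name ascii_accum and the clear and char_valid handshake signals are assumptions for illustration; a real UART receiver may expose different signals.

```verilog
// Hypothetical module: accumulates ASCII decimal digits into a binary value.
module ascii_accum (
    input  wire        clk,
    input  wire        rst_n,
    input  wire        clear,       // start a new number (assumed control input)
    input  wire        char_valid,  // one-cycle strobe per received character
    input  wire [7:0]  char_in,     // ASCII character from the UART
    output reg  [31:0] value        // running binary value
);
    // '0'..'9' are 8'h30..8'h39, so the low nibble is the digit itself
    wire       is_digit = (char_in >= 8'h30) && (char_in <= 8'h39);
    wire [3:0] digit    = char_in[3:0];

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            value <= 32'd0;
        else if (clear)
            value <= 32'd0;
        else if (char_valid && is_digit)
            // value*10 computed as value*8 + value*2, then add the new digit
            value <= (value << 3) + (value << 1) + {28'd0, digit};
    end
endmodule
```

Feeding 8'h31, 8'h32, 8'h33 after a clear takes value through 1, 12 and 123, so the low byte ends at 8'h7B as in the question. Non-digit characters are ignored and overflow past 32 bits is not detected in this sketch.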
If your compiler supports SystemVerilog, you can use the atoi function:
str.atoi() returns the integer corresponding to the ASCII decimal
representation in str. For example:
string str = "123";
int i = str.atoi(); // assigns 123 to i.
Otherwise, you need to write your own atoi function using a method similar to Ross's suggestion.
Here is a small code example.
First, create a dynamic array of bytes from a string.
The dynamic array of bytes contains the ASCII code of each character.
The advantage is that a dynamic array of bytes can be randomized, whereas strings cannot.
For example:
stringvar = "This is an example text";
rand byte byte_din_array[];
for (i = 0; i < stringvar.len(); i++) begin
  byte_din_array = {byte_din_array, stringvar[i]};
  // stringvar[i] returns an empty byte if the index is beyond the string length.
  // The advantage of using stringvar[i] instead of stringvar.atoi(i) is that
  // the string can contain any ASCII characters, not just digits.
  // The disadvantage is that the byte contains the ASCII code
  // of the character, which is not human readable.
end
Next, an example that converts the dynamic array of bytes back into a concatenated string.
The dynamic array may have been partly randomized (with constraints) inside an xfer, or changed in post_randomize.
function string convert_byte_array2string(byte stringdescriptionholder[]);
automatic string temp_str="";
automatic byte byte_temp;
automatic string str_test;
for ( int unsigned i = 0; i<stringdescriptionholder.size(); i++) begin
i=i;//debug breakpoint
byte_temp = stringdescriptionholder[i];
str_test = string'(byte_temp); //the "string cast" will convert the numeric ASCII representation in a string character
temp_str = {temp_str,str_test};
end
return temp_str;
endfunction
If you want more information about strings, I recommend reading section 3.7 of the SystemVerilog 3.1a Language Reference Manual (LRM), Accellera’s Extensions to Verilog.
It covers the string data type and explains the built-in methods used with it.
You can also find information in section 6.16 of IEEE Std 1800™-2012, the IEEE Standard for SystemVerilog: Unified Hardware Design, Specification, and Verification Language. The explanation there is probably more detailed than in the LRM.
