I am writing Verilog code to insert values into a 4x4 matrix.
I need to collect 16 inputs, each one into a cell of a 4x4 matrix. How can I do that?
reg [15:0]fun_out;
integer x, y;
always @(posedge clk or negedge rst_n) begin
if (~rst_n) begin
for (x=0; x<4; x=x+1) begin
for (y=0; y<4; y=y+1) begin
data[0][0] <= fun_out[0];
data[0][1] <= fun_out[1];
data[0][2] <= fun_out[2];
data[0][3] <= fun_out[3];
data[1][0] <= fun_out[4];
data[2][0] <= fun_out[5];
........
........
data[4][3] <= fun_out[14];
data[4][4] <= fun_out[15];
end
end
end
else begin
data[x][y]<=4'b0;
end
end
Assuming NUM_WIDTH is the number of bits per value in the matrix:
localparam NUM_WIDTH = 4;
localparam NUM_X = 4;
localparam NUM_Y = 4;
We can use the indexed part-select "+:" to select a fixed-width slice of a vector. With that, we can pack and assign the data in each row of the matrix as follows:
genvar I, J;
wire [(NUM_WIDTH*NUM_X)-1:0] matrix [NUM_Y-1:0];
generate
for(I = 0; I < NUM_Y; I = I + 1) begin: matrix_gen_y
for(J = 0; J < NUM_X; J = J + 1) begin: matrix_gen_x
assign matrix[I][(J*NUM_WIDTH)+:NUM_WIDTH] = (I*NUM_X)+J; //..from 0 to 15 (16 would not fit in NUM_WIDTH = 4 bits)
end
end
endgenerate
To read the value at column x_idx and row y_idx of the matrix:
assign value = matrix[y_idx][(x_idx*NUM_WIDTH)+:NUM_WIDTH];
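To check the packing arithmetic outside a simulator, here is a small C model (C is used for all added sketches here, since Verilog cannot run as ordinary software). It assumes NUM_WIDTH = 4 and NUM_X = NUM_Y = 4 as above and holds each packed row in a 16-bit word; row_set and row_get are hypothetical helper names that mimic the "+:" part-select with shifts and masks. It also shows that a value of 16 wraps to 0 in a 4-bit field.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_WIDTH 4
#define NUM_X 4
#define NUM_Y 4

/* One packed row of NUM_X values, NUM_WIDTH bits each (16 bits here). */
typedef uint16_t row_t;

/* Models: assign matrix[y][(x*NUM_WIDTH)+:NUM_WIDTH] = v;
   Only the low NUM_WIDTH bits of v are kept, as in the Verilog assignment. */
static row_t row_set(row_t row, int x, unsigned v) {
    assert(0 <= x && x < NUM_X);
    unsigned mask = (1u << NUM_WIDTH) - 1u;
    unsigned shift = (unsigned)(x * NUM_WIDTH);
    unsigned cleared = row & ~(mask << shift);          /* clear the target slice */
    return (row_t)(cleared | ((v & mask) << shift));    /* write the new slice */
}

/* Models: value = matrix[y][(x*NUM_WIDTH)+:NUM_WIDTH]; */
static unsigned row_get(row_t row, int x) {
    assert(0 <= x && x < NUM_X);
    return ((unsigned)row >> (x * NUM_WIDTH)) & ((1u << NUM_WIDTH) - 1u);
}
```

Filling every row with (y*NUM_X)+x gives row 0 = 0x3210 and row 3 = 0xFEDC, i.e. column 0 sits in the least significant nibble.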
Good day to all! I wrote ACSL annotations to verify a Shell sort, but I can't find the correct loop invariants, so I cannot prove the program correct... Please help me!
/*@ predicate Sorted{L}(int* a, integer m, integer n) =
  @   \forall integer i, j; m <= i <= j < n ==> a[i] <= a[j];
  */
/*@ predicate GapSorted(int* a, integer m, integer n, integer gap) =
  @   \forall integer i, j; (m <= i <= j < n && j % gap == i % gap) ==> a[i] <=a[j];
  */
/*@
  @ requires \valid(arr + (0..n-1));
  @ requires n > 1;
  @ ensures GapSorted(arr, 0, n, 1);
  */
void shell_lr(int *arr, int n) {
  int i, j, tmp, gap;
  /*@ ghost int gap1 = n
    @ loop invariant 0 <= gap1 <= n/2;
    @ loop invariant gap1 < n/2 ==> GapSorted(arr, 0, n, gap+1);
    @ //loop invariant \forall integer k; gap < k <= n/2 ==> GapSorted(arr, 0, n, k);
    @ loop variant gap1;
    */
  for (gap = n / 2; gap > 0; gap--) {
    /*@ loop invariant 0 <= i <= n;
      @ //loop invariant \forall integer m; gap < m <= n/2 ==> GapSorted(arr, 0, i, m);
      @ loop invariant GapSorted(arr, 0, i, gap);
      @ loop variant n - i; */
    for (i = gap; i < n; i++) {
      tmp = arr[i];
      /*@
        @ loop invariant 0 <= j <= i;
        @ //loop invariant arr[j] >= tmp;
        @ loop invariant \forall integer k; (j < k <= i) ==> GapSorted(arr, 0, i, k);
        @ // loop invariant \forall integer k; j <= k <= gap ==> GapSorted(arr, k, i, gap);
        @ loop variant j;
        @*/
      for (j = i; j >= gap && arr[j - gap] > tmp; j -= gap) {
        arr[j] = arr[j - gap];
        //@ assert arr[j] >= arr[j - gap];
        //@ assert tmp < arr[j - gap];
      }
      //@ assert j>=0;
      arr[j] = tmp;
    }
    //@ assert i == n;
    //@ assert GapSorted(arr, 0, i, gap);
    //@ assert gap > 0;
    // assert GapSorted(arr, 0, n, gap);
  }
First, the code you provided is not syntactically correct:
a brace closing the function's body is missing
the ghost declaration of gap1 cannot be mixed with the loop annotations; these should be two distinct annotations. Anyway, I don't see the use of gap1 (especially as it is not updated inside the loop): everything can be expressed with gap. If what you were trying to achieve was a local variable for the whole loop annotation, this is not possible in ACSL: there is the \let gap1 = ...; ... construction, but its scope is only a single term/predicate, so you can't share it across two loop invariants (or between loop invariants and the loop variant).
Now, your most pressing issue here is the lack of loop assigns clauses in your loop annotations. You must provide such clauses for all your loops, or WP will not be able to assume much about the state of the program after the loops (see for instance this answer for more detail). You might also want to strengthen your invariant on i in the middle loop to gap <= i <= n, but this is a detail.
With the following loop assigns clauses, most of your annotations get proved (Frama-C 20.0 Calcium with -wp -wp-rte).
/*@ predicate Sorted{L}(int* a, integer m, integer n) =
  @   \forall integer i, j; m <= i <= j < n ==> a[i] <= a[j];
  */
/*@ predicate GapSorted(int* a, integer m, integer n, integer gap) =
  @   \forall integer i, j; (m <= i <= j < n && j % gap == i % gap) ==> a[i] <=a[j];
  */
/*@
  @ requires \valid(arr + (0..n-1));
  @ requires n > 1;
  @ ensures GapSorted(arr, 0, n, 1);
  */
void shell_lr(int *arr, int n) {
  int i, j, tmp, gap;
  /*@
    @ loop invariant 0 <= gap <= n/2;
    @ loop invariant gap < n/2 ==> GapSorted(arr, 0, n, gap+1);
    @ loop assigns gap, i, j, tmp, arr[0 .. n - 1];
    @ //loop invariant \forall integer k; gap < k <= n/2 ==> GapSorted(arr, 0, n, k);
    @ loop variant gap;
    */
  for (gap = n / 2; gap > 0; gap--) {
    /*@ loop invariant gap <= i <= n;
      @ //loop invariant \forall integer m; gap < m <= n/2 ==> GapSorted(arr, 0, i, m);
      @ loop invariant GapSorted(arr, 0, i, gap);
      @ loop assigns i, j, tmp, arr[0..n-1];
      @ loop variant n - i; */
    for (i = gap; i < n; i++) {
      tmp = arr[i];
      /*@
        @ loop invariant 0 <= j <= i;
        @ //loop invariant arr[j] >= tmp;
        @ loop invariant \forall integer k; (j < k <= i) ==> GapSorted(arr, 0, i, k);
        @ // loop invariant \forall integer k; j <= k <= gap ==> GapSorted(arr, k, i, gap);
        @ loop assigns j, arr[gap .. i];
        @ loop variant j;
        @*/
      for (j = i; j >= gap && arr[j - gap] > tmp; j -= gap) {
        arr[j] = arr[j - gap];
        //@ assert arr[j] >= arr[j - gap];
        //@ assert tmp < arr[j - gap];
      }
      //@ assert j>=0;
      arr[j] = tmp;
    }
    //@ assert i == n;
    //@ assert GapSorted(arr, 0, i, gap);
    //@ assert gap > 0;
    // assert GapSorted(arr, 0, n, gap);
  }
}
What remains to be proved are the GapSorted invariants of the two inner loops, which probably require more work and an answer much longer than what fits this format.
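While working on the remaining invariants, it can help to test them at run time first. Below is a small C sketch, not a proof: gap_sorted is a hypothetical executable version of the GapSorted predicate, and shell_lr_checked runs the same loops as shell_lr while checking, after each pass, that the array is gap-sorted. That is exactly what the outer invariant "gap < n/2 ==> GapSorted(arr, 0, n, gap+1)" claims at the start of the next iteration, once gap has been decremented.

```c
#include <assert.h>
#include <stdbool.h>

/* Executable version of the ACSL predicate GapSorted(a, m, n, gap). */
static bool gap_sorted(const int *a, int m, int n, int gap) {
    for (int i = m; i < n; i++)
        for (int j = i; j < n; j++)
            if (j % gap == i % gap && a[i] > a[j])
                return false;
    return true;
}

/* Same loops as shell_lr, with the outer-loop invariant checked after each pass. */
static bool shell_lr_checked(int *arr, int n) {
    assert(n > 1);                        /* mirrors: requires n > 1 */
    for (int gap = n / 2; gap > 0; gap--) {
        for (int i = gap; i < n; i++) {
            int tmp = arr[i];
            int j;
            for (j = i; j >= gap && arr[j - gap] > tmp; j -= gap)
                arr[j] = arr[j - gap];
            arr[j] = tmp;
        }
        if (!gap_sorted(arr, 0, n, gap))  /* runtime check of the invariant */
            return false;
    }
    return gap_sorted(arr, 0, n, 1);      /* mirrors: ensures GapSorted(arr, 0, n, 1) */
}
```

If a candidate invariant fails such a runtime check on some input, there is no point asking WP to prove it.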
As a first step into OpenMP, I set myself the challenge of parallelizing a matrix decomposition algorithm. I picked Crout decomposition with pivoting; the source can be found here:
http://www.mymathlib.com/c_source/matrices/linearsystems/crout_pivot.c
At the bottom of that decomposition function there's an outer for loop that walks over i and p_row at the same time. Of course OpenMP is as confused as I am when looking at this and refuses to do anything with it.
After wrapping my mind around it I think I got it untangled into readable form:
p_row = p_k + n;
for (i = k+1; i < n; i++) {
for (j = k+1; j < n; j++) *(p_row + j) -= *(p_row + k) * *(p_k + j);
p_row += n;
}
At this point serial run still comes up with the same result as the original code.
Then I add some pragmas, like this:
p_row = p_k + n;
#pragma omp parallel for private (i,j) shared (n,k,p_row,p_k)
for (i = k+1; i < n; i++) {
for (j = k+1; j < n; j++) *(p_row + j) -= *(p_row + k) * *(p_k + j);
#pragma omp critical
p_row += n;
#pragma omp flush(p_row)
}
Yet the results are essentially random.
What am I missing?
I haven't tested your adaptation of the original code, but your program has several problems.
#pragma omp parallel for private (i,j) shared (n,k,p_row,p_k)
By default, variables declared outside the parallel region are shared, so the shared clause is redundant.
But these variables should not be shared; they should be made private.
n is unchanged during the iterations, so it is better to have a local copy.
Ditto for k and p_k.
p_row is modified, but you really want several copies of p_row. This is what will ensure proper parallel processing, with each thread processing different rows. The problem is computing the value of p_row in the different threads.
In the outer loop, iteration 0 uses p_row, iteration 1 uses p_row+n, and iteration l uses p_row+l*n. Your iterations will be spread over several threads. Assume each thread processes m iterations. Thread 0 will process i=k+1 to i=(k+1)+(m-1), i.e. rows p_row to p_row+(m-1)*n; thread 1 will process i=(k+1)+m to i=(k+1)+(2m-1), i.e. rows p_row+m*n to p_row+(2m-1)*n, etc. Hence the value p_row should have at the start of each iteration can be computed from i: p_row + (i-(k+1))*n.
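The chunk arithmetic can be made concrete with a few lines of C. This sketch assumes contiguous chunks of m iterations, chunk t going to thread t (schedule(static) without a chunk size splits iterations roughly like this, but the exact split is up to the implementation); chunk_first_i, chunk_last_i and row_offset are hypothetical helper names.

```c
#include <assert.h>

/* First iteration i handled by chunk t, for chunks of m iterations. */
static int chunk_first_i(int k, int m, int t) {
    assert(m > 0 && t >= 0);
    return (k + 1) + t * m;
}

/* Last iteration of chunk t; the final chunk may be shorter. */
static int chunk_last_i(int k, int m, int t, int n) {
    int last = chunk_first_i(k, m, t) + m - 1;
    return last < n - 1 ? last : n - 1;
}

/* Offset (in elements) of row i from the initial p_row = p_k + n:
   iteration l = i - (k+1) uses p_row + l*n. */
static long row_offset(int i, int k, int n) {
    assert(i >= k + 1 && i < n);
    return (long)(i - (k + 1)) * n;
}
```

For example, with k = 0, n = 9 and m = 4, thread 0 handles i = 1..4 (offsets 0 to 3*n = 27) and thread 1 handles i = 5..8 (starting at offset 4*n = 36).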
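Here is a possible implementation:
#pragma omp parallel for private(j, p_row) firstprivate(n, k, p_k)
for (i = k+1; i < n; i++) {
   // firstprivate ensures each thread starts with the initial values of n, k and p_k
   p_row = p_k + n + (i-(k+1))*n;   // row i, derived from i instead of incremented
   for (j = k+1; j < n; j++)
      *(p_row + j) -= *(p_row + k) * *(p_k + j);
}
The row pointer is computed from i inside the loop body rather than incremented in the loop header: OpenMP requires the loop to be in canonical form (a single loop variable that is initialized, tested and incremented), so a header such as for (i = k+1, p_row = ...; i < n; i++, p_row += n) is rejected. Because every iteration computes its own p_row, the iterations are independent, and the code also gives the same result when compiled without OpenMP.
Critical is useless (and was buggy in your previous code). A flush is implied at the implicit barrier at the end of the parallel for, so the explicit flush is unnecessary.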
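To check that deriving the row pointer from i gives the same result as incrementing it, here is a self-contained C sketch. It assumes a row-major n x n matrix of doubles with p_k pointing to row k; update_serial and update_parallel are hypothetical names. Only the inner update from the question is modeled, not the rest of the Crout routine.

```c
#include <assert.h>
#include <stddef.h>

/* Serial form from the question: p_row is incremented after each row. */
static void update_serial(double *a, int n, int k) {
    double *p_k = a + (size_t)k * n;
    double *p_row = p_k + n;
    for (int i = k + 1; i < n; i++) {
        for (int j = k + 1; j < n; j++)
            p_row[j] -= p_row[k] * p_k[j];
        p_row += n;
    }
}

/* Parallel form: the row pointer is derived from i, so no iteration depends
   on another. p_row is declared in the loop body, hence private per thread.
   Without OpenMP enabled, the pragma is ignored and the loop runs serially. */
static void update_parallel(double *a, int n, int k) {
    assert(0 <= k && k < n);
    double *p_k = a + (size_t)k * n;
    #pragma omp parallel for
    for (int i = k + 1; i < n; i++) {
        double *p_row = p_k + (size_t)(i - k) * n;   /* = (p_k + n) + (i-(k+1))*n */
        for (int j = k + 1; j < n; j++)
            p_row[j] -= p_row[k] * p_k[j];
    }
}
```

Each iteration writes only row i and reads rows i and k (row k is never written), which is why no critical section is needed.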
Is it possible to use different "genvar"s in a loop? Is there an alternative way to do this?
I tried this example:
genvar i;
genvar j;
genvar k;
generate
k=0;
for (i = 0; i < N; i = i + 1)
begin: firstfor
for (j = 0; j < N; j = j + 1)
begin: secondfor
if(j>i)
begin
assign a[i+j*N] = in[i] && p[k];
k=k+1;
end
end
end
endgenerate
When I run "Check Syntax", this error is displayed:
Syntax error near "=". (k=k+1)
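I like this question because, unless you are very familiar with generate blocks, it looks like it should work. However, there is a similar question that tries to use an extra genvar.
The syntax is not allowed because of how generate loops are unrolled: a genvar can only be assigned in the header of a generate for loop, not by statements such as k=0 or k=k+1 inside the generate block. Integers can only be assigned in procedural code such as always/initial processes.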
If it is just combinatorial wiring rather than parametrised instantiation you might be able to do what you need just using integers (I would not normally recommend this):
integer i;
integer j;
integer k;
localparam N = 2;
reg [N*N:0] a ;
reg [N*N:0] in ;
reg [N*N:0] p ;
always @* begin
k=0;
for (i = 0; i < N; i = i + 1) begin: firstfor
for (j = 0; j < N; j = j + 1) begin: secondfor
if(j>i) begin
a[i+j*N] = in[i] && p[k];
k=k+1;
end
end
end
end
I am not sure how synthesis tools will handle this, but since the assignments are static, it might work.
It is possible to avoid always @* when you want to do more advanced math with genvar loops: use a localparam and a function.
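Because the loop above is plain procedural code, its result can be modeled directly in C. In this sketch, wire_pairs is a hypothetical name and in, p and a are packed into 32-bit words (assuming N*N <= 32). Bits of a that the loop never visits come out as 0 here; in the Verilog block they are simply never assigned.

```c
#include <assert.h>
#include <stdint.h>

/* For every pair j > i: a[i + j*N] = in[i] && p[k], with k counting the
   pairs visited so far, exactly like the integer k in the always block. */
static uint32_t wire_pairs(uint32_t in, uint32_t p, int n) {
    assert(n > 0 && n * n <= 32);
    uint32_t a = 0;
    int k = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (j > i) {
                uint32_t bit = ((in >> i) & 1u) & ((p >> k) & 1u);
                a |= bit << (i + j * n);
                k++;
            }
    return a;
}
```

For N = 3 the visited pairs are (0,1), (0,2) and (1,2), driving bits 3, 6 and 7 of a from p[0], p[1] and p[2].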
Make k a localparam derived from the genvars with a function, and use k as originally intended.
The getk function seems to violate the principles of code reuse by basically recreating the loops from the generate block, but getk allows each unrolled loop iteration to derive the immutable localparam k from genvars i and j. There is no separate accumulating variable k that is tracked across all the unrolled loops. Both iverilog and ncvlog are happy with this.
(Note that the original example could also be optimized by starting j at i+1, but there would still be an issue with deriving k.)
module top();
localparam N=4;
function automatic integer getk;
input integer istop;
input integer jstop;
integer i,j,k;
begin
k=0;
for (i=0; i<=istop; i=i+1) begin: firstfor
for (j=i+1; j<((i==istop)? jstop : N); j=j+1) begin: secondfor
k=k+1;
end
end
getk=k;
end
endfunction
genvar i,j;
generate
for (i = 0; i < N; i = i + 1) begin: firstfor
for (j = i+1; j < N; j = j + 1) begin: secondfor
localparam k = getk(i,j);
initial $display("Created i=%0d j=%0d k=%0d",i,j,k);
end
end
endgenerate
endmodule
Output:
$ iverilog tmptest.v
$ ./a.out
Created i=0 j=1 k=0
Created i=0 j=2 k=1
Created i=0 j=3 k=2
Created i=1 j=2 k=3
Created i=1 j=3 k=4
Created i=2 j=3 k=5
I discovered the 'trick' of using functions to derive values from the genvars here:
https://electronics.stackexchange.com/questions/53327/generate-gate-with-a-parametrized-number-of-inputs
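Since getk is ordinary procedural code, it can be cross-checked in C. This sketch ports getk (with N passed explicitly as n) and compares it against a running counter that visits the pairs in the same order as the generate loops. getk_closed is an alternative, loop-free formula of my own, not from the answer: rows 0..i-1 contribute n-1-r pairs each, giving k = i*n - i*(i+1)/2 + (j-i-1). It could replace the loops inside the Verilog function if preferred.

```c
#include <assert.h>

/* C port of the Verilog getk(): count the (i, j) pairs with j > i that the
   generate loops visit before reaching (istop, jstop). */
static int getk(int istop, int jstop, int n) {
    int k = 0;
    for (int i = 0; i <= istop; i++)
        for (int j = i + 1; j < ((i == istop) ? jstop : n); j++)
            k++;
    return k;
}

/* Loop-free form of the same count. */
static int getk_closed(int i, int j, int n) {
    assert(0 <= i && i < j && j < n);
    return i * n - i * (i + 1) / 2 + (j - i - 1);
}

/* Running counter, i.e. what the question's k=k+1 was trying to do. */
static int check_all(int n) {
    int k = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            if (getk(i, j, n) != k || getk_closed(i, j, n) != k)
                return 0;
            k++;
        }
    return 1;
}
```

For N = 4 this reproduces the k values in the iverilog output above (0 to 5).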
I have a sequence of ASCII characters arriving sequentially from a UART. I want to convert from ASCII to the represented integers. For example, I receive 123 which is { 8'h31, 8'h32, 8'h33 } and I want to convert it to 8'h7B. Can anyone provide assistance?
I assume you are asking about synthesizable RTL to convert a sequence of ASCII-coded characters into the corresponding number. If so, there are generally two ways of doing this. One is to process the stream sequentially: convert each input character into binary, multiply the accumulated value by ten, and add the current digit. However, this method is slow, since it handles only one character per step. If you have all of the input at hand, you can instead use a lookup table to convert the input "string" into a binary number. Below is an example that I sketched up some time back:
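The sequential method described above boils down to acc = acc * 10 + digit for each incoming character. Here is a minimal C sketch of that arithmetic (not RTL); ascii_accumulate is a hypothetical name, and returning -1 for a non-digit is an assumed error convention for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

/* Sequential method: one ASCII digit per step, acc = acc * 10 + (c - '0'). */
static int64_t ascii_accumulate(const uint8_t *chars, int len) {
    assert(len >= 0 && len <= 18);   /* keeps acc within int64_t */
    int64_t acc = 0;
    for (int t = 0; t < len; t++) {
        if (chars[t] < '0' || chars[t] > '9')
            return -1;               /* not a decimal digit */
        acc = acc * 10 + (chars[t] - '0');
    }
    return acc;
}
```

Feeding { 8'h31, 8'h32, 8'h33 } through it yields 123, i.e. 8'h7B. In hardware, each loop iteration corresponds to one character arriving from the UART.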
/*
* An example of converting an ASCII string into binary using lookup tables.
*
* Copyright (C) 2012 Vlad Lazarenko <vlad@lazarenko.me>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
// synopsys translate_off
`timescale 1 ns / 1 ps
// synopsys translate_on
/*
* Using carefully chosen minimal sizes for registers effectively increases fMAX.
* However, synthesis tools complain about results being truncated. It is possible
* to use full 32-bit registers and provide false paths, but it is easier to just
* disable the warning.
*/
// altera message_off 10230
/*
* Convert unsigned 32-bit ASCII number representation into binary.
*
* Data bus width is 80 bit for value (up to 10 characters).
* Empty flag is 4 bit wide.
* Output latency is 4 cycles.
* fMAX on Arria II device is a little above 300 MHz.
* @ 300 MHz, the throughput is ~22.32 Gbps (or ~2.79 GBps).
*/
module ascii_to_binary(clk, reset_n, data, mod, result);
input clk;
input reset_n;
input [79:0] data;
input [3:0] mod;
output reg [31:0] result;
// Convert a single ASCII digit into a corresponding
// 4-bit binary number by subtracting 48.
function [3:0] c2i;
input [7:0] c;
reg [7:0] i;
begin
i = (c - 6'd48);
c2i = i[3:0];
end
endfunction
// Convert a single ASCII digit into a corresponding
// 4-bit binary number by subtracting 48 and multiply
// the result. Multiplication is used to normalize
// a single digit number depending on its position
// in the input data sequence. For example, this function
// can be used to transform a sequence of ASCII digits
// like this: '1', '2', '3' into values like 100, 20, 3.
// This function could potentially use multipliers instead of a
// lookup table. However, using multipliers can reduce fMAX.
function [31:0] c2d;
input [7:0] c;
input integer m;
reg [31:0] d;
begin
case (c2i(c))
4'd0: d = 0; // "0" is always 0
4'd1: d = m; // Multiplying 1 by "m" always yields "m"
4'd2: d = m * 2;
4'd3: d = m * 3;
4'd4: d = m * 4;
4'd5: d = m * 5;
4'd6: d = m * 6;
4'd7: d = m * 7;
4'd8: d = m * 8;
4'd9: d = m * 9;
4'd10: d = 0; // Don't care (false path)
4'd11: d = 0; // Don't care (false path)
4'd12: d = 0; // Don't care (false path)
4'd13: d = 0; // Don't care (false path)
4'd14: d = 0; // Don't care (false path)
4'd15: d = 0; // Don't care (false path)
endcase
c2d = d[31:0];
end
endfunction
// Stage 1 registers. Each word holds a single converted
// and adjusted/normalized digit.
reg [31:0] m9;
reg [31:0] m8;
reg [26:0] m7;
reg [23:0] m6;
reg [19:0] m5;
reg [16:0] m4;
reg [13:0] m3;
reg [9:0] m2;
reg [6:0] m1;
reg [3:0] m0;
// Stage 2 sum registers.
reg [31:0] s0;
reg [31:0] s1;
reg [31:0] s2;
reg [31:0] s3;
// Stage 3 sum registers.
reg [31:0] s4;
reg [31:0] s5;
always @(posedge clk or negedge reset_n) begin
if (!reset_n) begin
m0 <= 0;
m1 <= 0;
m2 <= 0;
m3 <= 0;
m4 <= 0;
m5 <= 0;
m6 <= 0;
m7 <= 0;
m8 <= 0;
m9 <= 0;
s0 <= 0;
s1 <= 0;
s2 <= 0;
s3 <= 0;
s4 <= 0;
s5 <= 0;
result <= 0;
end else begin
/*
* Pipeline stage #1: Convert every ASCII character into a binary
* number, normalize every number depending on the word position
* and valid input data length. For example:
* - '1', '2' turns into 10 and 2.
* - '1', '2', '3' turns into 100, 20 and 3.
* - '1', '2', '3', '4' turns into 1000, 200, 30 and 4
*/
/*
* The empty signal (mod) is 4 bits wide, but its valid range is 0 to 9.
* When the MSB of mod is low, only the 3 low bits are compared using
* a full case. Otherwise, the LSB is checked to differentiate between
* 8 and 9 (4'b1000 and 4'b1001).
*/
if (mod[3:3] == 1'b0) begin
case (mod[2:0])
3'd0: begin
m9 <= c2d(data[79:72], 1000000000);
m8 <= c2d(data[71:64], 100000000);
m7 <= c2d(data[63:56], 10000000);
m6 <= c2d(data[55:48], 1000000);
m5 <= c2d(data[47:40], 100000);
m4 <= c2d(data[39:32], 10000);
m3 <= c2d(data[31:24], 1000);
m2 <= c2d(data[23:16], 100);
m1 <= c2d(data[15:8], 10);
m0 <= c2i(data[7:0]);
end
3'd1: begin
m9 <= c2d(data[79:72], 100000000);
m8 <= c2d(data[71:64], 10000000);
m7 <= c2d(data[63:56], 1000000);
m6 <= c2d(data[55:48], 100000);
m5 <= c2d(data[47:40], 10000);
m4 <= c2d(data[39:32], 1000);
m3 <= c2d(data[31:24], 100);
m2 <= c2d(data[23:16], 10);
m1 <= c2i(data[15:8]);
m0 <= 0;
end
3'd2: begin
m9 <= c2d(data[79:72], 10000000);
m8 <= c2d(data[71:64], 1000000);
m7 <= c2d(data[63:56], 100000);
m6 <= c2d(data[55:48], 10000);
m5 <= c2d(data[47:40], 1000);
m4 <= c2d(data[39:32], 100);
m3 <= c2d(data[31:24], 10);
m2 <= c2i(data[23:16]);
m1 <= 0;
m0 <= 0;
end
3'd3: begin
m9 <= c2d(data[79:72], 1000000);
m8 <= c2d(data[71:64], 100000);
m7 <= c2d(data[63:56], 10000);
m6 <= c2d(data[55:48], 1000);
m5 <= c2d(data[47:40], 100);
m4 <= c2d(data[39:32], 10);
m3 <= c2i(data[31:24]);
m2 <= 0;
m1 <= 0;
m0 <= 0;
end
3'd4: begin
m9 <= c2d(data[79:72], 100000);
m8 <= c2d(data[71:64], 10000);
m7 <= c2d(data[63:56], 1000);
m6 <= c2d(data[55:48], 100);
m5 <= c2d(data[47:40], 10);
m4 <= c2i(data[39:32]);
m3 <= 0;
m2 <= 0;
m1 <= 0;
m0 <= 0;
end
3'd5: begin
m9 <= c2d(data[79:72], 10000);
m8 <= c2d(data[71:64], 1000);
m7 <= c2d(data[63:56], 100);
m6 <= c2d(data[55:48], 10);
m5 <= c2i(data[47:40]);
m4 <= 0;
m3 <= 0;
m2 <= 0;
m1 <= 0;
m0 <= 0;
end
3'd6: begin
m9 <= c2d(data[79:72], 1000);
m8 <= c2d(data[71:64], 100);
m7 <= c2d(data[63:56], 10);
m6 <= c2i(data[55:48]);
m5 <= 0;
m4 <= 0;
m3 <= 0;
m2 <= 0;
m1 <= 0;
m0 <= 0;
end
3'd7: begin
m9 <= c2d(data[79:72], 100);
m8 <= c2d(data[71:64], 10);
m7 <= c2i(data[63:56]);
m6 <= 0;
m5 <= 0;
m4 <= 0;
m3 <= 0;
m2 <= 0;
m1 <= 0;
m0 <= 0;
end
endcase
end else begin
case (mod[0:0])
1'b0: begin
m9 <= c2d(data[79:72], 10);
m8 <= c2i(data[71:64]);
end
1'b1: begin
m9 <= c2i(data[79:72]);
m8 <= 0;
end
endcase
m7 <= 0;
m6 <= 0;
m5 <= 0;
m4 <= 0;
m3 <= 0;
m2 <= 0;
m1 <= 0;
m0 <= 0;
end
// Pipeline stage #2: Sum up numbers from the previous step.
s0 <= m9 + m0;
s1 <= m8 + m1;
s2 <= m7 + (m2 + m3);
s3 <= m6 + (m4 + m5);
// Pipeline stage #3: Sum previous partial sums.
s4 <= (s0 + s1);
s5 <= (s2 + s3);
// Pipeline stage #4 (last): Sum previous partial sums.
// This yields a 32-bit result.
result <= (s4 + s5);
end
end
endmodule
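To see what the module computes without its pipelining, here is a C model of the arithmetic only. Assumptions: chars[0] plays the role of data[79:72] and chars[9] of data[7:0], mod counts the empty bytes at the low end, and every non-empty byte is an ASCII digit. ascii_to_binary_model is a hypothetical name. Each valid digit is scaled by its power of ten (stage 1, m9..m0) and the terms are summed modulo 2^32 (stages 2 to 4).

```c
#include <assert.h>
#include <stdint.h>

static uint32_t ascii_to_binary_model(const uint8_t chars[10], unsigned mod) {
    assert(mod <= 9);
    unsigned valid = 10u - mod;
    uint32_t weight = 1u;      /* 1, 10, 100, ... like the c2d multipliers */
    uint32_t result = 0u;      /* 32-bit sum, wraps modulo 2^32 */
    /* Walk from the last valid character (units digit) towards data[79:72]. */
    for (int t = (int)valid - 1; t >= 0; t--) {
        assert(chars[t] >= '0' && chars[t] <= '9');
        result += (uint32_t)(chars[t] - '0') * weight;   /* one stage-1 term */
        weight *= 10u;
    }
    return result;
}
```

With "123" in the top three bytes and mod = 7 this gives 123 (8'h7B), matching the 3'd7 branch of the case statement (100, 20 and 3).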
You can find more details with (synthesizable) implementations for both methods along with a test bench, waveforms and even some software examples in my ASCII Horror — String to Binary Conversion article.
Hope it helps. Good Luck!
If your compiler supports SystemVerilog, you can use the string method atoi():
str.atoi() returns the integer corresponding to the ASCII decimal representation in str. For example:
string str = "123";
int i = str.atoi(); // assigns 123 to i.
Otherwise, you need to write your own atoi function using a method similar to Ross's suggestion.
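A hand-written atoi can follow the same accumulate-by-ten idea over a string. The C sketch below is hypothetical (my_atoi is not a library function): it consumes leading digits, ignores '_' separators as Verilog number literals allow, stops at the first other character, and returns 0 if no digit is found. As I understand the LRM (section 6.16), str.atoi() behaves similarly, but treat that as an assumption and check the standard for your tool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decimal ASCII string -> integer. */
static long my_atoi(const char *s) {
    assert(s != NULL);
    long value = 0;
    for (; *s != '\0'; s++) {
        if (*s == '_')
            continue;                      /* digit separator, ignored */
        if (*s < '0' || *s > '9')
            break;                         /* first non-digit ends the scan */
        value = value * 10 + (*s - '0');
    }
    return value;
}
```

Overflow is not handled, so inputs should stay within the range of long.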
Here is a small example.
First, an example that creates a dynamic byte array from a string.
The dynamic byte array contains the ASCII code of each character.
The advantage is that a dynamic array of bytes can be randomized, whereas a string cannot.
It can be created, for example, like this (byte_din_array must be a class member, since rand applies to class properties):
rand byte byte_din_array[];
// later, in a method of the same class:
string stringvar = "This is an example text";
for (int i = 0; i < stringvar.len(); i++) begin
  byte_din_array = {byte_din_array, stringvar[i]};
  // stringvar[i] returns the ASCII code of character i as a byte
  // (0 if the index is beyond the string length).
  // The advantage of stringvar[i] over atoi() is that the string can contain
  // any ASCII character, not just decimal digits.
  // The disadvantage is that each byte holds the numeric ASCII code of the
  // character, which is not human readable.
end
Here is an example that converts the dynamic array of bytes back into a string.
The dynamic array may, for example, have been partly randomized (with constraints) inside an xfer, or modified in post_randomize().
function string convert_byte_array2string(byte stringdescriptionholder[]);
  automatic string temp_str = "";
  automatic byte byte_temp;
  automatic string str_test;
  for (int unsigned i = 0; i < stringdescriptionholder.size(); i++) begin
    byte_temp = stringdescriptionholder[i];
    str_test = string'(byte_temp); // the string cast converts the numeric ASCII code into a one-character string
    temp_str = {temp_str, str_test};
  end
  return temp_str;
endfunction
If you want more information about strings, I recommend reading section 3.7 of the SystemVerilog 3.1a Language Reference Manual (LRM), Accellera’s Extensions to Verilog.
It covers the string data type and explains the built-in methods available for strings.
You can also find this information in section 6.16 of the IEEE Standard for SystemVerilog, IEEE Std 1800™-2012, which probably gives a more detailed explanation than the 3.1a LRM.