Finding the maximum value of an array in log2 time in Verilog?

I have designed a 'sorter' that finds the maximum value of its input, which is 16 31-bit words. In simulation, it works, but I am not sure if it will work in hardware (as it doesn't seem to be working on the FPGA as planned). Can someone please let me know if this will work? I am trying to save on resources, that is why I am trying to reuse the same register. Thank you...
module para_sort(clk, ready, array_in, out_max)
input clk, ready;
input [16*31-1:0] array_in;
output reg [30:0] out_max;
reg [30:0] temp_reg [0:15]
integer i, j;
always @ (posedge clk)
begin
if(ready)
begin
for(j=0; j<16; j=j+1)
begin
temp_reg[j] <= array_in[31*(j+1)-1 -: 31];
end
i<=0;
done<=0;
end
else
begin
if(i<4)
begin
for(j=0; j<16; j=j+1)
if(temp_reg[j+1] > temp_reg[j]
temp_reg[((j+2)>>1)-1] <= temp_reg[j+1]
else
temp_reg[((j+2)>>1)-1] <= temp_reg[j]
i<=i+1;
end
end
if(i == 4)
begin
out_max <= temp_reg[0];
done <=1;
i <= i + 1;
end
if(i == 5)
done <=0;
end
endmodule
Sorry for the long code. If you have any questions about the code, please let me know.

Before answering, I assume that the syntax errors in the posted code are slips made while posting, and that the module is part of a larger design, so the number of I/O pins is not a problem.
First of all, the algorithm in the posted code is incorrect: the inner loop steps j by 1, so it compares overlapping pairs and reads temp_reg[16], which is outside the declared range. Maybe you meant something different. The following code worked for me:
// The same code as you posted, but with slight changes
// Maybe you have tried compiling this code and missed some points while posting
module para_sort(clk, ready, array_in, out_max, done);
    input clk, ready;
    input [16*31-1:0] array_in;
    output reg [30:0] out_max;
    output reg done;
    reg [30:0] temp_reg [0:16];
    integer i, j;
    always @ (posedge clk)
    begin
        if(ready)
        begin
            for(j=0; j<16; j=j+1)
            begin
                temp_reg[j] <= array_in[31*(j+1)-1 -: 31];
            end
            i<=0;
            done<=0;
        end
        else
        begin
            if(i<4)
            begin
                for(j=0; j<16; j=j+2)
                begin
                    if(temp_reg[j+1] > temp_reg[j])
                        temp_reg[((j+2)>>1)-1] <= temp_reg[j+1];
                    else
                        temp_reg[((j+2)>>1)-1] <= temp_reg[j];
                end // end of the for loop
                i<=i+1;
            end
        end
        if(i == 4)
        begin
            out_max <= temp_reg[0];
            done <=1;
            i <= i + 1;
        end
        if(i == 5)
            done <=0;
    end
endmodule
If the above code does not solve the problem, the cause might be timing. Quartus Prime has netlist viewer tools (the same is true for Vivado). When you inspect the generated circuit, you can see paths with many combinational blocks and feedback, caused mostly by the second for loop. If that logic cannot settle within a single clock period, you will lose synchronization and the results will be unpredictable.
So:
Try the above code first.
If the code does not solve your problem, then try an FSM (with case and if - else - if statements). Even if the code becomes longer or looks uglier, it is more FPGA-hardware friendly.
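One way to act on this advice is to make the control state explicit, so that each clock performs exactly one reduction round over only the slots that are still live. The following is a minimal sketch under stated assumptions, not a tested drop-in replacement: the module name para_max and the registers n and active are made up for illustration, the comparison is unsigned, and there is no reset, so ready must be pulsed once before the outputs mean anything.

```verilog
// Sketch only: maximum of 16 unsigned 31-bit words in 4 clocked rounds.
// `para_max`, `n` and `active` are hypothetical names, not from the question.
module para_max (
    input  wire             clk,
    input  wire             ready,     // load a new set of 16 words
    input  wire [16*31-1:0] array_in,
    output reg  [30:0]      out_max,
    output reg              done       // high for one clock when out_max is valid
);
    reg [30:0] temp_reg [0:15];
    reg [4:0]  n;       // number of live candidates: 16, 8, 4, 2, 1
    reg        active;  // a reduction is in progress
    integer    j;

    always @(posedge clk) begin
        done <= 1'b0;                       // default, overridden below
        if (ready) begin
            for (j = 0; j < 16; j = j + 1)
                temp_reg[j] <= array_in[31*j +: 31];
            n      <= 5'd16;
            active <= 1'b1;
        end else if (active) begin
            if (n > 1) begin
                // One round: slot j keeps the larger of slots 2j and 2j+1.
                // Nonblocking assignments read the pre-edge values, so
                // reusing temp_reg in place is safe.
                for (j = 0; j < 8; j = j + 1)
                    if (j < (n >> 1))
                        temp_reg[j] <= (temp_reg[2*j+1] > temp_reg[2*j])
                                       ? temp_reg[2*j+1] : temp_reg[2*j];
                n <= n >> 1;
            end else begin
                out_max <= temp_reg[0];
                done    <= 1'b1;
                active  <= 1'b0;
            end
        end
    end
endmodule
```

Each round is only one comparator deep, so the longest combinational path is a single 31-bit compare plus a multiplexer; whether that meets timing on a given device still has to be checked in the tool's timing report.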

Related

VHDL - looping through an array

I wanted to loop through an array's elements and later output them.
I don't think that the for loop is correct.
Could someone help with this, please?

In VHDL, a for loop is a shorthand notation for creating parallel paths of logic.
In your example, the loop becomes a shift-by-1 index mapping assignment:
r_array(1) <= myArray1(0);
r_array(2) <= myArray1(1);
r_array(3) <= myArray1(2);
r_array(4) <= myArray1(3);
r_array(5) <= myArray1(4);
r_array(6) <= myArray1(5);
r_array(7) <= myArray1(6);
Since r_array is a scalar of integer type and not an array, this won't work as is.
Try using r_array as an index counter instead: on each rising clock edge, the element at the current index of myArray is assigned to the output, and the index counter increments by 1 (or wraps back to 0).
if rising_edge(i_CE1Hz) then
    if r_array <= 7 then
        r_array <= r_array + 1;
    else
        r_array <= 0;
    end if;
    o_element <= myArray(r_array); -- new output signal
end if;

Johnson Counter Syntax error. Unexpected token: generate

My college teacher asked me to implement a Johnson counter and its test bench, with a width <= 32 (he calls it the N parameter), and the implementation has to use generate/for structures. Although I have learned a little about Johnson counters, I don't know how to use generate in this case, and I got some errors when I tried to run the test bench. Here is my implementation so far:
module johnsonCounter #(parameter N = 32)
(
input clk,
input rstn,
output reg [N-1:0] out
);
always @ (posedge clk) begin
if (!rstn)
out <= 1;
else begin
out[N-1] <= ~out[0];
generate
for (int i = 0; i < N-1; i=i+1) begin
out[i] <= out[i+1];
end
endgenerate
end
end
endmodule
Here is the test bench:
module tb;
parameter N = 32;
reg clk;
reg rstn;
wire [N-1:0] out;
johnsonCounter u0 (.clk (clk),
.rstn (rstn),
.out (out));
always #10 clk = ~clk;
initial begin
{clk, rstn} <= 0;
$monitor ("T=%0t out=%b", $time, out);
repeat (2) @(posedge clk);
rstn <= 1;
repeat (15) @(posedge clk);
$finish;
end
initial begin
$dumpvars;
$dumpfile("dump.vcd");
end
endmodule
These are the errors:
ERROR VCP2000 "Syntax error. Unexpected token: generate[_GENERATE]. This is a Verilog keyword since IEEE Std 1364-2001 and cannot be used as an identifier. Use -v95 argument for compilation." "design.sv" 13 7
ERROR VCP2020 "begin...end pair(s) mismatch detected. 2 <end> tokens are missing." "design.sv" 17 7
ERROR VCP2020 "module/macromodule...endmodule pair(s) mismatch detected. 1 <endmodule> tokens are missing." "design.sv" 17 7
ERROR VCP2000 "Syntax error. Unexpected token: endgenerate[_ENDGENERATE]. This is a Verilog keyword since IEEE Std 1364-2001 and cannot be used as an identifier. Use -v95 argument for compilation." "design.sv" 17 7
Any help is welcome =)

It is illegal to use generate in that way.
For your code, just a for loop is needed (without generate):
always @ (posedge clk) begin
    if (!rstn)
        out <= 1;
    else begin
        out[N-1] <= ~out[0];
        for (int i = 0; i < N-1; i=i+1) begin
            out[i] <= out[i+1];
        end
    end
end
For generate syntax, refer to the IEEE Std 1800-2017, section 27. Generate constructs.
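For comparison, a legal use of generate puts the loop at module level, outside any always block, with a genvar as the loop index. The sketch below is one possible arrangement under assumptions, not the required solution: the module name johnsonCounterGen and the block label g_shift are made up for illustration, and the reset value mirrors the out <= 1 used above.

```verilog
// Sketch only: Johnson counter with a module-level generate-for.
// `johnsonCounterGen` and `g_shift` are hypothetical names.
module johnsonCounterGen #(parameter N = 32) (
    input  wire         clk,
    input  wire         rstn,
    output reg  [N-1:0] out
);
    // The MSB takes the inverted LSB.
    always @(posedge clk) begin
        if (!rstn) out[N-1] <= 1'b0;
        else       out[N-1] <= ~out[0];
    end

    // Every other bit shifts down by one position. The generate loop
    // elaborates into N-1 separate always blocks, one per bit.
    genvar i;
    generate
        for (i = 0; i < N-1; i = i + 1) begin : g_shift
            always @(posedge clk) begin
                if (!rstn) out[i] <= (i == 0) ? 1'b1 : 1'b0;  // same as out <= 1
                else       out[i] <= out[i+1];
            end
        end
    endgenerate
endmodule
```

The plain for loop inside a single always block describes the same hardware with less code, so the generate form is mainly useful when the assignment explicitly asks for it or when each iteration instantiates a module.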

I tried implementing it using the generate construct. I am also new at this, so if anybody sees a problem or error, or can suggest a way to improve performance, I would appreciate it.
Regarding your question, I always use generate to instantiate several modules; I think it makes my code cleaner and easier to understand. So I defined a simple D flip-flop module and instantiated it inside the generate loop. If you want to use generate, you have to declare the loop variable with genvar. Also, you should use generate outside an always block (I don't know if there is a situation where you could use it inside the always block). Below, you can see the code.
module ff
(
input clk,
input rstn,
input d,
output reg q,
output reg qn
);
always @(posedge clk)
begin
if(!rstn)
begin
q <= 0;
qn <= 1;
end
else
begin
q <= d;
qn <= ~d;
end
end
endmodule
module johnsonCounter #(parameter N = 4)
(
input clk,
input rstn,
output [N-1:0] out,
output [N-1:0] nout
);
genvar i;
generate
for (i = 0; i < N-1; i=i+1) begin
ff flip (.clk(clk), .rstn(rstn), .d(out[i+1]), .q(out[i]), .qn(nout[i]));
end
endgenerate
ff lastFlip (.clk(clk), .rstn(rstn), .d(nout[0]), .q(out[N-1]), .qn(nout[N-1]));
endmodule
Here you have the testbench, too. One thing I changed from your code is the $dumpfile line: it should go before $dumpvars.
module tb;
parameter N = 4;
reg clk;
reg rstn;
wire [N-1:0] out;
johnsonCounter u0 (.clk (clk),
.rstn (rstn),
.out (out));
always #10 clk = ~clk;
initial begin
{clk, rstn} <= 0;
$monitor ("T=%0t out=%b", $time, out);
repeat (2) @(posedge clk);
rstn <= 1;
repeat (15) @(posedge clk);
$finish;
end
initial begin
$dumpfile("dump.vcd");
$dumpvars;
end
endmodule
This code was tested using EDA Playground and it worked fine but, as I said, I am not an expert, so if anybody finds an error or has a suggestion, it is welcome.

In Verilog, when using a for-loop within a sequential process, how can you increment the sequential variable?

Here is a snippet of code, hopefully you are alright without the preamble:
always @ (posedge clk)
begin
if(rst)
begin
i<=0;
j<=0;
end
else
begin
for(j = 0 ; j < 16 ; j = j+1)
begin
if(i<8)
begin
var[j] <= var_2[i];
i <= i+1;
end
end
end
end
Basically, I am wondering if the outer "for-loop" will erroneously increment the counter variable i, rather than simply calculating the 16 vars in parallel. If this is the case, should I cut the for-loop short so that the variable is incremented outside the for-loop?
Thanks!

You are making the code unnecessarily complex.
If i<8, then these statements get executed:
var[j] <= var_2[i];
i <= i+1;
But i is not incremented until after the clock edge.
As i does not change during the loop, the condition does not change either: once true, it stays true for all values of j.
Probably a better way of understanding this is to write the code as follows, which has exactly the same behavior:
always @ (posedge clk)
begin
if(rst)
begin
i<=0;
j<=0;
end
else
begin
for (j = 0 ; j < 16 ; j = j+1)
begin
if(i<8)
var[j] <= var_2[i];
end
// i increment independent from the 'j' loop
if(i<8)
i <= i+1;
end
end
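To see this for yourself, a small simulation-only test can schedule the same nonblocking increment 16 times per clock and print the counter afterwards. This is a sketch with a made-up module name (tb_nba) and its own hand-driven clock; it is not taken from your design.

```verilog
// Sketch only: shows that 16 nonblocking "i <= i + 1" statements in one
// clock edge advance i by 1, not by 16. `tb_nba` is a hypothetical name.
module tb_nba;
    reg       clk = 0;
    reg [3:0] i   = 0;
    integer   j;

    always @(posedge clk)
        for (j = 0; j < 16; j = j + 1)
            if (i < 8)
                i <= i + 1;  // every iteration reads the same pre-edge i

    initial begin
        repeat (3) begin
            #5 clk = 1;
            #5 clk = 0;
            $display("i = %0d", i);
        end
        $finish;
    end
endmodule
```

In a standard-compliant simulator this should print i = 1, i = 2 and i = 3, one line per clock, because each scheduled update carries the same value and the last one simply overwrites the others.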

Vhdl Snake - how to automate tail implementation

I am currently implementing the snake game in VHDL for a Spartan-3E.
I have already written a part that draws a cell square on the VGA screen and makes it possible to move it around.
The problem is the tail implementation. So far I have manually added another cell segment to my snake, but I would like to automate it (in Java, for example, I would simply make a queue of cells and set the position of each cell to that of the cell before it). I do not know how to write such a complex function in VHDL.
Here is my code:
begin
process (clk, reset, endOfGame)
begin
if reset='1' or endOfGame=true then
ball_y_reg <= to_unsigned(231,10);
ball_x_reg <= to_unsigned(311,10);
ball_x_reg_cell<=to_unsigned(231,10);
ball_y_reg_cell<=to_unsigned(311,10);
-- velocity after reset should be none
x_delta_reg <= ("0000000000");
y_delta_reg <= ("0000000000");
elsif (clk'event and clk='1') then
ball_x_reg_cell<=ball_x_next_cell;
ball_y_reg_cell<=ball_y_next_cell;
ball_x_reg <= ball_x_next;
ball_y_reg <= ball_y_next;
x_delta_reg <= x_delta_next;
y_delta_reg <= y_delta_next;
end if;
end process;
pix_x <= unsigned(pixel_x);
pix_y <= unsigned(pixel_y);
-- refr_tick: 1-clock tick asserted at start of v-sync
-- i.e., when the screen is refreshed (60 Hz)
refr_tick <= '1' when (pix_y=481) and (pix_x=0) else
'0';
----------------------------------------------
-- pixel within wall
wall_on <=
'1' when ((WALL_X_LEFTSIDE_L<=pix_x) and (pix_x<=WALL_X_LEFTSIDE_R)) or ((WALL_X_RIGHTSIDE_L<=pix_x) and (pix_x<=WALL_X_RIGHTSIDE_R)) or ((WALL_Y_UPSIDE_U<=pix_y) and (pix_y<=WALL_Y_UPSIDE_D)) or ((WALL_Y_DOWNSIDE_U<=pix_y) and (pix_y<=WALL_Y_DOWNSIDE_D)) else
'0';
-- wall rgb output
wall_rgb <= "001"; -- blue
----------------------------------------------
-- square ball
ball_x_l <= ball_x_reg;
ball_y_t <= ball_y_reg;
ball_x_r <= ball_x_l + BALL_SIZE - 1;
ball_y_b <= ball_y_t + BALL_SIZE - 1;
ball_x_l_cell <= ball_x_reg_cell;
ball_y_t_cell <= ball_y_reg_cell;
ball_x_r_cell <= ball_x_l_cell + BALL_SIZE - 1;
ball_y_b_cell <= ball_y_t_cell + BALL_SIZE - 1;
--tail
-- pixel within squared ball
sq_ball_on <=
'1' when ((ball_x_l<=pix_x) and (pix_x<=ball_x_r) and
(ball_y_t<=pix_y) and (pix_y<=ball_y_b))
or
((ball_x_l_cell<=pix_x) and (pix_x<=ball_x_r_cell) and
(ball_y_t_cell<=pix_y) and (pix_y<=ball_y_b_cell))
else
'0';
ball_x_next <= ball_x_reg + x_delta_reg
when refr_tick='1' else
ball_x_reg ;
ball_y_next <= ball_y_reg + y_delta_reg
when refr_tick='1' else
ball_y_reg ;
ball_x_next_cell <= ball_x_reg - BALL_SIZE when refr_tick='1' and CURRENT_DIRECTION = DIR_RIGHT
else ball_x_reg + BALL_SIZE when refr_tick='1' and CURRENT_DIRECTION = DIR_LEFT
else ball_x_reg when refr_tick='1'
else ball_x_reg_cell;
ball_y_next_cell <= ball_y_reg - BALL_SIZE when refr_tick='1' and CURRENT_DIRECTION = DIR_UP
else ball_y_reg + BALL_SIZE when refr_tick='1' and CURRENT_DIRECTION = DIR_DOWN
else ball_y_reg when refr_tick='1'
else ball_y_reg_cell;
-- new bar y-position
process(ball_y_reg, ball_y_b, ball_y_t, refr_tick, btn, ball_x_reg ,ball_x_r, ball_x_l, x_delta_reg, y_delta_reg)
begin
x_delta_next <= x_delta_reg;
y_delta_next <= y_delta_reg;
if refr_tick='1' then
if btn(1)='1' and ball_y_b<(MAX_Y-1-BALL_SIZE) then
if CURRENT_DIRECTION /= DIR_UP then
CURRENT_DIRECTION <= DIR_DOWN;
y_delta_next <= BALL_V_P; -- move down
x_delta_next <= (others=>'0');
end if;
elsif btn(0)='1' and ball_y_t > BALL_SIZE then
if CURRENT_DIRECTION /= DIR_DOWN then
CURRENT_DIRECTION <= DIR_UP;
y_delta_next <= BALL_V_N; -- move up
x_delta_next <= (others=>'0');
end if;
elsif btn(2)='1' and ball_x_r<(MAX_X-1-BALL_SIZE) then
if CURRENT_DIRECTION /= DIR_LEFT then
CURRENT_DIRECTION <= DIR_RIGHT;
x_delta_next <= BALL_V_P;
y_delta_next <= (others=>'0');
end if;
elsif btn(3)='1' and ball_x_l > BALL_SIZE then
if CURRENT_DIRECTION /= DIR_RIGHT then
CURRENT_DIRECTION <= DIR_LEFT;
x_delta_next <= BALL_V_N;
y_delta_next <= (others=>'0');
end if;
end if;
if ball_x_l < WALL_X_LEFTSIDE_R or ball_y_t < WALL_Y_UPSIDE_D or ball_y_b > WALL_Y_DOWNSIDE_U or ball_x_r > WALL_X_RIGHTSIDE_L then
endOfGame <= true;
CURRENT_DIRECTION <= IDLE;
else
endOfGame <= false;
end if;
end if;
end process;
The "ball_x_next_cell" parts are the manually added second cell.
I have searched for topics covering a similar problem, but none of them covers it in VHDL.
Thanks for the help!

The problem isn't VHDL - don't get lost in the language differences between VHDL and Java - these are trivial here.
The problem is synthesisability - you need a conceptual design that can be represented in hardware.
You say your Java implementation uses a queue - this will be based on a linked list, with nodes (segments) dynamically allocated, and referenced via pointers. And in fact you could straightforwardly translate that into VHDL, using access types, new and deallocate, and so on. You'd have to implement the details yourself, while there might be a convenient library, i.e. class, in Java. But that's mere detail.
Don't go down that road - access types and especially dynamic allocation aren't synthesisable - you can't normally generate and free chunks of hardware to a running system...
(But you might do that if you wanted to run an existing Snake in a simulator, in parallel with the synthesisable version, to compare their results and verify the synthesisable one matches the already proven software version. If you need a high-reliability Snake designed to military, aerospace or safety critical requirements, you'll need this step.)
You need a different mindset for hardware design, based on knowing what is physically realisable, and how to translate concepts into that.
So, instead you need to consider how you might implement a snake segment before the system starts, and only turn it on when you need it. Then consider how to create as many as you'll ever need before the system starts.
For example a segment might need to know its colour and its X/Y coordinates and some other stuff, like, is it on/visible yet. How might you represent all that?
You might decide, having played the game and reached 50 segments, that 100 is enough to win the game.
Now, records and fixed size arrays are absolutely synthesisable.
That might get you started...

vhdl code (for loop)

Description:
I want to write vhdl code that finds the largest integer in the array A which is an array of 20 integers.
Question:
what should my algorithm look like, to input where the sequential statements are?
my vhdl code:
highnum: for i in 0 to 19 loop
i = 0;
i < 20;
i<= i + 1;
end loop highnum;
This does not need to be synthesizable, but I don't know how to form this for loop. A detailed example explaining how would be appreciated.

Simply translating the C loop to VHDL, inside a VHDL clocked process, will work AND be synthesisable. It will generate a LOT of hardware because it has to generate the output in a single clock cycle, but that doesn't matter if you are just simulating it.
If that is too much hardware, then you have to implement it as a state machine with at least two states, Idle and Calculating, so that it performs only one loop iteration per clock cycle while Calculating, and returns to the Idle state when done.

First of all, you should know how you have defined the array in VHDL.
Let me define an array for you.
type array_of_integer is array(19 downto 0) of integer;
signal A : array_of_integer :=(others => 0);
signal max : integer;
-- The above declares an array of integers in VHDL, all initialized to 0.
A(0) <= 1;
A(1) <= 2;
--
--
A(19)<= 19;
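-- Now the for loop for calculating the maximum.
-- A variable is used because a signal assigned inside a process
-- does not update until the process suspends.
process (A)
    variable v_max : integer;
begin
    v_max := A(0);
    for i in 1 to 19 loop
        if A(i) > v_max then
            v_max := A(i);
        end if;
    end loop;
    max <= v_max;
end process;
-- If you have problems understanding where to put each part of the code in a VHDL entity (process, ports, etc.), you can reply!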
