FIR filter output result Interpretation While input is taken from ADC161S626: 16 bit

FIR filter output result Interpretation While input is taken from ADC161S626: 16 bit - filter

I have implemented a 20 order FIR Low pass filter, 1000 HZ cutoff frequency on Sparten - 6 FPGA. First I calculate the coefficients in Matlab, Then I directly put them in vhdl code. Here are my coefficients.
H(0) <= to_signed(26,16); --10000HZ cutoff
H(1) <= to_signed(67,16);
H(2) <= to_signed(169,16);
H(3) <= to_signed(369,16);
H(4) <= to_signed(686,16);
H(5) <= to_signed(1111,16);
H(6) <= to_signed(1606,16);
H(7) <= to_signed(2108,16);
H(8) <= to_signed(2542,16);
H(9) <= to_signed(2836,16);
H(10) <= to_signed(2940,16);
H(11) <= to_signed(2836,16);
H(12) <= to_signed(2542,16);
H(13) <= to_signed(2108,16);
H(14) <= to_signed(1606,16);
H(15) <= to_signed(1111,16);
H(16) <= to_signed(686,16);
H(17) <= to_signed(369,16);
H(18) <= to_signed(169,16);
H(19) <= to_signed(67,16);
H(20) <= to_signed(26,16);
Now Actually I am taking an torque feedback Input from some sensor, The torque is converted in voltage. The voltage is digitize into 16 bit format.The filtered value, output of the filter is in 32 bit format.
Here is the snapshot of Teraterm.
How do I convert the these values to voltages ?
Thanks.

Related

VHDL - looping through an array

I wanted to loop through an array elements and later output them.
I don't think that the for loop is correct.
Could someone help with this, please ?
enter image description here

In VHDL, a for loop is a shorthand notation for creating parallel paths of logic.
In your example, the loop becomes a shift-by-1 index mapping assignment:
r_array(1) <= myArray1(0);
r_array(2) <= myArray1(1);
r_array(3) <= myArray1(2);
r_array(4) <= myArray1(3);
r_array(5) <= myArray1(4);
r_array(6) <= myArray1(5);
r_array(7) <= myArray1(6);
Since r_array is scalar type integer type and not a vector, this won't work as is.
Try using r_array as an index counter, then on each rising edge clock the next index of myArray will be assigned and our array index counter will increment by 1 (or wrap over back 0).
if rising_edge(i_CE1Hz) then
if r_array <= 7 then
r_array <= r_array + 1;
else
r_array <= 0;
end if;
o_element <= myArray(r_array); -- new output signal
end if;

How is this calculating CRC-A using polynomial - x^16 + x^12 + x^5 + 1

I came across this piece of code. But I am not sure how is CRC calculated here. I know the theoretical way of calculating CRC but I guess here it is using some kind of different logic, maybe. Please correct me.
r_o[14:13] <= r_o[13:12];
r_o[12] <= r_o[11]^r_o[15]^x16;
r_o[11] <= r_o[10];
r_o[10:6] <= r_o[9:5];
r_o[5] <= r_o[4]^r_o[15]^x16;
r_o[4] <= r_o[3];
r_o[3:1] <= r_o[2:0];
r_o[0] <= r_o[15]^x16;

Well, if you know about the theoretical way to do we can start with
Assuming u is the input bit, and that every iteration you calculate r = r * x + u*x^16, remember that in the given ring x^16 = x^12 + x^5 + 1, a direct implementation of this would be
parameter X16_MASK = 16'h1021; // (1 << 12) ^ (1 << 5) ^ 1
function [15:0] step(input [15:0] prev_state, input u);
logic [15:0] state;
begin
state = {prev_state[14:0], 1'b0};
if(state[15]) state = state ^ X16_MASK;
if(u) state = state ^ X16_MASK;
step = state;
end
endfunction
This could simply be written as {r_o[14:0], 1'b0} ^ (X16_MASK * (r_o[15] ^ u)) and let the synthesis to optimize whatever is necessary, it should be able to simplify the multiplication by a 1-bit signal. Now check the positions where the mask has an effect you will get to assignments above.
I would simplify
r_o[11] <= r_o[10];
r_o[10:6] <= r_o[9:5];
to r_o[11:6] = r_o[10:5]
and
r_o[4] <= r_o[3];
r_o[3:1] <= r_o[2:0];
to r_o[4:1] = r_o[3:0]
In the code you presented I am missing the assignment to r_o[15].
So you could say
r_o[15:13] <= r_o[14:12];
r_o[12] <= r_o[11]^r_o[15]^x16;
r_o[11:6] <= r_o[10:5];
r_o[5] <= r_o[4]^r_o[15]^x16;
r_o[4:1] <= r_o[3:0];
r_o[0] <= r_o[15]^x16;
And if you like the one linear bit packing
r_o <= {r_o[14:12], r_o[11]^r_o[15]^x16, r_o[10:5], r_o[4]^r_o[15]^x16,r_o[3:0], r_o[15]^x16}

modified baugh-wooley algorithm multiply verilog code does not multiply correctly

The following verilog source code and/or testbench works nicely across commercial simulators, iverilog as well as formal verification tool (yosys-smtbmc)
Please keep the complaint about `ifdef FORMAL until later. I need them to use with yosys-smtbmc which does not support bind command yet.
I am now debugging the generate coding since the multiplication (using modified baugh-wooley algorithm) does not work yet.
When o_valid is asserted, the multiply code should give o_p = i_a * i_b = 3*2 = 6 but the waveform clearly shows the code gives o_p = 0x20 = 32
test_multiply.v
// Testbench
module test_multiply;
parameter A_WIDTH=4, B_WIDTH=4;
reg i_clk;
reg i_reset;
reg i_ce;
reg signed[(A_WIDTH-1):0] i_a;
reg signed[(B_WIDTH-1):0] i_b;
wire signed[(A_WIDTH+B_WIDTH-1):0] o_p;
wire o_valid;
// Instantiate design under test
multiply #(A_WIDTH, B_WIDTH) mul(.clk(i_clk), .reset(i_reset), .in_valid(i_ce), .in_A(i_a), .in_B(i_b), .out_valid(o_valid), .out_C(o_p));
initial begin
// Dump waves
$dumpfile("test_multiply.vcd");
$dumpvars(0, test_multiply);
i_clk = 0;
i_reset = 0;
i_ce = 0;
i_a = 0;
i_b = 0;
end
localparam SMALLER_WIDTH = (A_WIDTH <= B_WIDTH) ? A_WIDTH : B_WIDTH;
localparam NUM_OF_INTERMEDIATE_LAYERS = $clog2(SMALLER_WIDTH);
genvar i, j; // array index
generate
for(i = 0; i < NUM_OF_INTERMEDIATE_LAYERS; i = i + 1) begin
for(j = 0; j < SMALLER_WIDTH; j = j + 1) begin
initial $dumpvars(0, test_multiply.mul.middle_layers[i][j]);
end
end
endgenerate
always #5 i_clk = !i_clk;
initial begin
#(posedge i_clk);
#(posedge i_clk);
$display("Reset flop.");
i_reset = 1;
#(posedge i_clk);
#(posedge i_clk);
i_reset = 0;
#(posedge i_clk);
#(posedge i_clk);
i_ce = 1;
i_a = 3;
i_b = 2;
#50 $finish;
end
endmodule
multiply.v
module multiply #(parameter A_WIDTH=16, B_WIDTH=16)
(clk, reset, in_valid, out_valid, in_A, in_B, out_C); // C=A*B
`ifdef FORMAL
parameter A_WIDTH = 4;
parameter B_WIDTH = 4;
`endif
input clk, reset;
input in_valid; // to signify that in_A, in_B are valid, multiplication process can start
input signed [(A_WIDTH-1):0] in_A;
input signed [(B_WIDTH-1):0] in_B;
output signed [(A_WIDTH+B_WIDTH-1):0] out_C;
output reg out_valid; // to signify that out_C is valid, multiplication finished
/*
This signed multiplier code architecture is a combination of row adder tree and
modified baugh-wooley algorithm, thus requires an area of O(N*M*logN) and time O(logN)
with M, N being the length(bitwidth) of the multiplicand and multiplier respectively
see [url]https://i.imgur.com/NaqjC6G.png[/url] or
Row Adder Tree Multipliers in [url]http://www.andraka.com/multipli.php[/url] or
[url]https://pdfs.semanticscholar.org/415c/d98dafb5c9cb358c94189927e1f3216b7494.pdf#page=10[/url]
regarding the mechanisms within all layers
In terms of fmax consideration: In the case of an adder tree, the adders making up the levels
closer to the input take up real estate (remember the structure of row adder tree). As the
size of the input multiplicand bitwidth grows, it becomes more and more difficult to find a
placement that does not use long routes involving multiple switch nodes for FPGA. The result
is the maximum clocking speed degrades quickly as the size of the bitwidth grows.
For signed multiplication, see also modified baugh-wooley algorithm for trick in skipping
sign extension (implemented as verilog example in [url]https://www.dsprelated.com/showarticle/555.php[/url]),
thus smaller final routed silicon area.
[url]https://stackoverflow.com/questions/54268192/understanding-modified-baugh-wooley-multiplication-algorithm/[/url]
All layers are pipelined, so throughput = one result for each clock cycle
but each multiplication result still have latency = NUM_OF_INTERMEDIATE_LAYERS
*/
// The multiplication of two numbers is equivalent to adding as many copies of one
// of them, the multiplicand, as the value of the other one, the multiplier.
// Therefore, multiplicand always have the larger width compared to multipliers
localparam SMALLER_WIDTH = (A_WIDTH <= B_WIDTH) ? A_WIDTH : B_WIDTH;
localparam LARGER_WIDTH = (A_WIDTH > B_WIDTH) ? A_WIDTH : B_WIDTH;
wire [(LARGER_WIDTH-1):0] MULTIPLICAND = (A_WIDTH > B_WIDTH) ? in_A : in_B ;
wire [(SMALLER_WIDTH-1):0] MULTIPLIPLIER = (A_WIDTH <= B_WIDTH) ? in_A : in_B ;
`ifdef FORMAL
// to keep the values of multiplicand and multiplier before the multiplication finishes
reg [(LARGER_WIDTH-1):0] MULTIPLICAND_reg;
reg [(SMALLER_WIDTH-1):0] MULTIPLIPLIER_reg;
always #(posedge clk)
begin
if(reset) begin
MULTIPLICAND_reg <= 0;
MULTIPLIPLIER_reg <= 0;
end
else if(in_valid) begin
MULTIPLICAND_reg <= MULTIPLICAND;
MULTIPLIPLIER_reg <= MULTIPLIPLIER;
end
end
`endif
localparam NUM_OF_INTERMEDIATE_LAYERS = $clog2(SMALLER_WIDTH);
/*Binary multiplications and additions for partial products rows*/
// first layer has "SMALLER_WIDTH" entries of data of width "LARGER_WIDTH"
// This resulted in a binary tree with faster vertical addition processes as we have
// lesser (NUM_OF_INTERMEDIATE_LAYERS) rows to add
// intermediate partial product rows additions
// Imagine a rhombus of height of "SMALLER_WIDTH" and width of "LARGER_WIDTH"
// being re-arranged into binary row adder tree
// such that additions can be done in O(logN) time
//reg [(NUM_OF_INTERMEDIATE_LAYERS-1):0][(SMALLER_WIDTH-1):0][(A_WIDTH+B_WIDTH-1):0] middle_layers;
reg [(A_WIDTH+B_WIDTH-1):0] middle_layers[(NUM_OF_INTERMEDIATE_LAYERS-1):0][0:(SMALLER_WIDTH-1)];
//reg [(NUM_OF_INTERMEDIATE_LAYERS-1):0] middle_layers [0:(SMALLER_WIDTH-1)] [(A_WIDTH+B_WIDTH-1):0];
//reg middle_layers [(NUM_OF_INTERMEDIATE_LAYERS-1):0][0:(SMALLER_WIDTH-1)][(A_WIDTH+B_WIDTH-1):0];
generate // duplicates the leafs of the binary tree
genvar layer; // layer 0 means the youngest leaf, layer N means the tree trunk
for(layer=0; layer<NUM_OF_INTERMEDIATE_LAYERS; layer=layer+1) begin: intermediate_layers
integer pp_index; // leaf index within each layer of the tree
integer bit_index; // index of binary string within each leaf
always #(posedge clk)
begin
if(reset)
begin
for(pp_index=0; pp_index<SMALLER_WIDTH ; pp_index=pp_index+1)
middle_layers[layer][pp_index] <= 0;
end
else begin
if(layer == 0) // all partial products rows are in first layer
begin
// generation of partial products rows
for(pp_index=0; pp_index<SMALLER_WIDTH ; pp_index=pp_index+1)
middle_layers[layer][pp_index] <=
(MULTIPLICAND & MULTIPLIPLIER[pp_index]);
// see modified baugh-wooley algorithm: [url]https://i.imgur.com/VcgbY4g.png[/url] from
// page 122 of book "Ultra-Low-Voltage Design of Energy-Efficient Digital Circuits"
for(pp_index=0; pp_index<SMALLER_WIDTH ; pp_index=pp_index+1)
middle_layers[layer][pp_index][LARGER_WIDTH-1] <=
!middle_layers[layer][pp_index][LARGER_WIDTH-1];
middle_layers[layer][SMALLER_WIDTH-1] <= !middle_layers[layer][SMALLER_WIDTH-1];
middle_layers[layer][0][LARGER_WIDTH] <= 1;
middle_layers[layer][SMALLER_WIDTH-1][LARGER_WIDTH] <= 1;
end
// adding the partial product rows according to row adder tree architecture
else begin
for(pp_index=0; pp_index<(SMALLER_WIDTH >> layer) ; pp_index=pp_index+1)
middle_layers[layer][pp_index] <=
middle_layers[layer-1][pp_index<<1] +
(middle_layers[layer-1][(pp_index<<1) + 1]) << 1;
// bit-level additions using full adders
/*for(pp_index=0; pp_index<SMALLER_WIDTH ; pp_index=pp_index+1)
for(bit_index=0; bit_index<(LARGER_WIDTH+layer); bit_index=bit_index+1)
full_adder fa(.clk(clk), .reset(reset), .ain(), .bin(), .cin(), .sum(), .cout());*/
end
end
end
end
endgenerate
assign out_C = (reset)? 0 : middle_layers[NUM_OF_INTERMEDIATE_LAYERS-1][0];
/*Checking if the final multiplication result is ready or not*/
reg [($clog2(NUM_OF_INTERMEDIATE_LAYERS)-1):0] out_valid_counter; // to track the multiply stages
reg multiply_had_started;
always #(posedge clk)
begin
if(reset)
begin
multiply_had_started <= 0;
out_valid <= 0;
out_valid_counter <= 0;
end
else if(out_valid_counter == NUM_OF_INTERMEDIATE_LAYERS-1) begin
multiply_had_started <= 0;
out_valid <= 1;
out_valid_counter <= 0;
end
else if(in_valid && !multiply_had_started) begin
multiply_had_started <= 1;
out_valid <= 0; // for consecutive multiplication
end
else begin
out_valid <= 0;
if(multiply_had_started) out_valid_counter <= out_valid_counter + 1;
end
end
`ifdef FORMAL
initial assume(reset);
initial assume(in_valid == 0);
wire sign_bit = MULTIPLICAND_reg[LARGER_WIDTH-1] ^ MULTIPLIPLIER_reg[SMALLER_WIDTH-1];
always #(posedge clk)
begin
if(reset) assert(out_C == 0);
else if(out_valid) begin
assert(out_C == (MULTIPLICAND_reg * MULTIPLIPLIER_reg));
assert(out_C[A_WIDTH+B_WIDTH-1] == sign_bit);
end
end
`endif
`ifdef FORMAL
localparam user_A = 3;
localparam user_B = 2;
always #(posedge clk)
begin
cover(in_valid && (in_A == user_A) && (in_B == user_B));
cover(out_valid);
end
`endif
endmodule

Problem solved: The following code now gives correct signed multiplication result both in vivado simulation and cover() within formal verification.
See multiply.v and the corresponding correct waveform
module multiply #(parameter A_WIDTH=16, B_WIDTH=16)
(clk, reset, in_valid, out_valid, in_A, in_B, out_C); // C=A*B
`ifdef FORMAL
parameter A_WIDTH = 4;
parameter B_WIDTH = 6;
`endif
input clk, reset;
input in_valid; // to signify that in_A, in_B are valid, multiplication process can start
input signed [(A_WIDTH-1):0] in_A;
input signed [(B_WIDTH-1):0] in_B;
output signed [(A_WIDTH+B_WIDTH-1):0] out_C;
output reg out_valid; // to signify that out_C is valid, multiplication finished
/*
This signed multiplier code architecture is a combination of row adder tree and
modified baugh-wooley algorithm, thus requires an area of O(N*M*logN) and time O(logN)
with M, N being the length(bitwidth) of the multiplicand and multiplier respectively
see https://i.imgur.com/NaqjC6G.png or
Row Adder Tree Multipliers in http://www.andraka.com/multipli.php or
https://pdfs.semanticscholar.org/415c/d98dafb5c9cb358c94189927e1f3216b7494.pdf#page=10
regarding the mechanisms within all layers
In terms of fmax consideration: In the case of an adder tree, the adders making up the levels
closer to the input take up real estate (remember the structure of row adder tree). As the
size of the input multiplicand bitwidth grows, it becomes more and more difficult to find a
placement that does not use long routes involving multiple switch nodes for FPGA. The result
is the maximum clocking speed degrades quickly as the size of the bitwidth grows.
For signed multiplication, see also modified baugh-wooley algorithm for trick in skipping
sign extension (implemented as verilog example in https://www.dsprelated.com/showarticle/555.php),
thus smaller final routed silicon area.
https://stackoverflow.com/questions/54268192/understanding-modified-baugh-wooley-multiplication-algorithm/
All layers are pipelined, so throughput = one result for each clock cycle
but each multiplication result still have latency = NUM_OF_INTERMEDIATE_LAYERS
*/
// The multiplication of two numbers is equivalent to adding as many copies of one
// of them, the multiplicand, as the value of the other one, the multiplier.
// Therefore, multiplicand always have the larger width compared to multipliers
localparam SMALLER_WIDTH = (A_WIDTH <= B_WIDTH) ? A_WIDTH : B_WIDTH;
localparam LARGER_WIDTH = (A_WIDTH > B_WIDTH) ? A_WIDTH : B_WIDTH;
wire [(LARGER_WIDTH-1):0] MULTIPLICAND = (A_WIDTH > B_WIDTH) ? in_A : in_B ;
wire [(SMALLER_WIDTH-1):0] MULTIPLIER = (A_WIDTH <= B_WIDTH) ? in_A : in_B ;
// to keep the values of multiplicand and multiplier before the multiplication finishes
reg signed [(LARGER_WIDTH-1):0] MULTIPLICAND_reg;
reg signed [(SMALLER_WIDTH-1):0] MULTIPLIER_reg;
always #(posedge clk)
begin
if(reset) begin
MULTIPLICAND_reg <= 0;
MULTIPLIER_reg <= 0;
end
else if(in_valid) begin
MULTIPLICAND_reg <= MULTIPLICAND;
MULTIPLIER_reg <= MULTIPLIER;
end
end
localparam NUM_OF_INTERMEDIATE_LAYERS = $clog2(SMALLER_WIDTH);
/*Binary multiplications and additions for partial products rows*/
// first layer has "SMALLER_WIDTH" entries of data of width "LARGER_WIDTH"
// This resulted in a binary tree with faster vertical addition processes as we have
// lesser (NUM_OF_INTERMEDIATE_LAYERS) rows to add
// intermediate partial product rows additions
// Imagine a rhombus of height of "SMALLER_WIDTH" and width of "LARGER_WIDTH"
// being re-arranged into binary row adder tree
// such that additions can be done in O(logN) time
//reg [(NUM_OF_INTERMEDIATE_LAYERS-1):0][(SMALLER_WIDTH-1):0][(A_WIDTH+B_WIDTH-1):0] middle_layers;
reg signed [(A_WIDTH+B_WIDTH-1):0] middle_layers[NUM_OF_INTERMEDIATE_LAYERS:0][0:(SMALLER_WIDTH-1)];
//reg [(NUM_OF_INTERMEDIATE_LAYERS-1):0] middle_layers [0:(SMALLER_WIDTH-1)] [(A_WIDTH+B_WIDTH-1):0];
//reg middle_layers [(NUM_OF_INTERMEDIATE_LAYERS-1):0][0:(SMALLER_WIDTH-1)][(A_WIDTH+B_WIDTH-1):0];
generate // duplicates the leafs of the binary tree
genvar layer; // layer 0 means the youngest leaf, layer N means the tree trunk
for(layer=0; layer<=NUM_OF_INTERMEDIATE_LAYERS; layer=layer+1) begin: intermediate_layers
integer pp_index; // leaf index within each layer of the tree
always #(posedge clk)
begin
if(reset)
begin
for(pp_index=0; pp_index<SMALLER_WIDTH ; pp_index=pp_index+1)
middle_layers[layer][pp_index] <= 0;
end
else begin
if(layer == 0) // all partial products rows are in first layer
begin
// generation of partial products rows
for(pp_index=0; pp_index<SMALLER_WIDTH ; pp_index=pp_index+1)
middle_layers[layer][pp_index] <= MULTIPLIER[pp_index] ? MULTIPLICAND:0;
// see modified baugh-wooley algorithm: https://i.imgur.com/VcgbY4g.png from
// page 122 of book: Ultra-Low-Voltage Design of Energy-Efficient Digital Circuits
for(pp_index=0; pp_index<(SMALLER_WIDTH-1) ; pp_index=pp_index+1) // MSB inversion
middle_layers[layer][pp_index][LARGER_WIDTH-1] <=
(MULTIPLICAND[LARGER_WIDTH-1] & MULTIPLIER[pp_index]) ? 0:1;
for(pp_index=(LARGER_WIDTH-SMALLER_WIDTH); pp_index<(LARGER_WIDTH-1) ; pp_index=pp_index+1) // last partial product row inversion
//the starting index is to consider the condition where A_WIDTH != B_WIDTH
middle_layers[layer][SMALLER_WIDTH-1][pp_index] <=
(MULTIPLICAND[pp_index] & MULTIPLIER[SMALLER_WIDTH-1]) ? 0:1;
middle_layers[layer][0][LARGER_WIDTH] <= 1;
middle_layers[layer][SMALLER_WIDTH-1][LARGER_WIDTH] <= 1;
end
// adding the partial product rows according to row adder tree architecture
else begin
for(pp_index=0; pp_index<(SMALLER_WIDTH >> layer) ; pp_index=pp_index+1)
begin
if(pp_index==0)
middle_layers[layer][pp_index] <=
middle_layers[layer-1][0] +
(middle_layers[layer-1][1] << layer);
else middle_layers[layer][pp_index] <=
middle_layers[layer-1][pp_index<<1] +
(middle_layers[layer-1][(pp_index<<1) + 1] << layer);
end
end
end
end
end
endgenerate
assign out_C = (reset)? 0 : mul_result;
// both A and B are of negative numbers
wire both_negative = MULTIPLICAND_reg[LARGER_WIDTH-1] & MULTIPLIER_reg[SMALLER_WIDTH-1];
/*
the following is to deal with the shortcomings of the published modified baugh-wooley algorithm
which does not handle the case where A_WIDTH != B_WIDTH
The countermeasure does not do "To build a 6x4 multplier you can build a 6x6 multiplier, but replicate
the sign bit of the short word 3 times, and ignore the top 2 bits of the result." , instead it uses
some smart tricks/logic described by the signal 'modify_result'. The signal 'modify_result' is not
asserted when one number is positive, and another is negative.
Please use pencil and paper method (and signals waveform) to verify or understand this.
I did not do a rigorous math proof on this countermeasure.
Instead I modify the "modified baugh-wooley algorithm" by debugging wrong multiplication results from
formal verification cover(in_valid && (in_A == A_value) && (in_B == B_value)); waveforms
together with manual handwritten multiplication on paper.
The countermeasure is considered successful when assert(out_C == (MULTIPLICAND_reg * MULTIPLIER_reg));
passed during cover() verification
Besides, the last partial product row inversion mechanism is also modified to handle this shortcoming
*/
wire modify_result = (A_WIDTH == B_WIDTH) || ((A_WIDTH != B_WIDTH) && both_negative);
wire signed [(A_WIDTH+B_WIDTH-1):0] mul_result;
assign mul_result = (modify_result) ?
middle_layers[NUM_OF_INTERMEDIATE_LAYERS][0] :
{{(LARGER_WIDTH-SMALLER_WIDTH){sign_bit}} ,
middle_layers[NUM_OF_INTERMEDIATE_LAYERS][0][LARGER_WIDTH +: SMALLER_WIDTH] ,
middle_layers[NUM_OF_INTERMEDIATE_LAYERS][0][0 +: SMALLER_WIDTH]} ;
/*Checking if the final multiplication result is ready or not*/
reg [($clog2(NUM_OF_INTERMEDIATE_LAYERS)-1):0] out_valid_counter; // to track the multiply stages
reg multiply_had_started;
always #(posedge clk)
begin
if(reset)
begin
multiply_had_started <= 0;
out_valid <= 0;
out_valid_counter <= 0;
end
else if(out_valid_counter == NUM_OF_INTERMEDIATE_LAYERS-1) begin
multiply_had_started <= 0;
out_valid <= 1;
out_valid_counter <= 0;
end
else if(in_valid && !multiply_had_started) begin
multiply_had_started <= 1;
out_valid <= 0; // for consecutive multiplication
end
else begin
out_valid <= 0;
if(multiply_had_started) out_valid_counter <= out_valid_counter + 1;
end
end
wire sign_bit = MULTIPLICAND_reg[LARGER_WIDTH-1] ^ MULTIPLIER_reg[SMALLER_WIDTH-1];
`ifdef FORMAL
initial assume(reset);
initial assume(in_valid == 0);
always #(posedge clk)
begin
if(reset) assert(out_C == 0);
else if(out_valid) begin
assert(out_C == (MULTIPLICAND_reg * MULTIPLIER_reg));
assert(out_C[A_WIDTH+B_WIDTH-1] == sign_bit);
end
end
`endif
`ifdef FORMAL
wire signed [(A_WIDTH-1):0] A_value = $anyconst;
wire signed [(B_WIDTH-1):0] B_value = $anyconst;
always #(posedge clk)
begin
assume(A_value != 0);
assume(B_value != 0);
cover(in_valid && (in_A == A_value) && (in_B == B_value));
cover(out_valid);
end
`endif
endmodule

Vhdl Snake - how to automate tail implementation

I've been currently implementing the snake game in vhdl for Spartan3e.
I have already written a part that draws a cell square on VGA screen and makes it possible to move it around the square.
The problem is with tail implementation - as far I have manually added another cell segment to my snake but I would like to automate it (as for example in java simply making the queue with the cells and setting the positiong of the next cell as the cell before). I do not know how to write such a complex function in vhdl.
Here is my code:
begin
process (clk, reset, endOfGame)
begin
if reset='1' or endOfGame=true then
ball_y_reg <= to_unsigned(231,10);
ball_x_reg <= to_unsigned(311,10);
ball_x_reg_cell<=to_unsigned(231,10);
ball_y_reg_cell<=to_unsigned(311,10);
-- velocity after reset schould be none
x_delta_reg <= ("0000000000");
y_delta_reg <= ("0000000000");
elsif (clk'event and clk='1') then
ball_x_reg_cell<=ball_x_next_cell;
ball_y_reg_cell<=ball_y_next_cell;
ball_x_reg <= ball_x_next;
ball_y_reg <= ball_y_next;
x_delta_reg <= x_delta_next;
y_delta_reg <= y_delta_next;
end if;
end process;
pix_x <= unsigned(pixel_x);
pix_y <= unsigned(pixel_y);
-- refr_tick: 1-clock tick asserted at start of v-sync
-- i.e., when the screen is refreshed (60 Hz)
refr_tick <= '1' when (pix_y=481) and (pix_x=0) else
'0';
----------------------------------------------
-- pixel within wall
wall_on <=
'1' when ((WALL_X_LEFTSIDE_L<=pix_x) and (pix_x<=WALL_X_LEFTSIDE_R)) or ((WALL_X_RIGHTSIDE_L<=pix_x) and (pix_x<=WALL_X_RIGHTSIDE_R)) or ((WALL_Y_UPSIDE_U<=pix_y) and (pix_y<=WALL_Y_UPSIDE_D)) or ((WALL_Y_DOWNSIDE_U<=pix_y) and (pix_y<=WALL_Y_DOWNSIDE_D)) else
'0';
-- wall rgb output
wall_rgb <= "001"; -- blue
----------------------------------------------
-- square ball
ball_x_l <= ball_x_reg;
ball_y_t <= ball_y_reg;
ball_x_r <= ball_x_l + BALL_SIZE - 1;
ball_y_b <= ball_y_t + BALL_SIZE - 1;
ball_x_l_cell <= ball_x_reg_cell;
ball_y_t_cell <= ball_y_reg_cell;
ball_x_r_cell <= ball_x_l_cell + BALL_SIZE - 1;
ball_y_b_cell <= ball_y_t_cell + BALL_SIZE - 1;
--tail
-- pixel within squared ball
sq_ball_on <=
'1' when ((ball_x_l<=pix_x) and (pix_x<=ball_x_r) and
(ball_y_t<=pix_y) and (pix_y<=ball_y_b))
or
((ball_x_l_cell<=pix_x) and (pix_x<=ball_x_r_cell) and
(ball_y_t_cell<=pix_y) and (pix_y<=ball_y_b_cell))
else
'0';
ball_x_next <= ball_x_reg + x_delta_reg
when refr_tick='1' else
ball_x_reg ;
ball_y_next <= ball_y_reg + y_delta_reg
when refr_tick='1' else
ball_y_reg ;
ball_x_next_cell <= ball_x_reg - BALL_SIZE when refr_tick='1' and CURRENT_DIRECTION = DIR_RIGHT
else ball_x_reg + BALL_SIZE when refr_tick='1' and CURRENT_DIRECTION = DIR_LEFT
else ball_x_reg when refr_tick='1'
else ball_x_reg_cell;
ball_y_next_cell <= ball_y_reg - BALL_SIZE when refr_tick='1' and CURRENT_DIRECTION = DIR_UP
else ball_y_reg + BALL_SIZE when refr_tick='1' and CURRENT_DIRECTION = DIR_DOWN
else ball_y_reg when refr_tick='1'
else ball_y_reg_cell;
-- new bar y-position
process(ball_y_reg, ball_y_b, ball_y_t, refr_tick, btn, ball_x_reg ,ball_x_r, ball_x_l, x_delta_reg, y_delta_reg)
begin
x_delta_next <= x_delta_reg;
y_delta_next <= y_delta_reg;
if refr_tick='1' then
if btn(1)='1' and ball_y_b<(MAX_Y-1-BALL_SIZE) then
if CURRENT_DIRECTION /= DIR_UP then
CURRENT_DIRECTION <= DIR_DOWN;
y_delta_next <= BALL_V_P; -- move down
x_delta_next <= (others=>'0');
end if;
elsif btn(0)='1' and ball_y_t > BALL_SIZE then
if CURRENT_DIRECTION /= DIR_DOWN then
CURRENT_DIRECTION <= DIR_UP;
y_delta_next <= BALL_V_N; -- move up
x_delta_next <= (others=>'0');
end if;
elsif btn(2)='1' and ball_x_r<(MAX_X-1-BALL_SIZE) then
if CURRENT_DIRECTION /= DIR_LEFT then
CURRENT_DIRECTION <= DIR_RIGHT;
x_delta_next <= BALL_V_P;
y_delta_next <= (others=>'0');
end if;
elsif btn(3)='1' and ball_x_l > BALL_SIZE then
if CURRENT_DIRECTION /= DIR_RIGHT then
CURRENT_DIRECTION <= DIR_LEFT;
x_delta_next <= BALL_V_N;
y_delta_next <= (others=>'0');
end if;
end if;
if ball_x_l < WALL_X_LEFTSIDE_R or ball_y_t < WALL_Y_UPSIDE_D or ball_y_b > WALL_Y_DOWNSIDE_U or ball_x_r > WALL_X_RIGHTSIDE_L then
endOfGame <= true;
CURRENT_DIRECTION <= IDLE;
else
endOfGame <= false;
end if;
end if;
end process;
"Ball x next cell " parts is manually added second cell.
I have been searching through topics containing similiar problem but it is not covering it in vhdl.
Thanks for help!

The problem isn't VHDL - don't get lost in the language differences between VHDL and Java - these are trivial here.
The problem is synthesisability - you need a conceptual design that can be represented in hardware.
You say your Java implementation uses a queue - this will be based on a linked list, with nodes (segments) dynamically allocated, and referenced via pointers. And in fact you could straightforwardly translate that into VHDL, using access types, new and deallocate, and so on. You'd have to implement the details yourself, while there might be a convenient library, i.e. class, in Java. But that's mere detail.
Don't go down that road - access types and especially dynamic allocation aren't synthesisable - you can't normally generate and free chunks of hardware to a running system...
(But you might do that if you wanted to run an existing Snake in a simulator, in parallel with the synthesisable version, to compare their results and verify the synthesisable one matches the already proven software version. If you need a high-reliability Snake designed to military, aerospace or safety critical requirements, you'll need this step.)
You need a different mindset for hardware design, based on knowing what is physically realisable, and how to translate concepts into that.
So, instead you need to consider how you might implement a snake segment before the system starts, and only turn it on when you need it. Then consider how to create as many as you'll ever need before the system starts.
For example a segment might need to know its colour and its X/Y coordinates and some other stuff, like, is it on/visible yet. How might you represent all that?
You might decide, having played the game and reached 50 segments, that 100 is enough to win the game.
Now, records and fixed size arrays are absolutely synthesisable.
That might get you started...

Verilog FIR filter

Hello I am implementing an FIR filter in Verilog, using the DE2 board. For some reason the output out of the speakers is full of static, although it does appear to filter out some frequencies. Here is the code for the FIR:
// Local wires.
wire read_ready, write_ready, read, write;
wire [23:0] readdata_left, readdata_right;
wire [23:0] writedata_left, writedata_right;
assign writedata_left = output_sample;
assign writedata_right = output_sample;
assign read = 1;
assign write = 1;
wire [23:0] input_sample = readdata_left;
reg [23:0] output_sample;
the input sample is put through the FIR, and the output sample is put to both left and right speakers for simplicity.
//The FIR filter
parameter N = 40;
reg signed[23:0] coeffs[39:0];
reg [23:0] holderBefore[39:0];
wire [23:0] toAdd[39:0];
// -- 1000-1100
always #(*)
begin
coeffs[0]=24'b100000000110101001111110; // -- 1
coeffs[1]=24'b100000000110100011011011; // -- 2
coeffs[2]=24'b100000000111000100001100; // -- 3
coeffs[3]=24'b100000000111111000101000;// -- 4
coeffs[4]=24'b100000001000011111111100;// -- 5
coeffs[5]=24'b100000001000011001011001;// -- 6
coeffs[6]=24'b100000000111010001010011;// -- 7
coeffs[7]=24'b100000000100100110111010;// -- 8
coeffs[8]=24'b100000000000011010001101;// -- 9
coeffs[9]=24'b000000000101101111000000;// -- 10
coeffs[10]=24'b000000001101100001000100;// -- 11
coeffs[11]=24'b000000010110111100000000;// -- 12
coeffs[12]=24'b000000100001011111000001;// -- 13
coeffs[13]=24'b000000101100101001010111;// -- 14
coeffs[14]=24'b000000111000000000110100;// -- 15
coeffs[15]=24'b000001000010101010011001;// -- 16
coeffs[16]=24'b000001001100001011111000;// -- 17
coeffs[17]=24'b000001010011111101111100;// -- 18
coeffs[18]=24'b000001011001011001010010;// -- 19
coeffs[19]=24'b000001011100010000110010;// -- 20
coeffs[20]=24'b000001011100010000110010;// -- 20
coeffs[21]=24'b000001011001011001010010;// -- 19
coeffs[22]=24'b000001001100001011111000;// -- 18
coeffs[23]=24'b000001001100001011111000;// -- 17
coeffs[24]=24'b000001000010101010011001;// -- 16
coeffs[25]=24'b000000111000000000110100;// -- 15
coeffs[26]=24'b000000101100101001010111;// -- 14
coeffs[27]=24'b000000100001011111000001;// -- 13
coeffs[28]=24'b000000010110111100000000;// -- 12
coeffs[29]=24'b000000001101100001000100;// -- 11
coeffs[30]=24'b000000000101101111000000;// -- 10
coeffs[31]=24'b100000000000011010001101;// -- 9
coeffs[32]=24'b100000000100100110111010;// -- 8
coeffs[33]=24'b100000000111010001010011;// -- 7
coeffs[34]=24'b100000001000011001011001;// -- 6
coeffs[35]=24'b100000001000011111111100;// -- 5
coeffs[36]=24'b100000000111111000101000;// -- 4
coeffs[37]=24'b100000000111000100001100;// -- 3
coeffs[38]=24'b100000000110100011011011;// -- 2
coeffs[39]=24'b100000000110101001111110;// -- 1
end
genvar i;
generate
for (i=0; i<N; i=i+1)
begin: mult
multiplier mult1(
.dataa(coeffs[i]),
.datab(holderBefore[i]),
.out(toAdd[i]));
end
endgenerate
always #(posedge CLOCK_50 or posedge reset)
begin
if(reset)
begin
holderBefore[39] <= 0;
holderBefore[38] <= 0;
holderBefore[37] <= 0;
holderBefore[36] <= 0;
holderBefore[35] <= 0;
holderBefore[34] <= 0;
holderBefore[33] <= 0;
holderBefore[32] <= 0;
holderBefore[31] <= 0;
holderBefore[30] <= 0;
holderBefore[29] <= 0;
holderBefore[28] <= 0;
holderBefore[27] <= 0;
holderBefore[26] <= 0;
holderBefore[25] <= 0;
holderBefore[24] <= 0;
holderBefore[23] <= 0;
holderBefore[22] <= 0;
holderBefore[21] <= 0;
holderBefore[20] <= 0;
holderBefore[19] <= 0;
holderBefore[18] <= 0;
holderBefore[17] <= 0;
holderBefore[16] <= 0;
holderBefore[15] <= 0;
holderBefore[14] <= 0;
holderBefore[13] <= 0;
holderBefore[12] <= 0;
holderBefore[11] <= 0;
holderBefore[10] <= 0;
holderBefore[9] <= 0;
holderBefore[8] <= 0;
holderBefore[7] <= 0;
holderBefore[6] <= 0;
holderBefore[5] <= 0;
holderBefore[4] <= 0;
holderBefore[3] <= 0;
holderBefore[2] <= 0;
holderBefore[1] <= 0;
holderBefore[0] <= 0;
output_sample <= 0;
end
else
begin
holderBefore[39] <= holderBefore[38];
holderBefore[38] <= holderBefore[37];
holderBefore[37] <= holderBefore[36];
holderBefore[36] <= holderBefore[35];
holderBefore[35] <= holderBefore[34];
holderBefore[34] <= holderBefore[33];
holderBefore[33] <= holderBefore[32];
holderBefore[32] <= holderBefore[31];
holderBefore[31] <= holderBefore[30];
holderBefore[30] <= holderBefore[29];
holderBefore[29] <= holderBefore[28];
holderBefore[28] <= holderBefore[27];
holderBefore[27] <= holderBefore[26];
holderBefore[26] <= holderBefore[25];
holderBefore[25] <= holderBefore[24];
holderBefore[24] <= holderBefore[23];
holderBefore[23] <= holderBefore[22];
holderBefore[22] <= holderBefore[21];
holderBefore[21] <= holderBefore[20];
holderBefore[20] <= holderBefore[19];
holderBefore[19] <= holderBefore[18];
holderBefore[18] <= holderBefore[17];
holderBefore[17] <= holderBefore[16];
holderBefore[16] <= holderBefore[15];
holderBefore[15] <= holderBefore[14];
holderBefore[14] <= holderBefore[13];
holderBefore[13] <= holderBefore[12];
holderBefore[12] <= holderBefore[11];
holderBefore[11] <= holderBefore[10];
holderBefore[10] <= holderBefore[9];
holderBefore[9] <= holderBefore[8];
holderBefore[8] <= holderBefore[7];
holderBefore[7] <= holderBefore[6];
holderBefore[6] <= holderBefore[5];
holderBefore[5] <= holderBefore[4];
holderBefore[4] <= holderBefore[3];
holderBefore[3] <= holderBefore[2];
holderBefore[2] <= holderBefore[1];
holderBefore[1] <= holderBefore[0];
holderBefore[0] <= input_sample;
output_sample <= (input_sample + toAdd[0] + toAdd[1] +
toAdd[2] + toAdd[3] + toAdd[4] + toAdd[5] +
toAdd[6] + toAdd[7] + toAdd[8] + toAdd[9] +
toAdd[10] + toAdd[11] + toAdd[12]+ toAdd[13] + toAdd[14] +
toAdd[15] + toAdd[16] + toAdd[17] + toAdd[18] +
toAdd[19] + toAdd[20] + toAdd[21] + toAdd[22] +
toAdd[23] + toAdd[24] + toAdd[25] +toAdd[26] + toAdd[27] + toAdd[28] + toAdd[29] +
toAdd[19] + toAdd[20] + toAdd[21] + toAdd[22] +
toAdd[30] + toAdd[31] + toAdd[32]+ toAdd[33] + toAdd[34] + toAdd[35] + toAdd[36] +
toAdd[37] + toAdd[38] + toAdd[39]);
end
end
//The multiplier
module multiplier (dataa,datab,out);
input [23:0]dataa;
input [23:0]datab;
reg [47:0]result;
output[23:0]out;
always#(*)begin
result = dataa*datab;
end
assign out = result[46:24];
endmodule
Granted that the coefficients are correct, is there something wrong with the code? I assume there is a problem with the representation of the coefficients in binary, or the multiplier is wrong but I can't figure it out.

The multiplier is not performing signed multiplication.
Verilog defaults to unsigned, if any part of an equation is unsigned it will be come unsigned. If a bit selection is made (even if it is the full width) the arithmetic will be come unsigned.
The following code should perform a signed arithmetic.
module multiplier (
input signed [23:0] dataa,
input signed [23:0] datab,
output reg signed [23:0] out
);
reg signed [47:0] result;
always #* begin
result = dataa*datab;
out = result[46:24];
end
endmodule
Your not capturing the the MSB of result into out which would look like a gain error on unsigned or positive numbers, but might could loose the sign of negative numbers.
When you perform the sum into output_sample there is a possibility that the numbers overflow. for every addition you should add 1 bit of headroom, then limit. May be add some flags to record if it is overflowing/clipping at this stage.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio