I would like to implement a count-min sketch with minimal update and access times.
Basically, an input sample is hashed by multiple (d) hash functions, and each of them increments a counter in the bucket that it hits. When querying for a sample, the counters of all the buckets corresponding to that sample are compared, and the value of the smallest counter is returned as the result.
I am trying to find the minimum value of the counters in log_2(d) time with the following code:
entity main is
  Port ( rst : in STD_LOGIC;
         a_val : out STD_LOGIC_VECTOR(63 downto 0);
         b_val : out STD_LOGIC_VECTOR(63 downto 0);
         output : out STD_LOGIC_VECTOR(63 downto 0);
         . .
         . .
         . .
         CM_read_ready : out STD_LOGIC;
         clk : in STD_LOGIC);
end main;
architecture Behavioral of main is
  impure function min( LB, UB: in integer; sample: in STD_LOGIC_VECTOR(long_length downto 0)) return STD_LOGIC_VECTOR is
    variable left : STD_LOGIC_VECTOR(long_length downto 0) := (others=>'0');
    variable right : STD_LOGIC_VECTOR(long_length downto 0) := (others=>'0');
  begin
    if (LB < UB)
    then
      left := min(LB, ((LB + UB) / 2) - 1, sample);
      right := min(((LB + UB) / 2) - 1, UB, sample);
      if (to_integer(unsigned(left)) < to_integer(unsigned(right)))
      then
        return left;
      else
        return right;
      end if;
    elsif (LB = UB)
    then
      -- return the counter's value so that it can be compared further up in the stack.
      return CM(LB, (to_integer(unsigned(hasha(LB)))*to_integer(unsigned(sample))
                     + to_integer(unsigned(hashb(LB)))) mod width);
    end if;
  end min;
begin
  CM_hashes_read_log_time: process (clk, rst)
  begin
    if (to_integer(unsigned(instruction)) = 2)
    then
      output <= min(0, depth - 1, sample);
    end if;
    end if;
  end process;
end Behavioral;
When I run the above code, I get the following errors:
The simulator has terminated in an unexpected manner. Please review the simulation log (xsim.log) for details.
[USF-XSim-62] 'compile' step failed with error(s). Please check the Tcl console output or '/home/...sim/sim_1/behav/xsim/xvhdl.log' file for more information.
[USF-XSim-62] 'elaborate' step failed with error(s). Please check the Tcl console output or '/home/...sim/sim_1/synth/func/xsim/elaborate.log' file for more information.
I was not able to find any file called xsim.log, and xvhdl.log was empty, but elaborate.log had some content:
Vivado Simulator 2018.2
Copyright 1986-1999, 2001-2018 Xilinx, Inc. All Rights Reserved.
Running: /opt/Xilinx/Vivado/2018.2/bin/unwrapped/lnx64.o/xelab -wto c199c4c74e8c44ef826c0ba56222b7cf --incr --debug typical --relax --mt 8 -L xil_defaultlib -L secureip --snapshot main_tb_behav xil_defaultlib.main_tb -log elaborate.log
Using 8 slave threads.
Starting static elaboration
Completed static elaboration
INFO: [XSIM 43-4323] No Change in HDL. Linking previously generated obj files to create kernel
Removing the following line solves the above errors:
output <= min(0, depth - 1, sample);
My questions:
Why am I not able to simulate this code?
Will this code be synthesizable once it is working?
Is there a better (and/or faster) way to obtain the minimum of all relevant hash buckets?
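One likely cause of the crash, judging only from the code shown: the right half restarts at ((LB + UB) / 2) - 1 instead of ((LB + UB) / 2) + 1, so min(0, 3), for example, immediately calls min(0, 3) again and the recursion never terminates; in addition, when LB > UB the function reaches its end without executing a return statement. The sketch below splits the range into two non-overlapping halves (lb..mid and mid+1..ub), which gives a comparison tree of depth ceil(log2(d)). It assumes the d counters for the queried sample have already been read out of CM into an array; the package name cm_min_pkg, the function tree_min and the 64-bit counter width are hypothetical choices for illustration. Whether a synthesis tool unrolls the recursion is tool-dependent.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package cm_min_pkg is
  constant COUNTER_BITS : positive := 64;  -- assumed counter width
  subtype t_counter is unsigned(COUNTER_BITS - 1 downto 0);
  type t_counter_array is array (natural range <>) of t_counter;

  -- Minimum of counters(lb .. ub); precondition: lb <= ub.
  function tree_min(counters : t_counter_array; lb, ub : natural) return t_counter;
end package;

package body cm_min_pkg is
  function tree_min(counters : t_counter_array; lb, ub : natural) return t_counter is
    constant mid : natural := (lb + ub) / 2;
    variable lo_min, hi_min : t_counter;
  begin
    if lb = ub then
      return counters(lb);                        -- leaf: a single counter
    end if;
    lo_min := tree_min(counters, lb, mid);        -- lower half: lb .. mid
    hi_min := tree_min(counters, mid + 1, ub);    -- upper half: mid+1 .. ub
    if lo_min < hi_min then
      return lo_min;
    else
      return hi_min;
    end if;
  end function;
end package body;
```

Both halves are strictly shorter than lb..ub and never empty, so every call ends at a leaf after at most ceil(log2(d)) levels.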
Not that I was able to find any real-world use for recursion, but just to surprise @EML (as requested in the comments above): you actually can define recursive hardware structures in VHDL.
In Quartus at least, this only works if you give the compiler a clear indication of the maximum recursion depth; otherwise it will try to unroll the recursion for every possible input, eventually dying from a stack overflow:
entity recursive is
  generic
  (
    MAX_RECURSION_DEPTH : natural
  );
  port
  (
    clk : in std_ulogic;
    n : in natural;
    o : out natural
  );
end recursive;
architecture Behavioral of recursive is
  function fib(max_depth : natural; n : natural) return natural is
    variable res : natural;
  begin
    if max_depth <= 1 then
      res := 0;
      return res;
    end if;
    if n = 0 then
      res := 0;
    elsif n = 1 or n = 2 then
      res := 1;
    else
      res := fib(max_depth - 1, n - 1) + fib(max_depth - 1, n - 2);
    end if;
    return res;
  end function fib;
begin
  p_calc : process
  begin
    wait until rising_edge(clk);
    o <= fib(MAX_RECURSION_DEPTH, n);
  end process;
end Behavioral;
With a MAX_RECURSION_DEPTH of 6, this generates one single combinational circuit with more than 500 LEs (so the practical use is probably very limited), but at least it works.
Is recursion possible in VHDL?
I would say yes, but not recursion as we know it. That's the short answer. I have code that implements Quicksort (if anyone is interested), and it synthesizes quite happily. Quicksort would normally be nowhere near a synthesis context, but I managed to do it.
The trick (which is vexatious and hard to follow) is to emulate recursion with a strange state machine that backtracks to the initial state after pushing a "state" onto a (hardware) stack. You can synthesize this sort of data structure quite easily if you want.
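The Quicksort code itself is not shown, so here is a much smaller sketch of the same trick under stated assumptions: factorial written recursively as fact(n) = n * fact(n - 1), emulated by an FSM that pushes one "frame" per call onto a register-array stack (the CALL state) and then pops and multiplies on the way back out (the RET state). The entity, its ports and its states are hypothetical; STACK_DEPTH bounds the recursion depth, and with 32-bit integers it should stay at 12 or below so that the product does not overflow.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch: fact(n) = n * fact(n - 1), with the recursion emulated by an
-- explicit stack. CALL pushes one frame per "call", RET pops them again.
entity stack_factorial is
  generic (STACK_DEPTH : positive := 8);  -- keep <= 12 for 32-bit integers
  port (
    clk    : in  std_ulogic;
    rst    : in  std_ulogic;
    start  : in  std_ulogic;
    n      : in  natural range 0 to STACK_DEPTH;
    done   : out std_ulogic;
    result : out natural
  );
end entity;

architecture rtl of stack_factorial is
  type t_state is (IDLE, CALL, RET, FINISHED);
  type t_stack is array (0 to STACK_DEPTH - 1) of natural;
  signal state : t_state := IDLE;
  signal stack : t_stack := (others => 0);             -- the "hardware stack"
  signal sp    : natural range 0 to STACK_DEPTH := 0;  -- frames on the stack
  signal arg   : natural := 0;                         -- argument of the current call
  signal acc   : natural := 1;                         -- running product
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= IDLE;
        sp    <= 0;
        acc   <= 1;
        done  <= '0';
      else
        case state is
          when IDLE =>
            done <= '0';
            if start = '1' then
              arg   <= n;
              sp    <= 0;
              acc   <= 1;
              state <= CALL;
            end if;
          when CALL =>
            if arg <= 1 then
              state <= RET;              -- base case: fact(0) = fact(1) = 1
            else
              stack(sp) <= arg;          -- push a frame, then "call" fact(arg - 1)
              sp  <= sp + 1;
              arg <= arg - 1;
            end if;
          when RET =>
            if sp = 0 then
              state <= FINISHED;         -- returned from the outermost call
            else
              acc <= acc * stack(sp - 1);  -- pop a frame and combine the results
              sp  <= sp - 1;
            end if;
          when FINISHED =>
            result <= acc;
            done   <= '1';
            state  <= IDLE;
        end case;
      end if;
    end if;
  end process;
end architecture;
```

For n = 5 the stack fills with 5, 4, 3, 2, and popping it in reverse order yields 2, 6, 24, 120. A real recursive algorithm such as Quicksort would push more state per frame (for example, the lo/hi bounds of a partition), but the push/backtrack/pop structure is the same.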
I recall some fascinating stuff written by Thatcher, Goguen and Wright about semantic transformations from one kind of coding domain to others (different models of computation, in short).
It does strike me that this is possibly a genesis point for actual recursive expressions in a more general sense. But be warned: it's very difficult.
Related
I want to approximate the tanh function by storing its values in a LUT (i.e., I am quantizing it). I want to be able to choose the number of entries in the LUT.
As a (not correct) example, I imagine code like this:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
use ieee.fixed_pkg.all;
entity tanh_lut is
generic (
MIN_RANGE: real := 0.0; -- Minimum value of x
MAX_RANGE: real := 5.0; -- Maximum value of x
DATA_RANGE_int: positive:= 8;
DATA_RANGE_frac: positive:= 8;
);
Port ( DIN : in sfixed(DATA_RANGE_int-1 downto -(DATA_RANGE_frac-1));
DOUT : out sfixed(DATA_RANGE_int-1 downto -(DATA_RANGE_frac-1))
end tanh_lut;
architecture Behavioral of tanh_lut is
begin
lut_gen: for i in 0 to LUT_SIZE-1 generate
constant x_val : real := MIN_RANGE + (MAX_RANGE - MIN_RANGE) * i / (LUT_SIZE-1);
constant x_val_next : real := MIN_RANGE + (MAX_RANGE - MIN_RANGE) * (i+1) / (LUT_SIZE-1);
constant y_val : real := tanh(x_val);
if DIN>=x_val_previous AND DIN<x_val then
DOUT <= to_sfixed(tanh(y_val),DOUT ) ;
END IF
end generate;
end Behavioral;
For example, if I want 4 entries in the range 0 to 3, I want it to synthesize code like this:
if DIN>0 AND DIN<=1 then
  DOUT <= to_sfixed(0, DOUT);
elsif DIN>1 AND DIN<=2 then
  DOUT <= to_sfixed(0.76159415595, DOUT);
elsif DIN>2 AND DIN<=3 then
  DOUT <= to_sfixed(0.96402758007, DOUT);
elsif DIN>3 AND DIN<=4 then
  DOUT <= to_sfixed(0.99505475368, DOUT);
end if;
Is there any way to make code like this, or code that implements the idea behind it, work?
A simple LUT with addresses is not possible, because addresses are always integers and DIN is fixed point, e.g., 1.5.
The other possibility would be two LUTs, one mapping the input to an address and another mapping the address to the LUT entry, e.g., LUT1: 1.5 => address 5, LUT2: address 5 => 0.90. But this would double the amount of resources, which I don't want.
My requirements: things like tanh(x) should not be synthesized, only the final value of tanh(x). It should also be hardware efficient.
It does not matter whether you use a nested "if-elsif" construct or a separate "if" construct for each check.
So you can create a loop like this:
for i in 0 to c_number_of_checks-1 loop
  if c_boundaries(i)<DIN and DIN<=c_boundaries(i+1) then
    DOUT <= c_output_values(i);
  end if;
end loop;
Of course, you must provide the constants c_number_of_checks, c_boundaries and c_output_values. This can be done by:
constant c_number_of_checks : natural := 4;
type array_of_your_data_type is array (natural range <>) of your_data_type;
constant c_boundaries : array_of_your_data_type(c_number_of_checks downto 0) := init_c_boundaries(c_number_of_checks);
constant c_output_values : array_of_your_data_type(c_number_of_checks-1 downto 0) := init_c_output_values(c_number_of_checks);
This means you will need the functions init_c_boundaries and init_c_output_values, which create arrays of values that can initialize the constants c_boundaries and c_output_values.
But this is not complicated (you can use the function TANH from ieee.math_real), as the functions need not be synthesizable: they are called only at compile time.
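A minimal sketch of the two init functions, assuming a Q8.8 data type (sfixed(7 downto -8), i.e. your_data_type above) and an x range of 0.0 to 4.0. The package name, the type names and both range constants are hypothetical. The functions use REAL arithmetic and TANH from ieee.math_real, which is acceptable because they only run during elaboration to produce constants.

```vhdl
library ieee;
use ieee.math_real.all;
use ieee.fixed_pkg.all;

package tanh_lut_pkg is
  subtype t_data is sfixed(7 downto -8);               -- assumed Q8.8 format
  type t_data_array is array (natural range <>) of t_data;

  constant C_MIN_RANGE : real := 0.0;  -- assumed lower end of the x range
  constant C_MAX_RANGE : real := 4.0;  -- assumed upper end of the x range

  function init_c_boundaries(n : positive) return t_data_array;
  function init_c_output_values(n : positive) return t_data_array;
end package;

package body tanh_lut_pkg is
  -- n + 1 equally spaced boundaries from C_MIN_RANGE to C_MAX_RANGE
  function init_c_boundaries(n : positive) return t_data_array is
    variable result : t_data_array(n downto 0);
    variable x      : real;
  begin
    for i in 0 to n loop
      x := C_MIN_RANGE + (C_MAX_RANGE - C_MIN_RANGE) * real(i) / real(n);
      result(i) := to_sfixed(x, t_data'high, t_data'low);
    end loop;
    return result;
  end function;

  -- tanh evaluated at the lower boundary of each of the n intervals
  function init_c_output_values(n : positive) return t_data_array is
    variable result : t_data_array(n - 1 downto 0);
    variable x      : real;
  begin
    for i in 0 to n - 1 loop
      x := C_MIN_RANGE + (C_MAX_RANGE - C_MIN_RANGE) * real(i) / real(n);
      result(i) := to_sfixed(tanh(x), t_data'high, t_data'low);
    end loop;
    return result;
  end function;
end package body;
```

With c_number_of_checks = 4 this gives the boundaries 0, 1, 2, 3, 4 and the outputs tanh(0), tanh(1), tanh(2), tanh(3), i.e. the table from the question, rounded to the 1/256 resolution of Q8.8.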
As you see, this takes some effort, so perhaps it is easier to follow the other suggestions. If you do so (using the value as the address of a LUT), you should look into automatic ROM inference, which is provided by several tool chains and will give you very efficient (small) hardware.
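For the "value as the address of a LUT" route: the bit pattern of a fixed-point number already is an integer, so the top bits of DIN can index a constant ROM directly, without a second mapping table. A sketch, assuming an unsigned input ufixed(1 downto -8) covering x in [0, 4), a 16-entry table (ADDR_BITS = 4, so each entry spans 0.25) and tanh sampled at the midpoint of each interval; the entity name and all widths are hypothetical. The registered read follows the usual ROM-inference style, but whether the tool maps it to block RAM or to logic is up to the tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;
use ieee.fixed_pkg.all;

-- Sketch: tanh ROM addressed directly by the top ADDR_BITS bits of din.
entity tanh_rom is
  generic (ADDR_BITS : positive range 1 to 10 := 4);  -- 2**ADDR_BITS entries
  port (
    clk  : in  std_ulogic;
    din  : in  ufixed(1 downto -8);   -- assumed x range [0, 4)
    dout : out ufixed(0 downto -15)   -- tanh(x) lies in [0, 1)
  );
end entity;

architecture rtl of tanh_rom is
  type t_rom is array (0 to 2**ADDR_BITS - 1) of ufixed(0 downto -15);

  -- Runs only at elaboration time, so REAL and TANH are never synthesized.
  function init_rom return t_rom is
    constant STEP  : real := 2.0 ** (2 - ADDR_BITS);  -- x width of one entry
    variable table : t_rom;
  begin
    for i in table'range loop
      -- sample tanh at the midpoint of interval i (a design choice)
      table(i) := to_ufixed(tanh((real(i) + 0.5) * STEP), 0, -15);
    end loop;
    return table;
  end function;

  constant C_TANH_ROM : t_rom := init_rom;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- the top ADDR_BITS bits of din, read as an unsigned integer, are the address
      dout <= C_TANH_ROM(to_integer(unsigned(to_slv(din(1 downto 2 - ADDR_BITS)))));
    end if;
  end process;
end architecture;
```

For example, din = 1.0 has the bit pattern "0100" in positions 1 downto -2, so it reads entry 4, which covers [1.0, 1.25) and stores tanh(1.125). Increasing ADDR_BITS trades ROM size for accuracy.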
I'm currently working on writing a simple counter in VHDL, trying to make it as generic as possible. Ideally I end up with a counter that can pause, count up/down, and take just two integer values (min, max) to determine the appropriate bus widths.
As far as I can tell, in order to get an integer of a given range, I just need to declare
VARIABLE cnt: INTEGER RANGE min TO max := 0;
where min and max are defined as generics (both integers) in the entity. My understanding is that if min is 0 and max is 5, for example, this will create a 3-bit integer variable.
My problem is that I actually want to output this integer. So, naturally, I write
counterOut : OUT INTEGER RANGE min TO max
But this does not appear to be doing what I need. I'm generating a schematic block in Quartus Prime from this, and it creates a bus output from [min...max]. For example, if min = 0 and max = 65, it outputs a 66-bit bus instead of the 7-bit bus it should.
If I restricted the counter to unsigned values I might be able to just math out the output bus size, but I'd like to keep this as flexible as possible, and of course I'd like to know what I'm actually doing wrong and how to do it properly.
TL;DR: I want a VHDL entity to take generic min,max values, and generate an integer output bus of the required width to hold the range of values. How do?
If it matters, I'm using Quartus Prime Lite Edition V20.1.0 at the moment.
Note: I know I can use STD_LOGIC_VECTOR instead, but it is going to simulate significantly slower and is less easy to use than the integer type as far as I have read. I can provide more of my code if necessary, but it's really this one line that's the problem as far as I can tell.
I originally posted this on Stack Exchange, but I think Stack Overflow might be a better place, since it's more of a programming problem than a hardware problem.
EDIT: Complete code shown below
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.numeric_std.all;
USE ieee.std_logic_signed.all;
ENTITY Counter IS
  GENERIC (modulo : INTEGER := 32;
           min : INTEGER := 0;
           max : INTEGER := 64);
  PORT( pause : IN STD_LOGIC;
        direction : IN STD_LOGIC; -- 1 is up, 0 is down
        clk : IN STD_LOGIC;
        counterOut : OUT INTEGER RANGE min TO max --RANGE 0 TO 32 -- THIS line is the one generating an incorrect output bus width
      );
END ENTITY Counter;
-- or entity
ARCHITECTURE CounterArch OF Counter IS
BEGIN
  PROCESS(direction, pause, clk)
    VARIABLE cnt : INTEGER RANGE min TO max := 0;
    VARIABLE dir : INTEGER;
  BEGIN
    IF direction = '1' THEN
      dir := 1;
    ELSE
      dir := -1;
    END IF;
    IF clk'EVENT AND clk = '1' THEN
      IF pause = '0'THEN
        IF (cnt = modulo AND direction = '1') THEN
          cnt := min; -- If we're counting up and hit modulo, reset to min value.
        ELSIF (cnt = min AND direction = '0') THEN
          cnt := modulo; --Counting down hit 0, go back to modulo.
        ELSE
          cnt := cnt + dir;
        END IF;
      END IF;
    END IF;
    counterOut <= cnt;
  END PROCESS;
END ARCHITECTURE CounterArch;
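The width the question expects (3 bits for 0 to 5, 7 bits for 0 to 65) can also be computed explicitly from the generics. One common workaround, sketched here and not verified in Quartus Prime Lite, is to keep the INTEGER internally and expose a vector port whose width comes from a helper function. The package and function names are hypothetical, and the loop assumes ranges well inside INTEGER'range (2**bits would overflow at 31 bits).

```vhdl
-- Sketch: number of bits needed for an integer range lo .. hi.
-- Unsigned encoding when lo >= 0, two's complement when lo < 0.
package range_width_pkg is
  function bits_for_range(lo, hi : integer) return positive;
  -- Hypothetical use in the Counter entity (case min >= 0):
  --   library ieee; use ieee.numeric_std.all; use work.range_width_pkg.all;
  --   counterOut : OUT unsigned(bits_for_range(min, max) - 1 downto 0)
  -- driven from the process with: counterOut <= to_unsigned(cnt, counterOut'length);
end package;

package body range_width_pkg is
  function bits_for_range(lo, hi : integer) return positive is
    variable bits : positive := 1;
  begin
    if lo >= 0 then
      -- smallest bits with hi <= 2**bits - 1
      while hi > 2**bits - 1 loop
        bits := bits + 1;
      end loop;
    else
      -- smallest bits with -2**(bits-1) <= lo and hi <= 2**(bits-1) - 1
      while lo < -(2**(bits - 1)) or hi > 2**(bits - 1) - 1 loop
        bits := bits + 1;
      end loop;
    end if;
    return bits;
  end function;
end package body;
```

For a range with negative values, the port would be signed(bits_for_range(min, max) - 1 downto 0), driven with to_signed instead.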
I'm storing two tables in two signals. One table keeps the key (address) and the other keeps the value corresponding to the key. I need to compare an input to the key and, if they match, return the value stored.
The reason why I need this is for a dynamic lookup table for branch instruction prediction. In the fetch stage of a processor I get the input Instruction_Address and I return a branch_To_Address and a branch_Prediction. Initially I want to store 16 predictions/branch addresses and use a circular buffer ring to overwrite as needed.
I've been trying to use a FOR loop with a nested IF to search for the key inside keysTable.
The whole module seems to work fine, except when I compare two bit_vectors in the IF statement. I need this twice (once on read and once on write), and hence I need to "sweep" keysTable to see whether the address being looked up has an entry.
I noticed the error in simulation: the ELSE branch is always taken, even though keysTable contains the right entries.
Verifiable example:
library IEEE;
use ieee.numeric_bit.all;
entity branch_prediction_table is
  generic (
    addrSize : NATURAL := 4;
    tableSize : NATURAL := 4);
  port (
    clock : in bit;
    input_addr: in bit_vector(addrSize-1 downto 0);
    return_value : out bit );
end branch_prediction_table;
architecture branch_table of branch_prediction_table is
  signal keysTable : bit_vector(addrSize*tableSize-1 downto 0) := ( others => '0');
  signal valuesTable : bit_vector(tableSize*2-1 downto 0) := ( others => '0');
begin
  tableProc: process(clock) is
    variable valueFromTable : bit;
  begin
    if rising_edge(clock) then
      search_table: for iR in (tableSize-1) to 0 loop
        if (keysTable(addrSize*(iR+1)-1 downto addrSize*iR) = input_addr) then
          valueFromTable := valuesTable((iR+1)*2-1);
          EXIT search_table;
        else
          valueFromTable := '0';
        end if;
      end loop search_table;
      return_value <= valueFromTable;
    end if; -- rising_edge(clock)
  end process tableProc;
end branch_table;
with this Tcl script to drive the simulation:
add wave -position insertpoint \
sim:/branch_prediction_table/addrSize \
sim:/branch_prediction_table/clock \
sim:/branch_prediction_table/input_addr \
sim:/branch_prediction_table/keysTable \
sim:/branch_prediction_table/return_value \
sim:/branch_prediction_table/tableSize \
sim:/branch_prediction_table/valuesTable
force -freeze sim:/branch_prediction_table/valuesTable 11111111 0
force -freeze sim:/branch_prediction_table/keysTable 1111101001100011 0
force -freeze sim:/branch_prediction_table/clock 0 0, 1 {5000 ps} -r {10 ns}
run 10 ns
force -freeze sim:/branch_prediction_table/input_addr 1010 0
run 20 ns
force -freeze sim:/branch_prediction_table/input_addr 1111 0
run 10 ns
The testbench simulation result showed that the error is indeed in the IF.
I have tried converting them with to_integer(unsigned(bit_vector1)) = to_integer(unsigned(bit_vector2)), to no avail.
As user1155120 pointed out:
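The problem lies in the loop header search_table: for iR in (tableSize-1) to 0 loop
It should have been "downto", because L > R. Since I used "to" with L > R, that produces a null range, and the for loop's iteration scheme is said to be complete before the first iteration. The loop body never executes, so valueFromTable keeps its default value '0'.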
(IEEE Std 1076-2008 5.2 Scalar types, "A range specifies a subset of values of a scalar type. A range is said to be a null range if the specified subset is empty. The range L to R is called an ascending range; if L > R, then the range is a null range. The range L downto R is called a descending range; if L < R, then the range is a null range.").
And from 10.10 Loop statement: "For the execution of a loop with a for iteration scheme, the discrete range is first evaluated. If the discrete range is a null range, the iteration scheme is said to be complete, ..."
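The corrected header is search_table: for iR in (tableSize-1) downto 0 loop. The difference between the two directions can be checked in isolation with a small self-checking simulation (the entity name is arbitrary):

```vhdl
entity null_range_demo is
end entity;

architecture sim of null_range_demo is
begin
  process
    variable n_to, n_downto : natural := 0;
  begin
    for i in 3 to 0 loop        -- null range: "to" with L > R, body never runs
      n_to := n_to + 1;
    end loop;
    for i in 3 downto 0 loop    -- descending range: i = 3, 2, 1, 0
      n_downto := n_downto + 1;
    end loop;
    report "to: " & integer'image(n_to) & ", downto: " & integer'image(n_downto);
    assert n_to = 0 and n_downto = 4 severity failure;
    wait;
  end process;
end architecture;
```

Simulating it reports "to: 0, downto: 4". Note that a null range is not an error, so no tool warning is required, which is why this bug is easy to miss.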
I'm writing my own package to deal with generic matrix-like objects because VHDL-2008 is not available to me (I'm only concerned with compilation and simulation for the time being).
My aim is getting a matrix M_out from a matrix M_in such that:
M_out(i downto 0, j downto 0) <= M_in(k+i downto k, l+j downto l);
using a subroutine of some sort. For, let's say, semantic convenience and analogy with software programming languages, my subroutine prototype should ideally look something like this:
type matrix is array(natural range <>, natural range <>) of std_logic;
...
procedure slice_matrix(signal m_out: out matrix;
constant rows: natural range<>;
constant cols: natural range<>;
signal m_in: in matrix);
The compiler does however regard this as an error:
** Error: custom_types.vhd(9): near "<>": syntax error
** Error: custom_types.vhd(9): near "<>": syntax error
Is it possible to pass a range as an argument in some way or shall I surrender and pass 4 separate indexes to calculate it locally?
An unconstrained index range (natural range <>) is not a VHDL object of class signal, variable, constant, or file, so it cannot be passed into a subprogram. I wouldn't implement a slice operation as a procedure anyway, because it behaves like a function.
An implementation for working with matrices and slices thereof is provided by the PoC-Library. The implementation is provided in the vectors package.
function slm_slice(slm : T_SLM; RowIndex : natural; ColIndex : natural; Height : natural; Width : natural) return T_SLM is
  variable Result : T_SLM(Height - 1 downto 0, Width - 1 downto 0) := (others => (others => '0'));
begin
  for i in 0 to Height - 1 loop
    for j in 0 to Width - 1 loop
      Result(i, j) := slm(RowIndex + i, ColIndex + j);
    end loop;
  end loop;
  return Result;
end function;
More specialized functions to slice off a row or column can be found in that file too. It also provides procedures to assign parts of a matrix.
This package works in simulation and synthesis.
Unfortunately, slicing multidimensional arrays will not be part of VHDL-2017. I'll make sure it's discussed again for VHDL-202x.
Passing ranges into a subprogram will be allowed in VHDL-2017. The language change LCS 2016-099 adds this capability.
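Without that feature, one workaround that also works in VHDL-93 is to bundle each range into a record, which can be passed as an ordinary constant parameter. This is a sketch: t_index_range and this slice_matrix are hypothetical names (not part of PoC), the result is re-based to index 0 like M_out in the question, and it is effectively the four-index version with the indexes grouped in pairs.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package matrix_pkg is
  type matrix is array (natural range <>, natural range <>) of std_logic;

  -- Hypothetical stand-in for a "range" argument.
  type t_index_range is record
    low  : natural;
    high : natural;
  end record;

  function slice_matrix(m_in : matrix; rows, cols : t_index_range) return matrix;
end package;

package body matrix_pkg is
  function slice_matrix(m_in : matrix; rows, cols : t_index_range) return matrix is
    -- result is re-based so that both dimensions end at index 0
    variable result : matrix(rows.high - rows.low downto 0, cols.high - cols.low downto 0);
  begin
    for i in result'range(1) loop
      for j in result'range(2) loop
        result(i, j) := m_in(rows.low + i, cols.low + j);
      end loop;
    end loop;
    return result;
  end function;
end package body;
```

Usage, matching the question's notation: M_out <= slice_matrix(M_in, (low => k, high => k + i), (low => l, high => l + j));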
Description:
I want to write VHDL code that finds the largest integer in the array A, which is an array of 20 integers.
Question:
What should my algorithm look like, and where do the sequential statements go?
My VHDL code:
highnum: for i in 0 to 19 loop
i = 0;
i < 20;
i<= i + 1;
end loop highnum;
This does not need to be synthesizable, but I don't know how to write this for loop. A detailed example explaining how to do it would be appreciated.
Simply translating the C loop to VHDL, inside a VHDL clocked process, will work AND be synthesisable. It will generate a LOT of hardware because it has to generate the output in a single clock cycle, but that doesn't matter if you are just simulating it.
If that is too much hardware, then you have to implement it as a state machine with at least two states, Idle and Calculating, so that it performs only one loop iteration per clock cycle while Calculating, and returns to the Idle state when done.
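A sketch of that two-state machine, assuming VHDL-2008 (for the predefined integer_vector port type) and an input array that stays stable while CALCULATING; the entity and signal names are hypothetical. Each clock cycle in CALCULATING performs one iteration of the C loop, so done goes high 19 clock cycles after start is sampled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch: maximum of 20 integers, one comparison per clock cycle.
entity max_finder is
  port (
    clk   : in  std_ulogic;
    start : in  std_ulogic;
    a     : in  integer_vector(0 to 19);  -- VHDL-2008 predefined type
    done  : out std_ulogic;
    max   : out integer
  );
end entity;

architecture rtl of max_finder is
  type t_state is (IDLE, CALCULATING);
  signal state : t_state := IDLE;
  signal idx   : natural range 0 to 19 := 0;
  signal best  : integer := integer'low;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      done <= '0';
      case state is
        when IDLE =>
          if start = '1' then
            best  <= a(0);    -- first element is the initial candidate
            idx   <= 1;
            state <= CALCULATING;
          end if;
        when CALCULATING =>
          -- one loop iteration per clock cycle
          if a(idx) > best then
            best <= a(idx);
          end if;
          if idx = 19 then
            done  <= '1';     -- best gets its final value on this same edge
            state <= IDLE;
          else
            idx <= idx + 1;
          end if;
      end case;
    end if;
  end process;

  max <= best;
end architecture;
```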
First of all, you should know how you have defined the array in VHDL.
Let me define an array for you.
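type array_of_integer is array(19 downto 0) of integer;
signal A : array_of_integer := (others => 0);
signal max : integer;
-- Now above is the array in VHDL of integers, all initialized to 0.
A(0) <= 1;
A(1) <= 2;
--
--
A(19) <= 19;
-- Now the for loop for calculating the maximum. It must sit inside a process,
-- and it uses a variable: a signal assigned inside the loop keeps its old value
-- until the process suspends, so every comparison would see a stale max.
find_max: process (A)
  variable v_max : integer;
begin
  v_max := A(0);
  for i in 1 to 19 loop
    if A(i) > v_max then
      v_max := A(i);
    end if;
  end loop;
  max <= v_max;
end process;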
If you have problems understanding where to put which part of the code in a VHDL entity (i.e., processes, ports, etc.), you can reply!