Scaling down a 128-bit xorshift PRNG in VHDL

I'm trying to figure out a way of generating random values (pseudo-random will do) in VHDL using Vivado (meaning that I can't use the math_real library).
These random values will determine the number of counts a prescaler runs for, which in turn generates the random timing used by the application.
This means the generated values don't need to hit a very specific range, as I can always tweak the speed the prescaler runs at. Generally speaking I am looking for values between 1,000 and 10,000, but a bit larger might do as well.
I found the following code online, which implements a 128-bit xorshift and does seem to work very well. The only problem is that the values are way too large, and converting to an integer is pointless as an unsigned integer tops out around 2^32.
This is the code:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity XORSHIFT_128 is
    port (
        CLK    : in  std_logic;
        RESET  : in  std_logic;
        OUTPUT : out std_logic_vector(127 downto 0)
    );
end XORSHIFT_128;

architecture Behavioral of XORSHIFT_128 is
    signal STATE : unsigned(127 downto 0) := to_unsigned(1, 128);
begin
    OUTPUT <= std_logic_vector(STATE);

    Update : process(CLK) is
        variable tmp : unsigned(31 downto 0);
    begin
        if rising_edge(CLK) then
            if RESET = '1' then
                STATE <= (others => '0');
            end if;
            tmp   := STATE(127 downto 96) xor (STATE(127 downto 96) sll 11);
            STATE <= STATE(95 downto 0) &
                     ((STATE(31 downto 0) xor (STATE(31 downto 0) srl 19)) xor (tmp xor (tmp srl 8)));
        end if;
    end process;
end Behavioral;
For the past couple of hours I have been trying to scale this 128-bit xorshift PRNG down to an 8-bit, 16-bit or even 32-bit PRNG, but every time I get either no output or my simulation (testbench) freezes after one cycle.
I've tried just dividing the value, which works in a way, but the 128-bit output is so wide that it makes for a very unwieldy way of going about it.
Any ideas or pointers would be very welcome.

To reduce the range of your RNG to a smaller power-of-two range, simply ignore some of the bits. I guess that's something like OUTPUT(15 downto 0), but I don't know VHDL at all.
The remaining bits represent working state for the generator and cannot be eliminated from the design, even if you don't use them.
If you mean that the generator uses too many gates, then you'll need to find a different algorithm. Wikipedia gives an example 32-bit xorshift generator in C which you might be able to adapt.
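For the prescaler use case in the question, a minimal sketch (untested, with hypothetical signal names rnd and prescale) of slicing the generator's output and offsetting it so the count lands roughly in the 1,000-10,000 range:

signal rnd      : std_logic_vector(127 downto 0);   -- wired to OUTPUT of XORSHIFT_128
signal prescale : integer range 0 to 16383;

-- 13 bits give 0..8191; the offset keeps the count in roughly 1000..9191
prescale <= 1000 + to_integer(unsigned(rnd(12 downto 0)));

Using a power-of-two slice plus an offset avoids a modulo by a non-power-of-two, which would be expensive in hardware; the exact range can then be tuned via the prescaler clock, as the question suggests.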

Table 3 in the old Xilinx application note XAPP052 has the information you need to build such a random generator circuit for 8 bits, as you mention:
https://www.xilinx.com/support/documentation/application_notes/xapp052.pdf
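As a rough sketch (untested), an 8-bit XNOR LFSR using the taps that table lists for 8 bits (8, 6, 5, 4) might look like this:

-- in the architecture declarative part (all-zero is a valid state for an XNOR LFSR):
signal lfsr8 : std_logic_vector(7 downto 0) := (others => '0');

-- in the architecture body:
process(CLK)
begin
    if rising_edge(CLK) then
        -- shift left, feeding back the XNOR of taps 8, 6, 5, 4 (bit indices 7, 5, 4, 3)
        lfsr8 <= lfsr8(6 downto 0) & (lfsr8(7) xnor lfsr8(5) xnor lfsr8(4) xnor lfsr8(3));
    end if;
end process;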

Related

Is There Any Limit to How Wide 2 VHDL Numbers Can Be To Add Them In 1 Clock Cycle?

I am considering adding two 1024-bit numbers in VHDL.
Ideally, I would like to hit a 100 MHz clock frequency.
Target is a Xilinx 7-series.
When you add 2 numbers together, there are inevitably carry bits. Since carry bits on the left cannot be computed until bits on the right have been calculated, to me it seems there should be a limit on how wide a register can be and still be added in 1 clock cycle.
Here are my questions:
1.) Do FPGAs add numbers in this way? Or do they have some way of performing addition that does not suffer from the carry problem?
2.) Is there a limit to the width? If so, is 1024 within the realm of reason for a 100 MHz clock, or is that asking for trouble?
No. You just need to choose a suitably long clock cycle.
Practically, though there is no fundamental limit, for any given cycle time, there will be some limit which depends on the FPGA technology.
At 1024 bits, I'd look at breaking the addition and pipelining it.
Implemented as a single cycle, I would expect a 1024 bit addition to have a speed somewhere around 5, maybe 10 MHz. (This would be easy to check : synthesise one and look at the timing reports!)
Pipelining is not the only approach to overcoming that limit.
There are also "fast adder" architectures like carry look-ahead, carry-save (details via the usual sources) ... these pretty much fell out of fashion when FPGAs built fast carry chains into the LUT fabric, but they may have niche uses such as yours. However they may not be optimally supported by synthesis since (for most purposes) the fast carry chain is adequate.
Maybe this works, I have not tried it:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity Calculator is
    generic (
        num_length : integer := 1024
    );
    port (
        EN      : in  std_logic;
        clk     : in  std_logic;
        number1 : in  std_logic_vector(num_length - 1 downto 0);
        number2 : in  std_logic_vector(num_length - 1 downto 0);
        CTRL    : in  std_logic_vector(1 downto 0);
        result  : out std_logic_vector((num_length * 2) - 1 downto 0)
    );
end Calculator;

architecture Beh of Calculator is
    signal temp : unsigned((num_length * 2) - 1 downto 0) := (others => '0');
begin
    result <= std_logic_vector(temp);

    process(EN, clk)
    begin
        if EN = '0' then
            temp <= (others => '0');
        elsif rising_edge(clk) then
            case CTRL is
                -- add, subtract and divide give num_length-bit results, so resize them
                -- up to temp's width; the product is already 2*num_length bits wide
                when "00"   => temp <= resize(unsigned(number1) + unsigned(number2), temp'length);
                when "01"   => temp <= resize(unsigned(number1) - unsigned(number2), temp'length);
                when "10"   => temp <= unsigned(number1) * unsigned(number2);
                when others => temp <= resize(unsigned(number1) / unsigned(number2), temp'length);
            end case;
        end if;
    end process;
end Beh;

Same design in VHDL and Verilog, but different speed and resource usage?

I have two pieces of code, one in Verilog and one in VHDL, which count the number of ones in a 16-bit binary number. Both do the same thing, but after synthesising with Xilinx ISE I get different synthesis reports.
Verilog code:
module num_ones_for(
    input      [15:0] A,
    output reg [4:0]  ones
);
    integer i;
    always @(A)
    begin
        ones = 0;                        // initialize count variable
        for (i = 0; i < 16; i = i + 1)   // for all the bits
            ones = ones + A[i];          // add the bit to the count
    end
endmodule
VHDL code:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity num_ones_for is
    Port ( A    : in  STD_LOGIC_VECTOR (15 downto 0);
           ones : out STD_LOGIC_VECTOR (4 downto 0));
end num_ones_for;

architecture Behavioral of num_ones_for is
begin
    process(A)
        variable count : unsigned(4 downto 0) := "00000";
    begin
        count := "00000";                     -- initialize count variable
        for i in 0 to 15 loop                 -- for all the bits
            count := count + ("0000" & A(i)); -- add the bit to the count
        end loop;
        ones <= std_logic_vector(count);      -- assign the count to the output
    end process;
end Behavioral;
Number of LUTs used in VHDL and Verilog: 25 and 20.
Combinational delay of the circuit: 3.330 ns and 2.597 ns.
As you can see, the Verilog code looks much more efficient. Why is that?
The only difference I can see is how 4 zeros are appended on the MSB side in the VHDL code. But I did this because otherwise VHDL throws an error.
Is this because of the tool I am using, the HDL language, or the way I wrote the code?
You will need to try a number of different experiments before coming to any conclusions. But my observation is that Verilog is used more frequently in the most critical capacity/area/performance designs, so the majority of research effort goes into the Verilog tool flow first.
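As for the zero-padding the question mentions: here is a sketch (untested) of an equivalent VHDL process that avoids the concatenation by using the unsigned-plus-integer overload from numeric_std; whether this changes the synthesis result is tool-dependent.

process(A)
    variable count : unsigned(4 downto 0);
begin
    count := (others => '0');
    for i in A'range loop
        if A(i) = '1' then
            count := count + 1;   -- numeric_std defines "+" for unsigned and natural
        end if;
    end loop;
    ones <= std_logic_vector(count);
end process;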

VHDL multiplier whose output has the same size as its inputs

I'm using VHDL to describe a 32-bit multiplier for a system to be implemented on a Xilinx FPGA. I found on the web that the rule of thumb is that if your inputs are N bits wide, the output must be 2*N bits wide. I'm using it for a feedback system; is it possible to have a multiplier whose output is the same size as its inputs?
I swear I once found an FPGA application whose VHDL code had adder and multiplier blocks wired with signals of the same size. The person who wrote the code told me that you just have to put the result of the product in a 64-bit signal, and the output then takes the 32 most significant bits of the result (which are not necessarily the top 32 bits of the 64-bit signal).
At the time I built a system (which apparently worked) using the following code:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity Multiplier32Bits is
    port (
        CLK  : in  std_logic;
        A, B : in  std_logic_vector(31 downto 0);
        R    : out std_logic_vector(31 downto 0)
    );
end Multiplier32Bits;

architecture Behavioral of Multiplier32Bits is
    signal next_state : std_logic_vector(63 downto 0);
    signal state      : std_logic_vector(31 downto 0);
begin
    Sequential : process(CLK, state, next_state)
    begin
        if CLK'event and CLK = '1' then
            state <= next_state(61 downto 30);
        else
            state <= state;
        end if;
    end process Sequential;

    -- Combinational part
    next_state <= std_logic_vector(signed(A) * signed(B));
    -- Output assignment
    R <= state;
end Behavioral;
I thought it was working, since at the time I had simulated the block with the Active-HDL FPGA simulator, but now that I'm simulating the whole 32-bit system using iSim from the Xilinx ISE Design Suite, I find that my output differs a lot from the real product of the A and B inputs. I don't know whether that's just the accuracy loss from discarding 32 bits or whether my code is simply bad.
Your code has some problems:
next_state and state don't belong in the sensitivity list.
The expression CLK'event and CLK = '1' should be replaced by rising_edge(CLK).
state <= state; has no effect and causes some tools like ISE to misread the pattern. Remove it.
Putting spaces around operators doesn't hurt, but improves readability.
Why do you expect the result of a * b in bits 30 to 61 instead of 0 to 31?
state and next_state don't represent states of a state machine; it's just a register.
Improved code:
architecture Behavioral of Multiplier32Bits is
signal next_state: std_logic_vector(63 downto 0);
signal state: std_logic_vector(31 downto 0);
begin
Sequential: process(CLK)
begin
if rising_edge(CLK) then
state <= next_state(31 downto 0);
end if;
end process Sequential;
--Combinational part
next_state <= std_logic_vector(signed(A) * signed(B));
--Output assigment
R <= state;
end architecture Behavioral;
I totally agree with everything that Paebbels wrote, but I will explain this point about the number of bits in the result.
I will explain it with examples in base 10.
9 * 9 = 81 (two 1 digit numbers gives maximum of 2 digits)
99 * 99 = 9801 (two 2 digit numbers gives maximum of 4 digits)
999 * 999 = 998001 (two 3 digit numbers gives maximum of 6 digits)
9999 * 9999 = 99980001 (4 digits -> 8 digits)
And so on... It is exactly the same in binary. That's why the output is 2*N bits wide for N-bit inputs.
But if your numbers are smaller, the result will fit in the same number of digits as the factors:
3 * 3 = 9
10 * 9 = 90
100 * 99 = 990
And so on. So if your numbers are small enough, the result will fit in 32 bits. Of course, as Paebbels already wrote, the result will then be in the least significant part of the signal.
And as J.H.Bonarius already pointed out, if your inputs are not integers but fixed-point numbers, then you will have to do post-shifting. If this is your case, write it in a comment and I will explain what to do.
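As a hypothetical illustration of that post-shift (not from the thread, using an assumed combinational product signal alongside the question's A, B and R ports): with Q16.16 operands the 64-bit product is a Q32.32 number, so the Q16.16 result sits in bits 47 down to 16.

signal product : std_logic_vector(63 downto 0);

product <= std_logic_vector(signed(A) * signed(B));   -- full 64-bit product, Q32.32
R       <= product(47 downto 16);                      -- shift back down to Q16.16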

Non-recommended latches generated in VHDL

As part of a school project where we do genetic algorithms, I am programming something called a "crossover core" in VHDL. This core is supposed to take two 64-bit input "parents", and the two output "children" should contain parts from both inputs.
The starting point for this crossover is based on a value from an input random_number, whose 6-bit value determines the bit number where the crossover starts.
For instance, if the value from random_number is 7 (in base 10), and one input is all 0's and the other is all 1's, then the output should be something like this:
000.....00011111111 and 111.....11100000000
(crossover start at bit number 7)
This is the VHDL code:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity crossover_core_split is
    generic (
        N : integer := 64;
        R : integer := 6
    );
    port (
        random_number : in  STD_LOGIC_VECTOR(R-1 downto 0);
        parent1       : in  STD_LOGIC_VECTOR(N-1 downto 0);
        parent2       : in  STD_LOGIC_VECTOR(N-1 downto 0);
        child1        : out STD_LOGIC_VECTOR(N-1 downto 0);
        child2        : out STD_LOGIC_VECTOR(N-1 downto 0)
    );
end crossover_core_split;

architecture Behavioral of crossover_core_split is
    signal split : INTEGER := 0;
begin
    split  <= TO_INTEGER(UNSIGNED(random_number));
    child1 <= parent1(N-1 downto split+1) & parent2(split downto 0);
    child2 <= parent2(N-1 downto split+1) & parent1(split downto 0);
end Behavioral;
The code is written and compiled in Xilinx ISE Project Navigator 12.4.
I have tested this in ModelSim and verified that it works. However, there is an issue with latches, and I get these warnings:
WARNING:Xst:737 - Found 1-bit latch for signal <child1<62>>. Latches may be generated from incomplete case or if statements. We do not recommend the use of latches in FPGA/CPLD designs, as they may lead to timing problems.
WARNING:Xst:737 - Found 1-bit latch for signal <child1<61>>. Latches may be generated from incomplete case or if statements. We do not recommend the use of latches in FPGA/CPLD designs, as they may lead to timing problems.
ETC ETC ETC...
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
A total of 128 latches are generated, but apparently they are not recommended.
Any advice on how to avoid the latches, or at least reduce them?
This code is not well suited for synthesis: the length of the sub-vectors should not vary, and maybe this is the reason for the latches.
For me the best solution is to create a mask from the random value. You can do that in many ways (it's typically a binary-to-thermometer conversion). As an example (not the optimal one):
process(random_number)
begin
    for k in 0 to 63 loop
        if k <= to_integer(unsigned(random_number)) then
            mask(k) <= '1';
        else
            mask(k) <= '0';
        end if;
    end loop;
end process;
then once you have the mask value you can simply write:
child1 <= (mask and parent1) or ((not mask) and parent2);
child2 <= (mask and parent2) or ((not mask) and parent1);
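One small assumption to make the snippet complete: mask is not declared in the answer, so a declaration along these lines (matching the parents' width) would go in the architecture's declarative part:

-- '1' in the low positions up to and including the split index, '0' above
signal mask : std_logic_vector(N-1 downto 0);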

How to generate pseudo random number in FPGA?

How to generate pseudo random number in FPGA?
This has been covered (I'd go for an LFSR):
Random number generation on Spartan-3E
There's an excellent Xilinx application note on generating pseudo-random number sequences efficiently in an FPGA. It's XAPP052.
If it's not for cryptography or other applications with an intelligent adversary (e.g. gambling), I'd use a linear feedback shift register approach.
It only uses exclusive-or and shift operations, so it is very simple to implement in hardware.
As others have said, LFSRs can be used for pseudo-random numbers in an FPGA. Here is a VHDL implementation of a maximal-length 32-bit LFSR.
process(clk)
    -- maximal-length 32-bit XNOR LFSR based on Xilinx app note XAPP210
    function lfsr32(x : std_logic_vector(31 downto 0)) return std_logic_vector is
    begin
        return x(30 downto 0) & (x(0) xnor x(1) xnor x(21) xnor x(31));
    end function;
begin
    if rising_edge(clk) then
        if rst = '1' then
            pseudo_rand <= (others => '0');
        else
            pseudo_rand <= lfsr32(pseudo_rand);
        end if;
    end if;
end process;
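The snippet assumes a pseudo_rand register declared elsewhere; a minimal sketch of that declaration, plus taking a narrower value (small_rand is a hypothetical name) as in the first answer above:

-- in the architecture declarative part:
signal pseudo_rand : std_logic_vector(31 downto 0) := (others => '0');

-- a smaller random value is just a slice of the register, e.g. 16 bits:
small_rand <= pseudo_rand(15 downto 0);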
