How to generate pseudo-random numbers in an FPGA?

This has been covered (I'd go for an LFSR):
Random number generation on Spartan-3E

There's an excellent Xilinx application note on generating pseudo-random number sequences efficiently in an FPGA. It's XAPP052.

If it's not for cryptography or other applications with an intelligent adversary (e.g. gambling) I'd use a linear feedback shift register approach.
It only uses exclusive or and shift, so it is very simple to implement in hardware.

As others have said, LFSRs can be used for pseudo random numbers in an FPGA. Here is a VHDL implementation of a maximal length 32-bit LFSR.
-- maximal length 32-bit xnor LFSR based on xilinx app note XAPP210
function lfsr32(x : std_logic_vector(31 downto 0)) return std_logic_vector is
begin
  return x(30 downto 0) & (x(0) xnor x(1) xnor x(21) xnor x(31));
end function;

process (clk)
begin
  if rising_edge(clk) then
    if rst='1' then
      -- all zeros is a valid state for an xnor LFSR (all ones is the lock-up state)
      pseudo_rand <= (others => '0');
    else
      pseudo_rand <= lfsr32(pseudo_rand);
    end if;
  end if;
end process;


Is There Any Limit to How Wide 2 VHDL Numbers Can Be To Add Them In 1 Clock Cycle?

I am considering adding two 1024-bit numbers in VHDL.
Ideally, I would like to hit a 100 MHz clock frequency.
Target is a Xilinx 7-series.
When you add 2 numbers together, there are inevitably carry bits. Since carry bits on the left cannot be computed until bits on the right have been calculated, to me it seems there should be a limit on how wide a register can be and still be added in 1 clock cycle.
Here are my questions:
1.) Do FPGAs add numbers in this way? Or do they have some way of performing addition that does not suffer from the carry problem?
2.) Is there a limit to the width? If so, is 1024 within the realm of reason for a 100 MHz clock, or is that asking for trouble?
No. You just need to choose a suitably long clock cycle.
Practically, though there is no fundamental limit, for any given cycle time, there will be some limit which depends on the FPGA technology.
At 1024 bits, I'd look at breaking the addition and pipelining it.
Implemented as a single cycle, I would expect a 1024-bit addition to run at somewhere around 5, maybe 10 MHz. (This would be easy to check: synthesise one and look at the timing reports!)
Pipelining is not the only approach to overcoming that limit.
There are also "fast adder" architectures like carry look-ahead, carry-save (details via the usual sources) ... these pretty much fell out of fashion when FPGAs built fast carry chains into the LUT fabric, but they may have niche uses such as yours. However they may not be optimally supported by synthesis since (for most purposes) the fast carry chain is adequate.
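To make the pipelining suggestion concrete, here is a minimal two-stage sketch. The entity name `pipelined_adder`, the generic `WIDTH` and the port names are made up for illustration, and `WIDTH` is assumed to be even. Stage 1 adds the low halves and registers the high halves; stage 2 adds the high halves plus the registered carry. The result appears two clocks after the inputs.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity pipelined_adder is
  generic (
    WIDTH : positive := 1024);  -- total operand width (assumed even)
  port (
    clk : in  std_logic;
    a   : in  std_logic_vector(WIDTH - 1 downto 0);
    b   : in  std_logic_vector(WIDTH - 1 downto 0);
    sum : out std_logic_vector(WIDTH downto 0));  -- one extra bit for the carry out
end entity;

architecture rtl of pipelined_adder is
  constant HALF : positive := WIDTH / 2;
  signal lo_sum : unsigned(HALF downto 0);      -- low-half sum, MSB is the carry
  signal a_hi_r : unsigned(HALF - 1 downto 0);  -- high halves delayed by one clock
  signal b_hi_r : unsigned(HALF - 1 downto 0);
begin
  process (clk)
    variable hi_sum : unsigned(HALF downto 0);
    variable carry  : unsigned(0 downto 0);
  begin
    if rising_edge(clk) then
      -- Stage 1: add the low halves, keep the high halves for the next clock
      lo_sum <= resize(unsigned(a(HALF - 1 downto 0)), HALF + 1)
              + resize(unsigned(b(HALF - 1 downto 0)), HALF + 1);
      a_hi_r <= unsigned(a(WIDTH - 1 downto HALF));
      b_hi_r <= unsigned(b(WIDTH - 1 downto HALF));

      -- Stage 2: add the registered high halves plus the registered carry
      carry(0) := lo_sum(HALF);
      hi_sum   := resize(a_hi_r, HALF + 1) + resize(b_hi_r, HALF + 1) + carry;
      sum      <= std_logic_vector(hi_sum & lo_sum(HALF - 1 downto 0));
    end if;
  end process;
end architecture;
```

Each stage now has a carry chain half as long. Whether two stages are enough for 100 MHz on a given part has to be read from the timing report; if not, the same pattern extends to more, narrower stages.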
Maybe this works, have not tried it:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity Calculator is
  generic (
    num_length : integer := 1024);
  port (
    EN      : in  std_logic;
    clk     : in  std_logic;
    number1 : in  std_logic_vector(num_length - 1 downto 0);
    number2 : in  std_logic_vector(num_length - 1 downto 0);
    CTRL    : in  std_logic_vector(1 downto 0);  -- two bits are enough for four operations
    result  : out std_logic_vector((num_length * 2) - 1 downto 0));
end Calculator;

architecture Beh of Calculator is
  signal temp : unsigned((num_length * 2) - 1 downto 0) := (others => '0');
begin
  result <= std_logic_vector(temp);

  process (EN, clk)
  begin
    if EN = '0' then
      temp <= (others => '0');
    elsif rising_edge(clk) then
      -- every result is widened to temp'length so the assignment widths match
      case CTRL is
        when "00"   => temp <= resize(unsigned(number1), temp'length) + unsigned(number2);
        when "01"   => temp <= resize(unsigned(number1), temp'length) - unsigned(number2);
        when "10"   => temp <= unsigned(number1) * unsigned(number2);
        when others => temp <= resize(unsigned(number1) / unsigned(number2), temp'length);
      end case;
    end if;
  end process;
end Beh;

Same design in VHDL and Verilog. But different speed and resource usages?

I have two pieces of code, one in Verilog and another in VHDL, which count the number of ones in a 16-bit binary number. Both do the same thing, but after synthesising with Xilinx ISE, I get different synthesis reports.
Verilog code:
module num_ones_for(
    input [15:0] A,
    output reg [4:0] ones
    );
integer i;
always @(A)
begin
    ones = 0; //initialize count variable.
    for(i=0;i<16;i=i+1) //for all the bits.
        ones = ones + A[i]; //Add the bit to the count.
end
endmodule
VHDL code:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
entity num_ones_for is
    Port ( A : in STD_LOGIC_VECTOR (15 downto 0);
           ones : out STD_LOGIC_VECTOR (4 downto 0));
end num_ones_for;
architecture Behavioral of num_ones_for is
begin
process(A)
    variable count : unsigned(4 downto 0) := "00000";
begin
    count := "00000"; --initialize count variable.
    for i in 0 to 15 loop --for all the bits.
        count := count + ("0000" & A(i)); --Add the bit to the count.
    end loop;
    ones <= std_logic_vector(count); --assign the count to output.
end process;
end Behavioral;
Number of LUT's used in VHDL and Verilog - 25 and 20.
Combination delay of the circuit - 3.330 ns and 2.597 ns.
As you can see, the Verilog code looks much more efficient. Why is that?
The only difference I can see is how 4 zeros are appended on the MSB side in the VHDL code. But I did this because otherwise VHDL throws an error.
Is this because of the tool I am using, the HDL language, or the way I wrote the code?
You will need to try a number of different experiments before coming to any conclusions. But my observation is that Verilog is used more frequently in the most critical capacity/area/performance designs. Therefore the majority of research effort goes into handling Verilog language tools first.

Scaling down a 128 bit Xorshift. - PRNG in vhdl

I'm trying to figure out a way of generating random values (pseudo-random will do) in VHDL using Vivado (meaning that I can't use the math_real library).
These random values will determine the number of counts a prescaler will run for which will then in turn generate random timing used for the application.
This means that the values generated do not need to have a very specific value as I can always tweak the speed the prescaler runs at. Generally speaking I am looking for values between 1000 - 10,000, but a bit larger might do as well.
I found following code online which implements a 128 bit xorshift and does seem to work very well. The only problem is that the values are way too large and converting to an integer is pointless as the max value for an unsigned integer is 2^32.
This is the code:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity XORSHIFT_128 is
  port (
    CLK : in std_logic;
    RESET : in std_logic;
    OUTPUT : out std_logic_vector(127 downto 0)
  );
end XORSHIFT_128;
architecture Behavioral of XORSHIFT_128 is
  signal STATE : unsigned(127 downto 0) := to_unsigned(1, 128);
begin
  OUTPUT <= std_logic_vector(STATE);
  Update : process(CLK) is
    variable tmp : unsigned(31 downto 0);
  begin
    if(rising_edge(CLK)) then
      if(RESET = '1') then
        -- reset to the nonzero seed: an all-zero state would stay zero forever
        STATE <= to_unsigned(1, 128);
      else
        tmp := (STATE(127 downto 96) xor (STATE(127 downto 96) sll 11));
        STATE <= STATE(95 downto 0) &
          ((STATE(31 downto 0) xor (STATE(31 downto 0) srl 19)) xor (tmp xor (tmp srl 8)));
      end if;
    end if;
  end process;
end Behavioral;
For the past couple of hours I have been trying to downscale this 128-bit xorshift PRNG to an 8-bit, 16-bit or even 32-bit PRNG, but every time I get either no output or my simulation (testbench) freezes after one cycle.
I've tried just dividing the value which does work in a way, but the size of the output of the 128 bit xorshift is so large that it makes it a very unwieldy way of going about the situation.
Any ideas or pointers would be very welcome.
To reduce the range of your RNG to a smaller power of two range, simply ignore some of the bits. I guess that's something like OUTPUT(15 downto 0) but I don't know VHDL at all.
The remaining bits represent working state for the generator and cannot be eliminated from the design even if you don't use them.
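In VHDL, ignoring bits is just a slice of the generator output. The sketch below assumes the 128-bit `OUTPUT` of the generator above is wired to a port called `RAND_IN`; the entity name `prescale_count` and the port `COUNT` are made up for illustration. Taking 13 bits gives 0 to 8191, and adding an offset of 1000 lands in 1000 to 9191, close to the range asked for.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity prescale_count is
  port (
    RAND_IN : in  std_logic_vector(127 downto 0);  -- full generator output
    COUNT   : out unsigned(13 downto 0));           -- value in 1000 to 9191
end entity;

architecture rtl of prescale_count is
begin
  -- Keep 13 of the 128 bits (0 to 8191), widen to 14 bits so the
  -- offset cannot overflow, then shift the range up by 1000.
  COUNT <= resize(unsigned(RAND_IN(12 downto 0)), 14) + 1000;
end architecture;
```

The generator itself still keeps all 128 bits of state; only the consumer looks at fewer of them.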
If you mean that the generator uses too many gates, then you'll need to find a different algorithm. Wikipedia gives an example 32-bit xorshift generator in C which you might be able to adapt.
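As a rough idea of what that adaptation might look like, here is a sketch of the 32-bit xorshift with the 13/17/5 shift triple from the Wikipedia example. The entity name `XORSHIFT_32` and the seed value 1 are assumptions; any nonzero seed should do, since the all-zero state never changes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity XORSHIFT_32 is
  port (
    CLK    : in  std_logic;
    RESET  : in  std_logic;
    OUTPUT : out std_logic_vector(31 downto 0));
end entity;

architecture Behavioral of XORSHIFT_32 is
  -- The seed must be nonzero (assumed value, chosen only for illustration)
  signal STATE : unsigned(31 downto 0) := to_unsigned(1, 32);
begin
  OUTPUT <= std_logic_vector(STATE);

  process (CLK)
    variable x : unsigned(31 downto 0);
  begin
    if rising_edge(CLK) then
      if RESET = '1' then
        STATE <= to_unsigned(1, 32);
      else
        -- Three xorshift steps applied in sequence within one clock
        x := STATE;
        x := x xor (x sll 13);
        x := x xor (x srl 17);
        x := x xor (x sll 5);
        STATE <= x;
      end if;
    end if;
  end process;
end architecture;
```

The variable `x` is what makes the three steps chain within one clock; assigning to the signal `STATE` three times would only keep the last assignment.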
Table 3 in the old Xilinx Application Note has the information you need to build such a random generator circuit for 8 bits, as you mention.

Process or not to Process?

I have the code below in VHDL that I use in a project. I have been using a process within the architecture and wanted to know if there are other ways (I'm sure there are) of accomplishing the same goal: in essence, to take one number, compare it to another, and if there is a difference of +/- 2, reflect this in the output. I am using the following:
LIBRARY IEEE;
USE IEEE.std_logic_1164.all, IEEE.std_logic_arith.all, IEEE.std_logic_signed;
ENTITY thermo IS
  PORT (
    CLK : in std_logic;
    Tset, Tact : in std_logic_vector (6 DOWNTO 0);
    Heaton : out std_logic
  );
END ENTITY thermo;
ARCHITECTURE behavioral OF thermo IS
  SIGNAL TsetINT, TactINT : integer RANGE 63 Downto -64; --INT range so no 32bit usage
BEGIN
  Heat_on_off: PROCESS
    VARIABLE ONOFF: std_logic;
  BEGIN
    TsetINT <= conv_integer (signed (Tset));--converts vector to Int
    TactINT <= conv_integer (signed (Tact));--converts vector to Int
    --If you read this why is it conv_integer not to_integer?? thx
    ONOFF := '0'; --so variable does not hang on start
    WAIT UNTIL CLK'EVENT and CLK = '1';
    IF TactINT <= (TsetINT - 2) then
      ONOFF := '1';
    ELSIF TactINT >= (TsetINT + 2) then
      ONOFF := '0';
    END IF;
    Heaton <= ONOFF;
  END PROCESS;
END ARCHITECTURE behavioral;
I'm just after a comparison really and to know if there are any better ways of doing what I have already done.
Why convert Tact and Tset to an integer?
Why have the variable ONOFF? The variable initialization appears to remove any sense of hysteresis, is that what you intended? Based on your other code, I bet not. I recommend that you assign directly to the signal Heaton instead of using the variable ONOFF.
If I were to create TsetINT and TactINT, these would be good candidates to be variables. However, there is no need to do the integer conversion, as you can simply do the following:
if signed(Tact) <= signed(Tset) - 2 then
elsif signed(Tact) >= signed(Tset) + 2 then
Please use numeric_std. Please ask your professor why they are teaching you old methodologies that are not current industry practice. Numeric_std is an IEEE standard and is updated with the standard, std_logic_arith is not an IEEE standard.
use ieee.numeric_std.all ;
In response to Jim's comment I wrote a simple thermal model test bench to test your design.
I only changed your design to use package numeric_std instead of the Synopsys packages. The rest is just prettifying and eliminating comments not germane to the question of whether or not Tact ever reaches Tset.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity thermo is
  port (
    CLK: in std_logic;
    Tset, Tact: in std_logic_vector (6 downto 0);
    Heaton: out std_logic
  );
end entity thermo;
architecture behavioral of thermo is
  signal TsetINT, TactINT: integer range 63 downto -64;
begin
  process
    variable ONOFF: std_logic;
  begin
    TsetINT <= to_integer (signed (Tset)); -- package numeric_std
    TactINT <= to_integer (signed (Tact)); -- instead of conv_integer
    ONOFF := '0'; -- AT ISSUE -- so variable does not hang on start
    wait until CLK'event and CLK = '1';
    if TactINT <= TsetINT - 2 then -- operator precedence needs no parens
      ONOFF := '1';
    elsif TactINT >= TsetINT + 2 then
      ONOFF := '0';
    end if;
    Heaton <= ONOFF;
  end process;
end architecture behavioral;
You have a comment in your process asking why conv_integer was required instead of to_integer. That prompted the change.
I removed superfluous parentheses based on operator precedence (adding operators have higher precedence than relational operators). Notice that Jim's answer did the same.
The simple thermal model runs with a clock set to a 1 second period and has two coefficients, one for the temperature increase when Heaton is '1' and one for the decay when it is not. I arbitrarily set the heating coefficient to 1 every 4 clocks, and the temperature decay coefficient to 1 every 10 clocks. I also set the ambient temperature (tout) to 10 and tset to 22. The numbers selected are severe to keep the model run time short, enhancing portability without relying on setting a simulator resolution limit.
The thermal model was implemented using fixed signed arithmetic without using fixed_generic_pkg, allowing portability to -1993 tools without math packages and includes a fractional part, responsible for the different widths of Heaton true after reaching normal operating temperature. The model could just as easily have been implemented with two different precursor counters used to tell when to increment or decrement Tact.
Using REAL types is possible, not desirable because converting REAL to INTEGER (then to SIGNED) isn't portable (IEEE Std 1076-2008 Annex D).
The idea here is to demonstrate the lack of hysteresis and demonstrate the model doesn't reach Tset:
The failure to reach Tset (22 + 2) is due to the lack of hysteresis. Hysteresis is desirable for reducing the number of heat on and off cycles. The idea is that once you start the heater you leave it on for a while, and once you stop it you want to leave it off for a while too.
Using Jim's modification:
-- signal TsetINT, TactINT: integer range 63 downto -64;
process (CLK)
if rising_edge(CLK) then
if signed(Tact) <= signed(Tset) - 2 then
Heaton <= '1';
elsif signed(Tact) >= signed(Tset) + 2 then
Heaton <= '0';
end if;
end if;
end process;
gives us longer Heaton on and off cycles, decreasing how many times the heater starts and stops:
It also allows us to see the temperature reach Tset + 2 as well as Tset - 2. These thresholds provide the hysteresis, which shows up as a minimum on time or minimum off time, depending on the efficiency of the heater and the heat loss rate when the heater is off.
So what changed in the execution of the thermo model process? Look at the difference in the synthesis results for the two versions.

AND all elements of an n-bit array in VHDL

Let's say I have an n-bit array. I want to AND all elements in the array, similar to wiring each element to an n-bit AND gate.
How do I achieve this in VHDL?
Note: I am trying to use re-usable VHDL code so I want to avoid hard coding something like
result <= array(0) and array(1) and array(2)....and array(n);
Solution 1: With unary operator
VHDL-2008 defines unary operators, like these:
outp <= and "11011";
outp <= xor "11011";
outp <= and inp; --this would be your case
However, they might not be supported yet by your compiler.
Solution 2: With pure combinational (and traditional) code
Because in concurrent code you cannot assign a value to a signal more than once, you can create a temp signal with an "extra" dimension. In your case, the output is one bit, so the temp signal should be a 1D array, as shown below.
entity unary_AND IS
  generic (N: positive := 8); --array size
  port (
    inp: in bit_vector(N-1 downto 0);
    outp: out bit);
end entity;
architecture unary_AND of unary_AND is
  signal temp: bit_vector(N-1 downto 0);
begin
  temp(0) <= inp(0);
  gen: for i in 1 to N-1 generate
    temp(i) <= temp(i-1) and inp(i);
  end generate;
  outp <= temp(N-1);
end architecture;
The inferred circuit is shown in the figure below.
Solution 3: With sequential code
This is simpler than solution 2, though you are now using sequential code to solve a purely combinational problem (but the hardware will be the same). You can either write a code similar to that in solution 2, but with a process and loop (the latter, in place of generate) or using a function. Because in sequential code you are allowed to assign a value to a signal more than once, the temp signal of solution 2 is not needed here.
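A minimal sketch of the process-and-loop variant described above, reusing the generic and port names of solution 2. The entity name `and_reduce_proc` and the variable `acc` are made up here. A variable is used as the accumulator so that each loop iteration sees the updated value immediately.

```vhdl
entity and_reduce_proc is
  generic (N : positive := 8);  -- array size
  port (
    inp  : in  bit_vector(N-1 downto 0);
    outp : out bit);
end entity;

architecture rtl of and_reduce_proc is
begin
  process (inp)
    variable acc : bit;
  begin
    acc := '1';                 -- identity element for AND
    for i in inp'range loop
      acc := acc and inp(i);    -- fold each element into the running result
    end loop;
    outp <= acc;
  end process;
end architecture;
```

The process is sensitive only to `inp` and has no clock, so it should still infer purely combinational logic.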
If you have VHDL-2008 available, then reduction and is built into the language, as David Koontz and Pedroni have explained.
If you only have VHDL-2002 or earlier available, then you can use a function:
function and_reduct(slv : in std_logic_vector) return std_logic is
  variable res_v : std_logic := '1'; -- Null slv vector will also return '1'
begin
  for i in slv'range loop
    res_v := res_v and slv(i);
  end loop;
  return res_v;
end function;
You can then use the function both inside and outside processes with:
signal arg : std_logic_vector(7 downto 0);
signal res : std_logic;
res <= and_reduct(arg);
My favorite, non-VHDL-2008 solution is:
use ieee.std_logic_unsigned.all ; -- assuming not VHDL-2008
. . .
result <= '1' when not MyArray = 0 else '0' ;
With VHDL-2008, I recommend that you use the "and" reduction built-in (see Pedroni's post) and use the IEEE standard package "ieee.numeric_std_unsigned.all" instead of the shareware package "std_logic_unsigned".
