VHDL vector boundary check - vhdl

type dmemSpace is array(0 to 1023) of std_logic_vector(31 downto 0);
signal dataMem : dmemSpace := (
400 => X"00000000",
404 => X"00001000",
408 => X"FFFFEFFF",
others => X"00000000"
);
signal dAddr : std_logic_vector(31 downto 0);
signal check : integer;
dAddr(31 downto 0) <= Addr(31 downto 2) & "00";
check <= to_integer(unsigned(dAddr));
DataOut <= dataMem(to_integer(unsigned(dAddr))) when (check > 0);
It's me again. I'm working on a single-cycle CPU and everything else works fine except this particular line in the memory.
DataOut <= dataMem(to_integer(unsigned(dAddr))) when (check > 0);
I want to prevent an index out of bounds error for DataOut but this doesn't work. Any ideas?
check > 0 prevents all data from coming out.
check >= 0 lets the error through when the index that causes the exception is -4.

If you have it in a process, "dAddr" and "check" need to be variables. Signals are only updated after the process suspends, so otherwise the check is based on whether or not the previous address was valid, not the one you are using, and the read effectively takes two clock cycles.

If your memory has 1024 locations, your address should be 10 bits, not the 32 bits you have now. If your address is unsigned(9 downto 0), all of its values are legal input for your memory array.
Note that you put data at addresses 400, 404 and 408, which leaves three unused entries between each data element. Even though your data is 4 bytes wide, every array index holds an entire 4-byte data word.
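To illustrate the word-addressing point, here is a minimal sketch, assuming the CPU supplies a 32-bit byte address and the memory is read combinationally. The entity name DataMemRead and the signal wordIndex are hypothetical, and the contents are moved to word indices 100 to 102 (byte addresses 400 to 408) on that assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical read-only wrapper; names are illustrative only.
entity DataMemRead is
  port (
    Addr    : in  std_logic_vector(31 downto 0);  -- byte address from the CPU
    DataOut : out std_logic_vector(31 downto 0)
  );
end entity DataMemRead;

architecture rtl of DataMemRead is
  type dmemSpace is array (0 to 1023) of std_logic_vector(31 downto 0);
  -- One array element per 32-bit word, so byte address 400 is word 100.
  constant dataMem : dmemSpace := (
    100    => X"00000000",  -- byte address 400
    101    => X"00001000",  -- byte address 404
    102    => X"FFFFEFFF",  -- byte address 408
    others => X"00000000"
  );
  signal wordIndex : unsigned(9 downto 0);
begin
  -- Drop the two byte-offset bits; a 10-bit index can only be 0 to 1023,
  -- so every value is a legal index into the array.
  wordIndex <= unsigned(Addr(11 downto 2));
  DataOut   <= dataMem(to_integer(wordIndex));
end architecture rtl;
```

Note that in this sketch address bits 31 downto 12 are ignored, so addresses above the memory alias back into it. Whether that is acceptable depends on the rest of the design.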

A few other problems with this attempt and the provided answers:
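You could not have applied an index of -4. Your dAddr is a std_logic_vector cast to UNSIGNED, so it is always positive or ZERO. What a signed view would call -4 is X"FFFFFFFC", which as an unsigned value is far above 1023 and is also too large for to_integer to return as a 32-bit integer.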
Using VARIABLES is a solution if you are just SIMULATING. For SYNTHESIS they still need to be SIGNALS if you want to know what the implementation is doing.
Your memory is Read only. If you want to have read/write memory you will want to have this in a clocked process so you generate REGISTERS instead of LATCHES.
I don't know WHAT you're doing with the array partial assignment in the declaration. Yes, it is syntactically correct, but assignment at declaration DOES NOT APPLY TO SYNTHESIZED logic. This really only works for CONSTANTS (in all fairness, that is what your dataMem signal is... a constant).
To initialize the memory you need it in the RESET block of your clocked process: use a for loop to set all words to x"00000000", followed by the 3 assignments for 400, 404, 408 using [dataMem(404) <= x"00001000";]
Assign DataOut EVERY clock
if (check < 1024) then
DataOut <= dataMem(check);
else
DataOut <= (others => '0'); -- maybe? or just retain old value?
end if;
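The steps in this answer can be put together roughly as follows. This is only a sketch under assumptions: the entity name DataMem and the ports Clk, Reset, WriteEn and DataIn are hypothetical, the reset is synchronous, and the array is indexed directly by the address as in the question. The range check is done on the unsigned value before converting it, so a pattern such as X"FFFFFFFC" never reaches to_integer.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical read/write memory; port names are assumptions.
entity DataMem is
  port (
    Clk     : in  std_logic;
    Reset   : in  std_logic;
    WriteEn : in  std_logic;
    Addr    : in  std_logic_vector(31 downto 0);
    DataIn  : in  std_logic_vector(31 downto 0);
    DataOut : out std_logic_vector(31 downto 0)
  );
end entity DataMem;

architecture rtl of DataMem is
  type dmemSpace is array (0 to 1023) of std_logic_vector(31 downto 0);
  signal dataMem : dmemSpace;
begin
  process (Clk)
  begin
    if rising_edge(Clk) then
      if Reset = '1' then
        -- Clear every word, then overwrite the non-zero entries.
        for i in dataMem'range loop
          dataMem(i) <= (others => '0');
        end loop;
        dataMem(404) <= X"00001000";
        dataMem(408) <= X"FFFFEFFF";
        DataOut      <= (others => '0');
      elsif unsigned(Addr) < 1024 then
        -- Address is in range: only the low 10 bits can be non-zero here.
        if WriteEn = '1' then
          dataMem(to_integer(unsigned(Addr(9 downto 0)))) <= DataIn;
        end if;
        DataOut <= dataMem(to_integer(unsigned(Addr(9 downto 0))));
      else
        -- Out of range: drive zeros (retaining the old value is the other option).
        DataOut <= (others => '0');
      end if;
    end if;
  end process;
end architecture rtl;
```

Resetting every word like this typically forces the memory into flip-flops rather than block RAM, so it is a trade-off to weigh for a 1024-word array.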

Related

Assigning initial value to VHDL vector

I am just learning the syntax of VHDL
I'd like to assign an initial value of '1' to Qout(0) and the rest '0'.
I cannot find the reference that shows me the correct syntax.
This gave me an error:
signal Qout: Std_Logic_Vector (4 downto 0) :='1';
As user1155120 says, in VHDL the width of the right hand side has to match the width of the left hand side of an assignment operator (<= or :=).
So, you could use the literal that corresponds to a std_logic_vector, which is a string:
signal Qout: Std_Logic_Vector (4 downto 0) := "00001";
(a string literal in VHDL is enclosed within double quotes). Or (and this is what a more experienced VHDL user would do) use an aggregate:
signal Qout: Std_Logic_Vector (4 downto 0) := (0 => '1', others => '0');
The construct on the right hand side is an aggregate. An aggregate is a construct for representing composite data types such as arrays (which is what a std_logic_vector is) and record types (like a struct in C). The above example is saying "make element 0 equal to '1' and make all the other elements equal to '0'". Element 0 is on the right hand side, because the array was declared (4 downto 0) (not (0 to 4)).
Using an aggregate might be considered a better way of doing it because, whilst not too clear to a beginner, the code is more maintainable: if the width of the signal were to change, you would not have to modify the aggregate.
You might want to seriously consider why you want to initialise this signal at all. If you are using an FPGA, it may be the case that the corresponding flip-flops will be initialised as you wish. (I assume this signal Qout will become 5 flip-flops because of the name you have chosen.) On a chip this would never ever be the case - your initialisation would be ignored. You might want to consider whether providing a reset to your flip-flops would be a better solution than initialising a signal, e.g. an active-high asynchronous reset:
process (Clock, Reset)
begin
if Reset = '1' then
Qout <= (0 => '1', others => '0');
elsif rising_edge(Clock) then
...
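Since the fragment above is cut off, here is one complete sketch of how the aggregate and a reset branch could fit together. It is an illustration only: the entity name RingCounter, the generic WIDTH, the port Q and the rotate-left behaviour are all assumptions, not something the question specifies.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; WIDTH and the rotating behaviour are assumptions.
entity RingCounter is
  generic (
    WIDTH : integer range 2 to integer'high := 5
  );
  port (
    Clock : in  std_logic;
    Reset : in  std_logic;
    Q     : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity RingCounter;

architecture rtl of RingCounter is
  -- The same aggregate works for any WIDTH, which is the maintainability point.
  signal Qout : std_logic_vector(WIDTH - 1 downto 0) := (0 => '1', others => '0');
begin
  process (Clock, Reset)
  begin
    if Reset = '1' then
      -- The reset value is written once and adapts to the signal width.
      Qout <= (0 => '1', others => '0');
    elsif rising_edge(Clock) then
      -- Rotate left by one position.
      Qout <= Qout(WIDTH - 2 downto 0) & Qout(WIDTH - 1);
    end if;
  end process;

  Q <= Qout;
end architecture rtl;
```

Changing WIDTH to 8, for example, requires no change to either aggregate.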

VHDL multiplier whose output has the same size as its inputs

I'm using VHDL to describe a 32-bit multiplier for a system to be implemented on a Xilinx FPGA. I found on the web that the rule of thumb is that if you have inputs of N bits, the output must have 2*N bits. I'm using it in a feedback system. Is it possible to have a multiplier with an output of the same size as its inputs?
I swear I once found an FPGA application whose VHDL code had adder and multiplier blocks wired with signals of the same size. The person who wrote the code told me that you just have to put the result of the product on a 64-bit signal and then the output has to get the most significant 32 bits of the result (which were not necessarily in the most significant 32 bits of the 64-bit signal).
At the time I built a system (which apparently works) using the following code:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity Multiplier32Bits is
port(
CLK: in std_logic;
A,B: in std_logic_vector(31 downto 0);
R: out std_logic_vector(31 downto 0)
);
end Multiplier32Bits;
architecture Behavioral of Multiplier32Bits is
signal next_state: std_logic_vector(63 downto 0);
signal state: std_logic_vector(31 downto 0);
begin
Sequential: process(CLK,state,next_state)
begin
if CLK'event and CLK = '1' then
state <= next_state(61 downto 30);
else
state <= state;
end if;
end process Sequential;
--Combinational part
next_state <= std_logic_vector(signed(A)*signed(B));
--Output assignment
R <= state;
end Behavioral;
I thought it was working, since at the time I had the block simulated with the Active-HDL FPGA simulator, but now that I'm simulating the whole 32-bit system using iSim from Xilinx ISE Design Suite, I found that my output differs greatly from the real product of the A and B inputs. I don't know if it's just the accuracy loss from skipping 32 bits or if my code is just bad.
Your code has some problems:
next_state and state don't belong in the sensitivity list
The writing CLK'event and CLK = '1' should be replaced by rising_edge(CLK)
state <= state; has no effect and causes some tools like ISE to misread the pattern. Remove it.
Putting spaces around operators doesn't hurt, but improves readability.
Why do you expect the result of a * b in bits 30 to 61 instead of 0 to 31?
state and next_state don't represent states of a state machine. It's just a register.
Improved code:
architecture Behavioral of Multiplier32Bits is
signal next_state: std_logic_vector(63 downto 0);
signal state: std_logic_vector(31 downto 0);
begin
Sequential: process(CLK)
begin
if rising_edge(CLK) then
state <= next_state(31 downto 0);
end if;
end process Sequential;
--Combinational part
next_state <= std_logic_vector(signed(A) * signed(B));
--Output assignment
R <= state;
end architecture Behavioral;
I totally agree with everything that Paebbels wrote, but I will explain the point about the number of bits in the result.
I will explain it with examples in base 10.
9 * 9 = 81 (two 1 digit numbers gives maximum of 2 digits)
99 * 99 = 9801 (two 2 digit numbers gives maximum of 4 digits)
999 * 999 = 998001 (two 3 digit numbers gives maximum of 6 digits)
9999 * 9999 = 99980001 (4 digits -> 8 digits)
And so on. It is exactly the same for binary. That's why the output is (2*N) bits wide for N-bit inputs.
But if your numbers are smaller, then the result will fit in the same number of digits as the factors:
3 * 3 = 9
10 * 9 = 90
100 * 99 = 990
And so on. So if your numbers are small enough, then the result will fit in 32 bits. Of course, as Paebbels already wrote, the result will be in the least significant part of the signal.
And as J.H.Bonarius already pointed out, if your inputs are not integers but fixed-point numbers, then you would have to do post shifting. If this is your case, write it in a comment, and I will explain what to do.
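For the fixed-point case, the post shifting could look like the sketch below. The format is an assumption, since the question never states one: both inputs and the result are taken to be signed with FRAC_BITS fractional bits (30 by default, which would explain the 61 downto 30 slice in the question). The entity name FixedPointMul and the signal product are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; the fixed-point format is an assumption.
entity FixedPointMul is
  generic (
    FRAC_BITS : natural range 0 to 32 := 30  -- fractional bits in A, B and R
  );
  port (
    CLK  : in  std_logic;
    A, B : in  std_logic_vector(31 downto 0);
    R    : out std_logic_vector(31 downto 0)
  );
end entity FixedPointMul;

architecture rtl of FixedPointMul is
  signal product : signed(63 downto 0);
begin
  -- The full product has 2*FRAC_BITS fractional bits.
  product <= signed(A) * signed(B);

  process (CLK)
  begin
    if rising_edge(CLK) then
      -- Discard the lowest FRAC_BITS bits to get back to the input format.
      -- Bits above the slice are dropped too, so overflow is not detected.
      R <= std_logic_vector(product(FRAC_BITS + 31 downto FRAC_BITS));
    end if;
  end process;
end architecture rtl;
```

With FRAC_BITS = 0 the slice is 31 downto 0, which is the plain integer case described above.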

How to correctly store registers in an FPGA

I need to write a VHDL program that initializes a sensor's registers over I2C. My problem is writing an efficient program that doesn't waste all the FPGA space. I need to store 400 registers, each composed of an 8-bit address and 8-bit data.
The program I wrote is:
entity i2cReg is
port (
RegSel : in std_logic;
Address : out std_logic_vector (15 downto 0);
Data : out std_logic_vector (7 downto 0);
RegStop : out std_logic;
ModuleEN : in std_logic
);
end i2cReg;
architecture i2cReg_archi of i2cReg is
signal counter :integer := 0;
begin
process(RegSel, ModuleEN)
begin
if ModuleEN = '0' then
Address <= x"10";
Data <= x"10";
RegStop <= '0';
counter <= 0;
elsif rising_edge(RegSel) then
counter <= counter + 1;
case counter is
when 0 =>
Address <= x"10";
Data <= x"10";
when 1 =>
Address <= x"10";
Data <= x"10";
when 2 =>
Address <= x"10";
Data <= x"10";
when 3 =>
Address <= x"10";
Data <= x"10";
when 4 =>
Address <= x"10";
Data <= x"10";
when 5 =>
Address <= x"10";
Data <= x"10";
when 400 =>
RegStop <= '1';
when others =>
end case;
end if;
end process;
end i2cReg_archi;
Is there a way to optimize this code? Or would you advise me to use an external EEPROM?
Yaro - you have not mentioned the FPGA vendor or the device but the answer is: Yes, you can initialize ROM in an FPGA so that the values you need are present after configuration. Both Altera and Xilinx allow you to provide a file with the initial values during synthesis.
Kevin.
Initialized BlockRAM is in general the correct solution if you are on Xilinx or Altera.
But there are exceptions where a logic implementation can also work:
For example, if the content of your 400 registers has repeating patterns or many registers with the same value (like in your example code). In this case, if you implement it as logic, your synthesis tool will optimize it heavily. You may actually end up with a very small amount of logic if the register content is highly repetitive. It is sometimes also possible to improve the optimization by clever reordering of the registers.
100-200 logic cells is often considered "cheaper" than a BlockRAM. But it depends mostly on which resource is most scarce in your particular application.
Regardless if you go for initialized BlockRAM or logic, I would suggest that you model it as an array of std_logic_vector instead of using case/when.
The "array of std_logic_vector" approach is platform independent, and can be synthesized to either BlockRAM or logic. Your synthesis tool will usually try to automatically select the best implementation. But you can also force the synthesis tool to use either logic or BlockRAM by using vendor specific attributes. (I can't tell you which attributes to use, since I don't know which platform you are using)
Example:
type REG_TYPE is array (0 to 3) of std_logic_vector(15 downto 0);
constant REGISTERS : REG_TYPE :=
(x"0000",
x"0001",
x"0010",
x"0100");
And in your process, something like:
if rising_edge(RegSel) then
Address <= REGISTERS( counter )(15 downto 8);
Data <= REGISTERS( counter )( 7 downto 0);
end if;
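Putting the constant array and the process together, a complete version might look like the following sketch. It makes several assumptions that are not in the question: the entity name i2cRegRom, an 8-bit Address port (the question describes 8-bit addresses), a table shortened to four placeholder entries, and a counter that stops at the end of the table.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; table contents are placeholders, not real sensor values.
entity i2cRegRom is
  port (
    RegSel   : in  std_logic;
    ModuleEN : in  std_logic;
    Address  : out std_logic_vector(7 downto 0);
    Data     : out std_logic_vector(7 downto 0);
    RegStop  : out std_logic
  );
end entity i2cRegRom;

architecture rtl of i2cRegRom is
  constant NUM_REGS : positive := 4;  -- would be 400 in the question
  type REG_TYPE is array (0 to NUM_REGS - 1) of std_logic_vector(15 downto 0);
  -- Upper byte is the register address, lower byte is the data.
  constant REGISTERS : REG_TYPE :=
    (x"1010",
     x"1111",
     x"1212",
     x"1313");
  signal counter : natural range 0 to NUM_REGS := 0;
begin
  process (RegSel, ModuleEN)
  begin
    if ModuleEN = '0' then
      counter <= 0;
      RegStop <= '0';
      Address <= (others => '0');
      Data    <= (others => '0');
    elsif rising_edge(RegSel) then
      if counter < NUM_REGS then
        -- Present the next table entry and advance.
        Address <= REGISTERS(counter)(15 downto 8);
        Data    <= REGISTERS(counter)(7 downto 0);
        counter <= counter + 1;
      else
        -- All entries have been sent.
        RegStop <= '1';
      end if;
    end if;
  end process;
end architecture rtl;
```

The asynchronous clear on the outputs mirrors the question's code; it may keep a tool from mapping the table to BlockRAM, in which case the table would end up in logic instead.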

VHDL beginner - what's going wrong wrt to timing in this circuit?

I'm very new to VHDL and hardware design and was wondering if someone could tell me if my understanding of the following problem I ran into is right.
I've been working on a simple BCD-to-7 segment display driver for the Nexys4 board - this is my VHDL code (with the headers stripped).
entity BCDTo7SegDriver is
Port ( CLK : in STD_LOGIC;
VAL : in STD_LOGIC_VECTOR (31 downto 0);
ANODE : out STD_LOGIC_VECTOR (7 downto 0);
SEGMENT : out STD_LOGIC_VECTOR (6 downto 0));
function BCD_TO_DEC7(bcd : std_logic_vector(3 downto 0))
return std_logic_vector is
begin
case bcd is
when "0000" => return "1000000";
when "0001" => return "1111001";
when "0010" => return "0100100";
when "0011" => return "0110000";
when others => return "1111111";
end case;
end BCD_TO_DEC7;
end BCDTo7SegDriver;
architecture Behavioral of BCDTo7SegDriver is
signal cur_val : std_logic_vector(31 downto 0);
signal cur_anode : unsigned(7 downto 0) := "11111101";
signal cur_seg : std_logic_vector(6 downto 0) := "0000001";
begin
process (CLK, VAL, cur_anode, cur_seg)
begin
if rising_edge(CLK) then
cur_val <= VAL;
cur_anode <= cur_anode rol 1;
ANODE <= std_logic_vector(cur_anode);
SEGMENT <= cur_seg;
end if;
-- Decode segments
case cur_anode is
when "11111110" => cur_seg <= BCD_TO_DEC7(cur_val(3 downto 0));
when "11111101" => cur_seg <= BCD_TO_DEC7(cur_val(7 downto 4));
when "11111011" => cur_seg <= BCD_TO_DEC7(cur_val(11 downto 8));
when "11110111" => cur_seg <= BCD_TO_DEC7(cur_val(15 downto 12));
when "11101111" => cur_seg <= BCD_TO_DEC7(cur_val(19 downto 16));
when "11011111" => cur_seg <= BCD_TO_DEC7(cur_val(23 downto 20));
when "10111111" => cur_seg <= BCD_TO_DEC7(cur_val(27 downto 24));
when "01111111" => cur_seg <= BCD_TO_DEC7(cur_val(31 downto 28));
when others => cur_seg <= "0011111";
end case;
end process;
end Behavioral;
Now, at first I tried to naively drive this circuit from the board clock defined in the constraints file:
## Clock signal
##Bank = 35, Pin name = IO_L12P_T1_MRCC_35, Sch name = CLK100MHZ
set_property PACKAGE_PIN E3 [get_ports clk]
set_property IOSTANDARD LVCMOS33 [get_ports clk]
create_clock -add -name sys_clk_pin -period 10.00 -waveform {0 5} [get_ports clk]
This gave me what looked like almost garbage output on the seven-segment displays - it looked like every decoded digit was being superimposed onto every digit place. Basically if bits 3 downto 0 of the value being decoded were "0001", the display was showing 8 1s in a row instead of 00000001 (but not quite - the other segments were lit but appeared dimmer).
Slowing down the clock to something more reasonable did the trick and the circuit works how I expected it to.
When I look at what elaboration gives me (I'm using Vivado 2014.1), it gives me a circuit with VAL connected to 8 RTL_ROMs in parallel (each one decoding 4 bits of the input). The outputs from these ROMs are fed into an RTL_MUX and the value of cur_anode is being used as the selector. The output of the RTL_MUX feeds the cur_val register; the cur_val and cur_anode registers are then linked to the outputs.
So, with that in mind, which part of the circuit couldn't handle the clock rate? From what I've read I feel like this is related to timing constraints that I may need to add; am I thinking along the right track?
Did your timing report indicate that you had a timing problem? It looks to me like you were just rolling through the segment values extremely fast. No matter how well you design for higher clock speeds, you're rotating cur_anode every clock cycle, and therefore your display will change accordingly. If your clock is too fast, the display will change much faster than a human would be able to read it.
Some other suggestions:
You should split your single process into separate clocked and unclocked processes. It's not that what you're doing won't end up synthesizing (obviously), but it's unconventional, and may lead to unexpected results.
Your initialization on cur_seg won't really do anything, as it's always driven (combinationally) by your process. It's not a problem - just wanted to make sure you were aware.
Well there are two parts to this.
Your segments appeared so dim because you are basically running them at a 1/8th duty cycle at a faster rate than the segments have time to react (every clock pulse you are changing which segment is lit up and then you stop driving it on the next pulse).
By increasing the period your segments got brighter by switching from a transient current (segments need time to ramp up) to a steady state current (longer period lets current go to desired levels when you drive the segments slower than their inherent driving frequency). Hence the brightness increase.
One other thing about your code. You may be aware of this, but when you latch with your clock there, the signal labeled cur_anode is advanced and actually represents the NEXT anode. You also latch ANODE and SEGMENT to the current anode and segment respectively. Just pointing out that cur_anode may be a misnomer (and is confusing because it's usually the NEXT one).
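One common way to slow the digit scan without creating a second clock is a clock-enable tick. The sketch below is an assumption-laden illustration: the entity name RefreshTick, the generic DIVIDE and the port TICK are hypothetical, and the default divides the 100 MHz board clock down to a 1 kHz digit rate (about 125 full scans per second across 8 digits).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical tick generator; names and the divide ratio are assumptions.
entity RefreshTick is
  generic (
    DIVIDE : positive := 100_000  -- 100 MHz / 100_000 = 1 kHz
  );
  port (
    CLK  : in  std_logic;
    TICK : out std_logic  -- high for one CLK period every DIVIDE cycles
  );
end entity RefreshTick;

architecture rtl of RefreshTick is
  signal count : natural range 0 to DIVIDE - 1 := 0;
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      if count = DIVIDE - 1 then
        count <= 0;
        TICK  <= '1';
      else
        count <= count + 1;
        TICK  <= '0';
      end if;
    end if;
  end process;
end architecture rtl;
```

In the display driver, the anode rotation would then sit inside an extra `if TICK = '1' then ... end if;` within the clocked branch, so everything stays on the one 100 MHz clock.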
Keeping in mind Paul Seeb's and fru1bat's answers on clock speed, Paul's comment on NEXT anode, and fru1bat's suggestion on separating clocked and un-clocked processes as well as your noting that you had 8 ROMs, there are alternative architectures.
Your architecture with a ring counter for ANODE and multiple ROMs happens to be optimal for speed, which as both Paul and fru1bat note isn't needed. Instead you can optimize for area.
Because the clock speed is either external or controlled by the addition of an enable supplied periodically it isn't addressed in area optimization:
architecture foo of BCDTo7SegDriver is
signal digit: natural range 0 to 7; -- 3 bit binary counter
signal bcd: std_logic_vector (3 downto 0); -- input to ROM
begin
UNLABELED:
process (CLK)
begin
if rising_edge(CLK) then
if digit = 7 then -- integer/unsigned "+" result range
digit <= 0; -- not tied to digit range in simulation
else
digit <= digit + 1;
end if;
SEGMENT_REG:
SEGMENT <= BCD_TO_DEC7(bcd); -- single ROM look up
ANODE_REG:
for i in ANODE'range loop
if digit = i then
ANODE(i) <= '0';
else
ANODE(i) <= '1';
end if;
end loop;
end if;
end process;
BCD_MUX:
with digit select
bcd <= VAL(3 downto 0) when 0,
VAL(7 downto 4) when 1,
VAL(11 downto 8) when 2,
VAL(15 downto 12) when 3,
VAL(19 downto 16) when 4,
VAL(23 downto 20) when 5,
VAL(27 downto 24) when 6,
VAL(31 downto 28) when 7;
end architecture;
This trades off a 32 bit register (cur_val), an 8 bit ring counter (cur_anode) and seven copies of the ROM implied by function BCD_TO_DEC7 for a three bit binary counter.
In truth the argument over whether or not you should be using separate sequential (clocked) and combinatorial (non clocked) processes is somewhat reminiscent of Lilliput and Blefuscu going to war over Endian-ness.
Separate processes generally execute a little more efficiently due to not sharing sensitivity lists. You could also note that all concurrent statements have process or block statement equivalents. There's also nothing in this design that can take particular advantage of using variables which can result in more efficient simulation while implying a single process. (Shared variables aren't supported by XST).
I haven't verified that this will synthesize, but after reading through the 14.1 version of the XST user guide I think it should. If not, you can convert digit to a std_logic_vector with a length of 3.
The + 1 for digit will get optimized, an incrementer is smaller than a full adder.

synthesis of dynamic mux on std_logic_vector bytes

I have a FIFO whose size is determined according to a parameter in the package:
signal fifo : std_logic_vector(FIFO_SIZE*8 -1 downto 0);
I also have a 4 bit vector (numOfBytes) saying how many bytes are in the FIFO at any given time (up to 8).
I want the data out (a single byte) from the FIFO to be determined according the numOfBytes signal:
Do <= fifo(to_integer(unsigned(numOfBytes))*8 - 1 downto to_integer(unsigned(numOfBytes))*8 - 8) when numOfBytes /= x"0" else (others => '0');
When simulating, this works well. However, when I try to synthesize it (using Synopsys DC) I get an elaboration error upon linking the design saying "Constant value required (ELAB-922)".
The ELAB code means "This error message occurs because an expression in the indicated line of your RTL description does not evaluate to a constant value, as required by the language."
How else can I make the output mux so it will undergo synthesis?
If not for the parameter I'd change the Do line to a regular mux, but it can't work with the parameters. (I can't call fifo(63 downto 56) when fifo is 4 bytes...)
p.s.
I tried working with conv_integer in the beginning, but changed to to_integer(unsigned()) due to answers found on the web.
Signal indexes used to construct a range have to be compile-time constants for synthesis to accept them.
There are two ways to solve this problem:
1) Change your FIFO to use an array. This is the standard way of declaring any form of memory, such as a FIFO.
type fifo_type is array(0 to FIFO_SIZE-1) of std_logic_vector(8-1 downto 0);
signal fifo : fifo_type;
...
Do <= fifo(to_integer(unsigned(numOfBytes))-1) when(numOfBytes/=0) else (others=>'0');
2) Use a loop to convert the variable into a constant. This is a common way to code a generic mux.
Do <= (others=>'0');
for i in 0 to FIFO_SIZE-1 loop
if(numOfBytes=i+1) then
Do <= fifo((i+1)*8-1 downto i*8);
end if;
end loop;
I would recommend the first approach for larger, memory-based FIFOs, and the second for smaller, register-based ones.
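The loop in approach 2 consists of sequential statements, so it has to live inside a process. A self-contained sketch of that could look as follows; the entity name FifoOutMux and the choice to pass fifo in as a port are assumptions made only to keep the example complete, and FIFO_SIZE is shown as a generic rather than a package constant.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper around the output mux only.
entity FifoOutMux is
  generic (
    FIFO_SIZE : positive := 8  -- number of bytes in the FIFO
  );
  port (
    fifo       : in  std_logic_vector(FIFO_SIZE * 8 - 1 downto 0);
    numOfBytes : in  std_logic_vector(3 downto 0);
    Do         : out std_logic_vector(7 downto 0)
  );
end entity FifoOutMux;

architecture rtl of FifoOutMux is
begin
  process (fifo, numOfBytes)
  begin
    -- Default covers numOfBytes = 0 and any value above FIFO_SIZE.
    Do <= (others => '0');
    for i in 0 to FIFO_SIZE - 1 loop
      -- Inside the loop, i is a constant for each unrolled iteration,
      -- so the slice bounds are static as synthesis requires.
      if to_integer(unsigned(numOfBytes)) = i + 1 then
        Do <= fifo((i + 1) * 8 - 1 downto i * 8);
      end if;
    end loop;
  end process;
end architecture rtl;
```

The loop unrolls into FIFO_SIZE comparators feeding a byte-wide mux, which is why it suits small, register-based FIFOs.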
If the FIFO is created as an array of bytes, instead of combining them into a single std_logic_vector, then Synopsys DC may be able to handle it. Code could look like:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
architecture syn of mdl is
... Declaration of FIFO_SIZE natural constant
type fifo_t is array(natural range <>) of std_logic_vector(7 downto 0);
signal fifo : fifo_t(FIFO_SIZE - 1 downto 0);
begin
... Handling FIFO insert and remove
Do <= fifo(to_integer(unsigned(numOfBytes)) - 1) when numOfBytes /= x"0" else (others => '0');
end architecture;
If you don't need a runtime-dynamic size to the FIFO, use a generic on your entity.
If you truly need a dynamic sized FIFO, you'll have to use a loop in a process as someone else said. But be very careful how you use such a FIFO, as if you change the size of it while someone is reading or writing, bad things may happen!

Resources