I am referring to the popular paper: Correct Methods For Adding Delays To Verilog Behavioral Models by Cummings
http://www.sunburst-design.com/papers/CummingsHDLCON1999_BehavioralDelays_Rev1_1.pdf
always @(a or b or ci)
  #12 {co, sum} = a + b + ci;
From the paper:
[If the] a input changes at time 15 as shown in Figure 3, then if the a, b and ci inputs all change during the next 9ns, the outputs will be updated with the latest values of a, b and ci. This modeling style has just permitted the ci input to propagate a value to the sum and carry outputs after only 3ns instead of the required 12ns propagation delay.
Can anyone tell me why this is true?
We are using a blocking assignment here. So shouldn't the always block remain inactive from 15 ns to 27 ns, because the current execution of the always block has not completed? But here it seems to remain active, meaning the always block gets triggered whenever a change is noticed.
It helps to reformat the code with a different layout to capture more accurately what is actually happening:
always
  @(a or b or ci)          // line 1
  #12                      // line 2
  {co, sum} = a + b + ci;  // line 3
The first line suspends the always process until it sees a change in one of the listed signals, which happens at time 15.
The second line suspends the process for 12 time units. During this time period, there are changes to a, b, and ci that are ignored.
The third line executes at time 27 (15+12), once the delay has expired, and uses the current values of a, b, and ci to make a blocking assignment to {co, sum}.
Since this is an always block, after the assignment it loops back to the first line and waits for another change.
I'm working on an FPGA project where I need to read data from an image sensor. This sensor has different image modes (like test pattern, frame, binning, etc.) and in order to change image mode I need to look for specific signals before writing into the registers.
I have inherited some code that I need to fix since the image sensor sometimes gets stuck when we change image mode.
Concerning the change of image mode, a state machine is used.
The following piece of code shows how the registers for changing mode are currently written.
Essentially, when we want to change mode, we need to wait for the signal MODE_SIG_HIGH to become high before writing into the registers. Then, when this condition happens, we check which mode we want to set. For example, to set the test pattern, we check whether bit S2 is set. Then we perform all the operations needed to actually change mode (line 10).
01. ...
02. WHEN MODE_SIG_HIGH =>
03. NEXT_ST <= MODE_SIG_HIGH;
04. ...
05. IF S2 = '1' THEN
06. -- configure the sensor to
07. NEXT_ST <= CONFIGURE_TEST_PATTERN;
08. END IF;
09. ...
10. WHEN CONFIGURE_TEST_PATTERN =>
11. ...
I'm having a debate with a friend of mine concerning what is the best way to change state when a new event happens. The above solution doesn't seem right to me.
As far as I understand, when we enter a state, all the instructions contained in that state are performed in parallel. Therefore, concerning the above piece of code, when we enter the state MODE_SIG_HIGH the instruction at line 03 is executed in parallel with the IF condition. My point is that if the bit S2 is set to 1, the IF condition is true and we end up assigning the value CONFIGURE_TEST_PATTERN to NEXT_ST. And this ends up assigning two different values to the same variable (in parallel), in line 03 and in line 07. Am I right, or am I missing some basic behaviour? The reason for having the instruction at line 03 is that after we enter MODE_SIG_HIGH, it could take some clock cycles before we see one of the mode bits set.
As far as I understand, when we enter a state, all the instructions
contained in that state are performed in parallel.
Not quite. The only things in VHDL which are concurrent ('performed in parallel') are:
processes
concurrent signal assignments
component instantiations
concurrent procedure calls
concurrent assertions (incl. PSL)
generates
blocks
The code inside a process or subprogram (function/procedure) executes sequentially. This is where you do your conventional programming, using sequential statements (i.e. nothing in the list above): the standard control constructs (if, case, loop, etc.), sequential signal assignments, and so on. If you carry out signal (or variable) assignments in a sequential region, the last one wins, just like in a conventional programming language. There are scheduling rules that make this happen, but you don't need to know about those (yet!).
I'm writing a project that first designates the root process to read a large data file and do some calculations, and then broadcasts the calculated results to all other processes. Here is my code: (1) the root reads random numbers from a txt file with n_sample=30000, (2) it generates the dens_ent matrix by some rule, and (3) it broadcasts the result to the other processes. Btw, I'm using OpenMPI with gfortran.
IF (myid==0) THEN
   OPEN(UNIT=8,FILE='rnseed_ent20.txt')
   DO i=1,n_sample
      DO j=1,3
         READ(8,*) rn(i,j)
      END DO
   END DO
   CLOSE(8)
END IF
dens_ent=0.0d0
DO i=1,n_sample
   IF (myid==0) THEN
      !Random draws of productivity and savings
      rn_zb=MC_JOINT_SAMPLE((/-0.1d0,mu_b0/),var,rn(i,1:2))
      iz=minloc(abs(log(zgrid)-rn_zb(1)),dim=1)
      ib=minloc(abs(log(bgrid(1:nb/2))-rn_zb(2)),dim=1) !Find the closest saving grid
      CALL SUB2IND(j,(/nb,nm,nk,nxi,nz/),(/ib,1,1,1,iz/))
      DO iixi=1,nxi
         DO iiz=1,nz
            CALL SUB2IND(jj,(/nb,nm,nk,nxi,nz/),(/policybmk_2_statebmk_index(j,:),iixi,iiz/))
            dens_ent(jj)=dens_ent(jj)+1.0d0/real(nxi)*markovian(iz,iiz)*merge(1.0d0,0.0d0,vent(j) .GE. -bgrid(ib)+ce)
            !Density only recorded if the value of entry is greater than b0+ce
         END DO
      END DO
   END IF
END DO
PRINT *, 'dingdongdingdong',myid
IF (myid==0) dens_ent=dens_ent/real(n_sample)*Mpo
IF (myid==0) PRINT *, 'sum_density by joint normal distribution',sum(dens_ent)
PRINT *, 'BLBLALALALALALA',myid
CALL MPI_BCAST(dens_ent,N,MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,ierr)
Problems arise:
(1) IF (myid==0) PRINT *, 'sum_density by joint normal distribution',sum(dens_ent) does not seem to be executed, as there is no printout.
(2) I then verified this by adding PRINT *, 'BLBLALALALALALA',myid and similar messages. Again, no printout for the root process myid=0.
It seems like the root process is not working. How can this be? I'm quite confused. Is it because I'm not using MPI_BARRIER before PRINT *, 'dingdongdingdong',myid?
Is it possible that you are missing the following statement at the very beginning of your code?
CALL MPI_COMM_RANK (MPI_COMM_WORLD, myid, ierr)
IF (ierr /= MPI_SUCCESS) THEN
STOP "MPI_COMM_RANK failed!"
END IF
MPI_COMM_RANK returns in myid (if it succeeds) the identifier of the calling process within the MPI_COMM_WORLD communicator, i.e. a value between 0 and NP-1, where NP is the total number of MPI ranks.
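For reference, a minimal, self-contained sketch of the usual initialization order looks like this (the names myid, nprocs and ierr are just placeholders, and MPI_INIT must come before any other MPI call):

PROGRAM init_sketch
   USE mpi
   IMPLICIT NONE
   INTEGER :: myid, nprocs, ierr

   CALL MPI_INIT(ierr)                               ! must precede every other MPI call
   CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)    ! rank of this process: 0 .. NP-1
   CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)  ! total number of ranks NP

   IF (myid == 0) PRINT *, 'running on', nprocs, 'ranks'

   CALL MPI_FINALIZE(ierr)
END PROGRAM init_sketch

If MPI_COMM_RANK is indeed missing, myid holds whatever the uninitialized variable happens to contain, so the IF (myid==0) blocks may never execute on any rank, which would be consistent with the missing printouts.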
Thanks for contributions from @cw21, @Harald and @Hristo Iliev.
The failure lies in unit numbering. One reference says:
unit number : This must be present and takes any integer type. Note this ‘number’ identifies the
file and must be unique so if you have more than one file open then you must specify a different
unit number for each file. Avoid using 0,5 or 6 as these UNITs are typically picked to be used by
Fortran as follows.
– Standard Error = 0 : Used to print error messages to the screen.
– Standard In = 5 : Used to read in data from the keyboard.
– Standard Out = 6 : Used to print general output to the screen.
So I changed every unit number i into 1i, which did not work; then I changed them into 10i, and it started to work. Mysteriously, as correctly pointed out by @Hristo Iliev, as long as the unit number is not 0, 5 or 6 the code should behave properly. I cannot explain to myself why 1i did not work, but anyhow, the root process is now printing out results.
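As a side note, if your compiler supports Fortran 2008 (recent gfortran does), the NEWUNIT= specifier lets the runtime pick a free unit number for you, which sidesteps clashes with the preconnected units 0, 5 and 6 entirely. A minimal sketch, reusing the file name and loop bounds from the question:

INTEGER :: u, i, j
REAL(8) :: rn(30000,3)

! NEWUNIT= returns a negative unit number that cannot collide with
! the preconnected units 0 (stderr), 5 (stdin) and 6 (stdout).
OPEN(NEWUNIT=u, FILE='rnseed_ent20.txt', STATUS='old', ACTION='read')
DO i = 1, 30000
   DO j = 1, 3
      READ(u,*) rn(i,j)
   END DO
END DO
CLOSE(u)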
I have some Fortran code that I'm parallelizing with MPI which is doing truly bizarre things. First, there's a variable nstartg that I broadcast from the boss process to all the workers:
call mpi_bcast(nstartg,1,mpi_integer,0,mpi_comm_world,ierr)
The variable nstartg is never altered again in the program. Later on, I have the boss process send eproc elements of an array edge to the workers:
if (me==0) then
   do n=1,ntasks-1
      ! determine the starting point estart and the number eproc
      ! of values to send
      call mpi_send(edge(estart),eproc,mpi_integer,n,n,mpi_comm_world,ierr)
   enddo
endif
with a matching receive statement if me is non-zero. (I've left out some other code for readability; there's a good reason I'm not using scatterv.)
Here's where things get weird: the variable nstartg gets altered to n instead of keeping its actual value. For example, on process 1, after the mpi_recv, nstartg = 1, and on process 2 it's equal to 2, and so forth. Moreover, if I change the code above to
call mpi_send(edge(estart),eproc,mpi_integer,n,n+1234567,mpi_comm_world,ierr)
and change the tag accordingly in the matching call to mpi_recv, then on process 1, nstartg = 1234568; on process 2, nstartg = 1234569, etc.
What on earth is going on? All I've changed is the tag that mpi_send/recv are using to identify the message; provided the tags are unique so that the messages don't get mixed up, this shouldn't change anything, and yet it's altering a totally unrelated variable.
On the boss process, nstartg is unaltered, so I can fix this by broadcasting it again, but that's hardly a real solution. Finally, I should mention that compiling and running this code under Electric Fence hasn't picked up any buffer overflows, nor did -fbounds-check throw anything at me.
The most probable cause is that you pass an INTEGER scalar as the actual status argument to MPI_RECV, when it should really be declared as an array with an implementation-specific size, available as the MPI_STATUS_SIZE constant:
INTEGER, DIMENSION(MPI_STATUS_SIZE) :: status
or
INTEGER status(MPI_STATUS_SIZE)
The message tag is written to one of the status fields by the receive operation (its implementation-specific index is available as the MPI_TAG constant, and the field value can be accessed as status(MPI_TAG)). If your status is simply a scalar INTEGER, several other local variables get overwritten. In your case it just so happens that nstartg falls right above status on the stack.
If you do not care about the receive status, you can pass the special constant MPI_STATUS_IGNORE instead.
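Putting this together, a corrected worker-side receive might look like the sketch below. The buffer edge, the count eproc and the tag convention (the sender uses the destination rank as the tag) are taken from the question; how each worker obtains eproc is not shown there, so that part is assumed:

INTEGER, DIMENSION(MPI_STATUS_SIZE) :: status   ! an array, not a scalar
INTEGER :: ierr

! The tag now lands in status(MPI_TAG) instead of overwriting
! unrelated local variables such as nstartg.
call mpi_recv(edge, eproc, mpi_integer, 0, me, mpi_comm_world, status, ierr)

! Or, if the receive status is not needed at all:
! call mpi_recv(edge, eproc, mpi_integer, 0, me, mpi_comm_world, MPI_STATUS_IGNORE, ierr)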
I was just trying some things for a quaternionic neural network when I realized that, even though I close my current Session in a for loop, my program slows down massively and I get a memory leak caused by ops being constructed. This is my code:
for step in xrange(0, 200):  # num_epochs * train_size // BATCH_SIZE

    with tf.Session() as sess:

        offset = (BATCH_SIZE) % train_size
        # print "Offset : %d" % offset

        batch_data = []
        batch_labels = []
        batch_data.append(qtrain[0][offset:(offset + BATCH_SIZE)])
        batch_labels.append(qtrain_labels[0][offset:(offset + BATCH_SIZE)])

        retour = sess.run(test, feed_dict={x: batch_data})

        test2 = feedForwardStep(retour, W_to_output, b_output)
        # sess.close()
The problem seems to come from test2 = feedForwardStep(...). I need to declare these ops after executing retour once, because retour can't be a placeholder (I need to iterate through it). Without this line, the program runs very well, fast and without a memory leak. I can't understand why TensorFlow seems to be trying to "save" test2 even though I close the session...
TL;DR: Closing a session does not free the tf.Graph data structure in your Python program, and if each iteration of the loop adds nodes to the graph, you'll have a leak.
Since your function feedForwardStep creates new TensorFlow operations and you call it within the for loop, there is a leak in your code, albeit a subtle one.
Unless you specify otherwise (using a with tf.Graph().as_default(): block), all TensorFlow operations are added to a global default graph. This means that every call to tf.constant(), tf.matmul(), tf.Variable() etc. adds objects to a global data structure. There are two ways to avoid this:
Structure your program so that you build the graph once, then use tf.placeholder() ops to feed in different values in each iteration. You mention in your question that this might not be possible.
Explicitly create a new graph in each for loop. This might be necessary if the structure of the graph depends on the data available in the current iteration. You would do this as follows:
for step in xrange(200):
    with tf.Graph().as_default(), tf.Session() as sess:
        # Remainder of loop body goes here.
Note that in this version, you cannot use Tensor or Operation objects from a previous iteration. (For example, it's not clear from your code snippet where test comes from.)
Considering this code:
architecture synth of my_entity is
  signal a : std_logic;
begin
  a <= c and d;
  b <= a and c;
end synth;
Is the second assignment going to see that a changed in the other process, or are all signals assigned only at the end of the architecture?
Careful with your terminology. When you say a changed in the other "process", that has a specific meaning in VHDL (process is a keyword in VHDL), and your code does not have any processes.
Synthesizers will treat your code as:
a <= c and d;
b <= (c and d) and c;
Simulators will typically assign a in a first pass, then assign b on a second pass one 'delta' later. A delta is an infinitesimal time delay that takes place at the same simulation time as the initial assignment.
Note this is a gross generalization of what really happens...if you want full details, read up on the documentation provided with your tool chain.
Is the second assignment going to see that a changed in the other
process, or are all signals assigned only at the end of the architecture?
It sounds like you are thinking of signal behaviour within a single process when you say this. In that context, the signals are not updated until the end of the process, so the b update would use the "old" value of a.
However, signal assignments not inside a process statement are executed continuously; there is nothing to "trigger" an architecture to "run". Alternatively, you can think of them as individual, separate implied processes (as you have commented), each with a sensitivity list implied by everything on its right-hand side.
In your particular case, the b assignment will use the new value of a, and the assignment will happen one delta-cycle after the a assignment.
For a readable description of how simulation time works in VHDL, see Jan Decaluwe's page here:
http://www.sigasi.com/content/vhdls-crown-jewel
And also this thread might be instructive:
https://groups.google.com/group/comp.lang.vhdl/browse_thread/thread/e47295730b0c3de4/d5bd4532349aadf0?hl=en&ie=UTF-8&q=vhdl+concurrent+assignment#d5bd4532349aadf0