Given the data path of the LC-3 (provided below), give a complete description of the STORE DIRECT instruction (ST SR, label), as follows

Given the data path of the LC-3 (provided below), give a complete description of the STORE DIRECT instruction (ST SR, label), as follows - lc3

I've tried reading the book but I'm not sure exactly how to go about this. Anyone have an idea?

So in doing these problems its best to have the datapath in front of you and have the Register Transfer Language written down, you do these problems step by step, it is a little daunting, but following all of the Digital Logic you have learned it is all a matter of you being a pirate and the datapath being your treasure map.
To do this you just follow the wires in the diagram. I'm using this one which I'm sure is in the Patt/Patel textbook https://i.ytimg.com/vi/PeWbyffnkZ4/maxresdefault.jpg
Mem[PC + SEXT(IR[8:0])] = SR
Clock Cycle 1
So the first thing you need to do is SEXT(IR[8:0]) So where in the datapath is a sign extender and where is the IR. If you look at the ADDR2MUX you see it has 4 inputs each being bits from the IR and one with 0. So ADDR2MUX=IR[8:0]
Next we need to add the PC to it. So from the output of the ADDR2MUX will be the SEXT(IR[8:0]) So next we need to add the PC to that output. Well we see the output of the ADDR2MUX feeds into an adder. So ok we need to set the other adder up with the PC. The ADDR1MUX has an input from the register file and the PC. So ADDR1MUX=PC
Both of these inputs go into the adder and now the output of that adder has PC + SEXT(IR[8:0])
Next we need to store to memory, the address we want to store to is PC + SEXT(IR[8:0]), and what we want to store is SR. So how do we do that? To interface with memory we need to put the address in the MAR (Memory Address Register) and the data we want to store in the MDR. So lets do the MAR step first. So we need to put the result of the ADDER into the MAR. The only path we can take is the MARMUX. So MARMUX=ADDER. We need to Gate the MARMUX to put it out on the bus as well. So GateMARMUX.
The value of the MARMUX is now out onto the bus so we want to latch that into the MAR so LDMAR.
We need a new clock cycle now because we need to wait for the value to latch into the register which happens at the beginning of a new clock cycle.
Clock Cycle 1 Signals - ADDR2MUX=IR[8:0], ADDR1MUX=PC, MARMUX=ADDER, GateMARMUX, LDMAR
Clock Cycle 2
Now lets the source register into the MDR. Looking at the diagram we need a path from the register file to the BUS to get it into the MDR. There's actually two ways of doing this one going through ADDR1MUX and one going through the ALU.
I will take the ALU path as its slightly more correct.
First we need to make SR1 be the source register from the instruction so SR1MUX=[11:9]
The Source register from the instruction now comes out the register file from the SR1 output, this feeds into the ALU. So we must choose the operation the ALU does so, ALUK=PASSA. PASSA simply makes the output of the ALU the 'A' input.
We then need to put the ALU output on the bus so GateALU
Now the ALU output is on the bus and we want to store this in the MDR, however there is a MUX blocking that input. MIO.EN=0 to select the bus output to go into the MDR. Then we need to latch that value into the register so LDMDR
We just tried to load a value into a register so it will not be available in the output until the start of the next clock cycle so..
Clock Cycle 2 Signals - SR1MUX=[11:9], ALUK=PASSA, GateALU, MIO.EN=0, LDMDR
Clock Cycle 3
All we need to do is give memory a good ol kick to store the value in the MDR at the address in the MAR so... MIO.EN=1, R/W=W
Clock Cycle 3 signals - MIO.EN=1, R/W=W
So this concludes the ST instruction it takes 3 clock cycles and the signals to assert each clock cycle are indicated at the end of each clock cycle. All of those signals per clock cycle can be turned on at the same time.

Related

Why does the single cycle processor not incur register latency on both read and write?

I wonder why last register write latency(200) is not added?
To be more precise, critical path is determined by load instruction's
latency, so then why critical path is not
I-Mem + Regs + Mux + ALU + D-Mem + MUX + Regs
but is actually
I-Mem + Regs + Mux + ALU + D-Mem + MUX
Background
Figure 4.2
In the following three problems, assume that we are starting with a
datapath from Figure 4.2, where I-Mem, Add, Mux, ALU, Regs, D-Mem,
and Control blocks have latencies of 400 ps, 100 ps, 30 ps, 120 ps,
200 ps, 350 ps, and 100 ps, respectively, and costs of 1000, 30, 10,
100, 200, 2000, and 500, respectively.
And I find solution like below
Cycle Time Without improvement = I-Mem + Regs + Mux + ALU + D-Mem +
Mux = 400 + 200 + 30 + 120 + 350 + 30 = 1130
Cycle Time With improvement = 1130 + 300 = 1430

It is a good question as to whether it requires two Regs latencies.
The register write is a capture of the output of one cycle.  It happens at the end of one clock cycle, and the start of the next — it is the clock cycle edge/transition to the next cycle that causes the capture.
In one sense, the written output of one instruction effectively happens in parallel with the early operations of the next instruction, including the register reads, with the only requirement for this overlap being that the next instruction must be able to read the output of the prior instruction instead of a stale register value.  And this is possible because the written data was already available at the very top/beginning of the current cycle (before the transition, in fact).
The PC works the same: at the clock transition from one cycle's end to another cycle's start, the value for the new PC is captured and then released to the I Mem.  So, the read and write effectively happen in parallel, with the only requirement then being that the read value sent to I Mem is the new one.
This is the fundamental way that cycles work: enregistered values start a cycle, then combinational logic computes new values that are captured at the end of the cycle (aka the start of the next cycle) and form the program state available for the start of the new cycle.  So one cycle does state -> processing -> (new) state and then the cycle repeats.
In the case of the PC, you might ask why we need a register at all?
(For the 32 CPU registers it is obvious that they are needed to provide the machine code program with lasting values (one instruction outputs register, say, $a0 and that register may be used many instructions later, or maybe even used many times before being changed.))
But one could speculate what might happen without a PC register (and the clocking that dictates its capture), and the answer there is that we don't want the PC to change until the instruction is completed, which is dictated by the clock.  Without the clock and the register, the PC could run ahead of the rest of the design, since much of the PC computation is not on the critical path (this would cause instability of the design).  But as we want the PC to hold stable for the whole clock cycle, and change only when the clock says the instruction is over, a register is use (and the clocked update of it).

Xilinx FPGA output to output timing constraints

I have a Spartan-6/ISE design where I'm generating 8-bit data # 70MHz to feed the FIFO of a Cypress FX3 USB3 controller. I also generate a 70MHz o/p clock and /WR strobe that clock data into the USB controller. The 70MHz is derived from halving the 140MHz system clock, divided by 2 in a process rather than using a DPLL, though the 140MHz system clock is produced using a DPLL.
I want to ensure the 8-bit data meets the set-up & hold time requirements of the USB controller and, although the data, o/p clock and /WR are derived from the 140MHz, I don't really care about their relationship to it. What I'm really concerned about is ensuring the set-up & hold times for data & /WR w.r.t the 70MHz o/p clock are within the USB controller limits.
My question is: how do I go about specifying timing constraints between FPGA outputs rather than w.r.t. to the internal system clock ?
Thanks
Dave

IR emitter and PWM output

I have been using FRDM_KL46Z development board to do some IR communication experiment. Right now, I got two PWM outputs with same setting (50% duty cycle, 38 kHz) had different voltage levels. When both were idle, one was 1.56V, but another was 3.30V. When the outputs were used to power the same IR emitter, the voltages were changed to 1.13V and 2.29V.
And why couldn't I use one PWM output to power two IR emitters at the same time? When I tried to do this, it seemed that the frequency was changed, so two IR receivers could not work.

I am not an expert in freescale, but how are you controlling your pwm? I'm guessing each pwm comes from a separate timer, maybe they are set up differently. Like one is in 16 bit mode (the 3.3V) and the other in 32 (1.56v) in that case even if they have the same limit in the counter ((2^17 - 1) / 2) would be 50% duty cycle of a 16 bit timer. But in a 32 bit, that same value would only be 25% duty so, one output would be ~1/2 the voltage of the other. SO I suggest checking the timer setup.
The reason the voltage changed is because the IR emmiters were loading the circuit. In an ideal situation this wouldn't happen, but if a source is giving too much current the voltage usually drops a bit.

SET A VALUE TO PIN BY MONITORING SAME PIN IN VHDL

Is it possible to set an inout pin to specific value when after monitoring the value in same pin.ie if we have an inout signal then if value on that signal is one then after doing specific operation can we set value of that pin to zero in vhdl.

What you are describing doesn't make a lot of sense. Are you sure you are understanding the requirements correctly?
Your load signal sounds like an external control signal that is an input into your module. You should not be trying to change the value of that signal - whoever is controlling your module should do that instead.
As long as the load signal is asserted (1), you should probably be loading your shift register with whatever value is presumably being provided on a different input signal (e.g., parallel_data). When the load signal is deasserted (0) by the external logic, you should probably start shifting out one bit of the loaded data per clock cycle to your output signal (e.g., serial_data).
Note that there is no need for bidirectional signals!
This is all based on what I would consider typical behavior for a shift register, and may or may not match what you are trying to achieve.

This doesn't sound like a good plan, and I'm not entirely sure you want to do it, but I guess if you can set things up such that:
you have a resistor pulling the wire down to ground.
your outside device drives the wire high
the FPGA captures the pin going high and then also drives it high
The outside source goes tristate once it has seen the pin go high
the FPGA can then set the pin tristate when it wants to flag it has finished (or whatever), and the resistor will pull it low again
repeat
I imagine one use for this would be for the outside device to trigger some processing and the FPGA to indicate when it has finished, in which case, the FPGA code could be something like:
pin_name <= '1' when fpga_is_processing = '1' else 'Z';
start_processing <= '1' when pin_name = '1' and pin_name_last = '0';
pin_name_last <= pin_name when rising_edge(clk);
start processing will produce a single clock pulse on the rising edge of the pin_name signal. fpga_is_processing would be an output from your processing block, which must "come back" before he external device has stopped driving the pin high.
You may want to "denoise" the edge-detector on the pin_name signal to reduce the chances of external glitches triggering your processing. There are various ways to achieve that also.

Implementing delay in VHDL state machine

I'm writing a state machine which controls data flow from a chip by setting and reading read/write enables. My clock is running at 27 MHz giving a period of 37 ns. However the specification for the chip I'm communicating with requires I hold my 'read request' signal for at least 50 ns. Of course this isn't possible to do in one cycle since my period is 37 ns.
I have considered I could create an additional state which does nothing but flag the next state to be the one I actually complete the read on, hence adding another period delay (meaning I hold 'read request' for 74 ns), but this doesn't sound like good practice.
The other option is perhaps to use a counter, but I wonder if there's perhaps yet another option I haven't visited yet?
How should one implement delay in a state machine when a state should last longer than one clock period?
Thanks!
(T1 must be greater than 50 ns)
Please see here for the full datasheet.

Delays are only reliably doable using the clock - adding an extra "tick" either via an extra state or using a counter in the existing state is perfectly acceptable to my mind. The counter has the possibility of being more flexible if you re-use the same state machine with a slower external chip (or if you use a different clock frequency to feed the FPGA) - you can just change the maximum count, instead of adding multiple "wait" states to the state machine.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio