MSI: When shared and invalid states can occur at the same time - caching

So, as the title says, is it possible that Processor 0 has line A with a Shared (S) state, and Processor 1 has line B with an Invalid (I) state?
Imagine the following situation:
P0: Line A | Modified
P1: Line A | Invalid
P2: Line A | Invalid
If P2 makes a read request for Line A, what is P1's final state: Shared, or does it stay Invalid?

TL;DR
P1's line A will still be in Invalid state. An Invalid state cannot be changed by other processors' actions.
Both P0 and P2 will have line A in Shared state.
Wikipedia's page for the MSI protocol gives a state machine description of the algorithm.
Along with the English description, there are two pictures.
Given a set of processors and their relative cache lines, a processor can either be "active" by making one action among "load/read" and "store/write", or can be "passive" by snooping an event on the bus.
The input to the MSI (and similar) protocol is either an action or a bus event. For simplicity, Wikipedia splits the state machine in two: one for when the input is an action and one for when the input is a bus event.
This way you can use one picture to calculate the new states for the lines of the active processor (which is exactly one) and the other picture for the states of the lines of the passive processors.
Let's say processor X is the active one, thus making a load or a store.
The first picture describes how the cache lines state changes for processor X (the active processor):
Each label has the form x/y where x is the input action (either PrRd for a load/read or PrWr for a store/write) and y is the bus event that is emitted ("-" means no event is emitted on the bus).
The second picture is used similarly but for the passive processors (any processor but processor X):
Each label here is again a x/y pair but x is a bus event and y is a bus action.
The bus events are:
BusRd -> Another processor needs to read a line from memory (or an upper cache).
BusRdX -> Another processor needs to read a line from memory (or an upper cache) and will then immediately modify it (e.g. because it is performing a write).
BusUpgr -> Another processor just wrote to one of its cache lines that had only been read so far.
Of course, a Flush is the act of writing a line to memory. Wikipedia considers it a bus transaction (I consider it a bus action since it's not used as an input).
We are now ready to answer your question.
P2 is the active processor and needs to read line A, so it performs a PrRd. The first picture tells us that its line A will end up in S state and that a BusRd is issued on the bus (note that this is a mental model; real hardware probably won't send a special transaction, but will rather detect the read itself).
P2: LineA -> Shared
Both P0 and P1 are passive processors and both see the BusRd.
P0 has the line in state Modified, the second picture tells us that it will flush the line (making the last value available to P2) and set line A to state Shared.
P1 has the line in state Invalid, from the second picture we see that there is no way to escape an Invalid state for a passive processor. Specifically, the BusRd input will set the state of line A to Invalid again (it is actually ignored).
So after the read from P2 we have:
P0: LineA -> Shared
P1: LineA -> Invalid
P2: LineA -> Shared
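The walk-through above can be checked with a small simulation. This is only a sketch of the textbook MSI tables as described in this answer; the names `PROCESSOR`, `SNOOP` and `step` are made up for illustration, and the model assumes a single cache line and ignores the data itself.

```python
# Processor-side transitions: (state, action) -> (new state, bus event emitted)
PROCESSOR = {
    ("I", "PrRd"): ("S", "BusRd"),
    ("I", "PrWr"): ("M", "BusRdX"),
    ("S", "PrRd"): ("S", None),
    ("S", "PrWr"): ("M", "BusUpgr"),
    ("M", "PrRd"): ("M", None),
    ("M", "PrWr"): ("M", None),
}

# Snoop-side transitions: (state, bus event) -> new state.
# Pairs not listed (notably every event seen in state I) leave the state unchanged.
SNOOP = {
    ("M", "BusRd"): "S",    # flush, then share
    ("M", "BusRdX"): "I",   # flush, then invalidate
    ("S", "BusRdX"): "I",
    ("S", "BusUpgr"): "I",
}

def step(states, active, action):
    """Apply one action by the active processor and return the new line states."""
    new_states = dict(states)
    new_states[active], event = PROCESSOR[(states[active], action)]
    if event is not None:
        # Every other processor is passive and snoops the bus event.
        for cpu, state in states.items():
            if cpu != active:
                new_states[cpu] = SNOOP.get((state, event), state)
    return new_states

before = {"P0": "M", "P1": "I", "P2": "I"}
after = step(before, "P2", "PrRd")
print(after)
```

Because there is no snoop entry that starts from I, P1 can only leave the Invalid state by issuing its own PrRd or PrWr.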

Related

Proper way to change state on a state machine in VHDL

I'm working on a FPGA project where I need to read data from an image sensor. This sensor has different image modes (like test pattern, frame, binning, etc.) and in order to change image mode I need to look for specific signals before writing into the registers.
I have inherited some code that I need to fix since the image sensor sometimes gets stuck when we change image mode.
Concerning the change of image mode, a state machine is used.
The following piece of code shows how the registers for changing mode are currently written.
Essentially, when we want to change mode, we need to wait until the signal MODE_SIG_HIGH becomes high before writing into the registers. Then, when this condition happens, we check what mode we want to set. For example, to set the test pattern, we check if bit S2 is set. Then we perform all the operations to actually change mode (line 10).
01. ...
02. WHEN MODE_SIG_HIGH =>
03. NEXT_ST <= MODE_SIG_HIGH;
04. ...
05. IF S2 = '1' THEN
06. -- configure the sensor to
07. NEXT_ST <= CONFIGURE_TEST_PATTERN;
08. END IF;
09. ...
10. WHEN CONFIGURE_TEST_PATTERN =>
11. ...
I'm having a debate with a friend of mine concerning what is the best way to change state when a new event happens. The above solution doesn't seem right to me.
As far as I understood, when we enter a state, all the instructions contained in that state are performed in parallel. Therefore, concerning the above piece of code, when we enter the state MODE_SIG_HIGH the instruction at line 03 is executed in parallel with the IF condition. My point is that if the bit S2 is set to 1, the IF condition is true and we end up assigning the value CONFIGURE_TEST_PATTERN to NEXT_ST. And this ends up assigning two different values to the same variable (in parallel), in line 03 and in line 07. Am I right or am I missing some basic behavior? The reason for having the instruction at line 03 is that after we enter MODE_SIG_HIGH, it could take some clock cycles before we see one of the mode bits set.
As far as I understood, when we enter a state, all the instructions
contained in that state are performed in parallel.
Not quite. The only things in VHDL which are concurrent ('performed in parallel') are:
processes
concurrent signal assignments
component instantiations
concurrent procedure calls
concurrent assertions (including PSL)
generates
blocks
The code inside a process or subprogram (function/procedure) executes sequentially. This is where you do your conventional programming, using sequential statements (i.e. nothing in the list above). These are your standard control constructs (if, case, loop, etc.), sequential signal assignments, and so on. If you carry out signal (or variable) assignments in a sequential region, the last one wins, just like in a conventional programming language. There are scheduling rules that make this happen, but you don't need to know about those (yet!).
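To make the "last one wins" rule concrete for the code in the question, here is a rough Python model of the MODE_SIG_HIGH branch. It is not VHDL and it ignores signal scheduling entirely; the function name `next_state` and the fall-through default for other states are assumptions made for illustration only.

```python
def next_state(state, s2):
    """Model the sequential assignments to NEXT_ST inside the process."""
    next_st = state  # hypothetical default for the states the question elides
    if state == "MODE_SIG_HIGH":
        next_st = "MODE_SIG_HIGH"               # line 03: default, stay in this state
        if s2 == "1":
            next_st = "CONFIGURE_TEST_PATTERN"  # line 07: later assignment overrides line 03
    return next_st

print(next_state("MODE_SIG_HIGH", "0"))
print(next_state("MODE_SIG_HIGH", "1"))
```

Read this way, line 03 is just a default: the state machine stays in MODE_SIG_HIGH until one of the mode bits is seen, and there is no conflict between lines 03 and 07.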

How does the output register data path work in the 6502?

I am currently developing a subset of the 6502 in LogiSim and at the current stage I am determining which parts to implement and what can be cut out. One of my main resources is Hanson's Block Diagram.
I am currently trying to determine how exactly the output register and its data path work. In this diagram, it looks to me like the data output register goes back onto the bus through the Input Data Latch, but also back into the instruction register.
This confuses me because usually the Address lines to the right of the diagram are sent back into the program memory (not pictured) and not back onto the bus as pictured.
How exactly does this data path work? As a follow up, Is it possible to simplify this area to only take the output and send it to a display instead of back into the processor as pictured?
This confuses me because usually the Address lines to the right of the diagram are sent back into the program memory (not pictured) and not back onto the bus as pictured.
The address bus works differently from the data bus. The address bus is always Output, but the data bus can be Input or Output. We say that the databus is tristate; it either reads, or writes, or does neither. Each pin d0 thru d7 has a simple circuit involving a couple of transistors that controls this. In the case of the 6502, each and every cycle the CPU is either reading something or writing something. In other words, from the 6502's point of view, every cycle is either a read or write cycle.
I am currently trying to determine how exactly the output register and its data path works.
Have a look: the Input Data Latch and Predecode Register are loaded with each φ2. But the Output Data Register is loaded with each φ1. φ1 and φ2 are the two phases of the CPU clock. This arrangement leaves enough time for a value to pass, for example, from the Input Data Latch, through the ALU, and into the Output Data Register.
The Data Output Register's output goes to the Data Bus Tristate Buffers. As you can see, these are controlled by R/W and also by φ2. If it's a read cycle, nothing happens there. If it's a write cycle, the value in the Data Output Register (which was loaded with the previous φ1) is put onto the databus. It will also get loaded into the Predecode Register and into the Input Data Latch.
In this diagram, it looks to me like the data output register goes back onto the bus through the Input Data Latch, but also back into the instruction register.
Absolutely. Anything that the CPU outputs could also get loaded into the Input Data Latch and the Predecode Register. But that doesn't matter, since an instruction will always start with a read cycle, which is the opcode fetch, so the Input Data Latch and the Predecode Register will get overwritten then with the proper value.

what happens during context switch between two processes in linux?

Let's say process p1 is executing with its own address space (stack, heap, text). When a context switch happens, I understand that all the current CPU registers are pushed into the PCB before loading process p2. Then the TLB is flushed and loaded with p2's address mappings, and p2 starts executing with its own address space.
What I would like to know is the state of p1's address space. Will it be copied to disk, with its page table updated, before process p2 is loaded?
The specifics of a context switch depend upon the underlying hardware. However, context switches are basically the same, even among different systems.
The mistake you have is "I understand that all the current cpu registers are pushed into stack before loading process p2". The registers are stored in an area of memory that is usually called the PROCESS CONTEXT BLOCK (or PCB), whose structure is defined by the processor. Most processors have instructions for loading and saving the process context (i.e., its registers) into this structure. In the case of Intel, this can require multiple instructions saving to multiple blocks because of all the different register sets (e.g. FPU, MMX).
The outgoing process does not have to be written to disk. It may be paged out if the system needs more memory, but it is possible that it could stay entirely in memory and be ready to execute.
A context switch is simply the exchange of one process's saved register values for another's.
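As a toy illustration of that exchange, here is a Python sketch. The `PCB` and `CPU` classes, the register names and the `context_switch` function are all invented for this example; a real kernel does this with processor-specific instructions, not dictionaries.

```python
from dataclasses import dataclass, field

@dataclass
class PCB:
    """Hypothetical per-process save area for register values."""
    pid: int
    registers: dict = field(default_factory=dict)

class CPU:
    def __init__(self):
        # Made-up register file; real register sets are much larger.
        self.registers = {"pc": 0, "sp": 0, "r0": 0}

def context_switch(cpu, outgoing, incoming):
    # Save the running process's registers into its PCB...
    outgoing.registers = dict(cpu.registers)
    # ...then load the next process's saved registers onto the CPU.
    cpu.registers = dict(incoming.registers)

cpu = CPU()
p1 = PCB(pid=1)
p2 = PCB(pid=2, registers={"pc": 0x4000, "sp": 0x7FF0, "r0": 42})

cpu.registers.update(pc=0x1000, sp=0x8000, r0=7)  # p1 has been running
context_switch(cpu, p1, p2)
```

Note that nothing here touches the memory of p1: only register values move, which is why the outgoing address space can simply stay where it is.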

How do I read the status register of a Virtex 5 in a JTAG chain?

I'm working on an XUPV5-LX110T and I'm trying to read the status register over JTAG. I'm getting incorrect data, but I can't see why. I seem to be getting all zeros.
I suspect it has to do with the order of the JTAG chain, but I'm not sure how I should adjust the order of the commands I send.
I know the TMS bits will change the state of all the devices on the chain, but how do you shift data into the FPGA when it's the last device on the chain?
I've actually worked on this same device. If I'm correct, when you look at the JTAG chain in iMPACT, you should see 5 devices: two PROMs, a SystemAce, and a CPLD, followed by the Virtex 5 as the final item on the chain. Like this:
PROM -> PROM -> SysAce -> CPLD -> Virtex5
In order to read the status register successfully, you will need to understand how the TAP Controller works:
(source: fpga4fun.com)
Like you said, the TMS signals are connected to all the devices on the JTAG chain. That is, if you're in the Test-Logic-Reset state and send in 0 1 1 0 0, all devices will now be in the Shift-IR state.
Next, you will want to know the size of all of the Instruction Registers of the devices on your JTAG chain. In this case, the two PROMs have IR size of 16 bits. The SysAce and CPLD have IR size of 8-bits. You want to know these sizes so that you know how much data to shift down the chain. The Virtex 5 has an IR size of 10-bits.
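To show how those sizes add up, here is a hedged Python sketch that builds the bit stream for one Shift-IR pass. The chain order and IR lengths come from this answer; `build_ir_stream` and `PLACEHOLDER_OPCODE` are invented names, and the opcode value is a dummy, so look up the real instruction codes in the Virtex 5 Configuration Guide.

```python
# Chain order from TDI to TDO, with the IR lengths stated above.
CHAIN = [("PROM0", 16), ("PROM1", 16), ("SysAce", 8), ("CPLD", 8), ("Virtex5", 10)]

def build_ir_stream(chain, target, opcode):
    """Return the TDI bits in the order they are shifted in."""
    bits = []
    # Bits shifted in first travel furthest, so start with the device nearest TDO.
    for name, ir_len in reversed(chain):
        if name == target:
            # Instruction bits go out least significant bit first.
            bits.extend((opcode >> i) & 1 for i in range(ir_len))
        else:
            # An all-ones instruction selects BYPASS.
            bits.extend([1] * ir_len)
    return bits

PLACEHOLDER_OPCODE = 0b0000000101  # dummy value, NOT a real Virtex 5 instruction
stream = build_ir_stream(CHAIN, "Virtex5", PLACEHOLDER_OPCODE)
print(len(stream))
```

With this chain the whole IR shift is 58 bits long: 10 for the FPGA followed by 48 ones for the four bypassed devices.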
The final trick to working with JTAG is noting that when sending in commands, they are transmitted on TDI LSB-first, but data shifted into the DR is sent MSB-first. Make sure to check which way is which in the Virtex 5 Configuration Guide.
With these pieces of information, you can read the status register like this pseudocode:
unsigned int read_status_register {
reset JTAG to Test-Logic-Reset by sending five 1s on TMS
go into Shift-IR state
// The order of this depends on your JTAG chain
Send CONFIG_IN on TDI (these 10 bits will eventually get pushed to the Virtex 5's IR)
Send eight 1's to put the CPLD in BYPASS
Send eight 1's to put the SysAce in BYPASS
Send sixteen 1s to put the next PROM in bypass
Send fifteen 1s to put the last PROM in bypass
// As described in the configuration guide
Send the last 1 on TDI while transitioning from Shift-IR to the Exit state
Transition back to Test-Logic-Reset
Transition to Shift-DR
Shift in the command sequence (sync word, noop, read_status, noop, noop)
Shift in 3 bits to push the command sequence past the other devices on the chain
Shift in 1 more bit while transitioning to exit
Transition to Shift-IR
Shift in CONFIG_OUT
Shift in 1's to put all devices in BYPASS like we did above
Transition to Shift-DR
Shift out 32-bits and save the data coming from TDO
// Note that we can stop here because the FPGA is the last device
// on the chain. Otherwise, you may need to shift in a couple of bits
// to push the data past other devices on the chain
}
As you can see, it's basically all about making the right state transitions, and knowing the order to send things. Good luck!
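If you want to sanity-check a TMS sequence before sending it to hardware, a small model of the standard 16-state TAP controller helps. This is a sketch in Python; the table follows the usual IEEE 1149.1 state diagram, and `TAP` and `walk` are names chosen for this example.

```python
# state -> (next state when TMS=0, next state when TMS=1)
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}

def walk(state, tms_bits):
    """Follow a sequence of TMS values (one per TCK) and return the final state."""
    for bit in tms_bits:
        state = TAP[state][bit]
    return state

print(walk("Test-Logic-Reset", [0, 1, 1, 0, 0]))  # path into the IR column
print(walk("Test-Logic-Reset", [0, 1, 0, 0]))     # path into the DR column
```

The same table also confirms the reset trick used at the top of the pseudocode: five 1s on TMS reach Test-Logic-Reset from any starting state.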

Is it necessary to have "control unit next state register" to be FALLING edge triggered?

Each module can be considered to have the following capabilities:
[1] It can store data.
[2] It can operate on the data (arithmetic operations).
Some properties of the modules (listing only the one I am concerned with right now):
[1] All register/memory elements in the modules are RISING edge triggered.
Now this architecture can be used to create a model of a computer processor.
Real Deal:
Is it necessary to have "control unit next state register" to be FALLING edge triggered?
(below i explain why i think so)
CLOCK:
|------| |------|[1] |------| |------|
_____| |_________| |_________| |_________| |____
|----|
Data should be valid in this region at least.(considering the setup/hold time).
|----------------|[1]
____________| |_________
So the write signal should be up (if control unit want to) in this region.
These control signals are just the combinational result of the inputs and the CURRENT STATE.
So as the current state changes, the control signals change, which implies the state should change at the falling edge [1].
So a change of state is simply a change in the "control unit state register", which happens at the falling edge of the clock.
That's why I ask whether it is necessary for the "control unit next state register" to be FALLING edge triggered. Am I thinking about this correctly?
If yes, then the same (a falling-edge-triggered control unit state register) should be happening in an actual processor as well.
I am still learning, so please forgive and correct my mistakes.
A common way to handle this is to consider the rising edge of the clock to trigger the “fetch” cycle, and the falling edge to trigger the “execute” cycle.
During “fetch” the memory address is incremented and data from memory is allowed to stabilize and propagate to the control circuits (such as the ALU's settings, demultiplexers to control things, multiplexers to sample states for conditional tests, shift logic setup, etc.).
During “execute” the things being controlled by the control circuit outputs are triggered (e.g. the test state being read by a multiplexer would be tested, and if true a branch might be taken by loading the program counter with the branch address, so that during the next fetch cycle the system would load the next instruction from the branch address instead of simply incrementing to the next address in memory).
ANSWERED by: a generous man "BL" (name initials)

Resources