Understanding the Impact of a MIPS Program on a Direct Mapped Data Cache - caching

I am trying to understand the behavior of a direct mapped data cache in a MIPS processor. I have been given a program and asked to determine which sets of the cache have been updated after the program has been executed. I am having trouble determining the set number, valid bit, and tag value of the data cache.
Here is the program in question:
1 lui $t0,0x12ff
2 lw $t1,0x1240($t0)
3 lw $t2,0x5aa4($t0)
4 lw $t3,0x4248($t0)
Can someone help me understand how to approach this problem and what the solution would look like?
Thank you in advance!
I tried solving the problem by first understanding the memory addresses being loaded in the program. I calculated the memory address for each of the load instructions by adding the base address stored in $t0 and the offset. Then I tried to determine the set number by dividing the memory address by the block size of 32 bytes. I expected to get an integer value as the result, but instead, I got a decimal value. I wasn't sure how to proceed after that. I tried looking at the binary representation of the memory address and the block size to see if I could find a pattern, but I was unable to do so. I was stuck at this point and couldn't figure out how to determine the set number.

Related

How does a CPU know if an address in RAM contains an integer, a pre-defined CPU instruction, or any other kind of data?

The reason this gets me confused is that all addresses hold a sequence of 1's and 0's. So how does the CPU differentiate, let's say, 00000100(integer) from 00000100(CPU instruction)?
First of all, different commands have different values (opcodes). That's how the CPU knows what to do.
Finally, the questions remains: What's a command, what's data?
Modern PCs are working with the von Neumann-Architecture ( https://en.wikipedia.org/wiki/John_von_Neumann) where data and opcodes are stored in the same memory space. (There are architectures seperating between these two data types, such as the Harvard architecture)
Explaining everything in Detail would totally be beyond the scope of stackoverflow, most likely the amount of characters per post would not be sufficent.
To answer the question with as few words as possible (Everyone actually working on this level would kill me for the shortcuts in the explanation):
Data in the memory is stored at certain addresses.
Each CPU Advice is basically consisting of 3 different addresses (NOT values - just addresses!):
Adress about what to do
Adress about value
Adress about an additional value
So, assuming an addition should be performed, and you have 3 Adresses available in the memory, the application would Store (in case of 5+7) (I used "verbs" for the instructions)
Adress | Stored Value
1 | ADD
2 | 5
3 | 7
Finally the CPU receives the instruction 1 2 3, which then means ADD 5 7 (These things are order-sensitive! [Command] [v1] [v2])... And now things are getting complicated.
The CPU will move these values (actually not the values, just the adresses of the values) into its registers and then processing it. The exact registers to choose depend on datatype, datasize and opcode.
In the case of the command #1 #2 #3, the CPU will first read these memory addresses, then knowing that ADD 5 7 is desired.
Based on the opcode for ADD the CPU will know:
Put Address #2 into r1
Put Address #3 into r2
Read Memory-Value Stored at the address stored in r1
Read Memory-Value stored at the address stored in r2
Add both values
Write result somewhere in memory
Store Address of where I put the result into r3
Store Address stored in r3 into the Memory-Address stored in r1.
Note that this is simplified. Actually the CPU needs exact instructions on whether its handling a value or address. In Assembly this is done by using
eax (means value stored in register eax)
[eax] (means value stored in memory at the adress stored in the register eax)
The CPU cannot perform calculations on values stored in the memory, so it is quite busy moving values From memory to registers and from registers to memory.
i.e. If you have
eax = 0x2
and in memory
0x2 = 110011
and the instruction
MOV ebx, [eax]
this means: move the value, currently stored at the address, that is currently stored in eax into the register ebx. So finally
ebx = 110011
(This is happening EVERYTIME the CPU does a single calculation!. Memory -> Register -> Memory)
Finally, the demanding application can read its predefined memory address #2,
resulting in address #2568 and then knows, that the outcome of the calculation is stored at adress #2568. Reading that Adress will result in the value 12 (5+7)
This is just a tiny tiny example of whats going on. For a more detailed introduction about this, refer to http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
One cannot really grasp the amount of data movement and calculations done for a simple addition of 2 values. Doing what a CPU does (on paper) would take you several minutes just to calculate "5+7", since there is no "5" and no "7" - Everything is hidden behind an address in memory, pointing to some bits, resulting in different values depending on what the bits at adress 0x1 are instructing...
Short form: The CPU does not know what's stored there, but the instructions tell the CPU how to interpret it.
Let's have a simplified example.
If the CPU is told to add a word (let's say, an 32 bit integer) stored at the location X, it fetches the content of that address and adds it.
If the program counter reaches the same location, the CPU will again fetch this word and execute it as a command.
The CPU (other than security stuff like the NX bit) is blind to whether it's data or code.
The only way data doesn't accidentally get executed as code is by carefully organizing the code to never refer to a location holding data with an instruction meant to operate on code.
When a program is started, the processor starts executing it at a predefined spot. The author of a program written in machine language will have intentionally put the beginning of their program there. From there, that instruction will always end up setting the next location the processor will execute to somewhere this is an instruction. This continues to be the case for all of the instructions that make up the program, unless there is a serious bug in the code.
There are two main ways instructions can set where the processor goes next: jumps/branches, and not explicitly specifying. If the instruction doesn't explicitly specify where to go next, the CPU defaults to the location directly after the current instruction. Contrast that to jumps and branches, which have space to specifically encode the address of the next instruction's address. Jumps always jump to the place specified. Branches check if a condition is true. If it is, the CPU will jump to the encoded location. If the condition is false, it will simply go to the instruction directly after the branch.
Additionally, the a machine language program should never write data to a location that is for instructions, or some other instruction at some future point in the program could try to run what was overwritten with data. Having that happen could cause all sorts of bad things to happen. The data there could have an "opcode" that doesn't match anything the processor knows what to do. Or, the data there could tell the computer to do something completely unintended. Either way, you're in for a bad day. Be glad that your compiler never messes up and accidentally inserts something that does this.
Unfortunately, sometimes the programmer using the compiler messes up, and does something that tells the CPU to write data outside of the area they allocated for data. (A common way this happens in C/C++ is to allocate an array L items long, and use an index >=L when writing data.) Having data written to an area set aside for code is what buffer overflow vulnerabilities are made of. Some program may have a bug that lets a remote machine trick the program into writing data (which the remote machine sent) beyond the end of an area set aside for data, and into an area set aside for code. Then, at some later point, the processor executes that "data" (which, remember, was sent from a remote computer). If the remote computer/attacker was smart, they carefully crafted the "data" that went past the boundary to be valid instructions that do something malicious. (To give them more access, destroy data, send back sensitive data from memory, etc).
this is because an ISA must take into account what a valid set of instructions are and how to encode data: memory address/registers/literals.
see this for more general info on how ISA is designed
https://en.wikipedia.org/wiki/Instruction_set
In short, the operating system tells it where the next instruction is. In the case of x64 there is a special register called rip (instruction pointer) which holds the address of the next instruction to be executed. It will automatically read the data at this address, decode and execute it, and automatically increment rip by the number of bytes of the instruction.
Generally, the OS can mark regions of memory (pages) as holding executable code or not. If an error or exploit tries to modify executable memory an error should occur, similarly if the CPU finds itself trying to execute non-executable memory it will/should also signal an error and terminate the program. Now you're into the wonderful world of software viruses!

Can someone help me with segmentation and 8086 intel's microprocessor?

I am reading about the architecture of intel's 8086 and can't figure out the following things about segmentation: I know that segment registers point to segments respectively and contain the base address of a 64kb long segment. But who calculates and in which point sets the physical address in the segment registers? Also, because one physical address can be accessed by multiple segment:offset pairs and segments can overlap, how you can be sure that you won't overwrite something? Where I can read more about this?
Generally speaking the Assembler will only use offset addresses to access a logical address. For example looking at this code:
start lea si,[hello] ; Load effective address of string
mov word [ds:si+10],0 ; Zero-terminate string after 10th letter
jmp $ ; Loop endlessly
; Fill rest of the segment with 0s
times 65536-($-$$) db 0x00
hello db "I'm just outside of the current segment. Hello!",0
The assembler will try to calculate the offset of 'hello' from the origin of the program. Since no origin is defined 0x0 will be assumed. However the offset of 'hello' would be 0x10000 in this case, which does not fit 16-bits. Therefor the Assembler will truncate the address to 0x0000. It will not change any of the Segment registers. However it will likely issue a warning, for example test.asm:1: warning: word data exceeds bounds. What actually happens when you run this program is that the jmp $ line is overwritten with zeroes, because the address of hello wrapped around and the CPU will start executing nothing but Zeroes, which was not what you intended to do.
That is of course only if the code-segment and data-segment are the same. Now who guarantees that is the case? Nobody really. Especially since I still don't know what platform you are coding for. It is entirely your resposibility to set up the segment registers with correct values. The easiest way to do so is:
push cs ; Push address of code segment to stack
pop ds ; Pop address back into data segment
push cs ; Same for extra data segment
pop es ;
This way you can be certain your you are accessing the offset in the correct-data segment.
Now regarding 'How do you make sure the code segment doesnt overlap the data segment', why shouldn't it? When your program with data is smaller than 64KB it is actually the easiest way to access data if your code and data segment are identical.
And how can you be sure that you don't overwrite anything important? Assembler can't help you with that, you have to check yourself if the segment:offset address you are writing to already contains data.

Hard time understanding execution of machine instructions

Im reading the book "Write great code: understanding the machine" by Randall Hyde, is a great and clear text but here im completely stuck with his explanation of, for example, the mov instruction.
He dissects the steps for the mov(srcReg,destMem) instruction as follows:
1. Fetch the instruction's opcode from memory.
2. Update the EIP register with the address of the byte following the opcode.
3. Decode the instruction's opcode to see what instruction it specifies.
4. Fetch the displacement associated with the memory operand from the memory location immediately
following the opcode.
5. Update EIP to point at the first byte beyond the operand that follows the opcode.
6. If the mov instruction uses a complex addressing mode (for example, the indexed addressing mode),compute the effective address of the destination memory location.
7. Fetch the data from srcReg.
8. Store the fetched value into the destination memory location.
Im lost in steps 4-6. My exact questions are:
Step 4: Why do I need this displacement, how Im gonna use it later and why?
Step5: I understand that in step 2, the EIP must "point" to the next byte where the next instruction to be executed is stored. But I dont understand why does EIP needs to be one byte beyond the operand address. I belived that EIP was concerned only with instructions/opcodes, not data.
Step6: What is exactly and effective address? Are there other types of address?
Step 4:
Some opcodes reference memory that's relative to the opcode's location. For example, a function might have a constant or static piece of data. If it does, the code may opt to place that right before the function starts (or right after it ends) and refer to it by saying "get the memory from 46 bytes earlier". That's the displacement -- it's an offset from the contents of a register (in this case, EIP), used for referencing data relative to the register's contents.
Step 5
The operands for opcodes are normally stored right after the opcode. So you might have some memory arranged like so: a b c. a is and opcode, b is the operand for a and c is the next opcode.
If you only move EIP to the end of a (so it references b), then in the next instruction cycle, the computer will assume that b is the next opcode to execute. b isn't supposed to be an opcode though; it's an operand. The computer can't tell the difference between an opcode and an operand though. It just assumes whatever EIP points to is an instruction and executes it. That's why EIP needs to be moved past the operand too.
Step 6
An "effective" address is just an absolute one (relative to the start of memory) while the "complex" address the book refers to is relative to something else (often the contents of a register).
Step 4 showed that an opcode might not refer to an absolute memory address. It could easily refer to a relative one. In fact, programs very frequently refer to addresses that are relative to some register. For example, if you wrote some_struct.data in C and compiled it for an x86 processor, it would load the address of some_struct into a register (say, EAX), then hard-code data's offset from the base of some_struct into the operand. So if there are 5 bytes of data between the start of the struct and the start of the data element, then the instruction might look like load [EAX + 5] -> EBX which means "take what's in EAX, add 5, fetch the data from that address and put it in EBX".
The thing is, the memory doesn't really understand relative addresses like this. It only understands absolute ones. So in order to access a relative address, the processor has to first add that 5 to whatever's in EAX to compute an absolute address. Then it can send that address to the memory controller and have it understood.
There are two basic types of relative addresses I've worked with (there are more I haven't).
Register relative: The processor takes the contents of a register and uses that as the address in memory. Depending on the opcode and processor support, it may also add an operand to the register as well. Step 4 was dealing with this kind of addressing, with EIP as the register the address was relative to.
Memory relative: Sometimes referred to as "indirect". The processor starts out with a register relative address, then automatically fetches the data at that address and treats it as the real address.
Wikipedia describes lots of other addressing modes on their addressing modes page.
Memory relative took me a while to understand. Say you did a memory relative load where the register contains 10 and the offset is 5. The processor will add them together (10 + 5 = 15). Then, it'll go to that address (15 in this case) and grab whatever's there. If address 15 happens to contain the value 60, then 60 will be treated as the actual address and the processor will load the contents of address 60. If you're familiar with a language with pointers (e.g. C), memory relative is like a pointer-to-a-pointer.

Calculating Cache Memory Hit and Miss, and Calculating Rows in Cache

I am studying an old exam for an upcoming exam, and the final questions consist of what the title describes. Now, I am familiar with assembly language instructions and I somewhat know what the code means. But, what the exam question actually wants me to do is confusing. I would really appreciate if someone could explain this question.
The question:
I am given a cache-memory which has room for 512 bytes and every row is 8 bytes long. The memory is direct-mapped and an "address" is 32 bits long. Also, the cache-memory is empty from the start.
After that, I get some instructions and am supposed to explain if it becomes a cache-hit or cache-miss. It should also be assumed that the instructions are all sequential and all data that is added/modified in an instruction still exists for the next instruction.
The instructions I get are
movia r8, 0xBEDA12C4
ldw r10, 0( r8 )
ldw r11, 8( r8 )
stw r10, 16( r8 )
ldw r10, 24(r8)
ldw r18, 32(r8)
Now I would really appreciate if someone could explain the details to me:
The cache-memory has room for a total of 512 bytes. What is this? Is it the total memory the cache is able to store? Also, I heard from somewhere that this is how you calculate rows in cache. For example, 512 bytes of memory and every row is 16 bytes. 512/16 = 32 rows in cache. For this example 512/8 = 64 rows. Which one is it? What does this mean!?
It also states that every row is 16 bytes long. I've seen the example with TAG, ROW, BYTE where they try to illustrate the cache. But how do I understand the 16 bytes per row? At least it doesn't seem to take part of the length on TAG, ROW, BYTE. What is this for?
Direct-mapped cache. I understand this somewhat. It's just a big row of slots of order which are empty or not, yeah? I found some information on this here.
http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Memory/direct.html
*Updated link: https://web.archive.org/web/20150213025748/http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Memory/direct.html
Now to the main part. How do I calculate for each instruction if it will be a cache miss or hit? My guess is that the first instruction ought to be a miss, since the question said that the cache memory is empty from the start. The second instruction also must be a cache miss but from this point on I am not sure how to calculate if the instruction generates a cache hit or miss. To be honest, I am not even sure what a hit would be.
I would really appreciate if someone could show me how to calculate each step and how I know whether an instruction creates a cache hit or miss. The instructions we get for calculating this are really confusing. Thank you so much!
Generally you have to look at it as at a separate memory space, with only 512 bytes, addressable, readable and writable as arrays 8 bytes each. If you need byte 2, the address will be 0, you read the whole array and select byte 3 from it. If you need byte 8, the address will be 1, and you select byte 0 from the array. Such small memory have one huge advantage - it is fast. It alone can store the contents of some larger memory space, only first 512 bytes. If you store something to address 1 of larger memory space, it will go to that smaller memory instead, the address will become 0 and offset 1, internally for that small amount of memory. If you access beyond that, for example, 1000, you will have to wait more. In this case it would be just memory mapped "registers" - it would be actually faster and better in some cases, than "cache" - unfortunately for some reason processor makers generally won't let you use the cache in that way (probably marketing and support reasons - to sell other products as a separate market share, with higher price).
If you add some more space to each array to store some other value, you can store a part of address there. Without hardware support you could store there virtually anything, that second part is called tag. Now if you have some address fffff000, you can read the second space (assuming that you have the commands to do so), from address 0 - for simplicity and speed you can obtain the address from the primary memory space by masking all the bits except bits 3..8 and 0-2 (which are used to obtain offset in 8 byte array), and check the tag part from that address. One bit in that tag may be used to indicate whether there is something stored there, the other bits may be used to store the part of address from main memory. If you want to save something cached there, you set the bit indicating that the array is not "empty", and assign the upper bits of the main address there, and copy the 8 bytes from the main memory. Next time, before reading something within that range in memory, you read the tag part of the smaller memory array first, then decide whether to read from slow main memory, or from that smaller but faster part (and it would be cache hit).
If you write something with an address of (+-)x512 bytes in main memory, you would have to read the already mentioned array of 8 bytes, copy it into main memory, whole 8 bytes, and write what you want into the very same cell, and then modify the address with a new value. But you would lose the previous copy of your data in the smaller memory area (but faster). If you need the previous value again (any of those 8 bytes), you would have to copy it again from main memory (cache miss).
The same goes for all other arrays of that "cache" memory. So we have a sequence of cache checking, writing, reading and copying the data to or from main memory.
That is called 1 way associativity, for 2 ways there would be one more array (same) of 512 bytes, which can store different addresses though (with the step of 512 from main memory), the tags of those 2 arrays may be checked simultaneously, and if some array has the copy of that memory range it can return it instead of reading it from main memory. Without tag checking (extra cycles for that), the "cache" is essentially a small amount of memory.

This is about memory in Computers

Does a Bit have an address or
Does a Byte ( 8 bits ) have an address or
Does both have addresses..
If u don't understand wat an address is : it is a used to access data in memory
something like 0x0000
Bits do not have an address, in the sense of being retrievable from the memory of a computer.
For convenience and historical reasons, data is addressed as bytes not bits. To work with an individual bit, in most programming languages you have to get the byte that contains the bit you want, and then perform some bit manipulation to convert it into a form that is usable by the language, like a byte or a boolean.
The smallest adressable piece of memory is defined as byte. So an adress is used to define where the first of 8 bits is starting at. Did that help?

Resources