We express the target address in the JUMP instruction with only 26 bits, so the final address depends on the current PC+4 value.
Consider the case where the instruction I want to jump to is located in my code at
0010&[my_target = 26 bits]&00
but instead we jump to the wrong address that starts with 0011 (to 0011&[my_target = 26 bits]&00), because PC+4 has already been incremented to a value whose upper 4 bits changed (from 0010 to 0011).
Is this case really possible?
If it is, how can it be solved?
Thanks
Is this case really possible?
Yes, it is. Quoting from MIPS32™ Architecture For Programmers Volume II: The MIPS32™ Instruction Set:
When the jump instruction is in the last word of a 256 MB region, it can branch only to the following 256 MB region containing the branch delay slot.
If it is, how can it be solved?
Load the address you want to jump to into a register and then use the jr instruction.
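To see the boundary case in numbers, here is a small Python sketch (not from the quoted manual) of how a j/jal destination is formed: the upper 4 bits come from PC+4, the 26-bit target field supplies the rest, and two zero bits are appended. The addresses are made-up examples. A jr, by contrast, jumps to the full 32-bit value in a register (e.g. one loaded with lui/ori), so it has no such region restriction.

```python
def j_destination(pc: int, target_field: int) -> int:
    """Where a MIPS j/jal located at address `pc` actually goes."""
    region = (pc + 4) & 0xF000_0000  # top 4 bits come from PC+4 (the delay slot)
    return region | ((target_field & 0x03FF_FFFF) << 2)

# 26-bit field the assembler would emit for a jump to 0x20001000.
field = (0x2000_1000 >> 2) & 0x03FF_FFFF

print(hex(j_destination(0x2000_0000, field)))  # 0x20001000: same 256 MB region
print(hex(j_destination(0x2FFF_FFFC, field)))  # 0x30001000: j in the region's last word
```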
I am currently writing a small debugger in assembly on the Windows platform.
I open the debuggee process as follows:
invoke CreateProcess, addr buffer, NULL, NULL, NULL, FALSE, DEBUG_PROCESS+DEBUG_ONLY_THIS_PROCESS, NULL, NULL, addr startinfo, addr pi
It works well: I can get EIP from the debuggee's thread context, so I can read the first byte of the instruction that will be executed next.
However, I need to get the size in bytes of the previously executed instruction.
Instructions do not have a fixed size. Sometimes an instruction is just 1 byte, other times 6 bytes or more.
I tried subtracting the previous EIP from the current EIP to get the number of bytes executed, but it doesn't work after a jmp or a call, because execution has moved to a different address.
I planned to build a map of all opcodes and compare against it, but that seems like a huge amount of work.
If you have any idea how to get the size of the previously executed instruction (maybe by looking into a cache or something similar), please let me know.
Best regards
TL;DR
Keep it simple: single-step, decode only the branch instructions, and use EIP - last EIP unless the last instruction was a branch (in which case use the decoded length).
If an unknown instruction is found, back off and don't provide its size.
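It's impossible to decode an x86 instruction stream backward, because x86 encoding is not symmetric with respect to address growth. To see this, consider mov eax, 90909090h: its bytes are B8 90 90 90 90, and each trailing 90 byte is also a complete nop instruction, so looking backward from the end you cannot tell where the last instruction started.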
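The asymmetry can be demonstrated with a toy forward decoder. This Python sketch knows only two opcodes (an assumption for illustration, not a real x86 length decoder): 0xB8 (mov eax, imm32, 5 bytes) and 0x90 (nop, 1 byte). Decoding the same five bytes from different starting offsets yields different, equally valid instruction boundaries.

```python
# Toy length table: only the two opcodes needed for this example.
TOY_LENGTHS = {0xB8: 5, 0x90: 1}  # mov eax, imm32 / nop

def decode_boundaries(code: bytes, start: int) -> list[tuple[int, int]]:
    """Decode forward from `start`, returning (offset, length) pairs."""
    out = []
    pos = start
    while pos < len(code):
        length = TOY_LENGTHS[code[pos]]
        out.append((pos, length))
        pos += length
    return out

code = bytes([0xB8, 0x90, 0x90, 0x90, 0x90])  # mov eax, 90909090h
print(decode_boundaries(code, 0))  # [(0, 5)]
print(decode_boundaries(code, 1))  # [(1, 1), (2, 1), (3, 1), (4, 1)]
```

Both decodings end at offset 5, so standing there and looking backward, the final 90 byte could be the tail of the mov or a standalone nop.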
So you need to disassemble each instruction as you single step through the program (a debugger needs this anyway) and record its size.
Control-transfer instructions are far fewer than instructions overall, so you could decode just those and otherwise use the EIP - EIP' trick (where EIP' is the EIP of the last instruction).
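A minimal Python sketch of this EIP-delta approach, assuming the debugger has already recorded the EIP before each single step and has some decoder that recognizes control-transfer instructions. `branch_length` is a hypothetical hook, not a Windows API, and the trace below is made up. Implausible deltas are reported as unknown rather than guessed.

```python
from typing import Callable, Optional

MAX_X86_INSN_LEN = 15  # no x86 instruction is longer than 15 bytes

def instruction_sizes(
    trace: list[int],
    branch_length: Callable[[int], Optional[int]],
) -> list[Optional[int]]:
    """Estimate the size of each executed instruction except the last one.

    trace[i] is the EIP recorded before the i-th single step.
    branch_length(addr) is a hypothetical decoder hook: it returns the length
    of the control-transfer instruction at addr, or None if it is not one.
    A None entry in the result means "size unknown".
    """
    sizes: list[Optional[int]] = []
    for prev, cur in zip(trace, trace[1:]):
        length = branch_length(prev)
        if length is None:
            # Not a branch: execution fell through, so the EIP delta is the size.
            delta = cur - prev
            # Back off on implausible deltas (e.g. a REP-prefixed string
            # instruction may be single-stepped one iteration at a time,
            # leaving EIP unchanged).
            length = delta if 0 < delta <= MAX_X86_INSN_LEN else None
        sizes.append(length)
    return sizes

# Hypothetical trace: a 5-byte mov at 0x1000, a 5-byte jmp at 0x1005 to 0x2000,
# then a 1-byte nop at 0x2000.
branches = {0x1005: 5}
print(instruction_sizes([0x1000, 0x1005, 0x2000, 0x2001], branches.get))  # [5, 5, 1]
```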
Intel processors support Last Branch Recording, but it requires OS support and you'd need to post-process the data anyway, so it seems too burdensome.
A similar argument can be made for the Intel Processor Trace technology.
I can't think of any performance-counter event (granted that you can use them at all) that would yield the number of bytes of an instruction.
In the back end, the concept of "instruction" has been reduced to a sequence of uops (probably with a bit marking the last uop of an instruction), and the front end is mostly decoupled from the architectural value of eip (it almost always works with a speculative value), so it may be several instructions ahead of the back end.
I believe each uop probably has a field recording how to update eip at retirement, but not the size of the instruction in bytes.
Similarly, in the front end an instruction's length in bytes is recorded only in the pre-decode stage; after that I think it's discarded (I can't think of any use for it).
Instructions in the L1 instruction cache are not yet decoded, so even if there was a way to inspect their content and metadata there would be nothing there.
The usual way this is done is by making a trace: single-step through the program, disassemble the instruction at eip (see below), record its size, resume the program, and repeat until a stop condition.
This gives you a list of addresses and instruction sizes.
If you find an instruction you can't decode, you either don't record its size or try to estimate it with some heuristic (its length must be less than 16 bytes, and you could in theory combine the data with the count from a PMC like BR_INST_RETIRED.ALL_BRANCHES).
It's possible to detect the size of an instruction at runtime, but that's not practical in this context.
What confuses me is that every address holds a sequence of 1s and 0s. So how does the CPU tell apart, say, 00000100 (an integer) from 00000100 (a CPU instruction)?
First of all, different commands have different values (opcodes). That's how the CPU knows what to do.
Finally, the question remains: what's a command, and what's data?
Modern PCs use the von Neumann architecture (https://en.wikipedia.org/wiki/John_von_Neumann), in which data and opcodes are stored in the same memory space. (Some architectures, such as the Harvard architecture, keep these two kinds of data separate.)
Explaining everything in detail would go far beyond the scope of Stack Overflow; most likely the character limit per post would not be sufficient.
To answer the question in as few words as possible (anyone actually working at this level would kill me for the shortcuts in this explanation):
Data in the memory is stored at certain addresses.
Each CPU instruction basically consists of 3 different addresses (NOT values, just addresses!):
Address of what to do
Address of a value
Address of an additional value
So, assuming an addition should be performed (in this case 5+7) and you have 3 addresses available in memory, the application would store the following (I used "verbs" for the instructions):
Address | Stored Value
1 | ADD
2 | 5
3 | 7
Finally, the CPU receives the instruction 1 2 3, which then means ADD 5 7 (these things are order-sensitive: [Command] [v1] [v2]). And now things get complicated.
The CPU will move these values (actually not the values, just the addresses of the values) into its registers and then process them. The exact registers it chooses depend on data type, data size, and opcode.
In the case of the command #1 #2 #3, the CPU will first read these memory addresses and then know that ADD 5 7 is desired.
Based on the opcode for ADD the CPU will know:
Put Address #2 into r1
Put Address #3 into r2
Read Memory-Value Stored at the address stored in r1
Read Memory-Value stored at the address stored in r2
Add both values
Write result somewhere in memory
Store Address of where I put the result into r3
Store Address stored in r3 into the Memory-Address stored in r1.
Note that this is simplified. The CPU actually needs exact instructions on whether it's handling a value or an address. In assembly this is done by using
eax (means the value stored in register eax)
[eax] (means the value stored in memory at the address stored in register eax)
In this simplified model, the CPU cannot perform calculations directly on values stored in memory, so it is quite busy moving values from memory to registers and from registers to memory.
For example, if you have
eax = 0x2
and in memory
0x2 = 110011
and the instruction
MOV ebx, [eax]
This means: move the value stored at the address currently held in eax into register ebx. So finally
ebx = 110011
(This happens EVERY time the CPU does a single calculation: memory -> register -> memory.)
Finally, the requesting application can read its predefined memory address #2, which yields address #2568, and then knows that the outcome of the calculation is stored at address #2568. Reading that address will return the value 12 (5+7).
This is just a tiny example of what's going on. For a more detailed introduction, refer to http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
One cannot really grasp the amount of data movement and calculation done for a simple addition of 2 values. Doing what a CPU does (on paper) would take you several minutes just to calculate "5+7", since there is no "5" and no "7": everything is hidden behind an address in memory, pointing to some bits, resulting in different values depending on what the bits at address 0x1 are instructing...
Short form: The CPU does not know what's stored there, but the instructions tell the CPU how to interpret it.
Let's have a simplified example.
If the CPU is told to add a word (let's say, a 32-bit integer) stored at location X, it fetches the content of that address and adds it.
If the program counter reaches the same location, the CPU will again fetch this word and execute it as a command.
Apart from security features like the NX bit, the CPU is blind to whether a word is data or code.
The only way data doesn't accidentally get executed as code is by carefully organizing the code to never refer to a location holding data with an instruction meant to operate on code.
When a program is started, the processor starts executing it at a predefined spot. The author of a program written in machine language will have intentionally put the beginning of their program there. From there, each instruction will always end up setting the next location the processor will execute to somewhere there is an instruction. This continues to be the case for all of the instructions that make up the program, unless there is a serious bug in the code.
There are two main ways an instruction can determine where the processor goes next: jumps/branches, and not specifying it explicitly. If the instruction doesn't explicitly specify where to go next, the CPU defaults to the location directly after the current instruction. Contrast that with jumps and branches, which have room to explicitly encode the address of the next instruction. Jumps always go to the place specified. Branches check whether a condition is true: if it is, the CPU jumps to the encoded location; if it is false, it simply goes to the instruction directly after the branch.
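The fetch/execute rules above can be illustrated with a tiny Python interpreter for a made-up machine (the opcodes 0 to 3 are hypothetical, not any real CPU's encoding). Code and data share one memory list, and the only thing that makes a word an "opcode" is that the program counter points at it.

```python
# Hypothetical toy ISA (invented for illustration, not a real CPU):
#   0        HALT
#   1 addr   ADD: acc += mem[addr], then fall through
#   2 addr   JMP: always continue at addr
#   3 addr   BNZ: continue at addr if acc != 0, else fall through
def run(mem: list[int], max_steps: int = 100) -> int:
    """Run from address 0 and return the accumulator when HALT is reached."""
    pc, acc = 0, 0
    for _ in range(max_steps):
        op = mem[pc]  # whatever word pc points at is treated as an opcode
        if op == 0:
            return acc
        if op not in (1, 2, 3):
            raise ValueError(f"unknown opcode {op} at address {pc}")
        operand = mem[pc + 1]  # the word after the opcode is its operand
        if op == 1:
            acc += mem[operand]  # here the operand is used as the address of data
            pc += 2              # no explicit target: go to the next instruction
        elif op == 2:
            pc = operand         # jump: always go to the encoded address
        else:
            pc = operand if acc != 0 else pc + 2  # branch: only if the condition holds
    raise RuntimeError("step limit reached")

# Addresses 0-4 hold code, 5-6 hold the data 5 and 7. Addresses 1 and 5 both
# hold the number 5: one is used as an address, the other as data.
program = [1, 5, 1, 6, 0, 5, 7]
print(run(program))  # 12

# Jumping into the data area makes the CPU treat the value 7 as an opcode.
try:
    run([2, 6, 0, 0, 0, 5, 7])
except ValueError as err:
    print(err)  # unknown opcode 7 at address 6
```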
Additionally, a machine language program should never write data to a location that holds instructions, or some other instruction at some future point in the program could try to run what was overwritten with data. That could cause all sorts of bad things to happen. The data there could have an "opcode" that the processor doesn't know how to handle. Or the data could tell the computer to do something completely unintended. Either way, you're in for a bad day. Be glad that your compiler never messes up and accidentally inserts something that does this.
Unfortunately, sometimes the programmer using the compiler messes up, and does something that tells the CPU to write data outside of the area they allocated for data. (A common way this happens in C/C++ is to allocate an array L items long, and use an index >=L when writing data.) Having data written to an area set aside for code is what buffer overflow vulnerabilities are made of. Some program may have a bug that lets a remote machine trick the program into writing data (which the remote machine sent) beyond the end of an area set aside for data, and into an area set aside for code. Then, at some later point, the processor executes that "data" (which, remember, was sent from a remote computer). If the remote computer/attacker was smart, they carefully crafted the "data" that went past the boundary to be valid instructions that do something malicious. (To give them more access, destroy data, send back sensitive data from memory, etc).
This is because an ISA must define what the valid instructions are and how to encode data: memory addresses, registers, and literals.
See this for more general information on how an ISA is designed:
https://en.wikipedia.org/wiki/Instruction_set
In short, the CPU always knows where the next instruction is. On x64 there is a special register called rip (instruction pointer) that holds the address of the next instruction to be executed; the operating system sets it to the program's entry point when the program starts. The CPU automatically reads the bytes at this address, decodes and executes them, and advances rip by the length of the instruction in bytes (jumps, calls, and returns set rip explicitly instead).
Generally, the OS can mark regions of memory (pages) as holding executable code or not. If a bug or exploit tries to modify executable memory, an error should occur; similarly, if the CPU finds itself trying to execute non-executable memory, it should signal a fault and the program is terminated. Now you're into the wonderful world of software viruses!
I am reading about the architecture of Intel's 8086 and can't figure out the following things about segmentation. I know that segment registers point to their respective segments and contain the base address of a 64 KB segment. But who calculates the physical address, and at what point is it set in the segment registers? Also, because one physical address can be accessed by multiple segment:offset pairs and segments can overlap, how can you be sure that you won't overwrite something? Where can I read more about this?
Generally speaking, the assembler only uses offsets to access a logical address. For example, look at this code:
start lea si,[hello] ; Load effective address of string
mov word [ds:si+10],0 ; Zero-terminate string after 10th letter
jmp $ ; Loop endlessly
; Fill rest of the segment with 0s
times 65536-($-$$) db 0x00
hello db "I'm just outside of the current segment. Hello!",0
The assembler will try to calculate the offset of 'hello' from the origin of the program. Since no origin is defined, 0x0 is assumed. However, the offset of 'hello' would be 0x10000 in this case, which does not fit in 16 bits. Therefore the assembler truncates the address to 0x0000. It will not change any of the segment registers, but it will likely issue a warning, for example test.asm:1: warning: word data exceeds bounds. What actually happens when you run this program is that the jmp $ line is overwritten with zeroes, because the address of hello wrapped around, and the CPU will start executing nothing but zeroes, which is not what you intended.
That is of course only if the code segment and data segment are the same. Now who guarantees that is the case? Nobody, really, especially since I still don't know what platform you are coding for. It is entirely your responsibility to set up the segment registers with correct values. The easiest way to do so is:
push cs ; Push the code segment value onto the stack
pop ds ; Pop it into the data segment register
push cs ; Same for the extra segment
pop es ;
This way you can be certain you are accessing the offset in the correct data segment.
Now regarding 'How do you make sure the code segment doesn't overlap the data segment?': why shouldn't it? When your program, including its data, is smaller than 64 KB, making the code and data segments identical is actually the easiest way to access data.
And how can you be sure that you don't overwrite anything important? The assembler can't help you with that; you have to check yourself whether the segment:offset address you are writing to already contains data.
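The physical address itself is computed by the CPU on every access, not stored anywhere: a segment register holds a 16-bit segment value (the segment's base divided by 16). This Python sketch shows the 8086 rule, segment * 16 + offset truncated to 20 bits (the 8086 has 20 address lines; later CPUs with the A20 line enabled do not wrap at 1 MB), and two different segment:offset pairs naming the same byte, which is why overlapping segments can overwrite each other.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address translation: segment * 16 + offset, wrapped to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    return ((segment << 4) + offset) & 0xF_FFFF

# Two different segment:offset pairs that refer to the same byte.
print(hex(physical_address(0x1000, 0x0010)))  # 0x10010
print(hex(physical_address(0x1001, 0x0000)))  # 0x10010

# FFFF:0010 is one byte past 1 MB, so it wraps to 0 on the original 8086.
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0
```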
I'm reading the book "Write Great Code: Understanding the Machine" by Randall Hyde. It's a great, clear text, but I'm completely stuck on his explanation of, for example, the mov instruction.
He dissects the steps for the mov(srcReg,destMem) instruction as follows:
1. Fetch the instruction's opcode from memory.
2. Update the EIP register with the address of the byte following the opcode.
3. Decode the instruction's opcode to see what instruction it specifies.
4. Fetch the displacement associated with the memory operand from the memory location immediately following the opcode.
5. Update EIP to point at the first byte beyond the operand that follows the opcode.
6. If the mov instruction uses a complex addressing mode (for example, the indexed addressing mode), compute the effective address of the destination memory location.
7. Fetch the data from srcReg.
8. Store the fetched value into the destination memory location.
I'm lost in steps 4-6. My exact questions are:
Step 4: Why do I need this displacement, and how am I going to use it later?
Step 5: I understand that in step 2, EIP must "point" to the next byte, where the next instruction to be executed is stored. But I don't understand why EIP needs to point one byte beyond the operand. I believed that EIP was concerned only with instructions/opcodes, not data.
Step 6: What exactly is an effective address? Are there other types of addresses?
Step 4:
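Some opcodes reference memory relative to the opcode's location. For example, a function might have a constant or static piece of data. If it does, the code may place that data right before the function starts (or right after it ends) and refer to it by saying "get the memory from 46 bytes earlier". That's the displacement: an offset from the contents of a register (in this case, the instruction pointer), used for referencing data relative to the register's contents. (Strictly speaking, instruction-pointer-relative data addressing is only available in 64-bit mode, as RIP-relative addressing; in 32-bit code the displacement is added to a general-purpose base register or used as an absolute address.)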
Step 5
The operands for opcodes are normally stored right after the opcode. So you might have memory arranged like this: a b c, where a is an opcode, b is the operand for a, and c is the next opcode.
If you only move EIP to the end of a (so it references b), then in the next instruction cycle, the computer will assume that b is the next opcode to execute. b isn't supposed to be an opcode though; it's an operand. The computer can't tell the difference between an opcode and an operand though. It just assumes whatever EIP points to is an instruction and executes it. That's why EIP needs to be moved past the operand too.
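Steps 1 to 5 can be sketched in Python for one concrete encoding (an illustration chosen here, not taken from the book): in 32-bit mode, mov [disp32], eax is encoded as 89 05 followed by a 4-byte little-endian displacement. The book's simplified description folds the ModRM byte (05) into "the opcode". The sketch shows EIP moving past the opcode and then past the operand, so it ends up on the first byte of the next instruction.

```python
def fetch_mov_store(memory: bytes, eip: int) -> tuple[int, int]:
    """Return (displacement, new_eip) for `mov [disp32], eax` (89 05 disp32)."""
    opcode, modrm = memory[eip], memory[eip + 1]  # steps 1 and 3: fetch and decode
    if (opcode, modrm) != (0x89, 0x05):
        raise ValueError("only mov [disp32], eax is handled in this sketch")
    eip += 2                                               # step 2: past opcode + ModRM
    disp = int.from_bytes(memory[eip:eip + 4], "little")   # step 4: fetch displacement
    eip += 4                                               # step 5: past the operand
    return disp, eip

code = bytes([0x89, 0x05, 0x78, 0x56, 0x34, 0x12, 0x90])  # mov [0x12345678], eax ; nop
disp, next_eip = fetch_mov_store(code, 0)
print(hex(disp), next_eip)  # 0x12345678 6
```

If step 5 were skipped, the next fetch would start at offset 2 and treat the displacement byte 0x78 as an opcode.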
Step 6
An "effective" address is just an absolute one (relative to the start of memory) while the "complex" address the book refers to is relative to something else (often the contents of a register).
Step 4 showed that an opcode might not refer to an absolute memory address. It could easily refer to a relative one. In fact, programs very frequently refer to addresses that are relative to some register. For example, if you wrote some_struct.data in C and compiled it for an x86 processor, it would load the address of some_struct into a register (say, EAX), then hard-code data's offset from the base of some_struct into the operand. So if there are 5 bytes of data between the start of the struct and the start of the data element, then the instruction might look like load [EAX + 5] -> EBX which means "take what's in EAX, add 5, fetch the data from that address and put it in EBX".
The thing is, the memory doesn't really understand relative addresses like this. It only understands absolute ones. So in order to access a relative address, the processor has to first add that 5 to whatever's in EAX to compute an absolute address. Then it can send that address to the memory controller and have it understood.
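A minimal Python sketch of that computation, assuming 32-bit x86 addressing, where an effective address can combine a base register, an index register scaled by 1, 2, 4, or 8, and a displacement, wrapping modulo 2^32. The register values are hypothetical.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """32-bit x86-style effective address: base + index*scale + disp, mod 2**32."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows scale factors 1, 2, 4 or 8")
    return (base + index * scale + disp) & 0xFFFF_FFFF

eax = 0x1000  # hypothetical address of some_struct
print(hex(effective_address(base=eax, disp=5)))                        # 0x1005, i.e. [EAX + 5]
print(hex(effective_address(base=0x2000, index=3, scale=4, disp=8)))   # 0x2014
```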
There are two basic types of relative addresses I've worked with (there are more I haven't).
Register relative: The processor takes the contents of a register and uses that as the address in memory. Depending on the opcode and processor support, it may also add an operand to the register as well. Step 4 was dealing with this kind of addressing, with EIP as the register the address was relative to.
Memory relative: Sometimes referred to as "indirect". The processor starts out with a register relative address, then automatically fetches the data at that address and treats it as the real address.
Wikipedia describes many other addressing modes on its addressing modes page.
Memory relative took me a while to understand. Say you did a memory relative load where the register contains 10 and the offset is 5. The processor will add them together (10 + 5 = 15). Then, it'll go to that address (15 in this case) and grab whatever's there. If address 15 happens to contain the value 60, then 60 will be treated as the actual address and the processor will load the contents of address 60. If you're familiar with a language with pointers (e.g. C), memory relative is like a pointer-to-a-pointer.
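The difference between the two modes in that example can be sketched in Python with a dictionary standing in for memory. The contents are made up (address 15 holds 60, and 42 is an arbitrary value at address 60).

```python
# Hypothetical memory contents for the example above.
memory = {15: 60, 60: 42}

def load_register_relative(reg: int, offset: int) -> int:
    """One memory access: return the data stored at reg + offset."""
    return memory[reg + offset]

def load_memory_relative(reg: int, offset: int) -> int:
    """Two memory accesses: the word at reg + offset is itself an address."""
    pointer = memory[reg + offset]  # first access yields an address
    return memory[pointer]          # second access yields the real operand

print(load_register_relative(10, 5))  # 60
print(load_memory_relative(10, 5))    # 42
```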
I'm confused when it comes to encoding the address for a J-format instruction.
From Class Notes:
Assume L1 is at the address 4194340 in decimal, which is 400024 in hexadecimal. We fill the target field as an address in instructions (0x100009) rather than bytes (0x400024).
Can someone please explain to me why this is?
The j instruction jumps to the passed target.
However, because each instruction is limited to 32 bits and 6 are used for the opcode, only 26 may be used for the jump target.
This means that the distance a j instruction can travel is limited, as it works by combining its target with the most significant bits of the current PC.
The MIPS instruction set could have been defined by saying that when a j instruction is encountered you prepend the upper 6 bits of the PC to the 26-bit target, but instead it was noted that the instructions a program can jump to are always "word-aligned". This means that their addresses are always a multiple of 4, so the last 2 bits of the address are always 0.
This allows us to not encode the last 2 bits in the jump target and instead encode bits 2 through 27 (counting from bit 0). So to get the target of a j instruction, you take the upper 4 bits of the PC (strictly, of PC+4, the address of the delay slot), append the 26-bit jump target, and then append two zero bits.
Hopefully, with that explanation out of the way, it makes sense why the target 0x400024 is encoded in the j instruction as 0x100009, i.e. 0x400024 >> 2: the last two bits are not needed.
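The encoding step can be checked with a short Python sketch that builds the full 32-bit j instruction word (primary opcode 000010 in the top 6 bits, target field in the low 26). It only illustrates the arithmetic; a real assembler also verifies that the target lies in the same 256 MB region as PC+4.

```python
J_OPCODE = 0b000010  # primary opcode of the MIPS j instruction

def encode_j(target: int) -> int:
    """Build the 32-bit instruction word for `j target`.

    Only bits 2..27 of the byte address are stored; the low 2 bits are
    always 0 (word alignment) and the upper 4 bits are taken from PC+4.
    """
    if target % 4:
        raise ValueError("jump targets must be word-aligned")
    return (J_OPCODE << 26) | ((target >> 2) & 0x03FF_FFFF)

word = encode_j(0x400024)
print(hex(word & 0x03FF_FFFF))  # 0x100009, the 26-bit target field
print(hex(word))                # 0x8100009, the whole instruction word
```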