I'm building a CPU in the nand2tetris course and I'm kind of stuck.
Do I have to check if the instruction is an A or C instruction?
In the A instruction guide it only shows the first control bit. The MSB controls the output of the first Mux. What controls the load of the A register if it's an A instruction?
If it's an A instruction the load of the A register should always be 1 im pretty sure.
If it's a C instruction there are lots of control bits but I can't use the same control bits for A instructions.
So should I be checking if the instruction coming in is a C or A instruction and then setting the control bits accordingly?
Here's a another picture that might be useful.
Here's one way to think about it:
In a C-instruction, there are a lot of bits (accccccdddjjj, plus the most significant bit that says it's a C-instruction) that determine what the various parts of the machine do. So if you are presented with a C-instruction, you just have to route those bits as appropriate to control the machine.
The A-instruction, on the other hand, doesn't have the control bits, it just has the instruction type bit. So in this instance, you have to generate the control bits so that there are no changes to the machine state other than storing the 15 bits into the A register (and incrementing the PC, of course).
You have to do something similar to handle Reset.
Related
I am currently writing a small debugger in assembly on windows plateform.
I open the debuggee process as follow:
invoke CreateProcess, addr buffer, NULL, NULL, NULL, FALSE, DEBUG_PROCESS+DEBUG_ONLY_THIS_PROCESS, NULL, NULL, addr startinfo, addr pi
It works well, i can get the EIP by looking on the context of the debuggee and so i can get the 1st byte of the instruction that will be executed.
However, I need to get the number of bytes that have been executed in the previous instruction.
Instructions are not size independant. Sometimes an instruction is just 1 byte, and some other time 6 bytes or more.
I tried to substract the previous EIP with the current EIP in order to get the number of bytes that have been executed. But it doesn't work if there is a jmp or a call because the address space is not the same anymore.
I planned to get a map of all opcode and make some cmp, but it seems to be a huge work to do.
If you have some idea in order to get the number of byte of the previous instruction that has been executed (maybe looking into a cache or something like that), please let me know.
Best regards
TL;DR
Keep it simple: single step and decode only the branch instructions and use EIP - last EIP unless the last instruction was a branch (in that case use the decoding to find the length).
If an unknown instruction is found, back off and don't provide its size.
It's impossible to decode an x86 instruction stream backward because x86 encoding is not symmetric (w.r.t. address growth), to see this consider mov eax, 90909090h or similar.
So you need to disassemble each instruction as you single step through the program (a debugger needs this anyway) and record its size.
The control transfer instructions are significantly less than the total number of instructions, so you could decode just that and use the EIP - EIP' (where EIP' is the EIP of the last instruction) trick otherwise.
Intel processors support Last Branch Recording but it requires OS support and you'd need to post-process the data anyway, it's seem too burdensome.
A similar argument can be made for the Intel Processor Trace technology.
I can't think of any event for the performance counters (granted that you can use them) that would result in the the number of bytes of an instruction.
Actually in the backend, the concept of "instruction" has been reduced to a sequence of uOPs (probably with a bit to say that an opcode is the last one in an instruction) and the front-end is mostly decoupled from the architectural value of eip (working almost always with a speculative value of eip) so it may be several instructions ahead of the backend.
I believe each uOP probably have a field to record how to update eip at retirement but not the size of an instruction in bytes.
Similarly in the front-end only in the pre-decode stage an instruction length in bytes is recorded, after that I think it's discarded (I can't think of any use of it).
Instructions in the L1 instruction cache are not yet decoded, so even if there was a way to inspect their content and metadata there would be nothing there.
The usual way this is done is by making a trace: single step thorough the program, disassemble the instruction at eip (see below), record its size, resume the program, repeat until a stop condition.
This gives you a list of addresses and instruction sizes.
If you find an instruction you can't decode you either not record the size for it or try to estimate it with some heuristic (its length must be less than 16B and you could in theory integrate the data with the count from a PMC like BR_INST_RETIRED.ALL_BRANCHES).
It's possible to detect the size of an instruction at runtime but that's totally not feasible in this context.
I am currently developing a subset of the 6502 in LogiSim. One of my main resources is Hanson's Block Diagram.
I am trying to determine how and where I should build circuitry to update the Processor States Register. In the diagram of the Processor Status Register below, there are multiple control lines going into the register, but there is no indication of where they come from.
When and where is the 6502 Processor status register updated? I would think that it is on the output of the ALU, but I want to make sure that this is the case.
Do you have Hanson's complete updated diagram? The paper is here. (Or original here.)
The inputs on the left side of P (DB0/C etc) are outputs from the bottom of the Random Control Logic block. The inputs at the top of P are from the ALU (ACR, AVR) and IR5 is bit 5 of the Instruction Register. (But from Breaknes below it seems Hanson's diagram is incomplete: "Donald missed the 0/V command on the schematic, which is used when processing the CLV instruction.")
The inputs will be latched differently for various instructions. For instance the two cycle instructions like CLC/SEC/CLD/SED/CLI/SEI/CLV have one bit (IR5) that ends up latching a hard-coded value to only one of C, I, V, or D. Other instructions will latch ALU (etc) signals to multiple flags at a later cycle. That's as much detail as I know, and as much of the logic that will fit into an answer here.
Very detailed information is available at the Russian Breaknes site. The author has reverse engineered all of the 6502 logic at the transistor level from images at Visual6502. Have a good look around in the Wiki and Info sections of the site. E.g. here is a translated link to the flag info page which has a logic diagram, unlike the wiki page on flag logic.
There was a lot of discussion in the 6502 forum when he did this work (flag logic on page 12 and page 15) and some of the content might only be linked from this thread. The original code repo has been moved to GitHub where there is emulator source code and Logisim circuit diagrams.
From the top:
The C flag is set or cleared by
any instruction that can have an unsigned overflow. These include ADC SBC, CMP, CPX, CPY
Shift and rotate instructions ASL, ROL, ROR
explicit set and clear instructions SEC, CLC
instructions that load the whole status register PLP, RTI
Z is set or cleared by
any instruction that writes to A, X, Y e.g. arithmetic as for carry, bitwise logical operations, loads, transfers, pulls from the stack, shifts and rotates.
instructions that load the whole status register PLP, RTI
BIT
I is set or cleared by
the SEI and CLI instructions.
instructions that load the whole status register PLP, RTI
set by BRK and interrupts.
D is set or cleared only by the PLP, RTI, SED and CLD instructions.
B is interesting. It's actually completely inaccessible to the programmer and not used by the processor. The status byte pushed on the stack is set for BRK and cleared for an interrupt. I guess that means that RTIand PLP would set it if it is set in the byte pulled off the stack, but it doesn't matter.
V flag is set or cleared by
ADC, SBC
BIT
instructions that load the whole status register PLP, RTI
CLV
N is set or cleared in the same circumstances as Z.
I would think that it is on the output of the ALU
That is a fair assessment for all the ALU ops, but as you can see from above there are circumstances when the status flags are set from a source other than the ALU.
Reference: http://www.e-tradition.net/bytes/6502/6502_instruction_set.html
Why is the register length (in bits) that a CPU operates on not dynamically/manually/arbitrarily adjustable? Would it make the computer slower if it was adjustable this way?
Imagine you had an 8-bit integer. If you could adjust the CPU register length to 8 bits, the CPU would only have to go through the first 8 bits instead of extending the 8-bit integer to 64 bits and then going through all 64 bits.
At first I thought you were asking if it was possible to have a CPU with no definitive register size. That make no sense since the number and size of the registers is a physical property of the hardware and cannot be changed.
However some architecture let the programmer work on a smaller part of a register or to pair registers.
The x86 does both for example, with add al, 9 (uses only 8 bits of the 64-bit rax) and div rbx (pairs rdx:rax to form a 128-bit register).
The reason this scheme is not so diffuse is that it comes with a lot of trade-offs.
More registers means more bits needed to address them, simply put: longer instructions.
Longer instructions mean less code density, more complex decoders and less performance.
Furthermore most elementary operations, like the logic ones, addition and subtraction are already implemented as operating on a full register in a single cycle.
Finally, one execution unit can handle only one instruction at a time, we cannot issue eight 8-bit additions in a 64-bit ALU at the same time.
So there wouldn't be any improvement, nor in the latency nor in the throughput.
Accessing partial registers is useful for the programmer to fan-out the number of available registers, so for example if an algorithm works with 16-bit data, the programmer could use a single physical 64-bit register to store four items and operate on them independently (but not in parallel).
The ISAs that have variable length instructions can also benefit from using partial register because that usually means smaller immediate values, for example and instruction that set a register to a specific value usually have an immediate operand that matches the size of register being loaded (though RISC usually sign-extends or zero-extends it).
Architectures like ARM (presumably others as well) supports half precision floats. The idea is to do what you were speculating and #Margaret explained. With half precision floats, you can pack two float values in a single register, thereby introducing less bandwidth at a cost of reduced accuracy.
Reference:
[1] ARM
[2] GCC
The reason this gets me confused is that all addresses hold a sequence of 1's and 0's. So how does the CPU differentiate, let's say, 00000100(integer) from 00000100(CPU instruction)?
First of all, different commands have different values (opcodes). That's how the CPU knows what to do.
Finally, the questions remains: What's a command, what's data?
Modern PCs are working with the von Neumann-Architecture ( https://en.wikipedia.org/wiki/John_von_Neumann) where data and opcodes are stored in the same memory space. (There are architectures seperating between these two data types, such as the Harvard architecture)
Explaining everything in Detail would totally be beyond the scope of stackoverflow, most likely the amount of characters per post would not be sufficent.
To answer the question with as few words as possible (Everyone actually working on this level would kill me for the shortcuts in the explanation):
Data in the memory is stored at certain addresses.
Each CPU Advice is basically consisting of 3 different addresses (NOT values - just addresses!):
Adress about what to do
Adress about value
Adress about an additional value
So, assuming an addition should be performed, and you have 3 Adresses available in the memory, the application would Store (in case of 5+7) (I used "verbs" for the instructions)
Adress | Stored Value
1 | ADD
2 | 5
3 | 7
Finally the CPU receives the instruction 1 2 3, which then means ADD 5 7 (These things are order-sensitive! [Command] [v1] [v2])... And now things are getting complicated.
The CPU will move these values (actually not the values, just the adresses of the values) into its registers and then processing it. The exact registers to choose depend on datatype, datasize and opcode.
In the case of the command #1 #2 #3, the CPU will first read these memory addresses, then knowing that ADD 5 7 is desired.
Based on the opcode for ADD the CPU will know:
Put Address #2 into r1
Put Address #3 into r2
Read Memory-Value Stored at the address stored in r1
Read Memory-Value stored at the address stored in r2
Add both values
Write result somewhere in memory
Store Address of where I put the result into r3
Store Address stored in r3 into the Memory-Address stored in r1.
Note that this is simplified. Actually the CPU needs exact instructions on whether its handling a value or address. In Assembly this is done by using
eax (means value stored in register eax)
[eax] (means value stored in memory at the adress stored in the register eax)
The CPU cannot perform calculations on values stored in the memory, so it is quite busy moving values From memory to registers and from registers to memory.
i.e. If you have
eax = 0x2
and in memory
0x2 = 110011
and the instruction
MOV ebx, [eax]
this means: move the value, currently stored at the address, that is currently stored in eax into the register ebx. So finally
ebx = 110011
(This is happening EVERYTIME the CPU does a single calculation!. Memory -> Register -> Memory)
Finally, the demanding application can read its predefined memory address #2,
resulting in address #2568 and then knows, that the outcome of the calculation is stored at adress #2568. Reading that Adress will result in the value 12 (5+7)
This is just a tiny tiny example of whats going on. For a more detailed introduction about this, refer to http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
One cannot really grasp the amount of data movement and calculations done for a simple addition of 2 values. Doing what a CPU does (on paper) would take you several minutes just to calculate "5+7", since there is no "5" and no "7" - Everything is hidden behind an address in memory, pointing to some bits, resulting in different values depending on what the bits at adress 0x1 are instructing...
Short form: The CPU does not know what's stored there, but the instructions tell the CPU how to interpret it.
Let's have a simplified example.
If the CPU is told to add a word (let's say, an 32 bit integer) stored at the location X, it fetches the content of that address and adds it.
If the program counter reaches the same location, the CPU will again fetch this word and execute it as a command.
The CPU (other than security stuff like the NX bit) is blind to whether it's data or code.
The only way data doesn't accidentally get executed as code is by carefully organizing the code to never refer to a location holding data with an instruction meant to operate on code.
When a program is started, the processor starts executing it at a predefined spot. The author of a program written in machine language will have intentionally put the beginning of their program there. From there, that instruction will always end up setting the next location the processor will execute to somewhere this is an instruction. This continues to be the case for all of the instructions that make up the program, unless there is a serious bug in the code.
There are two main ways instructions can set where the processor goes next: jumps/branches, and not explicitly specifying. If the instruction doesn't explicitly specify where to go next, the CPU defaults to the location directly after the current instruction. Contrast that to jumps and branches, which have space to specifically encode the address of the next instruction's address. Jumps always jump to the place specified. Branches check if a condition is true. If it is, the CPU will jump to the encoded location. If the condition is false, it will simply go to the instruction directly after the branch.
Additionally, the a machine language program should never write data to a location that is for instructions, or some other instruction at some future point in the program could try to run what was overwritten with data. Having that happen could cause all sorts of bad things to happen. The data there could have an "opcode" that doesn't match anything the processor knows what to do. Or, the data there could tell the computer to do something completely unintended. Either way, you're in for a bad day. Be glad that your compiler never messes up and accidentally inserts something that does this.
Unfortunately, sometimes the programmer using the compiler messes up, and does something that tells the CPU to write data outside of the area they allocated for data. (A common way this happens in C/C++ is to allocate an array L items long, and use an index >=L when writing data.) Having data written to an area set aside for code is what buffer overflow vulnerabilities are made of. Some program may have a bug that lets a remote machine trick the program into writing data (which the remote machine sent) beyond the end of an area set aside for data, and into an area set aside for code. Then, at some later point, the processor executes that "data" (which, remember, was sent from a remote computer). If the remote computer/attacker was smart, they carefully crafted the "data" that went past the boundary to be valid instructions that do something malicious. (To give them more access, destroy data, send back sensitive data from memory, etc).
this is because an ISA must take into account what a valid set of instructions are and how to encode data: memory address/registers/literals.
see this for more general info on how ISA is designed
https://en.wikipedia.org/wiki/Instruction_set
In short, the operating system tells it where the next instruction is. In the case of x64 there is a special register called rip (instruction pointer) which holds the address of the next instruction to be executed. It will automatically read the data at this address, decode and execute it, and automatically increment rip by the number of bytes of the instruction.
Generally, the OS can mark regions of memory (pages) as holding executable code or not. If an error or exploit tries to modify executable memory an error should occur, similarly if the CPU finds itself trying to execute non-executable memory it will/should also signal an error and terminate the program. Now you're into the wonderful world of software viruses!
I would like to use both 8-bit timers of an ATmega 64 microcontroller.
I used the following code to declare their compare interrupts:
.org 0x0012 ; Timer2 8 bit counter
rjmp TIM2
.org 0x001E ; Timer0 8 bit counter
rjmp TIM1
I noticed that if I enter the first interrupt (0x0012) the second timer doesn't work... its interrupt is never generated.
Why does this happen and how do I solve it?
I also notice something strange. If I reverse their order, I get the error:
Error 3 Overlap in .cseg: addr=0x1e conflicts with 0x1e:0x1f
On the ATmega other interrupts are blocked during the execution of any interrupt vector.
This is a useful feature for various reasons. This prevents an interrupt from interrupting itself, prevents potential stack overflows due to recursion, and allows special registers to be set aside specifically for use in low-latency interrupts without having to save them first, and ensures that the handler is atomic, among other reasons.
It is occasionally useful to explicitly use reentrant interrupts however, especially on the ATmega which lacks interrupt priority levels. To do this, simply add an SEI instruction to set the interrupt enable flag.
You must take great care to avoid the problems mentioned above when doing this though. Generally this means that any registers used must be preserved on the stack and that the interrupt itself needs to be disabled before the re-entrant part starts.
As for your address overlap problem, I suspect the problem is that your assembler counts its program addresses in bytes whereas the interrupt vector addresses in the datasheet are specified in words (for example, the timer 2 compare interrupt would be at 0x24 instead of 0x12). You also need to take care to return to the main code segment after finishing the definition of the vectors or any subsequent code will simply run on into the other vectors.