LC3 TRAP's Instructions - lc3

This is a question that's giving me a lot of trouble, but which I need to understand for my Final Exam in 2 weeks. I don't know if it's the wording, but I have no idea how to arrive at a concrete answer. Here's the question:
"Bob plans to make changes to the mechanism of LC-3 TRAP instructions. He has two ideas:
Make use of the bit [8:11] of TRAP instructions.
The first instruction of the trap routine is stored at the address specified in the TRAP instruction, rather than the starting address of the trap routine.
In his new design, he still wants to implement as many TRAP routines as the original LC-3 TRAP. Calculate on average how many lines of instructions each TRAP routine will have in his new design."
I know TRAP has 3 fundamental TRAP Vectors, x20, x25, and x23? What does the "how many lines of instructions" even mean?

Is your class using the McGraw-Hill LC-3 Simulator? Reading through their text on the TRAP instruction, bits [11:8] aren't even sent to the MAR to load from memory; they're just dropped. Only bits [7:0] are used, because they point to a location in the Trap Vector Table.
Memory locations x0000 through x00FF, 256 in all, are available to
contain starting addresses for system calls specified by their
corresponding trap vectors. This region of memory is called the Trap
Vector Table.
The Vector table is only using 6 of its 256 available trap vectors, so you could make 250 of your own Trap calls.
After trying what "Bob" was trying to do, I get the error "1024 can not be represented as an 8 bit trap vector", and when I manually fill in my own trap call (e.g. TRAP400 .FILL xF400 ; which is 1111 0100 0000 0000), it won't run its subroutine.
That being said, your question can only mean that Bob is making his own version of the LC-3 and would like to increase the number of trap vectors he can use. If that's the case, then using bits [11:0] he could have 4,096 trap vectors (2^12), or 4,090 if you do not count the original 6.
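As a rough sketch of the arithmetic, assuming the standard LC-3 encoding (opcode in bits [15:12], trapvect8 in bits [7:0]): the snippet below decodes the hand-assembled word xF400 and counts the vectors available with 8 versus 12 bits. The helper names are made up for illustration. The final division is only one possible reading of the exam question (256 routines sharing the 4,096 addresses a 12-bit field can reach), not a confirmed answer.

```python
def decode_trap(word):
    """Split a 16-bit LC-3 word into opcode, bits [11:8], and trapvect8."""
    opcode = (word >> 12) & 0xF
    bits_11_8 = (word >> 8) & 0xF
    trapvect8 = word & 0xFF
    return opcode, bits_11_8, trapvect8


def vector_count(bits):
    """Number of distinct values a field of the given width can hold."""
    return 1 << bits


# The word from the answer above: 1111 0100 0000 0000
opcode, dropped, vect = decode_trap(0xF400)
print(f"opcode={opcode:04b} bits[11:8]={dropped:04b} trapvect8=x{vect:02X}")

routines = vector_count(8)     # as many routines as the original table has slots
reachable = vector_count(12)   # addresses a 12-bit field can name directly
print(routines, reachable)

# Assumed reading of the question: routines laid out back to back in that range.
print(reachable // routines)
```

Under the standard encoding the low byte of xF400 is x00, so the word names trap vector x00 rather than x400, which would be consistent with the routine not running.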
I hope that helps.

That's pretty vague. A Trap subroutine is as large as it needs to be to execute a particular function. But if you're only counting the required lines needed in a subroutine then you would need at least 7 (or 1 if you only wanted your routine to return to the command that called it).
Looking at TRAP x21's routine we get:
.ORIG x0430 ; syscall address
ST R7, SaveR7
ST R1, SaveR1
TryWrite
LDI R1, CRTSR
BRzp TryWrite
WriteIt
STI R0, CRTDR
Return
LD R1, SaveR1
LD R7, SaveR7
RET
CRTSR .FILL xFE04
CRTDR .FILL xFE06
SaveR1 .FILL 0
SaveR7 .FILL 0
.END
We have to save the registers before we use them, and load them after running our routine. We need the variables to store those registers, and lastly we need a RET command to return to the command that called the routine.

Related

How are Ethereum bytecode JUMPs and JUMPDESTs resolved?

I've been looking around for info on how Ethereum deals with jumps and jump destinations. From various blogs and the yellow paper what I found is as follows:
The operand taken by JUMP and the first of the two operands taken by JUMPI are the value the PC is set to (assume the second stack value != 0 in the case of JUMPI).
However, looking at this contract's creation code (as opcodes) the first few opcodes/values are:
PUSH1 0x60
PUSH1 0x40
MSTORE
CALLDATASIZE
ISZERO
PUSH2 0x00f8
JUMPI
As I understand it this means that if the value pushed to the stack by ISZERO != 0 then PC will change to 0x00f8 as JUMPI takes two from the stack, checks if the second is 0 and if not sets PC to the value of its first operand.
The problem I am having is that 0x00f8 in decimal is 248. The 248th position in the contract appears to be MSTORE and not a JUMPDEST, which would cause the contract to fail in its execution as JUMP* can only point to a valid JUMPDEST.
Presumably contracts don't jump to invalid destinations on purpose?
If anyone could explain how jumps and jump destinations are resolved I would be very grateful.
In case it helps others:
The confusion arose from the EVM reading byte by byte and NOT word by word.
From the example in the question, 0x00f8 would be the 248th byte, not the 248th word.
As each opcode is 1 byte long PC is normally incremented by 1 when reading an opcode.
However in the case of a PUSH instruction, information on how many of the following bytes are to be taken as its operand is also included.
For example PUSH2 takes the 2 bytes that follow it, PUSH6 takes 6 bytes that follow it, and so on. Here PC would be incremented by 1 for the PUSH and then 2 or 6 respectively for each byte of the data used by the PUSH.
Just want to point out that there is a difference in JUMP and JUMPI.
JUMP takes just 1 element from the stack: the destination, which is generally a byte offset pushed onto the stack beforehand.
JUMPI is a conditional jump that takes the top 2 elements from the stack: the destination and the condition.
In the example you gave, the condition is the result of ISZERO (which checks whether the topmost element of the stack is 0).
So if that returns true, it will jump to the destination, which is the offset 0x00f8 (248 in decimal).
If the condition is false, it will just increase the program counter by 1.
In the contract you mentioned, the opcode at program counter 248 is a JUMPDEST.
The program counter of each instruction depends on the opcodes before it, i.e. on how many bytes of immediate data each PUSH carries, e.g.
PUSH1 0x60 - PC[0]
PUSH1 0x40 - PC[2]
MSTORE - PC[4]
CALLDATASIZE- PC[5]
ISZERO - PC[6]
PUSH2 0x00f8- PC[7]
JUMPI - PC[10]
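The listing above can be reproduced with a small byte walker. This is a minimal sketch, not a full disassembler: the opcode table only covers the instructions in the example, and the function name is made up. It relies on PUSH1 through PUSH32 occupying opcodes 0x60 through 0x7f, so the operand length can be derived from the opcode itself.

```python
# Only the opcodes needed for the example; a real table is much larger.
OPCODES = {
    0x15: "ISZERO",
    0x36: "CALLDATASIZE",
    0x52: "MSTORE",
    0x57: "JUMPI",
    0x5B: "JUMPDEST",
}


def disassemble(code):
    """Return (pc, text) pairs, skipping over PUSH immediate bytes."""
    listing = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:            # PUSH1 .. PUSH32
            n = op - 0x5F                 # number of immediate bytes
            operand = code[pc + 1:pc + 1 + n]
            listing.append((pc, f"PUSH{n} 0x{operand.hex()}"))
            pc += 1 + n
        else:
            listing.append((pc, OPCODES.get(op, f"0x{op:02x}")))
            pc += 1
    return listing


# The first bytes of the creation code from the question.
code = bytes.fromhex("606060405236156100f857")
for pc, text in disassemble(code):
    print(f"PC[{pc}] {text}")
```

Because immediate bytes are skipped rather than decoded, a 0x5b that happens to sit inside PUSH data never shows up as a JUMPDEST in this walk.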
Maybe this website will give you a better understanding on opcodes https://ethervm.io/

Understanding 8086 assembler debugger

I'm learning assembler and I need some help with understanding codes in the debugger, especially the marked part.
mov ax, a
mov bx, 4
I know how above instructions works, but in the debugger I have "2EA10301" and "BB0400".
What do they mean?
The first instruction moves variable a from data segment to the ax register, but in debugger I have cs:[0103].
What do mean these brackets and these numbers?
Thanks for any help.
The 2EA10301 and BB0400 numbers are the opcodes for the two instructions highlighted.
2E is Code Segment (CS) prefix and instructs the CPU to access memory with the CS segment instead of the default DS one.
A1 is the opcode for MOV AX, moffs16 and 0301 is the immediate 0103h in little endian, the address to read from.
So 2EA10301 is mov ax, cs:[103h].
The square brackets are the preferred way to denote a memory access through one of the addressing modes, but some assemblers support the confusing syntax without the brackets.
As this syntax is ambiguous and less standardised across different assemblers than the other, it is discouraged.
During the assembling the assembler keeps a location counter incremented for each byte emitted (each "section"/segment has its own counter, i.e. the counter is reset at the beginning of each "section").
This gives each variable an offset that is used to access it and to craft the instruction; variable names are for the human, CPUs can only read from addresses, i.e. numbers.
This offset will later be an address in memory once the file is loaded.
The assembler, the linker and the loader cooperate, there are various tricks at play, to make sure the final instruction is properly formed in memory and that the offset is transformed into the right address.
In your example their efforts culminate in the value 103h, that is the address of a in memory.
Again, in your example, the offset, if the file is a COM (by the way, don't put variables in the execution flow), was still 103h due to the peculiar structure of the COM files.
But in general, it could have been another number.
BB is MOV r16, imm16 with the register BX. The base form is B8 with the lower 3 bits indicating the register to use, BX is denoted by a value of 3 (011b in binary) and indeed 0B8h + 3 = 0BBh.
After the opcode, again, the WORD immediate 0400 that encodes 4 in little endian.
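To make the byte-level reading concrete, here is a minimal sketch that decodes just the two encodings discussed above. It is not a general x86 decoder: it only handles an optional segment prefix, opcode A1, and the B8+r family, and the function and table names are made up for illustration.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]  # register numbers 0..7
SEG_PREFIX = {0x26: "es", 0x2E: "cs", 0x36: "ss", 0x3E: "ds"}


def decode(raw):
    """Decode 'mov ax, moffs16' and 'mov r16, imm16' from raw bytes."""
    i = 0
    seg = None
    if raw[i] in SEG_PREFIX:              # optional segment override prefix
        seg = SEG_PREFIX[raw[i]]
        i += 1
    op = raw[i]
    # Both forms are followed by a 16-bit little-endian value.
    imm = int.from_bytes(raw[i + 1:i + 3], "little")
    if op == 0xA1:                        # MOV AX, moffs16
        return f"mov ax, {seg or 'ds'}:[{imm:04X}h]"
    if 0xB8 <= op <= 0xBF:                # MOV r16, imm16 (B8 + register number)
        return f"mov {REG16[op - 0xB8]}, {imm:04X}h"
    raise ValueError("opcode not handled in this sketch")


print(decode(bytes.fromhex("2EA10301")))
print(decode(bytes.fromhex("BB0400")))
```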
You now are in the position to realise that the assembly source is not always fully informative, as the assemblers implement some form of syntactic sugar.
The instruction mov ax, a is identical to mov bx, 4 in its syntax, and technically would mean "move the immediate value (constant and known at assembly time) given by the address of a into ax". Instead it is interpreted as "move the content of a (a value present in memory and readable only with a memory access) into ax", because a is known to be a variable.
This phenomenon is limited in the x86, being CISC, and more widespread in the RISC world, where the lack of commonly needed instructions is compensated with pseudo-instructions.
Well, first, the language is x86 assembly. The assembler is the tool that turns the instructions into machine code.
When you disassemble a program, the debugger will probably show the hex values of the machine code (for example, 90 is the NOP instruction and B8 moves an immediate value into AX).
Square brackets denote a memory access: the value is read from the memory address given inside the brackets.
The hex on the side is called the address.
Everything is very simple. The instruction mov ax, cs:[0103] means that the value 000Ah is loaded into the register ax. This value is taken from the code segment at offset 0103h. Slightly higher in the picture you can see this value: cs:0101 0B900A00. Accordingly, address 0101h holds the value 0Bh, 0102h holds 90h, 0103h holds 0Ah, and 0104h holds 00h. So AL is loaded from address 0103h (0Ah) and AH is loaded from address 0104h (00h), which gives ax = 000Ah. If the instruction were mov ax, cs:[0101] instead, then ax = 900Bh; with mov ax, cs:[0102], ax = 0A90h.
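The byte-by-byte reasoning above can be checked with a few lines. This is only a sketch: the dictionary stands in for the four memory bytes quoted from the debugger, and read_word is a made-up helper that mimics a 16-bit little-endian load.

```python
# The four bytes quoted from the debugger dump, keyed by offset.
memory = {0x0101: 0x0B, 0x0102: 0x90, 0x0103: 0x0A, 0x0104: 0x00}


def read_word(mem, addr):
    """16-bit little-endian load: low byte at addr, high byte at addr + 1."""
    low = mem[addr]        # goes to AL
    high = mem[addr + 1]   # goes to AH
    return (high << 8) | low


for addr in (0x0103, 0x0101, 0x0102):
    print(f"mov ax, cs:[{addr:04X}] -> ax = {read_word(memory, addr):04X}h")
```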

"PUSH" "POP" Or "MOVE"?

When it comes to temporary storage for an existing value in a register, all modern compilers (at least the ones I have used) emit PUSH and POP instructions. But why not store the data in another register if one is available?
So, where should the temporary storage for an existing value go? Stack or register?
Consider the following 1st Code:
MOV ECX,16
LOOP:
PUSH ECX ;Value saved to stack
... ;Assume that here's some code that must use the ECX register
POP ECX ;Value released from stack
SUB ECX,1
JNZ LOOP
Now consider the 2nd Code:
MOV ECX,16
LOOP:
MOV ESI,ECX ;Value saved to ESI register
... ;Assume that here's some code that must use the ECX register
MOV ECX,ESI ;Value returned to ECX register
SUB ECX,1
JNZ LOOP
After all, which one of the above codes is better, and why?
Personally I think the first code is better on size, since PUSH and POP each take only 1 byte while MOV takes 2, and the second code is better on speed, because moving data between registers is faster than a memory access.
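The size claim can be checked by building the encodings by hand. The sketch below assumes 32-bit mode and the common short forms (50+r for PUSH, 58+r for POP, 89 /r for a register-to-register MOV); the helper names are made up for illustration.

```python
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def push(reg):
    return bytes([0x50 + REG32[reg]])      # PUSH r32: one byte


def pop(reg):
    return bytes([0x58 + REG32[reg]])      # POP r32: one byte


def mov_rr(dst, src):
    # MOV r/m32, r32 = 89 /r; ModRM has mod=11, reg=src, rm=dst
    modrm = 0xC0 | (REG32[src] << 3) | REG32[dst]
    return bytes([0x89, modrm])


stack_version = push("ecx") + pop("ecx")
register_version = mov_rr("esi", "ecx") + mov_rr("ecx", "esi")
print(stack_version.hex(), len(stack_version))
print(register_version.hex(), len(register_version))
```

So the save/restore pair costs 2 bytes with the stack and 4 bytes with a spare register, which matches the 1-byte versus 2-byte figures in the question.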
It does make a lot of sense to do that. But I think the simplest answer is all the other registers are being used. In order to use some other register you would need to push it on the stack.
Compilers are smart enough. Keeping track of what is in a register for a compiler is somewhat trivial, that is not a problem. Speaking generically not necessarily x86 specific, esp when you have more registers (than an x86), you are going to have some registers that are used for input (in your calling convention), some you can trash, that may be the same as the input ones or not, some you cant trash you have to preserve them first. Some instruction sets have special registers, must use this one for auto increment, that one for register indirect, etc.
It is most definitely trivial to get the compiler to produce code, for ARM for example, where the input and the trashable registers are the same set, but that means that if you call another function, the calling function needs to save something to use after the return:
unsigned int more_fun ( unsigned int );
unsigned int fun ( unsigned int x )
{
return(more_fun(x)+x);
}
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: e1a04000 mov r4, r0
8: ebfffffe bl 0 <more_fun>
c: e0840000 add r0, r4, r0
10: e8bd4010 pop {r4, lr}
14: e12fff1e bx lr
I told you it was trivial. Now to use your argument backward, why didn't they just push r0 on the stack and pop it off later, why push r4? Note that r0-r3 are used for input and are volatile, r0 is the return register when the value fits, and r4 almost all the way up you have to preserve (one exception I think).
So r4 is assumed to be used by the caller or some caller up the line, the calling convention dictates you cannot trash it you must preserve it so you have to assume it is used. You can trash r0-r3, but you cant use one of those as the callee can trash them too, so in this case we need to take the incoming value x and both use it (pass it on) and preserve it for after the return so they did both, the "used another register with a move" but in order to do that they preserved that other register.
Why save r4 to the stack in this case is very obvious: you can save it up front with the return address. In particular, ARM wants you to always use the stack in 64 bit chunks, so two registers at a time ideally, or at least keep it aligned on a 64 bit boundary. You have to save lr anyway, so they are going to push something else too even if they don't have to; in this case the saving of r4 is a freebie. And since they need to preserve r0 and at the same time use it, r4 or r5 or something above is a good choice.
BTW, it looks like an x86 compiler did the same with the above.
0000000000000000 <fun>:
0: 53 push %rbx
1: 89 fb mov %edi,%ebx
3: e8 00 00 00 00 callq 8 <fun+0x8>
8: 01 d8 add %ebx,%eax
a: 5b pop %rbx
b: c3 retq
demonstration of them pushing something that they dont need to preserve:
unsigned int more_fun ( unsigned int );
unsigned int fun ( unsigned int x )
{
return(more_fun(x)+1);
}
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: ebfffffe bl 0 <more_fun>
8: e8bd4010 pop {r4, lr}
c: e2800001 add r0, r0, #1
10: e12fff1e bx lr
No reason to save r4, they just needed some register to make the stack aligned, so in this case r4 was chosen, some versions of this compiler you will see r3 or some other register used.
Remember humans (still) write compilers and the optimizers, etc. So the why this and why that is really a question for that human or those humans, and we can't really tell you what they were thinking. It is not a simple task for sure, but it is not hard to take a reasonable sized function and/or project and find opportunities to hand tune compiler output, to improve it. Of course beauty is in the eye of the beholder; one definition of improve is another's definition of make worse. One instruction mix might use fewer total instruction bytes, so that is "better" by program size standards; another may or may not use more instructions or bytes, but execute faster; one might have fewer memory accesses at the cost of more instructions, to ideally execute faster; etc.
There are architectures with hundreds of general purpose registers, but most of the ones we touch products with daily dont have that many, so you can generally make a function or some code that has so many variables in flight in a function that you have to start saving off to the stack mid function. So you cant always just save a few registers at the beginning and the end of the function to give you more working registers mid function, if the number of working registers you need mid function is more registers than you have. It actually takes some practice to be able to write code that doesnt optimize to the point of not needing too many registers, but once you start to see how the compilers work by examining their output, you can write trivial functions like the ones above to prevent optimizations or force preservation of registers mid function, etc.
At the end of the day for the compiler to be somewhat sane it needs a calling convention, it keeps the authors from going crazy and the compiler from being a nightmare to code and manage. And the calling convention is very clearly going to define the input and output register(s) any volatile registers, and the ones that have to be preserved.
unsigned int fun ( unsigned int x, unsigned int y, unsigned int z )
{
unsigned int a;
a=x<<y;
a+=(y<<z);
a+=x+y+z;
return(a);
}
00000000 <fun>:
0: e0813002 add r3, r1, r2
4: e0833000 add r3, r3, r0
8: e0832211 add r2, r3, r1, lsl r2
c: e0820110 add r0, r2, r0, lsl r1
10: e12fff1e bx lr
Only spent a few seconds on that but could have worked harder on it. I didnt push past four registers total, granted I had four variables. And I didnt call any functions so the compiler was free to just trash r0-r3 as needed as the dependencies worked out. So I didnt have to save r4 in order to create a temporary storage, it didnt have to use the stack it just optimized the order of execution to for example free up r2, the z variable so that later it could use r2 as an intermediate variable, one of the instances of a equals something. Keeping it down to four registers instead of burning a fifth one.
If I were more creative with my code and added calls to functions, I could get it to burn a lot more registers. You would see, as even in this last case, that the compiler has no problem whatsoever keeping track of what is where. When you play with the compilers you will also see there is no reason that they have to keep your high level language variables intact in the same register throughout, much less execute in the same order you wrote your code (so long as it is legal). But they are still at the mercy of the calling convention. If only some of the registers are considered volatile, and you call a function from your function at a certain time in the code, then you have to preserve that content, so you can't use them as long term storage; and the ones that are not volatile are already considered to be consumed, so they have to be preserved to use them. Then it becomes in part a performance question: does it cost more (size, speed, etc.) to save to the stack on the fly, or can I preserve up front in a way that possibly reduces instructions, or can be invisible and/or consume fewer clocks with a larger transfer rather than separate, less efficient transfers mid function?
I have said this seven times now but the bottom line is the calling convention for that compiler (version) and target (and command line options/defaults). If you have volatile registers (arbitrary calling convention thing for general purpose registers, not a hardware/ISA thing) and you are not calling any other functions, then they are easy to use and save you expensive stack (memory) transactions. If you are calling someone then they can be trashed by them so they may no longer be free, depends on your code. The non-volatile registers are considered consumed by callers so you have to burn stack operations in order to use them, they are not free to use. And then it becomes performance as to when and where to use the stack, pushes and pops and movs. No two compilers are expected to generate the same code even if they use the same convention, but you can see above it is somewhat trivial to make test functions, compile them and examine the output, tweak here and there to navigate through and around that (compiler, version and target and convention and command line options) optimizer.
Using a register is a bit faster, but requires you to keep track of which registers are available, and you can run out of registers. Also, this method cannot be used recursively. In addition, some registers will get trashed if you use INT or CALL to invoke a subroutine.
Use of the stack (POP and PUSH) can be used as many times as needed (so long as you don't run out of stack space), and in addition it supports recursive logic. You can use the stack safely with INT or CALL because by convention any subroutine should reserve its own portion of the stack, and must restore it to its previous state (or else the RET instruction would fail).
Do trust the work of the optimizing compiler, based on the work of decades of code generation specialists.
They fill as many registers as are available and spill to the stack when needed, comparing different options. And they also weigh the tradeoff between storing a value for later reuse and recomputing the value.
There is no single rule "register vs. stack"; it's a matter of global optimization, taking into account the processor's peculiarities. And in general, there is no single "best solution", as it will depend on your "bestness" criteria.
Except when very creative workarounds can be found (or when exploiting data properties known only to you), you can't beat a compiler.
When thinking about speed, you always have to keep in mind a sense of proportion.
If the function being compiled calls other functions,
those push and pop instructions may be insignificant,
compared to the number of instructions executed in between them.
Compiler writers know, in that kind of case, which is very common, one shouldn't be penny-wise and pound-foolish.
By using PUSH and POP, you can save at least one register. This will be significant if you are working with a limited number of available registers. On the other hand, yes, sometimes using MOV is faster, but you also have to keep track of which register is used as temporary storage. This gets hard if you want to store several values that need to be processed later.

Appending 0 before the hexa number

I have been instructed by my teacher to prepend a 0 to hex numbers when writing instructions, as some assemblers look for a leading 0 to differentiate a number from a label. I am confused: if the number already starts with a 0, what should be done in that case?
For Example,
AND BL, 0FH
Is there a need of adding 0 before that hexa number or not? Please help me out. Thanks
EDIT:
Sorry if I was not clear enough before. What I meant was that in the above example a 0 is already present; do I need to convert it to
AND BL, 00FH
Except for the special cases like 0 or 1, I tend to encode my hex numbers with the full complement of digits just so it's easier to see what the intent is:
mov al, 09h
mov ax, 0123h
and so on.
For cases where the number starts with an alpha character (like deadbeef), I prefix it with an extra 0.
But no, it's not usually (a) necessary to do this if your hex number already begins with a digit.
In any case, I'd be putting most numbers into an equ statement rather than sprinkling magic numbers throughout the code. Would you rather see:
mov ax, 80
or:
mov ax, lines_per_screen
(a) Of course, it depends on your assembler but, from memory, all the ones I've used work this way.
No, there's no need (and including more than one leading 0 is fairly unusual).
Your example is an apt one though -- without the leading 0 to tell it this was a number, the assembler would normally interpret FH as a symbol rather than a number.
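A toy version of the rule both answers describe, as a sketch only: it assumes a MASM-style convention where a token is a number only if it starts with a decimal digit, and an H suffix marks it as hex. Real assemblers handle more suffixes and cases; the function name is made up.

```python
def classify(token):
    """Classify an operand token the way a simple assembler lexer might."""
    t = token.upper()
    if t[0].isdigit():                     # numbers must start with 0-9
        if t.endswith("H"):
            return ("number", int(t[:-1], 16))
        return ("number", int(t, 10))
    return ("symbol", token)               # anything else is a label/symbol


for tok in ("0FH", "00FH", "FH", "09h", "80"):
    print(tok, classify(tok))
```

Both 0FH and 00FH come out as the same number, while FH is taken as a symbol, which is why one leading 0 is enough.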

Usefulness of LOOPNE

I am unable to understand the usefulness of LOOPNE. Even if LOOPNE was not there and only LOOP was there, it would have done the same thing here. Please help me out.
MOV CX, 80
BACK: MOV AH,1 ; label placement assumed, the original listing omits it
INT 21H
CMP AL, ' '
LOOPNE BACK
CMP is more or less a SUB instruction that discards the result, which means that it only sets flags such as ZF (the zero flag).
LOOPNE decrements cx and has 2 conditions to loop: cx != 0 and ZF = 0
LOOP decrements cx and has 1 condition to loop: cx != 0
So, a normal LOOP would go through all 80 characters, whereas LOOPNE will go through all 80 characters or stop when a space is encountered, whichever comes first.
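The difference can be simulated in a few lines. This is a sketch under simplifying assumptions: a Python string stands in for the keyboard input that INT 21H would return in AL, the count is assumed to be non-zero and no larger than the string length, and chars_read is a made-up name.

```python
def chars_read(text, count, use_loopne):
    """Count how many characters are read before the loop exits."""
    cx = count
    read = 0
    while True:
        al = text[read]                # stands in for INT 21H, AH=1
        read += 1
        zf = (al == " ")               # CMP AL, ' ' sets ZF on a match
        cx = (cx - 1) & 0xFFFF         # both LOOP and LOOPNE decrement CX first
        if cx == 0:
            break                      # count exhausted: both instructions fall through
        if use_loopne and zf:
            break                      # LOOPNE also falls through when ZF = 1
    return read


print(chars_read("ab cd", 5, use_loopne=False))
print(chars_read("ab cd", 5, use_loopne=True))
```

With the input "ab cd" and a count of 5, the LOOP version reads all 5 characters while the LOOPNE version stops after 3, right after the space.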
LOOPNE loops when a comparison fails, and when there is a remaining nonzero iteration count (after decrementing it). This is arguably very convenient for finding an element in a linear list of known length.
There is little use for it in modern x86 CPUs.
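The LOOPNE instruction is likely implemented internally in the CPU by microinstructions and is thus effectively equivalent to a JE (out of the loop) / DEC CX / JNE (back to the top) sequence, with the difference that LOOPNE itself leaves the flags untouched while DEC modifies ZF.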
Because the CPU designers invest vast amounts of effort to optimize compare/branch/register arithmetic, the equivalent instruction sequence is likely, on a highly pipelined CPU, to execute virtually just as fast. It may actually execute slower; you'll only know by timing it. And the fact that you are confused about what it does makes it a source of coding errors.
I presently code the equivalent instruction sequence, because I got bit by a misunderstanding once. I'm not confused about CMP and JNE.

Resources