We have just read a single numeric digit from the keyboard using the GETC command. Convert this value to binary and place it into R4 - lc3

I need to write this in LC-3 assembly language:
We have just read a single numeric digit from the keyboard using the GETC command. Convert this value to binary and place it into R4.

Here's an example, adapted from a lab manual written by George M. Georgiou and Brian Strader.
3.2.3 How to read an input value
The assembly command GETC, which is another name for TRAP x20, reads a single character from the keyboard and places its ASCII value in register R0. The 8 most significant bits of R0 are cleared. There is no echo of the read character. For example, one may use the following code to read a single numerical character, 0 through 9, and place its value in register R3:
GETC ; Place ASCII value of input character into R0
ADD R3, R0, x0 ; Copy R0 into R3
ADD R3, R3, #-16 ; Subtract 48, the ASCII value of 0
ADD R3, R3, #-16
ADD R3, R3, #-16 ; R3 now contains the actual value
Notice that it was necessary to use three instructions to subtract 48, since the immediate operand of ADD is only 5 bits wide, in two’s complement format, so it can hold values from -16 to 15. Thus, -16 is the most we can subtract with the immediate version of the ADD instruction. As an example, if the pressed key was "5", its ASCII value 53 will be placed in R0. Subtracting 48 from 53, the value 5 results, as expected, and is placed in register R3.
Original Source.
You'll need to adapt this to put the result in R4.
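In LC-3 itself the adaptation amounts to using R4 wherever the example uses R3. To check the arithmetic, here is a minimal Python sketch that models the same steps. The helper names (add_imm5, digit_to_value) are hypothetical and exist only for this illustration; it is a model of the instruction sequence, not LC-3 code.

```python
def add_imm5(reg, imm):
    """Model LC-3 ADD with an immediate operand (5-bit two's complement)."""
    if not -16 <= imm <= 15:
        raise ValueError("immediate does not fit in 5 bits")
    return (reg + imm) & 0xFFFF      # LC-3 registers are 16 bits wide


def digit_to_value(ch):
    """Turn one ASCII digit character into its numeric value, as R4 would hold it."""
    r0 = ord(ch) & 0x00FF            # GETC: ASCII code in R0, upper 8 bits cleared
    r4 = add_imm5(r0, 0)             # ADD R4, R0, #0   (copy R0 into R4)
    for _ in range(3):               # ADD R4, R4, #-16, three times = subtract 48
        r4 = add_imm5(r4, -16)
    return r4


print(digit_to_value("5"))
```

Trying add_imm5(r4, -48) raises an error in this model, which mirrors why a single ADD cannot subtract 48.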

Related

Trying to make an LC3 binary code that checks if 2 numbers are equal

I am trying to make a machine instruction program that asks for two user inputs (single digits) and then checks whether the two numbers are the same. It stores the first user input in register R3 and the second user input in R4. If the two numbers are the same, R5 is set to 1. Otherwise, R5 is 0. Finally, it prints out the check result in R5 (printing the result is optional).

Understanding 8086 assembler debugger

I'm learning assembler and I need some help with understanding codes in the debugger, especially the marked part.
mov ax, a
mov bx, 4
I know how the above instructions work, but in the debugger I have "2EA10301" and "BB0400".
What do they mean?
The first instruction moves variable a from data segment to the ax register, but in debugger I have cs:[0103].
What do these brackets and these numbers mean?
Thanks for any help.
The 2EA10301 and BB0400 numbers are the opcodes for the two instructions highlighted.
2E is Code Segment (CS) prefix and instructs the CPU to access memory with the CS segment instead of the default DS one.
A1 is the opcode for MOV AX, moffs16 and 0301 is the immediate 0103h in little endian, the address to read from.
So 2EA10301 is mov ax, cs:[103h].
The square brackets are the preferred way to denote a memory access through one of the addressing modes, but some assemblers support the confusing syntax without the brackets.
As this syntax is ambiguous and less standardised across different assemblers than the other, it is discouraged.
During the assembling the assembler keeps a location counter incremented for each byte emitted (each "section"/segment has its own counter, i.e. the counter is reset at the beginning of each "section").
This gives each variable an offset that is used to access it and to craft the instruction; variable names are for humans, CPUs can only read from addresses, which are numbers.
This offset will later be an address in memory once the file is loaded.
The assembler, the linker and the loader cooperate, there are various tricks at play, to make sure the final instruction is properly formed in memory and that the offset is transformed into the right address.
In your example their efforts culminate in the value 103h, that is the address of a in memory.
Again, in your example, the offset, if the file is a COM (by the way, don't put variables in the execution flow), was still 103h due to the peculiar structure of the COM files.
But in general, it could have been another number.
BB is MOV r16, imm16 with the register BX. The base form is B8 with the lower 3 bits indicating the register to use, BX is denoted by a value of 3 (011b in binary) and indeed 0B8h + 3 = 0BBh.
After the opcode, again, the WORD immediate 0400 that encodes 4 in little endian.
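The byte-level reading above can be replayed with a small Python sketch. It handles only the two encodings discussed in this answer (the CS prefix 2E, opcode A1, and the B8+register form); the function name decode and the output formatting are assumptions made for illustration, not a real disassembler.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]  # register numbers 0..7


def decode(code):
    """Decode just the two 16-bit encodings discussed above."""
    prefix = ""
    if code[0] == 0x2E:                         # CS segment override prefix
        prefix, code = "cs:", code[1:]
    opcode = code[0]
    imm = int.from_bytes(code[1:3], "little")   # 16-bit value, low byte first
    if opcode == 0xA1:                          # MOV AX, moffs16
        return f"mov ax, {prefix}[{imm:04X}h]"
    if 0xB8 <= opcode <= 0xBF:                  # MOV r16, imm16 = B8 + register number
        return f"mov {REG16[opcode - 0xB8]}, {imm:04X}h"
    raise ValueError("opcode not covered by this sketch")


print(decode(bytes.fromhex("2EA10301")))  # mov ax, cs:[0103h]
print(decode(bytes.fromhex("BB0400")))    # mov bx, 0004h
```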
You now are in the position to realise that the assembly source is not always fully informative, as the assemblers implement some form of syntactic sugar.
The instruction mov ax, a is identical in syntax to mov bx, 4, and technically it should mean "move the immediate value given by the address of a, a constant known at assembly time, into ax". Instead it is interpreted as "move the content of a, a value that sits in memory and is readable only with a memory access, into ax", because a is known to be a variable.
This phenomenon is limited in the x86, being CISC, and more widespread in the RISC world, where the lack of commonly needed instructions is compensated with pseudo-instructions.
Well, first, the language is x86 assembly. The assembler is the tool that turns the instructions into machine code.
When you disassemble a program, the disassembler will probably show the hex values too (for example 90 is the NOP instruction and B8 moves an immediate value to AX).
Square brackets denote a memory access: the operand is the value stored at the address inside the brackets.
The hex on the side is called the address.
Everything is very simple. The instruction mov ax, cs:[0103] means that the value 000Ah is loaded into the register ax. This value is taken from the code segment at offset 0103h. Slightly higher in the picture you can see this value: cs:0101 0B900A00. Accordingly, address 0101h holds the value 0Bh, 0102h holds 90h, 0103h holds 0Ah, and 0104h holds 00h. The AL register is loaded from address 0103h, which gives 0Ah, and the AH register is loaded from address 0104h, which gives 00h, so ax = 000Ah. If the instruction were mov ax, cs:[0101] instead, then ax = 900Bh; with mov ax, cs:[0102], ax = 0A90h.
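The three word reads in this answer can be checked with a short Python sketch. The memory dictionary simply restates the four byte values quoted above, and read_word is a hypothetical helper that models a 16-bit little-endian load.

```python
# Byte values at offsets 0101h..0104h, as quoted in the answer above.
memory = {0x0101: 0x0B, 0x0102: 0x90, 0x0103: 0x0A, 0x0104: 0x00}


def read_word(addr):
    """Model a 16-bit little-endian load: AL from addr, AH from addr + 1."""
    low = memory[addr]
    high = memory[addr + 1]
    return (high << 8) | low


for addr in (0x0103, 0x0101, 0x0102):
    print(f"{addr:04X}h -> {read_word(addr):04X}h")
```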

"PUSH" "POP" Or "MOVE"?

When it comes to temporary storage for an existing value in a register, all modern compilers (at least the ones I have experience with) use PUSH and POP instructions. But why not store the data in another register if one is available?
So, where should the temporary storage for an existing value go? Stack or register?
Consider the following 1st Code:
MOV ECX,16
LOOP:
PUSH ECX ;Value saved to stack
... ;Assume that here's some code that must use the ECX register
POP ECX ;Value released from stack
SUB ECX,1
JNZ LOOP
Now consider the 2nd Code:
MOV ECX,16
LOOP:
MOV ESI,ECX ;Value saved to ESI register
... ;Assume that here's some code that must use the ECX register
MOV ECX,ESI ;Value returned to ECX register
SUB ECX,1
JNZ LOOP
After all, which one of the above codes is better and why?
Personally I think the first code is better on size since PUSH and POP take only 1 byte each while MOV takes 2, and the second code is better on speed because moving data between registers is faster than a memory access.
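The size claim can be checked by building the encodings by hand. The Python sketch below assumes 32-bit mode with no prefixes and uses the 89 /r form for the register-to-register MOV (an equivalent 8B /r form also exists); the helper names push, pop and mov are made up for this illustration.

```python
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}


def push(reg):
    return bytes([0x50 + REG32[reg]])            # PUSH r32: 50 + register number


def pop(reg):
    return bytes([0x58 + REG32[reg]])            # POP r32: 58 + register number


def mov(dst, src):
    # MOV r/m32, r32: opcode 89, then ModRM = 11 (register mode), src, dst
    modrm = 0xC0 | (REG32[src] << 3) | REG32[dst]
    return bytes([0x89, modrm])


print(push("ECX").hex(), pop("ECX").hex())            # 51 59
print(mov("ESI", "ECX").hex(), mov("ECX", "ESI").hex())  # 89ce 89f1
```

So the save/restore pair costs 2 bytes with PUSH/POP and 4 bytes with the two MOVs, which matches the estimate in the question.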
It does make a lot of sense to do that. But I think the simplest answer is all the other registers are being used. In order to use some other register you would need to push it on the stack.
Compilers are smart enough. Keeping track of what is in a register for a compiler is somewhat trivial, that is not a problem. Speaking generically not necessarily x86 specific, esp when you have more registers (than an x86), you are going to have some registers that are used for input (in your calling convention), some you can trash, that may be the same as the input ones or not, some you cant trash you have to preserve them first. Some instruction sets have special registers, must use this one for auto increment, that one for register indirect, etc.
It is trivial to get the compiler to produce code, for arm for example, where the input and the trashable registers are the same set. That means that if you call another function, the calling function needs to save something to use after the return:
unsigned int more_fun ( unsigned int );
unsigned int fun ( unsigned int x )
{
return(more_fun(x)+x);
}
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: e1a04000 mov r4, r0
8: ebfffffe bl 0 <more_fun>
c: e0840000 add r0, r4, r0
10: e8bd4010 pop {r4, lr}
14: e12fff1e bx lr
I told you it was trivial. Now to use your argument backward, why didnt they just push r0 on the stack and pop it off later, why push r4? Note that r0-r3 are used for input and are volatile, r0 is the return register when it fits, and r4 almost all the way up you have to preserve (one exception I think).
So r4 is assumed to be used by the caller or some caller up the line, the calling convention dictates you cannot trash it you must preserve it so you have to assume it is used. You can trash r0-r3, but you cant use one of those as the callee can trash them too, so in this case we need to take the incoming value x and both use it (pass it on) and preserve it for after the return so they did both, the "used another register with a move" but in order to do that they preserved that other register.
Why save r4 to the stack in this case is very obvious: you can save it up front with the return address. In particular, arm wants you to always use the stack in 64 bit chunks, so two registers at a time ideally, or at least keep it aligned on a 64 bit boundary. You have to save lr anyway, so they are going to push something else too even if they dont have to. In this case the saving of r4 is a freebie, since they need to preserve r0 and at the same time use it; r4 or r5 or something above is a good choice.
BTW, here is what an x86 compiler did with the above.
0000000000000000 <fun>:
0: 53 push %rbx
1: 89 fb mov %edi,%ebx
3: e8 00 00 00 00 callq 8 <fun+0x8>
8: 01 d8 add %ebx,%eax
a: 5b pop %rbx
b: c3 retq
demonstration of them pushing something that they dont need to preserve:
unsigned int more_fun ( unsigned int );
unsigned int fun ( unsigned int x )
{
return(more_fun(x)+1);
}
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: ebfffffe bl 0 <more_fun>
8: e8bd4010 pop {r4, lr}
c: e2800001 add r0, r0, #1
10: e12fff1e bx lr
No reason to save r4, they just needed some register to make the stack aligned, so in this case r4 was chosen, some versions of this compiler you will see r3 or some other register used.
Remember humans (still) write compilers and the optimizers, etc. So the why this and why that is really a question for that human or those humans, and we cant really tell you what they were thinking. It is not a simple task for sure, but it is not hard to take a reasonable sized function and/or project and find opportunities to hand tune compiler output, to improve it. Of course beauty is in the eye of the beholder, one definition of improve is another's definition of make worse. One instruction mix might use less total instruction bytes, so that is "better" by program size standards, another may or may not use more instructions or bytes, but execute faster, one might have less memory accesses at the cost of instructions to ideally execute faster, etc.
There are architectures with hundreds of general purpose registers, but most of the ones in the products we touch daily dont have that many, so you can generally make a function or some code that has so many variables in flight that you have to start saving off to the stack mid function. So you cant always just save a few registers at the beginning and the end of the function to give you more working registers mid function, if the number of working registers you need mid function is more registers than you have. It actually takes some practice to be able to write code that doesnt optimize to the point of not needing too many registers, but once you start to see how the compilers work by examining their output, you can write trivial functions like the ones above to prevent optimizations or force preservation of registers mid function, etc.
At the end of the day for the compiler to be somewhat sane it needs a calling convention, it keeps the authors from going crazy and the compiler from being a nightmare to code and manage. And the calling convention is very clearly going to define the input and output register(s) any volatile registers, and the ones that have to be preserved.
unsigned int fun ( unsigned int x, unsigned int y, unsigned int z )
{
unsigned int a;
a=x<<y;
a+=(y<<z);
a+=x+y+z;
return(a);
}
00000000 <fun>:
0: e0813002 add r3, r1, r2
4: e0833000 add r3, r3, r0
8: e0832211 add r2, r3, r1, lsl r2
c: e0820110 add r0, r2, r0, lsl r1
10: e12fff1e bx lr
Only spent a few seconds on that but could have worked harder on it. I didnt push past four registers total, granted I had four variables. And I didnt call any functions so the compiler was free to just trash r0-r3 as needed as the dependencies worked out. So I didnt have to save r4 in order to create a temporary storage, it didnt have to use the stack it just optimized the order of execution to for example free up r2, the z variable so that later it could use r2 as an intermediate variable, one of the instances of a equals something. Keeping it down to four registers instead of burning a fifth one.
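To see that the reordered instruction sequence really computes the same thing while reusing r2, here is a Python sketch that replays both the C source and the four ARM add instructions on 32-bit values. It assumes shift amounts below 32 (larger shifts are undefined in C anyway); fun_c and fun_asm are names chosen for this illustration.

```python
MASK = 0xFFFFFFFF  # keep every result in 32 bits, like unsigned int on this target


def fun_c(x, y, z):
    """The C function, statement by statement."""
    a = (x << y) & MASK
    a = (a + (y << z)) & MASK
    a = (a + x + y + z) & MASK
    return a


def fun_asm(x, y, z):
    """The compiler output, instruction by instruction."""
    r0, r1, r2 = x, y, z                 # arguments arrive in r0, r1, r2
    r3 = (r1 + r2) & MASK                # add r3, r1, r2
    r3 = (r3 + r0) & MASK                # add r3, r3, r0
    r2 = (r3 + (r1 << r2)) & MASK        # add r2, r3, r1, lsl r2  (z not needed afterwards)
    r0 = (r2 + (r0 << r1)) & MASK        # add r0, r2, r0, lsl r1  (result in r0)
    return r0


print(fun_c(1, 2, 3), fun_asm(1, 2, 3))  # 26 26
```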
If I was more creative with my code and I added in calls to functions, I could get it to burn a lot more registers. You would see, as even in this last case, that the compiler has no problem whatsoever keeping track of what is where, and you will see when you play with the compilers that there is no reason they have to keep your high level language variables intact in the same register throughout, much less execute in the same order you wrote your code (so long as it is legal). But they are still at the mercy of the calling convention. If only some of the registers are considered volatile, and you call a function from your function at a certain time in the code, then you have to preserve that content, so you cant use them as long term storage. The ones that are not volatile are already considered to be consumed, so they have to be preserved to use them. Then it becomes in part a performance question: does it cost more (size, speed, etc) to save to the stack on the fly, or can I preserve up front in a way that possibly reduces instructions or can be invisible and/or consume less clocks with a larger transfer rather than separate, less efficient transfers mid function?
I have said this seven times now but the bottom line is the calling convention for that compiler (version) and target (and command line options/defaults). If you have volatile registers (arbitrary calling convention thing for general purpose registers, not a hardware/ISA thing) and you are not calling any other functions, then they are easy to use and save you expensive stack (memory) transactions. If you are calling someone then they can be trashed by them so they may no longer be free, depends on your code. The non-volatile registers are considered consumed by callers so you have to burn stack operations in order to use them, they are not free to use. And then it becomes performance as to when and where to use the stack, pushes and pops and movs. No two compilers are expected to generate the same code even if they use the same convention, but you can see above it is somewhat trivial to make test functions, compile them and examine the output, tweak here and there to navigate through and around that (compiler, version and target and convention and command line options) optimizer.
Using a register is a bit faster, but requires you to keep track of which registers are available, and you can run out of registers. Also, this method cannot be used recursively. In addition, some registers will get trashed if you use INT or CALL to invoke a subroutine.
Use of the stack (POP and PUSH) can be used as many times as needed (so long as you don't run out of stack space), and in addition it supports recursive logic. You can use the stack safely with INT or CALL because by convention any subroutine should reserve its own portion of the stack, and must restore it to its previous state (or else the RET instruction would fail).
Do trust the work of the optimizing compiler, based on the work of decades of code generation specialists.
They fill as many registers as are available and extend to the stack when needed, comparing different options. And they also care about tradeoffs between storing a value for later reuse vs. recomputation of the value.
There is no single rule "register vs. stack", it's a matter of global optimization, taking into account the processor's peculiarities. And in general, there is no single "best solution" as it will depend on your "bestness" criteria.
Except when very creative workarounds can be found (or when exploiting data properties known only to you), you can't beat a compiler.
When thinking about speed, you always have to keep in mind a sense of proportion.
If the function being compiled calls other functions,
those push and pop instructions may be insignificant,
compared to the number of instructions executed in between them.
Compiler writers know, in that kind of case, which is very common, one shouldn't be penny-wise and pound-foolish.
By using PUSH and POP, you can save at least one register. This is significant if you are working with a limited number of available registers. On the other hand, yes, sometimes using MOV is better for speed, but you also have to keep track of which register is used as temporary storage. This gets hard if you want to store several values that need to be processed later.

Understanding Subi Syntax for AVR Programming

I've come across a certain piece of code that i'm not quite understanding and have been unable to find any information on it. It's a macro that takes in a register and then should display the result on the LCD.
The contents of the register being passed in should be a single digit number.
.macro do_lcd_rdata
mov lcd, @0
subi lcd, -'0'
rcall lcd_data
rcall lcd_wait
.endmacro
The part I am confused about is what subi lcd, -'0' this means. SUBI is subtract immediate but I am confused about what -'0' is.
-'0' is the negative of the ASCII value of the character '0'. The operation effectively adds 0x30, or 48, to the value in the register to turn it into the ASCII code of the corresponding digit character.
For example, 6 - -'0' = 6 + 48 = 54 = '6'
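The same arithmetic on an 8-bit register can be modelled in Python. This sketch assumes the assembler keeps only the low 8 bits of -'0' as the immediate; subi and NEG_ZERO are hypothetical names for illustration.

```python
def subi(reg, k):
    """Model AVR SUBI on an 8-bit register: reg - k, wrapping modulo 256."""
    return (reg - k) & 0xFF


NEG_ZERO = (-ord("0")) & 0xFF      # -48 as an 8-bit constant, i.e. 0xD0

lcd = subi(6, NEG_ZERO)            # subi lcd, -'0' with lcd holding 6
print(lcd, chr(lcd))               # 54 6
```

Subtracting the negative constant is used because the AVR instruction set has SUBI but no matching add-immediate instruction for 8-bit registers.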

Appending 0 before the hexa number

I have been instructed by my teacher to add a 0 before hex numbers when writing instructions, as some assemblers look for a leading 0 to differentiate a number from a label. I am confused: if the number already starts with a 0, what should be done in such a case?
For Example,
AND BL, 0FH
Is there a need of adding 0 before that hexa number or not? Please help me out. Thanks
EDIT:
Sorry if I had not been clear enough before. What I meant was that in the above example, a 0 is already present, do I need to convert it to,
AND BL, 00FH
Except for the special cases like 0 or 1, I tend to encode my hex numbers with the full complement of digits just so it's easier to see what the intent is:
mov al, 09h
mov ax, 0123h
and so on.
For cases where the number starts with an alpha character (like deadbeef), I prefix it with an extra 0.
But no, it's not usually (a) necessary to do this if your hex number already begins with a digit.
In any case, I'd be putting most numbers into an equ statement rather than sprinkling magic numbers throughout the code. Would you rather see:
mov ax, 80
or:
mov ax, lines_per_screen
(a) Of course, it depends on your assembler but, from memory, all the ones I've used work this way.
No, there's no need (and including more than one leading 0 is fairly unusual).
Your example is an apt one though -- without the leading 0 to tell it this was a number, the assembler would normally interpret FH as a symbol rather than a number.
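The rule both answers describe can be sketched in Python as a toy token classifier. It assumes MASM-style syntax (a number must start with a decimal digit, and an h suffix marks hex); the function classify is hypothetical and far simpler than a real assembler's lexer.

```python
import string


def classify(token):
    """Classify a token as a number or a symbol, MASM-style (simplified assumption)."""
    t = token.upper()
    if t[0].isdigit():                                   # numbers must start with 0-9
        if t.endswith("H") and all(c in string.hexdigits for c in t[:-1]):
            return ("number", int(t[:-1], 16))           # hex literal with h suffix
        if t.isdigit():
            return ("number", int(t))                    # plain decimal literal
        raise ValueError(f"malformed number: {token}")
    return ("symbol", token)                             # anything else is a name


for tok in ("0FH", "00FH", "FH", "09h", "80"):
    print(tok, classify(tok))
```

Here 0FH and 00FH both come out as the number 15, while FH is treated as a symbol, which is the behaviour described above.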

Resources