Introductory ARM - Assembly Error - for-loop

For a class just starting out with the ARM assembly language, we are required to implement a simple for-loop described below:
h=1;
for (i=0; i<5; i++)
    h=(h*3)-i;
I have written the following code in ARM assembly:
        AREA Prog2, CODE, READONLY
        ENTRY
        MOV r0, #1          ; initialize h=1
        MOV r1, #0          ; initialize i=0
loop    CMP r1, #5          ; at start of loop, compare i with 5
        MULLT r0, r0, #3    ; if i<5, h=h*3
        SUBLT r0, r0, r1    ; if i<5, h=h-i (ties in with previous line)
        ADDLT r1, r1, #1    ; increment i if i is less than 5
        BLT loop            ; repeat loop if i is less than 5
stop    B stop              ; stop program
        END
The problem is that there is an error with the line
MULLT r0, r0, #3; if i<5, h=h*3
If I delete it from the code, everything works fine. I just cannot understand the issue with this one line. The error description given is "Bad register name symbol, expected integer register." I have tried loading #3 into a register then multiplying the two registers, but that didn't help. It simply changed the error message to "This register combination results in unpredictable behavior." I am new to this, so please offer only basic instructions as a fix for this. Thanks.

MUL requires all operands to be registers; it has no immediate form, which is why #3 is rejected. Load 3 into some other suitable register rn first, then use the form MUL r0, rn, r0.
If the result and the first operand are the same register, the result is unpredictable, as the second error says. This is due to the internal operation of the multiplier on architectures before ARMv6. This is why you must use r0, rn, r0 and not r0, r0, rn.

Multiplying by 3 is overrated anyway; ARM can do it with a single ridiculously idiomatic addition:
add r0, r0, r0, lsl #1 // r0 = r0 + r0*2
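As a cross-check for the assembly, the loop can be run in C++ to see what value r0 should hold when the program reaches stop. This is only a sketch: the function name loop_result is made up for illustration, and the multiply by 3 is written as h + (h << 1) to mirror the add-with-shift form.

```cpp
#include <cassert>

// Hypothetical helper mirroring the loop from the question.
// h plays the role of r0, i the role of r1.
int loop_result() {
    int h = 1;
    for (int i = 0; i < 5; i++) {
        int tripled = h + (h << 1);   // same idea as: add r0, r0, r0, lsl #1
        assert(tripled == h * 3);     // shift-and-add equals a multiply by 3
        h = tripled - i;              // sub r0, r0, r1
    }
    return h;
}
```

With the starting values from the question, h goes 3, 8, 22, 63, 185, so r0 should end up holding 185 (0xB9).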


Understand a piece of assembly template code for arm gcc

Below code contains some inline assembly template:
static inline uintptr_t arch_syscall_invoke3(uintptr_t arg1, uintptr_t arg2,
uintptr_t arg3,
uintptr_t call_id)
{
register uint32_t ret __asm__("r0") = arg1;
register uint32_t r1 __asm__("r1") = arg2;
register uint32_t r2 __asm__("r2") = arg3;
register uint32_t r6 __asm__("r6") = call_id;
__asm__ volatile("svc %[svid]\n"
: "=r"(ret), "=r"(r1), "=r"(r2) <===================== HERE 1
: [svid] "i" (_SVC_CALL_SYSTEM_CALL), <===================== HERE 2
"r" (ret), "r" (r1), "r" (r2), "r" (r6)
: "r8", "memory", "r3", "ip");
return ret;
}
And I got the final assembly with https://godbolt.org/z/znMeEMrEz like this:
push {r6, r7, r8} ------------- A -------------
sub sp, sp, #20
add r7, sp, #0
str r0, [r7, #12]
str r1, [r7, #8]
str r2, [r7, #4]
str r3, [r7]
ldr r0, [r7, #12]
ldr r1, [r7, #8]
ldr r2, [r7, #4]
ldr r6, [r7]
svc #3 ------------- B -------------
mov r3, r0 ------------- C1 -------------
mov r0, r3 ------------- C2 -------------
adds r7, r7, #20
mov sp, r7
pop {r6, r7, r8}
bx lr
From A to B, the assembly code just ensures the input arguments are present in the targeted registers. I guess this is some system call convention.
I don't understand the purpose of HERE 1 and HERE 2.
Question 1:
According to here, HERE 1 should be the OutputOperands part, which means
A comma-separated list of the C variables modified by the instructions in the AssemblerTemplate.
Does this mean the specific requested system call function will modify the ret/r0, r1 and r2 registers?
Question 2:
For HERE 2, it means InputOperands, which means:
A comma-separated list of C expressions read by the instructions in the AssemblerTemplate. An empty list is permitted. See InputOperands.
According to here, the SVC instruction expects only 1 argument imm.
But we specify 4 input operands like ret, r1, r2, r6.
Why do we need to specify so many of them?
I guess these registers are used by svc handler so I need to prepare them before the SVC instruction. But what if I just prepare them like from A to B and do not mention them as the input operands? Will there be some error?
Question 3:
And at last, what's the point of the C1 and C2? They seem totally redundant. The r0 is still there.
I guess this is some system call convention.
This is the result of compilation without optimizations. Looking closely at what's going on in that code, one can see that after saving r6, r7 and r8 all it does is move r3 to r6; everything else is redundant.
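Question 1: Does this mean the specific requested system call function will modify the ret/r0, r1 and r2 registers?
Yes. Listing them as outputs tells the compiler that the system call may overwrite r0, r1 and r2, so it must not assume they still hold the arguments afterwards.
Question 2: According to here, the SVC instruction expects only 1 argument imm. But we specify 4 input operands like ret, r1, r2, r6.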
We specify imm to generate a correct SVC instruction and we specify the rest to make sure that the system call we invoke will find its arguments in the registers documented in the system call ABI.
Why do we need to specify so many of them?
According to the function name it's a 3-argument syscall, so we have 3 syscall parameters and apparently the system call identifier.
But what if I just prepare them like from A to B and do not mention them as the input operands? Will there be some error?
One cannot reliably do the "just prepare them like from A to B" part without mentioning them as inputs in that asm statement. Just assigning the function arguments to local variables is not enough, because nothing will enforce the correct ordering of those assignments relative to the asm statement. There will be no compile-time error, unless you compile with warnings-as-errors and have enabled the warning for variables that are set but unused.
Question 3: And at last, what's the point of the C1 and C2? They seem totally redundant. The r0 is still there.
They are. Compiling with -O will eliminate this redundant move as well as most of the prologue.

Cortex-M compiler generates improper FOR loop

Tested and reproduced on Cortex-M 4 and Cortex-M 0.
I have discovered an issue with the GCC compiler. When a function is declared as type int (non-void), and contains a for loop, but does not have a return statement, the for loop will not break; after disassembling the compiled code, there is a difference between functions with a return, and without a return.
When this code is compiled, it does not throw an error message. On the first compile, a warning of missing return statements is thrown, but after that the warning will not reappear until you restart the IDE. An issue of this magnitude should probably fail to compile, or at least crash the Arduino, but it just never breaks out of the for loop.
I am mainly looking to find the proper channels to report this, since I am not sure if GNU ARM Embedded Toolchain launchpad or GNU Bugzilla are maintained anymore. If anyone knows which site (or both) are still maintained, or if there's a direct contact to someone in the project who I can share this with, please share.
Below is a more thorough description of the behavior.
Arduino Code
============
This is an attempt at a minimum reproducible example. I have run into this issue on two separate occasions in larger projects, which cause the program to behave in extremely unexpected and hard to debug ways (but always fixed by adding a return statement in the function definition).
/*
gcc compiler error demonstration for Adafruit GrandCentral
gcc version: gcc version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (GNU Tools for Arm Embedded Processors 9-2019-q4-major)
Arduino IDE: all warnings on
Arduino IDE version 1.8.13
Adafruit SAMD version 1.8.11
based on Blink
modified to call two functions which are identical except one does not have a return
statement even though it is of return type int.
In the list file, myList.GrandCentral.lst, AFunctionWithReturn shows both the comparison of
i with Count and the conditional comparison i>7 with break assembly instructions
The AFunctionNoReturn does not show any assembly instructions for the end of
loop comparison or the conditional comparison i>7 with break
Found 4/8/21 Robert Calay and Tristan Calay
Turns an LED on for one second, then off for one second, repeatedly.
Most Arduinos have an on-board LED you can control. On the UNO, MEGA and ZERO
it is attached to digital pin 13, on MKR1000 on pin 6. LED_BUILTIN is set to
the correct LED pin independent of which board is used.
If you want to know what pin the on-board LED is connected to on your Arduino
model, check the Technical Specs of your board at:
https://www.arduino.cc/en/Main/Products
modified 8 May 2014
by Scott Fitzgerald
modified 2 Sep 2016
by Arturo Guadalupi
modified 8 Sep 2016
by Colby Newman
This example code is in the public domain.
http://www.arduino.cc/en/Tutorial/Blink
*/
#define MAIN
//#include "Serial3.h" We are re-directing serial port output to SERCOM 5 on the Grand Central M4.
int AFunctionWithReturn(int count)
{
  Serial.print("CountWR");
  Serial.println(count);
  for(int i=0;i<count;i++) {
    Serial.println(i);
    if (i>7)
      break;
  }
  return(1);
}
int AFunctionNoReturn(int count)
{
  Serial.print("CountNR");
  Serial.println(count);
  for(int i=0;i<count;i++) {
    Serial.println(i);
    if (i>7)
      break;
  }
  //Note: No return statement here.
}
// the setup function runs once when you press reset or power the board
void setup() {
Serial.begin(115200);
// initialize digital pin LED_BUILTIN as an output.
pinMode(LED_BUILTIN, OUTPUT);
AFunctionWithReturn(10); //This loops 8 times
AFunctionNoReturn(10); //This loops forever, never reaching loop()
}
// the loop function runs over and over again forever
void loop() {
digitalWrite(LED_BUILTIN, HIGH); // turn the LED on (HIGH is the voltage level)
delay(1000); // wait for a second
digitalWrite(LED_BUILTIN, LOW); // turn the LED off by making the voltage LOW
delay(1000); // wait for a second
}
/*
OUTPUT ON ADAFRUIT GRANDCENTRAL SERIAL PORT
CountWR10
0
1
2
3
4
5
6
7
8
CountNR10
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
....
DOES NOT STOP CONTINUES 2000000+
*
*
*/
Disassembled Code
=================
There is a strange behavior in the brackets here. I'm no expert on the low level code, but it seems like AFunctionNoReturn calls itself recursively here. If not, it still has no break condition, and it does not have a compare call like AFunctionWithReturn in cmp r4, r5.
int AFunctionWithReturn(int count)
{
42bc: b570 push {r4, r5, r6, lr}
Serial.print("CountWR");
42be: 490c ldr r1, [pc, #48] ; (42f0 <_Z19AFunctionWithReturni+0x34>)
Serial.println(count);
for(int i=0;i<count;i++) {
Serial.println(i);
42c0: 4e0c ldr r6, [pc, #48] ; (42f4 <_Z19AFunctionWithReturni+0x38>)
{
42c2: 4605 mov r5, r0
Serial.print("CountWR");
42c4: 480b ldr r0, [pc, #44] ; (42f4 <_Z19AFunctionWithReturni+0x38>)
42c6: f000 fafa bl 48be <_ZN5Print5printEPKc>
Serial.println(count);
42ca: 480a ldr r0, [pc, #40] ; (42f4 <_Z19AFunctionWithReturni+0x38>)
42cc: 220a movs r2, #10
42ce: 4629 mov r1, r5
42d0: f000 fb43 bl 495a <_ZN5Print7printlnEii>
for(int i=0;i<count;i++) {
42d4: 2400 movs r4, #0
42d6: 42ac cmp r4, r5
42d8: da08 bge.n 42ec <_Z19AFunctionWithReturni+0x30>
Serial.println(i);
42da: 220a movs r2, #10
42dc: 4621 mov r1, r4
42de: 4630 mov r0, r6
42e0: f000 fb3b bl 495a <_ZN5Print7printlnEii>
if (i>7)
42e4: 2c08 cmp r4, #8
42e6: d001 beq.n 42ec <_Z19AFunctionWithReturni+0x30>
for(int i=0;i<count;i++) {
42e8: 3401 adds r4, #1
42ea: e7f4 b.n 42d6 <_Z19AFunctionWithReturni+0x1a>
break;
}
return(1);
}
int AFunctionNoReturn(int count)
{
42f8: b538 push {r3, r4, r5, lr}
Serial.print("CountNR");
42fa: 4909 ldr r1, [pc, #36] ; (4320 <_Z17AFunctionNoReturni+0x28>)
Serial.println(count);
for(int i=0;i<count;i++) {
Serial.println(i);
42fc: 4d09 ldr r5, [pc, #36] ; (4324 <_Z17AFunctionNoReturni+0x2c>)
{
42fe: 4604 mov r4, r0
Serial.print("CountNR");
4300: 4808 ldr r0, [pc, #32] ; (4324 <_Z17AFunctionNoReturni+0x2c>)
4302: f000 fadc bl 48be <_ZN5Print5printEPKc>
Serial.println(count);
4306: 4621 mov r1, r4
4308: 4806 ldr r0, [pc, #24] ; (4324 <_Z17AFunctionNoReturni+0x2c>)
430a: 220a movs r2, #10
430c: f000 fb25 bl 495a <_ZN5Print7printlnEii>
for(int i=0;i<count;i++) {
4310: 2400 movs r4, #0
Serial.println(i);
4312: 4621 mov r1, r4
4314: 220a movs r2, #10
4316: 4628 mov r0, r5
4318: f000 fb1f bl 495a <_ZN5Print7printlnEii>
for(int i=0;i<count;i++) {
431c: 3401 adds r4, #1
431e: e7f8 b.n 4312 <_Z17AFunctionNoReturni+0x1a>
4320: 00006538 .word 0x00006538
4324: 2000011c .word 0x2000011c
00004328 <loop>:
AFunctionWithReturn(10);
AFunctionNoReturn(10);
}
Perhaps the most helpful thing that can be said is: "Why do you want to miss out the return statement? what are you hoping to achieve?"
The various language standards (Arduino is sort-of-C++ but with some funny pre-processing) tell you what will happen if you write valid code. They do not always tell you what happens if you write invalid code. In C++, flowing off the end of a non-void function without a return is undefined behavior. In this case the compiler has very helpfully pointed out why your code is wrong, but after that it is totally free to do anything. No matter what it does, this is never a bug in the compiler; it is a bug in your code. This is sometimes called "garbage in - garbage out".
To perhaps explain why you got the particular result you did, think about it like this: the compiler knows that in a valid program execution never runs to the end of the function without a return statement, so if there isn't a return statement after the loop, it is safe to assume that it never leaves the loop. Making this assumption helps to optimize valid code to run faster. If this assumption changes what an invalid program does, then the compiler authors usually don't care. They are usually only interested in what valid programs do.
(Regarding the launchpad page, if you click on the big link at the top of the page, you will see a message about where the site has moved to).
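To make the point about the missing return concrete, here is a minimal sketch of the fix: return a value on every path. It is a stand-in for AFunctionNoReturn, not the original code. Serial is left out so it runs anywhere, and the name count_iterations and its iteration counter are invented here for illustration.

```cpp
#include <cassert>

// Stand-in for the loop in the question, with the missing return added.
// Returns how many times the loop body ran (an illustrative choice).
int count_iterations(int count) {
    int iterations = 0;
    for (int i = 0; i < count; i++) {
        iterations++;            // the sketch in the question prints i here
        if (i > 7)
            break;               // leaves the loop once i reaches 8
    }
    assert(iterations <= 9);     // the break caps the loop at nine passes
    return iterations;           // every path through the function returns
}
```

If you would rather have the build fail than rely on spotting the warning, GCC accepts -Werror=return-type, which promotes this particular warning to an error.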

How to create an array of booleans in arm assembly?

I need to specify each boolean manually like in a fixed table, so using
Array: .skip 400
I will be declaring an array of 400 bytes, so how can I set the boolean values?
ARM registers are 32 bits each. You only need a bit to represent a boolean. So you can use the following 'C' code to access an array,
uint32_t load_bool(uint32_t index)
{
    return (bool_array[index>>2] & (1<<(index&3)));
}
void store_bool(uint32_t index, int value)
{
    uint32_t target = bool_array[index>>2];
    if(value)
        target |= (1<<(index&3));
    else
        target &= ~(1<<(index&3));
    bool_array[index>>2] = target;
}
Use a compiler to target your CPU; for instance, Godbolt output tuned for a Cortex-A5 gives,
load_bool(unsigned int):
ldr r3, =bool_array
mov r2, r0, lsr #2
ldr r3, [r3, r2, asl #2]
and r0, r0, #3
mov r2, #1
and r0, r3, r2, asl r0
bx lr
store_bool(unsigned int, int):
ldr r3, =bool_array
mov r2, r0, lsr #2
cmp r1, #0
ldr r1, [r3, r2, asl #2]
and r0, r0, #3
mov ip, #1
orrne r0, r1, ip, asl r0
biceq r0, r1, ip, asl r0
str r0, [r3, r2, asl #2]
bx lr
The instructions tst, bic, etc. might be useful if you choose a macro instead of a function call (bit index known at compile/assemble time). Also, ldrb or byte access might be better on older platforms/CPUs. Most ARM CPUs have a 32-bit bus, so the cycles for ldrb and ldr are equal.
Boolean variables in C and C++ are basically treated as a native integer assigned 1 for true and 0 for false; in ARM's case it would be a 32-bit integer. So if you need to access the structure as an array of Booleans in C/C++, you would need to access them as 32-bit integers aligned on a 4-byte boundary. However, if you only need to access it from other assembly code, you can use each byte as its own boolean variable and simply manipulate the array on a byte level.
In ARM assembly, this would be the difference between accessing the array with LDR vs with LDRB.
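If space matters, a third option is to use every bit of each 32-bit word (word index = index >> 5, bit position = index & 31), so the 400 flags from the question fit in 13 words. The following is a sketch of that layout, not the code from either answer; the names flags, load_flag and store_flag are chosen here for illustration.

```cpp
#include <cassert>
#include <cstdint>

// 400 one-bit flags packed 32 per word: (400 + 31) / 32 = 13 words.
static uint32_t flags[(400 + 31) / 32];

// Returns 1 if the flag is set, 0 otherwise.
uint32_t load_flag(uint32_t index) {
    assert(index < 400);                       // stay inside the table
    return (flags[index >> 5] >> (index & 31)) & 1u;
}

// Sets or clears a single flag without touching its neighbours.
void store_flag(uint32_t index, bool value) {
    assert(index < 400);
    uint32_t mask = 1u << (index & 31);
    if (value)
        flags[index >> 5] |= mask;
    else
        flags[index >> 5] &= ~mask;
}
```

For a fixed table written by hand in the assembly source, the same layout could be emitted with .word directives instead of .skip, one 32-bit value per 32 flags.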

LC3 trap executed illegal vector number

I'm trying to count the number of characters in LC3 simulator and keep getting "a trap was executed with an illegal vector number".
These are the objects I execute
charcount.obj:
0011000000000000
0101010010100000
0010011000010000
1111000000100011
0110001011000000
0001100001111100
0000010000001000
1001001001111111
0001001001100001
0001001001000000
0000101000000001
0001010010100001
0001011011100001
0110001011000000
0000111111110110
0010000000000100
0001000000000010
1111000000100001
1111000000100101
1110001011111111
0000000000110000
and verse:
.ORIG x3100
.STRINGZ "Simple Simon met a pieman,"
.STRINGZ "Going to the fair;"
.STRINGZ "Says Simple Simon to the pieman,"
.STRINGZ "Let me taste your ware."
.FILL x04
.END
Looks like we're going to need more information before we can help you much. I understand you've provided us with some binary, and I ran that through the LC3 simulator. Here's where I'm a bit lost: which string would you like to count, and where is it stored?
After trying to piece together what you've provided here's what I've found.
Registers:
R0 x0061 97
R1 x0000 0
R2 x0000 0
R3 xE2FF -7425
R4 x0000 0
R5 x0000 0
R6 x0000 0
R7 x3003 12291
PC x3004 12292
IR x62C0 25280
CC Z
Memory:
x3000 0101010010100000 x54A0 AND R2, R2, #0
x3001 0010011000010000 x2610 LD R3, x3012
x3002 1111000000100011 xF023 TRAP IN
x3003 0110001011000000 x62C0 LDR R1, R3, #0
x3004 0001100001111100 x187C ADD R4, R1, #-4
x3005 0000010000001000 x0408 BRZ x300E
x3006 1001001001111111 x927F NOT R1, R1
x3007 0001001001100001 x1261 ADD R1, R1, #1
x3008 0001001001000000 x1240 ADD R1, R1, R0
x3009 0000101000000001 x0A01 BRNP x300B
x300A 0001010010100001 x14A1 ADD R2, R2, #1
x300B 0001011011100001 x16E1 ADD R3, R3, #1
x300C 0110001011000000 x62C0 LDR R1, R3, #0
x300D 0000111111110110 x0FF6 BRNZP x3004
x300E 0010000000000100 x2004 LD R0, x3013
x300F 0001000000000010 x1002 ADD R0, R0, R2
x3010 1111000000100001 xF021 TRAP OUT
x3011 1111000000100101 xF025 TRAP HALT
x3012 1110001011111111 xE2FF LEA R1, x3112
x3013 0000000000110000 x0030 NOP
The values displayed in the registers are what I get when I stop after the instruction at x3003. The LD at x3001 loads the literal value xE2FF, stored at x3012, into register R3. After that, the value of 0 at memory location xE2FF is loaded into register R1, and the problems mount from there.
I would recommend displaying your asm code and then commenting each line so we can better understand what you're trying to accomplish.
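For what it's worth, here is my reading of what the loop at x3004 to x300D is trying to do, written as a C++ sketch. The function name count_char and its parameters are my own; text stands in for whatever memory R3 points at, and target for the character read by TRAP IN.

```cpp
#include <cassert>

// Hypothetical C++ rendering of the loop in the listing above.
int count_char(const char* text, char target) {
    assert(text != nullptr);
    int count = 0;                 // R2, cleared by AND R2, R2, #0
    const char* p = text;          // R3, the pointer loaded from x3012
    while (*p != 0x04) {           // ADD R4, R1, #-4 then BRZ: stop at EOT (x04)
        if (*p == target)          // NOT, ADD #1, ADD R0: tests R0 - R1 == 0
            count++;               // ADD R2, R2, #1
        p++;                       // ADD R3, R3, #1
    }
    return count;                  // the program prints x30 + count
}
```

Two things follow from this reading. The loop walks straight through the zero terminators that .STRINGZ inserts and only stops at the x04 word, and the output is a single character x30 + count, so it only displays sensibly for counts 0 to 9. For the loop to scan the verse at all, the word at x3012 would presumably have to hold x3100 (the verse's .ORIG address); that is an inference from the listing, not something the question states.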

Where to learn LC3 with proper full explanations?

This may seem rather silly, but there are hardly any resources for learning LC-3. I can't seem to find a proper in-depth treatment of the subject that explains how things work. Sure, you can find simple definitions of what certain opcodes and pseudo-ops do, but nothing written out in full and fully explained.
If someone could do a full analysis of the following:
; Hello name in LC-3 assembler
.orig x3000
lea r0, what
puts
lea r1, name
; typical assembly language hack coming up
add r1, r1, #-1
char getc
putc
add r2, r0, #-10
brz completed; was a newline
str r0, r1, #0
add r1, r1, #1
brnzp char
completed lea r0, hello
puts
halt
That would probably be extremely lengthy, but also very appreciated. (Maybe this is the first stack post for a full analysis of LC-3 code resource?)
p.s I don't expect the person who answers to explain what each op/pseudo op code does but at least be very specific about how the operator performs and does its work
I mostly learned how LC3 worked from plugging in code and stepping through it. Though I would reference Appendix A in the book a LOT.
; Hello name in LC-3 assembler
.orig x3000 ; Starting place in memory for our code
lea r0, what ; Load the memory address of variable what
puts ; Print the string whose memory address is stored in R0
lea r1, name ; Load the memory address of the variable name into R1
; typical assembly language hack coming up
add r1, r1, #-1 ; Subtract 1 from R1, then store into R1
char getc ; Get a single char from the user, store into R0
putc ; Print that same char to the console
add r2, r0, #-10 ; R2 = R0 - 10
brz completed ; If the user presses ENTER (Ascii 10)
; and we've subtracted 10 then we'll get a 0, exit program
str r0, r1, #0 ; Store the value of R0 into memory[R1 + 0]
add r1, r1, #1 ; R1 = R1 + 1
brnzp char ; Jump back to char no matter what
completed lea r0, hello ; Load the memory address of variable hello into R0
puts ; Print the string stored in hello
halt ; Stop the program
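If it helps to see the middle of that program in a higher-level language, the getc/str loop is roughly the following C++. This is only a model: the names read_until_newline, input and dest are invented, input stands in for the keyboard, and dest for the address held in R1 (name - 1 in the listing). The data labels what, hello and name are not shown in the question, so nothing here says where they live.

```cpp
#include <cassert>

// Hypothetical model of the loop between the labels char and completed.
int read_until_newline(const char* input, char* dest) {
    assert(input != nullptr && dest != nullptr);
    int stored = 0;
    // Running out of input is a simplification; the LC-3 getc just waits.
    for (const char* p = input; *p != '\0'; p++) {
        char c = *p;              // getc: the character arrives in R0
        if (c - 10 == 0)          // add r2, r0, #-10 then brz completed
            break;                // newline (ASCII 10) ends the loop
        dest[stored++] = c;       // str r0, r1, #0 then add r1, r1, #1
    }
    return stored;                // number of characters written
}
```

Note that the newline itself is never stored, and the loop does not write a terminating zero after the last character; it relies on whatever is already in memory after the name.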
