I am compiling a program for an embedded ARM device, and want to switch from one bootloader to another. Both bootloaders are written in assembler (for the same type of device), but the problem is that they are different dialects/flavors (perhaps Intel vs AT&T?). The existing assembler code compiles happily in gcc, but the one I want to use does not.
For example, the existing (working) code looks like this...
/* Comments are c-style */
.syntax unified
.arch armv7-m
.section .stack
.align 3
#ifdef __STACK_SIZE
.equ Stack_Size, __STACK_SIZE
#else
.equ Stack_Size, 0xc00
#endif
.globl __StackTop
.globl __StackLimit
__StackLimit:
.space Stack_Size
.size __StackLimit, . - __StackLimit
__StackTop:
.size __StackTop, . - __StackTop
... and the code I want to use looks like this ...
; comments are lisp-style
Stack_Size EQU 0x00000400
AREA STACK, NOINIT, READWRITE, ALIGN=3
Stack_Mem SPACE Stack_Size
__initial_sp
; <h> Heap Configuration
; <o> Heap Size (in Bytes) <0x0-0xFFFFFFFF:8>
; </h>
Heap_Size EQU 0x00000200
AREA HEAP, NOINIT, READWRITE, ALIGN=3
__heap_base
Heap_Mem SPACE Heap_Size
__heap_limit
PRESERVE8
THUMB
Notice that the order of operands and the commenting style are different. What type of assembler is this second block? Can gcc be told to expect this format and parse it?
The first one is GNU gas syntax; the second is the syntax of armasm, the assembler in ARM's commercial toolchain.
The formats (directives and label definitions) are not compatible, although the instruction syntax itself is. You cannot assemble one dialect with the other assembler, but the object files they generate can be linked together.
Your code examples contain no instructions, however; they contain only assembler directives that allocate space for the stack and heap.
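As a rough illustration of how the armasm block maps onto GNU as directives, here is a minimal sketch wrapped in a C top-level __asm__ so it can be built and checked with an ordinary gcc or clang on an ELF target. Assumptions: AREA ..., ALIGN=3 becomes .balign 8 (armasm's ALIGN is a power of two); the labels get colons, which gas requires; the symbols are made global only so C code can see them; .bss stands in for whatever sections a real linker script expects (such as the .stack section in the existing code), and unlike armasm's NOINIT it is zero-filled; and the ARM-only THUMB and PRESERVE8 lines are omitted so the sketch stays target-neutral (THUMB corresponds to .thumb). stack_bytes and heap_bytes are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: GNU as rendering of the armasm STACK/HEAP block.
   armasm labels have no colon; gas labels need one. */
__asm__(
    ".equ Stack_Size, 0x00000400\n"   /* Stack_Size EQU 0x00000400 */
    ".equ Heap_Size, 0x00000200\n"    /* Heap_Size EQU 0x00000200 */
    ".pushsection .bss\n"             /* stands in for AREA STACK, NOINIT, READWRITE */
    ".balign 8\n"                     /* ALIGN=3 means 2^3 = 8 bytes */
    ".globl Stack_Mem\n"
    ".globl __initial_sp\n"
    "Stack_Mem: .space Stack_Size\n"  /* Stack_Mem SPACE Stack_Size */
    "__initial_sp:\n"                 /* label just past the end of the stack area */
    ".balign 8\n"                     /* AREA HEAP, NOINIT, READWRITE, ALIGN=3 */
    ".globl __heap_base\n"
    ".globl Heap_Mem\n"
    ".globl __heap_limit\n"
    "__heap_base:\n"
    "Heap_Mem: .space Heap_Size\n"    /* Heap_Mem SPACE Heap_Size */
    "__heap_limit:\n"
    ".popsection\n");

extern char Stack_Mem[], __initial_sp[];
extern char Heap_Mem[], __heap_base[], __heap_limit[];

/* Hypothetical helpers: size of each area, computed from its labels. */
size_t stack_bytes(void) { return (size_t)((uintptr_t)__initial_sp - (uintptr_t)Stack_Mem); }
size_t heap_bytes(void)  { return (size_t)((uintptr_t)__heap_limit - (uintptr_t)__heap_base); }
```

In a real port the same directives would go in a .S file rather than inside C, exactly as the existing startup code does.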
The first looks like AT&T to me, and the second like Intel. I don't think GCC has an option to change which flavor it uses (it runs all of its assembly through GAS, the GNU assembler, which uses AT&T). But if you spend a little time learning the C calling conventions, you can use NASM (the Netwide Assembler, which uses Intel syntax but cannot be used for inline assembly). Note that NASM only targets x86, so this does not help with ARM code. Just put a declaration something like this in one of your C headers:
extern void assembly_boot(void);
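And then in your assembly, implement it (yes, the leading underscore is correct on targets that prefix C symbol names with an underscore, such as 32-bit Windows and macOS; ELF targets such as Linux use no prefix):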
global _assembly_boot
_assembly_boot:
;Blah blah blah
Note: that example doesn't implement the C calling conventions. If you want your assembly to be callable from C, it has to follow the C calling conventions of your platform's ABI. Google them.
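For comparison, a minimal sketch of the same idea without NASM: on x86, GNU as itself accepts Intel syntax through the .intel_syntax noprefix directive, so a C-callable function can be written in Intel syntax inside a top-level __asm__. Assumptions: x86-64 Linux and the System V calling convention (first two integer arguments in edi and esi, result in eax), where C symbols carry no leading underscore; asm_add is a hypothetical name. Like NASM, this is x86-only and does not help with the ARM syntax in the question.

```c
#include <assert.h>

int asm_add(int a, int b);   /* hypothetical function implemented in asm */

#if defined(__x86_64__) && defined(__linux__)
/* System V AMD64 ABI: a arrives in edi, b in esi, the result goes in eax. */
__asm__(
    ".pushsection .text\n"
    ".globl asm_add\n"
    ".type asm_add, @function\n"
    ".intel_syntax noprefix\n"        /* Intel operand order: dest, src */
    "asm_add:\n"
    "    lea eax, [rdi + rsi]\n"      /* eax = a + b (low 32 bits) */
    "    ret\n"
    ".att_syntax prefix\n"            /* restore GCC's default AT&T mode */
    ".size asm_add, . - asm_add\n"
    ".popsection\n");
#else
/* Plain C fallback so the sketch still builds on other targets. */
int asm_add(int a, int b) { return a + b; }
#endif
```

Switching back with .att_syntax prefix matters because the compiler keeps emitting AT&T code after the asm block (assuming the default -masm=att).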
I am trying to emit a global SYMBOL based on a #define VALUE. My attempt is as follows:
__asm__ (".globl SYMBOL");
__asm__ (".set SYMBOL, %0" :: "i" (VALUE));
What is emitted by gcc to the assembler is the following:
.globl SYMBOL
.set SYMBOL, #VALUE
How can I get rid of the hash before VALUE in the .set? FWIW, my target is ARM.
armclang defines various template modifiers that can be used with inline assembly. gcc supports them, in every instance I've checked, although it doesn't document this.
In particular, there is:
c: Valid for an immediate operand. Prints it as a plain value without a preceding #. Use this template modifier when using the operand in .word, or another data-generating directive, which needs an integer without the #.
So you can do
__asm__ (".set SYMBOL, %c0" : : "i" (VALUE));
(There are a few open bugs on the gcc bugzilla suggesting that template/operand modifiers should be documented. The main one seems to be 30527, where I've just posted a comment. The developers' view seems to be that operand modifiers are "compiler internals" not meant for end users, but for arm/aarch64 in particular there are simple things that you just can't do any other way. They made an exception for x86, so why not here?)
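The c modifier is not ARM-specific: GCC documents it for x86 too, where it drops the $ that AT&T syntax puts in front of immediates. A minimal sketch that uses it with a data-generating directive, as the quoted text suggests, and can be checked on an ELF target. Assumptions: the asm sits inside a function because operands are only allowed in asm statements inside functions, and value_word and emit_value_word are hypothetical names. Without %c the operand would be printed with its immediate prefix (#89 on ARM, $89 in x86 AT&T syntax).

```c
#include <assert.h>

#define VALUE 89   /* stand-in for the question's #define */

/* Never called: it exists only so the compiler emits its asm statement. */
__attribute__((used, noinline)) void emit_value_word(void)
{
    __asm__(".pushsection .data\n\t"
            ".globl value_word\n\t"    /* hypothetical symbol name */
            ".balign 4\n"
            "value_word:\n\t"
            ".long %c0\n\t"            /* %c0 prints 89 with no # or $ prefix */
            ".popsection"
            : /* no outputs */
            : "i" (VALUE));
}

extern const int value_word;   /* the word defined by the asm above */
```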
You can use stringizing.
#define VALUE 89
#define xstr(s) str(s)
#define str(s) #s
__asm__ (".globl SYMBOL");
__asm__ (".set SYMBOL, " xstr(VALUE));
VALUE must expand to something that gas will accept as the operand of .set. It could be a fixed address from some vendor documentation, or from a listing output that is parsed. If you want the literal text VALUE, use str(s); if you want 89, use xstr(s). You did not describe the actual use case.
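To make the str/xstr difference concrete, here is a small sketch that builds the directive strings so they can be compared directly; with_str and with_xstr are hypothetical names. The extra level of macro expansion in xstr is what lets VALUE be replaced by 89 before # turns it into a string.

```c
#include <assert.h>
#include <string.h>

#define VALUE 89
#define xstr(s) str(s)   /* expands s first, then stringizes the result */
#define str(s) #s        /* stringizes s exactly as written */

static const char with_str[]  = ".set SYMBOL, " str(VALUE);   /* ".set SYMBOL, VALUE" */
static const char with_xstr[] = ".set SYMBOL, " xstr(VALUE);  /* ".set SYMBOL, 89" */

/* The same string concatenation works in a file-scope asm statement. */
__asm__(".globl SYMBOL\n\t.set SYMBOL, " xstr(VALUE));
```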
I'm trying to run some simple assembly code: I want to save an address to memory.
I'm moving the address into a register and then moving it into the memory, but for some reason the memory isn't updated.
.data
str1: .asciz "atm course number is 234118"
str2: .asciz "234118"
result: .space 8
.text
.global main
main:
xorq %rax, %rax
xorq %rbx, %rbx
leaq str1, %rax
mov %rax, result(,%rbx,1)
ret
What am I doing wrong?
Your debugger is looking at the wrong instance of result. Your code was always fine (although inefficient: use mov %rax, result(%rip) and don't zero an index, or use mov %rax, result(%rbx) to use the register as a "base", not an "index", which is more efficient).
glibc contains several result symbols, and in GDB info var result shows:
All variables matching regular expression "result":
Non-debugging symbols:
0x000000000040404b result # in your executable, at a normal static address
0x00007ffff7f54f20 result_type
0x00007ffff7f821b8 cached_result
0x00007ffff7f846a0 result # in glibc, at a high address
0x00007ffff7f85260 result # where the dynamic linker puts shared libs
0x00007ffff7f85660 result
0x00007ffff7f86ab8 result
0x00007ffff7f86f48 result
When I do p /x &result to see what address the debugger resolved that symbol to, I get one of the glibc instances, not the instance in your .data section. Specifically, I get 0x7ffff7f85660 as the address, with the content = 0.
When I print the value with a cast (p /x (unsigned long)result), or dump the memory with GDB's x command, I find a 0 there after the store.
(gdb) x /xg &result
0x7ffff7f85660 <result>: 0x0000000000000000
It looks like your system picked a different instance, one that contained a pointer to a libc address or something. I can't copy-paste from your image. These other result variables are probably static int result or whatever inside various .c files in glibc. (And BTW, that looks like a sign of poor coding style; usually you want to return a value instead of setting a global or static. But glibc is old and/or maybe there's some justification for some of those.)
Your result: is the asm a compiler would make for static void* result if it didn't get optimized away. Except it would put it in .bss instead of .data because it's zero-initialized.
You're using SASM. I used GDB to get more details on exactly what's going on. Looking at the address of result in SASM's debug pane might have helped. But now that we've identified the problem using GDB, we can change your source to fix it for SASM.
You can use .globl result to make it an externally-visible symbol so it "wins" when the debugger is looking for symbols.
I added that and compiled again with gcc -g -no-pie store.s. It works as expected now, with p /x (unsigned long)result giving 0x404028
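Putting both suggestions from this answer together, here is a minimal sketch of the question's program with result made global and both memory accesses RIP-relative, wrapped in C top-level asm so it can be checked. Assumptions: x86-64 ELF (for example Linux) and the compiler's default AT&T syntax; store_address is a hypothetical stand-in for the question's main, and the #else branch is a plain-C fallback for other targets.

```c
#include <assert.h>

extern const char str1[];
extern void *result;
void store_address(void);   /* hypothetical stand-in for the question's main */

#if defined(__x86_64__) && defined(__ELF__)
__asm__(
    ".pushsection .data\n"
    ".globl str1\n"
    ".globl result\n"                 /* global, so it "wins" the debugger's lookup */
    "str1:   .asciz \"atm course number is 234118\"\n"
    ".balign 8\n"
    "result: .space 8\n"
    ".popsection\n"
    ".pushsection .text\n"
    ".globl store_address\n"
    ".type store_address, @function\n"
    "store_address:\n"
    "    leaq str1(%rip), %rax\n"     /* RIP-relative: no index register to zero */
    "    movq %rax, result(%rip)\n"   /* store the 8-byte address */
    "    ret\n"
    ".size store_address, . - store_address\n"
    ".popsection\n");
#else
const char str1[] = "atm course number is 234118";
void *result;
void store_address(void) { result = (void *)str1; }
#endif
```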
int 0x80 is a system call, and 0x80 is 128 written in hex.
Why does the kernel treat int 0x80 as an interrupt, while when I declare int x it knows x is just an integer, and vice versa?
You appear to be confused about the difference between C and assembly language. Both are programming languages, (nowadays) both accept the 0xNNNN notation for writing numbers in hexadecimal, and there's usually some way to embed tiny snippets of assembly language in a C program, but they are different languages. The keyword int means something completely different in C than it does in (x86) assembly language.
To a C compiler, int always and only means to declare something involving an integer, and there is no situation where you can immediately follow int with a numeric literal. int 0x80 (or int 128, or int 23, or anything else of the sort) is always a syntax error in C.
To an x86 assembler, int always and only means to generate machine code for the INTerrupt instruction, and a valid operand for that instruction (an "imm8", i.e. a number in the range 0–255) must be the next thing on the line. int x; is a syntax error in x86 assembly language, unless x has been defined as a constant in the appropriate range using the assembler's macro facilities.
Obvious follow-up question: If a C compiler doesn't recognize int as the INTerrupt instruction, how does a C program (compiled for x86) make system calls? There are four complementary answers to this question:
Most of the time, in a C program, you do not make system calls directly. Instead, you call functions in the C library that do it for you. When processing your program, as far as the C compiler knows, open (for instance) is no different from any other external function. So it doesn't need to generate an int instruction. It just emits call open.
But the C library is just more C that someone else wrote for you, isn't it? Yet, if you disassemble the implementation of open, you will indeed see an int instruction (or maybe syscall or sysenter instead). How did the people who wrote the C library do that? They wrote that function in assembly language, not in C. Or they used that technique for embedding snippets of assembly language in a C program, which brings us to ...
How does that work? Doesn't that mean the C compiler does need to understand int as an assembly mnemonic sometimes? Not necessarily. Let's look at the GCC syntax for inserting assembly—this could be an implementation of open for x86/32/Linux:
int open(const char *path, int flags, mode_t mode)
{
    int ret;
    /* i386 Linux: syscall number in eax, arguments in ebx, ecx, edx */
    asm volatile ("int $0x80"   /* AT&T spelling of int 0x80 */
                  : "=a" (ret)
                  : "0" (SYS_open), "b" (path), "c" (flags), "d" (mode)
                  : "memory");
    if (ret >= 0) return ret;
    errno = -ret;               /* the kernel returns -errno on failure */
    return -1;
}
You don't need to understand the bulk of that: the important thing for the purpose of this question is that, yes, it says int $0x80 (the AT&T spelling of int 0x80), but it says it inside a string literal. The compiler will copy the contents of that string literal, verbatim, into the generated assembly-language file that it will then feed to the assembler. It doesn't need to know what it means. That's the assembler's job.
More generally, there are lots of words that mean one thing in C and a completely different thing in assembly language. A C compiler produces assembly language, so it has to "know" both of the meanings of those words, right? It does, but it does not confuse them, because they are always used in separate contexts. "add" being an assembly mnemonic that the C compiler knows how to use, does not mean that there is any problem with naming a variable "add" in a C program, even if the "add" instruction gets used in that program.
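A small sketch of that last point, assuming GCC-style inline asm: the same function uses add as an ordinary C variable name and as an x86 mnemonic inside a string literal, and the compiler never confuses the two. add_two is a hypothetical name, and the #else branch is a plain-C fallback for non-x86 targets.

```c
#include <assert.h>

int add_two(int x, int y)
{
    int add = x;                 /* "add" is just a C identifier here */
#if defined(__x86_64__) || defined(__i386__)
    __asm__("addl %1, %0"        /* "add" is an x86 mnemonic here (AT&T: src, dest) */
            : "+r" (add)
            : "r" (y));
#else
    add += y;                    /* portable fallback */
#endif
    return add;
}
```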
Is there a way in ARM assembly to place the address of an array into a register?
Something similar to
__asm__("movl %0,%%eax"::"r"(&array1));
AT&T syntax for X86
My initial attempt was along these lines:
__asm__("LDR R0,%0" :: "m" (&array1));
Can you give me any suggestions, or point me to a place where I can look for this?
This should work:
int a[10];
asm volatile("mov r0, %[a]" : : [a] "r" (a) : "r0");   /* r0 is overwritten, so list it as a clobber */
ARM GCC Inline Assembler Cookbook is a very good resource to get syntax right.
Also look at Specifying Registers for Local Variables in the GCC docs. You can directly specify registers for variables (the GCC manual's own example uses the m68k register a5; on ARM use a name such as r5):
register int *foo asm ("r5");
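A minimal sketch of the answer's "r" approach that also leaves the register choice to the compiler: the "r" input constraint makes GCC load the address of array1 into some register before the asm runs, and an output operand hands the result back, so no hard-coded r0 (and no clobber) is needed. The x86 branch and plain-C fallback exist only so the sketch builds and can be checked off-target; address_of_array1 is a hypothetical name.

```c
#include <assert.h>

int array1[10];

int *address_of_array1(void)
{
    int *p;
#if defined(__arm__) || defined(__aarch64__)
    __asm__("mov %0, %1" : "=r" (p) : "r" (array1));   /* ARM order: dest, src */
#elif defined(__x86_64__) || defined(__i386__)
    __asm__("mov %1, %0" : "=r" (p) : "r" (array1));   /* AT&T order: src, dest */
#else
    p = array1;                                        /* portable fallback */
#endif
    return p;
}
```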
GCC provides a __BIGGEST_ALIGNMENT__ pre-defined macro, which is the largest alignment ever used for any data type on the target machine you are compiling for. I cannot seem to find an LLVM equivalent for this. Is there one? If not, what is the best way to figure it out (preferably with the preprocessor)?
This isn't accessible from the preprocessor, but __attribute__((aligned)) or __attribute__((__aligned__)) (with the alignment value omitted) will give the alignment you want. This is supposed to give the largest alignment of any built-in type, which is 16 on x86-64 by default; the exact value is target-dependent (GCC, for example, raises it when AVX is enabled).
For example:
$ cat align.c
struct foo {
char c;
} __attribute__((aligned)) var;
$ clang align.c -S -o - -emit-llvm
...
@var = global %struct.foo zeroinitializer, align 16
This is used by unwind.h for _Unwind_Exception:
struct _Unwind_Exception
{
_Unwind_Exception_Class exception_class;
_Unwind_Exception_Cleanup_Fn exception_cleanup;
_Unwind_Word private_1;
_Unwind_Word private_2;
/* ### The IA-64 ABI says that this structure must be double-word aligned.
Taking that literally does not make much sense generically. Instead we
provide the maximum alignment required by any type for the machine. */
} __attribute__((__aligned__));
This is in llvm internals as TargetData::PointerABIAlign, but it doesn't appear to be exposed to code. I'd just hard code to 16 bytes, as it seems like it'd be a while before we see any more aligned types or instruction sets.
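One way to combine both suggestions, as a sketch: use __BIGGEST_ALIGNMENT__ from the preprocessor when the compiler defines it and fall back to the hard-coded 16 otherwise, and use the empty aligned attribute when a compile-time (but not preprocessor) value is enough. MY_BIGGEST_ALIGNMENT, ATTR_BIGGEST_ALIGNMENT and round_up_biggest are hypothetical names, and the two values are not guaranteed to agree on every compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Preprocessor-visible value: the compiler's macro if present, else a guess. */
#ifdef __BIGGEST_ALIGNMENT__
#  define MY_BIGGEST_ALIGNMENT __BIGGEST_ALIGNMENT__
#else
#  define MY_BIGGEST_ALIGNMENT 16   /* hard-coded assumption */
#endif

/* Type-level value: an empty aligned attribute requests the target's
   largest default alignment (not usable in #if). */
struct biggest_aligned { char c; } __attribute__((aligned));
#define ATTR_BIGGEST_ALIGNMENT _Alignof(struct biggest_aligned)

/* Round n up to a multiple of the (power-of-two) preprocessor value. */
static size_t round_up_biggest(size_t n)
{
    return (n + (MY_BIGGEST_ALIGNMENT - 1)) & ~(size_t)(MY_BIGGEST_ALIGNMENT - 1);
}
```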