GNU assembler for MIPS: how to emit sync_* instructions?

MIPS32 ISA defines the following format for the sync instruction:
SYNC (stype = 0 implied)
SYNC stype
Here, stype may be SYNC_WMB (SYNC 4), SYNC_MB (SYNC 16), etc.
In inline assembler, I may use default sync: __asm__ volatile ("sync" ::);.
But, if I write something like __asm__ volatile ("sync 0x10" ::), it doesn't compile:
Error: illegal operands 'sync 0x10'
The same happens if I pass the -mips32r2 option to gcc.
So, the question is: how to use SYNC_* (SYNC_WMB, SYNC_MB, SYNC_ACQUIRE, ...) instructions from GCC inline assembly?

I suspect that your binutils are too old - it looks like support for this was only added in version 2.20.
As a workaround, if you can't upgrade your binutils easily, you could construct the opcode by hand.
sync is an opcode 0 instruction with a function code (bits 5..0) of 0xf, and this form of it encodes the sync type in the shift amount field (bits 10..6). So, e.g. for sync 0x10:
__asm__ volatile(".word (0x0000000f | (0x10 << 6))");
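The hand-encoding described above can be checked on any host with a small helper. This is only a sketch: the names mips_sync_word and MIPS_SYNC are made up for illustration, and the "memory" clobber in the macro is an added assumption (it is not part of the original workaround).

```c
#include <assert.h>
#include <stdint.h>

/* Build the 32-bit MIPS32 encoding of "sync stype" by hand.
 * Opcode (bits 31..26) is 0, the function code (bits 5..0) is 0xf,
 * and stype goes into the shift-amount field (bits 10..6). */
uint32_t mips_sync_word(unsigned stype)
{
    assert(stype < 32u);                 /* the field is only 5 bits wide */
    return 0x0000000fu | ((uint32_t)stype << 6);
}

/* Hypothetical convenience macro for MIPS targets with an old assembler.
 * It pastes the stype into the same .word expression used above.
 * The "memory" clobber is an assumption added for barrier use. */
#define MIPS_STR2(x) #x
#define MIPS_STR(x)  MIPS_STR2(x)
#define MIPS_SYNC(stype) \
    __asm__ volatile(".word (0x0000000f | ((" MIPS_STR(stype) ") << 6))" ::: "memory")
```

For sync 0x10 the helper yields 0x0000040f, which is the value the .word expression evaluates to.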

Related

Branching to a c symbol from thumb inline assembly

I'm on a Cortex-M0+ device (Thumb only) and I'm trying to dynamically generate some code in ram and then jump to it, like so:
uint16_t code_buf[18];
...
void jump() {
register volatile uint32_t* PASET asm("r0") = &(PA->OUTSET.reg);
register volatile uint32_t* PACLR asm("r1") = &(PA->OUTCLR.reg);
register uint32_t set asm("r2") = startset;
register uint32_t cl0 asm("r3") = clears[0];
register uint32_t cl1 asm("r4") = clears[1];
register uint32_t cl2 asm("r5") = clears[2];
register uint32_t cl3 asm("r6") = clears[3];
register uint32_t dl0 asm("r8") = delays[0];
register uint32_t dl1 asm("r9") = delays[1];
register uint32_t dl2 asm("r10") = delays[2];
register uint32_t dl3 asm("r11") = delays[3];
asm volatile (
"bl code_buf\n"
: [set]"+r" (set) : [PASET]"r" (PASET), [PACLR]"r" (PACLR), [cl0]"r" (cl0), [cl1]"r" (cl1), [cl2]"r" (cl2), [cl3]"r" (cl3), [dl0]"r" (dl0), [dl1]"r" (dl1), [dl2]"r" (dl2), [dl3]"r" (dl3) : "lr"
);
}
The code in code_buf will use the arguments passed via registers (that's why I'm forcing specific registers).
This code compiles fine, but when I look at the disassembly the branch instruction has been changed to
a14: f004 ebb0 blx 0x5178
Which would try to switch the cpu to ARM mode and cause a HardFault. Is there a way to force the assembler to keep the branch as a simple bl?

So it turns out that the toolchain I was using (gcc 4.8) is buggy, and makes two errors: it interprets code_buf as an arm address, and produces a bogus blx label which isn't even legal on a cortex-m0+. I updated it to 6.3.1 and the inline asm was converted to a bl label as it was supposed to.

From section 4.1.1 of the ARMv6-M Architecture Reference Manual:
Thumb interworking is held as bit [0] of an interworking address.
Interworking addresses are used in the following instructions: BX,
BLX, or POP that loads the PC.
ARMv6-M only supports the Thumb
instruction Execution state, therefore the value of address bit [0]
must be 1 in interworking instructions, otherwise a fault occurs. All
instructions ignore bit [0] and write bits [31:1]:’0’ when updating
the PC.
The target of your branch, code_buf, will be word-aligned (possibly double-word aligned) so bit 0 will be clear in its address. The key is to ensure that bit 0 is set before you branch, and then even if the toolchain selects an interworking instruction you'll remain in thumb mode.
I don't have a development environment in front of me to test this, but I would suggest casting to a pointer-to-single-byte type and using pointer arithmetic to set bit 0:
uint8_t *thumb_target = ((uint8_t *)code_buf) + 1;
asm volatile (
"bl thumb_target\n"
: [set]"+r" (set) : [PASET]"r" (PASET), [PACLR]"r" (PACLR), [cl0]"r" (cl0), [cl1]"r" (cl1), [cl2]"r" (cl2), [cl3]"r" (cl3), [dl0]"r" (dl0), [dl1]"r" (dl1), [dl2]"r" (dl2), [dl3]"r" (dl3) : "lr"
);
Edit: The above doesn't work, as Peter Cordes points out, because a local variable can't be used in inline ASM in this context. Not being well-versed in gcc's inline ASM, I won't attempt to fix it.
I have now had a chance to test the supplied code though, and gcc 7.2.1 with -S -mtune=cortex-m0plus -fomit-frame-pointer generates a BL not a BLX.
Edit 2: The documentation (section A6.7.14) suggests that only the register-target version of BLX is present in the ARMv6-M architecture (this is in common with the ARMv7 devices I'm most familiar with) and so it looks to me as if the fault is caused not by an attempt to switch to ARM mode but by an illegal instruction. Is your compiler correctly configured?
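The bit-0 rule quoted from the manual can be sketched in plain C without any inline asm. The helper name thumb_interworking_addr is hypothetical, and the sketch only computes the address; it does not perform the branch.

```c
#include <assert.h>
#include <stdint.h>

/* Return the address of buf with bit 0 set, i.e. a Thumb interworking
 * address. buf is assumed to be at least halfword-aligned, so bit 0 of
 * its plain address is clear to begin with. */
uintptr_t thumb_interworking_addr(const void *buf)
{
    assert(((uintptr_t)buf & 1u) == 0u); /* alignment precondition */
    return (uintptr_t)buf | 1u;
}
```

On the target, one would presumably cast the result to a function pointer and call through it, in which case the compiler typically emits a register-target branch rather than a bl to a label. That part is untested here and would still need the register setup from the question.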

IDK why your assembler would be changing bl into blx. Mine doesn't, using arm-none-eabi-gcc 7.3.0 on Arch Linux. arm-none-eabi-as --version shows Binutils 2.30.
unsigned short code_buf[18];
void jump() {
asm("bl code_buf");
asm("blx code_buf"); // still assembles to BL, not BLX
// asm("blx jump");
// asm("bl jump");
}
compiled with arm-none-eabi-gcc -O2 -nostdlib arm-bl.c -mcpu=cortex-m0plus -mthumb (I made a linked executable with -nostdlib so I could see actual branch displacements, not placeholders).
Disassembling with arm-none-eabi-objdump -d a.out shows
00008000 <jump>:
8000: f010 f804 bl 1800c <__data_start>
8004: f010 f802 bl 1800c <__data_start>
8008: 4770 bx lr
800a: 46c0 nop ; (mov r8, r8)
Your f004 ebb0 may be a Thumb2 encoding for BLX. I don't know why you're getting it.
The Thumb encoding for bl is documented in section 5.19 of this ARM7TDMI ISA manual ("long branch with link"), but that manual doesn't mention a Thumb encoding for blx at all (because it's only Thumb, not Thumb 2). The Thumb bl encoding stores the branch displacement right-shifted by 1 (i.e. without the low bit), and always stays in Thumb mode.
It's actually two separate instructions: one which puts the high 11 bits of the displacement (shifted left by 12 and added to the PC) into LR, and another which branches and updates LR to the return address. (This 2-instruction hack allows Thumb1 to work without Thumb2 32-bit instructions). Both instructions start with f, so your disassembly shows that you got something else; the first 16-bit chunk of f004 ebb0 is the LR setup, but ebb0 doesn't match any Thumb 1 instruction.
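To make the two-halfword encoding concrete, here is a rough decoder for the classic Thumb bl pair, assuming the ARM7TDMI layout (a 5-bit prefix and an 11-bit offset field in each halfword). The function names are invented for this sketch, and it deliberately ignores the wider Thumb-2 form of bl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if (hi, lo) looks like a classic Thumb BL pair:
 * first halfword starts with 11110, second with 11111. */
bool is_thumb_bl_pair(uint16_t hi, uint16_t lo)
{
    return (hi & 0xf800u) == 0xf000u && (lo & 0xf800u) == 0xf800u;
}

/* Compute the branch target of a BL pair whose first halfword is at pc. */
uint32_t thumb_bl_target(uint32_t pc, uint16_t hi, uint16_t lo)
{
    assert(is_thumb_bl_pair(hi, lo));
    int32_t high = hi & 0x7ff;            /* upper 11 bits of the offset */
    if (high & 0x400)
        high -= 0x800;                    /* sign-extend the 11-bit field */
    int32_t disp = high * 4096 + (int32_t)((lo & 0x7ffu) << 1);
    return pc + 4u + (uint32_t)disp;      /* PC reads as instruction + 4 */
}
```

Feeding it the pairs from the objdump listing above (f010 f804 at 0x8000 and f010 f802 at 0x8004) gives 0x1800c for both, matching the disassembly, while f004 ebb0 is rejected because the second halfword does not carry the 11111 prefix.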
Possibly asm("bl code_buf+1" : ...); or blx code_buf+1 could work, if the +1 convinces the assembler to treat it as a Thumb target. But you might need to use asm to get a .thumb_func directive applied to code_buf somehow to keep your assembler happy.

How does the `asm()` function works in C language?

I am learning operating system development and am, of course, a beginner. I would like to build my system in a real-mode environment, which is a 16-bit environment, using the C language.
In C, I used asm() to switch the code to 16-bit as follows:
asm(".code16")
which, in GCC's language, tells it to generate 16-bit executables (not exactly, though).
Question:
Suppose I have two header files head1.h and head2.h and a main.c file. The contents of main.c file are as follows:
asm(".code16");
#include<head1.h>
#include<head2.h>
int main(){
return 0;
}
Now, since I started my code with the command to generate a 16-bit executable and then included head1.h and head2.h, will I need to do the same in every header file I create? Or is it sufficient to add the line asm(".code16"); once?
OS: Ubuntu
Compiler: Gnu CC

To answer your question: It suffices for the asm block to be present at the beginning of the translation unit.
So putting it once at the beginning will do.
But you can do better: you can avoid it altogether and use the -m16 command line option (available from 5.2.0) instead.
The effect of -m16 and .code16 is to make 32-bit code executable in real mode; it is not to produce real-mode code.
Look
16.c
int main()
{
return 4;
}
Extracting the raw .text segment
>gcc -c -m16 16.c
>objcopy -j .text -O binary 16.o 16.bin
>ndisasm 16.bin
we get
00000000 6655 push ebp
00000002 6689E5 mov ebp,esp
00000005 6683E4F0 and esp,byte -0x10
00000009 66E800000000 call dword 0xf
0000000F 66B804000000 mov eax,0x4
00000015 66C9 o32 leave
00000017 66C3 o32 ret
Which is just 32-bit code filled with operand size prefixes.
On a real pre-386 machine this won't work as the 66h opcode is UD.
There are old 16-bit compilers, like Turbo C [1], that address the problems of real-mode applications properly.
Alternatively, switch to protected mode as soon as possible or consider using UEFI.
[1] It is available online. This compiler is as old as me!

You do not need to add asm(".code16") to either head1.h or head2.h.
The main reason is how the C preprocessor works. It substitutes the contents of head1.h and head2.h into main.c in place of the #include directives.
Please check How `#include' Works for further information.
Hope it helps!
Best regards,
Miguel Ángel

How to set gcc or clang to use Intel syntax permanently for inline asm() statements?

I have the following code, which compiles fine with the gcc command gcc ./example.c. The program itself calls the function "add_two", which simply adds two integers. To use Intel syntax within the extended assembly instructions I need to switch to Intel first and then back to AT&T. According to the gcc documentation it is possible to switch to Intel syntax entirely by using gcc -masm=intel ./example.c.
Whenever I try to compile it with the switch -masm=intel it won't compile, and I don't understand why. I already tried deleting the .intel_syntax directive, but it still doesn't compile.
#include <stdio.h>
int add_two(int, int);
int main(){
int src = 3;
int dst = 5;
printf("summe = %d \n", add_two(src, dst));
return 0;
}
int add_two(int src, int dst){
int sum;
asm (
".intel_syntax;" //switch to intel syntax
"mov %0, %1;"
"add %0, %2;"
".att_syntax;" //switch to at&t syntax
: "=r" (sum) //output
: "r" (src), "r" (dst) //input
);
return sum;
}
The error message by compiling the above mentioned program with gcc -masm=intel ./example.c is:
/tmp/ccEQGI4U.s: Assembler messages:
/tmp/ccEQGI4U.s:55: Error: junk `PTR [rbp-4]' after expression
/tmp/ccEQGI4U.s:55: Error: too many memory references for `mov'
/tmp/ccEQGI4U.s:56: Error: too many memory references for `mov'

Use -masm=intel and don't use any .att_syntax directives in your inline asm. This works with GCC and I think ICC, and with any constraints you use. Other methods don't. (See Can I use Intel syntax of x86 assembly with GCC? for a simple answer saying that; this answer explores exactly what goes wrong, including with clang 13 and earlier.)
That also works in clang 14 and later. (Which isn't released yet but the patch is part of current trunk; see https://reviews.llvm.org/D113707).
Clang 13 and earlier would always use AT&T syntax for inline asm, both in substituting operands and in assembling as op src, dst. But even worse, clang -masm=intel would do that even when taking the Intel side of an asm template using dialect-alternatives like asm ("add {att | intel}" : ... )!
clang -masm=intel did still control how it printed asm after its built-in assembler turned an asm() statement into some internal representation of the instruction. e.g. Godbolt showing clang13 -masm=intel turning add %0, 1 into add dword ptr [1], eax, but clang trunk producing add eax, 1.
Some of the rest of this answer talking about clang hasn't been updated for this new clang patch.
Clang does support Intel-syntax inside MSVC-style asm-blocks, but that's terrible (no constraints so inputs / outputs have to go through memory).
If you were hard-coding register names with clang, -masm=intel would be usable (or the equivalent -mllvm --x86-asm-syntax=intel). But it chokes on mov %eax, 5 in Intel-syntax mode so you can't let %0 expand to an AT&T-syntax register name.
-masm=intel makes the compiler use .intel_syntax noprefix at the top of its asm output file, and use Intel-syntax when generating asm from C outside your inline-asm statement. Using .att_syntax at the bottom of your asm template breaks the compiler's asm, hence the error messages like PTR [rbp-4] looking like junk to the assembler (which is expecting AT&T syntax).
The "too many memory references for mov" error is because in AT&T syntax, mov eax, ebx is a mov from a memory operand (with symbol name eax) to a memory operand (with symbol name ebx).
Some people suggest using .intel_syntax noprefix and .att_syntax prefix around your asm template. That can sometimes work but it's problematic. And incompatible with the preferred method of -masm=intel.
Problems with the "sandwich" method:
When the compiler substitutes operands into your asm template, it will do so according to -masm=. This will always break for memory operands (the addressing-mode syntax is completely different).
It will also break with clang even for registers. Clang's built-in assembler does not accept %eax as a register name in Intel-syntax mode, and doesn't accept .intel_syntax prefix (as opposed to the noprefix that's usually used with Intel-syntax).
Consider this function:
int foo(int x) {
asm(".intel_syntax noprefix \n\t"
"add %0, 1 \n\t"
".att_syntax"
: "+r"(x)
);
return x;
}
It assembles as follows with GCC (Godbolt):
movl %edi, %eax
.intel_syntax noprefix
add %eax, 1 # AT&T register name in Intel syntax
.att_syntax
The sandwich method depends on GAS accepting %eax as a register name even in Intel-syntax mode. GAS from GNU Binutils does, but clang's built-in assembler doesn't.
On a Mac, even using real GCC the asm output has to assemble with an as that's based on clang, not GNU Binutils.
Using clang on that source code complains:
<source>:2:35: error: unknown token in expression
asm(".intel_syntax noprefix \n\t"
^
<inline asm>:2:6: note: instantiated into assembly here
add %eax, 1
^
(The first line of the error message didn't handle the multi-line string literal very well. If you use ; instead of \n\t and put everything on one line the clang error message works better but the source is a mess.)
I didn't check what happens with "ri" constraints when the compiler picks an immediate; it will still decorate it with $ but IDK if GAS silently ignores that, too, in Intel syntax mode.
PS: your asm statement has a bug: you forgot an early-clobber on your output operand so nothing is stopping the compiler from picking the same register for the %0 output and the %2 input that you don't read until the 2nd instruction. Then mov will destroy an input.
But using mov as the first or last instruction of an asm-template is usually also a missed-optimization bug. In this case you can and should just use lea %0, [%1 + %2] to let the compiler add with the result written to a 3rd register, non-destructively. Or just wrap the add instruction (using a "+r" operand and an "r", and let the compiler worry about data movement.) If it had to load the value from memory anyway, it can put it in the right register so no mov is needed.
PS: it's possible to write inline asm that works with -masm=intel or att, using GNU C inline asm dialect alternatives. e.g.
void atomic_inc(int *p) {
asm( "lock add{l $1, %0 | %0, 1}"
: "+m" (*p)
:: "memory"
);
}
compiles with gcc -O2 (-masm=att is the default) to
atomic_inc(int*):
lock addl $1, (%rdi)
ret
Or with -masm=intel to:
atomic_inc(int*):
lock add DWORD PTR [rdi], 1
ret
Notice that the l suffix is required for AT&T, and the dword ptr is required for intel, because memory, immediate doesn't imply an operand-size. And that the compiler filled in valid addressing-mode syntax for both cases.
This works with clang, but only the AT&T version ever gets used.
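As a sketch of the earlier advice to wrap only the add instruction, the question's add_two could be written with a "+r" operand and the same dialect-alternative trick, so it should assemble with or without -masm=intel. The non-x86 fallback branch is an addition made only so the sketch builds on other hosts.

```c
/* Add two ints with a single inline-asm instruction and let the compiler
 * handle all data movement. The {att|intel} alternatives select the right
 * operand order for whichever -masm= dialect is in effect. */
int add_two(int src, int dst)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("add{l %1, %0| %0, %1}"
             : "+r" (dst)   /* read-write operand: dst += src */
             : "r" (src)    /* plain input, never written */
             : "cc");       /* add modifies the flags */
    return dst;
#else
    return src + dst;       /* fallback so the sketch builds elsewhere */
#endif
}
```

Because the output is read-write and there is no leading mov, the early-clobber problem mentioned above does not arise here.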

Note that -masm= also affects the default inline assembler syntax:
Output assembly instructions using selected dialect. Also affects
which dialect is used for basic "asm" and extended "asm". Supported
choices (in dialect order) are att or intel. The default is att.
Darwin does not support intel.
That means that your first .intel_syntax directive is superfluous and the final .att_syntax is wrong because your GCC call compiles C to Intel assembler code.
IOW, either stick to -masm=intel or sandwich your inline Intel assembler code sections between .intel_syntax noprefix and .att_syntax prefix directives - but don't do both.
Note that the sandwich method isn't compatible with all inline assembler constraints - e.g. a constraint that involves m (i.e. memory operand) would insert an operand in ATT syntax which would yield an error like 'Error: junk (%rbp) after expression'. In those cases you have to use -masm=intel.

How to declare and initialize local variables in gcc inline assembly without extended inline asm?

I know this is a very basic question but I am really stuck on it. In fact I am an absolute newbie to GCC syntax.
I want to have local variables (Stack addresses with labels) without using extended inline assembly. Something like the following code in Intel syntax:
DATA1 DB 100
MOV AL, DATA1
This is the code I guess may substitute in GCC:
int someFunction(int x)
{
__asm__ volatile(
"function1:"
".data;"
".2byte $4 data1 ;"
".text;"
"pushq %rbp;"
"movq %rsp , %rbp ;"
"movl var , %eax;" // this is source of error
"popq %rbp;"
"leaveq;"
"retq ; "
);
}
But this code results in this error:
symbol(s) not found for architecture x86_64
I can use global variables in x86, but I get the same result in x64 (x86_64).
Setting: LLVM 4.1; Cocoa used in Xcode 4
What is the correct syntax?

Basic GCC inline assembly doesn't support local variables; use GCC's extended syntax instead.
If you are uncomfortable with AT&T syntax there are ways to use Intel syntax on GCC.
This is an excellent how-to on GCC asm.
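For illustration, the DATA1 / MOV AL, DATA1 idea from the question might look roughly like this with extended asm, where the variable is an ordinary C local and the compiler supplies its stack address. The function name load_local_byte is made up, the "q" constraint is a choice made for this sketch, and the non-x86 branch exists only so it builds on other hosts.

```c
/* Load a byte-sized C local through extended asm instead of trying to
 * declare the data inside the asm text. */
unsigned char load_local_byte(void)
{
    unsigned char data1 = 100;  /* plays the role of "DATA1 DB 100" */
    unsigned char al;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("mov{b %1, %0| %0, %1}"
             : "=q" (al)        /* any byte-addressable register */
             : "m" (data1));    /* compiler fills in the stack address */
#else
    al = data1;                 /* fallback for non-x86 hosts */
#endif
    return al;
}
```

The function prologue, epilogue, and return are left to the compiler, so none of the pushq/popq/retq lines from the question are needed.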

Need to convert old 32-bit GAS code to a current GAS assembler (pushfl/popl)

I am in the process of trying to compile an old project on my modern machine. I know this old project used an old (2.x) version of GCC/GAS so I need to clean it up so that I can compile it with a current version.
The code in question:
__get_flags:
pushfl
popl %eax
ret
gas complains that pushfl is not an instruction. From what I read this is an old deprecated GNU mnemonic, but there must be some sort of x86_64 equivalent, surely?
In addition to answering how to convert this code to x86_64 gas, it would be great if someone could explain where mnemonics like pushfl and pushfq are defined and documented. I am quite sure they are GAS creations...
Note: I want to compile this on gcc 4.6.1 and gas 2.21

The following:
static inline u16 __get_flags(void) {
u16 flags;
asm ("pushf\n\t"
"pop %0" : : "+r"(flags) : "cc");
return flags;
}
should actually be portable for both 64/32bit, x64/x86. But it only gets you the condition codes, not the high parts of the state register. If you require the entire contents of the flags register when running in 64bit (long) mode, use the PUSHFQ instruction name, with a u64 (or other 64bit quantity) operand.

I can't add a comment to the answer because I don't have 50 reputation. Anyway:
In the answer:
"pop %0" : : "+r"(flags) : "cc");
Should read:
"pop %0" : "=r"(flags) : : "cc");
Because the syntax is asm( code : outputs : inputs : clobbers ) and what you want is flags to be a register that's output from the asm block, not used as input. (I tested this and it worked for me, I also used 64-bit ints)
