C function access to R0 to R12 registers - greenhills

I need to write the C function that will return the value of a specific hardware register. For example R0.
I am unclear from the GHS documentation how this is done with the macros provided by the GHS Compiler.
uint32_t readRegRx(void)
{
uint32_t x = 0U;
__asm("MOV Rx, ??");
return x;
}
What is the syntax in the GHS compiler for referencing a local variable as an argument to an inline assembly instruction?
I've seen this in the GHS documentation:
asm int mulword(a,b)
{
%con a %con b
mov r0,a
mov r1,b
mul r0,r1,r0
%con a %reg b
mov r0,a
mul r0,b,r0
%reg a %con b
mov r0,b
mul r0,a,r0
%reg a %reg b
mul r0,a,b
%con a %mem b
mov r0,a
ldr r1,b
mul r0,r1,r0
%mem a %con b
ldr r0,a
mov r1,b
mul r0,r1,r0
%mem a %mem b
ldr r0,a
ldr r1,b
mul r0,r1,r0
%error
}
But this isn't exactly what I want, I think. The example from the documention above describes a function taking arguments. The return value is implicitly in R0.
In my case, what I want is to use a plain C function, with embedded inline assembly to read a register (R-anything) and store the value in a local variable in the function.

I received information from the GHS support and it addresses the means to read hardware registers (Rn) within C functions analogous to the extended GNU ARM features. The following applies to GHS compiler usage, not GNU compiler usage:
"For asm macro purposes (use within an asm macro), it's probably best to use the enhanced GNU asm syntax, and you'll want to turn on "Accept GNU asm statements" (--gnu_asm)."
int bar(void)
{
int a;
__asm__("mov r0, %0" : : "r"(a)); // move a to r0. Replace r0 to suit.
// Stuff
return a;
}
Alternative method:
asm void readR0(r0Val)
{
%reg r0Val
mov r0Val,r0
}
void foo(void)
{
register int regValReg = 0;
readR0(regValReg);
// Stuff
}

Related

All the calculations take place in registers. Why is the stack not storing the result of the register computation here

I am debugging a simple code in c++ and, looking at the disassembly.
In the disassembly, all the calculations are done in the registers. And later, the result of the operation is returned. I only see the a and b variables being pushed onto the stack (the code is below). I don't see the resultant c variable pushed onto the stack. Am I missing something?
I researched on the internet. But on the internet it looks like all variables a,b and c should be pushed onto the stack. But in my Disassembly, I don't see the resultant variable c being pushed onto the stack.
C++ code:
#include<iostream>
using namespace std;
int AddMe(int a, int b)
{
int c;
c = a + b;
return c;
}
int main()
{
AddMe(10, 20);
return 0;
}
Relevant assembly code:
int main()
{
00832020 push ebp
00832021 mov ebp,esp
00832023 sub esp,0C0h
00832029 push ebx
0083202A push esi
0083202B push edi
0083202C lea edi,[ebp-0C0h]
00832032 mov ecx,30h
00832037 mov eax,0CCCCCCCCh
0083203C rep stos dword ptr es:[edi]
0083203E mov ecx,offset _E7BF1688_Function#cpp (0849025h)
00832043 call #__CheckForDebuggerJustMyCode#4 (083145Bh)
AddMe(10, 20);
00832048 push 14h
0083204A push 0Ah
0083204C call std::operator<<<std::char_traits<char> > (08319FBh)
00832051 add esp,8
return 0;
00832054 xor eax,eax
}
As seen above, 14h and 0Ah are pushed onto the stack - corresponding to AddMe(10, 20);
But, when we look at the disassembly for the AddMe function, we see that the variable c (c = a + b), is not pushed onto the stack.
snippet of AddMe in Disassembly:
…
int c;
c = a + b;
00836028 mov eax,dword ptr [a]
0083602B add eax,dword ptr [b]
0083602E mov dword ptr [c],eax
return c;
00836031 mov eax,dword ptr [c]
}
shouldn't c be pushed to the stack in this program? Am I missing something?
All the calculations take place in registers.
Well yes, but they're stored afterwards.
Using memory-destination add instead of just using the accumulator register (EAX) would be an optimization. And one that's impossible when when the result needs to be in a different location than any of the inputs to an expression.
Why is the stack not storing the result of the register computation here
It is, just not with push
You compiled with optimization disabled (debug mode) so every C object really does have its own address in the asm, and is kept in sync between C statements. i.e. no keeping C variables in registers. (Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?). This is one reason why debug mode is extra slow: it's not just avoiding optimizations, it's forcing store/reload.
But the compiler uses mov not push because it's not a function arg. That's a missed optimization that all compilers share, but in this case it's not even trying to optimize. (What C/C++ compiler can use push pop instructions for creating local variables, instead of just increasing esp once?). It would certainly be possible for the compiler to reserve space for c in the same instruction as storing it, using push. But compilers instead to stack-allocation for all locals on entry to a function with one sub esp, constant.
Somewhere before the mov dword ptr [c],eax that spills c to its stack slot, there's a sub esp, 12 or something that reserves stack space for c. In this exact case, MSVC uses a dummy push to reserve 4 bytes space, as an optimization over sub esp, 4.
In the MSVC asm output, the compiler will emit a c = ebp-4 line or something that defines c as a text substitution for ebp-4. If you looked at disassembly you'd just see [ebp-4] or whatever addressing mode instead of.
In MSVC asm output, don't assume that [c] refers to static storage. It's actually still stack space as expected, but using a symbolic name for the offset.
Putting your code on the Godbolt compiler explorer with 32-bit MSVC 19.22, we get the following asm which only uses symbolic asm constants for the offset, not the whole addressing mode. So [c] might just be that form of listing over-simplifying even further.
_c$ = -4 ; size = 4
_a$ = 8 ; size = 4
_b$ = 12 ; size = 4
int AddMe(int,int) PROC ; AddMe
push ebp
mov ebp, esp ## setup a legacy frame pointer
push ecx # RESERVE 4B OF STACK SPACE FOR c
mov eax, DWORD PTR _a$[ebp]
add eax, DWORD PTR _b$[ebp] # c = a+b
mov DWORD PTR _c$[ebp], eax # spill c to the stack
mov eax, DWORD PTR _c$[ebp] # reload it as the return value
mov esp, ebp # restore ESP
pop ebp # tear down the stack frame
ret 0
int AddMe(int,int) ENDP ; AddMe
The __cdecl calling convention, which AddMe() uses by default (depending on the compiler's configuration), requires parameters to be passed on the stack. But there is nothing requiring local variables to be stored on the stack. The compiler is allowed to use registers as an optimization, as long as the intent of the code is preserved.

Disassembly code from visual studio

Using WinDBG for debugging the assembly code of an executable, it seems that compiler inserts some other codes between two sequential statements. The statements are pretty simple, e.g. they don't work with complex objects for function calls;
int a, b;
char c;
long l;
a = 0; // ##
b = a + 1; // %%
c = 1; // ##
l = 1000000;
l = l + 1;
And the disassembly is
## 008a1725 c745f800000000 mov dword ptr [ebp-8],0
008a172c 80bd0bffffff00 cmp byte ptr [ebp-0F5h],0 ss:002b:0135f71f=00
008a1733 750d jne test!main+0x42 (008a1742)
008a1735 687c178a00 push offset test!main+0x7c (008a177c)
008a173a e893f9ffff call test!ILT+205(__RTC_UninitUse) (008a10d2)
008a173f 83c404 add esp,4
008a1742 8b45ec mov eax,dword ptr [ebp-14h]
%% 008a1745 83c001 add eax,1
008a1748 c6850bffffff01 mov byte ptr [ebp-0F5h],1
008a174f 8945ec mov dword ptr [ebp-14h],eax
## 008a1752 c645e301 mov byte ptr [ebp-1Dh],1
Please note that ##, %% and ## in the disassembly list show the corresponding C++ lines.
So what are that call, cmp, jne and push?
It is the compiler run-time error checking (RTC), the RTC switch check for uninitialized variables, I think that you can manage it from Visual Studio (compiler options).
For more information, take a look to this. Section /RTCu switch

What is the correct use of multiple input and output operands in extended GCC asm?

What is the correct use of multiple input and output operands in extended GCC asm under register constraint? Consider this minimal version of my problem. The following brief extended asm code in GCC, AT&T syntax:
int input0 = 10;
int input1 = 15;
int output0 = 0;
int output1 = 1;
asm volatile("mov %[input0], %[output0]\t\n"
"mov %[input1], %[output1]\t\n"
: [output0] "=r" (output0), [output1] "=r" (output1)
: [input0] "r" (input0), [input1] "r" (input1)
:);
printf("output0: %d\n", output0);
printf("output1: %d\n", output1);
The syntax appears correct based on https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html However, I must have overlooked something or be committing some trivial mistake that I for some reason can't see.
The output with GCC 5.3.0 p1.0 (no compiler arguments) is:
output0: 10
output1: 10
Expected output is:
output0: 10
output1: 15
Looking at it in GDB shows:
0x0000000000400581 <+43>: mov eax,DWORD PTR [rbp-0x10]
0x0000000000400584 <+46>: mov edx,DWORD PTR [rbp-0xc]
0x0000000000400587 <+49>: mov edx,eax
0x0000000000400589 <+51>: mov eax,edx
0x000000000040058b <+53>: mov DWORD PTR [rbp-0x8],edx
0x000000000040058e <+56>: mov DWORD PTR [rbp-0x4],eax
From what I can see it loads eax with input0 and edx with input1. It then overwrites edx with eax and eax with edx, making these equal. It then writes these back into output0 and output1.
If I use a memory constraint (=m) instead of a register constraint (=r) for the output, it gives the expected output and the assembly looks more reasonable.
The problem is that GCC assumes that all all output operands are only written at the end of the instruction, after all input operands have been consumed. This means it can use the same operand (eg. a register) as an input operand and an output operand which is what is happening here. The solution is to mark [output0] with an early clobber constraint so that GCC knows that its written to before the end of the asm statement.
For example:
asm volatile("mov %[input0], %[output0]\t\n"
"mov %[input1], %[output1]\t\n"
: [output0] "=&r" (output0), [output1] "=r" (output1)
: [input0] "r" (input0), [input1] "r" (input1)
:);
You don't need to mark [output1] as early clobber because it's only written to at the end of the instruction so it doesn't matter if it uses the same register as [input0] or [input1].

How do I return a byte or short from inline assembly

When I compile:
short foo (short x)
{
__asm ("mov %0, 0" : "=r" (x));
return x;
}
I get
foo:
mov r0, 0
sxth r0, r0
bx lr
I don't want the extra instruction sxth (signed extract halfword).
The reason it is there is because the ABI (ARM AEABI in this case) requires that the function return value is sign extended to a full register, but GCC doesn't know what my assembly code might have done to the register it gave me.
How do I tell GCC that I promise to leave the register correctly sign extended so it doesn't have to insert this extra instruction?
I would have thought that it could infer this from the fact that I gave it a short as the expression for the "=r" constraint.

In GCC-style extended inline asm, is it possible to output a "virtualized" boolean value, e.g. the carry flag?

If I have the following C++ code to compare two 128-bit unsigned integers, with inline amd-64 asm:
struct uint128_t {
uint64_t lo, hi;
};
inline bool operator< (const uint128_t &a, const uint128_t &b)
{
uint64_t temp;
bool result;
__asm__(
"cmpq %3, %2;"
"sbbq %4, %1;"
"setc %0;"
: // outputs:
/*0*/"=r,1,2"(result),
/*1*/"=r,r,r"(temp)
: // inputs:
/*2*/"r,r,r"(a.lo),
/*3*/"emr,emr,emr"(b.lo),
/*4*/"emr,emr,emr"(b.hi),
"1"(a.hi));
return result;
}
Then it will be inlined quite efficiently, but with one flaw. The return value is done through the "interface" of a general register with a value of 0 or 1. This adds two or three unnecessary extra instructions and detracts from a compare operation that would otherwise be fully optimized. The generated code will look something like this:
mov r10, [r14]
mov r11, [r14+8]
cmp r10, [r15]
sbb r11, [r15+8]
setc al
movzx eax, al
test eax, eax
jnz is_lessthan
If I use "sbb %0,%0" with an "int" return value instead of "setc %0" with a "bool" return value, there's still two extra instructions:
mov r10, [r14]
mov r11, [r14+8]
cmp r10, [r15]
sbb r11, [r15+8]
sbb eax, eax
test eax, eax
jnz is_lessthan
What I want is this:
mov r10, [r14]
mov r11, [r14+8]
cmp r10, [r15]
sbb r11, [r15+8]
jc is_lessthan
GCC extended inline asm is wonderful, otherwise. But I want it to be just as good as an intrinsic function would be, in every way. I want to be able to directly return a boolean value in the form of the state of a CPU flag or flags, without having to "render" it into a general register.
Is this possible, or would GCC (and the Intel C++ compiler, which also allows this form of inline asm to be used) have to be modified or even refactored to make it possible?
Also, while I'm at it — is there any other way my formulation of the compare operator could be improved?
Here we are almost 7 years later, and YES, gcc finally added support for "outputting flags" (added in 6.1.0, released ~April 2016). The detailed docs are here, but in short, it looks like this:
/* Test if bit 0 is set in 'value' */
char a;
asm("bt $0, %1"
: "=#ccc" (a)
: "r" (value) );
if (a)
blah;
To understand =#ccc: The output constraint (which requires =) is of type #cc followed by the condition code to use (in this case c to reference the carry flag).
Ok, this may not be an issue for your specific case anymore (since gcc now supports comparing 128bit data types directly), but (currently) 1,326 people have viewed this question. Apparently there's some interest in this feature.
Now I personally favor the school of thought that says don't use inline asm at all. But if you must, yes you can (now) 'output' flags.
FWIW.
I don't know a way to do this. You may or may not consider this an improvement:
inline bool operator< (const uint128_t &a, const uint128_t &b)
{
register uint64_t temp = a.hi;
__asm__(
"cmpq %2, %1;"
"sbbq $0, %0;"
: // outputs:
/*0*/"=r"(temp)
: // inputs:
/*1*/"r"(a.lo),
/*2*/"mr"(b.lo),
"0"(temp));
return temp < b.hi;
}
It produces something like:
mov rdx, [r14]
mov rax, [r14+8]
cmp rdx, [r15]
sbb rax, 0
cmp rax, [r15+8]
jc is_lessthan

Resources