gcc inline assembler shift left problem - gcc

I'm having trouble compiling the code below. It may also have logical errors, please help. thanks,
#include <iostream>
using namespace std;
int main()
{
int shifted_value;
int value = 2;
__asm__("shll %%eax,%1;" : "=a" (shifted_value): "a" (value));
cout<<shifted_value<<endl;
return 0 ;
}
The error is:
Error: suffix or operands invalid for `shl'

It should look like
__asm__("shll %%cl, %%eax;"
: "=a" (shifted_value)
: "a" (shifted_value), "c" (value)
);
Credit for correct code goes to the other answer for pointing out that the operands were in the incorrect order.
You don't need to specify eax as clobbered because eax is an output register.

It can work with shll too, shll being the GNU AT&T mnemonic for "shift left longword". However, the invalid operand error occurs because, in AT&T syntax, the shift count must come first and the destination last, and a variable shift count has to be in %cl. I found this out when I googled up these sources: http://meplayer.googlecode.com/svn-history/r23/trunk/meplayer/src/filters/transform/mpcvideodec/ffmpeg/libavcodec/cabac.h. I also used the excellent http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html#s6
Here is one way that works, taking into account Jesus Ramos's observation that shifted_value needs to be initialized:
jcomeau@intrepid:/tmp$ cat test.cpp; make test; ./test
#include <iostream>
using namespace std;
int main()
{
int shifted_value = 1;
char value = 2;
__asm__("shll %%cl, %%eax;"
: "=a" (shifted_value)
: "a" (shifted_value), "c" (value)
);
cout<<shifted_value<<endl;
return 0 ;
}
g++ test.cpp -o test
4
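As a further sketch (not taken from either answer), the value can be left in whatever register the compiler picks by using a "+r" read-write operand, while only the count is pinned to cl with "c". The name shift_left is hypothetical, and the plain C++ branch is only a fallback for non-x86 targets.

```cpp
#include <cstdint>

// Hypothetical helper: shifts value left by count bits.
std::uint32_t shift_left(std::uint32_t value, std::uint8_t count) {
#if defined(__i386__) || defined(__x86_64__)
    __asm__("shll %%cl, %0"   // AT&T order: count (in cl) first, destination last
            : "+r"(value)     // read-write operand in any general register
            : "c"(count)      // a variable shift count must live in cl
            : "cc");          // shl updates the flags
    return value;
#else
    // Fallback for other targets; x86 also masks the count to 5 bits.
    return value << (count & 31);
#endif
}
```

Calling shift_left(1, 2) mirrors the example above and should give the same result, 4.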

Related

How do you explain gcc's inline assembly constraints for the IN, OUT instructions of i386?

As far as I can tell, the constraints used in gcc inline assembly tell gcc where input and output variables must go (or must be) in order to generate valid assembly. As the Fine Manual says, "constraints on the placement of the operand".
Here's a specific, working example from a tutorial.
static inline uint8_t inb(uint16_t port)
{
uint8_t ret;
asm volatile ( "inb %1, %0"
: "=a"(ret)
: "Nd"(port) );
return ret;
}
inb is AT&T syntax-speak for the i386 IN instruction that receives one byte from an I/O port.
Here are the specs for this instruction, taken from the i386 manual. Note that port numbers go from 0x0000 to 0xFFFF.
IN AL,imm8 // Input byte from immediate port into AL
IN AX,imm8 // Input word from immediate port into AX
IN EAX,imm8 // Input dword from immediate port into EAX
IN AL,DX // Input byte from port DX into AL
IN AX,DX // Input word from port DX into AX
IN EAX,DX // Input dword from port DX into EAX
Given a statement like uint8_t x = inb(0x80); the assembly output is, correctly, inb $0x80,%al. It used the IN AL,imm8 form of the instruction.
Now, let's say I just care about the IN AL,imm8 form, receiving a uint8_t from a port between 0x00 and 0xFF inclusive. The only difference between this and the working example is that port is now a uint8_t template parameter (to make it effectively a constant) and the constraint is now "N".
template<uint8_t port>
static inline uint8_t inb()
{
uint8_t ret;
asm volatile ( "inb %1, %0"
: "=a"(ret)
: "N"(port) );
return ret;
}
Fail!
I thought that the "N" constraint would mean, "you must have a constant unsigned 8-bit integer for this instruction", but clearly it does not because it is an "impossible constraint". Isn't the uint8_t template param a constant unsigned 8-bit integer?
If I replace "N" with "Nd", I get a different error:
./test.h: Assembler messages:
./test.h:23: Error: operand type mismatch for `in'
In this case, the assembler output is inb %dl, %al which obviously is not valid.
Why would this only work with "Nd" and uint16_t and not "N" and uint8_t?
EDIT:
Here's a stripped-down version I tried on godbolt.org:
#include <cstdint>
template<uint8_t N>
class Port {
public:
uint8_t in() const {
uint8_t data;
asm volatile("inb %[port], %%al"
:
: [port] "N" (N)
: // clobbers
);
return data;
}
};
void func() {
Port<0x7F>().in();
}
Interestingly, this works fine, except if you change N to anything between 0x80 and 0xFF. On clang this generates a "128 is out of range for constraint N" error. This generates a more generic error in gcc.
Based on how constraints are documented your code should work as expected.
This appears to still be a bug more than a year later. It appears the compilers are converting N from an unsigned value to a signed value and attempting to pass that into an inline assembly constraint. That of course fails when the value being passed into the constraint can't be represented as an 8-bit signed value. The input constraint "N" is supposed to allow an unsigned 8-bit value, so any value between 0 and 255 (0xff) should be accepted:
N
Unsigned 8-bit integer constant (for in and out instructions).
There is a similar bug report to GCC's bugzilla titled "Constant constraint check sign extends unsigned constant input operands".
In one of the related threads it was suggested that you can work around this issue by ANDing (&) the constant with 0xff (i.e., N & 0xff). I have also found that static casting N to an unsigned type wider than uint8_t works:
#include <cstdint>
template<uint8_t N>
class Port {
public:
uint8_t in() const {
uint8_t data;
asm volatile("inb %[port], %0"
: "=a"(data)
: [port] "N" (static_cast<uint16_t>(N))
: // clobbers
);
return data;
}
};
void func() {
Port<0x7f>().in();
Port<0x80>().in();
// Port<0x100>().in(); // Fails as expected since it doesn't fit in a uint8_t
}
To test this you can play with it on godbolt.
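The suspected mechanism can be modelled in plain C++ without touching any I/O port. This is only an illustration of the sign-extension explanation given above, not a description of compiler internals; the helper names as_signed_8bit, as_widened and fits_N are hypothetical.

```cpp
#include <cstdint>

// Models a uint8_t constant being reinterpreted as a signed 8-bit value.
constexpr int as_signed_8bit(std::uint8_t v) {
    return static_cast<std::int8_t>(v);
}

// Models the workaround: widen to an unsigned 16-bit type first.
constexpr int as_widened(std::uint8_t v) {
    return static_cast<std::uint16_t>(v);
}

// The range that the "N" constraint is documented to accept.
constexpr bool fits_N(int v) {
    return v >= 0 && v <= 255;
}

static_assert(fits_N(as_signed_8bit(0x7f)), "0x7f survives the signed view");
static_assert(!fits_N(as_signed_8bit(0x80)), "0x80 turns negative in the signed view");
static_assert(fits_N(as_widened(0x80)), "widening keeps 0x80 in range");
```

Under this model, every value from 0x80 to 0xff falls outside the accepted range unless it is widened first, which matches the failures reported for Port<0x80> and above.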

Simple assembly example : set inputs and get output - right syntax

I am trying to write a simple example that embeds a piece of 32-bit SPARC assembly in C code; this little snippet increments the variable "sum".
The code is :
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>
int n;
int sum;
int main ()
{
n = 100;
sum = 0;
struct timeval tv1, tv2;
long long diff;
gettimeofday (&tv1, NULL);
asm volatile ("set sum, %g1\n\t" \
"set n, %g3\n" \
"loop:\n\t" \
"add %g1, 1, %g2\n\t" \
"sub %g3, 1, %g4\n\t" \
"bne loop\n\t" \
"nop\n\t" \
: "=r" (sum)
: "r" (n)
);
gettimeofday (&tv2, NULL);
diff = (tv2.tv_sec - tv1.tv_sec) * 1000000L + (tv2.tv_usec - tv1.tv_usec);
printf ("Elapsed time = %d usec\n", diff);
printf ("Sum = %d\n", sum);
return 0;
}
Unfortunately, compilation with gcc4.1.2 produces the following errors :
loop_dev_for-assembly_code.c: In function 'main':
loop_dev_for-assembly_code.c:18: error: invalid 'asm': invalid operand output code
loop_dev_for-assembly_code.c:18: error: invalid 'asm': operand number out of range
loop_dev_for-assembly_code.c:18: error: invalid 'asm': invalid operand output code
loop_dev_for-assembly_code.c:18: error: invalid 'asm': operand number out of range
loop_dev_for-assembly_code.c:18: error: invalid 'asm': operand number out of range
loop_dev_for-assembly_code.c:18: error: invalid 'asm': operand number out of range
It seems line 18 corresponds to "asm volatile ("set sum, %g1\n\t" \ ...".
But I don't know how to get around these errors. They may come from the variable sum, which is set to the %g1 register.
I am also unsure about the link between variables belonging to the C code and variables located in the assembly part. For input and output parameters I have also seen the syntax "=g" (output parameter?) and "g" (input parameter); I think the two forms correspond to different registers.
Could someone give me some clues to understand this link and debug my little code, which runs a simple loop to increment the variable sum?
Thanks for your help, regards.
As somebody else said, there are many errors and misconceptions in your inline assembly code. Here are just a few things. First, in extended asm syntax, you must escape every '%' that names a register with another '%', so for example you need to write '%%g1' instead of '%g1', and do this for all the registers you access. Second, you should not use 'set' for n or sum: 'set' loads the address of a symbol into the register, not the variable's value, and you have already declared both variables as operands of your asm statement, so sum is operand %0 and n is %1. Third, your add instruction puts the result in %g2, which is never initialized or used anywhere, and 'sub' does not update the condition codes that 'bne' tests, whereas 'subcc' does.
I think the entire sequence could be rendered much more simply like this (not tested):
asm volatile ("clr %%g1\n" \
"loop:\n\t" \
"add %%g1, 1, %%g1\n\t" \
"subcc %1, 1, %1\n\t" \
"bne loop\n\t" \
"nop\n\t" \
"mov %%g1, %0\n" \
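: "=r" (sum), "+r" (n) /* n is decremented in place, so it must be a read-write operand */
: /* no input-only operands */
: "g1", "cc" );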
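For readers without SPARC hardware, the same ideas (numbered operands, escaped register names, a local numeric label, and read-write operands for anything the asm modifies) can be tried on x86. This is a rough analogue written for illustration, not a translation of the answer's code; count_up is a hypothetical name and the plain loop is a fallback for other targets.

```cpp
// Hypothetical helper: increments sum once per iteration, n times.
int count_up(int n) {
    if (n <= 0) return 0;   // the asm loop below assumes at least one iteration
    int sum = 0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile(
        "1:\n\t"              // numeric local label, safe if the asm is duplicated
        "addl $1, %0\n\t"     // sum += 1
        "subl $1, %1\n\t"     // n -= 1, sets the zero flag when n reaches 0
        "jnz 1b"              // loop back while n != 0
        : "+r"(sum), "+r"(n)  // both are read and written by the asm
        :
        : "cc");
#else
    for (; n > 0; --n) ++sum;
#endif
    return sum;
}
```

Because n is passed by value, decrementing it inside the asm does not affect the caller's variable.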

What is the role of the clobber list? [duplicate]

This function "strcpy" aims to copy the content of src to dest, and it works out just fine: display two lines of "Hello_src".
#include <stdio.h>
static inline char * strcpy(char * dest,const char *src)
{
int d0, d1, d2;
__asm__ __volatile__("1:\tlodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b"
: "=&S" (d0), "=&D" (d1), "=&a" (d2)
: "0"(src),"1"(dest)
: "memory");
return dest;
}
int main(void) {
char src_main[] = "Hello_src";
char dest_main[] = "Hello_des";
strcpy(dest_main, src_main);
puts(src_main);
puts(dest_main);
return 0;
}
I tried to change the line : "0"(src),"1"(dest) to : "S"(src),"D"(dest), and this error occurred: ‘asm’ operand has impossible constraints. I just cannot understand it. I thought that "0"/"1" here specified the same constraint as output operand 0 and output operand 1. The constraint of output 0 is =&S, and the constraint of output 1 is =&D. If I change 0-->S and 1-->D, there shouldn't be anything wrong. What's the matter with it?
Do "clobbered registers" or the earlyclobber modifier (&) have any use? I tried removing "&" or "memory", and the result in either case is the same as the original: two lines of "Hello_src" are printed. So why should I use the "clobbered" things?
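The earlyclobber & means that the particular output is written before the inputs are consumed. As such, the compiler may not allocate any input to the same register. That is why "S"(src) fails: it asks for an input in esi while "=&S" forbids any input from sharing that register. A matching constraint such as "0" instead ties the input to the output operand explicitly, which is permitted.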
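A small x86 sketch of a case where the earlyclobber modifier matters: the output is written by the first instruction while the second input is still needed by the next one. The function name add_with_earlyclobber is hypothetical, and the non-x86 branch is only a fallback.

```cpp
// Hypothetical helper: returns a + b using two instructions.
int add_with_earlyclobber(int a, int b) {
    int out;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %1, %0\n\t"   // out is written here ...
            "addl %2, %0"       // ... while b (%2) has not been read yet
            : "=&r"(out)        // & keeps out away from the registers of a and b
            : "r"(a), "r"(b)
            : "cc");
#else
    out = a + b;
#endif
    return out;
}
```

Without the &, the compiler would be free to place out and b in the same register, in which case the first instruction would destroy b before it is added.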
Of course the clobber list also has important use. The compiler does not parse your assembly code. It needs the clobber list to figure out which registers your code will modify. You'd better not lie, or subtle bugs may creep in. If you want to see its effect, try to trick the compiler into using a register around your asm block:
extern int foo();
int bar()
{
int x = foo();
asm("nop" ::: "eax");
return x;
}
Relevant part of the generated assembly code:
call foo
movl %eax, %edx
nop
movl %edx, %eax
Notice how the compiler had to save the return value from foo into edx because it believed that eax will be modified. Normally it would just leave it in eax, since that's where it will be needed later. Here you can imagine what would happen if your asm code did modify eax without telling the compiler: the return value would be overwritten.
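The same experiment can be made self-checking by letting the asm really overwrite eax while declaring it as clobbered. This is a minimal sketch with hypothetical names (produce, keep_across_asm); on non-x86 targets the asm is simply skipped.

```cpp
// Stand-in for foo() in the example above.
static int produce() { return 7; }

// Hypothetical helper: the value of x must survive the asm statement.
int keep_across_asm() {
    int x = produce();
#if defined(__i386__) || defined(__x86_64__)
    // The asm overwrites eax; the clobber entry tells the compiler
    // not to keep x (or anything else it still needs) in that register.
    __asm__ volatile("movl $0, %%eax" : : : "eax");
#endif
    return x;
}
```

With the clobber declared, the compiler must keep x out of eax across the statement, so the function still returns the value from produce(). Dropping "eax" from the clobber list would make the result depend on register allocation.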

How do I ask the assembler to "give me a full size register"?

I'm trying to allow the assembler to give me a register it chooses, and then use that register with inline assembly. I'm working with the program below, and it's segfaulting. The program was compiled with g++ -O1 -g2 -m64 wipe.cpp -o wipe.exe.
When I look at the crash under lldb, I believe I'm getting a 32-bit register rather than a 64-bit register. I'm trying to compute an address (base + offset) using lea, and store the result in a register the assembler chooses:
"lea (%0, %1), %2\n"
Above, I'm trying to say "use a register, and I'll refer to it as %2".
When I perform a disassembly, I see:
0x100000b29: leal (%rbx,%rsi), %edi
-> 0x100000b2c: movb $0x0, (%edi)
So it appears the generated code calculates an address using 64-bit values (rbx and rsi), but saves it to a 32-bit register (edi) that the assembler chose.
Here are the values at the time of the crash:
(lldb) type format add --format hex register
(lldb) p $edi
(unsigned int) $3 = 1063330
(lldb) p $rbx
(unsigned long) $4 = 4296030616
(lldb) p $rsi
(unsigned long) $5 = 10
A quick note on the Input Operands below. If I drop the "r" (2), then I get a compiler error when I refer to %2 in the call to lea: invalid operand number in inline asm string.
How do I tell the assembler to "give me a full size register" and then refer to it in my program?
int main(int argc, char* argv[])
{
string s("Hello world");
cout << s << endl;
char* ptr = &s[0];
size_t size = s.length();
if(ptr && size)
{
__asm__ __volatile__
(
"%=:\n" /* generate a unique label for TOP */
"subq $1, %1\n" /* 0-based index */
"lea (%0, %1), %2\n" /* calculate ptr[idx] */
"movb $0, (%2)\n" /* 0 -> ptr[size - 1] .. ptr[0] */
"jnz %=b\n" /* Back to TOP if non-zero */
: /* no output */
: "r" (ptr), "r" (size), "r" (2)
: "0", "1", "2", "cc"
);
}
return 0;
}
Sorry about these inline assembly questions. I hope this is the last one. I'm not really thrilled with using inline assembly in GCC because of pain points like this (and my fading memory). But it's the only legal way I know to do what I want to do given GCC's interpretation of the qualifier volatile in C.
If interested, GCC interprets C's volatile qualifier as hardware backed memory, and anything else is an abuse and it results in an illegal program. So the following is not legal for GCC:
volatile void* g_tame_the_optimizer = NULL;
...
unsigned char* ptr = ...
size_t size = ...;
for(size_t i = 0; i < size; i++)
ptr[i] = 0x00;
g_tame_the_optimizer = ptr;
Interestingly, Microsoft uses a more customary interpretation of volatile (what most programmers expect - namely, anything can change the memory, and not just memory mapped hardware), and the code above is acceptable.
gcc inline asm is a complicated beast. "r" (2) means allocate an int sized register and load it with the value 2. If you just need an arbitrary scratch register you can declare a 64 bit early-clobber dummy output, such as "=&r" (dummy) in the output section, with void *dummy declared earlier. You can consult the gcc manual for more details.
As for the final code snippet, it looks like you want a memory barrier, just as the linked email says. See the manual for an example.
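Putting the scratch-register suggestion together, a wipe loop might look like the sketch below. It assumes x86-64 and AT&T syntax; the function name wipe and the operand names idx, tmp and base are hypothetical, and the volatile loop is only a fallback for other targets.

```cpp
#include <cstddef>

// Hypothetical helper: zeroes size bytes starting at ptr.
void wipe(char* ptr, std::size_t size) {
    if (!ptr || size == 0) return;
#if defined(__x86_64__)
    void* scratch;  // dummy output, gives the asm a pointer-sized register
    __asm__ volatile(
        "1:\n\t"
        "subq $1, %[idx]\n\t"                // idx -= 1, sets ZF when idx reaches 0
        "leaq (%[base],%[idx]), %[tmp]\n\t"  // tmp = &ptr[idx]; lea leaves the flags alone
        "movb $0, (%[tmp])\n\t"              // ptr[idx] = 0; mov leaves the flags alone
        "jnz 1b"                             // loop until idx == 0
        : [idx] "+r"(size), [tmp] "=&r"(scratch)
        : [base] "r"(ptr)
        : "cc", "memory");                   // "memory" tells the compiler the buffer changed
#else
    volatile char* p = ptr;
    while (size--) p[size] = 0;
#endif
}
```

Because scratch is declared as void*, the compiler allocates a full 64-bit register for it, and the earlyclobber keeps that register distinct from the one holding ptr.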

can't find a register in class 'CREG' while reloading 'asm' - memcpy inline asm

I am trying to get an early version of Linux to compile; you can download the source code from git://github.com/azru0512/linux-0.12.git. While compiling kernel/blk_drv/ramdisk.c, I got the error messages below:
ramdisk.c:36:10: error: can't find a register in class 'CREG' while reloading 'asm'
ramdisk.c:40:10: error: can't find a register in class 'CREG' while reloading 'asm'
ramdisk.c:36:10: error: 'asm' operand has impossible constraints
ramdisk.c:40:10: error: 'asm' operand has impossible constraints
The relevant code in ramdisk.c is:
if (CURRENT-> cmd == WRITE) {
(void) memcpy(addr,
CURRENT->buffer,
len);
} else if (CURRENT->cmd == READ) {
(void) memcpy(CURRENT->buffer,
addr,
len);
} else
panic("unknown ramdisk-command");
And memcpy is:
extern inline void * memcpy(void * dest,const void * src, int n)
{
__asm__("cld\n\t"
"rep\n\t"
"movsb"
::"c" (n),"S" (src),"D" (dest)
:"cx","si","di");
return dest;
}
I guess the problem is the inline asm in memcpy (include/string.h), so I removed the clobber list from it, but without luck. Could you help me find out what's going wrong? Thanks!
GCC's syntax for this has changed / evolved a bit.
You must now specify each of the special target registers as an output operand:
...("...instructions..."
: "=c"(n), "=S"(src), "=D"(dest)
and then additionally as the same registers as source operands:
: "0"(n), "1"(src), "2"(dest)
and finally you need to clobber "memory" (I can't remember offhand if this affects condition codes, if so you would also need "cc"):
: "memory")
Next, because this instruction should not be moved or deleted, you need to use either volatile or __volatile__ (I'm not entirely sure why but without this the instructions were deleted, in my test-case).
Last, it's no longer a good idea to attempt to override memcpy because gcc "knows" how to implement the function. You can override gcc's knowledge with -fno-builtin.
This compiles (for me anyway, with a somewhat old gcc on an x86-64 machine):
extern inline void * memcpy(void * dest,const void * src, int n)
{
__asm__ volatile("cld\n\t"
"rep\n\tmovsb\n\t"
: "=c" (n), "=S" (src), "=D" (dest)
: "0" (n), "1" (src), "2" (dest)
: "memory", "cc");
return dest;
}
This exact problem and its reasons are discussed on GCC's bugzilla:
Bug 43998 - inline assembler: can't set clobbering for input register
gcc won't allow input and output registers as clobbers.
If you corrupt an input register, add a dummy output to the same register:
unsigned int operation;
unsigned int dummy;
asm ("cpuid" : "=a" (dummy) : "0" ( operation) :);
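An alternative to the dummy-output pattern is a read-write operand: "+c", "+S" and "+D" tell the compiler that the asm both reads and changes ecx, esi and edi. The sketch below uses the hypothetical name copy_bytes to avoid clashing with the real memcpy, and falls back to std::memcpy on non-x86 targets.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies n bytes from src to dest and returns dest.
void* copy_bytes(void* dest, const void* src, std::size_t n) {
    void* ret = dest;  // rep movsb advances dest, so remember the start
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile("cld\n\t"
                     "rep movsb"
                     : "+c"(n), "+S"(src), "+D"(dest)  // all three are modified
                     :
                     : "memory", "cc");
#else
    std::memcpy(dest, src, n);
#endif
    return ret;
}
```

Since the count and both pointers are local copies, the changes that rep movsb makes to them are invisible to the caller.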

Resources