I wrote some test code that just compares a character against a range,
and GCC always emits a jle / jg pair, whether or not the condition includes equality.
example 1.
if ( 'A' < test && test < 'Z' )
0x000000000040054d <+32>: cmp BYTE PTR [rbp-0x1],0x41
0x0000000000400551 <+36>: jle 0x40056a <main+61>
0x0000000000400553 <+38>: cmp BYTE PTR [rbp-0x1],0x59
0x0000000000400557 <+42>: jg 0x40056a <main+61>
example 2.
if ( 'A' <= test && test <= 'Z' )
0x000000000040054d <+32>: cmp BYTE PTR [rbp-0x1],0x40
0x0000000000400551 <+36>: jle 0x40056a <main+61>
0x0000000000400553 <+38>: cmp BYTE PTR [rbp-0x1],0x5a
0x0000000000400557 <+42>: jg 0x40056a <main+61>
I thought this was an optimization issue, but GCC gives the same result even when I compile with the -O0 option.
How can I get JL/JG for 'A' < test && test < 'Z' and JLE/JGE for 'A' <= test && test <= 'Z'?
Notice that the first comparison tests against the range 0x41...0x59 and the second against the range 0x40...0x5a. Basically, the compiler turns the second condition into
if ( 'A'-1 < test && test < 'Z'+1 )
and then generates the same code
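The rewrite is safe because, for integer values, x <= c and x < c + 1 accept the same values as long as c + 1 does not overflow. A minimal sketch that checks this exhaustively for every char value (the function name count_mismatches is made up for illustration):

```c
#include <limits.h>

/* Counts the char values for which the original condition and the
 * compiler-style rewrite disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int v = CHAR_MIN; v <= CHAR_MAX; v++) {
        char test = (char)v;
        int original  = ('A' <= test && test <= 'Z');
        int rewritten = ('A' - 1 < test && test < 'Z' + 1);
        if (original != rewritten)
            mismatches++;
    }
    return mismatches;
}
```

If count_mismatches() returns 0, both forms accept exactly the same characters, so the compiler is free to pick whichever comparison it prefers.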
UPDATE
Just to make clear why I think the compiler prefers JL over JLE: JLE has to check ZF in addition to SF and OF, while JL checks only SF and OF. So JLE depends on one more flag, which could potentially hurt instruction-level parallelism even if the instruction timing itself is the same.
So the clear choice is to transform the code to use the simpler instructions.
In general, you can't force the compiler to emit a particular instruction. In this case, you might succeed if you get rid of the constant so the compiler won't be able to adjust it. Note that due to the nature of your expression, the compiler will probably still reverse one of the tests, and thus bring in an equals. You might be able to work around that by using goto. Obviously, both of these changes will generate worse code.
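A sketch of the "get rid of the constant" idea, assuming GCC or Clang: when the bounds arrive as function parameters, there is no immediate operand for the compiler to adjust by one. Which jump instructions it then emits is still its own decision. The function names are hypothetical.

```c
/* The bounds are runtime values, so there is no constant to adjust.
 * noinline (a GCC/Clang extension) discourages the compiler from
 * propagating constant arguments back in; it is not a guarantee. */
__attribute__((noinline))
int in_open_range(char lo, char hi, char test)
{
    return lo < test && test < hi;
}

__attribute__((noinline))
int in_closed_range(char lo, char hi, char test)
{
    return lo <= test && test <= hi;
}
```

Comparing the disassembly of the two functions shows what the compiler picks when it cannot fold the equality into the constant.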
I wanted to know whether branch prediction macros (likely/unlikely) can be used together with atomic operations. Are there any side effects in the statement below?
atomic_t v = ATOMIC_INIT(0);
atomic_inc(&v);
if (unlikely(atomic_read(&v) == 2)) {
/* Some Operation */
}
There is no difference between using likely/unlikely on atomic vs non-atomic operations. The purpose of those macros is only to generate code that performs better in the scenario where one of the two branches of a condition is more likely.
So for a "normal" operation you would have for example:
if (unlikely(--x))        if (likely(--x))
    do_a();                   do_a();
else                      else
    do_b();                   do_b();

    *decrement x*             *decrement x*
    jnz not_zero              jz zero
    call do_b                 call do_a
not_zero:                 zero:
    call do_a                 call do_b
While in the case of an atomic operation you would simply have:
if (unlikely(!atomic_dec_and_test(&x)))    if (likely(!atomic_dec_and_test(&x)))
    do_a();                                    do_a();
else                                       else
    do_b();                                    do_b();

    *atomically decrement x*                   *atomically decrement x*
    jnz not_zero                               jz zero
    call do_b                                  call do_a
not_zero:                                  zero:
    call do_a                                  call do_b
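For experimenting outside the kernel, here is a minimal userspace sketch. It assumes GCC or Clang (for __builtin_expect) and uses C11 <stdatomic.h> as a stand-in for the kernel's atomic_t; the macros are modeled on the kernel's definitions, and put_ref is a hypothetical name.

```c
#include <stdatomic.h>

/* Modeled on the kernel macros; __builtin_expect is a GCC/Clang builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Drops one reference. Returns 1 if the counter reached zero, 0 otherwise. */
int put_ref(atomic_int *refs)
{
    /* atomic_fetch_sub returns the value held before the decrement */
    if (unlikely(atomic_fetch_sub(refs, 1) == 1))
        return 1;   /* rare path: last reference dropped */
    return 0;       /* common path */
}
```

The hint only affects how the branches are laid out. The atomic operation itself is emitted the same way with or without it. Note that the whole comparison sits inside unlikely(), since the macro normalizes its argument to 0 or 1.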
I'm trying to edit the IDT (Interrupt Descriptor Table), and I found this code, which should give me access to the structure. But I don't understand what the colon in the asm line is doing. I guess it's some trick with bit fields in C that somehow pads the instruction, but I couldn't find anything definitive.
If it helps, right now the compiler says: invalid 'asm': invalid expression as operand. Still fighting this one... :)
So, what is the colon doing there?
This is GCC's extended asm syntax. Its general form is:
asm ( assembler template
: output operands /* optional */
: input operands /* optional */
: list of clobbered registers /* optional */
);
and the example:
int a=10, b;
asm ("movl %1, %%eax;"
     "movl %%eax, %0;"
     :"=r"(b)        /* output */
     :"r"(a)         /* input */
     :"%eax"         /* clobbered register */
     );
"b" is the output operand, referred to by %0, and "a" is the input operand, referred to by %1.
"r" is a constraint on the operands. We’ll see constraints in detail later. For the time being, "r" says to GCC to use any register for storing the operands. The output operand constraint should have a constraint modifier "=". This modifier says that it is the output operand and is write-only.
There are two %’s prefixed to the register name. This helps GCC to distinguish between the operands and registers. Operands have a single % as prefix.
The clobbered register %eax after the third colon tells GCC that the value of %eax is to be modified inside "asm", so GCC won’t use this register to store any other value.
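As a self-contained version of the quoted example, the sketch below wraps the same two instructions in a function. It assumes GCC or Clang on x86 and falls back to plain C elsewhere; the name copy_via_eax is made up for illustration.

```c
/* Copies a into the result by way of %eax, mirroring the quoted example. */
int copy_via_eax(int a)
{
    int b;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("movl %1, %%eax\n\t"   /* %1 is the input operand (a)  */
             "movl %%eax, %0"       /* %0 is the output operand (b) */
             : "=r"(b)              /* first colon: outputs         */
             : "r"(a)               /* second colon: inputs         */
             : "eax");              /* third colon: clobbers        */
#else
    b = a;                          /* non-x86 fallback, no asm     */
#endif
    return b;
}
```

Each colon simply separates one section of the operand list from the next, so copy_via_eax(10) hands back 10 after a round trip through the register.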
I have a small problem. I want an Edit control that contains some fixed text (something like "ABC#"). This text must not be editable, so the user cannot delete it; the user should only be able to type after the '#' sign. I know how to make the text in an edit box read-only: I use the EM_SETREADONLY message.
//global variables
#define ID_TEXTBOX 1
static HWND hwndTextBox;
//in WndProc function
case WM_CREATE:
{
    hwndTextBox = CreateWindow(TEXT("EDIT"), TEXT("abc#"), WS_VISIBLE | WS_CHILD | WS_BORDER, 70, 100, 200, 25, hWnd, (HMENU)ID_TEXTBOX, NULL, NULL);
    if (!hwndTextBox)
    {
        MessageBox(hWnd, TEXT("Failed"), TEXT("Failed"), MB_OK);
        return -1; /* returning -1 from WM_CREATE aborts window creation */
    }
    SendMessage(hwndTextBox, EM_SETREADONLY, TRUE, 0); /* lParam is unused and must be 0 */
    break;
}
but this makes the whole text read-only, which of course does not solve my problem.
Use a RichEdit control instead of an Edit control. Use the EM_SETCHARFORMAT message to mark individual characters, or ranges of characters, as "protected". Use the EM_SETEVENTMASK message to register for EN_PROTECTED notifications from the RichEdit. That way, if a user tries to modify one or more protected characters for any reason, the RichEdit will ask for your permission before allowing the modification.
This is probably not what you're looking for, but it may mimic the required functionality with just a bit of code overhead.
You can subclass the edit control and use the WM_CHAR message to intercept any user input that may modify the contents of the edit box. When your procedure receives the message, check the current selection (that is, the caret position) in the edit box, and if it is anywhere inside the first four characters, simply don't allow the change. This is a somewhat crude method, but it should work.
Example in assembly, sorry I'm not proficient enough in C and C is such a drag :D
invoke SetWindowLong,hEditBox,GWL_WNDPROC,offset EditProc
mov DefEditProc,eax
...
EditProc proc hWnd:DWORD,uMsg:DWORD,wParam:DWORD,lParam:DWORD
cmp uMsg,WM_CHAR
je #WM_CHAR
cmp uMsg,WM_KEYDOWN ; the edit control handles the DEL key on key-down, not key-up
je #WM_KEYDOWN
#DEFAULT:
invoke CallWindowProc,DefEditProc,hWnd,uMsg,wParam,lParam
ret
#EXIT:
xor eax,eax
ret
;=============
#WM_KEYDOWN:
mov eax,wParam ; you will need this if you want to process the delete key
cmp ax,VK_DELETE
je #VK_DELETE
jmp #DEFAULT
;=============
#WM_CHAR:
mov eax,wParam
cmp ax,VK_BACK ; this is for the backspace key
je #BACKSPACE
cmp ax,VK_0
jb #EXIT ; if null is returned, the char will not be passed to the edit box
cmp ax,VK_9
ja #EXIT
jmp #NUMBER
;---
#VK_DELETE:
#NUMBER:
invoke SendMessage,hWnd,EM_GETSEL,offset start,0 ; the caret position through text selection, we just need the starting position
cmp start,3
ja #DEFAULT ; if the caret is placed somewhere past the 4th character, allow user input
jmp #EXIT
;---
#BACKSPACE:
invoke SendMessage,hWnd,EM_GETSEL,offset start,0
cmp start,4
ja #DEFAULT ; since you're deleting one character to the left, you need to factor that in for backspace
jmp #EXIT
EditProc endp
It's heavily trimmed, but hopefully you get the gist of it. This example only lets the digits (0-9) and the DEL and BACKSPACE keys through. You can expand it to meet your needs.
Regards
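For readers who prefer C, the caret check from the assembly example can be sketched as a plain helper that a subclassed window procedure could call before passing the message on. This only models the decision and uses no Win32 calls; the enum and the function name are hypothetical, and sel_start/sel_end stand for the offsets that EM_GETSEL would report. Unlike the assembly, it lets BACKSPACE through when a non-empty selection starts right after the prefix, since only the selection is deleted in that case.

```c
#include <stddef.h>

/* Hypothetical key classes, not Win32 virtual-key values. */
enum edit_key { KEY_CHAR, KEY_BACKSPACE, KEY_DELETE };

/* Returns 1 if the edit may proceed, 0 if it would touch the protected
 * prefix (the first prefix_len characters of the text). */
int edit_allowed(size_t sel_start, size_t sel_end,
                 enum edit_key key, size_t prefix_len)
{
    /* Caret or selection begins inside the prefix: block everything. */
    if (sel_start < prefix_len)
        return 0;
    /* Backspace with an empty selection deletes the character to the
     * left, which is the last prefix character when the caret sits
     * directly behind the prefix. */
    if (key == KEY_BACKSPACE && sel_start == sel_end && sel_start == prefix_len)
        return 0;
    return 1;
}
```

With the "abc#" prefix from the question, prefix_len would be 4.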
I need to use some x86 instructions that have no GCC intrinsics, such as BSF and BSR.
With GCC inline assembly, I can write something like the following
__INTRIN_INLINE unsigned char bsf64(unsigned long* const index, const uint64_t mask)
{
    __asm__("bsf %[mask], %[index]" : [index] "=r" (*index) : [mask] "mr" (mask));
    return mask ? 1 : 0;
}
Code like if (bsf64(x, y)) { /* use x */ } is translated by GCC to something like
0x000000010001bf04 <bsf64+0>: bsf %rax,%rdx
0x000000010001bf08 <bsf64+4>: test %rax,%rax
0x000000010001bf0b <bsf64+7>: jne 0x10001bf44 <...>
However if mask is zero, BSF already sets the ZF flag, so the test after bsf is redundant.
Instead of returning mask ? 1 : 0, is it possible to retrieve the ZF flag and return it, so that GCC does not generate the test?
EDIT: made the if example more clear
EDIT: In response to Damon, __builtin_ffsl generates even less optimal code. If I use the following code
int b = __builtin_ffsl(mask);
if (b) {
    *index = b - 1;
    return true;
} else {
    return false;
}
GCC generates this assembly
0x000000000044736d <+1101>: bsf %r14,%r14
0x0000000000447371 <+1105>: cmove %r12,%r14
0x0000000000447375 <+1109>: add $0x1,%r14d
0x0000000000447379 <+1113>: je 0x4471c0 <...>
0x000000000044737f <+1119>: lea -0x1(%r14),%ecx
So the test is gone, but redundant conditional move, increment and decrement are generated.
A couple of remarks:
This is an "anti-optimization". You're trying to do a micro-optimization on something that the compiler already supports.
Your code does not generate the bsf instruction at all with my version of gcc with all optimization switches turned on. Looking at the code, that is not surprising, because you return mask, which is the source operand, not the destination operand (gcc uses AT&T syntax!). The compiler is intelligent enough to figure this out and drops the assembler code (which doesn't do anything) altogether.
There is an intrinsic function __builtin_ffsl which does exactly the same as your inline assembly (though, correctly). An intrinsic is no less portable than inline assembler, but easier for the compiler to optimize.
Using the intrinsic function results in a bsf cmov sequence on my compiler (assuming the calling code forces it to actually emit the instruction), which shows that the compiler uses the zero-flag just fine without an additional test instruction.
Returning a char when you want a bool is not the best possible hint for the compiler, though it will probably figure it out anyway most of the time. However, telling the compiler to use a bitscan instruction when you are really only interested in "zero or not zero" is certainly sub-optimal. if(x) and if(!x) work perfectly well for that matter. It would be different if you returned the result as reference, so you could reuse it in another place, but as it is, your code is only a very complicated way of writing if(x).
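On the original question of reading ZF directly: GCC 6 and later offer flag output operands on x86 (written "=@ccz" for the zero flag), and define the macro __GCC_ASM_FLAG_OUTPUTS__ when they are available. The sketch below assumes such a compiler on x86-64 and otherwise falls back to __builtin_ctzll behind an explicit zero check. It keeps the bsf64 name from the question but returns bool, and whether the generated code is actually free of the extra test has to be checked in the disassembly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stores the position of the lowest set bit of mask in *index and
 * returns true; returns false and leaves *index alone if mask is 0. */
static inline bool bsf64(unsigned long *index, uint64_t mask)
{
#if defined(__x86_64__) && defined(__GCC_ASM_FLAG_OUTPUTS__)
    uint64_t pos;
    bool zero;
    __asm__("bsf %[mask], %[pos]"
            : [pos] "=r"(pos), "=@ccz"(zero)   /* zero receives ZF */
            : [mask] "rm"(mask));
    if (zero)
        return false;          /* pos is undefined when mask is 0 */
    *index = (unsigned long)pos;
    return true;
#else
    if (mask == 0)
        return false;          /* __builtin_ctzll(0) is undefined */
    *index = (unsigned long)__builtin_ctzll(mask);
    return true;
#endif
}
```

Because the flag is an output of the asm statement, the compiler can branch on it directly instead of testing mask again.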
GCC compiles (using gcc --omit-frame-pointer -S):
int the_answer() { return 42; }
into
.text
.globl _the_answer
_the_answer:
subl $12, %esp
movl $42, %eax
addl $12, %esp
ret
.subsections_via_symbols
What is the '$12' constant doing here, and what is the '%esp' register?
Short answer: stack frames.
Long answer: when you call a function, compilers will manipulate the stack pointer to allow for local data such as function variables. Since your code is changing esp, the stack pointer, that's what I assume is happening here. I would have thought GCC smart enough to optimize this away where it's not actually required, but you may not be using optimization.
_the_answer:
subl $12, %esp
movl $42, %eax
addl $12, %esp
ret
The first subl decrements the stack-pointer, to make room for variables that may be used in your function. One slot may be used for the frame pointer, another to hold the return address, for example. You said it should omit the frame pointer. That usually means that it omits loads/stores to save/restore the frame pointer. But often the code will still reserve memory for it. The reason is that it makes code that analyzes the stack much easier. It's easy to give the offset of the stack a minimal width and so you know you can always access FP+0x12, to get at the first local variable slot, even if you omit saving the frame pointer.
Well, eax on x86 is used to pass the return value to the caller, as far as I know. And the last addl just destroys the frame previously created for your function.
The instructions at the start and end of a function are called its "prologue" and "epilogue". Here is what my port does when it has to create the prologue of a function in GCC (it's way more complex for real-world ports that intend to be as fast and versatile as possible):
void eco32_prologue(void) {
    int i, j;
    /* reserve space for all callee saved registers, and 2 additional ones:
     * for the frame pointer and return address */
    int regs_saved = registers_to_be_saved() + 2;
    int stackptr_off = (regs_saved * 4 + get_frame_size());
    /* decrement the stack pointer */
    emit_move_insn(stack_pointer_rtx,
                   gen_rtx_MINUS(SImode, stack_pointer_rtx,
                                 GEN_INT(stackptr_off)));
    /* save return address, if we need to */
    if(eco32_ra_ever_killed()) {
        /* note: reg 31 is return address register */
        emit_move_insn(gen_rtx_MEM(SImode,
                                   plus_constant(stack_pointer_rtx,
                                                 -4 + stackptr_off)),
                       gen_rtx_REG(SImode, 31));
    }
    /* save the frame pointer, if it is needed */
    if(frame_pointer_needed) {
        emit_move_insn(gen_rtx_MEM(SImode,
                                   plus_constant(stack_pointer_rtx,
                                                 -8 + stackptr_off)),
                       hard_frame_pointer_rtx);
    }
    /* save callee save registers */
    for(i=0, j=3; i<FIRST_PSEUDO_REGISTER; i++) {
        /* if we ever use the register, and if it's not used in calls
         * (would be saved already) and it's not a special register */
        if(df_regs_ever_live_p(i) &&
           !call_used_regs[i] && !fixed_regs[i]) {
            emit_move_insn(gen_rtx_MEM(SImode,
                                       plus_constant(stack_pointer_rtx,
                                                     -4 * j + stackptr_off)),
                           gen_rtx_REG(SImode, i));
            j++;
        }
    }
    /* set the new frame pointer, if it is needed now */
    if(frame_pointer_needed) {
        emit_move_insn(hard_frame_pointer_rtx,
                       plus_constant(stack_pointer_rtx, stackptr_off));
    }
}
I omitted some code that deals with other issues, primarily telling GCC which instructions are important for exception handling (i.e. where the frame pointer is stored and so on). Callee-saved registers are the ones that the caller doesn't need to save prior to a call; the called function takes care of saving and restoring them as needed. As you can see in the first lines, we always allocate space for the return address and the frame pointer. That space is just a few bytes and won't matter, but we only generate the stores/loads when necessary. Finally, note that the "hard" frame pointer is the "real" frame pointer register; it's necessary for some GCC-internal reasons. The "frame_pointer_needed" flag is set by GCC whenever I cannot omit storing the frame pointer. In some cases it has to be stored, for example when alloca (which changes the stack pointer dynamically) is used. GCC takes care of all that. Note that it has been some time since I wrote that code, so I hope the additional comments I added above are not all wrong :)
Stack alignment. At function entry, esp is -4 mod 16, due to the return address having been pushed by call. Subtracting 12 re-aligns it. There's no good reason to have the stack aligned to 16 bytes on x86 except in multimedia code that's using mmx/sse/etc., but somewhere in the 3.x era, the gcc developers decided the stack should always be kept aligned anyway, imposing prologue/epilogue overhead, increased stack size, and resulting increased cache thrashing on all programs for the sake of a few special-purpose interests (which incidentally happen to be some of my areas of interest, but I still think it was unfair and a bad decision).
Normally if you enable any optimization level, gcc will remove the useless prologue/epilogue for stack alignment for leaf functions (functions that make no function calls), but it will come back as soon as you start making calls.
You can also fix the issue with -mpreferred-stack-boundary=2.
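The alignment arithmetic can be checked with a small sketch. The helper name is hypothetical, and the stack addresses in the usage note are made-up examples; the second argument corresponds to the value given to -mpreferred-stack-boundary, which is the base-2 logarithm of the boundary in bytes.

```c
#include <stdint.h>

/* Number of bytes a prologue has to subtract from the stack pointer so
 * that it becomes a multiple of 2^boundary_log2. The stack grows down,
 * so subtracting the remainder rounds the pointer down to the boundary. */
unsigned alignment_padding(uintptr_t sp_at_entry, unsigned boundary_log2)
{
    uintptr_t boundary = (uintptr_t)1 << boundary_log2;
    return (unsigned)(sp_at_entry % boundary);
}
```

For a caller whose stack pointer was 16-byte aligned before the call, say at 0xbffff000, the pushed return address leaves esp at 0xbffff000 - 4 on entry. With a boundary of 2^4 = 16 bytes the padding comes out as 12, matching the subl $12 above. With -mpreferred-stack-boundary=2 the boundary is 4 bytes and the padding is 0.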
Using GCC 4.3.2 I get this for the function:
the_answer:
movl $42, %eax
ret
...plus surrounding junk, by using the following command line: echo 'int the_answer() { return 42; }' | gcc --omit-frame-pointer -S -x c -o - -
Which version are you using?