Strange behaviour of clang assembler - gcc

I tried to compile this overflow detection macro from the Zend engine:
#define ZEND_SIGNED_MULTIPLY_LONG(a, b, lval, dval, usedval) do { \
long __tmpvar; \
__asm__( \
"mul %0, %2, %3\n" \
"smulh %1, %2, %3\n" \
"sub %1, %1, %0, asr #63\n" \
: "=X"(__tmpvar), "=X"(usedval) \
: "X"(a), "X"(b)); \
if (usedval) (dval) = (double) (a) * (double) (b); \
else (lval) = __tmpvar; \
} while (0)
And got this result in assembly:
; InlineAsm Start
mul x8, x8, x9
smulh x9, x8, x9
sub x9, x9, x8, asr #63
; InlineAsm End
The compiler used only 2 registers for both the inputs and the outputs of the macro, where I think it needs at least 3, and this leads to a wrong result (for example, for -1 * -1). Any suggestions?

The assembly code is buggy. From GCC's documentation on extended asm:
Use the ‘&’ constraint modifier (see Modifiers) on all output operands that must not overlap an input. Otherwise, GCC may allocate the output operand in the same register as an unrelated input operand, on the assumption that the assembler code consumes its inputs before producing outputs. This assumption may be false if the assembler code actually consists of more than one instruction.
This basically says that from the moment you write to an output parameter not marked with an ampersand, you're not allowed to use the input parameters anymore because they might have been overwritten.
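To see why the allocation in the question breaks -1 * -1, the three instructions can be emulated in plain C. This is only a sketch: run_sequence, usedval_overlapping and usedval_separate are hypothetical names, x10/x11 for the non-overlapping case is an arbitrary choice, and __int128 is a GCC/Clang extension.

```c
#include <stdint.h>

/* High 64 bits of the signed 128-bit product (what smulh computes).
   Assumes GCC/Clang: __int128 exists and >> on a negative value is an
   arithmetic shift. */
static int64_t smulh(int64_t a, int64_t b)
{
    return (int64_t)(((__int128)a * b) >> 64);
}

/* Emulates the three instructions on a small register file x[].
   rt and ru are the output registers (%0, %1); ra and rb are the
   input registers (%2, %3). Returns the final value of %1 (usedval). */
static int64_t run_sequence(int64_t x[], int rt, int ru, int ra, int rb)
{
    /* mul %0, %2, %3 (low 64 bits, computed without signed overflow) */
    x[rt] = (int64_t)((uint64_t)x[ra] * (uint64_t)x[rb]);
    /* smulh %1, %2, %3 (reads the inputs again, after %0 was written) */
    x[ru] = smulh(x[ra], x[rb]);
    /* sub %1, %1, %0, asr #63 */
    x[ru] = x[ru] - (x[rt] < 0 ? -1 : 0);
    return x[ru];
}

/* The allocation from the question: %0 and %2 share x8, %1 and %3 share x9. */
static int64_t usedval_overlapping(int64_t a, int64_t b)
{
    int64_t x[32] = {0};
    x[8] = a;
    x[9] = b;
    return run_sequence(x, 8, 9, 8, 9);
}

/* An allocation where the outputs do not overlap the inputs. */
static int64_t usedval_separate(int64_t a, int64_t b)
{
    int64_t x[32] = {0};
    x[8] = a;
    x[9] = b;
    return run_sequence(x, 10, 11, 8, 9);
}
```

With the overlapping allocation, mul overwrites x8 before smulh reads it, so smulh sees 1 * -1 and usedval ends up nonzero even though -1 * -1 does not overflow. With separate output registers the same inputs give usedval == 0.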

The syntax is designed around the concept of wrapping a single insn which reads its inputs before writing its outputs.
When you use multiple insns, you often need to use an early-clobber modifier on the constraint ("=&x") to let the compiler know you write an output or read-write register before reading all the inputs. Then it will make sure that output register isn't the same register as any of the input registers.
See also the x86 tag wiki, and my collection of inline asm docs and SO answers at the bottom of this answer.
#define ZEND_SIGNED_MULTIPLY_LONG(a, b, lval, dval, usedval) do { \
long __tmpvar; \
__asm__( \
"mul %[tmp], %[a], %[b]\n\t" \
"smulh %[uv], %[a], %[b]\n\t" \
"sub %[uv], %[uv], %[tmp], asr #63\n" \
: [tmp] "=&X"(__tmpvar), [uv] "=&X"(usedval) \
: [a] "X"(a), [b] "X"(b)); \
if (usedval) (dval) = (double) (a) * (double) (b); \
else (lval) = __tmpvar; \
} while (0)
Do you really need all those instructions to be inside the inline asm? Can't you make long tmp = a * b an input operand? Then if the compiler needs a*b elsewhere in the function, CSE can see it.
You can convince gcc to broadcast the sign bit with an arithmetic right shift using a ternary operator. So hopefully you can coax the compiler to do the sub that way. Then it could use subs to set flags from the sub instead of needing a separate test insn on usedval.
If you can't get your target compiler to make the code you want, then sure, give inline asm a shot. But beware, I've seen clang be a lot worse than gcc with inline asm. It tends to make worse code around the inline asm on x86.
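As a rough illustration of doing the whole check without inline asm, the sketch below computes the same three values in C and broadcasts the sign bit with a ternary. It is not Zend code: signed_multiply_overflows is a hypothetical helper, and it assumes a GCC/Clang target that provides __int128.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of Zend. Stores the low 64 bits of
   a * b in *lo and returns true when the full product does not fit in
   int64_t. Assumes GCC/Clang: __int128 exists, >> on a negative value
   is an arithmetic shift, and out-of-range unsigned-to-signed
   conversion wraps. */
static bool signed_multiply_overflows(int64_t a, int64_t b, int64_t *lo)
{
    __int128 product = (__int128)a * b;          /* exact 128-bit result  */
    int64_t low  = (int64_t)(uint64_t)product;   /* what mul computes     */
    int64_t high = (int64_t)(product >> 64);     /* what smulh computes   */
    int64_t sign = low < 0 ? -1 : 0;             /* low asr #63 as ternary */

    *lo = low;
    /* Same condition as (high - sign) != 0 in the asm version: the
       product fits only if the high half is the sign extension of low. */
    return high != sign;
}
```

A caller would use the returned flag the way the macro uses usedval: fall back to the double product when it is true, otherwise take *lo.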

Related

Executing TI BASIC from a String

Is it possible to execute TI-BASIC from a string, such that:
execute(":Disp Str1")
would print out Str1?
It could be done with a small Asm( program that takes a string (from Ans), turns it into a BASIC program, and executes it. For example:
.nolist
#include "ti83plus.inc"
.list
.org userMem-2
.db $BB,$6D
ld hl,SavesScreen
ld (hl),tTheta
inc hl
ld (hl),0
inc hl
push hl
bcall(_AnsName)
rst rFindSym
ex de,hl
ld c,(hl)
inc hl
ld b,(hl)
dec hl
inc bc
inc bc
pop de
ldir
ld hl,SavesScreen
ld a,6
bcall(_ExecuteNewPrgm)
; no ret because _ExecuteNewPrgm does not return
This is not ideal:
There is no safety check in case Ans does not contain a string (could be added).
A program named θ should not exist, because it will be overwritten. If prgmθ already exists and is archived, then it does not work at all. (could be improved)
prgmθ is not deleted afterwards. (not sure how to do this)
When prgmθ is done, it quits to the Homescreen, it does not return to the calling program. (not sure how to do this)
Aside from that, it does work, for example:
:"TESTING->Str1
:":Disp Str1
:Asm(prgmRUNSTR
Looks like this afterwards:
You can create the assembly program by typing this into a normal program:
AsmPrgmBB6D21
EC86365B2336
0023E5EF524B
D7EB4E23462B
0303D1EDB021
EC863E06EF3C
4C
It can be made smaller with AsmComp(.
Unfortunately, no, not like this: expr( and eval( only work on expressions.

Local Label Address in GNU GCC Assembly

I'm trying to use some PPC assembly code in C, but I'm having trouble understanding and transposing this particular piece of ASM into the GNU GCC format:
...
b 1f
1:
lis p0, HI(2f)
ori p0, p0, LO(2f)
mtsrr0 p0
rfi
2:
mtsrr0 p1 /* Restore srr0 & srr1 */
mtsrr1 p2
...
The lines in question are those that reference 2f. I'm aware of Local Labels and I can only assume that is what is meant by 2f in those two instructions. Looking at the more general mtspr instruction, the RS parameter should be a register.
EDIT: Peter Cordes helped me understand the intent of this code. It looks like we're using lis and ori to build the 32-bit address of the label 2: and load it into SRR0. The following quote from the PowerPC Architecture Primer completely explains the intent of this assembly.
Save/restore registers (SRR0 and SRR1)
SRR0 holds the address of the instruction where an interrupted process should resume. When rfi executes, instruction execution continues at the address in SRR0. In Book E, SRR0 is used for non-critical interrupts.
SRR1 holds machine state information. When an interrupt is taken, MSR contents are placed in SRR1. When rfi executes, SRR1 contents are placed into MSR. In Book E, SRR1 is used for non-critical interrupts.
Now that I understand what the code is doing, I need to represent this code with GNU GCC in C:
__asm__ __volatile__ (
"b 1f\n\t"
"1:\n\t"
"lis %2, %hi(2f)\n\t"
"ori %2, %2, %lo(2f)\n\t"
"mtsrr0 %2\n\t"
"rfi\n\t"
"2:\n\t"
"mtsrr0 %0\n\t" /* srr0 = p1 */
"mtsrr1 %1\n\t" /* srr1 = p2 */
: "=r" (p1), "=r" (p2)
: "r" (val), "r" (p1), "r" (p2));
This yields the following error (twice, once for each instance of 2f, I assume):
invalid 'asm': operand number missing after %-letter
Commenting out the lines with instructions lis and ori allows the code to compile without errors.

EBNF: prefix and suffix-like operator in assembly code production

I'm trying to write down 6809 assembly in EBNF to write a tree-sitter parser.
I'm stuck on one certain production. In 6809 assembly, you can use a register as an operand and additionally de- or increment it:
LDA 0,X+ ; loads A from X then bumps X by 1
LDD ,Y++ ; loads D from Y then bumps Y by 2
LDA 0,-U ; decrements U by 1 then loads A from address in U
LDU ,--S ; decrements S by 2 then loads U from address in S
Mind the "missing" first operand in the second line of code. Here are the productions I wrote:
instruction = opcode, [operand], ["," , register_exp];
...
register_exp = [{operator}], register | register, [{operator}];
register = "X" | "Y" | "U" | etc. ;
operator = "+" | "-";
The problem is register_exp = .... I feel like there could be a more elegant way to define this production. Also, what happens if only a register is given to register_exp?
You probably need
register_exp = [{operator}], register | register, [{operator}] | register;
to allow register names without operators. Why do you find it not so elegant? Quite descriptive.
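One way to check what such a production should accept is a small hand-written recognizer. The sketch below is an assumption-laden illustration, not tree-sitter code: it restricts the operators to the forms shown in the question (up to two leading "-" or up to two trailing "+"), limits registers to X, Y, U and S, and treats a bare register as the zero-operator case. The names register_exp, is_register and parse_register_exp are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Result of matching one register_exp such as "X+", "--S" or "Y". */
struct register_exp {
    char reg;      /* 'X', 'Y', 'U' or 'S'          */
    int  pre_dec;  /* number of leading '-' (0..2)  */
    int  post_inc; /* number of trailing '+' (0..2) */
};

static bool is_register(char c)
{
    /* strchr would also match the terminator, so reject '\0' first. */
    return c != '\0' && strchr("XYUS", c) != NULL;
}

/* Returns true and fills *out when text is exactly one register_exp.
   Prefix and suffix operators are mutually exclusive, mirroring the
   two alternatives of the production. */
static bool parse_register_exp(const char *text, struct register_exp *out)
{
    const char *p = text;
    int pre = 0, post = 0;

    while (*p == '-' && pre < 2) { pre++; p++; }   /* optional prefix */
    if (!is_register(*p))
        return false;
    char reg = *p++;
    if (pre == 0) {                                /* optional suffix */
        while (*p == '+' && post < 2) { post++; p++; }
    }
    if (*p != '\0')                                /* trailing junk   */
        return false;

    out->reg = reg;
    out->pre_dec = pre;
    out->post_inc = post;
    return true;
}
```

Inputs like "X+", "Y++", "-U", "--S" and a plain "Y" are accepted, while mixed forms such as "-X+" or over-long runs such as "X+++" are rejected.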

Windows kernel conditional breakpoint not evaluating

I'm using the windows kernel debugger through visual studio 2013 and I'm trying to stop (break) in a function (nt!KiSwapContext) but only for a specific process (0x920).
The breakpoint works without a condition: bp nt!KiSwapContext
I determined the Process ID for the current thread can be found with dt dword poi(gs:[188h])+3B8h
I've confirmed the following conditional works to see if I am on the right thread: ? poi(poi(gs:[188h])+3B8h)==0x920
However, when I try to set the conditional breakpoint it always breaks no matter what I put in the if/else . So I am guessing it thinks the expression is invalid and is just ignoring it. I've confirmed that if I do enter an invalid expression it just accepts it without warning or error and always stops on the breakpoint.
The expression I am using is: bp nt!KiSwapContext ".if (poi(poi(gs:[188h])+3B8h)==0x920) {} .else {gc}"
I also tried using the j conditional syntax to no avail.
Any ideas on what I am doing wrong?
[Edit] Oh, as a bonus, how can I do the conditional check with a dword instead of a qword on a 64 bit processor. ? poi(poi(gs:[188h])+3B8h) returns a qword value. I know I can use dd to get the value, but I can't seem to figure out how to add that into the conditional. Something like ? dword(poi(gs:[188h])+3B8h)==0x920 or ? {dd poi(gs:[188h])+3B8h}==0x920
windbg allows you to set process specific breakpoints with /p
you shouldn't be mucking with gs and fs registers
kd> bl
kd> !process 0 0 calc.exe
Failed to get VAD root
PROCESS 8113d528 SessionId: 0 Cid: 07a0 Peb: 7ffde000 ParentCid: 043c
DirBase: 03d27000 ObjectTable: e15ba240 HandleCount: 28.
Image: calc.exe
kd> bp /p 8113d528 nt!KiSwapContext "?? (char *)(#$proc->ImageFileName)"
kd> g
char * 0x8113d69c
"calc.exe"
nt!KiSwapContext:
804db828 83ec10 sub esp,10h
kd> g
char * 0x8113d69c
"calc.exe"
nt!KiSwapContext:
804db828 83ec10 sub esp,10h
use dwo() and qwo () as required to evaluate dword and qword
kd> ? qwo ( ffb9cda8 + 70)
Evaluate expression: -9142252815570161280 = 81203180`81203180
kd> ? dwo ( qwo ( ffb9cda8 + 70))
Evaluate expression: -4600296 = ffb9ce18
confirmation
kd> dd 81203180 l1
81203180 ffb9ce18
kd> dd ffb9cda8+70 l1
ffb9ce18 81203180
Edit
I can't access an x64 system atm, so I can't tell you what the error in your expression is
but in general you should avoid hardcoding unless it is absolutely necessary
in your case it is not necessary
windbg provides you pseudo registers to what you are hard coding
$thread is available in C++ expressions as a pointer to the current thread, i.e. (nt!_ETHREAD *)
so $thread->Cid.UniqueProcess is what you are evaluating with your gs:[...] expression
with that in mind you can set a breakpoint like this
bp nt!KiSwapContext " r? $t0 = #$thread->Cid.UniqueProcess ;.if( #$t0 != 0x740 ) {? #$t0;?? (char * )#$proc->ImageFileName ;gc }"
this conditional will break only if calc.exe is the current process
kd> g
Evaluate expression: 404 = 00000194
char * 0x81105c84
"csrss.exe"
XXXXXXXXXXX
Evaluate expression: 4 = 00000004
char * 0x8129196c
"System"
xxxxxxxxxxxxxxxxxxxxxxxxxxx
Evaluate expression: 1404 = 0000057c
char * 0x8114a4bc
"vpcmap.exe"
Evaluate expression: 480 = 000001e0
char * 0x8112a98c
"services.exe"
Evaluate expression: 492 = 000001ec
char * 0x811cc9ac
"lsass.exe"
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
Evaluate expression: 1116 = 0000045c
char * 0xffaf9da4
"explorer.exe"
Evaluate expression: 644 = 00000284
char * 0xffb74f14
"svchost.exe"
nt!KiSwapContext: <---------------------------Conditional broke here
804db828 83ec10 sub esp,10h
kd> ? #$t0;?? (char * )#$proc->ImageFileName
Evaluate expression: 1856 = 00000740
char * 0x8110e76c
"calc.exe"
keep in mind evaluating conditions in a very hot path will make you endure unbearable pain watching it crawl by
nt!KiSwapContext is called hundreds of times in a few seconds
and you will be seeing a very noticeable performance degradation in your session
whenever possible use process specific or thread specific breakpoints
do not evaluate conditions
no i don't use any cheat sheet ( google says there are few available ) i prefer manual or in some cases online msdn documentation

How to translate "rterrmsgs <8, aR6008NotEnough>" into a legal nasm-style asm code?

I use IDA Pro to disassemble SPEC 2006 binaries on Windows 7 32 bit.
It generates a variable declared like this:
rterrs rterrmsgs <2, aR6002FloatingP>
rterrmsgs <8, aR6008NotEnough>
rterrmsgs <9, aR6009NotEnough>
rterrmsgs <0Ah, aR6010AbortHasB>
rterrmsgs <10h, aR6016NotEnough>
rterrmsgs <11h, aR6017Unexpecte>
rterrmsgs <12h, aR6018Unexpecte>
and I can find the definition of aR6002FloatingP, aR6008NotEnough, aR6010AbortHasB... like
aR6016NotEnough:
dw __utf16__('R6016')
dw 0Dh, 0Ah
dw __utf16__('- not enough space for thread data')
dw 0Dh, 0Ah, 0
So basically, declarations like
rterrmsgs <11h, aR6017Unexpecte>
cannot be assembled directly by nasm/masm.
I think this should work like an array, but what do the 2, 8 and 9 mean in
rterrs rterrmsgs <2, aR6002FloatingP>
rterrmsgs <8, aR6008NotEnough>
rterrmsgs <9, aR6009NotEnough>
So my question is: how do I adjust the declarations above so that they can be reassembled in nasm syntax?
Thank you!
These are simply instances of a structure from the CRT:
/* struct used to lookup and access runtime error messages */
struct rterrmsgs {
int rterrno; /* error number */
char *rterrtxt; /* text of error message */
};
see (for example) : ftp://ftp.cs.ntust.edu.tw/yang/PC-SIMSCRIPT/C++/VC98/CRT/SRC/CRT0MSG.C
In your example, if you take :
rterrmsgs <10h, aR6016NotEnough>
It corresponds to the following entry :
/* 16 */
{ _RT_THREAD, _RT_THREAD_TXT },
where _RT_THREAD is 16 (0x10) and _RT_THREAD_TXT is defined as follows:
#define _RT_THREAD_TXT "R6016" EOL "- not enough space for thread data" EOL
see (http://bioen.okstate.edu/Home/prashm%20-%20keep/prashant/VS.NET%20setup%20files/PROGRAM%20FILES/MICROSOFT%20VISUAL%20STUDIO%20.NET/VC7/CRT/SRC/CMSGS.H) for various messages.
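To make the array layout concrete, here is a minimal C sketch of such a table with a lookup by error number. Only the 0x10 text comes from the definitions above; the other strings are placeholders, and find_rterrmsg is a hypothetical helper, not a CRT function. In a 32-bit binary each entry would presumably be two 32-bit values (the number, then the address of the text), so a NASM equivalent of one entry might look like dd 8, aR6008NotEnough.

```c
#include <stddef.h>

/* Same shape as the CRT structure quoted above. */
struct rterrmsgs {
    int rterrno;            /* error number          */
    const char *rterrtxt;   /* text of error message */
};

/* Error numbers taken from the IDA listing. Only the 0x10 text is
   known from the thread; the rest are placeholders. */
static const struct rterrmsgs rterrs[] = {
    { 0x02, "R6002 (placeholder)" },
    { 0x08, "R6008 (placeholder)" },
    { 0x09, "R6009 (placeholder)" },
    { 0x0A, "R6010 (placeholder)" },
    { 0x10, "R6016\r\n- not enough space for thread data\r\n" },
    { 0x11, "R6017 (placeholder)" },
    { 0x12, "R6018 (placeholder)" },
};

/* Hypothetical helper: returns the message text for errnum, or NULL
   when the table has no entry with that number. */
static const char *find_rterrmsg(int errnum)
{
    size_t count = sizeof rterrs / sizeof rterrs[0];
    for (size_t i = 0; i < count; i++) {
        if (rterrs[i].rterrno == errnum)
            return rterrs[i].rterrtxt;
    }
    return NULL;
}
```

The first field of each entry is therefore just the key the runtime searches for, which is what the 2, 8 and 9 in the listing are.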
Hope that helps :)

Resources