What does this mean in PowerPC? - powerpc

stwu r1, -32(r1) // 32 bytes of space for this function
mflr r0
stw r0, 36(r1) //stores link register
stw r30, 24(r1) // ??
stw r31, 28(r1) // Probably makes space for r31?
mr r31, r1 // r31 = stack pointer
This is the beginning of a function. The code above stores r30 and r31 somewhere in memory, and every function begins this way. But neither r31 nor r30 holds a meaningful value at this point. What is the point of storing them?

In the PowerPC ELF ABI, registers r14-r31 are defined as non-volatile: they must be preserved across a function call. So, if a function can overwrite the contents of any of these registers, it must save their values in the function prologue, and restore them before returning to the caller.
So, even though your disassembled function hasn't used r30 and r31 yet, it needs to save them on the stack, so it doesn't corrupt the calling-function's nonvolatile state. You'll probably see usage of r30 and r31 later in the function, and the restore (from those same locations on the stack) before the function returns.
I'm assuming that your program conforms to the Power ELF ABI, as that's what defines how your registers are used.
For more information, the Power ELF ABI is at http://openpowerfoundation.org/technical/technical-resources/technical-specifications/ , or https://www.power.org/technology-introduction/standards-specifications/ for the 32-bit versions.
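To see why a compiler reaches for a non-volatile register at all, consider a value that has to stay live across a call. The C sketch below is only an illustration; the names helper and sum_of_helpers are made up here and do not come from the disassembled program. A compiler targeting the 32-bit PowerPC ELF ABI would typically keep b in a non-volatile register (or a stack slot), because the volatile registers may be clobbered by the first call. Note also that the prologue in the question uses r31 as a frame pointer (mr r31, r1), which by itself is enough to require saving the caller's r31.

```c
/* Hypothetical callee: any call that is not inlined will do. */
int helper(int x)
{
    return x * 2 + 1;
}

/*
 * `b` is still needed after the first call to helper(), so it cannot
 * live only in a volatile (caller-saved) register. If the compiler
 * parks it in a non-volatile register such as r30 or r31, the prologue
 * must first save the caller's value of that register on the stack,
 * and the epilogue must restore it.
 */
int sum_of_helpers(int a, int b)
{
    int first = helper(a);      /* volatile registers may change here */
    return first + helper(b);   /* b has to have survived the call */
}
```

Compiling something like this for PowerPC and reading the disassembly should show the same save/restore pattern as in the question, although the exact registers and offsets depend on the compiler and optimization level.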

Related

which MOV instructions in the x86 are not used or the least used, and can be used for a custom MOV extension

I am modelling a custom MOV instruction for the x86 architecture in the gem5 simulator. To test its implementation on the simulator, I need to compile my C code using inline assembly to create a binary. But since it is a custom instruction that GCC does not implement, the compiler will throw an error. I know one way is to extend GCC to accept my custom x86 instruction, but I do not want to do that yet because it is more time-consuming (I will do it afterwards).
As a temporary hack (just to check whether my implementation is worth it), I want to take an existing MOV instruction and change its underlying "micro-ops" in the simulator, so as to trick GCC into accepting my "custom" instruction and compiling it.
There are many types of MOV instructions in the x86 architecture reference.
So, coming to my question: which MOV instruction is the least used, so that I can edit its underlying micro-ops? Assume my workload just includes integers, i.e. it most probably won't be using the xmm and mmx registers, and my instruction mirrors the implementation of a MOV instruction.
Your best bet is regular mov with a prefix that GCC will never emit on its own. i.e. create a new mov encoding that includes a mandatory prefix in front of any other mov. Like how lzcnt is rep bsr.
Or if you're modifying GCC and as, you can add a new mnemonic that just uses otherwise-invalid (in 64-bit mode) single byte opcodes for memory-source, memory-dest, and immediate-source versions of mov. AMD64 freed up several opcodes, including the BCD instructions like AAM, and push/pop most segment registers. (x86-64 can still mov to/from Sregs, but there's just 1 opcode per direction, not 2 per Sreg for push ds/pop ds etc.)
Assuming my workload just includes integers i.e. most probably won't be using the xmm and mmx registers
Bad assumption for XMM: GCC aggressively uses 16-byte movaps / movups instead of copying structs 4 or 8 bytes at a time. It's not at all rare to find vector mov instructions in scalar integer code as part of inline expansion of small known-length memcpy or struct / array init. Also, those mov instructions have at least 2-byte opcodes (SSE1 0F 28 movaps, so a prefix in front of plain mov is the same size as your idea would have been).
However, you're right about MMX regs. I don't think modern GCC will ever emit movq mm0, mm1 or use MMX at all, unless you use MMX intrinsics. Definitely not when targeting 64-bit code.
Also mov to/from control regs (0f 20/22 /r) or debug registers (0f 21/23 /r) are both the mov mnemonic, but gcc will definitely never emit either on its own. They're only available with a GP register as the operand that isn't the debug or control register. So that's technically the answer to your title question, but probably not what you actually want.
GCC doesn't parse its inline asm template string, it just includes it in its asm text output to feed to the assembler after substituting for %number operands. So GCC itself is not an obstacle to emitting arbitrary asm text using inline asm.
And you can use .byte to emit arbitrary machine code.
Perhaps a good option would be to use a 0E byte as a prefix for your special mov encoding that you're going to make gem5 decode specially. 0E is push CS in 32-bit mode, invalid in 64-bit mode. GCC will never emit either.
Or just an F2 repne prefix; GCC will never emit repne in front of a mov opcode (where it doesn't apply), only movs. (F3 rep / repe means xrelease when used on a memory-destination instruction so don't use that. https://www.felixcloutier.com/x86/xacquire:xrelease says that F2 repne is the xacquire prefix when used with locked instructions, which doesn't include mov to memory so it will be silently ignored there.)
As usual, prefixes that don't apply have no documented behaviour, but in practice CPUs that don't understand a rep / repne ignore it. Some future CPU might understand it to mean something special, and that's exactly what you're doing with gem5.
Picking .byte 0x0e; instead of repne; might be a better choice if you want to guard against accidentally leaving these prefixes in a build you run on a real CPU. (It will #UD -> SIGILL in 64-bit mode, or usually crash from messing up the stack in 32-bit mode.) But if you do want to be able to run the exact same binary on a real CPU, with the same code alignment and everything, then an ignored REP prefix is ideal.
Using a prefix in front of a standard mov instruction has the advantage of letting the assembler encode the operands for you:
template<class T>
void fancymov(T& dst, T src) {
    // fixme: imm -> mem needs a size suffix, defeating template
    // unless you use Intel-syntax where the operand includes "dword ptr"
    asm("repne; movl %1, %0"
#if 1
        : "=m"(dst)
        : "ri" (src)
#else
        : "=g,r"(dst)
        : "ri,rmi" (src)
#endif
        : // no clobbers
    );
}

void test(int *dst, long src) {
    fancymov(*dst, (int)src);
    fancymov(dst[1], 123);
}
(Multi-alternative constraints let the compiler pick either reg/mem destination or reg/mem source. In practice it prefers the register destination even when that will cost it another instruction to do its own store, so that sucks.)
On the Godbolt compiler explorer, for the version that only allows a memory-destination:
test(int*, long):
repne; movl %esi, (%rdi) # F2 89 37
repne; movl $123, 4(%rdi) # F2 C7 47 04 7B 00 00 00
ret
If you wanted this to be usable for loads, I think you'd have to make 2 separate versions of the function and use the load version or store version manually, where appropriate, because GCC seems to want to use reg,reg whenever it can.
Or with the version allowing register outputs (or another version that returns the result as a T, see the Godbolt link):
test2(int*, long):
repne; mov %esi, %esi
repne; mov $123, %eax
movl %esi, (%rdi)
movl %eax, 4(%rdi)
ret
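The suggestion above of keeping separate load and store versions could look roughly like the following C sketch. The names fancy_store32 and fancy_load32 are hypothetical, the operand size is fixed at 32 bits to sidestep the size-suffix problem, and the non-x86 fallback exists only so the sketch builds elsewhere. It assumes, as described above, that a real CPU ignores the repne prefix in front of a plain mov.

```c
#include <stdint.h>

/* Store variant: memory destination, register or immediate source. */
static inline void fancy_store32(uint32_t *dst, uint32_t src)
{
#if defined(__x86_64__) || defined(__i386__)
    /* "=m" forces a memory destination, so the prefixed mov is the store */
    __asm__("repne; movl %1, %0" : "=m"(*dst) : "ri"(src));
#else
    *dst = src;                 /* plain store where the prefix is unavailable */
#endif
}

/* Load variant: register destination, memory source. */
static inline uint32_t fancy_load32(const uint32_t *src)
{
    uint32_t val;
#if defined(__x86_64__) || defined(__i386__)
    /* "m" forces a memory source, so the prefixed mov is the load */
    __asm__("repne; movl %1, %0" : "=r"(val) : "m"(*src));
#else
    val = *src;                 /* plain load where the prefix is unavailable */
#endif
    return val;
}
```

Because each variant pins one operand to memory, the compiler can no longer turn the special instruction into a reg,reg move followed by its own ordinary load or store.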

What is asm instruction "jmpq *0xa48201(%rip)" exactly doing? [duplicate]

How is the address 0x600860 computed in the Intel instruction below? 0x4003b8 + 0x2004a2 = 0x60085a, so I don't see how the computation is carried out.
0x4003b8 <puts@plt>: jmpq *0x2004a2(%rip) # 0x600860 <puts@got.plt>
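On x86, relative JMP and CALL targets and RIP-relative addresses are computed from the program counter of the next instruction, not the current one.
This jmpq is 6 bytes long (FF 25 plus a 4-byte displacement), so the next instruction in your case was at 0x4003be, and 0x4003be + 0x2004a2 == 0x600860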
It's AT&T syntax for a memory-indirect JMP with a RIP-relative addressing mode.
The jump address is fetched from the memory location that is specified relative to the instruction pointer:
first calculate 0x4003be + 0x2004a2 == 0x600860 then fetch the address to jump to from location 0x600860.
Other addressing modes are possible, for example a jump-table might use
jmpq *(%rdi, %rax, 8) with the table base in RDI and the index in RAX.
RIP-relative addressing for static data is common, though. In this case, it's addressing an entry in the GOT (Global Offset Table), set up by dynamic linking.
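The arithmetic from the answers above can be written out as a small C helper. The function name rip_relative_target is made up for illustration; the instruction length has to be supplied by the caller, and for this jmpq it is 6 bytes (opcode FF, ModRM 25, 4-byte displacement).

```c
#include <stdint.h>

/*
 * Effective address of a RIP-relative operand: the base is the address
 * of the *next* instruction, i.e. the current address plus the length
 * of the current instruction. The 32-bit displacement is sign-extended.
 */
uint64_t rip_relative_target(uint64_t insn_addr, uint64_t insn_len,
                             int32_t disp32)
{
    uint64_t next_insn = insn_addr + insn_len;
    return next_insn + (uint64_t)(int64_t)disp32;
}
```

Calling rip_relative_target(0x4003b8, 6, 0x2004a2) gives 0x600860, the GOT slot named in the disassembly comment. The jump then goes to whatever address is stored in that slot.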

Is atomic.LoadUint32 necessary?

Go's atomic package provides function func LoadUint32(addr *uint32) (val uint32). I looked into the assembly implementation:
TEXT ·LoadUint32(SB),NOSPLIT,$0-12
MOVQ addr+0(FP), AX
MOVL 0(AX), AX
MOVL AX, val+8(FP)
RET
which basically loads the value from the memory address and returns it.
I'm wondering: if we have a uint32 pointer (addr) x, what is the difference between calling atomic.LoadUint32(x) and directly accessing it using *x?
which basically load the value from the memory address and return it.
That is the case in your context, but might differ on a different machine architecture where atomicity is to be implemented, as discussed here.
As mentioned in go issue 8739
We intrinsify both sync/atomic and runtime/internal/atomic for a bunch of architectures.
The APIs are not unified (e.g. LoadUint32 in sync/atomic is Load in runtime/internal/atomic).
(* "intrinsify" as in issue 4947)
As mentioned in my first link:
Regarding loads and stores.
Memory model along with instruction set specifies whether plain loads and stores are atomic or not. Typical guarantee for all modern commodity hardware is that aligned word-sized loads and stores are atomic. For example, on x86 architecture (IA-32 and Intel 64) 1-, 2-, 4-, 8- and 16-byte aligned loads and stores are all atomic (that is, plain MOV instruction, MOVQ and MOVDQA are atomic).
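As a rough analogy only (this is C11, not Go, and it says nothing about how Go's runtime is implemented), the same distinction exists between a plain read and atomic_load from <stdatomic.h>. On x86 an atomic load of an aligned 32-bit value is typically still a single MOV, but the atomic form also tells the compiler that another thread may change the location, so it cannot keep the value cached in a register or assume it is unchanged between reads. The names ready_flag, publish and observe below are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag shared between threads. */
static _Atomic uint32_t ready_flag;

/* Writer side: the store is atomic and visible to other threads. */
void publish(uint32_t value)
{
    atomic_store(&ready_flag, value);
}

/* Reader side: every call performs a real load from memory. */
uint32_t observe(void)
{
    return atomic_load(&ready_flag);
}
```

A reader that polls observe() in a loop will eventually see the writer's value; with a plain non-atomic variable the compiler would be allowed to read it once and reuse the result.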

Windows x64 ABI. How can debugger show you arguments passed to functions

In x86 calling conventions, parameters are passed on the stack, and when frames use base pointers it is possible to reconstruct from a call stack which parameters were passed to successive functions (the process actually runs in reverse order, from the last function called going back).
How can we do the same in the x64 ABI, considering that (as per the x64 ABI) the registers used for parameter passing, RCX, RDX, R8 and R9, are all volatile and thus lose their values between frames (with no stack backup)?

cdecl calling convention -- compiled asm instructions cause crash

Treat this more as pseudocode than anything. If there's some macro or other element that you feel should be included, let me know.
I'm rather new to assembly. I programmed on a pic processor back in college, but nothing since.
The problem here (segmentation fault) is the first instruction after "Compile function entrance, setup stack frame." or "push %ebp". Here's what I found out about those two instructions:
http://unixwiz.net/techtips/win32-callconv-asm.html
Save and update the %ebp :
Now that we're in the new function, we need a new local stack frame pointed to by %ebp, so this is done by saving the current %ebp (which belongs to the previous function's frame) and making it point to the top of the stack.
push ebp
mov ebp, esp // ebp « esp
Once %ebp has been changed, it can now refer directly to the function's arguments as 8(%ebp), 12(%ebp). Note that 0(%ebp) is the old base pointer and 4(%ebp) is the old instruction pointer.
Here's the code. This is from a JIT compiler for a project I'm working on. I'm doing this more for the learning experience than anything.
IL_CORE_COMPILE(avs_x86_compiler_compile)
{
    X86GlobalData *gd = X86_GLOBALDATA(ctx);
    ILInstruction *insn;

    avs_debug(print("X86: Compiling started..."));

    /* Initialize X86 Assembler opcode context */
    x86_context_init(&gd->ctx, 4096, 1024*1024);

    /* Compile function entrance, setup stack frame*/
    x86_emit1(&gd->ctx, pushl, ebp);
    x86_emit2(&gd->ctx, movl, esp, ebp);

    /* Setup floating point rounding mode to integer truncation */
    x86_emit2(&gd->ctx, subl, imm(8), esp);
    x86_emit1(&gd->ctx, fstcw, disp(0, esp));
    x86_emit2(&gd->ctx, movl, disp(0, esp), eax);
    x86_emit2(&gd->ctx, orl, imm(0xc00), eax);
    x86_emit2(&gd->ctx, movl, eax, disp(4, esp));
    x86_emit1(&gd->ctx, fldcw, disp(4, esp));

    for (insn=avs_il_tree_base(tree); insn != NULL; insn = insn->next) {
        avs_debug(print("X86: Compiling instruction: %p", insn));
        compile_opcode(gd, obj, insn);
    }

    /* Restore floating point rounding mode */
    x86_emit1(&gd->ctx, fldcw, disp(0, esp));
    x86_emit2(&gd->ctx, addl, imm(8), esp);

    /* Cleanup stack frame */
    x86_emit0(&gd->ctx, emms);
    x86_emit0(&gd->ctx, leave);
    x86_emit0(&gd->ctx, ret);

    /* Link machine */
    obj->run = (AvsRunnableExecuteCall) gd->ctx.buf;
    return 0;
}
And when obj->run is called, it's called with obj as its only argument:
obj->run(obj);
If it helps, here are the instructions for the entire function call. It's basically an assignment operation: foo=3*0.2;. foo is pointing to a float in C.
0x8067990: push %ebp
0x8067991: mov %esp,%ebp
0x8067993: sub $0x8,%esp
0x8067999: fnstcw (%esp)
0x806799c: mov (%esp),%eax
0x806799f: or $0xc00,%eax
0x80679a4: mov %eax,0x4(%esp)
0x80679a8: fldcw 0x4(%esp)
0x80679ac: flds 0x806793c
0x80679b2: fsts 0x805f014
0x80679b8: fstps 0x8067954
0x80679be: fldcw (%esp)
0x80679c1: add $0x8,%esp
0x80679c7: emms
0x80679c9: leave
0x80679ca: ret
Edit: Like I said above, in the first instruction in this function, %ebp is void. This is also the instruction that causes the segmentation fault. Is that because it's void, or am I looking for something else?
Edit: Scratch that. I keep typing edp instead of ebp. Here are the values of ebp and esp.
(gdb) print $esp
$1 = (void *) 0xbffff14c
(gdb) print $ebp
$3 = (void *) 0xbffff168
Edit: Those values above are wrong. I should have used the 'x' command, like below:
(gdb) x/x $ebp
0xbffff168: 0xbffff188
(gdb) x/x $esp
0xbffff14c: 0x0804e481
Here's a reply from someone on a mailing list regarding this. Anyone care to illuminate what he means a bit? How do I check to see how the stack is set up?
An immediate problem I see is that the stack pointer is not properly aligned. This is 32-bit code, and the Intel manual says that the stack should be aligned at 32-bit addresses. That is, the least significant digit in esp should be 0, 4, 8, or c.
I also note that the values in ebp and esp are very far apart. Typically, they contain similar values -- addresses somewhere in the stack.
I would look at how the stack was set up in this program.
He replied with corrections to the above comments. He was unable to see any problems after further input.
Another edit: Someone replied that the code page may not be marked executable. How can I ensure it is marked as such?
The problem had nothing to do with the code. Adding -z execstack to the linker fixed the problem.
If push %ebp is causing a segfault, then your stack pointer isn't pointing at valid stack. How does control reach that point? What platform are you on, and is there anything odd about the runtime environment? At the entry to the function, %esp should point to the return address in the caller on the stack. Does it?
Aside from that, the whole function is pretty weird. You go out of your way to set the rounding bits in the fp control word, and then don't perform any operations that are affected by rounding. All the function does is copy some data, but uses floating-point registers to do it when you could use the integer registers just as well. And then there's the spurious emms, which you need after using MMX instructions, not after doing x87 computations.
Edit: See Scott's (the original questioner) answer for the actual reason for the crash.
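The -z execstack fix reported above changes permissions for the whole process. A more targeted approach is to put the generated code in its own mapping and mark only that mapping executable. The sketch below is a minimal POSIX example and not part of the original project: the function name jit_make_executable is hypothetical, and it assumes the assembler context's buffer (gd->ctx.buf) is ordinary non-executable memory, which the question does not show.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Copy `len` bytes of generated machine code into a fresh anonymous
 * mapping, then switch the mapping from writable to executable.
 * Returns the executable copy, or NULL on failure. The caller releases
 * it with munmap(ptr, len).
 */
void *jit_make_executable(const void *code, size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    memcpy(buf, code, len);

    /* Drop write permission and add execute permission. */
    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}
```

In the JIT from the question, the pointer returned here would take the place of gd->ctx.buf when assigning obj->run. Keeping the mapping either writable or executable, but never both at once, also avoids leaving writable code pages around.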
