How do I lookup a value in a table? - go

In Go assembly on arm64, I have created a table of values
DATA table<>+0(SB)/4, 0x00000001
DATA table<>+4(SB)/4, 0x00000002
DATA table<>+8(SB)/4, 0x00000003
DATA table<>+12(SB)/4, 0x00000004
But what I want to be able to do is load up a value into a register from this table, but based on a variable.
If I had a constant I could do
MOVD table<>+4(SB), R1
so R1=0x00000002
but how can I do it with a variable? Something like...
MOVD $4, R0
MOVD table<>+R0(SB), R1
Or better yet, can I get the address and load a vector directly?
I guess the answer in normal ARM assembly is ADR, but when I try that in Go
ADR table<>(SB), R0
I just get
asm: illegal combination: 00280 [...] ADR table<>(SB), R9 ADDR NONE NONE REG, 3 7
Which is maybe the least useful error message I've ever seen.
Okay, so ADR works if I do PC relative addressing, but that's obviously not right.

Turns out it's really easy: you just put a $ before the symbol, which gives you its address instead of loading from it
MOVD $table<>+0(SB), R0

Related

How do I use PinTool to find the offset from an address stored in a register?

This might be very easy but I'm new to pintool. Basically, my question is: for an instruction such as:
mov 0x28(%rax), %xmm1
How do I record the value 0x28 into the output trace file? Is it the difference between IARG_MEMORYREAD_EA and IARG_REG_VALUE?
Can you check whether 0x28 is overwritten or stays the same?
In assembler, it does not change the value in the register at all.
So you could just use 0x28 again to do whatever you need.

Translating Go assembler to NASM

I came across the following Go code:
type Element [12]uint64
//go:noescape
func CSwap(x, y *Element, choice uint8)
//go:noescape
func Add(z, x, y *Element)
where the CSwap and Add functions are basically coming from an assembly, and look like the following:
TEXT ·CSwap(SB), NOSPLIT, $0-17
MOVQ x+0(FP), REG_P1
MOVQ y+8(FP), REG_P2
MOVB choice+16(FP), AL // AL = 0 or 1
MOVBLZX AL, AX // AX = 0 or 1
NEGQ AX // RAX = 0x00..00 or 0xff..ff
MOVQ (0*8)(REG_P1), BX
MOVQ (0*8)(REG_P2), CX
// Rest removed for brevity
TEXT ·Add(SB), NOSPLIT, $0-24
MOVQ z+0(FP), REG_P3
MOVQ x+8(FP), REG_P1
MOVQ y+16(FP), REG_P2
MOVQ (REG_P1), R8
MOVQ (8)(REG_P1), R9
MOVQ (16)(REG_P1), R10
MOVQ (24)(REG_P1), R11
// Rest removed for brevity
What I am trying to do is translate the assembly to a syntax that is more familiar to me (I think mine is closer to NASM), while the syntax above is Go assembler. I didn't have much of a problem with the Add method, and translated it correctly (according to test results). It looks like this in my case:
.text
.global add_asm
add_asm:
push r12
push r13
push r14
push r15
mov r8, [reg_p1]
mov r9, [reg_p1+8]
mov r10, [reg_p1+16]
mov r11, [reg_p1+24]
// Rest removed for brevity
But, I have a problem when translating the CSwap function, I have something like this:
.text
.global cswap_asm
cswap_asm:
push r12
push r13
push r14
mov al, 16
mov rax, al
neg rax
mov rbx, [reg_p1+(0*8)]
mov rcx, [reg_p2+(0*8)]
But this doesn't seem to be quite correct, as I get an error when assembling it. Any ideas how to translate the above CSwap assembly part to something like NASM?
EDIT (SOLUTION):
Okay, after the two answers below, and some testing and digging, I found out that the code uses the following three registers for parameter passing:
#define reg_p1 rdi
#define reg_p2 rsi
#define reg_p3 rdx
Accordingly, rdx has the value of the choice parameter. So, all that I had to do was use this:
movzx rax, dl // Get the lower 8 bits of rdx (reg_p3)
neg rax
Using byte [rdx] or byte [reg_3] was giving an error, but using dl seems to work fine for me.
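For reference, here is a minimal Go sketch of what the movzx / neg pair is for: it turns choice (0 or 1) into an all-zeros or all-ones mask. The rest of CSwap is elided in the question, so the XOR-and-mask swap below is only an assumption about how such a mask is typically used, and cswapSketch is a hypothetical name, not the original routine.

```go
package main

import "fmt"

// Element mirrors the type declared in the question.
type Element [12]uint64

// cswapSketch is a hypothetical pure-Go stand-in for the assembly CSwap.
// It swaps x and y when choice is 1 and leaves them alone when choice is 0,
// without branching on choice.
func cswapSketch(x, y *Element, choice uint8) {
	// Zero-extend choice and negate it: 0 -> 0x00..00, 1 -> 0xff..ff.
	// This is the same effect as movzx followed by neg.
	mask := -uint64(choice)
	for i := range x {
		// t is x[i]^y[i] when mask is all ones, and 0 when mask is zero.
		t := mask & (x[i] ^ y[i])
		x[i] ^= t
		y[i] ^= t
	}
}

func main() {
	var a, b Element
	a[0], b[0] = 1, 2
	cswapSketch(&a, &b, 1)
	fmt.Println(a[0], b[0]) // prints "2 1"
}
```

This only shows the masking idea; it says nothing about which registers the real code uses for the limbs.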
Basic docs about Go's asm: https://golang.org/doc/asm. It's not totally equivalent to NASM or AT&T syntax: FP is a pseudo-register name for whichever register it decides to use as the frame pointer. (Typically RSP or RBP). Go asm also seems to omit function prologue (and probably epilogue) instructions. As @RossRidge comments, it's a bit more like an internal representation like LLVM IR than truly asm.
Go also has its own object-file format, so I'm not sure you can make Go-compatible object files with NASM.
If you want to call this function from something other than Go, you'll also need to port the code to a different calling convention. Go appears to be using a stack-args calling convention even for x86-64, unlike the normal x86-64 System V ABI or the x86-64 Windows calling convention. (Or maybe those mov function args into REG_P1 and so on instructions disappear when Go builds this source for a register-arg calling convention?)
(This is why you had to use movzx eax, dl instead of loading from the stack at all.)
BTW, rewriting this code in C instead of NASM would probably make even more sense if you want to use it with C. Small functions are best inlined and optimized away by the compiler.
It would be a good idea to check your translation, or get a starting point, by assembling with the Go assembler and using a disassembler.
objdump -drwC -Mintel or Agner Fog's objconv disassembler would be good, but they don't understand Go's object-file format. If Go has a tool to extract the actual machine code or get it in an ELF object file, do that.
If not, you could use ndisasm -b 64 (which treats input files as flat binaries, disassembling all the bytes as if they were instructions). You can specify an offset/length if you can find out where the function starts. x86 instructions are variable length, and disassembly will likely be "out of sync" at the start of the function. You might want to add a bunch of single-byte NOP instructions (kind of a NOP sled) for the disassembler, so if it decodes some 0x90 bytes as part of an immediate or disp32 for a long instruction that was really not part of the function, it will be in sync. (But the function prologue will still be messed up).
You might add some "signpost" instructions to your Go asm functions to make it easy to find the right place in the mess of crazy asm from disassembling metadata as instructions. e.g. put a pmuludq xmm0, xmm0 in there somewhere, or some other instruction with a unique mnemonic that you can search for which the Go code doesn't include. Or an instruction with an immediate that will stand out, like addq $0x1234567, SP. (An instruction that will crash so you don't forget to take it out again is good here.)
Or you could use gdb's built-in disassembler: add an instruction that will segfault (like a load from a bogus absolute address (movl 0, AX null-pointer deref), or a register holding a non-pointer value e.g. movl (AX), AX). Then you'll have an instruction-pointer value for the instructions in memory, and can disassemble from some point behind that. (Probably the function start will be 16-byte aligned.)
Specific instructions.
MOVBLZX AL, AX reads AL, so that's definitely an 8-bit operand. The size for AX is given by the L part of the mnemonic, meaning long for 32 bit, like in GAS AT&T syntax. (The gas mnemonic for that form of movzx is movzbl %al, %eax). See What does cltq do in assembly? for a table of cdq / cdqe and the AT&T equivalent, and the AT&T / Intel mnemonic for the equivalent MOVSX instruction.
The NASM instruction you want is movzx eax, al. Using rax as the destination would be a waste of a REX prefix. Using ax as the destination would be a mistake: it wouldn't zero-extend into the full register, and would leave whatever garbage was in the upper bits. Go asm syntax for x86 is very confusing when you're not used to it, because AX can mean AX, EAX, or RAX depending on the operand size.
Obviously mov rax, al isn't a possibility: Like most instructions, mov requires both its operands to be the same size. movzx is one of the rare exceptions.
MOVB choice+16(FP), AL is a byte load into AL, not an immediate move. choice+16 is an offset from FP. This syntax is basically the same as AT&T addressing modes, with FP as a register and choice as an assemble-time constant.
FP is a pseudo-register name. It's pretty clear that it should simply be loading the low byte of the 3rd arg-passing slot, because choice is the name of a function arg. (In Go asm, choice is just syntactic sugar, or a constant defined as zero.)
Before a call instruction, rsp points at the first stack arg, so that + 16 is the 3rd arg. It appears that FP is that base address (and might actually be rsp+8 or something). After a call (which pushes an 8 byte return address), the 3rd stack arg is at rsp + 24. After more pushes, the offset will be even larger, so adjust as necessary to reach the right location.
If you're porting this function to be called with a standard calling convention, the 3 integer args will be passed in registers, with no stack args. Which 3 registers depends on whether you're building for Windows vs. non-Windows. (See Agner Fog's calling conventions doc: http://agner.org/optimize/)
BTW, a byte load into AL and then movzx eax, al is just dumb. Much more efficient on all modern CPUs to do it in one step with
movzx eax, byte [rsp + 24] ; or rbp+32 if you made a stack frame.
I hope the source in the question is from un-optimized Go compiler output? Or the assembler itself makes such optimizations?
I think you can translate these as just
mov rbx, [reg_p1]
mov rcx, [reg_p2]
Unless I'm missing some subtlety, the offsets which are zero can just be ignored. The *8 isn't a size hint since that's already in the instruction.
The rest of your code looks wrong though. The MOVB choice+16(FP), AL in the original is supposed to be fetching the choice argument into AL, but you're setting AL to a constant 16, and the code for loading the other arguments seems to be completely missing, as is the code for all of the arguments in the other function.

Stack problems when calling printf from an ARM assembly function

I have an ARM assembly function that is called from a C function.
At some point, I do something like this:
.syntax unified
.arm
.text
.globl myfunc
.extern printf
myfunc:
stmdb sp!, {r4-r11} // save stack from C call
... do stuff ...
// (NOT SHOWN): Load values into r1 and r2 to be printed by format string above
ldr r0, =message // Load format string above
push {lr} // me attempting to preserve my stack
bl printf // actual call to printf
pop {lr} // me attempting to recover my stack
ldmia sp!, {r4-r11} // recover stack from C call
mov r0, r2 // Move return value into r0
mov pc, lr // Return to C
.section data
message:
.asciz "Output: %d, %d\n"
.end
This runs sometimes, crashes sometimes, runs a few times then crashes, etc. It actually runs on a quasi bare-metal context, so I can't run a debugger. I'm 99% sure it's a stack -- or alignment? -- thing, as per this Printf Change values in registers, ARM Assembly and this Call C function from Assembly -- the application freezes at "call printf" and I have no idea why.
Can anyone provide some specific ideas for how to get the above chunk of code running, and perhaps general ideas for best practices here? Ideally I'd like to be able to call the same output function multiple times in my assembly file, to debug things as I go.
Thanks in advance!
I can see the following issues in that code:
There is no .align 2 (could be 3 or any higher value) before the function entry point (myfunc:):
.align 2 // guarantee that instruction address is 4B aligned
myfunc:
As was mentioned in the comments, the stack is expected to be 8B aligned. push {lr} breaks that: assuming sp was 8B aligned on entry, the eight registers saved by stmdb keep it aligned, and the single extra word does not.
message: doesn't need to be in the 'data' section. It could be placed in the code section after 'myfunc'. Check the linker map to confirm that the data is actually present and that the address loaded into r0 is correct.
Since this is bare-metal, check that the stack is set up properly and that enough room is reserved for it.

How to disambiguate instructions from data in the .text segment of a PE file?

I have a PE file and I am trying to disassemble it in order to get its instructions. However, I noticed that the .text segment contains not only instructions but also some data (I used IDA to notice that). Here's an example:
.text:004037E4 jmp ds:__CxxFrameHandler3
.text:004037EA ; [00000006 BYTES: COLLAPSED FUNCTION _CxxThrowException. PRESS KEYPAD "+" TO EXPAND]
.text:004037F0 ;
.text:004037F0 mov ecx, [ebp-10h]
.text:004037F3 jmp ds:??1exception@std@@UAE@XZ ; std::exception::~exception(void)
.text:004037F3 ;
.text:004037F9 byte_4037F9 db 8Bh, 54h, 24h ; DATA XREF: sub_401440+2o
.text:004037FC dd 0F4428D08h, 33F04A8Bh, 0F6B2E8C8h, 0C4B8FFFFh, 0E9004047h
.text:004037FC dd 0FFFFFFD0h, 3 dup(0CCCCCCCCh), 0E904458Bh, 0FFFFD9B8h
.text:00403828 dword_403828 dd 824548Bh, 8BFC428Dh, 0C833F84Ah, 0FFF683E8h, 47F0B8FFh
.text:00403828 ; DATA XREF: sub_4010D0+2o
.text:00403828 ; .text:00401162o
.text:00403828 dd 0A1E90040h, 0CCFFFFFFh, 3 dup(0CCCCCCCCh), 50E0458Dh
.text:00403828 dd 0FFD907E8h, 458DC3FFh, 0D97EE9E0h
.text:00403860 db 2 dup(0FFh)
.text:00403862 word_403862 dw 548Bh
How can I distinguish such data from instructions? My solution to this problem was simply to find the first instruction (the entry address) and visit each instruction and all called functions. Unfortunately, it turned out that there are some blocks of code which are not directly called; their addresses sit in the .rdata segment among other data, and I have no idea how to distinguish valid instruction addresses from data.
To sum up: is there any way to decide whether some address in .text segment contains data or instructions? Or maybe is there any way to decide which potential addresses in .rdata should be interpreted as instructions addresses and which as data?
You cannot, in general. The .text section of a PE file can mix up code and constants any way the author likes. Programs like IDA try to make sense of this by starting with the entrypoints and then disassembling, and seeing which addresses are targets of jumps, and which of reads. But devious programs can 'pun' between instructions and data.
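The "start at the entry points and follow the control flow" approach described above can be sketched as a simple worklist. This is a toy illustration only: the insn type, the decode callback and the five-instruction "program" are all hypothetical stand-ins for a real x86 decoder, and indirect jumps or pointers stored in .rdata are exactly what such a walk cannot see.

```go
package main

import (
	"fmt"
	"sort"
)

// insn is a hypothetical decoded instruction; a real tool would get this
// from an x86 decoder library.
type insn struct {
	size        uint32   // length in bytes
	targets     []uint32 // statically known jump/call targets
	fallThrough bool     // false after an unconditional jmp or a ret
}

// markCode walks from the entry points and returns, in ascending order,
// every address that was reached as the start of an instruction.
// Anything never reached stays unclassified (it may be data or dead code).
func markCode(decode func(addr uint32) (insn, bool), entries []uint32) []uint32 {
	seen := map[uint32]bool{}
	work := append([]uint32(nil), entries...)
	for len(work) > 0 {
		addr := work[len(work)-1]
		work = work[:len(work)-1]
		if seen[addr] {
			continue
		}
		in, ok := decode(addr)
		if !ok {
			continue // does not decode: leave it unclassified
		}
		seen[addr] = true
		work = append(work, in.targets...)
		if in.fallThrough {
			work = append(work, addr+in.size)
		}
	}
	out := make([]uint32, 0, len(seen))
	for a := range seen {
		out = append(out, a)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	// Toy program: nothing references 0x08..0x1f, so it is never marked.
	prog := map[uint32]insn{
		0x00: {size: 2, fallThrough: true},
		0x02: {size: 5, targets: []uint32{0x20}, fallThrough: true}, // call 0x20
		0x07: {size: 1}, // ret
		0x20: {size: 3, fallThrough: true},
		0x23: {size: 1}, // ret
	}
	decode := func(a uint32) (insn, bool) { in, ok := prog[a]; return in, ok }
	fmt.Printf("%#x\n", markCode(decode, []uint32{0x00}))
}
```

Candidate addresses found in .rdata could be fed in as extra entries, but whether a given value really is a code pointer remains a heuristic guess.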

ARM/Thumb code for firmware patches...How to tell gcc assembler / linker to BL to absolute addr?

I'm trying to write a firmware mod (to existing firmware, for which i don't have source code)
All Thumb code.
does anybody have any idea how to do this, in gcc as (GAS) assembler:
Use BL without having to manually calculate offsets, when BL'ing to some existing function (not in my code.. but i know its address)
Currently, if i want to use BL ...i have to :
-go back in my code
-figure out and add all the bytes that would result from assembling all the previous instructions in the function i'm writing
-add the beginning address of my function to that (i specify the starting address of what i'm writing, in the linker script)
-and then subtract the address of the firmfunc function i want to call
All this... just to calculate the offset... to be able to write a bl offset... to call an existing firmware function?
And if i change any code before that BL, i have to do it all over again manually !
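To make the manual calculation above concrete, here is a small Go sketch that computes the offset and the two halfwords of a Thumb BL. It assumes the classic two-halfword BL encoding with a range of about ±4 MB, where the offset is measured from the address of the BL plus 4; thumbBL, blAddr and target are hypothetical names, and cores with Thumb-2 allow a wider range that this sketch does not cover.

```go
package main

import "fmt"

// thumbBL returns the two halfwords of a Thumb BL instruction located at
// blAddr that calls the Thumb function at target.
func thumbBL(blAddr, target uint32) (hi, lo uint16, err error) {
	// In Thumb state the PC reads as the instruction address plus 4.
	// The Thumb bit (bit 0) of the target is not part of the offset.
	offset := int32((target &^ 1) - (blAddr + 4))
	if offset < -(1<<22) || offset >= 1<<22 {
		return 0, 0, fmt.Errorf("offset %d is out of range for this BL encoding", offset)
	}
	hi = 0xF000 | uint16((offset>>12)&0x7FF) // upper 11 bits of offset>>1
	lo = 0xF800 | uint16((offset>>1)&0x7FF)  // lower 11 bits of offset>>1
	return hi, lo, nil
}

func main() {
	// Hypothetical addresses: a BL at 0x1000 calling a function at 0x2000.
	hi, lo, err := thumbBL(0x1000, 0x2000)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%04x %04x\n", hi, lo) // prints "f000 fffe"
}
```

A script like this could regenerate the offset whenever the code before the BL changes, which is the step that is tedious to redo by hand.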
See.. this is why i want to learn to use BX right... instead of BL
Also, i don't quite understand the BX. If i use BX to jump to an absolute address, do i have to increase the actual address by 1, when calling Thumb code from Thumb code (to keep the lsb bit 1)... and the CPU will know it's thumb code ?
BIG EDIT:
Changing the answer based on what I have learned recently and a better understanding of the question
First off I dont know how to tell the linker to generate a bl to an address that is a hardcoded address and not actually in this code. You might try to rig up an elf file that has labels and such but dummy or no code, dont know if that will fool the linker or not. You would have to modify the linker script as well. not worth it.
your other question that was spawned from this one:
Arm/Thumb: using BX in Thumb code, to call a Thumb function, or to jump to a Thumb instruction in another function
For branching this works just fine:
LDR R6, =0x24000
ADD R6, #1 @ (set lsb to 1)
BX R6
or save an instruction and just do this
LDR R6, =0x24001
BX R6
if you want to branch link and you know the address and you are in thumb mode and want to get to thumb code then
ldr r6,=0x24001
bl thumb_trampoline
;@returns here
...
.thumb_func
thumb_trampoline:
bx r6
And almost the exact same if you are starting in arm mode, and want to get to thumb code at an address you already know.
ldr r6,=0x24001
bl arm_trampoline
;@returns here
...
arm_trampoline:
bx r6
You have to know that you can trash r6 in this way (make sure r6 isnt saving some value being used by some code that called this code).
Very sorry for misleading you with the other answer, I could swear that mov lr,pc pulled in the lsbit as a mode, but it doesnt.
The accepted answer achieves the desired goal, but to address the question exactly as asked you can use the .equ directive to associate a constant value with a symbol, which can then be used as an operand to instructions. This has the assembler synthesise the trampoline if/when necessary:
.equ myFirmwareFunction, 0x12346570
.globl _start
mov r0, #42
b myFirmwareFunction
Which generates the following assembly[1]
01000000 <_start>:
1000000: e3a0002a mov r0, #42 ; 0x2a
1000004: eaffffff b 1000008 <__*ABS*0x12346570_veneer>
01000008 <__*ABS*0x12346570_veneer>:
__*ABS*0x12346570_veneer():
1000008: e51ff004 ldr pc, [pc, #-4] ; 100000c <__*ABS*0x12346570_veneer+0x4>
100000c: 12346570 data: #0x12346570
If the immediate value is close enough to PC that the offset will fit in the immediate field, then the veneer (trampoline) is skipped and you will get a single branch instruction to the specified constant address.
[1] using the CodeSourcery (2009q1) toolchain with:
arm-none-eabi-gcc -march=armv7-a -x assembler test.spp -o test.elf -Ttext=0x1000000 -nostdlib
