Use assembly to call another custom function written in assembly - go

I want to implement the two functions below in Go assembly:
func AA(a, b int) {
var ret = sum(a, b)
println(ret)
}
func sum(a, b int) int {
return a+b
}
and use them in main.go:
func main() {
pkg.AA(1,2)
}
Below is my code:
pkg/pkg.go
func Sum(a,b int) int
func AA(a, b int)
pkg/pkg_amd64.s
TEXT ·AA(SB),NOSPLIT,$8-16
MOVQ arg1+0(FP), AX
MOVQ arg2+8(FP), BX
MOVQ AX, (SP)
MOVQ BX, +8(SP)
CALL ·Sum(SB)
RET
TEXT ·Sum(SB),NOSPLIT,$0
MOVQ arg1+0(FP), AX // AX = arg1
MOVQ arg2+8(FP), BX // BX = arg2
ADDQ AX, BX // BX += AX
MOVQ BX, ret1+16(FP)
RET
But when I run go run main.go, I get errors that are hard to debug.
I think the problem is in the way I call Sum. If you have any idea, please give me some advice. Thanks a lot!
$ go run main.go
runtime: unexpected return pc for runtime.sigpanic called from 0x1
stack: frame={sp:0xc000056720, fp:0xc000056758} stack=[0xc000056000,0xc000056800)
000000c000056620: 0000000000000000 000000c0000566f0
000000c000056630: 000000000102ca65 <runtime.gopanic+293> 000000c000000180
000000c000056640: 000000000102b0fb <runtime.panicmem+91> 000000c000056700
000000c000056650: 0000000000000000 0000000000000055
000000c000056660: 000000000113a9f0 00000000010cb720
000000c000056670: 000000c0000566a8 000000000100aaea <runtime.(*mcache).nextFree+170>
000000c000056680: 000000000113a9f0 000000c000000180
000000c000056690: 0000000001004b85 <runtime.chanrecv+997> 000000c00006e000
000000c0000566a0: 000000000113a9f0 000000c000056730
000000c0000566b0: 000000c0000001a0 0000000000000000
000000c0000566c0: 0000000001067c40 00000000010c93b0
000000c0000566d0: 0000000000000000 0000000000000000
000000c0000566e0: 0000000000000000 0000000000000000
000000c0000566f0: 000000c000056710 000000000102b0fb <runtime.panicmem+91>
000000c000056700: 0000000001067c40 00000000010c93b0
000000c000056710: 000000c000056748 0000000001041dd9 <runtime.sigpanic+377>
000000c000056720: <00000000010cb720 0000000000000000
000000c000056730: 000000c000056778 000000c000056778
000000c000056740: 000000c000000180 000000c000056758
000000c000056750: !0000000000000001 >0000000000000002
000000c000056760: 000000000105e2f3 <main.main+51> 000000000105e2f5 <main.main+53>
000000c000056770: 0000000000000002 000000c0000567d0
000000c000056780: 000000000102fad6 <runtime.main+598> 000000c00007a000
000000c000056790: 0000000000000000 000000c00007a000
000000c0000567a0: 0000000000000000 0100000000000000
000000c0000567b0: 0000000000000000 000000c000000180
000000c0000567c0: 000000c0000567ae 0000000001078b80
000000c0000567d0: 0000000000000000 000000000105b061 <runtime.goexit+1>
000000c0000567e0: 0000000000000000 0000000000000000
000000c0000567f0: 0000000000000000 0000000000000000
fatal error: unknown caller pc
runtime stack:
runtime.throw(0x1074e46, 0x11)
/usr/local/go/src/runtime/panic.go:1117 +0x72
runtime.gentraceback(0x102b0fb, 0xc000056700, 0x0, 0xc000000180, 0x0, 0x0, 0x7fffffff, 0x7ffeefbff3c0, 0x0, 0x0, ...)
/usr/local/go/src/runtime/traceback.go:261 +0x1a56
runtime.addOneOpenDeferFrame.func1()
/usr/local/go/src/runtime/panic.go:717 +0x91
runtime.systemstack(0x0)
/usr/local/go/src/runtime/asm_amd64.s:379 +0x66
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1246
goroutine 1 [running]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:339 fp=0xc0000565f8 sp=0xc0000565f0 pc=0x10593e0
runtime.addOneOpenDeferFrame(0xc000000180, 0x102b0fb, 0xc000056700)
/usr/local/go/src/runtime/panic.go:716 +0x7b fp=0xc000056638 sp=0xc0000565f8 pc=0x102bedb
panic(0x1067c40, 0x10c93b0)
/usr/local/go/src/runtime/panic.go:925 +0x125 fp=0xc000056700 sp=0xc000056638 pc=0x102ca65
runtime.panicmem()
/usr/local/go/src/runtime/panic.go:212 +0x5b fp=0xc000056720 sp=0xc000056700 pc=0x102b0fb
runtime: unexpected return pc for runtime.sigpanic called from 0x1
stack: frame={sp:0xc000056720, fp:0xc000056758} stack=[0xc000056000,0xc000056800)
000000c000056620: 0000000000000000 000000c0000566f0
000000c000056630: 000000000102ca65 <runtime.gopanic+293> 000000c000000180
000000c000056640: 000000000102b0fb <runtime.panicmem+91> 000000c000056700
000000c000056650: 0000000000000000 0000000000000055
000000c000056660: 000000000113a9f0 00000000010cb720
000000c000056670: 000000c0000566a8 000000000100aaea <runtime.(*mcache).nextFree+170>
000000c000056680: 000000000113a9f0 000000c000000180
000000c000056690: 0000000001004b85 <runtime.chanrecv+997> 000000c00006e000
000000c0000566a0: 000000000113a9f0 000000c000056730
000000c0000566b0: 000000c0000001a0 0000000000000000
000000c0000566c0: 0000000001067c40 00000000010c93b0
000000c0000566d0: 0000000000000000 0000000000000000
000000c0000566e0: 0000000000000000 0000000000000000
000000c0000566f0: 000000c000056710 000000000102b0fb <runtime.panicmem+91>
000000c000056700: 0000000001067c40 00000000010c93b0
000000c000056710: 000000c000056748 0000000001041dd9 <runtime.sigpanic+377>
000000c000056720: <00000000010cb720 0000000000000000
000000c000056730: 000000c000056778 000000c000056778
000000c000056740: 000000c000000180 000000c000056758
000000c000056750: !0000000000000001 >0000000000000002
000000c000056760: 000000000105e2f3 <main.main+51> 000000000105e2f5 <main.main+53>
000000c000056770: 0000000000000002 000000c0000567d0
000000c000056780: 000000000102fad6 <runtime.main+598> 000000c00007a000
000000c000056790: 0000000000000000 000000c00007a000
000000c0000567a0: 0000000000000000 0100000000000000
000000c0000567b0: 0000000000000000 000000c000000180
000000c0000567c0: 000000c0000567ae 0000000001078b80
000000c0000567d0: 0000000000000000 000000000105b061 <runtime.goexit+1>
000000c0000567e0: 0000000000000000 0000000000000000
000000c0000567f0: 0000000000000000 0000000000000000
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:734 +0x179 fp=0xc000056758 sp=0xc000056720 pc=0x1041dd9
exit status 2

I tried many approaches, and finally used go tool compile -N -l -S main.go on the code below to get some useful information:
func printsum(a, b int) int{
return sum1(a, b)
}
func sum1(a, b int) int {
return a+b
}
and I found a version that works:
TEXT ·AA(SB),NOSPLIT,$40-24
MOVQ a+0(FP), AX
MOVQ b+8(FP), BX
MOVQ AX, 0(SP) // put AX (arg1) in 0(SP); Sum reads it as its first parameter
MOVQ BX, 0x8(SP) // put BX (arg2) in 8(SP); Sum reads it as its second parameter
CALL ·Sum(SB)
MOVQ 16(SP), AX
MOVQ AX,ret1+16(FP)
RET
TEXT ·Sum(SB),NOSPLIT,$0-24
MOVQ arg1+0(FP), AX // AX = arg1
MOVQ arg2+8(FP), BX // BX = arg2
ADDQ AX, BX // BX += AX
MOVQ BX, ret1+16(FP)
// MOVQ AX, (SP)
// CALL ·print(SB)
RET
But I still have a question: why does MOVQ 16(SP), AX get the return value from Sum? I think the stack of the AA function looks like this:
------
ret0 (8 bytes)
------
arg1 (8 bytes)
------
arg0 (8 bytes)
------ FP
ret addr (8 bytes)
------
caller BP (8 bytes)
------ pseudo SP
frame content (8 bytes)
------ hardware SP
If that stack layout is correct, it should work with MOVQ 40(SP), AX instead. I'm very confused about it😭
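One way to reason about the offsets is as plain arithmetic. The sketch below is not the assembler's actual implementation. It assumes Go's stack-based calling convention for assembly functions on amd64: the caller stores outgoing arguments at the bottom of its own frame (0(SP) and 8(SP) relative to the hardware SP), the callee writes its result directly above them, and CALL pushes an 8-byte return address. An unnamed offset such as 16(SP) refers to the hardware SP, while a named one such as ret1+16(FP) refers to the FP pseudo-register. The helper names are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// wordSize is the size of an int and of a return address on amd64.
const wordSize = 8

// calleeFPToCallerSP maps an offset from the callee's FP pseudo-register
// (for example ret1+16(FP) inside Sum) to an offset from the caller's
// hardware SP just before CALL. Both name the same memory, so under the
// stack-based convention assumed here the mapping is the identity.
func calleeFPToCallerSP(fpOffset int) int {
	return fpOffset
}

// calleeFPToCalleeSP maps the same FP offset to an offset from the callee's
// hardware SP on entry, assuming a zero-size frame ($0): CALL has pushed one
// return address between SP and the first argument.
func calleeFPToCalleeSP(fpOffset int) int {
	return fpOffset + wordSize
}

func main() {
	slots := []struct {
		name string
		fp   int
	}{{"arg1", 0}, {"arg2", 8}, {"ret1", 16}}

	for _, s := range slots {
		fmt.Printf("%s: %d(FP) in Sum = %d(SP) in AA before CALL = %d(SP) in Sum on entry\n",
			s.name, s.fp, calleeFPToCallerSP(s.fp), calleeFPToCalleeSP(s.fp))
	}
}
```

On this model, ret1+16(FP) inside Sum and 16(SP) inside AA are the same slot, which is why MOVQ 16(SP), AX reads the sum. The diagram in the question describes AA's own incoming arguments, which live above AA's frame, not the outgoing-argument area that AA sets up for Sum.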

Related

Why is this linked-list-related program in x86 segfaulting?

I want to allocate some nodes for a linked list. I have an alloc_pair function which seems to work. I included comments to explain the intent of each line in regards to linked lists. My code is giving me a segmentation fault somewhere, but I can't figure out where. GDB is unhelpful as seen here:
Thread 2 hit Breakpoint 1, 0x0000000100003f63 in main ()
(gdb) c
Continuing.
Thread 2 hit Breakpoint 2, 0x0000000100003f4e in alloc_pair ()
(gdb) ni
0x0000000100003f55 in alloc_pair ()
(gdb) ni
0x0000000100003f59 in alloc_pair ()
(gdb) disassemble
Dump of assembler code for function alloc_pair:
0x0000000100003f4e <+0>: mov rdi,0x10
0x0000000100003f55 <+7>: sub rsp,0x8
=> 0x0000000100003f59 <+11>: call 0x100003f96
0x0000000100003f5e <+16>: add rsp,0x8
0x0000000100003f62 <+20>: ret
End of assembler dump.
(gdb) c
Continuing.
Thread 2 received signal SIGSEGV, Segmentation fault.
0x00007fff731d970a in ?? ()
(gdb) bt
#0 0x00007fff731d970a in ?? ()
#1 0x00007ffeefbff828 in ?? ()
#2 0x0000000100008008 in ?? ()
#3 0x0000000000000000 in ?? ()
(gdb)
If you know the mistake that I am making, please let me know.
.global _main
.text
alloc_pair:
push rbp
mov rbp, rsp
mov rdi, 16
sub rsp, 8
call _malloc
add rsp, 8
mov rsp, rbp
pop rbp
ret
_main:
call alloc_pair
mov r13, rax # r13 stores the initial pair allocated
mov qword ptr [rax], 29 # the node 1 head contains 29
mov r12, [rax + 8] # r12 stores the memory location of the node 1 tail
call alloc_pair
mov qword ptr [rax], 7 # the node 2 head contains 7
mov qword ptr [r12], rax # the node 1 tail points to the node 2 head
mov rdi, 0
mov rax, 0x2000001
syscall
This line:
mov r12, [rax + 8] # r12 stores the memory location of the node 1 tail
doesn't do what your comment says it does. This instruction moves the 64-bit contents of memory at [rax+8] to R12. It doesn't move the address of [rax+8] to R12. What you want is to Load Effective Address (LEA) to get the address of [rax+8] into R12. The instruction would look like:
lea r12, [rax + 8] # r12 stores the memory location of the node 1 tail
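The same distinction between loading a field's contents and taking a field's address can be sketched in Go. This is only an analogy to the assembly above: the pair type and the linkTwo helper are hypothetical names, and Go zero-initializes memory, whereas a block returned by malloc may hold arbitrary bytes.

```go
package main

import "fmt"

// pair mirrors the 16-byte node from the question: an 8-byte head followed
// by an 8-byte tail pointer.
type pair struct {
	head int64
	tail *pair
}

// linkTwo builds a two-node list the way the corrected assembly does.
func linkTwo() *pair {
	first := &pair{head: 29}

	// Like `mov r12, [rax + 8]`: copies the current contents of the tail
	// field. For a fresh node that is nil here, so storing through it
	// would fault.
	contents := first.tail
	_ = contents

	// Like `lea r12, [rax + 8]`: takes the address of the tail field itself.
	tailSlot := &first.tail

	second := &pair{head: 7}
	*tailSlot = second // like `mov qword ptr [r12], rax`
	return first
}

func main() {
	list := linkTwo()
	fmt.Println(list.head, list.tail.head)
}
```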

go runtime.newobject closure variable malloc not in heap

code:
package main
import "fmt"
func test(x, y int) func() {
return func() {
x += y
}
}
func main() {
f := test(0x100, 0x200)
f()
fmt.Println("close")
}
compile:
go build -gcflags "-N -l" -o test main.go
gdb:
info "(gdb)Auto-loading safe path"
(gdb) l
2
3 import "fmt"
4
5 func test(x, y int) func() {
6 return func() {
7 x += y
8 }
9 }
10
11 func main() {
(gdb) l
12 f := test(0x100, 0x200)
13 f()
14 fmt.Println("close")
15 }
(gdb) b 13
Breakpoint 1 at 0x483291: file /home/devops/study/src/function/closed_function.go, line 13.
(gdb) b 6
Breakpoint 2 at 0x4831cc: /home/devops/study/src/function/closed_function.go:6. (2 locations)
(gdb) r
Starting program: /home/devops/study/src/function/test
Breakpoint 2, main.test (x=256, y=512, ~r2={void ()} 0xc420043f20) at /home/devops/study/src/function/closed_function.go:6
6 return func() {
(gdb) set disassembly-flavor intel
(gdb) disassemble
Dump of assembler code for function main.test:
0x0000000000483180 : mov rcx,QWORD PTR fs:0xfffffffffffffff8
0x0000000000483189 : cmp rsp,QWORD PTR [rcx+0x10]
0x000000000048318d : jbe 0x483240
0x0000000000483193 : sub rsp,0x28
0x0000000000483197 : mov QWORD PTR [rsp+0x20],rbp
0x000000000048319c : lea rbp,[rsp+0x20]
0x00000000004831a1 : mov QWORD PTR [rsp+0x40],0x0
0x00000000004831aa : lea rax,[rip+0x103af] # 0x493560
0x00000000004831b1 : mov QWORD PTR [rsp],rax
0x00000000004831b5 : call 0x40e3c0
0x00000000004831ba : mov rax,QWORD PTR [rsp+0x8]
0x00000000004831bf : mov QWORD PTR [rsp+0x18],rax
0x00000000004831c4 : mov rcx,QWORD PTR [rsp+0x30]
0x00000000004831c9 : mov QWORD PTR [rax],rcx
=> 0x00000000004831cc : lea rax,[rip+0x1e44d] # 0x4a1620
0x00000000004831d3 : mov QWORD PTR [rsp],rax
0x00000000004831d7 : call 0x40e3c0 # this will return an unsafe.Pointer
0x00000000004831dc : mov rax,QWORD PTR [rsp+0x8]
0x00000000004831e1 : mov QWORD PTR [rsp+0x10],rax
0x00000000004831e6 : lea rcx,[rip+0x133] # 0x483320
0x00000000004831ed : mov QWORD PTR [rax],rcx
0x00000000004831f0 : mov rax,QWORD PTR [rsp+0x10]
0x00000000004831f5 : test BYTE PTR [rax],al
0x00000000004831f7 : mov ecx,DWORD PTR [rip+0xc2de3] # 0x545fe0
0x00000000004831fd : mov rdx,QWORD PTR [rsp+0x18]
0x0000000000483202 : lea rdi,[rax+0x8]
0x0000000000483206 : test ecx,ecx
0x0000000000483208 : jne 0x483236
0x000000000048320a : jmp 0x48320c
0x000000000048320c : mov QWORD PTR [rax+0x8],rdx
0x0000000000483210 : jmp 0x483212
0x0000000000483212 : mov rax,QWORD PTR [rsp+0x10]
0x0000000000483217 : test BYTE PTR [rax],al
0x0000000000483219 : mov rcx,QWORD PTR [rsp+0x38]
0x000000000048321e : mov QWORD PTR [rax+0x10],rcx
0x0000000000483222 : mov rax,QWORD PTR [rsp+0x10]
0x0000000000483227 : mov QWORD PTR [rsp+0x40],rax
0x000000000048322c : mov rbp,QWORD PTR [rsp+0x20]
0x0000000000483231 : add rsp,0x28
0x0000000000483235 : ret
0x0000000000483236 : mov rax,rdx
0x0000000000483239 : call 0x44f5c0
0x000000000048323e : jmp 0x483212
0x0000000000483240 : call 0x44d170
0x0000000000483245 : jmp 0x483180
End of assembler dump.
(gdb) c
Continuing.
Breakpoint 1, main.main () at /home/devops/study/src/function/closed_function.go:13
13
(gdb) x/16xg $rsp
0xc420043f10: 0x0000000000000100 0x0000000000000200
0xc420043f20: 0x000000c42000a060 0x000000000040423c
0xc420043f30: 0x000000c42005a058 0x0000000000000000
0xc420043f40: 0x000000c42000a060 0x00000000004b7465
0xc420043f50: 0x0000000000000000 0x0000000000000000
0xc420043f60: 0x000000c42000e1d0 0x000000c420043f78
0xc420043f70: 0x000000c42005a058 0x000000c420043f80
0xc420043f80: 0x0000000000427f32 0x000000c42005a000
(gdb) x/3xg 0x000000c42000a060
0xc42000a060: 0x0000000000483320 0x000000c420014098
0xc42000a070: 0x0000000000000200
(gdb) x/1xg 0x000000c420014098
0xc420014098: 0x0000000000000100
By following the memory addresses, I found the final captured variable, but its address does not appear to be on the heap. Does runtime.newobject only allocate a pointer that refers to the variable, rather than the variable itself?
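The memory dump above shows the closure object holding the function address (0x483320), a pointer to x (0xc420014098), and a copy of y (0x200). A small sketch can show the visible consequence of that layout: x is shared by reference and outlives the call to test. The second return value, read, is an addition for illustration and is not part of the original code.

```go
package main

import "fmt"

// test mirrors the function from the question, but also returns a reader
// closure so the captured x can be observed after test has returned.
func test(x, y int) (add func(), read func() int) {
	add = func() { x += y }        // x is written, so it is captured by reference
	read = func() int { return x } // shares the same x as add
	return add, read
}

func main() {
	add, read := test(0x100, 0x200)
	add()
	add()
	fmt.Printf("%#x\n", read()) // 0x100 + 2*0x200
}
```

Because both closures keep seeing the updated x after test has returned, x cannot live in test's stack frame, which is consistent with the two runtime.newobject calls in the disassembly (one for x, one for the closure object).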

golang doing unexpected heap memory allocation

While benchmarking, I noticed a surprising heap memory allocation. After reducing the repro, I ended up with the following:
// --- Repro file ---
func memAllocRepro(values []int) *[]int {
for {
break
}
return &values
}
// --- Benchmark file ---
func BenchmarkMemAlloc(b *testing.B) {
values := []int{1, 2, 3, 4}
for i := 0; i < b.N; i++ {
memAllocRepro(values)
}
}
And here is the benchmark output:
BenchmarkMemAlloc-4 50000000 40.2 ns/op 32 B/op 1 allocs/op
PASS
ok memalloc_debugging 2.113s
Success: Benchmarks passed.
Now the funny thing is, if I remove the for loop, or if I return the slice directly instead of a slice pointer, there are no more heap allocations:
// --- Repro file ---
func noAlloc1(values []int) *[]int {
return &values // No alloc!
}
func noAlloc2(values []int) []int {
for {
break
}
return values // No alloc!
}
// --- Benchmark file ---
func BenchmarkNoAlloc(b *testing.B) {
values := []int{1, 2, 3, 4}
for i := 0; i < b.N; i++ {
noAlloc1(values)
noAlloc2(values)
}
}
Benchmark result:
BenchmarkNoAlloc-4 300000000 4.20 ns/op 0 B/op 0 allocs/op
PASS
ok memalloc_debugging 1.756s
Success: Benchmarks passed.
I found that very confusing and confirmed with Delve that the disassembly does have an allocation at the start of the memAllocRepro function:
(dlv) disassemble
TEXT main.memAllocRepro(SB) memalloc_debugging/main.go
main.go:10 0x44ce10 65488b0c2528000000 mov rcx, qword ptr gs:[0x28]
main.go:10 0x44ce19 488b8900000000 mov rcx, qword ptr [rcx]
main.go:10 0x44ce20 483b6110 cmp rsp, qword ptr [rcx+0x10]
main.go:10 0x44ce24 7662 jbe 0x44ce88
main.go:10 0x44ce26 4883ec18 sub rsp, 0x18
main.go:10 0x44ce2a 48896c2410 mov qword ptr [rsp+0x10], rbp
main.go:10 0x44ce2f 488d6c2410 lea rbp, ptr [rsp+0x10]
main.go:10 0x44ce34 488d0525880000 lea rax, ptr [rip+0x8825]
main.go:10 0x44ce3b 48890424 mov qword ptr [rsp], rax
=> main.go:10 0x44ce3f* e8bcebfbff call 0x40ba00 runtime.newobject
I must say though, once I hit that point, I couldn't easily dig further. I'm pretty sure it would be possible to know at least which type is allocated by looking at the structure pointed to by the RAX register, but I wasn't very successful doing so. It's been a long time since I've read disassembly like this.
(dlv) regs
Rip = 0x000000000044ce3f
Rsp = 0x000000c042039f30
Rax = 0x0000000000455660
(...)
All that being said, I have 2 questions:
* Can anyone tell why there is a heap allocation there and whether it's "expected"?
* How could I have gone further in my debugging session? Dumping memory to hex has a different address layout, and go tool objdump outputs disassembly, which mangles the content at the address location.
Full function dump with go tool objdump:
TEXT main.memAllocRepro(SB) memalloc_debugging/main.go
main.go:10 0x44ce10 65488b0c2528000000 MOVQ GS:0x28, CX
main.go:10 0x44ce19 488b8900000000 MOVQ 0(CX), CX
main.go:10 0x44ce20 483b6110 CMPQ 0x10(CX), SP
main.go:10 0x44ce24 7662 JBE 0x44ce88
main.go:10 0x44ce26 4883ec18 SUBQ $0x18, SP
main.go:10 0x44ce2a 48896c2410 MOVQ BP, 0x10(SP)
main.go:10 0x44ce2f 488d6c2410 LEAQ 0x10(SP), BP
main.go:10 0x44ce34 488d0525880000 LEAQ runtime.types+34656(SB), AX
main.go:10 0x44ce3b 48890424 MOVQ AX, 0(SP)
main.go:10 0x44ce3f e8bcebfbff CALL runtime.newobject(SB)
main.go:10 0x44ce44 488b7c2408 MOVQ 0x8(SP), DI
main.go:10 0x44ce49 488b442428 MOVQ 0x28(SP), AX
main.go:10 0x44ce4e 48894708 MOVQ AX, 0x8(DI)
main.go:10 0x44ce52 488b442430 MOVQ 0x30(SP), AX
main.go:10 0x44ce57 48894710 MOVQ AX, 0x10(DI)
main.go:10 0x44ce5b 8b052ff60600 MOVL runtime.writeBarrier(SB), AX
main.go:10 0x44ce61 85c0 TESTL AX, AX
main.go:10 0x44ce63 7517 JNE 0x44ce7c
main.go:10 0x44ce65 488b442420 MOVQ 0x20(SP), AX
main.go:10 0x44ce6a 488907 MOVQ AX, 0(DI)
main.go:16 0x44ce6d 48897c2438 MOVQ DI, 0x38(SP)
main.go:16 0x44ce72 488b6c2410 MOVQ 0x10(SP), BP
main.go:16 0x44ce77 4883c418 ADDQ $0x18, SP
main.go:16 0x44ce7b c3 RET
main.go:16 0x44ce7c 488b442420 MOVQ 0x20(SP), AX
main.go:10 0x44ce81 e86aaaffff CALL runtime.gcWriteBarrier(SB)
main.go:10 0x44ce86 ebe5 JMP 0x44ce6d
main.go:10 0x44ce88 e85385ffff CALL runtime.morestack_noctxt(SB)
main.go:10 0x44ce8d eb81 JMP main.memAllocRepro(SB)
:-1 0x44ce8f cc INT $0x3
Disassembly of the memory pointed to by the RAX register:
(dlv) disassemble -a 0x0000000000455660 0x0000000000455860
.:0 0x455660 1800 sbb byte ptr [rax], al
.:0 0x455662 0000 add byte ptr [rax], al
.:0 0x455664 0000 add byte ptr [rax], al
.:0 0x455666 0000 add byte ptr [rax], al
.:0 0x455668 0800 or byte ptr [rax], al
.:0 0x45566a 0000 add byte ptr [rax], al
.:0 0x45566c 0000 add byte ptr [rax], al
.:0 0x45566e 0000 add byte ptr [rax], al
.:0 0x455670 8e66f9 mov fs, word ptr [rsi-0x7]
.:0 0x455673 1b02 sbb eax, dword ptr [rdx]
.:0 0x455675 0808 or byte ptr [rax], cl
.:0 0x455677 17 ?
.:0 0x455678 60 ?
.:0 0x455679 0d4a000000 or eax, 0x4a
.:0 0x45567e 0000 add byte ptr [rax], al
.:0 0x455680 c01f47 rcr byte ptr [rdi], 0x47
.:0 0x455683 0000 add byte ptr [rax], al
.:0 0x455685 0000 add byte ptr [rax], al
.:0 0x455687 0000 add byte ptr [rax], al
.:0 0x455689 0c00 or al, 0x0
.:0 0x45568b 004062 add byte ptr [rax+0x62], al
.:0 0x45568e 0000 add byte ptr [rax], al
.:0 0x455690 c0684500 shr byte ptr [rax+0x45], 0x0
Escape analysis determines whether any references to a value escape the function in which the value is declared.
In Go, arguments are passed by value, typically on the stack; the stack is reclaimed at the end of the function. However, returning the reference &values from the memAllocRepro function gives the values parameter declared in memAllocRepro a lifetime beyond the end of the function. The values variable is moved to the heap.
memAllocRepro: &values: Alloc
./escape.go:3:6: cannot inline memAllocRepro: unhandled op FOR
./escape.go:7:9: &values escapes to heap
./escape.go:7:9: from ~r1 (return) at ./escape.go:7:2
./escape.go:3:37: moved to heap: values
The noAlloc1 function is inlined in the main function. The values argument, if necessary, is declared in and does not escape from the main function.
noAlloc1: &values: No Alloc
./escape.go:10:6: can inline noAlloc1 as: func([]int)*[]int{return &values}
./escape.go:23:10: inlining call to noAlloc1 func([]int)*[]int{return &values}
The noAlloc2 function values argument is returned as values. values is returned on the stack. There is no reference to values in the noAlloc2 function and so no escape.
noAlloc2: values: No Alloc
package main
func memAllocRepro(values []int) *[]int {
for {
break
}
return &values
}
func noAlloc1(values []int) *[]int {
return &values
}
func noAlloc2(values []int) []int {
for {
break
}
return values
}
func main() {
memAllocRepro(nil)
noAlloc1(nil)
noAlloc2(nil)
}
Output:
$ go build -a -gcflags='-m -m' escape.go
# command-line-arguments
./escape.go:3:6: cannot inline memAllocRepro: unhandled op FOR
./escape.go:10:6: can inline noAlloc1 as: func([]int) *[]int { return &values }
./escape.go:14:6: cannot inline noAlloc2: unhandled op FOR
./escape.go:21:6: cannot inline main: non-leaf function
./escape.go:23:10: inlining call to noAlloc1 func([]int) *[]int { return &values }
./escape.go:7:9: &values escapes to heap
./escape.go:7:9: from ~r1 (return) at ./escape.go:7:2
./escape.go:3:37: moved to heap: values
./escape.go:11:9: &values escapes to heap
./escape.go:11:9: from ~r1 (return) at ./escape.go:11:2
./escape.go:10:32: moved to heap: values
./escape.go:14:31: leaking param: values to result ~r1 level=0
./escape.go:14:31: from ~r1 (return) at ./escape.go:18:2
./escape.go:23:10: main &values does not escape
$
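The effect described above can also be measured without a benchmark harness. The sketch below uses testing.AllocsPerRun from the standard library. The function names escapes, byValue, and countAllocs are hypothetical, and the //go:noinline directives are an assumption added so that the result does not depend on which functions a particular compiler version decides to inline.

```go
package main

import (
	"fmt"
	"testing"
)

// escapes returns the address of its parameter, so the slice header must
// outlive the call. Without inlining, the compiler cannot prove the pointer
// stays in the caller, and the parameter is moved to the heap.
//
//go:noinline
func escapes(values []int) *[]int {
	return &values
}

// byValue returns the slice header itself; nothing has to move to the heap.
//
//go:noinline
func byValue(values []int) []int {
	return values
}

// countAllocs reports the average number of heap allocations per call of f.
func countAllocs(f func()) float64 {
	return testing.AllocsPerRun(1000, f)
}

func main() {
	values := []int{1, 2, 3, 4}
	fmt.Println("escapes:", countAllocs(func() { escapes(values) }))
	fmt.Println("byValue:", countAllocs(func() { byValue(values) }))
}
```

Removing the //go:noinline directive from escapes is a quick way to see whether a given compiler inlines the call and avoids the allocation, as happens with noAlloc1 in the output above.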

Local variable location from DWARF info in ARM

I have a C program in file delay.c:
void delay(int num)
{
volatile int i;
for(i=0; i<num; i++);
}
Then I compile the program with gcc 4.6.3 on ARM emulator (armel, more specifically) with command gcc -g -O1 -o delay.o delay.c. The assembly in delay.o is:
00000000 <delay>:
0: e24dd008 sub sp, sp, #8
4: e3a03000 mov r3, #0
8: e58d3004 str r3, [sp, #4]
c: e59d3004 ldr r3, [sp, #4]
10: e1500003 cmp r0, r3
14: da000005 ble 30 <delay+0x30>
18: e59d3004 ldr r3, [sp, #4]
1c: e2833001 add r3, r3, #1
20: e58d3004 str r3, [sp, #4]
24: e59d3004 ldr r3, [sp, #4]
28: e1530000 cmp r3, r0
2c: bafffff9 blt 18 <delay+0x18>
30: e28dd008 add sp, sp, #8
34: e12fff1e bx lr
I want to figure out where the variable i is on the stack of function delay from debugging information. Below is the information about delay and i in .debug_info section:
<1><25>: Abbrev Number: 2 (DW_TAG_subprogram)
<26> DW_AT_external : 1
<27> DW_AT_name : (indirect string, offset: 0x19): delay
<2b> DW_AT_decl_file : 1
<2c> DW_AT_decl_line : 1
<2d> DW_AT_prototyped : 1
<2e> DW_AT_low_pc : 0x0
<32> DW_AT_high_pc : 0x38
<36> DW_AT_frame_base : 0x0 (location list)
<3a> DW_AT_sibling : <0x59>
...
<2><4b>: Abbrev Number: 4 (DW_TAG_variable)
<4c> DW_AT_name : i
<4e> DW_AT_decl_file : 1
<4f> DW_AT_decl_line : 3
<50> DW_AT_type : <0x60>
<54> DW_AT_location : 0x20 (location list)
It shows that the location of i is in the location list. So I output the location list:
Offset Begin End Expression
00000000 00000000 00000004 (DW_OP_breg13 (r13): 0)
00000000 00000004 00000038 (DW_OP_breg13 (r13): 8)
00000000 <End of list>
00000020 0000000c 00000020 (DW_OP_fbreg: -12)
00000020 00000024 00000028 (DW_OP_reg3 (r3))
00000020 00000028 00000038 (DW_OP_fbreg: -12)
00000020 <End of list>
From address 4 to 38, the frame base of delay should be r13 + 8. So from address c to 20 and from address 28 to 38, the location of i is r13 + 8 -12 = r13 - 4.
However, from the assembly, we can see that nothing is stored at r13 - 4, and i is apparently at r13 + 4.
Am I missing some calculation step? Can anyone explain the difference between the location of i computed from the debugging information and the one seen in the assembly?
Thanks in advance!
TL;DR The analysis in the question is correct and the discrepancy is a bug in one of the gcc components (GNU Arm Embedded Toolchain is an obvious place to log one).
As it stands, this other answer is incorrect because it erroneously conflates the value of the stack pointer on evaluation of a location expression with the earlier value of the stack pointer on entry to the function.
As far as the DWARF is concerned, the location of i varies with the program counter. Consider, for example, the text address delay+0x18. At this point, the location of i is given by DW_OP_fbreg(-12), i.e. 12 bytes below the frame base. The frame base is given by the parent DW_TAG_subprogram's DW_AT_frame_base attribute which, in this case, is also dependent on the program counter: for delay+0x18 its expression is DW_OP_breg13(8), i.e. r13 + 8. Importantly, this calculation uses the current value of r13, i.e. the value of r13 when the program counter is equal to delay+0x18.
Thus the DWARF asserts that, at delay+0x18, i is located at r13 + 8 - 12, i.e. 4 bytes below the bottom of the existing stack. Inspection of the assembly shows that, at delay+0x18, i should be found 4 bytes above the bottom of the stack. Therefore the DWARF is in error and whatever generated it is defective.
One can demonstrate the bug using gdb with a simple wrapper around the test case provided in the question:
$ cat delay.c
void delay(int num)
{
volatile int i;
for(i=0; i<num; i++);
}
$ gcc-4.6 -g -O1 -c delay.c
$ cat main.c
void delay(int);
int main(int argc, char **argv) {
delay(3);
}
$ gcc-4.6 -o test main.c delay.o
$ gdb ./test
.
.
.
(gdb)
Set a breakpoint at delay+0x18 and run to the second occurrence (where we expect i to be 1):
(gdb) break *delay+0x18
Breakpoint 1 at 0x103cc: file delay.c, line 4.
(gdb) run
Starting program: /home/pi/test
Breakpoint 1, 0x000103cc in delay (num=3) at delay.c:4
4 for(i=0; i<num; i++);
(gdb) cont
Continuing.
Breakpoint 1, 0x000103cc in delay (num=3) at delay.c:4
4 for(i=0; i<num; i++);
(gdb)
We know from the disassembly that i is four bytes above the stack pointer. Indeed, there it is:
(gdb) print *((int *)($r13 + 4))
$1 = 1
(gdb)
However, the bogus DWARF means that gdb looks in the wrong place:
(gdb) print i
$2 = 0
(gdb)
As explained above, the DWARF is incorrectly giving the location of i at four bytes below the stack pointer. There's a zero there, hence the reported value of i:
(gdb) print *((int *)($r13 - 4))
$3 = 0
(gdb)
This isn't a coincidence. A magic number written into this bogus location below the stack pointer reappears when gdb is asked to print i:
(gdb) set *((int *)($r13 - 4)) = 42
(gdb) print i
$6 = 42
(gdb)
Thus, at delay+0x18, the DWARF incorrectly encodes the location of i as r13 - 4 even though its true location is r13 + 4.
One can go a step further by editing the compilation unit by hand and replacing DW_OP_fbreg(-12) (bytes 0x91 0x74) with DW_OP_fbreg(-4) (bytes 0x91 0x7c). This gives
$ readelf --debug-dump=loc delay.modified.o
Contents of the .debug_loc section:
Offset Begin End Expression
00000000 00000000 00000004 (DW_OP_breg13 (r13): 0)
0000000c 00000004 00000038 (DW_OP_breg13 (r13): 8)
00000018 <End of list>
00000020 0000000c 00000020 (DW_OP_fbreg: -4)
0000002c 00000024 00000028 (DW_OP_reg3 (r3))
00000037 00000028 00000038 (DW_OP_fbreg: -4)
00000043 <End of list>
$
In other words, the DWARF has been corrected so that at, e.g., delay+0x18 the location of i is given as frame base - 4 = r13 + 8 - 4 = r13 + 4, matching the assembly. Repeating the gdb experiment with the corrected DWARF shows the expected value of i each time around the loop:
$ gcc-4.6 -o test.modified main.c delay.modified.o
$ gdb ./test.modified
.
.
.
(gdb) break *delay+0x18
Breakpoint 1 at 0x103cc: file delay.c, line 4.
(gdb) run
Starting program: /home/pi/test.modified
Breakpoint 1, 0x000103cc in delay (num=3) at delay.c:4
4 for(i=0; i<num; i++);
(gdb) print i
$1 = 0
(gdb) cont
Continuing.
Breakpoint 1, 0x000103cc in delay (num=3) at delay.c:4
4 for(i=0; i<num; i++);
(gdb) print i
$2 = 1
(gdb) cont
Continuing.
Breakpoint 1, 0x000103cc in delay (num=3) at delay.c:4
4 for(i=0; i<num; i++);
(gdb) print i
$3 = 2
(gdb) cont
Continuing.
[Inferior 1 (process 30954) exited with code 03]
(gdb)
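The byte-level edit described above (0x91 0x74 replaced by 0x91 0x7c) relies on the operand of DW_OP_fbreg being a signed LEB128 value. The following sketch decodes both operands and applies the frame base of r13 + 8 that is in effect at delay+0x18. The function names decodeSLEB128 and offsetFromR13 are hypothetical helpers written for this illustration, not part of any DWARF library.

```go
package main

import "fmt"

// decodeSLEB128 decodes one signed LEB128 value from the start of buf and
// returns the value together with the number of bytes consumed.
func decodeSLEB128(buf []byte) (value int64, n int) {
	var shift uint
	for n < len(buf) {
		b := buf[n]
		n++
		value |= int64(b&0x7f) << shift // low 7 bits carry payload
		shift += 7
		if b&0x80 == 0 { // high bit clear: this was the last byte
			if shift < 64 && b&0x40 != 0 {
				value |= int64(-1) << shift // sign-extend from bit 6
			}
			break
		}
	}
	return value, n
}

// offsetFromR13 applies DW_OP_fbreg: the frame base is r13 + frameBase, and
// the operand is added to it, giving an offset relative to the current r13.
func offsetFromR13(frameBase, fbreg int64) int64 {
	return frameBase + fbreg
}

func main() {
	// Each expression is the DW_OP_fbreg opcode (0x91) followed by its operand.
	for _, expr := range [][]byte{{0x91, 0x74}, {0x91, 0x7c}} {
		operand, _ := decodeSLEB128(expr[1:])
		fmt.Printf("DW_OP_fbreg(%d): i at r13%+d\n", operand, offsetFromR13(8, operand))
	}
}
```

The first expression yields r13-4, the location the original DWARF claims, and the second yields r13+4, the location seen in the assembly.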
I do not agree with the OP's asm analysis:
00000000 <delay>: ; so far, let's suppose sp = sp(0)
0: e24dd008 sub sp, sp, #8 ; sp = sp(0) - 8
4: e3a03000 mov r3, #0 ; r3 = 0
8: e58d3004 str r3, [sp, #4] ; store the value of r3 in (sp + 4)
c: e59d3004 ldr r3, [sp, #4] ; load (sp + 4) in r3
10: e1500003 cmp r0, r3 ; compare r3 and r0
14: da000005 ble 30 <delay+0x30> ; go to end of loop
18: e59d3004 ldr r3, [sp, #4] ; i is in r3, and it is being loaded from
; (sp + 4), that is,
; sp(i) = sp(0) - 8 + 4 = sp(0) - 4
1c: e2833001 add r3, r3, #1 ; r3 = r3 + 1, that is, increment i
20: e58d3004 str r3, [sp, #4] ; store i (which is in r3) in (sp + 4),
; being again sp(i) = sp(0) - 8 + 4 = \
; sp(0) - 4
24: e59d3004 ldr r3, [sp, #4] ; load sp + 4 in r3
28: e1530000 cmp r3, r0 ; compare r3 and r0
2c: bafffff9 blt 18 <delay+0x18> ; go to init of loop
30: e28dd008 add sp, sp, #8 ; sp = sp + 8
34: e12fff1e bx lr ;
So i is located at sp(0) - 4, which matches the DWARF analysis (which says that i is located at 0 + 8 - 12).
Edit in order to add information regarding my DWARF analysis:
According to this line: 00000020 0000000c 00000020 (DW_OP_fbreg: -12), where DW_OP_fbreg is defined as:
The DW_OP_fbreg operation provides a signed LEB128 offset from the address specified by the location description in the DW_AT_frame_base attribute of the current function. (This is typically a “stack pointer” register plus or minus some offset. On more sophisticated systems it might be a location list that adjusts the offset according to changes in the stack pointer as the PC changes.)
the address is frame_base + offset, where:
frame_base: the stack pointer +/- some offset; according to the previous line (00000000 00000004 00000038 (DW_OP_breg13 (r13): 8)), from 00000004 to 00000038 it has an offset of +8 (r13 is SP)
offset: obviously -12
Given that, DWARF indicates that it is pointing to sp(0) + 8 - 12 = sp(0) - 4

How to explicitly assign a section in C code (like .text, .init, .fini) (mainly for arm)?

I am trying to make an embedded system. I have some C code, however, before the main function runs, some pre-initialization is needed. Is there a way to tell the gcc compiler, that a certain function is to be put in the .init section rather than the .text section?
this is the code:
#include <stdint.h>
#define REGISTERS_BASE 0x3F000000
#define MAIL_BASE 0xB880 // Base address for the mailbox registers
// This bit is set in the status register if there is no space to write into the mailbox
#define MAIL_FULL 0x80000000
// This bit is set in the status register if there is nothing to read from the mailbox
#define MAIL_EMPTY 0x40000000
struct Message
{
uint32_t messageSize;
uint32_t requestCode;
uint32_t tagID;
uint32_t bufferSize;
uint32_t requestSize;
uint32_t pinNum;
uint32_t on_off_switch;
uint32_t end;
};
struct Message m =
{
.messageSize = sizeof(struct Message),
.requestCode =0,
.tagID = 0x00038041,
.bufferSize = 8,
.requestSize =0,
.pinNum = 130,
.on_off_switch = 1,
.end = 0,
};
void _start()
{
__asm__
(
"mov sp, #0x8000 \n"
"b main"
);
}
/** Main function - we'll never return from here */
int main(void)
{
uint32_t mailbox = MAIL_BASE + REGISTERS_BASE + 0x18;
volatile uint32_t status;
do
{
status = *(volatile uint32_t *)(mailbox);
}
while((status & 0x80000000));
*(volatile uint32_t *)(MAIL_BASE + REGISTERS_BASE + 0x20) = ((uint32_t)(&m) & 0xfffffff0) | (uint32_t)(8);
while(1);
}
EDIT: using __attribute__(section("init")) doesn't seem to be working
I don't understand why you think you need a .init section for bare metal. Here is a complete working example for a Pi Zero (using .init):
start.s
.section .init
.globl _start
_start:
mov sp,#0x8000
bl centry
b .
so.c
unsigned int data=5;
unsigned int bss;
unsigned int centry ( void )
{
return(0);
}
so.ld
MEMORY
{
ram : ORIGIN = 0x8000, LENGTH = 0x1000
}
SECTIONS
{
.init : { *(.init*) } > ram
.text : { *(.text*) } > ram
.bss : { *(.bss*) } > ram
.data : { *(.data*) } > ram
}
build
arm-none-eabi-as start.s -o start.o
arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-ld -T so.ld start.o so.o -o so.elf
arm-none-eabi-objdump -D so.elf
Disassembly of section .init:
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: eb000000 bl 800c <centry>
8008: eafffffe b 8008 <_start+0x8>
Disassembly of section .text:
0000800c <centry>:
800c: e3a00000 mov r0, #0
8010: e12fff1e bx lr
Disassembly of section .bss:
00008014 <bss>:
8014: 00000000 andeq r0, r0, r0
Disassembly of section .data:
00008018 <data>:
8018: 00000005 andeq r0, r0, r5
Note that if you do it right you don't need to init .bss in the bootstrap (put .data after .bss and make sure there is at least one item in .data).
hexdump -C so.bin
00000000 02 d9 a0 e3 00 00 00 eb fe ff ff ea 00 00 a0 e3 |................|
00000010 1e ff 2f e1 00 00 00 00 05 00 00 00 |../.........|
0000001c
If you want them in separate places then yes, your linker script instantly gets more complicated, as does your bootstrap (with lots of room for error).
The only thing the extra work of .init buys you here, IMO, is that you can rearrange the linker command line:
arm-none-eabi-ld -T so.ld so.o start.o -o so.elf
Or get rid of .init altogether:
Disassembly of section .text:
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: eb000000 bl 800c <centry>
8008: eafffffe b 8008 <_start+0x8>
0000800c <centry>:
800c: e3a00000 mov r0, #0
8010: e12fff1e bx lr
Disassembly of section .bss:
00008014 <bss>:
8014: 00000000 andeq r0, r0, r0
Disassembly of section .data:
00008018 <data>:
8018: 00000005 andeq r0, r0, r5
No issues, works fine. You just have to know that with GNU ld (and probably others), if you don't call out something in the linker script, it fills things in in the order presented (on the command line).
Whether or not a compiler uses sections, and what they are named, is compiler specific, so you have to dig into the compiler-specific options to see if there are any that change the defaults. Using C to bootstrap C is more work than it is worth. gcc will accept assembly files, if it is a Makefile issue you are having problems with. There is very rarely a reason to use inline assembly when you can use real assembly and have something more reliable and maintainable. In real assembly these things are trivial:
.section .helloworld
.globl _start
_start:
mov sp,#0x8000
bl centry
b .
Disassembly of section .helloworld:
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: ebfffffd bl 8000 <_start>
8008: eafffffe b 8008 <bss>
Disassembly of section .text:
00008000 <centry>:
8000: e3a00000 mov r0, #0
8004: e12fff1e bx lr
Disassembly of section .bss:
00008008 <bss>:
8008: 00000000 andeq r0, r0, r0
Disassembly of section .data:
0000800c <data>:
800c: 00000005 andeq r0, r0, r5
Real assembly is generally used for bootstrapping: no compiler games required, no need to go back and maintain the code every so often because of compiler games, porting is easier, etc.
