Where is the implementation of func append in Go?

I'm very interested in Go and have been trying to read the implementations of its functions. I found that some of these functions don't have an implementation in the source tree, such as append or call:
// The append built-in function appends elements to the end of a slice. If
// it has sufficient capacity, the destination is resliced to accommodate the
// new elements. If it does not, a new underlying array will be allocated.
// Append returns the updated slice. It is therefore necessary to store the
// result of append, often in the variable holding the slice itself:
// slice = append(slice, elem1, elem2)
// slice = append(slice, anotherSlice...)
// As a special case, it is legal to append a string to a byte slice, like this:
// slice = append([]byte("hello "), "world"...)
func append(slice []Type, elems ...Type) []Type
// call calls fn with a copy of the n argument bytes pointed at by arg.
// After fn returns, reflectcall copies n-retoffset result bytes
// back into arg+retoffset before returning. If copying result bytes back,
// the caller must pass the argument frame type as argtype, so that
// call can execute appropriate write barriers during the copy.
func call(argtype *rtype, fn, arg unsafe.Pointer, n uint32, retoffset uint32)
It doesn't seem to be calling C code, because using cgo requires special comments.
Where are these functions' implementations?

The code you are reading and citing is just dummy code, there to provide consistent documentation. The built-in functions are, well, built into the language and, as such, are handled during the code-processing step (the compiler).
Simplified, what happens is: the lexer detects append(...) as an APPEND token, the parser translates APPEND, depending on the circumstances/parameters/environment, into code, and that code is written as assembly and assembled. The middle step, the implementation of append, can be found in the compiler here.
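In Go-like pseudocode, the code the compiler emits for something like s = append(s, x) behaves roughly as in the following hand-written sketch (an illustration only, not actual compiler output; growslice here stands in for the internal runtime.growslice helper, which is not callable from user code):
// Sketch of what s = append(s, x) conceptually expands to.
newlen := len(s) + 1
if newlen > cap(s) {
    s = growslice(s, newlen) // runtime call: allocate a larger backing array and copy
}
s = s[:newlen]
s[newlen-1] = x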
What happens to an append call is best seen when looking at the assembly of an example program. Consider this:
b := []byte{'a'}
b = append(b, 'b')
println(string(b), cap(b))
Running it will yield the following output:
ab 2
The append call is translated to assembly like this:
// create new slice object
MOVQ BX, "".b+120(SP) // BX contains data addr., write to b.addr
MOVQ BX, CX // store addr. in CX
MOVQ AX, "".b+128(SP) // AX contains len(b) == 1, write to b.len
MOVQ DI, "".b+136(SP) // DI contains cap(b) == 1, write to b.cap
MOVQ AX, BX // BX now contains len(b)
INCQ BX // BX++
CMPQ BX, DI // compare new length (2) with cap (1)
JHI $1, 225 // jump to grow code if len > cap
...
LEAQ (CX)(AX*1), BX // load address of newly allocated slice entry
MOVB $98, (BX) // write 'b' to loaded address
// grow code, call runtime.growslice(t *slicetype, old slice, cap int)
LEAQ type.[]uint8(SB), BP
MOVQ BP, (SP) // load parameters onto stack
MOVQ CX, 8(SP)
MOVQ AX, 16(SP)
MOVQ SI, 24(SP)
MOVQ BX, 32(SP)
PCDATA $0, $0
CALL runtime.growslice(SB) // call
MOVQ 40(SP), DI
MOVQ 48(SP), R8
MOVQ 56(SP), SI
MOVQ R8, AX
INCQ R8
MOVQ DI, CX
JMP 108 // jump back, growing done
As you can see, there is no CALL statement to a function called append. This is the full implementation of the append call in the example code. Another call with different parameters will look different (other registers, different parameters depending on the slice type, and so on).

The code for Go's append built-in function is generated by the Go gc and gccgo compilers and uses Go runtime package functions (for example, runtime.growslice()) from go/src/runtime/slice.go.
For example,
package main

func main() {
    b := []int{0, 1}
    b = append(b, 2)
}
Go pseudo-assembler:
$ go tool compile -S a.go
"".main t=1 size=192 value=0 args=0x0 locals=0x68
0x0000 00000 (a.go:3) TEXT "".main(SB), $104-0
0x0000 00000 (a.go:3) MOVQ (TLS), CX
0x0009 00009 (a.go:3) CMPQ SP, 16(CX)
0x000d 00013 (a.go:3) JLS 167
0x0013 00019 (a.go:3) SUBQ $104, SP
0x0017 00023 (a.go:3) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0017 00023 (a.go:3) FUNCDATA $1, gclocals·790e5cc5051fc0affc980ade09e929ec(SB)
0x0017 00023 (a.go:4) LEAQ "".autotmp_0002+64(SP), BX
0x001c 00028 (a.go:4) MOVQ BX, CX
0x001f 00031 (a.go:4) NOP
0x001f 00031 (a.go:4) MOVQ "".statictmp_0000(SB), BP
0x0026 00038 (a.go:4) MOVQ BP, (BX)
0x0029 00041 (a.go:4) MOVQ "".statictmp_0000+8(SB), BP
0x0030 00048 (a.go:4) MOVQ BP, 8(BX)
0x0034 00052 (a.go:4) NOP
0x0034 00052 (a.go:4) MOVQ $2, AX
0x003b 00059 (a.go:4) MOVQ $2, DX
0x0042 00066 (a.go:5) MOVQ CX, "".b+80(SP)
0x0047 00071 (a.go:5) MOVQ AX, "".b+88(SP)
0x004c 00076 (a.go:5) MOVQ DX, "".b+96(SP)
0x0051 00081 (a.go:5) MOVQ AX, BX
0x0054 00084 (a.go:5) INCQ BX
0x0057 00087 (a.go:5) CMPQ BX, DX
0x005a 00090 (a.go:5) JHI $1, 108
0x005c 00092 (a.go:5) LEAQ (CX)(AX*8), BX
0x0060 00096 (a.go:5) MOVQ $2, (BX)
0x0067 00103 (a.go:6) ADDQ $104, SP
0x006b 00107 (a.go:6) RET
0x006c 00108 (a.go:5) LEAQ type.[]int(SB), BP
0x0073 00115 (a.go:5) MOVQ BP, (SP)
0x0077 00119 (a.go:5) MOVQ CX, 8(SP)
0x007c 00124 (a.go:5) MOVQ AX, 16(SP)
0x0081 00129 (a.go:5) MOVQ DX, 24(SP)
0x0086 00134 (a.go:5) MOVQ BX, 32(SP)
0x008b 00139 (a.go:5) PCDATA $0, $0
0x008b 00139 (a.go:5) CALL runtime.growslice(SB)
0x0090 00144 (a.go:5) MOVQ 40(SP), CX
0x0095 00149 (a.go:5) MOVQ 48(SP), AX
0x009a 00154 (a.go:5) MOVQ 56(SP), DX
0x009f 00159 (a.go:5) MOVQ AX, BX
0x00a2 00162 (a.go:5) INCQ BX
0x00a5 00165 (a.go:5) JMP 92
0x00a7 00167 (a.go:3) CALL runtime.morestack_noctxt(SB)
0x00ac 00172 (a.go:3) JMP 0

To add to the assembly code given by the others, you can find the Go (1.5.1) gc code for this here: https://github.com/golang/go/blob/f2e4c8b5fb3660d793b2c545ef207153db0a34b1/src/cmd/compile/internal/gc/walk.go#L2895
// expand append(l1, l2...) to
// init {
//   s := l1
//   if n := len(l1) + len(l2) - cap(s); n > 0 {
//     s = growslice_n(s, n)
//   }
//   s = s[:len(l1)+len(l2)]
//   memmove(&s[len(l1)], &l2[0], len(l2)*sizeof(T))
// }
// s
//
// l2 is allowed to be a string.
with growslice_n being defined here: https://github.com/golang/go/blob/f2e4c8b5fb3660d793b2c545ef207153db0a34b1/src/runtime/slice.go#L36
// growslice_n is a variant of growslice that takes the number of new elements
// instead of the new minimum capacity.
// TODO(rsc): This is used by append(slice, slice...).
// The compiler should change that code to use growslice directly (issue #11419).
func growslice_n(t *slicetype, old slice, n int) slice {
    if n < 1 {
        panic(errorString("growslice: invalid n"))
    }
    return growslice(t, old, old.cap+n)
}
// growslice handles slice growth during append.
// It is passed the slice type, the old slice, and the desired new minimum capacity,
// and it returns a new slice with at least that capacity, with the old data
// copied into it.
func growslice(t *slicetype, old slice, cap int) slice {
    if cap < old.cap || t.elem.size > 0 && uintptr(cap) > _MaxMem/uintptr(t.elem.size) {
        panic(errorString("growslice: cap out of range"))
    }
    if raceenabled {
        callerpc := getcallerpc(unsafe.Pointer(&t))
        racereadrangepc(old.array, uintptr(old.len*int(t.elem.size)), callerpc, funcPC(growslice))
    }
    et := t.elem
    if et.size == 0 {
        // append should not create a slice with nil pointer but non-zero len.
        // We assume that append doesn't need to preserve old.array in this case.
        return slice{unsafe.Pointer(&zerobase), old.len, cap}
    }
    newcap := old.cap
    if newcap+newcap < cap {
        newcap = cap
    } else {
        for {
            if old.len < 1024 {
                newcap += newcap
            } else {
                newcap += newcap / 4
            }
            if newcap >= cap {
                break
            }
        }
    }
    if uintptr(newcap) >= _MaxMem/uintptr(et.size) {
        panic(errorString("growslice: cap out of range"))
    }
    lenmem := uintptr(old.len) * uintptr(et.size)
    capmem := roundupsize(uintptr(newcap) * uintptr(et.size))
    newcap = int(capmem / uintptr(et.size))
    var p unsafe.Pointer
    if et.kind&kindNoPointers != 0 {
        p = rawmem(capmem)
        memmove(p, old.array, lenmem)
        memclr(add(p, lenmem), capmem-lenmem)
    } else {
        // Note: can't use rawmem (which avoids zeroing of memory), because then GC can scan uninitialized memory.
        p = newarray(et, uintptr(newcap))
        if !writeBarrierEnabled {
            memmove(p, old.array, lenmem)
        } else {
            for i := uintptr(0); i < lenmem; i += et.size {
                typedmemmove(et, add(p, i), add(old.array, i))
            }
        }
    }
    return slice{p, old.len, newcap}
}
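As an aside (my own addition, not part of the quoted sources), the effect of this growth policy is easy to observe from ordinary user code. The exact capacities depend on the Go version and on roundupsize's size classes, but a small program like the one below prints the capacity jumps that growslice produces for an []int:
package main

import "fmt"

func main() {
    var s []int
    prevCap := -1
    for i := 0; i < 2000; i++ {
        s = append(s, i)
        if cap(s) != prevCap {
            // The capacity changed, so this append went through growslice.
            fmt.Println("len:", len(s), "cap:", cap(s))
            prevCap = cap(s)
        }
    }
}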

Related

Is there any way to find from a binary file, that is, from its assembly code, whether an array parameter has been passed to a function or not?

I am doing binary analysis and I have several objectives, such as determining the types of function parameters: whether a parameter is an array, a pointer, a char, or an int. In particular, my objective is to find whether an array has been passed as a function parameter.
Here is my C program.
#include <stdio.h>
#include <string.h>

void concat(char s1[], char s2[]) {
    int i, j;
    i = strlen(s1);
    for (j = 0; s2[j] != '\0'; i++, j++) {
        s1[i] = s2[j];
    }
    s1[i] = '\0';
}

int main() {
    char a[] = "Hello";
    char b[] = "World";
    concat(a, b);
    return (0);
}
Here is my assembly code
0000000000001169 <concat>:
1169: endbr64
116d: pushq %rbp
116e: movq %rsp,%rbp
1171: subq $0x20,%rsp
1175: movq %rdi,-0x18(%rbp)
1179: movq %rsi,-0x20(%rbp)
117d: movq -0x18(%rbp),%rax
1181: movq %rax,%rdi
1184: callq 1060 <strlen#plt> //since calls destination is eax //
1189: movl %eax,-0x8(%rbp)
118c: movl $0x0,-0x4(%rbp) //local argument
1193: jmp 11bc <concat+0x53>
1195: movl -0x4(%rbp),%eax
1198: movslq %eax,%rdx //hint that 32bit source moved to 64 bit sign extended
119b: movq -0x20(%rbp),%rax
119f: addq %rdx,%rax
11a2: movl -0x8(%rbp),%edx
11a5: movslq %edx,%rcx
11a8: movq -0x18(%rbp),%rdx
11ac: addq %rcx,%rdx
11af: movzbl (%rax),%eax
11b2: movb %al,(%rdx)
11b4: addl $0x1,-0x8(%rbp)
11b8: addl $0x1,-0x4(%rbp)
11bc: movl -0x4(%rbp),%eax
11bf: movslq %eax,%rdx
11c2: movq -0x20(%rbp),%rax
11c6: addq %rdx,%rax
11c9: movzbl (%rax),%eax
11cc: testb %al,%al
11ce: jne 1195 <concat+0x2c>
11d0: movl -0x8(%rbp),%eax
11d3: movslq %eax,%rdx
11d6: movq -0x18(%rbp),%rax
11da: addq %rdx,%rax
11dd: movb $0x0,(%rax)
11e0: nop
11e1: leaveq
11e2: retq
From the assembly code, how can I tell whether an array has been passed to a function or not?
I am trying to understand reverse engineering tools.

Decoding compiler output for interface runtime type assertion

I recently encountered empty interfaces while using the Load() method of atomic.Value. I was experimenting a bit with type assertions on empty interfaces - https://play.golang.org/p/CLyY2y9-2VF
This piqued my interest, and I decided to take a peek behind the curtains to see what the compiler does so that the code doesn't panic when trying to read the concrete value of a nil interface{} (e.g., when you type-assert the result of Load() before Store has been called).
I could see that in the unsafe version, the compiler emitted an assembly instruction that causes the panic: call runtime.panicdottypeE(SB)
The panic instruction is obviously not present in the safe version. Can someone please explain in more detail what the compiler is doing when we capture the return value with ok (and perhaps point me to the corresponding assembly instructions in the godbolt links)?
Here are the godbolt compiler links for unsafe version [1] and safe version [2].
[1] https://godbolt.org/z/76onvj
[2] https://godbolt.org/z/e8aoqe
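For context, the two variants presumably look roughly like the sketch below (the exact code behind the godbolt links may differ; returnEmptyInterface is the helper named in the assembly that follows):
package main

import "sync/atomic"

func returnEmptyInterface() interface{} {
    var v atomic.Value
    return v.Load() // nil interface{} because Store was never called
}

func main() {
    // "safe" version: never panics; ok reports whether the assertion held.
    y, ok := returnEmptyInterface().(bool)
    println(y, ok)

    // "unsafe" version: panics at run time (via runtime.panicdottypeE)
    // because the interface holds no bool.
    x := returnEmptyInterface().(bool)
    println(x)
}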
The empty interface type (called eface in the runtime package) is two pointers: the first points to the underlying type (e.g. the bool type, the int type, the YourStruct type, ...), and the second is a pointer to the data (IIRC, in some cases it holds the data itself).
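Concretely, the runtime declares it along these lines (quoted from memory of runtime/runtime2.go; field names may vary slightly between versions):
type eface struct {
    _type *_type         // pointer to the descriptor of the dynamic type
    data  unsafe.Pointer // pointer to the data (or, for some values, the data itself)
}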
Unsafe version:
call "".returnEmptyInterface(SB) Call the function
pcdata $0, $1 pcdata is not a real instruction, ignore
movq 8(SP), AX AX <- pointer to data
movq (SP), CX CX <- pointer to type
pcdata $0, $2
leaq type.bool(SB), DX DX <- pointer to bool type
cmpq CX, DX Compare CX and DX
jne main_pc156 If they are not equal jump to main_pc156
In main_pc156 the compiler will call runtime.panicdottypeE, which will basically panic. Source code from runtime/iface.go (you can find it under $GOROOT/src):
func panicdottypeE(have, want, iface *_type) {
    panic(&TypeAssertionError{iface, have, want, ""})
}
Safe version:
pcdata $0, $0
pcdata $1, $0
call "".returnEmptyInterface(SB) Call the function
pcdata $0, $1
movq 8(SP), AX AX <- pointer to data
movq (SP), CX CX <- pointer to type
main_pc47:
pcdata $0, $2
leaq type.bool(SB), DX DX <- pointer to bool type
cmpq CX, DX Compare CX and DX
jne main_pc186 If not equal jump main_pc186
pcdata $0, $3
movblzx (AX), AX AX <- dereference AX
main_pc62:
and in main_pc186
pcdata $0, $3
xorl AX, AX AX <- 0
jmp main_pc62 Jump back at the end of previous block
Here AX corresponds to x in the code, but what corresponds to ok? Nothing! If you check the code for the println call you see:
cmpq CX, DX Compare CX and DX
seteq AL AL <- 1 if CX equal to DX otherwise 0
So the compiler decided to compare them again when printing.
The quick summary: they do exactly the same thing, it's just that the case for ok == false is different.
The two bits have the following in common:
pcdata $0, $0
pcdata $1, $0
call "".returnEmptyInterface(SB)
pcdata $0, $1
movq 8(SP), AX
movq (SP), CX
pcdata $0, $2
leaq type.bool(SB), DX
cmpq CX, DX
jne main_pc156 <==== jump
pcdata $0, $3
movblzx (AX), AX
The code found in main_pc156 is the part we care about. As you noticed, for the single-value type assertion, this is:
main_pc156:
movq DX, 8(SP)
pcdata $0, $1
leaq type.interface {}(SB), AX
pcdata $0, $0
movq AX, 16(SP)
call runtime.panicdottypeE(SB)
xchgl AX, AX
There's no escaping this, once we jump to main_pc156, we panic.
On the other hand, the code for the two-value type assertion is:
main_pc186:
pcdata $0, $3
xorl AX, AX
jmp main_pc62
This is massively different from the previous case, and takes us right back to the end of the first bit of code, resuming execution.

How to save register state just before a call instruction?

For an arbitrary function call in C, how can I save the register state just before the call instruction?
Consider the example below (assume it's x86_64),
#include <stdint.h>

// We don't know what the `func` is.
#define run(func) \
    do { func; } while (0)

// Say the `func` is `add`
int64_t add(int64_t a, int64_t b) {
    return a + b;
}

int main(void) {
    run(add(1, 2));
}
The assembly code generated by gcc may be something like:
main:
push %rbp
movq %rsp, %rbp
movq $1, %rdi
movq $2, %rsi
# The question is:
# how can we save registers like %rdi, %rsi etc. here in C code?
call add
movq %rbp, %rsp
popq %rbp

Never triggered if statements make code execution in benchmark faster? Why?

I have recently started the Go track on exercism.io and had fun optimizing the "nth-prime" calculation. Actually I came across a funny fact I can't explain. Imagine the following code:
// Package prime provides ...
package prime

// Nth function checks for the prime number on position n
func Nth(n int) (int, bool) {
    if n <= 0 {
        return 0, false
    }
    if n == 1 {
        return 2, true
    }
    currentNumber := 1
    primeCounter := 1
    for n > primeCounter {
        currentNumber += 2
        if isPrime(currentNumber) {
            primeCounter++
        }
    }
    return currentNumber, primeCounter == n
}

// isPrime function checks if a number
// is a prime number
func isPrime(n int) bool {
    // useless because never triggered but makes it faster??
    if n < 2 {
        println("n < 2")
        return false
    }
    // useless because never triggered but makes it faster??
    if n%2 == 0 {
        println("n%2")
        return n == 2
    }
    for i := 3; i*i <= n; i += 2 {
        if n%i == 0 {
            return false
        }
    }
    return true
}
In the private function isPrime I have two initial if-statements that are never triggered, because I only pass in odd numbers greater than 2. The benchmark returns the following:
Running tool: /usr/bin/go test -benchmem -run=^$ -bench ^(BenchmarkNth)$
BenchmarkNth-8 100 18114825 ns/op 0 B/op 0
If I remove the never-triggered if-statements, the benchmark gets slower:
Running tool: /usr/bin/go test -benchmem -run=^$ -bench ^(BenchmarkNth)$
BenchmarkNth-8 50 21880749 ns/op 0 B/op 0
I have run the benchmark multiple times, changing the code back and forth, and I always get more or less the same numbers. I can't think of a reason why these two if-statements should make the execution faster. Yes, it is micro-optimization, but I want to know: why?
Here is the whole exercise from exercism with test cases: nth-prime
The Go version I am using is 1.12.1 linux/amd64 on Manjaro Linux with i3.
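For reference, the benchmark function itself is not shown above; a minimal sketch of what it presumably looks like (the actual exercism test file may differ, and the prime index 10001 is an arbitrary choice) is:
package prime

import "testing"

func BenchmarkNth(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Nth(10001)
    }
}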
What happens is that the compiler is given some guarantees about the input when those if's are added.
If those assertions are removed, the compiler has to add them itself, and the way it does that is by validating the value on each iteration. We can take a look at the assembly code to prove it (by passing -gcflags=-S to the go test command).
With the if's:
0x004b 00075 (func.go:16) JMP 81
0x004d 00077 (func.go:16) LEAQ 2(BX), AX
0x0051 00081 (func.go:16) MOVQ AX, DX
0x0054 00084 (func.go:16) IMULQ AX, AX
0x0058 00088 (func.go:16) CMPQ AX, CX
0x005b 00091 (func.go:16) JGT 133
0x005d 00093 (func.go:17) TESTQ DX, DX
0x0060 00096 (func.go:17) JEQ 257
0x0066 00102 (func.go:17) MOVQ CX, AX
0x0069 00105 (func.go:17) MOVQ DX, BX
0x006c 00108 (func.go:17) CQO
0x006e 00110 (func.go:17) IDIVQ BX
0x0071 00113 (func.go:17) TESTQ DX, DX
0x0074 00116 (func.go:17) JNE 77
Without the if's:
0x0016 00022 (func.go:16) JMP 28
0x0018 00024 (func.go:16) LEAQ 2(BX), AX
0x001c 00028 (func.go:16) MOVQ AX, DX
0x001f 00031 (func.go:16) IMULQ AX, AX
0x0023 00035 (func.go:16) CMPQ AX, CX
0x0026 00038 (func.go:16) JGT 88
0x0028 00040 (func.go:17) TESTQ DX, DX
0x002b 00043 (func.go:17) JEQ 102
0x002d 00045 (func.go:17) MOVQ CX, AX
0x0030 00048 (func.go:17) MOVQ DX, BX
0x0033 00051 (func.go:17) CMPQ BX, $-1
0x0037 00055 (func.go:17) JEQ 64
0x0039 00057 (func.go:17) CQO
0x003b 00059 (func.go:17) IDIVQ BX
0x003e 00062 (func.go:17) JMP 69
0x0040 00064 (func.go:17) NEGQ AX
0x0043 00067 (func.go:17) XORL DX, DX
0x0045 00069 (func.go:17) TESTQ DX, DX
0x0048 00072 (func.go:17) JNE 24
Line 51 in the assembly code 0x0033 00051 (func.go:17) CMPQ BX, $-1 is the culprit.
Line 16 of the original Go code, for i := 3; i*i <= n; i+=2, is translated the same way in both cases. But line 17, if n%i == 0, which runs on every iteration, compiles to more instructions and as a result means more work for the CPU in total.
Something similar is done in the encoding/base64 package, where a check ensures the loop won't receive a nil value. You can take a look here:
https://go-review.googlesource.com/c/go/+/151158/3/src/encoding/base64/base64.go
This check was added intentionally. In your case, you optimized it accidentally :)
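As an aside (my own addition), a better-known instance of the same "give the compiler a guarantee up front" idea is the bounds-check-elimination hint used in the standard library, for example in encoding/binary:
// Sketch: the early `_ = b[7]` promises the compiler that b has at least
// 8 bytes, so it can drop the bounds check on each indexing expression
// below, much like the early if-statements above let it drop
// per-iteration work.
func readUint64(b []byte) uint64 {
    _ = b[7] // bounds check hint to the compiler
    return uint64(b[0]) | uint64(b[1])<<8 | uint64(b[2])<<16 | uint64(b[3])<<24 |
        uint64(b[4])<<32 | uint64(b[5])<<40 | uint64(b[6])<<48 | uint64(b[7])<<56
}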

Golang `copy` time complexity

I was wondering about the time complexity of Go's copy function.
Intuitively, I would assume linear time in the worst case. But I was wondering if there was any magic that was able to bulk-allocate, or something similar, which would allow it to perform better.
https://golang.org/ref/spec#Appending_and_copying_slices
I figured the assembly would explain something, but I'm not sure what I'm reading :p
$ GOOS=linux GOARCH=amd64 go tool compile -S main.go
package main

import "fmt"

func main() {
    src := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    dst := make([]int, len(src))
    numCopied := copy(dst, src)
    if numCopied != 10 {
        panic(fmt.Sprintf("expected 10 copied received: %d", numCopied))
    }
}
With the following output from the copy line:
0x007a 00122 (main.go:23) CMPQ AX, $10
0x007e 00126 (main.go:23) JLE 133
0x0080 00128 (main.go:23) MOVL $10, AX
0x0085 00133 (main.go:23) MOVQ AX, "".numCopied+56(SP)
0x008a 00138 (main.go:23) MOVQ CX, (SP)
0x008e 00142 (main.go:23) LEAQ ""..autotmp_8+72(SP), CX
0x0093 00147 (main.go:23) MOVQ CX, 8(SP)
0x0098 00152 (main.go:23) SHLQ $3, AX
0x009c 00156 (main.go:23) MOVQ AX, 16(SP)
0x00a1 00161 (main.go:23) PCDATA $0, $0
0x00a1 00161 (main.go:23) CALL runtime.memmove(SB)
0x00a6 00166 (main.go:23) MOVQ "".numCopied+56(SP), AX
I then tried with 5 elements as well:
package main

import "fmt"

func main() {
    src := []int{1, 2, 3, 4, 5}
    dst := make([]int, len(src))
    numCopied := copy(dst, src)
    if numCopied != 5 {
        panic(fmt.Sprintf("expected 5 copied received: %d", numCopied))
    }
}
With the following output from the copy line:
0x0086 00134 (main.go:9) CMPQ AX, $5
0x008a 00138 (main.go:9) JLE 145
0x008c 00140 (main.go:9) MOVL $5, AX
0x0091 00145 (main.go:9) MOVQ AX, "".numCopied+56(SP)
0x0096 00150 (main.go:9) MOVQ CX, (SP)
0x009a 00154 (main.go:9) LEAQ ""..autotmp_8+72(SP), CX
0x009f 00159 (main.go:9) MOVQ CX, 8(SP)
0x00a4 00164 (main.go:9) SHLQ $3, AX
0x00a8 00168 (main.go:9) MOVQ AX, 16(SP)
0x00ad 00173 (main.go:9) PCDATA $0, $0
0x00ad 00173 (main.go:9) CALL runtime.memmove(SB)
0x00b2 00178 (main.go:9) MOVQ "".numCopied+56(SP), AX
I suggest benchmarking the time it takes to copy array/slices of different sizes. Here is something to get the ball rolling:
package main

import (
    "fmt"
    "math"
    "testing"
)

func main() {
    for i := 0; i < 16; i++ {
        size := powerOfTwo(i)
        runBench(size)
    }
}

func runBench(size int) {
    bench := func(b *testing.B) {
        src := make([]int, size, size)
        dst := make([]int, size, size)
        // we don't want to measure the time
        // it takes to make the arrays, so reset timer
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            copy(dst, src)
        }
    }
    fmt.Printf("size = %d, %s\n", size, testing.Benchmark(bench))
}

func powerOfTwo(i int) int {
    return int(math.Pow(float64(2), float64(i)))
}
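As a variation (my own sketch, not from the original answer), the same measurement can be written as a regular benchmark in a _test.go file; b.SetBytes makes go test -bench report throughput, which makes the linear scaling with size easy to see:
package main

import "testing"

// Put this in a file ending in _test.go and run: go test -bench=Copy
func BenchmarkCopy1K(b *testing.B) {
    src := make([]int, 1024)
    dst := make([]int, 1024)
    b.SetBytes(int64(len(src)) * 8) // 8 bytes per int on amd64
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        copy(dst, src)
    }
}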
