For instance, github.com/yhat/scrape suggests using a closure like this:
func someFunc() {
    ...
    matcher := func(n *html.Node) bool {
        return n.DataAtom == atom.Body
    }
    body, ok := scrape.Find(root, matcher)
    ...
}
Since matcher doesn’t actually capture any local variables, this could equivalently be written as:
func someFunc() {
    ...
    body, ok := scrape.Find(root, matcher)
    ...
}

func matcher(n *html.Node) bool {
    return n.DataAtom == atom.Body
}
The first form looks better, because the matcher function is quite specific to that place in the code. But does it perform worse at runtime (assuming someFunc may be called often)?
I guess there must be some overhead to creating a closure, but this kind of closure could be optimized into a regular function by the compiler?
(Obviously the language spec doesn’t require this; I’m interested in what gc actually does.)
It seems there is no difference. We can check the generated machine code.
Here is a toy program:
package main

import "fmt"

func topLevelFunction(x int) int {
    return x + 4
}

func useFunction(fn func(int) int) {
    fmt.Println(fn(10))
}

func invoke() {
    innerFunction := func(x int) int {
        return x + 8
    }
    useFunction(topLevelFunction)
    useFunction(innerFunction)
}

func main() {
    invoke()
}
And here is its disassembly:
$ go version
go version go1.8.5 linux/amd64
$ go tool objdump -s 'main\.(invoke|topLevel)' bin/toy
TEXT main.topLevelFunction(SB) /home/vasiliy/cur/work/learn-go/src/my/toy/toy.go
toy.go:6 0x47b7a0 488b442408 MOVQ 0x8(SP), AX
toy.go:6 0x47b7a5 4883c004 ADDQ $0x4, AX
toy.go:6 0x47b7a9 4889442410 MOVQ AX, 0x10(SP)
toy.go:6 0x47b7ae c3 RET
TEXT main.invoke(SB) /home/vasiliy/cur/work/learn-go/src/my/toy/toy.go
toy.go:13 0x47b870 64488b0c25f8ffffff FS MOVQ FS:0xfffffff8, CX
toy.go:13 0x47b879 483b6110 CMPQ 0x10(CX), SP
toy.go:13 0x47b87d 7638 JBE 0x47b8b7
toy.go:13 0x47b87f 4883ec10 SUBQ $0x10, SP
toy.go:13 0x47b883 48896c2408 MOVQ BP, 0x8(SP)
toy.go:13 0x47b888 488d6c2408 LEAQ 0x8(SP), BP
toy.go:17 0x47b88d 488d052cfb0200 LEAQ 0x2fb2c(IP), AX
toy.go:17 0x47b894 48890424 MOVQ AX, 0(SP)
toy.go:17 0x47b898 e813ffffff CALL main.useFunction(SB)
toy.go:14 0x47b89d 488d0514fb0200 LEAQ 0x2fb14(IP), AX
toy.go:18 0x47b8a4 48890424 MOVQ AX, 0(SP)
toy.go:18 0x47b8a8 e803ffffff CALL main.useFunction(SB)
toy.go:19 0x47b8ad 488b6c2408 MOVQ 0x8(SP), BP
toy.go:19 0x47b8b2 4883c410 ADDQ $0x10, SP
toy.go:19 0x47b8b6 c3 RET
toy.go:13 0x47b8b7 e874f7fcff CALL runtime.morestack_noctxt(SB)
toy.go:13 0x47b8bc ebb2 JMP main.invoke(SB)
TEXT main.invoke.func1(SB) /home/vasiliy/cur/work/learn-go/src/my/toy/toy.go
toy.go:15 0x47b8f0 488b442408 MOVQ 0x8(SP), AX
toy.go:15 0x47b8f5 4883c008 ADDQ $0x8, AX
toy.go:15 0x47b8f9 4889442410 MOVQ AX, 0x10(SP)
toy.go:15 0x47b8fe c3 RET
As we can see, at least in this simple case, there is no structural difference in how topLevelFunction and innerFunction (invoke.func1) are translated to machine code, nor in how each is passed to useFunction.
(It is instructive to compare this to the case where innerFunction does capture a local variable; and to the case where, moreover, innerFunction is passed via a global variable rather than a function argument — but these are left as an exercise to the reader.)
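For concreteness, a capturing variant might look like the sketch below (my code, reusing useFunction from the toy program; inspecting its disassembly is exactly the exercise above). Here the compiler generally cannot use a bare code pointer: it has to build a closure object pairing the code with the captured variable.

// Variant of invoke whose closure captures a local variable.
func invokeCapturing() {
    offset := 8
    innerFunction := func(x int) int {
        return x + offset // captures offset from the enclosing scope
    }
    useFunction(innerFunction)
}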
It generally should. And probably even more so once compiler optimizations are taken into account (reasoning about a plain function is generally easier than reasoning about a closure, so I would expect a compiler to optimize a function more often than an equivalent closure). But it is not exactly black and white, as many factors may affect the final code produced, including your platform and the version of the compiler itself. More importantly, your other code will typically affect performance much more than the speed of making a call (both algorithm-wise and lines-of-code-wise), which seems to be the point JimB made.
For example, I wrote the following sample code and then benchmarked it.
var (
    test int64
)

const (
    testThreshold = int64(1000000000)
)

func someFunc() {
    test += 1
}

func funcTest(threshold int64) int64 {
    test = 0
    for i := int64(0); i < threshold; i++ {
        someFunc()
    }
    return test
}

func closureTest(threshold int64) int64 {
    someClosure := func() {
        test += 1
    }
    test = 0
    for i := int64(0); i < threshold; i++ {
        someClosure()
    }
    return test
}

func closureTestLocal(threshold int64) int64 {
    var localTest int64
    localClosure := func() {
        localTest += 1
    }
    localTest = 0
    for i := int64(0); i < threshold; i++ {
        localClosure()
    }
    return localTest
}
On my laptop, funcTest takes 2.0 ns per iteration, closureTest takes 2.2 ns, and closureTestLocal takes 1.9 ns. Here, closureTest vs funcTest appears to confirm your (and my) assumption that a closure call will be slower than a function call. But please note that those test functions were intentionally made simple and small so that the difference in call speed would stand out, and it's still only a 10% difference. In fact, checking the compiler output shows that in the funcTest case the compiler inlined the call to someFunc instead of actually making it, so I would expect the difference to be even smaller if it hadn't. More importantly, I'd like to point out that closureTestLocal is 5% faster than the (inlined) function even though it is actually a capturing closure. Please note that neither of the closures was inlined or optimized out; both closure tests faithfully make all the calls. The only difference I see is that the compiled code for the local-closure case operates completely on the stack, while both other functions access a global variable (somewhere in memory) by its address. But while I can easily explain the difference by looking at the compiled code, my point is that it's not exactly black and white even in the simplest cases.
So, if speed really is that important in your case, I would suggest benchmarking it instead (and with your actual code). You could also use go tool objdump to analyze the generated code and get a clue where the difference comes from. But as a rule of thumb, I would suggest focusing on writing better code (whatever that means for you) and ignoring the speed of individual calls (as in "avoid premature optimization").
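If you want to reproduce this kind of measurement yourself, a minimal harness could look like the sketch below (my code, assuming the functions above live in the same package; put it in a file ending in _test.go and run go test -bench=.):

package main

import "testing"

// Each benchmark delegates the loop to the function under test,
// running it b.N times as the testing framework requires.
func BenchmarkFunc(b *testing.B)         { funcTest(int64(b.N)) }
func BenchmarkClosure(b *testing.B)      { closureTest(int64(b.N)) }
func BenchmarkClosureLocal(b *testing.B) { closureTestLocal(int64(b.N)) }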
I don't think the scope of the function declaration can harm performance. Also, it's common to inline the lambda in the call. I'd write it as:
body, ok := scrape.Find(root, func(n *html.Node) bool { return n.DataAtom == atom.Body })
Related
Constraining a type parameter with an interface type like this:
func CallByteWriterGen[W io.ByteWriter](w W, bytes []byte) {
    _ = w.WriteByte(bytes[0])
}
...causes extra pointer dereference through dictionary (passed using AX):
MOVQ 0x10(AX), DX // <-- extra pointer dereference
MOVQ 0x18(DX), DX
MOVZX 0(CX), CX
MOVQ BX, AX
MOVL CX, BX
CALL DX
What might be the benefits that cannot be achieved by simply using an interface argument, like this:
func CallByteWriter(w io.ByteWriter, bytes []byte) {
    _ = w.WriteByte(bytes[0])
}
The interface version is idiomatic, not the type parameter one - use an interface where an interface is called for.
See the When to use Generics blog post for additional information and details, specifically the section Don’t replace interface types with type parameters:
For example, it might be tempting to change the first function
signature here, which uses just an interface type, into the second
version, which uses a type parameter.
func ReadSome(r io.Reader) ([]byte, error)
func ReadSome[T io.Reader](r T) ([]byte, error)
Don’t make that kind of change. Omitting the type parameter makes the
function easier to write, easier to read, and the execution time will
likely be the same.
In general don’t replace interfaces with type parameters when the interface is used to abstract behavior and the dynamic types are actually irrelevant within the function body.
As for usage, it may not change much at the call site. Either way, you can pass as the function argument only something that implements io.ByteWriter, just as you do without type parameters.
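For instance, the call sites for the two functions from the question are identical in shape (a sketch of mine; here W is inferred automatically):

package main

import (
    "bytes"
    "io"
)

func CallByteWriter(w io.ByteWriter, bytes []byte) {
    _ = w.WriteByte(bytes[0])
}

func CallByteWriterGen[W io.ByteWriter](w W, bytes []byte) {
    _ = w.WriteByte(bytes[0])
}

func main() {
    var buf bytes.Buffer // *bytes.Buffer implements io.ByteWriter
    CallByteWriter(&buf, []byte{1})    // interface version
    CallByteWriterGen(&buf, []byte{1}) // generic version; W inferred as *bytes.Buffer
}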
The differences become relevant when the dynamic types of the interface are interesting. The static type of an io.ByteWriter argument is just io.ByteWriter, and to retrieve the dynamic type you have to use an assertion w.(*bytes.Buffer) which may panic, or a type switch; whereas with type parameters, the function deals directly with the concrete type W.
There are at least two use cases for this. Within the function body:
you happen to declare new values of those concrete types
you happen to do comparisons
io.ByteWriter is a bad example in either case, because you are going to use that one precisely for abstracting some behavior rather than for its dynamic type, so that’s not a good candidate for type parametrization.
There may be other situations like pre-Go1.18 "generic" code based on interface{}, or perhaps protobuffers, where proto.Message is an interface, factory patterns, unit test helpers, etc. where the dynamic types are interesting.
Then, clumsy occurrences of reflect.New or reflect.Zero to create new values can be replaced by new(T) or var x T. Demo:
func newFrom(v Setter) Setter {
    return reflect.Zero(reflect.TypeOf(v)).Interface().(Setter)
}
vs.
func newFrom[T Setter](v T) T {
    return *new(T)
}
About comparisons, interfaces do support equality operators == and != natively, but the comparison may just panic if the dynamic values are not comparable. With type parameters instead the comparison between method-only interfaces simply won't compile unless you explicitly add comparable to the constraint, thus improving code safety.
// compiles, might panic
func equal(v, w Setter) bool {
return v == w
}
vs.
// doesn't compile, must add comparable explicitly
func equal[T Setter](v, w T) bool {
return v == w
}
This doesn't mean type parameters are always appropriate for these use cases either, but you may consider them as candidates.
The crate in the title is byteorder.
Here is how we can read binary data from std::io::BufReader. BufReader implements the std::io::Read trait. There is an implementation of byteorder::ReadBytesExt for any type implementing Read. ReadBytesExt contains read_u16 and other methods that read binary data. This implementation:
fn read_u16<T: ByteOrder>(&mut self) -> Result<u16> {
    let mut buf = [0; 2];
    self.read_exact(&mut buf)?;
    Ok(T::read_u16(&buf))
}
It passes a reference to buf to BufReader; I suppose it passes the address of buf on the stack. Hence the resulting u16 is transferred from the internal buffer of BufReader (memory) to buf above (memory), probably using memcpy or something. Wouldn't it be more efficient if BufReader implemented ReadBytesExt by reading data from its internal buffer directly? Or does the compiler optimize buf away?
TL;DR: It's all up to the Optimization Gods, but it should be efficient.
The key optimization here is inlining, as usual, and the probabilities are on our side, but who knows...
As long as the call to read_exact is inlined, it should just work.
Firstly, it can be inlined. In Rust, "inner" calls are always statically dispatched -- there's no inheritance -- and therefore the type of the receiver (self) in self.read_exact is known at compile-time. As a result, the exact read_exact function being called is known at compile-time.
Of course, there's no telling whether it'll be inlined. The implementation is fairly short, so chances are good, but that's out of our hands.
Secondly, what happens if it's inlined? Magic!
You can see the implementation here:
fn read_exact(&mut self, buf: &mut [u8]) -> io::Result<()> {
    if self.buffer().len() >= buf.len() {
        buf.copy_from_slice(&self.buffer()[..buf.len()]);
        self.consume(buf.len());
        return Ok(());
    }
    crate::io::default_read_exact(self, buf)
}
Once inlined, we therefore have:
fn read_u16<T: ByteOrder>(&mut self) -> Result<u16> {
    let mut buf = [0; 2];
    // self.read_exact(&mut buf)?;
    if self.buffer().len() >= buf.len() {
        buf.copy_from_slice(&self.buffer()[..buf.len()]);
        self.consume(buf.len());
        Ok(())
    } else {
        crate::io::default_read_exact(self, &mut buf)
    }?;
    Ok(T::read_u16(&buf))
}
Needless to say, all those buf.len() calls should be replaced by 2.
fn read_u16<T: ByteOrder>(&mut self) -> Result<u16> {
    let mut buf = [0; 2];
    // self.read_exact(&mut buf)?;
    if self.buffer().len() >= 2 {
        buf.copy_from_slice(&self.buffer()[..2]);
        self.consume(2);
        Ok(())
    } else {
        crate::io::default_read_exact(self, &mut buf)
    }?;
    Ok(T::read_u16(&buf))
}
So we're left with copy_from_slice, a memcpy invoked with a constant size (2).
The trick is that memcpy is so special that it's a builtin in most compilers, and it certainly is in LLVM. And it's a builtin specifically so that in special cases -- such as a constant size which happens to be a register size -- its codegen can be specialized to... a mov instruction in the case of x86/x64.
So, as long as read_exact is inlined, then buf should live in a register from beginning to end... in the happy case.
In the cold path, when default_read_exact is called, then the compiler will need to use the stack and pass a slice. That's fine. It should not happen often.
If you find yourself repeatedly doing sequences of u16 reads, however... you may find yourself better served by reading larger arrays, to avoid the repeated if self.buffer().len() >= 2 checks.
Say, for some very simple Go code:
package main

import "fmt"

func plus(a int, b int) int {
    return a + b
}

func plusPlus(a, b, c int) int {
    return a + b + c
}

func main() {
    ptr := plus
    ptr2 := plusPlus
    fmt.Println(ptr)
    fmt.Println(ptr2)
}
This has the following output:
0x2000
0x2020
What is going on here? This doesn't look like a function pointer, or any kind of pointer for that matter, that one would find on the stack. I also understand that Go, while offering some nice low-level functionality in the threading department, requires an OS to function; C works across all computer platforms and operating systems can be written in it, while Go needs an operating system and in fact only runs on a few of them right now. Do these very regular function pointers mean that this works on a VM? Or is the compiler just linked to low-level C functions?
Go does not run on a virtual machine.
From the view of the language specification, ptr and ptr2 are function values. They can be called as ptr(1, 2) and ptr2(1, 2, 3).
Diving down into the implementation, the variables ptr and ptr2 are pointers to func values. See the Function Call design document for information on func values. Note the distinction between the language's "function" value and the implementation's "func" value.
Because the reflection API used by the fmt package indirects through the func values to get the pointer to print, the call to fmt.Println(ptr) prints the actual address of the plus function.
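You can see the same indirection explicitly with the reflect and runtime packages (a sketch of mine, not code from the question):

package main

import (
    "fmt"
    "reflect"
    "runtime"
)

func plus(a, b int) int { return a + b }

func main() {
    // reflect indirects through the func value to the code address,
    // which is the number fmt.Println prints for a function value.
    pc := reflect.ValueOf(plus).Pointer()
    fmt.Printf("%#x %s\n", pc, runtime.FuncForPC(pc).Name()) // e.g. 0x47b7a0 main.plus
}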
Go doesn't run on a virtual machine. Those are the actual addresses of the functions.
On my machine (go 1.4.1, Linux amd64) the program prints
0x400c00
0x400c20
which are different from the values in your example, but still pretty low. Checking the compiled code:
$ nm test | grep 'T main.plus'
0000000000400c00 T main.plus
0000000000400c20 T main.plusPlus
these are the actual addresses of the functions. func plus compiles to a mere 19 bytes of code, so plusPlus appears only 32 (0x20) bytes later to satisfy optimal alignment requirements.
For the sake of curiosity, here's the disassembly of func plus from objdump -d, which should dispel any doubts that Go compiles to anything but native code:
0000000000400c00 <main.plus>:
400c00: 48 8b 5c 24 08 mov 0x8(%rsp),%rbx
400c05: 48 8b 6c 24 10 mov 0x10(%rsp),%rbp
400c0a: 48 01 eb add %rbp,%rbx
400c0d: 48 89 5c 24 18 mov %rbx,0x18(%rsp)
400c12: c3 retq
They are function values:
package main

import "fmt"

func plus(a int, b int) int {
    return a + b
}

func plusPlus(a, b, c int) int {
    return a + b + c
}

func main() {
    funcp := plus
    funcpp := plusPlus
    fmt.Println(funcp)
    fmt.Println(funcpp)
    fmt.Println(funcp(1, 2))
    fmt.Println(funcpp(1, 2, 3))
}
Output:
0x20000
0x20020
3
6
I find the provision of named return variables in Go to be a useful feature, because it can avoid the separate declaration of a variable or variables. However, in some instances I want to return a different variable from the one declared as the return variable in the function signature. That appears to work, but I do find it a little strange to declare a return variable and then return something else.
While writing a test program to help learn Go (not the one below), I found it a little annoying to have to spell out the return variables in the return statement of functions returning multiple values, particularly since the variables had already been named in the function declaration. I now find, while posting this, that where there are named return variables, they don't need to be repeated in the return statement: a bare "return" suffices and implicitly uses the named variables. I find this a great feature.
So, although I have possibly partly answered my own question, could someone advise whether my usage below is acceptable? I'm sure this is documented, but I haven't come across it, and it doesn't appear to be in the reference book that I purchased, which I think overlooks this feature.
Basically, the rule appears to be (as far as I can determine) that where named return variables are used, the function declaration declares the variables, and the function can optionally and implicitly use them as the return values; this can be overridden by returning explicit values.
Example Program :
package main

func main() {
    var sVar1, sVar2 string
    println("Test Function return-values")
    sVar1, sVar2 = fGetVal(1)
    println("This was returned for '1' : " + sVar1 + ", " + sVar2)
    sVar1, sVar2 = fGetVal(2)
    println("This was returned for '2' : " + sVar1 + ", " + sVar2)
}

func fGetVal(iSeln int) (sReturn1 string, sReturn2 string) {
    sReturn1 = "This is 'sReturn1'"
    sReturn2 = "This is 'sReturn2'"
    switch iSeln {
    case 1:
        return
    default:
        return "This is not 'sReturn1'", "This is not 'sReturn2'"
    }
}
Your usage is absolutely fine and you'll find plenty of similar examples in the Go source code.
I'll attempt to explain how the return statement actually works in Go to give a deeper appreciation of why. It is useful to have a think about how Go implements parameter passing and return from functions. Once you understand that you'll understand why named return variables are so natural.
All arguments to functions and all return values from functions are passed on the stack in Go (true of the compilers of this era; since Go 1.17 the gc compiler passes some values in registers, but the picture below still applies conceptually). This differs from C, which usually passes some parameters in registers. When a function is called in Go, the caller makes space on the stack for both the arguments and the return values, then calls the function.
Specifically, when this function, which has three input parameters a, b, c and two return values, is called:
func f(a int, b int, c int) (int, int)
The stack will look like this (low memory address at the top)
* a
* b
* c
* space for return parameter 1
* space for return parameter 2
Now it is obvious that naming your return parameter just names those locations on the stack.
func f(a int, b int, c int) (x int, y int)
* a
* b
* c
* x
* y
It should now also be obvious what an empty return statement does - it just returns to the caller with whatever the values of x and y are.
Now for some disassembly! Compiling this with go build -gcflags -S test.go
package a

func f(a int, b int, c int) (int, int) {
    return a, 0
}

func g(a int, b int, c int) (x int, y int) {
    x = a
    return
}
Gives
--- prog list "f" ---
0000 (test.go:3) TEXT f+0(SB),$0-40
0001 (test.go:3) LOCALS ,$0
0002 (test.go:3) TYPE a+0(FP){int},$8
0003 (test.go:3) TYPE b+8(FP){int},$8
0004 (test.go:3) TYPE c+16(FP){int},$8
0005 (test.go:3) TYPE ~anon3+24(FP){int},$8
0006 (test.go:3) TYPE ~anon4+32(FP){int},$8
0007 (test.go:4) MOVQ a+0(FP),BX
0008 (test.go:4) MOVQ BX,~anon3+24(FP)
0009 (test.go:4) MOVQ $0,~anon4+32(FP)
0010 (test.go:4) RET ,
--- prog list "g" ---
0011 (test.go:7) TEXT g+0(SB),$0-40
0012 (test.go:7) LOCALS ,$0
0013 (test.go:7) TYPE a+0(FP){int},$8
0014 (test.go:7) TYPE b+8(FP){int},$8
0015 (test.go:7) TYPE c+16(FP){int},$8
0016 (test.go:7) TYPE x+24(FP){int},$8
0017 (test.go:7) TYPE y+32(FP){int},$8
0018 (test.go:7) MOVQ $0,y+32(FP)
0019 (test.go:8) MOVQ a+0(FP),BX
0020 (test.go:8) MOVQ BX,x+24(FP)
0021 (test.go:9) RET ,
Both functions assemble to pretty much the same code. You can see quite clearly the declarations of a,b,c,x,y on the stack in g, though in f, the return values are anonymous anon3 and anon4.
Note: CL 20024 (March 2016, for Go 1.7) clarifies the usage of named return values and illustrates, within the code base of Go itself, when their usage is appropriate:
all: remove public named return values when useless
Named returned values should only be used on public funcs and methods
when it contributes to the documentation.
Named return values should not be used if they're only saving the
programmer a few lines of code inside the body of the function,
especially if that means there's stutter in the documentation or it
was only there so the programmer could use a naked return
statement. (Naked returns should not be used except in very small
functions)
This change is a manual audit & cleanup of public func signatures.
Signatures were not changed if:
the func was private (wouldn't be in public godoc)
the documentation referenced it
For instance, archive/zip/reader.go#Open() used
func (f *File) Open() (rc io.ReadCloser, err error) {
It now uses:
func (f *File) Open() (io.ReadCloser, error) {
Its named return values didn't add anything to its documentation, which was:
// Open returns a `ReadCloser` that provides access to the File's contents.
// Multiple files may be read concurrently.
Yes, it's totally acceptable. I usually use named return variables to guarantee a default return value in a deferred error handler, to ensure a minimum viable return, as in the example below:
// Execute a one-to-one reflection + cache operation.
func (cacheSpot CacheSpot) callOneToOne(originalIns []reflect.Value) (returnValue []reflect.Value) {
    defer func() { // assure we don't panic
        if r := recover(); r != nil {
            log.Error("Recovering! Error trying to recover cached values: %v", r)
            // Fall back to calling the original function.
            returnValue = reflect.ValueOf(cacheSpot.OriginalFunc).Call(originalIns)
        }
    }()
    // ... doing a really nasty reflection operation, trying to cache results.
    // Very error-prone; may panic.
    return arrValues // ...it's OK, arrValues achieved
}
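A stripped-down sketch of the same pattern (my example, unrelated to the caching code above): the named return values are what let the deferred closure substitute a result after a panic.

package main

import "fmt"

// safeDiv converts a panic into an error by assigning to the
// named return values from inside the deferred closure.
func safeDiv(a, b int) (q int, err error) {
    defer func() {
        if r := recover(); r != nil {
            q, err = 0, fmt.Errorf("recovered: %v", r)
        }
    }()
    return a / b, nil // panics if b == 0
}

func main() {
    fmt.Println(safeDiv(10, 2)) // 5 <nil>
    fmt.Println(safeDiv(1, 0))  // 0 recovered: runtime error: integer divide by zero
}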
I'm experiencing a bit of cognitive dissonance between C-style stack-based programming, where automatic variables live on the stack and allocated memory lives on the heap, and Python-style programming, where the only things that live on the stack are references/pointers to objects on the heap.
As far as I can tell, the two following functions give the same output:
func myFunction() (*MyStructType, error) {
    var chunk *MyStructType = new(MyStructType)
    ...
    return chunk, nil
}

func myFunction() (*MyStructType, error) {
    var chunk MyStructType
    ...
    return &chunk, nil
}
i.e., allocate a new struct and return it.
If I'd written that in C, the first one would have put an object on the heap and the second would have put it on the stack. The first would return a pointer to the heap, the second would return a pointer to the stack, which would have evaporated by the time the function had returned, which would be a Bad Thing.
If I'd written it in Python (or many other modern languages except C#) example 2 would not have been possible.
I get that Go garbage collects both values, so both of the above forms are fine.
To quote:
Note that, unlike in C, it's perfectly OK to return the address of a
local variable; the storage associated with the variable survives
after the function returns. In fact, taking the address of a composite
literal allocates a fresh instance each time it is evaluated, so we
can combine these last two lines.
http://golang.org/doc/effective_go.html#functions
But it raises a couple of questions.
In example 1, the struct is declared on the heap. What about example 2? Is that declared on the stack in the same way it would be in C or does it go on the heap too?
If example 2 is declared on the stack, how does it stay available after the function returns?
If example 2 is actually declared on the heap, how is it that structs are passed by value rather than by reference? What's the point of pointers in this case?
It's worth noting that the words "stack" and "heap" do not appear anywhere in the language spec. Your question is worded with "...is declared on the stack," and "...declared on the heap," but note that Go declaration syntax says nothing about stack or heap.
That technically makes the answer to all of your questions implementation dependent. In actuality of course, there is a stack (per goroutine!) and a heap and some things go on the stack and some on the heap. In some cases the compiler follows rigid rules (like "new always allocates on the heap") and in others the compiler does "escape analysis" to decide if an object can live on the stack or if it must be allocated on the heap.
In your example 2, escape analysis would show that the pointer to the struct escapes, so the compiler would have to heap-allocate the struct. I think the current implementation of Go follows a rigid rule in this case, however: if the address of any part of a struct is taken, the struct goes on the heap.
For question 3, we risk getting confused about terminology. Everything in Go is passed by value, there is no pass by reference. Here you are returning a pointer value. What's the point of pointers? Consider the following modification of your example:
type MyStructType struct{}

func myFunction1() (*MyStructType, error) {
    var chunk *MyStructType = new(MyStructType)
    // ...
    return chunk, nil
}

func myFunction2() (MyStructType, error) {
    var chunk MyStructType
    // ...
    return chunk, nil
}

type bigStruct struct {
    lots [1e6]float64
}

func myFunction3() (bigStruct, error) {
    var chunk bigStruct
    // ...
    return chunk, nil
}
I modified myFunction2 to return the struct rather than the address of the struct. Compare the assembly output of myFunction1 and myFunction2 now,
--- prog list "myFunction1" ---
0000 (s.go:5) TEXT myFunction1+0(SB),$16-24
0001 (s.go:6) MOVQ $type."".MyStructType+0(SB),(SP)
0002 (s.go:6) CALL ,runtime.new+0(SB)
0003 (s.go:6) MOVQ 8(SP),AX
0004 (s.go:8) MOVQ AX,.noname+0(FP)
0005 (s.go:8) MOVQ $0,.noname+8(FP)
0006 (s.go:8) MOVQ $0,.noname+16(FP)
0007 (s.go:8) RET ,
--- prog list "myFunction2" ---
0008 (s.go:11) TEXT myFunction2+0(SB),$0-16
0009 (s.go:12) LEAQ chunk+0(SP),DI
0010 (s.go:12) MOVQ $0,AX
0011 (s.go:14) LEAQ .noname+0(FP),BX
0012 (s.go:14) LEAQ chunk+0(SP),BX
0013 (s.go:14) MOVQ $0,.noname+0(FP)
0014 (s.go:14) MOVQ $0,.noname+8(FP)
0015 (s.go:14) RET ,
Don't worry that the myFunction1 output here is different from that in peterSO's (excellent) answer; we're obviously running different compilers. Otherwise, see that I modified myFunction2 to return MyStructType rather than *MyStructType. The call to runtime.new is gone, which in some cases would be a good thing. Hold on though, here's myFunction3:
--- prog list "myFunction3" ---
0016 (s.go:21) TEXT myFunction3+0(SB),$8000000-8000016
0017 (s.go:22) LEAQ chunk+-8000000(SP),DI
0018 (s.go:22) MOVQ $0,AX
0019 (s.go:22) MOVQ $1000000,CX
0020 (s.go:22) REP ,
0021 (s.go:22) STOSQ ,
0022 (s.go:24) LEAQ chunk+-8000000(SP),SI
0023 (s.go:24) LEAQ .noname+0(FP),DI
0024 (s.go:24) MOVQ $1000000,CX
0025 (s.go:24) REP ,
0026 (s.go:24) MOVSQ ,
0027 (s.go:24) MOVQ $0,.noname+8000000(FP)
0028 (s.go:24) MOVQ $0,.noname+8000008(FP)
0029 (s.go:24) RET ,
Still no call to runtime.new, and yes it really works to return an 8MB object by value. It works, but you usually wouldn't want to. The point of a pointer here would be to avoid pushing around 8MB objects.
type MyStructType struct{}

func myFunction1() (*MyStructType, error) {
    var chunk *MyStructType = new(MyStructType)
    // ...
    return chunk, nil
}

func myFunction2() (*MyStructType, error) {
    var chunk MyStructType
    // ...
    return &chunk, nil
}
In both cases, current implementations of Go allocate memory for a struct of type MyStructType on the heap and return its address. The functions are equivalent; the generated assembly is the same:
--- prog list "myFunction1" ---
0000 (temp.go:9) TEXT myFunction1+0(SB),$8-12
0001 (temp.go:10) MOVL $type."".MyStructType+0(SB),(SP)
0002 (temp.go:10) CALL ,runtime.new+0(SB)
0003 (temp.go:10) MOVL 4(SP),BX
0004 (temp.go:12) MOVL BX,.noname+0(FP)
0005 (temp.go:12) MOVL $0,AX
0006 (temp.go:12) LEAL .noname+4(FP),DI
0007 (temp.go:12) STOSL ,
0008 (temp.go:12) STOSL ,
0009 (temp.go:12) RET ,
--- prog list "myFunction2" ---
0010 (temp.go:15) TEXT myFunction2+0(SB),$8-12
0011 (temp.go:16) MOVL $type."".MyStructType+0(SB),(SP)
0012 (temp.go:16) CALL ,runtime.new+0(SB)
0013 (temp.go:16) MOVL 4(SP),BX
0014 (temp.go:18) MOVL BX,.noname+0(FP)
0015 (temp.go:18) MOVL $0,AX
0016 (temp.go:18) LEAL .noname+4(FP),DI
0017 (temp.go:18) STOSL ,
0018 (temp.go:18) STOSL ,
0019 (temp.go:18) RET ,
Calls
In a function call, the function value and arguments are evaluated in
the usual order. After they are evaluated, the parameters of the call
are passed by value to the function and the called function begins
execution. The return parameters of the function are passed by value
back to the calling function when the function returns.
All function and return parameters are passed by value. The return parameter value with type *MyStructType is an address.
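A small sketch of what passing a pointer "by value" means in practice (my example, not from the quoted spec):

package main

import "fmt"

type MyStructType struct{ n int }

func mutate(p *MyStructType) {
    p.n = 42                // modifies the pointee the caller also sees
    p = &MyStructType{n: 7} // reassigns only the local copy of the pointer
}

func main() {
    s := MyStructType{n: 1}
    p := &s
    mutate(p)
    fmt.Println(s.n, p == &s) // 42 true: the pointer itself was copied
}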
According to Go's FAQ:
if the compiler cannot prove that the variable is not referenced after
the function returns, then the compiler must allocate the variable on
the garbage-collected heap to avoid dangling pointer errors.
You don't always know if your variable is allocated on the stack or heap.
...
If you need to know where your variables are allocated pass the "-m" gc flag to "go build" or "go run" (e.g., go run -gcflags -m app.go).
Source: http://devs.cloudimmunity.com/gotchas-and-common-mistakes-in-go-golang/index.html#stack_heap_vars
Here is another discussion of stack, heap, and GC in A Guide to the Go Garbage Collector:
Where Go Values Live
stack allocation
non-pointer Go values stored in local variables will likely not be managed by the Go GC at all, and Go will instead arrange for memory to be allocated that's tied to the lexical scope in which it's created. In general, this is more efficient than relying on the GC, because the Go compiler is able to predetermine when that memory may be freed and emit machine instructions that clean up. Typically, we refer to allocating memory for Go values this way as "stack allocation," because the space is stored on the goroutine stack.
heap allocation
Go values whose memory cannot be allocated this way, because the Go compiler cannot determine its lifetime, are said to escape to the heap. "The heap" can be thought of as a catch-all for memory allocation, for when Go values need to be placed somewhere. The act of allocating memory on the heap is typically referred to as "dynamic memory allocation" because both the compiler and the runtime can make very few assumptions as to how this memory is used and when it can be cleaned up. That's where a GC comes in: it's a system that specifically identifies and cleans up dynamic memory allocations.
There are many reasons why a Go value might need to escape to the heap. One reason could be that its size is dynamically determined. Consider for instance the backing array of a slice whose initial size is determined by a variable, rather than a constant. Note that escaping to the heap must also be transitive: if a reference to a Go value is written into another Go value that has already been determined to escape, that value must also escape.
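Both causes are easy to provoke (a sketch of mine; compile with go build -gcflags=-m to see the compiler's verdicts):

package main

// Dynamically sized: the backing array's length depends on n,
// so it generally cannot be stack-allocated.
func makeBuf(n int) []byte {
    return make([]byte, n) // make([]byte, n) escapes to heap
}

var sink *int

// Transitive escape: x is stored into something that has already
// escaped (a global), so x must escape as well.
func leak() {
    x := 42
    sink = &x // moved to heap: x
}

func main() {
    _ = makeBuf(10)
    leak()
}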
Escape analysis
As for how to access the information from the Go compiler's escape analysis, the simplest way is through a debug flag supported by the Go compiler that describes all optimizations it applied or did not apply to some package in a text format. This includes whether or not values escape. Try the following command, where [package] is some Go package path.
$ go build -gcflags=-m=3 [package]
Implementation-specific optimizations
The Go GC is sensitive to the demographics of live memory, because a complex graph of objects and pointers both limits parallelism and generates more work for the GC. As a result, the GC contains a few optimizations for specific common structures. The most directly useful ones for performance optimization are listed below.
Pointer-free values are segregated from other values.
As a result, it may be advantageous to eliminate pointers from data structures that do not strictly need them, as this reduces the cache pressure the GC exerts on the program. As a result, data structures that rely on indices over pointer values, while less well-typed, may perform better. This is only worth doing if it's clear that the object graph is complex and the GC is spending a lot of time marking and scanning.
The GC will stop scanning values at the last pointer in the value.
As a result, it may be advantageous to group pointer fields in struct-typed values at the beginning of the value. This is only worth doing if it's clear the application spends a lot of its time marking and scanning. (In theory the compiler can do this automatically, but it is not yet implemented, and struct fields are arranged as written in the source code.)
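As a sketch of the field-ordering point (my example; whether it helps depends entirely on the workload):

package layout

// The GC scans a value only up to its last pointer field, so grouping
// pointers at the front shrinks the region that must be scanned.

type pointersLast struct {
    a, b, c int64   // non-pointer data first...
    name    *string // ...pointer last: the scanned prefix spans the whole struct
}

type pointersFirst struct {
    name    *string // pointer first: scanning can stop after this field
    a, b, c int64
}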
func Function1() (*MyStructType, error) {
    var chunk *MyStructType = new(MyStructType)
    ...
    return chunk, nil
}

func Function2() (*MyStructType, error) {
    var chunk MyStructType
    ...
    return &chunk, nil
}
Function1 and Function2 may be inlined, in which case the return variable does not escape and need not be allocated on the heap.
My example code:
package main

type S struct {
    x int
}

func main() {
    F1()
    F2()
    F3()
}

func F1() *S {
    s := new(S)
    return s
}

func F2() *S {
    s := S{x: 10}
    return &s
}

func F3() S {
    s := S{x: 9}
    return s
}
According to the output of the command:
go run -gcflags -m test.go
output:
# command-line-arguments
./test.go:13:6: can inline F1
./test.go:18:6: can inline F2
./test.go:23:6: can inline F3
./test.go:7:6: can inline main
./test.go:8:4: inlining call to F1
./test.go:9:4: inlining call to F2
./test.go:10:4: inlining call to F3
/var/folders/nr/lxtqsz6x1x1gfbyp1p0jy4p00000gn/T/go-build333003258/b001/_gomod_.go:6:6: can inline init.0
./test.go:8:4: main new(S) does not escape
./test.go:9:4: main &s does not escape
./test.go:14:10: new(S) escapes to heap
./test.go:20:9: &s escapes to heap
./test.go:19:2: moved to heap: s
If the compiler is smart enough, the calls to F1(), F2(), and F3() may be eliminated entirely, because their results are never used.
Don't worry about whether a variable is allocated on the heap or the stack; just use it, and protect it with a mutex or a channel if necessary.