Store net.Conn by value or reference? - go

My app uses a sync.Map to store open socket-connections which are accessed concurrently through multiple goroutines.
I'm wondering whether to store these connections as structs net.Conn or as references *net.Conn.
What are the benefits/drawbacks of both options and what would be the preferred solution?

While @blackgreen is correct, I'd expand a bit on the reasoning.
The sync.Map type is explicitly defined to operate on interface{}.
Now remember that in Go, an interface is not merely an abstraction used by the type system; instead, you can have values of interface types, and the in-memory representation of such values is a struct containing two pointers—to an internal object describing the dynamic type of the value stored in the variable, and to the value itself (or a copy of it created on the heap by the runtime).
This means that if you store a pointer to anything in sync.Map, that pointer is converted to a value of type interface{}, and it occupies exactly the same space in sync.Map as any other interface value.
If, instead, you store values of type net.Conn there directly, they are stored as they are, simply because they are already interface values, so Go just copies the pair of pointers.
On the surface, this looks like both methods are on par in terms of the space used but bear with me.
To store a pointer to a net.Conn value in a container data type such as sync.Map, the program must make sure that the value is allocated on the heap (as opposed to directly on the stack of the currently running goroutine), and this may force the compiler to heap-allocate the original net.Conn variable.
In other words, storing a pointer to a variable of interface type might be (and usually will be—due to the way typical code is organized) more wasteful in terms of memory use.
Add to this that pointer dereferencing (pointer chasing) tends to thrash the CPU cache; that's not a game changer but might add up to a couple of µs when you iterate over collections in tight loops.
Having said that, I would advise against outright dismissing storing pointers in containers like sync.Map: occasionally it comes in handy. For instance, to reuse arrays for slices, you usually store pointers to the first elements of such arrays.

Related

Is it correct to use slice as *[]Item, because Slice is by default pointer

What is the right way to use a slice in Go? As per the Go documentation, a slice is by default a pointer, so is creating a slice as *[]Item the right way? Since slices are pointers by default, isn't creating the slice this way making it a pointer to a pointer?
I feel the right way to create a slice is []Item or []*Item (a slice holding pointers to items).
A bit of theory
Your question makes no sense as asked: there's no "right" or "wrong", "correct" or "incorrect": you can have a pointer to a slice, and you can have a pointer to a pointer to a slice, and you can add levels of such indirection endlessly.
What to do depends on what you need in a particular case.
To help you with the reasoning, I'll try to provide a couple of facts and draw some conclusions.
The first two things to understand about types and values in Go are:
Everything in Go, ever, always, is passed by value.
This means variable assignments (= and :=), passing values to function and method calls, and copying memory which happens internally such as when reallocating backing arrays of slices or rebalancing maps.
Passing by value means that actual bits of the value which is assigned are physically copied into the variable which "receives" the value.
Types in Go—both built-in and user-defined (including those defined in the standard library)—can have value semantics and reference semantics when it comes to assignment.
This one is a bit tricky, and often leads to novices incorrectly assuming that the first rule explained above does not hold.
"The trick" is that if a type contains a pointer (an address of a variable) or consists of a single pointer, the value of this pointer is copied when the value of the type is copied.
What does this mean?
Pretty simple: if you assign the value of a variable of type int to another variable of type int, both variables contain identical bits but they are completely independent: change the content of either of them, and the other will be unaffected.
If you assign a variable containing a pointer (or consisting of a single pointer) to another one, they both, again, will contain identical bits and are independent in the sense that changing those bits in any of them will not affect the other.
But since the pointer in both these variables contains the address of the same memory location, using those pointers to modify the contents of the memory location they point at will modify the same memory.
In other words, the difference is that an int does not reference anything while a pointer naturally references another memory location—because it contains its address.
Hence, if a type contains at least a single pointer (it may do so by containing a field of another type which itself contains a pointer, and so on—to any nesting level), values of this type will have reference assignment semantics: if you assign a value to another variable, you end up with two values referencing the same memory location.
That is why maps, slices and strings have reference semantics: when you assign variables of these types both variables point to the same underlying memory location.
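To make the two rules above concrete, here is a small sketch. The pair type and the copyThenModify function are hypothetical names invented for this illustration; the point is only that assignment copies bits in both cases, while a copied pointer still leads to the same memory.

```go
package main

import "fmt"

// pair is a hypothetical type mixing a plain value with a pointer.
type pair struct {
	plain  int  // copied bit for bit; each copy is independent
	shared *int // the address is copied; both copies point at the same int
}

// copyThenModify copies a pair, changes the copy, and reports what the
// original looks like afterwards.
func copyThenModify() (plain, shared int) {
	target := 0
	a := pair{plain: 1, shared: &target}
	b := a        // all bits of a are copied into b
	b.plain = 100 // touches only b
	*b.shared = 7 // writes through the copied pointer into target
	return a.plain, *a.shared
}

func main() {
	plain, shared := copyThenModify()
	fmt.Println(plain, shared) // prints "1 7"
}
```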
Let's move on to slices.
Slices vs pointers to slices
A slice, logically, is a struct of three fields: a pointer to the slice's backing array which actually contains the slice's elements, and two ints: the capacity of the slice and its length.
When you pass around and assign a slice value, these struct values are copied: a pointer and two integers.
As you can see, when you pass a slice value around the backing array is not copied—only a pointer to it.
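A short sketch of what this means in practice, using hypothetical function names: the callee gets its own copy of the three-field header, so element writes are visible to the caller while a change of length is not.

```go
package main

import "fmt"

// overwriteAndGrow receives a copy of the caller's slice header.
func overwriteAndGrow(s []int) {
	s[0] = 42         // writes into the backing array shared with the caller
	s = append(s, 99) // changes only the local copy of the header
}

// demo reports what the caller observes after the call.
func demo() (first, length int) {
	s := []int{1, 2, 3}
	overwriteAndGrow(s)
	return s[0], len(s)
}

func main() {
	first, length := demo()
	fmt.Println(first, length) // prints "42 3"
}
```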
Now let's consider when you want to use a plain slice or a pointer to a slice.
If you're concerned with performance (memory allocation and/or CPU cycles needed to copy memory), these concerns are unfounded: copying three integers when passing around a slice is dirt-cheap on today's hardware.
Using a pointer to a slice would make copying a tiny bit faster—a single integer rather than three—but these savings will be easily offset by two facts:
The slice's value will almost certainly end up being allocated on the heap so that the compiler can be sure its value will survive crossing boundaries of the function calls—so you will pay for using the memory manager and the garbage collector will have more work.
Using a level of indirection reduces data locality: accessing RAM is slow so CPUs have caches which prefetch data at the addresses following the one at which the data is being read. If the control flow immediately reads memory at another location, the prefetched data is thrown away: cache thrashing.
OK, so is there a case when you would want a pointer to a slice?
Yes. For instance, the built-in append function could have been defined as
func append(*[]T, T...)
instead of
func append([]T, T...) []T
(N.B. the T here actually means "any type" because append is not a library function and cannot be sensibly defined in plain Go; so it's sort of pseudocode.)
That is, it could accept a pointer to a slice and possibly replace the slice pointed to by the pointer, so you'd call it as append(&slice, element) and not as slice = append(slice, element).
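For illustration only, such a pointer-taking variant can be sketched as an ordinary function in Go versions that support type parameters (1.18 and later). The name appendTo is hypothetical; it is not part of the language or the standard library.

```go
package main

import "fmt"

// appendTo is a hypothetical pointer-based variant of the built-in append:
// it replaces the slice that s points to instead of returning a new one.
func appendTo[T any](s *[]T, elems ...T) {
	*s = append(*s, elems...)
}

func main() {
	nums := []int{1}
	appendTo(&nums, 2, 3) // no "nums = ..." needed at the call site
	fmt.Println(nums)     // prints "[1 2 3]"
}
```

The call site no longer needs the assignment, at the price of taking the address of the slice variable.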
But honestly, in real-world Go projects I have dealt with, the only case of using pointers to slices which I can remember was about pooling slices which are heavily reused—to save on memory reallocations. And that sole case was only due to sync.Pool keeping elements of type interface{} which may be more effective when using pointers¹.
Slices of values vs slices of pointers to values
Exactly the same logic described above applies to the reasoning about this case.
When you put a value in a slice that value is copied. When the slice needs to grow its backing array, the array will be reallocated, and reallocation means physically copying all existing elements into the new memory location.
So, two considerations:
Are elements reasonably small so that copying them is not going to press on memory and CPU resources?
(Note that "small" vs "big" also heavily depends on the frequency of such copying in a working program: copying a couple of megabytes once in a while is not a big deal; copying even tens of kilobytes in a tight time-critical loop can be a big deal.)
Is your program OK with multiple copies of the same data (for instance, values of certain types like sync.Mutex must not be copied after first use)?
If the answer to either question is "no", you should consider keeping pointers in the slice. But when you consider keeping pointers, also think about data locality explained above: if a slice contains data intended for time-critical number-crunching, it's better not to have the CPU chase pointers.
To recap: when you ask about a "correct" or "right" way of doing something, the question makes no sense without specifying the criteria by which we could classify all possible solutions to a problem. Still, there are considerations which must be taken into account when designing the way you're going to store and manipulate data, and I have tried to explain them.
In general, a rule of thumb regarding slices could be:
Slices are designed to be passed around "as is"—as values, not pointers to variables containing their values.
There are legitimate reasons to have pointers to slices, though.
Most of the time you keep values in the slice's elements, not pointers to variables with these values.
Exceptions to this general rule:
Values you intend to store in a slice occupy too much space so that it looks like the envisioned pattern of using slices of them would involve excessive memory pressure.
Types of values you intend to store in a slice require that they not be copied but only referenced, each existing as a single instance. A good example is a type containing or embedding a field of type sync.Mutex (or, for that matter, of most other types from the sync package, which are documented as not to be copied after first use): if you lock a mutex, copy its value and then unlock the copy, the original stays locked, which means you have a grave bug in your code.
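The difference between the two kinds of slices can be sketched as follows. The item type and the function names are hypothetical; the example only shows that ranging over a slice of values hands you copies, while ranging over a slice of pointers hands you copies of addresses.

```go
package main

import "fmt"

// item is a hypothetical element type.
type item struct{ count int }

// bumpValues ranges over values: each iteration works on a copy.
func bumpValues(items []item) {
	for _, it := range items {
		it.count++ // increments the loop's private copy only
		_ = it
	}
}

// bumpPointers ranges over pointers: the copy is of the address.
func bumpPointers(items []*item) {
	for _, it := range items {
		it.count++ // increments the pointed-to item
	}
}

// demo reports the counters seen by the caller afterwards.
func demo() (byValue, byPointer int) {
	values := []item{{}}
	pointers := []*item{{}}
	bumpValues(values)
	bumpPointers(pointers)
	return values[0].count, pointers[0].count
}

func main() {
	byValue, byPointer := demo()
	fmt.Println(byValue, byPointer) // prints "0 1"
}
```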
A note of caution on correctness vs performance
The text above contains a lot of performance considerations.
I've presented them because Go is a reasonably low-level language: not as low-level as C, C++ or Rust, but still providing the programmer with plenty of wiggle room when performance is at stake.
Still, you should understand very well that at this point on your learning curve, correctness must be your top (if not the sole) objective: please take no offence, but if you were after tuning some Go code to shave off some CPU time, you wouldn't be asking this question in the first place.
In other words, please consider all of the above as a set of facts and considerations to guide you in your learning and exploration of the subject, but do not fall into the trap of trying to think about performance first. Make your programs correct and easy to read and modify.
¹ An interface value is a pair of pointers: to the variable containing the value you have put into the interface value and to a special data structure inside the Go runtime which describes the type of that variable.
So while you can put a slice value into a variable of type interface{} directly—in the sense that it's perfectly fine in the language—if the value's type is not itself a single pointer, the compiler will have to allocate on the heap a variable to contain a copy of your value there, and store a pointer to that new variable into the value of type interface{}.
This is needed to hold that "everything is always passed by value" semantics of the Go assignments.
Consequently, if you put a slice value into a variable of type interface{}, you will end up with a copy of that value on the heap.
Because of this, keeping pointers to slices in data structures such as sync.Map makes code uglier but results in less memory churn.
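The pooling case mentioned in the main text could look roughly like this. It is a sketch under assumptions: bufPool and shout are hypothetical names, the initial capacity of 64 is arbitrary, and the any alias requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out *[]byte rather than []byte, so that putting a buffer
// back does not need to box a three-word slice header into an interface.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 64)
		return &b
	},
}

// shout is a hypothetical helper using a pooled buffer as scratch space.
func shout(msg string) string {
	bp := bufPool.Get().(*[]byte)
	buf := (*bp)[:0] // reuse the backing array, drop old contents
	buf = append(buf, msg...)
	buf = append(buf, '!')
	out := string(buf) // copy the bytes out before releasing the buffer
	*bp = buf          // keep any growth for the next user
	bufPool.Put(bp)
	return out
}

func main() {
	fmt.Println(shout("hello")) // prints "hello!"
	fmt.Println(shout("bye"))   // prints "bye!"
}
```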

How does gc handle slice memory reclaim

var a = [...]int{1,2,3,4,5,6}
s1 := a[2:4:5]
Suppose s1 goes out of scope later than a. How does gc know to reclaim the memory of s1's underlying array a?
Consider the runtime representation of s1 (spec):
type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}
The GC doesn't even know about the beginning of a.
Go uses a mark-and-sweep collector in its present implementation.
As per the algorithm, the collector starts from root objects and the rest forms a tree-like structure reachable from them; on multi-core machines the GC runs concurrently with the program.
The GC traverses this structure, and anything that is not reachable is considered free.
Go also keeps metadata for objects, as stated in this post.
An excerpt:
We needed to have some information about the objects since we didn't have headers. Mark bits are kept on the side and used for marking as well as allocation. Each word has 2 bits associated with it to tell you if it was a scalar or a pointer inside that word. It also encoded whether there were more pointers in the object so we could stop scanning objects sooner than later.
The reason Go's slices (slice headers) are structures instead of pointers to structures is documented by Russ Cox on this page, in the section on slices.
This is an excerpt:
Go originally represented a slice as a pointer to the structure (slice header), but doing so meant that every slice operation allocated a new memory object. Even with a fast allocator, that creates a lot of unnecessary work for the garbage collector, and we found that, as was the case with strings, programs avoided slicing operations in favor of passing explicit indices. Removing the indirection and the allocation made slices cheap enough to avoid passing explicit indices in most cases.
The size(length) of an array is part of its type. The types [1]int and [2]int are distinct.
One thing to remember is that Go is a value-oriented language: instead of storing pointers, variables store values directly.
Arrays such as [3]int are values in Go, so if you pass an array, the whole array is copied.
A [3]int is a single value, taken as a whole.
When you write a[1], you are accessing part of that value.
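A small sketch of the array-as-value behaviour, with hypothetical function names: passing the array copies all of it, while passing a slice of it copies only a header that points back at the same elements.

```go
package main

import "fmt"

// zeroFirst receives a full copy of the array.
func zeroFirst(arr [3]int) {
	arr[0] = 0 // modifies the copy only
}

// zeroFirstViaSlice receives a header pointing at the caller's array.
func zeroFirstViaSlice(s []int) {
	s[0] = 0 // modifies the caller's array
}

// demo reports a[0] after each call.
func demo() (afterArray, afterSlice int) {
	a := [3]int{1, 2, 3}
	zeroFirst(a)
	afterArray = a[0]
	zeroFirstViaSlice(a[:])
	afterSlice = a[0]
	return afterArray, afterSlice
}

func main() {
	afterArray, afterSlice := demo()
	fmt.Println(afterArray, afterSlice) // prints "1 0"
}
```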
The Data field of SliceHeader says: consider this address, rather than &a[0], as the base of the array.
As far as I know:
When one requests a[4], the address
&a[0] + sizeof(type)*4
is calculated.
Now if you access an element through a slice s = a[2:4],
and request s[1], what is actually requested is
&a[2] + sizeof(type)*1
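The address arithmetic above can be checked without any unsafe code by comparing element addresses. This sketch uses the array and slice expression from the question; the demo function is a hypothetical name. It shows that the slice's base is an address inside a, which is the kind of interior pointer the collector has to be able to map back to the containing allocation.

```go
package main

import "fmt"

// demo slices the array from the question and reports whether s1[1]
// and a[3] are the same memory location, plus the slice's len and cap.
func demo() (sameElem bool, length, capacity int) {
	a := [...]int{1, 2, 3, 4, 5, 6}
	s1 := a[2:4:5]
	// s1[1] lives at &a[2] plus one element, that is, at &a[3].
	return &s1[1] == &a[3], len(s1), cap(s1)
}

func main() {
	same, length, capacity := demo()
	fmt.Println(same, length, capacity) // prints "true 2 3"
}
```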

Using pointer to channel

Is it good practice to use a pointer to a channel? For example, I read data concurrently, pass it as map[string]string values over a channel, and process this channel inside getSameValues().
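func getSameValues(results *chan map[string]string) []string {
    // len(*results) counts only the items currently buffered, so grow with
    // append instead of indexing into a fixed-length slice.
    datas := make([]map[string]string, 0, len(*results))
    for values := range *results {
        datas = append(datas, values)
    }
    // ... processing of datas is not shown in the question ...
    return nil
}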
The reason I do this is that the chan map[string]string will carry millions of entries inside each map, and there will be more than one map.
So I think it would be a good approach to pass a pointer to the function so that the data is not copied, to save some memory.
I didn't find a good practice for this in Effective Go, so I'm in some doubt about my approach here.
It is poor practice to use pointers to channels, maps, functions, interfaces, or slices for efficiency.
Values of these types have a small fixed size independent of the length or capacity of the value. An internal pointer references the variable size data.
Channels, maps, and functions are the same size as a pointer. Therefore, the runtime cost of copying a value of these types is identical to copying a pointer to the value.
Interfaces are twice the size of a pointer, and slices are three times the size of a pointer. The cost of copying a value of these types is higher than copying a pointer. That extra copying cost is often lower than or equal to the cost of dereferencing the pointer.
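The sizes quoted above can be checked with unsafe.Sizeof. This is a sketch assuming the standard gc toolchain; sizesInWords is a hypothetical helper that expresses each size in units of one pointer-sized word.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sizesInWords reports the size of a channel, map, function, interface
// and slice value, each divided by the size of a pointer-sized word.
func sizesInWords() (ch, m, fn, iface, sl uintptr) {
	var (
		c  chan int
		mp map[string]string
		f  func()
		i  interface{}
		s  []int
	)
	word := unsafe.Sizeof(uintptr(0))
	return unsafe.Sizeof(c) / word,
		unsafe.Sizeof(mp) / word,
		unsafe.Sizeof(f) / word,
		unsafe.Sizeof(i) / word,
		unsafe.Sizeof(s) / word
}

func main() {
	ch, m, fn, iface, sl := sizesInWords()
	fmt.Println(ch, m, fn, iface, sl) // prints "1 1 1 2 3"
}
```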
In Go, there are six categories of value that behave like references when passed, because the value that gets copied is itself (or contains) a pointer. These are pointers, slices, maps, channels, interfaces and functions.
Copying a reference value and copying a pointer should be considered equal in terms of what the CPU has to actually do (at least as a good approximation).
So it is almost never useful to use pointers to channels, just like it is rarely useful to use pointers to maps.
Because your channel carries maps, both the channel and the maps are reference types, so all the CPU is doing is copying pointers around. In the case of the channel, it also does goroutine synchronisation.
For further reading, open Effective Go and search the page for the word 'reference'.
Everything in Golang is passed by value. Even pointers are a type and assigned the value of the memory address. So they are values too.
(Extending Rick's answer) There are actually six kinds of types that hold pointer values, and a pointer to a value of these types (i.e. a pointer to a pointer) doesn't help anyway:
pointers
slices
maps
channels
interfaces
functions
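Following the answers above, the function from the question can take the channel by value. This is only a sketch: collect and produceAndCollect are hypothetical names, and the two sample maps stand in for the asker's real data.

```go
package main

import "fmt"

// collect takes the channel by value; only a pointer-sized handle is
// copied, and the maps received from it are not copied either.
func collect(results <-chan map[string]string) []map[string]string {
	var all []map[string]string
	for m := range results {
		all = append(all, m)
	}
	return all
}

// produceAndCollect sends two sample maps and returns how many arrived.
func produceAndCollect() int {
	ch := make(chan map[string]string)
	go func() {
		defer close(ch) // lets the range loop in collect terminate
		ch <- map[string]string{"id": "1"}
		ch <- map[string]string{"id": "2"}
	}()
	return len(collect(ch))
}

func main() {
	fmt.Println(produceAndCollect()) // prints "2"
}
```

Declaring the parameter as a receive-only channel also documents that the function never sends on it.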

In golang, what happens to a variable after it goes out of scope of a loop or a condition or a case?

Please don't mark this as a duplicate question, as this is more specific to golang and requests advice on best practice when declaring variables to store large byte arrays when reading from a channel.
Forgive me for this dumb question, but the reason for it is just my curiosity about what could be the best practice for writing a high-performance stream consumer reading large byte arrays from multiple channels (although premature optimization is the root of all evil, this is more of a curiosity). I have read answers about a similar scenario specific to C here, but I am requesting an answer specific to Go, as it is a garbage-collected language, and its documentation here says "From a correctness standpoint, you don't need to know where the variable is allocated".
If I have the following code to read from a channel,
for {
select {
case msg := <-stream.Messages():
...snip...
Variable msg is within the scope of the case statement.
What happens once it goes out of the scope of the case statement? Since this is declared in the same native function, and the size of stream could be a large byte slice, is the variable going to be stored on the heap or the stack, and if on the heap, will it be garbage collected, or does the stack pointer come into the picture?
Since this is inside an infinite for loop, and the size of stream is a large byte slice, is creating the variable and allocating memory every time an overhead, or should I declare the variable ahead of the loop and keep overwriting it in every iteration, so that if there is garbage collection involved, which I am not sure about, I could possibly reduce the garbage?
Shouldn't I be bothered about it at all?
Thank you.
Shouldn't I be bothered about it at all?
No.
(And once it bothers you: profile.)
If the channel value type is a slice, the value of the variable msg is just the slice descriptor, which is small (see https://blog.golang.org/go-slices-usage-and-internals). The array that contains the data the slice refers to will have been allocated elsewhere before the slice was placed on the channel. Assuming the value must survive after the function that allocated it returns, it will be on the heap. Note that the contents of the slice are not actually being moved or copied by the channel receive operation.
Once the value of msg becomes unreachable (by the variable going out of scope or being assigned a different value), assuming there are no other references to the array underlying the slice, it will be subject to garbage collection.
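That a channel receive moves only the slice descriptor can be seen in a small sketch. The receiveShares function is a hypothetical name, and a buffered channel is used only so that the example runs in a single goroutine.

```go
package main

import "fmt"

// receiveShares sends a byte slice over a channel, modifies the received
// value, and reports whether the sender's slice sees the change.
func receiveShares() bool {
	payload := []byte("hello")
	ch := make(chan []byte, 1)
	ch <- payload
	msg := <-ch  // copies the slice descriptor, not the bytes
	msg[0] = 'J' // writes into the array payload also refers to
	return payload[0] == 'J'
}

func main() {
	fmt.Println(receiveShares()) // prints "true"
}
```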
It's hard to say whether some amount of optimization would be helpful without knowing more about how the program works.

Go implicit conversion to interface does memory allocation?

When defining a function with variadic arguments of type interface{} (e.g. Printf), the arguments are apparently implicitly converted to interface instances.
Does this conversion imply memory allocation? Is this conversion fast? When concerned by code efficiency, should I avoid using variadic functions?
The best explanation I have found about interface memory allocation in Go is still this article by Russ Cox, one of the core Go developers. It's well worth reading.
http://research.swtch.com/interfaces
I picked up some of the most interesting parts:
Values stored in interfaces might be arbitrarily large, but only one
word is dedicated to holding the value in the interface structure, so
the assignment allocates a chunk of memory on the heap and records the
pointer in the one-word slot.
...
Calling fmt.Printf(), the Go compiler generates code that calls the
appropriate function pointer from the itable, passing the interface
value's data word as the function's first (in this example, only)
argument.
Go's dynamic type conversions mean that it isn't reasonable for the
compiler or linker to precompute all possible itables: there are too
many (interface type, concrete type) pairs, and most won't be needed.
Instead, the compiler generates a type description structure for each
concrete type like Binary or int or func(map[int]string). Among other
metadata, the type description structure contains a list of the
methods implemented by that type.
...
The interface runtime computes the itable by looking for each method
listed in the interface type's method table in the concrete type's
method table. The runtime caches the itable after generating it, so
that this correspondence need only be computed once.
...
If the interface type involved is empty—it has no methods—then the
itable serves no purpose except to hold the pointer to the original
type. In this case, the itable can be dropped and the value can point
at the type directly.
Because Go has the hint of static typing to go along with the dynamic method lookups, it can move the lookups back from the call sites to the point when the value is stored in the interface.
Converting to an interface{} is a separate concept from variadic arguments which are contained in a slice and can be of any type. However these are all probably free in the sense of allocations as long as they don't escape to the heap (in the GC toolchain).
The excess allocations you would see from fmt functions like Printf are going to be from reflection rather than from the use of interface{} or variadic arguments.
If you're concerned with efficiency though, avoiding indirection will always be more efficient than not, so using the correct value types will yield more efficient code. The difference can be minimal though, so benchmark the code first before concerning yourself with minor optimizations.
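To separate the two concepts mentioned above, here is a small sketch. The dynamicTypes function is a hypothetical name; it shows that the variadic parameter is an ordinary []interface{} and that each argument is converted to an interface value carrying its dynamic type. It does not measure allocations; for that, a benchmark run with go test -benchmem is the usual tool.

```go
package main

import "fmt"

// dynamicTypes receives its arguments as a []interface{} and returns the
// dynamic type name held by each interface value.
func dynamicTypes(args ...interface{}) []string {
	names := make([]string, 0, len(args))
	for _, a := range args {
		names = append(names, fmt.Sprintf("%T", a))
	}
	return names
}

func main() {
	// Each argument is implicitly converted to interface{}.
	fmt.Println(dynamicTypes(1, "a", 2.5)) // prints "[int string float64]"

	// An existing []interface{} can be passed as is with "...".
	packed := []interface{}{true, 'x'}
	fmt.Println(dynamicTypes(packed...)) // prints "[bool int32]"
}
```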
Go passes arguments by value, so they are copied in any case. You should avoid using interface{} where possible. In the described case your function will need to reflect on the arguments to use them. Reflection is quite an expensive operation, which is why fmt.Printf() is so slow.
