Goroutine Channel, Copy vs Pointer - go

Both functions are doing the same task, which is initializing the Data struct. What are the pros and cons of each function? E.g. if the function should unmarshal a big JSON file.
package main

type Data struct {
	i int
}

func funcp(c chan *Data) {
	var t *Data
	t = <-c // receive
	t.i = 10
}

func funcv(c chan Data) {
	var t Data
	t.i = 20
	c <- t // send
}

func main() {
	c := make(chan Data)
	cp := make(chan *Data)
	var t Data
	go funcp(cp)
	cp <- &t // send
	println(t.i)
	go funcv(c)
	t = <-c // receive
	println(t.i)
}

The title of your question seems wrong. You are asking not about swapping things but rather about whether to send a pointer to some data or a copy of some data. More importantly, the overall thrust of your question lacks crucial information.
Consider two analogies:
Which is better, chocolate ice cream, or strawberry? That's probably a matter of opinion, but at least both will serve similar purposes.
Which is better, a jar of glue or a brick of C4? That depends on whether you want to build something, or blow something up, doesn't it?
If you send a copy of data through a channel, the receiver gets ... a copy. The receiver does not have access to the original. The copying process may take some time, but the fact that the receiver does not have to share access may speed things up. So this is something of an opinion, and if your question is about which is faster, well, you'll have to benchmark it. Be sure to benchmark the real problem, and not a toy example, because benchmarks on toy examples don't translate to real-world performance.
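For what it's worth, the mechanics of such a benchmark look like this; a minimal sketch for a _test.go file (using the testing package), to be filled in with your real types and workload rather than this toy Data struct:
func BenchmarkSendValue(b *testing.B) {
	c := make(chan Data, 1)
	d := Data{i: 1}
	for i := 0; i < b.N; i++ {
		c <- d // the whole struct is copied into the channel
		<-c
	}
}

func BenchmarkSendPointer(b *testing.B) {
	c := make(chan *Data, 1)
	d := &Data{i: 1}
	for i := 0; i < b.N; i++ {
		c <- d // only the pointer is copied
		<-c
	}
}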
If you send a pointer to data through a channel, the receiver gets a copy of the pointer, and can therefore modify the original data. Copying the pointer is fast, but the fact that the receiver has to share access may slow things down. But if the receiver must be able to modify the data, you have no choice. You must use a tool that works, and not one that does not.
In your two functions, one generates values (funcv) so it does not have to send pointers. That's fine, and gives you the option. The other (funcp) receives objects but wants to update them so it must receive a pointer to the underlying object. That's fine too, but it means that you are now communicating by sharing (the underlying data structure), which requires careful coordination.
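To make that coordination concrete, here is a minimal sketch of your pointer case (the done channel is my addition, not part of your code) in which main only reads t.i after funcp has finished writing to the shared struct:
type Data struct {
	i int
}

func funcp(c chan *Data, done chan struct{}) {
	t := <-c // receive a copy of the pointer
	t.i = 10 // modify the shared, underlying struct
	done <- struct{}{}
}

func main() {
	cp := make(chan *Data)
	done := make(chan struct{})
	var t Data
	go funcp(cp, done)
	cp <- &t     // send: funcp now shares t with main
	<-done       // wait until funcp is done writing
	println(t.i) // 10
}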

Related

Re-using the same encoder/decoder for the same struct type in Go without creating a new one

I was looking for the quickest/efficient way to store Structs of data to persist on the filesystem. I came across the gob module which allows encoders and decoders to be set up for structs to convert to []byte (binary) that can be stored.
This was relatively easy - here's a decoding example:
// Per item get request
// binary = []byte for the encoded binary from database
// target = struct receiving what's being decoded
func Get(path string, target *SomeType) {
	binary := someFunctionToGetBinaryFromSomeDB(path)
	dec := gob.NewDecoder(bytes.NewReader(binary))
	dec.Decode(target)
}
However, when I benchmarked this against the JSON encoder/decoder, I found it to be almost twice as slow. This was especially noticeable when I created a loop to retrieve all structs. Upon further research, I learned that creating a NEW decoder every time is really expensive; in the loop below, 5000 or so decoders are created.
// Imagine 5000 items in total
func GetAll(target *[]SomeType) {
	results := getAllBinaryStructsFromSomeDB()
	for results.next() {
		binary := results.getBinary()
		// Making a new decoder 5000 times
		dec := gob.NewDecoder(bytes.NewReader(binary))
		var item SomeType
		dec.Decode(&item)
		// ... append item to *target
	}
}
I'm stuck here trying to figure out how I can recycle (reduce reuse recycle!) a decoder for list retrieval. Understanding that the decoder takes an io.Reader, I was thinking it would be possible to 'reset' the io.Reader and use the same reader at the same address for a new struct retrieval, while still using the same decoder. I'm not sure how to go about doing that and I'm wondering if anyone has any ideas to shed some light. What I'm looking for is something like this:
// Imagine 5000 items in total
func GetAll(target *[]SomeType) {
	// Set up some kind of recyclable reader
	var binary []byte
	reader := bytes.NewReader(binary)
	// Make decoder based on that reader
	dec := gob.NewDecoder(reader)
	results := getAllBinaryStructsFromSomeDB()
	for results.next() {
		// Insert some kind of binary / decoder reset
		// Then do something like:
		reader.WriteTo(results.nextBinary())
		var item SomeType
		dec.Decode(&item) // except of course this won't work
		// ... append item to *target
	}
}
Thanks!
I was looking for the quickest/efficient way to store Structs of data to persist on the filesystem
Instead of serializing your structs, represent your data primarily in a pre-made data store that fits your usage well. Then model that data in your Go code.
This may seem like the hard way or the long way to store data, but it will solve your performance problem by intelligently indexing your data and allowing filtering to be done without a lot of filesystem access.
I was looking for ... data to persist.
Let's start there as a problem statement.
gob module allows encoders and decoders to be set up for structs to convert to []byte (binary) that can be stored.
However, ... I found it to be ... slow.
It would be. You'd have to go out of your way to make data storage any slower. Every object you instantiate from your storage will have to come from a filesystem read. The operating system will cache these small files well, but you'll still be reading the data every time.
Every change will require rewriting all the data, or cleverly determining which data to write to disk. Recall that there is no "insert between" operation for files; you'll be rewriting all bytes after to add bytes in the middle of a file.
You could do this concurrently, of course, and goroutines handle a bunch of async work like filesystem reads very well. But now you've got to start thinking about locking.
My point is, for the cost of trying to serialize your structures you can better describe your data at the persistent layer, and solve problems you're not even working on yet.
SQL is a pretty obvious choice, since you can make it work with sqlite as well as other sql servers that scale well; I hear mongodb is easy to wrangle these days, and depending on what you're doing with the data, redis has a number of attractive list, set and k/v operations that can easily be made atomic and consistent.
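As a rough illustration of the sqlite route (a sketch only: it assumes the mattn/go-sqlite3 driver and an items table invented for the example), the filtering happens in the store instead of in filesystem reads:
import (
	"database/sql"

	_ "github.com/mattn/go-sqlite3" // assumed driver
)

type Item struct {
	ID   int64
	Name string
}

// loadItems lets the database do the filtering and indexing.
func loadItems(db *sql.DB, prefix string) ([]Item, error) {
	rows, err := db.Query(`SELECT id, name FROM items WHERE name LIKE ?`, prefix+"%")
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var out []Item
	for rows.Next() {
		var it Item
		if err := rows.Scan(&it.ID, &it.Name); err != nil {
			return nil, err
		}
		out = append(out, it)
	}
	return out, rows.Err()
}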
The encoder and decoder are designed to work with streams of values. The encoder writes information describing a Go type to the stream once before transmitting the first value of the type. The decoder retains received type information for decoding subsequent values.
The type information written by the encoder is dependent on the order that the encoder encounters unique types, the order of fields in structs and more. To make sense of the stream, a decoder must read the complete stream written by a single encoder.
It is not possible to recycle decoders because of the way that type information is transmitted.
To make this more concrete, the following does not work:
var v1, v2 Type
var buf bytes.Buffer
gob.NewEncoder(&buf).Encode(v1)
gob.NewEncoder(&buf).Encode(v2)
var v3, v4 Type
d := gob.NewDecoder(&buf)
d.Decode(&v3)
d.Decode(&v4)
Each call to Encode writes information about Type to the buffer. The second call to Decode fails because a duplicate type is received.
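What does work is to keep the encoder and decoder paired with a single stream: write every value with one encoder and read them all back with one decoder. A minimal sketch (SomeType and the io.Reader/io.Writer plumbing are assumptions about your setup):
func writeAll(w io.Writer, items []SomeType) error {
	enc := gob.NewEncoder(w) // type info is written once, before the first value
	for _, it := range items {
		if err := enc.Encode(it); err != nil {
			return err
		}
	}
	return nil
}

func readAll(r io.Reader) ([]SomeType, error) {
	dec := gob.NewDecoder(r) // one decoder consumes the complete stream
	var out []SomeType
	for {
		var it SomeType
		err := dec.Decode(&it)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, it)
	}
}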

Is it necessary to return pointer type in sync.Pool New function?

I saw the issue on Github which says sync.Pool should be used only with pointer types, for example:
var TPool = sync.Pool{
	New: func() interface{} {
		return new(T)
	},
}
Does it make sense? What about returning T{} instead? Which is the better choice, and why?
The whole point of sync.Pool is to avoid (expensive) allocations. Large-ish buffers, etc. You allocate a few buffers and they stay in memory, available for reuse. Hence the use of pointers.
But if New returns a T{} value instead, you'll be copying the value on every step, defeating the purpose. (Assuming your T is a "normal" struct and not something like SliceHeader.)
It is not necessary. In most cases it should be a pointer as you want to share an object, not to make copies.
In some use cases this can be a non-pointer type, like an ID of some external resource. I can imagine a pool of paths (mounted disk drives) represented as strings, where some large file operations are being conducted.
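For illustration, a minimal sketch of the pointer-returning pool (the buffer size and the reset step are invented details):
type T struct {
	buf [4096]byte // large-ish, worth reusing
	n   int
}

var TPool = sync.Pool{
	New: func() interface{} { return new(T) },
}

func process() {
	t := TPool.Get().(*T) // a pointer: the 4 KB array is not copied
	defer TPool.Put(t)    // hand the same object back for reuse
	t.n = 0               // reset any state before using it
	// ... work with t.buf ...
}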

Go error: gob: type sync.Mutex has no exported fields

I am having an issue where I can't save a struct as a gob if it has an exported sync.Mutex. Everything seems to work if I make the mutex unexported (by not capitalizing it). I'm curious to understand why this is and make sure that there's no other issue with gobbing an unexported sync.Mutex.
I see that there are several hits on Google for a related problem with sync.RWMutex, but none really explain why this happens.
package main

import (
	"encoding/gob"
	"fmt"
	"os"
	"sync"
)

func writeGob(filePath string, object interface{}) error {
	file, err := os.Create(filePath)
	if err != nil {
		return err
	}
	defer file.Close()
	encoder := gob.NewEncoder(file)
	err = encoder.Encode(object)
	return err
}

type Dog struct {
	Name   string
	GobMux sync.Mutex
}

func main() {
	d := &Dog{Name: "Fido"}
	err := writeGob("./gob", d)
	fmt.Printf("Error: %v\n", err)
}
Output:
Error: gob: type sync.Mutex has no exported fields
Gob Encoding
Everything seems to work if I make the mutex unexported (by not capitalizing it). I'm curious to understand why this is.
As Cerise mentioned, there is an open issue for this, but in short it is normally a programming error to try to encode a struct (such as a mutex) that has no exported fields.
There are some ways to work around this particular problem, though.
You can make the mutex private and wrap the lock/unlock in a public function, rather than reaching into the struct to manipulate the mutex. E.g.
func (d *Dog) SetName(name string) {
	d.mux.Lock() // mux is now an unexported field, so gob ignores it
	d.Name = name
	d.mux.Unlock()
}
You can also wrap the type and pull the mutex out:
type Dog struct {
	Name string
}

type DogWrapper struct {
	Dog    *Dog
	GobMux sync.Mutex
}
This is fairly cumbersome if you have many small structs but for a smaller number of more complex structs it might be OK.
Finally, the "correct" way to solve this problem is to write your own GobEncode/GobDecode routines. There are some sparse examples in the stdlib, such as time.Time's GobEncode, but generally this seems like quite a bit of work.
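For the Dog type above it is not too much work, though. A rough sketch (it encodes only the Name field, uses the bytes package, and leaves the mutex in its zero, unlocked state on decode):
func (d *Dog) GobEncode() ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(d.Name); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func (d *Dog) GobDecode(data []byte) error {
	// The mutex is never touched, so it comes back as its zero value: unlocked.
	return gob.NewDecoder(bytes.NewReader(data)).Decode(&d.Name)
}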
Mutexes in General
...and make sure that there's no other issue with gobbing an unexported sync.Mutex.
Mutexes are tightly coupled to the Go runtime's in-process memory and scheduler. They help the Go runtime decide which goroutines are allowed to read or write to a particular part of memory, and also decide when those goroutines may be scheduled (i.e. a goroutine waiting on a mutex to unlock will not be scheduled until the mutex is unlocked).
If you use Gob to copy a data structure to another process, the recipient process's runtime has completely different internal state compared to the process that is sending the gob, and as such the mutex cannot be logically transferred. Copying a mutex to another process would be a bit like using Earth GPS coordinates on Mars. They just don't match up.
when I read back a Dog object from a gob, it looks like the mutex is unlocked regardless of the state of the mutex when it was saved to a gob. Is this behavior I can count on?
As the documentation for Mutex states, "The zero value for a Mutex is an unlocked mutex." So yes, you can rely on this behavior.
Other Encoders
In my opinion, despite gob's presence in the stdlib, it does not receive much attention because there are many other mature encoding options available. If gob doesn't meet your needs, consider JSON, Cap'n Proto, net/rpc, etc., which have different characteristics that may work better for you.

Are there any advantages to having a defer in a simple, no return, non-panicking function?

Going through the standard library, I see a lot of functions similar to the following:
// src/database/sql/sql.go
func (dc *driverConn) removeOpenStmt(ds *driverStmt) {
	dc.Lock()
	defer dc.Unlock()
	delete(dc.openStmt, ds)
}

...

func (db *DB) addDep(x finalCloser, dep interface{}) {
	//println(fmt.Sprintf("addDep(%T %p, %T %p)", x, x, dep, dep))
	db.mu.Lock()
	defer db.mu.Unlock()
	db.addDepLocked(x, dep)
}

// src/expvar/expvar.go
func (v *Map) addKey(key string) {
	v.keysMu.Lock()
	defer v.keysMu.Unlock()
	v.keys = append(v.keys, key)
	sort.Strings(v.keys)
}

// etc...
I.e.: simple functions with no returns and presumably no way to panic that are still deferring the unlock of their mutex. As I understand it, the overhead of a defer has been improved (and perhaps is still in the process of being improved), but that being said: Is there any reason to include a defer in functions like these? Couldn't these types of defers end up slowing down a high traffic function?
Always deferring things like Mutex.Unlock() and WaitGroup.Done() at the top of the function makes debugging and maintenance easier: you see immediately that those important pieces are taken care of and can quickly move on to other issues.
It's not a big deal in 3 line functions, but consistent-looking code is also just easier to read in general. Then as the code grows, you don't have to worry about adding an expression that may panic, or complicated early return logic, because the pre-existing defers will always work correctly.
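As a hypothetical illustration, if the expvar function above later grows an early return, the existing defer keeps every exit path correct without touching any other line:
func (v *Map) addKey(key string) {
	v.keysMu.Lock()
	defer v.keysMu.Unlock()
	for _, k := range v.keys {
		if k == key {
			return // early return added later: the defer still unlocks
		}
	}
	v.keys = append(v.keys, key)
	sort.Strings(v.keys)
}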
A panic is a sudden (and therefore possibly unpredicted and unprepared-for) violation of normal control flow. It can potentially emerge from almost anything, quite often from external causes, for example a memory failure. The defer mechanism gives you an easy and quite cheap tool to perform exit operations, so the system is not left in a broken state. This is important for locks in high-load applications, because it helps you avoid losing locked resources and freezing the whole system on a lock.
And even if, at the moment, the code has no place that can panic (hard to imagine such a system ;) things evolve. Later the function may become more complex and able to panic.
Conclusion: defer helps you ensure your function exits correctly if something "goes wrong". Just as important, it is future-proof: the same cleanup runs no matter which failure occurs.
So it is good style to use it even in simple functions. As a programmer you can see that nothing is lost, and you can be more confident in the code.

Golang error function arguments too large for new goroutine

I am running a program with Go 1.4 and I am trying to pass a large struct to a function started as a goroutine.
go ProcessImpression(network, &logImpression, campaign, actualSpent, partnerAccount, deviceId, otherParams)
I get this error:
runtime.newproc: function arguments too large for new goroutine
I have switched to passing a pointer, which helps, but I am wondering if there is some way to pass large structs by value to a goroutine.
Thanks,
No, none I know of.
I don't think you should be too aggressive tuning to avoid copying, but it appears from the source that this error is emitted when the parameters exceed the usable stack space for a new goroutine, which should be kilobytes. The copying overhead is real at that point, especially if this isn't the only time these things are copied. Perhaps some struct is larger than expected, either explicitly, thanks to a large struct member (a 1 kB array rather than a slice, say), or indirectly. If not, just using a pointer as you have makes sense, and if you're worried about creating garbage, recycle the structs pointed to using sync.Pool.
I was able to fix this issue by changing the arguments from
func doStuff(prev, next User)
to
func doStuff(prev, next *User)
The answer from @twotwotwo here is very helpful.
Got this issue when processing a list of values ([]BigType) of a big struct:
for _, stct := range listBigStcts {
	go func(stct BigType) {
		// ... process stct ...
	}(stct) // <-- error occurs here
}
The workaround is to replace []BigType with []*BigType, as sketched below.
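A minimal sketch of that workaround (the slice now holds pointers, so only a pointer is copied onto each new goroutine's stack):
// listBigStcts is now a []*BigType
for _, stct := range listBigStcts {
	go func(stct *BigType) {
		// ... process stct ...
	}(stct) // only the pointer is passed to the goroutine
}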
