Any side effects to having a large number of mutexes? - go

In my Go code, I have a number of structs that need mutexes:
type vitalMacs struct {
mu *sync.RWMutex
macs map[string]bool
}
//macInfo would store oui info
type macInfo struct {
mu *sync.RWMutex
infoDict map[string]*string
}
var (
NetOPMutex sync.RWMutex
ipMacGlobal = make(map[string]string)
macInfoVar = macInfo{mu: &NetOPMutex}
vitalMacsVar = vitalMacs{mu: &NetOPMutex}
//vitalMacsVar = vitalMacs{mu: &sync.RWMutex{}} // <- like this
)
I could use NetOPMutex for all of them, or give each struct its own mutex via &sync.RWMutex{}, which made me wonder:
Are there any side effects to having a large number of different mutexes in your code?
Not trying to avoid mutexes, you gotta guard what you gotta guard, no question there.
I know that if I use the single mutex, things would have to wait for each other and play nice. On the other hand, would having a large number of different mutexes (30+) manifest as higher cpu util or as more time in user vs kernel mode etc?

sync.Mutex and sync.RWMutex are struct types, so having mutexes is the same thing as having ordinary struct values. Are there any side effects of having many struct values in your app? Obviously not.
Creating a mutex does not involve magic. It's equivalent to just creating a struct value.
Using mutexes also does not involve magic. Check the sources: Mutex.Lock() and Mutex.Unlock(). They do not launch goroutines or do something that consumes CPU.
Do not use a single mutex just to save / spare some memory. Use multiple mutexes if appropriate to reduce / mitigate contention for the locks.
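As a minimal sketch of the "one mutex per struct" option, assuming the structs from the question: each type below embeds its own sync.RWMutex by value, since the zero value is ready to use. The method names (set, isVital, get) are hypothetical, and infoDict is simplified to map[string]string for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// vitalMacs guards its map with its own lock.
// The zero value of sync.RWMutex is usable, so no pointer is needed.
type vitalMacs struct {
	mu   sync.RWMutex
	macs map[string]bool
}

func (v *vitalMacs) set(mac string, vital bool) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.macs[mac] = vital
}

func (v *vitalMacs) isVital(mac string) bool {
	v.mu.RLock()
	defer v.mu.RUnlock()
	return v.macs[mac]
}

// macInfo has a separate lock, so writers to macInfo never block
// readers or writers of vitalMacs.
type macInfo struct {
	mu       sync.RWMutex
	infoDict map[string]string
}

func (m *macInfo) set(mac, info string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.infoDict[mac] = info
}

func (m *macInfo) get(mac string) (string, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	info, ok := m.infoDict[mac]
	return info, ok
}

func main() {
	vm := &vitalMacs{macs: make(map[string]bool)}
	mi := &macInfo{infoDict: make(map[string]string)}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			mac := fmt.Sprintf("00:00:00:00:00:%02x", i)
			vm.set(mac, true)
			mi.set(mac, "example vendor")
		}(i)
	}
	wg.Wait()

	fmt.Println(vm.isVital("00:00:00:00:00:01"))
}
```

With this layout the two maps only contend with themselves. Whether that matters in practice depends on how hot each lock is, so it is worth measuring.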

Related

What is the best way to evict unused records from string pool?

I am implementing a cache in Golang. Let's say the cache could be implemented as sync.Map with integer key and value as a struct:
type value struct {
fileName string
functionName string
}
Huge number of records have the same fileName and functionName. To save memory I want to use string pool. Go has immutable strings and my idea looks like:
var (
cache sync.Map
stringPool sync.Map
)
type value struct {
fileName string
functionName string
}
func addRecord(key int64, val value) {
fileName, _ := stringPool.LoadOrStore(val.fileName, val.fileName)
val.fileName = fileName.(string)
functionName, _ := stringPool.LoadOrStore(val.functionName, val.functionName)
val.functionName = functionName.(string)
cache.Store(key, val)
}
My idea is to keep every unique string (fileName and functionName) in memory once. Will it work?
Cache implementation must be concurrent safe. The number of records in the cache is about 10^8. The number of records in the string pool is about 10^6.
I have some logic that removes records from the cache. There is no problem with main cache size.
Could you please suggest how to manage string pool size?
I am thinking about storing a reference count for every record in the string pool. That would require additional synchronization, or probably global locks, to maintain. I would like to keep the implementation as simple as possible. As you can see in my code snippet, I don't use additional mutexes.
Or maybe I need to follow a completely different approach to minimize memory usage for my cache?
What you are trying to do with stringPool is commonly known as string interning. There are libraries like github.com/josharian/intern that provide "good enough" solutions to that kind of problem, and that do not require you to manually maintain the stringPool map. Note that no solution (including yours, assuming you eventually remove some elements from stringPool) can reliably deduplicate 100% of strings without incurring impractical levels of CPU overhead.
As a side note, it's worth pointing out that sync.Map is not really designed for update-heavy workloads. Depending on the keys used, you may actually experience significant contention when calling cache.Store. Furthermore, since sync.Map relies on interface{} for both keys and values, it normally incurs many more allocations than a plain map. Make sure to benchmark with realistic workloads to ensure that you picked the right approach.
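For illustration only, here is a minimal sketch of a hand-rolled interner built on a plain map and an RWMutex rather than sync.Map. The interner type and its intern, reset, and size methods are hypothetical names, not part of any library. Dropping the whole pool in reset is one crude assumption about how to bound its size: strings still referenced from the cache stay alive, and strings seen again afterwards are simply interned anew, so deduplication is not perfect.

```go
package main

import (
	"fmt"
	"sync"
)

// interner is a hypothetical string pool guarded by a single RWMutex.
type interner struct {
	mu   sync.RWMutex
	pool map[string]string
}

func newInterner() *interner {
	return &interner{pool: make(map[string]string)}
}

// intern returns the canonical copy of s, storing s if it is new.
func (in *interner) intern(s string) string {
	in.mu.RLock()
	c, ok := in.pool[s]
	in.mu.RUnlock()
	if ok {
		return c
	}

	in.mu.Lock()
	defer in.mu.Unlock()
	// Re-check: another goroutine may have stored s in the meantime.
	if c, ok := in.pool[s]; ok {
		return c
	}
	in.pool[s] = s
	return s
}

// reset drops every entry. Strings still held elsewhere remain valid.
func (in *interner) reset() {
	in.mu.Lock()
	in.pool = make(map[string]string)
	in.mu.Unlock()
}

// size reports the number of pooled strings.
func (in *interner) size() int {
	in.mu.RLock()
	defer in.mu.RUnlock()
	return len(in.pool)
}

func main() {
	in := newInterner()
	a := in.intern("main.go")
	b := in.intern(string([]byte("main.go"))) // equal content, separate allocation
	fmt.Println(a == b, in.size())

	// Hypothetical size policy: start over once the pool grows too large.
	const maxEntries = 1 << 20
	if in.size() > maxEntries {
		in.reset()
	}
}
```

A typed map avoids the interface{} boxing mentioned above, at the cost of a single lock that all writers share. Benchmark both against a realistic workload before choosing.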

Goroutine Channel, Copy vs Pointer

Both functions perform the same task, which is initializing a Data struct. What are the pros and cons of each function? For example, suppose the function has to unmarshal a big JSON file.
package main
type Data struct {
i int
}
func funcp(c chan *Data) {
var t *Data
t = <-c //receive
t.i = 10
}
func funcv(c chan Data) {
var t Data
t.i = 20
c <- t //send
}
func main() {
c := make(chan Data)
cp := make(chan *Data)
var t Data
go funcp(cp)
cp <- &t //send
println(t.i)
go funcv(c)
t = <- c //receive
println(t.i)
}
The title of your question seems wrong. You are asking not about swapping things but rather about whether to send a pointer to some data or a copy of some data. More importantly, the overall thrust of your question lacks crucial information.
Consider two analogies:
Which is better, chocolate ice cream, or strawberry? That's probably a matter of opinion, but at least both serve similar purposes.
Which is better, a jar of glue or a brick of C4? That depends on whether you want to build something, or blow something up, doesn't it?
If you send a copy of data through a channel, the receiver gets ... a copy. The receiver does not have access to the original. The copying process may take some time, but the fact that the receiver does not have to share access may speed things up. So this is something of an opinion, and if your question is about which is faster, well, you'll have to benchmark it. Be sure to benchmark the real problem, and not a toy example, because benchmarks on toy examples don't translate to real-world performance.
If you send a pointer to data through a channel, the receiver gets a copy of the pointer, and can therefore modify the original data. Copying the pointer is fast, but the fact that the receiver has to share access may slow things down. But if the receiver must be able to modify the data, you have no choice. You must use a tool that works, and not one that does not.
In your two functions, one generates values (funcv) so it does not have to send pointers. That's fine, and gives you the option. The other (funcp) receives objects but wants to update them so it must receive a pointer to the underlying object. That's fine too, but it means that you are now communicating by sharing (the underlying data structure), which requires careful coordination.
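To make the coordination point concrete, here is a minimal sketch of both styles. The names fillByPointer, produceByValue, runPointer, and runValue are hypothetical, and the done channel is an added assumption, not part of the question's code: without something like it, reading t.i right after sending &t races with the goroutine that writes it.

```go
package main

import "fmt"

type Data struct {
	i int
}

// fillByPointer receives a pointer, mutates the caller's value,
// and signals completion by closing done.
func fillByPointer(c <-chan *Data, done chan<- struct{}) {
	t := <-c
	t.i = 10
	close(done)
}

// produceByValue builds its own Data and sends a copy of it.
func produceByValue(c chan<- Data) {
	c <- Data{i: 20}
}

// runPointer shares t with the goroutine, so it must wait on done
// before reading t.i.
func runPointer() int {
	cp := make(chan *Data)
	done := make(chan struct{})
	var t Data
	go fillByPointer(cp, done)
	cp <- &t
	<-done
	return t.i
}

// runValue needs no extra signal: the receive itself hands over
// a private copy.
func runValue() int {
	c := make(chan Data)
	go produceByValue(c)
	t := <-c
	return t.i
}

func main() {
	fmt.Println(runPointer()) // prints 10
	fmt.Println(runValue())   // prints 20
}
```

In the value version the channel receive is the only synchronization needed. In the pointer version the shared struct needs its own happens-before edge, which is the "careful coordination" mentioned above.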

how to understand the relation between uintptr and struct?

I have learned code like the following
func str2bytes(s string) []byte {
x := (*[2]uintptr)(unsafe.Pointer(&s))
h := [3]uintptr{x[0], x[1], x[1]}
return *(*[]byte)(unsafe.Pointer(&h))
}
This function converts a string to []byte without copying the data.
I tried to convert Num to ReverseNum in the same way:
type Num struct {
name int8
value int8
}
type ReverseNum struct {
value int8
name int8
}
func main() {
n := Num{100, 10}
z := (*[2]uintptr)(unsafe.Pointer(&n))
h := [2]uintptr{z[1], z[0]}
fmt.Println(*(*ReverseNum)(unsafe.Pointer(&h))) // print result is {0, 0}
}
This code doesn't produce the result I want.
Can anybody tell me about
That's too complicated.
A simpler program:
package main
import (
"fmt"
"unsafe"
)
type Num struct {
name int8
value int8
}
type ReverseNum struct {
value int8
name int8
}
func main() {
n := Num{name: 42, value: 12}
p := (*ReverseNum)(unsafe.Pointer(&n))
fmt.Println(p.value, p.name)
}
outputs "42 12".
But the real question is why on Earth would you want to go for such trickery instead of copying two freaking bytes which is done instantly on any sensible CPU Go programs run on?
Another problem with your approach is that IIUC nothing in the Go language specification guarantees that two types which have seemingly identical fields must have identical memory layouts. I believe they do on most implementations, but I do not think they are required to.
Also consider that seemingly innocuous things like also having an extra field (even of type struct{}!) in your data type may do interesting things to memory layouts of the variables of those types, so it may be outright dangerous to assume you may reinterpret memory of Go variables the way you want.
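A minimal sketch of the plain-copy alternative follows, together with a way to inspect layouts instead of assuming them. The toReverse function and the padded type (Num plus a zero-size blank field) are hypothetical additions for illustration. The sizes and offsets it prints are implementation details, so no expected values are given for them.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Num struct {
	name  int8
	value int8
}

type ReverseNum struct {
	value int8
	name  int8
}

// padded is a hypothetical variant of Num with an extra zero-size field.
type padded struct {
	name  int8
	value int8
	_     struct{}
}

// toReverse copies field by field; no assumptions about layout are needed.
func toReverse(n Num) ReverseNum {
	return ReverseNum{value: n.value, name: n.name}
}

func main() {
	r := toReverse(Num{name: 42, value: 12})
	fmt.Println(r.name, r.value) // prints "42 12"

	// Inspect the layout rather than guessing it.
	fmt.Println(unsafe.Sizeof(Num{}), unsafe.Sizeof(padded{}))
	fmt.Println(unsafe.Offsetof(r.value), unsafe.Offsetof(r.name))
}
```

Note that the field copy keeps name and value matched by name, whereas the pointer cast reinterprets the bytes by position.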
... I just want to learn about the principle behind the package unsafe.
It's an escape hatch.
All strongly-typed but compiled languages have a basic problem: the actual machines on which the compiled programs will run do not have the same typing system as the compiler. [1] That is, the machine itself probably has a linear address space where bytes are assembled into machine words that are grouped into pages, and so on. The operating system may also provide access at, say, page granularity: if you need more memory, the OS will give you one page (4096 bytes, or 8192 bytes, or 65536 bytes, or whatever the page size is) of additional memory at a time.
There are many ways to attack this problem. For instance, one can write code directly in machine (or assembly) language, using the hardware's instruction set, to talk to the OS to achieve OS-level things. This code can then talk to the compiled program, acting as the go-between. If the compiled program needs to allocate a 40-byte data structure, this machine-level code can figure out how to do that within the strictures of the OS's page-size allocations.
But writing machine code is difficult and time-consuming. That's precisely why we have high-level languages and compilers in the first place. What if we had a way to, within the high-level language, violate the normal rules imposed by the language? By violating specific requirements in specific ways, carefully coordinating those ways with all other code that also violates those requirements, we can, in code we keep away from the usual application programming, write much of our memory-management, process-management, and so on in our high-level language.
In other words, we can use unsafe (or something similar in other languages) to deliberately break the type-safety provided by our high level language. When we do this—when we break the rules—we must know what all the rules are, and that our specific violations here will function correctly when combined with all the normal code that does obey the normal rules and when combined with all the special, unsafe code that breaks the rules.
This often requires help from the compiler itself. If you inspect the runtime source distributed with Go, you will find routines with annotations like go:noescape, go:noinline, go:nosplit, and go:nowritebarrier. You need to know when and why these are required if you are going to make much use of some of the escape-hatch programming.
A few of the simpler uses, such as tricks to gain access to string or slice headers, are ... well, they are still unsafe, but they are unsafe in more-predictable ways and do not require this kind of close coordination with the compiler itself.
To understand how, when, and why they work, you need to understand how the compiler and runtime allocate and work with strings and slices, and in some cases, how memory is laid out on the hardware, and some of the rules about Go's garbage collector. In particular, the GC code is aware of unsafe.Pointer but not of uintptr. Much of this is pretty tricky: see, e.g., https://utcc.utoronto.ca/~cks/space/blog/programming/GoUintptrVsUnsafePointer and the link to https://github.com/golang/go/issues/19135, in which writing nil to a Go pointer value caused Go's garbage collector to complain, because the write caused the GC to inspect the previously stored value, which was invalid.
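As a hedged aside on the string-to-slice trick: assuming Go 1.20 or newer, the unsafe package offers unsafe.StringData and unsafe.Slice, which express the same conversion without round-tripping pointers through uintptr. The sketch below reuses the name str2bytes from the question. It is still unsafe: the returned slice aliases the string's memory and must never be written to.

```go
package main

import (
	"fmt"
	"unsafe"
)

// str2bytes returns a []byte sharing memory with s (assumes Go 1.20+).
// The caller must treat the result as read-only, since strings are immutable.
func str2bytes(s string) []byte {
	if s == "" {
		return nil
	}
	// The pointer stays an unsafe-aware pointer the whole time,
	// so the garbage collector can keep tracking the backing memory.
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := str2bytes("hello")
	fmt.Println(len(b), cap(b), string(b)) // prints "5 5 hello"
}
```

Because no value is ever held as a bare uintptr, the GC concern described above does not arise here in the same way.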
[1] See this Wikipedia article on the Intel 432 for a notable attempt at designing hardware to run compiled high level languages. There have been others in the past as well, often with the same fate, though some IBM projects have been more successful.

Is it necessary to return pointer type in sync.Pool New function?

I saw the issue on Github which says sync.Pool should be used only with pointer types, for example:
var TPool = sync.Pool{
New: func() interface{} {
return new(T)
},
}
Does that make sense? What about returning T{} instead? Which is the better choice, and why?
The whole point of sync.Pool is to avoid (expensive) allocations. Large-ish buffers, etc. You allocate a few buffers and they stay in memory, available for reuse. Hence the use of pointers.
But here you'll be copying the values on every step, defeating the purpose. (Assuming your T is a "normal" struct and not something like SliceHeader)
It is not necessary. In most cases it should be a pointer as you want to share an object, not to make copies.
In some use cases this can be a non pointer type, like an id of some external resource. I can imagine a pool of paths (mounted disk drives) represented with strings where some large file operations are being conducted.
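A minimal sketch of the pointer-based pattern, assuming the pooled object is a *bytes.Buffer. The names bufPool and render are hypothetical. Storing a pointer means Get and Put move only the pointer through interface{}, whereas a non-pointer struct would be copied (and typically heap-allocated) each time it is boxed.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values.
var bufPool = sync.Pool{
	New: func() interface{} {
		return new(bytes.Buffer)
	},
}

// render borrows a buffer, uses it, and returns it to the pool.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold old contents
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so the result outlives the Put
}

func main() {
	fmt.Println(render("gopher")) // prints "hello, gopher"
}
```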

Go error: gob: type sync.Mutex has no exported fields

I am having an issue where I can't save a struct as a gob if it has an exported sync.Mutex. Everything seems to work if I make the mutex unexported (by not capitalizing it). I'm curious to understand why this is and make sure that there's no other issue with gobbing an unexported sync.Mutex.
I see that there are several hits on Google for a related problem with sync.RWMutex, but none really explain why this happens.
package main
import (
"sync"
"encoding/gob"
"os"
"fmt"
)
func writeGob(filePath string, object interface{}) error {
file, err := os.Create(filePath)
if err != nil {
return err
}
defer file.Close() // defer only after os.Create has succeeded
encoder := gob.NewEncoder(file)
err = encoder.Encode(object)
return err
}
type Dog struct {
Name string
GobMux sync.Mutex
}
func main() {
d := &Dog{Name: "Fido"}
err := writeGob("./gob", d)
fmt.Printf("Error: %v\n", err)
}
Output:
Error: gob: type sync.Mutex has no exported fields
Gob Encoding
Everything seems to work if I make the mutex unexported (by not capitalizing it). I'm curious to understand why this is.
As Cerise mentioned there is an open issue for this, but in short it is normally a programming error if you try to encode a struct (such as a mutex) which has no exported fields.
There are some ways to work around this particular problem, though.
You can make the mutex private (unexported) and wrap the lock/unlock in a public method, rather than reaching into the struct to manipulate the mutex. E.g., with the field renamed to gobMux:
func (d *Dog) SetName(name string) {
d.gobMux.Lock()
d.Name = name
d.gobMux.Unlock()
}
You can also wrap the type and pull the mutex out:
type Dog struct {
Name string
}
type DogWrapper struct {
Dog *Dog
GobMux sync.Mutex
}
This is fairly cumbersome if you have many small structs but for a smaller number of more complex structs it might be OK.
Finally, the "correct" way to solve this problem is to write your own GobEncode/GobDecode methods. There are some sparse examples in the stdlib, such as time.Time's GobEncode method, but generally this seems like quite a bit of work.
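As a rough sketch of what such methods could look like for Dog, assuming the mutex is unexported and only Name needs to survive the round trip: the dogWire helper type and the roundTrip function are hypothetical names introduced for illustration. (With an unexported mutex, gob would also work without custom methods, since it skips unexported fields. The custom methods mainly buy control over what is written and let the encode take the lock.)

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"sync"
)

type Dog struct {
	mu   sync.Mutex
	Name string
}

// dogWire is a hypothetical wire format holding only the data to persist.
type dogWire struct {
	Name string
}

// GobEncode implements gob.GobEncoder; the mutex itself is never written.
func (d *Dog) GobEncode() ([]byte, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(dogWire{Name: d.Name})
	return buf.Bytes(), err
}

// GobDecode implements gob.GobDecoder; the mutex keeps its zero (unlocked) value.
func (d *Dog) GobDecode(data []byte) error {
	var w dogWire
	if err := gob.NewDecoder(bytes.NewReader(data)).Decode(&w); err != nil {
		return err
	}
	d.mu.Lock()
	d.Name = w.Name
	d.mu.Unlock()
	return nil
}

// roundTrip encodes d into memory and decodes it into a fresh Dog.
func roundTrip(d *Dog) (*Dog, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(d); err != nil {
		return nil, err
	}
	out := new(Dog)
	if err := gob.NewDecoder(&buf).Decode(out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	out, err := roundTrip(&Dog{Name: "Fido"})
	fmt.Println(out.Name, err)
}
```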
Mutexes in General
...and make sure that there's no other issue with gobbing an unexported sync.Mutex.
Mutexes are tightly coupled to the Go runtime's in-process memory and scheduler. They help the Go runtime decide which goroutines are allowed to read or write to a particular part of memory, and also decide when those goroutines may be scheduled (i.e. a goroutine waiting on a mutex to unlock will not be scheduled until the mutex is unlocked).
If you use Gob to copy a data structure to another process, the recipient process's runtime has completely different internal state compared to the process that is sending the gob, and as such the mutex cannot be logically transferred. Copying a mutex to another process would be a bit like using Earth GPS coordinates on Mars. They just don't match up.
when I read back a Dog object from a gob, it looks like the mutex is unlocked regardless of the state of the mutex when it was saved to a gob. Is this behavior I can count on?
As the documentation for Mutex states, "The zero value for a Mutex is an unlocked mutex." So yes, you can rely on this behavior.
Other Encoders
In my opinion, despite gob's presence in the stdlib, it does not receive much attention because there are many other mature encoding options available. If gob doesn't meet your needs, alternatives such as JSON, Cap'n Proto, or net/rpc have different characteristics that may work better for you.
