How to implement Memory Pooling in Golang - memory-management

I implemented an HTTP server in Go.
For each request, I need to create hundreds of objects of a particular struct type, and I have ~10 structs like that. After the request finishes, those objects are garbage collected, as per Go's implementation.
So for each request, this amount of memory is allocated and then deallocated.
Instead, I want to implement memory pooling to improve performance on both the allocation side and the GC side.
At the beginning of a request, I would take objects from the pool and put them back after the request is served.
From the pool implementation side:
How do I allocate and deallocate memory for a particular type of struct?
How do I keep track of which memory is currently assigned and which is not?
Any other suggestions to improve performance in case of memory allocation and deallocation?

Note beforehand:
Many suggest using sync.Pool, which is a fast, good implementation for temporary objects. But note that sync.Pool does not guarantee that pooled objects are retained. Quoting from its doc:
Any item stored in the Pool may be removed automatically at any time without notification. If the Pool holds the only reference when this happens, the item might be deallocated.
So if you don't want the objects in your pool to be garbage collected (which, depending on your case, might result in more allocations), the solution presented below is better, as values in a channel's buffer are not garbage collected. If your objects are big enough to justify a memory pool, the overhead of the pool channel will be amortized.
Moreover, sync.Pool does not allow you to limit the number of pooled objects, while the solution presented below naturally does.
The simplest memory pool "implementation" is a buffered channel.
Let's say you want a memory pool of some big objects. Create a buffered channel holding pointers to values of such expensive objects, and whenever you need one, receive one from the pool (channel). When you're done using it, put it back to the pool (send on the channel). To avoid accidentally losing the objects (e.g. in case of a panic), use defer statement when putting them back.
Let's use this as the type of our big objects:
type BigObject struct {
Id int
Something string
}
Creating a pool is:
pool := make(chan *BigObject, 10)
The size of the pool is simply the size of the channel's buffer.
Filling the pool with pointers of expensive objects (this is optional, see notes at the end):
for i := 0; i < cap(pool); i++ {
bo := &BigObject{Id: i}
pool <- bo
}
Using the pool by many goroutines:
wg := sync.WaitGroup{}
for i := 0; i < 100; i++ {
wg.Add(1)
go func() {
defer wg.Done()
bo := <-pool
defer func() { pool <- bo }()
fmt.Println("Using", bo.Id)
fmt.Println("Releasing", bo.Id)
}()
}
wg.Wait()
Try it on the Go Playground.
Note that this implementation blocks if all the "pooled" objects are in use. If you don't want this, you may use select to force creating new objects if all are in use:
var bo *BigObject
select {
case bo = <-pool: // Try to get one from the pool
default: // All in use, create a new, temporary:
bo = &BigObject{Id:-1}
}
And in this case you don't need to put it back into the pool. Or you may choose to try to put all back into the pool if there's room in the pool, without blocking, again with select:
select {
case pool <- bo: // Try to put back into the pool
default: // Pool is full, will be garbage collected
}
Notes:
Filling the pool prior is optional. If you use select to try to get / put back values from / to the pool, the pool may initially be empty.
You have to make sure you're not leaking information between requests, e.g. make sure you don't use fields and values in your shared objects that were set and belong to other requests.
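The pieces above (a buffered channel, non-blocking get and put with select, and clearing state between uses) can be combined into one small type. This is only a minimal sketch: the names Pool, NewPool, Get and Put are hypothetical and not part of any library, and resetting the whole struct in Put is just one assumed way to avoid leaking data between requests.

```go
package main

import "fmt"

// BigObject mirrors the example type used above.
type BigObject struct {
	Id        int
	Something string
}

// Pool is a hypothetical wrapper around a buffered channel of *BigObject.
type Pool struct {
	ch chan *BigObject
}

// NewPool creates an initially empty pool that retains at most size objects.
func NewPool(size int) *Pool {
	return &Pool{ch: make(chan *BigObject, size)}
}

// Get returns a pooled object if one is available, otherwise a new one.
// It never blocks.
func (p *Pool) Get() *BigObject {
	select {
	case bo := <-p.ch:
		return bo
	default:
		return &BigObject{}
	}
}

// Put clears the object so no data carries over to the next user, then
// tries to return it to the pool. If the pool is full, the object is
// dropped and left to the garbage collector. It never blocks.
func (p *Pool) Put(bo *BigObject) {
	*bo = BigObject{}
	select {
	case p.ch <- bo:
	default:
	}
}

func main() {
	pool := NewPool(2)

	bo := pool.Get()
	bo.Id = 42
	bo.Something = "request-scoped data"
	pool.Put(bo)

	again := pool.Get()
	// Same pointer is reused, and its fields were reset by Put.
	fmt.Println(again == bo, again.Id, again.Something == "") // prints: true 0 true
}
```

In a request handler, the usual pattern would be bo := pool.Get() followed by defer pool.Put(bo), so the object goes back even if the handler panics.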

This is the sync.Pool implementation mentioned by @JimB. Note the use of defer to return the object to the pool.
package main
import "sync"
type Something struct {
Name string
}
var pool = sync.Pool{
New: func() interface{} {
return &Something{}
},
}
func main() {
s := pool.Get().(*Something)
defer pool.Put(s)
s.Name = "hello"
// use the object
}
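Because a pooled object may come back with the previous user's data still in it, it can help to clear it before calling Put. The following is a sketch under assumptions: the Reset method, the Tags field, and the handle function are hypothetical additions for illustration, not part of sync.Pool or of the answer above.

```go
package main

import (
	"fmt"
	"sync"
)

type Something struct {
	Name string
	Tags []string
}

// Reset clears request-specific state. Truncating Tags keeps its backing
// array so the next user can append without reallocating.
func (s *Something) Reset() {
	s.Name = ""
	s.Tags = s.Tags[:0]
}

var pool = sync.Pool{
	New: func() interface{} { return &Something{} },
}

// handle is a hypothetical request handler that borrows an object,
// uses it, and returns it in a clean state.
func handle(name string) string {
	s := pool.Get().(*Something)
	defer func() {
		s.Reset()
		pool.Put(s)
	}()
	s.Name = name
	s.Tags = append(s.Tags, "seen")
	return fmt.Sprintf("%s has %d tag(s)", s.Name, len(s.Tags))
}

func main() {
	fmt.Println(handle("first"))  // first has 1 tag(s)
	fmt.Println(handle("second")) // second has 1 tag(s)
}
```

Whether the second call reuses the first object is up to sync.Pool, but thanks to Reset the result is the same either way.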

Related

Golang concurrency access to slice

My use case:
append items (small structs) to a slice in the main process
every 100 items, I want to process the items in a processor goroutine (then pop them from the slice)
items come in very fast, continuously
I read that if two or more goroutines use a variable (a slice in my case) and at least one of them writes to it, one must handle the concurrency (mutex or similar).
My questions:
If I do not handle with a mutex the r/w on slice do I risk problems ? (ie. Item 101 arrives while the processor is working on 1-100s)
What is the best concurrency technique for the incoming item flow to remain "fluent" ?
Disclaimer:
I do not want any event queueing, I need to process items in a given "bundle" size
Actually you don't need a lock here, as long as only the main goroutine touches the buffer and each processor works on its own copy. Here is working code:
package main
import (
"fmt"
"sync"
)
type myStruct struct {
Cpt int
}
func main() {
buf := make([]myStruct, 0, 100)
wg := sync.WaitGroup{}
// Main process
// Appending ten million times (10e6 == 10,000,000)
for i := 0; i < 10e6; i++ {
// Appending (no lock needed: only this goroutine touches buf)
buf = append(buf, myStruct{Cpt: i})
// Did we reach 100 items ?
if len(buf) >= 100 {
// Yes we did. Creating a slice from the buffer
processSlice := make([]myStruct, 100)
copy(processSlice, buf[0:100])
// Emptying buffer
buf = buf[:0]
// Running processor in parallel
// Adding one element to waitgroup
wg.Add(1)
go processor(&wg, processSlice)
}
}
// Waiting for all processors to finish
wg.Wait()
}
func processor(wg *sync.WaitGroup, processSlice []myStruct) {
// Marking this processor as done in the waitgroup when it returns
defer wg.Done()
// Doing some processing
fmt.Printf("Processing items from %d to %d\n", processSlice[0].Cpt, processSlice[99].Cpt)
}
A few notes about your problem and this solution:
If you want to stall the feeding process as little as possible (e.g., to respond as fast as possible to an HTTP call), then the only work to do inline is the copy, with the processor function running in a goroutine. This means allocating a new process slice each time and copying the contents of your buffer into it.
The sync.WaitGroup object is needed to ensure that the last processor function has finished before the program exits.
Note that this is not a perfect solution: if you run this pattern for a long time, and items arrive more than 100 times faster than the processor finishes a slice (i.e., batches are produced faster than they are consumed), then there are going to be:
More and more processSlice instances in RAM -> risk of filling the RAM and hitting swap
More and more parallel processor goroutines -> the same risk for the RAM, plus more work running at the same time, which makes each call slower, so the problem feeds itself.
This will end up crashing the system at some point.
The solution is to use a limited number of workers, which ensures there is no crash. However, when all of those workers are busy, the feeding process has to wait, which is not what you asked for. Still, it is a good way to absorb a load whose intensity varies over time.
In general, just remember that if you feed in more data than you can process in the same amount of time, your program will reach a limit at some point where it can't keep up, so it has to either slow down input acquisition or crash. This is mathematical!
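The limited-workers variant described above could be sketched as follows. This is an illustration under assumptions, not the original author's code: the run function, the batchSize constant, the worker count, and flushing a final partial batch are all hypothetical choices. The producer blocks on the channel send whenever all workers are busy, which is exactly the back-pressure trade-off mentioned above.

```go
package main

import (
	"fmt"
	"sync"
)

type myStruct struct {
	Cpt int
}

const batchSize = 100

// run feeds total items through a fixed number of workers and returns
// how many items were processed.
func run(total, workers int) int {
	// Small queue: the producer blocks here once the workers are saturated.
	batches := make(chan []myStruct, workers)
	var wg sync.WaitGroup
	var mu sync.Mutex
	processed := 0

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range batches {
				// Stand-in for the real processing of one batch.
				mu.Lock()
				processed += len(batch)
				mu.Unlock()
			}
		}()
	}

	buf := make([]myStruct, 0, batchSize)
	for i := 0; i < total; i++ {
		buf = append(buf, myStruct{Cpt: i})
		if len(buf) == batchSize {
			batches <- buf
			// Hand ownership of the old buffer to the worker and start a new one.
			buf = make([]myStruct, 0, batchSize)
		}
	}
	if len(buf) > 0 {
		batches <- buf // assumption: a final partial batch is still processed
	}
	close(batches)
	wg.Wait()
	return processed
}

func main() {
	fmt.Println(run(1050, 4)) // prints: 1050
}
```

Memory use is bounded here: at most workers batches are queued, at most workers batches are being processed, and one is being filled.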

Deadlock-free locking multiple locks in Go

Is there a proven programmatic way to achieve mutual exclusion of multiple Mutexes / Locks / whatever in Golang?
Eg.
mutex1.Lock()
defer mutex1.Unlock()
mutex2.Lock()
defer mutex2.Unlock()
mutex3.Lock()
defer mutex3.Unlock()
would keep mutex1 locked I guess while waiting for mutex2 / mutex3. For several goroutines, all using a different subset of several locks, this easily can get deadlocked.
So is there any way to acquire those locks only if all of them are available? Or is there any other pattern (using channels maybe?) in achieving the same?
So is there any way to acquire those locks only if all of them are available?
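No, not with the standard library mutex, at least not as a single atomic operation. When this was written there was no way to "check" whether a lock is available or to "try" acquiring it (Go 1.18 later added sync.Mutex.TryLock, which still handles only one mutex at a time). Every call to Lock() blocks until locking is successful.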
The implementation of mutex relies on atomic operations which only act on a single value at once. In order to achieve what you describe, you'd need some kind of "meta locking" where execution of the lock and unlock methods are themselves protected by a lock, but this probably isn't necessary just to have correct and safe locking in your program:
Penelope Stevens' comment explains correctly: As long as the order of acquisition is consistent between different goroutines, you won't get deadlocks, even for arbitrary subsets of the locks, if each goroutine eventually releases the locks it acquires.
If the structure of your program provides some obvious locking order, then use that. If not, you can create your own special mutex type that has intrinsic ordering.
type Mutex struct {
sync.Mutex
id uint64
}
var mutexIDCounter uint64
func NewMutex() *Mutex {
return &Mutex{
id: atomic.AddUint64(&mutexIDCounter, 1),
}
}
func MultiLock(locks ...*Mutex) {
sort.Slice(locks, func(i, j int) bool { return locks[i].id < locks[j].id })
for i := range locks {
locks[i].Lock()
}
}
func MultiUnlock(locks ...*Mutex) {
for i := range locks {
locks[i].Unlock()
}
}
Usage:
a := NewMutex()
b := NewMutex()
c := NewMutex()
MultiLock(a, b, c)
MultiUnlock(a, b, c)
Each mutex is assigned an incrementing ID, and the mutexes are locked in order of ID. The order of unlocking does not matter for avoiding deadlocks.
Try it for yourself in this example program on the playground. To prevent the deadlock, change const safe = true.
You might as well just put all the Mutexes into an array or into a map and implement a function that will call Lock() method on each of those using range based loop, so the order of locking remains the same.
The function might also take an array containing indexes it shall skip(or not skip if you like).
This way there will be no way to change the order of locking the mutexes as the order of the loop is consistent.
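The array-based idea from this answer might look like the sketch below. The LockSet type and its methods are hypothetical names; passing the indexes to lock (rather than the indexes to skip) is just one of the two options mentioned above. Indexes are sorted and de-duplicated first, so every goroutine acquires any subset in the same ascending order.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// LockSet is a hypothetical fixed collection of mutexes that are always
// acquired in ascending index order.
type LockSet struct {
	mus []sync.Mutex
}

func NewLockSet(n int) *LockSet {
	return &LockSet{mus: make([]sync.Mutex, n)}
}

// Lock locks the mutexes at the given indexes in ascending order and
// returns a function that unlocks them again. Duplicate indexes are ignored.
func (s *LockSet) Lock(idx ...int) func() {
	sorted := append([]int(nil), idx...) // copy so the caller's slice is untouched
	sort.Ints(sorted)
	order := make([]int, 0, len(sorted))
	for i, v := range sorted {
		if i > 0 && v == sorted[i-1] {
			continue
		}
		order = append(order, v)
	}
	for _, i := range order {
		s.mus[i].Lock()
	}
	return func() {
		for j := len(order) - 1; j >= 0; j-- {
			s.mus[order[j]].Unlock()
		}
	}
}

func main() {
	set := NewLockSet(3)
	counter := 0
	var wg sync.WaitGroup
	for g := 0; g < 2; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			for n := 0; n < 1000; n++ {
				var unlock func()
				if g == 0 {
					unlock = set.Lock(0, 2)
				} else {
					unlock = set.Lock(2, 1, 0) // different argument order, same lock order
				}
				counter++ // both goroutines hold mutex 0 here
				unlock()
			}
		}(g)
	}
	wg.Wait()
	fmt.Println(counter) // prints: 2000
}
```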

What happens when reading or writing concurrently without a mutex

In Go, a sync.Mutex or chan is used to prevent concurrent access of shared objects. However, in some cases I am just interested in the "latest" value of a variable or field of an object.
Or I like to write a value and do not care if another go-routine overwrites it later or has just overwritten it before.
Update: TLDR; Just don't do this. It is not safe. Read the answers, comments, and linked documents!
Update 2021: The Go memory model is going to be specified more thoroughly and there are three great articles by Russ Cox that will teach you more about the surprising effects of unsynchronized memory access. These articles summarize a lot of the below discussions and learnings.
Here are two variants, good and bad, of an example program; both seem to produce "correct" output using the current Go runtime:
package main
import (
"flag"
"fmt"
"math/rand"
"time"
)
var bogus = flag.Bool("bogus", false, "use bogus code")
func pause() {
time.Sleep(time.Duration(rand.Uint32()%100) * time.Millisecond)
}
func bad() {
stop := time.After(100 * time.Millisecond)
var name string
// start some producers doing concurrent writes (DANGER!)
for i := 0; i < 10; i++ {
go func(i int) {
pause()
name = fmt.Sprintf("name = %d", i)
}(i)
}
// start consumer that shows the current value every 10ms
go func() {
tick := time.Tick(10 * time.Millisecond)
for {
select {
case <-stop:
return
case <-tick:
fmt.Println("read:", name)
}
}
}()
<-stop
}
func good() {
stop := time.After(100 * time.Millisecond)
names := make(chan string, 10)
// start some producers concurrently writing to a channel (GOOD!)
for i := 0; i < 10; i++ {
go func(i int) {
pause()
names <- fmt.Sprintf("name = %d", i)
}(i)
}
// start consumer that shows the current value every 10ms
go func() {
tick := time.Tick(10 * time.Millisecond)
var name string
for {
select {
case name = <-names:
case <-stop:
return
case <-tick:
fmt.Println("read:", name)
}
}
}()
<-stop
}
func main() {
flag.Parse()
if *bogus {
bad()
} else {
good()
}
}
The expected output is as follows:
...
read: name = 3
read: name = 3
read: name = 5
read: name = 4
...
Any combination of read: and read: name=[0-9] is correct output for this program. Receiving any other string as output would be an error.
When running this program with go run --race bogus.go, the race detector reports no problems.
However, go run --race bogus.go -bogus warns of the concurrent reads and writes.
For map types and when appending to slices I always need a mutex or a similar method of protection to avoid segfaults or unexpected behavior. However, reading and writing literals (atomic values) to variables or field values seems to be safe.
Question: Which Go data types can I safely read and safely write concurrently without a mutex and without producing segfaults and without reading garbage from memory?
Please explain why something is safe or unsafe in Go in your answer.
Update: I rewrote the example to better reflect the original code, where I had the concurrent writes issue. The important learnings are already in the comments. I will accept an answer that summarizes these learnings with enough detail (esp. on the Go runtime).
However, in some cases I am just interested in the latest value of a variable or field of an object.
Here is the fundamental problem: What does the word "latest" mean?
Suppose that, mathematically speaking, we have a sequence of values Xi, with 0 <= i < N. Then obviously Xj is "later than" Xi if j > i. That's a nice simple definition of "latest" and is probably the one you want.
But when two separate CPUs within a single machine—including two goroutines in a Go program—are working at the same time, time itself loses meaning. We cannot say whether i < j, i == j, or i > j. So there is no correct definition for the word latest.
To solve this kind of problem, modern CPU hardware, and Go as a programming language, gives us certain synchronization primitives. If CPUs A and B execute memory fence instructions, or synchronization instructions, or use whatever other hardware provisions exist, the CPUs (and/or some external hardware) will insert whatever is required for the notion of "time" to regain its meaning. That is, if the CPU uses barrier instructions, we can say that a memory load or store that was executed before the barrier is a "before" and a memory load or store that is executed after the barrier is an "after".
(The actual implementation, in some modern hardware, consists of load and store buffers that can rearrange the order in which loads and stores go to memory. The barrier instruction either synchronizes the buffers, or places an actual barrier in them, so that loads and stores cannot move across the barrier. This particular concrete implementation gives an easy way to think about the problem, but isn't complete: you should think of time as simply not existing outside the hardware-provided synchronization, i.e., all loads from, and stores to, some location are happening simultaneously, rather than in some sequential order, except for these barriers.)
In any case, Go's sync package gives you a simple high level access method to these kinds of barriers. Compiled code that executes before a mutex Lock call really does complete before the lock function returns, and the code that executes after the call really does not start until after the lock function returns.
Go's channels provide the same kinds of before/after time guarantees.
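As an illustration of the mutex route, the shared "latest name" from the question could be wrapped like this. It is only a sketch: the Latest type and its Set and Get methods are hypothetical names, and using sync.RWMutex rather than sync.Mutex is an assumption that readers outnumber writers.

```go
package main

import (
	"fmt"
	"sync"
)

// Latest is a hypothetical holder for the most recently written name.
// Every access goes through the mutex, so reads and writes are ordered.
type Latest struct {
	mu   sync.RWMutex
	name string
}

// Set stores a new value under the write lock.
func (l *Latest) Set(name string) {
	l.mu.Lock()
	l.name = name
	l.mu.Unlock()
}

// Get returns the value written by the most recent Set that completed
// before this call acquired the read lock.
func (l *Latest) Get() string {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.name
}

func main() {
	var latest Latest
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			latest.Set(fmt.Sprintf("name = %d", i))
		}(i)
	}
	wg.Wait()
	// Prints one of the ten names; which one depends on scheduling.
	fmt.Println("read:", latest.Get())
}
```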
Go's sync/atomic package provides much lower level guarantees. In general you should avoid this in favor of the higher level channel or sync.Mutex style guarantees. (Edit to add note: You could use sync/atomic's Pointer operations here, but not with the string type directly, as Go strings are actually implemented as a header containing two separate values: a pointer, and a length. You could solve this with another layer of indirection, by updating a pointer that points to the string object. But before you even consider doing that, you should benchmark the use of the language's preferred methods and verify that these are a problem, because code that works at the sync/atomic level is hard to write and hard to debug.)
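The extra layer of indirection mentioned in the note can be sketched with the generic atomic.Pointer type, which exists in sync/atomic since Go 1.19 (an assumption about the toolchain in use). The setName and getName helpers are hypothetical. Each write stores a pointer to a fresh string, so readers always see a complete pointer/length pair.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// name holds a pointer to the current string. Only the pointer is swapped
// atomically; the string it points to is never modified after being stored.
var name atomic.Pointer[string]

// setName publishes a new value. The parameter s is a fresh copy of the
// string header, so taking its address is safe.
func setName(s string) {
	name.Store(&s)
}

// getName returns the currently published value, or "" if none was set.
func getName() string {
	if p := name.Load(); p != nil {
		return *p
	}
	return ""
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			setName(fmt.Sprintf("name = %d", i))
		}(i)
	}
	wg.Wait()
	// Prints one of the ten names; which one depends on scheduling.
	fmt.Println("read:", getName())
}
```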
Which Go data types can I safely read and safely write concurrently without a mutex and without producing segfaults and without reading garbage from memory?
None.
It really is that simple: you cannot, under any circumstances whatsoever, read and write concurrently to anything in Go without synchronization.
(Btw: Your "correct" program is not correct, it is racy and even if you get rid of the race condition it would not deterministically produce the output.)
Why can't you use channels?
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup // wait group to close channel
var buffer int = 1 // buffer of the channel
// channel to get the share data
cName := make(chan string, buffer)
for i := 0; i < 10; i++ {
wg.Add(1) // add to wait group
go func(i int) {
cName <- fmt.Sprintf("name = %d", i)
wg.Done() // decrease wait group.
}(i)
}
go func() {
wg.Wait() // wait of wait group to be 0
close(cName) // close the channel
}()
// process all the data
for n := range cName {
println("read:", n)
}
}
The above code produces output like the following (the order varies between runs, because the goroutines are scheduled independently):
read: name = 0
read: name = 5
read: name = 1
read: name = 2
read: name = 3
read: name = 4
read: name = 7
read: name = 6
read: name = 8
read: name = 9
https://play.golang.org/p/R4n9ssPMOeS
Article about channels

Attempting to acquire a lock with a deadline in golang?

How can one only attempt to acquire a mutex-like lock in go, either aborting immediately (like TryLock does in other implementations) or by observing some form of deadline (basically LockBefore)?
I can think of 2 situations right now where this would be greatly helpful and where I'm looking for some sort of solution. The first one is: a CPU-heavy service which receives latency sensitive requests (e.g. a web service). In this case you would want to do something like the RPCService example below. It is possible to implement it as a worker queue (with channels and stuff), but in that case it becomes more difficult to gauge and utilize all available CPU. It is also possible to just accept that by the time you acquire the lock your code may already be over deadline, but that is not ideal as it wastes some amount of resources and means we can't do things like a "degraded ad-hoc response".
/* Example 1: LockBefore() for latency sensitive code. */
func (s *RPCService) DoTheThing(ctx context.Context, ...) ... {
if s.someObj[req.Parameter].mtx.LockBefore(ctx.Deadline()) {
defer s.someObj[req.Parameter].mtx.Unlock()
... expensive computation based on internal state ...
} else {
return s.cheapCachedResponse[req.Parameter]
}
}
Another case is when you have a bunch of objects which should be touched, but which may be locked, and where touching them should complete within a certain amount of time (e.g. updating some stats). In this case you could also either use LockBefore() or some form of TryLock(), see the Stats example below.
/* Example 2: TryLock() for updating stats. */
func (s *StatsObject) updateObjStats(key, value interface{}) {
if s.someObj[key].TryLock() {
defer s.someObj[key].Unlock()
... update stats ...
... fill in s.cheapCachedResponse ...
}
}
func (s *StatsObject) UpdateStats() {
s.someObj.Range(s.updateObjStats)
}
For ease of use, let's assume that in the above case we're talking about the same s.someObj. Any object may be blocked by DoTheThing() operations for a long time, which means we would want to skip it in updateObjStats. Also, we would want to make sure that we return the cheap response in DoTheThing() in case we can't acquire a lock in time.
Unfortunately, sync.Mutex only and exclusively has the functions Lock() and Unlock(). There is no way to potentially acquire a lock. Is there some easy way to do this instead? Am I approaching this class of problems from an entirely wrong angle, and is there a different, more "go"ish way to solve them? Or will I have to implement my own Mutex library if I want to solve these? I am aware of issue 6123 which seems to suggest that there is no such thing and that the way I'm approaching these problems is entirely un-go-ish.
Use a channel with buffer size of one as mutex.
l := make(chan struct{}, 1)
Lock:
l <- struct{}{}
Unlock:
<-l
Try lock:
select {
case l <- struct{}{}:
// lock acquired
<-l
default:
// lock not acquired
}
Try with timeout:
select {
case l <- struct{}{}:
// lock acquired
<-l
case <-time.After(time.Minute):
// lock not acquired
}
I think you're asking several different things here:
1. Does this facility exist in the standard library? No, it doesn't (Go 1.18 later added sync.Mutex.TryLock, but sync.Mutex still has no deadline- or context-based lock). You can probably find implementations elsewhere; this is possible to implement using the standard library (atomics, for example).
2. Why doesn't this facility exist in the standard library? The issue you mentioned in the question is one discussion. There are also several discussions on the go-nuts mailing list with several Go core developers contributing: link 1, link 2. And it's easy to find other discussions by googling.
3. How can I design my program such that I won't need this?
The answer to (3) is more nuanced and depends on your exact issue. Your question already says
It is possible to implement it as a worker queue (with channels and
stuff), but in that case it becomes more difficult to gauge and
utilize all available CPU
However, it does not explain why it would be more difficult to utilize all CPUs that way, as opposed to checking a mutex's lock state.
In Go you usually want channels whenever the locking schemes become non-trivial. It shouldn't be slower, and it should be much more maintainable.
How about this package: https://github.com/viney-shih/go-lock . It uses a channel and a semaphore (golang.org/x/sync/semaphore) to solve your problem.
go-lock implements TryLock, TryLockWithTimeout and TryLockWithContext functions in addition to Lock and Unlock. It provides flexibility to control the resources.
Examples:
package main
import (
"fmt"
"time"
"context"
lock "github.com/viney-shih/go-lock"
)
func main() {
casMut := lock.NewCASMutex()
casMut.Lock()
defer casMut.Unlock()
// TryLock without blocking
fmt.Println("Return", casMut.TryLock()) // Return false
// TryLockWithTimeout gives up after the timeout instead of blocking forever
fmt.Println("Return", casMut.TryLockWithTimeout(50*time.Millisecond)) // Return false
// TryLockWithContext gives up when the context is done
ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
defer cancel()
fmt.Println("Return", casMut.TryLockWithContext(ctx)) // Return false
// Output:
// Return false
// Return false
// Return false
}
PMutex from package https://github.com/myfantasy/mfs
PMutex implements RTryLock(ctx context.Context) and TryLock(ctx context.Context)
// ctx - some context
ctx := context.Background()
mx := mfs.PMutex{}
isLocked := mx.TryLock(ctx)
if isLocked {
// DO Something
mx.Unlock()
} else {
// DO Something else
}

Understanding Golang memory management with large slice of strings

I am working on a chat bot for the site Twitch.tv that is written in Go.
One of the features of the bot is a points system that rewards users for watching a particular stream. This data is stored in a SQLite3 database.
To get the viewers, the bot makes an API call to twitch and gathers all of the current viewers of a stream. These viewers are then put in a slice of strings.
Total viewers can range anywhere from a couple to 20,000 or more.
What the bot does
Makes API call
Stores all viewers in a slice of strings
For each viewer, bot iterates and adds points accordingly.
Bot clears this slice before next iteration
Code
type Viewers struct {
Chatters struct {
CurrentModerators []string `json:"moderators"`
CurrentViewers []string `json:"viewers"`
} `json:"chatters"`
}
func RunPoints(timer time.Duration, modifier int, conn net.Conn, channel string) {
database := InitializeDB() // Loads database through SQLite3 driver
var Points int
var allUsers []string
for range time.NewTicker(timer * time.Second).C {
currentUsers := GetViewers(conn, channel)
tx, err := database.Begin()
if err != nil {
fmt.Println("Error starting points transaction: ", err)
}
allUsers = append(allUsers, currentUsers.Chatters.CurrentViewers...)
allUsers = append(allUsers, currentUsers.Chatters.CurrentModerators...)
for _, v := range allUsers {
userCheck := UserInDB(database, v)
if userCheck == false {
statement, _ := tx.Prepare("INSERT INTO points (Username, Points) VALUES (?, ?)")
statement.Exec(v, 1)
} else {
err = tx.QueryRow("Select Points FROM points WHERE Username = ?", v).Scan(&Points)
if err != nil {
} else {
Points = Points + modifier
statement, _ := tx.Prepare("UPDATE points SET Points = ? WHERE username = ?")
statement.Exec(Points, v)
}
}
}
tx.Commit()
allUsers = allUsers[:0]
currentUsers = Viewers{} // Clear Viewer object struct
} // end of ticker loop
} // end of RunPoints
Expected Behavior
When pulling thousands of viewers, naturally, I expect system resource usage to get pretty high. This can take the bot from 3.0 MB of RAM up to 20 MB+. Thousands of elements take up a lot of space, of course!
However, something else happens.
Actual Behavior
Each time the API is called, the RAM increases as expected. But because I clear the slice, I expect it to fall back down to its 'normal' 3.0 MB of usage.
However, RAM usage increases with each API call and doesn't go back down, even if the total number of viewers of a stream decreases.
Thus, given a few hours, the bot will easily consume 100+ MB of RAM, which doesn't seem right to me.
What am I missing here? I'm fairly new to programming and CS in general, so perhaps I am trying to fix something that isn't a problem. But this almost sounds like a memory leak to me.
I have tried forcing garbage collection and freeing the memory through Golang's run time library, but this does not fix it.
To understand what's happening here, you need to understand the internals of a slice and what's happening with it. You should probably start with https://blog.golang.org/go-slices-usage-and-internals
To give a brief answer: A slice gives a view into a portion of an underlying array, and when you are attempting to truncate your slice, all you're doing is reducing the view you have over the array, but the underlying array remains unaffected and still takes up just as much memory. In fact, by continuing to use the same array, you're never going to decrease the amount of memory you're using.
I'd encourage you to read up on how this works, but as an example for why no actual memory would be freed up, take a look at the output from this simple program that demos how changes to a slice will not truncate the memory allocated under the hood: https://play.golang.org/p/PLEZba8uD-L
When you reslice the slice:
allUsers = allUsers[:0]
All the elements are still in the backing array and cannot be collected. The memory is still allocated, which will save some time in the next run (it doesn't have to resize the array so much, saving slow allocations), which is the point of reslicing it to zero length instead of just dumping it.
If you want the memory released to the GC, you'd need to just dump it altogether and create a new slice every time. This would be slower, but use less memory between runs. However, that doesn't necessarily mean you'll see less memory used by the process. The GC collects unused heap objects, then may eventually free that memory to the OS, which may eventually reclaim it, if other processes are applying memory pressure.

Resources