Understanding Golang memory management with large slice of strings - go

I am working on a chat bot for the site Twitch.tv that is written in Go.
One of the features of the bot is a points system that rewards users for watching a particular stream. This data is stored in a SQLite3 database.
To get the viewers, the bot makes an API call to twitch and gathers all of the current viewers of a stream. These viewers are then put in a slice of strings.
Total viewers can range anywhere from a couple to 20,000 or more.
What the bot does
Makes an API call
Stores all viewers in a slice of strings
Iterates over the slice and adds points for each viewer
Clears the slice before the next iteration
Code
type Viewers struct {
    Chatters struct {
        CurrentModerators []string `json:"moderators"`
        CurrentViewers    []string `json:"viewers"`
    } `json:"chatters"`
}
func RunPoints(timer time.Duration, modifier int, conn net.Conn, channel string) {
    database := InitializeDB() // Loads database through SQLite3 driver
    var Points int
    var allUsers []string
    for range time.NewTicker(timer * time.Second).C {
        currentUsers := GetViewers(conn, channel)
        tx, err := database.Begin()
        if err != nil {
            fmt.Println("Error starting points transaction: ", err)
        }
        allUsers = append(allUsers, currentUsers.Chatters.CurrentViewers...)
        allUsers = append(allUsers, currentUsers.Chatters.CurrentModerators...)
        for _, v := range allUsers {
            userCheck := UserInDB(database, v)
            if userCheck == false {
                statement, _ := tx.Prepare("INSERT INTO points (Username, Points) VALUES (?, ?)")
                statement.Exec(v, 1)
            } else {
                err = tx.QueryRow("Select Points FROM points WHERE Username = ?", v).Scan(&Points)
                if err != nil {
                } else {
                    Points = Points + modifier
                    statement, _ := tx.Prepare("UPDATE points SET Points = ? WHERE username = ?")
                    statement.Exec(Points, v)
                }
            }
        }
        tx.Commit()
        allUsers = allUsers[:0]
        currentUsers = Viewers{} // Clear Viewer object struct
    }
}
Expected Behavior
When pulling thousands of viewers, naturally, I expect the system resources to get pretty high. This can take the bot from using 3.0 MB of RAM up to 20 MB or more. Thousands of elements take up a lot of space, of course!
However, something else happens.
Actual Behavior
Each time the API is called, the RAM increases as expected. But because I clear the slice, I expect it to fall back down to its 'normal' 3.0 MB of usage.
However, the amount of RAM usage increases with every API call and doesn't go back down, even if the total number of viewers of a stream decreases.
Thus, given a few hours, the bot will easily consume 100+ MB of RAM, which doesn't seem right to me.
What am I missing here? I'm fairly new to programming and CS in general, so perhaps I am trying to fix something that isn't a problem. But this almost sounds like a memory leak to me.
I have tried forcing garbage collection and freeing the memory through Go's runtime library, but this does not fix it.

To understand what's happening here, you need to understand the internals of a slice. A good place to start is https://blog.golang.org/go-slices-usage-and-internals
To give a brief answer: a slice is a view into a portion of an underlying array. When you truncate the slice, all you're doing is shrinking that view; the underlying array is unaffected and still takes up just as much memory. In fact, by continuing to reuse the same array, you're never going to decrease the amount of memory you're using.
I'd encourage you to read up on how this works, but as an example for why no actual memory would be freed up, take a look at the output from this simple program that demos how changes to a slice will not truncate the memory allocated under the hood: https://play.golang.org/p/PLEZba8uD-L
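If the playground link ever goes away, here is a minimal standalone sketch of the same idea (the exact capacity printed will vary by Go version): reslicing changes the length, but the capacity, i.e. the backing array that actually holds the memory, stays put.
package main

import "fmt"

func main() {
    // Build up a slice; append grows the backing array as needed.
    s := make([]string, 0)
    for i := 0; i < 10000; i++ {
        s = append(s, "viewer")
    }
    fmt.Println(len(s), cap(s)) // e.g. 10000 12288

    // "Clearing" the slice only shrinks the view, not the allocation.
    s = s[:0]
    fmt.Println(len(s), cap(s)) // 0 12288; the backing array is still allocated
}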

When you reslice the slice:
allUsers = allUsers[:0]
All the elements are still referenced by the backing array and cannot be collected. The memory stays allocated, which saves some time on the next run (the array doesn't have to be regrown as much, avoiding slow allocations); that is the point of reslicing to zero length instead of just dumping the slice.
If you want the memory released to the GC, you'd need to just dump it altogether and create a new slice every time. This would be slower, but use less memory between runs. However, that doesn't necessarily mean you'll see less memory used by the process. The GC collects unused heap objects, then may eventually free that memory to the OS, which may eventually reclaim it, if other processes are applying memory pressure.
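Applied to the loop in the question, the trade-off looks roughly like this (a sketch, not a drop-in fix; whether the process's reported memory actually shrinks also depends on when the Go runtime returns freed pages to the OS):
// Option 1: keep the backing array for the next tick. Fast (no regrowth),
// but memory stays at the high-water mark of the largest viewer list seen.
allUsers = allUsers[:0]

// Option 2: drop the reference entirely. The GC can now collect the old
// backing array; the next tick re-allocates and regrows from scratch.
allUsers = nil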

Related

Fix memory consumption of a go program with goroutines

I am working on a problem that involves a producer-consumer pattern. I have one producer that produces tasks and 'n' consumers that consume them. A consumer's task is to read some data from a file and then upload that data to S3. One consumer can read up to x MB (8/16/32) of data and then upload it to S3. Keeping all the data in memory caused more memory consumption than expected from the program, so I switched to reading the data from the file, writing it to a temporary file, and then uploading that file to S3. This performed better in terms of memory, but CPU took a hit. I wonder if there is any way to allocate a fixed amount of memory once and then use it among different goroutines?
What I want is: if I have 4 goroutines, I can allocate 4 different arrays of x MB and then reuse the same array in each goroutine invocation, so that a goroutine doesn't allocate memory every time and also doesn't depend on the GC to free it.
Edit: Adding the crux of my code. My Go consumer looks like:
type Block struct {
    offset int64
    size   int64
}

func consumer(blocks []Block) {
    var dataArr []byte
    for _, block := range blocks {
        data := file.Read(block.offset, block.size)
        dataArr = append(dataArr, data...)
    }
    upload(dataArr)
}
I read the data from the file based on Blocks; a block can contain several small chunks limited to x MB, or one big chunk of x MB.
Edit 2: Tried sync.Pool based on suggestions in the comments, but I did not see any improvement in memory consumption. Am I doing something wrong?
var pool *sync.Pool

func main() {
    pool = &sync.Pool{
        New: func() interface{} {
            return make([]byte, 16777216)
        },
    }
    for i := 0; i < 4; i++ {
        // blocks is a 2-d array; each index contains an array of blocks.
        go consumer(blocks[i])
    }
}

func consumer(blocks []Block) {
    d := pool.Get().([]byte)
    for _, block := range blocks {
        file.Read(block.offset, block.size, d[block.offset:block.size])
    }
    upload(d)
    pool.Put(d)
}
Take a look at staticcheck's check SA6002 about sync.Pool: putting a []byte into the pool directly allocates on every call, because converting the slice header to an interface{} requires a heap allocation, so store a pointer to the slice (*[]byte) instead. You can also use the pprof tool to see where the allocations actually come from.
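As a minimal sketch of what that change looks like (the 16 MiB buffer size mirrors the question; the actual file-reading and uploading code is left out):
package main

import (
    "fmt"
    "sync"
)

// The pool stores *[]byte rather than []byte: a pointer fits in an interface
// value without a heap allocation, which is exactly what SA6002 is about.
var bufPool = sync.Pool{
    New: func() interface{} {
        b := make([]byte, 16<<20) // 16 MiB, as in the question
        return &b
    },
}

func consume(id int, wg *sync.WaitGroup) {
    defer wg.Done()

    bp := bufPool.Get().(*[]byte)
    defer bufPool.Put(bp) // return the buffer even if this worker panics

    buf := *bp
    // ... read file blocks into buf and upload them here ...
    fmt.Printf("worker %d using a %d-byte buffer\n", id, len(buf))
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go consume(i, &wg)
    }
    wg.Wait()
}
Keep in mind that sync.Pool may still drop these buffers between garbage collections; if you want exactly four long-lived buffers, a buffered channel (as in the memory-pooling answer further down) keeps them alive.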

Golang: remove structs older than 1h from slice

So I'm building a small utility that listens on a socket and stores incoming messages as a structs in a slice:
var points []Point

type Point struct {
    time time.Time
    x    float64
    y    float64
}

func main() {
    points = make([]Point, 0)
    l, err := net.Listen("tcp4", ":8900")
    // (...)
}
func processIncomingData(data string) {
    // Parse incoming data that comes as: xValue,yValue
    inData := strings.Split(data, ",")
    x, err := strconv.ParseFloat(inData[0], 64)
    if err != nil {
        fmt.Println(err)
    }
    y, err := strconv.ParseFloat(inData[1], 64)
    if err != nil {
        fmt.Println(err)
    }
    // Store the new Point
    points = append(points, Point{
        time: time.Now(),
        x:    x,
        y:    y,
    })
    // Remove points older than 1h ?
}
Now, as you might imagine, this will quickly fill my RAM. What's the best way (fastest execution) to remove points older than 1h after appending each new one? I'll be getting new points 10-15 times per second.
Thank you.
An approach I've used several times is to start a goroutine early in the project that looks something like this:
go cleanup()
...

func cleanup() {
    for {
        time.Sleep(...)
        // do cleanup
    }
}
Then what you could do is iterate over points using time.Since(point.time) to figure out how old each piece of data is. If it's too old, there's a slice trick to remove an item from a slice given its position:
points = append(points[:i], points[i+1:]...)
(where i is the index to remove)
Because the points are in the slice in order of the time they were added, you could speed things up by simply finding the first index that isn't an hour old and doing points = points[i:] to chop off the old points from the beginning of the slice.
You may run into problems if you get a request that accesses the array while you're cleaning it up. Adding a sync.Mutex can help with that. Just lock the mutex before the cleanup and also attempt to lock the mutex anywhere else you write to the array. This may be premature optimization though. I'd experiment without the mutex before adding it in as this would effectively make interacting with points a serial operation and slow the service down.
The time.Sleep(...) in the loop is to prevent from cleaning too often. You might be tempted to set it to an hour since you want to delete points older than that but you might end up with a situation where a point is added immediately after a cleanup. On the next cleanup it'll be 59 mins old and you don't delete it, on the NEXT cleanup it's nearly 2 hours old. My rule of thumb is that I attempt to clean up every 1/10 the amount of time I want to allow an object to stay in memory but that's rather arbitrary. This approach means an object could be at most 1h 5m 59s old when it's deleted.
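Putting those pieces together, a cleanup loop might look roughly like this (a sketch, not taken from the question's code: the 6-minute interval follows the 1/10 rule of thumb above, and the mutex is the optional guard discussed):
package main

import (
    "sync"
    "time"
)

type Point struct {
    time time.Time
    x, y float64
}

var (
    mu     sync.Mutex
    points []Point
)

func cleanup() {
    for {
        time.Sleep(6 * time.Minute) // roughly 1/10 of the 1h retention window

        mu.Lock()
        cutoff := time.Now().Add(-1 * time.Hour)
        // Points are appended in time order, so find the first one that is
        // still young enough and drop everything before it.
        i := 0
        for i < len(points) && points[i].time.Before(cutoff) {
            i++
        }
        points = points[i:]
        mu.Unlock()
    }
}

func main() {
    go cleanup()
    // ... listen on the socket and append to points while holding mu ...
    select {}
}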

function returning pointer to struct slice only returns 1

I have a function that basically looks like this:
func (db *datastoreDB) GetAllUsers(ctx context.Context) (*[]User, error) {
    query := datastore.NewQuery("User")
    var users []User
    _, err := db.client.GetAll(ctx, query, &users)
    if err != nil {
        return nil, err
    }
    return &users, nil
}
with the struct:
type User struct {
    username string
    password []byte
}
Now, if I try to call
users, err := models.DB.GetAllUsers(ctx)
log.Println(users)
then it will only print 1 user, even though there are many.
I tried to print using users[0] and users[1], but that returned errors; I also tried *users[1], &users[1], and for i := range users { log.Println(users[i]) }.
I haven't quite been able to understand when/how to use * and & even though I've read many online tutorials, so I often just do trial and error. I doubt there is anything wrong with my datastore GetAll function, so I assume I just fail to properly access/return the struct slice, but I feel like I've tried everything.
Slices include pointers; in fact they are structs with a pointer to an array and some information as to where the slice begins and ends.
A slice is a descriptor of an array segment. It consists of a pointer to the array, the length of the segment, and its capacity (the maximum length of the segment)
Golang blog
The asterisk before a type designates that type as a pointer type (applied to a value, it dereferences a pointer instead). I think you probably meant to write []*User, which is a slice of pointers to User. You can think of [] and User as distinct types.
To create a slice, the simplest way is probably with make();
You can try, instead of var users []User,
users := make([]*User, 0) // replace 0 with the initial number of (nil) elements you want
Finally, you'll have to remove the & signs you place before users, as & takes a pointer to a value (and, as I pointed out above, a slice already contains a pointer).
To better understand pointers, Dave Cheney recently wrote a blog post titled Understand Go pointers in less than 800 words or your money back, you might find it useful.
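To make the distinction concrete, here is a small self-contained sketch (the hypothetical getAllUsers stands in for the datastore call) that returns the slice directly and ranges over it:
package main

import "log"

type User struct {
    username string
    password []byte
}

// Returning the slice value directly is enough: a slice header already
// carries a pointer to its backing array, so *[]User isn't needed.
func getAllUsers() ([]User, error) {
    users := []User{
        {username: "alice"},
        {username: "bob"},
    }
    return users, nil
}

func main() {
    users, err := getAllUsers()
    if err != nil {
        log.Fatal(err)
    }
    for i := range users {
        log.Println(users[i].username) // prints each user on its own line
    }
}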

How to implement Memory Pooling in Golang

I implemented an HTTP server in Go.
For each request, I need to create hundreds of objects for a particular struct, and I have ~10 structs like that. After the request is finished, per Go's implementation, those objects will be garbage collected.
So for each request that much memory will be allocated and deallocated.
Instead, I want to implement memory pooling to improve performance on the allocation side as well as the GC side.
At the beginning of a request I will take objects from the pool and put them back after the request is served.
From the pool implementation side
How do I allocate and deallocate memory for a particular type of struct?
How do I keep track of which of this memory is assigned and which is not?
Any other suggestions to improve performance in case of memory allocation and deallocation?
Note beforehand:
Many suggest using sync.Pool, which is a fast, good implementation for temporary objects. But note that sync.Pool does not guarantee that pooled objects are retained. Quoting from its doc:
Any item stored in the Pool may be removed automatically at any time without notification. If the Pool holds the only reference when this happens, the item might be deallocated.
So if you don't want your objects in the Pool to get garbage collected (which, depending on your case, might result in more allocations), the solution presented below is better, as values in a channel's buffer are not garbage collected. If your objects are really so big that a memory pool is justified, the overhead of the pool channel will be amortized.
Moreover, sync.Pool does not allow you to limit the number of pooled objects, while the solution presented below naturally does.
The simplest memory pool "implementation" is a buffered channel.
Let's say you want a memory pool of some big objects. Create a buffered channel holding pointers to values of such expensive objects, and whenever you need one, receive one from the pool (channel). When you're done using it, put it back into the pool (send on the channel). To avoid accidentally losing the objects (e.g. in case of a panic), use a defer statement when putting them back.
Let's use this as the type of our big objects:
type BigObject struct {
    Id        int
    Something string
}
Creating a pool is:
pool := make(chan *BigObject, 10)
The size of the pool is simply the size of the channel's buffer.
Filling the pool with pointers of expensive objects (this is optional, see notes at the end):
for i := 0; i < cap(pool); i++ {
    bo := &BigObject{Id: i}
    pool <- bo
}
Using the pool from many goroutines:
wg := sync.WaitGroup{}
for i := 0; i < 100; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        bo := <-pool
        defer func() { pool <- bo }()
        fmt.Println("Using", bo.Id)
        fmt.Println("Releasing", bo.Id)
    }()
}
wg.Wait()
Try it on the Go Playground.
Note that this implementation blocks if all the "pooled" objects are in use. If you don't want this, you may use select to force creating new objects if all are in use:
var bo *BigObject
select {
case bo = <-pool: // Try to get one from the pool
default: // All in use, create a new, temporary:
    bo = &BigObject{Id: -1}
}
And in this case you don't need to put it back into the pool. Or you may choose to try to put it back into the pool if there's room, without blocking, again with select:
select {
case pool <- bo: // Try to put back into the pool
default: // Pool is full, will be garbage collected
}
Notes:
Filling the pool prior is optional. If you use select to try to get / put back values from / to the pool, the pool may initially be empty.
You have to make sure you're not leaking information between requests, e.g. make sure you don't use fields and values in your shared objects that were set by and belong to other requests; a sketch of one way to do this follows below.
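For example, a hypothetical Reset method (not part of the original answer) called on each object before reuse keeps one request's data from leaking into the next:
// Reset clears per-request state so a pooled BigObject can be reused safely.
// Which fields count as per-request state depends on your own types.
func (bo *BigObject) Reset() {
    bo.Something = ""
}

// When taking an object from the pool:
bo := <-pool
bo.Reset() // wipe whatever the previous request left behind
defer func() { pool <- bo }()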
This is the sync.Pool implementation mentioned by JimB. Note the use of defer to return the object to the pool.
package main

import "sync"

type Something struct {
    Name string
}

var pool = sync.Pool{
    New: func() interface{} {
        return &Something{}
    },
}

func main() {
    s := pool.Get().(*Something)
    defer pool.Put(s)
    s.Name = "hello"
    // use the object
}

Is there an efficient way of reclaiming over-capacity slices?

I have a large number of allocated slices (a few million) which I have appended to. I'm sure a large number of them are over capacity. I want to try and reduce memory usage.
My first attempt is to iterate over all of them, allocate a new slice of len(oldSlice) and copy the values over. Unfortunately this appears to increase memory usage (up to double) and the garbage collection is slow to reclaim the memory.
Is there a good general way to slim down memory usage for a large number of over-capacity slices?
Choosing the right strategy to allocate your buffers is hard without knowing the exact problem.
In general you can try to reuse your buffers:
type buffer struct{}

var buffers = make(chan *buffer, 1024)

func newBuffer() *buffer {
    select {
    case b := <-buffers:
        return b
    default:
        return &buffer{}
    }
}

func returnBuffer(b *buffer) {
    select {
    case buffers <- b:
    default:
    }
}
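A typical call site would then look something like this (handle and its payload argument are hypothetical; the point is that the deferred returnBuffer hands the buffer back even on an early return or panic):
func handle(payload []byte) {
    b := newBuffer()
    defer returnBuffer(b)
    // ... use b while processing payload ...
}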
The heuristic used in append may not be suitable for all applications. It's designed for use when you don't know the final length of the data you'll be storing. Instead of iterating over them later, I'd try to minimize the amount of extra capacity you're allocating as early as possible. Here's a simple example of one strategy, which is to use a buffer only while the length is not known, and to reuse that buffer:
type buffer struct {
    names []string
    // ... possibly other things
}

// assume this is called frequently and has lots and lots of names
func (b *buffer) readNames(lines *bufio.Scanner) ([]string, error) {
    // Start from zero, so we can re-use capacity
    b.names = b.names[:0]
    for lines.Scan() {
        b.names = append(b.names, lines.Text())
    }
    // Figure out the error
    err := lines.Err()
    if err == io.EOF {
        err = nil
    }
    // Allocate a minimal slice
    out := make([]string, len(b.names))
    copy(out, b.names)
    return out, err
}
Of course, you'll need to modify this if you need something that's safe for concurrent use; for that I'd recommend using a buffered channel as a leaky bucket for storing your buffers.
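Combining that with the leaky-bucket idea, usage might look roughly like this (a sketch; bufferPool, namesFromReader and the pool size are assumptions, not part of the original answer):
var bufferPool = make(chan *buffer, 64)

func namesFromReader(r io.Reader) ([]string, error) {
    var b *buffer
    select {
    case b = <-bufferPool:
    default:
        b = &buffer{}
    }
    defer func() {
        select {
        case bufferPool <- b: // hand the buffer to the next caller
        default: // pool is full; let the GC take this one
        }
    }()
    return b.readNames(bufio.NewScanner(r))
}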
