I have a large number of allocated slices (a few million) which I have appended to. I'm sure a large number of them are over capacity. I want to try and reduce memory usage.
My first attempt is to iterate over all of them, allocate a new slice of len(oldSlice) and copy the values over. Unfortunately this appears to increase memory usage (up to double) and the garbage collection is slow to reclaim the memory.
Is there a good general way to slim down memory usage for a large number of over-capacity slices?
Choosing the right strategy to allocate your buffers is hard without knowing the exact problem.
In general you can try to reuse your buffers:
type buffer struct{} // placeholder for your real buffer type

var buffers = make(chan *buffer, 1024)

// newBuffer returns a pooled buffer if one is available,
// otherwise it allocates a fresh one.
func newBuffer() *buffer {
    select {
    case b := <-buffers:
        return b
    default:
        return &buffer{}
    }
}

// returnBuffer hands the buffer back for reuse; if the pool
// is full, the buffer is dropped and left to the GC.
func returnBuffer(b *buffer) {
    select {
    case buffers <- b:
    default:
    }
}
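A typical call site might look like this (a minimal sketch; handleRequest and the work done with the buffer are hypothetical):

func handleRequest() {
    b := newBuffer()      // reuse a pooled buffer if one is available
    defer returnBuffer(b) // hand it back; silently dropped if the pool is full
    // ... use b for the duration of the request ...
}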
The heuristic used in append may not be suitable for all applications. It's designed for use when you don't know the final length of the data you'll be storing. Instead of iterating over them later, I'd try to minimize the amount of extra capacity you're allocating as early as possible. Here's a simple example of one strategy, which is to use a buffer only while the length is not known, and to reuse that buffer:
type buffer struct {
    names []string
    // ... possibly other things
}

// assume this is called frequently and has lots and lots of names
func (b *buffer) readNames(lines *bufio.Scanner) ([]string, error) {
    // Start from zero, so we can re-use capacity
    b.names = b.names[:0]
    for lines.Scan() {
        b.names = append(b.names, lines.Text())
    }
    // Figure out the error
    err := lines.Err()
    if err == io.EOF {
        err = nil
    }
    // Allocate a minimal slice
    out := make([]string, len(b.names))
    copy(out, b.names)
    return out, err
}
Of course, you'll need to modify this if you need something that's safe for concurrent use; for that I'd recommend using a buffered channel as a leaky bucket for storing your buffers.
Let's say I have the following pipeline of functions:
func func3(opts ...FunctionObject) {
    for _, opt := range opts {
        opt()
    }
}

func func2(opts ...FunctionObject) {
    var functions []FunctionObject
    functions = append(functions, someFunction3)
    functions = append(functions, someFunction4)
    // ...
    func3(append(functions, opts...)...)
}

func func1(opts ...FunctionObject) {
    var functions []FunctionObject
    functions = append(functions, someFunction)
    functions = append(functions, someFunction2)
    // ...
    func2(append(functions, opts...)...)
}
For reasons inherent in the problem I want to solve, the functions in functions should be called before the functions in opts, so I can't just append to opts; I have to prepend functions to opts (via append(functions, opts...)) and then use ... again to send the result to the next function in the pipeline. So I end up with the odd expression:
func2(append(functions, opts...)...)
I don't know how efficient it is, but I'm sure it looks weird.
There must be a better way of doing it, and that's what I'm looking for.
I'd also be grateful for an accompanying explanation about efficiency :)
Edit:
I can't change the argument type from opts ...FunctionObject to opts []FunctionObject (as @dev.bmax suggested in comments) since I'm making changes in an existing codebase, so I can't change the functions that call func{1,2,3}.
By saying that "it looks weird" I don't mean only the look: it seems weird to apply this operation (the ellipsis) twice, and it seems to be inefficient (am I wrong?).
Prepending to a slice is fundamentally inefficient since it will require some combination of:
allocating a larger backing array
moving items to the end of the slice
...or both.
It would be more efficient if you could change the calling convention between functions to only append options and then process them in reverse. This could avoid repeatedly moving items to the end of the slice and potentially all allocations beyond the first (if enough space is allocated in advance).
func func3(opts ...FunctionObject) {
    for i := len(opts) - 1; i >= 0; i-- {
        opts[i]()
    }
}
Note: func3(opts ...FunctionObject) / func3(opts...) and func3(opts []FunctionObject) / func3(opts) are equivalent for performance. The former is effectively syntactic sugar for passing the slice.
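A quick sketch demonstrating that equivalence: variadic pass-through hands over the same slice header, so no copy of the elements is made.

package main

import "fmt"

func mutate(xs ...int) { xs[0] = 99 }

func main() {
    s := []int{1, 2, 3}
    mutate(s...)   // passes the slice itself; elements are not copied
    fmt.Println(s) // [99 2 3] -- the callee saw the caller's backing array
}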
However, you've mentioned you need to keep your calling conventions...
Your example code will cause allocations for the 1st, 2nd, 3rd, 5th, ... appends within each function: allocations are needed to double the size of the backing array (for small slices). append(functions, opts...) will likely also allocate if the earlier appends didn't create enough spare capacity.
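You can observe this growth pattern yourself (a sketch; the exact growth factors are a runtime implementation detail and may vary between Go versions):

package main

import "fmt"

func main() {
    var s []int
    oldCap := cap(s)
    for i := 1; i <= 10; i++ {
        s = append(s, i)
        if cap(s) != oldCap {
            // Typically prints for appends #1, #2, #3, #5, #9.
            fmt.Printf("append #%d reallocated: cap %d -> %d\n", i, oldCap, cap(s))
            oldCap = cap(s)
        }
    }
}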
A helper function could make the code more readable. It could also reuse spare capacity in the opts backing array:
func func2(opts ...FunctionObject) {
    // 1-2 allocations. Always allocate the variadic slice containing the
    // prepended items. Prepend reallocates the backing array for `opts`
    // if needed.
    opts = Prepend(opts, someFunction3, someFunction4)
    func3(opts...)
}
// Generics requires Go1.18+. Otherwise change T to FunctionObject.
func Prepend[T any](base []T, items ...T) []T {
if size := len(items) + len(base); size <= cap(base) {
// Extend base using spare slice capacity.
out := base[:size]
// Move elements from the start to the end of the slice (handles overlaps).
copy(out[len(items):], base)
// Copy prepended elements.
copy(out, items)
return out
}
return append(items, base...) // Always re-allocate.
}
Some alternate options without the helper function that describe the allocations in more detail:
// Directly allocate the items to prepend (2 allocations).
func func1(opts ...FunctionObject) {
    // Allocate the slice to prepend with no spare capacity, then append
    // re-allocates the backing array since it is not large enough for the
    // additional `opts`.
    // In future, Go could allocate enough space initially to avoid the
    // reallocation, but it doesn't do it yet (as of Go1.20rc1).
    functions := append([]FunctionObject{
        someFunction,
        someFunction2,
        // ...
    }, opts...)
    // Does not allocate -- the slice is simply passed to the next function.
    func2(functions...)
}
// Minimise allocations (1 allocation).
func func2(opts ...FunctionObject) {
    // Pre-allocate the required space to avoid any further append
    // allocations within this function.
    functions := make([]FunctionObject, 0, 2+len(opts))
    functions = append(functions, someFunction3)
    functions = append(functions, someFunction4)
    functions = append(functions, opts...)
    func3(functions...)
}
You could go further and reuse the spare capacity in opts without allocating a slice containing the items to prepend (0-1 allocations per function). However, this is complex and error-prone -- I wouldn't recommend it.
I am working on a problem that involves a producer-consumer pattern. I have one producer that produces tasks and n consumers that consume them. A consumer's task is to read some data from a file and then upload that data to S3. One consumer can read up to x MB (8/16/32) of data and then upload it to S3. Keeping all the data in memory caused more memory consumption than what is expected from the program, so I switched to reading the data from the file, writing it to a temporary file, and then uploading the file to S3. Though this performed better in terms of memory, CPU took a hit. I wonder if there is any way to allocate a fixed amount of memory once and then use it among different goroutines?
What I would want is: if I have 4 goroutines, then I can allocate 4 different arrays of x MB and reuse the same array in each goroutine invocation, so that a goroutine doesn't allocate memory every time and also doesn't depend on the GC to free it.
Edit: Adding a crux of my code. My go consumer looks like:
type Block struct {
    offset int64
    size   int64
}

func consumer(blocks []Block) {
    var dataArr []byte
    for _, block := range blocks {
        data := file.Read(block.offset, block.size)
        dataArr = append(dataArr, data...)
    }
    upload(dataArr)
}
I read the data from the file based on Blocks; a block can contain several small chunks limited to x MB in total, or one big chunk of x MB.
Edit 2: I tried sync.Pool based on suggestions in the comments, but I did not see any improvement in memory consumption. Am I doing something wrong?
var pool *sync.Pool

func main() {
    pool = &sync.Pool{
        New: func() interface{} {
            return make([]byte, 16777216)
        },
    }
    for i := 0; i < 4; i++ {
        // blocks is a 2-d array; each index contains an array of blocks.
        go consumer(blocks[i])
    }
}

func consumer(blocks []Block) {
    d := pool.Get().([]byte)
    for _, block := range blocks {
        file.Read(block.offset, block.size, d[block.offset:block.size])
    }
    upload(d)
    pool.Put(d)
}
Take a look at SA6002 of staticcheck, about sync.Pool: putting a bare []byte into a Pool allocates on every Put, because the slice header has to be boxed into an interface{}. You can also use the pprof tool to profile where the allocations come from.
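The usual workaround SA6002 points at is to store a pointer to the slice in the pool. A minimal sketch (imports elided; the 16 MB buffer size is taken from the question):

var pool = sync.Pool{
    New: func() interface{} {
        b := make([]byte, 16*1024*1024)
        return &b // store a pointer so Get/Put don't allocate
    },
}

func consume() {
    bp := pool.Get().(*[]byte)
    defer pool.Put(bp) // putting the pointer back is allocation-free

    buf := (*bp)[:0] // reuse the full capacity of the pooled array
    _ = buf          // ... read file data into buf and upload it ...
}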
I am trying to understand how copyBuffer works under the hood, but what is not clear to me is the use of the loop:
for {
    nr, er := src.Read(buf)
    // ...
}
Full code below:
// copyBuffer is the actual implementation of Copy and CopyBuffer.
// If buf is nil, one is allocated.
func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) {
    // If the reader has a WriteTo method, use it to do the copy.
    // Avoids an allocation and a copy.
    if wt, ok := src.(WriterTo); ok {
        return wt.WriteTo(dst)
    }
    // Similarly, if the writer has a ReadFrom method, use it to do the copy.
    if rt, ok := dst.(ReaderFrom); ok {
        return rt.ReadFrom(src)
    }
    size := 32 * 1024
    if l, ok := src.(*LimitedReader); ok && int64(size) > l.N {
        if l.N < 1 {
            size = 1
        } else {
            size = int(l.N)
        }
    }
    if buf == nil {
        buf = make([]byte, size)
    }
    for {
        nr, er := src.Read(buf)
        if nr > 0 {
            nw, ew := dst.Write(buf[0:nr])
            if nw > 0 {
                written += int64(nw)
            }
            if ew != nil {
                err = ew
                break
            }
            if nr != nw {
                err = ErrShortWrite
                break
            }
        }
        if er != nil {
            if er != EOF {
                err = er
            }
            break
        }
    }
    return written, err
}
It writes with nw, ew := dst.Write(buf[0:nr]), where nr is the number of bytes read, so why is the loop necessary?
Let's assume that src does not implement WriterTo and dst does not implement ReaderFrom, since otherwise we would not get down to the for loop at all.
Let's further assume, for simplicity, that src is not a *LimitedReader, so that size is 32 * 1024: 32 kBytes. (There is no real loss of generality here, as a LimitedReader just allows the source to pick an even smaller number, at least in this case.)
Finally, let's assume buf is nil. (Or, if it's not nil, let's assume it has a capacity of 32768 bytes. If it has a large capacity, we can just change the rest of the assumptions below, so that src has more bytes than there are in the buffer.)
So: we enter the loop with size holding the size of the temporary buffer buf, which is 32k. Now suppose the source is a file that holds 64k. It will take at least two src.Read() calls to read it! Clearly we need an outer loop. That's the overall for here.
Now suppose that src.Read() really does read the full 32k, so that nr is also 32 * 1024. The code will now call dst.Write(), passing the full 32k of data. Unlike src.Read()—which is allowed to only read, say, 1k instead of the full 32k—the next chunk of code requires that dst.Write() write all 32k. If it doesn't, the loop will break with err set to ErrShortWrite.
(An alternative would have been to keep calling dst.Write() with the remaining bytes, so that dst.Write() could write only 1k of the 32k, requiring 32 calls to get it all written.)
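That alternative would look roughly like this (a hypothetical helper, not what the standard library does):

// writeAll keeps calling w.Write with the remaining bytes until everything
// is written or an error occurs, tolerating short writes.
func writeAll(w io.Writer, buf []byte) (int, error) {
    written := 0
    for written < len(buf) {
        n, err := w.Write(buf[written:])
        written += n
        if err != nil {
            return written, err
        }
        if n == 0 {
            return written, io.ErrShortWrite // avoid looping forever
        }
    }
    return written, nil
}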
Note that src.Read() can choose to read only, say, 1k instead of 32k. If the actual file is 64k, it will then take 64 trips, rather than 2, through the outer loop. (An alternative choice would have been to force such a reader to be wrapped in a LimitedReader. That's not as flexible, though, and is not what LimitedReader is intended for.)
func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error)
When the total data size to copy is larger than len(buf), nr, er := src.Read(buf) will read at most len(buf) bytes each time.
that's how copyBuffer works:
for {
    copy up to `len(buf)` bytes from `src` to `dst`
    if EOF {
        // done
        break
    }
    if other error {
        return error
    }
}
In the normal case, you would just call Copy rather than CopyBuffer.
func Copy(dst Writer, src Reader) (written int64, err error) {
    return copyBuffer(dst, src, nil)
}
The option to have a user-supplied buffer is, I think, just for extreme optimization scenarios. The use of the word "Buffer" in the name is possibly a source of confusion since the function is not copying the buffer -- just using it internally.
There are two reasons for the looping...
The buffer might not be large enough to copy all of the data (the size of which is not necessarily known in advance) in one pass.
Reader, though not 'Writer', may return partial results when it makes sense to do so.
Regarding the second item, consider that the Reader does not necessarily represent a fixed file or data buffer. It could, instead, be a live stream from some other thread or process. As such, there are many valid scenarios for stream data to be read and processed on an as-available basis. Although CopyBuffer doesn't do this, it still has to work with such behaviors from any Reader.
I am working on a chat bot for the site Twitch.tv that is written in Go.
One of the features of the bot is a points system that rewards users for watching a particular stream. This data is stored in a SQLite3 database.
To get the viewers, the bot makes an API call to twitch and gathers all of the current viewers of a stream. These viewers are then put in a slice of strings.
Total viewers can range anywhere from a couple to 20,000 or more.
What the bot does
Makes API call
Stores all viewers in a slice of strings
The bot iterates over the viewers and adds points to each accordingly.
The bot clears the slice before the next iteration.
Code
type Viewers struct {
    Chatters struct {
        CurrentModerators []string `json:"moderators"`
        CurrentViewers    []string `json:"viewers"`
    } `json:"chatters"`
}

func RunPoints(timer time.Duration, modifier int, conn net.Conn, channel string) {
    database := InitializeDB() // Loads database through SQLite3 driver
    var Points int
    var allUsers []string
    for range time.NewTicker(timer * time.Second).C {
        currentUsers := GetViewers(conn, channel)
        tx, err := database.Begin()
        if err != nil {
            fmt.Println("Error starting points transaction: ", err)
        }
        allUsers = append(allUsers, currentUsers.Chatters.CurrentViewers...)
        allUsers = append(allUsers, currentUsers.Chatters.CurrentModerators...)
        for _, v := range allUsers {
            userCheck := UserInDB(database, v)
            if userCheck == false {
                statement, _ := tx.Prepare("INSERT INTO points (Username, Points) VALUES (?, ?)")
                statement.Exec(v, 1)
            } else {
                err = tx.QueryRow("Select Points FROM points WHERE Username = ?", v).Scan(&Points)
                if err != nil {
                } else {
                    Points = Points + modifier
                    statement, _ := tx.Prepare("UPDATE points SET Points = ? WHERE username = ?")
                    statement.Exec(Points, v)
                }
            }
        }
        tx.Commit()
        allUsers = allUsers[:0]
        currentUsers = Viewers{} // Clear Viewer object struct
    }
}
Expected Behavior
When pulling thousands of viewers, naturally, I expect the system resource usage to get pretty high: the bot can go from 3.0 MB of RAM up to 20 MB+. Thousands of elements take up a lot of space, of course!
However, something else happens.
Actual Behavior
Each time the API is called, the RAM increases as expected. But because I clear the slice, I expect it to fall back down to its 'normal' 3.0 MB of usage.
However, the RAM usage increases with each API call and doesn't go back down, even if the total number of viewers of a stream decreases.
Thus, given a few hours, the bot will easily consume 100+ MB of RAM, which doesn't seem right to me.
What am I missing here? I'm fairly new to programming and CS in general, so perhaps I am trying to fix something that isn't a problem. But this almost sounds like a memory leak to me.
I have tried forcing garbage collection and freeing the memory through Go's runtime library, but this does not fix it.
To understand what's happening here, you need to understand the internals of a slice and what's happening with it. You should probably start with https://blog.golang.org/go-slices-usage-and-internals
To give a brief answer: A slice gives a view into a portion of an underlying array, and when you are attempting to truncate your slice, all you're doing is reducing the view you have over the array, but the underlying array remains unaffected and still takes up just as much memory. In fact, by continuing to use the same array, you're never going to decrease the amount of memory you're using.
I'd encourage you to read up on how this works, but as an example for why no actual memory would be freed up, take a look at the output from this simple program that demos how changes to a slice will not truncate the memory allocated under the hood: https://play.golang.org/p/PLEZba8uD-L
When you reslice the slice:
allUsers = allUsers[:0]
All the elements are still in the backing array and cannot be collected. The memory is still allocated, which will save some time in the next run (it doesn't have to resize the array so much, saving slow allocations), which is the point of reslicing it to zero length instead of just dumping it.
If you want the memory released to the GC, you'd need to just dump it altogether and create a new slice every time. This would be slower, but use less memory between runs. However, that doesn't necessarily mean you'll see less memory used by the process. The GC collects unused heap objects, then may eventually free that memory to the OS, which may eventually reclaim it, if other processes are applying memory pressure.
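A short runnable sketch of the difference:

package main

import "fmt"

func main() {
    users := make([]string, 0, 20000)
    users = append(users, "viewer1", "viewer2")

    users = users[:0]                   // length reset, backing array retained
    fmt.Println(len(users), cap(users)) // 0 20000

    users = nil                         // drop the backing array; it is now eligible for GC
    fmt.Println(len(users), cap(users)) // 0 0
}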
I want to build a buffer in Go that supports multiple concurrent readers and one writer. Whatever is written to the buffer should be read by all readers. New readers are allowed to drop in at any time, which means already written data must be able to be played back for late readers.
The buffer should satisfy the following interface:
type MyBuffer interface {
    Write(p []byte) (n int, err error)
    NextReader() io.Reader
}
Do you have any suggestions for such an implementation preferably using built in types?
Depending on the nature of this writer and how you use it, keeping everything in memory (to be able to replay everything for readers joining later) is very risky: it might demand a lot of memory, or cause your app to crash due to running out of memory.
Using it for a "low-traffic" logger, keeping everything in memory is probably OK, but streaming audio or video, for example, is most likely not.
If the reader implementations below have read all the data that was written to the buffer, their Read() method will properly report io.EOF. Care must be taken, as some constructs (such as bufio.Scanner) may not read more data once io.EOF is encountered (but this is not a flaw of our implementation).
If you want the readers of our buffer to block when no more data is available, waiting until new data is written instead of returning io.EOF, you may wrap the returned readers in a "tail reader" presented here: Go: "tail -f"-like generator.
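Such a wrapper could be as simple as this polling sketch (an assumption on my part; imports of io and time elided, and the retry interval is arbitrary):

// tailReader blocks on io.EOF and polls for new data instead of
// returning it, so consumers see a never-ending stream.
type tailReader struct {
    r io.Reader
}

func (t tailReader) Read(p []byte) (n int, err error) {
    for {
        n, err = t.r.Read(p)
        if n > 0 || err != io.EOF {
            return n, err
        }
        time.Sleep(100 * time.Millisecond) // wait for more data to be written
    }
}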
"Memory-safe" file implementation
Here is an extremely simple and elegant solution. It uses a file to write to, and also uses files to read from. The synchronization is basically provided by the operating system. This does not risk an out-of-memory error, as the data is stored solely on disk. Depending on the nature of your writer, this may or may not be sufficient.
I will rather use the following interface, because Close() is important in case of files.
type MyBuf interface {
    io.WriteCloser
    NewReader() (io.ReadCloser, error)
}
And the implementation is extremely simple:
type mybuf struct {
    *os.File
}

func (mb *mybuf) NewReader() (io.ReadCloser, error) {
    f, err := os.Open(mb.Name())
    if err != nil {
        return nil, err
    }
    return f, nil
}

func NewMyBuf(name string) (MyBuf, error) {
    f, err := os.Create(name)
    if err != nil {
        return nil, err
    }
    return &mybuf{File: f}, nil
}
Our mybuf type embeds *os.File, so we get the Write() and Close() methods for "free".
The NewReader() method simply opens the existing backing file for reading (in read-only mode) and returns it, again taking advantage of the fact that *os.File implements io.ReadCloser.
Creating a new MyBuf value is implemented in the NewMyBuf() function, which may also return an error if creating the file fails.
Notes:
Note that since mybuf embeds *os.File, it is possible with a type assertion to "reach" other exported methods of os.File even though they are not part of the MyBuf interface. I do not consider this a flaw, but if you want to disallow it, you have to change the implementation of mybuf to not embed *os.File but rather have it as a named field (but then you have to add the Write() and Close() methods yourself, properly forwarding to the os.File field).
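Using it could look like this (a sketch; the file name is arbitrary, imports are elided, and io.ReadAll requires Go 1.16+ -- use ioutil.ReadAll on older versions):

func main() {
    buf, err := NewMyBuf("shared.buf")
    if err != nil {
        log.Fatal(err)
    }
    defer buf.Close()

    buf.Write([]byte("hello\n"))

    r, err := buf.NewReader()
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    data, err := io.ReadAll(r) // reads everything written so far
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s", data) // hello
}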
In-memory implementation
If the file implementation is not sufficient, here comes an in-memory implementation.
Since we're now in-memory only, we will use the following interface:
type MyBuf interface {
    io.Writer
    NewReader() io.Reader
}
The idea is to store all byte slices that are ever passed to our buffer. Readers will be served the stored slices when Read() is called; each reader keeps track of how many of the stored slices were served by its Read() method. Synchronization must be dealt with; we will use a simple sync.RWMutex.
Without further ado, here is the implementation:
type mybuf struct {
    data [][]byte
    sync.RWMutex
}

func (mb *mybuf) Write(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    // Cannot retain p, so we must copy it:
    p2 := make([]byte, len(p))
    copy(p2, p)
    mb.Lock()
    mb.data = append(mb.data, p2)
    mb.Unlock()
    return len(p), nil
}

type mybufReader struct {
    mb   *mybuf // buffer we read from
    i    int    // next slice index
    data []byte // current data slice to serve
}

func (mbr *mybufReader) Read(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    // Do we have data to send?
    if len(mbr.data) == 0 {
        mb := mbr.mb
        mb.RLock()
        if mbr.i < len(mb.data) {
            mbr.data = mb.data[mbr.i]
            mbr.i++
        }
        mb.RUnlock()
    }
    if len(mbr.data) == 0 {
        return 0, io.EOF
    }
    n = copy(p, mbr.data)
    mbr.data = mbr.data[n:]
    return n, nil
}

func (mb *mybuf) NewReader() io.Reader {
    return &mybufReader{mb: mb}
}

func NewMyBuf() MyBuf {
    return &mybuf{}
}
Note that the general contract of Writer.Write() includes that an implementation must not retain the passed slice, so we have to make a copy of it before "storing" it.
Also note that the Read() of readers attempts to hold the lock for a minimal amount of time. That is, it only locks if it needs a new data slice from the buffer, and it only does read-locking; if the reader still holds a partial data slice, it will serve that in Read() without locking or touching the buffer.
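A quick usage sketch (imports elided): a late reader still sees everything written before it joined.

func main() {
    buf := NewMyBuf()
    buf.Write([]byte("first "))

    r1 := buf.NewReader()
    buf.Write([]byte("second"))
    r2 := buf.NewReader() // joins late, still gets all the data

    for _, r := range []io.Reader{r1, r2} {
        data, _ := io.ReadAll(r)
        fmt.Printf("%q\n", data) // "first second" both times
    }
}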
I linked to the append-only commit log because it seems very similar to your requirements. I am pretty new to distributed systems and the commit log, so I may be butchering a couple of the concepts, but the Kafka introduction clearly explains everything with nice charts.
Go is also pretty new to me, so I'm sure there's a better way to do it, but perhaps you could model your buffer as a slice. I think there are a couple of cases:
the buffer has no readers: new data is written to the buffer, the buffer length grows
the buffer has one/many reader(s):
a reader subscribes to the buffer
the buffer creates and returns a channel to that client
the buffer maintains a list of client channels
when a write occurs, it loops through all client channels and publishes to each (pub-sub)
This addresses a pub-sub real-time consumer stream, where messages are fanned out, but does not address the backfill.
Kafka enables a backfill and their intro illustrates how it can be done :)
This offset is controlled by the consumer: normally a consumer will
advance its offset linearly as it reads records, but, in fact, since
the position is controlled by the consumer it can consume records in
any order it likes. For example a consumer can reset to an older
offset to reprocess data from the past or skip ahead to the most
recent record and start consuming from "now".
This combination of features means that Kafka consumers are very
cheap—they can come and go without much impact on the cluster or on
other consumers. For example, you can use our command line tools to
"tail" the contents of any topic without changing what is consumed by
any existing consumers.
I had to do something similar as part of an experiment, so sharing:
type MultiReaderBuffer struct {
    mu  sync.RWMutex
    buf []byte
}

func (b *MultiReaderBuffer) Write(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    b.mu.Lock()
    b.buf = append(b.buf, p...)
    b.mu.Unlock()
    return len(p), nil
}

func (b *MultiReaderBuffer) NewReader() io.Reader {
    return &mrbReader{mrb: b}
}

type mrbReader struct {
    mrb *MultiReaderBuffer
    off int
}

func (r *mrbReader) Read(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    r.mrb.mu.RLock()
    n = copy(p, r.mrb.buf[r.off:])
    r.mrb.mu.RUnlock()
    if n == 0 {
        return 0, io.EOF
    }
    r.off += n
    return n, nil
}
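A hypothetical quick check of the reader semantics (imports elided): a reader that has caught up gets io.EOF, but keeps its offset, so it can pick up data written later.

func main() {
    var b MultiReaderBuffer
    r := b.NewReader()

    b.Write([]byte("abc"))
    p := make([]byte, 8)
    n, err := r.Read(p)
    fmt.Println(n, err) // 3 <nil>

    n, err = r.Read(p)
    fmt.Println(n, err) // 0 EOF -- caught up for now

    b.Write([]byte("def"))
    n, err = r.Read(p)
    fmt.Println(n, err) // 3 <nil> -- resumes at its saved offset
}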