I want to build a buffer in Go that supports multiple concurrent readers and one writer. Whatever is written to the buffer should be read by all readers. New readers are allowed to drop in at any time, which means already written data must be able to be played back for late readers.
The buffer should satisfy the following interface:
type MyBuffer interface {
    Write(p []byte) (n int, err error)
    NextReader() io.Reader
}
Do you have any suggestions for such an implementation, preferably using built-in types?

Depending on the nature of this writer and how you use it, keeping everything in memory (to be able to re-play everything for readers joining later) is very risky and might demand a lot of memory, or cause your app to crash due to out of memory.
Using it for a "low-traffic" logger keeping everything in memory is probably ok, but for example streaming some audio or video is most likely not.
Once the reader implementations below have read all the data that was written to the buffer, their Read() method will properly report io.EOF. Care must be taken, as some constructs (such as bufio.Scanner) may not read more data once io.EOF is encountered (but this is not a flaw of our implementation).
If you want the readers of our buffer to block until new data is written, instead of returning io.EOF when no more data is available, you may wrap the returned readers in a "tail reader" presented here: Go: "tail -f"-like generator.
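As a rough illustration of the "tail reader" idea, here is a minimal polling sketch. It is not the code from the linked answer: tailReader, its interval and deadline fields, and the readLate helper are hypothetical names, and a temporary file stands in for the buffer.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

// tailReader (hypothetical) wraps a reader and polls it instead of
// reporting io.EOF, until the deadline passes.
type tailReader struct {
	src      io.Reader
	interval time.Duration // pause between polls
	deadline time.Time     // report io.EOF once this has passed
}

func (t *tailReader) Read(p []byte) (int, error) {
	if len(p) == 0 {
		return 0, nil
	}
	for {
		n, err := t.src.Read(p)
		if n > 0 {
			// Deliver the data; an error seen alongside it will
			// surface again on the next call.
			return n, nil
		}
		if err != nil && err != io.EOF {
			return 0, err
		}
		if time.Now().After(t.deadline) {
			return 0, io.EOF
		}
		time.Sleep(t.interval)
	}
}

// readLate starts a tail reader on an empty temp file and only then
// writes msg from another goroutine. It returns what the reader saw.
func readLate(msg string) (string, error) {
	w, err := os.CreateTemp("", "tail-demo")
	if err != nil {
		return "", err
	}
	defer os.Remove(w.Name())
	defer w.Close()

	r, err := os.Open(w.Name())
	if err != nil {
		return "", err
	}
	defer r.Close()

	go func() {
		time.Sleep(20 * time.Millisecond)
		w.WriteString(msg)
	}()

	tr := &tailReader{src: r, interval: 5 * time.Millisecond, deadline: time.Now().Add(2 * time.Second)}
	buf := make([]byte, len(msg))
	n, err := io.ReadFull(tr, buf)
	return string(buf[:n]), err
}

func main() {
	got, err := readLate("written after the reader started")
	fmt.Printf("%q %v\n", got, err)
}
```

Polling keeps the sketch short; a production version would more likely be woken up by the writer instead of sleeping.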
"Memory-safe" file implementation
Here is an extremely simple and elegant solution. It uses a file to write to, and also uses files to read from. The synchronization is basically provided by the operating system. This does not risk out of memory error, as the data is solely stored on the disk. Depending on the nature of your writer, this may or may not be sufficient.
I would rather use the following interface, because Close() is important in the case of files.
type MyBuf interface {
    io.WriteCloser
    NewReader() (io.ReadCloser, error)
}
And the implementation is extremely simple:
type mybuf struct {
    *os.File
}
func (mb *mybuf) NewReader() (io.ReadCloser, error) {
    f, err := os.Open(mb.Name())
    if err != nil {
        return nil, err
    }
    return f, nil
}
func NewMyBuf(name string) (MyBuf, error) {
    f, err := os.Create(name)
    if err != nil {
        return nil, err
    }
    return &mybuf{File: f}, nil
}
Our mybuf type embeds *os.File, so we get the Write() and Close() methods for "free".
NewReader() simply opens the existing backing file for reading (in read-only mode) and returns it, again taking advantage of the fact that *os.File implements io.ReadCloser.
Creating a new MyBuf value is implemented in the NewMyBuf() function, which may also return an error if creating the file fails.
Note that since mybuf embeds *os.File, it is possible with a type assertion to "reach" other exported methods of os.File even though they are not part of the MyBuf interface. I do not consider this a flaw, but if you want to disallow this, you have to change the implementation of mybuf to not embed os.File but rather have it as a named field (but then you have to add the Write() and Close() methods yourself, properly forwarding to the os.File field).
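A minimal sketch of that named-field variant could look like the following. The names fileBuf, newFileBuf and replay are made up for this illustration, and the temp directory only exists so the example cleans up after itself.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// fileBuf keeps the *os.File in a named field, so a type assertion on
// the value cannot reach any other os.File methods.
type fileBuf struct {
	f *os.File
}

func newFileBuf(name string) (*fileBuf, error) {
	f, err := os.Create(name)
	if err != nil {
		return nil, err
	}
	return &fileBuf{f: f}, nil
}

// Write and Close must now be forwarded by hand.
func (b *fileBuf) Write(p []byte) (int, error) { return b.f.Write(p) }
func (b *fileBuf) Close() error                { return b.f.Close() }

// NewReader opens the backing file read-only for a new reader.
func (b *fileBuf) NewReader() (io.ReadCloser, error) {
	f, err := os.Open(b.f.Name())
	if err != nil {
		return nil, err
	}
	return f, nil
}

// replay writes msg first and attaches a reader afterwards, to show
// that a late reader still gets everything.
func replay(msg string) (string, error) {
	dir, err := os.MkdirTemp("", "filebuf")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	b, err := newFileBuf(filepath.Join(dir, "data.log"))
	if err != nil {
		return "", err
	}
	defer b.Close()

	if _, err := b.Write([]byte(msg)); err != nil {
		return "", err
	}
	r, err := b.NewReader()
	if err != nil {
		return "", err
	}
	defer r.Close()

	data, err := io.ReadAll(r)
	return string(data), err
}

func main() {
	fmt.Println(replay("hello, late reader"))
}
```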
In-memory implementation
If the file implementation is not sufficient, here comes an in-memory implementation.
Since we're now in-memory only, we will use the following interface:
type MyBuf interface {
    io.Writer
    NewReader() io.Reader
}
The idea is to store every byte slice that is ever passed to our buffer. Readers serve the stored slices when Read() is called, and each reader keeps track of how many of the stored slices its Read() method has already served. Synchronization must be dealt with; we will use a simple sync.RWMutex.
Without further ado, here is the implementation:
type mybuf struct {
    data [][]byte
    sync.RWMutex
}
func (mb *mybuf) Write(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    // Cannot retain p, so we must copy it:
    p2 := make([]byte, len(p))
    copy(p2, p)
    mb.Lock()
    mb.data = append(mb.data, p2)
    mb.Unlock()
    return len(p), nil
}
type mybufReader struct {
    mb   *mybuf // buffer we read from
    i    int    // next slice index
    data []byte // current data slice to serve
}
func (mbr *mybufReader) Read(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    // Do we have data to send?
    if len(mbr.data) == 0 {
        mb := mbr.mb
        mb.RLock()
        if mbr.i < len(mb.data) {
            mbr.data = mb.data[mbr.i]
            mbr.i++
        }
        mb.RUnlock()
    }
    if len(mbr.data) == 0 {
        return 0, io.EOF
    }
    n = copy(p, mbr.data)
    mbr.data = mbr.data[n:]
    return n, nil
}
func (mb *mybuf) NewReader() io.Reader {
    return &mybufReader{mb: mb}
}
func NewMyBuf() MyBuf {
    return &mybuf{}
}
Note that the general contract of Writer.Write() includes that an implementation must not retain the passed slice, so we have to make a copy of it before "storing" it.
Also note that the Read() of readers attempts to hold the lock for a minimal amount of time. It only locks when it needs a new data slice from the buffer, and then only read-locks; if the reader still has a partial data slice, Read() serves it without locking or touching the buffer.

I linked to the append only commit log, because it seems very similar to your requirements. I am pretty new to distributed systems and the commit log so I may be butchering a couple of the concepts, but the kafka introduction clearly explains everything with nice charts.
Go is also pretty new to me, so I'm sure there's a better way to do it.
But perhaps you could model your buffer as a slice. I can think of a couple of cases:
buffer has no readers, new data is written to the buffer, buffer length grows
buffer has one/many reader(s):
reader subscribes to buffer
buffer creates and returns a channel to that client
buffer maintains a list of client channels
write occurs -> loops through all client channels and publishes to it (pub sub)
This addresses a pubsub real time consumer stream, where messages are fanned out, but does not address the backfill.
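The fan-out steps listed above could be sketched roughly as follows. This is only an illustration under my own assumptions: broadcaster, Subscribe, Close and collect are hypothetical names, and, as noted, a late subscriber gets no backfill.

```go
package main

import (
	"fmt"
	"sync"
)

// broadcaster keeps a list of client channels and publishes every
// written message to all of them.
type broadcaster struct {
	mu   sync.Mutex
	subs []chan []byte
}

// Subscribe registers a new client and returns its channel.
func (b *broadcaster) Subscribe(buffer int) <-chan []byte {
	ch := make(chan []byte, buffer)
	b.mu.Lock()
	b.subs = append(b.subs, ch)
	b.mu.Unlock()
	return ch
}

// Write copies p and sends the copy to every subscriber. It blocks
// while any subscriber's channel is full.
func (b *broadcaster) Write(p []byte) (int, error) {
	msg := append([]byte(nil), p...)
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		ch <- msg
	}
	return len(p), nil
}

// Close ends the stream for all subscribers.
func (b *broadcaster) Close() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		close(ch)
	}
	b.subs = nil
}

// collect drains a subscriber channel into one string.
func collect(ch <-chan []byte) string {
	var out []byte
	for m := range ch {
		out = append(out, m...)
	}
	return string(out)
}

// demo subscribes one client before any write and one in between.
func demo() (early, late string) {
	b := &broadcaster{}
	first := b.Subscribe(8)
	b.Write([]byte("one "))
	second := b.Subscribe(8) // joins late and misses "one "
	b.Write([]byte("two"))
	b.Close()
	return collect(first), collect(second)
}

func main() {
	early, late := demo()
	fmt.Printf("early=%q late=%q\n", early, late)
}
```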
Kafka enables a backfill and their intro illustrates how it can be done :)
This offset is controlled by the consumer: normally a consumer will
advance its offset linearly as it reads records, but, in fact, since
the position is controlled by the consumer it can consume records in
any order it likes. For example a consumer can reset to an older
offset to reprocess data from the past or skip ahead to the most
recent record and start consuming from "now".
This combination of features means that Kafka consumers are very
cheap—they can come and go without much impact on the cluster or on
other consumers. For example, you can use our command line tools to
"tail" the contents of any topic without changing what is consumed by
any existing consumers.
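To make the consumer-controlled offset idea concrete, here is a small in-memory sketch. It is not Kafka's API: commitLog, consumer and their methods are names invented for this example.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// commitLog is an append-only list of records. It knows nothing about
// its consumers.
type commitLog struct {
	mu      sync.RWMutex
	records []string
}

func (l *commitLog) Append(rec string) {
	l.mu.Lock()
	l.records = append(l.records, rec)
	l.mu.Unlock()
}

func (l *commitLog) Len() int {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return len(l.records)
}

// ReadAt returns the record at offset, or false if there is none yet.
func (l *commitLog) ReadAt(offset int) (string, bool) {
	l.mu.RLock()
	defer l.mu.RUnlock()
	if offset < 0 || offset >= len(l.records) {
		return "", false
	}
	return l.records[offset], true
}

// consumer owns its offset, so it can start or rewind anywhere.
type consumer struct {
	log    *commitLog
	offset int
}

// Drain reads every record from the current offset to the end.
func (c *consumer) Drain() string {
	var got []string
	for {
		rec, ok := c.log.ReadAt(c.offset)
		if !ok {
			return strings.Join(got, ",")
		}
		got = append(got, rec)
		c.offset++
	}
}

func demo() (replayed, tail, rewound string) {
	l := &commitLog{}
	for _, r := range []string{"a", "b", "c"} {
		l.Append(r)
	}
	late := &consumer{log: l}                  // backfills from offset 0
	live := &consumer{log: l, offset: l.Len()} // starts at "now"
	l.Append("d")
	replayed = late.Drain()
	tail = live.Drain()
	late.offset = 2 // reset to an older offset to reprocess
	rewound = late.Drain()
	return
}

func main() {
	fmt.Println(demo())
}
```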

I had to do something similar as part of an experiment, so sharing:
type MultiReaderBuffer struct {
    mu  sync.RWMutex
    buf []byte
}
func (b *MultiReaderBuffer) Write(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    b.mu.Lock()
    b.buf = append(b.buf, p...)
    b.mu.Unlock()
    return len(p), nil
}
func (b *MultiReaderBuffer) NewReader() io.Reader {
    return &mrbReader{mrb: b}
}
type mrbReader struct {
    mrb *MultiReaderBuffer
    off int
}
func (r *mrbReader) Read(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    // Read-lock only while copying out of the shared buffer.
    r.mrb.mu.RLock()
    n = copy(p, r.mrb.buf[r.off:])
    r.mrb.mu.RUnlock()
    if n == 0 {
        return 0, io.EOF
    }
    r.off += n
    return n, nil
}


sync.Map possibly leading to an increase in RAM and goroutines

Hi, here is the code for a util I call a collector:
import (
    "context"
    "sync"
    "time"
)
type Collector struct {
    keyValMap *sync.Map
}
func (c *Collector) LoadOrWait(key any) (retValue any, availability int, err error) {
    value, status := c.getStatusAndValue(key)
    switch status {
    case 0:
        return nil, 0, nil
    case 1:
        return value, 1, nil
    case 2:
        ctxWithTimeout, _ := context.WithTimeout(context.Background(), 5 * time.Second)
        for {
            select {
            case <-ctxWithTimeout.Done():
                return nil, 0, errRequestTimeout
            default:
                value, resourceStatus := c.getStatusAndValue(key)
                if resourceStatus == 1 {
                    return value, 1, nil
                }
                time.Sleep(50 * time.Millisecond)
            }
        }
    }
    return nil, 0, errRequestTimeout
}
// Store ...
func (c *Collector) Store(key any, value any) {
    c.keyValMap.Store(key, value)
}
func (c *Collector) getStatusAndValue(key any) (retValue any, availability int) {
    var empty any
    result, loaded := c.keyValMap.LoadOrStore(key, empty)
    if loaded && result != empty {
        return result, 1
    }
    if loaded && result == empty {
        return empty, 2
    }
    return nil, 0
}
The purpose of this util is to act as a cache where a given value is loaded only once but read many times. However, when a Collector object is passed to multiple goroutines, I see an increase in goroutines and RAM usage whenever several goroutines try to use the collector cache. Could someone explain whether this usage of sync.Map is correct? If it is, what might be causing the high number of goroutines and the high RAM usage?
You are likely leaking resources because the cancel func of the newly created ctxWithTimeout context is never called. To fix this, change that line to these:
ctxWithTimeout, cancelFunc := context.WithTimeout(context.Background(), requestTimeout)
defer cancelFunc()
Thanks to this, the resources held by the context are released as soon as the function returns, instead of lingering until the timeout expires. This should address the issue of the leaks.
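For illustration, here is a self-contained polling loop that always cancels its context. It is a sketch with assumed names (waitFor, errTimeout and the 5 ms tick are not from the question's code):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errTimeout = errors.New("timed out waiting for value")

// waitFor polls ready until it reports true or the timeout elapses.
func waitFor(ready func() bool, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // releases the context's timer on every return path

	ticker := time.NewTicker(5 * time.Millisecond)
	defer ticker.Stop()

	for {
		if ready() {
			return nil
		}
		select {
		case <-ctx.Done():
			return errTimeout
		case <-ticker.C:
			// poll again
		}
	}
}

func main() {
	start := time.Now()
	ready := func() bool { return time.Since(start) > 20*time.Millisecond }
	fmt.Println(waitFor(ready, time.Second))
	fmt.Println(waitFor(func() bool { return false }, 30*time.Millisecond))
}
```

Using a ticker inside the select, rather than time.Sleep in a default branch, lets the loop react to the deadline without waiting for the sleep to finish.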
The usage of sync.Map itself looks fine to me.
Let me know if this solves your issue or if there is something else to address, thanks!
You show the code on the reader side of things, but not the code which does the request (and calls .Store(key, value)).
With the code you show:
the first goroutine that tries to access a given key stores your empty value in the map (when executing c.keyValMap.LoadOrStore(key, empty)),
so all goroutines that come afterwards and query the same key enter the "query with timeout" loop, even if the action that actually runs the request and stores its result in the cache is never executed.
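The effect can be reproduced in isolation. The sketch below mirrors the status codes of getStatusAndValue; probe and statuses are hypothetical helper names used only here.

```go
package main

import (
	"fmt"
	"sync"
)

// probe mirrors the lookup from the question: LoadOrStore with a nil
// placeholder. Status 0 means the key was absent (the placeholder has
// just been stored), 1 means a real value is present, 2 means only the
// placeholder is there.
func probe(m *sync.Map, key string) (value any, status int) {
	result, loaded := m.LoadOrStore(key, nil)
	switch {
	case !loaded:
		return nil, 0
	case result != nil:
		return result, 1
	default:
		return nil, 2
	}
}

// statuses probes the same key twice and once more after a Store.
func statuses() (first, second, afterStore int) {
	var m sync.Map
	_, first = probe(&m, "k")  // stores the placeholder
	_, second = probe(&m, "k") // finds only the placeholder
	m.Store("k", "v")
	_, afterStore = probe(&m, "k")
	return
}

func main() {
	fmt.Println(statuses())
}
```

The second probe already reports status 2 although nothing was ever asked to load the value, which is why later callers end up in the waiting loop.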
[after your update]
The code for your collector alone seems to be ok regarding resource consumption : I don't see deadlocks or multiplication of goroutines in that code alone.
You should probably look at other places in your code.
Also, if this structure only grows and never shrinks, it is bound to consume more memory. Do audit your program to evaluate how many different keys can live together in your cache and how much memory the cached values can occupy.

How to range over slice of a custom type

I'm trying to write in Go custom cache for Google DataStore (more precisely - a wrapper around one of existing cache libraries). At cache initialisation, it should accept any custom type of struct (with appropriately-defined datastore fields), which then would be the basis for all items stored. The idea is that cache can be created/initialised for various types which reflect the structure of a particular DataStore entry (CustomEntry)
Approach 1 - store reflect.Type and use it. Problem encountered - can't iterate over a slice of a custom type
type CustomEntry struct {
    Data struct {
        name    string   `datastore:"name,noindex"`
        address []string `datastore:"address,noindex"`
    } `datastore:"data,noindex"`
}
func (cache *MyCache) CacheData(dataQuery string, dataType reflect.Type) {
    slice := reflect.MakeSlice(reflect.SliceOf(dataType), 10, 10)
    if keys, err := DataStoreClient.GetAll(cache.ctx, datastore.NewQuery(dataQuery), &slice); err != nil {
        //handle error
    } else {
        for i, dataEntry := range slice {
            // ERROR: Cannot range over 'slice' (type Value)
            cache.Set(keys[i].Name, dataEntry)
        }
    }
}
//usage: Cache.CacheData("Person", reflect.TypeOf(CustomEntry{}))
Approach 2 - accept a slice of interfaces as an argument. Problem encountered - []CustomEntry is not []interface{}
func (cache *MyCache) CacheData(dataQuery string, dataType []interface{}) {
    if keys, err := DataStoreClient.GetAll(cache.ctx, datastore.NewQuery(dataQuery), &dataType); err != nil {
        //handle error
    } else {
        for i, dataEntry := range dataType {
            // this seems to work fine
            cache.Set(keys[i].Name, dataEntry)
        }
    }
}
var data []CustomEntry
Cache.CacheData("Person", data)
// ERROR: Cannot use 'data' (type []CustomEntry) as type []interface{}
Any suggestions would be highly appreciated.
I have found a solution and thought it might be worth sharing in case anyone else has a similar problem.
The easiest way is to declare a slice of the structs which the DataStore is expected to fill, and then to pass a pointer to it as an argument (interface{}) into the desired function. DataStore, similarly to a few unmarshaling functions (I have tried with the JSON package), will be able to successfully append the data to it.
Trying to dynamically create the slice within the function, given a certain Type, which would be then accepted by a function (such as DataStore client) might be quite difficult (I have not managed to find a way to do it). Similarly, passing a slice of interfaces (to allow for easy iteration) only complicates things.
Secondly, in order to iterate over the data (e.g. to store it in cache), it is necessary to:
(1) retrieve the underlying value of the interface (i.e. the pointer itself) - this can be achieved using reflect.ValueOf(pointerInterface),
(2) dereference the pointer so that we obtain access to the underlying, iterable slice of structs - this can be done by invoking .Elem(),
(3) iterate over the underlying slice using .Index(i) method (range will not accept an interface, even if the underlying type is iterable).
Naturally, adding a number of switch-case statements might be appropriate to ensure that any errors are caught rather than cause a runtime panic.
Hence the following code provides a working solution to the above problem:
In main:
var data []customEntry
And the function itself:
func (cache *MyCache) CacheData(dataQuery string, data interface{}) error {
    if keys, err := DataStoreClient.GetAll(cache.ctx, datastore.NewQuery(dataQuery), data); err != nil {
        return err
    } else {
        s := reflect.ValueOf(data).Elem()
        for i := 0; i < s.Len(); i++ {
            // s.Index(i) is a reflect.Value; use s.Index(i).Interface()
            // if the cache should hold the underlying struct instead.
            cache.Set(keys[i].Name, s.Index(i), 1)
        }
    }
    return nil
}
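The three reflection steps can be tried without DataStore. In this sketch, entry, fill and elements are made-up names, and fill merely stands in for a call that appends to the slice behind a pointer, the way GetAll is described above.

```go
package main

import (
	"fmt"
	"reflect"
)

// entry is a placeholder for a datastore-backed struct.
type entry struct {
	Name string
}

// fill stands in for a loader that receives a pointer to a slice as an
// interface value and appends results to it (hypothetical).
func fill(dst any) {
	ptr := dst.(*[]entry)
	*ptr = append(*ptr, entry{"alice"}, entry{"bob"})
}

// elements walks a pointer-to-slice that arrives as an interface value.
func elements(data any) ([]any, error) {
	v := reflect.ValueOf(data) // (1) the pointer itself
	if v.Kind() != reflect.Pointer || v.Elem().Kind() != reflect.Slice {
		return nil, fmt.Errorf("want pointer to slice, got %T", data)
	}
	s := v.Elem() // (2) the slice it points to
	out := make([]any, 0, s.Len())
	for i := 0; i < s.Len(); i++ { // (3) index instead of range
		out = append(out, s.Index(i).Interface())
	}
	return out, nil
}

func main() {
	var data []entry
	fill(&data)
	items, err := elements(&data)
	fmt.Println(items, err)
}
```

The kind check up front is one way to turn a would-be runtime panic into an ordinary error.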

Why does copyBuffer use a loop?

I am trying to understand how copyBuffer works under the hood, but what is not clear to me is the use of the loop:
for {
nr, er := src.Read(buf)
Full code below:
// copyBuffer is the actual implementation of Copy and CopyBuffer.
// if buf is nil, one is allocated.
func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) {
    // If the reader has a WriteTo method, use it to do the copy.
    // Avoids an allocation and a copy.
    if wt, ok := src.(WriterTo); ok {
        return wt.WriteTo(dst)
    }
    // Similarly, if the writer has a ReadFrom method, use it to do the copy.
    if rt, ok := dst.(ReaderFrom); ok {
        return rt.ReadFrom(src)
    }
    size := 32 * 1024
    if l, ok := src.(*LimitedReader); ok && int64(size) > l.N {
        if l.N < 1 {
            size = 1
        } else {
            size = int(l.N)
        }
    }
    if buf == nil {
        buf = make([]byte, size)
    }
    for {
        nr, er := src.Read(buf)
        if nr > 0 {
            nw, ew := dst.Write(buf[0:nr])
            if nw > 0 {
                written += int64(nw)
            }
            if ew != nil {
                err = ew
                break
            }
            if nr != nw {
                err = ErrShortWrite
                break
            }
        }
        if er != nil {
            if er != EOF {
                err = er
            }
            break
        }
    }
    return written, err
}
It calls nw, ew := dst.Write(buf[0:nr]), where nr is the number of bytes read, so why is the loop necessary?
Let's assume that src does not implement WriterTo and dst does not implement ReaderFrom, since otherwise we would not get down to the for loop at all.
Let's further assume, for simplicity, that src is not a *LimitedReader, so that size is 32 * 1024: 32 kBytes. (There is no real loss of generality here, as a LimitedReader just lets the code pick an even smaller buffer size, at least in this case.)
Finally, let's assume buf is nil. (Or, if it's not nil, let's assume it has a length of 32768 bytes. If it is larger, we can just change the rest of the assumptions below, so that src has more bytes than fit in the buffer.)
So: we enter the loop with size holding the size of the temporary buffer buf, which is 32k. Now suppose the source is a file that holds 64k. It will take at least two src.Read() calls to read it! Clearly we need an outer loop. That's the overall for here.
Now suppose that src.Read() really does read the full 32k, so that nr is also 32 * 1024. The code will now call dst.Write(), passing the full 32k of data. Unlike src.Read()—which is allowed to only read, say, 1k instead of the full 32k—the next chunk of code requires that dst.Write() write all 32k. If it doesn't, the loop will break with err set to ErrShortWrite.
(An alternative would have been to keep calling dst.Write() with the remaining bytes, so that dst.Write() could write only 1k of the 32k, requiring 32 calls to get it all written.)
Note that src.Read() can choose to read only, say, 1k instead of 32k. If the actual file is 64k, it will then take 64 trips, rather than 2, through the outer loop. (An alternative choice would have been to force such a reader to go through LimitedReader. That's not as flexible, though, and is not what LimitedReader is intended for.)
func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error)
When the total amount of data to copy is larger than len(buf), each nr, er := src.Read(buf) call reads at most len(buf) bytes.
That's how copyBuffer works:
for {
    copy up to `len(buf)` bytes from `src` to `dst`;
    if EOF {
        break
    }
    if other Errors {
        return Error
    }
}
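A runnable, stripped-down version of that loop might look like this. It is a sketch, not the standard library code: copyLoop skips the WriterTo/ReaderFrom shortcuts and additionally counts Read calls so the number of iterations becomes visible; demo is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// copyLoop copies src to dst through buf and reports how many times
// Read had to be called. buf must not be empty.
func copyLoop(dst io.Writer, src io.Reader, buf []byte) (written int64, reads int, err error) {
	for {
		nr, er := src.Read(buf)
		reads++
		if nr > 0 {
			nw, ew := dst.Write(buf[:nr])
			written += int64(nw)
			if ew != nil {
				return written, reads, ew
			}
			if nw != nr {
				return written, reads, io.ErrShortWrite
			}
		}
		if er == io.EOF {
			return written, reads, nil
		}
		if er != nil {
			return written, reads, er
		}
	}
}

// demo copies text through a buffer of bufSize bytes.
func demo(text string, bufSize int) (string, int) {
	var out bytes.Buffer
	_, reads, _ := copyLoop(&out, strings.NewReader(text), make([]byte, bufSize))
	return out.String(), reads
}

func main() {
	out, reads := demo("0123456789", 4)
	fmt.Printf("%q copied in %d Read calls\n", out, reads)
}
```

With ten bytes and a four-byte buffer, three Read calls deliver data and a fourth reports io.EOF, so a single Read/Write pair could not have finished the copy.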
In the normal case, you would just call Copy rather than CopyBuffer.
func Copy(dst Writer, src Reader) (written int64, err error) {
    return copyBuffer(dst, src, nil)
}
The option to have a user-supplied buffer is, I think, just for extreme optimization scenarios. The use of the word "Buffer" in the name is possibly a source of confusion since the function is not copying the buffer -- just using it internally.
There are two reasons for the looping...
1. The buffer might not be large enough to copy all of the data (the size of which is not necessarily known in advance) in one pass.
2. Reader, though not 'Writer', may return partial results when it makes sense to do so.
Regarding the second item, consider that the Reader does not necessarily represent a fixed file or data buffer. It could, instead, be a live stream from some other thread or process. As such, there are many valid scenarios for stream data to be read and processed on an as-available basis. Although CopyBuffer doesn't do this, it still has to work with such behaviors from any Reader.
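To see such as-available reads, the following sketch pushes a few separate writes through io.Pipe and records how many bytes each Read returns. chunkSizes is a hypothetical helper written for this illustration.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// chunkSizes streams parts through an in-memory pipe and records the
// size of every non-empty Read result on the consuming side.
func chunkSizes(parts []string) (string, []int) {
	pr, pw := io.Pipe()
	go func() {
		for _, p := range parts {
			pw.Write([]byte(p)) // blocks until the reader consumed it
		}
		pw.Close() // the reader sees io.EOF after this
	}()

	var sb strings.Builder
	var sizes []int
	buf := make([]byte, 1024) // far larger than any single part
	for {
		n, err := pr.Read(buf)
		if n > 0 {
			sb.Write(buf[:n])
			sizes = append(sizes, n)
		}
		if err != nil {
			break
		}
	}
	return sb.String(), sizes
}

func main() {
	text, sizes := chunkSizes([]string{"hel", "lo ", "world"})
	fmt.Printf("%q %v\n", text, sizes)
}
```

Even though the buffer has room for everything, each Read returns only what one Write made available, so the consumer has to keep looping until io.EOF.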

Is it safe to append() to a slice from which another thread is reading?

Let's say I have many goroutines doing something like this:
func (o *Obj) Reader() {
    data := o.data
    for i, value := range data {
        log.Printf("got data[%v] = %v", i, value)
    }
}
And one doing this:
func (o *Obj) Writer() {
    o.data = append(o.data, 1234)
}
If data := o.data means the internal structure of the slice is copied, this looks like it could be safe, because I'm never modifying anything in the accessible range of the copy. I'm either setting one element outside of the range and increasing the length, or allocating a completely new pointer, but the reader would be operating on the original one.
Are my assumptions correct and this is safe to do?
I'm aware that slices are not meant to be "thread-safe" in general, the question is more about how much does slice1 := slice2 actually copy.
The code in the question is unsafe because it reads a variable in one goroutine and modifies the variable in another goroutine without synchronization.
Here's one way to make the code safe:
type Obj struct {
    mu sync.Mutex // add mutex
    ... // other fields as before
}
func (o *Obj) Reader() {
    o.mu.Lock()
    data := o.data
    o.mu.Unlock()
    for i, value := range data {
        log.Printf("got data[%v] = %v", i, value)
    }
}
func (o *Obj) Writer() {
    o.mu.Lock()
    o.data = append(o.data, 1234)
    o.mu.Unlock()
}
It's safe for Reader to range over the local slice variable data because the Writer does not modify the local variable data or the backing array visible through the local variable data.
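What the assignment copies can be checked in a single goroutine. The sketch below (headerCopy is a made-up name) only demonstrates the slice header semantics; it says nothing about synchronization, which the mutex above still has to provide.

```go
package main

import "fmt"

// headerCopy shows that `b := a` copies only the slice header
// (pointer, length, capacity), not the backing array.
func headerCopy() (lenA, lenB, sharedElem int) {
	a := make([]int, 2, 4) // spare capacity, so append will not reallocate
	b := a                 // b has its own length but the same array
	a = append(a, 1234)    // writes index 2 of the shared array, grows only a
	a[0] = 7               // visible through b as well
	return len(a), len(b), b[0]
}

func main() {
	fmt.Println(headerCopy())
}
```

The append never touches the range visible through b, while the write to a[0] does, which is exactly the distinction the answer relies on.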
A bit late to the party, but if your use-case is frequent reads and infrequent writes, atomic.Value is designed to solve this:
type Obj struct {
    data atomic.Value // []int
    mu   sync.Mutex   // serializes writers
}
func (o *Obj) Reader() {
    data, _ := o.data.Load().([]int) // nil until the first Store
    for i, value := range data {
        log.Printf("got data[%v] = %v", i, value)
    }
}
func (o *Obj) Writer() {
    o.mu.Lock()
    defer o.mu.Unlock()
    data, _ := o.data.Load().([]int)
    data = append(data, 1234)
    o.data.Store(data)
}
For read-heavy workloads this will generally be much faster than either a Mutex or an RWMutex.
Note that this only works with data that is effectively copy-on-write, which it is in this case: readers keep a reference to the previous slice, and append() either writes past the length they can see or allocates a new backing array. If you're mutating the elements of the slice, or using another data structure, this approach is not safe.
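If the stored data should never share a backing array between versions, one option is to copy on every write. The following is a sketch of that stricter copy-on-write variant, with store, Snapshot, Add and demo as assumed names:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// store publishes immutable snapshots of a slice through atomic.Value.
type store struct {
	mu   sync.Mutex   // serializes writers
	data atomic.Value // holds []int; a stored slice is never modified
}

// Snapshot returns the current version without taking a lock.
func (s *store) Snapshot() []int {
	data, _ := s.data.Load().([]int) // nil before the first Add
	return data
}

// Add builds a fresh copy with v appended and publishes it.
func (s *store) Add(v int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	old := s.Snapshot()
	next := make([]int, len(old), len(old)+1)
	copy(next, old)
	s.data.Store(append(next, v))
}

// demo runs several writers that also read, and returns the final length.
func demo(writers, perWriter int) int {
	var s store
	var wg sync.WaitGroup
	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWriter; i++ {
				s.Add(i)
				_ = s.Snapshot()
			}
		}()
	}
	wg.Wait()
	return len(s.Snapshot())
}

func main() {
	fmt.Println(demo(4, 100))
}
```

Every write costs a full copy here, so this trades write speed for lock-free reads and suits data that changes rarely.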

Is there an efficient way of reclaiming over-capacity slices?

I have a large number of allocated slices (a few million) which I have appended to. I'm sure a large number of them are over capacity. I want to try and reduce memory usage.
My first attempt is to iterate over all of them, allocate a new slice of len(oldSlice) and copy the values over. Unfortunately this appears to increase memory usage (up to double) and the garbage collection is slow to reclaim the memory.
Is there a good general way to slim down memory usage for a large number of over-capacity slices?
Choosing the right strategy to allocate your buffers is hard without knowing the exact problem.
In general you can try to reuse your buffers:
type buffer struct{}
var buffers = make(chan *buffer, 1024)
func newBuffer() *buffer {
    select {
    case b := <-buffers:
        return b
    default:
        return &buffer{}
    }
}
func returnBuffer(b *buffer) {
    select {
    case buffers <- b:
    default:
    }
}
The heuristic used in append may not be suitable for all applications. It's designed for use when you don't know the final length of the data you'll be storing. Instead of iterating over them later, I'd try to minimize the amount of extra capacity you're allocating as early as possible. Here's a simple example of one strategy, which is to use a buffer only while the length is not known, and to reuse that buffer:
type buffer struct {
    names []string
    // ... possibly other things
}
// assume this is called frequently and has lots and lots of names
func (b *buffer) readNames(lines *bufio.Scanner) ([]string, error) {
    // Start from zero, so we can re-use capacity
    b.names = b.names[:0]
    for lines.Scan() {
        b.names = append(b.names, lines.Text())
    }
    // Figure out the error (Scanner.Err already reports nil at EOF,
    // so the check below is only defensive)
    err := lines.Err()
    if err == io.EOF {
        err = nil
    }
    // Allocate a minimal slice
    out := make([]string, len(b.names))
    copy(out, b.names)
    return out, err
}
Of course, you'll need to modify this if you need something that's safe for concurrent use; for that I'd recommend using a buffered channel as a leaky bucket for storing your buffers.
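A rough sketch of that leaky bucket, applied to scratch slices of names, could look like this. The names pool, getBuf, putBuf and collect, and the bucket size of 16, are assumptions for the example.

```go
package main

import "fmt"

// pool is a leaky bucket of reusable scratch slices.
var pool = make(chan []string, 16)

// getBuf takes a scratch slice from the bucket, or nil if it is empty.
func getBuf() []string {
	select {
	case b := <-pool:
		return b[:0] // keep the capacity, drop the contents
	default:
		return nil // append will allocate as needed
	}
}

// putBuf returns a scratch slice, or drops it if the bucket is full.
func putBuf(b []string) {
	select {
	case pool <- b:
	default: // bucket full: let the garbage collector have it
	}
}

// collect gathers names in a pooled scratch slice and returns a copy
// that is exactly as large as needed.
func collect(names ...string) []string {
	scratch := getBuf()
	scratch = append(scratch, names...)
	out := make([]string, len(scratch))
	copy(out, scratch)
	putBuf(scratch)
	return out
}

func main() {
	out := collect("ann", "bob", "cyd")
	fmt.Println(out, len(out), cap(out))
}
```

Because the channel does the handoff, several goroutines can call collect at once; only the over-capacity scratch slices are shared, never the returned results.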
