I'm looking for a store that can hold expiring data, i.e. a structure where you specify a value to be stored together with a timeout, and the value is removed automatically once that time has passed.
If you need this for caching, consider using cache2go:
cache := cache2go.Cache("c")
val := struct{x string}{"This is a test!"}
cache.Add("valA", 5*time.Second, &val)
cache2go is an in-memory cache, but you can specify a data-loader routine to lazily load a missing value for a given key. The data loader can be used, for example, to load the value from disk:
cache.SetDataLoader(func(key interface{}) *cache2go.CacheItem {
	val := loadFromDisk()
	item := cache2go.CreateCacheItem(key, 0, val)
	return &item
})
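With a loader in place, a lookup for a key that is not yet cached goes through it. A hedged sketch, assuming cache2go's Value accessor and that loadFromDisk above is defined elsewhere:

item, err := cache.Value("valB") // not cached yet, so the data loader runs
if err == nil {
	fmt.Println("Found value:", item.Data())
}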
go-cache supports expiration as well, and it can also save and load cached items to and from disk:
func (c *Cache) Set(k string, x interface{}, d time.Duration)
Adds an item to the cache, replacing any existing item. If the duration is 0,
the cache's default expiration time is used. If it is -1, the item never
expires.
func (c *Cache) Save(w io.Writer) error
Writes the cache's items (using Gob) to an io.Writer. Returns an error if
the serialization fails, e.g. because there are unserializable objects like
channels in the cache.
func (c *Cache) Load(r io.Reader) error
Adds (Gob-serialized) cache items from an io.Reader, excluding any items that
already exist in the current cache.
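A minimal sketch of how those pieces fit together (assuming the github.com/patrickmn/go-cache package imported as cache, plus os, log and time):

c := cache.New(5*time.Minute, 10*time.Minute)
c.Set("valA", "This is a test!", 5*time.Second)

// Persist the current items to disk with Gob.
f, err := os.Create("cache.gob")
if err != nil {
	log.Fatal(err)
}
if err := c.Save(f); err != nil {
	log.Fatal(err)
}
f.Close()

// Later, load them back into a fresh cache (items already present are kept).
c2 := cache.New(5*time.Minute, 10*time.Minute)
f, err = os.Open("cache.gob")
if err != nil {
	log.Fatal(err)
}
if err := c2.Load(f); err != nil {
	log.Fatal(err)
}
f.Close()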
I am writing some asynchronous code in Go which basically implements in-memory caching. I have a not-very-fast source which I query every minute (using a ticker) and save the result into a cache struct field. This field can be queried from different goroutines asynchronously.
To avoid using mutexes when updating values from the source, I do not write to the same struct field that is being queried by other goroutines; instead I create another variable, fill it and then assign it to the queried field. This works fine for me since the assignment operation is atomic and no race occurs.
The code looks like the following:
// this fires up when the cache is created
func (f *FeaturesCache) goStartUpdaterDaemon(ctx context.Context) {
	go func() {
		defer kiterrors.RecoverFunc(ctx, f.logger(ctx))
		ticker := time.NewTicker(updateFeaturesPeriod) // every minute
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				f.refill(ctx)
			case <-ctx.Done():
				return
			}
		}
	}()
}
func (f *FeaturesCache) refill(ctx context.Context) {
	var newSources map[string]FeatureData
	// some querying and processing logic
	// save new data for future queries
	f.features = newSources
}
Now I need to add another view of my data so I can also get it from the cache. Basically that means adding one more struct field which will be queried and filled in the same way as the previous one (features).
I need these two views of my data to stay in sync, so it is undesirable to have, for example, new data in view 2 and old data in view 1, or the other way round.
So the only thing I need to change in refill is to add the new field. At first I did it this way:
func (f *FeaturesCache) refill(ctx context.Context) {
	var newSources map[string]FeatureData
	var anotherView map[string]DataView2
	// some querying and processing logic
	// save new data for future queries
	f.features = newSources     // line A
	f.anotherView = anotherView // line B
}
However, I'm wondering whether this code satisfies my consistency requirements. I am worried that if the scheduler decides to interrupt the goroutine running refill between lines A and B (see the code above), then I might get an inconsistency between the data views.
So I researched the problem. Many sources on the Internet say that the scheduler switches goroutines on syscalls and function calls. However, according to this answer https://stackoverflow.com/a/64113553/12702274, since Go 1.14 there is an asynchronous preemption mechanism in the Go scheduler which can switch goroutines based on their running time, in addition to the previously existing switch points. That makes me think it is actually possible for the refill goroutine to be interrupted between lines A and B.
Then I thought about surrounding those two assignments with a mutex: lock before line A, unlock after line B. However, it seems to me that this doesn't change much. The goroutine may still be interrupted between lines A and B and the data may be observed in an inconsistent state. The only thing the mutex achieves here is that two simultaneous refills do not conflict with each other, which is impossible anyway because I run them all from the same ticker goroutine. Thus it seems useless here.
So, is there any way I can ensure atomicity for two consecutive assignments?
If I understand your concern correctly, you don't want to lock the existing cached data while updating it (because the update takes time, and you want the existing cached data to remain usable while it is being refreshed in another goroutine, right?).
You also want the updates to f.features and f.anotherView to appear atomic.
What about keeping your data in a map[int8]map[string]FeatureData and a map[int8]map[string]DataView2? Put each refill's data under a new key and direct queries to that key (newSearchIndex).
Here is a rough explanation in code (think of the code below as pseudocode):
type FeaturesCache struct {
	mu             sync.RWMutex
	features       map[int8]map[string]FeatureData
	anotherView    map[int8]map[string]DataView2
	oldSearchIndex int8
	newSearchIndex int8
}

func (f *FeaturesCache) CreateNewIndex() int8 {
	f.mu.Lock()
	defer f.mu.Unlock()
	return (f.newSearchIndex + 1) % 16 // mod 16 could be changed per your refill rate
}

func (f *FeaturesCache) SetNewIndex(newIndex int8) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.oldSearchIndex = f.newSearchIndex
	f.newSearchIndex = newIndex
}

func (f *FeaturesCache) refill(ctx context.Context) {
	var newSources map[string]FeatureData
	var anotherView map[string]DataView2
	// some querying and processing logic
	// save new data for future queries
	newSearchIndex := f.CreateNewIndex()
	f.features[newSearchIndex] = newSources
	f.anotherView[newSearchIndex] = anotherView
	f.SetNewIndex(newSearchIndex) // let queries hit the new cached data after updating the search index
	f.features[f.oldSearchIndex] = nil
	f.anotherView[f.oldSearchIndex] = nil
}
Hi, here is the code for a utility I wrote, called Collector:
import (
	"context"
	"errors"
	"sync"
	"time"
)

// assumed sentinel error, not shown in the original snippet
var errRequestTimeout = errors.New("request timeout")

type Collector struct {
	keyValMap *sync.Map
}

func (c *Collector) LoadOrWait(key any) (retValue any, availability int, err error) {
	value, status := c.getStatusAndValue(key)
	switch status {
	case 0:
		return nil, 0, nil
	case 1:
		return value, 1, nil
	case 2:
		ctxWithTimeout, _ := context.WithTimeout(context.Background(), 5*time.Second)
		for {
			select {
			case <-ctxWithTimeout.Done():
				return nil, 0, errRequestTimeout
			default:
				value, resourceStatus := c.getStatusAndValue(key)
				if resourceStatus == 1 {
					return value, 1, nil
				}
				time.Sleep(50 * time.Millisecond)
			}
		}
	}
	return nil, 0, errRequestTimeout
}

// Store ...
func (c *Collector) Store(key any, value any) {
	c.keyValMap.Store(key, value)
}

func (c *Collector) getStatusAndValue(key any) (retValue any, availability int) {
	var empty any
	result, loaded := c.keyValMap.LoadOrStore(key, empty)
	if loaded && result != empty {
		return result, 1
	}
	if loaded && result == empty {
		return empty, 2
	}
	return nil, 0
}
The purpose of this utility is to act as a cache where a given value is loaded only once but read many times. However, when a Collector object is shared between multiple goroutines, I see the number of goroutines and the RAM usage grow whenever several goroutines try to use the collector cache at the same time. Could someone explain whether this usage of sync.Map is correct? If so, what might cause the high number of goroutines / high RAM usage?
For sure, you're facing a possible memory leak due to not calling the cancel func of the newly created ctxWithTimeout context. To fix this, change that line to:
ctxWithTimeout, cancelFunc := context.WithTimeout(context.Background(), requestTimeout)
defer cancelFunc()
Thanks to this, you're always sure to clean up all the resources allocated once the context expires. This should address the leak.
As for the usage of sync.Map, it looks fine to me.
Let me know if this solves your issue or if there is something else to address, thanks!
You show the code on the reader side of things, but not the code which does the actual request (and calls .Store(key, value)).
With the code you display:
the first goroutine which tries to access a given key will store your empty value in the map (when executing c.keyValMap.LoadOrStore(key, empty)),
so all goroutines that come afterwards querying for the same key will enter the "query with timeout" loop, even though nothing ever runs the request and stores its result in the cache. A sketch of the missing writer side follows below.
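For illustration only, the missing writer side might look something like this (loadValue is a hypothetical stand-in for the slow source; it is not part of the original post):

// Hypothetical caller, not from the original code. The first goroutine to
// ask for a key gets availability == 0 and is responsible for doing the
// expensive load and publishing it with Store; later goroutines block in
// LoadOrWait until the value shows up.
func fetch(c *Collector, key string) (any, error) {
	value, availability, err := c.LoadOrWait(key)
	if err != nil {
		return nil, err
	}
	if availability == 1 {
		return value, nil // already cached, or another goroutine loaded it
	}
	// availability == 0: we are the first requester for this key.
	loaded, err := loadValue(key) // hypothetical slow source
	if err != nil {
		return nil, err
	}
	c.Store(key, loaded)
	return loaded, nil
}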
[after your update]
The code for your collector alone seems to be OK regarding resource consumption: I don't see deadlocks or a multiplication of goroutines in that code alone.
You should probably look at other places in your code.
Also, if this structure only grows and never shrinks, it is bound to consume more memory. Do audit your program to evaluate how many different keys can live together in your cache and how much memory the cached values can occupy.
I'm a little bit confused by Go's garbage collector.
Consider the following code, where I implement the reader interface for my type T.
type T struct {
	header Header
	data   []*MyDataType
}

func (t *T) Read(p []byte) (int, error) {
	t.header = *(*Header)(t.readFileHeader(p))
	t.data = *(*[]*MyDataType)(t.readFileData(p))
	return len(p), nil
}
where in the reader functions I cast the data to MyDataType using unsafe.Pointer, which points into a slice built with the reflect package (the real code is more complicated, but for the sake of the example this should be enough):
func (t *T) readFileData(data []byte, idx int, ...) unsafe.Pointer {
	...
	return unsafe.Pointer(&reflect.SliceHeader{Data: uintptr(unsafe.Pointer(&data[idx])), ...})
}
and if I then read the data in a different function:
func (d *Dummy) foo() {
	data, _ := ioutil.ReadFile(filename)
	d.t.Read(data) // <-- will GC free data?
}
Now I'm not sure whether it is possible that the GC frees the loaded data after foo returns, or whether the data will only be freed once d.t is freed.
To understand what the GC might do to your variables, you first need to know how and where Go allocates them. Here is a good read about escape analysis, that is, how the Go compiler decides where to allocate memory: on the stack or on the heap.
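(You can see these decisions for your own code by building with go build -gcflags='-m', which makes the compiler print its escape-analysis results.)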
Long story short, the GC will free memory only once it is no longer referenced by your Go program.
In your example, the data loaded by data, _ := ioutil.ReadFile(filename) ultimately ends up referenced via t.data = *(*[]*MyDataType)(t.readFileData(p)). Therefore it stays reachable as long as the (t *T) value is referenced. As far as I can see from your code, the loaded data will be garbage-collected along with (t *T).
According to the reflect docs, I have to keep a separate pointer to the data, *[]byte, to avoid garbage collection. So the solution is to add a referencePtr to
type T struct {
	header       Header
	data         []*MyDataType
	referencePtr *[]byte
}
which will point to my data inside the Read function
func (t *T) Read(p []byte) (int, error) {
	t.referencePtr = &p
	t.header = *(*Header)(t.readFileHeader(p))
	t.data = *(*[]*MyDataType)(t.readFileData(p))
	return len(p), nil
}
or is this unnecessary?
I want to build a buffer in Go that supports multiple concurrent readers and one writer. Whatever is written to the buffer should be read by all readers. New readers are allowed to drop in at any time, which means already written data must be able to be played back for late readers.
The buffer should satisfy the following interface:
type MyBuffer interface {
	Write(p []byte) (n int, err error)
	NextReader() io.Reader
}
Do you have any suggestions for such an implementation, preferably using built-in types?
Depending on the nature of this writer and how you use it, keeping everything in memory (to be able to replay everything for readers joining later) is very risky: it might demand a lot of memory or cause your app to crash with an out-of-memory error.
Using it for a "low-traffic" logger, keeping everything in memory is probably fine, but streaming audio or video, for example, most likely is not.
If the reader implementations below have read all the data that was written to the buffer, their Read() method will properly report io.EOF. Care must be taken, as some constructs (such as bufio.Scanner) may not read more data once io.EOF is encountered (but this is not a flaw of our implementation).
If you want the readers of our buffer to wait when no more data is available, until new data is written instead of returning io.EOF, you may wrap the returned readers in a "tail reader" presented here: Go: "tail -f"-like generator.
"Memory-safe" file implementation
Here is an extremely simple and elegant solution: it uses a file to write to and files to read from. The synchronization is basically provided by the operating system. This does not risk an out-of-memory error, as the data is stored solely on disk. Depending on the nature of your writer, this may or may not be sufficient.
I will use the following interface instead, because Close() is important in the case of files.
type MyBuf interface {
	io.WriteCloser
	NewReader() (io.ReadCloser, error)
}
And the implementation is extremely simple:
type mybuf struct {
	*os.File
}

func (mb *mybuf) NewReader() (io.ReadCloser, error) {
	f, err := os.Open(mb.Name())
	if err != nil {
		return nil, err
	}
	return f, nil
}

func NewMyBuf(name string) (MyBuf, error) {
	f, err := os.Create(name)
	if err != nil {
		return nil, err
	}
	return &mybuf{File: f}, nil
}
Our mybuf type embeds *os.File, so we get the Write() and Close() methods for "free".
NewReader() simply opens the existing backing file for reading (in read-only mode) and returns it, again taking advantage of the fact that os.File implements io.ReadCloser.
Creating a new MyBuf value is implemented in the NewMyBuf() function, which may also return an error if creating the file fails.
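A quick usage sketch of the file-backed buffer (the file name is arbitrary and error handling is kept minimal):

buf, err := NewMyBuf("mybuf.dat")
if err != nil {
	log.Fatal(err)
}
buf.Write([]byte("hello, readers\n"))

r, err := buf.NewReader()
if err != nil {
	log.Fatal(err)
}
data, err := ioutil.ReadAll(r) // reads everything written so far
if err != nil {
	log.Fatal(err)
}
fmt.Printf("%s", data)
r.Close()
buf.Close()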
Notes:
Note that since mybuf embeds *os.File, it is possible, with a type assertion, to "reach" other exported methods of os.File even though they are not part of the MyBuf interface. I do not consider this a flaw, but if you want to disallow it, you have to change the implementation of mybuf to not embed os.File but rather have it as a named field (and then you have to add the Write() and Close() methods yourself, properly forwarding to the os.File field).
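For example, a type assertion would let you call the file's Sync() method, which is not part of MyBuf (mb here is assumed to be a value created by NewMyBuf()):

if f, ok := mb.(*mybuf); ok {
	f.Sync() // an os.File method reached through the embedded *os.File
}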
In-memory implementation
If the file implementation is not sufficient, here comes an in-memory implementation.
Since we're now in-memory only, we will use the following interface:
type MyBuf interface {
	io.Writer
	NewReader() io.Reader
}
The idea is to store all byte slices that are ever passed to our buffer. Readers serve the stored slices when Read() is called; each reader keeps track of how many of the stored slices have been served by its Read() method. Synchronization must be dealt with; we will use a simple sync.RWMutex.
Without further ado, here is the implementation:
type mybuf struct {
	data [][]byte
	sync.RWMutex
}

func (mb *mybuf) Write(p []byte) (n int, err error) {
	if len(p) == 0 {
		return 0, nil
	}
	// Cannot retain p, so we must copy it:
	p2 := make([]byte, len(p))
	copy(p2, p)
	mb.Lock()
	mb.data = append(mb.data, p2)
	mb.Unlock()
	return len(p), nil
}

type mybufReader struct {
	mb   *mybuf // buffer we read from
	i    int    // next slice index
	data []byte // current data slice to serve
}

func (mbr *mybufReader) Read(p []byte) (n int, err error) {
	if len(p) == 0 {
		return 0, nil
	}
	// Do we have data to send?
	if len(mbr.data) == 0 {
		mb := mbr.mb
		mb.RLock()
		if mbr.i < len(mb.data) {
			mbr.data = mb.data[mbr.i]
			mbr.i++
		}
		mb.RUnlock()
	}
	if len(mbr.data) == 0 {
		return 0, io.EOF
	}
	n = copy(p, mbr.data)
	mbr.data = mbr.data[n:]
	return n, nil
}

func (mb *mybuf) NewReader() io.Reader {
	return &mybufReader{mb: mb}
}

func NewMyBuf() MyBuf {
	return &mybuf{}
}
Note that the general contract of Writer.Write() includes that an implementation must not retain the passed slice, so we have to make a copy of it before "storing" it.
Also note that the Read() of readers attempts to hold the lock for the shortest time possible: it only locks when it needs a new data slice from the buffer, and only does read-locking; if the reader still has a partial data slice, it serves that in Read() without locking or touching the buffer.
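A short usage sketch of this in-memory buffer (assuming the types above are in scope, plus ioutil and fmt):

buf := NewMyBuf()
buf.Write([]byte("hello "))

r1 := buf.NewReader()
buf.Write([]byte("world"))

r2 := buf.NewReader() // a late reader still sees everything from the start

b1, _ := ioutil.ReadAll(r1)
b2, _ := ioutil.ReadAll(r2)
fmt.Println(string(b1)) // hello world
fmt.Println(string(b2)) // hello world

Both readers see the full stream because each reader only tracks its own index into the shared slice of stored chunks.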
I linked to the append-only commit log because it seems very similar to your requirements. I am pretty new to distributed systems and the commit log, so I may be butchering a couple of the concepts, but the Kafka introduction explains everything clearly with nice charts.
Go is also pretty new to me, so I'm sure there's a better way to do it.
Perhaps you could model your buffer as a slice. I can think of a couple of cases:
- the buffer has no readers: new data is written to the buffer and the buffer length grows
- the buffer has one/many reader(s):
  - a reader subscribes to the buffer
  - the buffer creates and returns a channel to that client
  - the buffer maintains a list of client channels
  - when a write occurs, it loops through all client channels and publishes to each (pub/sub)
This addresses the pub/sub real-time consumer stream, where messages are fanned out, but it does not address the backfill; a rough sketch of the fan-out part follows below.
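For illustration, a sketch of the fan-out part only (hypothetical type names, no backfill or stored history):

type PubSubBuffer struct {
	mu   sync.Mutex
	subs []chan []byte // one channel per subscribed reader
}

func (b *PubSubBuffer) Subscribe() <-chan []byte {
	ch := make(chan []byte, 16)
	b.mu.Lock()
	b.subs = append(b.subs, ch)
	b.mu.Unlock()
	return ch
}

func (b *PubSubBuffer) Write(p []byte) (int, error) {
	cp := make([]byte, len(p))
	copy(cp, p)
	b.mu.Lock()
	for _, ch := range b.subs {
		select {
		case ch <- cp:
		default: // subscriber is too slow; drop for this sketch
		}
	}
	b.mu.Unlock()
	return len(p), nil
}

The drop-on-full policy in Write is only there to keep the sketch non-blocking; a real implementation would have to decide what to do with slow subscribers.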
Kafka enables a backfill and their intro illustrates how it can be done :)
This offset is controlled by the consumer: normally a consumer will advance its offset linearly as it reads records, but, in fact, since the position is controlled by the consumer it can consume records in any order it likes. For example a consumer can reset to an older offset to reprocess data from the past or skip ahead to the most recent record and start consuming from "now".
This combination of features means that Kafka consumers are very cheap—they can come and go without much impact on the cluster or on other consumers. For example, you can use our command line tools to "tail" the contents of any topic without changing what is consumed by any existing consumers.
I had to do something similar as part of an experiment, so sharing:
type MultiReaderBuffer struct {
	mu  sync.RWMutex
	buf []byte
}

func (b *MultiReaderBuffer) Write(p []byte) (n int, err error) {
	if len(p) == 0 {
		return 0, nil
	}
	b.mu.Lock()
	b.buf = append(b.buf, p...)
	b.mu.Unlock()
	return len(p), nil
}

func (b *MultiReaderBuffer) NewReader() io.Reader {
	return &mrbReader{mrb: b}
}

type mrbReader struct {
	mrb *MultiReaderBuffer
	off int
}

func (r *mrbReader) Read(p []byte) (n int, err error) {
	if len(p) == 0 {
		return 0, nil
	}
	r.mrb.mu.RLock()
	n = copy(p, r.mrb.buf[r.off:])
	r.mrb.mu.RUnlock()
	if n == 0 {
		return 0, io.EOF
	}
	r.off += n
	return n, nil
}
I have a large number of allocated slices (a few million) which I have appended to. I'm sure a large number of them are over capacity, and I want to try to reduce memory usage.
My first attempt was to iterate over all of them, allocate a new slice of len(oldSlice) and copy the values over. Unfortunately this appears to increase memory usage (up to double), and garbage collection is slow to reclaim the memory.
Is there a good general way to slim down memory usage for a large number of over-capacity slices?
Choosing the right strategy to allocate your buffers is hard without knowing the exact problem.
In general you can try to reuse your buffers:
type buffer struct{}

var buffers = make(chan *buffer, 1024)

func newBuffer() *buffer {
	select {
	case b := <-buffers:
		return b
	default:
		return &buffer{}
	}
}

func returnBuffer(b *buffer) {
	select {
	case buffers <- b:
	default:
	}
}
The heuristic used in append may not be suitable for all applications. It's designed for use when you don't know the final length of the data you'll be storing. Instead of iterating over the slices later, I'd try to minimize the amount of extra capacity you allocate as early as possible. Here's a simple example of one strategy: use a buffer only while the length is not known, and reuse that buffer:
type buffer struct {
	names []string
	... // possibly other things
}

// assume this is called frequently and has lots and lots of names
func (b *buffer) readNames(lines bufio.Scanner) ([]string, error) {
	// Start from zero, so we can re-use capacity
	b.names = b.names[:0]
	for lines.Scan() {
		b.names = append(b.names, lines.Text())
	}
	// Figure out the error
	err := lines.Err()
	if err == io.EOF {
		err = nil
	}
	// Allocate a minimal slice
	out := make([]string, len(b.names))
	copy(out, b.names)
	return out, err
}
Of course, you'll need to modify this if you need something that's safe for concurrent use; for that I'd recommend using a buffered channel as a leaky bucket for storing your buffers, as sketched below.
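A sketch of that leaky-bucket idea, reusing the buffer type from above (pool size and helper names are illustrative):

var bufPool = make(chan *buffer, 64) // bounds how many idle buffers are kept around

func getBuffer() *buffer {
	select {
	case b := <-bufPool:
		return b
	default:
		return &buffer{} // pool empty: allocate a fresh one
	}
}

func putBuffer(b *buffer) {
	b.names = b.names[:0] // keep the capacity, drop the contents
	select {
	case bufPool <- b:
	default: // pool full: let the GC reclaim this buffer
	}
}

Each goroutine takes a buffer with getBuffer, calls readNames on it, and returns it with putBuffer when done, so no buffer is ever used by two goroutines at once.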