Map with TTL option in Go - data-structures

I need to build a data-structure like this:
map[string]SomeType
But it must store values for about 10 minutes and then clear it from memory.
The second condition is the number of records, which must be huge: the data structure must accept at least 2-5K new records per second.
So, what is the most correct way to do this in Go?
I'm trying to start a goroutine with a timeout for each new element, plus one (or more) garbage-collector goroutines with a channel that receives the timeouts and clears the elements.
But I'm not sure this is the cleanest way. Is it OK to have millions of waiting goroutines with timeouts?
Thanks.

You will have to create a struct to hold your map and provide custom get/put/delete funcs to access it.
Note that 2-5k accesses per second is not really that much at all, so you don't have to worry about that.
Here's a simple implementation:
type item struct {
value string
lastAccess int64
}
type TTLMap struct {
m map[string]*item
l sync.Mutex
}
func New(ln int, maxTTL int) (m *TTLMap) {
m = &TTLMap{m: make(map[string]*item, ln)}
go func() {
for now := range time.Tick(time.Second) {
m.l.Lock()
for k, v := range m.m {
if now.Unix() - v.lastAccess > int64(maxTTL) {
delete(m.m, k)
}
}
m.l.Unlock()
}
}()
return
}
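func (m *TTLMap) Len() int {
// Take the lock: the sweeper goroutine may be deleting entries concurrently.
m.l.Lock()
defer m.l.Unlock()
return len(m.m)
}
func (m *TTLMap) Put(k, v string) {
m.l.Lock()
it, ok := m.m[k]
if !ok {
it = &item{}
m.m[k] = it
}
// Store the value even when the key already exists.
it.value = v
it.lastAccess = time.Now().Unix()
m.l.Unlock()
}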
func (m *TTLMap) Get(k string) (v string) {
m.l.Lock()
if it, ok := m.m[k]; ok {
v = it.value
it.lastAccess = time.Now().Unix()
}
m.l.Unlock()
return
}
playground
note(2020-09-23): for some reason the time resolution on the current version of the playground is way off, this works fine, however to try on the playground you have to change the sleep to 3-5 seconds.

Take a look at buntdb.
tinykv is no longer being maintained.
Just for the record, I had the same problem and wrote tinykv package which uses a map internally.
It uses a heap of time.Time for timeouts, so it does not range over the whole map.
A max interval can be set when creating an instance. But actual intervals for checking the timeout can be any value of time.Duration greater than zero and less than max, based on the last item that timed out.
It provides CAS and Take functionality.
A callback (optional) can be set which notifies which key and value got timed out.
Timeouts can be explicit or sliding.

I suggest using Map from Go's standard sync package; it's very easy to use and already handles concurrency: https://golang.org/pkg/sync/#Map

Related

Lock slice before reading and modifying it

My experience working with Go is recent and in reviewing some code, I have seen that while it is write-protected, there is a problem with reading the data. Not with the reading itself, but with possible modifications that can occur between the reading and the modification of the slice.
type ConcurrentSlice struct {
sync.RWMutex
items []Item
}
type Item struct {
Index int
Value Info
}
type Info struct {
Name string
Labels map[string]string
Failure bool
}
As mentioned, the writing is protected in this way:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
found := false
i := 0
for inList := range cs.Iter() {
if item.Name == inList.Value.Name{
cs.items[i] = item
found = true
}
i++
}
if !found {
cs.Lock()
defer cs.Unlock()
cs.items = append(cs.items, item)
}
}
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
cs.Lock()
defer cs.Unlock()
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
But between collecting the content of the slice and modifying it, modifications can occur. It may be that another routine modifies the same slice, and when it is time to assign a value, the element no longer exists: slice[i] = item
What would be the right way to deal with this?
I have implemented this method:
func GetList() *ConcurrentSlice {
if denylist == nil {
denylist = NewConcurrentSlice()
return denylist
}
return denylist
}
And I use it like this:
concurrentSlice := GetList()
concurrentSlice.UpdateOrAppend(item)
But I understand that between the get and the modification, even if it is practically immediate, another routine may have modified the slice. What would be the correct way to perform the two operations atomically, so that the slice I read is 100% the one I modify? If I try to assign an item to an index that no longer exists, it will break the execution.
Thank you in advance!

The way you are doing the locking is incorrect, because it does not ensure that the items you return have not been removed in the meantime. In the case of an update, the slice would still be at least the same length.
A simpler solution that works could be the following:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
// Hold the write lock for the whole search-then-write sequence.
cs.Lock()
defer cs.Unlock()
found := false
for i, it := range cs.items {
if item.Name == it.Name {
cs.items[i] = item // store the new item, not the old element
found = true
}
}
if !found {
cs.items = append(cs.items, item)
}
}

Use a sync.Map if the order of the values is not important.
type Items struct {
m sync.Map
}
func (items *Items) Update(item Info) {
items.m.Store(item.Name, item)
}
func (items *Items) Range(f func(Info) bool) {
items.m.Range(func(key, value any) bool {
return f(value.(Info))
})
}

Data structures 101: always pick the best data structure for your use case. If you’re going to be looking up objects by name, that’s EXACTLY what a map is for. If you still need to maintain the order of the items, use a treemap.
Concurrency 101: like transactions, your mutex should be atomic, consistent, and isolated. You’re failing isolation here because the data structure read does not fall inside your mutex lock.
Your code should look something like this:
func {
mutex.lock
defer mutex.unlock
check map or treemap for name
if exists update
else add
}
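The pseudocode above could look roughly like this in Go. This is only a sketch under assumptions: InfoStore and UpdateOrAdd are hypothetical names, Info is a trimmed copy of the struct from the question, and a plain map is used (ordering is not preserved). The lookup and the write both happen inside one critical section.

```go
package main

import (
	"fmt"
	"sync"
)

// Info is a trimmed copy of the struct from the question.
type Info struct {
	Name    string
	Failure bool
}

// InfoStore is a hypothetical name for a mutex-guarded map keyed by Name.
type InfoStore struct {
	mu    sync.Mutex
	items map[string]Info
}

func NewInfoStore() *InfoStore {
	return &InfoStore{items: make(map[string]Info)}
}

// UpdateOrAdd checks for the name and writes the item while holding the
// same lock. It reports whether an existing entry was replaced.
func (s *InfoStore) UpdateOrAdd(item Info) (updated bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, updated = s.items[item.Name]
	s.items[item.Name] = item
	return updated
}

// Len returns the number of stored items.
func (s *InfoStore) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.items)
}

func main() {
	s := NewInfoStore()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.UpdateOrAdd(Info{Name: fmt.Sprintf("svc-%d", i%10)})
		}(i)
	}
	wg.Wait()
	fmt.Println(s.Len()) // 10: one entry per distinct name
}
```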

After some tests, I can say that the situation you fear can indeed happen with sync.RWMutex. I think it could happen with sync.Mutex too, but I can't reproduce that. Maybe I'm missing some information, or maybe the calls happen in order because they are all blocked and the order in which they acquire the lock is fixed in some way.
One way to keep your two calls safe without other routines getting in 'conflict' would be to use another mutex for every task on that object. You would lock that mutex before your read and write, and release it when you're done. You would also have to use that mutex in any other call that writes to (or reads from) that object. You can find an implementation of what I'm talking about here in the main.go file. In order to reproduce the issue with RWMutex, you can simply comment out the startTask and endTask calls, and the issue is visible in the terminal output.
EDIT: my first answer was wrong, as I misinterpreted a test result and fell into the situation described by OP.

tl;dr;
If ConcurrentSlice is to be used from a single goroutine, the locks are unnecessary, because the way the algorithm is written there will not be any concurrent reads/writes to the slice elements or to the slice.
If ConcurrentSlice is to be used from multiple goroutines, the existing locks are not sufficient. This is because UpdateOrAppend may modify slice elements concurrently.
A safe version would need two versions of Iter:
This can be called by users of ConcurrentSlice, but it cannot be called from UpdateOrAppend:
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
cs.RLock()
defer cs.RUnlock()
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
and this is only to be called from UpdateOrAppend:
func (cs *ConcurrentSlice) internalIter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
// No locking
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
And UpdateOrAppend should be synchronized at the top level:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
cs.Lock()
defer cs.Unlock()
....
}
Here's the long version:
This is an interesting piece of code. Based on my understanding of the go memory model, the mutex lock in Iter() is only necessary if there is another goroutine working on this code, and even with that, there is a possible race in the code. However, UpdateOrAppend only modifies elements of the slice with lower indexes than what Iter is working on, so that race never manifests itself.
The race can happen as follows:
1. The for-loop in Iter reads element 0 of the slice.
2. The element is sent through the channel. Thus, the channel receive happens after the first step.
3. The receiving end potentially updates element 0 of the slice. There is no problem up to here.
4. Then the sending goroutine reads element 1 of the slice. This is when a race can happen. If step 3 updated index 1 of the slice, the read at step 4 is a race. That is: if step 4 reads the update done by step 3, it is a race. You can see this if you start with i:=1 in UpdateOrAppend and run it with the -race flag.
But UpdateOrAppend always modifies slice elements that are already seen by Iter when i=0, so this code is safe, even without the lock.
If there will be other goroutines accessing and modifying the structure, you need the Mutex, but you need it to protect the complete UpdateOrAppend method, because only one goroutine should be allowed to run that. You need the mutex to protect the potential updates in the first for-loop, and that mutex has to also include the slice append case, because that may actually modify the slice of the underlying object.
If Iter is only called from UpdateOrAppend, then this single mutex should be sufficient. If however Iter can be called from multiple goroutines, then there is another race possibility. If one UpdateOrAppend is running concurrently with multiple Iter instances, then some of those Iter instances will read from the modified slice elements concurrently, causing a race. So, it should be such that multiple Iters can only run if there are no UpdateOrAppend calls. That is a RWMutex.
But Iter can be called from UpdateOrAppend with a lock, so it cannot really call RLock, otherwise it is a deadlock.
Thus, you need two versions of Iter: one that can be called outside UpdateOrAppend, and that issues RLock in the goroutine, and another that can only be called from UpdateOrAppend and does not call RLock.
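The same "locked public version plus unlocked internal version" split can be sketched without channels. Note that this swaps the channel-based Iter for a copy-returning Snapshot, which is a different technique from the one in the question; indexOfLocked and Snapshot are hypothetical names, and Info is a trimmed stand-in for the question's types.

```go
package main

import (
	"fmt"
	"sync"
)

// Info is a trimmed stand-in for the element type in the question.
type Info struct {
	Name    string
	Failure bool
}

type ConcurrentSlice struct {
	mu    sync.RWMutex
	items []Info
}

// indexOfLocked is the internal, lock-free helper.
// The caller must already hold mu (read or write).
func (cs *ConcurrentSlice) indexOfLocked(name string) int {
	for i, it := range cs.items {
		if it.Name == name {
			return i
		}
	}
	return -1
}

// UpdateOrAppend holds the write lock across the search and the write,
// so no other goroutine can change the slice in between.
func (cs *ConcurrentSlice) UpdateOrAppend(item Info) {
	cs.mu.Lock()
	defer cs.mu.Unlock()
	if i := cs.indexOfLocked(item.Name); i >= 0 {
		cs.items[i] = item
		return
	}
	cs.items = append(cs.items, item)
}

// Snapshot is the public read path: it copies the items under the read
// lock, so callers can iterate without holding any lock.
func (cs *ConcurrentSlice) Snapshot() []Info {
	cs.mu.RLock()
	defer cs.mu.RUnlock()
	out := make([]Info, len(cs.items))
	copy(out, cs.items)
	return out
}

func main() {
	cs := &ConcurrentSlice{}
	cs.UpdateOrAppend(Info{Name: "a"})
	cs.UpdateOrAppend(Info{Name: "b"})
	cs.UpdateOrAppend(Info{Name: "a", Failure: true})
	for _, it := range cs.Snapshot() {
		fmt.Println(it.Name, it.Failure)
	}
}
```

Because UpdateOrAppend never calls a method that takes the lock again, the deadlock described above cannot occur.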

GoRoutines and passing struct to original context

I have a configuration that defines a number of instances (SomeConfigItems) which have a thing() created for each of them.
That thing is a struct returned by an included package, which contains, among other things, a Price (float64) and a nested struct. The nested struct maintains a map of trades.
The problem is that I am able to loop through the thing.Streams.Trades and see all trades happening in real time from my main()'s for{} loop. I am not able to see an updated thing.Price even though it is set in the Handler on occasion.
I am having a hard time understanding how the nested structs can contain data but not Price. I feel as though I am missing something with scoping, goroutines, or possibly pointers for instantiation of new objects.
Any help would be appreciated, I will continue reading in the meantime. I've reduced the code to what seems relevant.
main.go:
package main
import thing
var Things []thing.Handler
for _, name := range SomeConfigItems {
handler := thing.New(name)
Things = append(Things, handler)
}
for {
for _, t := range Things {
log.Info("Price: ", t.Price) // This is 0 on every iteration, but I can actively see data in thing.Streams.Trades
}
}
thing.go:
package thing
import streams
type Handler struct {
Name string
Price float64
Streams streams.Streams
}
func New(name string) (h Handler, err error) {
stream, err := streams.New(strings.ToLower(name))
h = Handler{
Name: name,
Price: 0.0,
Streams: stream,
}
go h.handler()
return h, err
}
func (bot *Handler) handler() {
var currentPrice float64
for {
currentPrice = external.GetPrice(bot.Name).Price // Validated that this returns a float64
bot.Price = currentPrice // Verified that this is updated immediately after in this context.
// Unable to see Price updated from outer context.
}
}
streams.go:
package streams
type Streams struct {
Trades
}
type State struct {
Price string `json:"p"`
Quantity string `json:"q"`
}
type Trades struct {
Trades map[float64]float64
TradeMutex sync.Mutex
Updates chan State
}
func New(name string) (s Streams, err error) {
p := newTradeStream(name)
s = Streams{
Trades: p,
}
return s, err
}
func newTradeStream(name string) (ts Trades) {
ts = Trades{}
ts.Trades = make(map[float64]float64, MaxDepth)
ts.Updates = make(chan State, 500)
// ... Other watchdog code
return ts
}
Note:
I added some debug logging in multiple locations. From within the Bot Handler, the price was printed (successfully), then updated, and then printed (successfully) again, showing no gap in the setting of Price from within the handler() function.
When adding the same type of debugging to the main() for{} loop, I tried setting an incrementing counter and assigning the value of thing.Price -- Printing thing.Price on each loop results in 0, even if I set the price (and validate it gets set) in the same loop, it is back to 0 on the next iteration.
This behavior is why I think that I am missing something very fundamental.

In Go, arguments are passed to functions by value -- meaning what the function gets is a copy of the value, not a reference to the variable. The same is true of the function receiver, and also the return list.
It's not the most elegant description, but for the sake of explanation, let's call this the "function wall." If the value being passed one way or the other is a pointer, the function still gets a copy, but it's a copy of a memory address, and so the pointer can be used to change the value of the variable on the other side of the wall. If it is a reference type, which uses a pointer in the implementation of the type, then again a change to the thing being pointed to can cross that wall. But otherwise the change does not cross the wall, which is one reason so many Go functions are written to return values instead of just modifying values.
Here's a runnable example:
package main
import (
"fmt"
)
type Car struct {
Color string
}
func (c Car) Change() { // c was passed by value, it's a copy
c.Color = "Red"
}
func main() {
ride := Car{"Blue"}
ride.Change()
fmt.Println(ride.Color)
}
Prints "Blue"
But two small changes:
func (c *Car) Change() { // here
c.Color = "Red"
}
func main() {
ride := &Car{"Blue"} // and here
ride.Change()
fmt.Println(ride.Color)
}
And now it prints "Red". Struct is not a reference type. So if you want modifications to a struct to cross the wall without using the return list to do it, use a pointer. Of course this only applies to values being passed via argument, return list, or receiver; and not to variables that are in scope on both sides of the wall; or to modifying the underlying value behind a reference type.
See also "Pointers Versus Values" in Effective Go, and "Go Data Structures" by Russ Cox.
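Applied to the shape of the code in the question, the difference can be sketched as follows. This is a reduced stand-in, not the asker's real packages: Handler is simplified, setPrice plays the role of the handler() goroutine, and NewByValue / NewByPointer are hypothetical names. In the by-value constructor the updater keeps a pointer to the constructor's local variable while the caller receives a copy, so Price changes are invisible to the caller, whereas the Trades map (a reference type) is shared.

```go
package main

import "fmt"

// Handler is a reduced stand-in for thing.Handler from the question.
type Handler struct {
	Name   string
	Price  float64
	Trades map[float64]float64 // maps are reference types: copies share the data
}

// setPrice stands in for the work done by the handler() goroutine.
func (h *Handler) setPrice(p float64) {
	h.Price = p
	h.Trades[p] = 1
}

// NewByValue mirrors the question: the updater is bound to the local h,
// but the caller gets a copy of h.
func NewByValue(name string) (Handler, func(float64)) {
	h := Handler{Name: name, Trades: make(map[float64]float64)}
	return h, h.setPrice
}

// NewByPointer returns the same Handler that the updater modifies.
func NewByPointer(name string) (*Handler, func(float64)) {
	h := &Handler{Name: name, Trades: make(map[float64]float64)}
	return h, h.setPrice
}

func main() {
	v, updateV := NewByValue("a")
	updateV(42)
	fmt.Println(v.Price, len(v.Trades)) // 0 1

	p, updateP := NewByPointer("b")
	updateP(42)
	fmt.Println(p.Price, len(p.Trades)) // 42 1
}
```

In the question's main.go, the slice would then presumably need to hold pointers ([]*thing.Handler), since ranging over a slice of structs also copies each element. If a goroutine writes Price while main reads it, that access also needs synchronization.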

Is it possible to store a Go type

I've got a handful of interfaces, and n number of structs that arbitrarily implement these interfaces. I'd like to keep an array of types and be able to run a loop over them to see which ones are implemented. Is it possible to store a type like this? I spent a little bit of time with the reflect package, but couldn't really find what I was looking for, I understand if maybe this isn't best practice. Trying to do something similar to this.. without a giant type switch, fallthrough, or if.. if... if.
type InterOne interface {
InterOneMethod() string
}
var interfaceMap = map[string]type {
"One": InterOne,
...
}
func doesHandle(any interface{}) []string {
var handles []string
for k, v := range interfaceMap {
if _, ok := any.(v); ok {
handles = append(handles, k)
}
}
return handles
}
EDIT: The answer marked as correct is technically right. I found that due to the comment about the method calling & the overuse of reflection, that this approach was a bad idea. Instead I went with a type switch to check for a single interface because fallthrough is not supported on type switch, and a large if.. if.. if.. with type assertions to be able to make the appropriate calls.

You can use reflect. Note that the only way to get the type of an interface is reflect.TypeOf((*INTERFACE)(nil)).Elem(). Here's a working example:
var interfaceMap = map[string]reflect.Type{
"One": reflect.TypeOf((*InterOne)(nil)).Elem(),
....
}
func doesHandle(any interface{}) []string {
t := reflect.TypeOf(any)
var handles []string
for k, v := range interfaceMap {
if t.Implements(v) {
handles = append(handles, k)
}
}
return handles
}
playground
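For readers without access to the playground link, here is a self-contained version of the same idea. InterTwo and the two struct types are made up for the demonstration. Two details are additions to the answer's code: a nil check (reflect.TypeOf returns nil for a nil interface value, and calling Implements on it would panic) and sorting the result, since map iteration order is random.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

type InterOne interface {
	InterOneMethod() string
}

// InterTwo is a hypothetical second interface for the demo.
type InterTwo interface {
	InterTwoMethod() int
}

type both struct{}

func (both) InterOneMethod() string { return "one" }
func (both) InterTwoMethod() int    { return 2 }

type onlyOne struct{}

func (onlyOne) InterOneMethod() string { return "one" }

// interfaceMap stores the reflect.Type of each interface under a name.
var interfaceMap = map[string]reflect.Type{
	"One": reflect.TypeOf((*InterOne)(nil)).Elem(),
	"Two": reflect.TypeOf((*InterTwo)(nil)).Elem(),
}

// doesHandle returns the sorted names of the interfaces that v implements.
func doesHandle(v interface{}) []string {
	t := reflect.TypeOf(v)
	if t == nil { // nil interface value: no dynamic type to inspect
		return nil
	}
	var handles []string
	for name, iface := range interfaceMap {
		if t.Implements(iface) {
			handles = append(handles, name)
		}
	}
	sort.Strings(handles)
	return handles
}

func main() {
	fmt.Println(doesHandle(both{}))    // [One Two]
	fmt.Println(doesHandle(onlyOne{})) // [One]
	fmt.Println(doesHandle(42))        // []
}
```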

Can we write a generic array/slice deduplication in go?

Is there a way to write a generic array/slice deduplication in go, for []int we can have something like (from http://rosettacode.org/wiki/Remove_duplicate_elements#Go ):
func uniq(list []int) []int {
unique_set := make(map[int] bool, len(list))
for _, x := range list {
unique_set[x] = true
}
result := make([]int, len(unique_set))
i := 0
for x := range unique_set {
result[i] = x
i++
}
return result
}
But is there a way to extend it to support any array? with a signature like:
func deduplicate(a []interface{}) []interface{}
I know that you can write that function with that signature, but then you can't actually use it on []int, you need to create a []interface{} put everything from the []int into it, pass it to the function then get it back and put it into a []interface{} and go through this new array and put everything in a new []int.
My question is, is there a better way to do this?

While VonC's answer probably does the closest to what you really want, the only real way to do it in native Go without gen is to define an interface
type IDList interface {
// Returns the id of the element at i
ID(i int) int
// Returns the element
// with the given id
GetByID(id int) interface{}
Len() int
// Adds the element to the list
Insert(interface{})
}
// Puts the deduplicated list in dst
func Deduplicate(dst, list IDList) {
intList := make([]int, list.Len())
for i := range intList {
intList[i] = list.ID(i)
}
uniques := uniq(intList)
for _,el := range uniques {
dst.Insert(list.GetByID(el))
}
}
Where uniq is the function from your OP.
This is just one possible example, and there are probably much better ones, but in general mapping each element to a unique "==able" ID and either constructing a new list or culling based on the deduplication of the IDs is probably the most intuitive way.
An alternate solution is to take in an []IDer where the IDer interface is just ID() int. However, that means that user code has to create the []IDer list and copy all the elements into that list, which is a bit ugly. It's cleaner for the user to wrap the list as an ID list rather than copy, but it's a similar amount of work either way.

The only way I have seen that implemented in Go is with the clipperhouse/gen project,
gen is an attempt to bring some generics-like functionality to Go, with some inspiration from C#’s Linq and JavaScript’s underscore libraries
See this test:
// Distinct returns a new Thing1s slice whose elements are unique. See: http://clipperhouse.github.io/gen/#Distinct
func (rcv Thing1s) Distinct() (result Thing1s) {
appended := make(map[Thing1]bool)
for _, v := range rcv {
if !appended[v] {
result = append(result, v)
appended[v] = true
}
}
return result
}
But, as explained in clipperhouse.github.io/gen/:
gen generates code for your types, at development time, using the command line.
gen is not an import; the generated source becomes part of your project and takes no external dependencies.

You could do something close to this via an interface. Define an interface, say DeDupable, requiring a method, say UniqId() []byte, which you could then use to remove the duplicates. Your uniq func would then take a []DeDupable and work on it.
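These answers predate type parameters. Since Go 1.18, a generic version can be written directly with the comparable constraint, which covers any element type usable as a map key. A minimal sketch follows (Deduplicate is just an illustrative name); unlike the uniq function in the question, it keeps the first occurrence of each element in its original order.

```go
package main

import "fmt"

// Deduplicate returns the elements of in with duplicates removed,
// preserving the order of first occurrence. Requires Go 1.18+.
func Deduplicate[T comparable](in []T) []T {
	seen := make(map[T]struct{}, len(in))
	out := make([]T, 0, len(in))
	for _, v := range in {
		if _, dup := seen[v]; dup {
			continue
		}
		seen[v] = struct{}{}
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(Deduplicate([]int{3, 1, 3, 2, 1}))    // [3 1 2]
	fmt.Println(Deduplicate([]string{"a", "b", "a"})) // [a b]
}
```

Element types that are not comparable (slices, maps, functions) still need an ID-based approach like the ones described above.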

Correct way to test code that uses time.Ticker?

I'd like your advice on the correct way to test code that uses time.Ticker
For instance, let's say I have a countdown timer like below (just an example I thought up for the purposes of this question):
type TickFunc func(d time.Duration)
func Countdown(duration time.Duration, interval time.Duration, tickCallback TickFunc) {
ticker := time.NewTicker(interval)
for remaining := duration; remaining >= 0; remaining -= interval {
tickCallback(remaining)
<-ticker.C
}
ticker.Stop()
}
http://play.golang.org/p/WJisY52a5L
If I wanted to test this, I'd want to provide a mock so that I can have tests that run quickly and predictably, so I'd need to find a way to get my mock into the Countdown function.
I can think of a few ways to do this:
Create a Ticker interface and a first class function internal to the package that I can patch for the purposes of testing: http://play.golang.org/p/oSGY75vl0U
Create a Ticker interface and pass an implementation directly to the Countdown function:
http://play.golang.org/p/i67Ko5t4qk
If I do it the latter way, am I revealing too much information about how Countdown works and making it more difficult for potential clients to use this code? Instead of giving a duration and interval, they have to construct and pass in a Ticker.
I'm very interested in hearing what's the best approach when testing code like this? Or how you would change the code to preserve the behaviour, but make it more testable?
Thanks for your help!

Since this is a pretty simple function, I assume you are just using it as an example of how to mock non-trivial stuff. If you actually wanted to test this code, rather than mocking up a ticker, why not just use really small intervals?
IMHO the 2nd option is the better of the two, making a user call:
foo(dur, NewTicker(interval)...
doesn't seem like much of a burden.
Also, having the callback is a serious code smell in Go:
func Countdown(ticker Ticker, duration time.Duration) chan time.Duration {
remainingCh := make(chan time.Duration, 1)
go func(ticker Ticker, dur time.Duration, remainingCh chan time.Duration) {
for remaining := dur; remaining >= 0; remaining -= ticker.Duration() {
remainingCh <- remaining
ticker.Tick()
}
ticker.Stop()
close(remainingCh)
}(ticker, duration, remainingCh)
return remainingCh
}
You could then use this code like:
func main() {
for d := range Countdown(NewTicker(time.Second), time.Minute) {
log.Printf("%v to go", d)
}
}
Here it is on the playground: http://play.golang.org/p/US0psGOvvt

This doesn't answer the question of how to inject the mock, but it seems like you are trying too hard.
If the example is representative of what you are actually testing, then just use small numbers.
http://play.golang.org/p/b_1kqyIu-u
Countdown(5, 1, func(d time.Duration) {
log.Printf("%v to go", d)
})
Now, if you are testing code that calls Countdown (rather than testing Countdown), then I'd probably just create a flag you can set for your module that scales the numbers to be as fast as possible with the same invocation count.
http://play.golang.org/p/KqCGnaR3vc
if testMode {
duration = duration/interval
interval = 1
}
