Multiple requests for a single computationally expensive resource in Go

Looking for a more Go-ish solution to the following:
Say a server has multiple parallel incoming requests asking for a resource with key key. Since computing this resource is expensive/time consuming, we'd like to ensure that it is computed only once. There are infinitely many possible keys.
One naïve implementation:
if hasCachedValue(key) {
    return cachedValue(key)
}
if somebodyElseWorkingOn(key) {
    waitUntilReady(key)
} else {
    buildCacheValue(key) // time consuming
}
return cachedValue(key)
So far we have solved this using a shared map[string]chan bool, where the first request inserts the chan for key, and subsequent requests wait for a close on that chan when the value is ready. To protect the map we use a sync.Mutex, but we have a feeling there is a better and more Go-ish solution.
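For concreteness, a minimal sketch of that map-plus-mutex approach (hasCachedValue, buildCacheValue, cachedValue and Value are the placeholders from the pseudocode above; real code would also need error handling):
var (
    mu      sync.Mutex
    pending = make(map[string]chan bool)
)

func getValue(key string) Value {
    if hasCachedValue(key) {
        return cachedValue(key)
    }
    mu.Lock()
    if ch, ok := pending[key]; ok {
        mu.Unlock()
        <-ch // wait until the builder closes the chan
        return cachedValue(key)
    }
    ch := make(chan bool)
    pending[key] = ch
    mu.Unlock()

    buildCacheValue(key) // time consuming
    mu.Lock()
    delete(pending, key)
    mu.Unlock()
    close(ch)
    return cachedValue(key)
}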

Use the singleflight package. Declare a package-level variable for the group:
var g singleflight.Group
Use the following code to get the value:
v, err, _ := g.Do(key, func() (interface{}, error) {
    if !hasCachedValue(key) {
        buildCacheValue(key)
    }
    return cachedValue(key), nil
})
if err != nil {
    // handle error
}
x := v.(valueType) // assert to type returned by cachedValue
// do something with x
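For reference, the package lives at golang.org/x/sync/singleflight, and the third return value of Do reports whether the result was shared with other concurrent callers. A minimal runnable sketch, with the expensive build simulated by a sleep (the key and value here are made up):
package main

import (
    "fmt"
    "sync"
    "time"

    "golang.org/x/sync/singleflight"
)

var g singleflight.Group

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 3; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            v, err, shared := g.Do("key", func() (interface{}, error) {
                time.Sleep(100 * time.Millisecond) // stand-in for the expensive build
                return "value", nil
            })
            if err != nil {
                return
            }
            fmt.Println(v.(string), "shared:", shared)
        }()
    }
    wg.Wait()
}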

Here is some simple code that does what you want. I tested it and it works without problems; the Go race detector reports no issues.
type Cache struct {
    mtx sync.RWMutex
    m   map[KeyType]*CacheValue
}

type CacheValue struct {
    val *ValueType
    mtx sync.Mutex
}

func NewCache() *Cache {
    return &Cache{m: make(map[KeyType]*CacheValue)}
}

func (c *Cache) Get(key KeyType) *ValueType {
    // Fast path: look up the entry under the read lock.
    c.mtx.RLock()
    v := c.m[key]
    c.mtx.RUnlock()
    if v != nil {
        v.mtx.Lock()
        x := v.val
        v.mtx.Unlock()
        if x != nil {
            return x
        }
    }
    // Slow path: insert a placeholder entry under the write lock,
    // rechecking in case another goroutine beat us to it.
    if v == nil {
        c.mtx.Lock()
        v = c.m[key]
        if v == nil {
            v = &CacheValue{}
            c.m[key] = v
        }
        c.mtx.Unlock()
    }
    // The per-entry mutex ensures the value is built at most once.
    v.mtx.Lock()
    if v.val == nil {
        v.val = buildValue(key)
    }
    v.mtx.Unlock()
    return v.val
}
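A brief usage sketch; KeyType, ValueType and buildValue are not defined in the answer, so the definitions here are illustrative:
type KeyType string
type ValueType string

// buildValue stands in for the expensive computation.
func buildValue(key KeyType) *ValueType {
    v := ValueType("value for " + string(key))
    return &v
}

func main() {
    c := NewCache()
    var wg sync.WaitGroup
    for i := 0; i < 5; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            _ = c.Get("key") // buildValue runs at most once for "key"
        }()
    }
    wg.Wait()
}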

Inspired by the Ping Pong example often used to describe channels, we set out to try a channel-only approach. A ball keeps the state about keys being generated, and the ball is passed between the requests via a shared channel:
import "time"

var table = make(chan map[string]chan bool)

func keeper() {
    for {
        ball := <-table
        table <- ball
    }
}

func getResource(key string) {
    // Take ball from table
    ball := <-table
    if wait, ok := ball[key]; ok {
        println("Somebody else working on " + key + ", waiting")
        table <- ball
        <-wait
    } else {
        println("I will build " + key)
        ball[key] = make(chan bool)
        // Throw ball back on table
        table <- ball
        // Build value
        time.Sleep(time.Millisecond * 10)
        println("I built value for " + key + "!")
        // Clean up ball
        ball = <-table
        close(ball[key])
        delete(ball, key)
        table <- ball
    }
    println("Now value for " + key + " has been built")
}

func main() {
    go keeper()
    ball := make(map[string]chan bool)
    table <- ball
    key := "key"
    go getResource(key)
    go getResource(key)
    go getResource(key)
    time.Sleep(time.Second)
}
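In this pattern the keeper goroutine and the unbuffered table channel together play the role of a mutex: exactly one goroutine can hold the ball (the map) at any moment, and everyone else blocks on <-table until the holder throws it back.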

Related

Only have unique values (no duplicates) in a golang channel

On IoT devices, Go applications are running that can receive commands from the cloud. The commands are pushed onto a queue
var queue chan time.Time
and workers on the IoT device process the queue.
The job of a worker is to send data covering a period of time back to the cloud; the time on the channel is the start time of such a period. The IoT devices are on a mobile network connection, so sometimes data gets lost and never arrives at the cloud. The cloud also is not sure whether the command it sent arrived on the IoT device, and it could get impatient and resend the command.
I want to make sure that if the original command is still in the queue, the same command cannot be pushed onto the queue again. Is there a way to do that?
func addToQueue(periodStart time.Time) error {
    if alreadyOnQueue(queue, periodStart) {
        return errors.New("periodStart was already on the queue, not adding it again")
    }
    queue <- periodStart
    return nil
}

func alreadyOnQueue(queue chan time.Time, t time.Time) bool {
    return false // todo
}
I've created a solution that is available on https://github.com/munnik/uniqueue/
package uniqueue

import (
    "errors"
    "sync"
)

// UQ is a unique queue. It guarantees that a value is only once in the queue. The queue is thread safe.
// The unique constraint can be temporarily disabled to add multiple instances of the same value to the queue.
type UQ[T comparable] struct {
    back                 chan T
    queue                chan T
    front                chan T
    constraints          map[T]*constraint
    mu                   sync.Mutex
    AutoRemoveConstraint bool // if true, the constraint will be removed when the value is popped from the queue
}

type constraint struct {
    count    uint // number of elements in the queue
    disabled bool
}

func NewUQ[T comparable](size uint) *UQ[T] {
    u := &UQ[T]{
        back:        make(chan T),
        queue:       make(chan T, size),
        front:       make(chan T),
        constraints: map[T]*constraint{},
    }
    go u.linkChannels()
    return u
}

// Back returns the back of the queue; this channel can be used to write values to.
func (u *UQ[T]) Back() chan<- T {
    return u.back
}

// Front returns the front of the queue; this channel can be used to read values from.
func (u *UQ[T]) Front() <-chan T {
    return u.front
}

// IgnoreConstraintFor ignores the constraint for a value v once; when the value is added to the queue again, the constraint is enabled again.
func (u *UQ[T]) IgnoreConstraintFor(v T) {
    u.mu.Lock()
    defer u.mu.Unlock()
    if _, ok := u.constraints[v]; !ok {
        u.constraints[v] = &constraint{}
    }
    u.constraints[v].disabled = true
}

// AddConstraint manually adds a constraint to the queue; only use this in special cases when you want to prevent certain values from entering the queue.
func (u *UQ[T]) AddConstraint(v T) error {
    u.mu.Lock()
    defer u.mu.Unlock()
    if _, ok := u.constraints[v]; !ok {
        u.constraints[v] = &constraint{
            count:    1,
            disabled: false,
        }
        return nil
    } else if u.constraints[v].disabled {
        u.constraints[v].count += 1
        u.constraints[v].disabled = false
        return nil
    }
    return errors.New("already existing constraint prevents adding new constraint")
}

// RemoveConstraint manually removes a constraint from the queue; this needs to be called when AutoRemoveConstraint is set to false. Useful when you want to remove the constraint only when a worker using the queue has finished processing the value.
func (u *UQ[T]) RemoveConstraint(v T) {
    u.mu.Lock()
    defer u.mu.Unlock()
    if c, ok := u.constraints[v]; ok {
        c.count -= 1
        if c.count == 0 {
            delete(u.constraints, v)
        }
    }
}

func (u *UQ[T]) linkChannels() {
    wg := &sync.WaitGroup{}
    wg.Add(2)
    go u.shiftToFront(wg)
    go u.readFromBack(wg)
    wg.Wait()
}

func (u *UQ[T]) shiftToFront(wg *sync.WaitGroup) {
    for v := range u.queue {
        u.front <- v
        if u.AutoRemoveConstraint {
            u.RemoveConstraint(v)
        }
    }
    close(u.front)
    wg.Done()
}

func (u *UQ[T]) readFromBack(wg *sync.WaitGroup) {
    for v := range u.back {
        if err := u.AddConstraint(v); err == nil {
            u.queue <- v
        }
    }
    close(u.queue)
    wg.Done()
}
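A brief usage sketch, assuming the import path from the repository above (whether a duplicate send is rejected depends on whether the first copy has already been popped and its constraint removed):
package main

import (
    "fmt"

    "github.com/munnik/uniqueue"
)

func main() {
    q := uniqueue.NewUQ[int](10)
    q.AutoRemoveConstraint = true // drop a value's constraint once it is read from Front()

    in := q.Back()
    in <- 1
    in <- 1 // duplicate: dropped while the first 1 is still in the queue
    in <- 2
    close(in)

    for v := range q.Front() {
        fmt.Println(v) // 1, then 2
    }
}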

How to handle multiple goroutines that share the same channel

I've been searching a lot but could not find an answer to my problem yet.
I need to make multiple calls to an external API concurrently, but with different parameters.
Then, for each call, I need to init a struct for each dataset and process the data I receive from the API call. Bear in mind that I read each line of the incoming request and immediately send it to the channel.
The first problem I encountered was not obvious at the beginning, due to the large quantity of data I'm receiving: each goroutine does not receive all the data that goes through the channel, because each value sent on a channel is delivered to only one receiver (which I learned from the research I've made). So what I need is a way of requeuing/redirecting that data to the correct goroutine.
The function that sends the streamed response from a single dataset.
(I've cut useless parts of code that are out of context)
func (api *API) RequestData(ctx context.Context, c chan DWeatherResponse, dataset string, wg *sync.WaitGroup) error {
    for {
        line, err := reader.ReadBytes('\n')
        s := string(line)
        if err != nil {
            log.Printf("End of %s", dataset)
            return err
        }
        data, err := extractDataFromStreamLine(s, dataset)
        if err != nil {
            continue
        }
        c <- *data
    }
}
The function that will process the incoming data
func (s *StrikeStruct) Process(ch, requeue chan dweather.DWeatherResponse) {
    for {
        data, more := <-ch
        if !more {
            break
        }
        // data contains {dataset string, value float64, date time.Time}
        // The s.Parameter needs to match the dataset.
        // IMPORTANT PART: checks whether the received data belongs to this struct's dataset.
        // If not, I want to send it to another goroutine until it gets to the correct one.
        // There will be a max of 4 datasets, but still this might not be the best approach.
        if !api.GetDataset(s.Parameter, data.Dataset) {
            requeue <- data
            continue
        }
        // Do stuff with the data from this point
    }
}
Now on my own API endpoint I have the following:
ch := make(chan dweather.DWeatherResponse, 2)
requeue := make(chan dweather.DWeatherResponse)
final := make(chan strike.StrikePerYearResponse)
var wg sync.WaitGroup

for _, s := range args.Parameters.Strikes {
    strike := strike.StrikePerYear{
        Parameter:   strike.Parameter(s.Dataset),
        StrikeValue: s.Value,
    }
    // I receive and process the data in here
    go strike.ProcessStrikePerYear(ch, requeue, final, string(s.Dataset))
}

go func() {
    for {
        data := <-requeue
        ch <- data
    }
}()

// Creates a goroutine for each dataset
for _, dataset := range api.Params.Dataset {
    wg.Add(1)
    go api.RequestData(ctx, ch, dataset, &wg)
}

wg.Wait()
close(ch)

// Once the data is all processed it is all appended
var strikes []strike.StrikePerYearResponse
for range args.Fetch.Datasets {
    strikes = append(strikes, <-final)
}
return strikes
The issue with this code is that as soon as I start receiving data from more than one endpoint, the requeue will block and nothing more happens. If I remove that requeue logic, data will be lost when it does not land on the correct goroutine.
My two questions are:
Why is the requeue blocking if it has a goroutine always ready to receive?
Should I take a different approach to how I'm processing the incoming data?
This is not a good way to solve your problem; you should change your solution. I suggest an implementation like the one below:
import (
    "fmt"
    "sync"
)

// answer for https://stackoverflow.com/questions/68454226/how-to-handle-multiple-goroutines-that-share-the-same-channel

var (
    finalResult = make(chan string)
)

// IData is used by the message dispatcher; every struct must implement its methods.
type IData interface {
    IsThisForMe() bool
    Process(*sync.WaitGroup)
}

// MainData can be your main struct, like StrikePerYear.
type MainData struct {
    // add any props
    Id   int
    Name string
}

type DataTyp1 struct {
    MainData *MainData
}

func (d DataTyp1) IsThisForMe() bool {
    // you can check your condition on the incoming data here
    return d.MainData.Id == 2
}

func (d DataTyp1) Process(wg *sync.WaitGroup) {
    d.MainData.Name = "processed by DataTyp1"
    // send the result to the final channel; you can change this as you want
    finalResult <- d.MainData.Name
    wg.Done()
}

type DataTyp2 struct {
    MainData *MainData
}

func (d DataTyp2) IsThisForMe() bool {
    // you can check your condition on the incoming data here
    return d.MainData.Id == 3
}

func (d DataTyp2) Process(wg *sync.WaitGroup) {
    d.MainData.Name = "processed by DataTyp2"
    // send the result to the final channel; you can change this as you want
    finalResult <- d.MainData.Name
    wg.Done()
}

// dispatcher runs a new goroutine for each request.
// You can implement a worker pool to prevent running too many goroutines.
func dispatcher(incomingData *MainData, wg *sync.WaitGroup) {
    // based on your requirements you can remove this goroutine or not
    go func() {
        var p IData
        p = DataTyp1{incomingData}
        if p.IsThisForMe() {
            go p.Process(wg)
            return
        }
        p = DataTyp2{incomingData}
        if p.IsThisForMe() {
            go p.Process(wg)
            return
        }
    }()
}

func main() {
    dummyDataArray := []MainData{
        {Id: 2, Name: "this data #2"},
        {Id: 3, Name: "this data #3"},
    }
    wg := sync.WaitGroup{}
    for i := range dummyDataArray {
        wg.Add(1)
        dispatcher(&dummyDataArray[i], &wg)
    }
    result := make([]string, 0)
    done := make(chan struct{})
    // data collector
    go func() {
    loop:
        for {
            select {
            case <-done:
                break loop
            case r := <-finalResult:
                result = append(result, r)
            }
        }
    }()
    wg.Wait()
    done <- struct{}{}
    for _, s := range result {
        fmt.Println(s)
    }
}
Note: this is just to open your mind to finding a better solution; for sure this is not production-ready code.

Inspect value from channel

I have two read-only channels <-chan Event that are used as generators.
type Event struct {
    time int
}
I can read their values as:
for {
    select {
    case <-chan1:
        // do something
    case <-chan2:
        // do something
    }
}
I use those channels for event-driven simulations, so I have to choose the Event with the lesser time field.
Is it possible to inspect which value is coming from each channel and only then choose which one to read from? Because the operation <-chan1 takes the value from the channel, and it is impossible to push it back (the channel is read-only).
You can implement your own version of the Go channel structure. For example, the following implementation acts like a Go channel without a size limit, and you can inspect its first element.
package buffchan

import (
    "container/list"
    "sync"
)

// BufferedChannel provides a Go-channel-like interface with unlimited storage
type BufferedChannel struct {
    m *sync.Mutex
    l *list.List
    c *sync.Cond
}

// New creates a new buffered channel
func New() *BufferedChannel {
    m := new(sync.Mutex)
    return &BufferedChannel{
        m: m,
        l: list.New(),
        c: sync.NewCond(m),
    }
}

// Append adds the given data at the end of the channel
func (b *BufferedChannel) Append(v interface{}) {
    b.m.Lock()
    defer b.m.Unlock()
    b.l.PushBack(v)
    b.c.Signal()
}

// Remove removes the first element of the list synchronously,
// blocking until one is available
func (b *BufferedChannel) Remove() interface{} {
    b.m.Lock()
    defer b.m.Unlock()
    for b.l.Len() == 0 {
        b.c.Wait()
    }
    v := b.l.Front()
    b.l.Remove(v)
    return v.Value
}

// Inspect returns the first element of the list if it exists,
// without removing it; it returns nil if the list is empty
func (b *BufferedChannel) Inspect() interface{} {
    b.m.Lock()
    defer b.m.Unlock()
    if b.l.Len() == 0 {
        return nil
    }
    return b.l.Front().Value
}

// AsyncNonBlocking removes the first element of the list without
// blocking; it returns nil if the list is empty
func (b *BufferedChannel) AsyncNonBlocking() interface{} {
    b.m.Lock()
    defer b.m.Unlock()
    if b.l.Len() == 0 {
        return nil
    }
    v := b.l.Front()
    b.l.Remove(v)
    return v.Value
}
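A sketch of how this could drive the simulation in the question: pump each generator into its own BufferedChannel, then compare the heads with Inspect before committing to a Remove. Event comes from the question; drain and next are illustrative names, and Inspect returning nil only means nothing has arrived yet:
// drain pumps one generator channel into a BufferedChannel.
func drain(in <-chan Event, out *BufferedChannel) {
    for e := range in {
        out.Append(e)
    }
}

// next pops the event with the smaller time when both heads are known.
// If only one buffer has data, it greedily pops that one, which may not
// preserve global time order if the other generator is merely slow.
func next(b1, b2 *BufferedChannel) Event {
    for {
        v1, v2 := b1.Inspect(), b2.Inspect()
        switch {
        case v1 != nil && v2 != nil:
            if v1.(Event).time <= v2.(Event).time {
                return b1.Remove().(Event)
            }
            return b2.Remove().(Event)
        case v1 != nil:
            return b1.Remove().(Event)
        case v2 != nil:
            return b2.Remove().(Event)
        }
        // Neither buffer has data yet; real code should block or back off here.
    }
}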

How can I get rid of this data race

I have these 2 functions:
// PartyHub struct contains all data for the party
type PartyHub struct {
    FullPartys    map[string]Party
    PartialPartys map[string]Party
    Enter         chan Member
    Leave         chan Member
    sync.Mutex
}

// RemoveFromQueue will remove the member from party
func (p *PartyHub) RemoveFromQueue(memberLeaving Member, inQueue bool) {
    if !inQueue {
        return
    }
    for _, party := range p.PartialPartys {
        go func(party Party) {
            if _, ok := party.Members[memberLeaving.Identifier]; ok {
                p.Lock()
->>>>>>>>       delete(party.Members, memberLeaving.Identifier)
                p.Unlock()
            }
        }(party)
    }
    log.Println("Removing")
}

// SortIntoParty will sort the member into party
func (p *PartyHub) SortIntoParty(newMember Member, inQueue bool) {
    log.Println(inQueue)
    if inQueue {
        return
    }
    log.Println("Adding")
    foundParty := false
->> for partyid, party := range p.PartialPartys {
        if !party.Accepting {
            continue
        }
        goodFitForParty := true
        for _, partyMember := range party.Members {
            if newMember.Type == partyMember.Type && newMember.Rank >= partyMember.Rank-partyMember.RankTol && newMember.Rank <= partyMember.Rank+partyMember.RankTol {
                goodFitForParty = true
                continue
            } else {
                goodFitForParty = false
                break
            }
        }
        if !goodFitForParty {
            continue
        } else {
            foundParty = true
            newMember.Conn.CurrentParty = partyid
            p.Lock()
            p.PartialPartys[partyid].Members[newMember.Conn.Identifier] = newMember
            p.Unlock()
            if len(party.Members) == 2 {
                p.Lock()
                party.Accepting = false
                p.Unlock()
                // Start Go Routine
            }
            break
        }
    }
    if !foundParty {
        uuid := feeds.NewUUID().String()
        newMember.Conn.CurrentParty = uuid
        p.Lock()
        p.PartialPartys[uuid] = Party{Accepting: true, Members: make(map[string]Member), Ready: make(chan *Connection), Decline: make(chan *Connection)}
        p.PartialPartys[uuid].Members[newMember.Conn.Identifier] = newMember
        p.Unlock()
    }
}
I put ->>>>>> markers next to the two places where the same data is accessed. I'm not sure how I can keep these two up to date without a data race; I'm fairly new to Go and wondering how I should be reading and writing this data without a data race.
You've got a lot of code in your question, but it looks like you're trying to delete elements from a map (party.Members) in one goroutine, while looping over it in another. This sounds like an unmaintainable, error-ridden disaster in the making, but it's possible to do without memory races.
You need a mutex to protect access (both read and write) to the map, and the hard part is to make sure the lock is held during the for/range iteration. Here's one way to do it, by having the lock held before the for loop starts, and unlocking it inside the body of the loop.
var mut sync.Mutex
var m = map[string]int{}

func f(key string) {
    mut.Lock()
    defer mut.Unlock()
    delete(m, key)
}

func g() {
    mut.Lock()
    defer mut.Unlock()
    for k, v := range m {
        mut.Unlock()
        fmt.Println(k, v)
        mut.Lock()
    }
}
Here, any combination of calls to f and g can run concurrently without memory races.
Drastically simpler to understand would be to not Unlock/Lock the mutex inside the loop, which would mean a deletion in f would wait for any running loop in g to complete (or vice-versa).
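A minimal sketch of that simpler variant, using the same mut and m as above:
func g() {
    mut.Lock()
    defer mut.Unlock()
    // The lock is held for the whole iteration, so a concurrent f
    // blocks until the loop finishes instead of racing with it.
    for k, v := range m {
        fmt.Println(k, v)
    }
}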

Go - wait for next item in a priority queue if empty

I am trying to implement a priority queue to send json objects through a network socket based on priority. I am using the container/heap package to implement the queue. I came up with something like this:
for {
    if pq.Len() > 0 {
        item := heap.Pop(&pq).(*Item)
        jsonEncoder.Encode(&item)
    } else {
        time.Sleep(10 * time.Millisecond)
    }
}
Are there better ways to wait for a new item than just polling the priority queue?
I'd probably use a queuing goroutine. Starting with the data structures in the PriorityQueue example, I'd build a function like this:
http://play.golang.org/p/hcNFX8ehBW
func queue(in <-chan *Item, out chan<- *Item) {
    // Make us a queue!
    pq := make(PriorityQueue, 0)
    heap.Init(&pq)
    var currentItem *Item       // Our item "in hand"
    var currentIn = in          // Current input channel (may be nil sometimes)
    var currentOut chan<- *Item // Current output channel (starts nil until we have something)
    defer close(out)
    for {
        select {
        // Read from the input
        case item, ok := <-currentIn:
            if !ok {
                // The input has been closed. Don't keep trying to read it
                currentIn = nil
                // If there's nothing pending to write, we're done
                if currentItem == nil {
                    return
                }
                continue
            }
            // Were we holding something to write? Put it back.
            if currentItem != nil {
                heap.Push(&pq, currentItem)
            }
            // Put our new thing on the queue
            heap.Push(&pq, item)
            // Turn on the output queue if it's not turned on
            currentOut = out
            // Grab our best item. We know there's at least one. We just put it there.
            currentItem = heap.Pop(&pq).(*Item)
        // Write to the output
        case currentOut <- currentItem:
            // OK, we wrote. Is there anything else?
            if len(pq) > 0 {
                // Hold onto it for next time
                currentItem = heap.Pop(&pq).(*Item)
            } else {
                // Oh well, nothing to write. Is the input stream done?
                if currentIn == nil {
                    // Then we're done
                    return
                }
                // Otherwise, turn off the output stream for now.
                currentItem = nil
                currentOut = nil
            }
        }
    }
}
Here's an example of using it:
func main() {
    // Some items and their priorities.
    items := map[string]int{
        "banana": 3, "apple": 2, "pear": 4,
    }
    in := make(chan *Item, 10) // Big input buffer and unbuffered output should give best sort ordering.
    out := make(chan *Item)    // But the system will "work" for any particular values.
    // Start the queuing engine!
    go queue(in, out)
    // Stick some stuff on in another goroutine
    go func() {
        i := 0
        for value, priority := range items {
            in <- &Item{
                value:    value,
                priority: priority,
                index:    i,
            }
            i++
        }
        close(in)
    }()
    // Read the results
    for item := range out {
        fmt.Printf("%.2d:%s ", item.priority, item.value)
    }
    fmt.Println()
}
Note that if you run this example, the order will be a little different every time. That's of course expected. It depends on exactly how fast the input and output channels run.
One way would be to use sync.Cond:
Cond implements a condition variable, a rendezvous point for goroutines waiting for or announcing the occurrence of an event.
An example from the package could be amended as follows (for the consumer):
c.L.Lock()
for pq.Len() == 0 {
    c.Wait() // Will wait until signalled by the pushing routine
}
item := heap.Pop(&pq).(*Item)
c.L.Unlock()
// Do stuff with the item
And the producer could simply do:
c.L.Lock()
heap.Push(&pq, x)
c.L.Unlock()
c.Signal()
(Wrapping these in functions and using defers might be a good idea.)
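For instance, a hedged sketch of such wrappers, reusing the pq and c from the snippets above:
func pop() *Item {
    c.L.Lock()
    defer c.L.Unlock()
    for pq.Len() == 0 {
        c.Wait() // releases the lock while waiting, reacquires it before returning
    }
    return heap.Pop(&pq).(*Item)
}

func push(item *Item) {
    c.L.Lock()
    defer c.L.Unlock()
    heap.Push(&pq, item)
    c.Signal() // wake one waiting consumer
}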
Here is an example of a thread-safe (naive) heap whose pop method waits until an item is available:
package main

import (
    "fmt"
    "math/rand"
    "sort"
    "sync"
    "time"
)

type Heap struct {
    b []int
    c *sync.Cond
}

func NewHeap() *Heap {
    return &Heap{c: sync.NewCond(new(sync.Mutex))}
}

// Pop (waits until anything is available)
func (h *Heap) Pop() int {
    h.c.L.Lock()
    defer h.c.L.Unlock()
    for len(h.b) == 0 {
        h.c.Wait()
    }
    // There is definitely something in there
    x := h.b[len(h.b)-1]
    h.b = h.b[:len(h.b)-1]
    return x
}

func (h *Heap) Push(x int) {
    defer h.c.Signal() // will wake up a popper
    h.c.L.Lock()
    defer h.c.L.Unlock()
    // Add and sort to maintain priority (not really how a heap works)
    h.b = append(h.b, x)
    sort.Ints(h.b)
}

func main() {
    heap := NewHeap()
    go func() {
        for range time.Tick(time.Second) {
            for n := 0; n < 3; n++ {
                x := rand.Intn(100)
                fmt.Println("push:", x)
                heap.Push(x)
            }
        }
    }()
    for {
        item := heap.Pop()
        fmt.Println("pop: ", item)
    }
}
(Note this does not work in the playground because of the for range time.Tick loop. Run it locally.)
