Wanted data-race or bad design? - go

I am implementing an app that integrates a third-party API with a limit on hits per second. I wrote my adapter and was a happy man until I ran my tests with the race detector.
The design is simple. There is:
A struct that counts the requests it has made
A tick that resets this counter to 0 every second
A private function on this struct that blocks until conditions allow one more call to the API.
Running this test case works very well until you give it the -race flag.
I believe the data race is caused by the tick goroutine resetting the hit counter while the call requests increment it...
Is my design bad, or should I just live with a data-race alert?
import (
"sync"
"testing"
"time"
)
var subject httpClientWrapper
func init() {
subject = httpClientWrapper{
hits: 0,
hitsSecond: 1,
}
// reset hits every second to 0
go func() {
tick := time.Tick(1 * time.Second)
for range tick {
subject.hits = 0
}
}()
}
type httpClientWrapper struct {
hits, hitsSecond int
}
var m sync.Mutex
func (c *httpClientWrapper) allowCall() {
m.Lock()
callAllowanceReached := c.hits >= c.hitsSecond
for callAllowanceReached {
// cool down for one second
time.Sleep(1 * time.Second)
callAllowanceReached = c.hits >= c.hitsSecond
}
c.hits = c.hits + 1
m.Unlock()
}
func TestItSleeps(t *testing.T) {
timeStart := time.Now()
var wg = sync.WaitGroup{}
for i := 0; i < 3; i++ {
wg.Add(1)
go func() {
subject.allowCall()
wg.Done()
}()
}
wg.Wait()
elapsedTime := time.Since(timeStart)
if elapsedTime < (1 * time.Second) {
t.Errorf("this test should not have been able to run in less than a second due to locks and cool down")
}
}

Any access to .hits should be behind the mutex, so
// reset hits every second to 0
go func() {
tick := time.Tick(1 * time.Second)
for range tick {
m.Lock()
subject.hits = 0
m.Unlock()
}
}()
Also any sleeps should not occur with the mutex locked, so
m.Lock()
...
{
m.Unlock()
// cool down for one second
time.Sleep(1 * time.Second)
m.Lock()
...
}
...
m.Unlock()
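Putting both fixes together, a minimal sketch might look like the following. The names rateLimiter, newRateLimiter, limit and poll are illustrative and not part of the question's code, and the short interval is only there to keep the demo quick. Every access to hits happens under the mutex, and the mutex is released while sleeping so the reset goroutine can make progress.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// rateLimiter is a hypothetical stand-in for the question's httpClientWrapper.
type rateLimiter struct {
	mu    sync.Mutex // guards hits
	hits  int
	limit int
}

// newRateLimiter starts a goroutine that resets the counter every interval.
// The goroutine is never stopped; acceptable for a sketch, not for a library.
func newRateLimiter(limit int, interval time.Duration) *rateLimiter {
	r := &rateLimiter{limit: limit}
	go func() {
		for range time.Tick(interval) {
			r.mu.Lock()
			r.hits = 0
			r.mu.Unlock()
		}
	}()
	return r
}

// allowCall blocks until a call slot is free, polling every poll duration.
func (r *rateLimiter) allowCall(poll time.Duration) {
	r.mu.Lock()
	for r.hits >= r.limit {
		r.mu.Unlock() // never sleep while holding the lock
		time.Sleep(poll)
		r.mu.Lock()
	}
	r.hits++
	r.mu.Unlock()
}

func main() {
	r := newRateLimiter(1, 200*time.Millisecond)
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.allowCall(10 * time.Millisecond)
		}()
	}
	wg.Wait()
	fmt.Println("elapsed:", time.Since(start))
}
```

Run with the -race flag, this version should no longer report a race on hits, because the reset and the increment are serialized by the same mutex.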

Related

Conditionally Run Consecutive Go Routines

I have the following piece of code. I'm trying to run 3 goroutines at the same time, never exceeding three. This works as expected, but the code they run is supposed to update a table in the DB.
So the first routine processes the first 50, then the second 50, and then third 50, and it repeats. I don't want two routines processing the same rows at the same time and due to how long the update takes, this happens almost every time.
To solve this, I started flagging the rows with a new column processing which is a bool. I set it to true for all rows to be updated when the routine starts and sleep the script for 6 seconds to allow the flag to be updated.
This works for a random amount of time, but every now and then, I'll see 2-3 jobs processing the same rows again. I feel like the method I'm using to prevent duplicate updates is a bit janky and was wondering if there was a better way.
stopper := make(chan struct{}, 3)
var counter int
for {
counter++
stopper <- struct{}{}
go func(db *sqlx.DB, c int) {
fmt.Println("start")
updateTables(db)
fmt.Println("stop")
<-stopper
}(db, counter)
time.Sleep(6 * time.Second)
}
In updateTables:
var ids []string
err := sqlx.Select(db, &data, `select * from table_data where processing = false`)
if err != nil {
	panic(err)
}
for _, row := range data {
	ids = append(ids, row.Id)
}
if len(data) == 0 {
	return
}
for _, row := range data {
	_, err = db.Exec(`update table_data set processing = true where id = $1`, row.Id)
	if err != nil {
		panic(err)
	}
}
// Additional row processing
I think there's a misunderstanding about how to approach goroutines in this case.
Goroutines doing this kind of work should be treated like worker threads, using channels to communicate between the main goroutine (which does the synchronization) and the worker goroutines (which do the actual job).
package main
import (
"log"
"sync"
"time"
)
type record struct {
id int
}
func main() {
const WORKER_COUNT = 10
recordschan := make(chan record)
var wg sync.WaitGroup
for k := 0; k < WORKER_COUNT; k++ {
wg.Add(1)
// Create the worker which will be doing the updates
go func(workerID int) {
defer wg.Done() // Marking the worker as done
for record := range recordschan {
updateRecord(record)
log.Printf("req %d processed by worker %d", record.id, workerID)
}
}(k)
}
// Feeding the records channel
for _, record := range fetchRecords() {
recordschan <- record
}
// Closing our channel as we're not using it anymore
close(recordschan)
// Waiting for all the go routines to finish
wg.Wait()
log.Println("we're done!")
}
func fetchRecords() []record {
result := []record{}
for k := 0; k < 100; k++ {
result = append(result, record{k})
}
return result
}
func updateRecord(req record) {
time.Sleep(200 * time.Millisecond)
}
You can even buffer records in the main goroutine if you need to update a whole batch of 50 at once.
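As a rough sketch of that batching idea: the main goroutine can split the fetched IDs into batches and hand each batch to exactly one worker, so no two workers ever touch the same rows. The helpers chunk and processAll are hypothetical names, the batch size of 50 comes from the question, and the actual database update is left as a placeholder comment. The sketch assumes at least one worker.

```go
package main

import (
	"fmt"
	"sync"
)

// chunk splits ids into consecutive batches of at most size elements.
func chunk(ids []int, size int) [][]int {
	var batches [][]int
	for size > 0 && len(ids) > 0 {
		n := size
		if len(ids) < n {
			n = len(ids)
		}
		batches = append(batches, ids[:n])
		ids = ids[n:]
	}
	return batches
}

// processAll fans the batches out to workers and returns how many ids were handled.
func processAll(ids []int, batchSize, workers int) int {
	batches := make(chan []int)
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex // guards handled
		handled int
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range batches {
				// Placeholder: the real code would update this batch in the DB here.
				mu.Lock()
				handled += len(batch)
				mu.Unlock()
			}
		}()
	}
	// Each batch is received by exactly one worker.
	for _, b := range chunk(ids, batchSize) {
		batches <- b
	}
	close(batches)
	wg.Wait()
	return handled
}

func main() {
	ids := make([]int, 120)
	for i := range ids {
		ids[i] = i
	}
	fmt.Println(len(chunk(ids, 50)), processAll(ids, 50, 3)) // prints: 3 120
}
```

Because ownership of a batch is transferred through the channel, the processing flag and the 6-second sleep from the question are not needed to keep workers apart within a single process.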

Running a maximum of two go routines continuously forever

I'm trying to run a function concurrently. It makes a call to my DB that may take 2-10 seconds. I would like it to continue on to the next routine once it has finished, even if the other one is still processing, but I only ever want a maximum of 2 to be processing at a time. I want this to happen indefinitely. I feel like I'm almost there, but the WaitGroup forces both routines to wait until completion before continuing to another iteration.
const ROUTINES = 2;
for {
var wg sync.WaitGroup
_, err:= db.Exec(`Random DB Call`)
if err != nil {
panic(err)
}
ch := createRoutines(db, &wg)
wg.Add(ROUTINES)
for i := 1; i <= ROUTINES; i++ {
ch <- i
time.Sleep(2 * time.Second)
}
close(ch)
wg.Wait()
}
func createRoutines(db *sqlx.DB, wg *sync.WaitGroup) chan int {
var ch = make(chan int, 5)
for i := 0; i < ROUTINES ; i++ {
go func(db *sqlx.DB) {
defer wg.Done()
for {
_, ok := <-ch
if !ok {
return
}
doStuff(db)
}
}(db)
}
return ch
}
If you only need n goroutines running at the same time, you can use a buffered channel of size n to block the creation of new goroutines when there is no space left, something like this:
package main
import (
"fmt"
"math/rand"
"time"
)
func main() {
const ROUTINES = 2
rand.Seed(time.Now().UnixNano())
stopper := make(chan struct{}, ROUTINES)
var counter int
for {
counter++
stopper <- struct{}{}
go func(c int) {
fmt.Println("+ Starting goroutine", c)
time.Sleep(time.Duration(rand.Intn(3)) * time.Second)
fmt.Println("- Stopping goroutine", c)
<-stopper
}(counter)
}
}
In this example you see how you can only have ROUTINES number of goroutines that live 0, 1 or 2 seconds. In the output you can also see how every time one goroutine ends another one starts.
This adds an external dependency, but consider this implementation:
package main
import (
"context"
"database/sql"
"log"
"github.com/MicahParks/ctxerrpool"
)
func main() {
// Create a pool of 2 workers for database queries. Log any errors.
databasePool := ctxerrpool.New(2, func(_ ctxerrpool.Pool, err error) {
log.Printf("Failed to execute database query.\nError: %s", err.Error())
})
// Get a list of queries to execute.
queries := []string{
"SELECT first_name, last_name FROM customers",
"SELECT price FROM inventory WHERE sku='1234'",
"other queries...",
}
// TODO Make a database connection.
var db *sql.DB
for _, query := range queries {
// Intentionally shadow the looped variable for scope.
query := query
// Perform the query on a worker. If no worker is ready, it will block until one is.
databasePool.AddWorkItem(context.TODO(), func(workCtx context.Context) (err error) {
_, err = db.ExecContext(workCtx, query)
return err
})
}
// Wait for all workers to finish.
databasePool.Wait()
}

waitgroup with concurrent limit but test fail

I have used sync.WaitGroup with goroutines before, but now I want to limit goroutine concurrency,
so I wrote my own WaitGroup with a concurrency limit:
package wglimit
import (
"sync"
)
// WaitGroupLimit ...
type WaitGroupLimit struct {
ch chan int
wg *sync.WaitGroup
}
// New ...
func New(size int) *WaitGroupLimit {
if size <= 0 {
size = 1
}
return &WaitGroupLimit{
ch: make(chan int, size), // buffer chan to limit concurrency
wg: &sync.WaitGroup{},
}
}
// Add ...
func (wgl *WaitGroupLimit) Add(delta int) {
for i := 0; i < delta; i++ {
wgl.ch <- 1
wgl.wg.Add(1)
}
}
// Done ...
func (wgl *WaitGroupLimit) Done() {
wgl.wg.Done()
<-wgl.ch
}
// Wait ...
func (wgl *WaitGroupLimit) Wait() {
close(wgl.ch)
wgl.wg.Wait()
}
And then I use it to control the goroutine concurrency, for example:
jobs := []string{"1", "2", "3", "4"} // some jobs
// wg := sync.WaitGroup{} // have no concurrency limit
wg := wglimit.New(2) // limit 2 goroutine
for _, job := range jobs {
wg.Add(1)
go func(job string) {
// job worker
defer wg.Done()
}(job)
}
wg.Wait()
It appears to work when running.
But the test fails:
package wglimit
import (
"runtime"
"testing"
"time"
)
func TestGoLimit(t *testing.T) {
var limit int = 5
wglimit := New(limit)
for i := 0; i < 10000; i++ {
wglimit.Add(1)
go func() {
defer wglimit.Done()
time.Sleep(time.Millisecond)
if runtime.NumGoroutine() > limit+2 {
println(runtime.NumGoroutine()) // will print 9, concurrency limit fail?
t.Errorf("FAIL")
}
}()
}
wglimit.Wait()
}
When testing, the number of goroutines is bigger than my limit, so it seems like the concurrency limit fails.
Anything wrong with my WaitGroupLimit code and why?
Anything wrong with my WaitGroupLimit code [...]?
No.
The problem is runtime.NumGoroutine() doesn't do what you seem to think it does. It counts all goroutines, i.e. not only the ones you start but also the goroutines the runtime uses itself, e.g. for concurrent garbage collection. NumGoroutine is thus higher than your limit.
Your code is fine; your test isn't. Do not try to get clever in testing, and test what your code really does: it blocks on Add until the limited resource is available. Test that, not a goroutine count, which is just a (bad) proxy for the desired behaviour in your test.
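One way to test the behaviour directly is to count how many jobs are inside the guarded section at once and record the peak. The sketch below is self-contained, so it uses a plain buffered channel in place of WaitGroupLimit; the function name peakConcurrency is hypothetical. The same counting could be wrapped around wglimit.Add and wglimit.Done in the real test.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// peakConcurrency runs jobs goroutines gated by a buffered channel of size limit
// and returns the highest number of jobs that were running at the same time.
func peakConcurrency(limit, jobs int) int64 {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var running, peak int64
	for i := 0; i < jobs; i++ {
		sem <- struct{}{} // blocks while limit jobs are running, like Add in the question
		wg.Add(1)
		go func() {
			defer wg.Done()
			n := atomic.AddInt64(&running, 1)
			// Raise peak to n unless another goroutine already recorded a higher value.
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			time.Sleep(time.Millisecond) // simulated work
			atomic.AddInt64(&running, -1)
			<-sem // free the slot only after the job has left the guarded section
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println(peakConcurrency(5, 1000) <= 5) // prints: true
}
```

Unlike runtime.NumGoroutine, this counter only sees the jobs themselves, so goroutines belonging to the test framework or the rest of the process cannot distort the result.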

How to always get the latest value from a Go channel?

I'm starting out with Go and I'm now writing a simple program which reads out data from a sensor and puts that into a channel to do some calculations with it. I now have it working as follows:
package main
import (
"fmt"
"time"
"strconv"
)
func get_sensor_data(c chan float64) {
time.Sleep(1 * time.Second) // wait a second before sensor data starts pouring in
c <- 2.1 // Sensor data starts being generated
c <- 2.2
c <- 2.3
c <- 2.4
c <- 2.5
}
func main() {
s := 1.1
c := make(chan float64)
go get_sensor_data(c)
for {
select {
case s = <-c:
fmt.Println("the next value of s from the channel: " + strconv.FormatFloat(s, 'f', 1, 64))
default:
// no new values in the channel
}
fmt.Println(s)
time.Sleep(500 * time.Millisecond) // Do heavy "work"
}
}
This works fine, but the sensor generates a lot of data, and I'm always only interested in the latest data. With this setup however, it only reads out the next item with every loop, which means that if the channel at some point contains 20 values, the newest value only is read out after 10 seconds.
Is there a way for a channel to always only contain one value at a time, so that I always only get the data I'm interested in, and no unnecessary memory is used by the channel (although the memory is the least of my worries)?
Channels are best thought of as queues (FIFO). Therefore you can't really skip around. However there are libraries out there that do stuff like this: https://github.com/cloudfoundry/go-diodes is an atomic ring buffer that will overwrite old data. You can set a smaller size if you like.
All that being said, it doesn't sound like you need a queue (or ring buffer). You just need a mutex:
type SensorData struct {
	mu   sync.RWMutex
	last float64
}
func (d *SensorData) Store(data float64) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.last = data
}
func (d *SensorData) Get() float64 {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return d.last
}
This uses an RWMutex, which means many things can read from it at the same time while only a single thing can write. It will store a single entry, much like you said.
No. Channels are FIFO buffers, full stop. That is how channels work and their only purpose. If you only want the latest value, consider just using a single variable protected by a mutex; write to it whenever new data comes in, and whenever you read it, you will always be reading the latest value.
Channels serve a specific purpose. You might instead keep a variable behind a lock and update it whenever a new value arrives.
This way the receiver will always get the latest value.
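For illustration, the mutex-protected variable suggested in the answers above could be wired into the sensor loop roughly as follows. The type name latest and the shortened loop are assumptions made for the sketch; the sensor values and timings are borrowed from the question.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// latest holds only the most recent sensor reading (hypothetical name).
type latest struct {
	mu  sync.RWMutex
	val float64
}

// Store overwrites the previous reading.
func (l *latest) Store(v float64) {
	l.mu.Lock()
	l.val = v
	l.mu.Unlock()
}

// Get returns the most recent reading.
func (l *latest) Get() float64 {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.val
}

func main() {
	s := &latest{val: 1.1}
	// Producer: the sensor overwrites the value as fast as it likes.
	go func() {
		for _, v := range []float64{2.1, 2.2, 2.3, 2.4, 2.5} {
			s.Store(v)
			time.Sleep(100 * time.Millisecond)
		}
	}()
	// Consumer: each iteration sees whatever is newest and skips the rest.
	for i := 0; i < 4; i++ {
		fmt.Println(s.Get())
		time.Sleep(250 * time.Millisecond) // heavy "work"
	}
}
```

The consumer never builds up a backlog, because intermediate readings are simply overwritten before they are read.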
You cannot get that from one channel directly, but you can use one channel per value and get notified when there are new values:
package main
import (
"fmt"
"strconv"
"sync"
"time"
)
type LatestChannel struct {
n float64
next chan struct{}
mu sync.Mutex
}
func New() *LatestChannel {
return &LatestChannel{next: make(chan struct{})}
}
func (c *LatestChannel) Push(n float64) {
c.mu.Lock()
c.n = n
old := c.next
c.next = make(chan struct{})
c.mu.Unlock()
close(old)
}
func (c *LatestChannel) Get() (float64, <-chan struct{}) {
c.mu.Lock()
n := c.n
next := c.next
c.mu.Unlock()
return n, next
}
func getSensorData(c *LatestChannel) {
time.Sleep(1 * time.Second)
c.Push(2.1)
time.Sleep(100 * time.Millisecond)
c.Push(2.2)
time.Sleep(100 * time.Millisecond)
c.Push(2.3)
time.Sleep(100 * time.Millisecond)
c.Push(2.4)
time.Sleep(100 * time.Millisecond)
c.Push(2.5)
}
func main() {
s := 1.1
c := New()
_, hasNext := c.Get()
go getSensorData(c)
for {
select {
case <-hasNext:
s, hasNext = c.Get()
fmt.Println("the next value of s from the channel: " + strconv.FormatFloat(s, 'f', 1, 64))
default:
// no new values in the channel
}
fmt.Println(s)
time.Sleep(250 * time.Millisecond) // Do heavy "work"
}
}
If you do not need to be notified about new values, you can read up on the "Channels inside channels" pattern in Go.
Try this package: https://github.com/subbuv26/chanup
It allows the producer to update the channel with the latest value, replacing the previous one, and the producer does not get blocked (stale values get overwritten).
So on the consumer side, only the latest item is ever read.
import "github.com/subbuv26/chanup"
ch := chanup.GetChan()
_ = ch.Put(testType{
	a: 10,
	s: "Sample",
})
_ = ch.Update(testType{
	a: 20,
	s: "Sample2",
})
// Continue updating with latest values
...
...
// On consumer end
val := ch.Get()
// val contains latest value
There is another way to solve this problem (a trick).
The sender works faster: it removes the oldest message from the channel if the channel length is greater than 1.
go func() {
for {
msg:=strconv.Itoa(int(time.Now().Unix()))
fmt.Println("make: ",msg," at:",time.Now())
messages <- msg
if len(messages)>1{
//remove old message
<-messages
}
time.Sleep(2*time.Second)
}
}()
The receiver works slower:
go func() {
for {
channLen :=len(messages)
fmt.Println("len is ",channLen)
fmt.Println("received",<-messages)
time.Sleep(10*time.Second)
}
}()
Alternatively, we can delete old messages on the receiver side
(reading a message removes it from the channel).
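A minimal sketch of that receiver-side variant, assuming a buffered channel of strings: the receiver drains everything currently queued without blocking and keeps only the last value. The helper name drainLatest is hypothetical.

```go
package main

import "fmt"

// drainLatest returns the newest value currently buffered in ch and discards
// the older ones. ok is false if nothing was buffered.
func drainLatest(ch <-chan string) (latest string, ok bool) {
	for {
		select {
		case v, open := <-ch:
			if !open {
				// Channel closed: return whatever was seen so far.
				return latest, ok
			}
			latest, ok = v, true
		default:
			// Nothing left to read right now.
			return latest, ok
		}
	}
}

func main() {
	messages := make(chan string, 8)
	messages <- "a"
	messages <- "b"
	messages <- "c"
	v, ok := drainLatest(messages)
	fmt.Println(v, ok) // prints: c true
}
```

This only helps while the buffer has room; once the channel is full, the sender blocks until the receiver drains it.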
There is an elegant channel-only solution. If you're OK with adding one more channel and goroutine, you can introduce an unbuffered channel and a goroutine that tries to send the latest value from your channel to it:
package main
import (
"fmt"
"time"
)
func wrapLatest(ch <-chan int) <-chan int {
result := make(chan int) // important that this one is unbuffered
go func() {
defer close(result)
value, ok := <-ch
if !ok {
return
}
LOOP:
for {
select {
case value, ok = <-ch:
if !ok {
return
}
default:
break LOOP
}
}
for {
select {
case value, ok = <-ch:
if !ok {
return
}
case result <- value:
if value, ok = <-ch; !ok {
return
}
}
}
}()
return result
}
func main() {
sendChan := make(chan int, 10) // may be buffered or not
for i := 0; i < 10; i++ {
sendChan <- i
}
go func() {
for i := 10; i < 20; i++ {
sendChan <- i
time.Sleep(time.Second)
}
close(sendChan)
}()
recvChan := wrapLatest(sendChan)
for i := range recvChan {
fmt.Println(i)
time.Sleep(time.Second * 2)
}
}

I have a strange bug

Given are the following two functions.
func main() {
index := int(0)
for {
Loop(index)
index = (index + 1) % 86400 // Max interval: 1 day
time.Sleep(1 * time.Second)
}
}
func Loop(index int) {
if index%10 == 0 {
go doSomething...
}
}
I want to execute something every 10/60/3600 seconds, so I thought an incrementing index with modulo should do this.
But I noticed (especially on high-traffic servers) that it appears to skip some of those loops.
In my logs there is sometimes an entry every 10 seconds, but sometimes there is a gap of up to 1 minute.
Does anybody know why this is happening?
I'd recommend using a time.Ticker to perform some action every N seconds. That way, you use built-in timers and only wake the CPU when something needs to be done. Even if the CPU is not heavily used, time.Sleep in a for loop is not the most reliable way to schedule tasks: each iteration takes one second plus however long the loop body and scheduling take, so the index drifts away from wall-clock time. For example (from the link above):
package main
import (
"fmt"
"time"
)
func main() {
ticker := time.NewTicker(time.Second)
defer ticker.Stop()
done := make(chan bool)
go func() {
time.Sleep(10 * time.Second)
done <- true
}()
for {
select {
case <-done:
fmt.Println("Done!")
return
case t := <-ticker.C:
fmt.Println("Current time: ", t)
}
}
}
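Applied to the question, one ticker per interval could replace the modulo index. The sketch below is an assumption about how that might look, not the asker's code: runTickers is a hypothetical helper that handles two intervals and simply counts ticks, and the demo uses millisecond intervals so it finishes quickly. For the real job the arguments would be 10*time.Second, 60*time.Second and so on, with a third ticker for the hourly task.

```go
package main

import (
	"fmt"
	"time"
)

// runTickers counts ticks of two independent tickers until total has elapsed.
// In real code the counters would be replaced by the periodic jobs.
func runTickers(fastEvery, slowEvery, total time.Duration) (fast, slow int) {
	fastT := time.NewTicker(fastEvery)
	defer fastT.Stop()
	slowT := time.NewTicker(slowEvery)
	defer slowT.Stop()
	stop := time.After(total)
	for {
		select {
		case <-fastT.C:
			fast++ // e.g. the every-10-seconds job
		case <-slowT.C:
			slow++ // e.g. the every-60-seconds job
		case <-stop:
			return fast, slow
		}
	}
}

func main() {
	fast, slow := runTickers(10*time.Millisecond, 60*time.Millisecond, 200*time.Millisecond)
	fmt.Println("fast ticks:", fast, "slow ticks:", slow)
}
```

Long-running jobs are best started with go inside the case so they do not delay the other tickers; a ticker drops ticks rather than queueing them if the receiver falls behind.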
