I'm trying to solve a WARNING: DATA RACE reported by the Go race detector.
Here is the code:
package models
import (
"sync"
"time"
)
type Stats struct {
sync.Mutex
request map[int64]int
}
func (s *Stats) PutRequest() {
s.Lock()
s.request[time.Now().Unix()]++
s.Unlock()
}
func (s *Stats) GetRequests() map[int64]int {
s.Lock()
m := s.request
s.Unlock()
return m
}
var Requests = Stats{
sync.Mutex{},
make(map[int64]int),
}
If I change the Stats field request to an integer then everything works fine, but not with a map. How do I correctly lock a map in Go?
Use sync.RWMutex (embedded in Stats in place of sync.Mutex):

type Stats struct {
    sync.RWMutex
    request map[int64]int
}

func (s *Stats) PutRequest(ut int64) {
    s.Lock()
    defer s.Unlock()
    s.request[ut]++
}

func (s *Stats) GetRequests() map[int64]int {
    s.RLock()
    defer s.RUnlock()
    m := make(map[int64]int, len(s.request))
    for k, v := range s.request {
        m[k] = v
    }
    return m
}
The channel-based example at Go by Example: Stateful Goroutines may also be interesting in this case.
In any case, you need to copy the map before returning it. GetRequests returns a reference to the map, so if other code calls the function and then reads or writes the returned map without acquiring the lock, a data race is introduced.
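For illustration, here is a minimal sketch of that stateful-goroutine approach, where a single goroutine owns the map and all access goes through a channel (statsOp and runStats are illustrative names, not from the original code):

type statsOp struct {
    ut    int64
    reply chan map[int64]int // nil for increments, non-nil for reads
}

func runStats(ops <-chan statsOp) {
    requests := make(map[int64]int) // owned exclusively by this goroutine
    for op := range ops {
        if op.reply == nil {
            requests[op.ut]++
            continue
        }
        // hand out a copy so callers never share the underlying map
        m := make(map[int64]int, len(requests))
        for k, v := range requests {
            m[k] = v
        }
        op.reply <- m
    }
}

PutRequest then becomes a send of statsOp{ut: time.Now().Unix()}, and GetRequests sends an op with a reply channel and waits for the copy.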
Say I have an expensive function
func veryExpensiveFunction(int) int
and this function gets called a lot for the same number.
Is there a good way to let this function store previous results for reuse when it is called again with the same number, ideally in a way that is also reusable for veryExpensiveFunction2?
Obviously, it would be possible to add an argument
func veryExpensiveFunctionCached(p int, cache map[int]int) int {
if val, ok := cache[p]; ok {
return val
}
result := veryExpensiveFunction(p)
cache[p] = result
return result
}
But now I have to create the cache somewhere I don't care about it. I would rather have it as a "static function member", if that were possible.
What is a good way to simulate a static member cache in go?
You can use a closure and let it manage the cache.
package main

import (
    "fmt"
    "time"
)

func InitExpensiveFuncWithCache() func(p int) int {
    cache := make(map[int]int)
    return func(p int) int {
        if ret, ok := cache[p]; ok {
            fmt.Println("from cache")
            return ret
        }
        // expensive computation
        time.Sleep(1 * time.Second)
        r := p * 2
        cache[p] = r
        return r
    }
}

func main() {
    ExpensiveFuncWithCache := InitExpensiveFuncWithCache()
    fmt.Println(ExpensiveFuncWithCache(2))
    fmt.Println(ExpensiveFuncWithCache(2))
}
output:
4
from cache
4
veryExpensiveFunctionCached := InitExpensiveFuncWithCache()
and use the wrapped function in your code.
If you want it to be reusable, change the signature to InitExpensiveFuncWithCache(func(int) int) so it accepts the function to wrap as a parameter. Wrap it in the closure, replacing the expensive-computation part with a call to it, as in the sketch below.
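A minimal sketch of that reusable variant, under the same assumptions as the code above:

func InitExpensiveFuncWithCache(f func(int) int) func(int) int {
    cache := make(map[int]int)
    return func(p int) int {
        if ret, ok := cache[p]; ok {
            return ret
        }
        r := f(p) // delegate to the wrapped expensive function
        cache[p] = r
        return r
    }
}

Then veryExpensiveFunctionCached := InitExpensiveFuncWithCache(veryExpensiveFunction), and veryExpensiveFunction2 can be wrapped the same way.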
You need to be careful about synchronization if this cache will be used in HTTP handlers. In the Go standard library, each HTTP request is processed in a dedicated goroutine, so at that point we are in the domain of concurrency and race conditions. I would suggest an RWMutex to ensure data consistency.
As for the cache injection, you may inject it in the function where you create the HTTP handler.
Here is a prototype:
type Cache struct {
store map[int]int
mux sync.RWMutex
}
func NewCache() *Cache {
return &Cache{make(map[int]int), sync.RWMutex{}}
}
func (c *Cache) Set(id, value int) {
    c.mux.Lock()
    c.store[id] = value
    c.mux.Unlock()
}
func (c *Cache) Get(id int) (int, error) {
c.mux.RLock()
v, ok := c.store[id]
c.mux.RUnlock()
if !ok {
return -1, errors.New("a value with given key not found")
}
return v, nil
}
func handleComplexOperation(c *Cache) http.HandlerFunc {
    return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
        // use c.Get and c.Set here to serve cached results
    })
}
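Wiring it up could look like this (the route and address are illustrative, and net/http and log are assumed to be imported):

func main() {
    cache := NewCache()
    http.Handle("/compute", handleComplexOperation(cache))
    log.Fatal(http.ListenAndServe(":8080", nil))
}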
The Go standard library uses the following style for providing "static" functions that nevertheless leverage underlying state (e.g. flag.CommandLine):
// "static" function is just a wrapper
func Lookup(p int) int { return expCache.Lookup(p) }
var expCache = NewCache()

func NewCache() *CacheExpensive { return &CacheExpensive{cache: make(map[int]int)} }
type CacheExpensive struct {
l sync.RWMutex // lock for concurrent access
cache map[int]int
}
func (c *CacheExpensive) Lookup(p int) int { /*...*/ }
This design pattern not only allows for simple one-time use, but also for segregated usage:
var (
userX = NewCache()
userY = NewCache()
)
userX.Lookup(12)
userY.Lookup(42)
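The elided Lookup could be filled in along these lines (a sketch; veryExpensiveFunction stands in for whatever computation is being cached, and two goroutines may still compute the same missing value concurrently):

func (c *CacheExpensive) Lookup(p int) int {
    c.l.RLock()
    v, ok := c.cache[p]
    c.l.RUnlock()
    if ok {
        return v
    }
    r := veryExpensiveFunction(p) // hypothetical expensive call
    c.l.Lock()
    c.cache[p] = r
    c.l.Unlock()
    return r
}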
Suppose I have the following struct:
package manager
type Manager struct {
strings []string
}
func (m *Manager) AddString(s string) {
m.strings = append(m.strings, s)
}
func (m *Manager) RemoveString(s string) {
for i, str := range m.strings {
if str == s {
m.strings = append(m.strings[:i], m.strings[i+1:]...)
}
}
}
This pattern is not thread-safe, so the following test fails due to a race condition (index out of range):
func TestManagerConcurrently(t *testing.T) {
m := &manager.Manager{}
wg := sync.WaitGroup{}
for i := 0; i < 100; i++ {
    wg.Add(1)
    go func() {
        m.AddString("a")
        m.AddString("b")
        m.AddString("c")
        m.RemoveString("b")
        wg.Done()
    }()
}
wg.Wait()
fmt.Println(m)
}
I'm new to Go, and from googling around I suppose I should use channels(?). So one way to make this concurrency-safe would be like this:
type ManagerA struct {
Manager
addStringChan chan string
removeStringChan chan string
}
func NewManagerA() *ManagerA {
ma := &ManagerA{
addStringChan: make(chan string),
removeStringChan: make(chan string),
}
go func () {
for {
select {
case msg := <-ma.addStringChan:
ma.AddString(msg)
case msg := <-ma.removeStringChan:
ma.RemoveString(msg)
}
}
}()
return ma
}
func (m *ManagerA) AddStringA(s string) {
    m.addStringChan <- s
}

func (m *ManagerA) RemoveStringA(s string) {
    m.removeStringChan <- s
}
I would like to expose an API similar to the non-concurrent example, hence AddStringA, RemoveStringA.
This seems to work as expected concurrently (although I guess the inner goroutine should also exit at some point). My problem with this is that there is a lot of extra boilerplate:
- need to define & initialize channels
- define the inner goroutine loop with select
- map functions to channel calls
It seems a bit much to me. Is there a way to simplify this (refactor / syntax / library)?
I think the best way to implement this would be to use a Mutex instead? But is it still possible to simplify this sort of boilerplate?
Using a mutex like this would be perfectly idiomatic:
type Manager struct {
mu sync.Mutex
strings []string
}
func (m *Manager) AddString(s string) {
m.mu.Lock()
m.strings = append(m.strings, s)
m.mu.Unlock()
}
func (m *Manager) RemoveString(s string) {
m.mu.Lock()
for i, str := range m.strings {
if str == s {
m.strings = append(m.strings[:i], m.strings[i+1:]...)
}
}
m.mu.Unlock()
}
You could do this with channels, but as you noted it is a lot of extra work for not much gain. Just use a mutex is my advice!
If you simply need to make access to the struct thread-safe, use a mutex (note that embedding sync.Mutex makes Lock and Unlock part of Manager's exported method set):
type Manager struct {
sync.Mutex
data []string
}
func (m *Manager) AddString(s string) {
m.Lock()
m.data = append(m.data, s)
m.Unlock()
}
I'm working on a golang web crawler that should parse search results from a specific search engine. The main difficulty is parsing with concurrency, or rather, handling pagination such as
← Previous 1 2 3 4 5 ... 34 Next →. Everything works fine except the recursive crawling of paginated results. Look at my code:
package main
import (
"bufio"
"errors"
"fmt"
"net"
"strings"
"github.com/antchfx/htmlquery"
"golang.org/x/net/html"
)
type Spider struct {
HandledUrls []string
}
func NewSpider(url string) *Spider {
// ...
}
func requestProvider(request string) string {
// Everything is good here
}
func connectProvider(url string) net.Conn {
// Also works
}
// getContents makes request to search engine and gets response body
func getContents(request string) *html.Node {
// ...
}
// CheckResult controls empty search results
func checkResult(node *html.Node) bool {
// ...
}
func (s *Spider) checkVisited(url string) bool {
// ...
}
// Here is the problem
func (s *Spider) Crawl(url string, channelDone chan bool, channelBody chan *html.Node) {
body := getContents(url)
defer func() {
channelDone <- true
}()
if checkResult(body) == false {
err := errors.New("Nothing found there")
ErrFatal(err)
}
channelBody <- body
s.HandledUrls = append(s.HandledUrls, url)
fmt.Println("Handled ", url)
newUrls := s.getPagination(body)
for _, u := range newUrls {
fmt.Println(u)
}
for i, newurl := range newUrls {
if s.checkVisited(newurl) == false {
fmt.Println(i)
go s.Crawl(newurl, channelDone, channelBody)
}
}
}
func (s *Spider) getPagination(node *html.Node) []string {
// ...
}
func main() {
request := requestProvider(*requestFlag)
channelBody := make(chan *html.Node, 120)
channelDone := make(chan bool)
var parsedHosts []*Host
s := NewSpider(request)
go s.Crawl(request, channelDone, channelBody)
for {
select {
case receivedNode := <-channelBody:
// ...
for _, h := range newHosts {
parsedHosts = append(parsedHosts, h)
fmt.Println("added", h.HostUrl)
}
case <-channelDone:
fmt.Println("Jobs finished")
}
break
}
}
It always returns the first page only, with no pagination. getPagination(...) itself works fine. Please tell me where my error is.
Hope Google Translate was correct.
The problem is probably that main exits before all goroutines have finished.
First, there is a break after the select statement, and it runs unconditionally after the first time a channel is read. That ensures the main func returns the first time you send something over channelBody.
Secondly, using channelDone is not the right way here. The most idiomatic approach would be a sync.WaitGroup: before starting each goroutine, call wg.Add(1), replace the defer with defer wg.Done(), and in main call wg.Wait(). Be aware that you should use a pointer to refer to the WaitGroup, as sketched below.
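A minimal, self-contained sketch of that WaitGroup pattern (the crawling itself is stubbed out; nextPages is a hypothetical stand-in for getPagination):

package main

import (
    "fmt"
    "sync"
)

func nextPages(url string) []string { // hypothetical pagination stub
    if url == "page1" {
        return []string{"page2", "page3"}
    }
    return nil
}

func crawl(url string, wg *sync.WaitGroup, results chan<- string) {
    defer wg.Done()
    results <- url
    for _, u := range nextPages(url) {
        wg.Add(1) // add before spawning, never inside the new goroutine
        go crawl(u, wg, results)
    }
}

func main() {
    results := make(chan string, 16)
    var wg sync.WaitGroup
    wg.Add(1)
    go crawl("page1", &wg, results)
    go func() {
        wg.Wait()
        close(results) // closing the channel ends the range below
    }()
    for r := range results {
        fmt.Println("handled", r)
    }
    fmt.Println("Jobs finished")
}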
I have two read-only channels <-chan Event that are used as generators.
type Event struct{
time int
}
I can read their values as:
for {
    select {
    case <-chan1:
        // do something
    case <-chan2:
        // do something
    }
}
I use those channels for event-driven simulations, so I have to choose the Event with the smaller time field.
Is it possible to inspect which value is pending on each channel and only then choose which one to read from? The operation <-chan1 takes the value from the channel, and it is impossible to push it back (it is a receive-only channel).
You can implement your own version of the Go channel structure. For example, the following implementation acts like a Go channel without a size limit, and you can inspect its first element.
package buffchan
import (
"container/list"
"sync"
)
// BufferedChannel provides go channel like interface with unlimited storage
type BufferedChannel struct {
m *sync.Mutex
l *list.List
c *sync.Cond
}
// New creates a new buffered channel
func New() *BufferedChannel {
m := new(sync.Mutex)
return &BufferedChannel{
m: m,
l: list.New(),
c: sync.NewCond(m),
}
}
// Append adds given data at end of channel
func (b *BufferedChannel) Append(v interface{}) {
b.m.Lock()
defer b.m.Unlock()
b.l.PushBack(v)
b.c.Signal()
}
// Remove removes and returns the first element of the list, blocking while the list is empty
func (b *BufferedChannel) Remove() interface{} {
b.m.Lock()
defer b.m.Unlock()
for b.l.Len() == 0 {
b.c.Wait()
}
v := b.l.Front()
b.l.Remove(v)
return v.Value
}
// Inspect returns the first element of the list if it exists, nil otherwise
func (b *BufferedChannel) Inspect() interface{} {
    b.m.Lock()
    defer b.m.Unlock()
    if b.l.Len() == 0 {
        return nil
    }
    return b.l.Front().Value
}
// AsyncNonBlocking removes and returns the first element of the list,
// or returns nil immediately if the list is empty
func (b *BufferedChannel) AsyncNonBlocking() interface{} {
    b.m.Lock()
    defer b.m.Unlock()
    if b.l.Len() == 0 {
        return nil
    }
    v := b.l.Front()
    b.l.Remove(v)
    return v.Value
}
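A sketch of how this could be applied to the original question: drain each read-only channel into its own BufferedChannel, then peek at both and consume the Event with the smaller time first (pump is an illustrative helper, and the retry-when-empty handling is omitted):

func pump(src <-chan Event, dst *BufferedChannel) {
    for e := range src {
        dst.Append(e)
    }
}

// inside the simulation loop:
b1, b2 := New(), New()
go pump(chan1, b1)
go pump(chan2, b2)
if e1, e2 := b1.Inspect(), b2.Inspect(); e1 != nil && e2 != nil {
    if e1.(Event).time <= e2.(Event).time {
        _ = b1.Remove() // consume the earlier event
    } else {
        _ = b2.Remove()
    }
}
// Inspect returns nil when a buffer is empty, so a real loop must wait or retry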
I have a simple package I am using to log stats during a program run and I found that go run -race says there is a race condition in it. Looking at the program I'm not sure how I can have a race condition when every read and write is protected by a mutex. Can someone explain this?
package counters
import "sync"
type single struct {
mu sync.Mutex
values map[string]int64
}
// Global counters object
var counters = single{
values: make(map[string]int64),
}
// Get the value of the given counter
func Get(key string) int64 {
counters.mu.Lock()
defer counters.mu.Unlock()
return counters.values[key]
}
// Incr the value of the given counter name
func Incr(key string) int64 {
counters.mu.Lock()
defer counters.mu.Unlock()
counters.values[key]++ // Race condition
return counters.values[key]
}
// All the counters
func All() map[string]int64 {
counters.mu.Lock()
defer counters.mu.Unlock()
return counters.values // running during write above
}
I use the package like so:
counters.Incr("foo")
counters.Get("foo")
A Minimal Complete Verifiable Example would be useful here, but I think your problem is in All():
// All the counters
func All() map[string]int64 {
counters.mu.Lock()
defer counters.mu.Unlock()
return counters.values // running during write above
}
This returns the map without making a copy, so the map can be accessed outside the protection of the mutex: All returns the underlying map and then releases the lock, so code using that map races with writers such as Incr.
You should return a copy of the map:
func All() map[string]int64 {
counters.mu.Lock()
defer counters.mu.Unlock()
m := make(map[string]int64)
for k, v := range counters.values {
m[k] = v
}
return m
}
Or not have an All method.
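As an alternative to copying, a visitor-style accessor can iterate under the lock without ever handing out the map (a sketch; Do is not part of the original package):

// Do calls f for each counter while holding the lock.
// f must not call Get or Incr, or it will deadlock on the mutex.
func Do(f func(key string, value int64)) {
    counters.mu.Lock()
    defer counters.mu.Unlock()
    for k, v := range counters.values {
        f(k, v)
    }
}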