Making a struct thread safe using go channels - go

Suppose I have the following struct:
package manager
type Manager struct {
strings []string
}
func (m *Manager) AddString(s string) {
m.strings = append(m.strings, s)
}
func (m *Manager) RemoveString(s string) {
for i, str := range m.strings {
if str == s {
m.strings = append(m.strings[:i], m.strings[i+1:]...)
}
}
}
This type is not thread safe, so the following test fails because of a data race (it panics with an index out of range):
func TestManagerConcurrently(t *testing.T) {
m := &manager.Manager{}
wg := sync.WaitGroup{}
for i:=0; i<100; i++ {
wg.Add(1)
go func () {
m.AddString("a")
m.AddString("b")
m.AddString("c")
m.RemoveString("b")
wg.Done()
} ()
}
wg.Wait()
fmt.Println(m)
}
I'm new to Go, and from googling around I suppose I should use channels (?). So one way to make this concurrent would be like this:
type ManagerA struct {
Manager
addStringChan chan string
removeStringChan chan string
}
func NewManagerA() *ManagerA {
ma := &ManagerA{
addStringChan: make(chan string),
removeStringChan: make(chan string),
}
go func () {
for {
select {
case msg := <-ma.addStringChan:
ma.AddString(msg)
case msg := <-ma.removeStringChan:
ma.RemoveString(msg)
}
}
}()
return ma
}
func (m* ManagerA) AddStringA(s string) {
m.addStringChan <- s
}
func (m* ManagerA) RemoveStringA(s string) {
m.removeStringChan <- s
}
I would like to expose an API similar to the non-concurrent example, hence AddStringA, RemoveStringA.
This seems to work as expected concurrently (although I guess the inner goroutine should also exit at some point). My problem with this is that there is a lot of extra boilerplate:
need to define & initialize channels
define inner goroutine loop with select
map functions to channel calls
It seems a bit much to me. Is there a way to simplify this (refactor / syntax / library)?
I think the best way to implement this would be to use a Mutex instead? But is it still possible to simplify this sort of boilerplate?

Using a mutex would be perfectly idiomatic like this:
type Manager struct {
mu sync.Mutex
strings []string
}
func (m *Manager) AddString(s string) {
m.mu.Lock()
m.strings = append(m.strings, s)
m.mu.Unlock()
}
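func (m *Manager) RemoveString(s string) {
m.mu.Lock()
defer m.mu.Unlock()
// Filter in place: deleting from m.strings while ranging over it
// can skip elements or slice past the new length and panic.
kept := m.strings[:0]
for _, str := range m.strings {
if str != s {
kept = append(kept, str)
}
}
m.strings = kept
}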
You could do this with channels, but as you noted, it is a lot of extra work for not much gain. My advice is to just use a mutex.
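For comparison, if you still want the channel-based design, one way to cut the boilerplate is a single channel of closures instead of one channel per method. The following is only a minimal sketch under that assumption; the names ChanManager, NewChanManager, ops, do, Strings, and Close are hypothetical and do not come from the question.

```go
package main

import (
	"fmt"
	"sync"
)

// ChanManager funnels every operation through one owner goroutine,
// so strings is only ever touched by that goroutine.
type ChanManager struct {
	strings []string
	ops     chan func() // each operation is sent as a closure
}

func NewChanManager() *ChanManager {
	m := &ChanManager{ops: make(chan func())}
	go func() {
		for op := range m.ops { // loop ends when Close closes the channel
			op()
		}
	}()
	return m
}

// do runs op on the owner goroutine and waits until it has finished.
func (m *ChanManager) do(op func()) {
	done := make(chan struct{})
	m.ops <- func() {
		op()
		close(done)
	}
	<-done
}

func (m *ChanManager) AddString(s string) {
	m.do(func() { m.strings = append(m.strings, s) })
}

// RemoveString removes every occurrence of s.
func (m *ChanManager) RemoveString(s string) {
	m.do(func() {
		kept := m.strings[:0]
		for _, str := range m.strings {
			if str != s {
				kept = append(kept, str)
			}
		}
		m.strings = kept
	})
}

// Strings returns a copy of the current contents.
func (m *ChanManager) Strings() []string {
	var out []string
	m.do(func() { out = append(out, m.strings...) })
	return out
}

// Close stops the owner goroutine; the manager must not be used afterwards.
func (m *ChanManager) Close() { close(m.ops) }

func main() {
	m := NewChanManager()
	defer m.Close()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.AddString("a")
			m.AddString("b")
			m.AddString("c")
			m.RemoveString("b")
		}()
	}
	wg.Wait()
	fmt.Println(len(m.Strings())) // 100 "a" plus 100 "c"
}
```

Each new method is then a one-line call to do, but the mutex version is still shorter and has no goroutine to shut down.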

If you simply need to make the access to the struct thread-safe, use mutex:
type Manager struct {
sync.Mutex
strings []string
}
func (m *Manager) AddString(s string) {
m.Lock()
m.strings = append(m.strings, s)
m.Unlock()
}

Related

Testing In Golang, how to test cache that expires in 30 seconds

I have an interface called localcache:
package localcache
type Cache interface {
Set(k string, v interface{}) error
Get(k string) (interface{}, error)
}
and another file containing its implementation
type cache struct {
pool map[string]value
}
type value struct {
data interface{}
expiredAt time.Time
}
func New() *cache {
if cacheInstance == nil {
once.Do(func() {
cacheInstance = &cache{
pool: make(map[string]value),
}
go cacheInstance.spawnCacheChcker()
})
return cacheInstance
}
return cacheInstance
}
func (c *cache) Set(k string, v interface{}) (e error) {
expiredAt := time.Now().Add(expiredIn)
c.pool[k] = value{data: v, expiredAt: expiredAt}
e = nil
return
}
func (c *cache) Get(k string) (v interface{}, e error) {
v = c.pool[k].data
e = nil
return
}
func (c *cache) spawnCacheChcker() {
for {
for k, v := range c.pool {
if !(v.expiredAt.Before(time.Now())) {
continue
}
c.evictCache(k)
}
time.Sleep(checkInBetween)
}
}
A cache entry expires 30 seconds after it has been set. How can I test this functionality?
I'm using testify right now. My brute-force solution was to call time.Sleep in the test function, but I feel this prolongs the entire test run, which is not best practice.
Is there any way to mock the expiredAt inside the Set function? Or is there a workaround that tests this better?
You can make the clock a field of the cache struct instead of calling time.Now() directly, like this:
type cache struct {
nowFunc func() time.Time
}
Then assign time.Now to it in the constructor and call c.nowFunc() wherever the code used time.Now(). In the test, replace it with a mock function that returns any time you want.

cannot use make(IntChannel) (value of type IntChannel) as Consumer value in variable declaration

We have a below scenario:
package main
type Consumer chan interface {
OpenChannel()
CloseChannel()
}
type IntChannel chan int
type StringChannel chan string
func (c IntChannel) OpenChannel() {
}
func (c IntChannel) CloseChannel() {
}
func (c StringChannel) OpenChannel() {
}
func (c StringChannel) CloseChannel() {
}
func main() {
var dataChannel Consumer = make(IntChannel)
for data = range dataChannel {
}
}
Goal is to range on dataChannel.
var dataChannel Consumer = make(IntChannel) gives error: cannot use make(IntChannel) (value of type IntChannel) as Consumer value in variable declaration
We pick int channel or string channel based on a given config value at runtime.
I read this answer, but it did not help much.
How to range on a channel type that picks either int data or string data?
First, you declared Consumer as a chan of interface{ /* methods */ }, which is almost certainly not what you want. In fact, the error is telling you that an IntChannel cannot be assigned to it.
Second, without generics (that is, before Go 1.18), there is no way to preserve type safety here.
The closest solution to what you want is probably to add a method to the interface that returns something you can range over:
type Consumer interface {
OpenChannel()
CloseChannel()
Range() <-chan interface{}
}
type IntChannel chan int
func (c IntChannel) OpenChannel() {
}
func (c IntChannel) CloseChannel() {
}
func (c IntChannel) Range() <-chan interface{} {
ret := make(chan interface{})
go func() {
defer close(ret)
for v := range c {
ret <- v
}
}()
return ret
}
func main() {
c := make(IntChannel)
var dataChannel Consumer = c
go func() {
c <- 12
close(c)
}()
for data := range dataChannel.Range() {
fmt.Println(data)
}
}
Go1 Playground: https://play.golang.org/p/55BpISRVadE
With generics (Go 1.18, released in early 2022), you can instead define a parameterized type whose underlying type is a channel:
package main
import "fmt"
type GenericChan[T any] chan T
func main() {
c := make(GenericChan[int])
go func() {
c <- 12
close(c)
}()
for data := range c {
fmt.Println(data)
}
}
Go2 Playground: https://go2goplay.golang.org/p/HQJ36ego97i
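To connect this back to the question (picking an int or a string channel from a config value at runtime), here is a small sketch assuming Go 1.18 or later. The helpers consume, produce, and run are hypothetical names, and the boolean flag stands in for the config value.

```go
package main

import "fmt"

// consume ranges over a channel of any element type and returns the
// received values formatted as strings.
func consume[T any](ch <-chan T) []string {
	var out []string
	for v := range ch {
		out = append(out, fmt.Sprint(v))
	}
	return out
}

// produce sends vals on a new channel and closes it when done.
func produce[T any](vals ...T) <-chan T {
	ch := make(chan T)
	go func() {
		defer close(ch)
		for _, v := range vals {
			ch <- v
		}
	}()
	return ch
}

// run picks the channel element type based on a runtime flag.
func run(useInts bool) []string {
	if useInts {
		return consume(produce(1, 2, 3))
	}
	return consume(produce("a", "b"))
}

func main() {
	fmt.Println(run(true))  // [1 2 3]
	fmt.Println(run(false)) // [a b]
}
```

The type parameter is still fixed at compile time in each branch; the runtime choice is only about which branch runs.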

Add a cache to a go function as if it were a static member

Say I have an expensive function
func veryExpensiveFunction(int) int
and this function gets called a lot for the same number.
Is there a good way to let this function store previous results and reuse them when it is called again, ideally in a way that can also be reused for veryExpensiveFunction2?
Obviously, it would be possible to add an argument
func veryExpensiveFunctionCached(p int, cache map[int]int) int {
if val, ok := cache[p]; ok {
return val
}
result := veryExpensiveFunction(p)
cache[p] = result
return result
}
But now I have to create the cache somewhere, where I don't care about it. I would rather have it as a "static function member" if this were possible.
What is a good way to simulate a static member cache in go?
You can use a closure and let it manage the cache.
func InitExpensiveFuncWithCache() func(p int) int {
var cache = make(map[int]int)
return func(p int) int {
if ret, ok := cache[p]; ok {
fmt.Println("from cache")
return ret
}
// expensive computation
time.Sleep(1 * time.Second)
r := p * 2
cache[p] = r
return r
}
}
func main() {
ExpensiveFuncWithCache := InitExpensiveFuncWithCache()
fmt.Println(ExpensiveFuncWithCache(2))
fmt.Println(ExpensiveFuncWithCache(2))
}
output:
4
from cache
4
To use it in your code, create the wrapped function once:
veryExpensiveFunctionCached := InitExpensiveFuncWithCache()
and call it wherever you need the result.
If you want it to be reusable, change the signature to InitExpensiveFuncWithCache(func(int) int) so that it accepts a function as a parameter. Wrap that function in the closure and call it in place of the expensive computation.
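A minimal sketch of that reusable variant might look like the following. The name Memoize and the double function are illustrative assumptions, not part of the answer above.

```go
package main

import "fmt"

// Memoize wraps f so that each distinct argument is computed only once.
// It is not safe for concurrent use.
func Memoize(f func(int) int) func(int) int {
	cache := make(map[int]int)
	return func(p int) int {
		if v, ok := cache[p]; ok {
			return v
		}
		v := f(p)
		cache[p] = v
		return v
	}
}

func main() {
	calls := 0
	// double stands in for veryExpensiveFunction and counts its calls.
	double := func(p int) int {
		calls++
		return p * 2
	}

	cached := Memoize(double)
	fmt.Println(cached(2)) // 4, computed
	fmt.Println(cached(2)) // 4, served from the cache
	fmt.Println(calls)     // 1
}
```

The same Memoize can wrap veryExpensiveFunction2, and each wrapped function gets its own private cache.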
You need to be careful about synchronization if this cache will be used in HTTP handlers. In the Go standard library, each HTTP request is handled in its own goroutine, so concurrent requests can race on the cache. I would suggest a sync.RWMutex to keep the data consistent.
As for injecting the cache, you can pass it in at the function where you create the HTTP handler.
Here is a prototype:
type Cache struct {
store map[int]int
mux sync.RWMutex
}
func NewCache() *Cache {
return &Cache{make(map[int]int), sync.RWMutex{}}
}
func (c *Cache) Set(id, value int) {
c.mux.Lock()
c.store[id] = value
c.mux.Unlock()
}
func (c *Cache) Get(id int) (int, error) {
c.mux.RLock()
v, ok := c.store[id]
c.mux.RUnlock()
if !ok {
return -1, errors.New("a value with given key not found")
}
return v, nil
}
func handleComplexOperation(c *Cache) http.HandlerFunc {
return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request){
})
}
The Go standard library uses the following style to provide "static" functions that rely on underlying package-level state (e.g. flag.CommandLine):
// "static" function is just a wrapper
func Lookup(p int) int { return expCache.Lookup(p) }
var expCache = NewCache()
func NewCache() *CacheExpensive { return &CacheExpensive{cache: make(map[int]int)} }
type CacheExpensive struct {
l sync.RWMutex // lock for concurrent access
cache map[int]int
}
func (c *CacheExpensive) Lookup(p int) int { /*...*/ }
This design pattern not only allows for simple one-time use through the package-level function, but also allows for segregated usage with separate instances:
var (
userX = NewCache()
userY = NewCache()
)
userX.Lookup(12)
userY.Lookup(42)
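Since Lookup is elided above, here is one possible way to fill in the pattern. This is a sketch under assumptions: the fn field, the NewCache parameter, and the square function are hypothetical additions used to make the example self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

type CacheExpensive struct {
	l     sync.RWMutex // lock for concurrent access
	cache map[int]int
	fn    func(int) int // the computation being cached (hypothetical field)
}

func NewCache(fn func(int) int) *CacheExpensive {
	return &CacheExpensive{cache: make(map[int]int), fn: fn}
}

func (c *CacheExpensive) Lookup(p int) int {
	c.l.RLock()
	v, ok := c.cache[p]
	c.l.RUnlock()
	if ok {
		return v
	}
	// Computed outside the lock so slow calls do not block readers;
	// two concurrent callers may compute the same key twice.
	v = c.fn(p)
	c.l.Lock()
	c.cache[p] = v
	c.l.Unlock()
	return v
}

// square stands in for the expensive function.
func square(p int) int { return p * p }

// Package-level default instance plus a "static" wrapper around it.
var expCache = NewCache(square)

func Lookup(p int) int { return expCache.Lookup(p) }

func main() {
	fmt.Println(Lookup(12)) // 144, via the shared default cache

	userX := NewCache(square) // independent cache for segregated use
	fmt.Println(userX.Lookup(3)) // 9
}
```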

Passing in a function that calls another func

Hello I have 2 funcs that look similar and I would like to create one generic func. My problem is: I am unsure how to pass in another func:
func (b *Business) StreamHandler1(sm streams.Stream, p []*types.People) {
guard := make(chan struct{}, b.maxManifestGoRoutines)
for _, person := range p {
guard <- struct{}{} // would block if guard channel is already filled
go func(n *types.People) {
b.PeopleHandler(sm, n)
<-guard
}(person)
}
}
func (b *Business) StreamHandler2(sm streams.Stream, pi []*types.PeopleInfo) {
guard := make(chan struct{}, b.maxManifestGoRoutines)
for _, personInfo := range pi {
guard <- struct{}{} // would block if guard channel is already filled
go func(n *types.PeopleInfo) {
b.PeopleInfoHandler(sm, n)
<-guard
}(personInfo)
}
}
You can see they both look very similar, so I would like to write one generic func to which I can pass either PeopleInfoHandler or PeopleHandler. Any idea how I can do this correctly? From Go's syntax, it looks like I should be able to do something like this:
func (b *Business) StreamHandler1(f func(streams.Stream, interface{}), sm streams.Stream, p []*interface{}) {
But that doesn't seem to be working. Any ideas on how I can make this generic?
You can abstract over the types you pass in by defining an interface for them.
Here I use a Peopler interface to get either the People or the PeopleInfo, depending on the handler that I define and pass to the new StreamHandler. You can also pass the *Business to the handler if you need any of its fields or methods.
But as Sergio would say, if the method is only 5 lines long, the abstraction might not be worth it, even if the code is mostly the same.
For your pattern with the guard channel, you could also use a sync.WaitGroup, which is a better fit if you need to wait for the goroutines to finish.
package main
import (
"fmt"
"time"
)
func (b *Business) StreamHandler(sm streamsStream, p []Peopler, handler func(streamsStream, Peopler)) {
guard := make(chan struct{}, b.maxManifestGoRoutines)
for _, person := range p {
guard <- struct{}{} // would block if guard channel is already filled
go func(p Peopler) {
handler(sm, p)
<-guard
}(person)
}
}
func peopleInfoHandler(s streamsStream, p Peopler) {
fmt.Println("info:", p.PeopleInfo())
}
func peopleHandler(s streamsStream, p Peopler) {
fmt.Println("people:", p.People())
}
func main() {
b := &Business{maxManifestGoRoutines: 2}
s := streamsStream{}
p := []Peopler{
&People{
Info: PeopleInfo{Name: "you"},
},
}
b.StreamHandler(s, p, peopleInfoHandler)
b.StreamHandler(s, p, peopleHandler)
time.Sleep(time.Second)
}
type streamsStream struct {
}
type People struct {
Info PeopleInfo
}
func (tp *People) People() People {
return *tp
}
type PeopleInfo struct {
Name string
}
func (tp *People) PeopleInfo() PeopleInfo {
return tp.Info
}
type Peopler interface {
People() People
PeopleInfo() PeopleInfo
}
type Business struct {
maxManifestGoRoutines int
}
play link
If possible, you can use dependency inversion to utilize interfaces instead. An association between a type and a function is pretty much the definition of a method. So use an interface to specify the method definition, and pass Business into that method.
The interface will be necessary to prevent an import cycle along with making the code more permissive.
Normally with dependency inversion, the handlers would be explicitly implementing an interface in the same package as Business, but that is all implicit in Go.
For example:
type Handler interface {
Handle(*Business, streams.Stream)
}
func (b *Business) StreamHandler1(sm streams.Stream, hs []Handler) {
guard := make(chan struct{}, b.maxManifestGoRoutines)
for _, h := range hs {
guard <- struct{}{} // would block if guard channel is already filled
go func(n Handler) {
n.Handle(b, sm)
<-guard
}(h)
}
}
Alternatively, if you need a specific functionality from the type, you can have the Business method accept the interface for that behavior. This would be a bit more elegant, but it takes more planning ahead of time across multiple types or a large refactor. For example:
type TypeDoThinger interface {
Type() Schema
DoThing() ImportantValue
}
func (b *Business) HandleTypeDoThinger(sm streams.Stream, t TypeDoThinger) {
sch := t.Type()
// use schema for something
v := t.DoThing()
// save the important data
}
func (b *Business) StreamHandler(sm streams.Stream, ts []TypeDoThinger) {
guard := make(chan struct{}, b.maxManifestGoRoutines)
for _, t := range ts {
guard <- struct{}{} // would block if guard channel is already filled
go func(t TypeDoThinger) {
b.HandleTypeDoThinger(sm, t)
<-guard
}(t)
}
}

Golang web spider with pagination processing

I'm working on a Go web crawler that should parse the search results of a specific search engine. The main difficulty is concurrency, or rather, processing pagination such as
← Previous 1 2 3 4 5 ... 34 Next →. Everything works fine except the recursive crawling of paginated results. Here is my code:
package main
import (
"bufio"
"errors"
"fmt"
"net"
"strings"
"github.com/antchfx/htmlquery"
"golang.org/x/net/html"
)
type Spider struct {
HandledUrls []string
}
func NewSpider(url string) *Spider {
// ...
}
func requestProvider(request string) string {
// Everything is good here
}
func connectProvider(url string) net.Conn {
// Also works
}
// getContents makes request to search engine and gets response body
func getContents(request string) *html.Node {
// ...
}
// CheckResult controls empty search results
func checkResult(node *html.Node) bool {
// ...
}
func (s *Spider) checkVisited(url string) bool {
// ...
}
// Here is the problems
func (s *Spider) Crawl(url string, channelDone chan bool, channelBody chan *html.Node) {
body := getContents(url)
defer func() {
channelDone <- true
}()
if checkResult(body) == false {
err := errors.New("Nothing found there")
ErrFatal(err)
}
channelBody <- body
s.HandledUrls = append(s.HandledUrls, url)
fmt.Println("Handled ", url)
newUrls := s.getPagination(body)
for _, u := range newUrls {
fmt.Println(u)
}
for i, newurl := range newUrls {
if s.checkVisited(newurl) == false {
fmt.Println(i)
go s.Crawl(newurl, channelDone, channelBody)
}
}
}
func (s *Spider) getPagination(node *html.Node) []string {
// ...
}
func main() {
request := requestProvider(*requestFlag)
channelBody := make(chan *html.Node, 120)
channelDone := make(chan bool)
var parsedHosts []*Host
s := NewSpider(request)
go s.Crawl(request, channelDone, channelBody)
for {
select {
case recievedNode := <-channelBody:
// ...
for _, h := range newHosts {
parsedHosts = append(parsedHosts, h)
fmt.Println("added", h.HostUrl)
}
case <-channelDone:
fmt.Println("Jobs finished")
}
break
}
}
It always returns the first page only, without following the pagination. getPagination(...) itself works fine. Please tell me where my error is.
Hope Google Translate was correct.
The problem is probably that main exits before all the goroutines have finished.
First, there is a break after the select statement, and it runs unconditionally the first time a channel is read. As a result, main returns right after the first value is received on channelBody.
Second, channelDone is not the right tool here. The most idiomatic approach is a sync.WaitGroup: before starting each goroutine, call WG.Add(1), replace the deferred send with defer WG.Done(), and call WG.Wait() in main. Be aware that you should pass the WaitGroup by pointer, not by value.
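The WaitGroup approach can be sketched as follows. This is a simplified stand-in, not the asker's crawler: the pages map replaces the real HTTP requests and getPagination, and markVisited and crawlAll are hypothetical names. The visited set is guarded by a mutex because several Crawl goroutines update it concurrently.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// pages stands in for the search engine: each URL maps to the
// pagination links found on that page (hypothetical data).
var pages = map[string][]string{
	"/p1": {"/p2", "/p3"},
	"/p2": {"/p1", "/p3"},
	"/p3": {"/p1", "/p2"},
}

type Spider struct {
	mu      sync.Mutex
	visited map[string]bool
	wg      sync.WaitGroup
}

// markVisited marks url as visited and reports whether it was new.
// Checking and marking happen under one lock, so no URL is crawled twice.
func (s *Spider) markVisited(url string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.visited[url] {
		return false
	}
	s.visited[url] = true
	return true
}

func (s *Spider) Crawl(url string) {
	defer s.wg.Done()
	for _, next := range pages[url] {
		if s.markVisited(next) {
			s.wg.Add(1) // Add before starting the goroutine
			go s.Crawl(next)
		}
	}
}

// crawlAll crawls from start and returns every visited URL, sorted.
func crawlAll(start string) []string {
	s := &Spider{visited: map[string]bool{start: true}}
	s.wg.Add(1)
	go s.Crawl(start)
	s.wg.Wait() // blocks until every Crawl goroutine has called Done

	urls := make([]string, 0, len(s.visited))
	for u := range s.visited {
		urls = append(urls, u)
	}
	sort.Strings(urls)
	return urls
}

func main() {
	fmt.Println(crawlAll("/p1")) // [/p1 /p2 /p3]
}
```

Here the WaitGroup lives inside the Spider, which is always used through a pointer, so it is never copied.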
