Is there a way to model shared state using messages? - go

I've run into a very real problem in writing "correct" Go. I have an object (for the sake of simplicity, let's think of it as a map[string]string) and I want it to hold state shared between multiple goroutines.
Currently the implementation goes something like this:
// Inside shared_state.go
var (
	sharedMap = make(map[string]string)
	mutex     sync.RWMutex
)

// Add stores v under k and reports whether the key was newly added.
func Add(k string, v string) bool {
	mutex.Lock()
	defer mutex.Unlock()
	if _, exists := sharedMap[k]; exists {
		return false
	}
	sharedMap[k] = v
	return true
}
// Other methods to access, modify... etc
While this does the job, it is quite an ugly implementation by Go standards, which encourage modeling concurrency using messages.
Are there easy ways of modeling shared state using messages that I am blatantly unaware of? Or am I forced to use mutexes in this kind of case?

You don't "model shared state using messages", you use messages instead of shared state, which requires designing the application based on different fundamentals. It is generally not a matter of rewriting a mutex as a channel, but a completely different implementation approach, and that approach won't be applicable to all scenarios where you need to synchronize operations. If a shared map is the best approach for your situation, then a mutex is the correct way to synchronize access to it.
As an example from my own experience, I've developed applications that allow for changing their configuration at runtime. Rather than having a shared Config object and synchronizing access to it, I give each main goroutine a channel on which it can receive configuration updates. When the config changes, the update is sent to all the listeners. When a listener gets a config change, it can complete its current operation, then deal with the config change in whatever way is appropriate to that routine - it may just update its local copy of the config, it may close connections to external resources and open new ones, etc. Instead of sharing data, I'm sending and receiving events, which is a fundamentally different design.
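A minimal sketch of that event-based design, under stated assumptions: the Config type, its LogLevel field, and the worker, broadcast, and runDemo functions are all hypothetical names chosen for illustration, not the original application's code. Each worker keeps a private copy of the config and replaces it when an update arrives on its own channel.

```go
package main

import (
	"fmt"
	"sync"
)

// Config is a hypothetical configuration value; the field is illustrative.
type Config struct {
	LogLevel string
}

// worker keeps a private copy of the config and replaces it whenever an
// update arrives. It returns the last config it saw once updates is closed.
func worker(initial Config, updates <-chan Config) Config {
	local := initial
	for cfg := range updates {
		// A real worker would finish its current operation first and then
		// react (reconnect, reload, and so on). Here it only swaps its copy.
		local = cfg
	}
	return local
}

// broadcast sends one config change to every listener channel.
func broadcast(cfg Config, listeners []chan Config) {
	for _, l := range listeners {
		l <- cfg
	}
}

// runDemo starts n workers, broadcasts a single update, and returns the
// final config held by each worker.
func runDemo(n int) []Config {
	listeners := make([]chan Config, n)
	results := make([]Config, n)
	var wg sync.WaitGroup
	for i := range listeners {
		listeners[i] = make(chan Config)
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = worker(Config{LogLevel: "info"}, listeners[i])
		}(i)
	}
	broadcast(Config{LogLevel: "debug"}, listeners)
	for _, l := range listeners {
		close(l)
	}
	wg.Wait()
	return results
}

func main() {
	for i, cfg := range runDemo(3) {
		fmt.Printf("worker %d ended with log level %q\n", i, cfg.LogLevel)
	}
}
```

No goroutine ever reads another goroutine's copy of the config, so no mutex is needed; the cost is that every listener needs its own channel and its own handling of the update.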

Related

Difference between plain go func and for loop in go func

I have a question regarding the difference between a plain go func and a for loop in a go func:
Plain go Func:
func asyncTask(){
//...something
}
in order to trigger asyncTask, we can simply:
func main(){
go asyncTask()
}
make a for loop to monitor channel:
func (c *Container) asyncTask() {
	go func() {
		for {
			select {
			case <-c.someChan:
				//...do something
			case <-c.ctx.Done():
				// Return so the goroutine exits and does not leak.
				return
			}
		}
	}()
}
to trigger:
func (c *Container) trigger(){
c.someChan <- val
}
My questions are:
I understand that the second scenario best fits the case where we want to manage async tasks in a queue.
But in terms of performance for frequently triggered async tasks (which must not block), which method is better?
Is there any general best practice for handling async tasks in Go?
In nearly any case, performance is not the thing to think about in choosing which pattern to use (both will be fine), but which usage makes sense in your specific use case. If you use pattern (1), then you are not guaranteed the sequential processing of (2). That is the essential difference between your two examples. So for an http server for example, you would use the former pattern (go handleRequest(r HttpRequest) say) to process requests in parallel, but use the latter to ensure that certain operations are processed sequentially. I hope this is answering your question!
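To illustrate the sequential guarantee of the second pattern, here is a small sketch. The function name processSequentially and the doubling "work" are assumptions made for the example. A single worker goroutine reads from one channel, so jobs are handled one at a time in the order they were sent.

```go
package main

import "fmt"

// processSequentially feeds jobs through one channel to a single worker
// goroutine, so jobs are handled one at a time in the order they were sent.
func processSequentially(jobs []int) []int {
	in := make(chan int)
	done := make(chan []int)

	go func() {
		var handled []int
		for j := range in { // the loop ends when in is closed
			handled = append(handled, j*2)
		}
		done <- handled
	}()

	for _, j := range jobs {
		in <- j
	}
	close(in)
	return <-done
}

func main() {
	fmt.Println(processSequentially([]int{1, 2, 3})) // prints [2 4 6]
}
```

Starting one goroutine per job instead (go handle(j)) would run the jobs in parallel but with no ordering guarantee between them.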
You can use model #1 with WaitGroups when you have goroutines that you need to account for, but you only care about their exit and otherwise don't need to manage them.
You can use model #2 when you need explicit management, control, or communication. Channel communication is NOT free: sending and receiving goroutines need synchronization, and channels need locking when values are sent, so a lot has to happen under the hood.
Unless there is a need for more, option #1 is definitely the way to go. Look for the simplest possible solution to your problem. I know it's easy to preach, but simplicity may take some time to come by.
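A short sketch of model #1 with a WaitGroup, assuming a hypothetical squareAll function: each goroutine does its work independently, and the caller only waits for all of them to exit.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll runs one goroutine per input and waits for all of them to exit.
// Each goroutine writes to its own slice element, so no lock is needed.
func squareAll(inputs []int) []int {
	out := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, v := range inputs {
		wg.Add(1)
		go func(i, v int) {
			defer wg.Done()
			out[i] = v * v
		}(i, v)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3})) // prints [1 4 9]
}
```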
In short, as far as I know, the two patterns you mention are not really something to compare in terms of which one is better. They simply have different use cases with different needs.
As far as I can tell, it is not about
plain go func and for loop in go func
It is more about different usage.
Before answering your question, I'd like to give a short explanation of the two patterns you mentioned.
The first pattern is the most basic use of the go statement: it simply runs the function in a new goroutine, separate from the caller. By itself, this pattern provides no way to get data back from the function started with the go statement, whether from main() or from any other function. To communicate with another goroutine, it needs a channel. You already mention one of the several goroutine-plus-channel patterns available.
As mentioned earlier, the second pattern is just one of several ways of combining the go statement with channels. It is a fairly complex pattern whose main use is selecting from multiple channels and acting on whichever one is ready. A brief explanation of this pattern follows:
The for loop has no condition, so it works like a while (true) loop in languages such as C or Java: it is an endless loop.
Since the loop is endless, it needs an exit condition, which usually comes from one of the channels it checks. For example, the loop can end when a channel is closed.
Regarding select and case: if two or more communication cases are ready at the same time, one is selected at random.
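The points above can be sketched as follows. The function name drain and the quit channel are assumptions for this example. The endless for loop exits either when the values channel is closed (detected with the two-value receive form) or when quit fires.

```go
package main

import "fmt"

// drain sums everything received on values. The endless loop exits when
// values is closed or when quit becomes ready.
func drain(values <-chan int, quit <-chan struct{}) int {
	sum := 0
	for {
		select {
		case v, ok := <-values:
			if !ok {
				return sum // values was closed: this is the exit condition
			}
			sum += v
		case <-quit:
			return sum
		}
	}
}

func main() {
	values := make(chan int)
	quit := make(chan struct{})
	go func() {
		for i := 1; i <= 3; i++ {
			values <- i
		}
		close(values)
	}()
	fmt.Println(drain(values, quit)) // prints 6
}
```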
Even if you need to communicate between concurrently running functions, I suspect you do not need this pattern in general. There is usually a simpler way for goroutines to communicate over a channel.
In summary to answer your questions:
Which method is better for an asynchronous task really depends on your needs. There are several patterns, not limited to the two you mentioned. If you just need to execute a function asynchronously, the first pattern is fine; otherwise you need one of the channel-based patterns. Again, these are not limited to the second pattern you mentioned.
Both patterns you mentioned look like common practice to me. But usually we need at least one channel for an asynchronous task to communicate with main() or any other goroutine. The pattern itself depends on how you send and receive the data, where the values come from (database, slices, variables, etc.), and other aspects. I suggest learning more about channels; there are many patterns built on them. Start with https://gobyexample.com/goroutines. At the bottom of that page, the "Next Example" link leads deeper into Go concurrency.
In addition:
The go statement is simple; the complexity lies in using it with channels. Here is a list of topics worth learning to better understand communication between goroutines:
goroutine
Channel direction ( Send / Receive / unidirectional )
Channel concept and behavior, which is based on communicating sequential processes (CSP). This is about the "block" and "proceed" behavior of send and receive.
Buffered channel
Unbuffered channel
And more about channel :)
Hope this helps you or someone else get started with goroutines and channels for concurrency in Go. Please feel free to correct my answer or ask for further explanation. Thank you.

Add multiple key values in context.Context from web services API

I have a web application written in Go with multiple modules: one deals with everything database related, one deals with reports, one contains all the web services, one handles business logic and data integrity validation, and there are several others. So these modules contain numerous methods and functions.
Now the requirement is to use sessions in the web services, and we also need to use transactions in some APIs. The first approach that came to mind is to change the signatures of the existing methods to accept a session and a transaction (*sql.Tx), which is a painful task but has to be done anyway. What worries me is that if something else comes up in the future that needs to be passed through all these methods, I would have to go through this cycle and change the method signatures again. This does not seem like a good approach.
Later, I found that context.Context might be a good approach (you can suggest other approaches too): every method takes a context as its first parameter, so I have to change the method signatures only once. If I go with this approach, can anyone tell me how I would set and pass multiple keys (session, sql.Tx) in that context object?
(AFAIK, the context package provides a WithValue function, but can I use it for multiple keys? How would I set a key in a nested function call? Is that even possible?)
This question actually has two parts:
Should I consider context.Context for my solution? If not, please shed some light on another approach.
How do I set multiple keys and values in context.Context?
For your second question, you can group all your keys and values in a struct as follows:
type vars struct {
	lock *sync.Mutex // a pointer, so the mutex is not copied along with the struct
	db   *sql.DB
}
Then you can add this struct to the context:
// mylock is assumed to be a *sync.Mutex and mydb a *sql.DB
ctx := context.WithValue(context.Background(), "values", vars{lock: mylock, db: mydb})
And you can retrieve it:
ctxVars, ok := r.Context().Value("values").(vars)
if !ok {
	err := errors.New("values not found in context")
	log.Println(err)
	return err
}
db := ctxVars.db
lock := ctxVars.lock
I hope it helps you.
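As an alternative sketch for setting several keys, including from a nested call: context.WithValue can be applied repeatedly, each call wrapping the previous context. The context package documentation advises against using built-in types such as string as keys, so this example uses an unexported key type. The Session and Tx types here are hypothetical stand-ins (Tx stands in for *sql.Tx so the example runs without a database), and all helper names are assumptions.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is an unexported key type, so keys from other packages cannot collide.
type ctxKey int

const (
	sessionKey ctxKey = iota
	txKey
)

// Session and Tx are hypothetical stand-ins for a real session type and *sql.Tx.
type Session struct{ UserID string }
type Tx struct{ ID int }

// WithSession returns a child context carrying s.
func WithSession(ctx context.Context, s *Session) context.Context {
	return context.WithValue(ctx, sessionKey, s)
}

// WithTx returns a child context carrying tx.
func WithTx(ctx context.Context, tx *Tx) context.Context {
	return context.WithValue(ctx, txKey, tx)
}

// SessionFrom extracts the session, reporting whether one was set.
func SessionFrom(ctx context.Context) (*Session, bool) {
	s, ok := ctx.Value(sessionKey).(*Session)
	return s, ok
}

// TxFrom extracts the transaction, reporting whether one was set.
func TxFrom(ctx context.Context) (*Tx, bool) {
	tx, ok := ctx.Value(txKey).(*Tx)
	return tx, ok
}

// handle is a nested call that adds a second key to the context it received.
func handle(ctx context.Context) string {
	ctx = WithTx(ctx, &Tx{ID: 7})
	return describe(ctx)
}

// describe reads both values back out of the context.
func describe(ctx context.Context) string {
	s, okS := SessionFrom(ctx)
	tx, okT := TxFrom(ctx)
	if !okS || !okT {
		return "missing value"
	}
	return fmt.Sprintf("user=%s tx=%d", s.UserID, tx.ID)
}

// demo sets the session at the top level and the transaction in a nested call.
func demo() string {
	ctx := WithSession(context.Background(), &Session{UserID: "alice"})
	return handle(ctx)
}

func main() {
	fmt.Println(demo()) // prints user=alice tx=7
}
```

A value added in a nested call is visible only to the functions that receive the derived context, not to the caller, because WithValue returns a new context rather than modifying the existing one.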
Finally, I decided to go with the context package solution, after studying the articles from the Go context experience reports. I found Dave Cheney's article especially helpful.
I could build my own custom solution for context, somewhat like gorilla does. But as Go already has a solution for this, I will go with the context package.
Right now, I only need the session and the database transaction in each method, to support a transaction if one has begun and to handle user authentication and authorization.
Having context.Context in each method of the application might be overhead, because I don't need cancellation, deadline, or timeout functionality at the moment, but it could be helpful in the future.

Sharing a slice across a few different goroutines

Given that I have a slice of structs of type User
Users := make([]User, 0)
I am listening for TCP connections, and when a user connects, I add a new user to this slice.
The way I've done this, is by setting up a NewUsers channel
NewUsers := make(chan User)
Upon new TCP connects, a User gets sent to this channel, and a central function waits for a User to arrive to add it to the Users slice.
But now I would like multiple subsystems (packages/functions) to use this list of Users. One function might simply want to receive a list of users, while a different function might want to broadcast messages to every user, or just users matching a certain condition.
How do multiple functions (which are possibly executed from different goroutines) safely access the list of users? I see two possible ways:
Every subsystem that needs access to this list needs their own AddUser channel and maintain their own slice of users and something needs to broadcast new users to every one of these channels.
Block access with a Mutex
Option 1 seems very convoluted and would generate a fair bit of duplication, but my understanding is that mutexes are best avoided if you try to stick to the "Share Memory By Communicating" mantra.
The idiomatic Go way to share data between concurrent activities is summed up in this:
Do not communicate by sharing memory; instead, share memory by communicating.
Andrew Gerrand blogged about this, for example.
It need not be overly-complex; you can think of designing internal microservices, expressed using goroutines with channels.
In your case, this probably means designing a service element to contain the master copy of the list of users.
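One way such a service element might look, as a rough sketch: a single goroutine owns the slice, and other goroutines talk to it over channels. The Registry type, its channel fields, and the method names are assumptions for illustration, and User is a stand-in for the question's type. The sketch has no shutdown path; a real service would also listen on a stop channel or a context.

```go
package main

import "fmt"

// User is a stand-in for the question's User type.
type User struct{ Name string }

// Registry owns the master copy of the user list. Only its loop goroutine
// ever touches the slice, so no mutex is needed.
type Registry struct {
	add  chan User
	list chan chan []User
}

// NewRegistry starts the owning goroutine and returns a handle to it.
func NewRegistry() *Registry {
	r := &Registry{add: make(chan User), list: make(chan chan []User)}
	go r.loop()
	return r
}

func (r *Registry) loop() {
	var users []User
	for {
		select {
		case u := <-r.add:
			users = append(users, u)
		case reply := <-r.list:
			// Reply with a copy so callers never share the backing array.
			snapshot := make([]User, len(users))
			copy(snapshot, users)
			reply <- snapshot
		}
	}
}

// Add hands a new user to the owning goroutine.
func (r *Registry) Add(u User) { r.add <- u }

// Users asks the owning goroutine for a snapshot of the current list.
func (r *Registry) Users() []User {
	reply := make(chan []User)
	r.list <- reply
	return <-reply
}

func main() {
	r := NewRegistry()
	r.Add(User{Name: "ann"})
	r.Add(User{Name: "bob"})
	fmt.Println(len(r.Users())) // prints 2
}
```

Subsystems that want to broadcast or filter can call Users() and work on their own snapshot, which avoids each subsystem maintaining its own AddUser channel.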
The main advantages of the Go/CSP strategy are that
concurrency is a design choice, along with your other aspects of design
only local knowledge is needed to understand the concurrent behaviour: this arises because a goroutine can itself consist of internal goroutines, and this applies all the way down if needed. Understanding the external behaviour of the higher-level goroutines depends only on its interfaces, not on the hidden internals.
But...
There are times when a safely shared data structure (protected by mutexes) will be sufficient now and always. It might then be argued that the extra complexity of goroutines and channels is a non-requirement.
A safely shared list data structure is something you will find several people have provided as open-source APIs. (I have one myself - see the built-ins in runtemplate).
For this problem, the mutex approach is the simplest, safest, and most manageable, and it is also the fastest.
Channels are complex beasts on the inside and are generally slower than an RWMutex-guarded map or slice.
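A minimal sketch of the mutex approach for the user list. The UserList type and its methods are assumptions for this example, and User is a stand-in for the question's type. Readers take the read lock, so several subsystems can iterate at the same time, while Add takes the write lock.

```go
package main

import (
	"fmt"
	"sync"
)

// User is a stand-in for the question's User type.
type User struct{ Name string }

// UserList is a slice of users guarded by a read/write mutex.
type UserList struct {
	mu    sync.RWMutex
	users []User
}

// Add appends a user under the write lock.
func (l *UserList) Add(u User) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.users = append(l.users, u)
}

// Len reports the number of users under the read lock.
func (l *UserList) Len() int {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return len(l.users)
}

// Each calls fn for every user for which keep returns true, holding the
// read lock throughout. fn must not call Add, or it will deadlock.
func (l *UserList) Each(keep func(User) bool, fn func(User)) {
	l.mu.RLock()
	defer l.mu.RUnlock()
	for _, u := range l.users {
		if keep(u) {
			fn(u)
		}
	}
}

func main() {
	var list UserList
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			list.Add(User{Name: fmt.Sprintf("user%d", i)})
		}(i)
	}
	wg.Wait()
	fmt.Println(list.Len()) // prints 10
}
```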

Idiomatic Golang goroutines

In Go, if we have a type with a method that starts some looped mechanism (polling A and doing B forever) is it best to express this as:
// Run does stuff, you probably want to run this as a goroutine
func (t Type) Run() {
// Do long-running stuff
}
and document that this probably wants to be launched as a goroutine (and let the caller deal with that)
Or to hide this from the caller:
// Run does stuff concurrently
func (t Type) Run() {
go DoRunStuff()
}
I'm new to Go and unsure if convention says let the caller prefix with 'go' or do it for them when the code is designed to run async.
My current view is that we should document and give the caller a choice. My thinking is that in Go the concurrency isn't actually part of the exposed interface, but a property of using it. Is this right?
I shared your opinion on this until I started writing an adapter for a web service that I want to make concurrent. I have a goroutine that must be started to parse results that are returned to the channel from the web calls. There is absolutely no case in which this API would work without running it as a goroutine.
I then began to look at packages like net/http. There is mandatory concurrency within that package. It is documented at the interface level that it should be able to be used concurrently, however the default implementations automatically use goroutines.
Because Go's standard library commonly fires off goroutines within its own packages, I think that if your package or API warrants it, you can handle them on your own.
My current view is that we should document and give the caller a choice.
I tend to agree with you.
Since Go makes it so easy to run code concurrently, you should try to avoid concurrency in your API (which forces clients to use it concurrently). Instead, create a synchronous API, and then clients have the option to run it synchronously or concurrently.
This was discussed in a talk a couple years ago: Twelve Go Best Practices
Slide 26, in particular, shows code more like your first example.
I would view the net/http package as an exception because in this case, the concurrency is almost mandatory. If the package didn't use concurrency internally, the client code would almost certainly have to. For example, http.Client doesn't (to my knowledge) start any goroutines. It is only the server that does so.
In most cases, it's going to be one line of the code for the caller either way:
go Run() or StartGoroutine()
The synchronous API is no harder to use concurrently and gives the caller more options.
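A small sketch of that point, using a hypothetical Counter type: Run is synchronous, and the caller decides whether to block on it or wrap it in a goroutine. The runAsync helper is caller-side code, not part of the API.

```go
package main

import "fmt"

// Counter is a hypothetical type whose Run method does some bounded work.
type Counter struct{ limit int }

// Run is synchronous: it returns only when the work is finished.
func (c Counter) Run() int {
	total := 0
	for i := 1; i <= c.limit; i++ {
		total += i
	}
	return total
}

// runAsync is what a caller might write to run the same API concurrently.
// The channel is buffered so the goroutine can exit even if nobody receives.
func runAsync(c Counter) <-chan int {
	out := make(chan int, 1)
	go func() { out <- c.Run() }()
	return out
}

func main() {
	c := Counter{limit: 4}
	fmt.Println(c.Run())       // the caller blocks; prints 10
	fmt.Println(<-runAsync(c)) // the caller chose concurrency; prints 10
}
```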
There is no 'right' answer because circumstances differ.
Obviously there are cases where an API might contain utilities, simple algorithms, data collections etc that would look odd if packaged up as goroutines.
Conversely, there are cases where it is natural to expect 'under-the-hood' concurrency, such as a rich IO library (http server being the obvious example).
For a more extreme case, consider you were to produce a library of plug-n-play concurrent services. Such an API consists of modules each having a well-described interface via channels. Clearly, in this case it would inevitably involve goroutines starting as part of the API.
One clue might well be the presence or absence of channels in the function parameters. But I would expect clear documentation of what to expect either way.

Are there disadvantages of using channel.Get() over channel.Consume()?

I'm using streadway's amqp library to connect with a rabbitmq server.
The library provides a channel.Consume() function which returns a "<- chan Delivery".
It also provides a channel.Get() function which returns a "Delivery" among other things.
I've to implement a pop() functionality, and I'm using channel.Get(). However, the documentation says:
"In almost all cases, using Channel.Consume will be preferred."
Does "preferred" here mean "recommended"? Are there any disadvantages of using channel.Get() over channel.Consume()? If yes, how do I use channel.Consume() to implement a Pop() function?
As far as I can tell from the docs, yes, "preferred" does mean "recommended".
It seems that channel.Get() doesn't provide as many features as channel.Consume(), and that Consume() is more readily usable in concurrent code because it returns a chan of Delivery, as opposed to each individual Delivery separately.
The extra features mentioned are exclusive, noLocal and noWait, as well as an optional Table of args "that have specific semantics for the queue or server."
To implement a Pop() function using channel.Consume() you could, to link to some code fragments from the amqp example consumer, create a channel using the Consume() function, create a function to handle the chan of Delivery which will actually implement your Pop() functionality, then fire off the handle() func in a goroutine.
The key to this is that the channel (in the linked example) will block on sending if nothing is receiving. In the example, the handle() func uses range to process the entire channel until it is closed. Your Pop() functionality may be better served by a function that just receives the next value from the chan and returns it. Every time it's run, it will return the next available Delivery.
EDIT: Example function to receive the next value from the channel and do stuff with it (this may not work for your use case; it may be more useful if the function sent the Delivery on another chan to another function to be processed. Also, I haven't tested the code below, so it may be full of errors)
func handle(deliveries <-chan amqp.Delivery, done chan error) {
	select {
	case d := <-deliveries:
		// Do stuff with the delivery
		// Send any errors down the done chan. for example:
		// done <- err
		_ = d // placeholder so the sketch compiles until d is used
	default:
		// Nothing was waiting on the channel
		done <- nil
	}
}
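A variation on the function above, sketched so it runs without a broker: the Delivery type here is a stand-in for amqp.Delivery, and Pop, its wait parameter, and demo are hypothetical names. Instead of returning immediately when nothing is queued, Pop waits up to a timeout and reports whether it got a message.

```go
package main

import (
	"fmt"
	"time"
)

// Delivery is a stand-in for amqp.Delivery so the sketch runs without a broker.
type Delivery struct{ Body string }

// Pop returns the next delivery from the channel. ok is false if nothing
// arrives within wait or if the channel has been closed.
func Pop(deliveries <-chan Delivery, wait time.Duration) (d Delivery, ok bool) {
	select {
	case d, ok = <-deliveries:
		return d, ok
	case <-time.After(wait):
		return Delivery{}, false
	}
}

// demo pops once from a channel holding one message, then once more from
// the now empty channel.
func demo() (body string, first bool, second bool) {
	ch := make(chan Delivery, 1)
	ch <- Delivery{Body: "hello"}
	d, first := Pop(ch, 50*time.Millisecond)
	_, second = Pop(ch, 10*time.Millisecond)
	return d.Body, first, second
}

func main() {
	body, first, second := demo()
	fmt.Println(body, first, second) // prints hello true false
}
```

With the real library, the channel returned by Consume would take the place of ch; acknowledging the delivery is left out of this sketch.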
It really depends on what you are trying to do. If you want to get only one message from the queue (the first one), you should probably use basic.get; if you are planning to process all incoming messages from the queue, basic.consume is what you want.
This is probably not a platform- or library-specific question but rather a question of understanding the protocol.
UPD
I'm not very familiar with the Go language, so I will try to give you a brief overview of the AMQP details and describe the use cases.
You may run into trouble and incur overhead with basic.consume sometimes:
With basic.consume you have this workflow:
Send the basic.consume method to notify the broker that you want to receive messages
Since this is a synchronous method, wait for the basic.consume-ok message from the broker
Start listening for basic.deliver messages from the server
This is an asynchronous method, and you have to handle the situation where no messages are available on the server yourself, e.g. by limiting the reading time
With basic.get you have this workflow:
Send the synchronous method basic.get to the broker
Wait for the basic.get-ok method, which holds the message, or the basic.empty method, which indicates that no message is available on the server
Note about synchronous and asynchronous methods: a synchronous method expects a response, whereas an asynchronous one does not.
Note on the basic.qos method's prefetch-count property: it is ignored when the no-ack property is set on basic.consume or basic.get.
The spec has a note on basic.get: "this method provides a direct access to the messages in a queue using a synchronous dialogue that is designed for specific types of application where synchronous functionality is more important than performance", which applies to continuous message consumption.
My personal tests show that getting 1000 messages in a row with basic.get (0.38659715652466) is on average more than 15% faster than getting 1000 messages one by one with basic.consume (0.47398710250854) on RabbitMQ 3.0.1, Erlang R14B04.
If consuming only one message in the main thread is your case, you probably have to use basic.get.
You can still consume only one message asynchronously, for example in a separate thread or by using some event mechanism. That is sometimes the better solution for your machine's resources, but you have to handle the situation where no message is available in the queue.
If you have to process messages one by one, basic.consume is the obvious choice, I think.
