Sharing a slice across a few different goroutines - go

Given that I have an slice of structs of type User
Users := make([]User)
I am listening for TCP connections, and when a user connects, I'm adding an new user to this slice.
The way I've done this, is by setting up a NewUsers channel
NewUsers := make(chan User)
Upon new TCP connects, a User gets sent to this channel, and a central function waits for a User to arrive to add it to the Users slice.
But now I would like multiple subsystems (packages/functions) to use this list of Users. One function might simply want to receive a list of users, while a different function might want to broadcast messages to every user, or just users matching a certain condition.
How do multiple functions (which are possibly executed from different goroutines) safely access the list of users. I see two possible ways:
Every subsystem that needs access to this list needs their own AddUser channel and maintain their own slice of users and something needs to broadcast new users to every one of these channels.
Block access with a Mutex
Option 1 seems very convoluted and would generate a fair bit of duplication, but my understanding that Mutexes are best to be avoided if you try to stick to the "Share Memory By Communicating" mantra.

The idiomatic Go way to share data between concurrent activities is summed up in this:
Do not communicate by sharing memory; instead, share memory by
communicating.
Andrew Gerrand blogged about this, for example.
It need not be overly-complex; you can think of designing internal microservices, expressed using goroutines with channels.
In your case, this probably means designing a service element to contain the master copy of the list of users.
The main advantages of the Go/CSP strategy are that
concurrency is a design choice, along with your other aspects of design
only local knowledge is needed to understand the concurrent behaviour: this arises because a goroutine can itself consist of internal goroutines, and this applies all the way down if needed. Understanding the external behaviour of the higher-level goroutines depends only on its interfaces, not on the hidden internals.
But...
There are times when a safely shared data structure (protected by mutexes) will be sufficient now and always. It might then be argued that the extra complexity of goroutines and channels is a non-requirement.
A safely shared list data structure is something you will find several people have provided as open-source APIs. (I have one myself - see the built-ins in runtemplate).

The mutex approach is the best, safest and most manageable approach to that problem and is the fastest.
Channels are complex beasts on the inside and are much slower than a rwmutex-guarded map/slice.

Related

How to implement a channel and multiple readers that read the same data at the same time?

I need several functions to have the same channel as a parameter and take the same data, simultaneously.
Each of these functions has an independent task from each other, but they start from the same value.
For example, given a slice of integers, one function calculates the sum of its values ​​and another calculates the average, at the same time. They would be goroutines.
One solution would be to create multiple channels from one value, but I want to avoid that. I might have to add or remove functions and for this, I would have to add or remove channels.
I think I understand that the Fan Out pattern could be an option, but I can't quite understand its implementation.
The question is against the rules of SO—as it does not present any concrete problem to be helped with but rather requests a tutoring session.
Anyway, two pointers for further research: basically—given the property of channel that each receive consumes a value sent to it, so it's impossible to read a once sent value multiple times,—such problems have two approaches to their solutions.
The first approach, which is what called a "fan-out", is to have all the consumers have a "personal" dedicated channel, copy the value to be broadcast as many times as there are consumers and send each copy to each of those dedicated channels.
The ostensibly most natural way to implement this is to have a single channel to which the producer sends its units of work—not caring how much consumers are to read them—and then have a dedicated goroutine receive those units of work, copy each of them and send the copies out to the dedicated channels of the consumers.
The second approach is to go lower level and implement basically the same scheme using stuff from the sync package.
One can think of the following scheme:
Have a custom struct type which has a sync.Mutex protecting the type's state.
Have a field which keeps the value multiple consumers have to read.
Have a counter in that type.
Have a sync.Cond in that type as well.
Have a channel with capacity there 1 as well.
Communicating a new value to the consumers looks like this:
Lock the mutex.
Verify the counter is 0, panic otherwise.
Write the new value into the respective field.
Set the counter to the number of consumers.
Unlock the mutex.
Pulse the sync.Cond.
The consumers are supposed to sleep in a wait call on that sync.Cond.
Once the sender pulses it, the goroutines running the code of consumers get woken up and try to read the value.
Reading of the value rolls like this:
Lock the mutex.
Verify the counter is greater than zero, panic otherwise.
Read the value.
Decrement the counter by one.
If the counter becomes 0, send on that special channel.
Unlock the mutex.
The channel is needed to communicate to the sender that all the consumers are done with their reads: before attempting to send the new value the consumer has to read from that channel.
As you can probably see, the second approach is way more involved and hard to get right, so I'd recommend to go with the first one.
I would also note that you seem to lack certain background knowledge on how to go around implementing concurrently running and communicating tasks.
I hereby recommend reading The Book and at least these chapters of The Blog:
Go Concurrency Patterns: Pipelines and cancellation.
Go Concurrency Patterns: Timing out, moving on
Advanced Go Concurrency Patterns

How get a data without polling?

This is more of a theorical question.
Well, imagine that I have two programas that work simultaneously, the main one only do something when he receives a flag marked with true from a secondary program. So, this main program has a function that will keep asking to the secondary for the value of the flag, and when it gets true, it will do something.
What I learned at college is that the polling is the simplest way of doing that. But when I started working as an developer, coworkers told me that this method generate some overhead or it's waste of computation, by asking every certain amount of time for a value.
I tried to come up with some ideas for doing this in a different way, searched on the internet for something like this, but didn't found a useful way about how to do this.
I read about interruptions and passive ways that can cause the main program to get that data only if was informed by the secondary program. But how this happen? The main program will need a function to check for interruption right? So it will not end the same way as before?
What could I do differently?
There is no magic...
no program will guess when it has new information to be read, what you can do is decide between two approaches,
A -> asks -> B
A <- is informed <- B
whenever use each? it depends in many other factors like:
1- how fast you need the data be delivered from the moment it is generated? as far as possible? or keep a while and acumulate
2- how fast the data is generated?
3- how many simoultaneuos clients are requesting data at same server
4- what type of data you deal with? persistent? fast-changing?
If you are building something like a stocks analyzer where you need to ask the price of stocks everysecond (and it will change also everysecond) the approach you mentioned may be the best
if you are writing a chat based app like whatsapp where you need to check if there is some new message to the client and most of time wont... publish subscribe may be the best
but all of this is a very superficial look into a high impact architecture decision, it is not possible to get the best by just looking one factor
what i want to show is that
coworkers told me that this method generate some overhead or it's
waste of computation
it is not a right statement, it may be in some particular scenario but overhead will always exist in distributed systems
The typical way to prevent polling is by using the Publish/Subscribe pattern.
Your client program will subscribe to the server program and when an event occurs, the server program will publish to all its subscribers for them to handle however they need to.
If you flip the order of the requests you end up with something more similar to a standard web API. Your main program (left in your example) would be a server listening for requests. The secondary program would be a client hitting an endpoint on the server to trigger an event.
There's many ways to accomplish this in every language and it doesn't have to be tied to tcp/ip requests.
I'll add a few links for you shortly.
Well, in most of languages you won't implement such a low level. But theorically speaking, there are different waiting strategies, you are talking about active waiting. Doing this you can easily eat all your memory.
Most of languages implements libraries to allow you to start a process as a service which is at passive waiting and it is triggered when a request comes.

Idiomatic Golang goroutines

In Go, if we have a type with a method that starts some looped mechanism (polling A and doing B forever) is it best to express this as:
// Run does stuff, you probably want to run this as a goroutine
func (t Type) Run() {
// Do long-running stuff
}
and document that this probably wants to be launched as a goroutine (and let the caller deal with that)
Or to hide this from the caller:
// Run does stuff concurrently
func (t Type) Run() {
go DoRunStuff()
}
I'm new to Go and unsure if convention says let the caller prefix with 'go' or do it for them when the code is designed to run async.
My current view is that we should document and give the caller a choice. My thinking is that in Go the concurrency isn't actually part of the exposed interface, but a property of using it. Is this right?
I had your opinion on this until I started writing an adapter for a web service that I want to make concurrent. I have a go routine that must be started to parse results that are returned to the channel from the web calls. There is absolutely no case in which this API would work without using it as a go routine.
I then began to look at packages like net/http. There is mandatory concurrency within that package. It is documented at the interface level that it should be able to be used concurrently, however the default implementations automatically use go routines.
Because Go's standard library commonly fires of go routines within its own packages, I think that if your package or API warrants it, you can handle them on your own.
My current view is that we should document and give the caller a choice.
I tend to agree with you.
Since Go makes it so easy to run code concurrently, you should try to avoid concurrency in your API (which forces clients to use it concurrently). Instead, create a synchronous API, and then clients have the option to run it synchronously or concurrently.
This was discussed in a talk a couple years ago: Twelve Go Best Practices
Slide 26, in particular, shows code more like your first example.
I would view the net/http package as an exception because in this case, the concurrency is almost mandatory. If the package didn't use concurrency internally, the client code would almost certainly have to. For example, http.Client doesn't (to my knowledge) start any goroutines. It is only the server that does so.
In most cases, it's going to be one line of the code for the caller either way:
go Run() or StartGoroutine()
The synchronous API is no harder to use concurrently and gives the caller more options.
There is no 'right' answer because circumstances differ.
Obviously there are cases where an API might contain utilities, simple algorithms, data collections etc that would look odd if packaged up as goroutines.
Conversely, there are cases where it is natural to expect 'under-the-hood' concurrency, such as a rich IO library (http server being the obvious example).
For a more extreme case, consider you were to produce a library of plug-n-play concurrent services. Such an API consists of modules each having a well-described interface via channels. Clearly, in this case it would inevitably involve goroutines starting as part of the API.
One clue might well be the presence or absence of channels in the function parameters. But I would expect clear documentation of what to expect either way.

Broadcast Server

I am writing a TCP Server that accepts connections from multiple clients, this server gathers data from the system that it's running on and transmits it to every connected client.
What design patterns would be best for this situation?
Example
Put all connections in an array, then loop through the array and send the data to each client one by one. Advantage: very easy to implement. Disadvantage: not very efficient when handling large amounts of data.
An easier way is to use some existing software to do this ... For example use https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=d5bedadd-e46f-4c97-af89-22d65ffee070 .
In case you want to write on your own you will need a list(linked list) to manage the connections.
Here is an example of a server http://examples.oreilly.com/jenut/Server.java
If you want to handle large amounts of data, one of the techniques is to have a queue associated with each of the subscribers at the server end. A multi-threaded program can send the data to the clients from those queues.
A number of patterns have been developed for distributed processing and servers, for instance in the ACE project: http://www.cs.wustl.edu/~schmidt/patterns-ace.html. The design might be focused around the events which announce either that data has been received and may be read, or that buffers have been emptied and more data may now be written. At least in the days when a 32-bit address space was the rule, you could have many more open connections than you had threads, so you would typically have a small number of threads waiting for events which announced that they could safely read or write without stalling until the other side co-operated. This may come from events, or from calls such as select() or poll() terminating. Patterns are also associated with http://zguide.zeromq.org/page:all#-MQ-in-a-Hundred-Words.

How to design and structure a program that uses Actors

From Joe Armstrong's dissertation, he specified that an Actor-based program should be designed by following three steps. The thing is, I don't understand how the steps map to a real world problem or how to apply them. Here's Joe's original suggestion.
We identify all the truly concurrent activities in our real world activity.
We identify all message channels between the concurrent activities.
We write down all the messages which can flow on the different message channels.
Now we write the program. The structure of the program should exactly follow the structure of the problem. Each real world concurrent activity should be mapped onto exactly one concurrent process in our programming language. If there is a 1:1 mapping of the problem onto the program we say that the program is isomorphic to the problem.
It is extremely important that the mapping is exactly 1:1. The reason for this is that it minimizes the conceptual gap between the problem and the solution. If this mapping is not 1:1 the program will quickly degenerate, and become difficult to understand. This degeneration is often observed when non-CO languages are used to solve concurrent problems. Often the only way to get the program to work is to force several independent activities to be controlled by the same language thread or process. This leads to an inevitable loss of clarity, and makes the programs subject to complex and irreproducible interference errors.
I think #1 is fairly easy to figure out. It's #2 (and 3) where I get lost. To illustrate my frustration I stubbed out a small service available in this gist (Ruby service with callbacks).
Looking at that example service I can see how to answer #1. We have 5 concurrent services.
Start
LoginGateway
LogoutGateway
Stop
Subscribe
Some of those services don't work (or shouldn't) depending on the state the service is in. If the service hasn't been Started, then Login/Logout/Subscribe make no sense. Does this kind of state information have any relevance to Joe's 3 steps?
Anyway, given the example/mock service in that gist, I'm wondering how someone would go about designing a program to wrap this service up in an Actory fashion. I would just like to see a list of guidelines on how to apply Joe's 3 steps. Bonus points for writing some code (any language).
Generally, when structuring an application to use actors you have to identify the concurrent features of your application, which can be tricky to get the hang of. You identify 5 concurrent "services":
Start
LoginGateway
LogoutGateway
Stop
Subscribe
1, 4 and 5 seem to be types of messages that can flow through the system, 2 and 3 I'm not sure how to describe. Your gist is rather large and not super clear to me, but it looks like you've got some kind of message queue system. The actions a User can take are:
Log in to the system
Log out of the system
Subscribe to a Queue of messages
I'll assume logging in and out requires some auth step. I'll assume further that if the user fails the auth step their connection is broken but that creating a connection is not sufficient authentication.
The actions the System takes are:
Handling User actions
Routing messages to subscribers of a Queue
If that's not broadly true, let me know and I'll change this answer. (I'll assume that the messages that get sent to users are not generated by users but are an intrinsic part of the System; maybe we're discussing a monitoring service.) Anyhow, what is concurrent here? A few things:
Users act independently of one another
Queues have separate states
An actor based architecture represents each concurrent entity as its own process. The User is a finite state machine which authenticates, subscribes to a queue, alternatively receives messages and subscribes to more queues and eventually disconnects. In Erlang/OTP we'd represent this by a gen_fsm. The User process carries all the state needed to interact with the client which, if we're exposing a service over a network, would be a socket.
Authentication implies that the System is itself a 'process', though, more likely than not it's really a collection of processes which in Erlang/OTP we call an application. I digress. For simplification we'll assume that System is itself a single process which has some well-defined protocol and a state that keeps user credentials. User logins are, then, a well-defined message from a User process to the System process and the response therefrom. If there were no authentication we'd have no need for a System process as the only state related to a User would be a socket.
The careful reader will ask where do we accept socket connections for each User? Ah, good question. There's another concurrent entity in not mentioned, which we'll call here the Listener. It's another process that only listens for connections, creates a User for each new established socket and hands over ownership to the new User process, then loops back to listen.
The Queue is also a finite state machine. From its start state it accepts User subscription requests via a well-defined protocol, broadcasts messages to subscribers or accepts unsubscribe requests from User processes. This implies that the Queue has an internal store of User processes, the details of which are very dependent on language and need. In Erlang/OTP, for example, each Queue process would be a gen_server which stored User process ids--or PIDs--in a list and for each message to transmit simply did a multi-send to each User process in the list.
(In Erlang/OTP we'd user supervisors to ensure that processes stay alive and are restarted on death, which simplifies greatly the amount of work an Erlang developer has to do to ensure reliability in an actor-based architecture.)
Basically, to restate what Joe wrote, actor based architecture boils down to these points:
identify concurrent entities in the system and represent them in the implementation by processes,
decide how your processes will send messages (a primitive operation in Erlang/OTP, but something that has to be implemented explicitly in C or Ruby) and
create well-defined protocols between entities in the system which hide state modification.
It's been said that the Internet is the world's most successful actor based architecture and, really, that's not far off.

Resources