I am new to Go and I am reading an example from the book The Go Programming Language (gopl).
Section 9.8.4 of The Go Programming Language explains why goroutines have no notion of identity that is accessible to the programmer:
Goroutines have no notion of identity that is accessible to the programmer. This is by design, since thread-local storage tends to be abused. For example, in a web server implemented in a language with thread-local storage, it’s common for many functions to find information about the HTTP request on whose behalf they are currently working by looking in that storage. However, just as with programs that rely excessively on global variables, this can lead to an unhealthy ‘‘action at a distance’’ in which the behavior of a function is not determined by its arguments alone, but by the identity of the thread in which it runs. Consequently, if the identity of the thread should change—some worker threads are enlisted to help, say—the function misbehaves mysteriously.
It uses the web server example to illustrate this point. However, I have difficulty understanding why the so-called "action at a distance" is bad practice and how it leads to
a function is not determined by its arguments alone, but by the identity of the thread in which it runs.
Could anyone give an explanation for this (preferably with short code snippets)?
Any help is appreciated!
Let's say we have the following code:
func doubler(num int) int {
	return num + num
}
doubler(5) will return 10, and go doubler(5) will compute the same value (though a goroutine's return value is discarded).
You can do the same with some sort of thread-local storage, if you want:
func doubler() int {
	return getThreadLocal("num") + getThreadLocal("num")
}
And we could run this with something like:
go func() {
	setThreadLocal("num", 10)
	doubler()
}()
But which is clearer? The variant which explicitly passes the num argument, or the variant which "magically" gets it from some sort of thread-local storage?
This is what is meant with "action at a distance". The line setThreadLocal("num", 10) (which is distant) affects how doubler() behaves.
This example is clearly artificial, but the same principle applies to more realistic examples. For instance, in some environments it's not uncommon to use thread-local storage for things such as user information or other "global" variables.
This is why the paragraph you quoted compares it to global variables: thread-local storage is essentially a set of global variables that apply only to the current thread.
When you pass values as arguments, things are much more clearly defined. There is no magical (often undocumented) global state that you need to keep in mind when debugging or writing tests.
See my repo
package main

import (
	"fmt"
	"time"

	"github.com/timandy/routine"
)

func main() {
	goid := routine.Goid()
	fmt.Printf("cur goid: %v\n", goid)
	go func() {
		goid := routine.Goid()
		fmt.Printf("sub goid: %v\n", goid)
	}()
	// Wait for the sub-goroutine to finish executing.
	time.Sleep(time.Second)
}
I recommend looking at this post for an example of why someone might want to get information about the current thread/thread a function is running in:
stackoverflow - main threads in C#
As pointed out in the question, conditioning the behavior of a function on certain thread requirements (most likely) produces fragile/error-prone code that is difficult to debug.
I guess what your textbook is trying to say is that a function should never rely on running in a specific thread, look up threads, etc., because this can cause unexpected behavior (especially in an API, if it's not obvious to the end user that a function has to run in a specific thread). In Go, anything like that is impossible purely by language design: the behavior of a goroutine never depends on threads or anything similar, simply because goroutines don't have an identity, as you correctly say.
The code below does not produce a data race:
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	x := strings.Repeat(" ", 1024)
	go func() {
		for {
			fmt.Fprintf(os.Stdout, x+"aa\n")
		}
	}()
	go func() {
		for {
			fmt.Fprintf(os.Stdout, x+"bb\n")
		}
	}()
	go func() {
		for {
			fmt.Fprintf(os.Stdout, x+"cc\n")
		}
	}()
	go func() {
		for {
			fmt.Fprintf(os.Stdout, x+"dd\n")
		}
	}()
	<-make(chan bool)
}
I tried multiple lengths of data, with this variant: https://play.golang.org/p/29Cnwqj5K30
This post says it is not thread-safe.
This mail does not really answer the question, or I did not understand it.
The package documentation of os and fmt doesn't mention much about this. I admit I did not dig into the source code of those two packages to find further explanations; they appear too complex to me.
What are the recommendations, and what are their references?
I'm not sure it would qualify as a definitive answer but I'll try to provide some insight.
The F*-functions of the fmt package merely state that they take a value of a type implementing the io.Writer interface and call Write on it.
The functions themselves are safe for concurrent use — in the sense that it's OK to call any number of fmt.Fwhatever functions concurrently: the package itself is prepared for that.
But when it comes to concurrently writing to the same value of a type implementing io.Writer, the question becomes more complex, because implementing an interface in Go does not state anything about the real type's behavior concurrency-wise.
In other words, the real point at which concurrency may or may not be allowed is deferred to the "writer" which the functions of fmt write to.
(One should also keep in mind that the fmt.*Print* functions are allowed to call Write on their destination any number of times during a single invocation, in a row — as opposed to those provided by the stock log package.)
So, we basically have two cases:
Custom implementations of io.Writer.
Stock implementations of it, such as *os.File or wrappers around sockets produced by the functions of net package.
The first case is the simple one: whatever the implementor did.
The second case is harder: as I understand it, the Go standard library's stance on this (albeit not clearly stated in the docs) is that the wrappers it provides around "things" provided by the OS — such as file descriptors and sockets — are reasonably "thin", and hence whatever semantics they implement is transitively implemented by the stdlib code running on a particular system.
For instance, POSIX requires that write(2) calls are atomic with regard to one another when they are operating on regular files or symbolic links. This means, since any call to Write on things wrapping file descriptors or sockets actually results in a single "write" syscall of the target system, you might consult the docs of the target OS and get the idea of what will happen.
Note that POSIX only tells about filesystem objects, and if os.Stdout is opened to a terminal (or a pseudo-terminal) or to a pipe or to anything else which supports the write(2) syscall, the results will depend on what the relevant subsystem and/or the driver implement—for instance, data from multiple concurrent calls may be interspersed, or one of the calls, or both, may just be failed by the OS—unlikely, but still.
Going back to Go, from what I gather, the following facts hold true about the Go stdlib types which wrap file descriptors and sockets:
They are safe for concurrent use by themselves (I mean, on the Go level).
They "map" Write and Read calls 1-to-1 to the underlying object—that is, a Write call is never split into two or more underlying syscalls, and a Read call never returns data "glued" from the results of multiple underlying syscalls.
(By the way, people occasionally get tripped by this no-frills behaviour — for example, see this or this as examples.)
So, combining this with the fact that fmt.*Print* are free to call Write any number of times per single call, your examples which use os.Stdout will:
Never result in a data race — unless you've assigned the variable os.Stdout some custom implementation — but
The data actually written to the underlying FD will be intermixed in an unpredictable order which may depend on many factors, including the OS kernel version and settings, the version of Go used to build the program, the hardware, and the load on the system.
TL;DR
Multiple concurrent calls to fmt.Fprint* writing to the same "writer" value defer their concurrency to the implementation (type) of the "writer".
It's impossible to have a data race with "file-like" objects provided by the Go stdlib in the setup you have presented in your question.
The real problem will be not with data races on the Go program level but with the concurrent access to a single resource happening on level of the OS. And there, we do not (usually) speak about data races because the commodity OSes Go supports expose things one may "write to" as abstractions, where a real data race would possibly indicate a bug in the kernel or in the driver (and the Go's race detector won't be able to detect it anyway as that memory would not be owned by the Go runtime powering the process).
Basically, in your case, if you need to be sure the data produced by any particular call to fmt.Fprint* comes out as a single contiguous piece to the actual data receiver provided by the OS, you need to serialize these calls as the fmt package provides no guarantees regarding the number of calls to Write on the supplied "writer" for the functions it exports.
The serialization may either be external (explicit, that is, "take a lock, call fmt.Fprint*, release the lock") or internal — by wrapping os.Stdout in a custom type which would manage a lock, and using that.
And while we're at it, the log package does just that and can be used straight away, as the "loggers" it provides, including the default one, allow you to suppress the output of "log headers" (such as the timestamp and the name of the file).
I know that the Go idiomatic way to handle errors is that it's treated as a value which is checked using an if statement to see if it's nil or not.
However, it soon gets tedious in a long function where you need to write if err != nil {...} in multiple places.
I am aware that error handling is one of the pain points in the Go community.
I was just thinking why can't we do something like this,
func Xyz(param1 map[string]interface{}, param2 string) (return1 map[string]interface{}, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("error: %v", r)
		}
	}()
	.....
	.....
	.....
	// Code that causes a panic
}
In your function, have a deferred function call which uses recover: if a panic occurs, the call stack starts unwinding, the recover function is invoked, the program does not terminate, and the function returns the error back to the caller.
Here is a Go Playground example,
https://play.golang.org/p/-bG-xEfSO-Q
My question is what is the downside of this approach? Is there anything that we lose by this?
I understand that the recover function only works for the same goroutine. Let's assume that this is on the same goroutine.
Can you? Yes.
This is in fact done in a few cases, even in the standard library. See encoding/json for examples.
But this should only ever be done within the confines of your private API. That is to say, you should not write an API, whose exposed behavior includes a possible panic-as-error-state. So ensure that you recover all panics, and convert them to errors (or otherwise handle them) before returning a value to the consumer of your API.
What is the downside of this approach? Is there anything that we lose by this?
A few things come to mind:
It clearly violates the principle of least astonishment, which is why you should absolutely never let this practice cross your public API boundary.
It's cumbersome any time you need to act based on the error type/content, as it requires you to recover the error, convert it back to its original type (which may or may not be an error type), do your inspection/action, then possibly re-panic. It's much simpler to do all this with a plain error value.
Closely related to #2, treating errors as values gives you a lot of fine-grained control over when and how to behave in response to an error condition. panic is a very blunt instrument in this regard, so you lose a lot of control.
panic + recover does not perform as well as simply returning an error. In situations where an error truly is exceptional, this may not matter (i.e. in the encoding/json example, it's used when a write fails, which will abort the entire operation, so being efficient is not of high importance).
I'm sure there are other reasons people can come up with. Google is full of blog posts on the topic. For your continued reading, here's one random example.
I have read https://golang.org/ref/mem, but there are some parts which are still unclear to me.
For instance, in the section "Channel communication" it says: "The write to a happens before the send on c", but I don't know why that is the case. I am copying below the sample code extracted from the mentioned page for providing context.
var c = make(chan int, 10)
var a string

func f() {
	a = "hello, world"
	c <- 0
}

func main() {
	go f()
	<-c
	print(a)
}
From the point of view of a single goroutine that assertion is true, however from the point of view of another goroutine that cannot be inferred from the guarantees that the text has mentioned so far.
So my question is: are there other guarantees which are not explicitly stated in this document? For instance, can we say that a sync primitive such as sending on a channel ensures that commands placed before it will not be moved after it by the compiler? What about the commands which come after it — can we say that they will not be placed before the sync primitive?
What about the operations offered in the atomic package? Do they provide the same guarantees as channels operations?
can we say that given some sync primitive such as sending on a channel, ensures that commands placed before it, will not be moved after it by the compiler?
That is exactly what the memory model says. Within a single goroutine, the execution order can be rearranged only as long as the effects of reads and writes are observed in program order by that goroutine. So if you write a = 1 at some point and read a later, the compiler knows not to move the write past the read. Across goroutines, channels and locks are the synchronization points: anything that happened before a channel/lock operation is visible to other goroutines once the synchronization point is reached. The compiler will not move code around so that a write operation crosses the synchronization boundary.
There are guarantees satisfied by the sync/atomic operations as well, and there have been discussions on whether to add them to the memory model. They are not explicitly stated at the moment. There is an open issue about it:
https://github.com/golang/go/issues/5045
As the title says, I am referring to the Go package sync.Map: can its functions be considered atomic? Mainly the Load, Store, LoadOrStore, and Delete functions.
I also built a simple example on the Go Playground: is it guaranteed that only one goroutine can enter the code at lines 15-17? From my tests it seems it is.
Please help explain.
The godoc page for the sync package says: "Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple goroutines without additional locking or coordination."
This statement guarantees that there's no need for additional mutexes or synchronization across goroutines. I wouldn't call that claim "atomic" (which has a very precise meaning), but it does mean that you don't have to worry about multiple goroutines being able to enter a LoadOrStore block (with the same key) like in your example.
I'm writing a CMS in Go and have a session type (user id, page contents to render, etc). Ideally I'd like that type to be a global variable so I'm not having to propagate it through all the nested functions; however, having a global variable like that would obviously mean that each new session overwrites its predecessor, which, needless to say, would be an epic fail.
Some languages offer a way of having globals within threads that are preserved within that thread (i.e. the value of that global is sandboxed within that thread). While I'm aware that goroutines are not threads, I just wondered if there was a similar method at my disposal, or if I'd have to pass a local pointer of my session type down through the various nested functions.
I'm guessing channels wouldn't do this? From what I can gather (and please correct me if I'm wrong here), they're basically just a safe way of sharing data between goroutines?
edit: I'd forgotten about this question! Anyhow, an update for anyone who is curious. This question was written back when I was new to Go and the CMS was basically my first project. I was coming from a C background with familiarity with POSIX threads, but I quickly realised a better approach was to write the code in a more functional design, with session objects passed down as pointers in function parameters. This gave me both the context-sensitive local scope I was after while also minimizing the amount of data I was copying about. However, being a 7-year-old project and one that was at the start of my transition to Go, it's fair to say the project could do with a major rewrite anyway, as there are a lot of mistakes in it. That's a concern for another day though — currently it works, and I have enough other projects on the go at the moment.
You'll want to use something like a Context:
http://blog.golang.org/context
Basically, the pattern is to create a Context for each unique thing you want to do. (A web request in your case.) Use context.WithValue to embed multiple variables in the context. Then always pass it as the first parameter to other methods that are doing further work in other goroutines.
Getting the variable you need out of the context is a matter of calling the context's Value method from within any goroutine. From the above link:
A Context is safe for simultaneous use by multiple goroutines. Code can pass a single Context to any number of goroutines and cancel that Context to signal all of them.
I had an implementation where I was explicitly sending variables as method parameters, and I discovered that embedding these variables using contexts significantly cleaned up my code.
Using a Context also helps because it provides ways to end long-running tasks by using channels, select, and a concept called a "done channel." See this article for a great basic review and implementation:
http://blog.golang.org/pipelines
I'd recommend reading the pipelines article first for a good flavor of how to manage communication among goroutines, then the context article for a better idea of how to level-up and start embedding variables to pass around.
Good luck!
Don't use global variables. Use Go goroutine-local variables.
go-routine Id..
There are already goroutine-local variables: they are called function
arguments, function return values, and local variables.
Russ
If you have more than one user, then wouldn't you need that info for each connection? So I would think that you'd have a struct per connected user. It would be idiomatic Go to pass a pointer to that struct when setting up the worker goroutine, or to pass the pointer over a channel.