The code below does not produce a data race:
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	x := strings.Repeat(" ", 1024)
	go func() {
		for {
			fmt.Fprintf(os.Stdout, "%saa\n", x)
		}
	}()
	go func() {
		for {
			fmt.Fprintf(os.Stdout, "%sbb\n", x)
		}
	}()
	go func() {
		for {
			fmt.Fprintf(os.Stdout, "%scc\n", x)
		}
	}()
	go func() {
		for {
			fmt.Fprintf(os.Stdout, "%sdd\n", x)
		}
	}()
	<-make(chan bool) // block main forever while the goroutines write
}
I tried multiple data lengths, with this variant: https://play.golang.org/p/29Cnwqj5K30
This post says it is not thread-safe.
This mailing-list message does not really answer the question, or I did not understand it.
The package documentation of os and fmt doesn't mention much about this. I admit I did not dig into the source code of those two packages for further explanations; it appears too complex to me.
What are the recommendations, and their references?
I'm not sure it would qualify as a definitive answer but I'll try to provide some insight.
The F*-functions of the fmt package merely state that they take a value of a type implementing the io.Writer interface and call Write on it.
The functions themselves are safe for concurrent use, in the sense that it's OK to call any number of fmt.F* functions concurrently: the package itself is prepared for that.
But when it comes to concurrently writing to the same value of a type implementing io.Writer, the question becomes more complex, because implementing an interface in Go says nothing about the concrete type's behavior with regard to concurrency.
In other words, whether concurrent use is allowed is ultimately decided by the "writer" that the fmt functions write to.
(One should also keep in mind that the fmt.*Print* functions are allowed to call Write on their destination any number of times, in a row, during a single invocation, as opposed to the functions provided by the stock log package.)
So, we basically have two cases:
Custom implementations of io.Writer.
Stock implementations of it, such as *os.File or wrappers around sockets produced by the functions of the net package.
The first case is the simple one: you get whatever the implementor provided.
The second case is harder. As I understand it, the Go standard library's stance (albeit not clearly stated in the docs) is that the wrappers it provides around "things" provided by the OS, such as file descriptors and sockets, are reasonably "thin", and hence whatever semantics they have are inherited from the OS the program runs on.
For instance, POSIX requires that write(2) calls be atomic with respect to one another when they operate on regular files or symbolic links. Since any call to Write on things wrapping file descriptors or sockets actually results in a single write syscall on the target system, you can consult the docs of the target OS to get an idea of what will happen.
Note that this POSIX atomicity rule only covers regular files and symbolic links (for pipes and FIFOs, POSIX additionally guarantees only that writes of at most PIPE_BUF bytes are not interleaved). If os.Stdout is opened to a terminal (or a pseudo-terminal), to a pipe, or to anything else that supports the write(2) syscall, the results depend on what the relevant subsystem and/or driver implements: for instance, data from multiple concurrent calls may be interspersed, or one or both of the calls may simply be failed by the OS (unlikely, but possible).
Going back to Go, from what I gather, the following facts hold true about the Go stdlib types which wrap file descriptors and sockets:
They are safe for concurrent use by themselves (I mean, on the Go level).
They "map" Write and Read calls 1-to-1 to the underlying object: a Write call is never split across unrelated writes, and a Read call never returns data "glued" together from the results of multiple underlying syscalls. (Strictly speaking, a Write may be retried after a partial write, but the runtime holds a per-descriptor write lock for the whole Write call, so concurrent Write calls on the same *os.File do not interleave at the Go level.)
(By the way, people occasionally get tripped up by this no-frills behaviour; see this or this for examples.)
So, combining this with the fact that the fmt.*Print* functions are free to call Write any number of times per call, your examples that use os.Stdout will:
Never result in a data race (unless you've assigned some custom implementation to the variable os.Stdout), but
The data actually written to the underlying FD will be intermixed in an unpredictable order which may depend on many factors including the OS kernel version and settings, the version of Go used to build the program, the hardware and the load on the system.
TL;DR
Multiple concurrent calls to fmt.Fprint* writing to the same "writer" value defer their concurrency to the implementation (type) of the "writer".
It's impossible to have a data race with "file-like" objects provided by the Go stdlib in the setup you have presented in your question.
The real problem is not data races at the level of the Go program, but concurrent access to a single resource at the level of the OS. There we do not (usually) speak about data races, because the commodity OSes Go supports expose the things one may "write to" as abstractions; a real data race there would indicate a bug in the kernel or a driver (and Go's race detector would not be able to detect it anyway, as that memory is not owned by the Go runtime powering the process).
Basically, in your case, if you need to be sure the data produced by any particular call to fmt.Fprint* comes out as a single contiguous piece to the actual data receiver provided by the OS, you need to serialize these calls as the fmt package provides no guarantees regarding the number of calls to Write on the supplied "writer" for the functions it exports.
The serialization may be either external (explicit: "take a lock, call fmt.Fprint*, release the lock") or internal (wrapping os.Stdout in a custom type that manages a lock, and using that type).
And while we're at it, the log package does just that. It can be used straight away, as the "loggers" it provides, including the default one, allow you to suppress the "log headers" (such as the timestamp and the file name).
I know that the Go idiomatic way to handle errors is that it's treated as a value which is checked using an if statement to see if it's nil or not.
However, it soon gets tedious in a long function where you would need to do this if err!=nil{...} in multiple places.
I am aware that error handling is one of the pain points in the Go community.
I was wondering why we can't do something like this:
func Xyz(param1 map[string]interface{}, param2 string) (return1 map[string]interface{}, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("error: %v", r)
		}
	}()
	// ...
	// ...
	// ...
	// Code causes a panic
}
That is, the function has a deferred call that uses recover: if any panic occurs, the call stack starts unwinding, the deferred function runs and recover stops the panic, so the program does not terminate and the error is returned to the caller.
Here is a Go Playground example,
https://play.golang.org/p/-bG-xEfSO-Q
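A self-contained sketch of the pattern the question describes, using a hypothetical divide function: the deferred closure assigns to the named result err, which is what lets a recovered panic reach the caller as an ordinary error.

```go
package main

import "fmt"

// divide is a hypothetical function that converts any panic raised while it
// runs into an ordinary error, using a named result and a deferred recover.
func divide(a, b int) (result int, err error) {
	defer func() {
		if r := recover(); r != nil {
			// The deferred closure may assign to the named results.
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil // integer division by zero panics at run time
}

func main() {
	fmt.Println(divide(10, 2)) // 5 <nil>
	_, err := divide(1, 0)
	fmt.Println(err) // recovered: runtime error: integer divide by zero
}
```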
My question is what is the downside of this approach? Is there anything that we lose by this?
I understand that the recover function only works for the same goroutine. Let's assume that this is on the same goroutine.
Can you? Yes.
This is in fact done in a few cases, even in the standard library. See encoding/json for examples.
But this should only ever be done within the confines of your private API. That is to say, you should not write an API whose exposed behavior includes a possible panic-as-error-state. Make sure you recover all panics and convert them to errors (or otherwise handle them) before returning a value to the consumer of your API.
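A sketch of that private-panic pattern, loosely modelled on how encoding/json recovers its own internal error wrapper; all names here (parseError, mustDigit, SumDigits) are hypothetical. The exported function recovers only panics carrying the private type and re-panics anything else, so genuine bugs are not silently turned into errors.

```go
package main

import "fmt"

// parseError is an unexported wrapper type. Only panics carrying it are
// treated as errors; any other panic is a real bug and is re-panicked.
type parseError struct{ error }

// mustDigit is an internal helper that panics instead of returning an error.
func mustDigit(c byte) int {
	if c < '0' || c > '9' {
		panic(parseError{fmt.Errorf("invalid digit %q", c)})
	}
	return int(c - '0')
}

// SumDigits is the exported API: it never lets a parseError panic escape.
func SumDigits(s string) (sum int, err error) {
	defer func() {
		if r := recover(); r != nil {
			pe, ok := r.(parseError)
			if !ok {
				panic(r) // not ours: propagate unchanged
			}
			sum, err = 0, pe.error // discard the partial sum
		}
	}()
	for i := 0; i < len(s); i++ {
		sum += mustDigit(s[i])
	}
	return sum, nil
}

func main() {
	fmt.Println(SumDigits("1234")) // 10 <nil>
	fmt.Println(SumDigits("12x4")) // 0 invalid digit 'x'
}
```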
What is the downside of this approach? Is there anything that we lose by this?
A few things come to mind:
It clearly violates the principle of least astonishment, which is why you should absolutely never let this practice cross your public API boundary.
It's cumbersome whenever you need to act on the error's type or content: you have to recover the error, convert it back to its original type (which may or may not be an error type), do your inspection/action, and then possibly re-panic. It's much simpler to do all this with a plain error value.
Closely related to the previous point, treating errors as values gives you fine-grained control over when and how to respond to an error condition. panic is a very blunt instrument in this regard, so you lose a lot of control.
panic + recover does not perform as well as simply returning an error. In situations where an error truly is exceptional, this may not matter (e.g. in encoding/json, it's used for errors that abort the entire encoding operation, so efficiency on that path is not of high importance).
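To make the second and third points concrete, here is a hypothetical lookup that reports a missing key either as an error value or as a panic (all names are illustrative only). Handling the same condition takes a single errors.Is check in the first style, but a recover, a type assertion and a re-panic in the second.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel error used for illustration.
var ErrNotFound = errors.New("not found")

func lookupErr(m map[string]int, k string) (int, error) {
	v, ok := m[k]
	if !ok {
		return 0, fmt.Errorf("lookup %q: %w", k, ErrNotFound)
	}
	return v, nil
}

// lookupPanic reports the same condition by panicking instead.
func lookupPanic(m map[string]int, k string) int {
	v, ok := m[k]
	if !ok {
		panic(fmt.Errorf("lookup %q: %w", k, ErrNotFound))
	}
	return v
}

// With error values, the caller inspects the error directly.
func withDefaultErr(m map[string]int, k string) (int, error) {
	v, err := lookupErr(m, k)
	if errors.Is(err, ErrNotFound) {
		return -1, nil
	}
	return v, err
}

// With panics, the caller must recover, type-assert the recovered value
// back to an error, inspect it, and re-panic anything it doesn't handle.
func withDefaultPanic(m map[string]int, k string) (v int) {
	defer func() {
		if r := recover(); r != nil {
			if err, ok := r.(error); ok && errors.Is(err, ErrNotFound) {
				v = -1
				return
			}
			panic(r)
		}
	}()
	return lookupPanic(m, k)
}

func main() {
	m := map[string]int{"a": 1}
	fmt.Println(withDefaultErr(m, "b"))   // -1 <nil>
	fmt.Println(withDefaultPanic(m, "b")) // -1
}
```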
I'm sure there are other reasons people can come up with. Google is full of blog posts on the topic. For your continued reading, here's one random example.
I have read https://golang.org/ref/mem, but there are some parts which are still unclear to me.
For instance, in the section "Channel communication" it says: "The write to a happens before the send on c", but I don't know why that is the case. I am copying below the sample code from that page to provide context.
var c = make(chan int, 10)
var a string

func f() {
	a = "hello, world"
	c <- 0
}

func main() {
	go f()
	<-c
	print(a)
}
From the point of view of a single goroutine that assertion is true, however from the point of view of another goroutine that cannot be inferred from the guarantees that the text has mentioned so far.
So my question is: are there other guarantees that are not explicitly stated in this document? For instance, can we say that a sync primitive, such as sending on a channel, ensures that commands placed before it will not be moved after it by the compiler? What about the commands that come after it: can we say that they will not be moved before the sync primitive?
What about the operations offered in the atomic package? Do they provide the same guarantees as channel operations?
can we say that a sync primitive, such as sending on a channel, ensures that commands placed before it will not be moved after it by the compiler?
That is exactly what the memory model says. Within a single goroutine, reads and writes must behave as if they executed in the order specified by the program; the compiler and processor may reorder them only when the reordering does not change the behavior within that goroutine. So if you set a=1 at some point and read a later, the compiler will not move the write after the read. Across goroutines, channel operations and locks are the synchronization points: everything a goroutine did before a send (or an Unlock) is visible to the goroutine whose corresponding receive (or later Lock) completes. The compiler will not move code around so that a write crosses such a synchronization boundary.
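The lock rule can be seen in the mutex example from the memory model itself, adapted here into a function (named publish for illustration) so its result can be checked: the Unlock in the new goroutine is synchronized before the second Lock in the caller, so the write to a is guaranteed to be visible.

```go
package main

import (
	"fmt"
	"sync"
)

// publish is adapted from the locks example in the Go memory model.
func publish() string {
	var l sync.Mutex
	var a string
	l.Lock()
	go func() {
		a = "hello, world"
		l.Unlock() // a sync.Mutex may be unlocked by a different goroutine
	}()
	l.Lock() // blocks until the goroutine unlocks; its write to a is now visible
	return a
}

func main() {
	fmt.Println(publish()) // hello, world
}
```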
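There are guarantees satisfied by the sync/atomic operations as well, and there have been discussions on whether to add them to the memory model. When this answer was written they were not explicitly stated; this issue tracked it:
https://github.com/golang/go/issues/5045
Since Go 1.19, the memory model does state them: if the effect of an atomic operation A is observed by atomic operation B, then A is synchronized before B, and all atomic operations in a program behave as though executed in some sequentially consistent order.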
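Newer revisions of the memory model (Go 1.19 and later) do cover sync/atomic, so the channel example can be sketched with an atomic flag instead (assuming Go 1.19+ for atomic.Bool; the name publishAtomic is illustrative). The busy-wait is only for demonstration; a channel remains the clearer tool.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// publishAtomic rewrites the memory model's channel example with an atomic
// flag. The Store is synchronized before the Load that observes it, so the
// earlier plain write to a is visible once the loop exits.
func publishAtomic() string {
	var a string
	var done atomic.Bool
	go func() {
		a = "hello, world"
		done.Store(true)
	}()
	for !done.Load() {
		runtime.Gosched() // busy-wait for illustration only
	}
	return a
}

func main() {
	fmt.Println(publishAtomic()) // hello, world
}
```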
I'm new to Go and I am reading the examples from the book The Go Programming Language (gopl).
Section 9.8.4 of The Go Programming Language explains why goroutines have no notion of identity that is accessible to the programmer:
Goroutines have no notion of identity that is accessible to the programmer. This is by design, since thread-local storage tends to be abused. For example, in a web server implemented in a language with thread-local storage, it’s common for many functions to find information about the HTTP request on whose behalf they are currently working by looking in that storage. However, just as with programs that rely excessively on global variables, this can lead to an unhealthy ‘‘action at a distance’’ in which the behavior of a function is not determined by its arguments alone, but by the identity of the thread in which it runs. Consequently, if the identity of the thread should change—some worker threads are enlisted to help, say—the function misbehaves mysteriously.
and uses the example of a web server to illustrate this point. However, I have difficulty understanding why the so-called "action at a distance" is bad practice, and how it leads to
a function is not determined by its arguments alone, but by the identity of the thread in which it runs.
Could anyone give an explanation for this (preferably with short code snippets)?
Any help is appreciated!
Let's say we have the following code:
func doubler(num int) int {
	return num + num
}
doubler(5) will return 10. go doubler(5) will also compute 10 (although a go statement discards the function's return value).
You can do the same with some sort of thread-local storage, if you want:
func doubler() int {
	// getThreadLocal is hypothetical: Go has no thread-local storage
	return getThreadLocal("num") + getThreadLocal("num")
}
And we could run this with something like:
go func() {
	setThreadLocal("num", 10)
	doubler()
}()
But which is clearer? The variant that explicitly passes the num argument, or the variant that "magically" gets it from some sort of thread-local storage?
This is what is meant with "action at a distance". The line setThreadLocal("num", 10) (which is distant) affects how doubler() behaves.
This example is clearly artificial, but the same principle applies to more realistic examples. For example, in some environments it's not uncommon to use thread-local storage for things such as user information, or other "global" variables.
This is why the paragraph you quoted compares it to global variables: thread-local variables are global variables, scoped to the current thread.
When you pass parameters as arguments, things are much more clearly defined. There is no magic (often undocumented) global state that you need to think about when debugging or writing tests.
See my repo
package main

import (
	"fmt"
	"time"

	"github.com/timandy/routine"
)

func main() {
	goid := routine.Goid()
	fmt.Printf("cur goid: %v\n", goid)
	go func() {
		goid := routine.Goid()
		fmt.Printf("sub goid: %v\n", goid)
	}()
	// Wait for the goroutine to finish executing.
	time.Sleep(time.Second)
}
I recommend looking at this post for an example of why someone might want to get information about the current thread/thread a function is running in:
stackoverflow - main threads in C#
As pointed out in the question, conditioning the behavior of a function on certain thread requirements (most likely) produces fragile/error-prone code that is difficult to debug.
I guess what your textbook is trying to say is that a function should never rely on running in a specific thread, look up threads, etc., because this can cause unexpected behavior (especially in an API, if it's not obvious to the end user that a function has to run in a specific thread). In Go, anything like that is impossible by language design: the behavior of a goroutine never depends on threads or anything similar, because goroutines don't have an identity, as you correctly say.
As the title says, I am referring to the Go package sync.Map: can its functions be considered atomic? Mainly the Load, Store, LoadOrStore, and Delete functions.
I also built a simple example on the Go playground: is it guaranteed that only one goroutine can enter the code range at lines 15-17? My tests suggest it is.
Please help to explain.
The godoc page for the sync package says: "Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple goroutines without additional locking or coordination."
This statement guarantees that there's no need for additional mutexes or synchronization across goroutines. I wouldn't call that claim "atomic" (which has a very precise meaning), but it does mean that you don't have to worry about multiple goroutines being able to enter a LoadOrStore block (with the same key) like in your example.
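A small sketch of the LoadOrStore guarantee, based on my reading of the sync.Map docs: for a given key that is never deleted, exactly one of the concurrent callers gets loaded == false, so only that goroutine enters the "stored" branch. The countWinners helper is hypothetical, and atomic.Int64 assumes Go 1.19+.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countWinners starts n goroutines that all call LoadOrStore with the same
// key and counts how many of them actually stored their value.
func countWinners(n int) int64 {
	var m sync.Map
	var winners atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			if _, loaded := m.LoadOrStore("key", id); !loaded {
				winners.Add(1) // only the goroutine whose value was stored gets here
			}
		}(i)
	}
	wg.Wait()
	return winners.Load()
}

func main() {
	fmt.Println(countWinners(100)) // 1
}
```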
I know there are no destructors in Go since technically there are no classes. As such, I use initClass to perform the same functions as a constructor. However, is there any way to create something to mimic a destructor in the event of a termination, for the use of, say, closing files? Right now I just call defer deinitClass, but this is rather hackish and I think a poor design. What would be the proper way?
In the Go ecosystem, there exists a ubiquitous idiom for dealing with objects which wrap precious (and/or external) resources: a special method designated for freeing that resource, called explicitly — typically via the defer mechanism.
This special method is typically named Close(), and the user of the object has to call it explicitly when they're done with the resource the object represents. The io standard package even has a special interface, io.Closer, declaring that single method. Objects implementing I/O on various resources, such as TCP sockets, UDP endpoints and files, all satisfy io.Closer and are expected to be explicitly closed after use.
Calling such a cleanup method is typically done via the defer mechanism, which guarantees the method will run whether or not the code executing after resource acquisition panics.
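A minimal sketch, assuming a hypothetical resource type: it satisfies io.Closer, and the deferred Close runs even though use panics; the caller recovers only so the program can report the result.

```go
package main

import (
	"fmt"
	"io"
)

// resource is a hypothetical type wrapping something that must be released.
type resource struct {
	name   string
	closed bool
}

// Close releases the resource; *resource therefore satisfies io.Closer.
func (r *resource) Close() error {
	r.closed = true
	fmt.Println("closed", r.name)
	return nil
}

var _ io.Closer = (*resource)(nil) // compile-time interface check

// use defers the release right after "acquiring" r; the deferred Close
// runs on a normal return and also while the panic unwinds the stack.
func use(r *resource) {
	defer r.Close()
	panic("something went wrong")
}

func main() {
	r := &resource{name: "demo"}
	func() {
		defer func() { fmt.Println("recovered:", recover()) }()
		use(r)
	}()
	fmt.Println("closed after panic:", r.closed) // closed after panic: true
}
```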
You might also notice that not having implicit "destructors" quite balances not having implicit "constructors" in Go. This actually has nothing to do with not having "classes" in Go: the language designers just avoid magic as much as practically possible.
Note that Go's approach to this problem might appear somewhat low-tech, but in fact it's the only workable solution for a runtime featuring garbage collection. In a language with objects but without GC, say C++, destroying an object is a well-defined operation, because an object is destroyed either when it goes out of scope or when delete is called on it. In a runtime with GC, the object will be destroyed at some mostly indeterminate point in the future by a GC scan, and may not be destroyed at all. So if the object wraps some precious resource, that resource might get reclaimed long after the last live reference to the enclosing object was lost, or it might not get reclaimed at all, as @twotwotwo has explained well in their answer.
Another interesting aspect to consider is that Go's GC is fully concurrent (with regular program execution). This means the GC thread that is about to collect a dead object might (and usually will) not be the thread(s) that executed that object's code when it was alive. In turn, this means that if Go types could have destructors, the programmer would need to make sure whatever code the destructor executes is properly synchronized with the rest of the program, if the object's state affects some data structures external to it. This might actually force the programmer to add such synchronization even if the object does not need it for its normal operation (and most objects fall into that category). And think about what happens if those external data structures happen to be destroyed before the object's destructor is called (the GC collects dead objects in a non-deterministic way). In other words, it's much easier to control, and to reason about, object destruction when it is explicitly coded into the program's flow: both for specifying when the object has to be destroyed, and for guaranteeing the proper ordering of its destruction with regard to the data structures external to it.
If you're familiar with .NET, it deals with resource cleanup in a way which resembles that of Go quite closely: your objects which wrap some precious resource have to implement the IDisposable interface, and a method, Dispose(), exported by that interface, must be called explicitly when you're done with such an object. C# provides some syntactic sugar for this use case via the using statement which makes the compiler arrange for calling Dispose() on the object when it goes out of the scope declared by the said statement. In Go, you'll typically defer calls to cleanup methods.
One more note of caution. Go wants you to treat errors very seriously (unlike most mainstream programming languages with their "just throw an exception and don't give a fsck about what happens due to it elsewhere and what state the program will be in" attitude), so you might consider checking the error returns of at least some calls to cleanup methods.
A good example is instances of the os.File type representing files on a filesystem. The fun stuff is that calling Close() on an open file might fail due to legitimate reasons, and if you were writing to that file this might indicate that not all the data you wrote to that file had actually landed in it on the file system. For an explanation, please read the "Notes" section in the close(2) manual.
In other words, just doing something like
fd, err := os.Open("foo.txt") // check err before deferring Close (omitted here for brevity)
defer fd.Close()
is okay for read-only files in 99.9% of cases, but for files opened for writing you might want to implement more involved error checking and some strategy for dealing with failures (mere reporting, wait-then-retry, ask-then-maybe-retry or whatever).
runtime.SetFinalizer(ptr, finalizerFunc) sets a finalizer: not a destructor, but another mechanism to maybe, eventually, free up resources. Read the documentation for details, including the downsides. Finalizers might not run until long after the object is actually unreachable, and they might not run at all if the program exits first. They also postpone freeing the object's memory to a later GC cycle.
If you're acquiring some limited resource that doesn't already have a finalizer, and the program would eventually be unable to continue if it kept leaking, you should consider setting a finalizer. It can mitigate leaks. Unreachable files and network connections are already cleaned up by finalizers in the stdlib, so it's only other sorts of resources where custom ones can be useful. The most obvious class is system resources you acquire through syscall or cgo, but I can imagine others.
Finalizers can help get a resource freed eventually even if the code using it omits a Close() or similar cleanup, but they're too unpredictable to be the main way to free resources. They don't run until GC does. Because the program could exit before next GC, you can't rely on them for things that must be done, like flushing buffered output to the filesystem. If GC does happen, it might not happen soon enough: if a finalizer is responsible for closing network connections, maybe a remote host hits its limit on open connections to you before GC, or your process hits its file-descriptor limit, or you run out of ephemeral ports, or something else. So it's much better to defer and do cleanup right when it's necessary than to use a finalizer and hope it's done soon enough.
You don't see many SetFinalizer calls in everyday Go programming, partly because the most important ones are in the standard library and mostly because of their limited range of applicability in general.
In short, finalizers can help by freeing forgotten resources in long-running programs, but because not much about their behavior is guaranteed, they aren't fit to be your main resource-management mechanism.
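A sketch of a finalizer used strictly as a safety net, assuming a hypothetical handle type; I believe *os.File follows a similar scheme internally. Close remains the primary cleanup path and clears the finalizer with runtime.SetFinalizer(h, nil), while the finalizer only releases handles someone forgot to close, whenever (and if ever) the GC gets to them.

```go
package main

import (
	"fmt"
	"runtime"
)

// handle is a hypothetical wrapper around some non-memory resource.
type handle struct {
	id     int
	closed bool
}

// openHandle installs a finalizer as a safety net only: it may run late,
// or never, so callers are still expected to call Close.
func openHandle(id int) *handle {
	h := &handle{id: id}
	runtime.SetFinalizer(h, func(h *handle) {
		if !h.closed {
			fmt.Println("finalizer: releasing leaked handle", h.id)
			h.release()
		}
	})
	return h
}

func (h *handle) release() { h.closed = true }

// Close releases the resource deterministically and removes the finalizer,
// which is no longer needed. Calling Close twice is a no-op.
func (h *handle) Close() error {
	if h.closed {
		return nil
	}
	h.release()
	runtime.SetFinalizer(h, nil)
	return nil
}

func main() {
	h := openHandle(1)
	defer h.Close()
	fmt.Println("using handle", h.id)
}
```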
There are Finalizers in Go. I wrote a little blog post about it. They are even used for closing files in the standard library as you can see here.
However, I think using defer is preferable because it's more readable and less magical.