There is a value that is read often and written rarely, and I don't want to use a mutex. I got it done like this with unsafe and atomic:
import (
    "sync/atomic"
    "unsafe"
)

type tmp struct{}

var value unsafe.Pointer

func GetValue() *tmp {
    return (*tmp)(atomic.LoadPointer(&value))
}

func SetValue(p *tmp) {
    atomic.StorePointer(&value, unsafe.Pointer(p))
}
Is this thread-safe? And does atomic.StorePointer happen before atomic.LoadPointer?
It will be thread-safe in the sense that you don't know which operation happens first, but the update itself is atomic. Have you considered using a sync.RWMutex instead? It won't block readers unless a write is in progress.
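A minimal sketch of that RWMutex alternative, reusing the tmp type from the question (the mu and current names are made up for this example):

import "sync"

var (
    mu      sync.RWMutex
    current *tmp
)

// GetValue takes only a read lock, so concurrent readers never block each other.
func GetValue() *tmp {
    mu.RLock()
    defer mu.RUnlock()
    return current
}

// SetValue takes the write lock, briefly excluding readers and other writers.
func SetValue(p *tmp) {
    mu.Lock()
    defer mu.Unlock()
    current = p
}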
As far as I know, storing and retrieving a pointer using atomic.LoadPointer and atomic.StorePointer is thread-safe (in the sense that the pointer itself will not be corrupted).
Of course, the pointed-to object must be immutable, and this is not enforced by this mechanism. It is your job to make sure that an update results in a new object being created before calling SetValue.
However, the main issue is that the code relies on unsafe operations. Making it type-safe is difficult, since two pointers actually have to be stored (one for the object, one for its type). Atomic pointer operations are no longer enough to guarantee consistency.
This is why a dedicated type was recently added to the sync/atomic package: the atomic.Value type. It is designed to do exactly what you want (i.e. optimize access to mostly constant data).
You can find an example here: http://golang.org/pkg/sync/atomic/#example_Value_config
If you look at the implementation, you will realize why atomic pointer operations alone are not enough to implement a type-safe mechanism.
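For comparison, here is a sketch of the question's GetValue/SetValue pair rewritten with atomic.Value; it drops the unsafe package entirely:

import "sync/atomic"

var value atomic.Value // always holds a *tmp once set

func GetValue() *tmp {
    v, _ := value.Load().(*tmp) // v stays nil until the first SetValue
    return v
}

func SetValue(p *tmp) {
    value.Store(p) // all stores must use the same concrete type (*tmp here)
}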
According to https://code.google.com/p/go/issues/detail?id=5045:
Regarding Add/CAS, it should be formulated in more general terms, along the lines of: an atomic operation that stores a value (incl. Add/CAS) happens before an atomic operation that reads that value from the memory location (incl. Add/CAS).
So I think we can use atomic operations to ensure it is thread-safe.
The code below does not trigger a data race:
package main

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    x := strings.Repeat(" ", 1024)
    go func() {
        for {
            fmt.Fprintf(os.Stdout, x+"aa\n")
        }
    }()
    go func() {
        for {
            fmt.Fprintf(os.Stdout, x+"bb\n")
        }
    }()
    go func() {
        for {
            fmt.Fprintf(os.Stdout, x+"cc\n")
        }
    }()
    go func() {
        for {
            fmt.Fprintf(os.Stdout, x+"dd\n")
        }
    }()
    <-make(chan bool)
}
I tried multiple lengths of data with this variant: https://play.golang.org/p/29Cnwqj5K30
This post says it is not thread-safe.
This mailing-list message does not really answer the question, or I did not understand it.
The package documentation of os and fmt doesn't mention much about this. I admit I did not dig into the source code of those two packages to find further explanations; they appear too complex to me.
What are the recommendations, and what are their references?
I'm not sure it would qualify as a definitive answer, but I'll try to provide some insight.
The F*-functions of the fmt package merely state that they take a value of a type implementing the io.Writer interface and call Write on it.
The functions themselves are safe for concurrent use, in the sense that it's OK to call any number of fmt.Fprint* functions concurrently: the package itself is prepared for that.
But when it comes to concurrently writing to the same value of a type implementing io.Writer, the question becomes more complex, because implementing an interface in Go states nothing about the concrete type's behaviour concurrency-wise.
In other words, the real point where concurrency may or may not be allowed is deferred to the "writer" which the functions of fmt write to.
(One should also keep in mind that the fmt.*Print* functions are allowed to call Write on their destination any number of times during a single invocation, in a row, as opposed to those provided by the stock log package.)
So, we basically have two cases:
Custom implementations of io.Writer.
Stock implementations of it, such as *os.File or the wrappers around sockets produced by the functions of the net package.
The first case is the simple one: whatever the implementor did.
The second case is harder: as I understand it, the Go standard library's stance on this (albeit not clearly stated in the docs) is that the wrappers it provides around "things" supplied by the OS, such as file descriptors and sockets, are reasonably "thin", and hence whatever semantics they implement is transitively implemented by the stdlib code running on a particular system.
For instance, POSIX requires that write(2) calls are atomic with regard to one another when they are operating on regular files or symbolic links. This means, since any call to Write on things wrapping file descriptors or sockets actually results in a single "write" syscall of the target system, you might consult the docs of the target OS and get the idea of what will happen.
Note that POSIX only tells about filesystem objects, and if os.Stdout is opened to a terminal (or a pseudo-terminal) or to a pipe or to anything else which supports the write(2) syscall, the results will depend on what the relevant subsystem and/or the driver implement—for instance, data from multiple concurrent calls may be interspersed, or one of the calls, or both, may just be failed by the OS—unlikely, but still.
Going back to Go, from what I gather, the following facts hold true about the Go stdlib types which wrap file descriptors and sockets:
They are safe for concurrent use by themselves (I mean, on the Go level).
They "map" Write and Read calls 1-to-1 to the underlying object—that is, a Write call is never split into two or more underlying syscalls, and a Read call never returns data "glued" from the results of multiple underlying syscalls.
(By the way, people occasionally get tripped up by this no-frills behaviour; see this or this for examples.)
So, basically, when we combine this with the fact that fmt.*Print* functions are free to call Write any number of times per single invocation, your example, which uses os.Stdout, will:
Never result in a data race, unless you've assigned the os.Stdout variable some custom implementation, but
Produce data on the underlying FD that is intermixed in an unpredictable order, which may depend on many factors, including the OS kernel version and settings, the version of Go used to build the program, the hardware, and the load on the system.
TL;DR
Multiple concurrent calls to fmt.Fprint* writing to the same "writer" value defer their concurrency to the implementation (type) of the "writer".
It's impossible to have a data race with "file-like" objects provided by the Go stdlib in the setup you have presented in your question.
The real problem will be not with data races on the Go program level but with the concurrent access to a single resource happening on the level of the OS. And there, we do not (usually) speak about data races, because the commodity OSes Go supports expose things one may "write to" as abstractions, where a real data race would possibly indicate a bug in the kernel or in the driver (and Go's race detector won't be able to detect it anyway, as that memory would not be owned by the Go runtime powering the process).
Basically, in your case, if you need to be sure that the data produced by any particular call to fmt.Fprint* comes out as a single contiguous piece to the actual data receiver provided by the OS, you need to serialize these calls, since the fmt package provides no guarantees regarding the number of calls to Write on the supplied "writer" for the functions it exports.
The serialization may either be external, i.e. explicit ("take a lock, call fmt.Fprint*, release the lock"), or internal (wrap os.Stdout in a custom type that manages a lock, and write to that instead).
And while we're at it, the log package does just that, and it can be used straight away, as the "loggers" it provides, including the default one, allow you to inhibit the output of "log headers" (such as the timestamp and the name of the file).
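For illustration, here is a minimal sketch of the internal variant; the lockedWriter type is invented for this example, and it holds the lock around the whole fmt.Fprintf call so each message reaches os.Stdout as one contiguous piece:

package main

import (
    "fmt"
    "io"
    "os"
    "sync"
)

// lockedWriter serializes whole formatted writes to the wrapped writer.
type lockedWriter struct {
    mu sync.Mutex
    w  io.Writer
}

func (lw *lockedWriter) Printf(format string, args ...interface{}) {
    lw.mu.Lock()
    defer lw.mu.Unlock()
    fmt.Fprintf(lw.w, format, args...)
}

func main() {
    out := &lockedWriter{w: os.Stdout}
    done := make(chan struct{})
    go func() {
        out.Printf("from the goroutine\n")
        close(done)
    }()
    out.Printf("from main\n")
    <-done
}

Alternatively, log.New(os.Stdout, "", 0) gives you a ready-made serialized writer with the "log headers" suppressed.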
I have a question about the marker trait Sync after reading Extensible Concurrency with the Sync and Send Traits.
Java's "synchronize" means blocking, so I was very confused about how a Rust struct with Sync implemented whose method is executed on multiple threads would be effective.
I searched but found no meaningful answer. The way I'm thinking about it is: every thread will get the struct's reference synchronously (blocking), but call the method in parallel. Is that true?
Java: Accesses to this object from multiple threads become a synchronized sequence of actions when going through this codepath.
Rust: It is safe to access this type synchronously through a reference from multiple threads.
(The two points above are not canonical definitions, they are just demonstrations how similar words can be used in sentences to obtain different meanings)
synchronized is implemented as a mutual-exclusion lock at runtime. Sync is a compile-time promise about the runtime properties of a specific type that allows other types to depend on those properties through trait bounds. A Mutex just happens to be one way to provide Sync behavior; immutable types usually provide it too, without any runtime cost.
Generally you shouldn't rely on words having exactly the same meaning in different contexts. Java IO stream != java collection stream != RxJava reactive stream ~= tokio Stream. C volatile != java volatile. etc. etc.
Ultimately the prose matters a lot more than the keywords, which are just shorthands.
I want to know exactly what can happen when Go maps are accessed by multiple goroutines. Let's assume we have a map[int]*User. Can modifying fields of the User structure from multiple goroutines cause data corruption? Or are only operations like len() not thread-safe? What would be different if maps were thread-safe in Go?
Concurrently modifying the *User could cause corruption regardless of the map. Reading the pointer from the map concurrently is safe, as long as there are no modifications to the map. Modifying the data *User points to makes no changes to the map itself.
Concurrently modifying the map[int]*User itself also risks data corruption.
There are no benign data races; always test your code with the race detector.
Simplest example:
go WorkerMethodOne(myMapReference)
go WorkerMethodTwo(myMapReference)
In worker method one I have some code like this (example):
for i := 0; i < len(myMapReference); i++ {
    if i%2 == 0 {
        delete(myMapReference, i)
    }
}
Then when WorkerMethodTwo is iterating over that same map and tries to access the item that just got deleted, what happens? While v, ok := myMapReference[i] may still be safe, unlike in many languages where you'd throw an exception, the result doesn't make sense and is unpredictable. Worse things can happen too, such as attempts to concurrently write to the value of some *User: you could get concurrent modification of the actual value (what's at the pointer), or have the pointer pulled out from under you and silently end up working with a value different from the one you expected, etc. It's really no different from making two closures run as goroutines and modifying a non-atomic int without a mutex. You don't know what's going to happen, since there is contention for that memory between two fully decoupled executions.
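If you do need shared access, a common fix is to hide both the map and its values behind a lock. A minimal sketch (the UserStore type is made up for this example):

import "sync"

type User struct {
    Name string
}

type UserStore struct {
    mu    sync.RWMutex
    users map[int]*User
}

// Get takes a read lock, so many readers can proceed in parallel.
func (s *UserStore) Get(id int) (*User, bool) {
    s.mu.RLock()
    defer s.mu.RUnlock()
    u, ok := s.users[id]
    return u, ok
}

// Delete takes the write lock, excluding all readers and writers.
func (s *UserStore) Delete(id int) {
    s.mu.Lock()
    defer s.mu.Unlock()
    delete(s.users, id)
}

Note that handing out *User pointers still lets callers mutate the shared User concurrently; either copy the struct on Get or protect the User fields with their own lock.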
Question 1
I am building/searching for an in-RAM cache layer for my server. It is a simple LRU cache that needs to handle concurrent requests (both Gets and Sets).
I have found https://github.com/pmylund/go-cache claiming to be thread safe.
This is true as far as getting the stored interface goes. But if multiple goroutines request the same data, they all retrieve a pointer (stored in the interface) to the same block of memory. If any goroutine changes the data, this is no longer very safe.
Are there any cache packages out there that tackle this problem?
Question 1.1
If the answer to Question 1 is No, then what would be the suggested solution?
I see two options:
Alternative 1
Solution: Storing the values in a wrapping struct with a sync.Mutex so that each goroutine needs to lock the data before reading/writing to it.
type cacheElement struct {
    value interface{}
    lock  sync.Mutex
}
Drawbacks: The cache becomes unaware of changes made to the data, or might even have dropped it out of the cache. One goroutine might also block the others.
Alternative 2
Solution: Make a copy of the data (assuming the data in itself doesn't contain pointers)
Drawbacks: Memory allocation every time a cache Get is performed, more garbage collection.
Sorry for the multipart question. But you don't have to answer all of them. If you have a good answer to Question 1, that would be sufficient for me!
Alternative 2 sounds good to me, but please note that you do not have to copy the data for each cache.Get(). As long as your data can be considered immutable, you can access it from many readers at once.
You only have to create a copy if you intend to modify it. This idiom is called COW (copy on write) and is quite common in concurrent software design. It's especially well suited for scenarios with a high read/write ratio (just like a cache).
So, whenever you want to modify a cached entry, you basically have to:
create a copy of the old cached data, if any.
modify the data (after this step, the data should be considered immutable and must not be changed anymore)
add / replace the existing element in the cache. You could either use the go-cache library you have pointed out earlier (which is based on locks) for that, or write your own lock-free library that simply swaps the pointers to the data element atomically.
At this point, any goroutine that performs a cache.Get operation will get the new data. Existing goroutines, however, might still be reading the old data. So your program might operate on many different versions of the same data at once. But don't worry: as soon as all goroutines have finished accessing the old data, the GC will collect it automatically.
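A minimal sketch of that copy-on-write flow using atomic.Value for the swap (the Config type and the function names are invented for this example; concurrent writers would still need their own serialization, e.g. a mutex around Update):

import "sync/atomic"

type Config struct {
    Timeout int
}

var cached atomic.Value // always holds a *Config

func init() {
    cached.Store(&Config{Timeout: 30})
}

// Get never blocks: readers just load the current immutable snapshot.
func Get() *Config {
    return cached.Load().(*Config)
}

// Update copies the old snapshot, modifies the copy, and swaps it in.
// Goroutines still holding the old *Config keep reading it safely; the
// GC reclaims it once the last reader lets go.
func Update(timeout int) {
    next := *Get() // shallow copy of the current snapshot
    next.Timeout = timeout
    cached.Store(&next)
}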
tux21b gave a good answer. I'll just point out that you don't have to return pointers to data. You can store non-pointer values in your cache, and Go will pass them by value, which makes a copy. Then your Get and Set methods will be safe, since nothing can actually modify the cache contents.
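A short sketch of that value-based approach (the valueCache type is invented here); every Get hands the caller an independent copy, so mutating it cannot corrupt the cache:

import "sync"

type user struct {
    Name string
}

type valueCache struct {
    mu    sync.Mutex
    items map[int]user // values, not pointers
}

func (c *valueCache) Get(id int) (user, bool) {
    c.mu.Lock()
    defer c.mu.Unlock()
    u, ok := c.items[id] // u is a copy of the stored value
    return u, ok
}

func (c *valueCache) Set(id int, u user) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.items[id] = u
}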
I've been reading some books on Windows programming in C++ lately, and I have some confusion about a few recurring concepts in the WinAPI. For example, there are tons of data types that start with 'H' (for handle); are these supposed to be used like pointers? But then there are other data types that start with 'P' (for pointer), so I guess not. Then what is a handle, exactly? And why were pointers to some data types given separate type names in the first place? For example, PCHAR could easily have been defined as CHAR*.
Handles used to be pointers in early versions of Windows, but they are not anymore. Think of them as a "cookie", a unique value that allows Windows to find a resource that was allocated earlier. For example, CreateFile() returns a new handle; you later use it in SetFilePointer() and ReadFile() to read data from that same file, and in CloseHandle() to clean up the internal data structure, closing the file as well. That is the general pattern: one API function to create the resource, one or more to use it, and one to destroy it.
Yes, the types that start with P are pointer types. And yes, they are superfluous; it works just as well if you write the * yourself. I'm not actually sure why C programmers like to declare them; I personally think they reduce code readability, and I always avoid them. But do note the compound types, like LPCWSTR, a "long pointer to a constant wide string". The L doesn't mean anything anymore; it dates back to the 16-bit versions of Windows. But pointer, const, and wide are important. I do use that typedef; not doing so risks future portability problems, which is the core reason these typedefs exist.
A handle is the same as a pointer only insofar as both identify a particular item. A pointer is obviously the address of the item, so if you know the item's structure you can start accessing its fields. A handle may or may not be a pointer; even if it is one underneath, you don't know what it points to, so you can't get at the fields.
The best way to think of a handle is as a unique ID for something in the system. When you pass it to the system, the system knows what to cast it to (if it is a pointer) or how to treat it (if it is just some ID or index).