Confused about performance implications of Sync - performance

I have a question about the marker trait Sync after reading Extensible Concurrency with the Sync and Send Traits.
Java's "synchronized" means blocking, so I was very confused about how a Rust struct that implements Sync could be efficient when its methods are executed on multiple threads.
I searched but found no meaningful answer. I'm thinking about it this way: every thread gets the struct's reference synchronously (blocking), but then calls the method in parallel. Is that true?

Java: Accesses to this object from multiple threads become a synchronized sequence of actions when going through this codepath.
Rust: It is safe to access this type synchronously through a reference from multiple threads.
(The two points above are not canonical definitions; they are just demonstrations of how similar words can be used in sentences to obtain different meanings.)
synchronized is implemented as a mutual exclusion lock at runtime. Sync is a compile-time promise about runtime properties of a specific type that allows other types to depend on those properties through trait bounds. A Mutex just happens to be one way to provide Sync behavior. Immutable types usually provide this behavior too, without any runtime cost.
Generally you shouldn't rely on words having exactly the same meaning in different contexts. Java IO stream != Java collection stream != RxJava reactive stream ~= tokio Stream. C volatile != Java volatile. And so on.
Ultimately the prose matters a lot more than the keywords, which are just shorthand.

Related

Is there any situation where I have to use PublishSubject instead of PublishRelay?

When fetching data, most people use PublishSubject, but what happens when they use PublishRelay? If an error occurs in the PublishSubject while using the app, isn't it dangerous because the app dies?
You shouldn't be using either when fetching data.
Subjects [and Relays] provide a convenient way to poke around Rx, however they are not recommended for day to day use... Instead of using subjects, favor the factory methods.
The Observable interface is the dominant type that you will be exposed to for representing a sequence of data in motion, and therefore will comprise the core concern for most of your work with Rx...
Avoid the use of the subject types [including Relays]. Rx is effectively a functional programming paradigm. Using subjects means we are now managing state, which is potentially mutating. Dealing with both mutating state and asynchronous programming at the same time is very hard to get right. Furthermore, many of the operators (extension methods) have been carefully written to ensure that correct and consistent lifetime of subscriptions and sequences is maintained; when you introduce subjects, you can break this.
-- Introduction to Rx
Note that URLSession.shared.rx.data(request:) returns an Observable, not a Subject or Relay.

Go destructors?

I know there are no destructors in Go since technically there are no classes. As such, I use initClass to perform the same functions as a constructor. However, is there any way to create something to mimic a destructor in the event of a termination, for the use of, say, closing files? Right now I just call defer deinitClass, but this is rather hackish and I think a poor design. What would be the proper way?
In the Go ecosystem, there exists a ubiquitous idiom for dealing with objects which wrap precious (and/or external) resources: a special method designated for freeing that resource, called explicitly — typically via the defer mechanism.
This special method is typically named Close(), and the user of the object has to call it explicitly when they're done with the resource the object represents. The io standard package does even have a special interface, io.Closer, declaring that single method. Objects implementing I/O on various resources such as TCP sockets, UDP endpoints and files all satisfy io.Closer, and are expected to be explicitly Closed after use.
Calling such a cleanup method is typically done via the defer mechanism which guarantees the method will run no matter if some code which executes after resource acquisition will panic() or not.
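As a minimal sketch of the idiom described above: the `resource` type and the `use` function below are hypothetical names invented for illustration, not part of any library. The sketch shows a type satisfying io.Closer and a deferred Close() that still runs while a panic unwinds.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// resource is a hypothetical stand-in for anything that wraps an external
// resource (a file, a socket, a handle obtained via cgo, ...).
type resource struct {
	name   string
	closed bool
}

// Close releases the resource. Having this method makes *resource
// satisfy io.Closer.
func (r *resource) Close() error {
	if r.closed {
		return errors.New(r.name + ": already closed")
	}
	r.closed = true
	return nil
}

// Compile-time check that *resource implements io.Closer.
var _ io.Closer = (*resource)(nil)

// use defers the cleanup and then optionally panics, to show that the
// deferred Close still runs. It reports whether a panic was recovered.
func use(r *resource, fail bool) (recovered bool) {
	defer func() {
		if v := recover(); v != nil {
			recovered = true
		}
	}()
	defer r.Close() // runs on normal return and while a panic unwinds

	if fail {
		panic("something went wrong while using " + r.name)
	}
	return false
}

func main() {
	ok := &resource{name: "ok"}
	use(ok, false)

	bad := &resource{name: "bad"}
	recovered := use(bad, true)

	fmt.Println(ok.closed, bad.closed, recovered)
}
```

Deferred calls run in last-in, first-out order, so here Close() runs before the recovering function does.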
You might also notice that not having implicit "destructors" quite balances not having implicit "constructors" in Go. This actually has nothing to do with not having "classes" in Go: the language designers just avoid magic as much as practically possible.
Note that Go's approach to this problem might appear somewhat low-tech, but in fact it's the only workable solution for a runtime featuring garbage collection. In a language with objects but without GC, say C++, destructing an object is a well-defined operation because an object is destroyed either when it goes out of scope or when delete is called on its memory block. In a runtime with GC, the object will be destroyed at some mostly indeterminate point in the future by the GC scan, and may not be destroyed at all. So if the object wraps some precious resource, that resource might get reclaimed way past the moment the last live reference to the enclosing object was lost, and it might not get reclaimed at all, as has been well explained by @twotwotwo in their answer.
Another interesting aspect to consider is that Go's GC is fully concurrent (with regular program execution). This means a GC thread which is about to collect a dead object might (and usually will) not be the thread which executed that object's code when it was alive. In turn, this means that if Go types could have destructors, the programmer would need to make sure whatever code the destructor executes is properly synchronized with the rest of the program if the object's state affects some data structures external to it. This might actually force the programmer to add such synchronization even if the object does not need it for its normal operation (and most objects fall into that category). And think about what happens if those external data structures happened to be destroyed before the object's destructor was called (the GC collects dead objects in a non-deterministic way). In other words, it's much easier to control, and to reason about, object destruction when it is explicitly coded into the program's flow: both for specifying when the object has to be destroyed, and for guaranteeing proper ordering of its destruction with regard to the destruction of the data structures external to it.
If you're familiar with .NET, it deals with resource cleanup in a way which resembles that of Go quite closely: your objects which wrap some precious resource have to implement the IDisposable interface, and a method, Dispose(), exported by that interface, must be called explicitly when you're done with such an object. C# provides some syntactic sugar for this use case via the using statement which makes the compiler arrange for calling Dispose() on the object when it goes out of the scope declared by the said statement. In Go, you'll typically defer calls to cleanup methods.
One more note of caution. Go wants you to treat errors very seriously (unlike most mainstream programming languages with their "just throw an exception and don't give a fsck about what happens due to it elsewhere and what state the program will be in" attitude), and so you might consider checking error returns of at least some calls to cleanup methods.
A good example is instances of the os.File type representing files on a filesystem. The fun stuff is that calling Close() on an open file might fail due to legitimate reasons, and if you were writing to that file this might indicate that not all the data you wrote to that file had actually landed in it on the file system. For an explanation, please read the "Notes" section in the close(2) manual.
In other words, just doing something like
fd, err := os.Open("foo.txt")
defer fd.Close()
is okay for read-only files in 99.9% of cases, but for files opened for writing, you might want to implement more involved error checking and some strategy for dealing with the errors (mere reporting, wait-then-retry, ask-then-maybe-retry or whatever).
runtime.SetFinalizer(ptr, finalizerFunc) sets a finalizer--not a destructor but another mechanism to maybe eventually free up resources. Read the documentation there for details, including downsides. They might not run until long after the object is actually unreachable, and they might not run at all if the program exits first. They also postpone freeing memory for another GC cycle.
If you're acquiring some limited resource that doesn't already have a finalizer, and the program would eventually be unable to continue if it kept leaking, you should consider setting a finalizer. It can mitigate leaks. Unreachable files and network connections are already cleaned up by finalizers in the stdlib, so it's only other sorts of resources where custom ones can be useful. The most obvious class is system resources you acquire through syscall or cgo, but I can imagine others.
Finalizers can help get a resource freed eventually even if the code using it omits a Close() or similar cleanup, but they're too unpredictable to be the main way to free resources. They don't run until GC does. Because the program could exit before next GC, you can't rely on them for things that must be done, like flushing buffered output to the filesystem. If GC does happen, it might not happen soon enough: if a finalizer is responsible for closing network connections, maybe a remote host hits its limit on open connections to you before GC, or your process hits its file-descriptor limit, or you run out of ephemeral ports, or something else. So it's much better to defer and do cleanup right when it's necessary than to use a finalizer and hope it's done soon enough.
You don't see many SetFinalizer calls in everyday Go programming, partly because the most important ones are in the standard library and mostly because of their limited range of applicability in general.
In short, finalizers can help by freeing forgotten resources in long-running programs, but because not much about their behavior is guaranteed, they aren't fit to be your main resource-management mechanism.
There are Finalizers in Go. I wrote a little blog post about it. They are even used for closing files in the standard library as you can see here.
However, I think using defer is preferable because it's more readable and less magical.

COM `IStream` interface pointer and access from different threads

Is it an official COM requirement for any IStream implementation that it be thread-safe, in terms of concurrent access to IStream methods through the same interface pointer across threads?
I am not talking about data integrity (normally, reads/writes/seeks should be synchronized with locks anyway). The question is about the need to use COM marshaller to pass IStream object to a thread from different COM apartment.
This is a more general question than I asked about IStream as returned by CreateStreamOnHGlobal, please refer there for more technical details. I'm just trying to understand this stuff better.
EDITED, I have found this info on MSDN:
Thread safety. The stream created by SHCreateMemStream is thread-safe
as of Windows 8. On earlier systems, the stream is not thread-safe.
The stream created by CreateStreamOnHGlobal is thread-safe.
Now I believe, the IStream object returned by CreateStreamOnHGlobal is thread-safe, but there is NO requirement that other IStream implementations should follow this.
No, it isn't. And the accepted answer to the other question is dead wrong. Hans Passant's answer is correct. You should delete this question because it presupposes a falsehood, namely that CreateStreamOnHGlobal returns a thread-safe IStream. It doesn't. You then ask if this is true of other IStream implementations. It isn't.
In computer programming generally, and COM in particular, objects have guarantees they give and guarantees they do not give. If you use an object in conformance with its guarantees, then it will work all the time (barring bugs). If you exceed the guarantees, it may still work most of the time, but this is no longer guaranteed.
Generally in COM, the thread-safety guarantee is given by one of the standard threading models.
See here: http://msdn.microsoft.com/en-us/library/ms809971.aspx
Apartment threaded objects can be instantiated on multiple threads, but can only be used from the particular thread they were instantiated on.
Multi-threaded apartment objects can be instantiated in a multi-threaded apartment and can be used from any of those threads.
"Both"-threaded objects can be instantiated in any thread, and used from any thread.
Note: The threading model belongs to the object, not the interface. Some objects supporting IStream may be single-threaded; others may be fully thread-safe. This depends on the code which implements the interface, because an interface is just a specification, and thread-safety is not something covered by it.
It is always harmless to marshal an interface. If the threading models of the threads are compatible with the object's home thread, you will get the exact same interface pointer back. If they are not compatible, you will get a proxy. But it never hurts to marshal, and unless you know that the objects are compatible, you should always marshal.
However it is always open to an implementer to give additional guarantees.
In the case of CoMarshalInterthreadInterfaceInStream, you are told in the documentation that the returned IStream interface can be used for unmarshalling at the destination thread, using CoUnmarshalInterfaceAndReleaseStream.
That is, you have been given an additional guarantee. So you can rely on that working.
But that does not apply to any other instance of IStream at any time.
So you should always marshal them.

Can shared vals become a bottleneck for performance ? - Scala

We have a piece of code with a static field val format = DateTimeFormatter.forPattern("yyyy-MM-dd")
Now this instance of formatter will be used by concurrent threads to parse and print date format.parseDateTime("2013-09-24") and format.print(instant).
I learnt that in Scala you can write your code without caring for concurrency, provided that you only use immutable fields, but what about the performance ? Can it become a bottleneck if several threads use the same instance ?
Thanks,
Your question is more related to Java. If the formatter returned by forPattern is thread-safe, you can share it between many threads without any bottleneck.
Check the javadoc to see if the implementation is thread-safe. In your specific case, I will assume that you are using the Joda-Time library.
Extract from the DateTimeFormat javadoc:
DateTimeFormat is thread-safe and immutable, and the formatters it returns are as well.
As a counterexample, see the SimpleDateFormat javadoc:
Date formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
Using a val just means that the variable reference will not change after its declaration. See What is the difference between a var and val definition in Scala?
Well, it looks like you have misinterpreted it: concurrency in Scala is very much like in Java (or any other language) and nothing special. It just provides alternative libraries and syntactic sugar to get things done with much less boilerplate and, more importantly, to get them done safely (e.g. Akka).
But the other concerns, like dependency on the number of cores, thread-pool sizes, context switches and so on, will all still have to be handled.
Now for the question of whether an immutable val accessed by multiple threads degrades performance: I don't think there should be any overhead, though I have no data to support that. If anything, performance might be good, as the processor can cache it and the same object can be retrieved faster on another core.

What does it mean when someone says that the module has both behavior and state?

I got a code review saying that my module has behavior and state at the same time. What does that mean?
Isn't that the whole point of object-oriented programming: that instead of operating on data directly with logical circuitry using functions, we choose to operate on these closed black boxes (encapsulation) using a set of neatly designed keys, switches and gears?
Wouldn't such a scheme naturally contain data (state) and logic (behavior) at the same time?
By module I mean : a real Ruby module.
I designed something like this : How to design an application keeping SOLID principles and Design Patterns in mind
and implemented the commands in a module which I used to mixin.
Whatever you are referring to, be it an object defined by a class (or type), a module, or anything else with code in it, state is data that is persisted over multiple calls to the thing. If it "remembers" anything between one execution and the next, then it has state.
Behavior, otoh, is code that manipulates or processes that state-data, or non-state data that is used only during a single execution of the code, (like parameter values passed to a function). Methods, subroutines or functions, anything that changes or does something is behavior.
Most classes, types, or whatever, have both data (state) and behavior, but....
Some classes or types are designed simply to carry data around. They are referred to as Data Transfer Objects (DTOs) or Plain Old CLR Objects (POCOs). They only have state and, generally, have little or no behavior.
Other times, a class or type is constructed to hold general utility functions (like a math library). It will not maintain or keep any state between the many times it is called to perform one of its utilities. The only data used in it is data passed in as parameters for each call to the library function, and that data is discarded when the routine is finished. It has behavior, but no state.
You're right in your thinking that OOP encapsulates the ideas of both behaviour and state and mixes the two things together, but from the wording of your question, I'm wondering if you have written a ruby module (mixin, whatever you want to call it) that is stateful, such that there is the potential for state leakage across multiple uses of the same module.
Without seeing the code in question I can't really give you a full answer.
In Object-Oriented terminology, an object is said to have state when it encapsulates data (attributes, properties) and is said to have behavior when it offers operations (methods, procedures, functions) that operate (create, delete, modify, make calculations) on the data.
The same concepts can be extrapolated to a Ruby module: it has "state" if it defines data accessible within the module, and it has "behavior" in the form of operations it provides that operate on the data.
