Promise or Future in ruby-concurrency? Which should be used when? - ruby

In general, promises and futures are closely related, if not synonyms. The documentation of the gem (https://github.com/ruby-concurrency) says so too. Therefore, it's confusing which one should be used in which circumstances. And how are they related in the gem? Is one more low-level or more outdated than the other? Do they basically both do the same thing?
Also, the existence of these makes it even more confusing:
* Concurrent::Promises.future
* Concurrent::Promises::Future
* Concurrent::Future
* [...possibly something similar...]

In short, Future is about asynchronous execution. It's a high-level abstraction over system threads (Ruby's Thread class). When you need to speed up and parallelize some computation, you need threads; but threads are a low-level mechanism, so it's easier to use futures instead. One of the benefits is that implementations of futures usually use a thread pool, so you can manage the level of concurrency and system resource consumption. So the problem Future solves is the complexity of the low-level thread concurrency model.
On the other hand, Promise is about architecture and composition, not about concurrency. It's a high-level abstraction over callbacks and a subtype of the Observer pattern. It lets you decouple the component that produces a result from the component that consumes it. Producer and consumer may know nothing about each other. So the problem Promise solves is the complexity and coupling you get with the callback approach.
So, regarding concurrent-ruby: it provides classic versions of both Promise and Future. But it now also provides a new API - Promises (with an s at the end) - which looks like a combination of promises and futures under a unified API. It is implemented on top of (actually reuses) the Promise and Future libraries but provides a new interface over them. For instance, a future behaves similarly to a promise: it allows registering callbacks and can be chained like promises.
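To make the future/promise duality concrete without tying it to the gem's exact API, here is a minimal Python sketch using the stdlib concurrent.futures (the names differ from concurrent-ruby, but the concepts map directly): submitting to a pool is the "future" side (async execution on managed threads), and the done-callback is the "promise" side (decoupled producer and consumer):

```python
from concurrent.futures import ThreadPoolExecutor

results = []

# "Future" side: asynchronous execution on a pool-managed thread.
with ThreadPoolExecutor(max_workers=2) as pool:
    fut = pool.submit(lambda: 21 * 2)
    # "Promise" side: decouple producer from consumer with a callback
    # instead of blocking on fut.result().
    fut.add_done_callback(lambda f: results.append(f.result()))
# Leaving the with-block waits for the pool to shut down,
# so by this point the callback has run and results == [42].
```

The consumer never touches the thread that produced the value; it only reacts when the value is ready.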

Related

Does it make sense to use actor/agent oriented programming in Function as a Service environment?

I am wondering whether it is possible to apply an agent/actor library (Akka, Orbit, Quasar, JADE, Reactors.io) in a Function as a Service environment (OpenWhisk, AWS Lambda)?
Does it make sense?
If yes, what is a minimal example that presents the added value (that is missing when we use only FaaS or only an actor/agent library)?
If no, can we construct a decision graph that helps us decide whether we should use an actor/agent library or FaaS (or something else) for our problem?
This is more of an opinion-based question, but I think that in its current shape there's no sense in putting actors into FaaS - the opposite actually works quite well: OpenWhisk is implemented on top of Akka.
There are several reasons:
FaaS in its current form is inherently stateless, which greatly simplifies things like request routing. Actors are stateful by nature.
From my experience, FaaS functions are usually disjointed - of course you need some external resources, but this is the mental model: generic resources and capabilities. In actor models we tend to think in terms of particular entities represented as actors, i.e. the user Max rather than a table of users. (I'm not covering here the case of using actors solely as a unit of concurrency.)
FaaS applications have a very short lifespan - this is one of their founding principles. Since creation, placement, and state recovery for more complex actors may take a while, and you usually need a lot of them to perform a single task, you may end up at a point where restoring the state of the system takes more time than performing the task that state is needed for.
That being said, it's possible that in the future those two approaches will eventually converge, but that needs to be accompanied by changes in both the mental and the infrastructural model (i.e. actors live in a runtime, which FaaS must be aware of). IMO setting up existing actor frameworks on top of existing FaaS providers is not feasible at this point.

Difference between Java 8 streams and RxJava observables

Are Java 8 streams similar to RxJava observables?
Java 8 stream definition:
Classes in the new java.util.stream package provide a Stream API to
support functional-style operations on streams of elements.
Short answer
All sequence/stream processing libraries offer very similar APIs for pipeline building. The differences are in the APIs for handling multi-threading and composition of pipelines.
Long answer
RxJava is quite different from Stream. Of all JDK things, the closest to rx.Observable is perhaps the Stream + CompletableFuture combo (which comes at the cost of dealing with an extra monad layer, i.e. having to handle conversion between Stream&lt;CompletableFuture&lt;T&gt;&gt; and CompletableFuture&lt;Stream&lt;T&gt;&gt;).
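That monad-layer conversion can be sketched in Python, with asyncio standing in for CompletableFuture: asyncio.gather flips a collection of awaitables into a single awaitable of a collection:

```python
import asyncio

# Each call produces one awaitable - analogous to one CompletableFuture<T>.
async def square(x):
    return x * x

async def main():
    futures = [square(x) for x in [1, 2, 3]]   # a collection of awaitables
    # gather() flips it into one awaitable of a collection -
    # the Stream<CompletableFuture<T>> -> CompletableFuture<Stream<T>> move.
    return await asyncio.gather(*futures)

squares = asyncio.run(main())
```

It is exactly this flipping back and forth that the answer calls the cost of the extra monad layer.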
There are significant differences between Observable and Stream:
Streams are pull-based, Observables are push-based. This may sound too abstract, but it has significant consequences that are very concrete.
Stream can only be used once, Observable can be subscribed to many times.
Stream#parallel() splits the sequence into partitions; Observable#subscribeOn() and Observable#observeOn() do not. It is tricky to emulate Stream#parallel() behavior with Observable; it once had a .parallel() method, but this method caused so much confusion that .parallel() support was moved to a separate repository: ReactiveX/RxJavaParallel (Experimental Parallel Extensions for RxJava). More details are in another answer.
Stream#parallel() does not let you specify a thread pool to use, unlike most RxJava methods, which accept an optional Scheduler. Since all stream instances in a JVM use the same fork-join pool, adding .parallel() can accidentally affect the behaviour of another module of your program.
Streams are lacking time-related operations like Observable#interval(), Observable#window() and many others; this is mostly because Streams are pull-based, and upstream has no control on when to emit next element downstream.
Streams offer a restricted set of operations in comparison with RxJava. E.g. Streams lack cut-off operations (takeWhile(), takeUntil()); the workaround using Stream#anyMatch() is limited: it is a terminal operation, so you can't use it more than once per stream.
As of JDK 8, there's no Stream#zip() operation, which is quite useful sometimes.
Streams are hard to construct by yourself, while an Observable can be constructed in many ways. EDIT: As noted in the comments, there are ways to construct a Stream. However, since there's no non-terminal short-circuiting, you can't, e.g., easily generate a Stream of lines in a file (though the JDK provides Files#lines() and BufferedReader#lines() out of the box, and other similar scenarios can be managed by constructing a Stream from an Iterator).
Observable offers a resource management facility (Observable#using()); you can wrap an IO stream or a mutex with it and be sure that the user will not forget to free the resource - it is disposed automatically on subscription termination. Stream has an onClose(Runnable) method, but you have to call it manually or via try-with-resources. E.g. you have to keep in mind that Files#lines() must be enclosed in a try-with-resources block.
Observables are synchronized all the way through (I didn't actually check whether the same is true for Streams). This spares you from thinking about whether basic operations are thread-safe (the answer is always 'yes', unless there's a bug), but the concurrency-related overhead will be there whether or not your code needs it.
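The pull/push distinction from the first point above can be sketched in a few lines of Python (illustrative only; the real APIs are far richer): a generator is pull-based like Stream, while a tiny home-made observable is push-based:

```python
# Pull: the consumer drives, asking for each next element (like Stream).
def source():
    for d in [1, 2, 3]:
        yield d

pulled = list(source())          # nothing happens until the consumer pulls

# Push: the producer drives, notifying subscribers (like Observable).
class Observable:
    def __init__(self):
        self._subscribers = []

    def subscribe(self, on_next):
        self._subscribers.append(on_next)

    def emit(self, value):
        for on_next in self._subscribers:
            on_next(value)       # consumers are called when values arrive

pushed = []
obs = Observable()
obs.subscribe(pushed.append)
for d in [1, 2, 3]:
    obs.emit(d)                  # the producer decides when this happens
```

Both end up with the same values, but with pull the consumer controls timing, and with push the producer does - which is why time-based operators only make sense on the push side.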
Round-up
RxJava differs from Streams significantly. Real RxJava alternatives are other implementations of ReactiveStreams, e. g. relevant part of Akka.
Update
There's a trick to use a non-default fork-join pool for Stream#parallel(); see Custom thread pool in Java 8 parallel stream.
Update
All of the above is based on the experience with RxJava 1.x. Now that RxJava 2.x is here, this answer may be out-of-date.
Java 8 Stream and RxJava look pretty similar. They have look-alike operators (filter, map, flatMap, ...) but are not built for the same usage.
You can perform asynchronous tasks using RxJava.
With a Java 8 stream, you'll traverse the items of your collection.
You can do pretty much the same thing in RxJava (traverse the items of a collection), but, as RxJava is focused on concurrent tasks, it uses synchronization, latches, and so on. So the same task may be slower with RxJava than with a Java 8 stream.
RxJava can be compared to CompletableFuture, except that it can compute more than just one value.
There are a few technical and conceptional differences, for example, Java 8 streams are single use, pull based, synchronous sequences of values whereas RxJava Observables are re-observable, adaptively push-pull based, potentially asynchronous sequences of values. RxJava is aimed at Java 6+ and works on Android as well.
Java 8 Streams are pull based. You iterate over a Java 8 stream consuming each item. And it could be an endless stream.
An RxJava Observable is by default push based. You subscribe to an Observable and you will get notified when the next item arrives (onNext), when the stream completes (onCompleted), or when an error occurs (onError).
Because with Observable you receive onNext, onCompleted, and onError events, you can do some powerful things, like combining different Observables into a new one (zip, merge, concat). Other things you can do include caching, throttling, ...
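A minimal sketch of that three-event protocol in Python (the names on_next/on_completed/on_error mirror the RxJava callbacks but are otherwise illustrative):

```python
# Push a sequence at a subscriber: one on_next per item, then exactly
# one terminal event - on_completed on success, on_error on failure.
def subscribe(values, on_next, on_completed, on_error):
    try:
        for v in values:
            on_next(v)        # one notification per item
    except Exception as e:
        on_error(e)           # terminal: something went wrong
    else:
        on_completed()        # terminal: the stream is exhausted

events = []
subscribe([1, 2],
          on_next=events.append,
          on_completed=lambda: events.append("done"),
          on_error=lambda e: events.append("error"))
```

Operators like zip and merge are just functions that subscribe to several such sources and forward combined notifications to their own subscribers.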
And it uses more or less the same API in different languages (RxJava, RX in C#, RxJS, ...)
By default RxJava is single threaded. Unless you start using Schedulers, everything will happen on the same thread.
The existing answers are comprehensive and correct, but a clear example for beginners is lacking. Allow me to put something concrete behind terms like "push/pull-based" and "re-observable". Note: I hate the term Observable (it's a stream, for heaven's sake), so I will simply refer to J8 vs RX streams.
Consider a list of integers,
digits = [1,2,3,4,5]
A J8 Stream is a utility to modify the collection. For example, the even digits can be extracted as,
evens = digits.stream().filter(x -> x % 2 == 0).collect(Collectors.toList())
This is basically Python's map, filter, reduce - a very nice (and long overdue) addition to Java. But what if the digits weren't collected ahead of time - what if they were streaming in while the app was running? Could we filter the evens in real time?
Imagine a separate thread is outputting integers at random times while the app is running (--- denotes time passing):
digits = 12345---6------7--8--9-10--------11--12
In RX, even can react to each new digit and apply the filter in real time:
even = -2-4-----6---------8----10------------12
There's no need to store input and output lists. If you want an output list, no problem that's streamable too. In fact, everything is a stream.
evens_stored = even.collect()
This is why terms like "stateless" and "functional" are more associated with RX.
RxJava is also closely related to the reactive streams initiative and considers itself a simple implementation of the reactive streams API (e.g. compared to the Akka Streams implementation). The main difference is that reactive streams are designed to be able to handle back pressure. If you have a look at the reactive streams page, you will get the idea: they describe their goals pretty well, and the streams are also closely related to the reactive manifesto.
The Java 8 streams are pretty much the implementation of an unbounded collection, pretty similar to the Scala Stream or the Clojure lazy seq.
Java 8 Streams enable processing of really large collections efficiently, while leveraging multicore architectures. In contrast, RxJava is single-threaded by default (without Schedulers). So RxJava won't take advantage of multi-core machines unless you code that logic yourself.

Event Scheduling in Twisted

This is a fairly basic question, but I am new to Twisted. If the reactor loop encounters two callLaters with the exact same timeout value and also encounters an incoming packet, how will it schedule the three?
The callLaters would fire in the order that you registered them. The packet arrival could fire before or after the callLaters depending on the point of execution in the event loop when the packet arrives.
There is no definitive rule here. Different reactors may implement different strategies. In general these implementations are somewhat ad-hoc and not particularly well designed, but there isn't a lot of motivation to fix them, because most applications with deep ordering dependencies on different event sources are actually just buggy, and should be fixed not to care what order these fundamentally non-deterministic events arrive in.
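Twisted's reactors differ in detail, as the answer above notes, but the same experiment is easy to run against Python's asyncio event loop, which breaks ties between equal deadlines by registration order:

```python
import asyncio

order = []

async def main():
    loop = asyncio.get_running_loop()
    # Two timers with the exact same delay: asyncio's timer heap breaks
    # ties between equal deadlines by registration order.
    loop.call_later(0.01, order.append, "first")
    loop.call_later(0.01, order.append, "second")
    await asyncio.sleep(0.05)   # let both timers fire

asyncio.run(main())
```

The relative ordering against an incoming packet, however, is exactly the kind of thing the answer warns you not to depend on - it varies with where the loop is when the packet arrives.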

Is it possible to create thread-safe collections without locks?

This is purely a for-interest question; any sort of answer is welcome.
So, is it possible to create thread-safe collections without any locks? By locks I mean any thread synchronization mechanism, including Mutex, Semaphore, and even Interlocked - all of them. Is it possible at user level, without calling system functions? OK, maybe the implementation would not be efficient; I am interested in the theoretical possibility. If not, what is the minimum needed to do it?
EDIT: Why immutable collections don't work.
Think of a class Stack with a method Add that returns another Stack.
Now here is program:
Stack stack = new ...;

ThreadedMethod()
{
    loop
    {
        // Do the loop
        stack = stack.Add(element);
    }
}
The expression stack = stack.Add(element) is not atomic, and another thread can overwrite the new stack.
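The standard lock-free fix for exactly this race is a compare-and-swap retry loop over the immutable stack (the Treiber stack). A Python sketch - note that the Ref class here only emulates an atomic reference with a lock, because Python's stdlib exposes no hardware CAS; a real lock-free implementation would use something like Java's AtomicReference.compareAndSet or .NET's Interlocked.CompareExchange:

```python
import threading

class Ref:
    """Emulated atomic reference (a lock stands in for a CAS instruction)."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

stack = Ref(())  # immutable stack encoded as nested tuples

def push(element):
    while True:                       # CAS retry loop
        old = stack.get()
        if stack.compare_and_swap(old, (element, old)):
            return                    # no one interleaved; the update took
        # another thread won the race; re-read the top and try again

push(1)
push(2)
```

If another thread replaces the stack between the get() and the CAS, the CAS fails and the loop simply retries against the new top - no update is ever lost, which is precisely what the bare `stack = stack.Add(element)` cannot guarantee.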
Thanks,
Andrey
There seem to be misconceptions, even among guru software developers, about what constitutes a lock.
One has to make a distinction between atomic operations and locks. Atomic operations like compare-and-swap perform an operation (which would otherwise require two or more instructions) as a single uninterruptible instruction. Locks are built from atomic operations; however, they can result in threads busy-waiting or sleeping until the lock is unlocked.
In most cases, if you manage to implement a parallel algorithm with atomic operations without resorting to locking, you will find that it is orders of magnitude faster. This is why there is so much interest in wait-free and lock-free algorithms.
There has been a ton of research done on implementing various wait-free data structures. While the code tends to be short, they can be notoriously hard to prove correct due to the subtle race conditions that arise. Debugging is also a nightmare. However, a lot of work has been done, and you can find wait-free/lock-free hashmaps, queues (the Michael-Scott lock-free queue), stacks, lists, trees - the list goes on. If you're lucky you'll also find some open-source implementations.
Just google 'lock-free my-data-structure' and see what you get.
For further reading on this interesting subject start from The Art of Multiprocessor Programming by Maurice Herlihy.
Yes, immutable collections! :)
Yes, it is possible to do concurrency without any support from the system. You can use Peterson's algorithm or the more general bakery algorithm to emulate a lock.
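A sketch of Peterson's algorithm in Python - only plain loads and stores, no lock primitives. (On real multicore hardware this additionally needs memory barriers to be correct; CPython's GIL happens to keep this demo sequentially consistent enough to work, so treat it as an illustration of the algorithm, not production code.)

```python
import sys
import threading

sys.setswitchinterval(0.0005)  # switch threads often so the demo runs fast

# Shared state: an intent flag per thread and a tie-breaking turn variable.
flag = [False, False]
turn = 0
counter = 0
N = 500

def worker(me):
    global turn, counter
    other = 1 - me
    for _ in range(N):
        flag[me] = True            # announce intent to enter
        turn = other               # politely yield the tie-break
        while flag[other] and turn == other:
            pass                   # busy-wait while the other is inside
        counter += 1               # critical section (a racy increment
                                   # otherwise, protected here by Peterson)
        flag[me] = False           # leave the critical section

threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With mutual exclusion working, the two threads' 500 increments each never step on one another, so counter ends at exactly 1000.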
It really depends on how you define the term (as other commenters have discussed) but yes, it's possible for many data structures, at least, to be implemented in a non-blocking way (without the use of traditional mutual-exclusion locks).
I strongly recommend, if you're interested in the topic, reading the blog of Cliff Click - Cliff is the head guru at Azul Systems, who produce hardware plus a custom JVM to run Java systems on massive and massively parallel machines (think up to around 1000 cores and hundreds of gigabytes of RAM), where locking can obviously be death (disclaimer: not an employee or customer of Azul, just an admirer of their work).
Dr Click has famously come up with a non-blocking HashTable, which is basically a complex (but quite brilliant) state machine using atomic CompareAndSwap operations.
There is a two-part blog post describing the algorithm (part one, part two) as well as a talk given at Google (slides, video) - the latter in particular is a fantastic introduction. It took me a few goes to 'get' it - it's complex, let's face it! - but if you persevere (or if you're smarter than me in the first place!) you'll find it very rewarding.
I don't think so.
The problem is that at some point you will need a mutual exclusion primitive (perhaps at the machine level), such as an atomic test-and-set operation. Otherwise you could always devise a race condition. Once you have test-and-set, you essentially have a lock.
That being said, on older hardware that did not have any such support in the instruction set, you could disable interrupts and thus prevent another "process" from taking over, though this effectively puts the system into a serialized mode and forces a sort of mutual exclusion for a while.
At the very least you need atomic operations. There are lock-free algorithms for single CPUs. I'm not sure about multiple CPUs.

Actor model to replace the threading model?

I read a chapter in a book (Seven Languages in Seven Weeks by Bruce A. Tate) quoting Matz (inventor of Ruby) as saying 'I would remove the thread and add actors, or some other more advanced concurrency features'.
Why and how can an actor model be an advanced concurrency model that replaces threading?
What other models are 'advanced concurrency models'?
It's not so much that the actor model will replace threads; at the level of the CPU, processes will still have multiple threads which are scheduled and run on the processor cores. The idea of actors is to replace this underlying complexity with a model which, its proponents argue, makes it easier for programmers to write reliable code.
The idea of actors is to have separate threads of control (processes in Erlang parlance) which communicate exclusively by message passing. A more traditional programming model would be to share memory, and coordinate communication between threads using mutexes. This still happens under the surface in the actor model, but the details are abstracted away, and the programmer is given reliable primitives based on message passing.
One important point is that actors do not necessarily map 1-1 to threads -- in the case of Erlang, they definitely don't -- there would normally be many Erlang processes per kernel thread. So there has to be a scheduler which assigns actors to threads, and this detail is also abstracted away from the application programmer.
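The mailbox idea from the paragraphs above can be sketched with stdlib threads and queues (a toy, ignoring the scheduling, supervision, and M:N mapping that real actor runtimes provide):

```python
import queue
import threading

# Minimal actor: private state, a mailbox, and one thread draining it.
class CounterActor:
    def __init__(self):
        self.mailbox = queue.Queue()
        self.count = 0                          # touched only by the actor thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            msg = self.mailbox.get()            # messages are consumed in order
            if msg == "stop":
                return
            self.count += msg                   # no locks needed: single consumer

    def send(self, msg):
        self.mailbox.put(msg)                   # the only way to affect the actor

    def join(self):
        self._thread.join()

actor = CounterActor()
for _ in range(100):
    actor.send(1)                               # any thread may send
actor.send("stop")
actor.join()
```

The thread-safe queue is the single synchronization point; everything behind it is plain sequential code, which is the reliability argument the answer makes.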
If you're interested in the actor model, you might want to take look at the way it works in Erlang or Scala.
If you're interested in other types of new concurrency hotness, you might want to look at software transactional memory, a different approach that can be found in clojure and haskell.
It bears mentioning that many of the more aggressive attempts at creating advanced concurrency models appear to be happening in functional languages. Possibly due to the belief (I drink some of this kool-aid myself) that immutability makes concurrency much easier.
I made this question my favorite and am waiting for answers, but since there still aren't any, here is mine.
Why and how can an actor model be an advanced concurrency model that replaces threading?
Actors get rid of mutable shared state, which is very difficult to code right. (To my understanding) actors can basically be thought of as objects with their own thread(s). You send messages between actors; they are queued and consumed by the thread within the actor. So whatever state is in an actor is encapsulated and will not be shared. That makes it easy to code right.
see also http://www.slideshare.net/jboner/state-youre-doing-it-wrong-javaone-2009
What other models are 'advanced concurrency models'?
see http://www.slideshare.net/jboner/state-youre-doing-it-wrong-javaone-2009
See Dataflow Programming. It's an approach which is a layer on top of the usual OOP design. In a few words:
there is a scene where Components reside;
Components have Ports: Producers (outputs, which generate messages) and Consumers (inputs, which process messages);
Messages are pre-defined between Components: one Component's Producer port is bound to another's Consumer.
The programming happens on three layers:
writing the dataflow system (language, framework/server, component API),
writing Components (system, basic, and domain-oriented ones),
creating the dataflow program: placing components onto the scene and defining messages between them.
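A toy sketch of those pieces (all names hypothetical, far simpler than any real dataflow framework): components wrap a process function, producer ports are bound to consumer ports, and messages flow along the bindings:

```python
# Layer 1: the (tiny) dataflow system - a Component with bindable ports.
class Component:
    def __init__(self, process):
        self.process = process
        self.outputs = []            # producer port: bound consumers

    def bind(self, consumer):
        self.outputs.append(consumer)

    def receive(self, message):      # consumer port
        result = self.process(message)
        for consumer in self.outputs:
            consumer.receive(result)

# Layer 2: the components themselves.
collected = []
doubler = Component(lambda x: x * 2)
sink = Component(lambda x: (collected.append(x), x)[1])

# Layer 3: the program - place components and define the message path.
doubler.bind(sink)
for v in [1, 2, 3]:
    doubler.receive(v)               # messages flow doubler -> sink
```

Each component knows only its own ports, never its peers - the wiring lives entirely in the third layer, which is the decoupling the answer describes.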
The Wikipedia article is a good starting point to understand the business: http://en.wikipedia.org/wiki/Flow-based_programming
See also "actor model", "dataflow programming" etc.
Please see the following paper:
Actor Model of Computation
Also please see
ActorScript(TM) extension of C#(TM), Java(TM), and Objective C(TM): iAdaptive(TM) concurrency for antiCloud(TM) privacy and securitY
