Are Java 8 streams similar to RxJava observables?
Java 8 stream definition:
Classes in the new java.util.stream package provide a Stream API to
support functional-style operations on streams of elements.
Short answer
All sequence/stream-processing libraries offer very similar APIs for pipeline building. The differences lie in the APIs for handling multi-threading and for composing pipelines.
Long answer
RxJava is quite different from Stream. Of all JDK things, the closest to rx.Observable is perhaps the java.util.stream.Stream + java.util.concurrent.CompletableFuture combo (which comes at the cost of dealing with an extra monad layer, i.e. having to handle conversion between Stream<CompletableFuture<T>> and CompletableFuture<Stream<T>>).
There are significant differences between Observable and Stream:
Streams are pull-based, Observables are push-based. This may sound too abstract, but it has significant consequences that are very concrete.
Stream can only be used once, Observable can be subscribed to many times.
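The single-use restriction is easy to demonstrate with nothing but the JDK; the class and variable names below are mine, chosen for the sketch. A second terminal operation on the same Stream throws IllegalStateException:

```java
import java.util.stream.Stream;

public class SingleUseDemo {
    public static void main(String[] args) {
        Stream<Integer> s = Stream.of(1, 2, 3);
        System.out.println(s.count()); // first terminal operation: prints 3

        try {
            s.count(); // second terminal operation on the same stream
        } catch (IllegalStateException e) {
            // message: "stream has already been operated upon or closed"
            System.out.println("cannot reuse a Stream: " + e.getMessage());
        }
    }
}
```

An RxJava Observable, by contrast, re-runs its subscription logic for every new subscriber.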
Stream#parallel() splits a sequence into partitions; Observable#subscribeOn() and Observable#observeOn() do not. It is tricky to emulate Stream#parallel() behavior with Observable; it once had a .parallel() method, but that method caused so much confusion that .parallel() support was moved to a separate repository, ReactiveX/RxJavaParallel: Experimental Parallel Extensions for RxJava. More details are in another answer.
Stream#parallel() does not let you specify a thread pool to use, unlike most RxJava methods, which accept an optional Scheduler. Since all parallel stream instances in a JVM use the same common fork-join pool, adding .parallel() can accidentally affect the behaviour of another module of your program.
Streams lack time-related operations like Observable#interval(), Observable#window() and many others; this is mostly because Streams are pull-based, so the upstream has no control over when to emit the next element downstream.
Streams offer a restricted set of operations in comparison with RxJava. E.g. Streams lack cut-off operations (takeWhile(), takeUntil()); the workaround using Stream#anyMatch() is limited: it is a terminal operation, so you can't use it more than once per stream.
As of JDK 8, there's no Stream#zip() operation, which is quite useful sometimes.
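A minimal zip can be sketched with plain JDK 8 pieces by walking two iterators in lockstep. The ZipDemo class and the zip signature below are my own invention, not a JDK or RxJava API:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.BiFunction;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class ZipDemo {
    // Pairs up elements from two streams until the shorter one ends.
    static <A, B, C> Stream<C> zip(Stream<A> a, Stream<B> b, BiFunction<A, B, C> f) {
        Iterator<A> ia = a.iterator();
        Iterator<B> ib = b.iterator();
        Iterator<C> zipped = new Iterator<C>() {
            public boolean hasNext() { return ia.hasNext() && ib.hasNext(); }
            public C next() { return f.apply(ia.next(), ib.next()); }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(zipped, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        List<String> result = zip(Stream.of(1, 2, 3), Stream.of("a", "b"), (i, s) -> i + s)
                .collect(Collectors.toList());
        System.out.println(result); // [1a, 2b]
    }
}
```

Note the helper inherits the Stream limitations discussed above: it is sequential and single-use.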
Streams are hard to construct by yourself; Observable can be constructed in many ways. EDIT: As noted in the comments, there are ways to construct a Stream. However, since there's no non-terminal short-circuiting, you can't, e.g., easily generate a Stream of lines in a file (the JDK provides Files#lines() and BufferedReader#lines() out of the box though, and other similar scenarios can be managed by constructing a Stream from an Iterator).
Observable offers a resource management facility (Observable#using()); you can wrap an IO stream or a mutex with it and be sure that the user will not forget to free the resource: it will be disposed of automatically on subscription termination. Stream has an onClose(Runnable) method, but you have to call it manually or via try-with-resources. E.g. you have to keep in mind that Files#lines() must be enclosed in a try-with-resources block.
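For example, here is the try-with-resources discipline Files#lines() requires. This is a self-contained sketch using a throwaway temp file; the LinesDemo class and helper names are mine:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LinesDemo {
    // Files.lines() holds the file open; try-with-resources guarantees it is
    // closed even if the pipeline throws.
    static List<String> readUpper(Path file) {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.map(String::toUpperCase).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a temp file, reads it back upper-cased, and cleans up.
    static List<String> demoRoundTrip() {
        try {
            Path file = Files.createTempFile("demo", ".txt");
            Files.write(file, Arrays.asList("alpha", "beta"));
            List<String> result = readUpper(file);
            Files.delete(file);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoRoundTrip()); // [ALPHA, BETA]
    }
}
```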
Observables are synchronized all the way through (I didn't actually check whether the same is true for Streams). This spares you from thinking about whether basic operations are thread-safe (the answer is always 'yes', unless there's a bug), but the concurrency-related overhead will be there whether your code needs it or not.
Round-up
RxJava differs from Streams significantly. Real RxJava alternatives are other implementations of Reactive Streams, e.g. the relevant part of Akka.
Update
There's a trick to use a non-default fork-join pool for Stream#parallel(); see Custom thread pool in Java 8 parallel stream.
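The trick looks like this (JDK-only sketch; CustomPoolDemo and sumTo are names I made up): a parallel stream started from inside a ForkJoinPool task executes in that pool instead of the common pool. Note this behavior is an implementation detail, not a documented guarantee:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CustomPoolDemo {
    // A parallel stream submitted from inside a ForkJoinPool task runs in that
    // pool rather than in ForkJoinPool.commonPool().
    static int sumTo(int n) {
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            return pool.submit(() -> IntStream.rangeClosed(1, n).parallel().sum()).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumTo(100)); // 5050
    }
}
```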
Update
All of the above is based on the experience with RxJava 1.x. Now that RxJava 2.x is here, this answer may be out-of-date.
Java 8 Stream and RxJava look pretty similar. They have look-alike operators (filter, map, flatMap, ...) but are not built for the same usage.
You can perform asynchronous tasks using RxJava.
With Java 8 stream, you'll traverse items of your collection.
You can do pretty much the same thing in RxJava (traverse the items of a collection) but, as RxJava is focused on concurrent tasks, it uses synchronization, latches, and so on. So the same task may be slower with RxJava than with a Java 8 stream.
RxJava can be compared to CompletableFuture, but one that can compute more than just one value.
There are a few technical and conceptional differences, for example, Java 8 streams are single use, pull based, synchronous sequences of values whereas RxJava Observables are re-observable, adaptively push-pull based, potentially asynchronous sequences of values. RxJava is aimed at Java 6+ and works on Android as well.
Java 8 Streams are pull based. You iterate over a Java 8 stream consuming each item. And it could be an endless stream.
RxJava Observable is by default push-based. You subscribe to an Observable and you will be notified when the next item arrives (onNext), when the stream completes (onCompleted), or when an error occurs (onError).
Because with Observable you receive onNext, onCompleted, onError events, you can do some powerful functions like combining different Observables to a new one (zip, merge, concat). Other stuff you could do is caching, throttling, ...
And it uses more or less the same API in different languages (RxJava, RX in C#, RxJS, ...)
By default RxJava is single threaded. Unless you start using Schedulers, everything will happen on the same thread.
The existing answers are comprehensive and correct, but a clear example for beginners is lacking. Allow me to put something concrete behind terms like "push/pull-based" and "re-observable". Note: I hate the term Observable (it's a stream, for heaven's sake), so I will simply refer to J8 vs RX streams.
Consider a list of integers,
digits = [1,2,3,4,5]
A J8 Stream is a utility to modify the collection. For example even digits can be extracted as,
evens = digits.stream().filter(x -> x % 2 == 0).collect(Collectors.toList())
This is basically Python's map, filter, reduce, a very nice (and long overdue) addition to Java. But what if the digits weren't collected ahead of time? What if the digits were streaming in while the app was running? Could we filter the evens in real time?
Imagine a separate thread process is outputting integers at random times while the app is running (--- denotes time)
digits = 12345---6------7--8--9-10--------11--12
In RX, even can react to each new digit and apply the filter in real time:
even = -2-4-----6---------8----10------------12
There's no need to store input and output lists. If you want an output list, no problem that's streamable too. In fact, everything is a stream.
evens_stored = even.collect()
This is why terms like "stateless" and "functional" are more associated with RX.
RxJava is also closely related to the Reactive Streams initiative and considers itself a simple implementation of the Reactive Streams API (e.g. compared to the Akka Streams implementation). The main difference is that Reactive Streams are designed to handle back pressure. Have a look at the Reactive Streams page and you will get the idea: they describe their goals pretty well, and the streams are also closely related to the Reactive Manifesto.
The Java 8 streams are pretty much the implementation of an unbounded collection, pretty similar to the Scala Stream or the Clojure lazy seq.
Java 8 Streams enable processing of really large collections efficiently, while leveraging multicore architectures. In contrast, RxJava is single-threaded by default (without Schedulers). So RxJava won't take advantage of multi-core machines unless you code that logic yourself.
Related
In general, promises and futures are closely related, if not synonyms. The documentation of the concurrent-ruby gem (https://github.com/ruby-concurrency) says so too. Therefore, it's confusing which one should be used in what circumstances. And how are they related in the gem? Is one more low-level or more outdated than the other? Do they basically both do the same thing?
Also, the existence of these makes it even more confusing:
* Concurrent::Promises.future
* Concurrent::Promises::Future
* Concurrent::Future
* [...possibly something similar...]
In short, Future is about asynchronous execution. It's a high-level abstraction over system threads (Ruby's Thread class). When you need to speed up and parallelize some calculations, you need threads, but those are a rather low-level tool, so it's easier to use Futures instead. One of the benefits is that implementations of Futures usually use a thread pool, so you can manage the level of concurrency and system resource consumption. So the problem Future solves is the complexity of the low-level thread concurrency model.
On the other hand, Promise is about architecture and composition, not about concurrency. It's a high-level abstraction over callbacks and a subtype of the Observer pattern. It allows you to decouple the component that produces a result from the component that consumes it. Producer and consumer may know nothing about each other. So the problem Promise solves is the complexity and coupling that come with the callbacks approach.
So, regarding concurrent-ruby: it provides both classic versions of Promise and Future. But it now also provides a new API, Promises (with an s at the end), which looks like a combination of promises and futures with a unified API. It implements (actually reuses) the Promise and Future libraries but provides a new interface over them. For instance, a future behaves similarly to a promise: it allows you to register callbacks and can be chained like promises.
I am looking into the documentation of the Spliterator and according to it, the Spliterator is not thread-safe:
Despite their obvious utility in parallel algorithms, spliterators are not expected to be thread-safe; instead, implementations of parallel algorithms using spliterators should ensure that the spliterator is only used by one thread at a time. This is generally easy to attain via serial thread-confinement, which often is a natural consequence of typical parallel algorithms that work by recursive decomposition.
But further down, the documentation seems to contradict the statement above:
Structural interference of a source can be managed in the following ways (in approximate order of decreasing desirability):
The source manages concurrent modifications.
For example, a key set of a java.util.concurrent.ConcurrentHashMap is a concurrent source. A Spliterator created from the source reports a characteristic of CONCURRENT.
So does that mean a Spliterator generated from a thread-safe collection would be thread-safe? Is it right?
No. A Spliterator reporting the CONCURRENT characteristic has a thread-safe source, which implies that it can iterate over the source safely even when the source is modified concurrently. But the Spliterator itself may still have state that must not be manipulated concurrently.
Note that the passage you cite stems from a description of how "structural interference of a source can be managed", not from a description of the spliterator's behavior in general.
This is also provided at the documentation of the CONCURRENT characteristic itself:
Characteristic value signifying that the element source may be safely concurrently modified (allowing additions, replacements, and/or removals) by multiple threads without external synchronization. If so, the Spliterator is expected to have a documented policy concerning the impact of modifications during traversal.
Nothing else.
So the consequences of these characteristics are astonishingly small. A Spliterator reporting either CONCURRENT or IMMUTABLE will never throw a ConcurrentModificationException; that's all. In all other regards, the differences between these characteristics are not recognized by the Stream API: the Stream API never performs any source manipulations and, in fact, doesn't actually know the source (other than indirectly through the Spliterator), so it could neither perform such manipulations nor detect whether a concurrent modification has happened.
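You can observe the characteristic directly (JDK-only sketch; the class name is mine): a ConcurrentHashMap key set reports CONCURRENT, while an ArrayList does not:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentCharacteristicDemo {
    public static void main(String[] args) {
        // A concurrent source: its spliterator reports CONCURRENT.
        Set<String> concurrentKeys = ConcurrentHashMap.newKeySet();
        concurrentKeys.add("a");
        System.out.println(concurrentKeys.spliterator()
                .hasCharacteristics(Spliterator.CONCURRENT)); // true

        // A non-concurrent source: no CONCURRENT characteristic, and its
        // spliterator is fail-fast under structural modification.
        List<String> list = new ArrayList<>(Arrays.asList("a", "b"));
        System.out.println(list.spliterator()
                .hasCharacteristics(Spliterator.CONCURRENT)); // false
    }
}
```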
This may sound like I'm begging to start a flame war, but hear me out.
In some languages laziness is expensive. For example, in Ruby, where I have the most recent experience, laziness is slow because it's achieved using fibers, so it's only attractive when:
you must trade off cpu for memory (think paging through large data set)
the performance penalty is worth it to hide details (yielding to fibers is a great way to abstract away complexity instead of passing down blocks to run in mysterious places)
Otherwise you'll definitely want to use the normal, eager methods.
My initial investigation suggests that the overhead for laziness in Elixir is much lower (this thread on reddit backs me up), so there seems little reason to ever use Enum instead of Stream for those things which Stream can do.
Is there something I'm missing? I assume Enum exists for a reason and implements some of the same functions as Stream. In what cases, if any, would I want to use Enum instead of Stream when I could use Stream?
For short lists, Stream will be slower than simply using Enum, but there's no clear rule without benchmarking exactly what you are doing. There are also some functions that exist in Enum but have no corresponding functions in Stream (for example, Enum.reverse).
The real reason you need both is that Stream is just a composition of functions. Every pipeline that needs results, rather than side effects, needs to end in an Enum call to get the pipeline to run.
They go hand in hand; Stream couldn't stand alone. What Stream largely does is give you a very handy abstraction for creating very complex reduce functions.
The methods in Stream essentially create a "recipe list" of transformations over your data while the methods in Enum actually resolve these transformations. So you eventually will have to use an Enum function to resolve your data transformation even if everything else is a Stream.
Also some concepts, namely Reduce, have no real meaning in Stream and you must use Enum.
As for performance: if you have a series of transformations, a possibly infinite stream of data, or you're reading a file, use Stream. If you have just one transformation over a finite enumerable, or you need to resolve a Stream, use Enum.
I saw a SO question yesterday about implementing a classic linked list in Java. It was clearly an assignment from an undergraduate data structures class. It's easy to find questions and implementations for lists, trees, etc. in all languages.
I've been learning about Java lambdas and trying to use them at every opportunity to get the idiom under my fingers. This question made me wonder: How would I write a custom list or tree so I could use it in all the Java 8 lambda machinery?
All the examples I see use the built in collections. Those work for me. I'm more curious about how a professor teaching data structures ought to rethink their techniques to reflect lambdas and functional programming.
I started with an Iterator, but it doesn't appear to be fully featured.
Does anyone have any advice?
Exposing a stream view of arbitrary data structures is pretty easy. The key interface you have to implement is Spliterator, which, as the name suggests, combines two things: sequential element access (iteration) and decomposition (splitting).
Once you have a Spliterator, you can turn that into a stream easily with StreamSupport.stream(). In fact, here's the stream() method from AbstractCollection (which most collections just inherit):
default Stream<E> stream() {
    return StreamSupport.stream(spliterator(), false);
}
All the real work is in the spliterator() method, and there's a broad range of spliterator quality (the absolute minimum you need to implement is tryAdvance(), but if that's all you implement, the result will work sequentially but will lose out on most of the stream optimizations). Look in the JDK sources (Arrays.stream(), IntStream.range()) for examples of how to do better.
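As a concrete sketch of that minimum (the Node class and all names below are mine, invented for illustration): a hand-rolled singly linked list gets a stream view from an AbstractSpliterator that implements only tryAdvance():

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class CustomListDemo {
    // A minimal hand-rolled singly linked list.
    static class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    // The minimum viable spliterator: tryAdvance() only, so the resulting
    // stream works sequentially but won't split for parallelism.
    static <T> Stream<T> stream(Node<T> head) {
        Spliterator<T> sp = new Spliterators.AbstractSpliterator<T>(
                Long.MAX_VALUE, Spliterator.ORDERED) {
            Node<T> cur = head;
            @Override public boolean tryAdvance(Consumer<? super T> action) {
                if (cur == null) return false;
                action.accept(cur.value);
                cur = cur.next;
                return true;
            }
        };
        return StreamSupport.stream(sp, false);
    }

    public static void main(String[] args) {
        Node<Integer> list = new Node<>(1, new Node<>(2, new Node<>(3, null)));
        System.out.println(stream(list).mapToInt(Integer::intValue).sum()); // 6
    }
}
```

Implementing trySplit() and reporting a real size estimate is what upgrades such a spliterator from "works" to "parallelizes well".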
I'd look at http://www.javaslang.io for inspiration; it's a library that does exactly what you want to do: implement custom lists, trees, etc. in a Java 8 manner.
It specifically doesn't closely couple with the JDK collections outside of importing/exporting methods, but re-implements all the immutable collection semantics that a Scala (or other FP language) developer would expect.
This is a fairly basic question, but I am new to Twisted. If the reactor loop encounters two callLater calls with the exact same timeout value and also encounters an incoming packet, how will it schedule the three?
The callLaters would fire in the order that you registered them. The packet arrival could fire before or after the callLaters depending on the point of execution in the event loop when the packet arrives.
There is no definitive rule here. Different reactors may implement different strategies. In general these implementations are somewhat ad-hoc and not particularly well designed, but there isn't a lot of motivation to fix them, because most applications with deep ordering dependencies on different event sources are actually just buggy, and should be fixed not to care what order these fundamentally non-deterministic events arrive in.