Reactor: using doOnNext multiple times

I was modifying one of the Reactor samples and I'm a little confused by the behaviour I get.
First I wrote this:
EmitterProcessor<String> stream = EmitterProcessor.<String>create().connect();
Flux<String> flux = stream
    .doOnNext(s -> System.out.println("1 " + s))
    .doOnNext(s -> System.out.println("2 " + s));
flux.subscribe();
stream.onNext("Hello");
This code prints two lines as expected:
1 Hello
2 Hello
But if I add an intermediate variable, pretending I get it from some method or just for readability, the code starts to behave differently.
EmitterProcessor<String> stream = EmitterProcessor.<String>create().connect();
Flux<String> flux = stream
    .doOnNext(s -> System.out.println("1 " + s));
flux.doOnNext(s -> System.out.println("2 " + s));
flux.subscribe();
stream.onNext("Hello");
So for the code above I get only one line, that is:
1 Hello
Can anybody explain this behaviour?

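Thanks to Stephane Maldini I realised that Flux is immutable: each operator returns a new Flux rather than modifying the one it is called on. In the second snippet the Flux returned by the second doOnNext is never assigned or subscribed to, and subscribe() is called on the first one, so only "1 Hello" is printed.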
Discussion is here
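The same "operators return a new object" behaviour can be reproduced with plain JDK types. The sketch below is only an analogy, not Reactor code: it uses java.util.function.Consumer.andThen, and the class and method names (ImmutableChainDemo, discarded, reassigned) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ImmutableChainDemo {
    // Mirrors the second snippet: the chained object is created but thrown away.
    static List<String> discarded(String value) {
        List<String> out = new ArrayList<>();
        Consumer<String> first = s -> out.add("1 " + s);
        first.andThen(s -> out.add("2 " + s)); // result ignored, 'first' is unchanged
        first.accept(value);
        return out;
    }

    // Mirrors the fix: keep the object returned by the chaining call and use that one.
    static List<String> reassigned(String value) {
        List<String> out = new ArrayList<>();
        Consumer<String> first = s -> out.add("1 " + s);
        Consumer<String> both = first.andThen(s -> out.add("2 " + s));
        both.accept(value);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(discarded("Hello"));  // [1 Hello]
        System.out.println(reassigned("Hello")); // [1 Hello, 2 Hello]
    }
}
```

In Reactor terms, the equivalent fix for the second snippet is to reassign the result, for example flux = flux.doOnNext(...), before calling subscribe().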

Execution time reactive programming

Is this an ideal way to measure the execution time of a method (getFavouriteDetails()) in reactive programming?
public List<Favourites> getFavouriteDetails(String userId){
    userService.getFavorites(userId)
        .flatMap(favoriteService::getDetails)
        .switchIfEmpty(suggestionService.getSuggestions())
        .take(5)
        .publishOn(UiUtils.uiThreadScheduler())
        .subscribe(uiList::show, UiUtils::errorPopup)
        .flatMap(a -> Mono.subscriberContext().map(ctx -> {
            log.info("Time taken : " + Duration.between(ctx.get(key), Instant.now()).toMillis() + " milliseconds.");
            return a;
        }))
        .subscriberContext(ctx -> ctx.put(key, Instant.now()))
}
There are two approaches to ensure that you only measure execution time once you subscribe:
First, wrap the Flux in a Mono using flatMapMany. This returns a Flux as well.
Second, use an AtomicReference, set the start time in doOnSubscribe and log the elapsed time in doFinally.
Sample code:
timeFluxV1(getFavouriteDetails(userId)).subscribe(uiList::show, UiUtils::errorPopup);
timeFluxV2(getFavouriteDetails(userId)).subscribe(uiList::show, UiUtils::errorPopup);
private <T> Flux<T> timeFluxV1(Flux<T> flux) {
    // The supplier runs on subscription, so the clock starts only then.
    return Mono.fromSupplier(System::nanoTime)
        .flatMapMany(time -> flux.doFinally(sig -> log.info("Time taken : "
            + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - time) + " milliseconds.")));
}
private <T> Flux<T> timeFluxV2(Flux<T> flux) {
    AtomicReference<Long> startTime = new AtomicReference<>();
    return flux.doOnSubscribe(x -> startTime.set(System.nanoTime()))
        .doFinally(x -> log.info("Time taken : "
            + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime.get()) + " milliseconds."));
}
public Flux<Favourites> getFavouriteDetails(String userId) {
    return userService.getFavorites(userId)
        .flatMap(favoriteService::getDetails)
        .switchIfEmpty(suggestionService.getSuggestions())
        .take(5)
        .publishOn(UiUtils.uiThreadScheduler());
}
To time a method, the most basic way in Java is to use System.nanoTime(), which returns a long. Instant and System.currentTimeMillis() are for wall-clock operations and are guaranteed to be neither monotonic nor precise enough...
In Reactor, to measure the time a sequence takes to complete, you would usually need to start the timing on subscription (nothing happens until you subscribe) and stop the timing within a doFinally (which executes some code on the side of the main sequence whenever it completes, errors or is cancelled).
Here however you are subscribing yourself, so there is no risk of multiple subscriptions. You can thus do away with the "start timing on subscription" constraint.
It gives us something like this:
public List<Favourites> getFavouriteDetails(String userId){
    final long start = System.nanoTime();
    userService.getFavorites(userId)
        .flatMap(favoriteService::getDetails)
        .switchIfEmpty(suggestionService.getSuggestions())
        .take(5)
        .publishOn(UiUtils.uiThreadScheduler())
        .doFinally(endType -> log.info("Time taken : " + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start) + " milliseconds."))
        .subscribe(uiList::show, UiUtils::errorPopup);
    //return needed!
}
Note that there is also an elapsed() operator, which measures the time between subscription and the first onNext, then between subsequent onNexts. It outputs a Flux<Tuple2<Long, T>>, and you could aggregate the longs to get the overall timing, but that would lose you the "realtime" nature of the Ts in that case.
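The two timing rules from this answer (use the monotonic System.nanoTime(), and stop the clock in a hook that runs however the work ends) can be sketched without Reactor. This is a plain-JDK analogy under stated assumptions: the Supplier stands in for a lazy sequence, try/finally stands in for doFinally, and the names TimingSketch, timed and slowCall are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class TimingSketch {
    // Last measured duration, kept only so the example is easy to inspect.
    static long lastElapsedMillis = -1;

    // The clock starts when the returned supplier is invoked (the analogue of
    // "start on subscription") and stops in finally (the analogue of doFinally).
    static <T> Supplier<T> timed(Supplier<T> work) {
        return () -> {
            long start = System.nanoTime();
            try {
                return work.get();
            } finally {
                lastElapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                System.out.println("Time taken : " + lastElapsedMillis + " milliseconds.");
            }
        };
    }

    // Stand-in for a slow service call.
    static String slowCall() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "done";
    }

    public static void main(String[] args) {
        Supplier<String> wrapped = timed(TimingSketch::slowCall); // nothing is timed yet
        System.out.println(wrapped.get());                        // timing happens here
    }
}
```

Because the start time is captured inside the returned supplier rather than when timed() is called, each invocation is measured on its own, which is the same reason the Reactor versions start the clock on subscription.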

Joining twice the same stream

I would like to use the joining collector twice on the same stream to produce a string like this: Tea:5 - Coffee:3 - Money:10.
Drink is an enum with a BigDecimal attribute (price).
Currently I do it like this:
Map<Drink, Long> groupByDrink = listOfDrinks.stream().collect(groupingBy(identity(), counting()));
String acc = groupByDrink.entrySet().stream()
    .map(ite -> join(":", ite.getKey().code(), ite.getValue().toString()))
    .collect(joining(" - "));
acc += " - Money:" + groupByDrink.entrySet().stream()
    .map(ite -> ite.getKey().price().multiply(valueOf(ite.getValue())))
    .reduce(ZERO, BigDecimal::add);
I think you are overusing new features.
join(":", ite.getKey().code(), ite.getValue().toString())
bears no advantage over the classical
ite.getKey().code()+":"+ite.getValue()
Besides that, I'm not sure what you mean by "use the joining collector twice on a same stream". If you want to use the joining collector for the summary element as well, you have to concatenate it as a stream before collecting:
String acc = Stream.concat(
    groupByDrink.entrySet().stream()
        .map(ite -> ite.getKey().code()+":"+ite.getValue()),
    Stream.of("Money:" + groupByDrink.entrySet().stream()
        .map(ite -> ite.getKey().price().multiply(valueOf(ite.getValue())))
        .reduce(ZERO, BigDecimal::add).toString())
).collect(joining(" - "));
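For reference, here is a self-contained sketch of the Stream.concat approach that can be compiled and run as is. The Drink enum, its codes and its prices are invented for the example (the question does not show them), and an EnumMap is used as the grouping map purely so that the output order is deterministic.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DrinkSummary {
    // Hypothetical enum: codes and prices are assumptions for this example.
    enum Drink {
        TEA("Tea", "0.40"), COFFEE("Coffee", "0.60");

        private final String code;
        private final BigDecimal price;

        Drink(String code, String price) {
            this.code = code;
            this.price = new BigDecimal(price);
        }

        String code() { return code; }
        BigDecimal price() { return price; }
    }

    static String summarize(List<Drink> drinks) {
        // EnumMap iterates in enum declaration order, unlike the default HashMap.
        Map<Drink, Long> counts = drinks.stream().collect(Collectors.groupingBy(
                Function.identity(), () -> new EnumMap<>(Drink.class), Collectors.counting()));

        // Total price: sum of price * count over all drinks.
        BigDecimal money = counts.entrySet().stream()
                .map(e -> e.getKey().price().multiply(BigDecimal.valueOf(e.getValue())))
                .reduce(BigDecimal.ZERO, BigDecimal::add);

        // Append the summary element as a one-element stream, then join once.
        return Stream.concat(
                counts.entrySet().stream().map(e -> e.getKey().code() + ":" + e.getValue()),
                Stream.of("Money:" + money)
        ).collect(Collectors.joining(" - "));
    }

    // Five teas and three coffees.
    static List<Drink> sample() {
        List<Drink> drinks = new ArrayList<>(Collections.nCopies(5, Drink.TEA));
        drinks.addAll(Collections.nCopies(3, Drink.COFFEE));
        return drinks;
    }

    public static void main(String[] args) {
        System.out.println(summarize(sample())); // Tea:5 - Coffee:3 - Money:3.80
    }
}
```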

How can I send data from Mnesia via WebSocket

I have a Mnesia DB with a table artists:
(gw@gw)227> lookup:for_test().
{atomic,["Baltic Baroque","Anna Luca","Karel Boehlee Trio",
         "Bill Evans","Dino Saluzzi and Anja Lechner",
         "Bill Evans Trio with Stan Getz","Duke Pearson",
         "The John Butler Trio"]}
(gw@gw)228>
I want to send this list to the client over a WebSocket in YAWS, but how can I do it? I have racked my brain and nothing works. Any information would help.
Best regards!
You have to convert this list of strings to binaries and then pass it over the WebSocket. By the way, what kind of data handling are you doing at the receiving end? Do you want to send this as JSON or just as comma-separated values?
Yes, I solved my problem!
My YAWS handler:
handle_message({text, <<"q2">>}) ->
    Var555 = unicode:characters_to_binary(my_json4:handle()),
    {reply, {text, <<Var555/binary>>}};
And my Mnesia select plus the conversion to JSON, in my_json4.erl:
convert_to_json(Lines) ->
    Data = [{obj,
             [{art_id, Line#tmp.artist_id},
              {art, unicode:characters_to_binary(Line#tmp.artist)},
              {alb_id, Line#tmp.album_id},
              {alb, unicode:characters_to_binary(Line#tmp.album)},
              {path, unicode:characters_to_binary(Line#tmp.albumpath)},
              {image, unicode:characters_to_binary(Line#tmp.image)},
              {tracks, [unicode:characters_to_binary(X) || X <- Line#tmp.tracks]}]}
            || Line <- Lines],
    JsonData = {obj, [{data, Data}]},
    rfc4627:encode(JsonData).
handle() ->
    QHandle = qlc:q( [ X ||
        X <- mnesia:table(artists),
        X#artists.artist_id == 2]
    ),
    Records = do(qlc:q([{tmp, X#artists.artist_id, X#artists.artist, A#albums.album_id, A#albums.album, A#albums.albumpath, A#albums.image, A#albums.tracklist} ||
        X <- QHandle,
        A <- mnesia:table(albums),
        A#albums.artist_id == X#artists.artist_id])
    ),
    Json = convert_to_json(Records),
    Json.
do(Q) ->
    F = fun() ->
            qlc:e(Q)
        end,
    {atomic, Value} = mnesia:transaction(F),
    Value.

Simple debugging in Haskell

I am new to Haskell. Previously I have programmed in Python and Java. When I am debugging some code I have a habit of littering it with print statements in the middle of code. However doing so in Haskell will change semantics, and I will have to change my function signatures to those with IO stuff. How do Haskellers deal with this? I might be missing something obvious. Please enlighten.
Other answers link the official doco and the Haskell wiki but if you've made it to this answer let's assume you bounced off those for whatever reason. The wikibook also has an example using Fibonacci which I found more accessible. This is a deliberately basic example which might hopefully help.
Let's say we start with this very simple function, which for important business reasons, adds "bob" to a string, then reverses it.
bobreverse x = reverse ("bob" ++ x)
Output in GHCI:
> bobreverse "jill"
"llijbob"
We don't see how this could possibly be going wrong, but something near it is, so we add debug.
import Debug.Trace
bobreverse x = trace ("DEBUG: bobreverse " ++ show x) (reverse ("bob" ++ x))
Output:
> bobreverse "jill"
"DEBUG: bobreverse "jill"
llijbob"
We are using show just to ensure x is converted to a string correctly before output. We also added some parentheses to make sure the arguments were grouped correctly.
In summary, the trace function is a decorator which prints the first argument and returns the second. It looks like a pure function, so you don't need to bring IO or other signatures into the functions to use it. It does this by cheating, which is explained further in the linked documentation above, if you are curious.
Read this. You can use Debug.Trace.trace in place of print statements.
I was able to create a dual personality IO / ST monad typeclass, which will print debug statements when a monadic computation is typed as IO, and omit them when it's typed as ST. Demonstration and code here: Haskell -- dual personality IO / ST monad? .
Of course Debug.Trace is more of a Swiss Army knife, especially when wrapped with a useful special case,
trace2 :: Show a => [Char] -> a -> a
trace2 name x = trace (name ++ ": " ++ show x) x
which can be used like (trace2 "first arg" 3) + 4
edit
You can make this even fancier if you want source locations
{-# LANGUAGE TemplateHaskell, ScopedTypeVariables #-}
-- ScopedTypeVariables is needed for the (loc :: String) pattern signature below;
-- trace2 from above must be in scope.
import Language.Haskell.TH
import Language.Haskell.TH.Syntax as TH
import Debug.Trace
withLocation :: Q Exp -> Q Exp
withLocation f = do
    let error = locationString =<< location
    appE f error
  where
    locationString :: Loc -> Q Exp
    locationString loc = do
        litE $ stringL $ formatLoc loc
    formatLoc :: Loc -> String
    formatLoc loc = let file = loc_filename loc
                        (line, col) = loc_start loc
                    in concat [file, ":", show line, ":", show col]
trace3' (loc :: String) msg x =
    trace2 ('[' : loc ++ "] " ++ msg) x
trace3 = withLocation [| trace3' |]
then, in a separate file [from the definition above], you can write
{-# LANGUAGE TemplateHaskell #-}
tr3 x = $trace3 "hello" x
and test it out
> tr3 4
[MyFile.hs:2:9] hello: 4
You can use Debug.Trace for that.
I really liked Don's short blog post about it:
https://donsbot.wordpress.com/2007/11/14/no-more-exceptions-debugging-haskell-code-with-ghci/
In short: use ghci, example with a program with code called HsColour.hs
$ ghci HsColour.hs
*Main> :set -fbreak-on-exception
*Main> :set args "source.hs"
Now run your program with tracing on, and GHCi will stop your program at the call to error:
*Main> :trace main
Stopped at (exception thrown)
Ok, good. We had an exception… Let’s just back up a bit and see where we are. Watch now as we travel backwards in time through our program, using the (bizarre, I know) “:back” command:
[(exception thrown)] *Main> :back
Logged breakpoint at Language/Haskell/HsColour/Classify.hs:(19,0)-(31,46)
_result :: [String]
This tells us that immediately before hitting error, we were in the file Language/Haskell/HsColour/Classify.hs, at line 19. We’re in pretty good shape now. Let’s see where exactly:
[-1: Language/Haskell/HsColour/Classify.hs:(19,0)-(31,46)] *Main> :list
18 chunk :: String -> [String]
vv
19 chunk [] = head []
20 chunk ('\r':s) = chunk s -- get rid of DOS newline stuff
21 chunk ('\n':s) = "\n": chunk s
^^

How to achieve Asynchrony instead of Parallelism in F#

(Sticking to a common example with async fetch of many web pages)
How would I spin off multiple (hundreds) of web page requests asynchronously, and then wait for all requests to complete before going to the next step? Async.Parallel processes a few requests at a time, controlled by the number of cores on the CPU. Grabbing a web page is not a CPU-bound operation. Not satisfied with the speedup of Async.Parallel, I am looking for alternatives.
I tried to connect the dots between Async.StartAsTask and Task.WaitAll. Instinctively, I wrote the following code, but it does not compile.
let processItemsConcurrently (items : int seq) =
    let tasks = items |> Seq.map (fun item -> Async.StartAsTask(fetchAsync item))
    Tasks.Task.WaitAll(tasks)
How would you approach this?
Async.Parallel is almost definitely right here. Not sure what you're not happy with; the strength of F# asyncs lies more in async computing than in task-parallel CPU-bound stuff (which is more tailored to Tasks and the .NET 4.0 TPL). Here's a full example:
open System.Diagnostics
open System.IO
open System.Net
open Microsoft.FSharp.Control.WebExtensions
let sites = [|
    "http://bing.com"
    "http://google.com"
    "http://cnn.com"
    "http://stackoverflow.com"
    "http://yahoo.com"
    "http://msdn.com"
    "http://microsoft.com"
    "http://apple.com"
    "http://nfl.com"
    "http://amazon.com"
    "http://ebay.com"
    "http://expedia.com"
    "http://twitter.com"
    "http://reddit.com"
    "http://hulu.com"
    "http://youtube.com"
    "http://wikipedia.org"
    "http://live.com"
    "http://msn.com"
    "http://wordpress.com"
    |]
let print s =
    // careful, don't create a synchronization bottleneck by printing
    //printf "%s" s
    ()
let printSummary info fullTimeMs =
    Array.sortInPlaceBy (fun (i,_,_) -> i) info
    // for i, size, time in info do
    //     printfn "%2d %7d %5d" i size time
    let longest = info |> Array.map (fun (_,_,time) -> time) |> Array.max
    printfn "longest request took %dms" longest
    let bytes = info |> Array.sumBy (fun (_,size,_) -> float size)
    let seconds = float fullTimeMs / 1000.
    printfn "sucked down %7.2f KB/s" (bytes / 1024.0 / seconds)
let FetchAllSync() =
    let allsw = Stopwatch.StartNew()
    let info = sites |> Array.mapi (fun i url ->
        let sw = Stopwatch.StartNew()
        print "S"
        let req = WebRequest.Create(url)
        use resp = req.GetResponse()
        use stream = resp.GetResponseStream()
        use reader = new StreamReader(stream,
                            System.Text.Encoding.UTF8, true, 4096)
        print "-"
        let contents = reader.ReadToEnd()
        print "r"
        i, contents.Length, sw.ElapsedMilliseconds)
    let time = allsw.ElapsedMilliseconds
    printSummary info time
    time, info |> Array.sumBy (fun (_,size,_) -> size)
let FetchAllAsync() =
    let allsw = Stopwatch.StartNew()
    let info = sites |> Array.mapi (fun i url -> async {
                    let sw = Stopwatch.StartNew()
                    print "S"
                    let req = WebRequest.Create(url)
                    use! resp = req.AsyncGetResponse()
                    use stream = resp.GetResponseStream()
                    use reader = new AsyncStreamReader(stream, // F# PowerPack
                                        System.Text.Encoding.UTF8, true, 4096)
                    print "-"
                    let! contents = reader.ReadToEnd() // in F# PowerPack
                    print "r"
                    return i, contents.Length, sw.ElapsedMilliseconds })
                |> Async.Parallel
                |> Async.RunSynchronously
    let time = allsw.ElapsedMilliseconds
    printSummary info time
    time, info |> Array.sumBy (fun (_,size,_) -> size)
// By default, I think .NET limits you to 2 open connections at once
ServicePointManager.DefaultConnectionLimit <- sites.Length
for i in 1..3 do // to warmup and show variance
    let time1,r1 = FetchAllSync()
    printfn "Sync took %dms, result was %d" time1 r1
    let time2,r2 = FetchAllAsync()
    printfn "Async took %dms, result was %d (speedup=%2.2f)"
        time2 r2 (float time1/ float time2)
    printfn ""
On my 4-core box, this consistently gives a nearly 4x speedup.
EDIT
In reply to your comment, I've updated the code. You're right in that I've added more sites and am not seeing the expected speedup (still holding steady around 4x). I've started adding a little debugging output above, will continue investigating to see if something else is throttling the connections...
EDIT
Edited the code again. Well, I found what might be the bottleneck. Here's the implementation of AsyncReadToEnd in the PowerPack:
type System.IO.StreamReader with
    member s.AsyncReadToEnd () =
        FileExtensions.UnblockViaNewThread (fun () -> s.ReadToEnd())
In other words, it just blocks a threadpool thread and reads synchronously. Argh!!! Let me see if I can work around that.
EDIT
Ok, the AsyncStreamReader in the PowerPack does the right thing, and I'm using that now.
However, the key issue seems to be variance.
When you hit, say, cnn.com, a lot of the time the result will come back in like 500ms. But every once in a while you get that one request that takes 4s, and this of course potentially kills the apparent async perf, since the overall time is the time of the unluckiest request.
Running the program above, I see speedups from about 2.5x to 9x on my 2-core box at home. It is very highly variable, though. It's still possible there's some bottleneck in the program that I've missed, but I think the variance-of-the-web may account for all of what I'm seeing at this point.
Using the Reactive Extensions for .NET combined with F#, you can write a very elegant solution. Check out the sample at http://blog.paulbetts.org/index.php/2010/11/16/making-async-io-work-for-you-reactive-style/ (it uses C#, but using F# is easy too). The key is to use the Begin/End methods instead of the synchronous method: even if you can make the synchronous version compile, it will block n ThreadPool threads unnecessarily, instead of letting the ThreadPool just pick up completion routines as they come in.
My bet is that the speedup you're experiencing is not significant enough for your taste because you're either using a subtype of WebRequest or a class relying on it (such as WebClient).
If that's the case, you need to set the MaxConnection on the ConnectionManagementElement (and I suggest you only set it if needed otherwise it's gonna become a pretty time-consuming operation) to a high value, depending on the number of simultaneous connections you wanna initiate from your application.
I'm not an F# guy, but from a pure .NET perspective what you're looking for is TaskFactory::FromAsync, where the asynchronous call you'd be wrapping in a Task would be something like HttpWebRequest::BeginGetResponse. You could also wrap up the EAP model that WebClient exposes using a TaskCompletionSource. More on both of these topics here on MSDN.
Hopefully with this knowledge you can find the nearest native F# approach to accomplish what you're trying to do.
Here's some code that avoids the unknowns, such as web access latency. I am getting under 5% CPU utilization, and about 60-80% efficiency for both sync and async code paths.
open System.Diagnostics
let numWorkers = 200
let asyncDelay = 50
let main =
    let codeBlocks = [for i in 1..numWorkers ->
                        async { do! Async.Sleep asyncDelay } ]
    while true do
        printfn "Concurrent started..."
        let sw = new Stopwatch()
        sw.Start()
        codeBlocks |> Async.Parallel |> Async.RunSynchronously |> ignore
        sw.Stop()
        printfn "Concurrent in %d millisec" sw.ElapsedMilliseconds
        printfn "efficiency: %d%%" (int64 (asyncDelay * 100) / sw.ElapsedMilliseconds)
        printfn "Synchronous started..."
        let sw = new Stopwatch()
        sw.Start()
        for codeBlock in codeBlocks do codeBlock |> Async.RunSynchronously |> ignore
        sw.Stop()
        printfn "Synchronous in %d millisec" sw.ElapsedMilliseconds
        printfn "efficiency: %d%%" (int64 (asyncDelay * numWorkers * 100) / sw.ElapsedMilliseconds)
main
