Joining twice the same stream - java-8

I would like use the joining collector twice on a same stream for produce a string like this Tea:5 - Coffee:3 - Money:10 .
Drink is enum with an Bigdecimal attribute (price).
currently I done like this :
Map<Drink, Long> groupByDrink = listOfDrinks.stream().collect(groupingBy(identity(),counting()));
String acc = groupByDrink.entrySet().stream().map(ite -> join(":", ite.getKey().code(), ite.getValue().toString())).collect(joining(" - "));
acc += " - Money:" + groupByDrink.entrySet().stream().map(ite -> ite.getKey().price().multiply(valueOf(ite.getValue()))).reduce(ZERO, BigDecimal::add);

I think, you are overusing new features.
join(":", ite.getKey().code(), ite.getValue().toString())
bears no advantage over the classical
ite.getKey().code()+":"+ite.getValue()
Besides that, I’m not sure what you mean with “use the joining collector twice on a same stream”. If you want to use the joining collector for the summary element as well, you have to concat it as stream before collecting:
String acc = Stream.concat(
groupByDrink.entrySet().stream()
.map(ite -> ite.getKey().code()+":"+ite.getValue()),
Stream.of("Money:" + groupByDrink.entrySet().stream()
.map(ite -> ite.getKey().price().multiply(valueOf(ite.getValue())))
.reduce(ZERO, BigDecimal::add).toString())
).collect(joining(" - "));

Related

Execution time reactive programming

Is this an ideal way to find execution time of method (getFavouriteDetails()), in reactive programming ?
public List<Favourites> getFavouriteDetails(String userId){
userService.getFavorites(userId)
.flatMap(favoriteService::getDetails)
.switchIfEmpty(suggestionService.getSuggestions())
.take(5)
.publishOn(UiUtils.uiThreadScheduler())
.subscribe(uiList::show, UiUtils::errorPopup)
.flatMap(a -> Mono.subscriberContext().map(ctx -> {
log.info("Time taken : " + Duration.between(ctx.get(key), Instant.now()).toMillis() + " milliseconds.");
return a;
}))
.subscriberContext(ctx -> ctx.put(key, Instant.now()))
}
Two approaches to ensure that you only measure execution time when you subscribe -
Wrap a Mono around the Flux using flatMapMany. This returns a Flux as well.
Use an AtomicReference, set time in onSubscribe and log elapsed time in doFinally.
Sample code -
timeFluxV1(getFavouriteDetails(userId)).subscribe(uiList::show, UiUtils::errorPopup);
timeFluxV1(getFavouriteDetails(userId)).subscribe(uiList::show, UiUtils::errorPopup);
private <T> Flux<T> timeFluxV1(Flux<T> flux) {
return Mono.fromSupplier(System::nanoTime)
.flatMapMany(time -> flux.doFinally(sig -> log.info("Time taken : " + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - time) + " milliseconds.")));
}
private <T> Flux<T> timeFluxV2(Flux<T> flux) {
AtomicReference<Long> startTime = new AtomicReference<>();
return flux.doOnSubscribe(x -> startTime.set(System.nanoTime()))
.doFinally(x -> log.info("Time taken : " + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime.get()) + " milliseconds."));
}
public Flux<Favourites> getFavouriteDetails(String userId) {
return userService.getFavorites(userId)
.flatMap(favoriteService::getDetails)
.switchIfEmpty(suggestionService.getSuggestions())
.take(5)
.publishOn(UiUtils.uiThreadScheduler());
}
To time a method, the most basic way in Java is to use long System.nanoTime(). Instant and System.currentTimeMillis are for wall-clock operations and are not guaranteed to be monotonous nor precise enough...
In Reactor, to measure the time a sequence takes to complete, you would usually need to start the timing on subscription (nothing happens until you subscribe) and stop the timing within a doFinally (which execute some code on the side of the main sequence whenever it completes, errors or is cancelled).
Here however you are subscribing yourself, so there is no risk to be multiple subscriptions. You can thus do away with the "start timing on subscription" constraint.
It gives us something like this:
public List<Favourites> getFavouriteDetails(String userId){
final long start = System.nanoTime();
userService.getFavorites(userId)
.flatMap(favoriteService::getDetails)
.switchIfEmpty(suggestionService.getSuggestions())
.take(5)
.publishOn(UiUtils.uiThreadScheduler())
.doFinally(endType -> log.info("Time taken : " + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start) + " milliseconds."))
.subscribe(uiList::show, UiUtils::errorPopup);
//return needed!
}
Note that there is also a elapsed() operator, which measures the time between subscription and 1st onNext, then between subsequent onNexts. It outputs a Flux<Tuple2<Long, T>>, and you could aggregate the longs to get overall timing, but that would lose you the "realtime" nature of Ts in that case.

F# Type Provider slows down intellisense in Visual Studio 2017

I have a very simple type provider; all types are erased, the provided type has 2000 int readonly properties Tag1..Tag2000
let ns = "MyNamespace"
let asm = Assembly.GetExecutingAssembly()
let private newProperty t name getter isStatic = ProvidedProperty(name, t, getter, isStatic = isStatic)
let private newStaticProperty t name getter = newProperty t name (fun _ -> getter) true
let private newInstanceProperty t name getter = newProperty t name (fun _ -> getter) false
let private addStaticProperty t name getter (``type``:ProvidedTypeDefinition) = ``type``.AddMember (newStaticProperty t name getter); ``type``
let private addInstanceProperty t name getter (``type``:ProvidedTypeDefinition) = ``type``.AddMember (newInstanceProperty t name getter); ``type``
[<TypeProvider>]
type TypeProvider(config : TypeProviderConfig) as this =
inherit TypeProviderForNamespaces(config)
let provider = ProvidedTypeDefinition(asm, ns, "Provider", Some typeof<obj>, hideObjectMethods = true)
let tags = ProvidedTypeDefinition(asm, ns, "Tags", Some typeof<obj>, hideObjectMethods = true)
do [1..2000] |> Seq.iter (fun i -> addInstanceProperty typeof<int> (sprintf "Tag%d" i) <## i ##> tags |> ignore)
do provider.DefineStaticParameters([ProvidedStaticParameter("Host", typeof<string>)], fun name args ->
let provided = ProvidedTypeDefinition(asm, ns, name, Some typeof<obj>, hideObjectMethods = true)
addStaticProperty tags "Tags" <## obj() ##> provided |> ignore
provided
)
do this.AddNamespace(ns, [provider; tags])
Then a test project with two modules in separate files:
module Common
open MyNamespace
type Provided = Provider<"">
let providedTags = Provided.Tags
type LocalTags() =
member this.Tag1 with get() : int = 1
member this.Tag2 with get() : int = 2
.
.
member this.Tag1999 with get() : int = 1999
member this.Tag2000 with get() : int = 2000
let localTags = LocalTags()
module Tests
open Common
open Xunit
[<Fact>]
let ProvidedTagsTest () =
Assert.Equal<int>(providedTags.Tag1001, 1001)
[<Fact>]
let LocalTagsTest () =
Assert.Equal<int>(localTags.Tag100, 100)
Everything works as expected (tests execution included). The problem I have is with the design time behavior inside Visual Studio, while I write code. I expect to have some overhead due to the type provider, but the slowness seems frankly excessive. The times reported below are in seconds and refer to the time measured from pushing the dot (.) key until the intellisense property list appears on the screen
providedTags. -> 15
localTags. -> 5
If I comment out or remove the first test code lines (so to eliminate any references to the provided stuff), then I get
localTags. -> immediate
If the number of properties is greater, the time seems to increase exponentially, not linearly, so that at 10000 it becomes minutes.
Questions are:
Is this normal or am I doing something wrong?
Are there guidelines to achieve a faster response?
If someone is curious about why I need so many properties, I am trying to supply an instrument to data analysts so that they can write F# scripts and get data out of an historian database with more than 10000 tags in its schema.
Issue has been fixed by Don Syme, see
https://github.com/fsprojects/FSharp.TypeProviders.SDK/issues/220
and
https://github.com/fsprojects/FSharp.TypeProviders.SDK/pull/229

Reactor use doOnNext multiple times

I was trying to change a little reactor samples from here and I'm a little confused with the behaviour I get.
So first I code like this:
EmitterProcessor<String> stream = EmitterProcessor.<String>create().connect();
Flux<String> flux = stream
.doOnNext(s -> System.out.println("1 " + s))
.doOnNext(s -> System.out.println("2 " + s));
flux.subscribe();
stream.onNext("Hello");
This code prints two lines as expected:
1 Hello
2 Hello
But if I add an intermediate varaible pretending I get it from some method or for readability the code starts to behave differently.
EmitterProcessor<String> stream = EmitterProcessor.<String>create().connect();
Flux<String> flux = stream
.doOnNext(s -> System.out.println("1 " + s));
flux .doOnNext(s -> System.out.println("2 " + s));
flux.subscribe();
stream.onNext("Hello");
So for the code above I get only one line, that is:
1 Hello
Can anybody explain this behaviour?
Thanks to Stephane Maldini I realised that Flux is immutable and each operation produces different flows.
Discussion is here

ActiveState Perl 5.14 fails to compare certain values?

Basically, using the following code on a file stream, I get the following:
$basis = $2 * 1.0;
$cost = ($basis - 2500.0) ** 1.05;
# The above should ensure that both cost & basis are floats
printf " %f -> %f", $basis, $cost;
if ($basis gt $cost) { # <- *** THIS WAS MY ERROR: gt forces lexical!
$cost = $basis;
printf " -> %f", $cost;
}
Outputs:
10667.000000 -> 12813.438340
30667.000000 -> 47014.045519
26667.000000 -> 40029.842300
66667.000000 -> 111603.373367 -> 66667.000000
8000.000000 -> 8460.203780
10667.000000 -> 12813.438340
73333.000000 -> 123807.632158 -> 73333.000000
6667.000000 -> 6321.420427 -> 6667.000000
80000.000000 -> 136071.379474 -> 80000.000000
As you can see, for most values, the code appears to work fine.
But for some values.... 66667, 80000, and a few others, ActivePerl 5.14 tells me that 66667 > 1111603!!!
Does anyone know anything about this - or have an alternate Perl interpreter I might use (Windows). Because this is ridiculous.
You are using a lexical comparison instead of the numerical one
$cost = ($basis - 2500.0) ** 1.05;
printf " %f -> %f", $basis, $cost;
if ($basis > $cost) {
$cost = $basis;
printf " -> %f", $cost;
}
ps: revised to match the updated question
The first few chapters of Learning Perl will clear this up for you. Scalar values are can be either strings or numbers, or both at the same time. Perl uses the operator to decide how to treat them. If you want to do numeric comparisons, you use the numeric comparison operators. If you want to do string comparisons, you use the string comparison operators.
The scalar values themselves don't have a type, despite other answers and comments using words like "float" and "cast". It's just strings and numbers.
Not sure why you need to compare as lexical, but you can force it using sprintf
$basis_real = sprintf("%015.6f",$basis);
$cost_real = sprintf("%015.6f",$cost);
printf " %s -> %s", $basis_real, $cost_real;
if ($basis_real gt $cost_real) {
$cost = $basis;
printf " -> %015.6f", $cost;
}
Output:
00010667.000000 -> 00012813.438340
00030667.000000 -> 00047014.045519
00026667.000000 -> 00040029.842300
00066667.000000 -> 00111603.373367
00008000.000000 -> 00008460.203780
00010667.000000 -> 00012813.438340
00073333.000000 -> 00123807.632158
00006667.000000 -> 00006321.420427 -> 00006667.000000
00080000.000000 -> 00136071.379474
The reason it was failing as you noted, the lexical compare does character to character compare, so when it hits the decimal point in 6667. it actually is alphabetically before 111603. , so it is greater.
To fix this, you must make all the numbers the same size, especially where the decimal lines up. The %015 is the total size of number, including the period and the decimals.

How to achieve Asynchrony instead of Parallelism in F#

(Sticking to a common example with async fetch of many web pages)
How would I spin off multiple (hundreds) of web page requests asynchronously, and then wait for all requests to complete before going to the next step? Async.AsParallel processes a few requests at a time, controlled by number of cores on the CPU. Grabbing a web page is not a CPU-bound operation. Not satisfied with the speedup of Async.AsParallel, I am looking for alternatives.
I tried to connect the dots between Async.StartAsTask and Task[].WaitAll. Instinctively, I wrote the following code, but it does not compile.
let processItemsConcurrently (items : int seq) =
let tasks = items |> Seq.map (fun item -> Async.StartAsTask(fetchAsync item))
Tasks.Task.WaitAll(tasks)
How would you approach this?
Async.Parallel is almost definitely right here. Not sure what you're not happy with; the strength of F# asyncs lies more in async computing than in task-parallel CPU-bound stuff (which is more tailored to Tasks and the .NET 4.0 TPL). Here's a full example:
open System.Diagnostics
open System.IO
open System.Net
open Microsoft.FSharp.Control.WebExtensions
let sites = [|
"http://bing.com"
"http://google.com"
"http://cnn.com"
"http://stackoverflow.com"
"http://yahoo.com"
"http://msdn.com"
"http://microsoft.com"
"http://apple.com"
"http://nfl.com"
"http://amazon.com"
"http://ebay.com"
"http://expedia.com"
"http://twitter.com"
"http://reddit.com"
"http://hulu.com"
"http://youtube.com"
"http://wikipedia.org"
"http://live.com"
"http://msn.com"
"http://wordpress.com"
|]
let print s =
// careful, don't create a synchronization bottleneck by printing
//printf "%s" s
()
let printSummary info fullTimeMs =
Array.sortInPlaceBy (fun (i,_,_) -> i) info
// for i, size, time in info do
// printfn "%2d %7d %5d" i size time
let longest = info |> Array.map (fun (_,_,time) -> time) |> Array.max
printfn "longest request took %dms" longest
let bytes = info |> Array.sumBy (fun (_,size,_) -> float size)
let seconds = float fullTimeMs / 1000.
printfn "sucked down %7.2f KB/s" (bytes / 1024.0 / seconds)
let FetchAllSync() =
let allsw = Stopwatch.StartNew()
let info = sites |> Array.mapi (fun i url ->
let sw = Stopwatch.StartNew()
print "S"
let req = WebRequest.Create(url)
use resp = req.GetResponse()
use stream = resp.GetResponseStream()
use reader = new StreamReader(stream,
System.Text.Encoding.UTF8, true, 4096)
print "-"
let contents = reader.ReadToEnd()
print "r"
i, contents.Length, sw.ElapsedMilliseconds)
let time = allsw.ElapsedMilliseconds
printSummary info time
time, info |> Array.sumBy (fun (_,size,_) -> size)
let FetchAllAsync() =
let allsw = Stopwatch.StartNew()
let info = sites |> Array.mapi (fun i url -> async {
let sw = Stopwatch.StartNew()
print "S"
let req = WebRequest.Create(url)
use! resp = req.AsyncGetResponse()
use stream = resp.GetResponseStream()
use reader = new AsyncStreamReader(stream, // F# PowerPack
System.Text.Encoding.UTF8, true, 4096)
print "-"
let! contents = reader.ReadToEnd() // in F# PowerPack
print "r"
return i, contents.Length, sw.ElapsedMilliseconds })
|> Async.Parallel
|> Async.RunSynchronously
let time = allsw.ElapsedMilliseconds
printSummary info time
time, info |> Array.sumBy (fun (_,size,_) -> size)
// By default, I think .NET limits you to 2 open connections at once
ServicePointManager.DefaultConnectionLimit <- sites.Length
for i in 1..3 do // to warmup and show variance
let time1,r1 = FetchAllSync()
printfn "Sync took %dms, result was %d" time1 r1
let time2,r2 = FetchAllAsync()
printfn "Async took %dms, result was %d (speedup=%2.2f)"
time2 r2 (float time1/ float time2)
printfn ""
On my 4-core box, this consistently gives a nearly 4x speedup.
EDIT
In reply to your comment, I've updated the code. You're right in that I've added more sites and am not seeing the expected speedup (still holding steady around 4x). I've started adding a little debugging output above, will continue investigating to see if something else is throttling the connections...
EDIT
Editted the code again. Well, I found what might be the bottleneck. Here's the implementation of AsyncReadToEnd in the PowerPack:
type System.IO.StreamReader with
member s.AsyncReadToEnd () =
FileExtensions.UnblockViaNewThread (fun () -> s.ReadToEnd())
In other words, it just blocks a threadpool thread and reads synchronously. Argh!!! Let me see if I can work around that.
EDIT
Ok, the AsyncStreamReader in the PowerPack does the right thing, and I'm using that now.
However, the key issue seems to be variance.
When you hit, say, cnn.com, a lot of the time the result will come back in like 500ms. But every once in a while you get that one request that takes 4s, and this of course potentially kills the apparent async perf, since the overall time is the time of the unluckiest request.
Running the program above, I see speedups from about 2.5x to 9x on my 2-core box at home. It is very highly variable, though. It's still possible there's some bottleneck in the program that I've missed, but I think the variance-of-the-web may account for all of what I'm seeing at this point.
Using the Reactive Extensions for .NET combined with F#, you can write a very elegant solution - check out the sample at http://blog.paulbetts.org/index.php/2010/11/16/making-async-io-work-for-you-reactive-style/ (this uses C#, but using F# is easy too; the key is using the Begin/End methods instead of the sync method, which even if you can make it compile, it will block up n ThreadPool threads unnecessarily, instead of the Threadpool just picking up completion routines as they come in)
My bet is that the speedup you're experiencing is not significant enough for your taste because you're either using a subtype of WebRequest or a class relying on it (such as WebClient).
If that's the case, you need to set the MaxConnection on the ConnectionManagementElement (and I suggest you only set it if needed otherwise it's gonna become a pretty time-consuming operation) to a high value, depending on the number of simultaneous connections you wanna initiate from your application.
I'm not an F# guy, but from a pure .NET perspective what you're looking for is TaskFactory::FromAsync where the asynchronous call you'd be wrapping in a Task would be something like HttpRequest::BeginGetResponse. You could also wrap up the EAP model that WebClient exposes using a TaskCompletionSource. More on both of these topics here on MSDN.
Hopefully with this knowledge you can find the nearest native F# approach to accomplish what you're trying to do.
Here's some code that avoids the unknowns, such as web access latency. I am getting under 5% CPU utilization, and about 60-80% efficiency for both sync and async code paths.
open System.Diagnostics
let numWorkers = 200
let asyncDelay = 50
let main =
let codeBlocks = [for i in 1..numWorkers ->
async { do! Async.Sleep asyncDelay } ]
while true do
printfn "Concurrent started..."
let sw = new Stopwatch()
sw.Start()
codeBlocks |> Async.Parallel |> Async.RunSynchronously |> ignore
sw.Stop()
printfn "Concurrent in %d millisec" sw.ElapsedMilliseconds
printfn "efficiency: %d%%" (int64 (asyncDelay * 100) / sw.ElapsedMilliseconds)
printfn "Synchronous started..."
let sw = new Stopwatch()
sw.Start()
for codeBlock in codeBlocks do codeBlock |> Async.RunSynchronously |> ignore
sw.Stop()
printfn "Synchronous in %d millisec" sw.ElapsedMilliseconds
printfn "efficiency: %d%%" (int64 (asyncDelay * numWorkers * 100) / sw.ElapsedMilliseconds)
main

Resources