Convert infinite stream of finite streams to an infinite stream - Reactive X

How can the following be achieved in ReactiveX (ideally with examples in RxJava or RxJS)?
a  |-a-------------------a-----------a-----------a----
s1 |-x-x-x-x-x-x-| (subscribe)
s2 |-x-x-x-x-x-| (subscribe)
s3 |-x-x-x-x-x-| (subscribe)
...
sn
S  |-x-x-x-x-x-x-x-------x-x-x-x-x-x-x-------------x-x-x-x-x-x- (subscribe)
a is an infinite stream of events, each of which triggers a finite stream sn. The events of every sn should become part of the infinite stream S. I also need to be able to subscribe to each sn individually (in order to do summation operations) while keeping S infinite.
EDIT: To be more concrete I provide the implementation of what I am looking for in Kotlin.
Every 10 seconds an event is emitted, which maps to a shared finite stream of 4 events. The metastream is flatMap-ed into a normal infinite stream. I use doAfterNext to additionally subscribe to each finite stream and print out the results.
/** Creates a finite stream with the events
 *  $ch-0 .. $ch-3
 */
fun createFinite(ch: Char): Observable<String> =
    Observable.interval(1, TimeUnit.SECONDS)
        .take(4)
        .map({ "$ch-$it" }).share()

fun main(args: Array<String>) {
    var ch = 'A'
    Observable.interval(10, TimeUnit.SECONDS).startWith(0)
        .map { createFinite(ch++) }
        .doAfterNext {
            it
                .count()
                .subscribe({ c -> println("I am done. Total event count is $c") })
        }
        .flatMap { it }
        .subscribe { println("Just received [$it] from the infinite stream ") }
    // Let main thread wait forever
    CountDownLatch(1).await()
}
However I am not sure if this is the 'pure RX' way.

You don't make clear how you want to do the counting. If you are doing a total count, then there is no need to do the interior subscription:
val counter = AtomicLong()
Observable.interval(10, TimeUnit.SECONDS).startWith(0)
    .map { createFinite(ch++) }
    .flatMap { it }
    // doOnNext takes a lambda; it runs once per event
    .doOnNext { counter.incrementAndGet() }
    .subscribe { println("Just received [$it] from the infinite stream ") }
On the other hand, if you need to provide a count for each intermediate observable, then you can move the counting inside the flatMap() and print out the count and reset it on completion:
val counter = AtomicLong()
Observable.interval(10, TimeUnit.SECONDS).startWith(0)
    .map { createFinite(ch++) }
    .flatMap {
        it
            .doOnNext { counter.incrementAndGet() }
            // doOnComplete in RxJava 2+; RxJava 1.x calls it doOnCompleted
            .doOnComplete {
                val ctr = counter.getAndSet(0)
                println("I am done. Total event count is $ctr")
            }
    }
    .subscribe { println("Just received [$it] from the infinite stream ") }
This isn't very functional in style, since it relies on a shared mutable counter, but this kind of reporting tends to break normal streams.

Related

"go func" recursion vs. for loop performance/patterns

I'm writing a socket handler, and I thought of two ways to write individual synchronous event handlers (events of same type must be received in order):
For loop
loop:
    for {
        var packet EventType
        select {
        case packet = <-eventChannel:
        case <-stop:
            break loop // a bare break would only leave the select
        }
        // Logic
    }
go func Recursion
func GetEventType() {
    var packet EventType
    select {
    case packet = <-eventChannel:
    case <-stop:
        return
    }
    // Logic
    go GetEventType()
}
I know that looping is almost always more efficient than recursing, but I couldn't find much on the performance of go func relative to alternatives. Here's my initial take on each method:
For loop:
- Doesn't start a new goroutine on each call
- Doesn't use the call stack
- Good pattern
go func Recursion:
- Clean
- Doesn't require an anonymous function to use defer
- Isolated access (data-hiding)
Are there any other reasons to use one over the other? Is method #2 an anti-pattern? Could method #2 cause a major slow-down (call stack?) under high throughput?
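One way to keep the plain loop while still getting a natural place for defer and isolated state is to move the loop into a named function. The following is a minimal sketch under that assumption, not a benchmark of either method; EventType, handleEvents and the handle callback are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// EventType is a hypothetical stand-in for the packet type in the question.
type EventType struct{ ID int }

// handleEvents processes events one at a time, in arrival order, inside the
// calling goroutine. It returns when stop is closed or events is closed.
func handleEvents(events <-chan EventType, stop <-chan struct{}, handle func(EventType)) {
	// A named function gives defer a home without an anonymous func.
	defer fmt.Println("handler stopped")
	for {
		select {
		case packet, ok := <-events:
			if !ok {
				return // events channel closed: nothing more to process
			}
			handle(packet)
		case <-stop:
			return // return leaves the loop, unlike a bare break in select
		}
	}
}

func main() {
	events := make(chan EventType, 3)
	for i := 1; i <= 3; i++ {
		events <- EventType{ID: i}
	}
	close(events)

	handleEvents(events, make(chan struct{}), func(e EventType) {
		fmt.Println("handled", e.ID)
	})
}
```

Because a single goroutine reads the channel, events of the same type are handled strictly in the order they were received.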

Limit the number of processed messages from channel

I receive around 200,000 messages per second over a channel to my worker, and I need to limit the number of messages I send to the client to only 20 per second.
That makes 1 message per 50 milliseconds.
The worker stays alive for the whole program lifetime thanks to the loop (rather than opening a channel for each message).
My goal:
- Since the order of the messages is important, I want to skip all messages that arrive during the blocked 50 ms and keep only the latest one.
- If the latest one arrives during the blocked 50 ms, I want the saved message to be processed inside the loop once the block time is over, even if no new message arrives. <-- This is my problem
My strategy
- Keep sending the latest message that has not yet been processed back to the same channel.
But the problem with this: what if that message is sent after a new message arrives (from the application)?
The code below is more of an algorithm than working code; I just want a tip on how to do it.
func example (new_message_from_channel <-chan *message) {
    default = message
    time = now_milliseconds
    diff_accepted = 50milli
    for this_message := range new_message_from_channel {
        if now_millisecond - time >= diff_accepted {
            send_it_to_the_client
            time = now_milliseconds
        } else {
            //save the latest message
            default = this_message
            //My problem is how to process this latest message when the blocked 50ms is over and no new message coming ?!
            //My strategy - keep sending it to the same channel
            theChannel <- default
        }
    }
}
If you have an elegant way to do it, you are welcome to share it with me :)

Using a rate-limiter, you can create a throttle function that takes a rate and a channel as input and returns two channels: one that carries all of the original channel's items, and another that only relays items at a fixed rate:
import (
    "time"

    "golang.org/x/time/rate"
)

func throttle(r time.Duration, in <-chan event) (C, tC <-chan event) {
    // "writeable" channels
    var (
        wC  = make(chan event)
        wtC = make(chan event)
    )
    // read-only channels - returned to caller
    C = wC
    tC = wtC
    go func() {
        defer close(wC)
        defer close(wtC)
        rl := rate.NewLimiter(
            rate.Every(r),
            1,
        )
        // relays input channel's items to two channels:
        // (1) gets all writes from original channel
        // (2) only writes at a fixed frequency
        for ev := range in {
            wC <- ev
            if rl.Allow() {
                wtC <- ev
            }
        }
    }()
    return
}
Working example: https://play.golang.org/p/upei0TiyzNr
EDIT:
To avoid using a rate-limiter and instead use a simple time.Ticker:
tick := time.NewTicker(r)
defer tick.Stop() // release the ticker when the goroutine exits
for ev := range in {
    select {
    case wC <- ev: // write to main
    case <-tick.C:
        wC <- ev  // write to main ...
        wtC <- ev // ... plus throttle channel
    }
}
Working example: https://play.golang.org/p/UTRXh72BvRl
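The question's second goal, processing the saved message once the 50 ms window is over even when nothing new arrives, can be sketched by remembering the most recent message and forwarding it on each tick. This is a minimal sketch, not the answer's implementation: the message type with a Seq field and the latestEvery function are hypothetical names, and it assumes that flushing the last saved message when the input closes is acceptable.

```go
package main

import (
	"fmt"
	"time"
)

// message is a hypothetical stand-in for the question's message type.
type message struct{ Seq int }

// latestEvery reads everything from in but forwards only the most recently
// received message, once per tick of the given interval. Messages that arrive
// between ticks overwrite each other, so older ones are skipped.
func latestEvery(interval time.Duration, in <-chan message) <-chan message {
	out := make(chan message)
	go func() {
		defer close(out)
		tick := time.NewTicker(interval)
		defer tick.Stop()

		var latest message
		pending := false // true while latest has not been forwarded yet
		for {
			select {
			case m, ok := <-in:
				if !ok {
					if pending {
						out <- latest // flush the last saved message
					}
					return
				}
				latest, pending = m, true
			case <-tick.C:
				if pending {
					out <- latest
					pending = false
				}
			}
		}
	}()
	return out
}

func main() {
	in := make(chan message)
	go func() {
		defer close(in)
		for i := 1; i <= 1000; i++ {
			in <- message{Seq: i}
		}
	}()
	for m := range latestEvery(50*time.Millisecond, in) {
		fmt.Println("sent to client:", m.Seq)
	}
}
```

Because the saved message lives in a local variable rather than being pushed back into the input channel, it can never be reordered behind a newer message. Note that the send on out blocks while the client is slow, and the input is not drained during that time.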

"Recycling" items in iterators for better performance

I have a file that contains multiple instances of some complex data type (think of a trajectory of events). The API to read this file is written in C and I don't have much control over it. To expose it to Rust, I implemented the following interface:
// a single event read from the file
struct Event {
    a: u32,
    b: f32,
}

// A handle to the file used for I/O
struct EventFile;

impl EventFile {
    fn open() -> Result<EventFile, Error> {
        unimplemented!()
    }

    // read the next step of the trajectory into event
    fn read(&self, event: &mut Event) -> Result<(), Error> {
        event.a = unimplemented!();
        event.b = unimplemented!();
    }
}
To access the file contents, I could call the read function until it returns an Err similar to this:
let event_file = EventFile::open();
let mut event = Event::new();
let mut result = event_file.read(&mut event);
while let Ok(_) = result {
    println!("{:?}", event);
    result = event_file.read(&mut event);
}
Because event is reused for each call of read, there's no repeated allocation/deallocation of memory which hopefully results in some performance improvement (the event struct is much bigger in the actual implementation).
Now, it would be nice to be able to access this data through an iterator. However, to my understanding, this means that I have to create a new instance of Event each time the iterator yields, because I cannot reuse the event inside the iterator. And this will hurt the performance:
struct EventIterator {
    event_file: EventFile,
}

impl Iterator for EventIterator {
    type Item = Event;

    fn next(&mut self) -> Option<Event> {
        let mut event = Event::new(); // costly allocation
        let result = self.event_file.read(&mut event);
        match result {
            Ok(_) => Some(event),
            Err(_) => None,
        }
    }
}

let it = EventIterator { event_file };
it.map(|event| unimplemented!())
Is there a way to somehow "recycle" or "reuse" events inside the iterator? Or is this a concept that is simply not transferable to Rust and I have to live with worse performance using iterators in this case?

You can "recycle" items between iterations by wrapping the Item in a reference counter. The idea here is that if the caller keeps the item around between iterations, the iterator allocates a new object and returns that new object. If the item is dropped by the caller before the next iteration begins, the item is recycled. This is ensured by std::rc::Rc::get_mut(), which will only return a reference if the reference-count is exactly 1.
This has the downside that your Iterator yields Rc<Foo> instead of Foo. There is also the added code-complexity and (maybe) some runtime-cost due to the reference-counting (which may get elided completely if the compiler can prove that).
You will, therefore, need to measure if this actually gets you a performance win. Allocating a new object on every single iteration may seem costly, but allocators are good at this...
Something to the tune of
use std::rc::Rc;

#[derive(Default)]
struct FoobarIterator {
    item: Rc<String>,
}

impl Iterator for FoobarIterator {
    type Item = Rc<String>;

    fn next(&mut self) -> Option<Self::Item> {
        let item = match Rc::get_mut(&mut self.item) {
            Some(item) => {
                // This path is only taken if the caller
                // did not keep the item around
                // so we are the only reference-holder!
                println!("Item is re-used!");
                item
            },
            None => {
                // Let go of the item (the caller gets to keep it)
                // and create a new one
                println!("Creating new item!");
                self.item = Rc::new(String::new());
                Rc::get_mut(&mut self.item).unwrap()
            }
        };
        // Create the item, possibly reusing the same allocation...
        item.clear();
        item.push('a');
        Some(Rc::clone(&self.item))
    }
}

fn main() {
    // This will only print "Item is re-used"
    // because `item` is dropped before the next cycle begins
    for item in FoobarIterator::default().take(5) {
        println!("{}", item);
    }
    // This will allocate new objects every time
    // because the Vec retains ownership.
    let _: Vec<_> = FoobarIterator::default().take(5).collect();
}

The compiler (or LLVM) will most likely employ return value optimization in this case, so you do not need to prematurely optimize by yourself.
See this Godbolt example, particularly lines 43 to 47. My comprehension of Assembly is limited, but it seems that next() simply writes the Event value to the memory passed by the caller via a pointer (initially in rdi). In subsequent loop iterations this memory place can be reused.
Note that you get a much longer assembly output (which I did not analyze in depth) if you compile without the -O flag (e.g. when building in the "debug" mode as opposed to "release").

A small change to producer consumer semaphores order

This is the classic producer consumer problem solution:
semaphore mutex = 1
semaphore fillCount = 0
semaphore emptyCount = BUFFER_SIZE

procedure producer() {
    while (true) {
        item = produceItem()
        down(emptyCount)
        down(mutex)
        putItemIntoBuffer(item)
        up(mutex)
        up(fillCount)
    }
}

procedure consumer() {
    while (true) {
        down(fillCount)
        down(mutex)
        item = removeItemFromBuffer()
        up(mutex)
        up(emptyCount)
        consumeItem(item)
    }
}
My question is: what are the implications of swapping the order of the last two semaphore operations in the consumer or the producer?
For example if the code of the consumer becomes:
procedure consumer() {
    while (true) {
        down(fillCount)
        down(mutex)
        item = removeItemFromBuffer()
        up(emptyCount)
        up(mutex)
        consumeItem(item)
    }
}

In this simple example, the outcome is the same and you will not experience starvation or deadlock (even if the second version of consumer is definitely a bad practice).
Consider a more complicated situation (it should not happen) where you have the mistaken consumer and you have some code between up(emptyCount) and up(mutex), like this:
up(emptyCount)
// time-consuming code
up(mutex)
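In this case, a producer woken by up(emptyCount) immediately blocks on down(mutex) until the consumer finishes the time-consuming code, so producers end up waiting for the release of the mutex longer than necessary.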
Now consider the even worse situation where you have to deal with another semaphore between those 2 instructions: in the worst case it could lead to starvation.
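To see that the swapped order still produces the same result in the simple case, the pseudocode can be sketched in Go. This is an illustrative sketch under stated assumptions: buffered channels stand in for the counting semaphores, and the semaphore type, newSemaphore and run are hypothetical names that do not appear in the question. The consumer uses the swapped order, up(emptyCount) before up(mutex).

```go
package main

import (
	"fmt"
	"sync"
)

// semaphore is a counting semaphore built on a buffered channel:
// the number of queued tokens is the semaphore's value.
type semaphore chan struct{}

func newSemaphore(capacity, initial int) semaphore {
	s := make(semaphore, capacity)
	for i := 0; i < initial; i++ {
		s <- struct{}{}
	}
	return s
}

func (s semaphore) down() { <-s }
func (s semaphore) up()   { s <- struct{}{} }

// run pushes the values 0..n-1 through a bounded buffer and returns them in
// the order the consumer removed them.
func run(n, bufferSize int) []int {
	mutex := newSemaphore(1, 1)
	fillCount := newSemaphore(bufferSize, 0)
	emptyCount := newSemaphore(bufferSize, bufferSize)
	var buffer []int

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // producer, unchanged from the classic solution
		defer wg.Done()
		for item := 0; item < n; item++ {
			emptyCount.down()
			mutex.down()
			buffer = append(buffer, item)
			mutex.up()
			fillCount.up()
		}
	}()

	consumed := make([]int, 0, n)
	for i := 0; i < n; i++ { // consumer with the swapped order
		fillCount.down()
		mutex.down()
		item := buffer[0]
		buffer = buffer[1:]
		emptyCount.up() // released while still holding the mutex
		mutex.up()
		consumed = append(consumed, item)
	}
	wg.Wait()
	return consumed
}

func main() {
	fmt.Println(run(10, 3))
}
```

With one producer and one consumer, every item is consumed exactly once and in order, which matches the claim that the swap alone causes neither deadlock nor starvation here. The cost only shows up when extra work sits between the two up calls.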

retryWhen with timer appears to subvert merge behavior

I'm using RxJava in an asynchronous messaging environment (vert.x), and so have a flow that looks like this:
Observable.defer( () -> getEndpoint() )
    .mergeWith( getCancellationMessage() )
    .flatMap( endpoint -> useEndpoint( endpoint ) )
    .retryWhen( obs -> obs.flatMap( error -> {
        if ( wasCancelled( error ) ) {
            return Observable.error( error );
        }
        return Observable.timer(/* args */);
    }) )
    .subscribe( result -> useResult( result ),
                error -> handleError( error )
    );
The implementation of getCancellationMessage() returns an observable stream that emits an error whenever a cancellation message has been received from an independent message source. This stream never emits anything other than Observable.error(), and it only emits an error when it receives a cancellation message.
If I understand how merge works, the entire chain should be terminated via onError when getCancellationMessage() emits an error.
However, I am finding that if the retryWhen operator is waiting for the timer to emit when a cancellation message is received, the error is ignored and the retryWhen loop continues as if the cancellation was never received.
I can fix the behavior by merging Observable.timer() with the same getCancellationMessage() function, but I'm not understanding why I have to do that in the first place.
Is this merge/retryWhen interaction expected?
Edits:
Below is an example of the kind of thing that the getCancellationMessage() function is doing:
Observable<T> getCancellationMessage() {
    if ( this.messageStream == null ) {
        this.messageStream = this.messageConsumer.toObservable()
            .flatMap( message -> {
                this.messageConsumer.unregister();
                if ( isCancelMessage(message) ) {
                    return Observable.error( new CancelError() );
                }
                else {
                    return Observable.error( new FatalError() );
                }
            });
    }
    return this.messageStream;
}
Note that I don't own the implementation of this.messageConsumer - this comes from the third party library I'm using (vert.x), so I don't control the implementation of that Observable.
As I understand it, the messageConsumer.toObservable() method returns the result of Observable.create() provided with an instance of this class, which will call the subscriber's onNext method whenever a new message has arrived.
The call to messageConsumer.unregister() prevents any further messages from being received.

However, I am finding that if the retryWhen operator is waiting for the timer to emit when a cancellation message is received, the error is ignored and the retryWhen loop continues as if the cancellation was never received.
The retryWhen operator turns an upstream Throwable into a value and routes it through the sequence you provide; the response of that sequence decides whether to retry the upstream or end the stream. Thus:
Observable.error(new IOException())
    .retryWhen((Observable<Throwable> error) -> error)
    .subscribe();
This will retry indefinitely because the inner error is now treated as a value, not an exception.
retryWhen doesn't know by itself which of the error values it should treat as non-retryable; that's the job of your inner flow:
Observable.defer( () -> getEndpoint() )
    .mergeWith( getCancellationMessage() )
    .flatMap( endpoint -> useEndpoint( endpoint ) )
    .retryWhen( obs -> obs
        .takeWhile( error -> !( error instanceof CancellationException ) ) // <-------
        .flatMap( error -> Observable.timer(/* args */) )
    )
    .subscribe( result -> useResult( result ),
                error -> handleError( error )
    );
Here, we only let the error pass if it is not of type CancellationException (you can replace it with your error type). This will complete the sequence.
If you want the sequence to end with an error instead, we need to change the flatMap logic instead:
.retryWhen(obs -> obs
    .flatMap( error -> {
        if (error instanceof CancellationException) {
            return Observable.error(error);
        }
        return Observable.timer(/* args */);
    })
)
Note that returning Observable.empty() in flatMap doesn't end the sequence: it only indicates that one source to be merged is empty, while other inner sources may still be active. For retryWhen in particular, an empty() will hang the sequence indefinitely because there won't be any signal to indicate retry or end-of-sequence.
Edit:
Based on your wording, I assume getCancellationMessage() is a hot observable. Hot observables have to be observed in order to receive their events or errors. When the retryWhen operator is in its retry grace period due to timer(), there is nothing subscribed to the topmost mergeWith with the getCancellationMessage() and thus it can't stop the timer at that point.
You have to keep a subscription to it while the timer executes to stop it right away:
Observable<Object> cancel = getCancellationMessage();
Observable.defer( () -> getEndpoint() )
    .mergeWith( cancel )
    .flatMap( endpoint -> useEndpoint( endpoint ) )
    .retryWhen( obs -> obs
        .flatMap( error -> {
            if (error instanceof CancellationException) {
                return Observable.error(error);
            }
            return Observable.timer(/* args */).takeUntil( cancel );
        })
    )
    .subscribe( result -> useResult( result ),
                error -> handleError( error )
    );
In this case, if cancel fires while the timer is executing, retryWhen will stop the timer and terminate with the cancel error immediately.
Using takeUntil is one option; as you found out, merging the timer with cancel via mergeWith( cancel ) works as well.
