RxJava in trading engine - java-8

I'll write in pseudocode to avoid unnecessary boilerplate. I'm new to Rx, but I really want to use it instead of Futures and similar stuff...
This is a simplified model of a trading engine.
We have a market, which provides all the streams of events (observables) that one might be interested in, like
market.updateStream // stream of orderbook update events
We have so-called bots; those will subscribe to the required streams and "react" when any change occurs, like this
market.updateStream.subscribe(bot1.marketUpdateAction)
market.updateStream.subscribe(bot2.marketUpdateAction)
we might have MANY bots that will subscribe to ONE market and each of those will either start calculating or ignore the change event.
Now that we have N bots reacting to 1 event, we need someone to compare their calculations and decide which one is the most profitable. Also, if some bot is slowing down and exceeds the time limit we are ready to wait, we skip it and proceed to comparison and execution. For that we have a botController, which is subscribed to all bots' events so that it knows when a bot decided to react to an event, like this
bot1.calculationStream.subscribe(botController.botActivityAction)
The bot will in turn emit 2 different events (calculationStarted, and calculationEnded, which contains the actual result).
When a bot emits an event saying that it has started to calculate (this happens only if the market event is of interest to the bot, so not all bots will emit a start event), botController shall do the following: start counting time at the very first bot-started event and wait for all bots that emit a similar event; if the registered bots finish early, comparison starts immediately...
Sorry if the question is too abstract, but I don't really see how to implement botController's behaviour with RxJava... Any thoughts are appreciated... There are so many Rx transformations that I don't really know what I can use there.
UPDATE
Suppose our controller is subscribed to N bots' events and each bot can emit 2 events (STARTED, COMPLETED)...
Now, when the controller gets the first STARTED event, it starts the countdown T. While T has not expired it will accept new events from bots; when T has expired or all bots have returned a COMPLETED event, it does some calculation and returns a single result...
The part I don't understand: Rx, as far as I know, handles each event in isolation, hence the safety from typical concurrency problems. Now that I have several events that are tied to each other, I don't see how I can do this using Rx... I just need some guidance on this.

I don't fully understand your problem, but here are some design ideas to show you how to think the "Rx-way":
I wouldn't subscribe bots, but rather, they should be a map or flatMap on the update stream, so that they transform the stream of updates into a stream of their answers.
I'd make a BotAnswer class with 4 subclasses: Result, NotInterested, Timeout, BotError.
Then for each bot:
Observable<BotAnswer> bot1Answers = market.updateStream.flatMap(event ->
    Observable.just(event)
        // isInteresting(...) stands for the bot's own "event is interesting" check
        .<BotAnswer>map(e -> {
            if (isInteresting(e)) {
                return new Result(doBotCalculations());
            } else {
                return new NotInterested();
            }
        })
        .timeout(T, TimeUnit.SECONDS)
        // turn failures into ordinary answers so the stream keeps going
        .onErrorResumeNext(error ->
            error instanceof TimeoutException
                ? Observable.just(new Timeout())
                : Observable.just(new BotError(error))
        )
);
And the controller would do a zip on all bot answers:
Observable.zip(bot1Answers, bot2Answers, ..., (a1, a2, ...) -> {
    // compare the answers a1, a2, ... do some calculations, return result
})

Here is what I've come up with...
Each bot subscribes to market updates;
when the market is updated, the bot starts computing;
when it is finished, it emits a completion event;
if it gets a termination event from the supervisor, it stops computing; if it gets another market update, it terminates the current execution and starts over...
The supervisor subscribes to market updates too;
when it gets one, it expects computation results from all bots (no result is also a result :);
if some bot is taking too long, it ignores that bot by sending it a termination event (and counts that bot as returning no result);
as soon as it has the results of all the bots it is aware of, it starts comparing them and emits the final value to its subscriber;
if it gets another market update while executing, it terminates its computation and starts over (waiting for bots)...
Regarding implementation: Bot handles proper concurrent access to its own methods (start, terminate), and so does Supervisor. Using traditional synchronization, those objects ensure that no unsafe operation can be performed on them by concurrent execution, but the main flow of what triggers what and what happens around them is controlled by Rx...
Note that the timeout is controlled by the supervisor in this case...
Am I missing something here? Maybe I'm doing something wrong if I have to use traditional synchronization in conjunction with Rx?
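For comparison, the supervisor's core rule ("wait at most T for all bots, count slow or failed bots as no result, then pick the best") can be sketched with nothing but the JDK. This is not an Rx solution and not the code from the post: SupervisorSketch, bestAnswer and the use of Optional<Double> as a bot's "profit or nothing" answer are assumptions made up for illustration. It relies on ExecutorService.invokeAll with a timeout, which cancels tasks that are still running when the time is up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical supervisor: names and types are illustrative only.
final class SupervisorSketch {

    /** Runs every bot, waits at most timeoutMillis, returns the highest profit reported in time. */
    static Optional<Double> bestAnswer(List<Callable<Optional<Double>>> bots, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, bots.size()));
        try {
            // Returns when all bots are done or the timeout expired;
            // bots still running at that point are cancelled (interrupted).
            List<Future<Optional<Double>>> futures =
                    pool.invokeAll(bots, timeoutMillis, TimeUnit.MILLISECONDS);
            Optional<Double> best = Optional.empty();
            for (Future<Optional<Double>> f : futures) {
                Optional<Double> answer;
                try {
                    answer = f.get(); // every future is done or cancelled here, so no blocking
                } catch (CancellationException | ExecutionException e) {
                    answer = Optional.empty(); // too slow or failed: counts as "no result"
                }
                if (answer.isPresent() && (!best.isPresent() || answer.get() > best.get())) {
                    best = answer;
                }
            }
            return best;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        List<Callable<Optional<Double>>> bots = new ArrayList<>();
        bots.add(() -> Optional.of(1.5));   // fast bot
        bots.add(() -> Optional.empty());   // bot that is not interested
        bots.add(() -> {                    // bot that exceeds the time limit
            Thread.sleep(5_000);
            return Optional.of(99.0);
        });
        System.out.println(bestAnswer(bots, 200));
    }
}
```

In this sketch the bots themselves need no locking, because each one only returns a value and the comparison happens on a single thread after the wait. Restarting on a new market update would still have to be layered on top.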

Related

Kafka Stream - How to send an alert if no event has been received for a given key during some amount of time

I need to send an alert if no event has been received in a topic for a given key during some amount of time. What would be the best approach to solve this use case with Kafka Streams?
I tried:
1) a windowedBy together with a suppress operator:
stream
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofMillis(1000)).grace(Duration.ZERO))
    .count()
    .suppress(Suppressed.untilWindowCloses(unbounded()))
    .filter((k, v) -> v == 0)
    .toStream()
    .map((windowId, count) -> KeyValue.pair(windowId.key(), AlarmEvent.builder().build()))
    .to(ALARMS, Produced.with(Serdes.String(), AlarmEvent.serde()));
But it seems that the window will not close until an event occurs after the expiration, so no alarm can be sent exactly after the defined timeout.
2) Using the Processor API with a punctuator. It seems to work, but I only tested it with a TopologyTestDriver and advanceWallClockTime(). I'm not sure whether advanceWallClockTime() reflects real time advancing, or whether the time would only change upon event reception, thus falling back to the problem in 1).
3) If the punctuator works, I would like to use it in a ValueTransformer to benefit from the DSL topology. However, I am encountering the problem described in How to forward event downstream from a Punctuator instance in a ValueTransformer?. I cannot send an event downstream from the punctuator instance.
4) Finally, I had the idea to inject some dummy events on a regular basis (e.g. every second) for every partition so as to artificially force the inner clock to advance. This would allow me to use the clean and simple DSL window and suppress operators.
2) Using the Processor API with a punctuator. It seems to work, but I only tested it with a TopologyTestDriver and advanceWallClockTime(). I'm not sure whether advanceWallClockTime() reflects real time advancing, or whether the time would only change upon event reception, thus falling back to the problem in 1).
That is the right approach. As the name indicates, punctuations can be triggered based on wall-clock time (i.e., system time). TopologyTestDriver mocks wall-clock time for testing purposes, but KafkaStreams will use system time.
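The bookkeeping such a wall-clock punctuator would perform can be sketched in plain Java, without any Kafka Streams API. InactivityTracker, record and punctuate are hypothetical names, and the HashMap stands in for what would normally be a state store; the sketch only shows the logic of "remember the last time each key was seen, and on every tick report the keys that have been silent too long".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Kafka Streams.
final class InactivityTracker {
    private final long timeoutMs;
    private final Map<String, Long> lastSeen = new HashMap<>();

    InactivityTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Called for every incoming record: remembers when the key was last seen. */
    void record(String key, long nowMs) {
        lastSeen.put(key, nowMs);
    }

    /** Called periodically on wall-clock time; returns the keys silent for at least timeoutMs. */
    List<String> punctuate(long nowMs) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() >= timeoutMs) {
                expired.add(entry.getKey());
                it.remove(); // alert once; the key is tracked again when its next event arrives
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        InactivityTracker tracker = new InactivityTracker(1000);
        tracker.record("sensor-1", 0);
        System.out.println(tracker.punctuate(500));  // nothing has expired yet
        System.out.println(tracker.punctuate(1200)); // sensor-1 has been silent too long
    }
}
```

Because punctuate takes the current time as a parameter, the same logic can be driven by a mocked clock in tests and by system time in production.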
3) If the punctuator works, I would like to use it in a ValueTransformer to benefit from the DSL topology. However, I am encountering the problem described in How to forward event downstream from a Punctuator instance in a ValueTransformer?. I cannot send an event downstream from the punctuator instance.
You need to use transform() instead. Emitting data via forward() is not allowed in punctuations of a ValueTransformer because you could emit any key, violating the contract of a non-modified key.
4) Finally, I had the idea to inject some dummy events on a regular basis (e.g. every second) for every partition so as to artificially force the inner clock to advance. This would allow me to use the clean and simple DSL window and suppress operators.
That should work, too.

Long running async method vs firing an event upon completion

I have to create a library that communicates with a device via a COM port.
In one of the functions, I need to issue a command, then wait while the device performs a test (the duration varies from 10 to 1000 seconds), and return the result of the test.
One approach is to use async-await pattern:
public async Task<decimal> TaskMeasurementAsync(CancellationToken ctx = default)
{
    PerformTheTest();
    // Wait till the test is finished
    await Task.Delay(_duration, ctx);
    return ReadTheResult();
}
The other that comes to mind is to just fire an event upon completion.
The device performs a test and the duration is specified prior to performing it. So in either case I would either have to use Task.Delay() or Thread.Sleep() in order to wait for the completion of the task on the device.
I lean towards async-await, as it is easy to build in cancellation and, for lack of a better term, it is self-contained, i.e. I don't have to declare an event, create an EventArgs class, etc.
Would appreciate any feedback on which approach is better if someone has come across a similar dilemma.
Thank you.
There are several tools available for how to structure your code.
Events are a push model (so is System.Reactive, a.k.a. "LINQ over events"). The idea is that you subscribe to the event, and then your handler is invoked zero or more times.
Tasks are a pull model. The idea is that you start some operation, and the Task will let you know when it completes. One drawback to tasks is that they only represent a single result.
The coming-soon async streams are also a pull model - one that works for multiple results.
In your case, you are starting an operation (the test), waiting for it to complete, and then reading the result. This sounds very much like a pull model would be appropriate here, so I recommend Task<T> over events/Rx.
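The same "start, wait, read the result" shape also exists outside .NET. As a rough Java analogue of the Task-based method (not a translation of any real device API), the sketch below returns a CompletableFuture that completes once the test duration has elapsed. MeasurementSketch, measureAsync and the startTest/readResult callbacks are hypothetical placeholders for the library's own device calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: the callbacks stand in for the real device commands.
final class MeasurementSketch {

    static <T> CompletableFuture<T> measureAsync(Runnable startTest, Supplier<T> readResult,
                                                 long durationMs, ScheduledExecutorService timer) {
        CompletableFuture<T> result = new CompletableFuture<>();
        startTest.run(); // issue the command to the device
        // Read the result only after the known test duration has passed.
        ScheduledFuture<?> pending = timer.schedule(() -> {
            try {
                result.complete(readResult.get());
            } catch (RuntimeException e) {
                result.completeExceptionally(e);
            }
        }, durationMs, TimeUnit.MILLISECONDS);
        // If the caller cancels the future, drop the pending read as well.
        result.whenComplete((value, error) -> {
            if (result.isCancelled()) {
                pending.cancel(false);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        CompletableFuture<Double> measurement =
                measureAsync(() -> System.out.println("test started"), () -> 42.0, 100, timer);
        System.out.println(measurement.join());
        timer.shutdownNow();
    }
}
```

The caller gets exactly one result or one failure, and cancellation is a call on the returned future, which is the "self-contained" property the question is after.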

How many "temperatures" are there for a Rx Observable?

All over the Rx.Net literature there are references to what is commonly known as the temperature of an observable.
There are cold observables (like the ones created by Observable.Interval() and similar factory methods), which will create side effects every time that a new Subscription is created.
On the other side of the spectrum there are hot observables (like Subject<T>) which will onboard new subscriptions as they come.
There are also warm observables, like the ones returned by RefCount() which will execute the initialisation every time one subscription is created, but only if there was no other active subscription. The behaviour of these warm observables is explained here by Dave Sexton:
Alternatively, you can call Publish then RefCount to get an IObservable that is shared among multiple consecutive observers. Note that this isn't truly a hot observable - it's more like a warm observable. RefCount makes a single subscription to the underlying observable while there's at least one observer of your query. When your query has no more observers, changing the reference count to 0, the underlying subscription is disposed. If another observer subscribes to your query later, moving the reference count from 0 to 1 again, then RefCount makes a new subscription to the underlying observable, causing subscription side-effects to occur again.
Are there any other temperatures that one should be aware of? Is it possible to obtain programmatically the temperature of an Observable?
Easy question first:
Is it possible to obtain programmatically the temperature of an Observable?
No. Best you can do is subscribe and see what happens.
The observable 'contract' specifies that when you subscribe to an observable you get zero or more OnNext messages, optionally followed by either one OnCompleted or one OnError message. The contract doesn't specify anything about how multiple or earlier/later subscribers are treated, which is what observable 'temperature' is mostly concerned with.
Are there any other temperatures that one should be aware of?
I wouldn't even think of it in such concrete or discrete terms as you have specified.
I think of it in terms of on-subscribe effects: The coldest of observables have all their effects happen on subscribe (like Observable.Return(42)). The hottest of observables have no effects happening on subscribe (new Subject<int>()). In between those two poles is a continuum.
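The two poles can be illustrated without any Rx library. The sketch below is plain Java with made-up names (TemperatureSketch, coldSubscribe, HotSubject), not the Rx.Net API: the cold source does all of its work inside subscribe, so every subscriber gets the full sequence, while the hot source does nothing on subscribe and a late subscriber only sees later values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative stand-ins for a cold and a hot observable; not a real Rx implementation.
final class TemperatureSketch {

    /** Cold: every call runs the producer again, from the start. */
    static void coldSubscribe(Consumer<Integer> observer) {
        for (int i = 0; i < 3; i++) {
            observer.accept(i);
        }
    }

    /** Hot: subscribing has no effect of its own; values go to whoever is subscribed when pushed. */
    static final class HotSubject {
        private final List<Consumer<Integer>> observers = new ArrayList<>();

        void subscribe(Consumer<Integer> observer) {
            observers.add(observer);
        }

        void onNext(int value) {
            for (Consumer<Integer> observer : observers) {
                observer.accept(value);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> first = new ArrayList<>();
        List<Integer> second = new ArrayList<>();
        coldSubscribe(first::add);
        coldSubscribe(second::add); // the second subscriber replays everything
        System.out.println(first + " " + second);

        HotSubject hot = new HotSubject();
        List<Integer> early = new ArrayList<>();
        List<Integer> late = new ArrayList<>();
        hot.subscribe(early::add);
        hot.onNext(1);
        hot.subscribe(late::add);   // misses the value pushed before it subscribed
        hot.onNext(2);
        System.out.println(early + " " + late);
    }
}
```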
Observable.Interval(TimeSpan.FromMilliseconds(100)), for example, will emit a new number every 100 milliseconds. That example, unlike Observable.Return(42), could be mostly 'warmed-over' via .Publish().RefCount(): the first subscriber starts the numbers, but the second subscriber will see only the latest numbers, not starting from 0. However, if instead of .Publish() you did .Replay(2).RefCount(), then you have some on-subscribe effects going on. Do the Publish and Replay observables have the same 'temperature'?
TL;DR: Don't focus on the classifications that much. Understand the difference between the two and know that some observables have colder properties and some have warmer ones.

CQRS - out of order messages

Suppose we have 3 different services producing events, each of them publishing to its own event store.
Each of these services consumes the events of the other producer services.
This is because each service has to process another service's events AND create its own projection. Each of the services runs on multiple instances.
The most straightforward way to do it (for me) was to put "something" in front of each ES which picks up events and publishes (pub/sub) them to the queues of every other service.
This is perfect because every service can subscribe to whichever topics it likes, while the event publisher does the job, and if a service is unavailable the events are still delivered. This seems to me to guarantee high scalability and availability.
My problem is the queue. I can't get an easily scalable queue that guarantees ordering of the messages. It actually guarantees "slightly out of order" with at-least-once delivery: to be clear, it's AWS SQS.
So, the ordering problems are:
No order guaranteed across events from the same event stream.
No order guaranteed across events from the same ES.
No order guaranteed across events from different ES (different services).
I thought I could solve the first two problems just by keeping track of the "sequence number" of the events coming from the same ES.
This would be done by tracking the last sequence number of each topic from which we are consuming events.
This should be easy for reacting to events and also for building our projection.
Then, when I pop an event from the queue, if the eventSequenceNumber > previousAppliedEventSequenceNumber + 1 I re-enqueue it (or make it invisible for a certain time).
But it turns out that this solution destroys performance when events are produced at high rates (I can use a visibility timeout or other stuff, the result should be the same).
This is because when I'm expecting event 10 and I ignore event 11 for a moment, I also have to ignore all events (from that ES) with sequence numbers after event 11, until event 11 shows up again and is actually processed.
Other difficulties were:
where to keep track of the event's sequence number for building the projection.
how to keep track of the event's sequence number for building the projection so that, when applying it, I have a consistent lastSequenceNumber.
What am I missing?
P.S.: for the third problem, think of the following scenario. We have a UserService and a CartService. The CartService has a projection that keeps track, for each user, of the products in the cart. Each cart's projection must also have the user's name and other info coming from the UserCreated event published by the UserService. If UserCreated comes after ProductAddedToCart, the normal flow requires throwing an exception because the user doesn't exist yet.
What am I missing?
You are missing flow -- consumers pull messages from sources, rather than having sources push the messages to the consumers.
When I wake up, I check my bookmark to find out which of your messages I read last, and then ask you if there have been any since. If there have, I retrieve them from you in order (think "document message"), also writing down the new bookmarks. Then I go back to sleep.
The primary purpose of push notifications is to interrupt the sleep period (thereby reducing latency).
With SQS acting as a queue, the idea is that you read all of the enqueued messages at once. If there are no gaps, then you can order the collection then start processing them and acking them. If there are gaps, you either wait (leaving the messages in the queue) or you go to the event store to fetch copies of the missing messages.
There's no magic -- if the message pipeline is promising "at least once" delivery, then the consumers must take steps to recognize duplicate messages as they arrive.
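The consumer-side bookkeeping described above (order what has arrived, hold back anything behind a gap, drop duplicates) can be sketched as a small buffer. This is an illustration under assumptions, not SQS client code: Resequencer, accept and gap are hypothetical names, events are reduced to strings, and sequence numbers are assumed to be contiguous per event store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical consumer-side buffer; one instance per source event store.
final class Resequencer {
    private long nextExpected;
    private final TreeMap<Long, String> pending = new TreeMap<>();

    Resequencer(long firstSequenceNumber) {
        this.nextExpected = firstSequenceNumber;
    }

    /** Accepts one delivered message and returns the events that are now ready, in order. */
    List<String> accept(long sequenceNumber, String event) {
        List<String> ready = new ArrayList<>();
        if (sequenceNumber < nextExpected) {
            return ready; // duplicate of an event that was already applied
        }
        pending.putIfAbsent(sequenceNumber, event); // duplicate of a buffered event is ignored
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            ready.add(pending.pollFirstEntry().getValue());
            nextExpected++;
        }
        return ready;
    }

    /** The missing sequence number while events are held back behind a gap, or -1 if there is none. */
    long gap() {
        return pending.isEmpty() ? -1 : nextExpected;
    }

    public static void main(String[] args) {
        Resequencer resequencer = new Resequencer(10);
        System.out.println(resequencer.accept(11, "e11")); // held back, 10 is missing
        System.out.println(resequencer.gap());             // 10: could be fetched from the event store
        System.out.println(resequencer.accept(10, "e10")); // releases e10 and e11 in order
    }
}
```

Later events are buffered locally instead of being re-enqueued, so one late message does not force every following message back through the queue. When gap() reports a missing number for too long, the consumer can fetch that event from the event store directly.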
If UserCreated comes after ProductAddedToCart the normal flow requires to throw an exception because the user doesn't exist yet.
Review Race Conditions Don't Exist, by Udi Dahan: "A microsecond difference in timing shouldn’t make a difference to core business behaviors."
The basic issue is assuming we can get messages IN ORDER...
This is a fallacy in distributed computing...
I suggest you design for no message ordering in your system.
As for your issues, try and use UTC time in the message body/header created by the originator and try and work around this data point. Sequence numbers are going to fail unless you have a central deterministic sequence creator (which will be a non-scalable, single point of failure).
Using sagas or state machines is a path that can help make sense of (business) event ordering.

When to use events?

At work, we have a huge framework and use events to send data from one part of it to another. I recently started a personal project and I often think to use events to control the interactions of my objects.
For example, I have a Mixer class that plays sound effects, and I initially thought it should receive events telling it to play a sound effect. Then I decided to just make my class static and call
Mixer.playSfx(SoundEffect)
in my classes. I have a ton of examples like this one where I initially think of an implementation with events and then change my mind, saying to myself it is too complex for nothing.
So when should I use events in a project? On which occasions do events have a serious advantage over other techniques?
You generally use events to notify subscribers about some action or state change that occurred on the object. By using an event, you let different subscribers react differently, and by decoupling the subscriber (and its logic) from the event generator, the object becomes reusable.
In your Mixer example, I'd have events signal the start and end of playing of the sound effect. If I were to use this in a desktop application, I could use those events to enable/disable controls in the UI.
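A minimal sketch of that split, assuming a non-static Mixer with hypothetical onPlaybackStarted/onPlaybackEnded registration methods and sound effects reduced to names: playing a sound stays a direct call, while the start and end notifications go out as events to whoever subscribed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative only: the listener methods are assumptions, not part of the original Mixer.
final class Mixer {
    private final List<Consumer<String>> startListeners = new ArrayList<>();
    private final List<Consumer<String>> endListeners = new ArrayList<>();

    void onPlaybackStarted(Consumer<String> listener) {
        startListeners.add(listener);
    }

    void onPlaybackEnded(Consumer<String> listener) {
        endListeners.add(listener);
    }

    /** Direct call from the caller; the events only report what happened. */
    void playSfx(String effectName) {
        for (Consumer<String> listener : startListeners) {
            listener.accept(effectName);
        }
        // ... actual audio playback would happen here ...
        for (Consumer<String> listener : endListeners) {
            listener.accept(effectName);
        }
    }

    public static void main(String[] args) {
        Mixer mixer = new Mixer();
        // A UI could disable its controls on start and re-enable them on end.
        mixer.onPlaybackStarted(name -> System.out.println("started " + name));
        mixer.onPlaybackEnded(name -> System.out.println("ended " + name));
        mixer.playSfx("explosion");
    }
}
```

The Mixer knows nothing about the UI here, which is the decoupling the answer describes; callers that do not care about the notifications simply never subscribe.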
The difference between Calling a subroutine and raising events has to do with: Specification, Election, Cardinality and ultimately, which side, the initiator or the receiver has Control.
With Calls, the initiator elects to call the receiving routine, and the initiator specifies the receiver. And this leads to many-to-one cardinality, as many callers may elect to call the same subroutine.
With Events on the other hand, the initiator raises an event that will be received by those routines that have elected to receive that event. The receiver specifies what events it will receive from what initiators. This then leads to one-to-many cardinality as one event source can have many receivers.
So the decision between Calls and Events mostly has to do with whether the initiator determines who the receiver is, or the receiver determines the initiator.
It's a tradeoff between simplicity and reusability. Let's take the metaphor of the "sending an email" process:
If you know the recipients and they are a finite number that you can always determine, it's as simple as putting them in the "To" list and hitting the send button. It's simple, and that's what we use most of the time. This is calling the function directly.
However, in the case of a mailing list, you don't know in advance how many users are going to subscribe to your email. In that case, you create a mailing list program that users can subscribe to, and the email goes automatically to all the subscribed users. This is event modeling.
Now, even though in both of the above options emails are sent to users, you are the better judge of when to send email directly and when to use the mailing list program. Apply the same judgement, and hopefully you will get your answer :)
Cheers,
Ajit.
I worked with a huge code base at my previous workplace and have seen that using events can increase the complexity quite a lot, and often unnecessarily.
I often had to reverse engineer existing code in order to fix it or to extend it.
In both cases, it is a lot easier to understand what is going on when you can simply read a list of function calls instead of just seeing an event being raised.
The event forces you to look for usages in order to fully understand what is happening. That is not a problem with modern IDEs, but if you then encounter many functions which also raise events, it quickly becomes complex. I have encountered cases where it mattered in what order functions subscribed to an event, even though most languages don't even guarantee a calling order...
There are cases when it is a really good idea to use events. But before you start eventing, consider the alternative. It is probably easier to read and maintain.
A classic example of the use of events is a UI framework, which provides elements like buttons etc.
You want the function "ButtonPressed()" of the framework to call some of your functions, so that you can react to the user action.
The alternative to an event that you can subscribe to would be, for example, a public bool "buttonPressed", which the UI framework exposes
and which you can regularly check for being true or false. This is of course very inefficient when there are hundreds of UI elements.

Resources