KafkaStreams handling late events when using windowing


Question:
If an event arrives after its window has closed, how do we redirect it to another topic so the correction can be handled?
Context:
We use tumbling windows.
We use the event source's creation time (event time) to define windows.
Thanks

Currently, there is no API to do that. Late events are dropped, and you cannot easily get hold of them.
What you could do is add an upstream operator (like transform()) before the window that compares the record timestamp to the current "stream time" (you would need to track "stream time" manually within the operator). This should let you detect whether the downstream window will drop the record as late and react accordingly, for example by using branch() after transform() and before groupByKey().windowedBy().
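Below is a minimal sketch of the check described above. It is written as plain Java so the logic stands on its own. The class name LateRecordDetector is hypothetical, and the window size and grace period are assumed parameters. The sketch assumes epoch-aligned tumbling windows with non-negative millisecond timestamps. In a real topology this state would probably live in a state store inside the transform() step, and the boolean result would be attached to the record so that branch() can route late records to a separate topic. Kafka Streams tracks stream time per task. The operator's own tracking is therefore likely to match the window's view only if both see the same records, for example when no repartition happens between them.

```java
// Hypothetical helper (not a Kafka Streams API): the logic an upstream operator
// could run to predict whether a downstream tumbling-window aggregation will
// drop a record as late. Assumes non-negative epoch-millisecond timestamps.
final class LateRecordDetector {
    private final long windowSizeMs;
    private final long graceMs;
    // Manually tracked "stream time": the largest record timestamp seen so far.
    private long streamTimeMs = Long.MIN_VALUE;

    LateRecordDetector(long windowSizeMs, long graceMs) {
        this.windowSizeMs = windowSizeMs;
        this.graceMs = graceMs;
    }

    /** Returns true if the record's window has already closed relative to stream time. */
    boolean isLate(long recordTimestampMs) {
        // Stream time only moves forward; an in-order record advances it.
        streamTimeMs = Math.max(streamTimeMs, recordTimestampMs);
        // Tumbling windows aligned to the epoch: [start, start + size).
        long windowStartMs = recordTimestampMs - Math.floorMod(recordTimestampMs, windowSizeMs);
        long windowEndMs = windowStartMs + windowSizeMs;
        // The window is closed once stream time has reached its end plus the grace period.
        return windowEndMs <= streamTimeMs - graceMs;
    }
}
```

With a 1000 ms window and no grace period, a record at 900 ms that arrives after one at 1500 ms is flagged as late, because stream time has already passed the end of its window (1000 ms).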

Related

Should an Event Sourcing entry contain what should update the view model or the payload of the event?

I have a situation where data is coming from a third party service. It is being passed through to a function that formats the data and then saves it to a view model in a way that I can visualize for my system.
In an event-driven approach, should I save the payload of the request in the event stream (as it can easily be replayed), or the formatted changes it produces to the view model (a more accurate representation of the current state of the data)?
Or something else completely?
Thanks

The incoming data can be viewed as a command expressing the intent to ultimately update some state. In this case the command is from outside our system, but commands can also be internal to our system. Especially for external commands, one critical thing to remember is that a command can be rejected.
In event sourcing, however, events are internal and express that the change has occurred and cannot be denied (at most it can be ignored). Thus it's probably best to store them in the format that is the most convenient for that internal use.
I would characterize the requests as commands and the formatted changes as events. Saving the payload is command sourcing, and saving the formatted changes is event sourcing (confusingly, Fowler's earliest descriptions of event sourcing are closer to command sourcing). Both are valid approaches. Event sourcing tends to imply a commitment that replay reproduces a similar state, while command sourcing leaves open the possibility that replay depends on something in the outside world. I've seen (and even developed) applications that used both techniques. For example, incoming data is dumped to Kafka, and a consumer treats those messages as commands against aggregates whose state is persisted as a stream of events, which is then projected back into Kafka.
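To make the distinction concrete, here is a minimal sketch that keeps both logs side by side. A third-party payload is handled as a command that may be rejected. Only accepted changes become events, and the view model is rebuilt from events alone. All class and field names are hypothetical, and the "formatting" step (parsing a numeric reading) is an assumed stand-in for whatever your function actually does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical command: the raw payload from the third-party service.
final class RecordReading {
    final String rawValue;
    RecordReading(String rawValue) { this.rawValue = rawValue; }
}

// Hypothetical event: the formatted change that has already happened.
final class ReadingRecorded {
    final double value;
    ReadingRecorded(double value) { this.value = value; }
}

final class ReadingAggregate {
    // Command sourcing: every incoming request is stored, accepted or not.
    final List<RecordReading> commandLog = new ArrayList<>();
    // Event sourcing: only accepted, formatted changes are stored.
    final List<ReadingRecorded> eventLog = new ArrayList<>();

    // A command can be rejected; if it is accepted, exactly one event is appended.
    Optional<ReadingRecorded> handle(RecordReading cmd) {
        commandLog.add(cmd);
        try {
            ReadingRecorded event = new ReadingRecorded(Double.parseDouble(cmd.rawValue.trim()));
            eventLog.add(event);
            return Optional.of(event);
        } catch (NumberFormatException e) {
            return Optional.empty(); // rejected: nothing is recorded as having happened
        }
    }

    // Rebuild the view model (here just the latest reading) purely from events.
    static double replay(List<ReadingRecorded> events) {
        double latest = Double.NaN;
        for (ReadingRecorded e : events) {
            latest = e.value;
        }
        return latest;
    }
}
```

Replaying eventLog always yields the same state. Replaying commandLog would re-run the parsing, so the result could change if that logic (or anything external it depends on) changes.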
If you (in CQRS/ES fashion) consider the read-side of your application to be a separate autonomous component from the write-side, then you reach the interesting conclusion that when the write-side publishes events, from the read-side's perspective it's publishing commands to the read-side. "One component's events are often another component's commands".

Is it possible to pause and resume Kafka Stream conditionally?

I have a requirement, as described at https://kafka.apache.org/21/documentation/streams/developer-guide/dsl-api.html#window-final-results, to wait until a window is closed so that late, out-of-order events are handled by buffering them for the duration of the window.
My understanding of this feature is that once a window is created, it works like wall-clock processing. For example, with a 1-hour window, the window starts ticking when the first event arrives, closes exactly one hour later, and all events buffered so far are forwarded downstream. However, I need to be able to hold the window open longer, conditionally and for as long as required, e.g. based on state or information in an external system such as a database.
To be precise, my requirement for event forwarding is either (a 1-hour window if the external state record says it is good) or (hold for as long as required until the external record says it's good, then resume tracking the event until it has accumulated a full hour, disregarding the time when the external system is not good).
To elaborate on this second condition: if my window duration is 1 hour and my event starts at 00:00, and the external system goes down at 00:30 and is back to normal at 00:45, the window should extend until 01:15.
Is it possible to pause and resume the forwarding of events conditionally, based on the requirement above?
Do I have to use a transformer/processor with a manually managed value store to track the first processing time of my event, and conditionally forward buffered events in a punctuator?
I appreciate any kind of workaround or suggestion for this requirement.

the window works like wall clock processing
No. Kafka Streams works on event time; hence, the timestamps returned by the TimestampExtractor (by default, the embedded record timestamp) are used to advance time.
To be precise my requirement for event forwarding is (windows of 1 hour if external state record says it is good)
This would need a custom solution IMHO.
or (hold for as long as required until external record says it's good and resume tracking of the event until the event make it fully 1hr, disregarding the time when external system is not good)
Not 100% sure I understand this part.
Is it possible to pause and resume the forwarding of events conditionally based on my requirement above ?
No.
Do I have to use transformation / processor and use value store manually to track the first processing time of my event and conditionally forwarding buffered events in punctuator ?
I think this might be required.
Check out this blog post, which explains in detail how suppress() works and when it emits based on observed event time: https://www.confluent.io/blog/kafka-streams-take-on-watermarks-and-triggers
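If you do go the custom route, the core bookkeeping for the "pause while the external system is down" requirement might look like the sketch below. PausableWindowBuffer and its methods are hypothetical, and events are plain strings for simplicity. The sketch assumes onTick is called periodically, for example from a punctuator scheduled with PunctuationType.WALL_CLOCK_TIME, and that the health flag is read from the external system. Unlike Kafka Streams windows, which are event-time based, this logic deliberately counts wall-clock time. Health is only sampled at each tick, so the extension is only as precise as the tick interval. In a real processor, the buffer and counters would live in a state store.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical buffer: holds events until they have accumulated `requiredActiveMs`
// of wall-clock time during which the external system reported "good".
final class PausableWindowBuffer {
    private final long requiredActiveMs;
    private final List<String> buffered = new ArrayList<>();
    private long activeMs = 0;   // time counted toward the window so far
    private long lastTickMs = 0; // when time was last accounted for

    PausableWindowBuffer(long requiredActiveMs) {
        this.requiredActiveMs = requiredActiveMs;
    }

    // The window starts with the first buffered event.
    void add(String event, long nowMs) {
        if (buffered.isEmpty()) {
            lastTickMs = nowMs;
            activeMs = 0;
        }
        buffered.add(event);
    }

    // Called periodically. The interval since the previous tick counts only if the
    // external system is good at this tick. Returns the events to forward (if any).
    List<String> onTick(long nowMs, boolean externalOk) {
        if (buffered.isEmpty()) {
            return List.of();
        }
        if (externalOk) {
            activeMs += nowMs - lastTickMs;
        }
        lastTickMs = nowMs;
        if (activeMs < requiredActiveMs) {
            return List.of();
        }
        List<String> ready = new ArrayList<>(buffered);
        buffered.clear();
        return ready;
    }
}
```

With ticks every 15 minutes and the outage from the question (down at 00:30, recovered by 00:45), the 00:30 to 00:45 interval is not counted, so the buffered event is forwarded at the 01:15 tick.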

How to ignore events in LabVIEW triggered outside of a particular sequence frame?

Using event structures in LabVIEW can get confusing, especially when mixing them with a mostly synchronous workflow. My question is: when an event structure exists in one frame of a sequence, how can I force it to ignore events (e.g. a mousedown on a particular button) that were triggered while the workflow was in another frame of the sequence?
Currently, the event structure only processes events once the sequence reaches its frame, but if an event was triggered while the workflow was in the previous frame, it processes that event too. I want it to ignore any events that weren't triggered while the sequence was in the frame containing the event structure.
http://puu.sh/hwnoO/acdd4c011d.png
Here's part of my workflow. If the mousedown is triggered while the left part is executing, I want the event structure to ignore those events once the sequence reaches it.

Instead of placing the event structure inside your main program sequence, put it in a separate loop and have it pass the details of each event to the main sequence by means of a queue. Then you can discard the details of the events you don't want by flushing the queue at the appropriate point.
Alternatively, you could use a boolean control to determine whether the event loop sends event details to the queue or discards them, and toggle the boolean from the main sequence with a local variable.
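LabVIEW code is graphical, so the following is only a text-language analogy of the producer/consumer pattern described in the two paragraphs above, not LabVIEW code. The class and method names are hypothetical. A LinkedBlockingQueue stands in for the LabVIEW queue, drainTo() for flushing it, and an AtomicBoolean for the boolean control toggled from the main sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical relay between a separate event-handling loop and the main sequence.
final class UiEventRelay {
    private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Analogue of the boolean control the main sequence toggles.
    private final AtomicBoolean accepting = new AtomicBoolean(true);

    // Called from the event-handling loop for every UI event (e.g. a mousedown).
    void onUiEvent(String event) {
        if (accepting.get()) {
            queue.offer(event);
        }
    }

    // Analogue of flushing the queue: discards (and returns) everything queued so far.
    List<String> flush() {
        List<String> discarded = new ArrayList<>();
        queue.drainTo(discarded);
        return discarded;
    }

    void setAccepting(boolean on) {
        accepting.set(on);
    }

    // Main sequence: block until the next event that was not discarded.
    String takeNext() throws InterruptedException {
        return queue.take();
    }

    // Non-blocking variant; returns null if no event is waiting.
    String pollNext() {
        return queue.poll();
    }
}
```

The main sequence would call flush() on entering the frame that cares about clicks, so that only events arriving after that point are handled.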

You can register for events dynamically. Registration is the point in time at which the event structure starts enqueueing events. In your case, this happens when the VI containing the event structure enters run mode (meaning it, or one of its callers, is executing). You can change this by registering explicitly with the Register For Events node, in which case you only get events from that point on. When you unregister, you stop getting events.
There's a very good presentation by Jack Dunaway going into some details about events here.
You can find the code for it here.
In LabVIEW 2013 and later there are additional options for controlling the events queue, but I won't go into them here.

http://puu.sh/hwsBE/fe50dee671.png
I couldn't figure out how to flush the event queue for built-in event types like mousedown, but I worked around it by creating a static reference to the VI and setting the cursor to busy during the previous sequence frame, which disables clicking. Then, when the sequence reaches the frame with the event structure, I clear the busy cursor, which re-enables clicking.

Coalescing GCD file system events

I have a class that implements a file-monitoring service to detect when a file I am interested in has been changed by something other than my application. I use the standard technique of opening the file (with the O_EVTONLY flag) and binding the file descriptor to a Grand Central Dispatch source of type DISPATCH_SOURCE_TYPE_VNODE. When I get an event, I notify my main thread with NSNotificationCenter's postNotificationName:object:userInfo: which calls an observer in my app delegate. So far so good. It works great. But, in general, if the triggering event is an attributes change (i.e. the DISPATCH_VNODE_ATTRIB flag is set on return from dispatch_source_get_data()) then I usually get two closely-spaced events. The behaviour is easily exhibited if I touch(1) the object I am monitoring. I hypothesise this is due to the file's mtime and atime being set non-atomically although I can't verify this. This can lead to spurious notifications being sent to my observer and this raises the possibility of race conditions etc.
What is the best way of dealing with this? I thought of storing a timestamp for the last event received and only sending a notification if the current event is later than this timestamp by some amount (a few tens of milliseconds?) Does this sound like a reasonable solution?

You can't ever escape the "race condition" in this situation, because the notification of your GCD event source in your process is not synchronous with the other process's modification of the underlying file. So, no matter what, you must always be tolerant of the possibility that the change you're being notified for could already be "gone."
As for coalescing, do whatever makes sense for your app. There are two obvious strategies. You can act immediately on a received event, and then drop subsequent events received in some time window on the floor, or you can delay every event for some time period during which you will drop other events for the same file on the floor. It really just depends on what's more important, acting quickly, or having a higher likelihood of a quiescent state (knowing that you can never be sure things are quiescent.)
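Both strategies reduce to a little timestamp bookkeeping, which is essentially the timestamp idea from the question. The following is a Java illustration rather than GCD code. In the app itself, the same logic would presumably run in Objective-C or C on the dispatch source's queue, before anything reaches the main thread, with the delayed check driven by a timer such as dispatch_after. The class name Coalescer, the 50 ms window and the explicit millisecond timestamps are assumptions chosen to keep the logic testable. You would use one instance per monitored file.

```java
// Hypothetical per-file coalescer illustrating the two strategies.
final class Coalescer {
    private final long windowMs;
    private long lastActedMs = Long.MIN_VALUE; // strategy 1 state
    private long pendingDeadlineMs = -1;       // strategy 2 state; -1 means nothing pending

    Coalescer(long windowMs) {
        this.windowMs = windowMs;
    }

    // Strategy 1: act immediately, then drop events that arrive within windowMs.
    boolean actImmediately(long eventMs) {
        if (lastActedMs != Long.MIN_VALUE && eventMs - lastActedMs < windowMs) {
            return false; // dropped on the floor
        }
        lastActedMs = eventMs;
        return true;
    }

    // Strategy 2: the first event starts a delay; later events during it are dropped.
    void delayEvent(long eventMs) {
        if (pendingDeadlineMs < 0) {
            pendingDeadlineMs = eventMs + windowMs;
        }
    }

    // Poll from a timer: returns true exactly once, when the delay has expired.
    boolean delayedActionDue(long nowMs) {
        if (pendingDeadlineMs < 0 || nowMs < pendingDeadlineMs) {
            return false;
        }
        pendingDeadlineMs = -1;
        return true;
    }
}
```

For the touch(1) case, two DISPATCH_VNODE_ATTRIB events 10 ms apart produce a single action with either strategy. Strategy 1 acts on the first event, and strategy 2 acts 50 ms after it.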
The only thing I would add is a suggestion to do all your coalescing before dispatching anything to the main thread. The main thread has things like tracking loops that will make it harder to get time-based coalescing right in certain cases.

Events changing state in CQRS

This should be easy to follow, but after some reading I still can't find an answer.
Say a user needs to change their mobile number. To accomplish that, we might have a command such as ChangedUserMobileNumber, holding the new number. The domain responsible for handling the command performs the change in the aggregate and publishes an event: UserMobilePhoneChanged.
There is a subscriber for that event in another domain, which also holds the user's mobile number in its aggregate. However, according to our software architect, events cannot hold any data, so what we end up with is rather stupid, to say the least:
Domain 1 receives the command to update the mobile number, the number is updated, and an event is published. Because the event cannot hold data, the command handler in Domain 1 also issues another command, which is sent to Domain 2. The subscriber of that event lives in Domain 2 too, so we have a saga to handle both the event and the command.
In terms of implementation, we are using NServiceBus, so we have a saga to handle these messages. It contains the following line of code, where the entity.IsMobilePhoneUpdated field stored in the saga entity is set when the event is handled.
bool isReady = (entity.IsMobilePhoneUpdated && entity.MobilePhoneNumber != null);
Effectively, the saga is started by both the command and the event raised in Domain 1, and it is kept alive until this condition is met.
If it was up to me, I would be sending the mobile number in the event itself, I just want to get a few other opinions on this.
Thanks

I'm not sure how a UserMobilePhoneChanged event could be useful in any way unless it contained the new phone number. The user asks to change a number, and the event announces that it has changed. It should be very simple indeed. Why does your architect say that events shouldn't contain any information?

In the first event-based system I designed, events also had no data, and I enforced that rule. At the time it sounded like a clever decision. After a while I realised it was dumb, and I was writing a lot of workarounds because of it. It also caused a lot of querying from the event subscribers, even for trivial data. I had no problem changing this "rule" once I realised I was doing it wrong.
Events should have all the data required to make them meaningful. They should also have only the data that makes sense for that event (there is no point in having the user's address in a ChangePhoneNumber message).
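To make this concrete, here is a minimal sketch of an event that carries exactly the data it describes, plus a subscriber in another domain that updates its own copy without querying the write side or waiting for a second command. The class names and the userId field are hypothetical. An NServiceBus message would be a C# class, so this Java version only illustrates the shape of the idea.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical event: carries the new number, and nothing unrelated (no address).
final class UserMobilePhoneChanged {
    final String userId;
    final String newMobileNumber;

    UserMobilePhoneChanged(String userId, String newMobileNumber) {
        this.userId = userId;
        this.newMobileNumber = newMobileNumber;
    }
}

// Hypothetical subscriber in Domain 2 that keeps its own copy of the number.
final class Domain2UserProjection {
    private final Map<String, String> mobileByUser = new HashMap<>();

    // Everything needed is in the event, so no saga or follow-up command is required.
    void on(UserMobilePhoneChanged event) {
        mobileByUser.put(event.userId, event.newMobileNumber);
    }

    String mobileOf(String userId) {
        return mobileByUser.get(userId);
    }
}
```

Replaying the stored events in order through on() rebuilds the same state, which is the property that event sourcing relies on.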
If your architect imposes such a restriction, it's not going to be easy to develop a CQRS system. How are the read models updated? Since the events have no data, you either query something to get the data (the write side?) or find some way of sending a command to the read model (then what's the point of publishing events?). To fix your problem, try to have a professional discussion with this architect, preferably including other technical leads, and try to get him to relax this constraint without offending anybody.
One argument you could use is event sourcing. Event sourcing is complementary to CQRS and would not make sense without events that carry data. Moreover, when using event sourcing, the only data you have is the data stored in the events. Even if you don't actually implement event sourcing, you can use its existence as a reason for events to carry data.
There is little point in finding a technical solution to a people problem.
