Is it possible to pause and resume Kafka Stream conditionally?

I have a requirement like the one described at https://kafka.apache.org/21/documentation/streams/developer-guide/dsl-api.html#window-final-results: wait until the window is closed, buffering events for the duration of the window so that late, out-of-order events are handled.
My understanding of this feature is that once a window is created, it works like wall-clock processing. For example, with a 1-hour window, the window starts ticking when the first event arrives, is closed exactly one hour later, and all events buffered so far are forwarded downstream. However, I need to be able to hold this window open even longer, conditionally and for as long as required, e.g. based on state or information in an external system such as a database.
To be precise, my requirement for event forwarding is (a window of 1 hour if the external state record says it is good) or (hold for as long as required until the external record says it's good, then resume tracking the event until it has accumulated the full 1 hour, disregarding the time during which the external system was not good).
To elaborate on this second condition: if my window duration is 1 hour and my event starts at 00:00, and the external system goes down at 00:30 and is back to normal at 00:45, the window should extend until 01:15.
Is it possible to pause and resume the forwarding of events conditionally, based on my requirement above?
Do I have to use a transformation / processor and a value store to manually track the first processing time of my event, and conditionally forward buffered events in a punctuator?
I appreciate any kind of workaround or suggestion for this requirement.

"the window works like wall clock processing"
No. Kafka Streams works on event time: the timestamps returned by the TimestampExtractor (by default, the embedded record timestamp) are used to advance time.
"To be precise my requirement for event forwarding is (windows of 1 hour if external state record says it is good)"
This would need a custom solution IMHO.
"or (hold for as long as required until external record says it's good and resume tracking of the event until the event make it fully 1hr, disregarding the time when external system is not good)"
I'm not 100% sure I understand this part.
"Is it possible to pause and resume the forwarding of events conditionally based on my requirement above?"
No.
"Do I have to use transformation / processor and use value store manually to track the first processing time of my event and conditionally forwarding buffered events in punctuator?"
I think this might be required.
Check out this blog post, which explains in detail how suppress() works and when it emits based on observed event time: https://www.confluent.io/blog/kafka-streams-take-on-watermarks-and-triggers
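
For the custom processor route, the time accounting from the question (a window that only counts time while the external system is good) could look roughly like the sketch below. This is plain Java and deliberately uses no Kafka Streams API. The class name PausableWindowClock, the tick method, and the one-minute tick interval are all hypothetical. In a real topology the tick would presumably be driven from a punctuator, and the accumulated time would live in a state store.

```java
/** Hypothetical helper: counts only the time during which the external system is "good". */
final class PausableWindowClock {
    private final long windowMillis;      // required amount of "good" time
    private long accumulatedMillis = 0L;  // "good" time counted so far
    private long lastTickMillis;          // timestamp of the previous tick

    PausableWindowClock(long windowMillis, long startMillis) {
        this.windowMillis = windowMillis;
        this.lastTickMillis = startMillis;
    }

    /**
     * Call periodically. The interval since the previous tick is counted only
     * if the external system is reported as good at this tick.
     * Returns true once the full window duration has been accumulated.
     */
    boolean tick(long nowMillis, boolean externalSystemGood) {
        long elapsed = nowMillis - lastTickMillis;
        lastTickMillis = nowMillis;
        if (externalSystemGood) {
            accumulatedMillis += elapsed;
        }
        return accumulatedMillis >= windowMillis;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // 1-hour window starting at 00:00 (minute 0).
        PausableWindowClock clock = new PausableWindowClock(60 * minute, 0L);
        for (long m = 1; m <= 120; m++) {
            // Assumed outage: the external system is down from 00:30 to 00:45.
            boolean good = m <= 30 || m > 45;
            if (clock.tick(m * minute, good)) {
                System.out.println("window closes at minute " + m); // minute 75, i.e. 01:15
                break;
            }
        }
    }
}
```

With the outage from the question (down at 00:30, back at 00:45), the clock reports the window as complete at minute 75, which matches the expected 01:15.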

Related

KafkaStreams handling late events when using windowing

Question:
If an event arrives after the window has closed, how do we redirect it to another topic to handle the correction?
Context:
We use tumbling windows
We use the event's source creation time (event time) to define windows
thanks

Currently, there is no API to do that. Late events are dropped and you cannot get hold of them easily.
What you could do is have an upstream operator (like a transform()) before the window that compares the record timestamp to the current "stream time" (you would need to track "stream time" manually within the operator). This should help you detect whether the downstream window will drop the record as late and react accordingly, for example by using a branch() after transform() and before groupByKey().windowedBy().
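
The manual stream-time tracking described above might be sketched as follows. This is plain Java without any Kafka Streams API. The class LateRecordDetector and its fields are hypothetical. The rule used here (a tumbling window counts as closed once stream time reaches window end plus grace period) is an assumption about how the downstream window behaves and should be checked against the Kafka Streams version in use. Timestamps are assumed to be non-negative.

```java
/** Hypothetical helper: tracks "stream time" and flags records whose tumbling window is already closed. */
final class LateRecordDetector {
    private final long windowSizeMs;
    private final long graceMs;
    private long streamTimeMs = Long.MIN_VALUE; // highest record timestamp seen so far

    LateRecordDetector(long windowSizeMs, long graceMs) {
        this.windowSizeMs = windowSizeMs;
        this.graceMs = graceMs;
    }

    /** Returns true if the record would fall into a window that is assumed to be closed. */
    boolean isLate(long recordTimestampMs) {
        // Stream time only moves forward: it is the maximum timestamp observed.
        streamTimeMs = Math.max(streamTimeMs, recordTimestampMs);
        long windowStart = recordTimestampMs - (recordTimestampMs % windowSizeMs);
        long windowEnd = windowStart + windowSizeMs;
        // Assumed close rule: closed once stream time >= window end + grace.
        return windowEnd + graceMs <= streamTimeMs;
    }

    public static void main(String[] args) {
        LateRecordDetector detector = new LateRecordDetector(10, 5);
        long[] timestamps = {3, 12, 4, 27, 8};
        for (long ts : timestamps) {
            System.out.println("ts=" + ts + " late=" + detector.isLate(ts));
        }
    }
}
```

In the example, the record with timestamp 4 is out of order but still within the grace period, while the record with timestamp 8 arrives after stream time has advanced to 27 and is flagged as late. A flagged record could then be routed to a correction topic by the branch() step.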

Spark streaming mapWithState timeout without remove

Imagine a use case where events are streaming in per user but only the first week of events are of interest. Within that time frame stateful logic is taking place using mapWithState. After that period the user incoming events should be disregarded.
As the user's state takes memory, it makes sense to change it after the user's week period to a simple already-seen-marker.
If any event comes in for that user a week or more after their first event, it is easy to change the state to that already-seen-marker.
But if no events come after that week, the state never changes to that already-seen-marker, and it will continue to occupy memory forever.
As far as I understand, adding a timeout (to the user's state) will not help, as you are not allowed to change the state of a timed-out state (which makes sense, as it is going to be removed).
Is there a simple way to achieve this use case?

From what I understand, mapGroupsWithState in Spark 2.2 has richer timeouts that can be used not only to remove a state, but also to change it (check here).

How to ignore events in LabVIEW triggered outside of a particular sequence frame?

Using event structures in LabVIEW can get confusing, especially when mixing them with a mostly synchronous workflow. My question is: when an event structure exists in one frame of a sequence, how can I force it to ignore events (e.g. mousedown on a particular button) that were triggered while the workflow was in another frame of the sequence?
Currently, the event structures only process events at the correct frame in the sequence, but if an event was triggered while the workflow was in the previous frame, they process that too. I want them to ignore any events that weren't triggered in the frame that the event structure exists within.
http://puu.sh/hwnoO/acdd4c011d.png
Here's part of my workflow. If the mousedown is triggered while the left part is executing, I want the event structure to ignore those events once the sequence reaches it.

Instead of placing the event structure inside your main program sequence, put it in a separate loop and have it pass the details of each event to the main sequence by means of a queue. Then you can discard the details of the events you don't want by flushing the queue at the appropriate point.
Alternatively you could use a boolean control to determine whether the event loop sends event details to the queue or discards them, and toggle the boolean with a local variable from the main sequence.

You can register for events dynamically. Registration is the point in time at which the event structure starts enqueueing events, and in your case this happens when the VI the event structure is in enters run mode (meaning it's executing or one of its callers is). You can change it so that you register using the Register for Events node and then you would only get events from that point on. When you unregister you will stop getting events.
There's a very good presentation by Jack Dunaway going into some details about events here.
You can find the code for it here.
In LabVIEW 2013 and later there are additional options for controlling the events queue, but I won't go into them here.

http://puu.sh/hwsBE/fe50dee671.png
I couldn't figure out how to flush the event queue for built-in event types like mousedown, but I managed to get around that by creating a static reference to the VI and setting the cursor to busy during the previous sequence, disabling clicking. Then when the sequence for the event structure is reached, I unset the cursor from busy, which re-enables clicking.

Coalescing GCD file system events

I have a class that implements a file-monitoring service to detect when a file I am interested in has been changed by something other than my application. I use the standard technique of opening the file (with the O_EVTONLY flag) and binding the file descriptor to a Grand Central Dispatch source of type DISPATCH_SOURCE_TYPE_VNODE. When I get an event, I notify my main thread with NSNotificationCenter's postNotificationName:object:userInfo: which calls an observer in my app delegate. So far so good. It works great. But, in general, if the triggering event is an attributes change (i.e. the DISPATCH_VNODE_ATTRIB flag is set on return from dispatch_source_get_data()) then I usually get two closely-spaced events. The behaviour is easily exhibited if I touch(1) the object I am monitoring. I hypothesise this is due to the file's mtime and atime being set non-atomically although I can't verify this. This can lead to spurious notifications being sent to my observer and this raises the possibility of race conditions etc.
What is the best way of dealing with this? I thought of storing a timestamp for the last event received and only sending a notification if the current event is later than this timestamp by some amount (a few tens of milliseconds?) Does this sound like a reasonable solution?

You can't ever escape the "race condition" in this situation, because the notification of your GCD event source in your process is not synchronous with the other process's modification of the underlying file. So, no matter what, you must always be tolerant of the possibility that the change you're being notified for could already be "gone."
As for coalescing, do whatever makes sense for your app. There are two obvious strategies. You can act immediately on a received event, and then drop subsequent events received in some time window on the floor, or you can delay every event for some time period during which you will drop other events for the same file on the floor. It really just depends on what's more important, acting quickly, or having a higher likelihood of a quiescent state (knowing that you can never be sure things are quiescent.)
The only thing I would add is to suggest that you do all your coalescence before dispatching anything to the main thread. The main thread has things like tracking loops, etc that will make it harder to get time-based coalescing right in certain cases.
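
The first strategy (act immediately, then drop further events for the same file within a time window) can be sketched independently of GCD. The example below is in Java rather than Objective-C, and the class LeadingEdgeCoalescer, its method, and the 50 ms window are hypothetical choices for illustration only. Timestamps are passed in explicitly so that the logic does not depend on a particular clock.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: act on the first event, drop later ones for the same file within a quiet period. */
final class LeadingEdgeCoalescer {
    private final long quietMs;
    private final Map<String, Long> lastAccepted = new HashMap<>(); // path -> time of last accepted event

    LeadingEdgeCoalescer(long quietMs) {
        this.quietMs = quietMs;
    }

    /** Returns true if a notification should be sent for this event. */
    boolean shouldNotify(String path, long nowMs) {
        Long last = lastAccepted.get(path);
        if (last != null && nowMs - last < quietMs) {
            return false; // too close to the last accepted event: drop it
        }
        lastAccepted.put(path, nowMs);
        return true;
    }

    public static void main(String[] args) {
        LeadingEdgeCoalescer coalescer = new LeadingEdgeCoalescer(50);
        System.out.println(coalescer.shouldNotify("/tmp/a", 1000)); // true
        System.out.println(coalescer.shouldNotify("/tmp/a", 1010)); // false
        System.out.println(coalescer.shouldNotify("/tmp/a", 1060)); // true
    }
}
```

Following the advice above, a check like this would run in the event handler before anything is dispatched to the main thread.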

How to debug a Process Query Operator Event dropped in StreamInsight

I have a Process deployed on a self-hosted MSSI server. Bound to this Process I have a simple Pass-through query.
Some events get dropped here: "cep:/Server/Application/Erp/Entity/Event_Events_Process1/Query/StreamableBinding_1/Operator/Stream_1_CleanseInput"
I can see the dropped-event counter going up, and I cannot find the reason why events are being dropped.
Does anyone know how to debug that?

You can use the StreamInsight Event Flow Debugger. Make sure your application exposes the StreamInsight Management Service so you can hook up with the debugger. Then you can record the events which you can debug/step-through in the debugger.
Chances are your events are being dropped because of CTI violations. You might be enqueueing events whose start time is earlier than the last CTI event.

That's absolutely a CTI violation. You'll see this behavior when you are issuing CTIs declaratively (for example, by specifying AdvanceTimeSettings.IncreasingStartTime or StrictlyIncreasingStartTime). There are a couple of ways that you can handle this:
1) Enqueue your CTIs programmatically. But you'll have to be careful of violations! (They'll cause an exception.)
2) Tweak your AdvanceTimeSettings to include a delay. You won't be able to use IncreasingStartTime or StrictlyIncreasingStartTime, but you will be able to specify the CTI span duration or event count and a delay. Keep the delay small enough to keep your stream lively but large enough to not drop events. I can't tell you what that is; it'll depend on your events.
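
To see why a delay helps, here is a toy model of the mechanism. It is written in Java and is not the StreamInsight API. The class CtiSimulator is hypothetical, time is a plain number, and the model assumes a CTI is issued after every accepted event at (start time minus delay) and that violating events are simply dropped.

```java
/** Toy model (not StreamInsight): a CTI follows every accepted event at startTime - delay. */
final class CtiSimulator {
    private final long delay;
    private long lastCti = Long.MIN_VALUE; // no CTI issued yet

    CtiSimulator(long delay) {
        this.delay = delay;
    }

    /** Returns true if the event is accepted, false if it violates the last CTI and is dropped. */
    boolean enqueue(long startTime) {
        if (startTime < lastCti) {
            return false; // event starts before the last CTI: violation
        }
        // CTIs never move backwards.
        lastCti = Math.max(lastCti, startTime - delay);
        return true;
    }

    public static void main(String[] args) {
        CtiSimulator noDelay = new CtiSimulator(0);
        System.out.println(noDelay.enqueue(10)); // true, CTI is now 10
        System.out.println(noDelay.enqueue(8));  // false, 8 < 10

        CtiSimulator withDelay = new CtiSimulator(5);
        System.out.println(withDelay.enqueue(10)); // true, CTI is now 5
        System.out.println(withDelay.enqueue(8));  // true, 8 >= 5
        System.out.println(withDelay.enqueue(3));  // false, 3 < 5
    }
}
```

In this model a delay of 5 tolerates events that are up to 5 time units out of order, at the cost of CTIs (and therefore results) lagging by the same amount.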
