I have a Process deployed on a self-hosted MSSI server. Bound to this Process I have a simple pass-through query.
Some events get dropped at "cep:/Server/Application/Erp/Entity/Event_Events_Process1/Query/StreamableBinding_1/Operator/Stream_1_CleanseInput".
I can see the dropped-event counter going up, but I cannot find the reason why events are being dropped.
Does anyone know how to debug that?
You can use the StreamInsight Event Flow Debugger. Make sure your application exposes the StreamInsight Management Service so you can connect the debugger. Then you can record the events and step through them in the debugger.
Chances are your events are being dropped because of CTI (Current Time Increment) violations. You might be enqueueing events that, based on their start time, occurred before the last CTI event.
That's absolutely a CTI violation. You'll see this behavior when you are issuing CTIs declaratively (for example, by specifying AdvanceTimeSettings.IncreasingStartTime or StrictlyIncreasingStartTime). There are a couple of ways that you can handle this:
1) Enqueue your CTIs programmatically. But you'll have to be careful of violations! (They'll cause an exception).
2) Tweak your AdvanceTimeSettings to include a delay. You won't be able to use IncreasingStartTime or StrictlyIncreasingStartTime, but you will be able to specify the CTI span duration or event count along with a delay. Keep the delay small enough to keep your stream lively but large enough not to drop events. I can't tell you what that value is; it'll depend on your events.
I have a requirement, as stated at https://kafka.apache.org/21/documentation/streams/developer-guide/dsl-api.html#window-final-results, for waiting until a window is closed in order to handle late, out-of-order events by buffering them for the duration of the window.
My understanding of this feature is that once a window is created, it works like wall-clock processing: e.g., for a 1-hour window, the window starts ticking once the first event arrives, it is closed exactly one hour later, and all the events buffered so far are forwarded downstream. However, I need to be able to hold this window open even longer, conditionally, for as long as required, e.g. based on state/information in an external system such as a database.
To be precise, my requirement for event forwarding is: (a window of 1 hour if the external state record says it is good) or (hold for as long as required until the external record says it's good, and resume tracking of the event until it has accumulated a full hour, disregarding the time when the external system is not good).
To elaborate on this 2nd condition: e.g., if my window duration is 1 hour and my event starts at 00:00, and the external system goes down at 00:30 and is back to normal at 00:45, the window should extend until 01:15.
Is it possible to pause and resume the forwarding of events conditionally, based on my requirement above?
Do I have to use a transformer/processor with a value store, manually track the first processing time of my event, and conditionally forward the buffered events in a punctuator?
I would appreciate any kind of workaround or suggestion for this requirement.
the window works like wall clock processing
No. Kafka Streams works on event-time; hence, the timestamps returned by the TimestampExtractor (by default, the embedded record timestamp) are used to advance time.
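For illustration, a minimal custom TimestampExtractor; the interface and its signature are part of Kafka Streams, while the fallback policy shown is just one possible choice:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.streams.processor.TimestampExtractor;

    // Uses the embedded record timestamp as event-time, falling back to
    // the partition time if the record carries no valid timestamp.
    public class RecordTimestampExtractor implements TimestampExtractor {
        @Override
        public long extract(ConsumerRecord<Object, Object> record, long partitionTime) {
            long ts = record.timestamp();
            return ts >= 0 ? ts : partitionTime;
        }
    }

You would plug it in via the default.timestamp.extractor config, or per source stream via Consumed.with(...).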
To be precise my requirement for event forwarding is (windows of 1 hour if external state record says it is good)
This would need a custom solution IMHO.
or (hold for as long as required until external record says it's good and resume tracking of the event until the event make it fully 1hr, disregarding the time when external system is not good)
I'm not 100% sure I understand this part.
Is it possible to pause and resume the forwarding of events conditionally based on my requirement above ?
No.
Do I have to use transformation / processor and use value store manually to track the first processing time of my event and conditionally forwarding buffered events in punctuator ?
I think this might be required.
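If you go that route, here is a rough, hedged sketch of a Transformer with key-value stores and a wall-clock punctuator. The store names, value types, and the isExternalSystemGood() check are hypothetical placeholders, and a real implementation would also need the "extend the window by the downtime" bookkeeping:

    import java.time.Duration;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.processor.PunctuationType;
    import org.apache.kafka.streams.state.KeyValueIterator;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class HoldAndForwardTransformer
            implements Transformer<String, String, KeyValue<String, String>> {

        private ProcessorContext context;
        private KeyValueStore<String, Long> firstSeen;  // key -> first event-time seen
        private KeyValueStore<String, String> buffered; // key -> latest buffered value

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            this.context = context;
            this.firstSeen = (KeyValueStore<String, Long>) context.getStateStore("first-seen-store");
            this.buffered = (KeyValueStore<String, String>) context.getStateStore("buffer-store");
            // Periodically check whether buffered events may be released.
            context.schedule(Duration.ofMinutes(1), PunctuationType.WALL_CLOCK_TIME, this::maybeForward);
        }

        @Override
        public KeyValue<String, String> transform(String key, String value) {
            if (firstSeen.get(key) == null) {
                firstSeen.put(key, context.timestamp()); // remember the first sighting
            }
            buffered.put(key, value); // keep the latest value per key
            return null;              // never forward inline; the punctuator decides
        }

        private void maybeForward(long now) {
            if (!isExternalSystemGood()) {
                return; // hold everything while the external system is "not good"
            }
            try (KeyValueIterator<String, Long> it = firstSeen.all()) {
                while (it.hasNext()) {
                    KeyValue<String, Long> entry = it.next();
                    if (now - entry.value >= Duration.ofHours(1).toMillis()) {
                        context.forward(entry.key, buffered.get(entry.key));
                        buffered.delete(entry.key);
                        firstSeen.delete(entry.key);
                    }
                }
            }
        }

        private boolean isExternalSystemGood() {
            return true; // hypothetical: consult your database / external system here
        }

        @Override
        public void close() {}
    }

You would register the two stores via StreamsBuilder#addStateStore and attach them with stream.transform(HoldAndForwardTransformer::new, "first-seen-store", "buffer-store").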
Check out this blog post, which explains in detail how suppress() works and when it emits based on observed event-time: https://www.confluent.io/blog/kafka-streams-take-on-watermarks-and-triggers
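For reference, the window-final-results pattern from the linked documentation looks roughly like this; the stream name, grace period, and output topic are illustrative:

    import java.time.Duration;
    import org.apache.kafka.streams.kstream.Suppressed;
    import org.apache.kafka.streams.kstream.Suppressed.BufferConfig;
    import org.apache.kafka.streams.kstream.TimeWindows;

    // events is an existing KStream<String, String>; each window's result is
    // emitted exactly once, only after its grace period has elapsed.
    events.groupByKey()
          .windowedBy(TimeWindows.of(Duration.ofHours(1)).grace(Duration.ofMinutes(10)))
          .count()
          .suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded()))
          .toStream()
          .to("final-results"); // needs appropriate serdes for the windowed key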
In concurrent systems, domain events are typically handled asynchronously. In Go, a simple approach for asynchronous event handling can be implemented via channels, but the issue is that if something bad happens while handling an event, or worse, to the whole program, the event will be lost.
How can asynchronous domain events be handled properly in a Go program, i.e.:
When an event handler fails, the event should not be purged from the event queue, so that it can be handled properly at a later time.
If the whole program goes down, the events have to be recovered and processed accordingly.
The first is relatively easy; you can have an error handler within the worker that re-queues the work in the event of an error.
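A minimal sketch of that re-queue-on-error idea, shown here in Java with a blocking queue to keep this page's examples in one language; in Go the queue would typically be a channel and the worker a goroutine, and handle() is a hypothetical stand-in for your event handler:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class Worker implements Runnable {
        private final BlockingQueue<String> events = new LinkedBlockingQueue<>();

        public void submit(String event) throws InterruptedException {
            events.put(event);
        }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                String event = null;
                try {
                    event = events.take();
                    handle(event);           // may throw on failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (Exception e) {
                    events.offer(event);     // re-queue so the event is not lost
                }
            }
        }

        private void handle(String event) { /* hypothetical handler */ }
    }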
The second is much harder; your options are a) roll your own bulletproof mechanism for writing events to disk and purging them when they're completed in a thread-safe way or b) use one of the many, many popular systems available that's already proven reliable, e.g. RabbitMQ or Kafka, with the appropriate replication and redundancy to ensure the level of reliability you require. I would strongly recommend the latter.
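If you go the Kafka route, here is a hedged sketch of a producer configured for durability; the broker address, topic, and payload are placeholders:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class DurablePublisher {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("acks", "all");                // wait for all in-sync replicas
            props.put("enable.idempotence", "true"); // no duplicates on retry
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Block until the broker acknowledges the write, so the event
                // survives a crash of this process.
                producer.send(new ProducerRecord<>("domain-events", "order-42", "OrderPlaced")).get();
            }
        }
    }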
We are currently working in a message-driven microservice environment, and some of our messages/events are event-sourced (using Apache Kafka). Now we are struggling with implementing more complex business requirements, where we have to take multiple events into account to create new events and side effects.
In the current situation we are working with devices that can produce errors, and we already process them and have a single topic which contains ERROR_OCCURRED and ERROR_RESOLVED events (so they are in order). We also make sure that all messages regarding a specific device always go onto the same partition. Both messages share an ID that identifies that specific error incident. We already have a projection that consumes those events and provides an API for our customers, s.t. they can see all errors that occurred and their current state.
Now we have to deal with the following requirement:
Reporting Errors
We need a push system that reports errors of devices to our external partners, but only after 15 minutes and only if they have not been resolved in that timeframe. Our first approach was to consume all ERROR_RESOLVED events, store their IDs, and have another consumer handle the ERROR_OCCURRED events in a delayed fashion (e.g. by only consuming the next ERROR_OCCURRED event on the topic if its timestamp is at least 15 minutes old). We would then be able to know whether that particular error has already been resolved and does not need to be reported (since it shares a common ID with the corresponding ERROR_RESOLVED event). Otherwise we send an HTTP request to our external partner and create an ERROR_REPORTED event on a new topic. Is there any better approach for delayed and conditional message processing?
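For what it's worth, a hedged sketch of that delayed consumption with the plain Java consumer, pausing a partition until its head record is old enough; consumer (an existing KafkaConsumer subscribed to the error topic), resolvedIds, errorId(), and reportToPartner() are hypothetical, and the resume logic for paused partitions is omitted:

    import java.time.Duration;
    import java.util.Collections;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.common.TopicPartition;

    final long delayMs = Duration.ofMinutes(15).toMillis();

    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<String, String> record : records) {
            if (System.currentTimeMillis() - record.timestamp() < delayMs) {
                // Too young: rewind to this record and pause the partition
                // until it is at least 15 minutes old (resume logic omitted).
                TopicPartition tp = new TopicPartition(record.topic(), record.partition());
                consumer.seek(tp, record.offset());
                consumer.pause(Collections.singleton(tp));
                break;
            }
            if (!resolvedIds.contains(errorId(record))) {
                reportToPartner(record); // HTTP call + emit ERROR_REPORTED
            }
        }
    }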
We also have to take the following special use cases into account:
Service restarts: currently we are planning to keep the list of resolved errors in memory, so if a service restarts, that list has to be rebuilt from scratch. We could just replay the ERROR_RESOLVED messages, but that may take some time, and during that time no ERROR_OCCURRED events should be processed, because that could result in reporting errors that were resolved in less than 15 minutes without us being aware of it. Are there any good practices regarding replay vs. "normal" processing?
Scaling: we may increase or decrease the number of instances of our service at any time, so the partition assignment may change during runtime. That should not be a problem if we create a consumer group for each service instance when consuming the ERROR_RESOLVED events, s.t. every instance knows all resolved errors while still only handling the ERROR_OCCURRED events of its assigned partitions (in another consumer group which is shared by all instances). Is there a better approach for handling partition reassignment and internal state?
Thanks in advance!
For side effects, I would record all "side" actions in the event store. In your particular example, when it is time to send a notification, I would call a SEND_NOTIFICATION command that emits a NOTIFICATION_SENT event. These events would be processed by some worker process that does the actual HTTP request.
Actually, I would elaborate on this even further: since notifications can fail, I would have, say, two events, NOTIFICATION_REQUIRED and NOTIFICATION_SENT, so we can retry failed notifications.
And finally your logic would be: "if the error was not resolved within 15 minutes and a notification was not sent, send a notification (or just discard it if it missed its timeframe)".
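A hedged sketch of such a worker: it consumes NOTIFICATION_REQUIRED events, performs the HTTP call, and only then emits NOTIFICATION_SENT. Here consumer, producer, postToPartner(), and the topic name are hypothetical; offsets are committed only after success so failed notifications are retried:

    import java.time.Duration;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.TopicPartition;

    while (true) {
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
            try {
                postToPartner(record.value()); // the actual HTTP request
                producer.send(new ProducerRecord<>("notifications",
                        record.key(), "NOTIFICATION_SENT"));
                consumer.commitSync();         // commit only after success
            } catch (Exception e) {
                // Seek back so the NOTIFICATION_REQUIRED event is retried
                // on the next poll instead of being lost.
                consumer.seek(new TopicPartition(record.topic(), record.partition()),
                        record.offset());
                break;
            }
        }
    }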
I have a class that implements a file-monitoring service to detect when a file I am interested in has been changed by something other than my application. I use the standard technique of opening the file (with the O_EVTONLY flag) and binding the file descriptor to a Grand Central Dispatch source of type DISPATCH_SOURCE_TYPE_VNODE. When I get an event, I notify my main thread with NSNotificationCenter's postNotificationName:object:userInfo: which calls an observer in my app delegate. So far so good. It works great. But, in general, if the triggering event is an attributes change (i.e. the DISPATCH_VNODE_ATTRIB flag is set on return from dispatch_source_get_data()) then I usually get two closely-spaced events. The behaviour is easily exhibited if I touch(1) the object I am monitoring. I hypothesise this is due to the file's mtime and atime being set non-atomically although I can't verify this. This can lead to spurious notifications being sent to my observer and this raises the possibility of race conditions etc.
What is the best way of dealing with this? I thought of storing a timestamp for the last event received and only sending a notification if the current event is later than this timestamp by some amount (a few tens of milliseconds?) Does this sound like a reasonable solution?
You can't ever escape the "race condition" in this situation, because the notification of your GCD event source in your process is not synchronous with the other process's modification of the underlying file. So, no matter what, you must always be tolerant of the possibility that the change you're being notified for could already be "gone."
As for coalescing, do whatever makes sense for your app. There are two obvious strategies. You can act immediately on a received event, and then drop subsequent events received in some time window on the floor, or you can delay every event for some time period during which you will drop other events for the same file on the floor. It really just depends on what's more important, acting quickly, or having a higher likelihood of a quiescent state (knowing that you can never be sure things are quiescent.)
The only thing I would add is to suggest that you do all your coalescence before dispatching anything to the main thread. The main thread has things like tracking loops, etc that will make it harder to get time-based coalescing right in certain cases.
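For illustration, here are the two strategies as generic sketches, written in Java with a scheduler standing in for GCD; the pattern, not the API, is the point:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;

    public class Coalescer {
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        private final long windowMs;
        private long lastFiredAt = Long.MIN_VALUE; // strategy 1 state
        private ScheduledFuture<?> pending;        // strategy 2 state

        public Coalescer(long windowMs) { this.windowMs = windowMs; }

        // Strategy 1: act immediately, drop subsequent events within the window.
        public synchronized void throttle(Runnable notify) {
            long now = System.currentTimeMillis();
            if (now - lastFiredAt >= windowMs) {
                lastFiredAt = now;
                notify.run();
            } // else: drop on the floor
        }

        // Strategy 2: delay every event, dropping earlier pending events
        // for the same file while the delay is running.
        public synchronized void debounce(Runnable notify) {
            if (pending != null) {
                pending.cancel(false); // supersede the earlier event
            }
            pending = scheduler.schedule(notify, windowMs, TimeUnit.MILLISECONDS);
        }
    }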
For another question, I'm running into a misconception that seems to arise here at SO occasionally. Some questioners seem to think that Triggers are to Databases as Events are to OOP.
Does anyone have a good analogy to explain why this is a flawed comparison, and the consequences of misapplying it?
EDIT:
Bill K. has hit it correctly, but maybe doesn't see the importance of the critical difference between the event and the callback function that strikes me, anyway. Triggers actually cause code to execute every time the event occurs; callbacks only run when one has been registered for an event (which is not true for the vast majority of events); and even then, in most cases the callback's first action is to deregister itself (or at least the callback contains a qualification exit so it only executes once).
If you write a trigger, it will unfailingly execute every time the event occurs, because there's no way to register or deregister the code segment.
Triggers are a way to interpose repeating logic synchronously into the thread of execution (i.e. synchronicity). Events are a means to defer logic until later (i.e. implement asynchronicity).
There are exceptions and mitigations in both cases, but the basic patterns of triggers and callbacks are mostly opposite in intention and implementation. Often the distinction doesn't seem to have fully sunk in. (IMHO, YMMV). :D
They're not the same thing, but they're not unrelated.
In both cases, the mechanism can be described approximately as follows:
Some block of code declares "interest" for changes in state.
Your application affects some change.
The system runs the block of code in response to the change.
Perhaps a database trigger is more like a callback function that has registered interest in a specific event.
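A tiny sketch of that registration step, in Java; the listener interface and names are made up for illustration:

    import java.util.ArrayList;
    import java.util.List;

    interface ChangeListener {
        void onChange(String newState); // the "block of code" that declared interest
    }

    class Observable {
        private final List<ChangeListener> listeners = new ArrayList<>();

        void register(ChangeListener l) { listeners.add(l); } // declare interest

        void setState(String state) {                          // affect some change
            for (ChangeListener l : listeners) {
                l.onChange(state);                             // system runs the code
            }
        }
    }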
Here's an analogy: the event is a rubber ball that you throw. The trigger is a dog that chases after a thrown ball.
If there's some other difference that you have in mind that makes it "dangerous" (note: OP has edited this choice of word out of the question) to compare triggers and events, you can describe what you mean.
Triggers are a way to interpose repeating logic synchronously into the thread of execution (i.e. synchronicity). Events are a means to defer logic until later (i.e. implement asynchronicity).
Okay, I see what you mean more clearly. But I think it's in some ways subject to the implementation. I wouldn't assume an event handler has to deregister itself; it depends on the system you're using. A UNIX signal handler, for example, has to prevent itself from catching a new signal while it's already handling one. But a Java servlet inside a Tomcat container should be thread-safe because it may be called concurrently by multiple threads. They're both event handlers, of different kinds.
Event handlers may be synchronous or asynchronous. Can a handler in a publish/subscribe system read messages that were posted recently, but prior to the handler registering its interest? Or only messages posted concurrently?
There's another important reason to treat triggers as different from event handlers: I frequently recommend against doing anything in a trigger that affects state outside the database.
For example, sending an email, writing to a file, posting to a web service, or forking a process is inappropriate inside a trigger, if for no other reason than that the transaction that spawned the trigger may be rolled back, and you can't roll back those external effects. You may not even be using explicit transactions; say you send an email in a BEFORE trigger, but the operation then fails because of a NOT NULL constraint or something.
Instead, all such work should be done by code in one's application, after one has confirmed that the SQL operation was successful and the transaction committed.
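For instance, a minimal JDBC sketch of that ordering; the connection string, SQL, and sendEmail() are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    void placeOrderAndNotify(long customerId) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/app", "user", "pw")) {
            conn.setAutoCommit(false);
            try (PreparedStatement st = conn.prepareStatement(
                    "INSERT INTO orders (customer_id) VALUES (?)")) {
                st.setLong(1, customerId);
                st.executeUpdate();
            }
            conn.commit();               // only now is the change durable
            sendEmail("order received"); // hypothetical side effect, strictly after COMMIT
        }
        // If the INSERT fails, we never reach sendEmail(): no email goes out
        // for a rolled-back (or never-committed) order.
    }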
It's too bad that people keep trying to do inappropriate work inside a trigger. There are senior developers at MySQL who promote UDFs to read and write data in memcached. Wow -- I just noticed these have made it into the MySQL 6.0 product!! Shocking!
So here's another attempt at an analogy, comparing triggers and events to the process of a criminal trial:
A BEFORE trigger is an allegation.
An AFTER trigger is an indictment.
COMMIT is a conviction after a guilty verdict.
ROLLBACK is an acquittal after an innocent verdict.
You only want to put the perpetrator in prison after they are convicted.
Whereas an EVENT is the crime itself.