How to analyze event sequences with Apache Flink to trigger actions?

I want to analyze event streams in real-time with Apache Flink and trigger actions based on:
- event windows, in which particular events occurred ("if event A and event B occurred within 30 seconds -> trigger action")
- event sequences, in which particular events occurred ("if event A occurred after event B and event C occurred after event B -> trigger action")
- combinations of both
I know Flink is capable of the windowing via stream.windowAll(...), but I am unsure how to express the event sequences.
How could such event sequence detections be achieved?

FlinkCEP (https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/libs/cep.html) is Flink's CEP (Complex Event Processing) library. It offers a more abstract way of processing streams of events and, among other things, covers the scenarios you have described.
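For the first scenario ("event A and event B occurred within 30 seconds", taking A to precede B), the pattern could look roughly like this. This is a minimal sketch against the Flink 1.3 CEP API linked above; Event is a placeholder POJO with a getType() accessor, and eventStream is assumed to be an existing DataStream<Event>:

import java.util.List;
import java.util.Map;

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.time.Time;

// Match: an "A" event followed (not necessarily immediately) by a "B" event,
// with both events inside a 30-second window.
Pattern<Event, ?> pattern = Pattern.<Event>begin("first")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event e) {
                return "A".equals(e.getType());
            }
        })
        .followedBy("second")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event e) {
                return "B".equals(e.getType());
            }
        })
        .within(Time.seconds(30)); // the windowing constraint from the question

PatternStream<Event> patternStream = CEP.pattern(eventStream, pattern);

DataStream<String> alerts = patternStream.select(
        new PatternSelectFunction<Event, String>() {
            @Override
            public String select(Map<String, List<Event>> match) {
                // Both events matched within the window; trigger the action here.
                return "A followed by B within 30s: " + match;
            }
        });

Note that followedBy imposes an order; if A and B may arrive in either order, you would declare a second pattern with the roles swapped. The sequence scenario ("B, then A; and C after B") can be approximated the same way by chaining further followedBy steps, and within(...) lets you combine sequencing with the window constraint.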

Related

How does Azure Event Grid handle failure when there are multiple subscribers?

The documentation for Event Grid states that it has a delivery and retry mechanism built in, and gives an example of what counts as a successful or failed delivery attempt. The documentation is very clear about what happens with a single event handler.
My question is, what happens if there are multiple event handlers, and only one handler fails to receive the event? Is the event retried only for that handler, or will all handlers see the retry?
Basically, the Azure Event Grid Pub/Sub model supports two messaging/mediation patterns: the Fan-In pattern and the Fan-Out (broadcasting) pattern.
The logical connection between an event source and an event sink is described by a subscription, which is basically a metadata artifact of the Pub/Sub model. Each logical connection (represented by a subscription) is independent of and loosely coupled to the others; in this model, each subscription handles only one event source.
Your question relates to the Fan-Out (broadcasting) pattern, where the event is broadcast to multiple subscribers using a push-with-acknowledgement delivery mode. Each subscription within this Fan-Out pattern has its own message-delivery state machine, declared by the subscriber through options such as retry policy, dead-lettering, and filtering.
In other words, delivery to the subscribers proceeds in parallel, driven by each subscription independently and transparently, with no dependencies between them. Note that a subscriber has no information about who, where, or how the event is being delivered to the other subscribers; each subscriber sees only its own delivery state. For instance, the value of Aeg-Delivery-Count shows the retry counter of its state machine.
So, in the case of a failed event delivery to one of multiple subscribers, the retry process runs only for that subscriber.
As Roman explained, each endpoint is handled independently. If one event handler fails, it will be retried without affecting the other event handlers, and of course, if that particular endpoint continues to fail, it will eventually be deadlettered (assuming deadlettering has been configured on the event subscription), or dropped.
When it comes to event publishing in Event Grid, events from custom topics or system topics (say, Service Bus namespaces) are forwarded to the event subscriptions configured on them. The events are then sent to the endpoints configured on each event subscription.
Whenever delivery of an event to an endpoint fails, it is retried based on the configured retry policy. If the number of retries exceeds what the retry policy allows, the events are stored in a storage account blob if one is configured as the dead-letter destination; otherwise the events are lost.
By default, Event Grid expires all events that aren't delivered within 24 hours. You can customize the retry policy when creating an event subscription. You provide the maximum number of delivery attempts (default is 30) and the event time-to-live (default is 1440 minutes).
When there are multiple subscribers (event grid subscriptions) to the same event grid topic, retries occur only for the event grid subscription whose event delivery has failed.
Refer to Event Grid message delivery and retry for more info on the retry policy.

MassTransit Saga - raise multiple events from a saga

I am a beginner with service buses and am trying to understand the concepts of sagas and state machines using MassTransit and Automatonymous.
The situation I have is a saga to calculate food consumption by county.
The saga will be triggered for a geographical state. The first task is to get a list of all counties using an API call.
Then, for each county publish/send an event for food consumption.
Once the food consumption calculator finishes, it will raise an event.
The original saga will wait for all of these completion events and when it receives all of them, will collate the responses and create a final statistical report.
I found how to collate the responses in this thread:
Handling transition to state for multiple events
However, I am unable to find a way of invoking the geographical-state lookup API and firing individual events from within the state machine.

Listening on multiple events

How do you deal with correlated events in an Event-Driven Architecture? Concretely, what if multiple events must arrive before some action can be performed? For example, I have a microservice that listens to two events, foo and bar, and only performs an action when both events have arrived and have the same correlation id.
One way would be to keep an internal data structure inside the microservice that does the book-keeping, and when everything is satisfied the appropriate action is triggered. However, the problem with this approach is that the microservice is no longer stateless.
Is there a better approach?
A classic example is where an order comes in at sales and an event is published. Both Finance and Shipping are subscribed to the event, but shipping is also subscribed to the event coming from finance.
The funny thing is that you have no idea of the order in which the messages arrive. The event from sales might cause a technical error because the database is offline; it might get queued again or end up in an error queue for operations to retry. In the meantime, the event from finance might arrive. So in theory the event from sales should arrive first and then the finance event, but in practice it can be the other way around.
There are a number of solutions here, but I've never liked the graphical ones. As a .NET developer I've used K2 and Windows Workflow Foundation in the past, but the most flexible solutions are created in code, not via a graphical interface.
I would currently use NServiceBus or MassTransit for this. As a side note, I currently work at Particular Software, and we make NServiceBus. NServiceBus has sagas for this kind of work (documentation), and you can also read about a presentation, including code on GitHub, on my weblog.
The term saga is kind of loaded, but it basically handles long-running (business) processes. Gregor Hohpe calls it a Process Manager (link).
To summarize what sagas do: they are instantiated by incoming messages and have state. Incoming messages are bound/dispatched to a specific saga instance based on a correlation id, for example a customer id or order id. Once the message (event) is processed, the state is stored until a new message arrives, or until the code marks the saga as completed and the state is removed from storage.
As said, in the .NET world MassTransit and NServiceBus support this, but there are most likely alternatives in other environments.
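To make the mechanics concrete, here is a minimal sketch of the correlation book-keeping a saga/process manager performs for the foo/bar case from the question. It is written in Java purely for illustration (not against the .NET APIs mentioned above), all names are made up, and the in-memory map stands in for the durable storage a real saga framework would use:

import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

public class FooBarProcessManager {

    // State for one correlation id; a saga framework would persist this.
    private static final class Partial {
        Object foo;
        Object bar;
    }

    private final ConcurrentHashMap<String, Partial> pending = new ConcurrentHashMap<>();

    public void onFoo(String correlationId, Object foo) {
        record(correlationId, p -> p.foo = foo);
    }

    public void onBar(String correlationId, Object bar) {
        record(correlationId, p -> p.bar = bar);
    }

    private void record(String correlationId, Consumer<Partial> update) {
        // Events may arrive in any order; store whichever half arrives first.
        Partial p = pending.compute(correlationId, (id, existing) -> {
            Partial state = (existing != null) ? existing : new Partial();
            update.accept(state);
            return state;
        });
        // Trigger exactly once when both halves are present (the saga "completes").
        if (p.foo != null && p.bar != null && pending.remove(correlationId, p)) {
            performAction(correlationId, p.foo, p.bar);
        }
    }

    private void performAction(String correlationId, Object foo, Object bar) {
        // Both correlated events have arrived; run the business action here.
    }
}

A saga framework does essentially this, plus persisting the partial state between messages, so that completion survives restarts and works across multiple service instances.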
If I understand correctly, it looks like you need a CEP (complex event processor), like WSO2 CEP or another, which does exactly that. CEPs can aggregate events and perform actions when certain conditions have been met.

GStreamer Events

I am having a bit of trouble understanding how events work in GStreamer. I understand that you can pass events to elements from the application to end a stream or block a pad, etc., but when I look at the sample code here, it seems like the program is not sending any specific event, just listening to them through probes. If the program only listens for events through probes, then these events must be sent between elements automatically after certain things happen. However, I couldn't find any information about this. How do events work in GStreamer?
More information on the design of GStreamer events can be found here (https://github.com/GStreamer/gstreamer/blob/master/docs/random/events). This document describes how the various events propagate through a pipeline.
In the provided sample code, an EOS event is sent to an element with the function:
gst_pad_send_event (sinkpad, gst_event_new_eos ());
The element then flushes all of its buffers and forwards the EOS event downstream to the next element by pushing it out of its src pad. The event continues through the elements until it reaches the installed probe, which contains special logic to manipulate the pipeline when an EOS event is received.
This sample shows several things relevant to your question:
- Events are intrinsically handled within the GStreamer pipeline; the elements handle them automatically.
- Pad probes can be used to externally observe or modify events as they propagate through the pipeline.
- Events can be injected into a pipeline directly using the functions gst_pad_send_event or gst_element_send_event.

Event-driven programming with WebLogic MDB

I am building an application which acts as an event listener and, based on the events received, needs to execute certain steps or a workflow. Is it better to have all events posted to a single queue, with the MDB invoking different business-logic components based on event type, or to have one queue per event type, with the corresponding MDBs invoking the different business logic?
Our assumption is that a heavy workflow corresponding to a particular event will not affect the performance of other events, since they are processed in separate queues.
JMS has a specific feature to support this use case: message selectors.
Briefly, the business-logic message type is set as a property of the message, and you use a selector to filter messages on a per-consumer basis.
The JMS spec assumes that the JMS implementation will perform optimizations to make these operations efficient, so it should scale very well. This is the sort of tech that banking transactions are built on.
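As a rough sketch of the single-queue variant: one MDB per event type, all consuming the same queue, with the container delivering only the messages that match each selector. The property name eventType, the value ORDER_PLACED, and the JNDI name jms/EventQueue are illustrative; destinationLookup is the EJB 3.2-style wiring, and older WebLogic releases configure the destination in deployment descriptors instead:

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// Only messages whose eventType property matches the selector reach this MDB.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destinationLookup",
                              propertyValue = "jms/EventQueue"),
    @ActivationConfigProperty(propertyName = "messageSelector",
                              propertyValue = "eventType = 'ORDER_PLACED'")
})
public class OrderPlacedMdb implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            String payload = ((TextMessage) message).getText();
            // ... invoke the business logic for this event type ...
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }
}

On the producing side, the sender tags each message before sending, e.g. message.setStringProperty("eventType", "ORDER_PLACED"), which is what the selector filters on.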
