Firehose pattern - real-time streaming in Kafka - events

In a book called TypeScript Microservices, I found a sentence mentioning an oddly named pattern called "Event Firehose". I can't find any definition of this pattern on Google. The sentence is quoted below.
When you want to use real-time streaming, use the Event Firehose pattern, which has KAFKA as one of its key components.

Related

Wildcard topic names and capturing messages from all topics

We are using AWS IoT.
We have predefined topics (+/device/) to which the devices publish messages.
But there is a possibility that the devices publish messages to other topics as well.
I want to count the number of messages published across all the topics by each individual device and implement throttling.
I tried to create IoT rules using wildcard topic names like (+/* or /), but none of these wildcards seem to work.
Is there any wildcard topic name which I can use to capture the messages from all the topics?
Or is there any way to dump all the messages on all the topics somewhere such as DynamoDB or S3 and count the number of messages from individual devices in a specific time period?
I tried to create IoT rules using wildcard topic names like (+/* or /), but none of these wildcards seem to work.
Is there any wildcard topic name which I can use to capture the messages from all the topics?
+ and # are the relevant wildcards for AWS IoT rules. See https://docs.aws.amazon.com/iot/latest/developerguide/iot-sql-from.html
You can configure a rule with the following statement to capture messages from all topics.
SELECT * FROM '#'
Or is there any way to dump all the messages on all the topics somewhere such as DynamoDB or S3 and count the number of messages from individual devices in a specific time period?
One approach is to create a rule based on the one above that also passes the client ID on every message (using the clientid() function). The action for this rule could write the client ID to DynamoDB or S3. Then this information is available for your calculation.
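For example, a rule like the following (the field aliases are illustrative) would capture every message together with the publishing client's ID, ready for a DynamoDB or S3 action:
SELECT *, clientid() AS client_id, timestamp() AS received_at FROM '#'
A DynamoDB action could then store client_id as the partition key and received_at as the sort key, which makes counting a device's messages over a time window a straightforward key query.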
An alternative approach might be to write the messages and client ID to a Kinesis Data Stream and use Kinesis Data Analytics to detect the errant devices.

How to route based on content with high performance?

In NiFi, I am listening to Kafka on a single topic and, based on routing logic, calling the respective process group.
However, in the RouteOnContent processor, if we give a regular expression to check for the occurrence of a string, will it affect performance? How can we achieve good performance while routing based on a condition?
Would it be more efficient to split the stream into different topics at the KSQL / stream-processing level and have NiFi read from those different topics?
Running a regex on the content of each message is an inefficient approach; consider whether you can modify your approach to one of the following:
Have your producers write the necessary metadata into a Kafka header, which lets you use the much more efficient RouteOnAttribute processor in NiFi. This is still message-at-a-time, which has throughput limitations.
If your messages conform to a schema, use the more efficient KafkaRecord processors in NiFi with a QueryRecord approach, which will significantly boost throughput.
If you cannot modify the source data and the regex logic is involved, it may be more efficient to use a small Kafka Streams app to split the topic before processing the data further downstream, as sketched below.
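A minimal sketch of that last option, assuming Kafka Streams 2.8+ for the split()/Branched API (the topic names and substring checks stand in for your real routing logic):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Branched;
import org.apache.kafka.streams.kstream.KStream;

public class TopicSplitter {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("events");
        // Evaluate the routing condition once, in a dedicated streams app,
        // instead of regex-matching every FlowFile in NiFi
        source.split()
              .branch((key, value) -> value.contains("ORDER"),
                      Branched.withConsumer(s -> s.to("events-orders")))
              .branch((key, value) -> value.contains("PAYMENT"),
                      Branched.withConsumer(s -> s.to("events-payments")))
              .defaultBranch(Branched.withConsumer(s -> s.to("events-other")));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topic-splitter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        new KafkaStreams(builder.build(), props).start();
    }
}

NiFi can then consume each derived topic with its own ConsumeKafka processor, leaving no per-message content inspection on the NiFi side.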

Filter-Interceptor-Kafka topics

Hi, I am trying to build an analytics engine to perform real-time analysis of the URLs/events used by clients, as well as to log the performance of the API.
Following is the logic I am planning to implement:
1. Create a filter to intercept URLs.
2. Code the filter as a reusable JAR that contains the logic to intercept them using MVC interceptors.
3. The interceptor will produce and publish events to Kafka if the URL pattern is matched (a sketch follows below).
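A minimal sketch of steps 2 and 3, assuming Spring MVC's HandlerInterceptor (Spring 5+) and the plain Java producer client; the topic name and URL pattern are illustrative:

import java.util.Properties;
import java.util.regex.Pattern;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.springframework.web.servlet.HandlerInterceptor;

public class UrlAuditInterceptor implements HandlerInterceptor {
    private static final Pattern TRACKED = Pattern.compile("^/api/.*");
    private final KafkaProducer<String, String> producer;

    public UrlAuditInterceptor(Properties kafkaProps) {
        this.producer = new KafkaProducer<>(kafkaProps);
    }

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) {
        if (TRACKED.matcher(request.getRequestURI()).matches()) {
            // send() is asynchronous, so the request thread is not blocked on the broker
            producer.send(new ProducerRecord<>("api-events", request.getRequestURI()));
        }
        return true; // never short-circuit the actual request
    }
}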
My question is whether this is the best approach, or whether there is a better alternative, keeping in mind high traffic flow into the APIs.
If the filtering is just done a single message at a time, it could also be done in Kafka Connect using the new Single Message Transforms feature: https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect

Apache Storm Topology using Flux YAML file

I am designing an Apache Storm topology using a Flux YAML topology definition file. The trouble is I don't see how to:
Create a stream that sends to multiple bolts (the syntax seems to only allow one 'to:' line).
Emit multiple named streams from a single bolt. This is perfectly legal in Apache Storm, but I am concerned that the stream 'name:' line is declared as 'optional - not used' and hence Flux does not seem to support this feature of Storm.
Each destination needs to be listed as a separate stream, as they have individual grouping definitions (see the sketch below).
I don't think that's possible with Flux (0.10.0) yet.
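For the first point, each destination becomes its own entry under streams; for example (component names and groupings are illustrative):

streams:
  - name: "spout --> splitter"
    from: "event-spout"
    to: "splitter-bolt"
    grouping:
      type: SHUFFLE
  - name: "spout --> counter"
    from: "event-spout"
    to: "counter-bolt"
    grouping:
      type: FIELDS
      args: ["word"]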

Subscribe to a channel by criteria

I'm looking for a tool that provides a pub/sub model but, instead of string channels, allows subscribing to data by criteria.
I need to publish messages to WebSocket connections, each of which corresponds to an authenticated user who fits a numeric-range MongoDB query.
Read this: http://redis.io/topics/pubsub
Redis allows pattern-based subscription (not by regexp, though; it supports glob-style patterns with the asterisk operator).
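A rough sketch with the Jedis client: glob patterns cannot express a true numeric range, so you would subscribe broadly and apply the range check per message (the channel naming scheme is illustrative):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class RangeSubscriber {
    public static void main(String[] args) {
        JedisPubSub listener = new JedisPubSub() {
            @Override
            public void onPMessage(String pattern, String channel, String message) {
                // channels look like "score:<value>"; apply the real range check here
                int value = Integer.parseInt(channel.substring("score:".length()));
                if (value >= 100 && value < 200) {
                    System.out.println(channel + " -> " + message);
                }
            }
        };
        // psubscribe blocks this thread and delivers matching messages to the listener
        new Jedis("localhost", 6379).psubscribe(listener, "score:*");
    }
}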
