With reference to Kafka, what is the difference between all of these?
Let's say I have a component "Order" which must emit events into a Kafka channel when I create/cancel/modify orders.
And I create a channel called "Order-out". Is topic the name I can use for this channel?
What is a topic vs. a channel?
And this is the Order-Details component, which creates & maintains records of all such orders.
I want to use an OrderEvents class inside the subscriber section of this component.
public class OrderEvents {
public static final String ORDER_CREATED = "ORDER_CREATED";
public static final String ORDER_MODIFIED = "ORDER_MODIFIED";
public static final String ORDER_CANCELLED = "ORDER_CANCELLED";
}
An event is a single record. In Spring, you might work with a Message class to wrap an event.
Channel is a Spring Integration term used via Spring-Kafka or Spring Cloud Stream Binders for inputs and outputs. A Binder determines the implementation of the Channel.
Topic is a Kafka unit of organization.
An event will be serialized into bytes, and sent via a channel to a Kafka topic.
A Kafka record will be consumed from a Kafka topic, through a channel, and deserialized into an application event class.
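To make the channel vs. topic distinction concrete, here is a minimal sketch using the annotation-based (pre-3.x) Spring Cloud Stream model; the channel name "Order-out", the topic name "order-events", and the publisher class are assumptions for illustration, not something prescribed by Kafka or Spring:

import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.Output;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.support.MessageBuilder;

interface OrderBindings {
    String ORDER_OUT = "Order-out";

    @Output(ORDER_OUT) // logical channel name; it lives inside your application
    MessageChannel orderOut();
}

@EnableBinding(OrderBindings.class) // usually placed on a @Configuration / @SpringBootApplication class
class OrderEventPublisher {

    private final OrderBindings bindings;

    OrderEventPublisher(OrderBindings bindings) {
        this.bindings = bindings;
    }

    void orderCreated(String orderId) {
        // The event is wrapped in a Message, serialized by the binder, and sent over the
        // channel to whatever Kafka topic the channel is bound to, e.g. via
        // spring.cloud.stream.bindings.Order-out.destination=order-events
        bindings.orderOut().send(MessageBuilder
                .withPayload(OrderEvents.ORDER_CREATED + ":" + orderId)
                .build());
    }
}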
Below is a sample POC developed as an ASP.NET Core 6.0 API that uses MassTransit and RabbitMQ to simulate a simple publish/subscribe with a MassTransit consumer. However, when the code is executed it results in the creation of 2 exchanges and 1 queue in RabbitMQ.
Program.cs
builder.Services.AddMassTransit(msConfig =>
{
msConfig.AddConsumers(Assembly.GetEntryAssembly());
msConfig.UsingRabbitMq((hostcontext, cfg) =>
{
cfg.Host("localhost", 5700, "/", h =>
{
h.Username("XXXXXXXXXXX");
h.Password("XXXXXXXXXXX");
});
cfg.ConfigureEndpoints(hostcontext);
});
});
OrderConsumer.cs
public class OrderConsumer : IConsumer<OrderDetails>
{
readonly ILogger<OrderConsumer> _logger;
public OrderConsumer(ILogger<OrderConsumer> logger)
{
_logger = logger;
}
public Task Consume(ConsumeContext<OrderDetails> context)
{
_logger.LogInformation("Message picked by OrderConsumer. OrderId : {OrderId}", context.Message.OrderId);
return Task.CompletedTask;
}
}
Model
public class OrderDetails
{
public int OrderId { get; set; }
public string OrderName { get; set; }
public int Quantity { get; set; }
}
Controller
readonly IPublishEndpoint _publishEndpoint;
[HttpPost("PostOrder")]
public async Task<ActionResult> PostOrder(OrderDetails orderDetails)
{
await _publishEndpoint.Publish<OrderDetails>(orderDetails);
return Ok();
}
Output from ASP.NET
Two exchanges are created: Sample:OrderDetails and Order.
The Sample:OrderDetails exchange is bound to the Order exchange,
and the Order exchange routes to the "Order" queue.
So the question is about the two exchanges that got created: I am not sure whether that is by design or a mistake in my code that led to both being created, and if it is by design, why the need for two exchanges?
I was pondering the same question when I first started playing with MassTransit, and in the end came to understand it as follows:
You are routing two types of messages via MassTransit: events and commands. Events are multicast to potentially multiple consumers; commands go to a single consumer. Every consumer has its own input queue, to which messages are routed via exchanges.
For every message type, MassTransit by default creates one fanout exchange named after the message type, plus one fanout exchange and one queue for every consumer of that message.
This makes absolute sense for events, as you are publishing events using the event type (with no idea who or if anyone at all will consume it), so in your case, you publish to the OrderDetails exchange. MassTransit has to make sure that all consumers of this event are bound to this exchange. In this case, you have one consumer, OrderConsumer. MassTransit by default generates the name of the consumer exchange based on the type name of this consumer, removing the Consumer suffix. The actual input queue for this consumer is bound to this exchange.
So you get something like this:
EventTypeExchange => ConsumerExchange => ConsumerQueue
or in your case:
Sample:OrderDetails (based on the type Sample.OrderDetails) => Order (based on the type OrderConsumer) => Order (again based on the OrderConsumer type)
For commands this is a bit less obvious, because a command can only ever be consumed by one consumer. In fact you can actually tell MassTransit not to create the exchanges based on the command type. However, what you would then have to do is route commands not based on the command type, but on the command handler type, which is really not a good approach as now you would have to know - when sending a command - what the type name of the handler is. This would introduce coupling that you really do not want. Thus, I think it's best to keep the exchanges based on the command type and route to them, based on the command type.
As Chris (the author of MassTransit) mentions in the MassTransit RabbitMQ deep dive video (YouTube), this setup also allows you to do interesting things like siphoning off messages to another queue for monitoring/auditing/debugging, just by creating a new queue and binding it to the existing fanout exchange.
All of the above is based on me playing with the framework, so it's possible I got some of this wrong, but it does make sense to me at least. RabbitMQ is extremely flexible with its routing options, so Chris could have chosen a different approach (e.g. Brighter, a "competing" library, uses RabbitMQ differently to achieve the same result), but this one has merit as well.
MassTransit also, unlike some other frameworks such as NServiceBus or Brighter, doesn't really distinguish or care about the semantic difference between the two, e.g. you can send or publish a command just as you can an event.
I have a requirement. We have a Spring Boot Kafka consumer app which reads from a Kafka topic. Our requirement is that whenever the app goes down and comes back up, it should begin from the latest offset and not be bothered by old values. Is there a possibility to reset the offset of the group? I have researched a little bit and used AbstractConsumerSeekAware to set the offset to the end using seekToEnd(), as seen in the code below.
public class KafkaConsumer extends AbstractConsumerSeekAware {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    @KafkaListener(topics = "${topic.consumer}")
    public void consume(ConsumerRecord<?, ?> consumerRecord, Acknowledgment ack) {
        // consuming the message from the ConsumerRecord
        seekToEnd();
        doSomething();
        ack.acknowledge();
    }
}
However, when we stopped the app and restarted it, it started reading from the last offset where it left off, but we want it to read only from the offset at the time the app started. How can we achieve this?
It's not correct to do that in the @KafkaListener method. That method is called only when the consumer delivers records from the partition(s).
You must implement onPartitionsAssigned() for that purpose, so the consumer seeks those partitions before it starts polling them.
See more in docs: https://docs.spring.io/spring-kafka/docs/current/reference/html/#seek
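A minimal sketch of that approach, assuming a recent spring-kafka version (the topic placeholder is carried over from the question; the class name and processing are illustrative):

import java.util.Map;
import org.apache.kafka.common.TopicPartition;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.listener.AbstractConsumerSeekAware;
import org.springframework.stereotype.Component;

@Component
public class LatestOnlyListener extends AbstractConsumerSeekAware {

    // Called once the consumer has been assigned its partitions, before polling starts.
    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments,
            ConsumerSeekCallback callback) {
        super.onPartitionsAssigned(assignments, callback);
        // Jump to the end of every assigned partition so older records are skipped.
        assignments.keySet().forEach(tp -> callback.seekToEnd(tp.topic(), tp.partition()));
    }

    @KafkaListener(topics = "${topic.consumer}")
    public void consume(String value) {
        // only records produced after the seek arrive here
    }
}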
I am writing my first Kafka consumer using Spring-Kafka. I had a look at the different options provided by the framework and have a few doubts. Can someone please clarify the points below if you have already worked with it?
Question - 1 : As per the Spring-Kafka documentation, there are 2 ways to implement a Kafka consumer: "You can receive messages by configuring a MessageListenerContainer and providing a message listener or by using the @KafkaListener annotation". Can someone tell me when I should choose one option over the other?
Question - 2 : I have chosen the @KafkaListener approach for my application. For this I need to initialize a container factory instance, and inside the container factory there is an option to control concurrency. I just want to double-check whether my understanding of concurrency is correct.
Suppose I have a topic named MyTopic which has 4 partitions. To consume messages from MyTopic, I've started 2 instances of my application, each with concurrency set to 2. Ideally, as per the Kafka assignment strategy, 2 partitions should go to consumer1 and the other 2 partitions should go to consumer2. Since the concurrency is set to 2, will each instance start 2 threads and consume data from the topic in parallel? Also, is there anything we should consider when consuming in parallel?
Question - 3 : I have chosen manual ack mode and am not managing the offsets externally (not persisting them to any database/filesystem). Do I need to write custom code to handle a rebalance, or will the framework manage it automatically? I think not, as I am acknowledging only after processing all the records.
Question - 4 : Also, with manual ack mode, which listener will give better performance: the batch message listener or the normal message listener? I guess if I use the normal message listener, the offsets will be committed after each message is processed.
Pasted the code below for your reference.
Batch Acknowledgement Consumer:
public void onMessage(List<ConsumerRecord<String, String>> records, Acknowledgment acknowledgment,
Consumer<?, ?> consumer) {
for (ConsumerRecord<String, String> record : records) {
System.out.println("Record : " + record.value());
// Process the message here..
listener.addOffset(record.topic(), record.partition(), record.offset());
}
acknowledgment.acknowledge();
}
Initialising container factory:
@Bean
public ConsumerFactory<String, String> consumerFactory() {
return new DefaultKafkaConsumerFactory<String, String>(consumerConfigs());
}
@Bean
public Map<String, Object> consumerConfigs() {
Map<String, Object> configs = new HashMap<String, Object>();
configs.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootStrapServer);
configs.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
configs.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, enablAutoCommit);
configs.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, maxPolInterval);
configs.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffsetReset);
configs.put(ConsumerConfig.CLIENT_ID_CONFIG, clientId);
configs.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
configs.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
return configs;
}
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<String, String>();
// Not sure about the impact of this property, so going with 1
factory.setConcurrency(2);
factory.setBatchListener(true);
factory.getContainerProperties().setAckMode(AckMode.MANUAL);
factory.getContainerProperties().setConsumerRebalanceListener(RebalanceListener.getInstance());
factory.setConsumerFactory(consumerFactory());
factory.getContainerProperties().setMessageListener(new BatchAckConsumer());
return factory;
}
@KafkaListener is a message-driven "POJO" listener; it adds things like payload conversion, argument matching, etc. If you implement MessageListener, you can only get the raw ConsumerRecord from Kafka. See the @KafkaListener Annotation documentation.
Yes, the concurrency represents the number of threads; each thread creates a Consumer; they run in parallel; in your example, each would get 2 partitions.
Also, is there anything we should consider when consuming in parallel?
Your listener must be thread-safe (no shared state, or any such state needs to be protected by locks).
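For illustration, a listener that only uses local variables is safe to run with concurrency greater than 1; the topic name, class name, and output are placeholders:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class MyTopicListener {

    // A mutable field here would be shared by all consumer threads and would need locking.

    @KafkaListener(topics = "MyTopic")
    public void listen(ConsumerRecord<String, String> record) {
        // Only local state is used, so two listener threads never interfere with each other.
        String value = record.value();
        System.out.println(record.partition() + "-" + record.offset() + ": " + value);
    }
}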
It's not clear what you mean by "handle rebalance events". When a rebalance occurs, the framework will commit any pending offsets.
It doesn't make a difference; message listener vs. batch listener is just a preference. Even with a message listener, with MANUAL ack mode, the offsets are committed when all the results from the poll have been processed. With MANUAL_IMMEDIATE mode, the offsets are committed one-by-one.
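For example, the choice is made on the container properties of the factory shown in your question (a fragment, not a complete bean; the enum is ContainerProperties.AckMode in recent spring-kafka versions):

// Offsets from a poll are committed together, after the listener has acknowledged them all:
factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
// Alternatively, each acknowledge() triggers an immediate commit:
// factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);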
Q1:
From the documentation,
The @KafkaListener annotation is used to designate a bean method as a listener for a listener container. The bean is wrapped in a MessagingMessageListenerAdapter configured with various features, such as converters to convert the data, if necessary, to match the method parameters.
You can configure most attributes on the annotation with SpEL by using #{…} or property placeholders (${…}). See the Javadoc for more information.
This approach can be useful for simple POJO listeners, and you do not need to implement any interfaces. You can also listen to any topics and partitions in a declarative way using the annotation, and you can return the value you received, whereas with a MessageListener you are bound by the signature of the interface.
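To make the contrast concrete, here is a sketch of both styles (the topic name, payload type, and printouts are placeholders):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.listener.MessageListener;
import org.springframework.stereotype.Component;

// Annotation style: the framework converts the record and matches the method arguments for you.
@Component
class AnnotatedOrderListener {

    @KafkaListener(topics = "order-events")
    public void onOrder(String payload) {
        System.out.println("order event: " + payload);
    }
}

// Interface style: you are bound to the interface signature and receive the raw ConsumerRecord.
class RawOrderListener implements MessageListener<String, String> {

    @Override
    public void onMessage(ConsumerRecord<String, String> record) {
        System.out.println("order event: " + record.value());
    }
}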
Q2:
Ideally, yes. If you have multiple topics to consume from, it gets more complicated, though. Kafka by default uses the RangeAssignor, which has its own behaviour (you can change this; see the sketch below).
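As a sketch, switching the strategy is just another consumer property added to the consumerConfigs() map from your question (RoundRobinAssignor chosen purely as an example):

// Default is RangeAssignor; round-robin spreads the partitions of all topics across the consumers.
configs.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
        RoundRobinAssignor.class.getName());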
Q3:
If your consumer dies, there will be rebalancing. If you acknowledge manually and your consumer dies before committing offsets, you do not need to do anything; Kafka handles that. But you could end up with some duplicate messages (at-least-once delivery).
Q4:
It depends on what you mean by "performance". If you mean latency, then consuming each record as fast as possible is the way to go. If you want to achieve high throughput, then batch consumption is more efficient.
I have written some samples using Spring Kafka and various listeners; check out this repo.
We are planning to use the Kafka flume-ng integration (Flafka), where Flume is the consumer of the Kafka queues. Flume agents will receive files listing commands and their output, as shown below:
root#host> [Command1]
[Output1]
root#host> [Command2]
[Output2]
The file may contain multiple commands, and a command's output may be huge. We need to intercept the event (which is the file data) and split it into multiple events based on commands. The source will then fan out the flow to multiple channels, sending each sub-event to a channel (using multiplexing), and each sink will store the command info in the respective Hive table.
Is it possible to use a fan-out flow to split an event into multiple events? Or, to ask it another way, can we split an event into multiple events in an interceptor?
I have read about the regex extractor interceptor and serializer, but I am not sure whether it can be of any help in this scenario.
If I've understood correctly, you need the original event taken from a Kafka queue to be split into several, let's say, sub-events, and you want to know which piece of Flume could do that.
I think interceptors are not suitable for that purpose, since interceptors are "placed" between the source and the channel and are designed to add, delete or modify the headers of a Flume event before putting it into the channel; at most, they can drop the entire event. But they are not able to generate several events based on an existing one.
I think you are looking for something like a handler attached to the source, able to interpret the events taken from Kafka and generate several Flume events at the source output. This concept is similar to the handlers you can attach to an HTTPSource (more details here). If such a thing is possible with your source, most probably you will have to develop your own custom handler, since the functionality you require is very specific.
Thanks for the reply, frb.
I want to split the incoming event at a Flume source into multiple sub-events and send them to the respective channels. So the first Flume node in the topology will route each sub-event (using multiplexing) to a specific hop that can handle that kind of information.
As per your reply, I understand that it cannot be done using an interceptor. Can you please share an example or documentation of handlers?
Yes, Flume cannot split one event into multiple events out of the box. Here is my alternative solution for this approach, taking the Kafka source as an example.
First, implement a source class that extends the Kafka source and replaces the default ChannelProcessor object.
public class XXXSplitSource extends KafkaSource {

    @Override
    public synchronized ChannelProcessor getChannelProcessor() {
        return new XXXYourChannelProcessorProxy(super.getChannelProcessor());
    }
}
Then, in the ChannelProcessor proxy implementation, you can split the event with your custom function.
public class XXXYourChannelProcessorProxy extends ChannelProcessor {

    public ChannelProcessor m_downstreamChannelProcessor = null;

    public XXXYourChannelProcessorProxy(ChannelSelector selector) {
        super(selector);
    }

    public XXXYourChannelProcessorProxy(ChannelProcessor processor) {
        super(null);
        m_downstreamChannelProcessor = processor;
    }

    @Override
    public void processEventBatch(List<Event> events) {
        List<Event> generatedEvents = YOUR_SPLIT_FUNCTION_HERE(events);
        m_downstreamChannelProcessor.processEventBatch(generatedEvents);
    }
}
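For completeness, here is a sketch of what the split function itself could look like for the command/output files from the question; the prompt pattern "root#host> " and the class name are assumptions:

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;

public class CommandOutputSplitter {

    // Turns each event holding a whole file into one event per "command + output" block.
    public static List<Event> split(List<Event> events) {
        List<Event> result = new ArrayList<>();
        for (Event event : events) {
            String body = new String(event.getBody(), StandardCharsets.UTF_8);
            // Every block starts with a shell prompt line such as "root#host> [Command]".
            for (String block : body.split("(?m)^(?=root#host> )")) {
                if (!block.trim().isEmpty()) {
                    // Copy the original headers so multiplexing can still route on them.
                    result.add(EventBuilder.withBody(
                            block.getBytes(StandardCharsets.UTF_8), event.getHeaders()));
                }
            }
        }
        return result;
    }
}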
I have 3 queues, and these three queues need to be listened to by an MDB; based on the input read, I will split out the task for each category of input.
As of now the code works fine for only one queue, and I don't know how to implement it for more than one queue. Could you please guide me?
@MessageDriven(mappedName="receiver1")
public class MDBMessages implements MessageListener
How can I make my MDBMessages listen to the receiver2 and receiver3 queues?
Thanks
Prabhakar
From the documentation:
A message-driven bean is defined for a single messaging type, in accordance with the message listener interface it employs.
Therefore it will not be possible to map an MDB to multiple destination types.
I haven't tried it, but you can try configuring the MDB in ejb-jar.xml with different JNDI names pointing to the same class, adding a different destination to each of them. If that configuration works, then MDBMessages will be able to listen for messages from all the queues specified in the XML.
Use the deployment descriptor to create multiple instances of your MDB. Each instance listens to one queue.
Also, there are brokers (like ActiveMQ) that allow one MDB to listen on multiple destinations of the same type (queue or topic), if they use the ActiveMQ resource adapter.
@Consumer(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/MyTasksProcess")
})
public class MyProcessorMDBean implements Downloader {

    public void processSomething(Serializable anyParameter) {
        // Do the actual processing
    }
}
For a given message-driven bean, you can route your messages to a single queue, so you can use only a single destination type in your bean class.