Can producer find the additions and removals of brokers in Kafka 0.8? - producer-consumer

We know that, in Kafka 0.7, we can specify zk.connect for the producer, so the producer can find the additions and removals of brokers. But in Kafka 0.8, we can't specify zk.connect for the producer. Can the producer in Kafka 0.8 find that? If not, isn't the scalability of the system worse than in the 0.7 version?

You can still use a ZooKeeper client to retrieve the broker list:
// connect to ZooKeeper (session timeout, connection timeout, raw-bytes serializer)
ZkClient zkClient = new ZkClient("localhost:2108", 4000, 6000, new BytesPushThroughSerializer());
// the broker ids are registered as children of /brokers/ids
List<String> brokerList = zkClient.getChildren("/brokers/ids");
According to that, you do not have to "hardcode" the broker list on the client side, and you stay flexible as far as the system architecture is concerned. But this would add the ZooKeeper dependency back again, which is in fact a disadvantage for the producer in several environments.
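Going one step further, each child of /brokers/ids holds a small JSON registration containing the broker's host and port, so (as a rough sketch, reusing zkClient and brokerList from above, and assuming the 0.8 znode layout) you could resolve the actual addresses as well:
for (String brokerId : brokerList) {
    byte[] data = zkClient.readData("/brokers/ids/" + brokerId);
    // the payload is JSON such as {"host":"...","port":9092,...}; parse it with your JSON library of choice
    System.out.println("broker " + brokerId + " -> " + new String(data));
}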
If you want a detailed view of the so-called "cluster metadata API" solution, check out this link: https://issues.apache.org/jira/browse/KAFKA-369
Best, pre
P.S.: Sorry for reposting this to your other question - but the answer fits on both ;-)

I'm a little confused about what exactly you are looking for; in 0.8 you must specify the list of brokers in the metadata.broker.list property:
Properties props = new Properties();
props.put("metadata.broker.list", "broker1:9092,broker2:9092");
From the Kafka producer example, they say:
The property “metadata.broker.list” defines where the Producer can find one or more Brokers to determine the Leader for each topic. This does not need to be the full set of Brokers in your cluster, but should include at least two in case the first Broker is not available. No need to worry about figuring out which Broker is the leader for the topic (and partition); the Producer knows how to connect to the Broker, ask for the metadata, and then connect to the correct Broker.
By additions, do you mean adding a new node to your cluster?

Related

How to use RabbitMQ quorum queue for data replication

In the RabbitMQ documentation, it is mentioned that:
All data/state required for the operation of a RabbitMQ broker is replicated across all nodes. An exception to this are message queues, which by default reside on one node, though they are visible and reachable from all nodes. To replicate queues across nodes in a cluster, use a queue type that supports replication. This topic is covered in the Quorum Queues guide.
If we are using a Spring Boot AMQP classic queue and we need to start using a RabbitMQ cluster where data is replicated across nodes for the lowest risk of data loss, what changes need to be made to the code to start using a quorum queue?
When defining the queue, the type is classic by default; to choose the quorum type instead, just add the queue type as an argument:
@Bean
public Queue eventsQueue() {
    Map<String, Object> args = new HashMap<>();
    // declare the queue as a quorum queue instead of the default classic type
    args.put("x-queue-type", "quorum");
    // name, durable, exclusive, autoDelete, arguments
    return new Queue(queueName, true, false, false, args);
}
In addition to the above, make sure you point your Spring Boot RabbitMQ configuration at the cluster, not at a single node. This can be done by replacing the spring.rabbitmq.host entry in application.properties with spring.rabbitmq.addresses=[comma separated ip:port].
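For example, assuming a three-node cluster (the hostnames and ports below are purely illustrative):
spring.rabbitmq.addresses=rabbit-node1:5672,rabbit-node2:5672,rabbit-node3:5672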
A Classic Queue has a master running somewhere on a node in the cluster, while the mirrors run on other nodes. This works the very same way for Quorum Queues, whereby the leader, by default, runs on the node the client application that created it was connected to, and followers are created on the rest of the nodes in the cluster.
In the past, replication of queues was specified by using policies in conjunction with Classic Queues. Quorum queues are created differently, but should be compatible with all client applications which allow you to provide arguments when declaring a queue. The x-queue-type argument needs to be provided with the value quorum when creating the queue.

in Kafka, how to make consumers consume from local partition?

Just to make the scenario simple.
number of consumers == number of partitions == number of Kafka brokers
If we deploy the consumers on the same machines where the brokers are, how can we make each consumer consume only the messages stored locally? The purpose is to cut all the network overhead.
I think we could do it if each consumer knew the partition IDs hosted on its machine, but I don't know how. Or are there other directions to solve this problem?
Thanks.
bin/kafka-topics.sh --zookeeper [zk address] --describe --topic [topic_name] tells you which broker hosts the leader for each partition. You can then use manual partition assignment in each consumer to make sure it consumes from a local partition (see the sketch below).
It's probably not worth the effort, though, because partition leadership can change, and then you would have to rebalance all your consumers to be local again. You can save the same amount of network bandwidth with less effort by just reducing the replication factor from 3 to 2.
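A minimal sketch of the manual-assignment approach with the Java consumer API (the topic name, partition number, and bootstrap server are assumptions for illustration):
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
// suppose kafka-topics.sh --describe showed that partition 2 of "my-topic"
// is led by the broker on this machine: assign exactly that partition
consumer.assign(Collections.singletonList(new TopicPartition("my-topic", 2)));
// subsequent consumer.poll(...) calls will read only from the locally-led partition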
Maybe you could use the Admin Client API.
First, you can use the describeTopics() method to get information about the topics in the cluster. From the DescribeTopicsResult you can access the TopicPartitionInfo with information about each topic's partitions, and from there you can get the leader Node through leader(). Node exposes host(), which you can check against the host your consumer is running on, or id(), which the consumer can compare with the id of the broker running on the same machine (in general this is information you can define upfront). More info on the Admin Client API in the following JavaDoc:
https://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/admin/AdminClient.html
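A rough sketch of that approach (the topic name, bootstrap server, and the way the local host is determined are all assumptions for illustration):
import java.net.InetAddress;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class LocalPartitionFinder {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: a broker runs on this machine
        AdminClient admin = AdminClient.create(props);
        // describeTopics() returns the partition layout, including each partition's leader
        TopicDescription description = admin.describeTopics(Collections.singletonList("my-topic"))
                .all().get().get("my-topic");
        String localHost = InetAddress.getLocalHost().getHostName();
        for (TopicPartitionInfo partition : description.partitions()) {
            // compare the leader's host with the host this consumer is running on
            if (partition.leader().host().equals(localHost)) {
                System.out.println("partition " + partition.partition() + " is led by the local broker");
            }
        }
        admin.close();
    }
}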

Spring Kafka consumer: Is there a way to read from multiple partitions using Kafka 0.8?

This is the scenario:
I know that using the latest Spring Kafka APIs (like spring-integration-kafka 2.10) we can do something like:
@KafkaListener(id = "id0", topicPartitions = { @TopicPartition(topic = "SpringKafkaTopic", partitions = { "0" }) })
@KafkaListener(id = "id1", topicPartitions = { @TopicPartition(topic = "SpringKafkaTopic", partitions = { "1" }) })
and with that read from different partitions of the same Kafka topic.
I'm wondering if we can do the same using, for example, spring-integration-kafka 1.3.1.
I didn't find any tips on how to do that (I'm interested in the XML version).
In Kafka you can decide which topics you want to read from, but you can't decide which partitions you read from; it's up to Kafka to decide that, in order to avoid reading the same message more than once.
Consumers in a group don't share partitions for reading purposes; that's by Kafka's definition.
If you have more consumers than partitions, some consumers will stay idle and won't consume from any partition. For example, with 5 consumers and 4 partitions, 1 consumer will stay idle and won't consume data from any Kafka broker.
The actual partition assignment is done by a Kafka broker (the group coordinator) together with a leader consumer; we can't control that.
This definition helped me the most:
In Apache Kafka, the consumer group concept is a way of achieving two things:
Having consumers as part of the same consumer group means providing the “competing consumers” pattern, with whom the messages from topic partitions are spread across the members of the group. Each consumer receives messages from one or more partitions (“automatically” assigned to it) and the same messages won’t be received by the other consumers (assigned to different partitions). In this way, we can scale the number of the consumers up to the number of the partitions (having one consumer reading only one partition); in this case, a new consumer joining the group will be in an idle state without being assigned to any partition.
Having consumers as part of different consumer groups means providing the “publish/subscribe” pattern, where the messages from topic partitions are sent to all the consumers across the different groups. It means that inside the same consumer group, we’ll have the rules explained above, but across different groups, the consumers will receive the same messages. It’s useful when the messages inside a topic are of interest for different applications that will process them in different ways. We want all the interested applications to receive all the same messages from the topic.
From here Don't Use Apache Kafka Consumer Groups the Wrong Way!

RabbitMQ: Move messages to another queue on acknowledgement received

I have a setup with two queues (no exchanges), let's say queue A and queue B.
One parser puts messages on queue A, which are consumed by the Elasticsearch RabbitMQ river.
What I want now is to move messages from queue A to queue B when the ES river sends an ack to queue A, so that I can do other processing on the ack'd messages, being sure that ES has already processed them.
Is there any way in RabbitMQ to do this? If not, is there any other setup that can guarantee me that a message is only in queue B after being processed by ES?
Thanks in advance
I don't think this is supported by either AMQP or the rabbitmq extensions.
You could drop the river and let your consumer also publish to elasticsearch.
Since the normal state is that the queues are empty, you can just retry reading the entries from Elasticsearch a few times (with exponential backoff); even if Elasticsearch loses the initial race, the retry backs off a bit and you can then perform the task. This might require tuning the prefetch_size/count in your clients.
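A minimal sketch of that retry loop (the lookup call is a placeholder for however you query Elasticsearch):
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public final class BackoffRetry {
    // retries a lookup with exponential backoff; the lookup itself is a hypothetical placeholder
    public static boolean retryWithBackoff(Supplier<Boolean> lookup, int maxAttempts) throws InterruptedException {
        long delayMs = 100; // initial delay
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (lookup.get()) {
                return true;       // Elasticsearch already has the document
            }
            TimeUnit.MILLISECONDS.sleep(delayMs);
            delayMs *= 2;          // exponential backoff
        }
        return false;              // give up; requeue or dead-letter the message
    }
}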

Monitoring Kafka Spout with KafkaOffsetMonitoring tool

I am using the KafkaSpout that came with the storm-0.9.2 distribution for my project. I want to monitor the throughput of this spout. I tried using the KafkaOffsetMonitor, but it does not show any consumers reading from my topic.
I suspect this is because I have specified the root path in ZooKeeper for the spout to store the consumer offsets. How will the KafkaOffsetMonitor know where to look for data about my KafkaSpout instance?
Can someone explain exactly where ZooKeeper stores data about Kafka topics and consumers? ZooKeeper is a filesystem, so how does it arrange the data of different topics and their partitions? What is a consumer group id, and how is it interpreted by ZooKeeper while storing consumer offsets?
If anyone has ever used the KafkaOffsetMonitor to monitor the throughput of a KafkaSpout, please tell me how I can get the tool to find my spout.
Thanks a lot,
Palak Shah
Kafka-Spout maintains its offset in its own znode rather than under the znode where Kafka stores the offsets for regular consumers. We had a similar need where we had to monitor the offsets of both the kafka-spout consumers and regular Kafka consumers, so we ended up writing our own tool. You can get the tool from here:
https://github.com/Symantec/kafka-monitoring-tool
I have never used KafkaOffsetMonitor, but I can answer the other part.
zookeeper.connect is the property where you can specify the znode for Kafka; by default it keeps all data at '/'.
You can access the ZooKeeper filesystem using zkCli.sh, the ZooKeeper command line.
You should look at /consumers and /brokers; the following would give you the offset:
get /consumers/my_test_group/offsets/my_topic/0
You can poll this offset continuously to know the rate of consumption at spout.
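A rough sketch of such polling with ZkClient (the ZooKeeper address, group, topic, and polling interval are assumptions; the path mirrors the example above):
import org.I0Itec.zkclient.ZkClient;
import org.I0Itec.zkclient.serialize.BytesPushThroughSerializer;

ZkClient zkClient = new ZkClient("localhost:2181", 4000, 6000, new BytesPushThroughSerializer());
String path = "/consumers/my_test_group/offsets/my_topic/0";
// the offset znode stores the numeric offset as a plain string
long before = Long.parseLong(new String((byte[]) zkClient.readData(path)));
Thread.sleep(10000); // wait 10 seconds
long after = Long.parseLong(new String((byte[]) zkClient.readData(path)));
System.out.println("messages consumed in the last 10s: " + (after - before));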
