Do clients get an acknowledgement after an etcd leader replicates data to all other nodes? - etcd

An etcd cluster elects a leader using the Raft consensus algorithm. When a client sends a write request to the leader, the leader should write a log entry to its disk and replicate it to the followers. I am unsure whether the client gets an acknowledgment from the leader after all followers replicate the data or after N/2 + 1 nodes replicate the data.
For example, let's say that there are three nodes in the etcd cluster. Does the client get an acknowledgment after a leader and a follower (two nodes in total) replicate the data, or after all three nodes successfully replicate the data?
If the latter is correct, does it mean that latency grows when the etcd cluster has more nodes, because the client waits until all nodes replicate the data?
What happens if one of the followers takes too long or fails to replicate it?

This is actually something I've researched previously in ETCD-14501.
It requires N/2+1 acknowledgements (a majority of the cluster, counting the leader itself) before returning to the client.
Does the client get an acknowledgment after a leader and a follower (two nodes in total) replicate the data?
Yes, exactly that.
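To make the majority rule concrete, here is a minimal Python sketch. It is not etcd's actual code, and the function names `majority` and `commit_index` are made up for illustration. It assumes the leader tracks, for every node including itself, the last log index persisted there, and treats an entry as committed once a majority holds it. Real Raft additionally requires the entry to belong to the leader's current term, which this sketch leaves out.

```python
from typing import Sequence


def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority: N/2 + 1 with integer division."""
    return cluster_size // 2 + 1


def commit_index(match_index: Sequence[int]) -> int:
    """Highest log index that is stored on a majority of nodes.

    match_index holds, per node, the last log index persisted on that node.
    The leader's own last index is assumed to be included in the sequence.
    """
    ranked = sorted(match_index, reverse=True)
    # The value at position majority-1 is held by at least `majority` nodes.
    return ranked[majority(len(match_index)) - 1]


# Three nodes: the leader and one follower hold entry 5, a slow follower is at 2.
print(commit_index([5, 5, 2]))
```

In the three-node case the slow follower at index 2 does not hold back the acknowledgement for entry 5, so a follower that lags or fails only matters once a majority can no longer be reached.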

Related

Hashicorp Raft consensus deadlock state

I am implementing a Raft service using Hashicorp Raft library for distributed consensus.
https://github.com/hashicorp/raft
I have a simple layout with Raft: 2 followers and a leader.
I bootstrap the cluster with the leader and add 2 follower Raft nodes to the Raft cluster, and things look fine. When I knock one of the followers offline, the leader gets:
failed to contact quorum of nodes, stepping down. The problem with this is now there are 0 nodes in leader state and no one can promote to leader because majority of votes required from quorum of nodes. Because the previous leader is now a follower, even my service discovery tools can't remove the old IP address from the leader because it requires leader power to do so.
My cluster enters an infinite loop (deadlock) of trying to connect to a node that's offline forever, and no one can get promoted to leader. Any ideas?
Edit: After some thought, I guess I need a system with an odd number of nodes to reach quorum (i.e. 3 nodes; if 1 gets knocked offline, I can tell the new leader to remove the old IP address).
I don't have experience with that library, but:
"The problem with this is now there are 0 nodes in leader state and no one can promote to leader because majority of votes required from quorum of nodes".
With one of three nodes out, you still have a quorum/majority. Raft would promote one of the followers to leader.
The fact that your system stalled after you removed one of the two followers tells me that one of the followers was not added correctly in the first place.
You had one leader and one follower initially. Since that was working, the follower and the leader can communicate; all good here.
You then added a second follower; how do you know that it was done correctly? Can you try again, and knock out this second follower? The system should keep working, as the leader and the first follower are OK.
So I conclude that the second follower did not join the cluster, and when you knocked out the first follower, the system stopped, as it should: a majority of correctly configured nodes was no longer available.
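The quorum arithmetic behind this reasoning can be sketched in a few lines of Python. This is only an illustration of the majority rule, not the Hashicorp Raft API; `has_quorum` and `tolerated_failures` are hypothetical helper names.

```python
def has_quorum(voters: int, reachable: int) -> bool:
    """True if the reachable voters (the node itself included) form a majority."""
    return reachable >= voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority is still reachable."""
    return voters - (voters // 2 + 1)


# Compare a cluster that really has 3 voters with one that only has 2.
for voters in (2, 3, 4, 5):
    print(voters, "voters tolerate", tolerated_failures(voters), "failure(s)")
```

If the second follower never joined, the cluster effectively has 2 voters, and `has_quorum(2, 1)` is false as soon as either node goes away. With 3 properly joined voters, `has_quorum(3, 2)` is true and a new leader can be elected.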

Kafka partition and producer relationship

I have a Kafka cluster with three brokers and one topic with a replication factor of three and three partitions. I can see that every broker has a copy of the log for all partitions, with the same size. There are two producers for this topic.
One day I reduced the write volume of one producer by half. I then found that all three brokers' inbound traffic dropped, which is expected, but only the outbound traffic of partition 1's leader node dropped, which I don't understand.
The partition leader's outbound traffic dropped because of replication. But each broker is the leader of one partition, so why did only one leader's outbound traffic drop? Is it possible that the producer only writes to one partition? I don't think so.
Please help me explain it. The cluster is working fine now, but I need to understand it in case of potential problems.
I assume you are using the default partitioner for KafkaProducer, which means two events with the same key are guaranteed to be sent to the same partition.
From the Kafka documentation:
All reads and writes go to the leader of the partition and Followers
consume messages from the leader just as a normal Kafka consumer would
and apply them to their own log.
You may have reduced the data from the producer by skipping a specific key or set of keys, which could mean that no data goes to a particular partition.
This explains why only that leader's outbound traffic dropped (no records for the followers to consume).
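A small Python sketch of key-based partitioning shows how dropping a set of keys can silence a single partition. The hash here is a stand-in: Kafka's default partitioner hashes the serialized key with murmur2, while this sketch uses CRC32 only because it ships with Python. The key names and the helper functions are hypothetical.

```python
import zlib
from collections import Counter
from typing import Iterable


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition. Stand-in hash, not Kafka's murmur2."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def records_per_partition(keys: Iterable[str], num_partitions: int) -> Counter:
    """Count how many records land in each partition."""
    return Counter(partition_for(k, num_partitions) for k in keys)


keys = [f"device-{i}" for i in range(30)]  # hypothetical message keys
before = records_per_partition(keys, 3)

# Hypothetical change: the producer stops sending every key that maps to partition 1.
kept = [k for k in keys if partition_for(k, 3) != 1]
after = records_per_partition(kept, 3)

print("before:", dict(before))
print("after: ", dict(after))
```

After the change, partitions 0 and 2 receive exactly what they received before, while partition 1 receives nothing, so only its leader has less to hand out to followers.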

What is the difference between a partition and a replica of a topic in a Kafka cluster

What is the difference between a partition and a replica of a topic in a Kafka cluster?
I mean, both store copies of the messages in a topic. So what is the real difference?
When you add a message to a topic, you call the send(KeyedMessage message) method of the producer API. This means that your message contains a key and a value. When you create a topic, you specify the number of partitions you want it to have. When you call send for this topic, the data is sent to only ONE specific partition, based on the hash value of your key (by default). Each partition may have replicas, which means that a partition and its replicas store the same data. The limitation is that both your producer and your consumer work only with the main (leader) replica; its copies are used only for redundancy.
Refer to the documentation: http://kafka.apache.org/documentation.html#producerapi
And a basic training: http://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign
Topics are partitioned across multiple nodes so a topic can grow beyond the limits of a node. Partitions are replicated for fault tolerance. Replication and leader takeover are among the biggest differences between Kafka and other brokers/Flume. From the Apache Kafka site:
Each partition has one server which acts as the "leader" and zero or
more servers which act as "followers". The leader handles all read and
write requests for the partition while the followers passively
replicate the leader. If the leader fails, one of the followers will
automatically become the new leader. Each server acts as a leader for
some of its partitions and a follower for others so load is well
balanced within the cluster.
partition: each topic can be split into partitions for load balancing (you can write to different partitions at the same time) and scalability (the topic can grow beyond the limits of a single instance); within the same partition the records are ordered;
replica: mainly for fault-tolerant durability.
Quotes:
The partitions of the log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of the partitions. Each partition is replicated across a configurable number of servers for fault tolerance.
There is a quite intuitive tutorial to explain some fundamental concepts in Kafka: https://www.tutorialspoint.com/apache_kafka/apache_kafka_fundamentals.htm
Furthermore, there is a workflow to get you through the confusing jungle: https://www.tutorialspoint.com/apache_kafka/apache_kafka_workflow.htm
Partitions
A topic consists of a bunch of buckets. Each such bucket is called a partition.
When you want to publish an item, Kafka takes the hash of its key and appends the item to the appropriate bucket.
Replication Factor
This is the number of copies of topic-data you want replicated across the network.
In simple terms, partition is used for scalability and replication is for availability.
Kafka topics are divided into a number of partitions. Any record written to a particular topic goes to a particular partition. Each record is assigned and identified by a unique offset within its partition. Replication is implemented at the partition level. The redundant unit of a topic partition is called a replica. The logic that decides the partition for a message is configurable. Partitioning helps in reading/writing data in parallel by splitting the data across different partitions spread over multiple brokers. Each partition has one server acting as leader and the others as followers. The leader handles reads and writes while the followers replicate the data. If the leader fails, one of the followers is elected as the new leader.
Hope this explains!
Further Reading
Partitions store different data of the same type and
Yes, you can store the same message in different topic partitions but your consumers need to handle duplicated messages.
Replicas are a copy of these partitions in other servers.
The number of replicas is set by the topic's replication factor, which cannot exceed the number of Kafka brokers (servers) in your cluster.
Example:
Let's suppose you have a Kafka cluster of 3 brokers containing a topic named AIRPORT_ARRIVALS that receives flight information messages and has 3 partitions: partition 1 for flight arrivals from airline A, partition 2 from airline B, and partition 3 from airline C. Each message is initially written to one broker (the leader of its partition), and a copy of each message is stored/replicated to the other 2 Kafka brokers (followers). Disclaimer: this example is only for easier explanation and is not an ideal way to define a message key, because you could end up with an unbalanced load on specific partitions.
Replicas are the way that Kafka provides redundancy.
Kafka keeps more than one copy of the same partition across multiple brokers.
This redundant copy is called a replica. If a broker fails, Kafka can still serve consumers with the replicas of the partitions that the failed broker owned.
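The split between the two ideas can be sketched in Python: partitions divide the data, and each partition is then copied to several brokers. The round-robin placement below is a simplification made up for illustration (Kafka's real assignment also randomizes the starting broker and can be rack-aware), and `assign_replicas` is a hypothetical helper, not a Kafka API.

```python
from typing import Dict, List


def assign_replicas(num_brokers: int, num_partitions: int,
                    replication_factor: int) -> Dict[int, List[int]]:
    """Return {partition: [broker ids]}; the first broker in each list acts as leader."""
    if replication_factor > num_brokers:
        # A partition cannot have two replicas on the same broker.
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            (partition + offset) % num_brokers
            for offset in range(replication_factor)
        ]
    return assignment


# 3 brokers, 3 partitions, replication factor 3, as in the examples above.
for partition, brokers in assign_replicas(3, 3, 3).items():
    print(f"partition {partition}: leader={brokers[0]} followers={brokers[1:]}")
```

Each partition holds different records, while every broker in a partition's list holds the same records for that partition. With this placement each broker leads one partition and follows the other two.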

Replication of data across the cluster in Cassandra database

According to DataStax, each node communicates with the others through the gossip protocol, which exchanges information across the cluster...
I just wanted to know:
Is it really possible to replicate 100 GB of data in 1 second across the cluster?
If it is, how is that possible, and what kind of technique does it use? Can you elaborate?
The gossip protocol is just to share state information around the cluster. This is how Cassandra nodes discover new ones and detect if nodes are unavailable.
Data, however, is not transferred using gossip. Messages are sent directly to replicas during inserts and bulk streaming is done during bootstrap/decommission/repair.
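As a rough illustration of what gossip carries, here is a toy Python sketch in which nodes exchange only small state maps (node id to heartbeat version), never table data. It is not Cassandra's implementation: real gossip picks peers at random, while this sketch uses a fixed rotation so the run is repeatable, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def gossip_until_converged(num_nodes: int) -> Tuple[int, List[Dict[int, int]]]:
    """Exchange state maps between nodes until every node knows about all nodes.

    Returns the number of rounds used and each node's final view.
    """
    # views[i] maps a node id to the newest heartbeat version node i has seen.
    views = [{i: 1} for i in range(num_nodes)]
    for round_no in range(1, num_nodes):
        for i in range(num_nodes):
            peer = (i + round_no) % num_nodes  # fixed rotation instead of a random peer
            known = views[i].keys() | views[peer].keys()
            merged = {
                n: max(views[i].get(n, 0), views[peer].get(n, 0)) for n in known
            }
            # Both sides end up with the newest version of every entry.
            views[i] = dict(merged)
            views[peer] = dict(merged)
        if all(len(view) == num_nodes for view in views):
            return round_no, views
    return max(num_nodes - 1, 0), views


rounds, views = gossip_until_converged(5)
print("rounds:", rounds)
print("node 0 knows about:", sorted(views[0]))
```

The maps stay tiny no matter how much data the cluster stores, which is why gossip can spread quickly while the actual rows travel through direct writes to replicas and streaming.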

How are connection pooling and distribution done across a Vertica cluster?

How are connections pooled and distributed across a Vertica cluster?
I am trying to understand how connections are handled in Vertica, the way Oracle handles its connections through its listener, and how connections are balanced inside the cluster (for better distribution).
Vertica's process of handling a connection is basically as follows:
1. A node receives the connection, making it the Initiator Node.
2. The initiator node generates the query execution plan and distributes it to the other nodes.
3. The nodes fill in any node-specific details of the execution plan.
4. The nodes execute the query.
5. (ignoring some stuff here)*
6. The nodes send the result set back to the initiator node.
7. The initiator node collects the data and does final aggregations.
8. The initiator node sends the data back to the client.
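The steps above follow a scatter-gather pattern, which can be sketched in Python. This is only an illustration, not Vertica's API: the node names, the in-memory data layout, and `run_distributed_sum` are all hypothetical, and the "plan" is reduced to a single SUM.

```python
from typing import Dict, List


def run_distributed_sum(node_rows: Dict[str, List[int]], initiator: str) -> int:
    """Compute a SUM across nodes the way an initiator node would coordinate it."""
    if initiator not in node_rows:
        raise KeyError(f"unknown initiator node: {initiator}")

    # The initiator builds one plan and sends the same plan to every node.
    plan = {"operation": "sum"}

    # Each node runs the plan on its local rows and returns a partial result.
    partial_results = {}
    for node, rows in node_rows.items():
        if plan["operation"] == "sum":
            partial_results[node] = sum(rows)

    # The initiator collects the partial results and does the final aggregation.
    return sum(partial_results.values())


cluster = {"node1": [1, 2], "node2": [3], "node3": [4, 5]}  # hypothetical data
print(run_distributed_sum(cluster, initiator="node1"))
```

Whichever node the client connects to becomes the initiator, so the result is the same from any node; only the coordination work moves.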
The recommended way to connect to Vertica is through a load balancer, so that no single node becomes a point of failure. Vertica itself does not distribute connections between nodes; it distributes the query to the other nodes.
I'm not well versed in Oracle or the details of how systems do their data connection process, so hopefully I'm not too far off the mark of what you're looking for.
From /my/ experience, each node can handle a limited number of connections. Once you try to connect more than that to a node, it will reject the connection. I experienced this with a map-reduce job that connected in the map function.
*Depending on the query/data/partitioning, it may need to do some data transfer behind the scenes to complete the query for each node. It slows the query down when this happens.

Resources