Newly added Raft node cannot receive proposal? - go

I am new to etcd/raft. I managed to build a minimal raft demo that creates a cluster and syncs proposals between the nodes. The nodes in the cluster are pre-configured in the peers argument of raft.StartNode(...).
However, when I tried to dynamically add a new node to a running cluster, although I managed to make the cluster recognize the new node, new proposals are not sent to it.
Here is my code: main.go
In short:
I do get entries from Ready(), and I invoke ApplyConfChange() when an entry of type raftpb.EntryConfChange shows up in CommittedEntries.
Messages to each node are sent over channels, simulating network RPCs.
Advance() and Step() are invoked to drive the state machines.
And how do I add the new node? Again, in short:
Each node has a separate channel to receive conf change messages.
I send a raftpb.ConfChange value to every conf change channel.
Watching stdout, the new node is recognized by all previous nodes, because they all get an EntryConfChange in CommittedEntries.
Then I make another proposal. All previous nodes receive the proposal from the leader node, but the new one does not.
I must have missed something or made a mistake, but I cannot figure out what.
Can anyone help? Thanks in advance!
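The flow described above can be sketched as a toy model, with no etcd/raft dependency. The names cluster, broadcast, and applyConfChange below are hypothetical, not library API; the point is the invariant that usually bites in this scenario: the leader replicates a proposal only to the peers in its current membership, so the new node can only receive proposals after the conf change has actually been applied on the leader.

```go
package main

import "fmt"

// cluster models only leader-side membership: the leader replicates
// proposals to the peers it currently knows about.
type cluster struct {
	members map[uint64]bool // node ID -> known to the leader
}

// broadcast returns the IDs a proposal is replicated to; a node that
// is missing from members never sees it, regardless of what the node
// itself does locally.
func (c *cluster) broadcast() []uint64 {
	ids := []uint64{}
	for id := range c.members {
		ids = append(ids, id)
	}
	return ids
}

// applyConfChange mirrors what applying a conf change must achieve on
// the leader: only after this does replication include the new node.
func (c *cluster) applyConfChange(id uint64) {
	c.members[id] = true
}

func main() {
	c := &cluster{members: map[uint64]bool{1: true, 2: true, 3: true}}
	fmt.Println(len(c.broadcast())) // 3: node 4 is not yet a member
	c.applyConfChange(4)
	fmt.Println(len(c.broadcast())) // 4: proposals now reach node 4
}
```

If the leader's outgoing messages from Ready() are routed by a table that was built before the conf change, the new node stays invisible even though every node logged the EntryConfChange, so the message-routing layer is worth checking first.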

Related

Corda State Events : Do events have an order?

A network consists of 3 nodes, where 1 node is read-only and participates in every transaction. A request can start from either of the other nodes, which in turn creates a request state. It is received and processed by the other node to create a new response state. Both flows only issue new states and do not consume them. Both of these state events are received by the read-only node. Would the state events received by the read-only Corda node have an order, or would they be processed in an arbitrary order?
For example, can we say that the request originator's state event would be received/processed first, and then the other node's? Or could it happen under high load that the other node's event gets received/processed by the read-only node first, and only then the originator's event?
My experience with Corda is very limited, and I need to understand how events are received by the parties when one party acts as read-only and all remaining parties only issue new states.
In general, the order in which messages are received is not guaranteed. A node will process messages in the order it receives them, but it is not guaranteed that the messages arrive in the order they were sent.
If Node A is receiving messages from Node B and Node C, and Node B produces a message before Node C does, there is no guarantee that the message from Node B is processed first. Whichever message reaches Node A first gets processed first. The delay could be caused by many factors, such as network latency.
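If the read-only node needs a deterministic order, the usual remedy is to carry an application-level sequence number and buffer events that arrive early. A minimal self-contained sketch of that idea (not Corda API; the event and reorderer types are hypothetical):

```go
package main

import "fmt"

// event carries an application-assigned sequence number so the
// receiver can impose an order the network does not guarantee.
type event struct {
	seq     int
	payload string
}

// reorderer buffers events that arrive early and releases them
// strictly in sequence order.
type reorderer struct {
	next    int           // next sequence number to deliver
	pending map[int]event // buffered out-of-order events
}

func newReorderer() *reorderer {
	return &reorderer{next: 1, pending: map[int]event{}}
}

// accept buffers e and returns every event that is now deliverable
// in order.
func (r *reorderer) accept(e event) []event {
	r.pending[e.seq] = e
	var out []event
	for {
		e, ok := r.pending[r.next]
		if !ok {
			return out
		}
		delete(r.pending, r.next)
		out = append(out, e)
		r.next++
	}
}

func main() {
	r := newReorderer()
	// The response arrives first: nothing is released yet.
	fmt.Println(len(r.accept(event{2, "response"}))) // 0
	// The request arrives: both are released in order.
	for _, e := range r.accept(event{1, "request"}) {
		fmt.Println(e.seq, e.payload) // 1 request, then 2 response
	}
}
```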

What is the deal with LinkName?

I am new to AMQP and trying to understand the concepts, so my question might be very naive.
I am sending messages to an ActiveMQ broker, and while sending I have to specify a LinkName. But it doesn't seem to matter what I put on the consumer side or the producer side; I receive the data either way.
I am confused: what is the deal with LinkName?
I can't really state it any better than section 2.6.1 of the AMQP 1.0 specification:
2.6.1 Naming A Link
Links are named so that they can be recovered when communication is interrupted. Link names MUST uniquely identify the link amongst all links of the same direction between the two participating containers. Link names are only used when attaching a link, so they can be arbitrarily long without a significant penalty.
A link’s name uniquely identifies the link from the container of the source to the container of the target node, i.e., if the container of the source node is A, and the container of the target node is B, the link can be globally identified by the (ordered) tuple (A,B,<name>). Consequently, a link can only be active in one connection at a time. If an attempt is made to attach the link subsequently when it is not suspended, then the link can be ’stolen’, i.e., the second attach succeeds and the first attach MUST then be closed with a link error of stolen. This behavior ensures that in the event of a connection failure occurring and being noticed by one party, that re-establishment has the desired effect.
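The identification rule the spec describes can be illustrated with a toy registry (not a real AMQP implementation; linkKey, registry, and attach are hypothetical names): a link is keyed by the ordered tuple (source container, target container, name), and a second attach with the same key "steals" the link from the first connection.

```go
package main

import "fmt"

// linkKey is the (ordered) tuple that globally identifies a link
// per AMQP 1.0 section 2.6.1: (A, B, <name>).
type linkKey struct {
	source, target, name string
}

// registry tracks which connection currently owns each link.
type registry struct {
	owners map[linkKey]string // link -> owning connection ID
}

// attach records conn as the owner of the link and returns the
// connection it was "stolen" from, if any; per the spec, that first
// attach must then be closed with a link error of "stolen".
func (r *registry) attach(source, target, name, conn string) (stolenFrom string) {
	k := linkKey{source, target, name}
	prev := r.owners[k]
	r.owners[k] = conn
	if prev != "" && prev != conn {
		return prev
	}
	return ""
}

func main() {
	r := &registry{owners: map[linkKey]string{}}
	fmt.Println(r.attach("A", "B", "orders", "conn-1")) // "": fresh attach
	fmt.Println(r.attach("A", "B", "orders", "conn-2")) // "conn-1": stolen
}
```

This is why the name seems not to matter for simple produce/consume tests: it only becomes significant when a link is recovered after an interruption, or when two connections compete for the same name.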

How can I deal with a sudden loss of a clustered node using the low level .net Elasticsearch client?

I have set up a cluster of three Elasticsearch nodes, all master-eligible, with a minimum of 2 required. I have configured a client to bulk upload using the low-level client with a static connection pool, using the code below.
What I am trying to test is live failover scenarios, i.e. start the client with three nodes available and then randomly drop one (by shutting down the VM) while keeping two up. However, I am not seeing the behavior I would expect: the client keeps trying the dead node, and it seems to take up to about sixty seconds before it moves on to the next one.
What I would expect is for it to register the failed attempt, mark that node as potentially dead, and at least move on to the next node. What is odd is that this is exactly the behavior I get if I start my application with only two of the three nodes in my list available, or if I just stop the Elasticsearch service during a test rather than powering down the machine.
Is there a correct way to deal with such a case and get the client to move to the next available node as quickly as possible? Or do I need to back off in my code for up to sixty seconds before attempting to publish again?
var nodes = new[]
{
    new Node(new Uri("http://172.16.2.10:9200")),
    new Node(new Uri("http://172.16.2.11:9200")),
    new Node(new Uri("http://172.16.2.12:9200"))
};

var connectionPool = new StaticConnectionPool(nodes);
var settings = new ConnectionConfiguration(connectionPool)
    .PingTimeout(TimeSpan.FromSeconds(10))
    .RequestTimeout(TimeSpan.FromSeconds(20))
    .ThrowExceptions()
    .MaximumRetries(3);

_lowLevelClient = new ElasticLowLevelClient(settings);
I then wrap the following in a try/catch, retrying a maximum of three times before I consider it a failed attempt and revert to an error strategy.
ElasticsearchResponse<Stream> indexResponse = _lowLevelClient.Bulk<Stream>(data);
Any input is appreciated,
Thank you.
The tests for the client include tests for failover scenarios, from which the API conventions documentation is generated. Specifically, take a look at the retry and failover documentation.
With a StaticConnectionPool, the set of nodes to which requests can be made is static and never refreshed to reflect nodes that may join and leave the cluster, but nodes will be marked as dead if a bad response is returned, and will be taken out of rotation for a configurable dead time, controlled by DeadTimeout and MaxDeadTimeout on the connection settings.
The audit trail on the response should provide a timeline of what has happened for a given request, which is easiest to see with response.DebugInformation. The Virtual Clustering test harness that is part of the Tests project may help to determine the correct settings for the behaviour you're after.

How to handle url change when a node dies?

I am new to Elasticsearch. I have a cluster with 3 nodes on the same machine. To access each node I have a separate URL, as the port changes (localhost:9200, localhost:9201, localhost:9202).
Now the question I have is: suppose my node 1 (i.e. the master node) dies. Elasticsearch handles the situation very well and makes node 2 the master node, but how does my application know that a node died and that I now need to hit node 2 on port 9201?
Is there a way by which I always hit a single URL and it internally figures out which node to hit?
Thanks,
Pratz
The client discovers nodes using a discovery module. The cluster name in your client's configuration is important to get this working.
With a correct configuration (on both the client and the cluster) you can bring a single node down without any negative effect on your client.
See the following links:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html

How do I figure out the new master node when my master node fails in ElasticSearch?

Let's say I have 3 nodes. 1 of which is the master.
I have an API (running on another machine) which hits the master and gets my search result. This is through a subdomain, say s1.mydomain.com:9200 (assume the others are pointed to by s2.mydomain.com and s3.mydomain.com).
Now my master fails for whatever reason. How would my API recover from such a situation? Can I hit either s2 or s3 instead? How can I figure out what the new master is? Is there a predictable way to know which one would be picked as the new master should the master go down?
I've googled this, and it's given me enough information about how, when a master goes down, a failover node is elected as the new master, but I haven't seen anything clarifying how I would need to handle this from the outside looking in.
The master in Elasticsearch is really only for internal coordination. There are no actions required when a node goes down, other than trying to bring it back up to restore full cluster performance.
You can read from and write to any of the remaining nodes, and data replication will keep going. When the old master node comes back up, it will rejoin the cluster once it has received the updated data. In fact, you never need to worry about whether the node you are writing to is the master node.
There are some advanced configurations to alter these behaviors, but Elasticsearch comes with suitable defaults.

Resources