etcd cluster ID mismatch - cluster-computing

Hey, I have a cluster ID mismatch for some reason. I had it on one node, then it disappeared after clearing the data dir a few times and changing the cluster token and node names, but it appeared on another node.
Here is the script I use:
IP0=10.150.0.1
IP1=10.150.0.2
IP2=10.150.0.3
IP3=10.150.0.4
NODENAME0=node0
NODENAME1=node1
NODENAME2=node2
NODENAME3=node3
# changing these on each box
THISIP=$IP2
THISNODENAME=$NODENAME2
etcd --name $THISNODENAME --initial-advertise-peer-urls http://$THISIP:2380 \
--data-dir /root/etcd-data \
--listen-peer-urls http://$THISIP:2380 \
--listen-client-urls http://$THISIP:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://$THISIP:2379 \
--initial-cluster-token etcd-cluster-2 \
--initial-cluster $NODENAME0=http://$IP0:2380,$NODENAME1=http://$IP1:2380,$NODENAME2=http://$IP2:2380,$NODENAME3=http://$IP3:2380 \
--initial-cluster-state new
I get
2016-11-11 22:13:12.090515 I | etcdmain: etcd Version: 2.3.7
2016-11-11 22:13:12.090643 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2016-11-11 22:13:12.090713 I | etcdmain: listening for peers on http://10.150.0.3:2380
2016-11-11 22:13:12.090745 I | etcdmain: listening for client requests on http://10.150.0.3:2379
2016-11-11 22:13:12.090771 I | etcdmain: listening for client requests on http://127.0.0.1:2379
2016-11-11 22:13:12.090960 I | etcdserver: name = node2
2016-11-11 22:13:12.090976 I | etcdserver: data dir = /root/etcd-data
2016-11-11 22:13:12.090983 I | etcdserver: member dir = /root/etcd-data/member
2016-11-11 22:13:12.090990 I | etcdserver: heartbeat = 100ms
2016-11-11 22:13:12.090995 I | etcdserver: election = 1000ms
2016-11-11 22:13:12.091001 I | etcdserver: snapshot count = 10000
2016-11-11 22:13:12.091011 I | etcdserver: advertise client URLs = http://10.150.0.3:2379
2016-11-11 22:13:12.091269 I | etcdserver: restarting member 7fbd572038b372f6 in cluster 4e73d7b9b94fe83b at commit index 4
2016-11-11 22:13:12.091317 I | raft: 7fbd572038b372f6 became follower at term 8
2016-11-11 22:13:12.091346 I | raft: newRaft 7fbd572038b372f6 [peers: [], term: 8, commit: 4, applied: 0, lastindex: 4, lastterm: 1]
2016-11-11 22:13:12.091516 I | etcdserver: starting server... [version: 2.3.7, cluster version: to_be_decided]
2016-11-11 22:13:12.091869 E | etcdmain: failed to notify systemd for readiness: No socket
2016-11-11 22:13:12.091894 E | etcdmain: forgot to set Type=notify in systemd service file?
2016-11-11 22:13:12.096380 N | etcdserver: added member 7508b3e625cfed5 [http://10.150.0.4:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.099800 N | etcdserver: added member 14c76eb5d27acbc5 [http://10.150.0.1:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.100957 N | etcdserver: added local member 7fbd572038b372f6 [http://10.150.0.2:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.102711 N | etcdserver: added member d416fca114f17871 [http://10.150.0.3:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.134330 E | rafthttp: request cluster ID mismatch (got cfd5ef74b3dcf6fe want 4e73d7b9b94fe83b)
The other members are not even running, how is that possible?
Thank you

For all those who stumble upon this from Google:
The error means a peer is trying to join the cluster under the same name as a member that already exists in the cluster (probably an old instance), but with a different ID; that ID mismatch is the problem.
You should delete the stale peer and re-add it, as shown in this helpful post:
In order to fix this it was pretty simple: first we had to log into an existing working server on the rest of the cluster and remove server00 from its member list:
etcdctl member remove <UID>
This frees up the ability for the new server00 to join, but we needed to tell the cluster it could by issuing the add command:
etcdctl member add server00 http://1.2.3.4:2380
If you follow the logs on server00 you'll then see everything spring into life. You can confirm this with the commands:
etcdctl member list
etcdctl cluster-health
Use "etcdctl member list" to find what are the IDs of current members, and find the one which tries to join cluster with wrong ID, then delete that peer from "members" with "etcdctl member remove " and try to rejoin him.
Hope it helps.

I just ran into this same issue, 2 years later. Dmitry's answer is fine, but it misses what the OP likely did wrong in the first place when setting up an etcd cluster.
Running an etcd instance with "--initial-cluster-state new" at any point will generate a cluster ID in the data directory. If you then later try to join an existing cluster, etcd will use that old generated cluster ID, which is when the mismatch error occurs. Yes, technically the OP had an "old cluster", but the far more common case is someone standing up their first cluster who doesn't notice that the procedure has to change after the first node. I find that etcd generally fails to provide a good usage model.
So, removing the member (you don't really need to if the new node never joined successfully) and/or deleting the new node's data directory will "fix" the issue, but it's how the OP set up the 2nd cluster node that caused the problem.
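Concretely, the stale identity lives under the member/ folder of the data dir (the path below is from the OP's logs); deleting it is only safe if the node holds no data you still need:
ls /root/etcd-data/member    # snap/ and wal/ carry the old cluster and member identity
rm -rf /root/etcd-data       # wipe it so the node can bootstrap or join cleanly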
Here's an example of the setup nuance (sigh... thanks for that, etcd):
# On the 1st node (I used Centos7 minimal, with etcd installed)
sudo firewall-cmd --permanent --add-port=2379/tcp
sudo firewall-cmd --permanent --add-port=2380/tcp
sudo firewall-cmd --reload
export CL_NAME=etcd1
export HOST=$(hostname)
export IP_ADDR=$(ip -4 addr show ens33 | grep -oP '(?<=inet\s)\d+(\.\d+){3}')
# turn on etcdctl v3 api support, why is this not default?!
export ETCDCTL_API=3
sudo etcd --name $CL_NAME --data-dir ~/data --advertise-client-urls=http://127.0.0.1:2379,https://$IP_ADDR:2379 --listen-client-urls=https://0.0.0.0:2379 --initial-advertise-peer-urls https://$IP_ADDR:2380 --listen-peer-urls https://$IP_ADDR:2380 --initial-cluster-state new
Ok, the first node is running. The cluster data is in the ~/data directory. In future runs you only need the following (note that --initial-cluster-state isn't needed):
sudo etcd --name $CL_NAME --data-dir ~/data --advertise-client-urls=http://127.0.0.1:2379,https://$IP_ADDR:2379 --listen-client-urls=https://0.0.0.0:2379 --initial-advertise-peer-urls https://$IP_ADDR:2380 --listen-peer-urls https://$IP_ADDR:2380
Next, register your 2nd node's expected member name and peer URL with the cluster:
etcdctl --endpoints="https://127.0.0.1:2379" member add etcd2 --peer-urls="http://<next node's IP address>:2380"
Adding the member is important. You won't be able to successfully join without doing it first.
# Next on the 2nd/new node
export CL_NAME=etcd2
export HOST=$(hostname)
export IP_ADDR=$(ip -4 addr show ens33 | grep -oP '(?<=inet\s)\d+(\.\d+){3}')
sudo etcd --name $CL_NAME --data-dir ~/data --advertise-client-urls=https://127.0.0.1:2379,https://$IP_ADDR:2379 --listen-client-urls=https://0.0.0.0:2379 --initial-advertise-peer-urls https://$IP_ADDR:2380 --listen-peer-urls https://$IP_ADDR:2380 --initial-cluster-state existing --initial-cluster="etcd1=http://<IP of 1st node>:2380,etcd2=http://$IP_ADDR:2380"
Note the annoying extra arguments here. --initial-cluster must list 100% of the nodes in the cluster... which doesn't matter after you join, because cluster membership is replicated anyway... Also, "--initial-cluster-state existing" is needed.
Again, after the 1st time the 2nd node runs/joins, you can run it without any cluster arguments:
sudo etcd --name $CL_NAME --data-dir ~/data --advertise-client-urls=http://127.0.0.1:2379,https://$IP_ADDR:2379 --listen-client-urls=https://0.0.0.0:2379 --initial-advertise-peer-urls https://$IP_ADDR:2380 --listen-peer-urls https://$IP_ADDR:2380
Sure, you could keep running etcd with all the cluster settings in there, but they "might" get ignored in favor of what's in the data directory. Remember that if you join a 3rd node, knowledge of the new member is replicated to the other nodes, and those "initial" cluster settings could be completely false/misleading later when your cluster changes. So run your joined nodes with no initial-cluster settings unless you are actually joining one.
Also, one last bit to impart: you should/must run at least 3 nodes in a cluster, otherwise the Raft leader-election process will break everything. With 2 nodes, when 1 node goes down or they get disconnected, the surviving node cannot elect itself and spins in an election loop. Clients can't talk to an etcd service that's stuck in an election... Great availability! You need a minimum of 3 nodes to tolerate 1 going down.
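For reference, a Raft cluster needs a majority of members (floor(N/2)+1) up to make progress, so the failure tolerance works out like this:
cluster size   quorum   failures tolerated
1              1        0
2              2        0
3              2        1
5              3        2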

In my case I got the error
rafthttp: request cluster ID mismatch (got 1b3a88599e79f82b want b33939d80a381a57)
due to an incorrect config on one node.
Two of my nodes had this in their config:
env ETCD_INITIAL_CLUSTER="etcd-01=http://172.16.50.101:2380,etcd-02=http://172.16.50.102:2380,etcd-03=http://172.16.50.103:2380"
and one node had:
env ETCD_INITIAL_CLUSTER="etcd-01=http://172.16.50.101:2380"
To resolve the problem I stopped etcd on all nodes, fixed the incorrect config, deleted the /var/lib/etcd/member folder on all nodes, restarted etcd on all nodes, and voilà!
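In other words, the corrected line on the third node is simply the same full member list the other two already had:
env ETCD_INITIAL_CLUSTER="etcd-01=http://172.16.50.101:2380,etcd-02=http://172.16.50.102:2380,etcd-03=http://172.16.50.103:2380"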
P.S.
/var/lib/etcd is the folder where etcd saves its data in my case.

My --data-dir is /var/etcd/data; removing and recreating it worked for me. It seems that something from a previous etcd cluster I had made was left in this directory, which affected the etcd settings.

I have faced the same problem: our etcd leader went down, and after replacing it with a new server we were getting the error
rafthttp: request sent was ignored (cluster ID mismatch)
The new server was looking for the old cluster ID and, because of the misconfiguration, had generated some random local cluster of its own.
I followed these steps to fix the issue:
Log in to another working member of the cluster and remove the unreachable member:
etcdctl cluster-health
etcdctl member remove member-id
Log in to the new server and stop etcd if the process is running: systemctl stop etcd2
Remove the data from the data directory: rm -rf /var/etcd2/data (keep a backup of this data somewhere else before deleting it).
Now start your cluster with the --initial-cluster-state existing parameter; don't use --initial-cluster-state new if you are adding a server to an existing cluster.
Now go back to one of the running etcd servers and add the new member to the cluster: etcdctl member add node0 http://$IP:2380
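Pulled together (with the member add done on a healthy node before the replacement starts, as the other answers in this thread also suggest), the sequence looks roughly like this; the paths, unit name, and node name are the ones from this setup, so adjust them for yours:
# on a healthy member
etcdctl cluster-health
etcdctl member remove <dead-member-id>
etcdctl member add node0 http://$IP:2380
# on the replacement server
systemctl stop etcd2
mv /var/etcd2/data /var/etcd2/data.bak    # keep a backup rather than deleting outright
# then start etcd with --initial-cluster-state existing (not new)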
I spent a lot of time debugging this issue, and now my cluster is running healthy with all members. I hope this information helps.

Add a new node to an existing etcd cluster:
etcdctl member add <new_node_name> --peer-urls="http://<new_node_ip>:2380"
Note: if you have enabled TLS, replace http with https.
Run etcd on the new node. It is important to add "--initial-cluster-state existing"; this tells the new node to join the existing cluster instead of creating a new one:
etcd --name <new_node_name> --initial-cluster-state existing ...
Check the result
etcdctl member list

Related

How to sub to and monitor a VerneMQ cluster?

I managed to create a cluster of 2 VerneMQ nodes.
Both are communicating well and are located on different servers, respectively:
A.company.com
B.company.com
The following command shows both nodes running (SSL and port 8883 are used):
vmq-admin cluster show
+-----------------------+-------+
| Node |Running|
+-----------------------+-------+
|ServerAMQ@10.107.1.25 | true |
|ServerBMQ@10.107.1.26 | true |
+-----------------------+-------+
Everything is actually running fine, e.g.:
mosquitto_pub -h a.mycompany.com -p 8883 -t 'topic/A/1' -m "foo" -d --cert client.crt --key client.key --cafile ca.crt -d
As I'm subscribing on A.mycompany.com, does the cluster distribute to B.mycompany.com?
Can someone please share some insight into how it works?
As I don't have any experience with this subject, may I ask what the best way is to supervise the cluster's activity?
Graphite (I saw some cluster-related metrics)?
http://(oneofthetwoserver):8888/metrics
Regards,
Pierre
A quick way to get an overview of the cluster is: http://{{your_server}}:8888/status.
For a production deployment, I'd use Prometheus metrics in any case.
I'm not sure I understand your question regarding brokers A and B. If they are clustered, messages published to A will be delivered to a subscriber on broker B, provided the topic subscription matches.
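A quick way to check both points from a shell; the hostnames, cert paths, and default HTTP listener port 8888 are the ones mentioned in this thread, so adjust them for your setup:
# subscribe on broker B...
mosquitto_sub -h b.mycompany.com -p 8883 -t 'topic/A/1' --cert client.crt --key client.key --cafile ca.crt -d
# ...then publish to broker A from another shell; the message should arrive on B
mosquitto_pub -h a.mycompany.com -p 8883 -t 'topic/A/1' -m "foo" --cert client.crt --key client.key --cafile ca.crt -d
# cluster overview and metrics over HTTP
curl http://a.mycompany.com:8888/status
curl http://a.mycompany.com:8888/metrics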

Is it possible to demote a master node without using "repmgr standby clone" and pg_rewind

I am currently using PostgreSQL with log-shipping replication. I use a Pacemaker master/slave resource to handle PostgreSQL failover.
I was wondering if there is a way to demote a master, set it up as a standby, and keep it synchronized without using "repmgr standby clone" or pg_rewind.
In fact, I want the old master to be quickly ready to go back to the master state, and "repmgr standby clone" takes several minutes to recover, which is too long.
I see that it is possible to use pg_rewind to synchronize faster, but it requires wal_log_hints to be enabled, and I am afraid that this option will decrease the master's performance. The master is already very busy.
I tried just writing a recovery.conf into the data directory; the master did switch to standby mode, however it doesn't have an upstream:
[root@bkm-01 httpd]# su - postgres -c "/usr/pgsql-9.5/bin/repmgr -f /var/lib/pgsql/repmgr/repmgr.conf cluster show"
 Role     | Name    | Upstream | Connection String
----------+---------+----------+----------------------------------------
* master  | node-02 |          | host=node-02 user=repmgr dbname=repmgr
  standby | node-01 |          | host=node-01 user=repmgr dbname=repmgr
I hope this is clear enough; I'm actually a newbie at database replication. Any help would be appreciated.
I found the solution by myself. In fact, the former master just needs to be registered again after being demoted. --force should be used if the node was previously registered.
[root@node-01 ] su - postgres -c "/usr/pgsql-9.5/bin/repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby register --force"
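For what it's worth, here is a minimal sketch of the demote-then-reregister path on PostgreSQL 9.5; the data-directory path is an assumption (a typical pgsql-9.5 layout), and the connection details are the ones from this cluster:
# 1. stop PostgreSQL on the old master, then drop a recovery.conf into its data dir
cat > /var/lib/pgsql/9.5/data/recovery.conf <<'EOF'
standby_mode = 'on'
primary_conninfo = 'host=node-02 user=repmgr'
EOF
# 2. start PostgreSQL again, then re-register the node so repmgr knows it is a standby
su - postgres -c "/usr/pgsql-9.5/bin/repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby register --force"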

Docker Swarm: How to set up multiple containers on the same volume

I am using Docker 1.12.2, build bb80604, with swarm mode enabled.
I have
a swarm cluster with 2 leader & 3 slave nodes,
a named volume on each of the slave & leader nodes,
Elasticsearch running on the 2 master servers.
Volume create command
docker volume create -d local-persist -o mountpoint=/data/docker/swarm/elasticsearch --name esvolume
Now when I run the docker service create command to create 5 replicas of Elasticsearch, 3 replicas start (1 on each slave server) whereas the remaining 2 replicas fail:
docker service create --replicas 5 --name esdata \
--restart-max-attempts 5 --network myesnetwork \
-e CLUSTER_NAME=swarmescluster \
-e MASTER_NODES=esmaster \
--mount type=volume,src=esvolume,dst=/var/lib/elasticsearch \
--mount type=volume,src=esvolume,dst=/var/log/elasticsearch \
myimagename
The error for the failed replicas is:
Caused by: java.lang.IllegalStateException: failed to obtain node
locks, tried [[/var/lib/elasticsearch/swarmescluster]] with lock id
[0]; maybe these locations are not writable or multiple nodes were
started without increasing [node.max_local_storage_nodes] (was [1])?
at org.elasticsearch.env.NodeEnvironment.(NodeEnvironment.java:259)
~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.node.Node.(Node.java:240) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.node.Node.(Node.java:220) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.bootstrap.Bootstrap$5.(Bootstrap.java:191)
~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:191)
~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:286)
~[elasticsearch-5.0.0.jar:5.0.0]
Questions:
How can I configure the replicas to write to the same path or a dynamic path? (I need persistent data.)
If I want to set the value of 'node.max_local_storage_nodes' when a replica is created, how do I do it at runtime?

Mesos slave node unable to restart

I've set up a Mesos cluster using the CloudFormation templates from Mesosphere. Things worked fine after the cluster launch.
I recently noticed that none of the slave nodes are listed in the Mesos dashboard. The EC2 console shows the slaves are running & passing health checks. I restarted the nodes in the cluster but that didn't help.
I SSH'ed into one of the slaves and noticed the mesos-slave service was not running. I executed sudo systemctl status dcos-mesos-slave.service but couldn't get the service started.
I looked in /var/log/mesos/, ran tail -f mesos-slave.xxx.invalid-user.log.ERROR.20151127-051324.31267, and saw the following:
F1127 05:13:24.242182 31270 slave.cpp:4079] CHECK_SOME(state::checkpoint(path, bootId.get())): Failed to create temporary file: No space left on device
But the output of df -h and free shows there is plenty of disk space left.
Which leads me to wonder: why is it complaining about no disk space?
OK, I figured it out.
When running Mesos for a long time or under frequent load, the /tmp folder runs out of space, since Mesos uses /tmp/mesos/ as its work_dir. Also, a filesystem can only hold a certain number of file references (inodes), so df -h can show plenty of free bytes while the inodes are exhausted. In my case, the slaves were collecting a large number of file chunks from image pulls in /var/lib/docker/tmp.
To resolve this issue:
1) Remove files under /tmp
2) Set a different work_dir location
It is also good practice to run
docker rmi -f $(docker images | grep "<none>" | awk "{print \$3}")
This way you will free up space by deleting unused Docker images.
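Two quick checks/changes along those lines; df -i is standard, while the /etc/mesos-slave/work_dir file is how Mesosphere's packaging usually passes agent flags, so treat that part as an assumption for your install:
# free bytes vs. free inodes -- the second one is what was actually exhausted here
df -h /tmp
df -i /tmp
# point the agent at a roomier work_dir, then restart it
echo "/var/lib/mesos" | sudo tee /etc/mesos-slave/work_dir
sudo systemctl restart dcos-mesos-slave.service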

can't replace dead cassandra node because it doesn't exist in gossip

One of the nodes in a Cassandra cluster has died.
I'm using Cassandra 2.0.7 throughout.
When I do a nodetool status, this is what I see (real addresses have been replaced with fake 10-net addresses):
[root@beta-new:/opt] #nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.10.1.94 171.02 KB 256 49.4% fd2f76ae-8dcf-4e93-a37f-bf1e9088696e rack1
DN 10.10.1.98 ? 256 50.6% f2a48fc7-a362-43f5-9061-4bb3739fdeaf rack1
I tried to get the token ID for the down node by running a nodetool ring command, grepping for the IP, and using head -1 to get the initial token.
[root@beta-new:/opt] #nodetool ring | grep 10.10.1.98 | head -1
10.10.1.98 rack1 Down Normal ? 50.59% -9042969066862165996
I then started following this documentation on how to replace the node:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html?scroll=task_ds_aks_15q_gk
So I installed Cassandra on a new node but did not start it.
I set the following options:
cluster_name: 'Jokefire Cluster'
seed_provider:
- seeds: "10.10.1.94"
listen_address: 10.10.1.94
endpoint_snitch: SimpleSnitch
And I set the initial token of the new install to the token of the node I'm trying to replace, minus 1, in cassandra.yaml:
initial_token: -9042969066862165995
And after making sure there was no data yet in:
/var/lib/cassandra
I started up the database:
[root@web2:/etc/alternatives/cassandrahome] #./bin/cassandra -f -Dcassandra.replace_address=10.10.1.98
The documentation I linked to above says to use the replace_address directive on the command line rather than in cassandra-env.sh if you have a tarball install (which we do) as opposed to a package install.
After I start it up, Cassandra fails with the following message:
Exception encountered during startup: Cannot replace_address /10.10.10.98 because it doesn't exist in gossip
So I'm wondering at this point if I've missed any steps, or if there is anything else I can try to replace this dead Cassandra node.
Has the rest of your cluster been restarted since the node failure, by chance? Most gossip information does not survive a full restart, so you may genuinely not have gossip information for the down node.
This issue was reported as bug CASSANDRA-8138, and the answer was:
I think I'd much rather say that the edge case of a node dying, and then a full cluster restart (rolling would still work) is just not supported, rather than make such invasive changes to support replacement under such strange and rare conditions. If that happens, it's time to assassinate the node and bootstrap another one.
So rather than replacing your node, you need to remove the failed node from the cluster and start up a new one. If using vnodes, it's quite straightforward.
Discover the node ID of the failed node (from another node in the cluster)
nodetool status | grep DN
And remove it from the cluster:
nodetool removenode (node ID)
Now you can clear out the data directory of the failed node, and bootstrap it as a brand-new one.
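As a rough sketch, using the Host ID from the question's nodetool status output:
# from a live node: confirm which node is down and grab its Host ID
nodetool status | grep DN
nodetool removenode f2a48fc7-a362-43f5-9061-4bb3739fdeaf
# check on (or force) the removal if it takes a while
nodetool removenode status
nodetool removenode force
# on the replacement box: clear any previous state, then start Cassandra normally
# (no replace_address needed; it bootstraps as a brand-new node)
rm -rf /var/lib/cassandra/*
./bin/cassandra -f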
Some lesser-known issues with Cassandra dead-node replacement are captured in the link below, based on my experience:
https://github.com/laxmikant99/cassandra-single-node-disater-recovery-lessons
