Can't replace dead Cassandra node because it doesn't exist in gossip - cassandra-2.0

One of the nodes in a Cassandra cluster has died.
I'm using Cassandra 2.0.7 throughout.
When I do a nodetool status, this is what I see (real addresses have been replaced with fake 10.x addresses):
[root@beta-new:/opt] #nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.10.1.94 171.02 KB 256 49.4% fd2f76ae-8dcf-4e93-a37f-bf1e9088696e rack1
DN 10.10.1.98 ? 256 50.6% f2a48fc7-a362-43f5-9061-4bb3739fdeaf rack1
I tried to get the token ID for the down node by doing a nodetool ring command, grepping for the IP and doing a head -1 to get the initial one.
[root@beta-new:/opt] #nodetool ring | grep 10.10.1.98 | head -1
10.10.1.98 rack1 Down Normal ? 50.59% -9042969066862165996
I then started following this documentation on how to replace the node:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html?scroll=task_ds_aks_15q_gk
So I installed Cassandra on a new node but did not start it.
I set the following options in cassandra.yaml:
cluster_name: 'Jokefire Cluster'
seed_provider:
- seeds: "10.10.1.94"
listen_address: 10.10.1.94
endpoint_snitch: SimpleSnitch
And set the initial token of the new install, in cassandra.yaml, to the token of the node I'm trying to replace minus 1:
initial_token: -9042969066862165995
And after making sure there was no data yet in:
/var/lib/cassandra
I started up the database:
[root@web2:/etc/alternatives/cassandrahome] #./bin/cassandra -f -Dcassandra.replace_address=10.10.1.98
The documentation I link to above says to use the replace_address directive on the command line rather than cassandra-env.sh if you have a tarball install (which we do) as opposed to a package install.
After I start it up, cassandra fails with the following message:
Exception encountered during startup: Cannot replace_address /10.10.10.98 because it doesn't exist in gossip
So I'm wondering at this point if I've missed any steps, or if there is anything else I can try to replace this dead Cassandra node?

Has the rest of your cluster been restarted since the node failure, by chance? Most gossip information does not survive a full restart, so you may genuinely not have gossip information for the down node.
This issue was reported as bug CASSANDRA-8138, and the answer was:
I think I'd much rather say that the edge case of a node dying, and then a full cluster restart (rolling would still work) is just not supported, rather than make such invasive changes to support replacement under such strange and rare conditions. If that happens, it's time to assassinate the node and bootstrap another one.
So rather than replacing your node, you need to remove the failed node from the cluster and start up a new one. If using vnodes, it's quite straightforward.
Discover the Host ID of the failed node (from another node in the cluster):
nodetool status | grep DN
And remove it from the cluster:
nodetool removenode <Host ID>
Now you can clear out the data directory of the failed node, and bootstrap it as a brand-new one.
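As a concrete sketch of that flow (the Host ID below is the one shown for the dead node in the status output above; substitute your own):
# On any live node: confirm the dead node and note its Host ID
nodetool status | grep DN
# Remove it; the remaining replicas re-stream the removed node's ranges
nodetool removenode f2a48fc7-a362-43f5-9061-4bb3739fdeaf
# Check progress, and only if it gets stuck, force completion (then run repairs afterwards)
nodetool removenode status
nodetool removenode force
# Once it is gone, wipe the replacement node's data directories and start it as a normal bootstrap
rm -rf /var/lib/cassandra/*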

Some lesser-known issues with Cassandra dead-node replacement are captured in the link below, based on my experience:
https://github.com/laxmikant99/cassandra-single-node-disater-recovery-lessons

Related

Is it possible to demote a master node without using "repmgr standby clone" and pg_rewind

I am currently using PostgreSQL with log-shipping replication. I use a master/slave resource in Pacemaker to deal with PostgreSQL failover.
I am asking if there is a way to demote a master, set it as a standby, and keep it synchronized without using "repmgr standby clone" or pg_rewind.
In fact, I want the old master to be quickly ready to get back to the master state, and "repmgr standby clone" takes several minutes to recover, which is too long.
I see that it is possible to use pg_rewind to synchronize faster, but it requires wal_log_hints to be enabled, and I'm afraid this option will decrease the master's performance. The master is already very busy.
I tried just writing a recovery.conf in the data directory; the old master did switch to standby mode, but it has no upstream:
[root@bkm-01 httpd]# su - postgres -c "/usr/pgsql-9.5/bin/repmgr -f /var/lib/pgsql/repmgr/repmgr.conf cluster show"
Role      | Name    | Upstream | Connection String
----------+---------+----------+----------------------------------------
* master  | node-02 |          | host=node-02 user=repmgr dbname=repmgr
  standby | node-01 |          | host=node-01 user=repmgr dbname=repmgr
I hope this is clear enough; I'm actually a newbie in database replication. Any help would be appreciated.
I found the solution myself. The former master just needs to be registered after being demoted. --force should be used if the node was previously registered.
[root@node-01 ]# su - postgres -c "/usr/pgsql-9.5/bin/repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby register --force"
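For completeness, a minimal sketch of the demote-and-re-follow sequence, assuming PostgreSQL 9.5 with a PGDG-style layout (the data directory path, service name, and conninfo below are illustrative, not taken from the original post):
# 1. Point the demoted node at the current master via recovery.conf
cat > /var/lib/pgsql/9.5/data/recovery.conf <<'EOF'
standby_mode = 'on'
primary_conninfo = 'host=node-02 user=repmgr dbname=repmgr'
recovery_target_timeline = 'latest'
EOF
# 2. Restart PostgreSQL so it comes up in standby mode
systemctl restart postgresql-9.5
# 3. Re-register it with repmgr; --force because the node was registered before
su - postgres -c "/usr/pgsql-9.5/bin/repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby register --force"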

etcd cluster ID mismatch

Hey, I have a cluster ID mismatch for some reason. I had it on one node, then it disappeared after clearing the data dir a few times and changing the cluster token and node names, but it appeared on another node.
Here is the script I use:
IP0=10.150.0.1
IP1=10.150.0.2
IP2=10.150.0.3
IP3=10.150.0.4
NODENAME0=node0
NODENAME1=node1
NODENAME2=node2
NODENAME3=node3
# changing these on each box
THISIP=$IP2
THISNODENAME=$NODENAME2
etcd --name $THISNODENAME --initial-advertise-peer-urls http://$THISIP:2380 \
--data-dir /root/etcd-data \
--listen-peer-urls http://$THISIP:2380 \
--listen-client-urls http://$THISIP:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://$THISIP:2379 \
--initial-cluster-token etcd-cluster-2 \
--initial-cluster $NODENAME0=http://$IP0:2380,$NODENAME1=http://$IP1:2380,$NODENAME2=http://$IP2:2380,$NODENAME3=http://$IP3:2380 \
--initial-cluster-state new
I get
2016-11-11 22:13:12.090515 I | etcdmain: etcd Version: 2.3.7
2016-11-11 22:13:12.090643 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2016-11-11 22:13:12.090713 I | etcdmain: listening for peers on http://10.150.0.3:2380
2016-11-11 22:13:12.090745 I | etcdmain: listening for client requests on http://10.150.0.3:2379
2016-11-11 22:13:12.090771 I | etcdmain: listening for client requests on http://127.0.0.1:2379
2016-11-11 22:13:12.090960 I | etcdserver: name = node2
2016-11-11 22:13:12.090976 I | etcdserver: data dir = /root/etcd-data
2016-11-11 22:13:12.090983 I | etcdserver: member dir = /root/etcd-data/member
2016-11-11 22:13:12.090990 I | etcdserver: heartbeat = 100ms
2016-11-11 22:13:12.090995 I | etcdserver: election = 1000ms
2016-11-11 22:13:12.091001 I | etcdserver: snapshot count = 10000
2016-11-11 22:13:12.091011 I | etcdserver: advertise client URLs = http://10.150.0.3:2379
2016-11-11 22:13:12.091269 I | etcdserver: restarting member 7fbd572038b372f6 in cluster 4e73d7b9b94fe83b at commit index 4
2016-11-11 22:13:12.091317 I | raft: 7fbd572038b372f6 became follower at term 8
2016-11-11 22:13:12.091346 I | raft: newRaft 7fbd572038b372f6 [peers: [], term: 8, commit: 4, applied: 0, lastindex: 4, lastterm: 1]
2016-11-11 22:13:12.091516 I | etcdserver: starting server... [version: 2.3.7, cluster version: to_be_decided]
2016-11-11 22:13:12.091869 E | etcdmain: failed to notify systemd for readiness: No socket
2016-11-11 22:13:12.091894 E | etcdmain: forgot to set Type=notify in systemd service file?
2016-11-11 22:13:12.096380 N | etcdserver: added member 7508b3e625cfed5 [http://10.150.0.4:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.099800 N | etcdserver: added member 14c76eb5d27acbc5 [http://10.150.0.1:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.100957 N | etcdserver: added local member 7fbd572038b372f6 [http://10.150.0.2:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.102711 N | etcdserver: added member d416fca114f17871 [http://10.150.0.3:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.134330 E | rafthttp: request cluster ID mismatch (got cfd5ef74b3dcf6fe want 4e73d7b9b94fe83b)
The other members are not even running; how is that possible?
Thank you
For all those who stumble upon this from Google:
The error is about a peer member that tries to join the cluster with the same name as another member (probably an old instance) that already exists in the cluster: same peer name, but a different ID, and that is the problem.
You should delete the peer and re-add it, as shown in this helpful post:
In order to fix this it was pretty simple, first we had to log into an existing working server on the rest of the cluster and remove server00 from its member list:
etcdctl member remove <UID>
This frees up the ability for the new server00 to join, but we needed to tell the cluster it could by issuing the add command:
etcdctl member add server00 http://1.2.3.4:2380
If you follow the logs on server00 you'll then see everything spring into life. You can confirm this with the commands:
etcdctl member list
etcdctl cluster-health
Use "etcdctl member list" to find what are the IDs of current members, and find the one which tries to join cluster with wrong ID, then delete that peer from "members" with "etcdctl member remove " and try to rejoin him.
Hope it helps.
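Putting that together for this cluster, a hedged sketch (the member ID is a placeholder you would take from "etcdctl member list"; the name, peer URL, and data dir are the ones from the question):
# On a healthy member: find and drop the stale entry for the node that can't join
etcdctl member list
etcdctl member remove <stale-member-ID>
# Re-add the peer so the cluster expects it under a fresh ID
etcdctl member add node2 http://10.150.0.3:2380
# On the rejoining node: wipe the old raft state so the old cluster ID isn't reused,
# then start etcd with --initial-cluster-state existing instead of new
rm -rf /root/etcd-data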
I just ran into this same issue, 2 years later. Dmitry's answer is fine but misses what the OP likely did wrong in the first place when setting up an etcd cluster.
Running an etcd instance with "--initial-cluster-state new" at any point will generate a cluster ID in the data directory. If you then/later try to join an existing cluster, it will use that old generated cluster ID (which is when the mismatch error occurs). Yes, technically the OP had an "old cluster", but the far more common case is someone standing up their first cluster who doesn't notice that the procedure has to change. I find that etcd generally fails to provide a good usage model.
So, removing the member (you don't really need to if the new node never joined successfully) and/or deleting the new node's data directory will "fix" the issue, but it's how the OP set up the 2nd cluster node that is the problem.
Here's an example of the setup nuance: (sigh... thanks for that etcd...)
# On the 1st node (I used Centos7 minimal, with etcd installed)
sudo firewall-cmd --permanent --add-port=2379/tcp
sudo firewall-cmd --permanent --add-port=2380/tcp
sudo firewall-cmd --reload
export CL_NAME=etcd1
export HOST=$(hostname)
export IP_ADDR=$(ip -4 addr show ens33 | grep -oP '(?<=inet\s)\d+(\.\d+){3}')
# turn on etcdctl v3 api support, why is this not default?!
export ETCDCTL_API=3
sudo etcd --name $CL_NAME --data-dir ~/data --advertise-client-urls=http://127.0.0.1:2379,https://$IP_ADDR:2379 --listen-client-urls=https://0.0.0.0:2379 --initial-advertise-peer-urls https://$IP_ADDR:2380 --listen-peer-urls https://$IP_ADDR:2380 --initial-cluster-state new
Ok, the first node is running. The cluster data is in the ~/data directory. In future runs you only need (note that --initial-cluster-state isn't needed):
sudo etcd --name $CL_NAME --data-dir ~/data --advertise-client-urls=http://127.0.0.1:2379,https://$IP_ADDR:2379 --listen-client-urls=https://0.0.0.0:2379 --initial-advertise-peer-urls https://$IP_ADDR:2380 --listen-peer-urls https://$IP_ADDR:2380
Next, add your 2nd node's expected cluster name and peer URLs:
etcdctl --endpoints="https://127.0.0.1:2379" member add etcd2 --peer-urls="http://<next node's IP address>:2380"
Adding the member is important. You won't be able to successfully join without doing it first.
# Next on the 2nd/new node
export CL_NAME=etcd2   # note: etcd2, matching the member name added above
export HOST=$(hostname)
export IP_ADDR=$(ip -4 addr show ens33 | grep -oP '(?<=inet\s)\d+(\.\d+){3}')
sudo etcd --name $CL_NAME --data-dir ~/data --advertise-client-urls=https://127.0.0.1:2379,https://$IP_ADDR:2379 --listen-client-urls=https://0.0.0.0:2379 --initial-advertise-peer-urls https://$IP_ADDR:2380 --listen-peer-urls https://$IP_ADDR:2380 --initial-cluster-state existing --initial-cluster="etcd1=http://<IP of 1st node>:2380,etcd2=http://$IP_ADDR:2380"
Note the annoying extra arguments here. --initial-cluster must have 100% of the nodes in the cluster identified... which doesn't matter after you join the cluster, because cluster data will be replicated anyway... Also, "--initial-cluster-state existing" is needed.
Again, after the 1st time the 2nd node runs/joins, you can run it without any cluster arguments:
sudo etcd --name $CL_NAME --data-dir ~/data --advertise-client-urls=http://127.0.0.1:2379,https://$IP_ADDR:2379 --listen-client-urls=https://0.0.0.0:2379 --initial-advertise-peer-urls https://$IP_ADDR:2380 --listen-peer-urls https://$IP_ADDR:2380
Sure, you could keep running etcd with all the cluster settings in there, but they "might" get ignored in favor of what's in the data directory. Remember that if you join a 3rd node, knowledge of the new member is replicated to the remaining nodes, and those "initial" cluster settings could be completely false/misleading in the future when your cluster changes. So run your joined nodes with no initial cluster settings unless you are actually joining one.
Also, one last bit to impart: you should/must run at least 3 nodes in a cluster, otherwise the Raft leader-election process will break everything. With 2 nodes, when 1 node goes down or they get disconnected, the remaining node will not elect itself and will spin in an election loop. Clients can't talk to an etcd service that's in election mode... great availability! You need a minimum of 3 nodes to tolerate 1 going down.
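For reference, Raft needs a strict majority to elect a leader: quorum = floor(N/2) + 1. With N = 3 the quorum is 2, so you can lose 1 node; with N = 5 it is 3, so you can lose 2; with N = 2 the quorum is still 2, so losing either node stalls the cluster.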
In my case I got the error
rafthttp: request cluster ID mismatch (got 1b3a88599e79f82b want b33939d80a381a57)
due to an incorrect config on one node.
Two of my nodes had in their config
env ETCD_INITIAL_CLUSTER="etcd-01=http://172.16.50.101:2380,etcd-02=http://172.16.50.102:2380,etcd-03=http://172.16.50.103:2380"
and one node had
env ETCD_INITIAL_CLUSTER="etcd-01=http://172.16.50.101:2380"
To resolve the problem I stopped etcd on all nodes, fixed the incorrect config, deleted the /var/lib/etcd/member folder on all nodes, restarted etcd on all nodes, and voila!
P.S. /var/lib/etcd is the folder where etcd saves its data in my case.
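In shell terms, the recovery sequence above was roughly (a sketch; the service name is an assumption, the paths are the ones from this setup):
service etcd stop                # on every node
# fix ETCD_INITIAL_CLUSTER so all three peers are listed identically on every node
rm -rf /var/lib/etcd/member      # on every node: drop the stale raft state
service etcd start               # on every node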
My --data-dir is /var/etcd/data; removing and recreating it worked for me. It seems something from a previous etcd cluster I made was left in this directory, which affected the etcd settings.
I have faced the same problem. Our leader etcd server went down, and after replacing it with a new one we were getting the error
rafthttp: request sent was ignored (cluster ID mismatch)
It was looking for the old cluster ID and generating a random local cluster with some misconfiguration.
I followed these steps to fix the issue.
Log in to another working cluster member and remove the unreachable member from the cluster:
etcdctl cluster-health
etcdctl member remove member-id
Log in to the new server and stop the etcd process if it is running: systemctl stop etcd2
Remove the data from the data directory: rm -rf /var/etcd2/data (keep a backup of this data in another folder before deleting it).
Now start your node with the --initial-cluster-state existing parameter; don't use --initial-cluster-state new if you are adding a server to an existing cluster.
Now go back to one of the running etcd servers and add this new member to the cluster: etcdctl member add node0 http://$IP:2380
I have spent a lot of time debugging this issue, and now my cluster is running healthy with all members. Hope this information helps.
Add a new node to an existing etcd cluster:
etcdctl member add <new_node_name> --peer-urls="http://<new_node_ip>:2380"
Note: if you enable TLS, replace http with https.
Run etcd on the new node. It is important to add "--initial-cluster-state existing"; this tells the new node to join the existing cluster instead of creating a new one.
etcd --name <new_node_name> --initial-cluster-state existing ...
Check the result:
etcdctl member list
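An end-to-end example of those steps (the node name, IPs, and data dir are assumptions extending the four-node layout from the question; "etcdctl member add" also prints suggested ETCD_INITIAL_* values you can reuse):
# 1. On an existing member: register the new peer
etcdctl member add node4 --peer-urls="http://10.150.0.5:2380"
# 2. On the new node: start etcd with the full member list and
#    --initial-cluster-state existing so it joins instead of bootstrapping
etcd --name node4 \
  --data-dir /var/lib/etcd \
  --initial-advertise-peer-urls http://10.150.0.5:2380 \
  --listen-peer-urls http://10.150.0.5:2380 \
  --listen-client-urls http://10.150.0.5:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.150.0.5:2379 \
  --initial-cluster "node0=http://10.150.0.1:2380,node1=http://10.150.0.2:2380,node2=http://10.150.0.3:2380,node3=http://10.150.0.4:2380,node4=http://10.150.0.5:2380" \
  --initial-cluster-state existing
# 3. Verify
etcdctl member list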

Hadoop HA Namenode goes down with the Error: flush failed for required journal (JournalAndStream(mgr=QJM to [< ip >:8485, < ip >:8485, < ip >:8485]))

The Hadoop NameNode goes down about once every day.
FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) -
Error: flush failed for required journal (JournalAndStream(mgr=QJM to [< ip >:8485, < ip >:8485, < ip >:8485], stream=QuorumOutputStream starting at txid <>))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at
Can someone suggest what are the things that I need to look into for resolving this issue?
I am using VMs for the journal nodes and master nodes. Does it cause any issue?
From the error you pasted, it appears your journal nodes could not talk to the NN in a timely manner. What was going on at the time of this event?
Since you mention that your nodes are VMs, I would guess you overloaded the hypervisor, or it had trouble talking from the NN to the JN and ZK quorum.
In my case, this issue was caused due to the difference in the system time between the nodes of the cluster.
To keep the system time in sync, we can execute the commands below in each node.
sudo service ntpd stop
sudo ntpdate pool.ntp.org # Run this command multiple times
sudo service ntpd start
If Hue is down, run the command below on the Hue server machine:
sudo service hue start
If the NameNode is down, start the NameNode.
Recurring fix
Add a crontab entry for the root user on all the nodes of the environment (see the example below),
or
install VM tools to keep the system time in sync.
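For the crontab option, a minimal sketch (the hourly schedule and the ntpdate path are assumptions; the server pool matches the commands above):
# root crontab entry: re-sync the clock every hour against pool.ntp.org
0 * * * * /usr/sbin/ntpdate -u pool.ntp.org >/dev/null 2>&1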

Percona Xtradb Cluster nodes won't start

I set up percona-xtradb-cluster-56 with three nodes in the cluster. To bootstrap the first node, I use the following command and it starts just fine:
#/etc/init.d/mysql bootstrap-pxc
The other two nodes, however, fail to start when I start them normally using the command:
#/etc/init.d/mysql start
The error I am getting is "The server quit without updating the PID file". The error log contains this message:
Error in my_thread_global_end(): 1 threads didn't exit 150605 22:10:29
mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended.
The cluster nodes are all running Ubuntu 14.04. When I use percona-xtradb-cluster-5.5, the cluster and all the nodes run just fine as expected. But I need to use version 5.6 because I am also using GTIDs, which are only available in version 5.6 and not supported in earlier versions.
I was following these two Percona documentation pages to set up the cluster:
https://www.percona.com/doc/percona-xtradb-cluster/5.6/installation.html#installation
https://www.percona.com/doc/percona-xtradb-cluster/5.6/howtos/ubuntu_howto.html
Any insight or suggestions on how to resolve this issue would be highly appreciated.
The problem is related to memory, as "The Georgia" writes. There should be at least 500 MB available for the default setup and bootstrapping. See here: http://sysadm.pp.ua/linux/px-cluster.html
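A quick way to check whether a node is short on memory before bootstrapping (a sketch; the error-log path is the usual Ubuntu default and may differ on your install):
# How much RAM and swap is actually free?
free -m
# Look for allocation failures around the failed startup attempt
tail -n 100 /var/log/mysql/error.log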

Hadoop JobClient: Error Reading task output

I'm trying to process 40GB of Wikipedia English articles on my cluster. The problem is the following repeating error message:
13/04/27 17:11:52 INFO mapred.JobClient: Task Id : attempt_201304271659_0003_m_000046_0, Status : FAILED
Too many fetch-failures
13/04/27 17:11:52 WARN mapred.JobClient: Error reading task outputhttp://ubuntu:50060/tasklog?plaintext=true&attemptid=attempt_201304271659_0003_m_000046_0&filter=stdout
When I run the same MapReduce program on a smaller part of the Wikipedia articles rather than the full set, it works just fine and I get all the desired results. Based on that, I figured maybe it's a memory issue. I cleared all the user logs (as suggested in a similar post) and tried again. No use.
I turned down replication to 1 and added a few more nodes. Still no use.
The cluster summary is as follows:
Configured Capacity: 205.76 GB
DFS Used: 40.39 GB
Non DFS Used: 44.66 GB
DFS Remaining: 120.7 GB
DFS Used%: 19.63%
DFS Remaining%: 58.66%
Live Nodes: 12
Dead Nodes: 0
Decomissioned Nodes: 0
Number of Under Replicated Blocks: 0
Each node runs on Ubuntu 12.04 LTS
Any help is appreciated.
EDIT
JobTracker Log: http://txtup.co/gtBaY
TaskTracker Log: http://txtup.co/wEZ5l
Fetch failures are often due to DNS problems. Check each datanode to be sure that the hostname and IP address it is configured with match what DNS resolves for that hostname.
You can do this by visiting each node in your cluster, running hostname and ifconfig, and noting the hostname and IP address returned. Let's say, for instance, this returns the following:
namenode.foo.com 10.1.1.100
datanode1.foo.com 10.1.1.1
datanode2.foo.com 10.1.1.2
datanode3.foo.com 10.1.1.3
Then, revisit each node and nslookup all the hostnames returned from the other nodes. Verify that the returned IP address matches the one found from ifconfig. For instance, when on datanode1.foo.com, you should do the following:
nslookup namenode.foo.com
nslookup datanode2.foo.com
nslookup datanode3.foo.com
and you should get back:
    10.1.1.100
    10.1.1.2
    10.1.1.3
When you ran your job on a subset of data, you probably didn't have enough splits to start a task on the datanode(s) that are misconfigured.
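If you want to script the same check on each node, something like this works (the hostnames are the illustrative ones above):
# Print what DNS returns for each cluster hostname, then what this host reports for itself
for h in namenode.foo.com datanode1.foo.com datanode2.foo.com datanode3.foo.com; do
  printf '%s -> %s\n' "$h" "$(getent hosts "$h" | awk '{print $1}')"
done
hostname && hostname -i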
I had a similar problem and was able to find a solution. The problem lies in how Hadoop deals with small files. In my case, I had about 150 text files that added up to 10 MB. Because of how the files are "divided" into blocks, the system runs out of memory pretty quickly. So to solve this you have to "fill" the blocks and arrange your new files so that they are spread nicely across blocks. Hadoop lets you "archive" small files so that they are correctly allocated into blocks:
hadoop archive -archiveName files.har -p /user/hadoop/data /user/hadoop/archive
In this case I created an archive called files.har from the /user/hadoop/data folder and stored it in the folder /user/hadoop/archive. After doing this, I rebalanced the cluster allocation using start-balancer.sh.
Now when I run the wordcount example against the files.har, everything works perfectly.
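For reference, that run looks roughly like this (the examples jar name and output path are assumptions for your Hadoop version; the har:// scheme lets MapReduce read the archive contents directly):
hadoop jar hadoop-examples-*.jar wordcount \
    har:///user/hadoop/archive/files.har /user/hadoop/wordcount-output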
Hope this helps.
Best,
Enrique
I had exactly the same problem with Hadoop 1.2.1 on an 8-node cluster. The problem was in the /etc/hosts file. I removed all entries containing "127.0.0.1 localhost". Instead of "127.0.0.1 localhost" you should map your IP address to your hostname (e.g. "10.15.3.35 myhost"). Note that you should do that for all nodes in the cluster. So, in a two-node cluster, the master's /etc/hosts should contain "10.15.3.36 masters_hostname" and the slave's /etc/hosts should contain "10.15.3.37 slave1_hostname". After these changes, it would be good to restart the cluster.
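For the two-node case described, the resulting /etc/hosts entries would look like this on both machines (the IPs and hostnames are the illustrative ones from the answer):
# /etc/hosts on master and slave: map the real IPs to the hostnames
# instead of pointing the hostnames at 127.0.0.1
10.15.3.36   masters_hostname
10.15.3.37   slave1_hostname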
Also have a look here for some basic Hadoop troubleshooting: Hadoop Troubleshooting
