I am looking for some help in figuring out how to clear what looks like a corruption in Zookeeper. Our setup was running fine with Solr Cloud. At some point the root partition on one of the cluster nodes became full and the system went down. After we brought it back up, Solr was not responding and could not start.
It looks like there is corruption in the ZooKeeper data. Any time a client tries to access the node /overseer/queue, the connection is killed with an error:
..."KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /overseer/queue"
Opening the zk client shell lets us list other nodes, but any attempt to delete/list/clear/etc. the /overseer node fails with this error.
Are there any manual steps that could be done to clear this out? Any help would be appreciated.
Edit: Also it looks like there are over 200k child nodes under the /overseer/queue node. Maybe this has something to do with it, but we can't delete the child nodes if we can't even list them out.
The ZooKeeper database isn't corrupt. ZooKeeper has a limit on the maximum response size, and listing the 200k children of a znode exceeds that limit.
To work around this, you can set jute.maxbuffer to a large value so that you can list and delete the nodes under /overseer/queue. You need to apply this setting on all the ZooKeeper servers as well as on the client you are using for the cleanup.
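For example, here is a minimal sketch of how that can be done, assuming a standard ZooKeeper install where zkServer.sh and zkCli.sh pick up SERVER_JVMFLAGS and CLIENT_JVMFLAGS (the 10 MB value and paths are placeholders):

# On every ZooKeeper server, e.g. in conf/java.env, then restart the ensemble:
SERVER_JVMFLAGS="-Djute.maxbuffer=10000000"

# Start the CLI used for the cleanup with the same property:
CLIENT_JVMFLAGS="-Djute.maxbuffer=10000000" bin/zkCli.sh -server localhost:2181

# Inside the shell, recursively delete the queue (older releases use rmr, newer ones deleteall):
rmr /overseer/queue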
There is an open bug to fix this: ZOOKEEPER-1162.
I have set up a cluster of three Elasticsearch nodes, all master-eligible, with a minimum of two required. I have configured a client to bulk upload using the low-level client with a static connection pool, using the code below.
What I am trying to test is live failover scenarios, i.e. start the client with three nodes available and then randomly drop one (by shutting down its VM) while keeping two up. However, I am not seeing the behaviour I would expect: the client keeps trying the dead node, and it seems to take up to about sixty seconds before it moves on to the next node.
What I would expect is for it to take the failed attempt, mark that node as potentially dead, and at least move on to the next node. What is odd is that this is exactly the behaviour I get if I start my application with only two of the three nodes available in my list, or if I just stop the Elasticsearch service during a test rather than powering the machine down.
Is there a correct way to deal with such a case and get the client to move to the next available node as quickly as possible? Or do I need to back off in my code for up to sixty seconds before attempting to publish again?
var nodes = new[]
{
new Node(new Uri("http://172.16.2.10:9200")),
new Node(new Uri("http://172.16.2.11:9200")),
new Node(new Uri("http://172.16.2.12:9200"))
};
var connectionPool = new StaticConnectionPool(nodes);
var settings = new ConnectionConfiguration(connectionPool)
.PingTimeout(TimeSpan.FromSeconds(10))
.RequestTimeout(TimeSpan.FromSeconds(20))
.ThrowExceptions()
.MaximumRetries(3);
_lowLevelClient = new ElasticLowLevelClient(settings);
I then wrap the following in a try/catch, where I retry a maximum of three times before I consider it a failed attempt and fall back to an error strategy.
ElasticsearchResponse<Stream> indexResponse = _lowLevelClient.Bulk<Stream>(data);
Any input is appreciated,
Thank you.
The tests for the client include tests for failover scenarios, and the API conventions documentation is generated from them. Specifically, take a look at the retry and failover documentation.
With a StaticConnectionPool, the set of nodes to which requests can be made is static and is never refreshed to reflect nodes joining or leaving the cluster. However, a node is marked dead when it returns a bad response, and it is taken out of rotation for a configurable dead time, controlled by DeadTimeout and MaxDeadTimeout on the connection settings.
The audit trail on the response provides a timeline of what happened for a given request, which is easiest to see with response.DebugInformation. The Virtual Clustering test harness (an example) that is part of the Tests project may help you ascertain the correct settings for the behaviour you're after.
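For example, here is a sketch of the settings you could add to the configuration from the question to get dead nodes skipped sooner (the timeout values are placeholders to illustrate the knobs, not recommendations):

// Reuses the `nodes` array from the question.
var connectionPool = new StaticConnectionPool(nodes);
var settings = new ConnectionConfiguration(connectionPool)
    .PingTimeout(TimeSpan.FromSeconds(2))       // give up pinging an unreachable node quickly
    .RequestTimeout(TimeSpan.FromSeconds(20))
    .DeadTimeout(TimeSpan.FromSeconds(30))      // how long a failed node stays out of rotation
    .MaxDeadTimeout(TimeSpan.FromMinutes(2))    // cap on the dead time as it backs off
    .MaximumRetries(3)
    .ThrowExceptions();

_lowLevelClient = new ElasticLowLevelClient(settings);

The difference you are seeing between stopping the service (the OS refuses the connection, so the failure is immediate) and powering off the VM (packets are silently dropped, so the attempt has to time out) is exactly what shorter ping/request timeouts mitigate.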
We have a two-node Aerospike cluster and decided to add two more nodes. As soon as I added them, we started getting "queue too deep" errors on the new nodes and "device overload" errors on the client.
I tried lowering migrate-max-num-incoming from 256 to 4, but the issue persists.
What is the best way to add a new node to the cluster without impacting the clients?
More info:
1) We are using an SSD-based installation
2) We are using the mesh network architecture
Your storage is not keeping up.
The following links should help:
1- Understand device overload:
https://discuss.aerospike.com/t/device-overload/733
2- Understand how to tune migrations:
http://www.aerospike.com/docs/operations/manage/migration#lowering-the-migration-rate
3- This could also be caused by defragmentation on the pre-existing nodes in the cluster: data migrating out creates a vacuum effect that can make defragmentation activity pick up, in which case you would want to slow defragmentation down by increasing defrag-sleep (see the example commands after this list):
http://www.aerospike.com/docs/reference/configuration#defrag-sleep
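For example, both migration and defragmentation can be throttled dynamically with asinfo; a rough sketch (the namespace name "myns" and the values are placeholders, and the exact parameter names can vary between Aerospike versions):

# Throttle how many migrations a node will accept at once (service context), run on each node:
asinfo -v 'set-config:context=service;migrate-max-num-incoming=2'

# Slow down defragmentation by sleeping longer between defragmented blocks (namespace context):
asinfo -v 'set-config:context=namespace;id=myns;defrag-sleep=5000'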
Add one node at a time. Wait until migrations are complete before adding the next node. (I assume all nodes are running the same version of Aerospike and the configuration is consistent, e.g. all have the same namespaces defined.)
I use Ganglia to monitor Hadoop. I chose the metric "dfs.datanode.HeartbeatsAvgTime" to judge whether the DataNode (I mean the DataNode service, not the host) is down or not.
When the DataNode is working fine, "dfs.datanode.HeartbeatsAvgTime" keeps changing; that is, the value in the graph varies.
But after I stop the DataNode service, the value in the graph stops changing. It remains constant, yet it is not 0 or infinity, so I cannot tell from it whether the DataNode service is up or down.
It is the same with other metrics.
I've checked the RRD files that Ganglia uses to store the metric data, using "rrdtool fetch". The values for the metric are stored in a *.rrd file. When I check the file, I find that after I stop the DataNode, the metric value is still being updated, but it does not vary.
I read the RRD references on the official RRDtool website. They say that if RRD does not receive an update within the configured interval, it writes UNKNOWN into the *.rrd file.
I think there may be two possible causes of the problem:
1) When gmetad does not receive a metric, it updates the RRD with the old value, so the graph stays at the old value.
2) When gmond cannot collect a metric, it reports the old value to gmetad.
But I haven't found any evidence for either in the Ganglia source code on GitHub.
So, do you know how to solve the problem of the value in the graph remaining unchanged? Or do you know other details about how to monitor a Hadoop cluster with Ganglia?
@DaveStephens @Lorin Hochstein
After struggling with the problem, I found that if we set dmax for the metric in hadoop-metrics2.properties, then when Hadoop goes down Ganglia no longer receives any data and the metric becomes UNKNOWN. The graph on the Ganglia website disappears, and when Ganglia is combined with Nagios, Nagios also returns an UNKNOWN status. That is enough to judge whether Hadoop is up or down.
dmax means that if a metric is not updated within dmax seconds, the metric is expired (destroyed).
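For example, a sketch of the relevant part of hadoop-metrics2.properties for the DataNode, assuming the standard GangliaSink31 setup (the gmond host, period, and 120-second dmax are placeholders, and the metric key must match the name Ganglia reports):

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
# Expire the metric if it is not updated for 120 seconds, so a stopped
# DataNode shows up as UNKNOWN instead of a frozen value:
datanode.sink.ganglia.dmax=dfs.datanode.HeartbeatsAvgTime=120
datanode.sink.ganglia.servers=gmond-host:8649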
After going through the H2 developer guide, I still don't understand how I can find out which cluster node(s) failed and which database needs to be recovered in the event of a temporary network failure.
Let's consider the following scenario:
The H2 cluster is started with N active nodes (is it actually true that H2 can support N > 2, i.e. more than 2 cluster nodes?)
(lots of DB updates, reads...)
The network connection to one (or several) cluster nodes goes down and the node becomes invisible to the rest of the cluster
(lots of DB updates, reads...)
The network link to the previously disconnected node(s) is restored
It is discovered that a cluster node was probably missing (as far as I can see, SELECT VALUE FROM INFORMATION_SCHEMA.SETTINGS WHERE NAME='CLUSTER' starts returning an empty string if one node in the cluster fails)
After this point it is unclear how to find out which nodes failed.
Obviously, I can do some basic checks like comparing DB sizes, but that is unreliable.
What is the recommended procedure to find out which node was missing from the cluster, especially if the query above returns an empty string?
Another question: why doesn't urlTarget support multiple parameters?
How am I supposed to use the CreateCluster tool if multiple nodes in the cluster failed and I want to recover more than one?
Also, I don't understand how CreateCluster works if I had to stop the cluster and I don't actually want to recover any nodes. What's not clear to me is what I need to pass to the CreateCluster tool if I don't actually need to copy the database.
That is partially right: SELECT VALUE FROM INFORMATION_SCHEMA.SETTINGS WHERE NAME='CLUSTER' will return an empty string when queried in standard (non-clustered) mode.
However, you can also get the list of servers by using Connection.getClientInfo(), but it is a two-step process. Paraphrased from h2database.com:
The list of properties returned by getClientInfo() includes a numServers property that returns the number of servers that are in the connection list. getClientInfo() also has properties server0..serverN, where N is the number of servers - 1. So to get the 2nd server from the list you use getClientInfo('server1').
Note: the serverX properties only return IP addresses and ports, not hostnames.
And before you say "simple replication": yes, that is the default mode of operation, but you can do more advanced things with clustered H2 that are outside the scope of your question.
Here's the quote for what you're talking about:
Clustering can only be used in the server mode (the embedded mode does not support clustering). The cluster can be re-created using the CreateCluster tool without stopping the remaining server. Applications that are still connected are automatically disconnected, however when appending ;AUTO_RECONNECT=TRUE, they will recover from that.
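For reference, the client connection URL for a two-node cluster with auto-reconnect enabled looks something like this (hosts, ports, and database path are placeholders):

jdbc:h2:tcp://server1:9101,server2:9102/~/test;AUTO_RECONNECT=TRUE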
So yes: if the cluster stops, AUTO_RECONNECT is not enabled, and you stick with the basic query, you are stuck and it is difficult to find information. While most people will tell you to look through the API and/or the manual, they haven't had to look through this one, so you have my sympathies.
I find it far more useful to track through the error codes, because you get a really good idea of what you can do when you see how each failure is planned for ... here you go.
Let's say I have 3 nodes, 1 of which is the master.
I have an API (running on another machine) which hits the master and gets my search result. This is through a subdomain, say s1.mydomain.com:9200 (assume the others are pointed to by s2.mydomain.com and s3.mydomain.com).
Now my master fails for whatever reason. How would my API recover from such a situation? Can I hit either s2 or s3 instead? How can I figure out what the new master is? Is there a predictable way to know which one will be picked as the new master should the current master go down?
I've googled this and found plenty of information about how, when a master goes down, a failover node is elected as the new master, but I haven't seen anything that clarifies how I would need to handle this from the outside looking in.
The master in Elasticsearch is really only for internal cluster coordination. There are no actions required when a node goes down, other than trying to get it back up to restore full cluster performance.
You can read from and write to any of the remaining nodes, and data replication will keep going. When the old master node comes back up, it will rejoin the cluster once it has received the updated data. In fact, you never need to worry about whether the node you are writing to is the master node.
There are some advanced configuration options to alter these behaviours, but Elasticsearch ships with suitable defaults.
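If your API talks to Elasticsearch through a client that supports a connection pool (the .NET low-level client from the earlier question is used here purely as an illustration), a sketch of the idea is simply to hand it all three nodes instead of just s1, so requests are spread across them and a downed node, master or not, is skipped:

// Point the client at every node, not just the current master.
var pool = new StaticConnectionPool(new[]
{
    new Node(new Uri("http://s1.mydomain.com:9200")),
    new Node(new Uri("http://s2.mydomain.com:9200")),
    new Node(new Uri("http://s3.mydomain.com:9200"))
});

var settings = new ConnectionConfiguration(pool)
    .RequestTimeout(TimeSpan.FromSeconds(20))
    .MaximumRetries(2);   // a failed request is retried on the remaining nodes

var client = new ElasticLowLevelClient(settings);

Alternatively, putting a load balancer or a round-robin DNS entry in front of s1/s2/s3 achieves the same effect for clients that only accept a single URL.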