JBossCache evict() versus removeNode() - caching

Reading the JBoss Cache documentation, I see there are different policies for eviction:
JBoss Cache also ships with RemoveOnEvictActionPolicy, which calls Cache.removeNode() for each node that needs to be evicted, instead of Cache.evict().
I've checked the documentation and the API but can't figure out the difference between the two.
Does anyone know what the difference is?

Looking at RemoveNodeCommand and EvictCommand...
removeNode() removes the node and the node's children, if it has any.
evict() removes the data from a node but does not remove its children; only if the node is a leaf does it remove the node itself as well.
(version 3.1.0.GA)
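To make that concrete, here is a minimal sketch against the 3.x API (the Fqns and values are made up for illustration):

import org.jboss.cache.Cache;
import org.jboss.cache.DefaultCacheFactory;
import org.jboss.cache.Fqn;

public class EvictVsRemove {
    public static void main(String[] args) {
        Cache<String, String> cache = new DefaultCacheFactory<String, String>().createCache();
        cache.put(Fqn.fromString("/a/b"), "k", "v");
        cache.put(Fqn.fromString("/a/b/c"), "k", "v");

        // evict() clears /a/b's data; /a/b itself survives because it has a
        // child (/a/b/c), and the child's data is untouched.
        cache.evict(Fqn.fromString("/a/b"));

        // removeNode() removes /a/b and, with it, the child /a/b/c.
        cache.removeNode(Fqn.fromString("/a/b"));

        cache.stop();
    }
}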

Is there any way to check the relocation progress of shards in elasticsearch?

Yesterday I added a node to a production Elasticsearch cluster. Once I had added it, I could use the /_cat/health API to check the number of relocating shards, and the /_cat/shards API to check which shards are being relocated. However, is there any way or API to check the live progress of shard/data movement to the newly added node? Suppose there is a 13 GB shard and we've added a node to the ES cluster: can we check what percentage (or how many GBs, MBs or KBs) has moved so far, so that we can estimate how much time the relocation will take?
Can this be implemented on our own, or should we suggest it to Elasticsearch? If it can be implemented on our own, how should I proceed, and what prerequisites do I need to know?
You have:
GET _cat/recovery?active_only=true&v
GET _cat/recovery?active_only=true&h=index,shard,source_node,target_node,bytes_percent,time
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-recovery.html
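If you want to read that endpoint programmatically, a rough sketch in Java (plain HTTP; the host name and port are placeholders for one of your nodes) could look like:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RecoveryProgress {
    public static void main(String[] args) throws Exception {
        // One row per actively recovering shard copy, including bytes_percent.
        URL url = new URL("http://localhost:9200/_cat/recovery"
                + "?active_only=true&h=index,shard,source_node,target_node,bytes_percent,time");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}

Running it in a loop and watching the bytes_percent column gives you the rough estimate the question asks about.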
Take a look at the Task Management API:
https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html
The task management API returns information about tasks currently executing on one or more nodes in the cluster.
GET /_tasks
You can also see the reasons for the allocation using the allocation explain API:
https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-allocation-explain.html
GET _cluster/allocation/explain

Adding new node to aerospike cluster

We have a two-node Aerospike cluster and thought of adding two more nodes. As soon as I added them, we started getting "queue too deep" errors on the new nodes, as well as "device overload" errors on the client.
I tried lowering migrate-max-num-incoming from 256 to 4, but the issue persists.
What is the best way to add a new node to the cluster without impacting the clients?
More info:
1) We are using an SSD-based installation
2) We are using the mesh heartbeat architecture
Your storage is not keeping up.
The following links should help:
1- Understand device overload:
https://discuss.aerospike.com/t/device-overload/733
2- Understand how to tune migrations:
http://www.aerospike.com/docs/operations/manage/migration#lowering-the-migration-rate
3- This could also be caused by defragmentation on the pre-existing nodes in the cluster: data migrating out creates a vacuum effect that can cause defragmentation activity to pick up. In that case you would want to slow down defragmentation by increasing defrag-sleep:
http://www.aerospike.com/docs/reference/configuration#defrag-sleep
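If you want to experiment with the settings from points 2 and 3 at runtime rather than editing aerospike.conf, here is a sketch using the Aerospike Java client's info API (the host, port, and chosen value are assumptions, and whether a given parameter is dynamic depends on your server version):

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Info;
import com.aerospike.client.cluster.Node;

public class TuneMigrations {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        for (Node node : client.getNodes()) {
            // Dynamically cap concurrent incoming migrations on every node
            // (the same knob the question tuned in the config file).
            String reply = Info.request(node,
                    "set-config:context=service;migrate-max-num-incoming=4");
            System.out.println(node.getName() + ": " + reply);
        }
        client.close();
    }
}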
Add one node at a time, and wait until migrations are complete before adding the second node. (I assume all nodes are running the same version of Aerospike and their configuration is consistent, e.g. all have the same namespaces defined.)

Detecting and recovering failed H2 cluster nodes

After going through the H2 developer guide, I still don't understand how I can find out which cluster node(s) failed and which database needs to be recovered in the event of a temporary network failure.
Let's consider the following scenario:
An H2 cluster is started with N active nodes (is it actually true that H2 can support N > 2, i.e. more than two cluster nodes?)
(lots of DB updates, reads...)
The network connection to one (or several) cluster nodes goes down and the node(s) become invisible to the rest of the cluster
(lots of DB updates, reads...)
The network link to the previously disconnected node(s) is restored
It is discovered that a cluster node was probably missing (as far as I can see, SELECT VALUE FROM INFORMATION_SCHEMA.SETTINGS WHERE NAME='CLUSTER' starts responding with an empty string if one node in the cluster fails)
At this point it is unclear how to find out which nodes were failing.
Obviously, I could do some basic check like comparing DB sizes, but that is unreliable.
What is the recommended procedure for finding out which node was missing from the cluster, especially if the query above responds with an empty string?
Another question: why doesn't urlTarget support multiple parameters?
How am I supposed to use the CreateCluster tool if multiple nodes in the cluster failed and I want to recover more than one?
Also, I don't understand how CreateCluster works if I had to stop the cluster and don't actually want to recover any nodes. What's not clear to me is what I need to pass to the CreateCluster tool if I don't actually need to copy a database.
That is partially right: SELECT VALUE FROM INFORMATION_SCHEMA.SETTINGS WHERE NAME='CLUSTER' will return an empty string when queried in standard (non-clustered) mode.
However, you can get the list of servers by using Connection.getClientInfo() as well, but it is a two-step process. Paraphrased from h2database.com:
The list of properties returned by getClientInfo() includes a numServers property that returns the number of servers that are in the connection list. getClientInfo() also has properties server0..serverN, where N is the number of servers - 1. So to get the 2nd server from the list you use getClientInfo('server1').
Note: The serverX properties only return IP addresses and ports, not hostnames.
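In code, that two-step process looks roughly like this (JDBC; the cluster URL and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

public class ClusterServers {
    public static void main(String[] args) throws Exception {
        // Placeholder clustered URL: list every server, comma-separated.
        Connection conn = DriverManager.getConnection(
                "jdbc:h2:tcp://server1:9101,server2:9101/~/test", "sa", "");
        // Step 1: how many servers are in the connection list?
        int numServers = Integer.parseInt(conn.getClientInfo("numServers"));
        // Step 2: read each serverX property (an IP address and port, not a hostname).
        for (int i = 0; i < numServers; i++) {
            System.out.println(conn.getClientInfo("server" + i));
        }
        conn.close();
    }
}

Comparing that list against the full set of servers you started with tells you which node(s) dropped out.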
And before you say "simple replication": yes, that is the default mode of operation, but clustered H2 lets you do more advanced things that are outside the scope of your question.
Here's the quote for what you're talking about:
Clustering can only be used in the server mode (the embedded mode does not support clustering). The cluster can be re-created using the CreateCluster tool without stopping the remaining server. Applications that are still connected are automatically disconnected, however when appending ;AUTO_RECONNECT=TRUE, they will recover from that.
So yes: if the cluster stops, AUTO_RECONNECT is not enabled, and you stick with the basic query, you are stuck, and it is difficult to find information. While most people will tell you to look through the API and/or the manual, they haven't had to look through this one, so you have my sympathies.
I find it far more useful to track through the error codes, because you get a really good idea of what you can do when you see how failures are planned for... here you go.

etcd: change resilient recursive wait

etcd allows clients to safely wait for changes to individual k/v nodes by supplying the last known index of a node to the wait command. etcd also allows clients to wait ("recursively") for any changes to child nodes under a certain parent node.
Now, the problem is: is it possible to wait recursively on a parent node in such a way as to guarantee that no child node changes are ever missed by the client? The parent node's index is of no use here, as it does not change when a child node is modified.
If you're just starting up, presumably you have just retrieved the subtree you're watching. The reply has an etcd_index field. Use that as the starting point.
Otherwise, your wait contains the modification index of the change. Use that as a starting point for the next call.
You may have to increase one or two of these values by one to ensure that you don't get duplicate replies. I don't remember off the top of my head which of them needs the increment; code like this needs tests which ensure that you get every change exactly once, and I adjust the values based on that.
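A minimal sketch of that loop against the etcd v2 HTTP API (the endpoint, the /parent key, and the regex-based JSON scraping are all simplifications):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecursiveWatch {
    private static final Pattern MODIFIED = Pattern.compile("\"modifiedIndex\":(\\d+)");

    public static void main(String[] args) throws Exception {
        // In practice, seed this from the etcd_index (X-Etcd-Index header) of
        // your initial GET of the subtree, plus one.
        long waitIndex = 1;
        while (true) {
            // Long-poll for the next change anywhere under /parent.
            URL url = new URL("http://127.0.0.1:2379/v2/keys/parent"
                    + "?wait=true&recursive=true&waitIndex=" + waitIndex);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String reply = in.readLine();  // one JSON object per change event
                if (reply == null) {
                    continue;  // connection closed without an event; retry
                }
                System.out.println(reply);
                Matcher m = MODIFIED.matcher(reply);
                long modifiedIndex = waitIndex;
                while (m.find()) {
                    modifiedIndex = Math.max(modifiedIndex, Long.parseLong(m.group(1)));
                }
                // Wait for strictly newer changes so the same event isn't delivered twice.
                waitIndex = modifiedIndex + 1;
            }
        }
    }
}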

How do I figure out the new master node when my master node fails in ElasticSearch?

Let's say I have 3 nodes, 1 of which is the master.
I have an API (running on another machine) which hits the master and gets my search result. This is through a subdomain, say s1.mydomain.com:9200 (assume the others are pointed to by s2.mydomain.com and s3.mydomain.com).
Now my master fails for whatever reason. How would my API recover from such a situation? Can I hit either s2 or s3 instead? How can I figure out what the new master is? Is there a predictable way to know which node would be picked as the new master should the current master go down?
I've googled this, and it has given me enough information about how, when a master goes down, a failover node is elected as the new master, but I haven't seen anything clarifying how I should handle this from the outside looking in.
The master in ElasticSearch is really only for internal coordination. There are no actions required when a node goes down, other than trying to get it back up to get your full cluster performance back.
You can read/write to any of the remaining nodes and the data replication will keep going. When the old master node comes back up, it will re-join the cluster once it has received the updated data. In fact, you never need to worry if the node you are writing on is the master node.
There are some advanced configurations to alter these behaviors, but ElasticSearch comes with suitable defaults.
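For the API layer, the usual approach is therefore to give your client all three nodes and let it round-robin and fail over among them, rather than pinning it to whichever node happens to be master. A sketch using the Elasticsearch low-level Java REST client (your host names; any HTTP client with a host list would do):

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class AnyNodeClient {
    public static void main(String[] args) throws Exception {
        // Requests are spread across all three nodes and unreachable nodes are
        // skipped, so it doesn't matter which one is currently the master.
        RestClient client = RestClient.builder(
                new HttpHost("s1.mydomain.com", 9200, "http"),
                new HttpHost("s2.mydomain.com", 9200, "http"),
                new HttpHost("s3.mydomain.com", 9200, "http")).build();
        Response response = client.performRequest(new Request("GET", "/_cluster/health"));
        System.out.println(response.getStatusLine());
        client.close();
    }
}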
