Elasticsearch DNS Host Setup in a cluster

I have been asked to take care of an ES cluster with 14 machines.
For example:
The hostnames of my machines run from es1.abc.com through es14.abc.com.
These are all part of a cluster called "es-test", configured in elasticsearch.yml.
Now suppose the DNS A record for this cluster is es.abc.com.
The current DNS resolves es.abc.com to 4 machines, es1.abc.com to es4.abc.com, in a round-robin setup, so all HTTP requests hit these 4 machines first.
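A quick way to see what the record currently hands out (this is only a sanity check; es.abc.com and port 9200 here are just the values from this setup):

    import socket

    # Collect the distinct addresses behind the round-robin record.
    addrs = {info[4][0] for info in
             socket.getaddrinfo("es.abc.com", 9200, proto=socket.IPPROTO_TCP)}
    print(sorted(addrs))   # with the current record this lists the 4 SSD machines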
The only difference I see is that these 4 machines have SSDs while the other 10 have hard disks.
I have no idea why the original engineer set it up like this.
In the ES documentation I could not find any particular guidance on DNS resolution.
I do not see a reason why I cannot set up DNS resolution to all 14 machines.
I am not an expert in ES. If I have missed something obvious, please point me to the right documentation.
Your help would be truly appreciated

Related

MarkLogic latency: Document not found

I am working on a clustered MarkLogic environment where we have 10 nodes. All nodes are shared E&D (evaluator and data) nodes.
Problem that we are facing:
When a page is written to MarkLogic, it takes some time (up to 3 seconds) for all the nodes in the cluster to get updated, and if I do a read during this window to fetch the previously written page, it is not found.
Has anyone experienced this latency issue and looked at eliminating it? If so, please let me know.
Thanks
It's normal for a new document to only appear after the database transaction commits. But it is not normal for a commit to take 3 seconds.
Which version of MarkLogic Server?
Which OS and version?
Can you describe the hardware configuration?
How large are these documents? All other things equal, update time should be proportional to document size.
Can you reproduce this with a standalone host? That should eliminate cluster-related network latency from the transaction, which might tell you something. Possibly your cluster network has problems, or possibly one or more of the hosts has problems.
If you can reproduce the problem with a standalone host, use system monitoring to see what that host is doing at the time. On Linux I favor something like iostat -Mxz 5 and top, but other tools can also help. The problem could be disk I/O, though it would have to be really slow to result in 3-second commits. Or it might be that your servers are low on RAM, so they are paging during the commit phase.
If you can't reproduce it with a standalone host, then I think you'll have to run similar system monitoring on all the hosts in the cluster. That's harder, but for 10 hosts it is just barely manageable.
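If it helps to put numbers on it, a small probe along these lines can show how long a freshly written document takes to become readable. This is only a sketch: it assumes a MarkLogic REST API app server on port 8000 with digest auth, and the host, credentials and document URI are placeholders.

    import time
    import requests
    from requests.auth import HTTPDigestAuth

    BASE = "http://localhost:8000/v1/documents"   # placeholder REST API endpoint
    AUTH = HTTPDigestAuth("admin", "admin")       # placeholder credentials
    URI = "/latency-test/doc1.xml"

    # Write a small document and note when the PUT returns.
    r = requests.put(BASE, params={"uri": URI}, data="<doc>hello</doc>",
                     headers={"Content-Type": "application/xml"}, auth=AUTH)
    r.raise_for_status()
    written = time.time()

    # Poll until a read finds the document, then report the gap.
    while requests.get(BASE, params={"uri": URI}, auth=AUTH).status_code != 200:
        time.sleep(0.05)
    print("document visible %.3f s after the write returned" % (time.time() - written))

Running it against a standalone host and then against the cluster should show whether the delay is cluster-related.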

Node thinks that it is online when its network cable is unplugged. Pacemaker/Corosync

I am trying to cluster 2 computers together with Pacemaker/Corosync. The only resource that they share is an ocf:heartbeat:IPaddr. This is the main problem:
Since there are only two nodes failover will only occur if the no-quorum-policy=ignore.
When the network cable is pulled from node A, corosync on node A binds to 127.0.0.1 and pacemaker believes that node A is still online and the node B is the one offline.
Pacemaker attempts to start the IPaddr on node A, but it fails to start because there is no network connection. Node B, on the other hand, recognizes that node A is offline, and if the IPaddr service was started on node A it will start it on itself (node B) successfully.
However, since the service failed to start on node A, it enters a fatal state and has to be rebooted to rejoin the cluster. (You could restart some of the needed services instead.)
One workaround is to set start-failure-is-fatal="false", which makes node A keep trying to start the IPaddr service until it succeeds. The problem with this is that once it succeeds you have an IP conflict between the two nodes until they re-cluster and one of them gives up the resource.
I am playing around with the idea of having a node attribute that mirrors cat /sys/class/net/eth0/carrier, which is 1 when the cable is connected and 0 when it is disconnected, and then having a location rule that says "if connected == 0, don't start the service", roughly as sketched below, but we'll see.
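Purely as a sketch of that idea (the interface and attribute name are just what I would pick, and the crm_attribute options may vary by Pacemaker version):

    import socket
    import subprocess
    import time

    IFACE = "eth0"
    ATTR = "connected"

    def carrier(iface):
        # /sys/class/net/<iface>/carrier is "1" when the link is up, "0" when down.
        try:
            with open("/sys/class/net/%s/carrier" % iface) as f:
                return f.read().strip()
        except IOError:
            return "0"   # treat a missing or downed interface as disconnected

    last = None
    while True:
        state = carrier(IFACE)
        if state != last:   # only touch the CIB when the link state changes
            subprocess.call(["crm_attribute", "--node", socket.gethostname(),
                             "--name", ATTR, "--update", state])
            last = state
        time.sleep(2)

A location constraint could then give -INFINITY to any node where connected is 0.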
Any thoughts or ideas would be greatly appreciated.
After speaking with Andrew Beekhof (author of Pacemaker) and Digimer on the freenode.net #linux-cluster IRC network, I have learned that the actual cause behind this issue is due to the cluster being improperly fenced.
Fencing or having stonith enabled is absolutely essential to having a successful High Availability Cluster. The following page is a must read on the subject:
Cluster Tutorial: Concept - Fencing
Many thanks to Digimer for providing this invaluable resource. The section on clustering answers this question, however the entire article is beneficial.
Basically, fencing and STONITH (Shoot The Other Node In The Head) are mechanisms that a cluster uses to make sure that a down node is actually dead. It needs to do this to avoid shared memory corruption, split-brain status (multiple nodes taking over shared resources), and, most of all, to make sure that your cluster does not get stuck in recovery or crash.
If you don't have stonith/fencing configured and enabled in your cluster environment you really need it.
Other issues to look out for are Stonith Deathmatch, and Fencing Loops.
In short, the issue of loss of network connectivity causing split brain was solved by creating our own STONITH device and writing a stonith agent following the /usr/share/doc/cluster-glue/stonith/README.external tutorial, and then writing a startup script that checks whether the node is able to support joining the cluster; if so it starts corosync, otherwise it waits 5 minutes and checks again.
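A sketch of what such a startup script could look like (not our actual script; the peer address and service command are placeholders):

    import subprocess
    import time

    PEER = "192.168.1.101"   # the other node's heartbeat address (placeholder)

    def can_join():
        # One ping with a 2-second timeout; exit code 0 means the peer answered.
        return subprocess.call(["ping", "-c", "1", "-W", "2", PEER]) == 0

    # Wait until the node can actually see its peer, checking every 5 minutes,
    # and only then start corosync.
    while not can_join():
        time.sleep(300)
    subprocess.call(["service", "corosync", "start"])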
According to your configuration, the heartbeat between the two nodes will use 127.0.0.1, which I think is wrong.
Usually corosync needs to bind to private IPs, and the IPaddr resource should use a different IP, called the traffic IP.
For example:
Node A: 192.168.1.100 (for heartbeat); 10.0.0.1 (traffic IP)
Node B: 192.168.1.101 (for heartbeat); 10.0.0.2 (traffic IP)
If my understanding is correct, the IPaddr service will bring up a virtual IP based on the traffic IPs; let's assume it is 10.0.0.3.

Haproxy Load Balancer, EC2, writing my own availability script

I've been looking at high availability solutions such as Heartbeat and keepalived to fail over when an HAProxy load balancer goes down. I realised that although we would like high availability, it's not really enough of a requirement at this point to justify the expense of running 2 load balancer instances at all times just to get instant failover (particularly as one LB would sit redundant in our setup).
My alternative solution is to fire up a new load balancer EC2 instance from an AMI if the current load balancer has stopped working, and associate it with the Elastic IP that our domain name points to. This should ensure that downtime is limited to the time it takes to fire up the new instance and associate the Elastic IP, which given our current circumstances seems like a reasonably cost-effective approach to high availability, particularly as we can easily do it across multiple availability zones. I am looking to do this using the following steps:
1. Prepare an AMI of the load balancer
2. Fire up a single EC2 instance acting as the load balancer and assign the Elastic IP to it
3. Have a micro server ping the current load balancer at regular intervals (we always have an extra micro server running anyway)
4. If the ping times out, fire up a new EC2 instance using the load balancer AMI
5. Associate the Elastic IP to the new instance
6. Shut down the old load balancer instance
7. Repeat step 3 onwards with the new instance
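Roughly what this could look like for steps 3 to 6, purely as a sketch (boto3 is used only for illustration; the region, AMI ID, Elastic IP, instance ID and health-check URL are placeholders, and a real script would need error handling):

    import time
    import boto3
    import requests

    ec2 = boto3.client("ec2", region_name="us-east-1")   # placeholder region

    HEALTH_URL = "http://lb.example.com/"        # placeholder health-check URL
    AMI_ID = "ami-12345678"                      # placeholder load balancer AMI
    ELASTIC_IP = "203.0.113.10"                  # placeholder Elastic IP
    current_id = "i-0123456789abcdef0"           # placeholder current LB instance

    def healthy():
        try:
            return requests.get(HEALTH_URL, timeout=5).status_code == 200
        except requests.RequestException:
            return False

    while True:
        if not healthy():
            # Step 4: fire up a replacement instance from the AMI.
            new = ec2.run_instances(ImageId=AMI_ID, InstanceType="t2.micro",
                                    MinCount=1, MaxCount=1)
            new_id = new["Instances"][0]["InstanceId"]
            ec2.get_waiter("instance_running").wait(InstanceIds=[new_id])

            # Step 5: move the Elastic IP over (use AllocationId instead in a VPC).
            ec2.associate_address(PublicIp=ELASTIC_IP, InstanceId=new_id)

            # Step 6: shut down the old load balancer.
            ec2.terminate_instances(InstanceIds=[current_id])
            current_id = new_id
        time.sleep(30)   # step 3: check at a regular interval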
I know how to run the commands in my script to start up and shut down EC2 instances, associate the elastic IP address to an instance, and ping the server.
My question is what would be a suitable ping here? Would a standard ping suffice at regular intervals, and what would be a good interval? Or is this a rather simplistic approach and there is a smarter health check that I should be doing?
Also if anyone foresees any problems with this approach please feel free to comment
I understand exactly where you're coming from; my company is in the same position. We care about having a highly available, fault-tolerant system, however the overhead cost simply isn't viable for the traffic we get.
One problem I have with your solution is that you're assuming the micro instance and the load balancer won't both die at the same time. From my experience with Amazon I can tell you it's definitely possible, however unlikely, that whatever causes your load balancer to die also takes down the micro instance.
Another potential problem is that you also assume you will always be able to start a replacement instance during the downtime. This is simply not the case; take for example an outage Amazon had in their us-east-1 region a few days ago. A power failure caused one of their zones to lose power, and when they restored power and began to recover the instances, their APIs were not working properly because of the sheer load. It took almost an hour before they were available again. If an outage like this knocks out your load balancer and you're unable to start another, you'll be down.
That being said, I find the ELBs provided by Amazon a better solution for me. I'm not sure what the reasoning is behind using HAProxy, but I recommend investigating ELBs, as they will allow you to do things such as auto scaling, etc.
For each ELB you create, Amazon creates one load balancer in each zone that has an instance registered. These are still vulnerable to certain problems during severe outages at Amazon like the one described above. For example, during that downtime I could not add new instances to the load balancers, but my existing instances (the ones not affected by the power outage) were still serving requests.
UPDATE 2013-09-30
Recently we've changed our infrastructure to use a combination of ELB and HAProxy. I find that ELB gives the best availability, but the fact that it uses DNS load balancing doesn't work well for my application. So our setup is ELB in front of a 2-node HAProxy cluster. Using HAProxyCloud, a tool I created for AWS, I can easily add auto-scaling groups to the HAProxy servers.
I know this is a little old, but the solution you suggest is overcomplicated; there's a much simpler method that does exactly what you're trying to accomplish...
Just put your HAProxy machine, with your custom AMI, in an auto-scaling group with a minimum AND maximum of 1 instance. That way, when your instance goes down, the ASG will bring it right back up, EIP and all. No external monitoring necessary, and the same if not faster response to downed instances.
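A rough sketch of that approach, with boto3 used only for illustration (names, AMI and zone are placeholders; re-associating the EIP is typically handled by a small boot script baked into the AMI):

    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")

    autoscaling.create_launch_configuration(
        LaunchConfigurationName="haproxy-lc",
        ImageId="ami-12345678",       # your custom HAProxy AMI (placeholder)
        InstanceType="t2.micro",
    )

    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="haproxy-asg",
        LaunchConfigurationName="haproxy-lc",
        MinSize=1,
        MaxSize=1,                    # min == max == 1: always exactly one LB
        AvailabilityZones=["us-east-1a"],
    )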

WebSphere 7 cluster

I have a WebSphere 7 cluster with nodes running on different servers.
When a server hosting one of the nodes loses its network connection, it takes about a minute before WebSphere knows that the member is unavailable.
How can I speed up the status updates?
Update: the cluster is used only for EJB; the EJBs are called from the local network.
I think this is always going to be a tradeoff between performance during normal operations and how quickly a down cluster member is detected.
See this article, Understanding HTTP plug-in failover in a clustered environment and this plugin-cfg.xml reference in the WebSphere 7 InfoCenter.
From the article, the answer will involve the ConnectTimeout, ServerIOTimeout, and RetryInterval settings, but note the warning that:
In an environment with busy workload or a slow network connection, setting this value too low could make the HTTP plug-in mark a cluster member down falsely. Therefore, caution should be used whenever choosing a value for ConnectTimeout.

AppFabric Redundancy

We just tested an AppFabric cluster of 2 servers where we removed the "lead" server. The second server times out on any request to it with the error:
Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0017>:SubStatus<ES0006>:
There is a temporary failure. Please retry later.
(One or more specified Cache servers are unavailable, which could be caused by busy network or servers. Ensure that security permission has been granted for this client account on the cluster and that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Retry later.)
In practice this means that if one server in the cluster goes down then they all go down. (Note we are not using a Windows cluster, only linking multiple AppFabric cache servers to each other.)
I need the cluster to continue operating even if a single server goes down. How do I do this?
(I realize this question is borderlining Serverfault, but imho developers should know this.)
You'll have to install the AppFabric cache on at least three lead servers for the cache to survive a single server crash. The docs state that the cluster will only go down if the "majority" of the lead servers go down, but in the fine print they explain that losing 1 out of 2 counts as losing a majority. I've verified that removing a server from a three lead-node cluster works as advertised.
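As a quick sanity check on the arithmetic (a majority of n lead hosts is floor(n/2) + 1, so the cluster survives n minus that many lead-host failures):

    # A majority of n lead hosts is n // 2 + 1, so the cluster can survive
    # n - (n // 2 + 1) lead-host failures.
    for n in (2, 3, 4, 5):
        majority = n // 2 + 1
        print("%d lead hosts: majority %d, survives %d failure(s)"
              % (n, majority, n - majority))
    # 2 lead hosts survive 0 failures, matching the behaviour described above;
    # 3 lead hosts survive 1.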
Typical distributed-systems concept: for a write or read quorum in an ensemble of 2f + 1 servers, you can lose at most f of them; a majority of f + 1 must stay up. I think AppFabric, like any CP (in CAP-theorem terms) consensus-based system, needs this for the cluster to keep working.
--Sai
That's actually a problem with the AppFabric architecture, and it is rather confusing in terms of the "lead host" concept. The idea is that the majority of lead hosts must be running for the cluster to remain up. So if you had three servers, you'd have to have at least two lead hosts constantly communicating with each other and eating up server resources, and if both go down then the whole cluster fails. The alternative is a peer-to-peer architecture where all servers act as peers, meaning that even if two servers go down the cluster keeps functioning with no application downtime. Try NCache:
http://www.alachisoft.com/ncache/
