We have a RabbitMQ server hosted on AWS, and we recently received notice that the instance will be under maintenance and become temporarily unavailable for a few hours.
As it is a production server, we want to avoid downtime for our users and are currently thinking about strategies to migrate RabbitMQ to another server without losing data. It looks like there are two options:
1. Try to connect other nodes from different machines and replicate data to them.
2. Install RabbitMQ on the new machine and copy the Mnesia files from the old server to the new one, then switch the new server on and switch the old one off. For example, it is possible to take an image snapshot on AWS, which could simplify the process.
I was not able to find a way to implement (1) without wiping the existing data, so this option does not look workable.
As for (2), it looks very manual and error-prone.
Are there any other data migration strategies or am I missing something here?
I managed to set up a flow for the first option and replicate data without downtime by setting up a RabbitMQ cluster. To do that I followed the clustering manual, but tweaked two things to make it work for my stack:
A RabbitMQ cluster in AWS does not work with plain IP addresses; nodes address each other by short hostnames, so to make the cluster machines see each other you need to edit the /etc/hosts file and map the short names of your cluster machines to their IPs:
vi /etc/hosts
The file should look something like this:
127.0.0.1 localhost
10.242.86.191 ip-10-242-86-191
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
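With the hosts entries in place, joining the new machine to the existing node is done with the standard clustering commands; the node name rabbit@ip-10-242-86-191 below is taken from the hosts example above and is only an assumption about your actual node name:
# run on the new machine; copy /var/lib/rabbitmq/.erlang.cookie from the old server first so the nodes can authenticate
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@ip-10-242-86-191
rabbitmqctl start_app
rabbitmqctl cluster_status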
After setting up the cluster you need to set up replication as described here, but an important fact is that queues are not mirrored by default. So you need to set up a policy for queue mirroring like this:
rabbitmqctl set_policy policy_name "queue_pattern" '{"ha-mode":"all", "ha-sync-mode":"automatic"}' -p your_vhost
By the way, the -p your_vhost parameter is not mentioned in the docs - be careful to specify the vhost if you use one.
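To double-check that the policy actually matches your queues (assuming the same your_vhost placeholder), you can run:
rabbitmqctl list_policies -p your_vhost
rabbitmqctl list_queues name policy -p your_vhost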
After setting everything up, the queues were replicated across the cluster and kept in sync, which enabled me to switch the first cluster machine off without downtime and switch it back on after the maintenance without loss of data.
I am doing a POC on Consul for supporting service discovery and multiple microservice versions. Consul clients and a server cluster (3 servers) are set up on Linux VMs. I followed the documentation at Consul and the setup was successful.
Here is my doubt: my setup is entirely on VMs. I've added a service definition using the HTTP API. The same service is running on two nodes.
The services are correctly registered:
curl http://localhost:8500/v1/catalog/service/my-service
gives me the two node details.
When I do a DNS query:
dig @127.0.0.1 -p 8600 my-service.service.consul
I am able to see the expected results with the node which hosts the service. But I cannot ping the service since the service name is not resolved.
ping -c4 my-service or ping -c4 my-service.service.consul
ping: unknown host.
If I enter a mapping for my-service in the /etc/hosts file, I can ping it, but only from the same VM. I can't ping it from another VM on the same LAN or WAN.
The default port for DNS is 53, while the Consul DNS interface listens on 8600. I cannot use Docker for DNS forwarding. Is it possible I missed something here? Can Consul DNS queries work without Docker/dnsmasq or iptables updates?
To be clear, here is what I would like to have as the end result:
ping my-service
This needs to ping the nodes I have configured, in a round robin fashion.
Please bear with me if this question is basic; I've gone through each of the Consul-related questions on SO.
I've also gone through this and this, and these too say I need to do extra setup.
Wait! Please don't do this!
DO. NOT. RUN. CONSUL. AS. ROOT.
Please. You can, but don't. Instead do the following:
Run a caching or forwarding DNS server on your VMs. I'm biased toward dnsmasq because of its simplicity and stability in the common case.
Configure dnsmasq to forward the TLD .consul to the consul agent listening on 127.0.0.1:8600 (the default); a minimal snippet is sketched after these steps.
Update your /etc/resolv.conf file to point to 127.0.0.1 as your nameserver.
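As a concrete sketch of the dnsmasq step, assuming your dnsmasq installation reads drop-in files from /etc/dnsmasq.d (the file name is arbitrary):
# /etc/dnsmasq.d/10-consul
# forward *.consul lookups to the local Consul agent's DNS interface
server=/consul/127.0.0.1#8600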
There are a few ways of doing this, and the official docs have a write up that is worth looking into:
https://www.consul.io/docs/guides/forwarding.html
That should get you started.
This can be a pretty complicated topic, but the simplest way is to change Consul to bind to port 53 using the ports directive, and add some recursors to the Consul config so that Consul can pass real DNS requests on to a host that has full DNS functionality. Something like these bits:
{
  "recursors": [
    "8.8.8.8",
    "8.8.4.4"
  ],
  "ports": {
    "dns": 53
  }
}
Then modify your system to use the Consul server for DNS with a nameserver entry in /etc/resolv.conf. Depending on your OS, you might be able to specify a port in the resolv.conf file and avoid having to deal with Consul needing root to bind to port 53.
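For example, assuming the Consul agent answering DNS queries is reachable at 10.0.0.10 (a placeholder address), /etc/resolv.conf would contain something like:
# send DNS queries to the Consul agent first
nameserver 10.0.0.10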
In a more complicated scenario, I know many people who use Unbound or BIND to do split DNS and essentially do the opposite: route the .consul domain to the Consul cluster on a non-privileged port at the org level of their DNS infrastructure.
Despite ElasticSearch and Kibana both running on my production server, I'm unable to visit the GUI over the public IP: http://52.4.153.19:5601/
Localhost curls return 200, but console errors in the browser report timeouts after a few images are retrieved.
I've successfully installed, run, and accessed Kibana on my local (Windows 10) and on my staging AWS EC2 Ubuntu 14.04 environment. I'm able to access both over port 5601 on localhost and the staging environment is accessible over the public IP address and all domains addressed accordingly. The reverse proxy also works and all status indicators are green on the dashboard.
I'm running Kibana 4.5, ElasticSearch 2.3.1, Apache 2.4.12
I've used the same exact volume from the working environment to attach to the production instance, so everything is identical on the two volumes, except that the staging environment's apache vhost uses a subdomain while the production environment's servername is the base domain. Both are configured for SSL wildcards. Both are in separate availability zones at Amazon. I've tried altering the server block to use a subdomain on the production server, just to see if the domain was impactful but the error remains.
I also tried running one instance individually, in case EC2 had some kind of networking error with 0.0.0.0 but I'm unable to come to a resolution. All logs and configurations are identical between the two servers for ElasticSearch and Kibana.
I've tried deleting and re-creating the kibana index, tried alternate settings inclusive of the host, elasticsearch url, extending the max ping and timeout, max retries, extended the apache limits, http.cors to allow different origins. I've tried other ports but both servers are indicating that 5601 is listening in the same way.
I also had the same problem on a completely different volume that was previously attached to this instance.
The only difference I can see is that the working version pings fine while the non-working version has 100% packet loss when pinging the IP, although I can't imagine why that would be, as I'm able to reach the website on port 80 just fine. I can also access various other tools running on other ports. I assume there might be some kind of networking conflict. Any ideas?
Maybe port 5601 is blocked by a firewall.
Allow incoming connections to port 5601 with:
sudo iptables -I INPUT -p tcp --dport 5601 -j ACCEPT
For security:
Modify the above command to accept connections only from a specific address, as sketched below (see man iptables),
or use the Shield plugin for Elasticsearch.
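For instance, a sketch that only accepts connections from a single trusted address (203.0.113.5 is a placeholder):
sudo iptables -I INPUT -p tcp -s 203.0.113.5 --dport 5601 -j ACCEPT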
Sorry, I forgot to update this question. The answer turned out to be that I simply needed to deploy a new instance. By creating a clone of the instance, I was able to resolve the issue. I've had networking problems at AWS before, with their internal DNS/IP conflicts, and have had to do this in the past; it turned out to be the quickest and cleanest solution, albeit without providing any definitive insight into the cause.
I have installed a single-node Hadoop cluster using Hortonworks/Ambari on an Amazon EC2 host.
Since I don't want this cluster running 24/7, I stop the instance when done. When I reboot the instance later, I get a new IP address, and then Ambari is no longer able to start the Hadoop-related services.
Is there a way other than completely redeploying to reconfigure the cluster so the services will start?
It looks like the IP address lives in various XML files under /etc, in Ambari's Postgres database, and possibly in other places I haven't found yet.
I tried updating the XML files and the Postgres database with the new IP address and the internal and external DNS names wherever I could find them, but to no avail. I have not been able to restart the services.
The reason I am doing this is to save the deployment time, the data already on HDFS, and other project-specific setup each time I restart the host.
Any suggestions?
Thanks!
An Elastic IP can be used. Also, since you mentioned it is a single-node cluster, you can use localhost or the private IP.
If you use an Elastic IP, your UIs will always be on the same public IP. However, if you use the private IP or localhost and do not associate your instance with an Elastic IP, you will have to look up the public IP every time you start the instance and then connect to the web UI using that IP.
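As a rough sketch with the AWS CLI (the instance and allocation IDs below are placeholders), allocating and attaching an Elastic IP looks like this:
# allocate a new Elastic IP in the VPC and attach it to the instance
aws ec2 allocate-address --domain vpc
aws ec2 associate-address --instance-id i-0123456789abcdef0 --allocation-id eipalloc-0123456789abcdef0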
Thanks for the help; both Harman and TJ are correct. I haven't used an Elastic IP because I might have more than one of these running at a time, and for now at least, I don't mind looking up the public IP address.
Harman's suggestion of using "localhost" as the FQDN when setting up Ambari in the first place is a really good idea in retrospect. Unless I go through the whole setup again, that's water under the bridge for me, but I recommend this to others who might read this post.
In my case, I figured this out on my own before coming back to the page. The specific step I took was insanely simple after all, thanks to Occam's Razor.
I added the following line in /etc/hosts:
<new internal IP> <old internal dns name>
and then did
ambari-server restart from the command line. Then I was able to restart all services after logging into Ambari.
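For illustration, with purely hypothetical values for the new private IP and the old internal DNS name, the whole fix amounts to something like:
# keep the old internal hostname resolving to the new private IP (values are hypothetical)
echo "172.31.5.10  ip-172-31-20-40.ec2.internal  ip-172-31-20-40" | sudo tee -a /etc/hosts
sudo ambari-server restart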
I have 2 Linux VMs (both in the same datacenter of my cloud provider): Elastic1 and Elastic2 (where Elastic2 is a clone of Elastic1). Both have the same version of CentOS, the same cluster name, and the same version of ES; again, Elastic2 is a clone.
I use the service wrapper to automatically start them both at boot, and added each other's IP to their respective iptables files, so now I can successfully ping between the nodes.
I thought this would be enough to allow ES to form a cluster, but to no avail.
Both Elastic1 and Elastic2 have 1 index each, named e1 and e2 respectively. Each index has 1 shard with no replicas.
I can use the head and paramedic plugins on each server successfully, and I use curl -XGET 'http://localhost:9200/_cluster/nodes?pretty=true' to validate that the cluster name is the same and that each server only has 1 node listed.
Is there anything glaring that explains why these nodes aren't talking? I've restarted the ES service and rebooted both servers to no avail. Could the cloning be the problem?
In your elasticsearch.yml:
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ['host1:9300', 'host2:9300']
So, just list your node IPs with the transport port (default is 9300) under unicast hosts. Multicast is enabled by default, but is generally impossible in cloud environments without the use of external plugins.
Also, make sure to check your IP rules / security groups! That's easy to forget.
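Once discovery is switched to unicast and both nodes are restarted, something like this on either node should report number_of_nodes: 2 if the cluster has formed:
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'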
I set up a Tomcat 7 cluster by including the Cluster element in server.xml.
In the Docs ( http://tomcat.apache.org/tomcat-7.0-doc/cluster-howto.html ) it says:
The IP broadcasted is java.net.InetAddress.getLocalHost().getHostAddress() (make sure you don't broadcast 127.0.0.1, this is a common error)
Unfortunately getLocalHost().getHostAddress() returns 127.0.1.1 for all my virtual machines (Ubuntu running in VirtualBox under Win7) instead of the correct IP that I can reach the VMs with, i.e. 10.42.29.191.
Question:
Is there a way to tell Tomcat what IP to send to other members of the cluster via multicast? Or can I specify (e.g. in code) a different way to obtain the IP?
Additional info:
My cluster seems to fail session replication, and the "error" above could be the cause of it. GlassFish doesn't do session replication either; maybe it's the same error. If you could give information on the GlassFish configuration regarding this, I'd be glad too. Multicast between the virtual machines works according to the tool iperf.
Since the VM is an Ubuntu machine, I had to edit the file /etc/hosts.
Replace the entry that looks like this:
127.0.1.1 tim-VirtualBox
with the correct ip:
10.42.29.191 tim-VirtualBox
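If you prefer to script the change, a one-liner along these lines (using the values from this example) does the same edit; review /etc/hosts afterwards before restarting Tomcat:
sudo sed -i 's/^127\.0\.1\.1.*/10.42.29.191 tim-VirtualBox/' /etc/hosts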