Changing hostname breaks Rabbitmq when running on Kubernetes - amazon-ec2

I'm trying to run Rabbitmq using Kubernetes on AWS. I'm using the official Rabbitmq docker container. Each time the pod restarts the rabbitmq container gets a new hostname. I've setup a service (of type LoadBalancer) for the pod with a resolvable DNS name.
But when I use an EBS to make the rabbit config/messsage/queues persistent between restarts it breaks with:
exception exit: {{failed_to_cluster_with,
['rabbitmq#rabbitmq-deployment-2901855891-nord3'],
"Mnesia could not connect to any nodes."},
{rabbit,start,[normal,[]]}}
in function application_master:init/4 (application_master.erl, line 134)
rabbitmq-deployment-2901855891-nord3 is the previous hostname rabbitmq container. It is almost like Mnesia saved the old hostname :-/
The container's info looks like this:
Starting broker...
=INFO REPORT==== 25-Apr-2016::12:42:42 ===
node : rabbitmq#rabbitmq-deployment-2770204827-cboj8
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.config
cookie hash : XXXXXXXXXXXXXXXX
log : tty
sasl log : tty
database dir : /var/lib/rabbitmq/mnesia/rabbitmq
I'm only able to set the first part of the node name to rabbitmq using the RABBITMQ_NODENAME environment variable.
Setting RABBITMQ_NODENAME to a resolvable DNS name breaks with:
Can't set short node name!\nPlease check your configuration\n"
Setting RABBITMQ_USE_LONGNAME to true breaks with:
Can't set long node name!\nPlease check your configuration\n"
Update:
Setting RABBITMQ_NODENAME to rabbitmq#localhost works but that negates any possibility to cluster instances.
Starting broker...
=INFO REPORT==== 26-Apr-2016::11:53:19 ===
node : rabbitmq#localhost
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.config
cookie hash : 9WtXr5XgK4KXE/soTc6Lag==
log : tty
sasl log : tty
database dir : /var/lib/rabbitmq/mnesia/rabbitmq#localhost
Setting RABBITMQ_NODENAME to the service name, in this case rabbitmq-service like so rabbitmq#rabbitmq-service also works since kubernetes service names are internally resolvable via DNS.
Starting broker...
=INFO REPORT==== 26-Apr-2016::11:53:19 ===
node : rabbitmq#rabbitmq-service
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.config
cookie hash : 9WtXr5XgK4KXE/soTc6Lag==
log : tty
sasl log : tty
database dir : /var/lib/rabbitmq/mnesia/rabbitmq#rabbitmq-service
Is this the right way though? Will I still be able to cluster multiple instances if the node names are the same?

The idea is to use a different 'service' and 'deployment' for each of the node you want to create.
As you said, you have to create a custom NODENAME for each i.e:
RABBITMQ_NODENAME=rabbit#rabbitmq-1
Also rabbitmq-1,rabbitmq-2,rabbitmq-3 have to be resolved from each nodes. For that you can use kubedns. The /etc/resolv.conf will look like:
search rmq.svc.cluster.local
and /etc/hosts must contains:
127.0.0.1 rabbitmq-1 # or rabbitmq-2 on node 2...
The services are here to create a stable network identity for each nodes
rabbitmq-1.svc.cluster.local
rabbitmq-2.svc.cluster.local
rabbitmq-3.svc.cluster.local
The different deployments resources will allow you to mount a different volume on each node.
I'm working on a deployment tool to simplify those actions:
I've done a demo on how I scale and deploy rabbitmq from 1 to 3 nodes on kubernetes:
https://asciinema.org/a/2ktj7kr2d2m3w25xrpz7mjkbu?speed=1.5
More generally, the complexity your facing to deploy a clustered application is addressed in the 'petset proposal': https://github.com/kubernetes/kubernetes/pull/18016

In addition to the first reply by #ant31:
Kubernetes now allows to setup a hostname, e.g. in yaml:
template:
metadata:
annotations:
"pod.beta.kubernetes.io/hostname": rabbit-rc1
See https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns # A Records and hostname Based on Pod Annotations - A Beta Feature in Kubernetes v1.2
It seems that the whole configuration alive multiple restarts or re-schedules. I've not setup a cluster however I'm going to follow the tutorial for mongodb, see https://www.mongodb.com/blog/post/running-mongodb-as-a-microservice-with-docker-and-kubernetes
The approach will be probably almost same from kubernetes point of view.

Related

Kubernetes logs not found in default locations?

In my k8s environment where spring-boot applications runs, I checked log location in /var/log and /var/lib but both empty. Then I found log location in /tmp/spring.log . It seems this the default log location. My problem are
How kubectl log knows it should read logs from /tmp location. I get log output on kubectl logs command.
I have fluent-bit configured where it has input as
following
[INPUT]
Name tail
Tag kube.dev.*
Path /var/log/containers/*dev*.log
DB /var/log/flb_kube_dev.db
This suggest it should reads logs from /var/log/containers/ but it does not have logs. However i am getting fluent-bit logs successfully. What am i missing here ?
Docker logs only contain the logs that are dumped on STDOUT by your container's process with PID 1 (your container's entrypoint or cmd process).
If you want to see the logs via kubectl logs or docker logs, you should redirect your application logs to STDOUT instead of file /tmp/spring.log. Here's an excellent example of how this can achieved with minimal effort.
Alternatively, you can also use hostPath volumeMount. This way, you can directly access the log from the path on the host.
Warning when using hostPath volumeMount
If the pod is shifted to another host due to some reason, you logs will not move along with it. A new log file will be created on this new host at the same path.
If you are searching for the actual location of the logs outside the containers (and on the host nodes of the cluster), this depends on a couple things. I suppose you are using Docker to run your containers under Kubernetes, which is the most common setup.
On each node of your Kubernetes cluster, you can use the following command to check what is the logging driver being currently used:
docker info | grep -i logging
The default value should be json-file, which means that logs are being written as jsons from the containers, to a certain location on your host nodes.
If you find another driver, such as for example journald, then that means Docker logging driver is sending logs directly to the systemd journal. There are many logging drivers, so as a first check, you should be sure that all yours Kubernetes nodes are configured to log as json files (or, in the way you need to harvest them).
Once this is done, you can start checking where your containers are logging their own log. Choose a Pod to analyze, then:
Identify on which Kubernetes node it is running on
kubectl get pod pod-name -owide
Grab the container ID with something like the following
kubectl get pod pod-name -ojsonpath='{.status.containerStatuses[0].containerID}'
Where the id should be something in the shape of docker://f834508490bd2b248a2bbc1efc4c395d0b8086aac4b6ff03b3cc8fd16d10ce2c
Remove the docker:// part and SSH on the Kubernetes node on which this container is running, then do a
docker inspect container-id | grep -i logpath
Which should give you the log locations for that particular container. You can try tail on the file to check if the logs are really there or not.
In my case, the container I tried this procedure on, was logging inside:
/var/lib/docker/containers/289271086d977dc4e2e0b80cc28a7a6aca32c888b7ea5e1b5f24b28f7601ff63/289271086d977dc4e2e0b80cc28a7a6aca32c888b7ea5e1b5f24b28f7601ff63-json.log

Kubernetes Service Resolution from application.properties

I have configured mysql cluster service and I am using the that service instead of hostname in jdbc url in my application.properties. It is not resolving. But when I use the minikube URL, it is connecting correctly. Shouldn't DNS resolution happen for jdbc url as well in application.properties for a java project ?
Just as #sfgroups mentioned, it is highly likely that the service has not been properly registered. Maybe you are using a different namespace or simply the service is not available. In order to check that:
Run kubectl get svc and kubectl get endpoints to check if the service is registered and the mysql pods selected. It may sound silly but I advise you to check if the service name you are using is correct.
If it is registered, try kubectl get pods, get the ID of your jdbc pod and launch kubectl exec -ti <ID> nslookup <servicename>. This will give you a hint if the dns resolution is working or not.
If it is not resolving, then check in minikube addons list that dns is enabled. If it is disabled, enable it (you will need to wait a little bit) and try again.

RabbitMQ Erlang distribution failed

I have two Windows Server 2012 R2 machines located in one of the client's datacenters. Both servers are domain-joined. They both have RabbitMQ 3.6.0. installed on them. RabbitMQ is running as Windows Service on both machines. I've tried to cluster these two machines for a long time now without success. I always get the following error when I try to cluster them.
One the first machine nodeA I run the command 'rabbitmqctl join_cluster rabbit#nodeB'. This is what I get:
Clustering node 'rabbit#nodeA' with 'rabbit#nodeB' ...
Error: unable to connect to nodes ['rabbit#nodeB']: nodedown
`DIAGNOSTICS`
===========
attempted to contact: ['rabbit#nodeB']
rabbit#nodeB:
* connected to epmd (port 4369) on nodeB
* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed
* suggestion: hostname mismatch?
* suggestion: is the cookie set correctly?
* suggestion: is the Erlang distribution using TLS?
current node details:
- node name: 'rabbitmq-cli-3892#nodeA'
- home dir: C:\Users\mydirectory
- cookie hash: l+SSu57+cRyAQ03AJdwAbQ==
I've tried this setup with Azure Virtual Machines within Azure Virtual Network and I succeeded to cluster the two VM's, however it seems I cannot connect these two (customer's machines) together.
This is what I have done and ensured:
There isn't any firewall blocking connections
Added host names to hosts file located on C:\Windows\system32\drivers\etc
Tried to refer to host names as FQDN without adding anything to hosts file
Tried to refer to host names with CAPITAL letters and without
Copied the same exact .erlang.cookie to C:\Windows and C:\Users\mydirectory on both machines.
I've read, understood and applied RabbitMQ Clustering Guide https://www.rabbitmq.com/clustering.html
Stopped, restarted, reinstalled RabbitMQ on both machines.
It seems I can't get it to work. On Azure machines, which were not domain-joined clustering worked beautifully. I am really running out of options... Any help?
i had the same problem you need to install rabbitmq as a admin. uninstall then reinstall as admin and it should work fine
Try to connect to each of RabbitMQ nodes via remote shell and check if value of cookie is the same (cookie can be set in 3 different ways: .erlang.cookie is one of them).
erl -remsh 'rabbitmq-cli-3892#nodeA' -name 'test#nodeA'
erlang:get_cookie().

Unable to get Mesos to run from tutorial: Setting up a Single Node Mesosphere Cluster

I have been following this tutorial to try and setup a single node mesosphere cluster from their
official tutorial:
http://mesosphere.com/docs/getting-started/developer/single-node-install/
I followed all the commands without any issues, and I also added the ports 5050 and 8080 to my security group. When I try to access the console for mesos/marathon, I get a "Internet Explorer cannot display the webpage" message.
They also recommend checking it the following way:
MASTER=$(mesos-resolve `cat /etc/mesos/zk`)
mesos-execute --master=$MASTER --name="cluster-test" --command="sleep 5"
But that comes up with an error:
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0106 17:03:08.126703 20993 process.cpp:1561] Failed to initialize, gethostbyname2: Unknown host
*** Check failure stack trace: ***
I am not really sure how to troubleshoot this either, and there are not many tutorials I could find on how to install mesos on ubuntu.
I checked the contents of the zk file, seems to be the default value.
$ cat /etc/mesos/zk
zk://localhost:2181/mesos
I would really appreciate any clues on how to go about this one.
Edit: The process is definitely running too - just an fyi:
root 31545 8.5 5.9 187464 35604 ? Ssl 17:28 0:00 /usr/local/sbin/mesos-slave --master=zk://localhost:2181/mesos --log_dir=/var/log/mesos
root 31563 28.5 2.1 116304 12856 ? Rs 17:28 0:00 /usr/local/sbin/mesos-master --zk=zk://localhost:2181/mesos --port=5050 --log_dir=/var/log/mesos --quorum=1 --wo
Mesos uses gethostbyname2 to resolve hostnames to IPs. The first thing I would recommend, is to try "ping localhost" and "ping hostname", and verify that there are no strange settings in /etc/hosts. If you're doing a multi-node cluster, I'd recommend that hostname map to the public IP address (not 127.0.x.1).
If that doesn't help, you can try setting the --ip and --hostname flags when starting mesos-master and mesos-slave, to bypass the gethostbyname2 resolution. These can also be set by writing to the file-based parameters, e.g. /etc/mesos/mesos-master/ip
For additional troubleshooting, try running wget http://localhost:5050 (or curl -L) from the mesos master, to verify that it is locally visible. Also try wget http://<public_ip>:5050 to verify that the web server is up and serving to the public IP. Depending on how your (EC2?) node is setup, you may need to expose/forward the port, or connect to a VPN.
Thanks Adam. I ran the wget and curl commands, and nothing was actually listening on port 8080 or 5050. I did open those ports in the ec2. A simple reboot did the trick however, once I ssh'ed into the ec2 instance after the reboot, both mesos and marathon were running and both ports are now showing after I ran
netstat -ntln.

rabbitmqctl Error: unable to connect to node rabbit#myserver nodedown

I am running RabbitMQ v3.3.5 with Erlang OTP 17.1 on Windows 2008 R2. My Dev and QA environments are stand-alone. My staging and production environments are clustered.
I am finding this one problem happening often where the RabbitMQ service is running, the RabbitMQ management console is seeing everything, but when I try running rabbitmqctl from the command line it fails with an error saying that the node is down (tried locally and on a remote server).
This problem is resolved if I restart the Windows service.
I see no error message in the RabbitMQ error log. The last message indicated that the node was up.
Below is an example output of the issue that I recently experienced on node 2 of our staging windows cluster:
PS C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.3.5\sbin> .\rabbitmqctl.bat status
Status of node rabbit#MYSERVER2 ...
Error: unable to connect to node rabbit#MYSERVER2: nodedown
DIAGNOSTICS
===========
attempted to contact: [rabbit#MYSERVER2]
rabbit#MYSERVER2:
* connected to epmd (port 4369) on MYSERVER2
* epmd reports: node 'rabbit' not running at all
no other nodes on MYSERVER2
* suggestion: start the node
current node details:
- node name: rabbitmqctl2199771#MYSERVER2
- home dir: C:\Users\RabbitMQ
- cookie hash: mn6OaTX9mS4DnZaiOzg8pA==
at this point I restart the RabbitMQ service and then try again
PS C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.3.5\sbin> .\rabbitmqctl.bat status
Status of node rabbit#MYSERVER2...
[{pid,3784},
{running_applications,
[{rabbitmq_management_agent,"RabbitMQ Management Agent","3.3.5"},
{rabbit,"RabbitMQ","3.3.5"},
{os_mon,"CPO CXC 138 46","2.2.15"},
{mnesia,"MNESIA CXC 138 12","4.12.1"},
{xmerl,"XML parser","1.3.7"},
{sasl,"SASL CXC 138 11","2.4"},
{stdlib,"ERTS CXC 138 10","2.1"},
{kernel,"ERTS CXC 138 10","3.0.1"}]},
{os,{win32,nt}},
{erlang_version,
"Erlang/OTP 17 [erts-6.1] [64-bit] [smp:4:4] [async-threads:30]\n"},
{memory,
[{total,35960208},
{connection_procs,2704},
{queue_procs,5408},
{plugins,111936},
{other_proc,13695792},
{mnesia,102296},
{mgmt_db,0},
{msg_index,21816},
{other_ets,884704},
{binary,25776},
{code,16672826},
{atom,602729},
{other_system,3834221}]},
{alarms,[]},
{listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{amqp,5672,"0.0.0.0"}]},
{vm_memory_high_watermark,0.4},
{vm_memory_limit,3435787059},
{disk_free_limit,50000000},
{disk_free,74911649792},
{file_descriptors,
[{total_limit,8092},
{total_used,4},
{sockets_limit,7280},
{sockets_used,2}]},
{processes,[{limit,1048576},{used,139}]},
{run_queue,0},
{uptime,5}]
...done.
Any idea as to what causes this and how to automatically detect the situation?
Is this specifically a problem with running RabbitMQ on Windows?
Hostnames are case-insensitives when you are trying to resolve them. For example, LOCALHOST and localhost are the same host.
However, when Erlang constructs the name of a node (eg. rabbit#<hostname> in the case of RabbitMQ), this name is case-sensitive. So rabbit#LOCALHOST and rabbit#localhost are two different node names, even if they run on the same host.
Recently, we (the RabbitMQ team) found out that, on Windows, the node name constructed for RabbitMQ was inconsistent. Therefore, sometimes, RabbitMQ started as a Windows service could be named rabbit#MYHOST but rabbitmqctl would try to reach rabbit#myhost and fail.
Since RabbitMQ 3.6.0, the node name should be consistent.
To anyone else getting this error, this was my fix. I installed Erlang, but overlooked the instructions on setting up the Environmental Variable.
I was reading the manual install page:
https://www.rabbitmq.com/install-windows-manual.html
and found the following:
Set ERLANG_HOME to where you actually put your Erlang installation,
e.g. C:\Program Files\erlx.x.x (full path). The RabbitMQ batch files
expect to execute %ERLANG_HOME%\bin\erl.exe.
Go to Start > Settings > Control Panel > System > Advanced >
Environment Variables. Create the system environment variable
ERLANG_HOME and set it to the full path of the directory which
contains bin\erl.exe.
For some reason, the auto install assigned the wrong path name to the ERLANG_HOME variable - see image below. I simply added \bin on the end.
I had a similar problem on my linux box and am posting the answer here, because rabbitmq on windows may handle things similarly.
My post and solution: rabbtimqadmin - Could not connect: [Errno -2] Name or service not known
The core issue was changing the servername after rabbitmq was configured. When installed, rabbitmq references the servers name, making it part of its configuration. I can see this being a similar issue on windows.
In short, you can change server's name back to the name it was when you first installed rabbitmq or you can add a rabbitmq-env.conf file, I'm not sure where it would go in windows, but the following gives details for linux: https://www.rabbitmq.com/man/rabbitmq-env.conf.5.man.html
Note that on linux the name of the server was CaSe SENiTivE! So you may or may not have a similar issue with windows.
Hope this helps and good luck!
If you are using linux try to change permission of /var/lib/rabbitmq/mnesia folder.

Resources