windows cluster - SSH seems to be failing - windows

Two physical systems, each is running Server 2008
Installed DataStax Community (version 2.0.7 64-bit) on each (that is the version number in the DataStax package I downloaded according to the file name)
OpCenter running locally shows a running 1 node cluster. I can execute IO on the system at the command line (using cassandra-stress)
The system names are "5017-cassandra-1" and "5017-cassandra-2"
I'd like to create a cluster in which both nodes participate. This is not a production environment (I'm just trying to learn).
From OpCenter on 5017-cassandra-1 I go to Nodes (I see 1 node of course), Add Nodes.
I leave the "Package" drop down as default (but the latest version shown in the drop down is 2.0.6), enter the IP address of 5017-cassandra-2. I add the Administrator user name and password in the "Node Creditials (sudo)" fields and press "Add Nodes" and get:
Error provisioning cluster: Unable to SSH to some of the hosts
Unable to SSH to 10.108.14.224:
global name 'get_output' is not defined
Reading that I needed to add OpenSSL - I installed the runtime redistributables (on both system) and Win64 OpenSSL-1_0_1h.
The error persists.
any suggestions or link to a step-by-step would be appreciated.

Related

Unable to launch rabbitmq management console in Windows

On a Windows 7 Enterprise 64 Bit OS, I installed Erlang (otp_win64_20.0.exe) and RabbitMQ 3.6.9 (64bit) as standalone one. I have set System Variable for ERLANG_HOME. The installation was successful and RabbitMQ service is running.
But when I trying to enable rabbitmq_management, I am getting following error.
C:\Program Files\RabbitMQ Server\rabbitmq_server-3.6.9\sbin>rabbitmq-plugins.bat enable rabbitmq_management
Plugin configuration unchanged.
Applying plugin configuration to rabbit#machinename... failed.
* Could not contact node rabbit#machinename.
Changes will take effect at broker restart.
* Options: --online - fail if broker cannot be contacted.
--offline - do not try to contact broker.
C:\Program Files\RabbitMQ Server\rabbitmq_server-3.6.9\sbin>rabbitmqctl status
Status of node rabbit#machinename ...
Error: unable to connect to node rabbit#machinename: nodedown
DIAGNOSTICS
===========
attempted to contact: [rabbit#machinename]
rabbit#machinename:
* connected to epmd (port 4369) on machinename
* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed
* Authentication failed (rejected by the remote node), please check the Erlang cookie
current node details:
- node name: 'rabbitmq-cli-45#machinename'
- home dir: C:\
- cookie hash: LLCyvm2Dd7VpUhtY9jxerg==
I am going through various posts in stackoverflow and still could not figure out what is the root cause of this issue with node and management plugin.
Any help to resolve this is highly appreciated.
It looks like you have problem with `erlang.cookie. It contains key that allows connecting to Erlang node. You can read more about it in official documentation, but simplest solution can be found here
Installing as a non-administrator user leaves .erlang.cookie in the wrong place
This makes it impossible to use rabbitmqctl.
Workarounds:
Run the installer as an administrator or
Copy the file .erlang.cookie manually from %SystemRoot% to %HOMEDRIVE%%HOMEPATH%.
Where %SystemRoot% is normally C:\WINDOWS\.erlang.cookie and %HOMEDRIVE%%HOMEPATH%should be something like C:\Documents and Settings\%USERNAME%\.erlang.cookie or C:\Users\%USERNAME%\.erlang.cookie
This should solve your problem.

RabbitMQ Erlang distribution failed

I have two Windows Server 2012 R2 machines located in one of the client's datacenters. Both servers are domain-joined. They both have RabbitMQ 3.6.0. installed on them. RabbitMQ is running as Windows Service on both machines. I've tried to cluster these two machines for a long time now without success. I always get the following error when I try to cluster them.
One the first machine nodeA I run the command 'rabbitmqctl join_cluster rabbit#nodeB'. This is what I get:
Clustering node 'rabbit#nodeA' with 'rabbit#nodeB' ...
Error: unable to connect to nodes ['rabbit#nodeB']: nodedown
`DIAGNOSTICS`
===========
attempted to contact: ['rabbit#nodeB']
rabbit#nodeB:
* connected to epmd (port 4369) on nodeB
* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed
* suggestion: hostname mismatch?
* suggestion: is the cookie set correctly?
* suggestion: is the Erlang distribution using TLS?
current node details:
- node name: 'rabbitmq-cli-3892#nodeA'
- home dir: C:\Users\mydirectory
- cookie hash: l+SSu57+cRyAQ03AJdwAbQ==
I've tried this setup with Azure Virtual Machines within Azure Virtual Network and I succeeded to cluster the two VM's, however it seems I cannot connect these two (customer's machines) together.
This is what I have done and ensured:
There isn't any firewall blocking connections
Added host names to hosts file located on C:\Windows\system32\drivers\etc
Tried to refer to host names as FQDN without adding anything to hosts file
Tried to refer to host names with CAPITAL letters and without
Copied the same exact .erlang.cookie to C:\Windows and C:\Users\mydirectory on both machines.
I've read, understood and applied RabbitMQ Clustering Guide https://www.rabbitmq.com/clustering.html
Stopped, restarted, reinstalled RabbitMQ on both machines.
It seems I can't get it to work. On Azure machines, which were not domain-joined clustering worked beautifully. I am really running out of options... Any help?
i had the same problem you need to install rabbitmq as a admin. uninstall then reinstall as admin and it should work fine
Try to connect to each of RabbitMQ nodes via remote shell and check if value of cookie is the same (cookie can be set in 3 different ways: .erlang.cookie is one of them).
erl -remsh 'rabbitmq-cli-3892#nodeA' -name 'test#nodeA'
erlang:get_cookie().

Jenkins doesn't recognize slave being down and thus does not allow for it to reconnect

We have a Jenkins instance running on Ubuntu that has several slaves in different systems. One of them is a Windows 7 host, having jenkins slave instance configured as a service.
We have a problem that when that machine is rebooted, master Jenkins doesn't realize it's gone. It looks to be just fine in the nodes view. Then, when a build is issued that is supposed to use that slave it gets stuck. If that is stopped, the next build fails immediately
Caused by: java.util.concurrent.TimeoutException: Ping started at 1457016721684 hasn't completed by 1457016961684
... 2 more
[EnvInject] - [ERROR] - SEVERE ERROR occurs: channel is already closed
When the slave has started up and it tries to connect back to master, connection is refused, and in the logs there is an error saying connection with that name already exists:
Server didn't accept the handshake: xxx is already connected to this master. Rejecting this connection.
There is issue JENKINS-5055 which claims a fix was committed allowing the same JNLP slave to reconnect without getting rejected, apparently this commit, and according to changelog, it was introduced in version 1.396 (2011/02/02). We are however using version 1.639 and seeing this. Somebody else seems to be seeing it as well. By looking at current codebase, I see where the error is coming from, but don't see the fix done in Jenkins-5055.
Any ideas on resolving this?
Edit: also asked on jenkins user mailing list, but no responses.
We faced the same issue. Used https://wiki.jenkins-ci.org/display/JENKINS/slave-status as workaround
Reinstalling the slave on a Windows Server 2012 R2 machine shows no signs of this behavior, so it seems that either there was a mistake done during installation steps or this is something caused by using a workstation Windows version.
Regardless, here were the steps to get it working, assuming a brand new installation of Windows, with no network connectivity, and master instance using a self-signed certificate:
Install JRE on the machine. If you have 64-bit operating system, install both 32-bit and 64-bit, otherwise go with 32-bit. Download link here
Install .NET 3.5 on the machine. This is needed by the Jenkins service. You can follow the steps outlined by my other answer for this.
Install Jenkins using Windows installer (.zipped) to C:\Jenkins. It can be downloaded from here.
Check your installation is responding by navigating to http://localhost:8080 . In case of trouble, check for logs in the jenkins folder. If there is a port conflict, edit jenkins.xml and change the httpPort to something else.
From the Windows computer, navigate to your master jenkins and configure a new node there.
Start a slave agent using Java Launch Agent in configure -> node screen (you need to be still using your Windows slave computer)
You should see a visible window opening. From there, select File -> Install as a service. (details and screenshots) If you experience an error without proper explanation, confirm .NET 3.5 is properly installed. If you see "WMI.WmiException: AccessDenied", save the jnlp file locally and start it from administrator prompt or otherwise with elevated privileges (details).
From the Administrative tools -> Services, stop and disable the Jenkins service, and stop Jenkins Slave Agent but leave it on Automatic so it will start up when starting up the computer.
This is only relevant if you're using a self-signed or otherwise problematic certificate:
download the previously mentioned Java Launch Agent file (.jnlp file) again and save it to C:\jenkins
open c:\jenkins\jenkins-slave.xml to your editor
change it to refer to your local .jnlp file by changing jnlp url parameter (file:/C:/jenkins/jenkins-slave.jnlp)
add -noCertificateCheck to parameters
replace the -secret parameter with -auth "user:pass", since otherwise automatic url get parameters will be added which will mess finding the .jnlp file
Start the Jenkins Slave Agent service again
For problems with jenkins slave service, check out jenkins-slave.err.log. For Windows Server 2012 R2, you can get the functionality of tail by using Get-Content .\jenkins-slave.err.log -Wait -Tail 10 in Powershell prompt. For older versions of Powershell, leave out -Tail 10.

RabbitMQ fails on Error: unable to connect to node rabbit#TPAJ05421843: nodedown

On a Windows 7 Enterprise machine, I made a fresh install of Erlang 17.4 and RabbitMQ 3.4.3 x64. The installation was successful and uneventful.
I have not yet tried to create my first queue or exchange, but I already see trouble. This problem is similar to another SO post, but that other post appears to involve clustering, which I don't have. Furthermore, that other poster can circumvent his issue by restarting the RabbitMQ service; that approach does not work for me.
My "nodedown" problem is evident at the RabbitMQ command prompt:
C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.4.3\sbin>rabbitmqctl status
Status of node rabbit#TPAJ05421843 ...
Error: unable to connect to node rabbit#TPAJ05421843: nodedown
DIAGNOSTICS
attempted to contact: [rabbit#TPAJ05421843]
rabbit#TPAJ05421843:
* connected to epmd (port 4369) on TPAJ05421843
* epmd reports: node 'rabbit' not running at all
other nodes on TPAJ05421843: ['RabbitMQ']
* suggestion: start the node
current node details:
- node name: 'rabbitmqctl-19884#TPAJ05421843'
- home dir: H:\
- cookie hash: PD4QQCYrf0TME9vIko3Xuw==
Based on the above, I chose to check the status of the node explicitly named 'RabbitMQ'. I get this:
C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.4.3\sbin>rabbitmqctl -n RabbitMQ status
Status of node 'RabbitMQ#TPAJ05421843' ...
Error: unable to connect to node 'RabbitMQ#TPAJ05421843': nodedown
DIAGNOSTICS
attempted to contact: ['RabbitMQ#TPAJ05421843']
RabbitMQ#TPAJ05421843:
* connected to epmd (port 4369) on TPAJ05421843
* epmd reports node 'RabbitMQ' running on port 59301
* TCP connection succeeded but Erlang distribution failed
* suggestion: hostname mismatch?
* suggestion: is the cookie set correctly?
current node details:
- node name: 'rabbitmqctl-23076#TPAJ05421843'
- home dir: H:\
- cookie hash: PD4QQCYrf0TME9vIko3Xuw==
Ok, this is barely better since at least it acknowledges 'RabbitMQ' running on port 59301. But what the heck could it mean that "Erlang distribution failed"?
When I try to research this topic, I found articles saying "be sure you have matched cookies." Based on that I found this article, which claims the "cookie mismatch" does not pertain to me, because I have not created (nor intend to create) a RabbitMQ cluster.
What should I do?
I had this same problem today. There were no cookie or firewall problems and windows reported that the service was running successfully. This is what finally fixed it:
Run RabbitMQ sbin command prompt as administrator.
Run "rabbitmq-service remove"
Run "rabbitmq-service install"
For some reason the service set up by the installer did not configure several registry entries. Running this set them correctly and allowed the service to run.
One thing I noticed was that before I did this, there was no description of the service in the Windows Services view. After installing with the rabbitmq-service command, the description was visible. This might be a quick indicator if you are having the same problem.
As #eddyP commented, I had two different Erlang cookie files:
A server cookie file, located at $env:WINDIR\system32\config\systemprofile\.erlang.cookie (prior to Erlang 20.2 it was located at $env:WINDIR\.erlang.cookie).
A client cookie file, located at $env:USERPROFILE\.erlang.cookie.
Copying the server cookie file over the client one, so that both files were the same, fixed the problem for me.
For further details, see "How Nodes (and CLI tools) Authenticate to Each Other: the Erlang Cookie".
From RabbitMQ Command Prompt sbin (run as administrator) execute this command:
rabbitmq-server restart
In Windown, For some reason delete all folder in c:\Users\xxx\AppData\Roaming\RabbitMQ\db\ (xxx is your username)
then flow #Jerdev answer and
start rabbitmq net start rabbitmq
check rabbitmq service rabbitmqctl status
The same question on the RabbitMQ mailing list: https://groups.google.com/forum/#!topic/rabbitmq-users/0s1ExFhl4hM.
The Erlang cookie is used by rabbitmqctl as well as server nodes, so it may need being taken care of (placed in the correct location).
See "Installing as a non-administrator user leaves .erlang.cookie in the wrong place" on Windows quirks.
I resolve my problem doing this in Windows 10.
Execute RabbitMQ Command Prompt (sbin dir) as administrator.
Execute "rabbitmq-service remove" in (RabbitMQ Command Prompt).
Execute %AppData% in Run Dialog Box of Windows.
Delete all files in RabbitMQ folder.
Execute "rabbitmq-service install" in (RabbitMQ Command Prompt).
Execute "rabbitmqctl start_app" in (RabbitMQ Command Prompt).
If you come here looking for a linux answer for the same error message, try
sudo service rabbitmq-server start
(which is not a blocking command)
Just do the following:
Uninstall rabbitmq and erlang.
delete the rabbitmq folder existing in your appdata (if you dont
know the appdata location, just type echo %AppData% in the command
prompt)
Then install erlang first and then rabbitmq.
After installing, enable the management plugin using below command:
rabbitmq-plugins enable rabbitmq_management
For me the cookies didnt match, like the other comments but the locations was in a different path for those having the same issue as me C:\Windows\System32\config\systemprofile
That is happening because rabbit MQ is not being installed correctly on Windows (and this error is misleading!). So to solve it do the following:
type "cmd" in Cortana search or in "Run" for older version of Windows
right click on in and choose "Run as Administrator"
go to rabbit's sbin folder (cd "C:\Program Files\RabbitMQ Server\rabbitmq_server-3.7.4\sbin")
run: rabbitmq-service remove
run: rabbitmq-service install
now you can run
6. rabbitmq-plugins enable rabbitmq_management
7. rabbitmq-service start
8. and, finally, run: start http://localhost:15672
9. log on as user "guest" with password: "guest" and that's it. Happy Rabbiting!
I missed restarting my WINDOWS OS and then deleting the old version of ERLANG (which I uninstalled before restarting).
Somehow the fresh installation of Rabbit was referring to the old (un-installed version) and all the mismatch was happening. Clue was the 'services' referred Rabbit from the old ERLANG version.
This is how I resolved the error in my Windows 8 system:
Check for a syntax error in the rabbitmq.config file placed in the AppData folder for Windows.
How to check if there is any syntax error?
You can run rabbitmq-server restart from sbin folder in:
Program Files/RabbitMQ/rabbitmq_server_x.x/sbin/.
Replace the content of the rabbitmq.config with rabbitmq.config.example.
You may find the rabbitmq.config.example in:
Program Files/RabbitMQ/rabbitmq_server_x.x/etc/
Warning, you will lose the configuration you have saved previously with rabbitmq.
After changing the files, just hit
rabbitmq-server restart
in the sbin folder mentioned above.

rabbitmqctl Error: unable to connect to node rabbit#myserver nodedown

I am running RabbitMQ v3.3.5 with Erlang OTP 17.1 on Windows 2008 R2. My Dev and QA environments are stand-alone. My staging and production environments are clustered.
I am finding this one problem happening often where the RabbitMQ service is running, the RabbitMQ management console is seeing everything, but when I try running rabbitmqctl from the command line it fails with an error saying that the node is down (tried locally and on a remote server).
This problem is resolved if I restart the Windows service.
I see no error message in the RabbitMQ error log. The last message indicated that the node was up.
Below is an example output of the issue that I recently experienced on node 2 of our staging windows cluster:
PS C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.3.5\sbin> .\rabbitmqctl.bat status
Status of node rabbit#MYSERVER2 ...
Error: unable to connect to node rabbit#MYSERVER2: nodedown
DIAGNOSTICS
===========
attempted to contact: [rabbit#MYSERVER2]
rabbit#MYSERVER2:
* connected to epmd (port 4369) on MYSERVER2
* epmd reports: node 'rabbit' not running at all
no other nodes on MYSERVER2
* suggestion: start the node
current node details:
- node name: rabbitmqctl2199771#MYSERVER2
- home dir: C:\Users\RabbitMQ
- cookie hash: mn6OaTX9mS4DnZaiOzg8pA==
at this point I restart the RabbitMQ service and then try again
PS C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.3.5\sbin> .\rabbitmqctl.bat status
Status of node rabbit#MYSERVER2...
[{pid,3784},
{running_applications,
[{rabbitmq_management_agent,"RabbitMQ Management Agent","3.3.5"},
{rabbit,"RabbitMQ","3.3.5"},
{os_mon,"CPO CXC 138 46","2.2.15"},
{mnesia,"MNESIA CXC 138 12","4.12.1"},
{xmerl,"XML parser","1.3.7"},
{sasl,"SASL CXC 138 11","2.4"},
{stdlib,"ERTS CXC 138 10","2.1"},
{kernel,"ERTS CXC 138 10","3.0.1"}]},
{os,{win32,nt}},
{erlang_version,
"Erlang/OTP 17 [erts-6.1] [64-bit] [smp:4:4] [async-threads:30]\n"},
{memory,
[{total,35960208},
{connection_procs,2704},
{queue_procs,5408},
{plugins,111936},
{other_proc,13695792},
{mnesia,102296},
{mgmt_db,0},
{msg_index,21816},
{other_ets,884704},
{binary,25776},
{code,16672826},
{atom,602729},
{other_system,3834221}]},
{alarms,[]},
{listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{amqp,5672,"0.0.0.0"}]},
{vm_memory_high_watermark,0.4},
{vm_memory_limit,3435787059},
{disk_free_limit,50000000},
{disk_free,74911649792},
{file_descriptors,
[{total_limit,8092},
{total_used,4},
{sockets_limit,7280},
{sockets_used,2}]},
{processes,[{limit,1048576},{used,139}]},
{run_queue,0},
{uptime,5}]
...done.
Any idea as to what causes this and how to automatically detect the situation?
Is this specifically a problem with running RabbitMQ on Windows?
Hostnames are case-insensitives when you are trying to resolve them. For example, LOCALHOST and localhost are the same host.
However, when Erlang constructs the name of a node (eg. rabbit#<hostname> in the case of RabbitMQ), this name is case-sensitive. So rabbit#LOCALHOST and rabbit#localhost are two different node names, even if they run on the same host.
Recently, we (the RabbitMQ team) found out that, on Windows, the node name constructed for RabbitMQ was inconsistent. Therefore, sometimes, RabbitMQ started as a Windows service could be named rabbit#MYHOST but rabbitmqctl would try to reach rabbit#myhost and fail.
Since RabbitMQ 3.6.0, the node name should be consistent.
To anyone else getting this error, this was my fix. I installed Erlang, but overlooked the instructions on setting up the Environmental Variable.
I was reading the manual install page:
https://www.rabbitmq.com/install-windows-manual.html
and found the following:
Set ERLANG_HOME to where you actually put your Erlang installation,
e.g. C:\Program Files\erlx.x.x (full path). The RabbitMQ batch files
expect to execute %ERLANG_HOME%\bin\erl.exe.
Go to Start > Settings > Control Panel > System > Advanced >
Environment Variables. Create the system environment variable
ERLANG_HOME and set it to the full path of the directory which
contains bin\erl.exe.
For some reason, the auto install assigned the wrong path name to the ERLANG_HOME variable - see image below. I simply added \bin on the end.
I had a similar problem on my linux box and am posting the answer here, because rabbitmq on windows may handle things similarly.
My post and solution: rabbtimqadmin - Could not connect: [Errno -2] Name or service not known
The core issue was changing the servername after rabbitmq was configured. When installed, rabbitmq references the servers name, making it part of its configuration. I can see this being a similar issue on windows.
In short, you can change server's name back to the name it was when you first installed rabbitmq or you can add a rabbitmq-env.conf file, I'm not sure where it would go in windows, but the following gives details for linux: https://www.rabbitmq.com/man/rabbitmq-env.conf.5.man.html
Note that on linux the name of the server was CaSe SENiTivE! So you may or may not have a similar issue with windows.
Hope this helps and good luck!
If you are using linux try to change permission of /var/lib/rabbitmq/mnesia folder.

Resources