OpsCenter can't connect to agents after enabling SSL - cassandra-2.0

I'm setting up node-to-node encryption in AWS on Ubuntu in a 3-node DataStax Enterprise 4.5.2 cluster. I followed these docs:
[1] - http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/sec/secNodeNodeEncryp.html
[2] - http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/sec/secPrepareCerts.html
[3] - https://github.com/PatrickCallaghan/datastax-ssl-secure-cluster
[4] - http://datastax.com/documentation/opscenter/5.0/opsc/configure/opscEnableSSLpkg.html
I created the certs and the keystore. [1,2,3]
Added the certs to the truststore on each node. [1,3]
Edited cassandra.yaml to turn on node-to-node encryption (leaving client-to-node for another day). [1,3]
Edited address.yaml to turn on encryption for the DataStax agent. [4]
Restarted all nodes.
'nodetool status' shows all nodes are up normally. OpsCenter shows the nodes but gives the error message '0 of 3 agents are connected'. What else needs to be done to allow OpsCenter to talk to the agents? OpsCenter is installed on one of the nodes, and it won't even talk to the agent on the same box.
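For reference, the relevant settings roughly looked like this (a sketch; the keystore paths and passwords are placeholders, not my real values).
In cassandra.yaml on each node, per [1]:
server_encryption_options:
    internode_encryption: all
    keystore: /path/to/.keystore
    keystore_password: <keystore-password>
    truststore: /path/to/.truststore
    truststore_password: <truststore-password>
In address.yaml on each node, per [4]:
use_ssl: 1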

The opscenterd daemon had failed to start correctly. Examining /var/log/opscenter/opscenterd.log showed the following:
exceptions.ImportError: libssl.so.0.9.8: cannot open shared object file: No such file or directory
First, I tried the simple workaround of symlinking:
ln -s /lib/x86_64-linux-gnu/libssl.so.1.0.0 /lib/x86_64-linux-gnu/libssl.so.0.9.8
ln -s /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 /lib/x86_64-linux-gnu/libcrypto.so.0.9.8
but that didn't work. The solution was to install libssl0.9.8 on the server running OpsCenter:
$ sudo apt-get install libssl0.9.8
BTW, this is OpsCenter 5.0.1.
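A quick way to confirm the loader can now resolve the 0.9.8 libraries before restarting opscenterd (a sanity check, not part of the original fix):
$ ldconfig -p | grep -e 'libssl' -e 'libcrypto'
Both libssl.so.0.9.8 and libcrypto.so.0.9.8 should appear in the output once the package is installed.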

Related

CDH cluster installation failing in "distributing" stage - failed due to stall on seeded torrent

Hi,
We are trying to install a CDH cluster on a remote Red Hat 7 server using the cloudera-installer.bin file, in standalone mode (we have only 1 host). We specify the hostname/IP address of the machine during installation and it is able to resolve it, but the installation halts during the parcel distribution stage. Here are the logs of cloudera-scm-agent (we tried both the Cloudera Express edition and the Enterprise trial version):
[03/Oct/2018 10:11:55 +0000] 28315 Thread-13 downloader INFO Current state: CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel [totalDownloaded=0 totalSize=2120090032 upload=0 state=downloading seed=['http://INHUSZ1-V250152:7180/cmf/parcel/download/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel'] location=/opt/cloudera/parcels/.flood/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel progress=0]
[03/Oct/2018 10:11:57 +0000] 28315 Thread-13 downloader INFO Current state: CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel [totalDownloaded=0 totalSize=2120090032 upload=0 state=downloading seed=['http://INHUSZ1-V250152:7180/cmf/parcel/download/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel'] location=/opt/cloudera/parcels/.flood/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel progress=0]
[03/Oct/2018 10:11:59 +0000] 28315 Thread-13 downloader INFO Current state: CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel [totalDownloaded=0 totalSize=2120090032 upload=0 state=downloading seed=['http://INHUSZ1-V250152:7180/cmf/parcel/download/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel'] location=/opt/cloudera/parcels/.flood/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel progress=0]
Please let us know what can be done.
I just had the same error message and stall during install at parcel distribution stage.
Installing a single node (test) cluster on CentOS 7.5 with CDH Express 5.15.
The solution that worked for me was adding the node IP and FQDN to /etc/hosts (previously it only contained entries for 127.0.0.1 localhost):
[root@mynode ~]# vi /etc/hosts
192.168.1.1 myhostname.mydomain
Then restart the Cloudera SCM agent:
[root@mynode ~]# service cloudera-scm-agent restart
Installation then continued successfully.
Do the following:
Stop all services.
Deactivate all in-use parcels.
Shut down the Cloudera Manager Agent on all hosts.
Move the existing parcels to the new location.
Configure the host parcel directory.
Start the Cloudera Manager Agents.
Activate the parcels.
Start all services.
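A rough sketch of the "move the parcels" and "configure the parcel directory" steps above, on one host (the new path is an assumption for illustration; adjust to your layout):
$ sudo service cloudera-scm-agent stop
$ sudo mv /opt/cloudera/parcels /data/cloudera/parcels
$ sudo vi /etc/cloudera-scm-agent/config.ini    # set: parcel_dir=/data/cloudera/parcels
$ sudo service cloudera-scm-agent start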
Delete the corresponding parcel package, including its .torrent file, from the folder below, then download and distribute again:
/opt/cloudera/parcels/.flood/
This happens because the .torrent file is corrupted.
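On the affected host that might look something like this (the parcel file name is taken from the logs above; the wildcard also catches the .torrent file):
$ sudo rm /opt/cloudera/parcels/.flood/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel*
Then trigger Download and Distribute again from the Parcels page in Cloudera Manager.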

Upgrade MariaDB Cluster 10.1 to 10.2

I'm planning to upgrade a MariaDB Galera cluster from 10.1 to 10.2. Does anyone have details on the steps to upgrade? My idea is something like:
Backup
Shut down the cluster
Uninstall 10.1 from each node
Install 10.2 on each node
Run mysql_upgrade on the node that is going to be started first
Configure the first node and start it
Configure the rest of the nodes and start them
I have a three-node cluster with MaxScale load balancing.
You can upgrade the cluster in a rolling fashion, i.e. one node at a time, without shutting down the others. That is one of the benefits of a Galera cluster.
Make sure to avoid 10.2.9, or be ready to edit mysqld_safe; see here.
For each node:
maxadmin: set server $node-name maintenance
Backup databases and config files
Shut down the mysqld instance
Uninstall 10.1. On Red Hat use rpm -e --nodeps rather than yum remove, to avoid uninstalling packages such as postfix and cronie.
Install 10.2
Copy back the config files, changing any mariadb-10.1 sections to mariadb-10.2
Start the mysqld instance
If you're on Red Hat, CentOS or Fedora, run mysql_upgrade
maxadmin: clear server $node-name maintenance
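Put together, one pass of the loop might look like this on Red Hat / CentOS (a sketch; the MaxScale server name, backup path and package set are assumptions):
$ maxadmin set server db-node-1 maintenance
$ mysqldump --all-databases --routines > /backup/pre-10.2.sql    # plus copy /etc/my.cnf.d
$ sudo systemctl stop mariadb
$ sudo rpm -e --nodeps MariaDB-server MariaDB-client
$ # switch the MariaDB yum repo from 10.1 to 10.2, then:
$ sudo yum install MariaDB-server MariaDB-client
$ sudo systemctl start mariadb
$ mysql_upgrade
$ maxadmin clear server db-node-1 maintenance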

RabbitMQ Erlang distribution failed

I have two Windows Server 2012 R2 machines located in one of the client's datacenters. Both servers are domain-joined, and both have RabbitMQ 3.6.0 installed, running as a Windows service. I've been trying to cluster these two machines for a long time now without success. I always get the following error when I try to cluster them.
On the first machine, nodeA, I run the command 'rabbitmqctl join_cluster rabbit@nodeB'. This is what I get:
Clustering node 'rabbit@nodeA' with 'rabbit@nodeB' ...
Error: unable to connect to nodes ['rabbit@nodeB']: nodedown
DIAGNOSTICS
===========
attempted to contact: ['rabbit@nodeB']
rabbit@nodeB:
* connected to epmd (port 4369) on nodeB
* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed
* suggestion: hostname mismatch?
* suggestion: is the cookie set correctly?
* suggestion: is the Erlang distribution using TLS?
current node details:
- node name: 'rabbitmq-cli-3892@nodeA'
- home dir: C:\Users\mydirectory
- cookie hash: l+SSu57+cRyAQ03AJdwAbQ==
I've tried this setup with Azure Virtual Machines within an Azure Virtual Network and succeeded in clustering the two VMs; however, it seems I cannot connect these two (the customer's machines) together.
This is what I have done and ensured:
There isn't any firewall blocking connections
Added host names to the hosts file located in C:\Windows\system32\drivers\etc
Tried to refer to host names as FQDN without adding anything to hosts file
Tried to refer to host names with CAPITAL letters and without
Copied the same exact .erlang.cookie to C:\Windows and C:\Users\mydirectory on both machines.
I've read, understood and applied RabbitMQ Clustering Guide https://www.rabbitmq.com/clustering.html
Stopped, restarted, reinstalled RabbitMQ on both machines.
It seems I can't get it to work. On the Azure machines, which were not domain-joined, clustering worked beautifully. I am really running out of options... Any help?
I had the same problem. You need to install RabbitMQ as an admin: uninstall, then reinstall as admin, and it should work fine.
Try to connect to each of the RabbitMQ nodes via a remote shell and check whether the value of the cookie is the same (the cookie can be set in 3 different ways; .erlang.cookie is one of them):
erl -remsh 'rabbitmq-cli-3892@nodeA' -name 'test@nodeA'
erlang:get_cookie().
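On Windows you can also compare the cookie files directly; note the service usually reads the copy in the home directory of the account it runs under, which for Local System is C:\Windows (paths from the question above):
> type C:\Windows\.erlang.cookie
> type C:\Users\mydirectory\.erlang.cookie
The values must be identical on both machines; the cookie hash printed in the DIAGNOSTICS output is derived from the cookie the CLI tool read, so comparing that hash across the two machines is another quick check.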

Windows cluster - SSH seems to be failing

Two physical systems, each is running Server 2008
Installed DataStax Community (version 2.0.7, 64-bit) on each (that is the version number in the DataStax package I downloaded, according to the file name).
OpsCenter running locally shows a running 1-node cluster. I can execute I/O on the system at the command line (using cassandra-stress).
The system names are "5017-cassandra-1" and "5017-cassandra-2"
I'd like to create a cluster in which both nodes participate. This is not a production environment (I'm just trying to learn).
From OpsCenter on 5017-cassandra-1 I go to Nodes (I see 1 node, of course), then Add Nodes.
I leave the "Package" drop-down as default (but the latest version shown in the drop-down is 2.0.6), enter the IP address of 5017-cassandra-2, add the Administrator user name and password in the "Node Credentials (sudo)" fields, and press "Add Nodes", and get:
Error provisioning cluster: Unable to SSH to some of the hosts
Unable to SSH to 10.108.14.224:
global name 'get_output' is not defined
Reading that I needed to add OpenSSL, I installed the runtime redistributables (on both systems) and Win64 OpenSSL-1_0_1h.
The error persists.
Any suggestions or a link to a step-by-step guide would be appreciated.

Error in Cloudera Cluster installation process?

I have installed Cloudera Manager successfully. It shows "Currently Managed Hosts" as 127.0.0.1, and it is active.
When I search for and install a cluster using Cloudera Manager, it shows the following error after it loads.
Installation failed. Failed to receive heartbeat from agent.
Ensure that the host's hostname is configured properly.
Ensure that port 7182 is accessible on the Cloudera Manager server (check firewall rules).
Ensure that ports 9000 and 9001 are free on the host being added.
Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details).
The following image clearly shows the problem while installing my cluster on Cloudera Manager.
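(For reference, each of the checks in that error can be run from a shell on the host being added; the Cloudera Manager hostname below is a placeholder:)
$ hostname -f                             # should print this host's FQDN
$ telnet cm-server.example.com 7182       # is the Cloudera Manager port reachable?
$ sudo netstat -tlnp | grep ':900[01] '   # is anything already bound to 9000/9001?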
I had a similar problem, and it turned out the issue was that I had (unfortunately) conveniently skipped the ...password-less SSH key... step.
After several hours of breaking my head over it, I realised this.
At the terminal, run:
ls -al ~/.ssh
You should see files like:
abc
abc.pub
These are your public/private key pairs (not necessarily with the same names as mine above); the file name is the one you used in the "Setting up SSH public/private keys" step for your machine.
You need to copy the data in abc.pub to a file named authorized_keys in this same folder. If it's not there, create authorized_keys.
In case you don't have a public/private key pair, see here.
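In shell terms the fix is just the following (key names taken from above; the ssh-keygen step is only needed if you have no key pair at all):
$ ssh-keygen -t rsa                              # only if you have no key pair yet
$ cat ~/.ssh/abc.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys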
For Ubuntu, the problem is usually the "127.0.1.1 ubuntu" association in your /etc/hosts file. For me, after changing it to "127.0.0.1 ubuntu", which is the standard local loopback, I could add the cluster successfully. Hope this helps!
I was struggling with this problem for two days. Fixing /etc/hosts as suggested by "khoadoan" worked for me.
/etc/hosts was looking like this when I had the problem
127.0.0.1 localhost
127.0.1.1 ubuntu
I changed it like this:
127.0.0.1 localhost
127.0.0.1 ubuntu
Restarted the machine.
sudo init 6
Launched the Cloudera Manager Admin page. This time the host status was already showing "Managed = Yes", and I got an additional tab, "Currently Managed Hosts (1)", where the local host was listed.
