Cloudera install agent issue - hadoop

I have already installed CM6 and want to install the Cloudera Manager Agent from a custom repository and CDH6 using packages.
(I work with only one host)
I have files for cloudera manager agent in directory /cloudera/cloudera-repo/cm6/6.0.1 and for CDH6 in directory /cloudera/cloudera-repo/cdh6/6.0.1
My steps for Cloudera Manager Agent:
Custom repository -> choose http://ip_addr/cloudera/cloudera-repo/cm6/6.0.1
For CDH and other software:
Install Method -> Use Packages
CDH Version -> CDH6
CDH Minor Version -> choose http://ip_addr/cloudera/cloudera-repo/cdh6/6.0.1
And on the Install Agents page I get this error:
Failed to copy installation files
/tmp/scm_prepare_node.xpsM8dvM
Connection refused (Connection refused)
I get the same error even when I specify empty directories. Why?

From the error, it seems that you have not provided the proper credentials to connect to your host; the SSH credentials seem to be incorrect. If you are sure the SSH credentials are fine, then it is a firewall issue. You need to make sure that all the required ports are open and that nothing blocks Cloudera from installing the agent.
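As a quick sanity check (ip_addr stands for your host, as in the repository URLs above; the root user and default ports are assumptions, so adjust them to your setup), confirm SSH and the Cloudera Manager port are reachable before re-running the wizard:
ssh -p 22 root@ip_addr 'echo ssh ok'   # the wizard copies installation files over SSH (port 22)
nc -zv ip_addr 22                      # confirm the SSH port is reachable at all
systemctl status firewalld             # see whether a host firewall is blocking it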

Related

Apache Zeppelin configuration for connecting to Hive on HDP Virtualbox

I've been struggling with the Apache Zeppelin notebook version 0.10.0 setup for a while.
The idea is to be able to connect it to a remote Hortonworks 2.6.5 server that runs locally on Virtualbox in Ubuntu 20.04.
I am using an image downloaded from:
https://www.cloudera.com/downloads/hortonworks-sandbox.html
Of course, the image has pre-installed Zeppelin, which works fine on port 9995, but it is an old 0.7.3 version that doesn't support the Helium plugins I would like to use. I know that HDP version 3.0.1 ships the updated Zeppelin 0.8, but using it is impossible at the moment due to my hardware resources. Additionally, from what I remember, enabling the Leaflet Map plugin was a problem there as well.
My first thought was to update the notebook on the server, but after updating according to the instructions on the Cloudera forums (unfortunately they are not working at the moment, so I cannot provide a link or check any other solution) it failed to start correctly.
A simpler solution now seemed to be connecting the newer notebook version to the virtual server, but despite many attempts and solutions from threads here with various configurations, I was not able to connect to Hive via JDBC. I am also using Zeppelin with local Spark 3.0.3, but I have some geodata in Hive that I would like to visualize this way.
I used, among others, the description on the Zeppelin website:
https://zeppelin.apache.org/docs/latest/interpreter/jdbc.html#apache-hive
This is my current JDBC interpreter configuration:
hive.driver org.apache.hive.jdbc.HiveDriver
hive.url jdbc:hive2://sandbox-hdp.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
hive.user hive
Artifact org.apache.hive:hive-jdbc:3.1.2
Depending on the driver version, there were different errors, but this time after typing:
%jdbc(hive)
SELECT * FROM mydb.mytable;
I get the following error:
Could not open client transport for any of the Server URI's in
ZooKeeper: Could not establish connection to
jdbc:hive2://sandbox-hdp.hortonworks.com:10000/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;hive.server2.proxy.user=hive;?tez.application.tags=paragraph_1645270946147_194101954;mapreduce.job.tags=paragraph_1645270946147_194101954;:
Required field 'client_protocol' is unset!
Struct:TOpenSessionReq(client_protocol:null,
configuration:{set:hiveconf:mapreduce.job.tags=paragraph_1645270946147_194101954,
set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000,
hive.server2.proxy.user=hive, use:database=default,
set:hiveconf:tez.application.tags=paragraph_1645270946147_194101954})
I will be very grateful to everyone for any help. Regards.
So, after many hours and trials, here's a working solution. First of all, the most important thing is to use drivers that match your version of Hadoop. You need jar files like 'hive-jdbc-standalone' and 'hadoop-common' in their respective versions, and to avoid adding all of them in the 'Artifact' field of the %jdbc interpreter in Zeppelin, it is best to use a single file containing all the required dependencies.
Thanks to Tim Veil, it is available in his GitHub repository below:
https://github.com/timveil/hive-jdbc-uber-jar/
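A minimal sketch of that step, assuming Zeppelin is installed under /opt/zeppelin as in the settings below (the jar name matches the HDP 2.6.5 sandbox; pick the release that matches your Hadoop version from the repository's Releases page):
# after downloading the uber jar from the Releases page, place it where the
# interpreter dependency below points to
cp hive-jdbc-uber-2.6.5.0-292.jar /opt/zeppelin/interpreter/jdbc/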
This is my complete Zeppelin %jdbc interpreter settings:
default.url jdbc:postgresql://localhost:5432/
default.user gpadmin
default.password
default.driver org.postgresql.Driver
default.completer.ttlInSeconds 120
default.completer.schemaFilters
default.precode
default.statementPrecode
common.max_count 1000
zeppelin.jdbc.auth.type SIMPLE
zeppelin.jdbc.auth.kerberos.proxy.enable false
zeppelin.jdbc.concurrent.use true
zeppelin.jdbc.concurrent.max_connection 10
zeppelin.jdbc.keytab.location
zeppelin.jdbc.principal
zeppelin.jdbc.interpolation false
zeppelin.jdbc.maxConnLifetime -1
zeppelin.jdbc.maxRows 1000
zeppelin.jdbc.hive.timeout.threshold 60000
zeppelin.jdbc.hive.monitor.query_interval 1000
hive.driver org.apache.hive.jdbc.HiveDriver
hive.password
hive.proxy.user.property hive.server2.proxy.user
hive.splitQueries true
hive.url jdbc:hive2://sandbox-hdp.hortonworks.com:10000/default
hive.user hive
Dependencies
Artifact
/opt/zeppelin/interpreter/jdbc/hive-jdbc-uber-2.6.5.0-292.jar
The next step is to go to Ambari at http://localhost:8080/ and log in as admin. To do that, you must first log in to the Hadoop root account via SSH:
ssh root@127.0.0.1 -p 2222
root@127.0.0.1's password: hadoop
After a successful login you will be prompted to change your password immediately; please do that, and then set the Ambari admin password with the command:
[root@sandbox-hdp ~]# ambari-admin-password-reset
After that you can use the admin account in Ambari (log in and click the Hive link in the left panel):
Ambari -> Hive -> Configs -> Advanced -> Custom hive-site
Click Add Property
Insert the following into the window that opens:
hive.security.authorization.sqlstd.confwhitelist.append=tez.application.tags
After saving, restart all Hive services in Ambari. Everything should now work, provided you set the proper Java path in 'zeppelin-env.sh' and the port in 'zeppelin-site.xml' (you must copy and rename 'zeppelin-env.sh.template' and 'zeppelin-site.xml.template' in the Zeppelin conf directory; remember that Ambari also uses port 8080!).
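A rough sketch of that configuration step (the 9999 port is only an example, any free port other than Ambari's 8080 will do, and the JAVA_HOME path is an assumption for an Ubuntu host):
cd /opt/zeppelin/conf
cp zeppelin-env.sh.template zeppelin-env.sh
cp zeppelin-site.xml.template zeppelin-site.xml
# in zeppelin-env.sh:   export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# in zeppelin-site.xml: set zeppelin.server.port to e.g. 9999 instead of 8080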
In my case, the only thing left to do is add or uncomment the fragment responsible for the Helium plug-in repository (in 'zeppelin-site.xml'):
<property>
<name>zeppelin.helium.registry</name>
<value>helium,https://s3.amazonaws.com/helium-package/helium.json</value>
<description>Enable helium packages</description>
</property>
Now you can go to the Helium tab in the top right corner of the Zeppelin sheet and install the plugins of your choice; in my case it is the 'zeppelin-leaflet' visualization. And voilà! Here is a sample visualization in Hive from this Kaggle dataset:
https://www.kaggle.com/kartik2112/fraud-detection
Have a nice day!

ambari 2.7.4 Installation Wizard problems

Ambari is built. The network on the virtual machines is set up. I am trying to install a cluster with the installation wizard of the Ambari UI, but I could not get past "Get Started" to "Select version".
There is this error in the logs:
Could not load repo results
java.io.IOException: Server returned HTTP response code: 403 for URL: http://s3.amazonaws.com/dev.hortonworks.com/HDP/hdp_urlinfo.json
I found a question with the same problem, which was not resolved.
(Screenshot from the UI omitted.)
It looks like you missed some steps and prerequisites before installing Ambari. 127.0.0.1 should not be used to access the UI; the Ambari docs require you to use FQDNs for all nodes and hosts.
Additionally, the 403 error above results from using versions of Ambari/HDP that Cloudera has moved behind a paywall. A username/password is now required to access these assets.
You should try with Ambari 2.7.4 and repos and artifacts that are not behind a paywall.
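As a quick check of the FQDN prerequisite (generic commands; run them on every node before starting the wizard):
hostname -f                    # should print the node's full FQDN, not localhost
getent hosts $(hostname -f)    # should resolve to the node's real IP, not 127.0.0.1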

How to safely fix an AWOL ambari system user?

I'm a student working on a test cluster consisting of around 25 hosts. We installed using Ambari and have FreeIPA running on one host as a DNS and LDAP server; the rest are typical Hadoop infrastructure. Hive was failing, and I wondered whether the DB connection parameters used during the Ambari installation were incorrect, so I tried to find a way to re-run the DB connection process. I didn't get anywhere, and it was late, so I left it with the Ambari interface still working.
The next morning, the Ambari web UI seemed to be down. I thought that maybe the web server needed to be restarted, so I tried the following:
[akidd@dw ~]$ sudo ambari-server start
Using python /usr/bin/python
Starting ambari-server
ERROR: Exiting with exit code 1.
REASON: Unable to detect a system user for Ambari Server.
- If this is a new setup, then run the "ambari-server setup" command to create the user
- If this is an upgrade of an existing setup, run the "ambari-server upgrade" command.
Refer to the Ambari documentation for more information on setup and upgrade.
Can anyone help me to understand what could have happened?
If I run ambari-server setup, will the existing cluster be OK, assuming I create everything like-for-like with how it was originally?
Thanks for your help!
@user3535074 You should try to start it with the user that installed it.
If you do run ambari-server setup as the current user, remember to choose No for the following options:
Customize user account for ambari-server daemon [y/n] (n)? n
Do you want to change Oracle JDK [y/n] (n)? n
Enter advanced database configuration [y/n] (n)? n
More info in the following post, including how to back up the Ambari database before running setup again:
https://community.cloudera.com/t5/Support-Questions/Ambari-server-failed-to-start-after-system-reboot-Below-is/td-p/203806
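A minimal sketch of that backup, assuming the default embedded PostgreSQL database that Ambari sets up (the database name and user are the Ambari defaults; adjust if you configured an external database):
# run on the Ambari server host before re-running ambari-server setup
pg_dump -U ambari ambari > /tmp/ambari-db-backup.sql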

CDH cluster installation failing in "distributing" stage - failed due to stall on seeded torrent

Hi,
We are trying to install a CDH cluster on a remote Red Hat 7 server using the cloudera-installer.bin file, in standalone mode (we have only 1 host). We specify the hostname/IP address of the machine during installation and it is able to resolve it, but the installation halts during the parcel distribution stage. Here are the logs of cloudera-scm-agent (we tried both the Cloudera Express edition and the Enterprise trial version too):
['http://INHUSZ1-V250152:7180/cmf/parcel/download/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel'] location=/opt/cloudera/parcels/.flood/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel progress=0]
[03/Oct/2018 10:11:55 +0000] 28315 Thread-13 downloader INFO Current state: CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel [totalDownloaded=0 totalSize=2120090032 upload=0 state=downloading seed=['http://INHUSZ1-V250152:7180/cmf/parcel/download/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel'] location=/opt/cloudera/parcels/.flood/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel progress=0]
[03/Oct/2018 10:11:57 +0000] 28315 Thread-13 downloader INFO Current state: CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel [totalDownloaded=0 totalSize=2120090032 upload=0 state=downloading seed=['http://INHUSZ1-V250152:7180/cmf/parcel/download/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel'] location=/opt/cloudera/parcels/.flood/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel progress=0]
[03/Oct/2018 10:11:59 +0000] 28315 Thread-13 downloader INFO Current state: CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel [totalDownloaded=0 totalSize=2120090032 upload=0 state=downloading seed=['http://INHUSZ1-V250152:7180/cmf/parcel/download/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel'] location=/opt/cloudera/parcels/.flood/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel progress=0]
Please let us know what can be done.
I just had the same error message and stall during the install at the parcel distribution stage.
Installing a single node (test) cluster on CentOS 7.5 with CDH Express 5.15.
The solution that worked for me was adding the node IP and FQDN to /etc/hosts (previously it only contained entries for 127.0.0.1 localhost):
[root@mynode ~]# vi /etc/hosts
192.168.1.1 myhostname.mydomain
Then I restarted the Cloudera SCM Agent:
[root@mynode ~]# service cloudera-scm-agent restart
Installation then continued successfully.
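As a quick sanity check (the hostname is the example from above), you can confirm the entry resolves to the LAN address rather than 127.0.0.1 before retrying the distribution:
getent hosts myhostname.mydomain
ping -c 1 myhostname.mydomain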
Do the following:
Stop all services.
Deactivate all in-use parcels.
Shut down the Cloudera Manager Agent on all hosts.
Move the existing parcels to the new location.
Configure the host parcel directory.
Start the Cloudera Manager Agents.
Activate the parcels.
Start all services.
Delete the corresponding parcel package, including the .torrent file, from the folder below:
/opt/cloudera/parcels/.flood/
Then download and distribute again.
This happens because the .torrent file is corrupted.
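A rough sketch of that cleanup (the parcel file name is taken from the log above; match it to whichever parcel is stalled in your agent log):
service cloudera-scm-agent stop
rm -f /opt/cloudera/parcels/.flood/CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel*
service cloudera-scm-agent start
# then re-run Download / Distribute from the Cloudera Manager Parcels page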

Jenkins doesn't recognize a slave being down and thus does not allow it to reconnect

We have a Jenkins instance running on Ubuntu that has several slaves in different systems. One of them is a Windows 7 host, having jenkins slave instance configured as a service.
We have a problem that when that machine is rebooted, the Jenkins master doesn't realize it's gone; it still looks fine in the nodes view. Then, when a build is issued that is supposed to use that slave, it gets stuck. If that build is stopped, the next build fails immediately:
Caused by: java.util.concurrent.TimeoutException: Ping started at 1457016721684 hasn't completed by 1457016961684
... 2 more
[EnvInject] - [ERROR] - SEVERE ERROR occurs: channel is already closed
When the slave has started up and tries to connect back to the master, the connection is refused, and in the logs there is an error saying a connection with that name already exists:
Server didn't accept the handshake: xxx is already connected to this master. Rejecting this connection.
There is issue JENKINS-5055, which claims a fix was committed allowing the same JNLP slave to reconnect without getting rejected (apparently this commit), and according to the changelog it was introduced in version 1.396 (2011/02/02). We are, however, using version 1.639 and still seeing this. Somebody else seems to be seeing it as well. Looking at the current codebase, I can see where the error is coming from, but I don't see the fix done in JENKINS-5055.
Any ideas on resolving this?
Edit: also asked on jenkins user mailing list, but no responses.
We faced the same issue and used https://wiki.jenkins-ci.org/display/JENKINS/slave-status as a workaround.
Reinstalling the slave on a Windows Server 2012 R2 machine shows no signs of this behavior, so it seems that either a mistake was made during the installation steps or this is something caused by using a workstation Windows version.
Regardless, here were the steps to get it working, assuming a brand new installation of Windows, with no network connectivity, and master instance using a self-signed certificate:
Install the JRE on the machine. If you have a 64-bit operating system, install both the 32-bit and 64-bit versions; otherwise go with 32-bit. Download link here
Install .NET 3.5 on the machine. This is needed by the Jenkins service. You can follow the steps outlined in my other answer for this.
Install Jenkins using the Windows installer (.zipped) to C:\Jenkins. It can be downloaded from here.
Check that your installation is responding by navigating to http://localhost:8080. In case of trouble, check the logs in the Jenkins folder. If there is a port conflict, edit jenkins.xml and change the httpPort to something else.
From the Windows computer, navigate to your master Jenkins and configure a new node there.
Start a slave agent using Java Launch Agent in the configure -> node screen (you still need to be using your Windows slave computer)
You should see a visible window opening. From there, select File -> Install as a service. (details and screenshots) If you experience an error without a proper explanation, confirm .NET 3.5 is properly installed. If you see "WMI.WmiException: AccessDenied", save the .jnlp file locally and start it from an administrator prompt or otherwise with elevated privileges (details).
From Administrative Tools -> Services, stop and disable the Jenkins service, and stop the Jenkins Slave Agent but leave it set to Automatic so it will start when the computer starts up.
This is only relevant if you're using a self-signed or otherwise problematic certificate:
download the previously mentioned Java Launch Agent file (.jnlp file) again and save it to C:\jenkins
open c:\jenkins\jenkins-slave.xml in your editor
change it to refer to your local .jnlp file by changing the jnlp url parameter (file:/C:/jenkins/jenkins-slave.jnlp)
add -noCertificateCheck to the parameters
replace the -secret parameter with -auth "user:pass", since otherwise automatic URL GET parameters will be added, which will break finding the .jnlp file
Start the Jenkins Slave Agent service again
For problems with the Jenkins slave service, check jenkins-slave.err.log. On Windows Server 2012 R2, you can get tail-like functionality with Get-Content .\jenkins-slave.err.log -Wait -Tail 10 in a PowerShell prompt. For older versions of PowerShell, leave out -Tail 10.
