Hadoop Ambari cannot confirm hosts

I tried to use Ambari to manage the installation and maintenance of a Hadoop cluster.
After starting the Ambari server, I used the web page to set up the Hadoop cluster.
But at the third step, Confirm Hosts, an error appeared.
When I checked the log in /var/log/ambari-server, I found:
INFO:root:BootStrapping hosts ['qiao'] using /usr/lib/python2.6/site-packages/ambari_server cluster primary OS: redhat6 with user 'root' sshKey File /var/run/ambari-server/bootstrap/1/sshKey password File null using tmp dir /var/run/ambari-server/bootstrap/1 ambari: master; server_port: 8080; ambari version: 1.4.1.25
INFO:root:Executing parallel bootstrap
ERROR:root:ERROR: Bootstrap of host qiao fails because previous action finished with non-zero exit code (1)
INFO:root:Finished parallel bootstrap

Did you provide the SSH RSA private key as a file, or paste it in?
Also, from the host you are installing from, make sure you can SSH to every host without typing a password.
If you still get the same error, try:
ambari-server reset
ambari-server setup
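To check the passwordless-SSH requirement from the Ambari host for each target, here is a minimal Python sketch (not part of Ambari). It runs ssh with OpenSSH's BatchMode=yes option, which makes ssh fail instead of prompting for a password. The function names and the default user "root" (taken from the log above) are illustrative assumptions.

```python
import subprocess
from typing import List, Optional


def build_ssh_check(host: str, user: str = "root", key_file: Optional[str] = None) -> List[str]:
    """Build an ssh command that exits non-zero instead of asking for a password."""
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5"]
    if key_file:
        cmd += ["-i", key_file]  # use the same private key you give to Ambari
    cmd += [f"{user}@{host}", "true"]  # run a no-op command on the remote host
    return cmd


def can_ssh_without_password(host: str, user: str = "root", key_file: Optional[str] = None) -> bool:
    """Return True if ssh logs in and runs the no-op without any interactive input."""
    try:
        result = subprocess.run(
            build_ssh_check(host, user, key_file),
            stdin=subprocess.DEVNULL,
            capture_output=True,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # ssh missing, or the connection hung
    return result.returncode == 0
```

For the host in the log, can_ssh_without_password("qiao") should return True before you retry the Confirm Hosts step; False means a password prompt, a rejected key, or an unreachable host.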

Please restart ambari-server:
ambari-server restart
and then try accessing Ambari again.
It should work.

Make sure you can ssh to every single host on the list, including all master hosts.
To do this, ensure that the Ambari host's .ssh/id_rsa.pub entry is included in every host's .ssh/authorized_keys file. Then SSH from the Ambari host to every single server and check whether it asks for your password. You can use a tutorial like http://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/ to check that everything has been done properly.
You need to do the same for the Ambari host itself if you added it to the hosts list.
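If you want to verify the authorized_keys part without logging in interactively, a small Python sketch like the one below compares the key type and base64 blob from id_rsa.pub against each authorized_keys line. It ignores the trailing comment (which often differs between hosts) and tolerates leading options such as from="..."; the function names are hypothetical.

```python
from typing import Tuple


def key_fields(pub_line: str) -> Tuple[str, str]:
    """Return (key type, base64 blob) from an id_rsa.pub-style line, dropping the comment."""
    parts = pub_line.split()
    if len(parts) < 2:
        raise ValueError("not a public key line")
    return parts[0], parts[1]


def is_authorized(pub_line: str, authorized_keys_text: str) -> bool:
    """True if the same key (type + blob) appears on some authorized_keys line."""
    key_type, blob = key_fields(pub_line)
    for raw in authorized_keys_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = line.split()
        # Options may precede the key, so look for the (type, blob) pair anywhere on the line.
        for i in range(len(tokens) - 1):
            if tokens[i] == key_type and tokens[i + 1] == blob:
                return True
    return False
```

You would call it with the text of the Ambari host's ~/.ssh/id_rsa.pub and the text of each target host's ~/.ssh/authorized_keys.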

Related

How to safely fix an AWOL ambari system user?

I'm a student working on a test cluster of around 25 hosts. We installed it using Ambari and have FreeIPA running on one host as a DNS and LDAP server. The rest are typical Hadoop infrastructure. Hive was failing, and I wondered whether the database connection parameters used during the Ambari installation were incorrect, so I tried to find a way to re-run the database connection setup. I didn't get anywhere, and since it was late I left it, with the Ambari interface still working.
The next morning, the Ambari web UI seemed to be down. I thought the web server might need restarting, so I tried the following:
[akidd@dw ~]$ sudo ambari-server start
Using python /usr/bin/python
Starting ambari-server
ERROR: Exiting with exit code 1.
REASON: Unable to detect a system user for Ambari Server.
- If this is a new setup, then run the "ambari-server setup" command to create the user
- If this is an upgrade of an existing setup, run the "ambari-server upgrade" command.
Refer to the Ambari documentation for more information on setup and upgrade.
Can anyone help me understand what might have happened?
If I run ambari-server setup, will the existing cluster be OK, assuming I configure everything exactly as it was originally?
Thanks for your help!
@user3535074 You should try starting it as the user that installed it.
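To see which user that was, you can look at the server's properties file. The sketch below assumes the daemon user is recorded under the ambari-server.user key in /etc/ambari-server/conf/ambari.properties; treat both the key and the path as assumptions to confirm against your own installation. The parser is deliberately minimal (key=value lines only, no line continuations).

```python
from typing import Dict, Optional


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal key=value parser: skips blank lines and #/! comments, no continuations."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def ambari_server_user(text: str) -> Optional[str]:
    """Return the configured daemon user, or None if the (assumed) key is missing or empty."""
    return parse_properties(text).get("ambari-server.user") or None
```

A None result would be consistent with the "Unable to detect a system user" message above.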
If you do run ambari-server setup as the current user, remember to answer No to the following prompts:
Customize user account for ambari-server daemon [y/n] (n)? n
Do you want to change Oracle JDK [y/n] (n)? n
Enter advanced database configuration [y/n] (n)? n
More info is in the following post, including how to back up the Ambari database before running setup again:
https://community.cloudera.com/t5/Support-Questions/Ambari-server-failed-to-start-after-system-reboot-Below-is/td-p/203806

Hadoop 2.6.4 Web UI Time Out

I installed Hadoop 2.6.4 on AWS across 4 instances: 1 NameNode, 1 Secondary NameNode, and 2 slaves. After the installation completed, I tried to open the NameNode Web UI at ec2-52-90-242-76.compute-1.amazonaws.com:50070, but the request times out. Can anybody help?
If you are accessing it from your own machine, you need to add the IP address and hostname to your hosts file, or open IP_address:50070 directly.
Also check the following:
Check whether the firewall is on or off (recommended: off)
Check the iptables service status (recommended: stopped)
Check SELinux (recommended: disabled)
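To tell these causes apart from the client side, it helps to know whether the connection times out or is refused. A timeout usually means packets are being dropped (a firewall, iptables, or on AWS the security group), while "refused" usually means the host answered but nothing is listening on that port. A minimal Python sketch, with a hypothetical function name:

```python
import socket


def probe_port(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as open, refused, timeout, or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: packets are probably being dropped on the way.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        # DNS failure, unreachable network, and similar errors.
        return "error"
```

For the question above you would run probe_port("ec2-52-90-242-76.compute-1.amazonaws.com", 50070) from your own machine, and probe_port("localhost", 50070) on the NameNode itself.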

Unable to get Mesos to run from tutorial: Setting up a Single Node Mesosphere Cluster

I have been following Mesosphere's official tutorial to set up a single-node Mesosphere cluster:
http://mesosphere.com/docs/getting-started/developer/single-node-install/
I followed all the commands without any issues, and I also added ports 5050 and 8080 to my security group. When I try to access the console for Mesos/Marathon, I get an "Internet Explorer cannot display the webpage" message.
They also recommend checking it the following way:
MASTER=$(mesos-resolve `cat /etc/mesos/zk`)
mesos-execute --master=$MASTER --name="cluster-test" --command="sleep 5"
But that comes up with an error:
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0106 17:03:08.126703 20993 process.cpp:1561] Failed to initialize, gethostbyname2: Unknown host
*** Check failure stack trace: ***
I am not really sure how to troubleshoot this either, and I could not find many tutorials on installing Mesos on Ubuntu.
I checked the contents of the zk file; it seems to be the default value.
$ cat /etc/mesos/zk
zk://localhost:2181/mesos
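Every host named in this zk:// URL has to resolve for mesos-resolve and mesos-execute to work. The following Python sketch (an illustration, not a Mesos tool) splits such a URL into its ZooKeeper servers and znode path so you can check each host individually; falling back to port 2181, ZooKeeper's usual client port, when none is given is an assumption of the sketch.

```python
from typing import List, Tuple


def parse_zk_url(url: str) -> Tuple[List[Tuple[str, int]], str]:
    """Split zk://[user:pass@]host:port[,host:port...]/path into (servers, path)."""
    prefix = "zk://"
    if not url.startswith(prefix):
        raise ValueError(f"not a zk:// URL: {url!r}")
    rest = url[len(prefix):].strip()
    hosts_part, slash, path = rest.partition("/")
    if "@" in hosts_part:
        hosts_part = hosts_part.rsplit("@", 1)[1]  # drop optional credentials
    servers: List[Tuple[str, int]] = []
    for item in hosts_part.split(","):
        host, sep, port = item.rpartition(":")
        if not sep:
            host, port = item, "2181"  # assumed default ZooKeeper client port
        servers.append((host, int(port)))
    return servers, ("/" + path if slash else "/")
```

For the default file shown above, this yields a single server, localhost on port 2181, with path /mesos.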
I would really appreciate any clues on how to go about this one.
Edit: for what it's worth, the processes are definitely running:
root 31545 8.5 5.9 187464 35604 ? Ssl 17:28 0:00 /usr/local/sbin/mesos-slave --master=zk://localhost:2181/mesos --log_dir=/var/log/mesos
root 31563 28.5 2.1 116304 12856 ? Rs 17:28 0:00 /usr/local/sbin/mesos-master --zk=zk://localhost:2181/mesos --port=5050 --log_dir=/var/log/mesos --quorum=1 --wo
Mesos uses gethostbyname2 to resolve hostnames to IPs. The first thing I would recommend is to try "ping localhost" and "ping hostname" and verify that there are no strange settings in /etc/hosts. If you're setting up a multi-node cluster, I'd recommend that the hostname map to the public IP address (not 127.0.x.1).
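Alongside ping, a quick way to see what a name resolves to, and whether that address is a loopback one, is the Python sketch below. Python's socket.gethostbyname goes through the system resolver (typically /etc/hosts, then DNS), so it is a rough stand-in for the lookup Mesos performs, not an exact replica of gethostbyname2; the function name is hypothetical.

```python
import ipaddress
import socket


def hostname_report(name: str) -> str:
    """Describe what a hostname resolves to via the system resolver."""
    try:
        addr = socket.gethostbyname(name)  # IPv4 lookup through the system resolver
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return f"{name}: does not resolve ({exc})"
    kind = "loopback" if ipaddress.ip_address(addr).is_loopback else "non-loopback"
    return f"{name} -> {addr} ({kind})"
```

Running print(hostname_report(socket.gethostname())) on the master shows whether the machine's own hostname resolves at all, and whether it lands on 127.0.x.1 (anything in 127.0.0.0/8 counts as loopback).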
If that doesn't help, you can try setting the --ip and --hostname flags when starting mesos-master and mesos-slave, to bypass the gethostbyname2 resolution. These can also be set by writing to the file-based parameters, e.g. /etc/mesos/mesos-master/ip
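The file-based parameters mentioned here are one file per flag, named after the flag and containing its value. The helper below is a hedged Python sketch that writes such a file; the directory is an argument because the exact location (the answer gives /etc/mesos/mesos-master/ip) may depend on how Mesos was packaged on your system, and the function name is hypothetical.

```python
from pathlib import Path


def write_flag(conf_dir: str, flag: str, value: str) -> Path:
    """Write one flag as a file named after it, containing only its value."""
    path = Path(conf_dir) / flag
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(value.strip() + "\n")
    return path
```

For example, write_flag("/etc/mesos/mesos-master", "ip", "10.0.0.5") as root, with your node's real address in place of the made-up 10.0.0.5, and similarly for hostname; the service presumably needs a restart to pick the value up.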
For additional troubleshooting, try running wget http://localhost:5050 (or curl -L) from the Mesos master to verify that it is locally visible. Also try wget http://<public_ip>:5050 to verify that the web server is up and serving on the public IP. Depending on how your (EC2?) node is set up, you may need to expose/forward the port or connect to a VPN.
Thanks Adam. I ran the wget and curl commands, and nothing was actually listening on port 8080 or 5050, even though I had opened those ports in EC2. A simple reboot did the trick, however: once I SSHed into the EC2 instance after the reboot, both Mesos and Marathon were running, and both ports showed up when I ran
netstat -ntln.
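If you want to script that last check, the Python sketch below extracts listening TCP ports from netstat -ntln output. It assumes the usual Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); other netstat implementations may format lines differently.

```python
from typing import Iterable, Set


def listening_ports(netstat_output: str) -> Set[int]:
    """Collect local ports from tcp/tcp6 lines in LISTEN state."""
    ports: Set[int] = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            _, _, port = fields[3].rpartition(":")  # works for 0.0.0.0:5050 and :::8080
            if port.isdigit():
                ports.add(int(port))
    return ports


def missing_ports(netstat_output: str, expected: Iterable[int]) -> Set[int]:
    """Ports you expected to be listening that netstat does not show."""
    return set(expected) - listening_ports(netstat_output)
```

Here missing_ports(output, [5050, 8080]) returning an empty set would match the "both ports are now showing" state after the reboot.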

Is there a way to avoid entering the localhost password again and again when starting and stopping Hadoop?

When I start Hadoop, it asks me to enter the localhost password three times, and the same happens when I stop it. Is there a way to avoid entering the password again and again?
You have to configure ssh to be able to do a passwordless ssh login.
Nothing better than a good link: http://hortonworks.com/kb/generating-ssh-keys-for-passwordless-login/
This relates to the files ~/hadoop/conf/masters and ~/hadoop/conf/slaves and to the start and stop scripts. When you run those scripts, they connect over SSH to the hosts listed in those configuration files to start the master and slave nodes on them.
Hope it helps :)
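To see which hosts the scripts will SSH into (and therefore where a password prompt can come from), you can list the entries in masters and slaves. This Python sketch skips blank lines and # comments; the function name is hypothetical, and the comment handling is an assumption about how these files are typically written.

```python
from typing import List


def read_host_list(text: str) -> List[str]:
    """Return hostnames from a masters/slaves-style file, one host per line."""
    hosts: List[str] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if line:
            hosts.append(line)
    return hosts
```

On a single-node setup both files typically contain just localhost, so every SSH login the scripts make goes to localhost, which is why setting up key-based SSH login to localhost removes the prompts.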

Error in Cloudera Cluster installation process?

I have installed Cloudera Manager successfully. It shows the currently managed host as 127.0.0.1, and it is active.
When I search for hosts and install a cluster using Cloudera Manager, after loading it shows the following error:
Installation failed. Failed to receive heartbeat from agent.
Ensure that the host's hostname is configured properly.
Ensure that port 7182 is accessible on the Cloudera Manager server (check firewall rules).
Ensure that ports 9000 and 9001 are free on the host being added.
Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details).
The following image shows the problem I hit while installing my cluster with Cloudera Manager.
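The "ports 9000 and 9001 are free" requirement can be checked on the host being added with a quick bind test. This Python sketch tries to bind a TCP socket to each port; a failure means something else already holds it. The function name is hypothetical, and binding to 0.0.0.0 (all interfaces) by default is a choice of the sketch.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False  # typically EADDRINUSE: another process holds the port
    return True
```

For example, [p for p in (9000, 9001) if not port_is_free(p)] lists whichever of the two ports is already taken.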
I had a similar problem, and it turned out I had (unfortunately) skipped the password-less SSH key step.
It took me several hours of head-scratching to realise this.
In a terminal, run:
ls -al ~/.ssh
You should see files like:
abc
abc.pub
These are your private/public key pair (not necessarily with the same names as mine above; they are whatever file name you chose when setting up SSH keys for your machine).
You need to copy the contents of abc.pub into a file named authorized_keys in the same folder. If it's not there, create authorized_keys.
In case you don't have a public/private key pair, see here.
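The copy step can be made idempotent so that re-running it does not duplicate the key. The Python sketch below appends the public key line to authorized_keys only if an identical line is not already present (a simpler exact-line comparison), and sets the file mode to 600, since sshd's StrictModes check rejects key files that are writable by group or others. The function name and arguments are hypothetical.

```python
from pathlib import Path


def authorize_key(pub_key_path: str, ssh_dir: str) -> bool:
    """Append the public key to <ssh_dir>/authorized_keys if missing; return True if changed."""
    key_line = Path(pub_key_path).read_text().strip()
    ssh = Path(ssh_dir)
    ssh.mkdir(mode=0o700, parents=True, exist_ok=True)
    auth = ssh / "authorized_keys"
    text = auth.read_text() if auth.exists() else ""
    if key_line in (line.strip() for line in text.splitlines()):
        auth.chmod(0o600)
        return False  # already authorized, nothing to do
    if text and not text.endswith("\n"):
        text += "\n"  # keep one key per line
    auth.write_text(text + key_line + "\n")
    auth.chmod(0o600)  # owner read/write only
    return True
```

With the file names from the answer, that would be authorize_key(os.path.expanduser("~/.ssh/abc.pub"), os.path.expanduser("~/.ssh")).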
On Ubuntu, the problem is usually caused by the "127.0.1.1 ubuntu" entry in your /etc/hosts file. For me, after changing it to "127.0.0.1 ubuntu", which is the standard local loopback address, I could add the cluster successfully. Hope this helps!
I was struggling with this problem for two days. Fixing /etc/hosts as suggested by "khoadoan" worked for me.
/etc/hosts looked like this when I had the problem:
127.0.0.1 localhost
127.0.1.1 ubuntu
I changed it to this:
127.0.0.1 localhost
127.0.0.1 ubuntu
Restarted the machine.
sudo init 6
Launched the Cloudera Manager Admin page. This time the host status was already showing "Managed = Yes", and there was an additional tab, "Currently Managed Hosts (1)", listing the local host.
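To spot the problematic entry before editing, here is a small Python sketch that parses /etc/hosts-format text and lists names mapped to 127.0.1.1. Letting the first line that mentions a name win mirrors how hosts-file lookups usually behave, but treat that as an assumption; the function names are hypothetical.

```python
from typing import Dict, List


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each hostname/alias in /etc/hosts-format text to its IP (first entry wins)."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, then split on whitespace
        if len(fields) < 2:
            continue
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, ip)
    return mapping


def names_on_127_0_1_1(text: str) -> List[str]:
    """Hostnames that would resolve to 127.0.1.1 according to this file."""
    return sorted(name for name, ip in parse_hosts(text).items() if ip == "127.0.1.1")
```

Applied to the "before" file above it reports ubuntu; applied to the edited file it reports nothing.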
