Installing CDH on VMware - Cloudera-scm-server dies - hadoop

I am trying to install CDH on VMware (Ubuntu 12.05 desktop), following the steps in the link below:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Installation-Guide/cmig_install_path_A.html.
Everything works up to step 2, but every time I move on to step 3, cloudera-scm-server dies in the background, so I cannot complete step 3. When I see that the Cloudera Manager Admin Console is stuck, I check the server status in a terminal with the command below, and it reports that the server is dead:
user@ubuntu:~$ sudo service cloudera-scm-server status
Checking for service cloudera-scm-server: * cloudera-scm-server is dead and pid file exists
Can anyone help me understand why the server dies after a while?
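For context, "dead and pid file exists" is the standard LSB-style status meaning the init script found a pid file but no live process with that pid. In other words, the server started and then exited, and the reason is usually written to the server log (/var/log/cloudera-scm-server/cloudera-scm-server.log on a default install, as far as I know). The minimal Python sketch below reproduces that check so you can watch for the moment the server dies while working through step 3. The pid file path is an assumption; confirm it in /etc/init.d/cloudera-scm-server.

```python
import os

# Assumed pid file location; check /etc/init.d/cloudera-scm-server for yours.
DEFAULT_PID_FILE = "/var/run/cloudera-scm-server.pid"


def pid_file_status(pid_file=DEFAULT_PID_FILE):
    """Classify a service the way an LSB status check does."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return "no pid file"
    except ValueError:
        # Empty or garbled pid file: nothing can be running under it.
        return "dead and pid file exists"
    if pid <= 0:
        return "dead and pid file exists"
    try:
        # Signal 0 delivers nothing; it only checks that the pid exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        return "dead and pid file exists"
    except PermissionError:
        # The process exists but is owned by another user.
        return "running"
    return "running"
```

Run it as root, e.g. print(pid_file_status()), while step 3 is in progress; once it reports "dead and pid file exists", the last lines of the server log should say why the process exited.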

Related

How to safely fix an AWOL ambari system user?

I'm a student working on a test cluster of around 25 hosts. We installed using Ambari and have FreeIPA running on one host as a DNS and LDAP server. The rest is typical Hadoop infrastructure. Hive was failing, and I wondered whether the DB connection parameters used during the Ambari installation were incorrect, so I tried to find a way to re-run the DB connection step. I didn't get anywhere, and since it was late I left it, with the Ambari interface still working.
The next morning, the Ambari web UI seemed to be down. Thinking that the web server might need restarting, I tried the following:
[akidd@dw ~]$ sudo ambari-server start
Using python /usr/bin/python
Starting ambari-server
ERROR: Exiting with exit code 1.
REASON: Unable to detect a system user for Ambari Server.
- If this is a new setup, then run the "ambari-server setup" command to create the user
- If this is an upgrade of an existing setup, run the "ambari-server upgrade" command.
Refer to the Ambari documentation for more information on setup and upgrade.
Can anyone help me understand what could have happened?
If I run ambari-server setup, will the existing cluster be OK, assuming I configure everything exactly as it was originally?
Thanks for your help!
@user3535074 Try starting it as the user that originally installed it.
If you do run ambari-server setup as the current user, remember to answer No (n) to the following prompts:
Customize user account for ambari-server daemon [y/n] (n)? n
Do you want to change Oracle JDK [y/n] (n)? n
Enter advanced database configuration [y/n] (n)? n
More information, including how to back up the Ambari database before running setup again, is in the following post:
https://community.cloudera.com/t5/Support-Questions/Ambari-server-failed-to-start-after-system-reboot-Below-is/td-p/203806
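The "Unable to detect a system user" message suggests Ambari could not find the account it is supposed to run as. Assuming Ambari records that account under the ambari-server.user key in /etc/ambari-server/conf/ambari.properties (the default location as far as I know; verify on your install), a small Python sketch like the one below can show whether the entry is present and whether that user still exists before you decide to re-run setup. The parser only handles simple key=value lines, not the full Java properties syntax.

```python
import pwd

# Assumed default location of the Ambari server configuration file.
AMBARI_PROPERTIES = "/etc/ambari-server/conf/ambari.properties"


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def ambari_user_status(text):
    """Report which system user Ambari is configured to run as, if any."""
    user = parse_properties(text).get("ambari-server.user")
    if not user:
        return "no ambari-server.user entry"
    try:
        pwd.getpwnam(user)  # raises KeyError if the account is missing
    except KeyError:
        return f"configured user {user!r} does not exist"
    return f"configured user {user!r} exists"
```

Usage: open AMBARI_PROPERTIES, pass its contents to ambari_user_status, and print the result. If the entry is missing, that is consistent with the error above; if the user exists, starting the server as that user (as the answer suggests) is the first thing to try.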

cloudera host with bad health during install

I have tried again and again with all the required steps completed, but during cluster installation, at the step that installs the selected parcels, every host is always shown in bad health, and setup never completes.
I am installing CM 5.5 on CentOS 6.7 using VirtualBox.
The error:
Host is in bad health cm.feuni.edu
Host is in bad health dn1.feuni.edu
Host is in bad health dn2.feuni.edu
Host is in bad health nn1.feuni.edu
Host is in bad health nn2.feuni.edu
Host is in bad health rm.feuni.edu
These errors appear at step 6, where setup says:
The selected parcels are being downloaded and installed on all the hosts in the cluster
In the previous step (step 5), all hosts completed the heartbeat checks.
Memory distribution:
cm: 8 GB
all others: 1 GB
I could not find a proper answer anywhere else. What could be the reason for the bad health?
I don't know if this will help you...
After struggling with it for a few days, I found the log files (at )
They reported a GUID mismatch,
so I uninstalled everything from both machines (using the script they provide, /usr/share/cmf/uninstall-cloudera-manager.sh, yum remove 'cloudera-manager-*', and deleting every Cloudera-related directory I found...)
and then removed the GUID file:
rm /var/lib/cloudera-scm-agent/cm_guid
Afterwards I reinstalled everything, and that fixed the issue for me.
I read online that there can be issues with the hostname and the like, but I guess that if you got this far in the installation, you have already fixed all the domain/FQDN/hostname/hosts issues.
It saddens me that there is no real manual/FAQ for this product. :(
Good luck!
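If you want to try the GUID part of this fix before a full reinstall, a cautious variant is to move the file aside instead of deleting it, so it can be restored. This Python sketch assumes the agent is stopped first and that it creates a fresh cm_guid on its next start; the answer does not confirm that, so treat it as an assumption. The backup naming scheme is hypothetical.

```python
import os
import shutil
import time

# Path from the answer above.
CM_GUID = "/var/lib/cloudera-scm-agent/cm_guid"


def move_guid_aside(guid_path=CM_GUID):
    """Rename the agent's GUID file to a timestamped backup.

    Returns the backup path, or None if there was no GUID file.
    Stop cloudera-scm-agent before calling this; start it again afterwards.
    """
    if not os.path.exists(guid_path):
        return None
    # Hypothetical naming scheme: cm_guid.bak.<unix time>
    backup = f"{guid_path}.bak.{int(time.time())}"
    shutil.move(guid_path, backup)
    return backup
```

If the problem persists after restarting the agent, moving the backup back restores the original state, and the full uninstall described above remains the fallback.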
I faced the same problem. This is my solution:
First I edited config.ini
$ nano /etc/cloudera-scm-agent/config.ini
so that the hostname in it was the same as the one returned by the command $ hostname.
Then I restarted the Cloudera agent and server:
$ service cloudera-scm-agent restart
$ service cloudera-scm-server restart
Then, in Cloudera Manager, I deleted the cluster and added it again. The wizard then continued normally.
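A quick way to compare the agent configuration with the machine's hostname is sketched below in Python. The answer does not say which setting it edited; this sketch assumes the agent's own hostname override is listening_hostname in the [General] section of config.ini, which is how I understand the file's layout, so check your copy before relying on it. An unset value is treated as fine because the agent then detects the hostname itself.

```python
import configparser
import socket


def listening_hostname(config_text):
    """Return [General] listening_hostname from config.ini text, or None."""
    # interpolation=None so '%' characters in values are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser.get("General", "listening_hostname", fallback=None)


def hostname_matches(config_text, hostname=None):
    """True if listening_hostname is unset or equals the machine's hostname."""
    configured = listening_hostname(config_text)
    actual = hostname or socket.gethostname()
    return configured is None or configured == actual
```

Usage: read /etc/cloudera-scm-agent/config.ini and call hostname_matches on its contents on each host; any host returning False is a candidate for the edit described above.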

how to manually start/stop hadoop services on boot up/down?

Hi, does anyone know how to stop and start CDH (Cloudera Distribution of Hadoop) services with a script? We need this for production servers: for instance, when servers are restarted, all the Hadoop services should stop gracefully before the reboot and start again on startup.
I have an 8-node Hadoop cluster on RHEL with Cloudera 5.4.7 installed on it.
So far I have identified a few ways to do that. One, described at a link I found, says I have to use chkconfig to register the service with the OS, for example:
sudo chkconfig hadoop-hdfs-namenode on
But when I do that, I get this error:
error reading information on service hadoop-hdfs-namenode: No such file or directory
which clearly states that it cannot find the file I specified.
Then I searched for the file, and it is located in:
/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/etc/rc.d/init.d/hadoop-hdfs-namenode
/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/etc/default/hadoop-hdfs-namenode
Then I tried executing the same commands from the folder where the files are located, but got the same error. The permissions on the files are fine, and I tried ./ as well, with the same error.
I can also list all the processes that are currently running with:
sudo jps
14035 -- process information unavailable
10615 -- process information unavailable
15323 -- process information unavailable
5486 -- process information unavailable
2001 -- process information unavailable
46991 -- process information unavailable
42667 -- process information unavailable
33732 Jps
2698 -- process information unavailable
2727 -- process information unavailable
7901 -- process information unavailable
42624 -- process information unavailable
As you can see, the process names are not shown, but these are Hadoop processes. I could kill them all to stop them, but that is not a graceful way to stop Hadoop managed by Cloudera. Please let me know if you know of anything that can help me move forward.
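On the jps output: jps reads per-user JVM performance data, so when it runs as a different user than the one owning the process, it typically prints "process information unavailable"; running it as the owning user (for example sudo -u hdfs jps) usually shows the names. Another option on Linux is to read /proc/<pid>/cmdline, which contains the Java main class. The Python sketch below parses jps output and falls back to /proc for unnamed PIDs; it only identifies the processes, it does not stop them gracefully.

```python
import os


def parse_jps(output):
    """Turn `jps` output into (pid, name) pairs.

    name is None when jps printed '-- process information unavailable'.
    """
    entries = []
    for line in output.splitlines():
        parts = line.split(None, 1)
        if not parts or not parts[0].isdigit():
            continue
        name = parts[1].strip() if len(parts) > 1 else None
        if name is not None and name.startswith("--"):
            name = None
        entries.append((int(parts[0]), name))
    return entries


def proc_cmdline(pid):
    """Read a process's command line from /proc (Linux; may need root)."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            raw = f.read()
    except OSError:
        return None
    # Arguments are NUL-separated; join them with spaces for display.
    return raw.replace(b"\0", b" ").decode(errors="replace").strip() or None


def describe_jps(output):
    """Pair each PID with its jps name, or its /proc command line."""
    return [(pid, name or proc_cmdline(pid)) for pid, name in parse_jps(output)]
```

Usage: capture the output of sudo jps (for example with subprocess.run(..., capture_output=True, text=True)) and pass its stdout to describe_jps.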
Thankfully, Cloudera provides a way to start services on system startup. Here is how:
Click the service.
Go to Configuration.
Search for Automatically Restart Process.
Select the checkbox.
It will then restart the services on boot.
You can also do this by running curl commands from a shell script. For example, to start the solr service you can use:
curl -u admin:admin -X POST http://ipaddress:7180/api/v4/clusters//services/solr1/commands/start -H 'Content-type:application/json; charset=utf-8';
For more details, visit:
http://cloudera.github.io/cm_api/apidocs/v10/index.html
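The same call can be made from Python with only the standard library, which may be easier to wire into shutdown and startup scripts. This is a sketch, not an official client: the cluster parameter is a hypothetical placeholder for your cluster name (left out of the URL above), the defaults (admin:admin, port 7180, API v4) are copied from the curl example, and you should check the API docs linked above for the commands your version supports. The example uses start; stop would be the counterpart to call before a reboot.

```python
import base64
import urllib.parse
import urllib.request


def build_service_command(host, cluster, service, command,
                          user="admin", password="admin",
                          port=7180, api_version="v4"):
    """Build a POST to /api/<ver>/clusters/<cluster>/services/<service>/commands/<command>."""
    url = "http://{}:{}/api/{}/clusters/{}/services/{}/commands/{}".format(
        host, port, api_version,
        urllib.parse.quote(cluster, safe=""),  # cluster names may contain spaces
        urllib.parse.quote(service, safe=""),
        command,
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=b"",  # the curl example sends no request body either
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


def run_service_command(request, timeout=30):
    """Send the request and return Cloudera Manager's JSON reply as text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode()
```

For example, run_service_command(build_service_command("cm.example.com", "Cluster 1", "solr1", "stop")) before a reboot; the host and cluster names here are made up.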

Hadoop installation: what is "This is comment for WebHCat Service (sic)"

In Ambari, “This is comment for WebHcat Service” is the final option in the “Services Selection” step.
If I don't select this service, then the Customize Services step hangs indefinitely. It doesn't matter which other services are selected.
If I select it, then the Customize Services step functions normally, but the installation will stop on step four with the error message:
“org.apache.ambari.server.controller.spi.SystemException:
An internal system exception occurred:
Configuration with tag version1439256707212 exists for webhcat-site”
This is on a clean install, for a single node SLES 11 SP3 server.
What is the “This is comment for WebHcat Service” service, and why is it shown as a comment instead of a service name?
If this is a fresh install, it's strange that you're getting "configuration already exists" errors. I would try cleaning your Ambari server instance by running:
sudo ambari-server reset
This resets the PostgreSQL database that ambari-server uses, giving you a clean slate to retry the cluster install.

Impala: The Cloudera Manager Agent got an unexpected response from this role's web server

I have installed a Hadoop cluster with Cloudera Manager. Since the installation, Impala's status has been bad.
I get the following error for the master node:
Web Server Status
and this one for the nodes running the Impala daemon:
Impala Daemon Ready Check, Web Server Status
Looking into the logs, I found some errors:
The health test result for IMPALAD_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent got an unexpected response from this role's web server.
In cloudera-scm-agent.log there are these errors:
1261 Monitor-HostMonitor throttling_logger ERROR (29 skipped) Failed to collect NTP metrics
I tried installing NTP (sudo apt-get install ntp), but after that HDFS, Hive, YARN and other services went bad; with NTP removed again, only Impala is bad.
MainThread agent ERROR Failed to connect to previous supervisor.
Another error is this:
Monitor-GenericMonitor throttling_logger ERROR Error fetching metrics at 'http://nodo-1:50075/jmx'
I checked all the hostnames, and they seem correct...
So, what is this problem, and how can I solve it?
I also had a problem with NTP. It persisted after installing NTP, but running sudo service ntp restart fixed the error.
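To check whether ntpd has actually synchronized after such a restart, you can look for the peer marked * (the selected system peer) in ntpq -p output. The Python sketch below does that; it assumes the ntp package's ntpq is on the PATH, and it is only a local sanity check, not the same test Cloudera Manager runs. Right after a restart, ntpd may need a few minutes before it selects a peer.

```python
import subprocess


def ntpq_has_sync_peer(ntpq_output):
    """True if a line of `ntpq -p` output starts with '*'.

    '*' is ntpq's tally code for the peer ntpd has selected to
    synchronize with; no such line means the clock is not yet synced.
    """
    return any(line.startswith("*") for line in ntpq_output.splitlines())


def ntp_synced():
    """Run `ntpq -p` locally and report whether a sync peer is selected."""
    try:
        result = subprocess.run(["ntpq", "-p"], capture_output=True,
                                text=True, timeout=10)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # ntpq missing or unresponsive: treat as not synchronized.
        return False
    return result.returncode == 0 and ntpq_has_sync_peer(result.stdout)
```

Running ntp_synced() on every host after restarting ntp shows which hosts have not yet selected a peer.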
