How to shut down and start a Cloudera cluster safely - hadoop

I am using CDH 5.3 on a multinode cluster, and on top of it I have installed Hive, HBase, Pig, and ZooKeeper. It has five nodes in total.
Recently the servers were shut down to upgrade the number of cores in each node. First all of the DataNodes' services were stopped, and then the NameNode services.
The commands below were used to stop all the services:
DataNode:
sudo service hbase-regionserver stop
sudo service hadoop-yarn-nodemanager stop
sudo service hadoop-hdfs-datanode stop
Name Node:
sudo service mysql stop
sudo service hive-metastore stop
sudo service zookeeper-server stop
sudo service hbase-master stop
sudo service hadoop-yarn-resourcemanager stop
sudo service hadoop-mapreduce-historyserver stop
sudo service hadoop-hdfs-namenode stop
While starting the cluster back up, the NameNode was started first and then all the DataNodes.
But once started, the NameNode was not coming out of safe mode, even when all the DataNodes were up. Almost all the files in HDFS were corrupted, and the Hive metastore and HBase namespace were corrupted as well. Because of this, all the data was lost from the cluster.
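For reference, whether the NameNode is still in safe mode, and how healthy HDFS is, can be checked from the NameNode host. A minimal sketch, assuming the commands run as the hdfs superuser:
sudo -u hdfs hdfs dfsadmin -safemode get    # prints "Safe mode is ON" or "... OFF"
sudo -u hdfs hdfs fsck /                    # reports corrupt and missing blocks
Note that forcing safe mode off with hdfs dfsadmin -safemode leave only hides the problem if blocks are genuinely missing.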
Could anyone please give me the steps for stopping all the services and starting the cluster back up safely?
Thanks in advance.

Related

ntpd service in a docker container is dead, cannot restart

I'm trying to bring up a local Hadoop cluster using Docker and Ambari. The problem I'm having is that the Ambari install check shows NTP is not running, and NTP is needed to know whether the services installed with Ambari are working. I checked ntpd in the containers and tried to launch it, but it failed:
[root@97ea7075ca78 ~]# service ntpd start
Starting ntpd: [ OK ]
[root@97ea7075ca78 ~]# service ntpd status
ntpd dead but pid file exists
Is there a way to start ntp daemon in those containers?
In Docker you don't use the service command, as there is no init system. Just run the ntpd command directly and it should work.
ntpd goes to the background by default. If that were not the case, you would need to run ntpd &.
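A minimal sketch of what that looks like inside the container, assuming the ntp package is installed (as the service script implies):
ntpd                # forks into the background on its own
pgrep ntpd          # verify the daemon is actually running
# To keep it in the foreground instead, e.g. as the container's main process:
# ntpd -n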

How do I restart hadoop services on dataproc cluster

I may be searching with the wrong terms, but Google is not telling me how to do this. The question is: how can I restart Hadoop services on Dataproc after changing some configuration files (YARN properties, etc.)?
Services have to be restarted in a specific order throughout the cluster. There must be scripts or tools out there, hopefully in the Dataproc installation, that I can invoke to restart the cluster.
Configuring properties is a common and well supported use case.
You can do this via cluster properties, no daemon restart required. Example:
gcloud dataproc clusters create my-cluster --properties yarn:yarn.resourcemanager.client.thread-count=100
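The prefix before the colon selects which config file the property lands in, so several files can be targeted in one flag. A sketch, assuming the standard Dataproc prefixes (core for core-site.xml, hdfs for hdfs-site.xml, yarn for yarn-site.xml) and hypothetical values:
gcloud dataproc clusters create my-cluster \
    --properties 'yarn:yarn.nodemanager.resource.memory-mb=8192,hdfs:dfs.replication=2'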
If you're doing something more advanced, like updating service log levels, then you can use systemctl to restart services.
First SSH to a cluster node and type systemctl to see the list of available services. For example, to restart the HDFS NameNode, type sudo systemctl restart hadoop-hdfs-namenode.service.
If this is run as part of an initialization action, then sudo is not needed, since init actions already run as root.
On master nodes:
sudo systemctl restart hadoop-yarn-resourcemanager.service
sudo systemctl restart hadoop-hdfs-namenode.service
On worker nodes:
sudo systemctl restart hadoop-yarn-nodemanager.service
sudo systemctl restart hadoop-hdfs-datanode.service
After that, you can use systemctl status <name> to check the service status, and also check the logs in /var/log/hadoop.
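Putting the above together, a full restart in the right order might look like this when run from the master. A sketch only: the worker hostnames my-cluster-w-0 and my-cluster-w-1 are hypothetical, and passwordless SSH to the workers is assumed:
# Masters first: ResourceManager, then NameNode.
sudo systemctl restart hadoop-yarn-resourcemanager.service
sudo systemctl restart hadoop-hdfs-namenode.service
# Then each worker: NodeManager and DataNode.
for worker in my-cluster-w-0 my-cluster-w-1; do
  ssh "$worker" 'sudo systemctl restart hadoop-yarn-nodemanager.service &&
                 sudo systemctl restart hadoop-hdfs-datanode.service'
done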

Call From quickstart.cloudera/172.17.0.2 to quickstart.cloudera:8020 failed on connection exception: java.net.ConnectException: Connection refused

I am very new to Docker and the Hadoop ecosystem. I have installed Docker on Ubuntu 16.04 and run the Hadoop image from Cloudera inside a new Docker container. But when I try to run any command in HDFS, this error is shown:
Call From quickstart.cloudera/172.17.0.2 to quickstart.cloudera:8020 failed on connection exception: java.net.ConnectException: Connection refused;
I could not figure out how to solve this, and would appreciate any help.
Port 8020 is used by the hdfs-namenode service, so my guess is that the service has not started or has failed.
Can you try to restart it?
command: sudo service hadoop-hdfs-namenode restart
You can also check the status of the namenode service.
Command: sudo service hadoop-hdfs-namenode status
Also, check the hadoop-hdfs-datanode service as it may also need to be restarted.
command: sudo service hadoop-hdfs-datanode restart
If you still get the error, check the NameNode logs in /var/log/hadoop-hdfs and add them to your question for further analysis.
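Before restarting anything, it can also help to confirm whether the NameNode is listening at all. A quick check along these lines, with the caveat that the exact log file name varies by host:
sudo netstat -tlnp | grep 8020    # or: ss -tlnp | grep 8020
sudo tail -n 50 /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log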
In my case, restarting the NameNode, DataNode, and YARN ResourceManager worked.
sudo service hadoop-yarn-resourcemanager restart
sudo service hadoop-hdfs-namenode restart
sudo service hadoop-hdfs-datanode restart

Start Apache Mesos slave with Docker containerizer

I have a setup with Mesos and Aurora, and I have dockerized the application I need to deploy. Now I have to start the Mesos slave with Docker support, but I'm not able to. I'm trying the following:
sudo service mesos-slave --containerizers=docker,mesos start
This gives me:
mesos-slave: unrecognized service
but if I try:
sudo service mesos-slave start
the slave gets activated.
Can anyone let me know how to solve this issue?
You should also say what OS you're using; otherwise it's mostly guesswork.
Normally, your /etc/mesos-slave/containerizers should contain the following to enable Docker support:
docker,mesos
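On the stock Mesosphere package layout, where each file under /etc/mesos-slave becomes a startup flag, writing that file is a one-liner (a sketch, not the only way to do it):
echo 'docker,mesos' | sudo tee /etc/mesos-slave/containerizers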
Then, you'd have to restart the service:
sudo service mesos-slave restart
References:
https://open.mesosphere.com/getting-started/install/#slave-setup
https://mesosphere.github.io/marathon/docs/native-docker.html
https://open.mesosphere.com/advanced-course/deploying-a-web-app-using-docker/

Why does the clock offset error on the host keep occurring again and again : cloudera

I stopped ntpd and restarted it, then ran ntpdate pool.ntp.org. The error went away once and the hosts were healthy, but after some time the clock offset error came back.
I also observed that after running ntpdate, the Cloudera web interface stopped working. It says there is a potential configuration mismatch and to fix and restart Hue.
I have the Cloudera quickstart VM with CentOS, running on VMware.
Check that the /etc/ntp.conf file is the same across all nodes/masters,
restart ntp,
add the daemon with chkconfig and set it to on, as shown below.
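A minimal version of those steps on a CentOS 6-style init system, as in the Cloudera quickstart VM:
sudo service ntpd restart
sudo chkconfig ntpd on      # start ntpd at boot
chkconfig --list ntpd       # verify it is enabled for the usual runlevels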
You can fix it by restarting the NTP service, which synchronizes the time with a central source.
You can do this by logging in as root from the command line and running service ntpd restart.
After about a minute the error in CM should go away.
Host Terminal
sudo su
service ntpd restart
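To confirm the daemon is actually syncing, the peer list can be checked afterwards; an asterisk marks the currently selected time source:
ntpq -p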
A clock offset error occurs in Cloudera Manager if a host/node's NTP service could not be located or did not respond to a request for the clock offset.
Solution:
1) Identify the NTP server IP, or get the details of the NTP server IP for your Hadoop cluster.
2) On your Hadoop cluster nodes, edit /etc/ntp.conf.
3) Add entries in ntp.conf:
server [NTP Server IP]
server xxx.xx.xx.x
4) Restart the service. Execute:
service ntpd restart
5) Restart the cluster from Cloudera Manager.
Note: If the problem still persists, reboot your Hadoop nodes and check the processes.
Check with $ cat /etc/ntp.conf to make sure the configuration file is the same as on the other nodes.
$ systemctl restart ntpd
$ ntpdc -np
$ ntpdate -u 0.centos.pool.ntp.org
$ hwclock --systohc
$ systemctl restart cloudera-scm-agent
After that, wait a few seconds to let it auto-configure.
