I may be searching with the wrong terms, but Google is not telling me how to do this. The question is: how can I restart Hadoop services on Dataproc after changing some configuration files (YARN properties, etc.)?
Services have to be restarted in a specific order throughout the cluster. There must be scripts or tools out there, hopefully in the Dataproc installation, that I can invoke to restart the cluster.
Configuring properties is a common and well-supported use case.
You can do this via cluster properties; no daemon restart is required. Example:
gcloud dataproc clusters create my-cluster --properties yarn:yarn.resourcemanager.client.thread-count=100
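The same flag takes a comma-separated list, so several properties (with different file prefixes) can be set at once; the extra property names below are only illustrative:
gcloud dataproc clusters create my-cluster \
  --properties 'yarn:yarn.resourcemanager.client.thread-count=100,yarn:yarn.nodemanager.resource.memory-mb=12288,spark:spark.executor.cores=2'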
If you're doing something more advanced, like updating service log levels, then you can use systemctl to restart services.
First, SSH into a cluster node and type systemctl to see the list of available services. For example, to restart the HDFS NameNode, type sudo systemctl restart hadoop-hdfs-namenode.service.
If this is run as part of an initialization action, then sudo is not needed.
On master nodes:
sudo systemctl restart hadoop-yarn-resourcemanager.service
sudo systemctl restart hadoop-hdfs-namenode.service
On worker nodes:
sudo systemctl restart hadoop-yarn-nodemanager.service
sudo systemctl restart hadoop-hdfs-datanode.service
After that, you can use systemctl status <name> to check the service status; also check the logs in /var/log/hadoop.
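If you need to push a hand-edited config to every node and bounce the daemons, a small loop over the workers works; this is only a sketch, and the worker names (my-cluster-w-0, my-cluster-w-1) and zone are assumptions to replace with your own:
# restart the YARN NodeManager on each worker via SSH
for worker in my-cluster-w-0 my-cluster-w-1; do
  gcloud compute ssh "$worker" --zone=us-central1-a \
    --command='sudo systemctl restart hadoop-yarn-nodemanager.service'
done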
I have an ELK setup on a single instance running Ubuntu 18.04. Every service (Logstash, Kibana, Metricbeat) will auto-start upon reboot except Elasticsearch. I have to issue the sudo service elasticsearch start command after rebooting the instance.
I tried the command sudo update-rc.d elasticsearch enable, but it did not help.
What needs to be done so that Elasticsearch restarts automatically?
In Ubuntu 18.04 (and anything above 16.04), systemctl is the control command for systemd.
To enable a program as a service, use the command below:
systemctl enable elasticsearch.service
You can check whether the service is enabled with:
systemctl is-enabled elasticsearch.service
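On a systemd as recent as Ubuntu 18.04's, enabling and starting in one step should also work:
sudo systemctl enable --now elasticsearch.service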
I have an ICp installation on some bare metal to educate myself with, so I don't need to keep it running all the time. What is the proper way to shut it down while I am not using it? I have two physical nodes, master and worker. Currently I just SSH into each and issue a sudo shutdown now command.
When I bring the cluster back online later, I can't get to the admin UI. It responds with a 502 Bad Gateway error. When I load https://master:9443 I get the Welcome to Liberty page (indicating that at least the web server is running).
If you stop docker containers or the docker runtime, then the kubelet will attempt to restart them.
If you want to shut down the system, you must stop the kubelet on each node. On Ubuntu, you would use systemctl:
sudo systemctl stop kubelet
sudo systemctl stop docker
Confirm that all processes are shut down:
top
And that all related network ports are no longer in use:
netstat -antp
(Note that netstat's "-p" option requires root privileges to inspect the pid holding onto the port).
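For example, to narrow the output to the processes of interest (the grep pattern here is just an illustration):
sudo netstat -antp | grep -E 'kubelet|docker'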
To restart the cluster, start docker and then the kubelet. Again for Ubuntu:
sudo systemctl start docker
sudo systemctl start kubelet
And of course you can follow the logs for the kubelet:
sudo journalctl -e -u kubelet
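Once the kubelet reports healthy, a quick sanity check from the master (assuming kubectl is configured there) would be:
sudo systemctl status kubelet
kubectl get nodes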
Stop Docker to shut it down; I hope this helps.
systemctl stop docker
I have a setup with Mesos and Aurora, and I have dockerized the application I need to deploy. Now I have to start the Mesos slave with Docker support, but I'm not able to. I'm trying the following:
sudo service mesos-slave --containerizers=docker,mesos start
This gives me:
mesos-slave: unrecognized service
But if I try:
sudo service mesos-slave start
the slave gets activated.
Can anyone let me know how to solve this issue?
You should also mention which OS you're using; otherwise it's mostly guesswork.
Normally, your /etc/mesos-slave/containerizers should contain the following to enable Docker support:
docker,mesos
Then, you'd have to restart the service:
sudo service mesos-slave restart
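Putting the two steps together, on a node using the stock Mesosphere packages (which read one flag per file under /etc/mesos-slave) this would look roughly like:
echo 'docker,mesos' | sudo tee /etc/mesos-slave/containerizers
sudo service mesos-slave restart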
References:
https://open.mesosphere.com/getting-started/install/#slave-setup
https://mesosphere.github.io/marathon/docs/native-docker.html
https://open.mesosphere.com/advanced-course/deploying-a-web-app-using-docker/
I have stopped ntpd and restarted it again, and have run ntpdate pool.ntp.org. The error went away once and the hosts were healthy, but after some time I got a clock offset error again.
I also observed that after running ntpdate, the Cloudera web interface stopped working. It says there is a potential configuration mismatch and to fix and restart Hue.
I have the Cloudera quickstart VM with CentOS set up on VMware.
Check if the /etc/ntp.conf file is the same across all nodes/masters.
Restart ntp.
Add the daemon with chkconfig and set it to on.
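On a CentOS host (the Cloudera quickstart VM runs CentOS), those three steps look roughly like this; the reference copy of ntp.conf pulled from a master is a hypothetical path:
diff /etc/ntp.conf /tmp/ntp.conf.from-master
sudo service ntpd restart
sudo chkconfig ntpd on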
You can fix it by restarting the NTP service, which synchronizes the time with a central source.
You can do this by logging in as root from the command line and running service ntpd restart.
After about a minute the error in CM should go away.
Host Terminal
sudo su
service ntpd restart
A clock offset error occurs in Cloudera Manager if a host/node's NTP service could not be located or did not respond to a request for the clock offset.
Solution:
1) Identify the NTP server IP (get the NTP server details for your Hadoop cluster).
2) On your Hadoop cluster nodes, edit /etc/ntp.conf.
3) Add entries in ntp.conf:
server [NTP Server IP]
server xxx.xx.xx.x
4) Restart the service by executing:
service ntpd restart
5) Restart the cluster from Cloudera Manager.
Note: If the problem still persists, reboot your Hadoop nodes and check the processes.
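To confirm each host is actually syncing after the restart, something like this can be run on the node (ntpq ships with the ntp package):
ntpq -p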
Check with $ cat /etc/ntp.conf and make sure the configuration file is the same as on the other nodes.
$ systemctl restart ntpd
$ ntpdc -np
$ ntpdate -u 0.centos.pool.ntp.org
$ hwclock --systohc
$ systemctl restart cloudera-scm-agent
After that, wait a few seconds to let it auto-configure.
I am using CDH 5.3 on a multinode cluster, and on top of this I have installed Hive, HBase, Pig and ZooKeeper. It has 5 nodes in total.
Recently the servers were shut down to upgrade the number of cores in each node. First all the datanodes' services were stopped, and later the namenode services.
The commands below were used to stop all the services:
DataNode:
sudo service hbase-regionserver stop
sudo service hadoop-yarn-nodemanager stop
sudo service hadoop-hdfs-datanode stop
Name Node:
sudo service mysql stop
sudo service hive-metastore stop
sudo service zookeeper-server stop
sudo service hbase-master stop
sudo service hadoop-yarn-resourcemanager stop
sudo service hadoop-mapreduce-historyserver stop
sudo service hadoop-hdfs-namenode stop
While starting the cluster, the namenode was started first and then all the datanodes.
But when it was started, the namenode was not coming out of safe mode, even when all the datanodes were up. Almost all the files in HDFS were corrupted, and the Hive metastore and HBase namespace were corrupted as well. Due to this, all the data was removed from the cluster.
Could anyone please give me the steps on how to stop all the services and start the cluster back up?
Thanks in advance.
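A minimal sketch of a start sequence using the same service names, in the usual order (ZooKeeper and HDFS first, then YARN, then HBase and Hive); this is an illustration under those assumptions, not a recovery procedure for the corruption described above:
On the namenode host:
sudo service zookeeper-server start
sudo service hadoop-hdfs-namenode start
On each datanode:
sudo service hadoop-hdfs-datanode start
Back on the namenode host, once the namenode has left safe mode:
sudo service hadoop-yarn-resourcemanager start
sudo service hadoop-mapreduce-historyserver start
sudo service hbase-master start
sudo service mysql start
sudo service hive-metastore start
On each datanode:
sudo service hadoop-yarn-nodemanager start
sudo service hbase-regionserver start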