Unable to restart Hue in EMR - hadoop

I am unable to restart Hue in AWS EMR Hadoop cluster.
I have modified the hue.ini file and want to restart Hue so that the changes take effect. When I run "service hue restart", I get a "command not found" error. I assume this is because hue is not on the PATH. However, when I run bin/hue, it doesn't accept restart as an argument. Is there a way to restart Hue?
I am using Hue 3.7.1-amzn-7, emr-4.8.4 and Amazon 2.7.3 Hadoop distribution.
Thanks in Advance.

The restart process depends on the EMR AMI version you are using.
On EMR 4.x.x and 5.x.x AMIs, service management is handled by upstart rather than by traditional SysVinit scripts, so a "command not found" error is expected. Services can be queried and controlled with the upstart commands described in the Upstart Cookbook.
List of services on EMR:
grep -ir "env DAEMON=" /etc/init/ | cut -d"\"" -f2
hadoop-yarn-resourcemanager
oozie
hadoop-hdfs-namenode
hive-hcatalog-server
hadoop-mapreduce-historyserver
hue
hadoop-kms
hadoop-yarn-proxyserver
hadoop-httpfs
hive-server2
hadoop-yarn-timelineserver
Example commands to stop/start hue:
status hue
sudo stop hue
sudo start hue
sudo reload hue
On EMR 3.x.x AMIs, the SysVinit command you are trying to use (service hue restart) might work.
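The two cases above can be folded into one helper. This is only a sketch: it assumes an upstart job shows up as /etc/init/NAME.conf and that anything else is a SysVinit service. The names restart_service, run, INIT_DIR and DRY_RUN are made up for this example and are not part of EMR.

```shell
# Sketch: restart a service on an EMR node.
# INIT_DIR and DRY_RUN are hypothetical knobs added for this example.

# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# restart_service NAME: stop/start an upstart job, else fall back to SysVinit.
restart_service() {
    svc="$1"
    if [ -z "$svc" ]; then
        echo "usage: restart_service NAME" >&2
        return 2
    fi
    if [ -f "${INIT_DIR:-/etc/init}/$svc.conf" ]; then
        # upstart job: stop may fail if the job is not running, so ignore that.
        run sudo stop "$svc" || true
        run sudo start "$svc"
    else
        # No upstart job definition found: assume a SysVinit script exists.
        run sudo service "$svc" restart
    fi
}
```

Setting DRY_RUN=1 before calling restart_service hue prints the commands it would run without touching the service.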

Related

quickstart hue ui potential misconfiguration detected

I need help with the Hue quickstart. I'm a beginner and I'm having trouble opening the Hue UI.
Configuration files located in /etc/hue/conf.empty
Potential misconfiguration detected. Fix and restart Hue.
hadoop.hdfs_clusters.default.webhdfs_url Current value: http://localhost:50070/webhdfs/v1
Failed to create temporary file "/tmp/hue_config_validation.9845984781315522608"
Hive Editor The application won't work without a running HiveServer2.
Note: I tried the following commands to restart the Hue service, but they did not help:
sudo service hue stop
sudo service hbase-thrift stop
sudo service hbase-thrift start
sudo service hive-server2 stop
sudo service hive-server2 start
sudo service hue start

start-all.sh command not found

I have just installed the Cloudera VM for Hadoop. When I open a terminal and try to start all the Hadoop daemons with the command 'start-all.sh', I get the error "bash: start-all.sh: command not found".
I have tried 'start-dfs.sh' too, but it gives the same error. When I run 'jps', I can see that none of the daemons have been started.
You can find the start-all.sh and start-dfs.sh scripts in the bin or sbin folders. To locate them, go to the Hadoop installation folder and run this command:
find . -name 'start-all.sh' # Finds files named exactly start-all.sh
Then you can start all the daemons by giving the full path: bash /path/to/start-all.sh
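The search-then-run step can be wrapped in a small function. This is a minimal sketch; find_hadoop_script is a made-up name, and the installation folder you pass to it depends on your setup.

```shell
# find_hadoop_script NAME [ROOT]: print the first regular file under ROOT
# (default: current directory) whose file name is exactly NAME.
find_hadoop_script() {
    name="$1"
    root="${2:-.}"
    find "$root" -type f -name "$name" 2>/dev/null | head -n 1
}
```

For example, script=$(find_hadoop_script start-dfs.sh /path/to/hadoop) stores the path, which you can then run with bash "$script" once you have checked that it is not empty.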
If you're using the QuickStart VM, then the right way to start the cluster (as @cricket_007 hinted) is to restart it in the Cloudera Manager UI. The start-all.sh scripts will not work, since they only apply to the Hadoop servers (Name Node, Data Node, Resource Manager, Node Manager ...) and not to all the services in the ecosystem (like Hive, Impala, Spark, Oozie, Hue ...).
You can refer to the YouTube video and the official documentation Starting, Stopping, Refreshing, and Restarting a Cluster

CDH 5.3.2 - Need to restart impala daemon from shell/script

I am using a CDH 5.3.2 cluster and need to be able to start/stop Impala daemons from a script. The command mentioned in the Cloudera docs
sudo service impala-server start
works fine on my CDH 5.10 local VM, but on the CDH 5.3.2 cluster I get the error "impala-server: unrecognized service". Checking /etc/init.d, I see that no such service is listed either (while it is listed in the 5.10 version).
Then I tried to restart the service directly from the Impala bin directory:
cd /usr/bin
./impalad stop
However, I am now running into the error below:
E0918 11:55:27.815739 12046 JniFrontend.java:622] FileSystem is file:///
W0918 11:55:27.817589 12046 JniFrontend.java:534] Cannot detect CDH version. Skipping Hadoop configuration checks
E0918 11:55:27.817620 12046 impala-server.cc:210] Unsupported file system. Impala only supports DistributedFileSystem but the configured filesystem is: LocalFileSystem.fs.defaultFS(file:///) might be set incorrectly
E0918 11:55:27.817631 12046 impala-server.cc:212] Aborting Impala Server startup due to improper configuration
I checked core-site.xml in Cloudera Manager and fs.defaultFS is set correctly, so I am not sure where it is picking this value up from. Any pointers on how to proceed?
The init.d service packages to start Impala from the command line are meant to be used for CDH users who do NOT want to use Cloudera Manager. The right way to start and stop Impala on a Cloudera Manager cluster is to use the CM API:
https://cloudera.github.io/cm_api/apidocs/v17/index.html
start cluster service API
stop cluster service API
commands API
The tutorial shows how to use the CM APIs but for your situation you probably need to do:
$ curl -X POST -u USER:PASSWORD \
'CM_URL/api/v1/clusters/CLUSTERNAME/services/IMPALA_SERVICE/commands/stop'
replacing USER, PASSWORD, CM_URL, CLUSTERNAME, and IMPALA_SERVICE with the appropriate values. The curl command will return a command ID.
Then poll this API with the command ID to see that the start/stop operation completed.
$ curl -u USER:PASSWORD 'CM_URL/api/v1/commands/COMMAND_ID'
However, if you still want to use the init.d service packages then you'll need to install the impala-server package.
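The polling step could look roughly like the sketch below. It assumes the command resource returned by the CM API is a JSON object with boolean "active" and "success" fields; check that against the API docs for your version. CM_URL, CM_USER and CM_PASSWORD are placeholder environment variables, and json_bool and wait_for_command are names made up for this example. The JSON parsing is a crude text match, so use a real parser such as jq if you have one.

```shell
# json_bool FIELD: read JSON on stdin and print the value (true/false) of the
# first occurrence of the boolean FIELD. Crude text match, not a JSON parser.
json_bool() {
    tr -d ' \t\n' | grep -o "\"$1\":[a-z]*" | head -n 1 | cut -d: -f2
}

# wait_for_command COMMAND_ID [TRIES]: poll the command until it is no longer
# active. Returns 0 if it reports success, 1 on failure or timeout.
wait_for_command() {
    cmd_id="$1"
    tries="${2:-60}"
    while [ "$tries" -gt 0 ]; do
        body=$(curl -s -u "$CM_USER:$CM_PASSWORD" \
            "$CM_URL/api/v1/commands/$cmd_id") || return 1
        if [ "$(printf '%s' "$body" | json_bool active)" = "false" ]; then
            [ "$(printf '%s' "$body" | json_bool success)" = "true" ]
            return $?
        fi
        tries=$((tries - 1))
        sleep 5
    done
    return 1
}
```

A script would issue the stop or start POST, take the command ID from the response, and then call wait_for_command with that ID before moving on.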

How to restart yarn on AWS EMR

I am using Hadoop 2.6.0 (emr-4.2.0 image). I have made some changes in yarn-site.xml and want to restart yarn to bring the changes into effect.
Is there a command using which I can do this?
Edit (10/26/2017): A more detailed Knowledge Center article on how to do this has been published here by AWS officially -
https://aws.amazon.com/premiumsupport/knowledge-center/restart-service-emr/.
You can ssh into the master node of your EMR cluster and run the following commands to restart the YARN ResourceManager:
sudo /sbin/stop hadoop-yarn-resourcemanager
sudo /sbin/start hadoop-yarn-resourcemanager
EMR AMI 4.x.x uses upstart; /sbin/{start,stop,restart} are all symlinks to /sbin/initctl, which is part of upstart. See the initctl man page for more information.
Alternatively, you can follow the instructions here to propagate your changes to yarn-site.xml - yarn-change-configuration-on-yarn-site-xml
For those who are gonna come from Google
In order to restart a service in EMR, perform the following actions:
Find the name of the service by running the following command:
initctl list
For example, the YARN Resource Manager service is named hadoop-yarn-resourcemanager.
Stop the service by running the following command:
sudo stop hadoop-yarn-resourcemanager
Wait a few seconds, then start the service by running the following command:
sudo start hadoop-yarn-resourcemanager
Note: Stop/start is required; do not use the restart command.
Verify that the process is running by running the following command:
sudo status hadoop-yarn-resourcemanager
Check for the process using ps, and then check the log file for any errors in the log directory /var/log/.
Source : https://aws.amazon.com/premiumsupport/knowledge-center/restart-service-emr/
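The verification step can be scripted instead of checked by eye. The sketch below assumes upstart's usual status line, which contains "start/running" for a running job (for example "hue start/running, process 1234"). The function names are made up for this example.

```shell
# is_running STATUS_LINE: succeed if an upstart status line reports the job
# as running.
is_running() {
    case "$1" in
        *start/running*) return 0 ;;
        *) return 1 ;;
    esac
}

# wait_until_running NAME [TRIES]: poll `status NAME` every 3 seconds until the
# job is running or the attempts run out.
wait_until_running() {
    svc="$1"
    tries="${2:-10}"
    while [ "$tries" -gt 0 ]; do
        if is_running "$(sudo status "$svc" 2>/dev/null)"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 3
    done
    return 1
}
```

After sudo start hadoop-yarn-resourcemanager, calling wait_until_running hadoop-yarn-resourcemanager returns non-zero if the job never reaches the running state, which is the cue to look at the logs under /var/log/.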
If what you want to do is to enable log-aggregation, it is actually easier to create the cluster with log-aggregation already enabled, as described in the documentation:
http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-plan-debugging.html
(It is actually enabled by default if you are using emr-4.3.0).
Try restarting this service as well:
hadoop-yarn-nodemanager

Restart hive service on AWS EMR

I am very new to Hive as well as AWS EMR. As per my requirement, I need to create the Hive metastore outside the cluster (moving it from AWS EMR to AWS RDS).
I followed the instruction given in
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-dev-create-metastore-outside.html
I made changes in hive-site.xml and was able to set up the Hive metastore on an Amazon RDS MySQL server. To make the changes take effect, I am currently rebooting the whole cluster so that Hive starts storing its metastore in AWS RDS. This works.
But I want to avoid rebooting the cluster. Is there any way I can restart just the service?
Just for those who are gonna come from Google
To restart any EMR service
In order to restart a service in EMR, perform the following actions:
Find the name of the service by running the following command:
initctl list
For example, the YARN Resource Manager service is named “hadoop-yarn-resourcemanager”.
Stop the service by running the following command:
sudo stop hadoop-yarn-resourcemanager
Wait a few seconds, then start the service by running the following command:
sudo start hadoop-yarn-resourcemanager
Note: Stop/start is required; do not use the restart command.
Verify that the process is running by running the following command:
sudo status hadoop-yarn-resourcemanager
Check for the process using ps, and then check the log file for any errors in the log directory /var/log/.
Source : https://aws.amazon.com/premiumsupport/knowledge-center/restart-service-emr/
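For the Hive metastore, the same pattern would be (confirm the exact service name with initctl list first):
sudo stop hive-metastore
sudo start hive-metastore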
On EMR 5.x I have found this to work:
hive --service metastore --stop
hive --service metastore --start
For me this approach worked:
1) Get the PID
2) Kill the process
3) The process restarts by itself
Commands for steps 1 and 2:
ps aux | grep MetaStore
sudo -u hive kill <pid from above>
If you are not familiar with ps, you can use the following command, which shows the header row (including PID) and only the line for the Hive metastore process:
ps aux | egrep "MetaStore|PID" | grep -v grep
The Hive metastore process restarted by itself. Validate by running ps again; the PID should have changed.
ps aux | grep MetaStore
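The before/after PID comparison can be automated. This is only a sketch of the approach above: it assumes the process respawns on its own as described and that PID is the second column of ps aux output. pids_matching and bounce_metastore are made-up names, and the 10-second wait is an arbitrary guess.

```shell
# pids_matching PATTERN: read `ps aux` output on stdin and print the PID
# (column 2) of each line matching PATTERN, skipping awk/grep helper processes.
pids_matching() {
    awk -v pat="$1" '$0 ~ pat && $0 !~ /awk|grep/ { print $2 }'
}

# bounce_metastore: kill the metastore process, wait, and check that a new
# process with a different PID has appeared.
bounce_metastore() {
    before=$(ps aux | pids_matching MetaStore)
    if [ -z "$before" ]; then
        echo "no MetaStore process found" >&2
        return 1
    fi
    # $before is left unquoted on purpose so several PIDs become several args.
    sudo -u hive kill $before
    sleep 10
    after=$(ps aux | pids_matching MetaStore)
    [ -n "$after" ] && [ "$after" != "$before" ]
}
```

bounce_metastore returns 0 only if a MetaStore process is running afterwards under a different PID.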
You don't have to restart the entire cluster. While launching the cluster, you can specify a hive-site.xml file with the RDS details. If you did not use this option and are making the changes manually after launching the cluster, you still don't need to restart the entire cluster. Just restart the hive-metastore service alone. The Hive metastore runs on the master node only.
You can launch the cluster in several ways:
1) AWS console
2) Using API (Java, Python etc)
3) Using AWS cli
You can keep hive-site.xml in S3 and apply it as a bootstrap step while launching the cluster. The AWS API lets you specify a custom hive-site.xml from S3 instead of the one created by default.
If you are using Hive from the master machine alone, you don't have to make the changes on all the machines.
An example of specifying hive-site.xml while launching EMR with the AWS CLI is given below:
aws emr create-cluster --name "Test cluster" --ami-version 3.3 --applications Name=Hue Name=Hive Name=Pig \
--use-default-roles --ec2-attributes KeyName=myKey \
--instance-type m3.xlarge --instance-count 3 \
--bootstrap-actions Name="Install Hive Site Configuration",Path="s3://elasticmapreduce/libs/hive/hive-script",\
Args=["--base-path","s3://elasticmapreduce/libs/hive","--install-hive-site","--hive-site=s3://mybucket/hive-site.xml","--hive-versions","latest"]
