Hadoop: Cannot set priority of resourcemanager process - hadoop

I am very new to Hadoop and am trying to set up pseudo-distributed mode with Hadoop 3.1.2.
When I try to start the YARN service I get the following error:
$ sbin/start-yarn.sh
Starting resourcemanagers on []
localhost: ERROR: Cannot set priority of resourcemanager process 13209
pdsh@manager-4: localhost: ssh exited with exit code 1
Starting nodemanagers
localhost: ERROR: Cannot set priority of nodemanager process 13366
pdsh@manager-4: localhost: ssh exited with exit code 1
I tried the solutions from this stackoverflow question, which is very similar to my problem, but nothing worked. The same problem is posted in another forum here, but no solution is given there either.
Then I tried another option, described below.
I set the following exports in the file sbin/start-yarn.sh:
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
Then I ran sbin/start-yarn.sh with sudo and got the following error. Please note that I have done all the setup for passwordless ssh for root@localhost.
$ sudo sbin/start-yarn.sh
Starting resourcemanagers on []
localhost: Permission denied (publickey).
pdsh@manager-4: localhost: ssh exited with exit code 255
Starting nodemanagers
localhost: Permission denied (publickey).
pdsh@manager-4: localhost: ssh exited with exit code 255

In addition to the steps suggested by zhao, ephraimbuddy and qitian:
If you have a firewall running, make sure it is not blocking the connection in any way. Also make sure that the user executing the command has sufficient permissions to update process priorities.

Before running the start-yarn script, try the command: ssh localhost

Once you have set up passwordless ssh for localhost, set PDSH_RCMD_TYPE to ssh:
export PDSH_RCMD_TYPE=ssh

This error message confused me a lot. I later found that it happened because I had not configured cgroups correctly. So first check that your configuration is all correct; you can also check your ResourceManager logs.

I had the same issue, what helped me was the guide I found in this link!
The message "Cannot set priority of resourcemanager process" is misleading. I checked the resource manager logs and found that there was an error as follows
Unexpected close tag </property>; expected </configuration>
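Pulling the real error out of the daemon log, as this answer did, could be sketched as follows. This is a minimal sketch, not a definitive tool: the function name last_log_errors is hypothetical, and the log file name in the usage comment assumes the common hadoop-<user>-resourcemanager-<host>.log pattern under $HADOOP_HOME/logs, which may differ on your installation.

```shell
# Print the most recent ERROR/FATAL lines (and "Caused by:" lines from
# stack traces) found in a daemon log file.
# Usage (file name pattern is an assumption):
#   last_log_errors "$HADOOP_HOME/logs/hadoop-$USER-resourcemanager-$(hostname).log"
last_log_errors() {
  logfile=$1
  max_lines=${2:-20}   # how many matching lines to show, newest last
  if [ ! -r "$logfile" ]; then
    echo "cannot read $logfile" >&2
    return 1
  fi
  grep -E '(^| )(ERROR|FATAL) |Caused by:' "$logfile" | tail -n "$max_lines"
}
```

A malformed *-site.xml file, like the one behind the message above, would typically surface in these lines rather than in the "Cannot set priority" message printed by the start script.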

I had the same issue and was finally able to solve it; I got ResourceManager and NodeManager to run. If you're running Hadoop 3.3 or later, the issue might be the Java version you're using. From the Hadoop Java compatibility notes:
"Apache Hadoop 3.3 and upper supports Java 8 and Java 11 (runtime only)
Please compile Hadoop with Java 8. Compiling Hadoop with Java 11 is not supported"
Solution:
Try switching to Java 8.
Then make sure your JAVA_HOME variables point to Java 8 (including any JAVA_HOME setting in hadoop-env.sh).
If the issue persists, check the error messages in the resourcemanager log located in $HADOOP_HOME/logs/.
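To see which Java major version a given JAVA_HOME actually resolves to, something like the following might help. It is only a sketch: both function names are hypothetical, and the parsing assumes that the first line of `java -version` contains the version in double quotes, which holds for common JDK builds but is not guaranteed.

```shell
# Map a Java version string to its major version.
# Legacy scheme: "1.8.0_292" means Java 8; newer scheme: "11.0.11" means Java 11.
java_major_version() {
  case $1 in
    1.*) echo "$1" | cut -d. -f2 ;;
    *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;
  esac
}

# Extract the quoted version string from `java -version` (printed on stderr).
# Uses $JAVA_HOME/bin/java when JAVA_HOME is set, otherwise java from PATH.
installed_java_version() {
  "${JAVA_HOME:+$JAVA_HOME/bin/}java" -version 2>&1 | sed -n '1s/.*"\(.*\)".*/\1/p'
}

# Usage:
#   java_major_version "$(installed_java_version)"
```

Running the usage line once in your login shell and once with the JAVA_HOME value from hadoop-env.sh shows whether the two disagree.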

Related

Hadoop: Unable to connect to Web GUI

Introduction: I'm using Ubuntu 18.04.2 LTS on which I'm trying to set up a Hadoop 3.2 Single Node Cluster. The installation goes perfectly fine, and I have Java installed. JPS is working as well.
Issue: I'm trying to connect to the Web GUI at localhost:50070, but I'm unable to. I'm attaching a snippet of my console when I execute ./start-all.sh:
root@it-research:/usr/local/hadoop/sbin# ./start-all.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [it-research]
Starting resourcemanager
Starting nodemanagers
pdsh@it-research: localhost: ssh exited with exit code 1
root@it-research:/usr/local/hadoop/sbin# jps
6032 Jps
3154 SecondaryNameNode
2596 NameNode
I'm unable to resolve localhost: ssh exited with exit code 1
Solutions I've tried:
Set up password-less SSH
Set up NameNode User
Set up PDSH to work with SSH
I've also added master [myIPAddressv4Here] to the /etc/hosts file and tried connecting to master:50070, but I'm still facing the same issue.
Expected Behaviour: I should be able to connect to the Web GUI when I go to localhost:50070, but I can't.
Please let me know if there's some more information I should provide.
The NameNode web UI port in Hadoop 3.x is 9870 (it was 50070 in 2.x), so localhost:9870 should work.
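The port change can be captured in a small helper. This is a sketch with a hypothetical function name; it only encodes the default ports, so a custom dfs.namenode.http-address setting in hdfs-site.xml would override what it prints.

```shell
# Print the default NameNode web UI port for a Hadoop version string.
# Only the stock defaults are covered; a site-specific setting takes precedence.
namenode_ui_port() {
  case $1 in
    3.*) echo 9870 ;;
    2.*) echo 50070 ;;
    *)   echo "unknown Hadoop version: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   echo "http://localhost:$(namenode_ui_port 3.2.0)/"
```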

HDFS_NAMENODE_USER, HDFS_DATANODE_USER & HDFS_SECONDARYNAMENODE_USER not defined

I am new to hadoop.
I'm trying to install hadoop in my laptop in Pseudo-Distributed mode.
I am running it with root user, but I'm getting the error below.
root@debdutta-Lenovo-G50-80:~# $HADOOP_PREFIX/sbin/start-dfs.sh
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined.
Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined.
Aborting operation.
Starting secondary namenodes [debdutta-Lenovo-G50-80]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
Also, I have to run hadoop in root user as hadoop is not able to access ssh service with other user.
How can I fix this?
just do what it asks you:
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
The root cause of this problem is one of the following:
Hadoop was installed for one user and you are starting the YARN service as a different user.
OR
The HDFS_NAMENODE_USER and HDFS_DATANODE_USER values in Hadoop's hadoop-env.sh name some other user.
So the user needs to be made consistent everywhere. A simple solution is to edit your hadoop-env.sh file and add the user name under which you want to start the services. Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and add the following lines:
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Now save the file, start the YARN and HDFS services, and check that it works.
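Before restarting, it may be worth confirming that none of the five variables was missed. A minimal sketch, assuming each variable is set on its own line in the form NAME=value or export NAME=value; the function name missing_daemon_users is hypothetical.

```shell
# Print the names of the *_USER variables that are NOT assigned in the
# given hadoop-env.sh. Commented-out lines do not count as assignments.
missing_daemon_users() {
  envfile=$1
  for var in HDFS_NAMENODE_USER HDFS_DATANODE_USER HDFS_SECONDARYNAMENODE_USER \
             YARN_RESOURCEMANAGER_USER YARN_NODEMANAGER_USER; do
    grep -Eq "^[[:space:]]*(export[[:space:]]+)?${var}=" "$envfile" || echo "$var"
  done
}

# Usage:
#   missing_daemon_users "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"
```

Empty output means all five are present; it does not check that they all name the same user.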
Based on the first warning about HADOOP_PREFIX, it sounds like you've not defined HADOOP_HOME correctly.
This would be done in your /etc/profile.d.
hadoop-env.sh is where the remainder of those variables are defined.
Please refer to the UNIX Shell Guide
hadoop is not able to access ssh service with other user
This has nothing to do with Hadoop itself. It's basic SSH account management. You need to
Create the hadoop account (and others, such as yarn) on all machines of the cluster (see the adduser command documentation)
Copy a passwordless SSH key using, for example, ssh-copy-id hadoop@localhost
If you don't need distributed mode and just want to use Hadoop locally, you can use a Mini Cluster.
The documentation also recommends making a single node installation before continuing to pseudo distributed
Open ${HADOOP_HOME}/sbin/start-dfs.sh and ${HADOOP_HOME}/sbin/stop-dfs.sh in an editor such as vim, then add:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Check whether your pdsh default rcmd type is rsh:
pdsh -q -w localhost
You should get something like this:
-- DSH-specific options --
Separate stderr/stdout Yes
Path prepended to cmd none
Appended to cmd none
Command: none
Full program pathname /usr/bin/pdsh
Remote program path /usr/bin/pdsh
-- Generic options --
Local username enock
Local uid 1000
Remote username enock
Rcmd type rsh
one ^C will kill pdsh No
Connect timeout (secs) 10
Command timeout (secs) 0
Fanout 32
Display hostname labels Yes
Debugging No
-- Target nodes --
localhost
Change the pdsh default rcmd type by setting PDSH_RCMD_TYPE in your ~/.bashrc:
nano ~/.bashrc
-- add this line towards the end
export PDSH_RCMD_TYPE=ssh
-- update
source ~/.bashrc
That should solve your problem
Then run sbin/start-dfs.sh.
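The check on the "Rcmd type" row can be scripted so that it is easy to repeat after editing ~/.bashrc. A small sketch, assuming the `pdsh -q` output format shown above; the function name pdsh_rcmd_type is hypothetical.

```shell
# Read `pdsh -q` output on stdin and print the value of the "Rcmd type" row.
pdsh_rcmd_type() {
  awk '/^Rcmd type/ { print $NF }'
}

# Usage:
#   pdsh -q -w localhost | pdsh_rcmd_type
```

If this still prints rsh in a new shell, the export has not been picked up by that shell.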

Can't access Ganglia on EC2 Spark cluster

Launching using spark-ec2 script results in:
Setting up ganglia
RSYNC'ing /etc/ganglia to slaves...
<...>
Shutting down GANGLIA gmond: [FAILED]
Starting GANGLIA gmond: [ OK ]
Shutting down GANGLIA gmond: [FAILED]
Starting GANGLIA gmond: [ OK ]
Connection to <...> closed.
<...>
Stopping httpd: [FAILED]
Starting httpd: httpd: Syntax error on line 199 of /etc/httpd/conf/httpd.conf: Cannot load modules/libphp-5.5.so into server: /etc/httpd/modules/libphp-5.5.so: cannot open shared object file: No such file or directory
[FAILED]
[timing] ganglia setup: 00h 00m 03s
Connection to <...> closed.
Spark standalone cluster started at <...>:8080
Ganglia started at <...>:5080/ganglia
Done!
However, netstat shows that nothing is listening on port 5080.
Is this related to the httpd error above, or is it something else?
EDIT:
So the issue has been found (see the answer below), and the fix can be applied locally on the instance, after which Ganglia works fine. However, the question is how to fix this issue at the root, so that the spark-ec2 script can start Ganglia normally without intervention.
The fact that Ganglia is not available is related to these errors: Ganglia is a PHP application and it won't run without the PHP module for Apache.
Which version of Spark are you using to start the cluster?
It is a weird error; these files should be present in the AMI image.
Just traced the error: /etc/httpd/conf/httpd.conf is trying to load the libphp-5.5 library, while modules/ contains the libphp-5.6 version...
Changing httpd.conf fixes the issue; however, it'd be good to know a permanent fix within the spark-ec2 script.
This is because httpd fails to launch. As you have noted, httpd.conf is trying to load modules and failing. You can reproduce the problem by running apachectl start and examining exactly which modules fail to load.
In my case there was one involving "auth" and "core". The last four (maybe five) listed will also fail to load. I did not encounter anything related to PHP, so maybe our cases are different. Anyway, the hacky solution is to comment out the problem lines. I did so and am running Ganglia without issue.
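Finding the offending lines without starting httpd could be done by comparing the LoadModule entries against the files on disk. This is a sketch under stated assumptions: the function name missing_httpd_modules is hypothetical, relative module paths are resolved against the server root you pass in (/etc/httpd in the error message above), and modules pulled in through Include directives are not followed.

```shell
# List LoadModule targets in an httpd.conf that do not exist on disk.
# Usage: missing_httpd_modules <httpd.conf> <server_root>
#   e.g. missing_httpd_modules /etc/httpd/conf/httpd.conf /etc/httpd
missing_httpd_modules() {
  conf=$1
  root=$2
  # Field 3 of an active "LoadModule <name> <path>" line is the module file.
  awk '$1 == "LoadModule" { print $3 }' "$conf" | while read -r path; do
    case $path in
      /*) full=$path ;;          # absolute path: use as-is
      *)  full=$root/$path ;;    # relative path: resolve against server root
    esac
    [ -e "$full" ] || echo "$path"
  done
}
```

For the case described above, this would be expected to report modules/libphp-5.5.so while libphp-5.6.so sits unused in the modules directory.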

zookeeper.znode.parent mismatch exception

I have installed hadoop 2.2.0 & hbase-0.94.18 on ubuntu 12.04. When I try to run the command
create 't1','c1'
in hbase shell, I get the following error-
ERROR client.HConnectionManager$HConnectionImplementation:
Check the value configured in 'zookeeper.znode.parent'.
There could be a mismatch with the one configured in the master.
What's wrong?
A few things in no particular order:
To start with, let the error display continue. It will try 7 times and then exit. Before it exits, it will show the name of exception occurring. Try to look it up. It probably says MasterNotRunningException.
Verify that the master is indeed running by running sudo jps. You should see an entry for HMaster. If not, start the hbase-master service.
Assuming you're going for pseudo-distributed mode, you may also want to check your /etc/hosts to make sure that entries point to 127.0.0.1 and not 127.0.1.1.
For cloudera's installs, here is a guide on how to setup HBase in pseudo-distributed mode. It also includes instructions to install hbase-master and zookeeper correctly.
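The /etc/hosts check from the list above can be reduced to a one-line filter. A minimal sketch; the function name hosts_with_127_0_1_1 is hypothetical.

```shell
# Print active (non-comment) lines of a hosts file that map a name to 127.0.1.1.
# Defaults to /etc/hosts when no file is given.
hosts_with_127_0_1_1() {
  grep -E '^[[:space:]]*127\.0\.1\.1[[:space:]]' "${1:-/etc/hosts}"
}

# Usage:
#   hosts_with_127_0_1_1
```

Any line it prints is a candidate for changing to 127.0.0.1, as suggested above.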
Check whether zookeeper.znode.parent in hbase-site.xml is set correctly. Its default value is /hbase.
Mine was set by default to /hbase-unsecure (hbase-site.xml)
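Reading the configured value out of hbase-site.xml can be done with a short filter. This is a rough sketch, not an XML parser: it assumes the <value> element is on the same line as the <name> element or on the line directly after it, and the function name znode_parent is hypothetical.

```shell
# Print the value configured for zookeeper.znode.parent in an hbase-site.xml.
# Prints nothing if the property is absent (HBase then uses its default).
znode_parent() {
  grep -A1 '<name>zookeeper.znode.parent</name>' "$1" |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Usage:
#   znode_parent /path/to/hbase-site.xml
```

Running it against the hbase-site.xml used by the client and the one used by the master shows whether the two disagree.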

Need help adding multiple DataNodes in pseudo-distributed mode (one machine), using Hadoop-0.18.0

I am a student, interested in Hadoop and started to explore it recently.
I tried adding an additional DataNode in the pseudo-distributed mode but failed.
I am following the Yahoo developer tutorial and so the version of Hadoop I am using is hadoop-0.18.0
I tried to start up using 2 methods I found online:
Method 1 (link)
I have a problem with this line
bin/hadoop-daemon.sh --script bin/hdfs $1 datanode $DN_CONF_OPTS
--script bin/hdfs doesn't seem to be valid in the version I am using. I changed it to --config $HADOOP_HOME/conf2 with all the configuration files in that directory, but when the script was run it gave the error:
Usage: Java DataNode [-rollback]
Any idea what the error means? The log files are created but the DataNode did not start.
Method 2 (link)
Basically I duplicated the conf folder to a conf2 folder, making the necessary changes documented on the website to hadoop-site.xml and hadoop-env.sh. Then I ran the command
./hadoop-daemon.sh --config ..../conf2 start datanode
It gives the error:
datanode running as process 4190. stop it first.
So I guess this is the 1st DataNode that was started, and the command failed to start another DataNode.
Is there anything I can do to start additional DataNode in the Yahoo VM Hadoop environment? Any help/advice would be greatly appreciated.
Hadoop start/stop scripts use /tmp as the default directory for storing the PIDs of already started daemons. In your situation, when you start the second datanode, the startup script finds the /tmp/hadoop-someuser-datanode.pid file from the first datanode and assumes that the datanode daemon is already started.
The plain solution is to set the HADOOP_PID_DIR environment variable to something else (not /tmp). Also do not forget to update all network port numbers in conf2.
The smart solution is to start a second VM with a Hadoop environment and join the two in a single cluster. That is how Hadoop is intended to be used.
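To see why the plain solution works, it can help to compute the PID file path the start script would look at. A sketch only: the function name daemon_pid_file is hypothetical, and the path layout is assumed from the /tmp/hadoop-someuser-datanode.pid example above, with HADOOP_IDENT_STRING falling back to the current user.

```shell
# Print the PID file path a daemon start script would check, assuming the
# layout <pid dir>/hadoop-<ident>-<daemon>.pid with /tmp as the default dir.
daemon_pid_file() {
  daemon=$1
  ident=${HADOOP_IDENT_STRING:-$USER}
  echo "${HADOOP_PID_DIR:-/tmp}/hadoop-${ident}-${daemon}.pid"
}

# Usage:
#   daemon_pid_file datanode
```

With HADOOP_PID_DIR exported to a different directory for the second configuration (for example in conf2's hadoop-env.sh, if your version sources it), the two datanodes would get distinct PID files and the "stop it first" check should no longer trigger.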
