HDFS_NAMENODE_USER, HDFS_DATANODE_USER & HDFS_SECONDARYNAMENODE_USER not defined - hadoop

I am new to Hadoop.
I'm trying to install Hadoop on my laptop in pseudo-distributed mode.
I am running it as the root user, but I'm getting the error below.
root@debdutta-Lenovo-G50-80:~# $HADOOP_PREFIX/sbin/start-dfs.sh
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined.
Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined.
Aborting operation.
Starting secondary namenodes [debdutta-Lenovo-G50-80]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
Also, I have to run Hadoop as the root user because Hadoop is not able to access the ssh service with any other user.
How can I fix this?

just do what it asks you:
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"

The root cause of this problem is one of the following:
Hadoop was installed for one user, but you are starting the HDFS/YARN services as a different user,
OR
the HDFS_NAMENODE_USER and HDFS_DATANODE_USER users specified in hadoop-env.sh refer to someone else.
Hence we need to make it consistent everywhere. A simple solution is to edit your hadoop-env.sh file and add the user name under which you want to start the services. So go ahead and edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and add the following lines:
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Now save the file, start the HDFS and YARN services, and check that everything works.
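For example, a minimal sketch of that edit-and-restart cycle, assuming $HADOOP_HOME points at your install and you really do want to run everything as root (as in the question):
# append the variables to hadoop-env.sh (or edit the file by hand)
cat >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh <<'EOF'
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF
# restart the daemons and verify with jps
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
jps   # NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager should appear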

Based on the first warning, about HADOOP_PREFIX, it sounds like you've not defined HADOOP_HOME correctly.
That would be done in a script under /etc/profile.d.
hadoop-env.sh is where the remainder of those variables are defined.
Please refer to the UNIX Shell Guide.
hadoop is not able to access ssh service with other user
This has nothing to do with Hadoop itself; it's basic SSH account management. You need to (1) create the hadoop (and other, like yarn) accounts on all machines of the cluster (see the adduser command documentation) and (2) copy a passwordless SSH key using, for example, ssh-copy-id hadoop@localhost; a sketch follows below.
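A short sketch of that account and SSH setup for a single-machine, pseudo-distributed install; the user name hadoop is just an example:
sudo adduser hadoop                       # create the service account
su - hadoop                               # switch to it
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa  # generate a passwordless key
ssh-copy-id hadoop@localhost              # authorize the key for this account
ssh hadoop@localhost 'echo ok'            # should print ok without a password prompt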
If you don't need distributed mode and just want to use Hadoop locally, you can use a Mini Cluster.
The documentation also recommends doing a single-node installation before continuing to pseudo-distributed mode.

Vim ${HADOOP_HOME}/sbin/start-dfs.sh and ${HADOOP_HOME}/sbin/stop-dfs.sh, then add:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
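If you also launch YARN through the scripts, the same style of edit is commonly applied to ${HADOOP_HOME}/sbin/start-yarn.sh and stop-yarn.sh; a sketch of the lines to add (root here only mirrors the HDFS example above):
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root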

Check your pdsh default rcmd; by default it is rsh.
pdsh -q -w localhost    # you should get something like this:
-- DSH-specific options --
Separate stderr/stdout Yes
Path prepended to cmd none
Appended to cmd none
Command: none
Full program pathname /usr/bin/pdsh
Remote program path /usr/bin/pdsh
-- Generic options --
Local username enock
Local uid 1000
Remote username enock
Rcmd type rsh
one ^C will kill pdsh No
Connect timeout (secs) 10
Command timeout (secs) 0
Fanout 32
Display hostname labels Yes
Debugging No
-- Target nodes --
localhost
Modify the pdsh default rcmd by adding PDSH_RCMD_TYPE to your bashrc:
nano ~/.bashrc
# add this line towards the end
export PDSH_RCMD_TYPE=ssh
# reload it
source ~/.bashrc
Then run sbin/start-dfs.sh again; that should solve your problem.

Related

Error in formatting the namenode in Hadoop single cluster node

I am trying to install and configure Hadoop on Ubuntu 16.04, following the guidelines at https://data-flair.training/blogs/installation-of-hadoop-3-x-on-ubuntu/
All the steps ran successfully, but when I try to run the command hdfs namenode -format, I get a message
There is some problem with your .bashrc file; just check the variables inside it. I faced the same problem when I started with Hadoop. Set the correct path for each and every variable, and afterwards run source ~/.bashrc to apply the changes made to your .bashrc file.
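For reference, a hedged sketch of the kind of entries that tutorial expects in ~/.bashrc; every path below is a placeholder and must match where Java and Hadoop actually live on your machine:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # placeholder JDK path
export HADOOP_HOME=/home/youruser/hadoop-3.1.0       # placeholder install path
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Then run source ~/.bashrc and retry hdfs namenode -format.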

What are the required steps to use Beeline to query a remote Hadoop instance?

I have a Hadoop cluster running on another server. I am able to ssh into that server and use Hive to run queries. I'm trying to determine if I can query that server remotely, using Hive or Beeline; I'd prefer Beeline, since it's not being deprecated.
I used Homebrew to install Hadoop and Hive. However, it complains about missing environment variables and paths. But it seems like those things are set, so I must not have configured it correctly. So, what are the steps I need to go through to execute queries on a remote Hadoop cluster from my Mac? Do I have to go through all the steps to set up a local Hadoop instance just so I can query a remote one?
~ (master) 10:24:30
# next line is from the docs
$ beeline -u jdbc:hive2://localhost:10000/default -n scott -w password_file
Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path
~ (master) 10:25:05
$ which hadoop
/usr/local/bin/hadoop
~ (master) 10:25:18
$ echo $HADOOP_HOME
/usr/local/Cellar/hadoop/2.7.3/bin

Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode - tried all solutions, the error still persists

I am following a tutorial to install Hadoop. It is explained with Hadoop 1.x, but I am using hadoop-2.6.0.
I have successfully completed all the steps up to executing the following command:
bin/hadoop namenode -format
I am getting the following error when I execute the above command.
Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode
My hadoop-env.sh file:
# The java implementation to use.
export JAVA_HOME="C:/Program Files/Java/jdk1.8.0_74"
# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol. Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}
export HADOOP_PREFIX="/home/582092/hadoop-2.6.0"
export HADOOP_HOME="/home/582092/hadoop-2.6.0"
export HADOOP_COMMON_HOME=$HADOOP_HOME
#export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_PREFIX/bin
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-2.6.0.jar
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
core-site.xml (posted as an image in the original question)
hdfs-site.xml:
dfs.data.dir = /home/582092/hadoop-dir/datadir
dfs.name.dir = /home/582092/hadoop-dir/namedir
Kindly help me in fixing this issue.
One cause behind this problem might be a user-defined HDFS_DIR environment variable. This is picked up by scripts such as the following lines in libexec/hadoop-functions.sh:
HDFS_DIR=${HDFS_DIR:-"share/hadoop/hdfs"}
...
if [[ -z "${HADOOP_HDFS_HOME}" ]] &&
[[ -d "${HADOOP_HOME}/${HDFS_DIR}" ]]; then
export HADOOP_HDFS_HOME="${HADOOP_HOME}"
fi
The solution is to avoid defining an environment variable HDFS_DIR.
The recommendations in the comments of the question are correct: use the hadoop classpath command to check whether the hadoop-hdfs-*.jar files are present on the classpath or not. They were missing in my case.
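A quick way to check both points from a shell (a sketch; hadoop classpath is a standard subcommand):
hadoop classpath | tr ':' '\n' | grep -i 'hadoop-hdfs'   # the hdfs jars should be listed
env | grep -i '^HDFS_DIR='                               # should print nothing
unset HDFS_DIR                                           # clear it for the current shell if it is set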

Need help adding multiple DataNodes in pseudo-distributed mode (one machine), using Hadoop-0.18.0

I am a student, interested in Hadoop and started to explore it recently.
I tried adding an additional DataNode in the pseudo-distributed mode but failed.
I am following the Yahoo developer tutorial and so the version of Hadoop I am using is hadoop-0.18.0
I tried to start up using 2 methods I found online:
Method 1 (link)
I have a problem with this line
bin/hadoop-daemon.sh --script bin/hdfs $1 datanode $DN_CONF_OPTS
--script bin/hdfs doesn't seem to be valid in the version I am using. I changed it to --config $HADOOP_HOME/conf2 with all the configuration files in that directory, but when the script is run it gives the error:
Usage: Java DataNode [-rollback]
Any idea what the error means? The log files are created but the DataNode did not start.
Method 2 (link)
Basically I duplicated the conf folder to a conf2 folder, making the necessary changes documented on the website to hadoop-site.xml and hadoop-env.sh. Then I ran the command
./hadoop-daemon.sh --config ..../conf2 start datanode
it gives the error:
datanode running as process 4190. stop it first.
So I guess this is the 1st DataNode that was started, and the command failed to start another DataNode.
Is there anything I can do to start an additional DataNode in the Yahoo VM Hadoop environment? Any help/advice would be greatly appreciated.
Hadoop start/stop scripts use /tmp as the default directory for storing the PIDs of already started daemons. In your situation, when you start the second datanode, the startup script finds the /tmp/hadoop-someuser-datanode.pid file from the first datanode and assumes that the datanode daemon is already started.
The plain solution is to set the HADOOP_PID_DIR environment variable to something else (but not /tmp). Also, do not forget to update all the network port numbers in conf2.
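A hedged sketch of what starting the second datanode might look like with those changes in place; the directory name is illustrative, and conf2 must already point at distinct ports and data directories:
export HADOOP_PID_DIR=$HOME/hadoop-pids-dn2   # anywhere but /tmp
$HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_HOME/conf2 start datanode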
The smart solution is to start a second VM with a Hadoop environment and join the two in a single cluster; that is the way Hadoop is intended to be used.

HADOOP_HOME and hadoop streaming

Hi, I am trying to run Hadoop on a server that has Hadoop installed, but I have no idea in which directory Hadoop resides. The server was configured by the server admin.
In order to load hadoop I use the use command from the dotkit package.
There may be several solutions, but I wanted to know where the hadoop package was installed, how to set up the $HADOOP_HOME variable, and how to properly run a hadoop streaming job, such as $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/mapred/contrib/streaming/hadoop-streaming.jar (see http://wiki.apache.org/hadoop/HadoopStreaming).
Thanks! any help would be greatly appreciated!
If you're using a cloudera distribution then it's most probably in /usr/lib/hadoop, otherwise it could be anywhere (at the discretion of your system admin).
There are some tricks you can use to try and locate it:
locate hadoop-env.sh (assuming that locate has been installed and updatedb has been run recently)
If the machine you're running this on is running a hadoop service (such as data node, job tracker, task tracker, name node), then you can perform a process list and grep for the hadoop command: ps axww | grep hadoop
Failing the above two, look for the hadoop root directory in some common locations such as: /usr/lib, /usr/local, /opt
Failing all this, and assuming your current user has the permissions: find / -name hadoop-env.sh
If you installed with an RPM then it's most probably in /etc/hadoop.
Why don't you try:
echo $HADOOP_HOME
Obviously the above environment variable has to be set before you can even issue hadoop executables from anywhere on the box.
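Once you've found the install directory, a hedged sketch of wiring it up and launching the streaming job from the question; the paths and the streaming jar location are assumptions and vary between distributions and versions:
export HADOOP_HOME=/usr/lib/hadoop        # or wherever the tricks above located it
export PATH=$PATH:$HADOOP_HOME/bin
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -input  /user/me/input \
    -output /user/me/output \
    -mapper /bin/cat \
    -reducer /usr/bin/wc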
