Hadoop not picking up the right user name when starting

I'm attempting to start Hadoop with
./sbin/start-dfs.sh
but I get the following error:
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
I ran the following in my terminal before executing start-dfs.sh:
export HADOOP_USER_NAME="myname"
export HDFS_NAMENODE_USER="myname"
export HDFS_DATANODE_USER="myname"
export HDFS_SECONDARYNAMENODE_USER="myname"
export YARN_RESOURCEMANAGER_USER="myname"
export YARN_NODEMANAGER_USER="myname"
I have also created the data folder and assigned it to the same user group. Is there anything else I'm missing?

Related

How to fix the start-dfs.sh errors in Hadoop 3.2.0

I'm getting the following errors when running start-dfs.sh to start the Hadoop services:
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [ahsan-Lenovo-G570]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
In the Hadoop home directory, open the etc/hadoop/hadoop-env.sh file and add the lines below to remove the error:
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
You can use your own user name by replacing root in the commands above.
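As a minimal sketch (not part of the original answer), the same lines can be appended from the shell using your own account name instead of root. The function name set_hadoop_daemon_users is hypothetical, and the usage line assumes HADOOP_HOME points at your installation:

```shell
#!/usr/bin/env bash
# Sketch: append the daemon *_USER exports to a hadoop-env.sh file for one
# account (default: the current user). Lines already present are skipped.
set_hadoop_daemon_users() {
  local env_file="$1"
  local user="${2:-$(whoami)}"
  local var line
  for var in HDFS_NAMENODE_USER HDFS_DATANODE_USER HDFS_SECONDARYNAMENODE_USER \
             YARN_RESOURCEMANAGER_USER YARN_NODEMANAGER_USER; do
    line="export ${var}=${user}"
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if ! grep -qxF "$line" "$env_file" 2>/dev/null; then
      printf '%s\n' "$line" >> "$env_file"
    fi
  done
}

# Usage (assumes HADOOP_HOME points at your installation):
# set_hadoop_daemon_users "$HADOOP_HOME/etc/hadoop/hadoop-env.sh" "$(whoami)"
```

Running it twice does not duplicate lines. An older export of the same variable with a different value is left in place, but the appended line comes later in the file, so it is the one that takes effect when hadoop-env.sh is sourced.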

localhost: ERROR: Cannot set priority of datanode process 2984

I set up and configured a multi-node Hadoop cluster, and the output below appears when I start it.
I'm running Ubuntu 16.04 and Hadoop 3.0.2.
Starting namenodes on [master]
Starting datanodes
localhost: ERROR: Cannot set priority of datanode process 2984
Starting secondary namenodes [master]
master: ERROR: Cannot set priority of secondarynamenode process 3175
2018-07-17 02:19:39,470 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Starting nodemanagers
Can anyone tell me which step is wrong?
I had the same error and fixed it by ensuring that the datanode and namenode locations have the right permissions and are owned by the user starting hadoop daemons.
Check that:
the directory path properties in hdfs-site.xml under $HADOOP_CONF_DIR point to valid locations:
dfs.namenode.name.dir
dfs.datanode.data.dir
dfs.namenode.checkpoint.dir
the Hadoop user has write permission on these paths.
If write permission is missing on any of these paths, the daemons may fail to start, and the error you see can occur.
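A minimal sketch of that check, meant to be run as the user that starts the daemons. The function name check_hadoop_dirs is hypothetical; it strips a leading file:// from each value (hdfs-site.xml values are often written as URIs) but does not split comma-separated lists of directories:

```shell
#!/usr/bin/env bash
# Sketch: report directories that are missing, not owned by the current
# user, or not writable by the current user. Returns non-zero on any problem.
check_hadoop_dirs() {
  local dir status=0
  for dir in "$@"; do
    dir="${dir#file://}"   # turn file:///data/nn into /data/nn
    if [ ! -d "$dir" ]; then
      echo "MISSING: $dir"
      status=1
    elif [ ! -O "$dir" ]; then
      echo "NOT OWNED BY $(whoami): $dir"
      status=1
    elif [ ! -w "$dir" ]; then
      echo "NOT WRITABLE: $dir"
      status=1
    fi
  done
  return "$status"
}

# Usage: hdfs getconf -confKey prints the configured value of a property.
# check_hadoop_dirs "$(hdfs getconf -confKey dfs.namenode.name.dir)" \
#                   "$(hdfs getconf -confKey dfs.datanode.data.dir)" \
#                   "$(hdfs getconf -confKey dfs.namenode.checkpoint.dir)"
```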
I had the same error and tried the above method, but it didn't work.
I set XXX_USER in all xxx-env.sh files, and got the same result.
Finally I set HADOOP_SHELL_EXECNAME="root" in ${HADOOP_HOME}/bin/hdfs, and the error disappeared.
The default value of HADOOP_SHELL_EXECNAME is "hdfs", the name of the script.
I had the same error when I renamed my Ubuntu home directory, and had to edit core-site.xml, changing the value of the property hadoop.tmp.dir to the new path.
Just add the native library directory to your HADOOP_OPTS, like this:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
or, to replace any existing options instead of appending to them:
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
I had the same issue. You just need to check the hadoop/logs directory, find the .log file for the datanode, open it with more nameofthefile.log and look for errors. Mine was a configuration problem; I fixed it and it worked.
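A minimal sketch of that log check, assuming the default log directory $HADOOP_HOME/logs and datanode log files whose names contain "datanode". The function name show_datanode_errors is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: print the last ERROR/FATAL lines (with line numbers) from each
# datanode .log file in a Hadoop log directory.
show_datanode_errors() {
  local log_dir="${1:-${HADOOP_HOME:-.}/logs}"
  local max_lines="${2:-20}"
  local f
  for f in "$log_dir"/*datanode*.log; do
    [ -e "$f" ] || continue   # the glob matched nothing
    echo "== $f"
    grep -nE 'ERROR|FATAL' "$f" | tail -n "$max_lines"
  done
}

# Usage: show_datanode_errors "$HADOOP_HOME/logs"
```

The first ERROR or FATAL line near the failed start is usually the one that names the misconfigured property or path.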

HDFS_NAMENODE_USER, HDFS_DATANODE_USER & HDFS_SECONDARYNAMENODE_USER not defined

I am new to Hadoop.
I'm trying to install Hadoop on my laptop in pseudo-distributed mode.
I am running it as the root user, but I'm getting the error below.
root@debdutta-Lenovo-G50-80:~# $HADOOP_PREFIX/sbin/start-dfs.sh
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [debdutta-Lenovo-G50-80]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
Also, I have to run Hadoop as the root user, because Hadoop is not able to access the SSH service as any other user.
How can I fix this?
Just do what it asks:
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
The root cause of this problem is one of the following:
Hadoop was installed for one user and you are starting the YARN service as a different user,
OR
the HDFS_NAMENODE_USER and HDFS_DATANODE_USER users specified in Hadoop's hadoop-env.sh are set to someone else.
Hence we need to correct this and make it consistent everywhere. A simple solution is to edit your hadoop-env.sh file and add the user name under which you want to start the YARN service. So go ahead and edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh by adding the following lines:
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Now save the file, start the YARN and HDFS services, and check that they work.
Based on the first warning about HADOOP_PREFIX, it sounds like you haven't defined HADOOP_HOME correctly.
This would typically be done in a script under /etc/profile.d.
hadoop-env.sh is where the rest of those variables are defined.
Please refer to the UNIX Shell Guide.
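For example, a profile script along these lines could set HADOOP_HOME and drop the deprecated HADOOP_PREFIX. Both the file name /etc/profile.d/hadoop.sh and the install path /opt/hadoop are assumptions; adjust them to your setup:

```shell
# /etc/profile.d/hadoop.sh (hypothetical file name)
# Assumption: Hadoop is unpacked in /opt/hadoop; change to your install path.
export HADOOP_HOME=/opt/hadoop
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
# HADOOP_PREFIX has been replaced by HADOOP_HOME; leaving it set is what
# triggers the "HADOOP_PREFIX has been replaced" warning, so unset it.
unset HADOOP_PREFIX
```

Scripts in /etc/profile.d are read by login shells, so log out and back in (or source the file) before running start-dfs.sh again.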
hadoop is not able to access ssh service with other user
This has nothing to do with Hadoop itself. It's basic SSH account management. You need to:
create the hadoop account (and the other accounts, such as yarn) on all machines of the cluster (see the adduser command documentation);
copy a passwordless SSH key, for example using ssh-copy-id hadoop@localhost.
If you don't need distributed mode and just want to use Hadoop locally, you can use a Mini Cluster.
The documentation also recommends setting up a single-node installation before moving on to pseudo-distributed mode.
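On a single machine, ssh-copy-id essentially appends your public key to ~/.ssh/authorized_keys. The sketch below does that step by hand; authorize_key is a hypothetical helper, and the ssh-keygen call in the usage comment assumes you don't have a key yet:

```shell
#!/usr/bin/env bash
# Sketch: authorize a public key for passwordless SSH on this machine,
# roughly what `ssh-copy-id user@localhost` does. sshd usually refuses keys
# when ~/.ssh or authorized_keys are writable by others, hence the chmods.
authorize_key() {
  local pubkey_file="$1"
  local ssh_dir="${2:-$HOME/.ssh}"
  local auth="$ssh_dir/authorized_keys"
  local key
  key="$(cat "$pubkey_file")" || return 1
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  touch "$auth" && chmod 600 "$auth" || return 1
  # Append the key only if the identical line is not already there
  grep -qxF "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
}

# Usage, run as the account that starts Hadoop (e.g. hadoop):
# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa   # only if no key exists yet
# authorize_key ~/.ssh/id_rsa.pub
# ssh localhost                              # should not ask for a password
```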
Edit ${HADOOP_HOME}/sbin/start-dfs.sh and ${HADOOP_HOME}/sbin/stop-dfs.sh (for example with vim), then add:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Check your pdsh default rcmd, which is probably rsh:
pdsh -q -w localhost
You should get something like this:
-- DSH-specific options --
Separate stderr/stdout Yes
Path prepended to cmd none
Appended to cmd none
Command: none
Full program pathname /usr/bin/pdsh
Remote program path /usr/bin/pdsh
-- Generic options --
Local username enock
Local uid 1000
Remote username enock
Rcmd type rsh
one ^C will kill pdsh No
Connect timeout (secs) 10
Command timeout (secs) 0
Fanout 32
Display hostname labels Yes
Debugging No
-- Target nodes --
localhost
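To pull just the "Rcmd type" field out of that listing from a script, a small filter such as the hypothetical pdsh_rcmd_type below is enough. It assumes the field is printed on a line starting with "Rcmd type", as in the listing above; the 2>&1 in the usage line is there in case your pdsh writes the listing to stderr:

```shell
#!/usr/bin/env bash
# Sketch: read `pdsh -q` output on stdin and print the Rcmd type value
# (the last field of the line that starts with "Rcmd type").
pdsh_rcmd_type() {
  awk '/^Rcmd type/ { print $NF }'
}

# Usage: if this prints rsh, change the default as described below.
# pdsh -q -w localhost 2>&1 | pdsh_rcmd_type
```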
Change the pdsh default rcmd to ssh by setting it in your ~/.bashrc:
nano ~/.bashrc
-- add this line towards the end
export PDSH_RCMD_TYPE=ssh
-- then reload it
source ~/.bashrc
That should solve your problem. Then start HDFS again:
sbin/start-dfs.sh

Hadoop task tracker - all local directories are not writable

I have a 10-node cluster.
When I submit Hive jobs, I get the error below:
WARN org.apache.hadoop.mapred.TaskTracker: Task Tracker local Incorrect permission for /data/gomz/mapred/local, expected: rwxr-xr-x, while actual: rwxrwxr-x
ERROR org.apache.hadoop.mapred.TaskTracker: Can not start TaskTracker because org.apache.hadoop.util.DiskChecker$DiskErrorException: all local directories are not writable
at org.apache.hadoop.mapred.TaskTracker.checkLocalDirs(TaskTracker.java:5268)
at org.apache.hadoop.mapred.TaskTracker.initializeDirectories(TaskTracker.java:907)
at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:979)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:2176)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:5310)
mapred.local.dir in both mapred-site.xml and taskcontroller.cfg points to /data/gomz/mapred/local.
For my Hive sessions, I use the following settings:
SET hive.exec.scratchdir=/dev/tmp/hive;
SET hive.metastore.warehouse.dir=/dev/warehouse; (this setting works for Hive jobs that do not launch MapReduce)
What other local directories could the error be referring to?
Can you check the permissions of /data?
Can you try this command?
sudo chown $USER /data
After executing this command, try again.
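The WARN line also reports a mode mismatch: the TaskTracker expected rwxr-xr-x (755) on /data/gomz/mapred/local but found rwxrwxr-x (775). The minimal sketch below sets both the owner and that expected mode. fix_mapred_local_dir is a hypothetical name, and defaulting the owner to the current user is an assumption; pass the account that actually runs the TaskTracker if it differs:

```shell
#!/usr/bin/env bash
# Sketch: give a TaskTracker local dir the owner and the 755 (rwxr-xr-x)
# mode that the WARN message says it expects.
# Assumption: owner defaults to the current user; changing ownership to a
# different user normally requires root.
fix_mapred_local_dir() {
  local dir="$1"
  local owner="${2:-$(whoami)}"
  chown "$owner" "$dir" || return 1
  chmod 755 "$dir" || return 1
}

# Usage, on each TaskTracker node:
# fix_mapred_local_dir /data/gomz/mapred/local
```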

Error in Hadoop 2.2 while starting in windows

I am trying to install Hadoop on Windows 7. I have installed Cygwin, and when I run ./start-dfs.sh I get the following error:
Error: Could not find or load main class org.apache.hadoop.hdfs.tools.GetConf
Starting namenodes on []
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-kalai-namenode-kalai-PC.out
localhost: Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-kalai-datanode-kalai-PC.out
localhost: Error: Could not find or load main class org.apache.hadoop.hdfs.server.datanode.DataNode
Error: Could not find or load main class org.apache.hadoop.hdfs.tools.GetConf
Can anyone let me know what I'm doing wrong here?
The above issue was cleared when I used a Command Prompt with admin privileges to format the namenode and start the services:
Remove the C:\tmp and C:\data directories manually.
Open cmd.exe with "Run as Administrator".
Format the namenode and start the services.
