Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode - tried all solutions, error still persists - hadoop

I am following a tutorial to install Hadoop. The tutorial is written for Hadoop 1.x, but I am using hadoop-2.6.0.
I have successfully completed all the steps up to the point of executing the following command:
bin/hadoop namenode -format
When I execute the above command, I get the following error:
Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode
My hadoop-env.sh file
# The java implementation to use.
export JAVA_HOME="C:/Program Files/Java/jdk1.8.0_74"
# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol. Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}
export HADOOP_PREFIX="/home/582092/hadoop-2.6.0"
export HADOOP_HOME="/home/582092/hadoop-2.6.0"
export HADOOP_COMMON_HOME=$HADOOP_HOME
#export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_PREFIX/bin
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-2.6.0.jar
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
core-site.xml
(core-site.xml was attached as an image.)
hdfs-site.xml
<property>
  <name>dfs.data.dir</name>
  <value>/home/582092/hadoop-dir/datadir</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/home/582092/hadoop-dir/namedir</value>
</property>
Kindly help me in fixing this issue.

One cause behind this problem might be a user-defined HDFS_DIR environment variable, which is picked up by the launcher scripts, for example in these lines from libexec/hadoop-functions.sh:
HDFS_DIR=${HDFS_DIR:-"share/hadoop/hdfs"}
...
if [[ -z "${HADOOP_HDFS_HOME}" ]] &&
   [[ -d "${HADOOP_HOME}/${HDFS_DIR}" ]]; then
  export HADOOP_HDFS_HOME="${HADOOP_HOME}"
fi
The solution is to avoid defining an environment variable HDFS_DIR.
The recommendations in the comments on the question are also correct: use the hadoop classpath command to check whether the hadoop-hdfs-*.jar files are present on the classpath or not. They were missing in my case.
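A quick way to check both points from a shell (a minimal sketch; the unset and the grep filter are only illustrative):
unset HDFS_DIR                               # drop any stray user-defined value
hadoop classpath | tr ':' '\n' | grep hdfs   # the hadoop-hdfs-*.jar entries should appear here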

Related

How can Hive access a Hadoop setup installed by a different user

If I install Hadoop using a 'hadoop' user and install Hive using a 'hive' user on the same node (pseudo-distributed mode), how can Hive access Hadoop?
When I run 'hive --version', I receive an error like this:
Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path.
The problem is that the hive user cannot find the Hadoop installation, but I don't know how to fix it.
Thanks a lot.
As the error says, $HADOOP_HOME or $HADOOP_PREFIX must be set, or hadoop must be on the PATH.
So edit /home/hive/.bash_profile (for example, assuming you're on Linux) and set one of those environment variables to point at the downloaded Hadoop package.
For example
export HADOOP_HOME=/opt/hadoop # example
export PATH=$HADOOP_HOME/bin:$PATH
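Then reload the profile and verify (the /opt/hadoop path above is only an example; point it at wherever your Hadoop package actually lives):
source /home/hive/.bash_profile
hadoop version    # should print the Hadoop version
hive --version    # should no longer complain about HADOOP_HOME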

HDFS_NAMENODE_USER, HDFS_DATANODE_USER & HDFS_SECONDARYNAMENODE_USER not defined

I am new to hadoop.
I'm trying to install Hadoop on my laptop in pseudo-distributed mode.
I am running it as the root user, but I'm getting the error below.
root@debdutta-Lenovo-G50-80:~# $HADOOP_PREFIX/sbin/start-dfs.sh
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined.
Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined.
Aborting operation.
Starting secondary namenodes [debdutta-Lenovo-G50-80]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
Also, I have to run Hadoop as root because Hadoop is not able to access the SSH service under any other user.
How do I fix this?
Just do what it asks you to do:
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
The root cause of this problem is one of the following:
Hadoop was installed as one user and you are starting the services as a different user,
OR
the HDFS_NAMENODE_USER and HDFS_DATANODE_USER values specified in hadoop-env.sh refer to some other user.
We need to make this consistent everywhere. A simple fix is to edit your hadoop-env.sh file and add the user name under which you want to start the services. So edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and add the following lines:
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Now save the file, start the HDFS and YARN services, and check that they work.
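A quick sanity check after restarting (a minimal sketch; jps ships with the JDK, and the exact daemon list depends on your configuration):
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
jps   # expect NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager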
Based on the first warning, about HADOOP_PREFIX, it sounds like you've not defined HADOOP_HOME correctly.
This would be done in your /etc/profile.d.
hadoop-env.sh is where the remainder of those variables are defined.
Please refer to the UNIX Shell Guide
hadoop is not able to access ssh service with other user
This has nothing to do with Hadoop itself. It's basic SSH account management. You need to
Create the hadoop (and other, e.g. yarn) accounts on all machines of the cluster (see the adduser command documentation)
Copy a passwordless SSH key using, for example, ssh-copy-id hadoop@localhost (see the sketch below)
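A minimal sketch of that account setup on a single node (the user name, key type, and empty passphrase are assumptions; adjust to your own policy):
sudo adduser hadoop                          # create the service account
sudo su - hadoop                             # switch to it
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa     # generate a passwordless key
ssh-copy-id hadoop@localhost                 # authorize it for localhost
ssh hadoop@localhost true                    # should log in without a password prompt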
If you don't need distributed mode and just want to use Hadoop locally, you can use a Mini Cluster.
The documentation also recommends doing a single-node installation before continuing to pseudo-distributed mode.
Open ${HADOOP_HOME}/sbin/start-dfs.sh and ${HADOOP_HOME}/sbin/stop-dfs.sh in an editor (e.g. vim), then add:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
A. Check your pdsh default rcmd; it may be set to rsh.
Run pdsh -q -w localhost -- you should get something like this:
-- DSH-specific options --
Separate stderr/stdout Yes
Path prepended to cmd none
Appended to cmd none
Command: none
Full program pathname /usr/bin/pdsh
Remote program path /usr/bin/pdsh
-- Generic options --
Local username enock
Local uid 1000
Remote username enock
Rcmd type rsh
one ^C will kill pdsh No
Connect timeout (secs) 10
Command timeout (secs) 0
Fanout 32
Display hostname labels Yes
Debugging No
-- Target nodes --
localhost
B. Modify the pdsh default rcmd. Add PDSH_RCMD_TYPE to your bashrc:
nano ~/.bashrc
-- add this line towards the end
export PDSH_RCMD_TYPE=ssh
-- update
source ~/.bashrc
That should solve your problem.
C. Run sbin/start-dfs.sh again.
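If you prefer not to edit .bashrc, the same setting can also be applied for a single run (an illustrative alternative, not from the original answer):
PDSH_RCMD_TYPE=ssh $HADOOP_HOME/sbin/start-dfs.sh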

When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment

I am trying to run Spark using yarn and I am running into this error:
Exception in thread "main" java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
I am not sure where the "environment" is (what specific file?). I tried using:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
in the bash_profile, but this doesn't seem to help.
While running Spark on YARN, you need to add the following line to spark-env.sh:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
Note: check that $HADOOP_HOME/etc/hadoop is the correct path in your environment, and make sure spark-env.sh contains an export of HADOOP_HOME as well.
For the Windows environment, open file load-spark-env.cmd in the Spark bin folder and add the following line:
set HADOOP_CONF_DIR=%HADOOP_HOME%\etc\hadoop
Just an update to the answer by Shubhangi:
cd $SPARK_HOME/bin
sudo nano load-spark-env.sh
Add the lines below, then save and exit:
export SPARK_LOCAL_IP="127.0.0.1"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export YARN_CONF_DIR="$HADOOP_HOME/etc/hadoop"
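With those variables in place, a quick end-to-end check is to submit the bundled SparkPi example to YARN (the examples jar location varies by Spark version, so treat the path as an assumption):
$SPARK_HOME/bin/spark-submit \
  --master yarn \
  --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 10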

hadoop installation - HADOOP_HOME set error

I'm installing the Hanborq optimized Hadoop Distribution (fully distributed mode). I followed all the steps exactly as described in the following links, and no errors occurred until the step that formats the HDFS filesystem:
$ hadoop namenode -format
An error occurred saying "HADOOP_HOME is not set correctly
please set your hadoop_home variable to the absolute path of the directory that contains hadoop-core-VERSION.jar"
installation_steps_1
installation_steps_2
It seems you did not set HADOOP_HOME correctly in your .bashrc file. Add the lines below to your .bashrc file and apply them with . .bashrc. Please reply if it works.
#HADOOP_HOME setup
export HADOOP_HOME="/usr/local/hadoop/hadoop-2.6"
PATH=$PATH:$HADOOP_HOME/bin
export PATH
Note: HADOOP_HOME is the location of the Hadoop directory.
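A quick way to confirm the variable points at the right place (the jar layout below assumes a Hadoop 1.x-style distribution, which is what the hadoop-core-VERSION.jar error message implies):
source ~/.bashrc
echo $HADOOP_HOME                  # should print the install directory
ls $HADOOP_HOME/hadoop-core-*.jar  # the jar the error message asks for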

running fpg algorithm of mahout on hadoop as cluster mod

I installed mahout-0.7 and hadoop-1.2.1 on Linux (CentOS); Hadoop is configured as a multi-node cluster.
I created a user named hadoop and installed Mahout and Hadoop under /home/hadoop/opt/.
I set MAHOUT_HOME, HADOOP_HOME, MAHOUT_LOCAL, etc. in the .bashrc file of the hadoop user's environment:
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# User specific aliases and functions
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.71/jre
export HADOOP_HOME=/home/hadoop/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=/opt/hadoop/conf
export MAHOUT_HOME=/home/hadoop/opt/mahout
export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
export PATH=$PATH:$MAHOUT_HOME/bin
I want to run Mahout on the Hadoop file system. When I run the following command, I get an error.
command:
hadoop@master mahout$ bin/mahout fpg -i /home/hadoop/output.dat -o patterns -method mapreduce -k 50 -s 2
error:
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Please help me. I tried but could not fix the error.
It seems that there are some conflicts in your configurations and usage.
At first glance, you can check the following:
To make sure that you've set the Mahout path correctly, use this command:
echo $MAHOUT_LOCAL
It should not return an empty string when you want to run Mahout locally.
Also HADOOP_CONF_DIR should be set to $HADOOP_HOME/conf
Here's a list of popular environment variables for Hadoop:
#HADOOP VARIABLES START
export JAVA_HOME=/path/to/jdk1.8.0/ #your jdk path
export HADOOP_HOME=/usr/local/hadoop #your hadoop path
export HADOOP_INSTALL=/usr/local/hadoop #your hadoop path
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
export HADOOP_CLASSPATH=/home/hduser/lib/* #third-party libraries to be loaded with Hadoop
#HADOOP VARIABLES END
You are also getting a heap error, so you should increase the heap size so that the JVM is able to initialize.
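One way to do that is via the heap-size environment variables (a hedged sketch: MAHOUT_HEAPSIZE is read by Mahout's bin/mahout launcher and HADOOP_HEAPSIZE by the Hadoop scripts, both in MB; 2048 is only an example value to adjust to your machine's RAM):
export MAHOUT_HEAPSIZE=2048
export HADOOP_HEAPSIZE=2048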
It would also help to add more information about your cluster:
How many machines are you using?
What are the hardware specs of these machines?
Which distribution and version of Hadoop?
