"hadoop namenode -format" formats wrong directory - hadoop

I'm trying to install Hadoop 1.1.2.21 on CentOS 6.3
I've configured dfs.name.dir in the /etc/hadoop/conf/hdfs-site.xml file:
<property>
  <name>dfs.name.dir</name>
  <value>/mnt/ext/hadoop/hdfs/namenode</value>
</property>
But when I run the "hadoop namenode -format" command, it formats /tmp/hadoop-hadoop/dfs/name instead.
What am I missing?

I ran into this problem myself and solved it, so I'm updating this answer.
Make sure your environment variable HADOOP_CONF_DIR points to the directory where Hadoop can find all your XML configuration files. That solved it for me.
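For example, a minimal check along those lines (the config path is the one from the question; adjust it to your layout):
# point Hadoop at the directory that actually contains hdfs-site.xml, core-site.xml, etc.
export HADOOP_CONF_DIR=/etc/hadoop/conf
echo $HADOOP_CONF_DIR      # confirm it is set in the current shell
hadoop namenode -format    # should now pick up dfs.name.dir from hdfs-site.xml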

It might be taking the path /tmp/hadoop-hadoop/dfs/name from hdfs-default.xml. Not sure why the value from hdfs-site.xml is not taken. Is dfs.name.dir marked as final in hdfs-default.xml?
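For reference, a property marked final in an earlier-loaded file such as hdfs-default.xml cannot be overridden by hdfs-site.xml. It would look roughly like this (illustrative sketch, not taken from the asker's files):
<property>
  <name>dfs.name.dir</name>
  <value>${hadoop.tmp.dir}/dfs/name</value>
  <!-- if this flag is present, hdfs-site.xml cannot override the value -->
  <final>true</final>
</property>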

Check whether some Hadoop process is already running in the background. This can happen if you aborted a previous process, it was never killed, and it has become a zombie process.
If that is the case, kill the process and then try to format the filesystem again.
Also check the permissions on the directory.
Try giving a different location for the directory and see whether the change is reflected.
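A quick way to check along these lines (jps and kill are standard tools; the directory path is the one from the question):
jps                                    # list running Java daemons such as NameNode, DataNode, JobTracker
kill -9 <pid>                          # kill a leftover daemon by its process id
ls -ld /mnt/ext/hadoop/hdfs/namenode   # confirm the directory exists and is writable by the hadoop user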

Please don't set HADOOP_CONF_DIR. Check your .bashrc file and remove it there.

Related

"JAVA_HOME is not set and could not be found" error when installing Hadoop

I'm new to Hadoop. During installation I gave hadoop-env.sh a JAVA_HOME path, but when I execute hdfs namenode -format it says that JAVA_HOME is not set. When I check again, the value is still saved in hadoop-env.sh. I can't bring HDFS up because of this. A detailed explanation would be much appreciated.
Thank you. I've attached screenshots of hadoop-env.sh and the error message for reference.
Can you restart the HDFS service after adding JAVA_HOME to hadoop-env.sh?
Also try echoing $JAVA_HOME before running the hadoop namenode -format command, to make sure the environment variable is set correctly.
https://hadoopwala.wordpress.com/2016/07/03/java/
Reference: Hadoop - Pseudo-Distributed Mode
Hope this helps.
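For example, a minimal hadoop-env.sh entry might look like this (the JDK path is an assumption; it must match your own installation):
# in hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Then verify from the shell before formatting:
echo $JAVA_HOME
hdfs namenode -format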

Error in formatting the namenode in a Hadoop single-node cluster

I am trying to install and configure Hadoop on Ubuntu 16.04, following the guidelines at https://data-flair.training/blogs/installation-of-hadoop-3-x-on-ubuntu/
All the steps ran successfully, but when I try to run the command hdfs namenode -format, I get an error message.
There is some problem with your .bashrc file. Check the variables inside it; I faced the same problem when I started with Hadoop. Set the correct path for each and every variable, and afterwards use source ~/.bashrc to apply the changes to your current shell.
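A typical set of entries looks roughly like this (the install paths are assumptions; use your actual Hadoop and JDK locations):
# in ~/.bashrc
export HADOOP_HOME=$HOME/hadoop-3.1.0
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Reload the file in the current shell and retry the format:
source ~/.bashrc
hdfs namenode -format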

Hadoop 2.4 installation for mac: file configuration

I am new to Hadoop. I am trying to set up Hadoop 2.4 on a MacBook Pro using Homebrew. I have been following the instructions on this website (http://shayanmasood.com/blog/how-to-setup-hadoop-on-mac-os-x-10-9-mavericks/). I have installed Hadoop on my machine, and now I am trying to configure it.
One needs to configure the following files according to the website.
mapred-site.xml
hdfs-site.xml
core-site.xml
hadoop-env.sh
But, it seems that this information is a bit old. In Terminal, I see the following.
In Hadoop's config file:
/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/hadoop-env.sh,
/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/mapred-env.sh and
/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/yarn-env.sh
$JAVA_HOME has been set to be the output of:
/usr/libexec/java_home
It seems that I have three files to configure here. Am I on the right track? There is configuration information for hadoop-env.sh and mapred-env.sh, but I have not seen any for yarn-env.sh. What do I have to do with this file?
The other question is how I can access these files for modification. I receive the following message in the terminal right now.
-bash: /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/hadoop-env.sh: Permission denied
If you have any suggestions, please let me know. Thank you very much for taking the time.
You can find the configuration files under:
/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop
Concerning the permissions on the scripts suggested by brew, you also need to change their mode. In the scripts directory (/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/):
sudo chmod +x *.sh
You should be looking in the hadoop/conf/ folder to amend the files below:
mapred-site.xml, hdfs-site.xml, core-site.xml
You can also change the permissions on hadoop-env.sh to make changes to that file.
Make sure that SSH is set up for your session, then use the start-all.sh command to start Hadoop.
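For reference, start-all.sh connects to localhost over SSH, so passwordless SSH is usually set up first (a common sketch, not from the original post):
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa     # generate a key with an empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost                                # should log in without prompting for a password
start-all.sh                                 # then start the Hadoop daemons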

A Hadoop DataNode error: host:port authority

Guys, when I try to run the Hadoop cluster, I can't get it working. The main error is like this:
The really strange thing is that the NameNode, JobTracker, SecondaryNameNode and TaskTracker are OK; only the DataNode is not.
My other configurations are like these:
hdfs-site.xml
core-site.xml
mapred-site.xml
I am not sure if it would help, but check this page
To quote from there,
Even though I configured the core-site.xml, mapred-site.xml & hdfs-site.xml under the /usr/local/hadoop/conf/ folder, by default the system was referring to /etc/hadoop/*.xml. Once I updated the configuration files in the /etc/hadoop location, everything started working.
Please make sure you are picking up the correct set of configuration files. It looks like a classpath-related issue, since your setup is bypassing whatever you have configured in your core-site.xml. Do you have any other Hadoop setup on the same machine that was done earlier, whose classpath you forgot to edit for the current setup?
Also, http:// is not required in mapred-site.xml.
HTH
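One way to check which configuration directory a given installation is actually using (standard Hadoop commands; no paths from the question are assumed):
echo $HADOOP_CONF_DIR    # empty means Hadoop falls back to its bundled conf directory
hadoop classpath         # the configuration directory appears first in the printed classpath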

Where does HDFS store files locally by default?

I am running Hadoop with the default configuration in a one-node cluster, and would like to find out where HDFS stores files locally.
Any ideas?
Thanks.
You need to look at the dfs.data.dir setting in your hdfs-default.xml configuration file. The default value is ${hadoop.tmp.dir}/dfs/data, and note that ${hadoop.tmp.dir} is actually defined in core-default.xml, described here.
The configuration options are described here. The description for this setting is:
Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
It seems that for the current version (2.7.1) the directory is
/tmp/hadoop-${user.name}/dfs/data
based on the dfs.datanode.data.dir and hadoop.tmp.dir settings from:
http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/core-default.xml
As "more recent answer" and to clarify hadoop version numbers:
If you use Hadoop 1.2.1 (or something similar), #Binary Nerd's answer is still true.
But if you use Hadoop 2.1.0-beta (or something similar), you should read the configuration documentation here and the option you want to set is: dfs.datanode.data.dir
For Hadoop 3.0.0, the local HDFS data path is given by the property "dfs.datanode.data.dir".
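If the default location is not wanted, the override in hdfs-site.xml looks roughly like this (the target path below is an illustrative assumption):
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- local filesystem path where the DataNode keeps its blocks -->
  <value>/data/hadoop/hdfs/datanode</value>
</property>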
Run this at the command prompt, and you will see the HDFS root listing:
bin/hadoop fs -ls /
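If what you need is the local directory rather than the HDFS listing, the effective value can also be queried directly (standard hdfs command on 2.x and later):
hdfs getconf -confKey dfs.datanode.data.dir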
