Hadoop release missing /conf directory

I am trying to install a single node setup of Hadoop on Ubuntu.
I started following the instructions in the Hadoop 2.3 docs.
But I seem to be missing something very simple.
First, it says:
To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.
Then,
Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
However, I can't seem to find the conf directory.
I downloaded a 2.3 release from one of the mirrors and unpacked the tarball; an ls of the extracted directory returns:
$ ls
bin etc include lib libexec LICENSE.txt NOTICE.txt README.txt sbin share
I was able to find the file they were referencing, just not in a conf directory:
$ find . -name hadoop-env.sh
./etc/hadoop/hadoop-env.sh
Am I missing something, or am I grabbing the wrong package? Or are the docs just outdated?
If so, does anyone know where some more up-to-date docs are?

I am trying to install Hadoop in pseudo-distributed mode and ran into the same issue.
The book Hadoop: The Definitive Guide (Third Edition) says, on page 618:
In Hadoop 2.0 and later, MapReduce runs on YARN and there is an additional configuration file called yarn-site.xml. All the configuration files should go in the etc/hadoop subdirectory.
Hope this confirms that etc/hadoop is the correct place.
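For example, a minimal sketch of that edit against the 2.3 layout shown in the question (the JDK path is an assumption for an Ubuntu OpenJDK install; substitute your own):
$ cd hadoop-2.3.0
$ # assumption: OpenJDK 7 on Ubuntu; point this at your actual JDK location
$ echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' >> etc/hadoop/hadoop-env.sh
$ bin/hadoop version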

I think the docs need to be updated. Although the directory structure has changed, the names of important files like hadoop-env.sh, core-site.xml, and hdfs-site.xml have not changed. You may find the following link useful for getting started.
http://codesfusion.blogspot.com/2013/10/setup-hadoop-2x-220-on-ubuntu.html

In Hadoop 1:
${HADOOP_HOME}/conf/
In Hadoop 2:
${HADOOP_HOME}/etc/hadoop/
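A quick sketch to confirm which layout an install uses, and to point the scripts at the Hadoop 2 location explicitly (assumes HADOOP_HOME is already set):
$ ls ${HADOOP_HOME}/conf 2>/dev/null || ls ${HADOOP_HOME}/etc/hadoop
$ export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop   # Hadoop 2.x layout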

In the Hadoop 2.7.3 source tree, the file is in hadoop-common/src/main/conf/:
$ sudo find . -name hadoop-env.sh
./hadoop-2.7.3-src/hadoop-common-project/hadoop-common/src/main/conf/hadoop-env.sh

Just adding a note on the blog post http://codesfusion.blogspot.com/2013/10/setup-hadoop-2x-220-on-ubuntu.html. The blog post is fantastic and very useful; that's how I got started. One aspect that took me a little time to figure out is that the blog uses a simplified way of presenting configuration in the Hadoop conf files such as conf/core-site.xml, hdfs-site.xml, etc., as follows:
<!--fs.default.name is the name node URI -->
<configuration>
fs.default.name
hdfs://localhost:9000
</configuration>
As per the official docs, there is a more rigorous way, which is useful when you have more than one property: add each one as follows (note that the description is optional):
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>the name node URI</description>
</property>
<!-- Add more configuration properties here -->
</configuration>
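Once the property is in place, one way to check that Hadoop actually picks up the value (fs.default.name is deprecated in Hadoop 2 in favor of fs.defaultFS, but both names resolve to the same setting) is:
$ bin/hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000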

For Hadoop 3.3.1 (2022), the conf directory in the source tree is located under the src/main directory:
$HOME/hadoop/hadoop3.3/hadoop-common-project/hadoop-common/src/main/

Related

Hadoop 2.4 installation for Mac: file configuration

I am new to Hadoop. I am trying to set up Hadoop 2.4 on a MacBook Pro using Homebrew. I have been following the instructions on this website (http://shayanmasood.com/blog/how-to-setup-hadoop-on-mac-os-x-10-9-mavericks/). I have installed Hadoop on my machine, and now I am trying to configure it.
One needs to configure the following files according to the website.
mapred-site.xml
hdfs-site.xml
core-site.xml
hadoop-env.sh
But it seems that this information is a bit old. In the Terminal, I see the following:
In Hadoop's config file:
/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/hadoop-env.sh,
/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/mapred-env.sh and
/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/yarn-env.sh
$JAVA_HOME has been set to be the output of:
/usr/libexec/java_home
It seems that I have three files to configure here. Am I on the right track? There is information on configuring hadoop-env.sh and mapred-env.sh, but I have not seen any for yarn-env.sh. What do I have to do with this file?
The other question is how I can access these files for modification. I receive the following message in the terminal right now:
-bash: /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/hadoop-env.sh: Permission denied
If you have any suggestions, please let me know. Thank you very much for taking your time.
You can find the configuration files under:
/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop
As for the permissions on the scripts suggested by brew, you also need to change their mode.
In the scripts directory (/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/):
sudo chmod +x *.sh
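Note that the -bash: ... Permission denied message above typically just means the shell tried to execute hadoop-env.sh because its path was entered directly at the prompt. To simply edit the file, opening it in an editor is enough; chmod +x is only needed if you actually want to execute the scripts. For example:
$ nano /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/hadoop-env.sh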
You should be looking in the hadoop/conf/ folder to amend the following:
mapred-site.xml, hdfs-site.xml, core-site.xml
And you can change the permissions of hadoop-env.sh to make changes to that file.
Make sure that your session is over SSH, then use the start-all.sh command to start Hadoop.

Where are the configuration files stored in CDH4?

I set up CDH4, and now I can configure Hadoop through the web page.
I want to know where CDH puts the configuration files on the local file system.
For example, I want to find core-site.xml, but where is it?
By default, the installation of CDH has the conf directory located in
/etc/hadoop/
You could always use the following command to find the file:
$ sudo find / -name "core-site.xml"
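On CDH, /etc/hadoop/conf is usually a symlink managed by the alternatives system, so you can also trace it back to the concrete directory. A sketch (the alternative name hadoop-conf is an assumption; verify it on your system):
$ ls -l /etc/hadoop/conf
$ update-alternatives --display hadoop-conf   # on RHEL/CentOS: alternatives --display hadoop-conf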

Failed to set permissions of path: \tmp

Failed to set permissions of path: \tmp\hadoop-MayPayne\mapred\staging\MayPayne2016979439\.staging to 0700
I'm getting this error when the MapReduce job executes. I was using Hadoop 1.0.4; then I learned it's a known issue and tried with 1.2.0, but the issue still exists. Can anyone tell me a Hadoop version in which this issue has been resolved?
Thank you all in advance.
I was getting the same exception while running Nutch 1.7 on Windows 7.
bin/nutch crawl urls -dir crawl11 -depth 1 -topN 5
The following steps worked for me:
Download the pre-built JAR, patch-hadoop_7682-1.0.x-win.jar, from the Download section; you can get the steps for Hadoop there as well.
Copy patch-hadoop_7682-1.0.x-win.jar to the ${NUTCH_HOME}/lib directory
Modify ${NUTCH_HOME}/conf/nutch-site.xml to enable the overridden implementation as shown below:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.file.impl</name>
<value>com.conga.services.hadoop.patch.HADOOP_7682.WinLocalFileSystem</value>
<description>Enables patch for issue HADOOP-7682 on Windows</description>
</property>
</configuration>
Run your job as usual (using Cygwin).
Downloading hadoop-core-0.20.2.jar and putting it in Nutch's lib directory resolved the problem for me.
(In the case of Windows) If it is still not solved for you, try using this Hadoop patch.
Set the VM argument below to override the default /tmp directory:
-Dhadoop.tmp.dir=<a directory location with write permission>
Also using hadoop-core-0.20.2.jar (http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core/0.20.2) will solve the reported issue.
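For reference, a sketch of one way to apply the hadoop.tmp.dir override mentioned above when the job is launched through ToolRunner/GenericOptionsParser (the JAR name, class, input/output paths, and Cygwin-style directory are made up for illustration):
$ hadoop jar myjob.jar MyJob -D hadoop.tmp.dir=/cygdrive/c/hadoop-tmp input output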
I managed to solve this by changing the hadoop-core JAR file a little bit: I changed the method in FileUtil.java that causes the error, recompiled, and included the result in my Eclipse project. Now the error is gone. I suggest everyone do the same.

Apache Hadoop 2.0.0 alpha version installation on a full cluster using federation

I had installed the Hadoop stable version successfully, but I am confused while installing the hadoop-2.0.0 version.
I want to install hadoop-2.0.0-alpha on two nodes, using federation on both machines; rsi-1 and rsi-2 are the hostnames.
What should the values of the properties below be for an implementation of federation? Both machines are also used as datanodes.
fs.defaultFS
dfs.federation.nameservices
dfs.namenode.name.dir
dfs.datanode.data.dir
yarn.nodemanager.localizer.address
yarn.resourcemanager.resource-tracker.address
yarn.resourcemanager.scheduler.address
yarn.resourcemanager.address
One more point: in the stable version of Hadoop I have the configuration files under the conf folder in the installation directory.
But in the 2.0.0-alpha version, there is an etc/hadoop directory and it doesn't have mapred-site.xml or hadoop-env.sh. Do I need to copy the conf folder under the share folder into the Hadoop home directory, or do I need to copy these files from the share folder into the etc/hadoop directory?
Regards, Rashmi
You can run hadoop-setup-conf.sh in the sbin folder. It instructs you step by step through the configuration.
Please remember, when it asks you to input a directory path, to use the full path;
e.g., when it asks for the conf directory, you should input /home/user/Documents/hadoop-2.0.0/etc/hadoop.
After it completes, remember to check every configuration file in etc/hadoop.
In my experience, I modified the JAVA_HOME variable in hadoop-env.sh and some properties in core-site.xml and mapred-site.xml.
Regards
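As a starting point for the federation properties asked about above, here is a hypothetical two-nameservice sketch of the HDFS side for hosts rsi-1 and rsi-2 (the nameservice IDs, port 8020, and local directories are assumptions; the yarn.* addresses are configured separately in yarn-site.xml):
<!-- hypothetical hdfs-site.xml sketch; adjust nameservice IDs, port, and directories -->
<configuration>
<property>
<name>dfs.federation.nameservices</name>
<value>ns1,ns2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1</name>
<value>rsi-1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns2</name>
<value>rsi-2:8020</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/user/hadoop/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/user/hadoop/data</value>
</property>
</configuration>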

Mkdirs failed to create hadoop.tmp.dir

I have upgraded from Apache Hadoop 0.20.2 to the newest stable release, 0.20.203. While doing that, I also updated all configuration files properly. However, I am getting the following error while trying to run a job via a JAR file:
$ hadoop jar myjar.jar
Mkdirs failed to create /mnt/mydisk/hadoop/tmp
where /mnt/mydisk/hadoop/tmp is the location of hadoop.tmp.dir as stated in the core-site.xml:
..
<property>
<name>hadoop.tmp.dir</name>
<value>/mnt/mydisk/hadoop/tmp</value>
</property>
..
I've already checked that the directory exists and that the permissions for the user hadoop are set correctly. I've also tried deleting the directory so that Hadoop itself can create it, but that didn't help.
Executing a Hadoop job with Hadoop version 0.20.2 worked out of the box; however, something broke after the update. Can someone help me track down the problem?
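For reference, checks along these lines confirm existence, ownership, and writability for the hadoop user (the user and group name hadoop is taken from the question; adjust if your jobs run as a different account):
$ ls -ld /mnt/mydisk/hadoop /mnt/mydisk/hadoop/tmp
$ sudo -u hadoop touch /mnt/mydisk/hadoop/tmp/.write-test && echo writable
$ sudo chown -R hadoop:hadoop /mnt/mydisk/hadoop   # only if ownership turns out to be wrong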
