No mapred-site.xml.template file in Hadoop 3.0.0

I am in the process of installing a pseudo-distributed (single-node) Hadoop cluster on my Windows laptop using Oracle VirtualBox 5.1 and Ubuntu. I have already downloaded version 3.0.0 from the mirror site. I am trying to create the mapred-site.xml file by typing the command
sudo cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
The mapred-site.xml.template file is not in the directory /usr/share/hadoop/etc/hadoop.
Is the mapred-site.xml.template file not included in this release?
I have already searched Stack Overflow and Google for this issue with no success.

There is no .template file in Hadoop 3.0.0; there should already be a mapred-site.xml file in Hadoop 3.0.0. If the file is not there for some reason, you can create the XML file yourself, starting with the declaration and the stylesheet reference to configuration.xsl:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
Then fill out your <configuration> element.
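For reference, a minimal mapred-site.xml for a pseudo-distributed Hadoop 3 setup usually just points MapReduce at YARN; a sketch (adjust the value to your own setup):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- run MapReduce jobs on YARN instead of the default local runner -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>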
Try 2.7.5 if you want to cp that file.

Related

Install Hive on Windows: 'hive' is not recognized as an internal or external command, operable program or batch file

I have installed Hadoop 2.7.3 on Windows and I am able to start the cluster. Now I would like to have Hive, and I went through the steps below:
1. Downloaded db-derby-10.12.1.1-bin.zip, unpacked it, and started startNetworkServer -h 0.0.0.0.
2. Downloaded apache-hive-1.1.1-bin.tar.gz from a mirror site and unpacked it. Created hive-site.xml with the properties below:
javax.jdo.option.ConnectionURL
javax.jdo.option.ConnectionDriverName
hive.server2.enable.impersonation
hive.server2.authentication
datanucleus.autoCreateTables
hive.metastore.schema.verification
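For reference, with Derby running as a network server the first two of these are typically set along the following lines; the values are illustrative and not necessarily what was used in this setup:
<!-- illustrative values for a Derby network-server metastore -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.ClientDriver</value>
</property>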
I have also set up HIVE_HOME and updated PATH. I also set HIVE_LIB and HIVE_BIN_PATH.
When I run hive from bin I get:
'hive' is not recognized as an internal or external command,
operable program or batch file.
The bin/hive appears as file type "File".
Please suggest; I am not sure if the Hive version is the correct one.
Thank you.
If someone is still running into this problem, here's what I did to get Hive installed on Windows.
My configuration is as below (latest as of this writing):
I am using Windows 10
Hadoop 2.9.1
derby 10.14
hive 2.3.4 (my Hive version does not contain bin/hive.cmd, the file needed to run Hive on Windows)
@wheeler above mentioned that Hive is for Linux. Here's the hack to make it work for Windows.
My Hive installation did not come with Windows executable files, hence the hack!
STEP 1
There are 3 files you need to download from https://svn.apache.org/repos/:
https://svn.apache.org/repos/asf/hive/trunk/bin/hive.cmd
save it in your %HIVE_HOME%/bin/ as hive.cmd
https://svn.apache.org/repos/asf/hive/trunk/bin/ext/cli.cmd
save it in your %HIVE_HOME%/bin/ext/ as cli.cmd
https://svn.apache.org/repos/asf/hive/trunk/bin/ext/util/execHiveCmd.cmd
save it in your %HIVE_HOME%/bin/ext/util/ as execHiveCmd.cmd
where %HIVE_HOME% is where Hive is installed.
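If curl is available (recent Windows 10 builds ship it), the downloads can be scripted from cmd roughly as follows; this is only a sketch that assumes %HIVE_HOME% is already set and simply mirrors the three paths listed above:
REM create the nested ext\util directory if it does not exist yet
mkdir %HIVE_HOME%\bin\ext\util
REM fetch the three .cmd scripts from the Apache SVN repository
curl -o %HIVE_HOME%\bin\hive.cmd https://svn.apache.org/repos/asf/hive/trunk/bin/hive.cmd
curl -o %HIVE_HOME%\bin\ext\cli.cmd https://svn.apache.org/repos/asf/hive/trunk/bin/ext/cli.cmd
curl -o %HIVE_HOME%\bin\ext\util\execHiveCmd.cmd https://svn.apache.org/repos/asf/hive/trunk/bin/ext/util/execHiveCmd.cmd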
STEP 2
Create a tmp dir under your HIVE_HOME (on the local machine, not on HDFS)
Give 777 permissions to this tmp dir
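On Windows, one common way to do both of these steps is with the winutils tool; this is only a sketch and assumes %HADOOP_HOME%\bin\winutils.exe exists from your Hadoop-on-Windows setup:
REM create the tmp directory under the Hive installation
mkdir %HIVE_HOME%\tmp
REM give it wide-open permissions, as described above
%HADOOP_HOME%\bin\winutils.exe chmod 777 %HIVE_HOME%\tmp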
STEP 3
Open your conf/hive-default.xml.template and save it as conf/hive-site.xml
Then, in this hive-site.xml, paste the properties below at the top, inside the <configuration> element:
<property>
    <name>system:java.io.tmpdir</name>
    <value>{PUT YOUR HIVE HOME DIR PATH HERE}/tmp</value>
    <!-- MY PATH WAS C:/BigData/hive/tmp -->
</property>
<property>
    <name>system:user.name</name>
    <value>${user.name}</value>
</property>
(check the indents)
STEP 4
- Run the Hadoop services:
start-dfs
start-yarn
- Run Derby:
startNetworkServer -h 0.0.0.0
Make sure you have all of the above services running (a quick jps check is shown after this list).
- Go to cmd, change to HIVE_HOME/bin, and run the hive command:
hive
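As a quick sanity check that the Hadoop and YARN daemons are really up before launching hive, run jps (it lists running Java processes); in a standard pseudo-distributed setup you would expect to see at least NameNode, DataNode, ResourceManager, and NodeManager in its output:
jps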
Version 1.1.1 of Apache Hive does not contain anything that can be executed on Windows (only Linux binaries). However, version 2.1.1 does have Windows capabilities (its bin directory includes .cmd scripts). So even if you had your path set correctly, cmd wouldn't be able to find an executable it could run, since one doesn't exist in 1.1.1.
I also ran into this problem. To get the files necessary to run Hive on Windows, I downloaded hive-2.3.9 and hive-3.1.2, but neither of them has these files. So we have two options:
Option 1: install hive-2.1.0 and set it up, as I have tried with:
Hadoop 2.8.0
derby 10.12.1.1
hive 2.1.0
Option 2: download the whole bin directory and replace your Hive bin directory with it. To download it we need the wget utility for Windows. After that, run this command (-r recurses, -np stops it from ascending to the parent directory, -nH and --cut-dirs=3 strip the host and leading path components, and -R index.html skips the generated index pages):
wget -r -np -nH --cut-dirs=3 -R index.html https://svn.apache.org/repos/asf/hive/trunk/bin/
The downloaded bin directory will mirror the layout of the SVN bin directory.
After replacing it you are ready to go. So now my configuration is as below:
Hadoop 3.3.1
derby 10.13.1.1
hive 2.3.9

Hadoop release missing /conf directory

I am trying to install a single node setup of Hadoop on Ubuntu.
I started following the instructions on the Hadoop 2.3 docs.
But I seem to be missing something very simple.
First, it says:
To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.
Then,
Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
However, I can't seem to find the conf directory.
I downloaded a release of 2.3 from one of the mirrors and unpacked the tarball; an ls of the contents returns:
$ ls
bin etc include lib libexec LICENSE.txt NOTICE.txt README.txt sbin share
I was able to find the file they were referencing, just not in a conf directory:
$ find . -name hadoop-env.sh
./etc/hadoop/hadoop-env.sh
Am I missing something, or am I grabbing the wrong package? Or are the docs just outdated?
If so, anyone know where some more up-to date docs are?
I am trying to install Hadoop in pseudo-distributed mode and am running into the same issue.
Following the book Hadoop: The Definitive Guide (Third Edition), page 618 says:
In Hadoop 2.0 and later, MapReduce runs on YARN and there is an additional configuration file called yarn-site.xml. All the configuration files should go in the etc/hadoop subdirectory.
Hope this confirms that etc/hadoop is the correct place.
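For example, the JAVA_HOME step from the docs would now be done in etc/hadoop/hadoop-env.sh rather than conf/hadoop-env.sh; a sketch, where the Java path is just a placeholder for a typical OpenJDK install on Ubuntu:
# etc/hadoop/hadoop-env.sh
# point Hadoop at the root of your Java installation (adjust the path to your system)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64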
I think the docs need to be updated. Although the directory structure has changed, the names of important files like hadoop-env.sh, core-site.xml, and hdfs-site.xml have not changed. You may find the following link useful for getting started:
http://codesfusion.blogspot.com/2013/10/setup-hadoop-2x-220-on-ubuntu.html
In Hadoop 1:
${HADOOP_HOME}/conf/
In Hadoop 2:
${HADOOP_HOME}/etc/hadoop
In Hadoop 2.7.3 (the source distribution), the file is in hadoop-common/src/main/conf/:
$ sudo find . -name hadoop-env.sh
./hadoop-2.7.3-src/hadoop-common-project/hadoop-common/src/main/conf/hadoop-env.sh
Just adding a note on the blog post http://codesfusion.blogspot.com/2013/10/setup-hadoop-2x-220-on-ubuntu.html. The blog post is fantastic and very useful; that's how I got started. One aspect that took me a little time to figure out is that this blog seems to use a simplified way of providing configuration in the Hadoop conf files such as conf/core-site.xml, hdfs-site.xml,
etc., as follows:
<!--fs.default.name is the name node URI -->
<configuration>
fs.default.name
hdfs://localhost:9000
</configuration>
As per the official docs, there is a more rigorous way, which is useful when you have more than one property: add each one as follows (please note that the description is optional :-) )
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>the name node URI</description>
</property>
<!--Add more configuration properties here -->
</configuration>
For Hadoop 3.3.1 (2022), the conf directory lives in the source tree under the src/main directory:
$HOME/hadoop/hadoop3.3/hadoop-common-project/hadoop-common/src/main/

Where are the configuration files stored in CDH4?

I set up CDH4.
Now I can configure Hadoop from the web page.
I want to know where CDH puts the configuration files on the local file system.
For example, I want to find core-site.xml, but where is it?
By default, the installation of CDH has the conf directory located in
/etc/hadoop/
You could always use the following command to find the file:
$ sudo find / -name "core-site.xml"

Failed to set permissions of path: \tmp

Failed to set permissions of path: \tmp\hadoop-MayPayne\mapred\staging\MayPayne2016979439\.staging to 0700
I'm getting this error when the MapReduce job executes. I was using Hadoop 1.0.4, then I learned that this is a known issue, and I tried it with 1.2.0, but the issue still exists. Can anyone tell me a Hadoop version in which this issue has been resolved?
Thank you all in advance
I was getting the same exception while running Nutch 1.7 on Windows 7.
bin/nutch crawl urls -dir crawl11 -depth 1 -topN 5
The following steps worked for me
Download the pre-built JAR, patch-hadoop_7682-1.0.x-win.jar, from the Download section. You may also find the steps for Hadoop there.
Copy patch-hadoop_7682-1.0.x-win.jar to the ${NUTCH_HOME}/lib directory
Modify ${NUTCH_HOME}/conf/nutch-site.xml to enable the overridden implementation as shown below:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.file.impl</name>
<value>com.conga.services.hadoop.patch.HADOOP_7682.WinLocalFileSystem</value>
<description>Enables patch for issue HADOOP-7682 on Windows</description>
</property>
</configuration>
Run your job as usual (using Cygwin).
Downloading hadoop-core-0.20.2.jar and putting it in Nutch's lib directory resolved the problem for me.
(In the case of Windows) If it is still not solved for you, try using this Hadoop patch.
Set the VM argument below:
-Dhadoop.tmp.dir=<A directory location with write permission>
to override the default /tmp directory.
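For example, if the job is launched directly with java, the property can go on the command line (the jar name and directory here are placeholders); in Eclipse the same string goes in the Run Configuration's VM arguments box:
java -Dhadoop.tmp.dir=C:\hadoop-tmp -jar myjob.jar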
Also, using hadoop-core-0.20.2.jar (http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core/0.20.2) will solve the reported issue.
I managed to solve this by changing the hadoop-core JAR file a little. I changed the error-causing method in FileUtil.java in the hadoop-core JAR, recompiled it, and included it in my Eclipse project. Now the error is gone. I suggest you do the same.

Mkdirs failed to create hadoop.tmp.dir

I have upgraded from Apache Hadoop 0.20.2 to the newest stable release, 0.20.203. While doing that, I've also updated all configuration files properly. However, I am getting the following error while trying to run a job via a JAR file:
$ hadoop jar myjar.jar
$ Mkdirs failed to create /mnt/mydisk/hadoop/tmp
where /mnt/mydisk/hadoop/tmp is the location of hadoop.tmp.dir as stated in the core-site.xml:
..
<property>
<name>hadoop.tmp.dir</name>
<value>/mnt/mydisk/hadoop/tmp</value>
</property>
..
I've already checked that the directory exists and that the permissions for the user hadoop are set correctly. I've also tried deleting the directory so that Hadoop itself can create it, but that didn't help.
Executing a Hadoop job with Hadoop version 0.20.2 worked out of the box. However, something is broken after the update. Can someone help me track down the problem?
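For anyone debugging the same thing, a quick way to confirm the directory really is writable by the hadoop user (assuming the user is literally named hadoop):
# show owner and permissions of the target directory
ls -ld /mnt/mydisk/hadoop/tmp
# try to create a file in it as the hadoop user
sudo -u hadoop touch /mnt/mydisk/hadoop/tmp/write-test && echo writable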
