Apache Sqoop - addtowar.sh not found

I just downloaded the Sqoop installation file sqoop-1.99.3-bin-hadoop100.tar.gz, but I am not able to find the file addtowar.sh in it. I am following the installation instructions from https://sqoop.apache.org/docs/1.99.1/Installation.html. The following is the listing of the bin directory:
hduser@system:~/sqoop-1.99.3-bin-hadoop100/bin$ ls -ltr
total 8
-rwxr-xr-x 1 hduser2 hadoop 1361 Oct 18 2013 sqoop-sys.sh
-rwxr-xr-x 1 hduser2 hadoop 3439 Oct 18 2013 sqoop.sh
Am I missing something here, or are the installation instructions out of date?

You should refer to the docs for the version you are using. For 1.99.3, see:
http://sqoop.apache.org/docs/1.99.3/Installation.html
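For what it's worth, the scripts you do see in bin are the entry points in the 1.99.x layout. Going by the 1.99.3 docs, the server and client are started roughly like this (confirm the exact invocation against the linked page):
./bin/sqoop.sh server start
./bin/sqoop.sh client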

I don't have a direct answer, but I have been tracking this down and it seems addtowar.sh has been removed (I am also using 1.99.3) in favor of adding the Hadoop jar directories to the common.loader line in catalina.properties. However, I cannot get this to work.

Definitely follow the 1.99.3 documentation:
http://sqoop.apache.org/docs/1.99.3/Installation.html
But that documentation fails to mention that you need to add all of the Hadoop libraries to the common.loader variable in catalina.properties.
To get the Sqoop client working, I had to add the following to catalina.properties:
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/Users/bone/tools/hadoop/share/hadoop/common/*.jar,/Users/bone/tools/hadoop/share/hadoop/yarn/lib/*.jar,/Users/bone/tools/hadoop/share/hadoop/mapreduce/*.jar,/Users/bone/tools/hadoop/share/hadoop/tools/lib/*.jar,/Users/bone/tools/hadoop/share/hadoop/common/lib/*.jar
In my case, /Users/bone/tools/hadoop was a complete install of hadoop-2.4.0.
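As a rough template, if your Hadoop lives elsewhere, the same line with a placeholder install path would look like the following (/usr/local/hadoop is only an example here, and the jar subdirectories can differ between Hadoop releases):
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/local/hadoop/share/hadoop/common/*.jar,/usr/local/hadoop/share/hadoop/common/lib/*.jar,/usr/local/hadoop/share/hadoop/yarn/lib/*.jar,/usr/local/hadoop/share/hadoop/mapreduce/*.jar,/usr/local/hadoop/share/hadoop/tools/lib/*.jar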

Related

apache hadoop-tools installation in cloudera

I have a Cloudera 5.14 development environment. I want to install the Apache hadoop-tools (link) in the Cloudera distribution.
Specifically, I need hadoop-resourceestimator (link).
There is no documentation available on how to install it.
Any leads will be highly appreciated.
AFAIK, CDH 5.14.x is based on the old Hadoop 2.6.0, which does not include the resource estimator tool.
It is available but not supported in CDH 6 ("not supported" is not the same as "not available"). You can find the resource estimator in the CDH 6.x distribution:
-rw-r--r-- 1 root root 71105 Dec 6 03:13 /opt/cloudera/parcels/CDH/jars/hadoop-resourceestimator-3.0.0-cdh6.0.x-SNAPSHOT.jar
and you're free to use it, but Cloudera Support won't provide any help.

The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw- (on Windows)

I am running Spark on Windows 7. When I use Hive, I see the following error
The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
The permissions are set as follows:
C:\tmp>ls -la
total 20
drwxr-xr-x 1 ADMIN Administ 0 Dec 10 13:06 .
drwxr-xr-x 1 ADMIN Administ 28672 Dec 10 09:53 ..
drwxr-xr-x 2 ADMIN Administ 0 Dec 10 12:22 hive
I have set "full control" for all users via Windows -> Properties -> Security -> Advanced.
But I still see the same error.
I have checked a bunch of links; some say this is a bug in Spark 1.5?
First of all, make sure you are using the correct winutils for your OS. The next step is permissions.
On Windows, you need to run the following command in cmd:
D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
This assumes you have already downloaded winutils and set the HADOOP_HOME variable.
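For example, assuming winutils.exe was unpacked under D:\winutils (adjust the paths to your machine), the whole setup in cmd would be roughly:
set HADOOP_HOME=D:\winutils
%HADOOP_HOME%\bin\winutils.exe chmod 777 D:\tmp\hive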
First things first, check your computer's domain. Try
c:\work\hadoop-2.2\bin\winutils.exe ls c:/tmp/hive
If this command reports access denied or FindFileOwnerAndPermission error (1789): The trust relationship between this workstation and the primary domain failed,
it means your domain controller is not reachable. A possible reason is that you are not on the same VPN as your domain controller. Connect to the VPN and try again.
Then try the solution provided by Viktor or Nishu.
You need to set this directory's permissions on HDFS, not on your local filesystem. /tmp doesn't mean C:\tmp unless you set fs.defaultFS in core-site.xml to file://c:/, which is probably a bad idea.
Check it using
hdfs dfs -ls /tmp
Set it using
hdfs dfs -chmod 777 /tmp/hive
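For context, when Spark really is talking to HDFS, fs.defaultFS in core-site.xml points at the NameNode rather than the local filesystem; a typical entry looks like this (host and port are placeholders):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>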
The following solution worked on Windows for me:
First, I defined HADOOP_HOME. It is described in detail here.
Next, I did the same as Nishu Tayal, but with one difference: C:\temp\hadoop\bin\winutils.exe chmod 777 \tmp\hive
Note that \tmp\hive is not a local directory.
Error while starting spark-shell on a VM running on Windows:
Error msg: The root scratch dir: /tmp/hive on HDFS should be writable. Permission denied
Solution:
/tmp/hive is a temporary directory; only temporary files are kept in this location. There is no problem even if we delete this directory: it will be re-created with the proper permissions when required.
Step 1) In HDFS, remove the /tmp/hive directory ==> hdfs dfs -rm -r /tmp/hive
Step 2) At the OS level too, delete the directory /tmp/hive ==> rm -rf /tmp/hive
After this, I started spark-shell and it worked fine.
This is a simple 4 step process:
For Spark 2.0+:
Download Hadoop for Windows / Winutils
Add this to your code (before SparkSession initialization):
if (getOS() == "windows") {
    System.setProperty("hadoop.home.dir", "C:/Users//winutils-master/hadoop-2.7.1");
}
Add this to your SparkSession builder (you can change it to C:/Temp instead of Desktop).
.config("hive.exec.scratchdir","C:/Users//Desktop/tmphive")
Open cmd.exe and run:
"path\to\hadoop-2.7.1\bin\winutils.exe" chmod 777 C:\Users\\Desktop\tmphive
The main reason is that you started Spark from the wrong directory. Please create the folder D:/tmp/hive (give it full permissions) and start Spark from the D: drive:
D:\> spark-shell
Now it will work. :)
Can you please try giving 777 permission to the folder /tmp/hive? I think Spark runs as an anonymous user (which falls into the "other" user category), and this permission should be applied recursively.
I had the same issue with Spark 1.5.1 for Hive, and it worked after giving 777 permission recursively using the command below on Linux:
chmod -R 777 /tmp/hive
There is a Spark JIRA issue for this, which was resolved a few days ago. Here is the link:
https://issues.apache.org/jira/browse/SPARK-10528
The comments cover all the options, but there is no guaranteed solution.
The issue was resolved in Spark 2.0.2 (Nov 14, 2016); use that version.
The 2.1.0 release (Dec 28, 2016) has the same issue.
Use the latest version of winutils.exe and try again: https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe
I also faced this issue. The issue is related to the network. I installed Spark on Windows 7 using a particular domain.
The domain name can be checked under:
Start -> Computer -> Right click -> Properties -> Computer name, domain and workgroup settings -> click on Change -> Computer Name (tab) -> click on Change -> Domain name.
When I run the spark-shell command on that domain, it works fine, without any error.
On other networks I received a write permission error.
To avoid this error, run the Spark command on the domain specified in the path above.
I was getting the same error "The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-" on Windows 7. Here is what I did to fix the issue:
I had installed Spark on C:\Program Files (x86)..., so it was looking for /tmp/hive under C:, i.e. C:\tmp\hive.
I downloaded winutils.exe from https://github.com/steveloughran/winutils. I chose the version matching the Hadoop package I had chosen when I installed Spark, i.e. hadoop-2.7.1.
(You can find it under the bin folder, i.e. https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin)
I then used the following command to make the C:\tmp\hive folder writable:
winutils.exe chmod 777 \tmp\hive
Note: With a previous version of winutils, the chmod command also set the required permission without error, but Spark still complained that the /tmp/hive folder was not writable.
Using the correct version of winutils.exe did the trick for me. The winutils should be from the version of Hadoop that Spark has been pre-built for.
Set the HADOOP_HOME environment variable to the location containing bin\winutils.exe. I stored winutils.exe alongside the C:\Spark\bin files, so now my SPARK_HOME and HADOOP_HOME both point to the same location, C:\Spark.
Now that winutils has been added to the path, give permissions to the hive folder using winutils.exe chmod 777 C:\tmp\hive
You don't have to fix the permissions of the /tmp/hive directory yourself (as some of the answers suggest); winutils can do that for you. Download the appropriate version of winutils from https://github.com/steveloughran/winutils and move it to Spark's bin directory (e.g. C:\opt\spark\spark-2.2.0-bin-hadoop2.6\bin). That will fix it.
I was running a Spark test from IDEA, and in my case the issue was the wrong winutils.exe version. I think you need to match it with your Hadoop version. You can find winutils.exe here.
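When launching from an IDE like this, it can help to wire everything up in code. Below is a minimal sketch in Java that combines the System.setProperty and hive.exec.scratchdir ideas from the earlier answers; the class name and both paths are placeholders, not anything prescribed by Spark:
import org.apache.spark.sql.SparkSession;

public class HiveScratchDirExample {
    public static void main(String[] args) {
        // On Windows, point Hadoop at the directory that contains bin\winutils.exe (placeholder path)
        if (System.getProperty("os.name").toLowerCase().contains("windows")) {
            System.setProperty("hadoop.home.dir", "C:/hadoop-2.7.1");
        }

        SparkSession spark = SparkSession.builder()
                .appName("hive-scratch-dir-example")
                .master("local[*]")
                // Scratch dir that was already chmod'ed with winutils (placeholder path)
                .config("hive.exec.scratchdir", "C:/tmp/hive")
                .enableHiveSupport()
                .getOrCreate();

        spark.sql("SHOW DATABASES").show();
        spark.stop();
    }
}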
/*
Spark and hive on windows environment
Error: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
Pre-requisites: Have winutils.exe placed in c:\winutils\bin\
Resolve as follows:
*/
C:\user>c:\Winutils\bin\winutils.exe ls
FindFileOwnerAndPermission error (1789): The trust relationship between this workstation and the primary domain failed.
// Make sure you are connected to the domain controller, in my case I had to connect using VPN
C:\user>c:\Winutils\bin\winutils.exe ls c:\user\hive
drwx------ 1 BUILTIN\Administrators PANTAIHQ\Domain Users 0 Aug 30 2017 c:\user\hive
C:\user>c:\Winutils\bin\winutils.exe chmod 777 c:\user\hive
C:\user>c:\Winutils\bin\winutils.exe ls c:\user\hive
drwxrwxrwx 1 BUILTIN\Administrators PANTAIHQ\Domain Users 0 Aug 30 2017 c:\user\hive

Error in hadoop examples.jar

I just installed Hadoop from the Yahoo Developer Network, running on a VM. I ran the following command after start-all.sh, after cd-ing into the bin folder:
hadoop jar hadoop-0.19.0.-examples.jar pi 10 1000000
I'm getting
java.io.IOException: Error opening job jar: hadoop-0.18.0-examples.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.util.zip.ZipException: error in opening zip file
How do I sort this out?
Please make sure you have the following things in place:
Your examples jar file is present in the path from where you are running the above command; otherwise you need to give the complete path to the jar file, e.g.
hadoop jar /usr/lib/hadoop-mapreduce/*example.jar pi 10 100000
It has appropriate read permissions for the user you are using to run the Hadoop job.
If you still face the issue, please add the logs to your question.
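For instance, a quick way to confirm the exact jar name and its permissions in the directory you are running from (the glob is just an example pattern):
ls -l hadoop-*examples*.jar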
You will face this issue if you are using an older version of Java. Hadoop needs Java 7 or Java 8. Please check your Java version and update it if needed.
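For example, you can check the JVM in use with:
java -version
and compare the reported version against the requirements of your Hadoop release.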

unable to set up pseudo-distributed hadoop cluster

I am using CentOS 7. I downloaded and untarred Hadoop 2.4.0 and followed the instructions in the link Hadoop 2.4.0 setup.
Ran the following command.
./hdfs namenode -format
Got this error :
Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode
I see a number of posts with the same error but no accepted answers, and I have tried them all without any luck.
This error can occur if the necessary jarfiles are not readable by the user running the "./hdfs" command or are misplaced so that they can't be found by hadoop/libexec/hadoop-config.sh.
Check the permissions on the jarfiles under: hadoop-install/share/hadoop/*:
ls -l share/hadoop/*/*.jar
and if necessary, chmod them as the owner of the respective files to ensure they're readable. Something like chmod 644 should be sufficient to at least check if that fixes the initial problem. For the more permanent fix, you'll likely want to run the hadoop commands as the same user that owns all the files.
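As a concrete sketch of the fix, running the following from the Hadoop install directory (only if you own the files) makes every bundled jar owner-writable and world-readable:
find share/hadoop -name '*.jar' -exec chmod 644 {} +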
I followed the link Setup hadoop 2.4.0
and was able to get past the error message.
It seems the documentation on the Hadoop site is not complete.

Hadoop release missing /conf directory

I am trying to install a single node setup of Hadoop on Ubuntu.
I started following the instructions on the Hadoop 2.3 docs.
But I seem to be missing something very simple.
First, it says to
To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.
Then,
Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
However, I can't seem to find the conf directory.
I downloaded a release of 2.3 from one of the mirrors. Then I unpacked the tarball; an ls of the contents returns:
$ ls
bin etc include lib libexec LICENSE.txt NOTICE.txt README.txt sbin share
I was able to find the file they were referencing, just not in a conf directory:
$ find . -name hadoop-env.sh
./etc/hadoop/hadoop-env.sh
Am I missing something, or am I grabbing the wrong package? Or are the docs just outdated?
If so, does anyone know where some more up-to-date docs are?
I am trying to install Hadoop in pseudo-distributed mode and ran into the same issue.
The book Hadoop: The Definitive Guide (Third Edition) says on page 618:
In Hadoop 2.0 and later, MapReduce runs on YARN and there is an additional configuration file called yarn-site.xml. All the configuration files should go in the etc/hadoop subdirectory.
Hope this confirms that etc/hadoop is the correct place.
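Once you are editing etc/hadoop/hadoop-env.sh, the JAVA_HOME line the docs ask you to define looks like this (the JDK path is just an example; point it at your own installation):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64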
I think the docs need to be updated. Although the directory structure has changed, the file names for important files like hadoop-env.sh, core-site.xml and hdfs-site.xml have not changed. You may find the following link useful for getting started.
http://codesfusion.blogspot.com/2013/10/setup-hadoop-2x-220-on-ubuntu.html
In Hadoop 1:
${HADOOP_HOME}/conf/
In Hadoop 2:
${HADOOP_HOME}/etc/hadoop
In the Hadoop 2.7.3 source tree, the file is under hadoop-common/src/main/conf/:
$ sudo find . -name hadoop-env.sh
./hadoop-2.7.3-src/hadoop-common-project/hadoop-common/src/main/conf/hadoop-env.sh
Just adding a note on the blog post http://codesfusion.blogspot.com/2013/10/setup-hadoop-2x-220-on-ubuntu.html. The blog post is fantastic and very useful; that's how I got started. One aspect that took me a little time to figure out is that this blog seems to use a simplified way of providing configuration in the Hadoop conf files such as conf/core-site.xml, hdfs-site.xml, etc., as follows:
<!--fs.default.name is the name node URI -->
<configuration>
fs.default.name
hdfs://localhost:9000
</configuration>
As per the official docs, there is a more rigorous way, which is useful when you have more than one property: add them as follows (please note, the description is optional :-) )
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>the name node URI</description>
  </property>
  <!-- Add more configuration properties here -->
</configuration>
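In the same spirit, a minimal hdfs-site.xml for a single-node setup usually just sets the replication factor (1 is the common single-node choice):
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>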
For Hadoop 3.3.1 (2022), the conf directory is located in the src/main directory of the source tree:
$HOME/hadoop/hadoop3.3/hadoop-common-project/hadoop-common/src/main/
