Big Data: Sqoop-Export Error - hadoop

I am very new to this world. While running export command using sqoop, I am getting the below error “Input path does not exist: hdfs://quickstart.cloudera:8020/home/cloudera/Test5”. I have checked the path /home/cloudera/Test5 and the file exists in the path. From the core-site.xml file of sqoop configuration the details of hdfs path is coming, when I tested it through file browser Just opening IE and type hdfs://quickstart.cloudera:8020/home/cloudera/Test5, the message is coming as “Unable to connect”. I do not know the correct paramater values of the property. Please help me in solving this issue.
Please find the property file parameter and errir details below.
Parameter file
<name>fs.defaultFS</name>
<value>hdfs://quickstart.cloudera:8020</value>
Error
[cloudera#quickstart hadoop-conf]$ sqoop export --connect jdbc:sqlserver://10.34.83.177:54815 --username hadoop --password xxxxxx --table hadoop_sanjeeb3 --export-dir /home/cloudera/Test5 -mapreduce-job-name sqoop_export_job -m 1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/10/01 08:42:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.2
15/10/01 08:42:47 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/10/01 08:42:48 INFO manager.SqlManager: Using default fetchSize of 1000
15/10/01 08:42:48 INFO tool.CodeGenTool: Beginning code generation
15/10/01 08:42:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [hadoop_sanjeeb3] AS t WHERE 1=0
15/10/01 08:42:49 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/aa9c9fd9f69b76202be29508561f22ff/hadoop_sanjeeb3.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/10/01 08:42:51 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/aa9c9fd9f69b76202be29508561f22ff/hadoop_sanjeeb3.jar
15/10/01 08:42:51 INFO mapreduce.ExportJobBase: Beginning export of hadoop_sanjeeb3
15/10/01 08:42:51 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/10/01 08:42:51 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/10/01 08:42:54 WARN mapreduce.ExportJobBase: Input path hdfs://quickstart.cloudera:8020/home/cloudera/Test5 does not exist
15/10/01 08:42:54 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/10/01 08:42:54 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/10/01 08:42:54 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/10/01 08:42:54 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/10/01 08:42:58 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/cloudera/.staging/job_1443557935828_0011
15/10/01 08:42:58 WARN security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/home/cloudera/Test5
15/10/01 08:42:58 ERROR tool.ExportTool: Encountered IOException running export job: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/home/cloudera/Test5
Regards - Sanjeeb

Seems like you have some confusion between local file system and hadoop file system. The file that you are trying to export using sqoop should be present in hdfs. The directory location /home/cloudera/Test seems to be present in local file system.
Execute the command given below and confirm that the location that you mentioned exists in hdfs.
hadoop fs -ls /home/cloudera/Test5
If this is giving error, it means the directory doesn't exists in hdfs. You can't browse hdfs with a simple ls command, you have to use hadoop commands. If you want to browse hdfs directories using a browser, open the namenode web ui (http://namenode-host:50070) and there you have an option to browse the files and directories.
You cannot browse the hdfs filesystem using a url like hdfs://quickstart.cloudera:8020/home/cloudera/Test5 using the browser. You can use webhdfs for similar operation.
Ensure that the file is present in hdfs and trigger the command again. It will work.
NB: Usually we never keep user directories like /home/cloudera in hdfs. The structure will be something like /user/{username}. By default, hdfs considers /user/{username} as the home dir in hdfs. Where {username} will be the current logged-in user in linux

That file may be in the local file system, but not in the hadoop distributed file system (HDFS). You can add those local files from local file system to HDFS by
hadoop fs -put <local_file_path> <HDFS_diresctory>
command. You should do it as an HDFS user.

Related

PIG command execution

I am learning Hadoop by myself so I am not sure if what I asking is even a problem. When I run the command pig -x local to run it locally, i get the following message:
15/10/05 15:23:28 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
15/10/05 15:23:28 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType
2015-10-05 15:23:28,830 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0 (r1682971) compiled Jun 01 2015, 11:44:35
2015-10-05 15:23:28,831 [main] INFO org.apache.pig.Main - Logging error messages to: /home/nkhl/pig_1444038808829.log
2015-10-05 15:23:29,050 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/nkhl/.pigbootup not found
2015-10-05 15:23:29,333 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-10-05 15:23:29,334 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-10-05 15:23:29,335 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2015-10-05 15:23:29,562 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
It looks different on my online tutor's screen so I am a little confused.
What concerns me most is the deprecation part. Can someone help me with that please? What is it trying to say? Don't get me wrong, everything works fine. The GRUNT shell loads up, and things execute fine. I just wanted to know what that meant.
It's an Ubuntu machine.
Thanks!
Running pig as local is great AFAIK if you are using for some quick testing.Like displaying the sysout in UDF etc.
The above warnings you can safely ignore.It is saying that some of the variables set in conf-site.xml are deprecated.
You can switch off those parameters by editing the
log4j.logger.org.apache.hadoop.conf.Configuration.deprecation
in log4j.properties file.
You have some Hadoop-related variables set, such as HADOOP_HOME or HADOOP_PREFIX or HADOOP_CONF_DIR, which aren't needed if you are running Pig in local mode.
unset HADOOP_HOME
unset HADOOP_PREFIX
unset HADOOP_CONF_DIR
Deprecations aren't scary. They are a reminder that the code is calling on something that will eventually go away in a future version. These specific deprecations are caused by differences between Hadoop 1 vs Hadoop 2. Pig is compatible with both versions. If you happened to be using Hadoop 1.2.1 instead of 2.x, you wouldn't see the warnings. This is because Pig is checking the Hadoop 1 values first.
If you're interested in learning more, you can check out the Pig source code.
https://github.com/apache/pig/blob/release-0.15.0/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java#L219-L222

Flag -useHCatalog not working

I installed CDH5.4 in single node following the instructions here, also, I put the hive-metastore in localmode using these instructions and everything works perfectly, except when I tried to connect pig with the metastore:
➜ ~ pig -useHCatalog
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2015-05-01 15:45:08,657 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.4.0 (rUnversioned directory) compiled Apr 21 2015, 12:19:15
2015-05-01 15:45:08,658 [main] INFO org.apache.pig.Main - Logging error messages to: /home/itam/pig_1430495108571.log
2015-05-01 15:45:09,035 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:09,035 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:09,035 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2015-05-01 15:45:09,940 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:09,941 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
2015-05-01 15:45:09,941 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:09,999 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,001 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,088 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,089 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,125 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,126 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,160 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,162 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,194 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,195 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,227 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,228 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,261 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,262 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,295 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,296 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
and when I tried to access the table:
grunt> a = load 'ufos' using org.apache.hcatalog.pig.HCatLoader();
2015-05-01 15:46:11,656 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /home/itam/pig_1430495108571.log
grunt>
Hadoop version
➜ ~ hadoop version
Hadoop 2.6.0-cdh5.4.0
Subversion http://github.com/cloudera/hadoop -r c788a14a5de9ecd968d1e2666e8765c5f018c271
Compiled by jenkins on 2015-04-21T19:16Z
Compiled with protoc 2.5.0
From source with checksum cd78f139c66c13ab5cee96e15a629025
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.4.0.jar
UPDATE: I just tried with Impala, and It neither sees anything:
➜ ~ impala-shell
/usr/lib/python2.7/dist-packages/pkg_resources.py:1049: UserWarning: /home/itam/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extracti
on_path or the PYTHON_EGG_CACHE environment variable).
warnings.warn(msg, UserWarning)
Starting Impala Shell without Kerberos authentication
Connected to 6b512e41337d:21000
Server version: impalad version 2.2.0-cdh5 RELEASE (build 2ffd73a4255cefd521362ffe1cfb37463f67f75c)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Shell build version: Impala Shell v2.2.0-cdh5 (2ffd73a) built on Tue Apr 21 12:09:21 PDT 2015)
[6b512e41337d:21000] > invalidate metadata;
Query: invalidate metadata
[6b512e41337d:21000] > show tables;
Query: show tables
Fetched 0 row(s) in 0.00s
but from beeline:
~ beeline -u jdbc:hive2://
scan complete in 2ms
Connecting to jdbc:hive2://
Connected to: Apache Hive (version 1.1.0-cdh5.4.0)
Driver: Hive JDBC (version 1.1.0-cdh5.4.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.4.0 by Apache Hive
0: jdbc:hive2://> show tables;
OK
+-----------+--+
| tab_name |
+-----------+--+
| ufos |
+-----------+--+
1 row selected (0.701 seconds)
It worked... What is happening?
UPDATE: I am running hcatalog too
➜ ~ sudo service hive-webhcat-server status
* WEBHCat server is running
➜ ~ hcat -e "desc ufos"
OK
timestamp string from deserializer
city string from deserializer
state string from deserializer
shape string from deserializer
duration string from deserializer
summary string from deserializer
posted string from deserializer
Time taken: 1.314 seconds
UPDATE: The problem with impala was due that I didn't copy hive-site.xml to /etc/impala/conf, once this is done, impala-shell worked properly.
The loader you are using is deprecated. Instead of using org.apache.hcatalog.pig.HCatLoader, you need to use org.apache.hive.hcatalog.pig.HCatLoader.
From org.apache.hcatalog.pig.HCatLoader:
Deprecated.
Use/modify HCatLoader instead
I was facing the issue in HDP 2.3 and Pig 0.15 .
Package name for HCatLoader() class is different in Hortonworks distribution.
The following worked for me
USING org.apache.hive.hcatalog.pig.HCatLoader()
instead of
USING org.apache.hcatalog.pig.HCatLoader();
like you started to see the issues you have is with hive-site.xmlfile - you need to place it in the classpath
As mention here:
A workflow action interacting with HCatalog requires the following
jars in the classpath: hcatalog-core.jar, webhcat-java-client.jar,
hive-common.jar, hive-exec.jar, hive-metastore.jar, hive-serde.jar and
libfb303.jar. hive-site.xml which has the configuration to talk to the
HCatalog server also needs to be in the classpath. The correct version
of HCatalog and hive jars should be placed in classpath based on the
version of HCatalog installed on the cluster.
The jars can be added to the classpath of the action using one of the
below ways.
You can place the jars and hive-site.xml in the system shared library.
The shared library for a pig, hive or java action can be overridden to
include hcatalog shared libraries along with the action's shared
library. Refer to Shared Libraries for more information. The
oozie-sharelib-[version].tar.gz in the oozie distribution bundles the
required HCatalog jars in a hcatalog sharelib. If using a different
version of HCatalog than the one bundled in the sharelib, copy the
required HCatalog jars from such version into the sharelib.
You can
place the jars and hive-site.xml in the workflow application lib/
path.
You can specify the location of the jar files in archive tag and
the hive-site.xml in file tag in the corresponding pig, hive or java
action.
If you are going to use Oozie coordinator, upload them to HDFS coordinator path

PIG setup throwing error

I was trying to install PIG v0.13.0 in my Fedora 20 system. After extracting the tar.gz contents, I did the PATH setup for JAVA_HOME and PIG/bin. Then I type the command pig in the console and this is what I got: Unable to understand what went wrong:
[root#localhost /]# pig
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-12-21 00:05:16,082 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-12-21 00:05:16,083 [main] INFO org.apache.pig.Main - Logging error messages to: //pig_1419100516081.log
2014-12-21 00:05:16,130 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2014-12-21 00:05:16,765 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-12-21 00:05:16,771 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-12-21 00:05:16,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2014-12-21 00:05:16,780 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-12-21 00:05:19,130 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-12-21 00:05:19,130 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
2014-12-21 00:05:19,136 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> ls
2014-12-21 00:05:33,697 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Call From localhost.localdomain/127.0.0.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Details at logfile: //pig_1419100516081.log
Please let me know why did the ls command in grunt shell throw the error?
Please guide.
When you type pig in console, by default it will go to MAPREDUCE mode, for that you need access to a Hadoop cluster and HDFS installation. Mapreduce mode is the default mode in pig.
It looks like your hadoop cluster is not configured properly that is the reason you are getting the connection refunded error. Please follow up this link to solve this connect-refused problem.http://wiki.apache.org/hadoop/ConnectionRefused.
As a workaround use LOCAL mode, this doesn't need hadoop installation.
In the console type pig -x local this will bring the grunt shell and type ls command.
Local mode
$ pig -x local
Mapreduce mode
$ pig
(or) //try to connect HDFS
$ pig -x mapreduce
Ok I got this one working. if I connect to the pig mapreduce mode the the ls command will change to ls hdfs:/. Hence changing the above command from ls to ls hdfs:/ resolves my problem. But again, if I am connecting to the local mode then the ls command works fine.

Sqoop running into local job runner mode

When i run sqoop am not sure why it runs into local job runner mode and then says that i have provided invalid jobtracker url for LocalJobRunner. Can anyone tell whats going on?
$ bin/sqoop import -jt myjobtracker:50070 --connect jdbc:mysql://mydbhost.com/mydata --username foo --password bar --as-parquetfile --table campaigns --target-dir hdfs://myhdfs:8020/user/myself/campaigns
14/08/20 21:04:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-SNAPSHOT
14/08/20 21:04:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/08/20 21:04:51 INFO manager.SqlManager: Using default fetchSize of 1000
14/08/20 21:04:51 INFO tool.CodeGenTool: Beginning code generation
14/08/20 21:04:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `campaigns` AS t LIMIT 1
14/08/20 21:04:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `campaigns` AS t LIMIT 1
14/08/20 21:04:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `campaigns` AS t LIMIT 1
14/08/20 21:04:51 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-myself/compile/6acdb40688239f19ddf86a1290ad6c64/campaigns.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/08/20 21:04:54 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-myself/compile/6acdb40688239f19ddf86a1290ad6c64/campaigns.jar
14/08/20 21:04:54 WARN manager.MySQLManager: It looks like you are importing from mysql.
14/08/20 21:04:54 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
14/08/20 21:04:54 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
14/08/20 21:04:54 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
14/08/20 21:04:54 INFO mapreduce.ImportJobBase: Beginning import of campaigns
14/08/20 21:04:54 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
14/08/20 21:04:54 WARN mapred.JobConf: The variable mapred.child.ulimit is no longer used.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/hbase/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
14/08/20 21:04:54 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/08/20 21:04:56 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/08/20 21:04:56 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid "mapreduce.jobtracker.address" configuration value for LocalJobRunner : "myjobtracker:50070"
14/08/20 21:04:56 ERROR security.UserGroupInformation: PriviledgedActionException as:myself (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
14/08/20 21:04:56 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1239)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1235)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1234)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1263)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1287)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:247)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:665)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:102)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Figured out the problem, i was running sqoop 1.4.5 and pointing it to the latest hadoop 2.0.0-cdh4.4.0 which had the yarn stuff also thats why it was complaining.
When i pointed sqoop to hadoop-0.20/2.0.0-cdh4.4.0 (MR1 i think) it worked.

Sqoop error while importing

I am new to sqoop and trying ti import a table in MYSQL table widgets table from the hadoopguide Database.
I am using Hadoop version 0.20.
and my Sqoop is sqoop-1.4.4.bin__hadoop-0.20
I am Running the command:
sqoop import --connect jdbc:mysql://localhost/hadoopguide --table widgets -m 1
This is error log I am getting
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
13/09/25 15:29:41 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/09/25 15:29:41 INFO tool.CodeGenTool: Beginning code generation
13/09/25 15:29:41 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `widgets` AS t LIMIT 1
13/09/25 15:29:41 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `widgets` AS t LIMIT 1
13/09/25 15:29:41 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
13/09/25 15:29:41 INFO orm.CompilationManager: Found hadoop core jar at: /usr/local/hadoop/hadoop-0.20.2-core.jar
Note: /tmp/sqoop-ubuntu/compile/348861f092b25aac3fae4089da9abdf0/widgets.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/09/25 15:29:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop- ubuntu/compile/348861f092b25aac3fae4089da9abdf0/widgets.jar
13/09/25 15:29:42 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/09/25 15:29:42 WARN manager.MySQLManager: This transfer can be faster! Use the -- direct
13/09/25 15:29:42 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/09/25 15:29:42 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/09/25 15:29:42 INFO mapreduce.ImportJobBase: Beginning import of widgets
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getCredentials()Lorg/apache/hadoop/security/Credentials;
at org.apache.sqoop.mapreduce.db.DBConfiguration.getPassword(DBConfiguration.java:304)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:272)
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:187)
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:162)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:882)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:600)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:413)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Can anyone have any idea about it.
If you have installed hive, hcatalog is installed with it. Now set the HCAT_HOME in your .bashrc as below
cd ~
gedit .bashrc
export HCAT_HOME=${HIVE_HOME}/hcatalog/
export PATH=$HCAT_HOME/bin:$PATH
source .bashrc //to refresh the .bashrc file
otherwise install hcatalog saperately and set home path.
The Hadoop release 0.20 is very old release that is lacking a lot of features. One feature that Sqoop requires is a security additions that were added in 1.x. As a result Sqoop won't work on bare 0.20, at least CDH3u1 or Hadoop 1.x is required. I would strongly suggest to upgrade your Hadoop cluster.

Resources