SQOOP IMPORT in avro format fails - hadoop

Sqoop import in Avro format fails with the error below. Please help. The script is given at the bottom.
Caused by: java.lang.ClassNotFoundException:
org.apache.avro.mapred.AvroWrapper
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 13 more
script:
sqoop import -Dmapreduce.job.user.classpath.first=true \
--connect jdbc:mysql://localhost/test \
--table emp \
--target-dir /user/edureka/tableemp \
--username root -P \
--delete-target-dir \
--as-avrodatafile \
--compress \
-m 1

The issue was resolved once the avro-mapred jar was copied into the Sqoop lib directory.
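For reference, a minimal sketch of that fix, assuming Sqoop lives under $SQOOP_HOME and that the avro-mapred version and classifier shown below (an assumption, not from the original thread) match your Avro and Hadoop setup:
# assumption: Avro 1.8.1 with the hadoop2 classifier; pick the version that matches your cluster
cd $SQOOP_HOME/lib
curl -O https://repo1.maven.org/maven2/org/apache/avro/avro-mapred/1.8.1/avro-mapred-1.8.1-hadoop2.jar
# -Dmapreduce.job.user.classpath.first=true in the import keeps this jar ahead of any older Avro copies on the cluster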

Related

Compression codec com.hadoop.compression.lzo.LzoCodec was not found

I am trying to run a MapReduce job with compression:
hadoop jar \
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
randomtextwriter \
-Ddfs.replication=1 -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec \
/tmp/randomtextwriter
I used parcels to distribute LZO to all nodes in the cluster. Even then I am getting the error below:
Error: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec was not found.
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:140)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:56)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:659)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1731)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2409)
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:138)
... 10 more
As a temporary solution, you can manually add the hadoop-lzo jar to the Hadoop classpath:
curl -O https://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.19/hadoop-lzo-0.4.19.jar
hadoop jar \
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
randomtextwriter \
-Ddfs.replication=1 -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec \
-libjars hadoop-lzo-0.4.19.jar \
/tmp/randomtextwriter
Note that -libjars, like the -D options, is a generic option and must come before the output directory, otherwise it is ignored.
Please make sure you download a hadoop-lzo version that is compatible with your Hadoop version.
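A sketch of the classpath route mentioned above, assuming the jar was downloaded into the current directory; this only covers the client JVM, so the map tasks still need -libjars (or the jar installed on every node):
# assumption: hadoop-lzo-0.4.19.jar sits in the working directory
export HADOOP_CLASSPATH=$(pwd)/hadoop-lzo-0.4.19.jar:$HADOOP_CLASSPATH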

I have installed Sqoop (1.4.7) on my system and while running an import I am getting a NoClassDefFoundError

I have installed Sqoop (1.4.7) on my system, and while running the import command I am getting a NoClassDefFoundError. Please find the import command and exception details below. Please help me out.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/lang/StringUtils
at org.apache.sqoop.tool.BaseSqoopTool.validateHiveOptions(BaseSqoopTool.java:1583)
at org.apache.sqoop.tool.ImportTool.validateOptions(ImportTool.java:1178)
at org.apache.sqoop.Sqoop.run(Sqoop.java:137)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.lang.StringUtils
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:190)
at java.base/java.lang.ClassLoad
Command ---
sqoop import --connect jdbc:mysql://localhost/emp --table employee --username root --password Root123# --target-dir /data/sqoop -m 1
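A likely fix, not from the original thread: Sqoop 1.4.7 still references org.apache.commons.lang.StringUtils from commons-lang 2.x, while newer Hadoop distributions ship only commons-lang3, so dropping a commons-lang 2.6 jar into Sqoop's lib directory usually resolves it. A minimal sketch, assuming Sqoop is installed under $SQOOP_HOME:
# assumption: $SQOOP_HOME points at the Sqoop 1.4.7 install; 2.6 is the last commons-lang 2.x release
cd $SQOOP_HOME/lib
curl -O https://repo1.maven.org/maven2/commons-lang/commons-lang/2.6/commons-lang-2.6.jar
# re-run the import command above afterwards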

Unable to run sqoop job in Oozie (No enum constant com.cloudera.sqoop.SqoopOptions.FileLayout.ParquetFile)

I successfully created and executed a Sqoop import job, but I am unable to run it in an Oozie workflow. Sqoop imports the data from an RDBMS to Parquet files in HDFS. It seems the problem is related to the Parquet format; if I use --as-textfile, the workflow runs without any problem.
I've also copied all parquet-*.jar files from SQOOP_HOME/lib to the Oozie share lib.
Versions: Sqoop 1.4.7 and Oozie 4.3.1.
Sqoop job definition
$ sqoop job --create ingest_amsp_custmaster -- import --connect "jdbc:oracle:thin:@<IP>:<PORT>/<SID>" \
--username <USER> -P \
--table CUSTMASTER \
--as-parquetfile \
--target-dir /warehouse/raw/amsp/custmaster \
--delete-target-dir \
-m 1
Here's what I got from the error log:
java.lang.IllegalArgumentException: No enum constant com.cloudera.sqoop.SqoopOptions.FileLayout.ParquetFile
at java.lang.Enum.valueOf(Enum.java:238)
at org.apache.sqoop.SqoopOptions.loadProperties(SqoopOptions.java:522)
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.read(HsqldbJobStorage.java:299)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:198)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:200)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:183)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:64)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:235)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:436)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:350)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:211)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:254)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Any help would be appreciated.
Thanks. Yusata.
Check the version of Sqoop used in the Oozie workflow. It seems like it is not 1.4.7: the stack trace line
org.apache.sqoop.SqoopOptions.loadProperties(SqoopOptions.java:522)
corresponds to https://github.com/apache/sqoop/blob/20af67ef60096b17e1d9585670e5ec787eb760e2/src/java/org/apache/sqoop/SqoopOptions.java#L522
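To verify which Sqoop jar the workflow actually uses, a rough sketch, assuming a default sharelib layout and a reachable Oozie server (the hostname below is a placeholder):
# list the jars Oozie serves for the sqoop action
oozie admin -oozie http://<oozie-host>:11000/oozie -shareliblist sqoop
hdfs dfs -ls /user/oozie/share/lib/lib_*/sqoop | grep sqoop-1.4
# if an older sqoop-1.4.x jar shows up, replace it with sqoop-1.4.7.jar and refresh the sharelib
oozie admin -oozie http://<oozie-host>:11000/oozie -sharelibupdate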

Unable to import data from Hdfs to Hbase using importtsv

I moved a tab-delimited file into HDFS and am now trying to load it into HBase.
Below is my importtsv command:
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:ok,cf:ek,cf:rk,cf:rsk,cf:pdk,cf:pmk,cf:omk,cf:sok,cf:sdk,cf:cdk,cf:q,cf:uc,cf:up,cf:usp,cf:gm,cf:st,cf:gp -Dimporttsv.skip.bad.lines=false 'sales_fact' hdfs://localhost:54310/my/file.txt
It is trying to read a jar from a location that doesn't exist.
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/home/elijah/Downloads/hbase/lib/htrace-core-3.1.0-incubating.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at org.apache.hadoop.hbase.mapreduce.ImportTsv.run(ImportTsv.java:738)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.hbase.mapreduce.ImportTsv.main(ImportTsv.java:747)
I don't understand why it has mixed the HDFS and local directory paths into one:
hdfs://localhost:54310/home/elijah/Downloads/hbase/lib/htrace-core-3.1.0-incubating.jar
The user running the import job has full access to the HBase lib directory locally.
I can see the -libjars option is missing. You can use the -libjars option; below is an example usage:
hadoop jar \
hbase-server-0.98.6-cdh5.2.1.jar \
importtsv \
-libjars /opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/high-scale-lib-1.1.1.jar \
-Dimporttsv.separator=, -Dimporttsv.bulk.output=output \
-Dimporttsv.columns=HBASE_ROW_KEY,f:count wordcount \
word_count.csv
You can also do something like this:
export HADOOP_CLASSPATH=`./hbase classpath`
One of the jars that was missing, hbase/lib/htrace-core-3.1.0-incubating.jar, is on the HBase classpath, so this should work in this case.
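Putting both suggestions together, a rough sketch, assuming HBase is installed under /usr/lib/hbase (adjust the path to your layout):
# `hbase classpath` prints every jar HBase needs, including lib/htrace-core-3.1.0-incubating.jar
export HADOOP_CLASSPATH=$(/usr/lib/hbase/bin/hbase classpath)
# re-run importtsv with the same column mapping as in the question
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
-Dimporttsv.columns=HBASE_ROW_KEY,cf:ok,cf:ek,cf:rk,cf:rsk,cf:pdk,cf:pmk,cf:omk,cf:sok,cf:sdk,cf:cdk,cf:q,cf:uc,cf:up,cf:usp,cf:gm,cf:st,cf:gp \
-Dimporttsv.skip.bad.lines=false \
'sales_fact' hdfs://localhost:54310/my/file.txt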

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/mapreduce/HCatOutputFormat

On my local box I have MySQL installed, and I also installed Sqoop locally to connect to Hive and pull the data.
1) sqoop list-databases --connect jdbc:mysql://localhost/db --username db1
Which returns
16/05/13 21:49:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/05/13 21:49:50 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
search
test
2) I want to get data from a Hive table into a MySQL table, so I tried:
sqoop export --connect jdbc:mysql://localhost/db --username db1 --table search \
--driver com.mysql.jdbc.Driver \
--export-dir /projects/Tool_DB/search/dt=20160403/region=0 \
--input-fields-terminated-by '\001'
which displays this error:
16/05/13 21:48:24 INFO mapreduce.Job: Running job: job_1459561405699_1417866
16/05/13 21:48:43 INFO mapreduce.Job: Job job_1459561405699_1417866 running in uber mode : false
16/05/13 21:48:43 INFO mapreduce.Job: map 0% reduce 0%
16/05/13 21:48:52 INFO mapreduce.Job: Task Id : attempt_1459561405699_1417866_m_000001_0, Status : FAILED
Error: java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at org.apache.sqoop.mapreduce.ExportOutputFormat.getRecordWriter(ExportOutputFormat.java:79)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:350)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2408)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2445)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2230)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:813)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
3) I also tried:
sqoop export --connect jdbc:mysql://localhost/db --username db1 --table search \
--driver com.mysql.jdbc.Driver \
--hcatalog-database db \
--hcatalog-table search
which displays this error:
16/05/13 21:53:08 INFO mapreduce.ExportJobBase: Configuring HCatalog for export job
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/mapreduce/HCatOutputFormat
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:420)
at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:912)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.apache.hive.hcatalog.mapreduce.HCatOutputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
Any suggestions or tips to solve these errors?
You are facing this issue because the HCAT_HOME path is not set.
Kindly set the path and try again:
export HCAT_HOME=<hive-hcatalog path on the local machine>
For example :
export HCAT_HOME=/opt/cloudera/parcels/CDH/lib/hive-hcatalog
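On a plain Apache install the path differs; a minimal sketch, assuming Hive lives under /usr/local/hive (an assumption, adjust to your layout):
# assumption: HIVE_HOME is /usr/local/hive; its bundled hcatalog directory provides HCatOutputFormat
export HCAT_HOME=/usr/local/hive/hcatalog
# then re-run the export so Sqoop can pick up the HCatalog jars
sqoop export --connect jdbc:mysql://localhost/db --username db1 --table search \
--hcatalog-database db \
--hcatalog-table search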
