Trying to run a MapReduce job with compression:
hadoop jar \
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
randomtextwriter \
-Ddfs.replication=1 -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec \
/tmp/randomtextwriter
I used parcels to distribute LZO to all nodes in the cluster. Even then, I am getting the error below:
Error: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec was not found.
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:140)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:56)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:659)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1731)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2409)
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:138)
... 10 more
As a temporary solution, you can manually add the hadoop-lzo jar to the Hadoop classpath:
curl -O https://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.19/hadoop-lzo-0.4.19.jar
hadoop jar \
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
randomtextwriter \
-libjars hadoop-lzo-0.4.19.jar \
-Ddfs.replication=1 -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec \
/tmp/randomtextwriter
Please make sure you download a hadoop-lzo version that is compatible with your Hadoop version.
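If you do not want to pass -libjars on every run, a minimal sketch of a more permanent setup (the /usr/lib/hadoop/lib path is an assumption for a typical package install; adjust it for your distribution and repeat on every node so the YARN tasks can also see the jar):
# download the jar locally (pick a version compatible with your Hadoop)
curl -O https://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.19/hadoop-lzo-0.4.19.jar
# copy it into the Hadoop lib directory on every node
sudo cp hadoop-lzo-0.4.19.jar /usr/lib/hadoop/lib/
# on the client side, make sure it is on the classpath as well
export HADOOP_CLASSPATH=/usr/lib/hadoop/lib/hadoop-lzo-0.4.19.jar:$HADOOP_CLASSPATH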
Referenced blog: https://blog.codecentric.de/en/2020/06/spring-boot-graalvm-native-image-maven-plugin/
The following is my application's version information:
Spring Boot version : 2.3.3.RELEASE
spring-graalvm-native version : 0.7.0
native-image-maven-plugin version : 20.1.0
native-image command
time native-image \
-J-Xmx4G \
-Dspring.native.verbose=true \
-H:+TraceClassInitialization \
-H:Name=$ARTIFACT \
-H:+ReportExceptionStackTraces \
-H:+ReportUnsupportedElementsAtRuntime \
-Dspring.graal.missing-selector-hints=warning \
-Dspring.graal.remove-unused-autoconfig=true \
-Dspring.graal.remove-yaml-support=true \
-cp $CP $MAINCLASS;
When executed with the above command, the following error occurs.
Excluding 2 auto-configurations from spring.factories file
Processing spring.factories - EnableAutoConfiguration lists #7 configurations
Fatal error:java.lang.IllegalStateException: java.lang.IllegalStateException: No access hint found for import selector: org.springframework.boot.autoconfigure.ImportAutoConfigurationImportSelector
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:593)
at java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1005)
at com.oracle.svm.hosted.NativeImageGenerator.run(NativeImageGenerator.java:463)
at com.oracle.svm.hosted.NativeImageGeneratorRunner.buildImage(NativeImageGeneratorRunner.java:359)
at com.oracle.svm.hosted.NativeImageGeneratorRunner.build(NativeImageGeneratorRunner.java:518)
at com.oracle.svm.hosted.NativeImageGeneratorRunner.main(NativeImageGeneratorRunner.java:117)
Caused by: java.lang.IllegalStateException: No access hint found for import selector: org.springframework.boot.autoconfigure.ImportAutoConfigurationImportSelector
at org.springframework.graalvm.type.Type.getHints(Type.java:1124)
at org.springframework.graalvm.support.ResourcesHandler.processType(ResourcesHandler.java:900)
at org.springframework.graalvm.support.ResourcesHandler.processType(ResourcesHandler.java:1155)
at org.springframework.graalvm.support.ResourcesHandler.processType(ResourcesHandler.java:817)
at org.springframework.graalvm.support.ResourcesHandler.checkAndRegisterConfigurationType(ResourcesHandler.java:807)
at org.springframework.graalvm.support.ResourcesHandler.processSpringFactory(ResourcesHandler.java:741)
at org.springframework.graalvm.support.ResourcesHandler.processSpringFactories(ResourcesHandler.java:578)
at org.springframework.graalvm.support.ResourcesHandler.register(ResourcesHandler.java:122)
at org.springframework.graalvm.support.SpringFeature.beforeAnalysis(SpringFeature.java:78)
at com.oracle.svm.hosted.NativeImageGenerator.lambda$runPointsToAnalysis$7(NativeImageGenerator.java:679)
at com.oracle.svm.hosted.FeatureHandler.forEachFeature(FeatureHandler.java:70)
at com.oracle.svm.hosted.NativeImageGenerator.runPointsToAnalysis(NativeImageGenerator.java:679)
at com.oracle.svm.hosted.NativeImageGenerator.doRun(NativeImageGenerator.java:538)
at com.oracle.svm.hosted.NativeImageGenerator.lambda$run$0(NativeImageGenerator.java:451)
at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Error: Image build request failed with exit status 1
How can I fix this? Any help would be appreciated.
I successfully created and executed a Sqoop import job, but I am unable to run it in an Oozie workflow. Sqoop imports the data from an RDBMS to a Parquet file in HDFS. It seems the problem is related to the Parquet format. If I use --as-textfile, the workflow runs without any problem.
Also, I've copied all parquet-*.jar files from SQOOP_HOME/lib to the Oozie share lib.
I am using Sqoop 1.4.7 and Oozie 4.3.1.
Sqoop job definition
$ sqoop job --create ingest_amsp_custmaster -- import --connect "jdbc:oracle:thin:@<IP>:<PORT>/<SID>" \
--username <USER> -P \
--table CUSTMASTER \
--as-parquetfile \
--target-dir /warehouse/raw/amsp/custmaster \
--delete-target-dir \
-m 1
Here's what I got from the error log:
java.lang.IllegalArgumentException: No enum constant com.cloudera.sqoop.SqoopOptions.FileLayout.ParquetFile
at java.lang.Enum.valueOf(Enum.java:238)
at org.apache.sqoop.SqoopOptions.loadProperties(SqoopOptions.java:522)
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.read(HsqldbJobStorage.java:299)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:198)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:200)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:183)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:64)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:235)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:436)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:350)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:211)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:254)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Any help would be appreciated.
Thanks. Yusata.
Check the version of Sqoop used in the Oozie workflow. It seems it is not 1.4.7: the stack trace line
org.apache.sqoop.SqoopOptions.loadProperties(SqoopOptions.java:522)
corresponds to https://github.com/apache/sqoop/blob/20af67ef60096b17e1d9585670e5ec787eb760e2/src/java/org/apache/sqoop/SqoopOptions.java#L522
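A quick way to check is to list what is actually in the Oozie Sqoop sharelib (the Oozie URL and HDFS path below are assumptions; adjust them for your cluster):
oozie admin -oozie http://<oozie-host>:11000/oozie -shareliblist sqoop
hdfs dfs -ls /user/oozie/share/lib/lib_*/sqoop/
If the sqoop jar listed there is older than 1.4.7, place the 1.4.7 jar in that directory and run oozie admin -sharelibupdate so the workflow picks it up.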
What should I do next?
I get an error message when running this jar file on the Hadoop system.
hadoop jar units.jar /input_dir/sample.txt /output_dir/result
Exception in thread "main" java.lang.ClassNotFoundException: /input_dir/sample.txt
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
From the Apache Hadoop docs:
Usage: hadoop jar <jar> [mainClass] args...
Runs a jar file.
You are missing the fully qualified class name in your JAR command.
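For example, if the driver class inside units.jar were com.example.ProcessUnits (a hypothetical name; substitute your actual main class), the call would look like:
hadoop jar units.jar com.example.ProcessUnits /input_dir/sample.txt /output_dir/result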
I moved a tab-delimited file into HDFS and am now trying to load it into HBase.
Below is my importtsv command:
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:ok,cf:ek,cf:rk,cf:rsk,cf:pdk,cf:pmk,cf:omk,cf:sok,cf:sdk,cf:cdk,cf:q,cf:uc,cf:up,cf:usp,cf:gm,cf:st,cf:gp -Dimporttsv.skip.bad.lines=false 'sales_fact' hdfs://localhost:54310/my/file.txt
It is trying to read a jar from a location that doesn't exist:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/home/elijah/Downloads/hbase/lib/htrace-core-3.1.0-incubating.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at org.apache.hadoop.hbase.mapreduce.ImportTsv.run(ImportTsv.java:738)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.hbase.mapreduce.ImportTsv.main(ImportTsv.java:747)
I don't understand why it has mixed the HDFS and local directory paths into one:
hdfs://localhost:54310/home/elijah/Downloads/hbase/lib/htrace-core-3.1.0-incubating.jar
The user running the import job has full access to the HBase lib directory on the local filesystem.
I can see the -libjars option is missing. You can use the -libjars option; below is an example usage:
hadoop jar \
hbase-server-0.98.6-cdh5.2.1.jar \
importtsv \
-libjars /opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/high-scale-lib-1.1.1.jar \
-Dimporttsv.separator=, -Dimporttsv.bulk.output=output \
-Dimporttsv.columns=HBASE_ROW_KEY,f:count wordcount \
word_count.csv
You can also do something like this:
# export HADOOP_CLASSPATH=`./hbase classpath`
One of the jars that was missing, hbase/lib/htrace-core-3.1.0-incubating.jar, is on the HBase classpath, so this should work in this case.
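Putting it together for your import, a minimal sketch (assuming you run it from the HBase install directory, as in your original command):
export HADOOP_CLASSPATH=`bin/hbase classpath`
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:ok,cf:ek,cf:rk,cf:rsk,cf:pdk,cf:pmk,cf:omk,cf:sok,cf:sdk,cf:cdk,cf:q,cf:uc,cf:up,cf:usp,cf:gm,cf:st,cf:gp -Dimporttsv.skip.bad.lines=false 'sales_fact' hdfs://localhost:54310/my/file.txt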
I am running this command:
hadoop jar hadoop-streaming.jar -D stream.tmpdir=/tmp -input "<input dir>" -output "<output dir>" -mapper "grep 20151026" -reducer "wc -l"
Where <input dir> is a directory with many avro files.
I am getting this error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.hdfs.protocol.DatanodeID.updateXferAddrAndInvalidateHashCode(DatanodeID.java:287)
at org.apache.hadoop.hdfs.protocol.DatanodeID.<init>(DatanodeID.java:91)
at org.apache.hadoop.hdfs.protocol.DatanodeInfo.<init>(DatanodeInfo.java:136)
at org.apache.hadoop.hdfs.protocol.DatanodeInfo.<init>(DatanodeInfo.java:122)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:633)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:793)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convertLocatedBlock(PBHelper.java:1252)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1270)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1413)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1524)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1533)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:557)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1969)
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNextNoFilter(DistributedFileSystem.java:888)
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNext(DistributedFileSystem.java:863)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:267)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
How can this issue be resolved?
It took a while, but I found the solution here.
Prepending HADOOP_CLIENT_OPTS="-Xmx1024M" to the command solves the problem.
The final command line is:
HADOOP_CLIENT_OPTS="-Xmx1024M" hadoop jar hadoop-streaming.jar -D stream.tmpdir=/tmp -input "<input dir>" -output "<output dir>" -mapper "grep 20151026" -reducer "wc -l"
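If you hit this regularly, one option is to set the client heap in hadoop-env.sh instead of on the command line (a sketch; /etc/hadoop/conf/hadoop-env.sh is an assumed location for a typical install, use wherever your hadoop-env.sh lives on the machine that submits the job):
# in hadoop-env.sh on the submitting machine
export HADOOP_CLIENT_OPTS="-Xmx1024M $HADOOP_CLIENT_OPTS"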