Trying to run a MapReduce job with compression:
hadoop jar \
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
randomtextwriter \
-Ddfs.replication=1 -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec \
/tmp/randomtextwriter
I used parcels to distribute LZO to all nodes in the cluster. Even then, I am getting the error below:
Error: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec was not found.
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:140)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:56)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:659)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1731)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2409)
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:138)
... 10 more
As a temporary solution, you can manually add the hadoop-lzo jar to the Hadoop classpath:
curl -O https://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.19/hadoop-lzo-0.4.19.jar
hadoop jar \
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
randomtextwriter \
-libjars hadoop-lzo-0.4.19.jar \
-Ddfs.replication=1 -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec \
/tmp/randomtextwriter
Please make sure you download a hadoop-lzo version that is compatible with your Hadoop version.
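If you do not want to pass -libjars on every run, a minimal sketch of a more permanent setup (the /usr/lib/hadoop/lib path is an assumption for a typical package install; adjust it for your distribution and repeat on every node so the YARN tasks can also see the jar):
# download the jar locally (pick a version compatible with your Hadoop)
curl -O https://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.19/hadoop-lzo-0.4.19.jar
# copy it into the Hadoop lib directory on every node
sudo cp hadoop-lzo-0.4.19.jar /usr/lib/hadoop/lib/
# on the client side, make sure it is on the classpath as well
export HADOOP_CLASSPATH=/usr/lib/hadoop/lib/hadoop-lzo-0.4.19.jar:$HADOOP_CLASSPATH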
Referenced blog: https://blog.codecentric.de/en/2020/06/spring-boot-graalvm-native-image-maven-plugin/
The following is my application's version information:
Spring Boot version : 2.3.3.RELEASE
spring-graalvm-native version : 0.7.0
native-image-maven-plugin version : 20.1.0
native-image command
time native-image \
-J-Xmx4G \
-Dspring.native.verbose=true \
-H:+TraceClassInitialization \
-H:Name=$ARTIFACT \
-H:+ReportExceptionStackTraces \
-H:+ReportUnsupportedElementsAtRuntime \
-Dspring.graal.missing-selector-hints=warning \
-Dspring.graal.remove-unused-autoconfig=true \
-Dspring.graal.remove-yaml-support=true \
-cp $CP $MAINCLASS;
When executed with the above command, the following error occurs.
Excluding 2 auto-configurations from spring.factories file
Processing spring.factories - EnableAutoConfiguration lists #7 configurations
Fatal error:java.lang.IllegalStateException: java.lang.IllegalStateException: No access hint found for import selector: org.springframework.boot.autoconfigure.ImportAutoConfigurationImportSelector
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:593)
at java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1005)
at com.oracle.svm.hosted.NativeImageGenerator.run(NativeImageGenerator.java:463)
at com.oracle.svm.hosted.NativeImageGeneratorRunner.buildImage(NativeImageGeneratorRunner.java:359)
at com.oracle.svm.hosted.NativeImageGeneratorRunner.build(NativeImageGeneratorRunner.java:518)
at com.oracle.svm.hosted.NativeImageGeneratorRunner.main(NativeImageGeneratorRunner.java:117)
Caused by: java.lang.IllegalStateException: No access hint found for import selector: org.springframework.boot.autoconfigure.ImportAutoConfigurationImportSelector
at org.springframework.graalvm.type.Type.getHints(Type.java:1124)
at org.springframework.graalvm.support.ResourcesHandler.processType(ResourcesHandler.java:900)
at org.springframework.graalvm.support.ResourcesHandler.processType(ResourcesHandler.java:1155)
at org.springframework.graalvm.support.ResourcesHandler.processType(ResourcesHandler.java:817)
at org.springframework.graalvm.support.ResourcesHandler.checkAndRegisterConfigurationType(ResourcesHandler.java:807)
at org.springframework.graalvm.support.ResourcesHandler.processSpringFactory(ResourcesHandler.java:741)
at org.springframework.graalvm.support.ResourcesHandler.processSpringFactories(ResourcesHandler.java:578)
at org.springframework.graalvm.support.ResourcesHandler.register(ResourcesHandler.java:122)
at org.springframework.graalvm.support.SpringFeature.beforeAnalysis(SpringFeature.java:78)
at com.oracle.svm.hosted.NativeImageGenerator.lambda$runPointsToAnalysis$7(NativeImageGenerator.java:679)
at com.oracle.svm.hosted.FeatureHandler.forEachFeature(FeatureHandler.java:70)
at com.oracle.svm.hosted.NativeImageGenerator.runPointsToAnalysis(NativeImageGenerator.java:679)
at com.oracle.svm.hosted.NativeImageGenerator.doRun(NativeImageGenerator.java:538)
at com.oracle.svm.hosted.NativeImageGenerator.lambda$run$0(NativeImageGenerator.java:451)
at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Error: Image build request failed with exit status 1
How can I fix this? Any help would be appreciated.
I successfully created and executed a Sqoop import job, but I am unable to run it in an Oozie workflow. Sqoop imports the data from an RDBMS to a Parquet file in HDFS. It seems the problem is related to the Parquet format. If I use --as-textfile, the workflow runs without any problem.
Also, I've copied all parquet-*.jar files from SQOOP_HOME/lib to the Oozie share lib.
I am using Sqoop 1.4.7 and Oozie 4.3.1.
Sqoop job definition
$ sqoop job --create ingest_amsp_custmaster -- import --connect "jdbc:oracle:thin:@<IP>:<PORT>/<SID>" \
--username <USER> -P \
--table CUSTMASTER \
--as-parquetfile \
--target-dir /warehouse/raw/amsp/custmaster \
--delete-target-dir \
-m 1
Here's what I got from the error log:
java.lang.IllegalArgumentException: No enum constant com.cloudera.sqoop.SqoopOptions.FileLayout.ParquetFile
at java.lang.Enum.valueOf(Enum.java:238)
at org.apache.sqoop.SqoopOptions.loadProperties(SqoopOptions.java:522)
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.read(HsqldbJobStorage.java:299)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:198)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:200)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:183)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:64)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:235)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:436)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:350)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:211)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:254)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Any help would be appreciated.
Thanks. Yusata.
Check the version of Sqoop used in the Oozie workflow. It seems it is not 1.4.7: the stack trace line
org.apache.sqoop.SqoopOptions.loadProperties(SqoopOptions.java:522)
corresponds to https://github.com/apache/sqoop/blob/20af67ef60096b17e1d9585670e5ec787eb760e2/src/java/org/apache/sqoop/SqoopOptions.java#L522
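A quick way to check is to list what is actually in the Oozie Sqoop sharelib (the Oozie URL and HDFS path below are assumptions; adjust them for your cluster):
oozie admin -oozie http://<oozie-host>:11000/oozie -shareliblist sqoop
hdfs dfs -ls /user/oozie/share/lib/lib_*/sqoop/
If the sqoop jar listed there is older than 1.4.7, place the 1.4.7 jar in that directory and run oozie admin -sharelibupdate so the workflow picks it up.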
What should I do next?
I get an error message when running this jar file on the Hadoop system.
hadoop jar units.jar /input_dir/sample.txt /output_dir/result
Exception in thread "main" java.lang.ClassNotFoundException: /input_dir/sample.txt
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
From the Apache Hadoop docs:
Usage: hadoop jar <jar> [mainClass] args...
Runs a jar file.
You are missing the fully qualified class name in your JAR command.
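For example, if the driver class inside units.jar were com.example.ProcessUnits (a hypothetical name; substitute your actual main class), the call would look like:
hadoop jar units.jar com.example.ProcessUnits /input_dir/sample.txt /output_dir/result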
I moved a tab-delimited file into HDFS and am now trying to load it into HBase.
Below is my importtsv command:
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:ok,cf:ek,cf:rk,cf:rsk,cf:pdk,cf:pmk,cf:omk,cf:sok,cf:sdk,cf:cdk,cf:q,cf:uc,cf:up,cf:usp,cf:gm,cf:st,cf:gp -Dimporttsv.skip.bad.lines=false 'sales_fact' hdfs://localhost:54310/my/file.txt
It is trying to read a jar from a location that doesn't exist:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/home/elijah/Downloads/hbase/lib/htrace-core-3.1.0-incubating.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at org.apache.hadoop.hbase.mapreduce.ImportTsv.run(ImportTsv.java:738)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.hbase.mapreduce.ImportTsv.main(ImportTsv.java:747)
I don't understand why it has mixed the HDFS and local directory paths into one:
hdfs://localhost:54310/home/elijah/Downloads/hbase/lib/htrace-core-3.1.0-incubating.jar
The user running the import job has full access to the HBase lib directory on the local filesystem.
I can see the -libjars option is missing. You can use the -libjars option; below is an example usage:
hadoop jar \
hbase-server-0.98.6-cdh5.2.1.jar \
importtsv \
-libjars /opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/high-scale-lib-1.1.1.jar \
-Dimporttsv.separator=, -Dimporttsv.bulk.output=output \
-Dimporttsv.columns=HBASE_ROW_KEY,f:count wordcount \
word_count.csv
You can also do something like this:
# export HADOOP_CLASSPATH=`./hbase classpath`
One of the jars that was missing, hbase/lib/htrace-core-3.1.0-incubating.jar, is on the HBase classpath, so this should work in this case.
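Putting it together for your import, a minimal sketch (assuming you run it from the HBase install directory, as in your original command):
export HADOOP_CLASSPATH=`bin/hbase classpath`
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:ok,cf:ek,cf:rk,cf:rsk,cf:pdk,cf:pmk,cf:omk,cf:sok,cf:sdk,cf:cdk,cf:q,cf:uc,cf:up,cf:usp,cf:gm,cf:st,cf:gp -Dimporttsv.skip.bad.lines=false 'sales_fact' hdfs://localhost:54310/my/file.txt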
I am running this command:
hadoop jar hadoop-streaming.jar -D stream.tmpdir=/tmp -input "<input dir>" -output "<output dir>" -mapper "grep 20151026" -reducer "wc -l"
Where <input dir> is a directory with many avro files.
I am getting this error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.hdfs.protocol.DatanodeID.updateXferAddrAndInvalidateHashCode(DatanodeID.java:287)
at org.apache.hadoop.hdfs.protocol.DatanodeID.<init>(DatanodeID.java:91)
at org.apache.hadoop.hdfs.protocol.DatanodeInfo.<init>(DatanodeInfo.java:136)
at org.apache.hadoop.hdfs.protocol.DatanodeInfo.<init>(DatanodeInfo.java:122)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:633)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:793)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convertLocatedBlock(PBHelper.java:1252)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1270)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1413)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1524)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1533)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:557)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1969)
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNextNoFilter(DistributedFileSystem.java:888)
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNext(DistributedFileSystem.java:863)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:267)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
How can this issue be resolved?
It took a while, but I found the solution here.
Prepending HADOOP_CLIENT_OPTS="-Xmx1024M" to the command solves the problem.
The final command line is:
HADOOP_CLIENT_OPTS="-Xmx1024M" hadoop jar hadoop-streaming.jar -D stream.tmpdir=/tmp -input "<input dir>" -output "<output dir>" -mapper "grep 20151026" -reducer "wc -l"
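If you hit this regularly, one option is to set the client heap in hadoop-env.sh instead of on the command line (a sketch; /etc/hadoop/conf/hadoop-env.sh is an assumed location for a typical install, use wherever your hadoop-env.sh lives on the machine that submits the job):
# in hadoop-env.sh on the submitting machine
export HADOOP_CLIENT_OPTS="-Xmx1024M $HADOOP_CLIENT_OPTS"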