Error when reading from S3 using Spark/Hadoop

I am trying to read data from Amazon S3 using Spark, but I am getting
java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException
from inside a Hadoop call.
I have tried downloading jets3t and adding all the included jars to my classpath,
but it did not help.
Here is the full transcript of what is happening:
scala> val zz = sc.textFile("s3n:/<bucket>/<path>")
13/08/30 19:50:21 INFO storage.MemoryStore: ensureFreeSpace(45979) called with curMem=46019, maxMem=8579469803
13/08/30 19:50:21 INFO storage.MemoryStore: Block broadcast_1 stored as values to memory (estimated size 44.9 KB, free 8.0 GB)
zz: spark.RDD[String] = MappedRDD[3] at textFile at <console>:12
scala> zz.first
13/08/30 19:50:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/08/30 19:50:38 WARN snappy.LoadSnappy: Snappy native library not loaded
java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:224)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:214)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:176)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:76)
at spark.RDD.partitions(RDD.scala:214)
at spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:26)
at spark.RDD.partitions(RDD.scala:214)
at spark.RDD.take(RDD.scala:764)
at spark.RDD.first(RDD.scala:778)

When you run a Hadoop job you have to set the Hadoop classpath environment variable so Hadoop can find the extra jars. Typically, this is done in the Hadoop startup script.
export HADOOP_CLASSPATH=/path/to/yourlib:/path/to/your/other/lib
Replace the : with a ; if you are on Windows.
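For the jets3t error above, a minimal sketch of the same idea (the jar path is a placeholder; point it at whatever jets3t jar ships with your Hadoop/Spark install):
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/path/to/jets3t-<version>.jar
# Older Spark shells also read SPARK_CLASSPATH, so they may need the jar there too
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/path/to/jets3t-<version>.jar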

Related

Unable to load native-hadoop library for your platform - Rancher

I am using Rancher to manage an environment. I am running Hadoop + YARN (Experimental) for Flink, plus ZooKeeper, in Rancher.
I am trying to configure HDFS in flink-conf.yaml.
These are the changes I made for the HDFS connection:
fs.hdfs.hadoopconf: /etc/hadoop
recovery.zookeeper.storageDir: hdfs://:8020/flink/recovery
state.backend.fs.checkpointdir: hdfs://:8020/flink/checkpoints
And I get an error that says:
2016-09-06 14:10:44,965 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
What did I do wrong?
Best regards

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

alpesh@alpesh-Inspiron-3647:~/hadoop-2.7.2/sbin$ hadoop fs -ls
16/07/05 13:59:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
It also shows me the output below when I run:
hadoop checknative -a
16/07/05 14:00:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Native library checking:
hadoop: false
zlib: false
snappy: false
lz4: false
bzip2: false
openssl: false
16/07/05 14:00:42 INFO util.ExitUtil: Exiting with status 1
Please help me solve this.
The library you are using was compiled for 32-bit, but you are running a 64-bit version. So open the .bashrc file where your Hadoop configuration lives and go to this line:
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
and replace it with
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
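After editing, reload the profile and re-run the native check to confirm the libraries now load (a quick sketch):
source ~/.bashrc
hadoop checknative -a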
To get rid of this error:
Suppose the jar file is at /home/cloudera/test.jar and the class file is at /home/cloudera/workspace/MapReduce/bin/mapreduce/WordCount, where mapreduce is the package name.
The input file mytext.txt is at /user/process/mytext.txt and the output location is /user/out.
We should run this MapReduce program as follows:
$ hadoop jar /home/cloudera/test.jar mapreduce.WordCount /user/process /user/out
Add these properties to the bash profile of the Hadoop user and the issue will be solved:
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_COMMON_LIB_NATIVE_DIR"
It's just a warning: Hadoop cannot find the correct native library, either because it was not compiled for your platform or because it does not exist.
If I were you, I would simply silence it.
To do that, add the following line to the corresponding log4j configuration file:
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
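For example, a sketch assuming a standard Hadoop conf layout, appending the line to log4j.properties under $HADOOP_CONF_DIR:
echo 'log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR' >> $HADOOP_CONF_DIR/log4j.properties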

Hadoop Wordcount Program Compile Error

I am new to Hadoop programming. I am using Eclipse for Hadoop development. I added all the jar files through the Java build path, but when I run my program it does not run and gives this error, so please help me. How do I solve the error?
14/05/31 23:33:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/31 23:33:10 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/05/31 23:33:10 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-deep/mapred/staging/deep689130586/.staging/job_local689130586_0001
14/05/31 23:33:10 ERROR security.UserGroupInformation: PriviledgedActionException as:deep cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/ already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/ already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:975)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
at hadoop1.MyJob.run(MyJob.java:57)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at hadoop1.MyJob.main(MyJob.java:63)
The native libraries failed to load.
This can happen when you are using a 64-bit machine but the Hadoop distro is built for 32-bit. You can follow the steps here http://www.ercoppa.org/Linux-Compile-Hadoop-220-fix-Unable-to-load-native-hadoop-library.htm to recompile Hadoop for 64-bit, and then replace the native libs.
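A quick way to check for the 32/64-bit mismatch before recompiling is to inspect the shipped native library (a sketch; the path assumes a standard tarball install):
# "ELF 32-bit" in the output on a 64-bit JVM would explain the warning
file $HADOOP_HOME/lib/native/libhadoop.so*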

Hadoop compression : "Loaded native gpl library" but "Failed to load/initialize native-lzo library"

After several tries at installing LZO compression for Hadoop, I need help because I really have no idea why it doesn't work.
I'm using Hadoop 1.0.4 on CentOS 6. I tried http://opentsdb.net/setup-hbase.html, https://github.com/kevinweil/hadoop-lzo and some others, but I'm still getting this error:
13/07/03 19:52:23 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
13/07/03 19:52:23 WARN lzo.LzoCompressor: java.lang.NoSuchFieldError: workingMemoryBuf
13/07/03 19:52:23 ERROR lzo.LzoCodec: Failed to load/initialize native-lzo library
even though the native GPL library is loaded. I've updated my mapred-site and core-site according to the links above, and I've copied the libs to the right paths (also according to the links).
The real problem is that the LZO test works on the namenode:
13/07/03 18:55:47 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
13/07/03 18:55:47 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev ]
I've tried setting several paths in hadoop-env.sh, but there seems to be no right solution...
So if you have any idea or link, I'm really interested.
[edit] After a week, I'm still trying to make it functional.
I've tried sudhirvn.blogspot.fr/2010/08/hadoop-lzo-installation-errors-and.html, but removing all the LZO and gplcompression libraries and then making a new install was not better at all.
Is that due to my hadoop-core version? Is it possible to have hadoop-core-0.20 and hadoop-core-1.0.4 at the same time? Should I compile LZO against Hadoop 0.20 in order to use LZO?
By the way, I already tried compiling hadoop-lzo like this:
CLASSPATH=/usr/lib/hadoop/hadoop-core-1.0.4.jar CFLAGS=-m64 CXXFLAGS=-m64 ant compile-native tar
If it helps, the full error is:
INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
WARN lzo.LzoCompressor: java.lang.NoSuchFieldError: workingMemoryBuf
ERROR lzo.LzoCodec: Failed to load/initialize native-lzo library
INFO lzo.LzoIndexer: [INDEX] LZO Indexing file test/table.lzo, size 0.00 GB...
WARN snappy.LoadSnappy: Snappy native library is available
INFO util.NativeCodeLoader: Loaded the native-hadoop library
INFO snappy.LoadSnappy: Snappy native library loaded
Exception in thread "main" java.lang.RuntimeException: native-lzo library not available
at com.hadoop.compression.lzo.LzopCodec.createDecompressor(LzopCodec.java:87)
at com.hadoop.compression.lzo.LzoIndex.createIndex(LzoIndex.java:229)
at com.hadoop.compression.lzo.LzoIndexer.indexSingleFile(LzoIndexer.java:117)
at com.hadoop.compression.lzo.LzoIndexer.indexInternal(LzoIndexer.java:98)
at com.hadoop.compression.lzo.LzoIndexer.index(LzoIndexer.java:52)
at com.hadoop.compression.lzo.LzoIndexer.main(LzoIndexer.java:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I really want to use LZO because I have to deal with very large files on a rather small cluster (5 nodes). Having splittable compressed files could make it run really fast.
Every remark or idea is welcome.
I was having the exact same issue and finally resolved it by randomly choosing a datanode, and checking whether lzop was installed properly.
If it wasn't, I did:
sudo apt-get install lzop
Assuming you are using Debian-based packages.
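If you want to check every datanode at once rather than picking one at random, here is a sketch assuming passwordless ssh and a slaves file listing the datanodes:
# install lzop on any datanode that is missing it
for host in $(cat $HADOOP_HOME/conf/slaves); do
  ssh "$host" 'which lzop || sudo apt-get install -y lzop'
done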
I had this same issue on my OS X machine. The problem was solved when I removed hadoop-lzo.jar (0.4.16) from my classpath and put the hadoop-gpl-compression jar there instead.

wordcount from eclipse

I was using the Eclipse plugin for Hadoop. I can see all the files in HDFS by setting up a Hadoop server, but when I try to run the wordcount.java file from Eclipse, it gives me an exception, whereas from the terminal it runs smoothly. The exception is below.
12/11/14 04:09:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/11/14 04:09:06 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
12/11/14 04:09:06 WARN snappy.LoadSnappy: Snappy native library not loaded
12/11/14 04:09:06 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-hduser/mapred/staging/hduser1728681403/.staging/job_local_0001
12/11/14 04:09:06 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/hduser/gutenberg
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/hduser/gutenberg
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at WordCount.run(WordCount.java:149)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at WordCount.main(WordCount.java:155)
I'd start with investigating this:
ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/hduser/gutenberg
It seems to be what causes the problem. Are you sure this is the proper path? If so, you may not have the privilege to access it. Later on, I would try to eliminate as many of the WARNs as I can.
Thanks Shujaat, that solved my problem. From Eclipse I was getting the same issue...
Use hdfs://localhost:54310/user/... instead of "/user/..."
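For example, a sketch assuming the same NameNode address as above and that the input text files sit in a local gutenberg/ directory:
hadoop fs -mkdir hdfs://localhost:54310/user/hduser/gutenberg
hadoop fs -put gutenberg/*.txt hdfs://localhost:54310/user/hduser/gutenberg/
Then pass hdfs://localhost:54310/user/hduser/gutenberg as the input path instead of the bare /user/hduser/gutenberg.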
