Cascading + libjars = ClassNotFoundException. Sometimes - hadoop

I am running Cascading (actually Scalding) hadoop job that uses DistributedCache for dependent jars.
Fist time it works fine (meaning that the classpath is set up correctly) but then it starts failing with ClassNotFoundException:
java.io.IOException: Split class cascading.tap.hadoop.io.MultiInputSplit not found
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:387)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException: cascading.tap.hadoop.io.MultiInputSplit
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:385)
...
Did anybody else have success with Cascading and jars in the DistributedCache
This message seems to imply that Cascading has some internal handling of the distributed cache jars. Any light you can shed on this?
Edit: I am using Cascading 2.1.6 on Hadoop 1.0.3

Which version of hadoop are you using? There are some problems with the distributed cache in 0.20.2. Can you try switching to a newer version?

Chris K Wensel, the author of Cascading responded on the mailing list that Cascading does not do anything with DistributedCache.
I looked further and it was a problem in my code -- I did not add these files to the DistributedCache properly.

Related

Hive action in Oozie failing intermittently - CDH 5.1 - Oozie 4.0.0

We had an oozie workflow with simple "create" and "alter" statements, with "create" statement using "RCFILE" file format in Hive Action.
The challenge we are facing is that this Hive action is executing successfully sometimes and failing sometimes... We weren't able to fix this.
It is throwing "NoSuchMethodError" exception in regard to "serde".
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], main() threw exception, org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.getPrimitiveTypeInfo(Ljava/lang/String;)Lorg/apache/hadoop/hive/serde2/typeinfo/TypeInfo;
java.lang.NoSuchMethodError: org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.getPrimitiveTypeInfo(Ljava/lang/String;)Lorg/apache/hadoop/hive/serde2/typeinfo/TypeInfo;
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerNumericType(FunctionRegistry.java:630)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.<clinit>(FunctionRegistry.java:636)
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:208)
at org.apache.hadoop.hive.cli.CliSessionState.<init>(CliSessionState.java:78)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:645)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:318)
at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:279)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:38)
at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Can someone help me fix this?
I had the same problem. It was in a different context, but I'm sure the underlying issue is the same. The return type of the org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.getPrimitiveTypeInfo(String) method was narrowed down to PrimitiveTypeInfoin Hive 0.13 to from TypeInfo in prior versions of Hive which broke Java binary compatibility.
It seems that the org.apache.hadoop.hive.ql.exec.FunctionRegistry was compiled with a pre 0.13 version of a hive-serde-x.x.x.jar, but execution classpath includes a version from newer hive-serde-0.13.x.jar.
The solution was to make sure that hive-serde-n.n.n.jar and hive-exec-n.n.n.jar (or wherever org.apache.hadoop.hive.ql.exec.FunctionRegistry is coming from in your case) have consistent versions on the classpath.

Pig not working in terminal

I am new to pig and i have downloaded from
http://apache.techartifact.com/mirror/pig/pig-0.10.1/
Now when i write pig in my linux terminal it displays the following message
2013-04-26 17:14:53,641 [main] INFO org.apache.pig.Main - Logging error messages to: /home/vishal/Downloads/pig_1366976693634.log
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
at org.apache.pig.Main.run(Main.java:587)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.JobConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 7 mor
Is it that i have to include jar or what else can be the issue
Thanks
You need to include the mapred java archive, depending on which MapReduce version you use, MRv1 or MRv2 (=YARN).
FYI: java.lang.NoClassDefFoundError is always about a forgotten/mistyped JAR-file

ClassNotFoundException, while running example job of Hadoop also

I saw the post and followed the process. But it didn't work.
ClassNotFoundException, while running example job of Hadoop
Help me please.
created mapreduce-0.1-tests.jar( there is actually a MapReduceTest.class)
copied input file form local > hadoop dfs -copyFromLocal input /Users/test/Documents/movie_titles_only.csv
hadoop jar /Users/test/project/mapReduce/target/mapreduce-0.1-tests.jar MapReduceTest -D mapred.reduce.tasks=0 input/movie_titles_only.csv movie_output
BUT!! It was same the message... How do I do, help me please!!
Exception in thread "main" java.lang.ClassNotFoundException: MapReduceTest at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
You need to use job.setJarByClass to specify your jar. Add the following statement in your codes:
job.setJarByClass(MapReduceTest.class);

FileNotFoundException for hive/lib/hive-builtins-0.9.0.jar in Hive

I've a configured Hadoop 0.23 on my local box and got it work with a simple map-reduce wordcount program. I Have configured Hive to work with it. All the DDL queries works fine. But when i fire queries that have aggregates (which will trigger Map-educe jobs)
java.io.FileNotFoundException: File does not exist: /Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:738)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:252)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:290)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:361)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:609)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:604)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:604)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:693)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar)'
You should create the same file /Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar in your hadoop file system. Then it should work.
Make sure you have the jar in HDFS.

HBase completebulkload returns exception

I am trying to bulk-populate an HBase table quickly from a text file (several GB) by using the bulk loading method described in the Hadoop docs.
I have created an HFile which I now want to push to my HBase table.
When I use this command:
hadoop jar /home/hxcaine/hadoop/lib/hbase.jar completebulkload /user/hxcaine/dbpopulate/output/cf1 my_hbase_table
The job starts and then I get this exception:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/util/concurrent/ThreadFactoryBuilder
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:195)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:696)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:701)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: java.lang.ClassNotFoundException: com.google.common.util.concurrent.ThreadFactoryBuilder
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 17 more
However, I can see that the Guava jar is in my classpath and when I check inside the jar I can see ThreadFactoryBuilder.class.
I am using these versions (and stuck with them):
Hadoop 0.20.2-cdh3u3
HBase 0.90.4-cdh3u3
Guava jar: /usr/lib/hadoop-0.20/lib/guava-r09-jarjar.jar
I do have an older Guava jar in my classpath but I don't know where it came from, I don't suppose it should have an effect.
Any ideas?
what happens if you run:
export HADOOP_CLASSPATH=`hbase classpath`
before running the load? From the stack trace, it looks like the jar is needed by one of the actual tasks though I am surprised to see that this actually kicks off an M/R job.

Resources