Pig not working in terminal - hadoop

I am new to Pig and I downloaded it from
http://apache.techartifact.com/mirror/pig/pig-0.10.1/
Now when I type pig in my Linux terminal, it displays the following message:
2013-04-26 17:14:53,641 [main] INFO org.apache.pig.Main - Logging error messages to: /home/vishal/Downloads/pig_1366976693634.log
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
at org.apache.pig.Main.run(Main.java:587)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.JobConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 7 more
Do I have to include a JAR, or what else could be the issue?
Thanks

You need to include the mapred JAR matching the MapReduce version you use, MRv1 or MRv2 (YARN).
FYI: java.lang.NoClassDefFoundError almost always means a required JAR is missing from the classpath or its name is mistyped.
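As a minimal sketch (assuming a Hadoop install under /usr/lib/hadoop; adjust the path to your setup), pointing Pig's launcher at your Hadoop installation puts the Hadoop jars, including the one containing org.apache.hadoop.mapred.JobConf, on its classpath:
export HADOOP_HOME=/usr/lib/hadoop       # bin/pig uses this to locate the Hadoop jars
export PIG_CLASSPATH=$HADOOP_HOME/conf   # plus your cluster configuration
pig                                      # should now reach the Grunt shell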

Related

Error with Pig on Yarn - PigStatsUtil - Failed to get running job - IOException

I am hitting this error when executing Pig jobs on top of YARN. The error is not fatal, though, and the jobs seem to complete fine.
I just wanted to investigate whether some config is messed up. Did anyone else stumble upon this?
I have seen the same error on this Pig-user thread, where it is still unanswered:
http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3CCA+HR8ZPo=64dx137eHMEaEJcG42ozvSk3LbTjyhmKasS23aBVw#mail.gmail.com%3E
Any tips would be appreciated.
[main] WARN org.apache.pig.tools.pigstats.PigStatsUtil - Failed to get running job
java.io.IOException
at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:317)
at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:385)
at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:495)
at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:185)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:624)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:622)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:622)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:640)
at org.apache.pig.tools.pigstats.PigStatsUtil.addSuccessJobStats(PigStatsUtil.java:345)
at org.apache.pig.tools.pigstats.PigStatsUtil.accumulateStats(PigStatsUtil.java:257)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:324)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
at org.apache.pig.PigServer.execute(PigServer.java:1241)
at org.apache.pig.PigServer.executeBatch(PigServer.java:335)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:475)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

HBase completebulkload returns exception

I am trying to bulk-populate an HBase table quickly from a text file (several GB) by using the bulk loading method described in the Hadoop docs.
I have created an HFile which I now want to push to my HBase table.
When I use this command:
hadoop jar /home/hxcaine/hadoop/lib/hbase.jar completebulkload /user/hxcaine/dbpopulate/output/cf1 my_hbase_table
The job starts and then I get this exception:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/util/concurrent/ThreadFactoryBuilder
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:195)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:696)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:701)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: java.lang.ClassNotFoundException: com.google.common.util.concurrent.ThreadFactoryBuilder
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 17 more
However, I can see that the Guava jar is in my classpath and when I check inside the jar I can see ThreadFactoryBuilder.class.
I am using these versions (and stuck with them):
Hadoop 0.20.2-cdh3u3
HBase 0.90.4-cdh3u3
Guava jar: /usr/lib/hadoop-0.20/lib/guava-r09-jarjar.jar
I do have an older Guava jar in my classpath, but I don't know where it came from; I don't suppose it should have an effect.
Any ideas?
What happens if you run:
export HADOOP_CLASSPATH=`hbase classpath`
before running the load? From the stack trace, it looks like the jar is needed by one of the actual tasks, though I am surprised to see that this actually kicks off an M/R job.
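Putting that together with the command from the question (paths copied from above), the sequence would look like:
export HADOOP_CLASSPATH=`hbase classpath`    # prepends the HBase jars, Guava included
hadoop jar /home/hxcaine/hadoop/lib/hbase.jar completebulkload \
    /user/hxcaine/dbpopulate/output/cf1 my_hbase_table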

Hadoop Job throws NullPointerException in FBUtilities.java

I am getting a NullPointerException when trying to start my Hadoop job with access to Cassandra. Here is the stack trace:
Exception in thread "main" java.lang.NullPointerException
at org.apache.cassandra.utils.FBUtilities.newPartitioner(FBUtilities.java:415)
at org.apache.cassandra.hadoop.ConfigHelper.getOutputPartitioner(ConfigHelper.java:416)
at org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.checkOutputSpecs(ColumnFamilyOutputFormat.java:90)
at org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.checkOutputSpecs(ColumnFamilyOutputFormat.java:81)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:887)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at RowKeyIndexer.run(RowKeyIndexer.java:393)
at Indexer.run(Indexer.java:56)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at Indexer.main(Indexer.java:30)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I am running Hadoop version 1.0.3 and Cassandra version 1.1.2.
Any help is highly appreciated, as I have no idea where to start.
Thanks a lot!
It looks like you haven't set your output partitioner, like this:
ConfigHelper.setOutputPartitioner(conf, "org.apache.cassandra.dht.RandomPartitioner");
Make sure the class name matches the partitioner your cluster is configured with in cassandra.yaml; when it is unset, FBUtilities.newPartitioner receives a null name and fails with the NullPointerException shown above.
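You can read the correct value straight off the cluster's own config; a sketch, assuming a package install with the config at /etc/cassandra/cassandra.yaml:
grep '^partitioner' /etc/cassandra/cassandra.yaml    # e.g. partitioner: org.apache.cassandra.dht.RandomPartitioner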

pig: java.lang.NoClassDefFoundError: org/jruby/embed/ScriptingContainer

Pig 0.10.0 supports Ruby UDFs, so I am trying a very simple example but got the following error. Do you know why?
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. org/jruby/embed/ScriptingContainer
java.lang.NoClassDefFoundError: org/jruby/embed/ScriptingContainer
at org.apache.pig.scripting.jruby.JrubyScriptEngine.<clinit>(JrubyScriptEngine.java:65)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:169)
at org.apache.pig.scripting.ScriptEngine.getInstance(ScriptEngine.java:254)
at org.apache.pig.PigServer.registerCode(PigServer.java:523)
at org.apache.pig.tools.grunt.GruntParser.processRegister(GruntParser.java:422)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:419)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:555)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.jruby.embed.ScriptingContainer
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 17 more
================================================================================
I had the same problem. You should check whether you have a jruby.jar installed with Pig.
It seems the jython.jar was there, so maybe that's a friendly nudge for people to use Python.
I had to explicitly put jruby.jar on the classpath by running:
java -cp $PIG_HOME/pig-0.11.1.jar:$PIG_HOME/lib/jruby.jar org.apache.pig.Main -x local myscript.pig
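An alternative sketch that avoids invoking org.apache.pig.Main by hand, assuming jruby.jar sits in $PIG_HOME/lib (the pig launcher prepends PIG_CLASSPATH to its classpath):
export PIG_CLASSPATH=$PIG_HOME/lib/jruby.jar   # extra jars for bin/pig
pig -x local myscript.pig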

ClassNotFoundException thrown by RecommenderJob (Apache Mahout on Hadoop)

I am using the org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob.java file to run a pseudo-distributed recommender. I am using it to run the GenericItemsRecommender class.
The command I am using is
bin/hadoop jar mahout-core-0.7-SNAPSHOT-job org.apache.mahout.cf.taste.hadoop.pesudo.RecommenderJob -Dmapred.input.dir=./ratingsLess.txt -Dmapred.output.dir=/input/output --tempDir /input/tmp --recommenderClassName org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender
When I run it, I get an exception saying:
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.mahout.cf.taste.hadoop.pesudo.RecommenderJob
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
Could you please let me know why I am getting this error?
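One thing stands out: the command spells the package pesudo, while the file described above lives under ...hadoop.pseudo, and the ClassNotFoundException echoes the misspelling exactly. A corrected invocation would presumably look like this (the .jar extension on the job archive is an assumption; the question omits it):
# note "pseudo" in the class name, and the assumed .jar suffix
bin/hadoop jar mahout-core-0.7-SNAPSHOT-job.jar \
    org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob \
    -Dmapred.input.dir=./ratingsLess.txt -Dmapred.output.dir=/input/output \
    --tempDir /input/tmp \
    --recommenderClassName org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender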
