pig mapreduce modea and UDF jar - hadoop

I created own UDF jar and when I run pig in local mode everything works fine. The UDF jar is stored on my local FS.
When I execute pig script in mapreduce mode (-x mapreduce) and UDF jar is also placed in local FS, I got the error below. Could you please advice what is wrong? Do I have to place the UDF jar into HDFS also?
Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias sourceData
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sourceData
at org.apache.pig.PigServer.openIterator(PigServer.java:935)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:746)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:558)
at org.apache.pig.Main.main(Main.java:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
at org.apache.pig.PigServer.openIterator(PigServer.java:927)
... 13 more
================================================================================
UPDATE:
If i execute the pig script (below) in mapreduce mode all works fine:
sd = load 'hdfs://default:8020/hdfs/customerData.csv' using PigStorage(';') as (name: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: int,Fuel_Consumption: double);
dump sd;
but when i include at the beginning my UDF function, even i dont call/use it, when i execute the script i got the Iterator error.
register /root/MyUDF.jar;
sd = load 'hdfs://default:8020/hdfs/customerData.csv' using PigStorage(';') as (name: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: int,Fuel_Consumption: double);
dump sd;
Should I provide the UDF code?
UPDATE2:
this is what i get from the yarn log
2016-03-01 11:31:33,399 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2016-03-01 11:31:33,443 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:225)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:275)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:470)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:452)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1541)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:452)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:371)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1499)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1496)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429)
2016-03-01 11:31:33,457 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
Im still getting error:
2016-03-07 14:29:48,625 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:235)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:285)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:470)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:452)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1541)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:452)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:371)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1499)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1496)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429)
2016-03-07 14:29:48,628 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1

Related

Can not write to Hive table from pig

I have weird situation. When I'm running pig script as test1 user, script executes successfully:
pig -param_file /tmp/pig_parameters.param -param DBNAME=default -param TABLENAME=test_pig_table_orc -param FPATH=/data/170622164344.csv /tmp/test.pig
2017-10-31 14:40:40,968 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2017-10-31 14:40:41,057 [Thread-7] INFO hive.metastore - Closed a connection to metastore, current connections: 1
2017-10-31 14:40:41,058 [Thread-7] INFO hive.metastore - Closed a connection to metastore, current connections: 0
Scripts simple load data from csv and stores data into hive table
But when I connect to the server as another user - test2, and run the same script, got this exception :
Pig Stack Trace
---------------
ERROR 1115: org.apache.hive.hcatalog.common.HCatException : 2001 : Error setting output information. Cause : org.apache.thrift.transport.TTransportException
org.apache.pig.impl.plan.VisitorException: ERROR 1115:
<line 27, column 0> Output Location Validation Failed for: 'default.test_pig_table_orc More info to follow:
org.apache.hive.hcatalog.common.HCatException : 2001 : Error setting output information. Cause : org.apache.thrift.transport.TTransportException
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:75)
at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:311)
at org.apache.pig.PigServer.compilePp(PigServer.java:1392)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1317)
at org.apache.pig.PigServer.execute(PigServer.java:1309)
at org.apache.pig.PigServer.executeBatch(PigServer.java:387)
at org.apache.pig.PigServer.executeBatch(PigServer.java:365)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.processScript(GruntParser.java:504)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.Script(PigScriptParser.java:1014)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:550)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:547)
at org.apache.pig.Main.main(Main.java:158)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.pig.PigException: ERROR 1115: org.apache.hive.hcatalog.common.HCatException : 2001 : Error setting output information. Cause : org.apache.thrift.transport.TTransportException
at org.apache.hive.hcatalog.pig.HCatStorer.setStoreLocation(HCatStorer.java:196)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:68)
... 30 more
Caused by: org.apache.hive.hcatalog.common.HCatException : 2001 : Error setting output information. Cause : org.apache.thrift.transport.TTransportException
at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:220)
at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:70)
at org.apache.hive.hcatalog.pig.HCatStorer.setStoreLocation(HCatStorer.java:191)
... 31 more
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1254)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1240)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1263)
at org.apache.hive.hcatalog.common.HCatUtil.getTable(HCatUtil.java:180)
at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:91)
... 33 more
Both users are members of supergroup and have equal permissions.
Script runs from the same server.
Tried to place script .pig file localy and on hdfs as well - the same error
Also important point, that it runs successfully from each worker, except master node. Cluster has kerberos authentication
Got stuck with this issue, pls suggest what I could try to fix it?
Solved, by removing hive-site.xml from test2 user home folder. Or just simply run script being in another directory
In my case there was an old hive-site.xml without kerberos configuration parameters in test2 user home folder. When this user ran pig script, by default it applied file conf parameters from home folder (not only hive), if they are located there.

Pig Service Check failed using - User: rm/sandbox.hortonworks.com#HDP-SANDBOX is not allowed to impersonate ambari-qa

I ran pig service check using Ambari but it failed and got below exception.
2016-04-09 20:35:19,399 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /user/ambari-qa/.staging/job_1460043791266_0012
2016-04-09 20:35:19,407 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:pigSmoke.sh got an error while submitting
java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1460043791266_0012 to YARN : User: rm/sandbox.hortonworks.com#HDP-SANDBOX is not allowed to impersonate ambari-qa
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
at java.lang.Thread.run(Thread.java:745)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1460043791266_0012 to YARN : User: rm/sandbox.hortonworks.com#HDP-SANDBOX is not allowed to impersonate ambari-qa
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:271)
at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:291)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
... 16 more
2016-04-09 20:35:19,410 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1460043791266_0012
2016-04-09 20:35:19,410 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B
2016-04-09 20:35:19,410 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M:
Any pointer will be very much helpful.
Thanks in advance.
Take a look at the property "hadoop.proxyuser.yarn.groups" in the core-site.xml, is it configured? You can check the configs in the ambari ui: HDFS -> Configs -> Advanced Tab -> Custom core-site. You may want to set the value of the property to "*" to see if that solves the issue. If it does you can tighten security by defining only particular groups, instead of all groups (*).

Debugging hadoop Wordcount program in eclipse in windows

Trying to run Wordcount program in hadoop in eclipse (windows 7). and passing these argument in eclipse only
E:\hadoop\eclipse-hadoop-pro\workspace-hadoop\WordCountPro\input\word.txt
E:\hadoop\eclipse-hadoop-pro\workspace-hadoop\WordCountPro\output
I have created input file in project only like input folder and inside it word.txt file
But it is throughing below excption
2015-04-08 15:30:09,947 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-04-08 15:30:10,238 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(373)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable E:\hadoop\hadoop-HADOOP_HOME\hadoop-2.6.0\bin\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
at org.apache.hadoop.mapreduce.task.JobContextImpl.<init>(JobContextImpl.java:72)
at org.apache.hadoop.mapreduce.Job.<init>(Job.java:144)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:187)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:206)
at com.WordCount.main(WordCount.java:52)
2015-04-08 15:30:11,039 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1049)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2015-04-08 15:30:11,041 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/E:/hadoop/eclipse-hadoop-pro/workspace-hadoop/WordCountPro/output already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:562)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at com.WordCount.main(WordCount.java:61)
I doubt if Hadoop is installed correctly. Check in your machine if all the daemons are running or not.If not, then consider re-checking or re-installing what you are missing.
ERROR [main] util.Shell (Shell.java:getWinUtilsPath(373)) - Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable

not able to run pig on windows 7 using cygwin

I configured pig as directed in the documentation.
Enviornment: Windows 7, Hadoop-0.20.2, pig 0.13.0, Cygwin
But when i type pig (mapreduce) on the command prompt it just displays below thing. I am not sure whether pig is started or not. I don't see GRUNT shell to execute script.
Btw, Hadoop is running on the same node.
Can someone please help?
$ pig
Find hadoop at /hadoop-0.20.2/bin/hadoop
dry run:
HADOOP_CLASSPATH: C:\cygwin64\pig-0.13.0\conf;C;C:\Program Files\Java\jdk1.6.0_25\lib\tools.jar;C;C:\cygwin64\hadoop-0.20.2\conf;C:\cygwin64\pig-0.13.0\lib\accumulo-core-1.5.0.jar;C:\cygwin64\pig-0.13.0\lib\accumulo-fate-1.5.0.jar;C:\cygwin64\pig-0.13.0\lib\accumulo-server-1.5.0.jar;C:\cygwin64\pig-0.13.0\lib\accumulo-start-1.5.0.jar;C:\cygwin64\pig-0.13.0\lib\accumulo-trace-1.5.0.jar;C:\cygwin64\pig-0.13.0\lib\avro-1.7.5.jar;C:\cygwin64\pig-0.13.0\lib\avro-mapred-1.7.5.jar;C:\cygwin64\pig-0.13.0\lib\avro-tools-1.7.5-nodeps.jar;C:\cygwin64\pig-0.13.0\lib\groovy-all-1.8.6.jar;C:\cygwin64\pig-0.13.0\lib\hbase-0.94.1.jar;C:\cygwin64\pig-0.13.0\lib\jruby-complete-1.6.7.jar;C:\cygwin64\pig-0.13.0\lib\js-1.7R2.jar;C:\cygwin64\pig-0.13.0\lib\json-simple-1.1.jar;C:\cygwin64\pig-0.13.0\lib\jython-standalone-2.5.3.jar;C:\cygwin64\pig-0.13.0\lib\piggybank.jar;C:\cygwin64\pig-0.13.0\lib\protobuf-java-2.4.0a.jar;C:\cygwin64\pig-0.13.0\lib\zookeeper-3.4.5.jar:C:\cygwin64\PIG-01~1.0/pig-withouthadoop-h2.jar:
HADOOP_OPTS: -Xmx1000m -Dpig.log.dir=C:\cygwin64\PIG-01~1.0\logs -Dpig.log.file=pig.log -Dpig.home.dir=C:\cygwin64\PIG-01~1.0\
HADOOP_CLIENT_OPTS: -Xmx1000m -Dpig.log.dir=C:\cygwin64\PIG-01~1.0\logs -Dpig.log.file=pig.log -Dpig.home.dir=C:\cygwin64\PIG-01~1.0\
/hadoop-0.20.2/bin/hadoop jar C:\cygwin64\PIG-01~1.0/pig-withouthadoop-h2.jar
when i run in debug mode, i see this exception. This is because Hadoop Jar is not set.
localhsot#mymachine~
$ echo $PIG_INSTALL
C:\cygwin64\pig-0.13.0
localhsot#mymachine~
$ export PIG_INSTALL=/cygdrive/c/cygwin64/pig-0.13.0
localhsot#mymachine~
$ export HADOOP_INSTALL=/cygdrive/c/cygwin64/hadoop-0.20.2/
localhsot#mymachine~
$ export PATH=$PATH:$PIG_INSTALL/bin:$HADOOP_INSTALL/bin
$ pig
14/08/26 14:05:12 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/08/26 14:05:12 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/08/26 14:05:12 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-08-26 14:05:12,998 [main] INFO org.apache.pig.Main - Apache Pig version 0. 13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-08-26 14:05:12,998 [main] INFO org.apache.pig.Main - Logging error message s to: C:\cygwin64\home\chparekh\pig_1409076312996.log
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/map reduce/task/JobContextImpl
at org.apache.pig.tools.pigstats.PigStatsUtil.<clinit>(PigStatsUtil.java :68)
at org.apache.pig.Main.run(Main.java:643)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl. java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces sorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.task.Jo bContextImpl
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 8 more
you can refer below link for the same, i hope this will help you.
http://abhijitsureshshingate.wordpress.com/2013/07/08/code-debug-test-apache-pig-scripts-using-eclipse-on-windows/

Pig 0.13 ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl

Just installed Pig 0.13 and I am attempting to use it with Hadoop 1.1.2. (Pig documentation states Pig 0.13 is compatible with Hadoop 1.1.2). Per the Pig install instructions, I set $PIG_CLASSPATH
to point at /etc/hadoop where core-site.xml, hdfs-site.xml, and mapred-site.xml are defined. Hadoop cluster is functional and works fine with non-Pig jobs. Based on the error descriptions below, I understand that Pig cannot find the JobContextImpl class it is looking for.
Based on the Hadoop 1.1.2 API documentation, I don't believe "task" is a sub-package of the "mapreduce" package. I have tried adding hadoop-core-1.1.2.jar directly to $PIG_CLASSPATH
and that did not work. (After looking at the contents of hadoop-core-1.1.2.jar, and the Hadoop 1.1.2 API documentation, I don't believe JobContextImpl is defined in the package Pig is attempting to load it from). How do I get Pig 0.13 to work with Hadoop 1.1.2?
=======Error follows as below=======
14/08/03 14:01:05 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/08/03 14:01:05 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/08/03 14:01:05 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-08-03 14:01:05,959 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-08-03 14:01:05,959 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig-0.13.0/bin/pig_1407088865958.log
2014-08-03 14:01:06,112 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.localdomain:8020/
2014-08-03 14:01:06,388 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master.localdomain:8021
2014-08-03 14:01:06,440 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl
Details at logfile: /home/hadoop/pig-0.13.0/bin/pig_1407088865958.log
Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.pig.tools.pigstats.PigStatsUtil
at org.apache.pig.Main.run(Main.java:643)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
===Contents of pig_1407088865958.log ===
Pig Stack Trace
ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/task/JobContextImpl
at org.apache.pig.tools.pigstats.PigStatsUtil.<clinit>(PigStatsUtil.java:68)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:79)
at org.apache.pig.Main.run(Main.java:510)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.task.JobContextImpl
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 9 more
Though it is unclear how well this works for everyone, it appears that the asker mentioned how he solved the problem:
In my searching for help I saw posts stating that it needs to be
recompiled with a parameter that indicates version. The parameter
values I saw were 23,24. I did not know how that parameter mapped to
the version of hadoop that I am using 1.1.2. I hacked the bin/pig
script to point to hadoop-core-1.1.2.jar. The script requires
HADOOP_HOME be set (which is deprecated).

Resources