EMR Hadoop Pig job error "Internal error creating job configuration" - hadoop

I have a Pig job running on Amazon EMR and suddenly it has stopped working, giving the following error:
Pig Stack Trace
---------------
ERROR 2017: Internal error creating job configuration.
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:855)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:294)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1249)
at org.apache.pig.PigServer.execute(PigServer.java:1239)
at org.apache.pig.PigServer.executeBatch(PigServer.java:333)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:479)
at org.apache.pig.Main.main(Main.java:159)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
Caused by: java.lang.NullPointerException
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.adjustNumReducers(JobControlCompiler.java:875)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:480)
... 17 more
================================================================================
Does anyone know why this happens or what the problem might be? This is one of the vaguest errors I have ever seen.

The problem actually turned out to be that Pig was unable to locate one of the input files to be processed, yet the error doesn't even remotely suggest a missing-file issue.
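For anyone else who hits this: since the NullPointerException comes out of adjustNumReducers with no hint about inputs, it is worth ruling out missing input paths before digging into the job configuration. A minimal sketch, assuming the inputs live under a hypothetical s3://my-bucket/input/ prefix (substitute your own paths; on EMR, hadoop fs can list S3 locations):
# List each location the script LOADs; a "No such file or directory"
# here is enough to trigger the vague ERROR 2017 above.
hadoop fs -ls s3://my-bucket/input/events/
hadoop fs -ls s3://my-bucket/input/users/
# Or, equivalently, from inside the Grunt shell:
# grunt> fs -ls s3://my-bucket/input/events/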

Related

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/python/google/common/collect/Lists

I'm just getting started with Pig and I'm running into a lot of issues with my first program. Any help is much appreciated.
I've tried the fixes suggested in these questions:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/hadoop/hbase/filter/WritableByteArrayComparable
and
Pig Installation error: ERROR pig.Main: ERROR 2998: Unhandled internal error
but none of them seems to work. Can someone give a more detailed explanation of what needs to be done?
Pig version: 0.17.0
Stack Trace:
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. org/python/google/common/collect/Lists
java.lang.NoClassDefFoundError: org/python/google/common/collect/Lists
at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.getTaskReports(MRJobStats.java:533)
at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addMapReduceStatistics(MRJobStats.java:355)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:232)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:164)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:379)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1475)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1460)
at org.apache.pig.PigServer.storeEx(PigServer.java:1119)
at org.apache.pig.PigServer.store(PigServer.java:1082)
at org.apache.pig.PigServer.openIterator(PigServer.java:995)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:782)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:383)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:630)
at org.apache.pig.Main.main(Main.java:175)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.ClassNotFoundException: org.python.google.common.collect.Lists
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 24 more
================================================================================
I think this is a bug in some versions of Pig.
It seems that MRJobStats.java is using the org.python.google.common.collect.* classes (Guava as re-packaged inside the Jython standalone jar) instead of the standard com.google.common.collect.* ones, and the former are not on the classpath.
The bug was fixed in this commit:
https://github.com/apache/pig/commit/6dd3ca4deb84edd9edd7765aa1d12f89a31b1283
in July 2017. Unfortunately, that is after the Pig 0.17.0 release you are using; see https://github.com/apache/pig/blob/trunk/CHANGES.txt.
So you will most likely need to check out and build Pig yourself. There is a link to the build instructions in the README.txt file on GitHub: https://github.com/apache/pig
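For what it's worth, a rough sketch of building Pig with the fix included; this assumes you have Git, a JDK, and Apache Ant installed, and the README in the repository remains the authoritative reference:
# Clone Pig and check out the fix commit (or any later trunk revision):
git clone https://github.com/apache/pig.git
cd pig
git checkout 6dd3ca4deb84edd9edd7765aa1d12f89a31b1283
# Build the Pig jar with Ant (target name per the project's build
# file; see the README if it differs):
ant clean jar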

HORTONWORKS - Hbase/Phoenix - WALEditCodec - missing

I am receiving the following error while trying to run Phoenix on top of HBase:
EXCEPTION #1:
2017-11-07 12:40:12,620 WARN [RS_LOG_REPLAY_OPS-XXX:16020-0]
regionserver.SplitLogWorker: log splitting of
WALs/XXX.XXX.XXX.XXX,16020,1507179047656-
splitting/XXX.XXX.XXX.XXX%2C16020%2C1507179047656.default.1507179049782 failed, returning error
java.io.IOException: Cannot get log reader
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:355)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:267)
at org.apache.hadoop.hbase.wal.WALSplitter.getReader(WALSplitter.java:839)
at org.apache.hadoop.hbase.wal.WALSplitter.getReader(WALSplitter.java:763)
at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:297)
at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:235)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:104)
at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:72)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException: Unable to find org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:36)
at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec.create(WALCellCodec.java:103)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.getCodec(ProtobufLogReader.java:297)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initAfterCompression(ProtobufLogReader.java:307)
at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:82)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:164)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:303)
... 11 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:32)
... 17 more
APPLIED PATCHES #1:
I have applied the following settings through the Ambari web UI for the advanced HBase configs, as specified by the Hortonworks document:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_command-line-upgrade/content/configure-phoenix-25.html
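Since exception #1 bottoms out in a ClassNotFoundException for IndexedWALEditCodec, it may also be worth confirming that the Phoenix server jar, which provides that class, is actually visible to every region server. A hedged check, assuming a standard HDP layout (your paths may differ):
# The codec ships in the Phoenix server jar; each HBase region server
# must be able to load it. Paths below assume HDP defaults.
ls /usr/hdp/current/phoenix-client/phoenix-*-server.jar
ls /usr/hdp/current/hbase-regionserver/lib/ | grep -i phoenix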
EXCEPTION #2
FATAL [RS_LOG_REPLAY_OPS-XXX:16020-1] conf.Configuration: error parsing conf core-site.xml
java.io.FileNotFoundException: /etc/hadoop/2.6.1.0-129/0/core-site.xml (Too many open files)
APPLIED PATCHES #2
I checked each core-site.xml file on every server hosting an HBase region server and made sure it ended with </configuration>, as well as the core-site.xml at the path given in the exception, '/etc/hadoop/2.6.1.0-129/0/core-site.xml'.
I haven't been able to find any other information regarding this issue.
I went into HDFS and deleted all of the WAL-splitting logs using the following command:
hdfs dfs -rm -r /apps/hbase/data/WALs/*splitting*
This resolved exception #1. Keep in mind that, from what I've read, this can incur data loss.
For exception #2, I went back and checked the open-file limits on each server (ulimit -n) and raised them where applicable, per the Hortonworks doc:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_security/content/kerb-config-limits.html
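For reference, a quick sketch of checking and raising the limit; the 32768 value is only illustrative (use whatever the Hortonworks doc recommends for your cluster):
# Check the current open-file limit for the user running the region server:
su - hbase -c 'ulimit -n'
# Raise it persistently via /etc/security/limits.conf (soft and hard limits):
echo 'hbase - nofile 32768' >> /etc/security/limits.conf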

Pig script running on MapReduce but not on Tez

I am using Pig 0.16.0 and Tez 0.9.0. The Pig script runs fine on MapReduce, but not on Tez. I have tried Tez 0.8.3 through 0.8.5, but it still doesn't work. Could this be a version mismatch problem? Please have a look at the logs:
ERROR 2017: Internal error creating job configuration.
org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:137)
at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:78)
at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:198)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:308)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1474)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1459)
at org.apache.pig.PigServer.execute(PigServer.java:1448)
at org.apache.pig.PigServer.executeBatch(PigServer.java:488)
at org.apache.pig.PigServer.executeBatch(PigServer.java:471)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:172)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:235)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:206)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:501)
at org.apache.pig.Main.main(Main.java:176)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.NoSuchMethodException: org.apache.tez.dag.api.DAG.setCallerContext(org.apache.tez.client.CallerContext)
at java.lang.Class.getMethod(Class.java:1786)
at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:128)
... 20 more
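A NoSuchMethodException like this usually means the Tez jars actually on the classpath are older than the ones Pig 0.16.0 was compiled against; DAG.setCallerContext simply does not exist in them. Before swapping versions again, it may help to confirm which tez-api jar Pig is really loading; a hedged sketch, assuming Tez is installed under a hypothetical /usr/lib/tez:
# A stray older tez-api jar here would explain the missing method:
echo "$TEZ_JARS"
ls /usr/lib/tez/tez-api-*.jar
# Verify the method exists in the jar that is on the classpath:
javap -cp /usr/lib/tez/tez-api-0.9.0.jar org.apache.tez.dag.api.DAG | grep setCallerContext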
================================================================================

PIG Unable to Read Local CSV Leading to Job Failure

Relatively new to the Pig/Hadoop ecosystem and encountering a frustrating issue when trying to execute a simple DUMP. I am trying to run the Pig script below (the file is local, not HDFS, so I open the Pig shell using pig -x local).
REGISTER utils.py USING jython AS utils;
events = LOAD '../test/events.csv' USING PigStorage(',') AS (patientid:int, eventid:chararray, eventdesc:chararray, timestamp:chararray, value:float);
events = FOREACH events GENERATE patientid, eventid, ToDate(timestamp, 'yyyy-MM-dd') AS etimestamp, value;
DUMP events;
However, when doing this, I receive the following error messages (failed job summary below, full PIG stack trace at bottom):
Input(s): Failed to read data from "file:///bootcamp/test/events.csv"
Output(s): Failed to produce result in "file:/tmp/temp/305054006/tmp-908064458"
Pig Stack Trace:
ERROR 1066: Unable to open iterator for alias events. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias events. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
at org.apache.pig.PigServer.openIterator(PigServer.java:925)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:746)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:558)
at org.apache.pig.Main.main(Main.java:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.getStats(MapReduceLauncher.java:822)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:452)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:280)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
at org.apache.pig.PigServer.storeEx(PigServer.java:1034)
at org.apache.pig.PigServer.store(PigServer.java:997)
at org.apache.pig.PigServer.openIterator(PigServer.java:910)
... 13 more
Caused by: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
at org.apache.hadoop.mapreduce.Job.ensureState(Job.java:294)
at org.apache.hadoop.mapreduce.Job.getTaskReports(Job.java:540)
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getTaskReports(HadoopShims.java:235)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.getStats(MapReduceLauncher.java:801)
... 20 more
I have seen similar issues regarding failed jobs, but sadly I haven't managed to hunt down a resolution yet.
EDIT: I should mention that I ran into the same issue when following the Pig tutorial at the link below.
http://www.sunlab.org/teaching/cse8803/fall2016/lab/hadoop-pig/
So, I found I was able to DUMP the file by doing the following:
tmp = LIMIT events 100000; -- any int larger than the number of rows
dump tmp;
I had seen a similar issue on here and was able to resolve it by running Pig as root.

Cannot start Hive in terminal

I installed and configured Apache Hive 1.2.1 long ago, and it worked fine. Recently I installed Apache Spark 2.7.0 and started using its shells. Now when I try to work with Hive again, it won't start; it shows the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log(Lorg/slf4j/Marker;Ljava/lang/String;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V
at org.apache.commons.logging.impl.SLF4JLocationAwareLog.debug(SLF4JLocationAwareLog.java:133)
at org.apache.hadoop.hive.common.LogUtils.logConfigLocation(LogUtils.java:147)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jDefault(LogUtils.java:128)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:77)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:58)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:637)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
I tried reinstalling Hive, but the same error persists. Is this error due to installing Spark? How can I run Hive normally again?
It seems that you have a conflict between your logging libraries. This question could help you: java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log
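To pin the conflict down concretely, you can look for competing SLF4J jars pulled in by the different stacks; a hedged sketch, assuming standard HIVE_HOME, SPARK_HOME, and HADOOP_HOME layouts:
# Two different slf4j-api versions (or an old binding) on one
# classpath is the classic cause of this NoSuchMethodError:
find "$HIVE_HOME/lib" "$SPARK_HOME/jars" "$HADOOP_HOME" -name 'slf4j*.jar' 2>/dev/null
Removing, or version-aligning, the older jar so that only one slf4j-api version ends up on Hive's classpath is typically what resolves it.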
