We're running Hive on Hadoop leveraging Google Cloud Storage using bdutil version 1.3.2. One of the options that we installed is TEZ.
AMBARI_SERVICES='FALCON FLUME GANGLIA HBASE HDFS KAFKA KERBEROS MAPREDUCE2 NAGIOS OOZIE SLIDER SQOOP STORM TEZ YARN ZOOKEEPER KNOX HIVE PIG'
When running a query using the mapreduce query engine, everything runs as expected:
set hive.execution.engine=mr;
But, if I change the execution engine to tez:
set hive.execution.engine=tez;
I get a long stacktrace, but the error is:
Status: Failed
Vertex failed, vertexName=Map 4, vertexId=vertex_1443128249582_0182_1_00, diagnostics=[Vertex vertex_1443128249582_0182_1_00 [Map 4] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: t1 initializer failed, vertex=vertex_1443128249582_0182_1_00 [Map 4], java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
I'm assuming that means that the gcs connector jar isn't in the right path, so I've added a link to the jar to the below paths, but still get the same error:
/usr/hdp/2.2.6.0-2800/hadoop/lib/gcs-connector-1.4.2-hadoop2.jar
/usr/hdp/2.2.6.0-2800/tez/lib/gcs-connector-1.4.2-hadoop2.jar
/usr/hdp/2.2.6.0-2800/tez/gcs-connector-1.4.2-hadoop2.jar
/usr/lib/hadoop/lib/gcs-connector-1.4.2-hadoop2.jar
Any thoughts on what needs changed to enable tez?
Thanks for your help.
The full stack trace is:
Status: Failed
Vertex failed, vertexName=Map 4, vertexId=vertex_1443128249582_0182_1_00, diagnostics=[Vertex vertex_1443128249582_0182_1_00 [Map 4] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: t1 initializer failed, vertex=vertex_1443128249582_0182_1_00 [Map 4], java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2076)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2601)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2614)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2635)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:130)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1982)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
... 24 more
]
Vertex failed, vertexName=Map 3, vertexId=vertex_1443128249582_0182_1_01, diagnostics=[Vertex vertex_1443128249582_0182_1_01 [Map 3] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: t2 initializer failed, vertex=vertex_1443128249582_0182_1_01 [Map 3], java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2076)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2601)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2614)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2635)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:130)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1982)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
... 24 more
]
Vertex killed, vertexName=Map 1, vertexId=vertex_1443128249582_0182_1_02, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1443128249582_0182_1_02 [Map 1] killed/failed due to:null]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1443128249582_0182_1_03, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1443128249582_0182_1_03 [Reducer 2] killed/failed due to:null]
DAG failed due to vertex failure. failedVertices:2 killedVertices:2
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
Related
I have a Hive Table in ORC format, where:
My table is partition by: Year + Month + Day.
Total file size in HDFS: 10TB.
Total records: 21 Billion records.
Total data nodes in HDP: 8
I used Zeppelin notebook to query the table using jdbc(hive)
SELECT `timestamp`, url FROM events where id='f9e43fc7b' ORDER BY `timestamp` DESC
However, the query runs into below error:
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
Vertex failed, vertexName=Map 1, vertexId=vertex_1591402457216_0009_2_01, diagnostics=[Vertex vertex_1591402457216_0009_2_01 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE,
Vertex Input: evt initializer failed, vertex=vertex_1591402457216_0009_2_01 [Map 1], java.lang.OutOfMemoryError: Java heap space
at java.util.regex.Matcher.<init>(Matcher.java:225)
at java.util.regex.Pattern.matcher(Pattern.java:1093)
at org.apache.hadoop.hive.ql.io.AcidUtils$BucketMetaData.parse(AcidUtils.java:318)
at org.apache.hadoop.hive.ql.io.AcidUtils$BucketMetaData.parse(AcidUtils.java:332)
at org.apache.hadoop.hive.ql.io.AcidUtils.parseBucketId(AcidUtils.java:367)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.determineSplitStrategy(OrcInputFormat.java:2331)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.determineSplitStrategies(OrcInputFormat.java:2306)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1811)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1939)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:522)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:777)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1591402457216_0009_2_02, diagnostics=[Vertex received Kill in INITED state.,
Vertex vertex_1591402457216_0009_2_02 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:401)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:266)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:718)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:801)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:633)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I'm using HDP-3.0.1.0 with Ambari. Here's my current Hive config:
Tez Container Size: 3072 MB
HiveServer2 Heap Size: 4096 MB
Memory: 819.2 MB
Data per Reducer: 2042.9 MB
They are mostly the default values. Any suggestion on which to tweak for optimum performance?
I have a very strange behavior with my cluster, I am running Delete/Update statement in Hive cli which sometime works fine and some time fails below is the Command. Table is ACID enabled.
hive> delete from temptable where name='Jose';
Exception is as follows:
Status: Running (Executing on YARN cluster with App id application_15217_3223)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 FAILED -1 0 0 -1 0 0
Reducer 2 KILLED 2 0 0 2 0 0
--------------------------------------------------------------------------------
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 0.23 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_15217_3223_1_00, diagnostics=[Vertex vertex_15217_3223_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: offset initializer failed, vertex=vertex_1514368279217_3223_1_00 [Map 1], java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1273)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1300)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 1
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1268)
... 15 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSargColumnNames(OrcInputFormat.java:358)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setSearchArgument(OrcInputFormat.java:392)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1011)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2000(OrcInputFormat.java:838)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:992)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:989)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:989)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:838)
... 4 more
]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_15217_3223_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_15217_3223_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_15217_3223_1_00, diagnostics=[Vertex vertex_15217_3223_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: offset initializer failed, vertex=vertex_15217_3223_1_00 [Map 1], java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1273)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1300)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 1
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1268)
... 15 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSargColumnNames(OrcInputFormat.java:358)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setSearchArgument(OrcInputFormat.java:392)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1011)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2000(OrcInputFormat.java:838)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:992)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:989)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:989)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:838)
... 4 more
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_15217_3223_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_15217_3223_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
Unable to find any Solution.
What is the cause, Do I need to enable something or do I need to verify any config or missing something in the cli?
You can try increasing the memory that is being used to run the query. Could be it is running out of memory allocation before it can finish the query. Also try using Tez, I find it normally runs faster... explanation between the two can be found here
https://community.hortonworks.com/questions/83394/difference-between-mr-and-tez.html
try setting these values(increase the memory based off of need) before you run your query
SET hive.tez.container.size=4096;
SET hive.tez.java.opts='-Xmx2000m';
SET hive.execution.engine=tez;
or these two if you're set on using mapreduce
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
Set execution engine to MR and try. I will work as expected.
I have a basic setup of Ambari 2.5.3 and HDP 2.6.3 and tried to run some simple queries below. I don't understand why it failed. Can you help?
[root#demo demo]# beeline
Beeline version 1.2.1000.2.6.3.0-235 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000/default hive hive
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Apache Hive (version 1.2.1000.2.6.3.0-235)
Driver: Hive JDBC (version 1.2.1000.2.6.3.0-235)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000/default> create table test2 (id int, desc varchar(40));
No rows affected (1.216 seconds)
0: jdbc:hive2://localhost:10000/default> insert into table test2 values (1,"aa"),(2,"bb");
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: insert into table test2 ...(1,"aa"),(2,"bb")(Stage-1)
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1514250829950_0001_1_00, diagnostics=[Vertex vertex_1514250829950_0001_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: values__tmp__table__1 initializer failed, vertex=vertex_1514250829950_0001_1_00 [Map 1], java.lang.NullPointerException
at org.apache.hadoop.mapred.FileInputFormat.getBlockIndex(FileInputFormat.java:388)
at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:579)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:359)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:311)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:413)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1514250829950_0001_1_00, diagnostics=[Vertex vertex_1514250829950_0001_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: values__tmp__table__1 initializer failed, vertex=vertex_1514250829950_0001_1_00 [Map 1], java.lang.NullPointerException
at org.apache.hadoop.mapred.FileInputFormat.getBlockIndex(FileInputFormat.java:388)
at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:579)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:359)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:311)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:413)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)
UPDATE 1
This is what I have in Hive config
I have faced the same issue, just change the execution engine to MR and it will work.
NPE could be related to mapred.FileInputFormat. The inputformat name matters.
Try setting org.apache.hadoop.hive.ql.io.HiveInputFormat for hive.input.format.
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
Hello I got this error when I tried to start a select count(*) in Hive.
Vertex failed, vertexName=Map 1, vertexId=vertex_1468250226607_1410_1_00, diagnostics=[Task failed, taskId=task_1468250226607_1410_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.tez.dag.api.TezUncheckedException: Expected length: 8417166 actual length: 9223372036854775675
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 8417166 actual length: 9223372036854775675
at org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:128)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:177)
at org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:136)
at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:643)
at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
... 14 more
I dont understand at all what tez means by Expected length and actual length.
We use hive 1.2.1, yarn 2.7.1, hadoop 2.7.1.2.3.2.0-2950
We used to run the below mentioned query in hive and it was working fine but after hadoop 2.1 upgrade it fails.
insert overwrite table x
Select a.emp_id,a.emp_name,
coalesce(b.salary, 0.0),
coalesce(b.work_days, 0.0),
coalesce(c.previous_work_days, 0.0)
from a left outer join b
on (a.emp_id = b.emp_id)
left outer join x
on (a.emp_id = x.emp_id);
I found the issue was like reading from the same table(x) and inserting into the same(x) causes the issue. Does anyone has faced the same issue.
error trace
Status: Failed
Vertex re-running, vertexName=Map 2, vertexId=vertex_1412109137269_13777_1_01
Vertex re-running, vertexName=Map 1, vertexId=vertex_1412109137269_13777_1_02
Vertex re-running, vertexName=Map 2, vertexId=vertex_1412109137269_13777_1_01
Vertex re-running, vertexName=Map 2, vertexId=vertex_1412109137269_13777_1_01
Vertex failed, vertexName=Map 3, vertexId=vertex_1412109137269_13777_1_00, diagnostics=[Task failed, taskId=task_1412109137269_13777_1_00_000015, diagnostics=[AttemptID:attempt_1412109137269_13777_1_00_000015_0 Info:AttemptID:attempt_1412109137269_13777_1_00_000015_0 Timed out after 300 secs
Container released by application, AttemptID:attempt_1412109137269_13777_1_00_000015_1 Info: Containercontainer_1412109137269_13777_01_000077 received a STOP_REQUEST
Container released by application, AttemptID:attempt_1412109137269_13777_1_00_000015_2 Info: Containercontainer_1412109137269_13777_01_000100 received a STOP_REQUEST
Container released by application, AttemptID:attempt_1412109137269_13777_1_00_000015_3 Info:Error: exceptionThrown=java.lang.IllegalStateException
at java.util.ArrayList$Itr.remove(ArrayList.java:844)
at org.apache.tez.runtime.library.shuffle.common.impl.ShuffleManager.constructFetcherForHost(ShuffleManager.java:322)
at org.apache.tez.runtime.library.shuffle.common.impl.ShuffleManager.access$1300(ShuffleManager.java:80)
at org.apache.tez.runtime.library.shuffle.common.impl.ShuffleManager$RunShuffleCallable.call(ShuffleManager.java:267)
at org.apache.tez.runtime.library.shuffle.common.impl.ShuffleManager$RunShuffleCallable.call(ShuffleManager.java:220)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
, errorMessage=Shuffle Scheduler Failed], Vertex failed as one or more tasks failed. failedTasks:1]
DAG failed due to vertex failure. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
Shutting down tez session.