Cannot query struct field with Hive (CDH 5.9.0) - hadoop

I just switched to CDH 5.9.0 (a full new install on a new cluster, not an upgrade).
I have a table like this one (mine is a bit more complex, but I can reproduce the issue with this example too):
CREATE TABLE `products`(`header` struct<PCODE:string, PNAME:string>)
PARTITIONED BY (`IMPORT_DATE` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://myhost.com:8020/user/hive/warehouse/dbp/products'
TBLPROPERTIES ('transient_lastDdlTime'='1482160314')
If I do:
SELECT header FROM products;
==> The query succeeds and returns all product headers (in JSON format)
But if I do:
SELECT header.PCODE FROM products;
==> It fails with the following stacktrace:
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:147)
... 22 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61)
at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:53)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:980)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:63)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:431)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126)
... 22 more
On my old cluster (CDH 5.8.2), it works fine.
Any idea?
[EDIT: I downgraded all the CDH 5.9.0 jars (/opt/cloudera/parcels/CDH/jars) to the CDH 5.8.2 ones and the query is successful. There may be a regression in CDH 5.9.0...]
[EDIT 2: If the table is stored as TextFile ('org.apache.hadoop.mapred.TextInputFormat'), the query runs successfully.
This suggests the problem is linked to Parquet.]
[Also posted on Cloudera forum: https://community.cloudera.com/t5/Batch-SQL-Apache-Hive/Can-not-query-struct-field-with-hive-CDH-5-9-0/m-p/48672#U48672 ]
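To illustrate EDIT 2, a TextFile variant of the table could be created like this (the table name products_text is hypothetical, not from the original post):
-- Same struct, stored as text instead of Parquet
CREATE TABLE products_text (`header` struct<PCODE:string, PNAME:string>)
STORED AS TEXTFILE;
-- Per EDIT 2, this succeeds on CDH 5.9.0:
SELECT header.PCODE FROM products_text;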

I fixed this by lowercasing the field names in the query. For example:
SELECT header.pcode FROM products;

So I tried a lot of things and ended up with the following:
-- Struct fieldnames in lowercase
CREATE TABLE `products`(`header` struct<pcode:string, pname:string>) STORED AS PARQUET;
Select results:
SELECT header.pcode FROM products ==> OK
SELECT HEADER.pcode FROM products ==> OK
SELECT header.PCODE FROM products ==> KO
SELECT HEADER.PCODE FROM products ==> KO
-- Struct fieldnames in UPPERCASE
CREATE TABLE `products`(`header` struct<PCODE:string, PNAME:string>) STORED AS PARQUET;
Select results:
SELECT header.pcode FROM products ==> KO
SELECT HEADER.pcode FROM products ==> KO
SELECT header.PCODE FROM products ==> KO
SELECT HEADER.PCODE FROM products ==> KO
==> Avoid UPPERCASE in struct field names for tables stored as PARQUET in CDH 5.9.0 (it worked in CDH 5.8.2)...
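If you already have a table with uppercase struct field names, one possible workaround is to lowercase them in the metadata. This is only a sketch: ALTER TABLE ... CHANGE COLUMN rewrites the column definition in the metastore without touching the Parquet data files, but test it on a copy of the table first.
-- Redefine the struct with lowercase field names; data files are untouched
ALTER TABLE products CHANGE COLUMN header header struct<pcode:string, pname:string>;
-- Lowercase access should then work, per the test matrix above
SELECT header.pcode FROM products;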

Related

HQL throws ArrayList cannot be cast to org.apache.hadoop.io.Text

I have a query which fails when reducing; the error that is thrown is:
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
However, when going deeper into the YARN logs, I was able to find this:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"2020-05-05","reducesinkkey1":10039,"reducesinkkey2":103,"reducesinkkey3":"2020-05-05","reducesinkkey4":10039,"reducesinkkey5":103},"value":{"_col0":103,"_col1":["1","2"]}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"2020-05-05","reducesinkkey1":10039,"reducesinkkey2":103,"reducesinkkey3":"2020-05-05","reducesinkkey4":10039,"reducesinkkey5":103},"value":{"_col0":103,"_col1":["1","2"]}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
... 7 more
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.apache.hadoop.io.Text
The most relevant part being:
java.util.ArrayList cannot be cast to org.apache.hadoop.io.Text
This is the query which I'm executing (FYI: this is a subquery within a bigger query):
SELECT
yyyy_mm_dd,
h_id,
MAX(CASE WHEN rn=1 THEN prov_id ELSE NULL END) OVER (partition by yyyy_mm_dd, h_id) as primary_prov,
collect_set(api) OVER (partition by yyyy_mm_dd, h_id, p_id) prov_id_api, --re-assemble array to include all elements from multiple initial arrays if there are different arrays per prov_id
prov_id
FROM(
SELECT --get "primary prov" (first element in ascending array)
yyyy_mm_dd,
h_id,
prov_id,
api,
ROW_NUMBER() OVER(PARTITION BY yyyy_mm_dd, h_id ORDER BY api) rn
FROM(
SELECT --explode array to get data at row level
t.yyyy_mm_dd,
t.h_id,
prov_id,
collect_set(--array of integers, use set to remove duplicates
CASE
WHEN e.apis_xml_element = 'res' THEN 1
WHEN e.apis_xml_element = 'av' THEN 2
...
...
ELSE e.apis_xml_element
END) as api
FROM
mytable t
LATERAL VIEW EXPLODE(apis_xml) e AS apis_xml_element
WHERE
yyyy_mm_dd = "2020-05-05"
AND t.apis_xml IS NOT NULL
GROUP BY
1,2,3
)s
)s
I have further narrowed the issue down to the top-level select, as the inner select works fine by itself, which makes me believe the issue is happening here specifically:
collect_set(api) OVER (partition by yyyy_mm_dd, h_id, prov_id) prov_id_api
However, I'm unsure how to solve it. In the innermost select, apis_xml is an array<string> which held strings such as 'res' and 'av' up until a point; then integers were used, hence the CASE statement to align these.
Strangely, if I run this via Spark, i.e. spark.sql(above_query), it works. However, on beeline via HQL, the job gets killed.
Remove the collect_set in the inner query, because it already produces an array; the outer collect_set should receive scalars. Also remove the GROUP BY in the inner query, because without collect_set there is no aggregation any more. You can use DISTINCT if you need to remove duplicates.
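Applied to the query above, the innermost select would become something like this (a sketch that follows the advice above; not tested against the asker's data):
SELECT DISTINCT -- DISTINCT replaces the removed GROUP BY for de-duplication
t.yyyy_mm_dd,
t.h_id,
prov_id,
CASE -- now yields one scalar per exploded row, so the outer collect_set(api) receives scalars
WHEN e.apis_xml_element = 'res' THEN 1
WHEN e.apis_xml_element = 'av' THEN 2
-- ... remaining WHEN branches as in the original
ELSE e.apis_xml_element
END as api
FROM mytable t
LATERAL VIEW EXPLODE(apis_xml) e AS apis_xml_element
WHERE yyyy_mm_dd = "2020-05-05"
AND t.apis_xml IS NOT NULL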

Hive | ArrayIndexOutOfBounds | More than 1000 columns in select query

I am trying to run some group operations (like max, min, avg, count, etc.) on a Hive table with 300 columns, so my select query has more than 1000 columns and more than 4000 characters.
The select query is failing with the issue below.
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:217)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1084)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128
at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1042)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1081)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:401)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1007)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1025)
... 14 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -128
at java.util.ArrayList.elementData(ArrayList.java:400)
at java.util.ArrayList.get(ArrayList.java:413)
at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:797)
at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:609)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.toHiveKey(ReduceSinkOperator.java:508)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:394)
... 17 more
I get this error when I run the query in the Hive terminal.
There is a table in Hive which contains 300 columns, and when I apply group functions like count, min, max, distinct, etc. to all of its columns, I face the above error. The Hive query for this is huge: with 6 group functions applied to each and every column, it has 300*6 = 1800 expressions in it.
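For context, the generated query has roughly this shape (col1 ... col300 and my_wide_table are illustrative names, not the real schema):
SELECT
max(col1), min(col1), avg(col1), count(col1), count(DISTINCT col1), sum(col1),
max(col2), min(col2), avg(col2), count(col2), count(DISTINCT col2), sum(col2),
-- ... the same 6 aggregates repeated for each of the 300 columns ...
max(col300), min(col300), avg(col300), count(col300), count(DISTINCT col300), sum(col300)
FROM my_wide_table;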

Hive query error java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException

Hello, I am executing a Hive query:
CREATE TABLE temp_session_orgid as
SELECT
sorgid.property_num, sorgid.visitid, sorgid.fullvisitorid, sorgid.adate, sorgid.hits_customvariables_customvarvalue as orgid
FROM
(
SELECT
*,
row_number() over (partition by property_num, visitid, fullvisitorid, adate order by hitsid) as rn
FROM bt_hits_custom_vars
WHERE hits_customvariables_customvarname = 'orgId'
) sorgid
WHERE
sorgid.rn = 1
;
Hive: 2.1.1
EMR: 5.3.1
I am getting the following error:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
java.nio.channels.ClosedChannelException
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:785)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:126)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:373)
at org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:122)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:348)
... 17 more
Caused by: java.nio.channels.ClosedChannelException
at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1546)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:104)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:60)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:87)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:751)
... 27 more
], TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) :
attempt_1501196537592_0020_2_01_000000_3:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row (tag=0)
{"key":{"reducesinkkey0":"89634781","reducesinkkey1":"1442844353","reducesinkkey2":"5186210141339993001","reducesinkkey3":"20150921","reducesinkkey4":"1"},"value":{"_col1":"CUSTOMER"}}
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
What is the cause of this error, and what is the solution to resolve it?
Use the settings below in Hive:
set hive.auto.convert.join=false;
set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;
set hive.merge.cardinality.check=false;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=500;
set hive.auto.convert.join.noconditionaltask = true;
set hive.auto.convert.join.noconditionaltask.size = 10000000;
set hive.auto.convert.sortmerge.join=true;
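Note that these set commands are session-scoped: run them in the same Hive/beeline session, immediately before the failing CREATE TABLE ... AS SELECT. To make any of them permanent, put them in hive-site.xml instead.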

Too many counter groups while storing Hive partitioned table as parquet

I have created a table sample with id as its partition and stored it in parquet format.
create table sample(uuid String,date String,Name String,EmailID String,Comments String,CompanyName String,country String,url String,keyword String,source String) PARTITIONED BY (id String) Stored as parquet;
Then I inserted values into it using the command below:
INSERT INTO TABLE sample PARTITION (id) Select uuid,date,Name,EmailID,Comments,CompanyName,country,url,keyword,source,id from inter distribute by id;
This query results in the following issue:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counter groups: 51 max=50
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1613)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.mapreduce.counters.LimitExceededException: org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counter groups: 51 max=50
at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:97)
at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounterImpl(AbstractCounterGroup.java:123)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:113)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:130)
at org.apache.hadoop.mapred.Counters$Group.findCounter(Counters.java:369)
at org.apache.hadoop.mapred.Counters$Group.getCounterForName(Counters.java:314)
at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:479)
at org.apache.hadoop.mapred.Counters.incrCounter(Counters.java:544)
at org.apache.hadoop.mapred.Task$TaskReporter.incrCounter(Task.java:679)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper$ReportStats.func(ExecMapper.java:261)
at org.apache.hadoop.hive.ql.exec.Operator.preorderMap(Operator.java:850)
at org.apache.hadoop.hive.ql.exec.Operator.preorderMap(Operator.java:853)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:289)
... 7 more
Caused by: org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counter groups: 51 max=50
at org.apache.hadoop.mapreduce.counters.Limits.checkGroups(Limits.java:118)
at org.apache.hadoop.mapreduce.counters.AbstractCounters.getGroup(AbstractCounters.java:230)
at org.apache.hadoop.mapred.Counters.getGroup(Counters.java:113)
at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:479)
at org.apache.hadoop.mapred.Counters.incrCounter(Counters.java:544)
at org.apache.hadoop.mapred.Task$TaskReporter.incrCounter(Task.java:679)
at org.apache.hadoop.hive.ql.stats.CounterStatsPublisher.publishStat(CounterStatsPublisher.java:54)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1167)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1017)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
... 7 more
Container killed by the ApplicationMaster. Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
NOTE: the id column has 1 million distinct values.
Can anyone help me with this?
You should raise the counter limits, for example:
mapreduce.job.counters.limit=1000
mapreduce.job.counters.max=1000
mapreduce.job.counters.groups.max=500
mapreduce.job.counters.group.name.max=1000
mapreduce.job.counters.counter.name.max=500
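These can be tried as set commands in the Hive session before re-running the insert, though on some Hadoop versions the counter limits are enforced cluster-wide and must go into mapred-site.xml (followed by a service restart) to take effect:
set mapreduce.job.counters.max=1000;
set mapreduce.job.counters.groups.max=500;
INSERT INTO TABLE sample PARTITION (id)
SELECT uuid,date,Name,EmailID,Comments,CompanyName,country,url,keyword,source,id
FROM inter DISTRIBUTE BY id;
Since the stack trace shows the extra counters coming from CounterStatsPublisher, disabling automatic stats collection with set hive.stats.autogather=false; is another commonly used workaround when writing many dynamic partitions.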

Hive Runtime Error

After I created a Hive table that is connected to an HBase table, I performed an INSERT query such as:
CREATE TABLE hbase_test(key int,subject string,predicate string,object string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val1,cf1:val2,cf1:val3")
TBLPROPERTIES ("hbase.table.name" = "hbase_test");
INSERT OVERWRITE TABLE hbase_test select * from hbase_origin;
(hbase_origin is a Hive table which has four columns: key int, subject string, predicate string, object string)
When I run this query, I get a runtime error like the one below.
How can I solve this problem?
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1441611615703_0524_1_00, diagnostics=[Task failed, taskId=task_1441611615703_0524_1_00_000014, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"key":3713292,"subject":"<Department5.University49.edu/AssociateProfessor2/Publication10>","predicate":"<w3.org/1999/02/22-rdf-syntax-ns#type>","object":"<swat.cse.lehigh.edu/onto/univ-bench.owl#Publication>"}
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"key":3713292,"subject":"<Department5.University49.edu/AssociateProfessor2/Publication10>","predicate":"<w3.org/1999/02/22-rdf-syntax-ns#type>","object":"<swat.cse.lehigh.edu/onto/univ-bench.owl#Publication>"}
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"key":3713292,"subject":"<Department5.University49.edu/AssociateProfessor2/Publication10>","predicate":"<w3.org/1999/02/22-rdf-syntax-ns#type>","object":"<swat.cse.lehigh.edu/onto/univ-bench.owl#Publication>"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2500 actions: ConnectException: 2500 times,
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:723)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 17 more
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2500 actions: ConnectException: 2500 times,
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:224)
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:204)
at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1597)
at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:1069)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:1041)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:999)
at org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat$MyRecordWriter.write(HiveHBaseTableOutputFormat.java:146)
at org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat$MyRecordWriter.write(HiveHBaseTableOutputFormat.java:117)
at org.apache.hadoop.hive.ql.io.HivePassThroughRecordWriter.write(HivePassThroughRecordWriter.java:40)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:689)
... 23 more
