Hive Runtime Error - hadoop

I created a Hive table mapped to an HBase table and then ran the following INSERT query:
CREATE TABLE hbase_test(key int,subject string,predicate string,object string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val1,cf1:val2,cf1:val3")
TBLPROPERTIES ("hbase.table.name" = "hbase_test");
INSERT OVERWRITE TABLE hbase_test select * from hbase_origin;
(hbase_origin is a Hive table with four columns: key int, subject string, predicate string, object string.)
When I run this query, I get a runtime error like the one below.
How can I solve this problem?
Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1441611615703_0524_1_00, diagnostics=[Task failed, taskId=task_1441611615703_0524_1_00_000014, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"key":3713292,"subject":"<Department5.University49.edu/AssociateProfessor2/Publication10>","predicate":"<w3.org/1999/02/22-rdf-syntax-ns#type>","object":"<swat.cse.lehigh.edu/onto/univ-bench.owl#Publication>"}
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"key":3713292,"subject":"Department5.University49.edu/AssociateProfessor2/Publication10>","predicate":"w3.org/1999/02/22-rdf-syntax-ns#type>","object":"<swat.cse.lehigh.edu/onto/univ-bench.owl#Publication>"}
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
... 13 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"key":3713292,"subject":"<Department5.University49.edu/AssociateProfessor2/Publication10>","predicate":"w3.org/1999/02/22-rdf-syntax-ns#type>","object":"swat.cse.lehigh.edu/onto/univ-bench.owl#Publication>"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2500 actions: ConnectException: 2500 times,
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:723)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 17 more Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2500 actions: ConnectException: 2500 times,
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:224)
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:204)
at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1597)
at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:1069)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:1041)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:999)
at org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat$MyRecordWriter.write(HiveHBaseTableOutputFormat.java:146)
at org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat$MyRecordWriter.write(HiveHBaseTableOutputFormat.java:117)
at org.apache.hadoop.hive.ql.io.HivePassThroughRecordWriter.write(HivePassThroughRecordWriter.java:40)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:689)
... 23 more
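The innermost Caused by is RetriesExhaustedWithDetailsException: Failed 2500 actions: ConnectException, i.e. every put issued by the Tez tasks failed to open a connection to an HBase region server. Before anything else, verify that HBase is up and that the worker nodes can reach it; a frequent cause is the HBase client defaulting to localhost because the ZooKeeper quorum is not visible to the Hive job. A minimal sketch, with placeholder hostnames:
-- Placeholder hosts; replace with your actual ZooKeeper quorum.
set hbase.zookeeper.quorum=zk1.example.com,zk2.example.com,zk3.example.com;
set hbase.zookeeper.property.clientPort=2181;
-- then re-run the insert
INSERT OVERWRITE TABLE hbase_test SELECT * FROM hbase_origin;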

Related

Hive | ArrayIndexOutOfBounds | More than 1000 columns in select query

I am trying to run group operations (max, min, avg, count, etc.) on a Hive table with 300 columns, so my SELECT query has more than 1000 columns and more than 4000 characters.
The query fails with the error below.
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:217)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1084)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128
at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1042)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1081)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:401)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1007)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1025)
... 14 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -128
at java.util.ArrayList.elementData(ArrayList.java:400)
at java.util.ArrayList.get(ArrayList.java:413)
at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:797)
at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:609)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.toHiveKey(ReduceSinkOperator.java:508)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:394)
... 17 more
I get this error when I run the query in the Hive terminal.
The table has 300 columns, and the query applies group functions such as count, min, max, and distinct to every column. With six group functions applied to each of the 300 columns, the query is huge and contains 300 × 6 = 1800 expressions.
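The interesting frame is BinarySortableSerDe.serialize called from ReduceSinkOperator.toHiveKey: the failing index is -128, which is what a signed byte wraps to past 127, so the error is consistent with the reduce key carrying more than 127 serialized columns. One hedged workaround, sketched below with hypothetical names (wide_table, col1, col2), is to split the aggregates across several smaller queries so each stays well under that count:
-- A sketch with hypothetical table and column names; each query
-- stays well under the column count that appears to trigger the
-- byte overflow in the reduce key.
CREATE TABLE agg_part1 AS
SELECT max(col1) AS col1_max, min(col1) AS col1_min,
       avg(col1) AS col1_avg, count(col1) AS col1_cnt
FROM wide_table;

CREATE TABLE agg_part2 AS
SELECT max(col2) AS col2_max, min(col2) AS col2_min,
       avg(col2) AS col2_avg, count(col2) AS col2_cnt
FROM wide_table;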

Hive query error java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException

Hello, I am executing the following Hive query:
CREATE TABLE temp_session_orgid as
SELECT
sorgid.property_num, sorgid.visitid, sorgid.fullvisitorid, sorgid.adate, sorgid.hits_customvariables_customvarvalue as orgid
FROM
(
SELECT
*,
row_number() over (partition by property_num, visitid, fullvisitorid, adate order by hitsid) as rn
FROM bt_hits_custom_vars
WHERE hits_customvariables_customvarname = 'orgId'
) sorgid
WHERE
sorgid.rn = 1
;
Hive: 2.1.1
EMR: 5.3.1
I am getting the following error:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
java.nio.channels.ClosedChannelException
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:785)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:126)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:373)
at org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:122)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:348)
... 17 more
Caused by: java.nio.channels.ClosedChannelException
at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1546)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:104)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:60)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:87)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:751)
... 27 more
], TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) :
attempt_1501196537592_0020_2_01_000000_3:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row (tag=0)
{"key":{"reducesinkkey0":"89634781","reducesinkkey1":"1442844353","reducesinkkey2":"5186210141339993001","reducesinkkey3":"20150921","reducesinkkey4":"1"},"value":{"_col1":"CUSTOMER"}}
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
What is the cause of this error, and how can I resolve it?
Use the following settings in Hive:
set hive.auto.convert.join=false;
set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;
set hive.merge.cardinality.check=false;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=500;
set hive.auto.convert.join.noconditionaltask = true;
set hive.auto.convert.join.noconditionaltask.size = 10000000;
set hive.auto.convert.sortmerge.join=true;
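Beyond those settings, note that java.nio.channels.ClosedChannelException on the HDFS output stream is usually a secondary symptom: the container was already being torn down (often after exceeding its memory limit) when FileSinkOperator tried to write. If the settings above do not help, a common next step is to give the Tez tasks more memory. A sketch with example values only:
-- Example values only; size these to your cluster.
set hive.tez.container.size=4096;   -- Tez task container size in MB
set hive.tez.java.opts=-Xmx3276m;   -- JVM heap, roughly 80% of the container size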

Can not query struct field with hive (CDH 5.9.0)

I just switched to CDH 5.9.0 (a completely new install on a new cluster, not an upgrade).
I have a table like this one (mine is a bit more complex, but I can reproduce the problem with this example too):
CREATE TABLE `products`(`header` struct<PCODE:string, PNAME:string>)
PARTITIONED BY (`IMPORT_DATE` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://myhost.com:8020/user/hive/warehouse/dbp/products'
TBLPROPERTIES ('transient_lastDdlTime'='1482160314')
If I do:
SELECT header FROM products;
==> The query is successful and returns all product headers (in JSON format).
But if I do:
SELECT header.PCODE FROM products;
==> It fails with the following stack trace:
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:147)
... 22 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61)
at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:53)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:980)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:63)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:431)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126)
... 22 more
On my old cluster (CDH 5.8.2), it works fine.
Any idea?
[EDIT: I downgraded all CDH 5.9.0 jars (/opt/cloudera/parcels/CDH/jars) to the CDH 5.8.2 ones and the query is successful. There may be a regression in CDH 5.9.0...]
[EDIT 2: If the table is stored as TextFile ('org.apache.hadoop.mapred.TextInputFormat'), the query runs successfully.
This suggests the problem is linked to Parquet.]
[Also posted on Cloudera forum: https://community.cloudera.com/t5/Batch-SQL-Apache-Hive/Can-not-query-struct-field-with-hive-CDH-5-9-0/m-p/48672#U48672 ]
I fixed this by lowercasing the field names in the query, e.g.:
SELECT header.pcode FROM products;
I tried a lot of things and ended up with the following:
-- Struct fieldnames in lowercase
CREATE TABLE `products`(`header` struct<pcode:string, pname:string>) STORED AS PARQUET;
Select results:
SELECT header.pcode FROM products ==> OK
SELECT HEADER.pcode FROM products ==> OK
SELECT header.PCODE FROM products ==> KO
SELECT HEADER.PCODE FROM products ==> KO
-- Struct fieldnames in UPPERCASE
CREATE TABLE `products`(`header` struct<PCODE:string, PNAME:string>) STORED AS PARQUET;
Select results:
SELECT header.pcode FROM products ==> KO
SELECT HEADER.pcode FROM products ==> KO
SELECT header.PCODE FROM products ==> KO
SELECT HEADER.PCODE FROM products ==> KO
==> Avoid UPPERCASE in struct fieldnames with tables stored as PARQUET in CDH 5.9.0 (it worked in CDH 5.8.2)...

Too many counter groups while storing Hive partitioned table as parquet

I created a table sample partitioned by id and stored it in Parquet format:
create table sample(uuid String,date String,Name String,EmailID String,Comments String,CompanyName String,country String,url String,keyword String,source String) PARTITIONED BY (id String) Stored as parquet;
Then I inserted values into it using the command below:
INSERT INTO TABLE sample PARTITION (id) Select uuid,date,Name,EmailID,Comments,CompanyName,country,url,keyword,source,id from inter distribute by id;
This query results in the following error:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counter groups: 51 max=50
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1613)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.mapreduce.counters.LimitExceededException: org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counter groups: 51 max=50
at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:97)
at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounterImpl(AbstractCounterGroup.java:123)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:113)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:130)
at org.apache.hadoop.mapred.Counters$Group.findCounter(Counters.java:369)
at org.apache.hadoop.mapred.Counters$Group.getCounterForName(Counters.java:314)
at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:479)
at org.apache.hadoop.mapred.Counters.incrCounter(Counters.java:544)
at org.apache.hadoop.mapred.Task$TaskReporter.incrCounter(Task.java:679)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper$ReportStats.func(ExecMapper.java:261)
at org.apache.hadoop.hive.ql.exec.Operator.preorderMap(Operator.java:850)
at org.apache.hadoop.hive.ql.exec.Operator.preorderMap(Operator.java:853)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:289)
... 7 more
Caused by: org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counter groups: 51 max=50
at org.apache.hadoop.mapreduce.counters.Limits.checkGroups(Limits.java:118)
at org.apache.hadoop.mapreduce.counters.AbstractCounters.getGroup(AbstractCounters.java:230)
at org.apache.hadoop.mapred.Counters.getGroup(Counters.java:113)
at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:479)
at org.apache.hadoop.mapred.Counters.incrCounter(Counters.java:544)
at org.apache.hadoop.mapred.Task$TaskReporter.incrCounter(Task.java:679)
at org.apache.hadoop.hive.ql.stats.CounterStatsPublisher.publishStat(CounterStatsPublisher.java:54)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1167)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1017)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
... 7 more
Container killed by the ApplicationMaster. Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
NOTE: the id column has 1 million distinct values.
Can anyone help me with this?
You should raise the counter limits, for example:
mapreduce.job.counters.limit=1000
mapreduce.job.counters.max=1000
mapreduce.job.counters.groups.max=500
mapreduce.job.counters.group.name.max=1000
mapreduce.job.counters.counter.name.max=500
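Alternatively, note where the counters come from: the trace goes through CounterStatsPublisher.publishStat, Hive's counter-based statistics publisher, which allocates counter groups per partition being written. With a million dynamic partitions, no limit will be high enough. A sketch that sidesteps the limit at the source, assuming a Hive version where the filesystem stats collector is available:
-- Publish table/partition statistics via the filesystem instead of
-- MapReduce counters, so the partition count no longer drives the
-- number of counter groups.
set hive.stats.dbclass=fs;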

Hive Query having virtual column INPUT__FILE__NAME in where clause gives exception

Executing a Hive query with a filter on the virtual column INPUT__FILE__NAME results in the following exception:
hive> select count(*) from netflow where INPUT__FILE__NAME='vzb.1351794600.0';
FAILED: SemanticException java.lang.RuntimeException: cannot find field input__file__name from [org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField#1d264bf5, org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField#3d44d0c6,
.
.
.
org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField#7e6bc5aa]
This error is different from the one we get when the column name is wrong:
hive> select count(*) from netflow where INPUT__FILE__NAM='vzb.1351794600.0';
FAILED: SemanticException [Error 10004]: Line 1:35 Invalid table alias or column reference 'INPUT__FILE__NAM': (possible column names are: first, last, ....)
But using this virtual column in the SELECT clause works fine:
hive> select INPUT__FILE__NAME from netflow group by INPUT__FILE__NAME;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 4
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201306041359_0006, Tracking URL = http://192.168.0.224:50030/jobdetails.jsp?jobid=job_201306041359_0006
Kill Command = /opt/hadoop/bin/../bin/hadoop job -kill job_201306041359_0006
Hadoop job information for Stage-1: number of mappers: 12; number of reducers: 4
2013-06-14 18:20:10,265 Stage-1 map = 0%, reduce = 0%
2013-06-14 18:20:33,363 Stage-1 map = 8%, reduce = 0%
.
.
.
2013-06-14 18:21:15,554 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201306041359_0006
MapReduce Jobs Launched:
Job 0: Map: 12 Reduce: 4 HDFS Read: 3107826046 HDFS Write: 55 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
hdfs://192.168.0.224:9000/data/jk/vzb/vzb.1351794600.0
Time taken: 78.467 seconds
I am trying to create an external Hive table on data already present in HDFS, and the folder contains extra files that I want to ignore, similar to what is asked and suggested in the following Stack Overflow questions:
how to make hive take only specific files as input from hdfs folder
when creating an external table in hive can I point the location to specific files in a direcotry?
Any help would be appreciated.
The full stack trace I am getting is as follows:
2013-06-14 15:01:32,608 ERROR ql.Driver (SessionState.java:printError(401)) - FAILED: SemanticException java.lang.RuntimeException: cannot find field input__
org.apache.hadoop.hive.ql.parse.SemanticException: java.lang.RuntimeException: cannot find field input__file__name from [org.apache.hadoop.hive.serde2.object
at org.apache.hadoop.hive.ql.optimizer.pcr.PcrOpProcFactory$FilterPCR.process(PcrOpProcFactory.java:122)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
at org.apache.hadoop.hive.ql.optimizer.pcr.PartitionConditionRemover.transform(PartitionConditionRemover.java:86)
at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:102)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8163)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:50)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:755)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field input__file__name from [org.apache.hadoop.hive.ser
at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:231)
at org.apache.hadoop.hive.ql.optimizer.pcr.PcrOpProcFactory$FilterPCR.process(PcrOpProcFactory.java:112)
... 23 more
Caused by: java.lang.RuntimeException: cannot find field input__file__name from [org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyF
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:344)
at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldRef(UnionStructObjectInspector.java:100)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:128)
at org.apache.hadoop.hive.ql.optimizer.ppr.PartExprEvalUtils.prepareExpr(PartExprEvalUtils.java:100)
at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.pruneBySequentialScan(PartitionPruner.java:328)
at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:219)
... 24 more
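The trace shows the failure happens at compile time in the partition machinery (PcrOpProcFactory, PartitionPruner): the optimizer tries to evaluate the WHERE clause against the partition schema, where the virtual column does not exist. One hedged, untested workaround sketch is to materialize INPUT__FILE__NAME in a subquery so the outer filter never reaches the partition pruner (LIKE is used because, as the SELECT output above shows, the virtual column holds the full HDFS URI, not just the file name):
-- A sketch: select the virtual column in a subquery, then filter
-- on the subquery's ordinary output column.
SELECT count(*)
FROM (SELECT INPUT__FILE__NAME AS fname FROM netflow) t
WHERE t.fname LIKE '%vzb.1351794600.0';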
