Too many counter groups while storing Hive partitioned table as parquet - hadoop

I have created a table sample with id as its partition column and stored it in Parquet format.
CREATE TABLE sample (
    uuid String, date String, Name String, EmailID String, Comments String,
    CompanyName String, country String, url String, keyword String, source String
) PARTITIONED BY (id String) STORED AS parquet;
Then I inserted values into it using the command below:
INSERT INTO TABLE sample PARTITION (id)
SELECT uuid, date, Name, EmailID, Comments, CompanyName, country, url, keyword, source, id
FROM inter
DISTRIBUTE BY id;
This query fails with the following error:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counter groups: 51 max=50
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1613)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.mapreduce.counters.LimitExceededException: org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counter groups: 51 max=50
at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:97)
at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounterImpl(AbstractCounterGroup.java:123)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:113)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:130)
at org.apache.hadoop.mapred.Counters$Group.findCounter(Counters.java:369)
at org.apache.hadoop.mapred.Counters$Group.getCounterForName(Counters.java:314)
at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:479)
at org.apache.hadoop.mapred.Counters.incrCounter(Counters.java:544)
at org.apache.hadoop.mapred.Task$TaskReporter.incrCounter(Task.java:679)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper$ReportStats.func(ExecMapper.java:261)
at org.apache.hadoop.hive.ql.exec.Operator.preorderMap(Operator.java:850)
at org.apache.hadoop.hive.ql.exec.Operator.preorderMap(Operator.java:853)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:289)
... 7 more
Caused by: org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counter groups: 51 max=50
at org.apache.hadoop.mapreduce.counters.Limits.checkGroups(Limits.java:118)
at org.apache.hadoop.mapreduce.counters.AbstractCounters.getGroup(AbstractCounters.java:230)
at org.apache.hadoop.mapred.Counters.getGroup(Counters.java:113)
at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:479)
at org.apache.hadoop.mapred.Counters.incrCounter(Counters.java:544)
at org.apache.hadoop.mapred.Task$TaskReporter.incrCounter(Task.java:679)
at org.apache.hadoop.hive.ql.stats.CounterStatsPublisher.publishStat(CounterStatsPublisher.java:54)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1167)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1017)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
... 7 more
Container killed by the ApplicationMaster. Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
NOTE: the id column has 1 million distinct values.
Can anyone help me with this?

You should raise the counter limits. The error complains specifically about too many counter groups (51 > max 50), so mapreduce.job.counters.groups.max is the key setting; the related limits are:
mapreduce.job.counters.limit=1000
mapreduce.job.counters.max=1000
mapreduce.job.counters.groups.max=500
mapreduce.job.counters.group.name.max=1000
mapreduce.job.counters.counter.name.max=500
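For example, a minimal sketch of raising these limits for the session and re-running the insert (assuming the values above are large enough for your partition count; depending on the Hadoop version, the counter limits may only be honoured when set in mapred-site.xml on the cluster rather than per session). The stack trace shows the extra groups being created by CounterStatsPublisher while the FileSinkOperator publishes statistics, which is presumably why a very large number of dynamic partitions exhausts the group limit:

-- raise the MapReduce counter limits before running the dynamic-partition insert
set mapreduce.job.counters.groups.max=500;
set mapreduce.job.counters.max=1000;

INSERT INTO TABLE sample PARTITION (id)
SELECT uuid, date, Name, EmailID, Comments, CompanyName, country, url, keyword, source, id
FROM inter
DISTRIBUTE BY id;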

Related

HQL throws ArrayList cannot be cast to org.apache.hadoop.io.Text

I have a query that fails during the reduce phase; the error thrown is:
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
However, when going deeper into the YARN logs, I was able to find this:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"2020-05-05","reducesinkkey1":10039,"reducesinkkey2":103,"reducesinkkey3":"2020-05-05","reducesinkkey4":10039,"reducesinkkey5":103},"value":{"_col0":103,"_col1":["1","2"]}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"2020-05-05","reducesinkkey1":10039,"reducesinkkey2":103,"reducesinkkey3":"2020-05-05","reducesinkkey4":10039,"reducesinkkey5":103},"value":{"_col0":103,"_col1":["1","2"]}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
... 7 more
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.apache.hadoop.io.Text
The most relevant part being:
java.util.ArrayList cannot be cast to org.apache.hadoop.io.Text
This is the query which I'm executing (FYI: this is a subquery within a bigger query):
SELECT
    yyyy_mm_dd,
    h_id,
    MAX(CASE WHEN rn = 1 THEN prov_id ELSE NULL END) OVER (PARTITION BY yyyy_mm_dd, h_id) AS primary_prov,
    collect_set(api) OVER (PARTITION BY yyyy_mm_dd, h_id, p_id) prov_id_api, -- re-assemble array to include all elements from multiple initial arrays if there are different arrays per prov_id
    prov_id
FROM (
    SELECT -- get "primary prov" (first element in ascending array)
        yyyy_mm_dd,
        h_id,
        prov_id,
        api,
        ROW_NUMBER() OVER (PARTITION BY yyyy_mm_dd, h_id ORDER BY api) rn
    FROM (
        SELECT -- explode array to get data at row level
            t.yyyy_mm_dd,
            t.h_id,
            prov_id,
            collect_set( -- array of integers, use set to remove duplicates
                CASE
                    WHEN e.apis_xml_element = 'res' THEN 1
                    WHEN e.apis_xml_element = 'av' THEN 2
                    ...
                    ...
                    ELSE e.apis_xml_element
                END) AS api
        FROM
            mytable t
            LATERAL VIEW EXPLODE(apis_xml) e AS apis_xml_element
        WHERE
            yyyy_mm_dd = "2020-05-05"
            AND t.apis_xml IS NOT NULL
        GROUP BY
            1, 2, 3
    ) s
) s
I have further narrowed the issue down to the top-level select, as the inner select works fine by itself, which makes me believe the problem is specifically here:
collect_set(api) OVER (partition by yyyy_mm_dd, h_id, prov_id) prov_id_api
However, I'm unsure how to solve it. In the innermost select, apis_xml is an array<string> that holds strings such as 'res' and 'av' up until a certain point in time; after that, integers are used, hence the CASE statement to align them.
Strangely, if I run this via Spark, i.e. spark.sql(above_query), it works. However, on beeline via HQL, the job gets killed.
Remove collect_set in the inner query: it already produces an array, while the outer collect_set should receive scalars. Also remove the GROUP BY in the inner query, because without collect_set there is no aggregation any more. You can use DISTINCT if you need to remove duplicates.
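A minimal sketch of the inner query rewritten along those lines, keeping the question's table and column names and leaving the elided CASE branches as they are:

SELECT DISTINCT -- DISTINCT replaces the GROUP BY + collect_set de-duplication
    t.yyyy_mm_dd,
    t.h_id,
    prov_id,
    CASE
        WHEN e.apis_xml_element = 'res' THEN 1
        WHEN e.apis_xml_element = 'av' THEN 2
        ...
        ...
        ELSE e.apis_xml_element
    END AS api -- a scalar per row, so the outer collect_set(api) OVER (...) now receives scalars
FROM
    mytable t
    LATERAL VIEW EXPLODE(apis_xml) e AS apis_xml_element
WHERE
    yyyy_mm_dd = "2020-05-05"
    AND t.apis_xml IS NOT NULL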

Presto: Cannot INSERT values

I can create and drop tables and run queries normally in Presto, but when I use INSERT it always fails, as shown below:
presto:default> create table test.lll (a int);
CREATE TABLE
presto:default> insert into test.lll select 1;
Query 20180104_091933_00007_k8e78, FAILED, 5 nodes
Splits: 84 total, 30 done (35.71%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20180104_091933_00007_k8e78 failed: No page sink provider for connector 'hive'
What is the reason and how to address it? Any help is appreciated.
Error Type: INTERNAL_ERROR
Error Code: GENERIC_INTERNAL_ERROR (65536)
Full stack trace:
java.lang.IllegalArgumentException: No page sink provider for connector 'hive'
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:191)
at com.facebook.presto.split.PageSinkManager.providerFor(PageSinkManager.java:67)
at com.facebook.presto.split.PageSinkManager.createPageSink(PageSinkManager.java:61)
at com.facebook.presto.operator.TableWriterOperator$TableWriterOperatorFactory.createPageSink(TableWriterOperator.java:97)
at com.facebook.presto.operator.TableWriterOperator$TableWriterOperatorFactory.createOperator(TableWriterOperator.java:88)
at com.facebook.presto.operator.DriverFactory.createDriver(DriverFactory.java:92)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunnerFactory.createDriver(SqlTaskExecution.java:515)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunnerFactory.access$1400(SqlTaskExecution.java:490)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:616)
at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
at com.facebook.presto.execution.executor.LegacyPrioritizedSplitRunner.process(LegacyPrioritizedSplitRunner.java:23)
at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:492)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Hive | ArrayIndexOutOfBounds | More than 1000 columns in select query

I am trying to run some group operations (like max, min, avg, count, etc.) on a Hive table with 300 columns, so my SELECT query has more than 1,000 columns and more than 4,000 characters.
The query fails with the error below.
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:217)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1084)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128
at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1042)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1081)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:401)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1007)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1025)
... 14 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -128
at java.util.ArrayList.elementData(ArrayList.java:400)
at java.util.ArrayList.get(ArrayList.java:413)
at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:797)
at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:609)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.toHiveKey(ReduceSinkOperator.java:508)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:394)
... 17 more
I get this error when I run the query from the Hive terminal.
The Hive table contains 300 columns, and when I apply group functions such as count, min, max, distinct, etc. to every column of this table, I hit the above error. The resulting query is huge and has roughly 300 * 6 columns in it (assuming 6 group functions, each applied to every column).
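For illustration, a minimal sketch of the query shape being described (the table and column names here are hypothetical; the real query repeats this pattern for all 300 columns):

SELECT
    MIN(col_1), MAX(col_1), AVG(col_1), COUNT(col_1), COUNT(DISTINCT col_1), SUM(col_1),
    MIN(col_2), MAX(col_2), AVG(col_2), COUNT(col_2), COUNT(DISTINCT col_2), SUM(col_2)
    -- ... repeated for the remaining 298 columns ...
FROM wide_table;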

Hive Runtime Error

I created a Hive table connected to an HBase table and then performed an INSERT query:
CREATE TABLE hbase_test(key int,subject string,predicate string,object string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val1,cf1:val2,cf1:val3")
TBLPROPERTIES ("hbase.table.name" = "hbase_test");
INSERT OVERWRITE TABLE hbase_test select * from hbase_origin;
(hbase_origin is a Hive table with four columns: key int, subject string, predicate string, object string)
When I ran this query, I got a runtime error like the one below.
How can I solve this problem?
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1441611615703_0524_1_00, diagnostics=[Task failed, taskId=task_1441611615703_0524_1_00_000014, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"key":3713292,"subject":"<Department5.University49.edu/AssociateProfessor2/Publication10>","predicate":"<w3.org/1999/02/22-rdf-syntax-ns#type>","object":"<swat.cse.lehigh.edu/onto/univ-bench.owl#Publication>"}
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"key":3713292,"subject":"Department5.University49.edu/AssociateProfessor2/Publication10>","predicate":"w3.org/1999/02/22-rdf-syntax-ns#type>","object":"<swat.cse.lehigh.edu/onto/univ-bench.owl#Publication>"}
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
... 13 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"key":3713292,"subject":"<Department5.University49.edu/AssociateProfessor2/Publication10>","predicate":"w3.org/1999/02/22-rdf-syntax-ns#type>","object":"swat.cse.lehigh.edu/onto/univ-bench.owl#Publication>"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2500 actions: ConnectException: 2500 times,
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:723)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 17 more Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2500 actions: ConnectException: 2500 times,
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:224)
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:204)
at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1597)
at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:1069)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:1041)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:999)
at org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat$MyRecordWriter.write(HiveHBaseTableOutputFormat.java:146)
at org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat$MyRecordWriter.write(HiveHBaseTableOutputFormat.java:117)
at org.apache.hadoop.hive.ql.io.HivePassThroughRecordWriter.write(HivePassThroughRecordWriter.java:40)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:689)
... 23 more

Hive Query having virtual column INPUT__FILE__NAME in where clause gives exception

Executing a Hive query with a filter on the virtual column INPUT__FILE__NAME results in the following exception.
hive> select count(*) from netflow where INPUT__FILE__NAME='vzb.1351794600.0';
FAILED: SemanticException java.lang.RuntimeException: cannot find field input__file__name from [org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField#1d264bf5, org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField#3d44d0c6,
.
.
.
org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField#7e6bc5aa]
This error is different from the one we get when the column name is wrong:
hive> select count(*) from netflow where INPUT__FILE__NAM='vzb.1351794600.0';
FAILED: SemanticException [Error 10004]: Line 1:35 Invalid table alias or column reference 'INPUT__FILE__NAM': (possible column names are: first, last, ....)
But using this virtual column in the SELECT clause works fine:
hive> select INPUT__FILE__NAME from netflow group by INPUT__FILE__NAME;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 4
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201306041359_0006, Tracking URL = http://192.168.0.224:50030/jobdetails.jsp?jobid=job_201306041359_0006
Kill Command = /opt/hadoop/bin/../bin/hadoop job -kill job_201306041359_0006
Hadoop job information for Stage-1: number of mappers: 12; number of reducers: 4
2013-06-14 18:20:10,265 Stage-1 map = 0%, reduce = 0%
2013-06-14 18:20:33,363 Stage-1 map = 8%, reduce = 0%
.
.
.
2013-06-14 18:21:15,554 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201306041359_0006
MapReduce Jobs Launched:
Job 0: Map: 12 Reduce: 4 HDFS Read: 3107826046 HDFS Write: 55 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
hdfs://192.168.0.224:9000/data/jk/vzb/vzb.1351794600.0
Time taken: 78.467 seconds
I am trying to create an external Hive table on data that is already present in HDFS, and there are extra files in the folder that I want to ignore, similar to what is asked and suggested in the following Stack Overflow questions:
how to make hive take only specific files as input from hdfs folder
when creating an external table in hive can I point the location to specific files in a direcotry?
Any help would be appreciated.
The full stack trace I am getting is as follows:
2013-06-14 15:01:32,608 ERROR ql.Driver (SessionState.java:printError(401)) - FAILED: SemanticException java.lang.RuntimeException: cannot find field input__
org.apache.hadoop.hive.ql.parse.SemanticException: java.lang.RuntimeException: cannot find field input__file__name from [org.apache.hadoop.hive.serde2.object
at org.apache.hadoop.hive.ql.optimizer.pcr.PcrOpProcFactory$FilterPCR.process(PcrOpProcFactory.java:122)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
at org.apache.hadoop.hive.ql.optimizer.pcr.PartitionConditionRemover.transform(PartitionConditionRemover.java:86)
at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:102)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8163)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:50)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:755)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field input__file__name from [org.apache.hadoop.hive.ser
at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:231)
at org.apache.hadoop.hive.ql.optimizer.pcr.PcrOpProcFactory$FilterPCR.process(PcrOpProcFactory.java:112)
... 23 more
Caused by: java.lang.RuntimeException: cannot find field input__file__name from [org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyF
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:344)
at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldRef(UnionStructObjectInspector.java:100)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:128)
at org.apache.hadoop.hive.ql.optimizer.ppr.PartExprEvalUtils.prepareExpr(PartExprEvalUtils.java:100)
at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.pruneBySequentialScan(PartitionPruner.java:328)
at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:219)
... 24 more
