Hive insert query with dynamic partitioning - hadoop

I have an ORC-formatted, partitioned, and clustered Hive table into which I have to insert data from a temp table.
Create table statements:
orc table:
CREATE EXTERNAL TABLE DYN_TGT_TABLE(ID INT, USRNAME VARCHAR(20), LD TIMESTAMP)
PARTITIONED BY (PART VARCHAR(20))
CLUSTERED BY (id) INTO 1024 BUCKETS
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION '/tmp/dyn_tgt_table'
TBLPROPERTIES ('transactional'='true','transient_lastDdlTime'='1461157247');
temp table:
CREATE EXTERNAL TABLE DYN_TEMP_TABLE(ID INT, USRNAME VARCHAR(20), PART VARCHAR(20), LD TIMESTAMP)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/tmp/dyn_temp_table'
TBLPROPERTIES ('transactional'='TRUE','transient_lastDdlTime'='1461216088');
I added 5 records to the temp table:
hive> select * from DYN_TEMP_TABLE;
OK
1 user1 A 2016-05-06 06:31:48
2 user2 A 2016-05-06 06:31:48
14 user3 B 2016-05-06 06:31:48
15 user4 B 2016-05-06 06:31:48
16 user5 B 2016-05-06 06:31:48
Time taken: 0.166 seconds, Fetched: 5 row(s)
The dynamic-partition insert query below fails:
insert into TABLE DYN_TGT_TABLE PARTITION(part) select id,usrname,ld,part from DYN_TEMP_TABLE;
Error message:
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0)
{"key":{},"value":{"_col0":15,"_col1":"user4","_col2":"2016-05-06 06:31:48","_col3":"B"}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":15,"_col1":"user4","_col2":"2016-05-06 06:31:48","_col3":"B"}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
... 7 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:755)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:683)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
Please help me identify the cause of this error.
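For reference, a dynamic-partition insert into a bucketed, transactional ORC table like this one normally needs a handful of session settings. This is only a sketch of a typical configuration; all of these are standard Hive properties, but whether each is required depends on the Hive version, and they are not necessarily the settings that were in effect when the error occurred:
-- transactions (the target table is transactional)
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
-- dynamic partitioning
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- bucketing; sorting rows by partition/bucket before the FileSinkOperator
-- is often needed for bucketed dynamic-partition inserts
SET hive.enforce.bucketing=true;
SET hive.optimize.sort.dynamic.partition=true;

insert into table DYN_TGT_TABLE partition(part)
select id, usrname, ld, part from DYN_TEMP_TABLE;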

Related

Issue in sum function for ORC hive table

I have a Hive table stored as ORC that has a bigint column col1 and many other columns.
A few sample values of col1:
3180231637038089849
3185739118697487865
3196698142218730052
3262542509863274723
3180231637038089849
3262542509863274723
3180231637038089849
I need to calculate the sum of the col1 values. Since the sum will exceed the bigint maximum (9,223,372,036,854,775,807, about 9.2 x 10^18, while even the seven sample values above already add up to roughly 2.2 x 10^19), I am casting to decimal(38,0):
select sum(cast(col1 as decimal (38,0))) from sample_table;
Exception:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:314)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:219)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:202)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:354)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:354)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:198)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:184)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:239)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
]
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:314)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:219)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:202)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:354)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:354)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:198)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:184)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:239)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
]
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:766)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
... 7 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:314)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:219)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:202)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:98)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:587)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:851)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761)
... 8 more
Table schema:
CREATE TABLE `sample-table`(
`col2` bigint,
`col1` bigint)
CLUSTERED BY (
col2)
INTO 2 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://mycluster:8020/apps/hive/warehouse/testdb.db/sanple-table'
TBLPROPERTIES (
'last_modified_by'='devender',
'last_modified_time'='1521526039',
'numFiles'='1',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='49939',
'transient_lastDdlTime'='1521526106')
There are around 130 columns there.
The query works fine if I create a textfile copy of the table and run it against that table:
create table sample_table_text stored as textfile as select * from sample_table;
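That is, the equivalent aggregate against the textfile copy completes without the exception:
select sum(cast(col1 as decimal(38,0))) from sample_table_text;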

HIVE - SQL insert record

In Hive SQL, I ran the same query on two different Cloudera versions. Cloudera VM 5.10 does not cause any issue, but another version, CDH-5.1.0-1.cdh5.1.0.p0.53, throws an error.
hive> select * from t;
OK
Time taken: 1.803 seconds
hive> insert into table t values (1);
NoViableAltException(26@[])
at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:713)
at org.apache.hadoop.hive.ql.parse.HiveParser.selectClause(HiveParser.java:35992)
at org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:33510)
at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:33389)
at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:33169)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1284)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:983)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:352)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:995)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1038)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:20 cannot recognize input near 'values' '(' '1' in select clause
hive>
Any idea? Which version should I choose for my studies? Please advise.
In older versions of CDH such as CDH 5.1, INSERT INTO ... VALUES is not supported, but in newer versions of CDH it is a supported feature.
So instead of INSERT INTO ... VALUES, try a LOAD DATA statement.
If your file is on the local filesystem:
hive> LOAD DATA LOCAL INPATH '<local-path-tofile>' INTO TABLE t;
If your file is in HDFS:
hive> LOAD DATA INPATH 'hdfs_file_or_directory_path' INTO TABLE t;
For more details refer to:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations
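As a concrete sketch for the single-column table t (the file name and its contents below are made up for illustration):
$ echo "1" > /tmp/t_data.txt        # one value per line, matching the single int column
hive> LOAD DATA LOCAL INPATH '/tmp/t_data.txt' INTO TABLE t;
hive> select * from t;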
What you are trying is correct; just make sure your table has only one column. If you have multiple columns, you need to supply values for all the fields, for example:
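For instance, with a hypothetical two-column table (not the table t from the question):
create table t2 (id int, name string);
insert into table t2 values (1, 'user1');   -- a value supplied for every column
-- insert into table t2 values (1);         -- would fail: no value for name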
The version I tried it on:
[cloudera@quickstart ~]$ hadoop version
Hadoop 2.6.0-cdh5.5.0
Subversion http://github.com/cloudera/hadoop -r fd21232cef7b8c1f536965897ce20f50b83ee7b2
Compiled by jenkins on 2015-11-09T20:37Z
Compiled with protoc 2.5.0
From source with checksum 98e07176d1787150a6a9c087627562c
This command was run using /usr/jars/hadoop-common-2.6.0-cdh5.5.0.jar
@roh
hive> describe formatted t;
OK
# col_name data_type comment
id int None
# Detailed Table Information
Database: default
Owner: learnhadoop
CreateTime: Fri Feb 23 21:25:15 IST 2018
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://hadoopmasters:8020/user/hive/warehouse/t
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1519401315
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.111 seconds, Fetched: 26 row(s)
hive>

Hive : Cannot INSERT OVERWRITE TABLE from unpartitioned External Table into a new partitioned table

In summary this is what I did:
Original data -> SELECT and save filtered data in HDFS -> create an External table using the file saved in HDFS -> populate an empty table using the External table.
Looking at the exception, it seems this has something to do with the OUTPUT types of the two tables.
In detail:
1) I have "table_log" table with lots of data (in Database A) with the following structure (with 3 partitions) :
CREATE TABLE `table_log`(
`e_id` string,
`member_id` string,
.
.
PARTITIONED BY (
`dt` string,
`service_type` string,
`event_type` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
COLLECTION ITEMS TERMINATED BY '\u0002'
MAP KEYS TERMINATED BY '\u0003'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
2) I filtered the data by (dt, service_type, event_type) and saved the result in HDFS as follows:
INSERT OVERWRITE DIRECTORY '/user/atscale/filterd-ratlog' SELECT * FROM table_log WHERE dt >= '2016-05-01' AND dt <='2016-05-31' AND service_type='xxxx_jp' AND event_type='vv';
3) Then I created an external table (table_log_filtered_ext) (in database B) over the above result.
Note that this table has no partitions.
DROP TABLE IF EXISTS table_log_filtered_ext;
CREATE EXTERNAL TABLE `table_log_filtered_ext`(
`e_id` string,
`member_id` string,
.
.
dt string,
service_type string,
event_type string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
COLLECTION ITEMS TERMINATED BY '\u0002'
MAP KEYS TERMINATED BY '\u0003'
LOCATION '/user/atscale/filterd-ratlog'
4) I created another new table (table_log_filtered), similar to the "table_log" structure (with the 3 partition columns):
CREATE TABLE `table_log_filtered` (
`e_id` string,
`member_id` string,
.
.
PARTITIONED BY (
`dt` string,
`service_type` string,
`event_type` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
COLLECTION ITEMS TERMINATED BY '\u0002'
MAP KEYS TERMINATED BY '\u0003'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
5) Now I want to populate the "table_log_filtered" table (with the 3 partition columns, as in "table_log") with the data from the external table "table_log_filtered_ext":
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.execution.engine=tez;
INSERT OVERWRITE TABLE table_log_filtered PARTITION(dt, service_type, event_type)
SELECT * FROM table_log_filtered_ext;
But I get this java.lang.ClassCastException.
Looking at the exception, it has something to do with the OUTPUT types of the two tables.
Any tips?
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":
.
.
.
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292)
... 16 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:81)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:54)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361)
... 17 more
Just in case anyone else bumps into this issue: the fix was, as @Samson Scharfrichter mentioned, to specify STORED AS ORC for table_log_filtered:
CREATE TABLE `table_log_filtered` (
`e_id` string,
`member_id` string,
.
.
PARTITIONED BY (
`dt` string,
`service_type` string,
`event_type` string)
STORED AS ORC
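The likely reason: the earlier DDL only overrode INPUTFORMAT/OUTPUTFORMAT, so the table kept the default LazySimpleSerDe, which serializes rows as Text; OrcOutputFormat cannot cast those Text rows to OrcSerdeRow, which matches the ClassCastException above. STORED AS ORC sets the OrcSerde together with the ORC input/output formats. With the table recreated this way, the insert from step 5 should then run unchanged:
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.execution.engine=tez;
INSERT OVERWRITE TABLE table_log_filtered PARTITION(dt, service_type, event_type)
SELECT * FROM table_log_filtered_ext;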

unable to create partitions in hive

I am unable to create partitions in a new table from a table that already exists in Hive.
The query I am running in Hive after creating the table is:
INSERT INTO TABLE ba_data.PNR_INFO1_partitioned PARTITION(pnr_create_dt) select * from pnr_info1_external;
The error I am getting is:
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hive/warehouse/ba_data.db/pnr_info1_partitioned/.hive-staging_hive_2016-08-09_17-47-47_508_8688474345886508021-1/_task_tmp.-ext-10002/pnr_create_dt=18%2F12%2F2013/_tmp.000000_3 could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1532)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1349)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
After searching around, I found that the namenode and datanode folders need to be deleted and the namenode reformatted. I have done that cleanup as well, but I still get the same error.
I have also set the replication factor to 1, and all the Hadoop processes are running fine.
Please suggest how I can proceed to get past this issue. Your suggestions are much appreciated.
First you need to:
1. create a table with all the fields
2. load the data into that table
3. create a table with the partition column and its type
4. copy the data from the first table into the partitioned table
I think dynamic partitioning needs to be enabled. The following works:
set hive.exec.dynamic.partition.mode=nonstrict;
create table parttable (id int) partitioned by (partcolumn string)
row format delimited fields terminated by '\t'
lines terminated by '\n'
;
create table source_table (id int,partcolumn string)
row format delimited fields terminated by '\t'
lines terminated by '\n'
;
insert into source_table values (1,'Chicago');
insert into source_table values (2,'Chicago');
insert into source_table values (3,'Orlando');
set hive.exec.dynamic.partition=true;
insert overwrite table parttable partition(partcolumn)
select id,partcolumn from source_table;
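To confirm the partitions were actually created for the example above, a quick check:
show partitions parttable;
-- expected: partcolumn=Chicago and partcolumn=Orlando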

Negative Array Size Exception while inserting into Hive Bucketed Table

I am trying to insert into a bucketed, sorted Hive table and am stuck with a NegativeArraySizeException thrown by the reducer. Please find the stack trace below.
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NegativeArraySizeException
at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:305)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:295)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:514)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
My table DDL is below (only showing a subset of columns for readability; the actual DDL has 100 columns):
CREATE TABLE clustered_sorted_orc( conv_type string,
multi_dim_id int,
multi_key_id int,
advertiser_id bigint,
buy_id bigint,
day timestamp)
PARTITIONED BY(job_instance_id int)
CLUSTERED BY(conv_type) SORTED BY (day) INTO 8 BUCKETS
STORED AS ORC;
The insert statement is:
FROM not_clustered_orc
INSERT OVERWRITE TABLE clustered_sorted_orc PARTITION(job_instance_id)
SELECT conv_type ,multi_dim_id ,multi_key_id ,advertiser_id,buy_id ,day, job_instance_id
The following Hive properties are set:
set hive.enforce.bucketing = true;
set hive.exec.dynamic.partition.mode=nonstrict;
This is a log snippet from MergeManagerImpl which shows ioSortFactor, mergeThreshold, etc., in case it helps.
2016-06-30 05:57:20,518 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: MergerManager: memoryLimit=12828540928, maxSingleShuffleLimit=3207135232, mergeThreshold=8466837504, ioSortFactor=64, memToMemMergeOutputsThreshold=64
I am using CDH 5.7.1, Hive 1.1.0, and Hadoop 2.6.0. Has anyone faced a similar issue before? Any help is really appreciated.
I got it working after setting
hive.optimize.sort.dynamic.partition=true
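In other words, roughly (combining it with the properties already set in the question; exact behavior may vary by Hive version):
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.enforce.bucketing=true;
set hive.optimize.sort.dynamic.partition=true;

FROM not_clustered_orc
INSERT OVERWRITE TABLE clustered_sorted_orc PARTITION(job_instance_id)
SELECT conv_type, multi_dim_id, multi_key_id, advertiser_id, buy_id, day, job_instance_id;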
