Can't create HIVE table with 'ROW FORMAT SERDE'

I am trying to create a Hive table with a custom SerDe, but it always fails.
My table creation command:
CREATE TABLE products_info_raw(
id STRING,
name STRING,
reseller STRING,
category STRING,
price BIGINT,
discount FLOAT,
profit_percent FLOAT
)
PARTITIONED BY (
rptg_dt STRING
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
I added the jar:
ADD jar /Users/<user>/Development/Hadoop/projects/e-commerce/hive-json-serde.jar;
which contains the necessary JsonSerde class:
META-INF/
META-INF/MANIFEST.MF
org/
org/apache/
org/apache/hadoop/
org/apache/hadoop/hive/
org/apache/hadoop/hive/contrib/
org/apache/hadoop/hive/contrib/serde2/
org/json/
org/apache/hadoop/hive/contrib/serde2/JsonSerde.class
org/apache/hadoop/hive/contrib/serde2/NewJson.class
org/json/CDL.class
org/json/Cookie.class
org/json/CookieList.class
org/json/HTTP.class
org/json/HTTPTokener.class
org/json/JSONArray.class
org/json/JSONException.class
org/json/JSONML.class
org/json/JSONObject$1.class
org/json/JSONObject$Null.class
org/json/JSONObject.class
org/json/JSONString.class
org/json/JSONStringer.class
org/json/JSONTokener.class
org/json/JSONWriter.class
org/json/Test$1Obj.class
org/json/Test.class
org/json/XML.class
org/json/XMLTokener.class
But I always get the error below:
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.serde2.SerDe
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
I am using HIVE 3.1.0.
Please help.
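Note that the class named in the trace is `org.apache.hadoop.hive.serde2.SerDe`, not `JsonSerde` itself, and the jar listing above has no entry for it. The sketch below checks a jar listing for a given class. The function names `class_to_entry` and `listing_has_class` are made up for illustration, and the sample listing is a shortened copy of the one in the question. As far as I know, Hive 3.x no longer ships that legacy `SerDe` interface, so a SerDe compiled against it could fail in exactly this way, but treat that as an assumption to verify against your installation.

```shell
#!/bin/sh
# Convert a fully qualified class name to the entry path used inside a jar.
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Read a jar listing on stdin; succeed if it contains the given class.
listing_has_class() {
    grep -qxF "$(class_to_entry "$1")"
}

# Shortened copy of the listing from the question (illustrative only).
listing='org/apache/hadoop/hive/contrib/serde2/JsonSerde.class
org/apache/hadoop/hive/contrib/serde2/NewJson.class
org/json/JSONObject.class'

for cls in org.apache.hadoop.hive.contrib.serde2.JsonSerde \
           org.apache.hadoop.hive.serde2.SerDe; do
    if printf '%s\n' "$listing" | listing_has_class "$cls"; then
        echo "$cls: present"
    else
        echo "$cls: missing"
    fi
done
```

Against a real jar, the same check could be fed from `jar tf hive-json-serde.jar | listing_has_class org.apache.hadoop.hive.serde2.SerDe`, and repeated for the jars in Hive's own lib directory to see whether any of them provides the interface.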

Related

Invalid Data in Hive-Created Table?

I'm using Hive version 3.1.3 on Hadoop 3.3.4 with Tez 0.9.2. I'm trying to run a SELECT statement on a table that Hive created and manages. The query never finishes and eventually fails. The full error message is below, but this appears to be the relevant portion:
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector
It looks like the error is a long to decimal conversion issue. However, this table was created by Hive, loading/transforming data in a previous step. Wouldn't Hive have thrown an error earlier if it was inserting an invalid value into a decimal column?
I used the exact same codebase and the exact same data on AWS EMR and didn't get this error, so I don't think there's an invalid value. But I'm stuck on where to go from here.
Here's the table definition:
claimid varchar(50)
claimlineid int
dos date
dosto date
member varchar(50)
provider varchar(50)
setname varchar(255)
code varchar(50)
system varchar(255)
primary int
positivenegative int
result decimal(10,2)
supply int
size decimal(10,2)
quantity decimal(10,2)
And here's the full error message:
Vertex failed, vertexName=Map 1, vertexId=vertex_1667735849290_0030_32_15, diagnostics=[Task failed, taskId=task_1667735849290_0030_32_15_000009, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1667735849290_0030_32_15_000009_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:488)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException:
DeserializeRead detail: Reading byte[] of length 4096 at start offset 4 for length 100 to read 14 fields with types [varchar(50), int, date, date, varchar(50), varchar(50), varchar(255), varchar(50), varchar(255), int, decimal(10,2), int, decimal(10,2), decimal(10,2)]. Read field #14 at field start position 0 current read offset 104
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:611)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.closeOp(VectorMapJoinGenerateResultOperator.java:681)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:757)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:477)
... 17 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException:
DeserializeRead detail: Reading byte[] of length 4096 at start offset 4 for length 100 to read 14 fields with types [varchar(50), int, date, date, varchar(50), varchar(50), varchar(255), varchar(50), varchar(255), int, decimal(10,2), int, decimal(10,2), decimal(10,2)]. Read field #14 at field start position 0 current read offset 104
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.reProcessBigTable(VectorMapJoinGenerateResultOperator.java:609)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.continueProcess(MapJoinOperator.java:671)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:604)
... 21 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
DeserializeRead detail: Reading byte[] of length 4096 at start offset 4 for length 100 to read 14 fields with types [varchar(50), int, date, date, varchar(50), varchar(50), varchar(255), varchar(50), varchar(255), int, decimal(10,2), int, decimal(10,2), decimal(10,2)]. Read field #14 at field start position 0 current read offset 104
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.reProcessBigTable(VectorMapJoinGenerateResultOperator.java:589)
... 23 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector
at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storePrimitiveRowColumn(VectorDeserializeRow.java:687)
at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(VectorDeserializeRow.java:934)
at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(VectorDeserializeRow.java:1360)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.reProcessBigTable(VectorMapJoinGenerateResultOperator.java:585)
... 23 more

Unfortunately, this is a problem with the cost-based optimizer (CBO). You can disable it for the session, rerun the query, and get the result:
set hive.cbo.enable=false;

Getting error while creating hive table using "hive -e" but not in hive shell

I am trying to create a Hive table on top of an HBase table, using the following query:
create external table MaprDB_batch_info_table (Batch_ID string, BatchParserJobId string, count string, CurrentRunTime string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,BatchInfo:BatchParserJobId,BatchInfo:count,BatchInfo:CurrentRunTime") TBLPROPERTIES ('hbase.table.name' = '/user/all/batchinfo');
This command executes successfully in the Hive shell, but when I try to run the same statement from the bash shell:
hive -e "create external table MaprDB_batch_info_table (Batch_ID string, BatchParserJobId string, count string, CurrentRunTime string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,BatchInfo:BatchParserJobId,BatchInfo:count,BatchInfo:CurrentRunTime") TBLPROPERTIES ('hbase.table.name' = '/user/all/batchinfo');
I get the error below:
NoViableAltException(26#[])
at org.apache.hadoop.hive.ql.parse.HiveParser.tablePropertiesList(HiveParser.java:34375)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableProperties(HiveParser.java:34243)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableFileFormat(HiveParser.java:35913)
at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:5380)
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2640)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1650)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1109)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:397)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:309)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1146)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1194)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1083)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1073)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:473 cannot recognize input near 'hbase' '.' 'columns' in table properties list'
Any help in fixing this would be appreciated.

Replace the double quotes (") inside the query with single quotes ('):
...('hbase.columns.mapping'=':key,BatchInfo:BatchParserJobId,BatchInfo:count,BatchInfo:CurrentRunTime')...
Also, there is an issue with the value given to 'hbase.table.name': replace the path with the actual table name.
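The statement most likely works in the interactive shell but not under `hive -e "..."` because of shell quoting: inside a double-quoted bash argument, each inner `"` closes the string, so the quotes around `hbase.columns.mapping` never reach Hive. That matches the parser complaining near `'hbase' '.' 'columns'`. The sketch below shows an alternative that leaves the statement unchanged by emitting it through a quoted here-document. The function name `emit_ddl` is hypothetical and the DDL is shortened for illustration.

```shell
#!/bin/sh
# Emit the DDL through a quoted here-document: because 'EOF' is quoted, the
# shell performs no expansion or quote removal, so both quote styles survive.
emit_ddl() {
    cat <<'EOF'
CREATE EXTERNAL TABLE MaprDB_batch_info_table (Batch_ID string, BatchParserJobId string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,BatchInfo:BatchParserJobId")
TBLPROPERTIES ('hbase.table.name' = '/user/all/batchinfo');
EOF
}

# What the shell produces when similar text sits inside double quotes:
# the inner double quotes are consumed by the shell itself.
broken="WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key")"

echo "heredoc:       $(emit_ddl | grep -o '"hbase.columns.mapping"')"
echo "double-quoted: $broken"
```

On a machine with the Hive CLI, the output of `emit_ddl` could be written to a file (for example `emit_ddl > create_table.hql`, a hypothetical file name) and run with `hive -f create_table.hql`, which avoids nesting quotes on the command line altogether.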

Hive : Cannot INSERT OVERWRITE TABLE from unpartitioned External Table into a new partitioned table

In summary this is what I did:
Original data -> SELECT and save filtered data in HDFS -> create an External table using the file saved in HDFS -> populate an empty table using the External table.
Looking at the exception, this seems to have something to do with the output types of the two tables.
In detail:
1) I have a "table_log" table with lots of data (in Database A) with the following structure (3 partition columns):
CREATE TABLE `table_log`(
`e_id` string,
`member_id` string,
.
.
PARTITIONED BY (
`dt` string,
`service_type` string,
`event_type` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
COLLECTION ITEMS TERMINATED BY '\u0002'
MAP KEYS TERMINATED BY '\u0003'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
2) I filtered the data by (dt, service_type, event_type) and saved the result in HDFS as follows:
INSERT OVERWRITE DIRECTORY '/user/atscale/filterd-ratlog' SELECT * FROM rat_log WHERE dt >= '2016-05-01' AND dt <='2016-05-31' AND service_type='xxxx_jp' AND event_type='vv';
3) Then I created an external table (table_log_filtered_ext) (in Database B) over the result above.
Note that this table has no partitions.
DROP TABLE IF EXISTS table_log_filtered_ext;
CREATE EXTERNAL TABLE `table_log_filtered_ext`(
`e_id` string,
`member_id` string,
.
.
dt string,
service_type string,
event_type string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
COLLECTION ITEMS TERMINATED BY '\u0002'
MAP KEYS TERMINATED BY '\u0003'
LOCATION '/user/atscale/filterd-ratlog'
4) I created another new table (table_log_filtered) with the same structure as "table_log" (3 partition columns):
CREATE TABLE `table_log_filtered` (
`e_id` string,
`member_id` string,
.
.
PARTITIONED BY (
`dt` string,
`service_type` string,
`event_type` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
COLLECTION ITEMS TERMINATED BY '\u0002'
MAP KEYS TERMINATED BY '\u0003'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
5) Now I wanted to populate the "table_log_filtered" table (with 3 partition columns, as in "table_log") with the data from the external table "table_log_filtered_ext":
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.execution.engine=tez;
INSERT OVERWRITE TABLE rat_log_filtered PARTITION(dt, service_type, event_type)
SELECT * FROM table_log_filtered_ext;
But I get the java.lang.ClassCastException below.
Looking at the exception, this seems to have something to do with the output types of the two tables.
Any tips?
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":
.
.
.
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292)
... 16 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:81)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:54)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361)
... 17 more

In case anyone else bumps into this issue: the fix, as @Samson Scharfrichter mentioned, was to specify STORED AS ORC for table_log_filtered:
CREATE TABLE `table_log_filtered` (
`e_id` string,
`member_id` string,
.
.
PARTITIONED BY (
`dt` string,
`service_type` string,
`event_type` string)
STORED AS ORC

Negative Array Size Exception while inserting into Hive Bucketed Table

I am trying to insert into a Hive bucketed, sorted table and am stuck on a NegativeArraySizeException thrown by the reducer. The stack trace is below.
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NegativeArraySizeException
at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:305)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:295)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:514)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
And my table DDL is below (only a subset of columns is shown for readability; the actual DDL has 100 columns):
CREATE TABLE clustered_sorted_orc(
conv_type string,
multi_dim_id int,
multi_key_id int,
advertiser_id bigint,
buy_id bigint,
day timestamp
)
PARTITIONED BY(job_instance_id int)
CLUSTERED BY(conv_type) SORTED BY (day) INTO 8 BUCKETS
STORED AS ORC;
The insert statement is:
FROM not_clustered_orc
INSERT OVERWRITE TABLE clustered_sorted_orc PARTITION(job_instance_id)
SELECT conv_type ,multi_dim_id ,multi_key_id ,advertiser_id,buy_id ,day, job_instance_id
The following Hive properties are set:
set hive.enforce.bucketing = true;
set hive.exec.dynamic.partition.mode=nonstrict;
This is a log snippet from MergeManagerImpl showing ioSortFactor, mergeThreshold, etc., in case it helps:
2016-06-30 05:57:20,518 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: MergerManager: memoryLimit=12828540928, maxSingleShuffleLimit=3207135232, mergeThreshold=8466837504, ioSortFactor=64, memToMemMergeOutputsThreshold=64
I am using CDH 5.7.1, Hive 1.1.0, and Hadoop 2.6.0. Has anyone faced a similar issue before? Any help is really appreciated.

I got it working after setting:
hive.optimize.sort.dynamic.partition=true

data reload from one table to another in hive

I am loading data from one table into another in Hive, where the properties of the new table differ from those of the original.
While loading, I run into the issue below. Any help fixing it would be appreciated.
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"mdse_item_i":671841,"co_loc_i":146,"persh_expr_d":"2014-05-01","greg_d":"2013-06-17","persh_oh_q":16.0,"crte_btch_i":765,"updt_btch_i":765,"range_n":"ITEM_LOC_DAY_PERSH_OH_INV_2013-04-01_2013-07-31"}
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"mdse_item_i":671841,"
My old table definition:
hive> describe nonclickstream.ITEM_LOC_DAY_PERSH_OH_INV;
OK
mdse_item_i int
co_loc_i int
persh_expr_d string
greg_d string
persh_oh_q double
crte_btch_i int
updt_btch_i int
range_n string
Time taken: 0.058 seconds
My new table definition is below:
hive> describe ITEM_LOC_DAY_PERSH_OH_INV;
OK
mdse_item_i int from deserializer
co_loc_i int from deserializer
persh_expr_d string from deserializer
greg_d string from deserializer
persh_oh_q string from deserializer
crte_btch_i int from deserializer
updt_btch_i int from deserializer
greg_date string
Time taken: 0.241 seconds
The new one is created with an Avro schema:
CREATE external TABLE ITEM_LOC_DAY_PERSH_OH_INV
partitioned by (greg_date string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
Location '/common/TD/INV_new/ITEM_LOC_DAY_PERSH_OH_INV/'
TBLPROPERTIES (
'avro.schema.url'='hdfs:///common/TD/INV_new/ITEM_LOC_DAY_PERSH_OH_INV/ITEM_LOC_DAY_PERSH_OH_INV.avs');
The load command we are using:
INSERT INTO TABLE ITEM_LOC_DAY_PERSH_OH_INV PARTITION (greg_date)
SELECT
mdse_item_i,
co_loc_i,
persh_expr_d,
greg_d,
persh_oh_q,
crte_btch_i,
updt_btch_i,
greg_d FROM nonclickstream.ITEM_LOC_DAY_PERSH_OH_INV where range_n='ITEM_LOC_DAY_PERSH_OH_INV_2013-04-01_2013-07-31';
We are using dynamic partitioning while loading.
What we are actually trying to do is repartition the table on a different column, and we modified the schema at the same time.
The same approach worked for other tables; only this table shows the issue.
