I am trying to create a table in Hive from a .txt file using a shell script.
My t_cols.txt has data as below:
id string, name string, city string, lpd timestamp
I want to create a Hive table whose columns come from this text file.
This is what my shell script looks like:
table_cols=`cat t_cols.txt`
hive --hiveconf t_name=${table_cols} -e 'create table leap_frog_snapshot.LINKED_OBJ_TRACKING (\${hiveconf:t_name}) stored as orc tblproperties ("orc.compress"="SNAPPY");'
Somehow this is not working; I am getting the error below:
Logging initialized using configuration in file:/etc/hive/2.4.3.0-227/0/hive-log4j.properties
NoViableAltException(307#[])
at org.apache.hadoop.hive.ql.parse.HiveParser.type(HiveParser.java:38618)
at org.apache.hadoop.hive.ql.parse.HiveParser.colType(HiveParser.java:38375)
at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(HiveParser.java:38059)
at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeList(HiveParser.java:36183)
at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:5222)
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2648)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1658)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1117)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:432)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:316)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1202)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1250)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1139)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1129)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:168)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:379)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:314)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:428)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:717)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:60 cannot recognize input near ')' 'stored' 'as' in column type
Am I missing something?
If this is not the right thing to do, what is the correct way to achieve this?
This is because the expansion of ${table_cols} is unquoted, so the shell splits it at the first space and only the first word reaches --hiveconf. Quote the expansion (and drop the backslash before ${hiveconf:t_name}).
The following should work:
#!/bin/bash
hive --hiveconf t_name="`cat t_cols.txt`" -e 'create table leap_frog_snapshot.LINKED_OBJ_TRACKING (${hiveconf:t_name}) stored as orc tblproperties ("orc.compress"="SNAPPY") ; '
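For what it's worth, the same script reads more clearly with $(...) command substitution; the double quotes around the expansion are what actually keep the column list together as one argument. A minimal sketch, assuming t_cols.txt sits in the working directory:
#!/bin/bash
# Quoting "$(cat ...)" keeps the whole column list in a single --hiveconf value;
# unquoted, the shell would split it at the first space.
table_cols="$(cat t_cols.txt)"
hive --hiveconf t_name="${table_cols}" -e 'create table leap_frog_snapshot.LINKED_OBJ_TRACKING (${hiveconf:t_name}) stored as orc tblproperties ("orc.compress"="SNAPPY");'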
Hi, I am trying to build a cube with Kylin. The data gets sourced fine from Sqoop, but the next step, creating the Hive tables, fails. Looking at the command being fired, this is odd, as the create statement looks good to me.
I think the issue is with the DOUBLE types, since when I remove them the create statement works fine. Can someone please help?
I am using this stack on AWS EMR: Kylin 2.5, Hive 2.3.0.
The error logs and commands are below.
Command
hive -e "USE default;
DROP TABLE IF EXISTS kylin_intermediate_fm_inv_holdings_8a1c33df_d12b_3609_13ee_39e169169368;
CREATE EXTERNAL TABLE IF NOT EXISTS kylin_intermediate_fm_inv_holdings_8a1c33df_d12b_3609_13ee_39e169169368
(
HOLDINGS_STOCK_INVESTOR_ID string
,STOCK_INVESTORS_CHANNEL string
,STOCK_STOCK_ID string
,STOCK_DOMICILE string
,STOCK_STOCK_NM string
,STOCK_APPROACH string
,STOCK_STOCK_TYP string
,INVESTOR_ID string
,INVESTOR_NM string
,INVESTOR_DOMICILE_CNTRY string
,CLIENT_NM string
,INVESTOR_HOLDINGS_GROSS_ASSETS_USD double(22)
,INVESTOR_HOLDINGS_NET_ASSETS_USD double(22)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 's3://wfg1tst-models/kylin/kylin_metadata/kylin-4ae3b18b-831b-da66-eb8c-7318245c4448/kylin_intermediate_fm_inv_holdings_8a1c33df_d12b_3609_13ee_39e169169368';
ALTER TABLE kylin_intermediate_fm_inv_holdings_8a1c33df_d12b_3609_13ee_39e169169368 SET TBLPROPERTIES('auto.purge'='true');
" --hiveconf hive.merge.mapredfiles=false --hiveconf hive.auto.convert.join=true --hiveconf dfs.replication=2 --hiveconf hive.exec.compress.output=true --hiveconf hive.auto.convert.join.noconditionaltask=true --hiveconf mapreduce.job.split.metainfo.maxsize=-1 --hiveconf hive.merge.mapfiles=false --hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 --hiveconf hive.stats.autogather=true
The error is as below:
OK
Time taken: 1.315 seconds
OK
Time taken: 0.09 seconds
MismatchedTokenException(334!=347)
at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:6179)
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:3808)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2382)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1333)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1316)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1456)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1236)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1226)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 15:42 mismatched input '(' expecting ) near 'double' in create table statement
Ohh, I think I finally figured it out: it seems DOUBLE with precision is not supported by Hive. But I think Kylin should take care of this while importing the JDBC metadata into the model.
I will raise an enhancement or bug against Kylin for the same.
I installed the newest stable Hive in my test env, version 2.3.4 (the Hive DDL doc says 2.2.0 supports double with precision). I found that Hive does not support double with a specific, user-defined precision.
e.g.
[ERROR] create table haha(ha double 10);
[ERROR] create table haha(ha double Precision 17);
[ERROR] create table haha(ha double(22));
[CORRECT] create table haha(ha double Precision);
Hive CMD:
hive> use ss;
OK
Time taken: 0.015 seconds
hive> create table haha(ha double 10);
FAILED: ParseException line 1:28 extraneous input '10' expecting ) near '<EOF>'
hive> create table haha(ha double Precision 17);
FAILED: ParseException line 1:38 extraneous input '17' expecting ) near '<EOF>'
hive> create table haha(ha double(22));
MismatchedTokenException(334!=347)
at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:6179)
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:3808)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2382)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1333)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:208)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:27 mismatched input '(' expecting ) near 'double' in create table statement
hive> create table haha(ha double Precision);
OK
Time taken: 0.625 seconds
So, as far as I am concerned, Hive does not support such a data type definition, and this cannot be resolved by the Kylin developers.
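If a fixed precision is genuinely needed, one possible workaround (my suggestion, not something Kylin emits) is to declare such columns as DECIMAL, which does accept precision and scale in Hive:
# Sketch of a workaround with a hypothetical table name; DECIMAL takes
# (precision, scale), unlike DOUBLE, and 6 is an arbitrary scale here.
hive -e "create table haha_dec (ha decimal(22,6));"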
If you find any mistakes, please let me know. I also found the links below, which may be helpful:
https://cwiki.apache.org/confluence/display/hive/LanguageManual+DDL#LanguageManualDDL-CreateTableCreate/Drop/TruncateTable
https://issues.apache.org/jira/browse/HIVE-13556
I have set up Hive on a Mac. While executing a simple create external table query, I am getting the below stack trace:
hive> CREATE EXTERNAL TABLE weatherext ( wban INT, date STRING)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ‘,’
> LOCATION ‘ /hive/data/weatherext’;
NoViableAltException(80#[]) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeOrPKOrFK(HiveParser.java:33341)
at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeOrPKOrFKList(HiveParser.java:29513)
at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:6175)
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:3808)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2382)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1333)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1316)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1456)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1236)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1226)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:45 cannot recognize input near 'date' 'STRING' ')' in column name or primary key or foreign key.
I am able to run the same query successfully on Ubuntu.
Is it mandatory to surround column names with ` in hive2?
date is a reserved word and should be escaped with backticks.
You are also using the wrong quote characters. You should use ' and not ‘ or ’.
And you have a space at the beginning of your location, ' /hive/data...'.
CREATE EXTERNAL TABLE weatherext (wban INT, `date` STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/hive/data/weatherext';
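To the backtick question: quoting is only mandatory for reserved words such as date; ordinary identifiers like wban need no backticks. A quick check, assuming a throwaway table name (note the \` escapes needed inside the double-quoted shell string):
hive -e "create table quoting_demo (wban int, \`date\` string); describe quoting_demo;"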
I am trying to create a Hive table on top of an HBase table, using the query below.
create external table MaprDB_batch_info_table (Batch_ID string, BatchParserJobId string, count string, CurrentRunTime string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,BatchInfo:BatchParserJobId,BatchInfo:count,BatchInfo:CurrentRunTime") TBLPROPERTIES ('hbase.table.name' = '/user/all/batchinfo');
This command executes successfully in the Hive shell, but when I try to execute the same through the bash shell,
hive -e "create external table MaprDB_batch_info_table (Batch_ID string, BatchParserJobId string, count string, CurrentRunTime string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,BatchInfo:BatchParserJobId,BatchInfo:count,BatchInfo:CurrentRunTime") TBLPROPERTIES ('hbase.table.name' = '/user/all/batchinfo');
I get the below error:
NoViableAltException(26#[])
at org.apache.hadoop.hive.ql.parse.HiveParser.tablePropertiesList(HiveParser.java:34375)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableProperties(HiveParser.java:34243)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableFileFormat(HiveParser.java:35913)
at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:5380)
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2640)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1650)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1109)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:397)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:309)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1146)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1194)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1083)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1073)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:473 cannot recognize input near 'hbase' '.' 'columns' in table properties list'
Can anybody help in rectifying this, please?
Replace the double quotes (") that you have within the query with single quotes ('):
...('hbase.columns.mapping'=':key,BatchInfo:BatchParserJobId,BatchInfo:count,BatchInfo:CurrentRunTime')...
Also, there is an issue with the value given to 'hbase.table.name': replace the path with the actual table name.
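Put together, the invocation could look like the sketch below. It keeps the statement exactly as in the question, with only the inner quotes swapped to single quotes so they no longer terminate the outer double-quoted string; adjust 'hbase.table.name' per the note above.
# The inner single quotes survive inside the double-quoted -e string:
hive -e "create external table MaprDB_batch_info_table (Batch_ID string, BatchParserJobId string, count string, CurrentRunTime string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,BatchInfo:BatchParserJobId,BatchInfo:count,BatchInfo:CurrentRunTime') TBLPROPERTIES ('hbase.table.name' = '/user/all/batchinfo');"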
I am unable to populate the partitions of a new table from a table that is already present in Hive.
The query that I am running in Hive after the table creation is:
INSERT INTO TABLE ba_data.PNR_INFO1_partitioned PARTITION(pnr_create_dt) select * from pnr_info1_external;
The error that I am getting is
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hive/warehouse/ba_data.db/pnr_info1_partitioned/.hive-staging_hive_2016-08-09_17-47-47_508_8688474345886508021-1/_task_tmp.-ext-10002/pnr_create_dt=18%2F12%2F2013/_tmp.000000_3 could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1532)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1349)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
After browsing around, I found that the namenode and datanode folders need to be deleted and the namenode should be formatted. I have done that cleanup task as well, but I am still getting the same error.
Also, I have set the replication factor to 1, and all the Hadoop processes are running well.
Please suggest how to proceed in order to get past this issue. Your suggestions are much appreciated.
First you need to:
1. create a table with all the fields
2. load the data into that table
3. create a table with the partition column and its type
4. copy the data from the first table to the partitioned table
A minimal sketch of these steps is shown below.
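For illustration only; the table names and file path here are hypothetical, and dynamic partitioning has to be enabled for step 4:
hive -e "
create table pnr_all (id string, pnr_create_dt string) row format delimited fields terminated by '\t';  -- 1. table with all fields
load data inpath '/tmp/pnr_info1.tsv' into table pnr_all;                  -- 2. load data
create table pnr_part (id string) partitioned by (pnr_create_dt string);   -- 3. partitioned table
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table pnr_part partition(pnr_create_dt) select id, pnr_create_dt from pnr_all;  -- 4. copy
"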
I think dynamic partitioning needs to be enabled. The following works:
set hive.exec.dynamic.partition.mode=nonstrict;
create table parttable (id int) partitioned by (partcolumn string)
row format delimited fields terminated by '\t'
lines terminated by '\n'
;
create table source_table (id int,partcolumn string)
row format delimited fields terminated by '\t'
lines terminated by '\n'
;
insert into source_table values (1,'Chicago');
insert into source_table values (2,'Chicago');
insert into source_table values (3,'Orlando');
set hive.exec.dynamic.partition=true;
insert overwrite table parttable partition(partcolumn)
select id,partcolumn from source_table;
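If it worked, each distinct partcolumn value should now be its own partition:
# Expected output: partcolumn=Chicago and partcolumn=Orlando
hive -e "show partitions parttable;"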
Here's the scenario: three folders are located in HDFS. The files are as follows:
/root/20140901/part-0
/root/20140901/part-1
/root/20140901/part-2
/root/20140902/part-0
/root/20140902/part-1
/root/20140902/part-2
/root/20140903/part-0
/root/20140903/part-1
/root/20140903/part-2
After creating a Hive table with the command below, I run the HQL select * from hive_combine_test where rdm > 50000;. This costs 9 mappers, the same number as there are files in HDFS.
CREATE EXTERNAL table hive_combine_test
(id string,
rdm string)
PARTITIONED BY (dateid string)
row format delimited fields terminated by '\t'
stored as textfile;
ALTER TABLE hive_combine_test
ADD PARTITION (dateid='20140901')
location '/root/20140901';
ALTER TABLE hive_combine_test
ADD PARTITION (dateid='20140902')
location '/root/20140902';
ALTER TABLE hive_combine_test
ADD PARTITION (dateid='20140903')
location '/root/20140903';
But what I want is to group all the part-i files of each folder together into one split; that way, there should be only three mappers.
I've tried to inherit from org.apache.hadoop.hive.ql.io.HiveInputFormat in order to test whether the custom JudHiveInputFormat can work.
public class JudHiveInputFormat<K extends WritableComparable, V extends Writable>
extends HiveInputFormat<WritableComparable, Writable> {
}
But when I plug it into Hive, it throws an exception:
hive> add jar /my_path/jud_udf.jar;
hive> set hive.input.format=com.judking.hive.inputformat.JudHiveInputFormat;
hive> select * from hive_combine_test where rdm > 50000;
java.lang.RuntimeException: com.judking.hive.inputformat.JudCombineHiveInputFormat
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:290)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1472)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1239)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1057)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:880)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:870)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Could anyone give me some clue? Thanks a lot!
As far as I know, to add a custom input/output format in Hive you need to mention that format in your create table statement, something like this:
CREATE TABLE (...)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT '<your input format class name >' OUTPUTFORMAT '<your output format class name>';
Since you need only the InputFormat, your create table statement will look like this:
CREATE TABLE (...)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT 'com.judking.hive.inputformat.JudHiveInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat';
Why do you need to mention the OUTPUT format class? Since you have overridden the INPUT format, Hive expects the OUTPUT class as well, so here we need to tell Hive to use its default OUTPUT format class.
Maybe you can give it a try.
Hope it helps!
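As an aside, if the goal is simply fewer mappers rather than a custom class, Hive's built-in CombineHiveInputFormat merges small files into shared splits, and raising the maximum split size may be all that is needed. A sketch, with an arbitrary 1 GB cap:
# No custom jar required; both properties are standard Hive/MapReduce settings.
hive -e "
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapreduce.input.fileinputformat.split.maxsize=1073741824;
select * from hive_combine_test where rdm > 50000;
"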