I am trying to load data into a Hive table using partition.
The code is as follows:
CREATE EXTERNAL TABLE URL(url STRING, clicks INT)
COMMENT 'Unique Clicks per URL'
PARTITIONED BY(dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/mypath/URL';
LOAD DATA INPATH '/inputpath/' INTO TABLE URL
PARTITION (dt=date_format(CURRENT_TIMESTAMP, "yyyy.MM.dd HH:mm:ss"));
I am getting the following error:
FAILED: ParseException line 4:14 cannot recognize input near
'date_format' '(' 'CURRENT_TIMESTAMP' in constant
I tried using
SET hive.exec.dynamic.partition.mode=nonstrict;
but nothing changed.
Why is it not working?
How can I set the current date as the value of the partition column?
Thank you in advance.
Lorenzo
Why move the files when you can create the external table on top of them?
LOAD DATA INPATH just moves the files (HDFS metadata operation) "as is", to the table's location.
Why define the partition column as a string when it is clearly a date?
CREATE EXTERNAL TABLE URL ... PARTITIONED BY(dt DATE) ...
Why are you trying to use non-ISO formats (yyyy.MM.dd)?
ISO date format is yyyy-MM-dd
Since it seems the partition information is not part of the data you have 3 options:
1. Use a constant (no expressions are allowed, including functions), e.g.
LOAD DATA INPATH '/inputpath/' INTO TABLE URL PARTITION (dt=date '2017-03-04');
2. Create an additional table, URL_STG, similar to URL but without the partition, and use it to insert the partitions dynamically.
set hive.exec.dynamic.partition.mode=nonstrict;
insert into URL partition (dt) select *,current_date from URL_STG;
3. Supply the date as a variable from the CLI:
hive --hivevar dt=$(date +"%Y-%m-%d") -e \
'LOAD DATA INPATH '\''/inputpath/'\'' INTO TABLE URL PARTITION (dt=date '\''${hivevar:dt}'\'')'
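If the shell quoting gets awkward, the same idea (compute the date outside Hive and pass it in as a constant) can be sketched from a small script. This is only an illustration: the helper name `build_load_statement` is made up for this example, and it assumes the partition column is a DATE as suggested above.

```python
from datetime import date


def build_load_statement(input_path: str, table: str, day: date) -> str:
    """Return a LOAD DATA statement whose partition value is a constant."""
    # The partition spec has to be a literal, so the date is formatted here
    # (ISO yyyy-MM-dd) instead of calling a Hive function inside PARTITION().
    return (
        f"LOAD DATA INPATH '{input_path}' INTO TABLE {table} "
        f"PARTITION (dt=date '{day.isoformat()}')"
    )


if __name__ == "__main__":
    statement = build_load_statement("/inputpath/", "URL", date.today())
    # Only print the command line; hand the list to subprocess.run(...)
    # if you actually want to execute it with the hive CLI.
    print(["hive", "-e", statement])
```

Passing the statement as a single list element also sidesteps the nested single-quote escaping needed on the command line.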
Is it possible to create external partitioned table without location? I want to add all the locations later, together with partitions.
I tried:
CREATE EXTERNAL TABLE IF NOT EXISTS a.b
(line STRING)
COMMENT 'abc'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE
PARTITIONED BY day;
but I got ParseException: missing EOF at 'PARTITIONED' near 'TEXTFILE'
I don't think so, as said in alter location.
But anyway, I think your query has some errors, and the correct script would be:
CREATE EXTERNAL TABLE IF NOT EXISTS a.b
(line STRING)
COMMENT 'abc'
PARTITIONED BY (day String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE
;
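I think the issue is that you have not specified a data type for your partition column "day". The PARTITIONED BY clause also has to come before ROW FORMAT and STORED AS, which is what the "missing EOF at 'PARTITIONED'" message is complaining about. You can create a Hive external table without a location and use the ALTER TABLE options later to change the location.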
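For the "add all the locations later, together with partitions" part, Hive's ALTER TABLE ... ADD PARTITION accepts a LOCATION per partition. A rough sketch of generating those statements follows; the helper name, the day value and the path are placeholders for illustration, not something from the question.

```python
def add_partition_with_location(table: str, day: str, location: str) -> str:
    """Build an ALTER TABLE statement that registers one partition and its path."""
    # IF NOT EXISTS makes the statement safe to re-run for the same partition.
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (day='{day}') LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical partition value and directory, just to show the shape.
    print(add_partition_with_location("a.b", "2014-01-01", "/some/dir/2014-01-01"))
```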
I am running the following commands to create my table ABC and insert data from all files that are in my designated file path. Now I want to add a column with filename, but I can't find any way to do that without looping through the files or something. Any suggestions on what the best way to do this would be?
CREATE TABLE ABC
(NAME string
,DATE string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
hive -e "LOAD DATA LOCAL INPATH '${DATA_FILE_PATH}' INTO TABLE ABC;"
Hive does have virtual columns, which include INPUT__FILE__NAME. The link shows how to use this in a statement.
To fill another table with the filename as a column, assuming your data is located at hdfs://hdfs.location:port/data/folder/filename1:
DROP TABLE IF EXISTS ABC2;
CREATE TABLE ABC2 (
filename STRING COMMENT 'this is the file the row was in',
name STRING,
date STRING);
INSERT INTO TABLE ABC2 SELECT split(INPUT__FILE__NAME,'folder/')[1],* FROM ABC;
You can alter the split to change how much of the full path you actually want to store.
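To see what a given split pattern would keep before running it in Hive, you can mimic it locally. Hive's split() takes a regular expression, so `re.split` is a reasonable stand-in for simple patterns like this one; the function name `kept_part` is just for this sketch.

```python
import re


def kept_part(input_file_name: str, pattern: str) -> str:
    """Mimic split(INPUT__FILE__NAME, pattern)[1] from the INSERT above."""
    # Index 1 is whatever follows the first match of the pattern
    # (up to the next match, if the pattern occurs more than once).
    return re.split(pattern, input_file_name)[1]


if __name__ == "__main__":
    full_path = "hdfs://hdfs.location:port/data/folder/filename1"
    print(kept_part(full_path, "folder/"))  # keeps only the file name
    print(kept_part(full_path, "data/"))    # keeps the folder as well
```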
I have a partitioned Hive table that I want to load in a Pig script, and I would also like to get the partition as a column.
How can I do that?
Table definition in Hive:
CREATE EXTERNAL TABLE IF NOT EXISTS transactions
(
column1 string,
column2 string
)
PARTITIONED BY (datestamp string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/path';
Pig script:
%default INPUT_PATH '/path'
A = LOAD '$INPUT_PATH'
USING PigStorage('|')
AS (
column1:chararray,
column2:chararray,
datestamp:chararray
);
The datestamp column is not populated. Why is it so?
I am sorry, I didn't get the part that says "add partition as column also". Once created, partition keys behave like regular columns. What exactly do you need?
And you are loading the data directly from a given HDFS location, not as a Hive table. If you intend to use Pig to load/store data from/into a Hive table you should use HCatalog.
For example:
A = LOAD 'transactions' USING org.apache.hcatalog.pig.HCatLoader();
I have a log file in HDFS, values are delimited by comma. For example:
2012-10-11 12:00,opened_browser,userid111,deviceid222
Now I want to load this file into a Hive table that has the columns "timestamp" and "action" and is partitioned by "userid" and "deviceid". How can I ask Hive to take the last 2 columns in the log file as the partitions for the table? All examples, e.g. "hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');", require the partitions to be defined in the script, but I want the partitions to be set up automatically from the HDFS file.
One solution is to create an intermediate non-partitioned table with all 4 columns, populate it from the file and then run an INSERT into first_table PARTITION (userid,deviceid) select timestamp,action,userid,deviceid from intermediate_table; but that is an additional task, and we will have 2 very similar tables. Or should we create an external table as the intermediate one?
Ning Zhang has a great response on the topic at http://grokbase.com/t/hive/user/114frbfg0y/can-i-use-hive-dynamic-partition-while-loading-data-into-tables.
The quick context is that:
LOAD DATA simply copies (or moves) the files; it doesn't read them, so it cannot figure out what to partition.
I would suggest that you load the data into an intermediate table first (or use an external table pointing to all the files) and then let a dynamic partition insert kick in to load it into the partitioned table.
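To make the difference concrete, here is a toy model of what a dynamic partition insert has to do and LOAD DATA never does: read each row, pull out the partition values, and route the remaining columns to a `key=value` directory. It is only a conceptual sketch in Python, not how Hive is implemented, and the function name `route_rows` is invented for it.

```python
from collections import defaultdict


def route_rows(lines, partition_keys=("userid", "deviceid")):
    """Group comma-delimited rows by their trailing partition columns."""
    partitions = defaultdict(list)
    n = len(partition_keys)
    for line in lines:
        fields = line.rstrip("\n").split(",")
        data, values = fields[:-n], fields[-n:]
        # Hive lays partitions out as nested key=value directories.
        directory = "/".join(f"{k}={v}" for k, v in zip(partition_keys, values))
        # Only the non-partition columns are stored inside the partition.
        partitions[directory].append(data)
    return dict(partitions)


if __name__ == "__main__":
    log = ["2012-10-11 12:00,opened_browser,userid111,deviceid222"]
    for directory, rows in route_rows(log).items():
        print(directory, rows)
```

Since the routing depends on the content of every row, it needs a query (INSERT ... SELECT) rather than a file move.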
As mentioned in @Denny Lee's answer, we need to involve a staging table (invites_stg), managed or external, and then INSERT from the staging table into the partitioned table (invites in this case).
Make sure we have these two properties set to:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
And finally, insert into the partitioned table:
INSERT OVERWRITE TABLE India PARTITION (STATE) SELECT COL's FROM invites_stg;
Refer to this link for help: http://www.edupristine.com/blog/hive-partitions-example
I worked on this very same scenario, but what we did instead was create a separate HDFS data file for each partition that needs to be loaded.
Since our data is coming from a MapReduce job, we used MultipleOutputs in our Reducer class to multiplex the data into their corresponding partition file. Afterwards, it is just a matter of building the script using the Partition from the HDFS file name.
How about
LOAD DATA INPATH '/path/to/HDFS/dir/file.csv' OVERWRITE INTO TABLE DB.EXAMPLE_TABLE PARTITION (PARTITION_COL_NAME='PARTITION_VALUE');
CREATE TABLE India (
OFFICE_NAME STRING,
OFFICE_STATUS STRING,
PINCODE INT,
TELEPHONE BIGINT,
TALUK STRING,
DISTRICT STRING,
POSTAL_DIVISION STRING,
POSTAL_REGION STRING,
POSTAL_CIRCLE STRING
)
PARTITIONED BY (STATE STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
Instruct Hive to dynamically load partitions:
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
I have my data in data/2011/01/13/0100/file in HDFS. Each of these files contains tab-separated data, say name, ip, url.
I want to create a table in Hive and import the data from HDFS. The table should contain time, name, ip and url.
How can I import these using Hive? Or should the data be in some other format so that I can import the time as well?
You need to create the table to load the files into and then use the LOAD DATA command to load the files into the Hive tables. See the Hive documentation for the precise syntax to use.
Regards,
Jeff
To do this you have to use partitions, read more about them here:
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Add_Partitions
partition column in hive
You can create an external table for such data.
Something like:
CREATE EXTERNAL TABLE log_data (name STRING, ip STRING, url STRING)
PARTITIONED BY (year BIGINT, month BIGINT, day BIGINT, hour BIGINT)
row format delimited fields terminated by '\t' stored as TEXTFILE
location 'data'
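Because the directories are named 2011/01/13/0100 rather than year=2011/month=01/..., each one still has to be registered with ALTER TABLE ... ADD PARTITION ... LOCATION. A small sketch that derives the statement from a file path is below. It assumes the last directory (0100) is HHMM and that only the hour part is wanted; the helper name is made up, and the path is used exactly as given in the question.

```python
def add_partition_statement(path: str, table: str = "log_data") -> str:
    """Turn <base>/<year>/<month>/<day>/<hhmm>/<file> into an ADD PARTITION."""
    parts = path.strip("/").split("/")
    # The four directories just above the file carry the partition values.
    year, month, day, hhmm = parts[-5:-1]
    # The partition points at the directory, not at the file itself.
    location = "/".join(parts[:-1])
    # Assumption: '0100' means 01:00, so the hour is the first two digits.
    hour = int(hhmm[:2])
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year={int(year)}, month={int(month)}, "
        f"day={int(day)}, hour={hour}) "
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    print(add_partition_statement("data/2011/01/13/0100/file"))
```

Run the generated statements once per directory, after which queries can filter on year, month, day and hour to get the time.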