Hive partition table by month from daily timestamp? - hadoop

Is it possible to create a partition like 01 from a date like '2017-01-02', where 01 is the month?
I have daily sales records and I need to run queries like select * from sales where month = '01', so it would be better if I could partition my daily sales by month. But my data has dates in the format 2017-01-01, and doing
create table tl (columns ......) partitioned by (date <datatype> ) will create partitions on a daily basis, which is the last thing I want.
I need to create the partitions dynamically.

CAUTION: You need to escape the date column (by putting backticks (`) around the column name) in the create statement, because date is a data type in Hive.
You can create partitions dynamically by setting the parameter below before the query:
set hive.exec.dynamic.partition.mode=nonstrict;
Along with that, you need to select only the month part from the source table. SUBSTR is 1-based, so in a yyyy-MM-dd string the month starts at position 6 (position 5 is the hyphen):
insert into table sales partition(`date`) select columns...,SUBSTR(`date`,6,2) from source_table
This insert statement will create partitions like:
show partitions sales
date=01
date=02
date=03
date=04
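
To sanity-check the substring offsets before running the insert, the month extraction can be sketched outside Hive. The Python below is only an illustration, not Hive code: hive_substr and partition_by_month are hypothetical helpers, the sample rows are made up, and the helper only mimics SUBSTR for a positive, 1-based start position.

```python
from collections import defaultdict

def hive_substr(s: str, start: int, length: int) -> str:
    """Mimic SUBSTR(s, start, length) for a positive, 1-based start only."""
    if start < 1:
        raise ValueError("this sketch only handles start >= 1")
    return s[start - 1:start - 1 + length]

def partition_by_month(rows):
    """Group (sale_date, amount) rows under the partition value MM."""
    partitions = defaultdict(list)
    for sale_date, amount in rows:
        partitions[hive_substr(sale_date, 6, 2)].append((sale_date, amount))
    return dict(partitions)

# Made-up sample data in the yyyy-MM-dd format from the question.
rows = [("2017-01-02", 10.0), ("2017-01-15", 4.5), ("2017-02-01", 7.0)]

print(hive_substr("2017-01-02", 6, 2))   # 01
print(hive_substr("2017-01-02", 5, 2))   # -0
print(sorted(partition_by_month(rows)))  # ['01', '02']
```

Starting at position 5 picks up the hyphen, which is why the month has to be read from position 6.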

Related

Oracle SQL Developer- How to force 00:00:00 hour when inserting a new DATE value

In Oracle SQL Developer, I have a table with a column of type DATE. When I insert a new row into this table and enter a new value in this column, it automatically suggests the current date with the current time.
I would like it to automatically suggest the current date, but with the time 00:00:00. Is there some setting or parameter that I can set in SQL Developer to get this result?
You cannot insert 00:00:00 with a 12-hour format mask (HH or HH12), because the hour value must be between 1 and 12.
With the HH24 mask you can use the query below to insert 00:00:00; a client that displays dates with a 12-hour clock will show the value as 12:00:00 AM.
INSERT INTO TABLE (DATE_COL) VALUES
( TO_DATE ('11/16/2017 00:00:00 ', 'MM/DD/YYYY HH24:MI:SS '));
It seems to me that your DATE column is set with a DEFAULT of SYSDATE. This means, for any INSERT operations which do not specify a value in your DATE column, the current date and time will populate for that row. However, if INSERT operations do specify a value in your DATE column, then the specified date value will supersede the DEFAULT of SYSDATE.
If an application is controlling INSERT operations on that table, then one solution is to ensure the application utilizes the TRUNC() function to obtain your desired results. For example:
INSERT INTO tbl_target
(
col_date,
col_value
)
VALUES
(
TRUNC(SYSDATE, 'DDD'),
5000
)
;
However, if there are multiple applications or interfaces where users could be inserting new rows into the table, (e.g. using Microsoft Access or users running INSERT statements via SQL Developer) and you can't force all of those interfaces to utilize the TRUNC() function on that column during insertion, then you need to look into other options.
If you can ensure via applications that INSERT operations will not actually reference the DATE, then you can simply ALTER the table so that the DATE column will have a DEFAULT of TRUNC(SYSDATE). A CHECK CONSTRAINT can be added for further integrity:
ALTER TABLE tbl_target
MODIFY
(
col_date DATE DEFAULT TRUNC(SYSDATE, 'DDD') NOT NULL
)
ADD
(
CONSTRAINT tbl_target_CHK_dt CHECK(col_date = TRUNC(col_date, 'DDD'))
)
;
However, if users still have the freedom to specify the DATE when inserting new rows, you will want to use a TRIGGER:
CREATE OR REPLACE TRIGGER tbl_target_biu_row
BEFORE INSERT OR UPDATE OF col_value
ON tbl_target
FOR EACH ROW
BEGIN
:NEW.col_date := TRUNC(SYSDATE, 'DDD');
END tbl_target_biu_row
;
This will take care of needing to manage the application code of all external INSERT operations on the table. Keep in mind that the above trigger also modifies the DATE column whenever a user updates the specified value column.
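
As a rough illustration of what TRUNC does to a DATE and what the CHECK constraint enforces, here is a small Python sketch. It is not Oracle code; trunc_to_day and passes_check are hypothetical names, and the sample timestamp is invented.

```python
from datetime import datetime

def trunc_to_day(ts: datetime) -> datetime:
    """Zero out the time portion, similar in spirit to TRUNC(date) at day precision."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

def passes_check(ts: datetime) -> bool:
    """Mirror of the CHECK constraint: a value must equal its own truncation."""
    return ts == trunc_to_day(ts)

stamp = datetime(2017, 11, 16, 14, 30, 5)  # made-up value with a time part

print(trunc_to_day(stamp))                # 2017-11-16 00:00:00
print(passes_check(stamp))                # False
print(passes_check(trunc_to_day(stamp)))  # True
```

A value with a time part fails the check, and only a value already at midnight passes, which is the behaviour the constraint is meant to guarantee.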

How can I insert into the table with the original day as partition in Hive?

create table h5_qti_desc
( h5id string,
query string,
title string,
item string,
query_ids string,
title_ids string,
item_ids string,
label bigint
)PARTITIONED BY (day string) LIFECYCLE 160;
insert overwrite into h5_qti_desc
select * from aaa
;
I created a table named h5_qti_desc, and I want to insert into it from another table, aaa, which has a day field but is not partitioned.
Table aaa contains several days, like '20171010', '20171015', ...
How can I insert into h5_qti_desc in one statement, so that the day values in aaa become the day partitions of h5_qti_desc?
You can use Hive dynamic partition functionality to insert data. Dynamic-partition insert (or multi-partition insert) is designed to solve this problem by dynamically determining which partitions should be created and populated while scanning the input table.
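Below is an example of loading data into all partitions with one insert statement. The partition column day must be the last column returned by the SELECT, because dynamic partition values are matched by position: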
hive>set hive.exec.dynamic.partition.mode=nonstrict;
hive>INSERT OVERWRITE TABLE h5_qti_desc PARTITION(day)
SELECT * FROM aaa
DISTRIBUTE BY day;
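
How a dynamic-partition insert routes rows can be sketched in a few lines of Python. This is only a model of the behaviour, assuming the partition value is the last column of each selected row; dynamic_partition_insert and the sample rows are hypothetical, and the column list is shortened.

```python
from collections import defaultdict

def dynamic_partition_insert(rows, n_partition_cols=1):
    """Split each row into data columns and trailing partition values."""
    partitions = defaultdict(list)
    for row in rows:
        data = row[:-n_partition_cols]
        part = row[-n_partition_cols:]
        partitions[part].append(data)
    return dict(partitions)

# Made-up rows standing in for table aaa; day is the last column.
aaa = [
    ("h1", "q1", 1, "20171010"),
    ("h2", "q2", 0, "20171015"),
    ("h3", "q3", 1, "20171010"),
]

result = dynamic_partition_insert(aaa)
for (day,), data_rows in sorted(result.items()):
    print(f"day={day}: {len(data_rows)} row(s)")
```

Each distinct trailing value becomes its own partition, so one statement fills day=20171010 and day=20171015 together.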

How to use insert statement for a Hive partitioned table?

I have a hive table dynpart.
id int
name char(30)
city char(30)
thisday string
# Partition Information
# col_name data_type comment
thisday string
It is partitioned by 'thisday' whose datatype is STRING.
How can I insert a single record into a particular partition of the table? I know there is a LOAD command to load an entire file into a Hive table; I just want to know how an INSERT statement can be written for a partitioned table. I tried the command below, but it takes its data from another table.
insert into droplater partition(thisday='30/03/2017') select * from dynpart;
The table droplater has the same structure as dynpart, but the above command inserts data from another table. What I'd like to learn is how to write a simple insert command into a partition, like: insert into tabname values(1,"abcd","efgh");
This will work for primitive types only (no arrays, structs etc.)
insert into tabname partition (thisday='30/03/2017') values (1,"abcd","efgh");
This will work for all types
insert into tabname partition (thisday='30/03/2017') select 1,"abcd","efgh";
P.S.
By all means, partition your table by a date column (thisday date):
insert into tabname partition (thisday=date '2017-03-30') ...
or at least use the ISO date format
insert into tabname partition (thisday='2017-03-30') ...
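
One reason to prefer the ISO format for a string partition is that yyyy-MM-dd strings sort in chronological order, so string comparisons on the partition column behave like date comparisons, while dd/MM/yyyy strings do not. A small Python sketch with made-up dates shows the difference (the variable names are illustrative only):

```python
from datetime import datetime

dates = [datetime(2017, 3, 30), datetime(2017, 4, 1), datetime(2016, 12, 31)]
chronological = sorted(dates)

# The same dates rendered as partition values in two string formats.
iso = [d.strftime("%Y-%m-%d") for d in dates]
dmy = [d.strftime("%d/%m/%Y") for d in dates]

# Sort the strings, parse them back, and compare with the true date order.
iso_sorted_ok = [datetime.strptime(s, "%Y-%m-%d") for s in sorted(iso)] == chronological
dmy_sorted_ok = [datetime.strptime(s, "%d/%m/%Y") for s in sorted(dmy)] == chronological

print(sorted(iso), iso_sorted_ok)
print(sorted(dmy), dmy_sorted_ok)
```

Only the ISO strings come back in date order, which matters for range filters on a string partition column.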

Hive update all values in a column

I have an external partitioned Hive table. One of its columns is a string named OLDDATE that has the date in a different format(DD-MM-YY). I want to update the column and store dates in YYYY-MM-DD format. All years are 20XX.
So I thought of this
select CONCAT('20',SPLIT(OLDDATE ,'-')[2],'-',SPLIT(OLDDATE ,'-')[1],'-',SPLIT(OLDDATE ,'-')[0]) from table
This gives me the dates in the format I want. Now how do I overwrite the old date with this new date?
You can effect an update by overwriting the table with its own contents, just with the date field changed according to your transformation, like this pseudo-code:
INSERT OVERWRITE table
SELECT
col1
, col2
...
, CONCAT('20',SPLIT(OLDDATE ,'-')[2],'-',SPLIT(OLDDATE ,'-')[1],'-',SPLIT(OLDDATE ,'-')[0]) AS olddate
...
, coln
FROM table;
@user2441441 To overwrite a partitioned table:
INSERT OVERWRITE table PARTITION (p_col)
SELECT
col1
, col2
...
, CONCAT('20',SPLIT(OLDDATE ,'-')[2],'-',SPLIT(OLDDATE ,'-')[1],'-',SPLIT(OLDDATE ,'-')[0]) AS olddate
...
, coln
, p_col
FROM table;
Since it is a partitioned table, the partition folder names are created from the date values, so you cannot simply update those values in place.
One workaround is to create a new table and insert the output of your query above into it.
After that you can drop your existing table and use the new table in its place.
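
To check the SPLIT/CONCAT expression on a few sample values before overwriting the table, the same transformation can be mirrored in Python. This is a sketch only; convert_olddate is a hypothetical helper and it assumes, as the question states, that every value is DD-MM-YY with a 20XX year.

```python
def convert_olddate(olddate: str) -> str:
    """Mirror CONCAT('20', parts[2], '-', parts[1], '-', parts[0]) on a DD-MM-YY string."""
    day, month, year = olddate.split("-")
    return f"20{year}-{month}-{day}"

# Made-up sample values in the DD-MM-YY format.
samples = ["30-03-17", "01-12-09"]

for value in samples:
    print(value, "->", convert_olddate(value))
```

The helper raises a ValueError on any value that does not split into exactly three parts, which is a cheap way to spot rows that would not convert cleanly.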

Hive and Sqoop partition

I have sqooped data from a Netezza table and the output file is in HDFS, but one column is a timestamp and I want to load it as a date column in my Hive table. I want to partition the table on that date column. How can I do that?
Example: in HDFS the data looks like 2013-07-30 11:08:36
In Hive I want to load only the date (2013-07-30), not the timestamp, and partition on that column daily.
How can I pass the partition column dynamically?
I have tried loading the data into one table as a source. In the final table I would then do insert overwrite table partition by (date_column=dynamic date) select * from table1
Set these 2 properties -
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
And the Query can be like -
-- final_table is a placeholder for the name of your target table
INSERT OVERWRITE TABLE final_table PARTITION (DATE_STR)
SELECT
:
:
-- Partition Col is the last column
to_date(date_column) DATE_STR
FROM table1;
You can also explore the two Sqoop hive-import options below; if it is an incremental import, you will be able to load the current day's partition.
--hive-partition-key
--hive-partition-value
You can just load the EMP_HISTORY table from EMP by enabling dynamic partitioning and converting the timestamp to a date with the to_date function.
The code might look something like this:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE EMP_HISTORY PARTITION (join_date)
SELECT e.name as name, e.age as age, e.salary as salary, e.loc as loc, to_date(e.join_date) as join_date from EMP e ;
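
The effect of to_date on the sample value from the question can be mimicked in Python. This is an illustrative sketch rather than Hive code; to_date_str and partition_by_date are hypothetical helpers, the rows are made up, and it assumes timestamps formatted as yyyy-MM-dd HH:mm:ss.

```python
from collections import defaultdict
from datetime import datetime

def to_date_str(ts: str) -> str:
    """Keep only the date part of a 'yyyy-MM-dd HH:mm:ss' timestamp string."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date().isoformat()

def partition_by_date(rows):
    """Group (name, timestamp) rows under their daily partition value."""
    partitions = defaultdict(list)
    for name, ts in rows:
        partitions[to_date_str(ts)].append(name)
    return dict(partitions)

# Made-up rows; the first timestamp is the example from the question.
rows = [
    ("a", "2013-07-30 11:08:36"),
    ("b", "2013-07-30 23:59:59"),
    ("c", "2013-07-31 00:00:01"),
]

print(to_date_str("2013-07-30 11:08:36"))  # 2013-07-30
print(partition_by_date(rows))
```

Rows from the same calendar day land in the same partition regardless of their time of day, which is the daily partitioning the question asks for.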
