How can I partition a hive table by (only) a portion of a timestamp column? - hadoop

Assume I have a Hive table that includes a TIMESTAMP column that is frequently (almost always) included in the WHERE clauses of a query. It makes sense to partition this table by the TIMESTAMP field; however, to keep to a reasonable cardinality, it makes sense to partition by day (not by the maximum resolution of the TIMESTAMP).
What's the best way to achieve this? Should I create an additional column (DATE) and partition on that? Or is there a way to achieve the partition without creating a duplicate column?

Its not a new column, but its a pseudo-column, You should re-create your table with adding the partitioning specification like this :
create table table_name (
id int,
name string,
timestamp string
)
partitioned by (date string)
Then you load the data creating the partitions dynamically like this
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
FROM table_name_old tno
INSERT OVERWRITE TABLE table_name PARTITION(substring(timestamp,0,10))
SELECT tno.id, tno.name, tno.timestamp;
Now if you select all from your table you will see a new column for the partition, but consider that a Hive partition is just a subdirectory and its not a real column, hence it does not affect the total table size only by some kilobytes.

As partition is also one of the column in hive, every partition has value(assign using static or dynamic partition) and every partition is mapped to directory in HDFS, so it has to be additional column.
You may choose one the below option:
Let's say table DDL:
CREATE TABLE temp( id string) PARTITIONED BY (day int)
If the data is organised day wise then add static partition:
ALTER TABLE xyz
ADD PARTITION (day=00)
location '/2017/02/02';
or
INSERT OVERWRITE TABLE xyz
PARTITION (day=1)
SELECT id FROM temp 
WHERE dayOfTheYear(**timestamp**)=1;
Generate day number using dynamic partition:
INSERT INTO TABLE xyz
PARTITION (day)
SELECT id ,
dayOfTheYear(day)
FROM temp;
Hive doesn't have any dayOfTheYear function you create it.

Related

Is it possible to automatically create a new partition in a list partitioned table?

The code below would create a new partition if I would insert a date that does not exist in my table. Is it possible to do the same thing in a list partitioned table, where the partition is based on a VARCHAR2 column?
ALTER TABLE MY_TABLE MODIFY
PARTITION BY RANGE(DATE) INTERVAL(NUMTODSINTERVAL(1,'day'))
( partition MY_PARTITION values less than (to_date('2019-06-01', 'yyyy-mm-dd')));
Yes, it is possible starting from the Oracle 12.2.
See the details here.

How to add interval partition to an existing table in oracle?

I have a scenario in which want to partition an existing Oracle table using interval partitioning. I don't know what is the best approach in the database to do the same.
Table size is around 11 GB. Partitioning needs to be done on Date column with the interval of 1 month.
Recreate the table:
Rename your current table to something like <your-table-name>_OLD
Create a new partitioned table with your original table name.
Insert your table data from the old table to the new table.
Drop the old table.
Precondition: you have the required space available in your database.
And don't forget to recreate all needed constraints etc.
Or use Oracle's online table redefinition mechanism: https://docs.oracle.com/en/database/oracle/oracle-database/12.2/vldbg/evolve-nopartition-table.html#GUID-6054142E-207A-4DF0-A62A-4C1A94DD36C4
If you want only partition the new data and let the old table as one partition you may usi this ismple approach withou reorganisation and required twice space.
0) This is your sample data , ID id the partition column defined up to 10.000
create table my_tab
(id number,
vc1 varchar2(100));
insert into my_tab(id,vc1)
select rownum id, 'xxx'||rownum vc1 from dual connect by level <= 10000;
commit;
1) create empty interval partitione dtable with one partition covering all your data - here ID less then 10.001
create table my_part_tab
(id number,
vc1 varchar2(100))
PARTITION BY RANGE (id)
INTERVAL (1000)
(PARTITION p1 VALUES LESS THAN (10001));
2) Define identical indexes on the partition table as on the original table
3) Exchange your table with the initial partition.
alter table my_part_tab exchange partition P1
with table my_tab
including indexes;
Now your original table is empty and all data are in the partition P1of the partitioned table.
New data (higher keys) would be stored based on the interval partition policy.

hive first column to consider in partition table

Creating partition table in hive, does it mandatory to choose always the last column for partition column.
If I choose 1st column as partition, I cant do filter data, is there any way to choose first column for partition?
In hive, if you want to partition a table, you have to define partition column first during table creation time. & while populating the data into table you need to specify as follow:
"INSERT INTO partitioned_table PARTITION(status) SELECT id , name, status from temp_tbl "
in this way using you can partition based on last column only. if you want to partition on the basis of first column. you have to write a Mapreduce job for that . that is the only option available.
I guess the problem you are facing is that you already have table "source" in your local system or hdfs and you want to upload it to partitioned table. And you want the first column in the source table to be partitioned in hive. As the source table does not have headers i guess we can not do anything here if we try to directly upload the file in the hive destination folder. The only alternate way i know is that create a non partitioned table in hive whose structure is exactly the same as the source file. then upload the source data to non partitioned table first, then copy the data from non partitioned table to partitioned table.
Suppose the source file is like this
create table source(eid int, ename int, esal int) partitioned by (dept string)
your non partioned table where you upload the data is like thiscreate table nopart(dept string, esal int,ename string, eid int)
then you use the dynamic partition by command insert overwrite table source partition(dept) select eid,ename,esal,dept from nopart;
the order of the parameters is the only point here.

How do I alter my existing table to create a range partition in Oracle

I have existing table which has 10 years of data (I have taken dump).
I would like to Range partition the existing table on one date key column within the table.
Most of the examples I see are with CREATE TABLE..PARTITION BY RANGE... to add new partitions. But my table is existing table.
I assume I need some ALTER statement.
ALTER TABLE TABLE_NAME
PARTITION BY RANGE(CREATED_DATE)
PARTITION JAN16 VALUES LESS THAN (01-02-2016),
PARTITION FEB16 VALUES LESS THAN (01-03-2016) AND GREATER THAN(31-01-2016),//OR?
PARTITION MAR16 VALUES BETWEEN (01-03-2016) AND (31-03-2016), //OR?
Two questions..
Do I need Alter statement to add partitioning mechanism or need to work with create statement?
What is the proper syntax for keeping each partition having only ONE MONTH data.
If you are using Oracle 12c Release 2 you could use single ALTER to convert non-partitioned table to partitioned one (this is one way trip):
CREATE TABLE my_tab ( a NUMBER(38,0), b NUMBER(38,0));
ALTER TABLE MY_TAB MODIFY PARTITION BY RANGE (a) INTERVAL (1000) (
PARTITION p1 VALUES LESS THAN (1000)) ONLINE;
You could convert indexes too, adding:
update indexes (index_name [local/global]);
db<>fiddle demo
Beacuse your table non-partitioned you have two options:
Export data, drop table, create new patitioned table, import data.
Use split then exchange partition method. https://oracle-base.com/articles/misc/partitioning-an-existing-table-using-exchange-partition
Also, if you want new partition per month read about SET INTERVAL. For example:
CREATE TABLE tst
(col_date DATE)
PARTITION BY RANGE (col_date) INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(PARTITION col_date_min VALUES LESS THAN (TO_DATE('2010-01-01', 'YYYY-MM-DD')));

Create new hive table from existing external portioned table

I have a external partitioned table with almost 500 partitions. I am trying to create another external table with same properties as of the old table. Then i want to copy all the partitions from my old table to the newly created table. below is my create table query. My old table is stored as TEXTFILE and i want to save the new one as ORC file.
'add jar json_jarfile;
CREATE EXTERNAL TABLE new_table_orc (col1,col2,col3...col27)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (....)
STORED AS orc
LOCATION 'path';'
And after creation of this table. i am using the below query to insert the partitions from old table to new one.i only want to copy few columns from original table to new table
'set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE new_table_orc PARTITION (year,month,day) SELECT col2,col3,col6,year,month,day FROM old_table;
ALTER TABLE new_table_orc RECOVER PARTITIONS;'
i am getting below error.
'FAILED: SemanticException [Error 10044]: Line 2:23 Cannot insert into target table because column number/types are different 'day': Table insclause-0 has 27 columns, but query has 6 columns.'
Any suggestions?
Your query has to match the number and type of columns in your new table. You have created your new table with 27 regular columns and 3 partition columns, but your query only select six columns.
If you really only care about those six columns, then modify the new table to have only those columns. If you do want all columns, then modify your select statement to select all of those columns.
You also will not need the "recover partitions" statement. When you insert into a table with dynamic partitions, it will create those partitions both in the filesystem and in the metastore.

Resources