Unix time in PARTITION BY for Vertica - vertica

I have a big table in Vertica with a time_stamp (int) column that holds a Unix timestamp. I want to partition this table by week (with the week starting on Monday).
Is there a better way to do this in one step, rather than converting time_stamp from Unix time to a Vertica TIMESTAMP and then partitioning?

Optimally, you should be using the date/time type. You won't be able to use non-deterministic functions such as TO_TIMESTAMP in the PARTITION BY expression. The alternative is to use math to logically create the partitions:
To partition by a given unit, divide the Unix timestamp by:
Unit      Divide by
Minutes   60
Hours     60 * 60 (3600)
Days      60 * 60 * 24 (86400)
Weeks     60 * 60 * 24 * 7 (604800)
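Dividing by 604800 gives the number of whole weeks since January 1, 1970 00:00:00 UTC. That date was a Thursday, so these week boundaries fall on Thursday 00:00:00 UTC rather than on Monday. To get Monday-based weeks, add 259200 (3 days) to time_stamp before dividing, which moves each boundary back to the preceding Monday.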
Let's set up a test table:
CREATE TABLE public.test (
time_stamp int NOT NULL
);
INSERT INTO public.test (time_stamp) VALUES (1404305559);
INSERT INTO public.test (time_stamp) VALUES (1404305633);
INSERT INTO public.test (time_stamp) VALUES (1404305705);
INSERT INTO public.test (time_stamp) VALUES (1404305740);
INSERT INTO public.test (time_stamp) VALUES (1404305778);
COMMIT;
Let's create the partition:
ALTER TABLE public.test PARTITION BY FLOOR(time_stamp/604800) REORGANIZE;
We then get:
NOTICE 4954: The new partitioning scheme will produce 1 partitions
WARNING 6100: Using PARTITION expression that returns a Numeric value
HINT: This PARTITION expression may cause too many data partitions. Use of an expression that returns a more accurate value, such as a regular VARCHAR or INT, is encouraged
NOTICE 4785: Started background repartition table task
ALTER TABLE
You'll also want to be mindful of how many partitions this creates. Vertica recommends keeping the number of partitions between 10 and 20.
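The week arithmetic can be checked outside the database. The following is a minimal Python sketch, not part of the original answer: the function names and the MONDAY_OFFSET constant are illustrative, everything is in UTC, and it assumes only that 1970-01-01 was a Thursday. It compares plain division by 604800 with a variant shifted by three days so that buckets start on Monday.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 60 * 60 * 24          # 86400
SECONDS_PER_WEEK = SECONDS_PER_DAY * 7  # 604800
# Assumption: 1970-01-01 was a Thursday, so the preceding Monday
# (1969-12-29) lies 3 days before the epoch.
MONDAY_OFFSET = 3 * SECONDS_PER_DAY     # 259200


def epoch_week(ts: int) -> int:
    """Week number as computed by FLOOR(time_stamp/604800)."""
    return ts // SECONDS_PER_WEEK


def monday_week(ts: int) -> int:
    """Week number whose buckets start on Monday 00:00:00 UTC."""
    return (ts + MONDAY_OFFSET) // SECONDS_PER_WEEK


def bucket_start(week: int, offset: int = 0) -> datetime:
    """First instant (UTC) that belongs to the given bucket."""
    return datetime.fromtimestamp(week * SECONDS_PER_WEEK - offset, tz=timezone.utc)


# The five timestamps inserted into public.test.
sample = [1404305559, 1404305633, 1404305705, 1404305740, 1404305778]

# All five rows land in a single bucket under either scheme.
print({epoch_week(ts) for ts in sample})
print({monday_week(ts) for ts in sample})

# Start of the bucket that holds the first row, under each scheme.
print(bucket_start(epoch_week(sample[0])).strftime("%A %Y-%m-%d"))
print(bucket_start(monday_week(sample[0]), MONDAY_OFFSET).strftime("%A %Y-%m-%d"))
```

In the partition expression this would correspond to adding 259200 to time_stamp before dividing by 604800. That translation has not been tested against Vertica here, so treat it as a starting point.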

Related

partition in oracle

CREATE TABLE temp_stud as select * from STUD_MAST
PARTITION BY RANGE(ADM_DT)
(
PARTITION temp_stud1 VALUES LESS THAN(TO_DATE('02/01/2000','MM/DD/YYYY')),
PARTITION temp_stud2 VALUES LESS THAN(TO_DATE('03/01/2000','MM/DD/YYYY')),
PARTITION temp_stud3 VALUES LESS THAN(TO_DATE('04/01/2000','MM/DD/YYYY')),
PARTITION temp_stud4 VALUES LESS THAN(TO_DATE('05/01/2000','MM/DD/YYYY'))
);
I am getting a "missing left parenthesis" error for the table creation above. Can anyone tell me what the issue is?
Note: ADM_DT is a date column with data type char(8), stored in the format YYMMDD.
Please use the SQL below. The partitioning clause has to be part of the CREATE TABLE statement, before AS SELECT.
CREATE TABLE temp_stud
PARTITION BY RANGE(ADM_DT)
(
PARTITION temp_stud1 VALUES LESS THAN(TO_DATE('02/01/2000','MM/DD/YYYY')),
PARTITION temp_stud2 VALUES LESS THAN(TO_DATE('03/01/2000','MM/DD/YYYY')),
PARTITION temp_stud3 VALUES LESS THAN(TO_DATE('04/01/2000','MM/DD/YYYY')),
PARTITION temp_stud4 VALUES LESS THAN(TO_DATE('05/01/2000','MM/DD/YYYY'))
)
as select * from STUD_MAST;

Divide into hourly partition #2

I've asked a similar question before but couldn't figure it out (Divide into hourly partitions and insert data by partitions).
I created a table with hourly partitions, and when inserting data, it correctly divides by hourly partitions:
create table SOFT_CALLS(
callid VARCHAR2(90),
start_time date not null,
start_time_h as (to_char(start_time, 'HH24')),
duration NUMBER,
first_question VARCHAR2(4000),
second_question VARCHAR2(4000),
client_id VARCHAR2(20),
contract_id VARCHAR2(20),
client_dwh_id VARCHAR2(20)
)
partition by list (start_time_h)
(
partition p_08 values ('08'),
partition p_09 values ('09'),
partition p_10 values ('10'),
partition p_11 values ('11'),
partition p_12 values ('12'),
partition p_13 values ('13'),
partition p_14 values ('14'),
partition p_15 values ('15'),
partition p_16 values ('16'),
partition p_17 values ('17'),
partition p_18 values ('18'),
partition p_19 values ('19'),
partition p_20 values ('20'),
partition p_21 values ('21'),
partition p_22 values ('22'),
partition p_23 values ('23')
);
The procedure should work as follows: it inserts data into the table for the period between trunc(sysdate - 1/24, 'hh') and trunc(sysdate + 1/24, 'hh'), transfers the same data to SAS, and at the end simply drops those partitions. That way the procedure does not reprocess the entire trunc(sysdate) day every five minutes, so resource consumption will decrease.
The table needs to store data in hourly partitions. In other words, it should not truncate all the data each time and refill it for the whole day; it should only drop the partitions for the last hour and insert data again, without losing data for the hours that have already passed.
The problem is that list partitions cannot be dropped for the period between trunc(sysdate - 1/24, 'hh') and trunc(sysdate + 1/24, 'hh').
What other ways are there to divide a table into hourly partitions so that there are no problems dropping them over the period I indicated? I've looked all over the internet, but I haven't been able to find any specific information.

Oracle -- Datatype of column which can store value "13:45"

We need to store the value "13:45" in the column "Start_Time" of an Oracle table.
The value should be read as 45 minutes past 13:00 hours.
Which datatype should be used when creating the table? Also, when the column is queried, we would like to see only the value "13:45".
I would make it easier:
create table t_time_only (
time_col varchar2(5),
time_as_interval INTERVAL DAY TO SECOND invisible
generated always as (to_dsinterval('0 '||time_col||':0')),
constraint check_time
check ( VALIDATE_CONVERSION(time_col as date,'hh24:mi')=1 )
);
The check constraint lets you validate input strings:
SQL> insert into t_time_only values('25:00');
insert into t_time_only values('25:00')
*
ERROR at line 1:
ORA-02290: check constraint (CHECK_TIME) violated
And the invisible virtual generated column lets you do simple arithmetic:
SQL> insert into t_time_only values('15:30');
1 row created.
SQL> select trunc(sysdate) + time_as_interval as res from t_time_only;
RES
-------------------
2020-09-21 15:30:00
Your best option is to store the data in a DATE type column. If you are going to be doing any comparisons against the times (querying, sorting, etc.), you will want to make sure that all of the times use the same day. It doesn't matter which day as long as they are all the same.
CREATE TABLE test_time
(
time_col DATE
);
INSERT INTO test_time
VALUES (TO_DATE ('13:45', 'HH24:MI'));
INSERT INTO test_time
VALUES (TO_DATE ('8:45', 'HH24:MI'));
Test Query
SELECT time_col,
TO_CHAR (time_col, 'HH24:MI') AS just_time,
24 * (time_col - LAG (time_col) OVER (ORDER BY time_col)) AS difference_in_hours
FROM test_time
ORDER BY time_col;
Test Results
TIME_COL     JUST_TIME    DIFFERENCE_IN_HOURS
____________ ____________ ______________________
01-SEP-20    08:45
01-SEP-20    13:45                             5
Table Definition using INTERVAL
create table tab
(tm INTERVAL DAY(1) to SECOND(0));
Input value as literal
insert into tab (tm) values (INTERVAL '13:25' HOUR TO MINUTE );
Input value dynamically
insert into tab (tm) values ( (NUMTODSINTERVAL(13, 'hour') + NUMTODSINTERVAL(26, 'minute')) );
Output
you may either EXTRACT the hour and minute
EXTRACT(HOUR FROM tm) int_hour,
EXTRACT(MINUTE FROM tm) int_minute
or use formatted output with a trick by adding some fixed DATE
to_char(DATE'2000-01-01'+tm,'hh24:mi') int_format
which gives
13:25
13:26
Please see this answer for other formatting options (HH24:MI).
The INTERVAL definition used here may store seconds as well. If this is not acceptable, add a CHECK constraint, e.g. as follows (adjust as required):
tm INTERVAL DAY(1) to SECOND(0)
constraint "wrong interval" check (tm <= INTERVAL '23:59' HOUR TO MINUTE and EXTRACT(SECOND FROM tm) = 0 )
This rejects the following as invalid input
insert into tab (tm) values (INTERVAL '13:25:30' HOUR TO SECOND );
-- ORA-02290: check constraint (X.wrong interval) violated

Oracle DB daily partitioning

I have the following table
CREATE TABLE "METRIC_VALUE_RAW"
(
"SUBELEMENT_ID" INTEGER NOT NULL ,
"METRIC_METADATA_ID" INTEGER NOT NULL ,
"METRIC_VALUE_INT" INTEGER,
"METRIC_VALUE_FLOAT" FLOAT(126),
"TIME_STAMP" TIMESTAMP NOT NULL
) ;
Every hour, data will be loaded into the table using SQL*Loader.
I want to create partitions so that the data for each day goes into its own partition.
I want to keep 30 days of data in the table, so once data is older than 30 days, the oldest partition should be dropped.
Can you share your ideas on how I can design the partitions?
Here is an example of how to do it on Oracle 11g; it works very well. I haven't tried it on Oracle 10g, you can try it.
This is the way, how to create a table with daily partitions:
CREATE TABLE XXX (
partition_date DATE,
...,
...,
)
PARTITION BY RANGE (partition_date)
INTERVAL (NUMTODSINTERVAL(1, 'day'))
(
PARTITION part_01 values LESS THAN (TO_DATE('2000-01-01','YYYY-MM-DD'))
)
TABLESPACE MY_TABLESPACE
NOLOGGING;
As you can see above, Oracle will automatically create a separate partition for each distinct day of partition_date on or after 1st January 2000. Records whose partition_date is older than this date will be stored in the partition called 'part_01'.
You can monitor your table partitions using this statement:
SELECT * FROM user_tab_partitions WHERE table_name = 'XXX';
Afterwards, when you would like to delete some partitions, use the following command:
ALTER TABLE XXX DROP PARTITION AAAAAA UPDATE GLOBAL INDEXES
where 'AAAAAA' is the partition name.
I hope it will help you!
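The 30-day retention rule from the question could be driven by a small script around the DROP PARTITION statement shown above. The following Python sketch is only an illustration: the function names and the sample partition names are hypothetical, and it assumes the partition names and their upper bounds have already been read from user_tab_partitions and converted to dates.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def partitions_to_drop(
    partitions: Iterable[Tuple[str, date]],
    today: date,
    keep_days: int = 30,
) -> List[str]:
    """Return names of partitions whose rows are all older than keep_days.

    Each item is (partition_name, upper_bound), where upper_bound is the
    exclusive VALUES LESS THAN date of that partition.
    """
    cutoff = today - timedelta(days=keep_days)
    # Every row in a partition is strictly below its upper bound, so the
    # partition is fully expired once that bound is at or before the cutoff.
    return [name for name, upper_bound in partitions if upper_bound <= cutoff]


def drop_statement(table: str, partition: str) -> str:
    """Build the DROP PARTITION statement used in the answer."""
    return f"ALTER TABLE {table} DROP PARTITION {partition} UPDATE GLOBAL INDEXES"


# Hypothetical partition names and bounds, for illustration only.
existing = [
    ("SYS_P101", date(2020, 8, 1)),
    ("SYS_P102", date(2020, 8, 2)),
    ("SYS_P131", date(2020, 9, 1)),
]

for name in partitions_to_drop(existing, today=date(2020, 9, 1)):
    print(drop_statement("XXX", name))
```

The initial range partition (part_01 in the example) is deliberately left out of the sample list. As an assumption to verify on your version, Oracle may refuse to drop the last remaining range partition of an interval-partitioned table.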
As I said, there are big differences in partition automation between 10g and 11g.
In 10g you will have to manage the partitions manually during your ETL process (I'm sure every 10g DBA has a utility package he wrote to manage partitions...).
For steps 1 & 2, you have several options:
Load data directly into the daily partition.
Load data into a new partition and merge it into the daily one.
Load data into a new partition every hour, and during a maintenance window merge all hourly partitions into a daily partition.
The right way for you depends on your needs. Is the newly added data queried immediately? In what manner? Would you query for data across several hours (or loads)? Are you showing aggregations? Are you performing DML operations on the data? (DDL operations on partitions cause massive locking.)
As for step 3, again: manually. Drop old partitions.
In 11g, you have the new interval partitioning feature, which automates some of the tasks mentioned above.
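The following is a sample CREATE TABLE statement that partitions data by day. Note that it uses MySQL syntax (UNIX_TIMESTAMP, ON UPDATE CURRENT_TIMESTAMP), so it would need adapting before it runs on Oracle: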
CREATE TABLE quarterly_report_status (
report_id INT NOT NULL,
report_status VARCHAR(20) NOT NULL,
report_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
)
PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated) ) (
PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-01 00:00:00') ),
PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-02 00:00:00') ),
PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-03 00:00:00') ),
PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-04 00:00:00') ),
PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-05 00:00:00') ),
PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-06 00:00:00') ),
PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-07 00:00:00') ),
PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-08 00:00:00') ),
PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-09 00:00:00') ),
PARTITION p9 VALUES LESS THAN (MAXVALUE)
);
Partitions will be created by the DBA and the rest will be taken care of by Oracle.
If you want to delete partitions, you will have to write separate jobs for that.

Does storing date without time use less bytes?

I saw this link and this:
If you specify a date value without a time component, then the default
time is midnight. If you specify a date value without a date, then the
default date is the first day of the current month.
Oracle DATE columns always contain fields for both date and time. If
your queries use a date format without a time portion, then you must
ensure that the time fields in the DATE column are set to midnight.
The suggested solution is to put a constraint on the DATE column and create a trigger (using TRUNC()) that fires when a row is inserted or updated.
If I use this solution, do I have any guarantee that Oracle stores fewer bytes for a date without the time?
In my opinion, this combined date-and-time type creates ambiguity. Is it so hard to provide a date type that contains only a date? (I come from MSSQL.)
No, you do not have any guarantee whatsoever... whatever happens, Oracle is going to store the fact that it's midnight. You cannot store a date without a time.
If you create the following table:
create table a ( dt date);
insert into a values(sysdate);
insert into a values(trunc(sysdate));
and then run this query:
select dt, dump(dt) from a
The values returned are:
+-----------------------------+------------------------------------+
| DT                          | DUMP(DT)                           |
+-----------------------------+------------------------------------+
| June, 12 2013 18:03:15+0000 | Typ=12 Len=7: 120,113,6,12,19,4,16 |
| June, 12 2013 00:00:00+0000 | Typ=12 Len=7: 120,113,6,12,1,1,1   |
+-----------------------------+------------------------------------+
DUMP() returns the datatype, the length in bytes and the internal representation of the data.
In other words, a date with a time and a date that has been truncated both take 7 bytes. They're the same length.
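To see why the truncated value is no shorter, it helps to decode the seven bytes that DUMP() prints. The Python sketch below is not from the original answer; it assumes the commonly described layout for Typ=12 values (century and year stored in excess-100, month and day as-is, and hour, minute and second each stored plus one), and the function name is illustrative.

```python
from datetime import datetime
from typing import Sequence


def decode_oracle_date(raw: Sequence[int]) -> datetime:
    """Decode the 7 bytes shown by DUMP() for a DATE (Typ=12).

    Assumed layout: century+100, year+100, month, day,
    hour+1, minute+1, second+1.
    """
    if len(raw) != 7:
        raise ValueError("expected exactly 7 bytes for an Oracle DATE")
    century, year, month, day, hour, minute, second = raw
    return datetime(
        (century - 100) * 100 + (year - 100),
        month,
        day,
        hour - 1,    # time fields are stored plus one,
        minute - 1,  # so midnight shows up as 1,1,1
        second - 1,
    )


# The two byte sequences from the DUMP() output.
with_time = decode_oracle_date([120, 113, 6, 12, 19, 4, 16])
truncated = decode_oracle_date([120, 113, 6, 12, 1, 1, 1])

print(with_time)
print(truncated)
```

Under that layout the trailing 1,1,1 in the second row is simply midnight: the three time bytes are still stored, which is why both rows report Len=7.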
As a little aside I would recommend against destroying potentially useful data because you're worried about space.
Storing only dates may save space if you use table compression.
Here's an example showing that storing only dates can reduce the segment size:
create table a (dt date) compress;
create table b (dt date) compress;
--Insert 20 million rows, with time
begin
for i in 1 .. 20 loop
insert /*+ append */ into a
select sysdate + numToDSInterval(level, 'second')
from dual connect by level <= 1000000;
commit;
end loop;
end;
/
--Insert 20 million rows, date only
begin
for i in 1 .. 20 loop
insert /*+ append */ into b
select trunc(sysdate + numToDSInterval(level, 'second'))
from dual connect by level <= 1000000;
commit;
end loop;
end;
/
select segment_name, bytes/1024/1024 MB
from dba_segments
where segment_name in ('A', 'B')
order by segment_name;
SEGMENT_NAME  MB
------------ ---
A            256
B            224
Oracle basic table compression only compresses entire values, and if there are fewer distinct values then compression can work better. But never fully believe any compression demo: you need to try it on your own data to be sure. This may be a best-case scenario, and it is possible that compression will not help your data at all.
Table compression has many downsides - it requires enterprise edition, DML is slower, you cannot add a column to the table, etc.
Also, as Ben suggested, you should enforce the date-only rule with a check constraint instead of a trigger. It will be simpler, faster, and will not prevent direct-path writes, which are necessary to use basic table compression.
