How to do range partitioning on a varchar column in Vertica

I have a big table and want to partition it on a varchar column. I have tried to partition it using this script in Vertica:
create table tb1(
symbol varchar not null,
...
mmid varchar)
PARTITION BY symbol;
I believe that PARTITION BY did value partitioning on the symbol column; when I loaded data into the table, it failed with "too many partitions", as expected.
How can I do range partitioning on symbol column?
For example, I know that DolphinDB can do this using the script below:
sym = `a`abc`aaa`bbc`bac`b`c`cd`cab`abd
val = rand(1.0, 10)
t=table(sym, val)
db=database("/tmp/db", RANGE, `a`b`c`d)
db.createPartitionedTable(t, `table, `sym)
The partitions will be a-b, b-c, and c-d.

You can use any deterministic function in the PARTITION BY clause.
For example:
CREATE TABLE tb1 (
symbol varchar NOT NULL
) PARTITION BY LEFT(symbol,2);
INSERT /*+direct*/ INTO tb1 SELECT 'abc';
INSERT /*+direct*/ INTO tb1 SELECT 'bbc';
INSERT /*+direct*/ INTO tb1 SELECT 'bca';
SELECT DISTINCT partition_key
FROM partitions
WHERE projection_name LIKE 'tb1%';
partition_key
---------------
ab
bb
bc
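
If you want explicit range buckets like the DolphinDB example (a-b, b-c, c-d), a CASE expression should work as well, since CASE is deterministic. This is an untested sketch, and the bucket labels are my own choice:
CREATE TABLE tb1 (
symbol varchar NOT NULL
) PARTITION BY (
CASE
WHEN symbol < 'b' THEN 'a-b'  -- everything below 'b'
WHEN symbol < 'c' THEN 'b-c'
WHEN symbol < 'd' THEN 'c-d'
ELSE 'd-'                     -- catch-all so the expression never returns NULL
END
);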

Related

Yearly Partitioning an Integer column which stores Dates as IDs

I have a table which has a DATE_ID column (Integer data type). This column basically stores dates in ID format. Input data:
DATE_ID
19961210
19991001
20051212
20090108
I need to partition this table (yearly) based on the date_id column. Please note that this is an existing process and we are migrating our tables from one database to another. The column datatypes cannot be changed, as downstream processes refer to them in exactly that form.
I tried the interval partitioning below, but it didn't work. Can someone please help?
CREATE TABLE test (date_id INT NOT NULL, text VARCHAR2(500))
PARTITION BY RANGE (DATE_ID) INTERVAL (365)
(
PARTITION P0 VALUES LESS THAN (19961231),
PARTITION P1 VALUES LESS THAN (19991231),
PARTITION P2 VALUES LESS THAN (20091231)
);
I got the solution: since the dates are stored as YYYYMMDD integers, an interval of 10000 advances the boundary by exactly one year:
CREATE TABLE test (
DATE_ID INTEGER
)
partition BY RANGE (DATE_ID )
interval(10000)
(partition p0 values less than(19961231),
partition p1 values less than(19991231),
partition p2 values less than(20091231)
);
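To see which interval partitions Oracle creates automatically as data arrives, you can load a few rows and query the data dictionary (a standard view, shown here only as an illustration):
INSERT INTO test VALUES (20051212);
INSERT INTO test VALUES (20090108);
-- list the partitions, including the auto-created interval ones
SELECT partition_name, high_value
FROM user_tab_partitions
WHERE table_name = 'TEST';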
You can add a virtual column in order to use interval partitioning with a proper date-typed column; it also lets you cross-check that the existing integer values are valid dates:
CREATE TABLE test(
date_id INT NOT NULL,
text VARCHAR2(500)
);
ALTER TABLE test ADD (dt AS ( TO_DATE(date_id,'yyyymmdd') ));
ALTER TABLE test MODIFY
PARTITION BY RANGE(dt) INTERVAL (INTERVAL '1' YEAR)
(
PARTITION P1996 VALUES LESS THAN (date'1997-01-01'),
PARTITION P1999 VALUES LESS THAN (date'2000-01-01'),
PARTITION P2009 VALUES LESS THAN (date'2010-01-01')
);
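A side benefit of the virtual column, as mentioned, is that integer values which are not valid dates are now rejected on insert, because dt must be evaluated to route the row to a partition. For illustration (the exact ORA- error depends on the bad value):
INSERT INTO test (date_id, text) VALUES (20090108, 'valid date, accepted');
INSERT INTO test (date_id, text) VALUES (20091301, 'month 13, rejected by TO_DATE');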
If your aim is to add yearly partitions, as the title implies, for the years 1996 through 2006, you can instead generate the desired DDL dynamically:
DECLARE
v_ddl CLOB;
BEGIN
FOR year in 1996..2006
LOOP
v_ddl := v_ddl||' PARTITION p'||year||' VALUES LESS THAN (date'''||TO_CHAR(year+1)||'-01-01''),'||CHR(13);
END LOOP;
v_ddl := 'ALTER TABLE test MODIFY PARTITION BY RANGE(dt) INTERVAL (INTERVAL ''1'' YEAR)'||CHR(13)||'('||CHR(13)||RTRIM(v_ddl,','||CHR(13));
v_ddl := v_ddl||CHR(13)||')';
DBMS_OUTPUT.PUT_LINE(v_ddl);
EXECUTE IMMEDIATE v_ddl;
END;
/

How can I handle uniqueness in this situation?

I have a table like this:
create table my_table
(
type1 varchar2(10 char),
type2 varchar2(10 char)
);
I want uniqueness like this:
If the type1 column has the value 'GENERIC', then type2 alone must be unique across the table. For example, if one row has type1 = 'GENERIC' and type2 = 'value_x', then no other row may have type2 = 'value_x'.
Otherwise, uniqueness applies to both columns together, i.e. rows must be unique by (type1, type2). (The first rule always takes precedence.)
I tried to do it with a trigger:
CREATE OR REPLACE trigger my_trigger
BEFORE INSERT OR UPDATE
ON my_table
FOR EACH ROW
DECLARE
lvn_count NUMBER :=0;
lvn_count2 NUMBER :=0;
errormessage clob;
MUST_ACCUR_ONE EXCEPTION;
-- PRAGMA AUTONOMOUS_TRANSACTION; --without this it gives mutating error but I cant use this because it will conflict on simultaneous connections
BEGIN
IF :NEW.type1 = 'GENERIC' THEN
SELECT count(1) INTO lvn_count FROM my_table
WHERE type2= :NEW.type2;
ELSE
SELECT count(1) INTO lvn_count2 FROM my_table
WHERE type1= :NEW.type1 and type2= :NEW.type2;
END IF;
IF (lvn_count >= 1 or lvn_count2 >= 1) THEN
RAISE MUST_ACCUR_ONE;
END IF;
END;
But without the pragma it gives a mutating-table error, because the trigger queries the same table; and I can't use the pragma because it will cause conflicts between simultaneous sessions.
I also tried to do it with a unique index, but couldn't manage it:
CREATE UNIQUE INDEX my_table_unique_ix
ON my_table (case when type1= 'GENERIC' then 'some_logic_here' else type1 end, type2); -- I know it does not make sense but maybe there is something different that I can use in here.
Examples;
Example 1:
insert into my_table (type1,type2) values ('a','b'); -- its ok no problem
insert into my_table (type1,type2) values ('a','c'); -- its ok no problem
insert into my_table (type1,type2) values ('c','b'); -- its ok no problem
insert into my_table (type1,type2) values ('GENERIC','b'); -- should fail: 'b' already exists (when the first column is 'GENERIC', only the second column matters)
Example 2:
insert into my_table (type1,type2) values ('GENERIC','b'); -- its ok no problem
insert into my_table (type1,type2) values ('a','c'); -- its ok no problem
insert into my_table (type1,type2) values ('d','c'); -- its ok no problem
insert into my_table (type1,type2) values ('d','b'); -- should fail: the type2 value 'b' is already taken by a row whose type1 is 'GENERIC'
What you're trying to do is not really straightforward in Oracle. One possible (although somewhat cumbersome) approach is to use a combination of
an additional materialized view with refresh (on commit)
a windowing function to compute the number of distinct values per group
a windowing function to compute the number of GENERIC rows per group
a check constraint to ensure that either we have only one DISTINCT value or we don't have GENERIC in the same group
This should work:
create materialized view mv_my_table
refresh on commit
as
select
type1,
type2,
count(distinct type1) over (partition by type2) as distinct_type1_cnt,
count(case when type1 = 'GENERIC' then 1 else null end)
over (partition by type2) as generic_cnt
from my_table;
alter table mv_my_table add constraint chk_type1
CHECK (distinct_Type1_cnt = 1 or generic_cnt = 0);
Now, INSERTing a duplicate won't fail immediately, but the subsequent COMMIT will fail because it triggers the materialized view refresh, and that will cause the check constraint to fire.
Disadvantages
duplicate INSERTs won't fail immediately (making debugging more painful)
depending on the size of your table, the MView refresh might slow down COMMITs considerably
Links
For a more detailed discussion of this approach, see AskTom on cross-row constraints
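For illustration, with the materialized view in place, the failing sequence from Example 1 would look roughly like this (the exact error stack may vary):
insert into my_table (type1,type2) values ('a','b');       -- succeeds
insert into my_table (type1,type2) values ('GENERIC','b'); -- also succeeds, for now
commit; -- fails here: the on-commit refresh violates chk_type1 (ORA-12008 wrapping ORA-02290)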
Try it like this:
CREATE TABLE my_table (
type1 VARCHAR2(10 CHAR),
type2 VARCHAR2(10 CHAR),
type1_unique VARCHAR2(10 CHAR) GENERATED ALWAYS AS ( NULLIF(type1, 'GENERIC') ) VIRTUAL
);
ALTER TABLE MY_TABLE ADD (CONSTRAINT my_table_unique_ix UNIQUE (type1_unique, type2) USING INDEX)
Or an index like this should also work:
CREATE UNIQUE INDEX my_table_unique_ix ON MY_TABLE (NULLIF(type1, 'GENERIC'), type2);
Or doing it in your style (you only missed the END):
CREATE UNIQUE INDEX my_table_unique_ix ON my_table (case when type1= 'GENERIC' then null else type1 end, type2);
Unless I'm missing something obvious, the logic in the answer from @Frank Schmitt can also be implemented using a statement-level trigger. It is a lot simpler to implement and does not have the disadvantages Frank mentions.
create or replace TRIGGER my_table_t
AFTER INSERT OR UPDATE OR DELETE
ON my_table
DECLARE
l_dummy NUMBER;
MUST_ACCUR_ONE EXCEPTION;
BEGIN
WITH constraint_violated AS
(
select
type1,
type2,
count(distinct type1) over (partition by type2) as distinct_type1_cnt,
count(case when type1 = 'GENERIC' then 1 else null end)
over (partition by type2) as generic_cnt
from my_table
)
SELECT 1 INTO l_dummy
FROM constraint_violated
WHERE NOT (distinct_type1_cnt = 1 or generic_cnt = 0) FETCH FIRST 1 ROWS ONLY;
RAISE MUST_ACCUR_ONE;
EXCEPTION WHEN NO_DATA_FOUND THEN
NULL;
END;
/
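For illustration, Example 1 from the question would then behave like this:
insert into my_table (type1,type2) values ('a','b');       -- ok
insert into my_table (type1,type2) values ('GENERIC','b'); -- fails immediately: the statement-level trigger raises MUST_ACCUR_ONE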

Different Output with same Input for ORACLE MD5 Function

At a given time I stored the result of the following ORACLE SQL Query :
SELECT col, TO_CHAR( LOWER( STANDARD_HASH( col, 'MD5' ) ) ) AS hash_col FROM MyTable;
A week later, I executed the same query on the same data (same values for column col).
I expected the resulting hash_col column to have the same values as in the former execution, but that was not the case.
Can Oracle's STANDARD_HASH function be relied on to return the same result for identical input data over time?
It does if the function is called twice on the same day.
All we have about the data changing (or not) and the hash changing (or not) is your assertion.
You could create and populate a log table:
create table hash_log (
sample_time timestamp,
hashed_string varchar2(200),
hashed_string_dump varchar2(200),
hash_value varchar2(200)
);
Then on a daily basis:
insert into hash_log
select systimestamp,
source_column,
dump(source_column),
STANDARD_HASH(source_column , 'MD5' )
from source_table;
Then, to spot changes:
select distinct hashed_string ||
hashed_string_dump ||
hash_value
from hash_log;
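As a quick illustration of why DUMP is captured alongside the hash: two values that render identically can still differ byte for byte (trailing spaces, character set), and then the hash differs too:
SELECT STANDARD_HASH('abc', 'MD5') AS h1,
STANDARD_HASH('abc ', 'MD5') AS h2, -- trailing space: different bytes, different hash
DUMP('abc') AS d1,
DUMP('abc ') AS d2
FROM dual;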

How to use insert statement for a Hive partitioned table?

I have a hive table dynpart.
id int
name char(30)
city char(30)
thisday string
# Partition Information
# col_name data_type comment
thisday string
It is partitioned by 'thisday' whose datatype is STRING.
How can I insert a single record into a particular partition of the table? I know there is the LOAD command to load an entire file into a Hive table; I just want to know how an INSERT statement can be written for a partitioned table. I tried to write a command like the one below, but it takes data from another table.
insert into droplater partition(thisday='30/03/2017') select * from dynpart;
The table droplater has the same structure as dynpart, but the above command inserts data from another table. What I'd like to learn is how to write a simple INSERT into a partition, like: insert into tabname values (1,"abcd","efgh");
This will work for primitive types only (no arrays, structs etc.)
insert into tabname partition (thisday='30/03/2017') values (1,"abcd","efgh");
This will work for all types
insert into tabname partition (thisday='30/03/2017') select 1,"abcd","efgh";
P.S.
By all means, partition your table by an actual date column (thisday date):
insert into tabname partition (thisday=date '2017-03-30') ...
or at least use the ISO date format
insert into tabname partition (thisday='2017-03-30') ...
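If the partition value should come from the data itself rather than being hard-coded, dynamic partitioning is another option. A sketch (assumes nonstrict mode is permitted; the partition column goes last in the select list):
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO tabname PARTITION (thisday)
SELECT 1, "abcd", "efgh", '2017-03-30';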

Hive: Cannot refer to partitions in where clause

I created a table partitioned by date, but I cannot use the partition in a WHERE clause.
Here is the process
step1:
CREATE TABLE new_table (
a string,
b string
)
PARTITIONED BY (dt string);
Step2:
Insert overwrite table new_table partition (dt=$date)
Select a, b from my_table
where dt = '$date'
Table is created.
Describe new_table;
a string
b string
dt string
Problem:
select * from new_table where dt='$date'
returns empty set.
whereas
select * from new_table
returns a, b, and dt.
Does anyone know what might be the reason for this?
Thanks
Sahara, partitions in where clause do work. Any chance you are specifying the wrong value in the RHS of your predicate?
You may also want to see the data inside /user/hive/warehouse/new_table (by default) to see if the data there looks the way you expect it to be.
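A quicker check than browsing the warehouse directory is to list the partition values Hive actually knows about, and compare them character by character with the literal in your predicate:
SHOW PARTITIONS new_table;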
Try inserting with dynamic partitioning, so dt is taken from the last column of the select:
Insert overwrite table new_table partition (dt)
Select a, b, dt from my_table
where dt = '$date'
