How to change PARTITION in ClickHouse

version 18.16.1
CREATE TABLE traffic (
`date` Date,
...
) ENGINE = MergeTree(date, (end_time), 8192);
I want to change it to PARTITION BY toYYYYMMDD(date) without dropping the table. How can I do that?

Since ALTER TABLE does not allow changing the partition key, the way to do it is to create a new table
CREATE TABLE traffic_new
(
`date` Date,
...
)
-- note: a custom PARTITION BY requires the new-style MergeTree syntax
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(date)
ORDER BY end_time
SETTINGS index_granularity = 8192;
and to move your data
INSERT INTO traffic_new SELECT * FROM traffic WHERE column BETWEEN x and xxxx;
Rename the final table if necessary.
And yes, this option involves dropping the old table at the end (there seems to be no way to skip this step).
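Once the copy is complete, the swap can be done with RENAME TABLE, which renames several tables in one statement (a minimal sketch; traffic_old is an illustrative name):
RENAME TABLE traffic TO traffic_old, traffic_new TO traffic;
DROP TABLE traffic_old;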

Related

Moving data from one partition to another

I have a huge table containing about 500 million records. This table has 2 partitions, CURRENT_DATA and PREVIOUS_DATA.
When new data comes in, the PREVIOUS_DATA partition is truncated, and the data from the CURRENT_DATA partition is moved into the PREVIOUS_DATA partition.
Right now, I am updating the records to move them from one partition to the other. However, it takes a long time. Is there any faster way to move data from one partition to another?
Don't try to do this with DML; instead DROP, RENAME, and MODIFY the partitions, since the partition previous_data has already been truncated, such as
(presuming you have a list partition with VALUE = 'A' for the current partition, and VALUE = 'B' for the previous one)
ALTER TABLE tab
DROP PARTITION previous_data;
ALTER TABLE tab
RENAME PARTITION current_data TO previous_data;
ALTER TABLE tab
MODIFY PARTITION previous_data
ADD VALUES ('B');
ALTER TABLE tab
MODIFY PARTITION previous_data
DROP VALUES ('A');
ALTER TABLE tab
ADD PARTITION current_data VALUES ( 'A' );
Check out the partitions through
SELECT partition_name, high_value FROM USER_TAB_PARTITIONS WHERE table_name = 'TAB';
before and after applying the operations above (both cases should return the same results).
Be careful! These are dangerous operations and shouldn't be tried on production systems, only on test databases!
If the only goal is to separate current data from previous data, why not just use two different tables, PREVIOUS_DATA and CURRENT_DATA? You can easily get the complete picture by unioning the two together:
SELECT * FROM PREVIOUS_DATA
UNION ALL
SELECT * FROM CURRENT_DATA
When new data comes you can drop PREVIOUS_DATA table, rename CURRENT_DATA to PREVIOUS_DATA and then create new empty CURRENT_DATA table:
DROP TABLE PREVIOUS_DATA;
ALTER TABLE CURRENT_DATA RENAME TO PREVIOUS_DATA;
CREATE TABLE CURRENT_DATA AS SELECT * FROM PREVIOUS_DATA WHERE 1 = 2; /*add comments and constraints*/
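If downstream queries expect a single object, a view over that union keeps them working unchanged (a sketch; the view name ALL_DATA is illustrative):
CREATE OR REPLACE VIEW all_data AS
SELECT * FROM previous_data
UNION ALL
SELECT * FROM current_data;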

Performance of date time concatenation into timestamp

Oracle 12C, non partitioned, no ASM.
This is the background. I have a table with multiple columns, 3 of them being -
TRAN_DATE DATE
TRAN_TIME TIMESTAMP(6)
FINAL_DATETIME TIMESTAMP(6) NOT NULL
The table has around 70 million records. What I want to do is concatenate the tran_date and the tran_time field and update the final_datetime field with that output, for all 70 million records.
This is the query I have -
update MYSCHEMA.MYTAB set FINAL_DATETIME = (to_timestamp( (to_char(tran_date, 'YYYY/MM/DD') || ' ' ||to_char(TRAN_TIME,'HH24MISS') ), 'YYYY-MM-DD HH24:MI:SS.FF'))
Eg:
At present (for one record)
TRAN_DATE=01-DEC-16
TRAN_TIME=01-JAN-70 12.35.58.000000 AM /*I need only the time part from this*/
FINAL_DATETIME=22-OCT-18 04.37.18.870000 PM
After the update, FINAL_DATETIME needs to be
01-DEC-16 12.35.58.000000 AM
to_timestamp does require two character strings, and I fear this will slow the update down a lot. Any suggestions?
What more can I do to increase performance? No one else will be using the table at this point, so I do have the option to
Drop indices
Turn off logging
and anything more anyone can suggest.
Any help is appreciated.
I would prefer the CTAS (CREATE TABLE AS SELECT) method, and your job is simpler if you don't have indexes, triggers, or constraints on your table.
Create a new table for the column to be modified.
CREATE TABLE mytab_new
NOLOGGING
AS
SELECT /*+ FULL(mytab) PARALLEL(mytab, 10) */ --hint to speed up the select.
CAST(tran_date AS TIMESTAMP) + ( tran_time - trunc(tran_time) ) AS final_datetime
FROM mytab;
I have included only one column (the final one) in the new table, because storing the other two again would be a waste of resources. You may include the other columns in the SELECT, apart from the two that are now redundant.
Read about logging/nologging to learn what the NOLOGGING option above does.
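As a quick sanity check of that expression on the question's sample values (a sketch against dual; TRUNC on a TIMESTAMP returns a DATE, and TIMESTAMP minus DATE yields an INTERVAL DAY TO SECOND):
SELECT CAST(TO_DATE('01-DEC-16', 'DD-MON-RR') AS TIMESTAMP)
       + ( TO_TIMESTAMP('01-JAN-70 12.35.58.000000 AM', 'DD-MON-RR HH.MI.SS.FF AM')
           - TRUNC(TO_TIMESTAMP('01-JAN-70 12.35.58.000000 AM', 'DD-MON-RR HH.MI.SS.FF AM')) ) AS final_datetime
FROM dual;
-- expected: 01-DEC-16 12.35.58.000000 AM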
The next step is to rebuild the indexes, triggers, and constraints on the new table mytab_new, using the definitions from mytab, if they exist.
Then rename the tables
rename mytab to mytab_bkp;
rename mytab_new to mytab;
You may drop the table mytab_bkp after validating the new table or later when you feel you no longer need it.
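Before dropping it, a quick validation might be to compare row counts (a minimal sketch):
SELECT (SELECT COUNT(*) FROM mytab)     AS new_count,
       (SELECT COUNT(*) FROM mytab_bkp) AS old_count
FROM dual;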

SQL Error: ORA-14006: invalid partition name

I am trying to partition an existing table in Oracle 12C R1 using below SQL statement.
ALTER TABLE TABLE_NAME MODIFY
PARTITION BY RANGE (DATE_COLUMN_NAME)
INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
(
PARTITION part_01 VALUES LESS THAN (TO_DATE('01-SEP-2017', 'DD-MON-RRRR'))
) ONLINE;
Getting error:
Error report -
SQL Error: ORA-14006: invalid partition name
14006. 00000 - "invalid partition name"
*Cause: a partition name of the form <identifier> is
expected but not present.
*Action: enter an appropriate partition name.
Partitioning needs to be done on a DATE-type column with an interval of one month.
The minimum value of the date column in the table is 01-SEP-2017.
You can't partition an existing table like that in 12c R1. That statement tries to modify a partitioning scheme that hasn't been created yet. I don't know of an automatic way to do this operation there, and I'm not sure it's possible.
I have done this many times, though, with manual steps. Do the following if you can't find an automated solution (see the sketch after this list):
Create a partitioned table named table_name_part with your clauses and all your preferences.
Insert all the rows from the original table into the partitioned table. Pay attention to compression: if the table uses compression (basic or HCC), you have to use the /*+ APPEND */ hint.
Create the constraints and indexes from the original table on the partitioned table.
Rename the tables and drop the original table. Do not drop it until you have compared row counts between them.
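A minimal sketch of these manual steps, reusing the question's table and partition clause (table_name, date_column_name, and table_name_old are placeholders; index and constraint DDL omitted):
CREATE TABLE table_name_part
PARTITION BY RANGE (date_column_name)
INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
(PARTITION part_01 VALUES LESS THAN (TO_DATE('01-SEP-2017', 'DD-MON-RRRR')))
AS SELECT * FROM table_name WHERE 1 = 0;
INSERT /*+ APPEND */ INTO table_name_part SELECT * FROM table_name;
COMMIT;
-- recreate indexes and constraints on table_name_part here
ALTER TABLE table_name RENAME TO table_name_old;
ALTER TABLE table_name_part RENAME TO table_name;
-- keep table_name_old until the row counts have been compared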
I see that your table auto-creates partitions when they do not exist (INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))), so you only have to create the new table with the first partition. I assume you have a lot of read-only data here, so you won't have any consistency problems except for the last month. For the read-write data you have to be more careful about the moment when you insert data into the new table and switch the tables.
Hope this helps. As far as I know, there is a package named DBMS_REDEFINITION that can automate a version of my steps. If you need more details or some help with my method, please don't hesitate to ask.
UPDATE:
From Oracle 12c R2 onward you can convert an unpartitioned table to a partitioned one with exactly your statement; find a link below. This became a challenge for me and I tried the conversion, but I think there is no way to make it online in 12c R1.
In previous releases you could partition a non-partitioned table using
EXCHANGE PARTITION or DBMS_REDEFINITION in an "almost online" manner,
but both methods require multiple steps. Oracle Database 12c Release 2
makes it easier than ever to convert a non-partitioned table to a
partitioned table, requiring only a single command and no downtime.
https://oracle-base.com/articles/12c/online-conversion-of-a-non-partitioned-table-to-a-partitioned-table-12cr2
Solution
I found a solution for you. Here are all of the steps I ran to convert the table online. :)
1. Create a regular table and populate it.
CREATE TABLE SCOTT.tab_unpartitioned
(
id NUMBER,
description VARCHAR2 ( 50 ),
created_date DATE
);
INSERT INTO tab_unpartitioned
SELECT LEVEL,
'Description for ' || LEVEL,
ADD_MONTHS ( TO_DATE ( '01-JAN-2017', 'DD-MON-YYYY' ),
-TRUNC ( DBMS_RANDOM.VALUE ( 1, 4 ) - 1 ) * 12 )
FROM DUAL
CONNECT BY LEVEL <= 10000;
COMMIT;
2. Create a partitioned table with the same structure.
--If you are on 11g create table with CREATE TABLE command but with different name. ex: tab_partitioned
CREATE TABLE SCOTT.tab_partitioned
(
id NUMBER,
description VARCHAR2 ( 50 ),
created_date DATE
)
PARTITION BY RANGE (created_date)
INTERVAL( NUMTOYMINTERVAL(1,'YEAR'))
(PARTITION part_2015 VALUES LESS THAN (TO_DATE ( '01-JAN-2016', 'DD-MON-YYYY' )),
PARTITION part_2016 VALUES LESS THAN (TO_DATE ( '01-JAN-2017', 'DD-MON-YYYY' )),
PARTITION part_2017 VALUES LESS THAN (TO_DATE ( '01-JAN-2018', 'DD-MON-YYYY' )));
--Alternatively, on 12c R2 you can create the copy unpartitioned and convert it with this ALTER (it does not work in 12c R1).
ALTER TABLE tab_partitioned
MODIFY
PARTITION BY RANGE (created_date)
(PARTITION part_2015 VALUES LESS THAN (TO_DATE ( '01-JAN-2016', 'DD-MON-YYYY' )),
PARTITION part_2016 VALUES LESS THAN (TO_DATE ( '01-JAN-2017', 'DD-MON-YYYY' )),
PARTITION part_2017 VALUES LESS THAN (TO_DATE ( '01-JAN-2018', 'DD-MON-YYYY' )));
3. Check whether the table can be converted. This procedure should run without any error.
Prerequisites: the table should have a UNIQUE INDEX and a PRIMARY KEY constraint.
EXEC DBMS_REDEFINITION.CAN_REDEF_TABLE('SCOTT','TAB_UNPARTITIONED');
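If the demo table lacks them, the prerequisite can be satisfied first (a sketch; the constraint name is illustrative, and the primary key's index also serves as the unique index):
ALTER TABLE SCOTT.tab_unpartitioned
ADD CONSTRAINT tab_unpartitioned_pk PRIMARY KEY (id);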
4. Run the following steps as shown.
EXEC DBMS_REDEFINITION.START_REDEF_TABLE('SCOTT','TAB_UNPARTITIONED','TAB_PARTITIONED');
VARIABLE num_errors NUMBER
EXEC DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS('SCOTT','TAB_UNPARTITIONED','TAB_PARTITIONED', 1,TRUE,TRUE,TRUE,FALSE,:NUM_ERRORS,FALSE);
PRINT num_errors -- should return 0
EXEC DBMS_REDEFINITION.SYNC_INTERIM_TABLE('SCOTT','TAB_UNPARTITIONED','TAB_PARTITIONED');
EXEC DBMS_REDEFINITION.FINISH_REDEF_TABLE('SCOTT','TAB_UNPARTITIONED','TAB_PARTITIONED');
At the end of the script you will see that the original table is partitioned.
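You can confirm this through the data dictionary, just as in the first answer above (a sketch):
SELECT partition_name, high_value
FROM user_tab_partitions
WHERE table_name = 'TAB_UNPARTITIONED';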
Try Oracle Live SQL. I was using Oracle 11g EE and got the same error message, so I tried Oracle Live SQL and it worked perfectly. It has a very simple, easy-to-understand interface.
For example, here I create a sales table, insert some dummy data, and partition it using the range partitioning method:
CREATE TABLE sales
(product VARCHAR(300),
country VARCHAR(100),
sales_year DATE);
INSERT INTO sales (product, country, sales_year )
VALUES ('Computer','Kazakhstan',TO_DATE('01/02/2018','DD/MM/YYYY'));
INSERT INTO sales (product, country, sales_year )
VALUES ('Mobile Phone','China',TO_DATE('23/12/2019','DD/MM/YYYY'));
INSERT INTO sales (product, country, sales_year )
VALUES ('Camara','USA',TO_DATE('20/11/2020','DD/MM/YYYY'));
INSERT INTO sales (product, country, sales_year )
VALUES ('Watch','Bangladesh',TO_DATE('19/03/2020','DD/MM/YYYY'));
INSERT INTO sales (product, country, sales_year )
VALUES ('Cake','Sri Lanka',TO_DATE('13/04/2021','DD/MM/YYYY'));
ALTER TABLE sales MODIFY
PARTITION BY RANGE(sales_year)
INTERVAL(INTERVAL '1' YEAR)
(
PARTITION sales_2018 VALUES LESS THAN(TO_DATE('01/01/2019','DD/MM/YYYY')),
PARTITION sales_2019 VALUES LESS THAN(TO_DATE('01/01/2020','DD/MM/YYYY')),
PARTITION sales_2020 VALUES LESS THAN(TO_DATE('01/01/2021','DD/MM/YYYY')),
PARTITION sales_2021 VALUES LESS THAN(TO_DATE('01/01/2022','DD/MM/YYYY'))
)ONLINE;
Finally, I can write a SELECT query against a partition to confirm that the partitions were created successfully.
SELECT *
FROM sales PARTITION (sales_2020);
And it gives the expected output: the two rows whose sales_year falls in 2020.

Hive insert overwrite truncates the table in a few cases

I was working on a solution and found that in some particular cases Hive INSERT OVERWRITE truncates the table, while in a few cases it doesn't. Would someone please explain why it behaves like that?
To explain this, I am taking two tables, source and target, and trying to insert data into target from the source table using INSERT OVERWRITE.
When the source table has a partition
If the source table is partitioned and you write a condition on a partition that does not exist, it won't truncate the target table.
create table source (name String) partitioned by (age int);
insert into source partition (age) values("gaurang", 11);
create table target (name String, age int);
insert into target values("xxx", 99);
The following query won't truncate the table even if the SELECT doesn't return anything:
insert overwrite table temp.target select * from temp.source where name="Ddddd" and age=99;
However, the following query will truncate the table:
insert overwrite table temp.target select * from temp.source where name="Ddddd" and age=11;
It makes sense in the first case: since the partition (age=99) does not exist in the source, it should stop the execution of the query early. However, this is my assumption; I am not sure what exactly happens.
When the source table doesn't have a partition, but the target does
In this case the target table won't be truncated even if the SELECT from the source table returns 0 rows.
use temp;
drop table if exists source1;
drop table if exists target1;
create table source1 (name String, age int);
create table target1 (name String) partitioned by (age int);
insert into source1 values ("gaurang", 11);
insert into target1 partition(age) values("xxx", 99);
select * from source1;
select * from target1;
The following query won't truncate the table even if no data is found by the SELECT.
insert overwrite table temp.target1 partition(age) select * from temp.source1 where age=90;
When neither source nor target has a partition
In this case, if I INSERT OVERWRITE the target and the SELECT statement doesn't return any rows, the target table will be truncated.
Check the example below.
use temp;
drop table if exists source1;
drop table if exists target1;
create table source1 (name String, age int);
create table target1 (name String, age int);
insert into source1 values ("gaurang", 11);
insert into target1 values("xxx", 99);
select * from source1;
select * from target1;
The following query will truncate the target table.
insert overwrite table temp.target1 select * from temp.source1 where age=90;
Better to use the term 'overwrite' instead of 'truncate', because overwriting is exactly what happens during INSERT OVERWRITE.
When you write INSERT OVERWRITE TABLE temp.target1 PARTITION(age), you instruct Hive to overwrite partitions, not the whole target1 table: only those partitions that are returned by the SELECT.
An empty dataset will not overwrite any partitions in dynamic partition mode, because the partition to overwrite is unknown: the partition value has to be taken from the dataset, and since the dataset is empty, there is nothing to overwrite.
In the case of a non-partitioned table, it is already known that the whole table should be overwritten, so it is overwritten whether the dataset is empty or not.
The partition column in an INSERT OVERWRITE statement should be the last column. The list of partitions overwritten in the target equals the list of values in the partition column returned by the dataset; how the source table is partitioned does not matter (you can derive the target partition column from any source column, compute it, or use a constant). Only what is returned matters.
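For contrast, a static partition spec names the target partition explicitly, so Hive knows what to overwrite even when the SELECT returns nothing (a sketch against the tables from the second example above):
insert overwrite table temp.target1 partition(age=99) select name from temp.source1 where age=90;
-- partition age=99 is emptied even though the select returned 0 rows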

How to duplicate all data in a table except for a single column that should be changed

I have a question regarding a unified insert query against tables with different data
structures (Oracle). Let me elaborate with an example:
tb_customers (
id NUMBER(3), name VARCHAR2(40), archive_id NUMBER(3)
)
tb_suppliers (
id NUMBER(3), name VARCHAR2(40), contact VARCHAR2(40), xxx, xxx,
archive_id NUMBER(3)
)
The only column that is present in all tables is [archive_id]. The plan is to create a new archive of the dataset by copying (duplicating) all records to a different database partition and incrementing the archive_id for those records accordingly. [archive_id] is always part of the primary key.
My problem is with select statements to do the actual duplication of the data. Because the columns are variable, I am struggling to come up with a unified select statement that will copy the data and update the archive_id.
One solution (that works), is to iterate over all the tables in a stored procedure and do a:
CREATE TABLE temp as (SELECT * from ORIGINAL_TABLE);
UPDATE temp SET archive_id=something;
INSERT INTO ORIGINAL_TABLE (select * from temp);
DROP TABLE temp;
I do not like this solution very much as the DDL commands muck up all restore points.
Does anyone else have any solution?
How about creating a global temporary table for each base table?
create global temporary table tb_customers$ as select * from tb_customers;
create global temporary table tb_suppliers$ as select * from tb_suppliers;
You don't need to create and drop these each time, just leave them as-is.
Your archive process is then a single transaction...
insert into tb_customers$ select * from tb_customers;
update tb_customers$ set archive_id = :v_new_archive_id;
insert into tb_customers select * from tb_customers$;
insert into tb_suppliers$ select * from tb_suppliers;
update tb_suppliers$ set archive_id = :v_new_archive_id;
insert into tb_suppliers select * from tb_suppliers$;
commit; -- this will clear the global temporary tables
Hope this helps.
I would suggest not having a single SQL statement for all tables, and just using a plain insert per table:
insert into tb_customers_2
select id, name, 'new_archive_id' from tb_customers;
insert into tb_suppliers_2
select id, name, contact, xxx, xxx, 'new_archive_id' from tb_suppliers;
Or, if you really need a single SQL statement for all of them, at least pre-create all the temp tables (as temporary tables) and leave them in place for next time. Then just use dynamic SQL to refer to the temp table (see the PL/SQL sketch below):
insert into ORIGINAL_TABLE_TEMP (SELECT * from ORIGINAL_TABLE);
UPDATE ORIGINAL_TABLE_TEMP SET archive_id=something;
INSERT INTO NEW_TABLE (select * from ORIGINAL_TABLE_TEMP);
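A hedged sketch of that dynamic-SQL loop (the _TEMP suffix, table list, and archive id are illustrative; it assumes the temp tables already exist):
DECLARE
  v_new_archive_id NUMBER := 2; -- illustrative value
BEGIN
  FOR t IN (SELECT table_name
            FROM user_tables
            WHERE table_name IN ('TB_CUSTOMERS', 'TB_SUPPLIERS')) LOOP
    EXECUTE IMMEDIATE 'insert into ' || t.table_name || '_TEMP select * from ' || t.table_name;
    EXECUTE IMMEDIATE 'update ' || t.table_name || '_TEMP set archive_id = :new_id'
      USING v_new_archive_id;
    EXECUTE IMMEDIATE 'insert into ' || t.table_name || ' select * from ' || t.table_name || '_TEMP';
  END LOOP;
END;
/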
