I used the query below to attach one partition in ClickHouse:
alter table a_status_2 attach partition '20210114' from a_status_1;
How to attach multiple partitions in ClickHouse?
There is no way to do it in one statement. You can generate an SQL script using system.detached_parts:
select distinct concat('alter table `', table, '` attach partition id \'', partition_id, '\';')
from system.detached_parts
where database = 'xxx' and table = 'yyy';
Or https://gist.github.com/den-crane/5ae44ec04961ec62286835c8798e2728
# run inside the table's detached/ directory; each file name is a part name
let i=1
for f in `ls -1`; do
  echo $i $f; ((i++))
  echo "alter table A.d attach part '$f';" | clickhouse-client
done
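If the partitions are still attached to a source table, as in the ATTACH ... FROM example at the top, the same script-generation trick works against system.parts. A sketch, assuming the tables from the question (source a_status_1, target a_status_2, database xxx) and that your ClickHouse version accepts the PARTITION ID form together with FROM; pipe the output through clickhouse-client to execute it:
select distinct concat('alter table a_status_2 attach partition id \'', partition_id, '\' from a_status_1;')
from system.parts
where database = 'xxx' and table = 'a_status_1' and active;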
I'm on ClickHouse version 18.16.1 with this table:
CREATE TABLE traffic (
`date` Date,
...
) ENGINE = MergeTree(date, (end_time), 8192);
I want to change it to PARTITION BY toYYYYMMDD(date) without dropping the table. How can I do that?
Since ALTER does not allow changing the partition key, the only practical way is to create a new table
CREATE TABLE traffic_new
(
`date` Date,
...
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(date)
ORDER BY end_time
SETTINGS index_granularity = 8192;
and to move your data
INSERT INTO traffic_new SELECT * FROM traffic WHERE column BETWEEN x and xxxx;
Rename the tables afterwards if necessary. And yes, this option involves dropping the old table; there seems to be no way to skip that step.
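A sketch of the final swap, using the table names above (the DROP is the irreversible part, so compare row counts first):
RENAME TABLE traffic TO traffic_old, traffic_new TO traffic;
DROP TABLE traffic_old;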
I am trying to add a semicolon (;) after each CREATE VIEW statement in a Hive DDL dump. I have a file that contains the DDL statements below:
CREATE VIEW `db1.table1` AS SELECT * FROM db2.table1
CREATE VIEW `db1.table2` AS SELECT * FROM db2.table2
CREATE VIEW `db1.table3` AS SELECT * FROM db3.table3
CREATE EXTERNAL TABLE `db1.table4`(
`cus_id` int,
`ren_mt` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
TBLPROPERTIES (
'skip.header.line.count'='1',
'transient_lastDdlTime'='1558705259')
CREATE EXTERNAL TABLE `sndbx_cmcx.effective_month1`(
`customeridentifier` bigint,
`renewalmonth` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='false',
'transient_lastDdlTime'='1558713596')
I want it to look like below. After each CREATE VIEW statement there is a ;, and after each CREATE TABLE statement there's a ; too.
CREATE VIEW `db1.table1` AS SELECT * FROM db2.table1;
CREATE VIEW `db1.table2` AS SELECT * FROM db2.table2;
CREATE VIEW `db1.table3` AS SELECT * FROM db3.table3;
CREATE EXTERNAL TABLE `db1.table4`(
`cus_id` int,
`ren_mt` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
TBLPROPERTIES (
'skip.header.line.count'='1',
'transient_lastDdlTime'='1558705259');
CREATE EXTERNAL TABLE `sndbx_cmcx.effective_month1`(
`customeridentifier` bigint,
`renewalmonth` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='false',
'transient_lastDdlTime'='1558713596');
Here is my shell script that I use:
#Change database before you run the script
hiveDBName=$1;
showcreate="show create table "
terminate=";"
tables=`hive -e "use $hiveDBName;show tables;"`
tab_list=`echo "${tables}"`
for list in $tab_list
do
echo "Generating table script for " #${hiveDBName}.${list}
showcreatetable=${showcreatetable}${showcreate}${hiveDBName}.${list}${terminate}
done
echo " ====== Create Tables ======= : "# $showcreatetable
#Generate the DDLs into a file
hive -e "use $hiveDBName; ${showcreatetable}"> a.sql
#Removes the Warn: from the file
grep -v "WARN" a.sql > /home/path/my_ddls/${hiveDBName}_extract_all_tables.sql
echo "Removing Filter File"
#Remove Filter file
rm -f a.sql
#Puts a ; after each create view statement in the document
sed -i '/transient/s/$/;/' "/home/path/my_ddls/${hiveDBName}_extract_all_tables.sql"
This generates the DDLs, but the sed only puts a ; after each CREATE TABLE statement (the lines containing transient); it doesn't put one after each CREATE VIEW statement.
Any ideas or suggestions?
I'd take the easy way and make use of two facts: the ; doesn't have to be on the same line as the end of the statement, and an empty statement is allowed. This gives:
sed -i -e '/^CREATE/i;' -e '$a;' "/home/path/my_ddls/${hiveDBName}_extract_all_tables.sql"
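The first expression, /^CREATE/i;, inserts a line containing only ; before every line that starts with CREATE, which terminates the preceding statement (and leaves one harmless empty statement before the first CREATE); the second, $a;, appends a ; line after the last line to terminate the final statement. Note that the one-line i; and a; forms are GNU sed extensions.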
How can I upsert records in Greenplum while copying data from a CSV file? The CSV file has multiple records for a given value of the primary key. If a row with some key already exists in the database I want to update that record; otherwise, a new row should be appended.
One way to do this is to copy the data to a staging table, then insert/update from that table.
Here is an example of that:
-- Duplicate the definition of your table.
CREATE TEMP TABLE my_table_stage (LIKE my_table INCLUDING DEFAULTS);
-- Your COPY statement, loading into the staging table
COPY my_table_stage FROM 'my_file.csv' ...
-- Insert any "new" records
INSERT INTO my_table (key_field, data_field1, data_field2)
SELECT
stg.key_field,
stg.data_field1,
stg.data_field2
FROM
my_table_stage stg
WHERE
NOT EXISTS (SELECT 1 FROM my_table WHERE key_field = stg.key_field);
-- Update any existing records
UPDATE my_table orig
SET
data_field1 = stg.data_field1,
data_field2 = stg.data_field2
FROM
my_table_stage stg
WHERE
orig.key_field = stg.key_field;
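One caveat from the question: the CSV can contain several rows per key, while both steps above assume at most one staged row per key. A sketch for collapsing duplicates first, assuming any one of the duplicate rows is acceptable (add an ORDER BY to the DISTINCT ON query if a particular row must win), then run the INSERT/UPDATE above against my_table_dedup instead of my_table_stage:
-- Keep one arbitrary row per key_field.
CREATE TEMP TABLE my_table_dedup AS
SELECT DISTINCT ON (key_field) key_field, data_field1, data_field2
FROM my_table_stage;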
Right now I run the following Hive query
CREATE TABLE dwo_analysis.exp_shown AS
SELECT
MIN(sc.date_time) as first_shown_time,
SUBSTR(sc.post_evar12,1,24) as guid,
sc.post_evar238 as experiment_name,
sc.post_evar239 as variant_name
FROM test sc
WHERE report_suite='adbemmarvelweb.prod'
AND date >= DATE_SUB(CURRENT_DATE,90) AND date < DATE_SUB(CURRENT_DATE, 2)
AND post_prop5 = 'experiment:standard:authenticated:shown'
AND post_evar238 NOT LIKE 'control%'
AND post_evar238 <> ''
AND post_evar239 <> ''
GROUP BY SUBSTR(sc.post_evar12,1,24), sc.post_evar238, sc.post_evar239
The table test is large. I would like to optimize this by running the full query once, and then on every subsequent run updating the table with just the last 2 days of data. So basically: run the above query once, then re-run it each time with the condition
WHERE click_date >= DATE_SUB(CURRENT_DATE, 2) AND click_date < CURRENT_DATE
How do I update the table in Hive to populate the rows matching the condition above?
First, your queries would be quicker if the Hive table were partitioned on the date column. Your CREATE TABLE statement isn't inserting into any partitions, so I suspect your table is not partitioned. It would also be quicker if the source data were stored as Parquet/ORC.
In any case, you can overwrite the table for a date range like so
INSERT OVERWRITE TABLE dwo_analysis.exp_shown
SELECT * FROM test
WHERE click_date
BETWEEN DATE_SUB(CURRENT_DATE, 2) AND CURRENT_DATE;
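If you do partition the target, here is a sketch of what the setup and the two-day refresh could look like; the column list is taken from the question, while STORED AS ORC, the DATE/TIMESTAMP types, and the dynamic-partition settings are assumptions:
-- One-time setup: a day-partitioned target table.
CREATE TABLE dwo_analysis.exp_shown (
  first_shown_time TIMESTAMP,
  guid STRING,
  experiment_name STRING,
  variant_name STRING
)
PARTITIONED BY (click_date DATE)
STORED AS ORC;

-- Each refresh rewrites only the partitions present in the result.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE dwo_analysis.exp_shown PARTITION (click_date)
SELECT
  MIN(sc.date_time)           AS first_shown_time,
  SUBSTR(sc.post_evar12,1,24) AS guid,
  sc.post_evar238             AS experiment_name,
  sc.post_evar239             AS variant_name,
  sc.click_date
FROM test sc
WHERE sc.click_date BETWEEN DATE_SUB(CURRENT_DATE, 2) AND CURRENT_DATE
GROUP BY SUBSTR(sc.post_evar12,1,24), sc.post_evar238, sc.post_evar239, sc.click_date;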
I am trying to add a partition to a Hive table (partitioned by date).
My problem is that the date needs to be fetched from another table.
My query looks like:
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION(server_date = (SELECT max(server_date) FROM processed_table));
When I run the query, Hive throws the following error:
Error: Error while compiling statement: FAILED: ParseException line 1:84 cannot recognize input near '(' 'SELECT' 'max' in constant (state=42000,code=40000)
Hive does not allow functions/UDFs to be used for the partition value in DDL statements such as ALTER TABLE ... ADD PARTITION.
Approach 1:
To achieve this, run a first query to store the result in a shell variable, and then execute the ALTER query with it.
server_date=$(hive -e "set hive.cli.print.header=false; select max(server_date) from processed_table;")
hive -hiveconf "server_date"="$server_date" -f your_hive_script.hql
Inside your script you can use the following statement:
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION(server_date='${hiveconf:server_date}');
For more information, see the Hive documentation on variable substitution.
Approach 2:
In this approach you will need to create a temporary (staging) table, if the partition data you expect is not already loaded into some other partitioned table. This assumes your data does not yet have the server_date column.
Load the data into the temporary table, then enable dynamic partitioning:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
Execute the below query (here temp_table is the staging table you just loaded):
INSERT OVERWRITE TABLE my_table PARTITION (server_date)
SELECT b.column1, b.column2, ..., a.server_date AS server_date FROM (SELECT max(server_date) AS server_date FROM processed_table) a, temp_table b;
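Afterwards the partition exists just like a statically added one; you can verify it with:
SHOW PARTITIONS my_table;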