I tried the two approaches described below.
Edit metadata file
CREATE TABLE graphite.data_test
(
Path String,
Value Float64,
Time UInt32,
Date Date,
Timestamp UInt32
)
ENGINE = GraphiteMergeTree(Date, (Path, Time), 8192, 'graphite_rollup')
alter table graphite.data_test attach partition 202208 from graphite.data;
detach table graphite.data_test;
vi /var/lib/clickhouse/metadata/graphite/data_test.sql
ATTACH TABLE data_test
(
Path String,
Value Float64,
Time UInt32,
Date Date,
Timestamp UInt32
)
ENGINE = GraphiteMergeTree('graphite_rollup')
PARTITION BY toYYYYMM(Date)
ORDER BY (Path, Time)
SETTINGS index_granularity = 8192;
attach table graphite.data_test;
ERROR: MergeTree data format version on disk doesn't support custom partitioning.
Copy partitions
CREATE TABLE data_test
(
Path String,
Value Float64,
Time UInt32,
Date Date,
Timestamp UInt32
)
ENGINE = GraphiteMergeTree('graphite_rollup')
PARTITION BY toYYYYMM(Date)
ORDER BY (Path, Time)
SETTINGS index_granularity = 8192;
alter table graphite.data_test attach partition 202208 from graphite.data;
ERROR: Tables have different format_version.
Can you tell me if there is any workaround, i.e. a way to convert the deprecated table to the new format?
It is possible only by using a console utility, and only for individual parts (not whole tables), and you need to build this utility yourself.
The utility has different names depending on the version: convert-parts-from-old-format / convert-month-partitioned-parts.
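If building that utility is not an option and rewriting the data is acceptable, a plain INSERT ... SELECT into the new-format table is a workaround, since the rows are rewritten as new-format parts instead of being attached; a sketch using the data_test table from above (the data_old name is just for illustration):
-- rewrite all rows into the new-format table (slow for large tables)
INSERT INTO graphite.data_test SELECT Path, Value, Time, Date, Timestamp FROM graphite.data;
-- after verifying row counts, swap the tables
RENAME TABLE graphite.data TO graphite.data_old, graphite.data_test TO graphite.data;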
Related
I'm trying to sample rows from a ClickHouse table.
Below you can find the table definition:
CREATE TABLE trades
(
`id` UInt32,
`ticker_id` UUID,
`epoch` DateTime,
`nanoseconds` UInt32,
`amount` Float64,
`cost` Float64,
`price` Float64,
`side` UInt8
)
ENGINE = MergeTree
PARTITION BY (ticker_id, toStartOfInterval(epoch, toIntervalHour(1)))
ORDER BY (ticker_id, epoch)
SETTINGS index_granularity = 8192
I want to sample 10000 rows from the table:
SELECT * FROM trades SAMPLE 10000;
But when I'm trying to run the query above I'm getting the following error:
DB::Exception: Illegal SAMPLE: table doesn't support sampling
I want to ALTER the table in order to be able to sample from it, but at the same time I want to make sure that I won't corrupt the data while altering the table.
The table has about 1 billion rows.
What would be a good way to ALTER the table while making sure the data won't get corrupted?
You can try:
ALTER TABLE trades MODIFY SAMPLE BY toUnixTimestamp(epoch);
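Note that ClickHouse requires the sampling expression to be part of the table's primary key, so MODIFY SAMPLE BY may be rejected for this table as defined. A route that also keeps the original data safe is to build a new table with a sampling key and copy the rows over; a sketch, where the intHash32(id) sampling key and the trades_new name are my assumptions:
CREATE TABLE trades_new
(
    `id` UInt32,
    `ticker_id` UUID,
    `epoch` DateTime,
    `nanoseconds` UInt32,
    `amount` Float64,
    `cost` Float64,
    `price` Float64,
    `side` UInt8
)
ENGINE = MergeTree
PARTITION BY (ticker_id, toStartOfInterval(epoch, toIntervalHour(1)))
ORDER BY (ticker_id, epoch, intHash32(id)) -- the sampling key must appear in the primary key
SAMPLE BY intHash32(id)
SETTINGS index_granularity = 8192;
-- copy the data; the original table stays untouched until you verify the copy
INSERT INTO trades_new SELECT * FROM trades;
-- then swap the tables
RENAME TABLE trades TO trades_old, trades_new TO trades;
After that, SELECT * FROM trades SAMPLE 10000 should work.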
CREATE TABLE user_dwd.user_tag_bitmap_local
(
`tag` String,
`tag_item` String,
`p_day` Date,
`origin_user` UInt64,
`users` AggregateFunction(min, UInt64) MATERIALIZED minState(origin_user)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMMDD(p_day)
ORDER BY (tag, tag_item)
SETTINGS index_granularity = 8192;
When running this SQL to create the table, it shows this error:
[2021-10-17 12:05:28] Code: 184, e.displayText() = DB::Exception: Aggregate function minState(origin_user) is found in wrong place in query: While processing minState(origin_user) AS users_tmp_alter9508717652815860223: default expression and column type are incompatible. (version 21.8.4.51 (official build))
How can I solve this error?
minState is an aggregate function; you cannot use it like this (it is meant for queries with a GROUP BY section).
To work around it you can use MATERIALIZED initializeAggregation... or MATERIALIZED arrayReduce('minState'...
But actually you don't need the second column at all.
You are looking for SimpleAggregateFunction:
https://clickhouse.com/docs/en/sql-reference/data-types/simpleaggregatefunction/
CREATE TABLE user_dwd.user_tag_bitmap_local
(
`tag` String,
`tag_item` String,
`p_day` Date,
`origin_user` SimpleAggregateFunction(min, UInt64) ---<<<-----
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMMDD(p_day)
ORDER BY (tag, tag_item)
SETTINGS index_granularity = 8192;
https://clickhouse.com/docs/en/sql-reference/functions/other-functions/#initializeaggregation
CREATE TABLE user_tag_bitmap_local
(
`tag` String,
`tag_item` String,
`p_day` Date,
`origin_user` UInt64,
`users` AggregateFunction(min, UInt64) MATERIALIZED initializeAggregation('minState', origin_user)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMMDD(p_day)
ORDER BY (tag, tag_item)
SETTINGS index_granularity = 8192
https://clickhouse.com/docs/en/sql-reference/functions/array-functions/#arrayreduce
CREATE TABLE user_tag_bitmap_local
(
`tag` String,
`tag_item` String,
`p_day` Date,
`origin_user` UInt64,
`users` AggregateFunction(min, UInt64) MATERIALIZED arrayReduce('minState', [origin_user])
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMMDD(p_day)
ORDER BY (tag, tag_item)
SETTINGS index_granularity = 8192
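For completeness, the two column types are read differently; a short sketch against the tables above:
-- SimpleAggregateFunction(min, UInt64) columns are read with the plain function:
SELECT tag, tag_item, min(origin_user)
FROM user_dwd.user_tag_bitmap_local
GROUP BY tag, tag_item;
-- AggregateFunction(min, UInt64) columns need the -Merge combinator:
SELECT tag, tag_item, minMerge(users)
FROM user_tag_bitmap_local
GROUP BY tag, tag_item;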
I created a MATERIALIZED VIEW like this. First, create the target table:
CREATE TABLE user_deatils_daily (
day date,
hour UInt8 ,
appid UInt32,
isp String,
city String,
country String,
session_count UInt64,
avg_score AggregateFunction(avg, Float32),
min_revenue AggregateFunction(min, Float32),
max_load_time AggregateFunction(max, Int32)
)
ENGINE = SummingMergeTree()
PARTITION BY toRelativeWeekNum(day)
ORDER BY (day,hour)
Then create the MV:
CREATE MATERIALIZED VIEW user_deatils_daily_mv
TO user_deatils_daily as
select toDate(session_ts) as day, toHour(toDateTime(session_ts)) as hour,appid,isp,city,country,
count(session_uuid) as session_count,avgState() as avg_score,
minState(revenue) as min_revenue,
maxState(perf_page_load_time) as max_load_time
from user_deatils where toDate(session_ts)>='2020-08-26' group by session_ts,appid,isp,city,country
The target table starts to fill with data, but after some time it only keeps the new data and doesn't save the old rows.
Why is that?
SummingMergeTree() PARTITION BY toRelativeWeekNum(day) ORDER BY (day, hour)
means: calculate sums grouped by (day, hour) within each toRelativeWeekNum(day) partition.
user_deatils_daily knows nothing about user_deatils_daily_mv; they are not related.
user_deatils_daily_mv just does inserts into user_deatils_daily.
SummingMergeTree knows nothing about your GROUP BY session_ts, appid, isp, city, country; rows that share the same (day, hour) sorting key simply get summed together, which is why the old rows seem to disappear.
I would expect to see ORDER BY (ts, appid, isp, city, country);
I would do:
CREATE TABLE user_details_daily
( ts DateTime,
appid UInt32,
isp String,
city String,
country String,
session_count SimpleAggregateFunction(sum,UInt64),
avg_score AggregateFunction(avg, Float32),
min_revenue SimpleAggregateFunction(min, Float32),
max_load_time SimpleAggregateFunction(max, Int32) )
ENGINE = AggregatingMergeTree()
PARTITION BY toStartOfWeek(ts)
ORDER BY (ts,appid,isp,city,country);
CREATE MATERIALIZED VIEW user_deatils_daily_mv TO user_details_daily
as select
toStartOfHour(toDateTime(session_ts)) ts,
appid,
isp,
city,
country,
count(session_uuid) as session_count,
avgState(score) as avg_score, -- 'score' is an assumed column name; the original omitted the argument
min(revenue) as min_revenue,
max(perf_page_load_time) as max_load_time
from user_deatils -- the source table name as spelled in the question
where toDate(session_ts)>='2020-08-26' group by ts,appid,isp,city,country;
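When you later read from user_details_daily, remember to finish the aggregation in the query; a minimal sketch:
SELECT
    ts,
    appid,
    sum(session_count) AS session_count,
    avgMerge(avg_score) AS avg_score, -- AggregateFunction columns need the -Merge combinator
    min(min_revenue) AS min_revenue,
    max(max_load_time) AS max_load_time
FROM user_details_daily
GROUP BY ts, appid;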
I have an external table, for example dump_table, which is partitioned by year, month, and day. If I run SHOW CREATE TABLE dump_table I get the following:
CREATE EXTERNAL TABLE `dump_table`
(
`col_name` double,
`col_name_2` timestamp
)
PARTITIONED BY (
`year` int,
`month` int,
`day` int)
CLUSTERED BY (
someid)
INTO 32 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://somecluster/test.db/dump_table'
TBLPROPERTIES (
'orc.compression'='SNAPPY',
'transient_lastDdlTime'='1564476840')
I have to change its column names to upper case and also add new columns, so it will become something like:
CREATE EXTERNAL TABLE `dump_table_2`
(
`COL_NAME` DOUBLE,
`COL_NAME_2` TIMESTAMP,
`NEW_COL` DOUBLE
)
PARTITIONED BY (
`year` int,
`month` int,
`day` int)
CLUSTERED BY (
someid)
Option 1
As one option I can run ALTER TABLE ... CHANGE (see the Hive DDL reference) to rename the columns and then add the new columns. BUT I do not have any backup for this table and it contains a lot of data; if anything goes wrong I might lose data.
Can I create a new external table and migrate the data, partition by partition, from dump_table to dump_table_2? What would the query look like for this migration?
Is there any better way of achieving this use case? Please help.
You can create the new table dump_table_2 with the new columns and load the data using SQL:
set hive.enforce.bucketing = true;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dump_table_2 partition (`year`, `month`, `day`)
select col1,
...
colN,
`year`, `month`, `day`
from dump_table t -- join other tables if necessary to calculate columns
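If you prefer the partition-by-partition migration you asked about, the same insert can target one static partition at a time; a sketch with assumed partition values (with a static partition spec, do not select the partition columns):
insert overwrite table dump_table_2 partition (`year`=2019, `month`=7, `day`=30)
select col1,
       ...
       colN
from dump_table t
where t.`year`=2019 and t.`month`=7 and t.`day`=30;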
Does anyone know how to use a default value like CURRENT_TIMESTAMP in MySQL when creating a ClickHouse table? The now() UDF is dynamic: instead of the time the row was inserted, it always returns the current time, so it changes on every SELECT.
Here is my table:
CREATE TABLE default.test2 (
`num` UInt32,
`dt` String,
`__inserted_time` DateTime DEFAULT now()
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test2', '{replica}')
PARTITION BY dt
ORDER BY dt
SETTINGS index_granularity = 8192
I want the __inserted_time column value to be generated automatically so I don't have to specify it, as in: insert into test2 (num, dt) values (1, '20191010')
My mistake: DEFAULT now() actually works.
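For anyone else who wondered: DEFAULT expressions in ClickHouse are evaluated once, at insert time, so the stored value does not change on later reads. A quick check against the table above:
INSERT INTO default.test2 (num, dt) VALUES (1, '20191010');
-- __inserted_time was materialized at insert time and stays fixed across SELECTs
SELECT num, dt, __inserted_time FROM default.test2;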