I have a table in ClickHouse, say "my_table", which has replicas (my_table_rep1, ...), and I need to add a column of type Float64 with default value -1.
How should I do that?
I would prefer that the default is not actually applied to the existing entries.
The documentation is pretty straightforward:
ALTER TABLE [db].name [ON CLUSTER cluster] ADD COLUMN name [type] [default_expr] [AFTER name_after]
Regarding this part:
I would prefer that the default is not actually applied to the existing entries.
There's also a statement in the docs:
If you add a new column to a table but later change its default expression, the values used for old data will change (for data where values were not stored on the disk)
So, basically, you have to:
Add the column (without a default value, or with DEFAULT 0, or with something else, depending on what you want to have in the existing entries)
Run OPTIMIZE TABLE .. FINAL to force ClickHouse to write the new data to disk
Modify the column and set DEFAULT -1 so that only new rows are affected
An example:
:) CREATE TABLE my_table (date Date DEFAULT today(), s String) ENGINE = MergeTree(date, (date), 8192);
:) INSERT INTO my_table (s) VALUES ('1. foo');
:) ALTER TABLE my_table ADD COLUMN f Float64;
:) INSERT INTO my_table (s) VALUES ('2. bar');
:) SELECT * FROM my_table;
┌───────date─┬─s──────┬─f─┐
│ 2018-04-20 │ 1. foo │ 0 │
│ 2018-04-20 │ 2. bar │ 0 │
└────────────┴────────┴───┘
:) OPTIMIZE TABLE my_table PARTITION 201804 FINAL;
:) ALTER TABLE my_table MODIFY COLUMN f Float64 DEFAULT -1;
:) INSERT INTO my_table (s) VALUES ('3. baz');
:) SELECT * FROM my_table;
┌───────date─┬─s──────┬──f─┐
│ 2018-04-20 │ 3. baz │ -1 │
│ 2018-04-20 │ 1. foo │ 0 │
│ 2018-04-20 │ 2. bar │ 0 │
└────────────┴────────┴────┘
You really have to run OPTIMIZE TABLE ... FINAL: if you don't, the old rows' f values exist only as the default expression (they were never written to disk), so after the MODIFY COLUMN they would start reading as -1 instead of 0. See this gist for the weirdness that results: https://gist.github.com/hatarist/5e7653808e59349c34d4589b2fc69b14
Is it possible to alter the table engine of a ClickHouse table like in MySQL? Something like this:
CREATE TABLE example_table (id UInt32, data String) ENGINE=MergeTree() ORDER BY id;
ALTER example_table ENGINE=SummingMergeTree();
I didn't find such a capability in the documentation.
If it is not possible, are there any plans to implement it in the near future, or what architectural limitations prevent it?
It's possible to change an engine in several ways.
But it's impossible to change PARTITION BY / ORDER BY, which is why it's not documented explicitly: in 99.99999% of cases it does not make any sense. SummingMergeTree uses the table's ORDER BY as its collapsing rule, and the existing ORDER BY usually does not fit.
Here is an example of one of the ways (the less hacky one): you can copy partitions from one table to another. It's an almost free operation that exploits filesystem hardlinks and does not copy the real data (COW, copy-on-write).
CREATE TABLE example_table (id UInt32, data Float64)
ENGINE=MergeTree() ORDER BY id;
INSERT INTO example_table VALUES (1,1), (1,1), (2,1);
CREATE TABLE example_table1 (id UInt32, data Float64)
ENGINE=SummingMergeTree() ORDER BY id;
-- this does not copy any data (instant & almost free command)
ALTER TABLE example_table1 ATTACH PARTITION tuple() FROM example_table;
SELECT * FROM example_table1;
┌─id─┬─data─┐
│ 1 │ 1 │
│ 1 │ 1 │
│ 2 │ 1 │
└────┴──────┘
OPTIMIZE TABLE example_table1 FINAL;
SELECT * FROM example_table1;
┌─id─┬─data─┐
│ 1 │ 2 │
│ 2 │ 1 │
└────┴──────┘
One more way: edit the metadata file (and also the ZooKeeper records if the table is Replicated).
detach table example_table;
vi /var/lib/clickhouse/metadata/default/example_table.sql
replace MergeTree with SummingMergeTree
attach table example_table;
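The vi step above can also be scripted with sed. A sketch, demonstrated here on a throwaway copy rather than the real file under /var/lib/clickhouse (the metadata contents below are an assumption; on a real server run the sed command between DETACH TABLE and ATTACH TABLE):

```shell
# Stand-in for /var/lib/clickhouse/metadata/default/example_table.sql
# (the content is an assumed sketch of what the metadata file looks like).
meta=$(mktemp)
cat > "$meta" <<'EOF'
ATTACH TABLE example_table (id UInt32, data Float64) ENGINE = MergeTree() ORDER BY id
EOF

# The actual edit: replace MergeTree with SummingMergeTree in place.
sed -i 's/ENGINE = MergeTree/ENGINE = SummingMergeTree/' "$meta"

grep 'ENGINE' "$meta"   # the line now mentions SummingMergeTree
```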
SHOW CREATE TABLE example_table
┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE default.example_table
(
`id` UInt32,
`data` Float64
)
ENGINE = SummingMergeTree
ORDER BY id
SETTINGS index_granularity = 8192 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
SELECT * FROM example_table;
┌─id─┬─data─┐
│ 1 │ 1 │
│ 1 │ 1 │
│ 2 │ 1 │
└────┴──────┘
OPTIMIZE TABLE example_table FINAL;
SELECT * FROM example_table;
┌─id─┬─data─┐
│ 1 │ 2 │
│ 2 │ 1 │
└────┴──────┘
I have several tables and materialized views that haven't been created with a TO [db].[table] clause, and they have inner tables with these names:
│ .inner_id.27e007b7-d831-4bea-8610-b3f0fd1ee018 │
│ .inner_id.3e27a1eb-f40f-4c6d-bc70-7722f2cc953b │
│ .inner_id.53c0fe00-151a-4bb5-ba61-237da8c1c6f2 │
│ .inner_id.7785fe45-94e0-42ee-8c0e-14f2d74e3c88 │
│ .inner_id.7d7e6485-e18e-47e0-9bb6-51a31a8cafd0 │
│ .inner_id.cd920d50-7469-4507-a5cf-244de054b037 │
│ .inner_id.fe6d3ce5-7ffc-4bca-9d6b-4015b1807e4f │
These differ from the names of the materialized views I used to create them.
Is there any way to get the inner table name of a materialized view with a ClickHouse query?
To resolve the UUID to the materialized view name, use this query:
SELECT
uuid,
name
FROM system.tables
WHERE database = '{name_of_database}' AND engine = 'MaterializedView'
/*
┌─────────────────────────────────uuid─┬─name───────┐
│ 6b297d75-1c67-4824-9a60-94195c160436 │ mv_name_01 │
..
└──────────────────────────────────────┴────────────┘
*/
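Since the inner table is simply named `.inner_id.` followed by the materialized view's UUID, mapping this query's output to an inner table name is plain string concatenation. A sketch in Python, using the UUID from the example output above:

```python
def inner_table_name(mv_uuid: str) -> str:
    """Build the inner table name of a materialized view from its UUID.

    Assumes the '.inner_id.<uuid>' naming scheme shown above.
    """
    return f".inner_id.{mv_uuid}"

print(inner_table_name("6b297d75-1c67-4824-9a60-94195c160436"))
# .inner_id.6b297d75-1c67-4824-9a60-94195c160436
```

The same concatenation should also work directly in SQL, e.g. `concat('.inner_id.', toString(uuid))`.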
What should I specify in config.xml to enable (set to 1) the allow_drop_detached parameter shown in the system.settings table?
It cannot be changed by an ALTER TABLE query.
Error message:
Code: 344. DB::Exception: Received from localhost:9000. DB::Exception: Cannot execute query: DROP DETACHED PART is disabled (see allow_drop_detached setting).
It should be defined in users.xml, not config.xml.
Create a file with the required changes (here it is query.settings.xml) in the directory /etc/clickhouse-server/users.d/:
nano /etc/clickhouse-server/users.d/query.settings.xml
with the following content (here the parameter is set only for the default profile):
<?xml version="1.0"?>
<yandex>
<profiles>
<default>
<!-- Allow ALTER TABLE ... DROP DETACHED PART[ITION] ... queries. -->
<allow_drop_detached>1</allow_drop_detached>
</default>
</profiles>
</yandex>
Save this file and check that the parameter is set to 1:
SELECT *
FROM system.settings
WHERE name = 'allow_drop_detached'
/*
┌─name────────────────┬─value─┬─changed─┬─description─────────────────────────────────────────────────┬─min──┬─max──┬─readonly─┐
│ allow_drop_detached │ 1 │ 1 │ Allow ALTER TABLE ... DROP DETACHED PART[ITION] ... queries │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0 │
└─────────────────────┴───────┴─────────┴─────────────────────────────────────────────────────────────┴──────┴──────┴──────────┘
*/
The alternative ways to set this parameter are:
at the query level:
ALTER TABLE table
DROP DETACHED PARTITION '2020-04-01'
SETTINGS allow_drop_detached = 1;
at the session level:
SET allow_drop_detached = 1;
ALTER TABLE table
DROP DETACHED PARTITION '2020-04-01';
I have a column in a table with data type Int32. Is it possible to convert the column into the array data type Array(Int32)? If not, what are the other ways? Kindly let me know.
Changing the type of a column from Int32 to Array(Int32) cannot be performed by an ALTER TABLE .. MODIFY COLUMN query because such a typecast is not allowed.
So you need to follow these steps:
Add the new column with type Array(int):
ALTER TABLE test.test_004 ADD COLUMN `value_array` Array(int);
/* Test table preparation:
CREATE TABLE test.test_004
(
`id` int,
`value` int
)
ENGINE = MergeTree();
INSERT INTO test.test_004 VALUES (1, 10), (2, 20), (3, 30), (4, 40);
*/
Update the array column's values:
ALTER TABLE test.test_004
UPDATE value_array = [value] WHERE 1
/* Result
┌─id─┬─value─┬─value_array─┐
│ 1 │ 10 │ [10] │
│ 2 │ 20 │ [20] │
│ 3 │ 30 │ [30] │
│ 4 │ 40 │ [40] │
└────┴───────┴─────────────┘
*/
Important: depending on the number of rows in the table, this operation can take a long time to execute. To check the status of the update (mutation) or find the reason for a failure, look at the system.mutations table:
SELECT *
FROM system.mutations
WHERE table = 'test_004'
When the mutation has completed, the original column can be removed:
ALTER TABLE test.test_004
DROP COLUMN value
Remark: add the extra ON CLUSTER clause to each query if the table is located on several servers.
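For example, the ADD COLUMN step above would then look like this (the cluster name my_cluster is a placeholder; substitute your own):

```sql
ALTER TABLE test.test_004 ON CLUSTER my_cluster
    ADD COLUMN `value_array` Array(int);
```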
In continuation of my last post, "migration oracle to postgresql invalid byte sequence for encoding “UTF8”: 0x00":
I'm trying to insert data from a remote Oracle table into a local PostgreSQL table (via the oracle_fdw extension). My Oracle table has a column named street; it has valid string values, and sometimes the following string, which is invalid in PostgreSQL: ' ' (space).
When I try to copy the column value I get the error mentioned above and in my last post. I understood that I need to change the Oracle data before inserting it into PostgreSQL, and I must do it on the fly, so I looked for Oracle's decode function in PostgreSQL. I found two solutions and used both of them, but I got the same error:
1. Using SELECT with CASE:
mydb=>select *,(case when v.street=' ' then null END) from customer_prod v;
ERROR: invalid byte sequence for encoding "UTF8": 0x00
CONTEXT: converting column "street" for foreign table scan of
"customer_prod", row 254148
2. Using the decode function from the orafce extension:
mydb=>select decode(street,' ',null) from customer_prod;
ERROR: invalid byte sequence for encoding "UTF8": 0x00
So, I'm still getting the error. How can I solve this issue?
The error occurs when the values are transferred from Oracle to PostgreSQL, so post-processing won't prevent the error.
For the sake of demonstration, let's create an Oracle table that exhibits the problem:
CREATE TABLE nulltest(
id number(5) CONSTRAINT nulltest_pkey PRIMARY KEY,
val varchar2(10 CHAR)
);
INSERT INTO nulltest VALUES (1, 'schön');
INSERT INTO nulltest VALUES (2, 'bö' || CHR(0) || 'se');
INSERT INTO nulltest VALUES (3, 'egal');
COMMIT;
Let's create a foreign table in PostgreSQL for it:
CREATE FOREIGN TABLE nulltest (
id integer OPTIONS (key 'true') NOT NULL,
val varchar(10)
) SERVER oracle
OPTIONS (table 'NULLTEST');
SELECT * FROM nulltest;
ERROR: invalid byte sequence for encoding "UTF8": 0x00
CONTEXT: converting column "val" for foreign table scan of "nulltest", row 2
Now the easiest thing would be to create a foreign table that filters away the zero characters:
CREATE FOREIGN TABLE filter_nulltest (
id integer OPTIONS (key 'true') NOT NULL,
val varchar(10)
) SERVER oracle
OPTIONS (table '(SELECT id, replace(val, CHR(0), NULL) FROM nulltest)');
SELECT * FROM filter_nulltest;
┌────┬───────┐
│ id │ val │
├────┼───────┤
│ 1 │ schön │
│ 2 │ böse │
│ 3 │ egal │
└────┴───────┘
(3 rows)
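What makes the value invalid is the embedded NUL byte (CHR(0) in Oracle, byte 0x00): PostgreSQL's text types cannot store it, which is why the replace(val, CHR(0), NULL) in the pushed-down query fixes the transfer. For illustration only, the same cleanup on a client-side string (the sample value mirrors row 2 above):

```python
raw = "bö\x00se"                  # string with an embedded NUL byte, like row 2
clean = raw.replace("\x00", "")   # strip NUL bytes, as the Oracle-side replace() does
print(clean)                      # böse
```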
Another, less efficient, option would be to create a function that catches and reports bad lines to you so that you can fix them on the Oracle side:
CREATE OR REPLACE FUNCTION get_nulltest() RETURNS SETOF nulltest
LANGUAGE plpgsql AS
$$DECLARE
v_id integer;
n nulltest;
BEGIN
FOR v_id IN SELECT id FROM nulltest
LOOP
BEGIN
SELECT nulltest.* INTO n
FROM nulltest
WHERE id = v_id;
RETURN NEXT n;
EXCEPTION
WHEN OTHERS THEN
RAISE NOTICE 'Caught error % for id=%: %', SQLSTATE, v_id, SQLERRM;
END;
END LOOP;
END;$$;
SELECT * FROM get_nulltest();
NOTICE: Caught error 22021 for id=2: invalid byte sequence for encoding "UTF8": 0x00
┌────┬───────┐
│ id │ val │
├────┼───────┤
│ 1 │ schön │
│ 3 │ egal │
└────┴───────┘
(2 rows)