Migrating Virtual Columns from Oracle to Postgres

What options does one have to deal with virtual columns when migrating from Oracle 11 to Postgres 9.5 - without having to change database-related code in an application (which means functions and views are out of the picture, and triggers are way too expensive when dealing with large data sets)?
A similar question exists: Computed / calculated columns in PostgreSQL, but the solutions do not help with the migration scenario.

If you use a BEFORE INSERT trigger, you can modify the inserted values before they are actually written. That shouldn't be very expensive. If cutting-edge performance is required, write the trigger function in C.
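A minimal sketch of the trigger variant, assuming the formerly virtual column is kept as an ordinary stored column named product (table and column names are illustrative):
CREATE FUNCTION compute_product() RETURNS trigger AS
$$
BEGIN
    -- recompute the formerly virtual column before the row is written
    NEW.product := NEW.factor1 * NEW.factor2;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER data_product
    BEFORE INSERT OR UPDATE ON data
    FOR EACH ROW EXECUTE PROCEDURE compute_product();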
But I think that a view is the best solution. You can use an updatable view; that way you wouldn't have to change the application code:
CREATE TABLE data(
id integer PRIMARY KEY,
factor1 integer NOT NULL,
factor2 integer NOT NULL
);
CREATE VIEW interface AS
SELECT id, factor1, factor2,
factor1 * factor2 AS product
FROM data;
test=> INSERT INTO interface VALUES (1, 6, 7), (2, 3, 14);
INSERT 0 2
test=> UPDATE interface SET factor1 = 7 WHERE id = 1;
UPDATE 1
test=> DELETE FROM interface WHERE id = 1;
DELETE 1
test=> SELECT * FROM interface;
┌────┬─────────┬─────────┬─────────┐
│ id │ factor1 │ factor2 │ product │
├────┼─────────┼─────────┼─────────┤
│  2 │       3 │      14 │      42 │
└────┴─────────┴─────────┴─────────┘
(1 row)
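In the migration scenario, the view would simply take over the name of the original Oracle table so existing queries keep working; a sketch, with illustrative names:
-- keep the base data under a new name and expose the old name as the view
ALTER TABLE data RENAME TO data_base;

CREATE VIEW data AS
    SELECT id, factor1, factor2,
           factor1 * factor2 AS product
    FROM data_base;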

Related

Clickhouse. Get value from json

I use the ClickHouse database. There is a table with a string column (data). All rows contain data like:
'[{"a":23, "b":1}]'
'[{"a":7, "b":15}]'
I want to get all values of key "b":
1
15
This query:
Select JSONExtractInt('data', 0, 'b') from table
returns 0 all the time. How can I get the values of key "b"?
SELECT tupleElement(JSONExtract(j, 'Array(Tuple(a Int64, b Int64))'), 'b')[1] AS res
FROM
(
SELECT '[{"a":23, "b":1}]' AS j
UNION ALL
SELECT '[{"a":7, "b":15}]'
)
┌─res─┐
│   1 │
└─────┘
┌─res─┐
│  15 │
└─────┘
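Two alternative forms that should also work here, assuming data and table stand for the asker's column and table (note the column name must not be quoted, and the JSON path index is 1-based):
-- value of "b" in the first array element
SELECT JSONExtractInt(data, 1, 'b') FROM table;

-- all "b" values of every element, as an array
SELECT arrayMap(x -> JSONExtractInt(x, 'b'), JSONExtractArrayRaw(data)) FROM table;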

Clickhouse: convert a table with JSON content in each row into a table

I have tried to transform JSON stored in rows into a table with the JSON fields as columns. After looking at the ClickHouse documentation, I couldn't find a ClickHouse function that can handle this task.
Here is the source table:
col_a
{"casa":2,"value":4}
{"casa":6,"value":47}
The goal is to transform it, using only ClickHouse SQL (CREATE WITH SELECT), into this table:
casa   value
2      4
6      47
SELECT
'{"casa":2,"value":4}' AS j,
JSONExtractKeysAndValuesRaw(j) AS t
┌─j────────────────────┬─t────────────────────────────┐
│ {"casa":2,"value":4} │ [('casa','2'),('value','4')] │
└──────────────────────┴──────────────────────────────┘
SELECT
'{"casa":2,"value":4}' AS j,
JSONExtract(j, 'Tuple(casa Int64, value Int64)') AS t,
tupleElement(t, 'casa') AS casa,
tupleElement(t, 'value') AS value
┌─j────────────────────┬─t─────┬─casa─┬─value─┐
│ {"casa":2,"value":4} │ (2,4) │ 2 │ 4 │
└──────────────────────┴───────┴──────┴───────┘
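Building on the second query, a rough sketch of the CREATE ... AS SELECT the question asks for (the source table name src and target name parsed are assumptions):
CREATE TABLE parsed
ENGINE = MergeTree ORDER BY casa AS
SELECT
    tupleElement(t, 'casa') AS casa,
    tupleElement(t, 'value') AS value
FROM
(
    -- src is the table holding the JSON strings in col_a
    SELECT JSONExtract(col_a, 'Tuple(casa Int64, value Int64)') AS t
    FROM src
);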

Get checksum (cityHash64) of first n rows of ClickHouse table

According to https://clickhouse.tech/docs/en/sql-reference/functions/hash-functions/,
I can get a checksum of the entire table this way:
SELECT groupBitXor(cityHash64(*)) FROM table
What is the most accurate way to get a checksum of the first N rows of a table?
As an example, I'm using a table with the GenerateRandom engine, as stated here.
CREATE TABLE test (name String, value UInt32) ENGINE = GenerateRandom(1, 5, 3)
I tried using the LIMIT clause, but with no luck yet.
Consider using a subquery:
SELECT groupBitXor(cityHash64(*))
FROM (
SELECT *
FROM table
LIMIT x)
SELECT groupBitXor(cityHash64(*))
FROM
(
SELECT *
FROM system.numbers
LIMIT 10
)
/*
┌─groupBitXor(cityHash64(number))─┐
│             9791317254842948406 │
└─────────────────────────────────┘
*/
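One caveat: without an ORDER BY, "the first N rows" is not a well-defined set, so the checksum is not reproducible. A sketch against the test table from the question, assuming name and value make an acceptable sort key:
SELECT groupBitXor(cityHash64(*))
FROM
(
    SELECT *
    FROM test
    ORDER BY name, value  -- pin down which rows count as the first 10
    LIMIT 10
)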

Clickhouse - Split arrayMap to columns to sort on

I have a ClickHouse query question; I'm pretty new to ClickHouse, so maybe it's an easy one for the experts ;)! We have a single table with events in it, and each event is linked to a product, e.g. product_click, product_view. I want to extract the data grouped by product, but in a single line I need each type of event in a separate column so I can sort on it.
I already wrote this query:
SELECT product_id,
arrayMap((x, y) -> (x, y),
(arrayReduce('sumMap', [(groupArrayArray([event_type]) as arr)],
[arrayResize(CAST([], 'Array(UInt64)'), length(arr), toUInt64(1))]) as s).1, s.2) events
FROM events
GROUP BY product_id
Result:
┌─────────────────────────product_id───┬─events────────────────────────────────────────────────────────┐
│ 0071f1e4-a484-448e-8355-64e2fea98fd5 │ [('PRODUCT_CLICK',1341),('PRODUCT_VIEW',11)]                   │
│ 406f4707-6bad-4d3f-9544-c74fdeb1e09d │ [('PRODUCT_CLICK',1),('PRODUCT_VIEW',122),('PRODUCT_BUY',37)] │
│ 94566b6d-6e23-4264-ad76-697ffcfe60c4 │ [('PRODUCT_CLICK',1027),('PRODUCT_VIEW',7)]                    │
...
Is there any way to convert the arrayMap result to columns with a sort key,
so we can filter on the most clicked products first, or the most viewed?
Another question: is it a good idea to always execute this kind of query, or should we create a MATERIALIZED VIEW for it?
Thanks!
SQL does not allow a variable number of columns.
The only way is to enumerate the event types explicitly:
SELECT product_id,
countIf(event_type = 'PRODUCT_CLICK') PRODUCT_CLICK,
countIf(event_type = 'PRODUCT_VIEW') PRODUCT_VIEW,
countIf(event_type = 'PRODUCT_BUY') PRODUCT_BUY
FROM events
GROUP BY product_id
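Since the per-type counts are now ordinary columns, you can sort and filter on them directly; for example, to see the most clicked products first (based on the query above):
SELECT product_id,
       countIf(event_type = 'PRODUCT_CLICK') AS PRODUCT_CLICK,
       countIf(event_type = 'PRODUCT_VIEW') AS PRODUCT_VIEW,
       countIf(event_type = 'PRODUCT_BUY') AS PRODUCT_BUY
FROM events
GROUP BY product_id
ORDER BY PRODUCT_CLICK DESC  -- most clicked products first
LIMIT 100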

Clickhouse - join on string columns

I have a String column uin in several tables; how can I effectively join these tables on uin?
In the Vertica database we use hash(uin) to transform the string column into a hash with an Int data type - it significantly boosts efficiency in joins - could you recommend something like this? I tried CRC32(s), but it seems to work incorrectly.
At the moment ClickHouse does not cope very well with multi-join queries (star-schema databases), and the query optimizer is not good enough to rely on completely.
So you need to tell it explicitly how to 'execute' a query, by using subqueries instead of joins.
Let's emulate your query:
SELECT table_01.number AS r
FROM numbers(87654321) AS table_01
INNER JOIN numbers(7654321) AS table_02 ON (table_01.number = table_02.number)
INNER JOIN numbers(654321) AS table_03 ON (table_02.number = table_03.number)
INNER JOIN numbers(54321) AS table_04 ON (table_03.number = table_04.number)
ORDER BY r DESC
LIMIT 8;
/*
┌─────r─┐
│ 54320 │
│ 54319 │
│ 54318 │
│ 54317 │
│ 54316 │
│ 54315 │
│ 54314 │
│ 54313 │
└───────┘
8 rows in set. Elapsed: 4.244 sec. Processed 96.06 million rows, 768.52 MB (22.64 million rows/s., 181.10 MB/s.)
*/
On my PC it takes ~4 secs. Let's rewrite it using subqueries to significantly speed it up.
SELECT number AS r
FROM numbers(87654321)
WHERE number IN (
SELECT number
FROM numbers(7654321)
WHERE number IN (
SELECT number
FROM numbers(654321)
WHERE number IN (
SELECT number
FROM numbers(54321)
)
)
)
ORDER BY r DESC
LIMIT 8;
/*
┌─────r─┐
│ 54320 │
│ 54319 │
│ 54318 │
│ 54317 │
│ 54316 │
│ 54315 │
│ 54314 │
│ 54313 │
└───────┘
8 rows in set. Elapsed: 0.411 sec. Processed 96.06 million rows, 768.52 MB (233.50 million rows/s., 1.87 GB/s.)
*/
There are other ways to optimize JOIN:
use an external dictionary to get rid of the join against a 'small' table
use the Join table engine
use ANY strictness
use specific settings like join_algorithm, partial_merge_join_optimizations, etc. (a small sketch of the last two items follows below)
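For illustration, a sketch that combines ANY strictness with an explicit join_algorithm setting; the table names big and small are made up:
SELECT b.key, s.attr
FROM big AS b
ANY LEFT JOIN small AS s ON b.key = s.key   -- ANY: take at most one matching row from the right side
SETTINGS join_algorithm = 'partial_merge';  -- choose the join implementation explicitly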
Some useful refs:
Altinity webinar: Tips and tricks every ClickHouse user should know
Altinity webinar: Secrets of ClickHouse Query Performance
Answer update:
To reduce storage consumption for the String column, consider changing the column type to LowCardinality (link 2), which significantly decreases the size of a column with many duplicated elements.
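For example, an existing column could be switched in place (the table name is an assumption; the column data is rewritten to the new type):
ALTER TABLE table_name MODIFY COLUMN uin LowCardinality(String);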
Use this query to get the size of columns:
SELECT
name AS column_name,
formatReadableSize(data_compressed_bytes) AS data_size,
formatReadableSize(marks_bytes) AS index_size,
type,
compression_codec
FROM system.columns
WHERE database = 'db_name' AND table = 'table_name'
ORDER BY data_compressed_bytes DESC
To get a numeric representation of a string, use one of the hash functions:
SELECT 'jsfhuhsdf', xxHash32('jsfhuhsdf'), cityHash64('jsfhuhsdf');
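And to mirror the Vertica trick of joining on an integer hash, one option (a sketch with made-up table names t1 and t2) is to materialize the hash once per table and join on it; since a hash alone can collide, the uin comparison is kept as well:
-- add a stored UInt64 hash of uin to both tables
ALTER TABLE t1 ADD COLUMN uin_hash UInt64 MATERIALIZED cityHash64(uin);
ALTER TABLE t2 ADD COLUMN uin_hash UInt64 MATERIALIZED cityHash64(uin);

-- join on the integer hash first, keep uin to rule out collisions
SELECT count()
FROM t1
INNER JOIN t2 ON t1.uin_hash = t2.uin_hash AND t1.uin = t2.uin;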
