argMax of two columns in ClickHouse


Is it possible to get the id for the maximum value of timestamp and duration? I am looking for a query like:
SELECT name, argMax(id, (timestamp, duration)) FROM tables GROUP BY name

It's unclear what you mean by the maximum, so here are two interpretations.
ClickHouse compares tuples element by element, from left to right (https://clickhouse.com/docs/en/sql-reference/data-types/tuple/):
select (2022, 1, 1) > (2021, 12, 31);
┌─greater((2022, 1, 1), (2021, 12, 31))─┐
│ 1 │
└───────────────────────────────────────┘
So, to get the id of the row with the latest timestamp, using duration only to break ties, use:
SELECT name, argMax(id, (timestamp, duration))
FROM tables
GROUP BY name
ClickHouse also has the function greatest (https://clickhouse.com/docs/en/sql-reference/functions/other-functions/#greatesta-b), which returns the larger of its arguments:
select greatest(2021, 2023);
┌─greatest(2021, 2023)─┐
│ 2023 │
└──────────────────────┘
So, to get the id of the row where the larger of timestamp and duration is highest, use:
SELECT name, argMax(id, greatest(timestamp, duration))
FROM tables
GROUP BY name
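The two queries can pick different rows. The following is a minimal plain-Python model of that difference, not ClickHouse code; the rows are made-up sample data for a single name group, and Python's tuple comparison stands in for ClickHouse's left-to-right tuple comparison.

```python
# Made-up sample rows for one group: (id, timestamp, duration).
rows = [
    (1, 100, 5),
    (2, 100, 9),
    (3, 90, 500),
]

# Models argMax(id, (timestamp, duration)): tuples compare left to right,
# so duration only matters when the timestamps are equal.
by_tuple = max(rows, key=lambda r: (r[1], r[2]))[0]

# Models argMax(id, greatest(timestamp, duration)): each row is ranked
# by the larger of its two values.
by_greatest = max(rows, key=lambda r: max(r[1], r[2]))[0]

print(by_tuple, by_greatest)
```

With this data the tuple version picks id 2 (latest timestamp, then longest duration), while the greatest version picks id 3, because its duration of 500 exceeds every other value.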

Related

Clickhouse. Get value from json

I use a ClickHouse database. There is a table with a String column (data). All rows contain data like:
'[{"a":23, "b":1}]'
'[{"a":7, "b":15}]'
I want to get all values of the key "b":
1
15
The following query:
Select JSONExtractInt('data', 0, 'b') from table
returns 0 every time. How can I get the values of the key "b"?

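The original query returns 0 because 'data' in quotes is a string literal rather than the column, and because indexes in the JSON functions start at 1, so element 0 does not exist. One way to get the values is to extract the whole array with a typed JSONExtract and take the field from it:
SELECT tupleElement(JSONExtract(j, 'Array(Tuple(a Int64, b Int64))'), 'b')[1] AS res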
FROM
(
SELECT '[{"a":23, "b":1}]' AS j
UNION ALL
SELECT '[{"a":7, "b":15}]'
)
┌─res─┐
│ 1 │
└─────┘
┌─res─┐
│ 15 │
└─────┘

Get checksum (cityHash64) of first n rows of ClickHouse table

According to https://clickhouse.tech/docs/en/sql-reference/functions/hash-functions/,
I can get a checksum of an entire table this way:
SELECT groupBitXor(cityHash64(*)) FROM table
What is the most accurate way to get a checksum of first N rows of a table?
As an example, I'm using a table with GenerateRandom engine as stated here.
CREATE TABLE test (name String, value UInt32) ENGINE = GenerateRandom(1, 5, 3)
I tried using a LIMIT clause, but with no luck so far.

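Apply LIMIT in a subquery so that it takes effect before the aggregation (add an ORDER BY there if you need the same x rows on every run):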
SELECT groupBitXor(cityHash64(*))
FROM (
SELECT *
FROM table
LIMIT x)
For example, with system.numbers:
SELECT groupBitXor(cityHash64(*))
FROM
(
SELECT *
FROM system.numbers
LIMIT 10
)
/*
┌─groupBitXor(cityHash64(number))─┐
│ 9791317254842948406 │
└─────────────────────────────────┘
*/
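To show what the subquery version computes, here is a minimal Python sketch. It is a model only: blake2b truncated to 8 bytes is a stand-in for cityHash64 (which is not in the Python standard library), and the names row_hash and checksum_first_n are hypothetical.

```python
import hashlib
from functools import reduce
from itertools import islice

def row_hash(row):
    # Stand-in 64-bit hash of a whole row; ClickHouse would use cityHash64(*).
    digest = hashlib.blake2b(repr(row).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def checksum_first_n(rows, n):
    # Take the first n rows (the LIMIT subquery), then XOR-fold their
    # hashes (the groupBitXor aggregate).
    return reduce(lambda acc, r: acc ^ row_hash(r), islice(rows, n), 0)

rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
first_two = checksum_first_n(rows, 2)
# XOR is order-independent: the same two rows in another order give the same value.
swapped = checksum_first_n([rows[1], rows[0]], 2)
print(first_two == swapped)
```

Because XOR is commutative, the checksum depends only on which rows end up in the first N, not on their order within that set.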

ClickHouse - Split arrayMap to columns to sort on

I have a ClickHouse query question. I'm pretty new to ClickHouse, so maybe it's an easy one for the experts ;)! We have a single table of events, and each event is linked to a product, e.g. product_click or product_view. I want to extract the data grouped by product, with one row per product and each event type in a separate column so that I can sort on it.
I already wrote this query:
SELECT product_id,
arrayMap((x, y) -> (x, y),
(arrayReduce('sumMap', [(groupArrayArray([event_type]) as arr)],
[arrayResize(CAST([], 'Array(UInt64)'), length(arr), toUInt64(1))]) as s).1, s.2) events
FROM events
GROUP BY product_id
Result:
┌─────────────────────────product_id───┬─events─────────────────────────────────────────────────────────────────────────────────────┐
│ 0071f1e4-a484-448e-8355-64e2fea98fd5 │ [('PRODUCT_CLICK',1341),('PRODUCT_VIEW',11)] │
│ 406f4707-6bad-4d3f-9544-c74fdeb1e09d │ [('PRODUCT_CLICK',1),('PRODUCT_VIEW',122),('PRODUCT_BUY',37)] │
│ 94566b6d-6e23-4264-ad76-697ffcfe60c4 │ [('PRODUCT_CLICK',1027),('PRODUCT_VIEW',7)] │
...
Is there any way to convert the arrayMap result to columns with a sort key, so that we can list the most clicked products first, or the most viewed?
Another question: is it a good idea to always execute this kind of query, or should we create a MATERIALIZED view for it?
Thanks!

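SQL does not allow a variable number of columns.
The only option is to list each event type explicitly as its own column, which you can then sort on (for example, by appending ORDER BY PRODUCT_CLICK DESC):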
SELECT product_id,
countIf(event_type = 'PRODUCT_CLICK') PRODUCT_CLICK,
countIf(event_type = 'PRODUCT_VIEW') PRODUCT_VIEW,
countIf(event_type = 'PRODUCT_BUY') PRODUCT_BUY
FROM events
GROUP BY product_id
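As a rough model of what the countIf query produces, the sketch below pivots a made-up list of events into one row per product with a fixed set of count columns, then sorts by clicks. It is plain Python rather than ClickHouse code, and the sample data and the EVENT_TYPES list are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Made-up sample events: (product_id, event_type).
events = [
    ("p1", "PRODUCT_CLICK"), ("p1", "PRODUCT_VIEW"), ("p1", "PRODUCT_CLICK"),
    ("p2", "PRODUCT_VIEW"), ("p2", "PRODUCT_BUY"),
]

# The column list is fixed up front, just as each countIf is written out by hand.
EVENT_TYPES = ["PRODUCT_CLICK", "PRODUCT_VIEW", "PRODUCT_BUY"]

# GROUP BY product_id, counting each event type separately.
counts = defaultdict(Counter)
for product_id, event_type in events:
    counts[product_id][event_type] += 1

# One row per product; a missing event type counts as 0, like countIf.
table = [(pid, *(c[t] for t in EVENT_TYPES)) for pid, c in counts.items()]

# Equivalent of ORDER BY PRODUCT_CLICK DESC.
table.sort(key=lambda row: row[1], reverse=True)
print(table)
```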

clickhouse: How do I find the least date in array that is above date in another column?

I have a table with the following structure:
id_level1: Int32
id_level2: Int32
event_date: Date
arr_object_ids: Array of Int32 - sorted by next column
arr_object_dates: Array of Date - sorted ascending
What I need is the least object_date that is above event_date for each pair of (id_level1, id_level2). How is that possible in ClickHouse?
Then I would use arrayElement(arr_object_ids, indexOf(arr_object_dates, solution)) to get the corresponding object_id.

Try this query:
SELECT
id_level1,
id_level2,
/*arrayFirst(x -> x > event_date, arr_object_dates) least_date,*/
arrayFirstIndex(x -> x > event_date, arr_object_dates) least_date_index,
least_date_index = 0 ? -1 : arrayElement(arr_object_ids, least_date_index) object_id /* -1 if result not found */
FROM (
/* emulate original table */
SELECT 1 id_level1, 2 id_level2, '2020-01-03' event_date,
[4, 5, 6,7] arr_object_ids,
['2020-01-01', '2020-01-03', '2020-01-06', '2020-01-11'] arr_object_dates
UNION ALL
SELECT 3 id_level1, 4 id_level2, '2020-05-03' event_date,
[4, 5, 6,7] arr_object_ids,
['2020-01-01', '2020-01-03', '2020-01-06', '2020-01-11'] arr_object_dates)
ORDER BY event_date
/* result
┌─id_level1─┬─id_level2─┬─least_date_index─┬─object_id─┐
│ 1 │ 2 │ 3 │ 6 │
│ 3 │ 4 │ 0 │ -1 │
└───────────┴───────────┴──────────────────┴───────────┘
*/
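The query relies on arr_object_dates being sorted ascending, so the first date above event_date is also the least one. A minimal Python model of that logic follows; array_first_index and object_id_after are hypothetical helper names, with the former mimicking arrayFirstIndex by returning a 1-based position, or 0 when nothing matches.

```python
from datetime import date

def array_first_index(pred, arr):
    # 1-based index of the first element satisfying pred, or 0 if none does.
    for i, x in enumerate(arr, start=1):
        if pred(x):
            return i
    return 0

def object_id_after(event_date, object_ids, object_dates):
    # object_dates must be sorted ascending for "first match" to mean "least date".
    idx = array_first_index(lambda d: d > event_date, object_dates)
    return -1 if idx == 0 else object_ids[idx - 1]

ids = [4, 5, 6, 7]
dates = [date(2020, 1, 1), date(2020, 1, 3), date(2020, 1, 6), date(2020, 1, 11)]

print(object_id_after(date(2020, 1, 3), ids, dates))
print(object_id_after(date(2020, 5, 3), ids, dates))
```

The two calls reuse the sample rows from the query above: the first finds 2020-01-06 at position 3 and returns object id 6, and the second finds no later date and returns -1.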

Migrating Virtual Columns from Oracle to Postgres

What options does one have for dealing with virtual columns when migrating from Oracle 11 to Postgres 9.5 without having to change database-related code in an application? That rules out functions and views, and triggers are way too expensive because we are dealing with large data sets.
A similar question exists (Computed / calculated columns in PostgreSQL), but its solutions do not help with the migration scenario.

If you use a BEFORE INSERT trigger, you can modify the values inserted before they actually are written. That shouldn't be very expensive. If cutting edge performance is required, write the trigger function in C.
But I think that a view is the best solution. You can use an updatable view, that way you wouldn't have to change the application code:
CREATE TABLE data(
id integer PRIMARY KEY,
factor1 integer NOT NULL,
factor2 integer NOT NULL
);
CREATE VIEW interface AS
SELECT id, factor1, factor2,
factor1 * factor2 AS product
FROM data;
test=> INSERT INTO interface VALUES (1, 6, 7), (2, 3, 14);
INSERT 0 2
test=> UPDATE interface SET factor1 = 7 WHERE id = 1;
UPDATE 1
test=> DELETE FROM interface WHERE id = 1;
DELETE 1
test=> SELECT * FROM interface;
┌────┬─────────┬─────────┬─────────┐
│ id │ factor1 │ factor2 │ product │
├────┼─────────┼─────────┼─────────┤
│ 2 │ 3 │ 14 │ 42 │
└────┴─────────┴─────────┴─────────┘
(1 row)
