I'm following Altinity's example of how to use lag/lead functions (https://kb.altinity.com/altinity-kb-queries-and-syntax/lag-lead/), but I can't find a way to replace NULLs with other values.
Using that example and adding toNullable(a), you can see that prev is NULL for the first row of each partition, while next falls back to the type's default value (1970-01-01):
SELECT
g,
a,
lagInFrame(toNullable(a)) OVER (PARTITION BY g ORDER BY a ASC Rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS prev,
leadInFrame(a) OVER (PARTITION BY g ORDER BY a ASC Rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS next
FROM llexample
ORDER BY
g ASC,
a ASC
┌─g─┬──────────a─┬───────prev─┬───────next─┐
│ 0 │ 2020-01-01 │ ᴺᵁᴸᴸ │ 2020-01-04 │
│ 0 │ 2020-01-04 │ 2020-01-01 │ 2020-01-07 │
│ 0 │ 2020-01-07 │ 2020-01-04 │ 2020-01-10 │
│ 0 │ 2020-01-10 │ 2020-01-07 │ 1970-01-01 │
│ 1 │ 2020-01-02 │ ᴺᵁᴸᴸ │ 2020-01-05 │
│ 1 │ 2020-01-05 │ 2020-01-02 │ 2020-01-08 │
│ 1 │ 2020-01-08 │ 2020-01-05 │ 1970-01-01 │
│ 2 │ 2020-01-03 │ ᴺᵁᴸᴸ │ 2020-01-06 │
│ 2 │ 2020-01-06 │ 2020-01-03 │ 2020-01-09 │
│ 2 │ 2020-01-09 │ 2020-01-06 │ 1970-01-01 │
└───┴────────────┴────────────┴────────────┘
I tried wrapping lagInFrame in COALESCE, but that fails with the following error:
SELECT
g,
a,
COALESCE(
lagInFrame(toNullable(a)) OVER (PARTITION BY g ORDER BY a ASC Rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
today()
) AS prev,
leadInFrame(a) OVER (PARTITION BY g ORDER BY a ASC Rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS next
FROM llexample
ORDER BY
g ASC,
a ASC
0 rows in set. Elapsed: 0.002 sec.
Received exception from server (version 22.1.2):
Code: 47. DB::Exception: Received from localhost:9000. DB::Exception: Unknown identifier: lagInFrame(toNullable(a)) OVER (PARTITION BY g ORDER BY a ASC Rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING); there are columns: g, a, toNullable(a): While processing g, a, coalesce(lagInFrame(toNullable(a)) OVER (PARTITION BY g ORDER BY a ASC Rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), today()) AS prev, leadInFrame(a) OVER (PARTITION BY g ORDER BY a ASC Rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS next. (UNKNOWN_IDENTIFIER)
I also tried other conditionals and got the same error.
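This is a known limitation: https://github.com/ClickHouse/ClickHouse/issues/19857
Window functions are not yet fully supported, and in this version they cannot be used as arguments to other functions such as COALESCE. Compute the window functions in a subquery and apply COALESCE in the outer query: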
SELECT
g,
a,
COALESCE(prev_nullable, today()) AS prev,
next
FROM (
SELECT
g,
a,
lagInFrame(toNullable(a)) OVER (PARTITION BY g ORDER BY a ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS prev_nullable,
leadInFrame(a) OVER (PARTITION BY g ORDER BY a ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS next
FROM llexample
)
-- sort in the outer query: an ORDER BY inside the subquery does not guarantee the final order
ORDER BY
g ASC,
a ASC
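Another option, which avoids both toNullable and the outer COALESCE, is to pass the fallback value directly as the third argument of the window function (ClickHouse documents these functions as lagInFrame(x[, offset[, default]]) and leadInFrame(x[, offset[, default]])). The sketch below assumes your server version accepts that default argument, and that a is a Date so today() has a matching type. The llexample table is a hypothetical reconstruction based on the output shown above, intended for a fresh database.

```sql
-- Hypothetical reconstruction of llexample that matches the output above:
-- g cycles 0,1,2 and a runs from 2020-01-01 to 2020-01-10.
CREATE TABLE llexample (g Int32, a Date) ENGINE = Memory;
INSERT INTO llexample SELECT number % 3, toDate('2020-01-01') + number FROM numbers(10);

-- The third argument is returned when the offset row falls outside the frame,
-- so no NULLs (and no 1970-01-01 defaults) are produced.
SELECT
g,
a,
lagInFrame(a, 1, today()) OVER (PARTITION BY g ORDER BY a ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS prev,
leadInFrame(a, 1, today()) OVER (PARTITION BY g ORDER BY a ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS next
FROM llexample
ORDER BY
g ASC,
a ASC;
```

The default has to have the same type as the first argument. If a were a DateTime, you would use now() (or a cast) instead of today().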
In PostgreSQL, we can join a table with inline (literal) rows using a VALUES list.
For example:
select *
from points p
inner join (VALUES (5, '2000-1-1'::date, 1, 1)) as x(id, create_date, store_id, supplier_id)
on p.id = x.id
Does this kind of join exist in ClickHouse? If so, how should I write it?
ClickHouse has a VALUES table function that builds an inline table (see https://github.com/ClickHouse/ClickHouse/issues/5984):
SELECT *
FROM VALUES('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three'))
┌─a─┬─s─────┐
│ 1 │ one │
│ 2 │ two │
│ 3 │ three │
└───┴───────┘
Alternatively, you can build the rows from an array of tuples with arrayJoin:
WITH
[(toUInt64(1), 'one'), (2, 'two'), (3, 'three')] AS rows_array,
arrayJoin(rows_array) AS row_tuple
SELECT
row_tuple.1 AS number_decimal,
row_tuple.2 AS number_string
┌─number_decimal─┬─number_string─┐
│ 1 │ one │
│ 2 │ two │
│ 3 │ three │
└────────────────┴───────────────┘
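To get the join from the question itself, the VALUES table function can be used directly as the right-hand side of a JOIN. Below is a minimal sketch. The points table and its name column are hypothetical stand-ins for the question's table, and the column types in the structure string are assumptions. Adjust them so that x.id has the same type as p.id.

```sql
-- Hypothetical stand-in for the question's `points` table.
CREATE TABLE points
(
id UInt64,
name String
) ENGINE = Memory;

INSERT INTO points VALUES (5, 'five'), (6, 'six');

-- Inline rows: the structure string plays the role of PostgreSQL's x(id, create_date, ...).
-- The date is written as 'YYYY-MM-DD', the format ClickHouse parses for Date.
SELECT *
FROM points AS p
INNER JOIN VALUES('id UInt64, create_date Date, store_id UInt64, supplier_id UInt64',
(5, '2000-01-01', 1, 1)) AS x ON p.id = x.id;
```

Because this is an INNER JOIN, only the points row with id = 5 should be returned, extended with the inline columns.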
Problem:
Count distinct values in an array, filtered by another array in the same row, and aggregate the result across rows.
Explanation:
Using this data:
For size D70, there are 5 pcs available (hqsize), but the shops request 15 in total. Based on the accumulatedNeed column, the first 5 stores in the shops column should receive items (since every store requests 1 pc): [4098,4101,4109,4076,4080].
The values in accumulatedNeed could also be [1,4,5,5,5,...,15], where shop 1 requests 1 pc, shop 2 requests 3 pcs, etc. In that case only 3 stores would receive items.
For size E75 there is enough stock, so every shop will receive items (10 shops).
Now I want the distinct list of shops from D70 and E75, which would be the final (wanted) result:
[4098,4101,4109,4076,4080,4062,4063,4067,4072,4075,4056,4058,4059,4061] (14 unique stores; 4109 is counted only once)
I'm open to structuring the data differently if that works better.
It can't be precalculated, because the result depends on which shops are filtered on.
Additional issue
The answer below from Vdimir is good, and I've used it as the basis for my final solution, but it does not cover partial fulfillment.
If the stock number appears in the runningNeed array, everything works, but remainders are not handled.
For example, with:
select 5 as stock,[2,2,3,3] as need, [1,2,3,4] as shops, arrayCumSum(need) as runningNeed,arrayMap(x -> (x <= stock), runningNeed) as mask
you get runningNeed = [2,4,7,10] and mask = [1,1,0,0].
This is not correct, since the 3rd shop should get 1 item from stock (5 - 2 - 2 = 1).
I can't figure out how to build an array of "stock given", which in this case would be [2,2,1,0].
I used this query to create a table with data similar to your screenshot:
CREATE TABLE t
(
Size String,
hqsize Int,
accumulatedNeed Array(Int),
shops Array(Int)
) engine = Memory;
INSERT INTO t VALUES ('D70', 5, [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], [4098,4101,4109,4076,4080,4083,4062,4063,4067,4072,4075,4056,4057,4058,4059]),('E75', 43, [1,2,3,4,5,6,7,8,9,10], [4109,4062,4063,4067,4072,4075,4056,4058,4059,4061]);
Find which shops can receive enough items:
SELECT arrayMap(x -> (x <= hqsize), accumulatedNeed) as mask FROM t;
┌─mask────────────────────────────┐
│ [1,1,1,1,1,0,0,0,0,0,0,0,0,0,0] │
│ [1,1,1,1,1,1,1,1,1,1] │
└─────────────────────────────────┘
Filter out the unfulfilled shops using this mask.
Note that shops and accumulatedNeed must have equal sizes.
SELECT arrayFilter((x,y) -> y, shops, mask) as fulfilled_shops, arrayMap(x -> (x <= hqsize), accumulatedNeed) as mask FROM t;
┌─fulfilled_shops─────────────────────────────────────┬─mask────────────────────────────┐
│ [4098,4101,4109,4076,4080] │ [1,1,1,1,1,0,0,0,0,0,0,0,0,0,0] │
│ [4109,4062,4063,4067,4072,4075,4056,4058,4059,4061] │ [1,1,1,1,1,1,1,1,1,1] │
└─────────────────────────────────────────────────────┴─────────────────────────────────┘
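For the partial-fulfillment case from the question's additional issue, a 0/1 mask is not enough. What is needed is the amount each shop actually gets. The following is a minimal sketch, assuming a per-shop need array (like need in the question) is available alongside the cumulative one. Each shop gets whatever stock is left after all earlier shops, capped at its own need and never below 0. The names given and served_shops are illustrative, not part of the original answer.

```sql
-- given_i = clamp(stock - (runningNeed_i - need_i), 0, need_i)
-- (runningNeed_i - need_i) is what all earlier shops consumed.
-- Values are cast to Int64 so the subtraction can go negative safely.
SELECT
5 AS stock,
[2, 2, 3, 3] AS need,
[1, 2, 3, 4] AS shops,
arrayCumSum(need) AS runningNeed,
arrayMap((n, r) -> greatest(0, least(toInt64(n), toInt64(stock) - (toInt64(r) - toInt64(n)))), need, runningNeed) AS given,
-- shops that receive at least one item, including partially served ones
arrayFilter((s, qty) -> qty > 0, shops, given) AS served_shops;
```

By the arithmetic (5 - 0, 5 - 2, 5 - 4, 5 - 7, each clamped to [0, need]), given should be [2,2,1,0] and served_shops [1,2,3]. served_shops can then replace fulfilled_shops in the queries that follow.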
Then you can get all distinct shops as rows:
SELECT DISTINCT arrayJoin(fulfilled_shops) as shops FROM (
SELECT arrayMap(x -> (x <= hqsize), accumulatedNeed) as mask, arrayFilter((x,y) -> y, shops, mask) as fulfilled_shops FROM t
);
┌─shops─┐
│ 4098 │
│ 4101 │
│ 4109 │
│ 4076 │
│ 4080 │
│ 4062 │
│ 4063 │
│ 4067 │
│ 4072 │
│ 4075 │
│ 4056 │
│ 4058 │
│ 4059 │
│ 4061 │
└───────┘
14 rows in set. Elapsed: 0.049 sec.
Or, if you need a single array, group them back (the order of the elements is not guaranteed):
SELECT groupArrayDistinct(arrayJoin(fulfilled_shops)) as shops FROM (
SELECT arrayMap(x -> (x <= hqsize), accumulatedNeed) as mask, arrayFilter((x,y) -> y, shops, mask) as fulfilled_shops FROM t
);
┌─shops───────────────────────────────────────────────────────────────────┐
│ [4080,4076,4101,4075,4056,4061,4062,4063,4109,4058,4067,4059,4072,4098] │
└─────────────────────────────────────────────────────────────────────────┘
If you need data only from D70 and E75, filter out the other rows with a WHERE clause first.
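Since the question asks for a count of distinct shops, here is a sketch that combines the WHERE filter with the -Array aggregate combinator. This avoids the explicit arrayJoin: groupUniqArrayArray and uniqExactArray aggregate over the elements of each array directly. It assumes the table t created above.

```sql
-- Distinct fulfilled shops and their count, restricted to the sizes of interest.
SELECT
groupUniqArrayArray(fulfilled_shops) AS distinct_shops,
uniqExactArray(fulfilled_shops) AS shop_count
FROM (
SELECT arrayFilter((s, ok) -> ok, shops, arrayMap(x -> (x <= hqsize), accumulatedNeed)) AS fulfilled_shops
FROM t
WHERE Size IN ('D70', 'E75')
);
```

With the sample data, shop_count should be 14, matching the number of rows returned by the DISTINCT query above.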