ClickHouse: difference between elements with different indexes in the array

I have the following challenge.
We have pricing ranges that are collected into an array with groupArray, like
[[1,1000],[1001,2000],[2003,5000]]
Is there any way to get the difference between the second element of array x and the first element of array x+1?
As a result, I need something like
[1,3]
Alternatively, I can flatten the list and apply arrayDifference to all the elements, but then I get
[0,999,1,999,3,2997]
How can I pick out only the differences I need (here 1 and 3) using only ClickHouse functions?

Try this query:
SELECT
arrs,
arrayMap(index -> arrs[index][1] - arrs[index - 1][2], range(2, length(arrs) + 1)) AS result
FROM
(
/* test data set */
SELECT [] AS arrs
UNION ALL
SELECT [[1, 1000]] AS arrs
UNION ALL
SELECT [[1, 1000], [1001, 2000]] AS arrs
UNION ALL
SELECT [[1, 1000], [1001, 2000], [2003, 5000]] AS arrs
UNION ALL
SELECT [[1, 1000], [1001, 2000], [2003, 5000], [5008, 7890]] AS arrs
)
/*
┌─arrs───────────────────────────────────────────┬─result──┐
│ []                                             │ []      │
│ [[1,1000]]                                     │ []      │
│ [[1,1000],[1001,2000]]                         │ [1]     │
│ [[1,1000],[1001,2000],[2003,5000]]             │ [1,3]   │
│ [[1,1000],[1001,2000],[2003,5000],[5008,7890]] │ [1,3,8] │
└────────────────────────────────────────────────┴─────────┘
*/
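As a cross-check of the index arithmetic, here is a plain-Ruby model of both approaches (an illustration only, not ClickHouse code; `range_gaps` and `gaps_via_flatten` are hypothetical names). The first mirrors the `arrayMap` over `range(2, length(arrs) + 1)`. The second follows the question's flatten idea and assumes every inner array has exactly two elements, so that after the leading 0 that `arrayDifference` produces, the wanted gaps sit at 0-based positions 2, 4, and so on.

```ruby
# Gap between the end of range i-1 and the start of range i,
# mirroring arrs[index][1] - arrs[index - 1][2] for index = 2..length(arrs).
def range_gaps(arrs)
  arrs.each_cons(2).map { |prev, cur| cur[0] - prev[1] }
end

# The flatten-and-difference route from the question.
# Assumes each inner array is exactly [start, end].
def gaps_via_flatten(arrs)
  flat = arrs.flatten
  # Like arrayDifference: the first element is 0, then adjacent differences.
  diffs = [0] + flat.each_cons(2).map { |a, b| b - a }
  # diffs = [0, width1, gap1, width2, gap2, ...]: keep even 0-based positions >= 2.
  diffs.each_with_index.select { |_, i| i.even? && i >= 2 }.map(&:first)
end

p range_gaps([[1, 1000], [1001, 2000], [2003, 5000]])       # => [1, 3]
p gaps_via_flatten([[1, 1000], [1001, 2000], [2003, 5000]]) # => [1, 3]
```

The index-based version does not depend on the inner arrays having exactly two elements, which is one reason to prefer it over filtering a flattened difference list.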

Related

How to aggregate an array type in ClickHouse

This is the example table:
exampleTable:
id | weeklyNumber
---|-------------
1  | [2,5,9]
2  | [1,10,4]
The expected result is the element-wise aggregation (sum) of the weeklyNumber arrays, which is
[3,15,13] (2+1, 5+10, 9+4)
I have no idea how to do this.
----- update ----
In addition, we have many rows in the table below:
exampleTable:
id | weeklyNumber | monthlyNumber
---|--------------|--------------
1  | [2,5,9]      | [20,50,90]
2  | [1,10,4]     | [10,100,40]
The result should be [2/20 + 1/10, 5/50 + 10/100, 9/90 + 4/40]. How can I do that?
Use the -ForEach aggregate function combinator:
SELECT sumForEach(weeklyNumber)
FROM
(
SELECT
1 AS id,
[2, 5, 9] AS weeklyNumber
UNION ALL
SELECT
2 AS id,
[1, 10, 4] AS weeklyNumber
)
/*
┌─sumForEach(weeklyNumber)─┐
│ [3,15,13]                │
└──────────────────────────┘
*/
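The -ForEach combinator applies the aggregate position by position across the arrays. A plain-Ruby model of that behaviour (an illustration only; `sum_for_each` is a hypothetical name) is a transpose followed by a per-column sum. It assumes all arrays have the same length, because `Array#transpose` raises `IndexError` otherwise; the sketch makes no claim about how ClickHouse treats arrays of different lengths.

```ruby
# Element-wise sum over rows, mirroring sumForEach(weeklyNumber).
# Assumes every row array has the same length (Array#transpose raises IndexError otherwise).
def sum_for_each(rows)
  # transpose turns rows into columns: [[2, 1], [5, 10], [9, 4]]
  rows.transpose.map(&:sum)
end

p sum_for_each([[2, 5, 9], [1, 10, 4]]) # => [3, 15, 13]
```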
In some cases this query can be used instead:
SELECT arrayReduce('sumForEach', groupArray(weeklyNumber))
FROM
(
SELECT
1 AS id,
[2, 5, 9] AS weeklyNumber
UNION ALL
SELECT
2 AS id,
[1, 10, 4] AS weeklyNumber
)
/*
┌─arrayReduce('sumForEach', groupArray(weeklyNumber))─┐
│ [3,15,13]                                           │
└─────────────────────────────────────────────────────┘
*/
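UPDATE: for the weekly/monthly ratios, divide element-wise with arrayMap first, then sum the resulting arrays with sumForEach: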
SELECT sumForEach(arrayMap((x, y) -> (x / y), weeklyNumber, monthlyNumber)) AS result
FROM
(
SELECT
1 AS id,
[2, 5, 9] AS weeklyNumber,
[20, 50, 90] AS monthlyNumber
UNION ALL
SELECT
2 AS id,
[1, 10, 4] AS weeklyNumber,
[10, 100, 40] AS monthlyNumber
)
/*
┌─result────────┐
│ [0.2,0.2,0.2] │
└───────────────┘
*/
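The order of operations matters here: the division happens inside each row (via `arrayMap`), and only the per-row ratio arrays are summed across rows. A plain-Ruby model of the same two steps (an illustration only; `sum_of_ratios` and the `:weekly`/`:monthly` keys are hypothetical):

```ruby
# Mirrors sumForEach(arrayMap((x, y) -> x / y, weeklyNumber, monthlyNumber)):
# first divide within each row, then sum position by position across rows.
def sum_of_ratios(rows)
  rows
    .map { |r| r[:weekly].zip(r[:monthly]).map { |x, y| x.fdiv(y) } }
    .transpose
    .map(&:sum)
end

rows = [
  { weekly: [2, 5, 9], monthly: [20, 50, 90] },
  { weekly: [1, 10, 4], monthly: [10, 100, 40] }
]
p sum_of_ratios(rows)
```

`fdiv` forces floating-point division, so the results are fractional like the ClickHouse output above rather than truncated integer quotients.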

How to remove the first occurrence of an element in an array in ClickHouse?

I want to remove the first occurrence of a given element (here 1), like this:
Input :
[2,1,4,1,1,3]
Output:
[2,4,1,1,3]
Try this query:
SELECT
data.1 AS value,
data.2 AS array,
arrayFilter((x, groupPosition) -> groupPosition != 1 OR x != value, array, arrayEnumerateUniq(array)) AS result
FROM
(
/* test data: the first item is the value whose first occurrence is removed, the second one is the array */
SELECT arrayJoin([
(1, [2, 1, 4, 1, 1, 3]), /* exclude the first occurrence of 1 */
(2, [2, 1, 4, 1, 1, 3]), /* exclude the first occurrence of 2 */
(3, [2, 1, 4, 1, 1, 3]), /* .. */
(4, [2, 1, 4, 1, 1, 3]),
(5, [2, 1, 4, 1, 1, 3]),
(1, []),
(1, [1]),
(1, [1, 1, 1])
]) AS data
)
/* Result:
┌─value─┬─array─────────┬─result────────┐
│     1 │ [2,1,4,1,1,3] │ [2,4,1,1,3]   │
│     2 │ [2,1,4,1,1,3] │ [1,4,1,1,3]   │
│     3 │ [2,1,4,1,1,3] │ [2,1,4,1,1]   │
│     4 │ [2,1,4,1,1,3] │ [2,1,1,1,3]   │
│     5 │ [2,1,4,1,1,3] │ [2,1,4,1,1,3] │
│     1 │ []            │ []            │
│     1 │ [1]           │ []            │
│     1 │ [1,1,1]       │ [1,1]         │
└───────┴───────────────┴───────────────┘
*/
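`arrayEnumerateUniq` numbers each element by how many times its value has appeared so far, so the first occurrence of any value gets 1. The filter keeps everything except the element that both equals `value` and carries number 1. A plain-Ruby model of that logic (an illustration only; `remove_first_occurrence` is a hypothetical name):

```ruby
def remove_first_occurrence(value, array)
  seen = Hash.new(0)
  # Like arrayEnumerateUniq: 1 for the first occurrence of a value, 2 for the second, ...
  positions = array.map { |x| seen[x] += 1 }
  # Keep everything except the element that equals value AND is its first occurrence.
  array.zip(positions)
       .reject { |x, pos| x == value && pos == 1 }
       .map(&:first)
end

p remove_first_occurrence(1, [2, 1, 4, 1, 1, 3]) # => [2, 4, 1, 1, 3]
```

In ordinary Ruby, `Array#index` followed by `delete_at` would be the more direct route; the version above deliberately mirrors the query step by step.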

Is there any function to change a Tuple to an Array, or to sum an Array by key?

Q1 and Q2 are the same question from different sides.
If the data is stored as tuple(key, value), can any SQL get the same result?
(1,3)(2,5)(4,7)
(1,3)(2,5)(3,4)
(2,3)(7,5)(10,4)
Q1: sumMap can turn arrays into a tuple, but how do I turn the tuple back into arrays?
select sumMap(a, b) from (
select array(1,2,4) as a, array(3,5,7) as b
union all
select array(1,2,3) as a, array(3,5,4) as b
union all
select array(2,7,10) as a, array(3,5,4) as b);
│ ([1,2,3,4,7,10],[6,13,4,7,5,4]) │
This SQL fails:
select sumMap(a, b).[0], sumMap(a, b).[1] from tbl
Expected output:
[1,2,3,4,7,10] [6,13,4,7,5,4]
Q2: How do I sum arrays by key, like sumMap does?
select array(1,2,4) as a, array(3,5,7) as b
union all
select array(1,2,3) as a, array(3,5,4) as b
union all
select array(2,7,10) as a, array(3,5,4) as b
│ [1,2,4] │ [3,5,7] │
│ [2,7,10]│ [3,5,4] │
│ [1,2,3] │ [3,5,4] │
This SQL fails:
select sumBykey(a, a), sumBykey(b, a).key2 from tbl
Expected output:
[1,2,3,4,7,10] [6,13,4,7,5,4]
Use the tuple element access operators (.1 and .2):
SELECT
sumMap(a, b) AS summap,
summap.1 AS a1,
summap.2 AS a2
FROM
(
SELECT [1, 2, 4] AS a, [3, 5, 7] AS b
UNION ALL
SELECT [1, 2, 3] AS a, [3, 5, 4] AS b
UNION ALL
SELECT [2, 7, 10] AS a, [3, 5, 4] AS b
)
/* Result:
┌─summap──────────────────────────┬─a1─────────────┬─a2─────────────┐
│ ([1,2,3,4,7,10],[6,13,4,7,5,4]) │ [1,2,3,4,7,10] │ [6,13,4,7,5,4] │
└─────────────────────────────────┴────────────────┴────────────────┘
*/
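To see what `.1` and `.2` pull out of the tuple, here is a rough plain-Ruby model of `sumMap` (an illustration only; `sum_map` is a hypothetical name): values are added per key across all rows, and the result is a pair of arrays with the keys in sorted order, matching the sorted keys in the output above. Ruby's multiple assignment then plays the role of the tuple element access.

```ruby
# Rough model of sumMap(keys, values): add up values per key across all rows,
# then return [sorted keys, matching sums], like the tuple summap.
def sum_map(key_rows, value_rows)
  totals = Hash.new(0)
  key_rows.zip(value_rows) do |keys, values|
    keys.zip(values) { |k, v| totals[k] += v }
  end
  sorted = totals.sort_by { |k, _| k }
  [sorted.map(&:first), sorted.map(&:last)]
end

summap = sum_map([[1, 2, 4], [1, 2, 3], [2, 7, 10]],
                 [[3, 5, 7], [3, 5, 4], [3, 5, 4]])
a1, a2 = summap # plays the role of summap.1 and summap.2
p a1 # => [1, 2, 3, 4, 7, 10]
p a2 # => [6, 13, 4, 7, 5, 4]
```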
At the time of writing, sumMap supports only numeric keys and values. For keys of other types, hash them first:
SELECT
sumMap(arrayMap(x -> xxHash32(x), a), b) AS summap,
summap.1 AS a1,
summap.2 AS a2
FROM
(
SELECT ['1', '2', '4'] AS a, [3, 5, 7] AS b
UNION ALL
SELECT ['1', '2', '3'] AS a, [3, 5, 4] AS b
UNION ALL
SELECT ['2', '7', '10'] AS a, [3, 5, 4] AS b
)
/* Result:
┌─summap─────────────────────────────────────────────────────────────────────────────┬─a1────────────────────────────────────────────────────────────────┬─a2─────────────┐
│ ([205742900,548432130,1150380693,1842982710,2632741828,3068971186],[13,5,4,7,4,6]) │ [205742900,548432130,1150380693,1842982710,2632741828,3068971186] │ [13,5,4,7,4,6] │
└────────────────────────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────┴────────────────┘
*/
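One drawback of the hashing workaround is that a1 now holds hash values rather than the original string keys. A sketch of one way around that, in plain Ruby (an illustration only): keep a reverse lookup from hash value to original key, and fail loudly on a collision. `Zlib.crc32` stands in for `xxHash32` here, which is an assumption, so the hash values will differ from the ClickHouse output; `sum_map_hashed` is a hypothetical name.

```ruby
require 'zlib'

# Model of the "hash non-numeric keys" workaround, plus a reverse lookup so the
# original string keys can be recovered. Zlib.crc32 is a stand-in for xxHash32
# (assumption: the hash values differ, only the technique carries over).
def sum_map_hashed(key_rows, value_rows)
  totals = Hash.new(0)  # hash value => running sum
  original = {}         # hash value => original string key
  key_rows.zip(value_rows) do |keys, values|
    keys.zip(values) do |key, value|
      h = Zlib.crc32(key)
      if original.key?(h) && original[h] != key
        raise "hash collision between #{original[h].inspect} and #{key.inspect}"
      end
      original[h] = key
      totals[h] += value
    end
  end
  # Translate hash values back to the original keys.
  totals.map { |h, sum| [original[h], sum] }.to_h
end

result = sum_map_hashed([%w[1 2 4], %w[1 2 3], %w[2 7 10]],
                        [[3, 5, 7], [3, 5, 4], [3, 5, 4]])
p result
```

In plain Ruby the hashing step is unnecessary, since Hash keys can be strings; it is kept here only to mirror the SQL workaround.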

How to have a lambda max(a,b) function in ClickHouse?

Is there a way, using ClickHouse's lambdas, to apply a max function to two integers?
Like so:
SELECT
[0,1,2,3,4,5] as five,
arrayMap(i -> max(five[i], 3), arrayEnumerate(five)) as X
Expected result:
five        | expected X
0,1,2,3,4,5 | 3,3,3,3,4,5
I'm not sure I understand your example (it is not syntactically correct), but for the maximum of two integers ClickHouse has the function greatest(x, y):
SELECT
[0, 1, 2, 3, 4, 5] AS five,
arrayMap(i -> greatest(i, 3), five) AS X
┌─five──────────┬─X─────────────┐
│ [0,1,2,3,4,5] │ [3,3,3,3,4,5] │
└───────────────┴───────────────┘
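In ClickHouse, `max` is an aggregate function, while `greatest` is an ordinary function that compares its arguments, which is why it can be used inside a lambda. The index-based form from the question should also work once `max` is replaced: `arrayMap(i -> greatest(five[i], 3), arrayEnumerate(five))`. A plain-Ruby illustration of both shapes (not ClickHouse code), where `[v, 3].max` plays the role of `greatest(v, 3)`:

```ruby
five = [0, 1, 2, 3, 4, 5]

# Like arrayMap(i -> greatest(i, 3), five): map over the values directly.
by_value = five.map { |v| [v, 3].max }

# Like arrayMap(i -> greatest(five[i], 3), arrayEnumerate(five)): map over indexes.
by_index = five.each_index.map { |i| [five[i], 3].max }

p by_value # => [3, 3, 3, 3, 4, 5]
p by_index # => [3, 3, 3, 3, 4, 5]
```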

Checking cell combination in a table

I'm trying to implement an algorithm for table verification, but I'm confused by the many options Ruby provides for working with arrays and hashes, and I need help putting it all together. Consider this table as an example:
| A | B | C |
-------------
| 1 | 2 | 3 |
| 1 | 2 | 4 |
| 5 | 6 | 7 |
| 1 | 1 | 3 |
My method should count the number of occurrences of a specific cell combination. For example:
match_line([A => 1, C => 3])
The result should be 2, as this combination exists in both the first row and the last one.
So far I have created a hash that holds the column indexes, like so:
[A => 0, B => 1, C=> 2]
I also have an array that holds all the table rows above, like so:
[[1, 2, 3], [1, 2, 4], [5, 6, 7], [1, 1, 3]]
The logic is as follows: the match_line call above specifies that the user wants to match rows where column A has the value 1 and column C has the value 3. Based on my index hash, the index of column A is 0 and the index of column C is 2. For each array (row) in the list, if index 0 equals 1 and index 2 equals 3, as the user requested, I add 1 to a counter, and I keep going over the remaining rows until I'm done.
I tried to turn this into code, but what I ended up with seems very inefficient. I'd like to see a code example, in case Ruby has built-in Enumerable methods I'm not aware of that would make it more elegant.
First, you should use the best available structure to describe your domain:
data = [[1, 2, 3], [1, 2, 4], [5, 6, 7], [1, 1, 3]]
@data_hashes = data.map do |sequence|
{ 'A' => sequence[0], 'B' => sequence[1], 'C' => sequence[2] }
end
Second, I think you should use a real Hash as input for match_line :
# replace match_line([A => 1, C => 3]) with
match_line({'A' => 1, 'C' => 3})
Now you're all set for an easy implementation using Enumerable#select and Array#size (or Array#count, as pointed out by Keith Bennet):
def match_line(match)
@data_hashes.count { |row|
match.all? { |match_key, match_value|
row[match_key] == match_value
}
}
end
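If you would rather keep the original row arrays and the column-index mapping from the question, `Enumerable#count` with a block still gives a short solution. A sketch (the `COLUMN_INDEX`, `ROWS`, and `match_line_by_index` names are hypothetical):

```ruby
# The question's column-index mapping, written as a real Hash.
COLUMN_INDEX = { 'A' => 0, 'B' => 1, 'C' => 2 }.freeze
ROWS = [[1, 2, 3], [1, 2, 4], [5, 6, 7], [1, 1, 3]].freeze

# Count rows whose value in every requested column equals the requested value.
def match_line_by_index(match, rows = ROWS, index = COLUMN_INDEX)
  rows.count do |row|
    match.all? { |column, value| row[index.fetch(column)] == value }
  end
end

p match_line_by_index({ 'A' => 1, 'C' => 3 }) # => 2
```

`Hash#fetch` raises `KeyError` for an unknown column name instead of silently comparing against `nil`.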
EDIT: Dynamically create the Hash from column names
columns = ['A', 'B', 'C']
data = [[1, 2, 3], [1, 2, 4], [5, 6, 7], [1, 1, 3]]
@data_hashes = data.map do |row|
Hash[columns.zip(row)]
end
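For a version that runs as-is, the pieces above can be wrapped in a small class so the row hashes live in an instance variable rather than at the top level. A sketch (the `Table` class is a hypothetical name; `Array#to_h` is used in place of `Hash[...]`, with the same result):

```ruby
# Build row hashes from column names, then count rows matching every requested column.
class Table
  def initialize(columns, rows)
    @data_hashes = rows.map { |row| columns.zip(row).to_h }
  end

  def match_line(match)
    @data_hashes.count do |row|
      match.all? { |key, value| row[key] == value }
    end
  end
end

table = Table.new(%w[A B C], [[1, 2, 3], [1, 2, 4], [5, 6, 7], [1, 1, 3]])
p table.match_line({ 'A' => 1, 'C' => 3 }) # => 2
```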
