With ClickHouse, I do analytics on multi-valued dimensions. This is very easy to do with the arrayJoin function. For example:
SELECT arrayJoin(places) AS place, count() FROM hits GROUP BY place
Now let's get into dictionaries. I store the field personId as a column and use a dictionary to map personId to the person's (first) name. If I want to count hits by name, all I have to do is:
SELECT dictGetString('persons', 'name', personId) AS name, count()
FROM hits GROUP BY name
My specific use case is people with several (first) names. I would like to combine arrayJoin and dictionaries. The code I imagine would look like:
SELECT arrayJoin(dictGetStringArray('persons', 'names', personId)) AS name, count()
FROM hits GROUP BY name
dictGetStringArray doesn't seem to exist. And anyway I don't know how to map to an array in a dictionary.
Is there functionality in ClickHouse for this? Is there some kind of workaround or method to do it?
(note: my use case is not really "persons" and I don't care if it works for strings or any other type :-)
Have a look at the ARRAY JOIN clause. I think you will find your solution there - something like:
SELECT name, count()
FROM hits
ARRAY JOIN
    persons AS person_id,
    arrayMap(x -> dictGetString('persons', 'names', x), persons) AS name
GROUP BY name
The idea is to apply dictGetString to each member of the array. The above is based on the documentation linked below. Sorry, I could not reproduce it because I don't know how to use dictionaries in ClickHouse!
See docs here - good luck:
https://clickhouse.yandex/docs/en/query_language/select/#array-join-clause
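If the dictionary attribute cannot hold an array, one possible workaround (just a sketch of my own, not taken from the answer above; the names_csv attribute is hypothetical) is to store the multiple names as a single delimiter-separated String attribute and split it at query time:

SELECT
    arrayJoin(splitByChar(',', dictGetString('persons', 'names_csv', personId))) AS name,
    count()
FROM hits
GROUP BY name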
Related
I'm struggling with a query where I must select a group of named tuples from a subquery result. Sample code looks like this:
SELECT 123456 as productId,
ordersHistory -- <- this must be a named tuple like [{'date': '2022-11-10', 'orders': 4}], but now it's only [('2022-11-10', 4)]
FROM products
GLOBAL LEFT JOIN (
SELECT groupArray(tuple(date, orders)) ordersHistory,
123456 as productId
FROM (
SELECT date, orders
FROM orders
WHERE productId = 123456 AND date BETWEEN '2022-10-10' AND '2022-11-10'
)
GROUP BY productId
) orders ON products.productId = orders.productId
I've tried maps, but they expect a fixed subtype, which is not applicable to my result subset. Casting the output format to JSON in the subquery is not working either. And I have no clue how to make named tuples in other ways.
I expect the ordersHistory part of the result to be an array of key->value pairs.
Found a way to work around this problem.
First of all, we should transform our default tuple into a named tuple:
cast(tuple(date, orders), 'Tuple(date Date, orders UInt64)') AS namedTuple
Named tuples can be cast to JSON
namedTuple::JSON as jsonedTuple
But groupArray cannot work with JSON, so we should transform the JSON into some type that groupArray can handle. Strings do the job:
toString(jsonedTuple) as stringifiedJson
And this stringified JSON could be grouped.
groupArray(stringifiedJson)
This turns into the following SELECT subquery:
SELECT groupArray(toString(cast(tuple(date, orders), 'Tuple(date Date, orders UInt64)')::JSON))
Yes, this looks awful, but it does its job, and it's the only thing I've found to work around this case.
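For reference, plugging that expression back into the original query from the question would look roughly like this (a sketch only, reusing the question's table and column names):

SELECT 123456 AS productId,
       ordersHistory
FROM products
GLOBAL LEFT JOIN (
    SELECT groupArray(toString(cast(tuple(date, orders), 'Tuple(date Date, orders UInt64)')::JSON)) AS ordersHistory,
           123456 AS productId
    FROM (
        SELECT date, orders
        FROM orders
        WHERE productId = 123456 AND date BETWEEN '2022-10-10' AND '2022-11-10'
    )
    GROUP BY productId
) orders ON products.productId = orders.productId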
I have a table [Table 1] with three columns,
OrganizationName, FieldName, Acres, with data as follows:
OrganizationName | FieldName | Acres
ABC              | F1        | 0.96
ABC              | F1        | 0.96
ABC              | F1        | 0.64
I want to calculate the sum of Distinct values of Acres
(eg: 0.96+0.64) in DAX.
One of the problems with doing what you want is that many measures rely on filters and not actual table expressions. So, getting a distinct list of values and then filtering the table by those values just gives you the whole table back.
The iterator functions are handy and operate on table expressions, so try SUMX
TotalDistinctAcreage = SUMX(DISTINCT(Table1[Acres]),[Acres])
This will generate a table that is one column containing only the distinct values for Acres, and then add them up. Note that this is only looking at the Acres column, so if different fields and organizations had the same acreage, then that acreage would still only be counted once in this sum.
If instead you want to add up the acreage simply on distinct rows, then just make a small change:
TotalAcreageOnDistinctRows = SUMX(DISTINCT(Table1),[Acres])
Hope it helps.
Ok, you added these requirements:
Thank You. :) However, I want to add Distinct values of Acres for a
Particular Fieldname. Is this possible? – Pooja 3 hours ago
The easiest way really is just to go ahead and slice or filter the original measure that I gave you. But if you have to apply the filter context in DAX, you can do it like this:
Measure =
SUMX(
    FILTER(
        SUMMARIZE( Table1, Table1[FieldName], Table1[Acres] )
        , Table1[FieldName] = "<put the name of your specific field here>"
    )
    , Table1[Acres]
)
I have two tables, one called STUDENTS and the other CLASSES. I have to select all the students that are in the same class as one particular student; this student has his own ID number, and it is through this ID number that I have to select everything.
TABLE STUDENTS
nr_rgm
nm_name
nm_father
nm_mother
dt_birth
id_sex
TABLE CLASSES
cd_class
nr_schoolyear
cd_school
cd_degree
nr_series
cd_class
cd_period
So I tried something like this:
SELECT count(*) FROM students, classes WHERE id_sex = 'M' AND
cd_class = (SELECT cd_class FROM classes WHERE nr_rgm = '12150');
But then it throws an error, and the error is the following:
single-row subquery returns more than one row
So, how can I make this work?
you should use "in" and not "=" when applying subselects.
I think what you really would want to do is to simply join the two tables together rather than issuing a sub select:
SELECT count(*)
FROM students s, classes c
WHERE s.id_sex = 'M' AND c.nr_rgm = '12150' AND s.cd_class = c.cd_class;
This way you just tell the database: Please count all the occurrences where my students.id_sex = 'M' and my classes.nr_rgm = '12150' and all found students.cd_class match those of my classes.cd_class.
The reason your statement above fails is that the ordinary = operator, when not used in a join, expects one single value (as with s.id_sex = 'M'), while your subquery returns multiple values. To cope with that, you have to use the IN operator, which operates on lists.
However, you can and will achieve the very same by just joining the two tables together, and it will be much more efficient on bigger data sets.
One more note on the example above. If classes.nr_rgm is a field of data type NUMBER, don't use the ' around the value 12150, as it will lead to implicit type conversion. In other words, '12150' is a string and will have to be converted to NUMBER first before doing a comparison.
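For example, if nr_rgm really is a NUMBER, the same join simply becomes:

SELECT count(*)
FROM students s, classes c
WHERE s.id_sex = 'M' AND c.nr_rgm = 12150 AND s.cd_class = c.cd_class;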
The easiest way to ask my question is with a Hypothetical Scenario.
Let's say we have 3 tables: Singapore_Prices, Produce_val, and Bosses_unreasonable_demands.
So Prices is a pretty simple table: an Item column containing a name, and a Price column containing a number.
Produce_Val is also a simple 2-column table: a Type column containing what type the produce is (Fruit or veggie), and then a Name column (Tomato, pineapple, etc.).
The Bosses_unreasonable_demands only contains one column, Fruit, which CAN contain the names of some fruits.
OK? Ok.
So, my boss wants me to write a query that returns the prices for every fruit in his unreasonable demands table. Simple enough. BUT, if he doesn't have any entries in his table, he just wants me to output the prices of ALL fruits that exist in produce_val.
Now, assuming I don't know where the DBA who designed this silly hypothetical system lives (and therefore can't get him to fix this), our query would look like this:
if <Logic to determine if Bosses demands are empty>
Then
select Item, Price
from Singapore_Prices
where Item in (select Fruit from Bosses_Unreasonable_demands)
Else
select Item, Price
from Singapore_Prices
where Item in (select Name from Produce_val where type = 'Fruit')
end if;
(Well, we'd select those into a variable, and then output the variable, probably with bulk-collect shenanigans, but that's not important)
Which works. It is entirely functional, and won't be slow, even if we extend it out to 2,000 stores other than Singapore. (Well, no slower than anything else that touches some 2,000 tables.) BUT, I'm still doing two different select statements that are practically identical. My Comp Sci teacher rolls in their grave every time my fingers hit Ctrl-V. I can cut this code in half and only do one select statement. I KNOW I can.
I just have no earthly idea how. I can't use cursors as an in statement, I can't use nested tables or varrays, I can't use cleverly crafted strings, I... I just... I don't know. I don't know how to do this. Is there a way? Does it exist?
Or do I have to copy/paste forever?
Your best bet would be dynamic SQL, because you can't parameterize table or column names.
You would have a SQL query template and logic to determine the tables and columns you want to query, then blend them together and execute.
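A rough sketch of that idea against the question's hypothetical tables (the ref cursor handling is illustrative only, not a tested implementation):

DECLARE
  v_count   PLS_INTEGER;
  v_sql     VARCHAR2(4000);
  v_results SYS_REFCURSOR;
BEGIN
  SELECT COUNT(*) INTO v_count FROM bosses_unreasonable_demands;

  -- build the IN subquery depending on whether the boss's table is empty
  v_sql := 'SELECT item, price FROM singapore_prices WHERE item IN ('
        || CASE
             WHEN v_count > 0 THEN 'SELECT fruit FROM bosses_unreasonable_demands'
             ELSE 'SELECT name FROM produce_val WHERE type = ''Fruit'''
           END
        || ')';

  OPEN v_results FOR v_sql;
  -- fetch from v_results and output as needed ...
END;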
Another approach (still a lot of ctrl-v-like code) is to use the set construction UNION ALL:
select 1st query where boss_condition
union all
select 2nd query where not boss_condition
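Applied to the question's tables, that pattern might look something like this (a sketch; EXISTS / NOT EXISTS play the role of boss_condition):

SELECT item, price
  FROM singapore_prices
 WHERE item IN (SELECT fruit FROM bosses_unreasonable_demands)
   AND EXISTS (SELECT 1 FROM bosses_unreasonable_demands)
UNION ALL
SELECT item, price
  FROM singapore_prices
 WHERE item IN (SELECT name FROM produce_val WHERE type = 'Fruit')
   AND NOT EXISTS (SELECT 1 FROM bosses_unreasonable_demands);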
Try this:
SELECT x.*
  FROM (SELECT s.*, 'BOSS' AS FRUIT_SOURCE, c.BOSS_COUNT
          FROM BOSSES_UNREASONABLE_DEMANDS b
         INNER JOIN SINGAPORE_PRICES s
            ON s.ITEM = b.FRUIT
         CROSS JOIN (SELECT COUNT(*) AS BOSS_COUNT
                       FROM BOSSES_UNREASONABLE_DEMANDS) c
        UNION ALL
        SELECT s.*, 'NORMAL' AS FRUIT_SOURCE, c.BOSS_COUNT
          FROM PRODUCE_VAL p
         INNER JOIN SINGAPORE_PRICES s
            ON s.ITEM = p.NAME
           AND p.TYPE = 'Fruit'
         CROSS JOIN (SELECT COUNT(*) AS BOSS_COUNT
                       FROM BOSSES_UNREASONABLE_DEMANDS) c) x
 WHERE (x.BOSS_COUNT > 0 AND x.FRUIT_SOURCE = 'BOSS') OR
       (x.BOSS_COUNT = 0 AND x.FRUIT_SOURCE = 'NORMAL')
Share and enjoy.
I think you can use nested tables. Assume you have a schema-level nested table type FRUIT_NAME_LIST (defined using CREATE TYPE).
SELECT fruit
BULK COLLECT INTO my_fruit_name_list
FROM bosses_unreasonable_demands
;
IF my_fruit_name_list.count = 0 THEN
SELECT name
BULK COLLECT INTO my_fruit_name_list
FROM produce_val
WHERE type='Fruit'
;
END IF;
SELECT item, price
FROM singapore_prices
WHERE item MEMBER OF my_fruit_name_list
;
(or, WHERE item IN (SELECT column_value FROM TABLE(CAST(my_fruit_name_list AS fruit_name_list))) if you like that better)
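For completeness, a sketch of the surrounding setup this assumes (the type name comes from the answer; the VARCHAR2 length and the output loop are my own assumptions):

CREATE TYPE fruit_name_list AS TABLE OF VARCHAR2(100);
/

DECLARE
  my_fruit_name_list fruit_name_list;
BEGIN
  SELECT fruit BULK COLLECT INTO my_fruit_name_list
    FROM bosses_unreasonable_demands;

  IF my_fruit_name_list.COUNT = 0 THEN
    SELECT name BULK COLLECT INTO my_fruit_name_list
      FROM produce_val
     WHERE type = 'Fruit';
  END IF;

  FOR r IN (SELECT item, price
              FROM singapore_prices
             WHERE item MEMBER OF my_fruit_name_list)
  LOOP
    DBMS_OUTPUT.PUT_LINE(r.item || ': ' || r.price);
  END LOOP;
END;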
I am working in C#.NET and Oracle. I am passing a string to a query. I had used this code for concatenating all the item IDs:
List<string> listRetID = new List<string>();
foreach (DataRow row in dtNew.Rows)
{
listRetID.Add(row[3].ToString());
}
This concatenation goes above 10,000 items, so I am getting an error message like this:
ORA-01795: maximum number of expressions in a list is 1000
How can I fix this?
The documentation states:
A comma-delimited list of expressions can contain no more than 1000
expressions. A comma-delimited list of sets of expressions can contain
any number of sets, but each set can contain no more than 1000
expressions.
Presumably you're using this string as the contents of an IN (...) restriction, in which case there isn't really anything you can do - this just won't work. A common way to work around this is to generate a dummy table as a subquery or common table expression (CTE) and join to that, but I'm not sure how you'd translate your List - possibly similar to whatever you're doing with your IN clause. You'd want to end up with your query looking something like:
with tmp_tab as (
select <val1 from list> as val from dual
union all select <val2 from list> from dual
union all select <val3 from list> from dual
...
)
select <something>
from <your table> yt
join tmp_tab tt on yt.<field> = tt.val
But that requires generating the entire (huge) query including the CTE each time you run it, and there's no opportunity to use bind variables.
You might find something like this approach more palatable.
You can have 10 lists of 1000 items instead of 1 list of 10000 items.
WHERE some_column IN (1,2,...,1000)
OR some_column IN (1001,1002,...2000) -- etc.
Not a C# guy, but I would just split the list listRetID into multiple lists, or create a list of lists.
Then loop through that list of lists and perform the query on each element of the list.
What is the intent of your query?
It looks like you are selecting rows that have some column equal to the 3rd column of one of the records of some query.
The correct way of doing this is either an SQL join or a subquery. There is absolutely no need to bring this into C# code. For example, using a subquery you can write something like this:
SELECT *
FROM atable
WHERE afield IN (
SELECT field3
FROM someothertable)