How to select named tuple/json/map with multiple types from subquery in clickhouse? - clickhouse

I'm struggling with a query, where I must select a group of named tuples from a subquery result. Sample code looks like this:
SELECT 123456 as productId,
ordersHistory # <- this must be a named tuple like [{'date': '2022-11-10', 'orders': 4}], but now it's only [('2022-11-10', 4)]
FROM products
GLOBAL LEFT JOIN (
SELECT groupArray(tuple(date, orders)) ordersHistory,
123456 as productId,
FROM (
SELECT date, orders
FROM orders
WHERE productId = 123456 AND date BETWEEN '2022-10-10' AND '2022-11-10'
)
GROUP BY productId
) orders ON products.productId = orders.productId
I've tried maps, but they expect to have a fixed subtype, which is not applicable to my result subset. Cast output format to JSON in subquery is not working either. And I have no clue, how to make named tuples in other ways.
I expect ordersHistory subset of result to be an array of key->value pairs.

Found a way to workaround this problem.
First of all, we should transform our default tuple to named tuple
cast(tuple(date, orders), Tuple(date Date, orders UInt64)) AS namedTuple
Named tuples can be cast to JSON
namedTuple::JSON as jsonedTuple
But groupArray could not work with JSON. So we should transform JSON to some type, which is appliable to group array. Strings are the case
toString(jsonedTuple) as stringifiedJson
And this stringified JSON could be grouped.
groupArray(stringifiedJSON)
This transformes into SELECT subquery
SELECT groupArray(toString(cast(tuple(date, orders), Tuple(date Date, orders UInt64))::JSON))
Yes. This looks awful, but it does it's job and this is only one thing, that I've found to workaround this case.

Related

Sum of only Distinct values in a Column in DAX

I have table[Table 1] having three columns
OrganizationName, FieldName, Acres having data as follows
organizationname fieldname Acres
ABC |F1 |0.96
ABC |F1 |0.96
ABC |F1 |0.64
I want to calculate the sum of Distinct values of Acres
(eg: 0.96+0.64) in DAX.
One of the problems with doing what you want is that many measures rely on filters and not actual table expressions. So, getting a distinct list of values and then filtering the table by those values, just gives you the whole table back.
The iterator functions are handy and operate on table expressions, so try SUMX
TotalDistinctAcreage = SUMX(DISTINCT(Table1[Acres]),[Acres])
This will generate a table that is one column containing only the distinct values for Acres, and then add them up. Note that this is only looking at the Acres column, so if different fields and organizations had the same acreage -- then that acreage would still only be counted once in this sum.
If instead you want to add up the acreage simply on distinct rows, then just make a small change:
TotalAcreageOnDistinctRows = SUMX(DISTINCT(Table1),[Acres])
Hope it helps.
Ok, you added these requirements:
Thank You. :) However, I want to add Distinct values of Acres for a
Particular Fieldname. Is this possible? – Pooja 3 hours ago
The easiest way really is just to go ahead and slice or filter the original measure that I gave you. But if you have to apply the filter context in DAX, you can do it like this:
Measure =
SUMX(
FILTER(
SUMMARIZE( Table1, [FieldName], [Value] )
, [FieldName] = "<put the name of your specific field here"
)
, [Value]
)

Can I use FOR ALL ENTRIES with GROUP BY?

Currently the code looks something like this:
LOOP AT lt_orders ASSIGNING <fs_order>.
SELECT COUNT(*) AS cnt
FROM order_items
INTO <fs_order>-cnt
WHERE order_id = <fs_order>-order_id.
ENDLOOP.
It is the slowest part of the report. I want to speed it up.
How can I use FOR ALL ENTRIES with GROUP BY?
Check the documentation. You can't use GROUP BY. Maybe in this case, you could try selecting your items with FAE outside of the loop, then count them using a parallel cursor:
REPORT.
TYPES: BEGIN OF ty_result,
vbeln TYPE vbeln,
cnt TYPE i.
TYPES: END OF ty_result.
DATA: lt_headers TYPE SORTED TABLE OF ty_result WITH UNIQUE KEY vbeln,
lv_tabix TYPE sy-tabix VALUE 1.
"get the headers
SELECT vbeln FROM vbak UP TO 100 ROWS INTO CORRESPONDING FIELDS OF TABLE lt_headers.
"get corresponding items
SELECT vbeln, posnr FROM vbap FOR ALL ENTRIES IN #lt_headers
WHERE vbeln EQ #lt_headers-vbeln
ORDER BY vbeln, posnr
INTO TABLE #DATA(lt_items).
LOOP AT lt_headers ASSIGNING FIELD-SYMBOL(<h>).
LOOP AT lt_items FROM lv_tabix ASSIGNING FIELD-SYMBOL(<i>).
IF <i>-vbeln NE <h>-vbeln.
lv_tabix = sy-tabix.
EXIT.
ELSE.
<h>-cnt = <h>-cnt + 1.
ENDIF.
ENDLOOP.
ENDLOOP.
BREAK-POINT.
Or join header/item with a distinct count on the item id (whichever column that would be in your table).
You should be able to do something like
SELECT COUNT(order_item_id) AS cnt, order_id
FROM order_items
INTO CORRESPONDING FIELDS OF TABLE lt_count
GROUP BY order_id.
Assuming that order_item_id is a key in the order_items table. And assuming that lt_count has two fields: cnt of type int8 and order_id of same type as your other order_id fields
PS: then you can loop over lt_count and move the counts to lt_orders. Or the other way around. To speed up the loop, sort one of the tables and use READ ... BINARY SEARCH
I did with table KNB1 (customer master in company code), where we have customers, which are created in several company codes.
Please note, because of FOR ALL ENTRIES you have to SELECT the full key.
TYPES: BEGIN OF ty_knb1,
kunnr TYPE knb1-kunnr,
count TYPE i,
END OF ty_knb1.
TYPES: BEGIN OF ty_knb1_fae,
kunnr TYPE knb1-kunnr,
END OF ty_knb1_fae.
DATA: lt_knb1_fae TYPE STANDARD TABLE OF ty_knb1_fae.
DATA: lt_knb1 TYPE HASHED TABLE OF ty_knb1
WITH UNIQUE KEY kunnr.
DATA: ls_knb1 TYPE ty_knb1.
DATA: ls_knb1_db TYPE knb1.
START-OF-SELECTION.
lt_knb1_fae = VALUE #( ( kunnr = ... ) ). "add at least one customer which is created in several company codes
ls_knb1-count = 1.
SELECT kunnr bukrs
INTO CORRESPONDING FIELDS OF ls_knb1_db
FROM knb1
FOR ALL ENTRIES IN lt_knb1_fae
WHERE kunnr EQ lt_knb1_fae-kunnr.
ls_knb1-kunnr = ls_knb1_db-kunnr.
COLLECT ls_knb1 INTO lt_knb1.
ENDSELECT.
Create a range table for your lt_orders, like lt_orders_range.
Do select order_id, count( * ) where order_id in lt_orders_range.
If you think this is too much to create a range table, you will save a lot of performance by running just one select for all orders instead of single select for each order id.
Not directly, only through a CDS view
While all of the answers provide a faster solution than the one in the question, the fastest way is not mentioned.
If you have at least Netweaver 7.4, EHP 5 (and you should, it was released in 2014), you can use CDS views, even if you are not on HANA.
It still cannot be done directly, as OpenSQL does not allow FOR ALL ENTRIES with GROUP BY, and CDS views cannot handle FOR ALL ENTRIES. However, you can create one of each.
CDS:
#AbapCatalog.sqlViewName: 'zorder_i_fae'
DEFINE VIEW zorder_items_fae AS SELECT FROM order_items {
order_id,
count( * ) AS cnt,
}
GROUP BY order_id
OpenSQL:
SELECT *
FROM zorder_items_fae
INTO TABLE #DATA(lt_order_cnt)
FOR ALL ENTRIES IN #lt_orders
WHERE order_id = #lt_orders-order_id.
Speed
If lt_orders contains more than about 30% of all possible order_id values from table ORDER_ITEMS, the answer from iPirat is faster. (While using more memory, obviously)
However, if you need only a couple hunderd order_id values out of millions, this solution is about 10 times faster than any other answer, and 100 times faster than the original.

ClickHouse: is there a way to do arrayJoin( DictGetArray( ...) )

With ClickHouse, I do analytics on multi-valued dimensions. This is very easy to do with the arrayJoin function. For example:
SELECT arrayJoin(places) AS place, count() FROM hits GROUP BY place
Now let's get into dictionaries. I store the field personId as a column and use a dictionary to map personId to the person's (first) name. If want to count hits by name, all I have to do is:
SELECT dictGetString('persons', 'name', personId) AS name, count()
FROM hits GROUP BY name
My specific use case is people with several (first) names. I would like to combine arrayJoin and dictionaries. The code I imagine would look like:
SELECT arrayJoin(dictGetStringArray('persons', 'names', personId)) AS name, count()
FROM hits GROUP BY name
dictGetStringArray doesn't seem to exist. And anyway I don't know how to map to an array in a dictionary.
Is there a functionality in ClickHouse about this? Is there a kind of workaround or method to do it?
(note: my use case is not really "persons" and I don't care if it works for strings or any other type :-)
Have a look at the array join clause. I think you will find your solutions - something like:
SELECT person_id, count
FROM hits
ARRAY JOIN persons AS person_id,
arrayMap(lambda(tuple(x), dictGetString('persons', 'names', x)), persons) as name
GROUP BY name
The idea is to apply dictGetString to each member of the array. The above is based on the documentation below. Sorry could not reproduce it because I don't know how to use dictionaries in clickhouse!
See docs here - good luck:
https://clickhouse.yandex/docs/en/query_language/select/#array-join-clause

How to order by field mixed with string and number in Oracle?

These are the field (crane_no) values to be sorted
QC11QC10QC9
I tried the following query:
select * from table order by crane_no DESC
but query results does not give in an order because the field is mixed with staring and number (Example:QC12).
I get following results for above query:
QC9QC11QC10
I want the results to be in order (QC9, QC10, QC11). Thanks
If the data isn't huge, I'd use a regex order by clause:
select
cran_no
from your_table
order by
regexp_substr(cran_no, '^\D*') nulls first,
to_number(regexp_substr(cran_no, '\d+'))
This looks for the numbers in the string, so rows like 'QCC20', 'DCDS90' are ordered properly; it also takes care of nulls.
One approach is to extract the numeric portion of the crane_no columns using SUBSTR(), cast to an integer, and order descending by this value.
SELECT *
FROM yourTable
ORDER BY CAST(SUBSTR(crane_no, 3) AS INT) DESC
Note that I assume in my answer that every entry in crane_no is prefixed with the fixed width QC. If not, then we would have to do more work to identify the numerical component.
select ...
order by to_number( substr( crane_no,3 )) desc

Oracle Error : Maximum number of expressions in a list is 1000

I am working in C#.Net and Oracle. i am passing a string to a query. i had used this code for concating all the item id's
List<string> listRetID = new List<string>();
foreach (DataRow row in dtNew.Rows)
{
listRetID.Add(row[3].ToString());
}
This concatination goes above 10,000. so i am getting the error message like this..
ORA-01795: maximum number of expressions in a list is 1000
How to fix this..
The documentation states:
A comma-delimited list of expressions can contain no more than 1000
expressions. A comma-delimited list of sets of expressions can contain
any number of sets, but each set can contain no more than 1000
expressions.
Presumably you're using this string as the contents of in IN (...) restriction, in which case there isn't really anything you can do - this just won't work. A common way to work around this is to generate a dummy table as a subquery or common table expression (CTE) and joining to that, but I'm not sure how you'd translate your List - possibly similar to whatever you're doing with your IN clause. You'd want to end up with your query looking something like:
with tmp_tab as (
select <val1 from list> as val from dual
union all select <val2 from list from dual
union all select <val3 from list from dual
...
)
select <something>
from <your table> yt
join tmp_tab tt on yt.<field> = tt.val
But that requires generating the entire (huge) query including the CTE each time you run it, and there's no opportunity to use bind variables.
You might find something like this approach more palatable.
You can have 10 lists of 1000 items instead of 1 list of 10000 items.
WHERE some_column IN (1,2,...,1000)
OR some_column IN (1001,1002,...2000) -- etc.
Not a C# guy but I would just split the list listRetID in multiple lists or create a list of lists
Then loop through that list of lists and perform the query on each element of the list.
What is the intent of your query?
It looks like you are selecting rows that have some column equal to the 3rd column of one of the records of some query.
The correct way of doing this is either an SQL join or a subquery. There is absolutely no need to bring this into C# code. For example, using a subquery you can write something like this:
SELECT *
FROM atable
WHERE afield IN (
SELECT field3
FROM someothertable)

Resources