Panel data in DolphinDB

I have the following table in DolphinDB:
n = 1000000
t1 = table(rand(`a`b`c, n) as ID, rand(2017.08.16..2018.09.15, n) as date, rand(10.0, n) as price)
What I am trying to do is pivot the table into panel format, where all dates are in one column and each ID becomes its own column, for example:
date a b c
2017.08.16 1.1246 4.6269 2.3019
2017.08.17 7.8525 9.7178 2.916741
...
Is there an efficient way to do it in DolphinDB?

You can try this:
panel(t1.date,t1.ID,t1.price, 2017.08.16..2018.09.15)
Another way:
select price from t1 pivot by date,ID

Related

Google Sheets Formula - Get Total from filtered dates per row (undefined number of columns)

I have data in Google Sheets where I need to get the total of the filtered date columns per row. The date columns are not fixed (they may increase over time; I already know how to handle this undefined number of columns). My current challenge is how to efficiently get a summary of totals per user based on the filtered date columns.
My data, my expected result, and my current idea are shown in this sample spreadsheet for reference:
https://docs.google.com/spreadsheets/d/1_dByPabStGQvh94TabKxwFeUyVaRFnkBCRf4ioTY5jM/edit?usp=sharing
This is a method to unpivot the data so you can work with it:
=ARRAYFORMULA(
 QUERY(
  IFERROR(
   SPLIT(
    FLATTEN(
     IF(ISBLANK(A2:A),,A2:A&"|"&B1:G1&"|"&B2:G)),
    "|")),
  "select Col1, Sum(Col3)
   where
    Col2 >= "&DATE(2022,1,1)&" and
    Col2 <= "&DATE(2022,1,15)&"
   group by Col1
   label
    Col1 'Person',
    Sum(Col3) 'Total'"))
Basically, it's creating an output of User1|44557|8 -- it then FLATTENs it all and splits by the pipe, which gives you three clean columns.
Run that through a QUERY to SUM by the person between the dates and you get what you're after. If you want to use cell references for the dates, simply replace the DATE() calls with the cell references (see the sketch below).
To expand the table, change B1:G1 and B2:G to match the width of the range.
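For reference, a sketch of the cell-reference variant, assuming the start and end dates sit in J1 and J2 (hypothetical cells); dates in Sheets are serial numbers, so concatenating the cells behaves the same as the DATE() calls above:
=ARRAYFORMULA(
 QUERY(
  IFERROR(
   SPLIT(
    FLATTEN(
     IF(ISBLANK(A2:A),,A2:A&"|"&B1:G1&"|"&B2:G)),
    "|")),
  "select Col1, Sum(Col3)
   where
    Col2 >= "&J1&" and
    Col2 <= "&J2&"
   group by Col1
   label
    Col1 'Person',
    Sum(Col3) 'Total'"))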

Power Bi count rows for all tables in one measure

In my Power BI report I would like to count the rows of all my tables, producing output like this:
Table Name    Row count
Table1        126
Table2        985
Table3        998
...           ...
As long as I have only a few tables I can do
NEWTABLE = UNION(
    ROW("TableName", "Table1", "Rowcount", COUNTROWS(Table1)),
    ROW("TableName", "Table2", "Rowcount", COUNTROWS(Table2)),
    ...
)
But this starts to get complicated when I have many tables.
Is there a way I can do it? Like a loop or something?
Thank you
If you only need the metrics, you can use DAX Studio -> View Metrics,
where Cardinality is your row count.
If you need something more, you can get all the table names from the DMV:
select * from $SYSTEM.TMSCHEMA_TABLES
Populate this as another table in your model, and use the M language to loop through (a minimal sketch follows below).
Here is a useful example:
https://community.powerbi.com/t5/Power-Query/Power-query-Counting-rows-from-all-table-in-query-editor-but-not/td-p/1198489
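For illustration, a minimal Power Query (M) sketch of that loop, assuming the queries to count are named Table1, Table2 and Table3 (hypothetical names); note that #shared resolves query names in Power BI Desktop:
let
    // list of query/table names to count (hypothetical names)
    TableNames = {"Table1", "Table2", "Table3"},
    // look each name up in the shared environment and count its rows
    Counts = List.Transform(
        TableNames,
        each [TableName = _, RowCount = Table.RowCount(Expression.Evaluate(_, #shared))]),
    Result = Table.FromRecords(Counts)
in
    Result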

Can I use FOR ALL ENTRIES with GROUP BY?

Currently the code looks something like this:
LOOP AT lt_orders ASSIGNING <fs_order>.
  SELECT COUNT(*) AS cnt
    FROM order_items
    INTO <fs_order>-cnt
    WHERE order_id = <fs_order>-order_id.
ENDLOOP.
It is the slowest part of the report. I want to speed it up.
How can I use FOR ALL ENTRIES with GROUP BY?
Check the documentation: you can't use GROUP BY with FOR ALL ENTRIES. Maybe in this case you could try selecting your items with FAE outside of the loop, then count them using a parallel cursor:
REPORT.

TYPES: BEGIN OF ty_result,
         vbeln TYPE vbeln,
         cnt   TYPE i,
       END OF ty_result.

DATA: lt_headers TYPE SORTED TABLE OF ty_result WITH UNIQUE KEY vbeln,
      lv_tabix   TYPE sy-tabix VALUE 1.

"get the headers
SELECT vbeln FROM vbak UP TO 100 ROWS
  INTO CORRESPONDING FIELDS OF TABLE lt_headers.

"get corresponding items
SELECT vbeln, posnr FROM vbap FOR ALL ENTRIES IN @lt_headers
  WHERE vbeln EQ @lt_headers-vbeln
  ORDER BY vbeln, posnr
  INTO TABLE @DATA(lt_items).

"parallel cursor: items are sorted, so each header resumes where the previous one stopped
LOOP AT lt_headers ASSIGNING FIELD-SYMBOL(<h>).
  LOOP AT lt_items FROM lv_tabix ASSIGNING FIELD-SYMBOL(<i>).
    IF <i>-vbeln NE <h>-vbeln.
      lv_tabix = sy-tabix.
      EXIT.
    ELSE.
      <h>-cnt = <h>-cnt + 1.
    ENDIF.
  ENDLOOP.
ENDLOOP.

BREAK-POINT.
Or join header/item with a distinct count on the item id (whichever column that would be in your table).
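For illustration, a hedged sketch of that join approach in newer Open SQL, assuming a header table named orders (hypothetical, mirroring the question's naming):
SELECT h~order_id, COUNT( DISTINCT i~order_item_id ) AS cnt
  FROM orders AS h
  INNER JOIN order_items AS i ON i~order_id = h~order_id
  GROUP BY h~order_id
  INTO TABLE @DATA(lt_counts). "one row per order with its item count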
You should be able to do something like
SELECT COUNT( order_item_id ) AS cnt, order_id
  FROM order_items
  GROUP BY order_id
  INTO CORRESPONDING FIELDS OF TABLE @lt_count.
This assumes that order_item_id is a key in the order_items table, and that lt_count has two fields: cnt of type int8, and order_id of the same type as your other order_id fields.
PS: then you can loop over lt_count and move the counts to lt_orders (or the other way around). To speed up the loop, sort one of the tables and use READ ... BINARY SEARCH, as in the sketch below.
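A minimal sketch of that final loop, assuming lt_orders and lt_count carry order_id and cnt fields (names taken from the question):
SORT lt_count BY order_id.
LOOP AT lt_orders ASSIGNING FIELD-SYMBOL(<fs_order>).
  "binary search works because lt_count is now sorted by order_id
  READ TABLE lt_count ASSIGNING FIELD-SYMBOL(<fs_cnt>)
       WITH KEY order_id = <fs_order>-order_id BINARY SEARCH.
  IF sy-subrc = 0.
    <fs_order>-cnt = <fs_cnt>-cnt.
  ENDIF.
ENDLOOP.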
I did it with table KNB1 (customer master at company code level), where we have customers that are created in several company codes.
Please note: because of FOR ALL ENTRIES you have to SELECT the full key.
TYPES: BEGIN OF ty_knb1,
         kunnr TYPE knb1-kunnr,
         count TYPE i,
       END OF ty_knb1.
TYPES: BEGIN OF ty_knb1_fae,
         kunnr TYPE knb1-kunnr,
       END OF ty_knb1_fae.

DATA: lt_knb1_fae TYPE STANDARD TABLE OF ty_knb1_fae.
DATA: lt_knb1     TYPE HASHED TABLE OF ty_knb1
                  WITH UNIQUE KEY kunnr.
DATA: ls_knb1     TYPE ty_knb1.
DATA: ls_knb1_db  TYPE knb1.

START-OF-SELECTION.

  lt_knb1_fae = VALUE #( ( kunnr = ... ) ). "add at least one customer which is created in several company codes

  ls_knb1-count = 1.
  SELECT kunnr bukrs
         INTO CORRESPONDING FIELDS OF ls_knb1_db
         FROM knb1
         FOR ALL ENTRIES IN lt_knb1_fae
         WHERE kunnr EQ lt_knb1_fae-kunnr.
    ls_knb1-kunnr = ls_knb1_db-kunnr.
    COLLECT ls_knb1 INTO lt_knb1.
  ENDSELECT.
Create a range table for your lt_orders, like lt_orders_range.
Then do select order_id, count( * ) where order_id in lt_orders_range (a minimal sketch follows below).
Even if creating a range table feels like extra effort, you will save a lot of time by running just one select for all orders instead of a single select for each order id.
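A minimal sketch, with hypothetical names mirroring the question:
DATA lt_orders_range TYPE RANGE OF order_items-order_id.

"build the range table from lt_orders
lt_orders_range = VALUE #( FOR ls_order IN lt_orders
                           ( sign = 'I' option = 'EQ' low = ls_order-order_id ) ).

"one aggregated SELECT instead of one SELECT per order id
SELECT order_id, COUNT( * ) AS cnt
  FROM order_items
  WHERE order_id IN @lt_orders_range
  GROUP BY order_id
  INTO TABLE @DATA(lt_count).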
Not directly, only through a CDS view
While all of the answers provide a faster solution than the one in the question, the fastest way is not mentioned.
If you have at least NetWeaver 7.4, EHP 5 (and you should; it was released in 2014), you can use CDS views, even if you are not on HANA.
It still cannot be done directly, as OpenSQL does not allow FOR ALL ENTRIES with GROUP BY, and CDS views cannot handle FOR ALL ENTRIES. However, you can create one of each.
CDS:
@AbapCatalog.sqlViewName: 'zorder_i_fae'
DEFINE VIEW zorder_items_fae AS SELECT FROM order_items {
  order_id,
  count( * ) AS cnt
}
GROUP BY order_id
OpenSQL:
SELECT *
  FROM zorder_items_fae
  INTO TABLE @DATA(lt_order_cnt)
  FOR ALL ENTRIES IN @lt_orders
  WHERE order_id = @lt_orders-order_id.
Speed
If lt_orders contains more than about 30% of all possible order_id values from table ORDER_ITEMS, the answer from iPirat is faster (while using more memory, obviously).
However, if you need only a couple hundred order_id values out of millions, this solution is about 10 times faster than any other answer, and 100 times faster than the original.

Hive: select the most recent item from the set

I'm looking for a way of choosing the most recent item (date) from a set in Hive. For instance, I have the following table t1:
item date
a 2016-01-01
a 2016-02-04
b 2016-01-10
After running
hive> select item, collect_set(date) as dates from t1 group by item;
I get
item dates
a [2016-01-01, 2016-02-04]
b [2016-01-10]
So now I need to get rid of obsolete dates, i.e., create a table like
item date
a 2016-02-04
b 2016-01-10
Can anyone help?
Just use max():
select item, max(date) as date
from t1
group by item;
If you actually want a new table, you can use create table as.
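A minimal sketch of that, reusing the t1 schema from the question (date may need backticks, since it is a reserved word in newer Hive versions):
-- materialize the latest date per item into a new table
CREATE TABLE t1_latest AS
SELECT item, MAX(`date`) AS `date`
FROM t1
GROUP BY item;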

VisualStudio 2010 - How do you display only YEAR in a chart?

This is the code I used in my dataset. The result is as shown in the image.
The plan is to display the YEAR without month, day or time. Is this feasible?
SELECT t.Name AS Territoryname,
       p.LastName AS SalesPerson,
       c.CardType AS PayType,
       s.OrderDate,
       s.TotalDue
FROM Sales s
JOIN Person p
  ON s.SalesPersonID = p.BusinessEntityID
JOIN CreditCard c
  ON c.CreditCardID = s.CreditCardID
JOIN Territory t
  ON t.TerritoryID = s.TerritoryID
There are a number of ways to achieve this. The easiest is to calculate a year value directly in your dataset, and then use this field in your chart. This is a simple matter of adding the following to the SELECT part of your dataset:
select
...
s.OrderDate,
YEAR(s.OrderDate) AS [OrderYear], -- This is the new field
s.TotalDue
from
...
Alternatively, you can create an expression that does the same thing, either as a calculated field in your dataset, or directly in the chart as the category field.
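For example, a minimal sketch of such an expression, assuming the dataset field is named OrderDate:
=Year(Fields!OrderDate.Value)
Use it as the chart's category group expression, or as the expression of a calculated field added to the dataset.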
