Optimization problems with LINQ and searching within a GroupBy

I have a Collection 'Col':
ID|Type|Type1|Type2|Count
1| a | a | b |2
1| b | a | b |1
2| d | b | d |2
2| b | b | d |2
3| x | y | x |0
3| y | y | x |1
I want to return a collection which contains, for each unique ID, the possible types, the sum of the types' counts, and the individual count for each type.
Ideally, the output should be:
ID|Type1|Type2|TotalCount|Type1Count|Type2Count
1 | a | b | 3 | 2 | 1
2 | b | d | 4 | 2 | 2
3 | y | x | 1 | 0 | 1
Currently, I do a GroupBy and a Select on the collection:
Col.GroupBy(g => new { g.ID, g.Type1, g.Type2 })
   .Select(s => new
   {
       s.Key.ID,
       s.Key.Type1,
       s.Key.Type2,
       TotalCount = s.Sum(x => x.Count),
       Type1Count = s.Count(x => x.Type == s.Key.Type1), // adding this individual count severely slows down the query
       Type2Count = s.Count(x => x.Type == s.Key.Type2)  // same here
   });
As indicated above, including Type1Count and Type2Count makes the query drastically slower than it is without those columns. I have tried various other ways to get the value of the "Count" column from the initial Col table for each grouped row, but again performance takes a big hit. The right results are eventually returned, though.
Is there an easier, better and/or faster way to achieve the same results?
Any help is appreciated!
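One possible direction, sketched here in Python rather than LINQ, is to visit every row once and accumulate all three numbers per group key instead of re-scanning each group for every output column. This is only a sketch under an assumption: each per-type count is taken to be the sum of Count over the rows whose Type equals Type1 (or Type2), which matches the desired output for IDs 1 and 2 but is not what s.Count(...) in the posted query computes (that counts rows). With this reading, ID 3 yields 1 for type y and 0 for type x, because the row with Type y carries Count 1. The function name summarize is hypothetical.

```python
from collections import defaultdict

# Rows mirror the 'Col' table from the question: (ID, Type, Type1, Type2, Count).
rows = [
    (1, "a", "a", "b", 2),
    (1, "b", "a", "b", 1),
    (2, "d", "b", "d", 2),
    (2, "b", "b", "d", 2),
    (3, "x", "y", "x", 0),
    (3, "y", "y", "x", 1),
]


def summarize(rows):
    """Single pass: accumulate total, Type1 count and Type2 count per group key."""
    # (ID, Type1, Type2) -> [total, type1_count, type2_count]
    acc = defaultdict(lambda: [0, 0, 0])
    for id_, type_, type1, type2, count in rows:
        totals = acc[(id_, type1, type2)]
        totals[0] += count
        # ASSUMPTION: a type's count is the sum of Count for rows of that Type.
        if type_ == type1:
            totals[1] += count
        if type_ == type2:
            totals[2] += count
    return {key: tuple(values) for key, values in acc.items()}


result = summarize(rows)
for (id_, type1, type2), (total, count1, count2) in sorted(result.items()):
    print(id_, type1, type2, total, count1, count2)
```

The same idea could be expressed in LINQ by summing a conditional expression per group, so each group is enumerated a fixed number of times regardless of its size.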

Related

BigQuery: Sample a varying number of rows per group

I have two tables. One has a list of items, and for each item, a number n.
item | n
--------
a | 1
b | 2
c | 3
The second one has a list of rows containing item, uid, and other columns.
item | uid | data
------------------
a | x | foo
a | x | baz
a | x | bar
a | z | arm
a | z | leg
b | x | eye
b | x | eye
b | x | eye
b | x | eye
b | z | tap
c | y | tip
c | z | top
I would like to sample n rows for each (item, uid) pair (arbitrary rows are fine; uniformly random is better, but not required). In the example above, I want to keep at most one row per user for item a, two rows per user for item b, and three rows per user for item c:
item | uid | data
------------------
a | x | baz
a | z | arm
b | x | eye
b | x | eye
b | z | tap
c | y | tip
c | z | top
ARRAY_AGG with LIMIT n doesn't work for two reasons: first, I suspect that given that n can be large (on the order of 100,000), this won't scale. The second, more fundamental problem is that n needs to be a constant.
Table sampling also doesn't seem to solve my problem, since it's per-table, and also only supports sampling a fixed percentage of rows, rather than a fixed number of rows.
Are there any other options?
Consider the solution below:
select * except(n)
from rows_list
join items_list
using(item)
where true
qualify row_number() over win <= n
window win as (partition by item, uid order by rand())
Applied to the sample data in your question, this returns at most n rows per (item, uid) pair, chosen at random.
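The logic of the query (join each row to its item's n, shuffle the rows of each (item, uid) partition, keep at most n of them) can be sketched in plain Python. This is an illustration of the technique only, not BigQuery code; the function name sample_per_group and the seeded random generator are assumptions made for the example.

```python
import random
from collections import defaultdict

# items_list: item -> n, as in the first table of the question.
items_list = {"a": 1, "b": 2, "c": 3}

# rows_list: (item, uid, data), as in the second table of the question.
rows_list = [
    ("a", "x", "foo"), ("a", "x", "baz"), ("a", "x", "bar"),
    ("a", "z", "arm"), ("a", "z", "leg"),
    ("b", "x", "eye"), ("b", "x", "eye"), ("b", "x", "eye"), ("b", "x", "eye"),
    ("b", "z", "tap"),
    ("c", "y", "tip"), ("c", "z", "top"),
]


def sample_per_group(rows, limits, rng):
    """Keep at most limits[item] randomly chosen rows per (item, uid) pair."""
    groups = defaultdict(list)
    for row in rows:
        item, uid, _data = row
        groups[(item, uid)].append(row)

    sampled = []
    for (item, _uid), members in groups.items():
        n = limits.get(item)
        if n is None:
            continue  # like the inner join: items without an n are dropped
        # Equivalent of: row_number() over (partition ... order by rand()) <= n
        sampled.extend(rng.sample(members, min(n, len(members))))
    return sampled


sampled = sample_per_group(rows_list, items_list, random.Random(0))
for row in sorted(sampled):
    print(row)
```

Which rows survive in the larger groups depends on the random seed; only the per-group row counts are fixed.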

Merge concept in Oracle

**Before Merge:**
***Source table as SRC***
-------------
| A | B | C |
-------------
| a | b | 10|
| c | d | 20|
| w | x | 30|
| w | y | 40|
| w | z | 50|
-------------
***Target Table as TGT***
--------------
| D | E | F |
--------------
| a | b | null|
| c | e | null|
| w | m | null|
| w | n | null|
| w | o | null|
-------------
***After Merge:***
***Target table as TGT***
-----------
| D | E | F |
------------
| a | b | 10|
| c | e | 20|
| w | m | 50|
-----------
I have shown two tables above: a source table and a target table.
I want to merge the two tables and store the result in the target table in Oracle.
Logic:
1st logic: column A matches column D and column B matches column E, and only one match is found.
Ex: A = D = a and B = E = b, so update column F with the value of column C, i.e. F = 10
2nd logic: if the 1st logic finds nothing, column A matches column D but column B does not match column E, and only one match is found.
Ex: A = D = c, so update column F with the value of column C, i.e. F = 20
3rd logic: if the 1st and 2nd logic find nothing, column A matches column D, column B does not match column E, and multiple matches are found; then update column F with the highest value of column C.
Ex: A = D = w gives 3 rows; out of those we select the highest value, i.e. 50, store it in column F (F = 50) in one row, and remove the other two rows.
After merging, the number of rows is reduced. How can I write this using the MERGE concept in Oracle?
You can use a correlated sub-query as follows:
Update tgt t
Set t.f = (Select Coalesce(Max(case when s.b = t.e then s.c end), Max(s.c))
From src s
Where s.a = t.d)
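To make the COALESCE logic concrete, here is a small Python sketch that applies the same rule to the sample tables: prefer the highest C among source rows matching on both A = D and B = E, and otherwise fall back to the highest C among rows matching on A = D alone. It mirrors only the UPDATE; removing the extra w rows that the question asks for is not covered by the statement, nor by this sketch. The function name updated_f is hypothetical.

```python
# Source rows (A, B, C) and target rows (D, E, F) from the question.
src = [("a", "b", 10), ("c", "d", 20), ("w", "x", 30), ("w", "y", 40), ("w", "z", 50)]
tgt = [("a", "b", None), ("c", "e", None), ("w", "m", None), ("w", "n", None), ("w", "o", None)]


def updated_f(d, e, src):
    """Return the new F value for a target row with key columns (d, e)."""
    # Max(case when s.b = t.e then s.c end): rows matching on both columns.
    exact = [c for a, b, c in src if a == d and b == e and c is not None]
    # Max(s.c): rows matching on the first column only.
    same_a = [c for a, b, c in src if a == d and c is not None]
    if exact:
        return max(exact)
    if same_a:
        return max(same_a)
    return None  # no source row for this key: the sub-query yields NULL


merged = [(d, e, updated_f(d, e, src)) for d, e, _f in tgt]
for row in merged:
    print(row)
```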

MDX - filter empty outside of selected range

Cube is populated with data divided into time dimension ( period ) which represents a month.
Following query:
select non empty {[Measures].[a], [Measures].[b], [Measures].[c]} on columns,
{[Period].[Period].ALLMEMBERS} on rows
from MyCube
returns:
+--------+----+---+--------+
| Period | a | b | c |
+--------+----+---+--------+
| 2 | 3 | 2 | (null) |
| 3 | 5 | 3 | 1 |
| 5 | 23 | 2 | 2 |
+--------+----+---+--------+
Removing non empty
select {[Measures].[a], [Measures].[b], [Measures].[c]} on columns,
{[Period].[Period].ALLMEMBERS} on rows
from MyCube
Renders:
+--------+--------+--------+--------+
| Period | a | b | c |
+--------+--------+--------+--------+
| 1 | (null) | (null) | (null) |
| 2 | 3 | 2 | (null) |
| 3 | 5 | 3 | 1 |
| 4 | (null) | (null) | (null) |
| 5 | 23 | 2 | 2 |
| 6 | (null) | (null) | (null) |
+--------+--------+--------+--------+
What I would like to get is all records from period 2 to period 5: the first occurrence of a value in measure "a" marks the start of the range, and the last occurrence marks the end.
This works, but I need the range to be calculated dynamically at runtime by MDX:
select non empty {[Measures].[a], [Measures].[b], [Measures].[c]} on columns,
{[Period].[Period].&[2] :[Period].[Period].&[5]} on rows
from MyCube
desired output:
+--------+--------+--------+--------+
| Period | a | b | c |
+--------+--------+--------+--------+
| 2 | 3 | 2 | (null) |
| 3 | 5 | 3 | 1 |
| 4 | (null) | (null) | (null) |
| 5 | 23 | 2 | 2 |
+--------+--------+--------+--------+
I tried looking for first/last values but couldn't compose them into the query properly. Has anyone had this issue before? This should be pretty common, since I want a continuous financial report that doesn't skip months where nothing is going on. Thanks.
Maybe try using the NonEmpty, Head and Tail functions in a WITH clause:
WITH
SET [First] AS
{HEAD(NONEMPTY([Period].[Period].MEMBERS, [Measures].[a]))}
SET [Last] AS
{TAIL(NONEMPTY([Period].[Period].MEMBERS, [Measures].[a]))}
SELECT
{
[Measures].[a]
, [Measures].[b]
, [Measures].[c]
} on columns,
[First].ITEM(0).ITEM(0)
:[Last].ITEM(0).ITEM(0) on rows
FROM MyCube;
To debug a custom set and see which members it returns, you can do something like this:
WITH
SET [First] AS
{HEAD(NONEMPTY([Period].[Period].MEMBERS, [Measures].[a]))}
SELECT
{
[Measures].[a]
, [Measures].[b]
, [Measures].[c]
} on columns,
[First] on rows
FROM MyCube;
Reading your comment about Children, I think an alternative is to add an extra [Period] level to the member expression:
WITH
SET [First] AS
{HEAD(NONEMPTY([Period].[Period].[Period].MEMBERS
, [Measures].[a]))}
SET [Last] AS
{TAIL(NONEMPTY([Period].[Period].[Period].MEMBERS
, [Measures].[a]))}
SELECT
{
[Measures].[a]
, [Measures].[b]
, [Measures].[c]
} on columns,
[First].ITEM(0).ITEM(0)
:[Last].ITEM(0).ITEM(0) on rows
FROM MyCube;
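The idea behind the [First] and [Last] sets can be sketched in Python: find the first and last period with a value for measure a, then return every period between them, empty ones included. This only illustrates the logic and is not MDX; the function name non_empty_range and the dictionary representation of the measure are assumptions for the example.

```python
# All members of the Period level, in hierarchy order.
periods = [1, 2, 3, 4, 5, 6]

# Measure "a" by period; periods without an entry are empty (null).
measure_a = {2: 3, 3: 5, 5: 23}


def non_empty_range(members, values):
    """Return the contiguous members from the first to the last non-empty one."""
    non_empty = [m for m in members if values.get(m) is not None]
    if not non_empty:
        return []
    start = members.index(non_empty[0])   # HEAD(NONEMPTY(...))
    end = members.index(non_empty[-1])    # TAIL(NONEMPTY(...))
    return members[start:end + 1]         # First : Last


print(non_empty_range(periods, measure_a))  # [2, 3, 4, 5]
```

Period 4 stays in the result even though it has no data, which is the continuous range the question asks for.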

SAS/STAT 12.1: KEYLABEL in PROC TABULATE: need row total and column total lines for "all" to display different labels

I am working in SAS/STAT 12.1 and have only one issue with my code below: I need to show "Total" for the bottom row (displaying column sums and percentages) instead of "Both Genders." And yes, the top right-hand column header (displaying row totals and percentages) still needs to be "Both Genders."
I hope there is a simple way to do this using keylabel, but I haven't figured it out so far.
proc tabulate data=dmhrind format=8.1;
format gender $gendfmt. ethnic $ethnic.;
class ethnic gender;
table (ethnic all)*f=4. , (gender all)*(n*f=4. colpctn*f=5.1 rowpctn*f=5.1) ;
title 'Ethnic Distribution by Gender';
label ethnic='Race/Ethnicity';
keylabel N='N' colpctn='%' all='Both Genders' reppctn='%' rowpctn = 'Total';
run;
Thanks in advance for any assistance provided.
The only way to do this that I can see is to make a dummy column that simulates All. Using sashelp.class:
data class;
set sashelp.class;
allage = 'All Ages';
run;
proc tabulate data=class format=8.1;
class sex age allage;
table (age allage=' ')*f=4. , (sex all)*(n*f=4. colpctn*f=5.1 rowpctn*f=5.1) ;
title 'age Distribution by sex';
label age='Age';
label allage='All Ages';
keylabel N='N' colpctn='%' all='Both Sexes' reppctn='%' rowpctn = 'Total';
run;
The dummy variable's actual value needs to be the text you want as the label. Replace all in the table statement with that variable (adding it to the class statement), and add =' ' to suppress the extra label subrow.
For this, you need to do the titling within the table statement. The following example is similar to yours, using sashelp.class (as in @Joe's example), with age standing in for your ethnicity variable:
** This option helps improve proc tabulate output on some systems;
options formchar="|----||---|-/\<>*";
** The key is adding the column titles directly in the table stmt;
proc tabulate data=sashelp.class format=8.1;
class sex age;
table (age all='Total')*f=4. , (sex='' all='Both Sexes')*(n='N'*f=4. colpctn='Col %'*f=5.1 rowpctn='Row %'*f=5.1) ;
run;
The output should look like this:-
---------------------------------------------------------------------------
| | F | M | Both Sexes |
| |----------------|----------------|-----------------
| | N |Col %|Row %| N |Col %|Row %| N |Col %|Row %|
|----------------------|----|-----|-----|----|-----|-----|----|-----|------
|Age | | | | | | | | | |
|----------------------- | | | | | | | | |
|11 | 1| 11.1| 50.0| 1| 10.0| 50.0| 2| 10.5|100.0|
|----------------------|----|-----|-----|----|-----|-----|----|-----|------
|12 | 2| 22.2| 40.0| 3| 30.0| 60.0| 5| 26.3|100.0|
|----------------------|----|-----|-----|----|-----|-----|----|-----|------
|13 | 2| 22.2| 66.7| 1| 10.0| 33.3| 3| 15.8|100.0|
|----------------------|----|-----|-----|----|-----|-----|----|-----|------
|14 | 2| 22.2| 50.0| 2| 20.0| 50.0| 4| 21.1|100.0|
|----------------------|----|-----|-----|----|-----|-----|----|-----|------
|15 | 2| 22.2| 50.0| 2| 20.0| 50.0| 4| 21.1|100.0|
|----------------------|----|-----|-----|----|-----|-----|----|-----|------
|16 | .| .| .| 1| 10.0|100.0| 1| 5.3|100.0|
|----------------------|----|-----|-----|----|-----|-----|----|-----|------
|Total | 9|100.0| 47.4| 10|100.0| 52.6| 19|100.0|100.0|
--------------------------------------------------------------------------|

Sum of the grouped distinct values

This is a bit hard to explain in words ... I'm trying to calculate a sum of grouped distinct values in a matrix. Let's say I have the following data returned by a SQL query:
------------------------------------------------
| Group | ParentID | ChildID | ParentProdCount |
| A | 1 | 1 | 2 |
| A | 1 | 2 | 2 |
| A | 1 | 3 | 2 |
| A | 1 | 4 | 2 |
| A | 2 | 5 | 3 |
| A | 2 | 6 | 3 |
| A | 2 | 7 | 3 |
| A | 2 | 8 | 3 |
| B | 3 | 9 | 1 |
| B | 3 | 10 | 1 |
| B | 3 | 11 | 1 |
------------------------------------------------
There's some other data in the query, but it's irrelevant. ParentProdCount is specific to the ParentID.
Now, I have a matrix in the MS Report Designer in which I'm trying to calculate a sum for ParentProdCount (grouped by "Group"). If I just add the expression
=Sum(Fields!ParentProdCount.Value)
I get a result 20 for Group A and 3 for Group B, which is incorrect. The correct values should be 5 for group A and 1 for group B. This wouldn't happen if there wasn't ChildID involved, but I have to use some other child-specific data in the same matrix.
I tried to nest FIRST() and SUM() aggregate functions but apparently it's not possible to have nested aggregation functions, even when they have scopes defined.
I'm pretty sure there is some way to calculate the grouped distinct sum without needing to create another SQL query. Anyone got an idea how to do that?
OK, I got this sorted out by adding a ROW_NUMBER() function to my SQL query:
SELECT Group, ParentID, ROW_NUMBER() OVER (PARTITION BY ParentID ORDER BY ChildID ASC) AS Position, ChildID, ParentProdCount FROM Table
and then I replaced the SSRS SUM function with
=SUM(IIF(Fields!Position.Value = 1, Fields!ParentProdCount.Value, 0))
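The effect of the Position = 1 filter can be checked with a short Python sketch over the sample data: ParentProdCount is added to the group total only for the first row seen for each ParentID. This is an illustration of the logic, not report code; the function name distinct_parent_sum is hypothetical, and it relies on the rows being ordered by ChildID within each parent, as in the sample.

```python
from collections import defaultdict

# (Group, ParentID, ChildID, ParentProdCount), as returned by the SQL query.
rows = [
    ("A", 1, 1, 2), ("A", 1, 2, 2), ("A", 1, 3, 2), ("A", 1, 4, 2),
    ("A", 2, 5, 3), ("A", 2, 6, 3), ("A", 2, 7, 3), ("A", 2, 8, 3),
    ("B", 3, 9, 1), ("B", 3, 10, 1), ("B", 3, 11, 1),
]


def distinct_parent_sum(rows):
    """Sum ParentProdCount once per ParentID, grouped by Group."""
    seen_parents = set()
    totals = defaultdict(int)
    for group, parent_id, _child_id, parent_prod_count in rows:
        if parent_id in seen_parents:
            continue  # Position > 1: contributes 0 to the sum
        seen_parents.add(parent_id)
        totals[group] += parent_prod_count  # Position = 1
    return dict(totals)


print(distinct_parent_sum(rows))  # {'A': 5, 'B': 1}
```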
Put a grouping over ParentID and use a summation scoped to that group,
e.g.:
if the group over ParentID is named "ParentIDGroup"
then
column sum of ParentProdCount = SUM(Fields!ParentProdCount.Value, "ParentIDGroup")
