Get distinct values based on selected field with LINQ

I have the following table, Members:
Id, GroupId, Age
1, 1, 12
2, 1, 20
3, 1, 33
4, 2, 12
5, 2, 7
How can I write a LINQ query that will give me a list of the oldest member of each group?
The result should be:
Id, GroupId, Age
3, 1, 33
4, 2, 12

from m in members
group m by m.GroupId into g
select g.OrderByDescending(m => m.Age).First()
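For readers outside .NET, the same group-then-take-the-oldest idea can be sketched in Python. This assumes the rows are plain (Id, GroupId, Age) tuples. The names `members` and `oldest_per_group` are hypothetical. Like `First()` after `OrderByDescending`, it keeps exactly one row per group, and on an age tie it returns the first such row in input order.

```python
from itertools import groupby
from operator import itemgetter

# Rows from the question: (Id, GroupId, Age).
members = [
    (1, 1, 12),
    (2, 1, 20),
    (3, 1, 33),
    (4, 2, 12),
    (5, 2, 7),
]

def oldest_per_group(rows):
    """Return the oldest row in each GroupId, ordered by GroupId."""
    by_group = sorted(rows, key=itemgetter(1))  # groupby needs input sorted by key
    return [
        max(group, key=itemgetter(2))  # first row with the highest Age
        for _, group in groupby(by_group, key=itemgetter(1))
    ]

print(oldest_per_group(members))  # [(3, 1, 33), (4, 2, 12)]
```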


Sum a column based on max date and unique range

I'm trying to figure out how to sum the Totals column using only the latest (max) date for each town, i.e. filtering to one row per unique town, the one with that town's latest date:
Date, Town, Totals
September 5, Loerie, 9
November 8, Loerie, 4
May 7, Flower, 2
February 2, Holo, 8
May 9, Holo, 7
July 23, Flower, 3
June 7, Dump, 1
March 3, Tzaneen, 9
September 2, Tzaneen, 4
April 3, Coffee, 7
I can build a sorted list of unique towns and show each town's total for its latest date with =maxifs(C$2:C,B$2:B,F2,A$2:A,maxifs(A$2:A,B$2:B,F2))
I need to sort and sum those results in a single formula, but I'm unsure how. ARRAYFORMULA?
I've shared an example spreadsheet:
https://docs.google.com/spreadsheets/d/1SSNJJOoz1-pxVH0ZoFFZqChhZxjqtRz5dfvyQu76ueI/edit?usp=sharing
Try:
=QUERY(SORTN(SORT(A2:C, 2, 1, 1, 0), 9^9, 2, 2, 1), "select Col2,Col3")
With a total row:
={QUERY(SORTN(SORT(A2:C, 2, 1, 1, 0), 9^9, 2, 2, 1), "select Col2,Col3");
"Total:", SUM(INDEX(SORTN(SORT(A2:C, 2, 1, 1, 0), 9^9, 2, 2, 1),,3))}
Only the total:
=SUM(INDEX(SORTN(SORT(A2:C, 2, 1, 1, 0), 9^9, 2, 2, 1),,3))
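The formula's logic is: sort by town, newest date first, keep only the first (latest) row per town, then sum Totals. It can be sketched in Python to check the expected result on the sample data. This assumes the Date column holds real dates within a single year. The year 2021 and the name `latest_totals` are placeholders, not part of the sheet.

```python
from datetime import date

# Sample data from the question. The sheet shows only month and day,
# so the year 2021 is an assumption made for illustration.
ROWS = [
    (date(2021, 9, 5), "Loerie", 9),
    (date(2021, 11, 8), "Loerie", 4),
    (date(2021, 5, 7), "Flower", 2),
    (date(2021, 2, 2), "Holo", 8),
    (date(2021, 5, 9), "Holo", 7),
    (date(2021, 7, 23), "Flower", 3),
    (date(2021, 6, 7), "Dump", 1),
    (date(2021, 3, 3), "Tzaneen", 9),
    (date(2021, 9, 2), "Tzaneen", 4),
    (date(2021, 4, 3), "Coffee", 7),
]

def latest_totals(rows):
    """Map each town to the Totals value on its latest date, sorted by town."""
    latest = {}
    for day, town, total in rows:
        # Keep the row only if it is the first seen for this town or is newer.
        if town not in latest or day > latest[town][0]:
            latest[town] = (day, total)
    return {town: total for town, (_, total) in sorted(latest.items())}

totals = latest_totals(ROWS)
print(totals)
# {'Coffee': 7, 'Dump': 1, 'Flower': 3, 'Holo': 7, 'Loerie': 4, 'Tzaneen': 4}
print(sum(totals.values()))  # 26
```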

Ruby - complex sorting. Want part of the list to sort ascending, the other part to sort descending

I have multiple criteria for sorting:
1. I compute a sort order for each item (1, 2, 3 or 4, based on my criteria).
2. Based on the order from step 1, I sort by a different column.
3. When step 1 produces 1 to 3, I need step 2 to sort ascending; when it produces 4, I need it to sort descending.
I have steps 1 and 2 working but am stuck on step 3. The operation is also expensive, so I don't want to iterate over the list more than once if I can avoid it.
I understand I could first split the list into two arrays (sort_order 1 to 3 in one, sort_order 4 in the other) and then call reverse on the second array, but that would require iterating over the list multiple times.
assignments.sort_by do |a|
  # Orders 1 and 4 sort by due date; orders 2 and 3 sort by start date.
  sort_date = if a.sort_order == 1 || a.sort_order == 4
                a.due_date
              else
                a.start_date
              end
  [a.sort_order, sort_date]
end
Example input:
[{name: "item 1", start_date: "08/01/2016", due_date: "08/15/2016", sort_order: 3},
 {name: "item 2", start_date: "08/02/2016", due_date: "08/20/2016", sort_order: 3},
 {name: "item 3", start_date: "06/01/2016", due_date: nil, sort_order: 2},
 {name: "item 4", start_date: "06/01/2016", due_date: "07/15/2016", sort_order: 1},
 {name: "item 5", start_date: "07/01/2016", due_date: "07/07/2016", sort_order: 1},
 {name: "item 6", start_date: "01/01/2015", due_date: "01/15/2015", sort_order: 4},
 {name: "item 7", start_date: "01/01/2016", due_date: "01/15/2016", sort_order: 4}]
The expected output order is:
item 5
item 4
item 3
item 1
item 2
item 7
item 6
Basically, I want to sort the assignments as current, then future, then past.
Current assignments may have no due date, so they get sort order 1 or 2: a current assignment is sorted by due date, or by start date if it has no due date.
Future assignments (sort order 3) are sorted by start date in ascending order.
Past assignments (sort order 4) are sorted by due date in descending order.
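Just use negative (numeric) values of the dates to sort descending, all within a single sort_by:
assignments.sort_by do |a|
  case a.sort_order
  when 1 then [1, a.due_date]
  when 2 then [2, a.start_date]
  when 3 then [3, a.start_date]
  # Date has no unary minus, so negate a numeric form of the date instead.
  when 4 then [4, -a.due_date.to_time.to_i]
  end
end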
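The same negation trick carries over to other languages. Here is a Python sketch. It assumes the question's MM/DD/YYYY strings have already been parsed into `datetime.date` objects, since those strings do not sort chronologically as text. `date.toordinal()` supplies a day number that can be negated. The dict records and the `sort_key` name are hypothetical stand-ins for the question's assignment objects.

```python
from datetime import date

# Hypothetical records mirroring the question's example input,
# with dates parsed into date objects so they compare chronologically.
assignments = [
    {"name": "item 1", "start_date": date(2016, 8, 1), "due_date": date(2016, 8, 15), "sort_order": 3},
    {"name": "item 2", "start_date": date(2016, 8, 2), "due_date": date(2016, 8, 20), "sort_order": 3},
    {"name": "item 3", "start_date": date(2016, 6, 1), "due_date": None, "sort_order": 2},
    {"name": "item 4", "start_date": date(2016, 6, 1), "due_date": date(2016, 7, 15), "sort_order": 1},
    {"name": "item 5", "start_date": date(2016, 7, 1), "due_date": date(2016, 7, 7), "sort_order": 1},
    {"name": "item 6", "start_date": date(2015, 1, 1), "due_date": date(2015, 1, 15), "sort_order": 4},
    {"name": "item 7", "start_date": date(2016, 1, 1), "due_date": date(2016, 1, 15), "sort_order": 4},
]

def sort_key(a):
    """One-pass sort key: ascending dates for orders 1-3, descending for 4."""
    order = a["sort_order"]
    if order == 1:
        return (1, a["due_date"].toordinal())
    if order in (2, 3):
        return (order, a["start_date"].toordinal())
    # Negating the day number reverses the date order within group 4.
    return (4, -a["due_date"].toordinal())

print([a["name"] for a in sorted(assignments, key=sort_key)])
# ['item 5', 'item 4', 'item 3', 'item 1', 'item 2', 'item 7', 'item 6']
```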

What does the multiply operator do in relational algebra?

I'm new to relational algebra. I found the * operator in the following expression
What's the difference between this and one using a join?
The * should more correctly be written × as it represents a Cartesian product. This operation returns the set of all tuples that are the concatenation of tuples from each operand. A join filters the Cartesian product down to only those tuples with matching values on specified attributes. If the join is a natural join, as in your example, the attributes matched on are those with identical names.
For example, given the following two relations R and S as shown:
R ( a, b, c )
( 1, 2, 3 )
( 2, 4, 6 )
( 3, 6, 9 )
S ( b, c, d )
( 2, 7, 9 )
( 5, 3, 4 )
( 2, 3, 6 )
The Cartesian product R × S is:
( R.a, R.b, R.c, S.b, S.c, S.d )
( 1, 2, 3, 2, 7, 9 )
( 1, 2, 3, 5, 3, 4 )
( 1, 2, 3, 2, 3, 6 )
( 2, 4, 6, 2, 7, 9 )
( 2, 4, 6, 5, 3, 4 )
( 2, 4, 6, 2, 3, 6 )
( 3, 6, 9, 2, 7, 9 )
( 3, 6, 9, 5, 3, 4 )
( 3, 6, 9, 2, 3, 6 )
The natural join R ⨝ S is the product filtered to only tuples where the b and c values match:
( a, b, c, d )
( 1, 2, 3, 6 )
The join R ⨝b S is the product filtered to only tuples where the b values match:
( R.a, b, R.c, S.c, S.d )
( 1, 2, 3, 7, 9 )
( 1, 2, 3, 3, 6 )
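These three results can be reproduced with a small Python sketch that treats each relation as a list of tuples. The attribute positions are hardcoded for the two example schemas; a general implementation would match attributes by name. Lists are used instead of sets because no duplicate tuples arise in this example.

```python
from itertools import product

# The example relations: R(a, b, c) and S(b, c, d).
R = [(1, 2, 3), (2, 4, 6), (3, 6, 9)]
S = [(2, 7, 9), (5, 3, 4), (2, 3, 6)]

def cartesian(r, s):
    """R x S: every tuple of r concatenated with every tuple of s."""
    return [x + y for x, y in product(r, s)]

def natural_join(r, s):
    """Natural join: keep pairs whose shared attributes b and c agree;
    the shared columns appear once, giving (a, b, c, d)."""
    return [(a, b, c, d)
            for (a, b, c), (sb, sc, d) in product(r, s)
            if b == sb and c == sc]

def join_on_b(r, s):
    """Join on b only, giving (R.a, b, R.c, S.c, S.d)."""
    return [(a, b, c, sc, d)
            for (a, b, c), (sb, sc, d) in product(r, s)
            if b == sb]

print(len(cartesian(R, S)))  # 9
print(natural_join(R, S))    # [(1, 2, 3, 6)]
print(join_on_b(R, S))       # [(1, 2, 3, 7, 9), (1, 2, 3, 3, 6)]
```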
In some books, natural join is denoted by an asterisk (*).

PIG (Hadoop) - rows with variable columns

I'm playing with Pig; my input file is:
1, 4, 6
1, 2, 7, 9
2, 5, 1
1, 3, 5, 1
2, 6, 2, 8
The first value in each row is the ID; the rest of the row is simply unique values (each row can have a different number of columns).
I want to transpose the above into:
1, 2, 4, 6, 7, 9, 3, 5, 1
2, 5, 1, 6, 2, 8
So basically GROUP by ID, then flatten the rest of the columns and output that as each row.
Is PIG even the right approach here? I have a way to do this in M/R, but thought Pig might be ideal for this sort of thing.
Many thanks for any hints provided
Duncan
PS I do not care about the order of the values.
Untested, but here's the general approach I'd take: get a relation containing the ID and a bag of values, flatten it so you get rows of just an ID and a single value, take the distinct rows, then group by the ID. This gives you a bag of values for each ID, which you can convert to a string if you want to output it.
A = LOAD 'input' USING TextLoader() AS (line:chararray);
-- Split once: the ID, then everything after the first comma.
B = FOREACH A GENERATE FLATTEN(STRSPLIT(line, ',', 2)) AS (id:chararray, values:chararray);
-- TOKENIZE's default separators include commas and spaces, so ' 4, 6' yields '4' and '6'.
C = FOREACH B GENERATE id, FLATTEN(TOKENIZE(values)) AS value:chararray;
D = DISTINCT C; -- I'm assuming you actually want distinct values, wasn't clear.
E = GROUP D BY id;
F = FOREACH E GENERATE group AS id, BagToString(D.value, ', ') AS valueString:chararray;
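To check the expected output on the small sample, the same pipeline (split off the ID, flatten, de-duplicate, group) can be sketched in memory in Python. This is only an illustration for small inputs, not a replacement for Pig on large data. The name `group_values` is hypothetical. Values keep first-seen order here, although the question says order does not matter.

```python
# Sample input from the question, one record per line.
LINES = [
    "1, 4, 6",
    "1, 2, 7, 9",
    "2, 5, 1",
    "1, 3, 5, 1",
    "2, 6, 2, 8",
]

def group_values(lines):
    """Group the values after the first field by ID, dropping duplicate (id, value) pairs."""
    grouped = {}
    for line in lines:
        row_id, *values = [field.strip() for field in line.split(",")]
        bucket = grouped.setdefault(row_id, [])
        for value in values:
            if value not in bucket:  # like DISTINCT on (id, value)
                bucket.append(value)
    return grouped

for row_id, values in group_values(LINES).items():
    print(", ".join([row_id] + values))
# 1, 4, 6, 2, 7, 9, 3, 5, 1
# 2, 5, 1, 6, 2, 8
```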

LINQ: Get min and max values of sequence of numbers divided into subsequences

How can I split a sequence of numbers into subgroups and get the local minimum and maximum of each subgroup with LINQ?
Say I have a sequence of 11 items: { 3, 2, 5, 9, 9, 6, 7, 2, 5, 11, 2 }
I want to split this into subgroups of 3 or fewer items.
So I get the following 4 subgroups: { 3, 2, 5 }, { 9, 9, 6 }, { 7, 2, 5 }, { 11, 2 }
The final return values of the LINQ expression (the min and max of each group) should be 2, 5, 6, 9, 2, 7, 2, 11
TIA,
Sascha
This should do it.
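using System.Linq;

var numbers = new[] { 3, 2, 5, 9, 9, 6, 7, 2, 5, 11, 2 };
// Pair each number with its batch index (i / 3), group by that index,
// then emit each group's Min and Max in turn.
var query = from p in numbers.Select((n, i) => new { n, Group = i / 3 })
            group p.n by p.Group into g
            from n in new[] { g.Min(), g.Max() }
            select n;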
Well, using MoreLINQ's Batch method to do the grouping, you could use:
var query = source.Batch(3, batch => new[] { batch.Min(), batch.Max() })
.SelectMany(x => x);
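The same batching idea (index divided by batch size, as in the first answer, or a `Batch`-style helper, as in the second) can be sketched in Python with list slicing. This assumes the input is a list, so it supports slicing. `batch_min_max` is a hypothetical name.

```python
from itertools import chain

numbers = [3, 2, 5, 9, 9, 6, 7, 2, 5, 11, 2]

def batch_min_max(seq, size):
    """Split seq into consecutive batches of at most `size` items and
    return the min and max of each batch, flattened into one list."""
    batches = (seq[i:i + size] for i in range(0, len(seq), size))
    return list(chain.from_iterable((min(b), max(b)) for b in batches))

print(batch_min_max(numbers, 3))  # [2, 5, 6, 9, 2, 7, 2, 11]
```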
