Fastest way to build hierarchical structure - linq

I have a data source, containing the following columns:
ID | Title | Score | Type
There are several rows in this data source; the column of interest is "Type", which holds the type each row belongs to, for example:
1 | Apple | 12 | Pipped
2 | Banana | 34 | Flesh
3 | Kiwi | 32 | Flesh
4 | Orange | -1 | Pipped
5 | Grapes | 3 | Pipped
6 | Potato | 5 | Skinned
I need to pull this information into a collection, or a KeyValuePair<string, List<Data>> but cannot find an efficient way to do this.
I'm currently using LINQ to pull a collection for each of the types (the type is an enum):
var pipped = (from p in dataSource where p.Type != null && p.Type.Equals(enum.Pipped) select p).ToList();
var flesh = (from p in dataSource where p.Type != null && p.Type.Equals(enum.Flesh) select p).ToList();
var skinned = (from p in dataSource where p.Type != null && p.Type.Equals(enum.Skinned) select p).ToList();
SortedDictionary<string, List<dataSource>> items = new SortedDictionary<string, List<dataSource>>();
items.Add("Pipped", pipped);
items.Add("Skinned", skinned);
items.Add("Flesh", flesh);
There must be a more efficient way to do this?

Looks like you want to use a GroupBy with a ToDictionary like this:
var dictionary = (from x in dataSource
                  where x.Type != null
                  group x by x.Type into g
                  select g).ToDictionary(g => g.Key, g => g.ToList());
Or if you want to use method syntax:
var dictionary = dataSource.Where(x => x.Type != null)
                           .GroupBy(x => x.Type)
                           .ToDictionary(g => g.Key, g => g.ToList());
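For readers who want to see what a GroupBy followed by ToDictionary does step by step, here is a minimal language-neutral sketch in Python (not LINQ). The `Row` tuple, its field names and the `group_by_type` helper are assumptions made for illustration; the sample values come from the question's table.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Row(NamedTuple):
    # Hypothetical record shape mirroring the question's columns.
    id: int
    title: str
    score: int
    type: Optional[str]


rows = [
    Row(1, "Apple", 12, "Pipped"),
    Row(2, "Banana", 34, "Flesh"),
    Row(3, "Kiwi", 32, "Flesh"),
    Row(4, "Orange", -1, "Pipped"),
    Row(5, "Grapes", 3, "Pipped"),
    Row(6, "Potato", 5, "Skinned"),
]


def group_by_type(rows):
    """Single pass over the rows: one list per distinct type, nulls skipped."""
    groups = defaultdict(list)
    for row in rows:
        if row.type is not None:
            groups[row.type].append(row)
    return dict(groups)


groups = group_by_type(rows)
for type_name in sorted(groups):
    print(type_name, [r.title for r in groups[type_name]])
```

The point of the sketch is that the data source is scanned once, instead of once per type as in the question's three separate queries.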

Related

LINQ Left Outer Join Multiple Tables with Group Count and Row Concatenation

Can someone help with the below, please? I simplified the table/column names, etc. I searched everywhere, but the answers I found are incomplete solutions for the results I want to achieve below. I'm new to LINQ, so please be kind. :-)
TABLES
Parent (ParentId, ParentName, ParentOccupation)
Child (ChildId, ChildName, OtherField, ParentId)
GrandChild (GrandChildId, GrandChildName, OtherField, ChildId)
Parent
+----------+------------+------------------+
| ParentId | ParentName | ParentOccupation |
+----------+------------+------------------+
| 1 | Mary | Teacher |
| 2 | Anne | Doctor |
| 3 | Michael | Farmer |
| 4 | Elizabeth | Police |
| 5 | Andrew | Fireman |
+----------+------------+------------------+
Child
+---------+-----------+-------------+----------+
| ChildId | ChildName | OtherField | ParentId |
+---------+-----------+-------------+----------+
| 1 | Ashley | [SomeValue] | 1 |
| 2 | Brooke | [SomeValue] | 1 |
| 3 | Ashton | [SomeValue] | 3 |
| 4 | Emma | [SomeValue] | 4 |
+---------+-----------+-------------+----------+
GrandChild
+--------------+----------------+-------------+---------+
| GrandChildId | GrandChildName | OtherField | ChildId |
+--------------+----------------+-------------+---------+
| 1 | Andrew | [SomeValue] | 1 |
| 2 | Isabelle | [SomeValue] | 2 |
| 3 | Lucas | [SomeValue] | 2 |
| 4 | Matthew | [SomeValue] | 4 |
+--------------+----------------+-------------+---------+
EXPECTED RESULTS
+----------+------------+------------------+-----------------------+-------------------------+
| ParentId | ParentName | ParentOccupation | NumberOfGrandChildren | NamesOfGrandChildren |
+----------+------------+------------------+-----------------------+-------------------------+
| 1 | Mary | Teacher | 3 | Andrew, Isabelle, Lucas |
| 2 | Anne | Doctor | 0 | |
| 3 | Michael | Farmer | 0 | |
| 4 | Elizabeth | Police | 1 | Matthew |
| 5 | Andrew | Fireman | 0 | |
+----------+------------+------------------+-----------------------+-------------------------+
WHAT I HAVE DONE SO FAR
LEFT OUTER JOINS - getting all the columns but no aggregates
var result1 = (from p in Parent
join c in Child on p.ParentId equals c.ParentId into pcj
from pc in pcj.DefaultIfEmpty()
join g in GrandChild on pc.ChildId equals g.ChildId into cgj
from cg in cgj.DefaultIfEmpty()
where [some criteria]
select new
{
ParentId = p.ParentId,
ParentName = p.ParentName,
ChildId = pc.ChildId,
ChildName = pc.ChildName,
GrandChildId = cg.GrandChildId,
GrandChildName = cg.GrandChildName
});
COUNTS - contain the aggregate but not all parent columns are there. Also returns 1 in the count, instead of 0.
var result2 = (from p in Parent
join c in Child on p.ParentId equals c.ParentId into pcj
from pc in pcj.DefaultIfEmpty()
join g in GrandChild on pc.ChildId equals g.ChildId into cgj
from cg in cgj.DefaultIfEmpty()
where [some criteria]
group new { p } by new { p.ParentId } into r
select new
{
ParentId = r.Key.ParentId,
NumberOfGrandChildren = r.Count()
});
CONCATENATE COMMA SEPARATED ROW VALUES (for names of grandchildren) - have not attempted yet until I solve the count above, but open for solutions please.
How can I combine and achieve the results above? Any help is appreciated! Thanks in advance.
Assuming you are using EF, and you have navigation properties set up, then your query would look like this:
var result = context.Parents
.Select(p => new {
p.ParentId,
p.ParentName,
p.ParentOccupation,
NumberOfGrandChildren = p.Children
.SelectMany(c => c.GrandChildren)
.Count(),
NamesOfGrandChildren = string.Join(", ", p.Children
.SelectMany(c => c.GrandChildren)
.Select(g => g.GrandChildName))
}).ToList();
EDIT
New comments posted by the author of the question show that the LINQ query involves EF Core. My original answer assumed it was a local query (LINQ to Objects). In fact, it seems to be an interpreted query (LINQ to Entities).
See "linq to entities vs linq to objects - are they the same?" for an explanation of the distinction between LINQ to Objects and LINQ to Entities.
In that case, Robert McKee's answer is more to the point.
For curiosity's sake, LINQPad shows that this query:
Parents
.Select(p => new
{
ParentId = p.Id,
ParentName = p.Name,
ParentOccupation = p.Occupation,
GrandChildrenCount = p.Children
.SelectMany(c => c.GrandChildren)
.Count(),
GrandchildrenNames = string.Join(", ", p.Children
.SelectMany(c => c.GrandChildren)
.Select(gc => gc.Name))
});
will be translated to the following SQL query:
SELECT "p"."Id", "p"."Name", "p"."Occupation", (
SELECT COUNT(*)
FROM "Children" AS "c"
INNER JOIN "GrandChildren" AS "g" ON "c"."Id" = "g"."ChildId"
WHERE "p"."Id" = "c"."ParentId"), "t"."Name", "t"."Id", "t"."Id0"
FROM "Parents" AS "p"
LEFT JOIN (
SELECT "g0"."Name", "c0"."Id", "g0"."Id" AS "Id0", "c0"."ParentId"
FROM "Children" AS "c0"
INNER JOIN "GrandChildren" AS "g0" ON "c0"."Id" = "g0"."ChildId"
) AS "t" ON "p"."Id" = "t"."ParentId"
ORDER BY "p"."Id", "t"."Id", "t"."Id0"
(Using SQLite, and a custom EF Core context containing entity classes with navigation properties)
ORIGINAL ANSWER - assuming LINQ to Objects
Here is a way you could construct your query.
var Result = Parents
// Stage 1: for each parent, get its Children Ids
.Select(p => new
{
Parent = p,
ChildrenIds = Children
.Where(c => c.ParentId == p.Id)
.Select(c => c.Id)
.ToList()
})
// Stage 2: for each parent, get its Grandchildren, by using the childrenIds list constructed before
.Select(p => new
{
p.Parent,
GrandChildren = GrandChildren
.Where(gc => p.ChildrenIds.Contains(gc.ChildId))
.ToList()
})
// Stage 3: for each parent, count the grandchildren, and get their names
.Select(p => new
{
ParentId = p.Parent.Id,
ParentName = p.Parent.Name,
ParentOccupation = p.Parent.Occupation,
NumberOfGrandChildren = p.GrandChildren.Count(),
GrandchildrenNames = string.Join(", ", p.GrandChildren.Select(gc => gc.Name))
});
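The three stages above rescan the Children and GrandChildren lists for every parent. As a rough illustration of the same result computed with lookups that are built once, here is a sketch in Python (not LINQ). The tuple layouts and the `summarize` helper are assumptions; the data is the sample from the question.

```python
from collections import defaultdict

# (ParentId, ParentName, ParentOccupation)
parents = [
    (1, "Mary", "Teacher"),
    (2, "Anne", "Doctor"),
    (3, "Michael", "Farmer"),
    (4, "Elizabeth", "Police"),
    (5, "Andrew", "Fireman"),
]
# (ChildId, ChildName, ParentId)
children = [(1, "Ashley", 1), (2, "Brooke", 1), (3, "Ashton", 3), (4, "Emma", 4)]
# (GrandChildId, GrandChildName, ChildId)
grandchildren = [(1, "Andrew", 1), (2, "Isabelle", 2), (3, "Lucas", 2), (4, "Matthew", 4)]


def summarize(parents, children, grandchildren):
    """Return one row per parent: id, name, occupation, count, joined names."""
    # Lookup built once: which parent does each child belong to?
    parent_of_child = {child_id: parent_id for child_id, _, parent_id in children}

    # Collect grandchild names under the parent of their child.
    names_by_parent = defaultdict(list)
    for _, name, child_id in grandchildren:
        parent_id = parent_of_child.get(child_id)
        if parent_id is not None:
            names_by_parent[parent_id].append(name)

    # Every parent appears, even with no grandchildren (left-join behaviour).
    result = []
    for parent_id, parent_name, occupation in parents:
        names = names_by_parent.get(parent_id, [])
        result.append((parent_id, parent_name, occupation, len(names), ", ".join(names)))
    return result


result = summarize(parents, children, grandchildren)
for row in result:
    print(row)
```

Parents without grandchildren get a count of 0 and an empty string, which is the shape of the expected results table in the question.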
And here is a full working LINQPad script, with random data generation, so you can try it:
void Main()
{
var rnd = new Random();
var Parents = Enumerable
.Range(0, 10)
.Select(i => new Parent
{
Id = i,
Name = $"Parent-{i}",
Occupation = $"Occupation{i}"
})
.ToList();
var Children = Enumerable
.Range(0,15)
.Select(i => new Child
{
Id = i,
Name = $"Child{i}",
ParentId = rnd.Next(0, 10)
})
.ToList();
var GrandChildren = Enumerable
.Range(0, 25)
.Select(i => new GrandChildren
{
Id = i,
Name = $"GrandChild{i}",
ChildId = rnd.Next(0, 15)
})
.ToList();
var Result = Parents
// Stage 1: for each parent, get its Children Ids
.Select(p => new
{
Parent = p,
ChildrenIds = Children
.Where(c => c.ParentId == p.Id)
.Select(c => c.Id)
.ToList()
})
// Stage 2: for each parent, get its Grandchildren, by using the childrenIds list constructed before
.Select(p => new
{
p.Parent,
GrandChildren = GrandChildren
.Where(gc => p.ChildrenIds.Contains(gc.ChildId))
.ToList()
})
// Stage 3: for each parent, count the grandchildren, and get their names
.Select(p => new
{
ParentId = p.Parent.Id,
ParentName = p.Parent.Name,
ParentOccupation = p.Parent.Occupation,
NumberOfGrandChildren = p.GrandChildren.Count(),
GrandchildrenNames = string.Join(", ", p.GrandChildren.Select(gc => gc.Name))
})
.Dump();
}
// You can define other methods, fields, classes and namespaces here
public class Parent
{
public int Id { get; set; }
public string Name { get; set; }
public string Occupation { get; set; }
}
public class Child
{
public int Id { get; set; }
public string Name { get; set; }
public int ParentId { get; set; }
}
public class GrandChildren
{
public int Id { get; set; }
public string Name { get; set; }
public int ChildId { get; set; }
}
And here is a set of results:
// Parents
0 Parent-0 Occupation0
1 Parent-1 Occupation1
2 Parent-2 Occupation2
3 Parent-3 Occupation3
4 Parent-4 Occupation4
5 Parent-5 Occupation5
6 Parent-6 Occupation6
7 Parent-7 Occupation7
8 Parent-8 Occupation8
9 Parent-9 Occupation9
// Children
0 Child0 1
1 Child1 5
2 Child2 8
3 Child3 6
4 Child4 9
5 Child5 3
6 Child6 0
7 Child7 4
8 Child8 9
9 Child9 7
10 Child10 8
11 Child11 2
12 Child12 7
13 Child13 7
14 Child14 8
// GrandChildren
0 GrandChild0 7
1 GrandChild1 11
2 GrandChild2 11
3 GrandChild3 14
4 GrandChild4 6
5 GrandChild5 0
6 GrandChild6 11
7 GrandChild7 6
8 GrandChild8 0
9 GrandChild9 12
10 GrandChild10 9
11 GrandChild11 7
12 GrandChild12 0
13 GrandChild13 3
14 GrandChild14 11
15 GrandChild15 9
16 GrandChild16 2
17 GrandChild17 12
18 GrandChild18 12
19 GrandChild19 12
20 GrandChild20 14
21 GrandChild21 12
22 GrandChild22 11
23 GrandChild23 14
24 GrandChild24 12
// Result
0 Parent-0 Occupation0 2 GrandChild4, GrandChild7
1 Parent-1 Occupation1 3 GrandChild5, GrandChild8, GrandChild12
2 Parent-2 Occupation2 5 GrandChild1, GrandChild2, GrandChild6, GrandChild14, GrandChild22
3 Parent-3 Occupation3 0
4 Parent-4 Occupation4 2 GrandChild0, GrandChild11
5 Parent-5 Occupation5 0
6 Parent-6 Occupation6 1 GrandChild13
7 Parent-7 Occupation7 8 GrandChild9, GrandChild10, GrandChild15, GrandChild17, GrandChild18, GrandChild19, GrandChild21, GrandChild24
8 Parent-8 Occupation8 4 GrandChild3, GrandChild16, GrandChild20, GrandChild23
9 Parent-9 Occupation9 0

Concatenate rows based on row comparisons using LINQ

I've been trying to do this in SQL for about a month now, but I think it might be easier to do it with .NET LINQ.
The basics are as follows:
The query is supposed to return data from a date range, and return a concatenated list of player names and player times.
The concatenation would ONLY occur if one player's PlayEnd is within 30 minutes of the next player's PlayStart.
So if I have data like this:
Name    | PlayDate   | PlayStart | PlayEnd
--------|------------|-----------|---------
player1 | 10/8/2018  | 08:00:00  | 09:00:00
player2 | 10/8/2018  | 09:10:00  | 10:10:00
player3 | 10/9/2018  | 10:40:00  | 11:30:00
player4 | 10/11/2018 | 08:30:00  | 08:37:00
player5 | 10/11/2018 | 08:40:00  | 08:50:00
player6 | 10/12/2018 | 09:00:00  | 09:45:00
player7 | 10/12/2018 | 09:50:00  | 10:10:00
player8 | 10/12/2018 | 10:30:00  | 12:20:00
player1 and player2 play times would be concatenated together like: player1, player2 = 8:00:00 - 10:10:00 for 10/8/2018
player3 would just be: player3 = 10:40:00 - 11:30:00 for 10/9/2018
player4 and player5 play times would be concatenated like: player4, player5 = 08:30:00 - 08:50:00 for 10/11/2018
player6 and player7 and player8 play times would be concatenated like: player6, player7, player8 = 09:00:00 - 12:20:00 for 10/12/2018
I've tried modifying the query below in many ways, but I just don't know how to compare one row of data with the next and then combine the two (or more) if needed.
var query = from pl in players
select new PlaySession
{
Name = pl.Name,
PlayDate = pl.PlayDate,
PlayStart = pl.PlayStartTime,
PlayEnd = pl.PlayEndTime
};
var grouped = query
.OrderBy(r => r.Name)
.ThenBy(r => r.PlayDate)
.ThenBy(r => r.PlayStart);
Now this is where I get confused:
I need to figure out the following:
how to compare the PlayDates of the various rows to make sure that they are the same date, like this: row1.PlayDate == row2.PlayDate
how to compare one row's PlayEnd with the next row's PlayStart, something like this: row2.PlayStart - row1.PlayEnd < 30 minutes
Is there a way to compare values across rows using LINQ?
Thanks!
As far as I can tell, it should look something like this:
List<ViewModel> playersGroupList = Players.GroupBy(p => p.PlayDate).Select(group => new ViewModel
{
PlayDate = group.Key,
Names = String.Join("-", group.Select(g => g.Name).ToArray()),
PlayDuration = group.Select(g => g.PlayStart).First() + "-" + group.Select(g => g.PlayEnd).Last()
}).ToList();
And here ViewModel is as follows:
public class ViewModel
{
public string PlayDate { get; set; }
public string Names { get; set; }
public string PlayDuration { get; set; }
}
Note: Some adjustment may be needed to meet your exact requirements, but the overall implementation should be as shown.
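The grouping above keys on PlayDate only, so it does not check the 30-minute rule from the question. As a rough illustration of the row-to-row comparison the question asks about, here is a minimal sketch in Python (not LINQ). The helper names `at` and `merge_sessions` and the tuple layout are assumptions; the strict `<` comparison mirrors the question's `row2.PlayStart - row1.PlayEnd < 30 minutes`.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=30)


def at(day, time):
    """Hypothetical helper: combine the question's date and time strings."""
    return datetime.strptime(f"{day} {time}", "%m/%d/%Y %H:%M:%S")


def merge_sessions(rows, max_gap=MAX_GAP):
    """rows: (name, start, end) tuples. Returns merged sessions in start order."""
    merged = []
    for name, start, end in sorted(rows, key=lambda r: r[1]):
        last = merged[-1] if merged else None
        # Extend the previous session only if it is on the same date and the
        # gap between its end and this start is under the threshold.
        if (last is not None
                and last["end"].date() == start.date()
                and start - last["end"] < max_gap):
            last["names"].append(name)
            last["end"] = max(last["end"], end)
        else:
            merged.append({"names": [name], "start": start, "end": end})
    return merged


rows = [
    ("player1", at("10/8/2018", "08:00:00"), at("10/8/2018", "09:00:00")),
    ("player2", at("10/8/2018", "09:10:00"), at("10/8/2018", "10:10:00")),
    ("player3", at("10/9/2018", "10:40:00"), at("10/9/2018", "11:30:00")),
    ("player4", at("10/11/2018", "08:30:00"), at("10/11/2018", "08:37:00")),
    ("player5", at("10/11/2018", "08:40:00"), at("10/11/2018", "08:50:00")),
    ("player6", at("10/12/2018", "09:00:00"), at("10/12/2018", "09:45:00")),
    ("player7", at("10/12/2018", "09:50:00"), at("10/12/2018", "10:10:00")),
    ("player8", at("10/12/2018", "10:30:00"), at("10/12/2018", "12:20:00")),
]

sessions = merge_sessions(rows)
for s in sessions:
    print(", ".join(s["names"]), "=", s["start"].time(), "-", s["end"].time(),
          "for", s["start"].date())
```

The same idea carries over to C#: sort the rows, then walk them once while keeping the "current" session, because a plain GroupBy cannot see the previous row.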

Is it possible to groupBy (month) and sum (each column) in table

+---------+--------+---------+---------+
| date | type_a | type_b | type_zzz|
+---------+--------+---------+---------+
|01-01-18 | 12 | 10 | 1 |
|02-01-18 | 2 | 5 | 1 |
|03-01-18 | 7 | 2 | 2 |
|01-02-18 | 13 | 6 | 55 |
|02-02-18 | 22 | 33 | 5 |
+---------+--------+---------+---------+
Hi,
In above example, I would like to know if it's possible to groupBy month and sum each column when getting results in Laravel (tables are dynamic so there are no models for them and also some tables don't have column 'type_a' other don't have 'type_zzz' etc...).
What I'm looking to get from above table is something like this:
"01" =>
'type_a' : '21',
'type_b' : '17',
'type_zzz': '4'
"02" =>
'type_a' : '35',
'type_b' : '39',
'type_zzz': '60'
I'm using following code to group it by month but I'm not able to find solution to return sum by each column:
DB::table($id)->get()->groupBy(function($date) {
return Carbon::parse($date->repdate)->format('m');
});
If I understand your question correctly, you can either group and sum the values using an SQL query:
$grouped = DB::table('table_name')
->selectRaw('
SUM(type_a) AS type_a,
SUM(type_b) AS type_b,
SUM(type_zzz) AS type_zzz
')
->groupByRaw('MONTH(date)')
->get();
Or if you don't want to have to specify the column names in each query, you can use groupBy, array_column, and array_sum on your collection:
$grouped = DB::table('table_name')
->get()
->groupBy(function ($item) {
return Carbon::parse($item->date)->format('m');
})
->map(function ($group) {
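$group = $group->toArray();
$summed = [];
// Rows come back as stdClass objects, so cast to an array to read the column names
$columns = array_keys((array) $group[0]);
// Drop the first column (assumed to be the date)
array_shift($columns);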
foreach ($columns as $column) {
$summed[$column] = array_sum(array_column($group, $column));
}
return $summed;
});
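To make the second approach easier to follow outside Laravel, here is a minimal sketch of the same group-then-sum logic in Python (not PHP). The `sum_by_month` helper, the dictionary row shape and the day-month-year reading of the date column are assumptions; the values are the sample table from the question.

```python
from collections import defaultdict
from datetime import datetime

rows = [
    {"date": "01-01-18", "type_a": 12, "type_b": 10, "type_zzz": 1},
    {"date": "02-01-18", "type_a": 2, "type_b": 5, "type_zzz": 1},
    {"date": "03-01-18", "type_a": 7, "type_b": 2, "type_zzz": 2},
    {"date": "01-02-18", "type_a": 13, "type_b": 6, "type_zzz": 55},
    {"date": "02-02-18", "type_a": 22, "type_b": 33, "type_zzz": 5},
]


def sum_by_month(rows, date_key="date", date_format="%d-%m-%y"):
    """Sum every non-date column per month without naming the columns."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        month = datetime.strptime(row[date_key], date_format).strftime("%m")
        for column, value in row.items():
            if column != date_key:
                totals[month][column] += value
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {month: dict(columns) for month, columns in totals.items()}


monthly = sum_by_month(rows)
for month, columns in monthly.items():
    print(month, columns)
```

Because the columns are discovered from each row, tables that lack `type_a` or `type_zzz` need no special handling.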

Reshape data in pig - change row values to column names

Is there a way to reshape the data in pig?
The data looks like this -
id | p1 | count
1 | "Accessory" | 3
1 | "clothing" | 2
2 | "Books" | 1
I want to reshape the data so that the output would look like this--
id | Accessory | clothing | Books
1 | 3 | 2 | 0
2 | 0 | 0 | 1
Can anyone please suggest some way around?
If it's a fixed set of product lines, the code below might help; otherwise you can write a custom UDF to achieve the same result.
Input : a.csv
1|Accessory|3
1|Clothing|2
2|Books|1
Pig Snippet :
test = LOAD 'a.csv' USING PigStorage('|') AS (product_id:long,product_name:chararray,rec_cnt:long);
req_stats = FOREACH (GROUP test BY product_id) {
accessory = FILTER test BY product_name=='Accessory';
clothing = FILTER test BY product_name=='Clothing';
books = FILTER test BY product_name=='Books';
GENERATE group AS product_id, (IsEmpty(accessory) ? '0' : BagToString(accessory.rec_cnt)) AS a_cnt, (IsEmpty(clothing) ? '0' : BagToString(clothing.rec_cnt)) AS c_cnt, (IsEmpty(books) ? '0' : BagToString(books.rec_cnt)) AS b_cnt;
};
DUMP req_stats;
Output :
(1,3,2,0)
(2,0,0,1)
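For the case where the set of product lines is not fixed, the reshaping logic can be sketched outside Pig. The following is a minimal illustration in Python (not Pig Latin, and not a UDF); the `pivot` helper and the tuple layout are assumptions, and the input matches a.csv above.

```python
rows = [
    (1, "Accessory", 3),
    (1, "Clothing", 2),
    (2, "Books", 1),
]


def pivot(rows, columns):
    """Turn (id, name, count) rows into one tuple per id, one slot per column."""
    table = {}
    for row_id, name, count in rows:
        # Start every id with zeros so missing products show up as 0.
        counts = table.setdefault(row_id, dict.fromkeys(columns, 0))
        if name in counts:
            counts[name] += count
    return [
        (row_id, *(counts[c] for c in columns))
        for row_id, counts in sorted(table.items())
    ]


# Discover the column names from the data, in first-seen order.
columns = list(dict.fromkeys(name for _, name, _ in rows))
pivoted = pivot(rows, columns)

print(("id", *columns))
for row in pivoted:
    print(row)
```

Deriving `columns` from the data is what the fixed FILTER statements in the Pig snippet cannot do, which is why the answer points to a UDF for the dynamic case.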

ManyToMany relation - how update attribute in pivot table

I am now learning to work with pivot tables: https://laravel.com/docs/4.2/eloquent#working-with-pivot-tables
I have WeeklyRoutine model. Each routine has several Activities. The assigned activities are attached in a pivot table activity_routine.
Relation defined in the WeeklyRoutine model:
public function activities() {
    return $this->belongsToMany('App\Models\Activity', 'activity_routine', 'routine_id', 'activity_id')->withPivot('done_at')->withTimestamps();
}
it looks like this:
// activity_routine pivot table (relevant columns only)
| id | activity_id | routine_id | done_at |
| 34 | 1 | 4 | 2016-04-23 09:27:27 | // *1
| 35 | 2 | 4 | null | // *2
*1 this activity is marked as done with the code below
*2 this activity is not yet done
what I have:
I can update the done_at field in the pivot table, marking the activity as DONE for the given week (routine_id = 4 in the table above):
public function make_an_activity_complete($routineid, $activityid) {
$date = new \DateTime;
$object = Routine::find($routineid)->activities()->updateExistingPivot($activityid, array('done_at' => $date));
return 'done!';
}
what I need
I want to UN-DO an activity. When it is already done, that is, when done_at is not null but contains a date, make it null.
In other words I need to do the below switch of value, but the proper way:
$pivot = DB::table('activity_routine')->where('routine_id', $routineid)->where('activity_id', $activityid)->first();
if($pivot->done_at != null) {
// already done: clear the date to undo it
$new_val = null;
} else {
// not done yet: mark it as done now
$new_val = new \DateTime;
}
$object = Routine::find($routineid)->activities()->updateExistingPivot($activityid, array('done_at' => $new_val));
How to do it? I have no clue!
Thx.
Your approach seems fine to me. I would probably do it like this.
$routine = Routine::find($routineid);
$activity = $routine->activities()->find($activityid);
$done_at = is_null($activity->pivot->done_at) ? new \DateTime : null;
$routine->activities()->updateExistingPivot($activityid, compact('done_at'));
