Recursive Linq Grouping - linq

Scenario:
I have a database table that stores the hierarchy of another table's many-to-many relationship. An item can have multiple children and can also have more than one parent.
Items
------
ItemID (key)
Hierarchy
---------
MemberID (key)
ParentItemID (fk)
ChildItemID (fk)
Sample hierarchy:
Level1  Level2  Level3
X       A       A1
                A2
        B       B1
                X1
Y       C
I would like to group all of the child nodes by each parent node in the hierarchy.
Parent  Child
X       A1
        A2
        B1
        X1
A       A1
        A2
B       B1
        X1
Y       C
Notice how there are no leaf nodes in the Parent column, and how the Child column only contains leaf nodes.
Ideally, I would like the results to be in the form of IEnumerable<IGrouping<Item, Item>> where the key is a Parent and the group items are all Children.
Ideally, I would like a solution that the entity provider can translate into T-SQL, but if that is not possible, then I need to keep round trips to a minimum.
I intend to sum values that exist in another table, joined on the leaf nodes.

Since you are always going to be returning ALL of the items in the table, why not just make a recursive method that gets all children for a parent and then use that on the in-memory Items:
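partial class Item
{
    // Returns every leaf descendant of this item, recursively.
    // ChildLinks (Hierarchy rows where this item is the parent) and ChildItem are
    // hypothetical navigation property names; use whatever your model generates.
    public IEnumerable<Item> GetAllChildren()
    {
        return ChildLinks
            .Select(link => link.ChildItem)
            .SelectMany(c => c.ChildLinks.Any() ? c.GetAllChildren() : new[] { c });
    }
}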
then:
var items =
    from item in Items.ToList()
    from child in item.GetAllChildren()   // leaves yield no rows, so they never become keys
    group child by item;                  // IEnumerable<IGrouping<Item, Item>>
Sorry for any syntax errors...
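A minimal in-memory sketch of the recursive idea, assuming the whole Hierarchy table has already been loaded as (Parent, Child) pairs in one round trip. Items are plain strings here, and `LeafGrouping`/`GroupLeavesByAncestor` are hypothetical names. It returns an `ILookup`, which is an `IEnumerable<IGrouping<TKey, TElement>>`, so it has the shape the question asks for. It also assumes the hierarchy contains no cycles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LeafGrouping
{
    // Groups every non-leaf item with all of its leaf descendants.
    // Assumption: the hierarchy is acyclic and already in memory as (Parent, Child) pairs.
    public static ILookup<string, string> GroupLeavesByAncestor(
        IEnumerable<(string Parent, string Child)> hierarchy)
    {
        // Parent -> direct children, built once; a missing key yields an empty sequence.
        var children = hierarchy.ToLookup(h => h.Parent, h => h.Child);

        // Recursively yields the leaf descendants of one item.
        IEnumerable<string> Leaves(string item) =>
            children[item].SelectMany(c => children[c].Any() ? Leaves(c) : new[] { c });

        // Only items that have children become keys; leaves never do.
        // Distinct() stops a leaf reachable through two parents from appearing twice.
        return children
            .SelectMany(g => Leaves(g.Key).Distinct().Select(leaf => (Parent: g.Key, Leaf: leaf)))
            .ToLookup(p => p.Parent, p => p.Leaf);
    }
}
```

With the sample hierarchy, `X` maps to A1, A2, B1, X1, `B` maps to B1, X1, and leaves such as `A1` never appear as keys. Joining the leaves to the table of values afterwards gives you the sums per parent.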

Well, if the hierarchy is strictly two levels deep (parent, child, grandchild, as in your sample), you can always union them and let LINQ sort out the SQL. It ends up being a single trip, though it remains to be seen how fast it will run on your volume of data:
var hlist = from h in Hierarchies
            select new { h.Parent, h.Child };
var slist = from h in Hierarchies
            join h2 in hlist on h.Parent equals h2.Child
            select new { h2.Parent, h.Child };
hlist = hlist.Union(slist);
This gives you a flat sequence of {Parent, Child} pairs (both Items), so if you want to group them you just follow on:
var glist = from pc in hlist.AsEnumerable()
            group pc.Child by pc.Parent into g
            select new { Parent = g.Key, Children = g };
I used AsEnumerable() here because grouping a Union goes beyond what the LINQ to SQL provider can handle. If you try it against IQueryable, it will run a basic Union for eligible parents and then do a round trip for every parent (which is what you want to avoid). Whether or not it's OK for you to use regular LINQ for the grouping is up to you; the same volume of data has to come through the pipe either way.
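To check what the two-level union produces, here is the same hlist/slist/Union logic run with LINQ to Objects over (Parent, Child) pairs; `TwoLevelUnion` is a hypothetical name. On its own, the union also returns intermediate pairs such as X/A, so the leaf filter at the end is an addition to the answer's query, included here to match the question's requirement that the Child column contains only leaf nodes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TwoLevelUnion
{
    // In-memory mirror of hlist/slist/Union for a hierarchy at most two levels deep.
    public static ILookup<string, string> Group(IReadOnlyList<(string Parent, string Child)> hierarchy)
    {
        // Direct parent -> child pairs (hlist).
        var direct = hierarchy.Select(h => (h.Parent, h.Child));

        // Grandparent -> grandchild pairs (slist): join each row to the row whose Child is its Parent.
        var twoHop = from h in hierarchy
                     join h2 in hierarchy on h.Parent equals h2.Child
                     select (h2.Parent, h.Child);

        // Union removes duplicate pairs. Added filter: drop children that are
        // themselves parents, so only leaf nodes remain in each group.
        var parents = new HashSet<string>(hierarchy.Select(h => h.Parent));
        return direct.Union(twoHop)
                     .Where(pc => !parents.Contains(pc.Child))
                     .ToLookup(pc => pc.Parent, pc => pc.Child);
    }
}
```

A third level would need another join (grandparent of grandparent), which is why this approach only suits a fixed, known depth.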
EDIT: Alternatively, you could build a view linking each parent to all of its children and use that view as the basis for joining Items. In theory this should allow you (or L2S) to group over it with a single trip.

Related

Linq2Entities Equivalent Query for Parent/Child Relationship, With All Parents and Children, Filtering/Ordering Children

The question is ridiculously long, so let's go straight to the code. What's the Linq2Entities equivalent of the following SQL, given entities (tables) that look like:
Parent
---
parent_id
parent_field1
Child
--
child_id
parent_id
child_field1
child_field2
The sql:
select p.*, c.*
from parent p
inner join child c on
    p.parent_id = c.parent_id
where
    c.child_field1 = some_appropriate_value
order by
    p.parent_field1,
    c.child_field2
L2E lets you do .Include(), and that seems like the appropriate place to put the ordering and filtering for the child, but Include doesn't accept a filtering expression (why not!?). So I'm guessing this can't be done right now, because that's what a lot of articles say, but they're old, and I'm wondering if it's possible with EF6.
Also, I don't have access to the context, so I need the lambda-syntax version.
I am looking for a resultant object hierarchy that looks like:
Parent1
|
+-- ChildrenOfParent1
|
Parent2
|
+-- ChildrenOfParent2
and so forth. The list would end up being an IEnumerable. If one iterated over that list, one could get the .Children property of each parent in it.
Ideally (and I'm dreaming here, I think), is that the overall size of the result list could be limited. For example, if there are three parents, each with 10 children, for a total of 33 (30 children + 3 parents) entities, I could limit the total list to some arbitrary value, say 13, and in this case that would limit the result set to the first parent, with all its children, and the second parent, with only one of its children (13 total entities). I'm guessing all of this would have to be done manually in code, which is disappointing because it can be done quite easily in SQL.
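The entity cap described above can be applied in code once the parents and their ordered children are in memory. This is a sketch under two assumptions: each parent counts as one entity, and it is followed by its children in order. `EntityBudget.TakeWithinBudget` is a hypothetical helper, not an EF feature.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EntityBudget
{
    // Walks parents in order, counting the parent first and then its children,
    // and stops once maxEntities entities have been taken.
    public static List<(TParent Parent, List<TChild> Children)> TakeWithinBudget<TParent, TChild>(
        IEnumerable<(TParent Parent, IEnumerable<TChild> Children)> source, int maxEntities)
    {
        var result = new List<(TParent Parent, List<TChild> Children)>();
        int remaining = maxEntities;
        foreach (var (parent, children) in source)
        {
            if (remaining <= 0) break;
            remaining--;                                   // the parent itself is one entity
            var kept = children.Take(remaining).ToList();  // fill what is left with children
            remaining -= kept.Count;
            result.Add((parent, kept));
        }
        return result;
    }
}
```

With 3 parents of 10 children each and a budget of 13, this keeps the first parent with all 10 children and the second parent with one child, as in the example. If the budget runs out right after a parent, that parent is kept with an empty child list.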
When you query the database with Entity Framework to fetch parents, the parents' fields are fetched in a single query. Now you have a result set like this:
var parentsQuery = db.Parents.ToList();
Then, if there is a foreign key between Child and Parent, Entity Framework creates a navigation property on Parent to access the corresponding entities (for example, the Child table).
In this case, when you use this navigation property on parent entities that have already been fetched in order to get their children, Entity Framework (with lazy loading) sends another query to SQL Server per parent.
For example, if parentsQuery contains 15 parents, the following code makes Entity Framework send 15 more queries:
var Childs = parentsQuery.SelectMany(u => u.NavigationProperty_Childs).ToList();
In these cases you can use Include to fetch all children together with their parents in a single query and avoid the extra round trips, like this:
var ParentIncludeChildsQuery = db.Parents.Include("Childs").ToList();
Then the following code does not make Entity Framework send any further queries:
var Childs = ParentIncludeChildsQuery.SelectMany(u => u.NavigationProperty_Childs).ToList();
However, you can't put any condition or constraint into Include. You can apply conditions only after Include, using Where, Join, Contains and so forth, like this:
var Childs = ParentIncludeChildsQuery.SelectMany(u => u.NavigationProperty_Childs
    .Where(t => t.child_field1 == some_appropriate_value)).ToList();
But with this query, all children have already been fetched from the database before the filter runs in memory.
A better way to achieve the equivalent of the SQL query is:
var query = parent.Join(child,
        p => p.ID,
        c => c.ParentID,
        (p, c) => new { Parent = p, Child = c })
    .Where(u => u.Child.child_field1 == some_appropriate_value)
    .OrderBy(u => u.Parent.parent_field1)
    .ThenBy(u => u.Child.child_field2)
    .ToList();
According to your comment, this is what you want:
var query = parent.Join(child,
        p => p.ID,
        c => c.ParentID,
        (p, c) => new { Parent = p, Child = c })
    .Where(u => u.Child.child_field1 == some_appropriate_value)
    .GroupBy(u => u.Parent)
    .Select(u => new {
        Parent = u.Key,
        // project back to the Child entities, ordered within each parent
        Childs = u.OrderBy(t => t.Child.child_field2).Select(t => t.Child)
    })
    .OrderBy(u => u.Parent.parent_field1)
    .ToList();
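To see the shape the grouped query produces, the same Join/Where/GroupBy/OrderBy pipeline can be run with LINQ to Objects. The `Parent` and `Child` records and `ParentChildQuery.Run` are hypothetical stand-ins for the tables; against EF the SQL translation is up to the provider, so treat this only as a sketch of the semantics.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the Parent and Child tables.
public record Parent(int ID, string parent_field1);
public record Child(int ChildID, int ParentID, string child_field1, int child_field2);

public static class ParentChildQuery
{
    // Inner join, filter the children, group by parent,
    // order the children inside each group, then order the parents.
    public static List<(Parent Parent, List<Child> Children)> Run(
        IEnumerable<Parent> parents, IEnumerable<Child> children, string childFilter)
    {
        return parents
            .Join(children, p => p.ID, c => c.ParentID, (p, c) => new { Parent = p, Child = c })
            .Where(u => u.Child.child_field1 == childFilter)
            .GroupBy(u => u.Parent)
            .Select(g => (Parent: g.Key,
                          Children: g.Select(t => t.Child).OrderBy(c => c.child_field2).ToList()))
            .OrderBy(x => x.Parent.parent_field1)
            .ToList();
    }
}
```

Because it is an inner join, a parent with no matching children drops out of the result, just as in the SQL.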

How can I select items from a table but ban certain from another?

I have two tables: one contains entities, the other an entity log.
MyEntity:
id, lat, lon
An entity has a position in the world.
MyEntityLog:
id, otherid, otherlat, otherlon
The entity with id has interacted with entity otherid at otherid's latitude and longitude.
For instance, I have the following entities:
1, 4.456, 2.234
2, 3.344, 6.453
3, 6.234, 9.324
(not very accurate, but it serves the purpose).
Now, if entity 1 interacts with entity 2, the result in the log table would look like:
1, 2, 3.344, 6.453
So my question is: when listing entity 1's available interactions, how can I exclude the ones already in the log table?
The result of listing entity 1's available interactions should be only entity 3, as it already has an interaction with 2.
First, make a list of the ids that entity 1 has interacted with:
var id1 = 1;
var excluded = from l in db.EntityLogs
where l.id == id1
select l.otherid;
Then find the entities whose id is neither in this list nor equal to id1:
var available = from e in db.Entities   // db.Entities: the MyEntity set (name assumed)
                where !excluded.Contains(e.id) && e.id != id1
                select e;
Note that LINQ will defer the execution of excluded and incorporate it into the execution of available.
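The same exclusion logic as an in-memory sketch, using the sample rows from the question; `AvailableInteractions.For` and the two records are hypothetical. Against a database you would keep `excluded` as an `IQueryable` so it becomes part of the SQL, as noted above; the `HashSet` here only makes the lookup cheap in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record MyEntity(int id, double lat, double lon);
public record MyEntityLog(int id, int otherid, double otherlat, double otherlon);

public static class AvailableInteractions
{
    // Entities that entityId has not interacted with yet, excluding itself.
    public static List<MyEntity> For(int entityId, IEnumerable<MyEntity> entities, IEnumerable<MyEntityLog> logs)
    {
        // Ids already logged as interactions of this entity.
        var excluded = new HashSet<int>(logs.Where(l => l.id == entityId).Select(l => l.otherid));
        return entities.Where(e => e.id != entityId && !excluded.Contains(e.id)).ToList();
    }
}
```

For entity 1 this returns only entity 3. Entity 2 has no log rows of its own, so for it the result is entities 1 and 3; if interactions should count in both directions, add the rows where `otherid == entityId` to the excluded set as well.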
I'm not sure I understand your question, and I may need more details, but if you want to list the entities that have no entry in the log table, one solution would be something like this, assuming myEntities is the collection of MyEntity and myEntityLogs is the collection of MyEntityLog:
var firstList = myEntities.Join(myEntityLogs, a => a.Id, b => b.Id, (a, b) => a).Distinct();
var secondList = myEntities.Join(myEntityLogs, a => a.Id, b => b.OtherId, (a, b) => a).Distinct();
var result = myEntities.Except(firstList.Concat(secondList)).ToList();

linq - parent/child query to select only lowest level

Given a common parent/child table:
Table A
Column Id int
Column Parent_Id int
Column Description text
I would like to get only the nodes that do not have any child nodes.
1,null,"PARENT A"
2,null,"PARENT B"
3,null,"PARENT C"
100,1,"CHILD A1"
101,1,"CHILD A2"
102,2,"CHILD B1"
So for my result set I would like to get only:
PARENT C (as it does not have any child elements), and CHILD A1, A2, B1.
You don't say exactly what you are querying with LINQ, but the general idea is:
var leafNodes = nodes.Where(n => nodes.Count(n1 => n1.Parent_Id == n.Id) == 0);
You might prefer the Any() method instead of Count() == 0 (see "Which method performs better: .Any() vs .Count() > 0?"):
var itemsWithoutChildren = nodes.Where(item => !nodes.Any(innerItem => innerItem.Parent_Id == item.Id));
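For in-memory collections (LINQ to Objects), the nested `Any()` rescans the whole list for every node. A sketch of a swapped-in set-based variant collects the parent ids once and is linear in the number of nodes; `Node` and `LeafNodes.Find` are hypothetical names mirroring Table A. Against a database, the `Any()` form above is the one a provider can translate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Node(int Id, int? Parent_Id, string Description);

public static class LeafNodes
{
    // Collects every non-null Parent_Id once, then keeps the nodes whose Id
    // is not in that set, i.e. the nodes that nobody points to as a parent.
    public static List<Node> Find(IReadOnlyCollection<Node> nodes)
    {
        // OfType<int>() drops the null Parent_Id values of root nodes.
        var parentIds = new HashSet<int>(nodes.Select(n => n.Parent_Id).OfType<int>());
        return nodes.Where(n => !parentIds.Contains(n.Id)).ToList();
    }
}
```

With the sample rows this returns ids 3, 100, 101 and 102.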

LINQ join multiple conditions

I have written the following LINQ query, but it gives the error:
mcc_season is not an attribute in mcc_product
How do I write the query when I have two WHERE conditions, each on a different entity in the join?
var guestCardProduct =
    (from c in CrmOrgServiceContext.mcc_productpriceSet
     join d in CrmOrgServiceContext.mcc_productSet
         on c.mcc_product.Id equals d.mcc_productId
     where d.mcc_producttype.Value == (int)mcc_product.mcc_producttypeOptionSet.GuestCard
        && c.mcc_season.Id == seasonId
     select new
     {
         d.mcc_productId,
         c.mcc_price
     }).FirstOrDefault();
You might be able to rewrite it as follows:
var guestCardProduct =
    (from c in CrmOrgServiceContext.mcc_productpriceSet
     where c.mcc_season.Id == seasonId
     join d in CrmOrgServiceContext.mcc_productSet
         on c.mcc_product.Id equals d.mcc_productId
     where d.mcc_producttype.Value == (int)mcc_product.mcc_producttypeOptionSet.GuestCard
     select new
     {
         d.mcc_productId,
         c.mcc_price
     }).FirstOrDefault();
The key change is that the condition on c now sits in its own where clause before the join, so each where clause refers to only one entity. We're assuming here that there are 1 - 0..1 relationships between mcc_productpriceSet and both mcc_season and mcc_product. If you have 1-n relationships between any of these, then you will have to work through those relationships rather than dotting into a single property, because you will have collections of child objects rather than a single child object. You may find it helpful to break your query into smaller pieces to narrow down the source of the problem.

How to do multiple left outer joins in a Linq Query?

A little background on the query below: Cell has a 1:M relationship with Container and a 1:M relationship with Printer. I want a query that will retrieve all Cells and their associated containers, if they exist, and associated printers, if they exist. Essentially I want to do a left outer join on both tables. Here is the query I have:
var query = from cell in Cell
            join container in Container.Where(row => row.SerialNumber == "1102141")
                on cell.CellID equals container.CellID into containers
            join printer in Printer.Where(row => row.Name == "PG10RelWarrPrt3")
                on cell.CellID equals printer.CellID into printers
            select new { Cell = cell, Containers = containers, Printers = printers };
query.Dump();
This query works, but is not efficient. It does a left outer join on Container, but, for each Cell, it performs a separate query to retrieve any Printer rows, instead of also doing a left outer join on Printer.
How can I change this so that it also does a left outer join on the Printer table? BTW, I want a hierarchical result set. IOW, each Cell should have a list of containers and a list of printers. Each would be empty of course, if none existed for the cell.
Here's a query to produce a flat result set with correct left joins.
var query = from cell in Cell
            join container in Container.Where(row => row.SerialNumber == "1102141")
                on cell.CellID equals container.CellID into containers
            from container2 in containers.DefaultIfEmpty()
            join printer in Printer.Where(row => row.Name == "PG10RelWarrPrt3")
                on cell.CellID equals printer.CellID into printers
            from printer2 in printers.DefaultIfEmpty()
            select new { Cell = cell, Container = container2, Printer = printer2 };
You'll have to post-process the results locally to get the hierarchical shape desired.
If you write this post-processing code, you'll understand why LINQ to SQL doesn't process multiple sibling collections for you.
To make this clearer, suppose you had 3 sibling collections.
If all three sibling collections were empty for some parent record, you'd get the parent record just once, with nulls for the child columns.
If all three sibling collections had 100 records for some parent record, you'd have 1 million rows, each with a copy of the parent record. Every child record would be duplicated 10,000 times in the result.
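The arithmetic above as a small sketch: with left joins, each parent yields one row per combination of its sibling rows, and an empty collection still contributes one null row. `FlatRowCount.RowsForParent` is a hypothetical helper.

```csharp
using System;
using System.Linq;

public static class FlatRowCount
{
    // Rows produced for one parent when each sibling collection is left-joined:
    // the product of max(1, count) over all sibling collections.
    public static long RowsForParent(params int[] siblingCounts) =>
        siblingCounts.Aggregate(1L, (rows, count) => rows * Math.Max(1, count));
}
```

`RowsForParent(0, 0, 0)` is 1 and `RowsForParent(100, 100, 100)` is 1,000,000; dividing by the 100 rows of one collection shows each of its child rows appearing 100 * 100 = 10,000 times.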
It's always important to keep in mind with any ORM that it generates SQL and gets back flat result sets, no matter what hierarchically shaped result it eventually presents you with.
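The local post-processing step might look like this: group the flat rows by cell and collect the distinct non-null containers and printers. `FlatRow` and `Reshape.ToHierarchy` are hypothetical, and containers and printers are represented by string keys for brevity; with real entities you would compare on their ids.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one row from the flat left-join query;
// Container or Printer is null when DefaultIfEmpty supplied the row.
public record FlatRow(int CellID, string? Container, string? Printer);

public static class Reshape
{
    // Rebuilds "each cell with its containers and printers" from flat rows.
    // Distinct() undoes the duplication introduced by the cross product.
    public static List<(int CellID, List<string> Containers, List<string> Printers)> ToHierarchy(
        IEnumerable<FlatRow> rows) =>
        rows.GroupBy(r => r.CellID)
            .Select(g => (
                CellID: g.Key,
                Containers: g.Select(r => r.Container).OfType<string>().Distinct().ToList(),
                Printers: g.Select(r => r.Printer).OfType<string>().Distinct().ToList()))
            .ToList();
}
```

A cell whose rows are all nulls ends up with two empty lists, which matches the hierarchical result the question asks for.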
It's usually wrong to use join in LINQ to SQL when the relationship is already mapped as an association; use the navigation properties instead.
Try:
var query = from cell in Cell
            select new
            {
                Cell = cell,
                Containers = cell.Containers
                    .Where(row => row.SerialNumber == "1102141"),
                Printers = cell.Printers
                    .Where(row => row.Name == "PG10RelWarrPrt3")
            };
