Entity Framework Linq - how to get groups that contain all your data - linq

Here is a sample dataset:
OrderProduct is a table that contains the productIds that were part of a given order.
Note: OrderProduct is a database table and I am using EF.
OrderId, ProductId
1, 1
2, 2
3, 4
3, 5
4, 5
4, 2
5, 2
5, 3
What I want to be able to do is find an order that contains only the productIds that I am searching for. So if my input was productIds 2,3, then I should get back OrderId 5.
I know how I can group data, but I am unsure of how to perform the select on the group.
Here is what I have:
var q = from op in OrderProduct
group op by op.OrderId into orderGroup
select orderGroup;
Not sure how to proceed from here

IEnumerable<int> products = new List<int> {2, 3};
IEnumerable<OrderProduct> orderProducts = new List<OrderProduct>
{
new OrderProduct(1, 1),
new OrderProduct(2, 2),
new OrderProduct(3, 4),
new OrderProduct(3, 5),
new OrderProduct(4, 5),
new OrderProduct(4, 2),
new OrderProduct(5, 2),
new OrderProduct(5, 3),
};
var orders =
(from op in orderProducts
group op by op.OrderId into orderGroup
//magic goes there
where !products.Except(orderGroup.Select(x => x.ProductId)).Any()
select orderGroup);
//outputs 5
orders.Select(x => x.Key).ToList().ForEach(Console.WriteLine);
Alternatively, as pointed out in another answer, you can replace
where !products.Except(orderGroup.Select(x => x.ProductId)).Any()
with
where products.All(pid => orderGroup.Any(op => op.ProductId == pid))
In my own measurement the second version was about 15% faster.
Edit
According to the last requirement change, that you need orders that contain not all productIds you are searching, but exactly those and only those productIds, I wrote an updated version:
var orders =
(from op in orderProducts
group op by op.OrderId into orderGroup
//this line was added
where orderGroup.Count() == products.Count()
where !products.Except(orderGroup.Select(x => x.ProductId)).Any()
select orderGroup);
So the only thing you need is to add a precondition ensuring that both collections contain the same number of elements. It works for both previous queries. As a bonus, here is a third version of the main where condition:
where orderGroup.Select(x => x.ProductId).Intersect(products).Count() == orderGroup.Count()
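For in-memory data, the "exactly those and only those" condition can also be written as a set comparison. The following is only a minimal sketch, assuming LINQ to Objects (it is not a query that EF is known to translate to SQL) and a hypothetical OrderProduct record standing in for the entity; the names wanted and matchingOrderIds are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The product ids we are searching for.
var wanted = new HashSet<int> { 2, 3 };

var orderProducts = new List<OrderProduct>
{
    new(1, 1), new(2, 2), new(3, 4), new(3, 5),
    new(4, 5), new(4, 2), new(5, 2), new(5, 3),
};

// Keep only the orders whose set of product ids equals the wanted set.
// SetEquals ignores element order and duplicates.
List<int> matchingOrderIds = orderProducts
    .GroupBy(op => op.OrderId)
    .Where(g => wanted.SetEquals(g.Select(op => op.ProductId)))
    .Select(g => g.Key)
    .ToList();

matchingOrderIds.ForEach(Console.WriteLine); // prints 5

// Hypothetical stand-in for the OrderProduct entity.
record OrderProduct(int OrderId, int ProductId);
```

Because SetEquals already checks both directions, no separate Count() comparison is needed in this variant.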

At first glance, I'd try something like this:
var prodIds = new[] {2, 3};
var q = from o in context.Orders
        where prodIds.All(pid => o.OrderProducts.Any(op => op.ProductId == pid))
        select o;
In plain language: "get the orders that have a product with every ID in the given list."
Update
Since it appears you are using LINQ to SQL rather than LINQ to Entities, here's another approach:
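// Typed as IQueryable so the result of Where can be assigned back (assuming the entity type is named Order).
IQueryable<Order> q = context.Orders;
foreach (var pid in prodIds)
{
    // Copy the loop variable: before C# 5, every lambda would capture the same pid.
    var id = pid;
    q = q.Where(o => o.OrderProducts.Any(op => op.ProductId == id));
}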
Rather than using a single LINQ statement, you essentially build the query piecemeal.
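To see the piecemeal approach run end to end, here is a small sketch over in-memory data. It assumes a hypothetical Order record that carries its product ids directly, and it uses AsQueryable() only to imitate an IQueryable source; a real LINQ to SQL or EF context would be used in its place.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var orders = new List<Order>
{
    new(1, new List<int> { 1 }),
    new(2, new List<int> { 2 }),
    new(3, new List<int> { 4, 5 }),
    new(4, new List<int> { 5, 2 }),
    new(5, new List<int> { 2, 3 }),
};
var prodIds = new[] { 2, 3 };

// Start from the full set and add one Where clause per required product id.
IQueryable<Order> q = orders.AsQueryable();
foreach (var pid in prodIds)
{
    var id = pid; // local copy, so each lambda captures its own value
    q = q.Where(o => o.ProductIds.Contains(id));
}

List<int> result = q.Select(o => o.OrderId).ToList();
Console.WriteLine(string.Join(", ", result)); // prints 5

// Hypothetical in-memory stand-in for an order and its product ids.
record Order(int OrderId, List<int> ProductIds);
```

Each loop iteration narrows the query further, so the final query only returns orders that contain every id in prodIds.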

Thanks to StriplingWarrior's answer I managed to figure this out. Not sure if this is the best way to do this, but it works.
List<int> prodIds = new List<int>{2,3};
var q = from o in Orders
//get all orderproducts that contain products in the ProdId list
where o.OrderProducts.All(op => prodIds.Contains(op.ProductId))
//now group the OrderProducts by the Orders
select from op in o.OrderProducts
group op by op.OrderId into opGroup
//select only those groups that have the same count as the prodId list
where opGroup.Count() == prodIds.Count()
select opGroup;
//get rid of any groups that may be empty
q = q.Where(fi => fi.Count()> 0);
(I am using LinqPad, which is why the query looks a little funky - no context, etc)

Related

Linq: Count number of times a sub list appear in another list

I guess there must be an easy way, but I'm not finding it. I would like to check whether a list of items appears (completely or partially) in another list.
For example: let's say I have the people in a department as list 1. Then I have a list of sports, each with a list of participants in that sport.
Now I want to count in how many sports all the people of a department appear.
(I know some tables might not make sense when looking at it from a normalisation angle, but it is easier this way than to try and explain my real tables)
So I have something like this:
var peopleInDepartment = from d in Department_Members
                         group d by d.DepartmentID into g
                         select new
                         {
                             DepartmentID = g.Key,
                             TeamMembers = g.Select(r => r.PersonID).ToList()
                         };
var peopleInTeam = from s in Sports
select new
{
SportID = s.SportID,
PeopleInSport = s.Participants.Select(x => x.PersonID),
NoOfMatches = peopleInDepartment.Contains(s.Participants.Select(x => x.PersonID)).Count()
};
The error here is that peopleInDepartment does not contain a definition for 'Contains'. Think I'm just in need of a new angle to look at this.
As the end result I would like print:
Department 1 : The Department participates in 3 sports
Department 2 : The Department participates in 0 sports
etc.
Judging from the expected result, you should base the query on the Department table, like the first query. You could include the sports count in the first query like so:
var peopleInDepartment =
    from d in Department_Members
    group d by d.DepartmentID into g
    select new
    {
        DepartmentID = g.Key,
        TeamMembers = g.Select(r => r.PersonID).ToList(),
        NumberOfSports = Sports.Count(s => s.Participants
            .Any(p => g.Select(r => r.PersonID)
                .Contains(p.PersonID)))
    };
NumberOfSports should contain the count of sports in which any participant is listed as a member of the current department (g.Select(r => r.PersonID).Contains(p.PersonID)).
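The difference between "any member takes part" and "all members take part" is easy to check on in-memory data. This is a rough sketch with made-up sample data; the tuple list, the dictionary of participants, and the property names AnyMember and AllMembers are assumptions for illustration, not the asker's real tables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: (DepartmentID, PersonID) pairs.
var departmentMembers = new List<(int DepartmentID, int PersonID)>
{
    (1, 10), (1, 11), (2, 20),
};

// Hypothetical sample data: sport id -> ids of the participating people.
var sports = new Dictionary<int, HashSet<int>>
{
    [100] = new() { 10, 11, 30 },
    [101] = new() { 10 },
    [102] = new() { 30 },
};

var summary = departmentMembers
    .GroupBy(m => m.DepartmentID)
    .Select(g => new
    {
        DepartmentID = g.Key,
        // Sports in which at least one department member takes part.
        AnyMember = sports.Values.Count(p => g.Any(m => p.Contains(m.PersonID))),
        // Sports in which every department member takes part.
        AllMembers = sports.Values.Count(p => g.All(m => p.Contains(m.PersonID))),
    })
    .ToList();

foreach (var s in summary)
    Console.WriteLine($"Department {s.DepartmentID}: any member in {s.AnyMember} sports, all members in {s.AllMembers} sports");
```

With this data, department 1 gets 2 for the "any" count and 1 for the "all" count, and department 2 gets 0 for both. Swapping Any for All is the only change needed if the whole department must take part.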

How to use OrderBy in Linq

I am using OrderBy, and I have figured out that I have to use OrderBy as the last method, or it will not work. The Distinct operator does not guarantee that it will maintain the original order of values, and if I use Include, it cannot sort the child collection.
Is there any reason why I shouldn't always do OrderBy last and not worry about whether order is preserved?
Edit:
In general, is there any reason, such as a performance impact, why I should not use OrderBy last? It doesn't matter if I use Entity Framework to query a database or just query some collection.
dbContext.EntityFramework.Distinct().OrderBy(o => o.Something); // this will give me an ordered result
dbContext.EntityFramework.OrderBy(o => o.Something).Distinct(); // this will not, because Distinct doesn't preserve order
Let's say that I want to select only one property.
dbContext.EntityFramework.Select(o => o.Selected).OrderBy(o => o.Something);
Will ordering be faster if I order the collection after selecting one property? In that case I should use OrderBy last. I am just asking whether there is any situation where ordering shouldn't be done as the last command.
Is there any reason why I shouldn't do OrderBy always last
There may be reasons to use OrderBy not as the last statement. For example, the sort property may not be in the result:
var result = context.Entities
.OrderBy(e => e.Date)
.Select(e => e.Name);
Or you want a sorted collection as part of the result:
var result = context.Customers
.Select(c => new
{
Customer = c,
Orders = c.Orders.OrderBy(o => o.Date),
Address = c.Address
});
Will order be faster if I order collection after one property selection?
Your examples show that you're working with LINQ to Entities, so the statements will be translated into SQL. You will notice that...
context.Entities
.OrderBy(e => e.Name)
.Select(e => e.Name)
... and ...
context.Entities
.Select(e => e.Name)
.OrderBy(s => s)
... will produce exactly the same SQL. So there is no essential difference between both OrderBy positions.
Doesn't matter if I use Entity Framework to query a database or just querying some collection.
Well, that does matter. For example, if you do...
context.Entities
.OrderBy(e => e.Date)
.Select(e => e.Name)
.Distinct()
... you'll notice that the OrderBy is completely ignored by EF and the order of names is unpredictable.
However, if you do ...
context.Entities
.AsEnumerable() // Continue as LINQ to objects
.OrderBy(e => e.Date)
.Select(e => e.Name)
.Distinct()
... you'll see that the sort order is preserved in the distinct result. LINQ to objects clearly has a different strategy than LINQ to Entities. OrderBy at the end of the statement would have made both results equal.
To sum it up, I'd say that as a rule of thumb, try to order as late as possible in a LINQ query. This will produce the most predictable results.
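The "order as late as possible" rule can be illustrated with a small in-memory sketch. It assumes LINQ to Objects and a hypothetical Entity record with Name and Date properties; with a real EF context the same shape of query would be translated to SQL instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var entities = new List<Entity>
{
    new("Carol", new DateTime(2020, 1, 3)),
    new("Alice", new DateTime(2020, 1, 1)),
    new("Bob",   new DateTime(2020, 1, 2)),
    new("Alice", new DateTime(2020, 1, 4)),
};

// Project and de-duplicate first, then sort as the last step, so the final
// order does not depend on whether Distinct preserves ordering.
List<string> names = entities
    .Select(e => e.Name)
    .Distinct()
    .OrderBy(n => n)
    .ToList();

Console.WriteLine(string.Join(", ", names)); // prints Alice, Bob, Carol

// Hypothetical stand-in for the entity type.
record Entity(string Name, DateTime Date);
```

Note that this sorts by the projected name. If the sort key is not part of the result (such as Date here), ordering last is not possible without first deciding which date represents each distinct name.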
I don't know if you misunderstood the meaning of Distinct. According to its definition, it:
Returns distinct elements from a sequence by using the default equality comparer to compare values.
So if you have a list of int and you want to remove repeated values, you use Distinct. Distinct uses the default equality comparer and keeps track of the values it has already returned, so the input does not have to be sorted first. In LINQ to Objects the current implementation yields the values in order of first appearance, although the documentation describes the result as unordered.
OrderBy is the method that actually sorts. For an in-memory list, the combinations behave as follows:
List<int> myNumbers = new List<int>{ 102, 2817, 82, 2, 1, 2, 1, 9, 4 };
Sorting and removing duplicated numbers
// returns 1, 2, 4, 9, 82, 102, 2817
var sortedUniques = myNumbers.OrderBy(n => n).Distinct();
Removing duplicated numbers and sorting
// returns 1, 2, 4, 9, 82, 102, 2817
var uniquesSorted = myNumbers.Distinct().OrderBy(n => n);
Just removing duplicated numbers
// returns 102, 2817, 82, 2, 1, 9, 4
var uniques = myNumbers.Distinct();
Just sorting
// returns 1, 1, 2, 2, 4, 9, 82, 102, 2817
var sorted = myNumbers.OrderBy(n => n);
I hope it helps you \o/

How can I select items from a table but ban certain from another?

I have two tables: one contains entities, the other contains the entity log.
MyEntity:
id, lat, lon
An entity has a position in the world.
MyEntityLog:
id, otherid, otherlat, otherlon
Entity with id has interacted with otherid at otherid's latitude and longitude.
For instance, I have the following entities:
1, 4.456, 2.234
2, 3.344, 6.453
3, 6.234, 9.324
(not very accurate, but it serves the purpose).
Now, if entity 1 interacts with entity 2, the resulting row in the log table would look like:
1, 2, 3.344, 6.453
So my question is: when listing entity 1's available interactions, how can I exclude the ones that are already in the log table?
The result of listing entity 1's available interactions should only be entity 3, as it already has an interaction with entity 2.
First make a list of ids that interact with entity 1:
var id1 = 1;
var excluded = from l in db.EntityLogs
where l.id == id1
select l.otherid;
Then find the entities whose id is neither in this list nor equal to id1:
// Assumes the entity table is exposed as db.Entities.
var available = from e in db.Entities
                where !excluded.Contains(e.id) && e.id != id1
                select e;
Note that LINQ will defer the execution of excluded and incorporate it into the execution of available.
I'm not sure I understand your question, and I would need more details. But if you want to list the entities that have no entry in the log table, one solution would be something like this, assuming myEntities is the collection of MyEntity and myEntityLogs is the collection of MyEntityLog:
var firstList = myEntities.Join(myEntityLogs, a => a.Id, b => b.Id, (a, b) => a).Distinct();
var secondList = myEntities.Join(myEntityLogs, a => a.Id, b => b.OtherId, (a, b) => a).Distinct();
var result = myEntities.Except(firstList.Concat(secondList)).ToList();
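For the specific case in the question (what is still available to entity 1), an in-memory sketch could look like the following. The record types mirror the column lists given in the question, but their property names and the variable currentId are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var entities = new List<MyEntity>
{
    new(1, 4.456, 2.234),
    new(2, 3.344, 6.453),
    new(3, 6.234, 9.324),
};
var logs = new List<MyEntityLog>
{
    new(1, 2, 3.344, 6.453), // entity 1 has already interacted with entity 2
};

int currentId = 1;

// Ids that the current entity has already interacted with.
HashSet<int> alreadySeen = logs
    .Where(l => l.Id == currentId)
    .Select(l => l.OtherId)
    .ToHashSet();

// Every other entity that does not appear in the log for the current entity.
List<int> available = entities
    .Where(e => e.Id != currentId && !alreadySeen.Contains(e.Id))
    .Select(e => e.Id)
    .ToList();

Console.WriteLine(string.Join(", ", available)); // prints 3

// Hypothetical stand-ins for the two tables.
record MyEntity(int Id, double Lat, double Lon);
record MyEntityLog(int Id, int OtherId, double OtherLat, double OtherLon);
```

Unlike the Join/Except version, this only excludes interactions logged for the one entity being listed, which is what the expected result (entity 3) suggests.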

Max sequence from a view containing multiple record using Linq lambda

I've been at this for a while. I have a data set that has a reoccurring key and a sequence similar to this:
id status sequence
1 open 1
1 processing 2
2 open 1
2 processing 2
2 closed 3
A new row is added for each 'action' that happens, so the various ids can have variable sequences. I need to get the max sequence number for each id, but I still need to return the complete record.
I want to end up with sequence 2 for id 1, and sequence 3 for id 2.
I can't seem to get this to work without selecting the distinct ids, then looping through the results, ordering the values and then adding the first item to another list, but that's so slow.
var ids = this.ObjectContext.TNTP_FILE_MONITORING.Select(i => i.FILE_EVENT_ID).Distinct();
List<TNTP_FILE_MONITORING> vals = new List<TNTP_FILE_MONITORING>();
foreach (var id in ids)
{
    // One query per id: take the row with the highest sequence number for that id.
    vals.Add(this.ObjectContext.TNTP_FILE_MONITORING.Where(mfe => mfe.FILE_EVENT_ID == id).OrderByDescending(mfe => mfe.FILE_EVENT_SEQ).First());
}
There must be a better way!
Here's what worked for me:
var ts = new[] { new T(1,1), new T(1,2), new T(2,1), new T(2,2), new T(2,3) };
var q =
from t in ts
group t by t.ID into g
let max = g.Max(x => x.Seq)
select g.FirstOrDefault(t1 => t1.Seq == max);
(Just need to apply that to your datatable, but the query stays about the same)
Note that with your current method, because you are iterating over all records, you also fetch all records from the datastore. A query like this can be translated into a single query against the datastore, which is not only faster but also returns only the results you need (assuming you are using Entity Framework or LINQ to SQL).
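On .NET 6 or later, the same "whole row with the highest sequence per id" idea can be sketched for in-memory data with Enumerable.MaxBy. This assumes LINQ to Objects and a hypothetical Row record; whether a given EF provider can translate MaxBy is not something this example claims.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var rows = new List<Row>
{
    new(1, "open", 1),
    new(1, "processing", 2),
    new(2, "open", 1),
    new(2, "processing", 2),
    new(2, "closed", 3),
};

// For each id, keep the whole row that has the highest sequence number.
// Groups produced by GroupBy are never empty, so MaxBy does not return null here.
List<Row> latest = rows
    .GroupBy(r => r.Id)
    .Select(g => g.MaxBy(r => r.Sequence)!)
    .ToList();

foreach (var r in latest)
    Console.WriteLine($"{r.Id} {r.Status} {r.Sequence}");
// prints:
// 1 processing 2
// 2 closed 3

// Hypothetical stand-in for a row of the view.
record Row(int Id, string Status, int Sequence);
```

MaxBy makes a single pass over each group, so there is no need to compute the maximum first and then search for the matching row.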

What's the LINQ to select the latest item from a number of versioned items?

I've got a class like the following:
public class Invoice
{
public int InvoiceId {get;set;}
public int VersionId {get;set;}
}
Each time an Invoice is modified, the VersionId gets incremented, but the InvoiceId remains the same. So given an IEnumerable<Invoice> which has the following results:
InvoiceId VersionId
1 1
1 2
1 3
2 1
2 2
How can I get just the results:
InvoiceId VersionId
1 3
2 2
I.e. I want just the Invoices from the results which have the latest VersionId. I can easily do this in T-SQL, but cannot for the life of me work out the correct LINQ syntax. I'm using Entity Framework 4 Code First.
Order by the VersionId, group them by InvoiceId, then take the first result of each group. Try this:
var query = list.OrderByDescending(i => i.VersionId)
.GroupBy(i => i.InvoiceId)
.Select(g => g.First());
EDIT: how about this approach using Max?
var query = list.GroupBy(i => i.InvoiceId)
.Select(g => g.Single(i => i.VersionId == g.Max(o => o.VersionId)));
Try using FirstOrDefault or SingleOrDefault in place of Single as well... it would give the same result although Single shows the intention better.
EDIT: I've tested both these queries with LINQ to Entities. They seem to work, so perhaps the issue is something else?
Option 1:
var latestInvoices = invoices.GroupBy(i => i.InvoiceId)
.Select(group => group.OrderByDescending(i => i.VersionId)
.FirstOrDefault());
EDIT: Changed 'Last' to 'FirstOrDefault', LINQ to Entities has issues with the 'Last' query operator.
Option 2:
var invoices = from invoice in dc.Invoices
group invoice by invoice.InvoiceId into invoiceGroup
let maxVersion = invoiceGroup.Max(i => i.VersionId)
from candidate in invoiceGroup
where candidate.VersionId == maxVersion
select candidate;
My version:
var h = from i in Invoices
group i.VersionId by i.InvoiceId into grouping
select new {InvoiceId = grouping.Key, VersionId = grouping.Max()};
Update
As Ahmad mentioned in the comments, the above query returns a projection. The version below returns an IQueryable<Invoice>. I use composition to build the query because I think it is clearer.
var maxVersions = from i in Invoices
group i.VersionId by i.InvoiceId into grouping
select new {InvoiceId = grouping.Key,
VersionId = grouping.Max()};
var latestInvoices = from i in Invoices
join m in maxVersions
on new {i.InvoiceId, i.VersionId} equals
new {m.InvoiceId, m.VersionId}
select i;
