LINQ Query To Return Duplicates Exclusively

I'm working on this LINQ query. I'd like it to return only the records whose EMailAddress1 value is duplicated, grouped by the EMailAddress1 field.
For instance:
emailaddress1#gmail.com
emailaddress1#gmail.com
emailaddress2#gmail.com
emailaddress2#gmail.com
emailaddress2#gmail.com
emailaddress3#gmail.com
emailaddress3#gmail.com
etc.
Any advice on this? Thanks.
var contacts = (from c in xrm.ContactSet
where c.StateCode != 1
orderby c.EMailAddress1, c.CreatedOn
descending select new {
c.FirstName,
c.LastName,
c.EMailAddress1,
c.ContactId,
c.CreatedOn }).ToList();

Based on your previous query:
var duplicatedEmails = (from c in contacts
group c by c.EMailAddress1 into g
where g.Count() > 1
select g.Key).ToList();
var duplicatedContacts = contacts.Where(c => duplicatedEmails.Contains(c.EMailAddress1));
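If you want the duplicates to come back already grouped by email, you can keep the groups themselves and flatten them. The sketch below does this in memory. It makes a few assumptions: a list of (FirstName, EMailAddress1) tuples stands in for the materialized `contacts` list, and the names and addresses are invented. It keeps only groups with more than one member, orders them by email, and flattens them with SelectMany. This also avoids calling `List.Contains` once per contact.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for the materialized `contacts` list.
var contacts = new List<(string FirstName, string EMailAddress1)>
{
    ("Ann",  "emailaddress1#gmail.com"),
    ("Bob",  "emailaddress1#gmail.com"),
    ("Cara", "emailaddress2#gmail.com"),
    ("Dan",  "unique#gmail.com"),
    ("Eve",  "emailaddress2#gmail.com"),
};

// Group by email, keep only the emails that occur more than once,
// then flatten the groups back into individual contact records.
var duplicatedContacts = contacts
    .GroupBy(c => c.EMailAddress1)
    .Where(g => g.Count() > 1)
    .OrderBy(g => g.Key)
    .SelectMany(g => g)
    .ToList();

foreach (var c in duplicatedContacts)
    Console.WriteLine($"{c.EMailAddress1} {c.FirstName}");
```

Within each group, the records keep their original order. Against the database you would run the same GroupBy/Where on the query and materialize it once.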

Related

Is there a faster way to work with a nested LINQ query?

I am trying to query a table with a nested LINQ query. My query works, but it is too slow. I have almost 400k rows, and the query takes 10 seconds for 1,000 rows. For 400k rows I estimate it would take about 2 hours.
I have rows like this:
StudentNumber - DepartmentID
n100 - 1
n100 - 1
n105 - 1
n105 - 2
n107 - 1
I want the students who have more than one distinct department ID. My results look like this:
StudentID - List
n105 - 1 2
My query produces this, but slowly.
var sorgu = (from yok in YOKAktarim
group yok by yok.StudentID into g
select new {
g.Key,
liste=(from birim in YOKAktarim where birim.StudentID == g.Key select new { birim.DepartmentID }).ToList().GroupBy (x => x.DepartmentID).Count()>1 ? (from birim in YOKAktarim where birim.StudentID == g.Key select new { birim.DepartmentID }).GroupBy(x => x.DepartmentID).Select(x => x.Key).ToList() : null,
}).Take(1000).ToList();
Console.WriteLine(sorgu.Where (s => s.liste != null).OrderBy (s => s.Key));
I wrote this query as a LINQPad C# Statement.
For 400K records, you should be able to load the student IDs and department IDs into an in-memory list.
var list1 = (from r in YOKAktarim
group r by new { r.StudentID, r.DepartmentID} into g
select g.Key
).ToList();
Once you have this list, you should be able to group by StudentID and select those students who have more than one record.
var list2 = (from r in list1 group r by r.StudentID into g
where g.Count() > 1
select new
{
StudentID = g.Key,
Departments = g.Select(a => a.DepartmentID).ToList()
}
).ToList();
This should be faster, because it only hits the SQL database once rather than hundreds of thousands of times.
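Here are the two steps run against an in-memory copy of the sample rows from the question. The tuples standing in for YOKAktarim are an assumption made for illustration. Against the database, step 1 would be translated to SQL, and step 2 would run over the materialized list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample rows from the question; in the real query these come from YOKAktarim.
var rows = new List<(string StudentID, int DepartmentID)>
{
    ("n100", 1), ("n100", 1),
    ("n105", 1), ("n105", 2),
    ("n107", 1),
};

// Step 1: distinct (StudentID, DepartmentID) pairs, the analogue of list1.
var list1 = rows
    .GroupBy(r => new { r.StudentID, r.DepartmentID })
    .Select(g => g.Key)
    .ToList();

// Step 2: students that still have more than one pair after de-duplication.
var list2 = list1
    .GroupBy(r => r.StudentID)
    .Where(g => g.Count() > 1)
    .Select(g => new
    {
        StudentID = g.Key,
        Departments = g.Select(a => a.DepartmentID).ToList()
    })
    .ToList();

foreach (var s in list2)
    Console.WriteLine($"{s.StudentID} - {string.Join(" ", s.Departments)}");
```

Student n100 has two rows, but they collapse to a single pair in step 1, so n100 is not reported. Only n105 survives step 2.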
You're going back to your source collection (YOKAktarim) twice for every group, on top of the outer pass, which makes your query roughly O(n^2). It's going to be slow.
Instead of going back to the source collection to get the contents of the group, you can simply iterate over g.
var sorgu = (from yok in YOKAktarim
group yok by yok.StudentID into g
select new {
g.Key,
liste = (from birim in g select new { birim.DepartmentID }).ToList().GroupBy (x => x.DepartmentID).Count()>1 ? (from birim in g select new { birim.DepartmentID }).GroupBy(x => x.DepartmentID).Select(x => x.Key).ToList() : null,
}).Take(1000).ToList();
However, that's still not optimal, because you're doing a lot of redundant subgrouping. Your query is pretty much equivalent to:
var sorgu = (from yok in YOKAktarim
group yok by yok.StudentID into g
let departments = g.Select(x => x.DepartmentID).Distinct().ToList()
where departments.Count > 1
select new {
g.Key,
liste = departments
}).Take(1000).ToList();
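Here is a runnable sketch of the single-query version. It assumes an in-memory list of the question's sample rows stands in for YOKAktarim. Each group's distinct departments are computed once with `let`, and the `where` then filters on that list. An `orderby g.Key` is added so the output order matches the question's `OrderBy (s => s.Key)`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for YOKAktarim, using the sample rows from the question.
var yokAktarim = new List<(string StudentID, int DepartmentID)>
{
    ("n100", 1), ("n100", 1),
    ("n105", 1), ("n105", 2),
    ("n107", 1),
};

// Group by student, compute that student's distinct departments with `let`,
// and keep only students that have more than one.
var sorgu = (from yok in yokAktarim
             group yok by yok.StudentID into g
             let departments = g.Select(x => x.DepartmentID).Distinct().ToList()
             where departments.Count > 1
             orderby g.Key
             select new { g.Key, liste = departments })
            .Take(1000)
            .ToList();

foreach (var s in sorgu)
    Console.WriteLine($"{s.Key} - {string.Join(" ", s.liste)}");
```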
I can't speak for the correctness of that monster, but simply removing all ToList() calls except the outermost one will fix your issue.

Filter a list of data joined from two tables in Entity Framework

I have two tables:
Student (StudentId,Name,FatherName)
Qualification (QualificationId,StudentId,DegreeName)
I get the data like this:
var myList = (from c in entities.Students
join q in entities.Qualifications on c.StudentId equals q.StudentId
select new {c.Name,c.FatherName,q.DegreeName}).ToList();
Now I want to filter myList further. How can I do it? For example:
var filteredList = myList.Select(c=> new Student
{
Name=c.Name,
FatherName=c.FatherName
//Degree=C.Degree
}).ToList();
The above LINQ query doesn't work if I also want to get DegreeName. My question is how to further filter myList.
Thanks.
var filteredList = myList.Where(i => i.FatherName == "Shahid").ToList();
Keep in mind since you called ToList() on the original query you are now filtering in memory. If you want to filter in the database then remove the ToList() on the first query and do it like this:
var myList = from c in entities.Students
join q in entities.Qualifications on c.StudentId equals q.StudentId
select new {
c.Name,
c.FatherName,
q.DegreeName
};
var filteredInDatabase = myList.Where(i => i.FatherName == "Shahid").ToList();
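This sketch shows how the query is composed before it runs. It assumes in-memory arrays wrapped in `AsQueryable()` stand in for entities.Students and entities.Qualifications, and the student data is invented. The join is built without ToList(), the Where is added to the same query, and only the final ToList() executes it. Because the projection is an anonymous type rather than Student (whose table has no degree column), DegreeName stays available after filtering.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory tables; AsQueryable() stands in for the EF sets.
var students = new[]
{
    new { StudentId = 1, Name = "Ali",  FatherName = "Shahid" },
    new { StudentId = 2, Name = "Sara", FatherName = "Khan" },
}.AsQueryable();

var qualifications = new[]
{
    new { QualificationId = 10, StudentId = 1, DegreeName = "BSc" },
    new { QualificationId = 11, StudentId = 1, DegreeName = "MSc" },
    new { QualificationId = 12, StudentId = 2, DegreeName = "BA" },
}.AsQueryable();

// Build the join without ToList(): nothing executes yet.
var myList = from c in students
             join q in qualifications on c.StudentId equals q.StudentId
             select new { c.Name, c.FatherName, q.DegreeName };

// Compose the filter onto the same query; ToList() runs the combined query once.
var filtered = myList.Where(i => i.FatherName == "Shahid").ToList();

foreach (var row in filtered)
    Console.WriteLine($"{row.Name} {row.FatherName} {row.DegreeName}");
```

With a real EF context, the same composition should let the provider fold the FatherName filter into the generated SQL instead of filtering in memory.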

How to find Distinct in more than one column in LINQ

I have a LINQ statement that returns many columns. I need to find the distinct combinations of two of those columns. What is the best way to do this?
var productAttributeQuery =
from pa in ctx.exch_productattributeSet
join pp in ctx.exch_parentproductSet
on pa.exch_ParentProductId.Id equals pp.Id
join ep in ctx.exch_exchangeproductSet
on pp.exch_parentproductId equals ep.exch_ParentProductId.Id
where pa.exch_EffBeginDate <= effectiveDateForBeginCompare
&& pa.exch_EffEndDate >= effectiveDateForEndCompare
&& pa.statuscode == StatusCodeEnum.Active
where pp.exch_EffBeginDate <= effectiveDateForBeginCompare
&& pp.exch_EffEndDate >= effectiveDateForEndCompare
&& pp.statuscode == StatusCodeEnum.Active
where ep.statuscode == StatusCodeEnum.Active
select new ProductAttributeDto
{
ParentProductId = pa.exch_ParentProductId.Id,
AttributeId = pa.exch_AttributeId.Id,
AttributeValue = pa.exch_Value,
AttributeRawValue = pa.exch_RawValue
};
return productAttributeQuery.ToList();
I want to get the distinct combinations of ParentProductId and AttributeId from this list.
You can group by an anonymous type and select the keys (they will be distinct):
var query = from p in productAttributeQuery
group p by new {
p.ParentProductId,
p.AttributeId
} into g
select g.Key;
You can use the same approach with your original query if you want to get the distinct pairs on the server side.
Another approach is to project the results into pairs and call Distinct on them:
var query = productAttributeQuery
.Select(p => new { p.ParentProductId, p.AttributeId })
.Distinct();
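Here are both approaches run side by side on a few invented rows. The assumption is that anonymous objects stand in for the ProductAttributeDto results. Both approaches work because anonymous types compare by value over all their properties, so equal (ParentProductId, AttributeId) pairs collapse into one. By contrast, calling Distinct() directly on a DTO class that doesn't override Equals would compare references in LINQ to Objects and remove nothing.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for the ProductAttributeDto results.
var rows = new[]
{
    new { ParentProductId = 1, AttributeId = 7, AttributeValue = "red" },
    new { ParentProductId = 1, AttributeId = 7, AttributeValue = "blue" },
    new { ParentProductId = 1, AttributeId = 8, AttributeValue = "L" },
    new { ParentProductId = 2, AttributeId = 7, AttributeValue = "red" },
};

// Approach 1: group by an anonymous-type key and keep only the keys.
var viaGroupBy = rows
    .GroupBy(p => new { p.ParentProductId, p.AttributeId })
    .Select(g => g.Key)
    .ToList();

// Approach 2: project to the pair, then Distinct().
var viaDistinct = rows
    .Select(p => new { p.ParentProductId, p.AttributeId })
    .Distinct()
    .ToList();

Console.WriteLine(viaGroupBy.Count);   // 3
Console.WriteLine(viaDistinct.Count);  // 3
```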

How can this SQL be done in LINQ?

I have this simple SQL query:
-- BestSeller
SELECT TOP(1) v.make, v.model, COUNT(v.make) AS NoSold
FROM Vehicles v
group by v.make, v.model
order by NoSold DESC
I'm using Entity Framework and want to do the same thing using LINQ. So far I have:
var tester = (from v in DB.VP_Historical_Vehicles
group v by v.make into g
orderby g.Count() descending
select new { make = g.Key, model = g, count = g.Count() }).Take(1);
foreach(var t in tester)
{
BestSeller.Make = t.make;
BestSeller.Model = t.make;
BestSeller.CountValue = t.count;
}
I keep getting timeouts. The database is large, but the SQL runs very quickly.
Any suggestions?
Thanks,
truegilly
Group by a compound key.
var t = (
from v in DB.VP_Historical_Vehicles
group v by new { v.make, v.model } into g
orderby g.Count() descending
select new { make = g.Key.make, model = g.Key.model, count = g.Count() }
)
.First();
BestSeller.Make = t.make;
BestSeller.Model = t.model;
BestSeller.CountValue = t.count;
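Here is the compound-key grouping run in memory. It assumes an array of invented (make, model) rows stands in for DB.VP_Historical_Vehicles. Grouping on both columns mirrors `GROUP BY v.make, v.model`. Ordering by the group count and taking the first element mirrors `TOP(1) ... ORDER BY NoSold DESC`.

```csharp
using System;
using System.Linq;

// Hypothetical sales rows standing in for DB.VP_Historical_Vehicles.
var vehicles = new[]
{
    new { make = "Ford",   model = "Focus"  },
    new { make = "Ford",   model = "Fiesta" },
    new { make = "Ford",   model = "Focus"  },
    new { make = "Toyota", model = "Yaris"  },
    new { make = "Ford",   model = "Fiesta" },
    new { make = "Ford",   model = "Focus"  },
};

// GROUP BY make, model; ORDER BY count DESC; TOP(1).
var t = (from v in vehicles
         group v by new { v.make, v.model } into g
         orderby g.Count() descending
         select new { make = g.Key.make, model = g.Key.model, count = g.Count() })
        .First();

Console.WriteLine($"{t.make} {t.model} {t.count}"); // Ford Focus 3
```

Grouping by make alone, as in the question, would report Ford with a count of 5 and would not identify a single model.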
Check what queries it performs when you run it with LINQ.
I suspect that your orderby g.Count() descending might be executing a COUNT query for each row, and that would take a toll on performance, to say the least.
When working with EF, always check what SQL your LINQ statements produce. It is very easy to create queries that result in an N+1 scenario.
Thanks to Scott Weinstein's answer, I was able to get it working.
Please comment if there is a more efficient way of doing this.
VehicleStatsObject BestSeller = new VehicleStatsObject();
using (var DB = DataContext.Get_DataContext)
{
var t = (from v in DB.VP_Historical_Vehicles
group v by new { v.make, v.model } into g
orderby g.Count() ascending
select new { make = g.Key.make, model = g.Key.model, count = g.Count() }).OrderByDescending(x => x.count).First();
BestSeller.Make = t.make;
BestSeller.Model = t.model;
BestSeller.CountValue = t.count;
}
return BestSeller;

Linq Query with aggregate function

I am trying to figure out how to go about writing a linq query to perform an aggregate like the sql query below:
select d.ID, d.FIRST_NAME, d.LAST_NAME, count(s.id) as design_count
from tbldesigner d inner join
TBLDESIGN s on d.ID = s.DESIGNER_ID
where s.COMPLETED = 1 and d.ACTIVE = 1
group by d.ID, d.FIRST_NAME, d.LAST_NAME
Having COUNT(s.id) > 0
If this is even possible with a LINQ query, could somebody please provide me with an example?
Thanks in Advance,
Billy
A more direct translation of your original SQL query would look like this:
var q =
// Join tables TblDesign with TblDesigner and filter them
from d in db.TblDesigner
join s in db.TblDesign on d.ID equals s.DesignerID
where s.Completed && d.Active
// Key and values used for grouping (note, you don't really need the
// value here, because you only need Count of the values in a group, but
// in case you needed anything from 's' or 'd' in 'select', you'd write this
let value = new { s, d }
let key = new { d.ID, d.FirstName, d.LastName }
group value by key into g
// Now, filter the created groups (return only non-empty) and select
// information for every group
where g.Count() > 0
select new { ID = g.Key.ID, FirstName = g.Key.FirstName,
LastName = g.Key.LastName, Count = g.Count() };
The HAVING clause is translated to an ordinary where that is applied after grouping the values with group ... by. The result of grouping is a collection of groups (each of which is itself a collection), so you can use where to filter the groups. In the select clause, you can then return information from the key (used for grouping) and aggregates of the values (using g.Count()).
EDIT: As mmcteam points out (see comments), the where g.Count() > 0 clause is not necessary, because this is already guaranteed by the join. I'll leave it there, because it shows how to translate a HAVING clause in general, so it may be helpful in other cases.
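Here is an in-memory version of the join/group/HAVING translation, with each LINQ clause lined up against the SQL clause it replaces. The assumption is that anonymous-object arrays with invented designers stand in for TblDesigner and TblDesign, and that Completed and Active are booleans as in the query above.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows standing in for TblDesigner and TblDesign.
var designers = new[]
{
    new { ID = 1, FirstName = "Ada", LastName = "Lee", Active = true  },
    new { ID = 2, FirstName = "Ben", LastName = "Roe", Active = true  },
    new { ID = 3, FirstName = "Cy",  LastName = "Poe", Active = false },
};
var designs = new[]
{
    new { ID = 100, DesignerID = 1, Completed = true  },
    new { ID = 101, DesignerID = 1, Completed = true  },
    new { ID = 102, DesignerID = 2, Completed = false },
    new { ID = 103, DesignerID = 3, Completed = true  },
};

var q = from d in designers
        join s in designs on d.ID equals s.DesignerID            // INNER JOIN
        where s.Completed && d.Active                            // WHERE
        group s by new { d.ID, d.FirstName, d.LastName } into g  // GROUP BY
        where g.Count() > 0                                      // HAVING
        select new { g.Key.ID, g.Key.FirstName, g.Key.LastName, Count = g.Count() };

var result = q.ToList();
foreach (var r in result)
    Console.WriteLine($"{r.ID} {r.FirstName} {r.LastName} {r.Count}");
```

Ben has no completed designs and Cy is inactive, so both are removed by the WHERE step. Only Ada remains, with a count of 2.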
Here's how I'd do it. Please note that I'm accustomed to LINQ to SQL and am unaware of any differences for this query in LINQ to Entities.
var query =
from d in myObjectContext.tbldesigner
where d.ACTIVE == 1
let manys =
from s in d.tbldesign
where s.COMPLETED == 1
select s
where manys.Count() > 0
select new
{
d.ID, d.FIRST_NAME, d.LAST_NAME,
DesignCount = manys.Count()
};
Ignoring the s.id, which confuses me (see my comment on the question), this is a simple query that would generate a HAVING clause. Of course, it's a worthless example here, since the count will always be more than 0.
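Here is the `let`-subquery shape run over an in-memory object graph. It assumes each hypothetical designer object carries a `tbldesign` array, imitating the navigation property used above, and keeps the 1/0 integer flags from the SQL.

```csharp
using System;
using System.Linq;

// Hypothetical object graph: each designer carries its designs, the way
// an entity navigation property (d.tbldesign) would.
var designers = new[]
{
    new { ID = 1, FIRST_NAME = "Ada", LAST_NAME = "Lee", ACTIVE = 1,
          tbldesign = new[] { new { COMPLETED = 1 }, new { COMPLETED = 1 } } },
    new { ID = 2, FIRST_NAME = "Ben", LAST_NAME = "Roe", ACTIVE = 1,
          tbldesign = new[] { new { COMPLETED = 0 } } },
    new { ID = 3, FIRST_NAME = "Cy",  LAST_NAME = "Poe", ACTIVE = 0,
          tbldesign = new[] { new { COMPLETED = 1 } } },
};

var query =
    from d in designers
    where d.ACTIVE == 1
    // The completed designs belonging to this designer only.
    let manys = from s in d.tbldesign
                where s.COMPLETED == 1
                select s
    where manys.Count() > 0
    select new { d.ID, d.FIRST_NAME, d.LAST_NAME, DesignCount = manys.Count() };

var results = query.ToList();
foreach (var r in results)
    Console.WriteLine($"{r.ID} {r.FIRST_NAME} {r.LAST_NAME} {r.DesignCount}");
```

In this shape, the `where manys.Count() > 0` filter does the work the SQL inner join does. Ben, whose only design is not completed, is dropped by it.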
Anyway, if you are using LINQ to Entities, you should use the entity mappings to access the foreign-key relationships instead of manually writing a join or a subquery.
var results = from d in db.tbldesigner
from s in d.TBLDESIGN
where s.COMPLETED && d.ACTIVE
group s by new {d.ID, d.FIRST_NAME, d.LAST_NAME} into g
where g.Count() > 0
select new {
g.Key.ID, g.Key.FIRST_NAME, g.Key.LAST_NAME,
Count = g.Count()
};
NOTE: This is untested (and uncompiled) so there might be some issues, but this is where I would start.

Resources