I am trying to query a table with a nested LINQ query. My query works but is too slow. The table has almost 400k rows, and the query takes 10 seconds for 1000 rows. For all 400k rows I think it would take about 2 hours.
I have rows like this
StudentNumber - DepartmentID
n100 - 1
n100 - 1
n105 - 1
n105 - 2
n107 - 1
I want the students that appear with more than one department ID. My result should look like this:
StudentID - List
n105 - 1 2
My query produces this result, but slowly.
var sorgu = (from yok in YOKAktarim
group yok by yok.StudentID into g
select new {
g.Key,
liste=(from birim in YOKAktarim where birim.StudentID == g.Key select new { birim.DepartmentID }).ToList().GroupBy (x => x.DepartmentID).Count()>1 ? (from birim in YOKAktarim where birim.StudentID == g.Key select new { birim.DepartmentID }).GroupBy(x => x.DepartmentID).Select(x => x.Key).ToList() : null,
}).Take(1000).ToList();
Console.WriteLine(sorgu.Where (s => s.liste != null).OrderBy (s => s.Key));
I wrote this query in LINQPad as C# statements.
For 400K records you should be able to return the student ids and department ids into an in-memory list.
var list1 = (from r in YOKAktarim
group r by new { r.StudentID, r.DepartmentID} into g
select g.Key
).ToList();
Once you have this list, you should be able to group by StudentID and select those students who have more than one record.
var list2 = (from r in list1 group r by r.StudentID into g
where g.Count() > 1
select new
{
StudentID = g.Key,
Departments = g.Select(a => a.DepartmentID).ToList()
}
).ToList();
This should be faster as it only hits the SQL database once, rather than hundreds of thousands of times.
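The two steps can be tried on in-memory data. This is a minimal sketch using LINQ to Objects; the `rows` array is a hypothetical stand-in for the YOKAktarim table, filled with the sample rows from the question, and `Distinct()` stands in for the group-by-pair step.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for the YOKAktarim table (sample rows from the question).
var rows = new[]
{
    new { StudentID = "n100", DepartmentID = 1 },
    new { StudentID = "n100", DepartmentID = 1 },
    new { StudentID = "n105", DepartmentID = 1 },
    new { StudentID = "n105", DepartmentID = 2 },
    new { StudentID = "n107", DepartmentID = 1 },
};

// Step 1: reduce the rows to distinct (StudentID, DepartmentID) pairs.
// Anonymous types compare by value, so Distinct() removes repeated pairs.
var pairs = rows
    .Select(r => new { r.StudentID, r.DepartmentID })
    .Distinct()
    .ToList();

// Step 2: keep only the students that have more than one distinct department.
var multiDept = pairs
    .GroupBy(p => p.StudentID)
    .Where(g => g.Count() > 1)
    .Select(g => new
    {
        StudentID = g.Key,
        Departments = g.Select(p => p.DepartmentID).OrderBy(d => d).ToList()
    })
    .ToList();

foreach (var student in multiDept)
    Console.WriteLine($"{student.StudentID} - {string.Join(" ", student.Departments)}");
// prints: n105 - 1 2
```

Against a real database only step 1 would run as SQL; step 2 runs on the already materialized list.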
You're iterating your source collection (YOKAktarim) three times, and two of those passes run again for every group, which makes the query roughly O(n^2) in the worst case. It's going to be slow.
Instead of going back to source collection to get content of the group you can simply iterate over g.
var sorgu = (from yok in YOKAktarim
group yok by yok.StudentID into g
select new {
g.Key,
liste = (from birim in g select new { birim.DepartmentID }).ToList().GroupBy (x => x.DepartmentID).Count()>1 ? (from birim in g select new { birim.DepartmentID }).GroupBy(x => x.DepartmentID).Select(x => x.Key).ToList() : null,
}).Take(1000).ToList();
However, that's still not optimal, because you're doing a lot of redundant subgrouping. Your query is pretty much equivalent to:
var sorgu = (from yok in YOKAktarim
group yok by yok.StudentID into g
let departments = g.Select(x => x.DepartmentID).Distinct().ToList()
where departments.Count > 1
select new {
g.Key,
liste = departments
}).Take(1000).ToList();
I can't speak for the correctness of that monster, but simply removing all ToList() calls except the outermost one will fix your issue.
I'm trying to convert my SQL statement to a Linq statement and I'm not sure how to add the second COUNT to it. This is my SQL statement
SELECT l.Campus_Name, Labs = COUNT(*), LabsWithSubnets = COUNT(s.Lab_Space_Id)
FROM vw_Lab_Space l
LEFT JOIN vw_Subnet s on l.Lab_Space_Id = s.Lab_Space_Id
GROUP BY l.Campus_Name
ORDER BY 1
and this is my LINQ statement so far:
from l in Vw_Lab_Space
from s in Vw_Subnet
.Where(s => s.Lab_Space_Id == l.Lab_Space_Id)
.DefaultIfEmpty() // <=- triggers the LEFT JOIN
group l by new { l.Campus_Name } into g
orderby g.Key.Campus_Name
select new {
Campus_Name = g.Key.Campus_Name,
Labs = g.Count()
}
So I have everything but the LabsWithSubnets part in there. I'm just not sure how to add that in as I can't just do an s.Lab_Space_id.Count() in the select statement.
If you need table structure and sample data, please see "Need help creating an OUTER JOIN to count spaces".
Using your query as a basis, you need the groups to include s so you can count when non-null (I also removed the unnecessary anonymous object around the grouping key):
from l in Vw_Lab_Space
from s in Vw_Subnet
.Where(s => s.Lab_Space_Id == l.Lab_Space_Id)
.DefaultIfEmpty() // <=- triggers the LEFT JOIN
group new { l, s } by l.Campus_Name into g
orderby g.Key
select new {
Campus_Name = g.Key,
Labs = g.Count(),
LabsWithSubnets = g.Count(ls => ls.s != null)
}
However, rather than translate the SQL, I would probably take advantage of LINQ's group join to handle the query slightly differently:
var ans = from l in Vw_Lab_Space
join s in Vw_Subnet on l.Lab_Space_Id equals s.Lab_Space_Id into sj
group new { l, sj } by l.Campus_Name into lsjg
select new {
Campus_Name = lsjg.Key,
NumLabs = lsjg.Count(),
LabsWithSubnets = lsjg.Sum(lsj => lsj.sj.Count())
};
PS: Even in your query, I would use join...from...DefaultIfEmpty rather than from...from...where, but depending on your database engine it may not matter.
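To see how the two counts behave, here is a small sketch on in-memory data using LINQ to Objects and the join...from...DefaultIfEmpty form. The `labs` and `subnets` arrays, the `Subnet_Id` column, and the campus names are made up for illustration; only `Lab_Space_Id` and `Campus_Name` come from the question.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for vw_Lab_Space and vw_Subnet.
var labs = new[]
{
    new { Lab_Space_Id = 1, Campus_Name = "North" },
    new { Lab_Space_Id = 2, Campus_Name = "North" },
    new { Lab_Space_Id = 3, Campus_Name = "South" },
};
var subnets = new[]
{
    new { Subnet_Id = 10, Lab_Space_Id = 1 },
    new { Subnet_Id = 11, Lab_Space_Id = 1 },
};

var counts = (from l in labs
              join s in subnets on l.Lab_Space_Id equals s.Lab_Space_Id into sj
              from s in sj.DefaultIfEmpty()   // left join: s is null when a lab has no subnet
              group new { l, s } by l.Campus_Name into g
              orderby g.Key
              select new
              {
                  Campus_Name = g.Key,
                  Labs = g.Count(),                              // like COUNT(*): one per joined row
                  LabsWithSubnets = g.Count(ls => ls.s != null)  // like COUNT(s.Lab_Space_Id)
              }).ToList();

foreach (var row in counts)
    Console.WriteLine($"{row.Campus_Name}: Labs={row.Labs}, LabsWithSubnets={row.LabsWithSubnets}");
// prints:
// North: Labs=3, LabsWithSubnets=2
// South: Labs=1, LabsWithSubnets=0
```

As in the SQL, a lab with two subnets contributes two joined rows, so `Labs` counts rows after the join rather than distinct labs.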
I'm working on this LINQ query. I'd like the result to contain only the records that have duplicates, based on the EMailAddress1 field, and grouped by that field.
For instance:
emailaddress1@gmail.com
emailaddress1@gmail.com
emailaddress2@gmail.com
emailaddress2@gmail.com
emailaddress2@gmail.com
emailaddress3@gmail.com
emailaddress3@gmail.com
etc.
Any advice on this? Thanks.
var contacts = (from c in xrm.ContactSet
where c.StateCode != 1
orderby c.EMailAddress1, c.CreatedOn descending
select new {
c.FirstName,
c.LastName,
c.EMailAddress1,
c.ContactId,
c.CreatedOn }).ToList();
Based on your previous query:
var duplicatedEmails = (from c in contacts
group c by c.EMailAddress1 into g
where g.Count() > 1
select g.Key).ToList();
var duplicatedContacts = contacts.Where(c => duplicatedEmails.Contains(c.EMailAddress1));
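As a variation on the same idea, the duplicates can also be collected in one pass by flattening the groups that have more than one member. This is a sketch on in-memory data; the `contacts` array and its addresses are made up, and it uses `SelectMany` instead of the `Contains` lookup above.

```csharp
using System;
using System.Linq;

// Hypothetical already-materialized contact list (as after the ToList() in the question).
var contacts = new[]
{
    new { ContactId = 1, EMailAddress1 = "a@example.com" },
    new { ContactId = 2, EMailAddress1 = "b@example.com" },
    new { ContactId = 3, EMailAddress1 = "a@example.com" },
    new { ContactId = 4, EMailAddress1 = "c@example.com" },
    new { ContactId = 5, EMailAddress1 = "b@example.com" },
};

// Group by e-mail, keep groups with more than one contact, then flatten them again.
// The result stays grouped by address because GroupBy keeps each group's members together.
var duplicatedContacts = contacts
    .GroupBy(c => c.EMailAddress1)
    .Where(g => g.Count() > 1)
    .SelectMany(g => g)
    .ToList();

foreach (var contact in duplicatedContacts)
    Console.WriteLine($"{contact.EMailAddress1} ({contact.ContactId})");
// prints:
// a@example.com (1)
// a@example.com (3)
// b@example.com (2)
// b@example.com (5)
```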
I'm having trouble getting my LINQ statement to work when doing an outer join and a group by. Here's a SQL version of what I'm trying to accomplish:
select p.PRIMARY_KEY, min(p.EFFECTIVE_DATE), sum(IsNull(c.PAID_INDEMNITY, 0))
from PRMPOLCY p
left outer join CLMMAST c on p.PRIMARY_KEY = c.POLICY_NO
where p.UNDERWRITER_UID = 93
GROUP BY p.PRIMARY_KEY
Here's what I have in Linq (which doesn't work):
var result = from p in context.PRMPOLCies
join c in context.CLMMASTs on p.PRIMARY_KEY equals c.POLICY_NO into polClm
where (p.UNDERWRITER_UID == underwriter)
from grp in polClm.DefaultIfEmpty()
group grp by p.PRIMARY_KEY into g
select new PolicySummation()
{
PolicyNo = g.Key,
Incurred = g.Sum(grp => grp.PAID_INDEMNITY ),
EffDate = g.Min(grp => grp.PRMPOLCY.EFFECTIVE_DATE)
};
I'm beating my head against the wall trying to figure this out!
Assuming you have a navigation property set up between PRMPOLCY and CLMMAST, you shouldn't need to specify the join explicitly. Most queries are much easier to express in LINQ without explicit joins, by treating your structures as a hierarchy instead. I don't know the specifics of your model's property names, but I'd guess that something like this would work.
var result =
from p in context.PRMPOLCies
where (p.UNDERWRITER_UID == underwriter)
select new PolicySummation {
PolicyNo = p.PRIMARY_KEY,
Incurred = p.CLMMASTs.Select(c => c.PAID_INDEMNITY).DefaultIfEmpty().Sum(),
EffDate = p.EFFECTIVE_DATE,
};
You need to include both your tables in the group clause like this:
group new { p, grp } by p.PRIMARY_KEY into g
Then, in your Sum and Min calls, reach through the grouped pair:
g.Sum(x => x.grp == null ? 0 : x.grp.PAID_INDEMNITY)
g.Min(x => x.p.EFFECTIVE_DATE)
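The whole left-join-then-aggregate pattern can be checked on in-memory data. This is a sketch with LINQ to Objects; the `policies` and `claims` arrays are invented sample data, and the result is an anonymous type rather than the question's PolicySummation class.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for PRMPOLCY and CLMMAST.
var policies = new[]
{
    new { PRIMARY_KEY = 1, EFFECTIVE_DATE = new DateTime(2020, 1, 1) },
    new { PRIMARY_KEY = 2, EFFECTIVE_DATE = new DateTime(2021, 6, 1) },
};
var claims = new[]
{
    new { POLICY_NO = 1, PAID_INDEMNITY = 100m },
    new { POLICY_NO = 1, PAID_INDEMNITY = 50m },
};

var summary = (from p in policies
               join c in claims on p.PRIMARY_KEY equals c.POLICY_NO into polClm
               from c in polClm.DefaultIfEmpty()   // c is null for policies without claims
               group new { p, c } by p.PRIMARY_KEY into g
               select new
               {
                   PolicyNo = g.Key,
                   // Treat a missing claim as 0, like SUM(ISNULL(c.PAID_INDEMNITY, 0)).
                   Incurred = g.Sum(x => x.c == null ? 0m : x.c.PAID_INDEMNITY),
                   EffDate = g.Min(x => x.p.EFFECTIVE_DATE)
               }).ToList();

foreach (var row in summary)
    Console.WriteLine($"{row.PolicyNo}: {row.Incurred} {row.EffDate:yyyy-MM-dd}");
// prints:
// 1: 150 2020-01-01
// 2: 0 2021-06-01
```

Both range variables have to be carried into the group (here as the pair `new { p, c }`), because only the grouped element is visible after `into g`.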
I am using BLToolKit in a project of mine and am trying to get this to work. I want to average a set of temperature readings down to the minute, but the generated SELECT statement groups by the minute and then also selects, and groups by, the original time. I think I am writing the LINQ expression correctly, but I am not getting the results I expect. (This is C#, if you care.) Does anyone know what is going wrong?
var test = (from r in db.SensorReadingRaws
where r.TimeLogged < DateTime.Now.AddMinutes(-2)
group r by new
{
Sensor = r.SensorNumber,
//group time down to the minute
Time = r.TimeLogged.AddSeconds(-1 * r.TimeLogged.Second).AddMilliseconds(-1 * r.TimeLogged.Millisecond)
} into grouped
select new SensorReading
{
SensorNumber = grouped.Key.Sensor,
TimeLogged = grouped.Key.Time,
Reading = (int)grouped.Average(x => x.Reading)
}).ToList();
textBox1.Text = db.LastQuery;
and the resulting query is this
SELECT
[r].[SensorNumber],
[r].[TimeLogged],
Avg([r].[Reading]) as [c1]
FROM
[SensorReadingRaw] [r]
WHERE
[r].[TimeLogged] < @p1
GROUP BY
[r].[SensorNumber],
DateAdd(Millisecond, Convert(Float, -DatePart(Millisecond, [r].[TimeLogged])), DateAdd(Second, Convert(Float, -DatePart(Second, [r].[TimeLogged])), [r].[TimeLogged])),
[r].[TimeLogged]
I discovered that
BLToolkit.Data.Linq.Sql.AsSql<T>(T obj)
can be used as a workaround for this case.
If you apply this function to the grouped key properties in the select clause, the generated SQL no longer groups by or selects the original field.
It may look something like:
_queryStore.Leads().
GroupBy(x => new {
x.LeadDate.Hour,
x.LeadDate.Minute
}).
Select(x => new {
Hour = Sql.AsSql(x.Key.Hour),
Minute = Sql.AsSql(x.Key.Minute),
Count = x.Count()
});
and in your particular case:
var test = (from r in db.SensorReadingRaws
where r.TimeLogged < DateTime.Now.AddMinutes(-2)
group r by new
{
Sensor = r.SensorNumber,
//group time down to the minute
Time = r.TimeLogged.AddSeconds(-1 * r.TimeLogged.Second).AddMilliseconds(-1 * r.TimeLogged.Millisecond)
} into grouped
select new SensorReading
{
SensorNumber = grouped.Key.Sensor,
TimeLogged = Sql.AsSql(grouped.Key.Time),
Reading = (int)grouped.Average(x => x.Reading)
}).ToList();
I ran into the same issue yesterday.
Today I found a workaround. The idea is to write two LINQ queries: the first transforms the data and the second groups the result:
var bandAndDate =
(from r in repo.Entities
select new {Band = r.Score / 33, r.StartTime.Date});
var examsByBandAndDay =
(from r in bandAndDate
group r by new {r.Band, r.Date } into g
select new { g.Key.Date, g.Key.Band, Count = g.Count() }).ToList();
Together, these two queries run as a single SQL statement that does the job:
SELECT
[t1].[c1] as [c11],
[t1].[c2] as [c21],
Count(*) as [c3]
FROM
(
SELECT
[r].[Score] / 33 as [c2],
Cast(Floor(Cast([r].[StartTime] as Float)) as DateTime) as [c1]
FROM
[Results] [r]
) [t1]
GROUP BY
[t1].[c2],
[t1].[c1]
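The project-first, group-second shape can be applied to the per-minute averaging from the question. The sketch below runs on in-memory data with LINQ to Objects only; the `readings` array is invented, and whether BLToolkit translates the `new DateTime(...)` truncation to SQL is an assumption that has not been checked here.

```csharp
using System;
using System.Linq;

// Hypothetical sample readings standing in for SensorReadingRaws.
var readings = new[]
{
    new { SensorNumber = 1, TimeLogged = new DateTime(2024, 1, 1, 10, 0, 5),  Reading = 20 },
    new { SensorNumber = 1, TimeLogged = new DateTime(2024, 1, 1, 10, 0, 45), Reading = 22 },
    new { SensorNumber = 1, TimeLogged = new DateTime(2024, 1, 1, 10, 1, 10), Reading = 30 },
};

// Query 1: transform the data, truncating each timestamp to the minute.
var perMinute = readings.Select(r => new
{
    r.SensorNumber,
    Minute = new DateTime(r.TimeLogged.Year, r.TimeLogged.Month, r.TimeLogged.Day,
                          r.TimeLogged.Hour, r.TimeLogged.Minute, 0),
    r.Reading
});

// Query 2: group the transformed rows and average each group.
var averaged = perMinute
    .GroupBy(r => new { r.SensorNumber, r.Minute })
    .Select(g => new
    {
        g.Key.SensorNumber,
        TimeLogged = g.Key.Minute,
        Reading = (int)g.Average(x => x.Reading)
    })
    .ToList();

foreach (var row in averaged)
    Console.WriteLine($"{row.SensorNumber} {row.TimeLogged:HH:mm} {row.Reading}");
// prints:
// 1 10:00 21
// 1 10:01 30
```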
I am trying to figure out how to go about writing a linq query to perform an aggregate like the sql query below:
select d.ID, d.FIRST_NAME, d.LAST_NAME, count(s.id) as design_count
from tbldesigner d inner join
TBLDESIGN s on d.ID = s.DESIGNER_ID
where s.COMPLETED = 1 and d.ACTIVE = 1
group by d.ID, d.FIRST_NAME, d.LAST_NAME
Having COUNT(s.id) > 0
If this is even possible with a LINQ query, could somebody please provide me with an example?
Thanks in Advance,
Billy
A more direct translation of your original SQL query would look like this:
var q =
// Join tables TblDesign with TblDesigner and filter them
from d in db.TblDesigner
join s in db.TblDesign on d.ID equals s.DesignerID
where s.Completed && d.Active
// Key and values used for grouping (note, you don't really need the
// value here, because you only need Count of the values in a group, but
// in case you needed anything from 's' or 'd' in 'select', you'd write this
let value = new { s, d }
let key = new { d.ID, d.FirstName, d.LastName }
group value by key into g
// Now, filter the created groups (return only non-empty) and select
// information for every group
where g.Count() > 0
select new { ID = g.Key.ID, FirstName = g.Key.FirstName,
LastName = g.Key.LastName, Count = g.Count() };
The HAVING clause is translated to an ordinary where that is applied after grouping the values with group ... by. The result of grouping is a collection of groups (each of which is itself a collection), so you can use where to filter the groups. In the select clause, you can then return information from the key (used for grouping) and an aggregate of the values (using g.Count()).
EDIT: As mmcteam points out (see comments), the where g.Count() > 0 clause is not necessary, because this is already guaranteed by the join. I'll leave it there, because it shows how to translate a HAVING clause in general, so it may be helpful in other cases.
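To see the where-after-group filter actually remove something, here is a sketch on in-memory data with LINQ to Objects. The `designers` and `designs` arrays are invented, and the threshold is raised to more than one design (an assumption made for illustration) so that the HAVING-style filter is not redundant.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for tbldesigner and TBLDESIGN.
var designers = new[]
{
    new { ID = 1, FirstName = "Ann", LastName = "Lee", Active = true },
    new { ID = 2, FirstName = "Bob", LastName = "Kim", Active = true },
    new { ID = 3, FirstName = "Cy",  LastName = "Ng",  Active = false },
};
var designs = new[]
{
    new { Id = 100, DesignerID = 1, Completed = true },
    new { Id = 101, DesignerID = 1, Completed = true },
    new { Id = 102, DesignerID = 2, Completed = true },
    new { Id = 103, DesignerID = 3, Completed = true },
};

var q = (from d in designers
         join s in designs on d.ID equals s.DesignerID
         where s.Completed && d.Active                      // WHERE: filters rows before grouping
         group s by new { d.ID, d.FirstName, d.LastName } into g
         where g.Count() > 1                                // HAVING COUNT(s.id) > 1: filters groups
         select new
         {
             g.Key.ID,
             g.Key.FirstName,
             g.Key.LastName,
             DesignCount = g.Count()
         }).ToList();

foreach (var row in q)
    Console.WriteLine($"{row.ID} {row.FirstName} {row.LastName}: {row.DesignCount}");
// prints: 1 Ann Lee: 2
```

Bob is dropped by the second where (one completed design), while Cy never reaches the grouping because the first where removes inactive designers.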
Here's how I'd do it. Please note that I'm accustomed to LINQ to SQL and don't know whether the query would differ in LINQ to Entities.
var query =
from d in myObjectContext.tbldesigner
where d.ACTIVE == 1
let manys =
from s in d.tbldesign
where s.COMPLETED == 1
select s
where manys.Count() > 0
select new
{
d.ID, d.FIRST_NAME, d.LAST_NAME,
DesignCount = manys.Count()
};
Ignoring the s.id, which is confusing me (see my comment on the question), this is a simple query that would generate a HAVING clause. Of course, it's a worthless example here, since the count will always be more than 0.
Anyway, if you are using LINQ to Entities, you should use the entity mapping to access the foreign key relationships instead of manually doing a join or a subquery.
var results = from d in db.tbldesigner
from s in d.TBLDESIGN // navigation property; the name is a guess
where s.COMPLETED && d.ACTIVE
group s by new {d.ID, d.FIRST_NAME, d.LAST_NAME} into g
where g.Count() > 0
select new {
g.Key.ID, g.Key.FIRST_NAME, g.Key.LAST_NAME,
Count = g.Count()
};
NOTE: This is untested (and uncompiled) so there might be some issues, but this is where I would start.