Left join with grouping and ordering - linq

Musicians write songs. Songs are played on the air.
I have database tables Musicians, Songs and AirTimes. The AirTimes table entries hold information on which song was played on which date and for how many minutes.
I have classes Musician, Song and AirTime that correspond to the tables. The classes have navigation properties that point to the related entities; the arrows below represent navigation.
Musician <--> Song <--> AirTime
From the database, I have to retrieve all the Musicians and the dates on which his/her songs got airtime. I also want to show the number of songs played on a particular date and the number of minutes played on that date.
In Microsoft SQL, I would do it as follows:
select
dbo.Musicians.LastName
, dbo.AirTimes.PlayDate
, count(dbo.AirTimes.PlayDate) as 'No. of entries'
, sum(dbo.AirTimes.Duration) as 'No. of minutes'
from dbo.Musicians
left outer join dbo.Songs
on dbo.Musicians.MusicianId = dbo.Songs.MusicianId
left outer join dbo.AirTimes
on dbo.Songs.SongId = dbo.AirTimes.SongId
and '2014-07-01T00:00:00' <= dbo.AirTimes.PlayDate
and dbo.AirTimes.PlayDate <= '2014-07-31T00:00:00'
group by
dbo.Musicians.LastName
, dbo.AirTimes.PlayDate
order by
dbo.Musicians.LastName
, dbo.AirTimes.PlayDate
Can anybody “translate” this into LINQ to Entities?
Update Aug. 9, 2012
I was unable to confirm that grudolf's solutions do what I wanted; I accomplished it with a different technique. Nonetheless, I accept his/her answer.

As you have navigation properties in both directions, you can start either from AirTimes:
var grpTime = (
from a in AirTimes
where a.Date >= firstDate && a.Date < lastDate
group a by new {a.Song.Musician.LastName, a.Song.Title, a.Date} into grp
select new {
grp.Key.LastName,
grp.Key.Title,
grp.Key.Date,
Plays = grp.Count(),
Seconds = grp.Sum(x => x.Duration)
}
);
or from Musicians:
var grpMus = (
from m in Musicians
from s in m.Songs
from p in s.Plays
where p.Date >= firstDate && p.Date < lastDate
group p by new {m.LastName, s.Title, p.Date} into grp
select new {
grp.Key.LastName,
grp.Key.Title,
grp.Key.Date,
Plays = grp.Count(),
Seconds= grp.Sum(x => x.Duration)
}
);
EDIT:
To display all musicians, including those without airtime, you can use another level of grouping: in the first step you calculate totals per musician and day, and then you join them to the list of all musicians. It could probably be done directly in the database, but I didn't manage to find an efficient way to do it. Yet. ;) In code, the original AirTimes query is changed to return the Musician instead of his last name, and the result is then left-joined to the list of all musicians:
//Airtimes for musicians
var grpAir = (
from a in AirTimes
where a.Date >= firstDate && a.Date < lastDate
group a by new {a.Song.Musician, a.Date} into grp
select new {
//Musician instead of his LastName for joining. Id would work too
grp.Key.Musician,
//grp.Key.Musician.LastName,
Date=grp.Key.Date,
Plays = grp.Count(),
Secs = grp.Sum(x => x.Duration)
}
);
var res = (
from m in Musicians
join g in grpAir on m equals g.Musician into g2
from g in g2.DefaultIfEmpty()
orderby m.LastName
select new {
m.LastName,
Date = (g==null ? (DateTime?)null : g.Date),
Plays = (g==null ? 0 : g.Plays),
Secs = (g==null ? 0 : g.Secs)
}
);
You can find a more complete LINQPad sample at https://gist.github.com/3236238
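To see the two-step idea (totals per musician and day, then a left join to all musicians) without a database, here is a minimal LINQ to Objects sketch. The tuples `musicians`, `songs` and `airTimes`, their sample rows and the July 2014 range are hypothetical stand-ins for the tables in the question; the sketch joins on ids rather than navigation properties and orders by last name and then date, like the SQL `order by`. It says nothing about the SQL an ORM would generate.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the Musicians, Songs and AirTimes tables.
var musicians = new[] { (Id: 1, LastName: "Adams"), (Id: 2, LastName: "Baker") };
var songs = new[] { (Id: 10, MusicianId: 1), (Id: 11, MusicianId: 1), (Id: 12, MusicianId: 2) };
var airTimes = new[]
{
    (SongId: 10, PlayDate: new DateTime(2014, 7, 2), Duration: 3),
    (SongId: 11, PlayDate: new DateTime(2014, 7, 2), Duration: 4),
    (SongId: 10, PlayDate: new DateTime(2014, 7, 5), Duration: 3),
    (SongId: 12, PlayDate: new DateTime(2014, 6, 30), Duration: 5), // outside the range
};
var firstDate = new DateTime(2014, 7, 1);
var lastDate = new DateTime(2014, 8, 1); // exclusive upper bound

// Step 1: totals per musician and day, restricted to the date range.
var grpAir =
    from a in airTimes
    where a.PlayDate >= firstDate && a.PlayDate < lastDate
    join s in songs on a.SongId equals s.Id
    group a by new { s.MusicianId, a.PlayDate } into grp
    select new
    {
        grp.Key.MusicianId,
        grp.Key.PlayDate,
        Plays = grp.Count(),
        Minutes = grp.Sum(x => x.Duration)
    };

// Step 2: left join to all musicians so those without airtime still appear.
var res =
    (from m in musicians
     join t in grpAir on m.Id equals t.MusicianId into g2
     from g in g2.DefaultIfEmpty()
     orderby m.LastName, g?.PlayDate
     select (m.LastName,
             Date: g?.PlayDate,
             Plays: g?.Plays ?? 0,
             Minutes: g?.Minutes ?? 0)).ToList();

foreach (var r in res)
    Console.WriteLine($"{r.LastName} {r.Date:yyyy-MM-dd} {r.Plays} {r.Minutes}");
```

Baker's only play falls outside the range, so he appears once with a null date and zero totals, which mirrors putting the date filter in the `on` clause of the `left outer join`.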

Related

How to make zero counts show in LINQ query when getting daily counts?

I have a database table with a datetime column and I simply want to count how many records there are per day, going back 3 months. I am currently using this query:
var minDate = DateTime.Now.AddMonths(-3);
var stats = from t in TestStats
where t.Date > minDate
group t by EntityFunctions.TruncateTime(t.Date) into g
orderby g.Key
select new
{
date = g.Key,
count = g.Count()
};
That works fine, but the problem is that if there are no records for a day then that day is not in the results at all. For example:
3/21/2008 = 5
3/22/2008 = 2
3/24/2008 = 7
In that short example I want to make 3/23/2008 = 0. In the real query all zeros should show between 3 months ago and today.
Fabricating missing data is not straightforward in SQL. I would recommend getting the data that is in SQL, then joining it to an in-memory list of all relevant dates:
var stats = (from t in TestStats
where t.Date > minDate
group t by EntityFunctions.TruncateTime(t.Date) into g
orderby g.Key
select new
{
date = g.Key, // TruncateTime returns DateTime?
count = g.Count()
}).ToList(); // hydrate so we only query the DB once
var firstDate = stats.Min(s => s.date).Value;
var lastDate = stats.Max(s => s.date).Value;
// every day from firstDate to lastDate inclusive
var allDates = Enumerable.Range(0, (lastDate - firstDate).Days + 1)
.Select(i => firstDate.AddDays(i));
stats = (from d in allDates
join s in stats
on (DateTime?)d equals s.date into dates
from ds in dates.DefaultIfEmpty()
select new {
date = (DateTime?)d, // same anonymous type as above, so stats can be reassigned
count = ds == null ? 0 : ds.count
}).ToList();
You could also get a list of dates not in the data and concatenate them.
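The alternative in the last sentence (find the missing dates and concatenate them) can be sketched in memory as follows. The tuple list is a stand-in for the hydrated `stats` results and reuses the sample counts from the question; `Except` and `Concat` are standard LINQ operators, and the final `OrderBy` puts the zero rows back in date order.

```csharp
using System;
using System.Linq;

// Sample daily counts from the question, standing in for the hydrated query results.
var stats = new[]
{
    (date: new DateTime(2008, 3, 21), count: 5),
    (date: new DateTime(2008, 3, 22), count: 2),
    (date: new DateTime(2008, 3, 24), count: 7),
}.ToList();

var firstDate = stats.Min(s => s.date);
var lastDate = stats.Max(s => s.date);

// Every day from firstDate to lastDate inclusive.
var allDates = Enumerable.Range(0, (lastDate - firstDate).Days + 1)
                         .Select(i => firstDate.AddDays(i));

// Days with no rows get a zero count; concatenate and restore date order.
var missing = allDates.Except(stats.Select(s => s.date))
                      .Select(d => (date: d, count: 0));
var filled = stats.Concat(missing).OrderBy(s => s.date).ToList();

foreach (var row in filled)
    Console.WriteLine($"{row.date:yyyy-MM-dd} = {row.count}");
```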
I agree with @D Stanley's answer but want to throw an additional consideration into the mix. What are you doing with this data? Is it getting processed by the caller? Is it rendered in a UI? Is it getting transferred over a network?
Consider the size of the data. Why do you need to have the gaps filled in? If the data is known to be returned over a network, for instance, I'd advise against filling in the gaps. All you're doing is increasing the data size, and that data has to be serialised, transferred, then deserialised.
If you are going to loop over the data to render it in a UI, then why do you need the gaps? Why not loop from the min date to the max date (like D Stanley's join) and substitute a default when no value is found?
If you ARE transferring over a network and you still NEED a single collection, consider applying D Stanley's resolution on the other side of the wire.
Just things to consider...
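The render-time alternative (walk from the min date to the max date and substitute a default) might look like the sketch below. The `counts` dictionary is a hypothetical stand-in for whatever sparse results reach the client; only the consumer fills the gaps, so nothing extra is serialised or sent over the wire.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sparse results as received by the client: there is no entry for 3/23.
var counts = new Dictionary<DateTime, int>
{
    [new DateTime(2008, 3, 21)] = 5,
    [new DateTime(2008, 3, 22)] = 2,
    [new DateTime(2008, 3, 24)] = 7,
};

var rows = new List<(DateTime Day, int Count)>();
var last = counts.Keys.Max();

// Walk every day in the range and fall back to 0 where nothing was sent.
for (var day = counts.Keys.Min(); day <= last; day = day.AddDays(1))
{
    rows.Add((day, counts.TryGetValue(day, out var c) ? c : 0));
}

foreach (var (d, n) in rows)
    Console.WriteLine($"{d:yyyy-MM-dd} = {n}");
```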

Is there a faster way to work with a nested LINQ query?

I am trying to query a table with a nested LINQ query. My query works, but it is too slow. I have almost 400k rows, and the query takes 10 seconds for 1000 rows. For 400k rows I think it would take about 2 hours.
I have rows like this:
StudentNumber - DepartmentID
n100 - 1
n100 - 1
n105 - 1
n105 - 2
n107 - 1
I want the students who have more than one distinct department ID. My results look like this:
StudentID - List
n105 - 1 2
My query produces this, but slowly.
var sorgu = (from yok in YOKAktarim
group yok by yok.StudentID into g
select new {
g.Key,
liste=(from birim in YOKAktarim where birim.StudentID == g.Key select new { birim.DepartmentID }).ToList().GroupBy (x => x.DepartmentID).Count()>1 ? (from birim in YOKAktarim where birim.StudentID == g.Key select new { birim.DepartmentID }).GroupBy(x => x.DepartmentID).Select(x => x.Key).ToList() : null,
}).Take(1000).ToList();
Console.WriteLine(sorgu.Where (s => s.liste != null).OrderBy (s => s.Key));
I wrote this query as a LINQPad C# Statement(s) query.
For 400K records you should be able to return the student ids and department ids into an in-memory list.
var list1 = (from r in YOKAktarim
group r by new { r.StudentID, r.DepartmentID} into g
select g.Key
).ToList();
Once you have this list, you should be able to group by StudentID and select those students who have more than one record.
var list2 = (from r in list1 group r by r.StudentID into g
where g.Count() > 1
select new
{
StudentID = g.Key,
Departments = g.Select(a => a.DepartmentID).ToList()
}
).ToList();
This should be faster as it only hits the sql database once, rather than hundreds of thousands of times.
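The two steps can be tried without a database using the sample rows from the question. In this sketch an array of tuples is a hypothetical stand-in for `YOKAktarim`, so both steps run in memory; against a real database only `list1` would be fetched with SQL.

```csharp
using System;
using System.Linq;

// Sample rows from the question (StudentID, DepartmentID), standing in for YOKAktarim.
var rows = new[]
{
    (StudentID: "n100", DepartmentID: 1),
    (StudentID: "n100", DepartmentID: 1),
    (StudentID: "n105", DepartmentID: 1),
    (StudentID: "n105", DepartmentID: 2),
    (StudentID: "n107", DepartmentID: 1),
};

// Step 1: distinct (student, department) pairs; this is the single database round trip.
var list1 = (from r in rows
             group r by new { r.StudentID, r.DepartmentID } into g
             select g.Key).ToList();

// Step 2: in memory, keep students that appear with more than one department.
var list2 = (from r in list1
             group r by r.StudentID into g
             where g.Count() > 1
             select new
             {
                 StudentID = g.Key,
                 Departments = g.Select(a => a.DepartmentID).ToList()
             }).ToList();

foreach (var s in list2)
    Console.WriteLine($"{s.StudentID} - {string.Join(" ", s.Departments)}");
```

Because `list1` already holds unique pairs, `g.Count() > 1` in the second step means "more than one distinct department", which matches the `n105 - 1 2` result the question expects.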
You're iterating your source collection (YOKAktarim) once for the grouping and then twice more for every group, which makes your query roughly O(n^2). It's going to be slow.
Instead of going back to the source collection to get the contents of each group, you can simply iterate over g.
var sorgu = (from yok in YOKAktarim
group yok by yok.StudentID into g
select new {
g.Key,
liste = (from birim in g select new { birim.DepartmentID }).ToList().GroupBy(x => x.DepartmentID).Count() > 1
? (from birim in g select new { birim.DepartmentID }).GroupBy(x => x.DepartmentID).Select(x => x.Key).ToList()
: null,
}).Take(1000).ToList();
However, that's still not optimal, because you're doing a lot of redundant subgrouping. Your query is pretty much equivalent to:
var sorgu = (from yok in YOKAktarim
group yok by yok.StudentID into g
let departments = g.Select(x => x.DepartmentID).Distinct().ToList()
where departments.Count() > 1
select new {
g.Key,
liste = departments
}).Take(1000).ToList();
I can't speak for the correctness of that monster, but simply removing all ToList() calls except the outermost one will fix your issue.

Speed up LINQ query - EF5

I have the following LINQ query, using EF5 with the generic repository and unit of work patterns, against a SQL Server 2008 database:
var countriesArr = GetIdsFromDelimStr(countries);
var competitionsArr = GetIdsFromDelimStr(competitions);
var filterTeamName = string.Empty;
if (teamName != null)
{
filterTeamName = teamName.ToUpper();
}
using (var unitOfWork = new FootballUnitOfWork(ConnFooty))
{
// give us our selection of teams
var teams =
(from team in
unitOfWork.TeamRepository.Find()
where ((string.IsNullOrEmpty(filterTeamName) || team.Name.ToUpper().Contains(filterTeamName)) &&
(countriesArr.Contains(team.Venue.Country.Id) || countriesArr.Count() == 0))
select new
{
tId = team.Id
}).Distinct();
// give us our selection of contests
var conts = (
from cont in
unitOfWork.ContestRepository.Find(
c =>
((c.ContestType == ContestType.League && competitionsArr.Count() == 0) ||
(competitionsArr.Contains(c.Competition.Id) && competitionsArr.Count() > 0)))
select new
{
contId = cont.Id
}
).Distinct();
// get selection of home teams based on contest
var homecomps = (from fixt in unitOfWork.FixtureDetailsRepository.Find()
where
teams.Any(t => t.tId == fixt.HomeTeam.Id) &&
conts.Any(c => c.contId == fixt.Contest.Id)
select new
{
teamId = fixt.HomeTeam.Id,
teamName = fixt.HomeTeam.Name,
countryId = fixt.HomeTeam.Venue.Country.Id != null ? fixt.HomeTeam.Venue.Country.Id : 0,
countryName = fixt.HomeTeam.Venue.Country.Id != null ? fixt.HomeTeam.Venue.Country.Name : string.Empty,
compId = fixt.Contest.Competition.Id,
compDesc = fixt.Contest.Competition.Description
}).Distinct();
// get selection of away teams based on contest
var awaycomps = (from fixt in unitOfWork.FixtureDetailsRepository.Find()
where
teams.Any(t => t.tId == fixt.AwayTeam.Id) &&
conts.Any(c => c.contId == fixt.Contest.Id)
select new
{
teamId = fixt.AwayTeam.Id,
teamName = fixt.AwayTeam.Name,
countryId = fixt.AwayTeam.Venue.Country.Id != null ? fixt.AwayTeam.Venue.Country.Id : 0,
countryName = fixt.AwayTeam.Venue.Country.Id != null ? fixt.AwayTeam.Venue.Country.Name : string.Empty,
compId = fixt.Contest.Competition.Id,
compDesc = fixt.Contest.Competition.Description
}).Distinct();
// ensure that we return the max competition based on id for home teams
var homemax = (from t in homecomps
group t by t.teamId
into grp
let maxcomp = grp.Max(g => g.compId)
from g in grp
where g.compId == maxcomp
select g).Distinct();
// ensure that we return the max competition based on id for away teams
var awaymax = (from t in awaycomps
group t by t.teamId
into grp
let maxcomp = grp.Max(g => g.compId)
from g in grp
where g.compId == maxcomp
select g).Distinct();
var filteredteams = homemax.Union(awaymax).OrderBy(t => t.teamName).AsQueryable();
As you can see, we want to return the following format, which is passed across to a WebAPI, so we cast the results to types we can relate to in the UI.
Essentially, what we are trying to do is get the home and away teams from a fixture; these fixtures have a contest, which relates to a competition. We then get the highest competition id from the grouping, and this is returned with that team. The country is related to the team based on the venue id. When I was originally doing this I had problems figuring out how to do OR joins in LINQ, which is why I split it into getting the home teams and the away teams, grouping them by competition and then unioning them together.
To give an idea of current table sizes: fixtures has 7840 rows, teams has 8581 rows, contests has 337 rows and competitions has 96 rows. The table that is likely to grow rapidly is the fixtures table, as this is related to football.
The output we want to end up with is:
Team Id, Team Name, Country Id, Country Name, Competition Id, Competition Name
With no filtering, this query takes around 5 seconds on average. Does anybody have any ideas or pointers on how to make it quicker?
Thanks in advance, Mark
I can't judge whether it will speed things up, but your homemax and awaymax queries could be written as:
var homemax = from t in homecomps
group t by t.teamId into grp
select grp.OrderByDescending(x => x.compId).FirstOrDefault();
var awaymax = from t in awaycomps
group t by t.teamId into grp
select grp.OrderByDescending(x => x.compId).FirstOrDefault();
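Here is an in-memory illustration of the "first row per group after ordering" pattern used in the `homemax`/`awaymax` rewrite. The `homecomps` tuples and their values are hypothetical; only `teamId`, `teamName` and `compId` from the question's projection are kept.

```csharp
using System;
using System.Linq;

// Hypothetical rows shaped like homecomps: a team can appear with several competitions.
var homecomps = new[]
{
    (teamId: 1, teamName: "Rovers", compId: 3),
    (teamId: 1, teamName: "Rovers", compId: 7),
    (teamId: 2, teamName: "United", compId: 5),
};

// One row per team: the row with the highest compId.
var homemax = (from t in homecomps
               group t by t.teamId into grp
               select grp.OrderByDescending(x => x.compId).FirstOrDefault()).ToList();

foreach (var t in homemax)
    Console.WriteLine($"{t.teamId} {t.teamName} {t.compId}");
```

One difference from the original `homemax`: if two rows for a team tie on the highest `compId` but differ in another column, the original keeps both, whereas `FirstOrDefault()` keeps just one.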
Further, as you are composing one very large query, it may perform better if you cut it up into a few smaller queries that fetch intermediate results. Sometimes a few more round trips to the database perform better than one very large query for which the database engine can't find a good execution plan.
Another thing is all these Distinct()s. Do you always need them? I think you can do without because you are always fetching data from one table without joining a child collection. Removing them may save a bunch.
Yet another optimization could be to remove the ToUpper. The comparison is done by the database engine in SQL and chances are that the database has a case-insensitive collation. If so, the comparison is never case sensitive even if you'd want it to be! Constructs like Name.ToUpper cancel the use of any index on Name (it is not sargable).

BLToolkit, LINQ Query, SQL Not What I Expected

I am using BLToolkit in a project of mine and I was trying to get this to work. What I don't like is this: I am trying to average a bunch of temperatures down to the minute, but the generated SELECT statement groups by the minute and then also groups by and selects the original time. I think I am writing the LINQ expression correctly (but then again, I am not getting the results I expect). (This is C#, if you care.) Does anyone know what is going wrong?
var test = (from r in db.SensorReadingRaws
where r.TimeLogged < DateTime.Now.AddMinutes(-2)
group r by new
{
Sensor = r.SensorNumber,
//group time down to the minute
Time = r.TimeLogged.AddSeconds(-1 * r.TimeLogged.Second).AddMilliseconds(-1 * r.TimeLogged.Millisecond)
} into grouped
select new SensorReading
{
SensorNumber = grouped.Key.Sensor,
TimeLogged = grouped.Key.Time,
Reading = (int)grouped.Average(x => x.Reading)
}).ToList();
textBox1.Text = db.LastQuery;
and the resulting query is this
SELECT
[r].[SensorNumber],
[r].[TimeLogged],
Avg([r].[Reading]) as [c1]
FROM
[SensorReadingRaw] [r]
WHERE
[r].[TimeLogged] < #p1
GROUP BY
[r].[SensorNumber],
DateAdd(Millisecond, Convert(Float, -DatePart(Millisecond, [r].[TimeLogged])), DateAdd(Second, Convert(Float, -DatePart(Second, [r].[TimeLogged])), [r].[TimeLogged])),
[r].[TimeLogged]
I discovered that
BLToolkit.Data.Linq.Sql.AsSql<T>(T obj)
can be used as a workaround for this case.
When you apply this function to the required grouped key properties in the select statement, the original field is no longer grouped by or selected.
It may look something like:
_queryStore.Leads().
GroupBy(x => new {
x.LeadDate.Hour,
x.LeadDate.Minute
}).
Select(x => new {
Hour = Sql.AsSql(x.Key.Hour),
Minute = Sql.AsSql(x.Key.Minute),
Count = x.Count()
});
and in your particular case:
var test = (from r in db.SensorReadingRaws
where r.TimeLogged < DateTime.Now.AddMinutes(-2)
group r by new
{
Sensor = r.SensorNumber,
//group time down to the minute
Time = r.TimeLogged.AddSeconds(-1 * r.TimeLogged.Second).AddMilliseconds(-1 * r.TimeLogged.Millisecond)
} into grouped
select new SensorReading
{
SensorNumber = grouped.Key.Sensor,
TimeLogged = Sql.AsSql(grouped.Key.Time),
Reading = (int)grouped.Average(x => x.Reading)
}).ToList();
I had the same issue yesterday.
Today I found a workaround. The idea is to write two LINQ queries: the first transforms the data and the second groups the result:
var bandAndDate =
(from r in repo.Entities
select new {Band = r.Score / 33, r.StartTime.Date});
var examsByBandAndDay =
(from r in bandAndDate
group r by new {r.Band, r.Date } into g
select new { g.Key.Date, g.Key.Band, Count = g.Count() }).ToList();
Together, these two queries run as a single SQL statement that does the job:
SELECT
[t1].[c1] as [c11],
[t1].[c2] as [c21],
Count(*) as [c3]
FROM
(
SELECT
[r].[Score] / 33 as [c2],
Cast(Floor(Cast([r].[StartTime] as Float)) as DateTime) as [c1]
FROM
[Results] [r]
) [t1]
GROUP BY
[t1].[c2],
[t1].[c1]
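The same "project first, then group" shape can be run in memory to check the per-minute averages the question is after. This is LINQ to Objects, so it says nothing about the SQL BLToolkit generates; the readings are hypothetical, and truncating with `Ticks % TimeSpan.TicksPerMinute` is swapped in for the question's `AddSeconds`/`AddMilliseconds` arithmetic because it also clears any sub-millisecond ticks.

```csharp
using System;
using System.Linq;

// Hypothetical raw readings: (SensorNumber, TimeLogged, Reading).
var raw = new[]
{
    (SensorNumber: 1, TimeLogged: new DateTime(2012, 5, 1, 10, 0, 5), Reading: 20),
    (SensorNumber: 1, TimeLogged: new DateTime(2012, 5, 1, 10, 0, 40), Reading: 22),
    (SensorNumber: 1, TimeLogged: new DateTime(2012, 5, 1, 10, 1, 10), Reading: 30),
};

// Step 1: project the grouping value (the time truncated to the start of its minute).
var byMinute = from r in raw
               select new
               {
                   Sensor = r.SensorNumber,
                   Minute = new DateTime(r.TimeLogged.Ticks - r.TimeLogged.Ticks % TimeSpan.TicksPerMinute, r.TimeLogged.Kind),
                   r.Reading
               };

// Step 2: group the projected rows and average each sensor/minute bucket.
var averages = (from r in byMinute
                group r by new { r.Sensor, r.Minute } into g
                orderby g.Key.Minute
                select new
                {
                    SensorNumber = g.Key.Sensor,
                    TimeLogged = g.Key.Minute,
                    Reading = (int)g.Average(x => x.Reading)
                }).ToList();

foreach (var a in averages)
    Console.WriteLine($"{a.SensorNumber} {a.TimeLogged:HH:mm} {a.Reading}");
```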

In Linq2SQL, how do I get a record plus the previous and next in the sequence in a single query?

Given a date, what is the most efficient way to query the last record before that date, any record that equals that date, and the next one after that date.
It should be functionally equivalent to a query like this:
from asset in Assets
where asset.Id == assetId
select new {
Previous = (from a in asset.Orders where a.Date < myDate orderby a.Date descending select a).FirstOrDefault(),
Current = (from a in asset.Orders where a.Date == myDate select a).SingleOrDefault(),
Next = (from a in asset.Orders where a.Date > myDate orderby a.Date select a).FirstOrDefault()
}
As is, this query runs three queries, and presumably has to sort the dataset by date three times to do it.
Some similar questions:
How do I get 5 records before AND after a record with a specific ID? (just uses two queries)
How do I get records before and after given one? Not in Linq, and therefore hard for me to take advantage of (my team will get annoyed).
Which query is the "most efficient" depends on what you mean by efficient.
If you want a single query to the database, a single sort of the orders by date, and fast look-ups by date afterwards, then I suggest the following might be the most efficient. :-)
var orders =
(from a in Assets
where a.Id == assetId
from o in a.Orders
orderby o.Date
select o).ToArray();
var previous = orders.LastOrDefault(o => o.Date < myDate);
var current = orders.SingleOrDefault(o => o.Date == myDate);
var next = orders.FirstOrDefault(o => o.Date > myDate);
This should query the database once for the orders associated with the required asset Id, sort them by date, and return them as an array in memory. Since this is in memory it is now blindingly fast to look for the current, previous & next records for the specified date.
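Because `orders` is already sorted, the three look-ups could also use a binary search instead of linear scans, which only matters if an asset has many orders. This is a swapped-in technique rather than part of the answer above: the sketch assumes dates are unique, uses hypothetical sample orders, and the `Neighbours` helper is illustrative. It returns array indexes, with -1 meaning "no such record".

```csharp
using System;
using System.Linq;

// Hypothetical orders for one asset, sorted by date; dates are assumed unique.
var orders = new[]
{
    (Id: 1, Date: new DateTime(2012, 1, 5)),
    (Id: 2, Date: new DateTime(2012, 1, 9)),
    (Id: 3, Date: new DateTime(2012, 1, 12)),
    (Id: 4, Date: new DateTime(2012, 1, 20)),
};
var dates = orders.Select(o => o.Date).ToArray();

// Indexes of the previous, current and next orders for myDate (-1 when absent).
(int Prev, int Cur, int Next) Neighbours(DateTime myDate)
{
    int i = Array.BinarySearch(dates, myDate);
    if (i >= 0)
        return (i - 1, i, i + 1 < dates.Length ? i + 1 : -1);
    int insertAt = ~i; // index of the first date greater than myDate
    return (insertAt - 1, -1, insertAt < dates.Length ? insertAt : -1);
}

var hit = Neighbours(new DateTime(2012, 1, 9));   // exact match
var miss = Neighbours(new DateTime(2012, 1, 15)); // falls between two orders
Console.WriteLine($"{orders[hit.Prev].Id} {orders[hit.Cur].Id} {orders[hit.Next].Id}");
```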
Does your Orders table have a sequential ID field? If so, you might be able to do it with:
from asset in Assets
where asset.Id == assetID
let current = asset.Orders.Where(x => x.Date == myDate).FirstOrDefault()
where current != null
let previous = asset.Orders.Where(x => x.Id == current.Id - 1).FirstOrDefault()
let next = asset.Orders.Where(x => x.Id == current.Id + 1).FirstOrDefault()
select new {
Previous = previous,
Current = current,
Next = next
};
If it doesn't, then it'd be a bit more code:
from asset in Assets
where asset.Id == assetID
let current = asset.Orders.Where(x => x.Date == myDate).FirstOrDefault()
where current != null
let previous = asset.Orders.Where(x => x.Date < current.Date).OrderByDescending(x => x.Date).FirstOrDefault()
let next = asset.Orders.Where(x => x.Date > current.Date).OrderBy(x => x.Date).FirstOrDefault()
select new {
Previous = previous,
Current = current,
Next = next
};
That should get compiled into a single SQL query that uses sub-queries, i.e. the database server will execute multiple queries, but your client program only submits one.
Edit: One other idea that would work if your Order table had sequential IDs:
var sample = (from asset in Assets
where asset.Id == assetID
let current = asset.Orders.Where(x => x.Date == myDate).FirstOrDefault()
where current != null
from order in asset.Orders
// start one before the current order and take three in Id order
// (assumes both a previous and a next order exist)
where order.Id >= current.Id - 1
orderby order.Id
select order)
.Take(3)
.ToArray();
var Previous = sample[0];
var Current = sample[1];
var Next = sample[2];
Other answers, for example those using SkipWhile, are very, very slow. Good luck ^^
//Current Record
var current
= (from item in db.Employee
where item.UserName.Equals(_username)
select item).SingleOrDefault();
//Next Record: the smallest user name after the current one
var next
= (from item in db.Employee
where item.UserName.CompareTo(_username) > 0
orderby item.UserName
select item).FirstOrDefault();
//Previous Record: the largest user name before the current one
var previous
= (from item in db.Employee
where item.UserName.CompareTo(_username) < 0
orderby item.UserName descending
select item).FirstOrDefault();
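Ordering matters for the next record as much as for the previous one: without an `orderby`, `FirstOrDefault()` returns whichever matching row happens to come first. A small in-memory illustration, with a hypothetical array of user names standing in for `db.Employee`; `StringComparer.Ordinal` is used so the result does not depend on the current culture (a database would use its column collation instead).

```csharp
using System;
using System.Linq;

// Hypothetical user names standing in for db.Employee, in no particular order.
var userNames = new[] { "dave", "alice", "carol", "bob" };
var current = "bob";

// Without ordering, this returns the first name after "bob" in storage order.
var unordered = userNames.FirstOrDefault(u => string.CompareOrdinal(u, current) > 0);

// Next: the smallest name greater than the current one.
var next = userNames.Where(u => string.CompareOrdinal(u, current) > 0)
                    .OrderBy(u => u, StringComparer.Ordinal)
                    .FirstOrDefault();

// Previous: the largest name smaller than the current one.
var previous = userNames.Where(u => string.CompareOrdinal(u, current) < 0)
                        .OrderByDescending(u => u, StringComparer.Ordinal)
                        .FirstOrDefault();

Console.WriteLine($"unordered: {unordered}, next: {next}, previous: {previous}");
```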
Almost the same, but the SQL query plan might be different.
var q =
from asset in Assets
where asset.Id == assetID
select new
{
Previous = asset.Orders.Where(a => a.Date == asset.Orders.Where(x => x.Date < myDate).Max(x => x.Date)).FirstOrDefault(),
Current = asset.Orders.Where(x => x.Date == myDate).FirstOrDefault(),
Next = asset.Orders.Where(a => a.Date == asset.Orders.Where(x => x.Date > myDate).Min(x => x.Date)).FirstOrDefault()
};
