Using Linq to get summaries and count - linq

I have a SQL Server database with an Events table that stores around a million events, each with a dateStart, dateEnd, title, rating and other bits.
I need to display a list of years, where each year displays the 5 highest rated events and the total count of events in that year.
So, something like...
Top 5 events for 2009 (from 199 events)
- Event A
- Event B
- Event C
- Event D
- Event E
Top 5 events for 2010 (from 469 events)
- Event F
- Event G
- Event H
- Event I
- Event J
.... etc.
Because of the sheer quantity of records, I'd like to avoid a Linq query that will pull everything out of the database, but I don't know if that is possible and my Linq knowledge is not enough to work out how this would work.
What is the most efficient method for retrieving this data structure from the database?
Thanks a lot in advance - been mangling my brain all day trying to work it out.

Well, let's assume that you only care about the start date. You'd want something like:
var query = from eventInfo in db.Events
            group eventInfo by eventInfo.dateStart.Year into g
            select new
            {
                Year = g.Key,                                       // Key for the grouping
                Count = g.Count(),                                  // Events for this year
                Top5 = g.OrderByDescending(e => e.Rating).Take(5)   // Five highest-rated events
            };
Frankly, I have no idea what kind of SQL that will produce, or whether it'll be even vaguely efficient. You should find out what SQL is generated, and then look in the SQL profiler to see what the query execution plan is.
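Before looking at the generated SQL, it can help to confirm that the result has the shape you want. The sketch below runs the same group-by-year, count and top-5 logic with LINQ to Objects over a few made-up events held as tuples. The field names and data are assumptions rather than the real Events table, and running it in memory says nothing about what SQL a provider would generate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up stand-in for the Events table: only the columns this query touches.
var events = new List<(string Title, DateTime DateStart, int Rating)>
{
    ("Event A", new DateTime(2009, 3, 1), 9),
    ("Event B", new DateTime(2009, 4, 12), 7),
    ("Event C", new DateTime(2009, 6, 30), 3),
    ("Event D", new DateTime(2009, 8, 2), 5),
    ("Event E", new DateTime(2009, 10, 19), 6),
    ("Event X", new DateTime(2009, 12, 24), 1),
    ("Event F", new DateTime(2010, 1, 15), 8),
    ("Event G", new DateTime(2010, 11, 2), 6),
};

// Group by start year, count each group, and keep the five highest-rated titles.
var summary = events
    .GroupBy(e => e.DateStart.Year)
    .OrderBy(g => g.Key)
    .Select(g => new
    {
        Year = g.Key,
        Count = g.Count(),
        Top5 = g.OrderByDescending(e => e.Rating).Take(5).Select(e => e.Title).ToList()
    })
    .ToList();

foreach (var year in summary)
{
    Console.WriteLine($"Top events for {year.Year} (from {year.Count} events)");
    foreach (var title in year.Top5)
        Console.WriteLine($"- {title}");
}
```

With this data, Event X (the lowest-rated 2009 event) still counts towards the 6 events for 2009 but is left out of that year's list.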

You will have to use an ORM. There are several possibilities, including (but not limited to) LINQ to SQL, Entity Framework, NHibernate (I'm not sure about its LINQ support), etc.
All of those translate the LINQ query operators into a SQL query that is executed in the database, and they only return the information you requested.

Related

Sage 200 workspace creation - linq

New to Stack Overflow, but hoping for a little help.
I am new to LINQ, but I have programmed in VBA, some SQL, etc.
We had a company do some development work to create various new bespoke screens/fields in our Sage 200 Extra (an extra 20 to 30 data contexts, each containing multiple fields and with primary keys that don't match between them). I want to show list views through workspaces that combine 2 or 3 different data contexts, so that the list view shows all data from the fields of those data contexts, but I am struggling with the correct LINQ code to join them. Below is the code for one of them, but it shows only the headers with no data, so I assume the 'join' statement is wrong.
Please can you correct it or give tips, so I can better understand how LINQ works? The example uses the standard Sage 200 data context SOPOrderReturns as well as the bespoke one we added, called I8TFWorkFlows, which I need to join so it displays along with SOPOrderReturns:
var q = from SOP in cxt.SOPOrderReturns
from WF in cxt.I8TFWorkFlows
select new
{
SOP.DocumentNo,
SOP.DocumentDate,
DocumentStatus = SOP.DocumentStatus.Name,
SOP.SOPOrderReturnID,
WF.I8TFWorkflowID,
WF.ColourProofRequestedDate,
WF.ColourProofApprovedDate,
WF.ColorProofSentDate,
WF.AmendmentToOrder,
};
return q;
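Two `from` clauses with no condition relating them form a cross join, which pairs every SOP row with every workflow row. If I8TFWorkFlows has no rows yet, the combined query returns nothing, which is one likely reason for a list view with headers but no data. The sketch below uses LINQ to Objects to contrast that with an inner join (`join ... on ... equals ...`) and a left outer join (`join ... into ...` plus `DefaultIfEmpty()`). The link column is an assumption: it supposes the workflow table stores a `SOPOrderReturnID`, which the question doesn't say, so substitute whatever key actually relates the bespoke table to the order.

```csharp
using System;
using System.Linq;

// Made-up rows; SOPOrderReturnID on the workflow side is a hypothetical link column.
var orders = new[]
{
    new { SOPOrderReturnID = 1L, DocumentNo = "SO001" },
    new { SOPOrderReturnID = 2L, DocumentNo = "SO002" },
    new { SOPOrderReturnID = 3L, DocumentNo = "SO003" },
};
var workflows = new[]
{
    new { I8TFWorkflowID = 10L, SOPOrderReturnID = 1L },
    new { I8TFWorkflowID = 11L, SOPOrderReturnID = 3L },
};

// Cross join: 3 orders x 2 workflows = 6 rows, none of them related by key.
var crossJoined = (from sop in orders
                   from wf in workflows
                   select new { sop.DocumentNo, wf.I8TFWorkflowID }).ToList();

// Cross join against an empty table: no rows at all.
var emptyWorkflows = workflows.Take(0).ToArray();
var crossWithEmpty = (from sop in orders
                      from wf in emptyWorkflows
                      select sop.DocumentNo).Count();

// Inner join: only orders that have a matching workflow.
var innerJoined = (from sop in orders
                   join wf in workflows on sop.SOPOrderReturnID equals wf.SOPOrderReturnID
                   select new { sop.DocumentNo, wf.I8TFWorkflowID }).ToList();

// Left outer join: every order, with a null workflow ID where nothing matches.
var leftJoined = (from sop in orders
                  join wf in workflows on sop.SOPOrderReturnID equals wf.SOPOrderReturnID into matches
                  from match in matches.DefaultIfEmpty()
                  select new
                  {
                      sop.DocumentNo,
                      I8TFWorkflowID = match == null ? (long?)null : match.I8TFWorkflowID
                  }).ToList();

foreach (var row in leftJoined)
    Console.WriteLine($"{row.DocumentNo}: {row.I8TFWorkflowID}");
```

The left-join form is usually what a list view wants, since orders without workflow data still appear.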

Compound "from" clause in Linq query in Entity Framework

I've been working with Entity Framework for a few weeks now. I have been working with LINQ to Objects and LINQ to SQL for years. A lot of the time, I like to write LINQ statements like this:
from student in db.Students
from score in student.Scores
where score > 90
select student;
With other forms of linq, this returns distinct students who have at least one score greater than 90. However, in EF this query returns one student for every score greater than 90.
Does anyone know if this behavior can be replicated in unit tests? Is it possible this is a bug in EF?
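On the unit-test question, it is worth checking LINQ to Objects directly. A second `from` clause is translated to `SelectMany`, so even in memory the query yields one element per matching (student, score) pair. Below, Ann appears twice because she has two scores over 90. The sample data is made up, and scores are plain `int`s rather than entities.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up in-memory data: each student has a list of integer scores.
var students = new[]
{
    new { Name = "Ann", Scores = new List<int> { 95, 98, 70 } },
    new { Name = "Bob", Scores = new List<int> { 60, 91 } },
    new { Name = "Cat", Scores = new List<int> { 50 } },
};

// Compound "from" = SelectMany: one result per (student, score) pair that passes the filter.
var perScore = (from student in students
                from score in student.Scores
                where score > 90
                select student.Name).ToList();

Console.WriteLine(string.Join(", ", perScore)); // Ann, Ann, Bob
```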
I don't like that SQL-like syntax (I have no better name for it), especially when you start nesting them.
var students = db.Students
    .Where(student => student.Scores.Any(score => score > 90))
    .ToList();
This snippet, using the method syntax, does the same thing. I find it far more readable. It's more explicit in the order of operations used.
And as far as I have experienced, EF hasn't yet shown a bug with its selection using method syntax.
Edit
To actually answer your problem:
However, in EF this query returns one student for every score greater than 90.
I think this is due to a JOIN statement used in the final SQL that will be run. This is why I avoid SQL-like syntax: it becomes very hard to differentiate between what you want to retrieve (students) and what you want to filter with (scores).
Much like you would in SQL, you are joining the data from students and scores, and then running a filtering operation on that combined collection. It then becomes harder to turn that result back into a collection of students. I think this is the main cause of your issue. It's not a bug per se, but I think EF can only handle it one way.
Alternative solutions to the above:
If it returns one student per score over 90, take the distinct students returned. It should be the same result set.
Use more explicit parentheses () and formatting to nest separate SQL-like statements.
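The first alternative (taking the distinct students) and the method-syntax `Any` query should return the same set of students. A LINQ to Objects sketch with made-up data; it demonstrates the in-memory behaviour only, not the SQL EF would generate for either form.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up in-memory data: each student has a list of integer scores.
var students = new[]
{
    new { Name = "Ann", Scores = new List<int> { 95, 98, 70 } },
    new { Name = "Bob", Scores = new List<int> { 60, 91 } },
    new { Name = "Cat", Scores = new List<int> { 50 } },
};

// Alternative 1: keep the compound "from", then remove the repeated students.
var viaDistinct = (from student in students
                   from score in student.Scores
                   where score > 90
                   select student)
                  .Distinct()
                  .Select(student => student.Name)
                  .ToList();

// Method syntax: filter the students directly, so no duplicates are produced.
var viaAny = students
    .Where(student => student.Scores.Any(score => score > 90))
    .Select(student => student.Name)
    .ToList();

Console.WriteLine(string.Join(", ", viaDistinct)); // Ann, Bob
Console.WriteLine(string.Join(", ", viaAny));      // Ann, Bob
```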
Note: I'm not saying it can't be done with SQL-like syntax. I am well aware that most of this answer is opinion based.

Lightswitch 2013 Linq queries to Get min value

I'm writing a timesheet application (Silverlight) and I'm completely stuck on getting LINQ queries working. I'm new to LINQ; I've just read a LINQ book and worked through many of its examples, including LINQ to Objects, LINQ to SQL and LINQ to Entities (I assume, but am not 100% sure, that the latter is what LightSwitch uses). I plan to study a LOT more LINQ, but I just need to get this one query working.
So I have an entity called Items, which lists every item in a job and its serial number.
So: Job.ID int, ID int, SerialNo long
I also have a Timesheets entity that contains shift dates, the job number, and the start and end serial numbers produced.
So Job.ID int, ShiftDate date, Shift int, StartNo long, EndNo long
When the user selects a job from an autocomplete box, I want to look up the MAX(SerialNo) for that job in the Timesheets entity. If that is null (i.e. none have been produced), I want to look up the MIN(SerialNo) from the Items entity for that job (i.e. the first serial number they should produce).
I realize I need a FirstOrDefault and need to specify the MIN(SerialNo) from Items as the default.
My Timesheet screen uses TimesheetProperty as its data source.
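The lookup described in the question can be sketched in LINQ to Objects with the nullable overloads of `Max`/`Min`, which return null for an empty sequence, plus `??` for the fallback. Assumptions, all hypothetical: "MAX(SerialNo)" in Timesheets is read as the highest `EndNo` recorded for the job, the entities are replaced by made-up in-memory rows, and `LastOrFirstSerial` is an invented helper name. This does not run against LightSwitch's `IDataServiceQueryable`, where `Min`/`Max` aren't available directly (which is what the errors below are about).

```csharp
using System;
using System.Linq;

// Made-up stand-ins for the Items and Timesheets entities.
var items = new[]
{
    new { JobId = 1, SerialNo = 1001L },
    new { JobId = 1, SerialNo = 1002L },
    new { JobId = 1, SerialNo = 1003L },
    new { JobId = 2, SerialNo = 5001L },
    new { JobId = 2, SerialNo = 5002L },
};
var timesheets = new[]
{
    new { JobId = 1, StartNo = 1001L, EndNo = 1002L },
};

// Hypothetical helper: highest EndNo already produced for the job, or, if the job
// has no timesheets yet, the lowest SerialNo in Items (null if the job is unknown).
long? LastOrFirstSerial(int jobId)
{
    long? maxProduced = timesheets
        .Where(ts => ts.JobId == jobId)
        .Max(ts => (long?)ts.EndNo);      // null when there are no timesheets
    return maxProduced ?? items
        .Where(i => i.JobId == jobId)
        .Min(i => (long?)i.SerialNo);     // null when there are no items either
}

Console.WriteLine(LastOrFirstSerial(1)); // 1002
Console.WriteLine(LastOrFirstSerial(2)); // 5001
```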
I tried the following just to get the MAX(SerialNo) from Timesheets entity:
var maxSerialNo =
(from ts in this.DataWorkspace.SQLData.Timesheets
where ts.Job.ID == this.TimesheetProperty.Job.ID
select ts.StartNo).Min();
but I get the following errors:
Instance argument: cannot convert from 'Microsoft.LightSwitch.IDataServiceQueryable' to 'System.Collections.Generic.IEnumerable
'Microsoft.LightSwitch.IDataServiceQueryable' does not contain a definition for 'Min' and the best extension method overload 'System.Linq.Enumerable.Min(System.Collections.Generic.IEnumerable)' has some invalid arguments
I also don't get why I can't use this:
var maxSerialNo = this.DataWorkspace.SQLData.Timesheets.Min(ts => ts.StartNo);
Can anyone point me in the right direction?
Thanks
Mark
IDataServiceQueryable doesn't support the full set of LINQ operators that IEnumerable does.
IDataServiceQueryable – This is a LightSwitch-specific type that allows a restricted set of “LINQ-like” operators that are remote-able to the middle-tier and ultimately issued to the database server. This interface is the core of the LightSwitch query programming model. IDataServiceQueryable has a member to execute the query, which returns results that are IEnumerable. [Reference]
One possible solution is to execute your query first, by calling .Execute(), which returns an IEnumerable, and then call .Min() on that result. But that isn't a good idea if you have a large amount of data, because executing the query retrieves all the data that matches it and does the further processing on the client side, which is inefficient.
Another way is to change your query to use only operators supported by IDataServiceQueryable, which avoids retrieving unnecessary data to the client. For example, to get the minimum StartNo you can order by StartNo ascending and take the first row instead of using the .Min() operator (order descending instead if you want the maximum):
var firstTimesheet =
(
from ts in this.DataWorkspace.SQLData.Timesheets
where ts.Job.ID == this.TimesheetProperty.Job.ID
orderby ts.StartNo // ascending, so the first row has the smallest StartNo
select ts
).FirstOrDefault();
// FirstOrDefault returns a Timesheet entity, or null when the job has no timesheets yet
long? minStartNo = firstTimesheet == null ? (long?)null : firstTimesheet.StartNo;
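The ordering trick relies on the first row of an ascending sort being the minimum, and the first row of a descending sort being the maximum. A quick LINQ to Objects check with made-up rows; it only demonstrates that equivalence and says nothing about which operators LightSwitch supports.

```csharp
using System;
using System.Linq;

// Made-up timesheet rows for job 7; job 8 has none.
var timesheets = new[]
{
    new { JobId = 7, StartNo = 300L },
    new { JobId = 7, StartNo = 100L },
    new { JobId = 7, StartNo = 200L },
};

var forJob = timesheets.Where(ts => ts.JobId == 7);
var lowest = forJob.OrderBy(ts => ts.StartNo).FirstOrDefault();            // minimum
var highest = forJob.OrderByDescending(ts => ts.StartNo).FirstOrDefault(); // maximum

// With no matching rows, FirstOrDefault returns null, whereas the non-nullable
// Min/Max overloads would throw on an empty sequence.
var none = timesheets.Where(ts => ts.JobId == 8).OrderBy(ts => ts.StartNo).FirstOrDefault();

Console.WriteLine(lowest.StartNo);  // 100
Console.WriteLine(highest.StartNo); // 300
Console.WriteLine(none == null);    // True
```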

How to use LINQ (or TSQL) to group dataset but include top item and counts

I'm struggling with how to approach this. I have your basic dataset returned by a TSQL (SQL 2005) query where the data contains a list of master and detail items. Pretty vanilla: you have your master table joined to your detail table, so you might get multiple rows back per master record.
Something like:
ID / Item Descr / Subitem Descr
1 / Jane Doe / shoes
1 / Jane Doe / hats
2 / John Smith / hats
What I'd like to do is "flatten" that out a bit. So something like:
ID / Item Descr / Count / Most Recent Subitem
1 / Jane Doe / 2 / shoes
2 / John Smith / 1 / hats
Any suggestions on the sql query, or perhaps a LINQ query that I could run on the dataset I get back from the initial sql query...?
var q = from i in Context.Items
        group i by new { i.ID, i.ItemDescr } into g   // group on ID too, so two masters with the same description stay separate
        select new
        {
            ID = g.Key.ID,
            ItemDescr = g.Key.ItemDescr,
            Count = g.Count(),
            MostRecentSubItem = g.FirstOrDefault().SubitemDescr // you don't show how to pick the "most recent"
        };
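Picking the "most recent" subitem needs some column to order by, which the question doesn't show. Assuming a hypothetical `Added` date on each detail row, the flattening can be sketched with LINQ to Objects as below (this is also roughly what "post-processing on the web server" would look like). `First()` is safe here because every group has at least one row; when the query goes to a database provider instead, `FirstOrDefault()`, as in the answer above, is generally the safer choice inside a projection.

```csharp
using System;
using System.Linq;

// Made-up master-detail rows; the Added column is an assumption.
var rows = new[]
{
    new { ID = 1, ItemDescr = "Jane Doe", SubitemDescr = "hats", Added = new DateTime(2011, 1, 5) },
    new { ID = 1, ItemDescr = "Jane Doe", SubitemDescr = "shoes", Added = new DateTime(2011, 3, 9) },
    new { ID = 2, ItemDescr = "John Smith", SubitemDescr = "hats", Added = new DateTime(2011, 2, 1) },
};

// One row per master record: count of details plus the latest detail by Added.
var flattened = rows
    .GroupBy(r => new { r.ID, r.ItemDescr })
    .Select(g => new
    {
        g.Key.ID,
        g.Key.ItemDescr,
        Count = g.Count(),
        MostRecentSubitem = g.OrderByDescending(r => r.Added).First().SubitemDescr
    })
    .OrderBy(x => x.ID)
    .ToList();

foreach (var x in flattened)
    Console.WriteLine($"{x.ID} / {x.ItemDescr} / {x.Count} / {x.MostRecentSubitem}");
// 1 / Jane Doe / 2 / shoes
// 2 / John Smith / 1 / hats
```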
Not quite sure what you mean by "top" but this is a start if you want to do it in SQL:
SELECT id, ItemDesc, COUNT(*), MAX(SubItemDesc)
FROM MyTable
GROUP BY id, ItemDesc
I ended up using a correlated subquery, which always makes me nervous from a performance perspective. I don't have numbers to back that up and I'm not great at reading execution plans; it's just a gut feeling.
Lots of good Stack Overflow examples got me down that path (like: Advanced SQL query with sub queries, group by, count and sum functions in to SQLalchemy and Variant use of the GROUP BY clause in TSQL).
I was wondering if using any of the new windowing functions (row_number() or rank()) would have helped, but I didn't get very far down that line.
I still wonder if it would be better to just pull back all the master-detail rows and then "post-process" them using LINQ. Our SQL Server would throw that dataset back to the web server pretty darn fast, then "distribute" the workload by doing the LINQ on the web server, finally binding it all to my ASP.NET GridView and sending it to the client. A lot of moving parts, but it could be interesting...

Linq-to-NHibernate OrderBy Not Working

I'm trying to order a Linq to NHibernate query by the sum of its children.
session.Linq<Parent>().OrderBy( p => p.Children.Sum( c => c.SomeNumber ) ).ToList()
This does not seem to work. When looking at NHProf, I can see that it is ordering by Parent.Id. I figured that maybe it was returning the results and ordering them outside of SQL, but if I add a .Skip( 1 ).Take( 1 ) to the Linq query, it still orders by Parent.Id.
I tried doing this with an in-memory List and it works just fine.
Am I doing something wrong, or is this an issue with Linq to NHibernate?
I'm sure I could always return the list and then do the operations on it, but that is not an ideal workaround, because I don't want to return all the records.
To order a query by an aggregate value, you need to use a group by query. In your example you need to use a 'group by' with a join.
The equivalent SQL would be something like:
select parent.id, sum(child.SomeNumber) as childsum
from dbo.Parent parent
inner join child on parent.id = child.parentid
group by parent.id
order by childsum desc
If only Linq2NH could do this for us... but sadly it can't. You can do it in HQL as long as you're OK with getting back the id and the sum, but not the whole object.
I battled with Linq2NH for months before I abandoned it. It's slow, doesn't support the second-level cache and only supports VERY basic queries (at least when I abandoned it 6 months ago; it may have come along leaps and bounds!). It's fine if you are doing a simple home-made application. If your application is even remotely complex, ditch it and spend a bit of time getting to know HQL: a little harder to understand, but much more powerful.
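For comparison, the join / group by / order-by-aggregate shape of that SQL can be written in LINQ query syntax. This sketch runs against made-up in-memory rows (LINQ to Objects, which, like the in-memory List in the question, handles it fine); it is not a claim about what Linq2NH can translate. Like the SQL, it returns id and sum pairs rather than whole Parent objects, and the inner join drops parents that have no children.

```csharp
using System;
using System.Linq;

// Made-up parent and child rows mirroring the SQL above.
var parents = new[] { new { Id = 1 }, new { Id = 2 }, new { Id = 3 } };
var children = new[]
{
    new { ParentId = 1, SomeNumber = 5 },
    new { ParentId = 1, SomeNumber = 1 },
    new { ParentId = 2, SomeNumber = 10 },
    new { ParentId = 3, SomeNumber = 2 },
};

// Inner join, group the child numbers by parent id, then order by the group's sum.
var ordered = (from p in parents
               join c in children on p.Id equals c.ParentId
               group c.SomeNumber by p.Id into g
               let childSum = g.Sum()
               orderby childSum descending
               select new { Id = g.Key, ChildSum = childSum }).ToList();

foreach (var row in ordered)
    Console.WriteLine($"{row.Id}: {row.ChildSum}"); // 2: 10, then 1: 6, then 3: 2
```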
