Execution Timeout Expired in nested query with Group By in .NET Core 6 - performance

I have this query that fetches data from the database, but it takes so long that it eventually throws the exception below:
Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
I tried adding the line below in Program.cs:
opt.CommandTimeout((int)TimeSpan.FromMinutes(5).TotalSeconds);
but that did not solve the issue; the query still throws the same exception.
The query is:
result = await _context.SomeObjects
.AsNoTracking()
.GroupBy(x => new { x.Location, x.Date })
.Select(x => new MyObject
{
Date = x.Key.Date,
Location = x.Key.Location,
MaxValue = x.Max(x => x.Value),
MinValue = x.Min(x => x.Value),
Mean = x.Select(x => (double)x.Value).Mean() ,
Value = x.Sum(x=>(long)x.Value),
Median = x.Select(x=> (double)x.Value).Median(),
Open = x.OrderByDescending(x=>x.Date).First().Value
}).ToListAsync();
Is there any way to improve the query? I tried several approaches, but each attempt throws a different exception.
Update:
After I started using
MathNet.Numerics.Statistics;
it started throwing
Unable to translate a collection subquery in a projection since either parent or the subquery doesn't project necessary information required to
uniquely identify it and correctly generate results on the client side. This can happen when trying to correlate on keyless entity type.
This can also happen for some cases of projection before 'Distinct' or some shapes of grouping key in case of 'GroupBy'. These should either contain all key
properties of the entity that the operation is applied on, or only contain simple property access expressions.
However, that is not the main issue. The main issue is with the line below:
Open = x.OrderByDescending(x=>x.Date).First().Value
If I comment it out, the query works fine.

I will answer my own question. I got the hint for the solution from @GertArnold.
@SvyatoslavDanyliv suggested in the comments doing this with a stored procedure, and I think that is the optimal solution for a huge amount of data. The solution that let my code work without any exception, however, is to fetch the data to the backend and do the calculations there (in .NET).
Here is the code:
Note: I'm using the MathNet.Numerics.Statistics NuGet package to calculate the median.
var allData = _context.SomeObjects.AsNoTracking().AsParallel().ToList();
var result= allData.GroupBy(x => new { x.Location, x.Date }).Select(x =>
{
var Values = x.OrderByDescending(x => x.Date).Select(x => x.Value).AsEnumerable();
return new MyObject
{
Date = x.Key.Date,
Location = x.Key.Location,
MaxValue = x.Max(x => x.Value),
MinValue = x.Min(x => x.Value),
Mean = x.Select(x => (double)x.Value).Mean(),
Value = x.Sum(x => (long)x.Value),
Median = x.Select(x => (double)x.Value).Median(),
Open = Values.First(),
Close = Values.Last(),
};
}
).AsEnumerable();
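For readers who want to try the in-memory approach without a database or MathNet, here is a minimal sketch of the same idea. The `Sample` and `Summary` records, the `InMemoryStats` class and the `Timestamp` property are all hypothetical and not part of the original model. Taking `Open` as the earliest value and `Close` as the latest within a group is likewise an assumption about the intent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape; Timestamp is an assumption, not part of the original model.
public record Sample(string Location, DateTime Date, DateTime Timestamp, int Value);

// Hypothetical result shape, loosely mirroring MyObject from the question.
public record Summary(string Location, DateTime Date, int Min, int Max,
                      double Mean, double Median, long Sum, int Open, int Close);

public static class InMemoryStats
{
    // Median of a non-empty sequence: the middle element,
    // or the average of the two middle elements for an even count.
    public static double Median(IEnumerable<double> values)
    {
        var sorted = values.OrderBy(v => v).ToArray();
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Groups rows that are already in memory and computes every aggregate on the client.
    public static List<Summary> Summarize(IEnumerable<Sample> rows) =>
        rows.GroupBy(r => new { r.Location, r.Date })
            .Select(g =>
            {
                // Order once per group so Open/Close are the first/last values.
                var ordered = g.OrderBy(r => r.Timestamp).ToList();
                return new Summary(
                    g.Key.Location, g.Key.Date,
                    g.Min(r => r.Value), g.Max(r => r.Value),
                    g.Average(r => (double)r.Value),
                    Median(g.Select(r => (double)r.Value)),
                    g.Sum(r => (long)r.Value),
                    ordered[0].Value, ordered[ordered.Count - 1].Value);
            })
            .ToList();
}
```

Calling `InMemoryStats.Summarize(rows)` on a list loaded with `ToListAsync()` would return one `Summary` per location and date. Memory use grows with the table size, which is why the stored procedure route may still be preferable for large data.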

Related

Linq Select into New Object Performance

I am new to Linq, using C#. I got a big surprise when I executed the following:
var scores = objects.Select( i => new { object = i,
score1 = i.algorithm1(),
score2 = i.algorithm2(),
score3 = i.algorithm3() } );
double avg2 = scores.Average( i => i.score2); // algorithm1() is called for every object
double cutoff2 = avg2 + scores.Select( i => i.score2).StdDev(); // algorithm1() is called for every object
double avg3 = scores.Average( i => i.score3); // algorithm1() is called for every object
double cutoff3 = avg3 + scores.Select( i => i.score3).StdDev(); // algorithm1() is called for every object
foreach( var s in scores.Where( i => i.score2 > cutoff2 | i.score3 > cutoff3 ).OrderBy( i => i.score1 )) // algorithm1() is called for every object
{
Debug.Log(String.Format ("{0} {1} {2} {3}\n", s.object, s.score1, s.score2/avg2, s.score3/avg3));
}
The attributes in my new objects store the function calls rather than the values. Each time I tried to access an attribute, the original function is called. I assume this is a huge waste of time? How can I avoid this?
Yes, you've discovered that LINQ uses deferred execution. This is a normal part of LINQ, and very handy indeed for building up queries without actually executing anything until you need to - which in turn is great for pipelines of multiple operations over potentially huge data sources which can be streamed.
For more details about how LINQ to Objects works internally, you might want to read my Edulinq blog series - it's basically a reimplementation of the whole of LINQ to Objects, one method at a time. Hopefully by the end of that you'll have a much clearer idea of what to expect.
If you want to materialize the query, you just need to call ToList or ToArray to build an in-memory copy of the results:
var scores = objects.Select( i => new { object = i,
score1 = i.algorithm1(),
score2 = i.algorithm2(),
score3 = i.algorithm3() } ).ToList();
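To make the difference visible, here is a small sketch that counts how often a projection function runs with and without materializing. `DeferredDemo`, `Expensive` and `Calls` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many times the (hypothetical) expensive function runs.
    public static int Calls;

    public static int Expensive(int x)
    {
        Calls++;
        return x * x;
    }

    // Enumerates a lazy query twice, then a materialized list twice,
    // and returns the number of Expensive calls observed in each case.
    public static (int Lazy, int Materialized) Run()
    {
        var items = new[] { 1, 2, 3 };

        Calls = 0;
        IEnumerable<int> lazy = items.Select(x => Expensive(x)); // nothing runs yet
        _ = lazy.Sum();   // first pass: 3 calls
        _ = lazy.Max();   // second pass: 3 more calls
        int lazyCalls = Calls;

        Calls = 0;
        List<int> cached = items.Select(x => Expensive(x)).ToList(); // 3 calls, once
        _ = cached.Sum(); // reads stored values, no further calls
        _ = cached.Max();
        int cachedCalls = Calls;

        return (lazyCalls, cachedCalls);
    }
}
```

`DeferredDemo.Run()` returns `(6, 3)`: the lazy query re-runs the projection on every enumeration, while the list runs it once per item.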

dynamic asc desc sort

I am trying to create table headers that sort via a back-end call in NHibernate. Clicking a header sends a string indicating what to sort by (e.g. "Name", "NameDesc") to the db call.
The db can get quite large, so I also have back-end filters and pagination built in to reduce the size of the retrieved data. The order by therefore needs to happen before, or at the same time as, the filters and the skip and take, to avoid ordering only the smaller data. Here is an example of the QueryOver call:
IList<Event> s =
session.QueryOver<Event>(() => @eventAlias)
.Fetch(@event => @event.FiscalYear).Eager
.JoinQueryOver(() => @eventAlias.FiscalYear, () => fyAlias, JoinType.InnerJoin, Restrictions.On(() => fyAlias.Id).IsIn(_years))
.Where(() => !@eventAlias.IsDeleted)
.OrderBy(() => fyAlias.RefCode).Asc
.ThenBy(() => @eventAlias.Name).Asc
.Skip(numberOfRecordsToSkip)
.Take(numberOfRecordsInPage)
.List();
How can I accomplish this?
One way to achieve this (one of many, since you could also use a fully typed filter object or a query builder) could look like this draft:
Part one and two:
// I. a reference to our query
var query = session.QueryOver<Event>(() => @eventAlias);
// II. join, filter... whatever needed
query
.Fetch(@event => @event.FiscalYear).Eager
var joinQuery = query
.JoinQueryOver(...)
.Where(() => !@eventAlias.IsDeleted)
...
Part three:
// III. Order BY
// Assume we have a list of strings (passed from a UI client)
// here represented by these two values
var sortBy = new List<string> {"Name", "CodeDesc"};
// first, have a reference for the OrderBuilder
IQueryOverOrderBuilder<Event, Event> order = null;
// iterate the list
foreach (var sortProperty in sortBy)
{
// use Desc or Asc?
var useDesc = sortProperty.EndsWith("Desc");
// Clean the property name
var name = useDesc
? sortProperty.Remove(sortProperty.Length - 4, 4)
: sortProperty;
// Build the ORDER
order = order == null
? query.OrderBy(Projections.Property(name))
: query.ThenBy(Projections.Property(name))
;
// use DESC or ASC
query = useDesc ? order.Desc : order.Asc;
}
Finally the results:
// IV. back to query... call the DB and get the result
IList<Event> s = query
.List<Event>();
This draft sorts on the root query. You could extend it to add order statements to joinQuery as well (e.g. if the string is "FiscalYear.MonthDesc"). The logic would be similar, but built around joinQuery (see part one).
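The string handling in part three can be isolated and tested without NHibernate. The sketch below does only that parsing step. `SortSpec` and `SortKeyParser` are hypothetical names, and treating a key that is exactly "Desc" as a property name rather than a suffix is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical value type: a property name plus a sort direction.
public record SortSpec(string Property, bool Descending);

public static class SortKeyParser
{
    private const string DescSuffix = "Desc";

    // "NameDesc" -> (Name, descending); "Name" -> (Name, ascending).
    // A key no longer than the suffix itself is kept whole (assumption).
    public static SortSpec Parse(string key)
    {
        bool desc = key.Length > DescSuffix.Length
                    && key.EndsWith(DescSuffix, StringComparison.Ordinal);
        string name = desc
            ? key.Substring(0, key.Length - DescSuffix.Length)
            : key;
        return new SortSpec(name, desc);
    }

    // Parses keys in order, so the first entry becomes the primary sort.
    public static List<SortSpec> ParseAll(IEnumerable<string> keys) =>
        keys.Select(k => Parse(k)).ToList();
}
```

Each resulting `SortSpec` could then drive the `OrderBy`/`ThenBy` and `Asc`/`Desc` choice in the loop above. Since the names come from the client, checking them against a list of sortable properties before building the query is probably worthwhile.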

LINQ Out of Memory Error

I am querying 200k records and using up all the server's memory (no surprise). I am new to LINQ, and I found the following code that should help me, but I don't know how to use it:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> collection, int batchSize)
{
List<T> nextbatch = new List<T>(batchSize);
foreach (T item in collection)
{
nextbatch.Add(item);
if (nextbatch.Count == batchSize)
{
yield return nextbatch;
nextbatch = new List<T>(batchSize);
}
}
if (nextbatch.Count > 0)
yield return nextbatch;
}
Source: http://goo.gl/aQZIj
Here is my code which creates the "out of memory" error. How do I incorporate the new Batch function into my code?
var crmMetrics = _crmDbContext.tpm_metricsSet.Where(a => a.ModifiedOn >= lastRunDate);
foreach (var crmMetric in crmMetrics)
{
metric = new Metric();
metric.ProductKey = crmMetric.tpm_Product.Id;
dbContext.Metrics.Add(metric);
dbContext.SaveChanges();
}
It's an extension method, so if it is part of a static class and there is a reference to the class's namespace in your code you could do:
var crmMetricsBatches = _crmDbContext.tpm_metricsSet
.Where(a => a.ModifiedOn >= lastRunDate)
.AsEnumerable() // !!
.Batch(20);
Except it wouldn't help. By the .AsEnumerable(), you still fetch all data in memory but now in chunks of 20. This is because you can't use the method directly against IQueryable: Entity Framework will try to translate it to SQL but of course has no clue how to do that.
As said by TGH, Skip and Take are better suited to this:
var crmMetricsPage = _crmDbContext.tpm_metricsSet
.Where(a => a.ModifiedOn >= lastRunDate)
.OrderBy(a => a.??) // some property you choose
.Skip(pageNo * pageSize)
.Take(pageSize);
where pageNo counts from 0 to the number of pages (- 1) you're going to need. Skip and Take are expressions, and EF knows how to convert these to SQL. The OrderBy is required for EF to know where to start skipping.
In this process, called paging, you always get pageSize records at a time. The number of queries is greater, but resources are spared. One condition is that you can determine a pageSize in advance. I don't know if this fits with your logic.
If you can't use paging, you should try to narrow the filter (Where(a => a.ModifiedOn >= lastRunDate)), e.g. by fetching the data in batches of one day or one week.
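The paging described above can be wrapped in a loop that stops when a page comes back short or empty. This is a minimal sketch against any `IQueryable<T>`; `Paging.Pages` is a hypothetical helper, not part of Entity Framework, and it assumes the underlying data does not change while paging.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class Paging
{
    // Yields successive pages; each page is one OrderBy/Skip/Take query
    // executed against the source when the caller asks for it.
    public static IEnumerable<List<T>> Pages<T, TKey>(
        IQueryable<T> source, Expression<Func<T, TKey>> orderKey, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        var ordered = source.OrderBy(orderKey); // a stable order is needed for Skip
        for (int pageNo = 0; ; pageNo++)
        {
            var page = ordered.Skip(pageNo * pageSize).Take(pageSize).ToList();
            if (page.Count == 0) yield break;       // ran past the end
            yield return page;
            if (page.Count < pageSize) yield break; // short page was the last one
        }
    }
}
```

For example, `Paging.Pages(Enumerable.Range(1, 45).AsQueryable(), x => x, 20)` yields pages of 20, 20 and 5 items. Against a real context, the ordering key would be whichever property you choose for the `OrderBy`.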
I would use LINQ's Skip and Take to get the batches.
Check this out:
http://www.c-sharpcorner.com/UploadFile/3d39b4/take-and-skip-operator-in-linq-to-sql/

Groupby and where clause in Linq

I am a newbie to LINQ. I am trying to write a LINQ query to get a min value from a set of records. I need to use GroupBy, Where, Select and Min in the same query, but I am having issues when using the group by clause. Here is the query I wrote:
var data = newTrips.GroupBy(x => x.TripPath.TripPathLink.Link.Road.Name)
.Where(x => x.TripPath.PathNumber == pathnum)
.Select(x => x.TripPath.TripPathLink.Link.Speed).Min();
I am not able to use group by and where together; it keeps giving an error.
My query should:
1. Select all the values.
2. Filter them through the where clause (pathnum).
3. Group by the road name.
4. Finally, get the min value.
Can someone tell me what I am doing wrong and how to achieve the desired result?
Thanks,
Pawan
It's a little tricky not knowing the relationships between the data, but I think (without trying it) that this should give you what you want: the minimum speed per road by name. Note that it will result in a collection of anonymous objects with Name and Speed properties.
var data = newTrips.Where(x => x.TripPath.PathNumber == pathnum)
.Select(x => x.TripPath.TripPathLink.Link)
.GroupBy(x => x.Road.Name)
.Select(g => new { Name = g.Key, Speed = g.Min(l => l.Speed) } );
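The ordering of the operators is the key point: filter first, then group, then aggregate each group. Here is a runnable sketch of that shape over in-memory data. The flat `LinkSample` record and the `MinSpeedQuery` class are hypothetical stand-ins, since the real relationships between trips, paths and links are not known.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened row; the real model nests these values under TripPath.
public record LinkSample(int PathNumber, string RoadName, double Speed);

public static class MinSpeedQuery
{
    // Where runs on individual rows, GroupBy buckets the survivors by road,
    // and Min is evaluated once per bucket.
    public static Dictionary<string, double> MinSpeedPerRoad(
        IEnumerable<LinkSample> rows, int pathNum) =>
        rows.Where(r => r.PathNumber == pathNum)
            .GroupBy(r => r.RoadName)
            .ToDictionary(g => g.Key, g => g.Min(r => r.Speed));
}
```

Applying `Where` after `GroupBy`, as in the question, fails because each element at that point is a group rather than a trip, so it has no `TripPath` property.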
Since I think you want the Trip which has the minimum speed, rather than the speed, and I'm assuming a different data structure, I'll add to tvanfosson's answer:
var pathnum = 1;
var trips = from trip in newTrips
where trip.TripPath.PathNumber == pathnum
group trip by trip.TripPath.TripPathLink.Link.Road.Name into g
let minSpeed = g.Min(t => t.TripPath.TripPathLink.Link.Speed)
select new {
Name = g.Key,
Trip = g.Single(t => t.TripPath.TripPathLink.Link.Speed == minSpeed) };
foreach (var t in trips)
{
Console.WriteLine("Name = {0}, TripId = {1}", t.Name, t.Trip.TripId);
}

Reproduce a "DELETE NOT IN" SQL Statement via LINQ/Subsonic

I want to do something like DELETE FROM TABLE WHERE ID NOT IN (1,2,3) AND PAGEID = 9
I have a List of IDS but that could be changed if needs be. I can't work out how to get a boolean result for the LINQ parser.
Here is what Subsonic expects I think.
db.Delete(content => content.PageID == ID).Execute();
I can't work out how to do the NOT IN statement. I've tried the List.Contains method, but something is not quite right.
UPDATE: One alternative is to do:
var items = TABLE.Find(x => x.PageID == ID);
foreach(var item in items)
{
item.Delete();
}
This hits the database a lot more, though.
When you say "something not quite right" what exactly do you mean?
I'd expect to write:
List<int> excluded = new List<int> { 1, 2, 3 };
db.Delete(content => !excluded.Contains(content.PageID)).Execute();
Note that you need to call Contains on the collection of excluded values, not on your candidate. In other words, instead of saying "item not in collection" you're saying "collection doesn't contain item."
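The shape of that predicate, combined with the page filter from the question, can be checked against in-memory data. This is only a sketch of the boolean logic: the `Content` record and `NotInFilter` class are hypothetical, and whether SubSonic's LINQ parser translates the same lambda to SQL is not confirmed here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for the real table class.
public record Content(int Id, int PageId);

public static class NotInFilter
{
    // Selects the rows that "WHERE ID NOT IN (...) AND PAGEID = ..." would target.
    // Contains is called on the collection of ids, negated for NOT IN.
    public static List<Content> ToDelete(
        IEnumerable<Content> rows, ICollection<int> excludedIds, int pageId) =>
        rows.Where(c => !excludedIds.Contains(c.Id) && c.PageId == pageId)
            .ToList();
}
```

With ids `{ 1, 2, 3 }` excluded and page 9, only rows on page 9 whose id is outside that set are returned.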
Try .Contains:
db.Delete(content => content.PageID.Contains(<Array containing ID's>)).Execute();
(the above is just an example, might need some polishing for your specific situation)
I have found that this works, but it's not via LINQ:
var table = new WebPageContentTable(_db.DataProvider);
var g = new SubSonic.Query.Delete<WebPageContent>(_db.DataProvider)
.From(table)
.Where(table.ID)
.NotIn(usedID)
.Execute();
I have found that this does work and via LINQ - however it hits the database multiple times.
var f = WebPageContent.Find(x => !usedID.Any(e => e == x.ID));
if (f.Count > 0)
{
var repo = WebPageContent.GetRepo();
repo.Delete(f);
}
This I imagine would work in one hit to the database but I get an exception thrown in QueryVisitor::VisitUnary
WebPageContent.Delete(x => !usedID.Any(e => e == x.ID));
