LINQ Out of Memory Error

LINQ Out of Memory Error - performance

I am querying 200k records and using up all the server's memory (no surprise). I am new to LINQ so I found the following code that should help me but I don't know how to use it:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> collection, int batchSize)
{
List<T> nextbatch = new List<T>(batchSize);
foreach (T item in collection)
{
nextbatch.Add(item);
if (nextbatch.Count == batchSize)
{
yield return nextbatch;
nextbatch = new List<T>(batchSize);
}
}
if (nextbatch.Count > 0)
yield return nextbatch;
}
Source: http://goo.gl/aQZIj
Here is my code which creates the "out of memory" error. How do I incorporate the new Batch function into my code?
var crmMetrics = _crmDbContext.tpm_metricsSet.Where(a => a.ModifiedOn >= lastRunDate);
foreach (var crmMetric in crmMetrics)
{
metric = new Metric();
metric.ProductKey = crmMetric.tpm_Product.Id;
dbContext.Metrics.Add(metric);
dbContext.SaveChanges();
}

It's an extension method, so if it is part of a static class and there is a reference to the class's namespace in your code you could do:
var crmMetricsBatches = _crmDbContext.tpm_metricsSet
.Where(a => a.ModifiedOn >= lastRunDate)
.AsEnumerable() // !!
.Batch(20);
Except it wouldn't help. By the .AsEnumerable(), you still fetch all data in memory but now in chunks of 20. This is because you can't use the method directly against IQueryable: Entity Framework will try to translate it to SQL but of course has no clue how to do that.
As said by TGH, Skip and Take are more made for this:
var crmMetricsPage = _crmDbContext.tpm_metricsSet
.Where(a => a.ModifiedOn >= lastRunDate)
.OrderBy(a => a.??) // some property you choose
.Skip(pageNo * pageSize)
.Take(pageSize);
where pageNo counts from 0 to the number of pages (- 1) you're going to need. Skip and Take are expressions, and EF knows how to convert these to SQL. The OrderBy is required for EF to know where to start skipping.
In this process, called paging, you always get pageSize records at a time. The number of queries is greater, but resources are spared. One condition is that you can determine a pageSize in advance. I don't know if this fits with your logic.
If you can't use paging you should try to narrow the filter (Where(a => a.ModifiedOn >= lastRunDate), e.g. try to get the data in batches of one day or week.

I would use Linq's Skip and Take to get the batches
Check this out:
http://www.c-sharpcorner.com/UploadFile/3d39b4/take-and-skip-operator-in-linq-to-sql/

Related

mvc3 how can i include a Count method inside a Tolist

I have a tolist method that makes changes inside the database but I would like to Count how many times my Where clause is true. The Tolist works how can i add a count..
// the count to see how many times getid== x.RegistrationID
List<Article> getprofile = (from x in db.Articles where getid == x.RegistrationID select x).ToList();
foreach (var items in getprofile)
{
items.articlecount = items.articlecount + 1;
db.Entry(items).State = EntityState.Modified;
db.SaveChanges();
}

seems like you store the number of articles in db.Article. why not stored in db.User. If you store in db.Article will be difficult because it must change Arcticle.articlecount in each user Articles. I think your code above can cause uncertainty calculation. Add article count each time you add article in AddArticle metode.
public void AddArticle(Article article)
{
... your add new article code
// increase articlecount at db.User
db.Users.Where(c => c.userid == article.getid).FirstOrDefault().articlecount += 1;
db.SubmitChanges();
}
or do not need to be stored, to get the count just:
int articlecount = db.Articles.Where(c => c.getid == RegistrationID ).Count();
I think it's safer.

Returning a list using LINQ

I want to return a list of objects stored in a database using LINQ queries.
I tried the following
public BO.Hotel getHotels()
{
TripBagEntities db = new TripBagEntities();
var hotels = (from m in db.HotelEntities
where m.id < 10
select m).ToList().First();
return Mapper.ToHotelObject(hotels);
}
This returns only the first item in the list. How can I return the entire list?
Thanks in advance

Firstly, as per the comment, you should understand exactly what each line of your existing code does first. (You should also try to follow .NET naming conventions.) If you're guessing around which bit of your code does what, it would be a good idea to read a good tutorial geared towards the LINQ provider you're using (Entity Framework?).
We don't really know what Mapper.ToHotelObject does, or whether there's already a method for converting a whole sequence. This should work though:
public List<BO.Hotel> GetHotels()
{
// Note: you may want a using statement here...
TripBagEntities db = new TripBagEntities();
var hotels = db.HotelEntities
.Where(m => m.id < 10)
.AsEnumerable()
.Select(Mapper.ToHotelObject)
.ToList();
}
Or if the method group conversion doesn't work:
public List<BO.Hotel> GetHotels()
{
// Note: you may want a using statement here...
TripBagEntities db = new TripBagEntities();
var hotels = db.HotelEntities
.Where(m => m.id < 10)
.AsEnumerable()
.Select(m => Mapper.ToHotelObject(m))
.ToList();
}
Note that I've used "dot notation" for the whole query, as it makes life simpler when you're using things like AsEnumerable and ToList, and your query expression wasn't complicated anyway.
The AsEnumerable call "shifts" the query into LINQ to Objects, so that the query-to-SQL translation part doesn't need to try to convert Mapper.ToHotelObject into SQL, which I assume would fail.

You need to make your function return List<BO.Hotel> instead of a Hotel, and adjust your query and Mapper function accordingly.

public List<BO.Hotel> getHotels()
{
TripBagEntities db = new TripBagEntities();
return (from m in db.HotelEntities
where m.id < 10
select Mapper.ToHotelObject(m)).ToList();
}

Truncating a collection using Linq query

I want to extract part of a collection to another collection.
I can easily do the same using a for loop, but my linq query is not working for the same.
I am a neophyte in Linq, so please help me correcting the query (if possible with explanation / beginners tutorial link)
Legacy way of doing :
Collection<string> testColl1 = new Collection<string> {"t1", "t2", "t3", "t4"};
Collection<string> testColl2 = new Collection<string>();
for (int i = 0; i < newLength; i++)
{
testColl2.Add(testColl1[i]);
}
Where testColl1 is the source & testColl2 is the desired truncated collection of count = newLength.
I have used the following linq queries, but none of them are working ...
var result = from t in testColl1 where t.Count() <= newLength select t;
var res = testColl1.Where(t => t.Count() <= newLength);

Use Enumerable.Take:
var testColl2 = testColl1.Take(newLength).ToList();
Note that there's a semantic difference between your for loop and the version using Take. The for loop will throw with IndexOutOfRangeException exception if there are less than newLength items in testColl1, whereas the Take version will silently ignore this fact and just return as many items up to newLength items.

The correct way is by using Take:
var result = testColl1.Take(newLength);
An equivalent way using Where is:
var result = testColl1.Where((i, item) => i < newLength);
These expressions will produce an IEnumerable, so you might also want to attach a .ToList() or .ToArray() at the end.
Both ways return one less item than your original implementation does because it is more natural (e.g. if newLength == 0 no items should be returned).

You could convert to for loop to something like this:
testColl1.Take(newLength)

Use Take:
var result = testColl1.Take(newLength);
This extension method returns the first N elements from the collection where N is the parameter you pass, in this case newLength.

Can I use LINQ to retrieve only "on change" values?

What I'd like to be able to do is construct a LINQ query that retrieved me a few values from some DataRows when one of the fields changes. Here's a contrived example to illustrate:
Observation Temp Time
------------- ---- ------
Cloudy 15.0 3:00PM
Cloudy 16.5 4:00PM
Sunny 19.0 3:30PM
Sunny 19.5 3:15PM
Sunny 18.5 3:30PM
Partly Cloudy 16.5 3:20PM
Partly Cloudy 16.0 3:25PM
Cloudy 16.0 4:00PM
Sunny 17.5 3:45PM
I'd like to retrieve only the entries when the Observation changed from the previous one. So the results would include:
Cloudy 15.0 3:00PM
Sunny 19.0 3:30PM
Partly Cloudy 16.5 3:20PM
Cloudy 16.0 4:00PM
Sunny 17.5 3:45PM
Currently there is code that iterates through the DataRows and does the comparisons and construction of the results but was hoping to use LINQ to accomplish this.
What I'd like to do is something like this:
var weatherStuff = from row in ds.Tables[0].AsEnumerable()
where row.Field<string>("Observation") != weatherStuff.ElementAt(weatherStuff.Count() - 1) )
select row;
But that doesn't work - and doesn't compile since this tries to use the variable 'weatherStuff' before it is declared.
Can what I want to do be done with LINQ? I didn't see another question like it here on SO, but could have missed it.

Here is one more general thought that may be intereting. It's more complicated than what #tvanfosson posted, but in a way, it's more elegant I think :-). The operation you want to do is to group your observations using the first field, but you want to start a new group each time the value changes. Then you want to select the first element of each group.
This sounds almost like LINQ's group by but it is a bit different, so you can't really use standard group by. However, you can write your own version (that's the wonder of LINQ!). You can either write your own extension method (e.g. GroupByMoving) or you can write extension method that changes the type from IEnumerable to some your interface and then define GroupBy for this interface. The resulting query will look like this:
var weatherStuff =
from row in ds.Tables[0].AsEnumerable().AsMoving()
group row by row.Field<string>("Observation") into g
select g.First();
The only thing that remains is to define AsMoving and implement GroupBy. This is a bit of work, but it is quite generally useful thing and it can be used to solve other problems too, so it may be worth doing it :-). The summary of my post is that the great thing about LINQ is that you can customize how the operators behave to get quite elegant code.
I haven't tested it, but the implementation should look like this:
// Interface & simple implementation so that we can change GroupBy
interface IMoving<T> : IEnumerable<T> { }
class WrappedMoving<T> : IMoving<T> {
public IEnumerable<T> Wrapped { get; set; }
public IEnumerator<T> GetEnumerator() {
return Wrapped.GetEnumerator();
}
public IEnumerator<T> GetEnumerator() {
return ((IEnumerable)Wrapped).GetEnumerator();
}
}
// Important bits:
static class MovingExtensions {
public static IMoving<T> AsMoving<T>(this IEnumerable<T> e) {
return new WrappedMoving<T> { Wrapped = e };
}
// This is (an ugly & imperative) implementation of the
// group by as described earlier (you can probably implement it
// more nicely using other LINQ methods)
public static IEnumerable<IEnumerable<T>> GroupBy<T, K>(this IEnumerable<T> source,
Func<T, K> keySelector) {
List<T> elementsSoFar = new List<T>();
IEnumerator<T> en = source.GetEnumerator();
if (en.MoveNext()) {
K lastKey = keySelector(en.Current);
do {
K newKey = keySelector(en.Current);
if (newKey != lastKey) {
yield return elementsSoFar;
elementsSoFar = new List<T>();
}
elementsSoFar.Add(en.Current);
} while (en.MoveNext());
yield return elementsSoFar;
}
}

You could use the IEnumerable extension that takes an index.
var all = ds.Tables[0].AsEnumerable();
var weatherStuff = all.Where( (w,i) => i == 0 || w.Field<string>("Observation") != all.ElementAt(i-1).Field<string>("Observation") );

This is one of those instances where the iterative solution is actually better than the set-based solution in terms of both readability and performance. All you really want Linq to do is filter and pre-sort the list if necessary to prepare it for the loop.
It is possible to write a query in SQL Server (or various other databases) using windowing functions (ROW_NUMBER), if that's where your data is coming from, but very difficult to do in pure Linq without making a much bigger mess.
If you're just trying to clean the code up, an extension method might help:
public static IEnumerable<T> Changed(this IEnumerable<T> items,
Func<T, T, bool> equalityFunc)
{
if (equalityFunc == null)
{
throw new ArgumentNullException("equalityFunc");
}
T last = default(T);
bool first = true;
foreach (T current in items)
{
if (first || !equalityFunc(current, last))
{
yield return current;
}
last = current;
first = false;
}
}
Then you can call this with:
var changed = rows.Changed((r1, r2) =>
r1.Field<string>("Observation") == r2.Field<string>("Observation"));

I think what you are trying to accomplish is not possible using the "syntax suggar". However it could be possible using the extension method Select that pass the index of the item you are evaluating. So you could use the index to compare the current item with the previous one (index -1).

You could useMorelinq's GroupAdjacent() extension method
GroupAdjacent: Groups the adjacent elements of a sequence according to
a specified key selector function...This method has 4 overloads.
You would use it like this with the result selector overload to lose the IGrouping key:-
var weatherStuff = ds.Tables[0].AsEnumerable().GroupAdjacent(w => w.Field<string>("Observation"), (_, val) => val.Select(v => v));
This is a very popular extension to default Linq methods, with more than 1M downloads on Nuget (compared to MS's own Ix.net with ~40k downloads at time of writing)

LINQ to SQL bug (or very strange feature) when using IQueryable, foreach, and multiple Where

I ran into a scenario where LINQ to SQL acts very strangely. I would like to know if I'm doing something wrong. But I think there is a real possibility that it's a bug.
The code pasted below isn't my real code. It is a simplified version I created for this post, using the Northwind database.
A little background: I have a method that takes an IQueryable of Product and a "filter object" (which I will describe in a minute). It should run some "Where" extension methods on the IQueryable, based on the "filter object", and then return the IQueryable.
The so-called "filter object" is a System.Collections.Generic.List of an anonymous type of this structure: { column = fieldEnum, id = int }
The fieldEnum is an enum of the different columns of the Products table that I would possibly like to use for the filtering.
Instead of explaining further how my code works, it's easier if you just take a look at it. It's simple to follow.
enum filterType { supplier = 1, category }
public IQueryable<Product> getIQueryableProducts()
{
NorthwindDataClassesDataContext db = new NorthwindDataClassesDataContext();
IQueryable<Product> query = db.Products.AsQueryable();
//this section is just for the example. It creates a Generic List of an Anonymous Type
//with two objects. In real life I get the same kind of collection, but it isn't hard coded like here
var filter1 = new { column = filterType.supplier, id = 7 };
var filter2 = new { column = filterType.category, id = 3 };
var filterList = (new[] { filter1 }).ToList();
filterList.Add(filter2);
foreach(var oFilter in filterList)
{
switch (oFilter.column)
{
case filterType.supplier:
query = query.Where(p => p.SupplierID == oFilter.id);
break;
case filterType.category:
query = query.Where(p => p.CategoryID == oFilter.id);
break;
default:
break;
}
}
return query;
}
So here is an example. Let's say the List contains two items of this anonymous type, { column = fieldEnum.Supplier, id = 7 } and { column = fieldEnum.Category, id = 3}.
After running the code above, the underlying SQL query of the IQueryable object should contain:
WHERE SupplierID = 7 AND CategoryID = 3
But in reality, after the code runs the SQL that gets executed is
WHERE SupplierID = 3 AND CategoryID = 3
I tried defining query as a property and setting a breakpoint on the setter, thinking I could catch what's changing it when it shouldn't be. But everything was supposedly fine. So instead I just checked the underlying SQL after every command. I realized that the first Where runs fine, and query stays fine (meaning SupplierID = 7) until right after the foreach loop runs the second time. Right after oFilter becomes the second anonymous type item, and not the first, the 'query' SQL changes to Supplier = 3. So what must be happening here under-the-hood is that instead of just remembering that Supplier should equal 7, LINQ to SQL remembers that Supplier should equal oFilter.id. But oFilter is a name of a single item of a foreach loop, and it means something different after it iterates.

I have only glanced at your question, but I am 90% sure that you should read the first section of On lambdas, capture, and mutability (which includes links to 5 similar SO questions) and all will become clear.
The basic gist of it is that the variable oFilter in your example has been captured in the closure by reference and not by value. That means that once the loop finishes iterating, the variable's reference is to the last one, so the value as evaluated at lambda execution time is the final one as well.
The cure is to insert a new variable inside the foreach loop whose scope is only that iteration rather than the whole loop:
foreach(var oFilter in filterList)
{
var filter = oFilter; // add this
switch (oFilter.column) // this doesn't have to change, but can for consistency
{
case filterType.supplier:
query = query.Where(p => p.SupplierID == filter.id); // use `filter` here
break;
Now each closure is over a different filter variable that is declared anew inside of each loop, and your code will run as expected.

Working as designed. The issue you are confronting is the clash between lexical closure and mutable variables.
What you probably want to do is
foreach(var oFilter in filterList)
{
var o = oFilter;
switch (o.column)
{
case filterType.supplier:
query = query.Where(p => p.SupplierID == o.id);
break;
case filterType.category:
query = query.Where(p => p.CategoryID == o.id);
break;
default:
break;
}
}
When compiled to IL, the variable oFilter is declared once and used multiply. What you need is a variable declared separately for each use of that variable within a closure, which is what o is now there for.
While you're at it, get rid of that bastardized Hungarian notation :P.

I think this is the clearest explanation I've ever seen: http://blogs.msdn.com/ericlippert/archive/2009/11/12/closing-over-the-loop-variable-considered-harmful.aspx:
Basically, the problem arises because we specify that the foreach loop is a syntactic sugar for
{
IEnumerator<int> e = ((IEnumerable<int>)values).GetEnumerator();
try
{
int m; // OUTSIDE THE ACTUAL LOOP
while(e.MoveNext())
{
m = (int)(int)e.Current;
funcs.Add(()=>m);
}
}
finally
{
if (e != null) ((IDisposable)e).Dispose();
}
}
If we specified that the expansion was
try
{
while(e.MoveNext())
{
int m; // INSIDE
m = (int)(int)e.Current;
funcs.Add(()=>m);
}
then the code would behave as expected.

The problem is that you're not appending to the query, you're replacing it each time through the foreach statement.
You want something like the PredicateBuilder - http://www.albahari.com/nutshell/predicatebuilder.aspx

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

LINQ Out of Memory Error - performance

I would use Linq's Skip and Take to get the batches Check this out: http://www.c-sharpcorner.com/UploadFile/3d39b4/take-and-skip-operator-in-linq-to-sql/

Related

mvc3 how can i include a Count method inside a Tolist

Returning a list using LINQ

Truncating a collection using Linq query

Can I use LINQ to retrieve only "on change" values?

LINQ to SQL bug (or very strange feature) when using IQueryable, foreach, and multiple Where

Categories

Resources