Why is AsQueryable so slow with LINQ?

I ran into a rather stupid performance issue in my code. After a small investigation, I found that the AsQueryable method I used to cast my generic list slows the code down by up to 8000 times.
So the question is: why is that?
Here is the example:
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var c = new ContainerTest();
        c.FillList();

        var s = Environment.TickCount;
        for (int i = 0; i < 10000; ++i)
        {
            c.TestLinq(true);
        }
        var e = Environment.TickCount;
        Console.WriteLine("TestLinq AsQueryable - {0}", e - s);

        s = Environment.TickCount;
        for (int i = 0; i < 10000; ++i)
        {
            c.TestLinq(false);
        }
        e = Environment.TickCount;
        Console.WriteLine("TestLinq as List - {0}", e - s);

        Console.WriteLine("Press enter to finish");
        Console.ReadLine();
    }
}

class ContainerTest
{
    private readonly List<int> _list = new List<int>();
    private IQueryable<int> _q;

    public void FillList()
    {
        _list.Clear();
        for (int i = 0; i < 10; ++i)
        {
            _list.Add(i);
        }
        _q = _list.AsQueryable();
    }

    public Tuple<int, int> TestLinq(bool useAsQ)
    {
        var upperBorder = useAsQ ? _q.FirstOrDefault(i => i > 7) : _list.FirstOrDefault(i => i > 7);
        var lowerBorder = useAsQ ? _q.TakeWhile(i => i < 7).LastOrDefault() : _list.TakeWhile(i => i < 7).LastOrDefault();
        return new Tuple<int, int>(upperBorder, lowerBorder);
    }
}
UPD: As I understand it, I have to avoid the AsQueryable method as much as possible (unless it is already part of the container's inheritance chain), because otherwise I immediately get a performance issue.
"and avoid the moor in those hours of darkness when the powers of evil are exalted"

Just faced the same issue.
The thing is that IQueryable<T> takes an Expression<Func<T, bool>> as the filtering parameter in its Where()/FirstOrDefault() calls, as opposed to the pre-compiled Func<T, bool> delegate taken by the corresponding IEnumerable<T> methods.
That means there will be a compile phase to transform the expression into a delegate, and that costs quite a lot.
Now, do you need that in a loop (just as I did)? You'll get into some trouble...
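A minimal sketch of a workaround (my own illustration, not from the original post): if you control the expression, you can compile it once outside the hot loop and reuse the resulting delegate.

using System;
using System.Linq;
using System.Linq.Expressions;

class CompileOnceDemo
{
    static void Main()
    {
        Expression<Func<int, bool>> expr = i => i > 7;

        // Pay the compile cost a single time...
        Func<int, bool> compiled = expr.Compile();

        var list = Enumerable.Range(0, 10).ToList();
        for (int i = 0; i < 10000; ++i)
        {
            // ...then reuse the cheap delegate inside the loop.
            var upperBorder = list.FirstOrDefault(compiled);
        }
    }
}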
PS: It seems .NET Core/.NET 5 improves this significantly. Unfortunately, our projects are not there yet...

At least use LINQ with the List too; a manual implementation will always be faster than LINQ.
EDIT
You know that the two tests don't give the same result.

Because AsQueryable returns an IQueryable, which has a completely different set of extension methods for the LINQ standard query operators than the one intended for things like List<T>.
Queryable collections are meant to have a backing store such as an RDBMS or something similar, and you are building a different, more complex expression tree when you call IQueryable.FirstOrDefault() as opposed to List<>.FirstOrDefault().
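To make the distinction concrete, here is a small sketch (my illustration, assuming the setup from the question): the same lambda text produces a pre-compiled delegate for the Enumerable overloads, but an expression tree, which still has to be compiled or interpreted at run time, for the Queryable ones.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

class OverloadDemo
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3, 8, 9 };
        IQueryable<int> q = list.AsQueryable();

        // Enumerable.FirstOrDefault takes a compiled delegate...
        Func<int, bool> del = i => i > 7;
        Console.WriteLine(list.FirstOrDefault(del));

        // ...while Queryable.FirstOrDefault takes an expression tree.
        Expression<Func<int, bool>> tree = i => i > 7;
        Console.WriteLine(q.FirstOrDefault(tree));
    }
}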


How to measure string interning?

I'm trying to measure the impact of string interning in an application.
I came up with this:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

class Program
{
    static void Main(string[] args)
    {
        _ = BenchmarkRunner.Run<Benchmark>();
    }
}

[MemoryDiagnoser]
public class Benchmark
{
    [Params(10000, 100000, 1000000)]
    public int Count { get; set; }

    [Benchmark]
    public string[] NotInterned()
    {
        var a = new string[this.Count];
        for (var i = this.Count; i-- > 0;)
        {
            a[i] = GetString(i);
        }
        return a;
    }

    [Benchmark]
    public string[] Interned()
    {
        var a = new string[this.Count];
        for (var i = this.Count; i-- > 0;)
        {
            a[i] = string.Intern(GetString(i));
        }
        return a;
    }

    private static string GetString(int i)
    {
        var result = (i % 10).ToString();
        return result;
    }
}
But I always end up with the same amount of allocated memory.
Is there any other measure or diagnostic that gives me the memory savings of using string.Intern()?
The main question here is what kind of impact do you want to measure? To be more specific: what are your target metrics? Here are some examples: performance metrics, memory traffic, memory footprint.
In the BenchmarkDotNet Allocated column, you get the memory traffic. string.Intern doesn't help to optimize it in your example: each (i % 10).ToString() call will allocate a new string. Thus, it's expected that BenchmarkDotNet shows the same numbers in the Allocated column.
However, string.Intern should help you to optimize the memory footprint of your application in the end (the total managed heap size, which can be fetched via GC.GetTotalMemory()). This can be verified with a simple console application without BenchmarkDotNet:
using System;

namespace ConsoleApp24
{
    class Program
    {
        private const int Count = 100000;
        private static string[] notInterned, interned;

        static void Main(string[] args)
        {
            var memory1 = GC.GetTotalMemory(true);
            notInterned = NotInterned();
            var memory2 = GC.GetTotalMemory(true);
            interned = Interned();
            var memory3 = GC.GetTotalMemory(true);
            Console.WriteLine(memory2 - memory1);
            Console.WriteLine(memory3 - memory2);
            Console.WriteLine((memory2 - memory1) - (memory3 - memory2));
        }

        public static string[] NotInterned()
        {
            var a = new string[Count];
            for (var i = Count; i-- > 0;)
            {
                a[i] = GetString(i);
            }
            return a;
        }

        public static string[] Interned()
        {
            var a = new string[Count];
            for (var i = Count; i-- > 0;)
            {
                a[i] = string.Intern(GetString(i));
            }
            return a;
        }

        private static string GetString(int i)
        {
            var result = (i % 10).ToString();
            return result;
        }
    }
}
On my machine (Linux, .NET Core 3.1), I got the following results:
802408
800024
2384
The first and second numbers are the memory footprint impacts for the two cases. They are pretty large because the string array consumes a lot of memory to keep the references to all the string instances.
The third number is the footprint difference between the interned and the not-interned cases. You may ask why it's so small. This can be easily explained: Stephen Toub implemented a special cache for single-digit strings in dotnet/coreclr#18383; it's described in his blog post.
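You can see that cache in action with a tiny sketch (my illustration; it assumes a runtime where that PR has landed, i.e. .NET Core 3.0 or later):

using System;

class SingleDigitCacheDemo
{
    static void Main()
    {
        // Two independent ToString() calls return the same cached
        // instance for single-digit values, without any Intern call.
        string a = 5.ToString();
        string b = 5.ToString();
        Console.WriteLine(object.ReferenceEquals(a, b)); // True on .NET Core 3.0+
    }
}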
So, it doesn't make sense to measure interning of the "0".."9" strings on .NET Core. We can easily modify our program to fix this problem:
private static string GetString(int i)
{
    var result = "x" + (i % 10).ToString();
    return result;
}
Here are the updated results:
4002432
800344
3202088
Now the impact difference (the third number) is pretty huge (3202088). It means that interning helped us to save 3202088 bytes in the managed heap.
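As a rough sanity check on that number (an estimate, assuming a 64-bit runtime where a two-character string costs on the order of 32 bytes): 100,000 instances at ~32 bytes each is about 3.2 MB, which matches the 3202088-byte saving, while the ~800 KB that remains in both cases is the string[] itself (100,000 eight-byte references plus array overhead).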
So, here are the most important recommendations for your future experiments:
Carefully define the metrics that you actually want to measure. Don't say "I want to find all kinds of affected metrics"; any change in the source code may affect hundreds of different metrics, and it's pretty hard to measure all of them in each experiment. Carefully think about which metrics are really important to you.
Try to take input data that is close to your actual work scenarios. Benchmarking with "dummy" data may lead to incorrect results, because there are too many tricky optimizations in the runtime that work pretty well in such "dummy" cases.

Replacing a foreach with LINQ

I have some very simple code that I'm trying to get running marginally quicker (there are a lot of these small types of call dotted around the code, which seems to be slowing things down) using LINQ instead of standard code.
The problem is this: I have a variable outside of the LINQ query that the result of the query needs to be added to.
The original code looks like this:
double total = 0;
foreach (Crop c in p.Crops)
{
    if (c.CropType.Type == t.Type)
        total += c.Area;
}
return total;
This method isn't slow until the loop starts getting large; then it slows down on the phone. Can this sort of code be moved to a relatively quick and simple piece of LINQ?
Looks like you could use Sum (edit: my syntax was wrong):
total = (from c in p.Crops
         where c.CropType.Type == t.Type
         select c.Area).Sum();
Or in extension method format:
total = p.Crops.Where(c => c.CropType.Type == t.Type).Sum(c => c.Area);
As to the people saying LINQ won't perform better: where is your evidence? (The below is based on a post from Hanselman.) I ran the following in LINQPad (you will need to download and reference NBuilder to get it to run):
void Main()
{
    // NBuilder is used to create a chunk of sample data
    // http://nbuilder.org
    var crops = Builder<Crop>.CreateListOfSize(1000000).Build();
    var t = new Crop();
    t.Type = Type.grain;

    double total = 0;
    var sw = new Stopwatch();
    sw.Start();
    foreach (Crop c in crops)
    {
        if (c.Type == t.Type)
            total += c.area;
    }
    sw.Stop();
    total.Dump("For Loop total:");
    sw.ElapsedMilliseconds.Dump("For Loop Elapsed Time:");

    sw.Restart();
    var result = crops.Where(c => c.Type == t.Type).Sum(c => c.area);
    sw.Stop();
    result.Dump("LINQ total:");
    sw.ElapsedMilliseconds.Dump("LINQ Elapsed Time:");

    sw.Restart();
    var result2 = (from c in crops
                   where c.Type == t.Type
                   select c.area).Sum();
    sw.Stop();
    result2.Dump("LINQ (sugar syntax) total:");
    sw.ElapsedMilliseconds.Dump("LINQ (sugar syntax) Elapsed Time:");
}

public enum Type
{
    wheat,
    grain,
    corn,
    maize,
    cotton
}

public class Crop
{
    public string Name { get; set; }
    public Type Type { get; set; }
    public double area;
}
The results come out in LINQ's favor:
For Loop total: 99999900000
For Loop Elapsed Time: 25
LINQ total: 99999900000
LINQ Elapsed Time: 17
LINQ (sugar syntax) total: 99999900000
LINQ (sugar syntax) Elapsed Time: 17
The main way to optimize this would be changing p, which may or may not be possible.
Assuming p is a P, and looks something like this:
internal sealed class P
{
    private readonly List<Crop> mCrops = new List<Crop>();

    public IEnumerable<Crop> Crops { get { return mCrops; } }

    public void Add(Crop pCrop)
    {
        mCrops.Add(pCrop);
    }
}
(If p is a .NET type like a List<Crop>, then you can create a class like this.)
You can optimize your loop by maintaining a dictionary:
internal sealed class P
{
    private readonly List<Crop> mCrops = new List<Crop>();
    private readonly Dictionary<Type, List<Crop>> mCropsByType
        = new Dictionary<Type, List<Crop>>();

    public IEnumerable<Crop> Crops { get { return mCrops; } }

    public void Add(Crop pCrop)
    {
        if (!mCropsByType.ContainsKey(pCrop.CropType.Type))
            mCropsByType.Add(pCrop.CropType.Type, new List<Crop>());
        mCropsByType[pCrop.CropType.Type].Add(pCrop);
        mCrops.Add(pCrop);
    }

    public IEnumerable<Crop> GetCropsByType(Type pType)
    {
        return mCropsByType.ContainsKey(pType)
            ? mCropsByType[pType]
            : Enumerable.Empty<Crop>();
    }
}
Your code then becomes something like:
double total = 0;
foreach (Crop crop in p.GetCropsByType(t.Type))
    total += crop.Area;
return total;
Another possibility that would be even faster is:
internal sealed class P
{
    private readonly List<Crop> mCrops = new List<Crop>();
    private double mTotalArea;

    public IEnumerable<Crop> Crops { get { return mCrops; } }
    public double TotalArea { get { return mTotalArea; } }

    public void Add(Crop pCrop)
    {
        mCrops.Add(pCrop);
        mTotalArea += pCrop.Area;
    }
}
Your code would then simply access the TotalArea property and you wouldn't even need a loop:
return p.TotalArea;
You might also consider extracting the code that manages the Crops data to a separate class, depending on what P is.
This is a pretty straightforward sum, so I doubt you will see any benefit from using LINQ.
You haven't told us much about the setup here, but here's an idea. If p.Crops is large and only a small number of the items in the sequence are of the desired type, you could build another sequence that contains just the items you need.
I assume that you know the type when you insert into p.Crops. If that's the case, you could easily insert the relevant items into another collection and use that for the sum loop instead, as in the sketch below. That will reduce N and get rid of the comparison. It will still be O(N), though.
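A minimal sketch of that idea (the class name and the single-type assumption are mine, not the questioner's):

using System.Collections.Generic;

internal sealed class CropIndex
{
    private readonly List<Crop> mCrops = new List<Crop>();
    private readonly List<Crop> mOfInterest = new List<Crop>();
    private readonly Type mInterestingType;

    public CropIndex(Type pInterestingType)
    {
        mInterestingType = pInterestingType;
    }

    public IEnumerable<Crop> Crops { get { return mCrops; } }

    public void Add(Crop pCrop)
    {
        mCrops.Add(pCrop);
        if (pCrop.CropType.Type == mInterestingType)
            mOfInterest.Add(pCrop); // pre-filtered at insert time
    }

    public double SumAreaOfInterest()
    {
        // Smaller N, and no per-item type comparison in the hot loop.
        double total = 0;
        foreach (Crop crop in mOfInterest)
            total += crop.Area;
        return total;
    }
}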

Generics around Entityframework DbContext causes performance degradation?

I wrote a simple import/export application that transforms data from source->destination using EntityFramework and AutoMapper. It basically:
selects a batch of batchSize records from the source table
'maps' the data from the source entity to the destination entity
adds the new destination entities to the destination table and saves the context
I move around 500k records in under 5 minutes. After I refactored the code using generics, the performance dropped drastically, to 250 records in 5 minutes.
Are my delegates that return DbSet<T> properties on the DbContext causing these problems? Or is something else going on?
Fast non-generic code:
public class Importer
{
    public void ImportAddress()
    {
        const int batchSize = 50;
        int done = 0;
        var src = new SourceDbContext();
        var count = src.Addresses.Count();
        while (done < count)
        {
            using (var dest = new DestinationDbContext())
            {
                var list = src.Addresses.OrderBy(x => x.AddressId).Skip(done).Take(batchSize).ToList();
                list.ForEach(x => dest.Address.Add(Mapper.Map<Addresses, Address>(x)));
                done += batchSize;
                dest.SaveChanges();
            }
        }
        src.Dispose();
    }
}
(Very) slow generic code:
public class Importer<TSourceContext, TDestinationContext>
    where TSourceContext : DbContext
    where TDestinationContext : DbContext
{
    public void Import<TSourceEntity, TSourceOrder, TDestinationEntity>(
        Func<TSourceContext, DbSet<TSourceEntity>> getSourceSet,
        Func<TDestinationContext, DbSet<TDestinationEntity>> getDestinationSet,
        Func<TSourceEntity, TSourceOrder> getOrderBy)
        where TSourceEntity : class
        where TDestinationEntity : class
    {
        const int batchSize = 50;
        int done = 0;
        var ctx = Activator.CreateInstance<TSourceContext>();
        // Does this getSourceSet delegate cause problems perhaps?
        // Added this
        var set = getSourceSet(ctx);
        var count = set.Count();
        while (done < count)
        {
            using (var dctx = Activator.CreateInstance<TDestinationContext>())
            {
                var list = set.OrderBy(getOrderBy).Skip(done).Take(batchSize).ToList();
                // Or is the db-side paging mechanism broken by the getSourceSet delegate?
                // Added this
                var destSet = getDestinationSet(dctx);
                list.ForEach(x => destSet.Add(Mapper.Map<TSourceEntity, TDestinationEntity>(x)));
                done += batchSize;
                dctx.SaveChanges();
            }
        }
        ctx.Dispose();
    }
}
The problem is the invocation of the Func delegates, which you're doing a lot of. Cache the resulting values in variables and it'll be fine.
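The paging worry in the code comments is also worth a look. A small sketch (my illustration, not the questioner's code): on an IQueryable<T>, a plain Func<T, TKey> selector binds to Enumerable.OrderBy, so the whole source is pulled into memory before Skip/Take run, while an Expression<Func<T, TKey>> binds to Queryable.OrderBy and lets the provider translate the paging to SQL.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

static class PagingSketch
{
    // Func<T, TKey> resolves to Enumerable.OrderBy: every row is
    // enumerated in memory, and Skip/Take run client-side.
    public static IEnumerable<T> PageInMemory<T, TKey>(
        IQueryable<T> source, Func<T, TKey> orderBy, int skip, int take)
    {
        return source.OrderBy(orderBy).Skip(skip).Take(take);
    }

    // Expression<Func<T, TKey>> resolves to Queryable.OrderBy: the query
    // provider (e.g. Entity Framework) can translate paging into SQL.
    public static IQueryable<T> PageInDatabase<T, TKey>(
        IQueryable<T> source, Expression<Func<T, TKey>> orderBy, int skip, int take)
    {
        return source.OrderBy(orderBy).Skip(skip).Take(take);
    }
}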

LINQ Partition List into Lists of 8 members [duplicate]

This question already has answers here:
Split List into Sublists with LINQ
(34 answers)
Closed 10 years ago.
How would one take a List (using LINQ) and break it into a List of Lists partitioning the original list on every 8th entry?
I imagine something like this would involve Skip and/or Take, but I'm still pretty new to LINQ.
Edit: Using C# / .Net 3.5
Edit 2: This question is phrased differently than the other "duplicate" question. Although the problems are similar, the answers to this question are superior: both the accepted answer (with the yield statement) and Jon Skeet's suggestion to use MoreLinq (which is not recommended in the "other" question) are very solid. Sometimes duplicates are good, in that they force a re-examination of a problem.
Use the following extension method to break the input into subsets:
public static class IEnumerableExtensions
{
    public static IEnumerable<List<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
    {
        List<T> toReturn = new List<T>(max);
        foreach (var item in source)
        {
            toReturn.Add(item);
            if (toReturn.Count == max)
            {
                yield return toReturn;
                toReturn = new List<T>(max);
            }
        }
        if (toReturn.Any())
        {
            yield return toReturn;
        }
    }
}
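A quick usage sketch (the sample data is invented for illustration):
var numbers = Enumerable.Range(1, 20);
foreach (var set in numbers.InSetsOf(8))
{
    Console.WriteLine(string.Join(", ", set));
}
// Prints 1..8, then 9..16, then the 4-item remainder 17..20.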
We have just such a method in MoreLINQ as the Batch method:
// As IEnumerable<IEnumerable<T>>
var items = list.Batch(8);
or
// As IEnumerable<List<T>>
var items = list.Batch(8, seq => seq.ToList());
You're better off using a library like MoreLinq, but if you really had to do this using "plain LINQ", you can use GroupBy:
var sequence = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
var result = sequence.Select((x, i) => new { Group = i / 8, Value = x })
                     .GroupBy(item => item.Group, g => g.Value)
                     .Select(g => g.Where(x => true));
// result is: { {1,2,3,4,5,6,7,8}, {9,10,11,12,13,14,15,16} }
Basically, we use the version of Select() that provides an index for the value being consumed, and we divide that index by 8 to identify which group each value belongs to. Then we group the sequence by this grouping key. The last Select just reduces the IGrouping<> down to an IEnumerable<IEnumerable<T>> (and isn't strictly necessary, since IGrouping is an IEnumerable).
It's easy enough to turn this into a reusable method by factoring out the constant 8 in the example and replacing it with a specified parameter.
It's not necessarily the most elegant solution, and it is no longer a lazy, streaming solution... but it does work.
You could also write your own extension method using iterator blocks (yield return), which could give you better performance and use less memory than GroupBy. This is what the Batch() method of MoreLinq does, IIRC.
It's not at all what the original Linq designers had in mind, but check out this misuse of GroupBy:
public static IEnumerable<IEnumerable<T>> BatchBy<T>(this IEnumerable<T> items, int batchSize)
{
    var count = 0;
    return items.GroupBy(x => (count++ / batchSize)).ToList();
}

[TestMethod]
public void BatchBy_breaks_a_list_into_chunks()
{
    var values = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    var batches = values.BatchBy(3);

    batches.Count().ShouldEqual(4);
    batches.First().Count().ShouldEqual(3);
    batches.Last().Count().ShouldEqual(1);
}
I think it wins the "golf" prize for this question. The ToList is very important, since you want to make sure the grouping has actually been performed before you try doing anything with the output. If you remove the ToList, you will get some weird side effects, as sketched below.
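A sketch of those side effects (my illustration): the captured counter keeps incrementing across enumerations, so a second pass over the same un-materialized query groups the items differently.
var count = 0;
var values = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
var query = values.GroupBy(x => count++ / 3); // note: no ToList

// First enumeration: keys 0,0,0,1,1,1,2,2,2,3 -> first batch has 3 items.
Console.WriteLine(query.First().Count()); // 3

// Second enumeration: count resumes at 10, keys 3,3,4,4,4,5,5,5,6,6
// -> first batch now has only 2 items.
Console.WriteLine(query.First().Count()); // 2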
Take won't be very efficient, because it doesn't remove the entries taken.
Why not use a simple loop?
public static IEnumerable<IList<T>> Partition<T>(this /* <-- see extension methods */ IEnumerable<T> src, int num)
{
    IEnumerator<T> enu = src.GetEnumerator();
    while (true)
    {
        List<T> result = new List<T>(num);
        for (int i = 0; i < num; i++)
        {
            if (!enu.MoveNext())
            {
                if (i > 0) yield return result;
                yield break;
            }
            result.Add(enu.Current);
        }
        yield return result;
    }
}
// Note: this stripes the items by index (i % 8) rather than taking consecutive runs of 8.
from b in Enumerable.Range(0, 8)
select items.Where((x, i) => (i % 8) == b);
The simplest solution is given by Mel:
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items,
                                                       int partitionSize)
{
    int i = 0;
    return items.GroupBy(x => i++ / partitionSize).ToArray();
}
Concise, but slower. The above method splits an IEnumerable into chunks of a desired fixed size, with the total number of chunks being unimportant. To split an IEnumerable into N chunks of equal size (or close to equal), you could do:
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items,
                                                   int numOfParts)
{
    int i = 0;
    return items.GroupBy(x => i++ % numOfParts);
}
To speed things up, a straightforward approach would do:
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items,
                                                       int partitionSize)
{
    if (partitionSize <= 0)
        throw new ArgumentOutOfRangeException("partitionSize");

    int innerListCounter = 0;
    int numberOfPackets = 0;
    foreach (var item in items)
    {
        innerListCounter++;
        if (innerListCounter == partitionSize)
        {
            yield return items.Skip(numberOfPackets * partitionSize).Take(partitionSize);
            innerListCounter = 0;
            numberOfPackets++;
        }
    }
    if (innerListCounter > 0)
        yield return items.Skip(numberOfPackets * partitionSize);
}
This is faster than anything currently on the planet now :) The equivalent methods for a Split operation are here.

What does ExpressionVisitor.Visit<T> Do?

Before someone shouts out the answer, please read the question through.
What is the purpose of the method in .NET 4.0's ExpressionVisitor:
public static ReadOnlyCollection<T> Visit<T>(ReadOnlyCollection<T> nodes, Func<T, T> elementVisitor)
My first guess as to the purpose of this method was that it would visit each node in each tree specified by the nodes parameter and rewrite the tree using the result of the elementVisitor function.
This does not appear to be the case. Actually, this method appears to do little more than nothing, unless I'm missing something here, which I strongly suspect I am...
I tried to use this method in my code and, when things didn't work out as expected, I decompiled the method with Reflector and found:
public static ReadOnlyCollection<T> Visit<T>(ReadOnlyCollection<T> nodes, Func<T, T> elementVisitor)
{
    T[] list = null;
    int index = 0;
    int count = nodes.Count;
    while (index < count)
    {
        T objA = elementVisitor(nodes[index]);
        if (list != null)
        {
            // A copy has already been started; keep filling it.
            list[index] = objA;
        }
        else if (!object.ReferenceEquals(objA, nodes[index]))
        {
            // First change detected: copy the unchanged prefix, then
            // record the new element (copy-on-write).
            list = new T[count];
            for (int i = 0; i < index; i++)
            {
                list[i] = nodes[i];
            }
            list[index] = objA;
        }
        index++;
    }
    if (list == null)
    {
        // Nothing changed: return the original collection.
        return nodes;
    }
    return new TrueReadOnlyCollection<T>(list);
}
So where would someone actually go about using this method? What am I missing here?
Thanks.
It looks to me like a convenience method to apply an arbitrary transform function to a collection of expression tree nodes, returning the resulting transformed collection, or the original collection if nothing changed.
I can't see how this differs from the pattern of a standard expression visitor, except that instead of using a visitor type, it uses a function.
As for usage:
Expression<Func<int, int, int>> addLambdaExpression = (a, b) => a + b;

// Change add to subtract
Func<Expression, Expression> changeToSubtract = e =>
{
    if (e is BinaryExpression)
    {
        return Expression.Subtract((e as BinaryExpression).Left,
                                   (e as BinaryExpression).Right);
    }
    else
    {
        return e;
    }
};

var nodes = new Expression[] { addLambdaExpression.Body }.ToList().AsReadOnly();
var subtractExpression = ExpressionVisitor.Visit(nodes, changeToSubtract);
You don't explain how you expected it to behave, nor why you therefore think it does little more than nothing.
