I wrote a simple import/export application that transforms data from source to destination using Entity Framework and AutoMapper. It basically:
selects batchSize records from the source table
'maps' the data from source entity to destination entity
adds the new destination entities to the destination table and saves the context
With the non-generic code I move around 500k records in under 5 minutes. After refactoring the code to use generics, performance drops drastically: 250 records in 5 minutes.
Are my delegates that return DbSet<T> properties on the DbContext causing these problems? Or is something else going on?
Fast non-generic code:
public class Importer
{
public void ImportAddress()
{
const int batchSize = 50;
int done = 0;
var src = new SourceDbContext();
var count = src.Addresses.Count();
while (done < count)
{
using (var dest = new DestinationDbContext())
{
var list = src.Addresses.OrderBy(x => x.AddressId).Skip(done).Take(batchSize).ToList();
list.ForEach(x => dest.Address.Add(Mapper.Map<Addresses, Address>(x)));
done += batchSize;
dest.SaveChanges();
}
}
src.Dispose();
}
}
(Very) slow generic code:
public class Importer<TSourceContext, TDestinationContext>
where TSourceContext : DbContext
where TDestinationContext : DbContext
{
public void Import<TSourceEntity, TSourceOrder, TDestinationEntity>(
    Func<TSourceContext, DbSet<TSourceEntity>> getSourceSet,
    Func<TDestinationContext, DbSet<TDestinationEntity>> getDestinationSet,
    Func<TSourceEntity, TSourceOrder> getOrderBy)
where TSourceEntity : class
where TDestinationEntity : class
{
const int batchSize = 50;
int done = 0;
var ctx = Activator.CreateInstance<TSourceContext>();
//Does this getSourceSet delegate cause problems perhaps?
//Added this
var set = getSourceSet(ctx);
var count = set.Count();
while (done < count)
{
using (var dctx = Activator.CreateInstance<TDestinationContext>())
{
var list = set.OrderBy(getOrderBy).Skip(done).Take(batchSize).ToList();
//Or is the db-side paging mechanism broken by the getSourceSet delegate?
//Added this
var destSet = getDestinationSet(dctx);
list.ForEach(x => destSet.Add(Mapper.Map<TSourceEntity, TDestinationEntity>(x)));
done += batchSize;
dctx.SaveChanges();
}
}
ctx.Dispose();
}
}
The problem is the OrderBy(getOrderBy) call. Because getOrderBy is a Func<TSourceEntity, TSourceOrder> rather than an Expression<Func<TSourceEntity, TSourceOrder>>, overload resolution picks Enumerable.OrderBy instead of Queryable.OrderBy. That silently switches the query to LINQ to Objects: every batch pulls the entire source table into memory, sorts it there, and only then applies Skip/Take. So yes, database-side paging is exactly what the delegate breaks; caching the DbSet delegate results (your //Added this lines) is fine and not the issue.
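Here is a sketch of the corrected generic method (assuming an EF6-style DbContext and the static AutoMapper API used above); the essential change is that getOrderBy becomes an expression so ordering and paging can be translated to SQL:

using System;
using System.Data.Entity;
using System.Linq;
using System.Linq.Expressions;
using AutoMapper;

public class Importer<TSourceContext, TDestinationContext>
    where TSourceContext : DbContext, new()
    where TDestinationContext : DbContext, new()
{
    public void Import<TSourceEntity, TSourceOrder, TDestinationEntity>(
        Func<TSourceContext, DbSet<TSourceEntity>> getSourceSet,
        Func<TDestinationContext, DbSet<TDestinationEntity>> getDestinationSet,
        Expression<Func<TSourceEntity, TSourceOrder>> getOrderBy) // Expression, not Func
        where TSourceEntity : class
        where TDestinationEntity : class
    {
        const int batchSize = 50;
        int done = 0;
        using (var ctx = new TSourceContext())
        {
            var set = getSourceSet(ctx);
            var count = set.Count();
            while (done < count)
            {
                using (var dctx = new TDestinationContext())
                {
                    // Queryable.OrderBy/Skip/Take now compose into a single
                    // SQL query per batch instead of materializing the table.
                    var list = set.OrderBy(getOrderBy).Skip(done).Take(batchSize).ToList();
                    var destSet = getDestinationSet(dctx);
                    list.ForEach(x => destSet.Add(Mapper.Map<TSourceEntity, TDestinationEntity>(x)));
                    done += batchSize;
                    dctx.SaveChanges();
                }
            }
        }
    }
}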
I have some very simple code that I'm trying to get running marginally quicker (there are a lot of these small calls dotted around the code, which seems to be slowing things down) using LINQ instead of standard code.
The problem is this: I have a variable outside of the LINQ query to which the result of the query needs to be added.
The original code looks like this:
double total = 0;
foreach(Crop c in p.Crops)
{
if (c.CropType.Type == t.Type)
total += c.Area;
}
return total;
This method isn't slow until the loop starts getting large; then it slows down on the phone. Can this sort of code be moved to a relatively quick and simple piece of LINQ?
Looks like you could use Sum (edit: my original syntax was wrong):
total = (from c in p.Crops
where c.CropType.Type == t.Type
select c.Area).Sum();
Or in extension method format:
total = p.Crops.Where(c => c.CropType.Type == t.Type).Sum(c => c.Area);
As for those saying LINQ won't perform better: where is the evidence? (The benchmark below is based on a post from Hanselman, I believe.) I ran the following in LINQPad (you will need to download and reference NBuilder to get it to run):
void Main()
{
//Nbuilder is used to create a chunk of sample data
//http://nbuilder.org
var crops = Builder<Crop>.CreateListOfSize(1000000).Build();
var t = new Crop();
t.Type = Type.grain;
double total = 0;
var sw = new Stopwatch();
sw.Start();
foreach(Crop c in crops)
{
if (c.Type == t.Type)
total += c.area;
}
sw.Stop();
total.Dump("For Loop total:");
sw.ElapsedMilliseconds.Dump("For Loop Elapsed Time:");
sw.Restart();
var result = crops.Where(c => c.Type == t.Type).Sum(c => c.area);
sw.Stop();
result.Dump("LINQ total:");
sw.ElapsedMilliseconds.Dump("LINQ Elapsed Time:");
sw.Restart();
var result2 = (from c in crops
where c.Type == t.Type
select c.area).Sum();
result2.Dump("LINQ (sugar syntax) total:");
sw.ElapsedMilliseconds.Dump("LINQ (sugar syntax) Elapsed Time:");
}
public enum Type
{
wheat,
grain,
corn,
maize,
cotton
}
public class Crop
{
public string Name { get; set; }
public Type Type { get; set; }
public double area;
}
The results come out in favor of LINQ:
For Loop total: 99999900000
For Loop Elapsed Time: 25
LINQ total: 99999900000
LINQ Elapsed Time: 17
LINQ (sugar syntax) total: 99999900000
LINQ (sugar syntax) Elapsed Time: 17
The main way to optimize this would be changing p, which may or may not be possible.
Assuming p is a P, and looks something like this:
internal sealed class P
{
private readonly List<Crop> mCrops = new List<Crop>();
public IEnumerable<Crop> Crops { get { return mCrops; } }
public void Add(Crop pCrop)
{
mCrops.Add(pCrop);
}
}
(If p is a .NET type like a List<Crop>, then you can create a class like this.)
You can optimize your loop by maintaining a dictionary:
internal sealed class P
{
private readonly List<Crop> mCrops = new List<Crop>();
private readonly Dictionary<Type, List<Crop>> mCropsByType
= new Dictionary<Type, List<Crop>>();
public IEnumerable<Crop> Crops { get { return mCrops; } }
public void Add(Crop pCrop)
{
if (!mCropsByType.ContainsKey(pCrop.CropType.Type))
mCropsByType.Add(pCrop.CropType.Type, new List<Crop>());
mCropsByType[pCrop.CropType.Type].Add(pCrop);
mCrops.Add(pCrop);
}
public IEnumerable<Crop> GetCropsByType(Type pType)
{
return mCropsByType.ContainsKey(pType)
? mCropsByType[pType]
: Enumerable.Empty<Crop>();
}
}
Your code then becomes something like:
double total = 0;
foreach(Crop crop in p.GetCropsByType(t.Type))
total += crop.Area;
return total;
Another possibility that would be even faster is:
internal sealed class P
{
private readonly List<Crop> mCrops = new List<Crop>();
private double mTotalArea;
public IEnumerable<Crop> Crops { get { return mCrops; } }
public double TotalArea { get { return mTotalArea; } }
public void Add(Crop pCrop)
{
mCrops.Add(pCrop);
mTotalArea += pCrop.Area;
}
}
Your code would then simply access the TotalArea property and you wouldn't even need a loop:
return p.TotalArea;
You might also consider extracting the code that manages the Crops data to a separate class, depending on what P is.
This is a pretty straightforward sum, so I doubt you will see any benefit from using LINQ.
You haven't told us much about the setup here, but here's an idea. If p.Crops is large and only a small number of the items in the sequence are of the desired type, you could build another sequence that contains just the items you need.
I assume that you know the type when you insert into p.Crops. If that's the case you could easily insert the relevant items in another collection and use that instead for the sum loop. That will reduce N and get rid of the comparison. It will still be O(N) though.
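Sketching that idea (the wrapper class and names are mine, reusing the Crop/CropType shapes from the question, and assuming the type of interest is known when items are added):

internal sealed class FilteredCrops
{
    private readonly List<Crop> mCrops = new List<Crop>();
    private readonly List<Crop> mMatching = new List<Crop>();
    private readonly Type mType;

    public FilteredCrops(Type pType)
    {
        mType = pType;
    }

    public IEnumerable<Crop> Crops { get { return mCrops; } }

    public void Add(Crop pCrop)
    {
        mCrops.Add(pCrop);
        // Filter once, at insert time, instead of on every sum.
        if (pCrop.CropType.Type == mType)
            mMatching.Add(pCrop);
    }

    public double MatchingArea()
    {
        // Still O(N), but N is now only the matching items and
        // the per-item type comparison is gone.
        double total = 0;
        foreach (Crop c in mMatching)
            total += c.Area;
        return total;
    }
}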
I have an MVC application with a dynamic table on one of the pages, in which the user defines how many columns the table has, the column order, and where to get the data for each field.
I have written some very bad code in order to keep it dynamic and now I would like it to be more efficient.
My problem is that I don't know how to define, at runtime, which columns I should get back into my IEnumerable. My main issue is that I don't know how many columns I might have.
I have a reference to a class which gets the field's text. I also have a dictionary mapping each field's order to the exact property it should get the data from.
My code should look something like that:
var docsRes3 = from d in docs
select new[]
{
for (int i=0; i<numOfCols; i++)
{
gen.getFieldText(d, res.FieldSourceDic[i]);
}
};
where:
docs = List from which I would like to get only specific fields
res.FieldSourceDic = Dictionary in which the key is the order of the column and the value is the property
gen.getFieldText = The function which gets the entity and the property and returns the value
Obviously, it doesn't work.
I also tried
StringBuilder fieldsSB = new StringBuilder();
for (int i = 0; i < numOfCols; i++)
{
string field = "d." + res.FieldSourceDic[i] + ".ToString()";
if (!string.IsNullOrEmpty(fieldsSB.ToString()))
{
fieldsSB.Append(",");
}
fieldsSB.Append(field);
}
var docsRes2 = from d in docs
select new[] { fieldsSB.ToString() };
It also didn't work.
The only thing that worked for me so far was:
List<string[]> docsRes = new List<string[]>();
foreach (NewOriginDocumentManagment d in docs)
{
string[] row = new string[numOfCols];
for (int i = 0; i < numOfCols; i++)
{
row[i] = gen.getFieldText(d, res.FieldSourceDic[i]);
}
docsRes.Add(row);
}
Any idea how can I pass the linq the list of fields and it'll cut the needed data out of it efficiently?
Thanks, hope I was clear about what I need.
Try the following:
var docsRes3 = from d in docs
select (
from k in res.FieldSourceDic.Keys.Take(numOfCols)
select gen.getFieldText(d, res.FieldSourceDic[k]));
I got my answer with some help from the following link:
http://www.codeproject.com/Questions/141367/Dynamic-Columns-from-List-using-LINQ
First I created a string array of all properties:
//Creates a string of all properties as defined in the XML
//Column order must start at 0. No skips are allowed
StringBuilder fieldsSB = new StringBuilder();
for (int i = 0; i < numOfCols; i++)
{
string field = res.FieldSourceDic[i];
if (!string.IsNullOrEmpty(fieldsSB.ToString()))
{
fieldsSB.Append(",");
}
fieldsSB.Append(field);
}
var cols = fieldsSB.ToString().Split(',');
//Gets the data for each row dynamically
var docsRes = docs.Select(d => GetProps(d, cols));
Then I created the GetProps function, which uses my own function as described in the question:
private static dynamic GetProps(object d, IEnumerable<string> props)
{
if (d == null)
{
return null;
}
DynamicGridGenerator gen = new DynamicGridGenerator();
List<string> res = new List<string>();
foreach (var p in props)
{
res.Add(gen.getFieldText(d, p));
}
return res;
}
I have a DataTable with 100,000+ DataRows. Which method is faster for accessing the collection?
Is there any faster way to process the rows collection?
Method 1:
var rows= dsDataSet.Tables["dtTableName"].Rows;
int rowCount = dsDataSet.Tables["dtTableName"].Rows.Count;
for (int c = 0; c < rowCount; c++)
{
var theRow = rows[c];
//process the dataRow
}
Method 2:
for (int c = 0; c < dsDataSet.Tables["dtTableName"].Rows.Count; c++)
{
var theRow = dsDataSet.Tables["dtTableName"].Rows[c];
//process the dataRow
}
It is worth noting the most direct way to access cells is via the DataColumn indexer; the data is actually stored in the columns, not the rows (no: really).
So something like:
var table = dataSet.Tables["dtTableName"];
// HERE: fetch the DataColumn of those you need, for example:
var idCol = table.Columns["Id"];
var nameCol = table.Columns["Name"];
// now loop
foreach(DataRow row in table.Rows)
{
var id = (int)row[idCol];
var name = (string)row[nameCol];
// ...
}
However, frankly if you want the best performance, I would start by saying "don't use DataSet / DataTable". That is actually a very complicated model designed to be all kinds of flexible, with change tracking, rule enforcement, etc. If you want fast, I'd use a POCO and something like "dapper", for example:
public class Foo {
public int Id {get;set;}
public string Name {get;set;}
}
...
string region = "North";
foreach(var row in conn.Query<Foo>("select * from [Foo] where Region = @region",
new { region })) // <=== simple but correct parameterisation
{
// TODO: do something with row.Id and row.Name, which are direct
// properties of the Foo row returned
var id = row.Id;
var name = row.Name;
// ...
}
or even skip the type via dynamic:
string region = "North";
foreach(var row in conn.Query("select * from [Foo] where Region = @region",
new { region })) // ^^^ note no <Foo> here
{
// here "row" is dynamic, but still works; not quite as direct as a
// POCO object, though
int id = row.Id; // <=== note we can't use `var` here or the
string name = row.Name; // variables would themselves be "dynamic"
// ...
}
I'm new to the Visual Studio Unit Testing Framework. I've dabbled a little in XUnit, though (DUnit to be specific).
I don't know why the following tests are failing. Based on my C# code (exhibit A), I would think my tests (exhibit B) would pass with the proverbial flying colors.
[EXHIBIT A - Pertinent code]
public class MessageClass
{
private int _messageTypeCode = 0;
private int _messageTypeSubcode;
private int _messageSequenceNumber;
private string _messageText;
public MessageClass()
{
this._messageTypeCode = 0;
this._messageTypeSubcode = 0;
this._messageSequenceNumber = 0;
this._messageText = string.Empty;
}
public void SetMessageTypeSubcode(int AMessageTypeSubcode)
{
int iMsgTypeSubCode = AMessageTypeSubcode;
if (iMsgTypeSubCode > 9999)
{
iMsgTypeSubCode = 9999;
}
else if (iMsgTypeSubCode < 0)
{
iMsgTypeSubCode = 42;
}
_messageTypeSubcode = AMessageTypeSubcode;
}
public int MessageTypeSubcode()
{
return _messageTypeSubcode;
}
}
[EXHIBIT B - Test code in the corresponding MessageClassTest]
[TestMethod()]
public void SetMessageTypeSubcodeTest()
{
int AMessageTypeSubcode;
// Should I put this class instantiation in MyTestInitialize?
MessageClass target = new MessageClass();
// Test 1
AMessageTypeSubcode = 0;
target.SetMessageTypeSubcode(AMessageTypeSubcode);
Assert.AreEqual(AMessageTypeSubcode, target.MessageTypeSubcode());
// Test 2 - 10000 is too much
AMessageTypeSubcode = 12345;
target.SetMessageTypeSubcode(AMessageTypeSubcode);
Assert.AreEqual(9999, target.MessageTypeSubcode());
// Test 3 - val must be positive
AMessageTypeSubcode = -77;
target.SetMessageTypeSubcode(AMessageTypeSubcode);
Assert.AreEqual(42, target.MessageTypeSubcode());
}
... It is failing on the second test. Having set the value higher than the cutoff (9999), _messageTypeSubcode should be assigned 9999 rather than 12345.
As I said, I'm new to Visual Studio Unit Testing Framework; is it not possible to have more than one test in a TestMethod? Or do I need to do something like call flush() or finish() or close() or reset() or something?
The test is failing because it should fail. (Multiple asserts in one TestMethod are fine, by the way; execution simply stops at the first assert that fails, so there is nothing to flush or reset.) Your method is incorrect:
_messageTypeSubcode = AMessageTypeSubcode;
Should be:
_messageTypeSubcode = iMsgTypeSubCode;
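For clarity, here is the corrected method in full:

public void SetMessageTypeSubcode(int AMessageTypeSubcode)
{
    int iMsgTypeSubCode = AMessageTypeSubcode;
    if (iMsgTypeSubCode > 9999)
    {
        iMsgTypeSubCode = 9999;
    }
    else if (iMsgTypeSubCode < 0)
    {
        iMsgTypeSubCode = 42;
    }
    // Assign the clamped local value, not the raw parameter.
    _messageTypeSubcode = iMsgTypeSubCode;
}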
I faced a rather stupid performance issue in my code. After a small investigation, I found that the AsQueryable method I used to cast my generic list slows the code down by up to 8000 times.
So the question is: why is that?
Here is the example
class Program
{
static void Main(string[] args)
{
var c = new ContainerTest();
c.FillList();
var s = Environment.TickCount;
for (int i = 0; i < 10000; ++i)
{
c.TestLinq(true);
}
var e = Environment.TickCount;
Console.WriteLine("TestLinq AsQueryable - {0}", e - s);
s = Environment.TickCount;
for (int i = 0; i < 10000; ++i)
{
c.TestLinq(false);
}
e = Environment.TickCount;
Console.WriteLine("TestLinq as List - {0}", e - s);
Console.WriteLine("Press enter to finish");
Console.ReadLine();
}
}
class ContainerTest
{
private readonly List<int> _list = new List<int>();
private IQueryable<int> _q;
public void FillList()
{
_list.Clear();
for (int i = 0; i < 10; ++i)
{
_list.Add(i);
}
_q = _list.AsQueryable();
}
public Tuple<int, int> TestLinq(bool useAsQ)
{
var upperBorder = useAsQ ? _q.FirstOrDefault(i => i > 7) : _list.FirstOrDefault(i => i > 7);
var lowerBorder = useAsQ ? _q.TakeWhile(i => i < 7).LastOrDefault() : _list.TakeWhile(i => i < 7).LastOrDefault();
return new Tuple<int, int>(upperBorder, lowerBorder);
}
}
UPDATE: As I understand it, I have to avoid the AsQueryable method as much as possible (unless IQueryable is already part of the container's inheritance line), because otherwise I'll immediately get a performance issue.
"and avoid the moor in those hours of darkness when the powers of evil are exalted"
Just faced the same issue.
The thing is that IQueryable<T> takes an Expression<Func<T, bool>> as the filtering parameter in Where()/FirstOrDefault() calls, as opposed to the pre-compiled Func<T, bool> delegate taken by the corresponding methods on plain IEnumerable<T>.
That means there will be a compile phase at runtime to transform the Expression into a delegate, and this costs quite a lot.
Now, you need that in a loop (just as I did)? You'll get into some trouble...
PS: It seems .NET Core/.NET 5 improves this significantly. Unfortunately, our projects are not there yet...
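If the IQueryable path can't be avoided around a hot loop, one workaround (my sketch, not part of the original post) is to build the expression once and cache its compiled delegate, so the compile cost is paid a single time:

using System;
using System.Linq;
using System.Linq.Expressions;

static class CachedPredicate
{
    // Built once; Compile() walks the expression tree and emits a delegate.
    private static readonly Expression<Func<int, bool>> AboveSevenExpr = i => i > 7;

    // Cached compiled form; reuse this inside loops instead of recompiling.
    private static readonly Func<int, bool> AboveSeven = AboveSevenExpr.Compile();

    public static int FirstAboveSeven(IQueryable<int> q)
    {
        // AsEnumerable() drops back to LINQ to Objects, so the cached
        // delegate is invoked directly and no per-call compilation happens.
        return q.AsEnumerable().FirstOrDefault(AboveSeven);
    }
}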
At least use LINQ with the List too; a manual implementation will always be faster than LINQ.
EDIT
You do know that the two tests don't give the same result?
Because AsQueryable returns an IQueryable, which has a completely different set of extension methods for the LINQ standard query operators than the set intended for plain collections like List<T>.
Queryable collections are meant to have a backing store such as an RDBMS or something similar, and you are building a different, more complex expression tree when you call IQueryable.FirstOrDefault() as opposed to List<T>.FirstOrDefault().
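To make the overload difference concrete, here is a small illustration (variable names are mine):

using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 1, 2, 3 };
IQueryable<int> query = list.AsQueryable();

// Binds to Enumerable.FirstOrDefault(source, Func<int, bool>):
// the C# compiler emits the lambda as a delegate once, at compile time.
var fromList = list.FirstOrDefault(i => i > 1);

// Binds to Queryable.FirstOrDefault(source, Expression<Func<int, bool>>):
// the lambda becomes an expression tree that the LINQ to Objects provider
// must compile back to a delegate at runtime, on every execution.
var fromQuery = query.FirstOrDefault(i => i > 1);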