Better version of Compare Extension for Linq - performance

I need to get the differences between two IEnumerable<T> sequences, so I wrote an extension method for it. But as you can see, it has a performance penalty. Can anyone write a better version of it?
EDIT
After the first response, I realize I did not explain this well: I'm visiting both collections three times, and that is the performance penalty. It has to be a single pass.
PS: Both is optional :)
public static class LinqExtensions
{
public static ComparisonResult<T> Compare<T>(this IEnumerable<T> source, IEnumerable<T> target)
{
// Looping three times is a performance penalty!
var res = new ComparisonResult<T>
{
OnlySource = source.Except(target),
OnlyTarget = target.Except(source),
Both = source.Intersect(target)
};
return res;
}
}
public class ComparisonResult<T>
{
public IEnumerable<T> OnlySource { get; set; }
public IEnumerable<T> OnlyTarget { get; set; }
public IEnumerable<T> Both { get; set; }
}

Depending on the use case, this might be more efficient:
public static ComparisonResult<T> Compare<T>(this IEnumerable<T> source, IEnumerable<T> target)
{
var both = source.Intersect(target).ToArray();
if (both.Any())
{
return new ComparisonResult<T>
{
OnlySource = source.Except(both),
OnlyTarget = target.Except(both),
Both = both
};
}
else
{
return new ComparisonResult<T>
{
OnlySource = source,
OnlyTarget = target,
Both = both
};
}
}

You're looking for an efficient full outer join.
Insert all items into a Dictionary<TKey, Tuple<TLeft, TRight>>. If a given key is not present, add it to the dictionary; if it is present, update the value. If the "left member" is set, the item is present in the left collection (you call it source); the opposite is true for the right member. You can do that with a single pass over each collection.
After that, you iterate over all values of this dictionary and output the respective items into one of three collections, or you just return an IEnumerable<Tuple<TLeft, TRight>>, which saves the need for separate result collections.
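Applied to the question's Compare<T> signature, here is a minimal sketch of that idea (an illustration, not the poster's code), assuming default equality for T and non-null elements; each input is enumerated exactly once and the three results are materialized from the dictionary:
public static ComparisonResult<T> CompareSinglePass<T>(this IEnumerable<T> source, IEnumerable<T> target)
{
    // For each distinct element, remember whether it was seen in source and/or target.
    var seen = new Dictionary<T, (bool InSource, bool InTarget)>();
    foreach (var item in source)
    {
        seen.TryGetValue(item, out var flags);
        seen[item] = (true, flags.InTarget);
    }
    foreach (var item in target)
    {
        seen.TryGetValue(item, out var flags);
        seen[item] = (flags.InSource, true);
    }
    return new ComparisonResult<T>
    {
        OnlySource = seen.Where(kv => kv.Value.InSource && !kv.Value.InTarget).Select(kv => kv.Key).ToList(),
        OnlyTarget = seen.Where(kv => !kv.Value.InSource && kv.Value.InTarget).Select(kv => kv.Key).ToList(),
        Both = seen.Where(kv => kv.Value.InSource && kv.Value.InTarget).Select(kv => kv.Key).ToList()
    };
}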

Related

Entity framework 6 code first Many to many insert slow

I am trying to find a way to improve insert performance with the following code (please read my questions after the code block):
//Domain classes
[Table("Products")]
public class Product
{
[Key]
[DatabaseGenerated(DatabaseGeneratedOption.Identity)]
public int Id { get; set; }
public string Sku { get; set; }
[ForeignKey("Orders")]
public virtual ICollection<Order> Orders { get; set; }
public Product()
{
Orders = new List<Order>();
}
}
[Table("Orders")]
public class Order
{
[Key]
[DatabaseGenerated(DatabaseGeneratedOption.Identity)]
public int Id { get; set; }
public string Title { get; set; }
public decimal Total { get; set; }
[ForeignKey("Products")]
public virtual ICollection<Product> Products { get; set; }
public Order()
{
Products = new List<Product>();
}
}
//Data access
public class MyDataContext : DbContext
{
public MyDataContext()
: base("MyDataContext")
{
Configuration.LazyLoadingEnabled = true;
Configuration.ProxyCreationEnabled = true;
Database.SetInitializer(new CreateDatabaseIfNotExists<MyDataContext>());
}
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
modelBuilder.Entity<Product>().ToTable("Products");
modelBuilder.Entity<Order>().ToTable("Orders");
}
}
//Service layer
public interface IServices<T, K>
{
T Create(T item);
T Read(K key);
IEnumerable<T> ReadAll(Expression<Func<IEnumerable<T>, IEnumerable<T>>> pre);
T Update(T item);
void Delete(K key);
void Save();
void Dispose();
void BatchSave(IEnumerable<T> list);
void BatchUpdate(IEnumerable<T> list, Action<UpdateSpecification<T>> spec);
}
public class BaseServices<T, K> : IDisposable, IServices<T, K> where T : class
{
protected MyDataContext Context;
public BaseServices()
{
Context = new MyDataContext();
}
public T Create(T item)
{
T created;
created = Context.Set<T>().Add(item);
return created;
}
public void Delete(K key)
{
var item = Read(key);
if (item == null)
return;
Context.Set<T>().Attach(item);
Context.Set<T>().Remove(item);
}
public T Read(K key)
{
T read;
read = Context.Set<T>().Find(key);
return read;
}
public IEnumerable<T> ReadAll(Expression<Func<IEnumerable<T>, IEnumerable<T>>> pre)
{
IEnumerable<T> read;
read = Context.Set<T>().ToList();
read = pre.Compile().Invoke(read);
return read;
}
public T Update(T item)
{
Context.Set<T>().Attach(item);
Context.Entry<T>(item).CurrentValues.SetValues(item);
Context.Entry<T>(item).State = System.Data.Entity.EntityState.Modified;
return item;
}
public void Save()
{
Context.SaveChanges();
}
}
public interface IOrderServices : IServices<Order, int>
{
//custom logic goes here
}
public interface IProductServices : IServices<Product, int>
{
//custom logic goes here
}
//Web project's controller
public ActionResult TestCreateProducts()
{
//Create 100 new rest products
for (int i = 0; i < 100; i++)
{
_productServices.Create(new Product
{
Sku = i.ToString()
});
}
_productServices.Save();
var products = _productServices.ReadAll(r => r); //get a list of saved products to add them to orders
var random = new Random();
var orders = new List<Order>();
var count = 0;
//Create 3000 orders
for (int i = 1; i <= 3000; i++)
{
//Generate a random list of products to attach to the current order
var productIds = new List<int>();
var x = random.Next(1, products.Count() - 1);
for (int j = 0; j < x; j++)
{
productIds.Add(random.Next(products.Min(r => r.Id), products.Max(r => r.Id)));
}
//Create the order
var order = new Order
{
Title = "Order" + i,
Total = i,
Products = products.Where(p => productIds.Contains(p.Id)).ToList()
};
orders.Add(order);
}
_orderServices.CreateRange(orders);
_orderServices.Save();
return RedirectToAction("Index");
}
This code works fine, but it is very, VERY slow when SaveChanges is executed.
Behind the scenes, the annotations on the domain objects create all the relationships needed: an OrderProducts table with the proper foreign keys is created automatically, and the inserts are done by EF properly.
I've tried many things with bulk inserts using EntityFramework.Utilities, SqlBulkCopy, etc... but none worked.
Is there a way to achieve this?
Understand this is only for testing purposes; my goal is to optimize as best I can any operation in our software that uses EF.
Thanks!
Just before you do your inserts, disable your context's AutoDetectChangesEnabled (by setting it to false). Do your inserts and then set AutoDetectChangesEnabled back to true, e.g.:
try
{
MyContext.Configuration.AutoDetectChangesEnabled = false;
// do your inserts updates etc..
}
finally
{
MyContext.Configuration.AutoDetectChangesEnabled = true;
}
You can find more information on what this is doing here.
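For example, a hypothetical BatchSave implementation for the BaseServices<T, K> class from the question might apply this (AddRange is covered in the next answer; this is just a sketch, not tested code):
public void BatchSave(IEnumerable<T> list)
{
    try
    {
        // Skip change tracking while adding many entities at once.
        Context.Configuration.AutoDetectChangesEnabled = false;
        Context.Set<T>().AddRange(list);
        Context.SaveChanges();
    }
    finally
    {
        // Restore the default behaviour for the rest of the context's lifetime.
        Context.Configuration.AutoDetectChangesEnabled = true;
    }
}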
I see two reasons why your code is slow.
Add vs. AddRange
You add entities one by one using the Create method.
You should always prefer AddRange over Add: the Add method triggers DetectChanges every time it is invoked, while AddRange triggers it only once.
You should add a "CreateRange" method to your code.
public IEnumerable<T> CreateRange(IEnumerable<T> list)
{
return Context.Set<T>().AddRange(list);
}
var products = new List<Product>();
//Create 100 new rest products
for (int i = 0; i < 100; i++)
{
products.Add(new Product { Sku = i.ToString() });
}
_productServices.CreateRange(products);
_productServices.Save();
Disabling/enabling the AutoDetectChangesEnabled property also works, as @mark_h proposed; however, personally I don't like this kind of solution.
Database Round Trip
A database round trip is required for every record to add, modify or delete. So if you insert 3,000 records, 3,000 database round trips will be required, which is VERY slow.
You already tried EntityFramework.BulkInsert or SqlBulkCopy, which is great. I recommend you first try them again with the "AddRange" fix to see the new performance.
Here is a (biased) comparison of libraries supporting bulk insert for EF:
Entity Framework - Bulk Insert Library Reviews & Comparisons
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library allows you to BulkSaveChanges, BulkInsert, BulkUpdate, BulkDelete and BulkMerge within your Database.
It supports all inheritances and associations.
// Easy to use
public void Save()
{
// Context.SaveChanges();
Context.BulkSaveChanges();
}
// Easy to customize
public void Save()
{
// Context.SaveChanges();
Context.BulkSaveChanges(bulk => bulk.BatchSize = 100);
}
EDIT: Added an answer to the sub-question:
An entity object cannot be referenced by multiple instances of IEntityChangeTracker
The issue happens because you use two different DbContext instances: one for the products and one for the orders.
You may find a better answer than mine in a different thread, such as this answer.
The Add method successfully attaches the product; subsequent calls with the same product don't throw an error because it's the same instance.
The AddRange method, however, attaches the product multiple times since it doesn't come from the same context, so when DetectChanges is called it doesn't know how to handle it.
One way to fix it is by re-using the same context:
var _productServices = new BaseServices<Product, int>();
var _orderServices = new BaseServices<Order, int>(_productServices.Context);
While it may not be elegant, the performance will be improved.

Linq Query Need - Looking for a pattern of data

Say I have a collection of the following simple class:
public class MyEntity
{
public string SubId { get; set; }
public System.DateTime ApplicationTime { get; set; }
public double? ThicknessMicrons { get; set; }
}
I need to search through the entire collection looking for 5 consecutive (not 5 total, but 5 consecutive) entities that have a null ThicknessMicrons value. Consecutiveness will be based on the ApplicationTime property. The collection will be sorted on that property.
How can I do this in a Linq query?
You can write your own extension method pretty easily:
public static IEnumerable<IEnumerable<T>> FindSequences<T>(this IEnumerable<T> sequence, Predicate<T> selector, int size)
{
List<T> curSequence = new List<T>();
foreach (T item in sequence)
{
// Check if this item matches the condition
if (selector(item))
{
// It does, so store it
curSequence.Add(item);
// Check if the list size has met the desired size
if (curSequence.Count == size)
{
// It did, so yield that list, and reset
yield return curSequence;
curSequence = new List<T>();
}
}
else
{
// No match, so reset the list
curSequence = new List<T>();
}
}
}
Now you can just say:
var groupsOfFive = entities.OrderBy(x => x.ApplicationTime)
.FindSequences(x => x.ThicknessMicrons == null, 5);
Note that this will return all sub-sequences of length 5. You can test for the existence of one like so:
bool isFiveSubsequence = groupsOfFive.Any();
Another important note is that if you have 9 consecutive matches, only one sub-sequence will be located.
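If you do need overlapping matches, a sliding-window variant along these lines would yield one sub-sequence per starting position (a sketch using the same Predicate<T>/size parameters, not tested against your data):
public static IEnumerable<IEnumerable<T>> FindOverlappingSequences<T>(this IEnumerable<T> sequence, Predicate<T> selector, int size)
{
    var window = new Queue<T>();
    foreach (T item in sequence)
    {
        if (selector(item))
        {
            window.Enqueue(item);
            if (window.Count == size)
            {
                // Snapshot the current window, then slide it forward by one.
                yield return window.ToArray();
                window.Dequeue();
            }
        }
        else
        {
            // A non-match breaks the run, so start over.
            window.Clear();
        }
    }
}
With this version, 9 consecutive matches produce 5 overlapping sub-sequences of length 5.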

LINQ recursion function?

Let's take this n-tier deep structure for example:
public class SomeItem
{
public Guid ID { get;set; }
public string Name { get; set; }
public bool HasChildren { get;set; }
public IEnumerable<SomeItem> Children { get; set; }
}
If I am looking to get a particular Item by ID (anywhere in the structure) is there some LINQ goodness I can use to easily get it in a single statement or do I have to use some recursive function as below:
private SomeItem GetSomeItem(IEnumerable<SomeItem> items, Guid ID)
{
foreach (var item in items)
{
if (item.ID == ID)
{
return item;
}
else if (item.HasChildren)
{
// Only return when the item is actually found in this subtree;
// otherwise keep searching the remaining siblings.
var found = GetSomeItem(item.Children, ID);
if (found != null)
{
return found;
}
}
}
return null;
}
LINQ doesn't really "do" recursion nicely. Your solution seems appropriate - although I'm not sure HasChildren is really required... why not just use an empty list for an item with no children?
An alternative is to write a DescendantsAndSelf method which will return all of the descendants (including the item itself), something like this;
// Warning: potentially expensive!
public IEnumerable<SomeItem> DescendantsAndSelf()
{
yield return this;
foreach (var item in Children.SelectMany(x => x.DescendantsAndSelf()))
{
yield return item;
}
}
However, if the tree is deep that ends up being very inefficient because each item needs to "pass through" all the iterators of its ancestry. Wes Dyer has blogged about this, showing a more efficient implementation.
Anyway, if you have a method like this (however it's implemented) you can just use a normal "where" clause to find an item (or First/FirstOrDefault etc).
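For example (assuming a root SomeItem and the DescendantsAndSelf method above; someId is just a placeholder):
// Returns null when no item in the tree has a matching ID.
var match = root.DescendantsAndSelf().FirstOrDefault(x => x.ID == someId);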
Here's one without recursion. This avoids the cost of passing through several layers of iterators, so I think it's about as efficient as they come.
public static IEnumerable<T> IterateTree<T>(this T root, Func<T, IEnumerable<T>> childrenF)
{
var q = new List<T>() { root };
while (q.Any())
{
var c = q[0];
q.RemoveAt(0);
q.AddRange(childrenF(c) ?? Enumerable.Empty<T>());
yield return c;
}
}
Invoke like so:
var subtree = root.IterateTree(x => x.Children).ToList();
Hope this helps:
public static IEnumerable<Control> GetAllChildControls(this Control parent)
{
foreach (Control child in parent.Controls)
{
yield return child;
if (child.HasChildren)
{
foreach (Control grandChild in child.GetAllChildControls())
yield return grandChild;
}
}
}
It is important to remember you don't need to do everything with LINQ, or default to recursion. There are interesting options when you use data structures. The following is a simple flattening function in case anyone is interested.
public static IEnumerable<SomeItem> Flatten(IEnumerable<SomeItem> items)
{
if (items == null || items.Count() == 0) return new List<SomeItem>();
var result = new List<SomeItem>();
var q = new Queue<SomeItem>(collection: items);
while (q.Count > 0)
{
var item = q.Dequeue();
result.Add(item);
if (item?.Children?.Count() > 0)
foreach (var child in item.Children)
q.Enqueue(child);
}
return result;
}
While there are extension methods that enable recursion in LINQ (and probably look like your function), none are provided out of the box.
Examples of these extension methods can be found here or here.
I'd say your function is fine.

Using Distinct with LINQ and Objects [closed]

Until recently, I was using a Distinct in LINQ to select a distinct category (an enum) from a table. This was working fine.
I now need to have it distinct on a class containing a category and country (both enums). The Distinct isn't working now.
What am I doing wrong?
I believe this post explains your problem:
http://blog.jordanterrell.com/post/LINQ-Distinct()-does-not-work-as-expected.aspx
The content of the above link can be summed up by saying that the Distinct() method can be replaced by doing the following.
var distinctItems = items
.GroupBy(x => x.PropertyToCompare)
.Select(x => x.First());
Try an IEqualityComparer:
public class MyObjEqualityComparer : IEqualityComparer<MyObj>
{
public bool Equals(MyObj x, MyObj y)
{
return x.Category.Equals(y.Category) &&
x.Country.Equals(y.Country);
}
public int GetHashCode(MyObj obj)
{
// Hash on the same fields used in Equals; returning the default object
// hash code would make every instance look unique and break Distinct.
return obj.Category.GetHashCode() ^ obj.Country.GetHashCode();
}
}
then use here
var comparer = new MyObjEqualityComparer();
myObjs.Where(m => m.SomeProperty == "whatever").Distinct(comparer);
You're not doing it wrong; it's just the default implementation of .Distinct() in the .NET Framework that doesn't behave the way you expect.
One way to fix it is already shown in the other answers, but there is also a shorter solution available, which has the advantage that you can use it as an extension method easily everywhere without having to tweak the object's hash values.
Take a look at this:
Usage:
var myQuery=(from x in Customers select x).MyDistinct(d => d.CustomerID);
Note: This example uses a database query, but it does also work with an enumerable object list.
Declaration of MyDistinct:
public static class Extensions
{
public static IEnumerable<T> MyDistinct<T, V>(this IEnumerable<T> query,
Func<T, V> f)
{
return query.GroupBy(f).Select(x=>x.First());
}
}
Or if you want it shorter, this is the same as above, but as "one-liner":
public static IEnumerable<T> MyDistinct<T, V>(this IEnumerable<T> query, Func<T, V> f)
=> query.GroupBy(f).Select(x => x.First());
And it works for everything, objects as well as entities. If required, you can create a second overloaded extension method for IQueryable<T> by just replacing the return type and first parameter type in the example I've given above.
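For instance, an IQueryable<T> overload might look like the sketch below; note that a LINQ provider needs an Expression<Func<T, V>> rather than a plain delegate so the key selector can be translated, and whether GroupBy(...).Select(g => g.First()) can be translated depends on the provider:
public static IQueryable<T> MyDistinct<T, V>(this IQueryable<T> query, Expression<Func<T, V>> keySelector)
    => query.GroupBy(keySelector).Select(g => g.First());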
Test data:
You can try it out with this test data:
List<A> GetData()
=> new List<A>()
{
new A() { X="1", Y="2" }, new A() { X="1", Y="2" },
new A() { X="2", Y="3" }, new A() { X="2", Y="3" },
new A() { X="1", Y="3" }, new A() { X="1", Y="3" },
};
class A
{
public string X;
public string Y;
}
Example:
void Main()
{
// returns duplicate rows:
GetData().Distinct().Dump();
// Gets distinct rows by i.X
GetData().MyDistinct(i => i.X).Dump();
}
For an explanation, take a look at the other answers; I'm just providing one way to handle this issue.
You might like this:
public class LambdaComparer<T>:IEqualityComparer<T>{
private readonly Func<T,T,bool> _comparer;
private readonly Func<T,int> _hash;
public LambdaComparer(Func<T,T,bool> comparer):
this(comparer,o=>0) {}
public LambdaComparer(Func<T,T,bool> comparer,Func<T,int> hash){
if(comparer==null) throw new ArgumentNullException("comparer");
if(hash==null) throw new ArgumentNullException("hash");
_comparer=comparer;
_hash=hash;
}
public bool Equals(T x,T y){
return _comparer(x,y);
}
public int GetHashCode(T obj){
return _hash(obj);
}
}
Usage:
public class Foo{
public string Fizz{get;set;}
public BarEnum Bar{get;set;}
}
public enum BarEnum {One,Two,Three}
var lst=new List<Foo>();
lst.Distinct(new LambdaComparer<Foo>(
(x1,x2)=>x1.Fizz==x2.Fizz&&
x1.Bar==x2.Bar));
You can even wrap it up to avoid writing the noisy new LambdaComparer<T>(...) part:
public static class EnumerableExtensions{
public static IEnumerable<T> SmartDistinct<T>
(this IEnumerable<T> lst, Func<T, T, bool> pred){
return lst.Distinct(new LambdaComparer<T>(pred));
}
}
Usage:
lst.SmartDistinct((x1,x2)=>x1.Fizz==x2.Fizz&&x1.Bar==x2.Bar);
NB: works reliably only for Linq2Objects
I know this is an old question, but I am not satisfied with any of the answers. I took time to figure this out for myself and I wanted to share my findings.
First it is important to read and understand these two things:
IEqualityComparer
EqualityComparer
Long story short: in order to make the .Distinct() extension understand how to determine equality for your object, you must define an "EqualityComparer" for your object T. The Microsoft docs literally state:
We recommend that you derive from the EqualityComparer class
instead of implementing the IEqualityComparer interface...
That is how you determine what to use; it has already been decided for you.
For the .Distinct() extension to work successfully, you must ensure that your objects can be compared accurately. In the case of .Distinct(), the GetHashCode() method is what really matters.
You can test this out for yourself by writing a GetHashCode() implementation that just returns the default hash code of the object being passed in; the results are bad because that value differs for every instance. That makes your objects look too unique, which is why it is important to write a proper implementation of this method.
Below is a copy of the code sample from the IEqualityComparer<T> documentation page, with test data, a small modification to the GetHashCode() method, and comments to demonstrate the point.
//Did this in LinqPad
void Main()
{
var lst = new List<Box>
{
new Box(1, 1, 1),
new Box(1, 1, 1),
new Box(1, 1, 1),
new Box(1, 1, 1),
new Box(1, 1, 1)
};
//Demonstration that the hash code for each object is fairly
//random and won't help you for getting a distinct list
lst.ForEach(x => Console.WriteLine(x.GetHashCode()));
//Demonstration that if your EqualityComparer is setup correctly
//then you will get a distinct list
lst = lst
.Distinct(new BoxEqualityComparer())
.ToList();
lst.Dump();
}
public class Box
{
public Box(int h, int l, int w)
{
this.Height = h;
this.Length = l;
this.Width = w;
}
public int Height { get; set; }
public int Length { get; set; }
public int Width { get; set; }
public override String ToString()
{
return String.Format("({0}, {1}, {2})", Height, Length, Width);
}
}
public class BoxEqualityComparer
: EqualityComparer<Box>
{
public override bool Equals(Box b1, Box b2)
{
if (b2 == null && b1 == null)
return true;
else if (b1 == null || b2 == null)
return false;
else if (b1.Height == b2.Height && b1.Length == b2.Length
&& b1.Width == b2.Width)
return true;
else
return false;
}
public override int GetHashCode(Box bx)
{
#region This works
//In this example each component of the box object is XOR'd together
int hCode = bx.Height ^ bx.Length ^ bx.Width;
//The hashcode of an integer, is that same integer
return hCode.GetHashCode();
#endregion
#region This won't work
//Comment the above lines and uncomment this line below if you want to see Distinct() not work
//return bx.GetHashCode();
#endregion
}
}

Why predicate isn't filtering when building it via reflection

I'm building a rather large filter based on a SearchObject that has 50+ fields that can be searched.
Rather than building my where clause for each one of these individually, I thought I'd use some sleight of hand and try building a custom attribute supplying the necessary information, then using reflection to build out each of my predicate statements (using LinqKit, btw). Trouble is, the code finds the appropriate values via reflection and successfully builds a predicate for the property, but the "where" doesn't seem to actually apply and my query always returns 0 records.
The attribute is simple:
[AttributeUsage(AttributeTargets.Property, AllowMultiple=true)]
public class FilterAttribute: Attribute
{
public FilterType FilterType { get; set; } //enum{ Object, Database}
public string FilterPath { get; set; }
//var predicate = PredicateBuilder.False<Metadata>();
}
And this is my method that builds out the query:
public List<ETracker.Objects.Item> Search(Search SearchObject, int Page, int PageSize)
{
var predicate = PredicateBuilder.False<ETracker.Objects.Item>();
Type t = typeof(Search);
IEnumerable<PropertyInfo> pi = t.GetProperties();
string title = string.Empty;
foreach (var property in pi)
{
if (Attribute.IsDefined(property, typeof(FilterAttribute)))
{
var attrs = property.GetCustomAttributes(typeof(FilterAttribute),true);
var value = property.GetValue(SearchObject, null);
if (property.Name == "Title")
title = (string)value;
predicate.Or(a => GetPropertyVal(a, ((FilterAttribute)attrs[0]).FilterPath) == value);
}
}
var res = dataContext.GetAllItems().Take(1000)
.Where(a => SearchObject.Subcategories.Select(b => b.ID).ToArray().Contains(a.SubCategory.ID))
.Where(predicate);
return res.ToList();
}
The SearchObject is quite simple:
public class Search
{
public List<Item> Items { get; set; }
[Filter(FilterType = FilterType.Object, FilterPath = "Title")]
public string Title { get; set; }
...
}
Any suggestions will be greatly appreciated. I may well be going in the wrong direction and will take no offense if someone has a better alternative (or at least one that works).
You're not assigning your predicate anywhere. Change the line to this:
predicate = predicate.Or(a => GetPropertyVal(a, ((FilterAttribute)attrs[0]).FilterPath) == value);
