Comparing two collections - linq

i have what seems like a common problem / pattern. two collections of the same object. The object has a number of properties and some nested objects within it. Car has a property called id which is the unique identifier.
I want to find the LINQ way to do a diff, which includes:
Items in one collection and not the other (visa versa)
For the items that match, are there any changes (changes would be a comparison of all properties? (i only care about settable properties, would i use reflection for this ?? )

You can use the Enumerable.Except() method. This uses a comparer (either a default or one you supply) to evaluate which objects are in both sequences or just one:
var sequenceA = new[] { "a", "e", "i", "o", "u" };
var sequenceB = new[] { "a", "b", "c" };
var sequenceDiff = sequenceA.Except( sequenceB );
If you want to perform a complete disjunction of both sequences (A-B) union (B-A), you would have to use:
var sequenceDiff =
sequenceA.Except( sequenceB ).Union( sequenceB.Except( sequenceA ) );
If you have a complex type, you can write an IComparer<T> for your type T and use the overload that accepts the comparer.
For the second part of your question, you would need to roll your own implementation to report which properties of a type are different .. there's nothing built into the .NET BCL directly. You have to decide what form this reporting would take? How would you identify and express differences in a complex type? You could certainly use reflection for this ... but if you're only dealing with a single type I would avoid that, and write a specialized differencing utility just for it. If yo're going to support a borad range of types, then reflection may make more sense.

You've already received an excellent answer for your first half. The second half, as LBushkin explains, cannot be done by BCL classes directly. Here's a simple method that goes through all public settable properties (note: it is possible that the gettor, in these cases, is not public!) and compares them one by one. If two objects are 100% equal, it will return true. Else, it will break out early and return false:
static bool AllSettablePropertiesEqual<T>(T obj1, T obj2)
{
PropertyInfo[] info1 = obj1.GetType().GetProperties(
BindingFlags.Public |
BindingFlags.SetProperty |
BindingFlags.Instance); // get public properties
PropertyInfo[] info2 = obj2.GetType().GetProperties(
BindingFlags.Public |
BindingFlags.SetProperty |
BindingFlags.Instance); // get public properties
// a loop is easier than linq here, and we can break out quick:
for (var i = 0; i < info1.Length; i++)
{
var value1 = info1[i].GetValue(obj1, null);
var value2 = info2[i].GetValue(obj2, null)
if(value1 == null || value2 ==null)
{
if(value1 != value2)
return false;
}
else if (!value1.Equals(value2))
{
return false;
}
}
return true;
}
You could easily add this method to a standard LINQ expression, like this:
var reallyReallyEqual = from itemA in listA
join itemB in listB
on AllSettablePropertiesEqual(itemA, itemB)
select itemA;

Related

Most efficient way to determine if there are any differences between specific properties of 2 lists of items?

In C# .NET 4.0, I am struggling to come up with the most efficient way to determine if the contents of 2 lists of items contain any differences.
I don't need to know what the differences are, just true/false whether the lists are different based on my criteria.
The 2 lists I am trying to compare contain FileInfo objects, and I want to compare only the FileInfo.Name and FileInfo.LastWriteTimeUtc properties of each item. All the FileInfo items are for files located in the same directory, so the FileInfo.Name values will be unique.
To summarize, I am looking for a single Boolean result for the following criteria:
Does ListA contain any items with FileInfo.Name not in ListB?
Does ListB contain any items with FileInfo.Name not in ListA?
For items with the same FileInfo.Name in both lists, are the FileInfo.LastWriteTimeUtc values different?
Thank you,
Kyle
I would use a custom IEqualityComparer<FileInfo> for this task:
public class FileNameAndLastWriteTimeUtcComparer : IEqualityComparer<FileInfo>
{
public bool Equals(FileInfo x, FileInfo y)
{
if(Object.ReferenceEquals(x, y)) return true;
if (x == null || y == null) return false;
return x.FullName.Equals(y.FullName) && x.LastWriteTimeUtc.Equals(y.LastWriteTimeUtc);
}
public int GetHashCode(FileInfo fi)
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
hash = hash * 23 + fi.FullName.GetHashCode();
hash = hash * 23 + fi.LastWriteTimeUtc.GetHashCode();
return hash;
}
}
}
Now you can use a HashSet<FileInfo> with this comparer and HashSet<T>.SetEquals:
var comparer = new FileNameAndLastWriteTimeUtcComparer();
var uniqueFiles1 = new HashSet<FileInfo>(list1, comparer);
bool anyDifferences = !uniqueFiles1.SetEquals(list2);
Note that i've used FileInfo.FullName instead of Name since names aren't unqiue at all.
Sidenote: another advantage is that you can use this comparer for many LINQ methods like GroupBy, Except, Intersect or Distinct.
This is not the most efficient way (probably ranks a 4 out of 5 in the quick-and-dirty category):
var comparableListA = ListA.Select(a =>
new { Name = a.Name, LastWrite = a.LastWriteTimeUtc, Object = a});
var comparableListB = ListB.Select(b =>
new { Name = b.Name, LastWrite = b.LastWriteTimeUtc, Object = b});
var diffList = comparableListA.Except(comparableListB);
var youHaveDiff = diffList.Any();
Explanation:
Anonymous classes are compared by property values, which is what you're looking to do, which led to my thinking of doing a LINQ projection along those lines.
P.S.
You should double check the syntax, I just rattled this off without the compiler.

Item-by-item list comparison, updating each item with its result (no third list)

The solutions I have found so far in my research on comparing lists of objects have usually generated a new list of objects, say of those items existing in one list, but not in the other. In my case, I want to compare two lists to discover the items whose key exists in one list and not the other (comparing both ways), and for those keys found in both lists, checking whether the value is the same or different.
The object being compared has multiple properites that constitute the key, plus a property that constitutes the value, and finally, an enum property that describes the result of the comparison, e.g., {Equal, NotEqual, NoMatch, NotYetCompared}. So my object might look like:
class MyObject
{
//Key combination
string columnA;
string columnB;
decimal columnC;
//The Value
decimal columnD;
//Enum for comparison, used for styling the item (value hidden from UI)
//Alternatively...this could be a string type, holding the enum.ToString()
MyComparisonEnum result;
}
These objects are collected into two ObservableCollection<MyObject> to be compared. When bound to the UI, the grid rows are being styled based on the caomparison result enum, so the user can easily see what keys are in the new dataset but not in the old, vice-versa, along with those keys in both datasets with a different value. Both lists are presented in the UI in data grids, with the rows styled based on the comparison result.
Would LINQ be suitable as a tool to solve this efficiently, or should I use loops to scan the lists and break out when the key is found, etc (a solution like this comes naturally to be from my procedural programming background)... or some other method?
Thank you!
You can use Except and Intersect:
var list1 = new List<MyObject>();
var list2 = new List<MyObject>();
// initialization code
var notIn2 = list1.Except(list2);
var notIn1 = list2.Except(list1);
var both = list1.Intersect(list2);
To find objects with different values (ColumnD) you can use this (quite efficient) Linq query:
var diffValue = from o1 in list1
join o2 in list2
on new { o1.columnA, o1.columnB, o1.columnC } equals new { o2.columnA, o2.columnB, o2.columnC }
where o1.columnD != o2.columnD
select new { Object1 = o1, Object2 = o2 };
foreach (var diff in diffValue)
{
MyObject obj1 = diff.Object1;
MyObject obj2 = diff.Object2;
Console.WriteLine("Obj1-Value:{0} Obj2-Value:{1}", obj1.columnD, obj2.columnD);
}
when you override Equals and GetHashCode appropriately:
class MyObject
{
//Key combination
string columnA;
string columnB;
decimal columnC;
//The Value
decimal columnD;
//Enum for comparison, used for styling the item (value hidden from UI)
//Alternatively...this could be a string type, holding the enum.ToString()
MyComparisonEnum result;
public override bool Equals(object obj)
{
if (obj == null || !(obj is MyObject)) return false;
MyObject other = (MyObject)obj;
return columnA.Equals(other.columnA) && columnB.Equals(other.columnB) && columnC.Equals(other.columnC);
}
public override int GetHashCode()
{
int hash = 19;
hash = hash + (columnA ?? "").GetHashCode();
hash = hash + (columnB ?? "").GetHashCode();
hash = hash + columnC.GetHashCode();
return hash;
}
}

Linq with DynamicObject

I have List where MyType : DynamicObject. The reason for MyType inheriting from DynamicObject, is that I need a type that can contain unknown number of properties.
It all works fine until I need to filter List. Is there a way I can do a linq that will do something like this:
return all items where any of the properties is empty string or white space?
(from the comment) can I do above linq query with List?
Yes, here is how you can do it with ExpandoObject:
var list = new List<ExpandoObject>();
dynamic e1 = new ExpandoObject();
e1.a = null;
e1.b = "";
dynamic e2 = new ExpandoObject();
e2.x = "xxx";
e2.y = 123;
list.Add(e1);
list.Add(e2);
var res = list.Where(
item => item.Any(p => p.Value == null || (p.Value is string && string.IsNullOrEmpty((string)p.Value)))
);
The ExpandoObject presents an interface that lets you enumerate its property-value pairs as if they were in a dictionary, making the process of checking them a lot simpler.
Well as long as each object's properties are not unknown internally to themselves you could do it.
There isn't a great generic way to test all the properties of a dynamic object, if you don't have control over the DynamicObject you hope the implementer implemented GetDynamicMemberNames() and you can use the nuget package ImpromptuInterface's methods for getting the property names and dynamically invoking those names.
return allItems.Where(x=> Impromptu.GetMemberNames(x, dynamicOnly:true)
.Any(y=>String.IsNullOrWhiteSpace(Impromptu.InvokeGet(x,y));
Otherwise, since it's your own type MyType you can add your own method that can see internal accounting for those member values.
return allItems.Where(x => x.MyValues.Any(y=>String.IsNullOrWhitespace(x));

LINQ for LIKE queries of array elements

Let's say I have an array, and I want to do a LINQ query against a varchar that returns any records that have an element of the array anywhere in the varchar.
Something like this would be sweet.
string[] industries = { "airline", "railroad" }
var query = from c in contacts where c.industry.LikeAnyElement(industries) select c
Any ideas?
This is actually an example I use in my "Express Yourself" presentation, for something that is hard to do in regular LINQ; As far as I know, the easiest way to do this is by writing the predicate manually. I use the example below (note it would work equally for StartsWith etc):
using (var ctx = new NorthwindDataContext())
{
ctx.Log = Console.Out;
var data = ctx.Customers.WhereTrueForAny(
s => cust => cust.CompanyName.Contains(s),
"a", "de", "s").ToArray();
}
// ...
public static class QueryableExt
{
public static IQueryable<TSource> WhereTrueForAny<TSource, TValue>(
this IQueryable<TSource> source,
Func<TValue, Expression<Func<TSource, bool>>> selector,
params TValue[] values)
{
return source.Where(BuildTrueForAny(selector, values));
}
public static Expression<Func<TSource, bool>> BuildTrueForAny<TSource, TValue>(
Func<TValue, Expression<Func<TSource, bool>>> selector,
params TValue[] values)
{
if (selector == null) throw new ArgumentNullException("selector");
if (values == null) throw new ArgumentNullException("values");
if (values.Length == 0) return x => true;
if (values.Length == 1) return selector(values[0]);
var param = Expression.Parameter(typeof(TSource), "x");
Expression body = Expression.Invoke(selector(values[0]), param);
for (int i = 1; i < values.Length; i++)
{
body = Expression.OrElse(body,
Expression.Invoke(selector(values[i]), param));
}
return Expression.Lambda<Func<TSource, bool>>(body, param);
}
}
from c in contracts
where industries.Any(i => i == c.industry)
select c;
something like that. use the any method on the collection.
IEnumerable.Contains() translates to SQL IN as in:
WHERE 'american airlines' IN ('airline', 'railroad') -- FALSE
String.Contains() which translates to SQL LIKE %...% as in:
WHERE 'american airlines' LIKE '%airline%' -- TRUE
If you want the contacts where the contact's industry is LIKE (contains) any of the given industries, you want to combine both Any() and String.Contains() into something like this:
string[] industries = { "airline", "railroad" };
var query = from c in contacts
where industries.Any(i => c.Industry.Contains(i))
select c;
However, combining both Any() and String.Contains() like this is NOT supported in LINQ to SQL. If the set of given industries is small, you can try something like:
where c.Industry.Contains("airline") ||
c.Industry.Contains("railroad") || ...
Or (although normally not recommended) if the set of contacts is small enough, you could bring them all from the DB and apply the filter with LINQ to Objects by using contacts.AsEnumerable() or contacts.ToList() as the source of the query above:
var query = from c in contacts.AsEnumerable()
where industries.Any(i => c.Industry.Contains(i))
select c;
it will work if you build up the query as follows:
var query = from c in contacts.AsEnumerable()
select c;
query = query.Where(c=> (c.Industry.Contains("airline")) || (c.Industry.Contains("railroad")));
you just need to programmatically generate the string above if the parameters airline and railroad are user inputs. This was in fact a little more complicated than I was expecting. See article - http://www.albahari.com/nutshell/predicatebuilder.aspx
Unfortunately, LIKE is not supported in LINQ to SQL as per here:
http://msdn.microsoft.com/en-us/library/bb882677.aspx
To get around this, you will have to write a stored procedure which will accept the parameters you want to use in the like statement(s) and then call that from LINQ to SQL.
It should be noted that a few of the answers suggest using Contains. This won't work because it looks to see that the entire string matches the array element. What is being looked for is for the array element to be contained in the field itself, something like:
industry LIKE '%<element>%'
As Clark has mentioned in a comment, you could use a call to IndexOf on each element (which should translate to a SQL call):
string[] industries = { "airline", "railroad" }
var query =
from c in contacts
where
c.industry.IndexOf(industries[0]) != -1 ||
c.industry.IndexOf(industries[1]) != -1
If you know the length of the array and the number of elements, then you could hard-code this. If you don't, then you will have to create the Expression instance based on the array and the field you are looking at.

LINQ to SQL bug (or very strange feature) when using IQueryable, foreach, and multiple Where

I ran into a scenario where LINQ to SQL acts very strangely. I would like to know if I'm doing something wrong. But I think there is a real possibility that it's a bug.
The code pasted below isn't my real code. It is a simplified version I created for this post, using the Northwind database.
A little background: I have a method that takes an IQueryable of Product and a "filter object" (which I will describe in a minute). It should run some "Where" extension methods on the IQueryable, based on the "filter object", and then return the IQueryable.
The so-called "filter object" is a System.Collections.Generic.List of an anonymous type of this structure: { column = fieldEnum, id = int }
The fieldEnum is an enum of the different columns of the Products table that I would possibly like to use for the filtering.
Instead of explaining further how my code works, it's easier if you just take a look at it. It's simple to follow.
enum filterType { supplier = 1, category }
public IQueryable<Product> getIQueryableProducts()
{
NorthwindDataClassesDataContext db = new NorthwindDataClassesDataContext();
IQueryable<Product> query = db.Products.AsQueryable();
//this section is just for the example. It creates a Generic List of an Anonymous Type
//with two objects. In real life I get the same kind of collection, but it isn't hard coded like here
var filter1 = new { column = filterType.supplier, id = 7 };
var filter2 = new { column = filterType.category, id = 3 };
var filterList = (new[] { filter1 }).ToList();
filterList.Add(filter2);
foreach(var oFilter in filterList)
{
switch (oFilter.column)
{
case filterType.supplier:
query = query.Where(p => p.SupplierID == oFilter.id);
break;
case filterType.category:
query = query.Where(p => p.CategoryID == oFilter.id);
break;
default:
break;
}
}
return query;
}
So here is an example. Let's say the List contains two items of this anonymous type, { column = fieldEnum.Supplier, id = 7 } and { column = fieldEnum.Category, id = 3}.
After running the code above, the underlying SQL query of the IQueryable object should contain:
WHERE SupplierID = 7 AND CategoryID = 3
But in reality, after the code runs the SQL that gets executed is
WHERE SupplierID = 3 AND CategoryID = 3
I tried defining query as a property and setting a breakpoint on the setter, thinking I could catch what's changing it when it shouldn't be. But everything was supposedly fine. So instead I just checked the underlying SQL after every command. I realized that the first Where runs fine, and query stays fine (meaning SupplierID = 7) until right after the foreach loop runs the second time. Right after oFilter becomes the second anonymous type item, and not the first, the 'query' SQL changes to Supplier = 3. So what must be happening here under-the-hood is that instead of just remembering that Supplier should equal 7, LINQ to SQL remembers that Supplier should equal oFilter.id. But oFilter is a name of a single item of a foreach loop, and it means something different after it iterates.
I have only glanced at your question, but I am 90% sure that you should read the first section of On lambdas, capture, and mutability (which includes links to 5 similar SO questions) and all will become clear.
The basic gist of it is that the variable oFilter in your example has been captured in the closure by reference and not by value. That means that once the loop finishes iterating, the variable's reference is to the last one, so the value as evaluated at lambda execution time is the final one as well.
The cure is to insert a new variable inside the foreach loop whose scope is only that iteration rather than the whole loop:
foreach(var oFilter in filterList)
{
var filter = oFilter; // add this
switch (oFilter.column) // this doesn't have to change, but can for consistency
{
case filterType.supplier:
query = query.Where(p => p.SupplierID == filter.id); // use `filter` here
break;
Now each closure is over a different filter variable that is declared anew inside of each loop, and your code will run as expected.
Working as designed. The issue you are confronting is the clash between lexical closure and mutable variables.
What you probably want to do is
foreach(var oFilter in filterList)
{
var o = oFilter;
switch (o.column)
{
case filterType.supplier:
query = query.Where(p => p.SupplierID == o.id);
break;
case filterType.category:
query = query.Where(p => p.CategoryID == o.id);
break;
default:
break;
}
}
When compiled to IL, the variable oFilter is declared once and used multiply. What you need is a variable declared separately for each use of that variable within a closure, which is what o is now there for.
While you're at it, get rid of that bastardized Hungarian notation :P.
I think this is the clearest explanation I've ever seen: http://blogs.msdn.com/ericlippert/archive/2009/11/12/closing-over-the-loop-variable-considered-harmful.aspx:
Basically, the problem arises because we specify that the foreach loop is a syntactic sugar for
{
IEnumerator<int> e = ((IEnumerable<int>)values).GetEnumerator();
try
{
int m; // OUTSIDE THE ACTUAL LOOP
while(e.MoveNext())
{
m = (int)(int)e.Current;
funcs.Add(()=>m);
}
}
finally
{
if (e != null) ((IDisposable)e).Dispose();
}
}
If we specified that the expansion was
try
{
while(e.MoveNext())
{
int m; // INSIDE
m = (int)(int)e.Current;
funcs.Add(()=>m);
}
then the code would behave as expected.
The problem is that you're not appending to the query, you're replacing it each time through the foreach statement.
You want something like the PredicateBuilder - http://www.albahari.com/nutshell/predicatebuilder.aspx

Resources