Most efficient way to determine if there are any differences between specific properties of 2 lists of items? - linq

In C# .NET 4.0, I am struggling to come up with the most efficient way to determine if the contents of 2 lists of items contain any differences.
I don't need to know what the differences are, just true/false whether the lists are different based on my criteria.
The 2 lists I am trying to compare contain FileInfo objects, and I want to compare only the FileInfo.Name and FileInfo.LastWriteTimeUtc properties of each item. All the FileInfo items are for files located in the same directory, so the FileInfo.Name values will be unique.
To summarize, I am looking for a single Boolean result for the following criteria:
Does ListA contain any items with FileInfo.Name not in ListB?
Does ListB contain any items with FileInfo.Name not in ListA?
For items with the same FileInfo.Name in both lists, are the FileInfo.LastWriteTimeUtc values different?
Thank you,
Kyle

I would use a custom IEqualityComparer<FileInfo> for this task:
public class FileNameAndLastWriteTimeUtcComparer : IEqualityComparer<FileInfo>
{
public bool Equals(FileInfo x, FileInfo y)
{
if(Object.ReferenceEquals(x, y)) return true;
if (x == null || y == null) return false;
return x.FullName.Equals(y.FullName) && x.LastWriteTimeUtc.Equals(y.LastWriteTimeUtc);
}
public int GetHashCode(FileInfo fi)
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
hash = hash * 23 + fi.FullName.GetHashCode();
hash = hash * 23 + fi.LastWriteTimeUtc.GetHashCode();
return hash;
}
}
}
Now you can use a HashSet<FileInfo> with this comparer and HashSet<T>.SetEquals:
var comparer = new FileNameAndLastWriteTimeUtcComparer();
var uniqueFiles1 = new HashSet<FileInfo>(list1, comparer);
bool anyDifferences = !uniqueFiles1.SetEquals(list2);
Note that I've used FileInfo.FullName instead of Name, since file names are only unique within a single directory.
Sidenote: another advantage is that you can use this comparer for many LINQ methods like GroupBy, Except, Intersect or Distinct.
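For comparison, here is a minimal sketch of a dictionary-based check that walks through the three criteria from the question directly. The class name FileListDiff and the method HasDifferences are hypothetical, and the sketch assumes names are unique within each list, as the question states.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the answers above.
public static class FileListDiff
{
    // Core check on (name, last write time) pairs; names are assumed unique per list.
    public static bool HasDifferences(
        IEnumerable<KeyValuePair<string, DateTime>> listA,
        IEnumerable<KeyValuePair<string, DateTime>> listB)
    {
        var lookupA = listA.ToDictionary(p => p.Key, p => p.Value);
        int matched = 0;
        foreach (var item in listB)
        {
            DateTime writeTimeA;
            // A name that is only in B, or a differing timestamp, is a difference.
            if (!lookupA.TryGetValue(item.Key, out writeTimeA) || writeTimeA != item.Value)
                return true;
            matched++;
        }
        // If some names in A were never matched, they exist only in A.
        return matched != lookupA.Count;
    }

    // Convenience overload that projects FileInfo to the two compared properties.
    public static bool HasDifferences(IEnumerable<FileInfo> listA, IEnumerable<FileInfo> listB)
    {
        return HasDifferences(
            listA.Select(f => new KeyValuePair<string, DateTime>(f.Name, f.LastWriteTimeUtc)),
            listB.Select(f => new KeyValuePair<string, DateTime>(f.Name, f.LastWriteTimeUtc)));
    }
}
```

The loop returns as soon as the first difference is found, which may matter for large directories; whether it is measurably faster than the HashSet approach is not something this sketch establishes.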

This is not the most efficient way (probably ranks a 4 out of 5 in the quick-and-dirty category):
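var comparableListA = ListA.Select(a =>
new { Name = a.Name, LastWrite = a.LastWriteTimeUtc }).ToList();
var comparableListB = ListB.Select(b =>
new { Name = b.Name, LastWrite = b.LastWriteTimeUtc }).ToList();
// The FileInfo object itself is left out of the projection: it compares by
// reference, so including it would make every item look different.
// Except is one-directional, so check both directions.
var youHaveDiff = comparableListA.Except(comparableListB).Any()
|| comparableListB.Except(comparableListA).Any();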
Explanation:
Anonymous classes are compared by property values, which is what you're looking to do, which led to my thinking of doing a LINQ projection along those lines.
P.S.
You should double check the syntax, I just rattled this off without the compiler.

Related

Selecting first items in GroupBy when using custom Class

I have a very basic sql view which joins 3 tables: users, pictures, and tags.
How would one create the query below in a way that it won't list the same pictures more than once? In other words, I want to Group By pictures (I think) and get the first instance of each.
I think this is very similar to the post Linq Query Group By and Selecting First Items, but I cannot figure out how to apply it in this case where the query is instantiating MyImageClass.
validPicSummaries = (from x in db.PicsTagsUsers
where x.enabled == 1
select new MyImageClass {
PicName = x.picname,
Username= x.Username,
Tag = x.tag }).Take(50);
To exclude duplicates, you can use the Distinct LINQ method:
validPicSummaries =
(from x in db.PicsTagsUsers
where x.tag == searchterm && x.enabled == 1
select new MyImageClass
{
PicName = x.picname,
Username= x.Username,
Tag = x.tag
})
.Distinct()
.Take(50);
You will need to make sure that the objects are comparable so that two MyImageClass objects that have the same PicName, Username, and Tag are considered equal (or however you wish to consider two of them as being equal).
You can write a small class that implements IEqualityComparer<T> if you would like to have a custom comparer for just this case. Ex:
private class MyImageClassComparer : IEqualityComparer<MyImageClass>
{
public bool Equals(MyImageClass pMyImage1, MyImageClass pMyImage2)
{
// some test of the two objects to determine
// whether they should be considered equal
return pMyImage1.PicName == pMyImage2.PicName
&& pMyImage1.Username == pMyImage2.Username
&& pMyImage1.Tag == pMyImage2.Tag;
}
public int GetHashCode(MyImageClass pMyImageClass)
{
// LINQ uses GetHashCode first to bucket candidates and then calls
// Equals to confirm a match, so equal objects must return equal hash
// codes. A simple way to combine hash codes is to XOR them:
return pMyImageClass.PicName.GetHashCode()
^ pMyImageClass.Username.GetHashCode()
^ pMyImageClass.Tag.GetHashCode();
}
}
Then when you call distinct:
...
.Distinct(new MyImageClassComparer())
.Take(50);
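Since the question asks for the first instance of each picture, a GroupBy followed by First is another way to express it. The following is only a sketch against in-memory objects: PicSummary is a hypothetical stand-in for MyImageClass, FirstPerPicture is an invented helper name, and whether a particular LINQ provider can translate the same query to SQL is not verified here.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for MyImageClass from the question.
public class PicSummary
{
    public string PicName { get; set; }
    public string Username { get; set; }
    public string Tag { get; set; }
}

public static class PicQueries
{
    // Keeps the first row seen for each picture name, then limits the result.
    public static List<PicSummary> FirstPerPicture(IEnumerable<PicSummary> rows, int max)
    {
        return rows
            .GroupBy(r => r.PicName)   // one group per picture
            .Select(g => g.First())    // first row of each group
            .Take(max)
            .ToList();
    }
}
```

Unlike Distinct, this treats two rows as duplicates when only PicName matches, even if their tags differ, which seems closer to what the question describes.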

Item-by-item list comparison, updating each item with its result (no third list)

The solutions I have found so far in my research on comparing lists of objects have usually generated a new list of objects, say of those items existing in one list, but not in the other. In my case, I want to compare two lists to discover the items whose key exists in one list and not the other (comparing both ways), and for those keys found in both lists, checking whether the value is the same or different.
The object being compared has multiple properties that constitute the key, plus a property that constitutes the value, and finally, an enum property that describes the result of the comparison, e.g., {Equal, NotEqual, NoMatch, NotYetCompared}. So my object might look like:
class MyObject
{
//Key combination
string columnA;
string columnB;
decimal columnC;
//The Value
decimal columnD;
//Enum for comparison, used for styling the item (value hidden from UI)
//Alternatively...this could be a string type, holding the enum.ToString()
MyComparisonEnum result;
}
These objects are collected into two ObservableCollection<MyObject> to be compared. Both lists are presented in the UI in data grids, with the rows styled based on the comparison result enum, so the user can easily see which keys are in the new dataset but not in the old, and vice versa, along with those keys in both datasets with a different value.
Would LINQ be suitable as a tool to solve this efficiently, or should I use loops to scan the lists and break out when the key is found, etc. (a solution like this comes naturally to me from my procedural programming background)... or some other method?
Thank you!
You can use Except and Intersect:
var list1 = new List<MyObject>();
var list2 = new List<MyObject>();
// initialization code
var notIn2 = list1.Except(list2);
var notIn1 = list2.Except(list1);
var both = list1.Intersect(list2);
To find objects with different values (ColumnD) you can use this (quite efficient) Linq query:
var diffValue = from o1 in list1
join o2 in list2
on new { o1.columnA, o1.columnB, o1.columnC } equals new { o2.columnA, o2.columnB, o2.columnC }
where o1.columnD != o2.columnD
select new { Object1 = o1, Object2 = o2 };
foreach (var diff in diffValue)
{
MyObject obj1 = diff.Object1;
MyObject obj2 = diff.Object2;
Console.WriteLine("Obj1-Value:{0} Obj2-Value:{1}", obj1.columnD, obj2.columnD);
}
Except and Intersect only compare by key when you override Equals and GetHashCode appropriately:
class MyObject
{
//Key combination
string columnA;
string columnB;
decimal columnC;
//The Value
decimal columnD;
//Enum for comparison, used for styling the item (value hidden from UI)
//Alternatively...this could be a string type, holding the enum.ToString()
MyComparisonEnum result;
public override bool Equals(object obj)
{
MyObject other = obj as MyObject;
if (other == null) return false;
// string.Equals is null-safe, matching the null handling in GetHashCode below
return string.Equals(columnA, other.columnA) && string.Equals(columnB, other.columnB) && columnC.Equals(other.columnC);
}
public override int GetHashCode()
{
int hash = 19;
hash = hash + (columnA ?? "").GetHashCode();
hash = hash + (columnB ?? "").GetHashCode();
hash = hash + columnC.GetHashCode();
return hash;
}
}
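The question asks for each item to be updated with its comparison result rather than producing a third list. One possible sketch of that uses a dictionary keyed on the composite key. The Row class, the ComparisonResult enum and RowComparer.Compare are hypothetical names that mirror MyObject and MyComparisonEnum, and keys are assumed to be unique within each list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ComparisonResult { NotYetCompared, Equal, NotEqual, NoMatch }

// Hypothetical stand-in for MyObject, with public properties.
public class Row
{
    public string ColumnA { get; set; }
    public string ColumnB { get; set; }
    public decimal ColumnC { get; set; }
    public decimal ColumnD { get; set; }
    public ComparisonResult Result { get; set; }

    // Tuple provides structural Equals/GetHashCode for the composite key.
    public Tuple<string, string, decimal> Key
    {
        get { return Tuple.Create(ColumnA, ColumnB, ColumnC); }
    }
}

public static class RowComparer
{
    // Sets Result on every item of both lists; no new list of items is produced.
    public static void Compare(IList<Row> oldRows, IList<Row> newRows)
    {
        var oldByKey = oldRows.ToDictionary(r => r.Key);

        // Assume no match until a partner is found in the other list.
        foreach (var row in oldRows) row.Result = ComparisonResult.NoMatch;

        foreach (var row in newRows)
        {
            Row match;
            if (oldByKey.TryGetValue(row.Key, out match))
            {
                var result = row.ColumnD == match.ColumnD
                    ? ComparisonResult.Equal
                    : ComparisonResult.NotEqual;
                row.Result = result;
                match.Result = result;
            }
            else
            {
                row.Result = ComparisonResult.NoMatch;
            }
        }
    }
}
```

Each list is scanned once, so the cost grows roughly linearly with the number of rows instead of with the product of both list sizes, as nested loops would.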

Linq query to find non-numeric items in list?

Suppose I have the following list:
var strings = new List<string>();
strings.Add("1");
strings.Add("12.456");
strings.Add("Foobar");
strings.Add("0.56");
strings.Add("zero");
Is there some sort of query I can write in Linq that will return to me only the numeric items, i.e. the 1st, 2nd, and 4th items from the list?
-R.
strings.Where(s => { double ignored; return double.TryParse(s, out ignored); })
This will return all the strings that are parseable as doubles as strings. If you want them as numbers (which makes more sense), you could write an extension method:
public static IEnumerable<double> GetDoubles(this IEnumerable<string> strings)
{
foreach (var s in strings)
{
double result;
if (double.TryParse(s, out result))
yield return result;
}
}
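Don't forget that double.TryParse() uses your current culture, so it will give different results on different computers. If you don't want that, use double.TryParse(s, NumberStyles.Float | NumberStyles.AllowThousands, CultureInfo.InvariantCulture, out result), which keeps the default parsing rules but fixes the culture. (NumberStyles.AllowDecimalPoint on its own would also reject signs, exponents and surrounding whitespace.)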
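A culture-independent variant of the extension method might look like the sketch below. The names NumericFilter and ParseInvariant are made up for illustration, and NumberStyles.Float is an assumption about which inputs should count as numeric (it allows a sign, a decimal point and an exponent, but no thousands separators).

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class NumericFilter
{
    // Yields the parsed value of every string that is a valid number
    // under the invariant culture ('.' as the decimal separator).
    public static IEnumerable<double> ParseInvariant(this IEnumerable<string> strings)
    {
        foreach (var s in strings)
        {
            double result;
            if (double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out result))
                yield return result;
        }
    }
}
```

With the list from the question, strings.ParseInvariant() would be expected to yield the three numeric entries regardless of the machine's regional settings.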
Try this:
double dummy = 0;
var strings = new List<string>();
strings.Add("1");
strings.Add("12.456");
strings.Add("Foobar");
strings.Add("0.56");
strings.Add("zero");
var numbers = strings.Where(a=>double.TryParse(a, out dummy));
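You could use a simple predicate to examine each string, like so (note that this is only a character check, so it also accepts inputs such as "1.2.3" or "." and rejects signed numbers):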
var strings = new List<string>();
strings.Add("1");
strings.Add("12.456");
strings.Add("Foobar");
strings.Add("0.56");
strings.Add("zero");
var nums = strings.Where( s => s.ToCharArray().All( n => Char.IsNumber( n ) || n == '.' ) );

Comparing two collections

I have what seems like a common problem / pattern: two collections of the same object. The object has a number of properties and some nested objects within it. Car has a property called id which is the unique identifier.
I want to find the LINQ way to do a diff, which includes:
Items in one collection and not the other (and vice versa)
For the items that match, are there any changes? (Changes would be a comparison of all properties; I only care about settable properties. Would I use reflection for this?)
You can use the Enumerable.Except() method. This uses a comparer (either a default or one you supply) to evaluate which objects are in both sequences or just one:
var sequenceA = new[] { "a", "e", "i", "o", "u" };
var sequenceB = new[] { "a", "b", "c" };
var sequenceDiff = sequenceA.Except( sequenceB );
If you want the symmetric difference of both sequences, that is (A-B) union (B-A), you would have to use:
var sequenceDiff =
sequenceA.Except( sequenceB ).Union( sequenceB.Except( sequenceA ) );
If you have a complex type, you can write an IEqualityComparer<T> for your type T and use the overload that accepts the comparer.
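As a rough sketch of the comparer overload of Except, assuming a hypothetical Car class whose Id property is the unique identifier mentioned in the question (the names CarIdComparer, CarDiff and OnlyInOne are likewise invented for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Car type; only Id takes part in the comparison.
public class Car
{
    public int Id { get; set; }
    public string Color { get; set; }
}

public class CarIdComparer : IEqualityComparer<Car>
{
    public bool Equals(Car x, Car y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Car car)
    {
        return car == null ? 0 : car.Id.GetHashCode();
    }
}

public static class CarDiff
{
    // Cars whose Id appears in exactly one of the two sequences.
    public static List<Car> OnlyInOne(IEnumerable<Car> a, IEnumerable<Car> b)
    {
        var comparer = new CarIdComparer();
        return a.Except(b, comparer)
                .Union(b.Except(a, comparer), comparer)
                .ToList();
    }
}
```

Both inputs are enumerated twice here, so for sequences that are expensive to produce it may be worth materializing them with ToList first.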
For the second part of your question, you would need to roll your own implementation to report which properties of a type are different; there's nothing built into the .NET BCL directly. You have to decide what form this reporting would take. How would you identify and express differences in a complex type? You could certainly use reflection for this, but if you're only dealing with a single type I would avoid that and write a specialized differencing utility just for it. If you're going to support a broad range of types, then reflection may make more sense.
You've already received an excellent answer for your first half. The second half, as LBushkin explains, cannot be done by BCL classes directly. Here's a simple method that goes through all public settable properties (note: it is possible that the getter, in these cases, is not public!) and compares them one by one. If two objects are 100% equal, it will return true. Otherwise, it will break out early and return false:
// Requires: using System.Linq; using System.Reflection;
static bool AllSettablePropertiesEqual<T>(T obj1, T obj2)
{
// BindingFlags.SetProperty is meant for InvokeMember and does not filter
// GetProperties, so the settable properties are selected explicitly:
// readable, with a public setter, and not an indexer.
PropertyInfo[] properties = typeof(T).GetProperties(
BindingFlags.Public |
BindingFlags.Instance)
.Where(p => p.CanRead
&& p.GetSetMethod() != null
&& p.GetIndexParameters().Length == 0)
.ToArray();
// a loop is easier than linq here, and we can break out quick:
foreach (PropertyInfo property in properties)
{
var value1 = property.GetValue(obj1, null);
var value2 = property.GetValue(obj2, null);
if (value1 == null || value2 == null)
{
// equal only if both values are null
if (value1 != value2)
return false;
}
else if (!value1.Equals(value2))
{
return false;
}
}
return true;
}
You could easily add this method to a standard LINQ expression. A join clause needs the form "on x equals y", so a cross join with a where filter is used instead:
var reallyReallyEqual = from itemA in listA
from itemB in listB
where AllSettablePropertiesEqual(itemA, itemB)
select itemA;

LINQ for LIKE queries of array elements

Let's say I have an array, and I want to do a LINQ query against a varchar that returns any records that have an element of the array anywhere in the varchar.
Something like this would be sweet.
string[] industries = { "airline", "railroad" }
var query = from c in contacts where c.industry.LikeAnyElement(industries) select c
Any ideas?
This is actually an example I use in my "Express Yourself" presentation, for something that is hard to do in regular LINQ; As far as I know, the easiest way to do this is by writing the predicate manually. I use the example below (note it would work equally for StartsWith etc):
using (var ctx = new NorthwindDataContext())
{
ctx.Log = Console.Out;
var data = ctx.Customers.WhereTrueForAny(
s => cust => cust.CompanyName.Contains(s),
"a", "de", "s").ToArray();
}
// ...
public static class QueryableExt
{
public static IQueryable<TSource> WhereTrueForAny<TSource, TValue>(
this IQueryable<TSource> source,
Func<TValue, Expression<Func<TSource, bool>>> selector,
params TValue[] values)
{
return source.Where(BuildTrueForAny(selector, values));
}
public static Expression<Func<TSource, bool>> BuildTrueForAny<TSource, TValue>(
Func<TValue, Expression<Func<TSource, bool>>> selector,
params TValue[] values)
{
if (selector == null) throw new ArgumentNullException("selector");
if (values == null) throw new ArgumentNullException("values");
if (values.Length == 0) return x => true;
if (values.Length == 1) return selector(values[0]);
var param = Expression.Parameter(typeof(TSource), "x");
Expression body = Expression.Invoke(selector(values[0]), param);
for (int i = 1; i < values.Length; i++)
{
body = Expression.OrElse(body,
Expression.Invoke(selector(values[i]), param));
}
return Expression.Lambda<Func<TSource, bool>>(body, param);
}
}
from c in contacts
where industries.Any(i => i == c.industry)
select c;
Something like that: use the Any method on the collection.
IEnumerable.Contains() translates to SQL IN as in:
WHERE 'american airlines' IN ('airline', 'railroad') -- FALSE
String.Contains(), on the other hand, translates to SQL LIKE '%...%', as in:
WHERE 'american airlines' LIKE '%airline%' -- TRUE
If you want the contacts where the contact's industry is LIKE (contains) any of the given industries, you want to combine both Any() and String.Contains() into something like this:
string[] industries = { "airline", "railroad" };
var query = from c in contacts
where industries.Any(i => c.Industry.Contains(i))
select c;
However, combining both Any() and String.Contains() like this is NOT supported in LINQ to SQL. If the set of given industries is small, you can try something like:
where c.Industry.Contains("airline") ||
c.Industry.Contains("railroad") || ...
Or (although normally not recommended) if the set of contacts is small enough, you could bring them all from the DB and apply the filter with LINQ to Objects by using contacts.AsEnumerable() or contacts.ToList() as the source of the query above:
var query = from c in contacts.AsEnumerable()
where industries.Any(i => c.Industry.Contains(i))
select c;
It will work if you build up the query as follows:
var query = from c in contacts.AsEnumerable()
select c;
query = query.Where(c=> (c.Industry.Contains("airline")) || (c.Industry.Contains("railroad")));
You just need to generate the predicate above programmatically if the parameters airline and railroad are user inputs. This was in fact a little more complicated than I was expecting. See the PredicateBuilder article: http://www.albahari.com/nutshell/predicatebuilder.aspx
Unfortunately, LIKE is not supported in LINQ to SQL as per here:
http://msdn.microsoft.com/en-us/library/bb882677.aspx
To get around this, you will have to write a stored procedure which will accept the parameters you want to use in the like statement(s) and then call that from LINQ to SQL.
It should be noted that a few of the answers suggest using Contains on the array. This won't work because it checks whether the entire field value equals an array element. What is being looked for is for the array element to be contained in the field itself, something like:
industry LIKE '%<element>%'
As Clark has mentioned in a comment, you could use a call to IndexOf on each element (which should translate to a SQL call):
string[] industries = { "airline", "railroad" };
var query =
from c in contacts
where
c.industry.IndexOf(industries[0]) != -1 ||
c.industry.IndexOf(industries[1]) != -1
select c;
If you know the number of elements in the array in advance, then you could hard-code this. If you don't, then you will have to create the Expression instance based on the array and the field you are looking at.
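Creating the Expression from the array could look roughly like the sketch below, which ORs together one String.Contains call per search term. The names LikeAny, ContainsAny and Contact are hypothetical; the sketch is exercised against in-memory data only, and whether a given LINQ provider translates the resulting expression to SQL LIKE is an assumption that would need checking.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity with the field being searched.
public class Contact
{
    public string Industry { get; set; }
}

public static class LikeAny
{
    // Builds: x => field(x).Contains(t0) || field(x).Contains(t1) || ...
    public static Expression<Func<T, bool>> ContainsAny<T>(
        Expression<Func<T, string>> field, params string[] terms)
    {
        var contains = typeof(string).GetMethod("Contains", new[] { typeof(string) });
        Expression body = null;
        foreach (var term in terms)
        {
            // Reuse the field selector's body so the lambda keeps a single parameter.
            Expression call = Expression.Call(field.Body, contains, Expression.Constant(term));
            body = body == null ? call : Expression.OrElse(body, call);
        }
        // With no terms nothing can match.
        if (body == null) body = Expression.Constant(false);
        return Expression.Lambda<Func<T, bool>>(body, field.Parameters);
    }
}
```

Usage would then be along the lines of contacts.Where(LikeAny.ContainsAny<Contact>(c => c.Industry, industries)), with contacts being an IQueryable<Contact>.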
