Checking for duplicates in a List of Objects C# - linq
I am looking for a really fast way to check for duplicates in a list of objects.
I was thinking of simply looping through the list and doing a manual comparison that way, but I thought that linq might provide a more elegant solution...
Suppose I have an object...
public class dupeCheckee
{
public string checkThis { get; set; }
public string checkThat { get; set; }
public dupeCheckee(string val, string val2)
{
checkThis = val;
checkThat = val2;
}
}
And I have a list of those objects
List<dupeCheckee> dupList = new List<dupeCheckee>();
dupList.Add(new dupeCheckee("test1", "value1"));
dupList.Add(new dupeCheckee("test2", "value1"));
dupList.Add(new dupeCheckee("test3", "value1"));
dupList.Add(new dupeCheckee("test1", "value1"));//dupe
dupList.Add(new dupeCheckee("test2", "value1"));//dupe...
dupList.Add(new dupeCheckee("test4", "value1"));
dupList.Add(new dupeCheckee("test5", "value1"));
dupList.Add(new dupeCheckee("test1", "value2"));//not dupe
I need to find the duplicates in that list. When I find them, I need to run some additional logic on them, not necessarily remove them.
When I use LINQ, my GroupBy call fails to compile with this error...
'System.Collections.Generic.List<dupeCheckee>' does not contain a definition for 'GroupBy' and no extension method 'GroupBy' accepting a first argument of type 'System.Collections.Generic.List<dupeCheckee>' could be found (are you missing a using directive or an assembly reference?)
This tells me that I am missing a using directive or an assembly reference, but I am having a hard time figuring out which one.
Once I figure that out, how would I check for both conditions together, i.e. that the same combination of checkThis and checkThat occurs more than once?
UPDATE: What I came up with
This is the linq query that I came up with after doing quick research...
test.Count != test.Select(c => new { c.checkThat, c.checkThis }).Distinct().Count()
I am not certain if this is definitely better than this answer...
var duplicates = test.GroupBy(x => new {x.checkThis, x.checkThat})
.Where(x => x.Skip(1).Any());
I know I can put the first statement into an if else clause. I also ran a quick test. The duplicates list gives me back 1 when I was expecting 0 but it did correctly call the fact that I had duplicates in one of the sets that I used...
The other methodology does exactly as I expect it to. Here are the data sets that I use to test this out....
Dupes:
List<DupeCheckee> test = new List<DupeCheckee>{
new DupeCheckee("test0", "test1"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test1", "test2"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test2", "test3"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test3", "test3"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test0", "test5"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test1", "test6"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test2", "test7"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test3", "test8"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test0", "test5"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test1", "test1"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test2", "test2"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test3", "test3"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test4", "test4"),//{ checkThis = "test", checkThat = "test1"}
};
No dupes...
List<DupeCheckee> test2 = new List<DupeCheckee>{
new DupeCheckee("test0", "test1"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test1", "test2"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test2", "test3"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test3", "test3"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test4", "test5"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test5", "test6"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test6", "test7"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test7", "test8"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test8", "test5"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test9", "test1"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test2", "test2"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test3", "test3"),//{ checkThis = "test", checkThat = "test1"}
new DupeCheckee("test4", "test4"),//{ checkThis = "test", checkThat = "test1"}
};
You need to import the System.Linq namespace (add using System.Linq; at the top of the file),
then you can do
var dupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
.Where(x => x.Skip(1).Any());
This will give you one group per key that occurs more than once, each containing all the items that share that key.
The test for duplicates would then be
var hasDupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
.Where(x => x.Skip(1).Any()).Any();
Alternatively, call ToList() or ToArray() to force evaluation of the query, so that you can both check for dupes and examine them,
e.g.
var dupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
.Where(x => x.Skip(1).Any()).ToArray();
if (dupes.Any()) {
    foreach (var dupeGroup in dupes) {
        Console.WriteLine(string.Format("checkThis={0},checkThat={1} has {2} duplicates",
            dupeGroup.Key.checkThis,
            dupeGroup.Key.checkThat,
            dupeGroup.Count() - 1));
    }
}
Alternatively
var dupes = dupList.Select((x, i) => new { index = i, value = x})
.GroupBy(x => new {x.value.checkThis, x.value.checkThat})
.Where(x => x.Skip(1).Any());
This gives you the same groups, but each element of a group stores the item's original index in the property index and the item itself in the property value.
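Putting the pieces of this answer together, a minimal self-contained sketch might look like the one below. It assumes dupeCheckee has a public constructor, and it uses a value tuple as the grouping key (instead of the anonymous type above) so that the result can be returned from a method. The DuplicateFinder class and the FindDuplicateIndexes method are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq; // without this directive, GroupBy is not found on List<T>

public class dupeCheckee
{
    public string checkThis { get; set; }
    public string checkThat { get; set; }

    // Assumption: the constructor is public so it can be called from other classes.
    public dupeCheckee(string val, string val2)
    {
        checkThis = val;
        checkThat = val2;
    }
}

// Hypothetical helper, not part of the original answer.
public static class DuplicateFinder
{
    // Maps each duplicated (checkThis, checkThat) pair to the indexes where it occurs.
    public static Dictionary<(string checkThis, string checkThat), List<int>> FindDuplicateIndexes(
        IEnumerable<dupeCheckee> items)
    {
        return items
            .Select((item, index) => new { item, index })
            .GroupBy(x => (x.item.checkThis, x.item.checkThat))
            .Where(g => g.Skip(1).Any()) // keep only keys seen at least twice
            .ToDictionary(g => g.Key, g => g.Select(x => x.index).ToList());
    }

    public static void Main()
    {
        var dupList = new List<dupeCheckee>
        {
            new dupeCheckee("test1", "value1"),
            new dupeCheckee("test2", "value1"),
            new dupeCheckee("test1", "value1"), // dupe of index 0
            new dupeCheckee("test1", "value2")  // not a dupe
        };

        foreach (var entry in FindDuplicateIndexes(dupList))
        {
            Console.WriteLine("checkThis={0}, checkThat={1} occurs at indexes {2}",
                entry.Key.checkThis, entry.Key.checkThat, string.Join(", ", entry.Value));
        }
    }
}
```

Returning the indexes rather than the groups makes it straightforward to run the "additional logic" from the question against the original list.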
There are plenty of working solutions here, but I think the following one is more transparent and easier to understand than all of the above:
var hasDuplicatedEntries = ListWithPossibleDuplicates
.GroupBy(YourGroupingExpression)
.Any(e => e.Count() > 1);
if(hasDuplicatedEntries)
{
// Do what ever you want in case when list contains duplicates
}
I like using this to find out whether there are any duplicates at all. Let's say you had a string and wanted to know whether it contained any duplicate letters. This is what I use.
string text = "this is some text";
var hasDupes = text.GroupBy(x => x).Any(grp => grp.Count() > 1);
If you wanted to know how many distinct values are duplicated, no matter what those values are, use this.
var totalDupeItems = text.GroupBy(x => x).Count(grp => grp.Count() > 1);
So for example, "this is some text" has this...
total of letter t: 3
total of letter i: 2
total of letter s: 3
total of letter e: 2
total of spaces: 3
So the variable totalDupeItems would equal 5, because the space character is grouped like any other character. There are 5 different kinds of duplicates (4 if you filter out the spaces first).
If you wanted to get the total number of duplicated items no matter what the dupes are, then use this.
var totalDupes = text.GroupBy(x => x).Where(grp => grp.Count() > 1).Sum(grp => grp.Count());
So the variable totalDupes would be 13 (3 + 2 + 3 + 2 + 3), or 10 if the spaces are filtered out first. This is the total number of items of each dupe type added together.
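The three queries above can be checked with a short sketch like this one. The LetterDuplicates class and its method names are made up for illustration. Note that GroupBy on a string groups every character, including spaces, so the sketch computes the counts both for the full text and for letters only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class wrapping the three queries from the answer.
public static class LetterDuplicates
{
    // True if any character occurs more than once.
    public static bool HasDupes(IEnumerable<char> chars) =>
        chars.GroupBy(c => c).Any(grp => grp.Count() > 1);

    // Number of distinct characters that occur more than once.
    public static int DuplicatedKinds(IEnumerable<char> chars) =>
        chars.GroupBy(c => c).Count(grp => grp.Count() > 1);

    // Total number of occurrences of all characters that occur more than once.
    public static int DuplicatedItems(IEnumerable<char> chars) =>
        chars.GroupBy(c => c).Where(grp => grp.Count() > 1).Sum(grp => grp.Count());

    public static void Main()
    {
        string text = "this is some text";
        var lettersOnly = text.Where(c => char.IsLetter(c)).ToList();

        // Spaces count as characters: t, i, s, e and ' ' are duplicated.
        Console.WriteLine(DuplicatedKinds(text));        // 5
        Console.WriteLine(DuplicatedItems(text));        // 13

        // Letters only: t, i, s and e are duplicated.
        Console.WriteLine(DuplicatedKinds(lettersOnly)); // 4
        Console.WriteLine(DuplicatedItems(lettersOnly)); // 10
    }
}
```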
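I think this is what you're looking for:
List<dupeCheckee> duplicates = dupList.GroupBy(x => new { x.checkThis, x.checkThat })
    .SelectMany(g => g.Skip(1))
    .ToList();
Grouping by x => x would compare object references unless dupeCheckee overrides Equals and GetHashCode, so the key here is the pair of properties. The result contains every occurrence after the first one in each group.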
For in-memory objects I always use the LINQ Distinct method and pass a comparer to it.
public class dupeCheckee
{
    public string checkThis { get; set; }
    public string checkThat { get; set; }
    public dupeCheckee(string val, string val2)
    {
        checkThis = val;
        checkThat = val2;
    }
    // Compares two instances by the values of both properties.
    public class Comparer : IEqualityComparer<dupeCheckee>
    {
        public bool Equals(dupeCheckee x, dupeCheckee y)
        {
            if (x == null || y == null)
                return false;
            return x.checkThis == y.checkThis && x.checkThat == y.checkThat;
        }
        public int GetHashCode(dupeCheckee obj)
        {
            if (obj == null)
                return 0;
            return (obj.checkThis == null ? 0 : obj.checkThis.GetHashCode()) ^
                (obj.checkThat == null ? 0 : obj.checkThat.GetHashCode());
        }
    }
}
Now we can call
List<dupeCheckee> dupList = new List<dupeCheckee>();
dupList.Add(new dupeCheckee("test1", "value1"));
dupList.Add(new dupeCheckee("test2", "value1"));
dupList.Add(new dupeCheckee("test3", "value1"));
dupList.Add(new dupeCheckee("test1", "value1"));//dupe
dupList.Add(new dupeCheckee("test2", "value1"));//dupe...
dupList.Add(new dupeCheckee("test4", "value1"));
dupList.Add(new dupeCheckee("test5", "value1"));
dupList.Add(new dupeCheckee("test1", "value2"));//not dupe
var distinct = dupList.Distinct(new dupeCheckee.Comparer());
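Distinct on its own only removes the duplicates, so to answer the original question you would still compare counts or group with the same comparer. Here is a minimal sketch of both, assuming a public constructor; the standalone DupeCheckeeComparer class and the ComparerDemo helper are hypothetical names, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class dupeCheckee
{
    public string checkThis { get; set; }
    public string checkThat { get; set; }

    public dupeCheckee(string val, string val2)
    {
        checkThis = val;
        checkThat = val2;
    }
}

// Hypothetical standalone comparer: two items are equal when both properties match.
public class DupeCheckeeComparer : IEqualityComparer<dupeCheckee>
{
    public bool Equals(dupeCheckee x, dupeCheckee y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.checkThis == y.checkThis && x.checkThat == y.checkThat;
    }

    public int GetHashCode(dupeCheckee obj)
    {
        if (obj == null) return 0;
        return (obj.checkThis == null ? 0 : obj.checkThis.GetHashCode())
             ^ (obj.checkThat == null ? 0 : obj.checkThat.GetHashCode());
    }
}

public static class ComparerDemo
{
    // Duplicates exist when Distinct drops at least one element.
    public static bool HasDuplicates(List<dupeCheckee> items) =>
        items.Distinct(new DupeCheckeeComparer()).Count() != items.Count;

    // The same comparer can drive GroupBy to get at the duplicated items themselves.
    public static List<List<dupeCheckee>> DuplicateGroups(List<dupeCheckee> items) =>
        items.GroupBy(x => x, new DupeCheckeeComparer())
             .Where(g => g.Skip(1).Any())
             .Select(g => g.ToList())
             .ToList();

    public static void Main()
    {
        var dupList = new List<dupeCheckee>
        {
            new dupeCheckee("test1", "value1"),
            new dupeCheckee("test1", "value1"), // dupe
            new dupeCheckee("test1", "value2")  // not a dupe
        };
        Console.WriteLine(HasDuplicates(dupList));
        Console.WriteLine(DuplicateGroups(dupList).Count);
    }
}
```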
Do a select distinct with LINQ (see, e.g., "How can I do SELECT UNIQUE with LINQ?"),
and then compare the count of the distinct results with the count of the original list. That gives you a boolean saying whether the list has duplicates.
Also, you could try using a Dictionary, which guarantees that every key is unique.
If a duplicate key occurs, ToDictionary throws an ArgumentException, because the Dictionary checks its keys by itself.
This is the easiest way.
try
{
    dupList.ToDictionary(a => new { a.checkThis, a.checkThat });
}
catch (ArgumentException)
{
    // message: list items are not unique
}
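The ToDictionary approach can be wrapped in a small helper so that the try/catch lives in one place. This is only a sketch: the DictionaryCheck class and the HasDuplicateKeys method are hypothetical names, and relying on an exception for an expected outcome is usually slower than the GroupBy-based checks when duplicates are common.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class DictionaryCheck
{
    // Returns true when two elements produce the same key.
    // Enumerable.ToDictionary throws ArgumentException on a duplicate key.
    // Caveat: a null key also throws (ArgumentNullException derives from
    // ArgumentException), so this sketch assumes keys are never null.
    public static bool HasDuplicateKeys<TSource, TKey>(
        IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        try
        {
            source.ToDictionary(keySelector);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        var pairs = new[]
        {
            new { checkThis = "test1", checkThat = "value1" },
            new { checkThis = "test1", checkThat = "value1" }, // dupe
            new { checkThis = "test1", checkThat = "value2" }
        };

        // Anonymous types compare by value, so they work as composite keys.
        Console.WriteLine(HasDuplicateKeys(pairs, p => new { p.checkThis, p.checkThat }));
    }
}
```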
I introduced an extension method for this:
public static class CollectionExtensions
{
public static bool HasDuplicatesByKey<TSource, TKey>(this IEnumerable<TSource> source
, Func<TSource, TKey> keySelector)
{
return source.GroupBy(keySelector).Any(group => group.Skip(1).Any());
}
}
Usage example in code:
if (items.HasDuplicatesByKey(item => item.Id))
{
    throw new InvalidOperationException($"Set {nameof(items)} has duplicates.");
}
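For completeness, here is a self-contained sketch of that extension method applied to the composite key from the question. The Item class and its Id and Name properties are assumptions made for the example, since the answer does not show the type behind items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CollectionExtensions
{
    // True when at least two elements of source share the same key.
    public static bool HasDuplicatesByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        return source.GroupBy(keySelector).Any(group => group.Skip(1).Any());
    }
}

// Hypothetical element type used only for this example.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        var items = new List<Item>
        {
            new Item { Id = 1, Name = "first" },
            new Item { Id = 2, Name = "second" },
            new Item { Id = 1, Name = "third" } // same Id as the first item
        };

        // Single-property key.
        Console.WriteLine(items.HasDuplicatesByKey(item => item.Id));

        // Composite key: an anonymous type compares all of its properties by value.
        Console.WriteLine(items.HasDuplicatesByKey(item => new { item.Id, item.Name }));
    }
}
```

Skip(1).Any() stops after the second element of a group, whereas Count() > 1 enumerates the whole group; the result is the same either way.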
Related
How to iterate through GroupBy groups using Dynamic LINQ? [duplicate]
I am using Dynamic Linq helper for grouping data. My code is as follows : Employee[] empList = new Employee[6]; empList[0] = new Employee() { Name = "CA", State = "A", Department = "xyz" }; empList[1] = new Employee() { Name = "ZP", State = "B", Department = "xyz" }; empList[2] = new Employee() { Name = "AC", State = "B", Department = "xyz" }; empList[3] = new Employee() { Name = "AA", State = "A", Department = "xyz" }; empList[4] = new Employee() { Name = "A2", State = "A", Department = "pqr" }; empList[5] = new Employee() { Name = "BA", State = "B", Department = "pqr" }; var empqueryable = empList.AsQueryable(); var dynamiclinqquery = DynamicQueryable.GroupBy(empqueryable, "new (State, Department)", "it"); How can I get back the Key and corresponding list of grouped items i.e IEnumerable of {Key, List} from dynamiclinqquery ?
I solved the problem by defining a selector that projects the Key as well as Employees List. var eq = empqueryable.GroupBy("new (State, Department)", "it").Select("new(it.Key as Key, it as Employees)"); var keyEmplist = (from dynamic dat in eq select dat).ToList(); foreach (var group in keyEmplist) { var key = group.Key; var elist = group.Employees; foreach (var emp in elist) { } }
The GroupBy method should still return something that implements IEnumerable<IGrouping<TKey, TElement>>. While you might not be able to actually cast it (I'm assuming it's dynamic), you can certainly still make calls on it, like so:
foreach (var group in dynamiclinqquery)
{
    // Print out the key.
    Console.WriteLine("Key: {0}", group.Key);
    // Write the items.
    foreach (var item in group)
    {
        Console.WriteLine("Item: {0}", item);
    }
}
list inside another list select
what is the correct syntax to write the second list? bookid and other fields are not recognizing var bookssublist = from bookdetails in bookslist join bookcategories in _context.BookCategories on bookdetails.BookId equals bookcategories.BookId where bookcategories.CategoryId==CategoryId select new BookBasicInfo { count = bookcount, BookInfo = new List<BookInfo>() { BookId = bookdetails.BookId, BookTitle = bookdetails.Title, Images = bookdetails.ThumbnailImagePath, PublishDate = bookdetails.PublishedDate, AuthorList = bookdetails.BookAuthors.Select(q => q.Author.Author1).ToList(), CategoryList =bookdetails.BookCategories.Select(q=>q.Category.CategoryName).ToList(), } };
You are using the collection initializer in the wrong way. Actually, you forgot to pass an object of type BookInfo to the initializer.
BookInfo = new List<BookInfo>()
{
    new BookInfo()
    {
        BookId = bookdetails.BookId,
        BookTitle = bookdetails.Title,
        Images = bookdetails.ThumbnailImagePath,
        PublishDate = bookdetails.PublishedDate,
        AuthorList = bookdetails.BookAuthors.Select(q => q.Author.Author1).ToList(),
        CategoryList = bookdetails.BookCategories.Select(q => q.Category.CategoryName).ToList()
    }
}
PIVOT with LINQ from Datatable [duplicate]
I have a collection of items that contain an Enum (TypeCode) and a User object, and I need to flatten it out to show in a grid. It's hard to explain, so let me show a quick example. Collection has items like so: TypeCode | User --------------- 1 | Don Smith 1 | Mike Jones 1 | James Ray 2 | Tom Rizzo 2 | Alex Homes 3 | Andy Bates I need the output to be: 1 | 2 | 3 Don Smith | Tom Rizzo | Andy Bates Mike Jones | Alex Homes | James Ray | | I've tried doing this using foreach, but I can't do it that way because I'd be inserting new items to the collection in the foreach, causing an error. Can this be done in Linq in a cleaner fashion?
I'm not saying it is a great way to pivot - but it is a pivot... // sample data var data = new[] { new { Foo = 1, Bar = "Don Smith"}, new { Foo = 1, Bar = "Mike Jones"}, new { Foo = 1, Bar = "James Ray"}, new { Foo = 2, Bar = "Tom Rizzo"}, new { Foo = 2, Bar = "Alex Homes"}, new { Foo = 3, Bar = "Andy Bates"}, }; // group into columns, and select the rows per column var grps = from d in data group d by d.Foo into grp select new { Foo = grp.Key, Bars = grp.Select(d2 => d2.Bar).ToArray() }; // find the total number of (data) rows int rows = grps.Max(grp => grp.Bars.Length); // output columns foreach (var grp in grps) { Console.Write(grp.Foo + "\t"); } Console.WriteLine(); // output data for (int i = 0; i < rows; i++) { foreach (var grp in grps) { Console.Write((i < grp.Bars.Length ? grp.Bars[i] : null) + "\t"); } Console.WriteLine(); }
Marc's answer gives sparse matrix that can't be pumped into Grid directly. I tried to expand the code from the link provided by Vasu as below: public static Dictionary<TKey1, Dictionary<TKey2, TValue>> Pivot3<TSource, TKey1, TKey2, TValue>( this IEnumerable<TSource> source , Func<TSource, TKey1> key1Selector , Func<TSource, TKey2> key2Selector , Func<IEnumerable<TSource>, TValue> aggregate) { return source.GroupBy(key1Selector).Select( x => new { X = x.Key, Y = source.GroupBy(key2Selector).Select( z => new { Z = z.Key, V = aggregate(from item in source where key1Selector(item).Equals(x.Key) && key2Selector(item).Equals(z.Key) select item ) } ).ToDictionary(e => e.Z, o => o.V) } ).ToDictionary(e => e.X, o => o.Y); } internal class Employee { public string Name { get; set; } public string Department { get; set; } public string Function { get; set; } public decimal Salary { get; set; } } public void TestLinqExtenions() { var l = new List<Employee>() { new Employee() { Name = "Fons", Department = "R&D", Function = "Trainer", Salary = 2000 }, new Employee() { Name = "Jim", Department = "R&D", Function = "Trainer", Salary = 3000 }, new Employee() { Name = "Ellen", Department = "Dev", Function = "Developer", Salary = 4000 }, new Employee() { Name = "Mike", Department = "Dev", Function = "Consultant", Salary = 5000 }, new Employee() { Name = "Jack", Department = "R&D", Function = "Developer", Salary = 6000 }, new Employee() { Name = "Demy", Department = "Dev", Function = "Consultant", Salary = 2000 }}; var result5 = l.Pivot3(emp => emp.Department, emp2 => emp2.Function, lst => lst.Sum(emp => emp.Salary)); var result6 = l.Pivot3(emp => emp.Function, emp2 => emp2.Department, lst => lst.Count()); } * can't say anything about the performance though.
You can use LINQ's .ToLookup to group in the manner you are looking for.
var lookup = data.ToLookup(d => d.TypeCode, d => d.User);
Then it's a matter of putting it into a form that your consumer can make sense of. For instance:
//Warning: untested code
var enumerators = lookup.Select(g => g.GetEnumerator()).ToList();
int columns = enumerators.Count;
while(columns > 0)
{
    for(int i = 0; i < enumerators.Count; ++i)
    {
        var enumerator = enumerators[i];
        if(enumerator == null) continue;
        if(!enumerator.MoveNext())
        {
            --columns;
            enumerators[i] = null;
        }
    }
    yield return enumerators.Select(e => (e != null) ? e.Current : null);
}
Put that in an IEnumerable<> method and it will (probably) return a collection (rows) of collections (columns) of User, where a null is put in a column that has no data.
I guess this is similar to Marc's answer, but I'll post it since I spent some time working on it. The results are separated by " | " as in your example. It also uses the IGrouping<int, string> type returned from the LINQ query when using a group by instead of constructing a new anonymous type. This is tested, working code. var Items = new[] { new { TypeCode = 1, UserName = "Don Smith"}, new { TypeCode = 1, UserName = "Mike Jones"}, new { TypeCode = 1, UserName = "James Ray"}, new { TypeCode = 2, UserName = "Tom Rizzo"}, new { TypeCode = 2, UserName = "Alex Homes"}, new { TypeCode = 3, UserName = "Andy Bates"} }; var Columns = from i in Items group i.UserName by i.TypeCode; Dictionary<int, List<string>> Rows = new Dictionary<int, List<string>>(); int RowCount = Columns.Max(g => g.Count()); for (int i = 0; i <= RowCount; i++) // Row 0 is the header row. { Rows.Add(i, new List<string>()); } int RowIndex; foreach (IGrouping<int, string> c in Columns) { Rows[0].Add(c.Key.ToString()); RowIndex = 1; foreach (string user in c) { Rows[RowIndex].Add(user); RowIndex++; } for (int r = RowIndex; r <= Columns.Count(); r++) { Rows[r].Add(string.Empty); } } foreach (List<string> row in Rows.Values) { Console.WriteLine(row.Aggregate((current, next) => current + " | " + next)); } Console.ReadLine(); I also tested it with this input: var Items = new[] { new { TypeCode = 1, UserName = "Don Smith"}, new { TypeCode = 3, UserName = "Mike Jones"}, new { TypeCode = 3, UserName = "James Ray"}, new { TypeCode = 2, UserName = "Tom Rizzo"}, new { TypeCode = 2, UserName = "Alex Homes"}, new { TypeCode = 3, UserName = "Andy Bates"} }; Which produced the following results showing that the first column doesn't need to contain the longest list. You could use OrderBy to get the columns ordered by TypeCode if needed. 1 | 3 | 2 Don Smith | Mike Jones | Tom Rizzo | James Ray | Alex Homes | Andy Bates |
@Sanjaya.Tio I was intrigued by your answer and created this adaptation which minimizes keySelector execution. (untested)
public static Dictionary<TKey1, Dictionary<TKey2, TValue>> Pivot3<TSource, TKey1, TKey2, TValue>(
    this IEnumerable<TSource> source
    , Func<TSource, TKey1> key1Selector
    , Func<TSource, TKey2> key2Selector
    , Func<IEnumerable<TSource>, TValue> aggregate)
{
    var lookup = source.ToLookup(x => new {Key1 = key1Selector(x), Key2 = key2Selector(x)});
    List<TKey1> key1s = lookup.Select(g => g.Key.Key1).Distinct().ToList();
    List<TKey2> key2s = lookup.Select(g => g.Key.Key2).Distinct().ToList();
    var resultQuery =
        from key1 in key1s
        from key2 in key2s
        let lookupKey = new {Key1 = key1, Key2 = key2}
        let g = lookup[lookupKey]
        let resultValue = g.Any() ? aggregate(g) : default(TValue)
        select new {Key1 = key1, Key2 = key2, ResultValue = resultValue};
    Dictionary<TKey1, Dictionary<TKey2, TValue>> result = new Dictionary<TKey1, Dictionary<TKey2, TValue>>();
    foreach(var resultItem in resultQuery)
    {
        TKey1 key1 = resultItem.Key1;
        TKey2 key2 = resultItem.Key2;
        TValue resultValue = resultItem.ResultValue;
        if (!result.ContainsKey(key1))
        {
            result[key1] = new Dictionary<TKey2, TValue>();
        }
        var subDictionary = result[key1];
        subDictionary[key2] = resultValue;
    }
    return result;
}
Compare String with split in contains - LINQ
My requirement is to compare the values in string with the list of string. Code: string Names = "Prabha,Karan"; List<string> Presenter = new List<string> { "Prabha", "Joe", "Hukm" }; bool Presented = Presenter.Contains(Names.Split(',')); the above code throws an error and here i need to find the names are presented in the presenter(Presenter has the splited values of the Names).
You could do it like below:
var splitNames = Names.Split(',');
bool Presented = Presenter.Any(p => splitNames.Contains(p));
EDIT: If you're interested in what the matches are, just do:
var matches = Presenter.Where(p => splitNames.Contains(p));
string names = "Prabha,Karan";
List<string> presenter = new List<string> { "Prabha", "Joe", "Hukm" };
IEnumerable<string> namesList = names.Split(',').Select(x => x.Trim());
var list = presenter.Intersect(namesList);
bool presented = namesList.Count() == list.Count();
The unit tests to cover your case:
[Test]
public void AllSourceEntriesAreFoundInTheTargetList()
{
    string names = "Prabha,Karan";
    List<string> presenter = new List<string> { "Prabha", "Joe", "Hukm" };
    IEnumerable<string> namesList = names.Split(',').Select(x => x.Trim());
    var list = presenter.Intersect(namesList);
    Assert.AreNotEqual(namesList.Count(), list.Count());
    presenter = new List<string> { "Prabha", "Karan", "SomeAnother" };
    var list1 = presenter.Intersect(namesList);
    Assert.AreEqual(namesList.Count(), list1.Count());
}
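The two answers check different things: the first asks whether any of the split names is a presenter, the second whether all of them are. A small sketch that makes the difference explicit is shown below; the PresenterCheck class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper contrasting the "any" and "all" interpretations.
public static class PresenterCheck
{
    // Splits a comma-separated string and trims whitespace around each name.
    private static List<string> SplitNames(string names) =>
        names.Split(',').Select(n => n.Trim()).ToList();

    // True when at least one of the names is in the presenter list.
    public static bool AnyPresented(string names, IEnumerable<string> presenters)
    {
        var presenterSet = new HashSet<string>(presenters);
        return SplitNames(names).Any(n => presenterSet.Contains(n));
    }

    // True when every one of the names is in the presenter list.
    public static bool AllPresented(string names, IEnumerable<string> presenters)
    {
        var presenterSet = new HashSet<string>(presenters);
        return SplitNames(names).All(n => presenterSet.Contains(n));
    }

    // The presenters that also appear in the names string.
    public static List<string> Matches(string names, IEnumerable<string> presenters) =>
        presenters.Intersect(SplitNames(names)).ToList();

    public static void Main()
    {
        var presenter = new List<string> { "Prabha", "Joe", "Hukm" };
        Console.WriteLine(AnyPresented("Prabha,Karan", presenter)); // True
        Console.WriteLine(AllPresented("Prabha,Karan", presenter)); // False
        Console.WriteLine(string.Join(",", Matches("Prabha,Karan", presenter))); // Prabha
    }
}
```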
LINQ Union with Constant Values
Very primitive question but I am stuck (I guess being newbie). I have a function which is supposed to send me the list of companies : ALSO, I want the caller to be able to specify a top element for the drop-down list as well.. (say for "None"). I have following piece of code, how I will append the Top Element with the returning SelectList? public static SelectList GetCompanies( bool onlyApproved, FCCIEntityDataContext entityDataContext, SelectListItem TopElement ) { var cs = from c in entityDataContext.Corporates where ( c.Approved == onlyApproved || onlyApproved == false ) select new { c.Id, c.Company }; return new SelectList( cs.AsEnumerable(), "Id", "Comapny" ); } Thanks!
This should work for you:
List<Corporate> corporates = (from c in entityDataContext.Corporates
    where (c.Approved == onlyApproved || onlyApproved == false)
    select c).ToList();
corporates.Add(new Corporate { Id = -1, Company = "None" });
return new SelectList(corporates.AsEnumerable(), "Id", "Company");
This method has always worked for me. public static SelectList GetCompanies( bool onlyApproved, FCCIEntityDataContext entityDataContext, SelectListItem TopElement ) { var cs = from c in entityDataContext.Corporates where ( c.Approved == onlyApproved || onlyApproved == false ) select new SelectListItem { Value = c.Id, Text = c.Company }; var list = cs.ToList(); list.Insert(0, TopElement); var selectList = new SelectList( list, "Value", "Text" ); selectList.SelectedValue = TopElement.Value; return selectList; } Update forgot the lesson I learned when I did this. You have to output the LINQ as SelectListItem.
cs.ToList().Insert(0, new { TopElement.ID, TopElement.Company });
You could convert it to a list as indicated or you could union the IQueryable result with a constant array of one element (and even sort it): static void Main(string[] args) { var sampleData = new[] { new { Id = 1, Company = "Acme", Approved = true }, new { Id = 2, Company = "Blah", Approved = true } }; bool onlyApproved = true; var cs = from c in sampleData where (c.Approved == onlyApproved || onlyApproved == false) select new { c.Id, c.Company }; cs = cs.Union(new [] {new { Id = -1, Company = "None" }}).OrderBy(c => c.Id); foreach (var c in cs) { Console.WriteLine(String.Format("Id = {0}; Company = {1}", c.Id, c.Company)); } Console.ReadKey(); }
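Union removes duplicate elements and leaves the position of the constant element to the following OrderBy, so the Id = -1 trick is what puts "None" first. If the goal is only to put a fixed element on top, Concat is a simpler alternative; the sketch below uses it instead of Union. The CompanyOption class and the WithTopElement method are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the projected { Id, Company } rows.
public class CompanyOption
{
    public int Id { get; set; }
    public string Company { get; set; }
}

public static class TopElementDemo
{
    // Puts the given element first and keeps the remaining rows in their original order.
    public static List<CompanyOption> WithTopElement(
        IEnumerable<CompanyOption> companies, CompanyOption top)
    {
        return new[] { top }.Concat(companies).ToList();
    }

    public static void Main()
    {
        var sampleData = new List<CompanyOption>
        {
            new CompanyOption { Id = 1, Company = "Acme" },
            new CompanyOption { Id = 2, Company = "Blah" }
        };

        var options = WithTopElement(sampleData, new CompanyOption { Id = -1, Company = "None" });
        foreach (var c in options)
        {
            Console.WriteLine("Id = {0}; Company = {1}", c.Id, c.Company);
        }
    }
}
```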